The Partiest Song

Our group set out to determine which features make a song ideal for a party, and whether there is a single best party song. For our data, we drew on six different Spotify playlists made up of various party songs. We believed Spotify would provide a fair sample of songs because of its abundance of younger users, who enjoy partying, and its ability to create personalized playlists. After compiling all of the songs into one playlist, we were immediately met with repeats. We chose to keep the duplicate songs in our dataset, reasoning that a song appearing on multiple playlists is a desirable party song; its attributes are therefore counted multiple times in the analysis, giving repeated songs more weight. The next step was to pull Spotify's metrics for each song; each metric is a value Spotify calculates from various features and attributes of the track.
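
Keeping duplicates means a song's weight in every summary statistic is simply its number of appearances across the six playlists. A quick tally (a sketch using the party data frame loaded in the code section below) shows which songs are counted more than once:

party %>% 
  count(track.name, sort = TRUE) %>%   # repeated songs rise to the top
  head(10)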

Spotify Data Meanings

Valence 🐱

Valence describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (happy, cheerful, euphoric), while tracks with low valence sound more negative (sad, depressed, angry).


Danceability 🕴️

How suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.


Energy 🤨

Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.


Mode 😊

Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.


Key Mode 😗

The same information as mode, but with the key and mode displayed together as one label (e.g. "C# major").


Loudness 😡

The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.


Tempo 🙁

The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.


Liveness 🎉

Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.


Time Signature 👏

An indication of rhythm shown by a fraction with the denominator defining the beat as a division of a whole note, and the numerator giving the number of beats in each bar.


Acousticness 😂

A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.


Speechiness 😀

Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words.


Factors We Left Out

Some features we decided to leave out were acousticness, instrumentalness, liveness, and mode. For each of these, the data was either too weak to factor into our research or too broad. Acousticness is measured from 0.0 to 1.0, and since these songs mostly have high danceability and energy, they are unlikely to be acoustic; we decided that whether or not a song is acoustic is not a quality that defines a party song. Liveness estimates how likely a track was recorded in front of a live audience, with values of 0.8 or above indicating high confidence that the track is live. Only five of our songs scored 0.8 or higher, so filtering on this metric would exclude over 99% of our data, and it tells us little about what makes a party song a party song. Instrumentalness and mode were also left out: many of our songs are electronic, and key (together with mode) was already being taken into account in other analyses, making mode alone fairly unimportant.
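
The liveness claim is easy to check once the data is loaded; a quick sketch using the party data frame from the next section:

# Count tracks Spotify flags as likely live (liveness of 0.8 or higher)
party %>% 
  filter(liveness >= 0.8) %>% 
  nrow()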

Factors We Used

In order to determine which factors would be useful to analyze, we made scatter plots of each metric to see whether any trends were observable in the data.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(readr)
library(knitr)

horowitz_horowitz <- read_csv("horowitz - horowitz.csv")
## New names:
## • `key...37` -> `key`
## Rows: 1068 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): track.name, key_name, key_mode
## dbl (14): danceability, energy, loudness, mode, speechiness, acousticness, l...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
party <- horowitz_horowitz

party %>% 
  ggplot(aes(track.name, energy)) +
  geom_point(color = "#E60000") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

party %>% 
  ggplot(aes(track.name, danceability)) +
  geom_point(color = "#FFC24C") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

party %>% 
  ggplot(aes(track.name, loudness)) +
  geom_point(color = "#FFDC33") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

party %>% 
  ggplot(aes(track.name, speechiness)) +
  geom_point(color = "#2F8501") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

party %>% 
  ggplot(aes(track.name, valence)) +
  geom_point(color = "#4D4DFF") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

party %>% 
  ggplot(aes(track.name, tempo)) +
  geom_point(color = "#9966FF") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

party %>% 
  ggplot(aes(track.name, track.duration_ms)) +
  geom_point(color = "#FF33CC") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
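
Since tidyr is already loaded, the seven chunks above could also be collapsed into a single faceted plot; a sketch, assuming the same column names:

party %>% 
  pivot_longer(c(energy, danceability, loudness, speechiness,
                 valence, tempo, track.duration_ms),
               names_to = "metric", values_to = "value") %>% 
  ggplot(aes(track.name, value)) +
  geom_point() +
  facet_wrap(~metric, scales = "free_y") +   # one panel per metric, each with its own y-axis
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())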

Survivorship Bias

party %>% 
  ggplot(aes(track.name, track.popularity, color = as.factor(track.album.release_date))) +
  geom_point() +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

While analyzing our data, we found a clear example of survivorship bias. The songs from the 70s and 80s that appear on these playlists have high track popularity because they are the ones that survived: people have heard them a million times and really like them. It is survivorship bias because a party playlist from the early 2000s would probably have featured many more songs from those decades, including ones with lower popularity.
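
One way to see this pattern in the numbers is to average popularity by release decade; a sketch, assuming track.album.release_date holds a release year (which is how it parsed in our CSV):

party %>% 
  mutate(decade = floor(track.album.release_date / 10) * 10) %>%   # e.g. 1987 becomes 1980
  group_by(decade) %>% 
  summarise(songs = n(), mean_popularity = mean(track.popularity))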

Distributions

To get a better visualization of our data, we made density plots of each metric, paired with summary statistics, to see the mean, median, peaks, and how any outliers may have affected the results.

party %>% 
  ggplot(aes(energy)) +
  geom_density(kernel = "gaussian")

summary(party$energy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1270  0.6290  0.7395  0.7300  0.8480  0.9890

Energy varies with each song's intensity and activity, but most values sit above 0.6 (the first quartile is 0.629), and the average is about 0.73. The outliers at the bottom, below 0.2, are both copies of Sweet Caroline.
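
The low-energy outliers can be pulled out directly; a quick sketch:

# The points below 0.2 are the duplicate entries of Sweet Caroline
party %>% 
  filter(energy < 0.2) %>% 
  select(track.name, energy)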

party %>% 
  ggplot(aes(danceability)) +
  geom_density(kernel = "gaussian")

summary(party$danceability)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3520  0.6570  0.7320  0.7299  0.8220  0.9860

The metric that Spotify uses to gauge a song's "danceability" is based on its measurements of tempo, rhythm stability, beat strength, and overall regularity; the closer to 1, the more "danceable" a song is. The graph shows that danceability is fairly spread out, but there is a clear cluster from 0.6 to 0.9, which tells us that a majority of the songs are relatively danceable.
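
The share of songs inside that 0.6 to 0.9 cluster is a one-liner:

# Proportion of songs with danceability between 0.6 and 0.9
mean(party$danceability >= 0.6 & party$danceability <= 0.9)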

party %>% 
  ggplot(aes(tempo)) +
  geom_density(kernel = "gaussian")

summary(party$tempo)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   60.01  104.02  125.02  122.93  130.06  207.50

Tempo measures the speed of the song, and the graph shows a sharp peak at around 125 BPM. An average mid-tempo song runs 90-105 BPM; these party songs surpass that range, with a median of 125.02 BPM.

party %>% 
  ggplot(aes(loudness)) +
  geom_density(kernel = "gaussian")

summary(party$loudness)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -18.064  -6.463  -5.349  -5.480  -4.125  -0.930

Since many of these songs have a high BPM and high overall danceability, they mostly fall above -10 dB in loudness; the closer to 0, the louder the song. The song with the lowest loudness was Africa by TOTO.
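
The quietest track is easy to confirm; a quick sketch:

# Song with the lowest loudness value in the dataset
party %>% 
  slice_min(loudness, n = 1) %>% 
  select(track.name, loudness)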

party %>% 
  ggplot(aes(speechiness)) +
  geom_density(kernel = "gaussian")

summary(party$speechiness)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0253  0.0491  0.0810  0.1301  0.1732  0.5650

Speechiness measures the presence of spoken words in a song. Despite their lyric-heavy reputation, most of these party songs score low on speechiness (the median is 0.081), with a long tail reaching up to 0.565; values between 0.33 and 0.66 indicate tracks containing both music and speech, so only that upper tail is truly word-dense. The song with the most spoken words is Splashin by Rich the Kid, and the one with the fewest is When Love Takes Over by David Guetta and Kelly Rowland.
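
How many tracks actually cross Spotify's 0.33 "music plus speech" line is another one-liner:

# Proportion of tracks above the 0.33 speechiness threshold
mean(party$speechiness > 0.33)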

party %>% 
  ggplot(aes(valence)) +
  geom_density(kernel = "gaussian")

summary(party$valence)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0348  0.4250  0.5935  0.5801  0.7602  0.9720

Valence measures the positiveness of a song based on how the music sounds rather than on its lyrics. In some instances a song has negative lyrics over a positive-sounding beat; that song would still score high on valence even though its words are negative.

party %>% 
  ggplot(aes(track.duration_ms)) +
  geom_density(kernel = "gaussian")

summary(party$track.duration_ms)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   96825  193200  217867  218675  238946  449160
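
Milliseconds are hard to read at a glance; dividing by 60,000 restates the same summary in minutes, which puts the median at about 3.6 minutes (roughly 3:38):

# Same five-number summary, in minutes instead of milliseconds
summary(party$track.duration_ms / 60000)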

We also tallied the songs by key and mode to see which keys dominate the playlists:

party %>% 
  group_by(key_mode) %>% 
  count() %>% 
  kable()

|key_mode |   n|
|:--------|---:|
|A major  |  28|
|A minor  |  17|
|A# major |  15|
|A# minor |  69|
|B major  |  43|
|B minor  |  75|
|C major  |  87|
|C minor  |  29|
|C# major | 143|
|C# minor |  55|
|D major  |  78|
|D minor  |  18|
|D# major |  19|
|D# minor |  10|
|E major  |  14|
|E minor  |  35|
|F major  |  23|
|F minor  |  55|
|F# major |  21|
|F# minor |  57|
|G major  |  69|
|G minor  |  32|
|G# major |  54|
|G# minor |  22|

Finding the Partiest Song (Based on Our Data)

Using our data values and averages, we set filters bracketing each metric's typical value, looking for the one song that fit every criterion at once: the partiest song.

party %>% 
  filter(key_mode == "C# major") %>% 
  filter(energy > 0.70) %>% 
  filter(energy < 0.75) %>% 
  filter(danceability > 0.70) %>% 
  filter(danceability < 0.75) %>% 
  filter(tempo > 120) %>% 
  filter(tempo < 125) %>% 
  filter(loudness < -5.4) %>% 
  filter(loudness > -5.5) %>% 
  filter(speechiness > 0.13) %>%   # speechiness runs 0 to 1; these bounds bracket the mean of 0.1301
  filter(speechiness < 0.14) %>% 
  filter(valence > 0.59) %>% 
  filter(valence < 0.6) %>% 
  filter(track.duration_ms > 218500) %>% 
  filter(track.duration_ms < 218700)
## # A tibble: 0 × 17
## # ℹ 17 variables: track.name <chr>, danceability <dbl>, energy <dbl>,
## #   loudness <dbl>, mode <dbl>, speechiness <dbl>, acousticness <dbl>,
## #   liveness <dbl>, valence <dbl>, tempo <dbl>, time_signature <dbl>,
## #   track.duration_ms <dbl>, track.popularity <dbl>,
## #   track.album.release_date <dbl>, key <dbl>, key_name <chr>, key_mode <chr>

However, these filters did not come up with a song; no single track contains every ideal attribute at once. If a song has high danceability, it tends to have a low speechiness value due to the breaks within the song, and a song high in both speechiness and danceability would be far too long to be desirable as a party song. These mutually exclusive metrics prevent one song from becoming the purest, most perfect party song.
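
A hypothetical alternative we did not pursue: instead of hard cutoffs, score every song by its standardized distance from the dataset means and rank by closeness, so trade-offs between metrics are penalized rather than disqualifying. A sketch:

# Squared z-score distance from the mean on each metric; smaller = closer
# to the "average party song"
party %>% 
  mutate(dist = ((energy - mean(energy)) / sd(energy))^2 +
           ((danceability - mean(danceability)) / sd(danceability))^2 +
           ((tempo - mean(tempo)) / sd(tempo))^2 +
           ((loudness - mean(loudness)) / sd(loudness))^2 +
           ((speechiness - mean(speechiness)) / sd(speechiness))^2 +
           ((valence - mean(valence)) / sd(valence))^2) %>% 
  arrange(dist) %>% 
  select(track.name, dist) %>% 
  head(5)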

Findings

We found track popularity to be a somewhat biased and unreliable metric for narrowing down what makes a party song good, both because of how Spotify measures popularity and because of apparent irregularities, possibly tampering, in the data. For example, Lady Gaga's overall popularity showed as 0, which we figured was due to the popularity of the song LADY GAGA by Peso Pluma. Another factor to take into account when interpreting this data is the tempo graph: it has a double peak, one around 100 BPM and the other around 130 BPM, which could be due to tempo detection counting some songs in half time versus full time.

While finding the ultimate party song turned out not to be possible due to contradicting factors, we were able to narrow down a prevalent artist who is a powerful force in the party scene: Pitbull, Mr. Worldwide himself. From the data we gathered, four songs appeared on every playlist we sampled, and two of those four were by Pitbull: Hotel Room Service and Time of Our Lives. He was also the most recurring artist in our dataset overall.