
Using Spotify data to predict what songs will be hits - lelf
https://techxplore.com/news/2019-09-spotify-songs.html
======
glifchits
Did I miss something or does this project include popularity measures as
_features_?

In the section on dataset features, they include "popularity" (calculated by
Spotify) as well as Billboard chart stats like weeks, rank, and a custom-made
"score". To me it's not clear whether these features were hidden from the
train/test sets or whether the popularity features were only used in their
"artist past performance" measures.

If they included these popularity features, it's like asking "can we predict
whether a song is a hit just by looking at how popular it is?" If it is the
case that they peeked into the future and observed ex-post song popularity,
obtaining just 89% accuracy hints at how unpredictable song success truly is.
Check out [1] for a famous study that experimentally demonstrates the
unpredictability of song success.
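
To make the distinction concrete, the leakage-free setup I'd hope to see
looks something like this (a toy sketch with made-up column names and random
data, not the paper's pipeline):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical frame: audio features plus the ex-post popularity columns.
    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "danceability": rng.random(n),
        "energy": rng.random(n),
        "duration_ms": rng.integers(90_000, 360_000, n),
        "popularity": rng.integers(0, 100, n),     # computed after release
        "weeks_on_chart": rng.integers(0, 52, n),  # Billboard stat, ex post
        "is_hit": rng.integers(0, 2, n),
    })

    # Leakage-free setup: drop anything only knowable after release.
    leaky = ["popularity", "weeks_on_chart"]
    X = df.drop(columns=leaky + ["is_hit"])
    y = df["is_hit"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    print(clf.score(X_te, y_te))  # ~0.5 on random data; the point is the drop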

[1] Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental Study
of Inequality and Unpredictability in an Artificial Cultural Market. Science,
311(5762), 854–856.
[https://doi.org/10.1126/science.1121066](https://doi.org/10.1126/science.1121066)

~~~
Pils
From the paper:

>To extend previous work, in addition to audio analysis features, we consider
song duration and _mine an additional artist past-performance feature. Artist
past-performance for a given song represents how many prior Billboard hits the
artist has released before that track’s release date_

emphasis mine.

I wonder how accurate a model using this feature alone would be.
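
The feature itself is only a few lines to mine if you have release dates, and
a single-feature model on top of it is trivial. A toy sketch, all names and
data made up:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical track table, one row per song.
    tracks = pd.DataFrame({
        "artist": ["A", "A", "B", "A", "B", "B"],
        "release_date": pd.to_datetime(
            ["2015-01-01", "2016-01-01", "2016-06-01",
             "2017-01-01", "2017-06-01", "2018-01-01"]),
        "is_hit": [1, 0, 1, 1, 0, 1],
    }).sort_values("release_date")

    # Past-performance: the artist's hit count strictly before each release.
    tracks["past_hits"] = (
        tracks.groupby("artist")["is_hit"].cumsum() - tracks["is_hit"]
    )

    # The single-feature model in question.
    model = LogisticRegression().fit(tracks[["past_hits"]], tracks["is_hit"])
    print(model.predict_proba(tracks[["past_hits"]])[:, 1])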

~~~
glifchits
Right, this sentence made it unclear to me whether they only used the
popularity features to compute past-performance, or whether they included
past-performance in addition to other popularity features.

To your question, other work on success prediction of tweets [1, 2]
demonstrates that past-performance is indeed much more predictive than the
typical content features. This way of looking at success of "cultural
products" assumes it depends to varying extents on both inherent "quality"
(measured by content features), and the social processes of sharing (which are
much harder to understand ahead of time, as the paper I referenced in my
parent post shows).

[1] Martin, T., Hofman, J. M., Sharma, A., Anderson, A., & Watts, D. J.
(2016). Exploring Limits to Prediction in Complex Social Systems. Proceedings
of the 25th International Conference on World Wide Web - WWW ’16, 683–694.
[https://doi.org/10.1145/2872427.2883001](https://doi.org/10.1145/2872427.2883001)

[2] Bakshy, E., Hofman, J. M., Mason, W. A., & Watts, D. J. (2011). Everyone’s
an Influencer: Quantifying Influence on Twitter. Proceedings of the Fourth ACM
International Conference on Web Search and Data Mining - WSDM ’11, 65.
[https://doi.org/10.1145/1935826.1935845](https://doi.org/10.1145/1935826.1935845)

------
mrguyorama
Hah, my friend and I did nearly the exact same project in college, though
minus the publication. We had an open-ended project for an intro to machine
learning class we were taking.

We ended up using the Million Song Dataset (I'm not sure Spotify gave out
this data six years ago), which includes various info about roughly a million
songs, including artist, length, and supposedly Echo Nest API results for
things like "danceability". We then merged this with a list of roughly 250k
play counts. We then found out the Echo Nest data was quite literally all
just set to null, so I went to their API, signed up for a developer key, and
spent six days querying to fill out our dataset.

We were massive novices at machine learning, so we were basically just
script-kiddying it, and pretty much none of the models we made over a
24ish-hour period (because we were dumb college students doing things last
minute) had any significant accuracy. Finally we made a random forest model
that was able to, with 80% accuracy, predict the "magnitude" of plays, i.e.
roughly whether a song would get a million plays or a thousand.

When we broke it down (model explainability is an awesome feature), we found
that out of everything interesting we had done with feature investigation,
data cleaning, etc., the model was about 90% based on which artist made the
song. In retrospect, that makes sense, in a sort of cynical way; even a great
song by an unknown artist rarely makes it big. The moral of the story, I
guess, is that machine learning isn't magic.
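
For anyone curious, the breakdown itself is one attribute in scikit-learn.
A toy sketch with made-up data rigged so the artist drives the label, since
our original pipeline is long gone:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n = 2000
    artist_id = rng.integers(0, 50, n)
    X = pd.DataFrame({
        "artist_id": artist_id,               # label-encoded artist
        "danceability": rng.random(n),
        "duration_s": rng.normal(210, 40, n),
    })
    y = (artist_id < 5).astype(int)           # "big" artists get the plays

    clf = RandomForestClassifier(random_state=0).fit(X, y)
    print(pd.Series(clf.feature_importances_, index=X.columns)
            .sort_values(ascending=False))    # artist_id dominates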

I still have all the data, and I've been meaning to revisit it now that I
actually have a better understanding of the field. It's on my list of things
to revisit/do, a very long list.

~~~
data4lyfe
"90% based on which artist made the song"

Doesn't that demonstrate that the actual business case, producers discovering
new artists, doesn't even factor into the model's task of picking which new
songs will actually be hits?

It seems like the former is much harder than the latter in this case.

~~~
dwd
I think what you will find is that the process of an artist being discovered
is basically the same as getting into YC.

It's about the artist and whether they have star quality and are saleable.
There are plenty of songwriters to write the actual songs.

A classic case is someone like Sia Furler, who has written a ton of hits for
other artists.

[https://time.com/4209769/sia-best-songs-written-for-other-artists-ranked/](https://time.com/4209769/sia-best-songs-written-for-other-artists-ranked/)

------
bo1024
An interesting '06 study. Participants were divided into groups and given
access to obscure online songs. Some groups could see the number of prior
downloads.

Song popularity in different groups was very weakly correlated. Some of the
same songs would sometimes be hits and sometimes busts for no apparent reason.

In other words, even song popularity (in an i.i.d. trial) doesn't predict song
popularity.

[https://www.nytimes.com/2006/02/14/health/in-music-others-tastes-may-help-shape-your-own.html](https://www.nytimes.com/2006/02/14/health/in-music-others-tastes-may-help-shape-your-own.html)

------
zwkrt
It’s crazy how much sway Spotify and other streaming apps have over the future
of what music is popular. Radio pop hits aside, I know lots of people whose
music preference is something along the lines of “a few bands I like and their
associated Spotify recommendations”. Music preferences and trends have always
been culturally motivated both through grassroots word of mouth, and top-down
by those creatively in charge at record labels.

But now with algorithmic suggestions there is a third trend-motivator, which
is music listeners and music producers reflecting on the overall system for
statistical trends and correlations. I expect this motivator to push out the
other two because it makes labels more (and safer) money and consumers seem to
like it. The conditions for success are now highly metricked and quickly A/B
testable. It’s like the internet before and after PageRank—and just the same,
I would expect a cottage industry of “SEO” to pop up for the music industry.

~~~
holy_city
I'll take algorithms over payola. At least they suggest things to me that I'd
like to hear.

~~~
bsder
Really? Ed Sheeran is so good that he had 5 of the top 10 hits simultaneously?

If that's the end result, I think we need to bring back payola.

~~~
holy_city
meh? Charts have always been poor representations of musical
quality/popularity. I use Spotify because the algorithms suggest good music
to me based on my taste, and despite the weird polarity of it, they always
find new music that interests me.

I have no problem with Spotify's algorithms. They do a very good job with a
very difficult problem.

------
minimaxir
Per the paper, regarding the fix for the dataset imbalance:

> In order to balance our data, we randomly sampled 12,000 non-hits from the
> Spotify data and created a new dataset. This dataset contained approximately
> 12k non-hits and 12k hits (∼24k tracks total).

Won't an indiscriminate random sampling of the non-hits introduce a temporal
sampling bias, as a) music trends change over time and b) music output is not
equal across years?
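
One fix would be to match the non-hit sample to the hits' per-year
distribution instead of sampling uniformly. Roughly this (a sketch, assuming
a release_year column the paper may or may not have):

    import pandas as pd

    def sample_matched_non_hits(hits: pd.DataFrame, non_hits: pd.DataFrame,
                                seed: int = 0) -> pd.DataFrame:
        """Sample non-hits so their per-year counts match the hits'."""
        per_year = hits["release_year"].value_counts()
        parts = [
            grp.sample(n=min(per_year.get(year, 0), len(grp)),
                       random_state=seed)
            for year, grp in non_hits.groupby("release_year")
        ]
        return pd.concat(parts, ignore_index=True)

    # balanced = pd.concat([hits, sample_matched_non_hits(hits, non_hits)])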

~~~
triggercut
I was thinking along the same lines but in general.

How do you account for the temporal nature of human taste: we judge and like
new songs against what we've already heard, and can't be influenced by songs
that haven't been released yet?

Do we trust that SVM/NN account for it somehow?

Is there a way to limit the corpora, or to have a point-in-time training set,
i.e. train only on inputs that precede the track being predicted? (A sketch
of such a split is below.)

Of course this could just be a fine-tuning knob against the other temporal
style/fashion trends you would expect in the dataset.
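
On the point-in-time question, the crudest version is a plain chronological
split, something like this (sketch, assuming a release_date column):

    import pandas as pd

    def point_in_time_split(tracks: pd.DataFrame, cutoff: str):
        """Train only on tracks released before the cutoff; test the rest."""
        before = tracks["release_date"] < pd.Timestamp(cutoff)
        return tracks[before], tracks[~before]

    # e.g. train, test = point_in_time_split(tracks, "2017-01-01")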

------
jjuhl
So, more motion towards ever more boring music in bulk and more
marginalisation of quirky, weird, different, interesting, intelligent music.
Yeah, progress. :-(

------
bitforger
Step 2: Train a GAN to try to fool the classifier: instant hit generator?
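
The fooling half is only a few lines if the classifier is differentiable.
Not a full GAN (the classifier here is frozen rather than co-trained), and
both networks are made-up stand-ins for the paper's model, operating on
feature vectors rather than audio:

    import torch
    from torch import nn

    N_FEATURES, NOISE = 8, 16

    gen = nn.Sequential(nn.Linear(NOISE, 64), nn.ReLU(),
                        nn.Linear(64, N_FEATURES))
    hit_clf = nn.Sequential(nn.Linear(N_FEATURES, 32), nn.ReLU(),
                            nn.Linear(32, 1), nn.Sigmoid())
    for p in hit_clf.parameters():
        p.requires_grad_(False)  # frozen: we only fool it, never train it

    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    for step in range(1000):
        z = torch.randn(128, NOISE)             # noise in, "features" out
        p_hit = hit_clf(gen(z))
        loss = -torch.log(p_hit + 1e-8).mean()  # push classifier toward "hit"
        opt.zero_grad()
        loss.backward()
        opt.step()

The hard part, of course, is getting from a feature vector the classifier
likes back to actual audio.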

~~~
asdfman123
Check the new hit by artist Justin Timberbrook: Baby You!

Lyrics

\------------------------------

Hey baby you \ Summer sun beatin' down \ I texted my ex the other night \
Going down to the bar to see if I can hook up

<1 second burst of static>

I like the way you shake it \ (Shake it, shake it, shake it) \ Got dolla dolla
bills in area codes \ Yeah

<Sirens and machine gun noises>

~~~
sushisource
Not going to lie, you got me. Thought that was real until the last line.

