In the section on dataset features, they include "popularity" (calculated by Spotify) as well as Billboard chart stats like weeks, rank, and a custom-made "score". It's not clear to me whether these features were excluded from the train/test sets, or whether they were used only to construct the "artist past performance" measures.
If they included these popularity features, it's like asking "can we predict whether a song is a hit just by looking at how popular it is?" And if they did peek into the future and observe ex-post song popularity, obtaining just 89% accuracy hints at how unpredictable song success truly is. See Salganik et al. (2006), cited below, for a famous study that experimentally demonstrates this unpredictability.
 Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science, 311(5762), 854–856. https://doi.org/10.1126/science.1121066
> To extend previous work, in addition to audio analysis features, we consider song duration and mine an additional artist past-performance feature. Artist past-performance for a given song represents how many prior Billboard hits the artist has released before that track's release date.
I wonder how accurate a model using this feature alone would be.
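For concreteness, the past-performance feature as the quote describes it could be computed like this. This is just a sketch of the idea, not the paper's code, and the field names (`artist`, `released`, `hit`) are my own:

```python
from datetime import date

def past_performance(tracks):
    """For each track, count the artist's prior Billboard hits
    released strictly before that track's release date."""
    counts = []
    for t in tracks:
        prior = sum(
            1 for u in tracks
            if u["artist"] == t["artist"]
            and u["hit"]
            and u["released"] < t["released"]
        )
        counts.append(prior)
    return counts

tracks = [
    {"artist": "A", "released": date(2015, 1, 1), "hit": True},
    {"artist": "A", "released": date(2016, 6, 1), "hit": True},
    {"artist": "A", "released": date(2017, 3, 1), "hit": False},
    {"artist": "B", "released": date(2017, 3, 1), "hit": True},
]
print(past_performance(tracks))  # [0, 1, 2, 0]
```

Training a classifier on just this one column would answer the question directly.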
To your question, other work on success prediction of tweets [1, 2] demonstrates that past-performance is indeed much more predictive than the typical content features. This way of looking at success of "cultural products" assumes it depends to varying extents on both inherent "quality" (measured by content features), and the social processes of sharing (which are much harder to understand ahead of time, as the paper I referenced in my parent post shows).
[1] Martin, T., Hofman, J. M., Sharma, A., Anderson, A., & Watts, D. J. (2016). Exploring Limits to Prediction in Complex Social Systems. Proceedings of the 25th International Conference on World Wide Web - WWW '16, 683–694. https://doi.org/10.1145/2872427.2883001
[2] Bakshy, E., Hofman, J. M., Mason, W. A., & Watts, D. J. (2011). Everyone's an Influencer: Quantifying Influence on Twitter. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining - WSDM '11, 65. https://doi.org/10.1145/1935826.1935845
We ended up using the Million Song Dataset (I'm not sure Spotify gave out this kind of data six years ago), which includes various info about roughly a million songs: artist, length, and supposedly Echo Nest API results for things like "danceability". We then merged this with a list of something like 250k play counts. Then we found out the Echo Nest data was quite literally all just set to null, so I went to their API, signed up for a developer key, and spent six days querying to fill out our dataset.
We were massive novices at machine learning, so we were basically just script-kiddying it, and pretty much none of the models we made over a 24ish-hour period (because we were dumb college students doing things last minute) had any significant accuracy. Finally we made a random forest model that was able to predict, with 80% accuracy, the "magnitude" of plays, i.e. roughly whether a song would get a million plays or a thousand.
When we broke it down (model explainability is an awesome feature), we found that, out of everything interesting we had done with feature investigation, data cleaning, etc., the model was about 90% based on which artist made the song. In retrospect, that makes sense in a sort of cynical way; even a great song by an unknown artist rarely makes it big. The moral of the story, I guess, is that machine learning isn't magic.
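That "magnitude of plays" target can be made concrete as an order-of-magnitude bin. A minimal sketch of the idea (illustrative only, not our actual code):

```python
import math

def play_magnitude(plays):
    """Bin a raw play count into its order of magnitude,
    e.g. 1_000 -> 3, 1_000_000 -> 6. Zero plays maps to bin 0."""
    return int(math.log10(max(plays, 1)))

print([play_magnitude(n) for n in (0, 950, 1_000, 42_000, 1_000_000)])
# [0, 2, 3, 4, 6]
```

Predicting that coarse bin is a much easier task than predicting raw play counts, which is partly why the random forest could reach 80%.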
I still have all the data, and I've been meaning to revisit it now that I actually have a better understanding of the field. It's on my list of things to revisit/do, a very long list.
Doesn't that demonstrate that the actual business case, producers discovering new artists, doesn't even factor into the model's task of discovering which new songs will actually be hits?
It seems like the former is much harder than the latter in this case.
It's about the artist and whether they have star quality and are saleable. There are plenty of song writers to write the actual songs.
Classic case is someone like Sia Furler who has written a ton of hits for other artists.
Song popularity in different groups was very weakly correlated. Some of the same songs would sometimes be hits and sometimes busts for no apparent reason.
In other words, even song popularity (in an i.i.d. trial) doesn't predict song popularity.
But now with algorithmic suggestions there is a third trend-motivator, which is music listeners and music producers reflecting on the overall system for statistical trends and correlations. I expect this motivator to push out the other two because it makes labels more (and safer) money and consumers seem to like it. The conditions for success are now highly metricked and quickly A/B testable. It’s like the internet before and after PageRank—and just the same, I would expect a cottage industry of “SEO” to pop up for the music industry.
It's particularly concerning because there was a study showing that, in different control groups of music listeners, new songs became popular just based on which ones happened to become popular earlier.
I mean, try this trick on yourself: find a random song and listen to it 20 times. Odds are you'll like it a lot better after the first several listens than the first time.
If in addition to that all your friends happened to know and like the song, and they started advertising the artists heavily, bam! It's a hit. It's just groupthink, and we're all susceptible to it.
If that's the end result, I think we need to bring back payola.
I have no problem with Spotify's algorithms. They do a very good job with a very difficult problem.
As an aside, and speaking of the nonmusical elements that make a song popular, three other things do it for me. Two are when I heard that particular song (huge fan of Garth Brooks' "Callin' Baton Rouge" because it was what my host parents played during a fantastic summer I spent on exchange) and the CD cover or other imagery associated with the artist (it actually makes me sad to delete humdrum songs from my library if they have fantastic album covers). The third is the subsection of songs that I grew up with, which is different from the summer exchange situation above because these are emotionally tied to a time in my life when everything was fresh and exciting; not just songs but entire albums running as the theme through the background of my youth. Objectively speaking, there's nothing too special about Oasis or Weezer or ATB or Robert Miles or Darude or Aqua except that they found me at a particularly impressionable time in my life. Simon and Garfunkel are in this gathering as well, but it's just sheer luck that my friends were turning hipster and so I was exposed. They were actually what caused me to spring off from the popular tunes of the day into more timeless classics : )
> In order to balance our data, we randomly sampled 12,000 non-hits from the Spotify data and created a new dataset.
> This dataset contained approximately 12k non-hits and 12k hits (∼24k tracks total).
Won't an indiscriminate random sampling of the non-hits introduce a temporal sampling bias, since (a) music trends change over time and (b) music output is not equal across years?
How do you account for the time-linearity of human judgment: we judge new songs against, and tend to like, things we've already heard, and we obviously can't have heard anything from the future?
Do we trust that SVM/NN account for it somehow?
Is there a way to limit the corpora, or have a point-in-time training set, i.e. train only against inputs that precede a given song?
Of course this could just be a fine-tuning knob against the other temporal style/fashion trends you would expect in the dataset.
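The point-in-time idea can be sketched simply: for each evaluation track, train only on tracks released strictly before it. A toy version, with a `released` field assumed (ties on release date would need extra care):

```python
def point_in_time_folds(tracks):
    """Yield (train, test) pairs where every training track comes
    earlier in release order than the test track -- no peeking ahead."""
    ordered = sorted(tracks, key=lambda t: t["released"])
    for i, test in enumerate(ordered):
        train = ordered[:i]  # earlier releases only
        if train:            # skip tracks with no history to train on
            yield train, test

tracks = [{"id": "a", "released": 2001},
          {"id": "b", "released": 1999},
          {"id": "c", "released": 2005}]
for train, test in point_in_time_folds(tracks):
    print(test["id"], [t["id"] for t in train])
# a ['b']
# c ['b', 'a']
```

This is just the usual walk-forward / temporal split used in time-series evaluation, applied to song releases instead of timestamps.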
Hey baby you \ Summer sun beatin' down \ I texted my ex the other night \ Going down to the bar to see if I can hook up
<1 second burst of static>
I like the way you shake it \ (Shake it, shake it, shake it) \ Got dolla dolla bills in area codes \ Yeah
<Sirens and machine gun noises>