
Building a Music Recommender with Deep Learning - myautsai
http://mattmurray.net/building-a-music-recommender-with-deep-learning/
======
dnadler
Very cool! One minor nitpick -- the author mentions that this is 'completely
unsupervised'. It's true that the author didn't need to manually classify the
data, but someone did.

So, I believe that this is actually supervised learning, as the author is
training a classifier on preexisting labels (the genres).

I believe that unsupervised learning would not make use of a target variable
at all. If the network architecture terminated at the fully connected layer,
and then propagated that layer backwards to reconstruct the input (something
like Contrastive Divergence), that would be an unsupervised method.
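
As a toy illustration of the difference (all data made up; PCA via SVD
standing in for the reconstruction-style methods mentioned above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                    # 100 snippets, 8 features
y = rng.integers(0, 2, size=100).astype(float)   # genre labels

# Supervised: the fit is driven by the target y
# (here, a simple least-squares fit to the labels).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Unsupervised: only X is used. PCA via SVD learns a 2-D embedding
# by best reconstructing the input -- no labels involved.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
embedding = Xc @ Vt[:2].T                        # shape (100, 2)
```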

~~~
rahimnathwani
You're correct of course. But it's cool that you can learn a useful
embedding (in this case into a 128-dimensional space) with relatively few
(in this case 9) class labels.

~~~
mkorfmann
I'd love to see an analysis of what exactly these embeddings represent
concerning the musical features of a sound snippet.

------
oscii
In my opinion, the results are not quite as exciting as they might seem at
first glance. The hip-hop and minimal house classifications perform almost
randomly (a random classifier would have an accuracy of 50%). The claim of
music genre subjectivity is not really appropriate for the categories used
in this work: the presented genres are quite distinct, and they have
objective differences. Knowing only the BPM and rhythm structure of the
tracks would be sufficient to classify most of the mentioned genres. The
article also lacks critical analysis of the results. The network may not
have learned to analyze structural properties of the music; if so, then
what is it classifying exactly? An averaged spectral envelope or spectral
distribution? In that case the network would fail if you fed it a filtered
music piece. There is a nice paper on issues like these called “A Simple
Method to Determine if a Music Information Retrieval System is a Horse”;
you may want to check it out:
[https://www.researchgate.net/publication/265645782](https://www.researchgate.net/publication/265645782)

I understand this is an educational project, but nevertheless it's
published, hence open to criticism ;)

Edit: small style corrections.

~~~
mattmcknight
"The hip-hop and minimal house classification perform almost randomly (the
random classifier would have accuracy of 50%). " You are assuming that this is
a series of binary classifiers. It is multiclass classification, so the base
rate for nine classes is 11%.
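
For the record, the arithmetic for a balanced nine-class problem:

```python
n_classes = 9
base_rate = 1 / n_classes        # random-guess accuracy, about 11.1%

# The article's 61% on hip hop / R&B, relative to chance:
improvement = 0.61 / base_rate   # about 5.5x, i.e. "almost 6 times better"
```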

~~~
1r3n1cus
If the classes are balanced, that is. Without knowing the distribution of the
classes it is difficult to understand if the result is good or not.

~~~
redsparrow
The author downloaded 1000 tracks from each class, so they are evenly
distributed.

------
amelius
> It did a really good job classifying trance music while at the other end of
> the scale was hip hop / R&B with 61%, which is still almost 6 times better
> than randomly assigning a genre to the image. I suspect that there’s some
> crossover between hip hop, breakbeat and dancehall and that might have
> resulted in a lower classification accuracy.

The first step to analyze this is to make a confusion matrix, [1]. It would be
nice if the article included it.

[1] [https://en.wikipedia.org/wiki/Confusion_matrix](https://en.wikipedia.org/wiki/Confusion_matrix)
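
For anyone following along, a confusion matrix is a few lines of NumPy
given predictions (the toy labels below are made up):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example: 0 = hip hop, 1 = breakbeat, 2 = trance (labels invented)
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
# The off-diagonal cells show exactly which genres get confused: here
# hip hop is mostly mistaken for breakbeat, while trance is never missed.
```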

------
madmax108
This is interesting, but fairly easy to confuse. In particular, it would
be interesting to see what results come up when you use modified
"artistic" spectrograms like that of Windowlicker by Aphex Twin [1]. One
thing I've learned from years of working with audio and images is that
image representations of audio are horrible representations of it (other
than for temporal changes).

The results are good though! Good work! :D

[1] [http://twistedsifter.com/2013/01/hidden-images-embedded-into...](http://twistedsifter.com/2013/01/hidden-images-embedded-into-songs-spectrographs/)

~~~
cubano
> ...image representations of audio are horrible representations of it (other
> than for temporal changes).

Yes, and that's why the classifier was so good at recognizing trance...
it's one of the few genres that locks in at around 144bpm.

~~~
Drdrdrq
What would be a better representation?

------
matchagaucho
The greatest value of a music recommendation engine, IMO, is cross-genre
discovery.

The history of recording industry "Genres" has close ties to cultural
segregation. Pandora's Music Genome approach is optimized to break the genre
barrier.

It'd be interesting to see how many "Down tempo" songs shared
characteristics with "R&B", for example. I think the author's approach
could still be applied.

------
mattmurray
Wow, thanks for sharing + reading my blog post! I did this as my final
project for the Data Science bootcamp at Metis [1] this spring.

[1] [https://www.thisismetis.com/](https://www.thisismetis.com/)

~~~
strgrd
This is a really cool project. The hardest part of DJing is knowing which set
of songs have similar sonic profiles, and would mix well together. I would
love to see this put to use in personal music collections, or in a Traktor
playlist, and be able to sort songs by their similarity.

------
amelius
> Wouldn’t it be cool if you could discover music that was released a few
> years ago that sounds similar to a new song that you like?

Perhaps. But of course, this is likely to put the user literally into an "echo
chamber" :)

~~~
JKCalhoun
Yeah, sorry, old-timer here. I loathe "genres" and strip them off my purchased
music.

Is R.E.M. "Alternative", "Rock", "College"? Maybe you consider an album like
"Reckoning" from R.E.M. "Rock" but then it includes a track like "Rockville"
that is perhaps "Country"?

Genre makes sense for "Soundtrack" or perhaps "Classical"? But beyond that
it's just mental gymnastics.

And given how fondness for music is qualitative, I've always been suspect of
any sort of algorithm that tries to recommend music based on fast-Fourier-
transforms. Maybe AI isn't for everything....

------
StreamBright
Site is overloaded, cached version

[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://mattmurray.net/building-a-music-recommender-with-deep-learning/)

------
ollin
Highly recommended further reading:

[http://benanne.github.io/2014/08/05/spotify-cnns.html](http://benanne.github.io/2014/08/05/spotify-cnns.html)
(Recommending music on Spotify with deep learning) uses CNNs trained on
spectrograms + similarity data from collaborative filtering to predict
per-song vectors.

------
CuriouslyC
Interesting. You didn't specify, but I'm guessing you did 3x3 convolutions
on the spectrograms? Also, how did you choose the convolution size, number
of conv/pooling layers, etc.? Did you consider asymmetric
convolution/pooling layers to account for the differences between the
frequency and time dimensions?

There are a number of interesting directions you could go with that data
set. One interesting possibility is to make a convolutional autoencoder,
then use that to apply "deep dreaming" filters to music. Another
interesting evolution would be to handle the frequency dimension using a
1D convolution, and run an RNN on top of that to deal with time.

------
halflings
Very cool post! :) "Simple" method (good ol' spectrograms, and something
people can realistically actually reproduce without requiring a GPU farm), and
great results!

------
nl
This is interesting.

My first thought was to wonder how an LSTM would do. One might think it
would be a better representation for music? There are some models which
use convolutional layers along with an LSTM for video representation
(e.g. [1]) and it would be interesting to see if convolutions are useful
for capturing similar themes in music.

I wonder if one could build a music embedding (word2vec style) and use
similarities in the embedding space as recommendations? The obvious objective
function would be skip-gram, but there might be more interesting objectives
there too.

[1] [https://github.com/loliverhennigh/Convolutional-LSTM-in-Tens...](https://github.com/loliverhennigh/Convolutional-LSTM-in-Tensorflow)
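
A rough sketch of the skip-gram-style embedding idea, with playlists
standing in for sentences and a co-occurrence-count SVD standing in for a
trained skip-gram model (all data hypothetical):

```python
import numpy as np

# Playlists as "sentences"; songs that co-occur should get similar vectors.
playlists = [["a", "b", "c"], ["a", "b", "d"], ["c", "d", "e"]]
songs = sorted({s for p in playlists for s in p})
idx = {s: i for i, s in enumerate(songs)}

# Co-occurrence counts within a playlist (the "context window").
C = np.zeros((len(songs), len(songs)))
for p in playlists:
    for s in p:
        for t in p:
            if s != t:
                C[idx[s], idx[t]] += 1

# SVD of the count matrix gives low-dimensional song vectors.
U, S, _ = np.linalg.svd(C)
vecs = U[:, :2] * S[:2]

def most_similar(song):
    """Nearest neighbour by cosine similarity in the embedding space."""
    v = vecs[idx[song]]
    sims = vecs @ v / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(v) + 1e-9)
    sims[idx[song]] = -1.0   # exclude the query song itself
    return songs[int(np.argmax(sims))]
```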

~~~
Matumio
An architecture like WaveNet could also be interesting here:
[https://deepmind.com/blog/wavenet-generative-model-raw-audio...](https://deepmind.com/blog/wavenet-generative-model-raw-audio/)
(HN thread:
[https://news.ycombinator.com/item?id=12455510](https://news.ycombinator.com/item?id=12455510))

------
stcredzero
Music recommendation is a relatively easy problem on one level, and a huge
problem on another. If you are recommending music to a neophyte of a certain
genre, we've clearly been able to do this for a while in a way that has real
value. But if you're trying to recommend music for someone who is an
expert/aficionado of a certain genre, this inevitably annoys that sort of
person. For the 2nd type of recommendation, it's hard to provide results of
actual interest. Instead, you wind up getting recommendations for pale
imitations of things you like. The 2nd problem might require something close
to hard sentient AI to accomplish.

------
mordredklb
This is pretty cool. Maybe I'm missing something, but what's the point in the
initial genre training?

He's taking 185000 samples, and finding similar "looking" samples elsewhere in
other songs, and then making recommendations based on that. I don't see what
that could possibly have to do with genre labels, unless we're under the
assumption that finding a match between a Drum & Bass song and one that seems
similar with a tag of Trance is somehow a bad match? (which very well could be
the case, but seems like a big assumption to make off the bat)

Are these recommendations silo'd to the current genre or are they allowed to
span genres?

------
unityByFreedom
LeCun recently griped about this topic w/rt classical music on GooglePlay,

[https://www.facebook.com/photo.php?fbid=10154605399547143](https://www.facebook.com/photo.php?fbid=10154605399547143)

> Don't you guys realize that putting everything from Monteverdi to Bach,
> Mozart, Beethoven, Brahms, Moussorgsky, Stravinsky, and Bernstein in the
> same "Classical" bucket makes no sense?

> (Particularly when you have ultra fine-grained categories for popular
> music!)

Any comments about that?

------
make3
That's not how you build a recommendation engine... You build a
recommendation engine by creating an embedding for each song based on
which users prefer it, as you would for words in word2vec. This is how
Amazon and YouTube do it.

[https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf)

~~~
krosaen
Couldn't you view the output of the convnet's last layer as the embedding
in this case? Yes, this was a different approach from leveraging user
preferences, but I don't see why it is inherently the wrong approach.
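
E.g., treating the penultimate layer's activations as song vectors,
recommendation becomes nearest-neighbour search in that space (random
stand-in vectors and made-up titles below):

```python
import numpy as np

# Pretend each row is the 128-d last-hidden-layer activation for one song
# (random stand-ins here; in the post these would come from the convnet).
rng = np.random.default_rng(1)
song_vecs = rng.normal(size=(5, 128))
titles = ["song_a", "song_b", "song_c", "song_d", "song_e"]

def recommend(i, k=2):
    """Return the k songs closest to song i by cosine similarity."""
    v = song_vecs[i]
    sims = song_vecs @ v / (np.linalg.norm(song_vecs, axis=1)
                            * np.linalg.norm(v))
    order = np.argsort(-sims)
    return [titles[j] for j in order if j != i][:k]
```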

------
personjerry
My understanding of convolutions is that it's a way of extracting patterns
from images. To convert audio into an image and then create convolutions from
that seems... convoluted, if you will. I imagine a better way would be to
think of what the equivalent of a convolution would be in the audio space?
I.e. noise detection, treble/bass filters, etc.?

~~~
Matumio
Convolution is generic signal processing. It's quite common to use a
one-dimensional convolution for audio filters; it would work perfectly
fine as a bass filter, for example.

However, 2D conv+maxpool is an image processing technique that gets you
translation invariance. Fine for the time dimension of the spectrogram, but
rather dubious for the frequency axis. Surely you'd want to distinguish if
some feature happens at a high or low frequency?
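
You can see the effect with a toy spectrogram: pooling away an axis hides
position along that axis.

```python
import numpy as np

spec = np.zeros((8, 8))   # rows = frequency bins, cols = time frames
spec[2, 1] = 1.0          # one event at frequency bin 2, time frame 1

shift_t = np.roll(spec, 3, axis=1)   # same event, later in time
shift_f = np.roll(spec, 3, axis=0)   # same event, higher in frequency

def pool_time(s):
    return s.max(axis=1)   # max-pool across the whole time axis

# Pooling over time hides *when* the event happened,
# but still reveals *which frequency bin* it sat in.
same_after_time_shift = np.array_equal(pool_time(spec), pool_time(shift_t))
same_after_freq_shift = np.array_equal(pool_time(spec), pool_time(shift_f))
# same_after_time_shift is True; same_after_freq_shift is False
```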

~~~
ssalazar
> Fine for the time dimension of the spectrogram, but rather dubious for the
> frequency axis.

MFCCs[1] are exactly that, a type of convolution along the frequency axis of a
Fourier transform, and are highly apt features for music classification tasks.

It makes sense if you think of timbre as a time-varying relationship
between the harmonics of a single pitch; translation invariance along the
frequency axis can tell you that there are partials typical of, e.g., a
guitar or a flute, without caring what particular pitch those instruments
are playing. And timbre is a bigger source of variety in popular music
than, e.g., the particular notes used.

[1] [https://en.wikipedia.org/wiki/Mel-frequency_cepstrum](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)

------
Froyoh
I'm having issues scrolling on your site

~~~
muglug
The page uses
[https://github.com/galambalazs/smoothscroll-for-websites](https://github.com/galambalazs/smoothscroll-for-websites),
which is terrible IMO. It hijacks left/right swipe in Chrome, for no real
benefit.

------
visarga
This music CNN classifier could be used to match songs that mix (transition)
well together, having similar textures.

~~~
j_s
Within one song: _Infinite Gangnam Style_ |
[https://news.ycombinator.com/item?id=4709472](https://news.ycombinator.com/item?id=4709472)

Within one arbitrary song (Infinite Jukebox - no longer working?):
[http://labs.echonest.com/Uploader/index.html](http://labs.echonest.com/Uploader/index.html)

[https://www.reddit.com/r/infinitejukebox/comments/4cmr4f/met...](https://www.reddit.com/r/infinitejukebox/comments/4cmr4f/meta_on_october_1_you_will_no_longer_be_able_to/)

------
wellboy
Why try to do it via A.I.?

Why not check which are the top 3 most played songs among the 1000 users
most similar to the current user, and then recommend to the current user
the most played songs from those 1000 similar users that they have not
listened to yet?

As far as I can see this would be superior to any existing A.I.
recommendation algorithm.
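
A minimal sketch of that recipe (cosine similarity on play-count vectors;
tiny made-up data, and 1 neighbour standing in for the 1000):

```python
import numpy as np

# Rows = users, columns = songs, values = play counts (all made up).
plays = np.array([
    [5., 3., 0., 0.],   # the current user
    [4., 3., 1., 0.],   # a user with similar taste
    [0., 0., 7., 6.],   # a user with different taste
])

def recommend(user, k_users=1):
    v = plays[user]
    # Cosine similarity between this user and every other user.
    sims = plays @ v / (np.linalg.norm(plays, axis=1) * np.linalg.norm(v))
    sims[user] = -1.0                    # don't match the user to themselves
    neighbours = np.argsort(-sims)[:k_users]
    # Sum the neighbours' plays, masking songs the user already knows.
    scores = plays[neighbours].sum(axis=0) * (v == 0)
    return int(np.argmax(scores))

# User 0 gets song 2: played by the similar user, unheard by user 0.
```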

~~~
halflings
What you're describing is also an "A.I." It's called collaborative
filtering, and your algorithm (picking the top 3 from the 1000 most
similar users) would give results heavily biased towards popular songs;
there are better approaches in that field.

~~~
wellboy
Yes, all algorithms are A.I. in that case.

My 1-minute-effort description would be biased towards popular songs, but
you can easily change that by selecting songs that are not popular, but
that occupy a lot of playtime with a user.

------
johnlbevan2
Warning: this comment has little to do with the article, beyond being a rant
on the approach taken by all recommendation engines I've seen.

This is an interesting approach, but the objective is similar to that of most
recommendation engines: "Find me something similar to something I like".
Sometimes that's a good requirement (e.g. when trying to queue up the next
song in a playlist, it's good to have some similarity to the song you're
currently listening to). However, when trying to discover new music it's
generally a bad approach; since (depending how the requirement is tackled)
you'll get recommendations that tend towards some median; i.e.:

      - Other songs by the same artist
      - Songs by artists who have collaborated with the current artist
      - Popular songs (i.e. if almost everyone has a Beatles album in their
        playlist, getting "people who bought this also bought"
        recommendations for anything would list the Beatles, since
        technically that's true; it's just uninteresting)
      - Songs in the same genre
      - Songs with a similar sound / structure

i.e. it tends to list things you're likely to be aware of anyway. It also
means you'll get lots of songs with little variety between them, making
your playlists monotonous.

What I'd be really interested in seeing is an engine which finds things on
the periphery; i.e. one that figures out the things likely to appeal to
you because of the more unique things you're interested in, or the popular
things that you dislike. That way you're likely to get a more eclectic mix
of suggestions and broaden your musical awareness. This would likely
produce a lot more false positives initially, as it's expanding your taste
range rather than narrowing in on some "ideal" average, so it may stray
into unknowns; but once you've heard and rated something in this new area,
that data can quickly feed back into the algorithm, and thus you learn of
things you'd previously never have discovered.

~~~
flashman
> Popular songs (i.e. if almost everyone has a Beatles album in their
> playlist, getting "people who bought this also bought" recommendations
> for anything would list the Beatles

I've been learning recommendation engines by looking at peoples' Steam games
libraries.

One feature of the data set is that many, many people own multiple
versions of Counter-Strike as well as Team Fortress 2. So "a high number
of people who bought [almost any game] also bought Counter-Strike: Global
Offensive" is a recurring problem with a naive recommender.

What I've been learning how to do is weight recommendations by how
'surprising' they are, for want of a more accurate term. If 80% of people who
own Game A also own Game B, but only 5% of the _total_ population owns Game B,
then we should upweight that relationship.
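
That weighting is essentially what association-rule mining calls "lift",
P(B|A) / P(B). With the numbers above (the second pair is invented for
contrast):

```python
p_b_given_a = 0.80   # 80% of Game A owners also own Game B
p_b = 0.05           # ...but only 5% of all users own Game B

lift = p_b_given_a / p_b   # about 16: a genuinely surprising association

# Versus a ubiquitous title: almost everyone owns it anyway, so the
# conditional probability barely exceeds the base rate.
lift_cs = 0.90 / 0.85      # about 1.06: not surprising, downweight it
```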

~~~
SanderMak
I think 'serendipity' [1] is the most-used term in recommender systems to
describe what you mean.

[1]
[https://books.google.nl/books?id=_AfABAAAQBAJ&pg=PA258&lpg=P...](https://books.google.nl/books?id=_AfABAAAQBAJ&pg=PA258&lpg=PA258&dq=serendipity+machine+learning)

------
robzi
hugged?

------
peteretep
Tangential: has anyone found anything that doesn't completely suck for
recommending books? Goodreads' recommendations are terrible.

~~~
treehau5
Amazon tends to recommend things I like, and the majority of my purchases
have been because of their recommendations (good job Amazon, your software
is doing its job, raising sales). I think books and music are different
though. Books have well-defined categories. If I buy a pop-psych book, say
"Blink", and then I am recommended "Peak: Secrets from the New Science of
Expertise", it's pretty likely I'll buy that too if I am interested in the
subject.

If however I want recommendations for new metal music, and my previous
selection was Metallica, and you then play me some Megadeth, I am going to
hate it and not be interested in it at all!

------
bigtoine123
There is a simple music recommender webapp shown in the video. From your
model you got a Python function that maps one song (e.g. by artist and
title) to other songs. What is the fastest way to build this interactive
webapp (for internal, experimental use)?

~~~
saamm
For internal/experimental/exploratory use, I like Jupyter [0].

0: [https://jupyter.org/](https://jupyter.org/)

~~~
bigtoine123
I appreciate it, I'm gonna try it soon

