
Building a ‘People Who Like This Also Like’ Feature - benfrederickson
http://www.benfrederickson.com/distance-metrics/?hn=1
======
Joeri
The book 'Programming Collective Intelligence' covers this topic and related
ones really well.

------
degenerate
I really like articles that use real data and let me play with it too. Nice.

~~~
benfrederickson
Thanks!

If you really want to play around with it, all the code is on my github
[https://github.com/benfred/bens-blog-code/tree/master/distan...](https://github.com/benfred/bens-blog-code/tree/master/distance-metrics) =)

------
TTPrograms
My bias is towards matrix completion by rank minimization, but that's just,
like, my opinion:
[http://coral.ie.lehigh.edu/~katyas/wp-content/OPTML/Lecture1...](http://coral.ie.lehigh.edu/~katyas/wp-content/OPTML/Lecture19.pdf)

~~~
benfrederickson
I like matrix factorization models too!

Most of the Netflix Prize style models will have problems with this sort of
data though. The problem here is that you only have positive data, and no
negative data: you can assume that artists the user listened to are positive,
but there are no explicit negative votes for artists the user didn’t like. This
means that there is no gradient for Funk-style
([http://sifter.org/~simon/journal/20061211.html](http://sifter.org/~simon/journal/20061211.html))
SGD solvers to descend.

Better matrix factorization models would be something like the weighted ALS
([http://labs.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf](http://labs.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf))
or BPR
([http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle_et_al2009...](http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle_et_al2009-Bayesian_Personalized_Ranking.pdf))
approaches which can handle this sort of thing directly.
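
To make the "positive data only" point concrete, here is a minimal sketch of
the confidence-weighted objective that the weighted ALS approach minimizes: play
counts become a binary preference plus a confidence weight, and unplayed pairs
count as weak negatives rather than missing values. The values of `ALPHA` and
`REG`, the toy play counts, and all function names are assumptions made for
illustration. This only evaluates the loss and is not an ALS solver.

```python
# Sketch of an implicit-feedback objective in the style of weighted ALS.
# ALPHA, REG and the toy play counts are illustrative assumptions.

ALPHA = 40.0  # confidence scaling (assumed value)
REG = 0.1     # L2 regularization strength (assumed value)


def preference(plays):
    """Binary preference: 1 if the user played the artist at all, else 0."""
    return 1.0 if plays > 0 else 0.0


def confidence(plays, alpha=ALPHA):
    """Confidence grows with play count; unplayed pairs keep a baseline of 1."""
    return 1.0 + alpha * plays


def dot(a, b):
    """Plain dot product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_loss(plays, user_factors, item_factors, alpha=ALPHA, reg=REG):
    """Confidence-weighted squared error over every user/artist pair."""
    loss = 0.0
    for u, row in enumerate(plays):
        for i, r in enumerate(row):
            err = preference(r) - dot(user_factors[u], item_factors[i])
            loss += confidence(r, alpha) * err * err
    loss += reg * sum(dot(f, f) for f in user_factors)
    loss += reg * sum(dot(f, f) for f in item_factors)
    return loss


# Toy data: 2 users x 3 artists, entries are play counts (0 = never played).
plays = [[5, 0, 1],
         [0, 2, 0]]
user_factors = [[1.0, 0.0], [0.0, 1.0]]
item_factors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(weighted_loss(plays, user_factors, item_factors))
```

Because the sum runs over every pair, including the zeros, the model gets a
signal for artists the user never played, which is what the explicit-rating
SGD setup lacks.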

Having said all that - this post isn’t about CF. It’s just trying to explain
some basic concepts in information retrieval, using the task of finding similar
artists as the example problem.

------
gbog
I wonder if the premises are correct here. For me at least, if I am listening
to, say, Kraftwerk, a good suggestion for next artist to listen to is not
necessarily the artists most people who like Kraftwerk like. Kraftwerk is very
old techno music. It is very influential. It tastes strongly like Electro
music. It is also arranged like classical music (the guys were musically
educated).

I could like it because I like to listen to precursors, and then a precursor
for Metal like Napalm Death could be a good suggestion.

I could like it because I like Electro, and then a modern but less known
Electro track could be a fit.

I may like classical music done with electronic instruments, and then the next
track could be contemporary classical music.

I have no idea how to solve this. But I have always been disappointed by most
suggestions I got from mixcloud or last.fm, because I either already know the
proposed track or it does not fit my taste. The only good suggestions I ever got
from outside are from friends that share my taste and explored different areas
(e.g. I got into Raggamuffin from a Techno-friend)

So maybe another suggestion engine could work like this:

Find a couple of other users who have similar tastes. Propose artists they
like but I may not know of yet.
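
A minimal sketch of that idea: score other users by how much their liked
artists overlap with mine, then propose what the closest ones like that I
don't have yet. Jaccard similarity, the neighbourhood size `k=2`, and the
users and artists below are all assumptions chosen for illustration.

```python
# Sketch of "find users with similar taste, propose what they like".
# The data, the Jaccard metric and k=2 are illustrative assumptions.

def jaccard(a, b):
    """Overlap of two sets of liked artists, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(target, likes, k=2):
    """Rank artists liked by the k most similar users but unknown to target."""
    mine = likes[target]
    neighbours = sorted(
        ((jaccard(mine, theirs), user)
         for user, theirs in likes.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        for artist in likes[user] - mine:
            scores[artist] = scores.get(artist, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda a: (-scores[a], a))


likes = {
    "me":    {"Kraftwerk", "Aphex Twin", "Orbital"},
    "alice": {"Kraftwerk", "Aphex Twin", "Autechre"},
    "bob":   {"Kraftwerk", "Orbital", "Aphex Twin", "Boards of Canada"},
    "carol": {"Oasis", "Blur"},
}
print(suggest("me", likes))
```

With this toy data, "carol" shares nothing with "me" and falls outside the
neighbourhood, so her Brit pop never shows up in the suggestions.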

Also, I am baffled that there is usually no "Hate it" button. No
recommendation will ever work if the engine does not know that I hate Brit pop,
guitar riffs in rock, female shouting (I can't even tolerate Bjork, sadly). In
music, movies, art, food, we are much better defined by what we dislike (which
is usually well-defined and stable) than what we like (which is open and
changing), aren't we?

[edit: clarified]

~~~
vidarh
> For me at least, if I am listening to, say, Kraftwerk, a good suggestion for
> next artist to listen to is not necessarily the artists most people who like
> Kraftwerk like.

There are several separate things here:

The article, as far as I can tell, looks at "if you like X, you're more likely
to like Y".

You're describing "if you enjoyed listening to X _right now_, you're likely
to enjoy listening to Y _right now_" and "if you like X and Y, you're like
user B, who also likes Z".

These are all wildly different, and are useful for different purposes. The
first is great if you don't have more data on the user than what they picked
right now. The other two are better for different applications if you have the
data.

One pet peeve of mine is that most music players seem to only take into
account user similarity, rather than what you're likely to like _right now_.
E.g. I rarely want to suddenly transition e.g. from a slow classical track to
a super-noisy 8-bit war game chip tune even if I like both. And some
transitions are great at some times of day, but not so great e.g. when I want
a specific type of music because it works best for me when I'm working.

It greatly annoys me when I use one of these services and they make transitions
that are "obviously" wrong. If I've skipped the last 5 noisy tracks and
listened to every slow track, clearly I want slow music _right now_ even
though I've previously loved all those noisy tracks...

It must be possible to do so much better... You'd think that even just mixing
some simple Markov chains in with the other recommendation data would help.
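
One way the Markov chain idea could look, as a rough sketch: count which track
followed which in past listening sessions, then add the transition probability
from the track just played to whatever base score the recommender already
produced. The session data, the base scores, and the `weight` parameter are
made-up assumptions, not anything a real service is known to do.

```python
from collections import Counter, defaultdict


def transition_counts(sessions):
    """Count how often each track was directly followed by each other track."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def rerank(candidates, last_played, counts, weight=1.0):
    """Add weight * P(candidate | last_played) to each base score, then sort."""
    followers = counts.get(last_played, Counter())
    total = sum(followers.values())

    def score(item):
        name, base = item
        prob = followers[name] / total if total else 0.0
        return base + weight * prob

    return [name for name, _ in sorted(candidates, key=score, reverse=True)]


# Hypothetical listening sessions and base recommendation scores.
sessions = [
    ["slow_a", "slow_b", "slow_c"],
    ["slow_a", "slow_b", "noisy_x"],
    ["noisy_x", "noisy_y"],
]
counts = transition_counts(sessions)
candidates = [("noisy_x", 0.9), ("slow_b", 0.6)]
print(rerank(candidates, "slow_a", counts))
print(rerank(candidates, "never_played", counts))
```

After "slow_a" the slow track jumps ahead of the noisy one despite its lower
base score; with no transition history the base ranking is left unchanged.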

------
chiph
Is this not called collaborative filtering any more?

~~~
benfrederickson
Item-Item collaborative filtering could use these distance metrics when doing
a nearest neighbours recommendation - but on its own it's not really CF. CF would
provide results that are personalized to each user, while this just brings up
a static list of results for everyone.
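
As a rough illustration of the "static list" point, the sketch below ranks
artists by the cosine similarity of their listener sets. The result depends
only on the artist queried, not on who is asking. Cosine is used here as one
possible metric, and the listener data and function names are assumptions.

```python
from math import sqrt


def cosine(users_a, users_b):
    """Cosine similarity for binary data: shared listeners over the
    geometric mean of the two audience sizes."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def similar_artists(artist, listeners, n=3):
    """Return up to n nearest artists; the same list for every user."""
    base = listeners[artist]
    scored = []
    for other, users in listeners.items():
        if other == artist:
            continue
        sim = cosine(base, users)
        if sim > 0:
            scored.append((sim, other))
    # Most similar first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:n]]


# Hypothetical artist -> set of listener ids.
listeners = {
    "Kraftwerk": {"u1", "u2", "u3"},
    "Orbital": {"u1", "u2"},
    "Autechre": {"u3"},
    "Oasis": {"u4"},
}
print(similar_artists("Kraftwerk", listeners))
```

A personalized recommender would then combine such neighbour lists with one
user's own history, which is the step this sketch leaves out.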

------
squiggy22
There's a good breakdown of the algo used at Amazon here. Useful for anyone
looking at decent sized data sets:

[http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf](http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf)

------
thecopy
Related:
[http://en.wikipedia.org/wiki/Multi-armed_bandit](http://en.wikipedia.org/wiki/Multi-armed_bandit)

------
juanuys
+1 for the Bieber reference, and chapeau for the "this only scratches the
surface" when it's actually quite fleshed out.

------
kbart
It's an interesting technical challenge, but as a user I find this "People Who
Like This Also Like" feature totally useless, especially when items of
interest are generally less popular.

------
nathanwdavis
Well written. The topics are explained carefully and the interactive examples
are very nice. Thank you Ben!

How did you embed the examples into the post?

~~~
benfrederickson
Thanks!

The examples were done with d3.js; the code for all the graphs is here:
[https://github.com/benfred/bens-blog-code/tree/master/distan...](https://github.com/benfred/bens-blog-code/tree/master/distance-metrics/js/src)

I'm using the Python code to generate a series of JSON files, one for each
artist. I stored them all in S3, and then load them via ajax calls. It's not an
elegant solution, but it lets me keep my website statically generated.
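
The generation step could look roughly like the sketch below, which writes one
JSON file per artist into a directory. The file naming scheme, the JSON layout,
and the `write_artist_files` helper are hypothetical and not taken from the
linked repository; uploading to S3 and the ajax loading are left out.

```python
import json
import os
import tempfile


def write_artist_files(similar, out_dir):
    """Write one JSON file per artist and return the paths in sorted order."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, (artist, neighbours) in enumerate(sorted(similar.items())):
        # Hypothetical naming: a numeric id avoids awkward artist names in URLs.
        path = os.path.join(out_dir, f"{index}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"artist": artist, "similar": neighbours}, f)
        paths.append(path)
    return paths


# Made-up precomputed results: artist -> list of similar artists.
similar = {"Kraftwerk": ["Orbital", "Autechre"], "Orbital": ["Kraftwerk"]}
with tempfile.TemporaryDirectory() as out_dir:
    paths = write_artist_files(similar, out_dir)
    with open(paths[0], encoding="utf-8") as f:
        first = json.load(f)
print(first)
```

Since every file is computed ahead of time, the page only ever fetches static
files, which fits a statically generated site.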

------
nolite
This seems like it would be a simple graph query...

~~~
maxdemarzi
Yup. There is a very approachable blog post on this subject by Nicole White =>
[http://gist.neo4j.org/?8173017](http://gist.neo4j.org/?8173017)

