Building a ‘People Who Like This Also Like’ Feature (benfrederickson.com)
108 points by benfrederickson 729 days ago | 17 comments

The book 'Programming Collective Intelligence' covers this topic and related ones really well.

I really like articles that use real data and let me play with it too. Nice.


If you really want to play around with it, all the code is on my GitHub: https://github.com/benfred/bens-blog-code/tree/master/distan... =)

My bias is towards matrix completion by rank minimization, but that's just, like, my opinion: http://coral.ie.lehigh.edu/~katyas/wp-content/OPTML/Lecture1...

I like matrix factorization models too!

Most of the Netflix Prize-style models will have problems with this sort of data though. The problem here is that you only have positive data, and no negative data: you can assume that artists the user listened to are positive, but there are no explicit negative votes for artists the user didn’t like. This means that there is no gradient for Funk-style (http://sifter.org/~simon/journal/20061211.html) SGD solvers to descend.

Better matrix factorization models would be something like the weighted ALS (http://labs.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf) or BPR (http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle_et_al2009...) approaches, which can handle positive-only data like this directly.
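
To make the weighted ALS idea concrete, here is a minimal sketch of how the Hu, Koren and Volinsky formulation treats positive-only data: every user/artist cell gets a binary preference plus a confidence weight, so unplayed artists count as weak negatives instead of being ignored. The play counts and the alpha value are made up for illustration, and this only shows the loss being weighted, not a full ALS solver.

```python
# Hypothetical play counts: rows are users, columns are artists.
play_counts = [
    [12, 0, 3],
    [0, 5, 0],
]

ALPHA = 40.0  # confidence scaling; an assumed value, tune it per dataset


def preference_and_confidence(counts, alpha=ALPHA):
    """Turn raw play counts into binary preferences and confidence weights.

    preference p_ui = 1 if the user played the artist at all, else 0
    confidence c_ui = 1 + alpha * r_ui (so unplayed cells still get weight 1)
    """
    preference = [[1 if r > 0 else 0 for r in row] for row in counts]
    confidence = [[1.0 + alpha * r for r in row] for row in counts]
    return preference, confidence


def weighted_squared_error(preference, confidence, predicted):
    """Sum of c_ui * (p_ui - x_ui)^2 over every cell, zeros included."""
    total = 0.0
    for p_row, c_row, x_row in zip(preference, confidence, predicted):
        for p, c, x in zip(p_row, c_row, x_row):
            total += c * (p - x) ** 2
    return total
```

Because the zero cells contribute to the loss with low confidence, there is always something to fit, which is what the plain rating-prediction setup lacks here.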

Having said all that - this post isn’t about CF. It's just trying to explain some basic concepts in information retrieval using finding similar artists as a problem.

I wonder if the premises are correct here. For me at least, if I am listening to, say, Kraftwerk, a good suggestion for next artist to listen to is not necessarily the artists most people who like Kraftwerk like. Kraftwerk is very old techno music. It is very influential. It tastes strongly of Electro music. It is also arranged like classical music (the guys were musically educated).

I could like it because I like to listen to precursors, and then a precursor for Metal like Napalm Death could be a good suggestion.

I could like it because I like Electro, and then a modern but less known Electro track could be a fit.

I may like classical music done with electronic instruments, and then the next track could be contemporary classical music.

I have no idea how to solve this. But I have always been disappointed by most suggestions I got from mixcloud or last.fm, because I either already know the proposed track or it does not fit my taste. The only good suggestions I ever got from outside are from friends who share my taste and explored different areas (e.g. I got into Raggamuffin from a Techno-friend).

So maybe another suggestion engine could work like this:

Find a couple of other users who have similar tastes. Propose artists they like but I may not know of yet.
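
A rough sketch of that neighbour-based idea, assuming each user is just a set of liked artists and using Jaccard overlap as the similarity between users. The user names, artists and the choice of k are all made up for illustration.

```python
from collections import Counter

# Hypothetical data: user -> set of artists they like.
likes = {
    "me": {"Kraftwerk", "Aphex Twin", "Autechre"},
    "ann": {"Kraftwerk", "Aphex Twin", "Boards of Canada"},
    "bob": {"Kraftwerk", "Autechre", "Aphex Twin", "Burial"},
    "cat": {"Oasis", "Blur"},
}


def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, k=2):
    """Suggest artists liked by the k most similar users but not by `user`."""
    mine = likes[user]
    neighbours = sorted(
        ((jaccard(mine, theirs), other)
         for other, theirs in likes.items() if other != user),
        reverse=True,
    )[:k]
    scores = Counter()
    for similarity, other in neighbours:
        if similarity == 0:
            continue  # ignore users with nothing in common
        for artist in likes[other] - mine:
            scores[artist] += similarity
    return [artist for artist, _ in scores.most_common()]


print(recommend("me", likes))
```

A "Hate it" signal could be folded in by removing disliked artists from the candidates, but that needs data this sketch does not have.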

Also, I am baffled that there is usually no "Hate it" button. No recommendation will ever work if the engine does not know that I hate Brit pop, guitar riffs in rock, female shouting (I can't even tolerate Bjork, sadly). In music, movies, art, food, we are much better defined by what we dislike (which is usually well-defined and stable) than what we like (which is open and changing), aren't we?

[edit: clarified]

> For me at least, if I am listening to, say, Kraftwerk, a good suggestion for next artist to listen to is not necessarily the artists most people who like Kraftwerk like.

There are several separate things here:

The article, as far as I can tell, looks at "if you like X, you're more likely to like Y".

You're describing "if you enjoyed listening to X right now, you're likely to enjoy listening to Y right now" and "if you like X and Y, you're like user B, who also likes Z".

These are all wildly different, and are useful for different purposes. The first is great if you don't have more data on the user than what they picked right now. The other two are better for different applications if you have the data.

One pet peeve of mine is that most music players seem to only take into account user similarity, rather than what you're likely to like right now. E.g. I rarely want to suddenly transition from a slow classical track to a super-noisy 8-bit war game chip tune even if I like both. And some transitions are great at some times of day, but not so great at others, e.g. when I want a specific type of music because it works best for me when I'm working.

It greatly annoys me when I use one of these services and they make transitions that are "obviously" wrong. If I've skipped the last 5 noisy tracks and listened to every slow track, clearly I want slow music right now even though I've previously loved all those noisy tracks...

It must be possible to do so much better... You'd think even just mixing the other recommendation data with some simple Markov chains would help.
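
For what it's worth, here is a minimal sketch of what such a blend could look like, assuming you have past listening sessions as ordered track lists plus some static similarity score per candidate. The track names, scores and the 50/50 weight are invented for illustration.

```python
from collections import defaultdict


def transition_probabilities(sessions):
    """First-order Markov chain: P(next track | current track) from sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {t: n / total for t, n in followers.items()}
    return probabilities


def next_track(current, candidates, similarity, transitions, weight=0.5):
    """Pick the candidate with the best blend of similarity and transition odds."""
    followers = transitions.get(current, {})

    def score(track):
        return ((1 - weight) * similarity.get(track, 0.0)
                + weight * followers.get(track, 0.0))

    return max(candidates, key=score)


# Hypothetical sessions and similarity scores.
sessions = [
    ["slow_a", "slow_b", "slow_c"],
    ["slow_a", "slow_b", "noisy_x"],
    ["noisy_x", "noisy_y"],
]
similarity = {"noisy_y": 0.9, "slow_b": 0.6}
transitions = transition_probabilities(sessions)
print(next_track("slow_a", ["noisy_y", "slow_b"], similarity, transitions))
```

With the weight at 0 this falls back to pure similarity and picks the noisy track; with the transition term mixed in, the slow track wins because that is what actually followed in past sessions.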

Is this not called collaborative filtering any more?

Item-item collaborative filtering could use these distance metrics when doing a nearest neighbours recommendation, but on its own this is not really CF. CF would provide results that are personalized to each user, while this just brings up a static list of results for everyone.
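
A small sketch of that difference, assuming each artist is represented by the set of users who listened to it and using Jaccard distance purely as a stand-in metric. The data and function names are hypothetical.

```python
# Hypothetical data: artist -> set of users who listened to it.
listeners = {
    "Kraftwerk": {"u1", "u2", "u3"},
    "Aphex Twin": {"u1", "u2"},
    "Autechre": {"u3"},
    "Oasis": {"u4"},
    "Blur": {"u4", "u5"},
}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0 means identical audiences, 1 means disjoint."""
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 0.0)


def similar_artists(artist, listeners):
    """Static list: the same ranking for everyone who looks at `artist`."""
    others = [a for a in listeners if a != artist]
    return sorted(
        others, key=lambda a: jaccard_distance(listeners[artist], listeners[a])
    )


def personalised(user_artists, listeners):
    """Item-item CF: score unseen artists by similarity to the user's own."""
    scores = {}
    for candidate in listeners:
        if candidate in user_artists:
            continue
        score = sum(
            1.0 - jaccard_distance(listeners[candidate], listeners[mine])
            for mine in user_artists
        )
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)
```

`similar_artists` depends only on the artist being viewed, while `personalised` changes with each user's history: here a user who likes Autechre gets Kraftwerk, and one who likes Oasis gets Blur.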

There's a good breakdown of the algo used at Amazon here. Useful for anyone looking at decent sized data sets:


+1 for the Bieber reference, and chapeau for the "this only scratches the surface" when it's actually quite fleshed out.

It's an interesting technical challenge, but as a user I find this "People Who Like This Also Like" feature totally useless, especially when items of interest are generally less popular.

Well written. The topics are explained carefully and the interactive examples are very nice. Thank you Ben!

How did you embed the examples into the post?


The examples were done with d3.js; the code for all the graphs is here: https://github.com/benfred/bens-blog-code/tree/master/distan...

I'm using the Python code to generate a series of JSON files, one for each artist. I stored them all in S3, and then load them via ajax calls. It's not an elegant solution, but it lets me keep my website statically generated.
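
For anyone wanting to copy the approach, a minimal sketch of the "one JSON file per artist" step might look like the following. This is not the actual code from the repo: the function name, file naming scheme and JSON layout are assumptions, and uploading the directory to S3 is left out.

```python
import json
from pathlib import Path


def write_artist_files(similar, out_dir):
    """Write one JSON file per artist and return the paths written.

    `similar` maps an artist name to a list of [other_artist, score] pairs.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for artist, neighbours in similar.items():
        # Simple filename slug; two artists could collide under this scheme.
        slug = "".join(c if c.isalnum() else "-" for c in artist.lower())
        path = out_dir / f"{slug}.json"
        payload = {"artist": artist, "similar": neighbours}
        path.write_text(json.dumps(payload), encoding="utf-8")
        written.append(path)
    return written
```

The page can then fetch the file for whichever artist is selected, so nothing has to run server-side.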

This seems like it would be a simple graph query.

Yup. There is a very approachable blog post on this subject by Nicole White => http://gist.neo4j.org/?8173017
