If you really want to play around with it, all the code is on my GitHub: https://github.com/benfred/bens-blog-code/tree/master/distan... =)
Most Netflix Prize-style models will have problems with this sort of data, though. The problem is that you only have positive data and no negative data: you can assume that artists the user listened to are positives, but there are no explicit negative votes for artists the user didn’t like. This means there is no gradient for Funk-style (http://sifter.org/~simon/journal/20061211.html) SGD solvers to descend.
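A minimal worked illustration of why that happens, assuming a plain Funk-style objective (squared error over the observed (user, artist) pairs only, with regularization and bias terms left out for simplicity; here K denotes the set of observed pairs). When every observed entry is a listen, so r_ui = 1, a degenerate factorization already reaches zero error and scores every artist identically:

```latex
\begin{aligned}
&\min_{P,Q} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - p_u^\top q_i \right)^2,
  && r_{ui} = 1 \;\; \forall (u,i) \in \mathcal{K} \\
&p_u = q_i = e_1 \;\; \forall u, i
  \;\Rightarrow\; p_u^\top q_i = 1 \;\; \forall (u,i)
  \;\Rightarrow\; \text{loss} = 0, \;\; \nabla_{P,Q}\,\text{loss} = 0
\end{aligned}
```

Nothing in that objective pushes unheard artists below heard ones; that is the gap the approaches below address, either by treating unobserved pairs as low-confidence negatives (weighted ALS) or by directly optimizing the ranking of heard over unheard artists (BPR).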
Better matrix factorization models would be something like weighted ALS (http://labs.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf) or BPR (http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle_et_al2009...), which can handle this sort of data directly.
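A minimal pure-Python sketch of the BPR idea: sample a user, an artist they listened to and one they have not, then nudge the latent factors so the heard artist scores higher. The names `train_bpr` and `auc`, the toy data and all hyperparameters are illustrative assumptions, not the paper's reference implementation or any library's API.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_bpr(likes, n_items, factors=8, lr=0.05, reg=0.01, samples=20000, seed=0):
    """SGD on the BPR objective: maximise ln sigmoid(x_ui - x_uj) for heard i, unheard j.

    likes: dict user -> set of item ids the user listened to (positive-only data).
    Unheard items are treated as "less preferred", not as explicit dislikes.
    """
    rng = random.Random(seed)
    P = {u: [rng.gauss(0, 0.1) for _ in range(factors)] for u in likes}
    Q = [[rng.gauss(0, 0.1) for _ in range(factors)] for _ in range(n_items)]
    # A user needs at least one heard and one unheard item to form a training triple.
    users = [u for u, items in likes.items() if 0 < len(items) < n_items]

    for _ in range(samples):
        u = rng.choice(users)
        i = rng.choice(sorted(likes[u]))  # an item the user listened to
        j = rng.randrange(n_items)        # a uniformly sampled item...
        while j in likes[u]:              # ...resampled until it is unheard
            j = rng.randrange(n_items)
        pu, qi, qj = P[u], Q[i], Q[j]
        x_uij = sum(p * (a - b) for p, a, b in zip(pu, qi, qj))
        g = sigmoid(-x_uij)  # derivative of ln sigmoid(x) evaluated at x_uij
        for f in range(factors):
            puf, qif, qjf = pu[f], qi[f], qj[f]
            pu[f] += lr * (g * (qif - qjf) - reg * puf)
            qi[f] += lr * (g * puf - reg * qif)
            qj[f] += lr * (-g * puf - reg * qjf)
    return P, Q


def score(P, Q, u, i):
    return sum(a * b for a, b in zip(P[u], Q[i]))


def auc(P, Q, likes, n_items):
    """Fraction of (heard, unheard) pairs ranked correctly, averaged over users."""
    per_user = []
    for u, pos in likes.items():
        neg = [j for j in range(n_items) if j not in pos]
        if not pos or not neg:
            continue
        wins = sum(score(P, Q, u, i) > score(P, Q, u, j) for i in pos for j in neg)
        per_user.append(wins / (len(pos) * len(neg)))
    return sum(per_user) / len(per_user)


# Toy data: users 0-1 listen to artists 0-1, users 2-3 listen to artists 2-3.
likes = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
P, Q = train_bpr(likes, n_items=4)
print(round(auc(P, Q, likes, n_items=4), 3))
```

Pure Python was chosen here only to keep the sketch dependency-free; weighted ALS would instead solve a confidence-weighted least-squares problem over all user/artist pairs, which is more naturally written with a linear algebra library.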
Having said all that, this post isn’t about CF. It’s just trying to explain some basic concepts in information retrieval, using finding similar artists as the example problem.
I could like it because I like to listen to precursors, and then a precursor to Metal like Napalm Death could be a good suggestion.
I could like it because I like Electro, and then a modern but lesser-known Electro track could be a good fit.
I may like classical music played on electronic instruments, and then the next track could be contemporary classical music.
I have no idea how to solve this. But I have always been disappointed by most suggestions I got from Mixcloud or Last.fm, because either I already know the proposed track or it does not fit my taste. The only good suggestions I have ever gotten from outside came from friends who share my taste and have explored different areas (e.g. I got into Raggamuffin from a Techno friend).
So maybe another suggestion engine could work like this:
Find a couple of other users who have similar tastes. Propose artists they like but I may not know of yet.
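The two steps above describe classic user-based collaborative filtering. A minimal sketch, assuming listening data is available as a set of artist names per user and using cosine similarity between those sets; the function `recommend`, its `k`/`n` parameters and the listening data are all made up for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two users' artist sets, viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(target, listens, k=2, n=3):
    """Suggest artists heard by the k most similar users but not yet by target."""
    mine = listens[target]
    neighbours = sorted(
        ((cosine(mine, artists), user) for user, artists in listens.items() if user != target),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, user in neighbours:
        if sim <= 0:
            continue  # a user with nothing in common says nothing about my taste
        for artist in listens[user] - mine:
            scores[artist] += sim  # weight each vote by how similar that user is
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


# Hypothetical listening data.
listens = {
    "me": {"Aphex Twin", "Autechre", "Boards of Canada"},
    "friend": {"Aphex Twin", "Autechre", "Squarepusher"},
    "techno_friend": {"Autechre", "Boards of Canada", "Sizzla"},
    "stranger": {"Oasis", "Blur"},
}
print(recommend("me", listens))  # ['Sizzla', 'Squarepusher']
```

The "stranger" shares no artists with "me", so its artists never show up, which is roughly the "friends who share my taste" effect described above.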
Also, I am baffled that there is usually no "Hate it" button. No recommendation will ever work if the engine does not know that I hate Britpop, guitar riffs in rock, or female shouting (I can't even tolerate Björk, sadly). In music, movies, art and food, aren't we much better defined by what we dislike (which is usually well-defined and stable) than by what we like (which is open and changing)?
There are several separate things here:
The article, as far as I can tell, looks at "if you like X, you're more likely to like Y".
You're describing "if you enjoyed listening to X right now, you're likely to enjoy listening to Y right now" and "if you like X and Y, you're like user B, who also likes Z".
These are all wildly different, and are useful for different purposes. The first is great if you don't have more data on the user than what they picked right now. The other two are better for different applications if you have the data.
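The first case needs nothing but the artist the user just picked, so it comes down to item-to-item similarity. A minimal sketch, assuming each artist is represented by the set of users who listened to it and using plain cosine similarity (only one of several possible metrics); `similar_artists` and the data are hypothetical.

```python
import math


def similar_artists(artist, listeners, n=2):
    """Rank other artists by cosine similarity of their listener sets."""
    target = listeners[artist]
    if not target:
        return []
    scores = {
        other: len(target & users) / math.sqrt(len(target) * len(users))
        for other, users in listeners.items()
        if other != artist and users
    }
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


# Hypothetical data: artist -> set of user ids who listened to it.
listeners = {
    "Napalm Death": {"u1", "u2", "u3"},
    "Carcass": {"u1", "u2"},
    "Bolt Thrower": {"u2", "u3", "u4"},
    "Björk": {"u5"},
}
print(similar_artists("Napalm Death", listeners))  # ['Carcass', 'Bolt Thrower']
```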
One pet peeve of mine is that most music players seem to take into account only user similarity, rather than what you're likely to like right now. E.g. I rarely want to suddenly transition from a slow classical track to a super-noisy 8-bit war game chip tune, even if I like both. And some transitions are great at some times of day but not at others, e.g. when I want a specific type of music because it works best for me while I'm working.
It greatly annoys me when I use one of these services and it makes transitions that are "obviously" wrong. If I've skipped the last 5 noisy tracks and listened to every slow track, clearly I want slow music right now, even though I've previously loved all those noisy tracks...
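One simple way to act on skips, sketched under heavy assumptions: give every track a hypothetical "energy" feature in [0, 1], treat each completed track in the current session as a +1 vote and each skip as a -1 vote (recent events weighted more), and add that session score to whatever long-term score the recommender already produces. The feature, the decay value and the function names are invented for illustration.

```python
def session_score(candidate_energy, session, decay=0.8):
    """Score a candidate track against the current listening session.

    session: list of (energy, skipped) pairs, oldest first; energy is a
    hypothetical per-track feature in [0, 1] (0 = slow/quiet, 1 = noisy).
    """
    score = 0.0
    for age, (energy, skipped) in enumerate(reversed(session)):
        vote = -1.0 if skipped else 1.0                   # skip = negative signal
        closeness = 1.0 - abs(candidate_energy - energy)  # 1.0 = same energy
        score += (decay ** age) * vote * closeness         # older events count less
    return score


def rerank(candidates, session, weight=1.0):
    """Order tracks by long-term score plus weighted session score.

    candidates: dict track -> (long_term_score, energy)
    """
    return sorted(
        candidates,
        key=lambda t: -(candidates[t][0] + weight * session_score(candidates[t][1], session)),
    )


# Hypothetical session: five noisy tracks skipped, four slow ones played through.
session = [(0.9, True), (0.1, False), (0.9, True), (0.2, False), (0.95, True),
           (0.1, False), (0.9, True), (0.15, False), (0.85, True)]
# Long-term, this listener slightly prefers the chip tune.
candidates = {"chip tune": (0.9, 0.95), "slow piano": (0.6, 0.1)}
print(rerank(candidates, session))  # ['slow piano', 'chip tune']
print(rerank(candidates, []))       # ['chip tune', 'slow piano']
```

With an empty session the long-term preference wins; after the skips, the session term flips the order.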
It must be possible to do so much better... You'd think even just mixing in the other recommendation data with some simple Markov chains would help.
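A minimal sketch of the Markov-chain idea, assuming every track carries a coarse label (genre or mood) and past sessions are available as label sequences: estimate first-order transition probabilities with add-one smoothing, then blend them with a static recommender's similarity score. The labels, the `mix` weight and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(histories):
    """Count first-order transitions between consecutive labels in each history."""
    counts = defaultdict(Counter)
    for history in histories:
        for prev, nxt in zip(history, history[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_prob(counts, prev, nxt, labels, alpha=1.0):
    """P(next label | previous label) with additive (Laplace) smoothing."""
    row = counts[prev]
    return (row[nxt] + alpha) / (sum(row.values()) + alpha * len(labels))


def next_track_scores(current_label, candidates, counts, labels, mix=0.5):
    """Blend a static similarity score with the Markov transition probability.

    candidates: dict track -> (similarity_score, label)
    """
    return {
        track: (1 - mix) * sim + mix * transition_prob(counts, current_label, label, labels)
        for track, (sim, label) in candidates.items()
    }


histories = [
    ["classical", "classical", "ambient"],
    ["classical", "ambient", "ambient"],
    ["chiptune", "chiptune", "chiptune"],
]
labels = {"classical", "ambient", "chiptune"}
counts = fit_transitions(histories)
candidates = {"8-bit war theme": (0.8, "chiptune"), "slow drone": (0.6, "ambient")}
scores = next_track_scores("classical", candidates, counts, labels)
print(max(scores, key=scores.get))  # slow drone
```

The chip tune has the higher static score, but classical-to-chiptune transitions never occur in the histories, so the blended score prefers the ambient track.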
How did you embed the examples into the post?
The examples were done with d3.js, the code for all the graphs is here https://github.com/benfred/bens-blog-code/tree/master/distan...
I'm using the Python code to generate a series of JSON files, one for each artist. I stored them all in S3 and then load them via AJAX calls. It's not an elegant solution, but it lets me keep my website statically generated.
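A minimal sketch of the "one JSON file per artist" step, assuming the similarity results are already computed as a dict from a hypothetical artist id to (other artist id, score) pairs. The file layout and field names are made up, and the upload to S3 is a separate step not shown here.

```python
import json
import os
import tempfile


def write_artist_files(similar, out_dir):
    """Write one small JSON file per artist so a static page can fetch it on demand.

    similar: dict of artist id -> list of (other artist id, score) pairs.
    Ids are used as file names, so they must be filesystem-safe.
    """
    os.makedirs(out_dir, exist_ok=True)
    for artist, neighbours in similar.items():
        payload = {
            "artist": artist,
            "similar": [{"artist": other, "score": round(score, 4)} for other, score in neighbours],
        }
        path = os.path.join(out_dir, f"{artist}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False)


# Example: write two files to a temporary directory and read one back.
out = tempfile.mkdtemp()
write_artist_files({"a1": [("a2", 0.91234)], "a2": [("a1", 0.91234)]}, out)
with open(os.path.join(out, "a1.json"), encoding="utf-8") as f:
    example = json.load(f)
print(example)  # {'artist': 'a1', 'similar': [{'artist': 'a2', 'score': 0.9123}]}
```

Keeping each file small means the page only downloads the data for the artist currently being viewed.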