Metadata is useful when you need to interpret the results, and most people care about why something is recommended to them. The top Netflix algorithms are black boxes from the user's standpoint, which doesn't help when you're deciding whether to buy or rent based on that recommendation.
Compare that with Amazon's approach, where they sacrifice predictive power for a useful explanation ("customers who bought X also bought Y").
I wonder if, having used an impossible-to-explain algorithm to arrive at your solution, you could compare your solution against a few very simple algorithms, and offer up the best fit as a rationalization.
I don't know that you can say Amazon is sacrificing any power. The Netflix Prize isn't about better recommendations per se; rather, it's about predicting users' movie ratings. It's not clear that predicting these ratings better will give better recommendations.
I think most of the component algorithms in the BellKor method provide fairly straightforward explanations - the best single predictor is a nearest-neighbour algorithm.
Amazon isn't sacrificing predictive power because they want to give you the explanation. They're just including profit margin in the data to give recommendations that are valuable to them. Not that I can blame them.
I'm talking about the current leaders in the Netflix Prize. Their algorithms are a combination of hundreds of individual recommenders blended together by another learning algorithm. There's no easy way to explain the result, but I guess one could cheat and run a simpler model on top of it to produce an "explanation".
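The blending step itself is conceptually simple, even if the component models aren't. A minimal sketch (all numbers invented) of combining several recommenders' outputs with a linear blend fitted by least squares - the "learning algorithm on top" in its most basic form:

```python
import numpy as np

# Hypothetical predictions from three component recommenders
# for five (user, movie) pairs, plus the true ratings.
preds = np.array([
    [3.1, 2.8, 3.4],
    [4.2, 4.5, 4.0],
    [1.9, 2.2, 2.0],
    [3.8, 3.5, 3.9],
    [2.5, 2.9, 2.6],
])
truth = np.array([3.0, 4.4, 2.0, 3.7, 2.7])

# Fit blending weights by ordinary least squares.
weights, *_ = np.linalg.lstsq(preds, truth, rcond=None)

blended = preds @ weights  # final blended predictions
rmse = np.sqrt(np.mean((blended - truth) ** 2))
print(weights, rmse)
```

On the training data the blend can't do worse than any single component, since each component alone is just one particular weight vector; the real teams blend hundreds of models and use fancier combiners than plain least squares.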
Correct. When using a latent factor method, providing an explanation for the prediction has been essentially intractable.
A common post-processing step for many teams (particularly BellKor/KorBell) is a KNN. The KNNs are used to provide a sort of confidence metric, and if anyone so desired, using the KNN-derived neighbourhood for explanation would be pretty simple and not really cheating.
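The explanation falls out almost for free: the same neighbours that drive an item-based KNN prediction double as the "because you liked these similar movies" message. A toy sketch, with made-up similarities and ratings:

```python
# Toy item-based KNN: predict a user's rating for a target movie
# from their ratings of the k most similar movies, and report
# those neighbours as the "explanation". All data is invented.

similar = {  # similarity of other movies to the target movie
    "Heat": 0.9, "Ronin": 0.7, "Amelie": 0.1, "Up": 0.05,
}
user_ratings = {"Heat": 5, "Ronin": 4, "Amelie": 2}

k = 2
neighbours = sorted(
    (m for m in similar if m in user_ratings),
    key=lambda m: similar[m], reverse=True,
)[:k]

# Similarity-weighted average of the user's ratings of the neighbours.
pred = sum(similar[m] * user_ratings[m] for m in neighbours) \
       / sum(similar[m] for m in neighbours)

print(f"predicted {pred:.2f} because you rated {neighbours}")
```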
Edit: Yehuda may have made a breakthrough - he'll be presenting a paper at KDD, "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model"; the paper will probably be released after the conference.
It's interesting, because there's evidence that that's what we as humans do. The subconscious decides something, then we make up a story when asked why we did X.
This is fascinating. I was aware of one half of what the article talks about: my co-author and I broke the anonymity of the Netflix data (see http://www.cs.utexas.edu/~arvindn/ for paper/press links). Our main insight was that everyone's movie-watching behavior is different. The quote "User tastes are infinite shades of grey" in the article just about sums it up perfectly.
What's funny is that I keep arguing for using more meta-data with my friends who are participating in the competition. I guess I didn't realize that data mining algorithms actually capture the nuances of user tastes.
This reminds me of some of the spam filtering algorithms I've read about.
You'd think that categorizing spam based on keywords (or sender IP, etc) would be useful, but machine learning algorithms can pick up more subtle nuances of language patterns and act more effectively.
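The contrast is between hand-written keyword rules and a statistical model that learns word weights from data. A tiny Naive Bayes sketch (the training "corpus" is obviously made up) shows the learned-weights flavour:

```python
import math
from collections import Counter

# Minimal Naive Bayes spam filter; all training messages are invented.
spam = ["cheap meds now", "win money now", "cheap money offer"]
ham = ["meeting notes attached", "lunch tomorrow", "project notes now"]

def counts(docs):
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

spam_c, ham_c = counts(spam), counts(ham)
vocab = set(spam_c) | set(ham_c)

def log_prob(word_counts, msg):
    # Per-class word likelihoods with Laplace smoothing.
    total = sum(word_counts.values())
    return sum(
        math.log((word_counts[w] + 1) / (total + len(vocab)))
        for w in msg.split()
    )

def classify(msg):
    # Equal class priors; pick the class with the higher likelihood.
    return "spam" if log_prob(spam_c, msg) > log_prob(ham_c, msg) else "ham"

print(classify("cheap money now"))
```

No rule ever says "cheap is spammy"; the weights come out of the counts, which is also how the model picks up combinations a rule-writer wouldn't think of.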
Metadata is in fact useful, though not the metadata you might expect. One of the biggest wins for many teams came when they started ranking similarity based on the edit distance between titles.
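Edit distance here presumably means something like Levenshtein distance, which catches near-duplicate catalog entries (sequels, re-releases, punctuation variants). A standard dynamic-programming implementation, as a sketch:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic DP edit distance: minimum insertions, deletions,
    # and substitutions to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = cur
    return prev[-1]

# Near-duplicate titles in a catalog tend to have tiny edit distance.
print(levenshtein("Lord of the Rings", "The Lord of the Rings"))  # 4
```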
I'd be really curious to know which teams in particular use movie metadata. Yehuda Koren (the first commenter in the blog post) has explicitly stated many times that, in his opinion, movie titles and any non-explicit info have been useless.
BellKor, BigChaos, Gravity, and Gavin Potter (just a guy in a garage) are going to be presenting at Yehuda's KDD workshop next week. I'm sure other teams will also be represented. I'll ask them if they use movie metadata, and I'm pretty sure the answer will be no.
The crux of the argument, though, is that if you have a strong CF model with many, many ratings, you don't seem to get much benefit from their approach (a linear combination of models). That doesn't mean that metadata can't be useful with a different approach. It also doesn't mean that metadata isn't useful for sparse data: in fact, it's incredibly useful there, because you don't have much of anything else.
I cannot dispute that metadata can be useful. But it appears, at least for prediction tasks similar to the prize, that an ounce of weak or strong explicit user input is worth a ton of rich implicit data (including item metadata).
This makes perfect sense; genres and other information about movies are just an approximation of user taste, while the actual taste of users themselves is clearly the best data to train your models on, since that's what you have to predict.
Any good model should be able to derive the relationships between films without knowing them beforehand, solely by using users' choices. And, likely, these relationships will be more useful than any from an external database.
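One common way a model derives those relationships with no external database is item-item similarity over the ratings matrix itself. A toy sketch with invented ratings, using cosine similarity between movie columns:

```python
import numpy as np

# Rows = users, columns = movies; 0 means unrated. All values invented.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Movies 0 and 1 are liked by the same users, so they come out very
# similar; movies 0 and 2 are liked by disjoint sets of users.
sim_01 = cosine(R[:, 0], R[:, 1])
sim_02 = cosine(R[:, 0], R[:, 2])
print(sim_01, sim_02)
```

The model never sees a genre label, yet movies 0 and 1 end up "related" purely because the same users rated them highly.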
I can see that metadata about the movies becomes worthless, because there is already a wealth of data about each movie entity in the dataset. However, metadata about the users should be fruitful since each user has fairly few data points to use for prediction.
Take, for example, two users who have each rated only Wall-E, and they both rated it a 5. Now, given Jet Li's "The One", what prediction do you give for each user? It is unlikely that two real people with this one data point on Wall-E would have the same outcome on "The One", so any additional data that can help to statistically separate the people can only help your case. For example, is the person male or female? What are the person's favorite genres (something Netflix collects)? Even things like whether the person signed up for the 6-at-a-time or 2-at-a-time plan might correlate slightly.
The way I see it, these people have a set of data that you could draw as a line on an x-y plane. This line goes up, down, etc., and there does not seem to be any pattern. So they come up with a bunch of algorithms that get as close to the line as possible -> they approximate the line with an algorithm. From that, they can predict what the next step of the function is going to look like.
Metadata is like placing some dots on this line and saying "this spot is horror", "this spot is comedy". It becomes irrelevant, because you are already near enough to the line, and that dot does not help you any.
If I were dealing with this problem, what I would do is break free of these constraints and try treating the data as an abstract blob of randomness, then splitting it up (i.e., moving the data into separate 'dimensions') until I had hundreds of straight lines, and then using those for prediction. But I'm sure they must have tested this already :)
I'm rooting for the team with the two jewish guys and the black guy, afterwards they could get together and make a sitcom. Or a joke.