Metadata is useful when you need to interpret the results, and most people care about why something is recommended to them. The top Netflix algorithms are black boxes from the user's standpoint, which doesn't help when you're deciding whether to buy or rent based on that recommendation.
Compare that with Amazon's approach, where they sacrifice predictive power for a useful explanation ("customers who bought X also bought Y").
I wonder if, having used an impossible-to-explain algorithm to arrive at your solution, you could compare your solution against a few very simple algorithms, and offer up the best fit as a rationalization.
I don't know that you can say Amazon is sacrificing any power. The Netflix Prize isn't about better recommendations per se; rather, it's about predicting users' movie ratings. It's not clear that predicting these ratings better will give better recommendations.
I think most of the component algorithms in the BellKor method provide fairly straightforward explanations - the best single predictor is a nearest-neighbour algorithm.
Amazon isn't sacrificing predictive power because they want to give you the explanation. They're just including profit margin in the data to give recommendations that are valuable to them. Not that I can blame them.
I'm talking about the current leaders in the Netflix Prize. Their algorithms are a combination of hundreds of individual recommenders blended together by another learning algorithm. There's no easy way to explain the result, but I guess one could cheat and run a simpler model on top of it to produce an "explanation".
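The blending step itself is conceptually simple, even if the component models aren't. A minimal sketch (all numbers invented) of combining several recommenders' outputs with a linear blend fitted by least squares - the "learning algorithm on top" in its most basic form:

```python
import numpy as np

# Hypothetical predictions from three component recommenders
# for five (user, movie) pairs, plus the true ratings.
preds = np.array([
    [3.1, 2.8, 3.4],
    [4.2, 4.5, 4.0],
    [1.9, 2.2, 2.0],
    [3.8, 3.5, 3.9],
    [2.5, 2.9, 2.6],
])
truth = np.array([3.0, 4.4, 2.0, 3.7, 2.7])

# Fit blending weights by ordinary least squares.
weights, *_ = np.linalg.lstsq(preds, truth, rcond=None)

blended = preds @ weights  # final blended predictions
rmse = np.sqrt(np.mean((blended - truth) ** 2))
print(weights, rmse)
```

On the training data the blend can't do worse than any single component, since each component alone is just one particular weight vector; the real teams blend hundreds of models and use fancier combiners than plain least squares.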
Correct. When using a latent factor method, providing an explanation for the prediction has been essentially intractable.
A common post-processing step for many teams (particularly BellKor/KorBell) is a KNN. The KNNs are used to provide a sort of confidence metric, and if anyone so desired, using the KNN-derived neighbourhood for explanation would be pretty simple and not really cheating.
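The explanation falls out almost for free: the same neighbours that drive an item-based KNN prediction double as the "because you liked these similar movies" message. A toy sketch, with made-up similarities and ratings:

```python
# Toy item-based KNN: predict a user's rating for a target movie
# from their ratings of the k most similar movies, and report
# those neighbours as the "explanation". All data is invented.

similar = {  # similarity of other movies to the target movie
    "Heat": 0.9, "Ronin": 0.7, "Amelie": 0.1, "Up": 0.05,
}
user_ratings = {"Heat": 5, "Ronin": 4, "Amelie": 2}

k = 2
neighbours = sorted(
    (m for m in similar if m in user_ratings),
    key=lambda m: similar[m], reverse=True,
)[:k]

# Similarity-weighted average of the user's ratings of the neighbours.
pred = sum(similar[m] * user_ratings[m] for m in neighbours) \
       / sum(similar[m] for m in neighbours)

print(f"predicted {pred:.2f} because you rated {neighbours}")
```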
Edit: Yehuda may have made a breakthrough - he'll be presenting a paper at KDD, "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model"; the paper will probably be released after the conference.
It's interesting, because there's evidence that that's what we as humans do. The subconscious decides something, then we make up a story when asked why we did X.
This is fascinating. I was aware of one half of what the article talks about: my co-author and I broke the anonymity of the Netflix data (see http://www.cs.utexas.edu/~arvindn/ for paper/press links). Our main insight was that everyone's movie-watching behavior is different. The quote "User tastes are infinite shades of grey" in the article just about sums it up perfectly.
What's funny is that I keep arguing for using more meta-data with my friends who are participating in the competition. I guess I didn't realize that data mining algorithms actually capture the nuances of user tastes.
This reminds me of some of the spam filtering algorithms I've read about.
You'd think that categorizing spam based on keywords (or sender IP, etc) would be useful, but machine learning algorithms can pick up more subtle nuances of language patterns and act more effectively.
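The contrast is between hand-written keyword rules and a statistical model that learns word weights from data. A tiny Naive Bayes sketch (the training "corpus" is obviously made up) shows the learned-weights flavour:

```python
import math
from collections import Counter

# Minimal Naive Bayes spam filter; all training messages are invented.
spam = ["cheap meds now", "win money now", "cheap money offer"]
ham = ["meeting notes attached", "lunch tomorrow", "project notes now"]

def counts(docs):
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

spam_c, ham_c = counts(spam), counts(ham)
vocab = set(spam_c) | set(ham_c)

def log_prob(word_counts, msg):
    # Per-class word likelihoods with Laplace smoothing.
    total = sum(word_counts.values())
    return sum(
        math.log((word_counts[w] + 1) / (total + len(vocab)))
        for w in msg.split()
    )

def classify(msg):
    # Equal class priors; pick the class with the higher likelihood.
    return "spam" if log_prob(spam_c, msg) > log_prob(ham_c, msg) else "ham"

print(classify("cheap money now"))
```

No rule ever says "cheap is spammy"; the weights come out of the counts, which is also how the model picks up combinations a rule-writer wouldn't think of.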
Metadata is in fact useful, though not the metadata you might expect. One of the biggest wins for many teams came when they started ranking similarity based on the edit distance between titles.
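Edit distance here presumably means something like Levenshtein distance, which catches near-duplicate catalog entries (sequels, re-releases, punctuation variants). A standard dynamic-programming implementation, as a sketch:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic DP edit distance: minimum insertions, deletions,
    # and substitutions to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = cur
    return prev[-1]

# Near-duplicate titles in a catalog tend to have tiny edit distance.
print(levenshtein("Lord of the Rings", "The Lord of the Rings"))  # 4
```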
I'd be really curious to know which teams in particular use movie metadata. Yehuda Koren (the first commenter in the blog post) has explicitly stated many times that, in his opinion, movie titles and any non-explicit info have been useless.
BellKor, BigChaos, Gravity, and Gavin Potter (just a guy in a garage) are going to be presenting at Yehuda's KDD workshop next week. I'm sure other teams will also be represented. I'll ask them if they use movie metadata, and I'm pretty sure the answer will be no.
The crux of the argument, though, is that if you have a strong CF model with many, many ratings, you don't seem to get much benefit from their approach (a linear combination of models). That doesn't mean that metadata can't be useful with a different approach. It also doesn't mean that metadata isn't useful for sparse data: in fact, it's incredibly useful there, because you don't have much of anything else.
I cannot dispute that metadata can be useful. But it appears, at least for prediction tasks similar to the prize, that an ounce of weak or strong explicit user input is worth a ton of rich implicit data (including item metadata).
This makes perfect sense; genres and other information about movies are just an approximation of user taste, while the actual taste of users themselves is clearly the best data to train your models on, since that's what you have to predict.
Any good model should be able to derive the relationships between films without knowing them beforehand, solely by using users' choices. And, likely, these relationships will be more useful than any from an external database.
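One common way a model derives those relationships with no external database is item-item similarity over the ratings matrix itself. A toy sketch with invented ratings, using cosine similarity between movie columns:

```python
import numpy as np

# Rows = users, columns = movies; 0 means unrated. All values invented.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Movies 0 and 1 are liked by the same users, so they come out very
# similar; movies 0 and 2 are liked by disjoint sets of users.
sim_01 = cosine(R[:, 0], R[:, 1])
sim_02 = cosine(R[:, 0], R[:, 2])
print(sim_01, sim_02)
```

The model never sees a genre label, yet movies 0 and 1 end up "related" purely because the same users rated them highly.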
I can see that metadata about the movies becomes worthless, because there is already a wealth of data about each movie entity in the dataset. However, metadata about the users should be fruitful since each user has fairly few data points to use for prediction.
Take, for example, two users who have each rated only Wall-E, and they both rated it a 5. Now, given Jet Li's "The One", what prediction do you give for each user? It is unlikely that two real people with this one data point on Wall-E would have the same outcome on "The One", so any additional data that can help to statistically separate the people can only help your case. For example, is the person male or female? What are the person's favorite genres (something Netflix collects)? Even things like whether the person signed up for the 6-at-a-time or 2-at-a-time plan might correlate slightly.
The way I see it, these people have a set of data that you could draw as a line on an x-y plane. This line goes up, down, etc., and there does not seem to be any pattern. So they come up with a bunch of algorithms that get as close to the line as possible -> they approximate the line with an algorithm. From that, they can predict what the next step of the function is going to look like.
Metadata is like placing some dots on this line and saying "this spot is horror", "this spot is comedy". It becomes irrelevant, because you are already near enough to the line, and that dot does not help you any.
If I were dealing with this problem, what I would do is break free of these constraints and try treating the data as an abstract blob of randomness, then splitting it up (i.e., moving the data into separate 'dimensions') until I had hundreds of straight lines, and then using those for prediction. But I'm sure they must have tested this already :)
I'm rooting for the team with the two jewish guys and the black guy, afterwards they could get together and make a sitcom. Or a joke.