The Netflix Prize: 300 Days Later (whimsley.typepad.com)
32 points by nickb on Aug 1, 2007 | 4 comments



Great article, and speaking as someone who is actively working with the Netflix Prize data on a daily basis, it summarizes some of the more bizarre quirks of the dataset pretty well. The other problem he didn't really get into is the sheer size of the dataset: the .txt files run to about 4GB just to hold the 100,480,507 ratings ...
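
Just to make the size point concrete, here is a rough sketch of how you can stream through the training files without ever holding the whole 4GB in memory. It assumes the usual layout of a training_set/ directory full of mv_*.txt files, where the first line of each file is the movie id followed by a colon and every other line is customer_id,rating,date; the iter_ratings helper is just something I made up for illustration.

    import os, glob
    from collections import defaultdict

    def iter_ratings(training_dir="training_set"):
        # Yield (movie_id, customer_id, rating) one row at a time
        # instead of loading all ~100M rows into memory.
        for path in sorted(glob.glob(os.path.join(training_dir, "mv_*.txt"))):
            with open(path) as f:
                movie_id = int(f.readline().strip().rstrip(":"))
                for line in f:
                    customer_id, rating, _date = line.strip().split(",")
                    yield movie_id, int(customer_id), int(rating)

    # example: count how many ratings each movie has while streaming
    counts = defaultdict(int)
    for movie_id, customer_id, rating in iter_ratings():
        counts[movie_id] += 1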

Beyond that, I think he missed the really interesting possibilities of recommender systems. Once a user has rated enough movies, and there is a large enough cross-section of other users rating the same movies, the math of the problem starts to build a profile of that user. With that profile you are much more likely to find the things the user will actually find interesting, instead of just what the crowd voted most popular.
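
To make that a little more concrete, one common way to turn those overlapping ratings into a per-user profile is a latent-factor model trained with stochastic gradient descent. Rough sketch below; the factor count, learning rate and regularization are made-up illustration values, and ratings is assumed to be a list of (user, movie, rating) triples.

    import random
    from collections import defaultdict

    def factorize(ratings, n_factors=20, lr=0.01, reg=0.05, epochs=10):
        # ratings: list of (user, movie, rating) triples
        user_f = defaultdict(lambda: [random.gauss(0, 0.1) for _ in range(n_factors)])
        movie_f = defaultdict(lambda: [random.gauss(0, 0.1) for _ in range(n_factors)])
        for _ in range(epochs):
            for u, m, r in ratings:
                pred = sum(pu * qm for pu, qm in zip(user_f[u], movie_f[m]))
                err = r - pred
                for k in range(n_factors):
                    pu, qm = user_f[u][k], movie_f[m][k]
                    user_f[u][k] += lr * (err * qm - reg * pu)
                    movie_f[m][k] += lr * (err * pu - reg * qm)
        return user_f, movie_f

    # user_f[u] is the learned taste profile for user u; predict a rating with
    # sum(pu * qm for pu, qm in zip(user_f[u], movie_f[m]))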

The real trick is not to validate your algorithm against some artificial error measure, but instead to focus on how the recommendation system changes the way the user interacts with your website.
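
For context, the contest's offline measure is RMSE on a held-out set of ratings, which is trivial to compute but says nothing about whether anyone actually watched the movies you recommended. Something like:

    from math import sqrt

    def rmse(pairs):
        # pairs: list of (predicted, actual) rating pairs on a held-out probe set
        return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

    print(rmse([(3.7, 4), (2.1, 2), (4.5, 5)]))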

The Netflix Prize did a few things that were wonderful for recommendation-system research: it brought a lot of attention to the field, and it gave people a huge dataset to compare their results against. In the long run, even if no one wins the grand prize, Netflix will have given much more to the field of study than the $1 million.

Oh yeah .. and maybe this January there will be one more YC start-up doing recommendation systems.


I will watch that with interest. I have played with the Netflix dataset, although not as much as I'd like. You can do a lot of useful things with it on its own, but in the real world it is very useful to use a different definition of the problem than the Netflix one. "Similar items" are useful because customers can understand what that means, as opposed to some abstract "we think you would rate this X".
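
In case anyone hasn't tried it: a quick and dirty way to get "similar items" is plain cosine similarity between the movies' rating vectors. Rough sketch, again assuming (user, movie, rating) triples; in practice you would want to subtract per-user means and require a minimum overlap.

    from math import sqrt
    from collections import defaultdict

    def similar_items(ratings, target_movie, top_n=10):
        # ratings: list of (user, movie, rating) triples
        by_movie = defaultdict(dict)          # movie -> {user: rating}
        for u, m, r in ratings:
            by_movie[m][u] = r
        target = by_movie[target_movie]
        target_norm = sqrt(sum(v * v for v in target.values()))
        scores = []
        for m, vec in by_movie.items():
            if m == target_movie:
                continue
            common = set(vec) & set(target)
            if not common:
                continue
            dot = sum(vec[u] * target[u] for u in common)
            norm = target_norm * sqrt(sum(v * v for v in vec.values()))
            scores.append((dot / norm, m))
        return sorted(scores, reverse=True)[:top_n]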

Also it makes a lot of sense to integrate the other data you have about your items (genre, director). It is actually pretty easy to do.
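
For example (purely a sketch, with made-up weights and made-up item fields), you can just mix a metadata overlap term into whatever collaborative similarity you already have:

    def blended_score(collab_sim, item_a, item_b, w_genre=0.2, w_director=0.1):
        # item_a / item_b are dicts like {"genres": {"comedy", "drama"}, "director": "..."}
        # the weights are arbitrary illustration values you would tune on real data
        union = item_a["genres"] | item_b["genres"]
        genre_overlap = len(item_a["genres"] & item_b["genres"]) / len(union) if union else 0.0
        same_director = 1.0 if item_a["director"] == item_b["director"] else 0.0
        return collab_sim + w_genre * genre_overlap + w_director * same_director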


From the YC crowd, Reddit and, most recently, Adpinion seem to be trying to exploit recommendation systems. My experience with Reddit's "recommended" page tells me it is useless for now. I hope Adpinion doesn't rely solely on getting the recommendation system right.


Great article. The limit doesn't seem to be in the algorithm (as this article makes clear); it's in the ability of a 1-through-5-star system to accurately reflect a person's feelings about a movie.

While algorithms for predicting your upcoming ratings from your previous ratings may not be able to improve much further... surely there are some creative ways to re-think the entire system:

Two obvious approaches would be to rate on multiple scales (funniness, intellectual weight, 'light'-ness, etc.), and to include a comment-tracking system alongside the simple rating system to provide a better, more complex picture of the film (as Reddit and YCnews do).
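
If you did have per-axis predictions, folding them back into one number is the easy part; something like this, with made-up axis names and equal default weights:

    def blend_axes(axis_scores, weights=None):
        # axis_scores: dict like {"funniness": 4.2, "intellectual_weight": 3.1, "lightness": 4.8}
        # (the axis names and equal default weights are just for illustration)
        if weights is None:
            weights = {axis: 1.0 / len(axis_scores) for axis in axis_scores}
        return sum(weights[axis] * score for axis, score in axis_scores.items())

    print(blend_axes({"funniness": 4.2, "intellectual_weight": 3.1, "lightness": 4.8}))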



