Good to see the Netflix prize paid off for Netflix. In pure terms of hourly rate, Netflix managed to get some of the smartest people in the world to work for less than chump change. In those terms alone it was a huge success, but the Netflix prize also pushed forward the field so really no-one was exploited.
The same idea has been commercialised by Kaggle (http://www.kaggle.com/) but there are several issues. Of course there is less uptake, as the idea is no longer novel and the prizes are smaller. More than that, I think people are realising that winner-takes-all sucks, and the winning entries tend to combine so many different techniques that, as Netflix found, putting them into production is difficult. There is some interesting work on a better model here: http://arxiv.org/abs/1111.2664
The Netflix prize's criterion -- accurately rate a pile of movies -- is a red herring, though. It may help some, but the broad accuracy comes at the expense of an optimal algorithm for what's really important.
What I want is a recommendation of what I'll probably like. It is absolutely irrelevant if mid-range movies score a 2 or a 3. If Netflix can pick out a list of movies that I would rate a 5 (and maybe even 4), they've got the holy grail. Nothing else matters. So why optimize your algorithm to capture the 2s and 3s as well?
For bonus points, it might be nice to be able to pick out the real dogs. If it could warn me that I'm about to rent a 1 or 2, that would be cool. But it doesn't matter if they can tell me which of 1 or 2 it is. The precision is irrelevant, just tell me I won't like it.
(If I've said this once, I've said it a hundred times. But I guess I'll keep on like a broken record as long as Netflix keeps trumpeting what an achievement the Prize's algorithm was.)
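To make the point about the 2s and 3s concrete, here is a rough sketch comparing RMSE (the Prize metric) with a coarser check that only asks whether a movie lands in the right bucket. The bucket names, the 2.5 and 3.5 cut-offs, and the two toy "models" are my own assumptions for illustration, not anything Netflix has described.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over every rating (the Netflix Prize metric)."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def bucket(rating):
    """Collapse a star rating into like / meh / avoid (cut-offs are assumed)."""
    if rating >= 3.5:
        return "like"
    if rating < 2.5:
        return "avoid"
    return "meh"

def bucket_accuracy(predicted, actual):
    """Fraction of movies whose predicted bucket matches the real one."""
    hits = sum(bucket(p) == bucket(a) for p, a in zip(predicted, actual))
    return hits / len(actual)

# Toy data: what I actually rated five movies.
actual = [4, 4, 2, 2, 3]
# Hypothetical model A hedges toward the middle; model B commits to extremes.
model_a = [3.4, 3.4, 2.6, 2.6, 3.0]
model_b = [5.0, 5.0, 1.0, 1.0, 3.0]

for name, preds in (("A", model_a), ("B", model_b)):
    print(name, round(rmse(preds, actual), 3), bucket_accuracy(preds, actual))
```

On this toy data model A has the lower RMSE, yet it files every movie under "meh", while model B gets every bucket right. That is only an illustration of how the two metrics can disagree, not evidence about the real Prize entries.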
In the OP's article they mention that they monitor whether a movie is watched to completion, which gives them a much better metric to optimise. The other issue is that this is really a sequential decision-making problem. Recommending a movie has an opportunity cost -- there are other movies you don't recommend -- and the recommendation is an ongoing process, so it is probably best to spend some time exploring the user's taste on the assumption this will let you make better recommendations in the future. Accounting for these issues is much harder in a competition format.
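As a minimal sketch of the explore-versus-exploit trade-off described above, here is an epsilon-greedy chooser over genres that uses watched-to-completion as the reward. Epsilon-greedy is just one simple, well-known strategy that I am swapping in for illustration; the function names and the genre-level granularity are assumptions, and nothing here is claimed to be what Netflix does.

```python
import random

def pick_genre(stats, epsilon, rng):
    """Epsilon-greedy choice over genres.

    stats maps genre -> [times_recommended, times_watched_to_completion].
    With probability epsilon we explore a random genre; otherwise we
    exploit the genre with the best observed completion rate.
    """
    genres = sorted(stats)  # sorted so ties break deterministically
    if rng.random() < epsilon:
        return rng.choice(genres)  # explore
    # Exploit: genres never tried yet are treated as a 0.0 completion rate.
    return max(genres, key=lambda g: stats[g][1] / stats[g][0] if stats[g][0] else 0.0)

def record(stats, genre, completed):
    """Update the counts after the user did or did not finish the movie."""
    stats[genre][0] += 1
    if completed:
        stats[genre][1] += 1

rng = random.Random(0)
stats = {"drama": [10, 7], "kids": [10, 1], "scifi": [0, 0]}
choice = pick_genre(stats, epsilon=0.1, rng=rng)
record(stats, choice, completed=True)
```

With epsilon at 0 this never tries "scifi" at all, which is the opportunity-cost problem in miniature: a recommender that only exploits never learns about the things it has not shown you.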
"Of course, when we say “you”, we really mean everyone in your household." It's interesting that they've made a conscious decision to go that route. My wife and I share a streaming account which does not allow for separate instant queues. We have different tastes so I often see prompts to rate or recommendations to watch films and shows I dislike. If I'm honest in my rating it seems like I'll skew the recommendation engine away from shows my wife would like but if I'm not I'll end up seeing only recommendations i done like. It's a catch-22 situation and makes me distrust pretty much anything that is recommended.
I let a family stay at my home for a month, and forgot my Xbox 360 was on Netflix. Came back from a trip and the recommendations were all kids all the time. Cancelled the Xbox account. Three months later, still all kids, all the time. Went to the website and mercilessly gave negative ratings to all the kids shows it suggested (though I like some things, like British lit movies for kids). Kept at that for two weeks. Didn't help. It kept cycling around the dregs of kids shows, surfacing Pokemon, or old Nick reruns. Found an obscure "don't consider this" button, cycled through everything again. Five months after that, our top 10 "for you" doesn't list kids shows any more. But one of those precious custom genres is still always kids and family related, no matter what we try. So, we're giving up on fixing it, and just going to have a kid instead.
I almost spit out my coffee, very well done. I have the same issue in that I set up my preferences, and told my wife (who uses it a lot more but doesn't look at the recommendations much, she generally already knows what she wants to watch) to just not rate anything, so that it wouldn't be confused by her ratings. All it took was one accidental 5* on Desperate Housewives and my recommendations were shot.
It seems to me that the rating algorithm is a bit sensitive. Also, I wonder if Netflix has considered giving people the option of multiple "personalities" for the purpose of suggestions, queues, etc. I bet if they offered this for something like $3/mo more, people would pay for it.
Yeah - instead of putting their brainpower into finessing recommendations, I wish they would put it into improving the user experience. The web UI has hardly changed since they launched. It does not support notes (Joe recommended this movie), it does not support filtering of long queues (show me all the movies in my queue with predicted 4+ star rating that were added more than a year ago), search is not great and lacks features (show me only PG movies), the site works poorly on the iPad, and yeah - a single streaming queue and the assumption that everybody in the household has the same tastes? Ai.
My wife and I have separate accounts because we don't want our profiles for recommending new movies to influence each other. We try to find a common movie, and if nothing jumps out as being a good match for both of us, we simply sit next to each other with our laptops and use noise canceling headphones. We have been married for almost 30 years, so this is not as isolating as you might think.
I'm working on a site that will solve this exact problem. I aggregate streaming movie sites and let you check off your subscriptions and preferences. I'll be continuously improving recommendation quality as I gather the user-data to pull it off. You can also have as many different queues as you want, and you can share and collaboratively edit them with your friends. You can have a look at http://www.dexy.tv
It's brand new and is my first web app so I'd love any feedback, good or bad.
"Another important element in Netflix’ personalization is awareness. We want members to be aware of how we are adapting to their tastes. This not only promotes trust in the system, but encourages members to give feedback that will result in better recommendations. A different way of promoting trust with the personalization component is to provide explanations as to why we decide to recommend a given movie or show. We are not recommending it because it suits our business needs, but because it matches the information we have from you: your explicit taste preferences and ratings, your viewing history, or even your friends’ recommendations."
Well, at least they're trying to demonstrate awareness and explain their recommendations, even if they come up short.
Awareness? Am I missing something here? Most people I've talked to (several hundred, for the record) feel like Netflix recommendations are hit or miss, a black box that plateaus after a while. If anyone can explain to me how Netflix conveys their awareness I'd love to hear it.
Speaking of explanations, sure, showing that I'm being recommended a movie because of some other movies is sometimes helpful, but the rest of the time it just reveals how poorly they understand me. It can be pretty obvious that the algorithms have no clue why I actually like certain movies. After all, how could they? There are tons of things to like/dislike about any given movie, but when that gets boiled down to 5 stars all that context gets lost. Diminishing returns and noise will keep them from really understanding my tastes with their current system.
Why am I ranting? Because I want to be understood but Netflix and other personalization services still feel so damn impersonal. It's frustrating.
Anyways, we're working on a solution, sign up for our alpha at http://tagbax.com or comment/vote to tell me how right/wrong I am. Thanks for reading.
It seems to me they'd be able to get a lot more metadata for their recommendation engine if they worked on allowing multiple personas per household account. Analyzing the viewing behaviour of people within the same household would give them a treasure trove of more data to make better recommendations for other people in similar households.
"You watched and liked XYZ, your 38 year old wife hates ABC, and you've got 3 daughters 8/10/12? They'll probably like these shows ...."
It seems they missed out on years of awesome data by not allowing for segmenting by persona from the early days.
A year or so ago we discovered this at work and copied the URL for the setup for this from someone with DVDs to someone who didn't, and they were able to set it up for instant streaming, even though they didn't have the UI presenting the feature visible. I'm not sure if it still works, but it was an interesting bug to find, nevertheless.
But making a multiuser-interface complicates a design and ease of use tremendously. Think about password protection (and recovery). And you don't want to administer another computer system, you just want to watch TV.
Wouldn't necessarily need password protection. There's an assumption that you're all living in the same house. But a 'who are you?' at the top, letting you type your name in, and 'switch to', shouldn't be too hard.
I agree, it does complicate things some, but consider
A) not every account would need it.
B) not everyone offered it would use it
C) the accounts that used it would probably be a bit savvier to start with
D) they'd get a lot of extra valuable data, not least of which how people use that feature, and could make it easier to use over time.
I'm actually mocking up a recommendation engine at the moment, and this certainly gives me a lot to think about.
For example, I should probably be planning for more future dimensions (ratings, metadata, etc.) so I can manipulate their interactions in a much more elegant way as we add new data, rather than trying to manually flatten everything out (combining metadata and ratings into a score for each item, then calculating compatibility) for every possible combination, which seems like a much less computationally expensive approach. I probably just need to always keep every scrap of data we get, then find the right ways to combine the data into a score with one routine, and compare the scores to calculate compatibility with another. The hard part is figuring out how the score should be calculated (I'll get to that in the morning).
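Thinking out loud, the two-routine split could look roughly like this: one routine folds whatever signals we hold for an item into a score, and a second compares two users' score vectors. Every name here, the weights, and the choice of cosine similarity for "compatibility" are assumptions I'm making for the mock-up, not a settled design.

```python
import math

def item_score(signals, weights):
    """Combine the signals we hold for one item into a single number.

    Signals with no weight yet are kept in the data but contribute 0,
    so new dimensions can be stored now and weighted later.
    """
    return sum(weights.get(name, 0.0) * value for name, value in signals.items())

def score_vector(user_data, weights, catalog):
    """One score per catalog item, in catalog order (0 for unseen items)."""
    return [item_score(user_data.get(item, {}), weights) for item in catalog]

def compatibility(a, b):
    """Cosine similarity between two score vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical data: raw signals are stored per user, per item.
weights = {"rating": 1.0, "completed": 2.0}
catalog = ["A", "B", "C"]
alice = {"A": {"rating": 5, "completed": 1}, "B": {"rating": 1, "mood_tag": 3}}
carol = {"C": {"rating": 4}}

va = score_vector(alice, weights, catalog)
vc = score_vector(carol, weights, catalog)
print(va, vc, compatibility(va, vc))
```

The appeal of this shape is that adding a new dimension only means adding a weight. The stored data and the compatibility routine stay untouched, and the open question is still how the weights themselves should be chosen.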
I wish that they used a 10 star metric. Often I'll want to give a 3.5 or 4.5 rating - not possible. They need to use a finer scale so customers get the impression, at least, of finer control of how they rate movies.
I often wondered if this amount of effort is truly required and if it actually provides results people can relate to. I used Netflix for a while and I find the recommendations to be pretty average for me. Quite possibly it is because I don't tend to rate most movies I watch. I wonder how often that is the case for others though.
More importantly I just don't think you can predict what I want to watch based on factors such as history, ratings, watch times etc. External factors such as mood, a recommendation from a friend (not on Netflix) and curiosity almost always play a large part in choosing the thing I watch next. These things are not known to Netflix.
As a semi-educated guess I would say that simply providing a listing of popular movies based on ratings/watch times per genre as well as similarity would be enough for most people. It is easier to understand and how many people want to watch something just because it is popular anyway?
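For what it's worth, the per-genre popularity listing I have in mind is about this simple. The tuple layout and the rule "sort by average rating, break ties by watch count" are just my assumptions for a sketch.

```python
from collections import defaultdict

def top_by_genre(movies, n):
    """Return genre -> the n most popular titles in that genre.

    movies is a list of (title, genre, avg_rating, watch_count) tuples.
    Titles are ordered by average rating, with watch count as tie-breaker.
    """
    by_genre = defaultdict(list)
    for title, genre, avg_rating, watch_count in movies:
        by_genre[genre].append((avg_rating, watch_count, title))
    return {
        genre: [title for _, _, title in sorted(rows, reverse=True)[:n]]
        for genre, rows in by_genre.items()
    }

# Made-up catalog entries for illustration only.
movies = [
    ("A", "drama", 4.5, 100),
    ("B", "drama", 4.5, 300),
    ("C", "drama", 3.0, 900),
    ("D", "kids", 4.0, 50),
]
print(top_by_genre(movies, 2))
```

There is no personalization here at all, which is the point: it is a baseline that anyone can understand at a glance.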
To me, the Netflix Prize is as much about marketing as it is about the quality of the algorithm: Netflix was ahead of the pack (i.e. Blockbuster) even before, but after spreading the word that it takes rocket scientists to even slightly improve the existing algorithm, who would even think twice about testing a DVD/streaming provider other than Netflix?
That's why they show different rows from different sources. In my country Netflix is not available yet so I'm not sure about what I'm going to say, but I'm quite confident that one of those rows is just the top-rated/popular movies of the site.
Great read! Isn't this a bummer though? -- "On the topic of friends, we recently released our Facebook connect feature in 46 out of the 47 countries we operate – all but the US because of concerns with the VPPA law."
(I wonder to what extent it also constrains recommendations based on user similarity.)
The Video Privacy Protection Act (VPPA) was a bill passed by the United States Congress in 1988. Congress passed the VPPA after Robert Bork's video rental history was published during his Supreme Court nomination. It makes any "video tape service provider" that discloses rental information outside the ordinary course of business liable for up to $2500 in actual damages.
(Covers DVD rentals too, and apparently paid-for streaming content.)
Interesting story. Robert Bork said that the Constitution does not contain a general "right to privacy", which somehow is related to abortion (sometimes I think all American politics is related to Roe v. Wade), and then a journalist just visited Bork's video store and asked for his record.
When I arrived that day I told the assistant manager I was thinking of writing about Judge Bork’s video tastes.
“Cool,” the assistant manager said. “I’ll look.”
While I stewed in a sudden outbreak of conscience – what if Robert Bork only rented homosexual porn…or slasher flicks…or (the…horror…) Disney? – she went upstairs to eyeball the records, returning a little glum.
“There sure are a lot of them,” she said. “Is it okay if I make a Xerox copy?”
Next day, while visiting Jack Shafer, I asked if an article on Judge Bork’s video rentals would interest him.
Recommendation is a good concept if you have things to recommend. Netflix's current catalog is seriously lacking. We were in the middle of watching the TV show Spartacus when it was taken off (due to licensing problems).
Screw netflix -- they stopped trying to even upsell you to dvd rentals if you have instant. So if you search for something, and you have instant, they no longer tell you that you could rent the dvd. It's just plain dumb, but then so was their last 'de-design'.
Also the instant queue is not in the order we added the items in, and seems to be arbitrary. It's just a terrible service with stale content. We'll be cancelling soon.