
How Netflix Reverse Engineered Hollywood - coloneltcb
http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/
======
smsm42
For all the high praise that gets heaped on Netflix for their brilliant
technology, I have a feeling there is some other Netflix that is concealed
from me.

I have been a Netflix customer for years. I thought the idea was brilliant -
super-cheap movies arriving whenever you want, what could be better?! I loved
Netflix. Then I slowly discovered Netflix was running out of movies I want to
watch - to the point where about 95% of the movies I want to see are
unavailable. Then there was that streaming vs. DVD fiasco - and I stayed with
streaming. But then I discovered there's nothing for me to stream. I thought
maybe my tastes are weird - so I went to Wikipedia and IMDB and looked up "top
X movies" - and most of them, of course, can't be watched on Netflix, except
for those few that I've already watched long ago.

And that million dollar recommendation system? I have over 800 ratings, and I
have a hard time remembering the last time their system suggested something
useful to me. In fact, the only reason I am keeping the subscription is that my
wife has some series on her sub-account that she's watching. For me, Netflix
has become almost 100% useless. So I wonder, with all the high praise for their
brilliant data usage and innovative technology - am I doing something wrong?
Am I missing some important part of Netflix that everybody else is seeing?

~~~
coliveira
Yes, you're right. What is making Netflix gain market share are the series.
Basically most people are using Netflix instead of TV, not to get Hollywood
movies. The same pattern happened at my home: there are no good movies to see,
but plenty of activity around TV shows and documentaries. And from a business
point of view this is even better for them, because TV series make people
stick around longer than a single movie.

~~~
ghaff
Right, you basically still need to get the little plastic disks--from Netflix
or elsewhere--if you want to watch quality movies, especially recent ones.
Someone I know there told me a year or two back that people get streaming with
the intention of watching movies but they end up watching TV shows instead.

This doesn't quite apply to me because I quit streaming when they did the
whole splitting of physical media from streaming thing. And signed back up
when House of Cards came out. But I do view streaming today as a source for TV
shows and any movies I want to watch on it are basically lagniappe.

~~~
hkmurakami
The algorithms and operations behind the storage and distribution of those
physical disks are a marvel to behold though. A friend of mine was in Netflix's
DVD operations for a number of years, and it was fascinating to hear what they
had implemented (I can't do it justice). Imagine a bunch of Computer
Scientists implementing sorting and data storage optimizations in the physical
world, and you get the Netflix DVD distribution system.

If you ever meet someone from the DVD division, take him out for some drinks.
You won't regret it :)

------
refurb
A friend of a friend works at Netflix and told me how they use some of this
data.

House of Cards was basically a data driven production. Based on Netflix's
customer preferences, they knew that a political thriller, starring Kevin
Spacey and directed by David Fincher would maximize the number of views based
on the habits of its customers.

It would appear the data was correct!

~~~
thetylerhayes
Is there a post somewhere where they describe this?

~~~
ImpressiveWebs
Good link by yonasb.

But the comment by refurb sounds a bit misleading. As I understand it, they
did not produce this solely because of the data (as if the show was born out
of the data). The idea, the actors, and the production staff were already
chosen before Netflix's involvement. Netflix looked at those things as they
had been proposed and realized it was a good gamble to bid on the show.

~~~
refurb
I don't have the facts to contradict your statement about finding the show
after it had already been made, but wasn't it a "Netflix original production"?
As in, Netflix pitched it to the studios?

I know the script originated from the British series, but don't they usually
only select actors after they have funding? I'm not that familiar with the
process, so I may be wrong.

~~~
ImpressiveWebs
As I understand it, it was indeed a Netflix original, but the plans for who
would be in it, who would direct it, what it was about, etc. were already in
place before Netflix got involved. Again, I'm no expert on the subject either,
but that's how I understand what Wikipedia explains:

"MRC approached different networks about the series, including HBO, Showtime
and AMC, but Netflix, hoping to launch its own original programming, outbid
the other networks. Ted Sarandos, Netflix's Chief Content Officer, looked at
the data of Netflix user's streaming habits and concluded that there was an
audience for Fincher and Spacey."

([http://en.wikipedia.org/wiki/House_of_Cards_%28U.S._TV_serie...](http://en.wikipedia.org/wiki/House_of_Cards_%28U.S._TV_series%29))

See the sources linked in those paragraphs, where those statements are
summarized from.

------
danielharan
Netflix's data allows it not only to recommend movies, but also to finance
original productions.

Lots of businesses want "recommendation engines" to appease their cargo cult
gods, few ask what possibilities their data really creates.

Sometimes data can make you better at delivering your service. Other times you
can optimize inventory, enter entirely new lines of business or even obsolete
your competitors.

~~~
objclxt
So audience data when you're talking about original productions can be a
blessing and a curse. We don't need big data to tell us that the top four
favorite subjects of Netflix users are marriage, royalty, parenthood, and
reunited lovers. We've been telling stories about those things for hundreds -
if not thousands - of years.

The data can also be misleading, because sometimes what audiences want isn't
what's good for dramatic effect. Both Dexter and Homeland suffered from
audiences reacting positively to the main character, which caused Showtime to
stop the writers from making creative decisions that yes, would have alienated
the audience initially, but on the other hand would have made for shows that
would have in their later seasons been better received.

Another example: I'm pretty sure that if you looked at the Netflix data it
would show that people liked to watch movies and TV shows that either had
happy endings or twist endings. The former makes you feel good, the latter
makes you appreciate the writing. What people really _don't_ like are
ambiguous endings. If we based creative decisions purely on audience data a
show like The Sopranos wouldn't have the ending it does. The upshot of all of
this is that many writers end up 'cheating' the data to get their shows made.
Take Orange is the New Black, where the creator Jenji Kohan has gone on record
saying she basically used the main white, engaged protagonist as a trojan
horse to get the show commissioned. What I'm basically trying to say in this
rant is that using data to inform what shows to commission is one thing, but
using it to direct the shows as they progress is another (and something I'm
against).

~~~
graylights
You're right that audience data is going to be full of the obvious. But it's
also good for watching trends, before they go bust. A number of genres (e.g.
superheroes) have cycles of growing, being massively popular, then dying off
for a couple decades.

If you're in the business of making content, you want to join the trend early
and get out before everyone realizes it's a fad. A truly great movie can buck
trends, or even change them, but there are few truly great movies. Or you
could just ignore this all and make the perennial favorites, generic romance
or action movies.

~~~
agibsonccc
Actually their recommendation engine is a little too accurate. It drives as
much as 70% of their views. They have to purposely randomize it. For those
curious how it works: See the brilliant talk from the guy who heads it up
himself (this was a great talk to attend in person)
[http://www.mlconf.com/mlconf-2013/mlconf-2013-agenda/xavier-...](http://www.mlconf.com/mlconf-2013/mlconf-2013-agenda/xavier-amatriain/)

One reference I could find for the statistic from google:
[http://blog.kissmetrics.com/how-netflix-uses-analytics/](http://blog.kissmetrics.com/how-netflix-uses-analytics/)

I don't think they'll have to worry about this anytime soon.
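
To make "purposely randomize" concrete, here is a minimal sketch of one common
way to mix randomness into a ranked list: an epsilon-greedy style swap. This is
an assumption for illustration only; nothing in the talk or the linked post
says this is what Netflix does. The function name, `epsilon`, and the sample
titles are all hypothetical.

```python
import random

def mix_in_exploration(ranked, catalog, k=10, epsilon=0.2, rng=None):
    """Return the top-k ranked titles, with each slot replaced by a random
    non-top-k catalog title with probability epsilon.

    Illustrative epsilon-greedy sketch, not Netflix's actual method.
    Assumes the titles in `ranked` are unique.
    """
    rng = rng or random.Random()
    top = list(ranked[:k])
    in_top = set(top)
    # Candidates for exploration: everything not already in the top-k.
    pool = [title for title in catalog if title not in in_top]
    rng.shuffle(pool)
    result = []
    for title in top:
        if pool and rng.random() < epsilon:
            result.append(pool.pop())  # explore: show something unexpected
        else:
            result.append(title)       # exploit: keep the model's pick
    return result

if __name__ == "__main__":
    ranked = ["House of Cards", "Sherlock", "Breaking Bad"]
    catalog = ranked + ["Perry Mason", "Broadchurch", "Curious George"]
    print(mix_in_exploration(ranked, catalog, k=3, epsilon=0.3,
                             rng=random.Random(0)))
```

With `epsilon=0` the list is the pure ranking; raising it trades some
short-term accuracy for data on titles the model would otherwise never show.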

------
eli
Haven't people gone to jail for scraping a URL and enumerating its possible
values?

~~~
res0nat0r
Yes. When they are knowingly collecting data they shouldn't have access to,
unlike the data in the article.

~~~
eli
Not defending the law (which I think is pretty terrible), but that seems like
an incredibly fine distinction. I doubt Netflix would agree that I'm supposed
to be able to download a list of every possible genre.

~~~
tptacek
It is a fine distinction. The effort in a CFAA case to resolve the distinction
will focus on evidence of intent; of the state of mind of the person
committing the acts.

The CFAA is a very bad law, but it's hard to resolve "authorized" versus
"unauthorized" deterministically without coming up with absurd situations. I
think a better thing to look at is CFAA sentencing and severity, which
currently scale up with the counter in your "for()" loop and quickly produce
dizzying penalties for minor offenses. Taking the piss out of CFAA sentencing
would also reduce the incentive that US Attorneys have to throw the book at
petty offenders.

~~~
TeMPOraL
> _it's hard to resolve "authorized" versus "unauthorized" deterministically
> without coming up with absurd situations_

Like in other cases when computers meet the law, it's often a "colour of bits"
distinction. I recommend this article:

[http://ansuz.sooke.bc.ca/entry/23](http://ansuz.sooke.bc.ca/entry/23)

~~~
Dylan16807
I don't think that's really what is going on here. "Colour of bits" is a
distinction between process and result. The question of authorization comes
down to intent and what fits societal norms of fair dealing. It's a complex
mess and there's no alternative to it, vs. colour of bits where there are two
sane points of view.

------
msg
At the top of the article is a Netflix genre generator. That is worth the
price of admission all by itself.

But then there's a fairly entertaining look into what happened to content at
Netflix after the million dollar challenge.

~~~
clarkmoody

      Rogue-Cop Viral Plague Tearjerkers Based on Real Life Set in the Edwardian Era

~~~
girvo
I'd actually watch that.

~~~
officemonkey
"Sherlock Holmes and the Influenza (zombie) Pandemic of 1918."

------
zheng
What would be really cool is if this list of genres was open-sourced
somewhere. I can see Netflix not wanting that, but it would really save time
for however many hackers read this article and decide they want the same data.

~~~
joshdick
I was just thinking how sad it is that Netflix has all this great data on
movies (as Pandora does for music), and it's locked away from us.

------
shawnc
I find the part at the end about the Perry Mason aspect very interesting, and
actually my favourite part of the article.

And the final sentence feels like the real reason this was posted to HN: "And
sometimes we call that a bug and sometimes we call it a feature."

Edit: Also, the 'Gonzo' genre of Post-Apocalyptic Comedies and Friendship
seems it's got its first one in "This Is The End".

------
mixmastamyk
Meanwhile, their client still can't separate my daughter's kid shows from
mine. It took them several years to implement profiles on iOS and then another
to do it on Android.

Now implemented, "My Top Picks" last night were still dominated by My Little
Pony.

Also would like to choose which shows she can watch, but the client doesn't
support that. </complaints-over> ;)

------
hershel
There's also jinni.com which has a similar system, not limited by UI issues
and that can be used globally. Usually I get great recommendations from them,
and they're fun to play with.

~~~
chrisgd
I signed up for that a year ago and every time I am reminded of its existence,
I kick myself for not using it more often. Especially a time like now when the
TV shows my wife and I used to watch are not on and we are looking for
something new.

Just finished Broadchurch over the holidays, very entertaining for those
interested. BBC murder mystery

~~~
pyre
Broadchurch was aired on ITV[1], not BBC. (And in typical fashion, when
bringing it to the US, it needs to be a remake because US audiences are
apparently too stupid -- according to the television network execs -- to watch
British television without heavy localization -- by Fox Networks in this
case).

[1]
[http://en.wikipedia.org/wiki/ITV_%28TV_network%29](http://en.wikipedia.org/wiki/ITV_%28TV_network%29)

~~~
chrisgd
Already brought to America on BBC America, which is why I said BBC.

------
agibsonccc
For those who are data curious:
[https://gist.github.com/agibsonccc/8230583](https://gist.github.com/agibsonccc/8230583)

I cherry picked this from the source for those who might want the generator. I
"think" that's everything, correct me if I'm wrong there. I didn't really test
it, just took a few seconds to grab what I saw for later.

------
mathattack
I get a kick out of the genre names. My wife and I both use the same account,
and each rate movies ourselves. Whenever something comes out of left field, we
say, "Look at your movies..."

I wonder if Netflix can tell if multiple people are rating movies. Does it
think we are one confused person, or two distinct personalities?

~~~
ryen
Have you tried their multiple profiles feature?
[https://movies.netflix.com/EditProfiles](https://movies.netflix.com/EditProfiles)

~~~
officemonkey
We have multiple profiles now, but before that there were 10 years of Netflix
and five of them dominated by a child who enjoys "Curious George" and "Super
Hero Squad."

It's going to take a while to get back to normal.

~~~
mathattack
Makes me wonder about time decay to account for changing tastes.
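
A rough sketch of what time decay could look like, assuming a simple
exponential half-life on each viewing event. The one-year half-life and the
function names are made up for illustration; I have no idea what Netflix
actually does here.

```python
def decayed_weight(age_days, half_life_days=365.0):
    """Weight of an event that is age_days old; halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def genre_shares(views, half_life_days=365.0):
    """Share of the decayed profile weight held by each genre.

    views: iterable of (genre, age_days) pairs (hypothetical input format).
    Returns a dict mapping genre -> fraction of total weight.
    """
    weights = {}
    for genre, age_days in views:
        w = decayed_weight(age_days, half_life_days)
        weights[genre] = weights.get(genre, 0.0) + w
    total = sum(weights.values())
    if total == 0:
        return {}
    return {genre: w / total for genre, w in weights.items()}

if __name__ == "__main__":
    # Ten kids' shows watched five years ago vs. two thrillers last month.
    views = [("kids", 5 * 365)] * 10 + [("thriller", 30)] * 2
    print(genre_shares(views))
```

Without decay the kids' shows would hold 10 of 12 views; with a one-year
half-life the two recent thrillers carry most of the weight.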

------
arek2
76k micro-genres seems much. For my website
[http://5000best.com/movies/](http://5000best.com/movies/) I created 40 main
genres using IMDb tags together with the 100 million ratings from the Netflix
Prize data (I was 43rd in that competition).

Additionally, earlier I extracted and named 12 new genres (those ones on the
right) from the Netflix ratings alone - I described the process here:
[http://arek-paterek.com/book/predict_sample.pdf](http://arek-paterek.com/book/predict_sample.pdf)

~~~
KaoruAoiShiho
What do you mean much? When working like this the more the better. There's an
opportunity for an open source variant of this technology.

------
discardorama
How is this any different from what Pandora did with music?

~~~
JumpCrisscross
> _The only semi-similar project that I could think of is Pandora's
> once-lauded Music Genome Project, but what's amazing about Netflix is that its
> descriptions of movies are foregrounded. It's not just that Netflix can show
> you things you might like, but that it can tell you what kinds of things
> those are. It is, in its own weird way, a tool for introspection._

I have both Pandora One and Netflix, and am very satisfied with both. When I
read why Pandora served up a particular song I have sometimes gone "hmm," but
not much more.

But I can still remember pausing when Netflix put "critically-acclaimed
cerebral dark thrillers" or "visually-striking foreign sci-fi & fantasy" in
front of me. Not only did it map my preferences to films I hadn't seen, but it
told me something new about my tastes and thus myself.

~~~
swanson
Just an FYI if you didn't know this, but Pandora will show you the same kind
of information if you hover over the song (web) and pick "Why was this track
selected?".

"Based on what you've told us so far, we're playing this track because it
features electronica roots, house influences, danceable beats and vocal
samples."

~~~
nitrogen
Pandora's genome doesn't seem to have enough sub-genes to distinguish between
songs and groups that are noticeably different to the ear, particularly in
obscure sub-genres where the music is similar, but the vocal styles are wildly
different.

------
mbillie1
> There are so many that just loading, copying, and pasting all of them took
> the little script I wrote more than 20 hours.

I want to see that script.

~~~
lstamour
It ran on a netbook and was actually controlling a browser and copying and
pasting. It's not optimized, but it got the job done, so I'll applaud anyway.
Great article, I expect we'll see more like this soon. ;-)

~~~
tonfa
It actually looks pretty wasteful, since it probably needed to load the full
webpage instead of just the html.

~~~
ihsw
It was enough for the author to write a good story on it.

Wastefulness be damned, it got the job done.

------
peter303
I wonder how often there are "new genres", that is, a completely creative movie
that doesn't fit an existing one. Some of those quirky little movies that seem
to win Oscars may be such.

------
jasimq
I wonder if the NSA would find that data useful. You know, to profile Netflix
users.

------
alvare
What about the Perry Mason thing? That's scary shit.

~~~
firebones
Theories:

1) Assuming there's some clustering algorithm in use, it could just be some
tradeoff edge case that isn't worth optimizing away.

2) It could be that one of their reviewers had an overly aggressive and
non-standard genre-tagging approach for Perry Mason and classified a bunch of
shows in such a way that the source data was polluted. This could be something
stupid like most other movies having many reviewers giving higher confidence,
while a large number of Perry Mason shows and DVDs only had a single reviewer
who was, either through a bug or through overweighting, given too much
influence. This seems to be the most likely cause of the bug--some skew or
amplification in the source data.

3) Intentional poisoning of the results, like cartographers putting in bogus
features, or data sellers seeding their data with watermarks, etc.
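
To illustrate theory 2, here is a small sketch of how a lone reviewer can end
up with as much say as a whole panel under a plain average, and how shrinking
toward a prior damps that. This is purely a guess at the failure mode, not
anything known about Netflix's pipeline; the scoring scale, `prior_mean`, and
`prior_strength` are hypothetical.

```python
def naive_tag_score(scores):
    """Plain mean of reviewer scores for one (title, tag) pair."""
    return sum(scores) / len(scores)

def shrunk_tag_score(scores, prior_mean=0.0, prior_strength=3.0):
    """Mean of reviewer scores pulled toward prior_mean.

    prior_strength acts like that many extra 'virtual' reviewers voting
    prior_mean, so titles with few reviewers are trusted less.
    """
    n = len(scores)
    return (sum(scores) + prior_strength * prior_mean) / (n + prior_strength)

if __name__ == "__main__":
    lone_reviewer = [1.0]        # one enthusiastic tagger
    many_reviewers = [1.0] * 9   # nine taggers who agree
    print(naive_tag_score(lone_reviewer), naive_tag_score(many_reviewers))
    print(shrunk_tag_score(lone_reviewer), shrunk_tag_score(many_reviewers))
```

Under the plain mean both cases look equally certain; with shrinkage the
single-reviewer title scores much lower until more reviewers confirm the tag.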

------
afterburner
Royalty is America's second favourite topic?

~~~
smacktoward
Nobody loves monarchs more than someone who has never had to live under one.

------
xacaxulu
Anyone else wish that "Taken 3: Lil Bub" was a real movie?

