
Protecting Netflix Viewing Privacy at Scale - samber
http://techblog.netflix.com/2016/08/protecting-netflix-viewing-privacy-at.html
======
nocarrier
The two AsiaBSD papers linked from the post are good for more detail. I was a
little surprised they had to hack sendfile to do the crypto in the kernel in
order to get the throughput they're used to with http, but the reasons are
explained in the papers.

However, I'm quite surprised Netflix went with Intel's ISA-L library for
AES-GCM, given that Intel's perf gains were so marginal compared to BoringSSL.
I would have gone with the library that had more eyeballs on it, and in
general I'd give Google the edge over Intel at writing solid, secure code.

~~~
drewg123
I'm on the team. A few limited comments:

ISA-L: There will be some more recent results presented next week at IDF:
[http://myeventagenda.com/sessions/0B9F4191-1C29-408A-8B61-65...](http://myeventagenda.com/sessions/0B9F4191-1C29-408A-8B61-65D7520025A8/14/5#sessionID=1362)

Hacking sendfile:

Note that only the bulk encryption for a limited number of ciphers is done in
the kernel. All the TLS setup still happens in the userspace SSL library, so
the kernel part is quite small. It is more like hacking the bulk encryption
into the kernel, not the entire library.
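
To make the two data paths concrete, here is a minimal Python sketch of a userspace read/transform/send loop next to a plain sendfile call. It is an illustration only, not Netflix's code: the `transform` hook is a hypothetical stand-in for bulk encryption, and a local socket pair stands in for a client connection.

```python
import os
import socket
import tempfile


def send_via_userspace(sock, path, transform=lambda b: b, chunk=64 * 1024):
    """Read each chunk into userspace, transform it, then send it.

    `transform` is a hypothetical stand-in for bulk encryption; with a
    userspace TLS library every byte has to pass through a loop like this.
    """
    sent = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            sock.sendall(transform(data))
            sent += len(data)
    return sent


def send_via_sendfile(sock, path):
    """Hand the file to the kernel instead of copying it through userspace.

    socket.sendfile() uses os.sendfile() where available and falls back to
    plain send() otherwise. Stock sendfile cannot transform the bytes, which
    is why the thread talks about doing bulk encryption inside the kernel.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    payload = os.urandom(4096)  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        results = []
        for sender in (send_via_userspace, send_via_sendfile):
            a, b = socket.socketpair()
            with a, b:
                n = sender(a, path)
                a.shutdown(socket.SHUT_WR)
                received = b""
                while True:
                    chunk = b.recv(65536)
                    if not chunk:
                        break
                    received += chunk
                results.append((n, received == payload))
        return results
    finally:
        os.unlink(path)
```

Calling `demo()` sends the same file both ways and reports, for each path, how many bytes were sent and whether the receiver got identical data.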

~~~
mfjordvald
Have you guys ever written anything on how you configure nginx and FreeBSD in
general? Would love to read more about this.

~~~
drewg123
That's a great idea -- I will pass it along.

I'm afraid that most of the interesting "configuration" is to run a patched
kernel (async sendfile vs aio + sendfile, tls sendfile vs read/encrypt/send,
etc). Of course, I work on the kernel, so I'm biased :)

~~~
2trill2spill
> I'm afraid that most of the interesting "configuration" is to run a patched
> kernel (async sendfile vs aio + sendfile, tls sendfile vs read/encrypt/send,
> etc).

Do you have any idea if or when Netflix plans on open sourcing tls sendfile?

~~~
drewg123
It has always been the plan to upstream it. However, the patch is rather
extensive, and it needs quite a lot of cleanup. (for example, making ISA-L
pluggable)

------
nrb
Is protecting user privacy really the primary focus of this initiative? It
seems to be another step in the multi-front battle between Netflix and ISPs:

1) Expose ISPs who limit your bandwidth to the service. 2) Protect against
manipulation/degradation of the video stream quality.

~~~
brianwawok
Does this really solve the ISP identifying the traffic?

I suspect Netflix still has a limited set of source IPs. If you see
Netflix-like steady bandwidth from a known Netflix IP, I'm pretty sure it
would be trivial to throttle.

~~~
samplonius
It is not that Netflix is being accessed, but what on Netflix is being accessed.

Netflix does not publish viewership information. But a large ISP could run DPI
on Netflix traffic to determine what content is being viewed. And since many
ISPs have their own TV product, they would be really interested in that
information themselves.

But if Netflix streams are TLSed, then good luck figuring out Archer from The
Lust of the Dead.

~~~
tveita
> good luck figuring out Archer from The Lust of the Dead.

Not necessarily that hard. TLS won't hide the sizes of the files being
downloaded. You may even be able to estimate the size of each segment as
they're downloaded, which should give you a pretty accurate fingerprint.

You'd need to spend resources to play each movie with different devices and
bandwidths and record the traffic pattern, which raises the bar a fair bit.
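
A rough sketch of what such matching could look like, assuming an observer has already recorded per-title sequences of segment sizes. The catalog entries, the sizes, and the 2% tolerance are all hypothetical values chosen for illustration.

```python
def fingerprint_distance(observed, reference):
    """Mean relative difference between two segment-size sequences."""
    n = min(len(observed), len(reference))
    if n == 0:
        return float("inf")
    return sum(abs(o - r) / r for o, r in zip(observed, reference)) / n


def best_match(observed, catalog, tolerance=0.02):
    """Return the closest title, or None if nothing is within tolerance."""
    title, dist = min(
        ((t, fingerprint_distance(observed, ref)) for t, ref in catalog.items()),
        key=lambda pair: pair[1],
    )
    return title if dist <= tolerance else None


# Hypothetical segment sizes in bytes. Real fingerprints would have to be
# recorded per title, device, and bitrate, as noted above.
CATALOG = {
    "title_a": [512_000, 498_300, 530_120, 610_450],
    "title_b": [512_000, 640_900, 402_770, 590_010],
}

# Sizes as seen on the wire, shifted slightly by framing overhead.
observed = [512_800, 497_900, 531_000, 609_700]
```

Here `best_match(observed, CATALOG)` picks `title_a`, while a sequence that resembles nothing in the catalog returns `None`. The cost is in building the catalog, not in the lookup.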

~~~
angry_octet
The traffic pattern would be more specific to the playback device than the
content. There might be more spread in duration, but even so, television
networks like shows to go for specific times for ease of programming. As for
encoding rates, it is quite possible they use CBR. Even if they use VBR they
may choose different playout sources depending on the consuming devices and
network conditions.

On the whole I doubt you would have a high probability of identifying any
specific show. Even if you are able to cull 75% of the possibilities (i.e. 2
bits of entropy) that still leaves lots of shows (total ~1000 TV series and
~5000 movies by one source), plus all the noise of people pausing/switching,
skipping credits, etc.
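
As a quick check of the arithmetic, assuming the catalog figures quoted in this comment: eliminating a fraction f of equally likely candidates yields log2(1 / (1 - f)) bits, so culling 75% is worth 2 bits, against roughly 12.5 bits needed to single out one title.

```python
import math


def bits_gained(fraction_eliminated):
    """Information gained by ruling out a fraction of equally likely options."""
    return math.log2(1.0 / (1.0 - fraction_eliminated))


catalog_size = 1000 + 5000               # TV series + movies, per the comment
bits_needed = math.log2(catalog_size)    # bits to pin down a single title
remaining = catalog_size * (1 - 0.75)    # candidates left after culling 75%
```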

~~~
repsilat
How big is a show, sent compressed? A gigabyte an episode? I guess the range
is more relevant, say plus or minus ten megs to be conservative. I don't know
how big packets are, maybe 4k is too small? 10M/4k=2.5k, not bad, but not
great if you want to avoid birthday collisions, you'd only get maybe 60
uniques if they're uniformly distributed.
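
The "maybe 60" figure can be checked with the usual birthday calculation. This sketch assumes, as the comment does, 2500 equally likely size buckets (a 10 MB range at 4 kB granularity) and asks how many titles it takes before a shared bucket becomes more likely than not.

```python
def collision_probability(n_items, n_buckets):
    """Probability that at least two of n_items share a bucket (uniform)."""
    if n_items > n_buckets:
        return 1.0
    p_unique = 1.0
    for i in range(n_items):
        p_unique *= (n_buckets - i) / n_buckets
    return 1.0 - p_unique


def items_for_even_odds(n_buckets):
    """Smallest number of items whose collision probability reaches 50%."""
    n = 1
    while collision_probability(n, n_buckets) < 0.5:
        n += 1
    return n


buckets = 10_000_000 // 4_000  # 10 MB range / 4 kB granularity = 2500
```

With these assumptions `items_for_even_odds(buckets)` comes out at 60, which matches the estimate.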

CBR does kill it, though, and "uniformly distributed" is too big an ask.

Off topic, but an interesting thought: you know the HBO intro? With the
static? That static is the hardest thing in the world to compress, and also
the thing that viewers care the least about having compressed accurately.
That's weird, I wonder how true it is across the board -- certainly artifacts
can be jarring in flat shaded cartoons...

~~~
angry_octet
900MB for 720p for an episode of TV drama seems typical, but I haven't run
wireshark on my Netflix yet.

For DSL typical packet MTU is 1400-something. Why do you ask? It doesn't
really matter because the upper level proto is oblivious. You can just use b/s
instead of packets/s if you want to compare bitrates.

There are special encoding modes in H.264/H.265 for handling animation. I'm
not aware of there being modes for the intro, except that it is monochromatic.

~~~
repsilat
> For DSL typical packet MTU is 1400-something. Why do you ask?

I was wondering how finely you can get the total filesize if it's encrypted.
Can you just count fixed size packets, or can you figure the length down to
the byte? Makes a big difference if you're trying to use that to fingerprint
the shows.

~~~
angry_octet
You can reassemble the https stream and get the exact length. Unless they do
something weird like multiplexing, or change the playout rate, or the user
pauses, it should be exact.
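
A sketch of why the length is recoverable, assuming TLS 1.2 with an AES-GCM cipher suite: each record has a 5-byte cleartext header whose length field covers an 8-byte explicit nonce, the ciphertext (same size as the plaintext), and a 16-byte tag. The `record_lengths_for` helper is a hypothetical sender used only to generate test input. The total also includes HTTP headers, so it is the response size and not just the media payload.

```python
TLS_HEADER = 5          # content type (1) + version (2) + length (2)
GCM_EXPLICIT_NONCE = 8  # sent in the clear at the start of each record
GCM_TAG = 16            # authentication tag appended to each record


def plaintext_length(record_lengths):
    """Sum plaintext bytes from the length fields of observed records."""
    overhead = GCM_EXPLICIT_NONCE + GCM_TAG
    return sum(length - overhead for length in record_lengths)


def wire_bytes(record_lengths):
    """Total bytes on the wire, including each record header."""
    return sum(TLS_HEADER + length for length in record_lengths)


def record_lengths_for(plaintext_bytes, max_fragment=16384):
    """Hypothetical sender that fills records up to the 16 KiB limit."""
    lengths = []
    remaining = plaintext_bytes
    while remaining > 0:
        fragment = min(remaining, max_fragment)
        lengths.append(fragment + GCM_EXPLICIT_NONCE + GCM_TAG)
        remaining -= fragment
    return lengths
```

For a 1,000,000-byte response this gives 62 records, and subtracting 24 bytes per record from the observed length fields recovers exactly 1,000,000.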

------
hm8
This is amazing. Maybe the paper has details on the following questions:

1. Is the data being encrypted on the go, meaning it is encrypted as needed,
probably with the logged-in user's shared key? That would explain the need for
running sendfile on every video traffic packet.

2. How would CDN caching work with this?

~~~
bertrandmt
1. The data is encrypted with the established TLS session key for the current
downloading session, i.e. it is indeed a per-instance key. Therefore, there is
no choice but to encrypt on the go.

2. The CDN cache in fact is the place where this encryption (for the purpose
of TLS) takes place. It therefore does not interfere with the CDN function.

------
0x0
DRM is briefly mentioned, but there is no explanation for why TLS is necessary
on top of that. Is the DRM crypto cracked? Or, as they hint, are DRM keys the
same for all users ("pre-encoded") and thus identifiable?

~~~
angry_octet
It is pre-encoded, with a unique stream key (like a session key). The stream
key is then encrypted with the public key of each playback manufacturer
(multiple copies of the key, one per manufacturer) in DVD CSS. A more
complicated scheme is used for Bluray which provides for unique keys for each
player but without significant overhead (see
[http://www.wisdom.weizmann.ac.il/~naor/PAPERS/2nl.html](http://www.wisdom.weizmann.ac.il/~naor/PAPERS/2nl.html)).

------
Animats
"Protecting viewing privacy". Yeah, right. Netflix successfully lobbied
Congress to be exempted from the Video Privacy Protection Act [1], so they
could monetize viewing information.

[1] [http://money.cnn.com/2013/01/10/technology/social/netflix-vp...](http://money.cnn.com/2013/01/10/technology/social/netflix-vppa-facebook/)

~~~
LeifCarrotson
Perhaps there were nefarious purposes behind that lobbying.

But the stated purpose was more benign: they couldn't add "share" buttons that
would let you one-click-post "I'm watching House of Cards on Netflix" because
they were supposed to have written consent to release that information.

I am OK with Netflix getting permission to build useful features. And I am
absolutely OK with Netflix using my anonymized viewing data to, for example,
build better content predictors to make my experience with their service
better.

The law protected judges with sketchy video rentals from nosy journalists. It
has almost nothing to do with protecting Netflix viewers from ISP and
government surveillance. This measure deals with the latter.

~~~
pc86
Playing devil's advocate but couldn't that written consent easily be included
in the TOS/TOU? Not sure why they would need a legislative exemption.

~~~
samplonius
Don't laws override the TOS?

Can you add an item to the ToS that says: Vendor is allowed to kill customer
at any time during contract term?

Maybe that is a bad example, since assisted suicide is basically that.

------
mediaserf
The place to start for Netflix would be some real account recovery. If your
account is compromised and the email address changed, they set you up with a
new one instead of recovering the existing one. They also don't credit you for
any time remaining on your previous subscription.

