Protecting Netflix Viewing Privacy at Scale

nocarrier · on Aug 8, 2016

The two AsiaBSD papers linked from the post are good for more detail. I was a little surprised they had to hack sendfile to do the crypto in the kernel in order to get the throughput they're used to with http, but the reasons are explained in the papers.

However, I'm quite surprised Netflix went with Intel's ISA-L library for AES-GCM given that Intel's perf gains were so marginal compared to BoringSSL. I would have gone with the library that had more eyeballs on it, and in general I'd give Google the edge over Intel at writing solid, secure code.

drewg123 · on Aug 8, 2016

I'm on the team. A few limited comments:

ISA-L: There will be some more recent results presented next week at IDF: http://myeventagenda.com/sessions/0B9F4191-1C29-408A-8B61-65...

Hacking sendfile:

Note that only the bulk encryption for a limited number of ciphers is done in the kernel. All the TLS setup still happens in the userspace SSL library. So the kernel part is quite small. So it is more like hacking the bulk encryption into the kernel, not the entire library.

mfjordvald · on Aug 8, 2016

Have you guys ever written anything on how you configure nginx and FreeBSD in general? Would love to read more about this.

drewg123 · on Aug 8, 2016

That's a great idea -- I will pass it along.

I'm afraid that most of the interesting "configuration" is to run a patched kernel (async sendfile vs aio + sendfile, tls sendfile vs read/encrypt/send, etc). Of course, I work on the kernel, so I'm biased :)
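
To make the "sendfile vs read/encrypt/send" contrast concrete, here is a minimal sketch of the two data paths over plain sockets. It is not Netflix's patched FreeBSD sendfile: it uses Python's standard `socket.sendfile()`, it does no encryption at all, and the function names and the 4 KB test payload are made up for illustration. The point is only where the file bytes travel.

```python
import os
import socket
import tempfile


def send_with_copy(sock, path, chunk_size=64 * 1024):
    """Userspace path: read each chunk into a buffer, then write it out.

    A userspace TLS server would encrypt `chunk` at this point before sending.
    """
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


def send_with_sendfile(sock, path):
    """Kernel path: socket.sendfile() uses os.sendfile() where available,
    so the file data does not have to pass through a userspace buffer."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    """Send the same small file both ways and check what arrives."""
    payload = os.urandom(4096)  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        results = []
        for sender in (send_with_copy, send_with_sendfile):
            a, b = socket.socketpair()
            with a, b:
                n = sender(a, path)
                a.shutdown(socket.SHUT_WR)
                received = b""
                while True:
                    data = b.recv(65536)
                    if not data:
                        break
                    received += data
                results.append((n, received == payload))
        return results
    finally:
        os.unlink(path)
```

Once TLS is involved, the copy path is the only one available to an unpatched server, because the bytes must pass through the userspace SSL library to be encrypted. That is the cost the in-kernel bulk encryption is meant to avoid.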

2trill2spill · on Aug 8, 2016

> I'm afraid that most of the interesting "configuration" is to run a patched kernel (async sendfile vs aio + sendfile, tls sendfile vs read/encrypt/send, etc).

Do you have any idea if or when Netflix plans on open sourcing tls sendfile?

drewg123 · on Aug 9, 2016

It has always been the plan to upstream it. However, the patch is rather extensive, and it needs quite a lot of cleanup. (for example, making ISA-L pluggable)

nrb · on Aug 8, 2016

Is protecting user privacy really the primary focus of this initiative? It seems to be another step in the multi-front battle between Netflix and ISPs:

1) Expose ISPs who limit your bandwidth to the service. 2) Protect against manipulation/degradation of the video stream quality.

fred256 · on Aug 8, 2016

There may be another angle to this: Netflix does not publish viewership ratings of its shows, but there have been a few attempts by various companies to infer the ratings by traffic analysis. Encrypting the traffic makes it harder to do so.

brianwawok · on Aug 8, 2016

Does this really solve the ISP identifying the traffic?

I suspect Netflix still has a limited set of source IPs. If you see Netflix-like steady bandwidth from a known Netflix IP, I'm pretty sure it would be trivial to throttle.

samplonius · on Aug 8, 2016

It is not that Netflix is being accessed, but what on Netflix is being accessed.

Netflix does not publish viewership information. But a large ISP could run DPI on Netflix traffic to determine what content is being viewed. And since many ISPs have their own TV product, they would be really interested in that information themselves.

But if Netflix streams are TLSed, then good luck figuring out Archer from The Lust of the Dead.

tveita · on Aug 8, 2016

> good luck figuring out Archer from The Lust of the Dead.

Not necessarily that hard. TLS won't hide the sizes of the files being downloaded. You may even be able to estimate the size of each segment as they're downloaded, which should give you a pretty accurate fingerprint.

You'd need to spend resources to play each movie with different devices and bandwidths and record the traffic pattern, which raises the bar a fair bit.
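
A rough sketch of what such fingerprint matching could look like, assuming the observer has already built a catalog of per-title segment sizes. The function name, the catalog layout, and the 1% tolerance are all hypothetical; nothing here reflects how Netflix actually segments its streams.

```python
def match_title(observed, catalog, tolerance=0.01):
    """Return titles whose segment-size sequence matches `observed`.

    observed:  list of estimated segment sizes in bytes
    catalog:   dict mapping title -> list of known segment sizes in bytes
    tolerance: allowed relative error per segment (covers TLS/HTTP overhead)
    """
    matches = []
    for title, sizes in catalog.items():
        if len(sizes) < len(observed):
            continue
        # Slide the observed window over the known sequence, since the
        # viewer may have started partway through the title.
        for start in range(len(sizes) - len(observed) + 1):
            window = sizes[start:start + len(observed)]
            if all(abs(o - s) <= tolerance * s for o, s in zip(observed, window)):
                matches.append(title)
                break
    return matches
```

The longer the observed run of segments, the fewer titles survive the match, which is why per-segment sizes make a better fingerprint than the total file size alone.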

angry_octet · on Aug 9, 2016

The traffic pattern would be more specific to the playback device than the content. There might be more spread in duration, but even so, television networks like shows to go for specific times for ease of programming. As for encoding rates, it is quite possible they use CBR. Even if they use VBR they may choose different playout sources depending on the consuming devices and network conditions.

On the whole I doubt you would have a high probability of identifying any specific show. Even if you are able to cull 75% of the possibilities (i.e. 2 bits of entropy) that still leaves lots of shows (total ~1000 TV series and ~5000 movies by one source), plus all the noise of people pausing/switching, skipping credits, etc.
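
A quick check of that arithmetic, as a sketch: it takes the ~6000-title catalog figure quoted above and assumes every title is equally likely. The helper names are made up for illustration.

```python
import math


def bits_gained(fraction_remaining):
    """Information gained by narrowing the candidates to this fraction."""
    return -math.log2(fraction_remaining)


def candidates_left(total, fraction_remaining):
    """Number of titles still possible after the cull."""
    return total * fraction_remaining


catalog_size = 1000 + 5000                      # TV series + movies, as quoted above
print(bits_gained(0.25))                        # bits learned by culling 75%
print(candidates_left(catalog_size, 0.25))      # titles still in play
print(math.log2(catalog_size))                  # bits needed to pin down one title
```

Culling 75% leaves a quarter of the catalog, which is 2 bits of information and 1500 remaining titles. Singling out one title among 6000 takes about 12.6 bits.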

repsilat · on Aug 9, 2016

How big is a show, sent compressed? A gigabyte an episode? I guess the range is more relevant, say plus or minus ten megs to be conservative. I don't know how big packets are, maybe 4k is too small? 10M/4k=2.5k, not bad, but not great if you want to avoid birthday collisions, you'd only get maybe 60 uniques if they're uniformly distributed.

CBR does kill it, though, and "uniformly distributed" is too big an ask.
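
The birthday estimate can be checked with a few lines. This is a sketch under the same assumptions as above: 10 MB / 4 kB = 2500 equally likely size buckets, and "uniques" read as the number of titles you can have before a collision becomes more likely than not.

```python
def collision_probability(n_items, n_buckets):
    """Probability that at least two of n_items land in the same bucket,
    assuming each item picks a bucket uniformly at random."""
    p_unique = 1.0
    for i in range(n_items):
        p_unique *= (n_buckets - i) / n_buckets
    return 1.0 - p_unique


def items_for_even_odds(n_buckets):
    """Smallest number of items for which a collision is at least 50% likely."""
    n = 0
    while collision_probability(n, n_buckets) < 0.5:
        n += 1
    return n


buckets = 10_000_000 // 4_000   # 2500 distinguishable sizes
print(items_for_even_odds(buckets))
```

For 2500 buckets this comes out at 60, in line with the "maybe 60" guess.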

Off topic, but an interesting thought: you know the HBO intro? With the static? That static is the hardest thing in the world to compress, and also the thing that viewers care the least about having compressed accurately. That's weird, I wonder how true it is across the board -- certainly artifacts can be jarring in flat shaded cartoons...

angry_octet · on Aug 9, 2016

900MB for 720p for an episode of TV drama seems typical, but I haven't run wireshark on my Netflix yet.

For DSL typical packet MTU is 1400-something. Why do you ask? It doesn't really matter because the upper level proto is oblivious. You can just use b/s instead of packets/s if you want to compare bitrates.

There are special encoding modes in H.264/H.265 for handling animation. I'm not aware of there being modes for the intro, except that it is monochromatic.

repsilat · on Aug 10, 2016

> For DSL typical packet MTU is 1400-something. Why do you ask?

I was wondering how finely you can get the total filesize if it's encrypted. Can you just count fixed size packets, or can you figure the length down to the byte? Makes a big difference if you're trying to use that to fingerprint the shows.

angry_octet · on Aug 14, 2016

You can reassemble the https stream and get the exact length. Unless they do something weird like multiplexing, or change the playout rate, or the user pauses, it should be exact.
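
As a sketch of how that works: TLS record headers are sent in the clear, so an observer with the reassembled TCP stream can read each record's length and subtract the fixed per-record overhead. The code below assumes TLS 1.2 with an AES-GCM cipher suite (8-byte explicit nonce plus 16-byte tag per record); other versions and cipher suites have different overheads. The helper names are hypothetical, and `fake_record` only builds dummy data for the demonstration.

```python
TLS_HEADER_LEN = 5        # content type (1) + version (2) + length (2)
GCM_EXPLICIT_NONCE = 8    # per-record explicit nonce in TLS 1.2 AES-GCM
GCM_TAG_LEN = 16          # authentication tag
APPLICATION_DATA = 23     # TLS content type for application data


def iter_records(stream):
    """Yield (content_type, length) for each TLS record header in `stream`."""
    offset = 0
    while offset + TLS_HEADER_LEN <= len(stream):
        content_type = stream[offset]
        length = int.from_bytes(stream[offset + 3:offset + 5], "big")
        yield content_type, length
        offset += TLS_HEADER_LEN + length


def plaintext_size(stream):
    """Total plaintext bytes carried in application-data records."""
    overhead = GCM_EXPLICIT_NONCE + GCM_TAG_LEN
    return sum(
        length - overhead
        for content_type, length in iter_records(stream)
        if content_type == APPLICATION_DATA
    )


def fake_record(plaintext_len, content_type=APPLICATION_DATA):
    """Build a dummy TLS 1.2 record (zero bytes stand in for ciphertext)."""
    body_len = GCM_EXPLICIT_NONCE + plaintext_len + GCM_TAG_LEN
    header = bytes([content_type, 3, 3]) + body_len.to_bytes(2, "big")
    return header + bytes(body_len)
```

The total includes HTTP response headers as well as the media bytes, so the recovered figure is the file size plus a small, fairly predictable offset.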

ejcx · on Aug 8, 2016

No, it would not; you're right, but there's more to it.

For most ISPs, it's just easier to get an OCA in their data center than to go through the process of throttling traffic or whatever else they want to do with it.

TLS SNI is still a way that an ISP could identify the traffic when it's routing folks on the net to an OCA in a different AS.

CaptSpify · on Aug 8, 2016

Aren't there ISPs using other traffic-shaping techniques than simply throttling? I'm guessing this is meant to also impede those.

kakarot · on Aug 8, 2016

I guess it solves both problems, so it's probably a smart move.

hm8 · on Aug 9, 2016

This is amazing. Maybe the paper has details on the following questions: 1. Is the data being encrypted on the go, meaning it is encrypted as needed, probably with the logged-in user's shared key? That would explain the need for running sendfile on every video traffic packet. 2. How would CDN caching work with this?

bertrandmt · on Aug 19, 2016

1. the data is encrypted with the established TLS session key for the current downloading session, i.e. it is indeed a per instance key. Therefore, there is no choice but to encrypt on the go.

2. the CDN cache in fact is the place where this encryption (for the purpose of TLS) takes place. It therefore does not interfere with the CDN function.
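
To illustrate point 1, here is a toy sketch of why a per-session key rules out encrypting ahead of time. It does not use AES-GCM or any real TLS machinery: the "cipher" is a stand-in keystream built from SHA-256 in counter mode, chosen only because it needs nothing beyond the standard library, and all names and keys are hypothetical. It is not secure and should not be used for real encryption.

```python
import hashlib


def toy_keystream(key, nonce, length):
    """Derive `length` pseudo-random bytes from key and nonce (toy only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key, nonce, data):
    """XOR data with the keystream; applying it twice restores the input."""
    keystream = toy_keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


chunk = b"same video bytes" * 64      # identical cached content for every viewer
nonce = b"\x00" * 12
session_key_a = b"\x01" * 32          # stands in for one viewer's session key
session_key_b = b"\x02" * 32          # stands in for another viewer's session key

wire_a = toy_encrypt(session_key_a, nonce, chunk)
wire_b = toy_encrypt(session_key_b, nonce, chunk)
```

The same cached chunk produces different bytes on the wire for each session, so the cache can store only the plaintext (DRM-protected) file and has to encrypt at send time.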

0x0 · on Aug 8, 2016

DRM is briefly mentioned, but with no explanation of why TLS is necessary on top of that. Is the DRM crypto cracked? Or, as they hint, are DRM keys the same for all users ("pre-encoded") and thus identifiable?

angry_octet · on Aug 9, 2016

It is pre-encoded, with a unique stream key (like a session key). The stream key is then encrypted with the public key of each playback manufacturer (multiple copies of the key, one per manufacturer) in DVD CSS. A more complicated scheme is used for Blu-ray which provides for unique keys for each player but without significant overhead (see http://www.wisdom.weizmann.ac.il/~naor/PAPERS/2nl.html).

davb · on Aug 8, 2016

I presume the DRM encrypts only the video data, not metadata (video titles, etc).

Animats · on Aug 8, 2016

"Protecting viewing privacy". Yeah, right. Netflix successfully lobbied Congress to be exempted from the Video Privacy Protection Act [1], so they could monetize viewing information.

[1] http://money.cnn.com/2013/01/10/technology/social/netflix-vp...

LeifCarrotson · on Aug 8, 2016

Perhaps there were nefarious purposes behind that lobbying.

But the stated purpose was more benign: they couldn't add "share" buttons that would let you one-click-post "I'm watching House of Cards on Netflix" because they were supposed to have written consent to release that information.

I am OK with Netflix getting permission to build useful features. And I am absolutely OK with Netflix using my anonymized viewing data to, for example, build better content predictors to make my experience with their service better.

The law protected judges with sketchy video rentals from nosy journalists. It has almost nothing to do with protecting Netflix viewers from ISP and government surveillance. This measure deals with the latter.

CaptSpify · on Aug 8, 2016

You may be OK with them using that data, but not everyone is. I don't really see a downside to it being optional for the consumer.

LeifCarrotson · on Aug 8, 2016

The law said that even "Optional" meant they needed individual written permission. As I recall, they interpreted that to mean a check box or button click was insufficient.

A potential downside is that my predictions are less effective, because I don't get to find out that, since you and I both liked Monk and Pink Panther, I would probably also like Psych, or another show you liked.

If this required mailing in a letter to Netflix so they could use your data, it would never work.

I would be fine with it being opt-out, or even opt-in. I would be curious what the response would be if you had to opt in to get useful recommendations.

pc86 · on Aug 8, 2016

Playing devil's advocate, but couldn't that written consent easily be included in the TOS/TOU? Not sure why they would need a legislative exemption.

samplonius · on Aug 8, 2016

Don't laws override TOS?

Can you add an item to the ToS that says: Vendor is allowed to kill customer at any time during contract term?

Maybe that is a bad example, since assisted suicide is basically that.

cpmsmith · on Aug 8, 2016

TOS agreements are looking pretty shaky in terms of enforceability lately, so if I were them I wouldn't want to rely on that alone.

mediaserf · on Aug 8, 2016

The place to start for Netflix would be some real account recovery. If your account is compromised and the email address changed, they set you up with a new one instead of recovering the existing one. They also don't credit you for any time remaining on your previous subscription.