The two AsiaBSD papers linked from the post are good for more detail. I was a little surprised they had to hack sendfile to do the crypto in the kernel in order to get the throughput they're used to with http, but the reasons are explained in the papers.
However, I'm quite surprised Netflix went with Intel's ISA-L library for AES-GCM given that Intel's perf gains were so marginal compared to BoringSSL. I would have gone with the library that had more eyeballs on it, and in general I'd give Google the edge over Intel when it comes to writing solid, secure code.
Note that only the bulk encryption for a limited number of ciphers is done in the kernel. All the TLS setup still happens in the userspace SSL library. So the kernel part is quite small. So it is more like hacking the bulk encryption into the kernel, not the entire library.
I'm afraid that most of the interesting "configuration" is to run a patched kernel (async sendfile vs aio + sendfile, tls sendfile vs read/encrypt/send, etc). Of course, I work on the kernel, so I'm biased :)
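For anyone unfamiliar with the distinction, here is a rough sketch of the plaintext version of the two paths: a userspace read/send loop versus handing the file to sendfile. It uses Python's `socket.sendfile`, which calls `os.sendfile` where the platform supports it and falls back to plain sends otherwise. The helper names, the 8 KB payload, and the socketpair are illustrative assumptions; the TLS variant discussed in this thread needs the patched kernel and is not shown.

```python
import os
import socket
import tempfile


def send_with_read_loop(sock, path, chunk=64 * 1024):
    """Userspace path: copy file data into a buffer, then write it to the socket."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            sock.sendall(data)
            sent += len(data)
    return sent


def send_with_sendfile(sock, path):
    """sendfile path: ask the OS to move file data to the socket directly."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def recv_exact(sock, n):
    """Read exactly n bytes (or until the peer closes)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


# Hypothetical demo data: a small random file sent over a local socket pair.
payload = os.urandom(8192)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

a, b = socket.socketpair()
try:
    n1 = send_with_read_loop(a, path)
    got1 = recv_exact(b, n1)
    n2 = send_with_sendfile(a, path)
    got2 = recv_exact(b, n2)
finally:
    a.close()
    b.close()
    os.unlink(path)
```

Both paths deliver the same bytes; the difference is where the copying happens, which is why encrypting in userspace forces you back onto the first path unless the kernel can do the bulk crypto itself.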
> I'm afraid that most of the interesting "configuration" is to run a patched kernel (async sendfile vs aio + sendfile, tls sendfile vs read/encrypt/send, etc).
Do you have any idea if or when Netflix plans on open sourcing tls sendfile?
It has always been the plan to upstream it. However, the patch is rather extensive, and it needs quite a lot of cleanup. (for example, making ISA-L pluggable)
Is protecting user privacy really the primary focus of this initiative? It seems to be another step in the multi-front battle between Netflix and ISPs:
1) Expose ISPs who limit your bandwidth to the service.
2) Protect against manipulation/degradation of the video stream quality.
There may be another angle to this: Netflix does not publish viewership ratings of its shows, but there have been a few attempts by various companies to infer the ratings by traffic analysis. Encrypting the traffic makes it harder to do so.
Does this really solve the ISP identifying the traffic?
I suspect Netflix still has a limited set of source IPs. If you see Netflix-like steady bandwidth from a known Netflix IP, I'm pretty sure it would be trivial to throttle.
It is not that Netflix is being accessed, but what on Netflix is being accessed.
Netflix does not publish viewership information. But a large ISP could run DPI on Netflix traffic to determine what content is being viewed. And since many ISPs have their own TV product, they would be really interested in that information themselves.
But if Netflix streams are TLSed, then good luck figuring out Archer from The Lust of the Dead.
> good luck figuring out Archer from The Lust of the Dead.
Not necessarily that hard. TLS won't hide the sizes of the files being downloaded. You may even be able to estimate the size of each segment as they're downloaded, which should give you a pretty accurate fingerprint.
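A toy sketch of what that fingerprinting could look like, assuming an observer has already recorded per-segment sizes for each title. The catalog, the titles, and all the byte counts here are made up for illustration; a real attempt would have to deal with multiple bitrates, devices, and missing segments.

```python
def distance(observed, reference):
    """Sum of absolute size differences; infinite if segment counts differ."""
    if len(observed) != len(reference):
        return float("inf")
    return sum(abs(o - r) for o, r in zip(observed, reference))


def best_match(observed, catalog):
    """Return the catalog title whose segment sizes are closest to the observation."""
    return min(catalog, key=lambda title: distance(observed, catalog[title]))


# Hypothetical reference fingerprints: segment sizes in bytes per title.
catalog = {
    "title_a": [512_300, 498_120, 530_045, 471_900],
    "title_b": [505_010, 520_777, 489_300, 515_620],
}

# Hypothetical observation: title_a's sizes plus a little per-segment overhead.
observed = [512_340, 498_150, 530_090, 471_930]

match = best_match(observed, catalog)
print(match)
```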
You'd need to spend resources to play each movie with different devices and bandwidths and record the traffic pattern, which raises the bar a fair bit.
The traffic pattern would be more specific to the playback device than to the content. There might be more spread in duration, but even so, television networks like shows to run for specific lengths for ease of programming. As for encoding rates, it is quite possible they use CBR. Even if they use VBR, they may choose different playout sources depending on the consuming devices and network conditions.
On the whole I doubt you would have a high probability of identifying any specific show. Even if you are able to cull 75% of the possibilities (i.e. 2 bits of entropy), that still leaves lots of shows (a total of ~1000 TV series and ~5000 movies by one source), plus all the noise of people pausing/switching, skipping credits, etc.
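To put numbers on that, a quick calculation using the catalog figures quoted above (~1000 series plus ~5000 movies, which are the commenter's estimates, not verified counts). Culling 75% of candidates leaves one in four, which is log2(4) = 2 bits of information gained.

```python
import math

catalog = 1000 + 5000                 # series + movies, figures quoted above
bits_total = math.log2(catalog)       # bits needed to single out one title

remaining = catalog * 0.25            # candidates left after culling 75%
bits_gained = math.log2(catalog / remaining)
bits_left = math.log2(remaining)      # bits still needed after the cull

print(f"total: {bits_total:.2f} bits, gained: {bits_gained:.2f}, "
      f"left: {bits_left:.2f} ({remaining:.0f} candidates)")
```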
How big is a show, sent compressed? A gigabyte an episode? I guess the range is more relevant, say plus or minus ten megs to be conservative. I don't know how big packets are, maybe 4k is too small? 10M/4k=2.5k, not bad, but not great if you want to avoid birthday collisions: you'd only get maybe 60 uniques if they're uniformly distributed.
CBR does kill it, though, and "uniformly distributed" is too big an ask.
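The "maybe 60" figure can be checked with the usual birthday approximation. This sketch takes the numbers above at face value (a 10 MB spread measured at 4 KB granularity, sizes uniformly distributed), all of which are back-of-envelope assumptions rather than measurements.

```python
import math


def birthday_threshold(buckets, p=0.5):
    """Approximate number of items at which a collision has probability p."""
    return math.sqrt(2 * buckets * math.log(1 / (1 - p)))


def collision_probability(items, buckets):
    """Exact probability that at least two of `items` land in the same bucket."""
    p_unique = 1.0
    for i in range(items):
        p_unique *= (buckets - i) / buckets
    return 1 - p_unique


buckets = 10_000_000 // 4_000          # 10M / 4k distinguishable sizes
threshold = birthday_threshold(buckets)
p_60 = collision_probability(60, buckets)

print(f"{buckets} buckets, ~{threshold:.0f} titles for a 50% collision chance")
print(f"collision probability with 60 titles: {p_60:.2f}")
```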
Off topic, but an interesting thought: you know the HBO intro? With the static? That static is the hardest thing in the world to compress, and also the thing that viewers care the least about having compressed accurately. That's weird, I wonder how true it is across the board -- certainly artifacts can be jarring in flat shaded cartoons...
900MB for 720p for an episode of TV drama seems typical, but I haven't run wireshark on my Netflix yet.
For DSL typical packet MTU is 1400-something. Why do you ask? It doesn't really matter because the upper level proto is oblivious. You can just use b/s instead of packets/s if you want to compare bitrates.
There are special encoding modes in H.264/H.265 for handling animation. I'm not aware of there being modes for the intro, except that it is monochromatic.
> For DSL typical packet MTU is 1400-something. Why do you ask?
I was wondering how finely you can get the total filesize if it's encrypted. Can you just count fixed size packets, or can you figure the length down to the byte? Makes a big difference if you're trying to use that to fingerprint the shows.
You can reassemble the https stream and get the exact length. Unless they do something weird like multiplexing, or change the playout rate, or the user pauses, it should be exact.
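As a sketch of how that works, assuming TLS 1.2 with an AES-GCM cipher suite: each application-data record's length field covers an 8-byte explicit nonce, the ciphertext, and a 16-byte tag, so the plaintext size falls out by subtraction. The record lengths below are invented, and other protocol versions or cipher suites have different overheads (TLS 1.3, for instance, drops the explicit nonce and can add padding).

```python
EXPLICIT_NONCE = 8   # bytes per record, TLS 1.2 AES-GCM
AUTH_TAG = 16        # bytes per record, GCM authentication tag


def plaintext_length(record_lengths):
    """Sum plaintext bytes given the length field of each TLS record."""
    overhead = EXPLICIT_NONCE + AUTH_TAG
    return sum(length - overhead for length in record_lengths)


# Hypothetical capture: two full 16384-byte records and one 1000-byte record.
records = [16408, 16408, 1024]
total = plaintext_length(records)
print(total)
```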
No, it would not; you're right, but there's more to it.
For most ISPs, it's just easier to get an OCA in their data center than to go through the process of throttling traffic or whatever else they want to do with it.
TLS SNI is still a way that an ISP could identify the traffic when it's routing folks on the net to an OCA in a different AS.
This is amazing.
Maybe the paper has details on the following questions:
1. Is the data being encrypted on the go, meaning it is encrypted as needed, probably with the logged-in user's shared key? That would explain the need for running sendfile on every video traffic packet.
2. How would CDN caching work with this?
1. the data is encrypted with the established TLS session key for the current downloading session, i.e. it is indeed a per instance key. Therefore, there is no choice but to encrypt on the go.
2. the CDN cache in fact is the place where this encryption (for the purpose of TLS) takes place. It therefore does not interfere with the CDN function.
DRM is briefly mentioned, but with no explanation of why TLS is necessary on top of that. Is the DRM crypto cracked? Or, as they hint, are DRM keys the same for all users ("pre-encoded") and thus identifiable?
It is pre-encoded, with a unique stream key (like a session key). The stream key is then encrypted with the public key of each playback manufacturer (multiple copies of the key, one per manufacturer) in DVD CSS. A more complicated scheme is used for Bluray which provides for unique keys for each player but without significant overhead (see http://www.wisdom.weizmann.ac.il/~naor/PAPERS/2nl.html).
"Protecting viewing privacy". Yeah, right. Netflix successfully lobbied Congress to be exempted from the Video Privacy Protection Act [1], so they could monetize viewing information.
Perhaps there were nefarious purposes behind that lobbying.
But the stated purpose was more benign: they couldn't add "share" buttons that would let you one-click-post "I'm watching House of Cards on Netflix" because they were supposed to have written consent to release that information.
I am OK with Netflix getting permission to build useful features. And I am absolutely OK with Netflix using my anonymized viewing data to, for example, build better content predictors to make my experience with their service better.
The law protected judges with sketchy video rentals from nosy journalists. It has almost nothing to do with protecting Netflix viewers from ISP and government surveillance. This measure deals with the latter.
The law said that even "Optional" meant they needed individual written permission. As I recall, they interpreted that to mean a check box or button click was insufficient.
A potential downside is that my predictions are less effective because I don't get to find out that, because you and I both liked Monk and Pink Panther, I would probably also like Psych, or another show you liked.
If this required mailing in a letter to Netflix so they could use your data, it would never work.
I would be fine with it being opt-out, or even opt-in. I would be curious what the response would be if you had to opt in to get useful recommendations.
The place to start for Netflix would be some real account recovery. If your account is compromised and the email address changed, they set you up with a new one instead of recovering the existing one. They also don't credit you for any time remaining on your previous subscription.