Netflix can serve up to 40Gbps of traffic per server in cleartext HTTP. Their initial test of adding TLS cut performance by roughly 3x, which has a huge cost impact on their infrastructure. This is because sendfile(), a kernel syscall used by nginx, cannot be used when the data has to be encrypted by TLS in userland.
Netflix really needs sendfile() to work with TLS to enable HTTPS on all of its traffic, so they added some limited crypto support to the BSD kernel. What that paper really demonstrates is the need for a TLS stack in the kernel that can let userspace daemons benefit from low-level optimizations. I, for one, would love to see something like this added to the Linux kernel!
This is great work from Netflix. They could have just said "screw it, it's too hard/expensive, we'll do HTTP". But instead they listened to a minority of their users that really cares about privacy and dedicated time and resources to fixing a hard problem. Really impressive work!
"What that paper really demonstrates is the need for a TLS stack in the kernel that can let userspace daemons benefit from low-level optimizations. I, for one, would love to see something like this added to the linux kernel!"
The direction the industry seems to be going in instead is to put the network stack in userspace. Same effect on speed, less likely to produce kernel vulnerabilities.
I'm not sure the network stack in userspace is going to help MUCH in this scenario.
sendfile() does the disk (fd) read and the enqueue to the network socket (fd) in kernel space. This ends up being DMA from the file into the page cache and, hopefully, DMA from there out to the network, i.e. zero copy. In reality the TCP stack will add some overhead (segmenting).
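To make the sendfile() path concrete, here is a minimal sketch in Python, assuming a Linux host and an already connected stream socket. The function name send_file_zero_copy is made up for illustration; os.sendfile() is the standard library wrapper around the syscall. It is not how nginx or Netflix do it, just the same idea in a few lines.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected socket using sendfile(2).

    Hypothetical helper for illustration. Assumes Linux and a blocking socket.
    """
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # The kernel moves bytes from the page cache to the socket.
            # The file contents never land in a buffer owned by this process.
            sent = os.sendfile(sock.fileno(), f.fileno(), sent_total, size - sent_total)
            if sent == 0:
                break  # file was truncated underneath us
            sent_total += sent
    return sent_total


if __name__ == "__main__":
    # Small local demo over a socket pair instead of a real TCP connection.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    left, right = socket.socketpair()
    print(send_file_zero_copy(left, tmp.name), "bytes handed to the kernel")
    left.close()
    right.close()
    os.unlink(tmp.name)
```

Note that the process only passes two file descriptors and a byte range. That is exactly why it breaks down once something in userland (TLS) has to touch every byte.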
If you have a network stack in userspace, you need to make syscalls for disk IO, and you end up making copies from the page cache to your application buffers. You can do DMA from disk to your userspace buffer with O_DIRECT, but then you lose the page cache (and you'll end up having to re-implement it in userspace).
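For contrast, the copying path described above looks roughly like this. Again a sketch under assumptions: send_file_with_copy and its transform hook are hypothetical names, and the hook only marks the spot where userland TLS would encrypt each chunk; no real crypto is done here.

```python
import socket
from typing import Callable, Optional


def send_file_with_copy(
    sock: socket.socket,
    path: str,
    transform: Optional[Callable[[bytes], bytes]] = None,
    chunk_size: int = 64 * 1024,
) -> int:
    """Send a file by reading it into user memory first.

    Hypothetical helper for illustration. `transform` stands in for whatever
    per-chunk work userland needs to do (for TLS, record encryption).
    """
    sent_total = 0
    with open(path, "rb") as f:
        while True:
            # Copy 1: page cache -> buffer owned by this process.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if transform is not None:
                chunk = transform(chunk)
            # Copy 2: process buffer -> kernel socket buffer.
            sock.sendall(chunk)
            sent_total += len(chunk)
    return sent_total
```

Every chunk crosses the user/kernel boundary twice, once on read and once on send, which is the overhead sendfile() avoids.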
If there is no disk IO to perform, say you are serving from an in-memory hash table, then you might be able to get a measurable improvement in performance by using a userspace network stack.
Lastly, you still need (some) elevated privileges to run a TCP stack in userspace, so you end up with a vulnerability vector there.
To make this really work involves special hardware and driver support; user-space networking, AFAIK, doesn't help for conventional hardware, for those reasons. I also don't know if they're currently doing anything with TLS; the guys working on it next to me are just doing lower-level routing now and deliberately not peeking into the streams at all, let alone doing TLS.
What's particularly nerve-wracking in security terms about the user-space networking stuff is that right now you're probably only using it if you want to push 10Gbps. You're therefore working in an environment where you have merely hundreds of cycles to figure out what to do with a packet (maybe low single-digit thousands if your multicore support is working perfectly), and so everything is written in raw C. Ack. We needed Rust 1.0 about five years ago.
Perhaps full TLS in the kernel isn't necessary, but only the stream encryption - which tends to be the simpler part and thus less likely to have vulnerabilities compared to session negotiation/key exchange/etc.
It might be a rather expensive option, but what do you think of offloading TLS to the NIC? A lot of them do TCP offload already and there are "SSL Accelerator" HSMs available.
Judging by the few comments here, people seem to be more accepting of moving TLS into the kernel than they are of Windows moving HTTP into the kernel. Is this a double standard? They are both protocols implemented on top of TCP, most often by userspace programs.