
How Netflix works: the stuff that happens when you hit Play - sds111
https://medium.com/refraction-tech-everything/how-netflix-works-the-hugely-simplified-complex-stuff-that-happens-every-time-you-hit-play-3a40c9be254b
======
luckydude
So this is perhaps a lame post but I'm super tired, been up since 4am. I'll
try and follow up with more detail in the morning.

This article, in my opinion, is way off. I base that on the fact that I've
been dancing with Netflix for a month; I might end up working with them to
try to make NUMA machines serve up content faster. As in, I'd be working on
exactly what happens when you hit play.

My take on how Netflix serves you movies is nothing like what this article
says.

They have servers in every ISP. The servers send a heartbeat to a conductor
in AWS; the heartbeat says "I've got this content and I am this overworked".
When you hit play, the app reaches out to the conductor and says "I want
this", the conductor looks around and finds a server close to you that is
not overloaded, and off you go.
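That steering step can be sketched in a few lines. To be clear, the
heartbeat fields, the 0.8 load cutoff, and the scoring rule below are all
invented for illustration; this is not Netflix's actual protocol:

```python
# Hypothetical sketch of the "conductor" choosing a cache server from
# heartbeat reports. Field names and the selection rule are invented.
def pick_server(heartbeats, client_region):
    """heartbeats: list of dicts like
    {"server": "isp-cache-7", "region": "us-east",
     "load": 0.42, "has_content": True}"""
    candidates = [
        h for h in heartbeats
        if h["has_content"] and h["load"] < 0.8  # skip overloaded boxes
    ]
    if not candidates:
        return None
    # Prefer servers in the client's region, then the least-loaded one.
    candidates.sort(key=lambda h: (h["region"] != client_region, h["load"]))
    return candidates[0]["server"]
```

The real system has to handle far more (content popularity tiers, fill
windows, failover), but the shape of the decision is roughly this.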

That might look easy. It's not. Take a look at this post about how they fill a
100Gbit pipe:
[https://news.ycombinator.com/item?id=15367421](https://news.ycombinator.com/item?id=15367421)

I'm a kernel guy, I'm old school, I get what they are doing there, that is
impressive.

I wish Hacker News got excited about the pipe-filling post and less excited
about this thread.

~~~
Slartie
Hehe, I can feel your pain. Had a discussion just a few days ago with some
guys demonstrating the typical enthusiasm about cloud stuff, especially AWS.
Because after all, Netflix is using it, and they create 1/3rd of all Internet
traffic, right?

There's this very widespread misconception about Netflix being a monolithic
service running on Amazon's cloud infrastructure, even though the truth is
that only the "rather boring" routine stuff runs there: billing, view
history tracking, suggestions, everything necessary to show the UI. Netflix
does a good job at this, but it's not that impressive once you've
architected some distributed systems yourself. Not even their extreme take
on the microservices concept - that is, after all, just a nice way of
letting their devs do "their thing" the way they want, with as few
restrictions from the environment as possible (which basically only works
because they clearly have above-average-competence devs who can deal with
these degrees of freedom).

What's really crazy is the way they squeeze unimaginable amounts of bytes per
second out of modern hardware and into the internet infrastructure. They
saturate 100Gbit links that usually serve an aggregation of many boxes in a
datacenter with just ONE box! This is way beyond what even most
above-average devs are capable of - you NEED old-school guys who have still
managed to stay on top of the crazy stack created by the evolution of
hardware and low-level software in the last decade. There are not many of
those out there, and
Netflix apparently managed to catch a good bunch of them. These guys do the
magic, and the magic they do never touches that damn Amazon Cloud. It just
floats way above it.

~~~
SomeStupidPoint
Netflix is my go-to example of "do your core competency in-house, outsource
the rest".

They do one thing themselves in their stack: their distribution network for
content, and they do an incredible job of it. Every post I see about Netflix's
CDN for video is insightful and a learning experience.

Then they throw a CRUD app in the cloud on top of it and call it a day. Okay,
a little simplified -- there's still some neat tech in the DRM, in load-balancing the CDN, and in keeping all of their tech highly available. But
conceivably, Netflix could retain much of their value by simply offering all
the content to other websites that displayed it to customers -- the hard part
of what they do is the CDN (and contracts with content owners), and opening
their platform to other interfaces doesn't change that. (Heck, Netflix might
be worth _more_ if they opened their content to other interfaces, since
they're not actually very good at the front end experience.)

But when I was doing consulting (about cloud stuff), that was my advice: do
the core of your business yourself (eg, CDN) then offload as much of the rest
as you can.

------
kaplun
_An Amazon Web Services data center in Frankfurt, Germany, specially dedicated
to CERN._

I believe this is actually the CERN computer center itself and totally
unrelated to Amazon.

[http://cds.cern.ch/record/1103476](http://cds.cern.ch/record/1103476)

------
qualitytime
You know what, I'm going to use this post to tell you what happened last
night when I hit play in Firefox.

Nothing happened.

I saw the spinning loading wheel and the Firefox/Netflix header saying
"blah blah audio video software being installed try again blah restart
blah"...

I waited, I reloaded the page, I googled, I checked the DRM settings, I did
blah blah blah.

Wasted time and frustrated, I then opened Microsoft Edge and everything
worked.

You know that Firefox's share is at 8% according to W3Counter? You know
this kind of crap will only reduce it?

And then we'll only have the big daddy corporate sponsored browsers.

And all because users want to watch Netflix.

What a bag of mediocre horse shit.

~~~
crypt1d
FWIW, Firefox has been behaving quite weirdly for me for the last 2-4 weeks
(Linux version). It randomly freezes and slows down during page loading,
and has even crashed a few times. It seems to happen mostly when there is
video content in one of the tabs, so I'm blaming the plugins for now. Kinda
sucks because I only switched from Chrome a few months ago. I really hope I
didn't make a mistake.

~~~
bababooey
Same thing happened to me last year. I got very hyped up on the switch. But
then it turned into hot garbage.

Tried to get support on the Firefox subreddit and they told me I shouldn't
be using Firefox Beta if I'm not "technically inclined". Okay, so I
switched to the normal release and it was too slow for me to even bother.
Back to Google...

I'm wondering if their new Quantum engine fixes these issues. Are you on
the newest version?

------
wscott
BTW, Netflix is in the process of transitioning to TLS (https) transport.
Not because they need it to protect the content, but because shady
advertisers keep snooping connections and using that to profile users, and
Netflix is tired of being accused of selling user data. My understanding is
that the movie was already encrypted, but that the stream could be
fingerprinted to identify which movie it was.

Using TLS is a lot more expensive and costs them money, so I have to
respect them for that.

Grandma's TV is still unencrypted, but anyone who updates their client is
protected.

They do have to deal with a whole ton of legacy clients.

~~~
arianvanp
Alas, the https-protected videos can be fingerprinted as well.
[https://news.ycombinator.com/item?id=14070130](https://news.ycombinator.com/item?id=14070130)
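The linked discussion is about exactly this kind of traffic-analysis
attack. As a rough illustration (the size "library" and the 2% matching
tolerance below are invented, not the paper's actual method): TLS hides the
payload bytes, but variable-bitrate video gives each title a fairly
distinctive sequence of segment sizes, which an on-path observer can see.

```python
# Illustrative sketch: match an observed sequence of encrypted segment
# sizes against known titles' size sequences. The library contents and
# tolerance are invented for illustration only.
def identify_stream(observed_sizes, library, tolerance=0.02):
    """Return the first title whose known segment-size sequence matches
    the observed one (within the given relative tolerance), else None."""
    def matches(observed, known):
        if len(observed) != len(known):
            return False
        return all(abs(o - k) <= tolerance * k
                   for o, k in zip(observed, known))

    for title, sizes in library.items():
        if matches(observed_sizes, sizes):
            return title
    return None
```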

------
johnwheeler
I don't understand how microservices solve the problem of broken
interfaces. If an API changes or disappears, how is that different from the
locations.txt file changing or disappearing?

I think it's just another instance of humans making things more complicated
than they need to be. Same line of reasoning Linus used in the monolithic
kernel vs. microkernel debate.

~~~
reificator
Microservices set out to do the same thing as Object Oriented programming set
out to do. You define an {object,microservice} by exposing methods, and other
services and clients interact by calling those methods. Theoretically, if your
API remains stable, your internal implementation can change drastically and
the system will continue to function as it should. There's no reason you can't
draw these boundaries inside a monolith, but with microservices you'd have to
go out of your way to not have those boundaries.

IMO by using HTTP to communicate, they ended up being significantly closer to
the original concepts behind OOP and message passing.

[http://lists.squeakfoundation.org/pipermail/squeak-dev/1998-...](http://lists.squeakfoundation.org/pipermail/squeak-dev/1998-October/017019.html)

Now, whether microservices are successful in dealing with the problems they
set out to solve, and are worth the tradeoffs they entail is still up for
debate.
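The {object,microservice} boundary idea above can be sketched in a few
lines. The service names and ratings data here are invented purely for
illustration:

```python
# Sketch of the stable-API point: callers depend only on the exposed
# method, so the implementation behind it can change without breaking them.
class RatingsServiceV1:
    def average_rating(self, title):
        """The stable "API" that callers depend on."""
        ratings = self._ratings(title)
        return sum(ratings) / len(ratings)

    def _ratings(self, title):
        # Naive internal implementation (hardcoded sample data).
        return [4, 5, 3]

class RatingsServiceV2(RatingsServiceV1):
    # Internals swapped out for a cache lookup...
    _cache = {"some-title": [4, 5, 3]}

    def _ratings(self, title):
        return self._cache.get(title, [4, 5, 3])

# ...but average_rating() behaves identically for every caller.
```

A microservice does the same thing with a network API instead of a method
call, which makes it harder to accidentally reach around the boundary.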

------
rb666
"A special piece of code is also added to these files to lock them with what
is called digital rights management or DRM — a technological measure which
prevents piracy of films."

They should probably update this to say "tries to prevent", as Netflix's
DRM was cracked long ago.

~~~
colde
Do you have a source for it being "cracked"?

Circumvented sure, but the actual cryptography components haven't been cracked
as far as I am aware.

~~~
wolco
I think lower resolutions have been cracked. I don't think 4K has been
cracked yet, but the comment might be related to this bug.

[https://www.reddit.com/r/Piracy/comments/6pkypj/direct_strea...](https://www.reddit.com/r/Piracy/comments/6pkypj/direct_stream_copy_has_netflix_4k_streaming_drm/)

------
fasouto
Correct me if I'm wrong, but the data center picture looks very similar to
the CERN computing center inside the CERN installations in Meyrin.

~~~
saagarjha
The caption says that it is:

> An Amazon Web Services data center in Frankfurt, Germany, specially
> dedicated to CERN.

~~~
ephimetheus
But CERN is not in Frankfurt, and there is a data center on site here.

------
Thaxll
I still believe that the Netflix stack is way too complicated for what
Netflix needs. It's a CDN with a recommendation engine that is complete
garbage (for 95% of the stuff I'm interested in). Also, comparing YouTube
and Netflix speed, YouTube is like 2-3x faster to load any content.

------
X86BSD
Their Open Connect FreeBSD boxes are nuts. The amount of data they spew is
crazy. 1/3 of all internet traffic. One service. Mind blowing.

~~~
StillBored
I suspect (having worked on an application handling > 100Gbit of I/O per
node) that the OS choice doesn't really matter that much.

That is because in my case, the data path portion basically talked directly
to a couple of PCIe boards. It bypassed the entirety of the kernel outside
of some setup APIs to claim memory/interrupts/etc. That meant the transfer
limits generally came down to a lack of PCIe or memory bandwidth (depending
on which generation of machine/configuration we were using). The CPUs in
the machines spent 99.99% of their time running code we wrote. Despite the
talents of most OS developers, generic OS/driver code is not optimized for
absolute performance in any one case; rather, it tends to be tuned to
perform well over a wide range of situations. The general goal is to be a
fair arbitrator of system resources among multiple competing processes.
Further, most general-purpose OSes assume that I/O is slow or
low-bandwidth. Take the entirety of the Linux filesystem/block layer/SCSI
layer, which is written under the assumption that the system is attached to
a high-latency, low-bandwidth spinning disk, so burning a few cycles
coalescing requests or handling the page cache isn't a big deal. That code
doesn't scale when you plug it into an NVMe disk with 2GB/sec of bandwidth,
much less a storage network with 100GB/sec of I/O bandwidth.

Anyway, if you throw all these assumptions away and ignore modern "best
practices" development models of assembling piles of unrelated libraries to
solve a task, you end up with really lean (probably fits in the L1i cache)
software that can perform two or three orders of magnitude faster than similar
code written using modern methods.

~~~
luckydude
That sounds like an FTP-like measurement of throughput, and yeah, what you
said will work for that just fine.

Netflix connections are typically about 1 Mbit/sec each (older apps open up
~4 connections per video, for reasons that are no longer valid, but the
apps aren't all updated).

So to fill a 100Gbit pipe they have 100,000 connections running at the same
time, which makes filling that pipe super, super impressive.
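The back-of-envelope math behind that connection count:

```python
# 100 Gbit/s pipe divided into ~1 Mbit/s streams gives on the order of
# 100,000 concurrent TCP connections per box.
pipe_bits_per_sec = 100_000_000_000    # 100 Gbit/s
stream_bits_per_sec = 1_000_000        # ~1 Mbit/s per Netflix connection
concurrent_streams = pipe_bits_per_sec // stream_bits_per_sec
print(concurrent_streams)  # 100000
```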

~~~
StillBored
In our case we were doing a fair amount of data manipulation, so it wasn't
strictly a case of pushing the data through, although we had higher bandwidth
per stream.

But there are a bunch of different ways to solve these problems. I guess
how impressive it is depends on how they have gone about solving their
particular cases. There are a fair number of network accelerators that
offload individual stream-level management to little cores running on the
network adapter itself; Cavium, EZchip, and now even companies like
Mellanox are playing in this space:
[https://www.enterprisetech.com/2017/10/04/mellanox-etherneta...](https://www.enterprisetech.com/2017/10/04/mellanox-ethernetarm-nics-lighten-cpu-burden/)

So I'm not sure the impressive parts are necessarily in the stream counts,
but in what they must be doing to "align" (for lack of a better term) them.
AKA the trade-offs between keeping a few seconds of a video stream in RAM
vs. sourcing it from disk/wherever, so that multiple users' streams are
aligned to avoid having to hit a secondary storage medium. In Netflix's
case I suspect that requiring fairly large buffers on the endpoint allows
them to get away with a much lower QoS metric on any given stream.

Put another way, at least the few times I've watched Netflix's bandwidth
usage, it seems to be bursty. It blasts a few tens of MB/s of data, then
sits idle for a few seconds while the stream plays, and then you get
another chunk.

~~~
luckydude
Randall Stewart at Netflix did a new TCP implementation that helps quite a
bit. And he did this really cool thing for the naysayers: he made it
possible to have multiple stacks running in FreeBSD at the same time. I
believe the default is that you get the original stack, you can ask for his
stack, and he also did a super simple TCP stack just to show how small a
TCP stack can be.

They are using either Chelsio or Mellanox cards and they use the offload,
but they are doing TLS on the Xeon CPUs. So they are getting 100Gbit while
touching every byte.

And don't underestimate how hard it is to handle 100,000 TCP connections.
When I was at SGI we had a bunch of big SMP machines (I think they were
12-CPU Challenges) that someone was using to serve up web pages (AOL? It
was someone big). Modems brought that machine to its knees. You would think
that would be easy, but it was not. A single fast stream (or a small number
of them) is easy; a boatload of slow streams is hard. Think about it: if
you have a TCP stack that gets a request and then nothing, you have all the
overhead of finding that socket and doing that work, then nothing. It's way
easier to have a stream of packets all for one socket.

It's that sort of stuff that they worked on, so far as I can tell. Your
caching idea is nice, but the cache hit rate is very, very low. They did
way more work in the sendfile area, managing the page cache. Did you read
Drew's post? It's worth a read for sure.

~~~
StillBored
I didn't mean to minimize the difficulties of maintaining that many TCP
connections (much less getting useful work out of them). I read the
original article when it was on HN, but must have mentally thrown most of
it away due to the FreeBSD bias. So I just reread it, and the fact that
they are getting those numbers while utilizing much of the OS buffer
management and Nginx is impressive by itself. But their difficulties sort
of play into my original assumptions. Basically, if you want cutting-edge
I/O perf, you're better off dumping most general-purpose OS's I/O stacks
unless you want to spend a lot of time re-engineering them to work around
bottlenecks.

sendfile() is good, but the general concept tends to waste far too much
time doing filesystem traversals, buffer management, DMA scatter-gather
lists, and a bunch of other crap that gets in the way of getting a blob of
data from the disk, encrypting it, and passing it off to a send offload to
handle breaking it up and applying the TCP/IP headers/checksums. Frankly,
the minimum MSS size is something that IPv6 should have fixed, given that
no one is on 9600bps modems, but didn't.
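For reference, the basic sendfile() idea being discussed -- the kernel
moves file pages to the socket without copying them through userspace --
can be sketched with Python's binding for sendfile(2). This is a minimal
Linux-flavored illustration; Netflix's FreeBSD path, with TLS and the
offloads described above, is far more elaborate:

```python
import os
import socket

def serve_file(conn: socket.socket, path: str) -> int:
    """Send a whole file down a connected socket with sendfile(2): the
    kernel copies pages straight from the page cache to the socket, so
    the payload never passes through this process's userspace buffers."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        sent = 0
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) -> bytes sent
            sent += os.sendfile(conn.fileno(), f.fileno(),
                                sent, size - sent)
    return sent
```

The complaint above is about everything that surrounds this call at scale:
lookups, buffer bookkeeping, and scatter-gather setup on every send.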

Good for them for realizing that modern machines have a little less than a
GB/sec of bandwidth per PCIe lane per direction, and memory bandwidth to
match. If you don't mess up the CPU side of things you can even touch all
that data once or twice and still maintain pretty amazing I/O numbers.

EDIT: Also, in the case of x86 NUMA, you _REALLY_ want to make sure that
the NVMe/source disk, the memory buffer you're writing to, and the network
adapter are on the same node as the core doing the encryption/etc. That is
pretty easy if the "application" controls buffer allocation/pooling, but
much harder with a general-purpose OS, which will fragment the memory
pools.
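The pinning half of that can be sketched with the Linux process-affinity
API (FreeBSD has cpuset(2) for the same job). The topology lookup that
would produce the CPU set is only described in a comment here, not
implemented:

```python
import os

def pin_to_cpus(cpus):
    """Restrict this process to a specific CPU set -- e.g. the cores on
    the same NUMA node as the NIC and the NVMe device -- so the
    read/encrypt/send path never crosses the inter-socket interconnect.
    In a real system the CPU list would come from the hardware topology
    (e.g. /sys/class/net/<nic>/device/numa_node on Linux)."""
    os.sched_setaffinity(0, cpus)     # 0 = the calling process
    return os.sched_getaffinity(0)    # what the scheduler actually granted
```

Memory and device locality still have to be handled separately (e.g. with
libnuma), which is exactly the part a general-purpose OS makes hard.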

~~~
kev009
We'll make it work on FreeBSD

------
pidybi
nice to know ;)

------
alexnewman
Wish they had mentioned the Widevine DRM shit show.

