
Netflix's Open Connect Appliances - elecboy
http://gizmodo.com/this-box-can-hold-an-entire-netflix-1592590450
======
frio
Isn't this just a CDN? ISPs quite happily host Akamai and Google nodes -- are
they being paid for maintenance of those? Empirically, from an old job, the
answer is _no_ (in fact, we had to petition for quite a while to get some
Google boxen), but that's in a country at the tail end of a long, long pipe.
Is it different in the US?

~~~
porpoisemonkey
What this article is describing sounds like a colocation agreement, which is a
quite common service offered by ISPs and is typically paid for by a monthly
rental fee calculated from storage size, dedicated bandwidth use, and power
consumption.

~~~
rahimnathwani
It's not the same. If I want Verizon to host my server, they have nothing to
gain, but I do. So, yes, they should charge me. If instead they host a CDN
server for a big content provider, they save money on routing content within
their network.

~~~
jsz0
That's not always the case though. Sometimes rack space is more valuable to
ISPs than improving bandwidth efficiency via CDN co-location deals. It really
depends how much space you've got to spare and who else wants it.

~~~
liveoneggs
A friendly ISP always has rack space. When you own a few data centers
yourself, space is easy to come by.

------
nrzuk
Could someone explain to me (it's been a long day, and I'm no storage expert!)
how these are useful? Wouldn't you need a whole bunch of these at each
location where they are installed?

Just basing this on my own experience I have 12 drives in RAID which have a
fairly substantial sequential throughput. Start multiple streams of high
bandwidth videos and the maximum throughput from the drives drops sharply due
to the random reads.

I would have assumed that having even 100 people streaming from a single box
would be an interesting challenge.

~~~
twoodfin
These things can pump out 12 Gbit/s during peak viewing, which is enough for
about 4,000 avg. 3 Mbit/s streams.

If they can cram in 256GB of DRAM, that's enough room for a 64MB buffer per
stream, or about 170 seconds worth of streaming. Now you only need to be able
to fill those buffers at a rate of about 24/second.

I'm assuming whatever file system is on the disks uses a massive block size,
so the number of seeks you'd have to perform to pull 64MB off is probably
pretty low. Eight? Sixteen? Even if it's the latter, that's only 384
seeks/second, which you could very plausibly do striped over only a half dozen
disks, and the device presumably has many more.
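A quick back-of-envelope check of that arithmetic (all inputs are this comment's assumptions, not published Netflix specs; the 4MB block size is hypothetical):

```python
# Back-of-envelope check of the numbers above. All inputs are this
# comment's assumptions, not published Netflix specifications.
PEAK_GBIT = 12        # claimed peak output, Gbit/s
STREAM_MBIT = 3       # average stream bitrate, Mbit/s
DRAM_GB = 256         # assumed DRAM in the appliance
BLOCK_MB = 4          # hypothetical filesystem block size

streams = PEAK_GBIT * 1000 // STREAM_MBIT           # ~4,000 concurrent streams
buffer_mb = DRAM_GB * 1024 / streams                # ~64 MB of DRAM per stream
buffer_seconds = buffer_mb * 8 / STREAM_MBIT        # ~170 s of playback buffered
refills_per_sec = streams / buffer_seconds          # ~24 buffer refills/second
seeks_per_refill = buffer_mb / BLOCK_MB             # 16 seeks to read one buffer
seeks_per_sec = refills_per_sec * seeks_per_refill  # ~380 seeks/s over the array
```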

~~~
runlevel1
> I'm assuming whatever file system is on the disks...

Netflix uses UFS+J on FreeBSD 10.

Here are some notes from their talk at NYCBSDCon 2014:

- 400,000 stream files per appliance.

- 5,000 - 25,000 client streams per appliance.

- 300 - 500 streams coming off each disk all the time.

- Attempt to buffer 1MB ahead, but caching is futile.

- Result is completely random disk workload.

- System becomes limited by disk latency and CPU load.

Video here:
[https://www.youtube.com/watch?v=FL5U4wr86L4](https://www.youtube.com/watch?v=FL5U4wr86L4)

------
JTon
Can anyone explain why an ISP would push back Netflix installing one of these
boxes? Does the ISP want Netflix service congestion to better sell its own
media assets/distribution (i.e. cable TV)?

~~~
tobz
Based on the blog posts by Level 3 -- how much internal capacity Comcast /
Verizon say they have versus what Level 3 sees for utilization at their
peering points with both ISPs -- I would have to assume "yes".

~~~
dragontamer
Can you please post a link? This sounds very interesting...

~~~
k3oni
Here's a link to one of the blog posts:
[http://blog.level3.com/global-connectivity/verizons-accidental-mea-culpa/](http://blog.level3.com/global-connectivity/verizons-accidental-mea-culpa/)

------
bashinator
I'd be interested to find out more of the hardware details of these devices.
100+ TB of storage in a 4U is pretty respectable. From the images in the article,
it looks like the drives are _not_ hot-swappable, so I'd guess Netflix is able
to remotely track loss of redundancy and will just send out an entire
replacement unit when needed.

For comparison, Supermicro makes one of the highest-density storage servers
that I'm aware of[1]. 72 3.5" drives in 36 drive bays, so up to 288TB of raw
storage, if you're brave enough to use 4TB drives.

[1]
[http://www.supermicro.com/products/system/4U/6047/SSG-6047R-E1R72L2K.cfm](http://www.supermicro.com/products/system/4U/6047/SSG-6047R-E1R72L2K.cfm)

~~~
jedberg
> It looks like the drives are not hot-swappable,

That's correct.

> so I'd guess Netflix is able to remotely track loss of redundancy and will
> just send out an entire replacement unit when needed.

Exactly. If a drive dies, the capacity of that box is just reduced. Once
enough drives die, the box gets an RMA.

~~~
bashinator
> Exactly. If a drive dies, the capacity of that box is just reduced.

So not even redundancy? Just cope with losing whatever media happened to be on
the dead drive?

~~~
MBCook
The box is effectively one giant cache. If you lose a drive you can get the
movie back from the main distribution network.

The article also mentions the box stores multiple copies of some things for
increased throughput on popular titles.

------
mirkules
From my understanding, these boxes are basically local copies of the entire
Netflix online catalog - a catalog that frequently changes. How do these
~15,000 machines pick up new content? I assume they synchronize with the
Netflix mothership every once in a while.

My question (without any intent to trivialize what Netflix is doing) is, isn't
that just a good-old caching system?

But it makes me wonder if it would be possible to create a distributed,
localized caching system? For example, if I watch The Avengers, my machine
could be configured to cache a part (or all) of the movie. The more people
watch it, the more it is distributed. Since Netflix can correlate location
data with what movies are watched, they can target certain areas to cache
certain movies more aggressively. When the next person in my neighborhood
watches The Avengers, they would pull part of the movie from me (maybe the
beginning until the rest buffers) and cache another part of it themselves for
the next person in line. That way all the people in my neighborhood or
adjacent neighborhoods don't have to pull the entire movie across the internet
every time - instead, we help each other watch movies with higher quality.

On second thought, this may not be cheap considering the amount of development
that may be required (and, at the end of the day, someone has to pay for this
storage - although Netflix could, for example, offer personal mini-caching
boxes in exchange for a monthly discount), but it's fun to dream about a
sanctioned, decentralized, peer-to-peer movie service.
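A toy sketch of that peer-assisted idea, assuming each household advertises which chunks it has cached and a viewer prefers local peers over the CDN (all names and the chunk model here are invented for illustration):

```python
# Toy sketch of the neighborhood-caching idea: each household caches the
# chunks it has watched, and the next viewer pulls chunks from nearby
# peers before falling back to the CDN. All names are hypothetical.
from collections import defaultdict

class Neighborhood:
    def __init__(self):
        # peer id -> set of (title, chunk_index) that peer has cached
        self.caches = defaultdict(set)

    def watch(self, peer, title, chunks):
        """Record that `peer` watched (and now caches) these chunks."""
        for c in chunks:
            self.caches[peer].add((title, c))

    def fetch_plan(self, viewer, title, chunks):
        """For each chunk, pick a local peer if one has it, else the CDN."""
        plan = {}
        for c in chunks:
            source = "CDN"
            for peer, cache in self.caches.items():
                if peer != viewer and (title, c) in cache:
                    source = peer
                    break
            plan[c] = source
        return plan

hood = Neighborhood()
hood.watch("alice", "The Avengers", range(0, 4))  # Alice cached chunks 0-3
plan = hood.fetch_plan("bob", "The Avengers", range(0, 6))
# Bob gets chunks 0-3 from Alice and falls back to the CDN for 4-5.
```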

~~~
rakoo
So, BitTorrent?

It _would_ be a perfect setup (much much better than sending the same stream
through the same lines just because 2 persons are watching the same show)

The problem here is not technical (BitTorrent solves that); the problem is
societal: for most of the content that is of interest to the population, you
are not allowed to redistribute it. So Netflix has to DRM-protect the content
for everyone, meaning that the actual content that transits on the line has to
be different for each viewer, meaning that you can't redistribute it (since
it's targeted to you, no one else could decode it).

If you want to see how well the technical side works, take a look at
PopcornTime.

~~~
mirkules
As I wrote that comment, BitTorrent came to mind, with all the societal
implications that it carries (that's what I meant by "sanctioned" in the last
sentence).

I haven't really explored PopCorn Time but I'll take a look.

With regards to DRM, the content must be de-protected before you watch it - so
if Netflix installed a mini-box on my network, they could decrypt it to let me
watch the show, and re-encrypt it for storage on the device before it is sent
to someone else. The key itself could be retrieved from a centralized server.

If I think about it some more, there really is no need to keep an _entire_
Netflix catalog in every location. As I mentioned, they have usage data and a
strong prediction system too, so predicting the next show that will be watched
in an area and even when would be "straight-forward" (In quotes because it
wouldn't be easy for me to do it, but I think it would be easy for Netflix).

Another thing Netflix could try is pre-fetching shows during off-hours. That
would work well for binge show watchers, and reduce peak time traffic.

------
tzs
If the ISPs are reclassified as common carriers, will they still be allowed to
let Netflix install those boxes?

Presumably there is limited space in the ISP data center, so they cannot let
everyone who wants to stick a box in there do so, so I'm wondering what
methods a common carrier can use to decide which content providers can have
access, and how this fundamentally differs from the "fast lanes" that common
carrier status is supposed to prevent.

~~~
wmf
Maybe they can come up with a rule like the box must save 2 Gbps of peak hour
bandwidth per U.

From the ISP's perspective they are hosting these boxes to save money, not
improve performance. If performance happens to improve (ahem) that's a
convenient side effect.

------
ubercore
Interesting to see some more details on the appliances. Isn't the argument
primarily about peering, not open connect appliances, though?

~~~
wmf
It's all one discussion in my mind. There are three ways to get Netflix:
boxes, peering, or transit (from cheapest to most expensive). If peering saves
the ISP money and the boxes save even more money, why are some ISPs refusing
to use them?

~~~
josho
Because some ISPs are trying to maximize their revenue by charging Netflix
directly.

There are a few other legitimate reasons as well. Some ISPs dislike hosting
hardware that they don't own in their data centre. Or it may be that an ISP
has contracts in place that commit the ISP to certain bandwidth costs, so
removing Netflix traffic may not actually save them money.

~~~
qq66
Why shouldn't an ISP charge Netflix to host a box in their datacenter? I'd
charge you a fee if you wanted to store your DVDs in my house, even if you let
me watch them.

~~~
wmf
Because the box saves the ISP more money than its hosting costs.

~~~
warfangle
On the other hand, if the ISP also runs a video service, they've got some very
compelling reasons to do everything they can to not increase the quality of
service for Netflix. They're green, and rectangular, and...

------
tux1968
What stops an unscrupulous ISP from making a timely copy of all his favorite
movies?

~~~
michaelneale
Same applies to all other CDNs that ISPs host?

Although this makes a fairly rich target of concentrated content ;)

------
Thaxll
A lot of mistakes in this article; AFAIR it can't hold all the movies, just
the most popular.
[http://oc.nflxvideo.net/docs/OpenConnect-Deployment-Guide.pdf](http://oc.nflxvideo.net/docs/OpenConnect-Deployment-Guide.pdf)

" Each appliance stores a portion of the catalog, which in general is less
than the complete content library for a given region. Popularity changes, new
titles being added to the service, and re-encoded movies are all part of the
up to 7.5TB of nightly updates each cache must download to remain current. We
recommend setting up a download window during off-peak hours when there is
sufficient free capacity."

~~~
pessimizer
"But wait, you say. That'd be all fine and dandy if Netflix was etched in
stone, but it's not! They add and remove stuff all the time! That's right,
which is why most OCAs take a 7.5TB update. Every. Freaking. Day.

"If you had to take a 7.5TB update on your home internet, you would be
screwed. According to Ookla's Net Index, the average download speed in the
United States is 18.6Mbps. So with an average connection, that 7.5TB would
take you about 40 days to chew through.

"Fortunately, data centers tend to have a little more speed to work with. A
firehose, compared to your bendy straw. So Netflix recommends a much shorter
10-hour span (from 2AM to noon, local time) where these boxes can soak up
their updates with 1.2Gbps connections. That is to say, 'Google Fiber
speeds.'"

Source: TFA.
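The article's arithmetic roughly checks out (a quick sketch, taking TB as 10^12 bytes and using the figures quoted above):

```python
# Sanity-check the article's figures (TB taken as 10**12 bytes).
UPDATE_BYTES = 7.5e12   # nightly update size from the deployment guide
HOME_MBPS = 18.6        # Ookla's average US download speed

seconds = UPDATE_BYTES * 8 / (HOME_MBPS * 1e6)
days = seconds / 86400                   # ~37 days, i.e. "about 40 days"

# Bandwidth needed to move 7.5TB inside the 10-hour fill window:
gbps_needed = UPDATE_BYTES * 8 / (10 * 3600) / 1e9   # ~1.7 Gbit/s
```

That ~1.7 Gbit/s is actually a bit above the 1.2 Gbps the article quotes, so presumably the full 7.5TB is a worst case rather than the nightly norm.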

~~~
ssmoot
I'm sure thought has been put into it. I just wonder why the OCAs aren't
caches? I'd imagine a cache would be a lot more (mostly bandwidth) efficient
for the ISPs and would scale much better.

I request a stream. It's proxied through the device. It's not present on the
device. So the device requests the 1080P from Netflix. It transcodes the
stream it receives on-the-fly and sends me whatever quality I require while
writing both the original and my lower-quality version to disk.

I mean maybe the limitation in that plan is the number of streams that could
be transcoded simultaneously. So maybe instead the on-the-fly version isn't
cached. Because I may need to switch quality on-the-fly. So maybe instead jobs
are just queued up for the 5 or so formats required based on the 1080P
download and they can process on spare transcoders opportunistically.
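That queueing idea could be sketched roughly like this (everything here is invented for illustration, and is not how the OCAs actually work):

```python
# Rough sketch of the opportunistic-transcode idea: serve cache misses by
# proxying the 1080p master, and queue the lower-quality renditions to be
# produced later on spare transcoder capacity. All names are invented.
from collections import deque

RENDITIONS = ["720p", "480p", "360p", "240p"]

class CacheNode:
    def __init__(self):
        self.store = {}     # (title, quality) -> "on disk"
        self.jobs = deque() # pending transcode jobs

    def request(self, title, quality):
        if (title, quality) in self.store:
            return "cache hit"
        # Miss: fetch and cache the 1080p master, then queue the other
        # renditions instead of transcoding them all inline.
        self.store[(title, "1080p")] = "on disk"
        for r in RENDITIONS:
            self.jobs.append((title, r))
        return "proxied from origin"

    def run_idle_transcoder(self):
        # Called whenever a transcoder is free (e.g. off-peak hours).
        while self.jobs:
            title, quality = self.jobs.popleft()
            self.store[(title, quality)] = "on disk"

node = CacheNode()
first = node.request("Some Show", "720p")   # miss: proxied, jobs queued
node.run_idle_transcoder()                  # spare capacity drains the queue
second = node.request("Some Show", "720p")  # now served from local disk
```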

Just interesting to think through it as a programming problem. I'm sure the
people working on this are very smart. It just seems really heavy on the
bandwidth utilization as is.

~~~
mturmon
I can think of reasons why a pure cache would impose costs that are too high
-- namely, the device would then be writing during times of very high read load.

But: I agree with your bigger point, that it's an interesting programming
problem.

We're all familiar with CPU caches, so it's an interesting mental exercise to
think of this as the same kind of system, but optimized for a different part
of the design space.

