Serving Netflix Video Traffic at 800Gb/s and Beyond [pdf] (nabstreamingsummit.com)
590 points by ksec on Aug 19, 2022 | 391 comments



I love technical content like this. Not only is it incredibly interesting and informative, it also serves as a perfect counterpoint to the popular "why does Netflix need X thousand engineers, I could build it in a weekend" sentiment that is frequently brought up on forums like this one.

Building software and productionalizing/scaling it are two very different problems, and the latter is far more difficult. Running a successful company always requires a large number of very smart people who are willing to get their hands dirty optimizing every aspect of the product and business. Too many people today think that programming starts and ends at pulling a dozen popular libraries and making some API calls.


The way I've been putting it to people lately is: Never underestimate how hard a problem can grow by making it big. And also, at times, it is hard to appreciate how difficult something becomes if you haven't walked the path at least partially.

Like, from work, hosting postgres. At this point, I very much understand why a consultant once said: "You cannot make mistakes with a postgres that's 10GB or 100GB in size doing a dozen transactions per second." And he's right - give it some hardware, don't touch knobs except for 1 or 2, and that's it. The average application accessing our postgres clusters is just too small to cause problems.

And then we have 2 postgres clusters with a dataset size of 1TB or 2TB peaking at like 300 - 400 transactions per second. That's not necessarily big or busy for what postgres can do, but it becomes noticeable that you have to do some things right at this point, and some patterns just stop working, hard.

And then there are people dealing with postgres instances 100 - 1000x bigger than this. And that's becoming tangibly awesome and frightening by now - awesome in the more old-school sense of the word.


Not only make it big, engineer it in a way that makes it profitable for the business.

I'm sure there are many teams that could design such a network with nearly unlimited resources, but it is entirely different when you have profit margins.


I know how to operate a hose! How hard can it be to manage a large stream? It's "just" more water: https://www.youtube.com/watch?v=jxNM4DGBRMU


As someone once said, “Big is different”.


> Too many people today think that programming starts and ends at pulling a dozen popular libraries and making some API calls.

The needle keeps moving, doesn’t it? A tremendous breadth of difficult problems can be effectively addressed by pulling together libraries and calling APIs today that weren’t possible before. Today’s hard problems are yesterday’s impossibilities. The challenge for those seeking to make an impact is to dream big enough.


The basic problem is the same, pushing the hardware to its limits.


The basic problem is delivering value to someone.


Programmers are not passionate about delivering value to someone; that's the businessman's problem.


Not every programmer is passionate about the same thing. I got into this field because I love building things that make people's lives easier.


Sure.

Anecdotal, but most of the people I've worked with as ICs couldn't give a damn about that. They want dollarydoos.

One of the 10X-ers I know (they exist and are real), told me repeatedly how he'd much rather be doing his own thing. He hates the business needs. But income is important and that's why he's dedicated to doing it. I'm surprised at how focused and good he is given his disposition, and I want to hire him when I scale my business more. Drive and passion are sometimes just spontaneous.

An old CEO of mine even quipped that we were not family and that we were there to do a job. All true. Most of the people doing that job were only there for the money.

Most jobs that drive sales and revenue simply aren't fun or rewarding. There's lots of infrastructural glue and scaling. Tiring, boring, monotonous work. 24/7 oncall work. The money is good, though.


This! I am frustrated at how often devs will not accept that simple things become incredibly complicated at scale. That favorite coding technique? That container you wrote? Those tests you added? All good, but until you've tested them at scale, don't assert that everyone should use them. This dynamic is true in the other direction too: that techniques often taken for granted simply are not feasible in highly resource-constrained environments. With rare exception, the best we can say with accuracy is that "I find X works well enough in the typical situations I code for."


> simple things become incredibly complicated at scale

In a way it's the opposite. Things scale up by removing complex interactions. Software gets faster by solving the same problem with fewer steps.

Any beginner can write too much code and pile up too many software and hardware components.

It takes knowledge to simplify things.


It reminds me of a saying: "Any idiot can build a bridge that stands, but it takes an engineer to build a bridge that barely stands."


> Any beginner can write too much code and pile up too many software and hardware components.

I'd take it a step further, the beginner has no choice but to write too much code/cruft. By definition they lack the experience required to know the options before them, many of which are simpler.


Indeed, but it's also about having the wrong mindset: piling up stuff gives people the idea that they are building something.

Removing complexity and saving CPU cycles does not feel as good to beginners.


People have career and ego incentives to overestimate the constraints of the environment they're working in, and will use lazy heuristics like rounding some number to "a lot" and assuming that it justifies pulling out all the stops. Computers are fast. One or a few of them can accomplish quite a lot. Some people really do exceed those limits, and godspeed to them, but even in the companies they work at, the vast majority of applications do not.


This goes both ways - things written “to scale” often, or usually, do worse in more typical, smaller scenarios, because of all the bloat that’s irrelevant to 98% of use cases.

That’s one of the fundamental problems in today’s Open Source world: everything’s being optimized for a very rare case.


I think the problem is that the "easy" parts of netflix such as the UI or the recommendation engine seem like they were hacked together over the weekend. Of course deploying and maintaining something of the scale of netflix is incredibly hard. But if they can afford thousands of engineers who optimize the performance why can't they hire a few UI/UX engineers to fix the godawful interface which is slightly different on every device? I think this is where this sentiment stems from.


Technically speaking I think Netflix's UX blows every other streaming app out of the water. It loads instantly, scrolling is smooth, search is instant. Buttons are where you'd expect and do what you expect. They have well-performing and up-to-date apps for every conceivable device and appliance. They support all the latest audio and video codecs.

This is all in stark contrast to services like HBO Max and Disney+ which still stutter and crash multiple times a day. Amazon for some reason treats every season of a TV show and HD/SD versions of movies as independent items in their library. I still haven't been able to download a HBO Max video for offline viewing on iOS without the app crashing on me at 99%.

The problems you mention with Netflix are real, but they have more to do with the business side of things. Netflix recommendations seem crap because they don't have a lot of third party content to recommend in the first place. Their front page layout is optimized to maximize repetition and make their library seem larger. They steer viewers to their own shows because that's what the business team wants. None of these are problems you can fix by reassigning engineers.


> Buttons are where you'd expect and do what you expect.

Wait, what? Netflix is the absolute worst at this. Every time I log in the interface is different! Netflix could not care less about users having a consistent seamless experience.

But as far as performance goes, I totally agree with you. The performance is impressively good and noticeably better than the other streaming apps I use.

The UX is just so bad in so many ways (UI churn, autoplay, useless ratings, useless categories, recaps that can be watched exactly once, and so on...) it mostly ruins the app for me. The actual video quality is great though.


Interface is the same, order of rows is different. Yes, it sucks. However, other streaming apps are much worse:

- by the time HBO Max finishes loading, I've already lost interest

- Amazon Prime constantly gives me errors, and it's often hard to find what you paid for and what you have to pay for

- Paramount+ often restarts episodes from the beginning instead of resuming.

- Many leave shit in your queue with a few seconds left for you to "Continue Watching". I still have shows in Paramount+ that I finished months ago in the queue, and there is no way to delete them without watching the end credits.

- HBO Max only lets you FF in small fixed intervals

- Plex...used to be okay, now it's pushing its streaming services and works very badly offline

- Apple TV has an awful offline experience compared to Netflix in terms of UX

Nah, I will take Netflix constantly changing rows over the shit others do.


> Netflix recommendations seem crap because they don't have a lot of third party content to recommend in the first place.

Their recommendations used to be world class until they got rid of the user feedback mechanisms.


> Amazon for some reason treats every season of a TV show and HD/SD versions of movies as independent items in their library

I don't know how it is in the US, but in Japan, it's even worse than that. Japanese dub and English with Japanese subs are different items (although the UI has a way to choose between audio and subtitle channels), and Japanese subtitles are burned into the video.

Very often, despite all the config being in Japanese, it will complain in Japanese, when starting a video, that there isn't an English dub for the show, as if I had asked for it, which I never have.

In the related info for an anime that definitely isn't an English dub, it shows the names of the US voice actors and their related works more often than the Japanese voice actors.

Multi-season shows in the Prime Video app might be either separate items or not, depending on the title. Through the Fire TV home, though, it's a huge mess. You may have Season 1, 2, 3 under an item for $Show, with "Season 2" really being the second half of season 1 on Netflix, and "Season 3" really being season 2, also on Netflix. At the same time, the real season 2 is also a separate item which has 2 seasons, with its season 1 really being season 2 on Prime and its season 2 being... season 2 on another service. Confused?

It's often super hard to get those non-season 1 items as search results too...

Things are however a little better if only using the Prime video app.


My complaints about Netflix UI/UX aren't technical in nature; I agree with you their player is the best out there, hands down.

The issue is the business policies surrounding it. The UI itself is user-hostile.


I'm surprised you think Netflix's UI and UX is that poor. Which streaming service do you think does a better job?


None of them, since they basically all copied Netflix! The grid view limits users to slowly looking over limited categories of content. Any list based tree structure would be better in my opinion.


I think you are overestimating your knowledge of design and UI. Mainstream software has been rigorously researched, tested, and proven to work. If we left UI design up to HN users we would end up with some plain text directory listing with vim keybindings.


At some point around 2006 I could sort all Netflix titles by user rating and even better, what Netflix expected I would rate a given title. This expected rating became incredibly accurate after I’d rated 50-100 titles according to the extant five star system. It was so good that I found myself watching many titles that I otherwise wouldn’t have considered, because the system was invariably right. I could also safely avoid titles I was very interested in when the system surmised I would be disappointed. And I could very easily inspect, sort, and edit my watch list.

Today I rarely use Netflix, and I wouldn’t pay for it. Periodically I open the app and add to my list those titles that are immediately visible which I know I want to watch, like comedy specials of comedians I’m familiar with, but I don’t inspect further, because if I haven’t already heard about a title from some external sources I trust, it’s not worth my time to check. That list just grows and grows, though titles are often removed as they become unavailable, but I never prune the list, because when I remove one title, I’m taken all the way back to the beginning of the list. Trying is just a waste of time.

It is baffling to me that anyone could interact with Netflix’s current UI and conclude that it was anything but a raging dumpster fire.


Not OP, but I think the Swedish TV streaming service has a simpler, while nicer UX (hope you can at least see this from your country, if not play the content): https://www.svtplay.se/

Admittedly, it follows the same pattern as Netflix, but I like how it's more responsive and feels way simpler/lighter.


You just linked to an exact copy of Netflix.


Why do you say "UI and UX"; how are they different in your view?

Jargon BS is invading people's heads and it has to stop.


I honestly find Netflix's the easiest to navigate, by far.

Hulu did that big redesign, and it's extremely pretty to look at, but even after a few years of trying to use it, I still struggle to do anything other than "resume episode". Finding the previous episode, listing episodes, etc. is always an exercise in randomly clicking, swiping, long pressing, waiting for loading bars, etc.

One thing Netflix really got right as well: the "Watch It Again" section. So many times I want to rewatch the episode I just "finished" (because either my wife finished a show when I leave the room, the kids fell off the table, I fell asleep or wasn't paying attention, etc), and every other platform makes this extremely difficult to find.

Back to Hulu--the only way I know how is the search feature, which is a PITA with a remote.


> godawful interface which is slightly different on every device

Which devices are you referring to? I’ve only used the PC and mobile interfaces both of which are quite pleasant.


That's what puzzles me about Uber. I believe that behind the scenes it does pretty complex things as explained many times on HN, but it's the worst app I've ever used. UI and UX wise it's so bad that if you told me it was a bootcamp graduation project I'd have no problem believing you.


I work on a very technically trivial service at a large company.

It's the kind of thing that people run at home on a raspberry pi, docker container or linux server and it consumes almost no resources.

But at our organization this needs to scale up to millions of users in an extremely reliable way. It turns out this is incredibly hard and expensive and takes a team of people and a bucket of money to pull it off correctly.

When I tell people what I work on they only think about their tiny implementation of it, not the difficulty of doing it at an extreme scale.


I think a fair criticism would be how many engineers they have compared to their competitors. Disney+ is on a similar scale; can they do the same/similar job with fewer people? And considering Netflix pays top of market, how much does Disney spend on its engineering effort to get their result? Would Netflix benefit from just throwing more hardware at the problem vs paying more engineers 400-500k/y to optimize?


There were no giants' shoulders to stand on: Netflix engineers didn't have blog posts from other companies on how to handle the scale they started facing. Facebook didn't have blog posts to reference when they scaled to 1B users. They pay for talent that has built systems that had not been built before, and they have seen a return on it, so they continue to do it.


Hulu was around before netflix


Sure? "After an early beta test in Oct. of that year, Hulu was made available to the public on March 12, 2008—a year after Netflix launched its own streaming service."

[1] https://www.foxbusiness.com/technology/5-things-to-know-abou...


Hulu was never Netflix scale. YouTube is a better example.


Youtube is very different than Netflix from a technical problem perspective. They serve free videos to anyone around the world that are uploaded by users.

It's closer to a live streaming problem than pre-encoded video like Netflix.

Having worked at Netflix I can say that the YouTube problem is much more complex.


I wonder what portion of YouTube's request traffic can be served with cache servers at the edge with a few hundred terabytes of storage. There's a very long tail, but I would guess a significant portion of their traffic is the top ~10,000 videos at any given moment.
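
For a rough sense of why that guess is plausible, here is a back-of-the-envelope sketch. It assumes popularity follows a Zipf distribution with exponent 1 (a common modelling assumption for video catalogs, not a measured YouTube figure), and the catalog size is made up:

    import math

    GAMMA = 0.5772156649  # Euler-Mascheroni constant, for the harmonic-number approximation

    def zipf_hit_rate(catalog_size, cached_titles):
        # Fraction of requests landing on the `cached_titles` most popular items,
        # if the request probability for rank r is proportional to 1/r.
        # Uses H(n) ~ ln(n) + gamma instead of summing millions of terms.
        harmonic = lambda n: math.log(n) + GAMMA
        return harmonic(cached_titles) / harmonic(catalog_size)

    # e.g. caching the top 10,000 titles out of a hypothetical 100M-title catalog:
    print(round(zipf_hit_rate(100_000_000, 10_000), 2))  # ~0.52

Even with that fairly flat tail, a tiny cache covers roughly half of requests; real video traffic is usually more skewed than this, which would push the hit rate higher.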


There was a Google-organised hackathon on this topic. Given a set of resources, locations, and (estimated) popularity, the task was to optimise video load time by determining what should be moved to which cache, and when.


did anything from the hackathon turn out to be used?


I’m not sure as I don’t work at Google and didn’t see any follow up. Maybe some in the winning teams would know.

I do think there is some temporal logic already present in Google’s algorithm which wasn’t part of the challenge.


Not even close. YouTube has orders of magnitude more content and vastly more users. Google Global Cache was the inspiration for Open Connect.


Yeah, and have you seen the awful performance of Hulu? It's basically unusable - a poster child for underinvesting in the streaming platform.


Huh? Netflix predates Hulu by over a decade.


> Would netflix benefit from just throwing more hardware at the problem vs paying more engineers 400-500k/y to optimize?

Where the CDN boxes go, you can't always just throw more hardware. There's a limited amount of space, it's not controlled by Netflix, and other people want to throw hardware into that same space. Pushing 800gbps in the same amount of space that others do 80gbps (or less) is a big deal.


The engineers are definitely cost-effective at this scale. They may be the highest-leverage engineers at the company in terms of $ earned from their efforts compared to $ spent. The improvements that come from performance engineers at large companies are frequently worth $10M/year/person or more.

Most companies maintain internal calculations of these sorts of things, and make rational decisions.


Sorry for the tangent, but really curious to ask:

When you say that companies maintain internal calculations of the benefits, would you say that it’s (extremely roughly) something like: $10M benefit, need 5 core engineers + benefits + PM + testing lab etc etc -> we can spend up to $500k per eng give or take.

Or is the $10M one number (that would be held somewhat secretly internally at the company) while the salaries mostly represent where the market is? Does the (salary) market take into account the down-the-line $10M value?

Basically, could those engs negotiate to be paid more, or are they already sort of paid close to exactly what the group they’re part of generates in terms of revenue?

Thanks!

I see that you said $10M per person, not for the “network optimization group”. Hmm. So it would be fair to say that the engs are definitely not paid according to the value they generate..? I wouldn’t be surprised by that but just to confirm.


If you are an employee there is little to no relationship between your output and your compensation. Employer employee relationships are based on the cost to the employer to secure equivalent or better output.

Secondly, yes $10M per employee of revenue or cash flow is pretty reasonable for similar companies. The prioritization is NOT “how many employees per $MM.” The allocation is “what opportunity is the highest $MM return per available employee.”


Relatively few companies have the scale or the efficiency for it to be profitable to hire engineers at Silicon Valley wages.


Sure. Lots of companies and employees are located elsewhere. And software + services is a heck of a financial drug. I don't think that changes the underlying message of how to think about employee comp or business opportunities.


Compensation is negotiated up and down marginally by the cost of hiring someone else, but bounded above by profitability. You could be the best price:performance employee in the world for a small business, there's just not all that much they could pay you. A huge mega-profitable tech company can pay whatever it takes for top talent and also get away with sometimes inefficiently overpaying rest-and-vesters.


Not OP, but 1 engineer -> 10M of benefit sounds right for my company.

In terms of negotiation, it really depends on how differentiated your skills are. Short answer is that if you can convince management that it would be difficult to find other engineers who could deliver the optimizations you're delivering, yes, you have leverage.


This is exactly right about negotiation and your skillset. I have seen performance engineers in the right place at the right time get 10-20% of their benefit to the company (I have seen both $1 million/year compensation for line workers and $10+ million/year for very senior folks).

Very highly skilled engineers in specific niches can basically price themselves like monopolists, because the company can easily figure out how much money they are leaving on the table by not hiring them. This is not like "feature work" engineers, whose value is very nebulous and unknown.


The simple fact is that you are not paid for the value you create. You are paid based on the salary you can demand. For performance engineers, $10 million/year/person opportunities are kind of rare, meaning that you can't demand close to that. Your alternatives to big tech are things like wall street, which pay very well, so you can demand a higher salary (and/or higher level) than a normal engineer of your skill would get. However, this is nowhere near the value of the work.


Disney (the company) has 20x the number of employees as Netflix, and just 2x the market cap (in fact they were briefly worth the same last year), ~2x the revenue and 2/5 the net income. So Netflix is clearly doing something right.


>Disney (the company) has 20x the number of employees

Is that all of Disney or just Disney+?

It doesn't seem like that would be a useful statistic if that includes completely unrelated positions (e.g. does that 20x statistic include Disney employees working at Disney Land/World serving up hotdogs? Because they probably don't contribute much to the streaming service)


Netflix also has production studios they now own making content.


Content like hotdogs at an amusement park?



Disney Streaming had 850 employees as of 2017 [0] (can't find any newer figures); LinkedIn is suggesting 1k-5k.

[0] https://en.wikipedia.org/wiki/Disney_Streaming


Perhaps they are just running different business models?

Walmart's market cap per employee is probably much, much lower than Disney or Netflix, too. That doesn't mean Walmart is doing anything wrong.


I watch Disney content sometimes and it constantly drops or freezes, you can see the difference in quality compared to Netflix.


Yeah, you can totally see the difference. Netflix encoding looks like shit.

I've done a lot of video processing professionally (the server side stuff, exactly what Netflix does) and Netflix is by far the worst of all the streaming providers. They absolutely sacrifice the quality of the video to save bandwidth costs in aggregate and it shows (or more accurately it doesn't show, all the fidelity is lost).


Your isp may be throttling bandwidth for Netflix, leading to lower quality encodings being served to you. Comcast does this, for instance.


Are you getting the best Netflix encodings? You might be getting worse quality because your ISP throttles Netflix.


For anybody looking at this after-the-fact, I'm not making it up. Netflix really does look like shit: https://news.ycombinator.com/item?id=32553958


Do you think it's worth it to pay the extra $5-10/month for premium quality? https://help.netflix.com/en/node/24926


Even the Premium 4k streams have surprisingly low bitrates and, occasionally, framerates. I dug out the blu-ray player the other day and was absolutely shocked how good things looked and, even more so, sounded--the audio quality from Netflix (and most streaming services, really) is simply atrocious.


Of course, but you can't expect 100GB movies for 4K HDR from an online provider.


The Sony Bravia Core service isn't far off that to be honest.


You can’t make such conclusions from your own experience. It is one form of bias. There are many variables. For me it is the opposite, for example.


I wasn't able to watch disney+ via chromecast for like a year in 4k. Stuttering every 10 seconds or so. I never had problems like this with netflix.


I guess you weren't a Comcast customer in 2014 trying to watch Netflix and getting low quality, stuttering video. At the time lots of people tried to frame it as a net neutrality issue but in the end I think it was a peering dispute that involved a third party.

https://www.wsj.com/articles/SB10001424052702304834704579401...


I think this just validates their points. Netflix has more engineers and 8 years of them building and fixing things, so they have fewer issues.


Disney bought a majority ownership in BAMTECH to build Disney+.


That seems like a fair point if you just consider the video streaming. I know that Netflix wants to break into gaming. I'd imagine the bandwidth required for that is higher than streaming videos.


It's really not, especially if you look at their current model for doing so. Netflix at the moment are breaking into mobile gaming, which means the bandwidth requirements are placed on Apple/Google's app store infrastructure. I'd be surprised if Netflix don't have any sort of metrics gathering infrastructure to judge how much people are playing those games, but they're also likely reusing the same infrastructure used by Netflix video streaming for that, so the incremental increase in load may well be negligible.


I was referring to their plans for a game streaming service.


I don't see the point. A centralized data hose is replacing what the internet was designed to be: a decentralized, multi-routed network. The problem may be useful to them, but unlikely to be useful to anyone who doesn't already work there. I dunno, if it were possible to monetize decentralized or bittorrent video hosting, I think it would solve the problem in a more interesting and resilient way. With fewer engineers.

But it's like, every discussion today must end with something about the pay and head count of engineers.


It's funny you mention this. When I worked at Netflix, we looked at making streaming peer to peer. There were a lot of problems with it though. Privacy issues, most people have terrible upload bandwidth from home, people didn't like the idea of their hardware serving other customers, home hardware is flakey so you'd constantly be doing client selection, and other problems.

So it turns out decentralized, multi-routed delivery is not a good solution for video streaming.


Works great for storing pirated content though


Usually you aren't live streaming your pirated content right off other people's boxes. You download it first and then view it. So you don't need every chunk available at just the right time.


Somebody I know (cough) starts torrent downloads in sequential order after downloading the first and last chunk, and then opens the file in VLC while it is downloading.

Works amazingly well for watching something front to back if your download speed is fast enough; you'd never know it wasn't being streamed. The hardest part is finding a good torrent for what you want to watch. Ironically the Netflix catalog is one of the most easily available to pirate, since people rip it directly from the web.


Popcorn Time worked pretty well with just that model; watching more or less immediately as it's downloaded in order from the swarm.


Did you actually use Popcorn time? It got stuck all the time waiting for a chunk. Also, again, people sharing pirated content don't care about privacy and are happy to share their home hardware for other people to use. Paying customers care about that stuff.


I have; it worked flawlessly for content that was decently seeded. And that's without the sorts of table stakes you'd expect for a streaming platform like the same content encoded at different bit rates, but chunked on the same boundaries so you can dynamically change bitrate as your buffer depletes.
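
To make the "chunked on the same boundaries" point concrete, here is a minimal sketch of buffer-based rendition switching; the bitrate ladder and thresholds are invented for illustration, not taken from any real player:

    # Hypothetical ladder: the same content encoded at several rates,
    # all chunked on identical boundaries so switching is seamless.
    RENDITIONS_KBPS = [400, 1200, 3000, 6000]

    def pick_bitrate(buffer_seconds):
        # Choose a rendition from how much playback buffer remains:
        # nearly empty -> lowest rate, comfortably full -> highest rate.
        if buffer_seconds < 5:
            return RENDITIONS_KBPS[0]
        if buffer_seconds < 15:
            return RENDITIONS_KBPS[1]
        if buffer_seconds < 30:
            return RENDITIONS_KBPS[2]
        return RENDITIONS_KBPS[3]

    # The player re-evaluates this before requesting each chunk, so quality
    # only ever changes at a chunk boundary.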

And I'm not sure most people actually care if their home hardware is being used for whatever by the service they're using, or else there'd be pushback on electron apps from more than just HN.

The sense I always got from Netflix's P2P work was that it was heavily tied into the political battles wrt the BS arguments that Netflix should pay for peering with tier 2 ISPs. Did this work there continue much after that problem went quieter?


Used it dozens of times, usually works fine for the popular content.

Good quality, barely any buffering.

The niche content may be too difficult for a "live" streaming experience.


wouldn't a peer-to-peer setup be a non-starter legally? ..or at least incredibly high risk. I could see major ISPs complaining if Netflix is using the upstream side of the ISP's customers for profit.


No, Microsoft is doing the same thing and nobody cares. Just mention it in the small print of the agreement and offer a way to turn it off.


Yes. :)


Recently I see a lot of people with very high upload speeds. Nobody is using them, but nominally they are there.


Sure, very recently. But all the other issues still apply. A real time feed from random people's machines is very difficult at best.


I've watched a lot of HBO (not available here) on Popcorn Time


I understand and even share a little bit of your sentiment, but I'm tired of stretched "X is now not what X was supposed to be".

Strictly speaking, the Internet was supposed to help some servers survive and continue working together despite some others being destroyed by a nuke. That is more-or-less the case today: we see how people use VPNs to route around censorship. Whether you were supposed to stream TikTok videos directly from the phones of their authors or through a centralized data hose - i'm not sure that was ever the grand idea.

Also "decentralized" and "monetize" don't go well together because innovation is stimulated by profit margins and rent-free decentralized solutions by definition have those margins equal to zero (otherwise the solution is not decentralized enough).


While we are at it let's just put video streaming on the blockchain! Who needs all these engineers and servers.


But only seven people can stream at once!


Once you download the chain you can watch anything you want! You'll have a local copy of _everything_


I think nobody said Netflix's infrastructure can be built in a weekend. However, the scale doesn't matter that much after a certain point, once the scaling "wall" has been pierced. If you are a biscuit factory that produces 100'000'000 biscuits per year or 500'000'000 biscuits per year, then the gap between 100M and 500M isn't that impressive anymore, as it's mostly about scaling existing processes. However, if you turn a 1'000 biscuit shop into a 1'000'000 biscuit company then it's very impressive.


Nonsense.

It's still impressive. A 5x increase at that scale can be a phenomenal challenge. Where do you source the ingredients? Where do you build the factories (plural because at that scale you almost certainly have multiple locations in different geographic locales subject to different regulatory structures). Where do you hire the people? How do you manage it? What about the storage and shipping and maintenance of all the equipment and on and on? How much do you do in house how much do you outsource to partners? What happens when a partner goes belly up or can't meet your ever increasing needs?

Your comment is a great example of what the OP pointed out.


My favourite example of these sorts of extreme scaling issues is the fact that McDonald's apparently declined to sell products with blueberries in them because modelling showed they'd have to buy the world's entire supply of blueberries in order to do so.


I thought this was hyperbolic, so I looked into it:

> The menu team comes up with interesting ideas like including kale in salads. The procurement team and suppliers then try to get the menu team to understand the challenges. How do you bring kale to 14,000 restaurants? As one example, when they introduced Blueberry Smoothies in the U.S., McDonald’s ended up consuming one third of the blueberry market overnight.

https://www.forbes.com/sites/stevebanker/2015/10/14/mcdonald...

I couldn't find any other source to back it up, but still wow! That's an absurd number.


So there is an extreme dearth of blueberries I guess compared to other food goods? I mean, McDs isn’t taking over the entire supply of potatoes or chickens for example correct?


I think the point is that the supply chains probably need upwards of years to adapt in some cases; you can’t just turn on a recipe that needs a full cup of blueberries per serving on Monday and expect there to be a spare million cups of blueberries lying around the supply chain on Tuesday.

In the case of animal products, there are almost certainly major operations worldwide that have been built and financed purely to serve McDonald’s demand. They probably even have to build these out well before entering some markets.


They grow a lot of potatoes in the US. Last week I hauled a load of tater tots destined for McDonalds. I’ve hauled potato products for McDonalds quite often.

They raise a lot of chickens in the US. I’ve hauled chicken nuggets or chicken breasts for McDonald’s in the past quite often.

I can’t even tell you where they grow blueberries.


McDonald's sells blueberry muffins


It's the exact opposite.

Taking the software example, you can easily scale from 1 to 100 users on your own machine. You can handle thousands by moving to a shared host. Using off-the-shelf web servers and load balancers will help you serve a million+. From there on you'll have to spend a lot more effort optimizing and fixing bottlenecks to get to tens, maybe hundreds of millions. What if you want to handle a billion users? Five billion? Ten billion? It always gets harder, not easier.

Pushing the established limits of a problem takes exponentially more effort than reusing existing solutions, even though the marginal improvement may be a lot smaller. Getting from 99.9% to 99.99% efficiency takes more effort than getting from 90% to 99%, which takes more effort than getting from 50% to 90%.

You never pierce the scaling wall. It only keeps getting higher.


If you can serve 1K users with 10 employees, you can probably serve 1M users with 10k employees.


And you can birth one baby in 3 months with 3 women, right?

To add something useful besides the snark: first of all, there are hard physical limits, which are sometimes very much in play (you really shouldn’t try to outcompete light speed, for example - relevant in some high-freq trading and infrastructure projects). Then you can increase headcount to any number and you still won’t produce, for example, a better compiler. There are simply jobs that are more “serial” - the only way to win at those is to employ the very best of the field in a small team.


But the goal wasn't better or faster. It was giving more customers the same service. You're talking about a completely different problem.


No, just 3 babies in 9 months.


That's too simplistic. What about the doctors and medical facilities and other supporting infrastructure? What about the baby food and medicine and clothing and supplies and what about the people to take care of the children? You think you can just keep throwing more women at a hospital having babies to infinity and not have any problems?


There's a limit on those things, but it might as well be infinity when you're trying to have 1 baby or 3 babies or 100 babies.

1M users and 10k employees is not in the range where you have crushingly impactful logistics.


You can deliver DVDs to netflix subscribers as well to achieve a much bigger throughput, but I doubt they would be as popular as they are right now :D


Sneakernet!

“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” - Andrew Tanenbaum


That won't help your customer expecting their 'baby' after three months due to the increased mother-workforce ;)


Remember that global productivity usually does not scale with headcount!

Each employee adds some overhead, which requires more employees... which requires more employees.


Sounds like the rocket equation! Perhaps big companies are rocket science?


If you told McDonald's to double their number of McRibs produced next year that would be an incredible challenge to meet. They already sell enough that it affects the global pork market, it'd be insane for them to double their demand for pork. What about other supplies, would this result in a reduced burger demand? How can they ensure they can respond appropriately either way? They probably run near fridge/storage capacity, does increasing this mean they need to also increase storage at restaurants?

That's a 2x increase. Now do it again, and then half again, for a 5x. It's crazy to say there's a "scaling wall" that, once you "pierce" it, makes it easy to scale up. It's the opposite: McDonald's already knows how to supply and sell X McRibs a year; no company has ever sold 5X that many McRibs, so they'd have to figure it out themselves.


There's an old rule of thumb that each order of magnitude increase (10x) brings a whole new set of challenges.

Anecdotally I experienced this when scaling my software product from 1 --> 10 --> 100's --> 1000's etc. of users.

That's not to say 2x can't be a substantial challenge, as you pointed out. It gets harder (and IMO more fun) when you're at the bleeding edge of your industry.


> the gap between 100M and 500M isn't that impressive

This is absolutely not true. The closer you are to peak performance, the harder it is to scale, and the returns diminish heavily. At many major tech companies, there's a huge amount of effort into just 1% - 5% optimizations -- these efforts really require creative thinking and complex engineering (not just "scaling existing processes".) At the volumes these companies operate, even a 1% optimization is quite significant.


Aren't you contradicting yourself?

If you're on 100M users you're probably already scaling horizontally. So adding 5x more hardware shouldn't be a problem.

But when you're at 500M all of a sudden it makes sense to optimize further, since the capital saved will be the same percentage(ish) but now the money is worth people's time.

I know that we don't care particularly about power savings in the DCs I've worked in, because they're relatively small. While bigtech will do all kinds of shenanigans to save a couple watts here and there, because it's worth it across your hundreds of thousands of servers.


In the real world, it's rare to find a non-trivial business whose operational costs scale linearly with its problem size. Take search as an example. If the web grows by 10x, this not only means the # of your users grows by 10x, but also the # of web pages to be indexed, the size of the ranking model, the ratio of unstructured/dynamic content formats, the pace of content updates, the reduced informational density of the overall web, the size of user logs, more advanced SEO (and those are just the tip of the iceberg; there is so much other stuff), etc. Every single dimension scales and stacks up. And you need to handle all of that within a limited latency budget where you cannot really make any compromises.


> So adding 5x more hardware shouldn't be a problem.

If only things were that simple. https://en.wikipedia.org/wiki/Amdahl%27s_law
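
In case the link rots, the core of it in a few lines (p and s here are generic symbols, not numbers from any particular system):

    def amdahl_speedup(p, s):
        # Overall speedup when a fraction p of the work is accelerated by a
        # factor of s and the remaining (1 - p) is left untouched.
        return 1.0 / ((1.0 - p) + p / s)

    # 5x more hardware applied to a system where 10% of the work is serial:
    print(amdahl_speedup(p=0.9, s=5))    # ~3.57x, not 5x
    # Even with unlimited hardware, the ceiling is 1 / (1 - p):
    print(amdahl_speedup(p=0.9, s=1e9))  # ~10x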


Seeing scale issues as purely hardware-bound is incredibly naive. Even in a case like streaming, if you’re pushing more bits through the wire, it’s likely that the increase in usage behind that traffic makes the software systems you have in place to support your service start degrading, and you need to rearchitect them. Very few problems at that scale can be solved by throwing more hardware at them.


Part of it depends on if "build it five more times, again" is a viable strategy.

Building five "Netflixes" with identical content is possible; the amount of content wouldn't change (it would decrease, the cynic says); you just need parallel copies of everything (servers, bandwidth, etc).

The fun would come in syncing usernames, etc through the system.

It's an entirely different class of problem compared to "acquire resource, convert it, sell it".


It's a good point, and I think it's an interesting comparison. Obviously improving by a factor of 1000 is better than improving by a factor of 5. But the absolute improvement is still 4 times larger. 400'000'000 extra biscuits is going to bring a lot more revenue than 999'000 biscuits


> However, the scale doesn't matter that much after a certain point once the scaling "wall" has been pierced.

Sorry, you gotta overhaul the majority of your architecture and its components for every 10x you scale. It's not a single "scaling wall" to break through; it's more of a relentless stream of uphill battles. And this gets even more interesting when you reach the point where there's no prior art for your problem, usually at hundreds of millions of users.


Actually, you're incorrect. Scale problems seem to have quanta. In your example you will have physical issues with ingredients, etc at some point. It might be storage, it might be because you've run out of water. It might be because you've run out of electricity.

Making 5 things and making 5,000 things is as different as making 50,000 things and 1m things. There are always cost constraints at each level, and each design can only go so far.


> the popular "why does Netflix need X thousand engineers, I could build it in a weekend" sentiment that is frequently brought up on forums like this one.

I don't think that's a popular sentiment about Netflix. Twitter, Reddit, Facebook, yes, but Netflix, YouTube, Zoom, not so much.


I don't think this actually answers why Netflix needs so many engineers. This seems like the sort of thing that one or two experienced engineers would spend a year refining, and it would turn out like this.

This is the sort of impressive work that I've never seen scale.


Author here... Yes, most of this work was done by me, with help from a handful of us on the OCA kernel team at Netflix (and external FreeBSD developers), and our vendor partners (Mellanox/NVIDIA).

With that said, we are standing on the shoulders of giants. There are tons of other optimizations not mentioned in this talk where removing any one of them could tank performance. I'm giving a talk about that at EuroBSDCon next month.


Thank you for sharing!


> "why does Netflix need X thousand engineers, I could build it in a weekend"

Believe it or not, I was at a company doing web file streaming in 2009 using Nginx, sendfile and SSL offloading on the NIC.

It was installed by one dude. A standard Linux distro, standard kernel and no custom software. Just compile the SSL offloading kernel module once.
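
For anyone curious what the core of that looks like, here is a minimal sketch of the zero-copy sendfile() pattern (Python on Linux just for brevity; the setup described above used nginx's "sendfile on;" plus an SSL offload module rather than hand-written code):

    import os, socket

    def serve_file_once(path, port=8080):
        # Accept one connection and push the file with sendfile(2): the kernel
        # copies file pages straight to the socket, never through userspace.
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % size)
            sent = 0
            while sent < size:
                sent += os.sendfile(conn.fileno(), f.fileno(), sent, size - sent)
        conn.close()
        srv.close()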


And how many concurrent users did you have?


It was about large file streaming, so bandwidth was the bottleneck, not the number of users. What matters is that the configuration was good enough to use the NICs at 100%.


Yeah, that's perhaps nice (and hopefully moderately interesting) enough for hobbyist work. Good luck multiplying the scale by 1,000,000 across many dimensions.


Completely wrong. The configuration I described was replicated on different servers and they scale linearly, obviously. It was saturating the NICs and therefore would have been good enough for Netflix at the time.

It's strange that you assumed otherwise.


> why does Netflix need X thousand engineers, I could build it in a weekend

I would like to hope nobody asks that. Video is one of the, if not the, hardest data-plumbing use cases on the internet.


I'd say realtime communications is harder.

A lot of these tricks being discussed here cannot be applied to Skype calls.


Surely GP would agree, unless you mean even audio-only calls? Otherwise it's just an extra requirement(s) on top of 'video'.


Streaming pre-recorded video and streaming realtime video are almost entirely different use cases.

Pre-recorded video streaming is, under the hood, really just a high-volume variant of serving up static web pages. You have a few gigabytes of file to send from the server it's stored on to the device that wants to play back the video. As this presentation demonstrates that isn't trivial at scale, but the core functionality of sending files over the internet is what it was designed to do from day one. Because you can generally download video across the internet faster than it can be played back, it's possible to build up a decent-sized buffer which allows you to paper over temporary variance in network performance without the customer noticing.

Realtime video streaming has two variants. One to many Twitch style video streaming is relatively simple, since you can encode video into files and upload them to a server for people watching to download those files. This is how HLS streaming works, and most of the techniques Netflix use to optimise video delivery can also be applied here at the cost of adding latency between the event being streamed and people consuming it. That latency will often sit at about 30 seconds, and people generally find that acceptable.

Skype style realtime video streaming is much harder. You're taking video from one person's camera, and then sending it over the internet to one or more people's device. You can't do any sort of pre-processing on that, or stage the video on servers closer to the consuming users, because you have no way of generating that video until the point your users decide to start talking to each other. Because you can't pre-stage that video you need to be able to establish a network route between the people on a call, potentially in an environment where none of the participants have any open connection from the internet directly to the device they're streaming from. Slight fluctuations in network performance can potentially degrade video delivery to the point of it being unusable. The most common route to deal with that is systems that attempt to establish a direct connection (ideally over a local network) between participants, and if that doesn't work going via relay servers operated by the software provider. These servers provide a single point on the internet all parties can connect to, and then allow passing packets as if they were all on the same network.


> Skype style realtime video streaming is much harder.

I agree, but since some optimization is theoretically impossible, perhaps it can be said that the optimizable area is much smaller than for other services. YouTube (many, many videos, high quality, many livestreams, and VR support) seems to be the most difficult service to fully optimize.


The amount of transcoding needed to get a conference call up hurts my brain. If 20 people are talking on Skype, the server needs to receive those 20 streams, decode them, mix the audio together, recode the streams, and then broadcast it back out to all 20 people.
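
A toy sketch of just the mixing step, to show where the per-participant cost comes from (decode/encode below are hypothetical placeholders; real servers work on compressed codecs like Opus and often sidestep full mixing with selective forwarding):

    def mix_frames(frames, exclude_index):
        # Sum decoded 16-bit PCM frames from every participant except one
        # (you don't send people their own voice back), clipping to int16 range.
        length = max(len(f) for f in frames)
        mixed = [0] * length
        for i, frame in enumerate(frames):
            if i == exclude_index:
                continue
            for j, sample in enumerate(frame):
                mixed[j] += sample
        return [max(-32768, min(32767, s)) for s in mixed]

    # For a 20-person call, roughly every 20ms the server does something like:
    #   frames = [decode(s) for s in incoming_streams]                 # 20 decodes
    #   outputs = [encode(mix_frames(frames, i)) for i in range(20)]   # 20 mixes + 20 encodes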

I'm not a telecommunications guy, but I had some professors back in college explain how difficult and fundamental the research of "ma-bell" was from the 60s through 80s. I'm talking Erlang, C++, CLOS circuits, etc. etc. The innovations from Bell Labs are nearly endless.

Telephone communications is one of the biggest sources of fundamental comp. sci research over the 1950s through 2000s.


See also Claude Shannon's trifecta of compression, error correcting codes, and cryptography, which also came out of the study of telecommunications.

Compression removes redundancy from your message. Error correcting codes introduces carefully controlled redundancies, so your message survives transmission errors, so your recipient can read it. And cryptography deals with 'scrambling' your message, so that no one else can read it (and also deal with authentication etc).


> I'm talking Erlang, C++, CLOS circuits, etc. etc. The innovations from Bell Labs are nearly endless.

A lot of innovations did come out of Bell Labs. But I'm pretty sure Erlang wasn't one of them.


Oh, you're right. There seems to have been a glitch in my memory somehow. Still, it's Ericsson, which is telecommunications nonetheless.


> Otherwise it's just an extra requirement(s) on top of 'video'.

I'm not sure what you mean? Real time communication, both video and audio-only, have much lower latency requirements. You can't just buffer ahead when you have some spare bandwidth, like Netflix or YouTube can.


Yeah that's what I'm kind of facetiously calling 'just an extra requirement'.

My point was intended to be that there are the same challenges and more - but it's not something I've thought about in depth (and certainly not had to work with). It maybe wasn't a very good characterisation, because it's not the same on the serving side either - there's no large file to serve, since at the start of the call it doesn't exist yet - so perhaps I take it back.


Video calling is "easier" in that the p2p option is workable for some variant of "Works".

It is much much harder because you can't do the "cache everything on the edge" solution. If storage was infinitely cheap and small, Netflix could run their entire business by sending a USB stick with every single movie/tv show they have on it encrypted to you, and everything would play locally. This is basically what they do with their edge servers/CDNs.

You can't do that with video calls, because the video/audio didn't exist 1 millisecond ago.


Yeah, jon-wood really did a great write-up of the challenges involved.

In any case, it's hard to say what the 'greatest' engineering challenge is. You can make almost any kind of engineering really challenging, if you (or the market..) sets yourself a very low cost-ceiling.


To be pedantic, scaling by itself isn't that difficult.

Scaling cost-effectively is.


Yes, and No. At some point, even scaling at all would be hard.

(Just like sending a human to Alpha Centauri is hard, even if you had unlimited funds.)


Eh, sending a human to Alpha Centauri wouldn't be that hard... Although it would be difficult to know for sure if they arrived, and for ease of transport, you may want to send a dead human.


You are joking, but even sending 10kg to Alpha Centauri to arrive within a century (or at all) is something that has never been done.


Arriving within a century is a significant new requirement! But yeah, we haven't done it at all. We've gotten 5 objects to escape velocity to leave the solar system though, so it's not a huge leap to chuck one at Alpha Centauri instead of wherever we sent them... Of course those things aren't going very fast; the internet says Voyager 1 would take 74,000 years or so to make it to Alpha Centauri at its current speed if that's where it were heading. That's a really long time, so it's not super worthwhile to do. You might be able to design a system to launch a 10kg object from an interstellar probe after it reaches the heliopause to get a higher velocity, but it's still going to take approximately forever to get there... And I don't know that we can track our interstellar probes if they're not sending radio signals anyway? So chuck it at Alpha Centauri and make a website that counts down to the day it gets there, because nobody is going to know if it actually made it and you'll be dead anyway.
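
The back-of-the-envelope version of that figure, with rough public numbers for the distance and Voyager 1's speed (so it only agrees with the quoted 74,000 years to within a few thousand):

    LIGHT_YEAR_KM = 9.461e12
    distance_km = 4.37 * LIGHT_YEAR_KM   # Alpha Centauri is roughly 4.37 light-years away
    voyager_km_s = 17.0                  # Voyager 1 is coasting at roughly 17 km/s
    years = distance_km / voyager_km_s / (365.25 * 24 * 3600)
    print(f"{years:,.0f} years")         # on the order of 77,000 years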


Like it how? Accomplishing a grand feat is nearly the opposite of scaling.

If Netflix built out more, slower servers, that would be acceptable scaling. I don't see any plausible scenario where that becomes too difficult, even if they had billions of subscribers.


There's lots of scaling involved in grand feats.

Eg enriching a tiny bit of uranium is 'relatively' easy. Enriching enough of the stuff to build an atomic bomb is a question of scaling. And, yes, in principle you can just use more and more centrifuges etc.

Humanity has already built some small infrastructure off-earth, like the ISS or satellite networks. Also various unmanned probes that toiled away for years in space and on Mars.

Building a habitat big enough to plausibly keep humans alive for the decades or more it would take to get to Alpha Centauri would involve enormous scaling efforts. And that's not even touching on propulsion, yet.

However, yes, grand feats require more than just scaling.


> Eg enriching a tiny bit of uranium is 'relatively' easy. Enriching enough of the stuff to build an atomic bomb is a question of scaling. And, yes, in principle you can just use more and more centrifuges etc.

But we're not scaling for scaling's sake. We're scaling with the number of customers.

If it was profitable to enrich some milligrams by hand, then you'd be able to get more scientists and engineers and better equipment and automate some parts and make even more profit. If you go the centrifuge route, it's not because you had to do the harder methods, it's because it's massively cheaper per gram of product.

Such a scenario is very different from "we can't do it at all, and we need to build complex infrastructure just to make it possible".


It depends entirely on the problem domain. Sure, it is more of a devops problem when the problem is trivially parallelizable, but often you have a bottleneck service (e.g. the database) that has to run on a single machine. It doesn't matter how many instances serve the frontend* if every call has to pass through that single machine.

* after a certain scale


Tell that to e.g. Tesla.

From what I've read, they burned a lot of money and had large problems scaling nevertheless. Which I don't find too surprising - not because they are unable, but because it isn't easy to scale.

From my experience and from what I've read, scaling people by roughly a power of ten is a large change in an organisation and therefore likely a challenge. For _any_ technical process the boundaries might not be strictly a power of ten, but I would say that scaling by a power of a hundred is a challenge if that value has not already been reached in any process in your organisation.


True.

Scaling to - say - Paramount+ size should not be difficult if you're willing to pay AWS / Azure / GCP 10-100x what it would cost to serve it yourself (which in many cases actually makes sense).

It's possible that at Netflix's size they couldn't just run on AWS anymore. Though, given enough lead time and a realistic growth curve, I'm sure it's feasible.

Obviously scaling manufacturing is not a solved problem like (realistically) scaling network and compute usage.


Serving Netflix streaming traffic from AWS would be… unwise. For one, the bandwidth cost would be enormous even if they could handle it. And two, I doubt they can handle that much traffic.


> Building software and productionalizing/scaling it are two very different problems, and the latter is far more difficult.

Is this claim based on some example I should know? Countless companies never achieve product/market fit, but very few I can think of fail because they weren't able to handle all their customers.


And maybe it can help stem asinine "System Design" interviews like "Design Netflix".

I call them asinine because these kinds of architectures aren't written out in one day (let alone 60 minutes), and it's silly to not build an evolving setup (unless you are replacing a legacy system in a company with scale, but then again you have more than 60 minutes...). I wish system design interviews would not just test whether you know how to write high-scale designs, but whether you know how to build tiny designs that minimize footprint before scale and give maximum flexibility for scaling when it's indicated...


Lmao. _But it does_. At least for the most part.

It's such a previous generation thing to be angry at the way that modern development is done. Of _course_ dev is now web heavy and people are pulling in all sorts of libraries to make their lives easier.

But pretending that those same devs wouldn't also be capable of development on lower/less abstracted levels (well sure, maybe not _all_ of them) is insulting.


I'm not sure this particular presentation helps your point though? I sifted through it and if anything I was struck with how simple it seemed? I'm sure there's more to running Netflix though and in my mind they're allowed to have as many engineers as they see fit.


Sure, if you place yourself in an arbitrarily hard problem, it takes a lot to solve it. "How we dug a 100m pit without using machines in 2 days" is an incredible feat, but the constraints only serve those who put them.

Serving large content has been solved for decades already. It's much easier and more reliable to serve from multiple sources, each at its maximum speed. Want more speed? Add another source. Any client can be a source.

Netflix artificially restrains itself by only serving from their machines. It is a very nice engineering feat, but is completely artificial. As a user it feels weird to think of them highly when they could just have gone the easier road.


This just isn't true though. I worked at a relatively minor video streaming company and we overloaded and took down AWS CloudFront for an entire region. They refused to work with us or increase capacity because the datacenter (one of the ones in APac) was already full. This was on top of already spreading the load across 3 regions. We only had a few million viewers.

We ended up switching to Fastly for CDN. There's something hidden here though that becomes a problem at Netflix's size. We were willing to pay the cloud provider tax, and we didn't dig down into kernel-level or storage optimizations because off-the-shelf was good enough. At Netflix's scale, that adds up to millions of extra server hours you have to pay for if you don't do the 5% optimizations outlined in the article.


You still have the same constraints: only you can serve content.

The solution I'm talking about is bittorrent. The more people watch your content, the less your servers bear load. That is using the internet to its best potential instead of reverting back to the centralized model of the big shopping mall and its individual users.


Plenty of people have tried to build a peer to peer streaming system. Amazon tried for years.

They work like garbage in practice for mainstream users.


The constraint is profit. Sure, with unlimited money you can just keep getting more and more servers. But that costs money. It would end up swamping any profit to be made.

By creating this optimized system, it makes serving that much video profitable.


No, the constraint is that only you serve content. But once the content is distributed, anyone else can also distribute it.


And break the profit and also probably legal constraints. Good job now you don’t have a company anymore.


I'm curious as to who you think would pay for the video if anyone could distribute it and watch it.


How would you do it if you had much more modest scale requirements? Say a few thousand simultaneous viewers. I'm kicking around an idea for a niche content video streaming service, but I don't know much about the tech stacks for it.


Use bittorrent. Every viewer is also a source. The more people watch, the less your servers are loaded.

Bittorrent is built towards "offline" viewing. Try Peertube for a stack that is more built for streaming and has bittorrent sharing built-in (actually webtorrent, because the browser doesn't speak raw TCP or UDP, but the idea is the same)


A few thousand?

Just use Nginx and a backend lang of your choosing.


Not even bother using a CDN?


For low-traffic niche content that might not be a cache hit in the first place in every region?

I wouldn't bother. Unless you use storage at the CDN - which is probably not very cost-effective for you.


Seems to be mixing too many things here. Many scaling / hardware challenges need a lot of people, but it can still be true that Netflix is chock-full of engineers making half-assed turd Java frameworks day in and day out. I know this because we are forced to use these crappy tools; since they are made by Netflix they are supposed to be the best.

It's just that they succeeded in the streaming market with low competition, and great success brings in a lot of post facto justification of how outrageously great Netflix's tech infra is.

I mean, it may be excellent for their purpose, but to think their solution can be replicated industry-wide seems not true to me.


Que? You don't seem to have much justification for your points; it seems more like a rant as you have had a bad experience using software provided by Netflix. It would be great if you could provide more details about what was wrong with it rather than just "we are forced to use these crappy tools". I'm genuinely interested.

In my personal experience lots of companies (admittedly all large companies, but many of which sell their services / software / hardware to smaller companies) have a use for serving hundreds of Gbps of static file traffic as cheaply as possible. And the slides for this talk seem exactly on the money (again from my experience slinging lots of static data to lots of users).


So Netflix published a framework which seemingly isn't suitable for your use case, your managers forced you to use it, and your response is to blame...Netflix?


That's hilarious. To me it reinforces how few resources you need to run an operation like Netflix. The hardware is brilliantly fast. The more engineering you do, the slower it gets. Engineering is an aristocratic tradition. What you see today is an imitation of that, done badly, which Netflix tries to circumvent.


Scaling streaming for a company the size of Netflix is very easy. You can use any edge cache solution, even homemade. The complexity at N seems to stem from other things.


>You can use any edge cache solution

Umm, those solutions exist (from places like AWS and Azure) because Netflix was able to do it without them. The cloud platforms recognized that others would want to build their own streaming services, so they built video streaming offerings.

You have the cart in front of the horse. The out-of-the-box solutions of today don’t exist without Netflix (and YouTube) building a planet scale video solution first.


N had problems in US because they served data from CA. Today, N uses edge caching and the data for me in Europe is sent less than 10km to my home. And it should be cheap. We are talking about serving static content here. It is not very difficult.


Why do you think Netflix served out of California? They only did that for the first few months, until they adopted Akamai, Limelight, and L3 CDNs. That was long before Netflix launched in Europe.


Well, they used to; they tried to bully various ISPs into increasing their throughput before they jumped on the edge cache wagon, a long time after competitors. Akamai is a stellar company. I don't think N uses A's services today. At the end of the day, N mostly serves static content to users and I highly doubt that hardware cost is a very relevant parameter.


With all due respect, you have no idea what you're talking about. I worked there during the transition from 3rd party CDNs to OpenConnect. We got off 3rd party CDNs in 2013/4 and operated solely out of OpenConnect, in large part because no 3rd party CDN was capable of serving our amount of video at any price, including Akamai. We weren't even streaming out of our own datacenter anymore by the time I started, and that was when streaming was still free with your DVD plan.

And your timeline is all wrong too. Netflix didn't even engage with the ISPs about bandwidth until long after moving out of our own datacenter. We started the OpenConnect program specifically to make it easier for ISPs, there was no bullying. The spat you're thinking of is that Comcast didn't want to adopt the OpenConnect but also didn't want to appropriately peer with other networks to give their customers the advertised speeds.

And hardware cost is a hugely relevant parameter. Being efficient with hardware is the difference between profitable streaming at that scale and not profitable.


You mean all the heat maps provided by Comcast and so on from 2014(?) are incorrect? That they lied about all the traffic from CA caused by N?


Please link those heat maps. I think you're reading them wrong.


> any edge cache solution

Someone still has to do the R&D for that edge cache, though. These slides are about Open Connect - their own edge cache solution that gets installed in partners' racks (i.e. ISPs and exchanges). Before the things Netflix and Nginx implemented in FreeBSD, hardware compute power was wasted on the various issues they discuss in the slides.

Yes, you can throw money at the problem and buy more hardware.


Fair. Point taken. I answered the comment not the article.


This is exactly the type of comment OP is referring to. Have you built a streaming service at this scale? Do you actually know what's involved? Or are you just looking at the surface level, making a bunch of assumptions and reaching a gut-feel conclusion?


The gut feeling is that you basically need a file share that serves static files 5-50MB in size, which are joined together on the client side.

How do you think this compares in complexity with real-time distributed transactions that span several financial partners across the globe?


I have some experience serving static content and working with CDNs. Here is what I find interesting / unique here:

- They are not using the OS page cache or any memory caching; every request is served directly from disks. This seems possible only when requests are spread between many NVMe disks, since a single high-end NVMe like the Micron 9300 PRO has a max read speed of 3.5GB/s (or 28Gbps) - far less than 800Gbps. Looks like it works OK for long-tail content, but what about new hot content everybody wants to watch on the day of release? Do they spread the same content over multiple disks for this purpose?

- Async I/O resolves the issue of an nginx process stalling on a disk read, but only after you've already opened the file. Depending on the FS / number of files / other FS activity and directory structure, opening the file can block for a significant time, and there is no async open() AFAIK. How do they resolve that? Are we assuming the i-node cache contains all i-nodes and open() time is insignificant? Or are they configuring nginx with a large open file cache?

- TLS for streamed media became necessary because browsers started to complain about non-TLS content. But that makes things sooo complicated, as we see in the presentation (kTLS is 50% of CPU usage before moving to encryption offloaded to the NIC). One has to remember that the content is most probably already encrypted (DRM); we just add another layer of encryption / authentication. TLS for media segments makes so little sense IMO.

- When you rely on encryption or TCP offloading by the NIC, you are stuck with what is possible with your NIC. I guess no HTTP/3 over UDP or fancy congestion control optimization in TCP until the vendor somehow implements it in the hardware.


Responding to a few points. We do indeed use the OS page cache. The hottest files remain in cache and are not served from disk. We manage what is cached in the page cache and what is directly released using the SF_NOCACHE flag.

I believe our TLS initiative was started before browsers started to complain, and was done to protect our customer's privacy.

We have lots of fancy congestion optimizations in TCP. We offload TLS to the NIC, *NOT* TCP.
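
(For readers who haven't met the flag: a minimal sketch of what passing SF_NOCACHE to FreeBSD's sendfile(2) looks like from userspace. This is not Netflix's code; the function and descriptor names are made up.)

  #include <sys/types.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  #include <err.h>
  #include <stdint.h>

  /*
   * Send one chunk of a media file to a client socket with sendfile(2).
   * SF_NOCACHE hints that the pages backing this range should be released
   * after the send rather than kept in the page cache, so cold long-tail
   * reads don't evict hotter content.
   */
  static void
  send_chunk(int media_fd, int client_sock, off_t offset, size_t len)
  {
          off_t sent = 0;

          if (sendfile(media_fd, client_sock, offset, len, NULL, &sent,
              SF_NOCACHE) == -1)
                  warn("sendfile: only %jd bytes queued", (intmax_t)sent);
  }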


Can I ask whether your whole content catalog can be stored on a single server (so content is simply replicated everywhere), or whether there is some layer above that directs requests to the specific group of servers storing the requested content? I assume the described machine is not just part of a tiered cache setup, since I don't think nginx is capable of complex caching scenarios.


No, the entire catalog cannot fit on a single server.

There is a Netflix Tech Blog from a few years ago that talks about this better than I could: https://netflixtechblog.com/content-popularity-for-open-conn...


> We offload TLS to the NIC, NOT TCP.

How is this possible? If TCP is done on the host and TLS on the NIC, won't the data need to pass through the CPU? But the slides show the CPU fully bypassed for data.


The CPU gets the i/o completion for the read, and is in charge of the ram address where it was stored, but it doesn't need to read that data...

Modern NICs use packet descriptors that allow you to more or less say take N bytes from this address, then M bytes from some other address, etc to form the packet. So the kernel is going to make the tcp/ip header, and then tell the nic to send that with the next bytes of data (and mark it for TLS however that's done).
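
To make that concrete, here's a purely illustrative sketch (not any real NIC's descriptor layout; all names are invented): the header fragment comes from kernel memory, the payload fragments point straight at page-cache pages, and a flag asks the NIC to run its TLS engine over the payload on the way out.

  #include <stdint.h>

  /* One piece of the packet: where to DMA from and how many bytes. */
  struct tx_segment {
          uint64_t dma_addr;      /* bus/IOVA address of this fragment */
          uint32_t length;        /* bytes to pull from dma_addr */
  };

  /* Illustrative transmit descriptor: header fragment + payload fragments. */
  struct tx_descriptor {
          struct tx_segment segs[4];      /* e.g. TCP/IP header + payload */
          uint8_t           nsegs;        /* how many entries are valid */
          uint8_t           flags;        /* e.g. TX_F_TLS below */
          uint32_t          tls_rec_off;  /* offset into current TLS record */
  };

  #define TX_F_TLS 0x01   /* encrypt the payload fragments in the NIC */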


A Micron 9300 Pro is getting rather long in the tooth. They are using PCIe gen 4 drives that are twice as fast as the Micron 9300.

My own testing on single socket systems that look rather similar to the ones they are using suggests it is much easier to push many 100 Gbit interfaces to their maximum throughput without caching. If your working set fits in cache, that may be different. If you have a legit need for sixteen 14 TiB (15.36 TB) drives, you won't be able to fit that amount of RAM into the system. (Edit: I saw a response saying they do use the cache for the most popular content. They seem to explicitly choose what goes into cache, not allowing a bunch of random stuff to keep knocking the most important content out of cache. That makes perfect sense and is not inconsistent with my assertion that hoping a half-TiB cache will automatically do the right thing with 224 TiB of content is optimistic.)

TLS is probably also to keep the cable company from snooping on the Netflix traffic, which would allow the cable company to more effectively market rival products and services. If there's a vulnerability in the decoders of encrypted media formats, putting the content in TLS prevents a MITM from exploiting that.

From the slides, you will see that they started working with Mellanox on this in 2016 and got the first capable hardware in 2020, with iterations since then. Maybe they see value in the engineering relationship to get the HW acceleration that they value into the hardware components they buy.

Disclaimer: I work for NVIDIA who bought Mellanox a while back. I have no inside knowledge of the NVIDIA/Netflix relationship.


Just from reading the specs (i.e. real-world details might derail all of this):

https://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2

Given one can specify arbitrary offsets for sendfile(), it's not clear to me that there must be any kind of O(k > 1) relationship between open() and sendfile() calls: As long as you can map requested content to a sub-interval of a file, you can co-mingle the catalogue into an arbitrarily small number of files, or potentially even stream directly off raw block devices.


Does the encryption in DRM protect the metadata?


AFAIK no. The point of DRM is to prevent recording / playing the media on a device without a decryption key (authorization). So the goal is different from TLS, which is used by the client to ensure the content is authentic, unaltered in transmission and not readable by a man-in-the-middle.

But do we really need such protection for a TV show?

"Metadata" in HLS / DASH is a separate HTTP request which can be served over HTTPS if you wish. Then it can refer to media segments served over HTTP (unless your browser / client doesn't like "mixed content").


> But do we really need such protection for a TV show?

DRM may be mandated by the content owners. TLS gives Netflix customers privacy against their ISP snooping what they're watching.


> But do we really need such protection for a TV show?

What you watch can be a very private thing, especially for famous people.


No, and it doesn't protect the privacy of the viewer either!


FWIW, neither does the TLS layer: because the video is all chunked into fixed-time-length segments, each video causes a unique signature of variable-byte-size segments, making it possible to determine which Netflix movie someone is watching based simply on their (encrypted) traffic pattern. Someone built this for YouTube a while back and managed to get it up to like 98% accuracy.

https://www.blackhat.com/docs/eu-16/materials/eu-16-Dubin-I-...

https://americansforbgu.org/hackers-can-see-what-youtube-vid...
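
(Toy sketch of the matching step described above - not the researchers' code, all names invented: treat each title as a sequence of encrypted segment sizes and find the catalog entry closest to what was sniffed on the wire.)

  #include <stddef.h>

  /* Sum of squared differences between two segment-size sequences. */
  static double
  seq_distance(const long *a, const long *b, size_t n)
  {
          double d = 0.0;
          for (size_t i = 0; i < n; i++) {
                  double diff = (double)(a[i] - b[i]);
                  d += diff * diff;
          }
          return d;
  }

  /* Return the index of the catalog title closest to the sniffed trace.
   * Assumes titles >= 1 and that every fingerprint has at least n entries. */
  static size_t
  identify(const long *sniffed, size_t n, const long *catalog[], size_t titles)
  {
          size_t best = 0;
          double best_d = seq_distance(sniffed, catalog[0], n);

          for (size_t t = 1; t < titles; t++) {
                  double d = seq_distance(sniffed, catalog[t], n);
                  if (d < best_d) {
                          best_d = d;
                          best = t;
                  }
          }
          return best;
  }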


Did TLS 1.3 fix this with content length hiding? Doesn't it add support for variable-length padding that could prevent the attacker from measuring the plaintext content length? Do any major servers support it?


A lot of the reason they've had to build most of this stuff themselves is that they decided for some reason to use FreeBSD.

The NUMA work they did: I remember being in a meeting with them as a Linux developer at Intel at the time. They bought NVMe drives (or were saying they were going to buy NVMe drives) from Intel, which got them access to "on the ground" kernel developers and CPU people from Intel. Instead of talking about NVMe they spent the entire meeting asking us about how the Linux kernel handles NUMA and corner cases around memory and scheduling. If I recall correctly, I think they asked if we could help them upstream BSD code for NVMe and NUMA. I think in that meeting there was even some L9 or super high up NUMA CPU guy from Hillsboro they somehow convinced to join.

The conversation and technical discussion were quite fun, but it was sort of funny to us at the time that they were having to do all this work on the BSD kernel that had been solved years ago for Linux.

Technical debt I guess.


Netflix tried Linux. FreeBSD worked better.


It's hard to believe. In 2022, Google, Amazon, FB, etc. all use Linux, all the big CDNs use Linux as well, and some services serve even more traffic than Netflix (YouTube). "BSD is faster than Linux" is a myth; the fact that 99% of those run on Linux means more people have worked on those problems, which means it's most likely faster.

The funny thing is that the rest of Netflix runs on Ubuntu; only the edge CDN runs on BSD.


Disclaimer: SRE at Google on a team vaguely related to video CDN stuff, but have no inside knowledge

I don’t think you can dismiss BSD faster than Linux (or make any claims about the relative speed of different OSes) just because big companies run Linux. There are other costs involved and optimisations that can be shared if your edge serving stack is as similar as possible to the non-edge serving stack (that you have many more engineers developing for).

All you can conclude is that with enough optimisation, Linux can be made to perform well enough for it to not be worth replacing (yet). Because replacing Linux would require replicating all the custom software and optimisations made to it for whatever other platform you pick.


You'd be surprised how many businesses run FreeBSD and keep it a secret as a competitive advantage.


*At the time when they created the OCA project.

If someone was going to do a similar comparison now the results could be different.


By some definition of better.


It worked faster. It's a common misconception among newbies that "Linux has NUMA" automatically means it will use NUMA properly in a given workload. What it actually means is you _should_ be able to use the existing functionality. Sometimes you'll only need to configure it, sometimes you'll need to reimplement it from scratch, and doing that in FreeBSD is easier because there's less bloat.


I still don't get the NUMA obsession here. It seems like they could have saved a lot of effort and a huge number of powerpoint slides by building a box with half of these resources and no NUMA: one CPU socket with all the memory and one PCIe root complex and all the disks and NICs attached thereto. It would be half the size, draw half the power, and be way easier to program.


This is a testbed to see what breaks at higher speed. Our normal production platforms are indeed single socket and run at 1/2 this speed. I've identified all kinds of unexpected bottlenecks on this testbed, so it has been worth it.

We invested in NUMA back when Intel was the only game in town, and they refused to give enough IO and memory bandwidth per-socket to scale to 200Gb/s. Then AMD EPYC came along. And even though Naples was single-socket, you had to treat it as NUMA to get performance out of it. With Rome and Milan, you can run them in 1NPS mode and still get good performance, so NUMA is used mainly for forward looking performance testbeds.


Modern CPUs like the AMD EPYC server processor are "always NUMA", even in single-socket configurations!

They have 9 chips on what is essentially a tiny, high-density motherboard. Effectively they are 8-socket server boards that fit in the palm of your hand.

The dual-socket version is effectively a 16-socket motherboard with a complex topology configured in a hierarchy.

Take a look at some "core-to-core" latency diagrams. They're quite complex because of the various paths possible: https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-di...

Intel is not immune from this either. Their higher core-count server processors have two internal ring-bus networks, with some cores "closer" to PCIe devices or certain memory buses: https://semiaccurate.com/2017/06/15/intel-talks-skylakes-mes...


If you are buying servers at scale the costs will certainly add up vs. buying two processors. If you buy single proc servers, that is double the amount of chassis, rail kits, power supplies, power cables, drives, iLO/iDRAC licenses, etc.


You can build motherboards with two or more completely isolated sets of CPU and memory, that are physically compatible with standard racks etc.


Good point, I forgot about those. It would be interesting to see if 1x PowerEdge C6525 with four single processor nodes is cheaper than 2x Dell R7525 servers. The C6525 does support dual processor, so it does seem a bit wasteful to me.


Can you buy non NUMA mainstream CPUs though? Honest question because I’d love to be rid of that BS too


NUMA is an outcome of system configuration. You can make a non-NUMA platform using any CPU. You just limit yourself to 1 CPU socket.

Here's a Facebook engineering blog post about how they left NUMA behind. https://engineering.fb.com/2016/03/09/data-center-engineerin...


> You can make a non-NUMA platform using any CPU. You just limit yourself to 1 CPU socket.

Well, not on Epyc generation 1. Those have four NUMA segments in each socket.

Also those Xeon Platinum 9200 processors Intel made as an attention grab.


EPYC Naples wasn't good for much of anything though, so I am trying to forget it.


Is NUMA a solved issue on Linux? Correct me if I am wrong, but I was under the impression it may be better handled under certain conditions; NUMA, the problem in itself, is hardly solved.


Maybe Brendan Gregg can further enlighten his new coworkers at Intel why Netflix chose both AMD & FreeBSD.


The OpenConnect team at Netflix is truly amazing and lots of fun to work with. My team at Netflix partnered closely with them for many years.

Incidentally, I saw some of their job posts yesterday. If you think this presentation was cool, and you want to work with some competent yet humble colleagues, check these out:

CDN Site Reliability Engineer https://jobs.netflix.com/jobs/223403454

Senior Software Engineer - Low Latency Transport Design https://jobs.netflix.com/jobs/196504134

The client side team is hiring, too! (This is my old team.) Again, it's full of amazing people, fascinating problems, and huge impact:

Senior Software Engineer, Streaming Algorithms https://jobs.netflix.com/jobs/224538050

That last job post has a link to another very deep-dive tech talk showing the client side perspective.


>Senior Software Engineer - Low Latency Transport Design

I am not a Netflix subscriber, but I don't think Netflix does live streaming for much of anything, if at all.

The low-latency focus seems to suggest Netflix is exploring this idea, which could be live sports or some other shows.


I am curious why they manually split the video and compress individual clips with different bit rates. Encoders usually have a variable-bit-rate option - doesn't that do the same?


That's the essence of adaptive streaming - the player continuously selects a video bitrate that can be downloaded quickly enough given the network conditions. That requires multiple video bitrates being available, and each one must be encoded such that the player can switch between them at known cut points.
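
A toy sketch of that selection logic (not Netflix's algorithm; the ladder values and safety factor below are made up): pick the highest rung of the encoding ladder that fits inside the recently measured throughput, and only switch streams at the aligned cut points.

  #include <stddef.h>

  /* Hypothetical bitrate ladder in kbit/s; real ladders are per-title. */
  static const int ladder_kbps[] = { 235, 750, 1750, 3000, 5800, 16000 };

  static int
  pick_bitrate(double measured_kbps, double safety_factor)
  {
          int choice = ladder_kbps[0];    /* always have a lowest fallback */

          for (size_t i = 0; i < sizeof(ladder_kbps) / sizeof(ladder_kbps[0]); i++)
                  if (ladder_kbps[i] <= measured_kbps * safety_factor)
                          choice = ladder_kbps[i];
          return choice;  /* caller switches streams at the next cut point */
  }

For example, pick_bitrate(10000.0, 0.7) would return 5800, while a drop in measured throughput to 5000 kbit/s would push the next segment request down to the 3000 kbit/s stream.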


I see netflix has an office in Toronto, but the jobs are "remote, US". Any idea if remote is also an option for canadians?



I found the same video on the website of the summit: https://nabstreamingsummit.com/videos/2022vegas/

I’m on mobile and there does not seem to be a direct link. Search for: “Case Study: Serving Netflix Video Traffic at 400Gb/s and Beyond”


Video of this presentation available here: https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa...


thank you!


And this is still on ConnectX-6 Dx; with PCIe Gen 5 and ConnectX-7, Netflix should be able to push for 1.6Tbps per box. This will hopefully keep drewg123 and his team busy for another year :P


At that point, RAM itself would likely be the bottleneck.

But maybe DDR5 will come out by then and get this team busy again lol.


Genoa does indeed have roughly double the memory bandwidth.


This is amazing work from the Netflix team. I'm looking forward to 1.6 Tb/s in 4 years.

It is interesting that this work is happening on FreeBSD, and potentially with diverging implementations than Linux. Linux programs seem to be moving towards userspace getting more power, with things like io_uring and increasing use of frameworks like DPDK/SPDK. This work is all about getting userspace out of the way, with things like async sendfile and kernel TLS. That's pretty neat!


kTLS has been added to Linux too, including offload. It also has P2P DMA, so in principle you can shovel the file directly from NVMe to the NIC and have the NIC encrypt it, so it'll never touch the CPU or main memory. But that only works on specific hardware.


Memory is the cache for popular content. You couldn’t serve fast enough directly from NVMe.

“~200GB/sec of memory bandwidth is needed to serve 800Gb/s” and “16x Intel Gen4 x4 14TB NVME”. 800Gb/s is about 100GB/s of payload, so each NVMe drive would need to sustain roughly 6.25GB/s, which leaves essentially no headroom against what a PCIe 4.0 x4 drive can actually deliver. Also popular content would need to be on every drive, drastically lowering the total content stored.

Also see drewg’s comment on this for a different reason: https://news.ycombinator.com/item?id=32523509


With HBM2 sapphire rapids chips, I assume you can actually get there. There is probably an insane price premium for them, though, so I wouldn't hold my breath.


PCIe Gen 5 drives look poised for wide availability next year and NVIDIA has been demoing CX7 [1] which is also PCIe Gen 5. Intel already has some Gen 5 chips and AMD looks like they will follow soon [2]. Surely there will be other bumps, but I bet they pull it off in way less than 4 years.

1. https://www.servethehome.com/nvidia-connectx-7-shown-at-isc-...

2. https://wccftech.com/amd-epyc-7004-genoa-32-zen-4-core-cpu-s...


This is innovation and proper engineering. They chose FreeBSD. It shows they are not afraid of solving actual hard problems that yield impressive results. These are the types of engineers I'd hire in a heartbeat - if I ever were to own a successful company.

Simply following trends and doing what everyone else does leads to mediocre results and the assembly line type of work that most software development has become.


Here is an interesting thought experiment. With PCIe 7.0 coming in 2025/2026 and products launching by 2028, and assuming no further bottleneck (we assume even a linear increase in memory bandwidth could be handled with HBM3 or HBM4), we should be able to hit 6.4Tbps by 2030. In a single 1U unit. With the next generation codec VVC, or even a generation beyond that. Serving files at an average of 10Mbps, a single box is capable of serving 640K customers. A single rack could serve 25 million customers, i.e. 10 racks would be able to serve all of the current Netflix subscribers worldwide, simultaneously.

Although I would assume the CPU becomes the bottleneck, since it would need 8x the processing power.


This got me thinking: at what point are the benefits of these optimisations no longer meaningful? At 225 million subscribers, assume a peak of 70% concurrent users, around 160M (this figure should already be way above the actual number because subscribers are spread across the globe in different time zones). At 10Mbps, Netflix would need a peak bandwidth of 1600M Mbps, or 1600Tbps. At 1.6Tbps per box that is only 1000 boxes required worldwide, or 2000 if you include redundancy. Once you factor in that you need a box with every ISP, local peering exchange or regional ISP worldwide, and there are hundreds of those, shrinking it further wouldn't make much of a difference.

Hopefully that means Netflix now has the incentive to bump bitrate to 20Mbps or higher.


Everything old is new again: Anyone remember seeing a 32-bit/33 MHz PCI (not PCI-X, not PCIe) card for SSL acceleration in the late 1990s? It was totally a thing at one point in time, when your typical 1U rackmount server had single-core CPUs and was quite weak in overall math processing power.

OpenBSD had support for them like 22 years ago.

https://www.google.com/search?client=firefox-b-d&q=SSL+accel...

Now we have TLS1.2/TLS1.3 offload getting built into the PCI-E 4.0 100/200/400GbE (whatever speed) NIC.


The cards you reference are not really the same thing. They are doing lookaside acceleration, where the card has to transfer the data from host memory, encrypt it, and transfer it back.

The Mellanox (and Chelsio) NICs do in-line crypto. They DMA down the network packets as they would have to do anyway in order to send them. Then they encrypt them on their way out onto the wire. This reduces memory bandwidth requirements by roughly 50%.
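
Back-of-the-envelope on that 50%, assuming 800Gb/s is roughly 100GB/s of payload: with lookaside crypto the data crosses main memory about four times (disk into RAM, RAM to the accelerator, accelerator back to RAM, RAM to the NIC), roughly 400GB/s of memory traffic; with in-line crypto it crosses about twice (disk into RAM, RAM to the NIC), roughly 200GB/s.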


Author here. AMA


Every iteration of this prezzo I've seen over the years has made for a fascinating morning read, thanks!

As much as I enjoy the results of the work, I'm always a bit curious how the sausage is made. Is pushing the hardware limits your primary job or something you do periodically? How do you go about selecting the gear you use? How much do you work with the vendors? (etc etc) I'd really enjoy a behind the scenes blog post or something wrt this serving absurd amounts of traffic from a single box.


My role is to make our CDN servers more efficient. One of the easiest and most fun ways to do that is to push servers as hard as I can and see what breaks and what doesn't scale. I also work with our hardware team and their vendors to evaluate new hardware and how it can fit into our system.

But I do plenty of other things as well, including fixing random kernel bugs. You can read the git log of the FreeBSD main branch to see some of the things I've been working on..


When are you going to cut the CPU/main memory out completely?

The bottleneck is at your NIC anyways, so it seems like there would be a market for a NIC that can directly read from disk into the NIC's working memory.


We've looked at this. The problem is that NICs want to read in TCP MSS size chunks (1448 bytes, for example), while storage devices are highly optimized for block-aligned (4K) chunks. So you need to buffer the storage reads someplace, and for now the only practical answer is host memory. There are NVME technologies that could help, but they are either too small, or come at too large of a price premium. CXL memory looks promising, but its not ready yet.


Does it? I thought with segmentation offloads the NIC basically gets TCP stream data in more or less arbitrary sizes, and then segments it into MTU-sized packets on its own?


We do fairly sophisticated TCP pacing, which requires sending down some small multiple of MSS to the NIC, so it doesn't always have the freedom to pull 4K at a time.


You mention AIO in nginx.

In 2021 somebody submitted a patch for io_uring support in nginx:

https://mailman.nginx.org/pipermail/nginx-devel/2021-Februar...

I'm not sure if there has been further progress on it so far. In one comment feedback is "it doesn't seem to make the typical nginx use case much faster" [at that time].

But I find this interesting, because io_uring can make almost all things async that can't be used async so far in Linux (open(), stat(), etc) and thus in nginx.

Would io_uring integration in nginx be relevant for you?


Thank you very much "drewg123"!

Future technology advances increasingly looks like this complex work integrating hardware, OS fixes, team collaboration. People and teams and companies working together, and contributing to shared resources like FreeBSD. Tolerating mistakes at scale, giving credit where credit is due, and all the other things that make respect real, which creates a space to get things done.

Most of us will never get close to these opportunities or contexts, but still it helps us advance our own technique/culture to observe and model your story. And perhaps you'll help new collaborators find you. All the best.


At what point does it make sense to replace the CPU and OS with custom hardware and software? At this point the CPU is basically doing TCP state maintenance and DMA supervision, but not much else, right?

I totally get the cost, convenience, and supply chain risk-value in commodity stuff that you can just go out and buy, but once you're bound to a single network card, this advantage starts to go away, and it seems like you're fighting with the entire system topology when it comes to NUMA, no? Why not a "TCP file send accelerator" instead of a whole computer?


I suppose you could attach NVMe drives directly to Bluefield and cut out x86.


> Sendfile

Ah, so this is why everything stutters / falls apart when you switch subtitles on or off -- it has to access a whole different file and resume at the same place in that file I assume? I would think you would want the (verbal) audio separated out in a different file so it can be swapped out on the fly without re-initializing the video stream, and same thing with subtitle files? I'm just making some assumptions based on the behavior I've seen but would be cool to know how this works.


No, video and subtitles are separate files.

I've never seen this bad behavior myself. Do you mind sharing the client you're using?


4k Apple TV


You're using ConnectX-6 Dx here. Any technical reason for that particular NIC, or just haven't gotten around to ConnectX-7s yet?

Have you examined other NIC vendors? (Chelsio?)


This talk was given roughly 4 months ago. CX7 was not available yet. I'm looking forward to testing on them when we get some.

We looked at Chelsio (as T6 was available well before CX6-DX). However, the CX6-DX offers a killer feature not available on T6. The CX6-DX can remember the crypto state of any in-order stream, while the T6 cannot. That means that the TCP stack can send, say, 4K of a TLS record, wait for acks, and come back 40ms later and send the next 4K and DMA just the requested 4K from the host. The T6 cannot remember the state, and would need to DMA the first 4K (which was already sent) in order to re-establish the crypto state, and then DMA the requested 4K. This could run the PCIe bus out of bandwidth. The alternative is to make TCP always chunk sends at the TLS record size, but this was horrible for streaming quality.


> Serve only static media files

This part I don't get. How about DRM? Unless Netflix pre-DRMs all content for all users?


Encrypting assets on the fly using a per-consumer symmetric key would be prohibitively expensive, so I'm sure the media is stored pre-encrypted using a shared symmetric key.

It only really matters that this key is unique per package, not per user, because once even a single user can compromise the trusted execution environment and extract either the key or the plain video stream, that piece of content is now pirated anyway. So, key reuse against the same content probably isn't really a major part of the threat model - this attacker could share the key with others, but they might as well share the decrypted content instead.


Yes, all our content is also DRMed. Else somebody could easily pirate content..


To be fair, it already seems easily pirated. DRM is useless, if content is able to be viewed on some personal device it can be ripped and shared. I'd be curious how much effort/money companies dump into adding DRM measures, it seems like a lost cause. Maybe it just makes the execs sleep better at night.


This is a myth.

Netflix's DRM is sufficiently good that the Reddit Piracy subreddit has spent the last three months moaning that they have no access to 4K Netflix rips, at least for weeks or months after the content comes out.

Netflix's DRM and key management systems do what they care about pretty well at this point, which is protect the initial airing of popular shows.


1080p rips are found within hours of the content first airing on Netflix. You don't need to download Linux ISOs in full quality to enjoy them.


I would think that the media files are already encrypted and get decrypted by the Netflix client. Otherwise the DRM could easily be defeated by using something like Wireshark.


Do you have a link to video or audio for this presentation? I probably don't speak for just myself when I say I would love to see it.


Someone else linked the video here: https://news.ycombinator.com/item?id=32520750


What kind of tuning is done in the BIOS? Is that profile available to view to everyone? Are you using a custom BIOS from Dell?


Not much tuning needed to be done. The little that was is mentioned in the talk, and was basically to set NPS=1 and to disable DLWM, in order to be able to access the full xgmi interconnect bandwidth at all times, even when the CPU is not heavily loaded.


What about the storage? Is it using RAID? Does block size matter? What filesystem is used?


Every storage device is independent (no RAID), and runs UFS. We use UFS because, unlike ZFS, it integrates directly with the kernel page cache.


Which is part of the reason sendfile doesn’t work well for ZFS.



As a total outsider it looks like FreeBSD is the "silent" OS behind a lot of the big money projects. Not just this, but I recently learned it's the base OS of the PlayStation 4 and 5 systems too. Is there a reason why FreeBSD is so popular? Just general reliability? And why not the other BSD projects? Also, one (like me) would assume Linux is behind all of these, but alas not.


In Netflix's case, the FreeBSD networking stack was a lot faster at the time.

FreeBSD (and other BSDs) are also a much simpler platform with orders of magnitude less "churn" taking place - if you are doing an engineering experiment that involves low-level integration with OS-level or kernel-level subsystems (eg kernel TLS f.ex) then it's a lot simpler to start from BSD, and you can very safely assume that your code will pretty much continue working in future releases as well. It's a far more stable platform to build a product around, Linux is "constantly on fire all the time" in comparison to the absolutely sedate pace of BSD development. It was built over the last 50 years (the oldest code in the FreeBSD repos dates to 1979) and it mostly Just Works Like You'd Expect. The good old 'cathedral vs the bazaar' tradeoff. Well, if you are building an annex, at least you know the cathedral isn't going to collapse under you next week, and by the time you get your project finished the bazaar might have moved on to something completely different and abandoned the thing you needed. But you don't get docker and the other New Things either.

Sony likes it because it allows them to release proprietary software off an open codebase. The conceptual divide is - GPL protects end user freedoms, at the cost of developer freedoms (eg linking to code with GPL-incompatible licenses, or not distributing the source to proprietary extensions to a GPL'd product). While BSD/MIT license it's the other way around, they're "here's this code, you can use it however you want", and sometimes that means doing things that users might consider "evil".


The sysadmin experience on FreeBSD used to be more opinionated than on Linux. This was before most Linux distros adopted systemd.

The reason companies like Sony and Apple pick FreeBSD is because they get an open source POSIX-compliant OS they can drastically modify down to the kernel level without having to open source their modifications.


Sony used it because they got an entire OS for free with no obligation to release the source.


Blame the GPL for this. The GPL is directly responsible for the livelihood of many/all BSD developers, and I could not be happier about that. Linux is overrated in a lot of ways.


However, the GPL is irrelevant in this case because the Netflix Open Connect appliance is not sold to ISPs. The GPL is only relevant if you are distributing GPL software, e.g. the Sony PlayStation.


Not directly relevant perhaps. I would not go as far as to say it is necessarily completely irrelevant; there are many ways usage of FreeBSD in one portion of a company, or by employees hired from another company, can affect decisions made by the company.

Who knows what plans Netflix had in mind when this decision was made? Maybe they simply commit to FreeBSD because they do not want to release their code under the GPL. I certainly never do, and I will never ever make any improvements to any GPL code I use. I think it is an awful license. Maybe Netflix does, too? That would be directly relevant to the license of the OS even if they never release their customized OS, if they have a customized version at all.


Linux is powering the web, like 99% of it + half of all the mobile phones. BSD is like a drop in the ocean in comparison.


This is incredible. I really like how you're able to trace the evolution of the systems as well.

It makes me wonder what the next hardware revolution will be. It seems like most resource-intensive applications are bottlenecked on moving data around. UE5's Nanite tech hinges on the ability to transfer data directly from disk to GPU, Netflix leans on NIC offloads to avoid copying data through userspace, and I wonder how much other performance we're missing out on because we can't move data fast enough.

How much faster could AI training be if we could get memory directly from disk to the GPU and avoid the CPU orchestrating it all? What about video streaming? I have a feeling these processes already use some clever tricks to avoid unnecessary trips through the CPU, but it will be interesting to see which direction hardware goes with this in mind.


This is definitely the direction that things are going. In the GPU space, see things like GPUDirect[1]. In networking and storage, especially for hyperscale stuff, see the rise of DPUs[2] replacing CPUs.

1: https://developer.nvidia.com/gpudirect

2: https://www.servethehome.com/what-is-a-dpu-a-data-processing...


So... the driver and device level seems happy here, but is anyone else creeped out by "asynchronous sendfile()"? I mean, how do you even specify that? You have a giant file you want dumped down the pipe, so you call it, and... just walk away? How do you report errors? What happens to all the data buffered if the other side resets the connection? What happens if the connection just stalls?

In synchronous IO paradigms, this is all managed by the application with whatever logic the app author wants to implement. You can report the errors, ping monitoring, whatever.

But with this async thing, what's the API for that? Do you have to write kernel code that lives above the driver to implement devops logic? How would one even document that?

+1 for the technical wizardry, but seems like it's going to be a long road from here to an "OS" feature that can be documented.


This has been a feature upstream in FreeBSD for roughly 6 years.

If there is a connection RST, then the data buffered in the kernel is released (either freed immediately, or put into the page cache, depending on SF_NOCACHE).

sendfile_iodone() is called for completion. If there is no error, it marks the mbufs on the socket buffer holding the pages that were recently brought in as ready, and pokes the TCP stack to send them. If there was an error, it calls TCP's pru_abort() function to tear down the connection and release what's sitting on the socket buffer. See https://github.com/freebsd/freebsd-src/blob/main/sys/kern/ke...


Here is announcement for the feature: https://www.nginx.com/blog/nginx-and-netflix-contribute-new-...

And here are the slides explaining it: https://www.slideshare.net/facepalmtarbz2/new-sendfile-in-en...

There are video of various talks by Gleb Smirnoff explaining all this magic on YouTube.

The feature is fully documented in `man 2 sendfile`, it was part of the patch that did the work.


This was my thought too. I've been struggling with the concept of "if you don't have anything nice to say, don't say anything at all" lately, because I've been programming too long and just see poison pills and better alternatives everywhere I look.

But I believe that async is an anti-pattern. From the article:

  * When an nginx worker is blocked, it cannot service other requests
  * Solutions to prevent nginx from blocking like aio or thread pools scale poorly
Nothing against nginx (I use it all the time, it's great) but I probably would have used a synchronous blocking approach. The bottleneck there would be artificial limits on stuff like I/O and the number of available sockets or processes.

So.. why isn't anyone addressing these contrived limits of sync blocking I/O at a fundamental level? We pretend that context switching overhead is real, but it's not. It's an artifact of poorly written kernels from 30+ years ago (especially in Windows) where too many registers and too much thread state must be saved while swapping threads. We're basically all working around the fact that the big players have traditionally dragged their feet on refactoring that latency.

And that some of the more performant approaches like atomic operations using compare and swap (CAS) on thread-safe queues beat locks/mutexes/semaphores. And that content-addressable memory with multiple busses or even network storage beats vertical scaling optimizations.

So I dunno, once again this feels like kind of a drink-the-kool-aid article. If we had a better sync blocking foundation, then a simple blocking shell script could serve video and this whole PDF basically goes away. Rinse, repeat with most web programming too, where miles-long async code becomes a single deterministic blocking function that anyone can understand.

I'm kind of reaching the point where I expect more from big companies to fix the actual root causes that force these async workarounds. I kind of gave up on stuff like that over the last 10 years, so am behind the times on improvements to sync blocking kernel code. I'd love to hear if anyone knows of an OS that excels at that.


If you rely on blocking I/O, you've got to deal with blocked processes as i/o isn't instant.

You can deal with that by having lots of threads or processes doing the work, but coordinating many threads can be difficult. With async sendfile, you let the kernel manage all of this without needing userspace to do anything, and the kernel is well organized to manage pushing data to the right place when disk i/o completes.

If you just want to write code in blocking style, check out Erlang, but know that under the hood it massages most of the blocking into aggregated select/kqueue/epoll/etc calls.


> I probably would have used a synchronous blocking approach

Then Varnish is probably more your style. (A discussion between phk and drewg would be fascinating to watch.)

> We pretend that context switching overhead is real, but it's not.

This sounds crackpot to be honest. Linux has put a lot of effort into optimizing context switching (that's why they have NPTL instead of M:N) and I assume FreeBSD has as well.

> ...this whole PDF basically goes away

Sync vs. async doesn't solve any of the NUMA or TLS issues that this whole PDF is about.


Slide 25 shows benchmark between "old" sendfile and "new" sendfile:

https://www.slideshare.net/facepalmtarbz2/new-sendfile-in-en...

> but I probably would have used a synchronous blocking approach.

Well, send a patch, then.


I have a bit of a naive question. If TLS has this much overhead how do HFT and other finance firms secure their connections?

I know they use a number of techniques like kernel bypass to get the lowest latency possible, but maybe they have explored some solution to this problem as well.


TLS has the highest overhead when you're serving data at rest, like static files that are not already in the CPU cache. For serving dynamic data that is in the CPU cache, TLS offload matters a lot less. Our workload is basically the showcase for TLS offload.


i love (love) how everyone else who answered this question alongside you made what appears to be a complete stab in the dark guess while only you knew the answer.

never be afraid to admit that you don’t know something. guessing wrong is a much worse look than not answering at all.


Trading connections that go over private links such as cross-connects between the firm's and the exchange's equipment within a colocation facility are not encrypted.


TLS doesn't really add latency on top of TCP after you make the initial connection - it mostly adds a bit of extra processing overhead for encryption. HFT firms aren't usually encryption-bandwidth-constrained. I'm not actually sure if most exchange FIX connections or whatever actually run over TLS, but that would be reasonable.


HFT needs to be outlawed.

No exchange should allow trades to complete in, I would argue, any time less than 15 minutes, and each trade should have a random 1-15 minute delay pad on top of that.

The HFT access only serves the larger financial firms, and is used to do front-running and other basically-illegal tricks. It provides an anti-competitive advantage to large firms in markets that are supposed to be open-access / fair trading. And of course it leads to AI autotrading madness.

I get that it keeps a lot of tech people very well compensated, but it is either in the service of unregulated fraud at worst and unfair advantage at best.


Not really germane to the topic, but a financial transactions tax would effectively kill HFT without the complexity that you're suggesting.


I guess you are completely oblivious to how HFTs work and their actual position in the markets. I am a bit biased considering that I work in the industry, but working in the industry has shown me the inner workings of it and how HFTs actually help to close / tighten the bid-ask spread for various securities.

I would highly recommend you read this book ( https://press.princeton.edu/books/hardcover/9780691211381/tr... ) or, if it's too long-form, read this article ( https://www.thediff.co/p/jane-street )

I am not saying HFTs are in a charitable business, but they do serve an important role in the financial markets.


Mellanox cards or private links


How do you deal with the higher power density of these servers that need to be put at ISP locations? Don't they have some constraints for the Open Connect machines?


The Dell R7525 chassis is available with dual 800w power supplies. General thinking for power supplies is that each power supply is connected to completely separate power distribution - independent cabling, battery backup, generators, and points of entry to the facility. In many cases it's also two different power grids. This is so that if one power source fails anywhere the load can move over to the other power supply without exceeding the power that can be delivered through a single power supply or trip a breaker anywhere. Under normal operating conditions each power supply is doing half the load.

Additionally, the National Electric Code in the US specifies that continuous load should not exceed 80% of given circuit/breaker capacity.

So with dual 800 power supplies at "max" 80% load that's "only" 640 watts for one of these 2U servers. For 208V power that's only 3 amps. High density (for sure) compared to the old days but not as ridiculous as it may seem.


You're right, it's not that much for 2U. But for this config, I think they'd probably go for the 1400W power supplies (quick sum after the list):

- 16 x 25W SSDs

- 2 x 225W CPUs

- On top of that, add RAM, cooling, etc.
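
Quick sum using those per-part figures: 16 x 25 W + 2 x 225 W = 850 W before RAM, fans and the NIC, so ~1400 W supplies (1120 W usable under the 80% continuous-load rule mentioned above) leave reasonable headroom.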

Honestly, it's still manageable. I doubt they'd put 10 of those in a single rack (you'd need an ISP that wants to serve 2.2M subscribers at peak from a single location, not necessarily desirable on their side); but if the site is getting full, you'd feel the (power) pressure (slowly) mounting.


I didn't dig into the CPU config, etc but you're right they'd probably go for the 1400W power supplies which is 5.4 amps max at 208V. It's for an older config (based on the other specs) but the current Netflix OpenConnect docs call for 750w[0] which is more reasonable even for this hardware configuration because no one really wants to consistently run their power supplies at 80% (even in branch loss) for obvious reasons.

They absolutely wouldn't want to concentrate them. The entire purpose is to reduce ISP network load and get as close to the customer eyeballs as possible. I don't have any experience with these but I imagine ISPs would install them at their peering "hubs" in major cities - in my experience the usual suspects like Chicago, NYC, Miami, etc.

[0] - https://openconnect.zendesk.com/hc/en-us/articles/3600345383...


Delivering power is not the problem, cooling is. You can load up a cabinet with four 60A PDU's (~50kW) but the challenge is to cool all that hardware you packed in the cabinet.


Yeah, I was including that in the budget (server fans), but technically you're correct, DC cooling is powered separately.


This is a remarkable technical achievement that builds on all of its past work, as are the other updates from Netflix in the past with serving ever more traffic from a single box. That said, I still find it terrifying that so many users would be affected by a single machine going down, that blast radius is so huge!

Do we know if the rates that these hosts serve actually make it into production? Or do they derate the amount they serve from a single host and add others?


As I said in a parallel comment, this is a testbed platform to see what problems we'll encounter running at these speeds. Production hosts are single socket, and can run at roughly 1/2 this speed.

I regret that I've crashed boxes doing hundreds of Gb/s. Thankfully our stack is resilient enough that customers didn't notice.


I think they buffer and if the stream has issues the client connects to another host. They have been doing chaos monkey for a long long time.


Discussion of the same presentation 11 months ago, when the title was 400Gb/s.

https://news.ycombinator.com/item?id=28584738

This was the video which was posted back then alongside the slides: https://www.youtube.com/watch?v=_o-HcG8QxPc


This is amazing work. I can't help but note that we have been doing these things in HPC environments for at least 15 years - user-space networking, offloads, NUMA-domain-aware scheduling, jitter reduction... Great to see it being put to good use in more mainstream workloads. Goes to show - software is eating the world.


I worked in HPC as well, and I have to point out emphatically that this IS NOT USERSPACE NETWORKING. The TCP stack resides in the kernel. The vast majority of our CPU time is system + interrupt time.


Great engineering, but how does the 800Gb/s of throughput achieved here translate downstream all the way to the consumers? I suspect there are switches and routers from ISPs and others in between, which Netflix does not control, that will reduce the effective throughput to the end user.


The 800 Gb/s isn't going to a single user. There are switches and routers in the middle, sure, but they are all doing their job, which is to split up traffic. The end user only needs ~8 Mb/s for a 4K stream.
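
Taking that ~8 Mb/s figure, 800Gb/s works out to roughly 800,000 / 8 = 100,000 concurrent 4K streams coming out of a single box.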


ISP routers have been more-or-less indistinguishable from switches for decades at this point. They're all "line rate" which is to say that regardless of features, packet size, etc they'll push traffic between interfaces at whatever the physical link is capable of without breaking a sweat.

In the case of Netflix it is in the ISPs best interest to let them push as much traffic to their customer eyeballs as possible. After all, it's much "easier" and cheaper to build out your internal fabric and network (which you have to do anyway for the traffic) than it is to buy and/or acquire transit to "the internet" for this level of traffic.


Modern routers are unable to do line rate regardless of packet size. See for example the Q100 ASIC from Cisco: rated for 10.8 Tbps, it is only able to achieve 6 Bpps [1]. So it needs packets of roughly 200 bytes or larger to hit line rate. However, as for Netflix, this is not a problem since they only push big packets.

[1]: https://xrdocs.io/8000/tutorials/8201-architecture-performan...
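
Working through the arithmetic (the ~20 bytes of per-frame Ethernet overhead is my own assumption, not from the linked doc):

    10.8 Tb/s / 6 Bpps         = 1,800 bits = ~225 bytes per packet slot
    225 B - ~20 B preamble/IFG = ~200 B minimum frame size for line rate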


Wow, I've been out of this space for a while! Last I was paying close attention to any of this, 10G ports were new. Glad I learned something from my old life today!

I stand corrected on "always line rate all the time in any circumstance", but by your math (and to my general point), <1 Tbps from one of these appliances across multiple 100G ports isn't problematic in the least from a hardware standpoint - especially for the Netflix traffic pattern of relatively full (if not max-MTU) packets.


On the contrary, even older routers can handle this load with no sweat. Service provider-grade routers can handle 10 to 200 Tbps depending on size.


But then it gets to my home and it’s trashed down to 100Mbit/s


Of course—the fat backbone pipe is progressively split into smaller pipes as it gets to your house. The internet isn’t a big truck. It’s a series of tubes.


That would be more than enough to watch half a dozen Netflix streams at the same time.


Awesome feats of engineering here, taking both hardware and software into account to design a holistic system for serving content as quickly as possible!

The slide deck background, though: at least half of the titles in the slide deck template's artwork are no longer on Netflix...


At this point, they should've just gone for an in-house bare-bones operating system that supports the bare minimum: reading chunks from disk, encrypting them, and forwarding them to the NIC.

Besides that, it seems like all of the heavy lifting here is done by Mellanox hardware...


It's only doing the crypto. The VM system and the TCP stack are doing most of the heavy lifting, and both are stock FreeBSD.


FreeBSD is their "in-house" operating system since they modify it to do what they want.


But do they really need an entire operating system for what amounts to simply copying around chunks of data? I think they could've gone for some slim RTOS-ish solution instead: no user-mode, no drivers, bare minimum.


I worked on an OS like that once. The problem is with "all the other stuff" that you need to support that's outside the core mission of your OS. You wind up bogged down on each additional feature that you need to implement from scratch (or port from another OS with a compatible license). With FreeBSD, all this comes for free.

We chose to use FreeBSD, and have contributed our code back to the FreeBSD upstream to make the world a better place for everyone.


They're using the FreeBSD filesystem and network stack, both of which are significant amounts of code. I guess they could have tried the rump kernel concept but it sounds like a lot of work.


They... serve 800 gigabytes a second on one single content server, do I get that right?


Gigabits, I presume, so 100 GB/s.


Almost, it's 800 gigabits. Still a lot.


Will be fun to see what can be done with PCIe 5 stuff and the new 400G NICs. Really amazed by the recent increase in bandwidth. SFP56 is recently becoming 'mainstream' in datacenters, with 200G controllers at <1500 each, and you can stuff 8 or 10 of those in your server. And you get an immediate 2x with the next gen. If you can offload some of the heavy work to one (or several) GPUs or FPGA accelerator boards (Alveo, or the more niche but crazy ReflexCES with 800G Ethernet capability), you're really starting to get a 'datacenter in a box' system. If density is important, the next few years are going to be very interesting.


This could be an answer as to why Netflix comes up reliably while all the other streaming services in my experience (Hulu, Disney+, HBO Max, Amazon Prime) can take many times longer to initialize and deliver a stable stream.


To be honest, this has much more to do with Randall Stewart's RACK TCP, and his team's obsession with improving our members' QoE. Ironically, this costs a lot of CPU compared to normal TCP (since it is doing pacing, limiting TSO burst sizes, etc). https://github.com/freebsd/freebsd-src/blob/main/sys/netinet...
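
For anyone who wants to play with it: a minimal sketch (not our tooling, just the programmatic equivalent of "sysctl net.inet.tcp.functions_default=rack") that makes new connections use the RACK stack on a stock FreeBSD box, assuming the tcp_rack kernel module is already loaded:

    /*
     * Sketch: switch the default TCP stack to RACK on FreeBSD.
     * Assumes the tcp_rack module is loaded (kldload tcp_rack) and
     * that we are running as root.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <string.h>

    int
    main(void)
    {
        const char *stack = "rack";

        /* Set net.inet.tcp.functions_default to "rack". */
        if (sysctlbyname("net.inet.tcp.functions_default", NULL, NULL,
            stack, strlen(stack) + 1) == -1)
            err(1, "sysctlbyname");
        return (0);
    }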


Of those I only have Prime, and really agree. I don't think it was ever as good, but lately in particular it's been so slow to start (and then it's an advert! It'll do it again for the actual content once I click 'skip'!) and it occasionally pauses to buffer mid-stream.

I don't get that with Netflix. I've occasionally had it crash out with 'sorry, this could not be played right now' (which is a weird bug in itself, because it always loads fast and fine when I immediately press play again), but never such slow loading or pausing.


Does anyone know where these servers are hosted? Certainly not AWS I imagine?


As close to the eyeballs as possible. With OpenConnect[0] they are located in ISP facilities and/or carrier-neutral facilities with access to a peering fabric (kind of the same thing, since the OpenConnect Appliance is "hosted" by the ISP).

It's a win-win. ISPs don't have to use their peering and/or transit bandwidth to upstream peers and users get a much better experience with lower latency, higher reliability, less opportunity for packet loss, etc.

[0] - https://openconnect.netflix.com/en/


I've wondered how they achieve it and it's so far beyond my knowledge and skills, truly astounding. The level of expertise and costs must be so high.


Spend a few years just thinking of how to optimize video delivery and you’d be a lot closer to understanding :)


> Serve only static media files

This part is weird to me. My understanding is that DRM locks the file at a per-user level, so the DRM-encrypted chunk sent to me would be different from yours. Unless Netflix has already pre-computed everything for every bitrate and every streaming format, there must be some sort of computation before a chunk can be served over TLS.


That's not how DRM works. The content is encrypted once and that key is sent to the client. The content key is probably wrapped in some per-session key (which may be wrapped in a per-user key wrapped in a per-device key or something).


What's the benefit of going from 100Gb/s to 800Gb/s through kernel/hardware optimizations as opposed to adding more machines to meet the same throughput in this case? I'd be curious at what point returns on the engineering effort is diminishing in this problem.


I think it's quite obvious: instead of 8 machines you then only need 1. This results in reduced costs for machinery, storage (as each machine would have its own storage), and probably power consumption too. Also, the same room of servers can push 8 times more content.

Edit: Whoops, apparently this tab has been open for four hours and of course someone has already responded to you, lol.


IIRC a lot of these boxes are deployed at actual ISPs so they're closer to customers. I'd imagine the rack space is therefore limited, and the more you can push from a single machine, the better.


How many customers does that serve?


At 15Mb/s for a start-quality 4k stream (5 times higher than the average ISP speed measured by Netflix), that serves 53k simultaneous customers.

In the US, the fastest ISP for Netflix usage seems to be Comcast (https://ispspeedindex.netflix.net/country/us ), with an average speed of 3.6Mbps. That would serve an average of 222k simultaneous customers on a single server.


That 15Mb/s figure for 4K is out of date by a couple of years. They previously targeted a fixed average bitrate of 15.6Mb/s. They now target a quality level, as scored by VMAF. This makes their average bitrate for 4K variable, but they say it has an upper bound of about 8Mb/s. See https://netflixtechblog.com/optimized-shot-based-encodes-for...


Yep, that's correct. It looks like Netflix forgot to update their support pages for this: https://help.netflix.com/en/node/306 .


More likely that they just wanted to keep it at that as a kind of worst case scenario. If you meet their recommended spec, there should be no way you will have issues.


What does start-quality mean?


Video formats require more data for the first frame of each scene (a keyframe); subsequent frames can be encoded as transformations of previous frames.


Not much, see the sibling comment. It used to be the minimum for enjoyable 4K (4K Blu-ray discs have much higher bitrates with HEVC). But since then, Netflix has heavily optimized its encoding, greatly reducing the bandwidth needed.


That depends on the content's bitrate. Netflix serves video at bitrates anywhere from 2 to 18 Mbps. Say the average were 10 Mbps; that is roughly 80K customers per box.


I was specifically looking for their tech stack for playback. They pretty much have to use HLS for iOS Safari, right? Where do the manifest servers fit in? What about non-iOS browser playback?


Slide says "we don’t transcode on the server"

Surely they transcode on some server? Maybe they just mean they don't do it on the same server that is serving bits to customers?


It seemed clear to me: they don't transcode on the server that is sending data to the viewer. Transcoding is done once per piece of media and target format combination, instead of on the fly as it is viewed.


A possible speed increase:

Switch from TCP to UDP for even faster video delivery?

Why use TCP for video content at all? Packet loss, sure, but then one would use a UDP-based tech like HTTP/3?


I'm wondering if, for these extreme use cases, a bare-metal program that doesn't sit on top of an OS/kernel would be substantially more efficient.


How would this compare with 42 server slots running 100Gbps DRBD in RAID 0? If I recall, it can pre-shard the data based on a round-robin balancer. ;)


We don't consider solutions like DRBD that introduce inter-dependencies between servers. Any CDN server has to be able to fail and not take down our service.


Most CDNs just use DNS to route user traffic, and can redirect if latency spikes, even at ISP NOCs (the CDN traffic controller often doesn't care why a given service is underperforming).

Have a wonderful day. =)


Isn't this a $50k-$100k box? Feels like this isn't the cheapest way to do this by an order of magnitude. What am I missing?


So how would you serve that amount of traffic for an order of magnitude less? I'm not too deep into this stuff, but it doesn't seem that far out?


I guess you simply can't, given the density required where these would go. It was just a knee-jerk reaction to the napkin math on cost.

I've never wanted to build a single box pulling that much traffic when 20 would do.


It amazes me that Netflix is capable of such top-of-the-line engineering, but is for the love of god unable to stream HD content to my iPhone. I've tried everything: gigabit wifi, cellular...

It is better for me to pirate their content, play it with Plex, and be happy. I pay for Netflix and it is absurd.

I think the best years are over for Netflix. The hard awakening is here: they have to make content that users want, and they are a movie/TV content company, not primarily a "tech company".


Some ISPs throttle Netflix. Not sure of your background, but it might be helpful to have more details about the type of phone (I'd expect a difference between a 13 Pro Max and an old 7) and the ISP, to see if others have similar problems.


iPhone XS, iOS 15.4, Netflix app up to date, A1 Telekom, both mobile and wifi. It does not even work at work, on a different ISP.

Netflix has all the bandwidth data and metrics, but this hasn't worked for ages. Maybe a more basic setup on their end would bring better results. Focus more on delivery, not 10 different UI versions, A/B tests, batch job workflows and so on. They post on their engineering blog about how they test on multiple TVs and with multiple encoding profiles, great things, but if the basics don't work... well, what is it good for?

I think they lost their focus.


I think it is more likely there is a specific issue with your device or connectivity.


Support is not helpful, so... I don't know what else to do. I'm not angry, but maybe I'm not the only one, and sooner or later they will have to focus on delivering a good experience. Even Prime delivers a better experience, and they know nothing about video :D Their interface is so basic, but to their credit... the quality is there.


I wonder if they are using something like TrueNAS or just interfacing directly with OpenZFS (assuming they use ZFS).


We use UFS for content. The ZFS caching layer (ARC) does not integrate with the kernel's page cache, and so sendfile() requires a copy to send data residing on ZFS. I believe that ZFS also does not implement the async methods required for the async portion of sendfile.
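
To make that concrete, here's a minimal sketch (not our production code path) of serving a chunk of a file over an already-connected socket with FreeBSD's sendfile(2); with UFS the pages come straight out of the kernel page cache, so nothing is copied through user space:

    /*
     * Sketch: send 'len' bytes of 'path' starting at 'off' over a
     * connected TCP socket using FreeBSD's zero-copy sendfile(2).
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    static void
    serve_chunk(int sock, const char *path, off_t off, size_t len)
    {
        off_t sent = 0;
        int fd;

        if ((fd = open(path, O_RDONLY)) == -1)
            err(1, "open %s", path);
        /* The kernel hands file pages directly to the socket. */
        if (sendfile(fd, sock, off, len, NULL, &sent, 0) == -1)
            err(1, "sendfile");
        (void)close(fd);
    }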


My work requires me to deal with political crap all day long to get promoted to a staff role. I miss this kind of work.


> deal with political crap day long to get promoted to a staff role

That's largely what many staff+ engineers have to do, even in otherwise healthy organizations. "Staff" isn't a glorified, autonomous, and stress-free version of senior at most companies. There's nothing wrong with staying at the senior level indefinitely provided (1) the pay and other factors are keeping up with your contributions and (2) the staff+ and management folks are being effective umbrellas for the politics and messy uninteresting details behind interesting problems like this.


Why use/make processors with NUMA if you have to go to all this trouble not to use it?


Well, the point of NUMA is to let you do things like in the slides, rather than have everyone suffer equally talking to a northbridge. The fabric between NUMA nodes isn't the selling point; the fast, direct connection between each CPU and its local components is.

Plus, not every workload is: read from disk -> encrypt -> send to NIC
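
As a toy illustration of working with NUMA rather than against it (my own example, not from the slides; the CPU range is an assumption you'd normally discover from the actual topology), pinning a worker thread to the cores local to the domain its NIC hangs off of looks roughly like this on FreeBSD:

    /*
     * Toy example: pin the calling thread to CPUs 0-31, assumed here
     * to be the cores local to NUMA domain 0 (where the NIC lives).
     */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <err.h>

    static void
    pin_to_domain0(void)
    {
        cpuset_t mask;

        CPU_ZERO(&mask);
        for (int cpu = 0; cpu < 32; cpu++)
            CPU_SET(cpu, &mask);
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
            sizeof(mask), &mask) == -1)
            err(1, "cpuset_setaffinity");
    }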


Funny, I thought Netflix was all AWS. Never mind, this is for the hosted (OpenConnect) solution.


I haven't looked yet, but I'm going to guess: edge caching running on custom hardware, with smart predictions and congestion-control algorithms determining what gets cached where and when.



