7 Years Of YouTube Scalability Lessons In 30 Minutes

ajross · on March 26, 2012

Most of this is relatively straightforward and unsurprising. But the one part that grabbed me is about "jittering". They insert random delays into timed events (the example given is cache expiration) to prevent a thundering herd problem when all the parts of the distributed system see the event at the same time (and for popular content, presumably repopulate the cache from the backend simulteously).

This is simple enough when described, but is not a technique I've seen applied much in practice or discussed in the community. I'm wondering if it's something that gets reinvented for all the projects that need it or if it's secret sauce known only in youtube. Regardless, I thought it was pretty insightful.

alexgartrell · on March 26, 2012

As a datapoint, we use jitter a lot in cache infrastructure at Facebook

cpeterso · on March 26, 2012

The Adblock Plus blog details the thundering herd problems they faced. Their ad blocking lists checked for updates every 5 days. Eventually, many users' update schedules would converge on Mondays because office computers did not run over the weekend. Updates that had been scheduled for Saturday or Sunday would spill over to Monday.

https://adblockplus.org/blog/downloading-a-file-regularly-ho...

sk5t · on March 27, 2012

Windows does much the same thing w.r.t. policy refresh (which sucks down files from domain controllers) and update of "occasionally updated" metadata like last logon timestamp.

colmmacc · on March 26, 2012

A related technique is to use distinct prime numbers for periodicities in the first place:

http://www.stdlib.net/~colmmacc/2009/11/26/period-pain-3/

(http://www.stdlib.net/~colmmacc/2009/09/27/period-pain-part-... http://www.stdlib.net/~colmmacc/2009/09/14/period-pain/ for the background material).

The reasons primes are optimal should be obvious :)

toddh · on March 26, 2012

As an interesting counter point, Jeff Dean at Google says they prefer to have a known hit every once in a while, so they will have all the cron jobs go off at the same time versus introducing jitter. It perhaps has to do with their emphasis on reducing variability to control the size of the long tail distribution.

jamii · on March 26, 2012

A nice extension of this is to use exponentially distributed delays. Then reoccurring events form poisson processes which are really easy to reason about when composed eg if you have N servers firing off events at the same exponentially distributed rate the times are distributed the same as if you had one server firing events at N x the rate and distributing jobs uniformly at random to the other servers.

http://en.wikipedia.org/wiki/Poisson_process

cpeterso · on March 26, 2012

A counterexample is that the Linux kernel tries to schedule timer events for the same deadline time. That allows the processor to sleep longer because the kernel doesn't need to wake up as often just to handle 1 or 2 timer events.

cbsmith · on March 26, 2012

It's kind of a weird way to describe it. The better way is that you want your caching to be probabilistic rather than deterministic. In general, you want to avoid anything that would get the nodes in a distributed system to harmonize.

The other way to solve the problem though would be to handle that circumstance cleanly. There are ways to resolve a thundering herd without creating a scalability problem.

davnola · on March 26, 2012

I've heard it called "splay" before e.g. `chef-client` will take an interval option and a splay option. Splay is a random amount of time added to the interval to prevent thundering herds.

breidh · on March 26, 2012

On this topic, an interesting paper from 1994 on adding randomization to network trafﬁc sources:

http://ee.lbl.gov/papers/sync_94.pdf

minimax · on March 27, 2012

Fascinating paper, even if it is 18 years old now. Thanks.

0xbadcafebee · on March 26, 2012

In the past i've seen jitter used in all sorts of applications, from cron jobs to configuration management to memcache key expiration. Any time you have a crapload of nodes that all need to do an operation at a specific time you rely on jitter to keep resources from bottlenecking. Probably used anywhere the systems or network has a miniscule amount of resource headroom (like cluster nodes that run at 93% utilization)

makmanalp · on March 26, 2012

Same in multicast DNS which is used in zeroconf:

http://files.multicastdns.org/draft-cheshire-dnsext-multicas...

Search for "random". Cool spec written by Stuart Cheshire of Apple.

minimax · on March 26, 2012

I hadn't thought about it in the general case, but I do frequently find myself adding a sleep for PID mod some appropriate constant to the beginning of big distributed batch jobs in order to keep shared resources (NFS, database, etc) from getting hammered all at once.

sneak · on March 27, 2012

Why is this not done in electronic exchanges to render HFT operations pointless?

tedunangst · on March 27, 2012

First come, first served is one of the rules that makes the exchanges the exchanges.

blhack · on March 26, 2012

What's this? Python? Apache? MySQL? But I thought you had to be running beta-release key value stores, esoteric web servers, and experimental programming languages if you wanted to scale!

/s

zmj · on March 26, 2012

Got your experimental programming language:

> Vitess - a new project released by YouTube, written in Go, it’s a frontend to MySQL. It does a lot of optimization on the fly, it rewrites queries and acts as a proxy. Currently it serves every YouTube database request. It’s RPC based.

maukdaddy · on March 26, 2012

Ah yes, but it's using a ~30 year old technology (RPC).

simonw · on March 26, 2012

Is RPC an actual technology? I thought it was more of a protocol design pattern.

ajross · on March 26, 2012

Remote Procedure Call is a design paradigm for synchronous call-and-response network communication. The Sun RPC protocol is an actual technology defined in RFC1057: http://www.ietf.org/rfc/rfc1057.txt

It's not insane, though not terribly relevant in the modern world. The only common technology still using it is NFS.

m0th87 · on March 26, 2012

If you look at the original RPC work by Bruce Nelson [1], it's pretty clear that there's no strict definition of it. I think most would argue that SOAP would be included, which is still pretty common.

1: http://nd.edu/~dthain/courses/cse598z/fall2004/papers/birrel...

ajross · on March 26, 2012

No, that's the ambiguity I was addressing. RPC means two things. The protocol is used, for the most part, only by NFS. The concept is pervasive.

cbsmith · on March 26, 2012

That's ONC RPC. Not the only implementation. XDR is still used in places, though you're right that it isn't widely used.

old-gregg · on March 26, 2012

man callrpc

tav · on March 26, 2012

When Youtube refers to Vitess as being RPC-based, they are not referring to Sun RPC (callrpc), but rather to the generic design pattern of exposing service calls over the network. In particular, Vitess makes services callable [-] using either BSON or JSON serialisation over HTTP CONNECT calls.

[-] http://code.google.com/p/vitess/source/browse/go/rpcwrap/

j2labs · on March 26, 2012

I know the rage is async & nonblocking, but putting Python behind Apache is a good way to get predictable performance without concerns for calls that block, since they're mitigated via threads.

Also, it helps to have money since this approach requires more boxes. But as I said, it's very reliable.

orlandu63 · on March 26, 2012

Nope, you just need lots of cash.

joering2 · on March 26, 2012

speaking of - are they profitable yet?

0xbadcafebee · on March 26, 2012

Fast, cheap, scalable: pick two.

The fastest, easiest, arguably most reliable way to scale is throwing money at it. And apparently it's easier to hire good people who know 20-year old tech rather than 3-year old.

mattdeboard · on March 26, 2012

The first 10 minutes are about monetization from one of the Youtube dev advocates. Skip to 9:45 to get to the "good stuff".

As an aside, this fellow is probably one of the best presenters I've seen from the pycon videos for this year. Confident, smooth, not reading from a computer screen or sheet of paper, clearly smart and in firm command of the subject matter.

I'd love to see more talks from him.

tricolon · on March 26, 2012

Wadsworth to the rescue! http://www.youtube.com/watch?v=G-lGCC4KKok&wadsworth=1

cellularmitosis · on March 27, 2012

as someone forever with his head buried in technical matters, I really enjoyed seeing the talk balanced with some biz.

hendzen · on March 26, 2012

"They wrote their own BSON implementation which is 10-15 times faster than the one you can download."

Curious to hear more about that one. If true, I hope they open source it, because that could potentially make MongoDB a lot faster for everyone.

EDIT: It's apparently in their vitess code. Relevant code: http://code.google.com/p/vitess/source/browse/#hg%2Fgo%2Fbso...

willvarfar · on March 27, 2012

fwiw I've done some python benchmarking; it happens that I'm actually genuinely needing a faster protocol right now: http://stackoverflow.com/questions/9884080/fastest-packing-o...

supersillyus · on March 27, 2012

I believe it could be faster still. Their Itoa cache is a map[int]string, where it could be a []string. Also, I suspect that a few more primitive type special cases in the first type switch in EncodeField could go a long way.

willvarfar · on March 27, 2012

I got the impression that the 15x they talk of is on the Python side?

pothibo · on March 26, 2012

I fully agree with Youtube faking data. However, I reckon they are faking a bit too much. Many times I would see 2000 likes and the video having 1700 views (Viral videos that is).

I knew the view counter wasn't propagated but the likes were and I was like: "Damn this is Youtube, kinda disappointing..."

I guess if both were propagated at the same time I wouldn't mind.

cbsmith · on March 26, 2012

I honestly don't understand why they simply don't use out of sync data. You could have nodes periodically send aggregates of likes & views, and then add those in to the total ever N heartbeats. Why bother fudging the in-between.

moe · on March 27, 2012

They're probably propagating the likes and views independently. Which still doesn't explain why they allow counter-intuitive gaps like that instead of fixing them up on the client-side in javascript.

modeless · on March 27, 2012

Yeah, the view counter is a clear case where they've gone too far. People care about view counts and it seems like every other video I watch has people complaining in the comments about the view count being obviously wrong.

batiudrami · on March 27, 2012

The only time I notice this is when the view counts freeze at 300-ish for a few hours. That's because they're checking if the views are legit or from a bot, IIRC.

swalsh · on March 26, 2012

Youtube started off as a dating website?

This has to go down in history as one of the best pivot decisions ever made.

InclinedPlane · on March 26, 2012

Originally ebay started out as "auction web" hosted on the same site that Pierre Omidyar used for hosting information about the ebola virus.

ez77 · on March 26, 2012

"Dummer code is easier to grep for and easier to maintain.

The more magical the code is the harder is to figure out how it works."

A nice formulation of the kind of advice I keep reading here in HN.

xentronium · on March 26, 2012

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan

InclinedPlane · on March 26, 2012

Complicated code that isn't easy to maintain isn't smart, it's just complicated. One thing people screw up a lot is implementing some design pattern in a way that ends up shotgunning configuration or business logic over a wide area instead of keeping it in one location and using DRY.

apu · on March 27, 2012

At 11:28 he says, "at last count, there was over a million lines of python running this thing"

Having never worked with code-bases larger than ~50kloc, I have a lot of trouble understanding what 1 million lines of code is needed for, especially considering that python is such a high-level language.

Does anyone have any idea why there would be this much code?

quink · on March 27, 2012

> Does anyone have any idea why there would be this much code?

It's the world's 3rd biggest website with hundreds of billions of views and dozens of millions of users, so maybe that's why.

I think it's a credit to Python that a website that does that and has grown in a fairly haphazard fashion only has about 1000k SLOCs.

apu · on March 27, 2012

Oh, it wasn't meant to be snarky or a slam -- I'm honestly just curious what kinds of things require so much code? E.g., is it one or two things that dominate usually in codebases this size, or is it just a LOT of components, each of which is tens of thousands of lines long? Do these kind of counts usually include auto-generated code?

nikcub · on March 27, 2012

that would be 1M LOC split across all the projects, libraries and modules.

cwbrandsma · on March 26, 2012

I love the part on faking data. I take the viewpoint that only software testers care that the comment count is exactly correct in the majority of system. Users don't care.

RandallBrown · on March 26, 2012

I always thought it was a bug that view counts on YouTube varied depending on what page you saw them on. It can be different on the video view page, the channel page, and in your dashboard.

Turns out they're just making that shit up.

leeskye · on March 26, 2012

The publisher who uploaded the video might care. Especially so, if they are selling to advertisers. Comments, ratings, and favorites are three variables we calculate to reflect how engaging a video is.

ilaksh · on March 27, 2012

LOL users care and they notice a LOT..Probably something like 5% of videos have a comment about how the view count is inaccurate.

kami8845 · on March 27, 2012

does it stop them from watching videos or in any way hinder their enjoyment of the content?

ckg · on March 27, 2012

If they feel strongly enough about it to leave a comment then I think it's safe to say it does hinder their enjoyment - in the same way that obviously broken things distract and displeasure in any medium.

paulhauggis · on March 26, 2012

I faked data for a client once. He had phpbb running and wanted a script that would slowly and randomly generate views on a specific topic.

Since this was just a field in a database, it involved some simple update code.

The results? More people are interested in anything that they think are popular (or they are curious as to why so many people viewed it).

My client got more actual hits overall on these topics.

mbell · on March 27, 2012

My biggest gripe with youtube: why are comment almost always repeated? Yea i realize that most you tube comments are relatively worthless but I do tend to speed through them to get a feel for what the response is to a particular video. Inevitably I get through 20 comments and then the same 20 are repeated over again, often they are repeated several times. Perhaps they are trying to give the illusion of lots of comments or assuming the comments don't matter. Personally I find it extremely annoying, I'd rather them block to load more than repeat.

spdy · on March 26, 2012

http://www.youtube.com/watch?v=G-lGCC4KKok Video to the article from pycon 2012. Talk starts around 10 minutes.

jcromartie · on March 27, 2012

What about this:

> The number of videos has gone up 9 orders of magnitude and the number of developers has only gone up two orders of magnitude.

2 orders of magnitude means at the very least, going from 9 to 100 developers, which is a huge increase, but it could mean way more. I wonder how big the team really is, and what the changing team dynamics are like on that scale at that pace.

Tloewald · on March 27, 2012

I'm sure many of us are disappointed that YouTube doesn't see consistent presentation of user comments as mission critical.

irahul · on March 27, 2012

I am not. From what I have seen of YouTube, comments are vile, and mostly there are two strangers posting pointless arguments about something equally pointless.

For me, YouTube is good for watching videos. If I want to discuss it, I post it on FB.

sylvinus · on March 26, 2012

I wonder how much speedup they could get from PyPy

dropshopsa · on March 27, 2012

The part about "Faking Data" is quite worrisome.

tbsdy · on March 27, 2012

Uh, cheating?

"Cheating - Know How to Fake Data

Awesome technique. The fastest function call is the one that doesn’t happen. When you have a monotonically increasing counter, like movie view counts or profile view counts, you could do a transaction every update. Or you could do a transaction every once in awhile and update by a random amount and as long as it changes from odd to even people would probably believe it’s real. Know how to fake data."

So all those people who buy views are kinda screwed now :-) I suspect this is a bad example. I HOPE this is a bad example, if only for the KONY2012 campaign :P

henrikschroder · on March 27, 2012

No no, the correct amount of views will be recorded for a specific video, it's just that each webserver doesn't know the exact number all the time. You make each webserver fetch the correct value perhaps every hour, and fake it inbetween. You'll get an ok approximation, users can't tell the difference, and you don't have to fetch the actual number every single pageview.

tbsdy · on March 27, 2012

Sir, thank you :-) I appreciate you clarifying this!

charbach007 · on March 26, 2012

Thumbs up if you're the 311th viewer