

Building a Billion User Load Balancer [video] - noveltysystems
https://www.youtube.com/watch?v=MKgJeqF1DHw

======
acaloiar
In 2001 when I was 14, I had no idea what scalability was or how anyone scaled
websites. I ran a site with 20k users on a single server with a single Pentium
IV (1.4? GHz) processor from a garage with no climate control and the fastest
cable connection I could afford. At peak hours the processor finally succumbed
to the load and I had to buy a second server. I was able to talk a local ISP
into co-locating the servers in their rack (Climate control! Fiber!). When I
realized that I could load balance with DNS and no load balancing software, I
thought I was a genius. The king of scalability.

Needless to say, I was not the king. The engineers at Facebook are king.

~~~
zongitsrinzler
This is gold.

But really, DNS load balancing can be really useful (and is so easy to set
up).
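
For anyone who hasn't tried it, the whole trick is just publishing multiple A
records for one name and letting resolvers rotate through them (illustrative
BIND-style zone entries; the name and IPs here are made up):

```zone
; round-robin DNS: several A records for the same name,
; answered in rotating order by most name servers
www.example.com.  300  IN  A  192.0.2.10
www.example.com.  300  IN  A  192.0.2.11
www.example.com.  300  IN  A  192.0.2.12
```

The obvious caveat is that DNS has no health checks: a dead server keeps
getting its share of traffic until you pull the record and the TTL expires.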

------
stann
Can anybody summarize the talk for folks like me who for several reasons
cannot watch the video?

~~~
rroriz
I'd like to add to this comment and suggest that this practice should be
applied to links that may be blocked for a large portion of us, those who are
behind a firewall.

~~~
cmrx64
There are also accessibility reasons for not being able to watch a video. Let
alone dedicating the time to it.

~~~
mhuffman
There are also those that have limited bandwidth available, or very expensive
bandwidth, where a video is a luxury.

------
patrickshuff
Hi All! Thanks for the comments here. The video for this same (more recent)
talk at SRECon Europe 2015 is up and it is much higher quality. I have
iterated on the presentation with feedback I got since giving this in
February. Enjoy!

Usenix:
[https://www.usenix.org/conference/srecon15europe/program/pre...](https://www.usenix.org/conference/srecon15europe/program/presentation/shuff)

Direct MP4:
[http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f...](http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/srecon15europe/shuff.mp4)

------
scoj
Quite an interesting talk. I liked the format: "How to get more rps?..."

It's humbling to think of the scale that some other folks deal with. I'm
concerned about 2x growth while running a couple thousand sites on a dozen
servers.

You can't just hit up SO when you run into a problem at Facebook's scale.

~~~
nocarrier
We didn't get serious about load balancing and create a team dedicated to it
until fall of 2011. At the time, Facebook had 800 million active users per
month. We had run out of runway on commercial and open source options, and
needed a lot more flexibility with how we allocated and moved traffic around.
It took a lot of effort to get to where we are today, but the thing I'll tell
you is that these huge systems always start with a prototype running on a dev
server. And then you get it to run traffic for your whole team, and then the
whole company. Then you try it in a cluster, a datacenter, a region, and all
of a sudden you've built a load balancer.

It took Proxygen less than 18 months to go from inception at a hackathon to
running all of Facebook's production HTTP traffic. And we could have shaved
that by a few months if we had made smarter decisions along the way.

------
scurvy
Not trying to be a wet blanket, but there's nothing really groundbreaking in
this talk. It's pretty basic stuff that everyone else running L3 networks
already does.

I'd be much more interested in a talk that is solely on proxygen.

I'd also like to know what their fragmentation rate is between the L4 and L7
LBs. I know what ours is, but I'm sure FB takes a lot more large uploads than
we do. Fragmentation is the downside to using IPVS in an L3 network.

~~~
nocarrier
This stuff seems basic at first glance, but it's not easy to get right, and
you're discounting a ton of effort that went into getting feedback loops and
demand prediction to work properly in Cartographer, for example. Maybe it's
easy to slap together an L4/L7 LB and throw a DNS server in front of it at
smaller scales, but it's a notable accomplishment to do this at Facebook's
scale. Nothing out there approaches Facebook's request rates.

There have been detailed talks about Proxygen's architecture at various open
houses in Seattle and Menlo Park, and they should have HTTP2 support working
well soon. I'd expect there's going to be more detailed blog posts on Proxygen
at some point. Or you can check out the source code too.

Fragmentation isn't really an issue for either Shiv or Proxygen. I don't think
they will share the rate but it is very low.

~~~
scurvy
You'll get fragmentation on every segment over 1460 bytes, which would be the
majority of datagrams for a large file upload like pictures or movies. Only
way to work around this is to run jumbo frames from L4 to L7. You can't
magically encapsulate IP in IP at 1460 segments without fragmenting.
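
Rough arithmetic, as a sketch (assuming a standard 1500-byte Ethernet MTU and
a 20-byte outer IPv4 header for the IP-in-IP encapsulation; the function name
is mine):

```python
ETH_MTU = 1500
OUTER_IP_HDR = 20   # outer IPv4 header added by IP-in-IP encapsulation
INNER_IP_HDR = 20   # inner IPv4 header
TCP_HDR = 20        # typical TCP header without options

def fragments(tcp_payload: int) -> bool:
    """True if an IPIP-encapsulated TCP segment no longer fits in the MTU."""
    inner_packet = INNER_IP_HDR + TCP_HDR + tcp_payload
    outer_packet = OUTER_IP_HDR + inner_packet
    return outer_packet > ETH_MTU

print(fragments(1460))  # True: a full-MSS segment fits the wire exactly,
                        # so the extra 20 bytes push it over the MTU
print(fragments(1440))  # False: clamping the MSS by the overhead avoids it
```

Which is why the usual choices are jumbo frames between the tiers, or clamping
the MSS down by the encapsulation overhead.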

------
feld
Surprised they use tinydns.

~~~
feld
Why would I get downvoted for that? It's a serious concern. It's missing
modern and often important features unless you use unofficial patchsets, and
some are missing entirely (NSEC3). And then there are all the forks, like
dbndns from Debian, N-DJBDNS, etc...

~~~
scurvy
Specifically, what's bad about tinydns? Compared to BIND, it's a veritable
Fort Knox of security.

DNSSEC isn't a requirement for most people, and I'd wager a lot of people
consider DNSSEC more harmful than beneficial.
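
Part of the appeal is also the dead-simple flat data file (illustrative
entries only; the name and IPs are made up):

```
# tinydns data file, compiled to data.cdb with tinydns-data
# '.' lines create SOA + NS + A records for a zone
.example.com:192.0.2.1:a:259200
# '+' lines are plain A records; repeating a name gives round-robin answers
+www.example.com:192.0.2.10:300
+www.example.com:192.0.2.11:300
```

At Facebook's scale, a format you can generate from a script and swap in
atomically is arguably worth more than a long feature list.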

------
scurvy
Well, he kinda failed the infamous Facebook interview question of "what
happens when you type www.facebook.com and hit enter in a browser?" ;)

