
Ways to do load balancing wrong - alanfranz
http://queue.acm.org/detail.cfm?id=3028689
======
wjossey
As someone who has worked at high scale (1,000+ load balanced servers), this
is a nice "primer" I'd share with a new(er) engineer, or someone who has not
worked in a horizontally scaled environment.

A few things I'll point out in addition to this article:

[1] I've found I greatly prefer databases that handle resiliency and
horizontal scalability natively (or natively in their respective client
libraries). Databases that scale natively have been one of the greatest
advancements in web technologies during my career, and I'd really recommend
looking at technologies like Riak & Cassandra for inspiration.

[2] When scaling horizontally with load balancing, connection pooling becomes
something to always pay attention to. If you have 1,000 servers and 100
database instances, you likely don't want 100,000 active connections (one from
every server to every instance). Focusing
on things like intra-zone preferences for servers (Server A prefers all DBs in
its own rack, "zone", or "region") can be a real headache saver as you scale
up.
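To put numbers on that connection math: with a full mesh, every app server holds at least one connection to every DB instance, so the total grows as servers × instances × pool size. Here's a minimal Python sketch, assuming a hypothetical layout of 1,000 servers and 100 DB instances spread evenly over four zones, comparing a full mesh with a strict "only talk to DBs in my own zone" preference. The function names and the zone layout are illustrative, not from any particular library.

```python
from collections import Counter


def full_mesh_connections(num_servers: int, num_dbs: int, pool_size: int = 1) -> int:
    """Every app server keeps `pool_size` connections open to every DB instance."""
    return num_servers * num_dbs * pool_size


def zone_local_connections(server_zones: list[str], db_zones: list[str],
                           pool_size: int = 1) -> int:
    """Each app server only connects to DB instances in its own zone."""
    dbs_per_zone = Counter(db_zones)  # zone name -> number of DB instances there
    return sum(dbs_per_zone[zone] * pool_size for zone in server_zones)


# Hypothetical layout: 1,000 app servers and 100 DB instances spread evenly
# across four zones (250 servers and 25 DB instances per zone).
zones = ["a", "b", "c", "d"]
servers = [zones[i % len(zones)] for i in range(1000)]
dbs = [zones[i % len(zones)] for i in range(100)]

print(full_mesh_connections(1000, 100))      # 100000
print(zone_local_connections(servers, dbs))  # 25000
```

The catch with a strict zone-local preference is that a zone whose DB instances are down leaves its app servers with nothing to connect to, so you'd likely want a fallback to other zones, which brings some of those extra connections back during an incident.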

[3] Capacity planning is always key. What are you trying to plan for? Do you
want to survive an entire AWS zone outage? Do you want to be able to survive a
region outage? Do you worry about single servers going down, and if something
"bigger" happens you'll just eat the downtime? Make sure you and your business
stakeholders agree on what "resiliency" looks like to your business. It'll
help you balance cost and downtime appropriately.
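As a rough illustration of how "what do we want to survive?" turns into hardware: if load is spread evenly over N identical zones and you want to survive losing k of them, the remaining N - k zones must carry 100% of peak traffic. A minimal Python sketch under those simplifying assumptions (even load, identical zones, throughput as the only constraint); the function names and the 30,000 req/s service are hypothetical.

```python
import math


def max_safe_utilization(zones: int, zones_lost: int = 1) -> float:
    """Highest steady-state utilization per zone such that the surviving
    zones can absorb all traffic after `zones_lost` zones fail."""
    if not 0 <= zones_lost < zones:
        raise ValueError("at least one zone must survive")
    return (zones - zones_lost) / zones


def servers_needed(peak_rps: float, rps_per_server: float,
                   zones: int, zones_lost: int = 1) -> int:
    """Total servers, spread evenly over `zones`, needed to serve `peak_rps`
    even with `zones_lost` zones down."""
    if not 0 <= zones_lost < zones:
        raise ValueError("at least one zone must survive")
    surviving = zones - zones_lost
    per_zone = math.ceil(peak_rps / surviving / rps_per_server)
    return per_zone * zones


# Hypothetical service: 30,000 req/s at peak, 100 req/s per server, 3 zones.
print(round(max_safe_utilization(3), 3))             # 0.667
print(servers_needed(30_000, 100, 3, zones_lost=0))  # 300 (no zone redundancy)
print(servers_needed(30_000, 100, 3, zones_lost=1))  # 450 (survive one zone)
```

The jump from 300 to 450 servers is exactly the kind of cost/downtime trade-off worth agreeing on with stakeholders up front.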

[4] When you're looking at spare capacity, remember to account for the
situation where a cache layer suffers an outage. If you have to fall back
to the primary source of data, and it may also be suffering from degraded
service, how much capacity will you actually have?
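A back-of-the-envelope sketch of point [4], in Python: with a cache in front, the primary store only sees the misses, so losing the cache multiplies its load by 1 / (1 - hit ratio). The numbers below (20,000 req/s, a 95% hit ratio, a store rated for 25,000 req/s that is running at 70% because it is itself degraded) are hypothetical, chosen only to show the shape of the calculation.

```python
def backend_load(request_rate: float, cache_hit_ratio: float,
                 cache_up: bool = True) -> float:
    """Requests/sec that reach the primary data store."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1]")
    # With the cache up, only misses fall through; with it down, everything does.
    return request_rate * (1.0 - cache_hit_ratio) if cache_up else request_rate


def survives_cache_outage(request_rate: float, db_capacity: float,
                          degraded_fraction: float = 1.0) -> bool:
    """True if the primary store, possibly running at reduced capacity,
    can absorb all traffic once the cache is gone."""
    return request_rate <= db_capacity * degraded_fraction


normal = backend_load(20_000, 0.95)                  # roughly 1,000 req/s
outage = backend_load(20_000, 0.95, cache_up=False)  # 20,000 req/s
print(f"load multiplier during cache outage: {outage / normal:.0f}x")  # 20x
print(survives_cache_outage(20_000, db_capacity=25_000))                         # True
print(survives_cache_outage(20_000, db_capacity=25_000, degraded_fraction=0.7))  # False
```

The uncomfortable part is the multiplier: the better your hit ratio, the bigger the spike the primary store sees when the cache disappears.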

~~~
peterwwillis
There need to be more publications covering this, because I don't see this
becoming common knowledge or being practiced very often.

Then there are the modern design decisions that make some of these subjects
taboo, like "you should never cache anything" and "a single node can handle a
million concurrent connections so why worry" and "let's make all our machines
use 99% of their capacity with containers because it's more efficient and hope
we can deploy standbys before the database queues too much and blows the
entire stack up". With the emergence of the pseudo-DevOps cargo cult, people
who like using JavaScript for server backend code are designing infrastructure
they've never administered before. (Sorry for the rant.)

There was a really great paper published a few years ago that was basically an
Everything You Need To Know To Run Large Scale Enterprise Systems type paper.
I can never find it when these types of threads come up, but it was dozens of
pages long and really in-depth.

~~~
torinmr
The SRE Book
([http://shop.oreilly.com/product/0636920041528.do](http://shop.oreilly.com/product/0636920041528.do))
has a number of chapters on load balancing that are well worth reading.

One thing the book covers that I think this article glossed over is the
fact that in sufficiently large systems there's never a single "load balancer";
instead there are many layers of load balancing systems at different levels
of the stack. E.g.:

DNS load balancing -> high capacity network-level load balancing -> shared
reverse HTTP proxy -> application server -> database (with a "load balancer"
internal to the application server load balancing among DB replicas).

~~~
peterwwillis
They mention there are different applications for load balancers, and then use
one or two examples. But their main points are the most important ones: if you
don't know what you're using it for, it might be worthless.

The SRE book only seems to have a couple of sub-chapters on it, so I would
recommend looking up white papers and best practices for your particular
application of load balancing. Safari Books has over 17,000 matches for "load
balancing" in its library, for example, and F5 has 12 white papers on it just
for their products.

------
stenio123
Is the discussion of "load balancers for resilience vs scale" even relevant in
a cloud-enabled world? Just use an AWS Elastic Load Balancer with an Auto
Scaling group, plus CloudWatch alarms for monitoring and you get the best of all
worlds... :-)

------
surrealvortex
Load balancing strategies are also important. For a service that receives
requests that take different amounts of effort to fulfill, weighted least
conns is almost certainly better than random or round robin.
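To illustrate the difference: round robin hands out requests in turn regardless of how many slow ones a backend is still working on, while least-connections looks at in-flight requests and divides by each backend's weight (its relative capacity). A minimal Python sketch of weighted least-connections selection, assuming the balancer can track in-flight counts; the `Backend` class and the function names are hypothetical, not any particular load balancer's API.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int      # relative capacity; a weight-2 box should get ~2x the work
    active: int = 0  # requests currently in flight on this backend

    def __post_init__(self) -> None:
        if self.weight <= 0:
            raise ValueError("weight must be positive")


def pick_weighted_least_conn(backends: list[Backend]) -> Backend:
    """Return the backend with the fewest in-flight requests per unit of weight.
    Ties go to the earliest backend in the list."""
    return min(backends, key=lambda b: b.active / b.weight)


def start_request(backends: list[Backend]) -> Backend:
    backend = pick_weighted_least_conn(backends)
    backend.active += 1
    return backend


def finish_request(backend: Backend) -> None:
    backend.active -= 1


pool = [Backend("small", weight=1), Backend("big", weight=2)]
for _ in range(6):  # six requests arrive before any of them finish
    start_request(pool)
print({b.name: b.active for b in pool})  # {'small': 2, 'big': 4}

# If the big box's requests all complete while the small box's are still
# running (e.g. expensive queries), new traffic goes to the big box.
for _ in range(4):
    finish_request(pool[1])
print(pick_weighted_least_conn(pool).name)  # big
```

NGINX's `least_conn` and HAProxy's `leastconn` modes also take server weights into account in roughly this spirit, though their exact bookkeeping and tie-breaking differ from this sketch.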

------
hueving
How does something like this end up published by the ACM? Is this a special
publication that isn't related to research at all?

I don't mean to sound harsh, but there was nothing in this article even close
to novel in an academic sense, and none of it is even new field knowledge if
you've spent any non-trivial time around load balancers. Perhaps my notions of
what the ACM publishes are wrong...

Make no mistake, it's a good article for things to look out for if you're
getting into load balancing. It's just nothing new if you are already
experienced in the field.

~~~
alanfranzoni
ACM is 'advancing computing as a science and profession'. It's not just for
academia. One of modern ACM's goals is precisely to close the gap between
research and practice.

~~~
hueving
Doesn't such a vague definition just make it a blog platform for anything CS
related? Nothing in this article was even new to practice.

~~~
alanfranzoni
Well, yes, ACM itself is quite broad, with many focused groups as well as
other sections aimed at a wider audience.

But ACM Queue is not "a blog platform"; it's more like a magazine. Some
articles are new, some are recaps, some are opinion pieces.

