
When Simple Wins: Power of 2 Load Balancing - mattdennewitz
https://fly.io/articles/simple-wins-power-of-2-load-balancing/
======
alxv
The method is called "Power of Two _Random Choices_"
([http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf](http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf)).
And the two-choices paradigm is widely applicable beyond load balancing. In
particular, it applies to hash table design (e.g. cuckoo hashing) and cache
eviction schemes ([https://danluu.com/2choices-eviction/](https://danluu.com/2choices-eviction/)).
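
To make the load-balancing version concrete, here is a minimal Python sketch
of the two-choices pick (the server names and load table are made up for
illustration):

    import random

    def pick_server_p2c(servers, load_of):
        # Power of two random choices: sample two distinct servers
        # uniformly at random and route to the less loaded one.
        a, b = random.sample(servers, 2)
        return a if load_of(a) <= load_of(b) else b

    # Illustrative usage with a made-up load table:
    loads = {"s1": 12, "s2": 3, "s3": 7, "s4": 9}
    print(pick_server_p2c(list(loads), loads.get))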

~~~
mrkurt
You're right, I updated the title. Got a little too clever with the whole
"power" thing.

------
zubspace
I'm not an expert in this field, but an engineer at Vimeo went into detail
about why this approach did not work for them. [1]

Problem with consistent hashing:

    
    
      However, consistent hashing comes with its own problem: uneven distribution of requests.
      Because of its mathematical properties, consistent hashing only balances loads about as
      well as choosing a random server for each request, when the distribution of requests is
      equal. But if some content is much more popular than others (as usual for the internet),
      it can be worse than that.
    

Problem with Power of 2 Load Balancing:

    
    
      Why wasn’t there a way to say “use consistent hashing, but please don’t overload any
      servers”? As early as August 2015, I had tried to come up with an algorithm based on
      the power of two random choices that would do just that, but a bit of simulation said
      that it didn’t work. Too many requests were sent to non-ideal servers to be worthwhile.
    

Instead, he used something called _Consistent Hashing with Bounded Loads_.

[1] [https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed](https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed)
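
For intuition, here is a rough Python sketch of the bounded-loads idea as I
understand it from the article: take the usual clockwise walk around the
ring, but skip any server that is already at a cap of ceil(c * average
load). The hash function and the value of c are illustrative, not what Vimeo
actually ships:

    import hashlib
    from bisect import bisect_right
    from math import ceil

    def ring_hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)  # illustrative

    class BoundedRing:
        # Consistent hashing with bounded loads: walk the ring as usual,
        # but skip any server already at ceil(c * average load).
        def __init__(self, servers, c=1.25):
            self.ring = sorted((ring_hash(s), s) for s in servers)
            self.c = c
            self.load = {s: 0 for s in servers}

        def assign(self, key: str) -> str:
            total = sum(self.load.values()) + 1          # count this request
            cap = ceil(self.c * total / len(self.load))  # per-server bound
            i = bisect_right(self.ring, (ring_hash(key), ""))
            for step in range(len(self.ring)):  # c > 1 guarantees a free slot
                server = self.ring[(i + step) % len(self.ring)][1]
                if self.load[server] < cap:
                    self.load[server] += 1
                    return server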

~~~
user5994461
They are different algorithms for different purposes.

Consistent hashing is used to always route a request to the same host. It's
the opposite of load balancing.

Load balancing algorithms (least connections, busyness, etc...) are used to
distribute requests across servers as well as possible to maximize
performance.

~~~
pas
Usually both properties are desirable... up to a point.

You want to minimize load on all servers, but you also want to pack things up
efficiently (to minimize operational costs), and of course you want the
benefits of caching, so you want requests from a session to land on the same
node/server/box.

Basically a multi-dimensional optimization problem. Completely solvable with
constraints. Let the business people decide what's more important, latency or
throughput or low cost of operations.

------
throwaway13337
The simplest load balancing I've done is modulo the user ID by the number of
servers, then point at that server.

This solves caching too since you are only ever receiving and caching user
data on a single server. No cache communication required. You can enforce it
on the server side for security as well.

Doesn't require a load balancer - just an extra line of code.

Keep it simple.
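
As a quick sketch (the hostnames are hypothetical), the whole scheme is
roughly:

    SERVERS = ["app-0.internal", "app-1.internal",
               "app-2.internal", "app-3.internal"]  # hypothetical hostnames

    def server_for(user_id: int) -> str:
        # The same user always maps to the same server, so per-user cache
        # entries never need to be shared across servers.
        return SERVERS[user_id % len(SERVERS)]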

~~~
alxv
What happens when the number of servers changes? The cache hit rate would
likely drop to zero until it warms up again, which is a good way to
accidentally overload your systems.

Load balancing based on consistent hashing is the better way to implement
this.

~~~
mnutt
Consistent hashing is a slightly cleaner way to do it, but gives pretty much
the same result as modulo-ing the user id against the number of servers. At
least as I understand it, you consistently hash something (a user id, a
request URL, etc.) into N buckets, where N is the number of servers, so
changing N re-shuffles all of the buckets anyway.

Short of something like Cassandra's ring topology, how would you use
consistent hashing to add new servers and assign them requests?

~~~
alxv
You are missing a crucial piece needed for consistent hashing: you also need
to hash the names of the servers. With consistent hashing you hash both the
names of the requests and of the servers, then you assign the request to the
server with the closest hash (under the modulus). With this scheme, you only
need to remap 1/n of the keys (where n is the number of servers).
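
A minimal sketch of that scheme (md5 just stands in for whatever stable hash
you prefer):

    import hashlib
    from bisect import bisect_right

    def ring_hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, servers):
            # Hash the *server names* onto the ring, not just the keys.
            self.ring = sorted((ring_hash(s), s) for s in servers)

        def server_for(self, key: str) -> str:
            # First server clockwise from the key's hash, wrapping around.
            i = bisect_right(self.ring, (ring_hash(key), ""))
            return self.ring[i % len(self.ring)][1]

    ring = Ring(["server-a", "server-b", "server-c"])
    ring.server_for("user:42")
    # Adding "server-d" only remaps the keys that now fall closest to its
    # hash, roughly 1/n of the keyspace, instead of reshuffling everything.

In practice you would also place several points per server on the ring
(virtual nodes) to even out the arc lengths.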

~~~
alexgartrell
You're kind of right. You can also use something like jump consistent hash [0]
which only requires you to have a consistent ordering of the hosts where
you're sending the information. We (Facebook) use something similar for our
caches. It requires a linear array of hosts but you've already got that if
you're load balancing.

[0] [https://arxiv.org/abs/1406.2294](https://arxiv.org/abs/1406.2294)
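
The core of jump consistent hash is tiny; here is a direct Python port of the
C++ snippet in the paper:

    def jump_hash(key: int, num_buckets: int) -> int:
        # Maps a 64-bit key to a bucket in [0, num_buckets). Growing the
        # bucket count from n to n+1 moves only ~1/(n+1) of the keys, but
        # buckets can only be added or removed at the end of the ordering.
        b, j = -1, 0
        while j < num_buckets:
            b = j
            key = (key * 2862933555777941757 + 1) % (1 << 64)
            j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
        return b

Something like jump_hash(user_id, len(servers)) then gives a stable index
into your host array.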

------
euph0ria
Regarding the math section, could someone please describe it like you were
talking to a 5 year old?

1) Θ(log n / log log n)

2) Θ(log log n)

~~~
alxv
There is a proof shown in this handout:
[https://people.eecs.berkeley.edu/~sinclair/cs271/n15.pdf](https://people.eecs.berkeley.edu/~sinclair/cs271/n15.pdf)

It's hard to understand _why_ this technique works so well without digging
deep into the math. Roughly speaking, if you throw n balls into n bins at
random, the maximum number of balls in any bin will grow surprisingly quickly
(because of the birthday paradox). However, if we allow ourselves to choose
between two random bins instead of one, and put the ball in the one with the
fewest balls in it, the maximum number of balls in any bin grows much more
slowly (i.e., O(ln ln n)). Hence, having that one extra random choice gets us
surprisingly close to the optimal approach of comparing all bins (which would
give us O(1)), without doing all that work.
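
This is easy to check empirically; a small simulation along these lines shows
the gap:

    import random

    def max_load(n: int, choices: int) -> int:
        # Throw n balls into n bins; each ball samples `choices` bins
        # uniformly at random and lands in the least full of them.
        bins = [0] * n
        for _ in range(n):
            picks = random.sample(range(n), choices)
            bins[min(picks, key=bins.__getitem__)] += 1
        return max(bins)

    n = 100_000
    print("one choice: ", max_load(n, 1))  # grows like log n / log log n
    print("two choices:", max_load(n, 2))  # grows like log log n: far flatter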

~~~
MichaelGG
Thanks for the explanation! Much clearer and I get the concept. In the case of
load balancing, we'd need a ton of servers (1000s?) for this to pay off vs
just comparing all, right? Cache updating aside, most of the overhead would be
in reading the load numbers in. Comparing a thousand numbers has to be quick
in comparison, no?

~~~
mrkurt
The problem with load balancing is herd behavior. Stats for load are usually
at least a little stale, because it's a distributed system where you can't
afford to wait for consistency. When there are traffic spikes, a whole herd
of new connections will go to the least loaded server during the window of
time where the cached "load" number is out of date. Picking two at random
helps keep a bunch of connections from racing to one server, even when you're
only running 3-4 of them.
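
A toy simulation of that failure mode (the fleet size and refresh window are
made up) makes the difference visible: with "pick the least loaded", every
request in a stale window herds onto the same server, while two random
choices keeps the bursts small:

    import random

    def worst_burst(n_servers=10, n_requests=2000, refresh_every=50,
                    two_choices=True):
        # Reported loads refresh only every `refresh_every` requests, so
        # they are stale in between. Returns the largest number of requests
        # any single server absorbed within one stale window.
        true_load = [0] * n_servers
        stale_load = [0] * n_servers
        window_hits = [0] * n_servers
        worst = 0
        for i in range(n_requests):
            if i % refresh_every == 0:
                stale_load = list(true_load)   # stats finally catch up
                window_hits = [0] * n_servers
            if two_choices:
                a, b = random.sample(range(n_servers), 2)
                pick = a if stale_load[a] <= stale_load[b] else b
            else:
                pick = min(range(n_servers), key=stale_load.__getitem__)
            true_load[pick] += 1
            window_hits[pick] += 1
            worst = max(worst, window_hits[pick])
        return worst

    print("least-loaded, stale stats:", worst_burst(two_choices=False))
    print("two random choices:       ", worst_burst(two_choices=True))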

------
gopalv
"Power of 2 Random Choices" ... has nothing to do with the "Power of 2"
directly.

I like 2Choice because it is not dependent on hash function design & is
temporal, but I have a positive aversion to the 2^n hash distributions when it
comes to data, specifically for distributed systems which need to flex up/down
[1].

[1] -
[http://notmysock.org/blog/hacks/1440](http://notmysock.org/blog/hacks/1440)

------
scame
I've seen a paper doing the same thing directly at the network layer using
IPv6 extension headers: [http://www.thomasclausen.net/wp-content/uploads/2017/06/2017-ICDCS-SRLB-The-Power-of-Choices-in-Load-Balancing-with-Segment-Routing.pdf](http://www.thomasclausen.net/wp-content/uploads/2017/06/2017-ICDCS-SRLB-The-Power-of-Choices-in-Load-Balancing-with-Segment-Routing.pdf)

------
adrianratnapala
Can someone expand on the maths that the OP elided? What is the thing that
comes out to O(log n / log log n)?

