
Mcrouter: A memcached protocol router for scaling memcached deployments - dsr12
https://code.facebook.com/posts/296442737213493/introducing-mcrouter-a-memcached-protocol-router-for-scaling-memcached-deployments/
======
finnh
At first I thought "how does this improve over consistent hashing?"

Then I read it, saw the pretty pictures, and realized it addresses many more
concerns than consistent hashing alone would ... and, even setting those
aside, moving the server list to a single centralized spot (mcrouter) and
letting your clients just point at that spot is a definite win. Otherwise,
when you update your server list, you have to update every single client.

nice work, FB! And well described in the blog post.

------
stock_toaster
How is this different from twemproxy[1]?

EDIT: Looks like instead of using distinct ports to delineate separate
clusters like twemproxy, it uses key prefix routing. It also supports
"replicated pools", and a few other fancy/neat things. Interesting!

[1]:
[https://github.com/twitter/twemproxy](https://github.com/twitter/twemproxy)
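
To make the contrast concrete: a minimal sketch of what key-prefix routing
looks like in mcrouter's JSON config style, based on its documented "pools" /
"routes" / "aliases" layout (pool names, ports, and prefixes here are made
up for illustration):

```json
{
  "pools": {
    "cluster-a": { "servers": ["10.0.0.1:11211", "10.0.0.2:11211"] },
    "cluster-b": { "servers": ["10.0.1.1:11211", "10.0.1.2:11211"] }
  },
  "routes": [
    { "aliases": ["/region1/a/"], "route": "PoolRoute|cluster-a" },
    { "aliases": ["/region1/b/"], "route": "PoolRoute|cluster-b" }
  ]
}
```

A client sends e.g. `get /region1/a/user:42` and mcrouter strips the prefix
and routes to the matching pool, instead of the client connecting to a
distinct port per cluster as with twemproxy.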

~~~
rbranson
One of the primary reasons we switched to it at Instagram (we were using
twemproxy) is that it has much more robust failure handling than twemproxy
did, at least at the time. That may have changed. Our experience was that if
we lost a single memcache server, it caused multi-gets that included the lost
server to fail completely or hang / timeout, instead of just not including
results from the down host.
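
The desired behavior is easy to sketch client-side: fan the multi-get out by
backend and just drop the keys belonging to an unreachable host, rather than
failing the whole request. This is an illustrative sketch (the function and
its parameters are made up, not mcrouter's or twemproxy's actual API):

```python
def multi_get(keys, pick_server, servers):
    """Multi-get that degrades gracefully: a down backend loses its keys
    from the result instead of failing or hanging the whole request."""
    # Group requested keys by the server that owns them.
    by_server = {}
    for key in keys:
        by_server.setdefault(pick_server(key), []).append(key)

    results = {}
    for server, server_keys in by_server.items():
        try:
            results.update(servers[server].get_multi(server_keys))
        except ConnectionError:
            continue  # down host: omit its keys, keep everyone else's
    return results
```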

~~~
stock_toaster
At ($dayjob) we currently use twemproxy (30 nodes with ~4k get/s each), so
this is very interesting to me. Thanks for the info/feedback.

~~~
rbranson
To elaborate: twemproxy supports auto_eject_hosts, but the ejected host is
removed from the ring until a timeout expires, and its portion of the
keyspace is re-allocated to other machines during this time. There are
numerous problems with this, but the biggest is that it produces inconsistent
views of the ring whenever one or more client hosts can't reach some of the
machines in the ring for transient periods, which is a pretty normal kind of
thing if you've got clients and ring nodes distributed over multiple loosely
coupled failure domains. mcrouter is much more capable at handling node
failure.
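
The inconsistency is easy to see with a toy consistent-hash ring (a minimal
sketch, not twemproxy's or mcrouter's actual code): when one client ejects a
host, every key that hashed to it silently moves to other nodes, so that
client now disagrees with clients who still see the host as healthy about
where those keys live.

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""
    def __init__(self, nodes, vnodes=100):
        self._points = []  # sorted (hash, node) pairs around the ring
        for node in nodes:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{node}#{i}".encode()).hexdigest(), 16)
                self._points.append((h, node))
        self._points.sort()

    def lookup(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        # Key belongs to the first point clockwise from its hash.
        i = bisect.bisect(self._points, (h, "")) % len(self._points)
        return self._points[i][1]

full = Ring(["mc1", "mc2", "mc3"])
ejected = Ring(["mc1", "mc3"])  # one client's view after auto-ejecting mc2
keys = [f"user:{n}" for n in range(1000)]
moved = sum(full.lookup(k) != ejected.lookup(k) for k in keys)
# Roughly a third of the keys (mc2's arc) now map elsewhere for this client,
# while clients that never ejected mc2 still route those keys to it.
```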

------
lukepatrick
Looks like a great improvement over a simpler scaling solution, Netflix's
EVCache:
[https://github.com/Netflix/EVCache/wiki](https://github.com/Netflix/EVCache/wiki)

------
jetm1
From the GitHub readme: "It's a core component of cache infrastructure at
Facebook and Instagram where mcrouter handles almost 5 billion requests per
second at peak." 5 billion per sec?! I assume that includes frontend and
backend traffic, like ads, analytics, etc.

