

Facebook's Memcached Multiget Hole: More machines = More Capacity - vijaydev
http://highscalability.com/blog/2009/10/26/facebooks-memcached-multiget-hole-more-machines-more-capacit.html

======
bantic
the title should be != More Capacity

~~~
compay
Yes, as in - this is not sarcasm; the actual article title is: "Facebook's
Memcached Multiget Hole: More machines != More Capacity"

~~~
sstrudeau
Right.

Though, to argue with the article's title -- while a straightforward addition
of more memcached nodes did not solve their capacity problem, the strategy of
_replicating_ nodes & load balancing read requests between them appears to be
a solution to this particular capacity problem; so more machines = more
capacity, so long as they are organized appropriately.

~~~
brown9-2
I'm not very familiar with memcached - is replication and load balancing
built-in, or something facebook added on themselves?

------
yummyfajitas
Being a little smarter on the client side can help a lot here.

If you always store and retrieve an object on server # (object.id % numservers),
you don't face this issue. (Obviously you could use a better hash than %.)
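A minimal sketch of that client-side sharding idea (server names are hypothetical, and a real client would use a better hash, e.g. consistent hashing, instead of a plain modulo):

```python
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]

def server_for(object_id: int) -> str:
    """Map an object id to a fixed cache server via modulo."""
    return SERVERS[object_id % len(SERVERS)]

# Every client computes the same mapping, so reads and writes
# for a given object always hit the same node.
assert server_for(42) == server_for(42)
```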

~~~
sstrudeau
This would help to distribute requests for individual objects, but doesn't
help for multi-get requests. The problem is that adding more nodes increases
the number of actual requests a multi-get call needs to make (assuming you're
asking for enough objects that they are likely to be distributed across all
nodes). This decreases the # of keys requested per node but _increases_ the
total number of requests to the cluster. Because the bound was on request
throughput (CPU-bound), minimizing the number of keys per request to a node
doesn't help.
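A quick way to see the fan-out: count how many distinct servers one multi-get touches as the cluster grows (a toy model assuming plain modulo sharding, as in the comment above):

```python
def requests_for_multiget(keys, num_servers):
    """Number of per-server requests one multi-get turns into:
    one request per distinct server holding at least one key."""
    return len({k % num_servers for k in keys})

# A page load fetching 100 objects touches more servers as the
# cluster grows, so total cluster-wide requests go UP, not down.
keys = list(range(100))
for n in (2, 4, 8, 16):
    print(n, "servers ->", requests_for_multiget(keys, n), "requests")
```

With enough keys per multi-get, nearly every server gets hit, so each added node adds roughly one more request per page load while shaving only a few keys off each existing request.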

The proposed solution is to instead replicate nodes and load balance read
requests. In this case, this doubles your read capacity, though you must write
twice (or N times depending on your replication level).

~~~
henrikschroder
It's worth noting that that solution only works if you have much fewer writes
than reads, but that's probably true for most people. A really simple way of
doing this that all memcached clients should be able to handle is to set up
two separate memcached clusters, write to both, and randomly read from either
of them. That way you don't replicate nodes, you replicate the entire cluster.
:-)
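That "replicate the entire cluster" scheme can be sketched like this (plain dicts stand in for memcached client objects here; in practice each would be a client configured with one cluster's server list):

```python
import random

class ReplicatedCache:
    """Write to every cluster, read from a random one."""

    def __init__(self, clusters):
        self.clusters = clusters

    def set(self, key, value):
        for c in self.clusters:          # N writes per update...
            c[key] = value

    def get(self, key):
        # ...in exchange for N-fold read capacity.
        return random.choice(self.clusters).get(key)

cache = ReplicatedCache([{}, {}])
cache.set("user:1", "alice")
assert cache.get("user:1") == "alice"
```

Reads spread across both clusters, which is why this only pays off when reads heavily outnumber writes.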

A better solution would be to consolidate your items, i.e. make sure that
items that are often fetched together with multi-get end up on the same
server; however, this requires a lot of extra knowledge about your data, and
it's not very likely that this knowledge is available.
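One hypothetical way to sketch that co-location idea: shard on a shared grouping key (say, the owning user's id) instead of each item's own key, so a multi-get for one user's items hits a single server:

```python
NUM_SERVERS = 8

def server_for(key: str, group: str) -> int:
    """Shard on the group (e.g. owner id), not the item key itself,
    so items commonly fetched together land on the same server."""
    return hash(group) % NUM_SERVERS

# All of user 42's items map to one server, so a multi-get
# for them becomes a single request.
servers = {server_for(f"item:{i}", group="user:42") for i in range(100)}
assert len(servers) == 1
```

The catch is exactly what the comment says: you need to know, ahead of time, which grouping predicts co-access.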

~~~
anamax
> It's worth noting that that solution only works if you have much fewer
> writes than reads, but that's probably true for most people.

Is it true of tweets? How about e-mail? What facebook content works that way?
(IIRC, in the discussion of their new image store, one of the motivations was
that most pictures were never viewed.)

