

How Facebook deals with memcache consistency in multiple datacenters - jimgreer
http://www.new.facebook.com/notes.php?id=9445547199

======
thorax
I've always been curious why someone hasn't prototyped (or why I haven't
heard of someone prototyping) an improvement to MySQL's query cache that
invalidates the cache less frequently and gives more dependable hit rates.
Currently it invalidates when any table referenced by the query changes. It
seems like you could improve this algorithm without too much trouble. For
example, if the query was resolved solely through indexes, it feels like the
cache could be invalidated on updates to those index entries rather than on
any change to the entire table.

I'll have to look through the code one day and see why this makes no sense.
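
In the meantime, here's a toy Python sketch of the finer-grained scheme at
the application level (the keys and hooks are invented for illustration;
MySQL's real query cache keys on raw query text):

    # Toy model: cache query results keyed by (table, indexed column,
    # value), so an update evicts only the entries whose index key
    # matches the changed row, not every query touching the table.
    query_cache = {}

    def cache_store(table, index_col, value, rows):
        query_cache[(table, index_col, value)] = rows

    def cache_lookup(table, index_col, value):
        return query_cache.get((table, index_col, value))

    def on_row_update(table, index_col, old_value, new_value):
        # Invalidate only the touched index values, old and new.
        query_cache.pop((table, index_col, old_value), None)
        query_cache.pop((table, index_col, new_value), None)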

------
thedob
Interesting use of the 20-second cookie. What happens when they scale to data
centers in Europe, Asia, and South America? Does it become an arbitrary-second
cookie depending on where the user is?
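
(For reference, the article's trick is roughly the following, sketched in
Python with made-up helper names: after a user writes, pin them to the master
for as long as replication is expected to take.)

    import time

    REPLICATION_LAG_SECONDS = 20  # the measured master->slave lag

    def after_write(response):
        # The user just wrote; route their reads to the master until
        # the slaves have had time to catch up.
        response.set_cookie("read_master_until",
                            str(time.time() + REPLICATION_LAG_SECONDS))

    def pick_db(request, master, slave):
        until = float(request.cookies.get("read_master_until", 0))
        return master if time.time() < until else slave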

Nonetheless, I can appreciate how a simple approach solves the problem. If
these are the most dramatic scaling issues that Facebook faces, then this
provides strong support for the argument that you shouldn't focus too much on
scaling and optimization when first building your site. There are probably
better uses of your early time.

~~~
mseebach
I'd guess that they'd need to partition their data somehow by then, so the
Europe DB is master for European users and slave for the rest, and so on.

------
smoody
note to self: don't start companies that require multiple synchronized data
centers. :-)

~~~
mdasen
While there's a ton of talk about scaling on here, it isn't needed by the vast
majority of people, and even Facebook's approach isn't that complicated when
you get down to it.

Basically, the only change needed is to have MySQL trigger the delete from
memcached on replication. Your slave database gets an updated record, saves it
to its tables, and then hits memcached with a delete command for that key.
While it does mean diving into the MySQL code, by the time you're at the size
at which you need such a feature, you can just hire a MySQL expert.
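
A rough sketch of the slave-side hook, assuming some process sees each row as
the slave applies it (the row-event plumbing is hand-waved here, and
pymemcache is just one client library choice):

    from pymemcache.client.base import Client

    memcached = Client(("127.0.0.1", 11211))

    def on_replicated_row(table, primary_key):
        # The local write has landed on the slave; evict the stale cache
        # entry so the next read repopulates memcached from fresh data.
        memcached.delete(f"{table}:{primary_key}")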

~~~
goodtimescaling
This is probably a bad idea in a lot of cases, and fb replication is probably
rather complicated, most likely relying on some system that serves stale data
when it can and asks for fresh data when it needs it. For one, if you're
operating at high scale, you don't want to run a query when you already know
how the underlying data has changed, so any kind of invalidation trigger
would be ill advised. Further, if you're replicating data over a large
distance, what's the chance that you'll ever have the right data in the right
state? From the moment the data is changed on the master you probably need to
fire off an invalidation, if not an update to that data, which is where some
sort of collection class to manage this is probably more suitable.

Invalidation often quits working well, even in a small replicated
environment, if replication takes more than a very small amount of time,
which happens quite easily when a server is overloaded. Imagine I change my
friends, my process waits for replication and invalidates, and then some
friend of mine views my page and re-caches my friend list from a MySQL server
that hasn't yet replicated my newest data. It's fairly solvable, but even
then you aren't sure that an update will go through on memcache, so you need
some sanity check to know whether the data is actually what you want to be
seeing (see the sketch below).

I guess I don't really have a point, just more open-ended problems for
consideration for those of you wishing to scale. I think it's all pretty
common sense, but it's by no means simple. Best of luck.
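
To make that sanity check concrete: version-stamp the cached value and refuse
anything older than the write you know you just made. A Python sketch, with
the db helper invented for illustration:

    def read_friends(memcached, db, user_id, min_version):
        # min_version is the version we know our own write produced.
        cached = memcached.get(f"friends:{user_id}")
        if cached is not None and cached["version"] >= min_version:
            return cached["rows"]
        # Cache entry is missing or predates our write: hit the database,
        # knowing a lagging slave may itself still be stale.
        rows, version = db.fetch_friends_with_version(user_id)  # hypothetical
        if version >= min_version:
            memcached.set(f"friends:{user_id}",
                          {"version": version, "rows": rows})
        return rows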

------
fleaflicker
There's another, easier way built into memcached:

When you issue a delete you can specify "the amount of time the client wishes
the server to refuse 'add' and 'replace' commands"

Maybe they couldn't afford to increase the read load that much, but this seems
easier than rewriting MySQL's replication engine.
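
In the ASCII protocol it's just an optional argument to delete. A raw-socket
sketch (later memcached releases dropped this hold-time parameter, so treat
it as historical):

    import socket

    def delete_with_hold(host, key, hold_seconds=20):
        # "delete <key> <time>": the server refuses add/replace on the
        # key for <time> seconds, so a racing cache refill can't put
        # stale data back while the slaves catch up.
        conn = socket.create_connection((host, 11211))
        conn.sendall(f"delete {key} {hold_seconds}\r\n".encode())
        reply = conn.recv(1024)  # b"DELETED\r\n" or b"NOT_FOUND\r\n"
        conn.close()
        return reply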

~~~
ashu
That would not be guaranteed to be correct, since you don't know the precise
time required for the DBs to become consistent.

------
golanzakai
Here is a setup where you can scale with memcached:

http://golanzakai.blogspot.com/2008/11/memcached-replication-and-namespaces.html

------
smakz
Just seems hacky to me. Opening a whole data center to shave off 70ms? Now I
see why they need 500+ million.

~~~
akd
70ms can really affect the time people spend on a site.

~~~
smakz
You know what takes longer than 70 ms?

All the js/css/other stuff Facebook includes on their homepage (one server
round trip each).

Before they spent the millions it takes to bring up another data center they
should have read this:

<http://developer.yahoo.com/yslow/help/#guidelines>

Also, I disagree that 70ms makes that big of a difference. People are more
used to slow sites than people realize - anything less than a 200 ms
improvement in the aggregate and I wouldn't spend a dime (not to mention
millions of dollars).

