

In-Memory caching: Why we can't just trust the database to get it right - bootload
http://www.25hoursaday.com/weblog/2008/11/09/InMemoryCachingWhyWeCantJustTrustTheDatabaseToGetItRight.aspx

======
tocomment
So if you're running a server with 32GB of RAM and your entire database is
4GB, why can't the whole database just be kept in memory and be done with it?

I don't see an obvious way to do this in MySQL at least and it certainly
doesn't do it automatically.

~~~
wayne
It essentially will be after everything's paged in. If you want to be sure
though, create a RAM disk and stick your data files on there, though then you
need to deal with system crashes on your own.
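
A rough sketch of that setup, assuming Linux with tmpfs; the paths and size are examples, this needs root, and it's really just setup/config (so don't run it blindly):

```shell
# Create an 8G RAM-backed filesystem and copy the data files onto it.
mount -t tmpfs -o size=8g tmpfs /mnt/mysql-ram
cp -a /var/lib/mysql/. /mnt/mysql-ram/
# Point mysqld at the RAM copy (e.g. datadir=/mnt/mysql-ram in my.cnf)
# and restart it. Everything here vanishes on a crash or reboot, so
# replication or periodic snapshots back to disk are on you.
```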

~~~
thwarted
I wonder how creating a raid1 device composed of a real block device and a ram
disk would serve in this regard. Has anyone tested this?

~~~
ars
It would be pointless. Just let linux cache the data and don't worry about it.

If you must, cat all your database files to /dev/null (i.e. read all the data
in them) a few times; then you'll be sure they are in memory.

If you used a ramdisk you'll be paying double: once for the data, and again
for the cache.
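
Something like this (using a throwaway temp directory to stand in for the real MySQL datadir, which is usually /var/lib/mysql):

```shell
# Stand-in for the MySQL datadir so the sketch is self-contained;
# on a real box you'd point this at /var/lib/mysql instead.
DATADIR=$(mktemp -d)
echo "sample table data" > "$DATADIR/users.MYD"

# Read every file once so Linux pulls it into the page cache.
find "$DATADIR" -type f -exec cat {} + > /dev/null
echo "warmed $(find "$DATADIR" -type f | wc -l) file(s)"
```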

~~~
thwarted
Raid1 idea was a suggestion of how to deal with system crashes, not to
populate the buffer cache.

------
mdasen
So, I'm just back from a party and I'm still a little tipsy, but here goes...

It's not always about being faster. Sometimes, it's about being more scalable.
If I gave you a computer twice as fast as mine and told you to sort a million
things with an O(n^2) algorithm, it would take you a lot longer than it takes
me on the slower machine using an O(n log n) algorithm.
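
The back-of-the-envelope arithmetic behind that claim (the 2x speedup and n = one million are from the comment above; the "operation counts" are the raw big-O terms, ignoring constants):

```python
import math

n = 1_000_000
quadratic_ops = n ** 2        # O(n^2) sort, on the 2x-faster machine
nlogn_ops = n * math.log2(n)  # O(n log n) sort, on the slower machine

# Even after halving the fast machine's work, it's still way behind.
ratio = (quadratic_ops / 2) / nlogn_ops
print(f"{ratio:,.0f}x more work despite the faster hardware")
```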

This gets us into the difference between MySQL and memcached. With MySQL I can
say "give me the people where the first name is 'Brian' and the last name is
not 'Applestein'". Awesome! Convenient, useful, slow :-(. With memcached, I can
only say "give me key 'Per:48'". Um, not that useful or convenient, but fast as
hell.

Why? What memcached is doing is creating a hash table. Basically, it knows
where something is going to be stored because of its key. So, 'Per:48' will
always be in place X because that's where it hashes to. When I want to
retrieve that key, it doesn't have to search. It can just go! That means it
runs in constant O(1) time. MySQL is going to use an index (if you're lucky).
That index keeps the items sorted in a specific order, and then MySQL can
retrieve an item in log n time. log n > 1 for almost all values (sorry, I
don't deal with n below, like, 10, since it's never relevant in my work;
scaling to 3 items doesn't matter).
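
A toy illustration of that gap, using a plain dict as the hash table and a sorted list with binary search standing in for a B-tree index (the key names are made up):

```python
import bisect
import math

n = 1_000_000
keys = [f"Per:{i}" for i in range(n)]

# Hash table: the key's hash says exactly where to look. One probe.
table = {k: f"row {i}" for i, k in enumerate(keys)}
value = table["Per:48"]

# Index: binary search over sorted keys, ~log2(n) comparisons.
sorted_keys = sorted(keys)
pos = bisect.bisect_left(sorted_keys, "Per:48")
assert sorted_keys[pos] == "Per:48"

print(f"hash probes: ~1, index comparisons: ~{math.ceil(math.log2(n))}")
```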

So, even if MySQL kept everything in memory, it wouldn't be as scalable as
memcached. memcached is able to do an O(1) hash lookup. MySQL (with an index
scan) is log n. This brings me to the query cache. While I don't pretend to
know all of the internals of these technologies, I'm guessing that MySQL's
query cache is basically like memcached: another key -> value store, where
MySQL stores the result set as the value and the query text as the key.

The problem with the query cache is that it becomes invalid the second
something changes in a table the query touches, thus making it useless for
scale.
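
A sketch of why that hurts: a cache keyed on SQL text can't tell which cached result sets a write actually affects, so it has to throw away everything touching the table (the table and query strings here are invented):

```python
query_cache = {
    "SELECT * FROM people WHERE first='Brian'": [("Brian", "Smith")],
    "SELECT COUNT(*) FROM people": [(1,)],
}

def on_write(table):
    # Any write to a table invalidates every cached result that used it;
    # with everything in one table, that's a full flush.
    for sql in list(query_cache):
        if table in sql:
            del query_cache[sql]

on_write("people")
print(len(query_cache))  # every cached result for the table is gone
```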

So, why does memcached overcome that problem? Well, part of it is that when
you scale, you leave behind those awesome queries that are so neat. You want
all of the friends of "Matt"? Grab the serialized object "Matt" from
memcached. That should have a list of the ids of his friends. Then ask
memcached for the objects matching that list of keys. As you can see,
memcached isn't relying on a query of Matt's friends; maybe you have never
run that query before. BUT, you have put each record into memcached, and
therefore it is available in this context. Where the entire query cache
becomes invalid every time someone writes to the database, you can hack MySQL
to update memcached with the one record that has been updated without
invalidating the things that haven't changed.
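
That pattern, sketched with a plain dict standing in for memcached (the key scheme and records are made up for illustration):

```python
cache = {
    "Per:Matt": {"name": "Matt", "friends": ["Per:48", "Per:91"]},
    "Per:48":   {"name": "Alice"},
    "Per:91":   {"name": "Bob"},
}

def get_friends(key):
    # Two hash lookups instead of a JOIN: fetch Matt, then multi-get
    # the friend keys his record lists.
    person = cache[key]
    return [cache[k] for k in person["friends"]]

# A write touches exactly one key; everything else stays cached.
cache["Per:48"] = {"name": "Alice Applestein"}

print([p["name"] for p in get_friends("Per:Matt")])
```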

If you have questions, you can always respond and I'll try to answer in the
morning. Basically, querying is never infinitely scalable. It's log n. Hash
lookups can scale. While more memory makes a database faster and scales it
higher, it doesn't match a cache like memcached because queries are a more
costly operation than hash lookups. Yeah, it's a pain. You and I probably
won't have to deal with it much since databases are pretty damn efficient and
log n is pretty good. If you become the next Twitter, hire someone who can
explain this comment. Before then, don't worry so much! You'll give yourself
wrinkles!

~~~
lhorn
Yeah. Every time someone tries to make a case for memcached they bring in
Facebook and LiveJournal.

Well, guess what: you aren't Facebook. You don't serve 21,000 requests per
second, and you never will. Go and look at the size of your MySQL tables, then
look at how much RAM your server has, what percentage of it is allocated to
the filesystem cache, and how many real I/O reads/writes your hardware RAID
does per second. Then come back and tell me again why "we need to switch to
memcached".

------
patrickg-zill
I think what he is trying to say is that knowing how your database stores and
accesses the data can help you make better decisions when writing SQL
statements.

The example he gives is that a stored procedure for a large query used to take
almost 1GB, but now it takes only about 1MB because he is "cooperating" with
the way the database slings data around.

