
Put that database in memory - wheels
http://glinden.blogspot.com/2009/11/put-that-database-in-memory.html
======
rm-rf
I've had very good results simulating an in-memory database with conventional
RDBMSes by simply running them with a high ratio of physical memory to data -
on the order of 128GB of memory for a 500GB database.

The advantage is that COTS applications work as-is: you don't need to wait
for researchers to dream up something new or rewrite an application to use
the latest fashion in database technology, and most importantly, poor design
and poor coding can be papered over with fast logical I/Os (in memory).

In conventional databases configured with high memory-to-data ratios, we
don't have traditional I/O bottlenecks on database reads. Hence I've made a
simple decision: it is cheaper to add memory (even expensive memory) than it
is to build a disk I/O subsystem that can handle the equivalent I/O.

Transaction writes are still a potential I/O issue though.

------
gruseom
I just finished reading the paper that Greg is blogging about, by Ousterhout
and a truckload of co-authors. Yes, they advocate keeping all data in RAM and
logging to disk purely for backup. But they also argue that it will take years
before this becomes practical (at least at scale), and that fundamental
research is still needed on how to achieve durability, distribution, and other
factors. I find that surprising. Do such systems really not exist today?

~~~
evgen
Such systems exist (Redis is a great example of the "just keep it in RAM"
idea), but I think what Ousterhout et al. are getting at is that we are just
starting to think through the consequences of moving to RAM-based DBs and
have not really worked out the kinks. Right now, the RAM-based systems I am
aware of do not do much meta-analysis of access patterns or of the data
itself to try to migrate or cluster blobs for maximum performance. We have
started to get better at doing this for data on disk (e.g. column-based
stores, using large disk chunks to cut down on seeks, etc.), but we have only
just started thinking about how things will change when the primary data is
all in RAM. We have a lot of experience with the efficient use of disk
resources, but the world of RAM-resident data is still pretty new, and we are
learning how many of our disk-era assumptions about how things should work
will carry over to the new paradigm.

~~~
antirez
I'm starting to do things like this in Redis. For instance, consider Redis
Lists: POP and PUSH are O(1), and getting the first or last 10 items in
constant time is also possible with LRANGE, but what if the user is often
accessing a "far" range in a very long list? A similar problem affects ZRANGE
and Sorted Sets.

OK, this problem lends itself to access-pattern-based optimization. For
instance, the linked list can have an associated N-element circular buffer of
pointers to far elements, so that when clients keep asking for "LRANGE mylist
1000000 1000010", Redis can jump straight to the N-th node if it's in the
(small) circular buffer.

This is probably just the start, but in general a small cache of "nodes"
pointing at recently hot places in a data structure is a promising strategy
for turning otherwise O(N) access patterns into constant time when they are
very frequent.
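
To make that concrete, here's a minimal C sketch of such a jump cache over a
plain linked list. All the names here (cachedList, listIndexCached,
JUMP_CACHE_SIZE) are illustrative, not actual Redis internals:

    #include <stddef.h>

    /* Illustrative sketch, not Redis source: a tiny circular buffer of
     * (index, node) pairs remembered from recent traversals. */

    typedef struct listNode {
        struct listNode *next;
        void *value;
    } listNode;

    #define JUMP_CACHE_SIZE 8

    typedef struct {
        long index;            /* list position cached here, -1 if empty */
        listNode *node;
    } jumpEntry;

    typedef struct {
        listNode *head;
        long len;
        jumpEntry cache[JUMP_CACHE_SIZE];
        int cache_pos;         /* next slot to overwrite (circular) */
    } cachedList;

    /* Return the node at `index`, starting from the nearest cached
     * position at or before it rather than always walking from head. */
    static listNode *listIndexCached(cachedList *l, long index) {
        listNode *n = l->head;
        long cur = 0;
        int i;

        for (i = 0; i < JUMP_CACHE_SIZE; i++) {
            jumpEntry *e = &l->cache[i];
            if (e->index >= 0 && e->index <= index && e->index > cur) {
                cur = e->index;
                n = e->node;
            }
        }
        while (n != NULL && cur < index) { n = n->next; cur++; }

        /* Remember where we ended up: a repeated far LRANGE now costs
         * an O(1) jump instead of an O(N) walk. */
        if (n != NULL) {
            l->cache[l->cache_pos].index = index;
            l->cache[l->cache_pos].node = n;
            l->cache_pos = (l->cache_pos + 1) % JUMP_CACHE_SIZE;
        }
        return n;
    }

One wrinkle a real implementation has to handle: any push or pop at the head
shifts every index, so cached entries must be invalidated or adjusted on
writes.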

------
swolchok
Um, don't you lose durability (or at least transactional semantics) if you do
that? Are we just assuming that replication is sufficient and not all the in-
memory copies of the data will die at once?

~~~
jeremyw
Sort of. Given a continuum of reliability needs, options for variably
controlling the window of data loss are welcome.

The bottom line is that solid-state components are very reliable. Across
hundreds of systems, I see uptimes measured in years. If I can dial in that
data-loss potential in a robust way, I'll take the five nines and the 100x
write capability.

~~~
antirez
Well, there are many ways to add durability to an in-memory DB. For instance,
Redis 1.1 supports three of them, with different trade-offs between safety
and performance.

1 - snapshotting) This uses the fact that after a fork() the OS applies
copy-on-write semantics. So Redis fork()s, and the child dumps a very compact
snapshot of the in-RAM data to disk. This snapshot is also used in
master-slave replication for the initial synchronization. You can configure
different save points, for instance "save when there are at least 100 changes
and 60 seconds have elapsed", and so forth. (There's a sketch of this after
the list.)

2 - append-only journal) In this mode Redis just appends every command that
modified the DB to a file, so that the log can later be replayed to rebuild
the state of the DB. This is much more durable. These days I'm coding a
command that can rebuild the log in the background, to avoid ending up with a
huge log. The rebuilding process is fully non-blocking, as it uses the same
trick as background snapshotting, that is, the copy-on-write of fork(): Redis
forks and starts rebuilding the append-only log into a different file. The
parent process continues to log to the old file and _accumulates all the new
differences in RAM_. When the child finishes the rebuild, the parent appends
the accumulated differences to the end of the new file and atomically
rename(2)s the new log over the old one. (Also sketched after the list.)

3 - master-slave replication.
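
For the curious, here is a minimal C sketch of the fork()-based snapshotting
in point 1, assuming a hypothetical dump_dataset() serializer; this is not
the actual Redis source:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Hypothetical serializer for the in-memory dataset. */
    extern void dump_dataset(FILE *fp);

    int save_snapshot(const char *path) {
        pid_t pid = fork();
        if (pid == -1) return -1;       /* fork failed */

        if (pid == 0) {
            /* Child: thanks to copy-on-write it sees a frozen,
             * consistent view of the dataset while the parent keeps
             * serving writes. */
            char tmp[256];
            snprintf(tmp, sizeof(tmp), "%s.tmp", path);
            FILE *fp = fopen(tmp, "w");
            if (fp == NULL) _exit(1);
            dump_dataset(fp);
            fclose(fp);
            /* rename(2) is atomic, so readers never observe a
             * half-written snapshot. */
            _exit(rename(tmp, path) == 0 ? 0 : 1);
        }
        /* Parent: returns immediately and keeps processing commands;
         * the child is reaped later (e.g. waitpid with WNOHANG). */
        return 0;
    }

The save points antirez mentions correspond to the `save <seconds> <changes>`
directives in redis.conf, e.g. `save 60 100`.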
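
And a similarly hedged sketch of the background log rewrite in point 2, with
hypothetical helpers (rewrite_compact_log, fixed file names) and most error
handling elided:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Hypothetical: dump the current dataset as a minimal command log. */
    extern void rewrite_compact_log(FILE *fp);

    static char diff_buf[1 << 20];  /* commands arriving during rewrite */
    static size_t diff_len = 0;
    static pid_t rewrite_child = -1;

    void start_log_rewrite(void) {
        rewrite_child = fork();
        if (rewrite_child == 0) {
            /* Child: copy-on-write gives it a frozen dataset. */
            FILE *fp = fopen("appendonly.tmp", "w");
            if (fp == NULL) _exit(1);
            rewrite_compact_log(fp);
            fclose(fp);
            _exit(0);
        }
        diff_len = 0;  /* parent starts accumulating new writes in RAM */
    }

    /* Called for every write command while the rewrite runs: keep
     * logging to the old file AND remember the command for the new one. */
    void feed_command(FILE *old_log, const char *cmd, size_t len) {
        fwrite(cmd, 1, len, old_log);
        if (rewrite_child != -1 && diff_len + len <= sizeof(diff_buf)) {
            memcpy(diff_buf + diff_len, cmd, len);
            diff_len += len;
        }
    }

    /* When the child exits: append the accumulated diffs to the new
     * file, then atomically swap it in with rename(2). */
    void finish_log_rewrite(void) {
        FILE *fp = fopen("appendonly.tmp", "a");
        if (fp != NULL) {
            fwrite(diff_buf, 1, diff_len, fp);
            fclose(fp);
            rename("appendonly.tmp", "appendonly.log");
        }
        rewrite_child = -1;
        diff_len = 0;
    }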

~~~
jeremyw
That's exactly the continuum I'm referring to.

------
jbert
The old QUIPU X.500 directory originally used a similar approach (load into
RAM at startup; writes go to RAM and disk; reads and searches are served from
RAM).

You get durability and run-time performance, but there can be appreciable
downtime during the startup phase.

------
forensic
It's crazy that this isn't standard practice already. I guess it just shows
how much momentum an old paradigm can have.

~~~
AndrewDucker
Cost/Benefit.

Buying several TB of RAM is not cheap...

~~~
neilc
Also, there is a _lot_ of cold data out there. Storing all of it in RAM just
doesn't make sense right now, given the upfront cost plus the ongoing expense
(energy).

