

Redis: new disk storage to replace VM - DennisP
http://groups.google.com/group/redis-db/browse_thread/thread/d444bc786689bde9?pli=1

======
ShabbyDoo
So, as a non-Redis user, am I correct in my understanding that an admin may
define a maximum delay before write-behind persistence must be attempted? For
applications which can afford to lose a few seconds of data in the case of a
failure, this seems like a great way to improve latency.

More generally, why are there no MySQL (or whatever) engines which offer
similar capabilities? Wouldn't it be possible for DB clients to "commit"
transactions to memory and then have them flushed to disk asynchronously with
all other ACI(d) properties maintained? There are many applications which can
survive a few seconds of data loss but need transactional properties to avoid
data corruption.

~~~
justinsb
All modern relational databases implement exactly what you've described...
transactions are written to a transaction log, which is flushed to disk every
few seconds (or whenever you want to guarantee that a txn is durable). Changes
to the actual data need not be persisted in a timely manner, because in the
event of a crash the data is recovered from the transaction log.
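A minimal sketch of that write-ahead-log idea (illustrative only, not any real engine's internals; keys/values here may not contain `=` or newlines):

```python
import os

class TinyWAL:
    """Toy key/value store: every write is appended to a log before the
    in-memory state changes, so after a crash the state is rebuilt by
    replaying the log. The 'real' data structure is never synced itself."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):
            # Recovery: replay whatever reached the log before the crash.
            with open(log_path) as f:
                for line in f:
                    key, _, value = line.rstrip("\n").partition("=")
                    self.data[key] = value
        self.log = open(log_path, "a")

    def set(self, key, value, durable=False):
        self.log.write(f"{key}={value}\n")
        if durable:
            # 'fsync on every commit' mode: the write survives a crash.
            self.log.flush()
            os.fsync(self.log.fileno())
        self.data[key] = value  # in-memory data is updated lazily

    def get(self, key):
        return self.data.get(key)
```

With `durable=False` the process gets the low-latency behavior described above: the commit is acknowledged once it is in the OS buffer, and a crash loses only the un-fsynced tail of the log.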

~~~
ShabbyDoo
"in the event of a crash the data is recovered from the transaction log"

Doesn't this statement imply that a disk hit occurred before a client is told
that a transaction committed (vs. being told that a unique key constraint was
violated, etc.)? I'm talking about a more extreme form where I don't have to
wait multiple milliseconds for a disk platter to spin around before continuing
with my processing.

~~~
justinsb
For full durability, you configure/ask the DB to fsync the transaction log
before reporting the transaction committed to the client.

Most people can tolerate a few seconds of data loss, so a sensible config will
only fsync every few seconds and will report a transaction committed before it
hits the disk. If the DB crashes, you lose those recent transactions in this
mode.

All (?) relational databases let you choose which fsync style you want. Most
(?) ship with this setting set to the conservative 'fsync on every commit'
mode. Once you configure a SQL database with a more relaxed setting you get a
database that performs much more similarly to NoSQL. But some people need full
durability - or want it for particular transactions. In that mode, you're
basically bound by the number of IOPS your disk can do, but are guaranteed
full durability.
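For concreteness, these are the relevant knobs in the major open-source databases (real setting names, though defaults and exact semantics vary by version):

```
# PostgreSQL (postgresql.conf): report commit before the WAL is
# fsync'd; a crash can lose the most recent commits.
synchronous_commit = off

# MySQL/InnoDB (my.cnf): flush the log to disk roughly once per
# second instead of on every commit (1 is the fully durable default).
innodb_flush_log_at_trx_commit = 2

# Redis (redis.conf), for comparison: fsync the append-only file
# once per second rather than on every write.
appendfsync everysec
```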

~~~
sokoloff
Also note that you can get the best of both worlds with a battery backed RAM
cache contained in a SAN storage backend, such that the storage subsystem can
be extremely low latency and yet "guarantee" that what it has accepted will
get persisted to a disk for durability. (Predictably, this isn't cheap, but
it's very effective.)

Your DB host tells the SAN to write this block, the SAN ingests the write to
local RAM and reports "got it" to the DB server in sub-millisecond. The SAN
will then dump that data to actual underlying discs over the next (hand-wavy)
short timeframe, but from the DB's perspective, it got a durable fsync in
under a millisecond.

------
snissn
sounds like he's trying to implement something similar to a 'dbm' (tokyo/kyoto
being current/modern implementations), which are k/v stores that intelligently
write to disk.

Presumably the keyset can still be in ram (so that *foo*bar* searches on keys
can work?), but the dbm model is a fairly efficient key/val implementation, and
tokyo/kyoto are fast and fairly smart about writing to disk, although I haven't
explicitly tested their limitations as you approach ram limits in production.

Not sure what tradeoffs are in mind, but at least a feature/perf comparison of
kyoto against diskstore as an internal backend for redis would be interesting.
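To illustrate the dbm model being referenced, Python's stdlib `dbm` module wraps this same family of libraries (the backend actually chosen varies by platform):

```python
import dbm
import os
import tempfile

# A throwaway path for the example database file.
path = os.path.join(tempfile.mkdtemp(), "example.db")

# Open (or create) an on-disk key/value store; keys and values are bytes.
with dbm.open(path, "c") as db:
    db[b"user:1"] = b"alice"
    db[b"user:2"] = b"bob"

# Reopen later: the data lives on disk, not (only) in RAM.
with dbm.open(path, "r") as db:
    print(db[b"user:1"])  # b'alice'
```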

~~~
pashields
I don't think the keyset will be in ram: "Redis will never use more RAM, even
if we have 2 MB of max memory and 1 billion of keys. This works since now we
don't need to take keys in memory."

I haven't looked at the current code to see if there is a way to favor keeping
the keys in memory, but it would seem that wildcard searches here can/will be
disk-bound.

~~~
snissn
tokyo supports a b-tree index on the keys written to disk which would optimize
blah* queries but not *foo*, and then writes become O(log N)
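A small sketch of why a sorted index helps prefix queries but not infix patterns (plain Python, with a sorted list standing in for the on-disk b-tree):

```python
import bisect

# Sorted key list as a stand-in for a B-tree: a prefix query maps to
# one contiguous range, whose start is found in O(log N).
keys = sorted([b"apple", b"blah1", b"blah2", b"blahx", b"cherry"])

def prefix_scan(keys, prefix):
    # Locate the first key >= prefix, then walk while the prefix holds.
    i = bisect.bisect_left(keys, prefix)
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(keys[i])
        i += 1
    return out

print(prefix_scan(keys, b"blah"))   # [b'blah1', b'blah2', b'blahx']

# An infix pattern like *err* has no usable lower bound, so it
# degrades to a full scan of every key:
print([k for k in keys if b"err" in k])  # [b'cherry']
```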

------
nas
Are people aware of Python object databases like ZODB and Durus? I'm not very
familiar with Redis. However, the model used by ZODB and Durus (on disk
durable storage, in memory client caches) can be extremely efficient depending
on workloads.

------
IgorPartola
I am curious as to what people who use Redis in production think of these
types of changes. Is this alarming or hopeful? Seems like a rather large shift
in trade-offs and a whole new set of tuning parameters to play with.

~~~
jashkenas
Alarming, I'd imagine:

<https://groups.google.com/forum/#!topic/redis-db/ZTSm-1w-6AQ>

To quote from the end of the post:

    
    
        so, to sum this up -- after a while, you are stuck with an 
        in-memory database that you cannot backup, cannot replicate to 
        a standby machine, and that will eventually consume all memory 
        and crash (if it does not crash earlier).
    
        conclusion: redis with vm enabled is pretty much unusable, and we
        would really not recommend it to anybody else for production use
        at the moment. (at least not as a database, it might work better
        as a cache.)

~~~
madlep
Actually, that is talking about the existing virtual-memory (VM)
implementation, which swaps data in and out to disk, and doesn't work so
great.

The change being talked about here is all about replacing that exact flakey VM
with a more solid disk-backed approach.

~~~
jashkenas
I'm sorry, but that's what I meant. I'd be more alarmed than enticed to
discover that the current implementation of datasets-larger-than-RAM for my
chosen database was considered "flakey", and was going to be swapped out for a
green-field approach in the next release.

For reference, this is the blog post that introduced the VM idea:
<http://antirez.com/post/redis-virtual-memory-story.html>

~~~
antirez
> I'm sorry, but that's what I meant. I'd be more alarmed than enticed to
> discover that the current implementation of datasets-larger-than-RAM for my
> chosen database was considered "flakey", and was going to be swapped out for
> a green-field approach in the next release.

As Redis is mainly an in-memory DB, datasets larger than RAM were never our
first goal, and there was even the idea of dropping support for this use case
entirely. I think that what matters for most users is that the default mode of
operation works great, and that for an alternative mode of operation the
developers are not dogmatic and don't fear dropping what is not optimal to
replace it with something better. In many other contexts this would be regarded
as bad marketing and not done at all, but I try to follow a scientific way of
making progress, and I accept that I and the other developers are not perfect
and need to make mistakes and improve the design again and again ;)

I like the Redis data model and I think this is our biggest value, and we need
to find different underlying implementations for different use cases, and keep
trying to provide more speed, better durability, better replication, and so
forth, ad libitum.

------
dchest
See also: <http://news.ycombinator.com/item?id=2053594>

