

Why I like Redis - simonw
http://simonwillison.net/2009/Oct/22/redis/

======
antirez
edit: all this is based on retwis.antirez.com memory usage.

Ok, just did some math. In order to run Twitter using just Redis 1.001, without
using any new features that may allow for memory savings, and guessing that
Twitter currently holds 4,000,000,000 tweets (assuming they save the
full history for all users, and that the recent 32 bit overflow means they
roughly have 4 billion messages), 30 128 GB Linux boxes are needed.

Is this crazy? Honestly I don't know, as I don't know how many servers they
currently use for the DB backend.

Btw the whole point is, IMHO, that many times you don't need to keep the full
dataset in Redis. For instance in Twitter only recent messages are accessed
frequently, together with user data, so it's probably a good idea to keep only
the latest N messages of every user in Redis (with background jobs
incrementally moving old messages to disk), and keep all the rest in MySQL or
another on-disk solution suited to accessing stuff by id.

So when you want to get a message from Redis, and from time to time get a NULL
accessing message:<id>, you can run the same query against MySQL to get the
data. Something like this:

    
    
        def getMessageById(id)
            # Try Redis first; on a miss (the key was migrated off the
            # fast path) fall back to MySQL.
            m = redis.get("message:#{id}")
            m = getMessageFromMySQL(id) if !m
            return m
        end
    

In this context it is very simple to move old messages from a Redis server to
a MySQL server: since messages are in a Redis list, it's possible to RPOP the
old elements whenever LLEN (list length) reports that a user has more tweets
than we want to keep in the "fast path".
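That RPOP/LLEN trimming can be sketched like this. The key name
`uid:<id>:posts`, the `FAST_PATH_SIZE` constant and the `archive` sink are
all invented for illustration, and a tiny in-memory class stands in for the
Redis list commands so the sketch runs without a server:

```ruby
# In-memory stand-in for the three Redis list commands used below,
# so the sketch runs without a real server.
class FakeRedisList
  def initialize; @lists = Hash.new { |h, k| h[k] = [] }; end
  def lpush(key, val); @lists[key].unshift(val); end   # newest at the head
  def llen(key);  @lists[key].length; end
  def rpop(key);  @lists[key].pop; end                 # oldest from the tail
end

FAST_PATH_SIZE = 3  # keep only the latest N messages per user in Redis

# If a user has more tweets than we want on the fast path, RPOP the
# oldest ones and hand them to `archive` (standing in for MySQL).
def trim_timeline(redis, user_id, archive)
  key = "uid:#{user_id}:posts"
  archive << redis.rpop(key) while redis.llen(key) > FAST_PATH_SIZE
end
```

Run against a five-message timeline, this moves the two oldest messages to
the archive and leaves three on the fast path.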

Also note that Redis supports expires on keys. So old messages fetched from
MySQL can be stored back as expiring keys, to avoid a message that gets
linked from some front page stressing MySQL too much.
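A hedged sketch of that pattern: on a Redis miss, fetch from MySQL and write
the row back as an expiring key. The TTL value, key names and the in-memory
stub below are illustrative, not a real client API:

```ruby
# Minimal stand-in for GET/SET/EXPIRE with wall-clock TTLs, so the
# sketch runs without a server.
class FakeRedisKV
  def initialize; @data = {}; @deadline = {}; end
  def set(k, v); @data[k] = v; end
  def expire(k, ttl); @deadline[k] = Time.now + ttl; end
  def get(k)
    return nil if @deadline[k] && Time.now >= @deadline[k]
    @data[k]
  end
end

CACHE_TTL = 3600  # illustrative: cache MySQL-fetched rows for one hour

# On a miss, fall back to MySQL and cache the row as an expiring key,
# so a message linked from some front page can't hammer MySQL.
def get_message(redis, id, mysql)
  m = redis.get("message:#{id}")
  return m if m
  m = mysql[id]  # stands in for the SQL lookup by primary key
  if m
    redis.set("message:#{id}", m)
    redis.expire("message:#{id}", CACHE_TTL)
  end
  m
end
```

The expiry means a hot message costs MySQL one query per hour instead of one
per page view.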

This is just to give a feeling about scaling a pretty big service using Redis
as main DB without caching layers.

~~~
jbellis
There's a big difference between sharding across 30 redis nodes, where your
application has to be shard-aware, and your ops team has to manually handle
failover, etc, and using a database that looks to the app like a single
system. In other words redis's story here isn't really any better than
sharding a relational db, and everyone knows how much that sucks.

So saying on the home page that "Redis can do [sharding] like any other key-
value DB, basically it's up to the client library" is inaccurate. Distributed
key-oriented databases like cassandra, voldemort, dynomite, riak handle all of
that so it's totally invisible to your app, including (at least in Cassandra's
case, and I think dynomite) adding nodes to the cluster.

~~~
antirez
Hello jbellis,

it's really a matter of design. I like the idea that the Redis servers are
dumb, and that it's up to the client logic to handle sharding. For instance
the Ruby client supports this feature in a way that is mostly transparent to
the application.

In traditional databases sharding is hard not because they are not good at it
from the point of view of "feature set" (as in Redis vs Cassandra), but
because the data model itself is not right for working with data split across
different servers. If you use an SQL DB just with tables accessed by IDs and
without queries more complex than lookups by primary key, then sharding starts
to become simpler.
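A minimal sketch of that client-side approach (the node list and key names
are made up): hash the key, take it modulo the number of nodes, and every
client configured with the same list routes the same key to the same dumb
server, with no dispatch node anywhere:

```ruby
require "zlib"  # for a cheap, stable hash (CRC32)

# Hypothetical node list; in a real deployment every client would be
# configured with the same one.
NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]

# Pick the shard for a key. The mapping is computed independently, and
# identically, by every client.
def node_for(key)
  NODES[Zlib.crc32(key) % NODES.length]
end
```

The catch jbellis points at is visible here too: growing NODES remaps most
keys under a plain modulo, which is why client libraries tend to use
consistent hashing instead.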

Even if Redis ever gets server-side sharding, I'll code it as a separate
process that handles this, instead of putting the logic inside Redis itself.

Btw how is it possible to build something really horizontally scalable
without using client-level sharding?

What you want is to have N web servers and M databases, without any single-
dispatch-node. At least this is how I'm used to thinking about it.

Without any kind of client help I guess there is some kind of master node
handling the dispatching of requests. Maybe I missed the point, please give me
some hint.

------
andr
Key-value stores develop fairly quickly. We need an up to date comparison
between their features and metrics (performance, code coverage, bugs filed per
month, etc.).

~~~
russss
We made a start on this at the NoSQL meetup, the spreadsheet is here:

[http://spreadsheets.google.com/ccc?key=0Ale_YaCwKEUVclVFVFlr...](http://spreadsheets.google.com/ccc?key=0Ale_YaCwKEUVclVFVFlrUWt5aWhQaGQ0OXVCMUl4Vmc&hl=en)

It covers a bit more than KV stores - it's basically all of the non-relational
data stores.

If anyone wants access to update it, let me know.

~~~
mahmud
Who do I contact about reviewing Lisp K/V stores? I would double the content
of your list and give exposure to some Lisp gems you haven't heard of :-)

Yes, Lisp stores with C API :-D

AllegroCache, Elephant and ManarDB.

~~~
russss
Never heard of any of them, but then I don't code Lisp ;).

I have to say that I'm only considering free, open source solutions in that
spreadsheet.

~~~
mahmud
~3 are MIT licensed.

I have brewed up my own thing as well. Shoot me an email will ya? or I can get
yours off of the .uk site in your profile.

Cheers!

------
antirez
This seems like the right thread to tell you I'm working on a new data type
for Redis, that is, ordered sets: <http://pastie.org/664270>

Any feedback is really welcome. For instance, do you know of better data
structures to do the work? I'm using a red-black tree and a hash table. The
pastie above documents the specification more or less. Thanks
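To make the two-structure idea concrete, here is a toy sketch: a hash gives
O(1) member-to-score lookups, while an ordered index answers range queries.
A sorted Array stands in for the red-black tree, so inserts here are O(N)
where the real tree would be O(log N); the command names mimic the proposed
API but the class itself is invented:

```ruby
# Toy ordered set: hash for member => score, sorted array as the index.
class ToySortedSet
  def initialize
    @score = {}   # member => score
    @index = []   # [score, member] pairs, kept sorted by score
  end

  def zadd(member, score)
    zrem(member) if @score.key?(member)   # re-adding updates the score
    @score[member] = score
    i = @index.bsearch_index { |s, _| s >= score } || @index.length
    @index.insert(i, [score, member])
  end

  def zrem(member)
    s = @score.delete(member)
    @index.delete([s, member]) if s
  end

  def zscore(member); @score[member]; end
  def zrange(a, b); @index[a..b].map { |_, m| m }; end
end
```

Ties and lexicographic ordering of equal-score members are ignored here;
they matter in the real specification.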

~~~
kingkilr
Why not use a hashtable with a linked list running through it (basically the
python ordered dictionary class without the mapping). I believe ruby 1.9 uses
these for its default hash type.

~~~
antirez
Inserting in order into a linked list is O(N), or did I miss something? Maybe
they use some more advanced kind of linked list where it's possible to run a
binary search (like skip lists or similar).

~~~
kingkilr
Sorry this assumes that you wanted consistent ordering, not necessarily that
you want to be able to create arbitrary ordering for it. So you'd have insert
ordering.

------
papaf
I like the idea of Redis but it has the obvious limitation that it can only be
used when the data you're working with will absolutely never grow bigger than
available memory. This not only reduces the problems it can be applied to but
makes the Twitter example given on the Redis website very unrealistic.

~~~
antirez
Hello papaf. If you do the math on the number of messages you can hold using
four 64 GB Linux boxes, you'll find that Redis is a perfectly viable (and
very scalable) solution in a number of real-world scenarios.

Note that with this setup you don't need N additional memcached servers, so
all the $ can be spent on RAM for the DBs.

In my personal experience with high traffic web services, what I discovered
is that once the on-disk dataset starts to get bigger than the available RAM,
performance is horrible anyway. Also note that when there is a lot of data to
index, the right thing to do can be the following:

keep all the metadata in Redis, for fast access. Keep the "bulk" data in
MySQL, just an incremental ID + blob field. Perform queries against both.
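A hypothetical sketch of that split, with plain Hashes stubbing both stores
(the key names and helper functions are invented): hot metadata goes into
Redis, the body goes into MySQL as id -> blob, and a read touches both:

```ruby
# `redis_meta` stands in for Redis keys like "message:<id>:meta";
# `mysql_blob` stands in for a MySQL table (id INT, body BLOB).
def store_message(redis_meta, mysql_blob, id, author, body)
  redis_meta["message:#{id}:meta"] = { author: author, time: Time.now.to_i }
  mysql_blob[id] = body
end

# A read hits Redis for the small, hot metadata and MySQL for the bulk.
def load_message(redis_meta, mysql_blob, id)
  meta = redis_meta["message:#{id}:meta"]
  meta && meta.merge(body: mysql_blob[id])
end
```

The point of the split is that the memory-hungry part (the blobs) never
competes for RAM with the metadata the site actually queries.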

It is absolutely possible to add some kind of aging algorithm to Redis that
swaps not-recently-used keys to disk, in order to allow for bigger-than-RAM
datasets, but most people dealing with high traffic sites told me "don't do
it, it's useless, because we have access patterns that are more or less
evenly distributed among the key space".

~~~
simonw
Just out of interest, do you know what the largest Redis instance in
production is? It would be fascinating to hear a case study about a real-life
16GB+ store.

~~~
antirez
I don't know if he ever continued to use it in production, but there was a
guy experimenting with a number of 128 GB Redis servers :)

This guy found most of the bugs with huge datasets, and thanks to his work
Redis 1.01 was tested in extreme conditions.

------
henriklied
Redis is great. I use it in quite a few of my spare time projects.

The versatility is my main reason for loving it. From working on a large
dataset without waiting for it to finish loading (…as mr. Willison explained),
to storing object properties in a webapp environment, to simple persistent
object queues, etc.

It's a very nice tool to have in the box.

------
huyegn
I LOVE redis. I'm surprised there is so much talk about Tokyo Cabinet et al,
but no one says anything about redis. It's dead simple to set up. The Python
libraries are super easy to work with. For simple persistence and queue-like
abilities there's nothing that can match it.

------
ntoshev
Does anyone know what is the concurrency control used by Redis? I couldn't
find anything related to transaction isolation in the docs.

~~~
simonw
Many of the operations are documented as being atomic - you can get a
surprisingly long way using atomically incrementing counters and atomic set
operations. If you search through the Redis mailing list there are some
interesting discussions around lock-free algorithms, CAS and potentially
adding a LOCK command. There's also a recently committed SETNX command, which
atomically sets a key only if it doesn't already exist - a useful building
block for locks.
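For example, the lock pattern SETNX ("SET if Not eXists") makes possible:
since the command is atomic, whichever client creates the key first holds the
lock. The stub and helper below are illustrative, not a production lock (a
real one also needs a timeout so a crashed holder can't wedge everyone):

```ruby
# In-memory stand-in for SETNX/DEL, so the sketch runs without a server.
class FakeRedisLock
  def initialize; @data = {}; end
  def setnx(key, val)
    return false if @data.key?(key)  # atomic on a real single-threaded Redis
    @data[key] = val
    true
  end
  def del(key); @data.delete(key); end
end

# Run the block only if we win the lock; returns false if someone
# else already holds it.
def with_lock(redis, name)
  return false unless redis.setnx("lock:#{name}", "1")
  begin
    yield
    true
  ensure
    redis.del("lock:#{name}")
  end
end
```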

~~~
ntoshev
Lock-free algorithms can be very fast, but they are also hard to implement. It
seems Redis is going to get some lock-based pessimistic concurrency control
eventually:

[http://groups.google.com/group/redis-
db/browse_thread/thread...](http://groups.google.com/group/redis-
db/browse_thread/thread/6b8ccada4d4b79fa/cfccd6ef0409ea79)

~~~
antirez
Indeed there are plans for LOCK / TRYLOCK and UNLOCK. It's just a matter of
time, because from the feedback from the community it seems like sorted sets
and hashes are the priority, together with strategies to improve memory
usage.

For instance Redis edge (the latest git version) is able to save a lot of RAM
(more or less 25%) by internally encoding objects that can be represented as
integers in a special way, but there is still a lot of work to do in this
regard.

------
zacharypinter
Are there any talks/videos about redis available online?

This was the only one I could find:
[http://mwrc2009.confreaks.com/13-mar-2009-19-24-redis-key-
va...](http://mwrc2009.confreaks.com/13-mar-2009-19-24-redis-key-value-
nirvana-ezra-zygmuntowicz.html)

~~~
antirez
Today is the NoSQL event in Berlin, and they plan to record videos of the
talks. (<http://nosqlberlin.de/>)

Unfortunately my spoken English is even worse than my written one, so go
figure what can happen if I try to make Redis videos... :)

Edit: another video by Bob Ippolito: <http://blip.tv/file/1949416/>

~~~
dualogy
Dang I missed this simply because I was busy developing =/

