

Redis 1.1 supports durability via append only file - antirez
http://code.google.com/p/redis/wiki/AppendOnlyFileHowto

======
antirez
I want to share this with you as I think it's a game changer for everybody
who looked at Redis, didn't like the snapshotting because their data is
important, and discarded it as a viable alternative. I also hope the article
is somewhat informative as a side effect.
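
For reference, turning it on is a two-line change in redis.conf (directive
names as described in the howto linked above):

    # redis.conf: log every write operation to an append only file
    appendonly yes

    # how often to fsync() the log: always, everysec or no
    appendfsync everysec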

 _p.s. version 1.1 is currently in beta, the feature is available in Git, and
a stable version (rc1) will be released at the end of this year._

~~~
pierrefar
Will the client libraries be updated in tandem with the 1.1 release?
Especially the PHP extension as it seems to be missing SETNX as far as I can
tell.

~~~
antirez
Hello pierrefar. Yes, the most important client libs will support (many
already do) the 1.1 sorted sets, the MSET and MSETNX commands (multiple keys
set atomically in a single command), and the other features once 1.1 is
released.
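
In a nutshell (sketched here with the Python client rather than with PHP,
assuming a server on localhost):

    import redis

    r = redis.Redis(host='localhost', port=6379)

    # MSET: set several keys atomically, in a single round trip
    r.mset({'a': '1', 'b': '2'})

    # MSETNX: like MSET, but does nothing unless ALL the keys are new
    r.msetnx({'b': '3', 'c': '4'})  # False: 'b' exists, nothing is set
    r.msetnx({'c': '4', 'd': '5'})  # True: neither key existed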

About PHP, I have good news: there are two new, fully featured, high quality
implementations of the Redis protocol for PHP:

Predis (<http://github.com/nrk/predis/>) and Rediska
(<http://rediska.geometria-lab.net/>). Both with support for consistent
hashing.

Also the PHP C module got two new developers and is now much more stable,
supporting the full 1.0 protocol AFAIK: <http://github.com/owlient/phpredis>

So the client libs arena is fortunately getting better and better. There are
other good quality client libs for Ruby, Java, and Go. The Python one is
getting better with time too.

~~~
pierrefar
Thanks.

I'm actually writing a CMS using Redis and absolutely need SETNX (which one
current PHP-only lib supports well). I would like to move to a compiled PHP
module to get a nice performance boost.
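
(For context, the classic reason a CMS needs SETNX is simple locking; a rough
Python sketch of the idea, deliberately naive about a crash between the two
steps:)

    import time
    import redis

    r = redis.Redis()

    def acquire_lock(name, timeout=10):
        # SETNX is atomic: exactly one client gets to create the key
        while not r.setnx('lock:' + name, 'held'):
            time.sleep(0.01)  # someone else holds it, retry shortly
        r.expire('lock:' + name, timeout)  # so a dead holder can't wedge us

    def release_lock(name):
        r.delete('lock:' + name)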

Coupled with this new log-based persistence, I'll be more comfortable with
the whole setup.

If you want to talk more, my email is hello at (my username).com.

------
mahmud
And the fun continues! Gonna upgrade to the latest and start supporting the
1.1 API in cl-redis tomorrow. Win32 builds would be greatly appreciated. I
contacted the OHM developer yesterday in #ohm on freenode; we'll have a chat
and see if we can make that into a language-independent protocol (I have been
making the same thing for Common Lisp for some time now.)

~~~
antirez
Great, you can find me on #redis (freenode) as well if I can help. About
Win32: Redis compiles without problems on Cygwin, but at least two tests are
failing (it's about INCR with 64 bit values; maybe strtol() isn't handling
them correctly?).
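
(A quick way to reproduce the failing case, sketched with the Python client
against a Cygwin build:)

    import redis

    r = redis.Redis()
    r.set('x', 123456789012345)  # a value well above 32 bits
    print(r.incr('x'))  # expect 123456789012346; a 32-bit strtol()
                        # parse would mangle the stored value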

------
julien
Hmm, that Redis thingy is really getting more and more exciting. We can't
wait to implement it at Superfeedr :)

~~~
mahmud
We use it as a log server, a message queue, and as a mailbox cache (inter-user
messages without attachments.)

It's a beast.
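
The queue part is just a Redis list: producers LPUSH, consumers RPOP (a
minimal Python sketch; there's no blocking pop in this Redis version yet, so
the consumer polls):

    import time
    import redis

    r = redis.Redis()

    # producer: push a job onto the left end of a list
    r.lpush('queue:jobs', 'deliver message 42')

    # consumer: pop from the right end, polling while the list is empty
    while True:
        job = r.rpop('queue:jobs')
        if job is None:
            time.sleep(0.1)  # nothing queued, back off briefly
            continue
        print('processing:', job)
        break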

------
moe
Sad to see that a redis database is _still_ limited by the available RAM. This
seems like a bigger showstopper than durability...

~~~
antirez
Hello moe,

actually _a lot_ of people overestimate the amount of data they have. Anyway,
there are a few solutions:

* given that RAM is cheap, use a Linux box with 32 or 64 GB of RAM. After all, with Redis's performance you need a single box where other solutions need more than one server. It's hard to saturate 150,000 writes or reads per second even with a lot of users.

* to split data across different servers (using application-level partitioning or consistent hashing). There's a sketch at the end of this comment.

* to wait for Redis VM implementation (virtual memory).

Basically the plan is to implement something like what operating systems
already do with memory pages. More information (and why we can't just let the
OS do the work for us) is in the Redis FAQ at
<http://code.google.com/p/redis/wiki/FAQ>

Search for "Do you plan to implement Virtual Memory in Redis? Why don't just
let the Operating System handle it for you?".
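
Back to the second option for a moment: the core of client-side consistent
hashing is small (a Python sketch with hypothetical node names; Predis and
Rediska ship real implementations):

    import bisect
    import hashlib

    class HashRing(object):
        """Map keys to nodes; adding/removing a node only remaps ~1/N keys."""

        def __init__(self, nodes, replicas=64):
            # place several virtual points on the ring for each node
            self.ring = sorted((self._hash('%s:%d' % (node, i)), node)
                               for node in nodes for i in range(replicas))
            self.points = [h for h, _ in self.ring]

        def _hash(self, s):
            return int(hashlib.md5(s.encode()).hexdigest(), 16)

        def node_for(self, key):
            # first ring point clockwise from the key's hash
            i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
            return self.ring[i][1]

    ring = HashRing(['redis-a:6379', 'redis-b:6379', 'redis-c:6379'])
    print(ring.node_for('user:1000'))  # a key always maps to the same node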

~~~
moe
Yes, I think we had this discussion before but I'll repeat my concerns:

 _given that RAM is cheap, use a Linux box with 32 or 64 GB of RAM. After
all, with Redis's performance you need a single box where other solutions need
more than one server. It's hard to saturate 150,000 writes or reads per second
even with a lot of users._

For many apps even 64G is small. The problem is less about the height of the
ceiling than about the fact that a ceiling exists at all - and that redis
effectively stops working when it's reached. I agree that there are
applications where redis is a perfect fit, but for many others this limit is a
serious problem. Also note that many projects, especially those just starting
out, simply can't afford to start with 64G servers. 4-8G is a more realistic
scale to assume and _that_ will fill up faster than anyone likes.

FWIW, the EC2 high-memory instance (64G) you cite in your FAQ goes for a
swift $1728 USD/month (about $2.40/hour over a ~720-hour month). Translated to
English that means: redis is pretty much out of the question for cloud-based
apps, because RAM is expensive in the cloud and the normal instance types
(<$300 USD/month) top out at 4G.

 _to split data across different servers (using application-level partitioning
or consistent hashing)._

Sharding is always worthwhile for scaling but it's a fairly delicate subject
(esp. the rebalancing after add/remove of a shard and queries spanning
multiple shards) and I haven't seen a shrinkwrapped solution for redis, yet.

So, while an option, most people will probably rather use a competing product
(e.g. MongoDB) before opening that can of worms.

 _to wait for Redis VM implementation (virtual memory)._

Yup, that would be me. Once the RAM-limitation is gone redis will suddenly
become very interesting to me.

~~~
antirez
_For many apps even 64G is small. The problem is less about the height of the
ceiling than about the fact that a ceiling exists at all - and that redis
effectively stops working when it's reached. I agree that there are
applications where redis is a perfect fit, but for many others this limit is a
serious problem. Also note that many projects, especially those just starting
out, simply can't afford to start with 64G servers. 4-8G is a more realistic
scale to assume and that will fill up faster than anyone likes._

Yes, I understand these concerns, and this is why I'm trying to address them
with virtual memory and redis-cluster (a proxy that takes care of handling
fault tolerant consistent hashing). But my point is that there is also a
cultural barrier around these issues. With 8 GB of RAM, especially if you are
starting up and if you take care of selecting a data layout that is cheap,
there is a lot of data you can put in memory. Another interesting alternative
is to put only "hot" data (metadata) on Redis, and use another on-disk DB for
the rest.
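
That hot/cold split is essentially cache-aside (a rough Python sketch; sqlite
stands in here for whatever on-disk DB holds the bulk, assuming a hypothetical
users(id, name) table):

    import sqlite3
    import redis

    r = redis.Redis()
    db = sqlite3.connect('cold.db')

    def get_user_name(user_id):
        # hot path: the metadata lives in Redis
        name = r.get('user:%d:name' % user_id)
        if name is not None:
            return name
        # cold path: fetch from the on-disk DB and warm the cache
        row = db.execute('SELECT name FROM users WHERE id = ?',
                         (user_id,)).fetchone()
        if row is not None:
            r.set('user:%d:name' % user_id, row[0])
            return row[0]
        return None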

 _FWIW, the EC2 high-memory instance (64G) you cite in your FAQ goes for a
swift $1728 USD/month_.

High performance DBs and EC2 are IMHO not a great fit. Not only is RAM
expensive, but memory bandwidth is not optimal either.

EC2 is just expensive from every angle you look at it. It's not a problem just
with Redis, but also with MySQL performance, as memory is crucial to making
MySQL work well.

 _Sharding is always worthwhile for scaling but it's a fairly delicate subject
(esp. the rebalancing after add/remove of a shard and queries spanning
multiple shards) and I haven't seen a shrinkwrapped solution for redis, yet._

Application-level partitioning is a good option I think. It's easy to manage:
for instance you put your users in one instance, their blog posts in another,
and comments in a third. Are you small and low-traffic at the start and can't
buy three hosts? Just run three Redis instances on the same box, and move them
to different hosts as you grow.
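
In code the split is just a few connections (ports are hypothetical; all
three can point at one box at first):

    import redis

    # one logical instance per data family; move to separate hosts as you grow
    users = redis.Redis(host='localhost', port=6379)
    posts = redis.Redis(host='localhost', port=6380)
    comments = redis.Redis(host='localhost', port=6381)

    users.set('user:1:name', 'moe')
    posts.lpush('posts:user:1', 'post:42')
    comments.lpush('comments:post:42', 'comment:7')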

 _So, while an option, most people will probably rather use a competing
product (e.g. MongoDB) before opening that can of worms._

MongoDB and Redis are very different products. If MongoDB is a good fit for
your application, use it, it's great. But if you need Redis, MongoDB is not a
drop-in replacement in any way IMHO.

 _Yup, that would be me. Once the RAM-limitation is gone redis will suddenly
become very interesting to me._

This will only work well if the data access pattern is biased, btw. And
virtual memory will not completely remove the dataset size limitation. For
instance if you have 2 GB of RAM it will make sense to set up VM to hold up to
32 GB of data, given a biased enough access pattern, but it's not like it will
work well with 1 TB of data.

So in short I think: the in-memory barrier is mostly cultural. There are
solutions to distribute among different servers that are not hard to implement
and maintain. In every kind of DB the memory should be proportional to the
dataset for it to scale, and with an evenly distributed data access pattern
you need to keep everything in memory anyway.

Also, the fact that in Redis writes are as cheap as reads is not something to
forget. There are many applications where adding more RAM is much more viable
than dealing with write-scaling concerns.

Redis is not for everything, but I think there is a domain of applications
where it is a very good fit.

~~~
moe
_Another interesting alternative is to put only "hot" data (metadata) on
Redis, and use another on-disk DB for the rest._

Yup, that's how I (still) see redis at the moment: more a persistent cache
than a primary datastore.

 _EC2 is just expensive from every angle you look at it_

Not really, but that's a different story (scaling out vs up etc.). In general
redis as of now is mostly geared towards a "scale-up" approach whereas cloud
deployments naturally need to "scale-out" instead.

 _Application-level partitioning is a good option I think. It's easy to
manage: for instance you put your users in one instance, their blog posts in
another, and comments in a third. Are you small and low-traffic at the start
and can't buy three hosts? Just run three Redis instances on the same box, and
move them to different hosts as you grow._

Yes, and that's exactly the can of worms you don't want to get into.

What happens when one of your comment/user/post instances outgrows its host?
You have to split it further. Either logically again (users A-L on host1, M-Z
on host2) or anonymously (even users on host1, odd users on host2).

Since nobody wants to constantly think about their data layout, the latter
variant is definitely preferable.
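
(The anonymous variant amounts to hashing the id, e.g. in Python:)

    import redis

    hosts = [redis.Redis(port=6379), redis.Redis(port=6380)]

    def conn_for(user_id):
        # even ids on the first host, odd ids on the second
        return hosts[user_id % len(hosts)]

    conn_for(7).get('user:7:name')
    # ...until you add a third host and most keys suddenly map elsewhere;
    # that rebalancing is exactly the hard part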

MongoDB has a leg up here by providing a beta of anonymous, maintenance-free
sharding already.

 _This will only work well if the data access pattern is biased, btw. And
virtual memory will not completely remove the dataset size limitation. For
instance if you have 2 GB of RAM it will make sense to set up VM to hold up to
32 GB of data, given a biased enough access pattern, but it's not like it will
work well with 1 TB of data._

That sounds bad. Very bad. Not for the people who are happy with redis today,
but for those who are not touching it because of that constraint.

 _There are solutions to distribute among different servers that are not hard
to implement and maintain._

Okay, I'll bite. Where is the turnkey solution that distributes my data over
n redis instances with automatic failover, automatic rebalancing after
adding/removing an instance, n copies for redundancy, and _reliable_ failure
modes when an instance outgrows the available memory?

I would say without at least some of these features a redis-cluster could
become a nightmare to maintain in the long run.

~~~
antirez
_I would say without at least some of these features a redis-cluster could
become a nightmare to maintain in the long run._

Indeed, this list is very helpful, and it's exactly what redis-cluster will
do. You talk to redis-cluster, and it talks to the other N Redis instances,
handling faults, adding or removing nodes, and so forth.

It's like the "mongos" process, basically.

But the roadmap is to implement virtual memory first and redis-cluster later,
as I think that virtual memory is a more promptly available solution to start
with, and it works with applications designed to run on a single Redis server
(think of SORT BY and so on, set intersections and so forth).

------
jbellis
what is the speed hit?

~~~
antirez
Read performance is the same of course, as the append only file is not
touched. Write performance from the first tests appears to be around 70% of
snapshotting (tested with redis-benchmark), but more tests are needed to
really understand the write speed hit.
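
(A crude way to measure it yourself, sketched in Python; run once with
appendonly turned off and once with it on:)

    import time
    import redis

    r = redis.Redis()
    n = 10000

    start = time.time()
    for i in range(n):
        r.set('bench:%d' % i, 'x')  # one SET per round trip
    elapsed = time.time() - start

    print('%.0f writes/sec' % (n / elapsed))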

~~~
z8000
Are you putting the journal on its own disk?

~~~
antirez
No, I'm just trying it on my MacBook Pro :)

