Ask HN: Is it feasible to use redis as the only datastore?
19 points by clojurerocks on Sept 18, 2011 | 26 comments
I've read that Redis is used for caching. But for a simple application, is it feasible to use Redis as the only datastore, since it provides fairly robust querying capabilities?

From the point of view of durability there are no problems in using Redis as a primary data store; just make sure to use AOF with the fsync-everysec setting.

However, whether you should do it depends on the data you need to store, and even more on the kind of queries you want to run against the data set. If you end up with a complex schema in order to support SQL-like queries, it is a bad idea. If you can query the data in a way that maps obviously onto the Redis API, it is a good idea.

A few remarks about incorrect ideas expressed in other posts here:

1) "Redis transactions are no good since there is no rollback": not true, since in Redis query errors can, more or less, only happen in the case of a type mismatch. So with a MULTI/EXEC block, if there are no obvious bugs in your code, all the commands will be executed.

2) "Redis durability is weak": not true, since with AOF + fsync it offers a level of durability comparable to many other data stores, depending on the actual configuration used, of course. Your usual MyISAM table is certainly not more durable than Redis with AOF + fsync everysec, just to make an example. Replication can add another level of durability on top of that.

3) RDB persistence, by contrast, is a weak durability configuration, but it works for many applications where losing up to 15 minutes of data after a crash is acceptable.
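For concreteness, a MULTI/EXEC block looks like this in redis-cli (an illustrative session; the key names are made up):

```
redis> MULTI
OK
redis> SET user:1:name "alice"
QUEUED
redis> INCR user:1:visits
QUEUED
redis> EXEC
1) OK
2) (integer) 1
```

Commands are queued and then run atomically at EXEC; there is no rollback, but as noted above, runtime errors are essentially limited to type mismatches.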

So if you, for instance, are planning to use Redis to store your metadata, you can avoid having the same dataset duplicated in other places. Redis is also pretty solid in providing solutions to back up your data. For instance, while using AOF + fsync you can still call BGSAVE and create a compact .rdb file as a backup. You'll always be able to restore the data set from the RDB file.
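For reference, the setup described here maps to a handful of redis.conf directives (the values below are example settings, not recommendations):

```
# Append-only file, fsynced once per second
appendonly yes
appendfsync everysec

# RDB snapshot points, for compact backups
# (a snapshot can also be triggered on demand with BGSAVE)
save 900 1
save 300 10
```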

That said, I think that in most complex applications using Redis plus an on-disk, "more query friendly" data store makes a lot of sense. Redis + Mongo, Redis + MySQL, Redis + Riak and so forth are very interesting configurations, IMHO.

It depends on what your simple application does and what its needs are. I could rewrite either of my products to use Redis only, to the exclusion of SQL. It wouldn't be a very good use of my time, but it would be trivially possible. Neither of them puts terrible stress on persistence engines. Heck, I could rewrite them to do everything in flat files and that wouldn't be impossible, either.

There are some rather relational-data-intensive projects I've been involved with where that would have been an Exceptionally Poor Idea, both for the amount of pain one would go through writing a poorly tested version of LEFT JOIN to be able to get it to work, and because one will eventually discover that your SQL database of choice has been improved for hundreds of man-years along axes you care about and Redis has not.

I guess what I'm wondering about is more the scalability of it. I know that MongoDB could essentially be used as the only datastore, but it apparently has some current issues with scalability.

So, your app is unlikely to be as popular as foursquare, for instance, at least right now, and they seem fine and happy with MongoDB. You might recall the issues they had with it a while back, but those have all been fixed as far as I'm aware. 10gen is really quite impressive when it comes to support, etc.

But the real question is: why do you think a SQL database will not be scalable to begin with? I'm going to say that well over 75% of the top sites use a SQL database.

Mongo 1.8 fixed a lot of the scalability issues. +1 to Mongo as an alternative to SQL. Redis is great for a simple app but I wouldn't want to scale it to handle everything for a large site.

I didn't know that the issues in Mongo were fixed. I figured they would be eventually, though.

No. Redis is not a database, it's an in-memory key-value store. It's also called a [distributed] data structure server, which (at least to me) implies a tight coupling with your application: it's a way to offload shared state between multiple components, but the persistence of that shared state is not guaranteed.

In fact, Redis' persistence layer is best understood as a best-effort value-add. If the server shuts down for any reason, you have to simply hope the last disk flush was recent and successful. Otherwise, your data is lost. This is (again, to me) fundamentally at odds with the contracts that any database should provide. Also, Redis cluster is not yet released, which means running more than one Redis server requires you to manage keyspace sharding at the app layer.

Not that any of this is a knock against Redis. Even with those caveats, there is a huge class of problems that Redis is perfectly suited for. I love the software and use it daily. But Redis competes with memcached, not MongoDB; if you ever find yourself shoe-horning Redis into a role where total data loss is anything other than temporarily annoying, you're doing it wrong.

tl;dr: IMO, using Redis as a database is a really bad idea, for most common definitions of "database."

I think your description of redis' reliability is a bit too vague and scary. There are several levels of reliability that redis supports, each with its own tradeoffs:

1. No disk. Everything in memory, and if redis dies, so does your data. This is closest to memcached.

2. In memory, with periodic background flushes to disk. After a timeout (shorter if there's a lot of modification to your data), redis will fork a background process and write out all its data to a file. (Then it will atomically rename the file in the place of the previous dump file.) This is the easiest form of disk persistence, and good enough for most of what people use redis for.

3. You can also configure redis to write to an append-only file that logs every operation, which you can periodically compact with a cron job or something. The flushing interval is configurable, and makes a big difference in speed. This is not particularly convenient -- who the hell wants to write a cron job to compact database logs? -- but it gives you durability on par with a conventional database.

4. If you have another machine lying around, there's an option that you can combine with any of the three options above: master-slave replication. A read-only slave redis server receives a stream of all writes to the master, and changes itself to match. This gives a small data-loss window in the event of a master failure, and makes fail-over possible. If the master goes down, you can have a slave become the new master. Coordinating this can be tricky, but it can certainly be done.
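Setting up such a slave is a one-line directive in the slave's redis.conf (the master address here is a placeholder):

```
# Point this server at the master; it becomes a read-only replica
slaveof 192.0.2.10 6379
```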

tl;dr: If the reliability approaches above look good enough for your application, and redis looks like the best match from a semantic or performance standpoint, then go for it!

I had temporarily forgotten about master-slave replication. Thanks for the reminder. And I appreciate your broad points. But I still maintain that you're fooling yourself if you think any of those options is a stand-in for Proper™ database-style persistence. By that I mean some contractual assurance in the software that a change written to a data structure is permanent, to standards comparable to the underlying storage system.

If Redis were a database, I would expect a successful HMSET to generally be available in perpetuity, even if the machine was rebooted immediately after I got the "OK".

Append-only command logs don't solve the problem; replaying a complex series of state transitions is not a viable substitute for the storage of the end result of those transitions. It's computationally correct, and a straightforward implementation, but quickly and easily becomes unacceptably inefficient. I hope that's clear enough that I don't need to provide an example.

Replication is a solution for the problem of network or machine instability, assuming a valid Redis use-case. It doesn't address persistence in the sense of databases. In distributed computing, High-Availability is orthogonal to Persistence.

Periodic background flushes to disk in another thread come the closest to solving the persistence problem, but to get database-class QoS you'd need to trigger on every change (or, say, every second). Obviously this is a bad idea and not what the feature is designed for, which circles back to my main point: Redis is not designed to be a database. If it's operating as a memcached replacement in your stack, great. If it's standing in for authoritative, long-term storage of critical data, it's being misused.

Let's compare Redis's AOF mode with InnoDB. The way that InnoDB manages to give this guarantee is by flushing its log to disk on every transaction commit. If you're willing to sacrifice some durability for write speed, this can be relaxed. In redis, the closest equivalent to this would be running in AOF mode with a flush on every write.

The difference here is not one of durability, but in how the data is stored on disk. InnoDB keeps the logs small by periodically updating a B-tree with the changes in the logs, after which those changes can safely be removed from the logs. The result of this is strong durability, a reasonably compact on-disk representation, and fairly fast recovery when someone trips over the power cord.

Redis, in AOF mode, logs every command to the log file and (if you specify it in the config file) flushes to disk after every write. The problem is that this file grows without bound: if you leave redis running forever, it will eventually fill up your hard drive, and recovering from a restart will take way too damn long if you have to replay a 1 TB log file. The conventional way of dealing with this is to periodically use the BGREWRITEAOF command, which does essentially the same thing as a background data dump: it writes out a new AOF file from the current contents of redis in memory, and replaces the old AOF file. This is roughly equivalent to augmenting the usual periodic-data-dump behavior of redis with periodically-flushed logs, just like a more conventional database.
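The periodic compaction can be as simple as a crontab entry invoking the rewrite command (the schedule below is arbitrary):

```
# Rewrite (compact) the append-only file every night at 4am
0 4 * * * redis-cli BGREWRITEAOF
```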

If there's something I'm missing here, I'd love to hear it.

Redis also supports PubSub, lists, sorted sets, and hashes. It's not exactly a strict "key-value" store.

We're working on integrating Redis into our stack, but we use it mostly as a caching layer. As with any NoSQL datastore, your data will have to be structured far differently than if you were using a relational database. The nice thing about Redis is that it's a lot more effective than a simple key-value store for representing the data structures that exist in your application. If you're willing to put the time into learning this paradigm of data persistence the logical interaction with your data won't be a problem.

Redis does have a number of shortcomings. Firstly, it doesn't provide a very sophisticated transaction system. You get MULTI blocks, and the ability to watch keys (which is like a check-and-set), but you don't get true transactions. For example, there's no rollback mechanism, and commands will still be executed in a MULTI block if one of them fails.

Secondly, Redis by default does not provide strong guarantees of durability. It writes a snapshot of your data to disk periodically, so if something happens to the server that causes the program to shut down unexpectedly, you'll lose any data written since the last snapshot. Redis can be configured to provide stronger guarantees of durability, but at the expense of speed.

Thirdly, there is currently no sharding mechanism built into Redis. They're working on Redis Cluster, which will allow your data to be spread across multiple servers, but it won't come out for some time. You can build your own distribution system into your application, however. You can read up on consistent hashing algorithms to help you with ideas for that.
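As a sketch of what app-layer sharding with consistent hashing might look like (the node names are hypothetical, and a real client would map each node to a Redis connection):

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring for client-side key sharding."""

    def __init__(self, nodes, replicas=100):
        # Each physical node gets `replicas` virtual points on the ring,
        # which smooths out the key distribution.
        self.replicas = replicas
        self._keys = []   # sorted list of hash positions
        self._ring = []   # (hash, node) pairs, kept aligned with _keys
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            idx = bisect.bisect(self._keys, h)
            self._keys.insert(idx, h)
            self._ring.insert(idx, (h, node))

    def get_node(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around the end of the ring.
        h = self._hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[idx][1]


ring = ConsistentHashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
node = ring.get_node("user:42")
```

The virtue of consistent hashing over plain modulo sharding is that adding a node only remaps a small fraction of the keyspace instead of reshuffling nearly every key.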

Fourthly, everything is in memory at all times. That's pretty expensive, though Redis is quite efficient with memory usage.

Redis is really fast, which is awesome, but we still use MySQL as our primary datastore. We have a write-through/read-through caching layer, and if the transaction ever fails in part, we just rollback the MySQL transaction and invalidate the key in Redis, because we can trust that MySQL's records are more authoritative.
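The write-through pattern described here can be sketched roughly as follows, with plain dicts standing in for MySQL and Redis (the class and method names are illustrative, not a real API):

```python
class WriteThroughStore:
    """Sketch: MySQL as the authoritative store, Redis as a cache."""

    def __init__(self):
        self.mysql = {}  # stand-in for the authoritative database
        self.redis = {}  # stand-in for the cache

    def write(self, key, value):
        snapshot = self.mysql.get(key)  # saved so we can "roll back"
        try:
            self.mysql[key] = value  # commit to the authoritative store
            self.redis[key] = value  # then write through to the cache
        except Exception:
            # On partial failure: restore MySQL and invalidate the cache
            # key, so the authoritative record wins on the next read.
            if snapshot is None:
                self.mysql.pop(key, None)
            else:
                self.mysql[key] = snapshot
            self.redis.pop(key, None)
            raise

    def read(self, key):
        if key in self.redis:        # cache hit
            return self.redis[key]
        value = self.mysql.get(key)  # cache miss: fall through to MySQL
        if value is not None:
            self.redis[key] = value  # repopulate the cache
        return value


store = WriteThroughStore()
store.write("user:1", "alice")
```

Invalidating rather than updating the cache key on failure is the conservative choice: the next read falls through to MySQL, whose records are treated as authoritative.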

Yes, it is, but I think they serve different purposes. I'd use Redis for short-to-medium lifetime data that has high transaction rates. I'd use SQL for medium-to-long term data with lower transaction rates. We use both.

Redis does what it does well -- make great use of the CPU and the memory on a single box. SQL keeps your data consistent long-term and makes it easy to do ad-hoc queries.

If your data fits in memory and can be modeled on a key/value design, then why not?

A few things you might want to consider:

* depending on the size of your dataset, starting/rebooting redis can take a while. The bigger your dataset is, the longer it takes

* AOF can be a pain to maintain, since you need to allocate enough disk space for it, plus space for backing it up.

* if you plan to have millions of keys, compressing them isn’t a bad idea.

I was asking myself the same question and my answer was Draw![1]. I made this little app to learn more about Redis and html5 canvas element.

I wrote model/helper classes to wrap repetitive redis code, you can check the source at Draw! github repository[2].

Anyway, after this experience I learned that Redis is a great tool, but it doesn't fit well as a datastore for "everything" you need to store for your app.

[1] http://drawbang.com

[2] https://github.com/potomak/drawbang/tree/master/models

With redis, all your data must fit in RAM on your server(s).

It really depends on how much data your application uses (and how much you expect it to use as you grow).

What do you mean by robust query capabilities? There are different types of values in redis, each with special operations, but you can't really query on the values. If you can model your data with hashes, simple values, and lists, then redis may suit you. I use it in conjunction with relational tools, but I can imagine using it solely for a simple project.

In my original title I had added "for a small application," but then I decided I was curious to see about using it for something beyond just something simple. But you are right about the querying. I was thinking of it more in comparison to membase and memcache.

Redis operations are performed in RAM, and while you can specify an interval at which its contents should be written to disk, you risk losing some data if it closes unexpectedly. If that is not a problem, I don't see why it couldn't be used as the sole datastore.

That's not quite true; Redis also provides an append-only-log mode of operation just like conventional databases, and of course it can be run with master-slave replication. I wrote a bigger reply here:


Why not?

EDIT: If it meets your application's logical and scaling needs, there's no reason you couldn't use it

Are there any hidden reasons you could think of not to?

I've read that persistence/virtual memory is quite slow (as of at least a few months ago). If your application can stand losing some of its data every once in a while, redis could be a good option.

As far as I know there have been improvements in the virtual memory area. However, it is not recommended to use redis if the data you want to store in it exceeds the amount of RAM in your server.

Virtual memory support is being dropped from redis in versions after 2.4, so you should definitely avoid using it.

thanks for keeping me updated about redis :)
