
Ask HN: Is it feasible to use Redis as the only datastore? - clojurerocks
I've read that Redis is used for caching. But for a simple application, is it feasible to use Redis as the only datastore, since it provides fairly robust querying capabilities?
======
antirez
From the point of view of durability there are no problems in using Redis as a
primary data store; just make sure to use AOF with the fsync everysec setting.

However, whether you should do it depends on the data you need to store, and
even more on the kind of queries you want to run against the data set. If you
end up with a complex schema in order to support SQL-like queries, it is a bad
idea. If you want to query data in a very Redis-API-obvious way, it is a good
idea.

A few remarks about incorrect ideas expressed in other posts here:

1) Redis transactions are not good since there is no rollback: not true, since
in Redis command errors can only happen in the case of a type mismatch, more or
less. So if there are no obvious bugs in your code, a MULTI/EXEC block is going
to see all of its commands executed.

2) Redis durability is weak. Not true, since with AOF + fsync it offers a level
of durability that is comparable to many other data stores. It depends on the
actual configuration used, of course. Your usual MyISAM table, to give one
example, is certainly not more durable than Redis AOF + fsync everysec.
Replication can add another level of durability, of course.

3) RDB persistence, on the other hand, _is_ a weak durability configuration,
but it works for many applications where losing 15 minutes of data after a bad
event is acceptable.
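
To make the difference between the fsync settings in remark 2 concrete, here is a minimal Python sketch of an append-only log with a configurable sync policy. It is a toy, not Redis code: `AppendLog`, `demo`, and the file name are made up for illustration, and the policy check runs inline on each write, which is a simplification of how the real server schedules its syncs. The policy names mirror the `always`, `everysec`, and `no` values of the Redis `appendfsync` setting.

```python
import os
import tempfile
import time


class AppendLog:
    """Toy append-only log with a configurable fsync policy (not Redis code)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown policy: " + policy)
        self.policy = policy
        self.file = open(path, "ab")
        self.last_sync = time.monotonic()
        self.sync_count = 0

    def append(self, command):
        self.file.write(command.encode("utf-8") + b"\n")
        self.file.flush()  # hand the bytes to the OS page cache
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_sync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.file.fileno())  # ask the OS to push the bytes to disk
            self.last_sync = now
            self.sync_count += 1

    def close(self):
        self.file.close()


def demo(policy, writes=100):
    """Append `writes` commands and report how many fsync calls were made."""
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendLog(os.path.join(tmp, "appendonly.log"), policy)
        for i in range(writes):
            log.append("SET key:%d %d" % (i, i))
        log.close()
        return log.sync_count
```

With `always`, every write pays for an fsync; with `everysec`, a crash can lose roughly the last second of writes in exchange for far fewer syncs.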

So if you, for instance, are planning to use Redis to store your metadata,
you can avoid having this same dataset duplicated in other places. Redis is
also pretty solid in providing ways to back up your data. For instance,
while you use AOF + fsync you can still call BGSAVE and create a compact .rdb
file to back up. You'll always be able to restore the data set from the RDB
file.

That said, I think that in most complex applications using Redis + an on-disk
and "more query friendly" data store makes a lot of sense. Redis + Mongo,
Redis + MySQL, Redis + Riak and so forth are very interesting configurations
IMHO.

------
patio11
It depends on what your simple application does and what its needs are. I
could rewrite either of my products to use Redis only, to the exclusion of
SQL. It wouldn't be a very good use of my time, but it would be trivially
possible. Neither of them put terrible stress on persistence engines. Heck, I
could rewrite them to do everything in flat files and that wouldn't be
impossible, either.

There are some rather relational-data-intensive projects I've been involved
with where that would have been an Exceptionally Poor Idea, both for the
amount of pain one would go through writing a poorly tested version of LEFT
JOIN to be able to get it to work, and because one will eventually discover
that your SQL database of choice has been improved for hundreds of man-years
along axes you care about and Redis has not.

~~~
clojurerocks
I guess what I'm wondering is more about the scalability of it. I know that
MongoDB could essentially be used as the only datastore; however, it apparently
has some issues currently with scalability.

~~~
mrgordon
Mongo 1.8 fixed a lot of the scalability issues. +1 to Mongo as an alternative
to SQL. Redis is great for a simple app but I wouldn't want to scale it to
handle everything for a large site.

~~~
clojurerocks
I didn't know that the issues in Mongo were fixed. I figured they would be
eventually though.

------
redbad
No. Redis is not a database, it's an in-memory key-value store. It's also
called a [distributed] data structure server, which (at least to me) implies a
tight coupling with your application: it's a way to offload shared state
between multiple components, but the persistence of that shared state is not
guaranteed.

In fact, Redis' persistence layer is best understood as a best-effort value-
add. If the server shuts down for any reason, you have to simply hope the last
disk flush was recent and successful. Otherwise, your data is lost. This is
(again, to me) fundamentally at odds with the contracts that any database
should provide. Also, Redis cluster is not yet released, which means running
more than one Redis server requires you to manage keyspace sharding at the app
layer.

Not that any of this is a knock against Redis. Even with those caveats, there
is a huge class of problems that Redis is perfectly suited for. I love the
software and use it daily. But Redis competes with memcached, _not_ MongoDB;
if you ever find yourself shoe-horning Redis into a role where total data loss
is anything other than temporarily annoying, you're doing it wrong.

tl;dr: IMO, using Redis as a database is a really bad idea, for most common
definitions of "database."

~~~
pjscott
I think your description of redis' reliability is a bit too vague and scary.
There are several levels of reliability that redis supports, each with its own
tradeoffs:

1\. No disk. Everything in memory, and if redis dies, so does your data. This
is closest to memcached.

2\. In memory, with periodic background flushes to disk. After a timeout
(shorter if there's a lot of modification to your data), redis will fork a
background process and write out all its data to a file in the background.
(Then it will atomically rename the file in the place of the previous dump
file.) This is the easiest form of disk persistence, and good enough for most
of what people use redis for.

3\. You can also configure redis to write to an append-only file that logs
every operation, which you can periodically compact with a cron job or
something. The flushing interval is configurable, and makes a big difference
in speed. This is not particularly convenient -- who the hell wants to write a
cron job to compact database logs? -- but it gives you durability on par with
a conventional database.

4\. If you have another machine lying around, there's an option that you can
combine with any of the three options above: master-slave replication. A read-
only slave redis server receives a stream of all writes to the master, and
changes itself to match. This gives a small data-loss window in the event of a
master failure, and makes fail-over possible. If the master goes down, you can
have a slave become the new master. Coordinating this can be tricky, but it
can certainly be done.

tl;dr: If the reliability approaches above look good enough for your
application, and redis looks like the best match from a semantic or
performance standpoint, then go for it!
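
The "write a dump, then atomically rename it over the old one" step from option 2 can be sketched in a few lines of Python. This is only an illustration of the rename technique: it writes JSON from the same process instead of using the background mechanism and dump format that redis uses, and `save_snapshot` and `load_snapshot` are hypothetical names.

```python
import json
import os
import tempfile


def save_snapshot(data, path):
    """Write the dataset to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-dump-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the new dump is on disk first
        os.replace(tmp_path, path)  # readers see the old dump or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_snapshot(path):
    """Read back the most recent complete dump."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the swap is a single rename, a crash in the middle of a dump leaves the previous dump file intact rather than a half-written one.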

~~~
redbad
I had temporarily forgotten about master-slave replication. Thanks for the
reminder. And I appreciate your broad points. But I still maintain that you're
fooling yourself if you think any of those options is a stand-in for Proper™
database-style persistence. By that I mean some contractual assurance in the
software that a change written to a data structure is permanent, to standards
comparable to the underlying storage system.

If Redis were a database, I would expect a successful HMSET to generally be
available in perpetuity, even if the machine was rebooted immediately after I
got the "OK".

Append-only command logs don't solve the problem; replaying a complex series
of state transitions is not a viable substitute for the storage of the end
result of those transitions. It's computationally correct, and a
straightforward implementation, but quickly and easily becomes unacceptably
inefficient. I hope that's clear enough that I don't need to provide an
example.

Replication is a solution for the problem of network or machine instability,
assuming a valid Redis use-case. It doesn't address persistence in the sense
of databases. In distributed computing, High-Availability is orthogonal to
Persistence.

Periodic background flushes to disk in another process come the closest to
solving the persistence problem, but to get database-class QoS you'd need to
trigger on every change (or, say, every second). Obviously this is a bad idea
and not what the feature is designed for, which circles back to my main point:
Redis is not designed to be a database. If it's operating as a memcached
replacement in your stack, great. If it's standing in for authoritative, long-
term storage of critical data, it's being misused.

~~~
pjscott
Let's compare Redis's AOF mode with InnoDB. The way that InnoDB manages to
give this guarantee is by flushing its log to disk on every transaction
commit. If you're willing to sacrifice some durability for write speed, this
can be relaxed. In redis, the closest equivalent to this would be running in
AOF mode with a flush on every write.

The difference here is not one of durability, but in how the data is stored on
disk. InnoDB keeps the logs small by periodically updating a B-tree with the
changes in the logs, after which those changes can safely be removed from the
logs. The result of this is strong durability, a reasonably compact on-disk
representation, and fairly fast recovery when someone trips over the power
cord.

Redis, in AOF mode, logs every command to the log file and (if you specify it
in the config file) flushes to disk after every write. The problem is that
this file grows without bound: if you leave redis running forever, it will
eventually fill up your hard drive, and recovering from a restart will take
way too damn long if you have to replay a 1 TB log file. The conventional way
of dealing with this is to periodically use the BGREWRITEAOF command, which does
essentially the same thing as a background data dump: it writes out a new AOF
file from the current contents of redis in memory, and deletes the old AOF
file. This is roughly equivalent to augmenting the usual periodic-data-dump
behavior of redis with periodically-flushed logs, just like a more
conventional database.
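
A toy Python model of that log rewrite, assuming a made-up three-command log format (`SET`, `INCR`, `DEL`) rather than the real AOF file format: replaying the full log and replaying the rewritten log must give the same dataset, but the rewritten log only contains what is needed to rebuild the current state.

```python
def replay(log):
    """Rebuild the dataset by applying every logged command in order."""
    data = {}
    for op, key, *args in log:
        if op == "SET":
            data[key] = args[0]
        elif op == "INCR":
            data[key] = data.get(key, 0) + 1
        elif op == "DEL":
            data.pop(key, None)
    return data


def rewrite(log):
    """Emit a minimal log that reproduces the current dataset."""
    return [("SET", key, value) for key, value in replay(log).items()]


# 1000 increments plus a key that was created and deleted again.
log = [("INCR", "hits")] * 1000 + [("SET", "tmp", "x"), ("DEL", "tmp")]
compact = rewrite(log)  # a single SET for "hits"
```

Here a 1002-entry log collapses to one entry, which is why recovery time after a rewrite depends on the size of the dataset and not on how long the server has been running.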

If there's something I'm missing here, I'd love to hear it.

------
tabbyjabby
We're working on integrating Redis into our stack, but we use it mostly as a
caching layer. As with any NoSQL datastore, your data will have to be
structured far differently than if you were using a relational database. The
nice thing about Redis is that it's a lot more effective than a simple key-
value store for representing the data structures that exist in your
application. If you're willing to put the time into learning this paradigm of
data persistence the logical interaction with your data won't be a problem.

Redis does have a number of shortcomings. Firstly, it doesn't provide a very
sophisticated transactions system. You get multi blocks, and the ability to
watch keys (which is like a check and set), but you don't get true
transactions. For example, there's no rollback mechanism, and commands will
still be executed in a multi block if one of them fails.
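
The "no rollback" behavior can be illustrated with a small in-memory model in Python. This is a sketch of the semantics only, not a Redis client: `ToyStore` and `exec_block` are hypothetical names, and the type check is much cruder than what Redis does.

```python
class ToyStore:
    """In-memory toy that mimics how a queued block is executed."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return "OK"

    def incr(self, key):
        value = self.data.get(key, 0)
        if not isinstance(value, int):
            raise TypeError("value is not an integer")
        self.data[key] = value + 1
        return self.data[key]

    def exec_block(self, commands):
        """Run every queued command; a failure is recorded, nothing is undone."""
        results = []
        for name, *args in commands:
            try:
                results.append(getattr(self, name)(*args))
            except TypeError as err:
                results.append(err)  # keep going with the remaining commands
        return results


store = ToyStore()
results = store.exec_block([
    ("set", "name", "bob"),
    ("incr", "name"),    # type mismatch: fails
    ("incr", "visits"),  # still runs
])
```

After the block, `name` is still `"bob"` and `visits` is `1`: the failed command did not stop or undo the commands around it.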

Secondly, Redis by default does not provide strong guarantees of durability.
It writes a snapshot to disk of your data periodically, so if something
happens to the server that causes the program to shut down unexpectedly,
you'll lose a lot of data. Redis can be configured to provide stronger
guarantees of durability, but at the expense of speed.

Thirdly, there is currently no sharding mechanism built into Redis. They're
working on Redis Cluster, which will allow your data to be spread across
multiple servers, but it won't come out for some time. You can build your own
distribution system into your application, however. You can read up on
consistent hashing algorithms to help you with ideas for that.
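
As a starting point for that, here is a minimal consistent hash ring in Python. It is a sketch under assumptions, not a production sharding layer: the `HashRing` class, the server names, and the choice of SHA-256 with 100 virtual points per server are all illustrative.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to servers so that adding a server moves only a few keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per server
        self.ring = []            # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash("%s#%d" % (node, i))
            bisect.insort(self.ring, (point, node))

    def node_for(self, key):
        """Return the server owning the first point at or after the key's hash."""
        index = bisect.bisect(self.ring, (self._hash(key), ""))
        if index == len(self.ring):
            index = 0  # wrap around the ring
        return self.ring[index][1]


ring = HashRing(["redis-a", "redis-b", "redis-c"])
server = ring.node_for("user:42")
```

The useful property is that adding a fourth server only takes keys away from the existing ones; keys never shuffle between the servers that were already there.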

Fourthly, everything is in memory at all times. That's pretty expensive,
though Redis is quite efficient with memory usage.

Redis is really fast, which is awesome, but we still use MySQL as our primary
datastore. We have a write-through/read-through caching layer, and if the
transaction ever fails in part, we just roll back the MySQL transaction and
invalidate the key in Redis, because we can trust that MySQL's records are
more authoritative.
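
A rough Python sketch of that pattern, with stand-ins so it runs on its own: an in-memory SQLite database plays the role of MySQL and a plain dict plays the role of Redis. The `WriteThroughStore` class and the `users` table are invented for the example and say nothing about how the setup above is actually implemented.

```python
import sqlite3


class WriteThroughStore:
    """The SQL database is authoritative; the cache is updated only after commit."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")  # stand-in for MySQL
        self.db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
        )
        self.cache = {}  # stand-in for Redis

    def save(self, user_id, email):
        try:
            with self.db:  # commits on success, rolls back on an exception
                cur = self.db.execute(
                    "UPDATE users SET email = ? WHERE id = ?", (email, user_id)
                )
                if cur.rowcount == 0:
                    self.db.execute(
                        "INSERT INTO users (id, email) VALUES (?, ?)",
                        (user_id, email),
                    )
        except sqlite3.Error:
            self.cache.pop(user_id, None)  # invalidate: the database wins
            raise
        self.cache[user_id] = email  # write through after a successful commit

    def load(self, user_id):
        if user_id in self.cache:  # cache hit
            return self.cache[user_id]
        row = self.db.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]  # read through: repopulate the cache
        return row[0]
```

If a write violates a constraint, the SQL transaction is rolled back, the cached entry is dropped, and the next `load` repopulates it from the database.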

------
sehugg
Yes, it is, but I think they serve different purposes. I'd use Redis for
short-to-medium lifetime data that has high transaction rates. I'd use SQL for
medium-to-long term data with lower transaction rates. We use both.

Redis does what it does well -- make great use of the CPU and the memory on a
single box. SQL keeps your data consistent long-term and makes it easy to do
ad-hoc queries.

------
iampims
If your data fits in memory and can be modeled on a key/value design, then why
not?

A few things you might want to consider:

* depending on the size of your dataset, starting/rebooting redis can take a while. The bigger your dataset is, the longer it takes.

* AOF can be a pain to maintain, since you need to allocate enough disk space for it + for when you back it up.

* if you plan to have millions of keys, compressing them isn’t a bad idea.

------
potomak
I was asking myself the same question and my answer was Draw![1]. I made this
little app to learn more about Redis and the HTML5 canvas element.

I wrote model/helper classes to wrap repetitive redis code, you can check the
source at Draw! github repository[2].

Anyway, after this experience I learnt Redis is a great tool, but it doesn't
fit well as a datastore for "everything" you need to store for your app.

[1] <http://drawbang.com>

[2] <https://github.com/potomak/drawbang/tree/master/models>

------
josephg
With redis, all your data must fit in RAM on your server(s).

It really depends on how much data your application uses (and how much you
expect it to use as you grow).

------
simpsond
What do you mean by robust query capabilities? There are different types of
values in redis, each with special operations, but you can't really query on
the values. If you can model your data with hashes, simple values, and lists,
then redis may suit you. I use it in conjunction with relational tools, but I
can imagine using it solely for a simple project.

~~~
clojurerocks
In my original title I had added "for a small application", but then I decided
I was curious about using it for something beyond just something simple. But
you are right about the querying. I was thinking of it more in comparison to
membase and memcache.

------
codecaine
Redis operations are performed in RAM, and while you can specify an interval
at which its contents should be written to disk, you risk losing some data if
it closes unexpectedly. If that is not a problem, I don't see why it couldn't
be used as the sole datastore.

~~~
pjscott
That's not quite true; Redis also provides an append-only-log mode of
operation just like conventional databases, and of course it can be run with
master-slave replication. I wrote a bigger reply here:

<http://news.ycombinator.com/item?id=3010949>

------
hundredwatt
Why not?

EDIT: If it meets your application's logical and scaling needs, there's no
reason you couldn't use it

~~~
clojurerocks
Are there any hidden reasons you could think of not to?

~~~
smokestack
I've read that persistence/virtual memory is quite slow (as of at least a few
months ago). If your application can stand losing some of its data every once
in a while, redis could be a good option.

~~~
codecaine
As far as I know there have been improvements in the virtual memory field.
However it is not recommended to use redis if the data you want to store in it
exceeds the amount of ram in your server.

~~~
pjscott
Virtual memory support is being dropped from redis in versions after 2.4, so
you should _definitely_ avoid using it.

~~~
codecaine
thanks for keeping me updated about redis :)

