

Thoughts on Redis - spahl
http://blog.kennejima.com/post/1226487020/thoughts-on-redis

======
codahale
I'm sorry, this is complete and total gibberish.

* The CAP theorem is a logical proof of the impossibility of providing both consistency and availability on a network which can lose messages (or using machines which can fail). You can't implement it any more than you can implement general relativity. It's a description of reality.

* The CAP theorem is not a data model which competes with or intersects at all with relational algebras. Rather, a relational algebra is the logical model (allegedly) underlying RDBMSes which are systems which historically provide consistency at the expense of availability in the presence of faults (thus obeying the CAP theorem because they're real systems and not opium dreams).

* Scaling horizontally does not imply anything about fault-tolerance. It instead describes systems in which resources can be incrementally added to incrementally gain capacity. It's possible to build a horizontally-scalable system which is less reliable than a single-machine system; it's also possible to build fantastically fault-tolerant systems which are also horizontally-scalable (c.f. Dynamo). Doing the latter is considered "a good idea;" doing the former is considered "fucking daft."

* Nothing about horizontally-scalable systems (or NoSQL or really anything the author mentions except for Redis) requires that the entire dataset be kept in memory. Systems like Riak (<http://riak.basho.com>) or Voldemort (<http://project-voldemort.com/>) use pluggable storage engines, some of which (e.g. InnoStore and BDB-JE) have excellent performance with 1:10 RAM-to-dataset ratios. By the author's own metric, the Holy Grail has not only been found but the damn things are multiplying.

* Neither epoll nor kqueue "scale indefinitely in terms of I/O concurrency." Nothing does. That's horseshit.
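To make the last point concrete, here is a minimal Python sketch (an illustration added here, not anything from the post). `selectors.DefaultSelector` picks the best mechanism the platform offers, which is epoll on Linux and kqueue on BSD/macOS. Every watched connection still costs a file descriptor plus kernel and user-space memory, and the per-process descriptor limit (`RLIMIT_NOFILE`) is one hard ceiling among several. The `probe` helper and its parameters are hypothetical, and the `resource` module is Unix-only.

```python
import resource
import selectors
import socket


def probe(n_pairs: int = 8) -> tuple[str, list[int]]:
    """Register n_pairs socket pairs, make two readable, return which fired."""
    sel = selectors.DefaultSelector()  # EpollSelector on Linux, KqueueSelector on BSD/macOS
    pairs = []
    try:
        for i in range(n_pairs):
            reader, writer = socket.socketpair()
            reader.setblocking(False)
            sel.register(reader, selectors.EVENT_READ, data=i)
            pairs.append((reader, writer))
        # Write one byte into pairs 1 and 5 so only their readers become ready.
        pairs[1][1].send(b"x")
        pairs[5][1].send(b"x")
        ready = sorted(key.data for key, _events in sel.select(timeout=1.0))
        return type(sel).__name__, ready
    finally:
        for reader, writer in pairs:
            sel.unregister(reader)
            reader.close()
            writer.close()
        sel.close()


if __name__ == "__main__":
    name, ready = probe()
    print(name, ready)
    # Every registered socket above consumed a descriptor; this is the cap on them.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("descriptor limit (soft, hard):", soft, hard)
```

epoll and kqueue make each wakeup cost roughly proportional to the number of *ready* descriptors rather than all watched ones, which is why they scale far better than `select()`; memory and descriptor limits still bound the total.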

You're better off huffing glue than reading this thing. I don't even care what
he has to say about Redis. He could have some incredible insights about it,
but they'd be completely and totally negated by the incomprehension,
misinformation, untruths, and general crazytalk which preceded it.

tl;dr: 15+ years of RDBMS experience gives you 0 clues about distributed
computing; reading Time Cube (<http://www.timecube.com/>) is preferable to
reading this drek.

------
ericflo
"First, scaling horizontally has little to do with the database engine itself
- creating a transparent, consistent hash function is the easiest part."

That is just so incorrect that it's hard to take the rest of the post
seriously.

What happens when you want to add a node to your cluster? What happens when a
node goes down? What happens when you drop some packets between nodes? When
one node has an unbalanced number of keys?

Then, answering each of these questions brings with it many more questions.
For example, if the answer to "what happens when your node goes down" is to
replicate some data to another node, then how do you deal with inconsistencies
between different replicas of the data? What happens if the node you try to
replicate to goes down? What if the node you try to replicate to has a
different idea of what that data should be?

If a database is to be truly horizontally scalable, it will have an answer for
all of these questions. Which has a _lot_ to do with "the database engine
itself."
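For reference, the "easy part" the quoted post refers to can be sketched in a few lines of Python: a hash ring with virtual nodes. This is a generic textbook construction, not code from the post or any particular database; the MD5-based hash, 100 virtual nodes per server, and the node names are arbitrary assumptions. Note everything it does *not* do: it only decides which node *should* own a key, and says nothing about moving the data, detecting failures, replicating, or reconciling replicas, which are exactly the questions above.

```python
import bisect
import hashlib


def _hash(s: str) -> int:
    # First 8 bytes of MD5 as an integer; any well-mixed hash would do.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._hashes: list[int] = []  # sorted ring positions
        self._owners: list[str] = []  # node owning each position
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place `vnodes` points for this node around the ring.
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            idx = bisect.bisect_left(self._hashes, h)
            self._hashes.insert(idx, h)
            self._owners.insert(idx, node)

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[idx]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-e")
moved = [k for k in keys if ring.lookup(k) != before[k]]

# Naive modulo placement for comparison: going from 4 to 5 nodes.
mod_moved = sum(1 for k in keys if _hash(k) % 4 != _hash(k) % 5)

print(f"ring: {len(moved) / len(keys):.0%} of keys moved")
print(f"modulo: {mod_moved / len(keys):.0%} of keys moved")
```

Only keys that fall on node-e's new arcs change owner (roughly a fifth with these parameters), while modulo placement reshuffles most keys. Actually copying those keys to node-e while still serving traffic is where the engine work begins.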

~~~
FooBarWidget
Replication is for redundancy, not horizontal scalability. Unless you mean
scaling reads, but replication won't scale your writes. For scaling writes you
need sharding but not replication.
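A toy Python model of the distinction: with full replication every node applies every write, while with sharding each write lands on exactly one node. The `writes_per_node` helper, the node counts, and the assumption that keys spread evenly (approximated here by `key % n_nodes`) are all illustrative; the model also ignores systems that combine the two by replicating each shard to a few nodes.

```python
def writes_per_node(n_writes: int, n_nodes: int, mode: str) -> list[int]:
    """Count how many writes each node must apply under a given scheme."""
    counts = [0] * n_nodes
    for key in range(n_writes):
        if mode == "replicate":
            # Full replication: every node applies every write.
            for node in range(n_nodes):
                counts[node] += 1
        elif mode == "shard":
            # Sharding: the key alone picks one owning node.
            counts[key % n_nodes] += 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return counts


print(writes_per_node(900, 3, "replicate"))
print(writes_per_node(900, 3, "shard"))
```

Adding a fourth node leaves the per-node write load unchanged under full replication, but cuts it by a quarter under sharding.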

------
m0th87
"In the computer science terminology, an O(N) algorithm is considered “naive”,
and in the computer security terminology, it even has a name - “brute force”"

WTF?

~~~
barrkel
I was trying to figure out if he was suggesting that adding machines ought to
deliver a superlinear performance boost, which would be quite a trick.

(Yes, there are sources of slight superlinear speedup in parallel scenarios,
but nothing capable of beating n^2, never mind exponential growth.)
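barrkel's point can be made precise with Amdahl's law, which is brought in here and is not mentioned in the thread: if a fraction p of the work parallelizes perfectly across n machines, the speedup is 1 / ((1 - p) + p / n), which never exceeds n. The law ignores effects such as growing aggregate cache that produce the slight superlinear speedups mentioned above. The function name and sample values are illustrative.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n machines when a fraction p of the work is parallelizable."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


for p in (0.5, 0.95, 1.0):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (1, 10, 100, 1000)])
```

Even at p = 0.95, no number of machines gets past 20x, since the serial 5% alone caps the speedup at 1 / 0.05.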

------
tmountain
Redis works great as long as the dataset fits in RAM. After that, the
background saving process kicks in, and performance becomes an issue. This
caused my company to move away from Redis to Mongo. It's foolish to assume
that just because a product goes beyond storing key/value pairs, it's
over-engineered. Judging by the part of the article claiming that key
namespaces aren't inherent in NoSQL solutions, it seems no research was done
beyond what Redis can do. Check out Mongo's collections. That's exactly what
they are.
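To illustrate the namespace point: in a flat key-value store, namespaces are usually emulated by prefixing keys (the `user:1000` colon style is a common Redis convention), whereas Mongo's collections provide them natively. Below is a minimal Python sketch of the prefix approach over a plain dict; the `Namespace` class is hypothetical and not part of any client library.

```python
class Namespace:
    """View of a flat key-value store restricted to keys under one prefix."""

    def __init__(self, store: dict, prefix: str, sep: str = ":"):
        self._store = store
        self._prefix = prefix + sep

    def set(self, key: str, value) -> None:
        self._store[self._prefix + key] = value

    def get(self, key: str, default=None):
        return self._store.get(self._prefix + key, default)

    def keys(self) -> list[str]:
        # Linear scan of the whole keyspace; a real store needs its own
        # scanning or indexing support to make this efficient.
        n = len(self._prefix)
        return sorted(k[n:] for k in self._store if k.startswith(self._prefix))


flat: dict = {}
users = Namespace(flat, "user")
sessions = Namespace(flat, "session")
users.set("1000", {"name": "alice"})
sessions.set("abc", {"user": "1000"})
print(sorted(flat))  # the underlying keyspace is still one flat namespace
print(users.keys())
```

The prefix is only a convention the application must follow everywhere, which is the gap a built-in collection closes.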

~~~
antirez
> Redis works great as long as the dataset fits in RAM

This is by design. Companies moving away from Redis because of this did not
understand the deal at the beginning and were looking for something else, so
moving away was a good idea.

Redis is mostly an in-memory database that happens to be disk-backed. With VM
it is a different issue, and there are interesting VM uses, but the vast
majority of Redis users run the DB without VM, as it is: an in-memory store
where the disk dump is used to reload the data on startup.
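The model described here (data lives in memory, and a dump on disk is only used to reload it on startup) can be sketched as a toy Python store. This is not how Redis implements persistence; the JSON format, the `SnapshotStore` class, and the explicit `save()` call are assumptions for illustration. Writing to a temporary file and renaming it over the old dump is a common way to avoid leaving a half-written snapshot after a crash.

```python
import json
import os
import tempfile


class SnapshotStore:
    """In-memory dict whose only persistence is an explicit point-in-time dump."""

    def __init__(self, path: str):
        self.path = path
        self.data: dict = {}
        if os.path.exists(path):
            # Startup: the dump is read once, then everything is served from RAM.
            with open(path) as f:
                self.data = json.load(f)

    def set(self, key: str, value) -> None:
        self.data[key] = value

    def get(self, key: str, default=None):
        return self.data.get(key, default)

    def save(self) -> None:
        # Dump to a temp file in the same directory, then atomically replace.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dump.json")
        store = SnapshotStore(path)
        store.set("visits", 1)
        store.save()
        store.set("visits", 2)  # lives only in memory until the next save()
        print(SnapshotStore(path).get("visits"))  # a "restart" sees the last dump
```

A restart after `set("visits", 2)` but before the next `save()` comes back with the older value; that window is the price of keeping every read and write purely in memory.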

Because this is the Redis way, even development is focused in this
direction: using less memory for common data types, and scalable clustering
to make it simple to use multiple instances.

I see this as a very simple thing to grasp. But since this is the topic of
the discussion, I fail to see why Mongo shouldn't simply be considered an
SQL-family DB. It seems more or less a subset of SQL, implemented with
different tradeoffs. They certainly have good reasons to avoid SQL, but what I
mean is that _semantically_ it looks like a relational database, while Redis
has a completely different data model, so I can't see how the two systems are
really drop-in replacements for each other, or even comparable solutions.

So I can see how MongoDB can be an alternative to MySQL when used to store a
lot (much larger than memory) of row-like data (call it documents or whatever
you want).

And I can see how Redis can be used when your database fits in memory and you
need very high performance, and in general for anything that needs atomic data
structures and complex server-side operations on those data structures. For
instance storing or caching timelines, keeping leaderboards for a game via
sorted sets, and a zillion other use cases that are running in production right
now.
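As an illustration of the leaderboard case, here is a toy Python version of the sorted-set behavior involved: each member has one score, re-adding a member updates it, and the top N can be read in score order. It is a hypothetical sketch of the semantics, not Redis's implementation (for large sets Redis uses a skip list plus a hash table); in Redis itself this is roughly `ZADD` to record scores and `ZREVRANGE ... WITHSCORES` to read the top.

```python
import bisect


class Leaderboard:
    """Member -> score map that keeps members ordered by score."""

    def __init__(self):
        self._scores: dict[str, float] = {}
        self._ordered: list[tuple[float, str]] = []  # ascending (score, member)

    def add(self, member: str, score: float) -> None:
        old = self._scores.get(member)
        if old is not None:
            # Remove the member's previous entry before re-inserting it.
            idx = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[idx]
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def top(self, n: int) -> list[tuple[str, float]]:
        """Highest n scores first."""
        if n <= 0:
            return []
        return [(m, s) for s, m in reversed(self._ordered[-n:])]


board = Leaderboard()
for member, score in [("alice", 10), ("bob", 30), ("carol", 20), ("alice", 40)]:
    board.add(member, score)
print(board.top(2))
```

The point of doing this server-side in Redis is that the update and the ordered read are each a single atomic operation, rather than read-modify-write logic spread across clients.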

What I can't see is how MongoDB can replace Redis or the other way around,
except in a very small subset of cases.

------
JulianMorrison
If the guy needs access counters without an ever-growing disk file, he
dismissed MongoDB too fast. The ability to repeatedly alter fixed size data
without growing the storage is something MongoDB has, and no other database
engine that I know of. (Their design has its own serious disadvantages, but
still, if that's the behaviour you need...)
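The property described above, rewriting fixed-size data in place so storage doesn't grow, can be shown with a plain file of fixed-width counter slots in Python. This is a generic illustration, not MongoDB's storage engine or API (in MongoDB the update itself would typically be written with the `$inc` operator); the file layout and the function names are hypothetical.

```python
import os
import struct
import tempfile

COUNTER = struct.Struct("<Q")  # one little-endian unsigned 64-bit counter per slot


def init_counters(path: str, n_slots: int) -> None:
    # Preallocate every slot once; the file never grows after this.
    with open(path, "wb") as f:
        f.write(COUNTER.pack(0) * n_slots)


def read_counter(path: str, slot: int) -> int:
    with open(path, "rb") as f:
        f.seek(slot * COUNTER.size)
        (value,) = COUNTER.unpack(f.read(COUNTER.size))
    return value


def increment(path: str, slot: int) -> int:
    # Read-modify-write the slot in place (not safe for concurrent writers).
    with open(path, "r+b") as f:
        f.seek(slot * COUNTER.size)
        (value,) = COUNTER.unpack(f.read(COUNTER.size))
        f.seek(slot * COUNTER.size)
        f.write(COUNTER.pack(value + 1))
    return value + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "counters.bin")
        init_counters(path, 4)
        for _ in range(1000):
            increment(path, 2)
        print(read_counter(path, 2), os.path.getsize(path))
```

After a thousand increments the file is still exactly four slots long; an append-only design would instead have grown by one record per update until compacted.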

~~~
bobfunk
Seems ironic that he dismisses MongoDB for being "over-featured," and then
starts listing problems with Redis that these extra features solve.

------
Loic
I posted my answer here as it is long:
[http://www.ceondo.com/ecte/2010/10/challenge-web-scale-integ...](http://www.ceondo.com/ecte/2010/10/challenge-web-scale-integration-architecture-1-10)

To save you a click: the real problem with web scale is integration. This
is the hard part, this is where things start to break, and this is where one
can have fun too.

------
souperhearo
Wonderful post. Probably the best I have seen on comparing a traditional
relational database and a nosql database. Also very well written.

