
The Redis criticism thread - HeyChinaski
http://antirez.com/news/67
======
rdtsc
To summarize quickly for those that didn't read the article: this focuses
mostly on the new "clustering" aspect, not on the "classic" single-server
Redis.

I think it is important to be honest with the users and make it clear what
happens behind the scenes and how data could be lost. Salvatore has done
most of this; maybe it just needs to be a bit more explicit, as there still
seems to be some confusion around it.

All this is in light of 2 things -- 1) With the popularity and amount of talk
and churn around distributed systems these days, people sort of expect a point
on the map in the CAP triangle. Just saying "we kind of do this, and we
provide some C, a little A and a dash of HA" was probably OK 5 years ago; now
it needs a bit more definition. 2) In light of other database systems
misleading users about what they could provide (you know which one I am
talking about) and the resulting lost data, there is a bit of apprehension and
a higher bar that needs to be met in order for a db product to be accepted.

One good thing that came out recently is NoSQL database writers/vendors
pushing for more rigorous tests. Tests that run for weeks and months.
Consistency tests and network partition tests like the ones Aphyr runs. It is
a very good thing that these properties are being discussed and defined
better.

~~~
fizx
I think Antirez is simply doing a master/slave system with async replication.
This is what mysql, solr, elasticsearch, kafka, etc do. This is not an unusual
model at all.

In CAP theorem terms, Redis has picked zero (remember CAP theorem is pick _at
most_ two).

There's a bunch of people who've made this choice, but why? C incurs a
synchronization cost. A means that you have to reconcile different write
streams. If you want consistent semantics in a very fast database, you
can't pick either. So you end up somewhere in the middle of the triangle.

The consequence of picking zero is that you'll lose a time window of data
roughly proportional to the replication lag when the master fails/partitions.
There are many applications for which bounded data loss is a perfectly
reasonable paradigm.
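
To make the "time window proportional to replication lag" point concrete, here is a toy Python model (not real Redis code; the classes and lag behavior are illustrative assumptions) of async master/replica replication, where the master acknowledges writes before they reach the replica:

```python
# Toy model of async replication: the master acknowledges writes
# immediately, while the replica trails behind by a fixed number of
# writes. On master failure, writes inside that lag window are lost.

class Master:
    def __init__(self):
        self.log = []           # writes acknowledged to clients

    def write(self, value):
        self.log.append(value)  # ack happens before replication

class Replica:
    def __init__(self, master, lag):
        self.master = master
        self.lag = lag          # how many writes we trail behind by

    def replicated(self):
        # everything except the trailing `lag` writes has arrived
        return self.master.log[:len(self.master.log) - self.lag]

master = Master()
replica = Replica(master, lag=3)
for i in range(10):
    master.write(i)

# Master dies; the replica is promoted. The last `lag` acknowledged
# writes never made it over and are gone.
lost = master.log[len(replica.replicated()):]
print(lost)  # [7, 8, 9]
```

The bounded-loss argument is that `lag` stays small in a healthy deployment, so the lost window is small too; aphyr's point below is about what happens when that assumption fails.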

~~~
aphyr
I assert that this design actually allows for arbitrarily long windows of data
loss, but I haven't verified that the implementation matches my understanding
of antirez's WAIT/failover algorithm yet. Pretty sure though.

~~~
antirez
If you read the Redis Cluster documentation, the data loss is acknowledged,
and the most obvious failure modes in which it happens are explained. WAIT
does not provide strong consistency either, and is actually not even
documented in the Cluster doc. However, WAIT as it is makes data loss less
likely.

Given that everything in the Redis Cluster design and documentation is about
trading consistency for performance, I would expect the system to be analyzed
for what it is: is it good at providing weak but practically usable
consistency and performance? In short, does it respect the design intent?

Saying again and again that it does not feature consistency is a sterile
exercise.

------
StavrosK
This is the first time I've ever heard any criticism of Redis. As far as I
know, everyone loves it; it's a great tool for many jobs and it's amazingly
written and solid to boot. I think many of the critics are trying to apply it
in ways it wasn't meant to be used.

As far as I know, its main purpose is non-critical data that needs to be
accessed as quickly as possible in various different ways, and that's where
redis shines.

~~~
tptacek
I don't pay super close attention to this stuff, but it's very obvious to me
that not everyone likes Redis; in particular, the kinds of people who have
problems with MongoDB tend also to have the same problems with Redis.
Long-term operational reliability is a big question mark with it.

For my part: on only a few occasions (mostly fuzzer farm stuff) have I ever
used Redis and been happy with the decision. I usually regret using it. I
_always_ regret using it as an alternative to SQL.

~~~
StavrosK
I don't know, I don't think it should be compared (or considered as an
alternative to) MongoDB, and certainly never SQL. I use it to store cached
keys, session data and similar other data that I can always regenerate. I've
also had great success using it as a message queue on multiple projects.

What was your use case, and what was your experience with it?

~~~
meritt
Agreed. If Redis is an acceptable alternative to MongoDB you're using one of
the two very incorrectly.

~~~
lmm
Use case: I want to store some json values against some string keys. Which
would be very incorrect to use for this?

~~~
jjirsa
How often are you updating the json values? 10/s? 100/s? 1M/s?

How many string keys? Millions? Billions? What happens if you lose an update?

Redis is great for lots of rapid reads and moderate write speed, for data
that fits on a single server or can be manually sharded well (if it's
straight k=>v, as you describe, that basically means your json fits in RAM;
if you're using larger objects like sets/zsets/etc, it becomes a slightly
different discussion), as long as your application can lose a few seconds of
writes without it killing you (BGSAVE isn't instantaneous, of course).
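
For reference, the persistence trade-off the parent describes is tuned with a handful of `redis.conf` directives. A sketch of one common middle-ground setup (not a recommendation; the right values depend on your loss tolerance):

```
# RDB snapshots: BGSAVE if >=1 key changed in 900s, >=10 in 300s, etc.
save 900 1
save 300 10
save 60 10000

# Append-only file narrows the loss window from "since the last BGSAVE"
# to roughly one second.
appendonly yes
appendfsync everysec   # alternatives: always (slower), no (faster, riskier)
```

With `appendfsync always` you trade most of Redis's write throughput for per-write durability on a single node; `everysec` is the usual compromise.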

------
pashields
Just to be clear, the criticism is of Redis Cluster, which has been redesigned
because the first design was so lax it was effectively unusable. The real
issue is that, as of right now, I have no idea what the exact semantics and
failure conditions of Redis Cluster are. I'm not clear that anyone really
knows. Salvatore _thinks_ he knows, but we can't be sure that he does.

With all this talk of practicality, what really makes distributed systems
practical is when someone can do a formal analysis of them and conclude what
possible states can occur in the system. This is not work that database users
should do, it is work that database implementors should do. The failure to use
a known consensus system is a failure to deliver a database I can understand.

I find this all a bit disappointing since I've been a huge fan of Redis since
the early days. It's an amazing tool that I still have in production, but I
get the feeling that its utility will never expand to suit some of my larger
needs. Bummer.

~~~
pfraze
He is using known algorithms. He describes them here:
[https://news.ycombinator.com/item?id=6780342](https://news.ycombinator.com/item?id=6780342).

~~~
pashields
see Aphyr's comments here: [https://groups.google.com/d/msg/redis-
db/Oazt2k7Lzz4/q9YDD7D...](https://groups.google.com/d/msg/redis-
db/Oazt2k7Lzz4/q9YDD7Da9ZkJ)

~~~
pfraze
Thanks, I hadn't seen this

------
dorfsmay
> Redis is probably the only top-used database system developed mostly by a
> single individual currently

Isn't SQLite pretty much just Richard Hipp?

~~~
rogerbinns
Dan Kennedy does a lot of work, certainly appearing to be a similar order of
magnitude to DRH. Joe Mistachkin appears to do Windows almost exclusively and
I can't tell if he is part time or full time.

[https://www.sqlite.org/crew.html](https://www.sqlite.org/crew.html)

~~~
dorfsmay
I had not realised that. Thanks.

------
HeyChinaski
Salvatore is admirably proactive at garnering criticism for his project. He's
also incredibly gracious when accepting it.

------
bsg75
There has been a lot of constructive discussion in the Google Groups thread,
and a lot of negativity on the same topic on Twitter. Ignoring for the moment
the difference in discussion platforms, I am uncertain why the sudden uptick
in Redis negativity.

Is it from users trying (and failing) to replace RDBMS or distributed
platforms feature-for-feature with a single threaded, memory limited store
like Redis? Or could it be FUD from people interested in seeing their new
platform-du-jour get more attention?

~~~
bri3d
The "clustering" feature is just now becoming stable. Distributed systems are
complex (in no small part because they're generally poorly explained) so some
mistakes were guaranteed. Hence the uptick in critics makes perfect sense even
sans self-promotion conspiracies.

------
elwell
I think many times people criticize new-ish web technologies because they
don't want to spend time learning them (which can be okay). They hope to sway
others away so that a critical mass of users never compels them to adopt the,
now necessary, tech (not that Redis is 'necessary').

------
strlen
Quick things:

1) I expected that thread to look like a Cultural Revolution "struggle
session". Thankfully it wasn't.

2) As I am sure many others have already said, durability has very little
to do with CAP. CAP is about the A and I in ACID; D is an orthogonal
concern.

3) Durability doesn't necessarily mean losing high performance. Most databases
let the user choose how much data they're willing to lose and for what latency
decreases -- the standard approach (used even in main-memory databases like
RAMCloud[1] and recent versions of VoltDB[2]) is to keep a separate write-
ahead log (WAL) and let the end-user choose how frequently to fsync() it to
disk as well as how frequently to flush a snapshot of main memory data
structures to disk.
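
The WAL-plus-tunable-fsync approach described above can be sketched in a few lines of Python (a deliberately minimal model, not any real database's log format):

```python
import os
import tempfile

class WriteAheadLog:
    """Minimal WAL sketch: append records, fsync every N writes.

    fsync_every=1 approximates "fsync on every write" (most durable,
    slowest); larger values trade a bounded loss window for throughput.
    """
    def __init__(self, path, fsync_every=1):
        self.f = open(path, "ab")
        self.fsync_every = fsync_every
        self.pending = 0    # writes since the last fsync

    def append(self, record: bytes):
        self.f.write(record + b"\n")
        self.pending += 1
        if self.pending >= self.fsync_every:
            self.f.flush()
            os.fsync(self.f.fileno())  # data is on disk past this point
            self.pending = 0

path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = WriteAheadLog(path, fsync_every=2)
for rec in [b"set a 1", b"set b 2", b"set c 3"]:
    wal.append(rec)
# If the process crashes here, "set c 3" may be lost: the fsync
# threshold (2 pending writes) was not reached again after it.
```

Recovery then replays the log from the last durable snapshot; the choice of `fsync_every` (or a time-based equivalent, like Redis's `appendfsync everysec`) is exactly the latency/durability knob being described.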

There are many papers (e.g.,
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.174....](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.174.6205))
that talk about various challenges of building WALs, but fundamentally users
who want the strongest possible single-machine durability can choose to
fsync() on every write (and usually use battery-backed RAID controllers or
separate log devices like SSDs with supercapacitors, or even NVRAM if the
writes are going to be larger than what fits into the RAID controller's
write-back cache).
Others can choose to live with possibility of losing some writes, but use
replication[3] to protect against power supply failure and crashes -- idea
being that machines in a datacenter are connected to a UPS, replicas don't all
live on the same rack (to protect against -- usually rack local -- UPS
failure), and there's cross-data center replication (usually asynchronous with
possibility of some conflicts -- notable exception being Google's Spanner/F1)
to protect (amongst many other things...) against someone leaning against the
big red-button labeled "Emergency Power Off" (which is exactly what you think
it is).

Flushes of main data do also hurt bandwidth with spinning disks and old or
cheap SSDs, but there's a solution: use a good, but commodity, MLC SSD with
synchronous toggle NAND flash and a good controller/firmware (Sandforce
SF2000 or later series, Intel/Samsung/Indilinx's recent controllers) -- these
work on the same principle as DDR memory (latching onto both edges of a
signal) to provide sufficient bandwidth to handle both random reads (traffic
you're serving) and sequential writes (the flush).

4) I know of several tech companies and/or engineering departments therein
that absolutely love and swear by redis. There are very good reasons for it:
the code is extremely clean and simple[4], and it handles a use case that
neither conventional databases nor pure kv-stores or caches handle well.

That use case is, roughly, data structures on an outsourced heap for
maintaining a materialized view (such as a user's newsfeed, adjacency lists of
graphs stored efficiently using compressed bitmaps, counts, etc...) on top of
a database. So my advice to antirez is to focus the effort around making this
use case simpler rather than building redis out into a database: build
primitives to let developers piggyback durability and replication onto a
database or a message queue. In fact, I've known of multiple startups that
have (in an ad-hoc way) implemented pretty much exactly that.

This is still a tough problem, but one which (I think) would yield a lot more
value to redis users. Just thinking out loud, one approach could be a way to
associate each write to redis with an external transaction id (HBase LSN,
MySQL gtid, or perhaps an offset in a message queue like Kafka). When redis
flushes its data structures to disk, it stores the last flushed transaction id
to persistent storage.
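
A minimal sketch of that idea (all names here are illustrative, not a real Redis API): each write carries an id from an external log, and a snapshot records the highest id it covers, so after a crash the caller knows exactly where to resume replay.

```python
import json
import os
import tempfile

class TaggedStore:
    """Toy store whose snapshots record the last external txid applied."""
    def __init__(self, snapshot_path):
        self.data = {}
        self.last_txid = 0
        self.snapshot_path = snapshot_path

    def set(self, key, value, txid):
        self.data[key] = value
        self.last_txid = txid   # e.g. a Kafka offset or MySQL GTID

    def snapshot(self):
        # persist the data together with the txid it is consistent up to
        with open(self.snapshot_path, "w") as f:
            json.dump({"txid": self.last_txid, "data": self.data}, f)

    @staticmethod
    def recover(snapshot_path):
        with open(snapshot_path) as f:
            snap = json.load(f)
        # the caller replays the external log from snap["txid"] + 1 onward
        return snap["txid"], snap["data"]

path = os.path.join(tempfile.mkdtemp(), "snap.json")
store = TaggedStore(path)
store.set("a", 1, txid=101)
store.set("b", 2, txid=102)
store.snapshot()
store.set("c", 3, txid=103)   # crash before the next snapshot
txid, data = TaggedStore.recover(path)
print(txid)  # 102 -> replay the external log starting at 103
```

The point is that durability is delegated to the external log; the store only has to remember one number per snapshot.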

I would also implement fencing within redis: when in a "fenced" mode, redis
won't accept any requests on the normal port, but can accept writes through a
bulk batch-update interface that users can program against. This could be
made more fine-grained by having both a read fence and a write fence, etc...
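
A hypothetical sketch of that fencing idea (the class and method names are invented for illustration): a write-fenced store rejects normal client traffic but still accepts the bulk/batch path, so recovery code can replay missed writes before the node serves clients again.

```python
class FencedStoreError(Exception):
    pass

class Store:
    def __init__(self):
        self.data = {}
        self.write_fenced = False

    def set(self, key, value):        # normal client write path
        if self.write_fenced:
            raise FencedStoreError("write-fenced: use the bulk interface")
        self.data[key] = value

    def bulk_apply(self, items):      # recovery/replay path
        self.data.update(items)       # allowed even while fenced

store = Store()
store.write_fenced = True
store.bulk_apply({"a": 1, "b": 2})   # replaying missed writes works
try:
    store.set("c", 3)                # a normal write is rejected
except FencedStoreError:
    pass
store.write_fenced = False           # unfence: back to normal service
store.set("c", 3)
```

A separate read fence would gate reads the same way, letting an operator choose whether a recovering node may serve possibly-stale reads.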

This makes it easier for users to tackle replication and durability
themselves:

For recovery/durability, users can configure redis such that after a crash it
is automatically fenced and "calls back" with that last flushed id into users'
own code -- by either invoking a plugin, making a REST or RPC call to a
specified endpoint, or simply using fork() and executing a user-configured
script which would use the bulk API.

For replication, users could use a high-performance durable message queue
(something I'd imagine some users already do) -- a (write-fenced) standby
redis node can then become a "leader" (unfence itself) once it's caught up to
the latest "transaction id" (the last consumed offset in the message queue,
as maintained by the message queue itself -- in the case of Kafka this is
stored in ZooKeeper). More advanced users can tie this in with database
replication by either tailing the database's WAL (with a way to transform WAL
edits into requests to redis) or using a plugin storage engine for the
database.
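
The catch-up-then-unfence protocol could be sketched as follows (a toy model; the in-memory list stands in for a durable queue like Kafka, and all names are illustrative):

```python
# A write-fenced standby consumes a durable message queue and only
# promotes itself (unfences) once its consumed offset reaches the
# latest offset, as tracked by the queue itself.

queue = [("set", "a", 1), ("set", "b", 2), ("set", "c", 3)]
latest_offset = len(queue)

class Standby:
    def __init__(self):
        self.data = {}
        self.offset = 0      # last consumed queue offset
        self.fenced = True   # refuse client traffic until caught up

    def consume(self):
        while self.offset < latest_offset:
            op, key, value = queue[self.offset]
            if op == "set":
                self.data[key] = value
            self.offset += 1
        # caught up to the queue's latest offset: safe to lead
        self.fenced = False

standby = Standby()
standby.consume()
print(standby.fenced, standby.data)
```

Because the queue, not the standby, is the source of truth for the latest offset, a promoted node can prove it has seen every acknowledged write before accepting new ones.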

Fundamentally, where I see redis used successfully is in use cases where
(prior to redis) users would use custom C/C++ code. This cycles back to the
"outsourced heap data structures" idea -- redis lets you use a high-level
language to do fast data manipulation without worrying about the performance
of the code (especially if using a language like Ruby or Python) or garbage
collection on large heaps (a problem with even the most advanced concurrent
GCs like Java's).

There have been previous attempts to build these outsourced heaps as
end-to-end distributed systems that handle persistence, replication,
scale-out and transactions. These are generally called "in-memory data
grids" -- some simply provide custom implementations of common data
structures, others act almost completely transparently and require no
modifications to the code (e.g., some by using JVMTI). Terracotta is a
well-known one with a fairly good reputation (friends who contract for
financial institutions and live in hell^H^H^H^H the world of app servers and
WAR files swear by it), but JINI and JavaSpaces were some of the first (they
came too early, way before the market was ready) and are rightly still
covered by most distributed systems textbooks. However, their successful use
usually requires Infiniband or 10GbE (or Myrinet back in the dotcom days) --
reliable low-latency message delivery is needed as (with no API to speak of)
there's no easy way for users to recover from network failures or handle
non-atomic operations.

To sum it up, I'd suggest examining and focusing on the use cases where redis
is already _loved_ by its users; don't try to build a magical end-to-end
system, as it won't preserve the former; and make it easy (and to an extent
redis already does this) for users to build custom distributed systems with
redis as a well-behaved component (again, they're already doing this).

[1]
[https://ramcloud.stanford.edu/wiki/display/ramcloud/Recovery](https://ramcloud.stanford.edu/wiki/display/ramcloud/Recovery)

[2] [http://voltdb.com/intro-to-voltdb-command-
logging/](http://voltdb.com/intro-to-voltdb-command-logging/)

[3] Whether it's synchronous or not is about the atomicity guarantees and not
durability -- the failure mode of acknowledging a write and then 'forgetting'
can happen in these systems even if they fsync every write.

[4] It reminds me of NetBSD source code: I can open up a method and it's very
obvious what it does and how.

~~~
antirez
Hello strlen, thanks for the interesting reply.

While CAP and durability are orthogonal, they are very related in actual
systems. I don't think it is ok for a system like Redis to assume that users
have multi-DC replication and/or other infrastructure preventing mass reboots
of small clusters composed of a few nodes. Also note that the more the nodes
in a distributed system are "decoupled" from the point of view of failures
(different physical networks / equipment, different datacenters), the more
latency you are likely adding.

But the point in the discussion was _never_ that, since synchronous
replication by default is already, exactly as you express in your message,
not the Redis business; so fsync or not, Redis Cluster is not going to
feature "C". WAIT and its semantics ended up taking all the attention
because, you know, if your work is to show "C" is violated, you tend to focus
there, regardless of whether the system analyzed even claims to be
consistent.

As for Redis Cluster: many places where Redis is used in the right way, in
the environments where you say people are happy with Redis, would benefit
from the automatic sharding and the operational simplicity that Redis Cluster
can provide. This is why Redis Cluster is IMHO a good milestone in the
roadmap.

------
outside1234
I had no idea that there was ONLY antirez working on Redis. That is some crazy
inspirational stuff.

------
midysky
Mm Nn

