I think it is important to be honest with users and make it clear what happens behind the scenes and how data could be lost. Salvatore has done most of this; maybe it just needs to be a bit more explicit, as there still seems to be some confusion around it.
All this is in light of two things. 1) With the popularity of, and churn around, distributed systems these days, people expect a point on the map in the CAP triangle. Saying "we kind of do this, and we provide some C, a little A, and a dash of HA" was probably okay five years ago; now it needs a bit more definition. 2) In light of other database systems misleading users about what they can provide (you know which one I am talking about), with lost data as a result, there is some apprehension and a higher bar that needs to be met for a db product to be accepted.
One good thing that came out recently is NoSQL database writers/vendors pushing for more rigorous tests: tests that run for weeks and months, consistency tests, network partition tests like the ones Aphyr runs. It is a very good idea that those things are talked about and defined better.
In CAP theorem terms, Redis has picked zero (remember CAP theorem is pick at most two).
There's a bunch of people who've made this choice, but why? C incurs a synchronization cost. A means that you have to reconcile divergent write streams. If you want consistent semantics in a very fast database, you can't fully pick either. So you end up somewhere in the middle of the triangle.
The consequence of picking zero is that you'll lose a time window of data roughly proportional to the replication lag when the master fails/partitions. There are many applications for which bounded data loss is a perfectly reasonable paradigm.
Since everything in the Redis Cluster design and documentation is about trading consistency for performance, I would expect the system to be analyzed for what it is: is it good at providing weak but practically usable consistency and performance? In short, does it respect the design intent?
Saying again and again that it does not feature consistency is a sterile exercise.
That might be true, but describing something in the CAP framework is not the only salient thing you can say about a distributed storage system. It only characterizes the failure modes.
You also have to think about what happens in the normal case. In the normal case you have a consistency vs. latency tradeoff. People have written about this, but unfortunately I don't think this broader way of thinking has gotten the attention it deserves:
"If you examine PNUTS through the lens of CAP, it would seem that the designers have no idea what they are doing (I assure you this is not the case). Rather than giving up just one of consistency or availability, the system gives up both!"
"The reason is that CAP is missing a very important letter: L. PNUTS gives up consistency not for the goal of improving availability. Instead, it is to lower latency. "
Even you are using "I think antirez is simply doing..."
I'm not arguing that the design is not useful or that it is a bad design; Salvatore's code is great. I think it is just a matter of more testing, docs, or a blog post.
No, he picked P, Partition Tolerance. You can't not pick P. Unless you assume a never-failing network with never-failing nodes.
As far as I know, its main purpose is non-critical data that needs to be accessed as quickly as possible in various different ways, and that's where redis shines.
For my part: on only a few occasions (mostly fuzzer farm stuff) have I ever used Redis and been happy with the decision. I usually regret using it. I always regret using it as an alternative to SQL.
Mongo has taken a lot of knocks in part because its write ack behavior was (initially) documented in a way that not everyone felt was sufficient, but also because it is marketed as a universal solution for RDBMS woes.
Redis has never been touted as a replacement for a primary data store, but more of a "toolbox" in some cases, or "middleware" in others. To apply the same criticisms to Redis and Mongo implies those users did not research their platform decisions at all.
What was your use case, and what was your experience with it?
How many string keys? Millions? Billions? What happens if you lose an update?
Redis is great for lots of rapid reads and moderate write speed, for data that fits on a single server or can be manually sharded well (if it's straight k=>v, as you describe, that basically means your json fits in RAM; if you're using larger objects like sets/zsets/etc., it becomes a slightly different discussion), as long as losing a few seconds of data won't kill your application (BGSAVE isn't instantaneous, of course).
I want to use it in a small project so I can learn the design patterns associated with key-value stores. I'm sure there are good and bad things about using Redis vs Postgres.
But it has to hold my data, so... does it hold my data or not? I'm talking about single box stuff here, of course.
(Of course, I'll just use it and find out myself).
Is all data in a database that important? Maybe alternative databases would generate new webapp patterns, like dynamic languages do.
Mongodb seems like a solution optimized for a certain performance sweet spot that, post-SSD and huge RAM, hardly anybody ever really encounters, and 99% of the time it's used you would be better off with Postgres.
Redis, on the other hand, hits a (probably permanent) sweet spot of ultra-high performance one level of abstraction lower than your usual database, which is a use case everyone trying to saturate metal and squeeze out the last bit of performance on a minimal budget has.
As an alternative to an in-memory cache or (occasionally) a DB de-normalization, on the other hand, it works amazingly well for me.
I see the persistence as basically just a backup that helps your system get back up to speed quickly after a crash or system restart. It's not something that should ever be treated as reliable.
I wonder how many people who use it under this or similar use cases have an issue with it.
I've used it for caching, session and transient objects connected to Rails without issue for the last two years.
Additionally, I've been looking at it recently as a simple database to perform fast matches on sorted data (e.g., get me the lowest number in this set).
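That "lowest number in this set" pattern maps naturally onto a Redis sorted set (ZADD to insert, ZRANGE to read by rank). As a rough sketch of the semantics, here is a tiny in-process analogue; the class and method names are illustrative, not the real Redis API:

```python
import bisect

# A minimal stand-in for a Redis sorted set, just enough to show the
# "fast match on sorted data" pattern: members are kept ordered by score,
# so the lowest-scored member is always at rank 0.
class TinySortedSet:
    def __init__(self):
        self._entries = []                 # sorted list of (score, member)

    def zadd(self, score, member):
        bisect.insort(self._entries, (score, member))

    def zrange(self, start, stop):
        # inclusive range by rank, like ZRANGE key start stop
        return [m for _, m in self._entries[start:stop + 1]]

prices = TinySortedSet()
prices.zadd(19.99, "offer:b")
prices.zadd(9.99, "offer:a")
prices.zadd(14.50, "offer:c")
print(prices.zrange(0, 0))                 # lowest-scored member
```

In real Redis the insert and the rank lookup are both O(log n) thanks to the skiplist behind sorted sets, which is what makes this use case fast.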
Why would you need to downgrade if you didn't have issues? ;) ...kinda joking, kinda honest curiosity.
And one more:
- Storing diffs like those used for Google Docs (literally every change) for a real-time communication system.
The last one works by the client, on not getting a response, continuing to queue up its diffs and sending them once the server is back up. Manual conflict resolution is likely necessary after a drop in availability, but consistency is easy to figure out with monotonically increasing IDs.
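The queue-and-replay idea above can be sketched in a few lines. This is a hypothetical illustration, not the actual system: the client assigns monotonically increasing ids and buffers diffs while the server is unreachable; on reconnect it replays the buffer, and the server ignores any id it has already applied, so retries are safe:

```python
# Toy sketch of diff queueing with monotonic ids (illustrative names).
class DiffServer:
    def __init__(self):
        self.last_applied = 0
        self.doc = []

    def apply(self, diff_id, diff):
        if diff_id <= self.last_applied:   # duplicate from a retry: skip
            return
        self.doc.append(diff)
        self.last_applied = diff_id

class DiffClient:
    def __init__(self, server):
        self.server = server
        self.next_id = 1
        self.pending = []

    def edit(self, diff, online=True):
        self.pending.append((self.next_id, diff))
        self.next_id += 1
        if online:
            self.flush()

    def flush(self):
        for diff_id, diff in self.pending:
            self.server.apply(diff_id, diff)
        self.pending = []

srv = DiffServer()
cli = DiffClient(srv)
cli.edit("insert 'a'")
cli.edit("insert 'b'", online=False)       # server down: diff queues up
cli.edit("insert 'c'", online=False)
cli.flush()                                # server back: replay the queue
print(srv.doc)
```

The monotonic id is what makes replay idempotent; actual conflict resolution between concurrent writers would still need the manual step mentioned above.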
With all this talk of practicality, what really makes distributed systems practical is when someone can do a formal analysis of them and conclude what possible states can occur in the system. This is not work that database users should do, it is work that database implementors should do. The failure to use a known consensus system is a failure to deliver a database I can understand.
I find this all a bit disappointing since I've been a huge fan of Redis since the early days. It's an amazing tool that I still have in production, but I get the feeling that its utility will never expand to suit some of my larger needs. Bummer.
isn't SQLite pretty much just Richard Hipp?
Is it from users trying (and failing) to replace RDBMS or distributed platforms feature-for-feature with a single threaded, memory limited store like Redis? Or could it be FUD from people interested in seeing their new platform-du-jour get more attention?
1) I expected that thread to look like a Cultural Revolution "struggle session". Thankfully it wasn't.
2) As I am sure many others have already said, durability has very little to do with CAP. CAP is about the A and I in ACID; D is an orthogonal concern.
3) Durability doesn't necessarily mean losing high performance. Most databases let the user choose how much data they're willing to lose in exchange for lower latency -- the standard approach (used even in main-memory databases like RAMCloud and recent versions of VoltDB) is to keep a separate write-ahead log (WAL) and let the end-user choose how frequently to fsync() it to disk, as well as how frequently to flush a snapshot of the main-memory data structures to disk.
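A minimal sketch of that WAL-with-configurable-fsync idea, assuming just two policies (the policy names mirror Redis's appendfsync options, but everything else here is illustrative):

```python
import os
import tempfile
import time

# Every write is appended to a log file; a configurable policy decides
# when fsync() is actually called, trading durability for latency.
class WriteAheadLog:
    def __init__(self, path, policy="everysec"):
        self.f = open(path, "ab")
        self.policy = policy               # "always" or "everysec"
        self.last_sync = time.monotonic()

    def append(self, record):
        self.f.write(record + b"\n")
        self.f.flush()                     # push to the OS page cache
        if self.policy == "always":
            os.fsync(self.f.fileno())      # durable before we return
        elif time.monotonic() - self.last_sync >= 1.0:
            os.fsync(self.f.fileno())      # at most ~1s of writes at risk
            self.last_sync = time.monotonic()

wal_path = os.path.join(tempfile.mkdtemp(), "demo.wal")
wal = WriteAheadLog(wal_path, policy="always")
wal.append(b"SET user:1 alice")
```

With `policy="always"` every acknowledged write has hit stable storage; with `"everysec"` a crash can lose roughly the last second of writes, which is exactly the bounded-loss tradeoff described above.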
There are many papers (e.g., http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.174....) that talk about the various challenges of building WALs, but fundamentally users who want the strongest possible single-machine durability can choose to fsync() on every write (and usually use battery-backed RAID controllers, or separate log devices like SSDs with supercapacitors, or even NVRAM if the writes are going to be larger than what fits in the RAID controller's write-back cache). Others can choose to live with the possibility of losing some writes, but use replication to protect against power supply failures and crashes -- the idea being that machines in a datacenter are connected to a UPS, replicas don't all live on the same rack (to protect against -- usually rack-local -- UPS failure), and there's cross-datacenter replication (usually asynchronous, with the possibility of some conflicts -- a notable exception being Google's Spanner/F1) to protect (amongst many other things...) against someone leaning on the big red button labeled "Emergency Power Off" (which is exactly what you think it is).
Flushes of the main data also hurt bandwidth with spinning disks and old or cheap SSDs, but there's a solution: use a good but commodity MLC SSD with synchronous toggle NAND flash and a good controller/firmware (Sandforce SF2000 or later series, or Intel/Samsung/Indilinx's recent controllers) -- these work on the same principle as DDR memory (latch onto both edges of a signal) to provide sufficient bandwidth to handle both random reads (the traffic you're serving) and sequential writes (the flush).
4) I know several tech companies, and/or engineering departments therein, who absolutely love and swear by redis. There are very good reasons for it: the code is extremely clean and simple, and it handles a use case that neither conventional databases nor pure kv-stores or caches handle well.
That use case is, roughly described, data structures on an outsourced heap for maintaining a materialized view (such as a user's newsfeed, adjacency lists of graphs stored efficiently using compressed bitmaps, counts, etc.) on top of a database. So my advice to antirez is to focus the effort on making this use case simpler rather than building redis out into a database: build primitives to let developers piggyback durability and replication on a database or a message queue. In fact, I've known of multiple startups that have (in an ad-hoc way) implemented pretty much exactly that.
This is still a tough problem, but one which (I think) would yield a lot more value to redis users. Just thinking out loud, one approach could be a way to associate each write to redis with an external transaction id (HBase LSN, MySQL gtid, or perhaps an offset in a message queue like Kafka). When redis flushes its data structures to disk, it stores the last flushed transaction id to persistent storage.
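The transaction-id idea above can be sketched very simply. This is an illustrative mock-up, not a real redis interface: every write carries an external id (a MySQL gtid, an HBase LSN, or a Kafka offset), and the flush persists the last applied id alongside the snapshot, so recovery knows exactly where to resume replaying the external log:

```python
import json
import os
import tempfile

# Hypothetical store that ties each snapshot to the last external
# transaction id it reflects (names are illustrative).
class SnapshotStore:
    def __init__(self, path):
        self.path = path
        self.data = {}
        self.last_txid = 0

    def write(self, txid, key, value):
        self.data[key] = value
        self.last_txid = txid              # e.g. a gtid or Kafka offset

    def flush(self):
        # persist the snapshot together with the id it is consistent with
        with open(self.path, "w") as f:
            json.dump({"last_txid": self.last_txid, "data": self.data}, f)

    def recover(self):
        with open(self.path) as f:
            snap = json.load(f)
        self.data = snap["data"]
        return snap["last_txid"]           # caller replays from txid + 1

snap_path = os.path.join(tempfile.mkdtemp(), "snap.json")
store = SnapshotStore(snap_path)
store.write(41, "k", "v1")
store.write(42, "k", "v2")
store.flush()

fresh = SnapshotStore(snap_path)           # simulate a restart
print(fresh.recover())                     # 42: resume replay after this id
```

Because the id is written atomically with the snapshot, there is never ambiguity about which external log entries are already reflected in the recovered state.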
I would also implement fencing within redis: when in a "fenced" mode redis won't accept any requests on the normal port, but can accept writes through a bulk batch-update interface that users can program against. This could be made more fine-grained by having both a read-fence and a write-fence, etc.
This makes it easier for users to tackle replication and durability themselves:
For recovery/durability, users can configure redis such that after a crash it is automatically fenced and "calls back" with that last flushed id into the users' own code -- by invoking a plugin, making a REST or RPC call to a specified endpoint, or simply using fork() and executing a user-configured script that uses the bulk API.
For replication, users could use a high-performance durable message queue (something I'd imagine some users already do) -- a (write-fenced) standby redis node can then become a "leader" (unfence itself) once it's caught up to the latest "transaction id" (the last consumed offset in the message queue, as maintained by the message queue itself -- in the case of Kafka this is stored in ZooKeeper). More advanced users can tie this to database replication by either tailing the database's WAL (with a way to transform WAL edits into requests to redis) or using a plugin storage engine for the database.
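Putting the fencing and catch-up pieces together, the promotion logic might look like this. Again a hypothetical sketch (fencing is not a real redis feature): the standby rejects normal writes while fenced, consumes the queue through the bulk interface, and unfences only once its applied offset matches the tip of the queue:

```python
# Illustrative model of a write-fenced standby catching up from a
# durable queue before promoting itself to leader.
class FencedStore:
    def __init__(self):
        self.data = {}
        self.fenced = True
        self.applied_offset = -1

    def bulk_apply(self, offset, key, value):
        # the bulk interface stays usable even while fenced
        self.data[key] = value
        self.applied_offset = offset

    def set(self, key, value):
        if self.fenced:
            raise RuntimeError("node is write-fenced")
        self.data[key] = value

queue = [("user:1", "alice"), ("user:2", "bob")]   # stand-in for Kafka
standby = FencedStore()
for offset, (k, v) in enumerate(queue):
    standby.bulk_apply(offset, k, v)

if standby.applied_offset == len(queue) - 1:       # caught up to the tip
    standby.fenced = False                         # promote to leader
standby.set("user:3", "carol")                     # normal writes now ok
print(standby.data)
```

The fence is what prevents the classic failure of serving (or accepting) traffic from a replica that is still behind the external source of truth.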
Fundamentally, where I see redis used successfully is in use cases where (prior to redis) users would write custom C/C++ code. This cycles back to the "outsourced heap data structures" idea -- redis lets you use a high-level language to do fast data manipulation without worrying about the performance of the code (especially if using a language like Ruby or Python) or garbage collection on large heaps (a problem with even the most advanced concurrent GCs, like Java's).
There have been previous attempts to build these outsourced heaps as end-to-end distributed systems that handle persistence, replication, scale-out, and transactions. These are generally called "in-memory data grids" -- some simply provide custom implementations of common data structures; others act almost completely transparently and require no modifications to the code (e.g., some by using jvmti). Terracotta is a well-known one with a fairly good reputation (friends who contract for financial institutions and live in the hell^H^H^H^H world of app servers and WAR files swear by it), but JINI and JavaSpaces were some of the first (they came too early, way before the market was ready) and are rightly still covered by most distributed systems textbooks. However, their successful use usually requires Infiniband or 10GbE (or Myrinet back in the dotcom days) -- reliable low-latency message delivery is needed because (with no API to speak of) there's no easy way for users to recover from network failures or handle non-atomic operations.
To sum it up, I'd suggest examining and focusing on the use cases where redis is already loved by its users; don't try to build a magical end-to-end system, as that won't preserve what people love; and make it easy (and to an extent redis already does this) for users to build custom distributed systems with redis as a well-behaved component (again, they're already doing this).
Whether it's synchronous or not is about the atomicity guarantees, not durability -- the failure mode of acknowledging a write and then 'forgetting' it can happen in these systems even if they fsync every write.
 It reminds me of NetBSD source code: I can open up a method and it's very obvious what it does and how.
While CAP and durability are orthogonal, they are very related in actual systems. I don't think it is ok for a system like Redis to assume that users have multi-DC replication and/or other infrastructure preventing mass reboots of small clusters composed of a few nodes. Also note that the more the nodes in a distributed system are "decoupled" from the point of view of failures (different physical networks/equipment, different datacenters), the more latency you are likely adding.
But the point in the discussion was never that, since synchronous replication by default is already, exactly as you express in your message, not the Redis business; so fsync or not, Redis Cluster is not going to feature "C". WAIT and its semantics ended up taking all the attention because, you know, if your work is to show "C" is violated, you tend to focus there, regardless of the system analyzed not claiming to be consistent.
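Why WAIT does not add up to "C" can be shown with a toy simulation (this is a model, not real Redis code): WAIT acknowledges a write once it has reached a given number of replicas, but a failover is free to promote a replica that was not among them, losing the acknowledged write:

```python
# Toy model of WAIT-style acknowledgement. Each replica is just a list
# of the writes it has received.
def wait_write(replicas, value, numreplicas):
    # suppose only the first `numreplicas` replicas receive the write
    # before WAIT returns (the rest are lagging)
    for r in replicas[:numreplicas]:
        r.append(value)
    return True                        # WAIT returns: client sees an ack

r1, r2, r3 = [], [], []
acked = wait_write([r1, r2, r3], "x=1", numreplicas=2)

promoted = r3                          # failover picks the lagging replica
print(acked, "x=1" in promoted)        # acked, yet the write is gone
```

This is exactly the window the thread is arguing about: WAIT bounds the loss probability, but without a consensus-driven leader election it cannot eliminate it.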
As for Redis Cluster: many places where Redis is used in the right way, in the environments where you say people are happy with Redis, would benefit from the automatic sharding and operational simplicity that Redis Cluster can bring to Redis. This is why Redis Cluster is IMHO a good milestone on the roadmap.