> Also, the confusion does not only come from overlapping wording, but from the need to dig into implementation details to understand the difference in a real-life concurrent application. In practice, this is one of the difficulties that has led many people (me included) to look at the ‘NoSQL’ world.
Consistency in database systems is a hard topic because maintaining consistency in any concurrent system is hard. Traditional systems provide you with a set of tools (isolation levels, foreign keys) that make a decent programmer capable of building a safe concurrent application.
Throwing away those tools and replacing them with nothing does not make life easier. The tool is easier to understand, but the problem is harder to solve.
This is not to say that NoSQL systems do not have a place - of course they do - but I feel like a lot of people adopting them talk about 'eventual consistency' while what they maintain is actually inconsistency.
Maintaining consistency in NoSQL systems when the application is nontrivial is really hard - and if the developer is not up to understanding the locking model of a traditional DB, I'd be pretty surprised if they were up to working with an eventually consistent system.
We have a lot of "ACID simply works" and "NoSQL is available". The blog is basically about saying things are not that simple, and that includes "isolation in ACID isn't that simple".
True SERIALIZABLE-level ACID does pretty much simply just work - and if you're using Postgres the performance hit really isn't too bad. Of course you're chucking away replication then, so whether it's suitable for your needs may vary rather!
Dynamo-based systems have 'tunable consistency' but that's almost always over one key: multi-key operations are usually inconsistent. That being the case, they're pretty much only 'easy to use' for applications with a very simple data model: my experience is that most applications of any real complexity will at some point want to do some kind of multi-key operation. That being the case, you're probably on the hook for a pretty expensive programmer.
I'm vaguely aware that this doesn't strictly apply to Cassandra, which has a limited notion of transactions - last I checked they didn't work very well at all ( https://aphyr.com/posts/294-call-me-maybe-cassandra/ ), but that may well have changed.
I do appreciate your blog post in general - I think there's an awful lot of oversimplifying of this stuff out there. Part of the problem is that high-speed, concurrent, distributed data storage is a topic that is, at its heart, pretty damn complicated.
Not exactly what I like to call 'simply works' ;-)
But I didn't want to say that a NoSQL database is always better than a traditional one. Just that isolation is complex on traditional systems when dealing with volumes & concurrency. And, typically, transactions between tables or even rows are difficult/impossible for a distributed database, as these rows can be on different nodes (the 'multi-key operations' you mentioned)
> 'applications using this level must be prepared to retry transactions due to serialization failures'
True of any serializable system that supports concurrent access, AFAIK - not quite sure that's a fair criticism :-).
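To make the retry requirement concrete, here is a minimal Python sketch of the usual pattern: wrap the transaction in a loop and retry when the database aborts it with a serialization failure (in Postgres that's SQLSTATE 40001). The `SerializationFailure` class and `flaky_txn` below are hypothetical stand-ins for a real driver error and a real transaction; this is a sketch of the retry shape, not a definitive implementation.

```python
class SerializationFailure(Exception):
    """Stand-in for a driver error such as Postgres SQLSTATE 40001."""

def run_with_retry(txn, max_attempts=5):
    """Run a transaction callable, retrying on serialization failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up after the last attempt

# Demo: a hypothetical transaction that aborts twice before committing.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

result = run_with_retry(flaky_txn)
```

A production version would also re-read its inputs on each attempt (the whole point of the abort is that they may have changed) and usually add some backoff.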
> And, typically, transactions between tables or even rows are difficult/impossible for a distributed database
Depends what you mean by 'distributed', really - Oracle RAC is very much distributed, and supports normal transactional behaviour. On the other hand you won't get that working across a large geographical area.
I accept that understanding the impact of isolation levels can be complex - I'm just very much of the opinion that you'll take a lot more pain trying to maintain consistency in a typical NoSQL system.
val1 = read(account1)
val2 = read(account2)
newVal1 = val1 - 100
newVal2 = val2 + 100
write(account1, newVal1)
write(account2, newVal2)
If I write the data and then immediately read it I will get the new data under all circumstances.
Most of the really difficult problems show up when you have multiple users and still need consistency.
Granted, ACID databases make solving most of these far simpler.
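The multi-user hazard above can be shown in a few lines. This is a deterministic Python simulation of the racy interleaving of two concurrent withdrawals without isolation: both "transactions" read the balance before either writes, so one update is lost (the account names and amounts are illustrative, not from the original post).

```python
# Simulated interleaving of two concurrent withdrawals, no isolation.
accounts = {"account1": 500, "account2": 500}

# Both transactions read before either writes - the racy interleaving.
t1_val = accounts["account1"]   # T1 reads 500
t2_val = accounts["account1"]   # T2 also reads 500

accounts["account1"] = t1_val - 100   # T1 writes 400
accounts["account1"] = t2_val - 100   # T2 overwrites with 400, not 300

# Two withdrawals of 100 should leave 300; T1's update has vanished.
```

Under SERIALIZABLE isolation, one of the two transactions would be aborted with a serialization failure instead of silently losing a write.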
Say for example you have two users on a normal CRUD app.
User 1 opens an object in a web form. User 2 opens the same object a few seconds later.
User 1 updates field A. Saves the object back to the database.
User 2 updates field B. (field A is still in the web form at its initial state). Saves the object.
Field B is now in the correct state, but field A is not: user 2's save silently overwrote user 1's update with the stale value.
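A common fix for this web-form scenario is optimistic locking: keep a version counter on the row and reject any save whose version is stale. Here is a toy in-memory Python sketch (the `db` dict, `StaleObjectError`, and field names are all hypothetical, standing in for a real row and a real `UPDATE ... WHERE version = ?`).

```python
class StaleObjectError(Exception):
    """Raised when saving a copy that is older than the stored row."""

# In-memory stand-in for the database row, with a version counter.
db = {"id": 1, "field_a": "old", "field_b": "old", "version": 1}

def load():
    return dict(db)  # each user gets an independent snapshot

def save(obj):
    """Reject the write if the row changed since this copy was loaded."""
    if obj["version"] != db["version"]:
        raise StaleObjectError("row was modified by someone else")
    obj["version"] += 1
    db.update(obj)

# User 1 and User 2 both open the object.
u1 = load()
u2 = load()

u1["field_a"] = "new A"
save(u1)                 # succeeds, version bumps to 2

u2["field_b"] = "new B"
refused = False
try:
    save(u2)             # fails: u2 still holds version 1
except StaleObjectError:
    refused = True       # user 2 must reload and redo the edit
```

The second save is refused rather than silently clobbering field A; the application then reloads the fresh row for user 2 instead of losing user 1's change.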
It's one reason I'm wary of anti-ORM sentiment I sometimes see around. I'm quite often more productive when I eschew a heavyweight ORM in favour of something closer to the 'metal', but part of that is that I have a lot of background in database systems, so I'm confident that I will generally avoid concurrency mistakes. For a typical programmer, in a developer culture that doesn't want to understand the complexities of data storage, my experience is that they're often better off using an ORM.
One way out of the confusion is to use different words. For example, using "linearizability", "serializability" and "strict serializability" may cut down confusion. These terms have complex-sounding names, but generally very approachable definitions. Aphyr's blog post on different models is a good place to start: https://aphyr.com/posts/313-strong-consistency-models
Another set of terms that approaches a different part of this problem is Harvest and Yield, which can help explain real-world availability and consistency (http://codahale.com/you-cant-sacrifice-partition-tolerance/). Unfortunately, the original paper just seems to add to the term confusion (http://brooker.co.za/blog/2014/10/12/harvest-yield.html).
'Available' in CAP is also confusing to people. It means a successful response: you can't, for example, respond with an error and claim the system is still available. I've seen some NoSQL databases claim they were still available because the user was getting an error message back.
There are some elements on the CAP definition traps in the first post: http://thislongrun.blogspot.com/2015/03/comparing-eventually.... I also plan to do another post on this subject.
I was surprised this was not mentioned yet: http://www.infoq.com/articles/cap-twelve-years-later-how-the...
(Eric Brewer revisits the CAP theorem and explains why it's not always a "black or white" case of "choose only two" triangle...)
In the post (and in the blog) I stick to the theorem. It's not 'right' or 'wrong', it's a choice. I made this choice because a lot of people are deciding their trade-offs with CAP-as-a-theorem, while actually CAP-as-a-theorem cannot be applied to the problem they're working on.