
Visual Guide to NoSQL Systems - nathanh
http://blog.nahurst.com/visual-guide-to-nosql-systems
======
jbellis
Submitted <http://news.ycombinator.com/item?id=1191075> because a lot of
people in this thread could use an explanation of what CAP actually means. :)

~~~
strlen
I think P should be an S ("split"); a lot of people seem to confuse partitioning
in the context of "partitioning a database" with a network partition (the
"split-brain" scenario).

Perhaps a better example to use should be a distributed chat room.

------
strlen
This doesn't look entirely right to me.

Some comments:

Some RDBMSs actually perform background asynchronous replication. If you're doing
your reads from slave replicas, you've got an "eventually consistent" system
("A{possibly, P}"). If you're only doing reads from the master node, you're no
longer highly available. If you're "promoting" slaves in the case of a master
failing, you're _at best_ back to the case of eventual consistency (if the
master comes back and the replication backlog is replayed correctly), or you
might not be consistent at all. To echo others, let's call this "potential
consistency".

Eventual consistency isn't something Amazon decided to make up; we've dealt
with eventually consistent systems on a daily basis long before (DNS, Usenet,
some distributed file systems, etc.).

BigTable and co. are not "CP"; they are much truer to the "CA" definition than,
say, MySQL replication or Postgres + Slony. They use protocols like Paxos with
leases (BigTable) and Zab (used by ZooKeeper-based systems). These are
modified 3PC protocols: unlike 2PC protocols, they still guarantee fault
tolerance ("A") even if the coordinator node fails; unlike pure 3PC protocols,
they offer certain guarantees of liveness (e.g., through the use of leases)
and even some degree of partition tolerance. Still, Google had to build
Spanner on top of BigTable to provide partition tolerance / true multi-
datacenter operation.

~~~
cx01
> If you're "promoting" slaves in the case of a master failing, you're again
> back to the case of eventual consistency.

I don't think so. If you use asynchronous replication and the master fails
directly after acknowledging a new request but before forwarding it to the
slaves, this request may be totally lost. So you'd have no consistency at all
(or at least not eventual consistency).
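The race described here can be shown with a toy simulation (the classes and key
names below are made up for illustration, not any real database API): the master
acknowledges a write, then crashes before asynchronous replication ships it to
the slave that gets promoted.

```python
# Toy model of the lost-write hazard under asynchronous replication.

class Node:
    def __init__(self):
        self.data = {}

master, slave = Node(), Node()
replication_log = []  # writes queued for asynchronous shipping

def write(key, value):
    master.data[key] = value
    replication_log.append((key, value))  # shipped *later*, not now
    return "ack"                          # client sees success immediately

def replicate():
    while replication_log:
        k, v = replication_log.pop(0)
        slave.data[k] = v

write("balance", 100)
replicate()              # this write survives on the slave
write("balance", 250)    # acknowledged to the client, but...
master = None            # ...master crashes before replicate() runs again

# The slave is promoted to master; the acknowledged write is gone.
assert slave.data["balance"] == 100   # not 250: the update was lost
```

Because the acknowledged write never reaches any surviving node, no amount of
waiting converges on it, which is why this is worse than eventual consistency.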

~~~
strlen
My bad. Editing the original comment. You're completely right; I forgot the
case of the master failing before all writes made to it are replicated.

------
gruseom
Given that it's been proven that there is a tradeoff between C, A, and P, this
triangle is a good idea, one of those ideas that is obvious as soon as you see
it (but not before).

It's fascinating how much disagreement there is in the comments, here and in
the OP, about where to place the various systems. We know that a system can't
occupy more than one edge, yet apparently it's non-trivial in several cases to
decide which. I wonder why? Are people defining terms differently? Are
some of the systems themselves poorly specified?

~~~
cx01
It's unfortunate, but a lot of people participating in these discussions have
no clue. They read that CouchDB claims to support "ACID" and then conclude
that it fulfills the C in CAP. Now in reality, CouchDB isn't ACID compliant
because it doesn't even support transactions. Additionally the C in ACID is
not equivalent to the C in CAP.

Or take a look at this sentence from the article: "Consistent, Available (CA)
Systems have trouble with partitioning and typically deal with it with
replication and sharding." I assume that instead of "partitioning" he meant
"partitions". Even then, partitions cannot be dealt with by sharding in any
sense. And mentioning "replication" is totally misplaced here, because
obviously all of these data stores use some kind of replication.

~~~
by
"Additionally the C in ACID is not equivalent to the C in CAP."

It would help if the people writing papers about CAP knew what the C in ACID
meant before they wrote their papers.

Here is the relevant paragraph from the paper "Brewer's Conjecture and the
Feasibility of Consistent, Available, Partition-Tolerant Web Services"
Created: 10/08/2003 14:42 [http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-
SigAct.pd...](http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf)

"Most web services today attempt to provide strongly consistent data. There
has been significant research designing ACID2 databases, and most of the new
frameworks for building distributed web services depend on these databases.
Interactions with web services are expected to behave in a transactional
manner: operations commit or fail in their entirety (atomic), committed
transactions are visible to all future transactions (consistent), uncommitted
transactions are isolated from each other (isolated), and once a transaction
is committed it is permanent (durable). It is clearly important, for example,
that billing information and commercial transaction records be handled with
this type of strong consistency."

Notice it says "... committed transactions are visible to all future
transactions (consistent), ..."

This is not correct because the data can be overwritten by a subsequent
transaction.

In this version of the same paper
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf)
Created: 09/11/2007 01:17 the paragraph is the same except it says

"... transactions never observe or result in inconsistent data (consistent),
..."

If we go back to the people who seem to have coined 'ACID' in this paper
"Principles of Transaction-Oriented Database Recovery" Theo Haerder and Andreas
Reuter 1983. [http://www.minet.uni-
jena.de/dbis/lehre/ws2005/dbs1/HaerderR...](http://www.minet.uni-
jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83.pdf)

We see

"Consistency. A transaction reaching its normal end (EOT, end of transaction),
thereby committing its results, preserves the consistency of the database. In
other words, each successful transaction by definition commits only legal
results."

and later they say

"... a database is consistent if and only if it contains the results of
successful transactions. Such a state will hereafter be called 'transaction
consistent' or 'logically consistent'. A transaction, in turn, must not see
anything but effects of complete transactions (i.e., a consistent database in
those parts that it uses), and will then, by definition, create a consistent
update of the database."

That is, for consistency the transaction only has to have isolation and to
commit consistent results, not "observe" consistent data. This is because the
database, up to the point where the transaction starts, has been created by a
number of committed transactions against an originally empty database. By
induction, the data the transaction "observes" must be consistent, because the
previous transactions have only committed consistent results.

I do not know what the C in CAP means.

~~~
cx01
> Notice it says "... committed transactions are visible to all future
> transactions (consistent), ..." > This is not correct because the data can
> be overwritten by a subsequent transaction.

It seems the paper is completely wrong here. AFAIK the C in ACID means that
the integrity constraints in the database will always hold, i.e. no foreign
key may reference a non-existing row. This would explain why they changed it
in the later version of the paper.

Your reference to the Haerder, Reuter paper is interesting. It seems they kept
the definition of consistency pretty vague.

~~~
by
Yes, that seems correct. The Wikipedia entry explained consistency best for me

<http://en.wikipedia.org/wiki/ACID#Consistency>

which, as you say, comes down to the keys and integrity constraints for normal databases.

It made clear for me the distinction between application level consistency,
the bits that have to be enforced by the program, and database level
consistency, which can vary depending on the sophistication of the database,
and is enforced by the database. I hadn't really thought about that before. I
suppose you could have a database that was less clever and left most of the
consistency to the application but still be ACID compliant so long as you
defined the scope of the database consistency carefully.

------
viraptor
I'm not familiar with some of the solutions, but some of them got me
wondering:

\- Why is Tokyo Cabinet on the AP line? It's not distributed in any way by
itself. It's protected by a single write lock, so it's definitely consistent.
If they mean Tokyo Tyrant, which provides a simple form of replication (even
master-master replication), then...

\- Why is MySQL _only_ on the AC line? If you do a master-master setup with it,
it could be on the AP line too (and probably so could any relational db that
does replication / clustering).

~~~
javery
Partition tolerant really means that it doesn't rely on replication/sharding;
it means something like Cassandra, which is truly distributed.

~~~
viraptor
Then TC is definitely in the wrong place, even if they meant TT, because it's
simply replicated.

MySQL Cluster, on the other hand... It does have a consistent view, works on
many nodes, and in some configurations doesn't depend on the number of data
nodes. It's properly distributed, not merely replicated. It doesn't seem to
violate any of C, A, or P. What am I missing in this example?

~~~
cx01
MySQL Cluster performs a two-phase commit for every transaction, so
availability is not guaranteed.

------
sjs
You can tune Riak to have whichever qualities you need with its NRW values. N
is the number of replicas of your data, which can be specified per-bucket. R
is the number of nodes that have to successfully respond to a read operation
before it's considered to succeed overall, and W is the number of nodes that
must respond to a write. R and W are specified with each read and write
operation, which is quite flexible.

If you want consistency, you push R and W closer to N. If you need availability
and partition tolerance, you lower R and W. You get to choose the mix of CAP
that you need, for each bucket and each request.
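The tuning rule above has a simple basis: a read is guaranteed to see the
latest successful write exactly when any R read-replicas must overlap with any
W write-replicas, i.e. when R + W > N. A brute-force check in plain Python (not
the Riak client API) agrees with that arithmetic rule:

```python
from itertools import combinations

def overlap_guaranteed(n, r, w):
    """True if every possible set of R readers intersects every
    possible set of W writers among N replicas."""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

# The combinatorial check matches the rule R + W > N for all small cases:
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert overlap_guaranteed(n, r, w) == (r + w > n)
```

So with N=3, R=W=2 gives quorum reads and writes that always intersect, while
R=W=1 favors availability at the cost of possibly stale reads.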

I'm a Riak noob but this is my understanding of the system so far.

------
oomkiller
Since Riak has tunable NRW parameters, it could be in any of these groups.

------
wanderr
Can anyone explain to me how redis supports partitioning? I'm still learning
but so far I haven't found any support for it.

~~~
cx01
You mean something like "store all keys with an even hash value on server X,
and all keys with an odd hash value on server Y"? You can do this yourself in
your application: just hash the key and then send the request to the
corresponding server.
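A minimal sketch of that client-side approach (the server addresses and key
format here are made up for illustration):

```python
import hashlib

servers = ["redis-a:6379", "redis-b:6379"]  # hypothetical shard addresses

def shard_for(key):
    """Deterministically map a key to one server by hashing it."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

# Every client using the same function and server list agrees on placement,
# so no coordination is needed to find a key.
assert shard_for("user:42") == shard_for("user:42")
assert shard_for("user:42") in servers
```

Note that with plain modulo hashing, adding or removing a server remaps most
keys, which is the usual motivation for consistent hashing schemes.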

~~~
wanderr
How is that any different than how you would do it with MySQL? If both
products have the same solution to the same problem (handle it in your
application) then why aren't they on the same edge?

~~~
cx01
Read the comment by "strlen". Partitioning and network partitions are two
totally different things.

~~~
wanderr
I did, but I still don't see how redis and MySQL differ in how they handle
either type.

~~~
cx01
If you want to partition (shard) your data, then this will work roughly the
same in MySQL and Redis. But this article is only about partition tolerance.
Neither MySQL nor Redis are inherently distributed, so it's not possible to
generally state that either is partition-tolerant or not. The article is
misleading here. You could use a replication strategy that is partition-
tolerant or you could use a strategy that's not partition-tolerant.

~~~
wanderr
Thanks, that's what I thought. I was hoping I was wrong. :P

------
jcdreads
Oracle Coherence (formerly Tangosol) can be configured to be more or less
anywhere you like on this triangle, including persisting everything to disk.
Makes me sad that it's so expensive, and it chats a lot if you replicate fast-
changing data across too many nodes, but man it sure is nice to have around.

------
javery
I could be wrong, but it seems like MongoDB isn't really partition tolerant,
since it relies on replication or sharding (just like an RDBMS). I think it is
more of an AC system (with a caveat on the C).

~~~
mark_l_watson
As you say, MongoDB has support for replica pairs that are partition tolerant,
but part of this functionality is in the client libs. For Ruby, you register
both masters of a replica pair with the client library. This seems like a
practical approach to me, or at least fairly easy to work with.

~~~
javery
replica pairs are replication though, they aren't truly distributed. (they
negotiate which one is master on launch)

------
mark_l_watson
Nice! A well thought out reference. One thing: I wish that people would add
RDF data stores to NoSQL discussions.

~~~
sid0
<http://quotes.burntelectrons.org/683> comes to mind. :)

------
sabat
Not entirely sure why Postgres and MySQL are in a NoSQL guide -- maybe the
title should be different? Interesting chart to look at, though.

FWIW: yourtinnitus.com is down right now.

~~~
dws
"Compare and Contrast." They're on the chart to show what your options are
when you select CA from CAP, and to provide a basis for comparison with the
other options.

