
CAP Theorem Explained - rg81
http://robertgreiner.com/2014/06/cap-theorem-explained/
======
room271
Unfortunately, this guy doesn't understand the CAP theorem at all.

Once you are distributed, P is not optional. Rather, in the case of network
failure, consistency or availability is what suffers. A system cannot be both
CA and distributed.

So, for example, Elasticsearch is not a 'CA' solution despite the diagram in
the article, but is actually closer to PC (although in practice it is far
more subtle than even that, as it is not perfectly consistent, and
configuration options allow for some trade-offs between availability and
consistency in the case of communication errors).
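
To make the "consistency or availability is what suffers" point concrete, here is a minimal sketch of a two-replica toy store. Everything in it (the `Cluster` and `Replica` classes, the `mode` flag) is hypothetical and only illustrates the trade-off; it does not model Elasticsearch or any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class Cluster:
    """Toy two-replica store. `mode` is 'CP' or 'AP' (illustrative labels only)."""

    def __init__(self, mode: str):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when the replicas cannot talk to each other

    def write(self, replica: Replica, key, value) -> None:
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: replicas stay identical, but the client gets an error.
                raise RuntimeError("unavailable: cannot reach peer")
            # AP: accept the write locally; the replicas now diverge.
            replica.data[key] = value
            return
        # No partition: replicate synchronously to both copies.
        replica.data[key] = value
        other.data[key] = value

    def consistent(self) -> bool:
        return self.a.data == self.b.data


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        cluster = Cluster(mode)
        cluster.write(cluster.a, "x", 1)
        cluster.partitioned = True
        try:
            cluster.write(cluster.a, "x", 2)
            outcome = "write accepted"
        except RuntimeError as err:
            outcome = f"write rejected ({err})"
        print(mode, outcome, "consistent:", cluster.consistent())
```

Either way something gives during the partition: the 'CP' cluster rejects the write, the 'AP' cluster accepts it and ends up with two different values for `x`.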

~~~
rg81
There's some debate on the Elasticsearch forums around CA vs. CP - with the
overall consensus in favor of CA. Elasticsearch doesn't perform well across
networks, so in practice most deployments have multiple nodes in a single
datacenter - CA.

Thanks for your feedback though, I have some posts queued up on more of the
subtleties in the model, this was just meant to be an introductory post.

~~~
kainosnoema
Elasticsearch actually sacrifices both consistency and availability during
partitions, even when inside a single datacenter (trust me, they happen in
production).

If you configure ES to prioritize consistency _somewhat_
(minimum_master_nodes), it prevents writes during a partition—but there's at
least one "split" partition scenario where even minimum_master_nodes doesn't
prevent inconsistent writes. If you configure ES to prioritize availability
during a partition, it isn't consistent. Remember, ES doesn't claim to be
consistent and doesn't even use any sort of consensus algorithm.
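
As a rough illustration of what minimum_master_nodes buys you, the sketch below assumes the commonly recommended quorum value of (master-eligible nodes / 2) + 1 and a clean partition into disjoint groups. The helper names `quorum` and `sides_that_can_elect` are made up for this example and are not part of any Elasticsearch API. It deliberately does not model the overlapping "split" partition mentioned above, where a node can still see both sides.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: (N / 2) + 1, using integer division."""
    return master_eligible_nodes // 2 + 1


def sides_that_can_elect(partition_sizes, minimum_master_nodes):
    """Return the sizes of the partition sides big enough to elect a master.

    Assumes a clean split: every node belongs to exactly one side.
    """
    return [size for size in partition_sizes if size >= minimum_master_nodes]


if __name__ == "__main__":
    nodes = 5
    split = [3, 2]  # a 5-node cluster partitioned into 3 + 2

    # Quorum setting: only the majority side keeps a master and accepts writes.
    print(sides_that_can_elect(split, quorum(nodes)))

    # minimum_master_nodes = 1: both sides elect a master (split brain).
    print(sides_that_can_elect(split, 1))
```

With the quorum setting the minority side stops accepting writes (availability suffers); with a setting of 1 both sides keep writing independently (consistency suffers).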

------
mjb
As others have said, that article is very misleading. Some better ones:

* [http://research.microsoft.com/apps/pubs/default.aspx?id=1926...](http://research.microsoft.com/apps/pubs/default.aspx?id=192621) \- including a very nice way of thinking about CAP and tradeoffs.

* [http://www.infoq.com/articles/cap-twelve-years-later-how-the...](http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed) \- a good perspective on CAP, and why many people still don't understand it clearly.

* [http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pd...](http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf) \- don't get put off by the formal language. Gilbert and Lynch is still (IMO) the best explanation of what CAP means, and what it implies.

* [http://blog.cloudera.com/blog/2010/04/cap-confusion-problems...](http://blog.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/) \- Some good criticism of the way CAP is frequently explained.

* [http://codahale.com/you-cant-sacrifice-partition-tolerance/](http://codahale.com/you-cant-sacrifice-partition-tolerance/) \- Why CA systems don't actually exist.

* [http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf) \- PACELC, maybe a better model.

* [http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-an...](http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html) \- Another look at PACELC, with some examples.

------
SEJeff
For those who want to read about real tests of how various common software
fares under network partitions, here is some amazing work done by aphyr and
his infamous Jepsen tests:

[http://aphyr.com/tags/jepsen](http://aphyr.com/tags/jepsen)

This post has some info on elastic, but nothing in excruciating detail:
[http://aphyr.com/posts/288-the-network-is-reliable](http://aphyr.com/posts/288-the-network-is-reliable)

~~~
SEJeff
Won't let me edit it, but there is now a full test on Elasticsearch here:
[http://aphyr.com/posts/317-call-me-maybe-elasticsearch](http://aphyr.com/posts/317-call-me-maybe-elasticsearch)

------
nwjsmith
You can't sacrifice partition tolerance:
[http://codahale.com/you-cant-sacrifice-partition-tolerance/](http://codahale.com/you-cant-sacrifice-partition-tolerance/)

~~~
angersock
I'm sorry, but when I read this, I thought of SimCity...

"YOU CANT REDUCE FUNDING TO PARTITION TOLERANCE. YOULL REGRET THIS."

I wonder if anybody's made SimDataCenter?

~~~
dllthomas
_"I wonder if anybody's made SimDataCenter?"_

That would be really interesting.

------
mycodebreaks
In many talks and presentations, I have heard that only eventual consistency
is possible in a distributed system: since A and P are already needed, you
can only compromise on C.

So I am curious: how do they configure production systems to get reasonable,
or maybe even 100%, consistency? In other words, reads must take the latest
write into account. No stale reads.

------
doverton
"Consistency around CAP is similar to what you find in a typical ACID model -
except that, now, we're in a distributed model."

I thought that consistency in ACID meant that data was always consistent with
the rules of the database, whereas in CAP it means that the same data held in
different locations is the same? Is that not right?

~~~
Dave_Rosenthal
You are right. The article is totally wrong.

From Eric Brewer himself: "The relationship between CAP and ACID is more
complex and often misunderstood, in part because the C and A in ACID represent
different concepts than the same letters in CAP..."

(from [http://www.infoq.com/articles/cap-twelve-years-later-how-the...](http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed))
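
A small sketch of the distinction discussed above, under simplifying assumptions: ACID-style C is modelled as "the data satisfies a rule" (here, a made-up no-negative-balance rule), and CAP-style C is reduced to "all replicas hold the same data" (the formal definition in Gilbert and Lynch is stronger than that). Both function names are hypothetical.

```python
def acid_consistent(accounts: dict) -> bool:
    """ACID-style C: the data obeys the database's rules (no negative balances here)."""
    return all(balance >= 0 for balance in accounts.values())


def cap_consistent(replicas: list) -> bool:
    """CAP-style C, simplified: every replica holds exactly the same data."""
    return all(replica == replicas[0] for replica in replicas)


if __name__ == "__main__":
    # Each copy obeys the rule, but the copies disagree with each other.
    diverged = [{"alice": 10}, {"alice": 5}]
    print(all(acid_consistent(r) for r in diverged), cap_consistent(diverged))

    # The copies agree with each other, but the shared value breaks the rule.
    agreed_but_invalid = [{"alice": -5}, {"alice": -5}]
    print(all(acid_consistent(r) for r in agreed_but_invalid),
          cap_consistent(agreed_but_invalid))
```

The two properties are independent: the first case passes the ACID-style check and fails the CAP-style one, and the second case does the reverse.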

