

Minimal explanation of the CAP theorem - Dave_Rosenthal
http://blog.foundationdb.com/minimal-explanation-of-the-cap-theorem

======
jayvanguard
This is a very good explanation actually.

> Pro tip: If an article includes a diagram with a triangle, stop reading

Exactly. I've seen many explanations that try to turn it into a trade-off
among three things, analogous to the maxim "resources, features, schedule:
fix any two and the third must change".

------
brianpgordon
> Choosing (C) only requires stopping the machines that got disconnected, not
> the whole system!

This post is generally good but I take exception to that statement. Partitions
can occur between nodes. If nodes A and B are partitioned from nodes C, D, and
E, but they all still receive requests, it's not clear which machines have
been "disconnected." Generally you use a quorum of greater than 50% of nodes
to decide which nodes should die. It's a more complicated picture than whether
the network interface is up or down on the server.
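
A minimal sketch of that majority rule, to make the A/B versus C/D/E case concrete. The function name `has_quorum` and the node labels are made up for illustration, and this is not any particular database's implementation: each side of the partition counts the cluster members it can still reach (itself included) and keeps serving only if that is a strict majority.

```python
def has_quorum(reachable, cluster):
    """Return True if `reachable` is a strict majority of `cluster`.

    `reachable` is the set of nodes on this side of the partition,
    including the node doing the check (hypothetical helper).
    """
    members = set(reachable) & set(cluster)  # ignore unknown nodes
    return len(members) * 2 > len(cluster)


cluster = {"A", "B", "C", "D", "E"}
minority_side = {"A", "B"}        # partitioned away from C, D, E
majority_side = {"C", "D", "E"}

for side in (minority_side, majority_side):
    action = "keep serving" if has_quorum(side, cluster) else "stop serving"
    print(sorted(side), action)
```

Because the threshold is strictly greater than 50%, at most one side can ever pass the check, and an even split (two nodes versus two) leaves neither side with a quorum.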

~~~
Dave_Rosenthal
Totally agreed that I glossed over how you actually resolve the split-brain
scenario and that it's not just about the network interface being up.

There are a lot of options for how to deal with the issue. For example, in
FoundationDB, a server needs to be able to talk to a majority of user-
designated "coordinator" nodes that are running Paxos to be a part of the
cluster.

------
notacoward
Unfortunately, this "correction" itself isn't quite correct. "Availability" in
CAP doesn't require that _all nodes_ remain up (point 1b) but only that all
_non-failing_ nodes remain up. The article also fails to provide even a
minimal definition of consistency, but those definitions are critical to
understanding what Gilbert and Lynch actually managed to prove. Without that,
I think this fails Einstein's test.

"Everything should be as simple as it can be, but not simpler"

~~~
Dave_Rosenthal
Clearly in 1b a node cannot be up if it has failed--we are talking about the
case of nodes remaining "up" when they are disconnected. (There aren't many
interesting tradeoffs available for completely failed nodes!)

Also, the post does define consistency: "Normally, the system is consistent (a
read sees all previously completed writes)."

~~~
notacoward
Yes, clearly a node cannot be up if it's failed, and yet you refer to
"perfect" consistency and "all" nodes. That's the inconsistency (heh) that you
need to fix.

Speaking of consistency, I stand corrected that you did define it . . .
incorrectly. That's a definition of consistency, but it's not the operative
one when it comes to CAP. Gilbert and Lynch specifically refer to atomic or
linearizable consistency in section 2.1, and that's a stronger requirement than yours
as it also precludes reading still-in-progress writes. The definition of
consistency is central to the proof, so if you want to put down others for not
understanding CAP it helps to get it right yourself.

