
An Illustrated Proof of the CAP Theorem - networked
https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/
======
dustingetz
"CP/AP: a false dichotomy" [https://martin.kleppmann.com/2015/05/11/please-
stop-calling-...](https://martin.kleppmann.com/2015/05/11/please-stop-calling-
databases-cp-or-ap.html). Martin Kleppmann is the author of "Designing Data-
Intensive Applications: The Big Ideas Behind Reliable, Scalable, and
Maintainable Systems" [https://www.amazon.com/Designing-Data-Intensive-
Applications...](https://www.amazon.com/Designing-Data-Intensive-Applications-
Reliable-Maintainable/dp/1449373321)

~~~
uluyol
I think that the CP/AP dichotomy is a good example of how we treat
fundamental, but not hard, tradeoffs as if they were hard.

E.g. often we have a fundamental tradeoff between latency and throughput, and
it's impossible to get 100% in both metrics. However, we can still do very
well in both, and that's what matters in practice.

We have the same thing with CAP. You _can_ build a CP system with high
availability.

~~~
zzzcpan
The fundamental tradeoff in CAP is latency/time versus consistency, since
real-world networks are always partitioned. When you talk about high
availability you are using a different meaning of availability, not the one
from CAP.

------
gweinberg
Once it's clear what the terms mean, the theorem is so obvious as to scarcely
need proof. If G1 and G2 can never talk to each other, of course you can't
write data to G1 and read it back from G2.

I think the point is more about timing. If your client writes to G1, then you
pretty much have to wait until the write has propagated to the entire network
to acknowledge the write or accept some risk that some other client will read
back stale data after the write has been acknowledged. I should think this
would be obvious to coders also, but it is not at all obvious to managers.

I say "pretty much" above because there is technically a ridiculous third
option where you essentially lock down the whole network with every read. But
it still doesn't get you around CAP.
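The write-acknowledgement tradeoff above can be sketched in a few lines of
Python (a toy two-replica register with invented names, not any real system's
API):

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.reachable = True  # set False to simulate a partition

def consistent_write(replicas, value):
    """CP-style write: ack only once every replica has the value."""
    if not all(r.reachable for r in replicas):
        return False  # refuse/block rather than ack a partial write
    for r in replicas:
        r.value = value
    return True

def available_write(replicas, value):
    """AP-style write: ack after updating whatever is reachable."""
    for r in replicas:
        if r.reachable:
            r.value = value
    return True  # always acks, but unreachable replicas go stale

g1, g2 = Replica("G1"), Replica("G2")
consistent_write([g1, g2], "v1")  # acked: both replicas now hold "v1"

g2.reachable = False              # partition G1 from G2
consistent_write([g1, g2], "v2")  # False: gives up availability
available_write([g1, g2], "v2")   # acked, but a read from G2 is now stale
```

The "ridiculous third option" is hiding in `consistent_write`: blocking there
until the partition heals is the same choice as returning failure, just with
the latency stretched out.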

~~~
thanatos_dem
The combination of partition tolerance being mandatory and the timing caveat
is why I prefer to think in terms of an extended CAP theorem called PACELC[1].

The first part, PAC, is your traditional CAP theorem - in the presence of
partitions (P), you can provide either availability (A), or consistency (C).
The second part, ELC, describes the system characteristics during the normal,
non-partition case. It reads as "else (E), you can provide either low latency
(L) or consistency (C)".

Even though, apart from some curious outliers, most systems are either PA/EL
or PC/EC, I find the framing helpful for reasoning about a system in more than
just the partition or failure case.

[1]
[https://en.wikipedia.org/wiki/PACELC_theorem](https://en.wikipedia.org/wiki/PACELC_theorem)
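A toy sketch of the ELC half, with invented names and made-up numbers: in the
absence of a partition, a read either pays a round trip for the freshest value
(EC) or answers immediately from a possibly stale local copy (EL):

```python
LOCAL_COPY = {"x": 1}   # possibly stale local replica
REMOTE_COPY = {"x": 2}  # authoritative copy, one "round trip" away
ROUND_TRIP_MS = 50      # made-up network cost of that round trip

def read(key, prefer):
    """Return (value, latency_ms) for the ELC choice."""
    if prefer == "consistency":
        # EC: pay the round-trip latency, get the freshest value
        return REMOTE_COPY[key], ROUND_TRIP_MS
    # EL: answer locally with zero latency, at the risk of staleness
    return LOCAL_COPY[key], 0

print(read("x", "consistency"))  # (2, 50)
print(read("x", "latency"))      # (1, 0)
```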

------
Smaug123
I'm not really a fan of that proof because it's a proof by contradiction in
the least helpful sense. It doesn't tell you _why_ the theorem is true; only
that some assumption was wrong. Much better would be if the proof were
accompanied by examples of systems that had CA but not P, CP but not A, and so
on, to show that this result is best possible.

~~~
gregorygoc
Proof by example or analogy is not a proof.

~~~
federicoponzi
Prove that you can walk: walk. Proved by example. Basically, to prove that
something exists, or that a property is not valid for every element of a set,
you can prove it with an example.

~~~
patrickthebold
Proove that you can't walk. Don't walk.

Hmmmmmm.

Of course you can prove something exists by an example. And you can prove a
property does not always hold with a counter example.

But if you want to prove that something doesn't exist, or a property always
holds, then you shouldn't be looking for examples.

~~~
federicoponzi
This is exactly what I've said. My initial "proof by example" was an answer
for the op's statement:

> Proof by example or analogy is not a proof.

So as you also said "Of course you can prove something exists by an example".

To answer this: > Prove that you can't walk. Don't walk.

As I said, proof by example works only

> to prove that something exists or that a property is not valid for every
> element of a set.

Divide your lifetime into seconds. With that, you're trying to prove that the
property "can't walk" isn't valid for every element of the set of seconds in
your lifetime. To prove that the property doesn't hold for every element of
the set, just show one second during which you were walking and you're done.

------
blt
This proof is missing a step showing that the problem exemplified by the
simple two-node system applies to _all_ system topologies. This website only
proves: "there exists a system for which CAP is impossible." I do not think
the generalizing argument will be complicated, but it should be included.

~~~
curlypaul924
I noticed this as well. Now I'm wondering -- how would one generalize this
proof, keeping the same easy-to-grok style?

------
gweinberg
Once you understand exactly what is meant by the terms, the theorem seems
immediately obvious. If your servers G1 and G2 cannot communicate with each
other, of course you can't write data to G1 and read the updated data from G2.

------
mcintyre1994
What if you added a rule that if 2 nodes lose communication then after some
period T the lowest ID node kills itself? And you add a delay of 2T to all
client/server communication to give that a chance to happen.

Then in the example, client writes to V1, V1 waits T time to communicate with
V2, it doesn't manage to so it returns failure to the client and kills itself.
Then the client writes + reads to V2 instead.

Does this break availability, because returning an error saying you can't act
on a client request counts as ignoring it under this informal definition?

~~~
sobani
Returning an error is basically saying "I'm not available", because there is
no guarantee there is an alternative.

Your example can't deal with there being no path _at all_ between V1 and V2.
So even the client can't reach V2. In that case the entire system V is
unavailable to the client.

Availability in CAP is about being able to reach a node while the nodes can't
all reach each other, and the system failing to function because of that. I
guess killing a node when a problem occurs could be argued to defeat CAP (all
the nodes that are still up can reach each other), but it definitely doesn't
improve the situation.

Also, what does your system do when V5 of a 5-node cluster crashes? Do the
nodes V1-V4 kill themselves because they can no longer reach a higher-ID
node?

~~~
mcintyre1994
Ah I see what you mean, fair point - I'd agree that doesn't really work!

TBH my 2-node system was a simplified version of Corosync which I've worked
with a bit, in that you allow quorum if you have half+1 of nodes up - so 4/5
up will maintain quorum. The highest-node lives rule is just a tie breaker in
the case of a half/half split which the 2 node system always has when one node
goes down. Good point though, that rule alone is definitely not a good idea.
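That half+1 rule with an ID tie breaker can be sketched roughly like this
(hypothetical code, loosely modeled on the rule described above, not
Corosync's actual implementation):

```python
def has_quorum(partition, cluster_size, all_node_ids):
    """Majority quorum; on an exact half/half split the side holding
    the highest node ID survives (the tie breaker described above)."""
    majority = cluster_size // 2 + 1
    if len(partition) >= majority:
        return True
    if cluster_size % 2 == 0 and len(partition) == cluster_size // 2:
        return max(all_node_ids) in partition
    return False

nodes4 = {1, 2, 3, 4}
print(has_quorum({1, 2, 3}, 4, nodes4))  # True: 3 of 4 is a majority
print(has_quorum({3, 4}, 4, nodes4))     # True: even split, holds node 4
print(has_quorum({1, 2}, 4, nodes4))     # False: even split, loses the tie
print(has_quorum({1, 2, 3, 4}, 5, {1, 2, 3, 4, 5}))  # True: 4 of 5 keeps quorum
```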

------
spraak
Is partition intolerance the only piece of the three that disproves it? I'm
curious to know more about how partition intolerance is handled.

~~~
apengwin
The proof only shows that you can't get all three of C, A, and P. You can get
any two of the three, though.

In practice, partition tolerance is the most important property to have, since
there's no point in having a distributed system if it can't function when any
part of the system is down (it has to be fault tolerant).

~~~
MaxBarraclough
> partition tolerance is the most important property to have, since there's no
> point in having a distributed system if it can't function if any part of the
> system is down

This is really a question of definitions, no?

If partition-tolerance is a defining property of a distributed database
system, then of course they all have it. Plenty of database systems don't have
partition-tolerance: non-distributed database systems.

~~~
annabellish
Partition tolerance is not a defining property of distributed systems, just a
very valuable one.

You only really want to sacrifice partition tolerance if the precise
correctness of your data is not tremendously important. A CA system can always
respond and appear correct, but network partitions result in desyncs. I
believe this approach is fairly popular in video games, where a CA system for
multiplayer can provide each client with a coherent gameplay experience at the
expense of things just breaking if a partition forms. Sorry, game over, please
try again.

Most serious systems, though, require P.

One potentially interesting example that uses the current fashion of the day
could be to compare payment networks. Visa's distributed payments system is
CP, which is to say that it's a consistent system and getting a successful
response back means your payment has gone through, but you have no guarantee
of getting a successful response. The bitcoin network is AP, which is to say
that it's a highly _available_ system which will never fail a response (so
long as you can reach one node), but getting a successful response is no
guarantee that your payment has gone through.

In all cases, the sacrificed letter is not gone, merely imperfect. This does
not mean that Visa is not Available, it just isn't _perfectly_ Available. It
does not mean Bitcoin is not Consistent, it just isn't _perfectly_ Consistent.
It does not mean that your video game is not Partition Tolerant, it just is
not _perfectly_ Partition Tolerant.

You can sacrifice any of the three and still have a distributed system, it
will just behave differently.

------
std_throwaway
Are there solutions that don't meet the formal definition but still perform
just as well as if they did in practical situations?

(As in the formal definition is not what we actually need to get a consistent
and reliable system for our real-world application. We need something a little
less and then suddenly it's possible to get a similar solution that actually
provides C A and P.)

~~~
fjsolwmv
Google Spanner

------
tlug
Is it just me, or does the proof appear to be incorrect? In the last example,
the write operation should simply block before returning "done" to the client,
until it's able to replicate the state to the other server. In fact, in a
fault-tolerant system there cannot be just two servers, because it would be
impossible to achieve a majority vote.

~~~
DougBTX
The example looks fine to me. If the writer blocks until the partition is
resolved, then it isn’t Available for writes, so only meets CP.

~~~
tlug
OK, if you define the availability of writes in this way, that they must
complete immediately, then indeed the availability part of CAP is not met.

However his definition reads: "every request received by a non-failing node in
the system must result in a response"

It doesn't say anything about the timing of response. The network partition
will be resolved eventually and thus the write operation will complete. Or it
can time out and return an error, which is also fine based on the definition
of availability (must result in _a response_ ).

~~~
somenewacc
Yes, but if you're allowed to timeout, then any system is trivially available,
even a system where the nodes are always offline (simply timeout every
request).

------
otakucode
I know this is probably a stupid question, but please humor me. Since the
Client node has bidirectional communication with both servers, why can it not
be used as a channel to facilitate communication between the two servers when
a direct link is not available?

~~~
pram
I think in that case the client would essentially be another broker. It would
need to understand the cluster topology, manage replication acks, etc. It
would resemble a p2p protocol like BitTorrent.

I think that might also make split brain situations more likely if all the
client brokers didn’t have the entire cluster topology known. You’d end up
with different data in different places if the clients are partitioned in
addition to the brokers.

------
cconroy
This proof is sound (for this simple structure), but does it imply you can
never have all three (CAP) in some scheme where you can draw from many nodes
and edges (not infinite) to handle partition failure while always being
available and consistent? Perhaps a proof that you would need to draw from
infinite nodes and edges to achieve CAP would be interesting.

disclaimer: i am not well educated on the literature in Distributed systems :(

~~~
legacynl
I was thinking the same thing. The ability to handle partition failure comes
from having more nodes. This proves it is not possible when having just two
nodes. Kind of makes sense, as it stops being a distributed system if you
disconnect all nodes.

~~~
y4mi
The CAP theorem is about guarantees, not probabilities.

There are various ways to reduce the impact and probability of partitions, but
increasing the number of nodes does not make partitions impossible -- in fact,
the number of possible partitions you have to worry about only grows with the
node count.

The only way to guarantee partition tolerance is by majority vote (quorum).
If you do that, you can no longer guarantee availability. For example, a
cluster of 11 nodes might be partitioned three ways (2 + 4 + 5). No partition
holds a majority, so none of the nodes is allowed to answer, breaking the
availability guarantee.
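A quick sketch checking the 11-node example: with a 2/4/5 three-way split, no
side reaches the majority of 6, so a majority-quorum system must refuse every
request:

```python
def can_answer(partition_size, cluster_size):
    # A node may answer only if its partition holds a strict majority.
    return partition_size > cluster_size // 2

splits = [2, 4, 5]  # the three-way partition from the example above
print([can_answer(s, 11) for s in splits])  # [False, False, False]
print(can_answer(6, 11))  # True: a 6-node side would hold a majority
```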

------
jmd1
Google’s Spanner database is very likely an example of overcoming CAP. There
are more details here:
[https://ai.google/research/pubs/pub45855](https://ai.google/research/pubs/pub45855).
From what I’ve read, Google is able to do this because they control the
physical data transfer infrastructure (the nuts, bolts, cables), which in many
cases is the reason why CAP is defeated.

~~~
lucio
>Does this mean that Spanner is a CA system as defined by CAP? The short
answer is “no” technically, but “yes” in effect and its users can and do
assume CA.

>The purist answer is “no” because partitions can happen and in fact have
happened at Google, and during (some) partitions, Spanner chooses C and
forfeits A. It is technically a CP system. We explore the impact of partitions
below.

...

>Conclusion


>Spanner reasonably claims to be an “effectively CA” system despite operating
over a wide area, as it is always consistent and achieves greater than 5 9s
availability.

From the paper

------
akeck
Is there a paper about proving whether or not an arbitrary system is subject
to the CAP theorem?

------
vortico
Having never seen this theorem before, it seems so trivial that it's
completely useless. If you have no guarantee of having connected systems, then
of course you can't always have consistency between them if they're required
to answer all requests. Am I missing something here? What are the applications
of this idea, beyond assigning a name to a triviality?

~~~
mping
The immediate application of it is that when you choose a DB, you should be
able to tell whether it's CP or AP. The field is full of subtleties (e.g. the
network is not 100% reliable, so no system should be called CA -- it's way
subtler than that, but that's the gist of it; I'm no expert btw).

Lots of DB vendors have tried to circumvent the theorem by introducing their
own concepts or stretching the definitions: "We can beat the CAP theorem"
~~~
pdpi
is CA even possible in a distributed system though? I mean, it’s not like “the
network never fails” is a design decision you’re allowed to make (short of
removing the network altogether and therefore not being a distributed system
at all)

~~~
lucio
That's what Google Spanner is claiming: that Spanner is consistent and highly
available (five 9s, or 99.999% of the time available), which "in practice"
they say is "CA".

~~~
teraflop
From the perspective of the CAP theorem, Spanner is unquestionably a CP
system, as the original paper [1] makes clear. (It relies on Paxos to elect
leaders, and Paxos chooses consistency over availability.)

The "CA in practice" claims are purely marketing.

[1]: [https://storage.googleapis.com/pub-tools-public-
publication-...](https://storage.googleapis.com/pub-tools-public-publication-
data/pdf/39966.pdf)

