
Please stop calling databases CP or AP (2015) - reese_john
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
======
sradman
Martin Kleppmann applies critical thinking to the claims of NoSQL marketing; I
wish I had been aware of his work at the time.

I always found it frustrating that the semantics of the word "Consistent" in
ACID have nothing to do with the semantics of CAP "Consistency". ACID
Consistency refers to integrity constraints while CAP Consistency seems to
refer to Cache Coherence across nodes in a cluster. I was a little surprised
that Kleppmann uses the idea of "Linearizability" to explain the discrepancy
but it does clearly identify the same mistaken assumptions.
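
To make the distinction concrete, here is a minimal Python sketch, not taken from the article. The `Replica` class and the `write` helper are hypothetical names invented for illustration. No integrity constraint is broken at any point, so ACID-style Consistency is untouched, yet a read from the follower returns an old value after a newer write has already been acknowledged, which is what linearizability rules out.

```python
class Replica:
    """A toy replica holding a single register (hypothetical, for illustration)."""

    def __init__(self):
        self.value = None


def write(leader, followers, value, replicate=True):
    """Apply a write on the leader and, if the link is up, copy it to followers."""
    leader.value = value
    if replicate:
        for follower in followers:
            follower.value = value


leader, follower = Replica(), Replica()

# Normal operation: the write reaches both replicas.
write(leader, [follower], "v1")

# The replication link is down: only the leader sees the acknowledged write.
write(leader, [follower], "v2", replicate=False)

fresh = leader.value    # the latest acknowledged value
stale = follower.value  # an older value, read after the newer write completed
```

Here `fresh` and `stale` disagree even though each replica on its own holds a perfectly valid value.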

I was also frustrated with the NoSQL movement's focus on Partitioned clusters
when my intuition was that it was a very rare failure mode. Perhaps we can now
return to the metrics of data loss (RPO, Recovery Point Objective) and
downtime (RTO, Recovery Time Objective) since we seem no closer to fully
transparent failure recovery.
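
As a rough illustration of what those two metrics measure, here is a small Python sketch. The function name `recovery_metrics` and the timestamps are made up for the example; the idea is only that the achieved data-loss window and downtime of one incident are what you would compare against your RPO and RTO targets.

```python
from datetime import datetime


def recovery_metrics(last_replicated, failure, restored):
    """Return (data-loss window, downtime) for one incident.

    last_replicated: time of the last write known to be safely replicated
    failure:         time the primary failed
    restored:        time service was available again
    """
    data_loss_window = failure - last_replicated  # compare against the RPO
    downtime = restored - failure                 # compare against the RTO
    return data_loss_window, downtime


# Hypothetical incident: 30 seconds of unreplicated writes, 5 minutes of downtime.
loss, downtime = recovery_metrics(
    last_replicated=datetime(2020, 1, 1, 12, 0, 0),
    failure=datetime(2020, 1, 1, 12, 0, 30),
    restored=datetime(2020, 1, 1, 12, 5, 30),
)
```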

Regardless, the goal was always to have available and scalable databases that
operate on clusters of commodity cloud servers and all of the SQL, NoSQL, and
NewSQL solutions have converged around that goal. The transition to cloud
computing seems to have been more about NoRAID than NoSQL. Some technologies
like LSM-Trees address important use-cases in commodity clusters so maybe we/I
shouldn't focus on the truthiness of the original rhetoric.

~~~
fiedzia
>I was also frustrated with the NoSQL movement's focus on Partitioned clusters
when my intuition was that it was a very rare failure mode.

Network errors are very common among the errors a database has to deal with, and
particularly common when we are talking about networks between datacenters.

~~~
sradman
Agreed, network errors are very common but the partial failure scenario, with
a fully functional client-server connection but failed server-server
replication connection, seems rare. Maybe I don't read the network failure
post-mortems carefully enough. When routers fail or are misconfigured, all
hell breaks loose rather than clean partitions occurring.

~~~
lclarkmichalek
The classic would be a geographically distributed datastore with the
intra-datacenter link being fine, but the inter-datacenter link being
down/degraded. Clients
within the datacenter would still be able to talk to the server fine, but
server to server inter-datacenter is not going to be OK.

Another part of it is that all hell breaking loose can sometimes look very
similar to clean partitions, depending on the network. When network QoS
classes are available, what appears to be increased latency or packet loss for
one class may be 100% packet loss for the other. Though, you're totally right
that all hell breaking loose can be very different - the set of systems that
handle flapping & other ugly connection failures well is smaller than the set
of systems that handle clean disconnections well :)

------
georgewfraser
The CAP theorem is so dumb and yet so influential. If you want to gain a more
useful understanding of the trade-offs of distributed systems, I highly highly
recommend the “PACELC theorem” by Daniel Abadi:

[https://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf](https://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf)

~~~
maxov
The posted article briefly discusses PACELC and Kleppmann says he doesn't find
it any more enlightening than CAP.

PACELC is an improvement over CAP in some sense because it is a finer-grained
characterization of distributed systems, but it still suffers from the problem
that the stated properties are too strong. As Kleppmann explains in the
article, there are systems that strictly only fulfill the partition tolerance
part of the equation, yet are still very practically useful. This is because
while they may not satisfy strict linearizability and availability, they have
softer guarantees (e.g. defining "availability" as "99% of requests complete
in 1 second") that we can rely on in practice. This reasoning extends to
PACELC as well.
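
A softer guarantee of that kind is easy to state as a check over measured latencies. The following is only a sketch under my own assumptions: `fraction_within` is a hypothetical helper, the sample data is invented, and a request that never completed is recorded as `None`.

```python
def fraction_within(latencies_s, deadline_s):
    """Fraction of requests that completed within deadline_s seconds.

    latencies_s holds one entry per request; None means no response at all.
    """
    ok = sum(1 for t in latencies_s if t is not None and t <= deadline_s)
    return ok / len(latencies_s)


# Invented sample: 97 fast requests, 2 slow ones, 1 that never returned.
samples = [0.05] * 97 + [0.8, 0.9] + [None]

# "99% of requests complete in 1 second"
meets_one_second_target = fraction_within(samples, 1.0) >= 0.99

# The same data fails a stricter half-second deadline.
meets_half_second_target = fraction_within(samples, 0.5) >= 0.99
```

A system can pass a check like this while being neither linearizable nor available in the strict CAP sense.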

I am inclined to agree with Kleppmann. CAP, and by consequence PACELC, are
useful mental models in the extreme, but their characterizations are really
strong. It seems to me that e.g. probabilistic guarantees based on actual
failure statistics are much more useful for the real world, _because_ they
aren't as strong as CAP/PACELC guarantees.

Edit: I am also not sure if I agree with calling CAP "dumb". It's a very
useful theoretical result in a particular model. I think the problem is when
people try to extend the implications of the theorem beyond the restricted
model it operates in.

~~~
georgewfraser
Kleppmann is a wannabe thought-leader, who repackages old ideas into
pseudo-profound “insights”, designed to capture the attention of amateurs rather than
move the state of the art forward. He’s the Nassim Taleb of distributed
systems. If you’re looking for enlightenment, read Daniel Abadi or Andy Pavlo.

~~~
maxov
That seems really unfair to Kleppmann. I know of all three, and all three are,
at the very least, accomplished distributed systems researchers with several
peer-reviewed publications. (They still have different levels of
expertise/publications depending on the area, of course)

In a different arena, I also know of no better book at the interface of
distributed systems theory and practice than Kleppmann's. It's useful for
engineers while being enriched by deep references to the literature. If that
doesn't demonstrate mastery, I don't know what does.

~~~
georgewfraser
OK, my comment was over the top, apologies to Kleppmann if you’re still
reading. He is a thought leader, and he’s the first introduction to
distributed systems for a lot of people, and sometimes people stop with his
work and don’t learn how much depth this field has. But that’s not really a
criticism of him. And nobody should be compared to Nassim Taleb, I was clearly
letting the heat wave get to me ;)

~~~
kronski
To anyone cutting their distributed-systems teeth on Kleppmann's excellent
Designing Data-Intensive Applications: each chapter ends with an _essential_
references section (which includes multiple citations to Abadi and Pavlo!).

The book chapters do a solid job laying the groundwork for those papers. The
depth is in those references. Read them if you can!

------
genneth
In reality the application running on top of the database wants C-kinda-A.
[https://research.google/pubs/pub45855/](https://research.google/pubs/pub45855/)

> Despite being a global distributed system, Spanner claims to be consistent
> and highly available, which implies there are no partitions and thus many
> are skeptical. Does this mean that Spanner is a CA system as defined by CAP?
> The short answer is “no” technically, but “yes” in effect and its users can
> and do assume CA.

The point is that if my application will actually go down anyway if the WAN
craps out, then really, I don't actually need P. The application would also be
significantly simpler if it assumes that if the application can work, then the
DB is also up. And it seems that realistic systems built on Spanner simply
assume CA and then get on with life...

~~~
cthalupa
>The point is that if my application will actually go down anyway if the WAN
craps out, then really, I don't actually need P.

Can you guarantee no further modification to the data in the database occurs
when the WAN craps out? What happens to in flight requests, batch/scheduled
jobs, etc.? There's no situation where your application continues modifying
the contents of the database even when there's no WAN link?

I've seen very few real world services where there is a 0% chance of data
changing just because the application servers can no longer talk with clients
but can still talk with the database. What happens when those databases try to
rejoin the cluster and their data is now inconsistent, if you did not take
into account partition tolerance?

------
EGreg
CAP is trivial and doesn’t tell you anything except that you can’t have all
three things perfectly. It says nothing about the various trade-offs.

Often you can just let the people choose whether to accept the latest state of
a particular partition of the database, or wait until it becomes consistent
again.

~~~
srtjstjsj
CAP is trivial like Seinfeld isn't funny. It's so fundamental and well taught
now that we forget there was a time before it was known.

CAP theorem is older than the career of nearly everyone on HN.

~~~
andi999
What I do not understand though is how anyone could have thought that Seinfeld
was funny at some point of time. Edit:

This is not a snarky comment, I really do not understand it. (But extremely
off topic, I admit)

~~~
gandutraveler
Seinfeld was designed to be eventually funny.

~~~
EGreg
Almost every episode culminated in an absurdist interaction of the various
arcs, and it was hilarious.

This isn't funny?

[https://youtu.be/wGhg-htVnJ0](https://youtu.be/wGhg-htVnJ0)

------
Autowired
Always wondered, if the CAP theorem does not account for latency, one assumes
infinite latency does not impact availability. Shouldn't it then be possible to
define a theoretical system with an infinite buffer that, in the event of a
network partition, will simply keep all incoming requests on hold (infinitely
or until the partition is undone), satisfying CAP?

I know this is a useless construct in practice, and there is probably a flaw
in my reasoning, but it seems to me that you have to establish a non-infinite
timeout for the proof to be consistent.
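
One way to probe the construct is a toy simulation, sketched here in Python. Everything in it is an assumption for illustration: `hold_until_healed` is a hypothetical function, time is counted in discrete steps, and `horizon` is a finite stand-in for "forever". As far as I understand the usual formal definition, availability requires that every request to a non-failing node eventually gets a response, and the partition is allowed to last indefinitely, so the interesting case is the one where it never heals.

```python
def hold_until_healed(heal_at, horizon):
    """Return the step at which a held request is answered, or None.

    heal_at: step at which the partition heals, or None if it never does.
    horizon: number of steps simulated (a finite stand-in for "forever").
    """
    for step in range(horizon):
        if heal_at is not None and step >= heal_at:
            return step  # partition healed, the buffered request is served
    return None  # still buffered: the request never received a response


answered_at = hold_until_healed(heal_at=3, horizon=10)
never_answered = hold_until_healed(heal_at=None, horizon=10)
```

If the partition heals, buffering looks like added latency; if it does not, the buffered request never gets a response at all.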

~~~
ses1984
If there is a partition and you want to execute a write on both sides of the
partition, both sides can wait, but eventually one side wins and the other
side gets back a failure, so the db turned out not to be available on that
side. At least that's my interpretation.

~~~
Autowired
Well, for a system to be consistent (in the meaning of CAP) AND available, the
problem with the concurrent writes would have to be solved anyway (i.e. via
distributed locking), so I don't think this would be an issue after a
partition event.

~~~
ses1984
That assumes a particular locking system, what if you're using optimistic
locking?
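
For readers unfamiliar with the term, here is a minimal single-process sketch of optimistic locking in Python. `VersionedRecord` and `write_if_unchanged` are hypothetical names, and a real database would do the version check atomically on the server. The point is that nobody blocks up front: both writers proceed, and the conflict only shows up when the second one tries to commit and gets a failure back.

```python
class VersionedRecord:
    """A value with a version counter, used for compare-and-set style writes."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        """Return the current value together with the version that was seen."""
        return self.value, self.version

    def write_if_unchanged(self, new_value, expected_version):
        """Commit only if nobody else has written since expected_version."""
        if self.version != expected_version:
            return False  # conflict detected at commit time
        self.value = new_value
        self.version += 1
        return True


record = VersionedRecord("a")

# Two writers read the same version without taking any lock.
_, seen_by_x = record.read()
_, seen_by_y = record.read()

x_ok = record.write_if_unchanged("x", seen_by_x)  # first committer wins
y_ok = record.write_if_unchanged("y", seen_by_y)  # second one is rejected
```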

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=9525266](https://news.ycombinator.com/item?id=9525266)

------
exabrial
The best thing ever to happen to interviews was CAP questions going the way of
the dodo in terms of Silicon Valley fashion.

~~~
cheph
Not sure why you are so happy, every time I worked with people who did not
understand it the outcome was really a massive mess of mismatched
expectations.

~~~
exabrial
We're saying the same thing :/ I think I articulated it poorly

------
GuB-42
So the CAP theorem is a bit like Gödel's incompleteness theorem.

CAP tells us that databases are fundamentally broken, Gödel's theorem tells us
that math is broken. But besides being important theoretical points, they
don't matter that much in practice. In the same way, the two generals problem
doesn't prevent TCP from working fine.

~~~
burakemir
Nah: the article says the CAP narrative is based on very specific assumptions
which make it unfit to talk about actual distributed systems.

Gödel's incompleteness (that, by a diagonalization argument, formal systems of
minimal expressivity will contain true statements that cannot be proven)
applies and will continue to apply to all such formal systems. The bar is so
low that any interesting formal system is affected by it. Also, it does not
make math broken at all - those formal systems can and in fact must be
consistent for the proof to go through (1st incompleteness theorem), and it's
just a pity that whatever formal system one comes up with, if it contains
arithmetic, it won't be able to prove its own consistency (2nd incompleteness
theorem).

In the words of von Neumann: Gödel's two past papers ("Widerspruchfreiheit"
and Continuum Hypothesis), are in any case worth more than the total literary
output, past, present and future, of most mathematicians [...]

