
Please stop calling databases CP or AP - martinkl
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
======
haberman
The article convinced me that talking about databases as CP or AP is not very
useful.

The article did _not_ convince me that CAP-centered thinking is itself harmful
or counterproductive. It's true that C-Consistency and A-Availability are
relatively blunt definitions in a landscape of diverse and subtle possible
semantics. But their usefulness is that they represent our intuitive notions
of what a database _should_ do.

If it were possible and had good performance, every single database system
would provide C, A, and P, because it would make distributed databases _much_
easier to reason about and program against. Intuitively, it really seems like
a database _ought_ to read the data that was most recently written. And it
sure would be nice if a distributed database could somehow magically remain
available and consistent even in the presence of a partition.

C, A, and P are useful because they are a yardstick against which we can
measure a distributed database's actual capabilities against what a
distributed database could do in an ideal world. The real world will fall
short of that, and the CAP theorem gives a vocabulary for which part(s) of the
ideal world real databases cannot satisfy.

For example, even in this article CAP terminology is used to describe
interesting edge cases in ZooKeeper which, despite having generally strong
consistency guarantees, does not always guarantee full capital-C Consistency.

~~~
tracker1
It really depends on the amount of writes your database needs to handle, and
the overall quantity of data. There are very strong CAP systems (Zookeeper
comes to mind), but I wouldn't want that for very large quantities of data, as
it wouldn't be able to keep up.

For example, if you are doing logging for a few million simultaneous users
(say 5-8 per resource request across service layers), a single system wouldn't
be able to keep up, and definitely a single system that has to coordinate each
write as an atomic/consistent state.

The fact is, depending on your needs, you will need to sacrifice _something_
to reach a scale needed by some systems and to minimize down time. It's a
trade-off. And any distributed database will have real-world weaknesses.

~~~
haberman
> There are very strong CAP systems

There are not strong CAP systems -- that is the whole point of the CAP
theorem. It's impossible. Am I misunderstanding you?

~~~
GauntletWizard
No, zookeeper doesn't do magic. It does, however, create a large enough value
for P to be lost that it can be thought of as CAP; The number of failures
required is high enough that, in normal use and sane configurations, you won't
partition.

~~~
necubi
P (partition tolerance) refers to _network_ partitions. There's nothing a
distributed system can do to prevent network partitions; they're a property of
the network. The question is, in the face of partitions, what does the
software do? It basically has two options:

* Respond to requests, even though it may not have the most up to date information. I.e., it sacrifices consistency. These are AP systems.

* Not respond to requests, in which case it sacrifices availability. These are CP systems, of which ZK is one.

In particular with ZK, if you lose quorum (i.e., the cluster has fewer than (n
+ 1) / 2 active nodes where n is the cluster size) the cluster (or partition
thereof) will become unavailable in order to avoid sacrificing consistency.

~~~
sukilot
So it should be called the CA theorem, right? You can't choose partition-
intolerance.

~~~
necubi
Non-distributed databases (like postgres) are sometimes considered to be CA.
But I think it's clear that CA doesn't make sense in a distributed system.

------
jbangert
very broad theorems like CAP (and e.g. The halting problem) are still very
useful for proving things by reduction. Instead of talking about nodes
failing, you could talk about a network partition isolating just that one node
(sure, the node might process some things on its own, without input, or not
--because it crashed-- but for CAP that is totally out of scope, and part of
the underlying algorithm). Instead of talking about multi object transactions,
it talks about anything that implements a register - which includes multi
object transaction systems.

CAP therefore shows what real-life trade offs are feasible in the first place
( of course, there are still systems allowed by CAP that might be impossible
for other reasons).

~~~
mjb
Exactly.

CAP is one of the borders of the space. It's popular because it's relatively
easy to look at a claim and see if it's on the "possible" or "impossible" side
of the border. Tools like that (as in your halting problem example) are
extremely useful, because they can be used to cut through a lot of bullshit
masquerading as subtlety.

On the other hand, as OP's post does a good job of explaining, CAP is not a
good tool for exploring the area inside the boundary. We often use it to do
that, and it leads to all sorts of awkwardness and miscommunication. I think
OP captured that well, and it's an area I'd really like to see some
alternatives getting popularized (harvest and yield, PACELC, etc). Simplified
rules are useful in this space, because while it would be very useful for all
practitioners to understand the whole taxonomy, there's a lot of material to
understand there.

------
jellofiend84
What I dislike about articles like this is the author doesn't really talk
about what we should be calling databases.

While calling one CP or AP may not correctly capture all the nuance of what
those terms really mean it does give the user a quick idea of the capabilities
of the database without having to go into a mutli-paragraph technical
breakdown.

It's like describing a house as "orange", there are a million shades of
"orange" and it might actually be closer to red or yellow but it gives a quick
idea of what the house looks like. I feel the description CP and AP does the
same, it gives a quick overview of the capabilities and if that interests you,
one should be able to dig in deeper to the more technical details.

~~~
martinkl
There are some pointers to better terminology at the end of the post — e.g. I
like Terry's session guarantees (read-your-writes, monotonic reads, etc) — but
they too only cover a small subset of the design space.

In the end, this stuff is still evolving, and we simply don't yet know the
best way of describing things. I wouldn't want to propose some alternative
classification scheme that would be no better than the one it replaces. But I
do want to encourage people to be precise and thoughtful with the terminology
they use.

~~~
jellofiend84
I agree with most everything you've said, but until we have good concise
descriptions I think it is unfair to really say: "Never use these".

Quick less-precise overviews serve a real purpose and are very helpful to
users and decision makers.

I guess my opinion is projects should never ONLY say it is CP or AP, but I
think it is perfectly acceptable in your project overview to say: This
database is CP*. Then have a longer technical discussion somewhere else where
they use more precise and thoughtful language.

I understand and agree with the sentiment, but we humans need fuzzy short
overviews. Then again a blog post titled: "All databases need an in-depth
technical discussion page" doesn't quite capture the imagination.

------
ddispaltro
It's natural to reduce things to simple terms. I think CAP theorem caught on
because previously people didn't know how to characterize distributed systems
in an easy way and this gave them a framework and vocabulary to do it.

It's the same reason why companies create some index in their product (like
New Relic AppDex), to take something complex and simplify it so it can be
compared.

------
antirez
The article is good, however probably there is some issue here: "so quorum
operations are not CAP-available (at least temporarily, until the database
sets up additional replicas on the minority side)."

Availability is an eventual property, so technically speaking if the minority
partition within a non-infinite amount of time always recovers and is able to
provide a reply for our query, it is Available from the POV of CAP. However
this requires the DB to be always able to create new replicas in the minority
side and eventually reply to the query, and I'm not sure this is the case in
Dynamo-alike systems when there is a single-node partition.

~~~
simongui
What you describe is not how quorums work in the Dynamo paper or for example
in Riak (a Dynamo implementation). It does not require the DB to create
replicas on the minority side of a partition. The Dynamo paper (and Riak
implements this) includes Sloppy Quorums to increase availability.

------
MCRed
I agree. I feel it's time for a better, modern, clear elucidation of these
issues- something of which the CAP theorem would be a subset.

And on top of that, I'd love a simple test harness that will run a distributed
database thru its paces and produce a report.

Call it Bender, or something and then we can talk about the Bender
characteristics and the Bender Report and say "this database passed the Bender
report" or "bender found this database fails in this way during partitions: A
& B".

This would make it relatively easy to compare systems relatively objectively.

Too often well meaning people have chosen poor databased for production
systems (including bitcoin financial transactions companies!) because they
didn't understand the issues well enough.

Such an open source project would be open to criticism and improvement and
hopefully reasonable objective, and thus take it out of the realm of vendor
propaganda and into the realm of broad understanding.

~~~
muaddirac
Jepson seems to do a lot of what you want in Bender:
[https://github.com/aphyr/jepsen](https://github.com/aphyr/jepsen) and
[https://aphyr.com/tags/jepsen](https://aphyr.com/tags/jepsen)

------
ble
Impossibility results are really good for proving that one or more things are
impossible.

The CAP theorem is an impossibility result.

~~~
josteink
Looking at the discussion here in this thread, among these fine "experts", it
seems a majority believe there are "CAP capable" systems out there.

Well I'm certainly impressed.

------
jongraehl
The article presents a really dumb argument: that because CAP only proves it's
impossible to have CP and AP for a single record, and we care about more than
one record, we shouldn't use the CAP impossibility result.

> The CAP system model is a single, read-write register – that’s all. For
> example, the CAP theorem says nothing about transactions that touch multiple
> objects: they are simply out of scope of the theorem, unless you can somehow
> reduce them down to a single register. [WHICH YOU CAN. DUH. SINGLE-REGISTER
> IMPOSSIBILITY OF CONSISTENCY APPLIES TO ALL N>=1]

------
vogan42
When a partition happens, you cannot communicate, you can choose to be
available, or consistent.

If you choose to be available, for all operations at all disconnected sites,
you may need to resolve conflicts in some fashion when or if the partition
resolves.

If you choose to be consistent, you may need to deny operations to some
portion of sites or for some portion of operations.

I find the ATM example to be very useful, I also try to whenever possible
understand the system in terms of ledgers and accounting...they have been
solving these problems also for a long time.

The formal definitions are very useful, and I would argue have unfortunately
been lost in the noise. People are often trying to understand and compare many
different and sometimes quickly evolving technologies, and these working
definitions and misinterpretations seem to be largely related to
misunderstandings or in some cases fit to the terminology of a specific vendor
(which can be very confusing!)

It is also very useful to test your use cases and what you can tolerate in
terms of consistency, latency, and availability with specific implementations.

I think a lot of the problem is that marketing material and documentation have
collided in a similar fashion to reporting and advertising.

The unfortunate message seems to be: buyer beware

I think a similar thing has happened probably many times before, relational
databases for instance and isolation levels are rarely discussed in terms of
the formal definitions and much more often the vendor specific definitions.

My maybe much weaker message/ take away would be: define the terms that may be
ambiguous so that you can get to the real fun of the ideas !

------
Dylan16807
It irks me whenever something trying to clarify CAP insists that you can't
choose CA. You can. In the real world you can't _avoid_ partitions, but you
don't have to _tolerate_ them.

The rule for Availability is that non-failing nodes must respond. If you
detect a partition, you can fail the node. Make a system that shuts down when
it loses a connection and you can be CA.

~~~
fred256
Maybe I'm missing the point you're trying to make, but a node that's shut down
isn't exactly available.

~~~
Dylan16807
It's not the one node that shuts down, the whole system flicks a switch and
ceases to be. It's incompatible with partitions.

So while you end up with something that's for most purposes worse than CP,
it's still a _possible_ configuration.

Availability does not require uptime. It only requires that if you are up, you
are fully functional.

