
CAP: Don't settle for eventual consistency - fanf2
https://yokota.blog/2017/02/17/dont-settle-for-eventual-consistency/
======
rdtsc
> CP systems can be made to be highly available in practice.

Yes, don't, if you happen to have Google-level infrastructure, complete with
everything from private fiber connections down to GPS-synchronized clocks in
your data centers. Otherwise sometimes you don't have a choice. Now, I am not
being sarcastic, as I understand Google offers Spanner as an API in GCP, so you
could technically do that by paying for it.

[https://cloud.google.com/spanner/](https://cloud.google.com/spanner/)

Also, in general it is important to keep in mind that keeping consistency in a
large distributed system is fighting against the laws of physics. There is
nothing wrong with fighting them, and it sure is fun to do, but it takes effort
and money. Often you can architect your solution to work with eventual
consistency. There are CRDTs and other things that can help there.
Eventually consistent systems have also been around for a while, so it is not
totally new and uncharted territory.
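
To make the CRDT mention a bit more concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class name and replica ids are made up for illustration and do not come from the article or any particular library.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, n=1):
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


# Two replicas accept writes independently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
```

Both replicas end up with the same value without coordinating on each write. The catch is that the data type has to be designed so that merges always commute.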

The bottom line is: consider the underlying assumptions. What is true for
Google may not be, and probably isn't, true for your use case.

~~~
canes123456
It's not clear to me what you are actually advocating for. The article is
pretty clear that most people should choose a CP database over an AP database.
This is true regardless of your scale unless you don't care about consistency
at all.

A startup does not need to make a CP system highly available, but they will
likely need consistency. They can probably make eventual consistency work
for them, but it does take effort and money. I don't see many cases where AP is
the right choice. Again, CRDTs are good, but you need to design carefully to
make sure your system can work with eventual consistency. Once you have done
that, you are only gaining availability during partitions, which does not
increase your uptime much. I would recommend a standalone database, a stock CP
database, or renting Spanner over an AP database if time and money are a
concern.

------
kefka
Nope. Eventual Consistency seems to be the best policy.

The constant "c" isn't changing (or if it is, it's not by much). That means
there's a definite minimum amount of time for a DB to become consistent, to
cover transit around the world. When the 3 DB machines are in the same rack,
sure, signals can approach the light-speed limit of about 11.8 inches per
nanosecond, but the latencies are still there.
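
A back-of-the-envelope sketch of that bound, in Python. The function name and the `medium_factor` parameter are made up for illustration. The 20,000 km figure is a rough half-circumference of the Earth, and a factor of roughly 2/3 for optical fiber is a commonly quoted approximation, not a measured value.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, metres per second
INCH_M = 0.0254          # metres per inch


def min_round_trip_ms(distance_km, medium_factor=1.0):
    """Lower bound on round-trip time in milliseconds over distance_km.

    medium_factor < 1 models a medium in which signals travel slower
    than light in vacuum (an assumption supplied by the caller).
    """
    one_way_s = (distance_km * 1000) / (C_M_PER_S * medium_factor)
    return 2 * one_way_s * 1000


inches_per_ns = C_M_PER_S / INCH_M / 1e9            # the "11.8 inches per ns" figure
half_earth_rtt = min_round_trip_ms(20_000)           # vacuum, straight line
half_earth_fiber = min_round_trip_ms(20_000, 2 / 3)  # assumed fiber slowdown
```

Even in vacuum the round trip to the far side of the planet is over 130 ms, and no protocol that needs an acknowledgement from there can do better.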

The larger and more global (and soon, interstellar) the system, the bigger
these c-related lags become. And it is glaringly obvious to me that "eventual
consistency" is the big solution here.

The only case where it doesn't seem to be good is transactional stuff like
banking and investments. Those cases seem to require CP and hardware like
atomic clocks to verify every copy of the DB is on the same page. Or perhaps
an internal blockchain would be more appropriate, since it enforces consensus.
But I digress on that one.

~~~
freneticfox
To go a little further, I suspect with CP cases like banks the real underlying
problem is more fundamental. You can't realistically, under the laws of
physics, have a hard notion of transactional ordering (did the account's money
come in before it went back out?) without pinning down the concept of an
account to a location. At least, not efficiently or quickly.

In other words, eventual consistency in the face of asynchronous remote actors
never makes sense when your requirements dictate hard, consistent
transactional ordering. You have to think of it as "the transactions happen in
the order they arrive at the account's virtual location in New York". To think
of them as globally-distributed in nature is always going to cause logical
problems at some level. If your database was eventually-consistent, you'd have
to build in some sort of after-the-fact safety checks that have the ability to
abort the outer transaction, at which point you've wasted a lot of effort
patching over the wrong model.

So either you have a need for strong consistency guarantees (order really
matters), in which case you have to pin the transactions' locality down (where
do they meet up at for their efficient strict ordering?) and CP is your model,
or you don't (simpler things like social network updates) and you're better
off with AP and eventual consistency to scale things out easier and make it
faster for everyone, and really who cares if once in a great long while a
user-visible race happens and some people see a couple of posts in a different
order than someone else does for a few minutes until things snap back into
sync?

~~~
pjc50
> You can't realistically, under the laws of physics, have a hard notion of
> transactional ordering (did the account's money come in before it went back
> out?) without pinning down the concept of an account to a location

You've just made me realise that the universe itself is only eventually
consistent - that's what all those weird quantum observer / wavefunction
collapse events are.

~~~
kefka
Yep, that's what I was trying to get at, but failed in my explanation.

Everything's eventually consistent on an ever-present sphere around the
incident expanding at the speed of light. No faster.

Even if the sun were to blink out of existence, we'd have 8 minutes that we
wouldn't know. Gravity would still be there, holding Earth in place. The light
would still warm us. Then 500 seconds later, darkness. We'd be flung out on a
tangential course.

~~~
jacobparker
This isn't quite right. Even with SR and the speed of light it's possible to
build consistent systems and achieve consensus. Not-eventually-consistent
doesn't mean instantaneous. The SoL just sets a lower bound on the speed of
consensus.

It's important to not overstate the importance of that bound, though.

~~~
kefka
Sure. CA fulfills that requirement. Of course, you throw away any semblance of
partition tolerance.

Of course, a single machine guarantees there can be no partitioning, and
really easy to obtain consensus. It might not be terribly fault-tolerant,
however.

~~~
jacobparker
CA is not really a valid/possible thing in the context of CAP. The original
phrasing of the theorem was poor and the "choose 2" myth persists. You can
choose to (or accidentally) give up C or A but you don't get to choose to not
have partitions. Not being partition tolerant doesn't really make sense
(you're just broken?) if partitions are going to happen. A better phrasing of
CAP is "in a network with partitions a distributed system cannot be both
consistent and available." (note: this doesn't guarantee that you are one of C
or A, you just can't be C _and_ A.) You can see that definition used in formal
treatments, e.g. Theorem 1 in
[https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-
cap.p...](https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.pdf)

(Briefly, note that the original article is critiquing that definition of
availability in practice which is legitimate but not relevant to this sub-
thread.)

What I'm saying is that EC is most definitely NOT a requirement of physics/the
speed of light (what your original post claimed.) The speed of light only sets
a (theoretical) limit on how fast you can implement a consistent system.

The original Paxos paper ("The part-time parliament") uses an analogy of a
quorum of parliamentarians occasionally getting together in the same
building and agreeing on something. Of course it being the same building is
arbitrary and doesn't actually matter, but it's easier to intuit that the
speed of light isn't an insurmountable road-block at that scale.
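
The quorum part of that analogy can be checked mechanically. The sketch below is not Paxos; it only illustrates the property majority-based consensus leans on, namely that any two majorities of the same group share at least one member, wherever the members happen to be. Function and node names are made up for illustration.

```python
from itertools import combinations


def majority(n):
    """Smallest group size that is more than half of n members."""
    return n // 2 + 1


def all_majorities_intersect(nodes):
    """Brute-force check that every pair of minimal majorities overlaps."""
    size = majority(len(nodes))
    quorums = [set(c) for c in combinations(nodes, size)]
    return all(a & b for a, b in combinations(quorums, 2))


parliament = ["a", "b", "c", "d", "e"]
quorum_size = majority(len(parliament))
overlap_holds = all_majorities_intersect(parliament)
```

Because of that overlap, a decision accepted by one majority is always visible to at least one member of any later majority. The speed of light only affects how long it takes to hear back from enough members.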

~~~
kefka
My comment was a bit tongue-in-cheek. Who actually runs a single DB server,
with no replication, no slaves, no nothing other than the primary?

CA is the correct way to understand a single un-replicated DB instance. And
it's really really wrong :)

------
gregmac
This seems to express the same opinion CockroachDB did a couple of days ago
in "The Limits of the CAP Theorem"[1].

Do both of these really boil down to misunderstanding the "A" in CAP? Until
the first time I actually used an AP system, I thought "A" was talking in the
sense of "five-9's highly available". I'm far from an expert on this, so
please correct me if I get this wrong, but here's my take-away from these two
articles:

Since CP systems can be made "highly available" (in the five-9's sense of the
term), I think what they are both saying is the only _real_ benefit of AP is
low latency.

CP still depends on some coordination, and despite tricks with locks and
consensus algorithms, it's often still limited (even for reads) by the highest
latency between nodes.

AP, on the other hand, can respond to both reads and writes much faster because
it doesn't need to care about the consistency -- the trade-off is the
application must handle eventual consistency.

AP can also handle certain types of network partitions that "highly-available"
CP systems can't (when DB nodes can't all talk to each other, but are still
able to talk to client(s)) but in practice that type of failure almost never
happens (at least not when your DBs all live in datacenters), so it's not a
good reason to choose AP over CP.

Also, not all CP systems _can_ be made "highly available" (in the sense of
five-9's), so it's not always an apples-to-apples comparison, and that I think
causes a lot of confusion.

(Again: Not an expert. Please correct me if I got anything wrong here.)

[1]
[https://news.ycombinator.com/item?id=14646063](https://news.ycombinator.com/item?id=14646063)

~~~
jeffffff
Every part of the CAP theorem is commonly misunderstood. Availability is
probably the most commonly misunderstood aspect as is outlined in this
article. People commonly conflate consistency with durability when really CAP
says nothing about durability because CAP is based on a model where the only
type of failure is a network partition. And of course there are the people who
claim network partitions don't happen in their network so they can choose CA.

CAP is also not stated with a sharded database in mind. Availability is stated
as follows:

"For a distributed system to be continuously available, every request received
by a non-failing node in the system must result in a response"

This means that if you have 100 machines and you store each row on 3 of those,
then even if the 3 machines storing a row are partitioned from the client, the
client must still be able to read and write the row on any machine it can
reach. That machine must respond
with something like "the row doesn't exist" or "the row is empty" if queried
about the contents. This quickly devolves into an unusable system if you
follow this train of thought to its conclusion. No practical sharded database
is truly AP under this definition and it only makes sense to talk about CAP in
the context of a single shard of a sharded database (which can be AP).

It is actually quite easy to have consistency without sacrificing latency:
Don't replicate. What you are sacrificing in this scenario isn't consistency,
it's durability, which is totally ignored by CAP. Where the latency is really
introduced is in trying to achieve durability. A durable AP system will end up
with close to the same write latency as an equivalently durable CP system, but
no one bothers to build AP systems with the same level of durability as a CP
system. It is possible to build a durable CP system where reads are just as
fast as in any AP system as long as there are no partitions at the time of the
read. The trick is to use read leases so that reads only have to touch one
machine.
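
A rough sketch of the read-lease idea, assuming a single leaseholder and a bounded clock drift. The class, the `grant_lease` hook and the drift parameter are hypothetical names for illustration; a real system would obtain the lease from a quorum of replicas, which is not modelled here.

```python
class LeaseExpired(Exception):
    pass


class LeaseHolder:
    """Replica that serves reads locally only while its lease is valid."""

    def __init__(self, clock, lease_seconds, max_clock_drift):
        self.clock = clock  # callable returning the current time in seconds
        self.lease_seconds = lease_seconds
        self.max_clock_drift = max_clock_drift
        self.lease_expiry = 0.0
        self.data = {}

    def grant_lease(self):
        # Hypothetical hook: stands in for a quorum of replicas promising
        # not to accept writes from anyone else until the lease expires.
        self.lease_expiry = self.clock() + self.lease_seconds

    def can_read_locally(self):
        # Give up early by the drift bound so a slow local clock cannot
        # stretch the lease past what the other replicas agreed to.
        return self.clock() < self.lease_expiry - self.max_clock_drift

    def read(self, key):
        if not self.can_read_locally():
            raise LeaseExpired("renew the lease via a quorum before reading")
        return self.data.get(key)  # touches only this machine


now = [100.0]  # fake clock so the example is deterministic
holder = LeaseHolder(lambda: now[0], lease_seconds=10.0, max_clock_drift=1.0)
holder.data["row"] = "value"
holder.grant_lease()
fresh_read = holder.read("row")  # served locally, no network round trip

now[0] = 109.5  # inside the drift margin before expiry
try:
    holder.read("row")
    expired = False
except LeaseExpired:
    expired = True
```

The read path touches one machine as long as the lease holds. The cost shows up during a partition, when writes elsewhere must wait for the lease to run out.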

The CAP theorem is nowhere near as important as many people think it is with
regards to performance and uptime of real world systems. It is a theoretical
result based on a very simplified model.

The only thing AP buys you is handling certain types of network partitions
that almost never happen, as you said. AP systems have better uptime when you
can talk to some machines but fewer than a quorum. This is not common.

The real reason AP systems are common is because consistency+durability is
really really really hard.

~~~
canes123456
If you are not replicating, then it's not a distributed system and CAP doesn't
apply. Sharding just gives you two centralized databases. Talking about
durability is just adding more confusion IMO.

------
justinsaccount
Why does this read to me as an advertisement for cloud spanner?

~~~
idibidiart
or Spanner-like guarantees which I heard might be possible with CockroachDB.
I'm not really sure how the two compare, and whether or not there is anything
out there with similar strong consistency guarantees.

~~~
tyingq
There's a pretty good blog post from CockroachDB that explains the tradeoffs
for not having the same level of time synchronization as Spanner.

[https://www.cockroachlabs.com/blog/living-without-atomic-
clo...](https://www.cockroachlabs.com/blog/living-without-atomic-clocks/)

CockroachDB is, though, only in its first "real" 1.0 release. You see the
sort of bugs you would expect to see in their repo. That is, basically, it's
not really suitable for anything important yet. They do seem to be making a
lot of progress though.

~~~
idibidiart
Right. There is no GPS sync'd atomic clock in Cockroach so I imagine lock
requests for a distributed object must use some consensus protocol, and that's
overhead. Am I off by a lot?

~~~
irfansharif
We use Raft underneath the hood to maintain consistency across the replicas.

re: 'There is no GPS sync'd atomic clock in Cockroach', Cockroach actually has
a command line flag (--linearizable) which makes it behave like Spanner: wait
for the max clock offset before returning a successful commit[1]. This isn’t
practical if you’re using NTP synchronization, but if you have atomic clocks
at your disposal..
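
To illustrate the "wait for the max clock offset before returning" idea in isolation, here is a minimal sketch. It is not CockroachDB's or Spanner's code; `commit_with_wait` and its parameters are made-up names, and the clock and sleep functions are injected so the example runs deterministically.

```python
import time


def commit_with_wait(apply_write, max_clock_offset_s,
                     now=time.time, sleep=time.sleep):
    """Apply a write, then wait out the clock uncertainty before acknowledging."""
    commit_ts = now()
    apply_write(commit_ts)
    # Once local time passes commit_ts + max_clock_offset_s, any node whose
    # clock is within that offset of ours reads a time later than commit_ts.
    remaining = commit_ts + max_clock_offset_s - now()
    if remaining > 0:
        sleep(remaining)
    return commit_ts


# Deterministic demo with a fake clock instead of real sleeping.
fake_time = [0.0]
slept = []
written = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds


ts = commit_with_wait(written.append, 0.25,
                      now=lambda: fake_time[0], sleep=fake_sleep)
```

Every commit pays the full offset as added latency. That is tolerable when the bound is a few milliseconds and painful when it is the much looser bound you get from NTP.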

[1]: [https://www.cockroachlabs.com/blog/living-without-atomic-
clo...](https://www.cockroachlabs.com/blog/living-without-atomic-clocks/)

------
mnarayan01
I think the point of this post is: You can use
[https://cloud.google.com/spanner/](https://cloud.google.com/spanner/) to work
with a CP model (typically much less of a headache than AP), while sacrificing
so little availability that it's essentially CAP.

The big concern I'd have (assuming using a Google-hosted database was
practical) is the SLA. Unless I'm misreading
[https://cloud.google.com/spanner/sla](https://cloud.google.com/spanner/sla),
it seems like the SLA is...not very strong. Given how they discuss
availability elsewhere, it seems like they're totally unwilling to put their
money anywhere close to where their mouth is.

That said, it does seem like going with Spanner to have the ease (and power)
of consistency while also having reliability and scalability would be
something to consider in a whole lot of situations. (Though I'd be reluctant
to jump on it _this_ early.)

------
eternalban
(Unmediated) Human personal perception has always been CP. But then we humans
are akin to mobile agents that visit co-local (thus non-partitionable) data
spaces and perceive a coherent 'classical' (as in Physics) world that is
equally available.

But AP is (by necessity) baked into the story of collective human perception.
Collectively, we are more akin to static clusters that communicate information
_about_ data spaces.

("Your full deposit will be available to you the following business day. Your
book balance is x. Your available balance is y < x." Dealing with AP is a day
to day common experience as old as organized hills.)

I think the root of the surprising fact that programmers find it difficult to
reason about AP data spaces (given the _fact_ that that is how we humans
achieve 'civilization' and 'culture') is due to the personal perspective
nature of iterative programming. We see this again when we consider the quite
related issue of correct understanding of memory models.

------
urethrafranklin
Transactions are an answer for atomicity, not consistency. It's possible to
have transactions and not be consistent. Hopefully this is just the mistake of
a non-technical product manager and not the engineering team that works on
Cloud Spanner.

------
StreamBright
Sounds nice in theory, I would like to see a few years of history of production
systems using Spanner before we have enough evidence that nothing goes wrong
with this approach.

~~~
vgt
While Spanner has only been available to be used outside of Google very
recently (GA in May), it's been in production at Google across hundreds of
products, including AdWords, since 2012.. with global consistency to boot...

(work at G)

~~~
anarazel
Isn't that global consistency in effect frequently weakened because
applications have to batch writes to get throughput? It's great to be able to
have consistency, but in many other cases the source of inconsistency will
largely just be moved outside of the data management system.

------
marknadal
Sigh, quoting from an earlier comment I made here:
([https://news.ycombinator.com/item?id=14648745](https://news.ycombinator.com/item?id=14648745))

It is true that if you assume your client app is not important that a CP
system is the right choice. And I would also say this /was/ true up till about
2004 when Gmail was released. But it definitely stopped being true in 2007
when the iPhone was released and you started having installed apps.

Since then, users have slowly grown to expect both mobile apps and SPAs to
work regardless of whether the servers work, regardless of load balancers,
regardless of connectivity.

If you look at the market trends, things are increasingly going in this
direction. From self-driving cars, to IoT devices, to drone delivery, to even
traditionally server-dependent productivity tools like gDocs and others -
people need to get work done even if the internet to your server doesn't
exist.

Will banking applications still need mostly server-dependent behavior? Yes. Is
CP still important? Yes. But it is biased to say that CP systems are better.
Choose the right tool for the right job. CP systems are definitely the right
choice for a strongly consistent database, but they aren't the right choice
for everything. My database is an AP system, but it should not be used for
many apps out there. Neither of these is "better", they are just tradeoffs
you have to decide upon.

~~~
aidenn0
If your client has to work when the database isn't available anyways, then
isn't that an even stronger argument for a CP database?

~~~
naasking
To be CP, your client and the server database have to both view the same
consistent data, so your client can't be available if the database isn't
available.

If your client is available even in this case, then you have some kind of
eventual consistency.

~~~
aidenn0
The article was specifically comparing database software, not entire systems;
if your client is available when the DB is unavailable, then you can have an
AP system regardless of the design of the database.

~~~
ProblemFactory
I think the argument is that people expect mobile and client-side web apps to
continue working offline, and eventually sync and merge changes across
clients. So you need a good solution for resolving inconsistent updates made
by two offline clients. The same solution could be used for resolving
inconsistent updates made on two database servers.
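
One simple (and lossy) way to do that kind of merge is last-writer-wins per field. The sketch below assumes each write carries a timestamp and a client id; the function name, the tuple layout and the sample data are made up for illustration, and the timestamps are assumed to be comparable across clients, which is itself a strong assumption.

```python
def merge_lww(local, remote):
    """Merge two {key: (timestamp, client_id, value)} maps, newest write wins."""
    merged = dict(local)
    for key, entry in remote.items():
        # Compare (timestamp, client_id) only: the client id breaks timestamp
        # ties, so both sides pick the same winner whatever the merge order.
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Two clients edited the same document while offline.
phone = {
    "title": (5, "phone", "Draft v2"),
    "body": (3, "phone", "hello"),
}
laptop = {
    "title": (4, "laptop", "Draft"),
    "tags": (6, "laptop", ["cap"]),
}

from_phone = merge_lww(phone, laptop)
from_laptop = merge_lww(laptop, phone)
```

Both sides converge on the same state, but the laptop's title edit is silently dropped. Whether that is acceptable depends on the app, and the same tradeoff applies whether the two writers are offline clients or database servers.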

~~~
naasking
> So you need a good solution for resolving inconsistent updates made by two
> offline clients. The same solution could be used for resolving inconsistent
> updates made on two database servers.

Exactly what I was trying to get at, thanks. The line between client state,
server state and db state is a thin illusion. As REST makes explicit, every
client operates on a _representation_ of some server resources, which means
CAP can come into play in even the most trivial online apps.

