CAP: Don't settle for eventual consistency (yokota.blog)
89 points by fanf2 on June 28, 2017 | 44 comments



> CP systems can be made to be highly available in practice.

Yes, don't settle for eventual consistency, if you happen to have Google-level infrastructure: complete with private fiber connections down to GPS-synchronized clocks in your data centers. Otherwise, sometimes you don't have a choice. Now I am not being sarcastic, as I understand Google offers Spanner as an API in GCP, so you could technically do that by paying for it.

https://cloud.google.com/spanner/

Also, in general it is important to keep in mind that maintaining consistency in a large distributed system means fighting against the laws of physics. There is nothing wrong with fighting them, and it sure is fun to do, but it takes effort and money. Often it is possible to architect your solution to work with eventual consistency. There are CRDTs and other things that can help there. Eventually consistent systems have also been around for a while, so it is not totally new and uncharted territory.
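
To make the CRDT point concrete, here's a minimal sketch of the classic grow-only counter (illustrative only, not any particular library's API): each replica only ever increments its own slot, and merging is a pointwise max, so replicas can keep accepting writes during a partition and still converge once they sync.

    from collections import defaultdict

    class GCounter:
        """Grow-only counter CRDT: one slot per replica, merge = pointwise max."""
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = defaultdict(int)

        def increment(self, n=1):
            # Each replica only bumps its own slot, so concurrent increments
            # on different replicas never conflict.
            self.counts[self.replica_id] += n

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            # Merge is commutative, associative and idempotent, so replicas
            # converge regardless of how often or in what order they sync.
            for rid, n in other.counts.items():
                self.counts[rid] = max(self.counts[rid], n)

    # Two replicas accept increments while partitioned, then converge on merge.
    a, b = GCounter("a"), GCounter("b")
    a.increment(3); b.increment(2)
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 5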

The bottom line is: consider the underlying assumptions. What is true for Google may not be, and probably isn't, true for your use case.


It's not clear to me what you are actually advocating for. The article is pretty clear that most people should choose a CP database over an AP database. This is true regardless of your scale, unless you don't care about consistency at all.

A startup does not need to make a CP system highly available, but it will likely need consistency. It can probably make eventual consistency work, but that does take effort and money. I don't see many cases where AP is the right choice. Again, CRDTs are good, but you need to design carefully to make sure your system can work with eventual consistency. Once you have done that, you are only gaining availability during partitions, which does not increase your uptime much. If time and money are a concern, I would recommend a standalone database, a stock CP database, or renting Spanner over an AP system.


Nope. Eventual Consistency seems to be the best policy.

The constant "C" isn't changing (or if it is, it's not much). that means there's a definite amount of time for a DB to be consistent to cover transit around the world. When the 3 DB machines are in the same rack, sure, we can approach 11.8"/nanosecond latencies, but the latencies are still there.

The larger and more global (and soon, interstellar) the system, the larger these C-related lags become. And it is glaringly obvious to me that "eventual consistency" is the big solution here.
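
Back of the envelope (the distance and fiber factor below are just illustrative assumptions): a single coordination round trip between two sites can never beat the light-speed bound, no matter how clever the software is.

    # Lower bound on cross-datacenter round-trip time imposed by C.
    # Distance and fiber factor are illustrative assumptions, not measurements.
    C = 299_792_458          # speed of light in vacuum, m/s
    FIBER_FACTOR = 0.67      # light in fiber travels at roughly 2/3 c

    def min_round_trip_ms(distance_m, in_fiber=True):
        speed = C * FIBER_FACTOR if in_fiber else C
        return 2 * distance_m / speed * 1000

    ny_to_london_m = 5_570_000  # rough great-circle distance
    print(min_round_trip_ms(ny_to_london_m, in_fiber=False))  # ~37 ms in vacuum
    print(min_round_trip_ms(ny_to_london_m))                  # ~55 ms in fiber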

The only case where it doesn't seem to be good is transactional stuff like banking and investments. Those cases seem to require CP and hardware like atomic clocks to verify every copy of the DB is on the same page. Or perhaps an internal blockchain would be more appropriate, since it enforces consensus. But I digress on that one.


To go a little further, I suspect with CP cases like banks the real underlying problem is more fundamental. You can't realistically, under the laws of physics, have a hard notion of transactional ordering (did the account's money come in before it went back out?) without pinning down the concept of an account to a location. At least, not efficiently or quickly.

In other words, eventual consistency in the face of asynchronous remote actors never makes sense when your requirements dictate hard, consistent transactional ordering. You have to think of it as "the transactions happen in the order they arrive at the account's virtual location in New York". To think of them as globally-distributed in nature is always going to cause logical problems at some level. If your database was eventually-consistent, you'd have to build in some sort of after-the-fact safety checks that have the ability to abort the outer transaction, at which point you've wasted a lot of effort patching over the wrong model.

So either you have a need for strong consistency guarantees (order really matters), in which case you have to pin the transactions' locality down (where do they meet up for their efficient strict ordering?) and CP is your model; or you don't (simpler things like social network updates), in which case you're better off with AP and eventual consistency, which scale out more easily and make things faster for everyone. And really, who cares if once in a great while a user-visible race happens and some people see a couple of posts in a different order than someone else does for a few minutes, until things snap back into sync?


> You can't realistically, under the laws of physics, have a hard notion of transactional ordering (did the account's money come in before it went back out?) without pinning down the concept of an account to a location

You've just made me realise that the universe itself is only eventually consistent - that's what all those weird quantum observer / wavefunction collapse events are.


Yep, that's what I was trying to get at, but failed in my explanation.

Everything's eventually consistent on a sphere around the event, expanding outward at the speed of light. No faster.

Even if the sun were to blink out of existence, there'd be about 8 minutes during which we wouldn't know. Gravity would still be there, holding Earth in place. The light would still warm us. Then, roughly 500 seconds after the event, darkness, and we'd be flung off on a tangential course.


This isn't quite right. Even with SR and the speed of light it's possible to build consistent systems and achieve consensus. Not-eventually-consistent doesn't mean instantaneous. The SoL just sets a lower bound on the latency of consensus.

It's important to not overstate the importance of that bound, though.


Sure. CA fulfills that requirement. Of course, you throw away any semblance of partition tolerance.

Of course, a single machine guarantees there can be no partitioning, and makes it really easy to obtain consensus. It might not be terribly fault-tolerant, however.


CA is not really a valid/possible thing in the context of CAP. The original phrasing of the theorem was poor and the "choose 2" myth persists. You can choose to (or accidentally) give up C or A, but you don't get to choose not to have partitions. Not being partition tolerant doesn't really make sense (you're just broken?) if partitions are going to happen. A better phrasing of CAP is "in a network with partitions, a distributed system cannot be both consistent and available." (Note: this doesn't guarantee that you are one of C or A; you just can't be both C and A.) You can see that definition used in formal treatments, e.g. Theorem 1 in https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.p...

(Briefly, note that the original article is critiquing that definition of availability in practice which is legitimate but not relevant to this sub-thread.)

What I'm saying is that EC is most definitely NOT a requirement of physics/the speed of light (as your original post claimed). The speed of light only sets a (theoretical) limit on how fast you can implement a consistent system.

The original Paxos paper ("The Part-Time Parliament") uses an analogy of a quorum of parliamentarians occasionally getting together in the same building and agreeing on something. Of course, its being the same building is arbitrary and doesn't actually matter, but it's easier to intuit that the speed of light isn't an insurmountable roadblock at that scale.


My comment was a bit tongue-in-cheek. Who actually runs a single DB server, with no replication, no slaves, no nothing other than the primary?

CA is the correct way to understand a single un-replicated DB instance. And it's really really wrong :)


Special relativity and the speed of light impose some fundamental lower bounds on the cost of consistency: https://en.wikipedia.org/wiki/Relativity_of_simultaneity . This theoretically manifests as lower bounds on the performance of consistency in distributed systems.


> The larger and more global (and soon, interstellar)

Soon, interstellar? That's...optimistic, to say the least.


I don't think so. It's not hard to see the beginnings of satellite communication platforms around the Moon and Mars. And once we start looking at the local solar system, we start dealing with C being a significant source of lag.

For example, communications to and from Mars can take upwards of 20 minutes, depending on where the Earth is in relation to Mars. If this doesn't call for eventual consistency, I'm not sure what would.

The farther humanity spreads, the more C is something we'll have to take into consideration for our communication mediums. Planet-wide comms still feel instant, but that's just on our own ball of rock.


> I don't think so. It's not hard to see the beginnings of satellite communication platforms around the Moon and Mars.

That's not even remotely close to interstellar.

> And once we start looking at the local solar system, we start dealing with C being a significant source of lag

Sure, interplanetary has substantial impact from light-speed delays.

I'm just saying that pointing to imminent interstellar scope is wildly improbable.


You mean interplanetary. It will be a while until Google has nodes orbiting Alpha Centauri.


https://en.wikipedia.org/wiki/Interplanetary_Internet

https://www.wired.com/2013/05/vint-cerf-interplanetary-inter...

Even if you discount manned travel, we have an entire solar system full of humanity's remote sensing gear in flight, hence the need for an IP-ish store-and-forward network.


It seems like this takes the same position as CockroachDB did a couple of days ago in "The Limits of the CAP Theorem"[1].

Do both of these really boil down to misunderstanding the "A" in CAP? Until the first time I actually used an AP system, I thought "A" was talking in the sense of "five-9's highly available". I'm far from an expert on this, so please correct me if I get this wrong, but here's my take-away from these two articles:

Since CP systems can be made "highly available" (in the five-9's sense of the term), I think what they are both saying is the only real benefit of AP is low latency.

CP still depends on some coordination, and despite tricks with locks and consensus algorithms, it's often still limited (even for reads) by the highest latency between nodes.

AP, on the other hand, can respond to both reads and writes much faster because it doesn't need to care about consistency -- the trade-off is that the application must handle eventual consistency.

AP can also handle certain types of network partitions that "highly-available" CP systems can't (when DB nodes can't all talk to each other, but are still able to talk to client(s)) but in practice that type of failure almost never happens (at least not when your DBs all live in datacenters), so it's not a good reason to choose AP over CP.

Also, not all CP systems can be made "highly available" (in the sense of five-9's), so it's not always an apples-to-apples comparison, and that I think causes a lot of confusion.

(Again: Not an expert. Please correct me if I got anything wrong here.)

[1] https://news.ycombinator.com/item?id=14646063


Every part of the CAP theorem is commonly misunderstood. Availability is probably the most commonly misunderstood aspect, as is outlined in this article. People commonly conflate consistency with durability, when really CAP says nothing about durability, because CAP is based on a model where the only type of failure is a network partition. And of course there are the people who claim network partitions don't happen in their network so they can choose CA.

CAP is also not stated with a sharded database in mind. Availability is stated as follows:

"For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response"

This means that if you have 100 machines and you store each row on 3 of those, and the 3 machines storing a row are partitioned away from the client, then the client must still be able to read and write the row on any other machine. That machine must respond with something like "the row doesn't exist" or "the row is empty" if queried about its contents. This quickly devolves into an unusable system if you follow the train of thought to its conclusion. No practical sharded database is truly AP under this definition, and it only makes sense to talk about CAP in the context of a single shard of a sharded database (which can be AP).

It is actually quite easy to have consistency without sacrificing latency: Don't replicate. What you are sacrificing in this scenario isn't consistency, it's durability, which is totally ignored by CAP. Where the latency is really introduced is in trying to achieve durability. A durable AP system will end up with close to the same write latency as an equivalently durable CP system, but no one bothers to build AP systems with the same level of durability as a CP system. It is possible to build a durable CP system where reads are just as fast as in any AP system as long as there are no partitions at the time of the read. The trick is to use read leases so that reads only have to touch one machine.
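
Roughly, the read-lease trick looks like this (a sketch with made-up names, not any specific database's implementation): a quorum grants the leader a time-bounded lease; while the lease is still valid, allowing for an assumed bound on clock error, no other leader can have committed newer data, so the leader can answer reads from local state without another quorum round trip.

    import time

    MAX_CLOCK_ERROR = 0.25  # seconds; assumed bound on clock error between nodes

    class Leader:
        def __init__(self):
            self.lease_expiry = 0.0   # set when a quorum grants the lease
            self.data = {}

        def renew_lease(self, duration=2.0):
            # In a real system this would require acknowledgements from a quorum.
            self.lease_expiry = time.monotonic() + duration

        def read(self, key):
            # Treat the lease as expired early, by the assumed clock error, to stay safe.
            if time.monotonic() >= self.lease_expiry - MAX_CLOCK_ERROR:
                raise RuntimeError("lease expired: fall back to a quorum read")
            # Lease still valid: serve the read locally, touching only one machine.
            return self.data.get(key)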

The CAP theorem is nowhere near as important as many people think it is with regards to performance and uptime of real world systems. It is a theoretical result based on a very simplified model.

The only thing AP buys you is handling certain types of network partitions that, as you said, almost never happen. AP systems have better uptime when you can talk to some machines but fewer than a quorum. This is not common.

The real reason AP systems are common is because consistency+durability is really really really hard.


If you are not replicating, then it's not a distributed system and CAP doesn't apply. Sharding just gives you two centralized databases. Talking about durability is just adding more confusion IMO.


Why does this read to me as an advertisement for cloud spanner?


Or Spanner-like guarantees, which I've heard might be possible with CockroachDB. I'm not really sure how the two compare, or whether there is anything else out there with similarly strong consistency guarantees.


There's a pretty good blog post from CockroachDB that explains the tradeoffs for not having the same level of time synchronization as spanner.

https://www.cockroachlabs.com/blog/living-without-atomic-clo...

CockroachDB is, though, only in its first "real" 1.0 release. You see the sort of bugs you would expect to see in their repo. That is, basically, it's not really suitable for anything important yet. They do seem to be making a lot of progress, though.


Right. There is no GPS sync'd atomic clock in Cockroach so I imagine lock requests for a distributed object must use some consensus protocol, and that's overhead. Am I off by a lot?


We use Raft underneath the hood to maintain consistency across the replicas.

re: 'There is no GPS sync'd atomic clock in Cockroach', Cockroach actually has a command line flag (--linearizable) which makes it behave like Spanner: wait for the max clock offset before returning a successful commit[1]. This isn’t practical if you’re using NTP synchronization, but if you have atomic clocks at your disposal..

[1]: https://www.cockroachlabs.com/blog/living-without-atomic-clo...
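
The commit-wait idea behind that flag, roughly (a sketch, not actual CockroachDB or Spanner code): after choosing a commit timestamp, hold the acknowledgement until the maximum clock offset has elapsed, so by the time the client learns the commit succeeded, no node's clock can still place that timestamp in the future.

    import time

    # With atomic clocks the max offset can be a few ms; with plain NTP it is
    # much larger, which is why this isn't practical without better clocks.
    MAX_CLOCK_OFFSET = 0.007  # seconds; assumed uncertainty bound

    def commit(apply_write):
        commit_ts = time.time()
        apply_write(commit_ts)              # replicate and apply the write
        wait = (commit_ts + MAX_CLOCK_OFFSET) - time.time()
        if wait > 0:
            time.sleep(wait)                # "commit wait": the price of linearizability
        return commit_ts                    # only now acknowledge to the client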


"Spanner always waits on writes for a short interval, whereas CockroachDB sometimes waits on reads for a longer interval."

Where interval is clock drift.


Just an important point: the guarantees CockroachDB and Google Cloud Spanner provide are slightly different. Cockroach provides serializability while Spanner provides linearizability.


I think the point of this post is: You can use https://cloud.google.com/spanner/ to work with a CP model (typically much less of a headache than AP), while sacrificing so little availability that it's essentially CAP.

The big concern I'd have (assuming using a Google-hosted database was practical) is the SLA. Unless I'm misreading https://cloud.google.com/spanner/sla, it seems like the SLA is...not very strong. Given how they discuss availability elsewhere, it seems like they're totally unwilling to put their money anywhere close to where their mouth is.

That said, it does seem like going with Spanner to have the ease (and power) of consistency while also having reliability and scalability would be something to consider in a whole lot of situations. (Though I'd be reluctant to jump on it this early.)


(Unmediated) Human personal perception has always been CP. But then we humans are akin to mobile agents that visit co-local (thus non-partitionable) data spaces and perceive a coherent 'classical' (as in Physics) world that is equally available.

But AP is (by necessity) baked into the story of collective human perception. Collectively, we are more akin to static clusters that communicate information about data spaces.

("Your full deposite will be available to you the following business day. Your book balance is x. Your available balance is y < x." Dealing with AP is a day to day common experience as old as organized hills.)

I think the root of the surprising fact that programmers find it difficult to reason about AP data spaces (given that that is how we humans achieve 'civilization' and 'culture') is the personal-perspective nature of iterative programming. We see this again when we consider the closely related issue of correctly understanding memory models.


Transactions are an answer for atomicity, not consistency. It's possible to have transactions and not be consistent. Hopefully this is just the mistake of a non-technical product manager and not the engineering team that works on Cloud Spanner.


Sounds nice in theory; I would like to see a few years of history of production systems using Spanner before we have enough evidence that nothing goes wrong with this approach.


While Spanner has only very recently become available for use outside of Google (GA in May), it's been in production at Google across hundreds of products, including AdWords, since 2012... with global consistency to boot.

(work at G)


Isn't that global consistency in effect frequently weakened because applications have to batch writes to get throughput? It's great to be able to have consistency, but in many cases the source of inconsistency will largely just be moved outside of the data management system.


Sorry, I don't have access to your internal incident tracker to draw any conclusion here. We'll see in a few years how this works out for external companies.


Plenty of Google's products use Spanner and have for years.


Sigh, quoting from an earlier comment I made here: (https://news.ycombinator.com/item?id=14648745)

It is true that if you assume your client app is not important, a CP system is the right choice. And I would also say this /was/ true up till about 2004, when Gmail was released. But it definitely stopped being true in 2007, when the iPhone was released and you started having installed apps.

Since then, users have slowly grown to expect both mobile apps and SPAs to work regardless of whether the servers work, regardless of load balancers, regardless of connectivity.

If you look at the market trends, things are increasingly going in this direction. From self-driving cars, to IoT devices, to drone delivery, to even traditionally server-dependent productivity tools like gDocs and others - people need to get work done even if the internet to your server doesn't exist.

Will banking applications still need mostly server-dependent behavior? Yes. Is CP still important? Yes. But it is biased to say that CP systems are better. Choose the right tool for the right job. CP systems are definitely the right choice for a strongly consistent database, but they aren't the right choice for everything. My database is an AP system, but it should not be used for many apps out there. Neither of these are "better", they are just tradeoffs you have to decide upon.


If your client has to work when the database isn't available anyways, then isn't that an even stronger argument for a CP database?


To be CP, your client and the server database have to both view the same consistent data, so your client can't be available if the database isn't available.

If your client is available even in this case, then you have some kind of eventual consistency.


The article was specifically comparing database software, not entire systems; if your client is available when the DB is unavailable, then you can have an AP system regardless of the design of the database.


I think the argument is that people expect mobile and client-side web apps to continue working offline, and eventually sync and merge changes across clients. So you need a good solution for resolving inconsistent updates made by two offline clients. The same solution could be used for resolving inconsistent updates made on two database servers.
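
One simple (and lossy) version of such a resolution rule is last-writer-wins per key, sketched below with made-up names; CRDTs generalize this so that merges stay deterministic no matter the order in which clients or replicas sync.

    # Minimal last-writer-wins merge of per-key updates from two offline clients
    # (illustrative sketch, not any specific framework's API). Each value carries
    # a (wall_clock, client_id) stamp so exact-time ties still resolve the same
    # way everywhere; the same rule works between two database replicas.

    def merge(state_a, state_b):
        merged = dict(state_a)
        for key, (value, stamp) in state_b.items():
            if key not in merged or stamp > merged[key][1]:
                merged[key] = (value, stamp)
        return merged

    client_a = {"title": ("Drafts", (1625000000.0, "a"))}
    client_b = {"title": ("Draft notes", (1625000050.0, "b")),
                "tags": (["todo"], (1625000020.0, "b"))}
    print(merge(client_a, client_b))  # client_b's later edit to "title" wins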


> So you need a good solution for resolving inconsistent updates made by two offline clients. The same solution could be used for resolving inconsistent updates made on two database servers.

Exactly what I was trying to get at, thanks. The line between client state, server state and db state is a thin illusion. As REST makes explicit, every client operates on a representation of some server resources, which means CAP can come into play in even the most trivial online apps.


This is so beautifully said, I wish I could have been that concise. Thank you for this comment, very well worded. That is exactly what I'm trying to do with my own system, for anybody curious: https://github.com/amark/gun (MIT / ZLIB / Apache 2 licensed).


Okay, that makes sense.


A CP system is about not working when it is not available; otherwise you have no consistency at all, and it is in fact very important to consider the whole system, not only the database part. Eventual consistency lets you fetch and store a tiny view of the database on the client and allows the user to work entirely within local storage, synchronizing with the global storage only when an internet connection is available. That enables a fast and always-available user experience.


This! I cannot stress this enough: algorithms, languages and paradigms are just tools; some are suitable for a certain job and some are not. How do you choose a language to complete a task? It depends on the task at hand. When handling money you must be consistent, but in some other cases eventual consistency is just good enough. When building a server, should I kick all users out just because we've lost a couple of nodes in the cluster? It always depends; it's always a trade-off.



