Hacker News new | comments | show | ask | jobs | submit login
A Critique of the CAP Theorem (arxiv.org)
100 points by yarapavan on Sept 21, 2015 | hide | past | web | favorite | 39 comments

Indeed, applying the CAP theorem to real-world databases makes no sense, because the CAP definition of "available" is unnecessarily restrictive.

The real tradeoff is much simpler: if you want a consistent system, it will be slower and more expensive.

Regarding availability in a consistent system, as long as more than half of the servers are working and connected with each other, they will be able to elect a leader and function in a consistent way (using the Raft protocol for instance). Now as long as a client can connect to at least 1/k of the servers in the majority it can just keep trying connecting to random servers until it finds a reachable one in the functioning majority in time proportional to k.

For systems within a single datacenter, redundant networking makes partitions almost impossible, and having enough hosts makes losing more than half almost impossible, so the only failure mode is losing the whole datacenter. For systems distributed among multiple datacenters, availability is almost guaranteed unless a global catastrophe causes a global Internet partition (such as Eurasia and America no longer being connected).

The only issues, again, are that it's more expensive because you need more nodes and redundant networking and that it's slower because nodes have to communicate among each other before commits can be confirmed, especially if that needs to be done across datacenters. In particular, write throughput does not necessarily scale with more nodes in a consistent system, since all writes could modify the same value (after reading it) and that's not parallelizable in the general case.

For systems within a single datacenter, redundant networking makes partitions almost impossible,

derp, nope. No amount of redundant networking can avoid administrator mistakes or accidental "our redundancy all runs through the same conduit and ninjas chopped it in half during a new buildout."

better researched anecdotes: https://aphyr.com/posts/288-the-network-is-reliable

enough hosts makes losing more than half almost impossible

Losing network connectivity is indistinguishable from losing hosts though, and losing hosts is indistinguishable from your application just not responding any longer (dumb programming language GC pauses? kernel panic and halted? infinite docker deploy loop? SSD slowly failing and now writes take 1,000,000 times longer than before? Y2038 bug? All applications run on the same timer or compaction schedule and decide to blip for the exact same 30 seconds?). So, the failure mode for hosts is your network failure probability and your host failure probability, and host failure probability includes software failures.

>derp, nope.

Your post was full of good information. why did you have to start it by being an asshole?

Intractable character flaws.

more derp available at http://themajestichusky.tumblr.com/archive

> "our redundancy all runs through the same conduit and ninjas chopped it in half during a new buildout."

You can just consider that as a failure of the whole datacenter.

The ninjas could just as well chop all the external network connections, which would result in actual datacenter failure (from a client/service PoV), so it shouldn't increase the rate that much.

That's not the same thing. A network partition is not equivalent to the the whole datacenter being off nice and clean. That would be nice actually, because it is an and easily testable failure mode. Network partition due to misconfiguration will happen and they have verious interesting corner cases -- multiple plartitions or say partitions between servers but not between clients (clients see the servers, server don't see each other).

You can of course say "it will never happen" and just let chance decide what happens do the data in case when a partition happen.

No, the system needs to be properly designed to be consistent, so the data will always be fine (as long as you don't permanently lose all the servers and backups).

Partitions and datacenter failures only determine whether the system is up or not, and thus its availability properties.

You don't always have a choice: http://farm3.static.flickr.com/2316/2216487046_b7ca640f56_o....

Plus, we live in cloud la la land these days. You have no idea how any of your machines/VMs are connected together. We can assume nothing.

Partitions being "almost impossible" would be fine for running Facebook or HN. Not so much if you're a payment processor or investment brokerage. If you want your distributed system to be consistent then your application code needs to be able to handle partitions, end of story. The CAP theorem hasn't changed.

For a payment processor at least, there's basically nothing that can't be eventually consistent. Anything interfacing with banking has traditionally had a very high tolerance for eventual consistency as many processes have had wildly varying and long settlement timelines. Basically it's been rare for you to be able to guarantee you have a consistent view of an account at any given time.

From what I understand, even ATM's are eventually consistent.

Most things in the banking system is eventually consistent.

A lot of distributed systems literature seems to be too abstract for real world systems engineering. Where does one go to learn about architecting a distributed system on a that works in the real world?

The author of this paper, Martin Kleppmann, is writing a book "Designing Data-Intensive Applications" (http://shop.oreilly.com/product/0636920032175.do?cmp=af-stra...). I've been reading it via the O'Reilly immediate access and I think that it's the book you are looking for.

Indeed, he written a less formal post on the matter : "Please stop calling databases CP or AP" (http://martin.kleppmann.com/2015/05/11/please-stop-calling-d...)

I know you said a lot of it is too abstract, but I really enjoyed the readings from UIUC's CS525 when I was in school (years ago). The selected papers are a great "who's who" of distributed systems research.


I have found mixu's book (link: http://book.mixu.net/distsys/index.html) accessible and to the point.

Google's papers on F1 and spanner are decent. Unfortunately not many companies do this and a lot of it is tied up as 'proprietary information'.

Basho produce a lot of good documentation for Riak which I've found invaluable when creating my own distributed database.

CAP is a simple and obvious principle: If you have a partition, meaning you cannot always meet your read/write quorums, you can choose:

- availability: always accept writes, but reads in other partitions might not see them so there's no consistency.

- consistency: writes fail/block until the partition is resolved, so there's no availability.

Clearly, parts of your data that are not affected by a partition might still be consistent and available.

The point is, in classical database systems 100% of transactions are consistent. In distributed databases this is only possible if you sacrifice availability some of the time, or alternatively you can sacrifice consistency some of the time.

The author of this paper is suggesting a more elaborate theory that involves network delay, which is great, though criticising what is effectively a mathematical truth seems strange.

You have the correct response. The paper is a sleight of hand. It defines CAP as something it's not and then attacks the straw man.

For me the most interesting part was:

> ...we can prove that certain levels of consistency cannot be achieved without making operation latency proportional to network delay.

'Consistency requires waiting' is a pretty well known rule of thumb for distributed systems but this is the first quantitative proof that I've seen. It's really useful see exactly what kinds of consistency impose latency and how that latency varies with respect to network conditions.

Well, guaranteed consistency requires coordination and if your coordination mechanism is over a network, then the speed of your writes will be multiple factors of your network latency.

"network" doesn't necessarily have to mean high latency ethernet though. You can have a network running on top of an embedded backplane in a blade system. There are ways to minimize latency, but latency is as latency does.

There's no way for node A to distinguish between latency and partitioning in regards to node B until node A receives a response from node B. Partitioning is just another name for timing out.

Eric Brewer discussed his CAP theorem earlier this year on SE-Radio: http://www.se-radio.net/2015/05/the-cap-theorem-then-and-now... In the interview, Brewer mentions that the nuance has always been part of it, but a decade ago the unnuanced "pick any two" elevator pitch was what upset database vendors and developers. And it was close enough as a model to be useful.

Reading this and some of the contents I wonder what a galaxy scale distributed system might look like and whether speed of light would be sufficient to cope with trade that could proceed at say, 10% the speed of light.

I think everyone's been looking for more nuanced ways to describe the tradeoffs involved in creating distributed systems. Another look at the problem: http://radlab.cs.berkeley.edu/people/fox/static/pubs/pdf/c18...

(Edit: Worth pointing out that comes from Mr. CAP himself, Eric Brewer.)

Seems like the paper includes a pretty straightforward observation that CA just doesn't make much sense. How can you have consistency and availability if there's a partition? That's when you get the strong versus weak consistency or eventual consistency distinctions. If you look at the call me post by Aphyr consistency is a problem when there's a partition in a lot of real world software.

That's actually the exact opposite of what he says. He says that CA can be vacuously satisfied by a system by simply going unavailable. It makes totally sense, but is a pathological behavior. It's a confusion that many people have about CAP, which I tried to clarify in a blog post[0] a couple months ago, in response to some other articles were going around at the time.

[0]: http://computationallyendowed.com/blog/2015/07/09/cap-theore...

"CA can be vacuously satisfied by a system by simply going unavailable"

That kind of takes the A out of CA.

CA means you are giving up both consistency and availability if a partition happens (otherwise you'd be CP or AP). How is that at all useful in a distributed system?

Yep. A valid partition-intolerant system is one where nodes on both sides of a partition will automatically do something drastic, like unconditionally terminating, so that no subset of the nodes remains for clients to talk to.

The interesting thing about CA systems is that their behavior strongly resembles that of non-distributed systems. It's just like having one SPOF machine!

>How can you have consistency and availability if there's a partition?

I don't get why people seem to have such a problem with the CA trade off, but seem to understand the CP or AP tradeoff. If you trade partition tolerance then have a partition your clients need to handle that just like they need to handle inconsistency, or unavailability if that was the trade off selected.

Can you give an example of a client handling a partition?

The tie ins with the google dataflow paper are at least superficial (if not deeper on deeper reading) in that they mention: "latency, correctness and cost" as their drivers: http://blog.acolyer.org/2015/08/18/the-dataflow-model-a-prac...

Money quote: "we believe that CAP has now reached the end of its usefulness"

From Conclusion:

In this paper we discussed several problems with the CAP theorem: the definitions of consistency, availability and partition tolerance in the literature are somewhat contradictory and counter-intuitive, and the distinction that CAP draws between “strong” and “eventual” consistency models is less clear than widely believed.

CAP has nevertheless been very influential in the design of distributed data systems. It deserves credit for catalyzing the exploration of the design space of systems with weak consistency guarantees, e.g. in the NoSQL movement. However, we believe that CAP has now reached the end of its usefulness; we recommend that it should be relegated to the history of distributed systems, and no longer be used for justifying design decisions.

This is precisely the point of an article titled Consistency Tradeoffs in Modern Distributed Database System Design[0]. CAP focuses on failures and not much else. There needs to be a richer vocabulary to describe all the axes of performance in a properly (or partially) functioning distributed system.

[0]: http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf

I was gonna wait until vendors who should know better (cough Percona cough) stop advertising that they have CA systems before declaring that CAP has reached the end of its usefulness :-)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact