It's kind of amazing how we have to have this discussion again every time somebody designs a CP system with excellent availability.

I'll just come out and say it: the 'A' in CAP is boring. It does not mean what you think it means. Lynch et al. probably chose the definition because it's one for which the 'theorem' is both true and easy to prove. This is not the impossibility result with which designers of distributed systems should be most concerned.

My heuristic these days is that worrying about the CAP theorem is a weak negative signal. (EDIT: This is not a statement about CockroachDB's post, which doubtless is designed to reassure customers who are misinformed on the topic. I'm familiar with that situation, and it makes me feel a deep sympathy for them.)

(Disclosure: I work on a CockroachDB competitor. Also none of this is Google's official position, etc., etc. For that, here's the whitepaper by Eric Brewer that we released along with the Cloud Spanner beta launch https://static.googleusercontent.com/media/research.google.c...).




I read the paper and I don't understand this passage:

  For example, any database cannot provide availability if all of its replicas are offline, which has nothing to do with
  partitions. Such a multi-replica outage should be very rare, but if partitions are significantly more rare, then you can effectively
  ignore partitions as a factor in availability. 
  For Spanner, this means that when there is an availability outage, it is not in practice due to a partition, 
  but rather some other set of multiple faults (as no single fault will forfeit availability).
My understanding was that 'Partition' in CAP was a bit of a misnomer. To a running node, a partition of half the cluster is indistinguishable from half the nodes failing. So, partition tolerance really covers partitions as well as multi-node failures. Brewer wrote the original paper, so I will trust his definitions. However, if P doesn't cover multi-node failures, it seems to weaken the usefulness of CAP considerably. As is mentioned, in my experience, partitions are very rare. Multi-node failures, on the other hand, are the primary failure case I worry about.

(edit): I have thought about it some more, and this article really annoys me. It reads like marketing material: "CAP doesn't apply to us because we are Google, bitches."

There is an argument there, but I think the way Brewer makes it is really weak. I would much rather they say: "We have built a really great CP system. Also, because we are Google we are capable of 99.99958% uptime, so you really don't need to worry too much about tiny edge cases where you will lose A."


From what I understand of CAP (and I'm no expert by any means), 'partition tolerance' is the ability of a system to reconcile itself in the event of the system being split into partitions.

In these types of scenarios, different sections of the system are still working as normal, and each have a different view of the network of available nodes.

In a scenario where a set of homogeneous nodes in a system is split in two, both halves are equally 'available', so the system as a whole has to decide what to do. If both sides present themselves as available then they will be making decisions based only on interactions with half of the nodes in the system, and their views of the system as a whole will start to drift apart.

This is bad because at the point where they get reconnected again they may well realise that the system as a whole is no longer internally consistent. If you think about a distributed database, then you can start to have conflicting commits and now your DB is FUBAR.
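To make the "diverging views" point concrete, here's a toy sketch in Go (entirely made up, nothing to do with how any real database stores data): both halves of a split keep accepting writes for the same key, and when they reconnect there is no single correct value left to agree on.

  // Toy illustration only: two halves of a split each accept a write for
  // the same key, so their copies diverge and the merge has a conflict.
  package main

  import "fmt"

  func main() {
      left := map[string]string{"balance": "100"}
      right := map[string]string{"balance": "100"}
      // During the partition, each side serves a different client.
      left["balance"] = "150" // client A talked to the left half
      right["balance"] = "80" // client B talked to the right half
      // When the halves reconnect, there is no single correct value left.
      if left["balance"] != right["balance"] {
          fmt.Println("conflict:", left["balance"], "vs", right["balance"])
      }
  }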

You are right in thinking that, from the point of view of a partition, it can't know if the rest of the system is just partitioned away or has crashed and will never come back.

But what that simplifying assumption is saying is that if you can ensure that you are much much more likely to have the nodes go down completely rather than actually be partitioned, then things are easy because you don't have to consider diverging system views and how you might re-integrate them.

Or something like that, I ended up writing more than I was planning!


Right. My understanding is that the clever bit of CAP is that, from the perspective of a single node, a true network partition is indistinguishable from a bunch of nodes failing simultaneously. If, as a node, you find yourself in a minority network (and cannot make quorum) you need to decide what to do.

Maybe you are on the losing side of a split, and there is a set of nodes out there that can make quorum.

Or maybe 51% of the nodes crashed and what you see is all that is left of the cluster.

Whatever you decide, it has to work for both those possibilities.
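A minimal sketch of that decision rule, in Go (hypothetical, assuming a fixed known cluster size rather than however a real system tracks membership):

  package main

  import "fmt"

  // canServeWrites: a CP-style node keeps serving writes only while it can
  // reach a strict majority of the known membership (itself included).
  // Whether the missing nodes are partitioned away or crashed, the rule is
  // the same, so the decision is safe in both cases.
  func canServeWrites(reachable, clusterSize int) bool {
      return reachable > clusterSize/2
  }

  func main() {
      fmt.Println(canServeWrites(2, 5)) // minority side: refuse writes, stay consistent
      fmt.Println(canServeWrites(3, 5)) // majority side: keep going
  }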

So are you interpreting Brewer as saying: in practice we never have a split, so just assume what you see is all there is of the network? However, Spanner is a CP system. If you are willing to assume you will never need to merge inconsistent data, wouldn't you go for AP?


Would have replied sooner, but I had to sleep...

What he's saying is that it's strictly CP, as it will handle partitions (there's a section further down the paper describing how it would be handled). But as P's hardly ever happen, because Google, it's pretty much CA (always consistent and available).

So yes, for all intents and purposes, he's saying "Yeah, CAP does apply but we're so good we can make P 'go away'."


And what's "quorum" in a system where nodes are free to join and leave?


There's a lot of complexity here, but the short version is that nodes cannot quite come and go freely. A replica set keeps track of its members; a quorum of the existing members must vote to admit a new node, or to remove one.

Note that in both CockroachDB and Spanner a cluster contains many independent and overlapping replica sets. The data is broken down into "ranges" (to use the terminology of CockroachDB; Spanner calls them "spans"), each of which has its own replica set (typically containing 3 or 5 members).
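A rough sketch of that shape in Go (hypothetical types and vote counting, not the actual CockroachDB or Spanner implementation): data is split into ranges, each range has its own small replica set, the sets overlap across nodes, and a membership change needs a majority of the existing members to agree.

  package main

  import "fmt"

  // ReplicaSet is a toy stand-in for the group of nodes replicating one range.
  type ReplicaSet struct {
      Members []string
  }

  // AddMember admits a new node only if a strict majority of the current
  // members voted for the change.
  func (rs *ReplicaSet) AddMember(node string, votesFor int) bool {
      if votesFor <= len(rs.Members)/2 {
          return false
      }
      rs.Members = append(rs.Members, node)
      return true
  }

  func main() {
      // Two ranges replicated across four nodes; the replica sets overlap.
      ranges := map[string]*ReplicaSet{
          "range-a": {Members: []string{"n1", "n2", "n3"}},
          "range-b": {Members: []string{"n2", "n3", "n4"}},
      }
      ok := ranges["range-a"].AddMember("n4", 2) // 2 of 3 existing members voted yes
      fmt.Println(ok, ranges["range-a"].Members)
  }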


Is the answer, "a really hard problem?"


It is the continuity of 'state' agreed to by a quorum of actors. The identity of these actors is irrelevant.


Your comment mostly reads like signaling.

It is not "kind of amazing" considering Eric B. felt compelled to write a follow-up to his CAP paper.


I think the definition of availability is due to Brewer. The weird part of the 'theorem' due to Lynch and Gilbert is the definition of consistency: if I recall correctly, it's linearizability, which is somewhat stronger than any guarantee that any DB actually makes.


> it's linearizability, which is somewhat stronger than any guarantee that any DB actually makes.

Isn't that equivalent to serializability, which is a guarantee that lots of DBs offer (though you can choose weaker ones instead)?


Spanner offers "external consistency", which is similar to linearizability.

CockroachDB offers serializability (as the default isolation level). Most other SQL databases offer serializability as their highest isolation level, but default to something weaker.

Serializability and linearizability are not equivalent, although it's not always easy to devise a scenario in which the difference is apparent. The "comments" test in Aphyr's Jepsen analysis of CockroachDB is one such scenario: http://jepsen.io/analyses/cockroachdb-beta-20160829#comments
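A contrived sketch of the gap, in Go (my own toy history, not the actual Jepsen "comments" workload): the history below admits a legal serial order (T2 before T1), so it is serializable, but T2 started after T1 committed in real time and still read stale data, so it is not linearizable.

  package main

  import "fmt"

  type op struct {
      txn, what  string
      start, end int // wall-clock interval of the operation
  }

  func main() {
      history := []op{
          {"T1", "write x=1, commit", 1, 2},
          {"T2", "read x -> 0 (stale)", 3, 4}, // begins after T1 committed
      }
      for _, o := range history {
          fmt.Printf("[%d,%d] %s: %s\n", o.start, o.end, o.txn, o.what)
      }
      fmt.Println("serial order T2,T1 exists => serializable")
      fmt.Println("T2 follows T1 in real time yet misses its write => not linearizable")
  }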


Linearizability is stronger; see the 'definition of linearizability' in https://en.wikipedia.org/wiki/Linearizability

Note: it's been so long since I dealt with these things that I can't remember what the difference means in practice. :-)



