
The CAP FAQ - sytelus
https://github.com/henryr/cap-faq
======
roganp
What I have often wondered, is how are "distributed" systems different from
systems in the small where different components may be thought of as nodes on
a network. In other words, why are not multiple threaded apps writing to the
same database subject to the same constraints if the components of the system
are not provably 100% reliable? Is it just a practical matter, that
communication between a host and external storage is just so much more
reliable than communication across larger distances that we don't have to
worry about it?

~~~
themartorana
The only difference really is single-point-of-failure. Multiple threads
writing to the same DB record can be assisted with transactions or locking -
but again, the DB is, in the end, absolutely authoritative. What it says goes,
and if it's unavailable, so is everything.

When solving for high Availability through a distributed cluster, you might
give up _immediate_ Consistency. But partitions can end up giving different
answers (authoritatively, even) and an unavailable master record doesn't mean
something won't answer as if it _were_ authoritative.

A single node DB - like a MySQL server - still has failure points, but none
that plague CAP systems (well, except Availability, but that's how we got into
this mess to begin with).

------
ff_
The thing that really struck me is that it has been proven only in 2002 - and
it is by now a fundamental idea in (my) CS education about Databases and
Distributed Systems: I'm used to considering its implications when thinking
about a problem.

So I wonder: how did you did it before then?

~~~
pron
Two things.

1\. The notion of "high availability" wasn't used much before "web scale"
applications.

2\. As others have said, the CAP theorem is kind of tautological and does more
to confuse about what's possible than to clarify. For example, the CAP theorem
implies that even eventual consistency is impossible in the case of
partitions. If partitions can be of infinite length then no kind of
consistency is possible; if partitions are finite, then all kinds of
consistency are possible (with "availability" meaning very high latency,
though). In short, the CAP theorem itself doesn't contribute much information
to the analysis of distribution problems.

I recommend reading this nice overview of the issues with CAP:
[http://arxiv.org/abs/1509.05393](http://arxiv.org/abs/1509.05393)

~~~
notacoward
"High availability" was used _very_ often long before "web scale" anything
came along. It was in the very name of HACMP/6000, which I worked on in 1992,
and wasn't new then. The problem is that "availability" in that world meant
something very different than "availability" as it's defined in CAP - system
availability vs. node availability. That has caused almost as much confusion
as the difference in meaning between "consistency" in CAP and "consistency" in
ACID.

------
jtheory
This is well-explained and fairly concise.

It's a concept that could be much more rapidly explained in an intuitive way
with a few interactive diagrams, or honestly a few animated gifs, though.

The CAP theorem is one of those things that's a simple idea, but I didn't
understand it the first few times I ran across it just because:

1) the terminology is weird ("partition", huh... well, I know how to partition
a disk, but this clearly isn't that)

2) broad concepts need good _examples_ for newcomers -- to bridge the gap from
familiar to unfamiliar. If you've worked with large scale distributed
processing before, great, but these ideas are especially crucial for people
who are going to need such systems but don't yet understand the tradeoffs.

------
shin_lao
I think CAP is really more about how much consistency and availablity you
need. This FAQ talks about "relaxing" and that's the right word.

Some systems tolerate that a database (for example) is unavailable for a
couple of seconds. Some don't.

When you add a value into a distributed database, how much time are you ready
to wait for replicas to propagate? In some cases it's fine to wait several
minutes (think DNS) in some other it can have catastrophic results (think
finance/banking use cases).

~~~
pjc50
Ironically finance has coped just fine with delayed-update replicas for
centuries; that's what cheque clearing is. You issue a 'write' on a piece of
paper, speculatively commit a transaction locally, but in the event of a write
collision at head office it gets 'bounced' and the transaction rolled back.

------
jitix
Looking at this article IMO the best example of using CAP is Cassandra. The
user chooses the tradeoff.

~~~
themartorana
Most times the tradeoff has been chosen before the database is.

