
CAP Twelve Years Later: How the "Rules" Have Changed - thebootstrapper
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
======
mad44
Here is a summary of the article.
[http://muratbuffalo.blogspot.com/2012/02/cap-12-years-later-...](http://muratbuffalo.blogspot.com/2012/02/cap-12-years-later-how-rules-have.html)

------
parasubvert
A great article that clarifies many of the CAP theorem debates that raged back
in late 2010.

A quick summary...

1\. Partitions are rare; why sacrifice consistency or availability for them in
the general case?

Thus, consider a system with a "partition mode" that either limits operations
to preserve consistency or allows operations that risk consistency, depending
on what the application needs.
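A minimal sketch of such a partition mode. Everything here (the `Replica` class, the `PartitionPolicy` enum, the `partitioned` flag standing in for real partition detection) is hypothetical and not taken from the article or any particular database; it only shows the shape of the choice: while partitioned, either refuse writes to keep replicas consistent, or accept them and queue them for reconciliation once the partition heals.

```python
from dataclasses import dataclass, field
from enum import Enum


class PartitionPolicy(Enum):
    PREFER_CONSISTENCY = "consistency"    # refuse writes while partitioned
    PREFER_AVAILABILITY = "availability"  # accept writes, reconcile later


@dataclass
class Replica:
    """Toy replica with a hypothetical 'partition mode' switch."""
    policy: PartitionPolicy
    data: dict = field(default_factory=dict)
    partitioned: bool = False  # stands in for real failure detection
    # Writes accepted during a partition, kept for reconciliation on recovery.
    pending: list = field(default_factory=list)

    def write(self, key, value) -> bool:
        if not self.partitioned:
            self.data[key] = value
            return True
        if self.policy is PartitionPolicy.PREFER_CONSISTENCY:
            # Limit operations: reject rather than risk divergent state.
            return False
        # Allow the operation, but remember it so it can be reconciled later.
        self.data[key] = value
        self.pending.append((key, value))
        return True

    def end_partition(self) -> list:
        """Leave partition mode and hand back the writes that need reconciling."""
        self.partitioned = False
        pending, self.pending = self.pending, []
        return pending
```

The interesting work happens in whatever consumes the list returned by `end_partition()`; that recovery step is where the application decides how to merge or compensate.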

This is the approach taken by some of the newer NoSQL databases such as
Cassandra, which tends to prefer availability over consistency, though this can
be tuned to prefer consistency in some cases; or by newer distributed RDBMSs
such as Google F1, which tends to prefer consistency over availability but
generally remains highly available across data centres:
<http://research.google.com/pubs/pub38125.html>
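Cassandra's tunability comes from per-request consistency levels (e.g. ONE, QUORUM, ALL). The arithmetic behind it is the usual quorum-overlap rule for Dynamo-style replication: with N replicas, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. A simplified sketch; the function names are hypothetical, and it ignores hinted handoff, read repair and timestamp-based conflict resolution:

```python
def quorum(n: int) -> int:
    """Majority of n replicas, as used by a QUORUM-style consistency level."""
    return n // 2 + 1


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every possible read set must intersect every write set.

    With n replicas, a write acknowledged by w of them and a read that
    consults r of them are guaranteed to share at least one replica
    exactly when r + w > n (pigeonhole principle).
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

With N = 3, reading and writing at ONE (R = W = 1) keeps working while any single replica is reachable but may return stale data; reading and writing at QUORUM (R = W = 2) guarantees overlap, at the cost of failing requests when a partition leaves a client unable to reach two replicas.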

2\. CAP isn't a single choice for a system. Systems are composed of many sub-
systems that often make a different CAP choice, depending on the operation or
data or user involved.
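To make the per-operation idea concrete, here is a hypothetical policy table for a single application, where some operations keep serving from whatever replica is reachable and others insist on a majority. The operation names and their assignments are illustrative assumptions, not anything from the article:

```python
from enum import Enum


class Choice(Enum):
    CP = "reject unless a majority of replicas is reachable"
    AP = "serve from any reachable replica, reconcile later"


# Hypothetical per-operation choices: different parts of the same system
# make different CAP trade-offs.
OPERATION_POLICY = {
    "browse_catalog": Choice.AP,  # stale data is tolerable
    "add_to_cart": Choice.AP,     # carts can be merged after a partition
    "checkout": Choice.CP,        # must not double-charge or oversell
}


def may_proceed(operation: str, reachable: int, total: int) -> bool:
    """Decide whether an operation may run given how many replicas are reachable."""
    choice = OPERATION_POLICY[operation]
    if choice is Choice.AP:
        return reachable >= 1
    return reachable > total // 2
```

During a partition that leaves only 2 of 5 replicas reachable, browsing and adding to the cart continue while checkout is refused.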

3\. The properties of partition tolerance, availability and consistency are
continuous rather than binary.

Availability can be measured as a percentage, whereas consistency isn't as
easily measured and comes in many forms, some weaker and some stronger.
Partition tolerance can also take many forms, such as tolerating certain
common partitions but not other (perhaps low-probability) ones.
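As a quick illustration of availability being measured as a percentage, an availability target translates directly into a yearly downtime budget. A small helper, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime implied by an availability target such as 99.9."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

For example, 99.9% allows about 525.6 minutes (roughly 8.8 hours) of downtime per year, and 99.99% about 52.6 minutes. Consistency has no equally simple single number.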

Reflecting on the last two points, consider that a system might have:

\- an offline mode for its HTML or mobile client applications, which prefers
availability over consistency when certain Internet or WAN failures occur,

\- a traditional RDBMS on the server, which prefers consistency over
availability when partitions occur within the data centre,

\- a backup RDBMS in a second data centre, which uses asynchronous
log-shipping replication to preserve availability when partitions occur
between the data centres (with a chance of data loss), and which also
preserves both consistency and availability when a partition cuts the primary
data centre, but not the secondary, off from the Internet.
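The "chance of data loss" with asynchronous log shipping comes from the window between the primary acknowledging a commit and that record reaching the standby. A toy model, assuming records are shipped strictly in order so the standby's log is always a prefix of the primary's (the class and method names are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class AsyncReplicatedLog:
    """Toy model of asynchronous log shipping from a primary to a standby."""
    primary: list = field(default_factory=list)
    standby: list = field(default_factory=list)

    def commit(self, record):
        # The primary acknowledges immediately; shipping happens later, so
        # commits keep succeeding even if the inter-data-centre link is down.
        self.primary.append(record)

    def ship(self, max_records=None):
        """Copy not-yet-shipped records to the standby (while the link is up)."""
        backlog = self.primary[len(self.standby):]
        if max_records is not None:
            backlog = backlog[:max_records]
        self.standby.extend(backlog)

    def fail_over(self) -> list:
        """Promote the standby; return the acknowledged records that are lost."""
        lost = self.primary[len(self.standby):]
        self.primary = list(self.standby)
        return lost
```

If three records are committed but only the first has been shipped when the primary data centre is lost, `fail_over()` returns the other two: they were acknowledged to clients but never reached the backup.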

These are all nuanced choices that are very common. One could replace the
above RDBMS with a wide-area Riak Enterprise cluster or Cassandra cluster. The
tradeoffs would be different - say, no data loss, but the application would
have to be much more explicit about how to deal with inconsistencies when
recovering from a partition between the two data centres.
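Being "explicit about how to deal with inconsistencies" usually means detecting concurrent updates and merging them. Riak, for example, tracks causality with vector clocks and can hand conflicting values back to the application as "siblings". The sketch below is not Riak's API; it is a hypothetical, minimal version of that idea using plain dicts as vector clocks and an application-supplied merge function:

```python
from typing import Callable, Dict, Tuple

VectorClock = Dict[str, int]


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def resolve(value_a, clock_a: VectorClock, value_b, clock_b: VectorClock,
            merge: Callable) -> Tuple[object, VectorClock]:
    """Keep the newer version, or merge two concurrent versions."""
    if descends(clock_a, clock_b):
        return value_a, clock_a
    if descends(clock_b, clock_a):
        return value_b, clock_b
    # Concurrent writes (e.g. made on both sides of a partition): the
    # application decides how to combine them.
    merged_clock = {node: max(clock_a.get(node, 0), clock_b.get(node, 0))
                    for node in clock_a.keys() | clock_b.keys()}
    return merge(value_a, value_b), merged_clock
```

For two shopping carts updated independently in each data centre, `resolve({"book"}, {"dc1": 2, "dc2": 1}, {"pen"}, {"dc1": 1, "dc2": 2}, lambda a, b: a | b)` yields both items. Set union is the simplest merge, but it can resurrect an item that was removed on one side of the partition, which is exactly the kind of application-specific decision the comment above refers to.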

Overall, it's good that we have new, more scalable & available data management
options beyond the "all consistency all the time" traditional RDBMS. But the
solution differs based on the application and level of scalability required.
One shouldn't just throw out consistency because it seems like the new way of
doing things. Similarly, one shouldn't throw out availability just because one
is afraid of data loss or inconsistency. It depends on your problem.

------
dude_abides
The ATM example at the end of the article is awesome and a must-read, even if
you just skimmed over the rest of the article.

