

Stonebraker:  Clarifications on the CAP Theorem and Data-Related Errors - ora600
http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors

======
dasht
A quick summary follows a quick editorial, and after that, a quick new
thought:

Editorial: Stonebraker is, imo, and as usual, Right Again. His biggest problem
is that he's boring that way. He doesn't open his mouth in contexts like this
but to be Right.

Summary: People say "NoSQL is right because of the CAP theorem." The CAP
"theorem" says of DBs that you get Consistency, high Availability, or
Partition tolerance... pick any two. Quite true! So one of the pro-NoSQL
arguments is that high availability and partition tolerance are often the
priorities... so toss out consistency! SQL assumes consistency. Thus you need
NoSQL. Stonebraker correctly points out that, hey, you know what? Partitions
are pretty rare, and tossing out consistency really didn't increase your
availability average by much... so you tossed out consistency for no reason
whatsoever. If you think the CAP "theorem" justifies NoSQL... you're just
wrong.

Stonebraker's rant is nearly boring because it makes such an obvious point.

New Thought: I don't think NoSQL is popular because of the CAP theorem. I
think it is popular because it is easier to get started with (even if that
means using it poorly) than SQL. SQL is a little hard to learn. It's a little
bit awkward to use from some "scripting" language or other HLL. NoSQL may be
bad engineering in many of its uses... but it's easier. A lot easier. And
people aren't much asking about engineering quality until sites start failing
often. Which a heck of a lot of them do, but by then the NoSQL architects
have collected their money and are out of town, or else are still around but
able to point the fingers of blame away from the abandonment of ACID.

An ACID DB that gave a simpler logical model than SQL... including, sure,
relaxing ACID constraints where that was really desirable... could go a long
way toward fixing the confusion around NoSQL.

p.s.: given a typical distributed NoSQL DB, one thing you could do is regard
it as the "physical model", implement proper transactions, and build a
library that gives you ACID properties. Build up a high-level way of using
that library so that you have a logical model of the data that is independent
of exactly how it is laid out in the underlying store... and you've got a
Codd-style DB. Great thing to do. SQL 2.0.
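
A toy sketch of that idea, assuming a hypothetical versioned key-value store
standing in for the NoSQL "physical model" (none of these names are a real
library): writes are buffered in a transaction object and validated
optimistically at commit, which is one simple way a library could layer
ACID-ish semantics over a non-transactional store.

```python
class KVStore:
    """Stand-in for the underlying NoSQL store: versioned key-value pairs."""
    def __init__(self):
        self.data = {}      # key -> value
        self.version = {}   # key -> integer version, bumped on every write

    def read(self, key):
        return self.data.get(key), self.version.get(key, 0)

class Transaction:
    """Optimistic transaction: buffer writes, validate read versions at commit."""
    def __init__(self, store):
        self.store = store
        self.reads = {}    # key -> version observed when first read
        self.writes = {}   # key -> buffered new value

    def get(self, key):
        if key in self.writes:          # read-your-own-writes
            return self.writes[key]
        value, ver = self.store.read(key)
        self.reads[key] = ver
        return value

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: nothing we read has changed underneath us.
        for key, ver in self.reads.items():
            if self.store.version.get(key, 0) != ver:
                raise RuntimeError("conflict on %r, retry transaction" % key)
        # Apply all buffered writes. This toy is single-threaded; a real
        # layer would need a lock or compare-and-swap in the physical store
        # to make this step atomic.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1

store = KVStore()
t = Transaction(store)
t.set("a", 1)
t.set("b", 2)
t.commit()
```

A conflicting transaction that read a key before someone else committed to it
would fail at `commit()` and have to retry, which is the usual shape of
optimistic concurrency control.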

~~~
jhugg
Not all of the distributed NoSQL systems give up consistency. Notably, HBase
and, in certain scenarios, MongoDB both offer consistent atomic reads and
writes of one thing, be it a row, supercolumn, document, or whatever.

You make a good point that NoSQL is about so much more than the CAP theorem.
That doesn't mean there aren't a ton of people (some very smart) out there
citing the CAP theorem as proof that you have to give up consistency to be
"web-scale".

~~~
dasht
Thank you. I wasn't aware of that.

Yeah, there are three things in play (I would say):

1. The language design of SQL is often icky. A different logical language
could be nice.

2. The logical/physical separation is sometimes right on and sometimes...
not so much. At the very least, there doesn't seem to be One True Logical
Model, hence exposing a physical model with transactions seems justified.

3. Often, people are quick to give up consistency, and while that is not
always the wrong thing to do, it's done more than it ought to be. But... a
lot of people hacking up big sites and similar these days... their standards
are low, and the "customers'" tolerance for the resulting bugs and glitches
is startlingly high (for now).

All three of those get lumped under the "NoSQL" heading and it is helpful (I
think) to tease them apart.

------
jchrisa
This is the comment I left on the post (still moderating):

There is an extreme case of partition tolerance that must be considered:
disconnected operation.

For users at the edge of the network, latency can be the biggest performance
killer. If it takes 1 second or more for each user action to be reflected in
application state due to round-trip time (mobile web), those seconds add up
and users get frustrated.

However, if you move the database and web application to the mobile device
itself, users no longer see network latency on the user-experience critical
path. Latency has been shown to correlate directly with revenue, because
users engage much more readily with snappy interfaces.

Once data is being operated on by the user on the local device, the key
becomes synchronization. Asynchronous multi-master replication demands a
different approach to consistency, than the traditional model which assumes
the database is being run by a central service.

The MVCC document model is designed for synchronization. It's a different set
of constraints than the relational model, but since it's such a highly
constrained problem space, it also admits of general solutions and protocols.

It's my belief that the MVCC document model is closer to the 80% solution for
a large class of applications. Storing strongly typed and normalized
representations of data is an artifact of our historically constrained
computing resources, though it will remain a good way to optimize certain
problems.

But for many human-scale data needs, schemaless documents are a very good fit.
They optimize for the user, not the computer.

~~~
ora600
By MVCC you mean multi-version concurrency control? As in
writers-don't-block-readers? Because most relational databases have that.

~~~
jchrisa
The idea is that conflicts are detected at write time, so that applications
will have conflict-resolution capabilities.

E.g., if you get the ETag wrong, CouchDB rejects the save.

(edited to add) The difference is that CouchDB makes the MVCC semantics
visible to the client.
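
A toy illustration of that write-time conflict check (not CouchDB's actual
implementation; `DocStore` and its methods are hypothetical): every save must
carry the revision it was based on, and a stale revision is rejected, forcing
the application to fetch the current revision and resolve the conflict.

```python
class Conflict(Exception):
    """Raised when a save is based on a stale revision."""
    pass

class DocStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (rev, body)

    def get(self, doc_id):
        return self.docs[doc_id]  # returns (rev, body)

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # The client wrote against an old version: reject, don't merge.
            raise Conflict("stale revision %r (current is %r)"
                           % (rev, current_rev))
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, body)
        return new_rev

db = DocStore()
rev1 = db.put("user:1", {"name": "ann"})              # create: no prior rev
rev2 = db.put("user:1", {"name": "anne"}, rev=rev1)   # update with correct rev
try:
    db.put("user:1", {"name": "bob"}, rev=rev1)       # stale rev: rejected
except Conflict:
    pass  # the application must re-fetch and resolve
```

Making the revision visible to the client is what pushes conflict resolution
up into the application, rather than silently losing one of the writes.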

------
logicalstack
He seems to assume a lot in this post. For instance, his 200 vs. 4 node
comparison assumes that you have 200 nodes because the poor performance of
your DBMS requires that many nodes. If that's the case, great, use VoltDB. If
not, it's perfectly reasonable to think that 200 nodes would see more network
partitions than 4 nodes, which is why one would pick an AP system in the
first place.

------
jnewland
tl;dr version: <http://files.jnewland.com/stonebraker-20101021-200946.jpg>

~~~
ieure
Also, <http://bit.ly/aD0TiH>

------
moonpolysoft
Wherein Stonebraker misunderstands distributed systems engineering completely.

~~~
ora600
I understood what he said as:

Solving the problems of distributed systems is incredibly hard and not
necessary if all you need is scalability and high availability. Better think
of your problems in terms of engineering trade-offs and not distributed theory
and understand what you are giving up and what this gains you.

When you decide to give up consistency, your application can no longer assume
consistency, ever. Giving up availability in case of a network partition
means a few extra minutes of downtime a year.

I don't think he completely misunderstood distributed systems - I think he
decided to completely side-step the entire field.

~~~
moonpolysoft
> Solving the problems of distributed systems is incredibly hard and not
> necessary if all you need is scalability and high availability.

Once one scales beyond a single node the system becomes distributed. Then by
definition one must deal with distributed systems problems in order to achieve
scale beyond the capabilities of a single node.

> When you decide to give up consistency, your application can no longer
> assume consistency, ever. Giving up availability in case of a network
> partition means a few extra minutes of downtime a year.

Depends on the application. Giving up availability might mean cascading
failures throughout your entire application. For instance, if the datastore
is unavailable for writes, then any queueing system built around the DB (a
common design pattern) runs the risk of overflow during the downtime.

And I would argue that once an application scales beyond a single datacenter,
it cannot help but give up strict consistency under error conditions.

> I don't think he completely misunderstood distributed systems - I think he
> decided to completely side-step the entire field.

If he didn't misunderstand them then he is purposefully ignoring the hard
problems. Which is worse?

~~~
ora600
> Once one scales beyond a single node the system becomes distributed. Then by
> definition one must deal with distributed systems problems in order to
> achieve scale beyond the capabilities of a single node

I meant that one doesn't need to solve the general problem of distributed
systems. Sharding is a common way to scale while avoiding most of the
problems generally associated with distributed systems. Scaling within a LAN
is easier than across data centers. You can assume no malicious traffic
between your servers, and suddenly the Byzantine generals problem is far
easier.

Purposefully ignoring really hard problems can be very good engineering
practice.
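
A minimal sketch of that kind of sharding (all names hypothetical): each key
is hashed to exactly one of N independent stores, so every shard remains a
simple single-node database and there is no cross-shard coordination, which
is precisely the hard part being side-stepped.

```python
import hashlib

class ShardedStore:
    """Route each key to one of N independent single-node stores."""
    def __init__(self, n_shards):
        # Each shard is just a plain dict here; in practice, a separate
        # single-node database with no knowledge of the others.
        self.shards = [dict() for _ in range(n_shards)]

    def _shard_for(self, key):
        # Hash the key so assignment is stable and roughly uniform.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def set(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

s = ShardedStore(4)
s.set("user:1", "ann")
assert s.get("user:1") == "ann"
```

The trade-off is that any operation spanning keys on different shards (a
cross-shard transaction, a global secondary index) reintroduces exactly the
distributed-systems problems sharding was chosen to avoid.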

~~~
benblack
I purposefully ignored the tornado and so it did not hit my datacenter, tear
off a section of the roof, kill all power sources, and drench my servers. Hard
problems: solved.

~~~
sophacles
Thanks! My app was in that datacenter too. I mean, it had replicated MongoDB
instances and well-balanced app servers, and nodes going away had no
discernible effect on users. Turns out, though, that with all that
distributed engineering, I never found out that the hosting company doesn't
spread your nodes across different data centers. That tornado would have
taken down my service during peak hours.

I know you will try to write that off as "you get what you deserve," but I
challenge you to go ask people if their apps would survive a tornado hitting
the data center. Many of them will say "Sure, it's in the cloud!" Then drop
the killer question on them: "How many different data centers are your nodes
running on right now?" Most will say "I don't know." Some will say "My host
has many data centers" (note this doesn't answer the question). A few will
actually have done the footwork.

Also, the scenario you describe is easily mitigated with hot failovers and
offsite backups. This probably qualifies as distributed engineering, but it
is only the same as the above discussions in the most pedantic sense.

~~~
benblack
As MongoDB did not exist at the time, it seems unlikely. Such things happen
more than we might like, of course!

"Also, the scenario you describe is as easily mitigated with hot failovers and
offsite backups."

This is a sadly wrong, though common, belief. There is exactly one way to know
that a component in your infrastructure is working: you are using it. There is
no such thing as a "hot failover": there are powered on machines you hope will
work when you need them. Off-site backups? Definitely a good idea. Ever tested
restore of a petabyte?

Here's a simple test. If you believe either of the following are true, with
extremely high probability you have never done large-scale operations:

1) There exists a simple solution to a distributed systems problem.

2) "Failover", "standby", etc. infrastructure can be relied upon to work as
expected.

Extreme suffering for extended periods burns away any trust one might ever
have had in either of those two notions.

~~~
sophacles
First and foremost, the opposite of simple is not hard; it's complicated.
There is no correlation between the simple-complicated spectrum and the
easy-hard spectrum.

If you don't regularly press the button swapping your hot failovers and live
systems, you don't have hot failovers; you have cargo-cult redundancy. It's
like taking backups and not testing them. If you don't test them, you have
cargo-cult sysadmining.

Distributed is complicated, but not that hard; there are well-understood
patterns in most cases. Sure, the details are a bit of a pain to get right,
but the same is true of a lot of programming. I have done distributed
systems. Maybe not Google scale, but bigger than a lot: hundreds of nodes
geographically distributed all over the place, and each of these nodes was
itself a distributed system. I've dealt with thundering herds (and other
statistical clustering issues), can't-happen bottlenecks, and plenty more.
But each and every one of these problems had a solution waiting for me on
Google. Further, each and every one of these problems was instantly
graspable.

A lot of distributed stuff just isn't that hard. Sure, things like MPI and
distributed agents/algorithms can get pretty hard, but this isn't the same as
multi-node programs, which isn't the same as redundancy.

Keep the smug attitude, I'm sure it helps you convince your clients that you
are "really good" or some crap.

~~~
gruseom
You're talking to the guy who invented EC2. I think he knows what's hard.

<http://blog.layer8.net/ec2_origins.html>

~~~
sophacles
I'm not intimidated. If anything, he should be far, far more understanding of
the point, then: a simple failover model for a smallish database is
completely different from a full-blown EC2. In fact, given that EC2 is a
freakishly large example of a distributed system, I would put it in a totally
different class of product than the average "big distributed system".

