

It's Time to Drop the "F" Bomb - or "Lies, Damn Lies, and NoSQL." - pharkmillups
http://blog.basho.com/2011/05/11/Lies-Damn-Lies-And-NoSQL/

======
benblack
To some commenters: the C in CAP and the C in ACID are not the same thing. If
that is not clear to you, it is unlikely the database you develop will include
correct implementations of core concepts. Knowledge is power.

Peace and love to the human family.

\- Lil' B

~~~
strlen
Just to expand on this, the "C" in CAP corresponds (roughly) to the "A" and
"I" in ACID. Atomicity across multiple nodes requires consensus. According to
the FLP impossibility result (CAP is a very elegant and intuitive restatement
of FLP), consensus is impossible in a network that may drop or delay packets.
The serializable isolation level requires that operations be totally ordered:
total ordering across multiple nodes requires solving the "atomic multicast"
problem, which is a special case of the general consensus problem.
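
A toy sketch of why the ordering matters (plain Python, not taken from any of
the systems discussed here; the function names and numbers are made up for
illustration): two replicas that receive the same non-commutative updates in
different orders end up in different states, while replicas that apply one
agreed order stay identical.

```python
# Toy model: replicas applying the same pair of non-commutative updates.
# All names and values are illustrative assumptions, not a real database API.

def apply_ops(initial, ops):
    """Apply each operation to the value, in the order given."""
    value = initial
    for op in ops:
        value = op(value)
    return value

def add_ten(balance):
    return balance + 10

def double(balance):
    return balance * 2

# Without an agreed order, each replica applies updates as they happen to arrive.
replica_a = apply_ops(100, [add_ten, double])   # (100 + 10) * 2
replica_b = apply_ops(100, [double, add_ten])   # 100 * 2 + 10

# With a single agreed (total) order, every replica reaches the same state.
agreed_order = [add_ten, double]
replica_c = apply_ops(100, agreed_order)
replica_d = apply_ops(100, agreed_order)

print(replica_a, replica_b)    # the unordered replicas diverge
print(replica_c == replica_d)  # the ordered replicas agree
```

Getting every node to agree on `agreed_order` is the part that needs a
consensus round; the sketch simply assumes that order is already known.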

In practice, you can achieve consensus across multiple nodes with a reasonable
amount of fault tolerance if you are willing to accept high (as in, hundreds
of milliseconds) latency bounds. That's a loss of availability that's not
acceptable to many applications.

This means that you can't build a low-latency multi-master system that
achieves the "A" and "I" guarantees. Thus, distributed systems that wish to
achieve a greater form of consistency typically (Megastore from Google being a
notable exception, at the cost of 140ms latency) choose master-slave systems
(with "floating masters" for fault tolerance). In these systems availability
is lost for a short period of time in case the master fails. BigTable (or
HBase) is an example of this: ( _grand simplification follows_ ) when a tablet
master (RegionServer in HBase) for a specific token range fails, availability
is lost until other nodes take over the "master-less" token range.
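
The failover window can be sketched with a toy model (hypothetical class and
method names, not the BigTable or HBase API, and ignoring how failure is
actually detected): keys in a range whose master has died are unavailable
until another node takes the range over, while other ranges keep serving.

```python
# Toy model of per-range masters. Names are hypothetical; real systems
# detect failure and reassign ranges through a coordination service.

class RangeUnavailable(Exception):
    """Raised when a key's range currently has no live master."""

class Cluster:
    def __init__(self, assignments):
        # assignments: {(low, high): node_name}; ranges are half-open [low, high)
        self.assignments = dict(assignments)
        self.live_nodes = set(assignments.values())

    def master_for(self, key):
        for (low, high), node in self.assignments.items():
            if low <= key < high:
                if node in self.live_nodes:
                    return node
                raise RangeUnavailable(f"no live master for key {key!r}")
        raise KeyError(key)

    def fail(self, node):
        self.live_nodes.discard(node)

    def reassign(self, failed_node, new_node):
        # Another node takes over every range the failed node owned.
        self.live_nodes.add(new_node)
        for key_range, node in self.assignments.items():
            if node == failed_node:
                self.assignments[key_range] = new_node

cluster = Cluster({("a", "m"): "node1", ("m", "z"): "node2"})
cluster.fail("node1")
try:
    cluster.master_for("k")          # range [a, m) has lost its master
except RangeUnavailable as exc:
    print("unavailable:", exc)
print(cluster.master_for("q"))       # range [m, z) is unaffected
cluster.reassign("node1", "node2")
print(cluster.master_for("k"))       # available again after takeover
```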

These are not binary "on/off" switches: see Yahoo's PNUTS for a great "middle
of the road" system. The paper <http://research.yahoo.com/node/2304> has
an intuitive example explaining the various consistency models.

Note: in a partitioned system, the scope of consistency guarantees (that is,
_any_ consistency guarantees: eventual or not) is typically limited to (at
best) a single partition of a "table group"/"entity group" (in Microsoft Azure
Cloud SQL Server and Google Megastore, respectively), a single partition of a
table (usual sharded MySQL setups) or just a single row in a table (BigTable)
or document in a document oriented store. Atomic and isolated _cross row_
transactions are impractical on commodity hardware (and are limited even in
systems that mandate the use of InfiniBand interconnect and high-performance
SSDs).
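
A minimal sketch of what "single row" scope means in practice (a toy
in-memory store with a hypothetical `RowStore` class, not any real product's
API): each compare-and-set is atomic for its own row, but an update that
spans two rows is just two independent steps, and a reader can observe the
state in between.

```python
import threading

class RowStore:
    """Toy store whose only atomic primitive is a single-row compare-and-set."""

    def __init__(self):
        self._rows = {}
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, key):
        # One lock per row: operations on different rows never coordinate.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        with self._lock_for(key):
            return self._rows.get(key)

    def compare_and_set(self, key, expected, new):
        # Atomic for this row only: set `new` if the row still holds `expected`.
        with self._lock_for(key):
            if self._rows.get(key) != expected:
                return False
            self._rows[key] = new
            return True

store = RowStore()
store.compare_and_set("alice", None, 100)
store.compare_and_set("bob", None, 0)

# "Transfer" 30 from alice to bob: two separate single-row operations.
store.compare_and_set("alice", 100, 70)
in_flight_total = store.get("alice") + store.get("bob")   # read between steps
store.compare_and_set("bob", 0, 30)
final_total = store.get("alice") + store.get("bob")

print(in_flight_total, final_total)
```

Making the two steps appear as one would need some form of cross-row
coordination, which is exactly the part these systems avoid.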

[Disclaimer: I am a committer on Project Voldemort, a Dynamo implementation; in
addition to Dynamo, I also find Yahoo's PNUTS and Google's BigTable to be very
interesting architectures.]

~~~
benblack
Thanks for bringing some much needed science to these proceedings, my man. Too
many people mistaking computer science for a therapy session where their
feelings matter. Computer science has much in common with the honey badger.
Think upon this and be enlightened.

Yours in perpetual discovery, \- Lil' B

~~~
alnayyir
Stop signing your posts.

------
arihant
We performed our own crash testing on Riak and about a million other
databases; Riak and MySQL with InnoDB were the only ones scoring more than 5
out of 10.

I think the problem with NoSQL is that they are targeting the wrong people -
the lazy people. One of the DBs we reviewed (can't remember which) did not
have single-datacentre durability, lost something like 80% of its data when it
crashed while updating table contents, and was boasting about some built-in
geo-coordinate datatype on its website. It's the priorities plague. Is
built-in geo more important than data?

Why do NoSQL databases all have to be distributed beyond sharding? I think
it's because that's what people wrongly expect of them. Google and LinkedIn
use NoSQL that is distributed, so if a NoSQL DB doesn't do it, it's a shame.
That's the root of the misjudgment, I believe. Every DB, NoSQL or not, needs
to have a place. One size fits all is what will kill NoSQL, whether enforced
by engineering or marketing. That's why I think marketing "lies".

CouchDB, Riak and Redis are the few exceptions I know of that seemed to have
a vision and stuck to it.

~~~
tuna
wait until you fill your mysql + innodb with real data, have a crash and have
to perform a check to get it back online. stick with riak.

~~~
spudlyo
You're confusing InnoDB with MyISAM. InnoDB does redo log recovery like all
grown up databases. MyISAM requires a lengthy fsck type operation.

~~~
tuna
you are confusing real world with a comment on hackernews. try and do a full
recovery on innodb with real data and get back so uncle tuna can hold you
tight and promise that will never happen again. hint: corrupted tables
crashing mysql processes and you having to start it with
innodb_force_recovery=4 for ro, dumping the table, reapplying it back.

\-- Mel Gibson's surgeon on "Payback"

------
mmalone
Credit where credit's due, the 10gen and MongoDB guys have done a great job
convincing developers to adopt their product despite the existence of
technically superior alternatives. I guess that's what happens when a bunch of
ex-DoubleClick execs start a database company.

~~~
mnutt
"ex-DoubleClick exec" gives the image of some kind of ad executive or
something, which is _not_ what Dwight is.

<http://www.10gen.com/video/mongosv2010/replication>

------
regularfry
Was this targeted at anyone in particular, or is it a general rant (deserved
or not) at the state of the world?

~~~
ethangunderson
It was a thinly veiled bash on Mongo.

~~~
ericflo
Not just Mongo though, there are a lot of NoSQL companies out there right now
whose marketing claims the impossible.

~~~
sophacles
It's kind of sad too. There are lots of use cases where I want fast
datastores, and you know what, if the database goes down, who cares?

For example, I do lots of experiment logging to a mongodb. If the power goes
out and the data is lost, who cares? The data was no longer valid or useful --
but if I slow down my writes for 'safety' I will be causing problems by
introducing delay in ways that could cause conflict.

------
jhugg
Even if it's hard to disagree with anything specific in the post, who does
this tone appeal to? Is the author selling or venting?

------
ericflo
This post is fantastic! I couldn't agree more with its contents. Sorry that
this comment is vapid, but I wanted to do something more than just click the
upvote button.

------
coffeemug
You can't escape the laws of physics, but if you know how to get physics on
your side you can do things that at first glance appear impossible. See: human
flight.

~~~
strlen
Practical heavier-than-air flight was made possible by internal combustion
engines. It's also still less practical and more expensive than other options
for some applications, e.g., cargo. Practical, low-latency
distributed databases based on "invoke consensus protocol on every commit to
the log" would be made possible (on limited size local networks) when
networking gear with performance exceeding 10GigE/Infiniband becomes
"commodity". Even then, it will still be impractical and too expensive for
some scenarios.

At the present time, the fact that I said "invoke consensus on every commit to
the log" and "low latency" in the same sentence is making distributed systems
engineers cringe (I would _not_ advocate building such a system). The fact
that I said "Infiniband" and commodity in the same sentence is also making
systems administrators and DBAs cringe.

------
tuna
Also, a word about Consistency:

“There’ll be a time when all people are alike.” “Which is precisely the ideal
society. No mysteries, no romantics, no discussions, no persecution because
there’s no one to persecute. When all have received the same conditioning, it
will be like…” “Insects.” “Who have existed longer than ourselves and will
outlast our race by many millennia.” “Is existence everything?” “There’s
nothing else.”

------
bitdiddle
"The world is full of wicked men Wyatt" -- Gene Hackman in Wyatt Earp

------
VladRussian
Just a piece of self-glorification. I brushed my teeth, it is hard - all 32
teeth, complicated distributed problem, can't do them all at once
consistently, yet I still was able to do it.

~~~
edoloughlin
It reads like it was written by a marketing person.

~~~
jhugg
I disagree. It reads like an engineer who is frustrated that his/her tech is
getting out-businessed (perhaps in a slimy way). Engineers generally want to
live in a world where the best tech wins, but in the end, we go home to our
Betamax players and wonder why we can't rent any good movies for them.

