
Mike Stonebraker: The "NoSQL" Discussion has Nothing to Do With SQL - neilc
http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext
======
alecco
Guys, please check his Wikipedia page before starting some random rambling.
This guy is a _living legend_ and has several of the most prestigious awards.
He also is/was an entrepreneur and large DB corp CTO. Oh, and a very early
open source advocate and developer. Among _many_ other things.

Don't say something stupid just because you like NoSQL. In fact, he was an
early and strong critic of the one-size-fits-all commercial RDBMS solutions.

    
    
      [...] These papers presented reasons and experimental evidence
      that showed that the major RDBMS vendors can be outperformed
      by 1-2 orders of magnitude by specialized engines in the data
      warehouse, stream processing, text, and scientific database
      markets.
    

1-2 orders of magnitude, that is, roughly 10 to 100 times faster.

And in particular, if you haven't read at least 5 of his papers you can't
pretend to have a clue about how database engines, and in particular the major
RDBMS, work.

Thanks.

[A n00b DB engine writer, currently working on a non-SQL database]

PS: If I get to achieve 10% of what he did so far in his life I'd consider my
life amazing. And he is still making new things. Respect!

~~~
antirez
I'm a bit shocked by your post. So because he is a living legend one should
not say anything before reading all the papers cited, even if his point is
already clear from the article and one may say something meaningful even as a
modest developer?

I'm going to make one point, for instance: the article's main point is about the fact
that because of a threaded and disk-based architecture you can't get much
faster than well engineered SQL architectures. For instance Redis is
completely excluded from this reasoning, being in-memory and single-threaded.

Another point: where are the numbers? For instance I can trivially give proof
that Redis can handle 150,000 operations/second per core on a decent Linux
box. I want to know what the performance of the SQL solutions cited in the
article is.

Another one: all the SQL solutions cited in the article are names I had never
heard of before today. NoSQL DBs are tar.gz files you can download for free
from sites and compile _now_ without paying anything.

Respect does not mean shutting up.

~~~
cx01
> the article's main point is about the fact that because of a threaded and
> disk-based architecture you can't get much faster than well engineered SQL
> architectures.

I disagree. His main point is that current SQL databases are slow not because
they use SQL but because their implementations suck. He then lists 4 areas
where the implementations spend most of the time and argues that all of them
can be eliminated (how this should be done is explained in the linked paper).

~~~
antirez
Yep but I read in the article:

> Second, many No SQL systems are disk-based and retain a buffer pool as
> well as a multi-threaded architecture. This will leave intact two of the
> four sources of overhead above.

That's not always true.

For the other points, please give me the download link of this SQL database
that can scale without problems (I want an opensource one since I'm a cheap
startup).

Also scalable != fast, it only means that by adding more nodes I can scale,
possibly almost linearly, but again for startups that are cheap it is also
_very_ important how many boxes you are going to need, so it's important to
have numbers about this superior SQL engines to do some math.
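
A back-of-the-envelope sketch of that math, in Python. The 150,000 ops/sec per core figure is the Redis number quoted earlier in this thread; the target load, the "ten times slower" engine and the scaling efficiency are made-up inputs for illustration, not measurements of any real system.

```python
import math


def boxes_needed(target_ops_per_sec, ops_per_sec_per_core,
                 cores_per_box=1, scaling_efficiency=1.0):
    """Estimate how many boxes are needed to sustain a target load.

    scaling_efficiency is a hypothetical fudge factor in (0, 1]:
    1.0 means perfectly linear scale-out, lower values model the
    coordination overhead of adding nodes.
    """
    if min(target_ops_per_sec, ops_per_sec_per_core, cores_per_box) <= 0:
        raise ValueError("throughput figures and core count must be positive")
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    per_box = ops_per_sec_per_core * cores_per_box * scaling_efficiency
    return math.ceil(target_ops_per_sec / per_box)


if __name__ == "__main__":
    target = 1_000_000  # hypothetical load, ops/sec
    # Figure quoted in this thread for a single-threaded in-memory store.
    print("fast engine:", boxes_needed(target, 150_000))
    # Hypothetical engine that is one order of magnitude slower per core.
    print("slow engine:", boxes_needed(target, 15_000))
    # Same fast engine, but assuming only 80% scale-out efficiency.
    print("fast, 80% efficient:", boxes_needed(target, 150_000,
                                               scaling_efficiency=0.8))
```

Both engines "scale" in this model, but the per-box throughput decides whether the bill is for a handful of machines or for dozens, which is why the per-node numbers matter.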

That said, the article is interesting, as the idea is more or less that in
theory it's possible to build SQL databases that are very scalable and that
mostly our problems in the past were about poor implementations. I guess this
is true, but I can't imagine how an ACID SQL system is without overhead
compared to a key-value store, even when the right technology is used for the
implementation. The only fix for this is to show numbers.

Well also I don't think at all avoiding SQL is a bad idea, but the author
wrote in this article he is going to show how the other NoSQL claim is false
(the claim is that SQL sucks at modeling a lot of problems).

Anyway, the author looks a lot like Adama from Battlestar Galactica and this is a
good point.

~~~
cx01
> For the other points, please give me the download link of this SQL database
> that can scale without problems (I want an opensource one since I'm a cheap
> startup).

He didn't say that there are such systems, but that he expects them in the
next few years.

> so it's important to have numbers about this superior SQL engines to do some
> math.

In the paper he talks about speedups of 1-2 orders of magnitude compared to
current OLTP systems.

> I can't imagine how an ACID SQL system is without overhead compared to a
> key-value store, even when the right technology is used for the
> implementation.

If he is right, then there should be no big difference (maybe 10-20%) between
a SQL system and a key-value store, assuming that both offer the same ACID
guarantees, because most of the complexity lies in the ACID guarantees and not
in the data-storage itself.

> Well also I don't think at all avoiding SQL is a bad idea, but the author
> wrote in this article he is going to show how the other NoSQL claim is false
> (the claim is that SQL sucks at modeling a lot of problems).

He's going to do that in the next blog entry.

------
petewarden
My summary: it's possible to jettison a lot of ACID requirements and implement
sharding and still have SQL as the interface to your data.

My response: Sure, in theory, but for right now Redis/Tokyo Tyrant/etc
actually exist and are free. They don't support a lot of what traditional
databases consider required features, but the NoSQL movement is based around a
recognition that many applications can sacrifice those in favor of performance
and scalability.

Key/value stores are like C, barely disguised assembler that lets you shoot
yourself in the foot, but is incredibly flexible. NoSQL is about acknowledging
that sometimes that's the right tool for the job. This article doesn't address
that choice at all, just handwaves about the wonderful technology that will
handle all the problems automagically Real Soon Now. Point me at an SQL
database I can use to handle my data workload of thousands of updates a second
on a cheapie EC2 instance, that will be a discussion I can use.

~~~
neilc
_Sure, in theory, but for right now Redis/Tokyo Tyrant/etc actually exist and
are free._

Yes, there's a question of whether we are talking about the right way to build
systems, or which system you ought to use in production tomorrow. Building a
whole "movement" around the simple lack of some software is a little
short-sighted, IMHO.

BTW, Stonebraker's VoltDB (www.voltdb.com) is a high-performance OLTP engine,
and apparently an alpha release will be open-sourced shortly (a few months, I
believe).

_the NoSQL movement is based around a recognition that many applications can
sacrifice those in favor of performance and scalability._

You make it sound like the NoSQL people were the first to observe this. That
is very far from the truth -- IMS, CODASYL, Berkeley DB, object-oriented DBs,
etc. have all been around for a long time.

~~~
jbellis
So, maybe you can clarify something for me.

Stonebraker cites Greenplum / Vertica / etc as examples of SQL DBs that scale
out. But all the ones he mentions are data warehouses that measure time from
load to queriability in minutes. Not ms like the OLTP-focused NoSQL
distributed systems.

And of course systems like RAC or pgcluster rely on a SAN, so that's not
really playing the same game either.

Am I missing something? It feels to me like Stonebraker is cheating a little
to make his point.

~~~
zaphar
No, you're not missing anything. And I can speak from experience that RAC
doesn't even help. We are in the process of migrating from a RAC system to a
NoSQL system. RAC doesn't scale either.

~~~
ntoshev
Could you give more detail about what you are doing?

~~~
zaphar
Suffice it to say that RAC has been nothing but trouble since we moved to it
from DB2 for scale reasons. Rather than pay a consultant tons of money to help
us get it stable, after less than a year on it we are now migrating off to a
non-relational solution. I can't give more detail than that really. Maybe later
I'll be able to get permission to post a blog post about it.

------
vicaya
Actually many NoSQL DBs are not just about relaxing ACID requirements, but
also about limiting functionality for easier scaling via DHT. They only
support key/value set/get and not range scan, which is needed to implement SQL
semantics.
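
To make the range-scan point concrete, here is a minimal sketch in Python. The class names (`HashPartitionedStore`, `RangePartitionedStore`), the node layout and the key format are all invented for illustration and do not describe any particular NoSQL product. With hash partitioning a single-key get/set touches exactly one node, but neighbouring keys end up scattered, so a range scan has to ask every node. A range-partitioned layout is shown only for contrast.

```python
import bisect
import hashlib


class HashPartitionedStore:
    """Toy DHT-style store: each key lives on the node chosen by its hash."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def set(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

    def range_scan(self, low, high):
        """Return (sorted items with low <= key < high, nodes contacted).

        Hashing destroys key order, so every node must be asked.
        """
        hits = []
        for node in self.nodes:
            hits.extend((k, v) for k, v in node.items() if low <= k < high)
        return sorted(hits), len(self.nodes)


class RangePartitionedStore:
    """Toy contrast: node i holds the keys between two sorted split points."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)
        self.nodes = [dict() for _ in range(len(self.split_points) + 1)]

    def _node_for(self, key):
        return bisect.bisect_right(self.split_points, key)

    def set(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

    def range_scan(self, low, high):
        """Same contract as above, but only nodes overlapping the range are asked."""
        first, last = self._node_for(low), self._node_for(high)
        hits = []
        for node in self.nodes[first:last + 1]:
            hits.extend((k, v) for k, v in node.items() if low <= k < high)
        return sorted(hits), last - first + 1
```

The price of the second layout is that someone has to choose and rebalance the split points, which is part of the "easier scaling" that the hash-only systems are buying by leaving range scans out.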

I think the best interpretation for NoSQL so far is "Not only SQL", which has
interface implications as well.

~~~
jimbokun
What's DHT?

~~~
gcheong
Distributed Hash Table.

------
gord
The article starts with a description/definition of 'key-value stores' vs
'document stores'... but doesn't clarify the difference, if any.

When people say 'NoSQL', that doesn't preclude another kind of query
language...

I assume NoSQL to mean, roughly:

- no traditional ANSI RDBMS SQL
- key-value store
- graph-like / tree-like data as first class citizen
- map/reduce style operations
- bundled with a dynamic Lisp/Python/Ruby/JavaScript style general
  programming language.

You'd have to agree that 'NoSQL' is a movement, i.e. many people vocalizing
their shared emotional frustration at the limits of SQL. Not theoretical
limits, but practical ease of use and subjective syntactic inelegance.

I personally identify with the movement, after having built traditional SQL
systems, preached data normalization etc., then slowly realising that this RDB
approach is just really unwieldy and brittle.

Moving from XML to JSON, from C++ to Lisp languages... this makes the ugly
syntax of SQL really stand out: as blatantly as COBOL or FORTRAN do to
someone who has read K&R.

Maybe we haven't found the right replacement... but we still know there has to
be a better way.

The term 'NoSQL' is subjective, undefined, fuzzy - but it's a label for a real
problem that needs solving, and groups together partial solutions.

------
jvyduna
Indeed Stonebraker is a living legend. Entrepreneurs may find it interesting
that he is currently pitching VCs a commercial venture to augment his latest
DB project, <http://scidb.org/> (much like Cloudant is to Couch).

~~~
antirez
I also found this:

"VoltDB is an independent, Boston area-based company co-founded by DBMS R&D
pioneer Mike Stonebraker and startup veteran Andy Palmer, working in stealth
mode on the next-generation of OLTP DBMS."

So it's a biased legend. Honestly, if one is investing a lot in something like
VoltDB, this buzz about NoSQL DBs, and especially the fact that they are
starting to be actually used to get work done, can be a problem.

~~~
jasonwatkinspdx
I believe this is the commercialization of H-Store, which has been featured
here on HN before.

------
kevinpet
The title is correct, the article is confused and somewhat pointless. NoSQL is
about the systems, not the query language, but it's not about RDBMS minus
atomicity and consistency, it's about saying "this is _not_ SQL, don't compare
it to the RDBMS that you are used to querying with SQL".

Anyone talking about how you can implement SQL on top of NoSQL systems is
missing the point. KV and Document stores are a different way of storing data.

SQL: express what you want to happen using relational notation, and it will
magically happen. There are no real-world performance issues.

NoSQL: certain operations are fast, other operations are slow. This fact is
reflected in the API.

~~~
chwahoo
The article isn't confused - he's using "SQL" in the same way you are (in
your first paragraph, anyway) - he's referring to RDBMSes. His point is
that many of the performance benefits of a KV/document-store-based DB can be
achieved in an RDBMS without throwing away time-tested ideas like ACID
guarantees.

I was confused by your post, however. You initially state that NoSQL is about
the system/approach to storing data (and not the query language) but then
proceed to contrast querying APIs.

------
igrekel
I am much more interested in the upcoming post.

Even though I have not used the products mentioned, I have seen that RDBMSes
can be configured and optimized, but usually it costs time or money, plus it
can get quite complex. This complexity wouldn't be so bad if it allowed you to
make the rest of your application simpler. Sadly, that usually isn't the case.
You end up constantly dealing with these two different worlds in your
application, and ORMs also bring their share of complexity when you do more
sophisticated or customized things. And I am not even saying anything about
when you need to change things.

I think one of the nice things about many of the "NoSQL" solutions is that
they keep things simple and under your control. You still need to do the
complex stuff yourself, but it's never simple anyway. I am sure they don't
bring a solution to everything, but it certainly is nice to see the area of
databases moving again after many years of status quo. And the author is
certainly one of the people pushing the field.

------
nullexpulsion
As a supporter of the NoSQL movement, I would just like to offer a
small 'analogy'...

Imagine if someone from amongst our early ancestors hadn’t thought of creating
the wheel, where would we be?

And so it is with NoSQL - it is just an alternative way of looking at a
problem, which in this case is that of databases, and it looks at it in terms
of a non-relational model. Therefore, I would like to urge all who are
'hostile' towards it - just let us do what we want. If the results are not to
your liking, what can we do?

I personally believe that through pursuing the principles on which NoSQL is
based we will manage to achieve something new that may perhaps replace SQL.

Additionally, please do also give thought to the fact that Google's Bigtable,
Amazon's Dynamo and Cassandra can be thought of as part of the NoSQL movement.
Rebels are at first turned away from society in general - but later on society
realises that the rebels had the correct idea. Change is always essential to
progress, without which I would not have been typing this.

On another note (sorry for the detour), I am getting quite interested
in Gopher these days. Anybody interested in Gopher?

[ I apologise for the non-tech response.]

[ Null Expulsion - nullexpulsion [at] gmail [dot] com - how long will it be
before spammers figure out how to detour around this? ]

------
nimbix
I think it's only a matter of time before this NoSQL movement morphs into a
SPARQL movement.

<http://en.wikipedia.org/wiki/SPARQL>

------
bham
Please refer to this independent and unbiased study that bolsters my point, a
study which I authored.

~~~
gcheong
I see nothing wrong in citing your own work. He does make a disclaimer at the
end that he is involved in several RDBMS technology startups.

