
NoSQL: Distributed and Scalable Non-Relational Database Systems - linuxmag
http://www.linux-mag.com/id/7579
======
njm
What bothers me most about the NoSQL movement is that it seems to be in
conflict with a central tenet of our community: to avoid premature
optimization, especially when at the expense of expressiveness or
correctness.

While normalized relational databases are indeed slower than key/value stores,
they also foster data integrity and the ability to compose arbitrarily complex
queries with little effort (assuming one actually groks SQL). One can always
optimize trouble spots in a relational database later on (possibly even with a
KVS), but using a KVS from the get-go is bound to lead to pain.

If we happily use Ruby, Python, et al. while chanting the mantra of premature
optimization being bad (and rightly so!), why does this not apply also to
databases?

~~~
Maro
That's like saying a Formula 1 team should first design a Ford Focus and then
optimize it to win the world championship. It just doesn't make sense to call
that process an optimization. The same is true for high-performance/high-volume
use-cases where KV stores make sense.

Also, putting the optimization issue aside, the KV data access model is
actually quite natural for web applications. This is hard to believe
(seriously), and you have to see it to believe it. Note that in web apps,
letting the administrator go in and issue complex queries/updates is much
less important than in, e.g., a billing system. I use an RDBMS for an old
billing project, and it wouldn't make sense to use a KV store, but I use a KV
store for a web app I'm writing, and it turns out to be a great fit, more
natural than an RDBMS.

~~~
njm
Sure, I suppose it depends a lot on the type of Web application you're
building. If you have a relatively simple domain model with very few
relationships, I'm sure a KVS works pretty well. Still, for anything other
than applications with simple domain models, or ones that need tremendous
performance, a KVS is inappropriate outside of isolated optimization -- in
practice, one runs into headaches with non-normalized data very quickly (think
MBAs and their giant, messy Excel spreadsheet).

~~~
Maro
I disagree.

In most web apps data is per-user (or per-object, whatever that object is),
and it's not a major headache to denormalize it to be per-user. One notable
exception is a social network, but that can be handled with some care.
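
To make the per-user point concrete, here is a minimal Python sketch. It assumes a plain dict stands in for the KV store, and the key format (`user:<id>`), field names, and helper functions are all hypothetical. Everything a page needs about one user lives under a single key, so rendering takes one lookup and no joins.

```python
import json

# A plain dict stands in for the KV store (assumption: a real store
# would be a networked service with get/put semantics).
store = {}

def save_user(user_id, profile, posts):
    """Write everything belonging to one user under a single key."""
    record = {"profile": profile, "posts": posts}
    store[f"user:{user_id}"] = json.dumps(record)

def load_user(user_id):
    """One lookup returns the whole per-user document, or None if absent."""
    raw = store.get(f"user:{user_id}")
    return json.loads(raw) if raw is not None else None

def add_post(user_id, title):
    """Read-modify-write of the single per-user record."""
    record = load_user(user_id)
    if record is None:
        raise KeyError(user_id)
    record["posts"].append({"title": title})
    store[f"user:{user_id}"] = json.dumps(record)

save_user(1, {"name": "alice"}, [])
add_post(1, "hello")
print(load_user(1))
```

The trade-off is that any query cutting across users (the social network case) has no index to lean on and needs its own denormalized keys.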

We're talking about web apps written by programmers, not MBAs. Sure, if your
web apps are written by MBAs, then they should use an RDBMS, because they can
do less damage (?).

I'm not arguing against RDBMS/SQL here, all I'm saying is that a KV store is
actually a good choice for the domain of web applications. You can also use an
RDBMS of course, as we have for the last 10 years with the LAMP stack, as long
as you're handling a manageable amount of traffic. Even with a lot of traffic
you can handle it with an RDBMS, it just stops making sense at some point.

------
jbellis
Weird, the only distributed system that gets even a passing mention is
Voldemort.

------
earle
Naturally, Hypertable and HBase, two of the more prominent and technologically
advanced solutions, aren't even included in this article.

~~~
vicaya
He probably equates NoSQL to KV stores that have no chance of having SQL
implemented, as they have no efficient range scan capabilities. It's totally
feasible to have a full SQL (and more) implementation on top of Hypertable.

~~~
jbellis
> It's totally feasible to have full SQL (and more) implementation on top of
> Hypertable.

Except that would be ridiculous, since the data model is just not designed for
denormalization. (And that's okay!)

~~~
vicaya
Denormalization can happen automatically based on the usage of queries with
joins. Some commercial RDBMS like DB2 already do that. SQL is a declarative
language. It doesn't force you to use a particular implementation.

~~~
jbellis
That doesn't make it a good idea. :)

It's such a bad fit because your keys will be spread across other nodes in the
cluster. Even the App Engine datastore, which doesn't go nearly all the way
towards full SQL and still emphasizes denormalization, has latency commonly in 100s of
ms.

~~~
vicaya
You're guessing the implementation here. It's only a bad idea if you're doing
it wrong :) The source/conceptual tables can be normalized, while the join
views can be materialized/implemented with Bigtable-like sparse columns, which
are dynamically updated if the joins happen a lot. Any subsequent join queries
hit the views but not the source tables. Manual denormalization should be allowed
but not required.
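
A rough sketch of that idea in Python, assuming in-memory dicts stand in for the tables and using a hypothetical users/orders schema: the source tables stay normalized, while a wide, sparse row per user acts as the materialized join view. For simplicity this version updates the view on every write rather than only once joins become frequent, and it does not handle an order moving between users.

```python
# Normalized source tables (hypothetical schema).
users = {}    # user_id -> {"name": ...}
orders = {}   # order_id -> {"user_id": ..., "item": ...}

# Materialized join view: one wide, sparse row per user.
# Columns such as "order:17" are created on demand, Bigtable-style.
user_orders_view = {}  # user_id -> {column_name: value}

def put_user(user_id, name):
    """Write to the source table and refresh the view row."""
    users[user_id] = {"name": name}
    row = user_orders_view.setdefault(user_id, {})
    row["name"] = name

def put_order(order_id, user_id, item):
    """Write to the source table and add a sparse column to the view row."""
    orders[order_id] = {"user_id": user_id, "item": item}
    row = user_orders_view.setdefault(user_id, {})
    row[f"order:{order_id}"] = item

def join_user_orders(user_id):
    """Answer 'users JOIN orders' for one user by reading the view only."""
    row = user_orders_view.get(user_id, {})
    items = [v for k, v in sorted(row.items()) if k.startswith("order:")]
    return row.get("name"), items

put_user(1, "alice")
put_order(1, 1, "book")
put_order(2, 1, "pen")
print(join_user_orders(1))
```

The join query touches a single row keyed by user, which is the access pattern a Bigtable-like store serves well.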

GAE has nothing to do with this: it's a shared cluster, so no benchmark on
it is reliable or comparable. It also has nothing to do with the
SQL discussion, as it doesn't support joins either.

