
The NoSQL movement - tux1968
http://radar.oreilly.com/2012/02/nosql-non-relational-database.html
======
einhverfr
The article is, IMO, total crap.

There are advantages to NoSQL in some cases. I am even happy to write about
them. However, they are not really the advantages listed here.

The first is that SQL databases can in fact be distributed in some cases. Look
at Postgres-XC for an example of write-scalable distributed shards while
maintaining a consistent, relational model.

The second is the focus on analytics. Analytics is a genuinely problematic
area for NoSQL. Analysis typically requires intensive searches over the entire
data set, and NoSQL databases are not optimized for this. Consequently
analytical data tends to be slow to build and is then maintained incrementally
as data comes in, rather than generated ad hoc from existing data (using the
entered data as a single point of truth). This leads to a lack of flexibility,
even though prepared reports load quickly. It isn't clear to me how different
this is from summary tables maintained with triggers.
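
A minimal sketch of that trigger-maintained-summary alternative, using sqlite and a hypothetical sales schema: the summary row is updated as data comes in, so prepared reports read a precomputed row instead of scanning the raw data.

```python
import sqlite3

# Hypothetical schema: per-item sales, with a summary table kept current
# by a trigger so reports never have to scan the sales table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (item TEXT, amount INTEGER);
CREATE TABLE sales_summary (item TEXT PRIMARY KEY, total INTEGER);

CREATE TRIGGER sales_rollup AFTER INSERT ON sales
BEGIN
    -- ensure a summary row exists, then fold the new amount in
    INSERT OR IGNORE INTO sales_summary VALUES (NEW.item, 0);
    UPDATE sales_summary SET total = total + NEW.amount
     WHERE item = NEW.item;
END;
""")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("widget", 5), ("widget", 3), ("gadget", 7)])
total = conn.execute(
    "SELECT total FROM sales_summary WHERE item = 'widget'").fetchone()[0]
print(total)  # 8: precomputed, no scan of the sales table
```

The raw rows remain the single point of truth, so an ad hoc query against `sales` can always rebuild or cross-check the summary.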

Now NoSQL has some advantages:

1) Where you don't need ad hoc analytics, where you have highly defined
functional requirements, and where you have well defined network protocols for
interop, development is often faster, and performance is better.

2) I think it could be very interesting as a network-transparent back-plane,
if you will, for various kinds of network services.

------
jaylevitt
I can't even count how many mistakes this article makes. A random sampling:

\- Relational databases were designed for a world where availability is
unimportant, like transaction processing. Um, no. OLTP has four letters, not
two, and the first are as important as the last. Tandem was providing five-9's
systems in the '80s and '90s for ATM networks, lottery systems, airline
reservations, credit cards, etc. Tandem is a fault-tolerant relational
database in _hardware_.

\- Up-front schema design is a poor fit in a world where data requirements are
fluid: I don't think "up front" means what you think it means. You can change
schemas on the fly nowadays; your schema design is no more up-front than your
coding is.
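
To illustrate the "on the fly" point, a small sketch with sqlite and hypothetical table names: a column is added to a live table with data in it, no rebuild or up-front redesign required.

```python
import sqlite3

# Schemas need not be fixed up front: ALTER TABLE changes a live table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Requirements changed: add a column; existing rows pick up the default.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT DEFAULT ''")
conn.execute("INSERT INTO users VALUES ('bob', 'bob@example.com')")

rows = conn.execute("SELECT name, email FROM users ORDER BY name").fetchall()
print(rows)  # [('alice', ''), ('bob', 'bob@example.com')]
```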

\- You can't have millions of columns in a relational database: True, and you
wouldn't; you'd normalize that. This is an important difference, but not a
disadvantage of relational databases, any more than saying "In a relational
database, you'd join URLs with IP addresses, and maybe five other tables; this
design isn't even conceivable in a NoSQL database."

\- To optimize relational performance, you "do away with joins wherever
possible": 1995 called, and it wants its MyISAM back.

\- Two-phase commit is so obsolete, even banks don't use it: Of course they
do. You still need a two-phase commit to make sure the other end got your
data; whether "got your data" happens in the customer path or during
reconciliation is a design decision. How, exactly, do you think they
_discover_ that you and your spouse both got the money?
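
For reference, the protocol being dismissed is simple enough to sketch. A toy two-phase commit in Python (all names hypothetical; no durability, logging, or timeouts): the coordinator commits only if every participant votes yes in the prepare phase.

```python
# Toy two-phase commit: phase 1 collects votes, phase 2 applies the
# coordinator's unanimous decision to every participant.
class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # phase 1: promise to commit, or vote no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # phase 2: apply the coordinator's decision
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]
    decision = all(votes)          # a single "no" aborts everyone
    for p in participants:
        p.finish(decision)
    return decision

bank_a, bank_b = Participant(True), Participant(False)
print(two_phase_commit([bank_a, bank_b]))  # False: bank_b voted no
print(bank_a.state)                        # 'aborted': nobody commits alone
```

Whether this runs in the customer path or during overnight reconciliation is, as noted above, a design decision; the protocol is the same.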

\- "relational databases were developed when distributed systems were rare and
exotic at best." That's nothing; when Von Neumann machines were developed, we
didn't even have transistors. Some legacy architectures keep on working.

\- "absolute consistency isn't a hard requirement for banks": see above. Yes
it is.

\- "So the CAP theorem is historically irrelevant to relational databases:
they're good at providing consistency, and they have been adapted to provide
high availability with some success, but they are hard to partition without
extreme effort or extreme cost." ... Wh... Bu... That's not even wrong.

\- "consistency requirements of many social applications are very soft." I
like the Facebook example from a recent article on causal consistency: I
defriend my boss and then post that I'm quitting. Certain kinds of consistency
are in fact critical to social applications.

There are many good reasons to design around a NoSQL database instead of a
relational one. This article provides fewer than zero of them.

~~~
einhverfr
_"absolute consistency isn't a hard requirement for banks": see above. Yes it
is._

This one gave me a laugh. Thanks for pointing it out.

Let me see.... Your business does nothing but manage money, and you don't need
absolute confidence about where that money is at any given point in time?
Right..... In fact, accounting systems (and by extension ERP systems) are
about the LAST place you'd want to use anything other than a relational
database system.

~~~
bwarp
Most banks use an eventually consistent message oriented architecture, not a
transactional one.

At a low level banks use both absolute consistency (for physical transaction
stores) and eventual consistency (for logical transaction implementations).
The logical transaction implementation abstracts inter-bank and physical
payment messaging.

It would be impossible to have absolute consistency in the logical transaction
layer as transaction scopes would have to be open (i.e. locked) for days at a
time in some cases. That simply doesn't scale.

Ultimately, banks have millions if not billions of pounds floating around
outside traditional transactional stores all the time.
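
A toy sketch of that layering (hypothetical names, Python): the physical ledger is updated atomically, while the logical layer is just messages applied at settlement time, so no transaction scope is held open for days.

```python
from collections import defaultdict
from queue import Queue

ledger = defaultdict(int)   # physical store: each update is atomic
pending = Queue()           # logical layer: durable messages in flight

def send_payment(src, dst, amount):
    # accepted immediately; no locks held while the message is in flight
    pending.put((src, dst, amount))

def settle():
    # e.g. an overnight reconciliation run drains the queue
    while not pending.empty():
        src, dst, amount = pending.get()
        ledger[src] -= amount
        ledger[dst] += amount

ledger["alice"] = 100
send_payment("alice", "bob", 30)
print(ledger["bob"])   # 0: not yet consistent
settle()
print(ledger["bob"])   # 30: eventually consistent
```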

~~~
einhverfr
I agree that ATMs are a bad solution to this. The issue there, though, is
loose coupling of third-party financial networks, as opposed to a general
need for consistency in one's own finances.

But loose coupling between third party payment networks (say debit card
purchases over Cirrus) and the bank is not really the same problem as using
Cassandra at Facebook.

------
jacques_chester
Here's my theory.

NoSQL came about because smart people were exposed to relational technology in
this order:

1\. A university course or book that mostly focused on SQL and then normalised
design

followed by

2\. Using MySQL in production.

What's missing from this picture is learning the other halves:

1\. Normalisation matters to _transaction processing_. Fast queries are
another matter entirely, one that usually only gets airily waved at in many
books and university courses. I went through an entire semester without seeing
"OLAP".
Techniques I learned on the job were kept to the "Advanced Databases" course
which was only taught sporadically.

The idea that OLAP is some high mystery is just silly. It's join-beating,
denormalising stuff, like NoSQL, just with _decades of literature and code to
back it up_.

2\. It also matters that MySQL is not the benchmark of relational technology
performance or features.

My day job is working with Oracle databases. The price, the odd absence of
useful features because It's Never Been Done That Way (I'm looking at _you_ ,
primary key triggers and booleans-stored-as-char(1)) ... sometimes it's
amazing that people pay so much for it. Then you see the manuals, the
supporting tools[1] and the performance a half-decent DBA can massage and you
get that this stuff isn't as bad as the sticker price says.

For my own work, postgresql is where it's at. But for a big site I'd look to
DB2, Oracle RAC, Teradata, NonStop, Greenplum and on _and on and on_ before
betting on 5-year old technology reinventing a 50-year old paradigm that
didn't work real well the first time around.

[1] Except SQL Developer. What a dog.

~~~
gbog
> Normalisation matters to transaction processing.

If you mean to mean that normalization matters only for transactional
processing, I dare disagree. Normalization matters for data sanity. Building
and maintaining a complex and agile application is much easier on properly
normalized data. Making sure one bit of data is stored only in one place and
is properly decoupled from other bits of data's existence is still the Right
Thing to do. True, it is sometime in direct contradiction with data access
performance, but this is an optimization issue, which can be solved with
denormalization, materialization or the use of some "NoSQL" storage.
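
As a small illustration of "one bit of data stored in only one place" (sqlite, hypothetical schema): a rename touches a single row, where a denormalized store would have to find and rewrite every copy.

```python
import sqlite3

# Normalized: the customer's name lives in exactly one row; orders
# reference it by key rather than carrying their own copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
conn.executemany("INSERT INTO orders VALUES (?, 1)", [(1,), (2,), (3,)])

# One UPDATE fixes the name everywhere it is seen.
conn.execute("UPDATE customers SET name = 'Acme plc' WHERE id = 1")
names = conn.execute("""
    SELECT DISTINCT c.name
      FROM orders o JOIN customers c ON c.id = o.customer_id
""").fetchall()
print(names)  # [('Acme plc',)]
```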

Anyway, in these NoSQL discussions I always wonder: as far as I know,
Wikipedia is still using a purely relational data store, and its main data
(an enormous dict of blobs) seems to be the perfect candidate for non-
relational storage. So if NoSQL is so good at this task, how come they haven't
moved yet? Should they move? (Genuine question)

~~~
neilk
That's a complicated topic.

For the social web, people typically denormalize data because consistency is
less important than personalization and/or read performance. At some point,
it's just easier to write the same data in a redundant way than to query from
a canonical source and transform it on the fly. Wikipedia's raison d'etre is
to show the same data to everybody. In fact, it goes to great lengths to
ensure that everybody sees the very last updated version, no matter what. So
there's no great pressure to denormalize core services; in fact, quite the
opposite.

The next part of your question is whether a document-oriented database would
be better. I think it would be possible to write a wiki on top of a document
store. You'd gain a lot from simplicity, although you'd lose certain kinds of
flexibility.

But this is not practical for Wikipedia at this point. For everything else
that goes into rebuilding a page, or for administration, there are plenty of
traditional joins.

The software is very much married to SQL. MediaWiki, the software that powers
Wikipedia, is open source and database agnostic. There are people running
MediaWiki sites on pretty much every RDBMS you can name. While queries and
updates are all abstracted away, the core concepts are all obviously SQL. The
abstraction layer just gets rid of syntactical quirks and handles escaping.

That said, a typical MediaWiki installation is not well normalized either.
MediaWiki can host a lot of extensions that change the behavior of the wiki.
If an extension needs to persist data associated with existing tables, a
typical strategy is for it to maintain its own parallel tables of data,
reusing the same primary key.

Like a lot of successful websites, the MediaWiki culture is pragmatic above
all else. SQL databases are used to persist data and the best you could say is
that it's a hybrid strategy.

------
ryutin
The survey of nosql technologies sounds more like a rationale for another
O'Reilly bookshelf!

I mean, there are definitely arguments to be made for alternatives to
relational databases, but outside those special cases, at such an early stage
of maturation and without standards, they're only worth pursuing by the hardy
cowboy or the blissful novice.

As the nosql technologies do mature and standards emerge, I think it should be
expected that they will be subsumed into existing database products as new
features.

~~~
einhverfr
There's already some work on this in PostgreSQL, with hstore and JavaScript as
a stored procedure language.

------
DiabloD3
There is no such thing as the NoSQL movement.... it's the "NoMySQL, and MySQL
is the only SQL implementation in the world because we're all PHP users and
have never heard of PostgreSQL" movement.

And it doesn't help that a lot of the NoSQL DBs out there have SQL-like query
languages.

------
troymc
Having the two categories SQL and NoSQL is a bit like having the two
categories "books" and "non-books." The latter category includes giraffes,
planets, feelings, and windmills.

Okay, maybe it's not that bad, but NoSQL databases still make a huge non-
homogeneous set of things.

------
cageface
The debate over NoSQL vs SQL datastores seems to carry many of the same
overtones as the debate over static vs dynamic typing in programming
languages, with similar arguments being made on both sides.

------
wyuenho
Every time someone says you can't do schema-less design as easily in an
RDBMS, I cite this EAV article on Wikipedia.

<http://en.wikipedia.org/wiki/Entity–attribute–value_model>
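
A minimal EAV sketch in sqlite (hypothetical data), showing schema-less attributes stored in a plain relational table: new attributes appear per entity without any schema change.

```python
import sqlite3

# Classic entity-attribute-value layout: one row per (entity, attribute).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    ("doc1", "title",  "NoSQL rant"),
    ("doc1", "author", "anonymous"),
    ("doc2", "title",  "2PC explained"),
    ("doc2", "pages",  "12"),   # doc2 gains an attribute doc1 lacks
])
row = conn.execute(
    "SELECT value FROM eav WHERE entity = 'doc2' AND attribute = 'pages'"
).fetchone()
print(row[0])  # '12'
```

The trade-off, as the article linked above discusses, is that typing and integrity constraints move out of the schema and into application code.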

~~~
olalonde
Or you can use an ORM that generates the schema on the fly. (Technically, you
still have a schema in both cases.)

------
mbailey
I love the ad for MS SQL server in the middle of the page.

------
olalonde
After reading the article and comments here, I'm left even more confused.
Anyone care to explain some use cases for NoSQL databases?

