

NoSQL is What? - timf
http://blog.zawodny.com/2011/07/23/nosql-is-what/

======
fauigerzigerk
Clearly, we have to identify the qualities of NoSQL that are unrelated to
scaling or performance for the debate to make any sense. I don't think it is
possible in general to define those qualities, because NoSQL systems don't
have much in common. Using a negation to name the category is telling in
itself.

You mention schemaless, but none of the BigTable derived systems are
schemaless. Key-value stores are schemaless but RDBMS can do key-value storage
just fine as can file systems.
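
A minimal sketch of key-value storage on top of a relational engine. SQLite
(via Python's standard `sqlite3` module) stands in for any RDBMS here, and
the table name `kv` and the helpers `put` and `get` are made up for
illustration.

```python
import sqlite3

# Hypothetical key-value table; SQLite stands in for any RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(key, value):
    # Upsert: insert the pair, or overwrite the value if the key exists.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )


def get(key, default=None):
    # Primary-key lookup; returns the default when the key is absent.
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default


put("user:1", '{"name": "alice"}')
put("user:1", '{"name": "alice", "plan": "pro"}')  # overwrites the first value
```

The value column is opaque text, so the database enforces nothing about its
shape, which is roughly what "schemaless" buys you in a key-value store.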

I think this whole debate boils down to whether or not you need to normalize
data. If you normalize, you need joins and that's the weak spot of most NoSQL
systems. Doing joins in procedural code requires all data to be transferred
into application process memory, which is only viable for modest amounts of
data. (I'm not saying that only RDBMS can ever do proper joins, just that the
popular NoSQL solutions in use today don't)
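
To make the cost concrete, here is a rough sketch contrasting a join done by
the database with the same join done in procedural code. The tables
`customers` and `orders` and both function names are invented for the
example, and SQLite stands in for whatever store you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")


def totals_in_database():
    # The database joins and aggregates; only the result rows reach the app.
    rows = conn.execute("""
        SELECT c.name, SUM(o.total)
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name
    """)
    return dict(rows)


def totals_in_application():
    # The customers table is loaded whole into a dict, and every orders row
    # is shipped to the application, which then joins them by hand.
    names = dict(conn.execute("SELECT id, name FROM customers"))
    totals = {}
    for customer_id, total in conn.execute(
        "SELECT customer_id, total FROM orders"
    ):
        name = names[customer_id]
        totals[name] = totals.get(name, 0.0) + total
    return totals
```

Both functions return the same mapping, but the second one moves every row of
both tables into the application process first, which is the part that stops
being viable as the data grows.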

Normalization is also what mandates ACID because normalization means you're
losing what I would call the "physical unit of consistency". Normalization,
joins and ACID go together. It's all or nothing. (Of course pragmatically it's
never all or nothing but it's useful to highlight the general point)
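
A small sketch of what I mean by the "physical unit of consistency",
assuming a made-up order schema and SQLite as the stand-in database. In the
normalized form one logical order is spread over several rows in two tables,
so the write has to be a transaction. In the document form the order is a
single row, and a single-row write is already atomic.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL CHECK (qty > 0)
    );
    CREATE TABLE order_docs (id INTEGER PRIMARY KEY, doc TEXT NOT NULL);
""")


def save_normalized(order_id, customer, items):
    # One logical order spans rows in two tables, so the writes are wrapped
    # in a transaction: either all of the rows land or none of them do.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer))
        conn.executemany(
            "INSERT INTO order_items VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in items],
        )


def save_document(order_id, customer, items):
    # The whole order lives in one row, so the row itself is the unit of
    # consistency and no multi-statement transaction is needed.
    doc = json.dumps({"customer": customer, "items": items})
    with conn:
        conn.execute("INSERT INTO order_docs VALUES (?, ?)", (order_id, doc))


save_normalized(1, "alice", [("widget", 2), ("gadget", 1)])
try:
    # The second item violates the CHECK constraint, so the transaction
    # rolls back and the order row for bob disappears with it.
    save_normalized(2, "bob", [("widget", 1), ("gadget", 0)])
except sqlite3.IntegrityError:
    pass
save_document(3, "carol", [("widget", 5)])
```

Without the transaction, the failed second order would leave a header row
and one item behind, which is exactly the inconsistency ACID is there to
prevent.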

So, my conclusion is this: Use RDBMS or don't normalize (much). All the
debates around RDBMS or NoSQL being simpler or more complicated turn out to be
implicit debates about the need for normalization. When some people say this
or that model is simpler, they either imply or don't imply a need for
normalization.

In my view, whether or not you need to normalize depends primarily on whether
or not the data is single purpose or multi purpose. If it's one app and its
own private data island, then not normalizing often makes sense for simplicity
and performance reasons.

If the data has its own separate life cycle, independent of any individual
app, then not normalizing is a terrible mistake that brings down everyone's
productivity no matter how simple it may appear initially.

Having worked on data integration and analytics projects for many years, I'm
leaning towards the view that most data is multi purpose even if it's not
initially expected to be. But that may well be survivorship bias, as apps that die
young never cause integration issues. That doesn't mean they haven't fulfilled
their original purpose.

~~~
St-Clock
I really like this idea of basing your decision on the need to normalize or
not. It certainly fits document-oriented databases and key-value stores, but
I'm less sure about column-oriented databases (I have no experience with them,
except many hours trying to understand them...).

The single-purpose vs. multi-purpose that comes from denormalization vs.
normalization would explain why certain companies stick to RDBMS for their
main data and use NoSQL only for certain specific scenarios.

~~~
fauigerzigerk
I agree. And using both has been the status quo for ages if you count the file
system as a kind of NoSQL data store.

As far as I can tell, there are no significant qualities in the BigTable
clones that are not related to scalability.

------
haberman
"threw up in my mouth a little." "Gee, let me get this straight." "Bullshit."
"Seriously?"

I have to say that one of my regrets about growing up in programmer circles is
seeing stuff like this held up as an acceptable example of how adults
communicate with other adults.

It took me a long time to realize that this style of communication is not
necessary, is not effective, and reflects poorly on the speaker. C'mon, this
guy appears to be in his 30s or 40s and has written books, so why does he
write like he's an angsty teen? (I know Linus does it. I think it's lame when
he does it too.)

There's still room for humor and snark. Here are three of my favorite blog
postings/articles ever, all very snarky, but not embarrassingly juvenile:

[http://wanderingbarque.com/nonintersecting/2006/11/15/the-s-...](http://wanderingbarque.com/nonintersecting/2006/11/15/the-s-stands-for-simple/)

[http://diveintomark.org/archives/2004/01/14/thought_experime...](http://diveintomark.org/archives/2004/01/14/thought_experiment)

<http://www.info.ucl.ac.be/~pvr/decon.html>

~~~
rdouble
If you're a programmer of a certain vintage, you've been reading writing like
this on mailing lists for years. It's easy to think programmers in particular
produce embarrassing writing. However, most blogs are written in this style,
not just blogs by programmers. This issue is not confined to the internet, or
the contemporary age. The editorial page in my hometown's daily paper was not
any better. If one reads historical letters to the editor in regional
newspapers, you'll find that adults have been communicating in poorly written,
juvenile language for centuries.

~~~
sophacles
Not even just regional papers, lower level stuff.

It is very, very important to remember that language which sounds dignified
and mature to us today is, in many cases, just the vulgar "crap talk" of a
previous era. It often sounds fancy and polished to us because it is old, not
because it is good.

------
justin_vanw
Ok, whenever someone opens with something to the effect of 'I use MySQL, so I
have experience with relational databases and can make a comparison with
NoSQL' all credibility is lost.

MySQL is a 'relational database', but one in which JOIN is so expensive and
poorly optimized that you almost have to use it as a key-value store, looking
everything up directly with synthetic primary keys.

I've had this discussion several times. Some startup guys say 'we should look
at NoSQL', and I ask questions to get to the bottom of why they think that.
They will say something like 'we have this huge join we have to do, but it's
too expensive, so we pre-compute it'. I ask more questions, and the 'huge
join' is not huge at all, in fact it is just a reasonable join, something that
you could expect to do on every page view without difficulty. Well, except
they are using MySQL, and it can't join for shit. The MySQL query planner is
disgusting.

So, although I don't expect to persuade the world to stop using MySQL (to be
honest, I love that it is the go-to thing, those of us who use a decent
database like PostgreSQL end up with a huge competitive advantage, better
performance, more features, more scalable, amazing query planner, top shelf
performance analysis), I think we should at least admit that in practice, to
get any performance out of it, you have to effectively use it as a key-value
store anyway. And when comparing MySQL, which is a shitty key-value store,
against real key-value stores, you can make a case for some NoSQL thing.

~~~
evilswan
Totally agree

------
jister
I have to agree with one of the comments. All you did was rant and didn't say
something useful. Perhaps you can tell your readers about your experiences so
that you can convince them that NoSQL is useful (Of course, I am NOT saying it
isn't) to implement in their projects?

------
jerrya
I did find this point from the original article to be very dubious:

 _In fact, I would argue that starting with NoSQL because you think you might
someday have enough traffic and scale to warrant it is a premature
optimization, and as such, should be avoided by smaller and even medium sized
organizations. You will have plenty of time to switch to NoSQL as and if it
becomes helpful. Until that time, NoSQL is an expensive distraction you don’t
need._

Consider:

\- how hard most organizations find it to refactor, rewrite, retest,
especially in systems that are online 24x7

\- when would you prefer to climb the learning curve with an immature
technology, when you are small and starting out, or when you are a large
company with a large set of users and under "mission critical" constraints
(and possibly stockholders and the like.)

My guess is that ongoing companies find it extremely difficult and expensive
(and wanting for talent) to switch from one sql database to another, much less
switch from sql to nosql.

~~~
wfarr
It's not a matter of SQL vs. NoSQL. They are complementary.

The fact of the matter is, there are some components of systems for which
Redis, Riak, etc may be better suited than SQL in the long term. Starting out,
keeping everything in SQL provides less friction, but as time goes on it may
be necessary to scale the component separately from typical relational data
storage, and that's the point at which these switches are evaluated. These
companies would be replacing SQL only in these components — not across the
board.

The myth of a silver bullet datastore solution is just that: a myth. Different
data stores have different strengths and weaknesses and it becomes necessary
to mix and match at scale.

To quote Benjamin Black: "Scale is pain, princess. Anyone who tells you
different is selling something."

------
flocial
This is opinion versus opinion. I'm sorry to say there's no real content here.
The author went from Yahoo to Craigslist so there's no such thing as premature
optimization at that scale and with the small staff at CL you can be sure that
chasing NoSQL as a fad can ruin the company. Obviously he doesn't fit the bill
of the essay he's criticizing but most devs don't experience the scale of his
problems.

You can't do the topic of NoSQL vs SQL justice with an essay because it would
just be semantic: we're talking about a different theoretical representation
of data structure. You might as well scream "better taste!", "Less filling!".

~~~
jzawodn
Agreed. This is opinion.

Mine is based on years of experience, but I got the impression that the
original article was written based on some cherry-picked reports of what a few
companies said (as opposed to actually being there and doing it).

Maybe I'm being overly critical?

------
LeafStorm
One thing that bothers me is people who talk about "SQL databases vs. NoSQL
databases." That's like framing a debate on transportation as "Cars vs. Not
Cars," where "Not Cars" includes bicycles, planes, buses, subways, boats,
zeppelins, etc. etc.

If you take CouchDB, Redis, MongoDB, and all the other "NoSQL" databases and
compare them, the only thing they share in common is that they do not use a
relational data model or SQL. The way the word "NoSQL" is used, however,
implies that they are some kind of united front against SQL databases, which
is not the case at all. (It's why I am not a big fan of the term.)

Just like you would not use bicycles, planes, subways, and boats for the same
things, you would not use CouchDB, Redis, MongoDB, and Cassandra for the same
things. If you're choosing a database just because it's "NoSQL," then you are
completely missing the point.

~~~
mitchty
I think the problem is the term NoSQL itself, which was originally coined as
Not Only SQL. But everyone now reads the No as the actual word No
in relation to SQL, as if there is some war between SQL and not...SQL. I think
that alone is causing more heartburn than needed between the two camps.

------
jpterry
Firstly, I can attest that migrating the datastore of an application which has
scaled to require a NoSQL solution is no trivial task.

Secondly, I believe the author of the original posting really meant that
"premature optimization is the root of all evil." Like this post points out,
NoSQL solutions vary wildly in their abilities and usefulness. A relational
database is a good place to start on the path to an MVP. And if you need
features that a NoSQL solution can provide, and you understand the problem
you're trying to solve, then use a NoSQL solution.

~~~
jzawodn
I think that most people who argue that such a migration "isn't that bad"
haven't actually done it. Or at least they haven't done it for anything
sizable.

~~~
j-kidd
It matters little how bad the migration is, when you ain't gonna need it in
99.9% of cases. When you are big enough to need the migration, your company
has enough resources to roll your own Hadoop distribution.

------
swampthing
Obviously this doesn't really have any bearing on points the author is making,
but a small nit for posterity's sake - I think the point Clayton Christensen
was making in _The Innovator’s Dilemma_ was not that people should adopt
inferior technologies to gain leverage later.

I think the point in that book was more that new technologies are often
inferior in many ways to existing technologies when they first start out, and
the way these new technologies survive/grow is by appealing to niches that
value the existing ways in which the new technology is superior. Then, when
the new technology matures a little more, the market to which it appeals grows
a little larger, and this repeats.

------
jhawk28
The problem is that NoSQL is such a broad term for datastores. Some of them
are simple (like redis) and some more complex (like Cassandra/HBase). They
also have different targets for data types. Using one just because it is a
NoSQL store can be a premature optimization, just like using an RDBMS can be a
premature optimization. You really need to understand the data and how it will
be used. Before you know what you want to build, it is easy to prematurely
optimize for something you don't need.

Start simple, then iterate...

------
tapvt
Undertaking "optimization", in this case selecting and developing with a NoSQL
datastore early in the process, should only be considered premature if the
costs of doing so (which will be mainly represented by developer-hours spent)
are greater than the value provided by having a datastore that can accommodate
well the needs of the application itself, development team, and end-users.

Adaptability, flexibility (with regard to schema/key structure migration and
maturation), as well as ease of partitioning data intelligently ahead of
demand are all hugely important factors that can and often should inform the
process of selecting a datastore.

If the datastore selected for use:

\- shortens development time,

\- provides improved performance for anticipated scale,

\- better represents the data model needing to be captured,

\- avoids re-work and "post"-mature optimization of data models & datastores,

\- or accomplishes any combination of the above,

then the selection of that datastore should not be considered premature
optimization.

Finding that your traditional RDBMS does not well support the data models you
have developed, especially once the product is out of the gate, will not be
fun. Having to engage in a refactor and data migration to move to a more
appropriate or more performant datastore will be a time- and resource-
consuming process.

As soon as the initial synthesis phase of development can begin, it may be
well worth the effort to experiment with multiple datastores as a means of
evaluating their performance and suitability. Depending on the scope and
potential for the project to scale, modularizing distinct pieces of core
functionality into separate services, each with their own most-suitable
datastore, can also provide great benefit in flexibility of development
processes, as well as adaptability of the product to the demands of the end-
users.

------
antirez
When there are arguments, as in Jeremy's post, commenting on tone and
formalities is a huge FAIL. Everyone is entitled to express himself with the
words and tone he wishes, as long as no one is going to be offended (if you
are super-sensitive, that is your problem). One thing I always find to be a
problem is that the programming community here on HN is a bit too middle
class-ish, and this is annoying: you are off topic, you are not polite,
respect the fact I don't understand, blablabla. Hacking is, in my view,
connected with cultural freedom, and not being polite is one of its possible
expressions, though not the only one. So reply to the arguments and stop
being so childish.

------
Devilboy
He's only experienced with MySQL? How can he judge the SQL vs NoSQL battle
when he's never used a proper SQL system? NoSQL does not 'save development
time' in general, it's just a different tool. A much younger and less refined
one at that. Real RDBMSs do a whole lot more than execute your SQL queries for
you.

~~~
bad_user
I don't think there's a SQL versus NoSQL battle, whatever that means.

SQL refers to relational databases, which are databases using the "relational
model" of representing data: <http://en.wikipedia.org/wiki/Relational_model>

This means that any SQL database is very flexible with regard to what you can
store in it, not to mention that it is based on proven theory and
battle-tested implementations of various features, like ACID.

But the relational model also breaks heavily when wanting to work with data
structures that don't blend well -- like graph data. It also breaks down
heavily when you want to spread your data across many servers. It is also not
well suited to storing and querying billions of records -- sooner or later,
your indexes are going to go beyond whatever storage / RAM capacity your
servers have.
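
As a rough illustration of the graph case, here is what a traversal looks
like when the edges live in a relational table. The `follows` table and the
`reachable` helper are invented for the example, and SQLite (which supports
`WITH RECURSIVE`) stands in for the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (
        src TEXT NOT NULL,
        dst TEXT NOT NULL,
        PRIMARY KEY (src, dst)
    );
    INSERT INTO follows VALUES ('a', 'b'), ('b', 'c'), ('c', 'd'), ('c', 'a');
""")


def reachable(start):
    # Every hop is another self-join against the edge table. UNION (rather
    # than UNION ALL) drops nodes already seen, so cycles terminate.
    rows = conn.execute("""
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT f.dst FROM follows f JOIN reach r ON f.src = r.node
        )
        SELECT node FROM reach WHERE node <> ? ORDER BY node
    """, (start, start))
    return [node for (node,) in rows]
```

It works, but the query cost grows with every hop, and once the edge table is
split across servers each of those self-joins may have to cross the network.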

Btw, MySQL is a real RDBMS. Even if it lacks some features, it doesn't lack
anything essential to calling it "relational" and talking about the advantages
or disadvantages of RDBMSs versus key-value stores or other NoSQL types.

~~~
Devilboy
Facebook seems to be doing just fine with their graph data on SQL

~~~
jjm
There has been talk for some time that their MySQL issues are crippling
progress and development due to complexity of management and upkeep.

[http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse...](http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/)

~~~
kemiller
Stonebraker has something to sell them, though.

~~~
jjm
Very true.

------
i_crusade
"Again, I think we need to talk about the best tool for the job, not the best
tool for every job. Relational databases are not the best tool for every data
storage job."

Pretty much disqualifies him as moron. Hell, he doesn't say anything.

~~~
burgerbrain
Does this mean that you think that relational databases are the best tool for
every data storage job?

~~~
jzawodn
Uhm, what?

I was trying to counter balance some of the crap I read in the original
posting--was that really not clear?

~~~
burgerbrain
The above comment by myself is in response to 'i_crusade', not you...

~~~
jzawodn
Ugh. Reply fail. Sorry. I was intending to reply to the parent comment, not
yours.

~~~
burgerbrain
no worries

