

NoSQL: If only it was that easy - bjclark
http://bjclark.me/2009/08/04/nosql-if-only-it-was-that-easy/

======
endtwist
I'll be honest (and chances are I'll get a great explanation as to why I'm
wrong), I don't understand the whole YeSQL-NoSQL argument.

Relational databases, such as MySQL and PostgreSQL, are great for some things
-- especially when you need to (put simply) divide and recombine various
pieces of information easily. JOINs. UNIONs, etc. are fantastic when you need
them, and it's where relational databases excel.

On the other side of the equation are hash and "document" databases, which are
mostly key-value stores with some unique functionality (e.g. the indexing
feature of Tokyo Cabinet Tables). These are great when you need to store a
large volume of data, but don't need to frequently recombine the stored data
in many different ways. You can run simple queries with some hash DBs and
retrieve specific data, by key, at the drop of a hat; this is where (for the
most part) key-value stores win out over relational databases.

It is clear, at least to me, that these two classes of DB serve two different
purposes, with some overlap. By that logic, I see no reason why they can't
happily co-exist in a single project. In fact, I'm using multiple database
formats in a single application without any problem. Sure, there is a little
more management and some extra logistics involved, but the point is that each
type of database is used with its strengths in mind.

I'm not going to force MySQL to be a giant hash table when something else will
do the job better, and I'm not going to force Tokyo Tyrant to try and be a
relational database (not that I really could).

Am I wrong in finding a happy medium between multiple technologies?

~~~
bsaunder
I think you are missing another fundamental difference. The NoSQL solutions
are more like a dynamically typed language where as the YeSQL ones are
statically typed.

In NoSQL, I don't have to tell the database layer what I'm stuffing into the
value (or even how I'm building the keys). Of course the application is taking
on a lot of responsibility that previously belonged in the persistence layer.

I think this is the true draw to the NoSQL philosophy. Especially when you
consider the apparent (to me) rise in meta-programming.

~~~
bjclark
I don't know that being schemaless is fundamentally different. Instead of
thinking that key/value stores are schemaless, I think of them as having a
single rigid schema: key and value. The data you stick into a k/v db will
still have essentially, a schema, you just end up writing your application to
be flexible to data inconsistency.

The comparison to dynamic vs static typing doesn't hold much water.

Can you explain how meta-programming relates to this topic at all?

~~~
silentbicycle
Metaprogramming: I think he/she is associating schema-less DBs with
dynamically typed languages and metaprogramming, and static typing with
languages such as C that tend to lack sophisticated metaprogramming
facilities.

I'm not sure it's a valid comparison, though: OCaml has a statically typed
Lisp-style macro engine (camlp4), for example. (In all honesty, though, I've
never used it. Lazy evaluation, the packaging system, and other language
features cover many of the same use cases.) I think it's a case of assuming
the C family's type system is the cutting edge of static typing, when it's
actually pretty archaic.

The point about K/V databases having a rigid schema of (key_type ->
untyped_value) is a good one, by the way. Of course, association tables (AKA
"dictionaries") are pretty versatile as a basic collection type - looking at
Lua, Python, Awk, or Javascript. They're not ideal for all cases, but they're
a good start for most.

------
davidw
The subjunctive: if only it were that easy. Actually, it is in English because
we only ever really notice it in a few places...

Grammar nitpickery aside, good article. Is client-side hashing really that big
a deal? I don't know from first hand experience because I've never bothered
with these things yet.

~~~
bjclark
It's not a big deal when you do it the first time. It's a big deal when you
need to start adding servers. What do you do?

~~~
davidw
I don't have scaling problems, I don't have sites that big, yet.

~~~
qhoxie
He's pointing out a situation where it becomes a bigger deal, to answer your
question. While there are solutions in practice for this situation, it can be
painful.

~~~
davidw
I thought he wanted to know what I did. Since I already said I don't know much
about this problem domain, I'd be curious why it's a problem when you start
adding servers.

~~~
qhoxie
One such bad situation would be if you have numeric keys that you shard based
on a mod 2. You do this because you want to distribute data between 2 servers,
so one will get all even keys and the other will get the odd keys.

A few months down the line your load gets so high that you need to add a third
server, but your sharding mechanism is based on mod 2 - so you need to hack
around this in one of many ways to utilize the third server effectively.

Make sense?

~~~
davidw
So don't people just publish client libs that handle the details for you? It
doesn't seem like an unsurmountable problem, but once again, I've never dealt
with it in practice.

~~~
qhoxie
_So don't people just publish client libs that handle the details for you?_

Sometimes, but oftentimes sharding issues are unique enough that they require
custom solutions when the backend technology does not support it natively.

~~~
davidw
And these sharding issues don't happen with the things he marks as "scales!"?

~~~
qhoxie
That is correct. Things like S3 can expand without issue, and others like
Voldemort handle partitioning data natively.

------
banditaras
I thing there's an explanation to the NoSQL movement and why non-rdbms are
gaining popularity these days.

Not a long time ago, if you had to handle 1 billion records then you were
already big with piles of $ sitting beside you. Then you would just call
Oracle or Greenplum (or whatever) and say "Hey, I've got some million $ here,
care for it? I just want to be fast(tm)"

Lately it is quite possible to be small and forced to handle millions of
records daily or struggle with new technologies that require less man power
for development but need scaling a lot earlier (e.g Rails). You have limited
resources and if you can get away with Tokyo Cabinet on a single machine
instead of a 100k mysql cluster then you get to live longer.

As any new cool technology it'll get some hype and then fall back to being the
right tool for the job. It's good to have some alternatives to RDBMS, some
problems are really not relational.

------
10ren
Before relational databases, you know what was the standard? Hierarchical
databases, aka key-value stores.
[http://en.wikipedia.org/wiki/Database_management_system#Hist...](http://en.wikipedia.org/wiki/Database_management_system#History)
Since then, object oriented databases have come and gone - another
hierarchical DB.

I wasn't there when it happened in the 70's, but I think it's essential to
understand why relational beat hierarchical then; and to understand what has
changed since; because only then can we make an informed prediction.

I think the issues were efficiency and ability to analyze the data. Anybody
know what the actual reasons were?

~~~
silentbicycle
There's a major niche in which hierarchial DBs are quite common - filesystems.
While there have been attempts to graft relational features onto them
("journaling" overlaps strongly with transactions, symlinks, etc.), apparently
most people haven't historically considered the ability to do arbitrary joins
over a whole filesystem to be worth the impact it would probably have on
performance.

Also: _Hierarchial databases are very different from key/value stores._ K/Vs
are flat (joining, batch consistency checks, etc. are usually done clientside
in a faster/safer language than the DB was implemented in, such as PHP), while
hierarchial databases are (naturally) hierarchial. If a client searches for
data in a directory, it will only see the nodes in that directory, or those
found by walking any subdirectories. While this can be _very_ fast (most
records won't be in the same directory, so the first operation has already
reduced the search space considerably), much data fits poorly in trees, and
constantly rearranging subdirectories or adding symlinks is at best a
workaround for the massive conceptual mismatch.

OTOH, it's a good fit for some cases - we're communicating on a hierarchial
database right now. As moving one comment thread from one post or parent
comment to another would be incredibly rare (if it were possible), a lot of DB
functionality to allow flexibility in data modeling can be discarded. Modeling
the discussion as a tree of nodes (S-exps, in this case) with pointers to
parent comments, subthreads, user IDs, etc. is pretty straightforward.

------
gcv
Why doesn't Oracle Coherence ever come up in these discussions of RDBMS
alternatives?

I've been going over its documentation, and it looks like the cat's meow in
getting rid of the database scalability bottleneck. Think of it as memcached
done right. Coherence has automatic clustering, needs almost no configuration,
supports many flexible distributed schemes, easy API. With Clojure and Scala,
you don't even have to use Java to use it.

Has no one heard of it? Does it get no love because it's commercial and
(probably) expensive?

~~~
bjclark
>> Does it get no love because it's commercial and (probably) expensive?

Yes.

------
luckyland
Peer pressure. I have never seen this in any other light.

The "versus" mentality of this stupid meme comes from those either
proselytizing or stubborn. Plain and simple.

If you're too ignorant to understand what is the appropriate tool for your
situation, or offended by the perceived alternatives, then you should withdraw
from this topic.

~~~
bsaunder
_If you're too ignorant to understand_

That's not fair. Everyone starts out as ignorant. I think the "versus"
mentality helps less informed people separate two fundamentally different
approaches to persistence.

There's a lot of information in this post. Granted, it's just one person's
opinion. Take it for what it's worth and move on.

------
amrith
Great post. NoSQL seems to be a band-aid. And for some problems, the best
solution is to rip off the band-aid :)

My thoughts at <http://hypecycles.wordpress.com/2009/08/05/look-ma-nosql/>

SQL has survived because it is reasonably portable, is descriptive and has a
rich ecosystem. NoSQL is a response to a problem but it is one that will not
get mainstream adoption because it is non-standard.

It is a great stepping stone; what comes out of it is hopefully a standards
based scalable data storage and retrieval system.

------
viraptor
Coincidence... I'm looking for some simple caching layer right now. But what I
need is that data is distributed to all server nodes. I don't care about
insert time that much. And the only thing I need to store is a 'unique string
-> string' map. (very short strings)

Can someone recommend any solution? I'd be perfectly happy with a "replicating
memcached"...

------
jhancock
"What am I going to build my next app on? Probably Postgres. Will I use NoSQL?
Maybe"

I just spent some time evaluating several k/v stores for a new project. My
conclusion: These things may be useful when I'm ready to optimize. Until then,
I'll put everything in postgres.

------
figital
Most of the noSQL information I've read appears to be more focused on caching,
speed, scalability, distribution, etc. To me SQL is a query language, not an
RDBMS. "noSQL" is unfortunately misnamed (and rhetorically loaded) although
it's much catchier than "noRDBMS".

------
kevinpet
I stopped reading as soon as he posted something looking like a dictionary
definition.

~~~
qhoxie
Hopefully you don't always stop reading thoughtful pieces for trivial reasons.

------
siculars
im sure people affiliated with or users of the respective projects will take
issue with points made. all told, however, the main takeaway for me is what i
already knew and which the author mainly drives home in the last paragraph.
specifically, use and take advantage of all the tools out there including the
ones in the nosql camp and the more mature rdbms camp. and simply, get started
already!

projects im working on now sometimes have the same data represented
differently in different systems to take advantage of what those systems have
to offer. many of them mentioned in the article.

------
tjogin
I don't think the primary drive or hype behind NoSQL is _performance_. I think
it is that SQL has become some sort of norm for data persistance, when in many
cases a SQL database isn't necessary, or even optimal.

~~~
dmerr
Right: there are multiple possible benefits of "NOSQL" (nonrelational): (1)
scalability; (2) better performance, and (3) easier coding and development
especially with agile development methodologies. With certain of these
products, notably the document-oriented stores (MongoDB&CouchDB), development
is more natural as object<->relational mapping goes away. That is a nice
benefit even for the smallest of projects.

~~~
tjogin
Absolutely, it feels like you're just working with hashes and arrays, as
opposed to SQL-tables masquerading as such.

