
NoSQL, Heroku, and You - shawndumas
http://blog.heroku.com/archives/2010/7/20/nosql/
======
viraptor
I find it interesting that every time "NoSQL" solutions are listed, no one
mentions Berkeley DB. It provides many options that you'd normally find in k-v
stores and many more. Is it just not cool enough / too old / ... ?

It seems very similar to the time (a year or two ago) when every couple of
weeks people got excited by a new cool wire-protocol (around the time of
protocol buffers, thrift, etc.), but no one even mentioned ASN.1.

~~~
whopa
Berkeley DB is an embedded database with no network story, let alone
distributed story, so it isn't really a complete solution compared to the k-v
stores people talk about. It can be a building block though, e.g. memcachedb
uses it.

ASN.1 is hideously complex; part of the appeal of all those new wire protocols
is simplicity.

SQLite is Berkeley DB's main competitor... it's actually funny that NoSQL on
the server is all the rage, but SQL use is now really popular client side.

~~~
viraptor
> Berkeley DB is an embedded database with no network story, let alone
> distributed story

True, true and not really. It can be easily distributed - replication works
great with it. Yes, it's not good for everything, but it's great, for example,
for simple web api servers. Skip the connections if you don't really need them
and write/read k-vs at a ridiculous speed locally.
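
For what it's worth, a minimal sketch of that embedded, no-connection pattern.
It uses Python's standard-library `dbm` module as a stand-in for Berkeley DB
(an assumption: the real Berkeley DB bindings are a separate package), and the
file name and keys are made up for illustration.

```python
import dbm
import os
import tempfile


def demo_local_kv():
    """Write and read key-value pairs in a local file, with no server involved."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache")  # hypothetical database file name

        # "c" opens the database for read/write, creating it if needed.
        with dbm.open(path, "c") as db:
            db[b"user:1"] = b"alice"
            db[b"user:2"] = b"bob"

        # Reopen read-only, the way a separate request handler might.
        with dbm.open(path, "r") as db:
            return db[b"user:1"], db.get(b"user:3", None)


if __name__ == "__main__":
    print(demo_local_kv())
```

Everything happens in-process against a local file, which is the whole point:
no socket, no wire protocol, no connection pool.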

> ASN.1 is hideously complex

Yeah... I guess this is more of a personal preference, but I never found it
that bad. Write a wrapper that can (de)serialise your native objects once and
you don't have the problem anymore.

------
rlm
Dupe: <http://news.ycombinator.com/item?id=1530895>

------
okaramian
As someone that's been interested in this stuff but has not been able to apply
any of it for work/personal purposes, this is a pretty darn useful summary of
the solutions out there.

------
carson
Even if you aren't interested in using Heroku this is a good read. I hadn't
seen the mix of technologies referred to as polyglot persistence before but it
sounds appropriate.

------
stevefink
This is a good reference point, at least for starters, when someone asks you
"what kind of NoSQL database should my application use, if any?"

------
dholowiski
It was an interesting article, but it would be interesting to see what the
author has to say now (it was written about a month ago) after the spectacular
failure of Digg/Cassandra.

------
chrisbaglieri
Heroku just gets it. Top-notch hosting aside, for personal exploration,
there's no better platform: frictionless, free, and dead simple.

~~~
franck
I'm sorry but Heroku is far from being free, unless you mean free as in Free
Software.

Google App Engine is another awesome PaaS which is really free, because the
free quotas are huge compared to Heroku's tiny 5MB database.

~~~
chrisbaglieri
Let me clarify, my comment was in regards to _personal exploration_ and
nothing beyond that. Most add-ons have a basic, very limited, free option. The
same goes for the pricing on Heroku's platform. If you're rockin' an app that
needs to be production scale, baseline won't cut it on Heroku; frankly, you
get what you pay for so I don't expect it to be free.

Point for point, GAE's baseline quotas are obviously better but it's not an
apples to apples comparison in my opinion.

------
ergo98
Interesting read, but as with almost all of these pieces it starts off with a
_completely incorrect_ statement about old-school databases that turns it from
good information to, essentially, propaganda.

"SQL (the language) and SQL RDBMS implementations (MySQL, PostgreSQL, Oracle,
etc) have been the one-size-fits-all solution for data persistence and
retrieval for decades."

This is completely untrue.

For instance some RDBMS systems are purely in-memory. Some are optimized for
SSDs. Some are forced persistence, where durability is job #1.

With an RDBMS you have the _option_ to use bounding (and expensive)
transactions. Or you might not.

You have the option to normalize. Or to denormalize. Or to store all of your
data in a giant table that is nothing but a varchar. Or to find some balance
in between.

RDBMS systems have supported loose replication for many years -- see
replication in SQL Server, with multiple masters, conflict resolution, and as
much decoupling as you'd like.

You have always had the option of choosing and picking your style of ACID with
the classic RDBMS.

The RDBMS solution was never a one-size-fits-all solution. Some would then
argue that either you use them as a fully transactional, ACID, fully normalized
stack or you're "effectively using NoSQL", which is utter bunk that defies
reason.

That particular bit of NoSQL advocacy has always derailed the conversation
because it isn't factually correct and turns it into a religious argument.

Then there's the "RDBMSs are rusty" approach:

"The SQL databases we’re using today were designed over a decade ago. They
were written with the constraints of 1990s hardware in mind: storage is cheap,
memory and cpu are expensive."

This, and the conclusions drawn out of this, are so extraordinarily wrong that
I don't even know where to begin. It's yet another example of trying to twist
reality to show how the RDBMS has rusted, but it's completely in defiance of
reality.

The weak point of the RDBMS chain has virtually always been I/O -- getting
lots of IOPS has always been a problem, and it is virtually always the weak
link in most database operations. IOPS to the disk matter because most
database systems don't consider the job of a transaction done until the
operations have been confirmed completed to the disk.
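
To make that concrete, here is a rough sketch of what "confirmed completed to
the disk" costs. It is not how any particular RDBMS implements its log, just
an illustration using `os.fsync`; the function names and the log file name are
hypothetical.

```python
import os
import tempfile


def durable_append(path, record):
    """Append a record and return only after the OS reports it flushed to disk."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()              # push Python's userspace buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device


def demo_commit_log():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commit.log")  # hypothetical log file
        durable_append(path, b"txn 1 committed\n")
        durable_append(path, b"txn 2 committed\n")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo_commit_log())
```

Each `durable_append` call waits on the storage device, so the number of
commits per second is bounded by how many of those synchronous flushes the
disk can complete.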

Storage has _never_ been cheap (despite the absurd claims in this article). It
has always been the most expensive part of the equation! To get a decent
platform to run a moderate sized database on is almost always the most
prohibitive part of the equation, with ridiculously expensive rigs from high
end SAN providers.

But of course we now have SSDs. Limited IOPS have been the Achilles heel of
the RDBMS, especially for those who heavily normalize (it's an _option_ in
some scenarios), but SSDs move us from 100 IOPS per disk to 15000+ IOPS. If
anything, the RDBMS was designed for tomorrow's computer.

And for the record, at this very moment -- when I pulled up HN for a
distraction -- I'm working on a MongoDB solution. Despite my appreciation of
the product, I have a strong, _very_ strong, dislike for misleading
propaganda. The RDBMS has some serious downsides, but manufacturing a new
reality to sell alternatives isn't the best approach.

~~~
akshayubhat
I completely agree. I find it weird whenever someone mentions Hadoop as
NoSQL. Hadoop is a distributed computing system, while HDFS is the database.

Hadoop is more of a distributed OS for running processes and storing data
across multiple machines.

~~~
akshayubhat
Err, HBase is the database, not HDFS.

