

Jeremy Zawodny: NoSQL is Software Darwinism - mark_l_watson
http://blog.zawodny.com/2010/03/28/nosql-is-software-darwinism/

======
andr
Am I the only one that finds certain aspects of NoSQL databases easier to use
than SQL? Despite having used different SQL ORMs and writing two on my own,
I'm a big fan of MongoDB. It's fun using it for small projects, too, even if I
know scalability and speed won't be an issue.

~~~
justinsb
OK - I have to bite. Who writes 2 ORMs? What was wrong with the first one?

~~~
neilk
You were surprised? In my experience most hackers love redoing the last
project, only better. There's all the joy of victory over your past mistakes,
and less of the pain of struggling with an unfamiliar problem.

They love it so much they usually have to be counseled out of doing it when it
isn't a good idea for strategic or other reasons.

~~~
silentbicycle
This is usually called the "second system effect", especially when the second
is overambitious and never gets finished.

~~~
frou_dh
That made me think of TextMate 2.

------
justinsb
NoSQL is the duck-billed platypus in the evolution of databases. Absolutely
fascinating, provides great material for PhD students, but a total dead end.
Might survive in some isolated corners of the world; not one to bet on to be
the next dominant species.

~~~
DrJokepu
A while ago I held the same opinion, up until I faced a database problem
that genuinely wasn't very RDBMS-friendly, so I was "forced" to have a look
at the alternatives. Since then I have realized that there are large domains
of problems that can be solved a lot more easily with "nosql" configurations
than with an RDBMS. I mean stuff that I used to solve with SQL databases
before.

My point is: it is always a good idea to have a larger perspective.

~~~
spudlyo
I had a similar change of perspective in 1997 when I worked at a well-known
Internet retailer. The entire catalog of items we sold (as well as customer
reviews and other data) was stored in a number of key-value stores
(Berkeley DBs) that were routinely built and pushed out to each web front end.
This was very fast and for our purposes was much better than storing this
information in a centralized SQL database.
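
A minimal sketch of that build-and-push pattern, for readers who haven't seen
it. It assumes Python's standard `dbm.dumb` module as a stand-in for Berkeley
DB, a local directory copy as a stand-in for the push to each front end, and
made-up item IDs; none of these names come from the real system.

```python
import dbm.dumb
import os
import shutil
import tempfile


def build_snapshot(build_dir, catalog):
    """Clean build: write every catalog entry into a fresh key-value store."""
    os.makedirs(build_dir, exist_ok=True)
    path = os.path.join(build_dir, "catalog")
    with dbm.dumb.open(path, "n") as db:  # "n" = always start from an empty store
        for item_id, title in catalog.items():
            db[item_id.encode()] = title.encode()


def push_snapshot(build_dir, frontend_dir):
    """Stand-in for the push step: copy the snapshot files to one front end."""
    shutil.copytree(build_dir, frontend_dir, dirs_exist_ok=True)


def lookup(frontend_dir, item_id):
    """Front-end read path: a local, read-only lookup with no central database."""
    with dbm.dumb.open(os.path.join(frontend_dir, "catalog"), "r") as db:
        value = db.get(item_id.encode())
        return value.decode() if value is not None else None


def demo():
    # Hypothetical catalog data, for illustration only.
    catalog = {"item-1": "Example Book", "item-2": "Example CD"}
    root = tempfile.mkdtemp()
    build_dir = os.path.join(root, "build")
    frontend_dir = os.path.join(root, "frontend-01")
    build_snapshot(build_dir, catalog)
    push_snapshot(build_dir, frontend_dir)
    return lookup(frontend_dir, "item-1"), lookup(frontend_dir, "missing")
```

Every read is served from a file on the front end's own disk, which is where
the speed comes from; the cost is that new data only shows up after the next
build and push.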

~~~
justinsb
If you'd had viable open source SQL databases in 1997, would you have spent
the engineering time on BerkeleyDB, or would you simply have replicated the
master database onto each server? In 1997, you weren't choosing NoSQL vs SQL,
you were choosing open-source vs commercial.

Anyway, you're essentially storing pre-generated pages, which isn't a use case
that I think anyone considers particularly database-appropriate (SQL or
NoSQL). Using memcached to cache the data that is in active use seems more
efficient, faster, and gives you the option to force through out-of-sequence
updates (though again, that wasn't available to you off-the-shelf in 1997.)
Complete re-generation of data might have worked for Amazon in 1997, but is
this what they're using today?

~~~
spudlyo
With open source SQL databases there is no "simply" when it comes to
replication. Even today, MySQL replication is brittle, and master/slave
inconsistencies are the rule rather than the exception. Slave crashes often
cause replayed transactions due to lack of atomicity in writing master.info
and relay-log.info. The replication landscape with PostgreSQL is varied and
essentially a bag on the side. Last I counted there were more than 10
different ways of doing it, a number involving trigger based log shipping. It
wasn't about open-source vs commercial, it was about scaling the reads.

The detail pages weren't pre-generated, they were based on read-only catalog
data, which I think is entirely database appropriate. I imagine that complete
re-generation of data is no longer done, but I'd be willing to bet that
Berkeley DBs are still used in production somewhere.

~~~
justinsb
A read-only database isn't a database in my book.

I agree that built-in replication can be difficult to administer even today,
but you're being completely revisionist here. Replication wasn't introduced
into MySQL until 2000. In 1997, you would by necessity have rolled your own
replication system tailored to your needs (much simpler than solving the
general-case problem). That's basically what you did anyway, but you solved it
in the most trivial way possible: you 'replicated' by doing a complete
database dump and re-distributing the entire DB. If you'd had a viable open-
source relational database, you could have scaled the reads and got more
developer productivity by distributing a SQL database (e.g. SQLite) rather
than a key-value database (BDB).
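
To make that alternative concrete, here is a rough sketch using Python's
standard `sqlite3` module. The `reviews` table, its columns, and the sample
rows are all hypothetical; the point is only that the snapshot is still a
single file you can ship to each server, but the front end gets to query it
with SQL.

```python
import os
import sqlite3
import tempfile


def build_snapshot(path, reviews):
    """Build a fresh SQLite file holding (item_id, posted_at, body) rows."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE reviews (item_id TEXT, posted_at INTEGER, body TEXT)"
            )
            conn.execute(
                "CREATE INDEX reviews_by_item ON reviews (item_id, posted_at)"
            )
            conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", reviews)
    finally:
        conn.close()


def latest_reviews(path, item_id, limit=10):
    """Open the distributed file read-only and fetch the newest reviews first."""
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT body FROM reviews WHERE item_id = ? "
            "ORDER BY posted_at DESC LIMIT ?",
            (item_id, limit),
        ).fetchall()
        return [body for (body,) in rows]
    finally:
        conn.close()


def demo():
    # Hypothetical sample data, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "reviews.sqlite")
    build_snapshot(path, [
        ("item-1", 100, "first"),
        ("item-1", 200, "second"),
        ("item-2", 150, "other item"),
    ])
    return latest_reviews(path, "item-1")
```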

I appreciate your standing up and giving a concrete example of NoSQL usage -
nobody else has been brave enough to do so. But it seems that the reasons for
it were highly specific to the time: there were no viable open-source
databases, Amazon was just introducing the idea of customer reviews (i.e. pre
Web 2.0) so data was primarily read-only, memory was comparatively expensive
and memcached didn't exist, and you had a comparatively small product catalog
where complete re-generation was an option. I don't think you can carry
forward the optimizations you made in that framework into today's world.

~~~
nicpottier
See my reply to the grandparent.

I actually was responsible for that system, and for moving away from BDBs
being pushed to servers sometime in '00 or so.

As you said, these weren't really databases by any stretch of the imagination,
simply snapshots, and built for a very specific type of query. (by asin, by
time, reverse ordered)
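
For anyone wondering what a snapshot built around one query shape can look
like, here is a toy sketch. It is not a description of how the actual system
worked: the key layout, the `MAX_TS` bound, and the `OrderedSnapshot` class
are all assumptions, and a sorted in-memory list stands in for an ordered
key-value store. The trick is to put an inverted timestamp in the key so that
a plain forward scan returns the newest entries first.

```python
import bisect

MAX_TS = 2**32 - 1  # assumed upper bound on timestamps, for illustration only


def make_key(asin, timestamp):
    """Build a key that sorts by ASIN first, then by newest timestamp first."""
    inverted = MAX_TS - timestamp
    return f"{asin}:{inverted:010d}".encode()


class OrderedSnapshot:
    """Toy stand-in for an ordered key-value store: sorted keys plus values."""

    def __init__(self, records):
        # records: iterable of (asin, timestamp, review_text)
        pairs = sorted((make_key(a, t), text) for a, t, text in records)
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def reviews_for(self, asin, limit=10):
        """Prefix scan: jump to the first key for this ASIN and walk forward."""
        prefix = f"{asin}:".encode()
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for i in range(start, len(self._keys)):
            if not self._keys[i].startswith(prefix) or len(out) >= limit:
                break
            out.append(self._values[i])
        return out


def demo():
    # Hypothetical records, for illustration only.
    snapshot = OrderedSnapshot([
        ("B000000001", 100, "old"),
        ("B000000001", 300, "new"),
        ("B000000002", 200, "other"),
    ])
    return snapshot.reviews_for("B000000001")
```

The flip side is that any other access pattern (by reviewer, by rating, and
so on) needs its own snapshot with its own key layout.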

Building the DBs was a pain in the ass: they were so big that you had to do
clean builds (instead of incrementals) fairly often to keep them from wasting
space. There was also all sorts of voodoo magic going on to work around
various BDB issues.

The system did eventually move to a service architecture (as all of AMZN did),
for three main reasons:

1) pushing that much data to more and more servers was getting insane, even on
their inner networks.

2) we wanted faster turnaround for new reviews

3) rebuilding the BDBs was becoming more and more cumbersome with scale

All that said, the original system did take us pretty darn far, both in
scalability of traffic and scalability of data, farther than most websites
will ever reach.

Fun times working there, you really get to work on some unique problems.

