

Choosing NoSQL For The Right Reason - fatalmind
http://blog.fatalmind.com/2011/05/13/choosing-nosql-for-the-right-reason/

======
latch
There are two main reasons to choose NoSQL.

The first, aimed at every developer/application out there, is that some NoSQL
solutions (document stores being the most obvious to me) have less friction
with OO languages. In other words, they are simply more productive to work
with. Embedded types, arrays as first class objects, and schemaless design all
help you write simpler infrastructure code that takes less time to write, test
and maintain.
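
To make the friction point concrete, here is a minimal sketch. It uses JSON as a stand-in for what a document store would persist and Python's built-in sqlite3 for the relational side; the `post` object and the table and column names are made up for illustration.

```python
import json
import sqlite3

# Document style: the object is stored as-is, nested parts included.
post = {
    "title": "Choosing NoSQL",
    "tags": ["nosql", "sql"],
    "author": {"name": "alice", "email": "alice@example.com"},
}
stored_document = json.dumps(post)         # JSON stand-in for a stored document
loaded_post = json.loads(stored_document)  # comes back in the same shape

# Relational style: the array needs its own table, plus mapping code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT,
                        author_name TEXT, author_email TEXT);
    CREATE TABLE post_tags (post_id INTEGER REFERENCES posts(id), tag TEXT);
""")
cur = conn.execute(
    "INSERT INTO posts (title, author_name, author_email) VALUES (?, ?, ?)",
    (post["title"], post["author"]["name"], post["author"]["email"]),
)
post_id = cur.lastrowid
conn.executemany(
    "INSERT INTO post_tags (post_id, tag) VALUES (?, ?)",
    [(post_id, tag) for tag in post["tags"]],
)

# Reassembling the object takes two queries and hand-written mapping.
title, name, email = conn.execute(
    "SELECT title, author_name, author_email FROM posts WHERE id = ?", (post_id,)
).fetchone()
tags = [row[0] for row in conn.execute(
    "SELECT tag FROM post_tags WHERE post_id = ? ORDER BY rowid", (post_id,)
)]
rebuilt_post = {"title": title, "tags": tags,
                "author": {"name": name, "email": email}}
```

Both paths end up with the same object; the difference is how much mapping code sits in between.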

The second reason is that some NoSQL solutions provide specialized
features/capabilities. For example, what you can do with an [Ordered] Set in
Redis simply isn't possible with relational databases of moderate (say 100K
rows) size. You either have to lower your requirements (not real time), or
increase your hardware. There are various examples: Solr does better
indexing, Hadoop does better data processing, and MongoDB does good logging
and geospatial stuff.

The right way to look at it, imo, is to see RDBMS as specialized systems. Most
people should almost always opt for a more productive general purpose solution
upfront (say MongoDB), and then only turn on a specialized supplementary
system (say Hadoop, or PostgreSQL) when they have those specialized needs.

Choose SQL for the right reasons.

~~~
St-Clock
The first part of your post made sense until you said "The right way to look
at it, imo, is to see RDBMS as specialized systems".

RDBMS have a lot more features and support many more usage scenarios out of
the box than any NoSQL system, and SQL offers more features as a query
language than any NoSQL query language.

Of course, NoSQL query languages are simpler than SQL by design because the
programmer is expected to write a program or a script for complex queries
instead of using the query language. But then, you have to write your own
execution plan, and think about atomicity and consistency, etc.

NoSQL systems are clearly the specialized ones, not the general ones.

------
Joakal
Here are some NoSQL differences:
<http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis>

It's outdated though (MongoDB stable is 1.8.1 already).

------
MatthewPhillips
One of the benefits of NoSQL is supposedly that you can change your schema on
a dime and not suffer from it. I'm finding that you just have to plan for
different things up front. For example, most NoSQL databases are awful at
querying, and to get things that are easy in SQL you have to plan ahead in
NoSQL. To do a count in Redis, for example, you have to use INCR on a key to
keep a count.
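
A rough sketch of that difference, assuming a comments-per-post counter. The SQL side uses Python's built-in sqlite3. The key-value side is a hypothetical in-memory stand-in (`TinyKeyValueStore`) that only mimics the semantics of INCR; it is not a Redis client, and the key names are invented.

```python
import sqlite3

# Relational side: the count is derived from the rows at query time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(1, "first"), (1, "second"), (2, "third")],
)
sql_count = conn.execute(
    "SELECT COUNT(*) FROM comments WHERE post_id = ?", (1,)
).fetchone()[0]


class TinyKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store (not a Redis client)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key):
        # Mimics INCR semantics: a missing key is treated as 0 before adding 1.
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]

    def get(self, key):
        return self._data.get(key)


def add_comment(store, post_id, comment_id, body):
    # The counter has to be planned up front and bumped on every write.
    store.set(f"comment:{comment_id}", body)
    return store.incr(f"post:{post_id}:comment_count")


store = TinyKeyValueStore()
add_comment(store, 1, 1, "first")
add_comment(store, 1, 2, "second")
add_comment(store, 2, 3, "third")
kv_count = store.get("post:1:comment_count")
```

If the counter key was not maintained from the start, there is nothing to read later, which is the "plan ahead" cost.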

One gripe I have is that NoSQL is becoming the default for most PaaS providers.
I have a Duostack account and they offer 3 database choices: Redis, Mongo, and
MySQL. In their documentation they recommend using persistencejs which doesn't
work with the new versions of node.

------
lvh

      Honestly, if I would have to develop a revision control
      system, I wouldn’t take an SQL database as back-end.
    

But then, suddenly, Fossil, which seems to be doing pretty okay. (Admittedly,
it does a lot more than manage blobs.)
<http://fossil-scm.org/index.html/doc/trunk/www/index.wiki>

------
silon
I wish there were an SQL option to cause a fatal error whenever a query needs
to do a table scan. This would be very useful on test systems, at least.
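
Something close to this can be approximated on a test system by inspecting the plan before running the query. The sketch below uses Python's built-in sqlite3 and SQLite's `EXPLAIN QUERY PLAN`; the helper `execute_no_scan` and the `TableScanError` class are hypothetical names, and the check relies on SQLite describing full scans with a step that starts with "SCAN". Other databases report plans differently.

```python
import sqlite3


class TableScanError(RuntimeError):
    """Raised by the hypothetical helper below when a plan contains a scan."""


def execute_no_scan(conn, sql, params=()):
    # Ask SQLite for its plan first; the last column of each row is a
    # human-readable description of one plan step.
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    for row in plan:
        detail = row[-1]
        # Scans are reported as "SCAN ...", index lookups as "SEARCH ...".
        if detail.startswith("SCAN"):
            raise TableScanError(f"{detail!r} in plan for: {sql}")
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
query = "SELECT id FROM users WHERE email = ?"

# Without an index on email the only available plan is a full scan.
try:
    execute_no_scan(conn, query, ("a@example.com",))
    scanned = False
except TableScanError:
    scanned = True

# With the index in place the same query becomes an index search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
rows = execute_no_scan(conn, query, ("a@example.com",))
```

Note that this check is deliberately blunt: it also rejects scans of tiny tables and full scans over an index.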

NoSQL forces discipline which is a good thing, preventing "late" performance
optimization when in production.

~~~
wulczer
That would be nearly useless in my opinion.

If you have a table with four rows in it, would you like the database to give
you errors when you do a full scan on it?

What you want is an option to throw errors when a query takes too long to
execute and most products already have this in the form of a statement timeout
parameter.
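
As a rough illustration of the timeout approach, here is a sketch using Python's built-in sqlite3. SQLite has no statement timeout setting, so this approximates one with a progress handler that aborts the statement once a deadline passes; the helper name `execute_with_timeout` and the 1000-instruction check interval are assumptions. Server databases expose the same idea as a configuration parameter (PostgreSQL's `statement_timeout`, for example).

```python
import sqlite3
import time


def execute_with_timeout(conn, sql, params=(), timeout_s=1.0):
    deadline = time.monotonic() + timeout_s

    def past_deadline():
        # A non-zero return value tells SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Invoke the handler every 1000 SQLite virtual-machine instructions.
    conn.set_progress_handler(past_deadline, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Remove the handler so later statements are unaffected.
        conn.set_progress_handler(None, 0)


conn = sqlite3.connect(":memory:")

# A deliberately expensive statement: count up to one billion.
slow_sql = """
WITH RECURSIVE counter(x) AS (
    SELECT 1 UNION ALL SELECT x + 1 FROM counter WHERE x < 1000000000
)
SELECT COUNT(*) FROM counter
"""

try:
    execute_with_timeout(conn, slow_sql, timeout_s=0.05)
    timed_out = False
except sqlite3.OperationalError:
    # SQLite reports the aborted statement as an OperationalError.
    timed_out = True

fast_rows = execute_with_timeout(conn, "SELECT 1")
```

Unlike a blanket ban on scans, this only fails queries that are actually slow on the data at hand.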

On a separate note, the balance between "late" optimization and "premature"
optimization is thin indeed... These test systems need to run with the same
data and on the same hardware as your main database to give meaningful
results, so it's generally unavoidable to do some kind of performance tweaking
on the already deployed code. And of course, as the distribution and volume
of the data change, you need to keep on optimizing...

~~~
bryanmig
The point is that when your table has 4 rows in it, it doesn't matter if you
do a full table scan. Wait until you deploy your app and that table now has
4,000,000 rows in it. Then you have a problem.

In MongoDB (and maybe others too) there exists an option to fail when doing a
table scan so that you know immediately while developing that there exists a
potential for things to go awry at some later point. You can fix that now by
either re-thinking your data structure, re-thinking your query, or adding
appropriate indexes.

MySQL's slow query log is good for reactive development instead of proactive
development.

I think it's an indispensable feature.

~~~
St-Clock
Errr... With a typical RDBMS, doing a full table scan depends on the current
stats (e.g., number of rows, are the stats current, state of the index,
possibility to use an index, presence of cached data, previous query
performance etc.). It does not depend only on the query and the data
structure, so it is near __impossible__ at development time to be certain that
a table scan will occur during production. In fact, a table scan might occur
today and not tomorrow for the same query, but with different data.

Many developers have been burned by trying to optimize their queries and
indexes too early. The query planner, although sometimes unpredictable, is
generally better than developers at predicting the performance of a particular
execution plan. Table scans are also preferable in certain situations.

Relying on the fact that RDBMS have a sophisticated query planner (as opposed to
MongoDB) is not a sign of "reactive development". It just means that you
should test your queries (and your indexes) with close-to-production data,
and, as the data grows and evolves, continue to test, because the best
execution plan might change.

~~~
silon
A slower query that always uses indexes is better than one that randomly falls
back to table scans depending on statistics.

At least for interactive apps, NoSQL wins.

