Choosing NoSQL For The Right Reason (fatalmind.com)
41 points by fatalmind on May 13, 2011 | 13 comments

There are two main reasons to choose NoSQL.

The first, aimed at every developer/application out there, is that some NoSQL solutions (document stores being the most obvious to me) have less friction with OO languages. In other words, they are simply more productive to work with. Embedded types, arrays as first class objects, and schemaless design all help you write simpler infrastructure code that takes less time to write, test and maintain.

The second reason is that some NoSQL solutions provide specialized features/capabilities. For example, what you can do with an [Ordered] Set in Redis simply isn't possible with relational databases of moderate (say 100K rows) size. You either have to lower your requirements (not real time), or increase your hardware. There are various examples: Solr does better indexing, Hadoop does better data processing, and MongoDB does good logging and geospatial stuff.

The right way to look at it, imo, is to see RDBMS as specialized systems. Most people should almost always opt for a more productive general purpose solution upfront (say MongoDB), and then only turn on a specialized supplementary system (say Hadoop, or PostgreSQL) when they have those specialized needs.

Choose SQL for the right reasons.

The first part of your post made sense until you said "The right way to look at it, imo, is to see RDBMS as specialized systems".

RDBMS have a lot more features and support many more usage scenarios out of the box than any NoSQL system, and SQL offers more features as a query language than any NoSQL query language.

Of course, NoSQL query languages are simpler than SQL by design because the programmer is expected to write a program or a script for complex queries instead of using the query language. But then, you have to write your own execution plan, and think about atomicity and consistency, etc.

NoSQL systems are clearly the specialized ones, not the general ones.

I think I'd agree with a milder version of your post (I definitely don't see why you're getting negged here). I didn't like the phrase "noSql" when I first heard it (still don't, really), but once I realized the movement was really more about "notAlwaysSql", it started to make sense. People had been using an RDBMS almost reflexively for most kinds of persistence, and that doesn't always make sense.

That said... some NoSQL may work better with OO, but so do some RDBMS schemas. I've had experiences where it's simple and almost 1:1 between object and table, and other times when I'm bending so far that I wonder why we're taking an OO approach at all.

I don't think that I'd disagree with your claim that an RDBMS is a specialized system, but I'm not sure that various noSQL solutions are any less specialized.

I agree, choose SQL for the right reasons, but you could just replace that with "choose your persistence approach for the right reasons".

"MongoDB does geospatial stuff" - FYI, PostgreSQL with Postgis has many more geospatial querying capabilities vs Mongo which primarily uses geohashes.

Here's some NoSQL differences: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

It's outdated though (MongoDB stable is 1.8.1 already).

One of the benefits of NoSQL is supposedly that you can change your schema on a dime and not suffer from it. I'm finding that you just have to plan for different things up front. For example, most NoSQL databases are awful at querying, and to get things that are easy in SQL you have to plan ahead in NoSQL. To do a count in Redis, for example, you have to use INCR on a key to keep a count.
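To make the "plan ahead" point concrete, here is a minimal sketch in Python. It is only an illustration: SQLite stands in for the SQL side, a plain dict stands in for Redis, and the `incr` helper, the `comments` table and the key names are all made up for the example. The SQL count is derived at query time, while the key-value count only exists because every write was told to maintain it.

```python
import sqlite3
from collections import defaultdict

# SQL side: the count is derived at query time, no up-front planning needed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (post_id INTEGER, body TEXT)")

# Key-value side: a plain dict stands in for Redis here (assumption);
# incr() mimics what the INCR command does to an integer key.
kv = defaultdict(int)

def incr(key):
    kv[key] += 1
    return kv[key]

def add_comment(post_id, body):
    db.execute("INSERT INTO comments VALUES (?, ?)", (post_id, body))
    # The counter must be maintained on every write, decided in advance.
    incr(f"post:{post_id}:comment_count")

for post_id, body in [(1, "first"), (1, "second"), (2, "hello")]:
    add_comment(post_id, body)

sql_count = db.execute(
    "SELECT COUNT(*) FROM comments WHERE post_id = ?", (1,)
).fetchone()[0]
kv_count = kv["post:1:comment_count"]
```

If you later want a count nobody planned for (say, comments per day), the SQL side is one more query, while the key-value side needs a new counter plus a backfill over existing data.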

One gripe I have is that NoSQL is becoming the default for most PaaS providers. I have a Duostack account and they offer 3 database choices: Redis, Mongo, and MySQL. In their documentation they recommend using persistencejs, which doesn't work with the new versions of node.

  Honestly, if I would have to develop a revision control
  system, I wouldn’t take an SQL database as back-end.
But then, suddenly, Fossil, which seems to be doing pretty okay. (Admittedly, it does a lot more than manage blobs.) http://fossil-scm.org/index.html/doc/trunk/www/index.wiki

I wish there was an SQL option to cause a fatal error if it needs to do a table scan. This would be very useful on test systems, at least.

NoSQL forces discipline which is a good thing, preventing "late" performance optimization when in production.

That would be nearly useless in my opinion.

If you have a table with four rows in it, would you like the database to give you errors when you do a full scan on it?

What you want is an option to throw errors when a query takes too long to execute and most products already have this in the form of a statement timeout parameter.
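For anyone who wants to see the idea in miniature, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for a server database. SQLite has no timeout setting of this kind, so the sketch approximates one with a progress handler; `execute_with_timeout` and the 50 ms default are names and values invented for this example.

```python
import sqlite3
import time

def execute_with_timeout(conn, sql, params=(), timeout_s=0.05):
    """Run a query, aborting it if it runs longer than timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    # SQLite calls the handler every 1000 VM instructions; a non-zero
    # return value interrupts the running statement.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again

conn = sqlite3.connect(":memory:")

fast = execute_with_timeout(conn, "SELECT 1 + 1")

# A deliberately expensive query: count to 100 million in a recursive CTE.
slow_sql = """
WITH RECURSIVE c(x) AS (
    SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 100000000
)
SELECT COUNT(*) FROM c
"""
try:
    execute_with_timeout(conn, slow_sql)
    timed_out = False
except sqlite3.OperationalError:
    # sqlite3 reports an interrupted statement as OperationalError.
    timed_out = True
```

The check is on elapsed time only, so it does not care which access method the planner picked.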

On a separate note, the balance between "late" optimization and "premature" optimization is thin indeed. These test systems need to run with the same data and on the same hardware as your main database to give meaningful results, so it's generally unavoidable to do some kind of performance tweaking on the already deployed code. And of course, as the data distribution and volume change, you need to keep on optimizing.

The point is that when your table has 4 rows in it, it doesn't matter if you do a full table scan. Wait until you deploy your app and that table now has 4,000,000 rows in it. Then you have a problem.

In MongoDB (and maybe others too) there exists an option to fail when doing a table scan so that you know immediately while developing that there exists a potential for things to go awry at some later point. You can fix that now by either re-thinking your data structure, re-thinking your query, or adding appropriate indexes.
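A similar development-time guard can be approximated on the SQL side. The sketch below uses Python's built-in sqlite3 module and refuses to run a query whose plan contains a full scan. `TableScanError` and `execute_no_scan` are hypothetical names, and matching on the "SCAN" prefix relies on the human-readable EXPLAIN QUERY PLAN text, which SQLite does not promise to keep stable, so treat this as a test-suite helper at best.

```python
import sqlite3

class TableScanError(RuntimeError):
    """Raised when a query's plan contains a full scan (hypothetical name)."""

def execute_no_scan(conn, sql, params=()):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string;
    # full scans start with "SCAN", index lookups start with "SEARCH".
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    for row in plan:
        detail = row[-1]
        if detail.startswith("SCAN"):
            raise TableScanError(f"{detail!r} for query: {sql}")
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)"
)
conn.execute("INSERT INTO users (email, name) VALUES ('a@example.com', 'Ann')")

query = "SELECT name FROM users WHERE email = ?"
try:
    execute_no_scan(conn, query, ("a@example.com",))
    scanned = False
except TableScanError:
    scanned = True  # no index on email yet, so the plan is a full scan

conn.execute("CREATE INDEX idx_users_email ON users (email)")
rows = execute_no_scan(conn, query, ("a@example.com",))  # index search now
```

Note that the guard fires even though the table holds a single row, which is exactly the objection raised earlier in the thread.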

MySQL's slow query log is good for reactive development instead of proactive development.

I think it's an indispensable feature.

Errr... With a typical RDBMS, doing a full table scan depends on the current stats (e.g., number of rows, are the stats current, state of the index, possibility to use an index, presence of cached data, previous query performance etc.). It does not depend only on the query and the data structure, so it is near impossible at development time to be certain that a table scan will occur during production. In fact, a table scan might occur today and not tomorrow for the same query, but with different data.

Many developers have been burned by trying to optimize their queries and indexes too early. The query planner, although sometimes unpredictable, is generally better than developers at predicting the performance of a particular execution plan. Table scans are also preferable in certain situations.

Relying on the fact that RDBMS have a sophisticated query planner (as opposed to MongoDB) is not a sign of "reactive development". It just means that you should test your queries (and your indexes) with close-to-production data, and, as the data grows and evolves, continue to test, because the best execution plan might change.

A slower query that always uses indexes is better than one that randomly falls back to table scans depending on statistics.

At least for interactive apps, NoSQL wins.

There are cases where a full table scan is faster than an index scan, and there are cases where the index scan is faster. There are even cases where the only option is a full table scan!

I still don't see what a setting that gives errors when a particular access method has been chosen would be good for.

You don't care about the plan, you care about how fast the query runs.

You'll always have to do reactive development when trying to get good performance. There is only so much simulating and testing you can do and real data volume and distribution changes all the time.
