The four categories of NoSQL databases (rebelic.nl)
46 points by wspruijt on May 28, 2011 | 25 comments



Imho this article is too much of an oversimplification. It's forcing artificial labels on some things that don't really fit very well.

It's fairly useless if you've already heard about most of these and it's a false start if you haven't.

If you do want such a list, a much more accurate and complete one is at http://nosql.mypopescu.com/kb/nosql


You could be surprised by the effort many folks make to fit the different DB solutions into some kind of schema. The reason for this is mostly opaque to me...


It can be useful to have a sort of map of the landscape if you're looking around. Sort of how in PLs, it's nice to be able to have a categorization better than "there are thousands of programming languages". Coming up with a good map is hard, though, and any categorization does tend to emphasize some factors and gloss over others...


Categorizing the stores by their data model is misleading. Tokyo Tyrant and Voldemort are very different: the latter is a multi-master, partitioned, distributed system; the former is a network-attached database. In fact, one could easily use Tokyo Cabinet as the persistence engine in Voldemort in place of BerkeleyDB.

A far better classification should be done along the lines of the distribution models and handling of failures. Do they use consistent hashing or range-based partitioning? Under failure, is consistency or availability given up? Are there barriers to nodes rejoining after recovering from a failure? Is there an ordered, distributed commit log, etc...

It's much harder to change the distribution model than it is to change the query model. Riak, HBase and Voldemort also have support for running custom code against which queries may be made. On the other hand, adding a totally ordered commit log (via serializing through a floating master or using consensus) to Voldemort or Riak, or adding multi-master (in the sense of multiple region servers taking reads and writes for the same partition) support to HBase is going to be far more difficult.

Creating a "grand classification scheme" that ignores the most important aspect suggests a dilettante approach (not saying the author is a dilettante, but it's a much more complex topic which requires more than dabbling).
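
For anyone who hasn't seen the two partitioning schemes side by side, here is a minimal sketch in Python. It is a toy, not how any of the systems named above actually implement it: `HashRing`, `range_owner`, the node names and the choice of 64 virtual nodes are all made up for illustration.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node is placed on the ring at several positions.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def owner(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


def range_owner(key, split_points, nodes):
    """Range partitioning: nodes[i] owns keys below split_points[i];
    the last node owns everything at or above the final split point."""
    return nodes[bisect.bisect_right(split_points, key)]
```

The property that matters for failure handling: when a node leaves the ring, only the keys it owned move elsewhere, and every other key keeps its owner. Range partitioning keeps keys in sorted order instead, which makes scans cheap but means split points have to be managed as data grows.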


I'm puzzled by their describing Riak as a column oriented database. It is a key/value store. Perhaps they are thinking of the highly beta Riak Search? http://wiki.basho.com/Riak-Search.html


Key/value store or maybe a document database. But it certainly has nothing to do with being column oriented.


Basho themselves have complained about being called a document DB, probably because you can store absolutely anything in Riak. Regardless of the reason, I think we can safely call it a KV store and be done with it.

edit: And by Basho themselves I don't mean some official statement, I just mean people who work for Basho.


I would put Riak in the Key/Value store category or, more likely, in the Document Database category, and certainly not in the Column Family Store category. Riak is in no way a column store.


Describing Redis as just another key value store kind of misses the point.


I agree. Anybody interested in Redis should get to know its hash tables, lists, sets, sorted sets, etc. It's sometimes amazing how many problems Redis can solve out-of-the-box, just because it provides these common data structures as a persistent service.
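
To make that concrete, here is a rough in-memory sketch in Python of what "a value can be a list, a set or a sorted set" buys you. This is not Redis and not a Redis client: `ToyStructuredStore` and its method names are invented for illustration, and they only loosely mirror what commands like LPUSH/LRANGE, SADD/SINTER and ZADD/ZREVRANGE do on the server.

```python
from collections import defaultdict, deque


class ToyStructuredStore:
    """In-memory stand-in where each key maps to a list, a set, or a scored set."""

    def __init__(self):
        self._lists = defaultdict(deque)
        self._sets = defaultdict(set)
        self._scores = defaultdict(dict)  # key -> {member: score}

    # List-style operations: newest item first, e.g. an activity feed.
    def push_front(self, key, value):
        self._lists[key].appendleft(value)

    def recent(self, key, n):
        return list(self._lists[key])[:n]

    # Set-style operations: membership and intersection, e.g. shared tags.
    def add_member(self, key, member):
        self._sets[key].add(member)

    def common(self, key_a, key_b):
        return self._sets[key_a] & self._sets[key_b]

    # Sorted-set-style operations: members ranked by score, e.g. a leaderboard.
    def add_scored(self, key, member, score):
        self._scores[key][member] = score

    def top(self, key, n):
        # Highest score first; ties broken by member name for determinism.
        ranked = sorted(self._scores[key].items(), key=lambda kv: (-kv[1], kv[0]))
        return [member for member, _ in ranked[:n]]
```

With a plain key/value store, each of these would mean reading a blob, changing it in the application and writing it back. Having the structure live in the store lets the operation happen in one step on the server side.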


Redis is a key-structured-value DB.


And no mention of memcache in the key/value list. Not cool enough anymore?


Indeed. Memcached is very effective for what it does and IMHO one of the fathers of the whole NoSQL thing.


I thought they left it off because it wasn't persistent.


This is just a particular tradeoff IMHO. It is persistent as long as the process is running. Other databases are persistent as long as the HD is working :) Others are persistent as long as N nodes are still working. Just tradeoffs.


While technically possible to accept in some cases, nowhere-durable is a rather extreme tradeoff option.

It would be interesting to reconsider memcached as durable storage in the modern SSD era, however.


Memcachedb is persistent though.


I believe his description of document-based databases is mistaken WRT versioning: it's a central concept for couchdb, but not for mongodb.
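
Roughly what "versioning as a central concept" means in practice, as a toy Python sketch: every update has to name the revision it was based on, and a stale revision is rejected as a conflict. `ToyRevisionedStore` and `ConflictError` are hypothetical names, and the integer revisions are a simplification (CouchDB's real `_rev` values are strings, and this is not its API).

```python
class ConflictError(Exception):
    """Raised when an update is based on a revision that is no longer current."""


class ToyRevisionedStore:
    """In-memory document store that refuses writes made against a stale revision."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # New documents must pass rev=None; updates must pass the current revision.
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev
```

A store without this check would just let the last write win, which is the difference being pointed at here.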


It would be great if there was a very basic web site that could guide you to the most suitable storage solution based on the problem you are working on. Sorta like a guide or wizard.

I think it's a shame that the movement is called NoSQL because it sorta creates this false dichotomy that the field of web data storage is divided in two camps. In reality, you need to, as always, pick the right solution for the problem you are facing. NoSQL databases will never entirely replace SQL databases and I think it's important to not shoehorn NoSQL in places where it doesn't really belong.


Philip Greenspun has a short article on the subject at:

http://philip.greenspun.com/teaching/three-day-rdbms/beyond-...

And there's a discussion about the article on his blog:

http://blogs.law.harvard.edu/philg/2011/01/13/rdbms-versus-n...


To your request for such a guide I would like to add:

Documentation of the technological overhead: the soft/hardware/sys_admin_experience overhead required by each (and the hosts offering such support overhead for them - compared for example to support for MySQL :/ ).


The best such comparison that I've seen is http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis.



What about non-relational object databases?


how about EAV, OOdbms, functional and deductive db?



