
I was part of a large migration that moved a significant amount of data from MongoDB to HBase recently. Along the way, I also spent significant time digging into Cassandra and Riak.

I appreciate the effort and intention behind this article, but for all practical purposes, such numbers are not very helpful. From experience, if you really want performance from HBase, you need to spend a significant amount of time working out the right way to structure your data in HBase. To name a few things that matter:

* choosing the right row key that optimizes bulk scans

* setting optimum client and server caching based on the size of each row

* pre-splitting regions, and setting custom region sizes
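
To make those three points a bit more concrete, here is a rough sketch in plain Python (no HBase client involved). Everything in it is an assumption for illustration, not what we actually ran: the key layout `<salt>|<entity_id>|<reversed timestamp>`, the bucket count of 16, the 2 MB target batch size, and all function names are hypothetical. It shows one way a salted row key can keep an entity's rows contiguous for scans, how matching split points for pre-splitting could be derived, and how a rows-per-RPC scanner caching value might be estimated from the average row size.

```python
import hashlib
from typing import List

NUM_BUCKETS = 16  # hypothetical number of salt buckets (one region each)


def salted_row_key(entity_id: str, timestamp_ms: int, buckets: int = NUM_BUCKETS) -> bytes:
    """Build a row key of the form <salt>|<entity_id>|<reversed timestamp>."""
    # A stable hash of the entity id picks the bucket, so all rows for one
    # entity land in the same bucket and can be read with a single range scan.
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    salt = digest[0] % buckets
    # Reversing the timestamp makes the newest rows sort first for an entity.
    reversed_ts = (2**63 - 1) - timestamp_ms
    return f"{salt:02x}|{entity_id}|{reversed_ts:019d}".encode("utf-8")


def split_points(buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Split keys that would put each salt bucket in its own region."""
    return [f"{b:02x}".encode("utf-8") for b in range(1, buckets)]


def scanner_caching(avg_row_bytes: int, target_batch_bytes: int = 2 * 1024 * 1024) -> int:
    """Rows to fetch per scanner round trip so a batch is about target_batch_bytes."""
    return max(1, target_batch_bytes // max(1, avg_row_bytes))


if __name__ == "__main__":
    print(salted_row_key("user42", 1_700_000_000_000))
    print(split_points(4))
    print(scanner_caching(avg_row_bytes=1024))
```

The split keys would be supplied when the table is created, and the caching estimate would feed whatever scanner caching setting your client exposes. The right values depend heavily on your own row sizes and access patterns, which is the whole point.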

You will also run into various cluster-related issues. Things don't really scale linearly as you add more nodes. You also need to consider maintenance, upgrades, backups, replication and so on.

If you want to choose a NoSQL database (or, for that matter, any database), spend some time thinking about whether it fits your data model and how well you understand the technology. Performance is rarely gained by simply turning a few knobs.



