NoSQL databases benchmark: Cassandra, HBase, MongoDB, Riak (networkworld.com)
102 points by teoruiz on Nov 2, 2012 | 39 comments



I was part of a large migration that moved a significant amount of data from MongoDB to HBase recently. Along the way, I also spent significant time digging into Cassandra and Riak.

I appreciate the effort and intention behind this article, but for all practical purposes such numbers are really not helpful. From experience, if you really want performance from HBase, you need to spend a significant amount of time coming up with the right way to structure your data in HBase. To name a few considerations (the caching and pre-splitting items are sketched in code after the list):

* choosing the right row key that optimizes bulk scans

* setting optimum client and server caching based on the size of each row

* pre-splitting regions, and setting custom region sizes
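
For illustration, here is a minimal sketch of the caching and pre-splitting points against the 0.92-era HBase client API; the table name, column family, and split keys are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // Pre-split the table so the initial load is spread across
            // region servers instead of hammering a single region.
            HTableDescriptor desc = new HTableDescriptor("events");
            desc.addFamily(new HColumnDescriptor("d"));
            byte[][] splitKeys = {
                Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c")
            };
            admin.createTable(desc, splitKeys);

            // Fetch more rows per RPC on bulk scans; size this against
            // the width of your rows.
            HTable table = new HTable(conf, "events");
            Scan scan = new Scan();
            scan.setCaching(500);
            ResultScanner scanner = table.getScanner(scan);
            scanner.close();
            table.close();
        }
    }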

You will also run into various cluster-related issues. Things don't really scale linearly as you add more nodes, and you also need to consider maintenance, upgrades, backups, replication and so on.

If you want to choose a NoSQL database (or, for that matter, any database), spend some time thinking about whether it fits your data model and your own understanding of the technology. Performance is rarely gained by simply turning a few knobs.


Note that the HBase numbers are so good because the client wasn't actually hitting the database: http://brianoneill.blogspot.com/2012/10/solid-nosql-benchmar...


Just came here to post this.

It's frustrating, because a configuration like that makes the numbers completely irrelevant, unless you don't care whether your data gets persisted or not. I'd much rather see a more like-for-like comparison, and perhaps an unsafe configuration included for interest's sake.


Nobody in their right mind hits a key-value store with a separate RPC for every single key-value pair; you'd be limited by network latency.

You batch your changes in the client and then commit them to the server. When that commit succeeds, your data is persisted; until then, any client application must consider the data uncommitted. This is not unlike how a client application deals with transactions against a relational database. So the benchmark is completely valid here.
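
Roughly, that pattern looks like this (a sketch against the 0.92-era HTable API; the table name and the list of puts are illustrative):

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class BatchedWrites {
        static void writeBatch(Configuration conf, List<Put> puts) throws Exception {
            HTable table = new HTable(conf, "events");
            table.setAutoFlush(false);   // buffer puts client-side, no RPC per put
            for (Put put : puts) {
                table.put(put);          // lands in the client write buffer
            }
            table.flushCommits();        // one round trip; data is committed after this
            table.close();
        }
    }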

(Well, except that here we also have deferred log flush, which means the changes take a second or two to actually make it to the file system; but that still demands the same aggregate throughput from HBase. That is where this test is not fair.)

Edit: Spelling


> (Well, except that here we also have deferred log flush, which means the changes take a second or two to actually make it to the file system; but that still demands the same aggregate throughput from HBase. That is where this test is not fair.)

That was actually the bit that I was concerned about - I failed to read the parent comment properly and assumed it had the exact same concern I did. If you don't have to do a disk write because you're not guaranteeing durability, you have a completely unfair comparison.


More accurately, the database was configured to buffer all writes until flush() was called, so durability was not guaranteed like it was on the other database instances. Still an unfair comparison, of course - I was just curious what you meant by 'not actually hitting the database', so I shared the tl;dr.


This isn't like MongoDB's non-durable-by-default writes. No, the "buffering" here is done client-side. So it really wasn't hitting the database at all (until the client teardown phase, well after the latency being graphed here was measured).


All requests are flushed to the server before the test is finished. So yes, it does hit the database, and the average latency will be correct.

https://github.com/brianfrankcooper/YCSB/blob/master/hbase/s...

cleanup calls flushCommits, so all requests do reach the server. No get requests are served from the client's buffers, so all of those must hit the region server. In addition, the client-side write buffer is only 12MB, so we're not talking about all requests being buffered at once.

This may or may not be what your application needs, which is why autoFlush is a per-client setting (HBase actually defaults it to on). Deferred log flush is similar, though it's a per-table setting; both are shown below.
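
In 0.92-era client code those knobs look roughly like this (a sketch; assumes an existing Configuration conf, and the table and family names are made up):

    // Per-client settings on the HTable:
    HTable table = new HTable(conf, "events");
    table.setAutoFlush(false);                   // buffer writes client-side
    table.setWriteBufferSize(12 * 1024 * 1024);  // 12MB, as in the benchmark

    // Per-table setting on the table descriptor:
    HTableDescriptor desc = new HTableDescriptor("events");
    desc.addFamily(new HColumnDescriptor("d"));
    desc.setDeferredLogFlush(true);              // defer WAL sync for this table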


Average latency is NOT correct because YCSB latency is measured per-insert, not averaged over the total runtime. Here is the code from DBWrapper.insert:

		long st=System.nanoTime();
		int res=_db.insert(table,key,values);
		long en=System.nanoTime();
		_measurements.measure("INSERT",(int)((en-st)/1000));
... where _db.insert makes the htable.put call we're talking about.

The HBase default is correct for what YCSB is trying to measure ("live" updates, i.e., the update should be readable from the region server afterwards), but someone submitted a YCSB patch some time ago that overrides the default to make latency look better, and most benchmarks, including this one, have inherited the mistake.


The client will block once the buffer is full and wait until the outstanding requests are ack'd by the server, so the times will still be pretty close to correct. Only the last few puts won't be timed.

https://github.com/elliottneilclark/hbase/blob/trunk/hbase-s...

So I created a pull request for YCSB so that the last little bit of the buffer is counted for HBase.


Not sure I buy the paper's analysis. They don't seem to know much about load testing -- they speak as though throughput is 'load' (rather than the multiprogramming level) and don't compute bandwidth-delay products.

Also, rather than using M/M/1 or some other reasonable analytic model, they deliberately throttled their request rates to hold throughput constant (thereby guaranteeing different loadings for different 'benchmarks').

Just reading the first graph, for example, and applying Little's Law, it's pretty evident that Cassandra was loaded more heavily than the two MySQL systems, with HBase and Riak trailing.
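
(For anyone unfamiliar: Little's Law says the mean number of requests in flight equals throughput times mean latency, L = λW. With made-up numbers, 10,000 ops/s at a 20 ms mean latency implies about 200 outstanding requests, so systems throttled to the same throughput but showing different latencies are under quite different concurrency loads.)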

Looks like HBase and Cassandra lead the pack to me, with different characteristics for different purposes.

Advice to authors: buy a book by Neil Gunther.


A more rigorous set of tests (including datasets that don't fit in memory, for instance) was presented at VLDB this year: http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf


Note that this paper was discussed at the time on HN:

http://news.ycombinator.com/item?id=4453500

Some people were skeptical of the tuning for HBase, and an HBase developer criticized their methodology, given the especially poor performance shown for HBase. The paper's authors themselves indicated that they had considerable trouble setting up the HBase system, which in a way validates the comments of some here that HBase is fickle and can be difficult to tune. It still means, though, that the paper's results for HBase are not valid when compared to the other systems.

In the interest of disclosure, it should be mentioned here that jbellis is a principal corporate backer of Cassandra.


I wonder why they picked MyISAM for MySQL, since basically no one sane is using that anymore. It isn't even the default anymore, so that can't be the excuse.

"MyISAM caches index blocks but not data blocks."


MyISAM is fast at reads. You can use InnoDB for the transactional, writable master and replicate to MyISAM for read-only slaves. This also gives you full-text search.
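
For instance (standard MySQL syntax; the table and column names are made up), a MyISAM read slave can carry a full-text index that an InnoDB master of that era can't:

    ALTER TABLE articles ENGINE=MyISAM;
    ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);
    SELECT id, title FROM articles
    WHERE MATCH(body) AGAINST('nosql benchmark');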


Then your slaves will lag really badly while the replication thread keeps stalling on table-level locks.

InnoDB is faster at reads for most workloads on modern hardware; MyISAM has horrible scalability across multiple CPUs. MariaDB recently improved MyISAM with the segmented key cache, but I haven't seen any head-to-head benchmarks since then.
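
If I remember the knob correctly, the segmented key cache is enabled in my.cnf along these lines (MariaDB 5.2+; the segment count is illustrative):

    [mysqld]
    key_cache_segments = 8   # split the key cache to reduce mutex contention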


Running benchmarks that depend on I/O speed on AWS, where I/O speed fluctuates wildly depending on who you're sharing the hardware with?

Hmm.


What I find most interesting is how bad Mongo's performance is.


Actually, it looks like it does pretty well on reads. It's the writes that Mongo can't handle. But we all knew that already...


How dare you! The Emperor looks magnificent in that robe!


They appear to have conducted these tests in 1Q12; the versions of MongoDB (2.0.5) and HBase (0.92) both date from then (MongoDB is now at 2.2.x, HBase at 0.94.x).


The article tries not to draw a conclusion but Cassandra's numbers look pretty good to me...


A note for people doing benchmarks: please post your configurations for the services along with your post. It would be even better if your tests were posted on GitHub or something similar so others can reproduce them and validate/update them.

Also, unless you're running all of the tests concurrently, we really need to see your IOPS records during the tests, since a single lagging EBS volume in a RAID-0 array will still negatively affect disk performance (either that or spring for dedicated IOPS) and thus skew your benchmarks.

Making sure you're not getting a spike in CPU steal time would be great as well, for the same reason.
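
A couple of standard tools cover both (a sketch; run them on each node during the benchmark):

    iostat -x 5   # per-device await/%util will expose a lagging EBS volume
    mpstat 5      # the %steal column shows CPU time taken by the hypervisor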


Well, the data never leaves the JVM; that is why it is so "fast". It never fsyncs.

What if the JVM instance crashes under load? You lose data, but, see, it's not our fault; our code is OK.


Not exactly. Cassandra writes a commit log, and if the JVM crashes, Cassandra will recover the data from that log.


How often does it fsync that log?


It can be configured (http://wiki.apache.org/cassandra/Durability). One option is to fsync changes before telling the client that the operation succeeded.
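
Concretely, the relevant knobs in cassandra.yaml look roughly like this (values illustrative; periodic mode is the default):

    commitlog_sync: periodic             # ack writes before the fsync
    commitlog_sync_period_in_ms: 10000   # fsync the commit log every 10s

    # or, to fsync before acknowledging the write:
    # commitlog_sync: batch
    # commitlog_sync_batch_window_in_ms: 50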


And what kind of benchmark numbers do you get then?


The benchmark numbers will not be so good, but do you have a better choice?


The idea behind RAM-based databases is that the data is inherently redundant: if a node crashes under load, the rest of the cluster still has the data. I don't know HBase well at all, but clearly no one sane would argue for a "database" that held data in DRAM on a single host.


How would you compare SQL benchmarks? Oracle, MySQL, SQL Server, PostgreSQL? You can find a specific use case where any one of them will outperform the others. A lot of DBAs assume that Oracle is the most powerful, but it is also harder to manage and VERY expensive to run. I know that most NoSQL databases are free, except for support or multiple-datacenter usage, but what about the cost of DevOps? Or backups?


Almost all NoSQL databases are open source and free for multiple datacenter usage.

And they are all just as easy to manage and back up as everything else. They all have their problems and issues.


How do I back up a 20-node Riak cluster as easily as a Postgres cluster of any size?

Disclosure: I work for Basho, makers of Riak.


I guess the conclusion pretty much sums it up: every DB solution has its own advantages and disadvantages. I find many people jump on the latest bandwagon and later regret the choice, followed by a "Why we moved away from xyz database" post on their blog. Please make sure to analyze your application and future strategy fully before choosing one.


Thanks for this post. It was a really good comparison.


Thank you for posting this. I actually spent a considerable amount of time searching for similar benchmarks a few weeks ago.


"After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics."

Oh, great.


It's a shame they didn't test CouchBase 2.0 (or at least 1.8), which is really kind of silly given that it is pretty widely used commercially. I think CouchBase may be the most successful NoSQL database when it comes to commercial installations for large customers (MongoDB may have more total customers).

Plus, I think it would have scored very well here.


I was curious about CouchBase myself. I know it's hard to run tests on every variation of the NoSQL fad, but CouchBase (IMHO) is a pretty sizable player.



