Benchmarks of Cassandra, HBase, VoltDB, MySql, Voldemort and Redis (vldb.org)
40 points by chillax on Aug 30, 2012 | 36 comments



VoltDB Engineer here. It seems like the authors made some unfortunate choices in the configuration and usage of VoltDB.

A limited number of synchronous clients is always going to present scalability problems and will do a poor job of measuring the achievable throughput. Even if a transaction round-trips from the client in only 500 microseconds, each synchronous client can do at most 2,000 transactions per second. Scaling linearly to 12 nodes might require hundreds of these synchronous clients.

VoltDB is commonly used with many parallel synchronous clients, such as web front-ends, or with asynchronous workloads, such as event feeds. Both of these workloads scale very well, so long as there is enough parallelism.
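To make the parallelism point concrete, here's a rough sketch of issuing calls asynchronously with the VoltDB Java client so one thread keeps many transactions in flight. The host name and the "UpdateKV" procedure are made up for illustration; this is not the benchmark's actual client.

  import org.voltdb.client.Client;
  import org.voltdb.client.ClientFactory;
  import org.voltdb.client.ClientResponse;
  import org.voltdb.client.ProcedureCallback;

  public class AsyncLoad {
    public static void main(String[] args) throws Exception {
      Client client = ClientFactory.createClient();
      client.createConnection("volt-node-1");  // hypothetical host

      // A synchronous call pays one network round trip (~500 usecs),
      // capping a single client thread at roughly 2,000 tx/sec.
      // Asynchronous calls keep many transactions in flight from one thread.
      for (long key = 0; key < 1_000_000; key++) {
        client.callProcedure(new ProcedureCallback() {
          public void clientCallback(ClientResponse resp) {
            if (resp.getStatus() != ClientResponse.SUCCESS) {
              System.err.println(resp.getStatusString());
            }
          }
        }, "UpdateKV", key, "value");  // "UpdateKV" is a made-up procedure
      }
      client.drain();  // wait for outstanding responses before exiting
      client.close();
    }
  }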

They also used global, consistent, multi-partition transactions for the YCSB scan. I'm not sure if that is necessary, and it would also limit scalability.

I can't speak too directly to the work because the code and configuration don't seem to be public. Perhaps I'm mistaken.

It would have been nice if they had reached out to us at VoltDB to verify their configuration and client. They also could have contacted Andy Pavlo of the H-Store project, who has some experience with YCSB on H-Store.

Still, we can always learn from something like this. One thing we're working on in VoltDB is better performance for workloads that aren't designed to be parallel. This means more performance for synchronous clients and for clients that do globally consistent reads. Another area we could improve is more prominent diagnostics to show you why your cluster isn't running faster. That information is all available in our system tables and management tools, but we could probably boil it down to a single status field telling you whether your cluster is starved on client requests, inter-cluster transaction agreement, or slow procedures. In our experience, it's usually the first one. I'll go file a ticket now.
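As a rough illustration of the kind of diagnostics I mean, those counters can already be pulled with the @Statistics system procedure. This is only a sketch: the host name is made up, and the selector names are from memory and may vary by version.

  import org.voltdb.VoltTable;
  import org.voltdb.client.Client;
  import org.voltdb.client.ClientFactory;

  public class ClusterStats {
    public static void main(String[] args) throws Exception {
      Client client = ClientFactory.createClient();
      client.createConnection("volt-node-1");  // hypothetical host

      // INITIATOR shows per-connection invocation rates; if they sit far
      // below what the cluster can execute, the cluster is starved on
      // client requests rather than slow on agreement or procedures.
      for (String selector : new String[] {"INITIATOR", "LATENCY", "PROCEDURE"}) {
        VoltTable[] tables =
            client.callProcedure("@Statistics", selector, 0).getResults();
        System.out.println(selector + ":\n" + tables[0]);
      }
      client.close();
    }
  }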


Hmm, there has to be more to it than that. VoltDB throughput went down scaling from 1 node to 4 in every workload but RSW. This is not a sign of a system that would handle many more requests if only there were more client parallelism.


Another Volt developer here.

Short answer: Volt establishes a global order across transactions. A transaction can't be executed immediately because its position in the global order isn't known when it initially arrives. The exchange of ordering information is driven by in-flight transactions. In the absence of enough in-flight transactions to drive the ordering process, heartbeats are sent every 5 milliseconds, resulting in increased latency and lower throughput as the system waits on heartbeats. 2-4x the concurrency should have shown different results.

We're finishing a different transaction-initiation system that doesn't produce a global order, so that people can run this kind of benchmark and get the expected result. You can switch on the beta version in 2.8.

I also wonder if they had clock skew. The global ordering process is always delayed by the amount of clock skew you have. With NTP properly configured this is tens of microseconds, but with a typical out-of-the-box NTP config it will be milliseconds unless you have a lot of uptime.

I don't really see this as an excuse and it is why we are eating the pain of a transaction initiation rewrite so people can use Volt the traditional way with small numbers of synchronous client threads.


There is an explanation for this. VoltDB latency is currently sub-millisecond for single-node deployments, but often several times higher for clusters, because the current global-ordering-agreement scheme adds some latency. So for synchronous workloads, you'll see a big drop moving from one node to two. With enough parallelism, you will see close to double the throughput. Still, we usually look at scalability starting at 2-3 nodes, because a distributed system with one node is a different animal.

This is something we address in our docs, but people naively testing VoltDB run into it more than we'd like. So for the next major release, we've re-worked how we do global ordering and can now achieve sub-millisecond latency on larger clusters in internal testing. Note that there's no change to our serializable consistency to achieve lower latency. So this new scheme has a huge impact on synchronous workloads, but the scalability with enough parallelism has been close to linear all along.


With these sorts of academic benchmarks I frequently find myself questioning what exactly got tested. The researchers obviously know what they're doing, but that doesn't make them experts on all of the technologies under test. Their description of their HBase configuration in particular leaves me skeptical of how closely it reflects a well-tuned production setup. I think publishing the actual configuration files they used for each technology would be a big help.


My thoughts exactly. The HBase numbers point to a "strange" setup.

Disclaimer: HBase developer here.


This is like publishing the comparative performances of a Nissan Altima, a Ford F-150, a Honda Odyssey, a Smart Fortwo, a Kawasaki Ninja, and a Gillig Advantage.

Sure, it's possible to determine which one has the best land speed, range, and fuel economy, but you're also sure as heck not going to use a Gillig Advantage to do an F-150's job or vice versa.


Interesting, though I miss Riak and Postgres, and also a benchmark of degradation effects after running the workload a few tens or hundreds of times (due to on-disk / index fragmentation).


In a system like Redis Cluster (or any other where clients talk to the right nodes directly) it is conceivable that you always see linear scalability as the number of nodes goes up. This fact is not captured by the setup of this benchmark.

If you think about it, in a real-world scenario where you have 100 database nodes, you likely don't have one threaded client, but N different processes running on M different computer systems, querying different nodes independently.


Unless you're CPU-bound on the client, it's irrelevant whether you're using multiple threads or multiple processes to generate the load. Assuming, of course, that you're not using a GIL-bound client, which the Java-based YCSB is not.
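To illustrate: on the JVM, a single process with enough worker threads generates parallel load just as well as separate processes would. This is only a sketch; the actual blocking request against the data store is elided.

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.atomic.AtomicLong;

  public class ThreadedLoad {
    public static void main(String[] args) throws Exception {
      int threads = 128;  // enough parallelism to hide the per-request round trip
      AtomicLong ops = new AtomicLong();
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      for (int i = 0; i < threads; i++) {
        pool.submit(() -> {
          while (!Thread.currentThread().isInterrupted()) {
            // issue one blocking request against the data store here
            ops.incrementAndGet();
          }
        });
      }
      Thread.sleep(10_000);  // run the load for ten seconds
      pool.shutdownNow();
      System.out.println("ops/sec ~ " + ops.get() / 10);
    }
  }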


If you look at the Redis graphs, there is no linear scalability with N distinct nodes. Now, given that Redis has no proxy nor any other node-to-node chatter in this setup, how is it possible that it's not linearly scalable?


I wondered the same thing, but I do know that it's not threads vs processes. :)


If they posted the config and code, we could probably figure it out.


This.


It's refreshing to see a benchmark that is somewhat thorough and doesn't make sweeping statements like "X is faster than Y".

Everyone who's ever written a hello world benchmark? Learn from this.


Well, they did say that Cassandra is better than all the others, especially when compared to HBase, on nearly all measurements (except for latencies in high-write scenarios).

This makes the decision of Facebook to go with HBase for their new messaging platform back in 2010 all the more strange. Though that was two years ago so things might have changed in Cassandra's favor since then.


> This makes the decision of Facebook to go with HBase for their new messaging platform back in 2010 all the more strange.

Speed isn't everything to a database. AFAIK they chose HBase over Cassandra because of consistency guarantees: eventual consistency is a bad choice for a messaging platform.


> eventual consistency is a bad choice for a messaging platform

Strange that you would mention that in the context of Cassandra, since it allows per-read/write configuration of consistency, from "eventual" to "strong". You get exactly what you ask for with Cassandra, whether it's availability or consistency.

AFAIK, HBase only supports strong consistency.
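For illustration, per-operation consistency in Cassandra looks roughly like this. The sketch uses the DataStax Java driver, which is newer than the clients the paper would have used, and the "demo" keyspace and "users" table are made up.

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.ConsistencyLevel;
  import com.datastax.driver.core.Session;
  import com.datastax.driver.core.SimpleStatement;

  public class TunableConsistency {
    public static void main(String[] args) {
      Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
      Session session = cluster.connect("demo");  // made-up keyspace

      // Eventual: acknowledge once a single replica has the write.
      SimpleStatement write =
          new SimpleStatement("UPDATE users SET name = 'a' WHERE id = 1");
      write.setConsistencyLevel(ConsistencyLevel.ONE);
      session.execute(write);

      // "Strong" read-your-writes when R + W > replication factor,
      // e.g. QUORUM writes combined with QUORUM reads.
      SimpleStatement read =
          new SimpleStatement("SELECT name FROM users WHERE id = 1");
      read.setConsistencyLevel(ConsistencyLevel.QUORUM);
      session.execute(read);

      cluster.close();
    }
  }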


This is not the main reason people would choose HBase. For time-series data, HBase is much better at load balancing the data and at storing the same data in sequential blocks, so that a single operation can fetch all of the data points you're interested in. Cassandra supports range queries, but the last time I looked it wasn't super awesome at load balancing data across nodes when using the OrderedPartitioner. Do I remember wrong?
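For example, a time-series row-key design in HBase turns "fetch one sensor's data for a time window" into a single sequential scan. This is a rough sketch against the older HTable API; the "metrics" table and the key layout are made up.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TimeSeriesScan {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "metrics");  // hypothetical table

      // Row keys like "sensor42:<timestamp>" keep one series in adjacent
      // blocks, so one scan reads the whole time window sequentially.
      Scan scan = new Scan(Bytes.toBytes("sensor42:1346300000"),
                           Bytes.toBytes("sensor42:1346400000"));
      ResultScanner scanner = table.getScanner(scan);
      for (Result row : scanner) {
        System.out.println(Bytes.toString(row.getRow()));
      }
      scanner.close();
      table.close();
    }
  }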



Does Cassandra support Range queries with the RandomPartitioner now?


Within a partition, yes.


I won't put too many words into the mouths of the Facebook fellows, but I meet with them every now and then, and they very much care about "speed" (latency and throughput, best and worst case).

This paper does not reflect what I have seen in production setups.

Disclaimer: HBase developer here.


> Since the Redis cluster version is still in development, we implemented our own YCSB client using the sharding capabilities of the Java Jedis library.

I just glanced over this, but at that point they might very well have benchmarked driver performance. I also don't see why somebody would compare Cassandra against MySQL and against Redis.
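For reference, client-side sharding with Jedis looks roughly like this. It's only a sketch with made-up hosts; whether it matches their YCSB client's pool sizes, pipelining, and so on is exactly what we can't tell without their code.

  import java.util.Arrays;
  import java.util.List;

  import redis.clients.jedis.JedisPoolConfig;
  import redis.clients.jedis.JedisShardInfo;
  import redis.clients.jedis.ShardedJedis;
  import redis.clients.jedis.ShardedJedisPool;

  public class ShardedRedisDemo {
    public static void main(String[] args) {
      // Client-side sharding: Jedis hashes each key to one of the nodes,
      // so every request is a single round trip to the "right" server.
      List<JedisShardInfo> shards = Arrays.asList(
          new JedisShardInfo("10.0.0.1", 6379),   // made-up hosts
          new JedisShardInfo("10.0.0.2", 6379),
          new JedisShardInfo("10.0.0.3", 6379));
      ShardedJedisPool pool = new ShardedJedisPool(new JedisPoolConfig(), shards);

      ShardedJedis jedis = pool.getResource();
      try {
        jedis.set("user:1000", "somevalue");
        System.out.println(jedis.get("user:1000"));
      } finally {
        pool.returnResource(jedis);  // older Jedis API; newer versions use try-with-resources
      }
      pool.destroy();
    }
  }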

While they all persist application data, their feature sets are so completely different that I don't see a task for which people would have to decide between them.


I think they wanted to discriminate between them on tasks where someone might reasonably decide for either of them.


I haven't seen those use cases yet.

Either you want to scale over multiple systems or you don't. Either you have more data than RAM or you don't.


How about Hacker News, and more generally internet discussion forums?

Many discussion forums are SQL backed - but both Reddit and Digg are reported to use Cassandra, a NoSQL system.


In this case, Redis would be the odd man out. Unless you want to do manual sharding and replication, MySQL is also targeted at a different use case. I also don't think anybody would run a 40-node MySQL master-master cluster.


Even though Redis is moving from being a Memcached competitor to being more of a Cassandra competitor, and MySQL is used by many of the world's largest websites (often as a glorified key-value store), they are odd choices.

Would anyone really use Redis for 'big data' ?


None of them do auto-sharding and master-master replication. This is the domain of dynamo/bigtable-like systems (Elasticsearch, Riak, Cassandra, HBase, Voldemort).


Actually, AFAIK MySQL supports master-master replication, and the Cluster version supports auto-sharding.


Supporting something and recommending it are two different things. To do master-master correctly, you need either some sort of conflict resolution or a lock manager. MySQL has neither.


Why didn't they test MongoDB or Riak? Is Voldemort really more relevant than Riak or MongoDB? WTF?!

Sorry, boys, but this paper looks like the result of an ill-performed research job, done in a hurry to get published.


Another vote for MongoDB


Should've tested MongoDB too!


Want Mongo measurements too.





