

How not to benchmark Cassandra: a case study - tjake
http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra-a-case-study

======
arielweisberg
I find the language regarding synchronously durable writes to be confusing.
Writing to the filesystem and invoking fsync every 10 seconds is not
synchronously durable. You still send a response before the data hits non-
volatile media. In my mind that would make the MongoDB, Couchbase, and
Cassandra behaviors roughly equivalent barring differences in the frequency
with which data is synced.
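
To make the distinction concrete, here is a minimal sketch of the two acknowledgement orders. The function names and the temp-file commit log are hypothetical illustrations, not code from any of the databases discussed; the only point is where the ack happens relative to `fsync`.

```python
import os
import tempfile

def append_periodic(fd, record):
    """Ack as soon as the bytes reach the OS page cache (no fsync here).

    A background task would fsync every N seconds; anything written since
    the last sync can be lost on power failure even though it was acked.
    """
    os.write(fd, record)
    return "ack"

def append_synchronous(fd, record):
    """Ack only after fsync returns, i.e. after the OS reports the data flushed."""
    os.write(fd, record)
    os.fsync(fd)
    return "ack"

# Hypothetical commit log file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "commitlog.bin")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
acks = [append_periodic(fd, b"a\n"), append_synchronous(fd, b"b\n")]
os.close(fd)

with open(path, "rb") as f:
    contents = f.read()
```

Both calls return the same ack to the client, which is why the two modes are easy to conflate in a benchmark write-up.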

Writing to the filesystem (without syncing) is a few hairs better than
buffering in application memory, but then you are at the mercy of the
filesystem blocking your party for a few hundred millis while it navel-gazes
at random. Not a huge deal for most workloads because you can still get great
latency at the 99th and 99.9th percentiles, but beyond that it's a problem.

Not standardizing on a specific set of commit log behaviors (sync vs async,
fsync frequency) is pretty egregious for benchmarking anything with a log.

It's not clear from the blog post which direction you went in: relaxing
durability in Cassandra or increasing durability in the others. I had to go to
GitHub to figure out that you switched to synchronous commits all around. That
raises the question: were there other tunables necessary to make synchronous
durability perform well? I know in some databases you can set how long the
database will wait to group commit in terms of # of transactions and time.

Having a working group commit implementation is of course very important.
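
For readers unfamiliar with the idea, group commit with both a transaction-count limit and a time window can be sketched roughly as below. This is a single-threaded toy with an explicit clock, not how any particular database implements it; `GroupCommitLog`, `max_batch`, and `window_ms` are made-up names, and the values are illustrative only.

```python
class GroupCommitLog:
    """Toy group commit: one sync covers every write pending in the batch."""

    def __init__(self, max_batch, window_ms):
        self.max_batch = max_batch    # sync once this many writes are pending
        self.window_ms = window_ms    # or once the oldest pending write is this old
        self.pending = []
        self.batch_start_ms = None
        self.sync_count = 0           # stands in for the number of fsync calls
        self.acked = []               # writes acknowledged after a sync

    def append(self, record, now_ms):
        if not self.pending:
            self.batch_start_ms = now_ms
        self.pending.append(record)
        batch_full = len(self.pending) >= self.max_batch
        window_over = now_ms - self.batch_start_ms >= self.window_ms
        if batch_full or window_over:
            self._sync()

    def tick(self, now_ms):
        """Timer callback: flush a partial batch once the window has elapsed."""
        if self.pending and now_ms - self.batch_start_ms >= self.window_ms:
            self._sync()

    def _sync(self):
        self.sync_count += 1
        self.acked.extend(self.pending)   # acks are sent only after the sync
        self.pending = []
        self.batch_start_ms = None

log = GroupCommitLog(max_batch=3, window_ms=50)
for t, rec in [(0, "w1"), (10, "w2"), (20, "w3"), (30, "w4")]:
    log.append(rec, t)
log.tick(80)  # w4 has waited 50 ms, so the timer flushes it
```

Here four writes cost two syncs: the first three fill a batch, and the fourth is flushed when its window expires. Both knobs trade latency for throughput, which is why they matter when comparing synchronous-durability numbers.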

~~~
jbellis
This was the toughest part to make clear as well as brief. I was not entirely
successful.

For Cassandra I used the default 50ms batch window. I'm not aware of tunable
windows for the others.

------
army
I think it's important to consider that people interpreting benchmarks need to
be skeptical and aware of the scope of the results: a lot of problems are as
much from people overgeneralising results as from bad experimental
methodology.

------
abengoam
All I see are graphs with no units on the Y axis.

~~~
jbellis
They are all ops/s.

~~~
tedchs
Hi, I think the point is that the graphs need Y-axis labels to be meaningful.

