

MongoDB and DataStax, In the Rearview Mirror - skjhn
http://blog.couchbase.com/mongodb-and-datastax-rearview-mirror

======
bkeroack
My problems with Couchbase:

\- Completely opaque administration. The company charges for support ($5k per
production node per year!), so perhaps unsurprisingly the log output is
unreadable (similar to a binary stacktrace). Forget about diagnosing cluster
issues on your own.

\- Bizarre failover behavior. In every test scenario I've tried, if any single
node goes down my application becomes unresponsive (even if it can see other
nodes that are up). This could be a problem with the .NET driver, but it's
hard to tell.

\- Weird performance. Usually Couchbase is fast, but sometimes my application
will go into what looks like a busywait loop and consume 100% CPU while trying
to contact the Couchbase cluster. Again, possibly (probably?) a driver issue,
but it makes me hate dealing with Couchbase.

\- Ten-bucket limit. Ostensibly for performance reasons, but it cannot be
overridden, even if I'm running a non-production cluster where performance
isn't critical.

At this point we're looking at switching to Redis. At least it's widely used
and easier to administer.

~~~
vosper
Hi, thanks a lot for posting this! I have been thinking about using Couchbase
Server for a mobile project because I like the mobile client component they
have. I'm particularly concerned about the "opaque administration" issue -
have you had to actually pay the company [1] for support because the log
output is so bad, or did it just slow down your debugging process?

[1] I have nothing against paying for support, but this is a personal project

~~~
bkeroack
I don't have a problem with charging for support either, but Couchbase (for
us) is just a memcache replacement, so that price is too high to justify.

Without support the "debugging process" consists of blindly trying random
things to fix the problem before giving up and moving on to something else.

~~~
scalesolved
The 'community' edition is free; can you run that in production? Agreed on the
crazy logs it generates.

~~~
bkeroack
We only run the community edition (everywhere). We could legally use the
enterprise edition in QA, but it wouldn't make sense since production must be
community.

------
jbellis
Given that Couchbase is well aware of my criticism from the last time Thumbtack
benchmarked C* vs Couchbase, it's significant that no mention is made of
efforts to have both systems do durable writes. It looks like they made the
same mistake a second time: [http://www.datastax.com/dev/blog/how-not-to-
benchmark-cassan...](http://www.datastax.com/dev/blog/how-not-to-benchmark-
cassandra-a-case-study)

(Update: the Couchbase pdf says that they used "the fastest consistency and
durability modes available for the particular database" as cover for
benchmarking different things... but even that wasn't done right; no mention
is made of setting durable_writes=false in Cassandra.)

~~~
skjhn
Well, writes are not durable until fsync. That's true for MongoDB, Cassandra
and Couchbase Server. That being said, Cassandra demonstrated great write
latency. The issue was read latency.

~~~
jbellis
Of course they are not durable until fsync; that is why having an upper bound
on how long that will be is important.

Out of the box Cassandra defaults to an upper bound of 10s. Couchbase defaults
to no upper bound at all -- you can lose arbitrary amounts of data on power
loss. That's a huge difference.

Since Couchbase does not support a time bound on fsyncs, there are two ways to
make a fair comparison: make both systems fsync before acknowledging any write
(commitlog_sync: batch in Cassandra and persistTo: master in Couchbase) or
give Cassandra an unlimited durability window like Couchbase
(durable_writes=false at the keyspace level). Of the two, the former is a lot
more reasonable in real world scenarios, but the latter is at least more
defensible than apples-to-oranges.
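
As a sketch, the two Cassandra options mentioned above look like this (the
keyspace name is hypothetical; Couchbase's persistTo is requested per
operation in the client SDKs rather than set in server config):

```
# cassandra.yaml -- fsync before acknowledging writes:
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2

-- CQL -- the opposite extreme: an unlimited durability window,
-- matching Couchbase's default behavior:
ALTER KEYSPACE my_keyspace WITH durable_writes = false;
```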

~~~
skjhn
As can Cassandra. It depends on how much data is still in the page cache. I do
agree both systems can be configured for immediate durability. However, we
went with the default values as most people (and most databases) do not sync
on every write. It is too much of a performance cost.

~~~
jbellis
Cassandra _cannot_ lose more than 10s of data because it will slow down writes
as necessary to make sure it does not. That's why comparing to a system that
allows arbitrarily high data loss is unfair.

People can and do run Cassandra with full durability. When people understand
the tradeoff between performance and data loss on power failure, you'd be
surprised how often they'll choose real durability. (And by batching concurrent
writes into the same fsync, the penalty isn't nearly as high as it would be in
a naive implementation.)

~~~
skjhn
Do you place a maximum amount of data in that 10s window? If not, it too is
arbitrary. I do think batching concurrent writes is nice.

~~~
jbellis
Of course "limit by time" and "limit by data" are interchangeable; in practice
"limit by time" is easier for operators to reason about.

------
meritt
"To deliver the highest write performance, Couchbase Server writes to an
integrated, in-memory cache first. Then, it synchronizes the cache with the
storage device for durability."

Comparing the write speed of non-durable writes against the write speed of
durable writes is the same bullshit tactic that MongoDB first utilized a few
years ago. Why don't we just add /dev/null in there too?

We're also apparently using 2 replicas; however, is the write being confirmed
prior to replication? From Couchbase documentation: "When a client application
writes data to a node, that data will be placed in a replication queue and
then a copy will be sent to another node. The replicated data will be
available in RAM on the second node and will be placed in a disk write queue
to be stored on disk at the second node."

So a write in Couchbase only guarantees it has been written to RAM on the 1st
node. Neither an fsync nor replica writes have occurred at the time of
write_success=true.

~~~
regularfry
I'm intrigued as to how one can possibly justify calling that a "write".

~~~
skjhn
Because the data is written to a data structure before the write completes.
You can "write" to a cache or to an in-memory database. Databases, relational
or NoSQL, write to memory first, whether to an application-managed cache or an
OS-managed cache (i.e., the page cache).

------
andrewvc
Oh boy, this is rich. From the original source: _It is always part of our
process to invite vendors to provide configuration suggestions prior to
testing and to share our methodology and preliminary results with each of them
before we write conclusions. We will add any updates here should there be any
before the final report is released._ [http://blog.thumbtack.net/new-
benchmark-preliminary-results/](http://blog.thumbtack.net/new-benchmark-
preliminary-results/)

It's pretty irresponsible of Couchbase to post this on their blog given that
statement. The benchmark is EXTREMELY limited in scope. Crucially, it's mostly
about raw speed in a fairly artificial set of use cases. They only used one
record size, for Christ's sake.

I'd say more, but I'll hold off till the final report.

------
nwenzel
When are we going to be finished with raw speed tests? I get it that faster is
better. But I've always been a believer that good design will get you farther
than choosing one software/hardware/infrastructure over another.

Also, I get the comparison of MongoDB and Couchbase. But why Cassandra from
DataStax? It's a completely different technology with entirely different
strengths and weaknesses. There are certainly overlapping use cases, but it
seems like an odd comparison.

The better comparisons would seem to be comparing Couchbase to vanilla CouchDB
and Cloudant's version of CouchDB. I guess you could throw in MongoDB, but
MongoDB and CouchDB have different strengths and weaknesses. Raw speed alone
doesn't exactly leave any technology in the rearview mirror.

~~~
skjhn
Performance is a reflection of architecture. Better performance, better
architecture. That, and you can never downplay performance.

~~~
cnlwsu
Really? ... Really? I didn't know my local file server that I SCP things to
is better architected than S3.

~~~
skjhn
Good point. After all, most interactive applications read and write data from
local files via SCP and S3.

------
antirez
Benchmarks are almost never able to provide a generally useful picture. Where
you see the _actual_ database performance difference is in your company,
fighting for latency, in a given, specific use case, with given write
durability and safety requirements. Every developer who has really tried hard
to optimize an application's latency or performance knows how you end up
hitting the details and the very specific, database-dependent tradeoffs.
TL;DR: pick databases after doing tests and simulations for your specific use
case.

------
threeseed
That's odd. I was under the impression that DataStax Java drivers had
knowledge of the cluster and so weren't sending requests to random nodes.

[http://www.datastax.com/doc-source/developer/java-
apidocs/co...](http://www.datastax.com/doc-source/developer/java-
apidocs/com/datastax/driver/core/policies/TokenAwarePolicy.html)

~~~
rsvihla
You actually have to use it, though. TokenAware is one of many policies, most
of which can be layered on top of one another. It just depends on how they had
the driver configured.
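
For illustration, enabling token-aware routing with the DataStax Python driver
looks roughly like this (the contact point is a placeholder; this is a sketch,
not the configuration Thumbtack used):

```python
from cassandra.cluster import Cluster
from cassandra.policies import TokenAwarePolicy, RoundRobinPolicy

# Route each request to a replica that owns the key's token,
# falling back to round-robin among those replicas.
cluster = Cluster(
    contact_points=["127.0.0.1"],  # placeholder address
    load_balancing_policy=TokenAwarePolicy(RoundRobinPolicy()),
)
session = cluster.connect()
```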

~~~
jbellis
Token aware is the default now, but Thumbtack benchmarked with the two year
old Thrift client instead.

------
zorked
You know what would be a nice twist? If database vendors added Jepsen runs to
their benchmarking mix.

~~~
Dave_Rosenthal
FoundationDB did just that: [http://blog.foundationdb.com/call-me-maybe-
foundationdb-vs-j...](http://blog.foundationdb.com/call-me-maybe-foundationdb-
vs-jepsen)

~~~
ChuckMcM
That is pretty cool. I'm surprised that FoundationDB doesn't get more press.

------
itamarhaber
> ...while keeping most of the working set in RAM

So if the benchmark's about data being served from RAM, how come Redis isn't a
part of it? #onlyasking

~~~
skjhn
Most of the data is read from memory, not all of it. All of it is on disk too.

------
devanti
Does anyone know if Couchbase can be used as a TB-sized data warehouse? Seems
a bit difficult, since all keys apparently must be stored in memory.

~~~
danws6
At my previous company we experimented with a 12 node cluster (128GB per node)
used as a datastore. At one point we had over 1 billion keys. This was back in
the 1.x version where they persisted data into sqlite files so each file would
have 100+ million rows in it. Persisting data took forever. Rebalancing took
forever.

When it worked, it was very fast. When you lost a node, things went bad. The
Java client would lose its mind trying to cope with an outage. We ended up
writing a connection pool where we could just recreate our connections when we
detected a node went out.

That said, it was the best distributed NoSQL solution we tried, and it might
have improved a lot since I last used it.

~~~
devanti
really helpful info. thanks!

------
thspimpolds
OH HERE YOU GO, I found that impartiality you dropped on the floor

------
rdtsc
Just wanted to point out that this "we are writing to memory only" thing in
Couchbase doesn't apply to CouchDB. CouchDB will fsync the data to the disk by
default.

[http://couchdb.readthedocs.org/en/latest/config/couchdb.html](http://couchdb.readthedocs.org/en/latest/config/couchdb.html)

You can then disable that if you don't care about it. So don't confuse
Couchbase's policy with CouchDB's (Couchbase, it seems, has gone the way of
Mongo here).

Also, CouchDB writes in append-only mode and will let you crash-restart your
server without corrupting your data.

------
jamieb
A response from Jonathan Ellis, via jancona:
[http://www.datastax.com/dev/blog/how-not-to-benchmark-
cassan...](http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra-a-
case-study)

~~~
jbellis
(That was a response to a different benchmark, also done by Thumbtack, but it
looks like it applies just as well to this one. See my comment at
[https://news.ycombinator.com/item?id=7944226.](https://news.ycombinator.com/item?id=7944226.))

------
trhway
I don't understand - Couchbase took 1ms per operation without durability
configured, i.e., in memory. What took so long? Or is network roundtrip
included?

>However, MongoDB implements a single lock per database (link). MongoDB nodes
are limited to executing a single write at a time per database.

that sounds like a nightmare. Are they for real?

Looking at the results - durable configs, MongoDB and DataStax, have 3ms
latency with 25K and 75K ops/sec. Are they using HDDs or SSDs? As a reference
point - 5 years ago I was hitting 3.5ms on 15K disks on Oracle, at 20-30K
ops/second (durable).

>*Couchbase sponsored this study.

priceless.

~~~
skjhn
Network roundtrip included.

Regarding MongoDB, they are. They say MongoDB 2.8 will have document-level
locking.

No durable configs in this benchmark though. All databases fsync'd after
writes. The servers had SSDs.

------
mscook
Why no units on the plots?

~~~
skjhn
Good catch. Latency is in milliseconds. I'll have to fix that.

------
atonse
Any mention of TokuMX?

