

Comparing Non-Relational Databases: CouchDB, MongoDB, Tokyo Tyrant - spahl
http://github.com/igal/ruby_datastores/raw/master/2009-11-14%20Non-relational%20data%20stores%20for%20OpenSQL%20Camp.pdf

======
javery
The performance tests here seem close to useless. When memcache is slower than
all your databases, it's a good indicator that after your first call all the
databases just cached the request. To get meaningful performance metrics you
would need to use sizable databases and more realistic use cases. It's not
enough to just publish a disclaimer; why not just do it right in the first
place?

It also lists transactions as a benefit of MySQL, then shows the performance
tests against the MyISAM engine, which doesn't support transactions. This also
explains why the MySQL inserts might be faster than any others.
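
A minimal sketch of the kind of measurement asked for above, in Python: time
writes and reads separately over a larger keyspace, read in shuffled order, and
report the result in explicit units (operations per second). `DictStore`,
`ops_per_second` and `run_benchmark` are hypothetical names made up for this
illustration, and the in-memory dict is only a stand-in; it is not the harness
from the slides, and a real test would swap in an actual client and a dataset
too large to be served from cache.

```python
import random
import time


class DictStore:
    """Hypothetical in-memory stand-in for a database client (not a real API)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def ops_per_second(operation, keys):
    """Call operation(key) once per key; return throughput in operations/second."""
    start = time.perf_counter()
    for key in keys:
        operation(key)
    elapsed = time.perf_counter() - start
    return len(keys) / max(elapsed, 1e-9)


def run_benchmark(store, n_keys=100_000, seed=42):
    """Measure write and read throughput separately, with units in the key names."""
    rng = random.Random(seed)
    keys = [f"key:{i}" for i in range(n_keys)]
    payload = "x" * 256  # assumed value size; adjust to match the real workload

    writes = ops_per_second(lambda k: store.set(k, payload), keys)

    # Read in shuffled order so the test is not one hot key fetched repeatedly.
    read_keys = keys[:]
    rng.shuffle(read_keys)
    reads = ops_per_second(store.get, read_keys)

    return {"writes_per_s": writes, "reads_per_s": reads}


if __name__ == "__main__":
    for name, rate in run_benchmark(DictStore()).items():
        print(f"{name}: {rate:,.0f} ops/s")
```

The absolute numbers from a dict mean nothing; the point is only the shape of
the harness: many distinct keys, reads and writes timed apart, and a labeled
unit on every result.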

~~~
sedachv
I love how the graphs don't include any units. The "scores" one is
particularly meaningless.

------
mattlanger
These "NoSQL" comparisons are a tricky thing, inasmuch as this is a very
volatile space with a lot of really interesting development happening.
Furthermore, benchmarking a handful of storage options that are really only
alike in that they share the same thing which they are not (relational, SQL)
is a lot like running road tests on a handful of cars that are not Honda
Civics.

I made the foolish decision a while back to choose a data storage option based
on benchmarks and hype, and I'm currently paying for it by having achieved
spectacular launch failure and now completely rearchitecting my backend.

NoSQL emerged only after a decade or so of webscale challenges helped us
finally realize that everything only _looked_ like relational nails because
the only tool we had was a SQL hammer, and that tackling every data warehousing
problem with SQL was coming at the cost of money, engineering hours, and
headaches. Now that viable, production-ready alternatives have become
available, it's important that design decisions go beyond seeing problems as
not-nails and see them instead for what they are.

Are you looking for reliable, versioned master-master replication for embedded
devices? Might want to consider Couch. An obscenely fast caching tier that
need not persist to disk? Sure, check out Scalaris. A fully distributed and
pluggable map-reduce architecture capable of storing a metric fucktonne of
data? HBase might be for you. And, it should go without saying, there are
still and there will always be problems for which SQL is the best solution.

In any event, an invaluable starting point is Richard Jones' post at Metabrew
discussing research on a number of options for Last.fm:
<http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/>

~~~
smanek
_I made the foolish decision a while back to choose a data storage option
based on benchmarks and hype, and I'm currently paying for it by having
achieved spectacular launch failure and now completely rearchitecting my
backend._

Can I ask what you used - and how it hurt you? Just so I can avoid making the
same mistakes ...

------
swannodette
One thing I rarely see people mention in these discussions is that since
CouchDB is master<->master, you can design applications completely
differently: peer-to-peer.

~~~
jchrisa
Thanks. The important thing about CouchDB is that it allows you to write
applications you could never dream of with the standard LAMP stack. P2P
replication makes a huge difference.

------
bayareaguy
The title is slightly misleading since they also include PostgreSQL, MySQL and
Memcached in their tests. They conclude:

* MongoDB and Tokyo Tyrant are useful now. CouchDB has promise, but is too slow currently.

* Non-relational databases have shown their worth at larger sites when used cleverly.

* Non-relational databases will continue to improve performance, stability & features.

* Relational databases are still a great choice: fast, powerful and proven. With caching, denormalization, rework (e.g. Drizzle) & better replication, they will continue to be competitive.

------
dlsspy
I'm calling total bullshit on those performance graphs. Neither Tokyo Tyrant
nor MySQL is anywhere near as fast as memcached unless you have a _really_
bad memcached client (and they do exist).

I got somewhere over 90k stores per second on my MacBook over localhost with my
Java client. If he's only getting around 3k, he must be doing them by hand.

------
mark_l_watson
Nice writeup. My current favorite is MongoDB (I blogged a few days ago on an
easy way to index and search text in MongoDB docs), but I still also use
CouchDB (and covered CouchDB in my last Apress book).

I have played a little with Tokyo Cabinet/Tyrant but find it a little too 'low
level' - would probably be great if you only need a fast hash. Cassandra is
also worth a good look: if you are a Ruby developer, the Cassandra gem can
auto-install all of Cassandra and manage it for you - very elegant, really.

I admit that one reason I like MongoDB is that it is so simple to use, and the
documentation is good.

------
suhail
I know little about CouchDB, but the documentation score seems wrong:
<http://books.couchdb.org/relax/>

------
adrinavarro
Graphs don't seem accurate for real-life situations. And I miss Redis in the
comparison chart (IMHO memcache shouldn't be there)

------
va_coder
Is CouchDB really that slow?

~~~
mahmud
[Edit:

Having just read the paper (really, a few screens of a presentation) I feel
confident that it's safe to ignore these "findings". Something is amiss. No
way memcache is slower than MySQL for inserts/writes, and on par for queries.
Just doesn't make any sense.]

Hard to tell. You can "speed up" MS Access to Redis speeds, with sufficient RAM
and a good caching strategy.

If CouchDB is slow in one benchmark, the developers can just go out and do
something "nice", like, memory mapping all the documents in the "database",
and serving them with a tight epoll(2) dispatch system using sendfile(2).

Speeding up databases for a benchmark is easy; scaling them for the real world
is the tricky part.

~~~
jchrisa
The thing we see with CouchDB in production is that, although under low load
we aren't the fastest kid on the block, as you ramp up concurrency (hundreds
or thousands of simultaneous clients) with mixed reads and writes, on a
multi-GB database, we don't slow down.
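
A rough sketch of how that kind of ramp could be measured, in Python: run the
same mixed read/write session at several concurrency levels and compare total
throughput per level. Everything here is an assumption for illustration:
`LockedStore` is a hypothetical in-memory stand-in rather than a CouchDB
client, and the function names and concurrency levels are made up. Threads in a
single Python process will not reproduce real network load, so this shows the
structure of the test and nothing about any particular database.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LockedStore:
    """Hypothetical thread-safe in-memory stand-in for a database client."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def client_session(store, client_id, n_ops):
    """One simulated client: alternate a write with a read of the same key."""
    for i in range(n_ops):
        key = f"client{client_id}:doc{i // 2}"
        if i % 2 == 0:
            store.put(key, i)
        else:
            store.get(key)
    return n_ops


def throughput_by_concurrency(levels=(1, 10, 100), ops_per_client=1000):
    """Return {number_of_clients: total operations per second} for each level."""
    results = {}
    for n_clients in levels:
        store = LockedStore()
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n_clients) as pool:
            futures = [
                pool.submit(client_session, store, c, ops_per_client)
                for c in range(n_clients)
            ]
            total_ops = sum(f.result() for f in futures)
        elapsed = time.perf_counter() - start
        results[n_clients] = total_ops / max(elapsed, 1e-9)
    return results


if __name__ == "__main__":
    for clients, rate in throughput_by_concurrency().items():
        print(f"{clients:>4} clients: {rate:,.0f} ops/s")
```

The number to watch is how ops/s changes from one level to the next, not the
value at a single level.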

An old BBC case study:

<http://www.erlang-factory.com/conference/London2009/speakers/endafarrell>

Building a test harness is non-trivial. Igal's tests are a lot like the ones I
used in the CouchDB book to illustrate the importance of batch updates, so I
don't blame him for the imprecision.
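
To illustrate the batch-update point in general terms, here is a small Python
sketch that writes the same rows two ways: one commit per row, and one
transaction for the whole batch. It uses the standard-library `sqlite3` module
purely as a stand-in; it is not CouchDB, not the test from the book or the
slides, and the table and function names are made up for the example.

```python
import sqlite3
import time


def insert_one_by_one(conn, rows):
    """Commit after every row: one transaction per write."""
    for row in rows:
        conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", row)
        conn.commit()


def insert_batched(conn, rows):
    """Write all rows inside a single transaction."""
    with conn:  # the connection context manager commits once on success
        conn.executemany("INSERT INTO docs (id, body) VALUES (?, ?)", rows)


def timed_insert(insert_fn, rows, path=":memory:"):
    """Run insert_fn on a fresh table; return (row count, elapsed seconds)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    start = time.perf_counter()
    insert_fn(conn, rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    rows = [(i, f"document {i}") for i in range(10_000)]
    for fn in (insert_one_by_one, insert_batched):
        count, elapsed = timed_insert(fn, rows)
        print(f"{fn.__name__}: {count} rows in {elapsed:.3f} s")
```

With the default in-memory database the gap is modest; pointing `path` at a new
file on disk, where every commit has to be made durable, usually widens it
considerably. A benchmark that only exercises the one-row-at-a-time path can
therefore understate what a store is capable of.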

Right now the best test driver for CouchDB throughput looks to be Baracus:
<http://blog.cloudant.com/benchmarking-couchdb-with-baracus>

