Hacker News
MongoDB: A Light in the Darkness (engineyard.com)
21 points by mattculbreth on Sept 24, 2009 | 8 comments



I've got serious "key-value fatigue." These articles are always glowing, but somewhere in the comments or on Usenet you find the thread saying "we actually tried this and it fell on its face in production." I'm tired of articles that spend the whole time talking about features and showing single-machine, "hello world"-esque 'performance tests' while neglecting the one thing people making IT decisions care about: does it actually work?

For us, we're using Tokyo and it explodes at 70GB of data, though I'm guessing it's a configuration issue or something. We purge it every week now since it's just used as a persistent cache, and I haven't looked into why, but it puzzles me how it can just basically break at a certain limit instead of starting to push things to disk.

To cut through the noise, can anyone here vouch for a simple "plug and play" kv store that actually works as advertised, at scale, and ideally is distributed so I can just add nodes as needed? Third-party tall tales and anecdotes don't count; I want you to explain in detail your own personal experience running one of these things on a real, live, many-noded system. CouchDB, MongoDB, Tokyo, Redis, HBase, MemcacheDB, Voldemort, Cassandra, the list goes on (I realize not all of these are strict k-v stores). Who out there, other than the original authors, can get up front and say these things work well?


does it actually work?

Someone has to step forward and try it. You did that with Tokyo and noticed it doesn't work. Report your bug, and after it's fixed it will work better for the next person who tries.

can anyone here vouch for a simple "plug and play" kv store that actually works as advertised

It doesn't exist. For the simple reason that most K/V stores are under 2 years old and haven't gotten enough real-world exposure to work out all the corner cases.

If you want rock-solid then you're looking at the wrong market. Use PostgreSQL. It's not optimized for that use-case, but it's as chuck-norris as you'd expect after 20 years of development and production vetting. It won't segfault under load, it won't corrupt the datastore and it won't plot sawtooths while plowing through a large index.

And at the end of the day a K/V store is a table with two columns, right?
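That two-column claim is easy to make concrete. Here is a minimal sketch of a SQL-backed K/V store, using Python's built-in sqlite3 as a stand-in for PostgreSQL (the `put`/`get` helpers and the `kv` table name are made up for illustration):

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the idea is the same:
# one table, two columns, primary key on the key column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

def put(key, value):
    # Upsert; PostgreSQL has its own idioms for this,
    # INSERT OR REPLACE is SQLite dialect.
    db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

def get(key):
    row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None

put("http://example.com/", "<html>hello</html>")
print(get("http://example.com/"))  # <html>hello</html>
print(get("http://missing/"))      # None
```

The upsert syntax differs between engines, but the shape of the table and the queries carry over directly.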


A few things. First, the "shoot the messenger" reply was inevitable, but doesn't really detract from my point that these types of blog posts trumpeting features and hype are not worthwhile. We will report this as a bug if and when we have a useful bug report to submit. As of right now we don't.

Second, we use PostgreSQL for our RDBMSes -- but we haven't tried it with the same use cases we'd have for our K-V store, so I can't say whether it would work. We're talking tens of millions to hundreds of millions of HTML documents keyed by URL, and I'm skeptical the performance we'd get would be close to the 1000-10000 tps we experience with Tokyo. Would be a good experiment, though!


I'll get back to you in a few months. Building a project right now based on MongoDB.

I can't resist giving you an anecdote, though:

can anyone here vouch for a simple "plug and play" kv store that actually works as advertised, at scale, and ideally is distributed so I can just add nodes as needed

Yes, MongoDB :) With the caveat that the sharding support is alpha at the moment, though database-level sharding is ready right now. The timeline for getting this solid is aggressive, too.

I've been looking at and waiting for the right document-oriented database for over a year now (almost 2, it seems) and I settled on Mongo after lots of experimentation.


We've been disappointed, in some way or another, with almost every one we've tried as well. Interestingly, one of our uses of Tokyo Cabinet completely choked somewhere between 60 and 70GB of data.

At the same time, these articles can be very valuable for us right now. Yes, this particular piece is a bit fluffy, but posts like this invite skepticism, and much of the feedback people give in response is worth reading.

We've never used Voldemort in production, but in heavy load testing we were able to stuff in a few hundred million records with only three instances. Retrieval remained acceptably quick, even while the inserts were going on. However, when we looked, it didn't support repartitioning/rebalancing of the data set. This may have changed.

Cassandra and MongoDB offer a great deal more functionality than, say, Voldemort or Tokyo. Cassandra seems like a really awesome choice, but is currently under heavy development.


Cassandra has the features you are looking for, including adding nodes as needed, and is starting to get a bunch of success stories outside of its original authors (Facebook). Digg, for instance, has 9TB across 12 machines. They wrote some articles about it at http://arin.me/code/wtf-is-a-supercolumn-cassandra-data-mode... and http://blog.digg.com/?p=966.
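"Adding nodes as needed" usually rests on consistent hashing: keys and nodes hash onto the same ring, each key belongs to the first node clockwise from its hash, and adding a node only remaps the keys in that node's slice of the ring. A rough sketch of the idea (the `HashRing` class and node names are hypothetical, not Cassandra's actual token-based implementation):

```python
import hashlib
from bisect import bisect

def ring_hash(s):
    # Stable hash onto a large integer ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node contributes many virtual points for an even spread.
        self.points = sorted(
            (ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self.points]

    def node_for(self, key):
        # The first point clockwise from the key's hash owns the key.
        i = bisect(self._hashes, ring_hash(key)) % len(self.points)
        return self.points[i][1]

before = HashRing(["node1", "node2", "node3"])
after = HashRing(["node1", "node2", "node3", "node4"])
keys = [f"url-{i}" for i in range(10_000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys move")  # roughly 1/4, not 3/4
```

Going from three nodes to four remaps only about a quarter of the keys; with naive `hash(key) % num_nodes` placement, almost everything would move.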

(I work for rackspace and we have an internal app going live on Cassandra in the coming days. But it doesn't technically qualify for "in production" yet.)


I can't help you with key-value stores, but document databases have been in production for years... under the name XML databases.


MongoDB works very well for us with 100GB of data per collection, although we did run into a severe bug with .count not using an index, totally killing performance (we're talking 60 seconds to return).

Inserts and indexing, however, are very fast, and the bug was fixed incredibly quickly; the fix now works in trunk.
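For context on why a count without an index hurts so much: a predicate count either scans every record or walks an index, and any SQL engine makes that difference visible in its query plan. A generic illustration with Python's sqlite3 (not MongoDB; the `docs` table and column names are invented for the demo):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (url TEXT, body TEXT)")
db.executemany("INSERT INTO docs VALUES (?, ?)",
               [(f"http://example.com/{i}", "x") for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether the engine scans or uses an index.
    return " ".join(str(row) for row in db.execute("EXPLAIN QUERY PLAN " + sql))

count_sql = "SELECT COUNT(*) FROM docs WHERE url = 'http://example.com/7'"
plan_before = plan(count_sql)   # full table scan: every row is touched
db.execute("CREATE INDEX idx_url ON docs (url)")
plan_after = plan(count_sql)    # now a search on the index instead
print(plan_before)
print(plan_after)
```

The scan cost grows with the whole table, which is how a count ends up taking 60 seconds on a 100GB collection while indexed lookups stay fast.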

It doesn't seem quite fully cooked yet, but it's a very nice start and promises much. I prefer it to the other K-V stores out there right now, anyway.



