Full disclosure: I work for 10gen.
You strategically posted this when my air conditioning was broken, so here are a few thoughts before I go find somewhere cooler. Since CouchDB is "not a competitor" to MongoDB, it's nice of you to put all this time into a public service.
> MongoDB, <b>by default</b>, doesn’t actually have a response for writes.
Whoopsy, got your emphasis wrong there. We did this to make MongoDB look good in stupid benchmarks (http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs...).
Seriously, though, this "unchecked" type of write is just supposed to be for stuff like analytics or sensor data, when you're getting a zillion a second and don't really care if some get lost when the server crashes. <b>You can do an insert that not only waits for a database response, but waits for N slaves (user configurable) to have replicated that insert.</b> Note that this is very similar to Cassandra's "write to multiple nodes" promise. You can also fsync after every write.
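For anyone skimming, here's a toy in-memory model of the difference between an unchecked write and one that waits for N nodes. It is not the MongoDB driver API: `ToyCluster`, `Node`, and the `w` parameter are hypothetical stand-ins, replication is synchronous, and "waiting" is collapsed into a simple count of the nodes that acknowledged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A toy server: either up or down, storing documents in a plain list."""
    name: str
    up: bool = True
    docs: List[dict] = field(default_factory=list)


class ToyCluster:
    """Hypothetical primary + secondaries with synchronous, in-process replication."""

    def __init__(self, primary: Node, secondaries: List[Node]):
        self.primary = primary
        self.secondaries = secondaries

    def insert(self, doc: dict, w: Optional[int] = None) -> Optional[int]:
        """Insert `doc` and optionally check how many nodes hold it.

        w=None: "unchecked" write, nothing is reported back to the caller.
        w=N:    return the acknowledgement count, or raise if fewer than N
                nodes (primary included) ended up with the document.
        """
        acked = 0
        if self.primary.up:
            self.primary.docs.append(doc)
            acked = 1
            # Secondaries only see writes that reached the primary.
            for node in self.secondaries:
                if node.up:
                    node.docs.append(doc)
                    acked += 1
        if w is None:
            return None  # fire-and-forget: caller can't tell success from failure
        if acked < w:
            # Copies that did land are not rolled back; the caller just learns
            # that the durability target wasn't met.
            raise RuntimeError(f"only {acked} of {w} required nodes acknowledged")
        return acked
```

With one secondary down, `w=2` still succeeds but `w=3` raises, which is exactly the signal an unchecked write never gives you.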
> MongoDB writes to a mem-mapped file and lets the kernel fsync it whenever
> the kernel feels like it.
fsyncs are configurable. You can fsync once a second, never, or after every single insert, remove, and update if you wish.
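To put rough numbers on what those settings trade off, here's a toy calculation of how many writes survive a crash under each policy. It treats a crash as a power loss (anything not yet fsynced is gone), assumes periodic fsyncs fire at exact multiples of `interval`, and pessimistically assumes the kernel never flushed on its own in the "never" case. `surviving_writes` and the policy names are made up for illustration; they are not MongoDB options.

```python
from typing import List


def surviving_writes(write_times: List[float], crash_time: float,
                     policy: str, interval: float = 1.0) -> int:
    """Count writes that were on disk when the machine lost power.

    Hypothetical policies (a toy model, not MongoDB's actual flusher):
      "always"   - fsync after every write, so every write before the crash is durable
      "periodic" - fsync at t = interval, 2*interval, ...
      "never"    - no explicit fsync; assume the kernel hadn't flushed anything either
    """
    before = [t for t in write_times if t <= crash_time]
    if policy == "always":
        return len(before)
    if policy == "never":
        return 0
    if policy == "periodic":
        last_sync = (crash_time // interval) * interval  # most recent fsync before the crash
        return sum(1 for t in before if t <= last_sync)
    raise ValueError(f"unknown policy: {policy}")
```

The gap between "always" and "periodic" is the loss window being argued about here; shrinking `interval` shrinks the window at the cost of more fsyncs.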
> When you look at MongoDB more critically I don’t see how you could actually
> justify using it for anything resembling the traditional role of a database.
This is because you assume you'll run it on a single server. MongoDB's documentation clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.
Also, as another commenter mentioned, full single-server durability is scheduled for the fall.
> Stories like this (http://www.korokithakis.net/node/119) are dubious not
> because they expose a few bugs in MongoDB but because they show inherent
> architectural problems you cannot overcome long term without something
Stories "like this" show that MongoDB doesn't work for everyone, particularly people who give no specifics about their architecture, setup, what happened, or anything else. Isn't it irritating how people will write, "MongoDB lost my data" or "CouchDB is really slow" and provide no specifics?
That's not to say that things never go wrong; MongoDB is definitely not perfect and has lots of room for improvement. I hope that users with questions and problems will contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog post). Anything to contact the community and let us try to help. I wish every person who tried MongoDB had a great experience with it.
Lots of users, hopefully most, love MongoDB and are using it happily and successfully in production.
I'm using MongoDB in production and I've looked at your documentation a bunch. After spending days with your docs open in a browser tab I can't say that it was especially clear on this point.
Perhaps I'm particularly ignorant, but I'd wager that not many other developers know that multiple MongoDB servers are currently required to achieve reasonably acceptable durability.
We're also using it single-server, but only for analytics where loss of a small amount of data isn't a big deal. I've made it very clear to all and sundry that if we're going to put any data that has more value into it, we first need at least one additional server instance.
Having said that, I must ask. Do the "unchecked" writes, configurable fsyncs and multi-node writes exist in the currently stable version of MongoDB or are these features still in alpha or beta?
While I do disagree with the article, he clearly pointed at the current version of Mongo.
They're all available in stable! This might be a documentation fail. It is on the wiki, but suggestions are welcome if you looked at page X and didn't see it.
He did go into several specifics as to why MongoDB might lose your data. Nice jab at CouchDB though!
He was talking about that post http://www.korokithakis.net/node/119 that doesn't go into any specifics at all.
> Nice jab at CouchDB though!
But CouchDB is slow, though. I, like many, switched to MongoDB because CouchDB was just too slow, and when I asked in IRC how to make it faster I was told to run a CouchDB cluster. So, not too different from MongoDB ;)
There was also an Erlang configuration change which allowed us to make use of a thread pool for disk-io so that fsyncs didn't block other activity.
Really it was a bunch of tiny things that all added up. The catalyst was the creation and use of a few highly-concurrent benchmark suites that we could use to identify bottlenecks.
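The Erlang change is specific to that VM, but the idea carries over: keep the slow, blocking fsync off the thread that's serving requests. Here's a rough Python analogy using a `ThreadPoolExecutor`; it is not CouchDB's code, and `AsyncFsyncWriter` is a hypothetical name.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncFsyncWriter:
    """Append to a file and hand each blocking fsync to a worker thread."""

    def __init__(self, path: str, workers: int = 4):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def write(self, data: bytes) -> Future:
        view = memoryview(data)
        while view:                       # os.write may write fewer bytes than asked
            written = os.write(self._fd, view)
            view = view[written:]
        # The write above only reached the OS page cache; the fsync that makes it
        # durable runs on the pool, so the caller isn't blocked waiting for the disk.
        return self._pool.submit(os.fsync, self._fd)

    def close(self) -> None:
        self._pool.shutdown(wait=True)    # let pending fsyncs finish before closing
        os.close(self._fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writer = AsyncFsyncWriter(os.path.join(tmp, "db.log"))
        pending = [writer.write(b"doc\n") for _ in range(3)]
        for fut in pending:
            fut.result()                  # block only when durability is actually needed
        writer.close()
```

Callers that need the durability guarantee wait on the returned future; callers that don't can move on immediately.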
for some reason WordPress wanted me to moderate your post, so sorry for the delay in it showing up.
>> Whoopsy, got your emphasis wrong there ….. Seriously, though, this “unchecked” type
>> of write is just supposed to be for stuff like analytics or sensor data, when you’re getting
>> a zillion a second and don’t really care some get lost if the server crashes.
did the default change? the last time i attempted a concurrent performance test this was one of the barriers i hit. my issue isn’t that you include this feature, it’s that it’s the default. i certainly believe there is a use case for it; i just think it’s harmful as a default.
>> Since CouchDB is “not a competitor” to MongoDB, it’s nice of you to put all this time
>> into a public service.
haha, that’s funny. i regularly use non-CouchDB databases and I get along great with all the people from other databases at conferences. even if i did feel like we were competing, i wouldn’t care. this post really is about reliability issues i don’t think your users are fully aware of and i honestly hope that you fix.
>> fsyncs are configurable. You can fsync once a second, never, or after every single insert,
>> remove, and update if you wish.
that’s really good to hear. have you optimized for a “group commit” yet?
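For readers who haven't run into the term: group commit means letting several concurrent writers share a single fsync instead of paying for one each. Below is a minimal threaded sketch. It assumes a hypothetical `write_and_sync(batch)` callback that stands in for "append these records and fsync" and never raises; real code has to fail every writer in a batch whose fsync fails.

```python
import threading
from typing import Callable, List


class GroupCommitLog:
    """Toy group commit: many concurrent writers, far fewer fsyncs.

    Each writer adds its record to a shared buffer. If no flush is running,
    that writer becomes the leader: it takes everything buffered so far,
    hands it to write_and_sync (one fsync for the whole batch), and wakes
    the writers it covered. Others wait and re-check.
    """

    def __init__(self, write_and_sync: Callable[[List[bytes]], None]):
        self._write_and_sync = write_and_sync  # hypothetical: append batch + fsync
        self._cond = threading.Condition()
        self._pending: List[bytes] = []
        self._flushing = False
        self._next_seq = 0     # sequence number for the next submitted record
        self._durable_seq = 0  # every record with seq < this is durable
        self.fsyncs = 0

    def commit(self, record: bytes) -> None:
        """Block until `record` is durable."""
        with self._cond:
            my_seq = self._next_seq
            self._next_seq += 1
            self._pending.append(record)
            while self._durable_seq <= my_seq:
                if self._flushing:
                    self._cond.wait()  # someone else is flushing; check again after
                    continue
                self._flushing = True
                batch, self._pending = self._pending, []
                batch_end = self._next_seq
                self._cond.release()   # do the slow I/O without holding the lock
                try:
                    self._write_and_sync(batch)
                finally:
                    self._cond.acquire()
                self.fsyncs += 1
                self._durable_seq = batch_end
                self._flushing = False
                self._cond.notify_all()
```

Under load the leader tends to pick up many records per fsync; with a single writer it degrades to one fsync per commit, which is no worse than fsyncing every write.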
>> This is because you assume you’ll run it on single server. MongoDB’s documentation
>> clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.
I responded earlier to the complexity of actually keeping something available that depends on this, so i won’t cover it again.
>> That’s not to say that things never go wrong, MongoDB is definitely not perfect and has
>> lots of room for improvement. I hope that users with questions and problems will
>> contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog
>> post). Anything to contact the community and let us try to help. I wish every person
>> who tried MongoDB had a great experience with it.
You make it sound like this is all just a matter of bugs. it’s not, and i find blaming it on users who don’t use JIRA or get on IRC a little distasteful.
these issues are architectural, and until you do something append-only they aren’t going to go away. someone mentioned earlier that you plan to do an append-only transaction log; if that’s accurate, then it’s fantastic news.
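Why append-only helps: if you never overwrite existing bytes, a crash can only damage the tail of the file, so recovery is just "scan forward until the first bad record." Here's a minimal sketch of that recovery scan. The record format (a length + CRC32 header) is made up for illustration and is not CouchDB's or MongoDB's on-disk format.

```python
import struct
import zlib
from typing import List, Tuple

HEADER = struct.Struct(">II")  # (payload length, CRC32 of payload), big-endian


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one framed record; existing bytes are never modified."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def recover(log: bytes) -> Tuple[List[bytes], int]:
    """Return the intact records and the offset where the valid log ends.

    Because writes only ever go at the end, a crash can damage at most the
    last record; everything before the first bad frame is trusted, and the
    file can be truncated to the returned offset.
    """
    records: List[bytes] = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(log):
            break  # torn write: header reached disk, payload didn't
        payload = bytes(log[start:end])
        if zlib.crc32(payload) != crc:
            break  # garbage tail
        records.append(payload)
        pos = end
    return records, pos
```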
> haha, that’s funny. i regularly use non-CouchDB databases and I get along great with
> all the people from other databases at conferences.
Oh, I tend to bite people when I find out they use another database. Maybe I should stop that?
> even if i did feel like we were competing, i wouldn’t care. this post really is about
> reliability issues i don’t think your users are fully aware of and i honestly hope that
> you fix.
You must be thrilled to learn that single server durability is coming. I look forward to a followup post extolling MongoDB’s virtues this fall.
> I responded earlier to the complexity of actually keeping something available that
> depends on [multiple servers]. so i won’t cover it again.
Yes, it is a difficult, but not unsolvable, problem. Mongo’s made a bunch of tradeoffs in the awesome vs. easy to program area. For instance, remember last year when CouchDB was saying MongoDB sucked because of its lack of concurrency? That it was too complicated to do concurrency in C++ and that Erlang was the way? Well, now Mongo has concurrency, so on to the next “must have” thing.
> You make it sound like this is all just a matter of bugs, it’s not, and i find blaming
> it on users who don’t use JIRA or get on IRC a little distasteful.
People discuss everything from bugs to architecture to lunch on our various forums. I was trying to say, possibly badly, that we have a lot of ways for people with questions, problems, and suggestions to reach out.
Other than the methods I outlined, I’m not sure how people with suggestions could reach the developers, short of telepathy.
Also, the user you cite is far from typical. It sucks that some people don’t like Mongo, but there are a lot more posts out there from people who do: http://codeascraft.etsy.com/2010/07/03/mongodb-at-etsy-part-..., http://blog.eventbrite.com/guest-post-why-you-should-track-p..., http://blog.wordnik.com/what-has-technology-done-for-words-l..., http://www.engineyard.com/blog/2009/mongodb-a-light-in-the-d... and so on.