How can this be? If no single server guarantees durability of the writes, how can a cluster of those machines suddenly make those writes durable? Maybe it's a pedantic argument, but it seems to me that, semantically speaking, you are simply relying on luck that your replicated nodes don't become corrupt or die for some systemic reason.
The fact that there is no replayable, append-only transaction log says to me that no matter what you build on top of it, it will never, by definition, be durable: the whole cannot be greater than the sum of its parts in this case.
I think the word "durable" implies that the data is written onto a disk. I think of what they are talking about as "redundant". Redundant not-necessarily-durable data.
I'd argue that durability is a sliding scale. You have to figure out how much risk you're willing to take and you cannot have a perfectly durable system.
Thinking about durability this way may work great for MongoDB, but it isn't how durability is framed in the rest of the database world.
B) Durability is an important goal at a system-wide level, but that doesn't mean it needs to be handled at the database layer. In addition to the already mentioned replication and transaction log methods, it can also be handled at the block or fs layer using snapshots, or by admins using backup tools. It can even be handled by having a different Database of Record and using Mongo as an operational store. Mongo as software is agnostic; we provide the tools, but it is up to the user or admin to make the best decisions for their technical and business interests. If another layer of the stack provides sufficient protection against data loss, it is unnecessary to pay the performance costs associated with doing it in the DB layer.
We've always clearly said don't build a bond trading system with it. Our philosophy is one-size-fits-all is over; use the right tool for the right problem.
Based on how I like to define the term, there is nothing in the NoSQL space that does full "ACID", including complex transactional semantics involving many objects, on multi-server clusters. That is ok: the perf + scale problem isn't really solvable if you don't give on something.
So if you're running master/slave and choose N to be 1, you're durable (at the two server level). Run 5-replica sets and choose N to be 3 or 4, and you're basically guaranteed durability. One of your nodes goes down hard and bad things happen? You have four additional clones of it sitting and waiting to be copied.
This is actually more durable in cases of extreme hardware failure like, say, your RAID controller going out. However, it requires that you spend more on hardware. So do several other things Mongo does, like using a lot of RAM and disk to get faster writes, so that's in keeping with a lot of their other design decisions.
It's not perfect for every project, but it's a great choice for most of the same projects where you'd use Mongo in the first place.
Basically: (a) for real single server durability you need to turn off hardware buffering or have a battery-backed RAID controller to ensure your writes really hit disk; (b) this won't help you if your disks fail, and this failure mode is as likely as any other; and (c) for some applications, the delay required to replay a transaction log is unacceptable--you need 100% uptime.
I am thinking though, that the fact that MongoDB writes to files at all is somewhat misleading, and that they may as well say that all your data is loaded into virtual memory. (Since they make no guarantees that the database files will be consistent except in the case of a controlled shutdown. Neither does MySQL, they point out, but I think in practice MySQL database files will be easier to recover from, since their structure is presumably more regular.)
In our experience, MongoDB single-node recovery is very robust. It makes no guarantees of transactional integrity _between_ objects, but in our experience the individual objects have always been recovered intact. According to the MongoDB documentation, recovery will occasionally fail to rebuild objects that span disk pages which were flushed in an inconvenient order, but this will not prevent it from recovering other objects.
So even though MongoDB doesn't have single node durability, and you _really_ ought to run it in replicated mode, it actually manages to have a robust recovery tool.
Just saying that you might want to test it in a way that simulates an overwhelming workload before you trust it.
I really don't understand the problem. No, MongoDB isn't durable the same way MyISAM/InnoDB is with MySQL.
But I think that is clear as day, certainly no news to me when choosing MongoDB.
Been running it in production for 3 months, working fine.
If it for some reason would be data loss or corruption, I wouldn't come crying to the devs. I know what risks I took when making the decision.
If the backups aren't there, I can only blame myself when such an event happens.
Full disclosure: I work for 10gen.
You strategically posted this when my air conditioning was broken, so here are a few thoughts before I go find somewhere cooler. Since CouchDB is "not a competitor" to MongoDB, it's nice of you to put all this time into a public service.
> MongoDB, <b>by default</b>, doesn’t actually have a response for writes.
Whoopsy, got your emphasis wrong there. We did this to make MongoDB look good in stupid benchmarks (http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs...).
Seriously, though, this "unchecked" type of write is just supposed to be for stuff like analytics or sensor data, when you're getting a zillion a second and don't really care some get lost if the server crashes. <b>You can do an insert that not only waits for a database response, but waits for N slaves (user configurable) to have replicated that insert.</b> Note that this is very similar to Cassandra's "write to multiple nodes" promise. You can also fsync after every write.
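The acknowledgment levels described above can be sketched as a toy model. Everything here is illustrative: `ToyReplicaSet` is a made-up name and this is in no way MongoDB's actual implementation, only the shape of the contract (w=0 is fire-and-forget, w=1 waits for the primary, w=N blocks until N nodes hold the write):

```python
import threading

class ToyReplicaSet:
    """Toy sketch of write-concern semantics: w=0 is fire-and-forget,
    w=1 waits for the primary, w=N blocks until N nodes have the write.
    Illustrative only -- not how MongoDB is actually implemented."""

    def __init__(self, n_replicas):
        self.primary = {}
        self.replicas = [dict() for _ in range(n_replicas)]
        self._lock = threading.Lock()

    def insert(self, key, value, w=0):
        self.primary[key] = value          # primary applies immediately
        if w <= 1:
            return w == 1                  # w=0: no confirmation at all
        acked = threading.Event()
        acks = [1]                         # the primary counts as one ack
        def replicate(replica):
            replica[key] = value
            with self._lock:
                acks[0] += 1
                if acks[0] >= w:
                    acked.set()
        for r in self.replicas:
            threading.Thread(target=replicate, args=(r,)).start()
        acked.wait()                       # block until w nodes have it
        return True
```

The point of the sketch is the blocking semantics: with w=3, `insert` does not return until at least two replicas besides the primary have applied the write.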
> MongoDB writes to a mem-mapped file and lets the kernel fsync it whenever
> the kernel feels like it.
fsyncs are configurable. You can fsync once a second, never, or after every single insert, remove, and update if you wish.
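For reference, the flush interval is a startup option; a sketch follows (the paths are made up, and defaults have changed across releases, so check your version's docs):

```shell
# Flush the memory-mapped files every second instead of the default
# 60-second interval:
mongod --dbpath /data/db --syncdelay 1

# From the mongo shell, a caller can also demand an fsync for the
# write it just issued:
# db.runCommand({ getlasterror: 1, fsync: true })
```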
> When you look at MongoDB more critically I don’t see how you could actually
> justify using it for anything resembling the traditional role of a database.
This is because you assume you'll run it on single server. MongoDB's documentation clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.
Also, as another commenter mentioned, full single-server durability is scheduled for the fall.
> Stories like this (http://www.korokithakis.net/node/119) are dubious not
> because they expose a few bugs in MongoDB but because they show inherent
> architectural problems you cannot overcome long term without something
Stories "like this" show that MongoDB doesn't work for everyone, particularly people who give no specifics about their architecture, setup, what happened, or anything else. Isn't it irritating how people will write, "MongoDB lost my data" or "CouchDB is really slow" and provide no specifics?
That's not to say that things never go wrong, MongoDB is definitely not perfect and has lots of room for improvement. I hope that users with questions and problems will contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog post). Anything to contact the community and let us try to help. I wish every person who tried MongoDB had a great experience with it.
Lots of users, hopefully most, love MongoDB and are using it happily and successfully in production.
I'm using MongoDB in production and I've looked at your documentation a bunch. After spending days with your docs open in a browser tab I can't say that it was especially clear on this point.
Perhaps I'm particularly ignorant, but I'd wager that not many other developers know that multiple MongoDB servers are currently required to achieve reasonably acceptable durability.
We're also using it single-server, but only for analytics where loss of a small amount of data isn't a big deal. I've made it very clear to all and sundry that if we're going to put any data that has more value into it, we first need at least one additional server instance.
Having said that, I must ask. Do the "unchecked" writes, configurable fsyncs and multi-node writes exist in the currently stable version of MongoDB or are these features still in alpha or beta?
While I do disagree with the article, he clearly pointed at the current version of Mongo.
They're all available in stable! This might be a documentation fail. It is on the wiki, but suggestions are welcome if you looked at page X and didn't see it.
He did go into several specifics as to why MongoDB might lose your data. Nice jab at CouchDB though!
He was talking about that post http://www.korokithakis.net/node/119 that doesn't go into any specifics at all.
> Nice jab at CouchDB though!
But couchdb is slow though. I, like many, switched to mongodb because couchdb was just too slow, and when I asked in IRC how to make it faster I was told to run a cluster of couchdb, so, not too different from mongodb ;)
There was also an Erlang configuration change which allowed us to make use of a thread pool for disk-io so that fsyncs didn't block other activity.
Really it was a bunch of tiny things that all added up. The catalyst was the creation and use of a few highly-concurrent benchmark suites that we could use to identify bottlenecks.
for some reason wordpress wanted me to moderate your post so sorry for the delay in it showing up.
>> Whoopsy, got your emphasis wrong there ….. Seriously, though, this “unchecked” type
>> of write is just supposed to be for stuff like analytics or sensor data, when you’re getting
>> a zillion a second and don’t really care some get lost if the server crashes.
did the default change? the last time i attempted a concurrent performance test this was one of the barriers i hit. my issue isn't that you include this feature, it's that it's the default; i certainly believe there is a use case for it, i just think it's harmful as a default.
>> Since CouchDB is “not a competitor” to MongoDB, it’s nice of you to put all this time
>> into a public service.
haha, that’s funny. i regularly use non-CouchDB databases and I get along great with all the people from other databases at conferences. even if i did feel like we were competing, i wouldn’t care. this post really is about reliability issues i don’t think your users are fully aware of and i honestly hope that you fix.
>> fsyncs are configurable. You can fsync once a second, never, or after every single insert,
>> remove, and update if you wish.
that’s really good to hear. have you optimized for a “group commit” yet?
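For readers unfamiliar with the term: group commit batches the log writes of concurrent transactions so that many commits share a single fsync. A minimal sketch of the idea, with a simulated fsync (`GroupCommitLog` is a made-up name; this is not MongoDB's or CouchDB's implementation):

```python
import threading

class GroupCommitLog:
    """Toy sketch of group commit: writers that arrive while a flush is
    pending share a single (simulated) fsync. Illustrative only."""

    def __init__(self):
        self._lock = threading.Lock()
        self.pending = []       # records waiting for the next flush
        self.durable = []       # records that have "hit disk"
        self.fsync_count = 0    # one increment per simulated fsync

    def append(self, record):
        done = threading.Event()
        with self._lock:
            self.pending.append((record, done))
            is_leader = len(self.pending) == 1
        if not is_leader:
            done.wait()         # another writer's fsync covers this record
            return
        # The first writer in a batch flushes for everyone who piled on.
        with self._lock:
            batch, self.pending = self.pending, []
            self.durable.extend(rec for rec, _ in batch)
            self.fsync_count += 1
        for _, ev in batch:
            ev.set()
```

Under concurrency, several `append` calls complete per fsync, which is where the throughput win comes from.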
>> This is because you assume you’ll run it on single server. MongoDB’s documentation
>> clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.
I responded earlier to the complexity of actually keeping something available that depends on this. so i won’t cover it again.
>> That’s not to say that things never go wrong, MongoDB is definitely not perfect and has
>> lots of room for improvement. I hope that users with questions and problems will
>> contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog
>> post). Anything to contact the community and let us try to help. I wish every person
>> who tried MongoDB had a great experience with it.
You make it sound like this is all just a matter of bugs; it's not, and i find blaming it on users who don't use JIRA or get on IRC a little distasteful.
these issues are architectural and until you do something append-only they aren’t going to go away. someone mentioned earlier that you plan to do an append-only transaction log, if that’s accurate then it’s fantastic news.
> haha, that’s funny. i regularly use non-CouchDB databases and I get along great with
> all the people from other databases at conferences.
Oh, I tend to bite people when I find out they use another database. Maybe I should stop that?
> even if i did feel like we were competing, i wouldn’t care. this post really is about
> reliability issues i don’t think your users are fully aware of and i honestly hope that
> you fix.
You must be thrilled to learn that single server durability is coming. I look forward to a followup post extolling MongoDB’s virtues this fall.
> I responded earlier to the complexity of actually keeping something available that
> depends on [multiple servers]. so i won’t cover it again.
Yes, it is a difficult, but not unsolvable, problem. Mongo’s made a bunch of tradeoffs in the awesome vs. easy to program area. For instance, remember last year when CouchDB was saying MongoDB sucked because of its lack of concurrency? That it was too complicated to do concurrency in C++ and that Erlang was the way? Well, now Mongo has concurrency, so on to the next “must have” thing.
>You make it sounds like this is all just a matter of bugs, it’s not, and i find blaming
> it on users who don’t use JIRA or get on IRC a little distasteful.
People discuss everything from bugs to architecture to lunch on our various forums. I was trying to say, possibly badly, that we have a lot of ways for people with questions, problems, and suggestions to reach out.
Eliminating the methods I outlined, I’m not sure how people with suggestions could reach the developers, other than telepathy.
Also, the user you cite is far from typical. It sucks that some people don't like Mongo, but there are a lot more posts out there from those who do: http://codeascraft.etsy.com/2010/07/03/mongodb-at-etsy-part-..., http://blog.eventbrite.com/guest-post-why-you-should-track-p..., http://blog.wordnik.com/what-has-technology-done-for-words-l..., http://www.engineyard.com/blog/2009/mongodb-a-light-in-the-d... and so on.
I do agree durability is important, but as long as you are aware of this behavior and you consider it in your design, you may still find scenarios where the gained speed is a good trade off.
Another aspect that tends to be forgotten when speaking about MongoDB is its fire-and-forget API. Combined with automatic collection creation and nonexistent data validation, this can lead to "interesting" results.
However--I do think the decision to make writes "fire and forget" is just a mistake. If you use abstractions like a connection pool under heavy concurrency, you can get unpredictable behavior in terms of when the data is "actually there."
For example, all in one thread:
with connection pool: do_write operation A
do other things...
with connection pool: read something, assuming A has been applied
Specifically, the fact that you get an arbitrary connection out of the pool means you cannot be sure that the database has completely processed operation A before executing your new query.
MongoDB has a "safe=" flag in their Python bindings that implements the project's official (http://www.mongodb.org/display/DOCS/Last+Error+Commands#Last...) recommendation for "it is written" consistency. It's a bit of a hack, but it calls "getLastError()" on the connection that does the write before returning from the update()/save()/insert()/delete() call. I think it's astonishing this behavior isn't the default.
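The same-connection requirement is the crux. A toy model of why (the class name and wire behavior here are made up; real drivers speak the MongoDB wire protocol, but the per-connection ordering guarantee is the same idea):

```python
import threading, queue, time

class FakeConnection:
    """Toy model: each connection queues ops that the 'server' applies
    asynchronously. Not the real MongoDB wire protocol."""

    def __init__(self, store):
        self._store = store
        self._ops = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            key, value = self._ops.get()
            time.sleep(0.01)            # the server gets to it eventually
            self._store[key] = value
            self._ops.task_done()

    def insert(self, key, value):
        self._ops.put((key, value))     # fire and forget: returns at once

    def get_last_error(self):
        # Only waits for ops sent on THIS connection -- which is exactly
        # why safe=True must call it on the connection that did the write.
        self._ops.join()
```

Calling `get_last_error()` on a different connection pulled from a pool proves nothing about a write issued on the first one; calling it on the writing connection does.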
In fact, I recently updated diesel's bindings to be safe=True, so getLastError() is always executed on write operations to make sure they succeeded before the call returns.
Repairing a corrupted database, even a relational database, often takes too long for a production app.
Also, most people have backups so its not really "running naked". See http://www.mongodb.org/display/DOCS/Backups for a few ways to backup mongodb. With LVM/EBS/ZFS or any other snapshotable filesystem, backups can be done almost instantly. With EBS you can even get an insta-slave from the snapshot.
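For instance, a filesystem-snapshot backup might look like this (the volume names are made up; the lock/unlock commands are the ones documented for this era, but verify them against your version's docs):

```shell
# Flush dirty pages and block writers so the snapshot is consistent:
mongo --eval 'db.runCommand({ fsync: 1, lock: 1 })'

# Snapshot the logical volume holding the data files (near-instant):
lvcreate --snapshot --size 1G --name mongo-snap /dev/vg0/mongo-data

# Release the write lock:
mongo --eval 'db.$cmd.sys.unlock.findOne()'
```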
the suggested default step 1 for MongoDB is to acquire 3 servers?
i mean, no other database suggests such a large default configuration. even knowing that their datacenter can get hit by lightning, a lot of large production sites don't run with this kind of redundancy.
this seems like a pretty taxing workaround for not keeping an append-only transaction log.
That said, for most users two servers are fine. Other users don't need any replicas at all since they do a nightly dump from a stored source into mongodb, or they just take regular backups. There are many ways to achieve system-wide durability, not all of them require the database to be durable.
At my previous job, we suffered a catastrophic RAID failure. The controller died while trying repair a degraded array. At this point, we valued the data enough that we weren't going to screw around with replacing the controller and restarting the recovery. The whole RAID array went out to DriveSavers, at the cost of several thousand dollars. Now, DriveSavers are really awesome folks, but my goal is to never do business with them again. :-)
After this incident, our policy was simple: If our data mattered, it had to be replicated.
A 3 server configuration is, admittedly, pretty high end: Even if one node is down, you still have redundancy. You can keep the remaining 2 nodes live while rebuilding the failed node.
I know lots of people who care _deeply_ about data integrity, but who keep their databases on a single server. I find this a bit mystifying: Even the most expensive hardware can die in ugly and unrecoverable ways. And RAID arrays are some of the biggest culprits: Their striping formats are usually undocumented and proprietary.
if you want to see a message that is clearly delivered about (lack of) durability look at memcached. nobody misunderstands memcached's durability/consistency guarantees.
-Mikeal (after a few drinks)
Also, I think I've always mentioned it in my talks.
I'm not sure how VoltDB escaped your wrath, it's an in-memory db that claims to be durable.
I don't know about their marketing, but it's easy to find on the web site.