

MongoDB Performance & Durability - ericflo
http://www.mikealrogers.com/2010/07/mongodb-performance-durability/

======
kristina
I've tried to post this as a comment on the blog, but it's not showing up
(moderated?):

-----------

Full disclosure: I work for 10gen.

You strategically posted this when my air conditioning was broken, so here are
a few thoughts before I go find somewhere cooler. Since CouchDB is "not a
competitor" to MongoDB, it's nice of you to put all this time into a public
service.

> MongoDB, <b>by default</b>, doesn’t actually have a response for writes.

Whoopsy, got your emphasis wrong there. We did this to make MongoDB look good
in stupid benchmarks
(<http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs-mongodb-benchmark/>).

Seriously, though, this "unchecked" type of write is just supposed to be for
stuff like analytics or sensor data, when you're getting a zillion a second
and don't really care some get lost if the server crashes. <b>You can do an
insert that not only waits for a database response, but waits for N slaves
(user configurable) to have replicated that insert.</b> Note that this is very
similar to Cassandra's "write to multiple nodes" promise. You can also fsync
after every write.
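As a toy illustration of that wait-for-N idea, here is a pure-Python sketch. Everything in it is invented for illustration (it is not the real MongoDB driver API); it just models a write call that refuses to return until `w` copies of the document exist:

```python
# Toy model of a replicated write that blocks until w replicas acknowledge it.
# Invented sketch of the concept only -- not MongoDB's actual driver API.
def replicated_insert(replicas, doc, w):
    """Send doc to every replica; return once at least w have applied it."""
    acked = 0
    for replica in replicas:
        replica.append(doc)  # a real driver would send this over the network
        acked += 1
        if acked >= w:
            break  # enough copies exist; remaining replicas catch up later
    if acked < w:
        raise RuntimeError("fewer than w replicas acknowledged the write")
    return acked

primary, slave_a, slave_b = [], [], []
n = replicated_insert([primary, slave_a, slave_b], {"_id": 1}, w=2)
# n == 2: the call returned only after two copies of the document existed
```

The "unchecked" analytics-style write is the opposite end of the same dial: w is effectively zero and the call returns before any copy is confirmed.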

> MongoDB writes to a mem-mapped file and lets the kernel fsync it whenever
> the kernel feels like it.

fsyncs are configurable. You can fsync once a second, never, or after every
single insert, remove, and update if you wish.
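For reference, the knobs being described looked roughly like this at the time (flag and command names as I recall the docs of this era; paths are illustrative, so treat this as a sketch rather than a definitive reference):

```shell
# flush the memory-mapped files every second instead of the default 60
mongod --dbpath /data/db --syncdelay 1

# or force a flush on demand from a shell/driver:
#   db.runCommand({fsync: 1})
# or request one as part of acknowledging a single write:
#   db.runCommand({getlasterror: 1, fsync: true})
```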

> When you look at MongoDB more critically I don’t see how you could actually
> justify using it for anything resembling the traditional role of a
> database.

This is because you assume you'll run it on a single server. MongoDB's
documentation clearly, repeatedly, and earnestly tells people to run MongoDB
on multiple servers.

Also, as another commenter mentioned, full single-server durability is
scheduled for the fall.

> Stories like this (<http://www.korokithakis.net/node/119>) are dubious not
> because they expose a few bugs in MongoDB but because they show inherent
> architectural problems you cannot overcome long term without something
> append-only.

Stories "like this" show that MongoDB doesn't work for everyone, particularly
people who give no specifics about their architecture, setup, what happened,
or anything else. Isn't it irritating how people will write, "MongoDB lost my
data" or "CouchDB is really slow" and provide no specifics?

That's not to say that things never go wrong; MongoDB is definitely not
perfect and has lots of room for improvement. I hope that users with questions
and problems will contact us on the list, our wiki, the bug tracker, or IRC
(or, heck, write a snarky blog post). Anything to contact the community and
let us try to help. I wish every person who tried MongoDB had a great
experience with it.

Lots of users, hopefully most, love MongoDB and are using it happily and
successfully in production.

~~~
cbryan
"This is because you assume you'll run it on single server. MongoDB's
documentation clearly, repeatedly, and earnestly tells people to run MongoDB
on multiple servers."

Err, what?

I'm using MongoDB in production and I've looked at your documentation a bunch.
After spending days with your docs open in a browser tab I can't say that it
was especially clear on this point.

Perhaps I'm particularly ignorant, but I'd wager that not many other
developers know that multiple MongoDB servers are currently required to
achieve reasonably acceptable durability.

~~~
kristina
We really, really want people to know they should run on multiple servers. Do
you have any suggestions on making it clearer? Where did you look for
information about running it in production (so I can add stuff about multiple
servers to that page)?

~~~
cbryan
Thanks for being open to suggestions. Maybe on the documentation homepage?
<http://www.mongodb.org/display/DOCS/Home>

~~~
kristina
I'm reluctant to put such technical info on the documentation homepage, but
I've updated it to very clearly point people to Production Notes. How does it
look now?

------
gfodor
Ok, so the standard response seems to be "single server durability" isn't
supported, but replication makes up for that.

How can this be? If no single server guarantees durability of the writes, how
can a cluster of those types of machines suddenly cause those writes to be
durable? Maybe it's a pedantic argument, but it seems to me that, semantically
speaking, you are simply relying upon luck that your replicated nodes don't
become corrupt or die for some systemic reason.

The fact that there is not a replayable, append-only transaction log says to
me that no matter what you build on top of it, it will by definition never be
durable because the whole cannot be greater than the sum of its parts in this
case.

~~~
wmf
They should say that replication makes up for it _under the assumption of
uncorrelated failures_. As you point out, some people may agree with that
assumption and some may not.

~~~
po
You're right. I've had a whole datacenter lose power. I am definitely not
comfortable with assuming that multiple servers somehow magically makes it
safe.

I think the word "durable" implies that the data is written onto a disk. I
think of what they are talking about as "redundant". Redundant,
not-necessarily-durable data.

~~~
kristina
Writing to disk and transaction logs are nice, but they aren't magic bullets.
What if a data center catches fire? More mundanely, I've heard that ~6% of
hard drives fail per year. Only replication can help you there.

I'd argue that durability is a sliding scale. You have to figure out how much
risk you're willing to take and you cannot have a perfectly durable system.

~~~
mikealrogers
traditionally durability isn't considered a sliding scale, it's a
goal/priority which requires you to implement multiple features and fallbacks
to handle everything from invalid writes, crash during write, to the data
center catching on fire.

thinking about durability this way may work great for MongoDB but it isn't how
durability is framed in the rest of the database world.

~~~
mathias_10gen
A) Just because something is 'traditionally' done doesn't mean it's mandatory.
Databases 'traditionally' spoke SQL but I don't see you dinging anyone for
breaking that tradition. You've used the Appeal to Tradition fallacy (look it
up on Wikipedia) many times, and it adds nothing to your argument.

B) Durability is an important goal at a system-wide level, but that doesn't
mean it needs to be handled at the database layer. In addition to the already
mentioned replication and transaction log methods, it can also be handled at
the block or fs layer using snapshots, or by admins using backup tools. It can
even be handled by having a different Database of Record and using Mongo as an
operational store. Mongo as software is agnostic; we provide the tools, but it
is up to the user or admin to make the best decisions for their technical and
business interests. If another layer of the stack provides sufficient
protection against data loss, it is unnecessary to pay performance costs
associated with doing it in the DB layer.

------
dm_mongodb
Single server durability is coming, v1.8 in the fall. We chose to prioritize
some other things before that. <http://jira.mongodb.org/browse/SERVER-980>

We've always clearly said don't build a bond trading system with it. Our
philosophy is one-size-fits-all is over; use the right tool for the right
problem.

Based on how I like to define the term, there is nothing in the NoSQL space
that does full "ACID", including complex transactional semantics involving
many objects, on many server clusters. That is ok: the perf + scale problem
isn't really solvable if you don't give up something.

~~~
ericflo
If MongoDB doesn't have single server durability, what kind of durability does
it have?

~~~
angelbob
Clustered durability. You can request, per-write, "don't return until this
data has been persisted to N other servers," with N chosen per-write.

So if you're running master/slave and choose N to be 1, you're durable (at the
two server level). Run 5-replica sets and choose N to be 3 or 4, and you're
basically guaranteed durability. One of your nodes goes down hard and bad
things happen? You have four additional clones of it sitting and waiting to be
copied.

This is actually _more_ durable in cases of extreme hardware failure like,
say, your RAID controller going out. However, it requires that you spend more
on hardware. So do several other things Mongo does, like using a lot of RAM
and disk to get faster writes, so that's in keeping with a lot of their other
design decisions.

It's not perfect for every project, but it's a great choice for most of the
same projects where you'd use Mongo in the first place.

------
mjs
There's a blog post where the MongoDB people explain why they haven't
prioritised single server durability:

<http://blog.mongodb.org/post/381927266/what-about-durability>

Basically: (a) for real single server durability you need to turn off hardware
buffering or have a battery-backed RAID controller to ensure your write really
hit disk; (b) this won't help you if your disks fail, and this failure mode is
as likely as any other; and (c) for some applications, the delay required to
replay a transaction log is unacceptable--you need 100% uptime.

I am thinking, though, that the fact that MongoDB writes to files at all is
somewhat misleading, and that they may as well say that all your data is
loaded into virtual memory. (Since they make no guarantees that the database
files will be consistent except in the case of a controlled shutdown. Neither
does MySQL, they point out, but I think in practice MySQL database files will
be easier to recover from, since their structure is presumably more regular.)

~~~
ekidd
We run MongoDB on development machines, and they're frequently shut down
unexpectedly. But some of our development databases take an hour or two to
regenerate from scratch, so we prefer to run recovery.

In our experience, MongoDB single-node recovery is very robust. It makes no
guarantees of transactional integrity _between_ objects, but in our experience
the individual objects have always been recovered intact. According to the
MongoDB documentation, recovery will occasionally fail to rebuild objects that
span disk pages which were flushed in an inconvenient order, but this will not
prevent it from recovering other objects.

So even though MongoDB doesn't have single node durability, and you _really_
ought to run it in replicated mode, it actually manages to have a robust
recovery tool.
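For a single node, the recovery being described here was typically invoked by hand in this era; a sketch of the usual procedure (paths are illustrative and flag names are as I recall them from the docs, so verify against your version):

```shell
# after an unclean shutdown, mongod refuses to start while the lock file
# from the crashed process is still present; remove it and run a repair pass
rm /data/db/mongod.lock
mongod --dbpath /data/db --repair

# once the repair pass completes, start the server normally
mongod --dbpath /data/db
```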

~~~
po
Just a thought here… there might be a difference between a development box
that's mostly idle shutting down unexpectedly and a production server that's
under heavy load, maybe starting to fall behind on some writes, freezing and
crashing.

Just saying that you might want to test it in a way that simulates an
overwhelming workload before you trust it.

------
cgbystrom
Here we go again. Getting a bit tiresome hearing about this "durability
issue".

I really don't understand the problem. No, MongoDB isn't durable the same way
MyISAM/InnoDB is with MySQL.

But I think that is clear as day, certainly no news to me when choosing
MongoDB.

Been running it in production for 3 months, working fine. If for some reason
there were data loss or corruption, I wouldn't come crying to the devs. I
know what risks I took when making the decision.

If the backups aren't there, I can only blame myself when such an event
happens.

------
cloudkj
Actually, I feel that these are some of the reasons that I find MongoDB to be
an attractive solution for certain purposes. You can give up some facets of
durability and make up for it in ease and performance. MongoDB actually fits
quite well into various web apps that have a need for a secondary, non-
authoritative datastore that needs to be highly available but not as reliable.
Any kind of real-time stream feature would probably benefit from this. Just
keep an authoritative copy of your data in your relational datastore of your
choice, and asynchronously replicate data as needed to MongoDB. Since it's a
non-authoritative datastore, you can denormalize it as much as you need. Works
pretty well, IMHO.

------
alexpopescu
Posted about this months ago:
<http://nosql.mypopescu.com/post/392868405/mongodb-durability-a-tradeoff-to-be-aware-of>

I do agree durability is important, but as long as you are aware of this
behavior and you consider it in your design, you may still find scenarios
where the gained speed is a good trade off.

Another aspect that tends to be forgotten when speaking about MongoDB is its
fire-and-forget API. Combined with automatic collection creation and
nonexistent data validation, this may lead to "interesting" results.

~~~
jamwt
I'm a big fan of MongoDB, and I think its _replicated_ durability
characteristics are good enough for many classes of applications. If you could
lose a minute of data and have it just be "bad" instead of "customer enraging
and business threatening", then it's a very nice database system.

However--I do think the decision to make writes "fire and forget" is just a
mistake. If you use abstractions like a connection pool under heavy
concurrency, you can get unpredictable behavior in terms of when the data is
"actually there."

For example, all in one thread:

    with connection pool: do_write operation A

    do other things...

    with connection pool: read something, assuming A has been applied

Specifically, the fact that you get an arbitrary connection out of the pool
means you cannot be sure that the database has completely processed operation
A before executing your new query.
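A toy simulation of that hazard (the classes here are invented for illustration; no real driver or server is involved): each pooled connection carries its own queue of fire-and-forget writes, so a read issued on a different connection can miss a write until the first connection's queue drains.

```python
class FakeConnection:
    """One pooled connection with its own queue of fire-and-forget writes."""
    def __init__(self):
        self.pending = []  # writes sent but not yet applied by the server

    def write(self, doc, store):
        # fire-and-forget: the call returns before the server applies it
        self.pending.append((doc, store))

    def flush(self):
        # the server eventually drains this connection's queue
        for doc, store in self.pending:
            store.append(doc)
        self.pending = []

    def read(self, store):
        # a query waits only for *this* connection's earlier operations
        self.flush()
        return list(store)

store = []                        # the shared database state
conn_a, conn_b = FakeConnection(), FakeConnection()
conn_a.write({"x": 1}, store)     # write goes out on connection A
missing = conn_b.read(store)      # read on connection B: write not visible yet
conn_a.flush()                    # once A's queue drains...
present = conn_b.read(store)      # ...the same read now sees the document
```

The fix described below amounts to forcing that flush on the same connection as the write (getLastError) before the write call returns, which is exactly what the `safe=` flag does.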

MongoDB has a "safe=" flag in their Python bindings that implements the
project's official
(<http://www.mongodb.org/display/DOCS/Last+Error+Commands#LastErrorCommands-UseCases>)
recommendation for "it is written" consistency. It's a bit of a
hack, but it calls "getLastError()" on the connection that does the write
before returning from the update()/save()/insert()/delete() call. I think it's
astonishing this behavior isn't the default.

In fact, I recently updated diesel's bindings to be safe=True, so
getLastError() is always executed on write operations to make sure they
succeeded before the call returns:

<http://github.com/jamwt/diesel/commit/95cd71d82ffc5c308060b651b6d1056dd1908b45>

~~~
ericflo
It's not really about losing a minute of data though--the whole thing could
become corrupted, and repairing it will be extremely difficult.

~~~
kristina
...which is why you have a slave. Master corrupt? Promote the slave to master,
get it its own slave.

Repairing a corrupted database, even a relational database, often takes too
long for a production app.

~~~
janl
So you run naked during the time of the master rebuild? — The only sensible
solution is to run two slaves at least, IMHO. — I haven't looked, but do you
promote that?

~~~
mathias_10gen
Actually, that will be a recommended config for replica sets. Most of our
slides already show 3 replicas per set.

Also, most people have backups so it's not really "running naked". See
<http://www.mongodb.org/display/DOCS/Backups> for a few ways to backup
mongodb. With LVM/EBS/ZFS or any other snapshotable filesystem, backups can be
done almost instantly. With EBS you can even get an insta-slave from the
snapshot.

~~~
mikealrogers
wait, seriously?

the suggested default step 1 for MongoDB is to acquire 3 servers?

i mean, no other database suggests such a huge default configuration. even
knowing that their datacenter can get hit by lightning and all, a lot of large
production sites don't even run with this kind of redundancy.

this seems like a pretty taxing workaround for not keeping an append-only
transaction log.

~~~
mathias_10gen
No, it is _a_ suggested configuration, not _the_ suggested one. And
recommending three nodes is not that uncommon for distributed systems, because
you can't have a quorum with only two nodes. Some users are more concerned with
handling 10s or 100s of servers than having single-server durability.

That said, for most users two servers are fine. Other users don't need any
replicas at all since they do a nightly dump from a stored source into
mongodb, or they just take regular backups. There are many ways to achieve
system-wide durability, not all of them require the database to be durable.

------
mathias_10gen
Since Mikeal decided to take down the comments I've archived what I had in an
open tab and put it up on s3: <http://bit.ly/diuWH8>. If anyone has a newer
snapshot please let me know and I'll update.

------
forsaken
Scary.

~~~
angelbob
Eh. Yes and no. The Mongo guys are very, very upfront about this. Check their
web site. There are good workarounds, they just require spending a bit more on
hardware.

~~~
mikealrogers
that's kind of like saying "we told you our parachutes won't open sometimes.
well, not in our conference talks or in our marketing but it's on our wiki and
we blogged about it".

if you want to see a message that is clearly delivered about (lack of)
durability look at memcached. nobody misunderstands memcached's
durability/consistency guarantees.

-Mikeal (after a few drinks)

~~~
kristina
Actually, it's more like saying, "we HIGHLY recommend having a backup
parachute" (as, I'd imagine, most skydiving instructors would).

Also, I think I've always mentioned it in my talks.

I'm not sure how VoltDB escaped your wrath, it's an in-memory db that claims
to be durable.

