
MongoDB Durability: A Tradeoff to Be Aware Of - alexpopescu
http://nosql.mypopescu.com/post/392868405/mongodb-durability-a-tradeoff-to-be-aware-of
======
dmytton
This post doesn't really add anything to the original post from the MongoDB
developers other than "replicate for durability".

However, the point about "you can only have a replica pair" forgets to mention
some key aspects:

- MongoDB supports both master/slave and master/master replication in
addition to replica pairs. The difference is that the replica pair setup
does automatic failover: if there is a communication failure where A was
the master and B was the slave, and now only B can be reached, B will
assume master status automatically. In contrast, the master/slave setup
requires manual failover.

- In the event of a network partition (given in the post as an example of
something MongoDB does not handle), you are supposed to use an arbitrator
server, which will have the final say over which instance assumes master
status.

The documentation is clear on this -
<http://www.mongodb.org/display/DOCS/Replication>

~~~
alexpopescu
I don't think that either master/slave or the limited master/master
replication brings anything to the durability discussion. Both setups will
behave exactly the same as the replica pair. The arbitrator server is an
interesting option, but it has to deal with network partitions and not
durability. Such an arbitrator is not really useful for the replica pairs as
these are self-managing the master-slave status.

~~~
andrewtj
_Such an arbitrator is not really useful for the replica pairs as these are
self-managing the master-slave status._

From the Mongo documentation:

 _The arbiter is used in some situations to determine which side of a pair is
master. In the event of a network partition (left and right are both up, but
can't communicate) whoever can talk to the arbiter becomes master._

Seems to me that would be useful in ensuring a durable system.

~~~
alexpopescu
It is my understanding that the arbiter is useful for the replica pair
coordination, but that is not directly related to durability (i.e. the
guarantee that once the user has been notified of success, the transaction
will persist, and not be undone.)

~~~
andrewtj
I think the arbiter is quite relevant since a split-brain makes durability
impossible.

~~~
alexpopescu
Would you mind expanding on this? I thought the role of the arbiter is to
decide who's the master at a given point in time. And while this is helpful
for the whole coordination process (by the way, I think a similar effect
could be achieved with smart clients), it impacts availability and not
durability per se.

~~~
andrewtj
Describing the arbiter's role as being to "decide who's the master" is an
over-simplification. It's there to achieve quorum without which conflicting
updates can occur.
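The quorum point can be made concrete with a toy model (this is not MongoDB code, just an illustration of the election rules under discussion): a two-node pair that self-manages failover promotes whichever side loses sight of its peer, so a partition produces two masters, while a three-member quorum rule lets only the side that reaches the arbiter win.

```python
# Toy model of split-brain vs. quorum (illustrative only, not MongoDB code).

def pair_promotes(peer_reachable):
    # Replica-pair rule without an arbiter: if I can't see my peer,
    # I assume it is down and take over as master.
    return not peer_reachable

def quorum_promotes(reachable, members):
    # With an arbiter the cluster has 3 members; become master only if a
    # strict majority is reachable (self included).
    return len(reachable & members) > len(members) // 2

# Network partition: A and B are both up but cannot talk to each other.
# Without an arbiter, each side promotes itself -> two masters, and
# conflicting updates can be accepted on both sides.
assert pair_promotes(peer_reachable=False)  # A becomes master
assert pair_promotes(peer_reachable=False)  # ...and so does B

# With an arbiter that only A can reach, only A achieves quorum (2 of 3).
members = {"A", "B", "arbiter"}
assert quorum_promotes({"A", "arbiter"}, members)  # A stays master
assert not quorum_promotes({"B"}, members)         # B cannot promote itself
```

With only two voting members neither side can ever distinguish "my peer died" from "the network broke", which is why the pair needs either the unsafe self-promotion rule or a third vote.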

------
JulianMorrison
MongoDB also uses update-in-place with mmap rather than versioned data, which
means that a kick-out-the-plug error is very likely to trash data even if you
get the database to successfully repair into a usable state. IOW, they're a
bit iffy on the C as well as the D of ACID. As a trade-off for this, you get
speed and non-garbage-creating updates if all you're doing is changing an
existing value.

~~~
alexpopescu
I think you are correct about the C and D. Considering that replication is
currently asynchronous, it is clear that we cannot really talk about C.

~~~
JulianMorrison
Async replication can be consistent if it preserves transaction ordering and
boundaries. In MongoDB that boundary is around each single update/insert
operation. So async replication isn't where the consistency is lost - the
problem is that transaction boundaries are only weakly respected _within one
operation_. The old data is not preserved while its new value is being
written. Catastrophic shutdown could leave a record half-altered.
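The half-altered-record scenario is easy to demonstrate in miniature (a sketch of the general update-in-place hazard, not of MongoDB's internals): overwriting a value in place and "crashing" partway through leaves bytes on disk that are neither the old value nor the new one.

```python
# Toy demonstration of the update-in-place hazard (not MongoDB internals).
import os
import tempfile

def update_in_place(path, offset, new_bytes, crash_after=None):
    """Overwrite bytes at `offset`; stop after `crash_after` bytes to
    simulate power loss mid-write."""
    with open(path, "r+b") as f:
        f.seek(offset)
        written = new_bytes if crash_after is None else new_bytes[:crash_after]
        f.write(written)

path = os.path.join(tempfile.mkdtemp(), "record")
with open(path, "wb") as f:
    f.write(b"balance=100")  # the old record

# Power is cut 9 bytes into writing the new record b"balance=950".
update_in_place(path, 0, b"balance=950", crash_after=9)

with open(path, "rb") as f:
    data = f.read()
print(data)  # b'balance=900' - neither the old 100 nor the new 950
```

A versioned (copy-on-write) or journaled store avoids this by never destroying the old value until the new one is completely on disk.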

------
AdamN
Agreed, this post doesn't really say anything or point out what the tradeoffs
really are between durability, scalability and performance. Battery-Backed
RAID has nothing to do with the scalability/durability tradeoff of a sharded
system for instance.

~~~
alexpopescu
The tradeoff is giving up durability for performance. MongoDB operates under
the assumption that durability is not critical and so does not synchronously
write your data to disk. That allows it to perform at the speed of, say,
memcached. For example, CouchDB, which is another NoSQL solution in the same
document-store space, has durable writes (in fact it is ACID). You'll see a
lot of benchmarks out there comparing these two (or MongoDB with MySQL) with
the emphasis on the speed of MongoDB, while it would be fairer to benchmark
MongoDB against memcached.
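The performance/durability tradeoff described above comes down to whether a write is forced to disk before success is reported. A generic sketch (not MongoDB's or CouchDB's implementation) of the two ends of that spectrum:

```python
# Generic sketch of the tradeoff (not any particular database's code).
import os
import tempfile

def fast_write(path, data):
    # Returns as soon as the OS accepts the bytes; they may sit in the
    # page cache and be lost on a power cut. This is the "memcached-speed"
    # end of the spectrum.
    with open(path, "ab") as f:
        f.write(data)

def durable_write(path, data):
    # Forces the bytes to the disk before returning, so an acknowledged
    # write survives a crash - at the cost of waiting on the disk.
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

path = os.path.join(tempfile.mkdtemp(), "log")
fast_write(path, b"a=1\n")
durable_write(path, b"a=2\n")
```

On spinning disks an fsync per write can be orders of magnitude slower than a buffered write, which is exactly the gap those benchmarks are measuring.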

