
Broken by Design: MongoDB Fault Tolerance - angersock
http://hackingdistributed.com/2013/01/29/mongo-ft/
======
notacoward
Nearly a year old, submitted four times already. Here's the one that actually
had some interesting comments.

[https://news.ycombinator.com/item?id=5159723](https://news.ycombinator.com/item?id=5159723)

~~~
angersock
I wonder if the submission detector is broken--second issue I've had with this
in recent memory.

EDIT: Does HN use Mongo to keep track of duplicate submissions? :P

~~~
ChuckMcM
I'm guessing that the submission bots are getting better at working around the
submission detector.

------
ar7hur
Best part from the post:

> _I know that the brogrammers out there are constantly getting texts from
> their buddies to plan the weekend's broactivities, trying to decide in
> whose mancave they'll be setting up their lan party, and are thoroughly
> distracted in between futzing with their smart phones and writing a few
> lines of code per day by cutting and pasting it from stackoverflow._

I love that!

~~~
Jare
With the psychotic amount of crap he wrote in this post, and that Twitter
handle, I can only assume he was just mad he never gets invited.

~~~
emin-gun-sirer
That blog post met with a lot of responses of the form "tl;dr but mongo was
broken and it's fixed now, nothing to see here." Wasn't fixed then, doesn't
seem to be fixed now [1]. And it's quite difficult to have a conversation with
people who can write but not read.

TL;DR Not surprised that a brogrammer response to all this would be "you
jelly."

[1] [http://aphyr.com/posts/284-call-me-maybe-mongodb](http://aphyr.com/posts/284-call-me-maybe-mongodb)

------
octix
I'm confused, the article gives Cassandra as an example of better data
handling, but according to
[http://wiki.apache.org/cassandra/ArchitectureOverview](http://wiki.apache.org/cassandra/ArchitectureOverview)
and
[http://www.datastax.com/documentation/cassandra/2.0/webhelp/...](http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/dml/dml_config_consistency_c.html)
you have eventual consistency and you get to tune consistency level... which
setting is the same, but behaves better than mongo's?

PS: I'm not a fan, but I like mongo for what it gives, and I'm hoping it will
just get better.
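
For readers unfamiliar with what "tuning the consistency level" buys you, here is a minimal sketch of the usual quorum argument: with N replicas, a read of R replicas is guaranteed to see a write acknowledged by W replicas only when R + W > N. The function name is made up for illustration, and the brute-force check models only set overlap, not Cassandra's or Mongo's actual behavior.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check that every w-replica write set shares at least one
    replica with every r-replica read set, out of n replicas total."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    for r, w in [(1, 1), (2, 2), (1, 3)]:
        print(f"N=3 R={r} W={w} overlap guaranteed: {quorums_always_overlap(3, r, w)}")
```

The brute-force result agrees with the R + W > N rule; it says nothing about which system handles failures better, which is the part the thread is arguing about.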

------
pbbakkum
Since the article is a resubmission, I hope you all will excuse a comment
resubmission:

This is the 3rd or 4th time I've seen this article in the past few days so
I've decided to post my take. I work with Mongo in a production scenario but
I'm hesitant to post because these things tend to turn into a pointless
argument. So let me stress: this is not an attack on the blog post, I'm hoping
to improve the discussion here. Mongo has some real problems, these mentioned
are not the problems. Here goes, written along with the article's segments.

\- "It lies." Mongo used to have a default where a driver would fire off a
write and not check that it succeeded with the server. This was very obviously
a decision made to improve benchmark performance, though I imagine a benchmark
with only default settings would be rather naive. Regardless, yes, the default
was a stupid corporate decision but it is well known and should be apparent if
you're deploying a Mongo cluster. Additionally, as the author notes, this
default has changed and this entire point is no longer a concern.

\- "It's Slow." A real point he raises is that it's a little wonky that you need
to send a separate message for getLastError. I suspect this is an artifact of
Mongo's historical internal structure:
[http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wi...](http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wi..).
If you look at the wire protocol, I think it is designed such that only the
OP_QUERY and OP_GET_MORE message types get an OP_REPLY back. getLastError is a
command, and commands are run through an OP_QUERY message. He notes that using
this check affects performance. It does, but let's dig into this: "Using this
call requires doubling the cost of every write operation." If the author
benchmarked this, I suspect he would find that it vastly more than doubles the
latency of a single write operation from the client's perspective. My
understanding is that when performing this kind of safe write, the driver sends
an OP_INSERT, for which it doesn't have to wait for a reply, then immediately
sends an OP_QUERY message (getLastError), on which it hangs waiting for the
OP_REPLY. In other words, this is now slower because we've created a
synchronous operation immediately after just firing off the insert command.
Again, it's a little wonky that we send Mongo two messages, but one is
immediately after the other, and that is vastly overshadowed by the fact that
we now have to wait for a reply. I believe 1 synchronous send and receive is
unavoidable to ensure a safe write in ANY system, and the argument about
sending 2 messages really boils down to sending about 200 bytes over a socket
vs 400 bytes; I personally don't worry about it.
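
To put rough numbers on the argument above, here is a toy latency model. Every constant and function name is an assumption chosen for illustration (nothing here is measured against MongoDB or a real driver, and server processing time is ignored). It compares a fire-and-forget insert, an insert followed by a blocking getLastError-style query, and a hypothetical protocol where the insert itself is acknowledged.

```python
# Toy latency model; all numbers and names are illustrative assumptions,
# not measurements of MongoDB or any real driver.

ONE_WAY_MS = 0.5   # assumed one-way network latency
SEND_MS = 0.01     # assumed cost of putting one small message on the socket


def unacknowledged_write_ms() -> float:
    """Fire-and-forget insert: the client only pays to send one message."""
    return SEND_MS


def acknowledged_write_ms() -> float:
    """Insert followed at once by a getLastError-style query on the same socket.

    Two messages go out back to back, then the client blocks for one full
    round trip until the reply arrives.
    """
    return 2 * SEND_MS + 2 * ONE_WAY_MS


def single_message_ack_write_ms() -> float:
    """Hypothetical protocol in which the insert message itself gets a reply."""
    return SEND_MS + 2 * ONE_WAY_MS


if __name__ == "__main__":
    print(f"unacknowledged:        {unacknowledged_write_ms():.2f} ms")
    print(f"insert + getLastError: {acknowledged_write_ms():.2f} ms")
    print(f"single acked message:  {single_message_ack_write_ms():.2f} ms")
```

Under these assumed numbers the acknowledged write is roughly a hundred times slower than the unacknowledged one, while the extra message accounts for only one additional `SEND_MS`; the round trip dominates.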

\- "It doesn't work pipelined / it doesn't work multithreaded." He is also
missing the real complexity here, and this is where I have a problem with this
blog post: this is not a theoretical discussion, and if its content is true
then it should just be proven rather than FUD launched into the world. As
noted in the docs
([http://docs.mongodb.org/manual/reference/command/getLastErro...](http://docs.mongodb.org/manual/reference/command/getLastErro..)),
getLastError applies for the socket over which the command was run, so it's
up to the driver to execute the getLastError on the same socket as the write,
which is an implementation detail and a solved problem. The way the drivers do
this in practice is you set the write concern and the driver takes care of the
rest. If you run getLastError manually then it depends on the driver, but for
Java the correct procedure is addressed at
[http://docs.mongodb.org/ecosystem/drivers/java-concurrency/](http://docs.mongodb.org/ecosystem/drivers/java-concurrency/).
So for the fastest possible safe performance, you multithread (which is
effectively pipelining from the server's perspective) and run operations with
the driver's thread-safe connection pool. Suffice to say people actually use
these drivers in a multithreaded context in the real world, and they work.
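
The "same socket" requirement described above can be sketched with a toy pool. `ToyConnection`, `ToyPool`, and the rule that a document without `_id` fails are all hypothetical stand-ins, not any real driver's API. The point is only that a write and its error check stay correct under threads if one connection is checked out for the whole pair.

```python
import queue
import threading


class ToyConnection:
    """Stand-in for one socket; the 'server' keeps last-error state per connection."""

    def __init__(self) -> None:
        self.last_error = None

    def insert(self, doc: dict) -> None:
        # Hypothetical failure rule for the sketch: a document needs "_id".
        self.last_error = None if "_id" in doc else "missing _id"

    def get_last_error(self):
        return self.last_error


class ToyPool:
    """Hands out connections one at a time; Queue makes checkout thread-safe."""

    def __init__(self, size: int) -> None:
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(ToyConnection())

    def safe_insert(self, doc: dict):
        """Run the write and its error check on the same checked-out connection."""
        conn = self._free.get()          # blocks until a connection is free
        try:
            conn.insert(doc)
            return conn.get_last_error()
        finally:
            self._free.put(conn)


def run_workers(pool: ToyPool, docs: list) -> dict:
    """Insert each document from its own thread and collect the reported errors."""
    results = {}

    def work(index: int, doc: dict) -> None:
        results[index] = pool.safe_insert(doc)

    threads = [threading.Thread(target=work, args=(i, d)) for i, d in enumerate(docs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Whether a given real driver actually keeps the pair on one socket in every code path is exactly what is disputed in this thread; the sketch shows the intended discipline, not proof that any driver follows it.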

\- "WriteConcerns are broken." There are several relevant settings here,
including write concern acknowledgement and fsync, and the author is confused
about how these map to the Java driver WriteConcern enum values. The
acknowledgement setting (elegantly stored in a variable named "w", thanks
10gen) is the number of replicas that must confirm the write before the driver
believes it has succeeded. I personally set this to 1, but you could
potentially wait for the entire replica set to acknowledge. The fsync setting
controls whether this acknowledgement means that the server on these machines
has only completed the write in memory, or has actually synced the data to
disk. I set this to false for performance. There is an excellent StackOverflow
answer on Java Mongo driver configuration at
[http://stackoverflow.com/questions/6520439/how-to-configure-...](http://stackoverflow.com/questions/6520439/how-to-configure-..).
The author also spends time noting that if you only ensure the write succeeds
on a single machine, and you irreparably lose that machine before replication,
then the data is lost. This is obviously true for every distributed system.
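
The effect of the "w" setting described above can be sketched with a toy replica set. The class and its methods are hypothetical, replication is modeled as a synchronous copy, and fsync is not modeled at all. It only illustrates why a write acknowledged with w=1 can vanish if that one machine is lost before replication, while w=2 survives the same failure.

```python
class ToyReplicaSet:
    """Toy model: member 0 is the primary, the rest are secondaries."""

    def __init__(self, members: int) -> None:
        self.logs = [[] for _ in range(members)]
        self.alive = [True] * members

    def write(self, value, w: int) -> bool:
        """Apply to the primary, then copy to secondaries until w members hold it.

        Returns True once at least w members have the value. Copies beyond w
        are deliberately skipped to model "lost before replication".
        """
        if not self.alive[0]:
            return False
        self.logs[0].append(value)
        acked = 1
        for i in range(1, len(self.logs)):
            if acked >= w:
                break
            if self.alive[i]:
                self.logs[i].append(value)
                acked += 1
        return acked >= w

    def lose_primary(self) -> None:
        """Irreparably lose the primary and everything only it had stored."""
        self.alive[0] = False
        self.logs[0] = []

    def surviving_values(self) -> set:
        return {v for i, log in enumerate(self.logs) if self.alive[i] for v in log}


if __name__ == "__main__":
    for w in (1, 2):
        rs = ToyReplicaSet(3)
        acked = rs.write("order-42", w=w)
        rs.lose_primary()
        print(f"w={w} acked={acked} survived={'order-42' in rs.surviving_values()}")
```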

I've used a 4-shard Mongo in a production multithreaded environment that
handled several hundred million writes. For part of this period our cluster
was extremely unstable because of a serious data corrupting Mongo bug (more on
this in a second). I haven't done a full audit but based on our logs, in about
6 months I've seen exactly 1 write go missing (which I believe was in the
rollback log), so I'm personally not concerned about the things mentioned in
the blog post. I've also been happy with performance as long as the data size
comes under the memory limit. If your data exceeds memory, Mongo essentially
falls on its face, though this is hard to avoid when a query requires disk
accesses.

Mongo is not without its problems, however. As I mentioned, QA is a real
concern: we hit a subtle bug when we upgraded to v2.2 that caused data
corruption when profiling was turned on and the cluster was under high load.
It was very difficult to debug, and basically should have been caught by 10gen
before their release.

Another serious problem is that sharding configuration is still somewhat
immature, and it seems like every new release is described as "well it used to
be bad, we finally fixed it". Here is an example: you pick a shard key that
can't be split into small enough chunks, and now shard balancing silently
fails. Ok, so you pick a better shard key, but you can't migrate shard keys,
so you have to drop the collection and start again. Except dropping a
collection distributed across a cluster is buggy, so you can't recreate the
collection with the same name and a different shard key. So you just pick a
new name for your collection, and you have this weird broken state for the
original one sitting around forever unless you completely blast your cluster
and start from scratch. This sort of thing is not fun!
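
The shard key problem above can be illustrated with a small check. The function name and the chunk-size limit are made up for the sketch, and it is not Mongo's balancer logic. It relies only on the idea that a chunk is a range of shard key values, so documents sharing one key value cannot be separated by a split.

```python
from collections import Counter


def unsplittable_keys(shard_key_values: list, max_chunk_docs: int) -> dict:
    """Return shard key values that alone hold more documents than one chunk may.

    Documents with an identical shard key value always land in the same chunk,
    so any value returned here marks a chunk that can never be split small enough.
    """
    counts = Counter(shard_key_values)
    return {key: n for key, n in counts.items() if n > max_chunk_docs}


if __name__ == "__main__":
    low_cardinality = ["US"] * 1000 + ["DE"] * 50    # e.g. a country field
    high_cardinality = list(range(1050))             # e.g. a unique user id
    print(unsplittable_keys(low_cardinality, max_chunk_docs=100))
    print(unsplittable_keys(high_cardinality, max_chunk_docs=100))
```

Running a check like this against a sample of real data before choosing a shard key is one way to avoid finding out after the collection is already sharded.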

Mongo has many pros and cons; personally I think its real advantage is
simplicity for developers, which makes it worth putting up with the other
stuff. Sorry for being long winded, hopefully this has been useful.

~~~
emin-gun-sirer
I'm the author of the blog post, and just saw this resubmission on HN. Just
wanted to quickly touch upon some of these points:

* "It lies." Mongo lied at the time I wrote the post, and while the default has been changed, the "this entire point is no longer a concern" seems overly optimistic. See here for further details where successful writes are lost: [http://aphyr.com/posts/284-call-me-maybe-mongodb](http://aphyr.com/posts/284-call-me-maybe-mongodb)

* "It's slow." We have indeed benchmarked Mongo's behavior, and, even though developers give up on consistency and fault-tolerance by using MongoDB, they don't get high performance in return for their tradeoff: [http://hyperdex.org/performance/](http://hyperdex.org/performance/)

* "It doesn't work pipelined / it doesn't work multithreaded." I don't quite see a correctness argument here. I read through the sources of the Java driver at the time I wrote the blog post. If you follow the pattern described in the link you provided (java-concurrency), you will find that thread A can issue getLastError and receive results for thread B's operations. That is broken. The word "thread-safe" does not mean what you think it means. "People use it without ill effects" isn't as strong an argument as "I read through the code and it looks broken." Further, when 10gen responded to this blog post, they were unable to refute the technical point: [http://hackingdistributed.com/2013/02/07/10gen-response/](http://hackingdistributed.com/2013/02/07/10gen-response/)

* "WriteConcerns are broken." Check out the Jepsen test, the last subsection: [http://aphyr.com/posts/284-call-me-maybe-mongodb](http://aphyr.com/posts/284-call-me-maybe-mongodb)
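
The hazard described in the pipelined/multithreaded point above can be replayed deterministically with a toy connection. This is a model of the claimed interleaving, with hypothetical names and a made-up failure rule, and it does not show what any real driver does. It only illustrates why per-connection error state plus a shared connection lets one thread read another thread's outcome.

```python
class SharedToyConnection:
    """Toy socket whose 'server' remembers only the most recent write's outcome."""

    def __init__(self) -> None:
        self.last_error = None

    def insert(self, doc: dict) -> None:
        # Hypothetical failure rule for the sketch: a document needs "_id".
        self.last_error = None if "_id" in doc else "missing _id"

    def get_last_error(self):
        return self.last_error


def interleaved_on_shared_connection():
    """Replay one unlucky schedule of two threads sharing a single connection."""
    conn = SharedToyConnection()
    conn.insert({"name": "from thread A"})  # thread A's write fails
    conn.insert({"_id": 1})                 # thread B's write succeeds, overwriting state
    return conn.get_last_error()            # thread A's check now reports B's outcome


def pinned_per_thread():
    """Same operations, but each thread keeps its own connection."""
    conn_a, conn_b = SharedToyConnection(), SharedToyConnection()
    conn_a.insert({"name": "from thread A"})
    conn_b.insert({"_id": 1})
    return conn_a.get_last_error()          # thread A sees its own failure
```

In the shared schedule thread A is told its failed write succeeded; with pinned connections it sees its own error.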

Agreed that Mongo's simplicity is a big draw for many developers new to NoSQL.
Sadly, the system provides very weak properties, and applications built on top
end up having even weaker properties still.

------
threeseed
This is from January and some of the issues have already been fixed.

What is the point in submitting this?

~~~
SideburnsOfDoom
Did you read this far?

> "So, chances that a Mongo fan can engage only his hind-brain, react with
> "LOL, tl;dr, fixed in v2.2 with a one-liner as described in Mongo manual"
> and actually have something meaningful to contribute are absolutely nil."

------
guard-of-terra
Please provide a better drop-in alternative or shut up seriously.

SQL is single master and is a pain in the ass for many applications. Cassandra
didn't happen to work for us. CouchWhatever failed to ever get off the ground.
Redis - can't say anything about it. That leaves us with what?

Yes, mongodb fails but only after it delivers. Other nosql databases don't get
into the delivery phase.

What's there to choose?

~~~
SideburnsOfDoom
> Please provide a better drop-in alternative or shut up seriously.

He mentioned several. He talked about why he prefers them.

~~~
guard-of-terra
Maybe he should have written a first article on how they are fantastic for his
real-life tasks instead of the 100th article on how mongodb is bad despite
actually being used.

All three of his recommendations strike me as key-value storage. Key-value
storage is both easy to write and not very useful in production. I don't know
how much meat they provide over mere key value, but allow me to be wary.

~~~
notacoward
None of the three are just key/value. Cassandra is a BigTable-like wide column
store, and Riak is a document store with some graph capabilities. HyperDex
seems closest to Redis with lists/maps/sets, except that it's distributed and
fully consistent (i.e. what Redis could dream of being some day). None of
those are easy to write, and many people already find them quite useful in
production. Stop being flat-out wrong about everything, and do some research.

