"Shortly before JSConf, I had personally spent some time finding out ways to demonstrate that MongoDB will lose writes in the face of failure, to be used in a competitive comparison. Let’s just say that I was successful in doing so, despite recent improvements that 10gen has made. Unfortunately, I am not at liberty to share the results, nor do I think it would be constructive to this discussion. "

Why not? Did the author at least contact 10gen with the test case?




Disclaimer: I'm the competition.

It's pretty easy to demonstrate data loss with MongoDB if you're doing replication, but it's the "normal" behaviour because MongoDB uses asynchronous replication and W=1 writes by default.

Set up an n=3 cluster and a client that writes data continuously, like:

    var i = 0;
    while (true) {
        db.test.insert({ _id: i, data: i });
        db.getLastError();  // round trip to the server, to make sure the client actually sent it
        print(i);
        i++;
    }

Now kill the master. Another node will become the new master. Issue some more writes to make sure the logs of the new and the old master diverge. Now bring back the old master, which will become a secondary, and it will say something along the lines of "finding common oplog point", and it will discard the writes that it had that were not copied to the other nodes before it was killed.
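
To reproduce this, kill -9 on the mongod process is closest to a real failure, but you can also force an election from the shell; a sketch:

    // run on the current master: it steps down and the remaining
    // nodes elect a new one (kill -9 simulates a harder failure)
    rs.stepDown()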

You can verify all this by looking at the i's that were acknowledged by the old master and printed by the client. The last couple of them will be gone for good.
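
A minimal way to check, assuming the writes went to db.test as above:

    // on the new master, once the old one has rejoined and rolled back:
    db.test.find().sort({ _id: -1 }).limit(1)  // highest surviving _id
    // every i the client printed beyond this was acknowledged, then lost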

If this is unacceptable to you, then you can run MongoDB in W=majority mode, but with MongoDB, W>=2 modes (so-called consistent replication modes) are very slow.
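
For reference, the per-write version looks something like this in the shell (write concerns are expressed through getLastError):

    db.test.insert({ _id: i, data: i });
    // don't treat the write as done until a majority of the set has it:
    db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 });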


As a single master system, MongoDB doesn't allow the data on different nodes to become inconsistent or go into conflict. The idea is that avoiding this prevents developers from having to worry about (and clean up) conflicting data from different nodes.

The description above leaves out an important part of the process that occurs in this situation, but it is documented: http://www.mongodb.org/display/DOCS/Replica+Sets+-+Rollbacks . Note that the rolled-back data is saved to BSON files (under a rollback/ directory in the data path), so it can be inspected with bsondump and applied again if so desired.

As mentioned here, if you don't like that behavior, you can use write concerns, and require W=2 (or more) and wait for the writes to be replicated. Of course, there's a performance cost to doing that, but you can choose.


Yes, but from an application perspective your database will be left in an inconsistent state. The next morning the ops guys will have to call the dev guys to "repair" the database by hand.

Truth is, ScalienDB in W=3 mode runs at about the speed of MongoDB in W=1 mode, so this is not a trade-off that customers have to make.


Does ScalienDB support WAN replication?


Not yet. ScalienDB uses synchronous replication, which only works inside a datacenter. Cross-datacenter replication is a completely different use case, coming in 2012.


I understand the difference. Just wanted to know whether to spend time evaluating it or not. :)


"If this is unacceptable to you, then you can run MongoDB with W=majority mode, but with MongoDB W>=2 modes (so-called consistent replication modes) are very slow."

Serious question: are the consensus modes slower than in other systems (e.g. HBase, Cassandra), or is this just a restatement of the fact that writing to N > 1 machines is inherently slower than writing to a single machine?


You have to be careful here, as MongoDB and Cassandra use different models of replication.

Cassandra does not perform replication/synchronization on a per command basis between the nodes. Roughly: the client writes to multiple nodes, which are mostly independent, so assuming client bandwidth is not the bottleneck, writing to W=2 nodes will not be much slower than W=1. In practice, since Cassandra's disk storage subsystem is also fast for writes, it's overall very fast at raw writes. (As in, fastest in my benchmarks.) The trade-off is that its replication model is eventual consistency, and reads are somewhat slowish. On the other hand, their model works well in a multi-datacenter environment (along with Riak).
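
A toy sketch of why the parallel fan-out matters (not Cassandra's actual API, names are made up): the coordinator sends the write to all N replicas at once and acknowledges the client after the first W responses, so W=2 costs roughly one round trip to the second-fastest replica, not two sequential hops:

    // hypothetical coordinator: fan out to all N replicas in parallel,
    // ack the client as soon as the first W replicas have responded
    function write(replicas, W, row, ackClient) {
        var acks = 0;
        replicas.forEach(function (r) {
            r.send(row, function onAck() {
                acks++;
                if (acks === W) ackClient();  // latency ~ W-th fastest replica
            });
        });
    }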

MongoDB uses an asynchronous replication model. What seems to happen if you specify W=2 is that the master doesn't ACK the write to the client until one of the slaves has copied it off the master. In my measurements W=2 ran at a fixed ~30 writes/sec on EC2, which means this mode may as well not be there. (This W=2 performance problem was also verified by customers looking at MongoDB.)

If you look at my company's product, ScalienDB, it uses a highly optimized synchronous replication model (Paxos) and a storage engine designed for that. It's actually faster running in W=3 mode than certain other NoSQLs in W=1 mode. My bet is that this is what enterprises are going to want if they're going to use a NoSQL as a primary-copy database.

(Test for yourself; all the products are open source, and it'll cost you less than $20 on AWS.)


1) The explanation of Cassandra isn't quite correct. See http://www.datastax.com/docs/1.0/cluster_architecture/replic... for details.

2) Cassandra read performance is on par with writes now: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-...

3) Your explanation doesn't make sense to me. No matter the value of W, MongoDB should make a best effort to get the write to all the replicas, no? So the choice of W should affect availability and perhaps latency, but throughput should be unaffected, given a benchmark with sufficient client threads.


My Cassandra benchmarks were performed a couple of months ago.

Turns out MongoDB doesn't scale well with the number of connections due to software issues (e.g. one thread per connection instead of async I/O). At about 500 connections, Mongo starts to break on the platform we tested.


I appreciated the tone and content of the article until this paragraph. Revealing the details of the flaw would allow MongoDB supporters to continue the discussion. I would like to know whether it stems from a design choice or is just a trivial bug.


That was what stuck out to me from this, too... sounded like "nyanyanyana, I found a critical bug in your product - but I won't tell you what it is"


Really? Maybe I'm just an optimist, but it sounded more like "Every product has its bugs, and there's no point in pointing fingers here when it's not the point of this blog post."

I'm sure that Riak has bugs too - there are very few software products that DON'T (the provably-correct C compiler CompCert[1] being one that shouldn't). But it seemed to me that the post's author was simply trying to avoid getting sidetracked.

[1] http://compcert.inria.fr/compcert-C.html


I read it with this tone also, but the "I'm not at liberty..." part is different from "I don't want to bash them here," and is a little troubling.


That sounded very wrong to me too. Rightfully or not, I immediately got the image of Basho using their "secret" bugs to demo a MongoDB fail to a big potential customer.


I will now take 30 seconds to continuously click my screen where the "upvote" button initially appeared.

A test case that confirms data loss would actually be one of the few things that COULD make this MongoDB discussion constructive.


WTF guys? It's not like it takes a genius to devise one, given how Mongo works. There's even one given above...



