

MongoDB and Riak, In Context (and an apology) - luigi
http://seancribbs.com/tech/2011/11/07/mongodb-and-riak-in-context-and-an-apology/

======
dabeeeenster
"Shortly before JSConf, I had personally spent some time finding out ways to
demonstrate that MongoDB will lose writes in the face of failure, to be used
in a competitive comparison. Let’s just say that I was successful in doing so,
despite recent improvements that 10gen has made. Unfortunately, I am not at
liberty to share the results, nor do I think it would be constructive to this
discussion. "

Why not? Did the author at least contact 10gen with the test case?

~~~
Maro
Disclaimer: I'm the competition.

It's pretty easy to demonstrate data loss with MongoDB if you're doing
replication, but it's the "normal" behaviour because MongoDB uses asynchronous
replication and W=1 writes by default.

Set up a n=3 cluster and a client that writes data continously, like:

i = 0; while (true) { write(_id:i, data:i); getlastresult(); /* to make sure
the client sends it */ printf(i); i++; }

Now kill the master. Another node will become the new master. Issue some more
writes to make sure the logs of the new and the old master diverge. Now bring
back the old master, which will become a secondary, and it will say something
along the lines of "finding common oplog point", and it will discard the
writes that it had that were not copied to the other nodes before it was
killed.

You can verify all this by looking at the i's that were acknowledged by the
old master and printed by the client. The last couple of them will be gone for
good.

If this is unacceptable to you, then you can run MongoDB with W=majority mode,
but with MongoDB W>=2 modes (so-called consistent replication modes) are very
slow.

~~~
cwestin63
As a single master system, MongoDB doesn't allow the data on different nodes
to become inconsistent or go into conflict. The idea is that avoiding this
prevents developers from having to worry about (and clean up) conflicting data
from different nodes.

The description above left out an important part of the process that occurs in
this situation, but it is documented:
<http://www.mongodb.org/display/DOCS/Replica+Sets+-+Rollbacks> . Note that the
data that has been rolled back is saved to a file so that it can be applied
again if so desired.

As mentioned here, if you don't like that behavior, you can use write
concerns, and require W=2 (or more) and wait for the writes to be replicated.
Of course, there's a performance cost to doing that, but you can choose.

~~~
Maro
Yes, but from an application perspective your database will be left in an
inconsistent state. The next morning the ops guys will have to call the dev
guys to "repair" the database by hand.

Truth is ScalienDB runs in W=3 mode at about the speed MongoDB in W=1 mode, so
this is not a trade-off that customers have to make.

~~~
foobarbazetc
Does ScalienDB support WAN replication?

~~~
Maro
Not yet. ScalienDB uses synchronous replication which works only works inside
the datacenter. Across datacenters is a completely different use-case coming
in 2012.

~~~
foobarbazetc
I understand the difference. Just wanted to know whether to spend time
evaluating it or not. :)

------
k_bx
I have strong knowledge and some good experience with MongoDB, and now, on my
new job, we will use Riak (it's a new project that just starts) as primary
server. And while user-friendliness of it is a bit horrifying sometimes (hope
I can help to change that) in comparision to MongoDB, Riak's real beauty is
it's architecture and main ideas.

And yes, these are just different databases. MongoDB is much more RDBMS-like
(in terms of consistency, B-Tree indexes and queries), while Riak is what you
think of when you think "auto-scalable". So now I love them both :-)

~~~
Maro
If you:

    
    
      1. evaluate the default settings with which MongoDB ships and
      2. evaluate MongoDB performance with more strict,
         consistent options and learn that it's quite slow and
      3. look at my comment above
    

... then you will quickly see that MongoDB is not RDBMS like at all in terms
of data safety and consistency _goals_.

Currently 10gen is shooting for light-weight web workloads to spread the
product, and they're very successful.

~~~
k_bx
Well, yes, it would be fair to say that when you need to use MongoDB instead
RDBMS you have to make safe=True commits all the time (otherwise you can
really lost piece of them). I didn't do any benchmarks, because for usage like
this I am completely ok with RDBMS, and use MongoDB in more "warehouse"-like
use-cases (statistics, caching, long-running m/r on data etc.)

So MongoDB gets it's own niche: use it for "not 100% important data", but
instead you get really high availibility of MongoDB cluster, really high speed
and document-storage.

What you said about speed on safe commits is really interesting for me. I
mean, of course they're slow consequentially, but in parallel they should be
just as fast. Or am I wrong here?

~~~
Maro
MongoDB has very nice tunability. The new version ships with journaling turned
on by default.

You can specify `safe`, which means the data will be written to the journal.
You can also specify `fsync`, which means additionally the system will issue
an fsync on the journal. Additionally, there's the question of the replication
protocol, `W` in MongoDB. What I said is that if you turn on `W=2` it will be
slow. You should also know that even with `safe` and `fsync` you may lose data
if running in `W=1` mode if the master goes down in your replicata set. See my
comment above and the reply from a MongoDB guy.

~~~
k_bx
Yes, I understood what you said about W=2 and about safe. Maybe what I wrote
wasn't too clear :-)

------
secoif
_"Have you tried out Riak 1.0? It’s awesome."_

There appears to be a bug in Riak 1.0, rendering it totally useless, at least
with the NodeJS Driver:

[https://github.com/frank06/riak-
js/issues/97#issuecomment-25...](https://github.com/frank06/riak-
js/issues/97#issuecomment-2571828)

tldr; Executing a save then remove then a getAll yields a 500 error.

...so we moved our app to MongoDB. Critical mass in the community and the
myriad of 3rd party drivers and tools that depend on each other means I'd
vouch this kind of issue has better chance of being fixed quickly if it
occurred in the NodeJS MongoDB native driver.

In our case, we are yet to have any specific needs of a NoSQL DB, just one
that works, and this experience gave me no confidence in the Riak ecosystem.
Sorry guys.

------
nphase
"Unfortunately, I am not at liberty to share the results, nor do I think it
would be constructive to this discussion."

\- I don't understand this mindset. Forgive me for being naive, but if we're
to "toast to the challenges and all become stronger, more proficient, and more
successful as a result," then why don't we share these edge cases and fix
them?

