

MongoDB vs. Clustrix: Fault Tolerance and Availability - sergei
http://sergeitsar.blogspot.com/2011/02/mongodb-vs-clustrix-comparison-part-2.html

======
meghan
I'd like to correct some factual errors from this article.

1) Failover of a MongoDB Replica Set is totally automated and requires no
manual intervention. The replica set remains available for writes as long as a
quorum can be established between remaining members. See
<http://www.mongodb.org/display/DOCS/Replica+Sets> for more info

2) MongoDB does support different consistency models through Write Concerns
and Safe Mode. The client can choose to wait for a write to be replicated to
multiple members before it is acknowledged. See
[http://www.mongodb.org/display/DOCS/Verifying+Propagation+of...](http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError)
for more info
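
For example, from the mongo shell (run against a live replica set; the
collection name is hypothetical, and w is the number of members that must
acknowledge the write):

```javascript
// Insert a document, then ask the server to confirm that it has
// replicated to at least 2 members, waiting at most 5 seconds.
db.things.insert({sku: "abc123", qty: 1});
db.runCommand({getLastError: 1, w: 2, wtimeout: 5000});
```

If the write fails to reach w members within wtimeout, the command returns an
error and the client can react.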

Disclaimer: I work for 10gen

~~~
sergei
1. Say I have a 2-node replica set. Now a replica dies, permanently. How is
the recovery automated? These are quotes directly from your docs:

[http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+R...](http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member)

"1. Delete all data. If you stop the failed mongod, delete all data, and
restart it, it will automatically resynchronize itself. Of course this may be
slow if the database is huge or the network slow.

2. Copy data from another member. You can copy all the data files from
another member of the set IF you have a snapshot of that member's data files.
This can be done in a number of ways. The simplest is to stop mongod on the
source member, copy all its files, and then restart mongod on both nodes. The
Mongo fsync and lock feature is another way to achieve this. On a slow
network, snapshotting all the datafiles from another (inactive) member to a
gzipped tarball is a good solution. Also similar strategies work well when
using SANs and services such as Amazon Elastic Block Store snapshots."

<http://www.mongodb.org/display/DOCS/fsync+Command> "Lock, Snapshot and Unlock

The fsync command supports a lock option that allows one to safely snapshot
the database's datafiles. While locked, all write operations are blocked,
although read operations are still allowed. After snapshotting, use the unlock
command to unlock the database and allow writes again."
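
In shell terms (a sketch of the 1.x-era API; requires a live mongod):

```javascript
// Flush and lock: blocks writes so the data files can be safely
// snapshotted (reads still succeed while the lock is held).
db.runCommand({fsync: 1, lock: 1});
// ... take the filesystem snapshot / copy the data files here ...
// Release the lock via the unlock pseudo-collection:
db.$cmd.sys.unlock.findOne();
```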

2. Really? Is this wrong then?

[http://www.mongodb.org/display/DOCS/Replica+Set+Design+Conce...](http://www.mongodb.org/display/DOCS/Replica+Set+Design+Concepts)

"Writes which are committed at the primary of the set may be visible before
the true cluster-wide commit has occurred. Thus we have "READ UNCOMMITTED"
read semantics. These more relaxed read semantics make theoretically
achievable performance and availability higher (for example we never have an
object locked in the server where the locking is dependent on network
performance)."

~~~
knbanker
1. You really need a minimum of three replica set nodes, one of which can be
a lightweight arbiter. If the primary fails, the secondary node will be
promoted to primary automatically. In the case of a network partition, the old
primary will come back up as a secondary with no problems. In the case of a
true hardware failure, you can resync very quickly from a snapshot. For extra
peace of mind, add more nodes to the replica set. You can have up to seven.
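
The arithmetic behind the three-node minimum is simple majority voting. A
minimal sketch (my own illustration, not MongoDB code): a primary can be
elected only if a strict majority of the configured members, arbiters
included, is reachable.

```javascript
// Can a primary be elected, given how many of the configured
// members (including arbiters) are currently reachable?
function canElectPrimary(totalMembers, reachableMembers) {
  return reachableMembers > totalMembers / 2;
}

// 2-node set: lose one member and no majority remains,
// so the survivor cannot become primary.
console.log(canElectPrimary(2, 1)); // false
// 3-node set (e.g. two data nodes + an arbiter): one
// failure still leaves a 2-of-3 majority.
console.log(canElectPrimary(3, 2)); // true
```

This is why a lightweight arbiter is enough: it adds a vote without storing
data.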

2. If you're reading from both primary and secondary nodes, then the view may
not be consistent. In most cases you simply read from the primary for fully-
consistent reads. You get to decide whether reads from secondaries are
consistent or not by setting the write concern (i.e., the minimum number of
nodes to replicate to before returning each write).
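
Reading from secondaries is opt-in in the shell (the collection name below is
hypothetical; requires a live replica set):

```javascript
// By default queries go to the primary; opt in to reading from
// secondaries, and thus to possibly-stale reads:
db.getMongo().setSlaveOk();
db.things.find({sku: "abc123"});
```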

~~~
sergei
1. Yes, I recognize that MongoDB will automatically fail over when we go from
N nodes in the set to N - 1. But how do I get back to N nodes? That's
completely manual.

2. What happens when I read an update that succeeded on the master but then
later fails on the slaves?

~~~
knbanker
1. It depends on how the node fails. If there's just a network partition,
then you still have N nodes, so no issues. If you're running with durability
enabled, and you experience, say, a power outage, then the member should
rejoin the set and resync with no issues. If a node's drive crashes, then
you'll need to restore from a recent snapshot (within a day or so) or perform
a complete resync if you don't have a snapshot. But this can all be done
without taking the replica set offline. In that last case, there is some
manual work involved. But your post, unless you've corrected it, implies that
replica set failover is completely manual. That's certainly not true.

2. Outside of some kind of hardware failure, you won't have situations where
writes succeed on the primary but fail on a secondary. And as I stated on your
blog post, if you're really concerned about it, you can specify a write
concern on insert, and if the write fails to replicate in the desired way,
you'll know about it.

~~~
sergei
Sorry, but "hardware failure" is a fault, and when you can't deal with it,
you're not tolerant. And with larger clusters, you see hardware faults on a
regular basis. So saying we're ok in the nominal mode is not fault tolerance.

------
j2d2j2d2
These posts are written by one of the Clustrix founders.

~~~
megaman821
Are you trying to imply that the post has wrong information because of this
fact? If so, attack the wrong data. I don't care who posts facts, as long as
they really are facts.

~~~
lucisferre
I haven't read their analysis yet (I will try to when I have some free time),
but in general, I would argue that trying to compare a document database to a
SQL one is always going to be somewhat misleading. I'd care more if they were
comparing Clustrix to MSSQL, MySQL, or PostgreSQL.

If you are using MongoDB in a way that is similar to the way you would have
used a SQL DB you are probably doing something wrong. Specifically, you are
trying to place normalized data in a database designed for denormalization.

~~~
codex
Sergei compares Clustrix to MongoDB because their target markets are very
similar--not because the technology is similar.

As a startup, it behooves them to attack the low-end database market, but I
suspect they've found that the primary market for a highly scalable low-end
database lies on the web, but that market has chosen to go cheap-and-dirty
with NoSQL. So now they're in the middle ground between fast-and-loose-
and-free and my-enterprise-uses-Oracle.

I think a lot of web development is of a highly speculative, winner-take-all
sort, so devs want to be as cheap as possible until they win the web lottery.
For all the flaws of NoSQL, software-only solutions do allow developers to
make very efficient use of their hardware by running multiple services on the
same machine, or run them in the cloud. Once they hit the jackpot, they can
afford to either go Oracle, hire software developers to work around
deficiencies in their data store (e.g. Facebook), or use a data store from
Amazon or Google or Microsoft.

That's a shame, because I think Clustrix is ultimately the right approach. The
web has a history of doing the shittiest-and-easiest thing first (ColdFusion,
anyone?) only to repent years later to the second-shittiest solution. Rinse,
repeat.

~~~
mjw0
Clustrix does give you the option to start with MySQL and then do a drop-in
upgrade when your idea gets traction.

~~~
praptak
That's a huge risk. What if the way I use MySQL does not go well with the way
Clustrix is supposed to scale?

