
Jepsen: MongoDB 3.6.4 - aphyr
http://jepsen.io/analyses/mongodb-3-6-4
======
nathan_long
> This interpretation hinges on interpreting successful sub-majority writes as
> not necessarily successful: rather, a successful response is merely a
> suggestion that the write has probably occurred, or might later occur, or
> perhaps will occur, be visible to some clients, then un-occur, or perhaps
> nothing will happen whatsoever.

> We note that this remains MongoDB’s default level of write safety.

This sounds pretty scary. How does it compare to other distributed dbs, like
Riak? My understanding is that Riak lets you specify how many nodes a write
must succeed on to be considered successful. Are its responses more reliable?
Is this just a "distributed computing is hard" situation?
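To make the Riak comparison concrete: the knob being described is a write quorum W out of N replicas, and MongoDB's equivalent is the write concern (`w: 1` by default in 3.6, `w: "majority"` for safety). A toy sketch of the idea (not actual Riak or MongoDB client code; names are made up for illustration):

```python
# Toy illustration of quorum-acknowledged writes: a write is reported
# successful only after W of N replicas acknowledge it. With a
# sub-majority W (like w=1), the client can see "success" even though
# a majority never stored the write, so a failover can roll it back.

def quorum_write(replica_acks, n, w):
    """Return True iff at least `w` of `n` replicas acknowledged."""
    assert 1 <= w <= n
    return sum(1 for ack in replica_acks if ack) >= w

# Three replicas; only the local node acknowledged before a partition.
acks = [True, False, False]

print(quorum_write(acks, n=3, w=1))  # True: "success" with w=1
print(quorum_write(acks, n=3, w=2))  # False: majority write correctly fails
```

The reliability question thus reduces to what W the client asked for: a majority-acknowledged write survives failover, a w=1 write may not.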

~~~
TheDong
> Is this just a "distributed computing is hard" situation?

I think this is "gaming performance benchmarks and being right is hard".

MongoDB has spent a lot of effort on making sure it looks good in benchmarks,
and that includes using worse defaults than any other distributed data-store I
know of.

As long as MongoDB continues trying to "win" performance benchmarks by such margins, it will struggle to provide a correct distributed system by default.

~~~
nathan_long
This isn't about distribution, but someone once wrote about getting PostgreSQL
upsert performance to be better than MongoDB's by disabling some of
PostgreSQL's safety features.

[https://markandruth.co.uk/2016/01/08/how-we-tweaked-postgres...](https://markandruth.co.uk/2016/01/08/how-we-tweaked-postgres-upsert-performance-to-be-2-3-faster-than-mongodb)

This made me laugh. Snarky interpretation: "yeah, we can go fast and sloppy,
too. We just usually don't."
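For reference, PostgreSQL does expose knobs in this spirit. A hypothetical fragment (not the linked post's exact tuning) relaxing durability per session:

```sql
-- Commits return before the WAL reaches disk. A crash can lose the most
-- recent transactions, but (unlike fsync = off) cannot corrupt the database.
SET synchronous_commit = off;
```

The asymmetry is the point: in Postgres you opt into the unsafe-but-fast mode explicitly; the article's complaint is that MongoDB ships a weak acknowledgment level as the default.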

------
rystsov
> Thus far, causal consistency has generally been limited to research projects
> ... MongoDB is one of the first commercial databases we know of which
> provides an implementation.

Cosmos DB has provided session consistency (which looks like another name for causal consistency) since at least 2014 [1].

Cosmos DB's session guarantees [2]: consistent prefix, monotonic reads,
monotonic writes, read-your-writes, write-follows-reads.

MongoDB's causal consistency guarantees [3]: monotonic reads, monotonic
writes, read-your-writes, write-follows-reads.

I doubt that, four years later, this still qualifies as "one of the first."

[1] [https://www.infoq.com/news/2014/08/microsoft-azure-documentd...](https://www.infoq.com/news/2014/08/microsoft-azure-documentdb)

[2] [https://docs.microsoft.com/en-us/azure/cosmos-db/consistency...](https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels)

[3] [https://docs.mongodb.com/manual/core/read-isolation-consiste...](https://docs.mongodb.com/manual/core/read-isolation-consistency-recency/)
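A toy sketch of how the session guarantees above (read-your-writes, monotonic reads) are typically enforced: the client carries a session token (a logical timestamp of its last write), and reads are only served by replicas that have caught up to that token. This is an illustration of the general mechanism, not Cosmos DB's or MongoDB's actual implementation:

```python
# Session-consistency sketch: a client remembers the logical timestamp
# of its last write; a replica that lags behind that timestamp refuses
# the read (a real system would wait or route elsewhere instead).

class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        """Replication delivers a write with a logical timestamp."""
        if version > self.version:
            self.version, self.value = version, value

    def read(self, session_token):
        """Serve a read only if caught up to the client's session."""
        if self.version < session_token:
            raise RuntimeError("replica is stale for this session")
        return self.value

primary, lagging = Replica(), Replica()
primary.apply(1, "x=1")       # the write has reached the primary only
session_token = 1             # client remembers its write's timestamp

print(primary.read(session_token))   # read-your-writes holds: 'x=1'
# lagging.read(session_token) would raise: the stale replica is skipped
```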

~~~
aphyr
Causal and session are definitely similar, but I'm not entirely sure if causal
implies consistent prefix, and conversely, I think causal miiight have
stronger implications than just the intersection of MR, MW, RYW, and WFR.
Because we weren't entirely certain whether we could make that claim regarding
Cosmos, we opted to be conservative.

~~~
rystsov
I agree; it's hard for me too to be precise about naming in the academic sense. But the published paper "Writes: the dirty secret of causal consistency" says that both Cosmos DB and MongoDB have causal consistency, so I don't know. At the least, Cosmos DB and MongoDB provide the same guarantees for session/causal.

------
robterrell
I thought "Jepsen.io" was just Kyle Kingsbury. Interesting that there's a new
author for this analysis. (It also might explain the lack of memes, which I
always liked.)

~~~
kitpatella
Hi, new author here. :) Kyle was clear that I could write it however I wanted
to, but I opted for the more formal tone used in recent analyses.

~~~
brian_herman__
Nice job!

------
avitzurel
Completely unrelated to the core of the article, but does anyone know which
program was used for the sketches/diagrams?

~~~
kitpatella
I used Procreate on an iPad Pro.

~~~
avitzurel
Thank you for both comments! I'll look into that app.

------
wiremine
This is off topic, and I might get downvoted, but I realize I'm just waiting
for the "MongoDB hate" comments to roll in... there seems to be not a lot of
love on HN for MongoDB.

I wonder what positive use cases people have used Mongo for? I've used it for
a few small/medium sized projects without problem myself.

~~~
wenc
Mongo works well as a straight-up JSON store for store-and-retrieve use cases
(with no analytics). It is horizontally scalable, avoids the overhead of
relational databases, has indexing capabilities, and provides a strong
consistency model. The big improvement came with WiredTiger, which addressed
many of the issues that plagued earlier versions of Mongo.

I've seen high-speed machine data stored in Mongo for logging and
visualization purposes. It's an improvement over writing CSV files to disk.

However, if you ever need to perform non-trivial analytics, Mongo's weaknesses
quickly become obvious. For machine learning, typically you would want to
first ETL the data into a dataframe-like structure (which is a structure
native to SQL databases).
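The ETL step described above is essentially flattening: nested JSON documents (as stored in Mongo) must be turned into flat rows and columns before dataframe or BI tooling can use them. A minimal stdlib sketch of that transformation (the document shape is made up for illustration):

```python
# Flatten a nested document into a single flat "row" whose keys are
# dot-separated column names -- the shape dataframe/BI tools expect.

def flatten(doc, prefix=""):
    """Flatten nested dicts into dot-separated column names."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row

doc = {"_id": 1, "sensor": {"name": "t1", "reading": {"temp": 21.5}}}
print(flatten(doc))
# {'_id': 1, 'sensor.name': 't1', 'sensor.reading.temp': 21.5}
```

Arrays, optional fields, and documents with divergent shapes make the real transformation much messier, which is the weakness being described.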

~~~
threeseed
Actually MongoDB is quite popular in the analytics space. It has a unique
trick with Spark/Hadoop where its data gets represented as a single wide
table. This allows you to use it as an analytical/ML feature store which is
not possible with anything other than Cassandra.

Also, I'm not sure where you got the idea that dataframes are unique to SQL
databases, because that's completely wrong. HBase and Cassandra were the
original big data databases and they aren't relational. And Spark can manifest
almost any database as a dataframe.

~~~
wenc
> not sure where you get the idea dataframes are unique to SQL databases

I'm not sure I said this.

> HBase and Cassandra were even the original big data databases and they
> aren't relational.

They also had trouble doing joins and many other query operations which are
common in analytics. Presto addresses this somewhat.

> And Spark can manifest almost any database as a dataframe.

Which entails a translation layer from whatever non-tabular form the data was
in (e.g. JSON) into a dataframe-like structure, rather than keeping it in its
native form, which reinforces my point. You still need to somehow transform
data into tabular form. (ETL is just a batch way of doing this transformation;
you can have live transformations of course, with accompanying overheads.)

BI tools also generally require data to be in tabular form, which entails the
use of a translation layer. The Mongo BI connector is one such translator.

> Actually MongoDB is quite popular in the analytics space.

I work in this space, interact regularly with vendors, and monitor the space
actively for strategic developments. This does not track with my observations.

