
MongoDB 3.4.0-rc3 - aphyr
https://jepsen.io/analyses/mongodb-3-4-0-rc3
======
jasondc
Bigger news is that Jepsen tests are now part of the MongoDB continuous
integration suite:
[https://evergreen.mongodb.com/build/mongodb_mongo_master_ubu...](https://evergreen.mongodb.com/build/mongodb_mongo_master_ubuntu1404_jepsen_bf4385aed5e528a8cf1edb7955c8c2164dda04f0_16_10_28_14_33_06)

Open and available for everyone to see, for every build of MongoDB. Is there
another database that has this much transparency for every build?

~~~
radicalbyte
Given their start point (a product unfit for public consumption) that is the
absolute minimum they need to do.

I at least will never trust Mongo for anything but a toy project. There are so
many better options out there, options whose technical capabilities are as
good as Mongo's marketing.

~~~
BillFinchDba
I think if you look back objectively, there are very few database platforms
that were absolutely "fit for public consumption" right out of the box. Look
at all the SQL Server shops out there (mine included) that won't even roll out
a new version of SQL Server until it hits SP 1 at a minimum... For MongoDB, if
you look forward based on what they are doing now rather than at how early
adopters may have had a sub-optimal experience way back when, you'll see a
mature product that is consistently improving and is demonstrably reliable.
Can you give an example of another option you are referring to?

~~~
josephg
Right out of the box? Mongodb has been trying to get it right for 10 years
now. Kyle says the storage engine they've used for most of that lifetime is
fundamentally flawed, and they've only now, a decade on, managed to write
something without known bugs to replace it. And maybe this time it's ok. Maybe
this time there aren't any more layers of buggy crap in mongo yet to be found
and fixed.

Maybe. But you'd have lost that bet if you made it any day in the last 10
years. And in those 10 years mongodb has demonstrated again and again that
they aren't up to the task of writing a reliable database. Even with their new
storage engine they couldn't find the bugs alone.

I think using mongo today for any mission critical data is an irresponsible
choice. I'd seriously question the judgement of any senior engineer who picks
it for a new project over rethinkdb or Postgres.

~~~
laichzeit0
Do you think MongoDB is a good choice (given how easy it is to use) when you
only care that 99.999% of your data that you insert should end up in the
database? That's my use case: best-effort integrity. I mostly just want a DB
that can insert and query documents fast, and I'm not really concerned if I
lose a few documents here and there.
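
If that's genuinely the trade-off you want, the knob in question is MongoDB's per-operation write concern. A minimal mongo-shell sketch (the `events` collection and document shape are made up for illustration):

```javascript
// Unacknowledged ("fire and forget"): fastest, but failed writes vanish silently.
db.events.insert({ ts: new Date(), payload: "..." }, { writeConcern: { w: 0 } });

// Acknowledged by the primary only: catches obvious errors, but can still lose
// writes on failover -- the class of bug the Jepsen analyses exercise.
db.events.insert({ ts: new Date(), payload: "..." }, { writeConcern: { w: 1 } });

// Majority-acknowledged and journaled: the durable end of the trade-off.
db.events.insert({ ts: new Date(), payload: "..." },
                 { writeConcern: { w: "majority", j: true } });
```

Whether `w: 0` or `w: 1` actually delivers 99.999% in practice depends on your failover rate, which is the point of contention in this thread.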

~~~
always_good
Why wouldn't you just use anything else that can manage to insert/read data
without losing it?

I don't really understand the angle of "can I get away with it anyways, tho?"

~~~
Kiro
Some of us are already using MongoDB and are not so keen on replacing it.

~~~
cies
If you read back, the discussion was scoped to "new projects". By josephg:

> I'd seriously question the judgement of any senior engineer who picks it for
> a new project over rethinkdb or Postgres.

------
tbrock
MongoDB receives a fair amount of criticism here but the company is a
fantastic place to work. I'm proud that I was able to learn and grow as a
developer alongside all of those who have been trying (and succeeding) to make
a great database.

The team at MongoDB really cares a lot about making the best database product
possible. I knew it when I was there and still think so after I've left.

~~~
bogomipz
>"MongoDB receives a fair amount of criticism here but the company is a
fantastic place to work"

None of the criticisms I have ever seen or heard about MongoDB were related to
Mongo Inc and their office culture but rather their product.

~~~
edgan
Actually, having been through their training and dealt with their consultants,
I'd say their company culture is the problem. They had a purely developer-
centric mindset and no operations mindset, so at the time they had no good way
to do backups. Their more modern solution is backups that they manage for you,
which is even crazier.

~~~
Fiahil
Yes. Mongo is an operational nightmare. I mean, who in their right mind would
use a product where the recommended solution[0] to resync a stale replica
(which happens all the time after a long netsplit) is to "perform an initial
sync"? Which, of course, means "please remove everything and type this
command". Crazy.

[0]: [https://docs.mongodb.com/manual/tutorial/resync-replica-set-...](https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/)

~~~
bogomipz
Indeed, and the only practical means of performing a compaction is to "rm -rf"
the data directory and let it resync from another replica set member. This is
not documented of course.

~~~
Fiahil
This is documented (okay, a little "hidden")! It's written in black and white
in their documentation (see the link above):

> A replica set member becomes “stale” when its replication process falls so
> far behind that the primary overwrites oplog entries the member has not yet
> replicated. The member cannot catch up and becomes “stale.” When this
> occurs, you must completely resynchronize the member by removing its data
> and performing an initial sync.

> MongoDB provides two options for performing an initial sync:

> Restart the mongod with an empty data directory and let MongoDB’s normal
> initial syncing feature restore the data

> Restart the machine with a copy of a recent data directory from another
> member in the replica set.

Note: the second option is not a real option when you're dealing with a 700GB
database. By the time you finish the copy, the member will have fallen too far
behind the oplog anyway, making all these steps completely pointless.

And that's why it's so bad. They even acknowledge the "correct" solution is to
rm your data and resync.
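
For reference, the "remove everything and resync" procedure being criticized boils down to something like the following (paths are illustrative; the OS-level steps are comments because they happen outside the shell):

```javascript
// On the stale member's host, outside the mongo shell:
//   1. stop the mongod process
//   2. move the data directory aside, e.g. mv /var/lib/mongodb /var/lib/mongodb.bak
//      (or rm -rf it, once you have a backup elsewhere)
//   3. restart mongod with the same configuration; with an empty dbpath it
//      performs an initial sync from another replica set member
// Then, from a mongo shell on any member, watch the resync progress:
rs.status().members.forEach(function (m) {
  // STARTUP2 means the member is still initial-syncing; SECONDARY means done.
  print(m.name + " : " + m.stateStr);
});
```

On a 700GB data set, step 3 is exactly the multi-hour window of reduced redundancy the parent is complaining about.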

~~~
bogomipz
It's documented for the "stale replica" use case, but I was referring to the
use case where you want a compaction to reclaim disk space. For that they
recommend the db.repairDatabase() option, but that requires you to have twice
the size of your db available on whatever partition your database is on.
That's why I said "practical." But yes, the procedure is the same.
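
For context, the documented alternative referenced here looks like this in the mongo shell; the 3.x docs state it needs free disk space roughly equal to the current data set size plus 2 GB on the db's partition, which is the impractical requirement described above:

```javascript
// Check how much space the current database occupies (sizes in bytes):
db.stats();

// Rewrites and defragments all data files for the current database to
// reclaim unused space. Needs free disk on the order of the data set size,
// and blocks the database while it runs.
db.repairDatabase();
```

Hence the "rm -rf and resync from a peer" folk procedure: it reclaims the same space using a peer's disk instead of your own.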

------
jdcarter
It's come a long way since the "Call Me Maybe: MongoDB" post from years back.
Aphyr/Kyle took them to task in so many ways for playing fast and loose with
data integrity, and rightly so. MongoDB could have said, "that guy's full of
BS, ignore him," but instead they did the smart thing and paid Kyle to help
solve the problem.

n.b. I can't find the original "Call Me Maybe" post, but this later one [1] is
similar.

[1]: [https://aphyr.com/posts/284-jepsen-mongodb](https://aphyr.com/posts/284-jepsen-mongodb)

~~~
omginternets
>paid Kyle to help solve the problem.

Any more information on this? How did he "solve" the problem?

Edit: straight from the horse's mouth [0]

[0]
[https://news.ycombinator.com/user?id=aphyr](https://news.ycombinator.com/user?id=aphyr)

~~~
omginternets
Whoops, wrong link. Here's the correct one:
[https://news.ycombinator.com/item?id=13591048](https://news.ycombinator.com/item?id=13591048)

It seems the problems were indeed "solved" and not _solved_.

~~~
aphyr
I'm not exactly sure what you're trying to say, but perhaps I can help provide
context:

In 2013, I performed an unpaid analysis, in my nights and weekends, of
MongoDB. I found a bug leading to the loss of acknowledged writes with
majority write concern. Mongo fixed this bug within a few weeks.

In 2015, I performed a followup test as a part of my work at Stripe. I
confirmed dirty reads (which were already documented, though perhaps not well-
appreciated), and discovered stale reads (which ran counter to MongoDB's
documentation). MongoDB wasn't enthusiastic about that report initially, but
got things sorted out and started work on adding majority and linearizable
read concerns. I found that writes appeared linearizable.

In spring and summer of 2016, MongoDB paid me to help expand the Jepsen tests
and hook them up to their internal CI system, so they could use the Jepsen
tests to help verify their ongoing work towards linearizable reads. I
privately confirmed that MongoDB still failed to prevent stale reads, but the
dirty-read failures I'd seen in 2015 appeared to be prevented by majority
reads.

In fall 2016, MongoDB announced they were almost ready to release 3.4.0 with
support for linearizable reads, and paid me to perform a full analysis so they
could be more confident in the results. It passed the linearizability tests I
had initially written in 2015, but I offered to expand the tests to be more
aggressive. Our collaboration resulted in the present analysis, uncovering
design flaws in v0 and implementation bugs in v1. MongoDB worked to develop
patches prior to 3.4.0's release, and that's why it passes now. :)
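
For the curious, the 3.4 feature this analysis exercised is exposed in the mongo shell roughly like this (the `accounts` collection and query are hypothetical; the `maxTimeMS` guard is what the docs recommend so a linearizable read can't block indefinitely during a partition):

```javascript
// A linearizable read: reflects all majority-acknowledged writes that
// completed before it started, and cannot observe stale data.
db.accounts.find({ _id: "alice" }).readConcern("linearizable").maxTimeMS(10000);

// Caveats from the docs: linearizable reads go to the primary, and are only
// meaningful with a filter that uniquely identifies a single document.
```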

~~~
omginternets
Hi Kyle. I'm not trying to put words in anybody's mouth (though perhaps my
tone suggested I was -- sorry!)

This is exactly the info I was looking for. Thank you.

------
metheus
I hadn't noticed this before, but aphyr has an elaborate and articulate ethics
policy: [http://jepsen.io/ethics](http://jepsen.io/ethics)

I like it.

~~~
lmm
So a client can get the analysis done and leave it unpublished if they don't
like the results? Maybe that's standard practice but it doesn't seem great for
users.

~~~
aphyr
I agree. In my ideal world I publish everything immediately.

There's no "standard practice" that I know of--very few people are doing this
kind of work. Behind every one of these analyses is weeks of contract
negotiation where I try to convince assorted lawyers & CFOs to go along with
my weird, idealistic ethical policies. And conversely, those lawyers and CFOs
do their best to balance the desire for correctness with their duty to protect
the company. This policy outlines how we compromise.

Originally I did offer a client veto, because clients weren't willing to sign
without some assurance of control over the outcome. My current standard
contract for analyses actually drops that client veto: I have final say on the
analysis content and publication. There's still a grace period to allow the
vendor to fix bugs & get things in order. I'm adopting this as the standard
going forward, but it's been a long road to get there.

That said, I do perform private consulting--usually for in-house systems,
sometimes for clients that can't afford a full analysis, and sometimes as a
precursor to more involved work. That means that yes, vendors may be aware of
bugs and I won't have told you about them--but I promise that my public work
remains honest and forthcoming.

~~~
remar
Hey, just wanted to say I really appreciate all the in depth posts you put out
on your site in hopes of educating others and helping them navigate the waters
of distributed systems.

Curious, is there any single book you would say you've found to be the
equivalent of say CLRS but for the fundamentals of distributed computing
paradigms?

My current plan for deep diving into distributed systems theory has just been
to go through resources like "Distributed systems for fun and profit", and
aggregated lists/overviews like [https://henryr.github.io/distributed-systems-readings/](https://henryr.github.io/distributed-systems-readings/) and your
distsys-class notes and then just DFS into certain topics from there -
compared to the approach I've been taking with deep diving into OS and
database theory which is just to go through text books like Operating System
Concepts and Database System Concepts while applying the theory in side
projects.

------
kenwalger
MongoDB 3.4 passes the rigorous Jepsen test. Jepsen designs tests to make
databases fail in terms of data consistency, correctness, and safety...
MongoDB 3.4 passed their newest tests.

I think that this really shows how mature of a Database MongoDB is.

~~~
untog
> I think that this really shows how mature of a Database MongoDB is.

Or that it took this long for them to pass basic proficiency tests. How do
other database systems fare with these tests?

~~~
jetpacktuxedo
You can check out the other databases he has tested here:
[http://jepsen.io/analyses](http://jepsen.io/analyses)

That being said, none of the big SQL players are there, so you can't size it
up against Postgres or MySQL.

~~~
kedean
Postgres has been analyzed, I'm not sure why it's not on that list.

[https://aphyr.com/posts/282-call-me-maybe-postgres](https://aphyr.com/posts/282-call-me-maybe-postgres)

------
geodel
I think this is the 10th year of MongoDB's existence. This year they seem to
have not just a popular but a good product. It will be interesting to see if
the good-to-popularity ratio is similar for most non-traditional databases.

------
mhoeller
Good news to hear! We just finished a fairly big IoT project where we needed
to ship a replicated database which was fast, easy to deploy, and reliable. We
tried other DBs, but none was stable enough to be shipped with a standalone
application, completely black-boxed to the customer. I was skeptical in the
beginning, but after 800 installations in 2 months we have had no complaints.
Not even from the early adopters, which have now been running for
approximately a year.

------
rocky1138
Why would anyone use MongoDB when RethinkDB is available?

~~~
redtree
"Over a third of the Fortune 100 and many of the most successful and
innovative web companies rely on MongoDB. "

taken from their page: [https://www.mongodb.com/mongodb-scale](https://www.mongodb.com/mongodb-scale)

This should give people some insight into why MongoDB is being used in the
industry.

~~~
mat_keep
That's an old stat - it's over 50% of the Fortune 100 now.

------
jlaustill
This is great news. Historically I've accepted the risk of data loss and coded
checks when needed. I will never rely on my database, regardless of which one
I am using for complete data consistency. It is, however, nice to see strides
being made towards even better robustness. Go MongoDB!

~~~
koolba
> I will never rely on my database, regardless of which one I am using for
> complete data consistency.

I can't imagine developing any software that involves relationships between
entities that does _not_ have data consistency. Check constraints, foreign
keys, and data type validation all provide a minimum sanity level of the
underlying data that allows your mind to focus on more important things.
Otherwise you're entire application is going to be littered with those same
sanity checks.

I'm not saying that all apps need that type of data store or that there isn't
room in this world for NoSQL stores, I mean specifically that complicated
interdependencies and validation checks lend themselves well to the relational
model.

~~~
mushi
I think maybe the doubt is based on cases like [https://aphyr.com/posts/282-jepsen-postgres](https://aphyr.com/posts/282-jepsen-postgres), which all systems are
subject to.

~~~
throwawayish
Uh, no. That article is a bit of a long winded way to say that 2PC with
timeouts is 2PC with timeouts.

~~~
aphyr
It might be better to think of it as a limitation of two-generals, rather than
2PC in particular.

------
dirkg
I see many people say that Mongo is unfit for a production db, but it is
actually used in production by many companies.

Obviously this is true of many products, but how serious is it in practice?
The criteria for 'good enough' can't be passing Jepsen, because very few DBs
do.

Storing schemaless JSON is a very valid use case. And the fact is there aren't
really many proper NoSQL alternatives other than RethinkDB, which I hope
becomes popular, and maybe Couch. E.g. Cassandra is often touted, but it's a
k-v store, not a document db.

There's a reason Mongo became and continues to be popular. I still think
Rethink is superior in every way.

------
_Codemonkeyism
I know there are a lot of bad vibes on HN about MongoDB, but I also know
several companies which MongoDB, with the right libraries, enabled to iterate
much faster on "schema" and business requirements (in their growth phase) than
other databases at that time (some years ago) would have. The schemaless
approach of MongoDB suited their fast-changing needs much better than e.g.
RDBMSs.

What currently still makes MongoDB nicer than e.g. Postgres JSON are the
libraries embracing schemalessness, while JSON in PG libs still feels tacked
on.

Recently I had some bad experience with MongoDB support though.

No experience with RethinkDB.

------
lowbloodsugar
Is protocol v1 something that existing users can migrate to with just a client
library swap, or is it a rewrite effort?

~~~
aphyr
Protocol v1 should be largely transparent to clients (at least for standard kv
operations). It's a change you make on the server, at the replica set level.

------
ClayFerguson
These 'from the ground up', totally all-new-code approaches to DBs are just a
scary proposition. Think of the thousands of man-years of effort that went
into building MySQL, testing its codebase, and perfecting its robustness
(fail-proof-ness). What does MongoDB bring that couldn't have been 'built on
top' of the MySQL codebase, using MySQL's transactional layer as its
underpinnings? Sure, MongoDB gets all its performance gains from delaying
writes to the DB (eventual consistency), caching in memory, and dispensing
with ACID, but there is nothing about MongoDB that couldn't have been written
to use the DB layer of MySQL at the point where it actually does its writes to
disk. In this way, MongoDB would have revolutionized the world rather than
mostly "fragmenting" the storage space.

I guess there are those who will say that, even using batch commits, MongoDB
could never have achieved the performance it currently does (by bypassing
ACID) if it were built on top of MySQL. But regardless, why not focus efforts
on improving MySQL batch-processing performance rather than throwing it all
out, starting from scratch, and writing directly to disk? I know MongoDB
became a success, but I think that is 'in spite of' their decision to start
from scratch and not 'because of' it. Also think of the power that would be
available if there were some relational table capability (true, real MySQL
ACID) right inside MongoDB whenever it was needed, if they were the 'same
animal', rather than having to use two totally and completely separate DBs
when you need NoSQL and also ACID in an app, which 99% of apps DO NEED at some
point, once an app grows beyond the round-one-funding startup-toy phase and
MongoDB falls flat in its RDB capabilities.

~~~
squeaky-clean
> Also think of the power that would be available if there were some
> relational table capability (true real MySQL ACID) right inside MongoDB
> whenever it was needed, if they were the 'same animal', rather than having
> to use two totally and completely separate DBs

Just use Postgres (or several other options) then? It has all that built right
in if it's what you need. But...

> if you need NoSQL and also ACID in an app, which 99% of apps DO NEED, at
> some point,

...I doubt 99% of apps really need this.

> once an app grows beyond the round-one funding startup-toy phase and MongoDB
> falls flat in its RDB capabilities.

Or don't try to force MongoDB to behave like a RDB and you won't hit those
problems? I just moved a system from Postgres to MongoDB, and it's running
faster on way cheaper hardware now. Not because either database is inherently
better than the other, but the use case lined up with mongodb perfectly and
the old model was leveraging Postgres really poorly. Eventual consistency is
fine, and I can denormalize certain data for faster reads because I know it
won't be modified later.

~~~
ClayFerguson
I don't know Postgres; I just consider MySQL to be the leading, best-in-class
open source DB engine available. I realize that's debatable, but personally my
mind is made up on that.

I think relational-type queries are so common that any significant app will
need them. For my own side project (meta64.com) I'm using JCR-SQL2 on Apache
Jackrabbit Oak, so I get to do lookups fairly easily, but it sure would be
nice to have a full-blown ACID RDBMS engine sitting right there to use as
well, in the same engine.

------
ahosny
This is the first time I've heard of Jepsen. I think I'm more immersed in Java
than in databases!

