
Call Me Maybe: MongoDB Stale Reads - llambda
https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads
======
Maro
From 2009 to 2012 I had a distributed database startup that competed with
MongoDB. We used Paxos for replication and built the database with on-disk
consistency guarantees --- like the ones this article looks for and rightly
obsesses over --- in mind.

[https://github.com/scalien/scaliendb](https://github.com/scalien/scaliendb)

Outcome: you've never heard of ScalienDB; MongoDB brilliantly won by winning
the hearts and minds of hackers and coders who don't care about such issues,
but were able to get started quickly with Mongo (and got cool free cups at
meetups). It turns out that's most engineers out there, definitely the initial
critical mass to target for a database startup like Mongo.

Btw. the story behind Oracle is similar: early versions were basically write-
only; read Ellison's book 'Softwar'. Of course there are other ways to get
started: for example DBs coming out of academic research like Vertica seem to
avoid this problem; in that case initial funding is basically provided by the
gov't and when they create the company to commercialize they're already
shooting for Enterprise contracts, skipping the opensource/community building
phase of Mongo.

~~~
dublinclontarf
I don't know, I got started with Mongo because ... it was so easy to start
with, but come deployment time got bitten by a LOT of issues (which
essentially negated any advantage in using Mongo).

As a result of this experience I almost exclusively use PostgreSQL, and I've
never EVER been burned by taking this approach.

Sometimes I do use another DB but there has to be a seriously good reason for
it.

~~~
nazka
Same thing. I was a software dev, then I went to the back end with MongoDB and
NodeJS, and I learnt what ACID means...

Also about MySQL, many people forget it isn't ACID compliant either and just
look at benchmarks or use MySQL "because Facebook uses it". I am sure that any
startup using PostgreSQL would avoid many problems down the road.

~~~
coolgeek
I think that PostgreSQL is, in almost every aspect, a superior product to
MySQL.

But MySQL not being ACID compliant is flat out wrong (assuming that you're
using InnoDB)

~~~
nazka
Yes with InnoDB it is.

------
bkeroack
If you are a database author and you get a bug report from Kyle, spend a
_long_ time thinking about it before closing the issue as invalid.

~~~
jodah
They're not the only ones:

[https://github.com/elastic/elasticsearch/issues/2488#issueco...](https://github.com/elastic/elasticsearch/issues/2488#issuecomment-89509176)

~~~
craigching
Actually, Elasticsearch took their Jepsen results pretty seriously and even
maintain an ongoing status of their resiliency. See my post on this:

[https://news.ycombinator.com/item?id=9418318](https://news.ycombinator.com/item?id=9418318)

------
jxf
The most interesting lessons from the Jepsen series:

* You should never trust, and always verify, the claims made by database manufacturers.

* Especially when those claims relate to data integrity.

* Super-especially when every safety level provided by the manufacturer that includes the word "SAFE" is actually unsafe.

~~~
threeseed
Actually the broader lesson would be to assume the worst in your application
layer and try and remediate/verify wherever possible.

If you look at his articles: Redis, PostgreSQL, Cassandra, ElasticSearch etc
all had data consistency errors. And none of those have vendors making any
claims.

It's pretty sobering to say the least.

~~~
ak4g
Um, this is the postgres article:

[https://aphyr.com/posts/282-call-me-maybe-postgres](https://aphyr.com/posts/282-call-me-maybe-postgres)

There were no acknowledged writes lost. The only unacked-but-successful writes
resulted from a connection dropping while a commit ack was in-flight. That
doesn't qualify as a data-consistency error; it means the client has to check
if the data is present after reconnecting.

But in no cases would the client reconnect to find that there were
acknowledged-as-committed records that were missing or stale. In no cases
would the client find that responded-as-rolled-back data was actually
committed. This is very, very different than what is seen with MongoDB.
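
A minimal sketch of the "check after reconnecting" step described above, written in Python against an in-memory stand-in rather than a real Postgres connection. `FakeDB`, `AckLost`, and `commit_with_verification` are hypothetical names made up for illustration; the only idea shown is tagging each write with a client-generated ID so that an indeterminate commit can be resolved by looking that ID up.

```python
import uuid


class AckLost(Exception):
    """Raised when the connection drops before the commit ack arrives."""


class FakeDB:
    """In-memory stand-in for a database; not a real driver API."""

    def __init__(self, drop_ack_on_first_commit=False):
        self.rows = {}
        self._drop_ack = drop_ack_on_first_commit

    def commit(self, write_id, value):
        self.rows[write_id] = value  # the commit itself succeeds
        if self._drop_ack:
            self._drop_ack = False
            raise AckLost("connection lost while ack was in flight")

    def contains(self, write_id):
        return write_id in self.rows


def commit_with_verification(db, value):
    """Commit a value; on a lost ack, check whether it landed before retrying."""
    write_id = str(uuid.uuid4())
    try:
        db.commit(write_id, value)
    except AckLost:
        # Outcome is unknown: look for the write before issuing it again.
        if not db.contains(write_id):
            db.commit(write_id, value)
    return write_id


db = FakeDB(drop_ack_on_first_commit=True)
wid = commit_with_verification(db, "payment-42")
```

Here the ack is lost but the row is found on the follow-up check, so the write is not issued a second time.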

~~~
tracker1
Now try it again with PostgreSQL's built in sharding or replication
functionality... oh, wait.

~~~
zapov
The parent post correctly pointed out that including Postgres in that list is
misleading at least.

------
addisonj
Mongo absolutely nailed creating a database that is easy to get started with
and even do things that are traditionally more 'hard' such as replication. It
is still super attractive for me to pick it up for small projects, even after
dealing with its (many) pain points both in development and operational
settings.

Given this, it is so tragic to see how dismissive they have been in regards to
the consistency issues that have plagued the db since the early days. Whether
it was the stupidity of bad defaults in drivers to not confirm writes, or
easily corruptible data in the 1.6 days, or now with not seriously looking at
the results of Jepsen, the MongoDB organization has never taken the issues
head on. It would be so refreshing to see more transparency and admitting to
the faults rather than wiggling around them until eventually pushing a fix
buried in patch notes.

I often feel like a mongodb apologist when I admit that I don't mind using
mongo for small (and not important) projects and while the mongodb hate can be
a bit extreme at times, the company's treatment of these sorts of issues may
justify some of it.

~~~
meritt
After MongoDB published their write speed benchmarks based entirely on
unacknowledged writes (e.g. how fast can you write to a socket?), it's been a
long downhill ride with an immense amount of inexplicable ignorant support.

~~~
hendzen
Can you post a link to these unacknowledged write benchmarks? I can't find
them.

~~~
meritt
Need to find an archived version but it caused a lot of arguments in
2009/2010: e.g. [http://rethinkdb.com/blog/the-benchmark-youre-reading-is-pro...](http://rethinkdb.com/blog/the-benchmark-youre-reading-is-probably-wrong/)
references similar benchmarks

Can also link simply to the HN discussion from back then too:
[https://news.ycombinator.com/item?id=1496035](https://news.ycombinator.com/item?id=1496035)

> Full disclosure: I work for 10gen.

> We did this to make MongoDB look good in stupid benchmarks.

------
dantiberian
There's a lot going on here, but the summary is: "What Mongo actually does is
allow stale reads: it is possible to execute a WriteConcern=MAJORITY write of
a new value, wait for it to return successfully, perform a read with
ReadPreference=PRIMARY, and not see the value you just wrote."

[https://jira.mongodb.org/browse/SERVER-17975?focusedCommentI...](https://jira.mongodb.org/browse/SERVER-17975?focusedCommentId=892980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-892980)
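
One way such a stale read can arise is sketched below as a toy Python model. This is not MongoDB's replication protocol: `Node`, `majority_write`, and `primary_read` are hypothetical names, and the scenario (an isolated old primary that has not yet noticed it was replaced) is an assumption chosen to illustrate the quoted summary.

```python
class Node:
    """A replica holding a single value and its own opinion of its role."""

    def __init__(self, name):
        self.name = name
        self.value = 0
        self.believes_primary = False


def majority_write(primary, reachable, cluster_size, value):
    """Apply a write on the primary and reachable peers; ack only on a majority."""
    copies = [primary] + [n for n in reachable if n is not primary]
    if len(copies) * 2 <= cluster_size:
        return False  # cannot reach a majority, so no ack
    for node in copies:
        node.value = value
    return True


def primary_read(node):
    """A 'primary' read: served by any node that believes it is primary."""
    if not node.believes_primary:
        raise RuntimeError("not primary")
    return node.value


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
n1.believes_primary = True

# Partition: n1 is cut off, and n2 and n3 elect n2. n1 has not stepped down yet.
n2.believes_primary = True

acked = majority_write(n2, [n3], cluster_size=3, value=1)
stale = primary_read(n1)  # old primary still answers with the old value
fresh = primary_read(n2)
```

The write is acknowledged by a majority, yet a client still connected to `n1` reads the old value, because nothing in this model forces a node to confirm it is still primary before serving the read.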

------
jamescostian
I'm so glad to see the Jepsen series re-instated. Thank you so much Stripe

------
jxf
Question: How do I actually run Kyle's tests to see this for myself? (Not that
I don't believe him, I just want to play around a bit.)

When I run `lein install` and then `lein test`, I get:

    
    
        ╰─▶ ψ lein test
        Exception in thread "main" java.io.FileNotFoundException:
        Could not locate jepsen/db__init.class or jepsen/db.clj on classpath: ,
        compiling:(mongodb/core.clj:1:1)
    	at clojure.lang.Compiler.load(Compiler.java:7142)
    	at clojure.lang.RT.loadResourceScript(RT.java:370)
    	at clojure.lang.RT.loadResourceScript(RT.java:361)

~~~
caipre
Can't answer your question, but I'm curious how you managed to include an
image in your comment. I didn't think embedded HTML was possible?

~~~
jxf
That's from my command prompt that I wrote -- the arrows are just Unicode
characters. You can see it here if you like:

[https://github.com/fj/dotfiles/blob/master/home/.config/shel...](https://github.com/fj/dotfiles/blob/master/home/.config/shell/prompt.sh#L140-L143)

 _Edit_ : On further reading, I think you seem to be thinking that the paste
of my console is an image. It is just text. To make fixed-width text on HN,
indent each line by four spaces, like this:

    
    
        This text has four spaces at the beginning of its line.

------
cpks
People really underestimate the value of Occasional Consistency. Occasionally
Consistent databases, like MongoDB, are great for approximation algorithms,
sublinear time algorithms, and similar applications.

~~~
jdcryans
The issue isn't that MongoDB is eventually consistent, it's that the
documentation claims that in some cases it's strictly consistent[1] while Kyle
found that:

"MongoDB, even at the strongest consistency levels, allows reads to see old
values of documents or even values that never should have been written."

1\. [http://docs.mongodb.org/manual/reference/glossary/#term-stri...](http://docs.mongodb.org/manual/reference/glossary/#term-strict-consistency)

~~~
cpks
I didn't say Eventual Consistency. I said Occasional Consistency. MongoDB
has hard Occasional Consistency. Indeed, it is the most occasionally
consistent database I know of. I once wrote a few million records into Mongo.
It was consistent before the write, but never again after.

Great for sub-linear time algorithms! At that point, all my algorithms ran at
less than O(n) on the size of the data I had written in.

From a business perspective, Occasional Consistency is also a very nice
property if you are storing audit data for certain types of organizations. It
gives complete plausible deniability about rule compliance.

~~~
jdcryans
Heh, Poe's Law, etc :)

I guess you could also call it Quantum Consistency.

------
geowa4
Since Postgres added a JSON type and Docker made running it simple in
development, I haven't had a need for anything else. Call me old school, but I
prefer starting with a relational database and changing when it's no longer
appropriate.

~~~
nazka
Old school? It's the best thing to do in my opinion. An ACID relational
database that can do even more than that! I think it's one of the best DBs for
startups.

------
ivanb
So what should users of MongoDB do? I'm asking because it is the main database
used in Meteor and I'm very interested in Meteor.

Should the general advice just be "store in MongoDB everything that doesn't
require consistency and use Postgresql for everything else"?

~~~
edejong
The general advice should be: use PostgreSQL in case you are uncertain what to
use. Watch some YouTube videos with Michael Stonebraker (2014 Turing Award
winner) and start getting disillusioned by the NoSQL hype.

Then, try to understand the mess Edgar Codd tried to fix in the '60s and '70s.

~~~
AdrianRossouw
Do you have any specific videos we should watch?

~~~
edejong
In the top hit on YouTube, Michael starts to discuss the solution-space
@31m15s
([https://www.youtube.com/watch?feature=player_detailpage&v=OY...](https://www.youtube.com/watch?feature=player_detailpage&v=OYGJe1z97VI#t=1875))

------
rdtsc
I still don't get it. MongoDB can't possibly call itself a database. I can
understand MongoScratchStorage, MongoProbabilisticDataEngine but not MongoDB.

~~~
gasping
MongoWeakReference

------
sylvinus
Another instance of Kyle's amazing research! You may want to catch him on
stage with other great minds at dotScale on June 8:
[http://dotscale.io](http://dotscale.io)

------
Kiro
This article is too technically advanced for me. As a casual MongoDB user, how
do these problems affect me?

~~~
tel
In replicated Mongo scenarios higher write volume increases the probability of
inconsistent reads. What this means is that there's a chance that on some
data—no matter how safely you attempt to write it—you'll end up in a totally
inconsistent state for your system.

The actual impact of an inconsistent state is very hard to judge. It could be
as minimal as having two different users just see something weird on their
screen for a moment and then it goes away. It could even be totally avoided if
your application handles inconsistent data well.

At the same time, it could also cause complete and nearly untraceable
corruption of all data in your system. Who knows?

It'd be a bit like building a bridge using metal with a known defect. It'll
probably work fine for a long time and depending on how and where that metal
was used you might be alright.

Or you might have a complete structural integrity failure at any moment once
stress starts ramping up and you'll just have to blame it wholesale on using
bad materials.

------
bsaul
I seem to remember from a foundationDB talk that they first spent two years
building a simulation environment to control everything from network to
persistence for testing scenarios.

Does anyone know of any open-source project that would aim at doing the same,
so that future NoSQL DBs can finally be built on strong foundations?

------
narrator
I knew something was funny with Mongo when all the api calls defaulted to
writes not being guaranteed to sync to disk. Maybe for a use case like
aggregate statistics gathering it would be ok to risk missing a few updates in
a crash for the sake of speed, but to make that the default??

~~~
threeseed
You know this default was changed in November 2012.

Do you think it's still relevant to be bringing this up ?

~~~
lazyloop
Actually, the defaults are still unsafe, just the marketing language has
changed. Take the Node.js driver for example, it defaults to w=null and
j=false. [http://mongodb.github.io/node-mongodb-native/2.0/api/Db.html](http://mongodb.github.io/node-mongodb-native/2.0/api/Db.html)
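
If you would rather not depend on a driver's defaults, one option is to state the write concern explicitly in the connection string. The sketch below only builds such a string in Python; the `mongo_uri` helper, host names, database name, and replica set name are placeholders invented for the example, while `w` and `journal` are the standard connection-string options for write concern.

```python
from urllib.parse import urlencode


def mongo_uri(hosts, database, replica_set, w="majority", journal=True):
    """Build a MongoDB connection string with an explicit write concern."""
    options = {
        "replicaSet": replica_set,
        "w": w,                           # wait for this many / a majority of nodes
        "journal": str(journal).lower(),  # wait for the on-disk journal
    }
    return "mongodb://{}/{}?{}".format(",".join(hosts), database, urlencode(options))


uri = mongo_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],  # placeholder hosts
    "app",
    "rs0",
)
```

Note that this only controls when a write is acknowledged; it does nothing about the stale reads the article describes.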

------
lobo_tuerto
I think it would be great to see one of these done for RethinkDB :)

~~~
timmaxw
RethinkDB engineer here. RethinkDB currently doesn't support automatic
failover, so this test couldn't be performed for RethinkDB yet. But when we
implement automatic failover we're planning to test it against Jepsen. That
will probably be sometime in the next few months.

~~~
tootie
DB engineers living in mortal fear of Kyle is where we want to be.

------
bakhy
I must admit, I always feel like I am missing something in these discussions.
Like I didn't get some memo... I just don't expect a DB like MongoDB to
guarantee consistency. The whole story around NoSQL and the likes was to
enable crazy horizontal scaling needed for the web. Phrases like "eventual
consistency" flew around. It seems so logical - you lose consistency, gain
scalability.

But somehow, people simply started using them everywhere? Assuming that these
DBs are just like any other? And now, we're all bashing on MongoDB because it
is - not consistent? What happened here? :)

NB that I do not wish to attack the OP - if MongoDB now claims to be
consistent in any way, that deserves scrutiny. And these analyses are always a
really interesting read. But the general tone in the developer community about
MongoDB seems a bit irrational.

~~~
fennecfoxen
> It seems so logical - you lose consistency, gain scalability. ... And now,
> we're all bashing on MongoDB because it is - not consistent? What happened
> here? :)

There are ways to do "eventual consistency" responsibly. Mind you, it's
obnoxiously tricky to do it right, even when someone has provided an
underlying implementation that works exactly as promised. But if you design
your data access patterns in the right way, the system can provide guarantees
so that even if it doesn't have all your data _at the moment_ , you can still
ask questions about the state of the data that _is_ available, and get
meaningful responses back that conform to a certain set of guarantees.

What happened here -- why we make fun of MongoDB -- is that it doesn't provide
many promises like that, and even when it does, its implementation does a
very, very bad job of delivering them ... and it doesn't even do a good job of
delivering scalability. (It's basically a mmap()'d series of b-trees of BSON
documents, so as soon as you run out of RAM, you're at risk of having the
kernel swap out all your _indices_ instead of your data, whereupon
performance craters. Oh, and the much-mocked global write lock has finally
been replaced with a _per-database_ write-lock in recent versions.)

In short, you sacrifice everything and gain... a modestly convenient API for
document-storage, _maybe_.

------
agopaul
So, now I'm wondering: why is Stripe using Mongo at all? Maybe they are
planning to migrate to another DBMS?

------
ccleve
Does anyone have any references on how you _could_ write a distributed
database that met all ACID properties? Surely there's an academic paper that
says that if you do A then B then C, you are guaranteed a certain level of
consistency.

We've developed a type of distributed database at my company, and I think it's
pretty solid, but I need a broader familiarity with the available theory.

~~~
teraflop
Aside from reading papers, it's a good idea to look through the syllabus of a
distributed systems course to get a broad idea of what the problem space looks
like.

Academic papers will talk about "minimal" problems like consensus, or
desirable properties like sequential consistency, and expect you to already
know why those concepts are important. If your experience is mostly hands-on,
it may not be obvious how it all applies to real-world systems.

Say you have a complex distributed database. Forget all the bells and
whistles: can it solve the problem of allowing a set of processes to reliably
agree on a single Boolean value? If so, then you're trying to provide the same
consistency guarantees as Paxos/Raft. So if your architecture is substantially
simpler than Raft, then either you've come up with something really ingenious,
or you've missed some edge cases.
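
As a small illustration of the kind of property these protocols lean on, here is a Python check that any two majorities of a cluster share at least one node, which is what lets a later majority learn about a value an earlier majority accepted. It is not a consensus implementation; the node names and the `majorities` and `all_pairs_intersect` helpers are made up for the example.

```python
from itertools import combinations


def majorities(nodes):
    """Every subset of nodes that contains more than half of them."""
    need = len(nodes) // 2 + 1
    return [
        set(group)
        for size in range(need, len(nodes) + 1)
        for group in combinations(nodes, size)
    ]


def all_pairs_intersect(quorums):
    """True if every pair of quorums has at least one node in common."""
    return all(a & b for a, b in combinations(quorums, 2))


nodes = ["a", "b", "c", "d", "e"]
quorums = majorities(nodes)

# Groups of two out of five are not majorities and can be disjoint.
minority_groups = [set(group) for group in combinations(nodes, 2)]

majorities_overlap = all_pairs_intersect(quorums)
minorities_overlap = all_pairs_intersect(minority_groups)
```

A design that acknowledges writes or serves reads from groups that can be disjoint, like the two-node groups here, loses that overlap, which is one of the edge cases a simpler-than-Raft architecture can miss.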

~~~
kasey_junk
Or you are obviously violating proven theory...

------
posnet
Would the use of wired tiger as a storage engine affect these results?

~~~
misframer
My guess is no. This is more about the behavior of the database as a
distributed system and not just a storage engine.

------
chatman
Apache Solr has done very well at Jepsen tests.

------
chucksmart
Maybe we should listen to Larry Ellison when he says "gimme my money!"

~~~
Roboprog
Has Oracle been run through this test battery?

Or would publishing the results of doing so bring down an army of Larry's
lawyering henchmen? "You violated the EULA, now you must pay! You will only
wish you were dead when _we_ are done with you, bwahahhahahaha!"

~~~
Roboprog
I guess the first rule of "Benchmark Club" is that we don't talk about
Benchmark Club?

Now off to deface a piece of corporate art... :-)

------
pje
upvoted for the Look Around You link alone.

------
threeseed
Shame this wasn't done with the latest version 3.0. Although given that
improvements are scheduled for 3.1 I would imagine it might be still an issue.

Nice writeup either way though. Would like to see a similar article for Couch*
and MySQL.

