
Jepsen Disputes MongoDB's Data Consistency Claims - anarchyrucks
https://www.infoq.com/news/2020/05/Jepsen-MongoDB-4-2-6/
======
madhadron
In the circles I run in, MongoDB is regarded as a joke and the company behind
it as basically duplicitous. For example, they still list Facebook as their
first user of MongoDB on their website, but there is no MongoDB use at
Facebook and hasn't been for years (it came in only via a startup
acquisition).

I had the misfortune to use MongoDB at a previous job. The replication
protocol wasn't atomic. You would find partial records that were never fixed
in replicas. They claimed they fixed that in several releases, but never did.
The right answer turned out to be to abandon MongoDB.

~~~
kyllo
I was floored by this comment yesterday from one of their Developer Relations
people:

> _Did any of you actually read the article? We are passing the Jepsen test
> suite and it was back in 2017 already. So, no, MongoDB is not losing
> anything if you know what you are doing._

[https://twitter.com/MBeugnet/status/1253622755049734150?s=20](https://twitter.com/MBeugnet/status/1253622755049734150?s=20)

Can you imagine saying the phrase "if you know what you are doing," in public,
to your users, as a DevRel person? Unbelievable.

~~~
jen20
Firstly, let me point out that this response is intended as a defence neither
of MongoDB's defaults, which are atrocious, nor of the company, which is
arguably duplicitous.

However I can _quite easily_ see how a non-native English speaker could use
the phrase “if you know what you are doing” to mean “if you are careful”.

~~~
kyllo
Perhaps, but the point is, if you're working in DevRel and you think your role
is to defend the product from criticism, and you do it by placing the onus on
the developer to figure out how to use your product safely, you've totally
lost the plot.

~~~
bauerm97
Also, if you're in DevRel, maybe you should have strong verbal and written
communication skills in the language of your users? Just an idea.

------
naked-ferret
From the jepsen report:

"""

Curiously, MongoDB omitted any mention of these findings in their MongoDB and
Jepsen page. Instead, that page discusses only passing results, makes no
mention of read or write concern, buries the actual report in a footnote, and
goes on to claim:

> MongoDB offers among the strongest data consistency, correctness, and safety
> guarantees of any database available today.

We encourage MongoDB to report Jepsen findings in context: while MongoDB did
appear to offer per-document linearizability and causal consistency with the
strongest settings, it also failed to offer those properties in most
configurations.

"""

This is a really professional way to tell someone to stop their nonsense.

~~~
Thaxll
MySQL and PG are not truly consistent by default; they don't fsync every
write.

MongoDB explains that pretty well:
[https://www.mongodb.com/faq](https://www.mongodb.com/faq) and
[https://docs.mongodb.com/manual/core/causal-consistency-read...](https://docs.mongodb.com/manual/core/causal-consistency-read-write-concerns/)

~~~
castorp
> MySQL and PG are not truly consistent by default; they don't fsync every
> write.

Postgres most certainly does fsync by default.

It's true, you can disable it, but there is a big warning about "may corrupt
your database" in the config file.

~~~
Thaxll
No, PG does not fsync every write; more details here:
[https://dba.stackexchange.com/questions/254069/how-often-doe...](https://dba.stackexchange.com/questions/254069/how-often-does-postgres-or-mysql-make-the-fsync-call)

My point is that the people complaining about MongoDB are most likely the ones
not using it; MongoDB is very different from what it was 10 years ago.

I like to remind people that PG did not have an official replication system
10 years ago, and as of today it is still behind MySQL. No DB is perfect; it's
about trade-offs.

~~~
wolf550e
> It writes out and syncs the accumulated WAL records at each transaction
> commit, unless the committed transaction touched only UNLOGGED or TEMP
> tables, or synchronous_commit is turned off.

So the WAL is synced before commit returns, and if you power cycle immediately
after, the WAL is played back and your transaction is not lost? So it's fine?

It does not need to sync all writes, only the records needed to play back the
transaction after restart. This is what all real databases do.
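A minimal sketch of the idea wolf550e describes, assuming a toy key-value store rather than PostgreSQL's real WAL format. Changes are buffered in memory. At commit, the log records and a commit marker are appended and fsynced once, and only after that is the commit reported. On restart, replay applies only the transactions whose commit marker reached disk. The class name `TinyWAL` and the JSON-lines record layout are hypothetical.

```python
import json
import os
import tempfile


class TinyWAL:
    """Hypothetical toy write-ahead log (not PostgreSQL's actual format).

    Changes are buffered in memory; commit() appends them plus a commit
    marker to the log file and fsyncs once before returning."""

    def __init__(self, path):
        self.path = path
        self.pending = []  # changes of the open transaction, not yet durable

    def write(self, key, value):
        self.pending.append({"key": key, "value": value})

    def commit(self):
        with open(self.path, "a", encoding="utf-8") as f:
            for rec in self.pending:
                f.write(json.dumps(rec) + "\n")
            f.write(json.dumps({"commit": True}) + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # force the OS to put it on stable storage
        self.pending = []         # only now is "committed" safe to report

    @staticmethod
    def replay(path):
        """Rebuild state after a restart from committed transactions only."""
        state, batch = {}, []
        if not os.path.exists(path):
            return state
        with open(path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write: stop here
                if rec.get("commit"):
                    for r in batch:
                        state[r["key"]] = r["value"]
                    batch = []
                else:
                    batch.append(rec)
        return state  # a trailing batch with no commit marker is discarded


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "wal.log")
    wal = TinyWAL(log)
    wal.write("a", 1)
    wal.commit()
    wal.write("b", 2)           # "power cut" before this transaction commits
    print(TinyWAL.replay(log))  # {'a': 1}
```

The sketch syncs the log once per transaction, not once per individual write. A real engine also writes its data pages back later (PostgreSQL does so at checkpoints) and relies on log replay for recovery.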

------
thomascgalvin
You can tell a lot about a developer by their preferred database.

* Mongo: I like things easy, even if easy is dangerous. I probably write Javascript exclusively

* MySQL: I don't like to rock the boat, and MySQL is available everywhere

* PostgreSQL: I'm not afraid of the command line

* H2: My company can't afford a database admin, so I embedded the database in our application (I have actually done this)

* SQLite: I'm either using SQLite as my app's file format, writing a smartphone app, or about to realize the difference between load-in-test and load-in-production

* RabbitMQ: I don't know what a database is

* Redis: I got tired of optimizing SQL queries

* Oracle: I'm being paid to sell you Oracle

~~~
threeseed
And you can tell a lot about a developer when they post comments like this.

Almost none of it is remotely accurate, e.g. RabbitMQ isn't even a database.

~~~
wzy
I can't believe the one item that was so obviously added as a joke went right
over your head.

It may be a good idea to take a break from the computer and find something less
stressful to do.

~~~
ceocoder
Perhaps that’s because some other message brokers are now being touted as
databases [0][1]; I remember seeing a thread about it on HN a couple of days ago.

[0] [https://www.confluent.io/blog/okay-store-data-apache-kafka/](https://www.confluent.io/blog/okay-store-data-apache-kafka/)

[1] [https://dzone.com/articles/is-apache-kafka-a-database-the-20...](https://dzone.com/articles/is-apache-kafka-a-database-the-2020-update)

~~~
ashtonkem
Kafka is a very different beast from RabbitMQ.

------
dang
All: we've changed the submitted URL from
[https://www.infoq.com/news/2020/05/Jepsen-MongoDB-4-2-6](https://www.infoq.com/news/2020/05/Jepsen-MongoDB-4-2-6) to the
work it is reporting on. You might want to read both, since the infoq.com
article does give a bit of background.

Edit: never mind, I think the other URL
([http://jepsen.io/analyses/mongodb-4.2.6](http://jepsen.io/analyses/mongodb-4.2.6))
deserves a more technical thread, so I will invite aphyr to repost it
instead. It had a thread already
([https://news.ycombinator.com/item?id=23191439](https://news.ycombinator.com/item?id=23191439))
but, despite getting a lot of upvotes, failed to make the front page
([http://hnrankings.info/23191439/](http://hnrankings.info/23191439/)). I have
no idea why; there were no moderation or other penalties on it. Sometimes HN's
software produces weird effects as the firehose of content tries to make it
through the tiny aperture of the front page.

------
VonGuard
Lying about your test results from Jepsen is like going onto a reality show
with Chef Ramsay, being thrown off for incompetence, then putting his name on
your restaurant's ads: "Chef Ramsay ate here!"

I'd pay to watch Kyle screaming at people in the MongoDB offices, not that he
screams or anything. Just a spectacular mental image: "IT'S NOT ATOMIC! IT
COULDN'T SERIALIZE A DOG'S DINNER!"

~~~
jagannathtech
I would watch a tech version of Ramsay's show... oh boy!

~~~
tluyben2
Yep, I always thought it a shame there isn’t one, but it's too small a niche, I guess.
Also, almost everyone claiming online that they apply best practices at their
company is probably lying or engaging in wishful thinking; that would come out,
so no one would apply for the show. So maybe more of a startup show where ‘a Ramsay’
comes in when a (bootstrapped or angel-invested; VC-funded is not savable
that way imho) company is in distress for technical reasons introduced by the founders.
Relevant pet peeve for this thread: "let's autoscale everything in the cloud"
(from a tiny, cash-strapped startup whose founders don't know quite enough about
prod environments, and so do a lot of damage), and now we have a burn rate of
$28k/mo in AWS bills with 5 users.

------
ncmncm
MongoDB's big problem is that their present user base _does not want_ the
problems fixed, particularly at default settings, because it would mean going
slower. Their users are self-selected as not caring much about integrity and
durability. There are lots of applications where those qualities are just not
very important, but speed is. People with such applications do need help with
data management, and have money to spend on it.

The stock market wants to see the product as a competitor with Oracle, so
demands all the certifications that say so. MongoDB marketing wants to be able
to collect money as if the product were competitive. Many of the customers
have management that would be embarrassed to spend that kind of money on a
database that is not. And, ultimately, many of the applications do have
durability requirements for _some_ of the data.

So, MongoDB's engineers are pulled in one direction by actual (paying) users,
and the opposite direction by the money people. It's not a good place to be.
They have very competent engineers, but they have set themselves a problem
that might not be solvable under their constraints, and that they might not be
able to prove they have solved, if they did. Time spent on it does not address
what most customers want to see progress on.

~~~
threeseed
If they only cared about performance then they would've left the write concern
defaults to not acknowledge writes either locally or within a replica set. Or
just read from the nearest replica and don't worry about potential consistency
issues.

Also, this isn't 2011. MongoDB is not a competitor to Oracle, and never really
has been in the eyes of people who knew that a document DB was not usable as an
SQL one. Other SQL databases, e.g. Snowflake and Redshift, are the real
competitors.

~~~
ncmncm
You know it, I know it, MDB knows it, and most of their customers know it, but
that doesn't matter: the stock market doesn't. MDB wants to be valued like a
durable-database company, and to be able to charge durable-database prices.
They need a plausible durable-database story to get those, regardless of what
actual current users want.

It is possible there are still potential users not buying until they get that
story. MDB wants those users.

------
jedberg
MongoDB started life as a database designed for speed and ease of use over
durability. That's not a good look for a database.

People have told me that they have since changed, but the evidence is
overwhelmingly and repeatedly against them.

They seem to have been successful on marketing alone. Or people care more
about speed and ease of use than durability, and my assumptions about what
people want in a database are just wrong.

~~~
otterley
> MongoDB started life as a database designed for speed and ease of use over
> durability. That's not a good look for a database.

I think it depends. One could say the same about Redis, but it's wildly
successful and people love it.

The difference is how they are advertised. Redis makes no claims to be
anything other than what it is - a fast in-memory database that has some
persistence capability but isn't meant to be a long-term data store. MongoDB,
on the other hand, made (and continues to make) claims about being comparable
in atomicity and durability to traditional SQL databases (but magically much
faster!) that haven't withstood scrutiny.

Keep in mind, too, that most data ain't worth much. It's one thing to entrust
data of low value in MongoDB; another to store mission-critical data in it. I
would look askance at leadership who didn't ask hard questions about storing
data worth millions or billions of dollars in MongoDB without frequent
snapshots -- and even then, the value mustn't be contingent on the 100%
accuracy of said data.

~~~
gav
When I'm thinking about data stores in large systems I like to break them down
along two main axes, depending on how they are used: whether the data is fast- or
slow-moving, and durability, from "we don't care" to "we must never lose data".

It's easier to reason about systems if there are fewer things that require
durability guarantees; ideally you want to be able to draw data flows that
look like a tree instead of a graph.

I find that Redis fits great because it's perfect for a whole bunch of
different temporal shared state needs, everything from sessions to partial
results. I've also deployed things like Ehcache, MongoDB, and Memcached to fit
these needs and found other tools such as Kafka or RabbitMQ to be great
"glue".

Having the root of your important data be something "boring" like Postgres or
MySQL (or even Oracle!) is just good risk management to me. I wouldn't want to
trust Redis or MongoDB for important data because it adds to the things I have
to worry about. It's "keeping your eggs in one basket" while making sure that
basket is really well looked after.

------
speedgoose
The Jepsen analysis:
[https://jepsen.io/analyses/mongodb-4.2.6](https://jepsen.io/analyses/mongodb-4.2.6)

------
erulabs
I wonder if I'm the only sysadmin in the world who doesn't hate MongoDB. Yes,
I wouldn't use it for new projects, and yes, I wish RethinkDB had taken its
place, but it's not as horrible as people seem to think. Default
configuration... If it weren't for RDS doing PgBouncer-style connection
management, 95% of production postgres instances would probably fail. If
innodb_buffer_pool_size weren't set properly, plenty of data-centers would
light on fire. If no one set up a firewall or AOF for redis, it's data-loss and
data-exposure waiting to happen. If no one adds auth to an HTTP route, it's
open to the world, etc etc etc. If tech-stacks were legos, software engineers
would earn a heck of a lot less.

I absolutely agree it's been used by people who just don't want to write SQL
queries, or as a text search engine in place of something more
appropriate like Elasticsearch, but to mock successful projects that were based
on it seems silly. It reminds me of interviewing candidates at a startup who
primarily used PHP/MySQL. Most of them openly laughed and called it all
horrible. I voted "no" on them, and sometimes injected a somewhat toxic "ah,
you're right - we should close up shop. Someone call Facebook - tell them
their tech stack is horrible - shut it all down!".

You can learn a lot about a developer by asking "What do you think about
Mongo, JavaScript, or PHP", and if their response isn't a shrug, they're
probably more concerned with what editor is correct than if the product
they're building is useful. It's an exceptional filter to reject zealots and
find pragmatists.

All that said, MariaDB with MyRocks is _awesome_, but certainly not with the
default settings :)

~~~
ianamartin
RethinkDB is a better solution to every problem that MongoDB claims to solve.
I wouldn't use it for everything. But once my need for a document store
outgrows what's convenient and easy in Postgres with JSONB, I reach for
Rethink. It's great. There was a Jepsen analysis of it a while back, too, which
was quite positive.

It's a shame that Rethink did so many things right and failed as a company
while Mongo continues to do almost everything wrong as a company and still
gets business.

~~~
rixed
> It's a shame that Rethink did so many things right and failed as a company
> while Mongo continues to do almost everything wrong as a company and still
> gets business.

This seems to be more the rule than the exception, doesn't it?

It's not even that hard to come up with explanations for this, the main one
certainly being that popularity depends essentially upon simplicity.

And simplicity might not even be economically as inept as we would like it to
be. Indeed, since only a small minority of all the systems that are designed
reach production and stay there for long, it can make sense to use the
quickest piece of junk available, at least until it's proven it will stick.

------
Ice_cream_suit
There is much amusement to be obtained from reading Jepsen's report:

"MongoDB’s default level of write concern was (and remains) acknowledgement by
a single node, which means MongoDB may lose data by default.

...Similarly, MongoDB’s default level of read concern allows aborted reads:
readers can observe state that is not fully committed, and could be discarded
in the future. As the read isolation consistency docs note, “Read uncommitted
is the default isolation level”.

We found that due to these weak defaults, MongoDB’s causal sessions did not
preserve causal consistency by default: users needed to specify both write and
read concern majority (or higher) to actually get causal consistency. MongoDB
closed the issue, saying it was working as designed"

[http://jepsen.io/analyses/mongodb-4.2.6](http://jepsen.io/analyses/mongodb-4.2.6)
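To see why acknowledgement by a single node can lose data, here is a toy model. It is a deliberately simplified sketch, not MongoDB's replication protocol. A hypothetical three-node replica set has a primary that applies a write, copies it to only as many secondaries as the write concern demands, and then crashes before anything else is replicated. With `w=1` the acknowledged write exists only on the crashed primary. With a majority, at least one survivor holds it. Elections are reduced to "the most complete surviving log wins", and the later rollback of the old primary's orphaned write is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)
    up: bool = True


class ToyReplicaSet:
    """Hypothetical replica set; models only the acknowledgement rule."""

    def __init__(self, n=3):
        self.nodes = [Node(f"n{i}") for i in range(n)]
        self.primary = self.nodes[0]

    def write(self, value, w):
        """Apply `value` on the primary, then on secondaries until the write
        concern is met. w="majority" needs floor(n/2)+1 nodes; an int w needs w."""
        needed = len(self.nodes) // 2 + 1 if w == "majority" else int(w)
        acked = 0
        order = [self.primary] + [n for n in self.nodes if n is not self.primary]
        for node in order:
            if acked == needed:
                break  # the rest would catch up asynchronously; the toy crashes first
            if node.up:
                node.log.append(value)
                acked += 1
        return acked >= needed  # True means the client got an acknowledgement

    def fail_primary_and_elect(self):
        """Crash the primary; the surviving node with the longest log takes over."""
        self.primary.up = False
        survivors = [n for n in self.nodes if n.up]
        self.primary = max(survivors, key=lambda n: len(n.log))
        return self.primary.log


if __name__ == "__main__":
    rs = ToyReplicaSet()
    rs.write("x", w=1)
    print(rs.fail_primary_and_elect())  # [] : the acknowledged write is gone

    rs = ToyReplicaSet()
    rs.write("y", w="majority")
    print(rs.fail_primary_and_elect())  # ['y'] : it survives the failover
```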

------
crazybit
MongoDB is horrible, I get it.

What do I use in this situation:

1) I need to store 100,000,000+ json files in a database

2) query the data in these json files

3) json files come from thousands upon thousands of different sources, each
with their own drastically different "schema"

4) constantly adding more json files from constantly new sources

5) no time to figure out the schema prior to adding into the database

6) don't care if a json file is lost once in a while

7) only 1 table, no relational tables needed

8) easy replication and sharding across servers sought after

9) don't actually require json, so long as data can be easily mapped from json
to database format and back

10) can self host, no cloud only lock-in

Recommendations?

~~~
gilbetron
Elasticsearch? [http://smnh.me/indexing-and-searching-arbitrary-json-data-us...](http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/)

Depends on what your queries look like, I guess.

~~~
inglor
Just adding that I have used elasticsearch for a use case under the above
constraints several times in the past and it worked well.

Ironically once because mongo was such a pain to work with I dumped the data
from it into ES to get the better API, usability and Kibana.

------
NelsonMinar
I think it's remarkable this report has been out for a week now and no one at
MongoDB has commented on it. At least, not that I have seen.

~~~
pengaru
Maybe they're too busy spending their MDB money.

[https://www.google.com/search?q=NASDAQ:+MDB](https://www.google.com/search?q=NASDAQ:+MDB)

~~~
threeseed
I genuinely am confused by comments like this.

Are companies not supposed to invest money into their product, sales, people
etc ?

And why does being listed on the NASDAQ imply being flush with money ?

~~~
pengaru
> Are companies not supposed to invest money into their product, sales, people
> etc ?

> And why does being listed on the NASDAQ imply being flush with money ?

It was intended to be a playful reference to MDB's stock price being on a tear
right now, not simply being listed on NASDAQ.

Expand the timeline on the graph to "Max", it's at an all time high.

------
seemslegit
"We found that due to these weak defaults, MongoDB’s causal sessions did not
preserve causal consistency by default: users needed to specify both write and
read concern majority (or higher) to actually get causal consistency. MongoDB
closed the issue, saying it was working as designed, and updated their
isolation documentation to note that even though MongoDB offers “causal
consistency in client sessions”, that guarantee does not hold unless users
take care to use both read and write concern majority. A detailed table now
shows the properties offered by weaker read and write concerns."

That sounds like a valid redress, or am I missing something ?

~~~
Smaug123
Kyle's point is that it's arguably valid but certainly unhelpful: the _default
settings_ are liable to lead to data loss. Moreover, he draws attention
specifically to transactions as something which you would expect to make
things safer, but in fact there's a rather arcane part of the documentation
that notes that you need to manually specify both read and write concerns on
every transaction individually if you want transactions to behave
consistently, regardless of the concerns specified at the database level.

Basically, there are a large number of pitfalls that it's very easy to fall
into unless you have an encyclopaedic knowledge of the documentation, and you
need to ignore some of the words that are used (like "transaction" or "ACID")
because they carry connotations that either do not apply or only apply if you
do extra work to make it so.
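The precedence rule described above can be written down as a small toy function. It follows the order given in MongoDB's transaction documentation: transaction-level options first, then the session's default transaction options, then the client-level setting. The collection-level value is deliberately ignored. The function name and the `server_default` parameter are hypothetical, and the same rule is assumed to apply to both read and write concern. The demo uses "local" as the fallback read concern, which is what the report describes as the default at the time.

```python
from typing import Optional


def effective_txn_concern(txn: Optional[str],
                          session_default: Optional[str],
                          client: Optional[str],
                          collection: Optional[str],
                          server_default: str) -> str:
    """Toy model: which read (or write) concern does a transaction use?

    Precedence assumed from MongoDB's transaction docs:
    transaction options -> session default transaction options -> client.
    `collection` is accepted only to show that it is ignored."""
    del collection  # collection/database-level concerns do not apply in transactions
    for setting in (txn, session_default, client):
        if setting is not None:
            return setting
    return server_default


if __name__ == "__main__":
    # Collection configured for "majority", nothing set on the transaction:
    print(effective_txn_concern(None, None, None, "majority", "local"))  # local
    # The stronger level applies only when the transaction asks for it:
    print(effective_txn_concern("snapshot", None, None, "majority", "local"))  # snapshot
```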

~~~
scarface74
How is this any different than DynamoDB, where you specify whether you want
eventual consistency or strong consistency? DDB also does eventually consistent
reads by default.

Is the argument that Mongo’s documentation isn’t clear?
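For comparison, the eventual-versus-strong choice in DynamoDB can be modelled as a toy store with a leader and one lagging replica. The `consistent_read` flag loosely mirrors DynamoDB's `ConsistentRead` request parameter, which defaults to false. Everything else (the class name, the explicit `replicate()` step) is hypothetical and only shows the shape of the trade-off, not DynamoDB internals.

```python
class ToyStore:
    """Hypothetical leader plus one lagging replica (not DynamoDB internals)."""

    def __init__(self):
        self.leader = {}
        self.replica = {}
        self.backlog = []  # updates the replica has not applied yet

    def put(self, key, value):
        self.leader[key] = value
        self.backlog.append((key, value))

    def replicate(self):
        """Apply everything the replica has missed (normally asynchronous)."""
        for key, value in self.backlog:
            self.replica[key] = value
        self.backlog.clear()

    def get(self, key, consistent_read=False):
        # Default: read from the replica, which may be stale.
        # consistent_read=True: read from the leader, which is always current.
        source = self.leader if consistent_read else self.replica
        return source.get(key)


if __name__ == "__main__":
    store = ToyStore()
    store.put("k", 1)
    print(store.get("k"))                        # None (replica is behind)
    print(store.get("k", consistent_read=True))  # 1
    store.replicate()
    print(store.get("k"))                        # 1
```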

~~~
Smaug123
I trust Kyle when he tells me that the behaviour he observes is surprising.
From the analysis
([https://jepsen.io/analyses/mongodb-4.2.6](https://jepsen.io/analyses/mongodb-4.2.6)):

"In order to obtain snapshot isolation, users must be careful not only to set
the read concern to snapshot for each transaction, but also to set write
concern for each transaction to majority. Astonishingly, this applies even to
read-only transactions."

"This behavior might be surprising, but to MongoDB’s credit, most of this
behavior is clearly laid out in the transactions documentation… MongoDB offers
database and collection-level safety settings precisely so users can assume
all operations interacting with those databases or collections use those
settings; ignoring read and write concern settings when users perform
(presumably) safety-critical operations is surprising!"

~~~
scarface74
There is a difference between “Mongo’s documentation sucks” and “Mongo is
technically deficient”. The former can be corrected by updating the
documentation.

Yes, I agree as far as the end user is concerned, they are losing data either
way.

~~~
Gaelan
I think the implication here is that "Mongo's documentation is deliberately
bad in order to hide their technical deficiencies," i.e. they're hoping people
will use the defaults, be impressed by the speed, and never realize until it's
too late that they're not getting the consistency they were promised.

------
arpa
Oh, Jepsen and MongoDB again? Somebody get the popcorn!

~~~
balfirevic
Unfortunately, not an entertaining showdown - too one-sided.

~~~
saagarjha
Because MongoDB is web scale?

~~~
senko
Some readers might not be familiar with that particular meme:
[https://m.youtube.com/watch?v=b2F-DItXtZs](https://m.youtube.com/watch?v=b2F-DItXtZs)

IMHO it perfectly describes the hype-reality disconnect at the early days of
MongoDB. Yeah it was that bad.

Mongo has improved since, the hype has toned down and the NoSQL space is more
crowded these days.

~~~
znpy
i remember diaspora chanting about using mongodb.

then a year or two later they admitted that their data model mostly fitted the
relational model, and that they spent a lot of time basically reimplementing
relational integrity in application code, in ruby.

yeah, diaspora has never been fast. I'm not sure they can blame it on mongodb
though.

~~~
collyw
I remember the Mongo hype when it came out and I really couldn't understand
it. You are just throwing away a lot of useful features of a relational
database because "schemaless" and "big data". The majority of people using it
were on single server setups.

------
sacks2k
I still remember when MongoDB was the new kid on the block and it was lauded
as the only thing you should be using here on HN.

I'm glad my gut instinct was correct and that it really wasn't worth the hype.
It reminds me of Ruby on Rails.

~~~
nexuist
I've never used RoR but I know people that still swear by it. It's outdated by
today's "standards," but ActiveRecord was and is still a gem (heh) and a lot
of RoR's foundational principles have been adopted by the existing major
frameworks.

Regardless of technical acumen, I believe RoR doesn't deserve to be compared
to Mongo for one reason: the RoR developers never tried to gaslight their
users into thinking _they're_ the reason everything broke; they never said
only "if you know what you're doing" can you avoid these hidden pitfalls.

------
veritas3241
Every time I see a post about Mongo it makes me wonder what could have been if
RethinkDB was managed differently.

------
winrid
I worked at one company where the network traffic just on the MongoDB master
was around 2gb/s. We had machines with terabytes of memory, and Mongo worked
fine - until we had some replica set nightmares. Mongo support is amazing, but
when replication breaks it's very hard to diagnose (usually it was our fault,
but it felt very fragile).

------
holoduke
I used mongodb for 1 year for a multi-million-user app. I abandoned it. The
reliability and stability are just not good. I wanted it to be good, but it
turned out to be a different

------
Too
Ok, so defaults suck, marketing is misleading, documentation and error
messages are not exactly obvious. Assuming you are already stuck in the soup,
putting those issues aside and getting practical instead of throwing
more fire on the discussion:

Say you set w: majority and r: linearizable/snapshot on the collection, the
client, and the transactions, and you accept snapshot isolation rather than
serializability. How bad are those remaining cases in reality, and how do these issues compare
to other databases? The final "read your future writes" error looks quite
scary and does not seem to be caused by configuration error, same with
"duplicate effects".

~~~
eternalban
"Informally, I would summarize the CAP theorem as: If the network is broken,
your database won’t work."

\- Dwight Merriman, former CEO, and "one of the original authors of MongoDB"
[1]

A word to the wise suffices. Sometimes the word in question is implied by
other words.

For those who get this oblique post, note that throwing the above _bon mot_ in
an interview session for a "distributed systems engineer" and asking for an
opinion is an excellent way to differentiate between Peter Principle and
Principal Engineer.

[1]:
[https://web.archive.org/web/20100903213540/http://blog.mongo...](https://web.archive.org/web/20100903213540/http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1)

------
twoodfin
Discussed previously:

[https://news.ycombinator.com/item?id=23191439](https://news.ycombinator.com/item?id=23191439)

~~~
dang
Surprisingly, it seems not to have made the front page:
[http://hnrankings.info/23191439/](http://hnrankings.info/23191439/). There's
clearly community appetite to discuss this, so we won't treat the current
submission as a dupe.

~~~
kevinburke
“ Did HN's antispam measures get a lot more aggressive recently? The last
handful of Jepsen reports have really struggled to make it to frontpage,
despite significantly higher vote-to-age ratios than comparable posts. Once
they're on FP, they reliably hit top 10, but Dgraph's (1/2) ”
[https://twitter.com/jepsen_io/status/1261640852666855426](https://twitter.com/jepsen_io/status/1261640852666855426)

~~~
dang
Funnily enough I emailed aphyr earlier this afternoon to let him know that the
current submission was at #1 on HN and that
[https://news.ycombinator.com/item?id=23285249](https://news.ycombinator.com/item?id=23285249)
had strangely failed to make the front page despite all the upvotes it got.
There wasn't any moderation in either case. Nothing has changed recently.
There's just a lot more randomness than people assume...2000 submissions a day
competing for the same 30 slots creates a lot of weird high-pressure effects.

------
matthewborden
Our company migrated away from MongoDB, here's a talk about how we did it, in
case you're thinking about what is involved and how to do it safely:
[https://www.youtube.com/watch?v=Knd3m2qh0o8](https://www.youtube.com/watch?v=Knd3m2qh0o8)

------
mmackh
Ubiquiti used MongoDB for their Cloud Key Gen1 series. When there was an
unexpected shutdown, there was a random chance it would lose its configuration
[1]. If your SD backup didn’t work, you’d lose the configuration for all WiFi
hotspots. If you did client installs like I did, this was a total nightmare.
How did they solve it? They released new, more expensive hardware with a battery
backup acting like a UPS, and never solved the Gen1 issues. Imagine your phone
corrupting after a hard reset. Thanks, Ubiquiti & MongoDB.

[1] [https://community.ui.com/questions/MongoDB-corrupt-after-eve...](https://community.ui.com/questions/MongoDB-corrupt-after-every-powercycle-Cloud-Key/6be0a9ad-0049-4a1a-8f73-35cac8b531f9)

------
numlock86
If you want to be "that guy" at parties, ask people what MongoDB is trying to
solve. If they bring up the typical "NoSQL document store" stuff, ask them why
you'd want to use MongoDB for that.

------
KingOfCoders
MongoDB uninstalled our cloud hosted cluster once and the site was down and we
needed to setup a large database from backups. Their response was very
unhelpful. I would never touch MongoDB again.

------
whoevercares
Regardless of tech, MDB is a weird stock that just keeps going up steadily.

~~~
miked85
I have never understood the stock price. I tried shorting at one point, that
was a mistake.

------
aphyr
It looks like relatively few people clicked through to read the analysis
itself, so @dang's kindly offered to repost it. You can find the analysis
here:

[https://jepsen.io/analyses/mongodb-4.2.6](https://jepsen.io/analyses/mongodb-4.2.6)

... and the corresponding HN thread here:

[https://news.ycombinator.com/item?id=23290844](https://news.ycombinator.com/item?id=23290844)

------
m0zg
Dan Luu suggested on Twitter that MongoDB trolled Kyle into running Jepsen
against it again. I think they've made a mistake though. :-)

------
jwr
If you're looking for MongoDB done right, it does exist and it's called
RethinkDB. For some reason it didn't catch on and become popular — but it's
nicer, and most importantly, it doesn't lose your data.

Data point: I have been running my production system (a fairly complex SaaS)
on RethinkDB for the last 4 years.

~~~
neximo64
RethinkDB is no longer supported; that's its major caveat.

~~~
jwr
Yes. Although the degree of "support" always depends on how much you pay for
it :-) I doubt MongoDB is "supported" in the way most people understand that
word.

From my point of view, RethinkDB is not regularly developed and improved.
There is progress, but it's slow. Which is a pity, because it's a really good
database, and one that tries really hard to be correct above all else.

The only other correct distributed database with strict serializable
guarantees that I know of is FoundationDB, which is nowhere near as easy to use
as RethinkDB is (but it's somewhat easier with their document layer, which
pretends to be MongoDB, just done right).

------
rmdashrfstar
Main argument for using document-oriented databases:
[https://martinfowler.com/bliki/AggregateOrientedDatabase.htm...](https://martinfowler.com/bliki/AggregateOrientedDatabase.html)

------
jtdev
It seems that the only tangible benefit remaining for DocumentDBs over SQL
platforms (PostgreSQL, SQL Server, etc.) is scalability. Jr. devs thinking
they can have a career in software dev without learning SQL is not a benefit.

------
pier25
Does anyone have a recommendation for a NoSQL database?

[https://news.ycombinator.com/item?id=23253870](https://news.ycombinator.com/item?id=23253870)

 _(not Mongo obviously)_

~~~
balfirevic
This question sounded familiar - turns out I replied to it in another thread:
[https://news.ycombinator.com/item?id=23286054](https://news.ycombinator.com/item?id=23286054)

To repeat my (non)answer:

There is no way to recommend a NoSQL database without knowing what you need it
for, because NoSQL databases are highly specialized systems. If you need a
general-purpose database, use an SQL one.

It's kind of a weird question, now that I think about it. Why would anyone
seek out a database based on what it doesn't have?

~~~
lmm
I'd actually say the reverse. SQL databases are highly specialised datastores:
they make sense if you need one particular transaction model and one
particular query language and are prepared to coerce your data into one
particular model to do so.

If you're starting from just "I need to store some data" I'd look to e.g. Riak
or Cassandra before looking to an SQL database.

~~~
jatone
SQL DBs are not specialized... they're incredibly general.

You are never starting from "I need to store some data"; you're always going to
start from "I need to store and read some data". Otherwise /dev/null would work,
if you are never going to read the data back.

The problem with Cassandra and Riak is precisely the read side of the
problem, which quickly degrades the performance of those systems.

I've used both Cassandra and PostgreSQL at scales most companies never reach.
I'd only touch Cassandra for immutable time series data, and only if that
data was too large to fit on a single server and I didn't care about
consistency. Everything else is a SQL RDBMS.

~~~
lmm
For simple reads, the SQL model forces significantly worse performance: MySQL
benchmarks found that 75% of the time for a pkey lookup was spent on parsing
the SQL. For more complex querying, SQL databases _can_ be fast... and they
can also be extremely slow, and you can't tell for any given query just by
looking at it.

The much-vaunted consistency comes at a significant cost: index updates block
writes, and more insidiously, it's very easy to be surprised by a deadlock or
a stale transaction with a long-running query. I've seen an SQL database stop
committing any new writes because someone ran a seemingly innocuous query 23
days ago. And a lot of the time - including every web use case I've seen - you
can't actually make any real use of those consistency guarantees.

Writing either a transformation pipeline that serves the same function as a
secondary index, or a deliberate map-reduce style aggregation, takes more up-
front effort. But it means you understand what's actually going on a lot more
clearly and are much less likely to hit that kind of unpleasant surprise.
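To make the "transformation pipeline instead of a secondary index" idea concrete, here is a minimal Python sketch, assuming a plain in-memory dict stands in for the key-value store. `UserStore`, `by_email` and `users_per_domain` are hypothetical names invented for illustration; they are not any particular database's API. The write path maintains the index explicitly, and the aggregation is written as an explicit map-then-reduce pass.

```python
from collections import defaultdict


class UserStore:
    """Hypothetical key-value store whose secondary index is kept up to date
    by the application's write path rather than by the database.

    Assumes every record has an "email" field and that emails are unique.
    """

    def __init__(self):
        self.primary = {}   # user_id -> record (the "table")
        self.by_email = {}  # email -> user_id (the hand-built secondary index)

    def put(self, user_id, record):
        old = self.primary.get(user_id)
        # Drop the stale index entry when the indexed field changes.
        if old is not None and old["email"] != record["email"]:
            del self.by_email[old["email"]]
        self.primary[user_id] = record
        self.by_email[record["email"]] = user_id

    def get_by_email(self, email):
        user_id = self.by_email.get(email)
        return None if user_id is None else self.primary[user_id]


def users_per_domain(records):
    """Map-reduce style aggregation: emit each record's email domain (map)
    and sum the counts per domain (reduce)."""
    counts = defaultdict(int)
    for record in records:
        domain = record["email"].split("@", 1)[1]
        counts[domain] += 1
    return dict(counts)
```

Nothing in `put` is atomic: a crash between updating `primary` and `by_email` leaves them out of sync. Handling that failure mode explicitly is the up-front effort the comment refers to.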

------
hartator
[repost - asking for help] I am disappointed with the direction MongoDB has
taken these past few years. The cost of going ACID shows in benchmarks [1], and
it's not advisable if you are using MongoDB for stats and queues. (No one uses
MongoDB for financial transactions despite the changes.)

And the recent change to a restrictive license is worrisome as well. I have
been thinking of forking 3.4 and bringing it back to “true” open source and
awesome performance. (If any C++ devs want to help out, reach out to me!
username @gmail.com)

[1] [https://link.medium.com/PXIeZfhhH6](https://link.medium.com/PXIeZfhhH6)

~~~
toomuchtodo
Why not use PostgreSQL instead? It supports a JSON document data type
natively. It also has exceptional stewardship as an open source project.

Mongo should never be a first choice, but a last choice for edge cases.

~~~
aeonsky
Postgres has terrible indexing with JSON. It doesn’t keep statistics, so simple
queries sometimes take much longer than expected because the query planner
doesn't know much about the data.

~~~
pletnes
DB noob question: if you know that you should be indexing on a json attribute,
can’t you put it into a «proper column» and index there?

~~~
Mister_Snuggles
There are a number of ways to do this:

* Extract the attributes you're interested in into their own columns, index these. With the extraction happening outside the database, this is the most flexible option.

* Similar to above, use a trigger to automatically extract these attributes.

* Also similar to above, use a generated column [0] to automatically extract these attributes.

* Create an index on the expression[1] you use to extract the attributes.

My use of JSON in PostgreSQL tends towards the first option. This works well
enough for cases where documents are ingested and queried, but not updated.
The last three options are automatic - add/change the JSON document and the
extracted/indexed values are automatically updated.

[0] [https://www.postgresql.org/docs/12/ddl-generated-columns.htm...](https://www.postgresql.org/docs/12/ddl-generated-columns.html)

[1] [https://www.postgresql.org/docs/12/indexes-expressional.html](https://www.postgresql.org/docs/12/indexes-expressional.html)
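The last option (an index on the extraction expression) can be sketched with Python's standard-library `sqlite3` module. This is a stand-in, not PostgreSQL: SQLite's `json_extract` plays the role of PostgreSQL's `body->>'customer'`. The sketch assumes the SQLite build ships the JSON functions (built in by default since SQLite 3.38). The table `docs`, the index `docs_customer_idx` and the helper functions are hypothetical names.

```python
import json
import sqlite3

# Assumption: SQLite stands in for PostgreSQL here, and its build includes the
# JSON functions (json_extract, json_set).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Index on the expression that extracts the attribute; the JSON document
# itself remains the single source of truth.
conn.execute(
    "CREATE INDEX docs_customer_idx ON docs (json_extract(body, '$.customer'))"
)

conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(d),) for d in (
        {"customer": "acme", "total": 42},
        {"customer": "globex", "total": 7},
        {"customer": "acme", "total": 13},
    )],
)

# The WHERE clause repeats the indexed expression verbatim so the planner
# can match it against the index.
LOOKUP = (
    "SELECT json_extract(body, '$.total') FROM docs "
    "WHERE json_extract(body, '$.customer') = ? ORDER BY id"
)


def totals_for(customer):
    """Return the 'total' of every document whose 'customer' equals customer."""
    return [total for (total,) in conn.execute(LOOKUP, (customer,))]


def query_plan(customer):
    """Return the detail column of EXPLAIN QUERY PLAN for the lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (customer,))
    return [row[-1] for row in rows]


# Changing the document updates the indexed value automatically.
conn.execute(
    "UPDATE docs SET body = json_set(body, '$.customer', 'initech') WHERE id = 2"
)
```

After the update, `totals_for("initech")` finds the modified document with no separate write to a side column. `query_plan` lets you check that the planner is actually using `docs_customer_idx`.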

------
codecamper
<rant>

This corruption is brought on by the stock market.

Have a look also at Shopify. They go and tack on 2% fees when customers use
Google Pay or Apple Pay to checkout with. They recently announced that FB
would be pulling ecom sales within app, and yet Shopify plans to charge 2% on
top of FB fees. That's what I could gather despite the pricing being rather
opaque.

Is this a step forward or backwards? Charging 2% / transaction for modern
Internet protocols running on cheap hardware across a public network?

</rant>

------
jpxw
Obligatory
[https://www.youtube.com/watch?v=b2F-DItXtZs](https://www.youtube.com/watch?v=b2F-DItXtZs)

------
crackinmalackin
Can anyone share any positive experiences with MongoDB? I wouldn’t expect
MongoDB to be perfect, any more than any other piece of tech, but the unanimous
hatred for it seems a little overblown. Not trying to discredit the bad
experiences people have had with it. Just curious to know where people are
using it successfully.

------
Hydraulix989
This has been a known issue for a while:

[https://hackingdistributed.com/2013/01/29/mongo-ft/](https://hackingdistributed.com/2013/01/29/mongo-ft/)

MongoDB: Broken By Design

~~~
threeseed
Might want to read up as this involves a completely different set of issues.

And most of those listed in the blog were fixed many years before 2013.

~~~
Hydraulix989
Actually, I read both articles. In fact, the author of the first article was
my very own distributed systems professor in school. The persisting issue in
both articles is the lack of a rigorous specification of when a write has
actually completed. Both articles point out that a fault-tolerant database
should be ACID compliant, and on that point MongoDB does not live up to its
claims.

------
etxm
MongoDB is the /dev/null of databases

------
therealdrag0
How is Cassandra as an alternative to MongoDB?

~~~
jb3689
I mean, they are completely different. MongoDB is more or less a traditional
RDBMS with automated failover that is trying to staple on more advanced
features. Cassandra is a masterless DynamoDB-ish database with features like
hinted handoffs. You really need to know how consistency and distributed
systems work if you're looking to pick Cassandra. It's a great implementation;
you just can't compare it to MySQL/Postgres/etc. like you can with Mongo.

~~~
ianamartin
wat.

I hope this is a joke.

------
gigatexal
Typical HN posts of late hating on Javascript and MongoDB from database
elitists -- the thing is there's a tool for a job and as engineers we need to
figure out what tool best suits our use cases. It could very well be a NoSQL
database such as Mongo or a relational one like Postgres or MySQL.

~~~
calcifer
> the thing is there's a tool for a job

Really? Which job do you believe needs a _"maybe store some of this data,
sometimes"_ kind of database?

~~~
Andrew_nenakhov
I'm not defending MongoDB in any sense, and I have had stern talks with some of
my junior developers who were too eager to try out this new hot Mongo thingy on
a new website, but there are plenty of such jobs.

For example, climate data gathered from hundreds of thousands of devices every
minute can easily survive losing some of it. Or some astronomical observation
data.

I wouldn't choose MongoDB for it, though.

~~~
jatone
Your example is a perfect use case for PostgreSQL via the TimescaleDB
extension.

~~~
Andrew_nenakhov
I actually agree. I love PostgreSQL, and we've been using it for all our
projects since our company was founded (well, except mobile apps, obviously),
and it has never failed us.

