
Was MongoDB Ever the Right Choice? - pmoriarty
https://www.simplethread.com/was-mongodb-ever-the-right-choice/
======
matthewmacleod
I would argue that MongoDB is not—and has never been—the _best_ choice for
solving any particular technical problem. But it had some other "advantages"
over other, better solutions – in that it was easier to set up, didn't require
schema definition, had a passable clustering story etc.

I have worked with at least one company that had been built using MongoDB as a
primary data store from day one. This caused untold pain later on, but the
trade-off is that it likely allowed the company to exist at all – the founder
was more of a domain expert than a technical expert, but was able to use it to
scale their idea pretty quickly without having to pay much attention to all
that tedious "reliability" and "safety" nonsense :)

That said, it's not something that an experienced developer should be using
for anything nowadays, and the solution might be to ensure that competing
alternatives (like Postgres) can learn from why MongoDB became popular and
seek to solve some of the pain points in their own implementations.

~~~
rwmj
I once worked for a company where Lotus Notes fulfilled a similar role (this
was back in the late 1990s). Eventually they ran an entire free-to-sign-up
web-based email system using 4 giant Notes instances. It was an absolute
nightmare, but the company would probably never have happened otherwise,
because Notes was all that the founders knew. My job was managing the
migration off it to a normal SQL database, which took the best part of a year.

~~~
k__
I worked in such a company too.

Every time we wanted to do something new, a founder would say "Domino can do
this!" and then they would spend the whole weekend setting it up for what we
were trying to do.

All because we were some kind of IBM partner and they wanted to please the
bigwigs.

~~~
teh_klev
Once upon a time in a job far away, about 15% of my time was allocated to
being the company sysadmin, and, as all good sysadmins do, I reduced that time
to about 5%... until...

One day, despite my protests, I had to bin our perfectly fine and
well-maintained/loved Exchange server for fricken Notes/Domino, just because
one of the investors had some free licenses. After the migration, everyone
hated it and I was persona non grata, seeing as it was me who had to switch us
over. It
never worked properly. I left not long afterwards and the poor sod who was my
replacement had the joy of looking after that dumpster fire.

edit: opening sentence

~~~
fouc
I wonder if that poor sod will be an HN reader too, and end up seeing your
post

------
mailslot
I went to production with Mongo and it exceeded my expectations. It operated
without a single issue for years under high load. It was a dream to administer
and a vital piece of a multimillion dollar franchise.

Everybody else I know, however, has had nothing but headaches.

Their headaches stem from the same reasons I’ve seen people fail with
Cassandra, Redis, or Spanner: if you can’t adjust to the limitations and the
paradigm shift, you get no benefits. And an ORM often makes everything worse.

The “no” in NoSQL doesn’t seem to stop people from modeling join relationships
in Redis, or chaining distributed queries with fully consistent writes on
Cassandra.

I’m on a project right now where a developer has selected an ORM for
PostgreSQL that forgoes joins. In one case they’ve managed to generate about
100 queries where a single query is all that’s needed: 2ms vs. 800+ms. That
individual is incapable of using something more complicated. Substituting
Mongo as-is would make everything worse, and they’d triumphantly proclaim how
terrible NoSQL is and then write an article.
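
A minimal sketch of that N+1 pattern, assuming hypothetical users and orders
tables:

    
    
      -- what the join-less ORM emits: one query per parent row
      SELECT id, name FROM users;                -- 1 query
      SELECT * FROM orders WHERE user_id = 1;    -- ...then N more
      SELECT * FROM orders WHERE user_id = 2;
    
      -- the single query that does the same work in one round trip
      SELECT u.id, u.name, o.*
      FROM users u
      LEFT JOIN orders o ON o.user_id = u.id;
    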

I feel confident using Mongo for any task. I don’t feel confident letting most
of my peers use it.

There are a LOT of legitimate gripes, but no article I’ve read mentions them.
It’s always the same superficial complaints from ten years ago. If you can’t
get past those, choose another tool. End. Of. Story.

~~~
neves
I'd really like to read an article written by you talking about Mongo
limitations.

------
dijit
Since everyone is sharing their opinion and experience with mongodb I think
I’ll share mine.

As an appeal to authority I would like to mention that I have relevant
vocational qualifications on the subject (more geared towards scalability and
operations). Although I don’t believe it really matters - it will to those who
assume I don’t understand best practice.

MongoDB itself is not /really/ a valid choice in many scenarios that it was
painted as solving. Their only fault is overzealous marketing: it has (in my
opinion) very clear pain points that should be avoided, but those pain points
are antithetical to why many people used it in the first place.

Most people pick up Mongo because it’s painted as being “beginner developer
friendly”. I don’t mean new developers; I mean that picking it up and running
with it, without understanding it, was made to be incredibly easy. But MongoDB
itself needs you to understand your data patterns before you start adding
shards, so the technology itself depends on you actually sitting down and
designing an architecture with that understanding. These goals are at odds
with each other.

In MongoDB (as it was when I was using it in full prod 6+ years ago) you
-needed- to understand how your data was going to grow and how it would be
queried long before you ever created an index. You could not grow after
creation. But using it as a plain document store, with no searching and heavy
sharding on the document ID, is the best way to go. And in that scenario it is
much better than most competitors.

In nearly every /other/ scenario it’s a less favourable choice than another
technology of some variety.

I would push back on the data loss point, but I think that if it’s not a
solved issue yet it will be, and I’m fairly certain you can configure it to be
slower but correct (my memory is bad).

I am not a MongoDB advocate, nor do I hate the technology outright. I strongly
dislike how it was marketed as being a panacea.

And for the same reason I avoid PHP, I will attempt to avoid MongoDB.

(As in; it can be done well but the majority of cases will be poorly
implemented)

~~~
scarface74
I’m a big fan of Mongo for the use case you described - searching by ID and
all information in one document.

But people don’t seem to understand that there are plenty of scenarios where
you really either don’t know the schemas in advance and/or the “schema” is
defined by an external source.

I worked for a company that sold software that allowed users to create forms
that could be filled out either on the web or via a mobile app.

The user created the form, and the schema and the indexes were created on the
fly - one collection per type of form. What would an RDBMS have bought us?

~~~
felixfbecker
PostgreSQL's JSON columns are pretty powerful
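
A quick sketch of what that looks like with a jsonb column and an expression
index (the table and field names here are hypothetical):

    
    
      CREATE TABLE people (id serial PRIMARY KEY, data jsonb);
    
      INSERT INTO people (data)
      VALUES ('{"name": "Ada", "age": 67, "sex": "F"}');
    
      -- query an individual JSON field
      SELECT * FROM people WHERE (data->>'age')::int >= 65;
    
      -- index an individual JSON field
      CREATE INDEX ON people (((data->>'age')::int));
    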

~~~
mr_toad
And if you create a table with a single ID column and a single JSON column
you’ve essentially re-invented a NoSQL database. But I guess you can pretend
it isn’t.

~~~
scarface74
And it’s a lot worse and the tooling isn’t as robust...

------
skywhopper
As with anything, it depends on the project. I’m working on an internal
service that uses Mongo as a single merged cache for a lot of mostly
unchanging data from various data stores (each with different credentials,
distributed around the world) that we otherwise have to fetch through multiple
comparatively slow API calls. For this, Mongo is perfect: no messing with
schemas as they change, unannounced, from upstream; I can index just the
fields I want to search on; Mongo will expire things for me on a TTL; the
query API is simpler than the API we’re caching from; and we get results
20-50x faster. We looked into FoundationDB and Postgres, but they require a
lot more initial setup. ElasticSearch is the closest solution, but it needs a
lot more info about the schema up front, and its query language is a nightmare
compared to Mongo’s, for no real gain in functionality that I can see.

Is Mongo the right tool to build your entire business on top of? Probably not,
but it can be the best tool for the right job.

~~~
james-mcelwain
Elasticsearch can index documents dynamically, and doesn't require a schema to
create an index. Dynamic data types for fields may not always produce what you
want, but it's possible to define a partial schema for the fields that are
important and let Elasticsearch handle the rest.

The query language is verbose but I would hesitate to call it a nightmare. You
can always search using the Lucene query language, and SQL support is landing
sometime soon.

~~~
AznHisoka
The query language can be a bit quirky if you're coming from a SQL background,
but you're absolutely right. It is nowhere close to a nightmare.

------
jandrewrogers
My anecdote: several years back I had a test lab for measuring the performance
and scalability characteristics of various geospatial databases. We added
MongoDB to the mix a couple years after they released geospatial support.

We always verified basic correctness with a new database by inserting several
billion geometries as fast as the database would accept them, reading the
entire data set back out, and comparing it to the original data set we
inserted for any discrepancies. MongoDB never passed this test. It would
apparently lose records semi-randomly every time, so we removed it from the
test set. It was the only database we tested that had this issue.

~~~
bonesss
I evaluated several distributed databases for a healthcare-related system. The
ability to lose messages in sharding scenarios, and the specifics of how one
would recover them, made me think I could never support MongoDB for anything
more serious than Reddit.

~~~
TimFogarty
The Jepsen tests [1] have been run against MongoDB - while older versions
presented edge-case opportunities for data loss, that's no longer the case
with recent versions. The Jepsen tests also specifically test sharded
clusters. From Aphyr's report:

> MongoDB 3.6.4’s sharded clusters offer comparable safety to non-sharded
> deployments.

These tests are now integrated into MongoDB's regular test suite. Maybe
MongoDB wasn't the right choice for you at the time you were evaluating it,
but I just want to point out that MongoDB has matured and improved a great
deal.

(Disclaimer: I work for MongoDB)

[1]
[https://jepsen.io/analyses/mongodb-3-6-4](https://jepsen.io/analyses/mongodb-3-6-4)

~~~
jandrewrogers
FWIW, the data loss I was referring to happened on a single node. There may
have been issues in a distributed environment as well but we never got there.

~~~
TimFogarty
When did you perform the test? Can you remember what MongoDB version you were
using?

------
mattparlane
I'm still in charge of a production system serving around 2,000 small to
medium websites from a 2-machine MongoDB cluster. It's been running on MongoDB
since around 2010 and we have NEVER had any issues.

I accept that defaulting to unacknowledged writes was a bad decision, but IMHO
if you deploy a new database without reading the documentation, you have
bigger issues.

The reality is that there are some places where speed of movement is important
and referential integrity is just not that big a deal. We're not all building
banking systems.

~~~
jacques_chester
1. Writing DDL is not fun. It's just not very hard.

2. You can go from strict guarantees to looseness safely, when you
demonstrably need to. The reverse isn't true -- it's easy to wind up
realising, much too late, that you actually needed particular guarantees that
you didn't even think of.

Relational databases didn't become incredibly popular by accident. It's
because they were a drastic improvement -- theoretically and empirically -- on
the generation of NoSQL databases that preceded them.

~~~
scarface74
Writing DDL is hard when you don’t know in advance what the schema is, or when
the schema is changing frequently.

A lot of things “become popular” but that doesn’t necessarily mean they are
good.

~~~
aeorgnoieang
All the SQL databases with which I'm familiar support large string columns.
What's the downside to just stuffing your schema-less or volatile-schema data
in one of those?

~~~
scarface74
The downside is that you can’t easily query against the individual fields and
you can’t index individual JSON fields.

I’m bringing up C# again...

Querying with Mongo using the Mongo driver in C#:

    
    
      var people = db.GetCollection<People>("people").AsQueryable();
    
      var seniorMales = from p in people where p.Age >= 65 && p.Sex == "M" select p;
    

Querying an RDBMS with EF:

    
    
      var people = context.People;
    
      var seniorMales = from p in people where p.Age >= 65 && p.Sex == "M" select p;
    

Both queries get translated to their respective query languages and run on the
server. C# enforces the types in either case and you get compile time type
checking and IDE Intellisense. In either case you can index the Age and Sex
fields.

NoSQL databases aren’t “schemaless”. Mongo understands the schema of JSON data
and can query against it just like an RDBMS understands rows and columns.

------
TomK32
It is still in production as the main database in a startup I joined in 2010,
where I kicked off their software's development. I've been through smaller ups
and downs, but we never lost data. The reason why I loved it back then and
still do today (I'm using MongoDB in my own little webapp) is the speed of
development. The few data migrations that I had to write in over eight years
were nothing compared to what you'd have to do in a relational database. With
new features we were always able to keep the db schema in some state of
fluidity on our dev and staging machines until we were happy with the data's
architecture. No back and forth like you'd have with a relational database.

We have only a dozen collections in the db, and only four were pulled during a
normal user session, but those translate not just into four models/classes but
a few more embedded ones. That makes total sense because, except for
reporting, we didn't pull out the embedded data, even though that has become
much smoother with the aggregation framework.

~~~
danpalmer
It sounds like your product was in the sweet spot where there isn’t much data
complexity or evolution, and the schema is understandable and easy enough to
modify over time without strictness.

The product I work on has ~400 tables active at the moment, and ~800 that have
ever existed in it over the last 6 years. We depend on the database schema to
reduce complexity at the application level.

I don’t think the complexity would be manageable at the application level if
we couldn’t do this. At least not with the ~6 engineers working on it.

~~~
TomK32
That's about six times the engineering power we have here.

Our data is indeed simple, and besides data collection via a web app the focus
is on reports with that data; but always in small, sensible chunks and never
across the whole available data for a model.

------
mikekchar
I always supposed that document databases were based on object databases from
the early 90s... And that the "NoSQL" craze was simply a continuation of that
progression. I mean there is a difference between NoSQL and no schema. The
early object databases had schemas. The reason they wanted to abandon SQL was
that they believed they never wanted relational data: they just wanted a
persistence layer for their business models.

I might be wrong about that, because I pretty much ignored DBs in the late
'90s and '00s, so I never really followed what was happening. However, one
thing I'm
absolutely sure about: MongoDB had very few features that were really novel.
I'm not really sure why people think everything started there (especially
someone who knows about 4GL ;-) )

~~~
gfody
Mongo's debut was well timed to ride the JSON crazewave.

~~~
giornogiovanna
Sorry for my ignorance, but _what_ JSON craze?

~~~
jstimpfle
As a serialization format, JSON is horrible. Way too loose regarding
formatting and way too much noise. All that is typically needed is a standard
text format for relational data. I.e. a fixed CSV.

The only "advantage" to JSON is that it maps directly to the object trees most
developers use in their scripted programs (which is misguided IMHO).

~~~
arethuza
Not sure I would call CSV a "standard text format" - I've seen many, many
problems over the years with badly formed CSV files and bad Unicode handling.
CSV appears to be an "almost standard" where 98% of the time it is fine and
the remaining 2% are an utter nightmare.

~~~
k__
Haha, yes, I remember building my first HTTP API in 2010.

It had to deliver three kinds of formats: CSV, JSON, and XML.

JSON was a one-liner, then came CSV and then XML.

But all the consultants working on customer projects said CSV was the most
important one because it's the industry standard.

When the consultants sent me example CSV files from like 5 customers I
couldn't believe they all had a different format.

------
a13n
These "caveats" are mostly false, or outdated.

> Loss of transactions

MongoDB 4.0 supports ACID transactions:
[https://www.mongodb.com/transactions](https://www.mongodb.com/transactions)

> Loss of relational integrity (foreign keys)

> Having a database enforce these relationships can offload a lot of work from
> your application, and therefore from your engineers.

I've never seen anyone use MongoDB without also using something like Mongoose
where you get all this for free. Zero work for your engineers.
[https://mongoosejs.com/](https://mongoosejs.com/)

> Lack of ability to enforce data structure

Again, Mongoose. The work doesn't fall on your engineers, it falls on an
awesome, heavily-used, heavily-tested library.

> Custom query language

Can you give an example of something that you realistically would want to do
in SQL that you can't with a JSON query?

> Loss of tooling ecosystem

This might be the only valid caveat in the entire article.

~~~
matwood
The problem with thinking a library like Mongoose satisfies those constraints
is that it only works with a single application. Every database I've worked
with that made it beyond a prototype outlived the original application. So you
either end up building an RDBMS app/api in front of mongo for every new
application to use or you have to translate those constraints to every client.

I'm of the opinion to let the "awesome, heavily-used, heavily-tested" RDBMSs
built to manage data and constraints do just that.

~~~
flurdy
True, using a library to hide the limitations of a DB smells bad. It is fixing
the problem in the wrong location.

I may have misread your comment, but rarely does more than one client
application access a database, especially in today's often-microservices
architecture. So using a library as mentioned is fine, and as the app
evolves/gets rewritten you can keep the library or replace it with a similar
one.

Over 10+ years ago I did join a few projects whose existing architecture had
evolved to multiple applications accessing the same database. They were a
nightmare, but that is rare to encounter these days, as most people have
learned that one app = one DB. (Data warehousing is a possible exception, but
mostly those datasets are exported instead, as are streams for machine
learning.)

Probably preaching to the choir but multiple clients mean feature
freeze/deadlock, no DB refactoring and spaghetti architecture.

~~~
aeorgnoieang
> rarely do more than one client application access a database

That has not been my experience!

------
gdulli
> Bandwagon effect – Everyone knows this, and yet it is still hard to fight
> against. Just make sure that you’re choosing a technology because it solves
> real needs for you, not because the cool kids are doing it.

> Mere newness bias – Many software developers tend to undervalue technologies
> they have worked with for a long time, and overvalue the benefits of a new
> technology. This isn’t specific to software engineers, everyone has the
> tendency to do this.

> Feature-positive effect – We tend to see what is present, and overlook what
> isn’t there. This can wreak havoc when working in concert with the “Mere
> newness bias”, since not only are you inherently putting more value on the
> new technology, but you’re also overlooking the gaps of the new tech.

This is so much what I've always wanted to and tried to say on the topic.
Adopting new technologies thoughtfully rather than reflexively is something
I'd love to find out about a team before joining one again.

------
TimFogarty
Great article - it's definitely vital to fully understand the tradeoffs of any
technology you choose, and not just go with the latest fad.

I work at MongoDB, so I do want to mention that MongoDB has changed a lot over
the past few years. MongoDB now has multi-document ACID transactions. Also,
schema validation means you can enforce a strict data structure if you want.
On top of that there have been improvements in data durability and consistency
(the Jepsen tests are now integrated into the MongoDB test suite).

As mentioned in the article MongoDB has matured a great deal. It's far more
mature and fully-featured in 2019 than it was in 2012.

------
joduplessis
I hate these kinds of articles, however, I do recognise that people are
allowed their opinions, which is good. But these sorts of articles take a very
polarised view of the world.

> Does this tool solve a real problem for us

> Do we thoroughly understand the tradeoffs

Well, no. But neither did running JS on the server (along with numerous other
similar examples). I've used MongoDB plenty of times before (in production
with 1 issue to date) and I absolutely will again.

~~~
aeorgnoieang
> running JS on the server

Huh? Enough people apparently knew JavaScript well enough that running it on a
server was really useful. Why's that not a real problem? (Not being able to do
so.)

------
leejo
In my previous job, at a payment service provider, I wrote a "velocity"
system. This was required to block transactions that met certain conditions
within a certain time frame. A typical example was: block transactions that
are from the same IP if that happens more than 5 times in a 10 minute period.
Or: block transactions from the same card number if it is used more than 3
times in a minute.
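
As a sketch, one of those rules expressed as a plain SQL check (the
transactions table and column names here are hypothetical):

    
    
      -- block if this card was used more than 3 times in the last minute
      SELECT count(*) AS recent_uses
      FROM transactions
      WHERE card_number = $1
        AND created_at > now() - interval '1 minute';
    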

The problem was that the "velocity" blocking requirements could be completely
arbitrary, and could be anything a merchant required as long as they used any
of the data that was available from the transaction/previous transactions -
block a transaction if it's more than 500 GBP and we've had 3 other
transactions greater than 500 GBP in the same country in the previous 10
minutes.

This essentially meant we would either have to do dev work whenever a merchant
had a new "velocity" rule, write our own complex framework to handle the
addition of arbitrary rules, or just stuff the data into a schemaless NoSQL
store and then leverage the power of the engine's query syntax as part of the
merchant's configuration.

We went for the schemaless NoSQL solution and we used mongodb v1.6 - around
2010 IIRC, before it reached peak hype, before it "fixed" a lot of the out of
the box defaults, and before it got a lot of hate. It worked perfectly and ran
from the time we deployed it until I left the company a few years later. Maybe
I left a mass of technical debt, but the solution ended up being so simple,
and so little code, that I doubt it.

The other nice thing was that the velocity system was not essential - if the
time to do a velocity check took more than 0.05s we would ignore it. If the
backend wasn't responding we ignored it. If the write to the storage failed it
didn't matter. We didn't need to keep more than a couple of hours worth of
data. If we lost all of that data it didn't matter.

I don't know if I'd take the same approach now, nine years later, but at the
time the use of mongodb worked perfectly. It was only a couple of weeks work,
and solved the problem elegantly. So in response to the original question:
yes, but as usual RTFM and make sure the pros and the cons fit your use cases.

~~~
rujuladanh
If you have almost no guarantees to provide, almost any solution would have
worked.

In other words, you claim it worked for years, but anything would have if you
can lose all the data, reboot it, have no real-time requirements, etc.

In summary, it is not really a useful data point on whether MongoDB is useful
or not.

~~~
leejo
> If you have almost no guarantees to provide, almost any solution would have
> worked.

Within reason. If the failure rate is 1 in 1000 then it's probably not as big
a concern as 1 in 10, _depending on your use case_.

> In summary, it is not really a useful data point on whether MongoDB is
> useful or not.

As I said in the last sentence - it worked perfectly. In other words the "no
guarantees to provide" never actually came up and I don't recall us having any
data loss, response time, or write issues.

------
willio58
I used it before I understood anything about databases, and I got some jobs
done with it. Now that I have some knowledge, I choose PostgreSQL for most
cases. But mongodb got me up and rolling in a pretty DRY way when I first
started programming for the web.

If you get to the point where your website is outgrowing its initial mongodb
implementation, you have a very good problem on your hands.

~~~
todd3834
It doesn’t take a lot of traffic to outgrow a poorly implemented mongodb
structure with a lot of data. That’s not a fault of mongodb as much as it is
using the tool wrong. When I first started using mongodb I treated it like a
relational database and hit performance issues very quickly.

------
JustSomeNobody
Man, I remember the Mongo craze. You couldn't have a rational discussion about
databases with some of the most rabid supporters. I made a comment on a post
yesterday about tech having those few early and loud supporters who shut down
any conversation that doesn't support their chosen technology. The Mongo craze
was like that early on. It was really frustrating not being able to look at
other databases when everyone was chanting "Mongo! Mongo!" Thankfully tech has
moved on to Kubernetes and React and microservices and now we can finally have
a rational discussion about databases.

~~~
dankohn1
If you haven't watched it recently, this was always my favorite look at
"MongoDB is web scale":
[https://www.youtube.com/watch?v=b2F-DItXtZs](https://www.youtube.com/watch?v=b2F-DItXtZs)

------
gnur
Of course it was; prototyping speed on MongoDB was (and probably still is)
excellent.

Some features that don't scale are very nice to have when you don't have
scaling issues. For example, if you add tags to your documents and you want to
query on those tags (find all documents containing tag A and B), it's nice
that's just a builtin.

I haven't found a single datastore that is as developer friendly that supports
that use case, so for now, I'm sticking with mongodb for my pet project.

(if you know of a datastore that has support for this query out of the box,
please let me know)

~~~
matthewmacleod
The other answers have confirmed that Postgres will do this with array fields,
and it's good advice to follow. In my view it's also much easier to read than
MongoDB's query language!

    
    
      CREATE TABLE documents (name text, tags text[]);
      INSERT INTO documents VALUES ('Doc1', '{tag1, tag2}');
      INSERT INTO documents VALUES ('Doc2', '{tag2, tag3}');
      INSERT INTO documents VALUES ('Doc3', '{tag2, tag3, tag4}');
    
      SELECT * FROM documents WHERE tags @> '{tag1}';
       name |    tags
      ------+-------------
       Doc1 | {tag1,tag2}
      
      SELECT * FROM documents WHERE tags @> '{tag2}';
       name |    tags
      ------+-------------
       Doc1 | {tag1,tag2}
       Doc2 | {tag2,tag3}
       Doc3 | {tag2,tag3,tag4}
    
      SELECT * FROM documents WHERE tags @> '{tag2, tag3}';
       name |       tags
      ------+------------------
       Doc2 | {tag2,tag3}
       Doc3 | {tag2,tag3,tag4}
    

Postgres certainly isn't perfect, but it's usually a good answer to "how do I
store and query data" where you don't have any particular specialist
requirements.
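
One addition: if the table grows, a GIN index keeps those @> containment
queries fast, for example:

    
    
      CREATE INDEX documents_tags_idx ON documents USING GIN (tags);
    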

~~~
tnolet
Using this almost 1:1 in a production app where customers can filter by tags.
Works great.

------
djhaskin987
I've only ever heard of mongo successfully used for two use cases:

1. As a cache, like Redis

2. As a log store, like Elasticsearch

In both cases, the data is somewhat ephemeral, and not the "source of truth"
for the app. The minute it is used for holding real, customer supporting data,
things start to get dire real fast.

~~~
TimFogarty
Plenty of people are successfully using MongoDB for real, customer supporting
data at a large scale. There's a selection of users on the website for a
start: [https://www.mongodb.com/who-uses-mongodb](https://www.mongodb.com/who-uses-mongodb)

(Disclaimer: I work for MongoDB)

~~~
alexandercrohde
But we'll never know if those companies will be better off if they hadn't.
You're just making a bandwagon argument

(Disclaimer: I hate MongoDb)

------
gwbas1c
I found developing with MongoDB a pleasure, until its lack of transactions
became problematic. Fortunately, the project I was working on didn't go very
far.

At the time, I concluded that MongoDB was the "Visual Basic of Databases." It
was very easy to get something simple running, much like with classic Visual
Basic.

Quite honestly, something ACID-compliant with a MongoDB-like API is really
needed for small-scale projects and prototyping.

~~~
codepope
It's called MongoDB. ACID multi-document transactions, here, now, introduced
in version 4.0.
[https://www.mongodb.com/transactions](https://www.mongodb.com/transactions)

~~~
luckydata
Plus the feature, minus the reputation.

------
insulanian
I'm really glad this "Mongo all the things!" trend is reversing.

------
sebringj
After using MongoDB for many years, I haven't run into a use case where I
should have used it over SQL. The only benefit for me personally is rapid
prototyping, and it initially felt awesome to have Mongoose in Node be the
schema, as it was super easy to modify. But you pay for it big time as you go:
anything difficult means you'll have to do backflips and make more requests to
get things done, and you get used to very strange and verbose nested JSON
queries. It's the 10-15% of use cases that are really hard. NoSQL is
especially annoying when you need any type of join, because despite your best
planning, it still happens. There were many cases where I needed to fall back
on Postgres to get stuff done, ended up with a second source of data to sync,
and then wondered why I didn't just do it all in Postgres first.

------
bshipp
I feel like Bill Murray in Punxsutawney. Didn't we just see this article
yesterday?

[https://news.ycombinator.com/item?id=19492562](https://news.ycombinator.com/item?id=19492562)

Here was my comment from yesterday morning:

 _> Question 1: What problems am I trying to solve?_

I wish I had really thought about this when I first wrote my largest web
scraper. At the time, I was still relatively new to database design and
programming in general. This web scraper, out of the thousands I've written in
the interim, is--of course--the one that is still going strong many years
later.

I eschewed MongoDB for all the reasons given to me on the internet and,
because I was slowly gaining competence with SQL, ended up building a large
and complex pipeline to send the data right into Postgres. In retrospect, this
was a serious design mistake, and the one that I regret the most.

Although I still contend that the data did eventually need to be normalized, I
now believe that I was doing it far too early. By ingesting the JSON stream
into a parser, splitting it up, generating foreign keys, and then forcing the
whole works into a single Postgres database I severely limited the capacity of
my web scraper (and also guaranteed the need for a very powerful server to run
it).

Had I initially dumped all results into MongoDB (or some other efficient
document store) and then, separately, parsed the output into normalized SQL, I
would have dramatically simplified the operation, maintenance, and debugging
of my web scraper. Plus it would have been much simpler to spawn work jobs on
to different machines instead of trying to break up huge monolithic processes
with poorly defined endpoints. There have been many lessons learned.
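
The same "dump first, normalize later" split can even stay inside Postgres; a
sketch, using a hypothetical staging table:

    
    
      -- land raw scraper output as-is, with no parsing in the ingest path
      CREATE TABLE raw_scrapes (
          id         bigserial PRIMARY KEY,
          scraped_at timestamptz NOT NULL DEFAULT now(),
          payload    jsonb NOT NULL
      );
    
      -- separate workers later parse payload into the normalized tables
    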

In short, Mongo likely serves a very good purpose for high-speed data storage
and manipulation (although it's hardly alone in this space). However, it's
still likely not a great all-around solution and works best when supported by
an ACID-based normalized RDBMS. Unless, of course, things have dramatically
changed in recent updates.

------
_Marak_
It's been over 10 years and CouchDB has not let me down once. Still using it
in production everyday.

~~~
wmij
I also really like CouchDB and can attest to it being a reliable part of the
stack. I'm using it on a personal project (Node/Nano/Couch) that has the
potential for needing to store a lot of data but hasn't gotten there yet, so I
can't speak from experience yet on performance/scalability. It has been so far
great in production and also great for the normal CRUD parts of this project.

The main reasons I chose Couch over something relational like MySQL were
essentially: 1. to have a clear path for data to go as JSON from
server/API/client without the need for mapping; 2. schemalessness, to allow
for quick iterating and development; 3. it's easy to start with local-first
development and then roll out for deployment. Also, I hadn't worked on a stack
with a persistence strategy that was solely document-oriented, so it was a fun
and good learning experience.

A few things I learned along the way:

I still needed to create externalized "views" of the data that either combined
data from multiple documents or hid private data. I still needed to serve
lists of data in pages. More importantly, I needed to provide a lot of ad-hoc
reporting over the various data I'm storing across multiple Couch dbs.

The views and paging are all easily solvable with Couch, but they're so much
easier to implement using SQL that it feels like I'm just pushing that need
around or compensating somewhere else in the implementation. The friction
around quick reporting has made me second-guess choosing a Couch/document
model vs. something like MySQL/ORM.

The author's point of the custom query language (Mango for Couch) and loss of
tooling ecosystem has been the biggest problem for me on this project and I'm
considering migrating to a Node/Sequelize/MySQL stack just to avoid wasting
future cycles trying to quickly report on the data I'm storing. When the
project started the reporting aspects weren't as apparent as they are today
since the project has evolved and other requirements became necessary.

If anyone has experience with or recommendations for tools that can easily do
ad-hoc reporting against CouchDB, or documents in general, I'm interested in
hearing about them.

~~~
_Marak_
I've had decent success so far with CouchDB 2.0 and Mango for performing ad
hoc queries.

I believe Mango will automatically create the view index on the fly for any
query you make and save the index for later. Before CouchDB 2.0 this would
usually have to be done manually before the query was run.

------
ineedasername
> Feature-positive effect – We tend to see what is present, and overlook what
> isn’t there.

This is such a problem. Just yesterday someone came to me and said, "We're now
using x to do y. It makes z a lot easier. Just wanted to let you know so you
can make the appropriate changes to your analysis & reporting applications."

Say what now? I'm embedded within the operational unit. I know _a lot_ about
day to day operations in & out of the systems in use, and how they ultimately
translate through in data. I asked half a dozen questions, and 4 of them were
met with "oh, we hadn't thought about that." These are deal-break questions,
things the current method handled without issue, so much so they became
invisible to the users, until they decide to make a change and realize the new
method doesn't make any provision for them.

------
_Codemonkeyism
MongoDB with its JSON model helped us develop features and iterate much faster
than the previous MySQL/PG setup, so it was a huge win for us (though we had
some pain with a few issues).

Today we use PG everywhere with JSON, though the library support for JSON in
Mongo is often still a little better than in PG drivers.

------
superwayne
The really interesting part of the article comes after the "What could have
been done differently?" headline. I think the article would have been better
if it weren't focused as much on MongoDB, because now everybody is discussing
whether the caveats still apply or ever applied.

------
pmarreck
Could this article title be rephrased "Has Postgres essentially killed the
MongoDB market?"? ;)

------
ganonm
If you need limited document storage capabilities, I'd recommend just using S3
alongside a traditional RDBMS like PostgreSQL. You can simply store object IDs
in the primary database but keep the actual document/data in S3. I used this
approach for a cloud platform I built that required the ability to store large
3D models uploaded by users. Metadata was stored in PostgreSQL and the actual
data in S3. This also facilitated generating an acquisition URL on the server
which could be triggered client-side, so that after initial creation there was
almost no primary server overhead (bandwidth or storage) for retrieval.
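
A sketch of that split, with hypothetical table and column names:

    
    
      -- metadata lives in PostgreSQL; the heavy 3D data lives in S3
      CREATE TABLE models (
          id         uuid PRIMARY KEY,
          owner_id   uuid NOT NULL,
          name       text NOT NULL,
          s3_key     text NOT NULL UNIQUE,   -- pointer to the object in S3
          created_at timestamptz NOT NULL DEFAULT now()
      );
    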

------
reustle
MongoDB was great to me for prototyping, either manually or with tools like
Meteor.

~~~
chrisweekly
Yeah, Meteor was pretty amazing when it first came out. It felt like it was
from several years in the future. As I recall, it relied on mongo on the
server and a minimongo that ran in the client. It made it trivially easy to
create universal real-time webapps, which was a game-changer for me at the
time. Reactivity has since become mainstream, but at the time it represented a
bright line / step function increase in both DX and UX.

------
ohaideredevs
The main use cases for MongoDB I have seen are custom forms, or data that can
be nested a variable number of times.

The main problems I have had with MongoDB were that, as of several years ago,
it did not integrate well with most (I would argue any in practice) data
reporting / displaying third parties.

Also, some of the more advanced queries were not at all intuitive. In fact, I
barely remember any of the syntax now. In Mongo's defense, that might have to
do with the fact that we attempted some stuff in MongoDB that we would never
attempt to do with SQL cursors.

~~~
golergka
> data that can be nested a variable number of times

I'm far from a SQL expert and have mostly done client-side work in my life,
but a SQL-based database does seem like a good match for things like these.
You just need to write more foreign key relations, and constraints are a bit
more complicated, but I see no reason to switch to NoSQL for things like
these.

------
tjpnz
While I think MongoDB is deserving of a lot of the criticism it receives I
think the bigger issue is with those who adopt it without a full understanding
of their data and the use cases they might need to implement. There are
countless stories now from people who assumed their data wasn't relational
only to discover much later on that the opposite was true. There are cases
where a document-oriented datastore could be the way to go, but everything
I've read seems to suggest that they're in the minority.

~~~
a13n
There's no reason you can't use MongoDB with relational data.

Foreign references? They've got them.
[https://docs.mongodb.com/manual/reference/database-references/](https://docs.mongodb.com/manual/reference/database-references/)

ACID transactions? They've got them.
[https://www.mongodb.com/transactions](https://www.mongodb.com/transactions)

~~~
tjpnz
But why wouldn't I just use Postgres instead?

~~~
a13n
But why wouldn't you just use MongoDB instead?

------
huffmsa
Great for prototyping, and for cases where you can contain everything in one
document, but as soon as you're referencing _model2_ from _model1_, it's
probably time to switch to a SQL-based DB.

I know MongoDB supports references, but it's like using a flathead screwdriver
on a Phillips-head screw.

------
julien_c
Very timely Show HN post: we just released Mongoku, a neat Web-based
interface:
[https://news.ycombinator.com/item?id=19500859](https://news.ycombinator.com/item?id=19500859)

------
earthboundkid
I've said it before, and I'll say it again: Mongo:2010s::MySQL:2000s.

------
pictur
MongoDB has its advantages and disadvantages, just like other databases.

------
acl777
If one researches the company's history, MongoDB came from a specific need the
founders had for another product: ShopWiki.

ShopWiki is a shopping price listing site that keeps track of prices for any
item: computers, clothing, food, etc. ShopWiki needs to be able to store,
access, AND search amongst all of these different items. So, MongoDB is the
perfect solution.

If the application being built has similar requirements to ShopWiki or a
retail site where it sells _everything_, then MongoDB IS the right choice,
because the founders basically built MongoDB for ShopWiki.

To sum it up: Is MongoDB the right choice? If the product is similar to
ShopWiki, then yes.

~~~
jd_mongodb
Untrue. MongoDB arose from the frustrations the founders had with scaling SQL
databases while building DoubleClick.

------
wnevets
There seems to be a lot of backlash over using mongodb recently. Which current
hip tech will we see receiving the same treatment? graphql? golang?

------
mustardo
No

~~~
taneq
Betteridge strikes again!

~~~
skrebbel
Except this one is an exception!

------
SlowRobotAhead
I’ve never used Mongo. How many of people’s complaints revolve around NoSQL in
general (e.g. AWS DynamoDB) vs. Mongo specifically?

------
RocketSyntax
Highly nested data with a schema that might change from site to site =
Electronic medical records.

------
sonnyblarney
I actually think NoSQL might be better for MVP and prototyping. Why? Because
they can handle churn more quickly, and you don't have to worry about scale.

Once the data layer starts to resonate around a solution ... then pick the
right horse for it.

~~~
mruts
Why not just use a SQL db that handles json and stick everything in a couple
columns? Changing databases is a lot of work, and as far as I can tell, mongo
isn’t really providing much value over SQL dbs that already support schemaless
json columns.

~~~
geezerjay
> Why not just use a SQL db that handles json and stick everything in a couple
> columns?

Honest question: what's wrong with just defining a domain model, adopting an
ORM and a serialization framework, and simply going with a conventional RDBMS?
Afaik all reference web application frameworks handle this right out of the
box.

~~~
ddebernardy
Nothing on paper, if you get the schema more or less correctly the first time.
The problem is when you need to do a schema change that affects terabytes of
data down the road.

~~~
ants_a
That is going to be a problem regardless of technology.

With an enforced schema you will at least know that all of the existing data
matches the schema. Without it you have to hope that you had zero bugs while
collecting the terabytes of data.

~~~
ddebernardy
Yeah, but it's a different type of problem.

If you've a schema, you sometimes need to rewrite an entire table as part of a
migration, and that can mean heavy engineering if you want to avoid downtime
or disabling writes during the migration. There are 1:1 relationships between
tables in the wild that wouldn't have passed a sniff test as part of an early
schema design review, but then got created regardless to avoid a lengthy table
rewrite.
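
A sketch of the usual zero-downtime shape, against a hypothetical users table:

    
    
      -- cheap: adding a nullable column doesn't rewrite the table
      ALTER TABLE users ADD COLUMN display_name text;
    
      -- backfill in small batches to avoid long-held locks
      UPDATE users SET display_name = first_name || ' ' || last_name
      WHERE id BETWEEN 1 AND 10000;
      -- ...repeat per batch, then tighten constraints at the end
      ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
    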

If you've no schema proper, by contrast, you can manage multiple variations of
what the data might look like in code. Not that such a thing is simple; it's
definitely not, for the reason you raised. But it's simpler to deploy and
migrate, or certainly might appear to be so to someone who isn't comfortable
with SQL.

Also, there's a class of apps where having a schema doesn't add much value and
NoSQL actually makes sense. Think storing and mining logs, scraped data, ML
training sets, etc. -- apps where it doesn't matter much however a big pile of
data gets stored, so long as you can shovel through it in parallel or store it
very fast.

------
suff
For any team that ever avoided a six- or seven-figure SQL Server or Oracle
license and managed to scale to more than 50 transactions per second with
horizontal scaling: yes, MongoDB was absolutely worth it.

~~~
alexandercrohde
Uh... Postgres?

------
thosakwe
In my experience, MongoDB shouldn’t be used in cases where you need relations,
or the typical guarantees that ACID databases bring.

I think a lot of the “beef” people have with MongoDB comes from past
experiences where it was used in exactly those cases.

I usually use PostgreSQL at this point, but when I use MongoDB, it’s more as a
“document store” than a “database.”

~~~
a13n
MongoDB is ACID compliant.
[https://www.mongodb.com/transactions](https://www.mongodb.com/transactions)

