
Why You Should Never Use MongoDB - hyperpape
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
======
gmjoe
> Seven-table joins. Ugh.

What? That's what relational databases are _for_. And seven is _nothing_.
Properly indexed, that's probably super-super-fast.

This is the equivalent of a C programmer saying "dereferencing a pointer,
ugh". Or a PHP programmer saying "associative arrays, ugh".

I think this attitude comes from a similar place as JavaScript-hate. A lot of
people have to write JavaScript, but aren't good at JavaScript, so they don't
take time to learn the language, and then when it doesn't do what they expect
or fit their preconceived notions, they blame it for being a crappy language,
when it's really just their own lack of investment.

Likewise, I'm amazed at people who hate relational databases or joins, because
they never bothered to learn SQL and how indexes work and how joins work,
discover that their badly-written query is slow and CPU-hogging, and then
blame relational databases, when it's really just their own lack of
experience.

Joins are _good_ , people. They're the whole _point_ of relational databases.
But they're like pointers -- very powerful, but you need to use them properly.
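
To make that concrete, here's a minimal, hypothetical sketch (SQLite via
Python, three tables instead of seven, invented schema) of the point: with
indexes on the join columns, the planner resolves each join by index lookup
rather than by scanning whole tables.

```python
import sqlite3

# Invented three-table schema; the same principle extends to seven tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts    (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users(id), title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           post_id INTEGER REFERENCES posts(id), body TEXT);
    -- Indexes on the join columns are what keep multi-way joins fast.
    CREATE INDEX idx_posts_user    ON posts(user_id);
    CREATE INDEX idx_comments_post ON comments(post_id);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'hello')")
conn.execute("INSERT INTO comments VALUES (1, 1, 'nice post')")

# Each JOIN below can be satisfied by an index lookup on the FK column.
rows = conn.execute("""
    SELECT u.name, p.title, c.body
    FROM users u
    JOIN posts    p ON p.user_id = u.id
    JOIN comments c ON c.post_id = p.id
""").fetchall()
print(rows)  # [('alice', 'hello', 'nice post')]
```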

(Their only negative is that they don't scale beyond a single database server,
but given database server capabilities these days, you'll be very lucky to
ever run into this limitation, for most products.)

~~~
bowlofpetunias
People hate joins because at some point they get in the way of scaling, and
getting past that is a huge pain.

Or at least, that's where the original join-hate comes from.

In reality of course, most of us don't have that problem, never had and never
will, and it's just being parroted as an excuse for not bothering to
understand RDBMSs.

Relational database design is a highly undervalued skill outside the
enterprise IT world. Many of the best programmers I've worked with couldn't
design a proper database if their lives depended on it.

~~~
drsintoma
> In reality of course, most of us don't have that problem, never had and
> never will

Maybe you never had any problems, but I don't believe "most of us" can say the
same. Speaking for myself, I've encountered problems stemming from join abuse
in almost every job I've had.

~~~
rodgerd
That's funny, because I've mostly encountered problems with people who prefer
to nest SQL queries inside a succession of loops in their code, rather than
learn how to use SQL properly.

------
dasil003
There is a good reason that relational databases have long been the default
data store for new apps: they are fundamentally a hedge on how you query your
data. A fully-normalized database is a space-efficient representation of some
data model which can be queried reasonably efficiently from any angle with a
bit of careful indexing.

Of course, relational databases, being a hedge, are not optimal for anything.
For any particular data access pattern there's probably an easy way to make it
faster using x or y nosql data store. However as the article points out,
before you decide to go that route you better be pretty certain that you know
exactly how you are going to use your data today and for all time. You also
should probably have some pretty serious known scalability requirements
otherwise it could be premature optimization.

Neither of these things are true for a startup, so I'd say startups should
definitely stay away from Mongo unless they really know what they are doing.
Being ignorant of SQL and attracted by the "flexibility" of schema-less data
stores powered by javascript is definitely the wrong reason to look at Mongo.

~~~
camus2
> Being ignorant of SQL and attracted by the "flexibility" of schema-less data
> stores powered by javascript is definitely the wrong reason to look at
> Mongo.

It's usually the only reason. And 10gen were good at marketing it.

~~~
chadcf
I used it for exactly one production app and it was a huge success. The reason
I used it was because the data we needed to represent was actually a document,
in this case a representation of fillable form fields in a pdf document. The
basic structure was that documents had sections and sections had fields and
fields had values, types, formatters, options, etc.

Initially trying to come up with a schema in SQL was somewhat painful as what
I was really looking for was an object store. Switching to mongo gave me a way
to do a very clean, simple solution that worked quite well for the problem at
hand (representing pdf forms). That said, we also played it very safe and used
mongo for only the document portion, with every other part of the system being
in an SQL database. But for the documents Mongo worked really well as a basic
object store without the complexity of something like Neo4j.
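
For illustration, a hypothetical sketch of the kind of naturally nested,
self-contained document described above (all names and fields are invented):
forms have sections, sections have fields, and the whole thing is read and
written as one unit, which is why a document store fits so cleanly.

```python
# A hypothetical PDF-form document: one self-contained nested structure.
pdf_form = {
    "_id": "form-1",
    "title": "Example fillable form",
    "sections": [
        {
            "name": "Identification",
            "fields": [
                {"name": "full_name", "type": "text", "value": "",
                 "formatter": "uppercase", "options": {}},
                {"name": "tax_class", "type": "choice", "value": None,
                 "options": {"choices": ["individual", "c-corp"]}},
            ],
        },
    ],
}

# The whole form travels together -- no joins needed to render it.
field_names = [f["name"]
               for s in pdf_form["sections"]
               for f in s["fields"]]
print(field_names)  # ['full_name', 'tax_class']
```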

~~~
idProQuo
For all the talk about how "MongoDB totally has SOME use cases", I've never
before heard of a use case where it would be unambiguously better to use a
document store. Thanks for explaining that so well.

~~~
threeseed
I've used it successfully in the past as well.

MongoDB was 5-10x faster than PostgreSQL, Cassandra etc.

If your domain model is structured like a document then MongoDB is a pretty
great fit.

------
just2n
I once tried to insert a screw using a hammer. I'll be writing my article "Why
You Should Never Use A Hammer" shortly.

And here's the crux of the problem with this article, and of so many articles
like it:

"When you’re picking a data store"

"a", as in singular. There is no rule in building software that says you have
to use 1 tool to do everything.

~~~
dbcfd
I think this is the first post on HN I wish I had a downvote button for, just
for the reason you list. There is a reason there are different flavors of
databases, and MongoDB most definitely would not be my choice for representing
graph-like relationships.

It's also scary that it has 217 points because it bashes Mongo.

~~~
towelrod
I think you are missing the point of the article. If you read down to the
Epilogue it explains how the "perfect" application still didn't work with
MongoDB once the clients started asking for more features.

My read was that even when you think you don't have "graph-like relationships"
in your data, you actually do.

The original author did say this, but I would like to add: if you don't have
"graph-like relationships", then your data is pretty trivial and any data
store will do.

~~~
dbcfd
From another comment I made, on why I don't think this is a good article even
using the proposed thesis of "mongo doesn't work for graph-like relationships":

Even though their data doesn't fit well in a document store, this article
smacks so much of "we grabbed the hottest new database on hacker news and
threw it at our problem", that any beneficial parts of the article get lost.

The few things that stuck out at me:

* "Some folks say graph databases are more natural, but I’m not going to cover those here, since graph databases are too niche to be put into production." - So you did absolutely no research.

* "What could possibly go wrong?" - The one line above the image saying those green boxes are the same gets lost. Give the image a caption, or better yet, use "Friends: User" to indicate the type.

* "Constructing an activity stream now requires us to 1) retrieve the stream document, and then 2) retrieve all the user documents to fill in names and avatars." - Yep, and since users are indexed by their ids, this is extremely easy.

* "What happens if that step 2 background job fails partway through?" - Write concerns. In addition to doing no research, did you not read the Mongo documentation? (Write concern has been there since at least 2.2.)

Finally, why not post the schemas they used? They make it seem like there are
joins all over the place, when I mainly see: look at some document, retrieve
users that match an array. Pretty simple Mongo stuff, and extremely fast since
user ids are indexed. Even though graph databases are better suited for this
data, without seeing their schemas, I can't really tell why it didn't work for
them.

I keep thinking "is it too hard to do sequential asynchronous operations in
your code?".
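
The two-step lookup described above can be sketched with plain dicts standing
in for MongoDB collections (all names here are invented for illustration):

```python
# Users keyed by _id, as a user collection indexed on _id effectively is.
users = {
    "u1": {"_id": "u1", "name": "Ada", "avatar": "ada.png"},
    "u2": {"_id": "u2", "name": "Grace", "avatar": "grace.png"},
}
stream = {"_id": "s1", "events": [{"user_id": "u1", "verb": "posted"},
                                  {"user_id": "u2", "verb": "liked"}]}

# Step 1: retrieve the stream document. Step 2: fill in names and avatars
# by id -- one indexed lookup per referenced user.
rendered = [{"name": users[e["user_id"]]["name"],
             "avatar": users[e["user_id"]]["avatar"],
             "verb": e["verb"]}
            for e in stream["events"]]
print(rendered[0]["name"])  # Ada
```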

~~~
towelrod
I'm pretty ignorant of MongoDB so I'm genuinely interested in your response:
How would you solve the problem in the epilogue, namely "a chronological
listing of all of the episodes of all the different shows that actor had ever
been in"?

Did Sarah model the data poorly ("We stored each show as a document in MongoDB
containing all of its nested information, including cast members")?

Or is there an easy way to extract that information that Sarah just doesn't
know about yet?

Keep in mind the constraints in the article, for example: some shows have
20,000+ episodes, actors show up in 100s of shows, and "We had no way to tell,
aside from comparing the names, whether they were the same person".

The last part seems like a really straightforward relational critique to me.
If you don't break the actors out into unique entities then you can't compare
them across shows. But if you do break them out into unique entities, then how
do you present the show information without doing joins?

~~~
jt2190

      > Did Sarah model the data poorly ("We stored each show as a 
      > document in MongoDB containing all of its nested 
      > information, including cast members").
    

Yes, they modeled the data poorly.

In this example, we have a TV Show, which is modeled as an entity (document).
This TV Show has a list of cast members, each one modeled by a nested object.

In a relational database, this type of relationship would be modeled by having
a TV_SHOWS table, a CAST_MEMBERS table with a foreign key to the TV_SHOWS
table, and a CASCADE DELETE relationship to ensure that if a TV_SHOW is
deleted, the related CAST_MEMBER records are also deleted.

This is obviously too strong a relationship between CAST_MEMBERS and TV_SHOWS.
(In OO we'd call this a "component" relationship, that is, we're saying that a
tv show is composed of cast members, and if we destroy the tv show we destroy
the cast members as well.)

They should have modeled CAST_MEMBERS as true entities, by making them
documents in their own collection, and storing a list of Cast Member IDs in
each TV Show.

    
    
      > But if you do break them out into unique entities, then 
      > how to you present the show information without doing 
      > joins?
    

You must join, albeit in MongoDB you do this in the application layer, not the
database, so:

1. Query the cast members collection to find the cast member id.

2. Query the tv shows collection to find all tv shows with that cast member id
in the cast members set.
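
With plain dicts standing in for the two collections, the two sequential
queries look roughly like this (names and data are illustrative, not from the
article):

```python
cast_members = [
    {"_id": "cm1", "name": "Alan Alda"},
]
tv_shows = [
    {"_id": "s1", "title": "M*A*S*H", "cast_member_ids": ["cm1"]},
    {"_id": "s2", "title": "The West Wing", "cast_member_ids": ["cm1"]},
    {"_id": "s3", "title": "Lost", "cast_member_ids": []},
]

# 1. Query the cast members collection for the id.
cm_id = next(c["_id"] for c in cast_members if c["name"] == "Alan Alda")
# 2. Query the tv shows collection for shows containing that id.
shows = [s["title"] for s in tv_shows if cm_id in s["cast_member_ids"]]
print(shows)  # ['M*A*S*H', 'The West Wing']
```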

Those of us who cut our teeth on relational databases have trouble seeing past
"two trips to the database" in the above strategy, and that's probably why
there's an urge to embed documents rather than to query two collections
sequentially. Resist this urge; it's as bad as the urge to denormalize, i.e.
there'd better be a damn good reason to do it.

~~~
jacques_chester
> _This is obviously too strong a relationship between CAST_MEMBERS and
> TV_SHOWS._

... huh?

> _They should have modeled CAST_MEMBERS as true entities, by making them
> documents in their own collection, and storing a list of Cast Member IDs in
> each TV Show._

So instead of a one-to-many relationship, they should use a one-to-many
relationship expressed in a different notation?

------
m_mueller
I don't know much about MongoDB, but I've been using a lot of CouchDB for my
current project. Am I correctly assuming that MongoDB has no equivalent of
CouchDB views? Because if it did, none of these scenarios should be a problem.

Here's how relational lookups are efficiently solved in CouchDB:

- You create a view that maps all relational keys contained in a document,
keyed by the document's id.

- Say you have a bunch of documents to look up, since you need to display a
list of them. You first query the relational view with all the ids at once and
get back a list of relational keys. Then you query the '_all' view with those
relational keys at once and get a collection of all related documents - all
pretty quickly, since you never need to scan over anything (CouchDB basically
enforces this by having almost no features that require a scan).

- If you have multiple levels of relations (analogous to multiple joins in
RDBMSs), just extract the keys from the above document collection and repeat
the first two steps, updating the final collection. You therefore need two
view lookups per relational level.
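
A rough Python sketch of that pattern, with a plain dict standing in for a
view's precomputed key index (document ids, field names, and the view shape
are all invented):

```python
docs = {
    "post1": {"type": "post", "author_id": "u1"},
    "u1":    {"type": "user", "name": "Ada"},
}

# The "relational keys" view: doc id -> foreign keys held by that doc.
# Built once at index time, so queries never scan the whole collection.
rel_view = {doc_id: [d["author_id"]]
            for doc_id, d in docs.items() if d["type"] == "post"}

# Query 1: fetch the relational keys for a batch of ids at once.
keys = [k for doc_id in ["post1"] for k in rel_view[doc_id]]
# Query 2: fetch the related documents by those keys (the "_all" lookup).
related = [docs[k] for k in keys]
print(related[0]["name"])  # Ada
```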

All this can be done in an RDBMS with less code, but what I like about Couch
is how it forces me to create the correct indexes and therefore be fast.

However, if my assumption about MongoDB is correct, I have to ask why so many
people seem to be using it - it would obviously only be suitable in edge
cases.

~~~
rdtsc
Also, CouchDB has better safety. Its append-only file format allows you to
make hot backups and safely pull the plug on your server if need be, without
worrying about corrupting data.

Plus, change feeds and peer-to-peer replication are first-class citizens in
CouchDB. Once you have a large number of clients needing realtime updates,
having to periodically poll for data updates can get very expensive.

~~~
jimbokun
I immediately wondered why Diaspora didn't try CouchDB, since replication
seems to be one of the key features they were after.

~~~
twic
In Diaspora as it exists now, replication - really, federation - is between
pods. There's a protocol for transferring data between pods that is
deliberately database-agnostic:

[https://wiki.diasporafoundation.org/Federation_protocol_over...](https://wiki.diasporafoundation.org/Federation_protocol_overview)

So CouchDB's replication doesn't really help.

If the day comes that any single pod is big enough to need replication between
clustered machines within it, then CouchDB should certainly be a contender for
storing its data.

------
scbrg
I must have read a dozen (conservative estimate) articles now all called "Why
you should never use MongoDB ever" - or some permutation thereof. Each and
every one of them ought to have been called "I knew fuckall about MongoDB,
started writing my software as if it were a fully ACID-compliant RDBMS, and it
bit me."

There are essentially two points that always come up:

1. Oh my God it's not relational!

Well, you could argue that if you move from a type of software that is
specifically called RELATIONAL Database Management System to one that isn't,
one of the things you may lose is relation handling. Do your homework and deal
with it.

2. Oh my God it doesn't have transactions!

This is, arguably, slightly less obvious, and in combination with #1 it can
cause issues. There are practices to work around it, but it should hardly come
as a surprise.

I keep stumbling on these stories - but still these are the two major issues
that are raised. I'm starting to get a bit puzzled by the fact that these
things are still considered surprises.

In either case, I'm happily using MongoDB. It has its fair share of quirks and
limitations, but it also has its advantages. Learn about the advantages and
disadvantages, and try to avoid locking too large parts of your code to the
storage backend and you'll be fine.

FWIW, I think the real benefit of MongoDB is flexibility with respect to
schema and data-model changes. It fits very, very well with a development
based on refactoring and minor redesigns when new requirements are defined. I
much prefer that over the "guess everything three years in advance" model, and
MongoDB has served us well in that respect.

~~~
rdtsc
> I must have read a dozen (conservative estimate) articles now all called
> "Why you should never use MongoDB ever"

A strange statistical oddity, if you ask me, right? How many "don't use
PostgreSQL" or "don't use Cassandra" or "don't use SQLite" articles have you
seen? Not as many. It is just very odd, isn't it...

So either everyone is crazy, or maybe there is something to it. I lean towards
the latter here.

> 1. Oh my God it's not relational! ... 2. Oh my God it doesn't have
> transactions!

Maybe those, but you forget about:

3. Claiming "webscale" performance while having a database-wide write lock.

4. Until two years ago, shipping with unacknowledged writes as the default.
Talk about craziness. A _data_base product shipping with unacknowledged send-
and-pray protocol as a default option. Are you surprised people criticize
MongoDB? Because I am not at all. Sorry, but after that I cannot let them
within 100 feet of my data. They are cool guys perhaps, and maybe having beers
with them would be fun, but trusting them with data? Sorry, can't do.

~~~
dangayle
> A _data_base product shipping with unacknowledged send-and-pray protocol as
> a default option.

MongoDB had a default initial fire-and-forget mentality, but that was on
purpose for their initial use cases. Just because someone else uses the tool
for a different purpose doesn't mean the software is to blame.

Also, if you're complaining about the default settings and you were running
this in production, RTFM.

~~~
rdtsc
> MongoDB had a default initial fire-and-forget mentality, but that was on
> purpose for their initial use cases

Yes, I call that deceitful marketing. It wasn't an accidental bug or an
"oops". I don't know how someone can be considered honest or trusted with data
when they ship a _data_base product with those defaults. Call it random
storage, for gosh sakes; that would be OK. Anything but "database".

> Also, if you're complaining about the default settings and you were running
> this in production, RTFM

Yes, and I also don't expect to read the fine print on the last page of a
manual to enable the brakes when I buy a car. I expect cars to have brakes
enabled by default, even if that somehow makes them not go as fast in
benchmark tests.

~~~
lttlrck
It still boils down to RTFM and "don't trust marketeers", right?

~~~
rdtsc
Mostly, "don't trust marketeers with your data", which I don't.

------
Justsignedup
SQL is actually a ridiculously elegant language for expressing data and
relationships - though NOT a good general-purpose language. So I tend to
always favor SQL for relational data; actually, most data.

Queues, caches, etc. = a NoSQL solution. They tend to have many more features
around performance to handle the needs of these problems, but not much in
terms of relational data.

If you study relational databases and what they do, you will quickly find the
insane amount of work done by the optimizer and the data joiner. That work is
not trivial to replicate even on a specific problem, and ridiculously hard to
generalize.

And so this article's assertion that MongoDB is an excellent caching engine
but a poor data store is very accurate in my eyes.

~~~
_sh
No. SQL is actually pretty third-rate at expressing data and relationships. My
preferred way of expressing data and relationships is _the programming
language I am writing in_.

The problem with SQL is that it is not an API, it's a DSL. Which usually means
source-code-in-source-code, string concatenation/injection attacks, and crappy
type translations ('I want to store a double, what column type should I use?
FLOAT? NUMERIC(16,8)?'). Even as a DSL it's pretty low-brow: just look at how
vastly different the syntax is between insert and update, or 'IS NULL'.

For all those who love SQL, consider having to address your filesystem with
it. Directories are tables, foreign-keyed to their parent, files are rows.
There's a good reason why this nightmare isn't real: APIs are preferred over
DSLs for this use case. And so too for databases, because they are the same
abstraction.

Don't get me wrong, I love relational algebra and the Codd model, but SQL just
ain't it. SQL has survived because of its one and only strength: it's cross-
platform. And like all cross-platform technologies, such as Java bytecode and
JavaScript, its rightful place is as a compilation target for saner, richer,
more expressive technologies. This is why I always use an ORM and have vowed
to never, ever, write a single line of SQL again.

~~~
spion
How about hybrid sql-builder / data grouper solution?

Not limited to ORM methods - get the full power of SQL instead. Not string
concatenation - get the full power of the language to build queries. Also, the
ability to get join results in either flat or grouped form.

For example [https://github.com/doxout/anydb-
sql](https://github.com/doxout/anydb-sql) (shameless plug)

~~~
_sh
Nice. This is exactly what I mean when I talk about ORMs. See how everything's
nicer when it's an API?

------
JangoSteve
As others have pointed out, this article can basically be summarized as,
"don't use MongoDB for data that is largely relational in nature."

Mongo (like most document stores) is good for data that is naturally nested
and siloed. An example would be an app where each user account stores and
accesses only its own data. Something like a todo list, or a note-taking app,
would be an example where Mongo _may_ be beneficial to use.

A distributed social network, I would have assumed, would be the antithesis of
the intended use-case for a document store. I would have to imagine a
distributed social network's data would be almost entirely relational. This is
what relational databases are for.

~~~
danenania
"An example would be an app where each user account is storing and accessing
only their own data. E.g. something like a todo list, or a note-taking app,
would be examples where Mongo may be beneficial to use."

Until you want some analytics.

~~~
JangoSteve
> Until you want some analytics.

I can actually respond to this specifically, as we recently had a project that
needed us to build some decently-sized and complex analytics into their app. I
spent about a month researching how most analytics solutions are structured
and work, and became very familiar with the codebase for FnordMetric, which is
one such open-source analytics solution.

You wouldn't initially think it (I certainly didn't), but Mongo is actually a
_great_ fit for analytics data. Here's why...

Most analytics platforms don't query live-data and build reports on the fly.
It's terribly inefficient and doesn't scale. If something like Google
Analytics did this, it'd take forever for your Analytics dashboard to load,
especially at their scale.

What most analytics platforms do is know beforehand what data you want to
aggregate and at what granularity; they perform calculations (such as
incrementing a counter) and store the result in a separate analytics
database/table. In fact, there are several presentations and articles about
doing things like this with Mongo:

[http://blog.mongohq.com/first-steps-of-an-analytics-
platform...](http://blog.mongohq.com/first-steps-of-an-analytics-platform-
with-mongodb/)

[http://www.10gen.com/presentations/mongodb-
analytics](http://www.10gen.com/presentations/mongodb-analytics)

[http://blog.tommoor.com/post/24059620728/realtime-
analytics-...](http://blog.tommoor.com/post/24059620728/realtime-analytics-at-
buffer-with-mongodb)

And then, this is an interesting article that discusses the difference between
processing data into buckets on the way in, and creating an analytics platform
that does more ad-hoc processing on the way out:

[http://devsmash.com/blog/mongodb-ad-hoc-analytics-
aggregatio...](http://devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-
framework)

Let's take something as simple as aggregate pageviews for example (for
simplicity's sake, we'll say you want total pageviews for your app, not per-
page). Normally you'd think, simple, I'll just store my pageview events, and
then when I want to view pageviews, I'll issue a `COUNT` command on the
database. Even this gets terribly slow, for a couple of reasons:

* You may just have a _ton_ of pageview event entries to query.

* Each pageview has a datetime stamp, and you don't need just one `COUNT` for a given time range; rather, your analytics dashboard needs to show a graph of counts over time, e.g. pageviews per day for the last week, pageviews per week for the last year, or pageviews per hour for the past day. Each of these would require several distinct COUNT queries (or one more complex GROUP BY query), which is even slower, especially for large datasets.

So generally, analytics platforms will have different aggregate buckets for
pageviews in the database, each keeping a tally at a different granularity.
For example, I'd have a bucket for each day, which keeps a tally of pageviews
that day, and a bucket for each week, which tallies pageviews for that week,
etc. When a pageview comes in, they'll increment each bucket (which is a
really fast operation with Mongo, since it has an `$inc` operator that,
combined with upserts, can increment multiple buckets with one fast query).
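
A plain-Python sketch of that bucketing scheme (the key layout and
granularities are illustrative; in MongoDB each increment would be an upsert
using the `$inc` operator):

```python
from collections import defaultdict
from datetime import datetime

# One counter per (granularity, time-bucket) pair.
buckets = defaultdict(int)

def record_pageview(ts: datetime) -> None:
    # One cheap increment per granularity at write time --
    # no COUNT over raw events at read time.
    buckets[("day",  ts.strftime("%Y-%m-%d"))] += 1
    buckets[("week", ts.strftime("%Y-W%W"))] += 1
    buckets[("hour", ts.strftime("%Y-%m-%d %H"))] += 1

for _ in range(3):
    record_pageview(datetime(2013, 11, 11, 9, 30))

print(buckets[("day", "2013-11-11")])  # 3
```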

So why is Mongo pretty good for analytics? Because 1) each time-interval
bucket is a silo of data for that time-interval, and 2) usually analytics are
for patterns and aggregate data, so they don't normally require extremely high
reliability (i.e. it's usually okay if an event is dropped here or there).

Of course neither #1 or #2 above are always the case, so this doesn't always
apply, but my point was just that Mongo is actually a better fit for analytics
than you might imagine.

~~~
jsmeaton
I haven't done the kind of analytics you're talking about, but it sounds like
the implementation is basically a round-robin database.

------
chaz
Linkbait title aside, it's actually a helpful example for showing a database
novice when not to use a document store. I could have used this post a few
times in the past few years.

~~~
rch
Agreed. Despite the unfortunate title, this is an informative, well written
and entertaining article that I might refer to in the future. It would be
better if there were a follow-up on when it would in fact be appropriate to
introduce a document store to a project.

~~~
danso
I agree that the OP is lengthy, and putting together this well-illustrated
post is no easy feat. However, I don't think the OP should be the one to write
about when you _should_ use a document store.

Maybe I'm too annoyed by the poorly chosen title. Or that I read that entire
post and was left thinking _where's the punchline?_ On one hand, I credit the
author for thinking things through. On the other, the fact that she
_unequivocally_ attributes this issue to MongoDB shows that she _currently_
lacks the domain knowledge to consider appropriate use cases. It's not a
MongoDB problem, it's a problem inherent to this data structure, and someone
more well-versed in this topic would not conflate the issue...just as a decent
IT person would not blame "Windoze" for the fact that she can't get good Wifi
reception in the office.

OK, to be even more petty...I think what really aggravates me is how the OP
says she's not a database expert -- which is a _good_ disclosure, but self-
evident -- but attempts to assert authority by saying "I build web
applications...I build _a lot_ of web applications"...Uh, OK, so what you're
saying is that it's possible to be an experienced web developer and yet be a
novice at data design?

If _that_ was the angle of the OP, I'd give it five stars. Such sentiment
cannot be overstated.

~~~
rch
Well you're right of course that web developers (and business analysts, and
politicians, etc.) can absolutely get by for a staggeringly long time with
novice-level abilities. That problem is only getting worse as the tools get
better. Luckily I don't have to judge the OP on that basis since that's what
markets are for.

And maybe someone else, who has tackled enough difficult problems over time to
evolve a nuanced and technically informed opinion of various data modeling and
management options, _should_ write the response I mentioned. I'd argue there
are plenty of examples of that material available already.

The OP, on the other hand, would be writing from the perspective of a
professional user who might choose a tool off the shelf at the recommendation
of a colleague, and whack it against the problem du jour to see if it works or
not. This is a common enough approach that there is at least a chance that a
followup would have some value. I can't really expect everyone who makes a
living writing web applications to understand CS fundamentals, any more than I
would expect it from chemical engineers or physicians. It is nice to be able
to point representative members of that audience to an article that resonates
with them, and not have to try to translate my opinions into similar language
(with or without cat gifs).

Edit: I actually think Journeyman would be a more appropriate term than
novice.

------
raverbashing
Really, in some places it hurts:

_We stored each show as a document in MongoDB containing all of its nested
information, including cast members_

I've seen this in people using MongoDB: they bought the BS that because "it's
a document store" _there should be no links between documents_.

People leave their brains at the door, swallow "best practices" without
questioning, and when it bites them then suddenly it's the fault of the
technology.

"or using references and doing joins in your application code (double ugh),
when you have links between documents"

1) MongoDB offers MapReduce, so you can join things inside the DB. 2) What's
the problem with having links between documents? Really? Looks like another
case of "best practice BS" to me.

~~~
lhc-
Links in Mongo aren't really links though; it's up to the application to
handle the "joins", which really means making an extra query for every linked
item. It's like SQL joins except without any of the supporting tools or
optimizations that exist in an RDBMS.

~~~
raverbashing
Yes, it is manual

But you can query for a list of ids for example, using the 'in' operator and a
list.
[http://docs.mongodb.org/manual/reference/method/db.collectio...](http://docs.mongodb.org/manual/reference/method/db.collection.find/#db.collection.find)
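
Simulated with plain dicts (names invented), the point is one batched query
for all linked ids rather than one query per id:

```python
# A stand-in for a users collection.
user_docs = [{"_id": i, "name": f"user{i}"} for i in range(5)]
wanted = [1, 3]

# find({"_id": {"$in": wanted}}) -- one round trip for all linked
# documents, instead of len(wanted) separate single-id queries.
matches = [d for d in user_docs if d["_id"] in set(wanted)]
names = [d["name"] for d in matches]
print(names)  # ['user1', 'user3']
```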

~~~
SigmundA
Isn't this done client-side? Without joins in the DB engine itself, locality
is much worse, along with lost opportunities for optimization, leading to much
worse performance.

~~~
mason55
Yes, you have to build the list of IDs to pass to the $in operator and then
send out a second query, but the grandparent post said you had to make an
extra query for each linked item, which is incorrect.

------
exclusiv
Do NOT use MongoDB unless you understand it and how your data will be queried.
Joining by ID, as the author describes, is not a bad thing. If you aren't sure
how you are going to query your data, then go with SQL.

With a schemaless store like Mongo, I've found you actually have to think a
LOT more about how you will be retrieving your information before you write
any code.

SQL can save your ass because it is so flexible. You can have a shitty schema
and make it work in the short term until you fix the problem.

I wrote many interactive social apps (fantasy game apps) on Facebook with it,
and it worked incredibly well - and this was before MongoDB added a lot of
things like the aggregation framework.

The speed of development with MongoDB is remarkable. The replica sets are
awesome and admin is cake.

It sounds like the author chose it without thinking about their data and
querying upfront. I can understand the frustration but it wasn't MongoDB's
fault.

This is a big deal for MongoDB:
[https://jira.mongodb.org/browse/SERVER-142](https://jira.mongodb.org/browse/SERVER-142).

Let's say you have comments embedded on a document and you want to query a
collection for matches based on a filter. If you do that, you'll get all of
the embedded comments back for each match and then have to filter on the
client. IMO, when the feature above is added, MongoDB will become more usable
for more use cases that web developers see.

------
kcorbitt
I've seen a fair number of articles over the last couple of years comparing
the strengths and weaknesses of relational/document-store/graph databases.
What I've never seen adequately addressed is why that tradeoff even has to
exist. Is there some fundamental axiom, like the CAP theorem, explaining why a
database like MongoDB couldn't implement foreign keys and indexing, or why a
SQL database couldn't implement document storage to go along with its
relational goodness?

In fact, as far as I can tell (never having used it), Postgres's Hstore
appears to offer the same advantages as a document store, without sacrificing
the ability to add relations when necessary. Where's the downside?

~~~
ddebernardy
> why a SQL database couldn't implement document storage to go along with its
> relational goodness? (…) Postgres's Hstore appears to offer the same
> advantages as a document store, without sacrificing the ability to add
> relations when necessary. Where's the downside?

PostgreSQL can store arbitrary unstructured documents just fine: hstore, json,
… Each comes with the possibility of indexing arbitrary fields within the
documents using a BTREE index on an expression, and of indexing documents
wholesale using a GiST index.
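
Sketched as illustrative Postgres DDL (the table and column names here are
hypothetical, not from the thread):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE docs (id serial PRIMARY KEY, body hstore);

-- BTREE expression index on one field inside the document:
CREATE INDEX docs_name_idx ON docs ((body -> 'name'));

-- GiST index over the document wholesale, e.g. for containment (@>) queries:
CREATE INDEX docs_body_idx ON docs USING gist (body);
```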

Besides the need to know a thing or two on query optimization, the only
downside I can think of is that ORMs are usually broken (Ruby's Sequel is a
notable exception). But this isn't a problem with Postgres itself; it's a
problem with ORMs (and training, admittedly).

------
hkarthik
>> Some folks say graph databases are more natural, but I’m not going to cover
those here, since graph databases are too niche to be put into production.

Is this really true? It sounds like both relational DBs and document DBs are a
poor choice for the social network problem posed. I've actually dealt with
this exact problem at my last job when we started on Mongo, went to Postgres,
and ultimately realized we traded one set of problems for another.

I'd love to see a response blog post from a Graph DB expert that can break
down the problem so that laymen like myself can understand the implementation.

~~~
fat0wl
I would look at Neo4j. I originally came across it when vetting Grails (it has
a Grails plug-in) and it seems to be one of the heavy contenders in terms of a
production-ready graph DB. People (this article's author included) seem to say
that production-ready graph DBs don't exist. Maybe these projects are still
trying to gain traction? I expect some stable builds will be out there soon if
they aren't already...

[http://www.neo4j.org/](http://www.neo4j.org/)

~~~
drone
My experience with Neo4j (this year) was abysmal. The take-away I had was:
it's only good for very small graphs.

Generally, I'd spend some time writing a script to load data into it, start
loading data, respond to it crashing a few hours later, increase the memory
available to the process, start up again, and respond to it crashing a few
hours later. I was never able to get any reasonably-sized graph[0] working
reliably well without using an egregious amount of memory, and knowing that I
would continue to face memory issues, I gave up on Neo4j and found another way
to solve my problem.

It may be that I simply was not competent at setting it up properly, but no
other data store I've worked with has been as hard to get stable over a
moderately sized data set. I spoke with some other people who had worked with
Neo4j at the time, and they expressed the same issues - they couldn't make it
work for any reasonably-sized dataset and had to find another solution.

[0] Not big, mind you, just reasonably-sized. E.g. 4 million nodes, with each
node having an average of 5 edges and 2-4 properties.

~~~
mhluongo
Hm, I assume you reached out to the mailing list and what not? I know a number
of installations with numbers well above that. Were you using the batch
insertion API?

~~~
drone
No, I'm sure there are some great running instances out there - but I was put
off by the difficulty of getting it reliably running without being an expert
in its configuration. Additionally, the fact that I'd have to spend at least
$12k/year to have only 3 nodes in a cluster, knowing we'd need a lot more than
that as time went on, sealed the deal.

We found that we could do everything we needed with secondary processing
against our document store at runtime for so much less without adding another
layer of complexity to the architecture.

Edit: forgot to mention - no, we weren't using batch insertion in all cases;
IIRC, we had issues with duplication and had to do check-if-exists -> create-
if-not, as we were reading from raw data sources that were heavy with
duplicates.

~~~
jexp
Many heavy-duty production customers of Neo4j run with just a 3-node cluster;
there's no need to scale out as with other NoSQL datastores. Some have
actually replaced larger clusters with a small Neo4j one.

I would love to learn about your Neo4j setup, and the issues in detail, I want
to make it easier for people in your circumstances in the future to get
quickly up and running with Neo4j in a reliable manner. If you're willing to
help out, please drop me an email at michael at neotechnology dot com.

------
edude03
I hate link bait like this.

The real title should be "Why you should never use a tool without
investigating its intended use case"

~~~
Robin_Message
But the point is that there is no use case. Relational databases and
normalisation didn't arise because a load of neckbeards wanted bad performance
and extra complexity.

The point of the article is that the world is relational, and because Mongo
isn't, it'll bite you in the ass eventually. Sure, that's a specialisation of
what you said, but still a useful one, as it allows you to immediately know
you shouldn't use Mongo (unless your data is all truly non-relational, _and_
you know you'll never integrate it with any relational data, which, without a
crystal ball, you can't know, so don't use it).

~~~
exelius
There is a use case, but internet hype has gotten everyone wanting to use
Mongo when there's no real reason to. Postgres scales nearly as well as Mongo
while being a lot more flexible. That said, Mongo has some real benefits for
non-relational computing (see mapreduce) that could make some of the
abstraction headaches and lack of data model flexibility worth it for very
large data sets.

But I sort of agree; Mongo tends to be overused by startups who are trying to
solve a scalability / performance problem before they have one. In the process
they often end up running into data model limitations because stuff moves fast
early on and you can't foresee what you'll need in a year.

------
Axsuul
Can anyone explain what are some actual real-life good uses for MongoDB?

~~~
donw
Something like Imgur might be a good use-case for MongoDB. There are basically
no relations between images, so each image can easily be thought of as a lone
document.

That said, even if somebody was building something like Imgur, I would still
advise that they start with a SQL database. SQL is very well understood, and
you will have no problem finding developers that have deep experience in your
SQL engine of choice.

More importantly, by the time you hit the point where you need a NoSQL
solution to handle scaling issues, you will have achieved product-market fit,
and can make a sane technology decision based on your vastly greater
understanding of the business needs.

~~~
theseoafs
> by the time you hit the point where you need a NoSQL solution to handle
> scaling issues

See, people keep saying that NoSQL databases give you a performance boost over
traditional relational solutions (MySQL and Postgres), but exactly where does
this performance boost come from? I can understand the appeal of in-memory
databases or using caching (Memcached) to supplement the relational solution,
but it seems like the vast majority of Mongo's performance benefits come from
eschewing ACID guarantees rather than document databases being inherently
faster.

------
colinbartlett
This is ridiculous linkbait bullshit.

Anyone who dismisses document stores entirely has lost all my respect. It
wasn't the right solution for your problem, but it might be the right solution
for many others.

~~~
y0ghur7_xxx
> _but it might be the right solution for many others._

The author gave the example of the movie database and explained why it seemed
like a good idea when they started, and why it didn't work out. Can you point
out an example of data you would store in a document database which is not
purely for caching purposes?

~~~
lotyrin
Collecting structured log data like monitors or exception traces or user
analytics. Lots of documents, no fixed schema, they're all self-contained with
no relations. Map reduce makes query parallelism crazy magic.

A content management system. Some stuff may want data from across relations
(who owns this thing, and what is their email), but that's pretty infrequent
and having nice flexible-schema documents that contain all relevant
information that's being CRUD'ed simplifies things hugely - particularly in
MVCC systems like Couch that put stuff like multi-master/offline-online sync
and conflict resolution in the set of core expectations.

Edit: That said, Postgres is also MVCC, and hstore makes schema an option the
same way that relations and transactions were already, so I think it could do
pretty well. I haven't gotten the chance to play with it in recent history,
unfortunately.

~~~
Govannon
> Some stuff may want data from across relations (who owns this thing, and
> what is their email), but that's pretty infrequent

That might be a shaky assumption. Speaking as someone who works on a CMS,
content usually has an author, and people accessing that content might be
interested in them.

~~~
lotyrin
Yeah, but in most of those cases, it's as easy as get the author based on a
key from his content.

It's only when you want joins (e.g. give me all of the titles of all the
content and their author's information at the same time) that things get
hairy.
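
The difference can be sketched in a few lines (all names below are
hypothetical):

```javascript
// Authors keyed by ID; content documents carry that key.
const authors = { a1: { name: "Ann", email: "ann@example.com" } };
const posts = [
  { title: "Hello", author_id: "a1" },
  { title: "Bye", author_id: "a1" },
];

// Easy: fetch one author by the key stored on the content.
const author = authors[posts[0].author_id];

// Hairy: listing every title WITH its author's info means stitching the
// two collections together yourself, one document at a time.
const joined = posts.map((p) => ({
  title: p.title,
  email: authors[p.author_id].email,
}));

console.log(joined.length); // -> 2
```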

Agreed it's not always going to be true for many CMSes. I meant it as a
particular CMS, not the general class of CMSes but didn't make that clear at
all.

------
ryanobjc
They say never to jump into an argument late... but here goes...

There are a lot of people arguing in the positive for polyglot persistence.
The arguments sound pretty appealing - hammer, nail, etc. - on the face of it.

But as you dig deeper into reality, you start to realize that polyglot
persistence isn't always practically a great idea.

The primary issue to me boils down to safety. This is your data, often the
only thing of value in a company (outside the staff/team). Losing it cannot
happen. Taking it from there, we stumble across the deeper issue that all
databases are extremely difficult to run. DBAs don't get paid $250k/year for
nothing. These systems are complex, have deep and far-reaching impacts, and
often take years to master.

Given that perspective, I think it then makes the decision to use a single
database technology for all primary storage needs totally practical and in
fact the only rational choice possible.

------
electrichead
I'm going to reiterate as others have done - this is an area where a good
graph database would blow all the others out of the water. I am currently
using neo4j for a web app and find it to be extremely good in terms of
performance. There is really only one downside to using a graph database -
they are not really scalable horizontally as you might want. They need a fair
bit of resources. But in terms of querying, they would be unparalleled in this
particular use-case.

They are also not in their infancy - they are in use in many places where you
wouldn't expect them and which aren't discussed. One big area is network
management - at least one major telecom uses a particular graph DB to manage
nodes in real-time.

~~~
mason55
_> they are not really scalable horizontally as you might want_

Seems like this would be a huge drawback for a project whose entire raison
d'etre is horizontal scaling.

------
3327
This is a well-known and well-documented "downside" of Mongo. Frankly, the
analysis in your article is undermined by your first line stating, "I am not
a database designer". Mongo has its downsides, which are well known, but there
are also very good reasons to use MongoDB. Although it's a lengthy article
with good examples, it states nothing more than an obvious caveat of Mongo's
which is well known and documented.

------
danso
My advice to the OP: Re-jigger this article and retitle it: "The Data-Design
of Social Networks". That would be a worthwhile read and I appreciate the
detail that the OP goes into.

One of the subheads should be: "Why we picked the wrong data store and how we
recovered from it"

And not to be snarky about it, but an alternative title is: Why Diaspora
failed: because a Ruby on Rails programmer read an Etsy blog and thought they
understood databases

------
meritt
OP should have been using a graph database. Ranting about MongoDB because it
doesn't support what it's not designed to support is a bit silly. A RDBMS
would have been just as poor of a choice here.

~~~
ddebernardy
Facebook seems to be doing quite fine by combining SQL and Memcached.

~~~
weixiyen
They probably don't use relational databases the way you do for smaller
projects that don't need to scale to the millions.

------
neokya
Though the title sounds like link bait, this is actually an eye-opening
article for a database layman like me. Very clearly written.

Now, what is MongoDB fit for? Most web applications are like the author's
example: complex, with many inter-relationships. Can someone shed some light?

~~~
jacques_chester
Like the article says, it can be suitable as a caching layer in front of a DB,
especially for web apps that deal in ephemeral JSON documents.

------
DigitalSea
So in other words, they misused MongoDB and because of that are telling people
not to use it? Wow. Seems to be a case of "a bad mechanic will always blame
his tools".

In the right hands MongoDB can be a great asset. The problem here is that
Diaspora chose MongoDB when it was very immature, and it seems the choice was
based on hype more so than mapped-out requirements. This is where proper
planning for a large-scale application will spot these kinds of problems
before they get to the development stage.

Later versions of MongoDB are much better and the upcoming planned changes
will take it many steps in the right direction towards being a viable
alternative to a traditional RDBMS. Having said that, it's not a silver bullet
and MongoDB is not for everything.

10gen are exceptionally great at marketing Mongo, and I get the feeling they
have kind of trapped Foursquare, who have been using it in production for a
couple of years or so now. Having said that, with the exception of that one
11-hour outage Foursquare encountered, MongoDB seems to be working really well
for them, and they seem to be capturing more than just check-in data.

I still think with proper planning pulling off a social network with MongoDB
is possible. I am currently building a social networking type application, not
on the scale of Facebook but it does share some parallels. I've planned and
mapped out a viable structure and how it all connects, prototyping and testing
seems to indicate that MongoDB is up to the task, but we'll see.

~~~
wmt
"So in other words, they misused MongoDB and because of that are telling
people not to use it? Wow. ... The problem here is that Diaspora chose MongoDB
when it was very immature"

So the specific misuse of MongoDB was to use it? I guess that was also the
point of the story.

~~~
DigitalSea
Yes, exactly. They chose the wrong tool for the job; that's not a fault with
MongoDB. It's like eating soup with a fork: you'll eventually eat the soup,
but if you had used a spoon in the first place there would never have been a
problem. Is the company who made the soup to blame, or are you to blame for
not consuming it correctly?

The problem with Diaspora was that it was a poorly executed good idea. Had
they actually sat down and mapped out their requirements and chosen an
appropriate database, they would have realised that a traditional relational
database was the right choice to make.

Databases are hard. For a project as ambitious as Diaspora, proper planning is
key, and as evidenced by the issues Diaspora had when it debuted (a delete
controller with no auth checking...), it's apparent they had no clue what they
were doing, not just from a database planning point of view but from a code
one as well.

I would take whatever anyone who had anything to do with Diaspora had to say
with a grain of salt. The real lesson here is to not get caught up in hype and
trends. NoSQL isn't a magical solution that will make scaling problems
disappear, traditional databases like MySQL have been battle-tested over a
very long period of time. If MongoDB and <insert X NoSQL database here> were
so great, the likes of Facebook and whatnot would use them on the scale
Diaspora tried to use them on.

People have to realise that when Diaspora used MongoDB it was a very early
version of the database. A lot has changed since Diaspora used it. Is it
suitable for a large-scale Facebook clone as the sole database? Definitely
not, but using it for aspects like messaging and notifications would, I think,
definitely be a good use case for it.

~~~
wmt
If MongoDB was the wrong tool because it was still too immature, i.e. broken,
wasn't the problem not that they were eating soup with a fork, but that they
were eating it with a broken spoon?

But yes, they were wrong to use a broken tool for the job.

------
ecaron
The best thing about this article is it demonstrates the problem with pg's
"The submission must match the title" policy.

------
kylemaxwell
"You should never use MongoDB [the way we did]". For some use cases, it would
be a terrible decision, as this project learned. In others, it works fine,
even with the "eventually consistent" sort of thing. I knew it would go bad
when the author started talking about web apps from the get-go, because (as
unusual as this may seem to some subset of developers) not everything is a
webapp.

------
andrelaszlo
I'm not arguing against this article; they seem to have made some poor choices
along the road. But to say "you should never use MongoDB" is silly, even if
you add "...for relational data". It has its obvious drawbacks, of course,
but MongoDB is way more than just a caching layer.

Here are some social networks that are running MongoDB:
[http://www.mongodb.org/about/production-deployments/#social-networking](http://www.mongodb.org/about/production-deployments/#social-networking)

The list includes Foursquare. They have been running Mongo as their main
storage engine for about four years now. They migrated away from MySQL, as
discussed in this video:

[http://youtu.be/GBauy0o-Wzs?t=2m30s](http://youtu.be/GBauy0o-Wzs?t=2m30s)

------
ukd1
This should read as one of:

1\. Why you shouldn't use things you don't understand
2\. I wish I'd RTFM'd with Mongo
3\. Schema design for MongoDB: what I wish I'd known

------
quizotic
Funny thread!

MongoDB's query performance is typically sub-millisecond, and rarely as much
as 5ms. I don't think you can get through the postgres/mysql parser in 5ms,
much less the optimizer, planner, and execution stack. Couple that speed with
a dead simple API, and you've got a thing of beauty.

So yeah, if you don't care about ms latencies, if you've got a fixed
rectangular schema, if you want to write queries with lots of joins, if you
need ACID guarantees ... etc. ... then by all means pick your favorite RDBMS.

OTOH, when you need to manipulate relatively self-contained objects quickly,
when those objects don't have a fixed schema, when availability is more
important to you than immediate consistency, then why would you choose
anything other than MongoDB?

~~~
j-kidd
> I don't think you can get through the postgres/mysql parser in 5ms, much
> less the optimizer, planner, and execution stack.

Yeah... except no. I just set `log_min_duration_statement` to 0, and can see
that PostgreSQL typically takes less than 0.1 ms to parse a query.

Quickly parsing a query to come up with an optimized plan is actually a great
strength of PostgreSQL, when compared to other RDBMS. MSSQL, for example, has
this complex query plan caching mechanism to compensate for its slow parsing.
PostgreSQL doesn't need that.

Also, with EXPLAIN ANALYZE, I can see that PostgreSQL typically takes less
than 0.1 ms to do an index lookup as well.
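
The measurement described above can be reproduced along these lines
(illustrative SQL; the `users` table is hypothetical):

```sql
-- Log the duration of every statement (0 = no minimum threshold):
SET log_min_duration_statement = 0;

-- Or inspect the planning and execution cost of a single index lookup:
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 42;
```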

You seem to believe that MongoDB has some kind of magic that makes it the only
database that can perform sub millisecond query. 10gen is doing a great job
there.

------
gress
Many comments attacking the author of original article, but not a single one
addressing the arguments.

~~~
bronson
That's because her arguments don't make a lot of sense. It's like titling a
blog post "Never use a hammer!" and following it with line after line of "I'm
not a refrigerator repairman, but my refrigerator uses Phillips screws, so I
used a hammer to remove them, and it kind of worked, but it took far too long
to do the job and damaged some of the screws..."

Short of "use the right tool for the job" (which a lot of people here have
already said), what do you expect?

Mongo is a document store. If she'd used it to store documents then I think
her blog post would have been quite positive.

~~~
gress
If you're right, it would be helpful to point to an unambiguous description of
what constitutes the kind of 'document' that mongo is suited for, and what
does not.

~~~
bronson
That's like asking for an unambiguous description of what makes a language
good. Even if I were to write a book it would be oversimplified and
unsatisfying.

SO, OK, here's the oversimplified version... basically, if you have large
amorphous chunks of data, easily denormalized so very few joins required for
typical queries, then you have a document-friendly dataset. Medical records,
court docket entries, things where a SQL representation has many tens of
often-NULL columns and only a few foreign keys, those are probably good
document-based data.
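
The shape being described can be sketched like this (the field names are
hypothetical):

```javascript
// "Document-friendly" data: each record is self-contained and sparse.
// As SQL rows, these would be wide rows of mostly-NULL columns.
const records = [
  { patient: "p1", allergies: ["penicillin"], bloodType: "O-" },
  { patient: "p2", surgeries: [{ year: 2011, type: "appendectomy" }] },
];

// Typical access reads one whole record -- no joins needed.
const p2 = records.find((r) => r.patient === "p2");
console.log(p2.surgeries.length); // -> 1
```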

What Sarah describes is exactly the opposite: small, deeply nested, tightly
joined, difficult to denormalize, nuggets of data. Expecting a document-
oriented database to handle this is like expecting a SQL database to handle
complex graph data: it can be done, but it's slow and the workarounds are not
pretty. It doesn't make a lot of sense to complain about that.

Hope that makes sense.

~~~
gress
It makes sense, although Sarah seems to suggest that _any_ linking between
documents will lead to the problems she describes.

I am willing to accept that for a sufficiently stable set of 'documents' this
might be surmountable, but it does seem like a major limitation of mongo as a
general purpose store, and the quick dismissal of her point seems unwarranted.

~~~
bronson
Her quick dismissal of Mongo is unwarranted. For many datasets it works great.

Here's my question: are the (hundreds? thousands?) of production Mongo deploys
wrong, or is Sarah wrong? Given her blog title it's gotta be one or the other.

I'm not using Mongo now, but I'm happy the site I created with it a few years
back is still running strong. Once I got used to denormalizing the heck out of
everything I had zero complaints. But it was for medical records: easily
denormalized, mongo-friendly.

It did lead to this:
[https://github.com/bronson/valid](https://github.com/bronson/valid) (probably
never get around to polishing it, alas...)

~~~
gress
Her dismissal is anything but quick. It's a carefully crafted argument which
deserves real rebuttal.

I'm not in any way trying to bash mongo. I have a consumer-facing production
app that has been running it since 2011 with great performance and no
problems, but my data model is carefully simplified, and I've been uneasy
about how well it would work for a more complex schema.

I found Sarah's analysis useful, but I'd find more precise guidelines for the
conditions of what does and doesn't work even more useful.

~~~
bronson
"Why You Should Never Use MongoDB" is about as quick as it gets. If that's
true, there's not much point to reading the rest of the article is there?
Don't use Mongo.

You might as well be asking for precise guidelines on the right amount of
normalization to have in a SQL schema, or which language or editor to choose.
You could spend weeks reading articles and blog posts and come away with a
really lopsided view of the subject, or just dive in and figure out what works
for you.

I found Sarah's analysis trite and one-dimensional. About as useful as "never
eat peanuts" with a lot of scare words about allergies. That's great linkbait,
interesting to anyone who's never eaten a peanut before, and might even have a
thing or two to be learned. Unfortunately, it overstates its point so far that
it just isn't useful in the real world.

~~~
gress
The headline is provocative, and certainly not absolutely true. It would be
better phrased as 'Why mongodb is unsuitable for most use cases'.

However she then wrote a full piece justifying her position. We might disagree
but she provided valuable insight.

Most people here did nothing of the sort. And the argument that 'it would be
better to dive in rather than read blog posts' is a generic argument against
anyone reading or writing technical blog posts.

Surely you can't really mean that.

~~~
bronson
Her headline is more accurate than yours. Why attach a reasoned headline to
an unreasoned article?

If you truly believe that mongodb is unsuitable for most use cases then I hope
you're basing that on more evidence than this blog post. Citation needed,
please. And recognize that your statement doesn't agree with my experience.
(Unless you're saying " _(insert any database technology)_ is unsuitable for
most use cases" -- that's probably true but not useful.)

SO, is it better to dive in rather than read linkbaity one-sided blog posts?
Yes. Yes, I mean exactly that. Surely you don't disagree?

------
mcgwiz
I give Diaspora and the author kudos for the effective hatebait title, though
I would have preferred "Study the documentation/source of all mission critical
components carefully."

Reading through the article, I saw similarities between their issues and ones
I've encountered. However, having skeptically studied MongoDB's docs and
tested its behavior, I had a very good idea early on that MongoDB preferred
idempotent, append-only, and eventually consistent data, with two-phase commit
being the nuclear option (and that, boxen-wise, the high-memory ones are
preferred).

Regarding denormalized storage, invalidation is indeed a tricky issue. A CQRS
approach, with a durable event queue between the command/domain side and an
idempotent denormalization system, elegantly yields an exceptional
scalability:complexity ratio. In the end, I built a performant, flexible app
that's been a joy to work on and operate.

I see pain similar to Diaspora's as a result of "move fast, break things"
culture taken to the extreme. No planning or basic research of any kind is
expected anymore. I know (from experience watching people pick technologies
they barely understand and try to IntelliSense their way to delivery) that I'm
in the minority, but I've always preferred to read a book cover-to-cover (or
traverse a website depth-first) before adopting a technology in earnest,
because I know I don't know what I don't know.

------
pkorzeniewski
All things aside, the main problem I had with MongoDB when I tried to use it
in one of my projects (in NodeJS) was the way you query data. I know there are
some libraries for that, but it's just a pain in the ass to make complex
queries: something which would be an easy JOIN in SQL is, in MongoDB, a big
pile of callbacks and long, chained method calls. It was just hard to
maintain, and the data structure flexibility didn't make it easier, which led
to a completely messed-up database.

------
No1
The whole post hinges on the statement:

"When MongoDB is all you have, it’s a cache with no backing store behind it.
It will become inconsistent. Not eventually consistent — just plain, flat-out
inconsistent, for all time."

OK, here is a chance to share a little insight. At what point did Mongo become
hopelessly inconsistent? Were you ever able to determine why? Why bother with
cute pictures and verbose explanations of simple schemas when the conclusion
is just that Mongo breaks no matter what without further explanation?

~~~
gcv
Probably because the project was forced to keep copies of data all over the
place, and was distributed to boot. There was probably a reasonable bug or
edge case which caused some copies to conflict with each other, and since
writes are destructive, it became impossible to reconcile the conflicts.

------
petepete
The 'main' app at my place of work uses MongoDB this way; it even implements a
'relationships' collection that is used for one-to-many 'joins' (ugh, just
thinking about it makes me feel ill). Unfortunately, I joined when the initial
write was nearing completion and was unable to steer the team in the right
direction in time.

I just submitted a proposal for a ground-up rewrite; unless it's accepted I'll
be leaving promptly.

~~~
jacques_chester
In fairness to MongoDB, I've seen this done in the relational world. I once
worked at a place that had a schema where everything joined through a single
table -- "TableRow_TableRow", which had six fields: two IDs (which were
varchars) and metadata_1 through metadata_4 (also varchars).

They couldn't understand why it was so slow. But hey, it's _super flexible_ ,
right?

------
jwwest
Link bait title aside, I'm a little bored with these "We don't like x, so you
shouldn't use it" articles.

The main downside of MongoDB is that it's new. This means less knowledge of
best practices, incomplete or missing support in third party integrations, and
feature-lacking tools. It also takes a different approach to architecting
systems than you would take when using a SQL approach.

~~~
dbcfd
There's also the fact that what they were trying to do in Mongo was not what
you should do in Mongo. Use a relational or graph database for data that is
best represented as a relation or graph.

Nothing like using a hammer to paint a wall and then saying you should never
use a hammer.

------
lynchdt
Over the past year we have built, from scratch, a significant web application
with MongoDB. We also have a social graph. We build infinite-scroll activity
feeds on the fly. We handle multiple writes per second and tens of reads per
second. We have hundreds of thousands of users. We have 100 million documents,
growing nicely. We've forced square pegs into round holes on occasion, but
nothing we were too surprised about.

It seems like you walked into your database technology choice with your eyes
fully shut. Given even a modicum of preparation - e.g. reading the MongoDB
documentation - you would have recognized the social graph use case to be a
challenging one for a data store in which relations are unnatural.

Then, because you realized you may have a sub-optimal solution, you optimized
it by changing technology. And then decided to join this ridiculous anti-
MongoDB internet bandwagon.

For somebody who builds "4-6 web-applications per year" and has deployed "most
of the data-stores you've heard about - and some you haven't" this seems
surprising.

Or perhaps not, actually.

------
bsaul
The most interesting part is distinguishing between your primary data store
and your secondary ones. For "friends of friends" queries you could
definitely use graph DBs; for "tree hierarchical" data, a document-based store
is good. Secondary DBs are chosen depending on queries. For analytics, choose
columnar DBs or map/reduce-friendly data stores, etc.

But the primary DB needs powerful querying possibilities, strong consistency
and durability, as well as really good admin tooling. That's the "fall back",
to be used as a last resort, but that's also the point of truth of your
system.

Then come optimizations, and pipe-like data processing to move data from
the primary to secondary dbs (or to both in parallel).

That's why I've never been fond of using new technologies such as MongoDB or
CouchDB or anything like that for my primary db.

------
crystaln
There are very few use cases that can be solved with mongo that can't be
solved equally well using nosql inspired features of Postgres and other simple
tools.

Data often has relational characteristics we don't anticipate. That's why we
build flexible schemas. Mongo schemas have none of that flexibility.

------
exelius
+1 on all the linkbait comments. MongoDB was not a good fit for this project;
but there are a great many projects where MongoDB is a great fit. If you were
dumb enough to just start using MongoDB (or MySQL, or whatever) without
matching it to your data model, that's on you.

------
henryw
I was about to go all in with MongoDB (w/ Node) on my next project, including
for the user table. But after this, I'm going to have to re-evaluate it.

"Once we figured out that we had accidentally chosen a cache for our database,
what did we do about it?"

"The only thing it’s good at is storing arbitrary pieces of JSON. “Arbitrary,”
in this context, means that you don’t care at all what’s inside that JSON. You
don’t even look. There is no schema, not even an implicit schema"

"I’ve heard many people talk about dropping MongoDB in to their web
application as a replacement for MySQL or PostgreSQL. There are no
circumstances under which that is a good idea."

"I suggest taking a look at PostgreSQL’s hstore"
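
The hstore suggestion boils down to keeping a schemaless key/value column
inside an otherwise relational table. A minimal sketch of the idea, using
SQLite's JSON functions as a stand-in for hstore (the table and names here
are made up for illustration):

```python
import sqlite3

# Sketch of the idea behind PostgreSQL's hstore: a schemaless key/value
# blob kept in one column of an otherwise relational table. SQLite's
# JSON functions stand in for hstore here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, attrs TEXT)")
conn.execute(
    "INSERT INTO users (name, attrs) VALUES (?, ?)",
    ("alice", '{"theme": "dark", "beta": true}'),
)

# The relational columns stay queryable as usual, and the arbitrary
# attributes are still reachable when you do care what's inside.
theme = conn.execute(
    "SELECT json_extract(attrs, '$.theme') FROM users WHERE name = ?",
    ("alice",),
).fetchone()[0]
print(theme)  # dark
```

You keep a real schema for the parts you understand, and a blob for the
parts you don't yet.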

------
mkohlmyr
".. for relational data" is what the title should be. But that is far less
effective linkbait.

------
cnp
What a FANTASTIC write up! So much clarity here.

------
eonil
(1) The author knew social data is _a graph topology_. (2) Also knew there's
no true production-level graph database solution on the market. (I doubt this,
but anyway, the author did.) (3) And tried to store it in many ISOLATED trees
- what MongoDB offers. (4) Realized that's impossible. (5) Blamed MongoDB for
lacking graph connectivity. (6) Went back to an RDBMS for consistent graph
connectivity.

None of these steps makes sense.

I want to ask the author: did you really know what a GRAPH is?

------
auggierose
This seems to be a case of incompetent programmers doing jobs which are way
over their heads. Just saying. Well, if it doesn't kill them, they'll learn
from it.

------
buckbova
Not sure why it has to be one or the other.

It seems perfectly reasonable to store the user's profile information and
media in MongoDB and your likes and comments in a relational db.

------
khailey
Putting together a social network schema while keeping performance in check is
far from trivial. It's a problem that many have had to tackle, including
Facebook, Twitter, LinkedIn, and Flickr. Here is a blog article discussing some of the
issues and approaches: [http://www.kylehailey.com/facebook-schema-and-
peformance/](http://www.kylehailey.com/facebook-schema-and-peformance/)

------
dgregd
I hope the Mozilla guys will also learn that lesson. They helped Microsoft
kill WebSQL. So instead of SQLite, we have the terrible IndexedDB inside browsers.

------
marvwhere
TL;DR would be nice =/

All in all I do not see all the problems you see; we are running Mongo with
Elasticsearch on 30k unique pages per day, and we do not have big problems.

------
wehadfun
This was the best explanation of MongoDB I've read.

------
LouisSayers
How about this for an article: Why you should never use a Microwave.

I tried toasting my bread in the microwave and it didn't work so well.

How about you use a document store for documents, and a graph db for graphs?
Of course mongo won't work for data with loads of relationships in it, because
it's not meant for that.

Next time take your bread out of the microwave and put it in the toaster - I
guarantee you'll get better results.

------
hakcermani
I wonder why this is such a surprise and all this hoopla ...
[http://docs.mongodb.org/manual/faq/fundamentals/#what-are-
ty...](http://docs.mongodb.org/manual/faq/fundamentals/#what-are-typical-uses-
for-mongodb)

The big advantage of Mongo (as with other NoSQL dbs) is the dynamic schema.
Try adding a new column to your SQL database...

------
memracom
Many boosters of NoSQL forget that relational database engines are not all the
same. They evolve, and are no longer the same as they were 10 years ago when
people started building NoSQL engines. In particular, PostgreSQL is not the
same as Oracle or MySQL, and PostgreSQL has been evolving quite a lot in
recent years to make it a stronger competitor to NoSQL.

------
beat
Consider Neo4J.

The fail mode here for MongoDB (complex non-hierarchical relationships) is a
win mode for Neo4J. Hierarchies are just a special case of graphs. Use a
proper graph database, and you can actually represent the relationships based
on the domain model, without the clumsiness of hierarchical NoSQL or the even
worse clumsiness of an RDBMS.

------
jliptzin
It's certainly not a one size fits all solution, but saying NEVER use it for
anything is a little strong. I've been using it in production now for 6 months
on a fairly well trafficked site without a _single_ disruption that wasn't the
fault of application layer code.

------
johnymontana
The article mentions that they considered a graph database but deemed it
too niche for production. Is that the general opinion on graph databases (like
neo4j) at this point? Not production ready? This project seems like a perfect
application for a graph database.

------
hobs
To be fair, I work in SQL constantly, but when someone says "7 table joins,
ugh," all I can think is: really?

Maybe I have spent too much time in the TSQL world, but I regularly see things
with 20+ joins without blinking.

------
weddpros
The problem described here is only about the lack of transactions... The
relational nature of pgsql or mysql is not what makes a difference when you
want "all or nothing" insertions or updates...
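
The "all or nothing" behavior weddpros refers to is just a transaction
wrapping several writes. A minimal sketch, using Python's sqlite3 as a
stand-in for pgsql/mysql (the schema is made up):

```python
import sqlite3

# "All or nothing": both inserts commit together or neither does.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes (user_id INTEGER, post_id INTEGER, UNIQUE (user_id, post_id))"
)

try:
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("INSERT INTO likes VALUES (1, 100)")
        conn.execute("INSERT INTO likes VALUES (1, 100)")  # UNIQUE violation
except sqlite3.IntegrityError:
    pass

# The failed second insert took the first one down with it.
count = conn.execute("SELECT count(*) FROM likes").fetchone()[0]
print(count)  # 0
```

Without a transaction, the first row would have survived and left the data
half-written, which is exactly the failure mode the article describes.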

------
dreamdu5t
Like mongo queries but use SQL with node.js? Check out
[https://github.com/goodybag/mongo-sql](https://github.com/goodybag/mongo-sql)

------
csense
I was never fully sold on the advantages of jumping on the NoSQL bandwagon.
This article echoes a lot of the concerns I had when people were first getting
excited about MongoDB.

------
ezmobius
You thought you wanted to use a document store until you realized that
redis+postgres was actually what you needed :P

------
devanti
Someone misused MongoDB, and now they're blaming MongoDB for it.

A social network will have a lot of relations -- so you should use a
relational database.

Other web apps may not need as many relations (most mobile apps), in which
case MongoDB is a superior solution.

~~~
argvzero
you keep using that word "relations". i do not think it means what you think
it means.

[http://en.wikipedia.org/wiki/Relational_algebra](http://en.wikipedia.org/wiki/Relational_algebra)

------
bibstha
Is it also the case of all other nosql databases like Cassandra or CouchDB
etc?

------
tayzco
This is just a naive article from a naive developer who shouldn't be
responsible for choosing data stores in any project. 4-6 projects is a lot? I
think you need to get your head out of your a __(like most ruby devs ;).

~~~
kul_
We don't treat well the kind who write 1000 lines of code for adding two
integers over here, mister.

------
halayli
Don't blame it on MongoDB if you are using the wrong tool for the job.

~~~
jonknee
The point of the article was that there isn't a job that MongoDB is the tool
for. (See the last example where the model fit perfectly until a feature was
needed that blew everything up.)

~~~
ruok0101
Wasn't the point actually that even in that second example, they didn't have
the foresight to ask the client about the need to cross-reference TV shows
ahead of time?

And for the record, how on earth is this MongoDB specific? Almost all NoSQL
solutions fall into this same situation.

I was at least expecting more complaints about the early days of mongo's
writes not being durable... At least that argument has merit.

------
snambi
Absolutely agree.

------
asdasf
>Seven-table joins. Ugh.

Where does this attitude come from in the first place? Even when I was just
first learning SQL the notion of doing multiple joins was never off-putting or
scary. Quite the contrary, the fact that joining two relations produces a
relation which I can then use in more joins seemed like a perfectly elegant
abstraction.

>On my laptop, PostgreSQL takes about a minute to get denormalized data for
12,000 episodes

[http://www.postgresql.org/docs/9.3/static/sql-
createindex.ht...](http://www.postgresql.org/docs/9.3/static/sql-
createindex.html)
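
The composability asdasf describes (a join yields a relation you can join
again) can be sketched with a hypothetical shows/seasons/episodes schema,
with indexes on the join columns per the CREATE INDEX page linked above
(all names and data here are made up):

```python
import sqlite3

# A join produces a relation, which can feed the next join: chaining
# several is nothing special.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows    (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE seasons  (id INTEGER PRIMARY KEY, show_id INTEGER, number INTEGER);
    CREATE TABLE episodes (id INTEGER PRIMARY KEY, season_id INTEGER, title TEXT);
    -- Indexes on the join columns keep the lookups cheap.
    CREATE INDEX idx_seasons_show    ON seasons (show_id);
    CREATE INDEX idx_episodes_season ON episodes (season_id);
    INSERT INTO shows    VALUES (1, 'Some Show');
    INSERT INTO seasons  VALUES (1, 1, 1);
    INSERT INTO episodes VALUES (1, 1, 'Pilot');
""")

row = conn.execute("""
    SELECT shows.title, seasons.number, episodes.title
    FROM shows
    JOIN seasons  ON seasons.show_id    = shows.id
    JOIN episodes ON episodes.season_id = seasons.id
""").fetchone()
print(row)  # ('Some Show', 1, 'Pilot')
```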

~~~
bcoates
I'm not sure either but I think it's the procedural mindset. People learning
about relational databases hear about how a join is equivalent to cartesian
product, then imagine the nested-loops implementation and think that a k-way
join query fundamentally has O(n^k) performance.

They haven't wrapped their brain around the declarative mindset that a join
doesn't mean looping any more than multiplication implies a loop of additions.

I think it would help if relational databases acted less like black boxes and
exposed worst-case performance guarantees for particular queries. AFAIK no
DBMS actually promises that an equi-join on an indexed column has constant-
time overhead despite being implemented that way.
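
The index-aware planning bcoates describes can at least be made visible, even
if it isn't guaranteed. A sketch using SQLite's EXPLAIN QUERY PLAN as a
stand-in (the tables and index name are made up):

```python
import sqlite3

# With an index on the join column, the planner runs an equi-join as
# index lookups, not nested full scans. EXPLAIN QUERY PLAN shows the
# strategy it chose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    CREATE INDEX idx_posts_user ON posts (user_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT users.name, posts.body
    FROM users JOIN posts ON posts.user_id = users.id
""").fetchall()
for step in plan:
    print(step[-1])
# Expect one SCAN of the outer table plus an index SEARCH of the inner
# one -- not a SCAN of both.
```

The plan is declarative output, not a contract, which is exactly the gap
bcoates points at: you can see the index being used, but nothing promises it.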

~~~
mcdougle
It doesn't help when a co-worker writes a query that just left outer joins
every table on the server and uses the where clause to filter out the
excess...

(found one of those this morning)
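
There's a related trap in that pattern: filtering a left-joined table in the
WHERE clause discards the NULL-extended rows, silently turning the outer join
back into an inner join. A sketch with made-up tables:

```python
import sqlite3

# Filtering a left-joined table in WHERE vs. in ON.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 9.99);
""")

# WHERE filter: bob's NULL-extended row fails "NULL < 100", so bob
# disappears -- the outer join has quietly become an inner join.
where_rows = conn.execute("""
    SELECT users.name FROM users
    LEFT JOIN orders ON orders.user_id = users.id
    WHERE orders.total < 100
""").fetchall()

# ON filter: the condition is part of the join itself, so bob survives
# with NULL order columns.
on_rows = conn.execute("""
    SELECT users.name FROM users
    LEFT JOIN orders ON orders.user_id = users.id AND orders.total < 100
""").fetchall()

print([r[0] for r in where_rows])   # ['alice']
print(sorted(r[0] for r in on_rows))  # ['alice', 'bob']
```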

~~~
twic
Just make sure they don't find out about recursive common table expressions.

------
lafar6502
Oh, God, an advisory article by people who have just learned what a relational
db is. What's next?

------
AsymetricCom
LMFAO at startups using hot new technology so they can throw the technical
debt back in their willfully ignorant investors' laps. Then the
business suits come back thinking "yeah we want a feature that makes us as
useful and powerful as IMDB, that site looks so dated, we'll steal their
market easy. Yeah just add actors. Piece of cake right? Shouldn't take you
more than a day." Hahahahhaha.

------
leccine
I find it fascinating that people misunderstand data so much, even in 2013,
that they end up implementing a totally inefficient solution without actual
data modeling and design. The most missed point is usually querying. Focusing
first on how to write the data can lead to disasters like this. Why doesn't
everybody start with the "how do I query this dataset in this shape?" question
first?

------
seivan
These days I tend to use SQL for data, and Redis for complex relationships and
graphs. It's good enough for me to combine those. Usually the relationships
are stored with ids on ordered sets. And I just do an id lookup on postgresql.

------
lampe3
I stopped clicking on titles like this... Titles with NEVER/ALWAYS or
something in that direction are just there to make you click, and mostly
they are not good articles.

Maybe this one is good, I don't know... I didn't click on it...

~~~
golfadas
It is a flaming title, but an actual good read if you want to know what kind
of problems you might have when using Mongo.

