
Comparison of Joins: MongoDB vs. PostgreSQL - ahachete
https://www.enterprisedb.com/blog/compare-mongodb-vs-postgresql-join-command
======
ahachete
Co-author here. Thank you all for your comments.

While I understand is tempting to easily conclude that this is apples-to-
oranges or that "obviously a relational database is better at data
relationships", I recommend to dive deeper into the topic.

First of all, almost all data models you may think of are in some or another
way relational (call it "related data", if you prefer). So any database, RDBMS
or NoSQL or whatever, will need to deal with data that is related to other
bits of data. So this is, from this perspective, very apples-to-apples IMHO.

Second, and regardless of the above statement, the exercise here is to model a
data model that is, necessarily, two (or three) pieces of related data. And
actually we considered the two models that MongoDB themselves consider for
data modelling: embedded and reference. This is explained in the post and
references to official MongoDB documentation are provided.

Third: benchmark is open and public. The repository also contains all the
query examples used in the post. If anyone considers that there's a better way
to represent this data model (problem statement), that would still answer all
the "business" questions (queries) considered in the post, please send a Merge
Request to the repository. We will be very happy to analyze it.

All in all, this is an exercise of data modeling, for a given dataset. MongoDB
requires data modeling, is not "schema-less", and this is pervasive through
their official documentation. The reference model, and their $lookup operator,
are not a corner case nor a hidden downplayed option. It is a first-class
operation, well advertised. But its complexity, performance and rewriting
needs to adapt to changes are what are not, apparently, advertised, after our
observations.

------
liquidify
Ok everyone, calm down. I think it is obvious that the author knows that he is
comparing apples to oranges here... but that isn't the point. The point of
this article was to quantify the difference. His work accomplishes that for
these specific tests very well. Yes, we know relational systems are better in
these cases, but it is useful for those considering no-sql solutions to see
this kind of hard data.

------
stepbeek
While I think it's easy to poke fun at this article - relational database
better at handling relational data, more at 10 - it does make me want to
understand why one would choose mongo over postgres?

~~~
jmull
I can’t address MongoDB, but nosql vs relational generally, there are two
reasons:

(1) what’s faster than a fast join? No join. With nosql you have more
flexibility to store the data organized in the same way it is accessed. You
don’t need to join to another table if the data your app needs is already
directly part of the main data the app requests. You _do_ need to understand
your access patterns well, and develop migration plans when they change.

(2) scalability. nosql databases generally let you scale horizontally more
easily/gracefully than relational databases, though there are trade offs.

Put it together and you get performance at scale, though you generally need to
understand your data access patterns and usually need to be more resilient to
inconsistencies.

~~~
zozbot234
> what’s faster than a fast join? No join. With nosql you have more
> flexibility to store the data organized in the same way it is accessed.

You can definitely do this in PostgreSQL via materialized views. They're
directly supported in recent releases, and they don't require you to
denormalize the underlying datamodel unlike NoSQL.

~~~
jmull
Well, a materialized view _is_ denormalized data. You distinguish it from the
“underlying data model”, but you can do that in nosql as well. Also,
materialized views still tend to be oriented to rectangular data, which isn’t
always how you want to access data.

I agree with the general point that Postgres and other relational databases
can do nosql things (and nosql databases can do relational things).

------
kthejoker2
Only one person in this comment thread mentioned CAP theorem, which is the
(highly informal*) rationale for NoSql's existence and also its deliberate
aversion to join dependencies (which more or less violate the core principles
of eventual consistency modeled distributed data stores.)

Useful links for erudition:

[https://arxiv.org/pdf/1302.0309.pdf](https://arxiv.org/pdf/1302.0309.pdf)

[https://codahale.com/you-cant-sacrifice-partition-
tolerance/](https://codahale.com/you-cant-sacrifice-partition-tolerance/)

[https://martin.kleppmann.com/2015/05/11/please-stop-
calling-...](https://martin.kleppmann.com/2015/05/11/please-stop-calling-
databases-cp-or-ap.html)

[https://www.somethingsimilar.com/2013/01/14/notes-on-
distrib...](https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-
systems-for-young-bloods/)

[https://www.microsoft.com/en-
us/research/publication/replica...](https://www.microsoft.com/en-
us/research/publication/replicated-data-consistency-explained-through-
baseball/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F157411%2Fconsistencyandbaseballreport.pdf)

------
verst
I would really love to see a direct comparison for scaling MongoDB and
PostgreSQL.

When do I usually need to scale up / out in either database? How do I assess
that? And how do I actually go about scaling?

Specifically I would love to know how to scale up and scale out (what are the
different multi-node options and what are the consistency / replication delay
guarantees)?

The last time I used Mongo in production (2013) most startups eventually had
to switch databases or hire a consultant to scale. This has always stuck with
me.

~~~
gshulegaard
I have spent most of my (limited) career scaling PostgreSQL with (IMO)
surprisingly good results.

Most recently I have been using Citus and it has been ridiculously easy to
shard data.

To be fair I have only been exposed to workloads of a few Terabytes of data
ingest a day so YMMV.

Unfortunately, I can't compare to Mongo as I have had somewhat limited
experience with it.

~~~
matchy
> can’t compare to Mongo as I have had somewhat limited experience with it

Doesn’t seem to stop most people here.

------
throw_m239339
Can someone who is using MongoDB and a RDBMS at the same time tell me the
exact use cases where MongoDB is fitter? Structured logs? Caching? Sessions?
And why MongoDB does it better?

~~~
anthony_doan
Cluster is very easy in MongoDB and for unstructured data. I've looked into
clustering postgresql a few time and it seems like a pain in the butt. TBH,
storing data without thinking about relation is just kicking the can down the
road.

My previous company store coupon code on it (referrals). I prefer a hash nosql
for that though (cassandra) but for a one man shop it's overkill.

\---

edit:

IIRC postgresql have their own document storage like MongoDB. I just never had
to use such thing because I guess I'm too OCD about how my data suppose to
look like.

------
BenoitP
A basis for the apples to oranges argument is that you don't use MongoDB when
you need JOINs.

To me, this almost never happens. You almost always need JOINs, otherwise your
data is sterile and useless. What point is there to talk about entities devoid
of relationships to other entities? Data is always relational; we don't live
in a 1-dimensional world.

The only way I see it being useful is for scaling purposes. But there are way
better, more stable, more resistant to failure kv stores out there.

------
tasubotadas
If you picked mongodb and you are using joins, you've picked a wrong tool for
a job.

~~~
commandlinefan
And the corollary: if you store data, you need to use joins.

~~~
throwanem
Embrace denormalization! It makes for splendid job security.

------
diddid
This should really be titled “A comparison of aggregates”.

------
daneel_w
This was quite the lopsided comparison - a document store just doesn't get
fair competition in a relational setting. Additionally, both the SQL and the
MongoDB queries appear unnecessarily complicated (or inefficient) for the
examples they illustrate.

~~~
d_watt
Is this actually an objective truth? "relational" just refers to the relation
within the table, it's less about having a database system with good inter-
table relationship handling. It happens to be that a lot of the things we
think about as relational are OLTP and have good fkeys, etc.

Is there any real reason why a document store can't have good joins between
collections? If you index the lookup keys, I don't understand what about the
underlying document model would make joining impossible to optimize as well as
a relational model.

~~~
alexpetralia
I don't think most document stores enforce foreign key relationships (and
hence consistency) like RDBMS do, so in practice that would make joins
difficult.

~~~
d_watt
Is joining on an fkey any more performant than without it assuming the same
indexing is applied?

When you say difficult do you mean from a developer ergonomics standpoint
(knowing there's data integrity), or from a database optimizability
standpoint?

~~~
alexpetralia
The former. You have a guarantee of consistency, whereas otherwise you must
adjust for this in your application code. Database performance wise I agree
they are equivalent.

------
psychometry
We're still talking about MongoDB? Why?

~~~
alexbanks
Because lots of people are still using it. In Q1 2020 they 89m in revenue, up
from 50m in Q1 2019. Just because something is bad technically doesn't mean
people will immediately abandon it. See: php.

