
A Postgres Perspective on MongoDB - tdurden
http://bonesmoses.org/2016/07/15/pg-phriday-a-postgres-persepctive-on-mongodb/
======
twblalock
I think the NoSQL trend is something similar to what is going on with
programming language trends (in industry, not in academia). Many commonly used
languages were strictly typed and static, such as C, C++, and Java. Then,
dynamically typed languages became popular because they are easy to use,
although they give up a lot of the safety and static analysis capability of
the older languages. Let's call that the second phase, and it is exemplified
by Ruby and Python. Now, newer languages with type inference are coming out
that combine the safety of the older languages with the ease of use of the
dynamic languages -- Scala, Go, etc. Let's call that the third phase.

In the database world, we are still in the second phase, which is exemplified
by databases such as Mongo, which have started from scratch and thrown out
many of the good ideas embodied in older databases. I suppose that the new
JSON capabilities of Postgres are an attempt to move database design into the
third phase, which combines the ease of use of NoSQL with the many superior
qualities of traditional relational databases.

~~~
BurningFrog
My pet theory is that the rise of automated testing was/is a big factor in the
rise of dynamic languages.

Test suites guard against most things type safety does, so if you have a test
suite anyway, the type safety becomes a net burden.

~~~
matt4077
I don't remember much automated testing being done in those early days of PHP,
and I don't see how your theory would explain the current trend back towards
typed languages.

------
bojo
> MongoDB is basically a native JavaScript filesystem

That is probably the most apt description of MongoDB I've ever read.

~~~
threeseed
Filesystems don't intrinsically support searching the contents of files to
satisfy predicates, and that is the most important part of a database.

I think I understand what the author means by the statement. But technically
it doesn't make much sense.

~~~
raverbashing
Yeah, technically you could bring grep to your fs instead of having it as a
userspace tool, but 90% of the work is on the fs

------
graffitici
I wish the OP had also drawn a comparison between Postgres and Cassandra
(which he mentions early on).

Based on the schema he writes at the beginning, it looks as though he works
mostly with time-series. My understanding is that Cassandra is particularly
well-suited for these types of workloads (definitely more so than MongoDB?).
If the write throughput is such that a single server can handle it, how would
Postgres compare with Cassandra? What are the distinctions in read-latencies?
Storage space?

Perhaps for the next article..?

~~~
zihotki
So true. I often see people choose a database without weighing the pros and
cons; they consider only what they, as developers, want to work with or try
out, and ignore the nature of the data. Time-series data fits best in
Cassandra and some other column-oriented DBs, but definitely not in MongoDB.

~~~
threeseed
Time series data is actually a great fit for MongoDB.

You create a collection for each measurement, a JSON document for each day,
and then preallocate the fields for each hour/minute/second. With batching you
can update 100K documents a second pretty easily off one node. For example:
[https://blog.serverdensity.com/using-mongodb-as-a-time-serie...](https://blog.serverdensity.com/using-mongodb-as-a-time-series-database/)
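
A minimal sketch of that preallocation pattern (all names here are invented
for illustration, not taken from the linked article): one document per sensor
per day, with a zero-filled slot for every hour/minute, so later writes update
fields in place instead of growing the document.

```javascript
// One document per sensor per day, with every hour/minute slot
// preallocated to zero so updates never grow the document.
function makeDayDocument(sensorId, day) {
  const values = {};
  for (let h = 0; h < 24; h++) {
    const hour = {};
    for (let m = 0; m < 60; m++) {
      hour[m] = 0; // preallocated slot for minute m of hour h
    }
    values[h] = hour;
  }
  return { _id: sensorId + ':' + day, sensorId: sensorId, day: day, values: values };
}

const doc = makeDayDocument('sensor-1', '2016-07-15');
// A reading at 13:37 then becomes an in-place update of a single field,
// e.g. {$set: {'values.13.37': 42.5}} in a MongoDB update.
doc.values[13][37] = 42.5;
```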

~~~
rlau26
I don't like MongoDB as much as the next guy, but can people explain why this
comment is downvoted?

~~~
danbmil99
Because haters gonna hate

------
patsplat
From my experience building and deploying production systems with MongoDB,
the primary benefit is not needing a migration or ORM framework.

\- Deployments get much simpler when migrations for additive changes are
unnecessary. Hundreds of deployments a week!

\- JSON queries are easy to pass around and decorate in application code.
Hydration from a dictionary is straightforward. Combined, an ORM isn't really
necessary.

~~~
mrweasel
>primary benefit is not needing a migration or ORM framework

You don't need an ORM, but they are handy. I'm pretty sure it wouldn't be an
issue to load rows from an RDBMS, using a language like Python, as a
dictionary, and just decorate the snot out of that dictionary to make it look
like an object.

Anyway, regarding migrations, I sort of agree. Migrations can be terrifying
and slow, but at least you're allowed to make certain assumptions about your
data afterwards. Assumptions like "this field will always be there".

With MongoDB, and any other NoSQL database I've used, you need to handle your
"migrations" in application code. So now you have a block of code that checks
whether a given attribute exists in your JSON document.

It's not necessarily a problem, but it is something you need to consider.
Personally I much prefer an RDBMS, because I actually like SQL and the ability
to make ad hoc queries easily. If JavaScript is advertised as the query
language for a database, I'm not terribly interested unless the payoff is
extremely high.
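
A hypothetical sketch (field names invented) of what that app-level
"migration" code tends to look like in Node.js: every read path has to
tolerate documents written by older versions of the application.

```javascript
// Normalize a document that may have been written by any past schema
// version, since the store itself guarantees nothing about its shape.
function normalizeUser(doc) {
  return {
    name: doc.name,
    // Field added in a later schema version; old documents lack it.
    email: doc.email !== undefined ? doc.email : null,
    // Field renamed at some point; accept both spellings.
    signedUpAt: doc.signedUpAt !== undefined ? doc.signedUpAt : doc.signup_ts
  };
}

// An old-style document still normalizes cleanly:
const user = normalizeUser({ name: 'alice', signup_ts: 1468540800 });
```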

------
pbreit
Is NoSQL really all that much different than relational? At the end of the day
isn't a collection pretty much like a table? And you have to tie two
collections together with a common ID (ie, join)?

NoSQL is less rigid but your code pretty much has to be as rigid, right?

~~~
asimuvPR
My experience says that the issue is not the differences, but bad
architectural decisions. Most applications out there are glorified CRUD apps.
They are perfectly serviced by SQL databases. NoSQL fits a smaller set of
systems whose scope is mostly passing data around. PostgreSQL now supports
JSON as a datatype and allows querying it. Works beautifully. No need for
MongoDB at all.

~~~
eyelidlessness
My understanding, which is admittedly minimal, is that MongoDB provides some
conveniences around partitioning and distributing your dataset that Postgres
does not without additional work. Simply supporting JSON accommodates
MongoDB's approach to schemas (not having them; er, pushing constraints to
application code), but doesn't address the distribution concern.

How well does MongoDB support this need, and how hard is it to achieve
equivalent results with Postgres? What are the drawbacks?

~~~
elmigranto
Side note: you can have constraints, `check`s, indexes and triggers on `json`
and `jsonb` columns in Postgres. In fact, you can have everything other data
types have. So "no schema" or "moving schema to app level" doesn't have to be
the case, when using PG.

~~~
bonesmoses
Exactly. With Postgres, you can add CHECK constraints to enforce specific
keys, with any level of nesting desired. So an INSERT could reject a document
for having missing fields, fields of improper type, etc. It all depends on how
strict you want to be.
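
For example, a rough sketch (table and key names invented) of what such
CHECK constraints on a `jsonb` column can look like:

```sql
-- Documents must carry a numeric "id" and a string "name";
-- anything else is rejected at INSERT time.
CREATE TABLE docs (
  body jsonb NOT NULL,
  CHECK (body ? 'id' AND body ? 'name'),
  CHECK (jsonb_typeof(body->'id') = 'number'),
  CHECK (jsonb_typeof(body->'name') = 'string')
);

-- INSERT INTO docs VALUES ('{"id": 1}');              -- rejected: no "name"
-- INSERT INTO docs VALUES ('{"id": 1, "name": "x"}'); -- accepted
```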

------
petetnt
> While it’s in our best interests to keep similar documents restricted to
> certain collections, there’s no kind of constraint that enforces this.

However, if you need such a thing, there are always tools such as Mongoose[1]
that can do this trivially for you.

[1]
[https://github.com/Automattic/mongoose](https://github.com/Automattic/mongoose)

~~~
elmigranto
But you can still insert inconsistent data into the database with the CLI or
otherwise. And `default` values defined inside Mongoose are no good when
filtering or searching documents.

There are plenty more problems with emulating a schema in app code (Mongoose
or anything else), so it's not that trivial.

------
Kiro
I'm building a Node.js app where I was thinking of using MongoDB but I'm
reconsidering it after reading the comments here.

The database's only job is to keep persistence across server restarts. On
server start I fetch a collection from the database and populate a variable
with it, which I use for any direct access.

When a user updates something it will update the variable as well as updating
the collection in the database.

Something like this:

    
        var foo = readFromDB();
    
        socket.on('connection', function() {
            this.emit('state', foo);
        });
    
        socket.on('update', function(data) {
            foo[data.id] = data.val;
            updateDB(data.id, data.val);
        });
    

(In my real application each collection has sub-arrays etc and not just key-
value like in the example.)

What should I use instead of MongoDB for this? Or is my approach fundamentally
flawed?

~~~
orf
What's wrong with using a plain file?

~~~
Kiro
It would probably work, but in my real application it's a bit more complicated
since I only want to update a sub-array of the document, like this:
_collection['some_identifier']['items'][3] = 123_.

I guess using a plain file would mean I have to overwrite everything on each
write, which seems wasteful.

~~~
orf
Sure, it depends on your use case though. For infrequent updates I don't
think it's wasteful to just overwrite the file; it seems less wasteful to me
than running a whole database. But it all depends on your setup :)

Also, I think updating a sub-array triggers a rewrite of the whole document in
MongoDB. In any case I would avoid MongoDB if you can; it seems Redis might be
a better fit for you.

------
raverbashing
Good to see an article with a fair comparison and not just "anything is worse
than PostgreSQL" mentality

~~~
elmigranto
The article is more an introduction to Mongo for Postgres users than a
comparison.

And yeah, anything's better than Mongo :)

~~~
Lio
I note the smiley, but could you actually expand on that comment for someone
who is only really familiar with relational databases?

I've recently become interested in using PouchDB (a CouchDB implementation)
for syncing offline content between a user's browser and server, where the
content is based on a set of fixed documents. I believe that MongoDB can
provide similar replication tricks. Once on the server, I would expect to
extract the data from the documents into a relational warehouse database
(probably PostgreSQL, as it's one of the databases I know best).

A document store _seems_ like the right tool for the job, but could you expand
on why something like Mongo or Couch is a bad idea for situations where you're
just storing/syncing documents?

~~~
elmigranto
> why something like Mongo or Couch is a bad idea

Couch is fine, other NoSQL is fine. It's just Mongo I don't like:

\- weak, if any, guarantees on data consistency and safety;

\- poor API. If you need anything more complicated than `findOne`, there are
at least 3 unrelated syntaxes with major differences that are not compatible;

\- performance is not what the marketing team says it is. Even on its home
JSON turf, Postgres is pretty comparable.

If you have data and queries that will benefit from document store, go for it.
But don't choose only based on marketing.

There are many more rants on the web; I like this one:

[http://cryto.net/~joepie91/blog/2015/07/19/why-you-should-ne...](http://cryto.net/~joepie91/blog/2015/07/19/why-you-should-never-ever-ever-use-mongodb/)

------
eljulio
One missing point is scalability. Postgres is awesome until you need to set
up a db cluster.

~~~
segmondy
Very few organizations in the world truly need to scale out their database.
Their problem is usually one of bad design. For those that do need to scale
out, NoSQL is not a magic solution either. The scaling-out solutions for SQL
databases these days are pretty much replication, sharding and partitioning
tables. Postgres does these very well.

~~~
threeseed
You might want to learn a bit about "big data".

Almost every major organisation will have a big data analytics program doing
feature extraction, modelling, machine learning etc. Usually this starts with
Hadoop/Spark with data in HDFS, but then a point comes when you want this in a
database.

PostgreSQL is awful at this, which is why almost no one uses it in this space.
Apart from the lack of native drivers, it is simply too difficult to do basic
clustering. And the fact that it isn't built into the product doesn't give you
a lot of confidence that it is (a) going to work and (b) going to be
supported.

And yes for those that scale out NoSQL is a magic solution. That's why they
are so popular. I can deploy a 40 node Cassandra cluster in 30 mins and
guarantee it works. Likewise MongoDB replica sets are ridiculously easy.

~~~
collyw
Maybe you could just design your data a bit better, with smaller data types,
and keep it all on one machine.

------
IshKebab
Why does it take 110ms for Postgres to count the rows in a table? That seems
like an insanely long time.

~~~
cpkpad
I would guess two issues:

1\. In many cases, there is a ~30ms floor on operations if a server is non-
local. That's true of virtually all services on AWS -- whether SQS, SNS, SES,
RDS (MySQL/PostgreSQL), etc.

2\. In many cases, rows are missing. A table has a million rows, ten are
deleted, and it's not compacted, so it's not a simple size lookup. A lot of
this depends on how the data is stored/indexed. That's something which is
naively pretty slow, but easily optimized.

------
franzwong
I tried to scale Postgres a few months ago but found that it is quite tedious.

~~~
forgotpwtomain
> I tried to scale Postgres a few months ago but found that it is quite
> tedious.

If your application is big enough that you really need to scale postgres, I
assure you, you will have a whole _lot_ of tedious problems - not just
sharding.

