
PostgreSQL Outperforms MongoDB in New Round of Tests - wslh
http://blogs.enterprisedb.com/2014/09/24/postgres-outperforms-mongodb-and-ushers-in-new-developer-reality/
======
krylon
I never really "got" the new wave of NoSQL databases. Mongo seemed to be the
one I could most easily wrap my head around, but still.

I was never sure, though, if that meant I had never faced a problem suitable
for one of these DBMSs or if my mind is just so warped by years of using
relational engines (mostly Postgres, or SQLite for simple projects) that I
could not think of modeling my data any other way.

Recently though, I had to get familiar with the database schema of the ERP
system we use at work, plus some modifications that have been done to it over
the years, and it kind of feels to me like somebody was trying force a square
peg through a round hole (i.e. trying to model data in relational terms,
either not fully "getting" the relational model or using data that simply
refuses to be modeled in that way).

I sometimes think the people who wrote the ERP system might have enjoyed a
NoSQL DBMS. Then again, with a multi-user ERP system, you <i>really</i> want
transactions (personally, I feel that ACID-compliant transactions are single
most useful benefit of RDBMS engines), and most NoSQL-engines seem to kind of
not have them.

~~~
romanovcode
It is for storing huge number of data with dynamic keys.

For example when site admin says "I want new archive that I can fill with
items, items will have Id (automatically), Name (string), IsMale (bool)". He
also want to do complex queries on this data as well. That's where NoSQL comes
to help.

And to answer why exactly MongoDb is so popular - it's because it has awesome
driver support for every popular language.

I don't understand what's so hard to understand here. It's a simple solution
to EAV/nulltable nightmare.

~~~
krylon
Thanks for the explanation.

Quite frankly, no application I have ever worked on has had to deal with
"huge" amounts of data by any common definition (a couple of gigabytes at the
most).

And like I did say, looking at our ERP system's database I am beginning to
understand the appeal of a database without a fixed schema. Some of the tables
have dozens of columns, with most of the rows being full of NULL values. So I
do get that part, but no application I have ever worked on was like that.

~~~
yummyfajitas
_Some of the tables have dozens of columns, with most of the rows being full
of NULL values._

This is generally addressed in a relational design with a star schema. First
create a dimension table:

    
    
        CREATE TABLE person (
          id BIGINT PRIMARY KEY NOT NULL
        )
    

Then create fact tables:

    
    
        CREATE TABLE person_name (
          person_id BIGINT REFERENCES person(id) UNIQUE,
          name VARCHAR(128) NOT NULL
        )
    
        CREATE TABLE person_bank_details (
          person_id BIGINT REFERENCES person(id) UNIQUE,
          bank_detail ....
        )
    

This avoids large numbers of rows containing nulls, but it violates a normal
form. The mnemonic is that the table must contain the Key, the whole key, and
nothing but the key, so help me Codd. Anytime you have a "REFERENCES table(pk)
UNIQUE", you violate the "whole key" bit.

------
tete
Did I miss something? MongoDB was never ever faster than Postgres. That's
nothing new. Most of these things are clear when one reads the MongoDB docs:

MongoDB stores Metadata, (nearly) uncompressed on a per document basis, so of
course it uses way more diskspace. It doesn't store the data in any efficient
way either.

Also it's pretty much unoptimized, compared to Postgres which has been around
for a really long time so it's kinda slow.

Many functions in MongoDB are actually implemented in JavaScript, not C. So
that's also a factor, even when I guess it's not the big one here.

MongoDB has a lot of limitations that can really bite yo (document size even
though that's the smallest (gridfs), how you can do indices, even limitations
in how your query can look like, etc

The only thing that's good about MongoDB is that it's nice for getting
something up and running quickly and that it's a charm to scale (in many
different kinds), compared to PostgreSQL. If PostgreSQL had something built
in(!) coming at least close to that (and development has a strong focus there)
it would be perfect.

For all these reasons many companies actually have hybrid systems, because
sometimes one thing makes sense and sometimes the other.

The benchmark seems strange, cause there are many SQL and NoSQL databases that
are faster and that's a kinda well-known fact. I think everyone who ever had
to decide on a database system has known that, even without a benchmark.

This makes it kinda look like an advertisement (look at the company behind the
blog).

Using PostgreSQL 9.3 with JSON for a while now and it's great. Also I know it
is possible to scale PostgreSQL and it's really nice. Still a lot more
complexity involved (again, depending on the use case).

Just use the right tool and please let's stop with such shallow comparisons,
because I think it kinda harms the reputation of database engineers and system
architects - and the authors of such comparison. When you look for real
comparisons and example use cases, typical patterns or just some help one
always stumbles across these things and they tend to quickly be out of date
too, cause all well-known databases have a lot of active development going on.

~~~
Nitramp
Is MongoDB's distribution and scaling story really nicer? A cluster story
that's easy to set up but then doesn't actually work (loses data, fails in
potentially catastrophic ways) sounds not all that useful.

[http://aphyr.com/posts/284-call-me-maybe-
mongodb](http://aphyr.com/posts/284-call-me-maybe-mongodb)

~~~
chaostheory
Mongodb is easy to setup as a single instance, but it is definitely no easier
to setup for horizontal scaling compared to alternatives. I'm still not aware
of anything that's both simple to setup and maintain for horizontal scaling in
practice (not theory).

~~~
pas
Cassandra is, but you're forced into a very simplistic data (sharding) model.
(And probably Riak is too, but haven't used it.)

~~~
chaostheory
Cassandra is easier to scale than mongo, and yeah Riak is even easier, but
they're still not easy unless you don't care as much about consistency and you
can wait, and if your app is more about writes than reads.

------
gregwebs
This benchmark misses the entire point of MongoDB: that you can atomically
update individual fields in the document.

Thas has not been possible with Postgres json storage type. Instead, the
entire JSON blob must be read out, modified, and inserted back in.

This reality is well known to those that understand Postgres, which is why
they have HStore. HStore is limited though (particularly to the size of the
store), so there is work underway to make it more competitive with MongoDB.

So now they are also releasing a jsonb (b for binary) storage format, which
looks promising, but I can't find any information on exactly what its features
are. I would love to actually see a benchmark comparing field updates, but
this benchmark is not it.

MongoDB is a database with trade-offs, downsides, and more crappy edge cases
then MySQL, but it does exist because at its core it allows data modeling that
traditional SQL databases are lacking.

MongoDB has first class arrays rather than forcing you to do joins. It
supports schema-less data, which is rarely useful, but when you need it can be
very useful. It can do inserts and count increments very quickly (yes the
write lock means you eventually have to put collections in separate
databases), which is also useful for certain use cases.

~~~
yangyang
The improvements to hstore (primarily to make it hierarchical and add array
support) ended up becoming jsonb.

~~~
gregwebs
That is really exciting! I am going to take a serious look at jsonb and using
Postgres now, although FoundationDB has caught my eye with its table group
functionality:
[https://foundationdb.com/layers/sql/documentation/Concepts/t...](https://foundationdb.com/layers/sql/documentation/Concepts/table.groups.html)

~~~
yangyang
Table groups look rather like hierarchical databases [1], which were the norm
before relational databases came along.

Do bear in mind that you'll never get MongoDB-style in-place updates in
PostgreSQL, due to MVCC. You may save a round-trip with the entire JSON object
once they implement update operators, though.

[1]
[http://en.wikipedia.org/wiki/Hierarchical_database_model](http://en.wikipedia.org/wiki/Hierarchical_database_model)

------
hnhipster
This is very interesting. Around when MongoDB was quickly becoming the cool
thing to do I'd ask people about why it is better than just storing things in
Postgres. People would have answers that would be grammatically correct but
would not make any sense.

That being said, I find it weird that now it is cool to make fun of MongoDB.
Some people on this thread have even said they want to know if a service is
using MongoDB and they'd not use that service. I am pretty sure they'd be all
over Stripe (who store your money related stuff in MongoDB) in a different
thread.

~~~
xorcist
> Stripe (who store your money related stuff in MongoDB)

That's a bit scary. There has been several successful attacks against virtual
currency exchanges that use MongoDB, utilizing the eventual consistency to
your advantage.

If you handle money, you don't want any inconsistencies in your database, no
matter how temporary. You can work around these of course but you really need
to know what you're doing.

~~~
est
> against virtual currency exchanges that use MongoDB, utilizing the eventual
> consistency

against _solemnly_ depending on MongoDB's eventual consistency alone.

------
xtrumanx
Only problem I have with JSON on Postgres is you can't update a property of a
JSON object like so:

    
    
        update table set jsonCol->propertyA = 42;
    

You need to write an extension for that. Easiest to do so using Python but
sadly Heroku doesn't support python on postgres since its unsafe.

~~~
mm_throwaway
Please correct me if I'm wrong, but isn't updating JSON properties one the big
improvements of the soon-to-be-released 9.4 version?

~~~
rpedela
Not until 9.5 at the earliest.

[http://www.postgresql.org/message-
id/534C4225.6020600@agliod...](http://www.postgresql.org/message-
id/534C4225.6020600@agliodbs.com)

------
tszming
MongoDB work well ONLY if indexes (and working set) fit in the memory. What
are the indexes size in the benchmark? (I doubt as I see you are running a
145GB database on a 32GB instance)

[http://docs.mongodb.org/manual/tutorial/ensure-indexes-
fit-r...](http://docs.mongodb.org/manual/tutorial/ensure-indexes-fit-ram/)

~~~
brey
once you start page faulting and hitting spinning disk it's game over for any
database's performance, postgres included.

~~~
Nitramp
... but that usually doesn't mean you have to have your full data set in
memory, only the hot parts. And even if so, one or two ~10 ms seeks during a
query aren't that terrible.

~~~
brey
mongodb recommends that your working set lives in memory, not your entire
database - it's generally much cheaper to add more RAM than code in any case.

a few seeks aren't terrible for small/medium applications, but when you're
asking for thousands of queries a second any disk access is bad news.

------
jvehent
I have been using JSON types in postgres 9.3 for several weeks now. It's nice.
It just works. I love being able to mix relational logic on static columns
with undefined documents on dynamic columns, all in the same tables.

------
bshimmin
I'm sort of struggling to imagine anyone really using MongoDB at all in a
couple of years.

But then again, plenty of shops still use MySQL (and one of my clients uses
DB2...).

~~~
yangyang
What's wrong with DB2? I know it's commercial and a bit arcane in areas, but
it's a far more powerful database than MySQL.

~~~
BjoernKW
It depends. DB2 comes in several editions. The DB2 that IBM mostly sells
nowadays is ok but there's also DB2/400, which -as the name suggests - comes
with all sorts of quirks and funny mainframey limitations.

------
jedisct1
How does it compare to TokuMX?

I am a huge fan of PostgreSQL, but in order to avoid having to rewrite an
application designed for Mongo, I tried TokuMX and was pleasantly surprised by
its performance.

~~~
RyanZAG
Going off [1] it looks like TokuMX is a similar multiple faster than Mongo
than Postgre is from Mongo, so I'd guess Postgre and Toku would be fairly
similar speed wise. Obviously depending on your data and search patterns one
may still be significantly faster than the other as the usual caveat.

[1] [http://www.databasejournal.com/sqletc/tokumx-compared-to-
mon...](http://www.databasejournal.com/sqletc/tokumx-compared-to-mongodb-
using-a-specific-common-use-case.html)

~~~
yangyang
Nitpick: PostgreSQL, or Postgres for short. Never "Postgre".

------
dschiptsov
Since when MongoDB is considered as a database comparable with PostgreSQL?)

~~~
wmf
That's not really relevant, since the point is that Postgres can do what
MongoDB does but faster.

~~~
threeseed
Except it doesn't. It just has a JSON data type. As does many other databases
e.g. Oracle, Teradata.

People are switching to MongoDB because the developer and deployment
experience is so good.

~~~
Argorak
Hate to say so, but many are switching to MongoDB because it's the current new
kid on the block.

I found that many companies have incredibly bad, not use-case driven reasons
to pick their DB. Not that MongoDB is a bad database, but it has it's fair
share of issues. Granted, this is the case for every database, but if you pick
any database without being aware of those, you end up in a world of pain.

(Context: I consult for backend systems in general)

~~~
davidw
That was my only experience with MongoDB - "let's try this, for this
experimental project, to see what it's like", without taking into account its
characteristics.

After that the project grew, I was brought in and needed to do some reports,
which were a huge pain in the ass without proper SQL and joins.

I don't intend to have anything to do with people using MongoDB unless they
have a really, really good reason for doing so, and it's used alongside other
databases.

~~~
bostik
A friend of mine came up with the perfect use-case for MongoDB.

His company installs and monitors sensors in civic infrastructure: roads,
train tracks, bridges, ... These sensors provide constant data streams which
need to be stored somewhere before processing. Any raw data older than 2 weeks
is worthless, and if 2-3% of data is lost before written it's not much of a
problem. More data is coming in all the time any way.

For them MongoDB is the perfect transient cache. Sensor data is processed and
the results stored elsewhere. Easy to expand. Flipping between installations
is just a matter of toggling load balancer, an once an offline cache has been
processed, it can be nuked and put back as a fresh system.

So, for a setup where easy size expansion, fast mostly-reliable writes and
ease of use are the primary design constraints, mongo fits in suprisingly
well. It's a fascinating use-case.

~~~
Argorak
From what that sounds like, I agree. This is a complex problem description
with a pointer where the database provides solutions.

------
jeremylevy
Any articles coming from a company which describes itself as "The Postgres
Database Company" and sells "a relational database management system based on
PostgreSQL..." [1][2] stating than "PostgreSQL outperforms X" is de facto
suspicious.

[1]:
[https://www.google.fr/?q=enterprisedb#q=enterprisedb](https://www.google.fr/?q=enterprisedb#q=enterprisedb)

[2]:
[http://en.wikipedia.org/wiki/EnterpriseDB](http://en.wikipedia.org/wiki/EnterpriseDB)

~~~
yangyang
They employ several of the main developers of community PostgreSQL and also
sell commercial versions of the database.

Why does that make this article suspicious? Obviously they stand to benefit
from more people using PostgreSQL, but they've been exceedingly clear with
their methodology - their benchmark is available on Github:
[https://github.com/EnterpriseDB/pg_nosql_benchmark](https://github.com/EnterpriseDB/pg_nosql_benchmark).

------
ranman
Why does the graph read 2.4 when they say they're testing 2.6?

------
yangyang
These are the slides from an interesting presentation by one of the people
that did a lot of the work for the new jsonb type in 9.4, and associated index
improvements:
[http://www.sraoss.co.jp/event_seminar/2014/20140911_pg94_sch...](http://www.sraoss.co.jp/event_seminar/2014/20140911_pg94_schemaless.pdf)

Also contains some comparisons with MongoDB.

------
dang
Url changed from [http://developers-
beta.slashdot.org/story/14/09/26/1330228/p...](http://developers-
beta.slashdot.org/story/14/09/26/1330228/postgresql-outperforms-mongodb-in-
new-round-of-tests), which points to this.

------
larry_the_lover
bored, bored, bored....good luck trying to shard postgres! idiots

~~~
collyw
I'll keep the data types small, so it takes a fraction of the disk space and
doesn't need sharding.

------
threeseed
Will be great if PostgreSQL ends up as a storage engine option in MongoDB like
InnoDB. Then you have the better engine of PostgreSQL with the superior
clustering, sharding and end user experience of MongoDB.

That said these articles are pretty pointless. Performance isn't the reason
companies are switching to MongoDB.

~~~
jakozaur
How about implementing MongoDB API on top of the PostgreSQL? Then sell it
companies which started with MongoDB, but run into some
scalability/reliability issues.

~~~
yangyang
Someone's already done a proof of concept of this, see
[https://github.com/JerrySievert/mongolike](https://github.com/JerrySievert/mongolike)

~~~
yangyang
This is the thing I was actually thinking of when I posted the above. This is
a background worker (runs within PostgreSQL) speaking the mongo wire protocol:
[https://github.com/umitanuki/mongres](https://github.com/umitanuki/mongres)

Again, very basic PoC.

