PostgreSQL Outperforms MongoDB in New Round of Tests (enterprisedb.com)
338 points by wslh on Sept 27, 2014 | 160 comments

I never really "got" the new wave of NoSQL databases. Mongo seemed to be the one I could most easily wrap my head around, but still.

I was never sure, though, if that meant I had never faced a problem suitable for one of these DBMSs or if my mind is just so warped by years of using relational engines (mostly Postgres, or SQLite for simple projects) that I could not think of modeling my data any other way.

Recently though, I had to get familiar with the database schema of the ERP system we use at work, plus some modifications that have been done to it over the years, and it kind of feels to me like somebody was trying to force a square peg through a round hole (i.e. trying to model data in relational terms, either not fully "getting" the relational model or using data that simply refuses to be modeled in that way).

I sometimes think the people who wrote the ERP system might have enjoyed a NoSQL DBMS. Then again, with a multi-user ERP system, you *really* want transactions (personally, I feel that ACID-compliant transactions are the single most useful benefit of RDBMS engines), and most NoSQL engines seem to kind of not have them.

I think people mix up three things:

(1) Transactional model. Many NoSQL databases are non-ACID, but others are (Google's stores all have some transactional guarantees). Some databases try to gain efficiency by relaxing their transactional guarantees, some probably just didn't get around to implementing a proper transactional system yet.

(2) Data model. The relational model can be overly restrictive, as you cannot easily represent contained, repeated elements in an object without ending up in a crazy join smorgasbord. Note that that doesn't mean you have to be dynamically typed.

(3) Distribution/sharding/clustering. RDBMSes are traditionally single-machine, and getting them to cluster is usually a huge source of pain. NoSQL databases are often built from the ground up for sharding.

I think people go for MongoDB mostly for (2) and ease of use. Very few people have an actual big data problem where you really need (3), and for reliability there are simpler solutions (hot standby). (1) makes it so much easier to build reliable systems that it'd be a real deal breaker for me.

I personally don't understand why so many people go for NoSQL; it seems to me that creates a substantial cost, both in performance and, more importantly, in missing transaction guarantees, with no real benefit, at least none that's obvious to me. MongoDB in particular, with its unapplied writes, no real transactions, and no real distribution story, seems like an odd choice.

> ... contained, repeated elements in an object

Pardon me if I'm just sniping on the word "object" here, but if you think of your data as objects then you will find the relational model restrictive.

In my experience, objects are an application concept, closely coupled to an implementation. If you can conceive of your data in implementation-independent terms, i.e. as entities and relationships, then you can put a RDBMS to effective use.

You don't need big data for (3), we use mongo sharding primarily for spreading database write load. Not saying that it's the best way to do that but it's what we use it for and I doubt we're the only ones.

I see where you are coming from, and my opinion is mostly the same. However, I can see a use case for relaxing constraints in a distributed scenario, as described by the CAP theorem.

In a distributed scenario, when a partition event occurs, relational databases opt for consistency, whereas nosql opts for availability. This is formally correct behaviour by relational dbs, but comes with a cost. The cost is a serial performance component, that can't be parallelized. Nosql DBs (most of them), in this scenario, go for availability, and may thus eschew some consistency guarantee tasks, with a cost in data consistency and an advantage in parallel performance.

The trick, as ever, is to use each tool in its function. Nosql and relational dbs are wholly different tools, for wholly different problem classes. Using nosql where consistency is paramount irks me to no end, and that is the case 80% of the time I see people using nosql. On the other hand, in specific cases, nosql DBs are a new useful tool in my arsenal.

It is for storing huge amounts of data with dynamic keys.

For example, when the site admin says "I want a new archive that I can fill with items; items will have Id (automatically), Name (string), IsMale (bool)". He also wants to do complex queries on this data as well. That's where NoSQL comes in to help.

And to answer why exactly MongoDb is so popular - it's because it has awesome driver support for every popular language.

I don't understand what's so hard to understand here. It's a simple solution to EAV/nulltable nightmare.

We've redone our product catalog for a website using a NoSQL solution (not MongoDB, but we did look at it). Our products are in multiple different categories, and have vastly different attributes depending on the category. NoSQL solutions are perfect for this. As you point out, it's a simple alternative to deploying an EAV model.

I've only seen EAV used in one system, Magento, but it was a disaster. It's complex and slow to the point that every product is stored both in the EAV model and as a "flattened product".

For systems dealing with sales and economics in general I would almost always pick an RDBMS; it seems a much more natural fit. The ability to do ad-hoc queries in SQL, rather than map-reduce, is a huge advantage.

Thanks for the explanation.

Quite frankly, no application I have ever worked on has had to deal with "huge" amounts of data by any common definition (a couple of gigabytes at the most).

And like I did say, looking at our ERP system's database I am beginning to understand the appeal of a database without a fixed schema. Some of the tables have dozens of columns, with most of the rows being full of NULL values. So I do get that part, but no application I have ever worked on was like that.

> Some of the tables have dozens of columns, with most of the rows being full of NULL values.

This is generally addressed in a relational design with a star schema. First create a dimension table:

    CREATE TABLE person (
      id BIGSERIAL PRIMARY KEY
    );
Then create fact tables:

    CREATE TABLE person_name (
      person_id BIGINT REFERENCES person(id) UNIQUE,
      name VARCHAR(128) NOT NULL
    );

    CREATE TABLE person_bank_details (
      person_id BIGINT REFERENCES person(id) UNIQUE,
      bank_detail ....
    );
This avoids large numbers of rows containing nulls, but it violates a normal form. The mnemonic is that the table must contain the Key, the whole key, and nothing but the key, so help me Codd. Anytime you have a "REFERENCES table(pk) UNIQUE", you violate the "whole key" bit.

The nice thing about using an RDBMS with JSON support, rather than a NoSQL solution, is that you can store all the fixed-schema stuff in column as usual, and benefit from the performance, consistency, ease of joins and so on with that, but you can also store your JSON documents alongside that data in the same table, efficiently indexed.
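A minimal sketch of what that hybrid layout can look like (the table and column names here are invented for illustration), using a jsonb column next to ordinary fixed-schema columns:

    CREATE TABLE product (
        id         BIGSERIAL PRIMARY KEY,
        sku        VARCHAR(32) NOT NULL UNIQUE,   -- fixed-schema columns as usual
        price      NUMERIC(10,2) NOT NULL,
        attributes JSONB                          -- free-form document alongside
    );
    -- a GIN index makes containment queries on the document efficient
    CREATE INDEX product_attributes_idx ON product USING GIN (attributes);
    -- e.g. find products whose document contains {"color": "red"}
    SELECT sku, price FROM product WHERE attributes @> '{"color": "red"}';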

Yes, but what happens if your JSON data size grows so large that it can't fit on a single machine? Multi-master replication or sharding is a terrible pain in any RDBMS (at least according to my research and trials).

At the end of 2013, Stack Overflow worked on one SQL server (plus a redis server for caching). The rest of Stack Exchange runs on another SQL server.[0]

For the most part, for most projects, worrying about multi-master replication is going to be pointless. You can always put some data in a distributed K/V (or document) store and point to that from your SQL if you need to.

[0] http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-s...

What I meant was a data-size that was too large for a machine. Adding arbitrarily large JSON to your table could expand the data-size to be too big for one machine, or even too big for block storage. Plus you might not want it all in block storage. My point is that an RDBMS can be better served storing the relational data alone with a separate DB Engine for the potentially massive JSON data.

How many use cases really exceed a single, multi-terabyte machine? Your argument is true for any data type and format not just JSON. If you have more than a few terabytes then you have "big data" and most solutions, including Mongo, probably won't work.

You're storing exactly the same data whatever you're storing it in. The point is that there's rather few use cases where your primary database is going to be more than a few terabytes, which is usually easy to handle with a single machine + some caching. I pointed to Stack Overflow as an example of a large site which still manages to keep its entire database on one machine.

Why does the format the data is stored in matter more than the data itself? JSON or columns in rows - it's not fundamentally different.

It has nothing to do with the format. I'm using JSON as an example of something that could be large. I'm only talking about separating the "could become massive" columns/data/whatever from the "small, relational data".

Sure, but how many people genuinely have data that big?

Well, one thing we store is HTML content and "MS Word"-like document data. We also store hundreds of revisions for all of those documents. I wouldn't want to use an RDBMS for this because (a) it's not relational, but also (b) backup/replication/load distribution would be too painful. A system like CouchDB can be spread out over n machines, any of them write-capable.

This is exactly what SharePoint does on top of boring, battle tested SQL Server.

> I never really "got" the new wave of NoSQL databases.

It's easy: overzealous DBAs who insist on normalization at all costs.

A new technology allows developers to try new approaches to the challenges, by sidestepping those DBAs.

So, a technological solution to a political problem.

Now I can understand where those people are coming from.

If I hadn't seen so many dev teams eschew constraints and triggers in favor of broken client code being allowed to screw up the data, I would agree to the political element easily. But in practice, the ignorance is so great, I can't even be sure there is a political choice being made.

Did I miss something? MongoDB was never ever faster than Postgres. That's nothing new. Most of these things are clear when one reads the MongoDB docs:

MongoDB stores metadata (nearly) uncompressed on a per-document basis, so of course it uses way more disk space. It doesn't store the data in any efficient way either.

Also, it's pretty much unoptimized compared to Postgres, which has been around for a really long time, so it's kinda slow.

Many functions in MongoDB are actually implemented in JavaScript, not C. So that's also a factor, even when I guess it's not the big one here.

MongoDB has a lot of limitations that can really bite you (document size, though GridFS works around that; how you can do indices; even limitations on what your queries can look like; etc.).

The only thing that's good about MongoDB is that it's nice for getting something up and running quickly, and that it's a charm to scale (in many different ways) compared to PostgreSQL. If PostgreSQL had something built in(!) coming at least close to that (and development has a strong focus there), it would be perfect.

For all these reasons many companies actually have hybrid systems, because sometimes one thing makes sense and sometimes the other.

The benchmark seems strange, because there are many SQL and NoSQL databases that are faster, and that's a fairly well-known fact. I think everyone who ever had to decide on a database system has known that, even without a benchmark.

This makes it kinda look like an advertisement (look at the company behind the blog).

I've been using PostgreSQL 9.3 with JSON for a while now and it's great. I also know it is possible to scale PostgreSQL, and it's really nice, though there is still a lot more complexity involved (again, depending on the use case).

Just use the right tool, and please let's stop with such shallow comparisons, because I think they harm the reputation of database engineers and system architects, and of the authors of such comparisons. When you look for real comparisons, example use cases, typical patterns, or just some help, you always stumble across these things, and they tend to go out of date quickly too, because all well-known databases have a lot of active development going on.

Is MongoDB's distribution and scaling story really nicer? A cluster story that's easy to set up but then doesn't actually work (loses data, fails in potentially catastrophic ways) sounds not all that useful.


MongoDB clustering is nothing short of disastrous.

If anything I hope databases like RethinkDB, and even multi-master PostgreSQL if we ever see it, learn from the most crucial mistake here: rolling your own consensus. (This goes for databases other than MongoDB; I don't mean to single it out here.)

Stick to proven algorithms: ZAB, Paxos, Raft.

MongoDB is easy to set up as a single instance, but it is definitely no easier to set up for horizontal scaling compared to alternatives. I'm still not aware of anything that's both simple to set up and maintain for horizontal scaling in practice (not theory).

Cassandra is, but you're forced into a very simplistic data (sharding) model. (And probably Riak is too, but haven't used it.)

Cassandra is easier to scale than Mongo, and yeah, Riak is even easier, but they're still not easy unless you don't care as much about consistency, you can wait, and your app is more about writes than reads.

It depends on your workload, but MongoDB has worked extremely well for us, we've never lost data. I love aphyr's posts on databases, but you can't use them as a blanket for everything.

Just because you haven't had a failure yet, it doesn't make the test any less true.

Three years? Numerous customers? That's good enough for me so far.

C'mon, you know that isn't fair reasoning; you could say the same about running backups. Just because someone hasn't seen the need for backups after three years and numerous customers doesn't mean that's a good enough reason to not keep backups.

I'm not saying you shouldn't use Mongo, but to say Aphyr's assessment of databases shouldn't be considered in all deployments isn't wise.

I'm not saying Aphyr's work isn't useful; it's about using the right tool for the job. For our needs and uses, MongoDB has been spectacular. I want to use Postgres, but its HA isn't something I'm willing to expose to my customers. So I'm actually using MongoDB in anger ... and it's doing spectacularly.

So let me leave off with our use case. We have few writes, little need for sharding (our largest customers run on a single node and keep the hot data in memory), we use acknowledged writes (MongoDB can be journaled, you know), and our customers are willing to have three+ nodes in an HA scenario. The HA is simple to configure. So far, in three years, we have had no data loss.

What more information can I provide to make up for the downvotes?

As an aside, downvotes to me should be used for those that contribute nothing to a conversation. If you disagree with what someone says, state your disagreement so that we all, including me, can benefit from your better experience.

Have you actually tested a net split situation, or a crash, or what it'd take to recover?

Yep, sure have. And you still need backups of your data, that's just good sense.

Nitramp, I'm sorry you disagree with my experience. But instead of downvoting, why don't you contribute to the conversation with your own experience? At least my experience is taken from the real world and not from something I read on the internet.

Ok, granted, my experience is an anecdote, but downvoting me without disputing what I'm saying? Pure cowardice. At least counter with your own anecdote, don't downvote. Or let me know why I'm being downvoted, simply disagreeing shouldn't result in a downvote, that just kills the conversation.

"Just use the right tool and please let's stop with such shallow comparisons"

A lot of people may have thought, prior to this benchmark, that MongoDB was the right tool for all high performance JSON-related tasks.

> it's a charm to scale (in many different kinds), compared to PostgreSQL. If PostgreSQL had something built in(!) ...

This! A million times! I agree 100% that PostgreSQL is better than MongoDB in every way except ease of replication (that's really my only need) for HA. We needed an embedded database in our product, and I wanted so badly to use Postgres, but we needed HA and we needed our customers to be able to set it up. This was so important to us that we took the otherwise inferior solution.

Now, that said, I bring up the replication/HA issue every time one of these "postgres is better than mongodb" articles comes up. The last time I posted a Postgres developer responded to me that they're laying the groundwork for a good answer and that it's coming. I can not WAIT for that day!

Streaming replication is really not hard to set up in PostgreSQL. It is cluster-level; if you want to replicate a single database you'll need to use a 3rd-party solution like Slony or Bucardo (logical replication built on features in 9.4 will improve this).
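For a rough idea of the moving parts, a minimal streaming-replication setup in the 9.3-era file layout looks something like this (hosts, user, and password are invented, and the standby must first be seeded with a base backup from the primary):

    # primary, postgresql.conf:
    wal_level = hot_standby
    max_wal_senders = 3

    # primary, pg_hba.conf - allow the standby to connect for replication:
    host replication replicator 192.168.1.2/32 md5

    # standby, recovery.conf:
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.1.1 user=replicator password=secret'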

2ndQuadrant have developed Bi-Directional Replication for 9.4 (http://2ndquadrant.com/en/resources/bdr/, https://wiki.postgresql.org/wiki/BDR_User_Guide) based on the aforementioned logical changeset support in 9.4.

I think your comment summarizes why this stuff is so hard on traditional SQL; one doesn't even know where exactly to start. In MongoDB, you just turn on replication according to http://docs.mongodb.org/manual/replication/, which basically means adding one directive to mongod.conf and typing in the IP addresses of the nodes. MongoDB handles everything else for you. I'm not claiming that it's better or more reliable, just that it's really easy to set up in 5 minutes.

I would suggest that if you've reached the point where you need streaming replication (rather than, say, a cron job uploading database dumps to s3), you've reached the point where you need to do your research.

You do have a point. PostgreSQL undoubtedly has many more knobs, and you do end up having to learn a fair bit to do stuff like replication.

I'd rather spend more time understanding things up-front and have a reliable solution rather than flick a switch and have something which initially works but I'm not too confident in. That's fine for initial development I suppose, but not in production.

It only takes an hour or so to google and read up on what the different ways of replicating data are in PostgreSQL, their advantages and disadvantages.

Are we really that scared of 'research'?

I get the feeling that a sizeable proportion of NoSQL users don't have much knowledge of relational databases at all, never mind replication. If you just want to persist some JSON data you already have, I suppose it is quite a large leap to start reading about relational algebra, ACID, undo/redo logs, SQL, and so on.

> Are we really that scared of 'research'?

You've missed the main point of why we use MongoDB. We use it embedded in our product and it's up to our customers to configure HA if they need it. Sure, we document the process for configuring it, but the simpler it is, the less likely a customer is going to have a problem with it.

I would agree with you if it were a database that we maintained in-house, yeah, it's certainly doable. But my main point is that this has to be done in the field at customers who are just trying to use our product.

You are absolutely right, it is as simple as adding the replSet name to your configuration, then rs.initiate(), and rs.add() for each other host that will participate in the replica set. Exactly the point.
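Spelled out, with invented host names, the whole procedure is roughly:

    # mongod.conf on every node:
    replSet = rs0

    # then, in the mongo shell on one node:
    rs.initiate()
    rs.add("db2.example.com:27017")
    rs.add("db3.example.com:27017")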

It depends on what you're measuring. I work on a project that gained significant speed improvements moving to MongoDB, simply due to packing and unpacking the data. MongoDB's BSON serialization was way faster than the ORM when building up the data objects after fetching from the database. In our case that operation, not I/O, actually dominated the workload.

This article is interesting because Postgres document store options may make it competitive to us when compared to MongoDB.

The entire narrative behind MongoDB in the first chapters of its evangelism was performance, partly due to its document-oriented approach (things are quicker in some scenarios when you get rid of relations, and it was contrasted against the many-tables approach), which is something you can only very recently do in pgsql, and partly due to implementation choices like the dangerous default lack of fsync.

Early evangelism for MongoDB overwhelmingly hyped performance over all else. It was, somewhat infamously now, webscale.

So now pgsql (since 9.2, but vastly improved in 9.3) can also do the things that MongoDB does, better, if you want to do the document approach (which is a serious debate unto itself). That is news and is interesting.

As for scaling out, I would argue that 9.3 offers more realistic, robust options than MongoDB does.

I wonder if Node.js will follow the same fate...get a nice dose of reality check.

It is mistakenly evangelized as also being faster, leaner, more scalable, and more concurrent than anything out there. Some clients would pay extra to redo CRUD using Node.js because it's "Asynchronous and We Wanna Pay For Speed and Scalability".

My understanding is that people pick node.js because of simplicity and not for speed.

Wow. That's interesting; I really did not find asynchronous programming easy, with basic things like hooking up to a database requiring you to go deeper into a chain of callbacks, forcing you to rely on duct-tape solutions like futures or whatever they use now to work around the limits of JavaScript, or having to write stuff in CoffeeScript.

But some devs swear by it, I just fail to see what the advantages are.

ASP & PHP servers don't have exceptional performance but they're 98% of the web

I'll pose the inappropriate question: unless you are going to do it right (RethinkDB), why write a brand new database at all (MongoDB)?

money, of course. position yourself well and investors come a-callin. buy yourself a lambo.

This benchmark misses the entire point of MongoDB: that you can atomically update individual fields in the document.

That has not been possible with the Postgres json storage type. Instead, the entire JSON blob must be read out, modified, and inserted back in.

This reality is well known to those who understand Postgres, which is why they have HStore. HStore is limited though (particularly in the size of the store), so there is work underway to make it more competitive with MongoDB.

So now they are also releasing a jsonb (b for binary) storage format, which looks promising, but I can't find any information on exactly what its features are. I would love to actually see a benchmark comparing field updates, but this benchmark is not it.

MongoDB is a database with trade-offs, downsides, and more crappy edge cases than MySQL, but it exists because at its core it allows data modeling that traditional SQL databases are lacking.

MongoDB has first-class arrays rather than forcing you to do joins. It supports schema-less data, which is rarely needed, but very useful when you do need it. It can do inserts and count increments very quickly (yes, the write lock means you eventually have to put collections in separate databases), which is also useful for certain use cases.

Postgres has first-class arrays too:


Can you atomically increase one element of that array by 2 without a big chunk of code?

Yes, see the array operations section of the docs.
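For the common case, a single element can in fact be bumped atomically with an ordinary UPDATE (table and column names invented; note Postgres arrays are 1-based):

    CREATE TABLE counters (id INT PRIMARY KEY, hits INT[]);
    INSERT INTO counters VALUES (1, ARRAY[0, 0, 0]);
    -- increase the second element by 2; atomic like any single UPDATE statement
    UPDATE counters SET hits[2] = hits[2] + 2 WHERE id = 1;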

Note that on the Github repo for this benchmark, the README mentions that:

"... later versions will include a complete range of workloads (including deleting, updating, appending, and complex select operations) and they will also evaluate multi-server configurations."

Any update in PostgreSQL will result in a new tuple being inserted, whether it contains json, hstore or anything else; that's the basis of multi-version concurrency control. It'll be the same deal with jsonb.

Not only that, the delta to update the page to contain the new tuple, and a copy of the full page the tuple is being written to (if it's the first change to that page in that checkpoint cycle) are written to the write-ahead log.

This is an excellent post. You've answered a question I've had for a while - are there any circumstances where MongoDB is the right tool for the job? You seem to be on point that Mongo is currently the best for frequent updates of JSON blobs.

Thank you.

I'll take issue with a couple of points, though. Postgres has arrays:


Rapid counting of increments can be done fairly easily with triggers. You can even write the trigger function in javascript if you like.
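A sketch of such a counter trigger, here in plain PL/pgSQL rather than javascript (all table and function names are invented for illustration):

    CREATE TABLE comment (id BIGSERIAL PRIMARY KEY, post_id BIGINT);
    CREATE TABLE post_stats (post_id BIGINT PRIMARY KEY,
                             comment_count INT NOT NULL DEFAULT 0);

    CREATE FUNCTION bump_comment_count() RETURNS trigger AS $$
    BEGIN
      -- keep a running count per post as rows are inserted
      UPDATE post_stats SET comment_count = comment_count + 1
        WHERE post_id = NEW.post_id;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER comment_insert AFTER INSERT ON comment
      FOR EACH ROW EXECUTE PROCEDURE bump_comment_count();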

It does have arrays, but they aren't first class to the same extent as MongoDB.

* A Postgres Array can end up being stored elsewhere, whereas in MongoDB an array will be contained within the document

* I am also not clear on exactly what can be stuck inside an array (in MongoDB it can be an object that contains more arrays) while maintaining first-class access and updates.

I would love to find more detailed information on these points, but with jsonb they may be moot now.

I will look into the increment issue in PostgreSQL more carefully. I just know that it wasn't feasible in MySQL when I was using it.

"stored elsewhere" doesn't make sense to me. Arrays are part of the tuple as much as any other attribute type.

You can store anything in an array that you can store as an attribute of a row. There may be some edge cases I don't know about, but generally anything that can be a column can be an array element.

The improvements to hstore (primarily to make it hierarchical and add array support) ended up becoming jsonb.

That is really exciting! I am going to take a serious look at jsonb and using Postgres now, although FoundationDB has caught my eye with its table group functionality: https://foundationdb.com/layers/sql/documentation/Concepts/t...

Table groups look rather like hierarchical databases [1], which were the norm before relational databases came along.

Do bear in mind that you'll never get MongoDB-style in-place updates in PostgreSQL, due to MVCC. You may save a round-trip with the entire JSON object once they implement update operators, though.

[1] http://en.wikipedia.org/wiki/Hierarchical_database_model

Could you confirm that you can atomically update a single item inside a jsonb field without reading/writing all of the jsonb data? I know this is possible with hstore (but not 9.3's json type), but can not find a clear answer for jsonb in the documentation. Any pertinent links would be much appreciated!

You can't. Others in this conversation have pointed out this is targeted for 9.5, though it may be available as an extension before that.
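For what it's worth, the partial-update function that eventually shipped in 9.5 is jsonb_set, which targets one path inside the document (table and column names invented); because of MVCC the whole row is still rewritten on disk:

    -- set doc->'propertyA' to 42, leaving the rest of the document untouched
    UPDATE docs SET doc = jsonb_set(doc, '{propertyA}', '42'::jsonb);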

I'm pretty sure you can, using a stored procedure written with PLV8. Though that's not exactly a fluent and elegant natural solution.

You're only avoiding putting the whole thing on the wire twice.

This is very interesting. Around when MongoDB was quickly becoming the cool thing to do I'd ask people about why it is better than just storing things in Postgres. People would have answers that would be grammatically correct but would not make any sense.

That being said, I find it weird that now it is cool to make fun of MongoDB. Some people on this thread have even said they want to know if a service is using MongoDB and they'd not use that service. I am pretty sure they'd be all over Stripe (who store your money related stuff in MongoDB) in a different thread.

> Stripe (who store your money related stuff in MongoDB)

That's a bit scary. There have been several successful attacks against virtual currency exchanges that use MongoDB, exploiting the eventual consistency to the attacker's advantage.

If you handle money, you don't want any inconsistencies in your database, no matter how temporary. You can work around these of course but you really need to know what you're doing.

> against virtual currency exchanges that use MongoDB, utilizing the eventual consistency

against depending solely on MongoDB's eventual consistency alone.

yeah, mongodb has flipped from hype to hatred without much in between.

The reason, I think, is that few people using it had a good sense of where the pain points would be. Many of us somehow imagined that because we didn't know where mongodb's limitations were, we wouldn't run into them. Cue frustration when you're bitten by some issue you didn't think about.

I do believe there are good uses for it. But they are pretty specialised, and I'd rather use a relational db for most things.

> yeah, mongodb has flipped from hype to hatred without much in between.

The hatred was always present, it's just that the hype has died down as the early adopters have (re)discovered the sharp corners in both architecture and implementation.

A decade ago it was the same story with MySQL.

> People would have answers that would be grammatically correct but would not make any sense.

You mean like this?



The only problem I have with JSON on Postgres is you can't update a property of a JSON object like so:

    update table set jsonCol->propertyA = 42;
You need to write an extension for that. It's easiest to do so using Python, but sadly Heroku doesn't support Python on Postgres since it's unsafe.

Just tried to google this, but couldn't find anything. Why is Python on Postgres considered unsafe?

Just because it's a programming language. It can write to disk, open sockets, etc.

"As of PostgreSQL 7.4, PL/Python is only available as an "untrusted" language, meaning it does not offer any way of restricting what users can do in it. It has therefore been renamed to plpythonu. The trusted variant plpython might become available again in future, if a new secure execution mechanism is developed in Python. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such as plpythonu."


Incidentally, I've been merging my JSON in Perl. It's automatically built into Postgres, and the Hash::Merge library with its configurable behaviour is very handy. Still, looking forward to not doing that. 9.4 is coming before Xmas, right?

That's not strictly true; PL/Perl is (obviously) a programming language and is available in trusted (with certain operations disabled) and untrusted variants: see [1].

The issue with PL/Python is that it's nigh on impossible to properly sandbox Python.

[1] http://www.postgresql.org/docs/9.3/static/plperl-trusted.htm...

Please correct me if I'm wrong, but isn't updating JSON properties one of the big improvements of the soon-to-be-released 9.4 version?

Yes this is true, and much needed.

Huh. That's a little silly given the venerable HSTORE already has this capability:

      create table foo (d hstore);
      insert into foo (d) values (hstore(ARRAY['key', 'key2'],
                                         ARRAY['value', 'value2']));
      -- "key"=>"value", "key2"=>"value2"
      update foo set d = d || hstore('key', UPPER(d->'key'));
      -- "key"=>"VALUE", "key2"=>"value2"
Seems like JSON / JSONB need operators to merge two objects. In fact isn't JSONB sharing the data format with the new HSTORE2?

You can use PLV8 to do that in javascript on heroku.

You're right [0]. Thanks for the heads up.

[0] https://blog.heroku.com/archives/2013/6/5/javascript_in_your...

MongoDB works well ONLY if the indexes (and working set) fit in memory. What are the index sizes in the benchmark? (I doubt they fit, as I see you are running a 145GB database on a 32GB instance.)


once you start page faulting and hitting spinning disk it's game over for any database's performance, postgres included.

There are plenty of databases that don't fit into RAM completely and are perfectly usable. It depends on what you can fit in the buffer cache / OS page cache.

The whole point of btree indexes is that they're efficient to query from disk.

... but that usually doesn't mean you have to have your full data set in memory, only the hot parts. And even if so, one or two ~10 ms seeks during a query aren't that terrible.
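The "efficient to query from disk" point comes down to b-tree fanout: because each page holds many entries, tree height grows logarithmically, so even a huge index needs only a handful of page reads per lookup. A back-of-envelope sketch (the fanout of 400 entries per 8 KB page is an assumed, illustrative number; real values depend on key size and page layout):

```python
import math

# Back-of-envelope sketch: why b-tree lookups need so few disk seeks.
# Assumed fanout: ~400 index entries per 8 KB page (illustrative only).
fanout = 400

def btree_height(n_rows, fanout):
    """Number of b-tree levels needed to index n_rows at the given fanout."""
    return max(1, math.ceil(math.log(n_rows, fanout)))

for rows in (10_000, 1_000_000, 100_000_000):
    print(f"{rows:>11} rows -> {btree_height(rows, fanout)} levels")
```

With the root and upper levels almost always cached, that leaves one or two actual seeks per lookup even for very large tables, which is the scenario described above.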

mongodb recommends that your working set lives in memory, not your entire database - it's generally much cheaper to add more RAM than code in any case.

a few seeks aren't terrible for small/medium applications, but when you're asking for thousands of queries a second any disk access is bad news.

That sounds like bollocks.

I have a script that grows a big Perl hash. It fills up memory and starts paging. The application grinds to a halt (almost). I tie the hash to a Berkeley DB file, so it writes to disk. Performance is about a third of the in-memory hash, BUT it doesn't slow down when it's too big for memory.
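The same trick translates to other languages: swap the in-memory hash for a dict-like object backed by an on-disk key/value store. A small sketch in Python, using the stdlib `dbm` module as a stand-in for Berkeley DB:

```python
import dbm
import os
import tempfile

# Sketch: a disk-backed "hash", analogous to tying a Perl hash to a
# Berkeley DB file. dbm stands in for Berkeley DB here.
path = os.path.join(tempfile.mkdtemp(), "cache")

with dbm.open(path, "c") as disk_hash:
    # Writes go to disk instead of growing process memory.
    for i in range(1000):
        disk_hash[f"key{i}".encode()] = f"value{i}".encode()
    print(disk_hash[b"key42"])  # b'value42'
```

Lookups pay a (cached) disk access instead of a pure memory read, which matches the "about a third the speed, but no paging cliff" observation above.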

I have been using JSON types in postgres 9.3 for several weeks now. It's nice. It just works. I love being able to mix relational logic on static columns with undefined documents on dynamic columns, all in the same tables.

I'm sort of struggling to imagine anyone really using MongoDB at all in a couple of years.

But then again, plenty of shops still use MySQL (and one of my clients uses DB2...).

Changing databases is extremely painful if there's a fair amount of data already stored there. It doesn't matter how "database agnostic" the code was originally, there are too many edge cases that tend to become front cases over time that you have to adjust for your specific database.

But just like with MySQL, I imagine those who can migrate to another DB would do so at the first opportunity or wait for the equivalent of MariaDB to make a "drop-in replacement" in lieu of messing with their application code.

What's wrong with DB2? I know it's commercial and a bit arcane in areas, but it's a far more powerful database than MySQL.

It depends. DB2 comes in several editions. The DB2 that IBM mostly sells nowadays is ok but there's also DB2/400, which -as the name suggests - comes with all sorts of quirks and funny mainframey limitations.

Big banks still use it.

Exactly! I can't imagine anyone using MongoDB at all in a couple of years either. Everything will be in Node!

But then again, I am going to say the opposite of what I just said to hedge my bet and point out plenty of shops still use MySQL. No serious company like YouTube, Facebook, or until recently Google Ads would use it.

I love sitting in my armchair and passing technical judgements without providing any technical details!

The approach to comments link doesn't indicate I should avoid sarcasm. But if that's how it is going to be then maybe one of the moderators can add that there?

Link in question: https://news.ycombinator.com/newswelcome.html.

Nice benefit, bro (high five)!!

> and one of my clients uses DB2


How does it compare to TokuMX?

I am a huge fan of PostgreSQL, but in order to avoid having to rewrite an application designed for Mongo, I tried TokuMX and was pleasantly surprised by its performance.

Going off [1], it looks like TokuMX is faster than Mongo by a similar multiple to what Postgre is, so I'd guess Postgre and Toku would be fairly similar speed-wise. Obviously, depending on your data and access patterns, one may still be significantly faster than the other, as the usual caveat goes.

[1] http://www.databasejournal.com/sqletc/tokumx-compared-to-mon...

Nitpick: PostgreSQL, or Postgres for short. Never "Postgre".

PostgreSQL indexes are b-trees, and thus will suffer the same performance drop on insert/update/delete as all other b-tree based databases (InnoDB for example). I've been meaning to run the reported benchmark posted by EnterpriseDB myself, but haven't yet had the chance. My other concern with that benchmark is that it really doesn't cover much in the way of an interesting workload.

Since when is MongoDB considered a database comparable with PostgreSQL?

It's a comparison on MongoDB's home turf, showing how well PostgreSQL works when used as a NoSQL document store.

That's not really relevant, since the point is that Postgres can do what MongoDB does but faster.

... and far more safely. No one should be using MongoDB in production (or if you do, please never let me store data in your service).

Except it doesn't. It just has a JSON data type. As do many other databases, e.g. Oracle and Teradata.

People are switching to MongoDB because the developer and deployment experience is so good.

Hate to say so, but many are switching to MongoDB because it's the current new kid on the block.

I found that many companies have incredibly bad, non-use-case-driven reasons for picking their DB. Not that MongoDB is a bad database, but it has its fair share of issues. Granted, this is the case for every database, but if you pick any database without being aware of those issues, you end up in a world of pain.

(Context: I consult for backend systems in general)

That was my only experience with MongoDB - "let's try this, for this experimental project, to see what it's like", without taking into account its characteristics.

After that the project grew, I was brought in and needed to do some reports, which were a huge pain in the ass without proper SQL and joins.

I don't intend to have anything to do with people using MongoDB unless they have a really, really good reason for doing so, and it's used alongside other databases.

A friend of mine came up with the perfect use-case for MongoDB.

His company installs and monitors sensors in civic infrastructure: roads, train tracks, bridges, ... These sensors provide constant data streams which need to be stored somewhere before processing. Any raw data older than 2 weeks is worthless, and if 2-3% of data is lost before written it's not much of a problem. More data is coming in all the time any way.

For them MongoDB is the perfect transient cache. Sensor data is processed and the results stored elsewhere. Easy to expand. Flipping between installations is just a matter of toggling the load balancer, and once an offline cache has been processed, it can be nuked and put back as a fresh system.

So, for a setup where easy size expansion, fast mostly-reliable writes and ease of use are the primary design constraints, Mongo fits in surprisingly well. It's a fascinating use-case.

From what that sounds like, I agree. This is a concrete problem description with pointers to where the database actually provides solutions.

> I found that many companies have incredibly bad, not use-case driven reasons to pick their DB.

As in "Our lead dev always wanted to try X, because he likes the name".

Given that, I really need to build an app using project-he-who-may-not-be-named.

good luck sharding postgres - idiots

I maintain an environment that has both PG and MongoDB. While PG was simple to install and to debug, Mongo has given me nothing but pain. I agree that MongoDB's developer experience is nice - download, run, develop - but from the OPS side, it looks totally different. PG is a bit the other way round.

The sad state of affairs however is that often developers choose which DB they'd like to work with and (Dev)OPS doesn't get a say in that.

Capability is not the same thing as usability. Postgres is demonstrably more capable than Mongo; it's just not perceived as usable.

There's a deep lesson here to be learned. One that can make you rich.

> People are switching to MongoDB because the developer and deployment experience is so good.

WAT? Can you tell us more about your experience? According to mine, developers are moving away "en masse" from MongoDB, because the experience was so bad.

> because the developer and deployment experience is so good

That was ironic, right?


Strangely, there is no JOIN example...

Well, you won't find one. That's a property of the way the system works. It can be either good or bad, but you won't find an example of how to store and query whole documents with a map/reduce implementation in an SQL database, either. Both can be either a good fit or a bad fit for your needs - I've seen a project that leveraged CouchDB to basically return all the data for any given view with a single query from a pre-calculated view, something which would have been impossible with an SQL database. That's a good fit. Using a document store as a drop-in replacement for an SQL database is most of the time a very, very bad fit.

The general issue that I often see is that people don't choose their databases by "is this a good fit or a bad fit for our use case" but rather by "that looks cool" or "we need that to attract talent" or "we did that project with X, so X is also fine for that new, totally different thing". And then they also try and stick with it at all costs, instead of acknowledging that the choice was a bad fit. And then they call you and say "we have an X database in flames, can you come in and rescue our data? Best would be yesterday?"

This is not restricted to document databases. I've seen apps run on MySQL on a cluster with 10 Sun XFires where, after a man-month of index optimization, 3 would have been sufficient. That was like shooting dead fish in a dry barrel. A lot of developers don't want to bother with stuff like "how does that complicated piece of machinery actually behave if I push it?"
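For readers unfamiliar with the CouchDB model mentioned above: views are precomputed by a map step that emits (key, value) pairs per document and a reduce step that folds each group. A minimal sketch of that idea (the document shapes and field names are made up for illustration):

```python
from collections import defaultdict

# Sketch of a CouchDB-style precomputed view: map emits (key, value)
# pairs per document, reduce folds each group. Document shapes are
# hypothetical.
docs = [
    {"type": "order", "customer": "ann", "total": 30},
    {"type": "order", "customer": "bob", "total": 10},
    {"type": "order", "customer": "ann", "total": 5},
]

def map_fn(doc):
    """Map step: emit (customer, total) for each order document."""
    if doc["type"] == "order":
        yield doc["customer"], doc["total"]

def build_view(docs):
    """Group mapped values by key, then reduce (sum) each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

print(build_view(docs))  # {'ann': 35, 'bob': 10}
```

CouchDB maintains such views incrementally on disk, which is why serving them back is a single cheap query.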

> I've seen a project that leveraged CouchDB to basically return all the data for any given view with a single query from a pre-calculated view, something which would have been impossible with an SQL database.

This sounds interesting. I've been looking for a good example where a NoSQL DB is better than a relational DB. Could you provide more details?

Postgresql has materialized views now:


Unless I'm misunderstanding what Xylakant means, that's certainly doable in Postgresql and has been in other relational databases for a long time.

I didn't mean "view" in the sense of "database view" but rather in the sense of "page view". All data displayed on a page with multiple types of documents was returned in a single query. That's certainly doable in pg, but requires a lot of hacks and it's not efficient. Or you can use hstore and basically use pg as a nosql db.

Ah, that makes sense, though couldn't you solve this by something like creating a table called 'pages' and storing the materialized JSON (the same that mongo would generate) in the table?

Sure, you could. But CouchDB just makes that very easy to do while it is very hard and hacky in PG. On the flip side, the use case did not require the guarantees that PG has to offer, so why go through the pain for no gain :)
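The "pages table" workaround mentioned above can be sketched in a few lines, using sqlite3 as a stand-in relational store (the `pages` table and its columns are hypothetical names for illustration):

```python
import json
import sqlite3

# Sketch of the "materialize the whole page" workaround in a relational
# store (sqlite3 standing in for Postgres; schema names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("create table pages (slug text primary key, doc text)")

# Precompute everything a page view needs into one JSON document...
page_doc = {"title": "Dashboard", "widgets": [{"id": 1}, {"id": 2}]}
conn.execute("insert into pages values (?, ?)",
             ("dashboard", json.dumps(page_doc)))

# ...so rendering becomes a single primary-key lookup, no joins.
(raw,) = conn.execute(
    "select doc from pages where slug = ?", ("dashboard",)).fetchone()
print(json.loads(raw)["title"])  # Dashboard
```

The pain point is keeping the materialized document in sync with the underlying data on every write, which is the bookkeeping CouchDB's incremental views do for you.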

I refuse to believe you speak from experience.

Since the NoSQL movement started comparing its database options with SQL-based ones.

Come on. To call itself a database the product should have state-of-the-art storage technology implemented, not just a simple API which always returns OK before actual data has been committed to disk.

In the old times you could suddenly switch off the power of a running Informix server, and after reboot it would correctly restore its state to that very moment, dropping all partial (unfinished) transactions and keeping all the committed ones, so it was possible to have a clean state of the system.

How does it work? Append-only so-called "physical logs" on direct-access storage, unbuffered by the OS or the hard drives, proper partitioning onto separate physical drives (for real parallelism), etc.

Again, real databases are all about data storage (architecture, data structures, algorithms, design decisions, proper implementation), not some "user-friendly" APIs and "well-written" docs to quickly gain popularity among the ignorant.

Database back-ends are among the hardest problems in programming, and the amount of research that was done in this field in the 80s and 90s is quite remarkable. I doubt that a bunch of punks with professional marketing and sales techniques could adequately replace the implementation of a high-performance, fail-safe storage back-end, which is what a database is.

You are right on all counts.

As someone who evaluated and eventually recommended against implementing MongoDB for a new product, I entirely agree with you. The Mongo implementation was shot after about three months of wasted time after failure in pre-production testing. It was rolled back at great cost to SQL Server with a light document-style abstraction as Mongo failed miserably on precisely what you said: reliability, consistency, scalability and storage management.

Proper relational databases (PostgreSQL, Oracle, SQL Server in my scope of experience) can take an absolute incredible amount of punishment without breaking any guarantees. The relational bit above is optional but it is at least something you can choose to use or not.

I never got the point of the NoSQL movement.

For me it always sounded like a bunch of people not willing to learn how to write proper queries and optimize data storage.

So CouchDB qualifies? Given proper hardware underneath, it will at most lose the last, unfinished write.

Since the phrase "big data" appeared.

Any article coming from a company that describes itself as "The Postgres Database Company" and sells "a relational database management system based on PostgreSQL..." [1][2] stating that "PostgreSQL outperforms X" is de facto suspicious.

[1]: https://www.google.fr/?q=enterprisedb#q=enterprisedb

[2]: http://en.wikipedia.org/wiki/EnterpriseDB

They employ several of the main developers of community PostgreSQL and also sell commercial versions of the database.

Why does that make this article suspicious? Obviously they stand to benefit from more people using PostgreSQL, but they've been exceedingly clear with their methodology - their benchmark is available on Github: https://github.com/EnterpriseDB/pg_nosql_benchmark.

Why does the graph read 2.4 when they say they're testing 2.6?

These are the slides from an interesting presentation by one of the people that did a lot of the work for the new jsonb type in 9.4, and associated index improvements: http://www.sraoss.co.jp/event_seminar/2014/20140911_pg94_sch...

Also contains some comparisons with MongoDB.

Url changed from http://developers-beta.slashdot.org/story/14/09/26/1330228/p..., which points to this.

bored, bored, bored....good luck trying to shard postgres! idiots

I'll keep the data types small, so it takes a fraction of the disk space and doesn't need sharding.

It would be great if PostgreSQL ended up as a storage engine option in MongoDB, like InnoDB in MySQL. Then you'd have the better engine of PostgreSQL with the superior clustering, sharding and end-user experience of MongoDB.

That said these articles are pretty pointless. Performance isn't the reason companies are switching to MongoDB.

How about implementing the MongoDB API on top of PostgreSQL? Then sell it to companies that started with MongoDB but ran into scalability/reliability issues.

Someone's already done a proof of concept of this, see https://github.com/JerrySievert/mongolike

This is the thing I was actually thinking of when I posted the above. This is a background worker (runs within PostgreSQL) speaking the mongo wire protocol: https://github.com/umitanuki/mongres

Again, very basic PoC.

As sad as it may be, with the use of schema-enforcing ORMs and the popularity of MongoDB, this will be a viable business in the near future.

I wonder how far one could go with this. I'm currently playing around with Meteor, which requires MongoDB. I wonder if a MongoDB API on top of PostgreSQL would be a path to start bringing relational DB capabilities to Meteor? But perhaps Meteor's use of Mongo is too deeply intertwined and goes deeper than an application-level API.

Meteor contains an implementation of MongoDB written in JS to support transparent data access in either client or server.

Pretty sure it's easier to add PostgreSQL as a storage engine:


Than to fix up the sharding and clustering in PostgreSQL. It's been an issue for quite a few years now and still not mainlined. I think most PostgreSQL users are simply scaling them vertically.
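When the database itself doesn't shard, the usual fallback is application-level sharding: hash the key to pick one of N Postgres instances. A minimal sketch of that routing (the shard names below are hypothetical placeholders for real connection strings):

```python
import hashlib

# Sketch of application-level sharding: hash a key to pick one of N
# Postgres instances. Shard names are hypothetical placeholders.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]

def shard_for(key: str) -> str:
    """Stable key -> shard mapping (md5 keeps it stable across runs,
    unlike Python's randomized built-in hash())."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user:42") == shard_for("user:42"))  # True
```

The hard parts this sketch glosses over - rebalancing when N changes, cross-shard queries and transactions - are exactly why people call sharding Postgres painful.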

Got a friend recently who used MongoDB as his first choice for an easy DB. It drove him crazy; he dropped it after a month, recoded the thing for Postgres in 2 weeks, and it's been working ever since... and he's now another die-hard Postgres fan.

Sooo I don't know.

Are you confusing MySQL and MongoDB? InnoDB is a MySQL storage engine.

The idea is that postgres would be a storage engine for mongodb, like innodb is a storage engine for mysql

I don't think MongoDB would have much left to do, if PostgreSQL was doing the storage duties. The converse - implementing the MongoDB protocol within PostgreSQL, as mentioned above (mongolike), sounds like a more sensible proposal.

threeseed probably wants to use Postgres as a robust transactional backend for MongoDB much like InnoDB is used as a robust transactional engine for MySQL. (There is TokuMX that might fit that use case also).

