Ask HN: Have you ever chosen Postgres over Mongo and regretted it?
233 points by chatmasta 5 months ago | 149 comments
The past 2 years seem to have been the death knell for Mongo and the "NoSQL" hype, with Mongo moving toward SQL parity and SQL databases supporting JSON storage, like Postgres with JSONB.

At this point it seems that there are very few workloads for which Mongo is the best option compared to Postgres+JSONB. I'm wondering...

1) Have you ever chosen Postgres over Mongo, and regretted it? Why? Did you end up switching to Mongo? Was it better?

2) Can anyone name a use case where Mongo is clearly a better choice than Postgres+JSONB?




My example is a bit dated, but may be illustrative: I chose PostgreSQL over MongoDB for a MEAN webapp (making it PEAN?) project started in 2013 and regretted it. I was constantly struggling with the poor support for SQL in Node at the time and spent way too much time fixing bugs or adding workarounds for deficiencies in the SQL abstraction libraries and ORMs, or writing bespoke SQL and object mapping. After switching to MongoDB, I spent significantly less time on storage issues and development speed increased noticeably.

Later we did run into several issues caused by MongoDB which could have been avoided by using SQL (mostly related to consistency, constraint enforcement, and problems with MongoDB's aggregation capabilities). MongoDB got us to market faster, but had higher costs in the long-term.

I think there are two lessons I'd take away: 1) There are costs to choosing a less popular technology combination, even if the individual technologies might be better (for your particular needs) in isolation. 2) Sometimes it's better to choose the technology that's more costly in the long run in order to avoid costs early on (i.e. incurring some technical debt can be the right choice). MongoDB vs SQL was, for us, an example of both of these.

Note: I assume SQL libraries in Node have matured significantly since 2013, so I wouldn't necessarily recommend MongoDB over SQL for new projects starting today.


IMHO, this is more of an indictment of the ORM (sounds like sequelize) than anything else.


Partly it is. I had issues with both sequelize and any-db. However, since not using them resulted in a lot of bespoke SQL and object mapping, I found MongoDB preferable to SQL without these libraries as well.


So the whole problem could have been avoided if you had chosen a more mature programming language at the time? Perl, Python and PHP all had perfectly good SQL libraries back then, as did a lot of other languages that had been around a long time.


That's a fair criticism. There are a lot of tradeoffs in language choice. For this project the choice was out of my hands, but in retrospect I don't think Node was necessarily a bad choice. Node worked well for a lot of things we did. Was the cost spent struggling with SQL outweighed by the other benefits? It's hard to say for sure.


I don't really get what the problem was with writing your own queries, though. If the ORM layer sucked, you didn't really need it; plain queries and object mappers (if you needed some kind of typed objects) are powerful enough without making you learn the ORM's quirks and gotchas.


Sure, but I found that I was wasting a lot of time writing dead-simple SQL queries and object mapping that could have been better spent writing application code (and was after switching to MongoDB).


So instead of writing dead simple SQL queries (that mostly you could copy-paste if you don't want the complexity of dynamically generating them) it was easier and simpler to switch the whole data layer?


Why wouldn't you auto-generate the simple SQL stuff? It's a no-brainer really. It's so easy to dynamically create simple selects, updates, etc.


I played around with that a little, but it seemed like once you start trying to auto-generate SQL from objects you are basically writing a simple ORM. So you either spend time writing queries, spend time building an ORM, spend time fixing an existing ORM, or some combination of those.


Sure, but it's about finding that sweet spot. I dislike large ORMs. The majority of queries are simple and trivial to auto-generate. The more entities you add, the more value you get from it.


I'm really confused why you would put any-db in the same category as Sequelize... it's not anything like an ORM and never wanted to be. It's the wrong tool for any job that isn't "writing a DB-agnostic SQL generator"


Isn't all the code bespoke? Or, less snarkily, what's wrong with writing SQL when your ORM starts leaking abstractions?


> fixing bugs or adding workarounds for deficiencies in the SQL abstraction libraries and ORMs

TBH I had the same experience with all the ORMs I have used: sqlpp (C++), ODB (C++), Diesel (Rust).


ORMs are one of those kinds of libs where very few are great.

Python has lots of them, and only SQLAlchemy can be considered of high quality.

Not that Django's ORM, peewee or eloquent don't get the job done.

But they only cover 80% of the use cases. SQLA gets you to 95%, and you bridge the remaining five percent with raw SQL injected in the proper places, for those rare projects where the extra is important.

The cost of it is that it's a harder to use ORM, and a more verbose one. Which is why I generally use https://github.com/absent1706/sqlalchemy-mixins to make things easier.

What's more, ORMs for languages like C++ or Rust have it extra hard, as they are used from low-level tech while SQL is quite high level and easy to manipulate. It's hence attractive to use SQL directly.

If you use PHP, Ruby or Python, SQL doesn't seem that big of a gain anymore for the simple operations.

Now diesel is quite young, and the rust ecosystem has proven to be moving not only fast, but generally in a sane direction, so I wouldn't count it out just yet.

However, I've yet to find an even acceptable JS ORM. Not only do they fail to provide any decent operation beyond simple ones, but they have such poor introspection capabilities that they can't be used to create an ecosystem around them.

Which is, after all, __the main point of an ORM__. Avoiding writing SQL is definitely NOT the most important feature of an ORM. The critical one is having a common API you can build upon to create form generators, auth systems, data validation units, serializers, automatic UIs, etc.

Without this, the case for using an ORM is way weaker.


This article could be titled ORM considered harmful (2006)

http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...


There's a lot of hatred for ORMs that floats around, but most of the ire seems to revolve around the cases where the ORM's flexibility doesn't allow it to expose the power of the underlying database properly. People often fail to note how few and far between these examples are, and how the vast majority of an app's queries tend to be dumb and boring and are handled extremely competently by the ORM. Contrast that with the number of dumb typos and SQL mistakes the ORM's reduced verbosity has saved you from. Don't get distracted by the one or two cases per app.


In 2013 I used knex and had no problems whatsoever interfacing with Postgres from Node.


That's great to hear! But given that the first release of knex wasn't until May of 2013, it wouldn't have been an option for me.


Knex is great. I use it together with Objection.

This pairing reminds me of the two sides of SQLAlchemy, in a very good way :)


In 2018 I feel the same (we are using Sequelize). I haven't found anything that works faster than Sequelize, but the API is absolutely useless for anything other than simple joins. Maybe someone knows a good ORM for JS with nice query builders?


I've been happily using Objection.js (https://vincit.github.io/objection.js/) for my last couple projects.


I hear that TypeORM is nice.


I tried it and in my case it was 2x slower. Probably because of extensive use of reflection or something like that.


Do you always run into problems when you change your stack and the ORM of choice then isn't absolutely stable?

I never find room to use an ORM, as it's just an unnecessary and ineffective layer.


I'm always very skeptical of this sort of claim. How are you constructing queries? Bespoke string concatenation? That's a recipe for disaster. How do you compose queries? That is, given the questions "give me all posts owned by user X" and "give me all posts newer than timestamp Y", how do you ask for all posts owned by user X that are newer than timestamp Y? Is that a completely separate question? If so, that quickly has unmanageable multiplicative effects. Do you write a layer that knows how to take the two original queries in some abstract form and build the correct WHERE ... AND ... clause for the composed query? Well, now you have a bespoke and immature version of an ORM.

I can understand peoples' disdain for ORMs (which tend to be large dependencies that are more frameworks than libraries), but I don't understand not having a query construction and composition layer, and usually the popular ORMs for a language have the most mature implementations of that. In Ruby, there are actually both ActiveRecord (ORM) and Arel (query composition). Nobody really uses Arel directly, though, which I think is a shame.


  How are you constructing queries?
Entirely with bind variables.

  given the questions "give me all posts owned by user
  X" and "give me all posts newer than timestamp Y",
  how do you ask for all posts owned by user X that are
  newer than timestamp Y?
The API our service presents isn't an advanced search API, so it doesn't need to support searching on arbitrary combinations of database columns.

Obviously, if your software needs to provide an arbitrary search capability, your needs will be different to mine and an ORM might make sense :)


What are "bind variables"?

I'm not talking about a search API, I'm talking about code units that ask myriad questions of the database in order to implement myriad business logic. If you don't have the problem of asking myriad questions of the database, fair enough, but that is a simpler database-connected-software use case than any I've come across (including the ones I thought were pretty simple). I'm very curious now what your service does :)


Bind variables are the ? in "SELECT fname, lname, pcode FROM cust WHERE id = ?" and they're a better choice than concatenating the ID you want into the SQL string because (a) it prevents SQL injection and (b) it lets the query optimiser cache plans.

We have plenty of complicated queries - when we've got complex SQL queries, we find it valuable to have pure SQL as you can paste it into a tool that lets you execute it on the database and view the results, the explain plans, and suchlike.


Another bonus - depending on your DB/driver, it may set this up as a prepared statement, so executions after the first one happen much quicker.
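
For anyone curious, a minimal sketch of what that looks like at the SQL level with Postgres's PREPARE/EXECUTE, reusing the hypothetical cust table from the comment above (most drivers do the equivalent behind the scenes when you pass parameters):

    -- declare the statement once; $1 is the bind variable
    PREPARE get_cust (integer) AS
        SELECT fname, lname, pcode FROM cust WHERE id = $1;
    -- later executions can reuse the prepared plan
    EXECUTE get_cust(42);
    EXECUTE get_cust(43);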


What are "bind variables"?

I'm sorry, but bind variables are Database 101. I have a hard time believing your use cases are particularly "myriad" given the lack of the very basics.


No. "bind variables" are Oracle 101. Everyone else calls them "query parameters".


What is wrong with writing raw SQL for such a simple thing? I never want to fight with a tool to achieve such a simple query. Tools are supposed to help, and when queries tend to get complicated, I think that's when ORMs start to fail on performance. So I don't find room for them. I'd rather learn how SQL works instead of how to get the random ORM of the day to work like it should.


If you're not doing anything to manage and autogen your queries, does that imply that any change to the model's table entails manually updating all N different queries that touch the table in your application?

I'd be really concerned about the fragility of having that many disparate sources of truth about the structure of your data. Particularly when errors are only going to be discovered when the query actually runs.

You don't need to have an ORM, but at the same time, having no abstraction above "hardcode every query" is its own enormous set of problems.


> What is wrong with writing raw SQL for such a simple thing?

Mostly query composition. I run into this a lot. Say I need to write a less-than-simple query and reuse it with several different WHERE clauses.

Sometimes I think I just want migrations, not so much an ORM. I want to define views and stored procs in my repo and have them move to prod automatically, but I still want to write in straight SQL.
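
For example, one plain-SQL way to get partial reuse is to push the shared part into a view and vary only the outer WHERE (a sketch with made-up posts/users tables):

    -- the non-trivial part lives in one place
    CREATE VIEW visible_posts AS
        SELECT p.id, p.author_id, p.created_at, p.body
        FROM posts p
        JOIN users u ON u.id = p.author_id
        WHERE u.deleted_at IS NULL;
    -- callers layer their own WHERE clauses on top
    SELECT * FROM visible_posts WHERE author_id = $1;
    SELECT * FROM visible_posts WHERE created_at > $1;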


Although the majority of what I do is in Python, I've taken to using Flyway for DB migrations.

A lot of what I do is using some of the more esoteric features of PG (think domains, custom aggregate functions, etc), so the ability to run migrations as direct SQL is great.


This resonates with me.

But the other super critical thing (which was the first part of my original comment, and which nobody has really highlighted here) is a mature query construction library that knows how to properly protect against injection.


In all fairness a properly used sql connector library will hopefully do this too.


Don't get me wrong: everyone should learn SQL and learn it well. But a query construction and composition abstraction is also an invaluable tool. Fortunately, it is quite achievable to learn both things well.


The constraint and consistency problems in Mongo are and were well documented; this is a failure in the technology due-diligence process. Any app that is more than a toy needs to be backed by an ACID-compliant database.


It's also a failure of Mongo developers, who have a well known record of shipping dangerous defaults.


This sounds like a poor choice of tools for the project. If implementing something that needs a JSON front-end and a relational back-end, why not use Java?


I have never regretted it. We dropped Mongo completely about a year after JSONB support was released for Postgres. JSONB indexes perform very well, and can be tuned with different index types (e.g. btree, gin, hash). You get the best of both worlds, i.e. relational models with joins and aggregates, as well as flexible and dynamic document storage. If you want a pure JSON interface to your database, use PostgREST:

https://postgrest.org/en/v5.0/api.html
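
For anyone who hasn't seen the index tuning in practice, a minimal sketch (hypothetical events table with a payload jsonb column):

    -- GIN index supports containment/existence queries over the whole document
    CREATE INDEX events_payload_gin ON events USING gin (payload);
    SELECT * FROM events WHERE payload @> '{"type": "signup"}';
    -- btree expression index on one extracted key for equality/range lookups
    CREATE INDEX events_user_btree ON events ((payload->>'user_id'));
    SELECT * FROM events WHERE payload->>'user_id' = '42';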


My experience with JSONB is that you should think very carefully before choosing to use it as it can become a real footgun if used inappropriately.

An example of a (possibly) good use case would be e.g. user preferences/settings - there is often a lot of them and the relationship between them is often hierarchical. You could use e.g. a junction table that maps users to settings, but this can be unwieldy. A JSONB value can just be sent directly to client. It is possible to be reasonably certain that you won't want to do aggregate SQL operations on the entire table and thus just being able to do CRUD operations on single rows of the table avoids the pain of JSONB aggregate queries. For the record, I would probably still use a SQL solution in this case.

An example of a bad use case (which I saw in the wild) would be lumping entire user conversations (think Facebook comments) into a single JSONB value. This might naively seem like a time saver since you don't need a 'conversations' table, a 'conversation_messages' table nor the JOIN to aggregate the required data to reconstruct an entire conversation. However, as soon as you want to do _aggregate_ operations across several/all conversations it becomes a real hassle. Say for example you wanted to gather all messages that had been sent in the last hour - using a traditional SQL solution, this would be trivial and highly performant. With a JSONB solution you have to hope you have a JSONB index on the timestamp and then you also have to work out the correct JSONB syntax to do the query. Constructing non-trivial JSONB queries can get really gnarly - trust me, I've been there.
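
Roughly, the two versions look like this (sketched, with made-up column names):

    -- relational: trivial, and an index on sent_at makes it fast
    SELECT * FROM conversation_messages
    WHERE sent_at > now() - interval '1 hour';

    -- messages lumped into a jsonb array per conversation: unnest, cast, filter
    SELECT msg
    FROM conversations c,
         jsonb_array_elements(c.messages) AS msg
    WHERE (msg->>'sent_at')::timestamptz > now() - interval '1 hour';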

From my (somewhat) limited experience with JSONB I would now generally avoid it except for the cases where either the user wants to quite literally store JSON (and now we get some nice JSON validation for free) _or_ for the case where I am absolutely 100% sure I will never need to do any non-trivial aggregate operations on the JSONB and that the upfront implementation cost of a SQL solution would be stupendously high compared to JSONB. An example that comes to mind would be an Evernote style application where the user data is semi-structured and we don't generally speaking want to run aggregate operations on it (it's their private data).


We abstracted JSONB operations into the Clojure Query AST library HoneySQL and never really looked back. It was something one person on the team learned thoroughly for a half day and promptly forgot about. Now we say things like:

    (jsonb/get-in :jsonb-column [:field :sub-field])
    ;; jsonb_column#>>'{field, subfield}'

    (jsonb/get-in :jsonb-column [:field :sub-field] "default")
    ;; COALESCE(jsonb_column#>>'{field, subfield}', 'default')
    
    (jsonb/get-in :jsonb-column [:field])
    ;; jsonb_column->>'field'
These each generate the appropriate data structures for HoneySQL to do some combination of using #>>, ->> and COALESCE as needed. I can't imagine this is too difficult in any language. Even if you are writing static SQL, I don't think learning the syntax that goes with a feature is too high of an ask.


Nice library - I agree with you, learning the syntax shouldn't be considered a downside of an otherwise novel feature. My concern is mainly around the appropriateness of the feature in the context of problems that already have well-established relational solutions.


Thanks for sharing. I find myself reading docs but am missing the experience to know when/when not to use something.


Just checking: as far as I understand, at the moment Postgres can't gather statistics on the JSONB type. Did that affect your query optimizer by any chance, or did you notice any degradation in performance?


You can get around that with functional indexes https://blog.anayrat.info/en/2017/11/26/postgresql---jsonb-a...

I'm not working with tens of millions of records, but I haven't needed to go beyond GIN indexes yet - might just be our documents or our workload though. ymmv.
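
A minimal sketch of such an expression index (hypothetical docs table with a body jsonb column) - the planner also collects statistics on the indexed expression, which it can't do for keys buried inside the jsonb column:

    CREATE INDEX docs_age_idx ON docs (((body->>'age')::int));
    -- the query has to use the same expression to benefit
    SELECT * FROM docs WHERE (body->>'age')::int > 65;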


But ... once you're deciding what expression indexes to add you effectively have a schema, yeah?


Well, that's not too different from NoSQL - at best you get something like a GIN index by default. In Mongo you also have to create indexes.


It's funny to see these things come around and go around.

~1998, there were quite a few megabytes burnt on discussing Postgres vs. MySQL. At the time, MySQL was not ACID compliant, and was (rightly) derided as a database-shaped object, but not an actual database. It was faster, provided you didn't need to get back the same data you put into it.

MongoDB seems to be in a similar situation. Under the right conditions, it can be ACID compliant. But it's a database shaped object. As a developer, you have to actively manage aspects of your data that should be handled by the database itself.

Every MongoDB project I've touched has run into a schema version issue. Sure, you can put freeform data into Mongo. But when you're querying your database, will a record made yesterday have the same shape as one made six months ago? How do you ensure that? You write model level code to normalize your data at run time, or you write migrations.

Anyhow, to the matter at hand:

With a well behaved ORM you can store certain properties in dedicated columns, and shove everything else into a jsonb blob. This gets the best of both worlds: indexed, relational entities, and effectively schemaless data.

As your needs evolve, you can migrate properties out of the jsonb column and up to the table schema. This can be entirely transparent to your repo/service/controller level code. Now you can apply SQL level constraints to required fields.
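
For example, a sketch of promoting one property out of the blob (hypothetical orders table with a payload jsonb column; assumes every row already carries the key):

    -- promote "status" from the blob to a real, constrained column
    ALTER TABLE orders ADD COLUMN status text;
    UPDATE orders SET status = payload->>'status';
    ALTER TABLE orders
        ALTER COLUMN status SET NOT NULL,
        ADD CONSTRAINT orders_status_check
            CHECK (status IN ('pending', 'paid', 'shipped'));
    -- optionally drop the now-duplicated key from the blob
    UPDATE orders SET payload = payload - 'status';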

This has worked for me on a few different projects.

1) I've never regretted going with Postgres over MongoDB.

2) I can't think of a real application where I'd prefer Mongo. And the cases that get close, I'd probably jump to a stream processing system like Kafka and avoid database semantics entirely.

A fully schemaless database is great, providing that all of your database applications are schemaless, too. Otherwise, you now have to manage that by hand.

I don't begrudge Mongo's existence. It's a worthwhile experiment to move features in and out of the database semantic model. We go through cycles of decomposing and recomposing functionality. This is how we recombine tools to become other tools. But I do not see Mongo as the future of databases.


> Every MongoDB project I've touched has run into a schema version issue. Sure, you can put freeform data into Mongo. But when you're querying your database, will a record made yesterday have the same shape as one made six months ago? How do you ensure that? You write model level code to normalize your data at run time, or you write migrations.

While our project found that Mongo wasn't a good fit, this was a big reason why we leaned towards Mongo. We had log data that evolved over years and has lots of weird quirks over the years so having one place to deal with all of those was great. Ultimately though we found Mongo's scalability in its batch processing to be lackluster (there's only so much you can do with a small cluster of machines) and fell back to Hadoop.

I can't speak towards Mongo vs. Postgres, but coming from a hacked up MariaDB-based sharding system I loved how easy shard management was in Mongo and felt Mongo had a lot of tools for prototyping and allowing me to try something in a new paradigm with fairly low effort. Overall I found Mongo pleasant and flexible but it was a real PITA when you were trying to optimize for a particular use case


> ~1998, there were quite a few megabytes burnt on discussing Postgres vs. MySql. At the time, MySQL was not ACID compliant, and was (rightly) derided as a database shaped object, but not an actual database. It was faster, provided you didn't need get back the same data you put into it.

MySQL has come a long way since then though.


Oh, absolutely it has! Even if its query planner is... suboptimal[1]. It's even held up as an exemplar of stability[2] :D

[1]: Yes, I'm nitpicking.

[2]: https://www.youtube.com/watch?v=b2F-DItXtZs


Thing is, postgres has come further.


this, so much. The last time I worked with MongoDB it was with some founder code where he'd scraped sites for data and evolved the code & schema as he found issues. To stop my code from crashing every day, I had to basically write a ton of boilerplate to enforce the schema. I tried to convince him to move to PostgreSQL, but he was convinced that it would be slow and that he'd have to write a lot of boilerplate to deal with it hehe


Practical example from the last ~8 months. I started building my somewhat non-trivial SaaS [0] with MongoDB + NodeJS. Getting started was quick.

Soon however, things started playing up:

- transactions

- aggregations (avg, percentiles), rollups, etc.

- simply having a relational model to express the business domain

I switched to Postgres and I've been more than happy with it. Heroku hosting is top notch. Libraries are pretty mature. I use the JSONB and array datatypes, though not even heavily. Turns out an "exact" data model is quite useful for anything non-trivial. As always, YMMV.

[0] https://checklyhq.com


Think they asked about the opposite case.


I think OP's comment is a pretty clear answer to question 1: "Have you ever chosen Postgres over Mongo, and regretted it?"

OP chose Postgres over Mongo, and did not regret it.


And he answered it, with the answer being "no".


Which ORM do you use?


I use knex and objection.js


I’d like to repeat my comment from another recent thread:

I mostly used MySQL. But for my app, I now use CouchDB. It has its limitations, but the automatic sync between devices is pure gold. I haven't written a single line of sync code and it just works. It baffles me a bit that CouchDB isn't more popular and that people would rather use MongoDB for NoSQL, which doesn't come with the same syncing capabilities.

Adding to this: if you want sync out of the box and like NoSQL (and your app isn’t super complicated), CouchDB is clearly better than Postgres (and Mongo of course).


Syncing smaller databases between devices seems to be the best use case of CouchDB.

However, Couch is extremely difficult to manage at scale, especially when managing indexes on top of documents. The index generation was extremely slow and wasn't able to take advantage of beefier hardware at all. Deploying updates to indexes had to be done extremely carefully as it will rebuild the entire index and lock reads from that index in the meantime.

Iterating through GB of documents takes a wildly long amount of time, especially if other operations such as indexing updates are happening.

Furthermore it had scary failure modes. There were many times that a couch instance would die with documents cached in memory but not yet written to disk, even though couch had returned a successful document write. We found that documents we thought existed were lost forever during this blip.


I'm so glad to hear from another person who appreciates CouchDB. I think people choose MongoDB because big-data performance isn't a selling point with CouchDB, but it's feature rich and my go-to database when starting new projects from scratch.

For those unfamiliar, here are the cons and pros of CouchDB.

CONS:

- Not as performant as MongoDB, but still pretty fast for most purposes.

- Probably won't ever have transactions.

- No direct socket connection or native driver; its protocol is entirely dependent on HTTP.

PROS:

- Has a REST-like HTTP API.

- Admin interface out of the box. (aka Fauxton)

- Replication is dead simple, and probably the biggest killer feature.

- Is a gateway drug to PouchDB, a JavaScript implementation and client for CouchDB.

- Comes with a changes API out of the box with support of long polling, server sent events, and web sockets. (basically so an app can live update documents)

- JSON documents support file attachments, just like Email.


Sync is the primary use case for CouchDB (and PouchDB). It really deserves more attention for that IMO.

It's not ideal for a lot of other use cases, but where it fits it shines.


I just wonder if there is a single reason why anyone would use MongoDB over CouchDB/PouchDB.


If you remember when Mongo first came out, it made a splash with a lot of write focused benchmarks. They were achieved by RAM buffering the writes and then persisting to the disk periodically.

Couch never led in benchmarks and always focused on consistency of both performance and data. That's not as flashy with the "OMG Benchmarks!" crowd, in my experience at least.

Couch operates entirely over REST last time I checked as well, which makes it a bit clunky for a lot of use cases. Couchbase attempts to be the fast version of Couch and seems to do pretty well.


I have to second this. CouchDB has a much better native interface than Mongo (pure HTTP/REST JSON) and automatic syncing.

Honestly, most of the time I think the real problem is developers not being able to cognitively handle changing from a SQL style declarative syntax to a functional map/reduce style syntax. It takes higher-order thinking, but the results are dramatically better.


SQL is declarative. There's no sense in which functional is higher-order than declarative.

When you write map/reduce, you're doing a mix of functional and probably imperative down the line. Putting syntactic sugar aside, it's hard to beat SQL for terseness, because you're talking to a big query engine that can do JOIN/WHERE/GROUP BY/HAVING/etc.


My only two issues with CouchDB are that the query language has some learning curve, and that it would be nice if it had good support for public/private key authentication like Lotus Notes had.


I'm sure that could be done using Proxy Authentication? Either way, you should suggest that feature on their Github.


This is an opposite of what OP asked, but I hope it provides some useful insight.

GIVE.asia recently migrated from MongoDB to PostgreSQL. We didn't really use any special features of MongoDB, so it was basically a poor decision.

One of the main problems we had with MongoDB was that there was no schema. When we changed the schema (e.g. add a column, delete a column), we needed to write a migration script to update all the data inside that schema. It was burdensome, so sometimes we didn't do it. This led to a situation where different rows might have different columns.

PostgreSQL (or MySQL and others), on the other hand, forces us to handle a schema change immediately because the change (e.g. add column, remove column) applies to all rows immediately. That leads to cleaner data.
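
For comparison, a schema change in Postgres is a single DDL statement that applies to every row at once (a sketch with a hypothetical users table):

    -- every existing row immediately has the new column (NULL by default here)
    ALTER TABLE users ADD COLUMN locale text;
    -- and a removed column is gone for every row, with no per-document migration script
    ALTER TABLE users DROP COLUMN legacy_flag;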

Two situations where you might need MongoDB (or another NoSQL store): 1. having so much data that it's impossible to modify all of it when changing the schema, or 2. you actually don't care about schema that much (e.g. logs)


> 2. you actually don't care about schema that much (e.g. logs)

I would argue that this is one of the most overlooked cases _for_ schemas, because it's where schemas are the most complex and also where traditionally we've had few options for really representing that complexity.

This is something I've tried to change with the hobbes project, where we have a small header-only lib to efficiently record "log data" with inferred schemas into shared memory:

https://github.com/Morgan-Stanley/hobbes#storage

Basically, in this view, the type for recorded log data is a variant over all possible log statements in an application. This is a very large and complex type, but in memory or on disk it's a simple representation -- a number for the constructor ID of the variant, and then the payload type that corresponds to that constructor.

Then there's quite a lot that we build on top of that, but as a basic primitive I think that logging is actually a great case for "extreme schemas" rather than throwing out schemas entirely.


When you need to dynamically scale the "cluster membership" of a distributed datastore. When you need more data than a single instance can handle.

I've worked on products that used Mongo and could not have worked with Postgres. Mongo has plenty of issues, but Postgres barely has support for true master-master replication even with a static set of nodes, let alone dynamic scaling.

I tend towards Cassandra rather than Mongo, but if we're talking about advantages for NoSQL more generally I'd say: when you need to do reporting queries without them interfering with live queries. When you want index updates to happen behind "live" changes for performance reasons. When you want a query language that makes sense as a programmer and lets you understand query performance from reading the query. When you want a nonblocking client library in the JVM ecosystem. (Hell, when you want your client library and the ecosystem on top of it to have first-class support for your database's features - Postgresql may have first-class support for JSON but that doesn't mean your ORM does). All these things are possible with SQL, but harder.


CitusDB?

https://www.citusdata.com/

There are a few other solutions for postgres as well.


There are various efforts, but none of them is mature/established enough that I'd trust it (and I heard of specific issues with some of them, though I think CitusDB is newer than when I last looked). Maybe this one works, but without an established community around it I'd rather stick to Cassandra.


Citus is almost a decade old now. Others like Greenplum are even older. I like Cassandra, but like most NoSQL, if you want to use it as a general-purpose datastore, there's a lot of work to do for devs.


> if you want to use it as a general-purpose datastore, there's a lot of work to do for devs

I'd say that's equally true for postgres. Adapting your data to a relational model does not come naturally, and to use postgresql as a document store requires learning a whole lot of novel postgresql-only functionality that the community will always treat as second-class.


Generally speaking, I don't like MongoDB. Though it has little to do with the database itself: it's more a combination of the limited use cases of NoSQL and the lack of managed hosting options directly on AWS or GCP (Azure can kind of do Mongo via CosmosDB, but I don't use Azure and don't plan to at this time).

If I've just got a small project that needs to throw data somewhere, I'll probably reach for Redis, DynamoDB, or (if it's simple enough) just serializing JSON files to S3 (an underutilized way of storing structured data if all you need is key-value lookup with no latency or consistency guarantees; very cheap). It's likely my project is already on App Engine or Lambda, so I don't want to have to go to a second login portal just to manage the database.

I don't hate NoSQL. I prefer SQL, but I also respect the fact that if you combine something like Mongo with in-app schemas, and are careful about keeping those up to date, writing proper migrations, etc., you can get a very production-ready and highly scalable DB out of Mongo. It's not as bad as people make it out to be.


For hosting options I've recently discovered MongoDB Atlas, and I've been very happy, basically. It's dead simple and solid. Just like using RDS or Cloud SQL.


Mongo wins when two things are true: 1) Your use cases are get doc, put doc, delete doc with no aggregations, OR you have a very healthy sharding key. 2) You need to go to the 1-100 terabyte scale for a very small cost in humans and machines.

For those of us who have dealt with (or dealt with the human/machine cost of) a TB+ Postgres or MySQL database, Mongo can be a blessing if your data can be organized in a proper NoSQL fashion.

It's true, you can scale RDBMSs into the terabytes range with modest cost with excellent performance. BUT the singular guiding principle to achieve this is integers, integers, integers. In general, RDBMSs start shedding performance the more non-integer columns you have in your relevant tables.

If you have less than 100GB of data, it will never matter what database you choose -- it's all about table structure and queries. MySQL, Postgres, Mongo, etc. Unless you choose sqlite of course :P


> In general, RDBMSs start shedding performance the more non-integer columns you have in your relevant tables.

The more non-integer indexes, I'd suggest.

TimescaleDB under Postgres goes a long way to improving the scaling of certain (NoSQL-type content) large databases.


I think we're both right for sure, but even just having varchars in your critical join tables is a big perf killer given there'll be fewer rows read per page fault (unless you're just index scanning/joining, of course).


I've used both Mongo and Postgres in large production environments. For small projects that I plan on keeping small (like personal tools) I'm often happier when I leave it as Mongo, vs dealing with schemas and migrations in the future.


Personal tools and small projects are firmly in SQLite territory for me. It just works and I can easily scale it to Postgres if necessary.


> It just works

This is one of the bigger issues I have with sqlite though. It often "just works" in situations where it really shouldn't. There are countless situations I've found where if you ask sqlite for something nonsensical, instead of erroring out it will happily answer with something nonsensical. Case in point - referring to non GROUP BYed, non-aggregated column values in SELECT lists.
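
A concrete sketch of that, with a made-up players(team, name, score) table:

    -- SQLite runs this without complaint instead of raising an error
    SELECT team, name, MAX(score) FROM players GROUP BY team;
    -- Postgres rejects the same query:
    --   ERROR: column "players.name" must appear in the GROUP BY clause
    --          or be used in an aggregate function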


I do like the "just throw whatever you want in this document" approach that Mongo has.


I've regretted that every time I've done it, with the usual "why is this document not like the others" failure case. Nowadays I use schemaless databases/fields extremely sparingly.

Relevant talk: https://www.youtube.com/watch?v=BN8Ne2JCGBs


We are using Azure DocumentDB for a project at work and it already has like 3-4 structural versions of the documents in the DB that have to be accounted for by the applications and reporting tools, the latter of which I am working on. It is somewhat similar to Mongo.

It's starting to become really painful and I'd be extremely happy if they moved it to a relational database.


That's been my experience too. Very rarely is your data model non-relational and schemaless, and if it isn't, you end up having to manage all the relations and cascades and everything yourself, which ends up being a huge pain.


> Very rarely is your data model non-relational and schemaless

This is of course completely false - which is why we use filesystems and files more than we use databases ;-)

But what I think is less clear and obvious is: when is a document database better than files and the file system? This is kind of the question of whether maildirs/mbox are any good at storing mail - and are they any good at searching mail? And is SQL better at either?

This isn't clear cut - there are things like dbmail[d] that just use Postgres and layer IMAP on top/in front - while a lot of servers prefer some form of file storage augmented with custom indexing.

[d] http://dbmail.org/


> This is of course completely false - which is why we use filesystems and files more than we use databases ;-)

But consider: there is no application in existence that attempts to work with the contents of every kind of file that can possibly exist on a filesystem. The only kinds of information that any application can reasonably aggregate across all files are those which conform to the standard metadata schema the filesystem applies to all files. Creating an application that aggregated anything else across all files is essentially impossible -- no sane human would attempt to write an application whose functionality depended on them writing parsers for every single file format that could possibly be found in a filesystem.

And yet, for most uses of MongoDB it is acting as the datastore for a single application, which in order to do any number of things normally demanded of applications (queries, aggregations, CRUD, etc.) must be able to read every variety of document that anyone has ever thrown into the datastore, whether those variations were planned (let's add the following fields to all new Comment documents) or accidental (whoops, a bug led to 3 days' worth of Comments missing one or more of the following fields).

In the latter scenario, the "it's like a filesystem, store whatever you want!" attitude actively works against controlling the complexity of the application, to the programmer's great detriment.


> This is of course completely false - which is why we use filesystems and files more than we use databases

Filesystems and files are good at a lot of things, but not really great at anything. They're a general-purpose abstraction, but the relational model is much more specific.


Of course. I agree with the points made here and in sibling threads about schemas and sql databases. But I also think it's hard to dispute the fact that most data lives in files and file systems.

And I think developers too often compare various no(t only)SQL solutions with RDBMSes - without really considering "file-based SQL" (SQLite) - or simply file-system/DAV-based storage (WebDAV, S3 and work-a-likes, local files, NFS, etc).

And again - I think it can be easy to see some examples where files are great (store the images for an image gallery), or where relational model is great (transaction data).

But I think it can be much harder to see where file storage, or file storage augmented with some form of index - is beaten by a nosql dbms.


Even when I'm storing data in a schemaless fashion, I always use some sort of strongly-typed mechanism for generating the JSON that provides guarantees about the nature of the JSON I am storing. At the very worst I end up with multiple well-defined versions of the "schemaless" content. "Schemaless" here just means the default relational mechanisms don't work well for my use case for some reason.

"True" schemalessness is a lie. You can't even hand arbitrary blobs of JSON to human beings and get consistent interpretations of them. There is always a schema. I'm not dogmatic about where the schema lives; relational DBs are great when your data fits them but there are times you need an "exception" to your relationships for some chunk of data. I'm fine with the schema being "this particular strong type in a programming language as rendered by this JSON renderer"; that's a fine schema too.

But some schema always exists, and no matter how you slice it, you can not escape from issues of dealing with schema changes. Thinking you can just ignore the problem and just throw whatevs into the database is simply one of your options, one which may gain a lot of very, very short-term convenience for a loooot of long-term pain. As with all cost/benefit questions, there are times where that is the right choice... but it's a lot less often than the old NoSQL hype presupposed.

(I've used this option when extracting things from an existing, stable DB and doing an ad-hoc one-off reporting analysis using a temporary database. I don't have to worry about breaking the stable DB when I'm only reading from it, and I can fluidly experiment and develop the one-off report easily. But it's really a rare use case.)


> "Schemaless" here just means the default relational mechanisms don't work well for my use case for some reason.

Agreed, but you're basically doing things as they should be done. Not using the schema in a relational database because of valid reasons, still validating the schema, etc. I'm not sure if you're agreeing or if you meant it as a different viewpoint, but it sounds like we agree.

> But some schema always exists, and no matter how you slice it, you can not escape from issues of dealing with schema changes.

This is why I hate it when people say "oh I use MongoDB because I don't want to have to bother with migrations". No, your schema is always going to change, you're just going to migrate things in an ad-hoc manner instead of with a well-thought-out, established process.


"I'm not sure if you're agreeing or if you meant it as a different viewpoint, but it sounds like we agree."

Agreement. I was mostly arguing against a ghost argument of my past, where people get dogmatic that schemas must live in the database. I'm much less dogmatic about that. Mostly I just want DRY; the schema should try to live in as few places as possible, preferably one, though that is particularly hard here. But it doesn't bother me too much for the DB to not be any of them; in the past few years I think all my databases have ended up with JSON blobs in them. But by careful choice, and still with some careful design to it, not because "woo hoo, I can just throw whatever I want into these fields!".


Completely agreed, and I love postgres' support for JSON fields, I think it fills an important need. Not everything can be conveniently described with a schema.


Indeed: I recently ran into a bug where a field was inserted as a string in some cases and an integer in others. A simple fix, yet more work than I would have had to do had I just configured the field properly in the first place.


It beats the heck out of serialized JSON files. However, for any application that needs data with relationships (most all of them), it's a no brainer to use SQL for your query language and DB engine.


It’s always felt then like “easy in, hard to get out”


Maybe not Mongo itself but schema-less NoSQL databases are very useful for Data Hubs that integrate many different data sources. If you work as I do on a team that has 100+ members, trying to integrate 25 billion documents with 30+ datasources and 60+ APIs, then a schema-less database where you can more flexibly handle changes to the document model (which are happening every day) is ideal. Trying to build that in MySQL would be a disaster.


Right, but you could just build that in Postgres with JSONB



So the question might be: when would building it as postgres + jsonb not be appropriate?


I like using tools like Apache Drill for the schemaless use case. I apply that in use cases where you’re dropping files with maybe similar schemes into a folder and doing exploratory analysis on the way to a structured solution, or for quick throwaways, or even for data integration ELT type solutions, as you mentioned. This use case works well with data vault-like architecture (or “data lakes”).

Actually, anything that can read and perform a SQL query against a standards based file in a file system (e.g. CSV, JSON, Parquet, etc. on HDFS or regular FS). Apache Hive, PrestoDB, Spark, or even some RDBMS which support Foreign Data Wrappers can do this, although some of those require a schema to be specified for querying even if the underlying files can be of different schemas. Drill doesn’t have that limitation and can literally query without metadata. Schema / data types matter so true schemaless can be difficult unless you manually map data types, but it has a place.


The vast majority of projects are basic CRUD operations, using relational data. Not something you should ever do in a NoSQL database. That's the problem, people were using it for EVERYTHING. Your use case is very niche.


I think it depends on what platform you choose. For my Ruby/Rails work Postgres support has been nothing short of phenomenal. The addition of JSONB and hstore added the dynamic schema features of MongoDB that I'd used in previous projects. Nowadays I miss none of those features, plus I get free fulltext search, Common Table Expressions and tons of other vetted goodies that come with Postgres.

However, working with NodeJS was a different story, as there seems to be a lot more community around projects like Mongoose as opposed to Sequelize. I found that bootstrapping a project from scratch was a lot easier and the pace of development was relatively hiccup free. The only downside for me was memory consumption. Mongo stores all data in memory for fast reads and this quickly proves unsustainable if you aren't prepared for it. By comparison, postgres has some sort of crazy developer voodoo behind it because it has way more features, yet consumes far less RAM and is more resilient to failure in my experience.

NOTE: I haven't built a Twitter or anything "webscale". Most of my apps are for small to mid-sized businesses, so take my input here with the appropriate grain of salt.


JSON storage is only half the story, the other half is ease of scaling. Scaling Postgres is not simple, and given enough of the right kind of data, I could see why someone might chose MongoDB. But for quickly iterating on a schema for a new project, Postgres's JSON support is definitely the best of both worlds.


Not Mongo, but I am eyeing CouchDB, solely because of PouchDB. I am just not finding an easy and well-supported way to build an offline-first SPA/PWA backed by Postgres without handling all of the data syncing manually.


Wouldn't Vue with Sequelize + PostgreSQL solve your use case?


(I'm not jaxn)

That solution does not sync.

Also, we've found that Knex with Objection is much better than Sequelize.


I can think of reasons.

I have a personal dislike of mongodb, for many reasons but I’m going to put them aside for a moment.

MongoDB beats PostgreSQL because it's relatively easy to shard and scale. If your application logic isn't quite capable of sharding, then MongoDB handles this and replication for you.

It's also true that MongoDB attempts quorum over consistency, which is just fundamentally different from how PostgreSQL works; it is possible that MongoDB can be faster in a very large mesh topology.

What I am hoping for in the future is that something like Elasticsearch exists but with PostgreSQL as its engine instead of Lucene. Such a system would be a complete death knell for MongoDB.


Yes.

I think the major advantage of NoSQL databases was easy horizontal scaling (shard + scale + seamless failover) at a time when it was difficult to do these things with SQL databases. However, NoSQL's original advantages have started to fade with db's like Citus and TimescaleDB, and even FB's MyRocks.

NoSQL databases are optimized for straight data retrieval and horizontal scaling. If you want to dump lots of naturally unstructured or semi- or variable-structured data (like logs) into a distributed data store and retrieve them quickly without doing any non-trivial manipulation on them, Mongo will likely fit the use case. It's fast, strongly consistent, supports secondary indices etc.

However, as soon as you need to manipulate your data (i.e. analytics use cases), you're much better off doing a bit of data modeling up front and using a SQL database.


It's not native support but more than a couple times I have wondered how well this works: https://github.com/zombodb/zombodb

Doesn't seem to be a whole lot of people trying it out, though the project is 'relatively' active.


These are fundamentally different database technologies. Postgres is a powerful SQL based store while Mongo was originally built around the idea of a write never failing.

Postgres tends to keep a speed edge but the jsonb syntax is awkward and some operations can be trickier for a developer to navigate. Scaling uses traditional sql strategies.

Mongo has a pretty nifty query language, but more difficult queries start becoming quite slow. Scaling is a bit different than with Postgres.

Like any tech choice, it's not about what is vogue or trendy but what best fits your problem space.


This is a recent attempt to make a nicer Document API for PG: https://github.com/robconery/dox


> built around the idea of a write never failing.

What am I missing if I think this sounds like a very interesting thought experiment, but completely disconnected from reality?


The tradeoff is "consistency". As long as some node of Mongo is running, it'll do it's absolute damned best to accept a write, including buffering in memory and all kinds of other tricks.

However, if you write to it and then immediately read.. you might not get back what you just wrote (but it will eventually show up in reads). Likewise, you might get different responses from different nodes in some weird situations.

A few things to note: This weird behavior is only in a failure state (eg a network partition), and I believe it's no longer the default but can be enabled. It's very useful for lots of applications, but isn't the right choice for all applications.


In distributed systems, getting to a failing state is exactly the same as getting to a succeeding state: both have to make CAP consistency/availability trade-offs. So in many cases not failing is just easier.


In response to Question 2, here's the best I've got: At a previous job, we used MongoDB with a write concern of 0 to store logs in such a way that we could more easily search through them later on. The information we were logging was useful to us as developers but wasn't critical to the business, so we were willing to take some risks in order to achieve faster inserts.

It worked OK for us at the time, but if I had to do it over again today, I would probably look at ELK before rigging something up with Mongo.

In the larger sense -- at least for conventional web stuff -- I don't like Mongo. Usually, I'm building something that receives a web request, reads or writes to a database, then generates either a web page or a JSON object as a response. In theory, since you know what information you need to render the response, you should be able to include all of that into the structure of your Mongo document, then fetch everything you need with only one query. In practice, however, business and UX requirements tend to change quickly enough that it isn't feasible to do that. As your product evolves away from its initial design, it gets harder to bridge the gap without joins and transactions. At that point, at least for me, I just feel more comfortable using PostgreSQL.


In support of ELK, I spent years glancing at all sorts of different options (fluentd, greylog, ELK, Splunk, logstash, loggly, ...). I had set up Elasticsearch but never got the parts working for log ingestion and groking. I got them loading but no groking. This was a "personal time" project at work and I just didn't have the time.

Then we decided we wanted Apache logs to be centralized, so I set up an rsyslog server and fed the logs to that. Once I had that component running, I set up another ELK system and figured out the "groking", and ended up using liblognorm.

A few weeks later our Elasticsearch cluster had some problem and people had come to rely on it, so I started shoring up the deployment. Going from experiment mode to something more production ready.

The stack I ended up with was rsyslog on the clients, shipping logs to the rsyslog server using RELP. The rsyslog server using omelasticsearch to send the logs to Elasticsearch, and liblognorm module in rsyslog to process the logs.

liblognorm is kind of a pain in the ass, but it was built into rsyslog. If I were to do it again, I'd spend a bit more time figuring out if I could use grok instead.


1) YES. But the schema design was terrible, which isn't a fault of Postgres. The result was a SQL database that didn't scale well, was brittle, required lots of compute resources, and wasn't always redundant. The original codebase existed before me and my team didn't have the approval from management to fix the core issues, so we chose Postgres as the replacement for MySQL.

Since then, we've built new versions of those products, re-architected, and invested 100% into the cloud. Now that we are serverless, cellular, and highly-available we use a NoSQL database that "scales".

The schema concepts from the relational era to the non-relational era haven't changed that much, but we did learn from others' mistakes and changed the types of the worst offenders to be JSON. (Which Postgres now supports, yay!)

Now I live in a world where, instead of querying for the data from a relational database, I instead have to load more data than I need into "serverless" memory, join the relations, and return the output. That requires at least one database request for each relational object, increased lines of code, increased complexity, and expensive monthly service costs.

2) We all know that NoSQL is great for non-relational data, but most of my projects have relational data objects. Example. A system has users; a user has notes; a note has attributes.

99% of the time, I'm willing to bet a relational database is needed, and I have yet to find a project that benefited from NoSQL.

Ironically, I've been thinking about the SQL to NoSQL trend for the last couple of weeks and asked myself similar questions. I came to the conclusion that a relational database is not only more natural and more efficient but cheaper in the long run.

Just my thoughts.


For me, whether to use Mongo or an RDBMS depends on what I see the need to be for the data.

If I see the database as only backing storage for a domain-specific microservice where all access should go through the service, I will probably use MongoDB. C#/LINQ/MongoDB is a powerful, under-used combination. You get your data and model consistency via C#'s strong typing, and it feels more like you are working with everything in memory even though your LINQ queries are being translated to MongoQuery.

For instance, in C# with Mongo you're not working with an amorphous blob of weakly typed data. You're working with a Collection<Customer> and an IMongoQueryable<Customer> where you can write standard LINQ queries....

var seniors = from c in Customers where c.Age > 65 select c;

For storing data you can't put an Orders object in a Collection you defined to hold Customers.

You don't get the whole object/relational mismatch with Mongo, and as good as Entity Framework is, it still brings along all of the issues that are unavoidable with an ORM.

On the other hand, my experience with large Mongo clusters is write once/read many. Meaning once the documents got stored, they were never updated and it was very much a document store. I would still use an RDBMS most of the time. But why choose? In the age of domain-specific services and polyglot persistence, do both.


In the case of C#/.Net, there are more alternatives for document DBs with native Linq-based drivers. For example I've used RavenDB in a couple of web app projects, and it has the extra benefit of being fully transactional and having Lucene indexes built-in (effectively giving you a built-in auto-syncing Elasticsearch/Solr server). For another upcoming (cloud based) project we're planning to use Azure Cosmos DB.


Not MongoDB, but Redis. I added Redis to my stack and I am absolutely loving it for what it does. I have SQL Server for stuff that belongs in an RDBMS, e.g. purchase data, accounts, etc. But the operations and analytics related stuff for my application goes into Redis. And Redis outperforms SQL Server for those purposes.

If you can represent your data in terms of data structures as opposed to Table/records, you may want to take a look at Redis.


> the operations and analytics related stuff

Can you elaborate on this? Is it ephemeral data that requires some operation done with it, and then it can be discarded?


We are building a tool [1] that pulls in data from several different external services as part of our data visualization platform. For this use case MongoDB, with its schema-less design, is perfect.

We are however using Postgres for our internal app database where it is easy to manage our schema a priori.

[1] https://chart.ly


The only regret I have going with PostgreSQL over MongoDB is that the upgrade path is not easy. I can't just upgrade in place, or easily convert the older pgsql files to the new version, I need to spin up a new one and transfer the data via dump.

That's why I'm still running 9.5 in production, because the associated downtime is too much.


Most of the issues I have encountered, and that I've heard peers complain of, stack up to human error and the inability to adequately plan for change... doesn't seem to matter what database or framework we're talking about...


MongoDB is a fine database IMO. The main issue I have with it, is that they dropped SQL interface and instead invented a less expressive query syntax. Really there are some great features to MongoDB and it's relatively easy to maintain. I wish it had better tooling for modifying existing document structures such as ALTER TABLE... but the automatic sharding is nice.


I would never choose MongoDB after using it. Postgres has never failed me in a decade, whereas Mongo has been a huge pain in the ass over the last year. Honestly, the concept of schemaless is stupid. You just eventually end up reimplementing SQL on top of the documents. It's an obnoxious waste of time.


No, because I only use RDBMS systems, never had a business reason to jump into the NoSQL wave.


I've been at two companies that used MongoDB for high velocity, high volume data (packet capture and analysis). It was a bad decision that we tried to change after the fact both times.


Nope, but I would imagine in a consulting context one can miss out on the whole "and now let's switch to PostgreSQL to address all the Mongo issues" revenue stream.


Postgres cannot (without some wild and woolly SQL) do an increment upsert on a JSONB field. I.e. if I wanted to add 1 to a JSONB attribute that may not exist, I cannot do this, not simply.

I'd, of course, love to be wrong about this, but when I looked into it about six months ago, the best solution I could find was a hack that had untested performance characteristics. While in Mongo, it's a very simple, standard operation.


See the `||` operator and `jsonb_set` (use Cmd-F):

https://www.postgresql.org/docs/current/static/functions-jso...

From the docs for the `||` operator:

> Concatenate two jsonb values into a new jsonb value

And for `jsonb_set`:

> Returns target with the section designated by path replaced by new_value, or with new_value added if create_missing is true ( default is true) and the item designated by path does not exist. As with the path orientated operators, negative integers that appear in path count from the end of JSON arrays.

You'd use this function in conjunction with an `UPDATE` statement to get the result that you want.
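
Putting those together, a sketch of the increment-upsert the parent comment is after (hypothetical stats table with a counters jsonb column):

    UPDATE stats
    SET counters = jsonb_set(
            counters,
            '{clicks}',
            to_jsonb(COALESCE((counters->>'clicks')::int, 0) + 1),
            true)  -- create_missing: add the key if it isn't there yet
    WHERE id = 42;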


If you are incrementing a field, then by definition you know it exists and is an integer. In that case, what advantage are you gaining by storing it in JSONB? Just factor the integer field out of the JSONB column and into the table schema.


I deal with messy data often and there are definitely times when a JSON row should contain a field but doesn't. If I also needed to increment the field as part of cleaning the data, it would also be nice to upsert in the cases where the field is missing. Extracting a field into its own column may make sense sometimes, but usually I just need the data to be correct and stored rather than be queried or updated frequently so adding another column is a waste of time and space.


It seems like you're talking about a couple of different cases here.

If you're going to be incrementing a field infrequently, then just re-saving the JSONB shouldn't be a big deal. If you're updating a field frequently, having it in its own column makes more sense and should absolutely be worth the time. It's tremendously more efficient to increment a typed number than it is to increment a piece of a serialized chunk of JSON.

When you create a column with postgres that has a null default, it doesn't allocate space for it. It's one of the things that makes postgres so good for huge databases because adding an empty column is an instantaneous operation, rather than a locking operation. On the flip side, when you create a field in JSON you're duplicating the space used just to name the field in every single row.

But the short version is, if you're updating something often enough that you're willing to fight with upserting into a JSON column rather than just creating that same column, it might be worth it to take a step back and re-evaluate why doing it that way is important to you.


> If you're updating a field frequently, having it in its own column makes more sense and should absolutely be worth the time. It's tremendously more efficient to increment a typed number than it is to increment a piece of a serialized chunk of JSON.

Yes. When did I say the opposite?


I think I assumed based on your need for increment upsert on a JSON field. My mistake.


Generally speaking, I like to use json tables for inserts only.


nope


No. Mongo is garbage.


No


tl;dr Nope, although Mongo is fun to ride around in the backyard.



