I've run Postgres at large scale (dozens of machines) at multiple companies. I've also run MongoDB at large scale at multiple companies. I like both generally. I don't really care about data modelling differences - you can build the same applications with approximately the same schema with both if you know what you're doing.
I don't understand how folks seemingly ignore Postgres' non-existent out of the box HA and horizontal scaling support. For small scale projects that don't care about these things, fair enough! But in my experience every Postgres installation is a snowflake with cobbled together extensions, other third party software, and home-rolled scripts to make up for this gap. These third party pieces of software are often buggy, half-maintained, and under-documented. This is exacerbated by Postgres' major version file format changes making upgrades extremely painful.
As far as I can tell, there is no interest in making these features work well in Postgres core because all of the contributors' companies make their money selling solutions for HA/sharding. This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things. I don't really want to run another Postgres cluster myself again. For a single node thing where I don't care about HA/scaling I do quite like it, though.
You'll never see true support for horizontal scalability in Postgres because doing so would require a fundamental shift in what Postgres is and the guarantees is provides. Postgres is available and consistent. It cannot truly be partitionable without impacting availability or consistency.
When an application grows to such a scale that you need a partitionable datastore it's not something you can just turn on. If you've been expecting consistency and availability, there will be parts of your application that will break when those guarantees are changed.
When you hit the point that you need horizontally scalable databases you must update the application. This is one of the reasons that NewSQL databases like CockroachDB and Vitess are so popular. They expose themselves as a SQL database but make you deal with the availability/consistency problems on day 1, so as your application scales you dont need to change anything.
Context:
I've built applications and managed databases on 10's of thousands of machines for a public saas company.
Because vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more. There is just few incentives.
Especially since:
- Servers will keep getting better and cheaper with time.
- Data is not only in postgres, you probably have redis, clickhouse and others, so the charge is balanced. In fact you may have different dedicated postgres, like one for GIS tasks.
- Those hacky extensions are damn amazing. No product in the world is that versatile.
- Posgres has much better support from legacy frameworks like django/ror/laravel than nosql alternatives. People shits on ORM, but they enable a huge plugin well integrated ecosystem that makes you super productive, and PG is happily and transparently handling all that.
- If by some miracle you actually reach the point you need this, you'll have plenty of money to pay for commercial HA/sharding, or migrate. So why think about it now?
I don't think these two words will buy you HA automagically. You will need 3 layers of various open source components on top, and I am not sure if they will improve or reduce HA at the end.
> This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things.
I’m not sure precisely what you mean by “HA”, but, in my experience, out-of-the-box support for the most basic replication setup in MySQL is pretty bad. Just to rattle off a few examples:
Adding a replica involves using mysqldump, which is, to put it charitably, not a very good program. And the tools that consume its output are even worse!
There is nothing that shops with MySQL that can help verify that a replica is in sync with its primary.
Want to use GTID (which is the recommended mode and is more or less mandatory for a reasonable HA setup)? Prepare for poor docs. Also prepare for the complete inability of anyone’s managed offering to sync to an existing replica set via mysqldump’s output. RDS will reject the output due to rather fundamental permission issue, and the recommended (documented!) workaround is simply incorrect. It’s not clear that RDS can do it right. At least Azure sort of documents that one can manually real and modify the mysqldump output and then issue a manual API call (involving the directives that you manually removed from the dump) to set the GTID state.
Want point-in-time recovery? While the replication protocol supports it, there is no first-party tooling. Even just archiving the replication logs is barely supported. Postgres makes it a bit awkward, but at least the mechanisms are supported out of the box.
But maybe the new-ish cluster support actually works well one it’s set up as long as you don’t try to add managed RDS-style nodes?
That's one path, but it is not the only way, and never has been.
MySQL 8.0.17 (released nearly 5 years ago!) added support for physical (binary) copy using the CLONE plugin. And MySQL Shell added logical dump/reload capabilities in 8.0.21, nearly 4 years ago.
Third-party solutions for both physical and logical copy have long been available, e.g. xtrabackup and mydumper, respectively.
And there was always the "shut down the server and copy the files" offline approach in a pinch.
CLONE is indeed nifty. But why is it a plugin? And who don’t any of the major hosted services support it? (Or do they? The ones I checked don’t document any support.)
I wouldn’t call xtrabackup or
mydumper an out-of-the-box solution.
What's wrong with CLONE being a MySQL plugin? I mean a good chunk of this page is people praising Postgres for its plugins.
As for support in hosted cloud providers, that's a question for the cloud providers, no one else can answer this. But my best guess would be because they want you to use their in-house data management offerings, snapshot functionality, etc instead of porting MySQL's solution into the security restrictions of their managed environment.
Yes, xtrabackup and mydumper are third-party tools, as I noted. If you needed something out-of-the-box prior to CLONE, the paid MySQL Enterprise Edition has always included a first-party solution (MySQL Enterprise Backup, often abbreviated as MEB). Meanwhile Community Edition users often gravitated to Percona's xtrabackup instead as a similar FOSS equivalent, despite not being a first-party / out-of-the-box tool.
That situation raises a separate set of concerns, especially in the context of Microsoft's main database cash cow being SQL Server, not Postgres/Citus.
Yep, exactly. Apologies, my previous comment was semi-sarcastic but in retrospect that was way too vague :)
On average, HN leans anti-MySQL, with concerns about Oracle ownership frequently cited in these discussions (mixed in with some historic distrust of MySQL problems that were solved long ago). But I rarely see the same sentiment being expressed about Citus, despite some obvious similarities to their ownership situation.
Personally I don't necessarily think the ownership is a huge problem/risk in either case, but I can understand why others feel differently.
I guess some people really, really dislike Oracle (understandably).
And MariaDB is lagging behind, less and less compatible with MySQL etc leading to various projects dropping support for it - notably Azure. I wouldn't pick it for a new project.
This depends on what level you consider HA and horizontal scaling to be required. I could make the same argument, based on my personal experience, that postgis ought to be included out of the box. Of course, I'll assume most people don't need it :)
I don't understand how folks seemingly ignore Postgres' non-existent out of the box HA and horizontal scaling support. For small scale projects that don't care about these things, fair enough! But in my experience every Postgres installation is a snowflake with cobbled together extensions, other third party software, and home-rolled scripts to make up for this gap. These third party pieces of software are often buggy, half-maintained, and under-documented. This is exacerbated by Postgres' major version file format changes making upgrades extremely painful.
As far as I can tell, there is no interest in making these features work well in Postgres core because all of the contributors' companies make their money selling solutions for HA/sharding. This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things. I don't really want to run another Postgres cluster myself again. For a single node thing where I don't care about HA/scaling I do quite like it, though.