I've run Postgres at large scale (dozens of machines) at multiple companies. I'v...

bognition · on March 29, 2024

You'll never see true support for horizontal scalability in Postgres because doing so would require a fundamental shift in what Postgres is and the guarantees is provides. Postgres is available and consistent. It cannot truly be partitionable without impacting availability or consistency.

When an application grows to such a scale that you need a partitionable datastore it's not something you can just turn on. If you've been expecting consistency and availability, there will be parts of your application that will break when those guarantees are changed.

When you hit the point that you need horizontally scalable databases you must update the application. This is one of the reasons that NewSQL databases like CockroachDB and Vitess are so popular. They expose themselves as a SQL database but make you deal with the availability/consistency problems on day 1, so as your application scales you dont need to change anything.

Context: I've built applications and managed databases on 10's of thousands of machines for a public saas company.

BiteCode_dev · on March 29, 2024

Because vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more. There is just few incentives.

Especially since:

- Servers will keep getting better and cheaper with time.

- Data is not only in postgres, you probably have redis, clickhouse and others, so the charge is balanced. In fact you may have different dedicated postgres, like one for GIS tasks.

- Those hacky extensions are damn amazing. No product in the world is that versatile.

- Posgres has much better support from legacy frameworks like django/ror/laravel than nosql alternatives. People shits on ORM, but they enable a huge plugin well integrated ecosystem that makes you super productive, and PG is happily and transparently handling all that.

- If by some miracle you actually reach the point you need this, you'll have plenty of money to pay for commercial HA/sharding, or migrate. So why think about it now?

riku_iki · on March 29, 2024

> vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more

its less about the scale and more about HA and service interruption: your service will be down if server dies.

hahn-kev · on March 31, 2024

Never heard of docker/k8s?

riku_iki · on March 31, 2024

I don't think these two words will buy you HA automagically. You will need 3 layers of various open source components on top, and I am not sure if they will improve or reduce HA at the end.

amluto · on March 29, 2024

> This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things.

I’m not sure precisely what you mean by “HA”, but, in my experience, out-of-the-box support for the most basic replication setup in MySQL is pretty bad. Just to rattle off a few examples:

Adding a replica involves using mysqldump, which is, to put it charitably, not a very good program. And the tools that consume its output are even worse!

There is nothing that shops with MySQL that can help verify that a replica is in sync with its primary.

Want to use GTID (which is the recommended mode and is more or less mandatory for a reasonable HA setup)? Prepare for poor docs. Also prepare for the complete inability of anyone’s managed offering to sync to an existing replica set via mysqldump’s output. RDS will reject the output due to rather fundamental permission issue, and the recommended (documented!) workaround is simply incorrect. It’s not clear that RDS can do it right. At least Azure sort of documents that one can manually real and modify the mysqldump output and then issue a manual API call (involving the directives that you manually removed from the dump) to set the GTID state.

Want point-in-time recovery? While the replication protocol supports it, there is no first-party tooling. Even just archiving the replication logs is barely supported. Postgres makes it a bit awkward, but at least the mechanisms are supported out of the box.

But maybe the new-ish cluster support actually works well one it’s set up as long as you don’t try to add managed RDS-style nodes?

evanelias · on March 29, 2024

> Adding a replica involves using mysqldump

That's one path, but it is not the only way, and never has been.

MySQL 8.0.17 (released nearly 5 years ago!) added support for physical (binary) copy using the CLONE plugin. And MySQL Shell added logical dump/reload capabilities in 8.0.21, nearly 4 years ago.

Third-party solutions for both physical and logical copy have long been available, e.g. xtrabackup and mydumper, respectively.

And there was always the "shut down the server and copy the files" offline approach in a pinch.

amluto · on March 29, 2024

CLONE is indeed nifty. But why is it a plugin? And who don’t any of the major hosted services support it? (Or do they? The ones I checked don’t document any support.)

I wouldn’t call xtrabackup or mydumper an out-of-the-box solution.

evanelias · on March 29, 2024

What's wrong with CLONE being a MySQL plugin? I mean a good chunk of this page is people praising Postgres for its plugins.

As for support in hosted cloud providers, that's a question for the cloud providers, no one else can answer this. But my best guess would be because they want you to use their in-house data management offerings, snapshot functionality, etc instead of porting MySQL's solution into the security restrictions of their managed environment.

Yes, xtrabackup and mydumper are third-party tools, as I noted. If you needed something out-of-the-box prior to CLONE, the paid MySQL Enterprise Edition has always included a first-party solution (MySQL Enterprise Backup, often abbreviated as MEB). Meanwhile Community Edition users often gravitated to Percona's xtrabackup instead as a similar FOSS equivalent, despite not being a first-party / out-of-the-box tool.

mixmastamyk · on March 29, 2024

Citus is open source and well financed. This comment may have made sense a few years ago, but no longer.

evanelias · on March 29, 2024

By "well financed" you mean "owned by Microsoft"?

That situation raises a separate set of concerns, especially in the context of Microsoft's main database cash cow being SQL Server, not Postgres/Citus.

sgent · on March 30, 2024

How is that different than owned by Oracle?

evanelias · on March 30, 2024

Yep, exactly. Apologies, my previous comment was semi-sarcastic but in retrospect that was way too vague :)

On average, HN leans anti-MySQL, with concerns about Oracle ownership frequently cited in these discussions (mixed in with some historic distrust of MySQL problems that were solved long ago). But I rarely see the same sentiment being expressed about Citus, despite some obvious similarities to their ownership situation.

Personally I don't necessarily think the ownership is a huge problem/risk in either case, but I can understand why others feel differently.

mixmastamyk · on March 30, 2024

I'm as skeptical of MS as anyone. However it is licensed GNU AGPL, so not particularly worried.

switch007 · on March 29, 2024

I guess some people really, really dislike Oracle (understandably).

And MariaDB is lagging behind, less and less compatible with MySQL etc leading to various projects dropping support for it - notably Azure. I wouldn't pick it for a new project.

jmuguy · on March 29, 2024

This depends on what level you consider HA and horizontal scaling to be required. I could make the same argument, based on my personal experience, that postgis ought to be included out of the box. Of course, I'll assume most people don't need it :)

klysm · on March 29, 2024

I feel like I have read this exact comment before verbatim