what is "at scale"? Is there a specific metric or range of metrics that raises a flag to begin considering something else? For example, in the olden days when it was my problem, page load times were the metric. Once it got high enough you looked for the bottleneck, solved it, and waited. When the threshold was broken again you re-ran the same process.
This bugs me every time performance comes up. No one is ever concrete, so they can never be wrong.
If Michael Jackson rose from the dead to host the Olympics opening ceremony and there were 2B tweets/second about it, then postgres on a single server isn't going to scale.
A crud app with 5-digit requests/second? It can do that. I'm sure it can do a lot more, but I've only ever played with performance tuning on weak hardware.
Visa is apparently capable of a 5-digit transaction throughput ("more than 65,000")[0] for a sense of what kind of system reaches even that scale. Their average throughput is more like 9k transctions/second[1].
Yeah, I don't mean to say Visa can run global payment processing on a single postgres install; I'm sure they do a ton of stuff with each transaction (e.g. for fraud detection). But for system design, it gives an order of magnitude for how many human actions a global system might need to deal with, which you can use to estimate how much a wildly successful system might need to handle based on what processing is needed for each human action.
For similar scale comparisons, reddit gets ~200 comments/second peak. Wikimedia gets ~20 edits/second and 1-200k pageviews/second (their grafana is public, but I won't link it since it's probably rude to drive traffic to it).
interesting re reddit, that's really tiny! but again, I'm even more curious about how many underlying TPS this turns into, net of rules firing, notifications and of course bots that read and analyze this comment, etc. Still, this isn't a scaling issue because all of this stuff can be done async on read replicas, which means approx unlimited scale in a single-database-under-management (e.g. here's this particular comment ID, wait for it)
The truth is that it really depends on your application work load. Is it read-heavy, or write-heavy? Are the reads more lookup-heavy (i.e. give me this one user's content), or OLAP heavy (i.e. `group by`'s aggregating millions of rows)? Is read-after-write an important problem in your application? Do you need to support real-time/"online" updates? Does your OLAP data need to be mutable, or can it be immutable (and therefore compressed, columnarized, etc.)? Is your schema static or dynamic, to what degree?
I agree with others that a good simplification is "how far can you get with the biggest single AWS instance"? And the answer is really far, for many common values of the above variables.
That being said, if your work load is more OLAP than OLTP, and especially if your workload needs to be real-time, Postgres will begin to give you suboptimal performance without maxing-out i/o and memory usage. Hence, "it really depends on your workload", and hence why you see it's common to "pair" Postgres with technologies like Clickhouse (OLAP, immutable, real-time), RabbitMQ/Kafka/Redis (real-time, write-heavy, persistence secondary to throughput).
For me with any kind of data persistence backend, it's when you go from scaling vertically to horizontally. In other words, when it's no longer feasible to scale by just buying a bigger box.
I don't know that there is a canonical solution for scaling Postgres data for a single database across an arbitrary number of servers.
I know there is CockroachDB which scales almost limitlessly, and supports Postgres client protocol, so you can call it from any language that has a Postgres client library.
Is there an equivalent for postgres?