Hacker News

1. Compared with column stores, vectorized search performance in Postgres is relatively poor.

2. Postgres is not serverless, so it is not easy to separate reads from writes, and it is not easy to auto-scale.



> 2. Postgres is not serverless, so it is not easy to separate reads from writes, and it is not easy to auto-scale.

By the time you're hitting the limits of vertically scaling a single do-everything Postgres instance on cloud infrastructure, you're making boatloads of money and can afford to stand up something else for search. Besides, scaling out horizontally with read replicas is very doable.

Though to be fair, I wouldn't implement moderately complex search on Postgres, just because there are better tools for the job. Keeping data consistent between multiple systems is "involved", though, so there's a good argument for doing search in Postgres if your needs are simple.
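For the simple case, Postgres's built-in full-text search goes a long way. A minimal sketch, assuming a hypothetical `articles` table with `title` and `body` columns (the generated-column pattern needs Postgres 12+):

```sql
-- Keep a tsvector in sync automatically via a stored generated column.
ALTER TABLE articles
  ADD COLUMN search tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

-- GIN index makes @@ matches fast.
CREATE INDEX articles_search_idx ON articles USING GIN (search);

-- Query with ranking; websearch_to_tsquery accepts Google-ish syntax.
SELECT id, title
FROM articles
WHERE search @@ websearch_to_tsquery('english', 'postgres replication')
ORDER BY ts_rank(search, websearch_to_tsquery('english', 'postgres replication')) DESC
LIMIT 20;
```

No second system to keep consistent, which is exactly the argument above.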


You don't have to be that big for mixed workloads to cause issues for a do-everything PG instance.

Imagine a scenario where read-heavy but infrequent search queries end up pushing, say, your sessions table out of cache.

Postgres has no facility for earmarking cache for one table vs. another, so the noisy-neighbor problem is real and hard to fix. You can throw money/RAM at it, but that's needlessly expensive if some of your workloads don't require that level of performance.
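You can at least observe the effect. The contrib extension `pg_buffercache` shows which relations currently occupy shared_buffers; a diagnostic sketch (the MB math assumes the default 8 kB page size):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Top relations by space currently held in shared_buffers.
SELECT c.relname,
       count(*) * 8 / 1024 AS buffered_mb
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffered_mb DESC
LIMIT 10;
```

Run it before and after a heavy search query and you can watch the hot tables get evicted.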


You can get a machine with 512 GB of RAM and 96 cores for about $1,000/month, and at that point you can throw just about any workload at it.


Parent comment already mentioned read-replicas.

The main problem I've seen is companies letting tables grow enormous because they never partition (by year, for instance) or archive out old, stale data.
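Declarative partitioning by year is straightforward; a sketch with a hypothetical `events` table:

```sql
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2023 PARTITION OF events
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Archiving a stale year is then a metadata operation, not a huge DELETE:
ALTER TABLE events DETACH PARTITION events_2023;
```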


Multi-master is not common, but a single primary with multiple read-only replicas (with failover) is easy, though of course you're going to have to configure your application's database/ORM layer to handle multiple servers. That takes a bit of effort, but then you're set up to run analytic queries on a completely different database, or a column store, later on if you choose to.
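One small piece of that configuration effort: the application (or a health check) can tell a replica from the primary at connect time:

```sql
-- Returns true on a streaming-replication standby, false on the primary;
-- handy for sanity-checking which connection pool a server belongs in.
SELECT pg_is_in_recovery();
```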

I'm not saying you never need multi-master, but I've worked on several large projects and one Postgres database can handle a lot of traffic. My first solution is to offload analytic queries to read-only instances, or to pull data into a column store for "offline" processing. Just make sure you don't get stuck with some ancient ORM or application framework.

There are several Kubernetes operators that are moving towards more complex topologies, so I think a lot of innovation and progress is happening somewhat outside of core Postgres itself, building on functionality already present within.


While both points are true today, a lot of work is happening to bring both column storage and separated compute and storage to Postgres.

The pg_duck project aims eventually to implement a column storage engine for Postgres. There are a few steps to get there, since it needs to be tied into Postgres's page storage and replication system. So the first version of pg_duck won't solve it, but the team is incredible and I believe it will happen.

Neon and Oriole (acquired by Supabase) are both open source and separate storage from compute. There are a few steps more before they're truly usable self-hosted, but they will get there, and some of the work they are doing will hopefully be upstreamed.



