I am trying to gain a basic understanding of this:
Right now I have a 4TB DB on one large box.
Is the idea that using a proxy tool like PGDog I could spin up 8 smaller boxes handling ~500GB each and then one medium box for the proxy?
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this.
We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us.
We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.
That's the idea of sharding. If you read the pgdog docs, you'll notice you need to tell it which shard server to route your request to - it doesn't just magically work. It's still providing value by reusing connections, which are particularly expensive in postgres.
Because it's not magic, you do still have to know what's going on under the hood, e.g. no cross-shard transactions.
I'd see if my application can benefit from read replicas before doing sharding, because sharding is difficult (if you care about data consistency). With replicas, each replica does have a full copy of the data and you only write to the master - you have to decide which transactions are suitable for running against replicas, which can lag slightly behind realtime. E.g. reading data to build a webpage is probably safe to do from a replica - any read-modify-write is not.
> 8 smaller boxes handling ~500GB each and then one medium box for the proxy?
That's exactly right. Get in touch (lev@pgdog.dev), happy to help or at the very least tell you what current works (or doesn't) so you know what your options are.
Right now I have a project that has very heavy write traffic from multiple services and a web app that reads from this. We are starting to hit the point where no amount of indexing, query optimisation, caching or box upgrades is helping us. We are looking at maybe moving the bulk of the static data to clickhouse to reduce the DB size but I would love to hear if PgDog or other kind of sharding could be useful for this use case.