Again I am super impressed with the technology involved but do want to clarify: ...

mattashii · on May 28, 2022

Kind of. S3 is the long-term low-cost durability guarantee, while our Safekeepers (3, each in a different zone) provide a high-cost short-term durability guarantee with their local persistent disks.

Latency from PostgreSQL WAL to S3 depends on WAL throughput and the configured pageserver checkpoint distance (default 256MB; this setting is separate from PostgreSQL's own checkpoint configuration).
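
As a rough back-of-the-envelope sketch of that relationship: if we assume (and this is only an assumption for illustration) that an upload is triggered once a full checkpoint distance of WAL has accumulated, the wait is roughly the distance divided by the WAL throughput. The function name and the 1 MiB/s figure below are made up for the example.

```python
def seconds_until_checkpoint(checkpoint_distance_bytes, wal_bytes_per_second):
    """Estimate how long it takes to accumulate one checkpoint distance of WAL.

    Assumes a steady WAL rate and that nothing else triggers an upload earlier.
    """
    if wal_bytes_per_second <= 0:
        return float("inf")  # no WAL is being produced, so the threshold is never reached
    return checkpoint_distance_bytes / wal_bytes_per_second


CHECKPOINT_DISTANCE = 256 * 1024 * 1024  # the 256MB default mentioned above
WAL_RATE = 1 * 1024 * 1024               # hypothetical 1 MiB/s of WAL

print(seconds_until_checkpoint(CHECKPOINT_DISTANCE, WAL_RATE))  # 256.0
```

So a mostly idle database could take a long time to reach S3 under this simplified model, while a busy one gets there quickly.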

IgorPartola · on May 28, 2022

When you say short term, do you mean for hot data, or that the guarantee is short term? As in, once it is written to the Safekeepers, is there any chance that the data will disappear?

mattashii · on May 28, 2022

We keep it there for a short duration, until the changes are confirmed to also be written to S3.

Writing to 3 instances in 3 availability zones is considered persistent enough while also maintaining high performance. Even though it does not provide the 11 9s of durability that S3 has, 3 availability zones dropping out with loss of all instance-local storage is considered rare enough that we do not think it will impact our availability and durability guarantees.
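
To illustrate the "keep it until S3 has it" idea, here is a minimal sketch. It is a toy model, not the actual Safekeeper code: `ShortTermLog`, `append`, and `confirm_uploaded` are hypothetical names, and records are keyed by a plain integer standing in for a WAL position.

```python
class ShortTermLog:
    """Holds WAL records only until long-term storage confirms them."""

    def __init__(self):
        self.records = {}  # WAL position -> payload

    def append(self, lsn, payload):
        # Record is now held on the short-term, fast storage.
        self.records[lsn] = payload

    def confirm_uploaded(self, up_to_lsn):
        # Long-term storage has everything up to this position,
        # so the short-term copy of those records can be dropped.
        for lsn in [l for l in self.records if l <= up_to_lsn]:
            del self.records[lsn]


log = ShortTermLog()
for lsn in (10, 20, 30):
    log.append(lsn, b"change")

log.confirm_uploaded(20)
print(sorted(log.records))  # [30]
```

The point is only that the short-term copy is never the sole copy for long: anything not yet confirmed stays in the log.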

IgorPartola · on May 28, 2022

That makes sense, thank you! Sounds pretty damn robust.

manigandham · on May 28, 2022

Many distributed systems offer ACID by using distribution + replication for the initial write commit.

Keeping your data on multiple nodes (RAM or local disk) is much faster and cheaper, and it provides better reliability against crashes. Data can then be compacted and streamed out asynchronously to more durable storage.
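
A minimal sketch of that pattern, with everything in memory. The majority rule for acknowledging a write is an assumption on my part (systems differ on this), and `ReplicatedWriter`, `write`, and `flush` are illustrative names rather than any real library's API.

```python
class ReplicatedWriter:
    """Acknowledge writes once enough replicas hold them; flush to durable storage later."""

    def __init__(self, replica_count=3):
        self.replicas = [[] for _ in range(replica_count)]
        self.pending = []  # committed, but not yet in durable storage
        self.durable = []  # stand-in for the slower, more durable store

    def write(self, record, reachable=None):
        # `reachable` lists the replica indexes that can currently accept writes.
        if reachable is None:
            reachable = range(len(self.replicas))
        acks = 0
        for i in reachable:
            self.replicas[i].append(record)
            acks += 1
        # Assumed rule: the write commits once a majority of replicas have it.
        committed = acks > len(self.replicas) // 2
        if committed:
            self.pending.append(record)
        return committed

    def flush(self):
        # Would run asynchronously in practice: move committed records out in one batch.
        count = len(self.pending)
        self.durable.extend(self.pending)
        self.pending.clear()
        return count


w = ReplicatedWriter()
print(w.write("a"))                    # True
print(w.write("b", reachable=[0, 1]))  # True
print(w.write("c", reachable=[0]))     # False
print(w.flush())                       # 2
```

The commit path never waits on the durable store, which is where the speed comes from; the flush can batch and compact at its own pace.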