
Again, I am super impressed with the technology involved, but I do want to clarify: in order to have the D in ACID, the update must be sent to S3, right? Is there a mode in which an INSERT, UPDATE, or DELETE does not return until this happens? What kind of latency does that introduce, and is that latency affected by throughput at all?



Kind of. S3 is the long-term low-cost durability guarantee, while our Safekeepers (3, each in a different zone) provide a high-cost short-term durability guarantee with their local persistent disks.

Latency from PostgreSQL WAL to S3 depends on WAL throughput and the configured pageserver checkpoint distance (default 256MB; this config field is not the same as PostgreSQL's own checkpoint setting).
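
As a rough back-of-envelope illustration of that relationship (not the actual pageserver logic): if we assume an upload is triggered only once a full checkpoint distance of WAL has accumulated, the wait is roughly the distance divided by WAL throughput. The function name, the sample rates, and the reading of 256MB as 256 MiB are all assumptions for illustration.

```python
def seconds_to_fill_checkpoint(checkpoint_distance_bytes: int,
                               wal_bytes_per_second: float) -> float:
    """Estimate how long WAL takes to accumulate one checkpoint distance.

    Assumption: upload starts only once the full distance is reached;
    a real system may also flush on timers or other triggers.
    """
    if wal_bytes_per_second <= 0:
        # An idle database never fills the distance under this simple model.
        return float("inf")
    return checkpoint_distance_bytes / wal_bytes_per_second


MIB = 1024 * 1024
CHECKPOINT_DISTANCE = 256 * MIB  # default quoted above, read here as MiB

if __name__ == "__main__":
    # Hypothetical WAL rates, chosen only to show the trend.
    for rate_mib in (1, 16, 128):
        secs = seconds_to_fill_checkpoint(CHECKPOINT_DISTANCE, rate_mib * MIB)
        print(f"{rate_mib} MiB/s of WAL: about {secs:.0f} s to fill the distance")
```

Under this model, higher write throughput shortens the time before data reaches S3, and a mostly idle database waits longest.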


When you say short term, do you mean for hot data, or that the guarantee is short term? As in, once it is written to the Safekeepers, is there any chance that the data will disappear?


We keep it there for a short duration, until the changes are confirmed to also be written to S3.

Writing to 3 instances in 3 availability zones is considered persistent enough while also maintaining high performance. It does not provide the 11 9s of durability that S3 has, but 3 availability zones dropping out with loss of all instance-local storage is considered rare enough that we do not think it will impact our availability and durability guarantees.


That makes sense, thank you! Sounds pretty damn robust.


Many distributed systems offer ACID by using distribution + replication for the initial write commit.

It's much faster and cheaper to just have your data on multiple nodes (RAM or local disk) and provides better reliability against crashes. Data can then be compacted and streamed out in an async fashion to more durable storage.
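
A minimal in-memory sketch of that pattern, under stated assumptions: a write counts as committed once a quorum of replicas acknowledges it, and a separate flush step later copies committed records to durable storage and trims them from the replicas. The class names, the quorum size of 2 out of 3, and the list standing in for object storage are all hypothetical, and this is not how any particular system implements it.

```python
from collections import deque


class Replica:
    """Stand-in for a node that keeps recent records on local disk."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.log: list[str] = []

    def append(self, record: str) -> bool:
        # An unhealthy replica does not store or acknowledge the record.
        if not self.healthy:
            return False
        self.log.append(record)
        return True


class ReplicatedLog:
    def __init__(self, replicas: list[Replica], quorum: int):
        self.replicas = replicas
        self.quorum = quorum
        self.pending: deque[str] = deque()  # committed, not yet flushed
        self.durable_store: list[str] = []  # stand-in for object storage

    def commit(self, record: str) -> bool:
        """Return True once enough replicas have acknowledged the record."""
        acks = sum(1 for r in self.replicas if r.append(record))
        if acks < self.quorum:
            # A real system would also have to reconcile partial writes.
            return False
        self.pending.append(record)
        return True

    def flush(self) -> int:
        """Copy committed records to durable storage, then trim replicas."""
        flushed = list(self.pending)
        self.pending.clear()
        self.durable_store.extend(flushed)
        done = set(flushed)
        for r in self.replicas:
            r.log = [rec for rec in r.log if rec not in done]
        return len(flushed)


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c", healthy=False)]
    log = ReplicatedLog(nodes, quorum=2)
    print("committed:", log.commit("INSERT 1"))  # two healthy replicas ack
    print("flushed records:", log.flush())
```

The client-visible latency here is only the `commit` step; `flush` can run in the background and be batched, which is where the speed and cost benefit comes from.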



