This feels, intuitively, like it would be very hard to make crash consistent (given the durable caching layer in between the client and S3). How are you approaching that?
It depends on what you mean by crash-consistent. I would expect that we handle crash consistency at the client fine (since it is the same crash consistency as NFSv3) and crash consistency at the server also fine (since we can use etags to detect which version of an object is in the backing data storage). Tell me a bit more about what you're thinking.
For sure! Upon reflection, maybe I’m less curious about crash consistency (corruption or whatever) per se, and more about what kinds of durability guarantees I can expect in the presence of a crash.
I’m specifically interested in how you’re handling synchronization between the NFS layer and S3 wrt fsync. The description says that data is “asynchronously” written back out to S3. That implies to me that it’s possible for something like this to happen:
1. I write to a file and fsync it
2. Your NFS layer makes the file durable and returns
3. Your NFS layer crashes (oh no, the intern merged some bad terraform!) before it writes back to S3
4. I go to read the file from S3… and it’s not there!
Is that possible? I.e., is the only way to get a consistent view of the data by reading “through” the NFS layer, even if I fsync?
So, the step that differs from your concern is Step 3. Let's say that we have a catastrophic availability scenario (as you said, intern comes in and tears down something) -- our job is to make sure that the data in our durable cache remains there (and to put safeguards in place to prevent the intern from hitting that data). If we do that, then any crash of our system will get the data back and be able to apply it to S3. I know that's kind of hand-wavy, but this is how things like AWS S3 work -- just having a super high bar for processes around operations to keep data safe.
For some reason, I don't see a "reply" button to your later comment (maybe there's an HN threading limit), but the answer is yes -- fsync guarantees durability in the Regatta durable cache, not in S3.
Gotcha! Thanks for the answer; so the tl;dr is, if I’m understanding:
“All fsync-ed writes will eventually make it to S3, but fsync successfully returning only guarantees that writes are durable in our NFS caching layer, not in the S3 layer”?
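To make that concrete for myself, here is a toy model of the guarantee as I understand it. It is only a sketch: the class and method names are made up, plain dicts stand in for the durable cache and for S3, and none of it reflects how Regatta is actually implemented.

```python
class ToyWriteBackCache:
    """Hypothetical model: fsync persists to a durable cache; S3 is updated later."""

    def __init__(self, durable_cache, s3):
        self.durable_cache = durable_cache  # survives a crash (assumption of the model)
        self.s3 = s3                        # stands in for the backing bucket
        self.memory = {}                    # volatile; lost when the process crashes

    def write(self, path, data):
        # A plain write only lands in volatile memory.
        self.memory[path] = data

    def fsync(self, path):
        # fsync returns once the data is in the durable cache, NOT in S3.
        if path in self.memory:
            self.durable_cache[path] = self.memory.pop(path)

    def flush_to_s3(self):
        # Asynchronous write-back: apply every pending entry, then drop it.
        for path in list(self.durable_cache):
            self.s3[path] = self.durable_cache[path]
            del self.durable_cache[path]


durable_cache, s3 = {}, {}
cache = ToyWriteBackCache(durable_cache, s3)

cache.write("synced.txt", b"hello")
cache.fsync("synced.txt")
cache.write("unsynced.txt", b"never fsynced")

# Step 4 of my scenario: reading S3 directly before write-back has happened.
visible_before_flush = "synced.txt" in s3

# Simulated crash: the process (and its volatile memory) is gone,
# but the durable cache contents are still there for the new process.
cache = ToyWriteBackCache(durable_cache, s3)
cache.flush_to_s3()

visible_after_recovery = s3.get("synced.txt")
unsynced_survived = "unsynced.txt" in s3
```

In this model the fsync-ed file is missing from S3 until the write-back runs, but it does get there after recovery, while the write that was never fsync-ed is simply lost. The whole guarantee rests on the durable cache surviving the crash, which is the part described above as an operational commitment.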