
Reflections on S3's architectural flaws - JBiserkov
https://medium.com/@jim_dowling/reflections-on-s3s-architectural-flaws-71f14c05a5fa#.4dcfd3z6l
======
cle
S3's eventual consistency is easy to understand and hard to digest. It's way
too easy to create data quality nightmares by overwriting, deleting, or
relying on prefix listing. I've personally seen separate major data quality
issues caused by each of those.

If you care at all about deterministic data quality, never overwrite anything
in S3, and never rely on object listing. Read-after-new-write _is_ consistent
in S3. If you need to list objects but don't want to sync additional metadata
in e.g. DynamoDB, then a common pattern is to write a file listing
("manifest") in another S3 object.
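
To make the manifest pattern concrete, here is a minimal sketch. It uses an in-memory stand-in for the object store so that it runs anywhere; `InMemoryStore`, `put_new`, `write_batch`, `read_batch`, and the key layout are all hypothetical names for illustration, not part of any S3 SDK. The point is only that every key is written exactly once and readers find objects through the manifest rather than by listing a prefix.

```python
import hashlib
import json


class InMemoryStore:
    """Hypothetical stand-in for an object store: write-new and get-by-key only."""

    def __init__(self):
        self._objects = {}

    def put_new(self, key, data):
        # Refuse overwrites, so each key is written exactly once.
        if key in self._objects:
            raise KeyError(f"refusing to overwrite {key!r}")
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def write_batch(store, batch_id, files):
    """Write each file under a content-derived key, then a manifest naming them."""
    keys = []
    for name, data in sorted(files.items()):
        digest = hashlib.sha256(data).hexdigest()[:16]
        key = f"data/{batch_id}/{digest}-{name}"
        store.put_new(key, data)
        keys.append(key)
    # The manifest is itself a brand-new object, written after the data.
    manifest_key = f"manifests/{batch_id}.json"
    store.put_new(manifest_key, json.dumps({"keys": keys}).encode())
    return manifest_key


def read_batch(store, manifest_key):
    """Read the manifest, then fetch every key it names; no listing involved."""
    manifest = json.loads(store.get(manifest_key))
    return [store.get(key) for key in manifest["keys"]]
```

A reader that is handed the manifest key only ever issues reads of keys that were written once, which is the case described above as consistent. How the reader learns the manifest key (a queue message, a job parameter) is left open here.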

~~~
majewsky
The same guarantees also apply to OpenStack Swift, which is basically
S3-on-your-own-cloud. Since more and more internal teams are now using the Swift
clusters that I administer at $work, I can see a lot of people struggling with
the concept of temporary inconsistency and the practical ramifications
thereof.

Last year, I spent two full months hunting temporary inconsistency bugs in the
Docker Registry's Swift driver. I added several heuristics that reduced the
frequency of inconsistency-induced errors by about two orders of magnitude,
but in the end, we still ended up switching to Quay, which has gotten the memo
that you mentioned: "never overwrite anything, and never rely on object
listing". They put metadata in an RDBMS instead, just like s3mper, which was
mentioned in the submission.
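
For anyone wondering what "metadata in an RDBMS" looks like in the small, here is a rough sketch using SQLite from the Python standard library. This is not Quay's or s3mper's actual schema, which I am not reproducing here; the table layout and the function names `open_index`, `record_object`, and `list_prefix` are assumptions made up for illustration. The idea is that the database, not the object store's listing, is the source of truth for which keys exist.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the hypothetical metadata index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " key TEXT PRIMARY KEY,"
        " prefix TEXT NOT NULL,"
        " size INTEGER NOT NULL)"
    )
    return conn


def record_object(conn, key, size):
    """Record a key after its upload succeeded; a duplicate key raises IntegrityError."""
    prefix = key.rsplit("/", 1)[0] if "/" in key else ""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO objects (key, prefix, size) VALUES (?, ?, ?)",
            (key, prefix, size),
        )


def list_prefix(conn, prefix):
    """Answer 'what is under this prefix?' from the database, not the object store."""
    rows = conn.execute(
        "SELECT key FROM objects WHERE prefix = ? ORDER BY key", (prefix,)
    )
    return [row[0] for row in rows]
```

The primary key on `key` doubles as a guard against overwrites: a second insert for the same key fails instead of silently replacing the row.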

