This looks awesome. I looked through the docs and didn't see much information on what happens in failure cases. If one node that is storing a replica of my data goes down how long until the data is replicated to a new node?
It depends on what "goes down" means. The vast, vast majority of such issues are transient: the node panics or loses power and comes back up. There's no impact to data availability or durability unless all nodes storing a copy of the same object are down at the same time.
It's a good question. Sounds like we should document or write up a blog post about the gory details.
That would be really useful. Along the same lines, one of the blog posts said that you chose to use strong consistency (CP in CAP). In the event of a failure, how many replicas (out of N) need to be down before the API returns 500s for reads or write (if they are different).
For a read to fail as a result of storage node failure, all replicas would have to be down. You may see increased latency on successful reads if some but not all of the replicas are down.
Recall that writes always create new objects. The storage nodes are selected dynamically for each write, so in practice that won't fail for storage node failure.
The more likely way to get a 500 from the data path is if parts of the metadata tier become unavailable. As with storage nodes, such failures are typically transient. They're also unlikely to affect reads. Writes may experience transient errors as the metadata tier recovers from failures.