Amazon AWS Outage Shows Data in the Cloud Is Not Always Safe

oxymoron · on Sept 6, 2019

It seems awfully odd to write a full article about this without mentioning that S3 has completely different guarantees due to the cross-AZ design. I tend to think that you probably can trust the cloud from a data integrity standpoint, but I would never feel safe depending on EBS.

skywhopper · on Sept 7, 2019

This is a terrible article. EBS can fail. The article even has a screenshot where Amazon is clear about the reliability ratings. If you want durable storage, S3 is the best option. Replicate it across regions for even more protection.
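A minimal sketch of the cross-region replication suggested above, using boto3's `put_bucket_versioning` and `put_bucket_replication`. The bucket names, IAM role ARN, and rule ID are hypothetical placeholders. The role must already exist and allow S3 to write to the destination. The configuration shown is one reasonable shape, not the only valid one.

```python
"""Sketch of S3 cross-region replication (CRR) setup with boto3.

All bucket names, the role ARN, and the rule ID are hypothetical placeholders.
"""


def build_replication_config(role_arn: str, dest_bucket: str) -> dict:
    """Return a ReplicationConfiguration that copies new objects in the
    source bucket to dest_bucket, which should live in a different region."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to write the replicas
        "Rules": [
            {
                "ID": "replicate-all",  # hypothetical rule name
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


def enable_replication(src_s3, dst_s3, source_bucket: str, dest_bucket: str,
                       role_arn: str) -> None:
    """Apply the configuration. src_s3 and dst_s3 are boto3 S3 clients for the
    source and destination regions. CRR requires versioning on both buckets."""
    versioning = {"Status": "Enabled"}
    dst_s3.put_bucket_versioning(Bucket=dest_bucket,
                                 VersioningConfiguration=versioning)
    src_s3.put_bucket_versioning(Bucket=source_bucket,
                                 VersioningConfiguration=versioning)
    src_s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_config(role_arn, dest_bucket),
    )
```

A typical call would pass `boto3.client("s3", region_name="us-east-1")` and `boto3.client("s3", region_name="us-west-2")` as the two clients. By default, a replication rule applies only to objects written after it is in place, so existing data needs a separate one-time copy.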

EBS is for storage volumes attached to virtual machines. There are many, many use cases in which a slightly higher failure rate is absolutely fine and for which the lower cost and higher performance of EBS are the more desirable tradeoff. But again, it’s clearly stated in all the docs that EBS volumes have a 0.1% to 0.2% annual failure rate. There are trivially easy tools built into AWS for backing up EBS volumes. If the data on your EBS volumes is critical, then use those tools!
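As an illustration of how little it takes, here is a minimal sketch of a one-off EBS backup with boto3's `create_snapshot` and the `snapshot_completed` waiter. The function name and description format are hypothetical; for scheduled backups, AWS also offers Amazon Data Lifecycle Manager and AWS Backup.

```python
"""Sketch: back up an EBS volume by taking a snapshot with boto3."""
from datetime import datetime, timezone


def snapshot_volume(ec2, volume_id: str, wait: bool = True) -> str:
    """Create a snapshot of volume_id and return the new snapshot ID.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2", region_name="us-east-1").
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    resp = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"backup of {volume_id} at {stamp}",  # hypothetical format
    )
    snapshot_id = resp["SnapshotId"]
    if wait:
        # Block until AWS reports the snapshot as completed.
        ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id
```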

chousuke · on Sept 7, 2019

Is that 0.1% for non-snapshotted data? I'm under the impression that if you snapshot your data, EBS can recover some failures from the S3-backed snapshot transparently, increasing durability.

I obviously don't know the technical details, but it seems plausible to me that with a proper implementation snapshots could provide additional durability by reducing the amount of low-redundancy data that is "in danger" during disk failure recovery.

SteveNuts · on Sept 6, 2019

Data on-prem is not always safe either. We've had irrecoverable storage failures from well-known storage providers... That's why backups exist.

booi · on Sept 8, 2019

Arguably, most on-prem infrastructure is more susceptible to failure than cloud data centers.

johnmarcus · on Sept 6, 2019

That advertised 0.1% to 0.2% data loss rate has to happen to someone; statistics do not lie.
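To put the point in numbers, here is a quick calculation for a hypothetical fleet of 1,000 volumes. It assumes every volume fails independently at the same annual rate, which real fleets only approximate. Under those assumptions, the chance that at least one volume fails in a year is about 63% at 0.1% and about 86% at 0.2%.

```python
def expected_failures(volumes: int, afr: float) -> float:
    """Expected number of volume failures per year across a fleet."""
    return volumes * afr


def prob_at_least_one_failure(volumes: int, afr: float) -> float:
    """Chance that at least one volume fails in a year, assuming failures
    are independent and every volume has the same annual failure rate."""
    return 1.0 - (1.0 - afr) ** volumes


if __name__ == "__main__":
    fleet = 1000  # hypothetical fleet size
    for afr in (0.001, 0.002):
        print(f"AFR {afr:.1%}: {expected_failures(fleet, afr):.1f} expected failures/year, "
              f"P(at least one) = {prob_at_least_one_failure(fleet, afr):.0%}")
```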

borramakot · on Sept 6, 2019

Is that for S3, or EBS?

viraptor · on Sept 7, 2019

EBS. S3 is much better:

> Amazon S3 Standard, S3 Standard–IA, S3 One Zone-IA, and S3 Glacier are all designed to provide 99.999999999% durability of objects over a given year.
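For a sense of scale, "11 nines" means a per-object annual loss probability of 1e-11. The sketch below treats objects independently and reproduces the illustration AWS gives in its S3 FAQ: with 10,000,000 objects stored, you can expect to lose a single object about once every 10,000 years on average. Keep in mind that this is a design target rather than a contractual guarantee.

```python
# Per-object annual loss probability implied by 99.999999999% ("11 nines")
# durability, written directly to avoid floating-point error in 1 - 0.99999999999.
ANNUAL_LOSS_PROB = 1e-11


def expected_objects_lost_per_year(num_objects: int) -> float:
    """Expected losses per year, treating each object independently."""
    return num_objects * ANNUAL_LOSS_PROB


def years_per_lost_object(num_objects: int) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_objects_lost_per_year(num_objects)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects: one lost object every "
          f"{years_per_lost_object(n):,.0f} years on average")
```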

tj-teej · on Sept 6, 2019

Data in one geographic region is not always safe, period.

Cloud aside, if you store data on bare metal in one location (or in boxes of tapes) and that location burns down, it's gone.

If you have mission-critical data to store, it should be spread across regions, if not across cloud providers.
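The reasoning behind spreading copies out can be sketched as follows. If each site loses the data independently with probability p per year, then losing all n copies has probability p^n. The 1% per-site figure below is purely hypothetical. The independence assumption is the weak point: copies that share a provider, account, or control plane can fail together, which is the argument for also spreading across providers.

```python
def prob_all_copies_lost(p_site_loss: float, copies: int) -> float:
    """Probability that every copy is lost in the same year, assuming each
    site fails independently with the same yearly probability."""
    return p_site_loss ** copies


if __name__ == "__main__":
    p = 0.01  # hypothetical: 1% yearly chance of losing a single site
    for copies in (1, 2, 3):
        print(f"{copies} site(s): {prob_all_copies_lost(p, copies):.0e}")
```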

david-cako · on Sept 8, 2019

AWS runs on servers, electricity, and drives like the rest of the internet.

If your application and data are single-AZ it is not fault tolerant. If that datacenter goes down or the EBS volume fails, at the very least your application will have downtime.

EBS is replicated within the AZ, but that does not guarantee fault tolerance. You must take snapshots, which are stored in S3 across AZs like any other mission-critical S3 object.
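Snapshots matter for AZ failures because a snapshot can be restored into a different AZ in the same region. Here is a minimal sketch using boto3's `create_volume` and the `volume_available` waiter. The function name, snapshot ID, and AZ are hypothetical, and the new volume still has to be attached to an instance in that AZ (for example with `attach_volume`).

```python
def restore_snapshot_to_az(ec2, snapshot_id: str, target_az: str,
                           volume_type: str = "gp2") -> str:
    """Create a new EBS volume from snapshot_id in target_az and wait until
    it is available. `ec2` is a boto3 EC2 client for the snapshot's region."""
    resp = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=target_az,  # may differ from the original volume's AZ
        VolumeType=volume_type,
    )
    volume_id = resp["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    return volume_id
```

Volumes restored from snapshots fetch their blocks from S3 lazily, so the first reads after a restore can be noticeably slower than normal.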

mr_toad · on Sept 7, 2019

Man runs snowflake server in the cloud, and is surprised when it fails.

To be blunt: if you treat your cloud instances as anything other than disposable and ephemeral, you’re doing it wrong.

chousuke · on Sept 7, 2019

It's a good reminder anyway. There are lots of people involved with tech who genuinely don't seem to understand that things fail, even in "the cloud".

As for pet servers in the cloud, not having them is the ideal, but it's far from always realistic. It's not "wrong" to run a pet server in the cloud if it's the only reasonable option given all the other constraints involved.

A real failure-resistant system is often expensive, and sometimes simply not worth the investment.
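One way to frame "not worth the investment" is a simple risk-neutral comparison: redundancy pays off only when its yearly cost is below the expected loss it removes. All figures here are hypothetical estimates. The comparison ignores the fact that some losses are unacceptable regardless of how unlikely they are.

```python
def redundancy_pays_off(annual_failure_prob: float, cost_of_failure: float,
                        annual_cost_of_redundancy: float,
                        residual_failure_prob: float = 0.0) -> bool:
    """Risk-neutral comparison: does redundancy cost less than the expected
    loss it removes? All inputs are the caller's own estimates."""
    expected_loss_without = annual_failure_prob * cost_of_failure
    expected_loss_with = (residual_failure_prob * cost_of_failure
                          + annual_cost_of_redundancy)
    return expected_loss_with < expected_loss_without


if __name__ == "__main__":
    # Hypothetical numbers: a 0.2% yearly failure chance and $1,200/year of redundancy.
    print(redundancy_pays_off(0.002, 20_000, 1_200))     # cheap to rebuild -> False
    print(redundancy_pays_off(0.002, 5_000_000, 1_200))  # costly outage -> True
```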

fulafel · on Sept 7, 2019

There's nothing wrong with running a manually administered singleton box, any more than there is with running your laptop that way. Yes, of course you should have backups in both cases, but fully automating the infrastructure of every experiment is likely work that will not pay off.