

Why Amazon is not to blame for your AWS downtime - jakeludington
http://www.lockergnome.com/it/2011/04/23/the-aws-outage-how-to-avoid-downtime-in-the-cloud/

======
nostromo
"The way Amazon AWS works, you have the ability to easily replicate your
environment across multiple availability zones to prevent the kind of
catastrophic downtime we saw this week."

Yes, that's what everyone was told. What the author doesn't realize is that
several availability zones went down. I'm not sure the author is actually that
familiar with the outage given this non-trivial mistake.

~~~
jakeludington
The author (me) is aware that multiple availability zones did in fact go down.
Depending on your level of willingness to accept downtime, configuring an app
to be in US East, California, Ireland, Singapore, and Tokyo (or some
combination of those 5 regions, beyond being in several parts of US East)
would have avoided an outage completely.

~~~
shubber
What I'm curious about is how to reliably transit service across Regions. ELBs
live in a particular Region, so they don't help. You could certainly register
multiple ELBs to the same CNAME and do a limited round robin that way.

But, even if you manually pulled the record for a downed region, you can't
rely on downstream DNS to respect your TTLs.

I suppose you'd at least be able to say something like "some of our users" as
opposed to "we're completely down." That seems like cold comfort, though.
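
A rough sketch of that "pull the record for a downed region" idea, to make the tradeoff concrete. Everything here is an assumption for illustration: the `RegionEndpoint` type, the region labels, the ELB hostnames, and the `is_healthy` callback are all hypothetical, and nothing below talks to a real DNS provider. It only decides which regional ELBs should stay in the round-robin record, and estimates what share of traffic could still be pointed at a dead region while downstream resolvers ignore the TTL.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class RegionEndpoint:
    region: str     # hypothetical region label, e.g. "us-east"
    elb_cname: str  # hypothetical hostname of the ELB in that region


def healthy_record_set(
    endpoints: Iterable[RegionEndpoint],
    is_healthy: Callable[[RegionEndpoint], bool],
) -> List[RegionEndpoint]:
    """Return the endpoints that should stay in the round-robin record."""
    all_endpoints = list(endpoints)
    healthy = [e for e in all_endpoints if is_healthy(e)]
    # If every region looks down, keep publishing all of them rather than
    # an empty record; the health check itself may be what is broken.
    return healthy if healthy else all_endpoints


def stale_traffic_share(
    endpoints: Iterable[RegionEndpoint],
    down_regions: Set[str],
) -> float:
    """Share of evenly spread round-robin traffic that still targets a downed
    region for as long as resolvers keep serving the old record."""
    all_endpoints = list(endpoints)
    if not all_endpoints:
        return 0.0
    down = sum(1 for e in all_endpoints if e.region in down_regions)
    return down / len(all_endpoints)


if __name__ == "__main__":
    endpoints = [
        RegionEndpoint("us-east", "elb-east.example.com"),
        RegionEndpoint("us-west", "elb-west.example.com"),
        RegionEndpoint("eu-west", "elb-eu.example.com"),
    ]
    down = {"us-east"}
    keep = healthy_record_set(endpoints, lambda e: e.region not in down)
    print([e.elb_cname for e in keep])
    print(stale_traffic_share(endpoints, down))
```

With three regions and one down, roughly a third of clients holding the old record would keep hitting the dead ELB until their resolver refreshes, which is the "some of our users" case rather than "completely down."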

------
orijing
I wouldn't say that the linkbait title is completely wrong, but the author is
misguided about the severity and scope of the Amazon downtime.

If you read Quora's account [1], it's clear that this was regionwide, not just
limited to a single availability zone. This debunks the misguided
understanding that simply having slave instances in different availability
zones will be enough for ensuring availability.

While that's what you would have expected from Amazon (and the name
"Availability Zones"), that's not what happened.

The issue wasn't necessarily that Amazon had downtimes--you'd be hard-pressed
to find a provider that has never been down. The issue was the scope and
duration of the downtime that demonstrated that availability zones mean little
in terms of service redundancy.

[1] http://www.quora.com/Quora-Outage-April-21-22-2011/What-caused-the-Quora-problems-outage-in-April-2011

~~~
jakeludington
In the article I improperly used "availability zones" to cover both the AWS AZ
construct and the AWS region construct. My point was specifically that by
building your app either to run in US East, CA, Ireland, Singapore, and Tokyo
or to fail over to one of those locations, you can avoid a situation where
you've put all your eggs in one basket.

~~~
Xorlev
Seems like controversy for controversy's sake. Sure, by mirroring to another
region completely, you'd avoid an EBS downtime, but the same could be said for
having another provider too. The idea of AWS is that you pay less, and pay only
for what you use, with the understanding that you can mirror to another AZ in
moments to recover from issues.

