

Amazon EC2 Outage Takes Down Foursquare, Instagram, Quora, Reddit, Etc - Vexenon
http://techcrunch.com/2011/08/08/amazon-ec2-outage/

======
fourspace
Maybe I'm oversimplifying things, but why haven't these companies distributed
their compute resources across various facilities and cloud providers, enabled
instant failover, and tested this before outages like these?

~~~
smanek
It costs engineering time to do so. Time that could otherwise be used to build
features, better protect against more common failures, attract users, etc.

Amazon probably has ~5hrs/year of complete failure of a region. Figure,
conservatively, it would take 3 months of engineering time to protect against
that, plus a 'continuing' cost of 1/2 a week per month to maintain that
protection. You'd also have to (at least) double your provisioned capacity
(which may include a larger ops team, etc). Assuming your servers cost
$20k/month and devs cost $100/hr (both fully loaded), we're talking about
~$340,000 to prevent 5 hours of downtime (just for the first year).

If downtime costs you more than $50K/hr, then it might make sense to be that
fault tolerant. Otherwise, there might be better places for a startup to spend
its (limited) resources.
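
A rough sketch of that arithmetic, for anyone who wants to plug in their own numbers. The rates, server cost, and downtime hours are the ones quoted above; the 160-hour work month, 40-hour work week, and the function names are assumptions made only for illustration.

```python
# Back-of-the-envelope first-year cost of protecting against a region outage.
# Values marked "quoted" come from the comment above; the rest are assumptions.

DEV_RATE = 100           # $/hr, fully loaded (quoted)
SERVER_COST = 20_000     # $/month, fully loaded (quoted)
DOWNTIME_HOURS = 5       # hrs/year of complete region failure (quoted)

HOURS_PER_MONTH = 160    # assumption: 4 weeks x 40 hrs
HOURS_PER_WEEK = 40      # assumption
MONTHS = 12


def first_year_cost():
    """Sum the build, maintenance, and doubled-capacity costs for one year."""
    build = 3 * HOURS_PER_MONTH * DEV_RATE                 # 3 months of engineering
    maintain = MONTHS * (HOURS_PER_WEEK // 2) * DEV_RATE   # half a week per month
    extra_capacity = MONTHS * SERVER_COST                  # doubling provisioned servers
    return build + maintain + extra_capacity


def break_even_per_hour(total):
    """Hourly downtime cost at which the protection pays for itself."""
    return total / DOWNTIME_HOURS


if __name__ == "__main__":
    total = first_year_cost()
    print(f"first-year cost: ${total:,.0f}")
    print(f"break-even downtime cost: ${break_even_per_hour(total):,.0f}/hr")
```

Under these assumptions the total is $312,000, or about $62,400 per hour of downtime avoided. That is the same ballpark as the ~$340,000 above; the difference depends on how many working hours you count per month and on ops costs that aren't itemized here.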

~~~
cageface
Not to mention that it's very easy to _increase_ overall downtime by
introducing all the extra complexity this kind of redundancy can bring.

------
lars512
It's a testament to how successful Amazon has been in its cloud offering.
We're used to sites going down for one reason or another. What's weird is that
Amazon's success has made all these failures so correlated. It's a strange
feeling when many sites you like all fail at once.

------
stevenp
My t1.micro instance in us-east-1b seems to be up and running just fine as far
as I can tell.

------
bane
Well there goes all the parts of the Internet I'm interested in. Time to go
read a book.

------
dbuizert
Someone said business continuity? It can be costly, but could save your
business. Stop saving that VC money and start saving your business.

------
i386
And here I was thinking the change I just rolled out to our EC2 instances had
boned our test environment. Two failures in a week? That does not really
inspire confidence right now :(

------
fizx
It's back now.

------
sibsibsib
reddit is currently working for me.

------
scrod
LOL, the cloud.

------
thechut
This may be a stretch... but does this have anything to do with the Verizon
line workers' strike?

------
protagonist_h
We were thinking of migrating our service to EC2 from our dedicated SoftLayer
server. Now we will probably stick with our current setup.

------
Vexenon
My reaction: shocking (but not really).

