
Numerous service errors on AWS today - lenova
http://status.aws.amazon.com/?Mar30
======
bmurphy1976
We've weathered the last few outages very well, but we got hit badly by this
one.

This is what I believe happened to us: We had a lot of Spot instances. After
about two hours of not being able to spin up instances, the Spot market went
screwy and prices shot up on multiple instance types across multiple
availability zones at the same time.

This screwed us. Many of our redundant services were deployed to multiple spot
instances across multiple availability zones and most of them were terminated
at the same time. Afterwards, we weren't able to spin up replacements and had
to scramble to move our key services onto unrelated instances that were
spared.

Lesson learned: multiple availability zones aren't enough to protect you from
wild spot price fluctuations, and an AWS outage can aggravate spot prices
pretty dramatically.
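
A minimal sketch of the kind of check that might have flagged this ahead of time, assuming a hypothetical inventory of (service, availability zone, lifecycle) records. The record shape, the `services_at_risk` name, and the `min_on_demand` threshold are all made up for illustration and are not part of any AWS API. It simply lists services whose capacity sits entirely (or almost entirely) on spot, no matter how many zones they span.

```python
from collections import defaultdict


def services_at_risk(instances, min_on_demand=1):
    """Return services with fewer than `min_on_demand` non-spot instances.

    `instances` is an iterable of (service, az, lifecycle) tuples, where
    lifecycle is "spot" or "on-demand". This record shape is hypothetical.
    """
    on_demand = defaultdict(int)
    seen = set()
    for service, _az, lifecycle in instances:
        seen.add(service)
        if lifecycle == "on-demand":
            on_demand[service] += 1
    # Spreading across zones does not help if every copy is spot,
    # so the zone is deliberately ignored here.
    return sorted(s for s in seen if on_demand[s] < min_on_demand)


# Hypothetical fleet: "queue" is spread over two zones but is all spot.
fleet = [
    ("queue", "us-east-1a", "spot"),
    ("queue", "us-east-1b", "spot"),
    ("api", "us-east-1a", "spot"),
    ("api", "us-east-1c", "on-demand"),
]
print(services_at_risk(fleet))  # ['queue']
```

The point of ignoring the zone is the lesson above: a simultaneous price spike can terminate spot instances in several zones at once, so zone spread alone says little about whether a service survives.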

Also, as a final note, I feel this outage was severe enough to warrant
something other than a green "everything's OK!!!" icon.

~~~
yeukhon
You are a brave soul to use Spot instances for key services. Just curious, what
kind of workload do you run on the spot instances? Fact: spot instances are
usually cheaper and more stable in less dense regions outside the U.S., if you
don't care about latency.

~~~
bmurphy1976
We manage hundreds of servers and have been migrating more and more to spot
instances in order to save on costs. It was only a portion of our
infrastructure, but it was just enough to cause us grief. Given the volume of
servers and the velocity with which our software has been changing, I'll
readily admit it's been a challenge keeping ahead of everything, and we
definitely went a little too far with the spot instances in a few places. We
just hadn't seen this failure mode before, so we got overconfident.

------
zimbu668
I love how I haven't been able to launch a new instance for hours and the
status page is still all green check marks. Oh, a few of them have an "i"
subscript.

~~~
nikolay
Amazon often hides issues behind a green icon, and you find out about them post
factum. This wastes so much energy; they should be open and report in real
time.

------
i_have_to_speak
Love how Amazon pushes multi-AZ redundancy when it's always an entire region
that gets screwed up. Oh, and us-east-1 again? Seriously?

------
setheron
When I worked there, I always thought the concept of "green eye" was odd, and
how every service can define it themselves. It's one of the few political
things I remember from my time there.

