
Amazon EC2 down? - vini
http://status.aws.amazon.com/?rf
======
jswanson
Newest update: 10:29 PM PDT We can confirm a portion of a single Availability
Zone in the US-EAST-1 Region lost power. We are actively restoring power to
the affected EC2 instances and EBS volumes. We are continuing to see increased
API errors. Customers might see increased errors trying to launch new
instances in the Region.

Source: <http://status.aws.amazon.com/?rf> Or:
<http://status.aws.amazon.com/rss/ec2-us-east-1.rss>

~~~
reustle
They keep saying it happened in a single availability zone when I saw frantic
tweets from people in 1A, 1C and 1D.

~~~
adamlindsay
Everyone's A zone is different. So I could say A is down while someone else is
saying B, and we could be talking about the exact same zone. It makes it
difficult to say if it is more widespread or not.

~~~
datr
Maybe I'm assuming wrong but I guess in your example zone A and B are the same
and that the different zone names users see don't represent different ways of
spreading resources. If so, why aren't they named consistently? If not, are
there any details on how and why they've set up their zones (or am I
overlooking another assumption I've made?)

~~~
saurik
Due to a quirk of both human nature and copy/paste example code, if the names
of the availability zones were mapped the same for all users then 90% of
requests would be to the zone named "A". To make certain that the zones get
even usage, as they bring on new customers, they change the A-D order of the
availability zones as seen by these new accounts.
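
A minimal sketch of that idea, for illustration only. How Amazon actually assigns the order is not described here; the hash-seeded shuffle, the `zone_mapping` function, and the physical zone identifiers below are all assumptions.

```python
import hashlib
import random

# Hypothetical physical zone identifiers (assumption, not real AWS names).
PHYSICAL_ZONES = ["pz-1", "pz-2", "pz-3", "pz-4"]
LETTERS = ["a", "b", "c", "d"]


def zone_mapping(account_id):
    """Return a stable, account-specific mapping of zone letter -> physical zone.

    The account id seeds a shuffle, so the same account always sees the same
    order while different accounts see different orders.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    shuffled = list(PHYSICAL_ZONES)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(LETTERS, shuffled))


if __name__ == "__main__":
    # Two accounts may report an outage in differently lettered zones
    # while talking about the same physical zone.
    for account in ("account-one", "account-two"):
        print(account, zone_mapping(account))
```

With a mapping like this, everyone who copy/pastes "zone a" from example code still ends up spread across the physical zones.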

------
chintan
We learnt our lesson the hard way after the great AWScalypse of Apr 2011.

The lesson: Use n>1 hosting companies (even if one of them promises
a-z-multiregion-distributed-fault-tolerant-back-up)

~~~
barkingtoad
My co-location provider's data center has had about four hours of downtime in
ten years. It turns out a backup diesel generator, redundant connectivity, and
a good network/sysadmin are all anyone needs.

People are now working out how to failover from AWS to Rackspace and that is
infuriating to me. You... you need redundant clouds now? That can't be right!
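
For scale, "about four hours of downtime in ten years" can be turned into an availability figure. This is a rough sketch: it assumes the four hours are total downtime over the whole period, and the `availability` helper is just an illustrative name.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length, including leap years


def availability(downtime_hours, years):
    """Fraction of time the service was up over the given number of years."""
    total_hours = years * HOURS_PER_YEAR
    return 1 - downtime_hours / total_hours


if __name__ == "__main__":
    a = availability(4, 10)
    # Works out to a bit better than "four nines" (99.99%).
    print("availability: {:.3%}".format(a))
    print("average downtime per year: {:.0f} minutes".format(4 * 60 / 10))
```

That is roughly 99.995% uptime, or about 24 minutes of downtime per year on average.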

~~~
donavanm
Why do you assume amazon doesn't have " a backup diesel generator, redundant
connectivity, and a good network/sysadmin"? And if they do have that why is
your situation different?

~~~
barkingtoad
You tell me, man. I guess the difference is I can install many times more
server horsepower for the $500/month rental of a half rack at my colo, and the
result will be better uptime than EC2. I just can't install them immediately.

------
derekclapham
N. Virginia in my experience is by far the least reliable region on EC2/EBS...
Fortunately our app servers are across 2 zones in the region... but our db
server is just a lone master... Our slave is down... Very nervous.

~~~
WALoeIII
I run a master with a "hot" master each in a different AZ and slaves of each
in their respective AZs for days like today. Expensive, but makes it easy to
sleep at night.

The slaves have their EBS disks snapshotted every 30 minutes, the master every
24 hours.

~~~
derekclapham
Yeah... I snapshot our slaves every 30 minutes.. When you say "hot" master
what exactly do you mean?

~~~
WALoeIII
A "full-spec" machine, another X-Large with 4 EBS volumes that I can fail over
to. It's in circular replication with the other "active" master (only one is
receiving writes at a time). These instances are only snapshotted once a day
to keep them as fast as possible.
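
A rough sketch of the bookkeeping this setup implies, not the poster's actual tooling. The intervals come from the comments above (slaves every 30 minutes, masters every 24 hours); the function names, host names, and health-check input are hypothetical.

```python
from datetime import datetime, timedelta

# Snapshot intervals as described in the thread.
SNAPSHOT_INTERVAL = {
    "slave": timedelta(minutes=30),
    "master": timedelta(hours=24),
}


def snapshot_due(role, last_snapshot, now):
    """True if a host with this role is due for another EBS snapshot."""
    return now - last_snapshot >= SNAPSHOT_INTERVAL[role]


def pick_write_master(masters, healthy):
    """Return the first healthy master in preference order, or None.

    Only one master receives writes at a time; the other stays "hot" in
    circular replication until it is needed.
    """
    for name in masters:
        if healthy.get(name, False):
            return name
    return None


if __name__ == "__main__":
    now = datetime(2012, 6, 15, 6, 0)
    print(snapshot_due("slave", now - timedelta(minutes=45), now))
    print(snapshot_due("master", now - timedelta(hours=3), now))
    # Hypothetical host names: the active master's AZ has lost power.
    print(pick_write_master(["master-az1", "master-az2"],
                            {"master-az1": False, "master-az2": True}))
```

Keeping the masters on the long interval matches the reasoning above: snapshots cost I/O, so they run mostly on the slaves.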

------
dsirijus
Am I at fault in believing this happens drastically less in Europe
datacenters, for any cloud service?

~~~
yuvadam
I believe that, at least for AWS, the us-east data center is drastically
larger than eu-west.

~~~
malachismith
Larger, older, slower, more fragile. And cheaper.

------
ftwinnovations
Yup, very down. My site is down, and I've been attempting to reboot and
recreate instances like a madman... Why didn't I just check HN first??

~~~
mhartl
For future reference: <http://status.heroku.com/>

------
cardmagic
This is why <http://AppFog.com/> is investing in multiple IaaS and is not
being hit nearly as hard. You can still sign up and even create apps.

~~~
Emouri
On the other hand they don't have any prices listed, and their blog is down
"Error establishing a database connection". This doesn't exactly inspire me to
go to them for hosting.

~~~
Pythondj
and on the other hand, check out the <http://status.appfog.com/> page (hint:
it doesn't exist)

------
monsenhor
Our instances are still down. The AWS service health says: 9:27 PM PDT We continue
to investigate this issue. We can confirm that there is both impact to volumes
and instances in a single AZ in US-EAST-1 Region. We are also experiencing
increased error rates and latencies on the EC2 APIs in the US-EAST-1 Region.

------
elq
Power outage in one of their AZs. Only after lots of machines died did the
generator kick on. Lovely.

~~~
stevefink
Are you sure? I have a graph of one of my Resque boxes taking 30Gbit/s of
inbound traffic. Looks like a DDoS attack to me.

~~~
elq
The update I heard was (essentially) 'Another update from Amazon: Looks like
it was a power issue for one facility that services a particular AZ in
us-east-1, flipped to generator, now back on power and in recovery mode.'

------
ShabbyDoo
So, Amazon has said since the introduction of EC2 that, to ensure really high
uptimes, customers should use multiple availability zones and architect their
applications to survive an outage in a single availability zone. While I would
question Amazon's competence if outages of any sort were overly frequent,
Amazon has not had many at all and no recent cross-AZ ones. [This is correct,
right?] I recognize that architecting applications to be performant across
datacenters (tolerant of relatively high-latency replication) is hard, but Amazon
seems to be a poster child for keeping its promises w.r.t. availability. Is my
take on this incorrect?

------
redditmigrant
I wonder if the power outage here has anything to do with this -
<http://www.dom.com/storm-center/dominion-electric-outage-map.jsp>

------
harryh
FWIW as of approx 0450 UTC we're starting to see various instances that had
become unresponsive return back to service.

------
philip1209
Hootsuite just reported that it is offline - I'm unsure if it is related.

In other news, for once Reddit is working.

------
duwease
Still down here, over 12 hours at this point. This is probably the second time
we've been hit with something on AWS in the last three months -- and you have
to pay them to talk to someone about it. We're definitely moving to Linode
ASAP..

~~~
spartango
If your application needs to be up constantly, then it should probably be at
least multi-AZ scaled, if not multi-Region. Multi-AZ applications are not
affected by this outage, and multi-AZ events are very rare. Living out of a
single AZ is very risky.
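
As a small illustration of the multi-AZ point, here is a sketch that spreads instances round-robin across zones and checks what survives the loss of one zone. The instance names and zone labels are made up, and the code only models placement; it does not call any AWS API.

```python
from itertools import cycle


def spread(instances, zones):
    """Assign instances to zones round-robin; returns {instance: zone}."""
    return {inst: zone for inst, zone in zip(instances, cycle(zones))}


def surviving(placement, failed_zone):
    """Instances still running if every instance in failed_zone is lost."""
    return [inst for inst, zone in placement.items() if zone != failed_zone]


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    instances = ["app-{}".format(i) for i in range(6)]  # hypothetical names

    multi_az = spread(instances, zones)
    single_az = spread(instances, zones[:1])

    print(len(surviving(multi_az, "us-east-1a")), "of 6 survive (multi-AZ)")
    print(len(surviving(single_az, "us-east-1a")), "of 6 survive (single AZ)")
```

The multi-AZ layout only helps if the remaining instances can carry the load and the data tier is replicated across zones as well.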

------
ww520
Hmm, my sites are still up. They are at us-east-1d. Keeping fingers crossed.

------
yuvalo
Still having problems starting a few instances.

We just started a campaign, so I thought there were performance issues with our
application, and it took me a while to look for EC2 issues. Sigh.

------
csmcdermott
Coming back up now...

------
snorkel
It does the same thing as OK.

------
drivebyacct2
Does anyone else find it strange that two Heroku posts made the frontpage
considerably (in relative terms, obviously) earlier than "EC2 down"? I would
think EC2 is a more common denominator for people, but maybe other hosts have
better redundancy and thus there wasn't an immediate awareness?

Or am I just overly curious and it's really just that some Heroku clients
happened to notice before an at-large EC2 customer?

edit: I don't mean to imply a conspiracy of some sort, upon a reread. I merely
am curious if there are just that many Heroku users in particular on HN or
somesuch?

~~~
res0nat0r
It is probably because of the large Heroku outage and post here just the other
day, and people are trying to point out that they are down again as that is
more dramatic than a normal AWS disruption.

