
AWS is down due to an electrical storm in the US - aritraghosh007
http://status.aws.amazon.com
======
zacharyvoase
By what stretch of the imagination is this icon suitable for representing a
total loss of availability due to a power outage?:
<http://status.aws.amazon.com/images/status2.gif>

Is this not a 'service disruption' situation? At the bottom of the page, the
yellow icon is associated with 'performance issues'.

If there's one thing that's shocked me about AWS, it's the total failure to
acknowledge the severity of service disruptions. Like the above case, or the
fact that a 3-hour loss of connectivity is displayed on the service history as
a green tick with a small 'i' box: <http://oi46.tinypic.com/x5qtch.jpg>

~~~
flyt
Or that there is absolutely no way to deep link to an ongoing outage, and
users must reload, then expand the link every single time, or subscribe to an
RSS feed.

AWS needs to blatantly copy Heroku's status system, which is worlds better
for people needing fast updates on their infrastructure.

<https://status.heroku.com/> vs <http://status.aws.amazon.com/>

~~~
droithomme
Wow, that is so beautiful! I am in awe.

~~~
kooshball
The whole Heroku site is phenomenal. I am constantly amazed every time I see
it.

------
paulsutter
AWS is not down. Only US-East. If your app is down, it's only because you
don't care about the availability of your service.

It's pointless to complain. We've all seen before that Amazon can't keep whole
regions up. If you rely on a region being up, you will have downtime and it's
your fault.

~~~
haberman
According to the AWS status page, only one availability zone within US-East is
down, not the whole US-East region. Running a highly-available service
exclusively from US-East is a reasonable strategy as long as you're spread
across multiple availability zones.

I'm not an AWS customer, just reading their docs; please correct me if I'm
wrong about any of this.

~~~
dragonstyle
We're across availability zones and definitely ran into an outage across zones
tonight.

------
sehugg
Be careful. Nothing is as it seems right now. Do not trust any API output, nor
should you do any API operations that are non-recoverable. Things are up that
are reported down and vice-versa.

Wait for the dust to settle. We're all just going to be a bunch of Fonzies
here.

EDIT: Looks like API access has been restored, so I'm cautiously optimistic
about things working. Note though that some instances may have rebooted or be
otherwise impacted so check your error logs.

EDIT2: Nope, ELB is still hosed. Continue to be skeptical.

------
adrianpike
There's another comment thread going over at
(<http://news.ycombinator.com/item?id=4180339>), if, like me, you got
extremely lucky and picked today's lucky availability zones, and have time to
read HN instead of scrambling to get things back up.

Good luck, friends.

------
philip1209
I commented along the same lines during the last AWS/Heroku outage, but
Rackspace still is giving me amazing value and uptime, and every time I try to
move away (as I did this week with my latest project, on Heroku) I get hit
with a massive service disruption that pushes me back to Rackspace.

~~~
shawnps
Hi, I work at Rackspace. If you don't mind me asking, what makes you initially
want to move away?

~~~
Fizzer
Funny to post this here, but I'm actually planning on migrating from Rackspace
to AWS purely because of price.

Rackspace's prices are insane. $1,314/mo for a cloud server with 30gb of ram,
compared to $657/mo for 34gb on AWS.

Plus with AWS you can use reserved instances to get that cost down to $286/mo.
Rackspace has no way to get the cost down.

That makes Rackspace cost over 4.5x more when comparing based on ram.
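
A quick check of the arithmetic above, as a minimal sketch: it assumes the prices and RAM sizes quoted in this comment and uses illustrative names (`usd_per_gb`, `ratio`) that are not from any provider's tooling. It compares the offers by raw monthly price and by price per GB of RAM.

```python
# Monthly prices and RAM sizes exactly as quoted in the comment above.
RACKSPACE = {"usd_per_month": 1314, "ram_gb": 30}
AWS_ON_DEMAND = {"usd_per_month": 657, "ram_gb": 34}
AWS_RESERVED = {"usd_per_month": 286, "ram_gb": 34}


def usd_per_gb(offer):
    """Monthly cost per GB of RAM for one offer."""
    return offer["usd_per_month"] / offer["ram_gb"]


def ratio(a, b, per_gb=False):
    """How many times more expensive offer a is than offer b."""
    if per_gb:
        return usd_per_gb(a) / usd_per_gb(b)
    return a["usd_per_month"] / b["usd_per_month"]


if __name__ == "__main__":
    for label, other in (("on-demand", AWS_ON_DEMAND), ("reserved", AWS_RESERVED)):
        raw = ratio(RACKSPACE, other)
        normalized = ratio(RACKSPACE, other, per_gb=True)
        print(f"Rackspace vs AWS {label}: {raw:.2f}x raw, {normalized:.2f}x per GB of RAM")
```

On these figures the raw monthly price ratio against the reserved instance comes out at about 4.6x, and the per-GB ratio at about 5.2x, so "over 4.5x" holds either way.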

~~~
rdl
If you're that price sensitive/need big RAM and can use RIs you should really
be looking at dedicated servers, too. (for which Rackspace is expensive as
well, but actually provides a service level to justify it in many cases)

~~~
purephase
I disagree. Cheery service folks simply do not justify the markup. I want
knowledgeable. When they don't have support for Nginx, to me, that's not
knowledgeable.

Rackspace prices are insanely high and I can't wait to move off of them.

~~~
rdl
I'm ok with a dedicated server provider being good at the physical and network
without much focus on applications on the host. You can use a third party for
host level sysadmin, but you don't have a third party choice for the
infrastructure.

~~~
purephase
For the cost that Rackspace charges (and what they claim to provide for said
cost) I want both.

------
bconway
If you're interested in information on the storms themselves and the
destruction they caused in West Virginia, there's good coverage here:
<http://www.foxnews.com/weather/2012/06/30/state-emergency-declared-in-west-virginia-after-powerful-storms/>

~~~
morsch
Gov. Earl Ray Tomblin in a statement: _With temperatures near 100 degrees
expected this weekend, it's critical that we get people's power back on as
soon as possible._

So let me get this straight: the critical issue with not having electricity
after a huge storm is that the A/C isn't working? And 100F/38C isn't even
_that_ hot, right?

~~~
smithian
100F with high humidity will result in a number of preventable deaths due to
heat exhaustion, heat stroke, dehydration, etc. So yes, it is the most
critical thing, along with power to hospitals and emergency services.

~~~
16s
Several hospitals in this area are running on generators right now b/c of
these storms. They are flying in additional generators too. It was pretty bad.
Six deaths in VA so far directly attributed to the storms.

------
RegEx
The status page seems to really underplay the severity of the situation.
Netflix and Heroku are down, yet these are just side effects of 'performance
issues' instead of a 'service disruption'. I wonder what it would take to
cross that threshold.

~~~
adrianpike
AWS has historically been both slow to update and heavily optimistic with
their status page.

When I got the frantic texts when EC2 first dropped offline, sure enough, the
AWS status page was all green, but twitter was alight with people talking
about it.

I suspect a service disruption would have to be Godzilla.

~~~
RegEx
According to other HNers, their RSS feed seems to be fairly accurate (someone
please verify this), which makes the whole thing that much weirder.

~~~
ehsanu1
Not weird at all. Companies the size of Amazon are practically bound to be
schizophrenic.

------
16s
Must be the same storm that took several of my trees down (east coast Virginia
USA) last night. It was a violent storm. 90 MPH winds. Made 80 foot tall oaks
bend like straws and they were almost touching the ground. I spent the morning
running the chainsaw just to clear the downed trees from the driveway.

AEP (local power company) says about 65% of customers in this area are w/o
power. May be days before it's fully restored. Hope no one from the HN
community got hurt.

Edit: I posted this from a computer in town. No power at my place so I can't
respond to follow-up posts.

------
kryptiskt
According to Colin Percival on Twitter[1][2], the US East-1 AZ has more IP
addresses, and thus probably other resources, than the rest of AWS put
together. It casts comments about "limited to one availability zone" into some
relief.

[1] <https://twitter.com/cperciva/status/219067641023840257> [2]
<https://twitter.com/cperciva/status/219067963356098561>

~~~
rabbitfang
> the US East-1 AZ

us-east-1 is a region, containing multiple AZ's.

------
codex
Pardon my rant, but I am frustrated. It seems there is always an excuse with
Amazon cloud. Is Google similarly disabled?

~~~
batista
If you mean GAE, it's even worse...

~~~
icoloma
I can only talk about my experience, but I've had zero downtime with GAE since
we migrated to the HR datastore. Development is x1000 harder, but then
everything Just Works.

~~~
batista
I had used it before the HR, so it might have improved.

At the time it was like voodoo, and you had to triple-check your datastore
actions, because they could fail for no reason at the backend.

~~~
kroo
App Engine's reliability massively improved with the HR datastore, and has
gotten even better since the pricing change / SLA guarantee. It's actually
remarkably good now, I recommend taking another look.

------
streeter
You can use the EC2 API and ec2-describe-availability-zones to find out which
availability zone is having issues:
<http://alestic.com/2012/06/ec2-outage-availability-zone>
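
For anyone scripting this rather than running the command-line tool, here is a minimal Python sketch of the same check. It assumes the boto3 library (not mentioned in this thread) and valid AWS credentials; `troubled_zones`, `fetch_zones` and `SAMPLE` are hypothetical names, and the sample data is hand-written, not real output from this outage. Bear in mind the warning elsewhere in this thread that API output may not be trustworthy mid-incident.

```python
def troubled_zones(response):
    """Return (zone name, state, messages) for zones that look unhealthy.

    `response` is a dict shaped like the EC2 DescribeAvailabilityZones result.
    A zone is flagged if its state is not "available" or it carries messages.
    """
    flagged = []
    for zone in response.get("AvailabilityZones", []):
        messages = [m.get("Message", "") for m in zone.get("Messages", [])]
        if zone.get("State") != "available" or messages:
            flagged.append((zone.get("ZoneName"), zone.get("State"), messages))
    return flagged


def fetch_zones(region_name="us-east-1"):
    """Ask EC2 for the region's zones (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so troubled_zones works without boto3

    client = boto3.client("ec2", region_name=region_name)
    return client.describe_availability_zones()


# Hand-written sample in the response shape; NOT real data from the outage.
SAMPLE = {
    "AvailabilityZones": [
        {"ZoneName": "us-east-1a", "State": "available", "Messages": []},
        {
            "ZoneName": "us-east-1b",
            "State": "impaired",
            "Messages": [{"Message": "hypothetical status message"}],
        },
    ]
}

if __name__ == "__main__":
    # Swap SAMPLE for fetch_zones() to query a live account.
    for name, state, messages in troubled_zones(SAMPLE):
        print(name, state, "; ".join(messages))
```

Keeping the filter separate from the API call means it can be tried on saved or sample responses without touching AWS at all.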

------
lsb
Interestingly, this is a great time to see which of your favorite websites are
rock-solid and which are kind of shaky.

I've been thinking about building a site with a Parse backend, and they're up,
which is good to discover.

~~~
jared314
It's like looking for a house in the rain, so you can see where the water
drains.

------
dakrisht
Is this the same EC2 zone that went out just 3-4 days ago??

That's the second, or I believe third, power outage/loss of service for AWS in
the past 10 days, if I'm not mistaken.

This is wild. I wonder what's going on at Amazon and if they're capable of
handling this much usage in addition to having power issues, etc.

Instagram and Netflix servers are down from what I hear and have been down for
a few hours. Now it makes sense that they're being hosted on AWS.

------
hendler
If you have a load balancer and balanced across availability zones (not
regions), you'd still be up. US-EAST didn't all go down, just one AZ.

~~~
genwin
But many people are saying that despite paying for multi-AZ for RDS, they were
still down. Do you think they didn't also load-balance across AZs for their
webservers?

------
gee_totes
Do we know this is due to an electrical storm? Today had a leap second as well
(The minute of midnight, June 30th lasted a second longer than normal).

~~~
dangrossman
These are the kind of weather conditions that spawn very electrically active
storms. I don't doubt they could cause the issues. Last night was probably the
most electrically active storm I've ever seen up here -- virtually non-stop
lightning strikes for an hour or two, and there's another just like it over
Virginia right now.

<http://i.imgur.com/d5pEP.png>

~~~
kylebrown
Would it help if they designed the building as a Faraday cage?

~~~
ramchip
The problem is in the power network, not in the machines.

~~~
kylebrown
But is it the power network inside the building, or outside? Presumably, the
problem wasn't a power _outage_ in the city's grid.

------
molecule
reading the linked page, "AWS is down" means "some N. Virginia AWS services
are down"

------
Aloisius
Just pay the extra money and get off US-East people.

~~~
maybird
Acts of God can happen anywhere.

~~~
nakkiel
You certainly mean bad weather, don't you?

~~~
cdcarter
Yes, that's what Act of God means in the US legal system.
<http://en.wikipedia.org/wiki/Act_of_God>

~~~
nakkiel
My apologies. I still believe it's wrong to call it that way because it
assumes too many things. Another way to word it is Force Majeure.
<http://en.m.wikipedia.org/wiki/Force_majeure>

~~~
latch
No, it isn't another way to word it. Force majeure includes acts of nature
_and_ acts of man. Acts of God is only acts of nature.

It's somewhat important to the original spirit of the comment since Acts of
God might indeed happen anywhere (1) whereas acts of man might not. For
example, disruption caused by the ongoing war in Syria wouldn't be covered by an
Act of God clause.

(1) I think that's BS though...there are definitely some places where nature
is considerably more stable than others.

------
dustingetz
how is it that amazon.com itself is never, ever, impacted?

edit: so basically, the businesses suffering outages (heroku, netflix, etc)
don't value uptime to the same extent that amazon does. they got what they
paid for.

~~~
sofuture
Amazon.com does not run on the same EC2 that you and I use. It runs on a
nearly identical system that is isolated and private to Amazon. I wouldn't be
surprised if they were in entirely different physical locations.

------
marcuspovey
Cloud taken out by a cloud.

------
suninwinter
It looks like this is affecting iTunes Match, possibly. I have two tracks just
sitting there, waiting to upload and running lsof -i shows iTunes with a
connection to an AWS machine.

------
jamespcole2
Unfuddle is down also, can't access any of my repos. I was going to spend the
day working too - oh well, time for a long lunch.

This isn't the first time this has happened to AWS - we moved our app to
Linode last year after this happened to us, and it seems to affect AWS more
than any other hosting I've ever used. I'd be interested to know how their
infrastructure is set up, because it doesn't seem particularly robust.

------
jack7890
Would it be expected that Amazon will issue substantial refunds (e.g. no
charges for June hosting for impacted users) due to the problems today?

------
spullara
Ok startup people. It is worth it to host in a different zone than US-EAST.

~~~
k-mcgrady
I host in Europe (Dublin I think). I'm from here and so are my customers so
there's no latency worry for me.

------
jaequery
with storms getting worse each year, i'll likely be choosing either central or
westcoast datacenters ... from now on.

~~~
freditup
Tornadoes in the midwest and earthquakes in the west. You're doomed.

Seriously though, as horrible as downtime is, I think most internet users
aren't terribly surprised when they can't go to a specific website for a short
period of time.

~~~
jser
Any press is good press? Can't count the number of times I read about the
Twitter fail whale.

~~~
MichaelApproved
I'd be interested in seeing someone polish this turd. How would you spin
Amazon being down into good press when people are looking for reliability?
The only thing I can think of is that people will be writing about you and that
should increase your page rank, but I doubt Amazon is concerned about higher
page rank.

Anyone else care to speculate how this could be good press?

~~~
eropple
It could be an opportunity to explain what reliability actually _is_ (it's not
"pack one availability zone with all your stuff") and how AWS helps you
achieve that.

~~~
MichaelApproved
That's awesome spin. So Amazon gives you the Chaos Monkey, whether you like it
or not.

<http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html>

------
jordanthoms
Google Compute Engine anyone?

~~~
msie
Windows Azure anyone?

~~~
locusm
Their current VM performance is incredibly bad. To be fair they are in some
preview mode.

~~~
rdl
What's bad (disk? CPU?)? It's on my list to experiment with, but I haven't yet.

~~~
locusm
See this thread
<http://social.msdn.microsoft.com/Forums/en/WAVirtualMachinesforLinux/thread/2d471010-cccd-4eef-9f35-5fc6b3aa6a1b>

------
tathagatadg
So cloud is not ready for storm? (troll face)

------
gcb
Isn't that bastard operator from hell excuse #74?

------
somesaba
We're soooo screwed

------
marcamillion
I am getting tired of all of these outages.

I know outages happen all the time at hosts, and maybe it's because either
a) news is more accessible now, or b) Amazon is bigger than most other
hosts, but I feel like Amazon & Heroku are going down WAYYYYY too much.

I am starting to wonder if this "tell all" policy is really best.

