AWS is down due to an electrical storm in the US (amazon.com)
239 points by aritraghosh007 1574 days ago | 163 comments

By what stretch of the imagination is this icon suitable for representing a total loss of availability due to a power outage?: http://status.aws.amazon.com/images/status2.gif

Is this not a 'service disruption' situation? At the bottom of the page, the yellow icon is associated with 'performance issues'.

If there's one thing that's shocked me about AWS, it's the total failure to acknowledge the severity of service disruptions. Like the above case, or the fact that a 3-hour loss of connectivity is displayed on the service history as a green tick with a small 'i' box: http://oi46.tinypic.com/x5qtch.jpg

Or that there is absolutely no way to deep link to an ongoing outage, and users must reload, then expand the link every single time, or subscribe to an RSS feed.

AWS needs to blatantly copy Heroku's status system, which is worlds better for people needing fast updates on their infrastructure.

https://status.heroku.com/ vs http://status.aws.amazon.com/

Wow, that is so beautiful! I am in awe.

The whole Heroku site is phenomenal. I am constantly amazed every time I see it.

What's the deal with the % measuring? I know going below three nines is seen as "bad service", but three hours of outage in a month gives you 99.5%, not 99.97%

The 99.97% statistic has "May uptime" written next to it, so I guess it refers to the previous month.
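
A quick back-of-the-envelope check of those numbers (a minimal sketch in Python; the three-hour outage and a 30-day month are assumptions taken from the comments above):

    # Rough uptime math for the figures discussed above (assumed values).
    HOURS_IN_MONTH = 30 * 24           # assume a 30-day month = 720 hours
    outage_hours = 3                   # the roughly 3-hour outage cited above

    uptime_pct = 100 * (HOURS_IN_MONTH - outage_hours) / HOURS_IN_MONTH
    print("%.2f%%" % uptime_pct)       # ~99.58%, i.e. roughly 99.5%

    # For comparison, 99.97% over the same month allows only:
    allowed_minutes = HOURS_IN_MONTH * (1 - 0.9997) * 60
    print("%.0f minutes of downtime" % allowed_minutes)   # ~13 minutes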

They're reporting a power issue with a single US East availability zone. There are four EC2 availability zones in US East. Strange that they would cite orange as "performance issues", but it's certainly more appropriate than suggesting a complete service disruption.

It is absolutely affecting more than one availability zone. Just because people didn't lose boxes outside of one AZ doesn't mean there weren't service disruptions outside of it. I know multiple people, myself included, who STILL cannot launch new instances anywhere in US-East. The EC2 web console and API have also been down intermittently and throwing errors for hours.

If one availability zone is down, there will suddenly be a ton of extra people trying to start new instances in the other availability zones. Could there just be a shortage of available instances in the other availability zones? Perhaps Amazon doesn't keep enough spare capacity for this eventuality (nor would I expect them to)?

This certainly can happen. You can't rely on being able to launch enough replacement instances if a zone fails.

You can at least mitigate it by keeping utilization of your instances low enough to handle the load of one zone failing. I suspect that, to save money, many people push utilization as high as possible.

This is what reserved instances are for. You can buy low-utilization reserved instances for disaster planning.

If an outage in one zone causes a brownout in the other zones, then they aren't independent "availability" zones in any logical sense. They may as well be on the same power supply.

They are independent IF you buy them in advance. If you wait till an actual outage then it's too late.

It is obvious we are dealing with the imagination of a marketing exec here. And that is a sick cynical place.

They do have a red icon if an event impacts an entire region. If a customer is correctly utilizing multiple availability zones, a failure in one zone should only impact the customer until they can fail over (should be within minutes if failover is automated).

Aren't multiple availability zones impacted, though? I don't get how Heroku can be so widely impacted unless multiple Availability Zones have been affected.

The power issue only struck 1 of the 5 zones in US East (Virginia).

I can't speak to why Heroku was impacted as much as it was. It could be that they have single points of failure, or run at a utilization level that makes losing a single zone difficult.

But the icon is meant to display AWS status, not the customer's. Friday's outage did warrant a red icon - unreachable instances, corrupted EBS volumes, failed API calls.

Clearly Amazon hired Baghdad Bob as their PR guy when he was looking for a new gig.

Apart from Apple's legendary secrecy, Amazon's EC2 is a solid #2 in terms of impenetrability.

This is not fair; Amazon publishes surprisingly detailed postmortem reports for AWS problems.

For embarrassing outages, sure, but have you ever had to try and find out why some of your instances have disappeared? Good luck with that. Gold support, which you'd think would be good, just prioritizes your emails, apparently little more.

Linode, in comparison, has always been straight-forward, personal, and as honest as a company in that space can be.

What's perhaps odd is Amazon's customer service for their material goods is usually superb.

It's a terrible choice, but seems inspired by the triangle shape used for warning signs.

Hey! At least Amazon.com is up!

An upwards triangle is used as a danger/warning traffic sign.

We had 0 downtime. The only thing that's screwed up is a read replica of a multi-AZ MySQL on RDS deployment. Amazon did not send any notification. Kinda annoying.

AWS is not down. Only US-East. If your app is down, it's only because you don't care about the availability of your service.

It's pointless to complain. We've all seen before that Amazon can't keep whole regions up. If you rely on a region being up, you will have downtime and it's your fault.

According to the AWS status page, only one availability zone within US-East is down, not the whole US-East region. Running a highly-available service exclusively from US-East is a reasonable strategy as long as you're spread across multiple availability zones.

I'm not an AWS customer, just reading their docs; please correct me if I'm wrong about any of this.

We're across availability zones and definitely ran into an outage across zones tonight.

The idea that a single data center can host a highly available service is just wrong.

> If your app is down, it's only because you don't care about the availability of your service.

That is absolutely absurd. At what point did the common-sense solution to "unacceptable downtime on AWS" become "buy two of everything"?

If you operate a service where uptime is critical, two of everything isn't enough. You have to ensure they will be separated geographically and logically so that when things like this happen, you don't lose your systems.

We operate systems that sit on the pages of the top e-commerce companies in the world. We have 10 separate segments of clusters, operating in four AZs in East, three AZs in West-1, and two AZs in West-2. When this outage happened, the servers that were impacted in East were removed from our DNS, and within 9 minutes the impact of this event on their sites was eliminated.
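
A minimal sketch of that DNS-removal pattern, using boto3 and Route 53 as stand-ins (the parent comment doesn't say which DNS provider they use; the zone ID, record name, endpoint list, and health check here are all hypothetical):

    import boto3

    # Hypothetical values -- substitute your own zone, record, and endpoints.
    HOSTED_ZONE_ID = "Z_EXAMPLE"
    RECORD_NAME = "www.example.com."
    ALL_ENDPOINTS = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]

    def is_healthy(ip):
        """Placeholder health check; a real one probes HTTP/TCP with timeouts."""
        raise NotImplementedError

    def publish_healthy_endpoints():
        healthy = [ip for ip in ALL_ENDPOINTS if is_healthy(ip)]
        if not healthy:
            return  # never publish an empty record set
        boto3.client("route53").change_resource_record_sets(
            HostedZoneId=HOSTED_ZONE_ID,
            ChangeBatch={
                "Comment": "drop endpoints in the impacted AZ",
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": RECORD_NAME,
                        "Type": "A",
                        "TTL": 60,  # short TTL so resolvers notice changes quickly
                        "ResourceRecords": [{"Value": ip} for ip in healthy],
                    },
                }],
            },
        )

As the DNS sub-thread below points out, short TTLs are only advisory, so don't count on every resolver picking up the change within the TTL.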

Thanks for the most constructive post in this whole discussion. Hacker News is a much better place to talk about solutions than just whining.

At Quantcast we have physical servers in 14 cities. We use anycast to achieve site failovers in 6 seconds. Downtime for us would impact millions of websites, so we don't have downtime.

Automating our failover is our next step. We currently use DNS Made Easy, and while you can do that with them, it's tougher. We are switching to Dyn, and their Layer 2 failure detection will put our failover in that 6-second range.

DNS based failover in practice will not give you 6 second failover, unlike BGP. Too much of DNS is broken and out of your control.

(The trend for regular ISPs is probably improving, except that mobile/carrier DNS is often particularly broken. It would be interesting to do monthly surveys of this.)

Can you clarify? I assume you mean that you withdraw announcements and you see BGP convergence in about 6 seconds?

Exactly. We announce different subnets for each continent or region, and all the sites within that region announce the same subnet. When one site ceases announcing the shared subnet, BGP usually converges in just more than 5 seconds. I was astonished the first time I saw it. It really makes you appreciate the solid engineering behind the core routers.

It's nontrivial to determine exactly when to drop the announcement. And be careful, because if you are too eager to drop the announcement, you may do it in more than one site at a time.

At first we used DNS with short TTLs, but those TTLs are only advisory and are ignored by some implementations. We would see most traffic tail off within 10 minutes for a one-minute TTL, but it took many hours for all the traffic to migrate over to the new DNS. The folklore on using less than one minute for a DNS TTL is that a huge percentage of implementations ignore sub-minute TTLs. Funny how much of the Internet's operation is passed along as folklore and not really known for sure.
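
For the "monthly surveys" idea mentioned a few comments up, here is a minimal sketch with dnspython (the domain and resolver list are assumptions): it records the TTL each public resolver hands back for a name whose authoritative TTL you control, which hints at whether sub-minute TTLs are being clamped.

    import dns.resolver  # dnspython

    # Assumed inputs -- use a name you control with a short authoritative TTL.
    DOMAIN = "www.example.com"
    PUBLIC_RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1"}

    for name, ip in PUBLIC_RESOLVERS.items():
        r = dns.resolver.Resolver(configure=False)
        r.nameservers = [ip]
        answer = r.resolve(DOMAIN, "A")  # dns.resolver.query() on older dnspython
        # A resolver that clamps short TTLs will report a value well above the
        # authoritative one; repeat this over time to build up the survey.
        print(name, [rr.address for rr in answer], "TTL:", answer.rrset.ttl)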

Thanks for asking. Hacker News should be about sharing best practices and making the Internet a more reliable place.

^ This. You'll do it if you care enough or if it's absolutely required. Everyone knows in advance that this kind of shit can and will happen. Also, if people don't plan on hosting in multiple areas/AZs, I wonder why the hell anyone would even consider overpriced cloud technology when you can get practically 5 powerful dedicated servers for the same price, each more performant than a shitty VM on EC2. That said, if you have 5 dedicated servers, why even bother with EC2? It's more expensive in every way. "Pay more as you grow" is actually extremely expensive for what you actually get. Infinite scalability? Please. When people think about scalability, they think about adding a few gigs of RAM to their VM with a little more I/O throughput (e.g. migrating from VM1 to VM2) for hundreds of dollars, and it's still shitty VM performance compared to raw metal. Instead, why not spend that money on a few good dedicated boxes with 96-128GB+ RAM and a bunch of true (not virtual) CPU cores? Then you're done for a while, and for the same price. Hardware is dirt cheap these days.

The only useful/sane use case I can see for Amazon EC2 would be services like Heroku, where they need to be able to automatically manage a truckload of VMs for their rapidly growing infrastructure, unless you want to do it yourself, which I imagine is quite a headache unless you work closely with someone like Amazon or Rackspace.

The "scalability" thing isn't about adding some RAM or getting a larger proc. It's about adding a few dozen (or hundred) instances in minutes. Or getting hosts turned up in 7 different regions. Anyone can do that right now with AWS; let me know how your Equinox negotiations go for the next month.

Yes white boxes are cheap. Site negotiations, design, procurement, networking, operations, and maintenance are expensive in dollars and time. Personally I run "a bunch" of physical sites across the globe. It would be waaaay easier to be able to turn up rackspace/aws/google instances as needed.

> The "scalability" thing isn't about adding some RAM or getting a larger proc.

You'd be surprised how many people who actually use EC2 think it is.

> Yes white boxes are cheap. Site negotiations, design, procurement, networking, operations, and maintenance are expensive in dollars and time.

It's called planning ahead of time. If not, then here's a suggestion: use EC2 until you've set it up, then migrate, if you cannot wait, that is.

All in all, I don't mind whether people use EC2 for whatever reason. Just stating my opinion. I agree, of course, that in terms of "convenience" it has the upper hand: not having to wait for boxes to be added to data centers, being able to spin up boxes in multiple regions through a single company/console. Maybe your use case does justify using EC2. Many other people's clearly do not (hence all the whining because of all the downtime, which they wouldn't have had if they deployed to multiple AZs/regions).

> let me know how your Equinox negotiations go for the next month.

How do cloud services compare to a gym membership? Are you implying you can't get out of your AWS contract?

Sigh, I blame auto correct. See https://en.wikipedia.org/wiki/Equinix

If you want reasonable uptime, you don't have any single points of failure. So 2 of everything.

Robust systems aren't hip. Get back to work and ship, ship, ship.

That's how I feel about most hackathons. There isn't enough work done on existing projects; quick hacks that demonstrate concepts are the norm. It would be great to see a hackathon where the primary purpose was to contribute to an existing open source project or something similar.

Be careful. Nothing is as it seems right now. Do not trust any API output, nor should you do any API operations that are non-recoverable. Things are up that are reported down and vice-versa.

Wait for the dust to settle. We're all just going to be a bunch of Fonzies here.

EDIT: Looks like API access has been restored, so I'm cautiously optimistic about things working. Note though that some instances may have rebooted or be otherwise impacted so check your error logs.

EDIT2: Nope, ELB is still hosed. Continue to be skeptical.

There's another comment thread going over at (http://news.ycombinator.com/item?id=4180339), if, like me, you got extremely lucky and picked today's lucky availability zones, and have time to read HN instead of scramble to get things back up.

Good luck, friends.

I commented along the same lines during the last AWS/Heroku outage, but Rackspace is still giving me amazing value and uptime, and every time I try to move away (as I did this week with my latest project, on Heroku) I get hit with a massive service disruption that pushes me back to Rackspace.

Hi, I work at Rackspace. If you don't mind me asking, what makes you initially want to move away?

Funny to post this here, but I'm actually planning on migrating from Rackspace to AWS purely because of price.

Rackspace's prices are insane. $1,314/mo for a cloud server with 30gb of ram, compared to $657/mo for 34gb on AWS.

Plus with AWS you can use reserved instances to get that cost down to $286/mo. Rackspace has no way to get the cost down.

That makes Rackspace cost over 4.5x more when comparing based on ram.
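
Working those quoted prices through per GB of RAM (a rough sketch; real pricing varies by instance type and region):

    # Dollars per GB of RAM per month, using the figures quoted above.
    rackspace = 1314 / 30.0       # ~$43.80/GB
    aws_on_demand = 657 / 34.0    # ~$19.32/GB
    aws_reserved = 286 / 34.0     # ~$8.41/GB

    print("Rackspace vs AWS on-demand: %.1fx" % (rackspace / aws_on_demand))  # ~2.3x
    print("Rackspace vs AWS reserved:  %.1fx" % (rackspace / aws_reserved))   # ~5.2x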

That's a crazy amount to be paying a month for under $500 worth of RAM (2 x 16GB DDR3-1600 registered ECC for example). As the other reply mentioned, you should look into dedicated servers or colocation if you don't mind dealing with hardware yourself.

If you're that price sensitive/need big RAM and can use RIs you should really be looking at dedicated servers, too. (for which Rackspace is expensive as well, but actually provides a service level to justify it in many cases)

I disagree. The markup isn't justified by cheery service folks alone. I want knowledgeable ones. When they don't have support for Nginx, to me, that's not knowledgeable.

Rackspace prices are insanely high and I can't wait to move off of them.

I'm ok with a dedicated server provider being good at the physical and network layers without much focus on applications on the host. You can use a third party for host-level sysadmin, but you don't have a third-party choice for the infrastructure.

For the cost that Rackspace charges (and what they claim to provide for said cost) I want both.

Interesting, are you using any cloud management platform (e.g. RightScale)? They'll make it simpler to migrate.

If your scenario is more complicated than a single server, you might find our tool useful to forecast your costs: https://my.shopforcloud.com/?guest=true (this link will create a guest account for you so you can play with it quickly)

disclaimer: I'm co-founder of ShopForCloud.com

If latency to Europe is acceptable for you, then you might want to look into Hetzner.

$657/mo will rent you eight 32G/i7 servers there.

If Europe is not acceptable, then you should still look into American colos (e.g. LeaseWeb recently opened a US DC), which are not as cheap as Hetzner but will still give you 3-4 servers for the money.

Doesn't Germany have crazy data retention laws and privacy regulations? I'd be more concerned about having to follow two sets of laws than trans-atlantic latency.

Complete curiosity. I love Rackspace, but am always looking to try new things - new languages, new databases, new methods of hosting, new DNS, new foods, new music - I'm just a curious person. It's not a problem with Rackspace; all of my mission-critical stuff is still hosted with you. But I use random projects of mine to explore in multiple ways (including Black Sands by Bonobo, which I discovered last week - possibly my favorite coding album now).

Black Sands by Bonobo is amazing. The song 'Black Sands' on the album is wonderful[0].


Not an album for coding per se, but if you like Bonobo you might like Pepe Deluxe's new album Queen of the Wave. It's my favourite new album of 2012. Retro psych rock and as mad as cheese.

What a coincidence - "Pepe Deluxe" is my name for my penis.


...what do you mean lowbrow humor isn't allowed on hacker news?

Unless it is particularly clever, humor is not allowed on HN.

Gotcha! I'll have to check out Black Sands, too.

I prefer to see it as the best album of 2010.

I moved because Rackspace was unable to resolve performance problems and outages with the interaction of the LBs in front of Cloud Sites and the Cloud Sites PHP nodes in DFW.

He wants to play downtime roulette and Rackspace just won't let him.

I'm not sure if others have noticed, but Rackspace London is probably the worst host I've ever used in my entire life. I'm not sure if this is because they were acquired or what, but it's genuinely frightening how awful the service is and how rude, irresponsible, and incompetent the staff are.

That couldn't be any further from my experience in dealing with Rackspace London. Have always found them very helpful and their technical knowledge is the best I've seen.

If you're interested in information on the storms themselves and the destruction they caused in West Virginia, there's good coverage here: http://www.foxnews.com/weather/2012/06/30/state-emergency-de...

Gov. Earl Ray Tomblin in a statement: "With temperatures near 100 degrees expected this weekend, it's critical that we get people's power back on as soon as possible."

So let me get this straight: the critical issue with not having electricity after a huge storm is that the A/C isn't working? And 100F/38C isn't even that hot, right?

High temperatures pretty much always lead to increased mortality for the old and sick, so yes, that is one of the critical issues. Most other places where power is critical (e.g. hospitals) have backup generators and diesel supply contracts.

100F in a non-dry climate is very hot, as it feels like 110F. Heat sickness is a very real concern, along with food spoilage and risks to pets. Additionally, sick people without access to A/C in 100F weather can become even more sick very quickly.

100F with high humidity will result in a number of preventable deaths due to heat exhaustion, heat stroke, dehydration, etc. So yes, it is the most critical thing, along with power to hospitals and emergency services.

Several hospitals in this area are running on generators right now b/c of these storms. They are flying in additional generators too. It was pretty bad. Six deaths in VA so far directly attributed to the storms.

Yes it's that hot. Google about people dying in homes at temps < 100, even.

The status page seems to really underplay the severity of the situation. Netflix and Heroku are down, yet these are just side effects of 'performance issues' instead of a 'service disruption'. I wonder what it would take to cross that threshold.

AWS has historically been both slow to update and heavily optimistic with their status page.

When I got the frantic texts as EC2 first dropped offline, sure enough, the AWS status page was all green, but Twitter was alight with people talking about it.

I suspect a service disruption would have to be Godzilla.

According to other HNers, their RSS feed seems to be fairly accurate (someone please verify this), which makes the whole thing that much weirder.

Not weird at all. Companies the size of Amazon are practically bound to be schizophrenic.

You called it. I had my money on the volcano.


When our EC2-hosted website seems offline and the AWS status page is all green, the first thing I do is check Twitter. Usually you can see people asking about the outage before Amazon acknowledges it.

Must be the same storm that took several of my trees down (east coast Virginia, USA) last night. It was a violent storm, with 90 MPH winds that made 80-foot-tall oaks bend like straws until they were almost touching the ground. I spent the morning running the chainsaw just to clear the downed trees from the driveway.

AEP (local power company) says about 65% of customers in this area are w/o power. May be days before it's fully restored. Hope no one from the HN community got hurt.

Edit: I posted this from a computer in town. No power at my place so I can't respond to follow-up posts.

According to Colin Percival on Twitter[1][2], the US East-1 AZ has more IP addresses, and thus probably other resources, than the rest of AWS put together. It casts comments about "limited to one availability zone" into some relief.

[1] https://twitter.com/cperciva/status/219067641023840257 [2] https://twitter.com/cperciva/status/219067963356098561

> the US East-1 AZ

us-east-1 is a region, containing multiple AZ's.

Pardon my rant, but I am frustrated. It seems there is always an excuse with Amazon cloud. Is Google similarly disabled?


I wonder how much of Google's reliability comes from service failover and redundancy and how much from very reliable datacenters etc. I'd find it hard to believe that their platform and DCs aren't better than what we have seen from AWS, which could make compute engine a very attractive product.

A lot of it comes from automatic failover. Individual data centers have issues frequently, but if you're using High Replication Datastore then you won't notice it much, apart from occasionally seeing all your instances getting killed and restarted in a new datacenter, which also results in memcache getting reset.

Why is this being downvoted? Is it inaccurate?

If you mean GAE, its even worse...

I can only talk about my experience, but I've had zero downtime with GAE since we migrated to the HR datastore. Development is 1000x harder, but then everything Just Works.

I had used it before the HR, so it might have improved.

At the time it was like voodoo, and you had to triple-check your datastore actions, because they could fail for no reason at the backend.

App Engine's reliability massively improved with the HR datastore, and has gotten even better since the pricing change / SLA guarantee. It's actually remarkably good now, I recommend taking another look.

I thought App Engine was much more reliable when they switched over from the Master/Slave Datastore to the High Replication Datastore.

You can use the EC2 API and ec2-describe-availability-zones to find out which availability zone is having issues: http://alestic.com/2012/06/ec2-outage-availability-zone
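
The same check looks roughly like this with boto3 (a sketch; the linked post uses the older ec2-api-tools command line):

    import boto3

    # List availability-zone health for the region that's having trouble.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    for zone in ec2.describe_availability_zones()["AvailabilityZones"]:
        # State is e.g. "available" or "impaired"; Messages carries AWS-posted notes.
        print(zone["ZoneName"], zone["State"], zone.get("Messages", []))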

Interestingly, this is a great time to see which of your favorite websites are rock-solid and which are kind of shaky.

I've been thinking about building a site with a Parse backend, and they're up, which is good to discover.

It's like looking for a house in the rain, so you can see where the water drains.

Is this the same EC2 zone that went out just 3-4 days ago??

This is the second or, I believe, third power outage/loss of service for AWS in the past 10 days, if I'm not mistaken.

This is wild. I wonder what's going on at Amazon and if they're capable of handling this much usage in addition to having power issues, etc.

Instagram and Netflix servers are down from what I hear and have been down for a few hours. Now it makes sense: they're hosted on AWS.

If you have a load balancer balanced across availability zones (not regions), you'd still be up. US-East didn't all go down, just one AZ.
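
A minimal way to check which AZs a classic load balancer actually spans (a boto3 sketch; assumes a classic ELB in us-east-1):

    import boto3

    # Classic ELB API: show which availability zones each load balancer covers.
    elb = boto3.client("elb", region_name="us-east-1")
    for lb in elb.describe_load_balancers()["LoadBalancerDescriptions"]:
        print(lb["LoadBalancerName"], "->", lb["AvailabilityZones"])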

But many people are saying that despite paying for multi-AZ for RDS, they were still down. Do you think they didn't also load-balance across AZs for their webservers?

Do we know this is due to an electrical storm? Today had a leap second as well (The minute of midnight, June 30th lasted a second longer than normal).

The leap second has not happened yet[1]:

                                   UTC TIME STEP
                            on the 1st of July 2012
  A positive leap second will be introduced at the end of June 2012.
  The sequence of dates of the UTC second markers will be:
                          2012 June 30,     23h 59m 59s
                          2012 June 30,     23h 59m 60s
                          2012 July  1,      0h  0m  0s
[1] http://hpiers.obspm.fr/iers/bul/bulc/bulletinc.43

At least for me Netflix died at exactly midnight....

And? The leap second was not added at midnight last night...

These are the kind of weather conditions that spawn very electrically active storms. I don't doubt they could cause these issues. Last night was probably the most electrically active storm I've ever seen up here -- virtually non-stop lightning strikes for an hour or two, and there's another just like it over Virginia right now.


So how would an electrical storm take down AWS? Don't the data centers have backup generators?

Would it help if they designed the building as a Faraday cage?

The problem is in the power network, not in the machines.

But is it the power network inside the building, or outside? Presumably, the problem wasn't a power outage in the city's grid.

8:40 PM PDT We can confirm that a large number of instances in a single Availability Zone have lost power due to electrical storms in the area. Amazon Elastic Compute Cloud (N. Virginia)

Reading the linked page, "AWS is down" means "some N. Virginia AWS services are down".

Just pay the extra money and get off US-East, people.

EU-West had a similar level of outage last year due to a lightning strike. Twas out for several hours and took a few days for everything to be back to normal.

Acts of God can happen anywhere.

You certainly mean bad weather, don't you?

Yes, that's what Act of God means in the US legal system. http://en.wikipedia.org/wiki/Act_of_God

My apologies. I still believe it's wrong to call it that because it assumes too many things. Another way to word it is Force Majeure. http://en.m.wikipedia.org/wiki/Force_majeure

No, it isn't another way to word it. Force majeure includes acts of nature _and_ acts of man. Acts of God is only acts of nature.

It's somewhat important to the original spirit of the comment, since Acts of God might indeed happen anywhere (1) whereas acts of man might not. For example, disruption caused by the ongoing war in Syria wouldn't be covered by an Act of God clause.

(1) I think that's BS though...there are definitely some places where nature is considerably more stable than others.

God. Legal term. Funny things.

The percentage of overhead power lines in the US is much greater than, say, in Amsterdam...

us-west-2 is the same price as us-east-1. Price is no excuse anymore.

Latency to Europe kind of is, if you want to be in a single site with acceptable ping times to Europe, US East, and US West. But I'd rather just solve replication and be in each.

If latency to Europe is an issue, why not just have a few machines in their Ireland datacenter?

Because distributing sites across multiple regions is hard -- high latency between your front end and database sucks, and replicating a database over high latency is also hard. If you can do it, great, but if you can't, then pick the best single region.

US East or US West plus Direct Connect to your own colo space, with AWS for the burst capacity, and your own redundancy for the database servers, might be the best plan if you can't do wide area "over the Internet" database replication. (I might get an extra 10G DC (since I need 1G DC myself) and then have some colo for sale with it in US East/US West later this year.)

How is it that amazon.com itself is never, ever, impacted?

Edit: So basically, the businesses suffering outages (Heroku, Netflix, etc.) don't value uptime to the same extent that Amazon does. They got what they paid for.

Amazon.com does not run on the same EC2 that you and I use. It runs on a nearly identical system that is isolated and private to Amazon. I wouldn't be surprised if they were in entirely different physical locations.

This outage only affected us-east-1. Considering an Amazon.com outage would cost them $51k in lost sales every minute, I seriously doubt they put all their eggs (servers, that is) in one basket.

Cloud taken out by a cloud.

It looks like this is affecting iTunes Match, possibly. I have two tracks just sitting there, waiting to upload and running lsof -i shows iTunes with a connection to an AWS machine.

Unfuddle is down also; I can't access any of my repos. I was going to spend the day working too - oh well, time for a long lunch.

This isn't the first time this has happened to AWS - we moved our app to Linode last year after this happened to us. It seems to affect AWS more than any other host I've ever used; I'd be interested to know how their infrastructure is set up, because it doesn't seem particularly robust.

Would it be expected that Amazon will issue substantial refunds (e.g. no charges for June hosting for impacted users) due to the problems today?


Site failover is one of the exercises that are left to the user. Considering how relatively bare-metal an experience AWS provides, this has never surprised me.


Are you suggesting that AWS should operate out of more than one location?

It's a wonderful idea!

I even have a list of possible locations they should look into. Beyond the Virginia site, they should be looking at DCs in Oregon, California, Ireland, Singapore, Tokyo and even Sao Paulo. What do you think?

Ok startup people. It is worth it to host in a different zone than US-EAST.

I host in Europe (Dublin I think). I'm from here and so are my customers so there's no latency worry for me.

With storms getting worse each year, I'll likely be choosing either central or west-coast datacenters from now on.

Tornadoes in the midwest and earthquakes in the west. You're doomed.

Seriously though, as horrible as downtime is, I think most internet users aren't terribly surprised when they can't go to a specific website for a short period of time.

For ecommerce, people will just go buy somewhere else. So it's not too big a deal for them, but a really big deal for your bottom line. Especially when you are still paying for traffic acquisition; sure, you can turn AdWords off in minutes, but other traffic sources may take 24 hours.

If you're willing to go outside AWS... Denver. We occasionally see tornadoes out here, but Denver is outside "Tornado Alley" [1], so they're pretty rare. On the Wikipedia tornado page they put the odds of a tornado hitting the downtown area of a big city at around one in a thousand per year.

If you want to be extra paranoid, you could always pick a secondary data center location where it would be safer in the summer, since Denver would be safe from tornadoes in the winter. [2] Or you could just put one on the West Coast and hope that there's not a 1-in-a-thousand tornado within a week of a big earthquake.

[1] https://en.wikipedia.org/wiki/File:Tornado_Alley_Diagram.svg

[2] http://www.weather.com/outlook/weather-news/severe-weather/a...

Tornadoes in the midwest and earthquakes in the west. You're doomed.

Yup, everywhere has got its natural disasters.

The Europe (Ireland) location is pretty high up the "nothing interesting ever happens here" scale.

What about the revolution?

Any press is good press? Can't count the number of times I read about the Twitter fail whale.

I'd be interested in seeing someone polish this turd. How would you spin Amazon being down into good press when people are looking for reliability? The only thing I can think of is that people will be writing about you, which should increase your page rank, but I doubt Amazon is concerned about higher page rank.

Anyone else care to speculate how this could be good press?

It could be an opportunity to explain what reliability actually is (it's not "pack one availability zone with all your stuff") and how AWS helps you achieve that.

That's awesome spin. So Amazon gives you the Chaos Monkey, whether you like it or not.


"I can't run my business/research/whatever"


"This site is so popular it barely works"

Google Compute Engine anyone?

Windows Azure anyone?

HP I-forget-their-catchy-name-for-their-service anyone?

Their current VM performance is incredibly bad. To be fair they are in some preview mode.

What's bad (disk? CPU?)? It's on my list to experiment with, but I haven't yet.

I just got a promoted tweet from VMware Cloud or something like that. Considering the rest of my feed was full of #EC2 #fail, it was a good choice.

Dedicated servers anyone?

RedHat OpenShift?

Or are they just like Heroku, sitting on top of AWS?

They don't (RH has its own datacenters), but I think it's still in some kind of beta. The great thing is that you can rip your app out of there and maintain the "service" yourself. Also, alternative providers for OpenShift will probably pop up over time.

I signed up for access, but no love yet.

So cloud is not ready for storm? (troll face)

Isn't that bastard operator from hell excuse #74?

We're soooo screwed

So cloud is not ready for storms? (troll face)

I am getting tired of all of these outages.

I know outages happen all the time at hosts, and maybe it's because a) news is more accessible now, or b) Amazon is bigger than most other hosts... but I feel like Amazon & Heroku are going down WAYYYYY too much.

I am starting to wonder if this "tell all" policy is really best.
