Ask HN: How has the Amazon AWS outage affected you?

nostromo · on April 21, 2011

Our database (RDS) is completely inaccessible -- so our website has been down since 2am or so, showing our maintenance page. It's backed up -- but we have no way to download the most recent backup and move to another server. We tried rebooting our db about 11 hours ago -- and it's been stuck in reboot mode ever since. We tried creating a copy from a recent backup -- but it's stuck in boot mode. We can't move the snapshot to the West Coast farm, so we're really pretty blocked until Amazon gets things in order.

A lot of people on the boards suggested Amazon make the automatic backups of RDS available for download for instances like these. Having a backup is great, but not if you can't access it in an emergency.

On the message boards, someone said that they had selected to have RDS automatically keep a copy in multiple availability zones -- and they said that worked for them. I'm not sure, however, since a top post on HN is saying that all of the Virginia zones were affected -- so your mileage may vary.

st3fan · on April 24, 2011

"Offsite Backups" - Where Offsite means, not on Amazon's infrastructure. Seriously.

api · on April 21, 2011

Reddit is down, so I got more work done today.

garyrichardson · on April 21, 2011

I didn't. I was too busy reading blog posts about the downfall of cloud computing.

jk215 · on April 22, 2011

I woke up this morning in disbelief that it was still down. It's Friday; all I want to do is put off work and browse reddit.

venturebros · on April 21, 2011

I didn't! I kept taking naps and coming back to see if it was up.

Reddit actually helps me work more.

freerobby · on April 21, 2011

We're temporarily down with no permanent damage. But we have no way to relay that info to our customers because everything, including our users' email addresses, is inaccessible. Big oversight that I will be fixing as soon as we're back up.

It feels absolutely terrible to only be able to assure customers that their data is safe _after_ they write a panicked email asking if their data is gone.

martingordon · on April 21, 2011

I was hosting a client's app on Heroku and they had a big pitch today during which they weren't able to show the site. Really, really horrible timing.

dholowiski · on April 21, 2011

Yikes. That's how I feel... Yesterday I explained how my site, although very ugly, could handle all the traffic you could throw at it and more. Today, it can't handle any.

fomojola · on April 25, 2011

Saw the same RDS behavior nostromo saw; it was actually interesting. After 2 days of arguing with the RDS database, I spun up a new local storage instance, logged in over the mysql console at a shell (which worked, surprisingly enough), and found out that a single table (this is a WordPress instance) was hung. I guess the RDS instance couldn't access the disk file for that table (it was MyISAM, which I guess isn't in memory?), so any requests for that specific table hung. As it happens, that was the wp_options table, so there was no way to get past that.

So I sat there with mysqldump and exported each table separately, then spun up a new MySQL instance in another AZ and then imported each table piecemeal. I then re-created the wp_options table from another dump (from an early prototype on Linode, actually) and then manually fudged the values till it all worked. That eventually worked for me.
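For anyone stuck in the same spot, the table-by-table export described above could be scripted roughly as follows. This is only a minimal sketch, not what was actually run: the function names, the `dumps` directory, and the five-minute timeout are all hypothetical, and it assumes the MySQL password comes from an option file such as `~/.my.cnf`. The point is that each table gets its own `mysqldump` call with a timeout, so one hung table (like wp_options here) is reported and skipped instead of blocking the rest.

```python
import subprocess
from pathlib import Path


def build_dump_command(host, user, database, table, out_dir):
    """Return the mysqldump argv that exports one table to its own .sql file."""
    out_file = Path(out_dir) / f"{table}.sql"
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        f"--result-file={out_file}",
        database,
        table,
    ]


def dump_tables(host, user, database, tables, out_dir,
                timeout_s=300, run=subprocess.run):
    """Dump each table separately and return (succeeded, failed) table names.

    `run` is injectable so the loop can be exercised without a real server.
    """
    succeeded, failed = [], []
    for table in tables:
        cmd = build_dump_command(host, user, database, table, out_dir)
        try:
            # A hung table trips the timeout instead of stalling the whole export.
            run(cmd, check=True, timeout=timeout_s)
            succeeded.append(table)
        except (subprocess.TimeoutExpired,
                subprocess.CalledProcessError,
                OSError):
            failed.append(table)
    return succeeded, failed
```

Tables that end up in the failed list would still need to be rebuilt by hand from an older dump, as described above.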

mindcrime · on April 21, 2011

How has it affected me? My posting of the Wikipedia page on "Fallacies of Distributed Computing" made it to #2 on HN, and I got a big karma bump; I also wasted more time than usual on HN today discussing this stuff.

Also, since Quora was down, I didn't get my usual quota of Quora surfing in.

Otherwise, today has been "business as usual." :-)

cperciva · on April 21, 2011

No effect at all that I've noticed. Tarsnap doesn't use EBS.

webmonkeyuk · on April 21, 2011

My hastily thrown together article on how to work around EC2 outages got to #8 on HN and received ~2000 reads

http://news.ycombinator.com/item?id=2471258

kathryna · on April 22, 2011

I run an app that helps farms manage their weekly or monthly CSA programs. One farm had their monthly distribution yesterday - thankfully they had already downloaded the data they needed to pack and label the orders, but they weren't able to tell customers their final total or send reminders, so they had higher-than-average missed pickups and are receiving money several days later than usual. Farmers are already distrustful of digital/automated solutions, so I'm sad that this has added to that distrust.

ig1 · on April 21, 2011

Startup's site has been down all day, probably lost about $500-$1000 in sales. More concerned about the longer-term loss of goodwill and the potential damage to Google SEO rankings, though.

benologist · on April 21, 2011

My leaderboards, user-created levels and other miscellaneous bits of Playtomic run on AWS via MongoHQ and I've spent 1/2 the day answering emails explaining the situation. On the plus side MongoHQ are planning to introduce the ability to mirror databases onto our own servers which is going to be frigging awesome.

It at least highlighted another issue: because users are trying to access databases that don't exist, their requests take ages to time out, which bogs down the servers' other thousands of requests/second.
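One common way to stop slow timeouts from dragging everything else down is to fail fast once a backend is known to be unreachable. The sketch below is a generic circuit breaker and not Playtomic's actual fix; the class name, thresholds, and cooldown are assumptions for illustration. After a few consecutive failures it rejects calls immediately for a cooldown period, then lets a trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the backend."""


class CircuitBreaker:
    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # time the breaker tripped, or None if closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                # Still cooling down: fail immediately instead of waiting on a timeout.
                raise CircuitOpenError("backend marked unavailable")
            # Cooldown elapsed: allow one trial call through.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        return result
```

Wrapping each database lookup in `breaker.call(...)` would mean only the first few requests pay the full timeout; the rest get an immediate error that can be turned into a friendly "temporarily unavailable" response.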

colinplamondon · on April 21, 2011

Our registration/login/download/stat servers for all our iOS apps (Free Books + Free Audiobooks + Classicly) are all hosted on Heroku, and have been completely down all day. Not good.

mcotton · on April 21, 2011

I am running my phone processing/tracking app on EC2 so it's been up and down all morning. The fall back also has some dependencies on S3 so it has been a bad day.

abraham · on April 21, 2011

I wasn't able to view a Foursquare check-in for a little while, and a significant amount of my morning news has been related to AWS, the cloud, downtime, etc.

zacharypinter · on April 21, 2011

My team's single sign-on prototype was deployed to Heroku so that we could develop our apps against it (Android, iPhone, WP7, jQuery Mobile). It would have stalled development, but with a bit of tweaking I was able to get the prototype deployed to Cloud Foundry and everything continued as planned.

jaredwill · on April 21, 2011

I manage four EC2 instances, all with EBS and located on the east coast; luckily I haven't had any issues.

bdclimber14 · on April 21, 2011

I run my main startup on Heroku, so OrangeSlyce.com has been completely down all day.

dpcan · on April 21, 2011

Might as well get some good out of it.... please describe what your service is.

bdclimber14 · on April 21, 2011

Sadly, I almost replied "just check out the website." I made a site for graphic design students to find freelance gigs that are posted by local small businesses who need cheap, small amounts of design work done. Thankfully this hasn't been very big, and mostly just a side project recently, so not too many users are being affected.

dpcan · on April 22, 2011

That's a great idea, can't wait to check it out when it's back.

callmeed · on April 23, 2011

I've got an app in private beta on Heroku. Closing in on 2 days of downtime.

Not a good way to instill confidence in the 50 or so potential customers I let into the beta.

Plus my iOS app gets data from a Heroku app that is down.

adpowers · on April 21, 2011

My website and e-mail have been down for 12 hours and counting. Alas, my only instance that has survived this is running a website that is still under development.

mtogo · on April 21, 2011

Actually, I was surprised to find I wasn't affected by it at all. Apparently everything I use either knows what they're doing or wasn't in the us-east-1 region.

sofuture · on April 21, 2011

Ditto, we're just in the process of moving to EC2, and were happy to find our staging environment (the only thing moved over so far) is happily healthy (of course it gets minimal load).

adelmand · on April 21, 2011

I provide media streaming services via EC2 to a few customers - basically all of the non-live streams are unavailable.

triviatise · on April 21, 2011

triviatise.com is down, but we just ended our last giveaway to close out our alpha on the 15th, so the impact is minimal. We are working to figure out what a good backup plan for Heroku is. I'm sure a lot of other people (including Heroku) are doing the same.

hdragomir · on April 21, 2011

Nope. Our failovers are on self hosted servers, where the core of each app lives anyway.

gcr · on April 21, 2011

My site still works! (Thanks, prgmr ;))

henryrose · on April 21, 2011

Our test environments aren't available.

ascendant · on April 21, 2011

The company I work for offers a small hosted document management SaaS solution that uses Amazon as the hosted environment, so we've been having issues related to (surprise surprise) EBS.