Heroku Sandy Situation Room (heroku.com)
100 points by ujeezy 1669 days ago | 32 comments

I really feel for the Amazon employees working in the us-east data center this week. Not only do they have to worry about the safety of their families, but half the internet will be watching them if anything goes down. I can't even imagine all the contingencies they have to prepare for (e.g. engineer gets paged, but has no power at home, or can't drive in due to flooded streets). Best of luck to them.

I have pressureNET running on EC2 in US-East. It's collecting data about Sandy from a bunch of Android users in the region, and I only realized a few hours ago that...it's gonna get hit by Sandy. I'm not sure what to do.

The Android app has a hardcoded server URL pointing at the DNS name of my instance. If the server goes down and I prepare a backup, all my users will still be sending to the old, dead server (assuming Sandy takes out AWS in that region). I can also update the app and give it a new server URL, but I'm going to lose many hours of valuable hurricane data as users take time to get the update, etc.

Does anyone have any thoughts on how I can handle this?

I realize my mistakes and know how to fix them for next time, and my current data is obviously backed up. But for incoming data...am I screwed?

Edit: I have an idea. I'll update the app and give it a backup URL to use only if the main one is non-responsive. Then I'll publish the update and cross my fingers.
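
A minimal sketch of that fallback idea, in Python rather than the app's actual Android code. The function names, the JSON payload, and the example URLs are hypothetical; nothing here comes from pressureNET itself. It tries each server in order and moves on when one is unreachable or returns an error.

```python
import json
import urllib.request
from typing import Callable, Optional, Sequence


def http_post(url: str, body: bytes, timeout: float) -> int:
    """POST body to url and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_with_fallback(
    reading: dict,
    urls: Sequence[str],
    post: Callable[[str, bytes, float], int] = http_post,
    timeout: float = 5.0,
) -> Optional[str]:
    """Try each URL in order; return the URL that accepted the reading, else None."""
    body = json.dumps(reading).encode("utf-8")
    for url in urls:
        try:
            status = post(url, body, timeout)
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts are all OSError
            # subclasses, so this covers DNS failure, refused connections,
            # timeouts, and HTTP error responses.
            continue
        if 200 <= status < 300:
            return url
    return None
```

Called as `send_with_fallback(reading, [PRIMARY_URL, BACKUP_URL])`, it only touches the backup when the primary fails. A short timeout matters here: without one, a dead server can stall every submission before the fallback is tried.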

Before you need to, set your DNS TTL as low as you can.
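
The "before you need to" part is the catch: resolvers that already cached the record keep it for the old TTL, so the lower TTL has to be published at least one old-TTL period ahead of any cutover. A rough sketch of that arithmetic, assuming resolvers honor TTLs (some clamp or ignore them); the function name and the example numbers are made up for illustration.

```python
from datetime import datetime, timedelta


def latest_time_to_lower_ttl(expected_cutover: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment to publish the lower TTL so it is in effect by cutover.

    A resolver that fetched the record just before the change may keep
    serving it, old TTL included, for up to old_ttl_seconds.
    """
    return expected_cutover - timedelta(seconds=old_ttl_seconds)


# Hypothetical numbers: a record with a one-day TTL and a cutover at 20:00
# means the TTL must be lowered by 20:00 the previous day.
deadline = latest_time_to_lower_ttl(datetime(2012, 10, 29, 20, 0), 86400)
```

Once the low TTL is in effect, the worst-case time for clients to follow a changed record is roughly the new TTL.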

Edit: Rewrote my entire comment...

The delay on the app side should only be 15-30 minutes. Use a replication database (much like the Postgres follower system on Heroku) to ensure no loss of data and you should be fine.

"We recommend customers with production databases (Crane and up) create a follower running in us-west-2 using the --region flag (this is an alpha feature we are exposing ahead of schedule to prepare for this incident)"

Finally! Excited multi-AZ support is coming to Heroku.

Regions are comprised of multiple Availability Zones. Heroku has always been running on multiple AZs.

Thanks for the clarification. Meant to say regions.

Looks like only production databases for now, though. Hopefully they'll branch out to us-west for apps as well pretty soon.

I think figuring out how to let third-party addons talk to Heroku in multiple regions (given that there's no internal-to-AWS load balancing/elastic IP type solution) is the hard part with that, although I have no specific knowledge of Heroku to base that on.

Those on EC2 might want to read http://alestic.com/2010/10/ec2-ami-copy for their own disaster preparations. (Though ideally you should have been on it before now...)

I was hoping http://blog.linode.com/ would mention risk assessment/strategy on Newark center.

This is a good summary of data centers at risk: http://readwrite.com/2012/10/29/hurricane-sandy-vs-the-inter...

You were hoping for transparency from Linode? I've been a long time Linode customer (and don't have any plans to change that any time soon), but Linode has never been great about transparency, and that has only gotten worse over the past few years...

As per usual, Linode doesn't really say much about anything.

A few Heroku add-ons also have high-availability options. There's also a new add-on status page: http://status.addons.heroku.com/

RedisToGo - http://blog.togo.io/status/redistogo-hurricane-preparation/

MongoHQ - http://blog.mongohq.com/blog/2012/10/29/monitoring-the-weath...

Why aren't there more data centers in the solid craton part of the continent, far from these inevitable coastal problems? Genuinely curious as to why Minnesota or something isn't the datacenter capital of the US.

The datacenter where most of my servers are is not too far from Philly, but I have 0 worries about it. Why? It was properly designed and maintained from the start.

Not only is the first floor 6 feet above the ground, but it is at the top of a gentle slope that naturally drains all water away to lower areas. The local water level would have to be about 20 feet for the first inch of water to push its way through the doors.

There is absolutely NO reason that any datacenter in VA should be having problems, aside from either shoddy facilities management or poor initial choice of the site. Don't let anyone tell you differently.

I've always assumed it's more about the backbones of the Internet. Being close to the capital of the largest economy on earth is likely to be the best place in the world to put your datacenters.

You know, except for hurricanes and terrorism, but whatever, it's got a fat pipe.

One reason is that fibre optic cables come ashore near major coastal cities.

I never thought of that: they don't just have AWS availability to worry about, they have the availability of 79 external service providers, totally out of their hands, to worry about.

It wouldn't matter as much. If your primary app and workers on Heroku are down (which presumably are the ones using the add-ons), none of the end users are going to notice the add-ons being down.

On the other hand, Heroku could do everything in its power to keep its main system up, but the failure of a highly used addon, totally out of their control, could cause nearly as much of a problem for them.

We are complimenting companies for not having redundant setups but telling us about it, when they have been claiming all along that they have redundant setups?

It is funny to see the points on the parent comment, "oscillate", going from 2 or 3 up to its current zero.

Is US-East the least reliable of all AWS data centres? Arguably, Virginia is more susceptible to natural disasters than Oregon or Northern California (while we wait for that 1-in-100-year earthquake).

It's also by far the most heavily used of AWS regions, so they face many scale problems there that other regions don't have to worry about yet. And it was the first, so there's probably some legacy baggage there (i.e. they've learned from mistakes when building other regions).

I prefer Dallas (Colo4, specifically). They've been incredible. Only one significant outage in 7 years. They're nearly half way between coasts too, so they're a great latency compromise if you're stuck with a single DC.

Rackspace also has a big data center in DFW. I only remember one or two outages in the last several years.

AWS doesn't have a presence in Texas.

AWS actually operates 5 independent data centers (called availability zones) within the same complex that make up the us-east-1 region. This makes them significantly more likely to experience an outage in 1 availability zone, but also makes it much easier to architect around the problem.

Actually, AWS has more than 10 data centers that make up US-East-1. This was mentioned in the post-mortem published last week.

Availability zones in US-East-1 are at least one data center, but may be multiple facilities in geographic proximity.

It's great that Heroku is doing this. It's nice to know that your cloud provider is aware of and preparing for disasters like this.

Heroku has had enough really high-profile failures in the past, due to their reliance on AWS, that I would expect nothing less. If they haven't learned from Amazon's mistakes yet, then there is something wrong.
