Hacker News

carsongross · on Feb 25, 2012

I lose money when Heroku is down, and let me say this: Heroku is a f$(king miracle, and you'll pry our apps' cold, dead fingers off of their stack.

They screwed up. They'll fix it. Any ops team and hosting solution I could hire for a comparable amount of money wouldn't even come close.

matt2000 · on Feb 25, 2012

Just out of interest, what are people doing as a backup to Heroku, and is failover automatic or do you have to make DNS changes?

jvehent · on Feb 25, 2012

DNS changes do not provide high availability at all. Even with a TTL of 60 seconds, if a cache decides to override your TTL, you're screwed.

Real high availability comes with a price: redundant datacenter with BGP failover.

imbriaco · on Feb 25, 2012

While BGP is certainly better, the vast majority of DNS servers honor TTL nicely these days. In the old days, there used to be many problems, but I have been shocked at how well DNS failover has worked in the past few years.
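
To put rough numbers on this trade-off, here is a minimal sketch of the worst-case window during which a client can keep hitting a dead IP under DNS-based failover. Only the 60-second TTL comes from the thread; the health-check interval, failure threshold, and the 300-second cache floor are hypothetical values chosen for illustration, not anything a particular DNS provider is known to use.

```python
def worst_case_failover_seconds(ttl, check_interval, failures_needed, min_cache_ttl=0):
    """Upper bound on how long a client may keep hitting a dead IP.

    ttl             -- TTL published on the DNS record, in seconds
    check_interval  -- seconds between health checks (hypothetical)
    failures_needed -- consecutive failed checks before failing over (hypothetical)
    min_cache_ttl   -- floor enforced by a cache that overrides small TTLs
    """
    # Time for the monitor to notice the outage and swap the record.
    detection = check_interval * failures_needed
    # A cache that overrides the TTL holds the old answer for the larger value.
    effective_ttl = max(ttl, min_cache_ttl)
    return detection + effective_ttl


# Resolver that honors the TTL: 3 checks 10 s apart, then a 60 s TTL.
print(worst_case_failover_seconds(60, 10, 3))       # 90
# A cache that refuses TTLs under 5 minutes (assumed floor).
print(worst_case_failover_seconds(60, 10, 3, 300))  # 330
```

The second call shows the concern raised above: a single cache that ignores the published TTL dominates the failover time, whatever TTL you set.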

pors · on Feb 25, 2012

How does that work? If you have a link explaining it a bit that would be great! (a Google search shows me all sorts of solutions for ISPs, but not for application hosting).

edouard1234567 · on Feb 25, 2012

Got to love their last two updates. Spot the difference :).

UPDATE: Process management error rates remain high. Engineers are continuing to investigate and work towards a resolution. FEB 25, 2012 – 20:27 UTC – 10 MINUTES AGO

UPDATE: Process management error rates remain high. Engineers are continuing to investigate and resolve the issue. FEB 25, 2012 – 20:11 UTC – 26 MINUTES AGO

jared314 · on Feb 25, 2012

Sounds normal. They have nothing to add, but they have to say something.

wilfra · on Feb 25, 2012

I believe what he meant was that the first update implied that they had identified the problem and were fixing it ("continuing to investigate and resolve the issue.") - the second implied they were still looking for the problem ("continuing to investigate and work towards a resolution.").

Subtle but important difference.

mechanical_fish · on Feb 25, 2012

Having once been one of the engineers in charge of fixing problems very similar to these... don't read too much into the subtleties of the phrasing of these customer notices. ;)

They all translate to: "The engineers' hair is still on fire and they continue to have no time to tell me exactly why."

dholowiski · on Feb 25, 2012

Yet another reminder that while services like Heroku are great, it's up to you to make sure you have the appropriate redundancy in place to make sure your critical app doesn't go down. If you host anything critical on Heroku, you _need_ a backup server elsewhere (not EC2!)

lukev · on Feb 25, 2012

But doesn't that defeat the whole point of Heroku? If your app is running redundantly elsewhere, that means that you've already done all the work to set up the stuff that Heroku normally provides. And if you've already done that work, why use Heroku at all?

wilfra · on Feb 25, 2012

This is true. Heroku needs to be redundant. This shouldn't happen on Heroku.

edouard1234567 · on Feb 25, 2012

I agree, but we'll need to know more about what caused this. Hopefully they'll be a little more transparent than usual given the severity of this failure. In the meantime I have a new startup idea: Heroku failover cloud hosting.

imbriaco · on Feb 25, 2012

I am probably biased, since I wrote many of them, but I like to think Heroku has a pretty good reputation for writing meaningful public post-mortems when there are large scale outages. It's safe to give them the benefit of the doubt that they'll continue.

overworkedasian · on Feb 25, 2012

Isn't Heroku based on EC2? They should be able to offer some type of redundant package for customers willing to pay for it. I mean, let's be serious, people WILL pay for it, assuming it works.

DannyPage · on Feb 25, 2012

Having this problem too. The thing that I'm worried about is that my site just keeps trying to load without any error message or 404 page appearing. Is there a way to get something to show up to inform users of the downtime?

(Or ideally, a way to point towards another instance of the site quickly. I'm worried the DNS wouldn't propagate fast enough)

wilfra · on Feb 25, 2012

Mine says:

Application Error An error occurred in the application and your page could not be served. Please try again in a few moments.

If you are the application owner, check your logs for details.

Stwerner · on Feb 25, 2012

Mine does too...eventually. It looks like it is taking ~15 seconds to bring that page up.

edouard1234567 · on Feb 25, 2012

When Heroku goes down, 1M+ apps go down.

rhizome · on Feb 25, 2012

Now's the time to launch!

edouard1234567 · on Feb 25, 2012

Heroku should know better and not host their status page on their own servers, since it becomes inaccessible when most needed. This problem started last night and they detected it this morning, monitoring system FAIL.

ohgodthecat · on Feb 25, 2012

It appears to be hosted on Rackspace (right now at least). But it doesn't look automatic, so perhaps the people were not available to update it until this morning.

imbriaco · on Feb 25, 2012

The Heroku status site is indeed hosted outside of both the Heroku platform and EC2. It has been that way for at least a couple of years that I have direct knowledge of, and probably longer.

edouard1234567 · on Feb 25, 2012

The page wasn't available earlier, so something from their stack is probably shared. Maybe the problem is in how they route the traffic to this status page. Bottom line, this page should stay up, and the only way to almost guarantee that is to make sure nothing is shared between Heroku's infrastructure and where/how the status page is hosted/served.

imbriaco · on Feb 25, 2012

It's entirely possible that the status site had an unrelated issue. It happens. That said, I ran their stack until 2 weeks ago, so I can speak authoritatively on this: Nothing from their stack is shared with the status site. Period.

edouard1234567 · on Feb 25, 2012

If that's the case (I trust your authority in this), then maybe the status page went down because of the spike in traffic.

wilfra · on Feb 25, 2012

and not eating its own dog food...

http://blog.warsocial.com/post/18262715784/mmmm-dog-food

mechanical_fish · on Feb 25, 2012

That comment is made from sour grapes.

If some bits of Heroku seem to be up, but not others, it could be because those bits are running older versions of the infrastructure. Or are being used as test beds for newer versions of the infrastructure. Or are deliberately running on a separate platform so that, when the main Heroku infrastructure starts having problems, other bits of Heroku's domain are still around to dispense advice on how to work around those problems.

(The most extreme example is a company's "status" subdomain, which ideally should be hosted on a completely different server, in a completely different datacenter, on a different continent located on a distant planet with different DNS regulations.)

imbriaco · on Feb 25, 2012

The devcenter site is absolutely hosted on the Heroku platform.

I don't have first-hand knowledge of this specific incident, but I do have a very deep understanding of how the Heroku platform as a whole works. It is incredibly likely for many service disruptions to affect only a subset of applications, and not the platform as a whole. The system is, in fact, specifically designed to isolate failures of individual components to as small a failure domain as possible. That doesn't always work, but implying that there's something nefarious going on or that Heroku is not eating their own dogfood with the devcenter site is dead wrong.

c00w · on Feb 25, 2012

It is. Heroku.com is down.

glenngillen · on Feb 25, 2012

  % host devcenter.heroku.com
  devcenter.heroku.com is an alias for iwate-88.herokussl.com.
  iwate-88.herokussl.com is an alias for elb002001-124710749.us-east-1.elb.amazonaws.com.
  elb002001-124710749.us-east-1.elb.amazonaws.com has address 107.20.144.56
  elb002001-124710749.us-east-1.elb.amazonaws.com has address 23.21.215.77
  elb002001-124710749.us-east-1.elb.amazonaws.com has address 107.21.227.222

It's 100% on Heroku, it's just also using the ssl:hostname add-on (https://addons.heroku.com/ssl) which requires different DNS settings (http://devcenter.heroku.com/articles/ssl#hostname_based_ssl).
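
For anyone who wants to check a chain like this programmatically, here is a minimal sketch that parses the `host` output pasted above and pulls out the alias chain and the final addresses. The regular expressions and the `parse_host_output` helper are my own illustration and assume the exact line format shown; other `host` versions or record types may print differently.

```python
import re

# The output pasted above, copied verbatim.
HOST_OUTPUT = """\
devcenter.heroku.com is an alias for iwate-88.herokussl.com.
iwate-88.herokussl.com is an alias for elb002001-124710749.us-east-1.elb.amazonaws.com.
elb002001-124710749.us-east-1.elb.amazonaws.com has address 107.20.144.56
elb002001-124710749.us-east-1.elb.amazonaws.com has address 23.21.215.77
elb002001-124710749.us-east-1.elb.amazonaws.com has address 107.21.227.222
"""

# Assumed line shapes; the trailing dot on alias targets is dropped.
ALIAS_RE = re.compile(r"^(\S+) is an alias for (\S+?)\.?$")
ADDR_RE = re.compile(r"^(\S+) has address (\S+)$")


def parse_host_output(text):
    """Return (alias chain, addresses) from `host`-style output."""
    chain, addresses = [], []
    for line in text.splitlines():
        line = line.strip()
        alias = ALIAS_RE.match(line)
        if alias:
            if not chain:
                # First alias line: record the name that was queried.
                chain.append(alias.group(1))
            chain.append(alias.group(2))
            continue
        addr = ADDR_RE.match(line)
        if addr:
            addresses.append(addr.group(2))
    return chain, addresses


chain, addresses = parse_host_output(HOST_OUTPUT)
print(" -> ".join(chain))
print(addresses)
```

The chain ends at an ELB hostname in `us-east-1`, which is consistent with the site sitting behind the hostname-based SSL add-on rather than on separate infrastructure.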

wilfra · on Feb 25, 2012

Our app is back up.