Heroku Is Down (heroku.com)
43 points by jonstaab 43 days ago | 12 comments

Is anyone else getting timeouts with their Heroku Postgres instances? My dynos are healthy, but Postgres isn't. The status page doesn't mention anything about it.

Edit: it seems the Postgres instances are running on Amazon EC2, so that's probably why they're down. Edit 2: they're failing over Postgres instances now, so we should be back up soon.

My app is down; I got alerted by my monitoring. (It's a hobby/side-project app, so not a big deal. I'd be freaking out if this were a money-making production app. In this case, the first thing I did, before even checking Heroku status, was see if I had let my domain name expire... nope! Phew.)
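The kind of external monitoring that catches this can be as small as a cron-driven script. A minimal sketch (the URL and timeout here are illustrative placeholders, not the commenter's actual setup):

```python
import urllib.request
import urllib.error

def check_uptime(url, timeout=10):
    """Probe `url` once; return (ok, status_code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return (200 <= resp.status < 300, resp.status)
    except urllib.error.HTTPError as e:
        return (False, e.code)   # server answered, but with an error status
    except (urllib.error.URLError, TimeoutError):
        return (False, None)     # no answer at all: DNS, connect, or read failure

if __name__ == "__main__":
    ok, status = check_uptime("https://example.com/")
    print("up" if ok else "down", status)
```

Run from cron every minute or two and alert on consecutive failures; checking from outside the platform is the point, since an in-dyno check dies with the dyno.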

While the Heroku outage message suggests the main problem is that "dynos can not be restarted", what actually happened to me was that at 3pm UTC Heroku tried an automatic "daily restart" of my app, triggering my app's outage. I did not do anything else to ask for a restart.

The Heroku stats/events page for my app shows ~300 critical errors starting at 3pm UTC (~1 hour 40 minutes ago), immediately following a "Dyno restart: Daily restart" event. All appear to be "H20 App boot timeout". So perhaps the app has been down since then, although I would have thought my monitoring would have alerted me before now.

If, when a restart fails, it tries again and fails again (~150 times an hour), and this is happening to a whole bunch of apps at once, especially as a consequence of Heroku's automated dyno cycling, I can see how it would make a bad problem even worse and harder to recover from in Heroku's infrastructure.
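That retry-storm dynamic is exactly why restart loops usually get exponential backoff with jitter instead of a fixed interval: a steady ~24s retry (150/hour) hammers the platform hardest precisely when it is least able to respond. A quick sketch of a backoff schedule (the constants are illustrative, not Heroku's actual policy):

```python
import random

def backoff_delays(attempts, base=10.0, cap=600.0, jitter=random.random):
    """Exponentially growing, capped, jittered delays between restart attempts.

    Each retry waits roughly twice as long as the last, up to `cap` seconds,
    with jitter so thousands of failing apps don't all retry in lockstep.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay * (0.5 + 0.5 * jitter()))  # jitter in [50%, 100%]
    return delays

# With jitter disabled (always 1.0), the raw schedule is visible:
print(backoff_delays(6, jitter=lambda: 1.0))  # [10.0, 20.0, 40.0, 80.0, 160.0, 320.0]
```

After six failures the retry rate has dropped from six-per-minute to roughly one every five minutes, which gives a struggling control plane room to recover.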

To Heroku's credit... I'm not sure I can remember this kind of widespread outage ever happening before. Against Heroku's side... I'm not sure what a Heroku-deploying customer would be expected, by Heroku, to do to avoid downtime here. "Have your whole thing runnable on some non-Heroku stack" kind of detracts from Heroku's value proposition, which is that you won't have to figure out stuff like that; Heroku will do it for you.

If I were going to do the work to ensure my app could be switched at any time to some other stack, I'd probably just use that other stack instead of Heroku, since it's probably cheaper. The answer is probably: "No matter how much you are doing yourself, outages are possible. By using Heroku you put it in their hands, but outages are still possible."

Update: possibly related? https://status.aws.amazon.com/?date=2019-08-31 (in which case you might have been saved by a multi-region deployment on Heroku. But if it's only affecting us-east-1, you'd think Heroku would have noticed and said so? Could be unrelated, in an odd coincidence. Or perhaps there's affected Heroku infrastructure in us-east-1 regardless of which region you choose to deploy on Heroku.)

> To Heroku's credit... I'm not sure I can remember this kind of widespread outage ever happening before. Against Heroku's side... I'm not sure what a Heroku-deploying customer would be expected, by Heroku, to do to avoid downtime here.

This is the trade-off of lock-in to simple deployments.

It's largely the same everywhere: EKS/Beanstalk/maybe Lambda + RDS + S3 on AWS; Azure has similar offerings; GCP has similar offerings; Heroku has similar offerings. You can get your app up, running, and hosted for a dime, with cost based on usage, and you can do a lot with very little operational time invested. That's why this is so great for small startups with five technical people.

However, you can't get out. You're committed to their abstractions for relational databases, deployments, DNS, load balancing, ...

If you instead build what you need yourself on simple Linux VMs, you get much more mobility across cloud providers. Speaking of where I work: give us a Terraform endpoint and some kind of CentOS image and we can probably host there.

However, that's an expensive and non-trivial path overall. Suddenly you have all the problems Heroku has a solution for. Suddenly you need someone who knows how to run a Postgres cluster and how to handle failovers, or how to avoid needing them. Suddenly you need someone who knows how drives should be handled.

This is a really big rift I'm starting to see overall: applications and deployments are becoming more and more trivial, but there aren't many people who understand how to make the stack below them run reliably, with a low chance of data loss.

They paused all the dyno restart settings. Also, what monitoring system are you using?

Reddit also seems to be down

Wonder if there's an underlying AWS issue.



I just poked one of my really old applications running Ruby 1.8.7 and got a bunch of errors about half an hour ago. The status page was fine then.

I hope it wasn't me that broke it lol

Sling TV is also down, which makes sense given the AWS outage.

We make moderate use of Heroku and AWS us-east-1 (0.5M MAU) and appear to be up.

Thank goodness... Last thing I needed on a 3 day weekend was an outage.

My dynos and postgres dbs are ok, thank goodness. Hopefully it’s been fixed already.

reddit is responding for me about 20% of the time.

forum.wordreference.com is also completely down for me.

Man, I wish literally all of the web would end up on AWS, and then we'd get an outage.
