Edit: it seems the Postgres instances are running on Amazon EC2, so that's probably why they're down.
Edit: they're failing over Postgres instances now, so we should be back up soon.
While the Heroku outage message suggests the main problem is "dynos can not be restarted" -- in fact what happened to me was that at 3pm UTC Heroku tried to do an automatic "daily restart" of my app, triggering an outage of my app. I did not do anything else to ask for a restart.
The Heroku stats/events page for my app shows ~300 critical errors starting at 3pm UTC (~1 hour 40 minutes before now), immediately following a "Dyno restart: Daily restart" event. All appear to be "H20 App boot timeout". So perhaps the app has been down since then, although I would have thought my monitoring would have alerted me before now.
If, when a reboot fails, it tries again and fails again (~150 times an hour), and this is happening to a whole bunch of apps at once, especially as a consequence of Heroku's automated dyno cycling... I can see how it would make a bad problem even worse and harder to recover from in Heroku's infrastructure.
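Heroku hasn't published how its restart retries are scheduled, so this is pure speculation on my part, but the amplification effect is easy to sketch: a fixed ~24-second retry interval produces ~150 boot attempts an hour per failing app, whereas a capped exponential backoff (here using the expected delay of the common "full jitter" pattern) settles to a few dozen. All function names and parameters below are made up for illustration.

```python
def fixed_interval_attempts(window_s=3600, interval_s=24):
    # Fixed-interval retries: one attempt every `interval_s` seconds,
    # i.e. ~150 attempts/hour at a 24s interval.
    return window_s // interval_s

def backoff_attempts(window_s=3600, base_s=1.0, cap_s=300.0):
    # Capped exponential backoff. With "full jitter" the actual sleep is
    # uniform in [0, min(cap, base * 2**attempt)], so the expected delay
    # is half that ceiling; we use the expectation to stay deterministic.
    t, attempts = 0.0, 0
    while t < window_s:
        attempts += 1
        t += min(cap_s, base_s * 2 ** attempts) / 2
    return attempts

print(fixed_interval_attempts())  # 150 attempts in an hour
print(backoff_attempts())         # a few dozen once delays hit the cap
```

Under these (assumed) numbers, backoff cuts the hourly attempt count several-fold, which is exactly the difference between a retry storm that prolongs an incident and one that lets the platform catch up.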
To Heroku's credit... I'm not sure I can remember this kind of widespread outage ever happening before. Against Heroku... I'm not sure what Heroku expects a deploying customer to do to avoid downtime here. "Have your whole thing runnable on some non-Heroku stack" kind of detracts from Heroku's value proposition, which is that you won't have to figure out stuff like that; Heroku will do it for you. If I were going to do the work to ensure my app could be switched at any time to some other stack... I'd probably just use that other stack instead of Heroku, as it's probably cheaper. The answer is probably: "No matter how much you do yourself, outages are possible. By using Heroku you put it in their hands, but outages are still possible."
Update: possibly related? https://status.aws.amazon.com/?date=2019-08-31 (in which case you might have been saved by having a multi-region deployment on Heroku. But if it's only affecting us-east-1, you'd think Heroku would have noticed and said so? Could be unrelated in an odd coincidence. Or perhaps there's affected Heroku infrastructure in us-east-1 regardless of which region you choose to deploy to on Heroku.)
This is the trade-off of lock-in to simple deployments.
It's largely the same: EKS/Beanstalk/maybe Lambda + RDS + S3... Azure has similar offerings, GCP has similar offerings, Heroku has similar offerings. You can get your app up, running, and hosted for a dime -- cost based on usage. And you can do a lot of stuff with very little operational time invested. That's why this is so great for small startups with 5 technical dudes.
However, you can't get out. You're committed to their abstractions for relational databases, deployments, DNS, load balancing, ...
If you instead build the things you need yourself on simple Linux VMs, you have much more mobility across cloud providers. Speaking of work: give us a Terraform endpoint and some kind of CentOS image and we can probably host there.
However, that's an expensive and non-trivial thing overall. Suddenly you're facing all the problems Heroku has a solution for. Suddenly you need someone who knows how to run a Postgres cluster and how to handle failovers (or how not to). Suddenly you need someone who knows how drives should be handled.
This is a really big rift I'm starting to see overall: applications and deployments are becoming more and more trivial, but there aren't many people who understand how to make the stack below them run reliably, with a low chance of data loss.
I hope it wasn't me that broke it lol
Thank goodness... Last thing I needed on a 3 day weekend was an outage.
forum.wordreference.com is also completely down for me.