

Parts of Heroku are down - diegottg
https://status.heroku.com/#now

======
dpritchett
Heroku is up. My production application on Heroku is up.

Heroku's _provisioning, build, and remote console services_ have been down for
a few hours. Luckily my production app is still chugging along pretty well.
There was a time this afternoon when I wanted to restart one of my six dynos
(because it was half-dead) but was unable to due to the API lockdown. That
meant every sixth request was sent to an app server that was likely to barf on
it for a while there. The dyno got better a few minutes later though.

 _Postmortem edit_ :

API outage is over after about three hours (that I was aware of, anyway).
During this window my app's response time was about 1500ms. Now that it's
over it's down to about 250ms, which is on the high side of our normal daily
range of 150 to 250ms.

Unfortunately for us, we pushed out a marketing newsletter right about the
time they shut down the API. Looks like we're still getting decent sales
though!

[1]
[https://status.heroku.com/incidents/633](https://status.heroku.com/incidents/633)

------
rdegges
I'm a long-time Heroku user (I've also written about them quite a bit, and
published a book on the topic). In my opinion, Heroku is still, without a
doubt, the best hosting option around for production apps.

I've worked on and built enormous projects in the past and hosted them myself
(on hardware, and directly on providers like Rackspace and AWS), but have
_always_ had more headaches, wasted time, and downtime when doing things myself
(and with a team) than when I'm using Heroku.

Regardless of the occasional incident Heroku has, I'm still 100% a loving
user. The people over there work super hard on tons of stuff, and do a great
job of keeping millions of applications up.

Keep at it! <3

------
andrewvc
I'm curious to know why it is that AWS can operate its API with a much higher
degree of reliability than Heroku. They clearly have superior design and
processes given the large difference in reliability.

Having been a Heroku user for quite a while in the past, I'm not surprised;
these sorts of issues are sadly common with Heroku.

~~~
grandalf
Heroku has lots of single points of failure in its architecture.

Heroku also has no transparency into its architecture, which is why no site on
Heroku will be able to be PCI compliant after Jan 2015.

I'm not optimistic about Heroku's ability to continue to be a leading PaaS.

~~~
cschmidt
> no site on Heroku will be able to be PCI compliant after Jan 2015

do you have a reference for that?

~~~
grandalf
Read the PCI DSS 3.0 SAQ A-EP documentation (and some of the blog posts people
have written about it).

Heroku has never been able to pass a PCI Level 1 or Level 2 audit, but as of
Jan 2015 it will also no longer pass Level 3 or Level 4 SAQs.

~~~
spitfire
What in particular is Heroku missing to pass level 3 and 4?

Hit-and-run comments aren't really useful. Some detail and, if possible, some
suggestions for improvement really help, even if they can't be implemented for
some reason.

~~~
grandalf
> What in particular is Heroku missing to pass level 3 and 4?

It's mostly documentation. If Heroku had built a secure system and documented
it adequately, it would easily pass SAQ A-EP.

One easy example: Maybe Heroku keeps logs of all HTTP requests that include
params containing credit card numbers. Nobody knows.

------
jprince
Our website went from an Apdex of .9 to .5, with page load times so long that
many requests timed out. According to New Relic, we're still pretty dead in
the water for many users, though it is getting better. And we didn't even use
autoscaling.

Kind of surprised Heroku went down for so long in the middle of the day. Can't
imagine any serious large scale services wanting to stay on a platform with
that availability for very long. We're not large enough yet to move, but I
think AWS is going to have to happen at some point.

------
bobx11
Heroku is not accepting new builds, but is still performing for my web apps
that are deployed there. I would say it's not down, but that they're having
deployment issues.

------
nantes
We're a Top-1000-ish site (in the US) with ~15M pageviews and ~50M backend
transactions per month. We saw a ~20ms increase in response times across all
requests (some are super-fast and skew things quite a bit) during the
downtime. Luckily, our CDN strategy takes a lot of load off the backend and we
were fine throughout the downtime.

I'd echo rdegges' point from earlier: Heroku has been (and will be for the
foreseeable future) equivalent to an FTE in giving us the ability to build
things and not worry (too much) about the hosting.

Edit: note the response time increase _during_ the downtime event.

------
smathieu
Heroku is effectively down if your application uses the Heroku API for auto-
scaling. It probably means that only larger applications are affected.

~~~
wmil
Also any free tier apps that were asleep couldn't start.

------
klinskyc
"We have completed the maintenance work. API functionality has been restored,
and builds are resuming. We're continuing to monitor the affected systems." -
Heroku Status

Builds are back up for me after a few hours of downtime.

------
enraged_camel
I wish my company's IT team took their jobs as seriously and provided on-
demand, easy-to-read timelines for service outages along with regular updates.
That way anyone in the company can see at a glance how reliable the
infrastructure is and how quickly issues are resolved.

Of course, the IT team would probably hate being held visibly accountable like
that. But hey, I guess that's why they are working in corporate IT as opposed
to a well-known and successful cloud service provider.

~~~
toomuchtodo
> Of course, the IT team would probably hate being held visibly accountable
> like that. But hey, I guess that's why they are working in corporate IT as
> opposed to a well-known and successful cloud service provider.

Is your IT team compensated at the same level as well-known and successful
cloud service providers? If not, why would you hold them to the same standard?

~~~
dpritchett
Good point. Internal IT teams have a completely different set of incentives.
Heroku's status timelines are important in keeping old customers as well as
getting new ones. Internal IT doesn't really have that sales/retention
problem, but they definitely have an "I might get fired if someone gets the
wrong idea about an outage" problem.

~~~
toomuchtodo
Exactly.

------
enraged_camel
Heroku isn't down; only some parts of the infrastructure are disabled. I
haven't seen any service interruption to my production app, which would be
the case if Heroku as a whole were down (as the title implies).

------
dang
We changed the title from "Heroku is Down" (originally, "Heroku has been down
for more than 1 hour") in an attempt to make it more accurate. Happy to change
it further if anyone suggests a better title.

In general, "Foo is Down" doesn't make for very good HN posts, because while
it matters to users of Foo, the fact that Foo is down usually isn't
intellectually interesting (which is what HN is looking for in a submission).
Postmortems about why Foo went down, on the other hand, are often fascinating.
Conclusion: we should usually wait for the postmortem.

~~~
dpritchett
Surely there's room on HN for original content in the form of comments? My top
comment in this thread is a decent starting point for a discussion of what was
actually down and what it meant to users. I'm sure Heroku's inside story will
be even more interesting if they choose to blog it.

edit to add: Thanks for the transparency in moderation! I had thought this
thread was poorly titled too.

------
fataliss
"Funny" how both Heroku and Bitbucket end up on HN the same afternoon for
similar reasons... coincidence?

