
Dear Heroku: Quit blaming all of us when you fail. Do this instead… - pardner
http://blog.pardner.com/2012/06/dear-heroku-dammit-quit-blaming-all-of-us-when-you-fail-do-this-instead/
======
balloot
This is misguided. Nobody cares why your site is down, and for most sites 99%
of your users will have no clue what is meant by "This site is hosted by
Heroku". And a good chunk of that other 1% isn't even going to bother reading
the error text accompanying the whitescreen.

In the end, you chose to host your site on a platform that went down. That is
just as much your fault as a typo in the code. If you had a setup with a
hosted machine at Rackspace and the power goes out, you don't expect a custom
error. So why would you expect one from Heroku?

~~~
toast76
This is BS. Your users DO care.

When our service is down for some reason, the ONLY question we get (like this
one we got today) is - "Has the service been down recently? If so, no worries
- it happens, just wanted to report and see if this is temporary or if it is
just me."

Having a "holy shit our entire service is down" message means your users don't
have to ask "is it just me?".

That's a big difference. It has nothing to do with shifting blame, it's about
keeping your users informed. Information makes people happy.

~~~
E14n
Showing enough information for your users to sensibly understand what they
should do is important, but that's not what the article was about[1]. It was
about the application developer wanting to pass the buck when their
application failed because of a Heroku error.

This is really lame because you chose to host on Heroku, not your customers,
so it is your fault. You can play pass-the-buck, but as far as _your_ customers
are concerned it is _your_ fault.

[1] "...Quit blaming all of us"

~~~
pardner
You say "pass the buck" I say "tell the truth." Heroku's current message says
it's an "application error" even when it is not an application error. It is
incorrect. And I suggested one simple way to correct it. They could word the
message as they see fit, but platform_problem != application_error. What
percentage of customers grok/care about the difference is irrelevant... an
incorrect error message should be corrected.

------
michaelw
Why do you care? Do you think this helps your customer at all?

Perhaps you're worried about your technical reputation. All you're doing is
moving the blame from some part of your code to the decision you made to host
on Heroku.

Down is down. Unavailable is unavailable. To your customers that's all that
matters.

For what it's worth I think hosting on Heroku makes plenty of sense and I'm
actually moving my app (Crisply) to Heroku so that my technical team spends
less time writing chef scripts, less time managing database clusters and more
time adding value. But when Heroku goes down my customers will be just as
screwed.

~~~
the_bear
In my experience, users actually do care quite a bit. Back when my company was
on Rackspace, we had several periods of significant downtime that weren't our
fault. Customers who contacted us were very upset, but when we made it clear
that it wasn't a bug with our software, but rather a problem with the hosting
provider, they all calmed down. I believe there are at least a few key things
a customer takes away from a message like that, even if they don't understand
anything about website hosting:

1) This isn't a bug with our software. They don't need to worry about the
security of their data or anything else like that.

2) There's no point in getting mad at us. Sure we chose our hosting provider,
but we're just as upset about the downtime as the customer is.

3) This problem is affecting other websites as well. People seem to be better
at handling stress if they think everyone else is stressed out too.

4) There's a team of professionals working on the problem. My customers know I
run a small company, and they seem to appreciate knowing that a much larger
company handles the hosting.

~~~
seanp2k2
>"1) This isn't a bug with our software. They don't need to worry about the
security of their data or anything else like that."

So I guess no one has ever loosened up a firewall rule when something was down
to try to get it back up again, providing a perfect opportunity for someone
with, let's say, stolen MySQL credentials to connect to your now-exposed DB.

>"2) There's no point in getting mad at us. Sure we chose our hosting
provider, but we're just as upset about the downtime as the customer is."

And they're losing just as much business because /their/ site/service is now
broken too. That is plenty of reason to get mad at you guys, since to them,
/you are the total solution/.

>"3) This problem is affecting other websites as well. People seem to be
better at handling stress if they think everyone else is stressed out too."

Maybe, but then I'd just say "wow, so you have really bad planning and expect
that this stuff is all very reliable then, no?" The internet goes down. Power
goes out. Expect it and build around it.

>"4) There's a team of professionals working on the problem. My customers know
I run a small company, and they seem to appreciate knowing that a much larger
company handles the hosting."

Yes, because I fully expect IBM to resolve my issue faster than a mom-and-pop
store. If anything, that would make me /more/ anxious. Ever had to call
Level3, Cogent, or ATT for a null route? It takes us ~1 minute and them at
least 20, sometimes much longer.

~~~
the_bear
I don't mean to sound snarky, but it seems like you don't deal with customers
very often. You're thinking rationally when you should be thinking
emotionally. Customers are hugely inconvenienced by downtime, but there's
nothing to be done about it. You just need to get them to calm down and
remember that you're someone they like doing business with, and this is just
an unfortunate mistake that's outside of your control. Like everyone else has
already mentioned, everyone knows that websites go down, so customers can deal
with it as long as you give them some peace of mind[1].

Here's a related anecdote: it's common for potential customers to ask me about
security before signing up. I used to actually answer them by explaining the
details of our security practices, but no one understood what I was talking
about. Now I tell people that the site is hosted on Amazon's servers so we can
take advantage of the infrastructure they've built. Obviously this is a
non-answer, but it makes people feel a lot better. They knew they wouldn't be able
to evaluate our security anyway, they just wanted some sign that they can
trust us (and they already trusted us because we're the only company that
picks up the phone when they call).

So I guess what I'm saying is that when you're dealing with technology, it's
good to be rational. When you're dealing with people, emotions are what
matter.

[1] Just so you know, one other thing that I would say during the downtimes
was that we were in the process of switching from Rackspace to Amazon
precisely because of these problems. We weren't just sitting around accepting
them. Since making the switch, the service has been rock solid, so customers
know we meant what we said.

------
famousactress
Bullshit. It's _your_ fault. I'm your user, you took my money. We're done
here. Everything is your fault.

~~~
wissler
Continue in that vein, and if there is a natural disaster then it is their
fault as well.

The information is useful. Heroku should provide it. We're done here.

~~~
radical_cut
Customers usually don't care about the reason for outage. They gave you money.
If the service is running, good. If it's not, you screwed up. No matter what
actually happened, you should've been prepared. Sad but true.

~~~
wissler
"No matter what actually happened, you should've been prepared."

This is just plain false. Being prepared comes at a cost. If you over-prepare,
then your customers have to pay more for no good reason, and they don't
necessarily want to. You have to draw a line and make a judgement call.

There are such things as natural (or political) disasters so serious that it
would be extremely stupid to plan for them. And there are other disasters in
between this and run of the mill. Again, it's a judgement call. And it's not
your "fault" if the customer wants a combination of low price and reliability,
and you made a reasonable tradeoff in order to achieve it.

~~~
koide
As long as you are being honest with your customer and explain this somewhere.

The problem is one of expectations: if you don't say anywhere what you have
prepared for and what you are going to do when something you didn't prepare
for happens, you are misleading the customer, as they will rightly assume you
have prepared for most ordinary things (a Heroku outage, for instance).

------
Maro
I don't think the OP is right in general.

What if your customer sees Heroku's name, and gets confused?

She starts asking questions like: Who is on the other end? Am I in business
with X or with Heroku? Who should I call?

~~~
glesica
This is actually a really good point. I can't find it now, but there's an
article floating around somewhere about a government employee in some small
town accusing Apache or Debian or some project of being "hackers" because his
web server broke and was showing the default "congrats it works!" page instead
of the town web site.

~~~
kingofspain
An ex-boss accused me of hacking and embezzlement(!) when he discovered he was
paying £10/month for DNS services a few years back. It was something in place
from years before I even started!

He also flipped out over firewall issues on his new Macbook (the site is down
and taken the internet with it!) and problems with his ISP (some weird adult
filter).

Point is, some people expect a site to be a single atomic thing with no
dependencies or connections between. Throw anything "unusual" in there and
confusion and chaos reign. My ex-boss was somewhat extreme I admit, but I've
experienced it to a lesser degree many many times.

------
chao-
As someone who was inconvenienced by the outage, and with no mitigation
strategy in place, I DON'T blame Heroku. The weight is placed squarely on me
(lone tech in our company) for not having researched how to distribute
services alongside Heroku, or fall back to something else, or whatever the
proper term is.

I've been googling like mad since this morning, finding a few
mostly-unanswered StackOverflow questions and a smattering of blog posts, but I
haven't learned much. The only clear-cut answers I've seen are:

1\. Hire a sysadmin who knows more than you do (But the whole point is that I
want to learn for myself!).

2\. Pay for a service that will host in multiple geographic locations for you,
and do the switchover (recovery? fallback? I don't know my terms here) for
you.

3\. A few mentions of "load balancers" and "heartbeat monitors". Sounds
self-explanatory, and these are my current terms of googling.

Any suggestions on where to start acquiring this sort of skill? I'm prepared
to teach myself anything, but the problem is not knowing the terms for what I
want to learn.

EDIT: Well, just watching this thread is helping a bit.
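
A minimal sketch of the "heartbeat monitor" plus fallback idea from item 3, in Python. Everything named here (`Backend`, `pick_backend`, `http_heartbeat`, the `example.invalid` URLs) is hypothetical and made up for illustration; it is not Heroku's API or any particular load balancer's, only the general shape of "probe each host, send traffic to the first one that answers".

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Backend:
    name: str  # hypothetical label, e.g. "primary" or "fallback"
    url: str   # hypothetical health-check URL for that deployment


def http_heartbeat(backend: Backend, timeout: float = 3.0) -> bool:
    """Heartbeat check: True if the URL answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(backend.url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


def pick_backend(
    backends: Sequence[Backend],
    is_healthy: Callable[[Backend], bool] = http_heartbeat,
) -> Optional[Backend]:
    """Return the first backend (in priority order) that passes the check."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None  # everything is down


# Assumed setup: the same app deployed in two places, primary listed first.
BACKENDS = [
    Backend("primary", "https://primary.example.invalid/health"),
    Backend("fallback", "https://fallback.example.invalid/health"),
]
```

Calling `pick_backend(BACKENDS)` on a timer and repointing traffic when the answer changes is the core loop. Useful search terms for the real thing include "failover", "health check" and "DNS failover".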

~~~
balloot
How do you not blame Heroku? You are paying them a good chunk of money to
handle not only hosting, but failover strategies, multiple locations, etc. If
you have to worry about any of those things they aren't doing their job.

That's like if you chose MySQL as the database, and then when an update had a
huge bug that broke your site, you say "totally my fault that I don't have a
version of the site that uses PostgreSQL."

~~~
slurgfest
To clarify part of this, Heroku doesn't have the same position as (say) a
router manufacturer because they are offering an all-in-one 'platform'. And
unlike MySQL, they are charging you a decent rate for using it.

------
mshafrir
Heroku has a mechanism for displaying custom error and maintenance pages,
served off of S3.

<https://devcenter.heroku.com/articles/error-pages#customize_pages>

~~~
le_isms
The earlier Heroku outage also brought down custom error pages. Our site was
only displaying a 500 server error via nginx.

~~~
Snappy
But was it serving _your_ 500 error page? In which case you could make it say
anything you want. Heroku platform errors are not 500s; they're 502s (or 503s,
I forget).
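
The distinction above could be sketched as a tiny status-code check. This assumes the mapping as recalled in the comment (500 from the app, 502 or 503 from the platform), which the commenter is not sure of and which is not verified against Heroku's documentation here; `likely_error_source` is a hypothetical helper name.

```python
def likely_error_source(status: int) -> str:
    """Guess where an HTTP error came from, per the recollection above.

    Assumption (unverified): 500 is the app's own error page,
    502/503 are served by the platform in front of the app.
    """
    if 200 <= status < 400:
        return "ok"
    if status == 500:
        return "application"
    if status in (502, 503):
        return "platform"
    return "other"
```

A monitoring script could use a check like this to decide whether an outage is worth debugging locally or is better tracked on the provider's status page.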

------
notatoad
From a customer's perspective, there are only two parties in their
relationship with you: you, and them. When something goes wrong with your
application, you either accept the blame, or you make the customers feel like
they broke something. To the average user, seeing an error message like
"heroku is down" (or any other jargon) leaves the possibility that they might
have broken something, and the failure is on their end. The end result of this
interaction is that _your software has made your user feel bad about
themselves_. This is not a way to get your users to return to you.

Heroku's error message could be friendlier, but it currently contains only
words that any user can understand, which reassures your customers that even
though the service they are looking for is unavailable, there is nothing they
could have done to improve the situation. Your customers might leave with a
lowered opinion of your service, but your app doesn't make them feel ashamed
of themselves, which is a much better outcome.

------
ultrasaurus
If we put our normal-user-hat on for a minute: I don't see how that would make
any difference for 90%+ of users. They'll see the website isn't working, but
another site is, so your site is broken. End of story.

(With my developer hat on, Heroku outages are fun: our internal switchboard at
<http://www.pagerduty.com> lights up like a christmas tree)

------
starrhorne
Heroku isn't for apps that can't stand downtime. My experience has been that
if you have 2-3 heroku apps, and you monitor them with a 3rd party tool,
you'll see random "server not found" behavior every few weeks. (And no,
they're not just timing out from dyno spinup). Usually this isn't a system
wide outage and never gets mentioned on their status page.

So only use heroku if:

a) Uptime is non-critical & you just don't want to deal with setting up a
server

b) Uptime is non-critical & you don't know how to set up a server

------
arihant
You can customize the error page. It is your fault for not reading the docs
and serving the default.

<https://devcenter.heroku.com/articles/error-pages#customize_pages>

------
duck
I don't want my users to know I use Heroku, and by using Heroku I understand
that if they go down my site goes down. It really is "our" problem at that
point with regards to how our users understand it.

------
dllthomas
If Heroku is down, and I discover that site X that I want to visit was hosted
on Heroku, I'm more likely to hear that Heroku is back up _or_ that site X is
back up, than just that site X is back up. I also can skip checking any other
sites I know to be using Heroku during the outage. It is therefore mildly
useful data to a user.

------
fleitz
Add a custom 500 error page, problem solved; you can make it say anything you
want.

------
iamleppert
It’s your fault for not having a fault tolerant site that runs on another
service provider. This is what happens when you put your eggs in one basket
and that basket bursts into flames.

If reliability is so important, make it a priority instead of just expecting
stuff to work or for a more politically correct error message — which leads me
to my next point: who cares about the ERROR message? The damage has been done
by that point and half the people won't bother to read any further. Cue
sounds of people clicking back buttons as fast as they can.

~~~
slurgfest
Question: given that Heroku involves a certain amount of platform lock-in, how
do you write a Heroku app that runs on another service provider?

It hardly makes sense if Heroku says "well, it's your fault for trusting us."

------
abscondment
This seems specious. Correctly assigning blame won't matter to readers; most
people couldn't care less. Sticking a different brand name on the failure is
side-stepping the issue: nothing stays up 100% of the time. Create a custom
page that treats the situation with a little bit of levity.

If legitimate downtime happens often enough that someone would _actually_
internalize the difference between your failures and Heroku's, you have bigger
problems than your error page.

------
ynniv
Quite off topic, but I'm always sad to see really poor scalability:

 _To my surprise, this blog post hit the top spot on HN at least briefly. My
blog started throwing some app errors._

I've had a couple of hit HN stories on my blog without a problem, and it was
hosted from my apartment on an old server with 256MB of RAM. Now, it is static
pages served through nginx, but I'm pretty sure that a few thousand hits
shouldn't require 10 Heroku dynos to not fall over.

Kids these days. (the mindset, not the age)

~~~
pardner
I doubt it requires anywhere near 10 dynos, normally I just run 1, and I
suspect 2 or 3 dynos would work fine today... but since 10 dynos only costs 45
cents per hour, and it's presumably just for an hour or two, I simply threw
two handfuls at the problem and went back to my actual work. Handling a
one-time spike didn't seem like something worth optimizing when I could throw the
price of a cup of coffee at the problem.
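
The arithmetic behind that, as a quick sketch: the 45 cents per hour for 10 dynos is the figure quoted above, and the two-hour duration is an assumption taken from the upper end of "an hour or two". `spike_cost_cents` is a hypothetical helper.

```python
def spike_cost_cents(rate_cents_per_hour: int, hours: int) -> int:
    """Total cost of running at a flat hourly rate, in whole cents."""
    return rate_cents_per_hour * hours


# 10 dynos at the quoted 45 cents/hour, for an assumed 2 hours.
print(spike_cost_cents(45, 2))  # prints 90
```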

~~~
ynniv
Ah, my fault for not recognizing pragmatism at work. Rare events are
definitely not worth optimizing for.

------
16s
I've seen so much blame in IT/systems/coding in the last 15 years that I can't
recount it all. Anytime a vendor or service provider or consultant is
involved, get ready for finger pointing when things go wrong (from both
sides). I think many managers like being able to blame them and see this as a
benefit of the relationship. Outside providers should just expect to be blamed
for things they did not do and charge for that accordingly.

------
peterkelly
I think that for completeness, it should display a complete blame derivation
graph that explains to the user the full chain of events, right back to the
original person who was ultimately responsible.

After all, it wouldn't be fair for Heroku to be blamed just because a piece of
networking equipment failed - the user should be informed which vendor is at
fault, and in turn, which supplier the failed component within said equipment
came from.

------
overworkedasian
at the end of the day, if you have a service that other
people/businesses/clients rely on, that they need 24/7 up time, then you
really need to have a plan B that is not on heroku or aws. a REAL disaster
recovery plan needs to be thought out and implemented. if you dont want your
users to see the "there is a problem with this app" on heroku, then its your
job to figure out what that plan B is. If you cant afford a plan B, then well,
tough shits. as someone that has worked in the hosting business for years on
the operations side, its also the responsibility of the client to plan that
scenario where your primary host is not reachable (regardless if its an
application level issue, network or power outage). the hosting company can
only build so many N+1 backups (network/power/etc) as they can
afford/physically fit. you can buy all the load balancing you want, redundant
web servers and database servers. if you arent hosting in a secondary place
and your primary host fails, all those redundant servers you are paying for
arent going to mean a damn thing.

------
jtarud
We know when Heroku is down because emails from our clients' apps drown our
inboxes, and our clients get pounded by their clients.

I think it would be ideal to allow you to customize these messages to make
things easier, but I can't imagine the infrastructure they would need to have
in place to support this.

The option presented by the article is a lot simpler.

------
DanaDanger
A gentle reminder: <http://whoownsmyavailability.com/>

------
URSpider94
Many companies I know would immediately fire a service provider for ever
disclosing their existence to an end customer. If anything, Heroku's customers
should be able to replace the default error message such that it conforms to
the customer's site branding.

------
pearkes
You can customize your error pages to be whatever you want.

<https://devcenter.heroku.com/articles/error-pages#customize_pages>

~~~
slurgfest
So the solution is for you to run a script which monitors Heroku for outages
and changes the error page?

If you are doing that, you might as well write an app against another
platform.

------
reilly3000
They ought to create an interface for serving up a custom 500 error page.

~~~
Snappy
You mean like this?
<https://devcenter.heroku.com/articles/error-pages#customize_pages>

------
rsenk330
What if the problem only affects a subset of users? Then wouldn't any
application errors for unaffected users (e.g. a typo in code) say it is
Heroku's fault when it really isn't?

------
zbowling
It may be difficult, from a platform perspective, for them to tell where the
outage is exactly.

Also I don't think they should tell everyone heroku is hosting it so I don't
think that is a good solution.

~~~
pardner
re your second point: Yeah I pondered that, and they could make the "ooops"
more generic. However, we freely talk about being on Heroku since by and large
it seems to engender customer confidence. And it's not exactly a secret since
the DNS will point to proxy.heroku.com after all.

------
sunkencity
All this ruckus for 18 mins of downtime? Moved my main app off heroku this
monday for various reasons (mainly to get better log access and to run the app
in europe).
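
For scale, a rough sketch of what 18 minutes of downtime means as an availability figure. The 18 minutes is from the comment above; the 30-day measurement window is an assumption chosen only for illustration, and `availability_percent` is a hypothetical helper.

```python
def availability_percent(downtime_minutes: float, window_minutes: float) -> float:
    """Percentage of the window during which the service was up."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return 100.0 * (1.0 - downtime_minutes / window_minutes)


MINUTES_PER_30_DAYS = 30 * 24 * 60  # assumed window: 43,200 minutes

print(f"{availability_percent(18, MINUTES_PER_30_DAYS):.3f}%")  # prints 99.958%
```

Over a shorter window the same 18 minutes weighs more heavily, so the figure depends on the period being measured.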

------
neilmiddleton
The only change I would make, if any, is to remove the sentence about being
the application owner. Aside from that, that's all I'm going to tell a
customer anyway.

------
Tomis02
This is what a culture of pointing fingers leads to. The author should realize
the customer does not care why the site is down.

------
seanp2k2
"throw more dynos at it"

You kids and your lingo these days....

/back in my day/ we used to have servers. REAL, PHYSICAL servers :)

------
gaius
Does Heroku not have custom ErrorDocuments? We had those in the 90's...

------
halayli
Isn't it your fault to have picked Heroku in this case?

~~~
slurgfest
Yes, it is your fault for trusting what Heroku says about its availability.
But it would be classy for Heroku to take responsibility. It is not classy for
Heroku to say "it's your fault because you trusted us" in front of the users,
which seems to be the principal defense of Heroku in these comments.

------
awicklander
Or you could get on Engine Yard and stop caring what Heroku does. That's
worked wonders for my business.

------
cpfohl
Wow, didn't know that (don't use Heroku). Good article. Good solution.

