In the end, you chose to host your site on a platform that went down. That is just as much your fault as a typo in the code. If you had a setup with a hosted machine at Rackspace and the power goes out, you don't expect a custom error. So why would you expect one from Heroku?
Couldn't agree more.
Further, I wouldn't want anyone to know where the site is hosted anyway. We have some customers on a VPS at Media Temple and use custom DNS so they don't see MT's DNS servers (which of course they could find out if they wanted to check the IP). They don't need to know that info. As far as they are concerned we are the vendor.
The message could simply say "We are having technical difficulties and we're working to get the service restored as quickly as possible".
>In the end, you chose to host your site on a platform that went down.
That's only partially true. There are few viable platform hosts out there, and virtually no one has cracked the 100% uptime challenge, so a software developer can't choose a host with 100% uptime when none exists. Thus they're not at fault for choosing the "wrong" host, or for not building something that astronomically difficult themselves.
You're failing to understand that the overwhelming, vast majority of people have no idea what half the words in your sentence meant, have no idea what idea you're trying to convey, and even if they understood the idea, wouldn't be able to get it from your sentence.
Nobody cares whether it's a bug in your code or your host going down, because 99% of people don't know the difference.
If the customer attributes the downtime to your host rather than you, then your customer might tell you to switch hosts rather than fire you. If you have a good relationship, the added transparency can be the difference between keeping the customer and losing the customer.
Of course, this line of thinking is no incentive for Heroku to change its practices.
When our service is down for some reason, the ONLY question we get (like this one we got today) is - "Has the service been down recently? If so, no worries - it happens, just wanted to report and see if this is temporary or if it is just me."
Having a "holy shit our entire service is down" message means your users don't have to ask "is it just me?".
That's a big difference. It has nothing to do with shifting blame, it's about keeping your users informed. Information makes people happy.
Also, a Heroku-specific error page would give zero help with the problem you describe above. Anecdotal evidence in this thread suggests Heroku sometimes goes down briefly for small chunks of its users. So if you see the Heroku error, it could be for just you. Or it could be a system-wide outage. It wouldn't even solve that problem!
This is really lame because you, not your customers, chose to host on Heroku, so it is your fault. You can play pass-the-buck but as far as your customers are concerned it is your fault.
1. "..Quit blaming all of us"
> If you had a setup with a hosted machine at Rackspace and the power goes out, you don't expect a custom error. So why would you expect one from Heroku?
Perhaps you're worried about your technical reputation. All you're doing is moving the blame from some part of your code to the decision you made to host on Heroku.
Down is down. Unavailable is unavailable. To your customers that's all that matters.
For what it's worth I think hosting on Heroku makes plenty of sense and I'm actually moving my app (Crisply) to Heroku so that my technical team spends less time writing chef scripts, less time managing database clusters and more time adding value. But when Heroku goes down my customers will be just as screwed.
1) This isn't a bug with our software. They don't need to worry about the security of their data or anything else like that.
2) There's no point in getting mad at us. Sure we chose our hosting provider, but we're just as upset about the downtime as the customer is.
3) This problem is affecting other websites as well. People seem to be better at handling stress if they think everyone else is stressed out too.
4) There's a team of professionals working on the problem. My customers know I run a small company, and they seem to appreciate knowing that a much larger company handles the hosting.
>"2) There's no point in getting mad at us. Sure we chose our hosting provider, but we're just as upset about the downtime as the customer is."
And they're losing just as much business because /their/ site/service is now broken too. That is plenty of reason to get mad at you guys, since to them, /you are the total solution/.
>"3) This problem is affecting other websites as well. People seem to be better at handling stress if they think everyone else is stressed out too."
Maybe, but then I'd just say "wow, so you have really bad planning and expect that this stuff is all very reliable then, no?" The internet goes down. Power goes out. Expect it and build around it.
>"4) There's a team of professionals working on the problem. My customers know I run a small company, and they seem to appreciate knowing that a much larger company handles the hosting."
Yes, because I fully expect IBM to resolve my issue faster than a mom-and-pop store. If anything, that would make me /more/ anxious. Ever had to call Level3, Cogent, or ATT for a null route? It takes us ~1 minute and them at least 20, sometimes much longer.
Here's a related anecdote: it's common for potential customers to ask me about security before signing up. I used to actually answer them by explaining the details of our security practices, but no one understood what I was talking about. Now I tell people that the site is hosted on Amazon's servers so we can take advantage of the infrastructure they've built. Obviously this is a non-answer, but it makes people feel a lot better. They knew they wouldn't be able to evaluate our security anyway, they just wanted some sign that they can trust us (and they already trusted us because we're the only company that picks up the phone when they call).
So I guess what I'm saying is that when you're dealing with technology, it's good to be rational. When you're dealing with people, emotions are what matter.
Just so you know, one other thing that I would say during the downtimes was that we were in the process of switching from Rackspace to Amazon precisely because of these problems. We weren't just sitting around accepting them. Since making the switch, the service has been rock solid, so customers know we meant what we said.
Rational clients understand they are not hiring demigods or living in some uptime utopia. However, they expect their service providers to adhere to best practices and make logical decisions.
In this case it's Heroku, in another setting it might be a NetApp filer. Both are reasonable solutions in their appropriate environment, yet both may fail. There's a significant difference between downtime caused by the failure of a reasonably trusted solution as opposed to that due to design flaws, bleeding edge prototyping and flat out bad decisions. The former wouldn't undermine my faith in a service provider, while the latter certainly might.
For a software bug tracking app, I can see why you'd want this. For a store that sells something like cloth diapers, I think it would be confusing (at absolute best) and certainly not any kind of improvement for either Heroku's customer or their customer's customer.
My primary point was that since it can matter, the information should be presented. An underlying assumption being that it can't hurt, but could help.
However, you brought up a point I hadn't fully considered: namely that this information could be directly confusing to some users and by extension undermine their faith in their service provider.
That said, I think the sample error message in the OP is sufficiently clear for the broad spectrum of users. Therefore, I believe presenting users with that message, or something similar, would do more good than harm.
Heroku is generally regarded as a reliable hosting solution. If it was a bargain basement alternative that was expected to fail randomly, then this would in fact be "clownshoes" downtime.
Building in full redundancy behind Heroku is possible, but non-trivial. They are after all being paid to provide a reasonably fault tolerant solution.
Therefore, while not literally unavoidable, it may not make good business sense for a venture to incur the additional expenses associated with implementing full redundancy. Most of their clients probably understand that downtime happens and will be perfectly content as long as every reasonable effort was made to keep the system online.
Edited to add: Communication is key. Business relationships typically don't crumble due to aberrations like this (power outage, hosting provider going down, etc.). However, they do crumble if communication either doesn't occur, or the wrong information is conveyed. That is the crux of this discussion.
When you know it's Heroku, you can deliver valuable uptime to your customers faster because you don't have to spend time testing for bugs and checking server logs--jumping straight to what you would do: contacting Heroku.
Your customers won't be just as screwed--3 hours of down is not the same as 6 hours of down.
Moreover, there may be companies hosting on Heroku who insist that the service is "white label" for any number of reasons; forcing something like this onto them would devalue Heroku to those companies.
But Heroku can't know whether their customers' customers are technical folks or not, and neither can you. Some people do care; the OP, for example, seems to care, and we can assume that's because his customers do in fact care (only the OP can decide if his customers care or not).
Everyone else will be confused about what the hell this "Heroku" thing is.
That said, if they don't use appropriate error codes (maybe 502 or 504 for Heroku issues and 503 for app issues?) they should. But I don't think error messages should mention "Heroku" by name.
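To make the status-code idea concrete, here is a minimal Python sketch. The 502/504 vs. 503 split is only the convention floated in the comment above, not anything Heroku is known to do, and the names `classify_outage` and `user_message` are made up for illustration. The point is that monitoring can tell the two cases apart while the page shown to users never names the host.

```python
# Hypothetical convention from the comment above (NOT documented Heroku behavior):
# 502/504 -> the platform in front of the app failed, 503 -> the app itself failed.
PLATFORM_CODES = {502, 504}
APP_CODES = {503}


def classify_outage(status_code: int) -> str:
    """Map an HTTP status code to a coarse failure source for the developer."""
    if status_code in PLATFORM_CODES:
        return "platform"
    if status_code in APP_CODES:
        return "app"
    if 200 <= status_code < 400:
        return "ok"
    return "unknown"


def user_message(status_code: int) -> str:
    """Vendor-neutral text for end users; the hosting provider is never named."""
    if classify_outage(status_code) == "ok":
        return ""
    return ("We are having technical difficulties and we're working to get "
            "the service restored as quickly as possible.")
```

With something like this, the developer's alerting can branch on `classify_outage(...)` (contact the host vs. check the logs), and customers see the same message either way.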
Delusional. If it's down, it's done, and unless your product is a developer tool, >99% of people won't care why.
For those kinds of customers, they may understand what Heroku is and why their vendor is using it and will definitely make at least some distinction about outage fault.
By Heroku not listing when it's their downtime, they are insulating their reputation as a hosting company from end users, at the expense of the customers already using them. It's a little shady.
I agree that the average end user would probably not care, but most not caring does not mean it's not valuable information to some people. So I see where the original poster is coming from.
And in practical terms, it seems totally theoretical.
In my experience, more information is always valuable. It's not a matter of shifting responsibility, it's a matter of understanding what the problem is and efficiently getting it resolved.
Sending out an error that is incorrect is wrong -- both a theoretical and practical observation.
Instead the article read to me as if the benefit of displaying this message is that the user's frustration might be allowed to shift to the sub-contracted vendor. I find it hard not to be infuriated by that idea.
And yeah, I think it means it's the most highly rated... or at least something very close to that.
Unless the error message is a slightly-less-functional app, no.
OP was playing the role of the user. Paying customers don't care about the implementation details of the products they pay for. They only care about whether or not they work.
The hard fact is that /your app is unavailable/ and I promise that >99% of users won't care why. Did anyone care why Twitter used to Failwhale? It was down, and that sucked, and software exists these days (and has for >20 years) to eliminate single point of failure.
You get what you pay for.
But you're right about the second part. You get what you pay for... unfortunately that's often different than getting what you bought.
What OP is complaining about is that when Heroku has an outage, it says that there's an error within the client's application. I agree that it's the client's responsibility to have an up-and-running app, and that the average user doesn't really care what's going on behind the scenes, but in this case Heroku is still giving out factually wrong and misleading information.
I can imagine users will oftentimes tell Heroku's clients to fix their app when in reality there's nothing they can do.
> "...in reality there's nothing they can do."
This is not a reality I'm familiar with.
Devs don't get that UX is all that matters for most software.
1. You haven't paid them for anything yet.
2. Wrong question!!! It's an opportunity. If I showed up for lunch and despite power being out (probably on the entire block or neighborhood) the proprietors were set up outside making cold sandwiches next to a sign that said "Sorry, power's out so only egg salad" I'd be thrilled. Here are people single-mindedly devoted to my experience.
That's really my point. It's an attitude problem. I want to spend my money with people who hustle when it hurts, and I want to do that for my users. I'm not saying it's not "fair" to close shop and blame the other guy. Sure it's fair, but the person who cares more is gonna eat (or make, in this analogy) your lunch... and the world will be a better place for it.
The relevant part of the analogy is that you, the end user, wouldn't actually know why the restaurant is closed. Could be a power outage, or it could be due to health code violations. Having this information accurately communicated to the customer could impact their willingness to return to the establishment.
I go someplace outside of the power failure
I agree with the content of the article, but you're right -- users don't wanna hear it.
The information is useful. Heroku should provide it. We're done here.
I speak as someone who's worked primarily in healthcare building services where I assure you we held ourselves personally responsible for natural disasters.
So it depends on your app. If my startup lets people take photos of their dessert and paste lolcats on them, then maybe my hosting goes down and I show my users a page that says the server must have farted, who cares.. but the last thing I'd do is show a page that said the people I pay with their money must be fucking up at the moment and we'll all wait together for things to get better.
Point is: My users are not my peers, they're my responsibility and livelihood. Even when something totally out of my control occurs. Fuck, especially when something out of my control occurs.
Yes, this costs money. It's why people accustomed to getting everything for free on the internet can't fathom why larger companies charge six or seven figures for a service that they could roll out themselves by installing an open source package on some Linode VM. If you're paying that kind of money for the reliability, it's because you're extending a promise to your end customers, and the service contract you receive from your provider should come with lots of guarantees and financial penalties if the conditions warranting the price tag aren't met.
This is just plain false. Being prepared comes at a cost. If you over-prepare, then your customers have to pay more for no good reason, and they don't necessarily want to. You have to draw a line and make a judgement call.
There are such things as natural (or political) disasters so serious that it would be extremely stupid to plan for them. And there are other disasters in between this and run of the mill. Again, it's a judgement call. And it's not your "fault" if the customer wants a combination of low price and reliability, and you made a reasonable tradeoff in order to achieve it.
The problem is one of expectations. If you don't say anywhere what you have prepared for, and what you are going to do when something you didn't prepare for happens, you are misleading the customer, as they will rightly assume you have prepared for most ordinary things (a Heroku outage, for instance).
Yes, Heroku should put up a different error notification when the problem is on their side, but I doubt it would make that much of a difference in the eyes of the user.
What if your customer sees Heroku's name, and gets confused?
She starts asking questions like: Who is on the other end? Am I in business with X or with Heroku? Who should I call?
He also flipped out over firewall issues on his new Macbook (the site is down and has taken the internet with it!) and problems with his ISP (some weird adult filter).
Point is, some people expect a site to be a single atomic thing with no dependencies or connections between. Throw anything "unusual" in there and confusion and chaos reign. My ex-boss was somewhat extreme I admit, but I've experienced it to a lesser degree many many times.
I've been googling like mad since this morning, finding a few mostly-unanswered StackOverflow questions and a smattering of blog posts, but I haven't learned much. The only clear-cut answers I've seen are:
1. Hire a sysadmin who knows more than you do (But the whole point is that I want to learn for myself!).
2. Pay for a service that will host in multiple geographic locations for you, and do the switchover (recovery? fallback? I don't know my terms here) for you.
3. A few mentions of "load balancers" and "heartbeat monitors". Sounds self-explanatory, and these are my current terms of googling.
Any suggestions on where to start acquiring this sort of skill? I'm prepared to teach myself anything, but the problem is not knowing the terms for what I want to learn.
EDIT: Well, just watching this thread is helping a bit.
That's like if you chose MySQL as the database, and then when an update had a huge bug that broke your site, you say "totally my fault that I don't have a version of the site that uses PostgreSQL."
As a small company you probably aren't able to easily get your own IP block allocated (that I know of), so BGP isn't really an option and the best you can do is probably DNS switching. Use a good DNS provider and set your TTLs to something low like 30 seconds or 1 minute. Then when you have an outage, change the DNS entry to point to a secondary datacenter, which would have a static error page or a reduced-functionality site. There seems to be some debate around whether low DNS TTLs increase users' request times, but we haven't seen it.
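For anyone trying to picture the moving parts, here is a rough Python sketch of the decision logic only. It assumes some separate heartbeat check reports healthy/unhealthy on a fixed interval, and it does not talk to any DNS provider; the `Failover` class, the threshold of three misses, and the example IPs (taken from the reserved documentation ranges) are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Failover:
    """Tracks heartbeat results and picks which IP the DNS record should hold."""
    primary: str
    secondary: str
    threshold: int = 3  # consecutive failed heartbeats before switching (assumed)
    failures: int = 0

    def record(self, healthy: bool) -> str:
        """Record one heartbeat result and return the IP DNS should point at."""
        self.failures = 0 if healthy else self.failures + 1
        if self.failures >= self.threshold:
            return self.secondary
        return self.primary


def worst_case_switch_seconds(ttl: int, interval: int, threshold: int) -> int:
    """Rough upper bound: time to detect the outage plus time for caches to expire."""
    return interval * threshold + ttl


# Hypothetical addresses from the reserved documentation ranges.
state = Failover(primary="192.0.2.10", secondary="198.51.100.20")
```

With a 60 second TTL, a heartbeat every 10 seconds, and a threshold of 3, `worst_case_switch_seconds(60, 10, 3)` comes to roughly 90 seconds before most users land on the secondary. Note that this sketch flips back to the primary as soon as one heartbeat succeeds; a real setup may want a human to confirm the recovery first.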
There are some companies that will handle the monitoring and switchover for you (Dyn comes to mind) but we prefer to switch over manually for the time being. We have a Big Red Button Sinatra app that reports the status of the site and allows you to fail over to the secondary and recover when the primary returns; I'm planning on open sourcing it once it gets some documentation.
I think the reason failover doesn't get talked about as much in the startup world is just because it's hard to do and the costs are disproportionately high for a small company unless availability is really critical to you. For most people, just using multiple availability zones on EC2 is probably sufficient.
The more generalized "Cloud plus Dedicated" fallback/load balancing seems fairly involved, and raises a lot of other questions, but at least I've got a path to follow now. Also would be more expensive, as a backup server might just be hanging around doing nothing at times.
Then again, it would pay for itself in satisfied customers after just a single event.
Unless your application absolutely must be up with higher availability than Heroku provides, it's probably not worth the effort. The easiest thing to do is to use something like Cloudflare in front of Heroku, so at least when Heroku is down, you can serve a static page to customers informing them of the problem and estimated time to fix.
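A minimal sketch of what a layer in front of the app does in that case, in Python. This is not Cloudflare's API or configuration, just the general logic: try the origin, and if it errors out or answers with a 5xx, hand back a static page instead. `fetch_origin` is a hypothetical callable standing in for the real request to the app.

```python
from typing import Callable, Tuple

# Static page served while the origin is unavailable (wording is an example).
FALLBACK_PAGE = (
    "<h1>We are having technical difficulties</h1>"
    "<p>We're working to get the service restored as quickly as possible.</p>"
)


def serve(fetch_origin: Callable[[], Tuple[int, str]]) -> Tuple[int, str]:
    """Return the origin's (status, body), or a static 503 page if the origin is down."""
    try:
        status, body = fetch_origin()
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return 503, FALLBACK_PAGE
    if status >= 500:
        # Origin is reachable but failing; hide its error page from users.
        return 503, FALLBACK_PAGE
    return status, body
```

Returning 503 with the fallback (rather than 200) keeps crawlers and monitors from treating the outage page as real content.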
Heroku's error message could be friendlier, but it currently contains only words that any user can understand, which reassures your customers that even though the service they are looking for is unavailable, there is nothing they could have done to improve the situation. Your customers might leave with a lowered opinion of your service, but your app doesn't make them feel ashamed of themselves, which is a much better outcome.
(With my developer hat on, Heroku outages are fun: our internal switchboard at http://www.pagerduty.com lights up like a christmas tree)
So only use Heroku if:
a) Uptime is non-critical & you just don't want to deal with setting up a server
b) Uptime is non-critical & you don't know how to set up a server
If reliability is so important, make it a priority instead of just expecting stuff to work or asking for a more politically correct error message. Which leads me to my next point: who cares about the ERROR message? The damage has been done by that point and half the people won't bother to read any further. Cue sounds of people clicking back buttons as fast as they can.
It hardly makes sense if Heroku says "well, it's your fault for trusting us."
If legitimate downtime happens often enough that someone would actually internalize the difference between your failures and Heroku's, you have bigger problems than your error page.
To my surprise, this blog post hit the top spot on HN at least briefly. My blog started throwing some app errors.
I've had a couple of hit HN stories on my blog without a problem, and it was hosted from my apartment on an old server with 256MB of RAM. Now, it is static pages served through nginx, but I'm pretty sure that a few thousand hits shouldn't require 10 Heroku dynos to not fall over.
Kids these days. (the mindset, not the age)
After all, it wouldn't be fair for Heroku to be blamed just because a piece of networking equipment failed - the user should be informed which vendor is at fault, and in turn, which supplier the failed component within said equipment came from.
I think it would be ideal to allow you to customize these messages to make things easier, but I can't imagine the infrastructure they would need to have in place to support this.
The option presented by the article is a lot simpler.
If you are doing that, you might as well write an app against another platform.
Also, I don't think they should tell everyone Heroku is hosting it, so I don't think that is a good solution.
You kids and your lingo these days....
/back in my day/ we used to have servers. REAL, PHYSICAL servers :)