1. If you add a stale-serving directive like "stale-if-error" (the response-side counterpart to the "max-stale" request directive) to your Cache-Control header, you can tell your CDN to hold on to things longer in the event of an outage. The CDN will look at the cache time you set and check in as normal when that expires, but if it gets back no response, or a 503 (Service Unavailable, the maintenance status code; if you work with CDNs this one is your friend!), it will continue serving the stale content for as long as you tell it.
2. Let's say you're beyond that, or your site is too dynamic. Instead of setting up an error page that responds to everything, set up a redirect with the 302 status code (so crawlers and browsers know it's a temporary thing). Point that redirect at an error page and you're golden. The best part is that these types of requests use minimal resources.
What I do is keep a "maintenance" host up at all times that responds to a few domains. It responds to all requests with a redirect that points to an error page that issues the 503 maintenance code. Whenever there's an issue I just point things at that and stop caring while I deal with the problem. I've seen webservers go down for hours without people realizing anything was up, although with dynamic stuff everything is obviously limited. The other benefit to this system is that it makes planned maintenance a hell of a lot easier too.
Oh, another thought: you can use a service like Dyn (their enterprise DNS) or EdgeCast to do "failover DNS". Basically, if their monitoring systems notice an issue, they just change your DNS records for you to point at that maintenance domain. You can also trigger it manually for planned maintenance.
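The maintenance-host setup described above can be sketched with Python's standard library. This is a toy illustration, not anyone's production setup; the `/maintenance` path and the one-hour `Retry-After` value are my own assumptions:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

ERROR_PAGE = "/maintenance"  # hypothetical path for the error page


class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == ERROR_PAGE:
            # The error page itself is served with 503, so browsers see a
            # friendly message while crawlers learn the outage is temporary.
            body = b"<h1>Down for maintenance, back soon.</h1>"
            self.send_response(503)
            self.send_header("Retry-After", "3600")  # assumed: retry in an hour
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Every other request gets a cheap temporary (302) redirect
            # pointing at the error page.
            self.send_response(302)
            self.send_header("Location", ERROR_PAGE)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def run(port=8080):
    """Serve forever; point DNS at this host during an outage."""
    HTTPServer(("", port), MaintenanceHandler).serve_forever()
```

Because every response is either a tiny redirect or a small static page, a single cheap box can absorb the traffic of a much larger site while it's down.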
Is pleasing Google now the only reason to obey the HTTP spec?
For example, I had a hard time telling people that they should use a proper HTTP redirect from their domain "example.com" to "www.example.com" (or vice versa), instead of serving the content on both domains. All arguments about standards, best practice, etc. were unconvincing. But since Google started to punish "duplicate content", I've never had problems convincing people again.
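The canonical-host redirect being argued for can be sketched in a few lines. A 301 (Moved Permanently) is the right code here, since it tells crawlers to fold both hostnames into one index entry; the domain names are placeholders:

```python
def canonical_redirect(host, path, canonical="www.example.com"):
    """Return (status, headers) redirecting a non-canonical host to the
    canonical one, or None if the host is already canonical and the
    content should be served directly."""
    if host == canonical:
        return None  # already on the canonical host: serve the page
    # 301 Moved Permanently: browsers and crawlers update their records,
    # so "duplicate content" across the two hosts is avoided.
    return (301, {"Location": "https://%s%s" % (canonical, path)})
```

In practice this lives in the web server config (a rewrite rule) rather than application code, but the semantics are the same.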
It's simply good practice to get the HTTP status codes correct in the ways you outline.
If I'm being honest, it's off-putting to watch the community-wide apologist response to what would normally be outrage over poor execution.
But you nail it. Had Gmail, Reddit or any other unpaid service gone down like this, the uproar would be heard from another galaxy.
In what universe? If email were to stop working in most major corporations that I've worked in for the last 8+ years, the company would basically come to a halt.
Email, for many, many companies, is the message/workflow bus, and if it stops, communication comes to a halt.
It is, after electricity and the network, the one essential function in a company in 2013.
Telephones, photocopiers, and printers can all cease functioning with little impact in most technology companies, but not email.
I don't know where you keep coming up with that. Email is relatively easy to make highly available, and an Exchange server in 2014, configured with even a modicum of skill, will likely keep running smoothly for the next 10 years. It's as close to five nines of availability as you can get on a software system.
"There has been no big improvements to the reliability email the last 40 years. "
That's just silly. Take 5 minutes to read through http://en.wikipedia.org/wiki/Email and you'll see what improvements have been made in the last 40 years.
You also mention competence, which is available in variable quantities. You might be able to keep Exchange 2014 running solidly, but I've seen people doing scary things with MS SBS Server 2000 (and the Exchange that comes with it).
Or data -- they tossed a day or so of submissions and comments when restoring from backup, AFAICT. I'm not saying that's a problem; 4chan loses data and works just fine.
Though the article is correct, with everything else that was going on response codes and cache headers were the least of my worries.
I think the best takeaway is that you will go down at some point, so it's best to have a reasoned plan in place for when you do. Handling it in the heat of the moment means you'll miss things.
I didn't post to try and make HN look bad, using the wrong http status code isn't a huge deal or anything. I just wanted to take the opportunity to discuss http response codes, an issue near to my heart. (In my day job at an academic library, the fact that most of the vendors we deal with deliver error pages with 200s does interfere with things we'd like to do better).
Thanks for the reply!
edit: Ack, I just realized that item_id got reset back to 7015126 on the reboot. My data matches HN up to 7015125, and then diverges after that.
After a semi-random sampling, the comments file appears to contain nothing but comments pre-crash.
The submissions however got clobbered a little by the crawler at some point. There are some submissions in there pre-crash and some post-crash; I think everything's OK from 7015172 on, which only leaves 15 possibly damaged rows, and of those, I'd expect most of them didn't have id collisions. Sorting out the old stuff from the new stuff could be manually done.
(Please let me know if there's anything I should be concerned about in those, or if they shouldn't be posted for some reason, or something. I'm recovering from flu and am still not entirely all here.)
PG doesn't care about HN's search listings, so there are no drawbacks to doing that.
Yup. re: "Why does HN have a relatively low Google PageRank?"
"Probably because we restrict their crawlers. But this is an excellent side effect, because the last thing I want is traffic from Google searches."
If you need to search for something that you know was on HN, HNSearch is a great tool. I use it all the time.
Am I the only one who finds that puzzling?
This isn't Fight Club. It's not even Entrepreneur Club. It's a bunch of generally smart people talking about technology, with an emphasis on making money from it. It's one of my favorite sites, and I love it, but it's not an invitation-only club, is it?
(I also find it weird that one of the go-to sites for web-savvy people would be like, "yeah, screw status codes and how the open, linked, web is supposed to work".)
To be clear, I'm not Protesting a Great Evil. I just find it puzzling, as in, "That's odd, I must not understand what this is all about, after all."
That's not how it is right now so I'm not sure if I am remembering it correctly.
A lot of the time you have people who are direct parties to <insert thing here> come and talk about it, only for it to become forever inaccessible because Google can't get its mittens on it.
Let's not even get started on how some URLs expire.
Agreed - so is Pinboard's search for #hn tags.
How does that make sense? As if CloudFlare would honor status codes but not take advantage of cache headers (which in this case stipulated an absurd 10-year expiration).
Both of those would be much better than returning 200 OK.
At the same time there are many erroneous implementations (intentional and unintentional) that, as you imply, just cache any 200 without checking cache headers. This is also a good practical reason to avoid doing this.
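A correct cache has to combine both signals, the status code and the cache headers, before storing anything. Here is a toy sketch of that decision; it is a deliberate simplification of the real HTTP caching rules (RFC 7234), covering only the directives mentioned in this thread:

```python
def is_cacheable(status, cache_control):
    """Toy cacheability check: store only successful responses whose
    Cache-Control header actually permits and bounds storage."""
    directives = [d.strip() for d in cache_control.lower().split(",") if d.strip()]
    if "no-store" in directives:
        return False
    # An error page must never be cached as if it were the real content,
    # no matter how generous its cache headers are.
    if status != 200:
        return False
    # Require an explicit freshness lifetime before storing.
    return any(
        d.startswith("max-age=") or d.startswith("s-maxage=") for d in directives
    )
```

Under these rules the problematic combination from this outage, an error page with a 200 status and a 10-year `max-age`, would still be cached; only the correct 503 status would have prevented it. That is the practical argument for getting the code right.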
I finally thought to try a reload, and it gave me the page.
I've more than once found myself Googling old HN threads I'd like to find, but can't. Google's search (and site search) is miles better than HN's, but HN intentionally limits Google's crawl rate, thus limiting the amount of content crawled and indexed.
Standards are important.
What the hell, I'll do it.
Now that our brittle forum is back, let's get back to work nitpicking the Android UIs that aren't quite beautiful enough! This is not enough drop shadows!
Edit: CloudFront -> CloudFlare
200 means that the page loaded as intended (which it did). It turns out that some of the page's content (the interesting stuff) was unable to be loaded, and the site's content reflected that.
A 503 would be appropriate if there were a server problem, which might actually have been the case, but with CloudFlare's landing page there was not actually a server problem (since CloudFlare served the substitute content properly without error).
The difference between your status.heroku.com example and this is that in the former case you are seeking a page that tells you about their status, while in this case you were seeking HN's index but instead got a page about status - because there was a problem preventing you from getting what you wanted.
Suppose HN consisted of two content panes and one content pane was unavailable, and the unavailable pane was replaced by a message indicating a partial outage, should HN return a non-200 response code then?
200 means "this response is being served as intended without the server serving it having an error". With Cloudflare acting as a hybrid caching proxy and static site, it is appropriate for it to return a 200.
If HN were not using Cloudflare, the underlying server would probably just show an automatically generated error message of some kind and return a 503 status.
See, if I ask for, say, `https://news.ycombinator.com/item?id=7015129`, and I _get_ the page that URL names (this thread), then that's a 200. Whether it was delivered from a Cloudflare cache or not, I got that page.
If I ask for `https://news.ycombinator.com/item?id=7015129` and I get a "Sorry, this service is temporarily unavailable" message instead, I did not get what I asked for (200 "OK"); I got something else, because the thing I asked for was temporarily unavailable: 503.
It's all about what the URL identifies, and whether it was in fact successfully delivered or not.
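This distinction is exactly what lets a programmatic client do the right thing. A minimal sketch with the standard library, assuming the caller wants to retry later on a temporary outage:

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_page(url: str) -> Optional[bytes]:
    """Return the page body on 200, None if the resource is temporarily
    unavailable (503), and raise on any other error."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()  # 2xx: we got the thing the URL names
    except urllib.error.HTTPError as err:
        if err.code == 503:
            return None  # temporarily unavailable: caller can retry later
        raise  # 404, 500, etc. are real failures worth surfacing
```

If the server hands back its outage page with a 200 instead, this client (and every cache and crawler built the same way) happily treats the error message as the real content.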
When we say API, we usually think of something like the Twitter API: we send it JSON or XML and get back something easily parseable.
Sites like HN don't have that kind of service. Instead, they return HTML. That's fine. No one said an API can't return HTML.
When some part of the Twitter API becomes unavailable, the API server should return 503 when we ask those endpoints for a response. If those broken endpoints also power some part of the frontend, then the frontend will not work properly. For example, viewing thumbnails from the dashboard may be down, but the rest of the page is functioning. In that case, you can't send 503 from the frontend; it doesn't make sense. So a crawler reading Twitter.com shouldn't see a 503 when it hits the home page.
When HN is down, it should return 503, for two reasons. First, HN is not functional anymore. The page you see may just be a maintenance page configured in Nginx, much like a 404 error page. Second, HN is itself an API service. When it is broken, it is broken; it just doesn't work anymore. When you try to access it through a Python script, it doesn't return in the format you expect. When you debug, you realize it is not returning anything like you were expecting, because most of the HTML structure is gone. This is not an API format change; it's simply that the backend is not functional anymore.
And semantically, when your site is in maintenance mode, 503 makes sense. 200 is not that evil; it just doesn't give anyone a better clue.
If that was intended, then it was a poor, misguided, unhelpful intention.
But an HN worker has already said it was not intended; there was not much intention involved at all. They just had other things to worry about (getting the site back up) and weren't thinking about it, due to a lack of pre-planning for how to handle an outage. https://news.ycombinator.com/item?id=7016141
At any rate, your ideology of HTTP status codes does not seem to match that of the actual HTTP designers, or of anyone trying to actually use HTTP status codes for anything. If you aren't going to use HTTP status codes for anything, then it hardly matters what they are, so there's no point in arguing about it. But as soon as you try writing software that uses HTTP status codes for anything, you will start hating sites that give you error messages with 200 OK response codes.
For something like a pure REST API, basic status codes are fine. It's when you start breaking the REST abstraction (which a human-readable landing page surely does) that it gets tempting to start misusing status codes.
For any kind of non-RESTful or procedural API, HTTP status codes are simply not adequate and application-specific error handling is necessary (HTTP response 200, app-specific-error 999, etc.).
If any meaning is going to be taken from them at all, then the difference between "OK" and "Server Error" seems pretty basic and fundamental.
But when you think about it, for a pure REST API you really only need classic HTTP response codes.
Response codes only make sense in the context of a resource. Once you introduce query strings or start to stretch the meaning of HTTP verbs the abstraction starts to leak.
So when designing an API it's smart to just handle errors at the application level rather than at the protocol level.
Protocol-level errors are for things that are outside of the application. An error response with data validation errors should return a 200, for example.
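The split described above, protocol-level codes for transport problems and application-level codes for domain problems, might look like this. The `999` code and the envelope field names are illustrative, echoing the "HTTP response 200, app-specific-error 999" example earlier in the thread:

```python
import json


def validation_response(errors):
    """Return (http_status, body) for a request that reached the app.

    The transport succeeded either way, so the HTTP status is 200;
    whether the submitted *data* was valid is reported inside the
    application-level envelope.
    """
    if errors:
        body = {"status": "error", "app_code": 999, "errors": errors}
    else:
        body = {"status": "ok", "app_code": 0}
    return 200, json.dumps(body)
```

A 503, by contrast, would mean the request never reached a working application at all, which is a different failure entirely.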
> The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:
Is a query that turns up no documents, when there are indeed no documents, a success? I would argue yes, as the search service executed the query accurately... in fact, if the query turned up documents when it shouldn't have, that would probably be a problem...
But the way the 404 is worded, it would also fulfill the meaning of "no results".
> The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
Which is almost always 200. Partially because that's what everyone will expect because that's what everyone else does.
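In code, the two cases discussed above look like this. An empty result set is still a successful query, while asking for a specific document that doesn't exist matches the RFC's 404 wording ("the server has not found anything matching the Request-URI"). The in-memory document store is purely for illustration:

```python
DOCS = {"1": "first post", "2": "second post"}  # toy document store


def search(query):
    """A search is a success even with zero hits: the query itself ran
    correctly, so return 200 with the (possibly empty) result list."""
    hits = [doc_id for doc_id, text in DOCS.items() if query in text]
    return 200, hits


def get_document(doc_id):
    """Fetching a specific resource that doesn't exist is a 404: nothing
    matches the Request-URI."""
    if doc_id in DOCS:
        return 200, DOCS[doc_id]
    return 404, None
```

The distinction is between "the URI names a collection of results, which happens to be empty" (200) and "the URI names a thing that isn't there" (404).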