
Hacker News down, unwisely returning HTTP 200 for outage message - pcvarmint
http://bibwild.wordpress.com/2014/01/06/hackernews_http_200_outage/
======
tedivm
As people have mentioned, this site is an exception to how things are normally
done, in that PG actively does not care about search engine results. However,
for those who are interested, here are a few ways you can handle a situation
like this.

1\. If you add the "stale-if-error" directive to your Cache-Control header,
you can tell your CDN to hold on to content in the event of an outage. The CDN
will look at the cache time you set and check in as normal when that expires,
but if it gets back no response, or a 503 (Service Unavailable, the server
maintenance status code; if you work with CDNs this is your friend!), it will
continue serving the stale content for as long as you tell it.
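The directives in question come from RFC 5861. A minimal sketch in Python of
what such headers might look like (the function name and the specific numbers
are placeholders of mine, and directive support varies by CDN):

```python
# Sketch: response headers that ask a CDN to keep serving stale content
# during an origin outage, using the Cache-Control extensions from RFC 5861.
# Support for these directives varies by CDN; check your provider's docs.

def outage_tolerant_headers(fresh_secs=300, stale_secs=86400):
    """Build headers for a response that stays usable through an outage.

    fresh_secs: how long the CDN treats the cached copy as fresh.
    stale_secs: how long it may keep serving the stale copy when the
                origin errors out (e.g. returns 503) or doesn't answer.
    """
    return {
        "Cache-Control": (
            f"public, max-age={fresh_secs}, "
            f"stale-while-revalidate=60, "
            f"stale-if-error={stale_secs}"
        )
    }

print(outage_tolerant_headers()["Cache-Control"])
# public, max-age=300, stale-while-revalidate=60, stale-if-error=86400
```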

2\. Let's say you're beyond that, or your site is too dynamic. Instead of
setting up an error page that responds to everything, set up a redirect with
the 302 status code (so crawlers and browsers know it's a temporary thing).
Point that redirect at an error page and you're golden. The best part is that
these types of requests use minimal resources.

What I do is keep a "maintenance" host up at all times that responds to a few
domains. It responds to all requests with a redirect that points to an error
page, which issues the 503 maintenance code. Whenever there's an issue I just
point things at that and stop worrying while I deal with the problem. I've
seen webservers go down for hours without people realizing anything was up,
although with dynamic content the benefits are obviously limited. The other
benefit of this system is that it makes planned maintenance a hell of a lot
easier too.
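A minimal sketch of that maintenance-host pattern using only Python's standard
library (the port, paths, and page text are placeholders of mine, not details
from the setup described above):

```python
# Sketch of a "maintenance host": every request gets a cheap 302 redirect to
# an error page, and the error page itself returns 503 with a Retry-After
# hint so crawlers and caches know the outage is temporary.
from http.server import BaseHTTPRequestHandler, HTTPServer

class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/maintenance":
            body = b"Down for maintenance; back shortly."
            self.send_response(503)                 # the real outage status
            self.send_header("Retry-After", "300")  # hint: retry in 5 minutes
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Everything else gets a temporary redirect, so browsers and
            # crawlers know the real content will return at the same URL.
            self.send_response(302)
            self.send_header("Location", "/maintenance")
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

# To run: HTTPServer(("", 8080), MaintenanceHandler).serve_forever()
```

The 302s cost almost nothing to serve, which is why a single small box can
absorb the traffic of a much larger site while it's down.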

Oh, another thought: you can use a service like Dyn (their enterprise DNS) or
EdgeCast to do "failover DNS". Basically, if their monitoring systems notice
an issue, they change your DNS records for you to point at that maintenance
domain. You can also trigger it manually for planned things.

~~~
tomkin
This is a strange reaction. I agree that perhaps "this is different", but I am
hoping I can assume that the next time I see an article on HN about another
website going down, or losing its data, the HN community shows as much
compassion and understanding. In my experience, when it's not related to the
work of pg, there tends to be a lot of "you should have done X and there is no
excuse for not doing Y!!"

If I'm being honest, it's off-putting to watch a community-wide apologist
response to what would normally be outrage at poor execution.

~~~
gtirloni
I think it's an expected response from a community that is largely composed of
people who desire YC-bucks.

But you nailed it. Had Gmail, Reddit, or any other unpaid service gone down
like this, the uproar would be heard from another galaxy.

~~~
tedivm
People depend on email. This is just a link-aggregation and discussion site. I
think people should have some sense of perspective here.

~~~
DanBC
Having spent months telling people that email is not reliable, can fail at any
minute, and must not be relied on, and then having to put up with the fallout
when the shitty provider broke something, I can confirm that people get
_angry_ when email breaks.

~~~
ghshephard
"email is not reliable and can fail at any minute and must not be relied on,"

In what universe? If email were to stop working in most of the major
corporations I've worked in over the last 8+ years, the company would
basically come to a halt.

Email, for many, many companies, is the message/workflow bus, and if it stops,
communication comes to a halt.

It is, after electricity and the network, the one essential function in a
company in 2013.

Telephones, photocopiers, and printers can all cease functioning with little
impact in most technology companies, but not email.

~~~
Moru
And yet email is the most failure-prone thing out there. It was not meant to
be relied upon to the degree it is today. And yes, many companies grind to a
stop as soon as email is down. There have been no big improvements to the
reliability of email in the last 40 years. I mean, what is the date format in
email? Any sort of current standard? What date-stamp do most email readers
trust? Ever had an email that says it arrived in 1997? I still get those
sometimes.

~~~
ghshephard
"Email is the most failure-prone thing out there"

I don't know where you keep coming up with that. Email is relatively easy to
make highly available, and an Exchange server in 2014, configured with even a
modicum of skill, will likely keep running smoothly for the next 10 years.
It's as close to five nines of availability as you can get in a software
system.

"There have been no big improvements to the reliability of email in the last
40 years."

That's just silly. Take 5 minutes to read through
[http://en.wikipedia.org/wiki/Email](http://en.wikipedia.org/wiki/Email) and
you'll see what improvements have been made in the last 40 years.

~~~
DanBC
Deliverability of email relies on a number of factors outside your control.

You also mention competence, which is available in variable quantities. You
might be able to keep Exchange 2014 running solidly, but I've seen people
doing scary things with MS SBS Server 2000 (and the Exchange that comes with
it).

------
kogir
This was all me. I probably should have thought about it more, but I just
wanted to make it clear that we knew something was wrong and were working on
it. Load was not a concern.

Though the article is correct, with everything else that was going on,
response codes and cache headers were the least of my worries.

I think the best takeaway is that you will go down at some point, so it's best
to have a reasoned plan in place for when you do. Handling it in the heat of
the moment means you'll miss things.

~~~
thaumaturgy
If it becomes a big deal, and you can't get the data from ThriftDB for some
reason, I've got a copy of HN data (submission & comment content, user,
points, item date) up to id 7018491, a comment by kashkhan, time-stamped
2014-01-05 23:56:34.

edit: Ack, I just realized that item_id got reset back to 7015126 on the
reboot. My data matches HN up to 7015125, and then diverges after that.

~~~
emillon
Can you make it available somewhere? I'd be interested in that data, and sure
that others would be too. Thanks!

~~~
thaumaturgy
Sure. This is _very_ temporary, I'll be removing the links sometime tomorrow:

[http://www.associatedtechs.com/tmp/hn_submissions_7015126.sq...](http://www.associatedtechs.com/tmp/hn_submissions_7015126.sql)

[http://www.associatedtechs.com/tmp/hn_comments_7015126.sql](http://www.associatedtechs.com/tmp/hn_comments_7015126.sql)

After a semi-random sampling, the comments file appears to contain nothing but
comments pre-crash.

The submissions however got clobbered a little by the crawler at some point.
There are some submissions in there pre-crash and some post-crash; I think
everything's OK from 7015172 on, which only leaves 15 possibly damaged rows,
and of those, I'd expect most of them didn't have id collisions. Sorting out
the old stuff from the new stuff could be done manually.

(Please let me know if there's anything I should be concerned about in those,
or if they shouldn't be posted for some reason, or something. I'm recovering
from flu and am still not entirely all here.)

~~~
emillon
Thanks!

------
dangrossman
HTTP 200 = "Cloudflare, please cache this status message instead of passing
through a million requests to our dead server while it's busy restoring a
backup".

PG doesn't care about HN's search listings, so there are no drawbacks to doing
that.

~~~
Crito
> _" PG doesn't care about HN's search listings"_

Yup. re: _" Why does HN have a relatively low Google PageRank?"_

 _" Probably because we restrict their crawlers. But this is an excellent side
effect, because the last thing I want is traffic from Google searches."_

[https://news.ycombinator.com/item?id=5808990](https://news.ycombinator.com/item?id=5808990)

If you need to search for something that you know was on HN, HNSearch is a
_great_ tool. I use it all the time.

~~~
6cxs2hd6
> _" Probably because we restrict their crawlers. But this is an excellent
> side effect, because the last thing I want is traffic from Google
> searches."_

Am I the only one who finds that puzzling?

This isn't Fight Club. It's not even Entrepreneur Club. It's a bunch of
generally smart people talking about technology, with an emphasis on making
money from it. It's one of my favorite sites, and I love it, but it's not an
invitation-only club, is it?

(I also find it weird that one of the go-to sites for web-savvy people would
be like, "yeah, screw status codes and how the open, linked web is supposed
to work".)

To be clear, I'm not Protesting a Great Evil. I just find it puzzling, as in,
"That's odd, I must not understand what this is all about, after all."

~~~
Crito
I can't speak for PG, but I think the general idea is that a slow influx of
new users is less likely to alter the nature of HN as everyone has a chance to
acclimatize (avoiding some sort of Eternal September), and the people who
really "need" to be on HN (people interested in startups I guess?) will know
about HN already, or be told about it. That last part might be a little
"fightclub-ish" I guess, but it seems to be working alright.

~~~
buttsex
Couldn't he just turn off registrations for new accounts? Not saying he needs
to get HN to the top for a "startup" search query. I found HN by a Google
search while looking for a good laptop to run Linux on.

~~~
supergauntlet
That happens. When too many people register accounts, registration is locked
for the rest of the day and the "create account" option is no longer there,
only login.

~~~
Crito
IIRC sometimes the "Create Account" section of the login page is missing, but
can still be accessed through
[https://new.ycombinator.com/submit](https://new.ycombinator.com/submit)

That's not how it is right now so I'm not sure if I am remembering it
correctly.

~~~
kgermino
IIRC (I might search for it later) that was a spambot fix. Apparently it was
fairly effective - I presume the bots were smart enough to find the 'login'
link on the front page and then register an account from there, but not much
else.

------
yawboakye
Browsers tend to cache 200 OK responses. When HN came back up (as reported on
Twitter) I kept getting the error page until I busted the cache and reloaded.
Yup, that's what 200 OK for an error page can cause: a regular reload will
still show your _down_ page.

~~~
tedd4u
Well, by the letter of the RFC, browsers and middleboxes (all the invisible
caching proxies out there at your ISP, etc.) are only supposed to cache a
response if the cache/expires headers are set for that, but 200 is still a bad
choice for _most_ people running a site, for the other reasons listed above
like crawlers/indexers. 5xx is correct for a problem on the server, usually
500, 503, or 504
([http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html)).

At the same time there are many erroneous implementations (intentional and
unintentional) that, as you imply, just cache any 200 without checking cache
headers. This is also a good practical reason to avoid doing this.
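That by-the-letter rule, that a well-behaved shared cache only stores a 200
when the headers actually opt in, can be sketched as a small predicate. This
is a deliberate oversimplification of the RFC 2616 caching rules, and the
function is hypothetical:

```python
# Rough sketch: should a shared cache store this response at all?
# Real HTTP caching rules (RFC 2616 sec. 13) have many more cases;
# this only captures the "opt-in via headers" idea discussed above.

def cache_eligible(status, headers):
    cc = headers.get("Cache-Control", "").lower()
    if "no-store" in cc or "private" in cc:
        return False          # the response explicitly opted out
    if status != 200:
        return False          # simplification: only consider caching 200s
    # Opt-in signals: an explicit freshness lifetime or an Expires date.
    return "max-age" in cc or "s-maxage" in cc or "Expires" in headers

print(cache_eligible(200, {"Cache-Control": "public, max-age=60"}))  # True
print(cache_eligible(200, {}))                                       # False
print(cache_eligible(503, {"Cache-Control": "max-age=60"}))          # False
```

The broken implementations described above effectively skip the header checks
and treat any 200 as cacheable, which is exactly how an outage page gets
pinned in caches.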

------
TeMPOraL
I know that pg "actively does not care about search engine results", but the
HTTP spec has _other applications besides Google PageRank_. It's hard to build
amazing new technologies and improve the Web if people keep ignoring the
standards without a good technical reason. Please, for the sake of the example
set for others, send the proper HTTP codes.

------
joshsegall
This isn't bad just for Google. RSS aggregators were also getting the
"everything's fine" message. I thought I had a bug in my aggregator until I
went to the site and realized it was down.

------
ignostic
He's right that for most sites this would be undesirable, but PG has stated
that they aren't looking for a lot of Google traffic. Then again, just because
it doesn't matter to PG doesn't mean it doesn't matter at all.

More than once I've found myself Googling for old HN threads I'd like to find,
but can't. Google's search (and site search) is miles better than HN's, but HN
intentionally limits Google's crawl rate, thus limiting the amount of content
crawled and indexed.

------
mountaineer
Just hit HN this morning and got the downtime message. Then I remembered this
post and did a hard refresh to get back to normal. Browsers certainly
aggressively cached it.

~~~
HNJohnC
Same here. I'm not sure why that isn't touted as a more important reason to
provide the correct status codes, rather than simply saying Google search
isn't important.

Standards are important.

~~~
conanbatt
Completely agree. I didn't even think of hard-refreshing until I saw the
"server is back up" status on Twitter.

------
eande
I am not a news junkie, but I realized how dependent my newsfeed is on Hacker
News. Seeing the outage message about 6-8 times reminded me of why I keep
coming back and reading the quality entries of these forums.

------
archildress
Welcome back, HN - I had an unusually productive (yet unstimulating) day at
the office.

~~~
Udo
Don't take this the wrong way, but after seeing hundreds of identical HN-
outage productivity jokes/references on Twitter today, I really hope this is
the last one I'll see in a while.

~~~
frou_dh
I finally added tail-recursion optimisation to my Lisp interpreter while HN
was down. Unless I hear otherwise, I'm taking the Onion crown!

------
cyanbane
I was 8% more productive today. Down-vote at will.

------
Yuioup
BTW I just realised that Chrome has been serving me a cached version of HN the
whole time. Didn't realise it was up again. When did it go back on-line? An
hour ago? More?

~~~
dntrkv
Were you a member at computer-forums.net like 8 years ago? I know it's a long
shot, but I remember someone with the same username as yours. I know it's not
the most unique username, but I thought I would give it a shot.

~~~
Yuioup
No, sorry. I did come up with the name a long time ago, though (like 1999 or
something). I've used this username on /. for a while.

------
ck2
By the way, HN still says it is down at this URL:

[https://news.ycombinator.com/](https://news.ycombinator.com/)

~~~
maxerickson
The down pages had long cache times, you just need to refresh.

------
creativityland
@HNStatus helped keep me updated, but it wasn't posted on the maintenance page
for everyone until the end of the day.

~~~
jacalata
And because of the caching, it wasn't posted on the maintenance page at all if
you visited before it was added. (I was seeing it on one machine and not the
other).

------
fmax30
Now I know why I had to refresh when I first opened HN 10 minutes ago, even
though it was up at that time.

------
garthdog
I didn't know this was up until somebody told me to ctrl-reload. Nice one with
the 10yr cached soft-500 page. :P

Now that our brittle forum is back, let's get back to work nitpicking the
Android UIs that aren't _quite_ beautiful enough! There are not enough drop
shadows!

------
datawander
Even if the author is correct that Google's ranking of Hacker News will be
affected by just 24 hours of downtime, wouldn't the algorithm correct itself
back to normal over the next 24 hours?

------
foobarian
Seems like a fair point in principle, except in this case HN is one of the few
sites I don't need Google to get to, and I don't care about any other tools
that might rely on the returned status codes.

~~~
sysop073
Tools like web browsers?

~~~
JetSpiegel
Of course. foobarian sends an email with the URL to a service and it returns
the webpage, which he reads in emacs.

------
wiradikusuma
I know it's supposed to make you more productive when it's down (I was), but I
suddenly felt clueless during my commute (which can take 2 hours total).

------
anonfunction
For anyone interested in HTTP statuses this is a great resource:

[http://httpstatus.es/](http://httpstatus.es/)

~~~
factorizer
Those people certainly don't know their Latin. Just kidding.

------
belorn
Question: is HNSearch still being worked on? It has broken "link" and "parent"
links for search results.

------
dragon1st
I just realized I'm addicted to HN; I've kept refreshing the page millions of
times :) Thanks, HN

------
Gonzih
Some people were speculating that HN banned Google and other search engines,
but at least from /robots.txt I can't see that. Do they do any IP-based
filtering? Does anyone have information on that? I'm just curious.

------
jwilliams
Maybe this was a prior error message, but the original CloudFlare "Origin
Server" error was returning a 520, which is a CloudFlare-custom HTTP status
code.

Edit: CloudFront -> CloudFlare

~~~
garindra
It's CloudFlare, not CloudFront.

------
grandalf
By the logic of this blog post, a page like status.heroku.com should return a
503 when Heroku is experiencing downtime and a 200 otherwise.

200 means that the page loaded as intended (which it did). It turns out that
some of the page's content (the interesting stuff) couldn't be loaded, and the
site's content reflected that.

A 503 would be appropriate if there were a server problem, which might have
actually been the case, but with CloudFlare's landing page there was not
actually a server problem (since CloudFlare served the substitute content
properly without error).

~~~
pizzeys
I disagree with this. By _that_ logic, 404 pages should also return 200,
because the error page stating that the content couldn't be found was indeed
rendered successfully.

The difference between your status.heroku.com example and this one is that in
the former case you are seeking a page that tells you about their status,
whereas in this case you were seeking HN's index but instead got a page about
status, because there was a problem preventing you from getting what you
wanted.

~~~
randallsquared
At my work, we have had arguments about what searches in a REST API should
return in the case that no results (literally, in our parlance, "no
documents") were found. Is that a 404 or a 200?

~~~
oneeyedpigeon
Does your search page display a list of results, or does it redirect to the
best result? If the former, I'd say that a list of zero is still a valid
search results 'document', and should therefore return a 200. But the latter
could certainly return a 404, although I doubt that's how your search actually
works :)
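That distinction, an empty result list is still a valid document while a
missing single resource is a 404, can be expressed in a short sketch (toy
in-memory data and function names of my own, not anyone's real API):

```python
# Sketch: list endpoints return 200 even for zero hits; single-resource
# lookups return 404 when the resource doesn't exist.

DOCS = {1: "intro", 2: "setup"}  # toy stand-in for a document store

def search(query):
    """List endpoint: an empty result set is still a valid response body."""
    hits = [doc for doc in DOCS.values() if query in doc]
    return 200, hits

def get_doc(doc_id):
    """Single-resource endpoint: absence of the resource is a 404."""
    if doc_id in DOCS:
        return 200, DOCS[doc_id]
    return 404, None

print(search("zzz"))   # (200, [])
print(get_doc(99))     # (404, None)
```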

------
n1ghtmare_
Anyone know the reason for the downtime?

~~~
rplnt
I also noticed loss of data.

For example:

[http://webcache.googleusercontent.com/search?q=cache:r53Zl8w...](http://webcache.googleusercontent.com/search?q=cache:r53Zl8wLb94J:https://news.ycombinator.com/item%3Fid%3D7017250+&cd=1&hl=en&ct=clnk)

[https://news.ycombinator.com/item?id=7017250](https://news.ycombinator.com/item?id=7017250)

------
lnanek2
CloudFlare really messes a lot of things up. I've seen CloudFlare refuse to
give me error responses from forms before: enter a bad value, get a cached
page of the empty form, lol. The server was trying to return a page explaining
the bad entry, but CloudFlare refused to send it to me because it had a
non-200 response.

------
catshirt
Ah, this explains why HN has been telling me it's down for 2 days now. I had
to open it in incognito to realize it was a cache issue.

------
nayefc
who cares...

------
leoh
Of hundreds of comments I have made on this site, only one has been snarky.
Here's my second: chill your tits.

