Ask PG: What caused the downtime?
142 points by aaronbrethorst on July 31, 2012 | 84 comments
It looks like the site was down for a couple hours. What happened exactly?

Still investigating. The site was slow all day. We got an immense spike in unique IPs. Typically we get a bit over 150k/day. Today we got 220k. Not sure if the downtime was related.

I was travelling today and didn't have proper access to the server, or I would have been on it sooner.

Update: An examination of the logs shows nothing remarkable happening at the moment of the crash.

We may have run out of some resource. In that case it may be hard to know for sure what happened.

Question: Did people find HN horribly slow this afternoon? (I found it unusably slow around 2 pm pacific, but there was little I could do about it from where I was then.)

I found it horribly slow from about 18 hours ago (3pm GMT) onwards. I think the last time I tried it was about 9pm GMT, and it was slow then. It took about 30-60 seconds to load a page, maybe more.

Edit: this was using https, I didn't try http.

I was using my phone browser when the site was down. What I noticed was that the "secure" icon was displayed when I tried to connect. However, this may have been an artifact in the address bar from the previous site which was secure - I haven't paid that much attention to how the phone's browser handles slow loads.

The reason it caught my attention was due to the recent https "Ask" threads, and my first thought was perhaps full time https had been implemented. I wouldn't bring it up except that it was unusual and correlated with HN content and the outage, even though it is likely to be a coincidence.

The site had seemed slow basically all day for me, from say about 8:30-9am central time (I'd been up and checking the site since 6) up to the point when it crashed.

Were you using https or http?

I was getting the proxy error message with https. I never tried http then.

I've been using HTTPS consistently.

Yes: unusably slow both times I tried yesterday.

If it helps at all, I'm pretty sure there was a period around 3:30 when http was working but https was not.

I noticed that too. I'm not sure of the time, but it was more than a momentary thing. Https was either not responding at all or it was responding only after a long wait. A bit later, https was taking 20-30 seconds before returning the page, which I think was an improvement over the earlier period where I didn't see it return a page at all.

I didn't personally notice any http outage or delay.

Probably related, hackerne.ws kept working for me well after news.ycombinator.com stopped, possibly because I was accessing the latter over HTTPS.

I will repeat something I said below in my comments because I'm not sure you will see it:

"But there is no question that it is a good idea to get to the bottom of why this happened. I have found in the past it always pays to do a post-mortem on any system problem as many times there is something lurking that will cause that same problem in the future." [1]

I can't tell you how many times over the years some little thing has been an indicator of something else going on or caused a big problem later. It's normally better to diagnose this while you are not under duress, have room to breathe, and aren't under any pressure.

[1] A non-system-related example: last week my wife couldn't get into the garage because the GFI on the same circuit as the garage door opener had tripped. The same GFI is in the upstairs bathroom and had blown many times previously, but it didn't seem to be a problem. We didn't realize it also controlled the garage door. So she couldn't get into the house (she didn't have her key). By ignoring the problem (just keep resetting the GFI and hey, figure it out later) we ended up with a larger issue than it seemed on the surface.

What were the most common activities done by the IPs that were "new" in the last twelve hours? New compared to the history of IPs available. Just wondering if it was orchestrated or whether it would give a clue as to the resource that was consumed.
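To make the question concrete, here is a minimal sketch of that comparison. It assumes the logs can be reduced to (ip, path) pairs; that format, the function name `new_ip_activity`, and the sample data are all hypothetical, since nothing in this thread says what the HN logs actually contain.

```python
from collections import Counter


def new_ip_activity(history, recent):
    """Count request paths made by IPs that never appear in the history.

    Both arguments are iterables of (ip, path) tuples. This tuple format
    is an assumption for illustration, not the real HN log format.
    """
    known_ips = {ip for ip, _ in history}
    return Counter(path for ip, path in recent if ip not in known_ips)


# Made-up sample data: two known IPs, then a window with two new ones.
history = [("1.1.1.1", "/news"), ("2.2.2.2", "/item")]
recent = [
    ("1.1.1.1", "/news"),
    ("3.3.3.3", "/item"),
    ("3.3.3.3", "/item"),
    ("4.4.4.4", "/news"),
]

# Most common paths requested by IPs that are new in the recent window.
print(new_ip_activity(history, recent).most_common())
```

If one path or a narrow set of paths dominated the result, that might hint at orchestrated traffic; a spread similar to normal usage would point elsewhere.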

More instrumentation/monitoring details needed to help diagnose future occurrences?

Maybe add some monitoring such as NewRelic and ServerDensity.

A (vague) suggestion: have you tried running ghost users on HN by saving a bunch of users' sessions on the site? You should then be able to tell whether HN scales, and maybe replicate the crash.

I think so as well. Oddly enough that post on TC rose to the front page of reddit at the same time HN was down.


It's been sitting on the front page for the past few hours. I suspect several sources led to the TC article and then through to the post on HN.

This was also submitted earlier today to /r/technology:


Unlikely, TC typically delivers <10,000 mentions to your site if the article is about your site. For a mention I doubt it generates significant traffic.

That would be my guess as well. Although I am really pleased Codecademy has Python now.

edit: removed the 'A'

"We got an immense spike in unique IPs. Typically we get a bit over 150k/day. Today we got 220k."

I wouldn't exactly call that immense, but more importantly, an increase like that certainly doesn't seem to be a reason for a site like this to go down (I didn't see that it was down; I'm going by the headline).

A nearly 50% increase in unique IPs (150k to 220k) isn't necessarily something which could cause downtime, but it certainly could be related -- which is the word Paul used.

For example: HN relies heavily on caching page data. If 70,000 robots suddenly started loading random items, it could easily cause both the jump in unique IPs and the downtime.
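A rough way to see why random item loads hurt a cache is the toy simulation below. It is only a sketch: it models a generic LRU cache, not HN's actual caching (which this thread doesn't describe), and the cache size, item counts, and request counts are invented for illustration.

```python
import random
from collections import OrderedDict


def hit_rate(requests, capacity):
    """Return the fraction of requests served from an LRU cache."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for item in requests:
        total += 1
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            cache[item] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


rng = random.Random(0)  # fixed seed so runs are repeatable

# Assumed "normal" readers: requests concentrated on 100 popular items.
normal = [rng.randrange(100) for _ in range(10_000)]
# Assumed robots: uniformly random item ids over a much larger range.
robots = [rng.randrange(1_000_000) for _ in range(10_000)]

normal_rate = hit_rate(normal, capacity=500)
robot_rate = hit_rate(robots, capacity=500)
print(f"normal hit rate: {normal_rate:.2%}, robot hit rate: {robot_rate:.2%}")
```

Under these assumptions the concentrated traffic is served almost entirely from cache while the random traffic almost never is, so each robot request costs far more than a typical reader's request.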

"I was travelling today and didn't have proper access to the server, or I would have been on it sooner."

The obvious question is: isn't there anyone else who minds this tremendously popular site and is on duty when you are traveling?

I'm guessing the answer is something along the lines of:

"Despite being immensely popular, it is a labor of love and generates little (or no?) revenue, so unfortunately, no there isn't anyone else minding the site."

Somehow I don't think any harm will come if Hacker News goes down for a few hours.

Frankly, every time HN is down, I wonder whether this is finally the day PG has given up on the project, because it's slipped into dire form in recent years.

"Somehow I don't think any harm will come if Hacker News goes down for a few hours."

What do you define as "harm"?

People are creatures of habit. If something isn't available many times they will fork and find something else to do with their time which they might find as or more enjoyable or educational and then start to gravitate toward using that website. You only have so much time. If you find an alternative site that you like that might very well end up being your "addiction" and then that site could build critical mass.

I used to start the morning with my laptop at Starbucks. I no longer do that. For the longest time I did that every morning. Then something changed and I found something else to do in the AM rather than that and achieved the same benefit. As an aside, this is well known by Starbucks. I was told by one manager that at all costs they try to keep a SB being renovated open lest they lose people who have made it a habit to stop by and pick up their coffee.

Now all this may or may not be important to PG. And sure a little downtime isn't going to have a major impact on usage.

But there is no question that it is a good idea to get to the bottom of why this happened. I have found in the past it always pays to do a post-mortem on any system problem as many times there is something lurking that will cause that same problem in the future.

A sudden stop or reversal in the popularity of HN would do the site some good.

I bet that's a feature. Here's the headline:

"Hacker News goes down. Thousands of developers forced into unwanted productivity."

what language/framework is HN built on?

It is very interesting how..eh..addictive this site is. It's like a habit to do CMD + T and start typing in news.y..etc. And while it was down I was refreshing every few minutes. I think personally I do get a lot out of this site, I definitely wouldn't be where I am today without it. I've learned a lot, asked a lot, and tried to give back as much as I could. I landed a couple jobs from here that have now set me on a very successful path at such a young age. I'm very thankful for the community here. Sorry for turning this into an emotional post, but I really owe a lot to HN.

EDIT: I do just do CMD + T and then n for everyone who thought I did otherwise. Sometimes it happens so quickly I do new..or whatever, but you get the idea. This is a trivial point.

> It's like a habit to do CMD + T and start typing in news.y

I think you're a real addict if you don't even get past typing ne before the autocomplete comes up :D

Yeah, well, I'm constantly hooked up to an EEG headset that is programmed to recognize, within .03 nanoseconds from onset, the specific electrical wave pattern my brain triggers when I've got even the smallest desire to browse HN, flipping my desk upside down to expose the orange and overclocked 8.2GHz, 32GB DDRAM HN supercomputer loaded with nothing more than the latest webkit nightly set to open with 15 tabs each containing one of the top 15 stories at HN at that time. I've also got a CPAP machine hacked to fit a Camelbak tube through the mask so that when I do forget to continue living, I'm always covered. News moves fast here, and you've got to keep up.

Psh - it's my only "n" site. CMD + T, n, Enter.

I added an edit just for you.

Much appreciated... I thought my comment was in the spirit of the relentless pursuit of factual accuracy for which we come to HN - regardless of the trivialness of the fact... :)

I have my webcam set up to go to HN when I blink.

My problem is that I do exactly this while already reading hacker news.

Seriously who types all that?

It's 'h' + down + enter, and here I am.

At least for me, it's <ctrl-t>n<enter>. Three keystrokes, and probably under half a second.

Same here. It's just gnh in Vimperator/Pentadactyl for me.

Nah, you're a real addict if it's an app tab and you never have to type it.

Then count me as an addict. I have HN as a tab.

Ctrl + 1 for me on Opera

What website is more important/frequently visited and starts with n? I always type n<enter> unless I was very recently browsing the new york times

I tend to prefer http://hckrnews.com just because of the more intuitive browsing interface and filtering options (top 10, top 20, etc). I have it as a shortcut on my main phone screen and bookmarked in all browsers.

Ironically, for me typing n is indeed HN, but the next one is nbcolympics.com (and we all know how useful the latter is if you don't have a cable subscription...), so I assume I don't need to check myself into an HN addict clinic just yet.

Nope, this takes the cake for just n but sometimes, out of habit, I start typing really fast and I get the first few letters in.

news.google.com for me

news.bbc.co.uk here

In Safari, CMD + 1 works if you have it as a top-level bookmark. Or rather, in the case of your workflow, CMD + T then CMD + 1.

There has to be a more efficient way of doing this!

set your homepage to hn?

I go a step further: I leave HN open in an app tab

Shameless plug: If you are so addicted, you can check out the articles chronologically by day (or hour) using my little project: http://hntimeline.com/ (Hint: use shift + mousewheel to move through time)

For UX's sake, you should make the time big and put it somewhere obvious rather than on the right in little letters in a darker font. Otherwise, cool project. :)

There are minor typos in the message the javascript adds next to the slider for new users: 'Drag it to start moving thorught time'.

Wow, that's an awesome site. Good job!

I wonder if ycombinator could see a correlative increase in the value of their portfolios through the simple process of shutting down news.yc for 3-4 hours a day?

Maybe PG has a script that automatically enables noprocrast on all the YC entrepreneur accounts.

I would hope not. While some of them are almost children (<20), there's no need to treat them like children.

Software solutions to peopleware problems are always a bad idea.

Do you mean from ycombinator-funded founders becoming more productive or by giving reddit more ad revenue from the HN crowd?

Netflix chaos monkey is released into the wild, HN goes down soon after. Coincidence? I think not!

Wow, I got a surprising amount of work done in the last few hours. I had to check isup.me to make sure it was really down, else I would have found a proxy server :)

I hardly got anything done. I couldn't see the monitor through all of the tears.

Is anyone else out there proud to have a healthy relationship with hacker news?

Yes. I only check it on nights, and never for more than eight hours, and never until past 4am, except occasionally at work, but only when I don't have high priority stuff to do, and ... crap.

Doesn't seem like it :)

I would bet it was from some of the direct links on Reddit?

The Ubisoft backdoor was linked on the reddit front page for a while today.

Yep, I have the same question. Lots of frustrated tweets: https://twitter.com/#!/search/hackernews%20down

I haven't been this productive in MONTHS ;-)

The outage came shortly after pg retweeted (https://twitter.com/davidstamm/status/229926404333129728):

@davidstamm: In the hour of your death, you will not warmly recall the many hours you spent engaging in vitriolic debates on Hacker News.

So I just assumed the downtime was a social experiment. Maybe even a new circuit-breaker: when global thread sentiments get a bit too snippy, the entire site goes dark for a short period.

I'm curious to see if it had something to do with hitting the front page of Reddit. The Facebook Bot Clicks post received over 1,700 upvotes, and perhaps Reddit "performed the most friendly DDoS".

I actually found that I didn't get as much done. I kept checking to see if the site had come back up and wondering if there was a status page hiding somewhere I didn't know about. Sheesh!

I was thinking someone must have unleashed the simian army: https://github.com/Netflix/SimianArmy

I couldn't access the site for a few days (error 502) in Vancouver. Probably unrelated?

I think I might start throttling my bandwidth to 56kbs over HTTP/HTTPS.

Glad y'all got some work done, I just didn't do work elsewhere :).

God, it seemed like an eternity!

same question, noticed after i got sent two bogus password reset attempts...
