
Still investigating. The site was slow all day. We got an immense spike in unique IPs. Typically we get a bit over 150k/day. Today we got 220k. Not sure if the downtime was related.

I was travelling today and didn't have proper access to the server, or I would have been on it sooner.




Update: An examination of the logs shows nothing remarkable happening at the moment of the crash.

We may have run out of some resource. In that case it may be hard to know for sure what happened.


Question: Did people find HN horribly slow this afternoon? (I found it unusably slow around 2 pm Pacific, but there was little I could do about it from where I was then.)


I found it horribly slow from about 18 hours ago (3pm GMT) onwards. I think the last time I tried it was about 9pm GMT, and it was still slow then. It took about 30-60 seconds to load a page, maybe more.

Edit: this was using https; I didn't try http.


I was using my phone browser when the site was down. What I noticed was that the "secure" icon was displayed when I tried to connect. However, this may have been an artifact in the address bar left over from the previous site, which was secure - I haven't paid that much attention to how the phone's browser handles slow loads.

The reason it caught my attention was due to the recent https "Ask" threads, and my first thought was perhaps full time https had been implemented. I wouldn't bring it up except that it was unusual and correlated with HN content and the outage, even though it is likely to be a coincidence.


The site had seemed slow basically all day for me, from say about 8:30-9am Central time (I'd been up and checking the site since 6) up to the point when it crashed.


Were you using https or http?


I was getting the proxy error message with https. I never tried http then.


I've been using HTTPS consistently.


Yes: unusably slow both times I tried yesterday.


If it helps at all, I'm pretty sure there was a period around 3:30 when http was working but https was not.


I noticed that too. I'm not sure of the time, but it was more than a momentary thing. Https was either not responding at all or it was responding only after a long wait. A bit later, https was taking 20-30 seconds before returning the page, which I think was an improvement over the earlier period where I didn't see it return a page at all.

I didn't personally notice any http outage or delay.


Probably related: hackerne.ws kept working for me well after news.ycombinator.com stopped, possibly because I was accessing the latter over HTTPS.


I will repeat something I said below in my comments because I'm not sure you will see it:

"But there is no question that it is a good idea to get to the bottom of why this happened. I have found in the past it always pays to do a post-mortem on any system problem as many times there is something lurking that will cause that same problem in the future." [1]

I can't tell you how many times over the years some little thing has been an indicator of something else going on, or has caused a big problem later. It's normally better to diagnose this while you are not under duress and have room to breathe, rather than under pressure.

[1] A non-system example from last week: my wife couldn't get into the garage because the GFI on the same circuit as the garage door opener had tripped. The same GFI is in the upstairs bathroom and had tripped many times before, but it didn't seem to be a problem - we didn't realize it also controlled the garage door. So she couldn't get into the house (she didn't have her key). By ignoring the problem (just keep resetting the GFI and, hey, figure it out later) we ended up with a larger impact than it seemed to warrant on the surface.


What were the most common activities of the IPs that were "new" in the last twelve hours - new compared with the history of IPs on record? Just wondering whether it was orchestrated, or whether it would give a clue as to the resource that was consumed.
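
A rough pass at that in Python, as a sketch - this assumes a combined-format access log, and the file names are made up:

    from collections import Counter

    with open("known_ips.txt") as f:       # one IP per line, from prior history
        known = set(line.strip() for line in f)

    new_ips = set()
    new_ip_paths = Counter()
    with open("access.log") as f:          # today's combined-format log
        for line in f:
            parts = line.split()
            if len(parts) < 7:
                continue
            ip, path = parts[0], parts[6]  # combined format: IP first, request path seventh
            if ip not in known:
                new_ips.add(ip)
                new_ip_paths[path] += 1

    print(len(new_ips), "new IPs")
    for path, hits in new_ip_paths.most_common(20):
        print(f"{hits:8d}  {path}")

The top of that list should show whether the new IPs were crawling old item pages or just reading the front page.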


More instrumentation/monitoring detail needed to help diagnose future occurrences?


Maybe add some monitoring, such as New Relic and Server Density.
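
Even a homegrown probe would catch outages like this one. A bare-bones sketch of the kind of external availability check such services provide - the URL and slowness threshold here are just assumptions - meant to be run from cron on a separate machine:

    import time
    import urllib.request

    URL = "https://news.ycombinator.com/"
    SLOW = 5.0                              # seconds; threshold is an assumption

    start = time.monotonic()
    try:
        with urllib.request.urlopen(URL, timeout=30) as resp:
            status = resp.status
        elapsed = time.monotonic() - start
    except Exception as exc:
        print(f"DOWN: {exc}")
    else:
        print(f"{'SLOW' if elapsed > SLOW else 'OK'}: HTTP {status} in {elapsed:.1f}s")

Run it every minute and alert on consecutive failures or sustained SLOW results.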


A vague suggestion: have you tried running ghost users against HN by recording a batch of real user sessions and replaying them against the site? That should tell you whether HN scales, and it might let you reproduce the crash.
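
For what it's worth, a rough sketch of that replay idea - the staging host and log file here are hypothetical, and it should be pointed at a replica rather than the live site:

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    BASE = "http://staging.example.com"     # hypothetical replica, not production
    CONCURRENCY = 50

    def replay(path):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(BASE + path, timeout=30) as resp:
                resp.read()
            return time.monotonic() - start
        except Exception:
            return None                     # failed request

    with open("recorded_paths.txt") as f:   # one recorded request path per line
        paths = [line.strip() for line in f if line.strip()]

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        times = sorted(t for t in pool.map(replay, paths) if t is not None)

    if times:
        print(f"{len(times)} ok of {len(paths)}, "
              f"median {times[len(times) // 2]:.2f}s, "
              f"p95 {times[int(len(times) * 0.95)]:.2f}s")

Ramping CONCURRENCY up until response times degrade would give at least a crude idea of where the ceiling is.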



I think so as well. Oddly enough, that post on TC rose to the front page of reddit at the same time HN was down.

http://www.reddit.com/r/technology/comments/xel55/startup_cl...

It's been sitting on the front page for the past few hours. I suspect several sources led to the TC article and then through to the post on HN.


This was also submitted earlier today to /r/technology:

http://www.reddit.com/r/technology/comments/xdwqk/ubisoft_up...


Unlikely. TC typically delivers fewer than 10,000 visits to your site when the article is about your site; for a mere mention, I doubt it generates significant traffic.


That would be my guess as well. Although I am really pleased Codecademy has Python now.

edit: removed the 'A'


"We got an immense spike in unique IPs. Typically we get a bit over 150k/day. Today we got 220k."

I wouldn't exactly call that immense, but more importantly, an increase like that certainly doesn't seem like a reason for a site like this to go down. (I didn't see it go down myself; I'm going by the headline.)


A roughly 50% increase in unique IPs isn't necessarily something that would cause downtime on its own, but it certainly could be related -- which is the word Paul used.

For example: HN relies heavily on caching page data. If 70,000 robots suddenly started loading random items, it could easily cause both the jump in unique IPs and the downtime.
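
A toy illustration of that failure mode, with sizes and distributions invented for the purpose: an LRU page cache handles a skewed readership fine, but uniform random fetches drive its hit rate toward zero, so nearly every request falls through to slow page generation.

    import random
    from collections import OrderedDict

    CACHE_SIZE = 1_000       # cached pages
    TOTAL_ITEMS = 1_000_000  # total item pages on the site (invented)

    def hit_rate(requests):
        cache, hits = OrderedDict(), 0
        for item in requests:
            if item in cache:
                hits += 1
                cache.move_to_end(item)
            else:
                cache[item] = True
                if len(cache) > CACHE_SIZE:
                    cache.popitem(last=False)  # evict least recently used
        return hits / len(requests)

    # Normal readers mostly hit the few hundred items on the front pages.
    normal = [random.randint(1, 500) for _ in range(50_000)]
    # A crawler hits item IDs uniformly at random across the whole history.
    crawler = [random.randint(1, TOTAL_ITEMS) for _ in range(50_000)]

    print(f"normal traffic hit rate:  {hit_rate(normal):.0%}")   # ~99%
    print(f"crawler traffic hit rate: {hit_rate(crawler):.0%}")  # ~0%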


"I was travelling today and didn't have proper access to the server, or I would have been on it sooner."

The obvious question: isn't there anyone else who minds this tremendously popular site and is on duty while you're traveling?


I'm guessing the answer is something along the lines of:

"Despite being immensely popular, it is a labor of love and generates little (or no?) revenue, so unfortunately, no there isn't anyone else minding the site."


Somehow I don't think any harm will come if Hacker News goes down for a few hours.

Frankly, every time HN is down, I wonder whether this is finally the day PG has given up on the project, because it's slipped into dire form in recent years.


"Somehow I don't think any harm will come if Hacker News goes down for a few hours."

What do you define as "harm"?

People are creatures of habit. If something isn't available, many times they will fork off and find something else to do with their time, something they might find as enjoyable or educational, or more so, and start to gravitate toward that other site. You only have so much time. If you find an alternative site you like, it might very well end up being your "addiction", and then that site could build critical mass.

I used to start the morning with my laptop at Starbucks. I no longer do that. For the longest time I did it every morning; then something changed, and I found something else to do in the AM that gave me the same benefit. As an aside, this is well known to Starbucks. I was told by one manager that they try at all costs to keep a store open during renovation, lest they lose the people who have made it a habit to stop by and pick up their coffee.

Now all this may or may not be important to PG. And sure a little downtime isn't going to have a major impact on usage.

But there is no question that it is a good idea to get to the bottom of why this happened. I have found in the past it always pays to do a post-mortem on any system problem as many times there is something lurking that will cause that same problem in the future.


A sudden stop or reversal in the popularity of HN would do the site some good.


I bet that's a feature. Here's the headline:

"Hacker News goes down. Thousands of developers forced into unwanted productivity."


What language/framework is HN built on?




