So what happened to the HN server?
170 points by huhtenberg on Feb 18, 2013 | 127 comments
I'm sure I'm not the only one curious how that Altair is holding up.

I have only one portal onto the Internet - this site. I check it first thing and last thing, and when it was down today I had a weird epiphany - I did not mind.

HN is not a service, it's not a startup trying to get mindshare - it's a (luvvie alert) community - and one where the downtime had a known and obvious cause and did not matter - rtm would fix it, I would catch up, I knew this thread would exist and I would still get insight into what is happening 12 hours ahead of everyone else.

I thought I would be more bothered - turns out I am emotionally invested, not emotionally dependent.

To PG et al, good luck with the fixes, and thanks.


It's curious, isn't it, that we still have "Home" pages. For a while there, Google Reader was my starting point, but nowadays I find myself starting off from HN. This must be what Facebook feels like to normal people, but without the insidious stalking and the dreaded obligations that come from surveillance features like "X is online NOW, you have to interact with him/her!" and "X has seen this (but has neither liked nor commented on it)". Also, I wish my Facebook friends would produce a link and comment stream that is even one tenth as interesting as HN ;)

Normal people? I just vomited a little. Thanks for ruining breakfast.

You're very welcome. And I think you know that's not how I meant it. Do you really want to assert that the subject matter at HN is part of mainstream culture? I do believe that IT people tend to use the internet differently; this is not an elitist observation, in case it came across as such. It's just different.

I have a friend who is big into fantasy sports (blue collar job: construction). The first thing he does when he sits down in front of the computer is that he checks whatever roster he is managing at the time. Is he not "normal"?

I bet if he were talking about fantasy sports on a forum full of fantasy players, he'd neither be surprised nor upset at someone in the forum dividing the world into groups of fantasy players/normal people.

I get the feeling that you choose to misunderstand. If there is anything I can add to this line of discussion I will gladly do it, but I have to admit I'm a bit lost as to what your problem is here.

I didn't choose your words.

No, but you did choose to interpret them in a totally different context, deliberately giving them a meaning that was clearly not part of my statement. Then you ignored, yet still replied to, my clarification.

From my perspective this looks like either a huge misunderstanding or the definition of straw-man rhetoric. And I have to say I don't get where your hostility comes from, but I neither recognize my message as it was apparently understood by you, nor do I appreciate the tone. I'm always ready to admit that issues like these could be my fault, as I all too often like to hide behind the fact that I'm not a native speaker. I just don't see it this time, though.

In the spirit of being constructive: I don't think the application of the label "normalcy" or the lack thereof is a value judgement in any way, and being "normal" is something that must be interpreted within the given context - otherwise it makes no sense because it's not a global property that encompasses the entire personality of a person.

I brought up the label of "normal" in a discussion with some friends. I have come to the conclusion that how someone interprets "normal" has a lot to do with their background. My friends with a strictly technical background (or those without some background in the social sciences or the arts) did not seem to differentiate between normal and average. In the social sciences, normal has additional connotations/ramifications.


All kinds of connotations can instantly be applied to pretty much any word, as long as someone is willing to completely ignore the context and the intent of the message. Which, I'm sorry to say, I think happened here. Of course we all have "trigger words" that we associate with our little linguistic traumas, but there is a line beyond which you simply attack the wrong people for things they didn't even say. I know, because I've done that occasionally as well, and it took someone external to make me realize what was happening. So in this spirit I'm trying to tell you: this is what's happening.

Regarding the actual subject of normalcy I'm finding it difficult to add anything else without repeating myself. From personal experience, there is always "good normal" and "bad normal", as there is "good not normal" and "bad not normal", but even more often it just means "different from the other 80%". In this case, I chose the word normal because it instantly conveys the fact that most hackers use the web in different ways from most of the other people. That's all. It did its job.

It's unfortunate that this triggered something in you, but I encourage you to take a step back and recognize that you saw some things in my message that were simply not there.

I think there is a strong argument to be made that the myth of normal is a socially counterproductive concept -- much as the myth of race. "normal people" maybe sounds sort of close to "those people" which is trigger language that suggests one might not value diversity or might not judge all people as individuals. When I read your comment, I translated it to "adult consumer internet user" but still could not really empathize with your point.

There is also a point where misguided attempts at political correctness drag you down the rabbit hole into a land of pure absurdity where instead of communicating, perpetually offended people are just engaged in a cyclical exchange of trite phrases.

I recognize it might well be an expression of a cultural divide between us, but this is the way I see it: HN is populated by a huge group of very diverse people. While we are diverse in pretty much every aspect, we are also bound together by being hackers and by living, at least partly, in whatever hacker culture means for each of us. That's not normal, it's a subculture. Almost every single human being is part of at least one subculture, many are members of several, and there are millions of overlapping points that defy categorization. I cannot for the life of me figure out why it would be considered wrong to express this fact.

I can say for sure that I am not normal in many aspects, and then again I am normal in a lot of others. It's not somehow evil to recognize and talk about this. I think before dfc's comment, the number of people who misunderstood my original comment was very low, but my point is tainted now, subverted and burnt beyond recognition.

This is a very peculiar comment. What exactly in the word "normal" made you vomit - especially in the context used by the OP?

I am not familiar with "luvvie alert". What does it mean?

Imagine you are at a nice party in London. You are chatting to an interesting-sounding person who, it turns out, is an actor, and then you hear phrases like this:

"just one summer with the RSC back when I was younger".

"Of course Johnny said, ... umm, Gielgud of course dahling, did you see him in the Tempest in 74? Wunderful"

"I was saying to Sam just after that Quantum thing. You really should talk to Barbara about directing the next one. I have her number; charming lady, did I ever tell you about the time she ..."

A small red rotating flashing light will rise up out of the coffee table next to you and the words "luvvie alert" will be flashed across your field of vision.

Well, you learn something new every day. From your beautiful description I could picture this type of person exactly!

It's interesting that none of those references refer to the Lovey Howell character in the 60s sitcom Gilligan's Island, which came first to mind for me when reading "luvvie alert". I suppose the character's name+personality is more of a blatant archetype than I (now 43) had realized as a child.

PG wasn't aware that PetSmart closes an hour earlier on Sundays, so he was forced to wait till opening this morning to get the additional hamsters needed to keep this place running.

I remember when this would have been considered noise here; now it's the top comment.

Probably because that's what powers my computer nowadays. Though it's not hamsters, it's a plump Labrador/Australian cattle dog mix. Going green.

We three have all been hell banned, you see.

EDIT: More proof -- downvoted -- what could be more hellish than to ban humor?

amazing... Just amazing... Would you happen to know what breed of hamsters he is using?

All I know is that you never have to clean the cage. I'm told that when they run, they don't have side-effects.

Cured djungarian hamsters, I believe.

Djungarian hamsters? Well now you have two problems.

I think they are deadly poisonous Zanzibar hamsters. Genetically engineered from rat and hamster DNA.

Fast is one thing.

BUT I'd much rather get rid of the "this link has expired" syndrome.

Fast is only impressive if the results are modern.

The superior results from my browser's back button far outweigh the inconvenience of the expired link. I click on a link from the "Ask" front page, reply to the top comment, and my comment appears in the thread. Next I click "back" on my browser and I'm back at the front page of "ask".

I do that much more often than I hit an expired link.

That doesn't really have much to do with the expired link issue. That issue exists because the targets of links are stored as closures on the server. What you want is to serialize those closures into the URL itself, instead of letting the URL be a pointer to a closure stored on the server. In traditional web applications, you'd have to do that serialization manually (although if you've never used such a closure based web framework you might not even be aware that you're doing this, just like a C programmer who has never used a high level language might not consciously realize that he is implementing objects or closures or garbage collection manually -- or like an old Fortran programmer who is not aware that he is implementing recursion manually). The problem used to be much worse, but then PG did this transformation manually for the important subset of the site (most importantly for links to the comments section -- these links no longer expire).

Here's an example. Currently, if you go to the home page, click the "More" link at the bottom to go to the next page, and wait long enough, the More link expires. That's because it's currently implemented as something like this (not sure about Arc syntax, so I'll use Scheme syntax here, but you get the idea):

    (define (show-list-of-posts page-number)
      ... display the rest of the homepage ...
      (link "More" (lambda () (show-list-of-posts (+ 1 page-number)))))
This stores the closure `(lambda () (show-list-of-posts (+ 1 page-number)))` with the free variable `page-number` in a hash table on the server. The URL becomes an index into that hash table. But the only information needed to reconstruct such a closure is the function body and the free variables. So if we defined an auxiliary function:

    (define (foo page-number)
      (show-list-of-posts (+ 1 page-number)))
then we could represent the closure as the pair ("foo", page-number). If we encode that in the URL, then instead of looking up the closure from the hash table, we can reconstruct the closure on the fly. Hence we no longer have to store anything on the server, and no links can expire anymore.

There are some challenges when you want to do this transformation automatically, but they can be overcome.
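The two approaches can be sketched outside of Scheme, too. Here's a minimal Python sketch (all names are hypothetical, not HN's actual code) contrasting a closure stored server-side behind an opaque `fnid`-style key, which can expire, with free variables serialized into the URL and the "closure" rebuilt on each request, which cannot:

```python
import secrets

# Approach 1: the closure lives on the server; the URL is an opaque key.
closure_table = {}

def link_to_closure(fn):
    key = secrets.token_hex(8)
    closure_table[key] = fn
    # When this table entry is evicted, the link "expires".
    return "/x?fnid=" + key

# Approach 2: serialize the free variables into the URL itself.
handlers = {}

def handler(fn):
    handlers[fn.__name__] = fn
    return fn

@handler
def more_posts(page_number):
    return "posts for page %d" % (page_number + 1)

def link_to_handler(fn, **free_vars):
    # Nothing is stored server-side, so this link can never expire.
    qs = "&".join("%s=%s" % (k, v) for k, v in free_vars.items())
    return "/%s?%s" % (fn.__name__, qs)

def dispatch(url):
    # Rebuild the "closure" from the handler name plus URL params.
    path, _, qs = url.lstrip("/").partition("?")
    params = dict(p.split("=") for p in qs.split("&"))
    return handlers[path](page_number=int(params["page_number"]))
```

In approach 2 the URL itself carries everything needed to reconstruct the computation, which is exactly the manual serialization described above.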

"The problem used to be much worse, but then PG did this transformation manually for the important subset of the site (most importantly for links to the comments section -- these links no longer expire)."

I don't think this is strictly true. The comments and reply links simply link to a normal (non-closure) URL on the site and the page is generated from that URL in the normal manner.

I don't understand the disagreement... isn't that the same thing I said? IIRC it used to be the case that all links could expire; then he did this transformation from closures stored on the server to closures manually serialized in the URLs (which results in a normal URL, as you say). What I'm describing is how a web framework could do that automatically, which means you get the behavior and expressiveness of closures without the link-expired issue. I can see that the explanation isn't very clear, but I don't know how to make it clearer. Edit: don't downvote him, people, it is a perfectly fine comment.

> then he did this transformation from closures stored on the server to closures manually serialized in the URLs (which results in a normal URL as you say).

The reply links have a parameter 'id' - the id of the comment to which the reply is being posted. I would guess it's just passed to a normal function which adds the reply to the post.

Maybe you mean the same thing, but "serializing closures in the URL" (where? all I see is the id of the post being replied to) isn't the same as params in the URL which are then passed to a function.

Yes, my point is that it is the same thing. What's happening is just manual closure conversion (http://en.wikipedia.org/wiki/Lambda_lifting), except the resulting data gets saved in the URL rather than on the server. On the server, closures saved in the hash table also have "params which are then passed to a function", except that the params (i.e. the free variables) are saved on the server inside the closure rather than in the URL.

If you have a lambda expression like `(lambda () (show-list-of-posts (+ 1 page-number)))`, the Scheme implementation does these things:

1. Introduce a global function definition with the same body as the lambda expression but with an extra parameter for the free variables. The free variables in the body of the function get replaced with expressions that extract their value from the extra parameter that has the free variables.

2. Convert the lambda expression to an expression that builds a pair of a function pointer to that global function, plus the values of the free variables.

For example the code:

    (define (show-list-of-posts page-number)
       ... display the rest of the homepage ...
       (link "More" (lambda () (show-list-of-posts (+ 1 page-number)))))
Will get converted to something like this:

    ;; this is the extra global function that has the body of the lambda expression
    ;; note that the reference to `page-number` got 
    ;; replaced by `(extract-value "page-number" params)`
    (define (closure-324 params)
       (show-list-of-posts (+ 1 (extract-value "page-number" params))))

    ;; note that the lambda expression gets replaced by a create-closure expression
    (define (show-list-of-posts page-number)
       ... display the rest of the homepage ...
       (link "More" (create-closure closure-324 "page-number" page-number)))
create-closure creates a closure data structure where the first argument is the function pointer, and the rest of the arguments are the free variables.

Now, what happens if you want to remove this closure business in a web application, and instead use normal URLs?

First, you introduce a global request handler for the body of the lambda:

    (define-handler (post-list-handler params)
       (show-list-of-posts (+ 1 (extract-value "page-number" params))))
This would define the handler for news.ycombinator.com/post-list-handler?page-number=12.

Then, instead of the (link ...) with a closure, you just link to that url in the show-list-of-posts function:

    (define (show-list-of-posts page-number)
       ... display the rest of the homepage ...
       (link "More" (create-url post-list-handler "page-number" page-number)))
Compare these code snippets to the one above. Do you see the similarity to closure conversion? In both cases we:

1. Introduce a global function/handler for the body of the lambda.

2. That function/handler gets a `params` argument that has the free variables.

3. Everywhere a free variable is referenced in the body, it gets replaced by an expression that extracts the value from the params argument.

4. In place of the lambda expression, we have respectively a (create-closure func free-vars...) or a (create-url handler free-vars...)

So it's really completely analogous. That's why I say that we are just serializing the closure here, and this could be done automatically. Hopefully this makes it more clear what I mean, but maybe these details just make it less clear if you're not familiar with how closures are implemented (closure conversion)...
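For the curious, the same closure conversion can be written out in runnable form. Here's a sketch in Python (names like `closure_324`, `extract_value`, and `create_closure` just mirror the Scheme pseudocode above; none of this is HN's real code):

```python
# The lifted global function: the lambda body, with the free variable
# `page_number` replaced by a lookup in an explicit `params` environment.
def closure_324(params):
    return show_list_of_posts(1 + extract_value("page_number", params))

def extract_value(name, params):
    return params[name]

def create_closure(fn, *kvs):
    # Pair the function pointer with the values of its free variables.
    params = dict(zip(kvs[::2], kvs[1::2]))
    return (fn, params)

def invoke(closure):
    fn, params = closure
    return fn(params)

def show_list_of_posts(page_number):
    # ... display the rest of the homepage ...
    # The "More" link is a closure: function pointer + free variables.
    return create_closure(closure_324, "page_number", page_number)
```

Invoking the closure for page N yields a new closure for page N+1, with the free variable carried explicitly in the params pair rather than in a hidden environment.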

What I would love would be if the "More ..." link at the bottom of any given page would link to Page N+1 of whatever the current ordering of pages is, rather than page N+1 of a listing of articles that is now out of date and gives me an error message.

Great explanation of what's causing that - thank you!

Still, why not link to a page offset and just show what is current? Or include it with the current linking and automatically redirect? Pretty basic UX optimization.

Because the functions which generate pages get garbage collected. Once the function is gone, there is no reference point.

I suspect hindering spam and vote manipulation plays a part in the architectural decisions. It also makes it possible to create different HNs for different users - e.g. the hellbanned, royals, and plebeians.

I know about the hellbanned, but what are the royals and the plebeians?

Theoretical particles required by my unified theory of HN.

Interested. Link?

Royals and plebeians might be a little strong. YC folks have a slightly different interface. I think the main differences are that YC usernames show up in orange and there is a link along the top that displays the most recent submissions from YC folks/alumni.

"Royals" came to mind from FlameWarriors - though maybe my predisposition to think of PG as the Philosopher King of HN played a role. Anyway, "Plebeians" was just pushing the political analogy a little further. I'll take your word about the differences in interface. I really was just conjecturing.


The worst is when you write a big long comment, submit it, and get "this link has expired" -- then you hit back, and the textbox is empty.

This is particularly infuriating on a cellphone.

One quick fix: in the code that prints this message, check to see if the request was a comment-post action. If so, append, "...but for your convenience, here's the text you tried to post, so it's not lost forever: [...]"

(Just watch out for XSS!)
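That quick fix might look something like this sketch (Python rather than Arc, and the request shape and names are made up), with the user's text HTML-escaped before being echoed back, per the XSS caveat:

```python
import html

def expired_link_page(request):
    # `request` is a hypothetical dict of the submitted form fields.
    msg = "Unknown or expired link."
    if request.get("action") == "comment" and request.get("text"):
        # Escape before echoing the user's text back, to avoid XSS.
        safe = html.escape(request["text"])
        msg += ("<p>...but for your convenience, here's the text you "
                "tried to post, so it's not lost forever:</p>"
                "<pre>" + safe + "</pre>")
    return msg
```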

There are extensions that help with that.

strange productivity peak recorded in the IT sector during the past 16 hours

Soon in the news: "New startup emerges from stealth mode after delivering on promise to improve IT efficiency by 80%"

16 hours?! I'm so glad I slept through most of it then!

strange productivity dip happening in the IT sector right now

Saturday, I thought: "This guy (PG) is really brave; everybody nowadays is using virtualized server environments on multi-core machines, and PG just gets some bare metal and puts his app on one core and one thread. WTF, this guy is a genius, single-threaded apps on bare metal FTW, this is so great and I am doing this too."

After seeing this thing down for several hours -- and Google hates downtime, they punish you instantly -- I think: "Maybe not."

However, I'm happy to hear what went wrong and why we should still go for the bare metal thing.

Virtualized environments vs. bare metal has never really been about performance. It's about whether you want to spend the time required to tend your servers, or whether you want to treat them as disposable.

What would the benefits of using a virtualized server be for a relatively small (in terms of server needs) site like HN? The way I see it so far, virtualization is mostly beneficial for hosting companies, which can provide cheaper hosting and get better isolation and easier resource allocation. But I can't think of any big benefits for the site owner, assuming they can afford a dedicated server, and I can see downsides of virtualization, so I am genuinely curious if I am missing something.

One thing is abstracted failure-proof(ish) disks. I'm considering moving from Linode to Hetzner for the enormous amount more RAM I can afford on a bare metal box there, but the one thing that gives me pause is having only RAID1 redundancy, and having to manage it myself.

(Live) Migration is one nice feature of virtualization.

the site actually runs very quickly, so can't argue it needs more cores / threads.

seems the issue was more with the migration.

so i reckon you're good to get back to removing all the threads from your apps :)

I totally agree; this new machine is so fricking responsive that I still think bare metal rules, but I am just so afraid of the system operations of such a machine. But maybe Paul can give us some hints on why it's still worth it to go the bare metal route.

EDIT: since Heroku fooled Rapgenius and us all it would be one more reason to get into system operations again and host on bare metal.

Many people use bare metal for their base load and virtual/cloud instances for their fail-over and elastic loads. It's not all or nothing.

I wanted to reply that it's actually relatively fast, not very fast. Much faster than before, but there are many sites which are much quicker.

Timing some pages, the response is around 350ms. That means 95% of the loading time is networking, not generation time. You're right, the server is really fast given the load HN brings down on one server!

Virtual servers vs. dedicated servers doesn't have much to do with reliability. Both go down. What you want, to minimize the risk of downtime, is redundancy, e.g. hot fail-over.

what's the best way to have hot fail over?

That very much depends on the scale you're at and the database that you use. If your scale is such that you could comfortably run on 1 machine (as it is for 99% of sites), it's probably sufficient to rely on your database to handle the replication. For example for postgres: http://www.postgresql.org/docs/devel/static/high-availabilit...

Then to utilize this you want a load balancer in front that's unlikely to fail and if it fails can be restarted rapidly. This load balancer should send requests to the other server as soon as one has failed. The other option is DNS failover, which can work but has different trade-offs.
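The load balancer behavior described above, in miniature: try the primary, and on failure route the request to the next server. This is a sketch with an invented `send` callback, not any particular proxy's API:

```python
# `servers` are tried in order; `send` is a caller-supplied callback that
# delivers the request and raises ConnectionError if the host is down.
def route(request, servers, send):
    for server in servers:
        try:
            return send(server, request)
        except ConnectionError:
            continue  # this backend is down, fail over to the next
    raise RuntimeError("all backends down")
```

A real load balancer would also use periodic health checks rather than failing over per-request, but the decision logic is the same.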

One thing that this does not protect you against is repeatable failure due to a software bug. If a request comes in that happens to crash one of your servers, load will be quickly switched to the other. But the user that caused the original server to crash might refresh his page because he didn't get any response back, which also sends the problematic request to the other server, and might crash that one too. These kinds of problems are very hard to deal with (if the requests don't come from humans but from programs that automatically retry their request to other servers when they don't get a response, it's even worse: a single bad request can bring an entire cluster down in seconds).

Paradoxically, sometimes measures to improve availability can cause availability to go down. If you make your architecture more complicated you introduce more opportunities for failure, especially due to bugs. If you have multiple servers in a pipeline, for example a load balancer and then the real servers which in turn talk to the database servers, that can also increase your likelihood of hardware failure. If your pipeline depth is three, then the chance that you have a hardware failure is about three times as big as when you had a pipeline depth of one. You want to minimize the depth and maximize the width of the pipeline.
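The depth argument is easy to check with back-of-the-envelope arithmetic: if each machine independently fails with probability p, a depth-d pipeline fails when any stage does, so the failure probability is 1 - (1 - p)^d, which is roughly d * p for small p:

```python
def pipeline_failure(p, depth):
    # The pipeline is down if ANY one of its `depth` stages fails,
    # assuming an independent per-machine failure probability p.
    return 1 - (1 - p) ** depth
```

With an assumed p = 0.01, a depth-3 pipeline fails with probability about 0.0297, close to three times the single-machine risk, matching the rough factor-of-three claim.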

So you should ask yourself whether it's worth it, especially when you're small. Many of the successful sites had or still have a lot of downtime. Maybe it's better to minimize the duration of downtime with fast restarts rather than trying to lower the probability of downtime.

Oh, that explains why this site is really slow occasionally.

How did you know everyone uses virtualized server environments on multi-core machines?

Here is a curious thing: now that it's up, checking http://news.ycombinator.com/threads?id=pg times out. And apparently only for PG. Other frequent commenters' pages work fine.

It is back up now as well as the comments page.

More importantly, why isn't there a status page for something that's equivalent to a lifesaving drug for many?

there's only one server. if the server is down, then assume the status is not 'green'.

How can Joe-random-user be certain the server is down?

http://isup.me/ (their official shortened link).

Aside from the various services to test on, if you're using Chrome you'll see a "Other people have reported this site is unavailable" message in addition to the standard errors.

use twitter - there were various things there, including an explanation from pg and some people recommending alternatives.


And five free issues from Hacker Monthly :) . That made me happy.

My bad karma tainted the server..

I'll upvote you so it doesn't happen again.

Thanks, bro. Lived in Colorado for 3 years; Eric Cartman was my role model ;)

Phew, almost thought HN was blocked here in China...

Is there no way to know whether it's blocked just for you? Are services like isup.me/domain also blocked?

There's a plethora of checks I could do, and a lot of sites which check from different locations within China. The Internet is not the same everywhere... But even some of these sites are blocked.

I did visit one of those quickly and the result was that HN should be reachable... Hmmmm. So I checked with another page, and this also timed out on my tablet. So either my ADSL or the server was problematic? I went for dinner and upon return it worked.

The check I used was http://www.blockedinchina.net/ First time I used this site...

www.viewdns.info has a "Chinese Firewall Test" to check if a particular domain is blocked in China.

When blocked, browsers behave differently. BTW, I'm in Beijing too; maybe we've met at open parties?

This looked more like the block you get when you are on the site's blacklist. How many HN readers are in Beijing? In Zhongguancun?

I must admit that my first thought was whether I accidentally got hellbanned.

When I saw http://www.downforeveryoneorjustme.com/http://news.ycombinat... I realized that I wasn't.

Chaoyang, but almost daily in Zhongguancun. Mail me if you wanna meet up...

I'm in Chaoyang also (Sanyuan xiqiao area) but work in ZGC. An MS'er by any chance?

Was on a tablet and page timed out... Didn't feel anything different than nytimes. And another page also didn't load as if new records were added.

But you could have easily met me. Mostly Linux events... mail me if you want to know

Phew, almost thought HN was blocked behind VPNs...

There was a post about it recently (advising that there would be downtime) - they've upgraded. It's considerably quicker now!

That was Saturday but appears to be related. PG posted this to Twitter[1]:

  Apologies for the HN outage. We think it may be related
  to the new server. This could take a while to fix.
[1] https://twitter.com/paulg/status/303350671460147200

pg's original announcement:


Though I do wonder if it being quicker has something to do with everyone getting bored of refreshing it.

That was for Saturday morning.

Obviously the new server has some growing pains. :P

Just because the switch-over went smoothly doesn't mean that normal operations didn't fall apart later on.

But Saturday morning the site was up. Perhaps the server upgrade was delayed, or something went wrong with it.

The site was down briefly Saturday morning.

Isn't that a bit long for a server upgrade? Shouldn't there be better strategies for quick switching?

Yes, there are, and they're applied to critical systems. I doubt HN is regarded as such. Especially if it would require splitting backend/frontend servers and replicating both to do a step-by-step upgrade of each component.

Alternatively you know... you can just accept that the service will be down for an hour or two and most people wouldn't even notice without this post.

Are you paying for this?

What would you say if Google.com was down for 2 hours due to a server upgrade? Do you pay Google for searching?

Bonus points if you do actually. Their search API is limited to 100 a day, which is not a lot.

If you're using google search, you're paying google in ad impressions.

And what do you think these[1] are? HN has ads, they're just less conspicuous.

[1]: https://news.ycombinator.com/jobs

Aren't those free for Y Combinator companies? Also you could just say the entire site is an ad for Y Combinator.

This actually confirmed my thought that it was not a money-related issue.


I was starting to freak out without HN! (Even created an account just to express my feelings)

This latest test suggests that HN seeks membership of errorban club. http://www.codinghorror.com/blog/2011/06/suspension-ban-or-h...

Goodbye productivity.

Goodbye procrastination.

Curious too. I was able to ping the servers just fine when it went down again a few hours ago. Let's just hope it stays up now!

Why don't you Ajaxify HN while you're at it??

Not trolling but just wondering. What added value does Ajaxifying the site bring to HN?

:) I'd instantly know when I get downvoted for saying something stupid ;)

Seeing submissions in real time, reducing load on the HN server, saving F5 addicts.

F5 addicts will never go away.

They will, once you have a highly responsive single-page site. I think more responsiveness can calm down even F5 addicts.

Good point on the F5.

Why don't you implement AJAX support in news.arc? That way PG won't have to spend the time writing it...


Good Point. Thanks for posting the link.

Sorry, I don't know Lisp and Arc seems to be a dialect of it that I also don't know.

I thought the thing was running off of a server on an iPhone; probably just ran out of battery power.

I had to work, for heaven's sake.

Do we get a new server? Please tell me more :)

What else can go wrong on Monday?

I think it may need some time to adapt to the new server.

Apparently it is not out of the woods yet - the comment tab links to the front page.

Seems to be showing the homepage, not redirecting to it. I think PG/RTM must have introduced a bug while fixing the problem. Have they no unit tests? ;-D

Or maybe they removed the comment page because it was causing the crashes, possibly related to the Unicrud Twitter crashes?

Thank God Its Up now...:)

Thank God it's up now. FTFY

Overcapitalization is annoying, and it's "it's", not "its".

You should thank PG instead.

