

Ask PG: Can you please fix the connection resets and/or explain them? - jacquesm

I see HN refuse connections many times every day without any clear reason on my end. I've asked two other people to test: one had the same problem, and for the other it worked fine, so it is a bit flaky.

It's usually accompanied by HN slowing down and taking 6-10 seconds or so to return a page.

My guess (but I really am poking in the dark here) is that it has something to do with the accept queue overflowing and the network stack sending out RST packets (I've seen those using tcpdump, so they figure in there somewhere).

Alternatively, the 'listen' gets dropped.
======
pg
It's because GCs are starting to take a long time now that there is so much
stuff to keep in memory. Items are lazily loaded, but they never get unloaded.
Eventually (after a day or so) crawlers drag everything into memory, and GCs
take excessively long. I plan to fix the problem by starting to dump stuff
back out of memory. In the meantime I just restart HN when it's starting to
spend too much time GCing, which in effect dumps everything out of memory
except the most recent 15k items.
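
To illustrate what "dumping stuff back out of memory" could look like, here is a minimal Python sketch of a bounded cache that still loads lazily but unloads the least recently used item once it is full. This is not HN's actual code (the site runs on Arc); `ItemCache`, `load_item`, and the capacity value are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class ItemCache:
    """Keep at most `capacity` items in memory, evicting the least recently used."""

    def __init__(self, capacity, load_item):
        self.capacity = capacity
        self.load_item = load_item  # hypothetical loader, e.g. reads an item from disk
        self.items = OrderedDict()

    def get(self, item_id):
        if item_id in self.items:
            self.items.move_to_end(item_id)  # mark as most recently used
            return self.items[item_id]
        item = self.load_item(item_id)       # lazy load on first access
        self.items[item_id] = item
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # unload the least recently used item
        return item
```

With something like `ItemCache(15000, load_item)`, a crawler walking old items would churn through the cache instead of growing the heap without bound, which is roughly what a restart achieves today.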

~~~
jacquesm
How frequently does this happen?

A surefire way to trigger it seems to be loading the top 10 submitters'
histories; those are pretty big. Since only the last 300 can be used to build
pages anyway, maybe you can do a digest there and not load the whole thing?
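
A rough Python sketch of the digest idea, assuming a submitter's history is available as an oldest-first sequence of item ids. `recent_digest` and `PAGE_WINDOW` are made-up names, and how HN actually stores histories is not something I know.

```python
from collections import deque

PAGE_WINDOW = 300  # figure from the comment above; how HN builds these pages is assumed


def recent_digest(item_ids, window=PAGE_WINDOW):
    """Return only the newest `window` ids from an oldest-first iterable."""
    # A deque with maxlen drops the oldest entries as newer ones arrive,
    # so memory stays bounded no matter how long the history is.
    return list(deque(item_ids, maxlen=window))
```

The point is only that the page builder never needs more than the tail of the history, so the rest can stay on disk.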

------
brk
I thought I saw something posted before saying that this is basically a bug in
the Arc code that runs the site, or in the underlying server daemon. I filed it
in memory (wet) as roughly a memory leak. There is a process that HUPs
(something) whenever it happens.

My impression is that it's a known issue, and in the priority queue, but
probably still ranks right behind other more pressing issues like deciding
what's for dinner.

~~~
jacquesm
This thread?

<http://news.ycombinator.com/item?id=1049859>

I mistook the RSTs for a socket without a listen, so I thought the server was
restarting, but an overflowing listen queue is more likely, especially in
light of riffer's comment there. That means the server is still up but not
accepting connections.

IIRC the default is something like 5 outstanding connections, which is
something you could fairly easily remedy.
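
For what it's worth, the remedy is usually a single argument to the `listen` call. A minimal Python sketch, just to show where the number goes (the HN server is not written in Python, and `make_listener` and the value 128 are my own picks for illustration):

```python
import socket


def make_listener(host, port, backlog=128):
    """Create a TCP listening socket with an explicit accept-queue size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # The backlog is a hint for how many not-yet-accepted connections may
    # queue up; the kernel may cap it (net.core.somaxconn on Linux).
    sock.listen(backlog)
    return sock
```

A bigger backlog only buys time while the process is stalled (in a long GC, say); if the stall outlasts the queue, connections still get dropped or reset.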

It's bugging me because I see it happening more and more often.

edit: irony, I had to submit this three times to get through...

~~~
brk
No, it was a different, much older comment. I think by pg.

I see this about once/day (and I spend WAY too much time refreshing HN).
Slightly surprised that you see it frequently enough to comment/inquire, guess
it's just random luck :)

~~~
jacquesm
> and I spend WAY too much time refreshing HN

Errm, yes. I think I do too.

> Slightly surprised that you see it frequently enough to comment/inquire,
> guess it's just random luck :)

At least once per hour, sometimes many more; at least 10 times per day.

~~~
lt
I'm curious about your _workflow_ on HN. Are you mostly refreshing /newest and
/newcomments?

~~~
jacquesm
Mostly newest, and 'threads'.

~~~
brk
Heh, me too.

Yet, we seem to have very different experiences.

What browser/OS are you using?

I am on OS X/Firefox.

~~~
jacquesm
FF on Linux.

~~~
brk
I wonder how many concurrent connections FF/Linux opens as opposed to FF/OS X.
I.e., your setup may put more strain on the server and contribute to the
higher rate of timeouts you encounter.

It's only an uninvestigated theory of mine.

