
Why HN was slow and how Rtm fixed it - pg
http://ycombinator.com/newsnews.html#19jan
======
mmaunder
"In 7 seconds, a hundred or more connections accumulate. So the server ends up
with hundreds of threads, most of them probably waiting for input (waiting for
the HTTP request). MzScheme can be inefficient when there are 100s of threads
waiting for input -- when it wants to find a thread to run, it asks the O/S
kernel about each socket in turn to see if any input is ready, and that's a
lot of asking per thread switch if there are lots of threads. So the server is
able to complete fewer requests per second when there is a big backlog, which
lets more backlog accumulate, and perhaps it takes a long time for the server
to recover."

I may have misunderstood, but it sounds like you have MzScheme facing the open
internet. Try putting nginx (or another epoll/kqueue-based server) in front of
MzScheme. It will handle the thousands of connections you have waiting for IO
with very little incremental CPU load, all in a single thread. Then when nginx
reverse proxies to MzScheme, each request happens very fast because it's
local, which means you need far fewer threads for your app server. That means
less memory and less of the other overhead that comes with a high thread
count.
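For concreteness, a minimal nginx reverse-proxy config for this kind of setup might look something like the following (the backend address and port are illustrative assumptions, not HN's actual setup):

```nginx
# Hypothetical: MzScheme app server assumed on 127.0.0.1:8080.
http {
    upstream app {
        server 127.0.0.1:8080;
    }
    server {
        listen 80;
        location / {
            # nginx holds the thousands of slow client connections;
            # the backend only ever sees fast local requests.
            proxy_pass http://app;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```

With something like this in front, the app server never waits on a slow client; nginx absorbs that work with its event loop.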

An additional advantage is that you can enable keepalive again (it looks like
you have it disabled right now), which makes things faster for first-time
visitors. It also makes things slightly faster for us regulars because the
conditional GETs we do for the gifs and CSS won't have to reestablish
connections. Fewer connections established also gives your OS a break, with
fewer syn/syn-ack/ack TCP handshakes.

Someone mentioned below that reverse proxies won't work for HN. They mean that
caching won't work - but a reverse proxy like nginx that doesn't cache but
handles high concurrency efficiently should give you a huge perf improvement.

PS: I'd love to help implement this for free. I run a 600 req/sec site using
nginx reverse proxying to Apache.

~~~
sedachv
Or I don't know, use continuations in a place that's actually appropriate?
John Fremlin showed that even with horrible CPS rewriting and epoll you can
get way better throughput in SBCL (TPD2) than nginx. MzScheme comes with
native continuations. It's not hard to call out to epoll.

Instead everyone in the Lisp community (pg included) is still enamored with
using continuations to produce ugly URLs and unmaintainable web applications.

~~~
pg
_Instead everyone in the Lisp community (pg included) is still enamored with
using continuations to produce ugly URLs and unmaintainable web applications._

If you read the source of HN, you'll see that it doesn't actually use
continuations.

I find the source of HN very clear. Have you read it? Is there a specific part
you found so complicated as to be unmaintainable?

~~~
axod
> If you read the source of HN, you'll see that it doesn't actually use
> continuations.

> It had to be some dialect of Lisp with continuations, which meant Scheme,
> and MzScheme seemed the best.

(From further down the page).

I'm confused. What needs continuations?

~~~
pg
I just wanted to have them in the language. The fact that I don't currently
use them in HN doesn't mean they're useless.

~~~
ezalor
I always thought the purpose of Arc was to be cruft-free, "don't include it
unless it is actually needed".

------
rarrrrrr
Since no one has mentioned it yet - Varnish-cache.org, written by a FreeBSD
kernel hacker, has a very nice feature, in that it will put all overlapping
concurrent requests for the same cacheable resource "on hold", only fetch that
resource once from the backend, then serve the same copy to all. Nearly all
the expensive content on HN would be cacheable by varnish. Then you can get it
down to pretty close to "1 backend request per content change" and stop
worrying about how arbitrarily slow the actual backend server is, how many
threads, how you deal with the socket, garbage collection, and all that.

~~~
nuclear_eclipse
Reverse proxies won't work for HN, because requests for the same resource from
multiple users can't use the same results. Not only are certain bits of info
customized for the user (like your name/link at the top), but even things like
the comments and links are custom per user.

Things like users' showdead value, as well as whether the user is deaded, can
drastically change the output of each page. Eg, comments by a deaded user
won't show as dead to that user, but they will for everyone else...

~~~
aonic
Varnish supports edge side includes. The header bar could be an ESI and the
rest of the page could be cached

~~~
nuclear_eclipse
> _and the rest of the page could be cached_

Except they can't, for the reasons I mentioned above. E.g., if my account is
deaded, when I view a thread with one of my own comments, it looks different
than if someone else were viewing that same thread, especially for those of
us with or without showdead checked in our profiles.

It's not as straightforward as you would like it to be.

~~~
nitrogen
The majority of requests probably come from live accounts in good standing or
from people not even logged in, so the majority of requests could still be
cached.

~~~
nkurz
Interesting point: what percentage of viewers are logged in? I was presuming
it was high, but I guess I really don't know.

------
ximeng
Off topic: piecing together the years the previous posts were made, it looks
like February 4, 2009 had a downvote cap of 100. It's 500 as of late last
year. That suggests karma in the HN economy inflates by a factor of sqrt(5)
per year, or ~224%.

------
kabdib
I worked at a startup once that made a network card that did this type of
buffering (wait for whole HTTP requests, then forward as a lump to the host,
across a local fast bus).

Pretty whizzy, definitely helped server scaling.

We started shipping in 2001; the dot-com bust more or less canceled any
interest in the product, and canceled the company, too . . .

~~~
pclark
What was the name of the company/product?

~~~
kabdib
Akamba

------
axod
1 thread per connection??? Not doing continual GC in a separate thread and
instead taking 7 seconds and blocking everything?

What is this the 1990s?

~~~
pg
Feel free to fork MzScheme and replace the garbage collector with a new one
that runs continuously.

~~~
axod
Surely there's a way to target the JVM?

I'd trust my life on the JVM, it's pretty battle tested, and the GC is simply
awesome.

~~~
defen
> I'd trust my life on the JVM

You sure you want to do that? :-)

> Java technology is not fault tolerant and is not designed, manufactured, or
> intended for use or resale as on-line control equipment in hazardous
> environments requiring fail-safe performance, such as in the operation of
> nuclear facilities, aircraft navigation or communication systems, air
> traffic control, direct life support machines, or weapons systems, in which
> the failure of Java technology could lead directly to death, personal
> injury, or severe physical or environmental damage

<http://technet.microsoft.com/en-us/library/cc976720.aspx>

~~~
axod
Yeah I'd trust my life on it. I've had java processes running for several
months without issue. The hardware fails before the jvm does.

I wouldn't trust anything from microsoft though.

------
samdk
The traffic graphs linked in this post [0] are an interesting addition to the
"How often do you visit HN?" poll [1] that was done a week ago. From the
graphs, it looks like there are about 10x as many page views as unique IPs.

[0] <http://ycombinator.com/images/hntraffic-17jan11.png> [1]
<http://news.ycombinator.com/item?id=2090191>

~~~
b_emery
Anyone have an idea what the huge spikes in unique IPs are all about?

------
cperciva
pg / rtm: If you need any FreeBSD-related help, please let me know (preferably
not in the next couple of days, though...). There are lots of HN fans in the
FreeBSD developer community.

------
jwr
Which is why it's good to have a mature VM underneath your language. Paul's
choice of basing an implementation of Arc on MzScheme was a very good one (I
remember people criticizing him for not building a standalone implementation
with a new VM).

I write time-critical applications in Clojure and JVM's
-XX:+UseConcMarkSweepGC flag is a lifesaver. We no longer get those multi-
second pauses when full GC occurs.

------
nc17
YC ranks 2400 on Alexa, and I'm sure most of the traffic is HN. I bet you'd be
hard-pressed to find a top 10k site written in Scheme. Does anyone know of
one?

<http://www.alexa.com/siteinfo/ycombinator.com>

~~~
ximeng
And, presumably thanks to tptacek's hard work, one of the key search terms
bringing people in here is "sous vide supreme".

------
joshu
Or you can put a reverse proxy in front.

<http://joshua.schachter.org/2008/01/proxy.html>

(Like I suggested in 2009...)

~~~
ars
BTW, you suggest pound for the slow client problem, but according to this
email it doesn't help for that.

[http://www.apsis.ch/pound/pound_list/archive/2010/2010-11/12...](http://www.apsis.ch/pound/pound_list/archive/2010/2010-11/1289960900000)

~~~
joshu
Pound helped this problem for delicious in 2005, and by the time I wrote this
article it was starting to not be the right answer. In 2011, it's definitely
wrong :)

------
taylorbuley
I feel silly asking, but who or what is 'rtm'?

~~~
kujawa
Some punk skript kiddie who broke the internet in 1988.

~~~
steveklabnik
I hear he crashed 1507 systems in one day.

~~~
cschep
Yo, this is RTM!

~~~
alanfalcon
Well that's great. There goes MIT.

------
allwein
When I read this headline, my immediate thought was "Oh, he must have
forgotten to shut down the copy of his worm that was running on the HN
servers."

<http://en.wikipedia.org/wiki/Morris_worm>

~~~
mahmud
Please, retire that stupid "joke". It's 22 years too old.

~~~
mcantor
First time I had heard it.

------
antirez
pg: you could probably try to write an event-driven HTTP server on top of
Arc, so that you don't have this kind of problem. Something like node.arc.

Also, if I understand correctly, you use flat files that are loaded into
memory at startup. It seems like switching to Redis could be an interesting
idea, as it is more or less the implementation of this concept in an
efficient and networked way.

With such changes you could probably go from 20 to a few hundred requests per
second without problems.

------
jey
> _when [MzScheme] wants to find a thread to run, it asks the O/S kernel about
> each socket in turn to see if any input is ready_

They've never heard of select()? </snark>

But really, is there some reason that it's hard to collect up all the fds at
once or something?

~~~
kujawa
Read C10K? Both select() and poll() have this problem internally. You have to
use one of the more advanced techniques if you really want to scale: epoll(),
kqueue(), or friends.

~~~
caf
poll() is _slightly_ better than select(), because you only have to iterate
over the file descriptors that were passed, rather than from 0 to nfds.

~~~
maw
It doesn't hurt that its interface is far more pleasant to use than select's,
either.

------
idlewords
Not being able to handle 20 requests/sec quickly, in 2011, for a read-mostly
website is just shameful.

~~~
jwhite
I think that all depends on the purpose of the site. If I were paying a
subscription to access the site I would be within my rights to object. As it
is, with HN being a free service, built to be the application spurring the
development of Arc, whose reference implementation is intended for exploring
language design and not performance, I wouldn't choose the word "shameful" to
describe this situation.

~~~
idlewords
I guess it depends on your priorities. Personally, I think the community
discussion here is far more interesting than the toy language project it runs
on. If you think Arc is the future of computing you may think of this
discussion board as just a convenient test suite for the language.

Either way, it's 2011 and that really is some spectacular slowness.

~~~
jwhite
My comment was not expressing an opinion on the relative values of Arc and the
HN discussion community. HN delivers lots of value for the modest price of
your time. Claiming that its performance is shameful when it isn't being
directly monetized, or even indirectly monetized like Facebook &co., is
unfair.

If you were talking about Facebook, Twitter, or Basecamp, that would be a
different matter.

------
blinkingled
Sounds like there is a scalability issue within MzScheme, in that it iterates
over all the threads, asking each one about the sockets it has. As one can
tell, once the number of threads and sockets grows, finding which thread to
run in user space becomes awfully expensive. As any clever admin would, they
applied the least invasive fix, limiting the number of connections and
threads, with what sounds like immediate results!

I have no idea what MzScheme is, but I am curious why HN is running threads
in user space in 2011. The OS kernel knows best which thread to pick to run,
and that is a very well tuned, O(1) operation on Linux and Solaris.

~~~
svlla
not to mention that one thread per connection is, well, extremely outdated.

~~~
jrockway
I don't know much about MzScheme, but it's quite possible that "thread" means
"stack", not "OS thread". One context stack per TCP connection is quite
sustainable; with Haskell's threads and Perl's coros, I run out of fds long
before I'm using any significant amount of memory. (This is somewhere around
30,000 open connections on my un-tweaked Linux desktop. I know I can do a lot
more if I tried.)

The issue, in the case of HN, is with O(n) IO watchers. Most sockets are idle
most of the time, so you really want an algorithm that is O(n) over _active_
sockets, not O(n) over active and inactive sockets. You typically have so few
active fds at any time that the n is really tiny, making massively scalable
network servers trivial to write. But you also have a lot of connections at
any one time, so if you are O(n) over active and inactive fds, then you are
going to have performance issues. Basically, you don't want to pay for
connections that aren't doing anything.

Fortunately, we have the technology; epoll on Linux, kqueue on BSDs, /dev/poll
on Solaris. You just need to use an event loop, so it does all the hard stuff
for you (and so you don't have to worry about the OS differences). Hacking a
proper event loop into MzScheme may be hard, but it's absolutely necessary for
writing scalable network servers. Handling 10k+ open connections is trivial
with today's technology. And, all the cool kids are doing it (node.js, GHC,
etc.).

~~~
swannodette
My understanding is that MzScheme / Racket has a proper event loop.

~~~
jrockway
Yeah, I have no idea. All I know is that proper threads do not bloat anything,
and that proper IO watchers are not O(n) over inactive connections.

------
PStamatiou
It would be interesting to see whether traffic goes up after this, i.e.,
whether it's elastic. Marissa Mayer gave a talk at a conference in 2009 where
she explained her early tests on the number of search results per page on
Google (10, 20, 25, 30); in the end it was the page load speed that accounted
for the differences in pageviews and visitors.

------
Luyt
_"It turns out there is a hack in FreeBSD, invented by Filo, that causes the
O/S not to give a new connection to the server until an entire HTTP request
has arrived."_

I wouldn't call it a hack, but a feature ;-)

    
    
        # Buffer an HTTP request in the kernel 
        # until it's completely read.
        apache22_http_accept_enable="yes"
    

Is HackerNews web scale?

~~~
makmanalp
What does "web scale" mean? I see it thrown around a lot without much
explanation.

~~~
klochner
<http://www.youtube.com/watch?v=b2F-DItXtZs>

~~~
Luyt
A transcript can be found at <http://mongodb-is-web-scale.com/>

_"Shards are the secret ingredient in the web scale sauce. They just work."_

------
earle
HN only supports 20 req per second???

~~~
PStamatiou
flat files, no database

~~~
pg
That's not the bottleneck. Essentially there's an in-memory database (known as
hash tables). Stuff is lazily loaded off disk into memory, but most of the
frequently needed stuff is loaded once at startup.

The bottleneck is the amount of garbage created by generating pages. IIRC
there is some horrible inefficiency involving UTF-8 characters.

~~~
gills
Are you using any sort of in-memory fragment caching? That seems like it might
reduce some render overhead.

~~~
pg
A great deal, and it does.

------
alain94040
I have a proposal to settle flamewars by the way. I had meant to propose
something like this (a debate solver) for years. Here it is:

After 4 levels of back and forth (Joe says "...", Tim replies, then Joe
replies once more, then Tim replies again), freeze that branch, hide it from
the general public, and turn the branch into a settlement: both Tim and Joe
are allowed one final comment each, which they both approve. Only once they
have posted this compromise is it shown in place, where the original sub-
thread used to be.

Simple. Prevents endless arguments. Good for everyone.

~~~
brc
Hmm, I have had ideas for something similar, but which involve having to
choose a side of the argument (i.e., agree with the parent or disagree)
before posting. Once chosen, you can only vote on your 'side' (either up or
down). Poor arguments on your 'side' can be killed with sufficient downvotes,
so that the ensuing set of arguments hopefully ends up being the best set.
This tends to happen in an informal way on HN, but only because people
largely behave. In other forums, not so much. Perhaps by glomming together
your idea of maximum posts per user on a topic with side-based voting, some
type of civil debating platform could be developed. After all, in actual
debates you get two chances to state your position and a final sum-up.

------
bootload
is there any reason the noobs url (29 Apr: Faster, Fewer Flamewars) ~
<http://news.ycombinator.com/noobs> shows no results?

~~~
pg
It has since been split into noobstories and noobcomments.

------
gms
Who's Filo? David Filo?

~~~
aristus
Yep. He's still a mensch. Yahoo did some incredible stuff on Apache & FreeBSD
back in the day. I remember a hack that added hardcoded HTTP headers to the
image files on disk, to squeeze that extra nilth percent out of the server.

~~~
jcapote
That's awesome!

------
rograndom
I was getting connection errors less than an hour ago, so I'm not sure if it
actually worked.

------
richcollins
Sounds like a switch to async I/O would be helpful.

~~~
dauphin
Well, HN is written in Arc, which is a layer on top of MzScheme. MzScheme's
handling of sockets is actually already done with the select() syscall, and
its "threads" are lightweight non-blocking threads (think Erlang). So it's
already async, but with "sugar".

------
js4all

      > In 7 seconds, a hundred or more connections accumulate. So  
      > the server ends up with hundreds of threads, most of them  
      > probably waiting for input  
    

This is why nginx handles large sites much better. The requests are queued
without spawning threads. Evented I/O to the rescue.

------
jacquesm
Let's hope that puts an end to all the time-outs!

Thanks for the work and it sure seems to be a lot more responsive.

------
spydum
Does seem a bit faster -- could be placebo though :)

------
MikeCapone
Neat. All I can say is: Thanks!

------
ezalor
Reverse-proxying via nginx would solve this problem and more: the arbitrary
30-second limit on form submission (hotspots are sometimes slow...), nginx
could handle rate limiting and logging instead of srv.arc, etc. The Arc
codebase would btw be smaller and cleaner (no policy/sanitization code,
etc.).

Serving static content via Apache was a first step ;-)

Don't reinvent the wheel!

~~~
djcapelis
Did you just tell people running a company that reinvented funding, via a
custom-written news site in their own programming language with a custom web
server, that they shouldn't reinvent the wheel?

They think they can build a better wheel. They seem to like doing it and have
a habit of it. There's nothing wrong with that.

~~~
kirubakaran
Unless they built their server with NAND gates, I don't see running an nginx
reverse proxy as inconsistent with their philosophy.

~~~
djcapelis
Right, they only re-invent wheels when they feel they can make a better one. I
don't think they feel they can go through the effort of fabricating a better
chip. Though I wouldn't put it past rtm to try.

The philosophy "don't reinvent the wheel," however, is definitely
inconsistent with theirs. They will reinvent the wheel whenever they feel
they can make a better one. Just because they haven't reinvented every wheel
does not mean "don't reinvent the wheel" applies to this group.

They chose to create the best solution they think they can. They don't seem to
care whether or not that involves reinventing wheels. The original argument
that they should seems pretty silly.

~~~
pig
I think he means that the nginx reverse proxy is just a part of the
infrastructure, like the server, the OS, MzScheme, etc. that they use.

~~~
ezalor
Totally. "If you wish to make an apple pie from scratch, you must first
invent the universe" -- Sagan.

------
dauphin
_> It turns out there is a hack in FreeBSD, invented by Filo, that causes the
O/S not to give a new connection to the server until an entire HTTP request
has arrived. This might reduce the number of threads a lot, and thus improve
performance; I'll give it a try today or tomorrow._

Anyone know if they're referring to "accept filters" here? FreeBSD folks can
"man accf_http" if they're curious; the filter prevents a request from being
handed off to the application until the complete (and valid?) request has
arrived. Certainly not a "hack" but a feature of the OS itself.

Or they could use a proxy. All this "fuck me I'm famous" attitude is stupid.

~~~
svlla
this seems impossible for item pages due to how continuation ids are used for
replies

~~~
dauphin
This could be resolved using consistent hashing or a critbit tree.

------
dauphin
Lisp is definitely not a slow language: you can handle the crazy rate of 20
requests/second on a multi-core server!

~~~
mahmud
If you choose to. We pushed north of 800 r/s in production, and just shy of
4k on our LAN; that's using stock Hunchentoot with just some _customization_.

This guy here broke the 10k barrier:

<http://john.freml.in/teepeedee2-c10k>

------
seanfchan
Yes placebo, oh our brains, though it's not midday just yet :)

------
dhimes
Sorry, guys, it was entirely my fault. I got Michael Grinich's iphone HN app
and just love the damn thing. :)

------
mbubb
I find it disturbing to see people asking "Who is Rtm?" "Who is filo?"

I understand if you are in tech you might not know figures in history or
literature... but these guys?

Every time you login to a UNIX/Linux system you use the passwd file and
related setup - authored at least in part by Rtm's father.

<http://www.manpages.info/freebsd/passwd.1.html>

Rtm has done lots in his own right, as the Wikipedia pages show.

But seriously - if you don't know who these people are you really should.

Read this: <http://www.princeton.edu/~hos/Mahoney/unixhistory>

and maybe ESR's writings and that online anthology of the early Apple days and
old issues of 2600, etc, etc

I am sorry - but it is really irritating to me that someone would be on this
site and really not be aware of the deeper history and culture. It is not that
deep - 1950s to present (to cover Lisp).

As Jay-Z (whom you probably know) says - "Go read a book you illiterate son of
a bitch and step up your vocab ..."

~~~
taylorbuley
You're making little sense. So "go learn" but "don't ask?"

~~~
mbubb
You write for a major magazine on the subject. Yes - I would hope you know
more about the history of this topic than I do.

