Hacker News new | past | comments | ask | show | jobs | submit login

And their documentation shows them scraping HN:


Please respect the robots.txt if you do. HN's application runs on a single core and we don't have much performance to spare.

Serious question, how is that possible? Somebody recently gave me a Dell R720 2RU server with 16 cores and 128GB of RAM for free. There's literally that much slightly used server gear showing up on the used market from companies that have migrated everything to aws/gcp/azure/whatever.

If all of HN has only a single core then you're running it on less server resources than I could buy on ebay with $180 and a visa card?


It's pretty easy if you're keeping everything in RAM and don't have layers upon layers of frameworks. You've got about 3B cycles per second to play with, with a register reference taking ~ 1 cycle, L1 cache (48K) = 4 cycles, L2 cache (512K) = 10 cycles, L3 cache (8M) = 40 cycles, and main memory = 100 cycles. This entire comment page is 129K, gzipping down to 19K; it fits entirely in L2 cache. It's likely that all content ever posted to HN would fit in 128G RAM (for reference, that's about 64 million typewriter pages). With about 30M random memory accesses being possible per second (or about 7 billion consecutive memory accesses - you gain a lot from caching), it's pretty reasonable to serve several thousand requests per second out of memory.

For another data point, I've got a single box processing every trade and order coming off the major cryptocurrency exchanges, roughly 3000 messages/second. And the webserver, DB persistence, and a bunch of price analytics also run on it. And it only hits about 30% CPU usage. (Ironically, the browser's CPU often does worse, because I'm using a third-party charting library that's graphing about 7000 points every second and isn't terribly well optimized for that.)


Software gets slow because it has a lot of wasteful layers in between. Cut the layers out and you can do pretty incredible things on small amounts of hardware.

Would love to see a blog post on building it some time, I think the last vaguely similar (but serious) rundown in this vein was “One process programming notes”.


> Serious question, how is that possible?

You see, it is possible. Take a step back and look again to see the evidence: The site is called Hacker News, the first goal of this site was to prove that something useful can be created in an entirely custom programming language, and the site handles a huge amount of traffic, so it is a challenging task to let this run on a single core.

So the answer from a hacker's mind to why they let it run on a single core might simply be: Because they can.

On the other hand, YCombinator is a successful company, so buying a larger server would certainly be in their latitude. But that would be less intellectually appealing, and part of their success come from the fact that they decide as hackers, and don't always take the easiest path.

Cool as it may be for the intellectual challenge of tight coding, to run on a minimum of resources, it also makes things more vulnerable to DDoS, slashdot effect, and less than ethical people running abusive scraping tools that don't respect robots.txt. As a person who is on the receiving end of the very rare 3am phone call for networking related emergencies, I try to provision a sufficient amount of resources above "the bare minimum" to ensure that I'm not woken up by some asshat with a misconfigured mass http mirroring tool.

RAM is so cheap now for small sized things that you can afford to trivially have an entire db cached at all times, with only very rare disk I/O.

As an example we have a request tracker ticket database for a fairly large sized isp which is a grand total of under 40GB and lives in RAM. It's dozens of thousands of tickets with attachments and full body text search enabled. For those not familiar with RT4 it's a convoluted mess of Perl binary scripts.

I could probably run my primary authoritative master DNS on bind9 on Debian-stablr on a 15 year old Pentium 4 with 256MB of RAM, but I don't...

Don't know how much it is still true, but HN was originally implemented in arc.

The language homepage[1] says "Arc is unfinished. It's missing things you'd need to solve some types of problems. [...] The first priority right now is the core language."

Perhaps parallelism is still pending. A Ctrl-F on the tutorial doesn't turn up any hits for "process", "thread", "parallel", or "concurrency".

[1]: http://www.arclanguage.org

Arc has threads and HN's code relies heavily on them. Arc currently runs on Racket, though, which uses green threads, so the threadiness doesn't make it to the lower levels. Racket has other ways of doing parallelism, but as far I know they don't map as well to Arc's abstractions.

It's a single-threaded process running a single core.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact