
I'm surprised that a site as big as HN is only hosted in one place.



HN is probably very small. Curious as to the minimum size of the backend that will hold up the website.

There may need to be read replicas, but maybe not even that is needed.


It's about the same as what Scott described here: https://news.ycombinator.com/item?id=16076041

But we get around 6M requests a day now.
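(Back-of-the-envelope, that's 6,000,000 requests / 86,400 seconds ≈ 70 requests per second on average, spread across the day.)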


What was the motivation in choosing FreeBSD?

(Just so nobody misinterprets my question, nothing wrong with FreeBSD, I know other stuff also runs on it like Netflix’s CDN. Still always interested to hear why people choose the road less travelled)


RTM, PG and I used BSDI (a commercial distribution of 4.4BSD) at Viaweb (starting 1995) and migrated to FreeBSD when that became stable. RTM and I had hacked on BSD networking code in grad school, and it was far ahead of Linux at the time for handling heavy network activity and RAID disks. PG kept using FreeBSD for some early web experiments, and then YC's website, and then for HN.

FreeBSD is still an excellent choice for servers. You may prefer Linux for servers if you're more familiar with it from using it on your laptop. But if you use Mac laptops, FreeBSD sysadmin will seem at least as comfortable as Linux.


Do you think this influenced early YC companies more generally? For example, reddit's choice in picking FreeBSD over Linux?

It's interesting that they might still be on Lisp if they hadn't picked FreeBSD (one of the chief concerns cited was that spez's local dev environment couldn't actually run reddit, which seems like it wouldn't have been a problem with Linux, since Linux and OS X both had OpenMCL (now known as CCL) as a choice for threaded Lisp implementations at the time).


Lisp was indeed a hassle on FreeBSD. Viaweb used CLisp, which did clever things with the VM system and garbage collection that weren't quite portable (and CLisp's C code was all written in German for extra debugging fahrvergnügen.)

I don't know how Reddit came to use FreeBSD, but if you asked which OS to use around university CS departments in 2005 you'd get that answer pretty often.


Yeah, absolutely; wasn't criticizing the choice of FreeBSD more generally (short of elegant maybe, but the only real UNIX systems available these days are illumos and xv6, and they're short of elegant, too), just thought it odd for that specific use case.

Thanks for answering! That's really interesting about clisp; I've always found it a more comfortable interactive environment than any other Common Lisp, but it definitely sacrifices portability for comfort in more ways than one (lots of symbols out of the box that aren't in the HyperSpec or any other implementation, too, for example). I'm now really thankful I've never been tempted to look to its source!


> the only real UNIX systems available these days are illumos and xv6

I do wonder how you are defining "real UNIX system" in that statement.


How do you define UNIX? Don't use "a system licensed to use the trademark," as that's boring and includes many things that are definitely far from it. It's hard to pin down! I'd say it's easiest to define what isn't: massive systems.

Massive systems miss the design intent and, to a great extent, nearly every benefit of using UNIX over VAX.

This excludes many of the operating systems licensed to use the trademark "UNIX." In this regard, even though Plan 9 is obviously not UNIX, it's a lot closer to it than (any) Linux and FreeBSD.


> Massive systems miss the design intent and, to a great extent, nearly every benefit of using UNIX over VAX

I take it you meant to say "VMS" here, not VAX.

I don't think the size of a system is essential to whether it counts as "UNIX" or not. The normal trajectory of any system which starts small is to progressively grow bigger, as demands and use cases and person-years invested all accumulate. UNIX has followed exactly that trajectory. I don't see why, if a small system gradually grows bigger, it at some point stops being itself.

I think there are three main senses of UNIX – "trademark UNIX" (passing the conformance test suite and licensing the trademark from the Open Group), "heritage/genealogical UNIX" (being descended from the original Bell Labs Unix code base), "Unix-like" (systems like Linux which don't descend from Bell Labs code and, with rare exception, don't formally pass the test suite and license the trademark, but which still aim at a very high degree of Unix compatibility). I think all three senses are valid, and I don't think size or scale is an essential component of any of them.

UNIX began life on small machines (PDP-7 then PDP-11), but was before long ported to some very large ones (for their day) – such as IBM mainframes – and the operating system tends to grow to match the scale of the environment it is running in. AT&T's early 1980s IBM mainframe port [0] was noticeably complicated, being written as a layer on top of the pre-existing (and obscure) IBM mainframe operating system TSS/370. If being small is essential to being UNIX, UNIX was only a little more than 10 years old before it was already starting to grow out of being itself.

[0] https://www.bell-labs.com/usr/dmr/www/otherports/ibm.pdf


> I take it you meant to say "VMS" here, not VAX.

Embarrassing slip in this context (I was just reading the CLE spec, too!), but yes.

> UNIX has followed exactly that trajectory. I don't see why if a small system gradually grows bigger it at some point stops being itself.

Adding onto something (and tearing down the principles it was created on, as Linux and most modern BSDs do) doesn't always preserve the initial thing; a well-built house is better as itself than reworked into a McMansion. Moissanite isn't diamond; it's actually quite different.

An operating system that has a kernel with more lines of code than the entirety of v7 (including user programs) is too much larger than UNIX, and too much of the structure has been changed, to count as UNIX in any meaningful sense of the word.

> If being small is essential to being UNIX, UNIX was only a little more than 10 years old before it was already starting to grow out of being itself.

Correct, which is why many of the initial UNIX contributors started work on Plan 9.


You started out by saying:

> the only real UNIX systems available these days are illumos and xv6

And then when I ask you what makes those "real UNIX systems" you say:

> I'd say it's easiest to define what isn't: massive systems.

But I don't see how illumos doesn't count as a "massive system". Think of all the features included in illumos and its various distributions: two networking APIs (STREAMS and sockets), DTrace, ZFS, SMF, Contracts, Doors, zones, KVM, projects, NFS, NIS, iSCSI, NSS, PAM, Crossbow, X11, Gnome, IPS (or pkgsrc on SmartOS), the list just goes on. illumos strictly speaking is just the kernel, and while much of the preceding is in the kernel, some of it is user space only; but, to really do an apples-to-apples comparison, we have to include the user space (OpenIndiana, SmartOS, whatever) as well. Solaris and its descendant illumos are just as massive as Linux or *BSD or AIX or macOS.

I will grant you that xv6 is not a massive system. But xv6 was designed for use in operating systems education, not for production use (whether as a workstation or server). If you actually tried to use xv6 for production purposes, you'd soon enough add so much stuff to it that it would turn into just as massive a system as any of these.


> Think of all the features included in illumos and its various distributions: two networking APIs (STREAMS and sockets), DTrace, ZFS, SMF, Contracts, Doors, zones, KVM, projects, NFS, NIS, iSCSI, NSS, PAM, Crossbow, X11, Gnome, IPS (or pkgsrc on SmartOS), the list just goes on.

Much of what you mention isn't actually necessary/isn't actually in every distribution! Including X11 and GNOME as a piece of it is a bit extreme, don't you think? I also think it's a bit extreme to put things that are obviously mistakes (Zones, doors, SMF, IPS) in with things that actually simplify the system (DTrace and ZFS, most importantly) as reasons for why illumos is overly-complex.

I mostly agree with the idea that we have to include user space; even then, it's still clear that illumos is much closer to sane UNIX ideals than Linux is. I'm not going to claim that the illumos libc is perfect (far from it!), but the difference in approach between it and glibc highlights how deep the divide runs here. illumos, including its userspace, is significantly smaller than most Linux distributions, massively smaller than macOS, slightly smaller than FreeBSD (and much better designed). All of these, though, are of course much smaller and far more elegant than AIX, so in that way we all win.

I don't actually know what more I would add to xv6. If anything, I'd start by removing things. Mainly, I hate fork. Of course, its userspace is relatively small, but v7's userspace is more or less enough for me (anecdotally, I spend much of my time within it via SIMH and it's pretty comfortable, although there are obviously limits to this), so it wouldn't take many more additions to make it a comfortable environment.

Again, I'm not claiming Linux is bad (I love Linux!), simply that it isn't UNIX and doesn't adhere to the UNIX philosophy.


> simply that it isn't UNIX and doesn't adhere to the UNIX philosophy.

I talked earlier about three different definitions of UNIX – "trademark/certified UNIX", "heritage/genealogical UNIX" and "UNIX-like/UNIX-compatible". Maybe we could add a fourth, "philosophical UNIX". I don't know why we should say that is the only valid definition and ignore the validity of the other three.

The fact is that opinions differ on exactly what the "UNIX philosophy" is, and on how well various systems comply with it. The other three definitions have the advantage of being more objective/clearcut and less subject to debate or differing personal opinions.

Some would argue that UNIX itself doesn't always follow the UNIX philosophy – or at least not as well as it could – which leads to the conclusion that maybe UNIX itself isn't UNIX, and that maybe a "real UNIX" system has never actually existed.

It is claimed that one part of the UNIX philosophy is that "everything is a file". And yet, UNIX started out not treating processes as files, which leads to various problems, like how do I wait on a subprocess to terminate and a file descriptor at the same time? Even if I have an API to wait on a set of file descriptors, I can't wait on a subprocess to terminate using that API since a subprocess isn't a file descriptor.

People often point to /proc in Linux as an answer to this, but it didn't really solve the problem, since Linux's /proc was mostly read-only and the file descriptor returned by open(/proc/PID) didn't let you control or wait on the process. That's no longer true with the introduction of pidfd, but that's a rather new feature, only around since 2019. Plan 9's /proc is much closer, due to the ctl file. V8 Unix's is better than the traditional Linux /proc (you can manipulate the process using ioctl) but not as good as Plan 9's (its ioctls expose more limited functionality than Plan 9's ctl file). FreeBSD's pdfork/pdkill is a good approach, but those have only been around since 2012.
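To make the pidfd point concrete, here's a minimal sketch of my own (Linux 5.3+ only; it goes through syscall() since older glibc versions have no pidfd_open wrapper) of waiting on a child process and an ordinary file descriptor with a single poll() call:

    /* Minimal sketch: wait on a child process and an ordinary file descriptor
     * with one poll() call, using a pidfd (Linux 5.3+). Error handling is
     * abbreviated; the raw syscall is used because older glibc has no wrapper. */
    #include <poll.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void) {
        pid_t child = fork();
        if (child == 0) {
            sleep(2);                 /* child: pretend to do some work */
            _exit(0);
        }

        int pidfd = (int)syscall(SYS_pidfd_open, child, 0);
        if (pidfd < 0) { perror("pidfd_open"); exit(1); }

        struct pollfd fds[] = {
            { .fd = STDIN_FILENO, .events = POLLIN },  /* an ordinary fd */
            { .fd = pidfd,        .events = POLLIN },  /* readable once the child exits */
        };

        /* One blocking wait covers both events: input on stdin, or child exit. */
        if (poll(fds, 2, -1) < 0) { perror("poll"); exit(1); }
        if (fds[0].revents & POLLIN) printf("stdin is readable\n");
        if (fds[1].revents & POLLIN) printf("child has exited\n");
        return 0;
    }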


> I don't know why we should say that is the only valid definition and ignore the validity of the other three.

For "trademark UNIX": very few of the systems within are small, comprehensible or elegant.

For "heritage/genealogical UNIX": Windows 10 may have the heritage of DOS, but I wouldn't call it "DOS with a GUI."

For "UNIX-like/UNIX-compatible": nothing is really UNIX-compatible or all that UNIX-like. Do you define it as "source compatibility?" Nothing from v7 or before will compile; it's before standardization of C. Do you define it as "script compatibility?" UNIX never consistently stuck to a shell, which is why POSIX requires POSIX sh which is in many ways more limited than the Bourne shell.

I personally take McIlroy's view on the UNIX philosophy:

    A number of maxims have gained currency among the builders and users of the UNIX system to explain and promote its characteristic style:

    * Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features."

    * Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
    
    * Design and build software, even operating systems, to be tried early, ideally within weeks. Don't hesitate to throw away the clumsy parts and rebuild them.

    * Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them.
    
Throwing out things that don't work is a good idea, which is why the modern backwards-compatible-ish hell is far from UNIX (in this regard, I'll admit illumos doesn't qualify).

I fully agree with you that Plan 9 is closer to UNIX than Linux and FreeBSD!


Would the original authors of Unix agree with your opinions on how to define the term?

Does AT&T's c. 1980 port of Unix to run on top of IBM's TSS/370 mainframe operating system [0] count as a real Unix? It appears that Ritchie did think it was a Unix; he linked to the paper from his page on Unix portability [1].

So is your definition of "Unix" broad enough to include that system? If not, you are defining the term differently from how Ritchie defined it; in which case I think we should prefer Ritchie's definition to yours. (McIlroy's maxims are explicating the Unix philosophy, but I don't read him as saying that systems which historically count as Unix aren't really Unix if they fall short in following his maxims.)

[0] https://www.bell-labs.com/usr/dmr/www/otherports/ibm.pdf

[1] https://www.bell-labs.com/usr/dmr/www/portpapers.html


> McIlroy's maxims are explicating the Unix philosophy

That's why I used the quote; I didn't use it for this reason:

> but I don't read him as saying that systems which historically count as Unix aren't really Unix if they fall short in following his maxims.

I'd say yes, a port of v7 is fine, because it's not meaningfully more complex. It can still be comprehended by a single individual (unlike FreeBSD, Linux, everything currently called Certified Commercial UNIX™, etcetera).


> I'd say yes, a port of v7 is fine, because it's not meaningfully more complex

I think AT&T's port of V7 (or something close to V7, I guess it was probably actually a variant of PWB) to run on top of TSS/370 really is meaningfully more complex because in order to understand it you also have to understand IBM TSS/370 and the interactions between TSS/370 and Unix.


I don't know, because that decision dates back to pg and rtm and probably Viaweb days. We like it.


Pragmatic engineering: What will this change enable me to do that I cannot do now? Does being able to do that solve any of my major problems? (If no, spend time elsewhere)


Can I ask a question that's half facetious, half serious (0.5\s): does Hacker News use Docker or any containers in its backend? With 6M requests per day, if it didn't use containers, HN might be a good counterexample against premature optimization (?).


Nope, nothing like that. I don't understand why containers would be relevant here though? I thought they had to do more with things like isolation and deployment than with performance, and it's not obvious to me how an extra layer would speed things up?


I was trying to point out in my original comment that some people may be prematurely optimizing for scale, and letting tooling drive decision-making rather than the problems at hand. And a good logical short circuit to that would be: "if Hacker News serves 6M requests per day without Docker, then using Docker would be overkill for a small CRUD app".

That being said, if modern websites were rated by utility to the user divided by complexity of the tech stack, I must say Hacker News would be one of the top-ranked sites, compared to something similar like Reddit or Twitter, which at times feel... like a juggling act on top of a unicycle just to read some comments. :)


Agree. No one has created anything better than HTML tables!


I'm not even sure any other modern stack could handle this with the same hardware.


Does anyone know what the AWS instance size equivalent of that would be?


Very roughly equivalent to an m4.xlarge.


Wow, that's not as big as I thought then. What's the average peak rps?


We use Nginx to cache requests for logged-out users (introduced by the greatly-missed kogir), and I only ever look at the numbers for the app server, i.e. the Arc program that sits behind Nginx and serves logged-in users and regenerates the pages for Nginx. For that program I'd say the average peak rps is maybe 60. What I mean by that is that if I see 50 rps I think "wow, we're smoking right now" and if I see 70 I think "WTF?".
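For anyone unfamiliar with that pattern, here's a hypothetical sketch of the general "cache anonymous traffic, bypass the cache for logged-in users" setup in Nginx. The cookie name, port, and TTL are made up for illustration; this is not our actual config:

    # Hypothetical sketch only: serve cached pages to logged-out visitors,
    # send logged-in users (assumed here to carry a "user" cookie) to the app.
    proxy_cache_path /var/cache/nginx keys_zone=pages:50m max_size=1g;

    server {
        listen 80;

        location / {
            proxy_pass http://127.0.0.1:8080;   # the app server behind Nginx
            proxy_cache pages;
            proxy_cache_valid 200 30s;          # short TTL; the app regenerates pages

            # Logged-in users bypass the cache and never populate it.
            proxy_cache_bypass $cookie_user;
            proxy_no_cache     $cookie_user;
        }
    }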


Lol I can imagine you see 70 and think "oh no what's going on now".


Maybe the standby should be in another rack, perhaps even in another datacenter.


That would be the natural next step, but it's a question of whether it's worth the engineering and maintenance effort, especially compared to other things that need doing.

For failures that don't take down the datacenter, we already have a hot standby. For datacenter failures, we can migrate to a different host (at least, we believe we can—it's been a while since we verified this). But it would take at least a few hours, and probably the inevitable glitches would make it take the better part of a day. Let's say a day. The question is whether the considerable effort to build and maintain a cross-datacenter standby, in order to prevent outages of a few hours like today's, would be a good investment of resources.


My vote is no. We will all be fine for a day without HN, as today proved. There have to be so many other ways HN can be improved that will have more of an impact for HN users in the remaining 364 days of the year.


> For failures that don't take down the datacenter, we already have a hot standby. For datacenter failures, we can migrate to a different host (at least, we believe we can—it's been a while since we verified this).

It might be a good idea to verify it; see the recent events at OVH (https://news.ycombinator.com/item?id=26407323).


Question: what are the other things that need doing?

Obviously this doesn't apply to engineering effort outside of the Hacker News website, which the team might be working on.

But this forum has seen little change over the years and it's pretty awesome as is.

(Though I haven't used the HN API much, so I'm not sure what's going on on that side.)


> Question: what are the other things that need doing?

I'm currently working on fixing a bug where collapsing comments in Firefox jumps you back to the top of the page. I'm taking it as an opportunity to refine my (deliberately) dead-simple implementation from 2016.

> But this forum has seen little change over the years and it's pretty awesome as is.

That's an illusion that we work hard to preserve, because users like it. People may not have seen much change over the years but that's not because change isn't happening, it's because we work mostly behind the scenes. Though I have to say, I really need more time to work on the code. I shouldn't have to wait for 3 hours of network outage to do that (but before anyone gets indignant, it's my own fault).


Does that mean it might get more performant? On my mobile, the time it takes seems to scale with the number of posts on the page, not the number of posts it actually collapses.


Yes I certainly hope so. The dead-simple implementation first expands all the comments and then collapses the ones that should be collapsed, so your observation is spot on.


Team is maybe a bit of a generous term to describe dang!


I had a lot of help today from one of the brilliant programmers on YC's incredible software team. And there are other people who work on HN, just not full-time since Scott left.


ONE MAN TEAM.


A read-only copy in a different DC could be a simple and still acceptable option.

And a status page would be nice.


Can you add any additional information, like the database or web server?


How much memory does HN use?


That depends on how much Racket's garbage collector will let us use (edit: I mean without eating all our CPU). Right now it's 1.4GB.

Obviously the entire HN dataset could and should be in RAM, but the biggest performance improvements I ever made came from shrinking the working set as much as possible. Yes, we have long-term plans to fix this, but at present the only reliable strategy for getting to work on the code is for HN to go down hard, and we don't. want. that.


Are you using Racket BC or CS?


Funny they didn't have to build 10 million microservices and host them across a million Kubernetes pod instances to handle "internet traffic".


They only have one server, iirc.


And, if I'm not mistaken, the site is single threaded.


Would love to see the HN architecture.


arclanguage.org hosts the current version of Arc Lisp, including an old version of the forum, but HN has made a lot of changes locally that they won't disclose for business reasons.

There's an open source fork at https://github.com/arclanguage/anarki, but it doesn't have any direct relationship with HN.


Single-threaded Lisp application running on a single machine. Ta-da.


The application is multi-threaded. But it runs over a green-thread language runtime, which maps everything to one OS thread.

That's a significant distinction, because if you swap the underlying implementation, the same application should magically become multi-threaded, which is exactly the plan.


Until 2018 at least it was ... wait for it ... a single server!

https://news.ycombinator.com/item?id=18496344

(Anyone know if that's still the case?)


One production server and one failover (in the same data center, obviously).


I assume there are off-site backups, though?

Asking as someone who was impacted by the OVH fire last week; I didn't have recent backups and therefore lost data.


I've been waiting to see a comment like this somewhere. Just a hugops from the internet and a reminder to all who see this to get your backups fire-proof and off-site.


Yes, we've got a good backup system thanks to the greatly-missed sctb.

Sorry to hear that, that sucks.


Running on a single server is cheaper, and nobody loses money if HN is down (as far as I know), so it makes sense.


Sometimes it pays off to be extremely simple. For HN, it definitely does.


After this event, they should switch to two servers in different DCs.


When going to two, you probably need to handle split brain somehow, otherwise you end up with a database state that's hard to merge. So you'd better get three, so that two can find consensus, or at least an external arbitration node deciding who is up. At that point you have lots of complexity ... while for HN, being down for a bit isn't much of a (business) loss. For other sites that math is probably different. (I assume they keep off-site backups and could recover from those fairly quickly.)


I haven't run a ton of complicated DR architectures, but how complicated is the controller in just hot+cold?

E.g. some periodic replication + an external down detector + a break-before-make failover that brings up the cold standby, accepting that any unreplicated state will be trashed, and rendering the hot node inactive until manual reactivation.


Well, there you have to keep two systems maintained, plus keep synchronisation/replication working. And you need to keep a system running which decides whether to fail over. This triples the work. At least.


There are plenty of sites where it's acceptable to be down for a bit sometimes.


Having two servers is a lot more than 2x as complicated and expensive as having 1 server.


A wise colleague recently explained to me that if you build things HA from the start, it's only a little more than 2x. If you try to make an _existing_ system HA, it's 3x at best. HN is not a paid service; they can be down for a few hours per year, no problem. We're not all going to walk away in disgust.


Not to mention the HNStatus[0] Twitter account has so few tweets[1] that I don't think it's even worth it.

[0]: https://twitter.com/HNStatus

[1]: The last tweet before today's incident was 2 years ago, and the one before that was 4 years ago.


You should look at Stack Overflow's hosting!


Is this described somewhere? :)


This is the most recent resource I found. I think they basically use a rack in one datacentre.

https://meta.stackexchange.com/questions/10369/which-tools-a...


This is the newest version of their architecture I've seen [0]. Compare to an overview from 2009 [1].

tl;dr Stack Overflow's architecture is fairly simple: they've mostly scaled vertically (more powerful machines) and used bare-metal servers rather than virtual servers. They also recognize that their usage patterns are read-heavy, so there's a lot of caching, and they take advantage of CDNs for static content, which offloads that traffic from their main servers entirely.

[0] https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

[1] http://highscalability.com/stack-overflow-architecture



