That being said, is the actual production infrastructure architecture for HN described somewhere? I'm curious how simple it can afford to be.
We laugh at people who pile layer upon layer of artifacts onto their sites, all in the hope of adding redundancy, handling "webscale" load, and avoiding outages (ironically increasing the chances that _something_ will break).
However, if a single hard drive crashing somewhere can take your site down for minutes or hours, some non-technical people (managers, shareholders, customers) will wonder whether the site is "professional" enough, and I can sympathize with them.
> We're currently running two machines (master and standby) at M5 Hosting. All of HN runs on a single box, nothing exotic:
> CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (3500.07-MHz K8-class CPU)
> FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
> Mirrored SSDs for data, mirrored magnetic for logs (UFS)
> We get around 4M requests a day.
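For a rough sense of scale, 4M requests a day averages out to only about 46 requests per second. The sketch below is a back-of-the-envelope calculation assuming the load is spread evenly over the day; the peak multiplier is a hypothetical parameter chosen for illustration, not a measured HN figure.

```python
# Back-of-the-envelope load estimate from the "around 4M requests a day" figure.
# Assumption: the figure is taken as exactly 4,000,000 requests per day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second if load were spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day: float, peak_factor: float) -> float:
    """Hypothetical peak rate: peak_factor is an assumed multiplier, not an HN number."""
    return average_rps(requests_per_day) * peak_factor


if __name__ == "__main__":
    daily = 4_000_000
    print(f"average: {average_rps(daily):.1f} req/s")
    print(f"with an assumed 3x peak: {peak_rps(daily, 3):.1f} req/s")
```

Even with a generous peak factor, the rate stays in the low hundreds of requests per second, which helps explain why a single box can carry the site.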
> If you had an auto-scaling Kubernetes cluster with multiple redundancies using Rust and 3 JS frameworks, outages like this wouldn't surprise your users anymore.
Irony aside, what's the point? In theory, yes, it could work better. In practice, though, HN with its two bare-metal boxes has better uptime than 99.99% of the Web, including the biggest sites, simply because complexity has its price.
There is a good chance that it is (or was!) an actual spinning hard drive. Whatever it is, it lives in one of our boxes at M5 and it's in their hands for the moment.
People often guess the origin of our name. Maybe this will give you even more of a chuckle: I was not aware of the name of this computer when I named the company. https://en.m.wikipedia.org/wiki/The_Ultimate_Computer
Our Diablo disk goes on the fritz, but who needs a disk when you can netboot? Ken demonstrates the Alto's network capabilities, connects to Google, and has the Alto calculate and display a Mandelbrot set. Ken's in-depth blog entry, including the fractal demo source code, is here:
We begin our very gentle and progressive power-up of the seminal Xerox Alto. No magic smoke, but one power supply is faulty. Opening it up reveals that it had a tough life: it suffered a catastrophic short of some sort, was hastily repaired, and has some traces almost entirely corroded through. But the source of the malfunction seems to be a fairly classic case of bad electrolytic capacitors, far too gone for any hope of reforming. After replacing them and repairing the supply, we turn our attention to the Diablo disk drive and cartridge, and have a bit of a surprise.
Many thanks to my fellow CHM restorers Ron Crane, Ken Shirriff, Carl Claunch and Luca Severini.
See the previous video introducing this historically significant machine:
No apology necessary, but I'm curious how a hard drive failure caused an outage. No RAID or mirroring? No hot spares? No clustering or distributed systems?
It was part of a mirror of identical SSDs on an LSI MegaRAID RAID card. We see occasional "spectacular" drive failures that take the machine down with a single disk failure. Usually it's just a reboot to come back up, and a disk replacement, then some hours of time to rebuild the array and get back to situation nominal.
Sorry everyone!
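For readers wondering, as the earlier question does, why a mirror should survive a single disk failure at all, here is a toy model of a two-disk mirror (RAID 1): writes go to every healthy member, any healthy member can serve reads, and a replacement disk is rebuilt by copying from a survivor. The `Disk` and `Mirror` classes are hypothetical, invented for this sketch, and it does not model how an LSI MegaRAID controller actually behaves.

```python
# Toy model of a two-disk mirror (RAID 1). Hypothetical illustration only:
# it does not reflect how an LSI MegaRAID controller actually behaves.
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    blocks: dict = field(default_factory=dict)  # block number -> data
    failed: bool = False


class Mirror:
    def __init__(self, *disks: Disk) -> None:
        self.disks = list(disks)

    def healthy(self) -> list:
        return [d for d in self.disks if not d.failed]

    @property
    def degraded(self) -> bool:
        # Degraded: still serving data, but with less redundancy than configured.
        return len(self.healthy()) < len(self.disks)

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to all healthy members so they stay identical.
        targets = self.healthy()
        if not targets:
            raise OSError("no healthy disks: array offline")
        for d in targets:
            d.blocks[block] = data

    def read(self, block: int) -> bytes:
        # Any healthy member can serve the read; use the first one found.
        for d in self.healthy():
            return d.blocks[block]
        raise OSError("no healthy disks: array offline")

    def replace(self, index: int, new_disk: Disk) -> int:
        """Swap in a blank disk at `index` and rebuild it from a healthy member.
        Returns the number of blocks copied."""
        source = next(
            (d for i, d in enumerate(self.disks) if i != index and not d.failed), None
        )
        if source is None:
            raise OSError("no healthy source disk to rebuild from")
        self.disks[index] = new_disk
        for block, data in source.blocks.items():
            new_disk.blocks[block] = data
        return len(source.blocks)


if __name__ == "__main__":
    m = Mirror(Disk("ssd0"), Disk("ssd1"))
    m.write(0, b"front page")
    m.disks[0].failed = True       # single-disk failure
    print(m.degraded, m.read(0))   # still readable from ssd1
    copied = m.replace(0, Disk("ssd0-new"))
    print(copied, m.degraded)      # rebuilt, no longer degraded
```

In this model a single failure only degrades the array; the "spectacular" failures described above, where one misbehaving disk takes the whole machine down, fall outside what it captures.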