A Brief History of High Availability (cockroachlabs.com)
221 points by melqdusy on Oct 5, 2018 | 39 comments

Claiming that "high availability" started with the internet seems to ignore the role that Telcos (Bell Labs?) have played in our field and modern technological world.

The engineering work to deliver five nines of availability for POTS shouldn't be omitted in a piece about the history of HA, in my opinion.
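For a sense of scale, here is a rough sketch of how little downtime "N nines" leaves per year. It assumes a 365.25-day year and treats availability as simple uptime over wall-clock time; the function name is made up for illustration.

```python
# Minutes in an average year (assumption: 365.25 days).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for `nines` nines of availability (5 -> 99.999%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Five nines works out to a little over five minutes of downtime per year, which is the bar the POTS network was engineered against.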

I think it's ignored because the story is focused on databases' HA.

My impression from my limited telco experience (correct me if it's wrong) is that there was not much "database stuff" happening in telcos back in the day: only dispatch tables and call duration records come to mind. With the then-prevailing post-call billing, I assume a lot of the hard constraints on consistency were not there.

Banks and telcos both had database systems that had zero downtime in 20 years, in some cases. Tandem Non-Stop was very real - https://en.wikipedia.org/wiki/NonStop_(server_computers)

They still are :D

Custom cameras and mechanical counters at one stage I believe.

Though telcos have much higher standards when it comes to the network and exchanges: major failures were meant to never happen.

And telcos did that with much weaker hardware than is available today!

Weaker in terms of processing power, but the robustness of telco hardware up until the mid-80's is ridiculous - over-engineering was the rule and did deliver excellent availability, though at high cost.

Hmm, I think potted history is more correct...

There is no mention of hardware here. HP, Compaq, IBM and Sun all produced hardware with HA ability, meaning that normal software could be run on two or more nodes, and should one break, it'd fail over with no loss of data and no outage. Here is a (contrived) video where they literally blow up a server stack: https://www.youtube.com/watch?v=qMCHpUtJnEI

You can do this with VMware, and if AWS could see a way to make money, they'd do it too. (For a while AWS's platform was not capable, but that was 7+ years ago. It's much more lucrative to make people build HA and redundancy into software, as it requires more services to run.)

Hey, for once my former life designing HA infrastructure is actually useful!

The reality is that hardware HA was almost always terrible. Of the platforms you described:

- Sun had no hardware HA ever, down to the unfathomable design that all of their E-class machines had only one power cord. They had SOME hotswap hardware, but the rules were byzantine - it couldn't be the first processor board, and it couldn't be the last processor board, oh and by the way, if a processor or RAM went bad, the machine would crash, but when it came back up it would take the bad processor offline and if you were lucky enough the bad processor wasn't on the first or last board and then you could hotswap it.

- The only HA hardware from IBM was the mainframe, and even there, it was definitely possible for a software fault to take down the entire thing. The P-series boxes had lots and lots of fancy sounding HA capabilities, but they would only certify a configuration as fault-tolerant if you bought two of them and clustered them with HA/CMP (i.e. software HA).

- Compaq had the NonStop servers, based on the Tandem acquisition. As the downthread comment correctly pointed out, there was a ton of hardware redundancy in that platform, but I think it ran a proprietary OS. They also had their OpenVMS clustering, which offered amazing HA - but it was all delivered in software, and your app had to be either a) stateless or b) cluster-aware

- HP bought Compaq, but the HP-UX machines relied on software for their clustering.

This all dates back to when people thought servers were special in some way, and needed to justify their insane price points with all sorts of fancy marketing features. Some were useful (I remember an early IBM Linux box with RAM mirroring that actually kind of worked), but in the end, a Veritas cluster of decoupled nodes almost always worked better, more reliably, and faster than any hardware nonsense.

EDIT: fixed my wrong assertion about Compaq tech.

"Compaq had the NonStop servers, which ran OpenVMS...but it was all delivered in software"

NonStop/Guardian had nothing to do with OpenVMS. And the hardware was certainly purpose built for HA, including redundant, lockstepped CPUs, disk controllers, etc.

You are 100% correct - in my haste, I conflated the old Tandem hardware line with the OpenVMS clustering platform. My mistake, I’ll see if I can edit my post.

Your comment leaves the impression that IBM mainframes, not to mention high-end Unix systems from Sun, IBM, and HP, commonly had problems that would crash the machines and had no better availability than my desktop or perhaps a $3K server.

Perhaps it was meant as hyperbole, but it would be interesting and valuable to hear some of the nuance from someone with your experience.

IME talking to people, many, especially mainframe operators, would disagree. My direct experience with mid-range Compaq/HP servers has been excellent - it's hard to remember a box going down unless I commanded it to. I don't think I replaced any except due to age or warranty expiration.

> it's hard to remember a box going down unless I commanded it to

This could be a form of the law of large numbers at work. If each customer only has a single box, and the failure rate is 0.1%, then out of every 1,000 customers roughly 999 will never have a failure to remember, compared to the 1 who will.

What does it look like if each customer has 1000 boxes?
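A quick back-of-the-envelope sketch of that question, assuming each box fails independently with the same probability (a simplification: real failures are often correlated through shared power, firmware, or batches of parts). The function name is hypothetical.

```python
def p_at_least_one_failure(p_fail: float, boxes: int) -> float:
    """Probability that a customer with `boxes` machines sees at least one failure.

    Assumes independent failures with identical probability `p_fail` per box.
    """
    return 1.0 - (1.0 - p_fail) ** boxes


if __name__ == "__main__":
    for boxes in (1, 10, 100, 1000):
        chance = p_at_least_one_failure(0.001, boxes)
        print(f"{boxes:>5} boxes at 0.1% each: {chance:.1%} chance of seeing a failure")
```

Under these assumptions, a single box at 0.1% almost never produces a memorable failure, while a fleet of 1000 boxes sees at least one failure well over half the time.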

Remember that a "box" is a unit of commodity hardware.

Your mid sized mainframe will be at least a full rack, plus ancillaries. This can be analogous to a rack full of blades.

It can be analogous, but usually isn't, not in the vast majority of technical [1], administrative, or purchasing senses.

I'd argue that blades are already bordering on being outside of "commodity", at least in the context of reliability, since a single chassis replaces some number of actually-commodity standalone boxes with both shared PoFs and highly-customized designs (especially high density that results in thermal unreliability if not outright failure).

[1] Although some blade systems did offer things like integrated shared switches, they failed to make them cheap enough, so, in practice, it didn't happen. Personally, I suspect there was never enough advantage to putting ethernet or even InfiniBand on a backplane instead of cables, not at blade chassis scale of a dozen nodes.

I've managed far more than one box, and far more desktops/laptops than servers. The servers have been far more reliable.

That doesn't really address my point.

I'm talking about the mainframe (or other high-end server) situation.

Granted, you were specifically referring to "mid-range Compaq/HP servers", so perhaps I didn't understand what mid-range meant.

If they're just a branded/enterprise version of commodity servers, I'm missing how those are representative of hardware HA of the kind that the OC finds almost always terrible.

If they're non-commodity, then I'm curious what their price tags are and if "far more than one box" equates to at least several hundred (else there's still a decent enough chance of not getting bitten by even a 1% failure rate).

What about true HA hardware, for example like Tandem, or the systems on the Space Shuttle, etc.? A discussion of those systems from the 70s, 80s and 90s seems appropriate (but I am not knowledgeable enough to start it).

The hardware failures that hardware-based HA systems solve without software assistance (e.g. CPU failure) are extremely unlikely on any modern platform. In this regard it is somewhat remarkable that the original Pentium supports a glueless(!) high-reliability redundant CPU mode (2 processors execute the same instruction stream and you get an NMI when the results diverge). It is somewhat telling that what was probably the world's first computer with 2-out-of-3 voting HA implemented in hardware (SAPO, the first Czech computer) was motivated by the fact that the CPU implementation technology used (mechanical relays) was highly unreliable.
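To make the 2-out-of-3 idea concrete, here is a minimal software sketch of a majority voter over three redundant results. It only illustrates the voting logic; it is not how SAPO or any specific machine implemented it, and the function name is made up.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2_of_3(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value at least two of three redundant units agree on.

    Returns None when all three disagree, i.e. the fault cannot be masked.
    """
    if len(results) != 3:
        raise ValueError("expected exactly three results")
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    print(vote_2_of_3([42, 42, 41]))  # one faulty unit is outvoted
    print(vote_2_of_3([1, 2, 3]))     # no majority, fault is detected but not masked
```

A two-unit lockstep pair, by contrast, can only detect a divergence; it takes a third unit to decide which result to trust.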

Both Tandem and Space Shuttle Flight Computer relied highly on software support for HA.

And it is my experience that most approaches to bolting redundancy onto a system that was originally designed to run on a single node are actually detrimental to the resulting availability. Either because it introduces additional failure modes that have to do with the fail-over mechanism itself (too-eager failover, various Byzantine-generals failures of the control layer...) or because it depends on global consistency assumptions that are not in fact met (case in point: one high-reliability soft-realtime system with redundant application CPUs I was involved with is built on the assumption that you can reboot an application CPU at any time and it will catch up with the system state as long as at least one other CPU is still running correctly. This turns out not to be the case, as there is an initialization phase that has to happen for the CPU to learn the overall system state, and that phase interferes with overall system operation because it involves injecting test patterns into global system inputs).

Very interesting. Could you please give more detail on how Tandem and the Space Shuttle Flight Computer relied on software support?

I'm pretty sure I had an E3500, back in the late 90's, that had two power supplies. I may be misremembering...

No, you are correct. That second power supply was apparently for "peripherals only", not the main CPUs... http://www.tech-specifications.com/SunServers/E3500.html !

"your app had to be either a) stateless or b) cluster-aware" could you elaborate on what your app had to do to be cluster-aware ?

Compaq ServerPro had HA ability with a hot spare server connected to the primary with automatic failover well before the Tandem acquisition.

IBM has done a huge amount in this area e.g.


I could not find an explanation anywhere of what HP NonStop is actually doing. The fact that you say it's like VMware makes me think you mean VMware vLockstep. The documentation says: "While both the primary and secondary VMs receive the same inputs, only the primary VM produces output such as disk writes and network transmits."

The part I don't understand is how the primary can wait for the secondary to execute the same CPU instruction before producing output. This would be extremely slow even with very good network latency.

Also, it seems that in vSphere 6 VMware has rewritten how FT is handled, dropping the old “lock-step” approach in favour of a new “fast check-pointing” type method.

If the memory content on the secondary is not exactly the same as what was on the primary before the crash, then there is some kind of small data loss.

This is the kind of thing I assumed could only be solved by having the primary run a consensus protocol like Raft or Paxos for each network request it receives before producing a response to the request.

HA was a thing long before "the internet" on mainframes, and still is. And apart from sharding, there were, and still are, load balancers for your service tier and HA (redundant, replicated) shared-disk DB clusters from Oracle and others around 1990 or earlier.

One word: "airlines".

The opening about availability and our expectations hits right at home.

I'm originally from France, where stores closing during lunch time, and all day on Sundays and Mondays, is normal. I remember when I moved to Canada, it seemed so amazing that stores were open 7/7, and sometimes until 9pm (on Thursdays and Fridays). Wow!

Then I got used to it, and now, anytime I'm back in France, I just find it unacceptable and wonder how those stores survive and make any money while always being closed!

Because they are all closed at the same time?

Sundays / Mondays, yes, almost all stores are closed. Lunch time, not all of them close, but it is a fair part.

As mentioned on HN previously, B&H (a large New York based photography store) does not take online orders on Jewish holidays.


I once worked with a bank where a server we depended on was only available from 0800 to 1500, then 1530 to 1700. I think that's the only time I've ever had to build around a computer which took a tea break.

They also will not take online orders during the Sabbath (i.e., Friday evening until Saturday evening). It's peculiar that they enforce it for online checkout. It's a quality website nonetheless.

For anyone interested in learning more about HA and consistency, I found this book very interesting: https://dataintensive.net/. The author does a great job developing an intuition for progress made along with trade-offs at each stage.

Good read, thank you. I always welcome these articles where the notion of "how things came to be" is introduced. Makes it easier to manage the latest and greatest, when you have an understanding of how things have evolved.

The article mentions sharding (a scaling technique, not exactly something that helps with HA), but then it stops. Does every node in CockroachDB store all data?

No, data in the tables is sharded into blocks and each block is replicated across the cluster using Raft consensus.
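As a rough illustration of "shard into blocks, replicate each block", here is a toy sketch of range partitioning with replica placement. The split keys, node names, replication factor, and round-robin placement are all assumptions made up for the example; this is not CockroachDB's actual placement logic, and the Raft protocol itself is not implemented here, only the majority size it would need.

```python
from bisect import bisect_right
from typing import List

# Hypothetical values for illustration only.
SPLIT_KEYS = ["g", "n", "t"]            # three split points -> four key ranges
NODES = ["n1", "n2", "n3", "n4", "n5"]  # cluster members
REPLICATION_FACTOR = 3


def range_for_key(key: str) -> int:
    """Index of the key range (block) that owns `key`."""
    return bisect_right(SPLIT_KEYS, key)


def replicas_for_range(range_id: int) -> List[str]:
    """Nodes holding a copy of the range, placed round-robin (toy policy)."""
    return [NODES[(range_id + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def quorum(replication_factor: int) -> int:
    """Majority size a consensus group needs to commit a write."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    for key in ("apple", "kiwi", "zebra"):
        rid = range_for_key(key)
        print(key, "-> range", rid, "on", replicas_for_range(rid))
    print("writes need", quorum(REPLICATION_FACTOR), "of", REPLICATION_FACTOR, "replicas")
```

The point of the sketch is that no node holds everything: each range lives on a few nodes, and a range stays writable as long as a majority of its replicas are up.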

latency, throughput, availability and storage capacity, the quest for the holy grail.

Hey bro you want to hit this? Whats the WORST name for a company we can come up with??
