
A Brief History of High Availability - melqdusy
https://www.cockroachlabs.com/blog/brief-history-high-availability/
======
dannypgh
Claiming that "high availability" started with the internet seems to ignore
the role that telcos (Bell Labs?) have played in our field and in the modern
technological world.

The engineering work to deliver five nines of availability for POTS shouldn't
be omitted in a piece about the history of HA, in my opinion.
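For scale, "five nines" means 99.999% availability. A quick back-of-the-envelope check, assuming a 365.25-day year (the comment itself gives no figures), shows that this leaves roughly 5.26 minutes of downtime per year. The helper name `downtime_minutes_per_year` is purely illustrative.

```python
# Minutes in an average year, assuming 365.25 days (an assumption, not from the thread).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime allowed at an availability of `nines` nines."""
    unavailability = 10.0 ** -nines  # five nines -> 0.00001, i.e. 99.999% up
    return MINUTES_PER_YEAR * unavailability


# Print the downtime budget for two through five nines.
for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Each extra nine cuts the yearly downtime budget by a factor of ten.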

~~~
forkerenok
I think it's ignored because the story is focused on database HA.

My impression from my limited telco experience (correct me if I'm wrong) is
that there was not much "database stuff" happening in telcos back in the day:
only dispatch tables and call duration records come to mind. With post-call
billing prevailing at the time, I assume a lot of the hard constraints on
consistency were not there.

~~~
aarghh
Banks and telcos both had database systems that went 20 years with zero
downtime, in some cases. Tandem NonStop was very real -
[https://en.wikipedia.org/wiki/NonStop_(server_computers)](https://en.wikipedia.org/wiki/NonStop_\(server_computers\))

~~~
misiogames
They still are :D

------
KaiserPro
Hmm, I think "potted history" would be more accurate...

There is no mention of hardware here. HP, Compaq, IBM and Sun all produced
hardware with HA ability, meaning that normal software could be run on two or
more nodes, and should one break, it'd fail over with no loss of data or
outages. Here is a (contrived) video where they literally blow up a server
stack:
[https://www.youtube.com/watch?v=qMCHpUtJnEI](https://www.youtube.com/watch?v=qMCHpUtJnEI)

You can do this with VMware, and if AWS could see a way to make money, they'd
do it too. (For a while AWS's platform was not capable, but that was 7+ years
ago. It's much more lucrative to make people build HA and redundancy into
their software, as that requires more services to run.)

~~~
mattzito
Hey, for once my former life designing HA infrastructure is actually useful!

The reality is that hardware HA was almost always terrible. Of the platforms
you described:

\- Sun never had hardware HA, down to the unfathomable design decision that all
of their E-class machines had only one power cord. They had SOME hotswap
hardware, but the rules were byzantine: it couldn't be the first processor
board, and it couldn't be the last processor board. Oh, and by the way, if a
processor or RAM went bad, the machine would crash, but when it came back up
it would take the bad processor offline, and if you were lucky enough that the
bad processor wasn't on the first or last board, you could then hotswap it.

\- The only HA hardware from IBM was the mainframe, and even there, it was
definitely possible for a software fault to take down the entire thing. The
P-series boxes had lots and lots of fancy-sounding HA capabilities, but they
would only certify a configuration as fault-tolerant if you bought two of them
and clustered them with HA/CMP (i.e. software HA).

\- Compaq had the NonStop servers, based on the Tandem acquisition. As the
downthread comment correctly pointed out, there was a ton of hardware
redundancy in that platform, but I think it ran a proprietary OS. They also
had their OpenVMS clustering, which offered amazing HA - but it was all
delivered in software, and your app had to be either a) stateless or b)
cluster-aware.

\- HP bought Compaq, but the HP-UX machines relied on software for their
clustering.

This all dates back to when people thought servers were special in some way,
and needed to justify their insane price points with all sorts of fancy
marketing features. Some were useful (I remember an early IBM Linux box with
RAM mirroring that actually kind of worked), but in the end, a Veritas cluster
of decoupled nodes almost always worked better, more reliably, and faster than
any hardware nonsense.

EDIT: fixed my wrong assertion about Compaq tech.

~~~
tyingq
_" Compaq had the NonStop servers, which ran OpenVMS...but it was all
delivered in software"_

NonStop/Guardian had nothing to do with OpenVMS. And the hardware was
certainly purpose built for HA, including redundant, lockstepped CPUs, disk
controllers, etc.

~~~
mattzito
You are 100% correct - in my haste, I conflated the old Tandem hardware line
with the OpenVMS clustering platform. My mistake, I’ll see if I can edit my
post.

------
tannhaeuser
HA was a thing on mainframes long before "the internet", and still is. And
apart from sharding, there were, and still are, load balancers for your
service tier and HA (redundant, replicated) shared-disk DB clusters from
Oracle and others, dating from around 1990 or earlier.

~~~
rbanffy
One word: "airlines".

------
jypepin
The opening about availability and our expectations hits close to home.

I'm originally from France, where it's normal for stores to close at
lunchtime, and all day on Sundays and Mondays. I remember when I moved to
Canada, it seemed so amazing that stores were open 7 days a week, and
sometimes until 9pm (on Thursdays and Fridays). Wow!

Then I got used to it, and now, anytime I'm back in France, I just find it
unacceptable and wonder how those stores survive and make any money while
always being closed!

~~~
QasimK
Because they are all closed at the same time?

~~~
jypepin
Sundays and Mondays, yes, almost all stores are closed. At lunchtime, not all
of them close, but a fair share do.

------
mjlee
As mentioned on HN previously, B&H (a large New York-based photography store)
does not take online orders during Jewish holidays.

[https://news.ycombinator.com/item?id=12183263](https://news.ycombinator.com/item?id=12183263)

~~~
twic
I once worked with a bank where a server we depended on was only available
from 0800 to 1500, then 1530 to 1700. I think that's the only time I've ever
had to build around a computer which took a tea break.

------
feyman_r
For anyone interested in learning more about HA and consistency, I found this
book very interesting:
[https://dataintensive.net/](https://dataintensive.net/). The author does a
great job of building intuition for the progress made, along with the
trade-offs at each stage.

------
supercon
Good read, thank you. I always welcome these articles where the notion of "how
things came to be" is introduced. It makes it easier to deal with the latest
and greatest when you understand how things have evolved.

------
wacaraerer
The article mentions sharding (a scaling technique, not exactly something that
helps with HA), but then it stops. Does every node in CockroachDB store all
data?

~~~
manigandham
No, data in the tables is sharded into ranges (contiguous chunks of the
keyspace), and each range is replicated across several nodes in the cluster
using Raft consensus.
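A toy sketch of the idea above, under loudly stated assumptions: the keyspace is cut into ranges at fixed split keys, each range is placed on three hypothetical nodes (`n1`..`n5`) round-robin, and a write counts as committed once a majority of a range's replicas acknowledge it, which is the quorum rule Raft uses. This is not CockroachDB's code or placement logic; the real system splits and rebalances ranges automatically, and the Raft log itself is not modeled here. `Range`, `build_ranges`, `lookup` and `has_quorum` are illustrative names.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Range:
    start_key: str       # inclusive lower bound; "" stands for the start of the keyspace
    replicas: list[str]  # hypothetical node IDs holding a copy of this range


def build_ranges(split_keys, nodes, replication_factor=3):
    """Cut the keyspace at split_keys and place each range on
    replication_factor distinct nodes, round-robin (a simplification)."""
    if replication_factor > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    starts = [""] + sorted(split_keys)
    return [
        Range(start, [nodes[(i + j) % len(nodes)] for j in range(replication_factor)])
        for i, start in enumerate(starts)
    ]


def lookup(ranges, key):
    """Find the range whose [start_key, next start_key) span contains key."""
    starts = [r.start_key for r in ranges]
    return ranges[bisect.bisect_right(starts, key) - 1]


def has_quorum(acks, replication_factor=3):
    """Raft-style rule: a write commits once a majority of replicas acknowledge it."""
    return acks >= replication_factor // 2 + 1


ranges = build_ranges(["g", "p"], ["n1", "n2", "n3", "n4", "n5"])
print(lookup(ranges, "cockroach").replicas)  # ['n1', 'n2', 'n3']
print(has_quorum(2), has_quorum(1))          # True False
```

With three replicas, a range keeps accepting writes after losing any one of its nodes, since the two survivors still form a majority. No single node has to store all of the data for this to work.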

------
z3t4
Latency, throughput, availability and storage capacity: the quest for the holy
grail.

------
Exuma
Hey bro, you want to hit this? What's the WORST name for a company we can come
up with??

