
It was. In the entire history of electromechanical switching in the Bell System, no central office was down for more than 30 minutes for any reason other than a natural disaster or major fire. There are books about Number Five Crossbar and how it worked, and they're worth reading if you design high-reliability systems.

The only central office downtime from a major fire was in 1975 in New York City, resulting in 23 days of downtime.[1] All the incoming cables and the main distributing frame had to be replaced. The switching equipment, on upper floors, survived and just needed some maintenance and cleaning. That was still a crossbar office; it hadn't been converted to electronic switching yet. Worst disaster in the history of the Bell System. Bell switched to less flammable cable coverings, like plenum cable, after that.

Widespread failure of 911 service suggests an overcentralized architecture. 911 requires a phone number to address lookup, so there's a database involved. Widespread failure indicates this was implemented as a remote query service ("in the cloud") rather than read-only database copies of the directory at each central office.
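To make the distinction concrete, here is a toy sketch of the two designs. It is an illustration of the argument only, not a description of how any real 911 system is built; the class names, phone numbers, and addresses are all made up.

```python
# Toy comparison: offices that query a central directory service versus
# offices that hold their own read-only copy. All names/data are hypothetical.

DIRECTORY = {"555-0100": "12 Elm St", "555-0199": "9 Oak Ave"}


class CentralService:
    """A single remote number-to-address lookup service."""

    def __init__(self, directory):
        self.directory = directory
        self.up = True

    def lookup(self, number):
        if not self.up:
            raise ConnectionError("central directory unreachable")
        return self.directory.get(number)


class Office:
    """A central office; uses its local replica if it has one."""

    def __init__(self, name, central, replica=None):
        self.name = name
        self.central = central
        self.replica = replica  # read-only copy, or None

    def locate(self, number):
        if self.replica is not None:
            return self.replica.get(number)
        return self.central.lookup(number)


def offices_still_working(offices, number):
    """Return the names of offices that can still resolve `number`."""
    working = []
    for office in offices:
        try:
            if office.locate(number) is not None:
                working.append(office.name)
        except ConnectionError:
            pass  # this office depends on the central service
    return working


central = CentralService(DIRECTORY)
offices = [
    Office("A", central),                           # remote query only
    Office("B", central),                           # remote query only
    Office("C", central, replica=dict(DIRECTORY)),  # local read-only copy
]
central.up = False  # simulate the central service failing
survivors = offices_still_working(offices, "555-0100")
```

With the central service down, only the office holding a replica can still answer, which is the failure pattern a remote-query design would produce. The trade-off is that replicas have to be kept in sync.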

[1] https://youtu.be/f_AWAmGi-g8?t=110

There's an old AT&T video showing the careful coordination and huge amount of manpower needed to upgrade a single telco switch. They had rows of workers with cable cutters, in order to literally cut the entire service over to the new equipment. Total downtime was just 47 seconds.

(And even during the cutover, they're careful not to interrupt any emergency calls.)


I love the "DON'T TOUCH THIS" Post-it at 3:55

That was absolutely amazing. Thanks for posting.

So, what exactly happened there?

Where were the signals going after the cables had been cut?

The new switch was connected to each line in parallel with the old switch, but it was able to ignore incoming signals until told to take over. The old switch was hard wired enough that it could not be remotely commanded to have no effect on incoming signals. The line relay attached to each line had to be physically cut out of circuit.

That is spectacular - thanks for the link!

Remember when that guy cut the fiber along 280 in about 2009 or something, resulting in major outages?

And then again in 2015.

Also, there was that kid who put together a report of all infra cable runs for a thesis and the FBI confiscated it... https://www.google.com/amp/s/www.nbcbayarea.com/news/local/F...

I'd love to read more about the crossbar and switching systems, can you recommend a specific book?

"A History of Engineering & Science in the Bell System: Communications Sciences, 1925-1980".

Here's probably the simplest introduction.[1] When reading this, note phrases like "ORIGINATING REGISTER SEIZES AN IDLE MARKER". Think of that as "originating register asks for an idle marker from the pool of markers". Very little equipment is dedicated to specific lines. Everything is done by requesting a service from one of several identical units. If one of those units fails, the system capacity is reduced, but the switch does not go down as long as at least one of each unit type is still up.

The switch fabric, the actual crossbars, is dumb. It just makes the connections it's told to make.

Shared resources include:

- Originating registers. These provide dial tone and record dialed digits. They parse the incoming number to the limited extent needed to decide when it's finished.

- Markers. The smart part of the system. When an originating register has a full set of digits, it finds an idle marker and sends it the call info. The marker figures out what to do next, in about half a second, and then it's free for another call. Markers tell the switch fabric what connections to make. They're duplicated, and the two halves check each other. If the halves disagree, the marking aborts. If a marker aborts, the originating register tries again with another marker. One retry only. Marker failures also cause data to be sent to a "trouble recorder". As usual, there's more than one of those, and they're "seized" as needed.

- Senders. These send digits from one exchange to the next. They're primitive modems.

- Trunks. Lines between exchanges. Full duplex, four wires.

- Terminating senders. The receive side of senders.

There are also units associated with accounting, coin telephones, routing tables, and other auxiliary functions.

The key takeaway here is that there's no single point of failure.
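For anyone who thinks better in code, the "seize an idle unit from a pool" pattern with self-checking markers and a single retry might look roughly like this. It's a loose sketch, not a model of the real relay logic: the class names, the `faulty` flag, and the fake "trunk-NNN" routing result are all invented for illustration.

```python
# Loose sketch of pooled, self-checking markers with one retry.
# Everything here (names, routing rule, fault model) is hypothetical.

class Marker:
    """A marker whose two halves each compute the route and must agree."""

    def __init__(self, name, faulty=False):
        self.name = name
        self.faulty = faulty  # simulate one half producing a bad answer
        self.busy = False

    def route(self, digits):
        half_a = "trunk-" + digits[:3]
        half_b = "trunk-???" if self.faulty else half_a
        if half_a != half_b:
            return None  # halves disagree: abort the marking
        return half_a


class MarkerPool:
    """Identical markers; any idle one can be seized for a call."""

    def __init__(self, markers):
        self.markers = markers
        self.trouble_log = []  # stand-in for the trouble recorder

    def seize(self, exclude=()):
        for marker in self.markers:
            if not marker.busy and marker not in exclude:
                marker.busy = True
                return marker
        return None  # no idle marker available

    def release(self, marker):
        marker.busy = False


def complete_call(pool, digits):
    """Try one marker, then retry once with a different marker."""
    tried = []
    for _ in range(2):  # first attempt plus one retry only
        marker = pool.seize(exclude=tried)
        if marker is None:
            return None
        try:
            result = marker.route(digits)
        finally:
            pool.release(marker)  # marker is free for the next call
        if result is not None:
            return result
        pool.trouble_log.append((marker.name, digits))
        tried.append(marker)
    return None


pool = MarkerPool([Marker("M1", faulty=True), Marker("M2")])
route = complete_call(pool, "5551234")
```

Here the faulty marker aborts, the failure is logged, and the call completes on the second marker. Capacity drops when a unit is bad, but the call still goes through as long as one good unit of that type remains.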

[1] http://wedophones.com/TheBellSystem/pdf/no5crossbar.pdf

This involves cell phone access... which is a little more complicated than a billing address lookup.

> There are books about Number Five Crossbar and how it worked

Can you recommend such a book?

/edit: I see you answered this for another poster

20m10s is definitely my favorite part for humor purposes.

Thank you for sharing, that was an enjoyable watch!
