
My Hardest Bug - ttsiodras
http://www.peterlundgren.com/blog/my-hardest-bug/
======
daly
The IBM OS was loaded into memory that the hardware marked as "read-only".
However, a bad bit in that memory flipped, which caused the hardware to mark
the page as "dirty", so it got paged out with the failed bit copied into the
disk image... The OS took "core dumps", leaving a pile of paper with a binary
printout of all 16 MB of memory at my door, because core dumps printed
automatically. Paper piles every day. It was a long week.

PL/I had a language feature to copy-by-name from one record to another record.
However, it didn't work as advertised and the only way to find the failure was
from core dumps. A grant-funded project was failing.

The IBM 370 wouldn't boot for 3 days. The 3270 display controllers on channel 0
were causing a fault at power-up (found only after we flew in one of the
designers). It turned out that the hardware maintenance guys were swapping
boards among the dozens of controllers, but the controllers were at different
levels of "yellow wire patches", so the boards were not compatible. So 72
programmers (who shared the machine) did nothing for 3 days.

Back in the 1970s I hooked a Unimate robot to a PDP-11/03, and the PDP froze.
Turns out there was a "ground loop" (electrical fault) that caused the PDP
backplane to carry 440V rather than 5V. It takes a long time to decide to
check voltage levels on a backplane.

Our robot control board required a "useless read" to initialize the board. The
data was garbage so the result was ignored. All following reads were valid.
But it didn't work. The hardware guy swore we were not doing the
initialization read. I showed him the code. We finally put a scope on the
wires to settle it. The software was NOT doing the initial read, despite what
the code said. Turns out the new Bliss optimizing compiler decided it could
remove the read because the result was never used.
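
For anyone who hasn't hit this one: a rough sketch of the same trap and the
usual fix, written in C rather than Bliss (the original code isn't shown, so
this is only an illustration). The names fake_board_register, board_init and
board_read are made up, and an ordinary variable stands in for what would be a
memory-mapped device address on real hardware. The point is the volatile
qualifier, which tells the compiler the access itself matters even when the
value is thrown away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the board's data register. On real hardware this
   would be a fixed memory-mapped address, not an ordinary variable. */
static uint16_t fake_board_register = 0xBEEF;

/* Perform the throwaway initialization read. Because the pointer is to a
   volatile object, the compiler must emit the load even though the value is
   discarded. Without volatile, an optimizer is free to delete it. */
static void board_init(volatile const uint16_t *reg)
{
    assert(reg != NULL);
    (void)*reg; /* read for its side effect only; result is garbage */
}

/* Reads after initialization return valid data. */
static uint16_t board_read(volatile const uint16_t *reg)
{
    assert(reg != NULL);
    return *reg;
}
```

Calling board_init(&fake_board_register) followed by
board_read(&fake_board_register) does two loads from the register. Drop the
volatile and a compiler at a high optimization level may well do only one,
which is the kind of thing you only see on a scope.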

My paper tape failed to load from the TTY just before we were going to demo
for our major customer. Turns out the memory address control chip died.
Fortunately Radio Shack had the chip and a de-soldering gun so we replaced the
chip and it worked.

There were SO many more... Except for the last one, they cost me several days
of "being in the hot seat".

A subtle bug can ruin your whole day.

