
Infamous Software Bugs - doppp
https://www.bbvaopenmind.com/en/the-5-most-infamous-software-bugs-in-history/
======
joezydeco
Every software engineer should be aware of the RISKS digest (aka USENET's
comp.risks).

The mailing list and digest just celebrated its 30th birthday in August.
There's an incredible wealth of history here regarding bugs, related issues,
and the overall risks of our dependency on computers in society.

RISKS broke most of the major stories of the day including the Morris Worm,
the AT&T Long Distance collapse, the Mars Climate Orbiter unit error, the Mars
Pathfinder priority lockup, the Therac-25, the first email spam, the Pentium
FDIV bug, and thousands of other interesting, amusing, and/or scary bugs.

[http://catless.ncl.ac.uk/Risks/](http://catless.ncl.ac.uk/Risks/)

~~~
anc84
I had never heard of that, thank you so much for many cozy winter reading
sessions.

~~~
joezydeco
This is one of my favorite nuggets in the digest, it's from November 1988.
It's Clifford Stoll writing at 3:45 am about his discovery of the Morris Worm.
The Morris Worm was literally the first internet virus captured in the wild.

[http://catless.ncl.ac.uk/Risks/7.69.html#subj1](http://catless.ncl.ac.uk/Risks/7.69.html#subj1)

Warning - this one email will lead you to multiple rabbit holes (Stoll, _The
Cuckoo's Egg_, Robert Morris, etc.).

[https://en.wikipedia.org/wiki/Clifford_Stoll](https://en.wikipedia.org/wiki/Clifford_Stoll)

[https://en.wikipedia.org/wiki/Morris_worm](https://en.wikipedia.org/wiki/Morris_worm)

~~~
randlet
The Cuckoo's Egg is a really fun read!

------
Piskvorrr
Well...I'd say that
[https://en.wikipedia.org/wiki/Therac-25](https://en.wikipedia.org/wiki/Therac-25)
has become much more notorious than the Ariane launch; but it's also a UX
fault, not a straight-out software error.

~~~
dkbrk
Yep, also I would say that Heartbleed is definitely deserving of a place.

~~~
basseq
I agree, Heartbleed is a big miss. Before reading the article, the two that
came to mind were Y2K and Heartbleed. Notwithstanding all the recent security
breaches (e.g., Ashley Madison, Sony) that could be attributed to "bugs".

~~~
gherkin0
If you're limiting it to the "The 5 Most Infamous Software Bugs in History,"
Heartbleed is definitely out. It just happened to be recent, so it has an
outsized place in people's minds. Including it would sort of be like those
foolish "Top 100 bands of all time" lists that have the Beatles at #1,
followed by 85 bands from the last 15 years.

As others have noted, the big miss is the Therac-25 bug, which is pretty
commonly taught in Software Engineering/Computer Science classes as _the_ example
of how intangible software can kill people.

------
Gravityloss
Knight Trading's 10 million dollars per minute bug is a good one too:

[http://dealbook.nytimes.com/2012/08/02/knight-capital-says-t...](http://dealbook.nytimes.com/2012/08/02/knight-capital-says-trading-mishap-cost-it-440-million/?_r=0)

------
kriro
I'd add the Morris Worm to the list, probably one of the most influential
ones:
[https://en.wikipedia.org/wiki/Morris_worm](https://en.wikipedia.org/wiki/Morris_worm)

~~~
laumars
Which was written by one of the Y Combinator founders (Robert Tappan Morris)

------
MasterScrat
> a 64 bits variable can have a value of −9.223.372.036.854.775.808 to
> 9.223.372.036.854.775.807 (that’s almost an infinity of options)

 _almost_ ;-)
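
For reference, the quoted bounds are exactly those of a signed 64-bit two's-complement integer. A minimal sketch (the names `INT64_MIN`, `INT64_MAX` and `fits_int64` are made up for illustration):

```python
# Bounds of a signed 64-bit two's-complement integer.
BITS = 64
INT64_MIN = -(2 ** (BITS - 1))
INT64_MAX = 2 ** (BITS - 1) - 1


def fits_int64(value: int) -> bool:
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


print(INT64_MIN)                  # -9223372036854775808
print(INT64_MAX)                  # 9223372036854775807
print(fits_int64(INT64_MAX + 1))  # False: one past the top no longer fits
```

Large, but one increment past the maximum is already out of range.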

~~~
derrickdirge
9 quintillion rounds to infinity.

~~~
koliber
Just shy :)

------
aaronbasssett
I would add Therac-25[1] and the 500 mile email[2] to that list.

[1]:
[http://www.ccnr.org/fatal_dose.html](http://www.ccnr.org/fatal_dose.html)
[2]:
[http://www.ibiblio.org/harris/500milemail.html](http://www.ibiblio.org/harris/500milemail.html)

~~~
klodolph
The 500-mile email was a configuration error. The software was working exactly
as intended.

~~~
aaronbasssett
I would say that setting any missing configuration value to zero is a bug.
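
For anyone who hasn't read the story, here is a rough sketch of the arithmetic behind it. It assumes the effective connect timeout ended up at about 3 ms (the figure the linked write-up arrives at, not something measured here), and it ignores round trips and the fact that signals in cable travel slower than light in a vacuum. The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition
METERS_PER_MILE = 1609.344            # international mile


def light_distance_miles(seconds: float) -> float:
    """Distance light covers in a vacuum during the given time, in miles."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METERS_PER_MILE


# Assumption: a timeout that fell back to "zero" behaves like roughly 3 ms.
ASSUMED_TIMEOUT_S = 0.003

print(round(light_distance_miles(ASSUMED_TIMEOUT_S)))  # a bit over 500 miles
```

So a timeout that silently defaults to (nearly) zero turns into a hard distance limit, whichever side of the bug/config debate you are on.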

------
teddyh
See also _COMPUTER-RELATED HORROR STORIES, FOLKLORE, AND ANECDOTES_ :

[http://wiretap.area.com/Gopher/Library/Techdoc/Lore/rumor.ne...](http://wiretap.area.com/Gopher/Library/Techdoc/Lore/rumor.net)

------
manaskarekar
I came across this recently, really interesting list with further links

Software Horror Stories :
[http://www.cs.tau.ac.il/~nachumd/verify/horror.html](http://www.cs.tau.ac.il/~nachumd/verify/horror.html)

~~~
rwhitman
It's a shame the list ends in 2004. I imagine the last decade's worth of
software-related horror would be quite interesting

------
ozim
I would add the Knight Capital thingy on top:
[http://money.cnn.com/2012/08/09/technology/knight-expensive-...](http://money.cnn.com/2012/08/09/technology/knight-expensive-computer-bug/)

------
thedaydreamer
Wait a second, it says they had to destroy the Mars Climate Orbiter because the
development and underlying software used different unit systems?

It's a bit hard to digest. (Although I just checked Wikipedia, and it says so
too.) How could a high-performance organisation like NASA make such a simple
yet fatal mistake?

Wikipedia page of Mars Climate Orbiter says that NASA was informed about this
discrepancy by two people, but the "concerns" were dismissed.

What am I getting wrong here ? These are not the "concerns" you simply dismiss
in a space mission. Could there be another story to this ?

~~~
icegreentea
So I went ahead and read the MCO Mishap Phase 1 report (linked here:
[http://www.icics.ubc.ca/~cics525/handouts/handout_MCO_report...](http://www.icics.ubc.ca/~cics525/handouts/handout_MCO_report_4pp.pdf))
and I'm having a hard time finding something that backs up the wiki summary of
two navigators raising concerns and having them dismissed.

The report does go ahead and state all sorts of organizational (and otherwise
'soft' issues) that contributed to the end failure.

The report says that earlier deviations between measured and modeled results
were noticed, but the navigators were hampered by limited data (in the sense
that they couldn't measure what they wanted). It is implied (though not stated)
in the report that in the absence of appropriate data, the operations navigation
team attempted to contain/mitigate the deviations instead of 'solving' them.

The report also notes substantial organizational issues. Different navigation
teams were used in development and operations, and there was insufficient
knowledge transfer during hand-off, which hampered the operations navigation
team's ability to notice these issues. Communications between the main
operations team and the ops nav team were not effective. They were apparently
spatially separate teams. In addition, model-measurement conflicts which were
brought up were solved via e-mail instead of over formal processes. The report
suggests that systemic use of formal processes may have allowed teams to
uncover the problem earlier in time.

And of course, the report also states that verification/validation of the
supplied software was insufficient. The entire
section on verification/validation (MCO Contributing Cause No. 8) is just a
giant cringe fest.

The implication is that the MCO project was just... not run well.

~~~
thedaydreamer
First - Great job finding this report. Thank you for that.

So had a look at the report.

There was one more problem, actually. This machine, the MCO, had asymmetrical
solar panels, so solar pressure (the force of sunlight) gave it a very mild
spin (angular momentum). This angular momentum had to be desaturated from time
to time to keep the machine stable. One module, called SM_FORCES, calculates
this adjustment and feeds it to AMD (Angular Momentum Desaturation). SM_FORCES
and AMD use different unit systems, which was ignored by whoever wrote the
program connecting them. Because of this error the desaturation was too small
(or too large), and the error kept building over the 9 months of the trip.
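
To make the mismatch concrete, here's a minimal sketch. It assumes the producer reports impulse in pound-force seconds while the consumer expects newton-seconds, which is how the mishap report describes it. The `Impulse` class and its unit strings are invented for illustration and have nothing to do with the actual flight or ground software.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.4482216152605  # 1 pound-force second in newton-seconds


@dataclass(frozen=True)
class Impulse:
    """An impulse value that carries its unit along with the number."""
    value: float
    unit: str  # "N*s" or "lbf*s" (hypothetical unit labels)

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


reported = Impulse(1.0, "lbf*s")
naive = reported.value                   # consumer silently assumes N*s
correct = reported.to_newton_seconds()   # consumer converts explicitly
print(correct / naive)                   # about 4.45
```

A bare float crossing the interface loses the unit; carrying it with the value at least makes a wrong assumption fail loudly instead of being off by a factor of about 4.45.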

Now, I notice that NASA has a separate team to navigate this machine to Mars.
Their data showed that this angular momentum adjustment event occurred 10-15
times more often than expected. It was like a man walking with one leg shorter
than the other. It's a 9-month journey from Earth to Mars. They must have seen
the first signs of inconsistency within the first few weeks, just guessing
though.

In this report, out of 8 possible contributing causes, at least 3 are
attributed to the navigation team. I think the success of such a mission
depends not only on meticulous planning but also on the team's ability to
think on its feet. ( Any Apollo 13 fans? :) )

------
cafard
Given that the performance of the Patriot missile was much over-hyped in the
first Gulf war, should #3 be on there? Do we know that the missile would have
intercepted the Scud if launched?

~~~
therapix
The Israeli army wrote a report about the error and a patch was uploaded into
the US systems the day after the attack. Bad timing I guess. After that day, the
Patriots never missed their target and didn't need to be rebooted.

"e. Two weeks before the incident, Army officials received Israeli data
indicating some loss in accuracy after the system had been running for 8
consecutive hours. Consequently, Army officials modified the software to
improve the system's accuracy. However, the modified software did not reach
Dhahran until February 26, 1991--the day after the Scud incident."
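
For the curious, the accuracy loss is usually explained as clock drift: the system counted time in tenths of a second and converted to seconds using a truncated binary approximation of 1/10. A rough sketch under that assumption (23 fractional bits with chopping is the commonly cited analysis, not something stated in the quote above; the function names are made up):

```python
FRACTION_BITS = 23  # assumed: 24-bit register with 23 bits after the binary point


def chopped_tenth(fraction_bits: int = FRACTION_BITS) -> float:
    """1/10 truncated (not rounded) to the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    return int(0.1 * scale) / scale


def clock_drift_seconds(hours_running: float) -> float:
    """Accumulated clock error after counting tenths of a second for this long."""
    ticks = hours_running * 3600 * 10
    return ticks * (0.1 - chopped_tenth())


for hours in (8, 100):
    print(hours, round(clock_drift_seconds(hours), 4))
```

Under these assumptions the drift is only a few hundredths of a second after 8 hours, but about a third of a second after 100 hours, which is a long way for a Scud to travel.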

------
ghostDancer
Of the recent ones the Debian OpenSSL bug with the not so "random" number
generator.

------
himlion
Small copy/paste error on the php section:

There are two ways to run a PHP app on nanobox. You can eitherconfigure the
generic ruby engine, or use a framework specific engine.

------
mahouse
Goddamn Wordpress brought to its knees in a few minutes.

~~~
anc84
The 6 Most Infamous Software Bugs in History

------
danburgo
Great article, but IMO it missed one of the most important bugs of all, the
one with the Hubble Telescope and its mirror

~~~
OopsCriticality
How does that qualify as a software bug? Perkin-Elmer's custom null corrector
was misaligned so the mirror was figured into the wrong shape. Edit: if
anything, it was an organizational failure—PE chose to ignore other
measurements that showed the mirror was the wrong shape.

[http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/1991000...](http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19910003124.pdf)

------
rogeryu
If I wasn't one myself, I would say that the human is the biggest bug of all.

------
vlunkr
Ha, I've never seen that BSOD clip before. They handled it really well.

------
raynjamin
Ping of death wasn't big enough?

------
raynjamin
No ping of death reference?

------
jordache
no mention of Windows 3.1 calculator's 3.11 - 3.1 bug?

------
AshFurrow
You yanks and your aversion to the metric system...

~~~
chapium
The scapegoat for what was clearly an organizational failure. Why was the
system using multiple units of measure? Did it pass navigation tests prior to
launch? Was the test flawed?

~~~
VLM
I think OP was making a subtle joke about titling it "5 bugs" but providing
metric 5 aka 6 in the article to cause a buffer overflow in the article
itself. Which makes the buffer overflow in the article the 7th bug. Personally I
think a recursion fail would have made a funny additional article bug, but
buffer overflows are funny in their own way too. Or a picket fence / off by
one error would have been funny like iterating from 1-5 to output the bug list
where the bugs are enumerated beginning at zero... so why didn't we see bug 0
and the crash at bug 5 would have been pretty funny.

The story is normie clickbait anyway, and most of the bugs aren't mismatches
between the source code and the (possibly non-existent) unit testing
infrastructure, they're just cultural examples of blaming the lowest social
status individual involved, that usually being a programmer. There was a
programmer involved, someone in management screwed up and doesn't feel like
taking the blame, therefore it's the programmer's fault. In the olden days
they'd just have blamed the closest (insert ethnic group here) or (insert
religious group here), nothing really to be proud of.

------
fromero
There are a lot of bugs more infamous than this...

this page is bullshit.

~~~
dang
It was arguably a weak article for HN, but instead of commenting like this,
you'd do much better to tell us about some of the more infamous bugs. Then
we'd all learn something—or at least some of us would—which is why people come
here.

