
History's Worst Software Bugs (2005) - t-3-k
http://archive.wired.com/software/coolapps/news/2005/11/69355?currentPage=all
======
scott_s
The Morris worm is on that list; Robert Morris is a partner at Y Combinator.
That incident is the source of the pg quote that would not stop kicking around
in my mind through the latter half of grad school:

"The danger with grad school is that you don't see the scary part upfront. PhD
programs start out as college part 2, with several years of classes. So by the
time you face the horror of writing a dissertation, you're already several
years in. If you quit now, you'll be a grad-school dropout, and you probably
won't like that idea. When Robert got kicked out of grad school for writing
the Internet worm of 1988, I envied him enormously for finding a way out
without the stigma of failure."

From
[http://www.paulgraham.com/college.html](http://www.paulgraham.com/college.html)

Morris is now also a tenured MIT professor, so things ended up okay for him.

------
lordnacho
Seems to me this list needs to incorporate how easily these bugs could have
been avoided, detected, or fixed, rather than just how dire the consequences
were. It doesn't say much about what people did to test their code. For
instance, the first one in the list is something unit testing would have
caught: take the trajectory function, plug numbers in, and see if the output
is correct.
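
Something like this, say (a minimal sketch in C++; the trajectory function
and the 45-degree check are made up for illustration, not Mariner's actual
guidance code):

    #include <cassert>
    #include <cmath>

    // Hypothetical trajectory function: horizontal range of a projectile,
    // ignoring drag. Just the shape of the idea, not real guidance code.
    double range(double v0, double angle_rad, double g = 9.81) {
        return v0 * v0 * std::sin(2.0 * angle_rad) / g;
    }

    int main() {
        const double pi = 3.141592653589793;
        // At 45 degrees the formula reduces to v0^2 / g: easy to check by hand.
        assert(std::fabs(range(100.0, pi / 4) - 100.0 * 100.0 / 9.81) < 1e-6);
        return 0;
    }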

Some of these things were a lot more obvious than others.

Race conditions, for example, can be really hard to find, but as long as you
know they might happen (these days that's just about every system) you can
take precautions in testing. If it's important, maybe hire someone with
experience.
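
For anyone who hasn't chased one down, here's a deliberately buggy C++
sketch of why they slip past ordinary tests:

    #include <cstdio>
    #include <thread>

    // Deliberately unsynchronized: each ++ is a load, add, and store, the
    // two threads interleave, and updates get lost. The result is usually
    // below 200000 and varies from run to run, which is exactly what makes
    // races so hard to catch in tests.
    int counter = 0;  // the fix: std::atomic<int>, or guard with a mutex

    int main() {
        auto work = [] { for (int i = 0; i < 100000; ++i) ++counter; };
        std::thread a(work), b(work);
        a.join();
        b.join();
        std::printf("%d\n", counter);  // rarely prints 200000
        return 0;
    }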

The AT&T network crash looks pretty non-obvious to me. A network graph can
have a huge number of topologies, so you can't really test them all.
Machines might also be running different versions of the software that don't
interact nicely. It sounds like they took sensible precautions and were thus
able to roll back. That's why "rollback" is a word.

There's a whole class of bugs where things work and then need to be
upgraded. You think the upgrade will work, because there aren't many changes
and everything is qualitatively the same. Like the number overflow bug in
the Ariane 5, or the buffer overflow in the finger daemon.
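
The Ariane failure in miniature (the real code was Ada and the value here
is invented; this is just the shape of the bug):

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Ariane 4's flight profile kept this variable small, so the
        // unchecked narrowing below was considered safe; Ariane 5's faster
        // trajectory produced values a 16-bit integer can't hold.
        double horizontal_bias = 40000.0;  // invented value, too big for int16_t
        int16_t narrowed = (int16_t)(int32_t)horizontal_bias;
        // Prints "-25536" on typical two's-complement targets.
        std::printf("%.1f became %d\n", horizontal_bias, narrowed);
        return 0;
    }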

~~~
Retric
Unit tests would be highly unlikely to catch most of those.

"a formula written on paper in pencil was improperly transcribed", "neglect to
properly "seed" the program's random number generator, A HW bug that's not
close to obvious numbers to check, intentionally inserted bugs, input outside
of the intended design, etc.

~~~
zalzane
>"a formula written on paper in pencil was improperly transcribed"

Off-topic, but a unit type would have prevented that. I had no idea how
many errors I was making in my math programs before I started using F#'s
type checker to make sure all the types lined up properly.
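
A rough analogue in C++, for anyone curious (F#'s units of measure do this
natively; here it's sketched with hypothetical wrapper types):

    #include <cstdio>

    // Distinct wrapper types, so quantities in different units can't mix.
    struct Meters  { double v; };
    struct Seconds { double v; };

    Meters operator+(Meters a, Meters b) { return {a.v + b.v}; }
    // No operator+(Meters, Seconds) exists, so adding a time to a position
    // is a compile error instead of a silently wrong number.

    int main() {
        Meters d1{10.0}, d2{5.0};
        Seconds t{2.0};
        (void)t;                  // only here for the failing case below
        Meters total = d1 + d2;   // fine
        // Meters oops = d1 + t;  // does not compile
        std::printf("%.1f\n", total.v);
        return 0;
    }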

~~~
GFK_of_xmaspast
I don't know what the actual transcription error was, but how is type
checking going to catch a 5 being written down as a 6, or something like
that?

~~~
zalzane
it wont, but the vast majority of errors with math formulas are along the
lines of adding velocities with positions, raising something to the wrong
power, using a multiply instead of an add, putting a parenthesis in the wrong
spot, performing equations in the wrong order, etc.

all of those can get caught with type checking, but it isn't perfect

------
trsohmers
Another thing which should be in this list (relating to floating point
rounding error):

"On 25 February 1991, a loss of significance in a MIM-104 Patriot missile
battery prevented it intercepting an incoming Scud missile in Dhahran, Saudi
Arabia, contributing to the death of 28 soldiers from the U.S. Army's 14th
Quartermaster Detachment."

[https://en.wikipedia.org/wiki/Floating_point#Incidents](https://en.wikipedia.org/wiki/Floating_point#Incidents)
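
The mechanism is easy to reproduce. A minimal illustration (a 32-bit float
here, not the Patriot's actual 24-bit fixed-point code):

    #include <cstdio>

    int main() {
        // 0.1 has no exact binary representation, so a clock advanced by
        // adding "0.1" drifts. The accumulation effect is the same in fixed
        // point: after ~100 hours of uptime the Patriot's clock error was
        // large enough to lose the target.
        float t = 0.0f;
        for (long i = 0; i < 100L * 3600 * 10; ++i)  // 100 hours of 0.1s ticks
            t += 0.1f;
        std::printf("expected 360000.0, got %f\n", t);  // visibly off
        return 0;
    }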

------
luso_brazilian
No mention of Y2K; mankind can thank the millions of man-hours spent (and
royally paid for) stamping out the majority of the occurrences of that bug.

It could really have been a game changer if it hadn't been fixed, and I
don't really know what to expect in the wake of Y2K38, because it's just
about here, lying in wait.

~~~
BinaryIdiot
> I don't really know what to expect in the wake of Y2K38, because it's
> just about here, lying in wait.

I've been wondering the same. The Y2K bug was easy for many places to fix.
Granted, I wasn't a professional developer at that time, but I've looked at
the historical fixes at the companies I've worked for, and all of their
solutions were pretty easy (change the application code to use four digits
instead of two, run a SQL update script to migrate the existing data, done).
But the 2038 bug? That one isn't nearly as obvious to fix, in my opinion.
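
(The two-digit assumption in miniature, as a trivial sketch rather than any
particular company's code:)

    #include <cstdio>

    int main() {
        // The Y2K bug in one line: a two-digit year is ambiguous...
        int yy = 0;
        std::printf("19%02d\n", yy);  // the usual hard-coded assumption
        // ...and the fix everywhere was the same idea: store all four digits.
        int yyyy = 2000;
        std::printf("%d\n", yyyy);
        return 0;
    }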

~~~
perlgeek
The obvious fix is to use a 64-bit integer to hold timestamps.
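
Concretely, a small sketch of where the 2038 limit comes from (it assumes
the machine it runs on already has a 64-bit time_t):

    #include <cstdint>
    #include <cstdio>
    #include <ctime>

    int main() {
        // The largest count of seconds a signed 32-bit time_t can hold maps
        // to 2038-01-19 03:14:07 UTC (ctime prints it in local time).
        std::time_t last = 0x7fffffff;
        std::printf("last 32-bit second: %s", std::ctime(&last));

        // One tick later, a 32-bit counter wraps back before 1970; a 64-bit
        // counter is good for roughly another 292 billion years.
        int32_t wrapped = (int32_t)((int64_t)0x7fffffff + 1);
        std::printf("after the wrap: %d\n", wrapped);  // -2147483648
        return 0;
    }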

~~~
luso_brazilian
That's the fix, of course, but what about all the embedded software that
will last long enough to cross that barrier but will never be upgraded from
its 32-bit timestamps?

~~~
pavel_lishin
You'd have to find it all, too. How many devices with embedded software
were made by companies that have since gone out of business, or have no
surviving manuals, etc.?

------
CookWithMe
The Soviet gas pipeline explosion - if the whole CIA story is true at all -
should not be labelled a bug... The code allegedly did exactly what its
creator intended ;-)

~~~
ksk
Well, typically the users decide what is and isn't a bug. The developers can
always say "I intended it to do this". ;)

------
rer0tsaz
> Programmers respond by attempting to stamp out the gets() function in
> working code, but they refuse to remove it from the C programming language's
> standard input/output library, where it remains to this day.

gets was deprecated in C99 and removed in C11.
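
For reference, the problem and the bounded replacement in a minimal sketch:

    #include <cstdio>

    int main() {
        char buf[64];
        // gets(buf) would read until newline with no idea how big buf is;
        // input longer than 63 bytes tramples whatever sits past the buffer
        // (the hole the Morris worm drove through fingerd in 1988).
        // The bounded replacement has been there all along:
        if (std::fgets(buf, sizeof buf, stdin))
            std::printf("%s", buf);
        return 0;
    }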

~~~
rgbrenner
C11 is about 6 years newer than the article. So the article was accurate at
the time it was written.

------
TillE
The title said "software", so I assumed they were going to exclude the
infamous Pentium FPU bug. But no, there it is.

To me, the interesting thing about testing a CPU is that it's theoretically
possible to comprehensively test all inputs and outputs, but the time
required makes that totally impractical.
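
A back-of-envelope sketch of "totally impractical" (the 10^9 tests per
second is a generous made-up figure):

    #include <cmath>
    #include <cstdio>

    int main() {
        // Exhaustively testing one 64-bit by 64-bit operation means 2^128
        // input pairs. Even at an absurdly generous 10^9 tests per second:
        double pairs   = std::pow(2.0, 128);      // ~3.4e38
        double seconds = pairs / 1e9;
        double years   = seconds / (3600.0 * 24 * 365);
        std::printf("~%.1e years\n", years);      // on the order of 1e22 years
        return 0;
    }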

~~~
trsohmers
Not so much anymore... there has been a ton of work put in by the EDA
companies to get chip makers to do formal verification (for which they,
obviously, sell very expensive tools) even before you get to physical design
and testing.

For the chip my team is designing, we are formally verifying our ISA using
a new domain-specific language
([http://www.cl.cam.ac.uk/~acjf3/l3/](http://www.cl.cam.ac.uk/~acjf3/l3/)),
which really helps lock down the "gold model" that all our other tests (our
cycle-accurate C++ model, our RTL (Verilog) model, and eventually the
physical simulation) need to live up to.

As far as the tools provided by EDA companies go, there are a ton of
standard verification tools that have actually gotten a lot better and
faster since the '90s, but best of all there are things like Cadence's
Palladium
([http://www.cadence.com/products/sd/palladium_series/pages/de...](http://www.cadence.com/products/sd/palladium_series/pages/default.aspx)),
which is basically a super-FPGA-like device built specifically for verifying
the functionality of your circuits... while an FPGA is maybe 100 to 1,000x
faster than simulating RTL, Cadence claims Palladium is up to 1,000,000x
faster than RTL simulation.

Anyway: most chips made today (especially at the advanced process nodes)
require EXTENSIVE verification that takes just as long as, if not longer
than, the design and implementation (though it happens at the same time, as
part of the "flow").

~~~
nickpsecurity
Exactly. You might like, and want to keep handy, this illustration from
IBM's efforts. I think it nicely summarizes many of the tasks and issues in
HW verification at various layers. At the least, it should give readers an
impression of how overwhelming the job can be without best-in-class
tools. ;)

[http://www.testandverification.com/DVClub/03_Jul_2014/DVClub...](http://www.testandverification.com/DVClub/03_Jul_2014/DVClub_Workshop%20June14%20V5.pdf)

I think we can do the same for software, though. We just have to keep it
simple and layered, with each layer building properly on the one before it.
I did it informally in a style that copied Wirth's Lilith work, albeit
special-purpose. Verisoft did quite a bit of full-stack work for imperative
languages. SAFE (crash-safe.org) is working on it for functional ones. I
think a shortcut is to implement VLISP Scheme in hardware using hardware
verification techniques, along with a previously verified I/O system. I've
already seen LISP processors, VLISP as a rigorous implementation, and
Shapiro's security kernel, and the right hardware target could potentially
be reused for ML and Haskell code. To counter hardware issues, run several
in sync, the same way the old Tandem NonStop architecture did. The result
should be flexible, fast enough for some workloads, enforce POLA, and hit
five nines.

What do you think of combining a verified LISP with a hardware
implementation as a time saver on the way to full verification?

Note: remember that, once we have that, building and verifying other
toolchains becomes much easier, because we can work at a high level. Even
highly optimized systems such as yours could benefit from rigorously
verified systems, maybe running the same synthesis or checks overnight as a
check against the faster, possibly buggy implementations you use for
iteration. Although, I mainly see them as a root of trust for other systems
on a network.

------
OliverJones
1993 -- Intel Pentium floating point divide error.

Here's a joke from 1993. It's been a good year for Andy Grove, CEO of Intel.
They've rolled out the Pentium and it's been a big success. So he walks into a
bar and asks the bartender for a shot of 22-year old Glenmorangie Scotch to
celebrate. The bartender puts the glass in front of him and says, "that's $20,
sir."

Andy puts a twenty dollar bill on the counter, looks at it for a moment, and
says "keep the change."

------
spacehome
Seems more like a list of the software bugs with the most severe consequences.

~~~
engi_nerd
So, what other measure of "worst" would you suggest?

