
Therac-25 - markmassie
http://en.wikipedia.org/wiki/Therac-25
======
tinco
In the Computer Science degree I did, every course in the Software
Engineering or Formal Methods track started with "Why Software Engineering is
important", followed by a list of very bad software bugs; this one was always
one of them. The slides would then explain how the professors believed
their course would have prevented them.

Especially funny was the formal verification course that mentioned the Ariane
5. Apparently all the new software in Ariane 5 was formally modelled and
verified, but one part of the system was directly ported from the Ariane 4.
Because the Ariane 4 mission had been successful they did not verify that
(it's an expensive process). The bug that crashed the rocket was a conversion
of a 64-bit floating-point value to a 16-bit signed integer: on the Ariane 5
the value was too large to fit, and the resulting overflow led to the crash.

You can spend millions on painstaking formal verification and still pay for
the small part that you did not verify.
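
For what it's worth, the failure mode is easy to sketch. As the inquiry report describes it, a 64-bit floating-point value was converted to a 16-bit signed integer without a range check. The snippet below is not the flight code (that was Ada); it is a minimal Python illustration, with a hypothetical helper `to_int16_checked` and made-up input values, of what a range-checked narrowing conversion looks like:

```python
INT16_MIN = -(2 ** 15)      # -32768
INT16_MAX = 2 ** 15 - 1     #  32767


def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing out-of-range input.

    Hypothetical helper for illustration only.
    """
    truncated = int(value)  # truncates toward zero, like a narrowing cast
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return truncated


# Made-up readings: the first fits in 16 bits, the second does not.
print(to_int16_checked(12345.6))
try:
    to_int16_checked(40000.0)
except OverflowError as exc:
    print("rejected:", exc)
```

The point is only that the out-of-range case is detected and handled deliberately rather than left to whatever the hardware does.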

~~~
tragomaskhalos
An erstwhile colleague of mine worked on the component that destroyed the
Ariane 5; it performed exactly to spec, detecting that the vehicle was out of
control - due to the integer overflow (in another, unrelated component nb) -
and self-destructing it to prevent it crashing to earth. However, explaining
this subtlety to a rabid press was a different story, and the whole thing
became a bit of a PR disaster.

~~~
diminish
self-destroying components sound interesting. do you have any info/links to
which type of technology is used in this component?

~~~
gnu8
Nothing too fancy, just strategically placed explosive charges to break up the
vehicle to stop it from traveling further down range and disperse the
propellant to reduce the size of the explosion.

------
evincarofautumn
This is a tragedy that bears remembering. Many descriptions of it are cold and
clinical, and mention the deaths only statistically. The Therac-25 is
regularly used as a soapbox for various software engineering disciplines and
machine-checked program verification techniques. But it’s worth keeping this
in mind: humans suffered and died because of our mistakes. Whether due to
underpowered tooling or mere human error, as a field we are collectively at
fault. This is not a fucking game.

~~~
wldlyinaccurate
This is precisely why the current "hacker culture" scares the hell out of me.
The vast majority of new programmers don't understand the basics of software
design; we are entering an era where software is written by whoever will work
for the lowest wage.

~~~
kohanz
This is why we have regulatory agencies like the FDA and development standards
like ISO 13485 and IEC 62304. As someone who has worked for most of their
career in medical device development, I can tell you that anyone who is a
"hacker" rather than a software developer/engineer would not last a day in
such a regulated workplace.

~~~
rjzzleep
you and your op are so very wrong.

throwing together a website doesn't make you a hacker. allow me to rephrase.

using some tools a person built for you to throw together a website, and you
using them together in exactly the manner that they were meant to be used
makes you a user/consumer not a hacker.

quite a few of the actual hackers i saw withered in university because they
did in their high school days at home what grad students do.

i can tell you a lot of the things that you use in production today, even in
your field, are the result of a lot of these hackers you so disparage.

edit: rephrased last paragraph

~~~
kohanz
I'm not sure why you are being so defensive. I never _condemned_ (the word
you're looking for, rather than "condoned") hackers. People who like to learn
on-the-fly, tinker, make things, have an important place in this world,
including software. I would never deny that.

Also note that the roles of "hacker" and "software developer/engineer" need
not be mutually exclusive.

I was pointing out to the parent comment that someone who is not reasonably
versed in proper software design, process, and documentation cannot just hack
away at products such as Therac-25 anymore. At least, it would be much more
difficult for them to do so and bring it to market successfully, due to all
the regulatory red-tape.

~~~
rjzzleep
nope, not condemn, but maybe reproach or disparage are better words for your
attitude.

why don't you rephrase your original text? had you said "these new programming
hipsters" i wouldn't have argued with you. the truth is that everyone nowadays
can program, which is why people get so defensive about that "discipline"

but you guys completely misunderstand the meaning behind the word hacking. so
i had to correct it.

~~~
angersock
_the truth is that everyone nowadays can program_

yeah, no, try again

get outside your bubble

------
primitivesuave
Another terrible tragedy caused by a software error:

[https://en.wikipedia.org/wiki/MIM-104_Patriot#Failure_at_Dha...](https://en.wikipedia.org/wiki/MIM-104_Patriot#Failure_at_Dhahran)

Over 100 hours, the system's internal clock drifted by a third of a second,
which was enough to introduce a significant error into the Patriot missile
defense system. The software was updated the day after the error caused
28 soldiers to lose their lives.
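
The arithmetic behind that figure can be reproduced in a few lines. This is a sketch under assumptions taken from the commonly cited analysis of the failure, not from the text above: the clock counted tenths of a second, and the constant 0.1 was stored chopped to 23 fractional bits in a 24-bit fixed-point register. The variable names are made up.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: fractional bits kept in the 24-bit register

# 0.1 has no finite binary expansion, so the stored constant is chopped.
exact_tenth = Fraction(1, 10)
stored_tenth = Fraction(int(exact_tenth * 2 ** FRACTION_BITS), 2 ** FRACTION_BITS)
error_per_tick = exact_tenth - stored_tenth   # seconds lost per 0.1 s tick

ticks = 100 * 60 * 60 * 10                    # tenths of a second in 100 hours
drift_seconds = float(error_per_tick * ticks)

print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"drift after 100 hours: {drift_seconds:.4f} s")
```

Under those assumptions the drift comes out at roughly 0.34 s, which matches the "third of a second" above.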

~~~
agilebyte
The error contributed; the Scud missile caused.

~~~
spingsprong
The Patriot missile probably would have failed to destroy the Scud even
without the error.

The Patriot was designed to shoot down aircraft, but ballistic missiles fly
much faster than an aircraft, and the warhead fusing system they had back then
was too slow.

Most Patriots detonated behind the Scuds they were targeting.

------
aaronem
I don't disagree in the slightest that the software had a deadly bug, or that
that bug shouldn't have been allowed to make it into production. But I am
surprised no one in this thread has yet identified the real source of the
problem. From Wikipedia:

> The accidents occurred when the high-power electron beam was activated
> instead of the intended low power beam, and without the beam spreader plate
> rotated into place. Previous models had hardware interlocks in place to
> prevent this, _but Therac-25 had removed them, depending instead on software
> interlocks for safety._

I am just a self-taught programmer, and certainly not any kind of trained
engineer. But when there's something your machine must not be able to do, and
it's possible to design it such that it _physically cannot_ do that thing,
what possible excuse can there be for not doing so?

~~~
NickNameNick
That's definitely poor engineering, the exact kind of thing we were told not
to do when I took engineering classes.

------
camperman
Very sad story.

This page introduced me to Nancy Leveson's work a few years back (her paper on
the Therac is linked at the bottom of the Wiki page). She's written two
excellent books (and a number of papers) on engineering, safety and complex
systems which are well worth reading.

[http://sunnyday.mit.edu/](http://sunnyday.mit.edu/)

------
ygra
This was one example given in a very early lecture about software testing and
failures in our uni. This (and others) make me hope that I never have that
much responsibility over human lives as a developer.

~~~
pjmlp
It is this type of failure that drives me to push for software quality and
for moving away from languages like C.

I also think liability should be part of the industry, as it is in
other industries.

Sadly the drive for profit speaks otherwise.

~~~
mortov
The issue is not the language used. You can have bugs - potentially
fatal, as this case underlines - in any language.

When I studied the Therac case, we also studied the "Killer Robot" case -
see here:
[http://www.onlineethics.org/cms/5122.aspx](http://www.onlineethics.org/cms/5122.aspx).
Well worth a read to understand how software can become dangerous.

Sadly nowadays, a search for Killer Robot turns up all sorts of stories of
drones being used to kill people. When I first looked up killer robot a number
of years ago, the results were pretty much all about the ethics scenario and
not real life people being killed. How society has moved on...

~~~
Piskvorrr
Also relevant (and on HN newest as of now) - Toyota accelerator issue:
[http://embeddedgurus.com/state-space/2014/02/are-we-shooting...](http://embeddedgurus.com/state-space/2014/02/are-we-shooting-ourselves-in-the-foot-with-stack-overflow/)

~~~
fnordfnordfnord
This one isn't merely an example of a failure of engineering, but also of
organizational failure at Toyota, and organizational failure at both the NHTSA
and NASA.

------
jrockway
A couple lessons:

Never underestimate the value of a good safety interlock.

This is not the correct way to check for integer overflows: if x + 1 > INT_MAX
{ ... }
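
To spell out why: if `x` is already `INT_MAX`, then `x + 1` has overflowed before the comparison runs (undefined behaviour for signed ints in C, silent wraparound on most hardware), so the test can never be true. The check has to happen before the addition, e.g. `x > INT_MAX - 1`. A minimal Python sketch follows; Python ints don't overflow, so a 32-bit int is simulated here with `ctypes.c_int32`, and the function names are made up for illustration:

```python
import ctypes

INT_MAX = 2 ** 31 - 1  # assumption: 32-bit int


def wrapping_add(a: int, b: int) -> int:
    """Add the way 32-bit two's-complement hardware does, wrapping silently."""
    return ctypes.c_int32(a + b).value


def broken_would_overflow(x: int) -> bool:
    # The check from the comment: the sum has already wrapped, so this is never True.
    return wrapping_add(x, 1) > INT_MAX


def would_overflow(x: int) -> bool:
    # Rearranged so nothing can overflow: compare before adding.
    return x > INT_MAX - 1


print(wrapping_add(INT_MAX, 1))        # wraps to the most negative value
print(broken_would_overflow(INT_MAX))  # False: overflow goes undetected
print(would_overflow(INT_MAX))         # True
```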

~~~
angersock
Hardware interlocks are really important.

------
tluyben2
With outsourcing of medical software to overseas 'software factories' it still
surprises me that we don't hear more about this. It does happen and you do
read the stories if you search for them, but it's not big enough news. Does
anyone know of sources for this kind of news? I only know
[http://www.reddit.com/r/criticalsoftware/](http://www.reddit.com/r/criticalsoftware/)

~~~
josephlord
RISKs Digest

[http://catless.ncl.ac.uk/Risks](http://catless.ncl.ac.uk/Risks)

------
joezydeco
Just FYI, comp.risks (and Peter G. Neumann's moderation) live on at
[http://catless.ncl.ac.uk/Risks/](http://catless.ncl.ac.uk/Risks/)

------
ksrm
The investigation makes for fascinating reading:
[http://courses.cs.vt.edu/cs3604/lib/Therac_25/Therac_1.html](http://courses.cs.vt.edu/cs3604/lib/Therac_25/Therac_1.html)

------
bamdadd
We read this story as part of the Parallel and Distributed course in the
CS MSc at Manchester University. The whole story, with details, can be
read here:
[http://sunnyday.mit.edu/papers/therac.pdf](http://sunnyday.mit.edu/papers/therac.pdf)
Very sad story though.

------
perlgeek
Here's a podcast about the Therac-25 and software safety:
[http://disastercast.co.uk/episode-13-therac-25-and-software-...](http://disastercast.co.uk/episode-13-therac-25-and-software-safety/)

I found it quite interesting.

------
jbb555
Formal verification is all very well, but I find in practice that a lot of the
worst problems are specification problems: "of course it wasn't meant to do
_that_"

------
undoware
band name. Called it.

~~~
adultSwim
Too soon

