
The programmer behind the THERAC-25 Fiasco was never found - whytai
https://twitter.com/zgryphon/status/1201292269787238400
======
WalterBright
Exacting revenge on people who make mistakes is the wrong way to improve
safety. Even the best people make mistakes. Fixing the process is how safety
is improved.

All revenge does is halt progress on new technology, and cause all involved to
cover up problems rather than expose and fix them.

~~~
rob74
A chilling example of this from another field:
[https://en.wikipedia.org/wiki/2002_%C3%9Cberlingen_mid-
air_c...](https://en.wikipedia.org/wiki/2002_%C3%9Cberlingen_mid-
air_collision#Murder_of_Peter_Nielsen) \- here also, the air traffic
controller who was responsible for the two planes that collided was singled
out (and later murdered by the aggrieved father of a victim), but arguably the
lax procedures used at Skyguide were also a major factor in the accident.

~~~
loco5niner
Wow, and the murderer was released in 2 years due to his "mental condition",
and 9 years later received a (unrelated?) award that represents, among other
things "for educating the younger generation and maintaining law and order".
Talk about a miscarriage of justice!!!

> He was released in November 2007, because his mental condition was not
> sufficiently considered in the initial sentence. In January 2008, he was
> appointed deputy construction minister of North Ossetia.[32] In 2016,
> Kaloyev was awarded the highest state medal by the government, the medal "To
> the Glory of Ossetia".[22] The medal is awarded for the highest
> achievements, improving the living conditions of the inhabitants of the
> region, for educating the younger generation and maintaining law and
> order.[33]

------
BuildTheRobots
One of the things the author fails to mention in the tweet summary, is that
the same software was in use on a previous version of the machine and had
absolutely no problems.

I believe (from memory) the previous version had hardware interlocks that
masked the issue and the T-25 did not have the hardware interlocks installed.
This led to a situation where the software was viewed as heavily tested and
therefore trusted, even though it shouldn't have been.

I've always seen this as an example of why physical/hardware interlocks are
really important when you're mixing software with hardware that can easily
hurt people.

I'm also always amazed by how few people seem to know about the Therac-25
incident, especially people that work in therapeutic radiation roles (in the
UK anyway).

~~~
wolrah
> I've always seen this as an example of why physical/hardware interlocks are
> really important when you're mixing software with hardware that can easily
> hurt people.

Not only that, but running in to the interlock should be considered a notable
event. The machine should not just continue operating as normal, it should be
clear to the operators that something potentially dangerous has occurred and
should be investigated.

It seems like they had just assumed that because no one had managed to zap
someone with the previous models that the software must have been perfect,
even though the previous models had hardware interlocks preventing the
dangerous scenario. Those interlocks had presumably been tripped many times,
just no one ever brought it to the attention of the vendor.

If a system trips a safety interlock it should fail to a safe configuration
and remain there until reset by someone capable of investigating why it was
tripped in the first place.

Modern traffic lights are a good example of doing it right. In those cabinets
you see at every intersection, right next to the traffic light controller will
be a device called a conflict monitor. This device will be wired to the
circuits feeding the light heads themselves. If two conflicting movements are
indicated for whatever reason, be it a failure of the controller, a short in
the wiring, etc, the conflict monitor will trip and set the intersection to a
fail-safe mode (usually either all-red blink or yellow blink for a main road
with reds elsewhere) until manually reset by a human.
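
As a rough illustration of that latching behaviour, here is a minimal Python sketch. It is not how any real conflict monitor is built (those are independent hardware devices); the class name, the movement names and the `CONFLICTS` table are all hypothetical, made up for this example. The point is only the shape of the logic: watch the outputs actually being driven, trip to a fail-safe mode on a conflict, record the event, and stay tripped until a human resets it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    FLASH = "flash"  # fail-safe mode, e.g. all-red blink


# Hypothetical table of movements that must never be green at the same time.
CONFLICTS = {frozenset({"north_south", "east_west"})}


@dataclass
class ConflictMonitor:
    """Illustrative monitor: trips on a conflict and latches until reset."""

    mode: Mode = Mode.NORMAL
    trip_log: list = field(default_factory=list)

    def check(self, green_movements):
        """Inspect the movements currently shown green; trip on any conflict."""
        if self.mode is Mode.FLASH:
            # Latched: nothing the controller does can clear the trip.
            return self.mode
        active = set(green_movements)
        for pair in CONFLICTS:
            if pair <= active:
                self.mode = Mode.FLASH
                # A trip is a notable event, so record it for investigation.
                self.trip_log.append("conflict: " + ", ".join(sorted(pair)))
                break
        return self.mode

    def manual_reset(self, technician):
        """Only an explicit human action returns the monitor to normal."""
        self.trip_log.append(f"reset by {technician}")
        self.mode = Mode.NORMAL


if __name__ == "__main__":
    monitor = ConflictMonitor()
    print(monitor.check(["north_south"]))               # no conflict
    print(monitor.check(["north_south", "east_west"]))  # conflict, trips
    print(monitor.check(["north_south"]))               # still latched
    monitor.manual_reset("technician on site")
    print(monitor.trip_log)
```

The design choice worth noticing is that `check` never clears the trip by itself, even when the conflicting outputs go away; only `manual_reset` does, and both the trip and the reset leave a record.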

\---

> I'm also always amazed by how few people seem to know about the Therac-25
> incident, especially people that work in therapeutic radiation roles (in the
> UK anyway).

That's interesting, at least amongst my "techie" friends it's common
knowledge. Many of us who went to college for computer science related things
had it used in one class or another to get across the point that bad software
can kill people in really unexpected ways.

I guess maybe the medical side of things doesn't find it worth as much
attention because they don't have as much to learn from it.

~~~
nwallin
We have physical interlocks all the time and they make intuitive sense to us.

If you park your car, and crank the steering wheel all the way to one side, it
won't start unless you put it back. If you have an automatic transmission
vehicle, if you turn the key without pushing the brake, it won't start. If you
have a manual, it won't start unless you push the clutch in. (This didn't use
to be true. Citing Ferris Bueller.)

Lots of stuff goes wrong when the hardware people depend on the software
people for correctness and the software people depend on the hardware people
for correctness. Insert <Group A> and <Group B> for hardware and software.
Could easily be writers and editors in journalism.

------
i_feel_great
"...The man (the authors seem to know it was a man, at least) who wrote the
software for this machine did so alone, without documenting what he was doing.
The company then sort of vaguely tested it."

Lone wolf programmer, no docs, vague testing. What kind of manager(s) would
let this out the door knowing this? Never mind the lone wolf programmer. Find
the managers and beat them with sticks.

~~~
guiriduro
If I remember, the earlier systems had a hardware lock that prevented an
overdosing for whatever reason, including software fault. Sure, the software
was still faulty, but don't forget if software was developed for a particular
hardware such that a certain concern could be discounted (even if it ought to
have been tested for) - is it really the developer's fault when the
manufacturer (presumably for economic reasons) decides to remove safety
features and increase the risk surface?

~~~
brazzy
According to Wikipedia, the new model was actually deemed _safer_ in an audit
on the grounds that unlike a mechanical lock, the software could wear out or
get damaged. So it seems more likely that people generally had a very mistaken
view of software reliability.

~~~
jrootabega
I assume you mean software could NOT wear out or get damaged

~~~
brazzy
correct

------
paggle
Is their name really important? The deaths weren’t from a bad programmer, they
were from a lack of software quality methodology. The entire field of software
quality assumes that all programmers will write bugs.

~~~
wpietri
People are responsible for their actions. There were many things wrong with
this project, but one of them was fatally bad code, which this guy took money
to write. Having him be named strikes me as the very minimum in
accountability.

I think it's also important to name him, to interview him. To understand how
he came to kill. So that anybody writing life-critical code today can say,
"I'd better not end up infamous like that guy. I can't make the same
mistakes."

Maybe if he had been held accountable, the programmers at Uber wouldn't have
been lax enough to code up the negligent homicide of Elaine Herzberg.

~~~
adrianN
A culture of blame leads to programmers of safety critical software hiding
their mistakes.

~~~
mcguire
A culture of refusing responsibility is better how?

~~~
adrianN
The opposite of a blame culture is not a culture of refusing responsibility,
but a system where you look for problems in the process when a fault makes it
into the final product. It's always a mixture of specification,
implementation, and QA that makes mistakes possible. Putting the blame on the
engineer is not very helpful, it does little to prevent future problems.
Trying to figure out how it was possible that an implementation bug made it
through QA allows you to prevent similar mistakes in the future (at least if
you're lucky).

~~~
wpietri
Just to be clear, I'm not putting the blame on the engineer. Many people are
responsible for this failure. What I'm saying is that the engineer is one of
the responsible people, and that as fellow engineers we should be especially
concerned with making sure that responsibility is correctly performed and has
appropriate accountability.

------
jwilliams
There is the assumption that this is to protect the programmer -- however I
feel it's much more likely it's there to protect others.

As others have pointed out, this was a systemic failure. Pointing out the
individual highlights those upstream.

~~~
goatinaboat
_Pointing out the individual highlights those upstream._

Exactly. Look at Experian. Attempted to blame an individual engineer for not
applying a patch, then the next thing they know, the entire world is asking,
why is your CISO a music major?

~~~
hn_throwaway_99
> why is your CISO a music major?

The implication of that question is nonsensical. Experian had/has many
problems, but the fact that their chief security officer got a degree in music
decades earlier wasn't one of them.

Some of the very best programmers I know were music majors - it's actually not
uncommon at all. I mean, Steve Jobs only went to college for 1 semester, and
afterwards he audited creative classes like calligraphy. Is the implication
that he wasn't qualified to be CEO of Apple because he didn't have a degree in
management?

~~~
goatinaboat
_The implication of that question is nonsensical. Experian had/has many
problems, but the fact that their chief security officer got a degree in music
decades earlier wasn't one of them_

You’ve managed to completely miss my point about attempting to blame an
individual engineer backfiring and hitting senior management. No one would
have cared what her degree was in otherwise. But it became a stick to beat the
organisation with.

------
jng
The developer in question has probably lived a pretty bad 40 years with
nightmares about the issue. They might be 60 now, or 80, or already passed
away. Only bright spot was probably the fact that they were not known and thus
it didn’t have a wide social effect on their life. I have a hunch that last
part is about to end in a few months.

~~~
nothal
It was one "John Smith". I almost feel as though it must have been many
programmers and they pinned it on the guy who left before it blew up. No one
wants that black mark on their record and it would be insane to think most
people didn't remember anything about him.

------
skissane
From reading Nancy Leveson's paper, I get the impression she probably does
know the programmer's name, but little or nothing beyond that, and has chosen
not to publicise his name. (She probably thinks that is the ethical course of
action.) The lawsuit depositions to which she refers, quite likely include his
name, but not much else. You would think you would remember the name and
gender of a former colleague, but that doesn't mean you could answer any
questions about his past work experience or qualifications – most of us know
very little of our current colleagues' past work experience or qualifications
– even when they tell us, that kind of information often isn't very memorable.

------
mikorym
Since the real solution to the problem ended up being a hardware solution (a
rate limiter) I have trouble believing the culprit is the "guy who wrote the
code".

Boeing is having these same questions asked about them and I wouldn't be
surprised if their "solutions" would essentially be hardware solutions.

~~~
jki275
The solution was to fix the software, not add hardware. A piece of hardware
was added to the system to overcome software errors, but that was not the
solution.

The software was badly broken, and had been for years. The difference between
the 25 and previous models is that previous models had interlocks that did not
allow the software errors to manifest.

------
codeulike
re: the actual bug - it took a long time for people to reproduce it and figure
out what it was.

This article gives an overview of the bug:

[https://www.bugsnag.com/blog/bug-day-race-condition-
therac-2...](https://www.bugsnag.com/blog/bug-day-race-condition-therac-25)

A much longer paper about the system here (linked from the OP twitter thread)

[https://web.stanford.edu/class/cs240/old/sp2014/readings/the...](https://web.stanford.edu/class/cs240/old/sp2014/readings/therac-25.pdf)

------
makerofspoons
It is unbelievable to me that nobody knows who this was. It makes me wonder if
somewhere out there is some software engineer, nearing the end of their
career, who carries this in their conscience. Do they know about it? It seems
it is such a famous case and the medical device industry so small that surely
they must.

------
aaron695
> The programmer behind the THERAC-25 Fiasco was never found

It was known.

It's a shame we live in such a vengeance culture that what we really want is
for it to be public.

People die weekly in most hospitals because doctors and nurses don't wash their
hands.

Get over THERAC, it saved lives and the programmer helped with that.

From Reddit AMA -

"My teacher does know the name, but is bounded by the courts to not release
it. He knows the programmer is living in guilt and did say that he has left
programming as his career. Although, it was not entirely his fault, as my
teacher explained, the necessary software development process for a machine
like this was not there, and no checks were in place.

tl;dr Cannot be revealed, but wasn't entirely his fault."

~~~
hhas01
“wasn't entirely his fault”

Nobody says it was; such disasters are often multifactorial. But given his
position that person holds key knowledge and insights into what went wrong
that no-one else has. Without access to that information, investigators can
only hypothesize.

This is why things like whistleblower laws and indemnity insurance exist, to
enable the full and unvarnished truth to come out. How are errors meant to be
fixed correctly when you don’t have all the information as to the cause?

Compare how air crash investigations work. Or research into procedural
improvements to hospital hygiene. There are things far more important than just
finding people to blame.

------
stebann
How many years does a prosecutor have to investigate these cases? I mean in
both Canada and the USA.

In my country, if something like this is considered violating human rights,
then the general prosecutor could re-open the case, it doesn't matter how many
years after (this was intended for investigations on the disappearances during
last dictatorship, but I think our constitution abides my reasoning if
something like this happens).

------
axilmar
I think the search for the programmer was not about blaming him but to find
information about his background and question him about the actual software
development process.

In any case, the only person who has all the details about the development
towards this incident is the guy that wrote the program. He is the only one
that can shed any light into this. I think it's worth finding him even
nowadays.

~~~
tyingq
Funny that git chose "git blame" as the name for that piece of functionality.

~~~
michaelcampbell
Isn't that just an alias for `git praise`? In any case they're on equal
footing.

~~~
tyingq
It's the other way around. Blame was the original, and did not initially have
a polite counterpart.

------
8bitsrule
No news here. This 21-year-old college report [0] from 1998 (Porrello, cites
include 1993 Leveson, Nancy G., and Clark S. Turner.) has -a lot- more facts
without the hysteria. It states:

"_One_ programmer, over several years, revised the Therac-6 software into
the Therac-25 software (AECL has not released any information about the
programmer or his credentials)."

[0]
[https://web.archive.org/web/19980201101244/http://cobra.csc....](https://web.archive.org/web/19980201101244/http://cobra.csc.calpoly.edu/~dbutler/papers/THERAC25.html)
\- "Death and Denial: The Failure of the THERAC-25, A Medical Linear
Accelerator"

Previous Therac models had hardware interlocks to prevent some modes; they
were removed in favor of software for the 25. No doubt there were some
engineers who knew more about this.

~~~
C1sc0cat
This is amazing - I would have thought that any medical equipment would go
through exhaustive testing and an order of magnitude more for systems
involving radiation and a machine capable of lethal doses.

At my first job we even had a separate safety officer for our low powered
sources used for tracing waterflow.

~~~
eigenvalue
Part of the problem is likely that the AECL is a quasi-public entity, and
governments tend not to be nearly as vigilant when regulating or policing
other branches of the same government as they are when regulating a 100%
privately owned company.

~~~
C1sc0cat
Yes I saw that, relying on crown immunity

------
Cheyana
Perhaps it was a relative or friend of someone way high up in management.
Otherwise he or she would have been thrown to the wolves.

~~~
dasil003
Could be. Alternatively, he might have had a long paper trail of dissent on
the topic of removing the hardware interlocks and they just wanted to keep him
off the stand.

------
bregma
Such a fatal failure is never the result of one person making a mistake.

It's always the result of many mistakes piled one atop the other, and you'll
always find a bean-counter on the top adjusting an Excel spreadsheet somewhere
to make the numbers come out in a way that pleases some executives.

Did they ever find the name of the accountant behind the fiasco?

------
pacaro
My first job was working for a software company that had killed people in the
past. This was part of the London Ambulance scandal in the early 90s. The
official inquiry had mostly exonerated the company.

It has a crazy impact on corporate culture, it was rarely talked about except
in hushed tones over beer, the management was extremely averse to any
publicity, no press contact for any reason (compare to my next employer
putting out empty press releases at least weekly if not more often), sales
staff had an extensive playbook for not answering questions about it

I can understand an individual developer wanting to disappear in this
scenario. If they had internalized blame for this, I can certainly imagine
them choosing never to work in the industry again (or making more extreme
choices)

------
jlawer
I think it's amazing to think it wasn't so long ago you could actually
disappear by simply keeping out of the spotlight. Now in most western
countries there are few people who could pull this off, and they would need to
actively mask their identity.

------
jstewartmobile
Tweeter is obviously not an engineer. In engineer land (licensed, that is),
entrusting the design of components with life-and-death consequences to a
single person--without review--is malpractice.

We get away with that kind of thing in software shops because a) we're
relatively new, b) rarely deal with life-and-death designs, and c) haven't
racked-up a large enough body count ( _empirically speaking, rather than
morally_ ) to warrant regulation.

Give Uber & Tesla a few more years of running people over, and engineer-style
licensing for certain types of software development will probably be in the
mail.

~~~
ghaff
>engineer-style licensing for certain types of software development

It was. There was a PE for software engineering until relatively recently in
the US but no one took the exam because it basically wasn't required for
anything.

Be careful what you wish for though. The requirements typically include a
formal degree and some number of years working under a PE.

There's nothing magical about such a certification though. Other than the
education and experience requirements, it's pretty much a GRE-type exam. I
took the engineer-in-training exam way back when in a different engineering
field but I stopped practicing before I sat for a PE.

~~~
ThrowawayR2
> _no one took the exam because it basically wasn't required for anything_

No, no one took the exam because it was effectively impossible.

To become a PE, first the candidate has to pass one of the Fundamentals of
Engineering exams to become an engineer-in-training. Except, whoops, there
wasn't ever a software specific FE exam; the most relevant one is the EE/Comp.
E. exam. Take a look at the list of topics: [https://ncees.org/wp-
content/uploads/FE-Ele-CBT-specs.pdf](https://ncees.org/wp-content/uploads/FE-
Ele-CBT-specs.pdf) Most developers aren't going to pass that even with a CS
degree.

Secondly, you need 4-8 years of supervision by a licensed engineer. Again,
whoops, there are barely any software developers with a PE license, so who
would they get to supervise them?

Only then do you get to take the PE exam for software engineering. Frankly,
the situation was so absurd that one has to suspect that NSPE didn't want to
certify software developers as PEs.

------
Animats
Is the code available, after all these years?

~~~
saagarjha
I don’t think it was ever publicly released, unfortunately.

------
tsukurimashou
Deaths are clearly on the company, not on the programmer guy

------
SteveSmith16384
Does this programmer actually exist? Sounds like they have "invented" a person
to blame, who, strangely, people can barely remember.

------
microtherion
I read that 1993 THERAC-25 article when it first came out. One factoid which
stuck with me was that after the accidents, the manufacturer, AECL, retreated
to their core business: "AECL's primary business is the design and
installation of nuclear reactors."

------
murphy214
How possible is it that the only programmer was a minor and that's why we
never got any testimony or explanation? I've heard stories of companies or
governments hiring extremely young programmers to do some pretty serious work.
(John Romero comes to mind)

Edit: It would also explain the lack of provable credentials.

------
joncrane
I studied this in a computer class in college. I can't remember if it was a
computer testing class or a computer ethics class. But it was the first thing
we covered and the point was, as you embark on a career writing software, BE
CAREFUL!

------
chasd00
turning the Twitter hounds of hell loose on the guy/woman seems like a good
way to fail the responsible conduct part of their "Responsible Conduct of
Research" class.

------
codingdave
If the lawsuit ended in settlement before they were found, there is nothing
unusual here. There is no reason to look after that point. They settled.

------
im_with_stupid
Why would someone assume that the programmer was to blame? The programmer just
does what the managers tell him to do. Why should safety or security be about
blaming the low man on the totem pole? It's totally ridiculous when you read
accounts like that of the NASA engineer. [https://www.npr.org/sections/thetwo-
way/2012/02/06/146490064...](https://www.npr.org/sections/thetwo-
way/2012/02/06/146490064/remembering-roger-boisjoly-he-tried-to-stop-shuttle-
challenger-launch)

~~~
99052882514569
Where did someone assume that the programmer was to blame? How do you find out
if the High On Totem Pole Man is to blame if you can't depose the Low On Totem
Pole Man? You can't. You are forced to take the High On Totem Pole Man's word
for it.

~~~
im_with_stupid
The man over others is the responsible person. If you don't have this, you
have no reasonable reason to be incorporated.

~~~
criddell
If your doctor harms you, you don't sue his boss for malpractice, do you?

~~~
ivolimmen
But a doctor is not told how to perform an operation. Whereas a developer in
almost all cases is. Just like in the emissions scandal of VW.

~~~
criddell
If a doctor is told to do something they know will harm a patient, they have
to refuse or they are guilty of malpractice.

Engineers at VW were found guilty:

[https://www.wsj.com/articles/volkswagen-engineer-
sentenced-f...](https://www.wsj.com/articles/volkswagen-engineer-sentenced-
for-role-in-emissions-fraud-1503676373)

------
gcb0
This is how most factories operate.

Hire someone to design a plastic mold, pay for the work, never hear from that
person again.

It is insane to expect anything more than a tax code, that is only retained
for some few years in some dusty finance department file cabinet.

There is no source control, design history files, CI/CD, etc in a factory,
i.e. 99% of small to medium business. Silicon valley and fintech are the
exceptions, even today, let alone when that happened.

Also, it is a bunch of old timers recommending one another for work. The
people on the floor and owners definitely know the person, but will not tell
unless they have to.

