The programmer behind the THERAC-25 Fiasco was never found (twitter.com)
149 points by whytai 9 days ago | 121 comments





Exacting revenge on people who make mistakes is the wrong way to improve safety. Even the best people make mistakes. Fixing the process is how safety is improved.

All revenge does is halt progress on new technology, and cause all involved to cover up problems rather than expose and fix them.


A chilling example of this from another field: https://en.wikipedia.org/wiki/2002_%C3%9Cberlingen_mid-air_c... - here also, the air traffic controller who was responsible for the two planes that collided was singled out (and later murdered by the aggrieved father of a victim), but arguably the lax procedures used at Skyguide were also a major factor in the accident.

Wow, and the murderer was released after 2 years due to his "mental condition", and 9 years later received an (unrelated?) award given, among other things, "for educating the younger generation and maintaining law and order". Talk about a miscarriage of justice!!!

> He was released in November 2007, because his mental condition was not sufficiently considered in the initial sentence. In January 2008, he was appointed deputy construction minister of North Ossetia.[32] In 2016, Kaloyev was awarded the highest state medal by the government, the medal "To the Glory of Ossetia".[22] The medal is awarded for the highest achievements, improving the living conditions of the inhabitants of the region, for educating the younger generation and maintaining law and order.[33]


Where I work, we have a very strong culture of blameless postmortems.

But that’s not the same thing as not knowing exactly who was present or involved when things happened, nor is it the same as not obtaining their assistance and testimony in reconstructing what happened.

This person should have been tracked down, 100%. Then having tracked them down, the report should avoid blaming them. It might even be useful to anonymize them after obtaining their testimony, to avoid harassment.

But if we avoid interviewing them just so that we don’t have to blame them, we have a very, very big meta-problem that needs to be solved.


In general I agree with you on this: going after people motivates them to cover up or obfuscate mistakes, or to refuse to participate in investigations, and this generally leads to worse safety.

However, it seems notable that they couldn't find the one person who actually knew the details of this situation. I suppose that in that era people could just move to a new place and be extremely hard to track down, but that ability to 'move away' could also have been exploited to steer toward a more favorable settlement. If this anonymous programmer were to testify that 'management was aware' of potential problems in the software but insisted on shipping it anyway, that is the difference between negligence and gross negligence. The potential punitive damages would be dramatically increased in the latter case, and it would be 'cheap' to pay someone to hide to avoid this.


Yes -- and, what's gone wrong with the process? I'll defer to subject experts on the T-25, but most such disasters (737Max, Shuttle, bridge collapses) combine some noxious brew of:

- Wishful cost and schedule "estimates" as a precondition for project approval.

- Feigning of current technical expertise as a job requirement for management.

- Hiding of real issues by lower management, for fear of a "shoot the messenger" reaction by upper management.

- Regulatory capture.

- The allure of the "plausible deniability" defense, ever-rising with ascending level of management.

The highest levels of management involved are of course boards of directors, and national legislators. And when was the last time any of us heard a "mea culpa" from either of those?


This 100%. Paul Ross has given some good talks on this topic. One is recorded here: https://pyvideo.org/pycon-uk-2017/blame-and-the-fallacy-of-r...

> Exacting revenge on people who make mistakes is the wrong way to improve safety. Even the best people make mistakes.

And everyone, not just the best people, knows this. Which is why, in a blame-oriented culture, everyone will just sit back and watch a colleague walk off a cliff. If you're not already involved, then getting involved risks painting a target on your back.


One of the things the author fails to mention in the tweet summary is that the same software was in use on a previous version of the machine and had absolutely no problems.

I believe (from memory) the previous version had hardware interlocks that masked the issue and the T-25 did not have the hardware interlocks installed. This led to a situation where the software was viewed as heavily tested and therefore trusted, even though it shouldn't have been.

I've always seen this as an example of why physical/hardware interlocks are really important when you're mixing software with hardware that can easily hurt people.

I'm also always amazed by how few people seem to know about the Therac-25 incident, especially people that work in therapeutic radiation roles (in the UK anyway).


> I've always seen this as an example of why physical/hardware interlocks are really important when you're mixing software with hardware that can easily hurt people.

Not only that, but running into the interlock should be considered a notable event. The machine should not just continue operating as normal; it should be clear to the operators that something potentially dangerous has occurred and should be investigated.

It seems like they had just assumed that because no one had managed to zap someone with the previous models, the software must have been perfect, even though the previous models had hardware interlocks preventing the dangerous scenario. Those interlocks had presumably been tripped many times, just no one ever brought it to the attention of the vendor.

If a system trips a safety interlock it should fail to a safe configuration and remain there until reset by someone capable of investigating why it was tripped in the first place.

Modern traffic lights are a good example of doing it right. In those cabinets you see at every intersection, right next to the traffic light controller will be a device called a conflict monitor. This device will be wired to the circuits feeding the light heads themselves. If two conflicting movements are indicated for whatever reason, be it a failure of the controller, a short in the wiring, etc, the conflict monitor will trip and set the intersection to a fail-safe mode (usually either all-red blink or yellow blink for a main road with reds elsewhere) until manually reset by a human.
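
To make the latching behavior concrete, here's a minimal sketch in C (names invented, purely illustrative, not any real device's code): once the interlock trips, the fault is latched and every hazardous action is refused until a deliberate, human-initiated reset.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative latching interlock: tripping it drops the system into a
     * fail-safe state that only a deliberate manual reset can leave. */
    typedef enum { STATE_RUNNING, STATE_FAILSAFE } SystemState;

    static SystemState state = STATE_RUNNING;

    void on_interlock_tripped(const char *reason) {
        state = STATE_FAILSAFE;                 /* latch the fault          */
        printf("INTERLOCK: %s - manual reset required\n", reason);
    }

    bool hazardous_action_permitted(void) {
        return state == STATE_RUNNING;          /* gate every risky action  */
    }

    void manual_reset(bool cause_investigated) {
        if (state == STATE_FAILSAFE && cause_investigated)
            state = STATE_RUNNING;              /* only a human clears it   */
    }

    int main(void) {
        on_interlock_tripped("conflicting movements indicated");
        printf("permitted after trip: %d\n", hazardous_action_permitted());
        manual_reset(true);
        printf("permitted after reset: %d\n", hazardous_action_permitted());
        return 0;
    }

The point is the same as the conflict monitor: the fault state is sticky, and "continue as normal" is never an automatic outcome.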

---

> I'm also always amazed by how few people seem to know about the Therac-25 incident, especially people that work in therapeutic radiation roles (in the UK anyway).

That's interesting, at least amongst my "techie" friends it's common knowledge. Many of us who went to college for computer science related things had it used in one class or another to get across the point that bad software can kill people in really unexpected ways.

I guess maybe the medical side of things doesn't find it worth as much attention because they don't have as much to learn from it.


We have physical interlocks all the time and they make intuitive sense to us.

If you park your car and crank the steering wheel all the way to one side, it won't start unless you put it back. If you have an automatic transmission vehicle and you turn the key without pushing the brake, it won't start. If you have a manual, it won't start unless you push the clutch in. (This didn't use to be true. Citing Ferris Bueller.)

Lots of stuff goes wrong when the hardware people depend on the software people for correctness and the software people depend on the hardware people for correctness. Insert <Group A> and <Group B> for hardware and software. Could easily be writers and editors in journalism.


> I've always seen this as an example of why physical/hardware interlocks are really important when you're mixing software with hardware that can easily hurt people.

Agreed. I was a developer on a medical instrument where the decision to implement a hardware interlock was made only after we were years into development and months away from product release. Initially, the belief was that we could get away with software-monitored interlocks, but then the lead HW engineer realized it would be an uphill battle to get UL to certify the machine as safe. UL & OSHA want to see "hard interlocks": systems where opening an interlock immediately cuts power and eliminates the hazard without anything else in the control chain.

The problem was that the easiest way for the hardware engineers to implement this (cutting power to all the controllers) was also the hardest way to manage in software.

At any given moment, with dozens of motors and actuators processing commands, all I/O controllers could have their power cut when an operator opened the cover. This meant retrofitting graceful failure tolerance (previously our code expected that these events would mean a hard fault and a system shutdown) onto every single I/O command, resulting in changes to thousands of lines of code and months of work to implement and review.

In the end it was the right thing to do: the machine was rendered safe as soon as an operator opened the cover while system software kept the non-affected parts of the machine running as best it could. As wolrah suggests in the sibling comment, this was an exceptional event: the only way to recover was to acknowledge the fault and put the instrument through a restart procedure.
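
For anyone curious what that retrofit looks like at the code level, here's a rough sketch (invented names and structure, nothing like the real product's code): every I/O call has to treat "controller lost power because the cover opened" as an expected, recoverable status rather than a fatal error.

    #include <stdbool.h>

    /* Illustrative only: each controller can lose power at any moment when
     * the hard interlock opens, so every command returns a status instead
     * of assuming success or hard-faulting the whole system. */
    typedef enum { IO_OK, IO_NO_POWER, IO_TIMEOUT } IoStatus;

    typedef struct {
        int  id;
        bool powered;   /* cleared by the hardware interlock, not by software */
    } Controller;

    IoStatus send_command(Controller *c, int command) {
        if (!c->powered)
            return IO_NO_POWER;   /* cover open: expected, not a crash        */
        /* ...talk to the hardware here... */
        (void)command;
        return IO_OK;
    }

    /* Callers degrade gracefully: mark the subsystem faulted, keep the rest
     * of the machine running, and require an acknowledged restart to recover. */
    bool home_axis(Controller *c) {
        return send_command(c, /* hypothetical HOME opcode */ 0x01) == IO_OK;
    }

    int main(void) {
        Controller lift = { .id = 1, .powered = false };  /* cover is open    */
        return home_axis(&lift) ? 0 : 1;                  /* degrades, no crash */
    }
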



The THERAC-25 is a classic example taught in software engineering and usability engineering courses. I use it as an example in one of my undergraduate lectures.

I think you are right that I’ve never heard about it in any other contexts though.


What? No, the Therac-20 had exactly the same bug, triggered just as frequently; however, it had a hardware guard preventing the bug from killing people.

"...The man (the authors seem to know it was a man, at least) who wrote the software for this machine did so alone, without documenting what he was doing. The company then sort of vaguely tested it."

Lone wolf programmer, no docs, vague testing. What kind of manager(s) would let this out the door knowing this? Never mind the lone wolf programmer. Find the managers and beat them with sticks.


There is a quote from Chesley Sullenberger (the pilot who landed an airliner in the Hudson River):

"Everything we know in aviation, every rule in the rule book, every procedure we have, we know because someone somewhere died... We have purchased at great cost, lessons literally bought with blood that we have to preserve as institutional knowledge and pass on to succeeding generations."

We forget that often process and procedures which we now take as "common sense" or "bare minimum" were introduced as a best practice exactly because someone somewhere made a mistake that would have been avoided with those procedures in place.

Another thing I think is important to bring up is that in software teams we often think that procedures are in place for 'other' complacent people or inexperienced juniors.

The reality is that procedures enforce consistency. And that consistency is needed for you as well not just for 'others'.

Today you write something elegant; tomorrow you could have trouble at home, be sleep deprived, pushed for a deadline, pressured by management, and suddenly, in that instance, you become the 'others'.

Procedures in the end eliminate any wiggle room for negotiable 'business compromises' or relaxing quality 'just this once'.


Hang on, pardner!

We agree that management was faulty. So why do we give them the presumption of good faith by taking their word for how the programmer operated?

We haven’t heard from the programmer, just from people who are covering up their identity. I’m not saying the following is likely, but it’s possible:

What if the programmer argued with them that they needed to do more testing and allow more time for development? Or that there should be a budget for a redesign, rather than cobbling the -25 together from the bits and bobs of the -6 and -20, but management rushed it into production over their objections?

Then people die, and the programmer quits.

Management goes on to settle while being careful to make it impossible to talk to the programmer, who may very well have a lot to say about management.

We agree that management is at fault. Why take their word for it that the programmer operated without documentation? Why take their word for it that the lack of testing was the programmer’s choice?

Maybe the programmer produced a huge document explaining why the product should not be shipped, and management buried it to save their own skins?


If I remember, the earlier systems had a hardware lock that prevented an overdose for whatever reason, including a software fault. Sure, the software was still faulty, but don't forget that the software was developed for particular hardware on which a certain concern could be discounted (even if it ought to have been tested for). Is it really the developer's fault when the manufacturer (presumably for economic reasons) decides to remove safety features and increase the risk surface?

According to Wikipedia, the new model was actually deemed safer in an audit on the grounds that unlike a mechanical lock, the software could wear out or get damaged. So it seems more likely that people generally had a very mistaken view of software reliability.

I assume you mean software could NOT wear out or get damaged

correct

Back then? Everybody. There was no Agile, few Computer Science degrees, mostly Cowboy programmers with little formal training. Managers knew nothing of these new computer devices, they just hired a six-pack of programmers and set them to work.

I worked at that time in the industry. Half the people I worked with were musicians getting some extra dough by programming.


... and some of those folks became or already were genius programmers.

At the time, programmers could be divided into "corporate" types and "computer nerds", people like the ones who founded Apple or the various software firms of today. Software wasn't an industry, it was a function within companies that did other things. You needed software to do useful things with computers and even to boot them up, so you had someone write it.

Not knowing the identity of the person who wrote this code is NBD. He likely never even knew there was a bug or problem, and if he did he couldn't be held responsible in any way, nor could the managers of the time.

If anyone should be held responsible, it would be the FDA for not getting ahead of the technology curve at the time and regulating computerized devices better than they did sooner than they did.


Is their name really important? The deaths weren’t from a bad programmer; they were from a lack of software quality methodology. The entire field of software quality assumes that all programmers will write bugs.

No, it's not. I think it would even have been counterproductive to find that programmer. People would have latched onto "see, it was a bad programmer!" and the real problems could have been forgotten.

The investigators did the right thing, they focused on the systematic problems. And the coworkers also did the right thing by not providing someone as a scapegoat.


I would think it's easier to blame someone you've never talked to than to blame someone you have talked to.

If they had interviewed him, he might have revealed systematic failures in many other parts of the company. But if he's a nameless individual who can't say anything, he can be a silent scapegoat.


[flagged]


Why would you be willing to testify on the shortcomings of your work if you’re going to end up as a blurb taken out of context, alongside your face, as “the Therac-25 murderer” in the news?

It's not important in the sense of "name this guy and run him out of town".

But it would be nice if an investigation was completed, and all the facts found by the FDA before it was concluded. Maybe they could have collected information from him in a non-blaming way that would avoid similar disasters in the future.


The name is not important. The fact that they were not found is telling, though. Was there no record of any kind?

Edit: It seems there was, as in he was developing alone and the company didn't disclose his identity. Puzzling.


The implication is not that nobody could track him down, but instead that nobody in a position to do so did so.

No, but it will generate some sweet, delicious outrage that the media can use to divert attention from an important issue.

If I was hurt by some bad code, I’d want to know who wrote it.

Bugs happen, but reasonable steps should be taken to prevent them. Developers share that responsibility with management, whether they like it or not.


Like Vitaly Kaloyev? And then you'd show up at their door to chat about life and death, with a knife in your pocket.

I would have asked why rather than jumping to accusations, but I too was reminded of the murder of Peter Nielsen.

The story of that series of tragedies: https://en.wikipedia.org/wiki/2002_Überlingen_mid-air_collis...


Which part of the "bad code"? The actual line that failed? The code that called it, which didn't check a return value? The code below it that always returned the same value regardless of success or failure? The fault in the OS that allowed a race condition because it didn't lock processes properly?

Or did the failure happen because the code that was written for an 8-bit controller is now run on a 32-bit controller and no one realized that?

Perhaps you'd want to bring in the Test Engineer who verified that the particular feature passed? Why didn't they do their job? How about the Senior QA Engineer who wrote the test cases?

Do you also want to know who wrote the Requirement that the code met? Maybe the code did exactly what the Requirements said, but the Requirement was poorly written.

Point is, failures have to be analyzed on a systems basis. Simply looking at a line of code can be completely meaningless and miss the big picture. And yes, each of the above failures is something I've come across in my career.
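
To make one of those failure shapes concrete, here's a toy example in C (invented code, not the actual Therac source): an 8-bit flag that is incremented instead of set wraps to zero every 256th pass, and on exactly that pass a safety check is silently skipped. A fault of roughly this shape is described in the published Therac-25 analyses.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Toy illustration: the flag is incremented rather than set to 1, so it
     * wraps from 255 to 0 on the 256th setup pass and the check is skipped. */
    static uint8_t needs_position_check = 0;

    void setup_pass(void) {
        needs_position_check++;            /* bug: should be "= 1"            */
    }

    bool safe_to_fire(bool turntable_in_position) {
        if (needs_position_check == 0)     /* 0 is treated as "no check"      */
            return true;
        return turntable_in_position;
    }

    int main(void) {
        for (int pass = 1; pass <= 256; pass++) {
            setup_pass();
            if (safe_to_fire(false))       /* turntable deliberately wrong    */
                printf("pass %d: fired without the position check\n", pass);
        }
        return 0;
    }

Reviewing that single line in isolation tells you nothing; the failure only makes sense when you look at the whole system and how often the path is exercised.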


People are responsible for their actions. There were many things wrong with this project, but one of them was fatally bad code, which this guy took money to write. Having him be named strikes me as the very minimum in accountability.

I think it's also important to name him, to interview him. To understand how he came to kill. So that anybody writing life-critical code today can say, "I'd better not end up infamous like that guy. I can't make the same mistakes."

Maybe if he had been held accountable, the programmers at Uber wouldn't have been lax enough to code up the negligent homicide of Elaine Herzberg.


> There were many things wrong with this project, but one of them was fatally bad code, which this guy took money to write.

If I remember correctly, the bug in this case was a race condition between "normally" running code and an interrupt handler. The race was only triggered if an interrupt happened in just the right window between two instructions. I'd be willing to bet that 99% of programmers, if simply given the code and asked, "Is there a bug here?" would have answered "No".

Should people writing operating-system-level medical equipment software be required to have a basic training about race conditions and how to prevent them? Yes. Is it fair to expect a random engineer who was ordered by his company to rewrite the code to work without a hardware interlock to know what he doesn't know? No.
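
For anyone who hasn't hit one of these: here's the general shape of a check-then-act race between a main task and an asynchronous handler, sketched in C (invented names, not the actual Therac-25 code, which was assembly with a keyboard handler and shared data-entry state).

    #include <stdbool.h>

    /* Invented sketch of the hazard: an asynchronous handler (an interrupt
     * in the original, a signal or second thread today) can modify shared
     * state in the window between the main task's check and its act. */
    typedef struct {
        int  mode;          /* e.g. electron vs. x-ray                   */
        bool setup_done;    /* true once beam hardware matches the mode  */
    } SharedState;

    static volatile SharedState shared = { 0, false };

    /* Runs asynchronously when the operator edits the prescription. */
    void on_operator_edit(int new_mode) {
        shared.mode = new_mode;
        shared.setup_done = false;    /* hardware must be reconfigured     */
    }

    /* Main treatment task: no locking, no atomics. */
    void treatment_task(void) {
        if (shared.setup_done) {
            /* RACE WINDOW: if on_operator_edit() fires right here, the beam
             * fires with hardware configured for the old mode.              */
            /* fire_beam(shared.mode);   -- hypothetical hardware call       */
        }
    }

    int main(void) { return 0; }   /* compiles; the race needs real async events */

Read casually, both functions look fine on their own; the bug only exists in the timing between them, which is exactly why it survived review and testing.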


> Is it fair to expect a random engineer who was ordered by his company to rewrite the code to work without a hardware interlock to know what he doesn't know? No.

Looking back at when I did Safety Critical Systems at Uni way back in another era, the most important points that we got out of the Therac-25 case study were not to do with bugs at all, but to do with the deficiencies of the system architecture and methodology, especially the decisions that led up to it.


It is 100% fair to expect people working on life-critical systems to either a) know that they are competent to work on life-critical code, or b) not work on life-critical code.

The guy wasn't ordered to do this at the point of a gun. His bosses asked him to do something, he said yes because he liked the money, and people died. The bosses are also responsible, but that doesn't mean that his negligence didn't kill people.


Is said "random engineer" an engineer? No. He's a code monkey and should be expected to be paid like a code monkey.

Obviously we don't know who wrote the code, so we can't ask him, but did he even know the consequences of what he was writing? Was he even aware of the potential lethality?

The very fact that we don't know who wrote the code is exactly why it doesn't matter. The complete lack of traceability and accountability is what caused these deaths.

There are so many things that went wrong in order for the Therac-25 to kill people that it's irrelevant who wrote the code. It could've just as easily been you or me.


> Was he even aware of the potential lethality?

It could be worse than that. He might not have even known the actual purpose and complete function of the system for which he was writing code.

That's common when big projects get outsourced. One might call it "decomposing the problem into small tasks" but it ends up with people writing code for machinery that they have no knowledge of. I've seen this myself. These kinds of projects depend upon layers and layers of oversight and process.

I think that the folks who are looking for "a perpetrator" are utterly missing the key value of studying the Therac-25 case study.


I am personally not looking for "a perpetrator". Many people share responsibility for a failure like this. But that's all the more reason that we should clearly describe who did what, when, and why.

Please note that you are working very hard to make up a fantasy situation that entirely exonerates the person like you. Your fantasy does not match the known facts. Think hard about why you think that's the most important thing for you to do here.


He knew or should have known the potential consequences of the code he was writing. Maybe this could have been you, but it definitely wouldn't have been me, because I would not have committed the first line of code until I was reasonably sure that a) I was not going to kill somebody accidentally, and b) the project had sufficient safeguards.

I'd also like you to note that you're pointing out the lack of accountability as a problem, and then dismissing one of the fundamental mechanisms of accountability: naming people who harm others.

Yes, other people on the project also should have pushed for traceability and accountability. But responsibility is not zero sum. "I was just following orders" is never an excuse for killing people.


A culture of blame leads to programmers of safety critical software hiding their mistakes.

On a project? Sure. I'm a big fan of blameless retrospectives as a way to maximize quality. But that only works in the context of a quality-oriented culture that is strongly focused on continually improving safety.

However, a) none of that was true here, and b) in the wider societal frame, I think blamelessness must not always trump accountability. When we're talking about egregious negligence leading to death, I think naming the culprits is the very least we should do.


I can see why you would say that, but isn't classical engineering a counter example? Professional engineers have a culture of accountability and responsibility. No system is perfect, but (to my eyes at least) classical engineering works safely without a culture of blame.

Preferable to the "mistakes were made" fog bank of diffuse nonblame. People respond to incentives whether cash or heads on pikes.

A culture of refusing responsibility is better how?

The opposite of a blame culture is not a culture of refusing responsibility, but a system where you look for problems in the process when a fault makes it into the final product. It's always a mixture of specification, implementation, and QA that makes mistakes possible. Putting the blame on the engineer is not very helpful, it does little to prevent future problems. Trying to figure out how it was possible that an implementation bug made it through QA allows you to prevent similar mistakes in the future (at least if you're lucky).

Just to be clear, I'm not putting the blame on the engineer. Many people are responsible for this failure. What I'm saying is that the engineer is one of the responsible people, and that as fellow engineers we should be especially concerned with making sure that responsibility is correctly performed and has appropriate accountability.

But why you want to keep him accountable and not the ones who did the testing and released the product?

I never said he should be the only one. Responsibility is not zero sum. But as a professional software developer, I am pointing out the professional responsibility we all have.

IMHO whoever decided to leave out the hardware protection, present in older machines, is the 'culprit'.

[flagged]


Typical code cowboy mindset: expects all of the glory, none of the responsibility. Kindly piss off and don’t come back until you’ve learned proper systems design.

https://en.wikipedia.org/wiki/Fail-safe

Self-driving software might find its way around a controlled test track okay, but it is nowhere near fit to be in control on public roads. The safe way to test it is to have a human drive and have the software simulate the same drive, then compare the two for discrepancies. Rinse and repeat until the software consistently equals or betters human decision-making. Then you can consider putting the software in charge.

Programmers are supposed to be the technical experts in the room. So if they aren’t pushing back against ignorant uninformed hubristic Management’s bad decision-making and making it right, how the hell do they think their software will do any better? Fucking Worse than Useless, the whole cowardly bloody bunch.


I don't think you understand the basic concept, let alone how self-driving systems are put together or how large software development projects work. The Uber car wasn't self-driving. It required a human occupant to watch the road at all times. That is certainly not on the developers.

You appear to be laboring under the gross misapprehension that the human is there to take control when the machine fucks up. They are not.

They are there to take the blame.

You cannot realistically expect a human spectator to switch modes at zero notice and save the machine from a deadly fuckup that the machine has already put them into. That’s not how human minds work. We just don’t bootstrap that fast. Jebus, we’re bad enough at salvaging bad situations we’ve knowingly manoeuvred ourselves into while already fully engaged in command and control mode.

At least when you—the human pilot—fuck up, you already know the decision chain that got you there because it is your own. With the machine you have first to determine it has gone catastrophically wrong, then determine how it has gone wrong, and finally calculate and execute the recovery strategy before…oh, whoops, too late: you just wiped out all the executive bonuses for this quarter. Also, there’s a blood streak on the street.

As for how large software development projects work, I think the trail of corpses already testifies more than sufficiently as to how they don’t. Industrial institutionalization of incompetence is no defense, and any “professional” who hides behind it can go get fucked.


Exactly. Other companies knew that kind of attention and reaction was beyond human capacity, which is why they had two safety drivers. Which I'm still not sure is sufficient to compensate for the other sorts of negligence Uber staff indulged in here.

As much as I love the web, I think the move-fast-break-things ethos, which is arguably useful for startups doing who-cares-if-it-breaks things like social web front end tweaks, has been absolutely terrible for the industry more broadly. I have friends who make excellent money just sweeping up after the elephant parade of hotshots, solving infrastructure and code issues written by people who want to get paid like professionals without acting like ones. I'm glad for them, but the waste is maddening. And that's before we get to the body counts of places like Facebook and Uber.


That is not what the NTSB said.

Responsibility is not zero sum. The safety driver is responsible. But so are the people who sent out robot cars with a single safety driver, as they knew or should have known that paying attention in low-interaction situations for many hours in a row is not something humans reliably do.

The Uber managers and execs are also responsible, in that they set up the system that led to needless death.

But none of that absolves the programmers, some of whom did things that they knew or should have known were dangerous, and who did not make sure that the system they were committing code for was set up for proper safety.

Your notion that the only job of programmers is to meet the spec is one I deeply disagree with. We're not Amazon warehouse workers, desperate for a job and blindly following whatever orders come our way. We're highly paid professionals whose job is to understand what we're building and what effects it has. I think that's true for any sort of coding, but I believe it's very obviously true for life-critical systems. If we can't handle the responsibility, we shouldn't cash the (quite large) checks.


I agree, partly. They were testing the vehicle, and having a human in the driver's seat to take over in the event of a problem is a major component of that testing. That person likely became complacent in their duties, though, probably because the software seemed to work well enough.

The failure here was to expect someone to stay alert for a rare event for long periods of time. They probably should have had a more active task that would have kept them more alert.


There is the assumption that this is to protect the programmer -- however I feel it's much more likely it's there to protect others.

As others have pointed out, this was a systemic failure. Pointing out the individual highlights those upstream.


Pointing out the individual highlights those upstream.

Exactly. Look at Experian. Attempted to blame an individual engineer for not applying a patch, then the next thing they know, the entire world is asking, why is your CISO a music major?


> why is your CISO a music major?

The implication of that question is nonsensical. Experian had/has many problems, but the fact that their chief security officer got a degree in music decades earlier wasn't one of them.

Some of the very best programmers I know were music majors - it's actually not uncommon at all. I mean, Steve Jobs only went to college for 1 semester, and afterwards he audited creative classes like calligraphy. Is the implication that he wasn't qualified to be CEO of Apple because he didn't have a degree in management?


The implication of that question is nonsensical. Experian had/has many problems, but the fact that their chief security officer got a degree in music decades earlier wasn't one of them

You’ve managed to completely miss my point about attempting to blame an individual engineer backfiring and hitting senior management. No one would have cared what her degree was in otherwise. But it became a stick to beat the organisation with.


Equifax, not Experian

The developer in question has probably lived through a pretty bad 40 years with nightmares about the issue. They might be 60 now, or 80, or already passed away. The only bright spot was probably the fact that they were not known and thus it didn’t have a wide social effect on their life. I have a hunch that last part is about to end in a few months.

It was one "John Smith". I almost feel as though it must have been many programmers and they pinned it on the guy who left before it blew up. No one wants that black mark on their record and it would be insane to think most people didn't remember anything about him.

From reading Nancy Leveson's paper, I get the impression she probably does know the programmer's name, but little or nothing beyond that, and has chosen not to publicise his name. (She probably thinks that is the ethical course of action.) The lawsuit depositions to which she refers, quite likely include his name, but not much else. You would think you would remember the name and gender of a former colleague, but that doesn't mean you could answer any questions about his past work experience or qualifications – most of us know very little of our current colleagues' past work experience or qualifications – even when they tell us, that kind of information often isn't very memorable.

Since the real solution to the problem ended up being a hardware solution (a rate limiter) I have trouble believing the culprit is the "guy who wrote the code".

Boeing is having these same questions asked about them and I wouldn't be surprised if their "solutions" would essentially be hardware solutions.


The solution was to fix the software, not add hardware. A piece of hardware was added to the system to overcome software errors, but that was not the solution.

The software was badly broken, and had been for years. The difference between the 25 and previous models is that previous models had interlocks that did not allow the software errors to manifest.


re: the actual bug - it took a long time for people to reproduce it and figure out what it was.

This article gives an overview of the bug:

https://www.bugsnag.com/blog/bug-day-race-condition-therac-2...

A much longer paper about the system here (linked from the OP twitter thread)

https://web.stanford.edu/class/cs240/old/sp2014/readings/the...


It is unbelievable to me that nobody knows who this was. It makes me wonder if somewhere out there is some software engineer, nearing the end of their career, who carries this in their conscience. Do they know about it? It seems it is such a famous case and the medical device industry so small that surely they must.

> The programmer behind the THERAC-25 Fiasco was never found

It was known.

It's a shame we live in such a vengeance culture that what we really want is for it to be public.

People die weekly in most hospitals because doctors and nurses don't wash their hands.

Get over THERAC; it saved lives, and the programmer helped with that.

From Reddit AMA -

"My teacher does know the name, but is bounded by the courts to not release it. He knows the programmer is living in guilt and did say that he has left programming as his career. Although, it was not entirely his fault, as my teacher explained, the necessary software development process for a machine like this was not there, and no checks were in place.

tl;dr Cannot be revealed, but wasn't entirely his fault."


“wasn't entirely his fault”

Nobody says it was; such disasters are often multifactorial. But given his position, that person holds key knowledge and insights into what went wrong that no one else has. Without access to that information, investigators can only hypothesize.

This is why things like whistleblower laws and indemnity insurance exist, to enable the full and unvarnished truth to come out. How are errors meant to be fixed correctly when you don’t have all the information as to the cause?

Compare how air crash investigations work. Or research into procedural improvements to hospital hygiene. There are things far more important than just finding people to blame.


> People die weeky in most hospitals because doctors and nurses don't wash their hands.

And hospitals and doctors alike do get sued for medical errors and negligence, especially lethal ones. "But I saved 99 other lives" is not really a good argument; it's not a game of points.


How many years does a prosecutor have to investigate cases like this? I mean in both Canada and the USA.

In my country, if something like this is considered a human rights violation, then the general prosecutor can reopen the case no matter how many years later (this was intended for investigations into the disappearances during the last dictatorship, but I think our constitution supports my reasoning if something like this happens).


I think the search for the programmer was not about blaming him but to find information about his background and question him about the actual software development process.

In any case, the only person who has all the details about the development leading up to this incident is the guy who wrote the program. He is the only one who can shed any light on this. I think it's worth finding him even nowadays.


Funny that git chose "git blame" as the name for that piece of functionality.

Isn't that just an alias for `git praise`? In any case they're on equal footing.

It's the other way around. Blame was the original, and did not initially have a polite counterpart.

No news here. This 21-year-old college report [0] from 1998 (Porrello; citations include Leveson, Nancy G., and Clark S. Turner, 1993) has a lot more facts without the hysteria. It states:

"<b>One</b> programmer, over several years, revised the Therac-6 software into the Therac-25 software (AECL has not released any information about the programmer or his credentials)."

[0] https://web.archive.org/web/19980201101244/http://cobra.csc.... - "Death and Denial: The Failure of the THERAC-25, A Medical Linear Accelerator"

Previous Therac models had hardware interlocks to prevent some modes; they were removed in favor of software for the 25. No doubt there were some engineers who knew more about this.


This is amazing - I would have thought that any medical equipment would go through exhaustive testing and an order of magnitude more for systems involving radiation and a machine capable of lethal doses.

At my first job we even had a separate safety officer for our low powered sources used for tracing waterflow.


Part of the problem is likely that the AECL is a quasi-public entity, and governments tend not to be nearly as vigilant when regulating or policing other branches of the same government as they are when regulating a 100% privately owned company.

Yes, I saw that they were relying on crown immunity.

A lot of the regulations around safety critical software came about because of the Therac-25 accidents.

Such a fatal failure is never the result of one person making a mistake.

It's always the result of many mistakes piled one atop the other, and you'll always find a bean-counter on the top adjusting an Excel spreadsheet somewhere to make the numbers come out in a way that pleases some executives.

Did they ever find the name of the accountant behind the fiasco?


Perhaps it was a relative or friend of someone way high up in management. Otherwise he or she would have been thrown to the wolves.

Could be. Alternatively, he might have had a long paper trail of dissent on the topic of removing the hardware interlocks and they just wanted to keep him off the stand.

My first job was working for a software company that had killed people in the past. This was part of the London Ambulance scandal in the early 90s. The official inquiry had mostly exonerated the company.

It had a crazy impact on corporate culture. It was rarely talked about except in hushed tones over beer; management was extremely averse to any publicity, with no press contact for any reason (compare to my next employer, which put out empty press releases at least weekly if not more often); and sales staff had an extensive playbook for not answering questions about it.

I can understand an individual developer wanting to disappear in this scenario. If they had internalized blame for this, I can certainly imagine them choosing never to work in the industry again (or making more extreme choices)


I think it's amazing that it wasn't so long ago that you could actually disappear by simply keeping out of the spotlight. Now, in most western countries, there are few people who could pull this off, and you would need to actively mask your identity.

Tweeter is obviously not an engineer. In engineer land (licensed, that is), entrusting the design of components with life-and-death consequences to a single person--without review--is malpractice.

We get away with that kind of thing in software shops because a) we're relatively new, b) we rarely deal with life-and-death designs, and c) we haven't racked up a large enough body count (empirically speaking, rather than morally) to warrant regulation.

Give Uber & Tesla a few more years of running people over, and engineer-style licensing for certain types of software development will probably be in the mail.


>engineer-style licensing for certain types of software development

It was. There was a PE for software engineering until relatively recently in the US but no one took the exam because it basically wasn't required for anything.

Be careful what you wish for though. The requirements typically include a formal degree and some number of years working under a PE.

There's nothing magical about such a certification though. Other than the education and experience requirements, it's pretty much a GRE-type exam. I took the engineer-in-training exam way back when in a different engineering field but I stopped practicing before I sat for a PE.


> no one took the exam because it basically wasn't required for anything

No, no one took the exam because it was effectively impossible.

To become a PE, first the candidate has to pass one of the Fundamentals of Engineering exams to become an engineer-in-training. Except, whoops, there wasn't ever a software-specific FE exam; the most relevant one is the EE/Comp. E. exam. Take a look at the list of topics: https://ncees.org/wp-content/uploads/FE-Ele-CBT-specs.pdf Most developers aren't going to pass that even with a CS degree.

Secondly, you need 4-8 years of supervision by a licensed engineer. Again, whoops, there are barely any software developers with a PE license, so who would they get to supervise them?

Only then do you get to take the PE exam for software engineering. Frankly, the situation was so absurd that one has to suspect that NSPE didn't want to certify software developers as PEs.


That wasn't a wish. It was a warning.

Is the code available, after all these years?

I don’t think it was ever publicly released, unfortunately.

Deaths are clearly on the company, not on the programmer guy

Does this programmer actually exist? Sounds like they have "invented" a person to blame, who, strangely, people can barely remember.

I read that 1993 THERAC-25 article when it first came out. One factoid which stuck with me was that after the accidents, the manufacturer, AECL, retreated to their core business: "AECL's primary business is the design and installation of nuclear reactors."

How possible is it that the only programmer was a minor and that's why we never got any testimony or explanation? I've heard stories of companies or governments hiring extremely young programmers to do some pretty serious work. (John Romero comes to mind)

Edit: It would also explain the lack of provable credentials.


I studied this in a computer class in college. I can't remember if it was a computer testing class or a computer ethics class. But it was the first thing we covered and the point was, as you embark on a career writing software, BE CAREFUL!

Turning the Twitter hounds of hell loose on the guy/woman seems like a good way to fail the responsible conduct part of their "Responsible Conduct of Research" class.

If the lawsuit ended in settlement before they were found, there is nothing unusual here. There is no reason to look after that point. They settled.

Why would someone assume that the programmer was to blame? The programmer just does what the managers tell him to do. Why should safety or security be about blaming the low man on the totem pole? It's totally ridiculous when you read accounts like that of the NASA engineer. https://www.npr.org/sections/thetwo-way/2012/02/06/146490064...

Where did someone assume that the programmer was to blame? How do you find out if the High On Totem Pole Man is to blame if you can't depose the Low On Totem Pole Man? You can't. You are forced to take the High On Totem Pole Man's word for it.

> Where did someone assume that the programmer was to blame?

I agree higher-ups bear the brunt of responsibility, but the title of this post currently says "the programmer behind the THERAC-25 Fiasco was never found"


The person in charge of others is the responsible one. If you don't have that, there is no reasonable justification for being incorporated.

If your doctor harms you, you don't sue his boss for malpractice, do you?

But a doctor is not told how to perform an operation, whereas a developer in almost all cases is. Just like in the VW emissions scandal.

If a doctor is told to do something they know will harm a patient, they have to refuse or they are guilty of malpractice.

Engineers at VW were found guilty:

https://www.wsj.com/articles/volkswagen-engineer-sentenced-f...


One of the reasons I think the title Software Engineer should be protected as a professional designation (P.Eng) is to remove arguments like yours. If that lone developer were a real Software Engineer, then he would absolutely be guilty of malpractice. An Engineer can't escape culpability because they were just following orders.

There's no law surrounding the title of software engineer though, so we have no power over our own work. In civil engineering, if someone else takes your plans and uses them even though you haven't signed them, they are the one responsible. You can't do that in software engineering.

> There's no law surrounding software engineer though

There actually probably is a law where you live, but in almost all jurisdictions I don't think it's enforced. Parts of Canada were trying to protect the title Software Engineer, but I think even they've given up now. It's unfortunate, because now the title doesn't really mean anything.


> There actually probably is a law where you live, but in almost all jurisdictions I don't think it's enforced.

I'm from Quebec; there are no laws that specifically cover software engineers. There are some for engineers in general, but nothing surrounding software development. The title truly doesn't mean anything for us as long as no laws are passed, which is why I don't pay for it either.


Actually Quebec is one place where the regulating body has been most successful controlling the use of Engineer as a title (and that isn't saying much).

https://www.canadianconsultingengineer.com/engineering/quebe...

I think Microsoft won on appeal, but I can't find anything on that right now.


> Actually Quebec is one place where the regulating body has been most successful controlling the use of Engineer as a title (and that isn't saying much).

I hate repeating myself, but here again: there are no laws that specifically cover software engineers.

I'm not arguing that they don't have good laws around the engineer title; I even mentioned that I can't use the title of engineer because I don't pay for it. What I'm arguing is that there are no laws that specifically cover software engineers.

That title is meaningless and doesn't give us any power over our own work. It doesn't protect our work, and thus it doesn't allow us to take responsibility for it. In the case of the Therac-25, the software was written for the previous hardware and reused on the newer one. Even if the software engineer was against doing that, there was nothing in his power to stop it. I don't know whether the electrical engineer had more power and could have stopped it in the Therac-25 situation, but as far as I know, only civil engineers are required to sign off on their work for it to proceed.


How is it malpractice for a single programmer/software engineer when a critical part of the code should have had code reviews and a software design, produced in advance of writing the code and approved by management? Not to mention thousands of tests with radiation detectors. How can you possibly miss so many safety gates without management being in the wrong?

One of the reasons I dislike the term "software engineer" is because programmers hate to take responsibility. A professional would be willing to sign their name for the work or they wouldn't do it at all.

Of course management could be negligent and I never said otherwise. That still doesn't relieve any Professional Engineer of their obligations and responsibilities.

In this case the author of the software was almost certainly not a P.Eng. (and thus shouldn't be called a software engineer IMHO) and couldn't be found guilty of malpractice.


This is how most factories operate.

Hire someone to design a plastic mold, pay for the work, never hear from that person again.

It is insane to expect anything more than a tax code, retained for only a few years in some dusty finance department file cabinet.

There is no source control, design history files, CI/CD, etc. in a factory, i.e. 99% of small to medium businesses. Silicon Valley and fintech are the exceptions, even today, let alone when that happened.

Also, it is a bunch of old timers recommending one another for work. The people on the floor and owners definitely know the person, but will not tell unless they have to.



