The investigators did the right thing: they focused on the systemic problems. And the coworkers also did the right thing by not offering anyone up as a scapegoat.
If they had interviewed him, he might have revealed systemic failures in many other parts of the company. But if he's a nameless individual who can't say anything, he can be a silent scapegoat.
But it would have been nice if an investigation had been completed, with all the facts established by the FDA, before the matter was concluded. Maybe they could have collected information from him in a non-blaming way that would help avoid similar disasters in the future.
Edit: It seems there was one, as in he was developing alone and the company didn't disclose his identity. Puzzling.
Bugs happen, but reasonable steps should be taken to prevent them. Developers share that responsibility with management, whether they like it or not.
The story of that series of tragedies: https://en.wikipedia.org/wiki/2002_Überlingen_mid-air_collis...
Or did the failure happen because the code that was written for an 8-bit controller is now run on a 32-bit controller and no one realized that?
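That kind of port can fail in quiet ways. Here's a hypothetical C sketch (not the actual Therac-25 code; every name here is invented) of a pattern that is correct on an 8-bit micro whose compiler has a 16-bit `unsigned int`, and silently wrong after a recompile for a 32-bit part:

```c
#include <stdint.h>

/* Hypothetical example: a free-running tick counter incremented by a
 * timer ISR elsewhere.  On the old toolchain, 'unsigned int' was 16 bits. */
volatile unsigned int free_running_timer;

/* The old code saved RAM by storing a 16-bit snapshot of the counter. */
uint16_t start_time;

void start_measurement(void)
{
    start_time = (uint16_t)free_running_timer;
}

unsigned int elapsed_ticks(void)
{
    /* With a 16-bit 'unsigned int', this subtraction wraps modulo 2^16
     * and is correct even across a timer rollover.  With a 32-bit
     * 'unsigned int', the counter no longer matches the truncated
     * snapshot: if the timer reads 70005 and the snapshot was taken at
     * 70000 (stored as 4464), this returns 65541 instead of 5. */
    return free_running_timer - start_time;
}
```

Nothing in that code looks wrong in review, and it passes every test that doesn't run long enough for the counter to pass 65535. That's exactly the kind of thing only a systems-level analysis catches.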
Perhaps you'd want to bring in the Test Engineer who verified that the particular feature passed? Why didn't they do their job? How about the Senior QA Engineer who wrote the test cases?
Do you also want to know who wrote the Requirement that the code met? Maybe the code did exactly what the Requirements said, but the Requirement was poorly written.
Point is, failures have to be analyzed on a Systems basis. Simply looking at a line of code can be completely meaningless and miss the big picture. And yes, each of the above failures is something I've come across in my career.
I think it's also important to name him, to interview him. To understand how he came to kill. So that anybody writing life-critical code today can say, "I'd better not end up infamous like that guy. I can't make the same mistakes."
Maybe if he had been held accountable, the programmers at Uber wouldn't have been lax enough to code up the negligent homicide of Elaine Herzberg.
If I remember correctly, the bug in this case was a race condition between "normally" running code and an interrupt handler. The race was only triggered if an interrupt happened in just the right window between two instructions. I'd be willing to bet that 99% of programmers, if simply given the code and asked, "Is there a bug here?" would have answered "No".
Should people writing operating-system-level medical equipment software be required to have a basic training about race conditions and how to prevent them? Yes. Is it fair to expect a random engineer who was ordered by his company to rewrite the code to work without a hardware interlock to know what he doesn't know? No.
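For illustration, here's a minimal C sketch of that class of bug and the usual fix. It is not the actual Therac-25 code, and `disable_interrupts`/`enable_interrupts` stand in for whatever port-specific intrinsics the platform provides:

```c
#include <stdint.h>

/* Port-specific interrupt control; assumed to exist for illustration. */
extern void disable_interrupts(void);
extern void enable_interrupts(void);

volatile uint16_t pending;   /* shared between the ISR and the main loop */

void timer_isr(void)
{
    pending++;               /* signal one more event to the main loop */
}

void main_loop_buggy(void)
{
    if (pending > 0) {
        /* 'pending--' compiles to load / decrement / store.  If the ISR
         * fires between the load and the store, its increment is
         * overwritten and an event is silently lost.  The window is only
         * a couple of instructions wide, so it survives almost all
         * testing: exactly the kind of race described above. */
        pending--;
    }
}

void main_loop_fixed(void)
{
    disable_interrupts();    /* make the check-and-decrement atomic
                                with respect to the ISR */
    if (pending > 0) {
        pending--;
    }
    enable_interrupts();
}
```

Shown the buggy version cold, almost everyone reads `pending--` as one operation. That's why training about races, and review processes that assume they exist, matter more than individual brilliance.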
Looking back at when I did Safety Critical Systems at Uni way back in another era, the most important points that we got out of the Therac-25 case study were not to do with bugs at all, but to do with the deficiencies of the system architecture and methodology, especially the decisions that led up to it.
The guy wasn't ordered to do this at the point of a gun. His bosses asked him to do something, he said yes because he liked the money, and people died. The bosses are also responsible, but that doesn't mean that his negligence didn't kill people.
The very fact that we don't know who wrote the code is exactly why it doesn't matter. The complete lack of traceability and accountability is what caused these deaths.
There are so many things that went wrong in order for the Therac-25 to kill people that it's irrelevant who wrote the code. It could've just as easily been you or me.
It could be worse than that. He might not have even known the actual purpose and complete function of the system for which he was writing code.
That's common when big projects get outsourced. One might call it "decomposing the problem into small tasks" but it ends up with people writing code for machinery that they have no knowledge of. I've seen this myself. These kinds of projects depend upon layers and layers of oversight and process.
I think that the folks who are looking for "a perpetrator" are utterly missing the key value of the Therac-25 case study.
Please note that you are working very hard to make up a fantasy situation that entirely exonerates a person like you. Your fantasy does not match the known facts. Think hard about why you think that's the most important thing for you to do here.
I'd also like you to note that you're pointing out the lack of accountability as a problem, and then dismissing one of the fundamental mechanisms of accountability: naming people who harm others.
Yes, other people on the project also should have pushed for traceability and accountability. But responsibility is not zero sum. "I was just following orders" is never an excuse for killing people.
However, a) none of that was true here, and b) in the wider societal frame, I think blamelessness must not always trump accountability. When we're talking about egregious negligence leading to death, I think naming the culprits is the very least we should do.
Self-driving software might find its way around a controlled test track okay, but it is nowhere near fit to be in control on public roads. The safe way to test it is to have a human drive and have the software simulate the same drive, then compare the two for discrepancies. Rinse and repeat until the software consistently equals or betters human decision-making. Then you can consider putting the software in charge.
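A hedged sketch of that "shadow mode" loop in C; all names, types, and thresholds here are invented for illustration, and a real system would log full sensor context rather than just the final commands:

```c
#include <math.h>
#include <stdio.h>

typedef struct {
    double steering;   /* radians */
    double braking;    /* 0.0 .. 1.0 */
} Command;

/* Assumed to exist: the human's actual inputs and the model's shadow
 * output for the same instant.  The model never actuates anything. */
extern Command read_human_command(void);
extern Command compute_software_command(void);

int main(void)
{
    const double STEER_TOL = 0.05;   /* made-up disagreement thresholds */
    const double BRAKE_TOL = 0.10;

    for (;;) {
        Command human = read_human_command();
        Command model = compute_software_command();

        /* Log every meaningful disagreement for offline review; the
         * human stays in full control throughout. */
        if (fabs(human.steering - model.steering) > STEER_TOL ||
            fabs(human.braking  - model.braking)  > BRAKE_TOL) {
            printf("discrepancy: human=(%.2f, %.2f) model=(%.2f, %.2f)\n",
                   human.steering, human.braking,
                   model.steering, model.braking);
        }
    }
}
```

The point of the design is that discrepancies cost you log entries, not lives, until the discrepancy rate tells you the software is actually ready.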
Programmers are supposed to be the technical experts in the room. So if they aren't pushing back against ignorant, uninformed, hubristic Management's bad decision-making and making it right, how the hell do they think their software will do any better? Fucking Worse than Useless, the whole cowardly bloody bunch.
They are there to take the blame.
You cannot realistically expect a human spectator to switch modes at zero notice and save them both from a deadly fuckup that the machine has already put them into. That's not how human minds work. We just don't bootstrap that fast. Jebus, we're bad enough at salvaging bad situations we've knowingly manoeuvred ourselves into while already fully engaged in command-and-control mode.
At least when you, the human pilot, fuck up, you already know the decision chain that got you there, because it is your own. With the machine, you first have to determine that it has gone catastrophically wrong, then determine how it has gone wrong, and finally calculate and execute the recovery strategy before…oh, whoops, too late: you just wiped out all the executive bonuses for this quarter. Also, there's a blood streak on the street.
As for how large software development projects work, I think the trail of corpses already testifies more than sufficiently to how they don't. Industrial institutionalization of incompetence is no defense, and any "professional" who hides behind it can go get fucked.
As much as I love the web, I think the move-fast-break-things ethos, which is arguably useful for startups doing who-cares-if-it-breaks things like social web front end tweaks, has been absolutely terrible for the industry more broadly. I have friends who make excellent money just sweeping up after the elephant parade of hotshots, solving infrastructure and code issues written by people who want to get paid like professionals without acting like ones. I'm glad for them, but the waste is maddening. And that's before we get to the body counts of places like Facebook and Uber.
Responsibility is not zero sum. The safety driver is responsible. But so are the people who sent out robot cars with a single safety driver, as they knew or should have known that paying attention in low-interaction situations for many hours in a row is not something humans reliably do.
The Uber managers and execs are also responsible, in that they set up the system that led to needless death.
But none of that absolves the programmers, some of whom did things that they knew or should have known were dangerous, and who did not make sure that the system they were committing code for was set up for proper safety.
Your notion that the only job of programmers is to meet the spec is one I deeply disagree with. We're not Amazon warehouse workers, desperate for a job and blindly following whatever orders come our way. We're highly paid professionals whose job is to understand what we're building and what effects it has. I think that's true for any sort of coding, but I believe it's very obviously true for life-critical systems. If we can't handle the responsibility, we shouldn't cash the (quite large) checks.
The failure here was expecting someone to stay alert for a rare event over long periods of time. They probably should have given the safety driver a more active task that would have kept them alert.