IMO, harm from software bugs has (so far) been vastly surpassed by harm from explicit choices in system design. The various emissions cheating scandals have almost certainly taken a real toll on human life, running into the hundreds of lives. More subtly, the choice to retain data inappropriately at Ashley Madison (probably) led directly to suicides and serious emotional harm. Those are just the two recent examples that spring to mind as a practicing developer, not an ethicist.
To oversimplify somewhat: when discussing engineering ethics, the harm from software developers building things wrong is swamped by the harm from building the wrong things.
I think, unfortunately, this is going to change with the advent of "AI" and related technologies such as autonomous driving (we've already had a few cases related to self-driving cars, after all). When the total enumerable set of possible configurations becomes too great to exhaustively "whitelist", we won't be able to have foolproof hardware designs anymore. In these situations software bugs can be absolutely devastating.
Yes, the potential cost of software bugs is increasing as software does things that no hardware interlock can stop. And worse than that, as a society we largely haven’t realized we need to optimize for worst case (not average) performance of algorithms, because they WILL be attacked. If you’re lucky, they won’t be attacked by sophisticated, well resourced nation-state attackers. But sometimes that will happen.
The rise of complex algorithms to control complex processes is the real difficulty. Facebook’s banning algorithm is an example of something that has been exploited by attackers.
Let’s hope voting software is not the next target where bugs can be exploited. Because changing political decisions can and does produce life-changing effects.
Here is a quote from http://sunnyday.mit.edu/papers/therac.pdf:
"In addition, the Therac-25 software has more responsibility for
maintaining safety than the software in the previous machines. The
Therac-20 has independent protective circuits for monitoring the
electron-beam scanning plus mechanical interlocks for policing the
machine and ensuring safe operation. The Therac-25 relies more on
software for these functions. AECL took advantage of the
computer's abilities to control and monitor the hardware and
decided not to duplicate all the existing hardware safety
mechanisms and interlocks."
So, regarding these important safety aspects, even the Therac-20 was better than the Therac-25!
The linked post also mentions this:
"Preceding models used separate circuits to monitor radiation intensity, and hardware interlocks to ensure that spreading magnets were correctly positioned."
And indeed, the Therac-20 also had the same software error as the Therac-25! However, quoting again from the paper:
"The software error is just a nuisance on the Therac-20 because this
machine has independent hardware protective circuits for monitoring
the electron beam scanning. The protective circuits do not allow the
beam to turn on, so there is no danger of radiation exposure to a
When I asked him about the limit switches, it turned out they are read by software only, and the software turns off power to the motor controllers if a limit switch is activated.
I asked why he does not wire the switches to cut power directly to be on the safe side.
His answer: "It's too much bother to add the extra circuits."
We are talking less than $20 in parts and a day of his time. If the software sends the controller a message to start moving the head at a certain speed and then crashes, there is nothing to stop the machine from wrecking itself.
Limit switches typically include two trip points. The first is monitored by the control system; when it is tripped, the control halts execution and stops the machine. The second limit is wired directly to the servo amplifier so that if, for whatever reason, the control fails to halt the machine when the soft limit is tripped, power is removed and motion is halted. Both limits are fail-safe, such that if they were to become disconnected it would result in a limit-exceeded condition.
3D printers are like this too. They have mechanical limit switches that are read only by software. So if there is a bug in the software, nothing stops it from pushing past the hardware limits and breaking something. The same goes the other way around: if the switch itself is broken, the same thing might happen.
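To make the two-stage arrangement concrete, here is a minimal Python sketch of the logic. It is only a model, not real controller code: the names `LimitSwitch` and `motion_allowed` are made up for illustration, and I'm assuming both switches are normally-closed so that a cut wire reads the same as a tripped switch.

```python
from dataclasses import dataclass


@dataclass
class LimitSwitch:
    """Hypothetical normally-closed switch: conducts only while healthy and untripped."""
    tripped: bool = False
    wire_connected: bool = True

    def circuit_closed(self) -> bool:
        # A disconnected wire opens the circuit, exactly like a tripped switch.
        return self.wire_connected and not self.tripped


def motion_allowed(soft: LimitSwitch, hard: LimitSwitch, control_alive: bool) -> bool:
    """Return True if the axis may keep moving.

    soft is read by the control software; hard sits in the amplifier's
    power path and keeps working even when the control software is dead.
    """
    # Hardware path: an open hard-limit circuit removes amplifier power.
    amplifier_powered = hard.circuit_closed()
    # Software path: only effective while the control is actually running.
    software_halt = control_alive and not soft.circuit_closed()
    return amplifier_powered and not software_halt
```

With a crashed control (`control_alive=False`) the soft limit does nothing, so `motion_allowed(LimitSwitch(tripped=True), LimitSwitch(), False)` is still `True`; only the hard limit stops the axis in that case.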
I'm much more worried about the heating element. Its temperature is usually controlled by the same CPU that also does motion control and G-code parsing. If anything locks up the CPU, the heat might not be turned off in time, and (because you also want fast startup) there is enough power available to melt something. At the very least you would get nasty fumes from overheated plastics, and maybe even from Teflon tape, which is often part of the print head. At worst it could start a fire.
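For what it's worth, the usual mitigation is a cutoff that does not depend on the main firmware staying alive. Here is a rough Python model of that idea. In a real printer this would be a separate watchdog timer or a thermal fuse, not Python, and the class name, the 2 second timeout and the 280 °C ceiling are all numbers I made up for the example.

```python
class HeaterWatchdog:
    """Hypothetical independent cutoff for a heater.

    The firmware must call kick() regularly. If it stops (for example because
    the CPU locked up), or if the temperature passes a hard ceiling, the
    heater is disabled regardless of what the firmware last requested.
    """

    def __init__(self, timeout_s: float, max_temp_c: float) -> None:
        self.timeout_s = timeout_s
        self.max_temp_c = max_temp_c
        self.last_kick_s = 0.0

    def kick(self, now_s: float) -> None:
        # Called by the firmware's main loop to prove it is still running.
        self.last_kick_s = now_s

    def heater_enabled(self, now_s: float, temp_c: float) -> bool:
        firmware_alive = (now_s - self.last_kick_s) <= self.timeout_s
        temp_ok = temp_c < self.max_temp_c
        return firmware_alive and temp_ok


# Assumed values, for illustration only.
wd = HeaterWatchdog(timeout_s=2.0, max_temp_c=280.0)
wd.kick(now_s=10.0)
```

The point of the design is that a hung main loop stops kicking, so the heater drops out after the timeout even though nothing ever explicitly turned it off.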
The straightforward way of implementing this with a bidirectional motor is to wire normally-closed limit switches in series with their appropriate direction signals, such that when the switch is actuated it prevents the motor from going in that direction, but it can still move away from the switch.
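The series wiring described above boils down to a small truth table. This Python sketch just models the circuit; the function name `motor_drive` and its parameters are hypothetical.

```python
def motor_drive(cmd_forward: bool, cmd_reverse: bool,
                fwd_limit_pressed: bool, rev_limit_pressed: bool) -> tuple[bool, bool]:
    """Model normally-closed limit switches in series with each direction signal.

    Returns (drive_forward, drive_reverse) as seen by the motor driver.
    """
    # A normally-closed switch conducts until it is pressed.
    fwd_switch_conducts = not fwd_limit_pressed
    rev_switch_conducts = not rev_limit_pressed
    # Each direction signal only reaches the driver through its own switch.
    drive_forward = cmd_forward and fwd_switch_conducts
    drive_reverse = cmd_reverse and rev_switch_conducts
    return drive_forward, drive_reverse
```

So with the forward limit pressed, a forward command is blocked but a reverse command still gets through, which is what lets the axis back away from the switch.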
It's in the article you're commenting on.
It wasn't subtle at all. The entire design was substandard to begin with and didn't meet code. Suspending a walkway from a piece of square tubing made by welding two pieces of C-channel together was ALREADY pants-on-head retarded, and undersized to boot. Deciding it's OK to hang the lower span off the upper span's substandard tubing was just the last step in a long chain of gross engineering negligence.
Calling it a "subtle" failure is like calling Challenger subtle because it was "just a leaky O-ring", or the Apollo I fire subtle because it was "just a tiny spark". There was a completely avoidable cascade of multi-level failure leading up to all of them.
Of course, as you have said, building the wrong thing swamps other harms. Unintended consequences are hard to predict, unfortunately, but I am interested in ways to improve this situation. Standards design also interests me, particularly standards which are hard to cheat.
For example, if you're a Chinese network engineer, and you can avoid it, don't take a job setting up tracking and databases of Uyghur people. That is an ethical issue just as important as the Therac-25 type of problem.
They killed at least one child and are drugging them against their will, while they are forcibly taken from their parents and held in worse conditions than terrorists.
(I took the Engineer-in-Training exam once upon a time but then stopped practicing engineering, so I never got the PE.)
More involvement of the legal and insurance industries is one of them. Another is to give software engineers something solid to brace themselves on when pushing back at management: completely aside from consequences for the company, if you're bonded or worry about a license, there are some things you won't let your manager sweet-talk you into. Another is to provide a model of behavior for engineers, like it says on the tin. It doesn't mean everyone will follow it, or even that the model is always absolutely correct. But giving folks a way to think about things when they feel something's off is a good thing.
Yet another is theoretically providing a baseline of competence. I think that depends more on improving informal culture than any formal mechanism, though.
Note: not arguing in favor of one; I haven't made up my mind on what I think about the topic.
"It is difficult to get a man to understand something, when his salary depends on his not understanding it." -- Upton Sinclair
In any case, it’s dangerous to explain away software failure by dumping blame on systems. The lack of professional standards in software makes it easy for people to do bad things well.
This applies to the vast majority of fields where software is in use, except maybe planes, trains and nuclear power plants (where I sure as hell hope there are a bunch of hardware safeguards in place). So in a sense, we software developers just got lucky here that our mistakes only kill one or very few at a time in most use cases (if at all).
It's still insane how according to the article they apparently just had that thing developed using emulated hardware with no proper security audits, safety guidelines or formal verification.
Definitely agree on the explicitly bad choices though, and since software's impact is often very subtle it might really be impossible to gauge exactly how bad some of those choices end up being.
To paraphrase an old Hoare quote, software can either be so simple it obviously contains no bugs, or so complex that it contains no obvious bugs.
From "Computer-Related Risks" by Peter G. Neumann, published 1994 (REALLY recommended reading)
"One of the most dramatic examples was the $32 billion overdraft experienced by the Bank of New York (BoNY) as the result of the overflow of a 16-bit counter that went unchecked. (Most of the other counters were 32-bits wide.) BoNY was unable to process the incoming credits from security transfers, while the New York Federal Reserve automatically debited BoNY's cash account. BoNY had to borrow $24 billion to cover itself for 1 day (until the software was fixed), the interest on which was about $5 million. Many customers were also affected by the delayed transaction completions."
Additional reference: https://www.washingtonpost.com/archive/business/1985/12/13/c...
Granted, no one died because of this but ... wow ... that was a bad day for some developers somewhere.
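For anyone who hasn't been bitten by this class of bug, here is a small Python illustration of an unchecked 16-bit counter wrapping around. The quote doesn't say whether the counter was signed or unsigned, so the unsigned assumption and the function names are mine.

```python
def increment_16bit(counter: int) -> int:
    """Increment an unsigned 16-bit counter the way fixed-width hardware does."""
    # Masking to 16 bits silently discards the carry: 65535 + 1 becomes 0.
    return (counter + 1) & 0xFFFF


def increment_checked(counter: int, width_bits: int = 16) -> int:
    """Same increment, but refuse to wrap instead of silently corrupting the count."""
    limit = (1 << width_bits) - 1
    if counter >= limit:
        raise OverflowError(f"counter would exceed {width_bits}-bit range")
    return counter + 1
```

`increment_16bit(65535)` returns `0`, while `increment_checked(65535)` raises `OverflowError`, which at least turns a silent wraparound into a loud failure.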
In short they saw some data they didn't understand that seemed to indicate things at the bank were very much not how they were thought to be, and indicated something very specific was happening. In reality that was not the case but a lot of assumptions by morons somehow were believed and everything snowballed into lots of corroborating "evidence" that seemed to indicate bad things. This was a bank that rarely had this level of stupid information and assumptions rise to the top so nobody actually questioned it no matter how absurd it seemed.
My presence on the call was purely because the bank wanted every vendor they had looking to see if they saw problems, so I was basically just poking around telling them what I saw from my end.
At one point on the call there was a dude in an unmanned data center literally just flipping off power switches and cutting the cables of various equipment as he was given the locations for it. It was frantic and very unlike an American bank (at least the ones I worked with were pretty cool customers normally). I turned my speakerphone on to let my coworkers listen to the chaos.
In the end, it was a stupid SQL server related virus thing that created some good old fashioned network disruption and when that was solved and nothing looked like the sky was falling, everyone came to their senses. Once the dude pulled the power to enough of them everything calmed down and I checked my bank account and it was all there... but no excess either :(
On the other hand, when corners are cut (no hardware interlocks, for example) and edge cases aren't considered, even innocently, then you get things like this. It makes products more expensive to design and more costly to buy and maintain to do the extra engineering. It is certainly a barrier to entry, too. But do we want another case like this because people said "this is good enough"?
> An unauthorized third party can interfere with pump communication and undermine patient safety
> we confirmed this through laboratory experiments by sending commands to an insulin pump using an unauthorized remote programmer at a distance of 100 ft
> Thus, the specifically identified issues are a security breach that could result in:
> (1) changing already-issued wireless pump commands;
> (2) generating unauthorized wireless pump commands;
> (3) remotely changing the software or settings on the device;
> (4) denying communication with the pump device.
People can also attack the blood glucose monitors and the data they report to the pump system.
Today, companies build equally important UI logic in JS frameworks that target rapid prototyping and consumer-focused startups.
More than anything else, this accident shows the importance of fuzz testing your critical logic, the importance of hardware interlocks, and the importance of multiple independent layers of interlocks.
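As a rough idea of what fuzz-style testing of critical logic can look like, here is a small randomized test in Python. Everything in it is hypothetical: `beam_permitted` is a toy interlock I wrote for the example, not the Therac-25's logic, and the safety invariant (high current is never allowed without the target in place) is my simplification.

```python
import random

HIGH_CURRENT = "high"
LOW_CURRENT = "low"


def beam_permitted(mode: str, target_in_place: bool, current: str) -> bool:
    """Toy interlock: permit the beam only for known-safe combinations."""
    if mode == "xray":
        return target_in_place and current == HIGH_CURRENT
    if mode == "electron":
        return (not target_in_place) and current == LOW_CURRENT
    # Unknown or malformed modes are rejected by default.
    return False


def fuzz(trials: int, seed: int = 0) -> int:
    """Throw random (including invalid) inputs at the interlock; count invariant violations."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        mode = rng.choice(["xray", "electron", "", "XRAY", "service"])
        target = rng.choice([True, False])
        current = rng.choice([HIGH_CURRENT, LOW_CURRENT, "unknown"])
        if beam_permitted(mode, target, current):
            # Safety invariant: high current is never permitted without the target.
            if current == HIGH_CURRENT and not target:
                violations += 1
    return violations
```

Running `fuzz(10_000)` should report zero violations for this toy function. The value of the approach is that the inputs include combinations nobody thought to write a test case for.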
Only then will most companies actually start to care about software quality in their development processes.
The root problem is that society got used to turning it off and on and hoping for the best, instead of going back to the shop and asking for their money back.
Also, every time a bunch of black hat hackers exposes internal company data, if the security breach can be mapped to a CVE database entry, a good law firm could probably make something out of it.
Not all jurisdictions are alike, but one needs to start somewhere.
No other industry has ever gotten away with this. But with 'software eating the world', change is just around the corner: the first software bug that kills a few thousand people will be a very rude wake-up call that something needs to be done.
The only industry that really gets it is aviation. Medical tries hard but is still a mess, with the exception of devices, which in general are engineered reasonably well.
In a way all these SaaS products are setting the stage for some real liability, after all, if the end user doesn't have even a modicum of control over what happens with their data then the other party should assume liability, even if they try real hard to disclaim that.
Open source might get exempted, if not then I suspect that a lot of open source projects will fold.
There is no way I'll drive an internet-connected car. Unfortunately, I still have to share the road with people who do.
But that's what I'm saying. If there is a possibility of legal action there will be enough legal bureaucracy to make sure there is something to show in court and avoid responsibility, but not to actually address the problem.
Jurisdictions that try and override this, simply get excluded from the customer base.
The market is still the ultimate decider for quality; if you build a crappy product, expect to get innovated out.
Judges might disagree.
> Jurisdictions that try and override this, simply get excluded from the customer base.
Until the customer base is the EU or the US.
> The market is still the ultimate decider for quality; if you build a crappy product, expect to get innovated out.
The market has utterly failed to decide for quality. The market is mostly interested in price and marketing power; quality has never been a very large factor, though in a mature market it might allow some manufacturers to charge a premium for their products.
Thankfully EULAs are void in Europe.
It is all a matter of how big the customer base gets. I am hoping we eventually get something like that EU-wide.
If that was true 1 € shops wouldn't exist, but even those products have more testing than most software out there.
It's not so clear-cut that you should be thankful. The ability of companies to dictate the terms under which users can use their software affects their risk calculation in producing the product in the first place. It is very likely that very useful but imperfect software will not be written because the risk/reward balance is tilted.
Remember, you always have the ability to reject an EULA; simply don't use the product.
> If that was true 1 € shops wouldn't exist, but even those products have more testing than most software out there.
Consumers can make value choices on quality vs cost. This is a basic market function.
Crappy software just doesn't physically injure people very often (compared to like, lawnmowers), and that's where the most serious legal liability for products comes from. Monetary damage from software gets worked out the same way any contract dispute gets worked out, or the same way a physical product that doesn't work but doesn't hurt anyone would get worked out.
Liability for bad software is also complicated by the fact that there are a million apps out there that are free to use. If they are broken for however long, it's hard to say that it caused monetary damage to anyone. (If anything, people are saving time... to paraphrase Mitch Hedberg, FB is broken, sorry for the convenience.)
If everyone did the same thing for broken software, instead of being conditioned that broken software is unavoidable, the quality across the industry would be much better.
For those working in safety and quality control of medical systems, how much does compliance to those specifications actually diminish the chances of another Therac-25 incident?
Considering that automation continues to increase, from automatic patient table positioning up to AI-assisted diagnosis, are there new challenges when it comes to designing medical systems in order to keep them safe and maintainable? How likely is it for the FDA or the equivalent agencies around the globe to authorize the use of open source systems?
Actually they already authorize stuff like Qt.
Computer systems where human lives are put at risk belong to what is called High Integrity Computing.
There are very strict coding standards, where even C looks more like Ada than proper C.
Source code availability is not an issue, because it is part of the certification process to provide it.
The problem is having the money to pay for a certification, which becomes invalid the moment anything gets changed: the compiler being used, the source code, or any of the third-party dependencies.
For some treatments, a metal wedge would be placed within the beam to attenuate it more at the thick end of the wedge.
Because of the non-linear attenuation along the length of the physical metal wedge, dosages were difficult to calculate.
Someone got the bright idea of creating a software wedge by slowly moving the treatment couch at the same time as closing the beam aperture, so that there would be 100% exposure at one end of the "wedge" and 0% at the other, with a linearly decreasing distribution across the whole wedge.
I was the programmer for this project, and we had just started testing it with a sheet of X-ray film on the couch when I received an offer I couldn't refuse to go work elsewhere.
I'm glad that I departed before they started using this on live patients.
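The geometry of the "software wedge" can be sketched in a few lines of Python. This is a deliberately simplified model and not the real treatment software: I'm assuming a single aperture edge that sweeps across the field at constant speed with a constant dose rate, and I'm ignoring the couch motion entirely. The function name is made up.

```python
def relative_exposure(x_cm: float, field_width_cm: float) -> float:
    """Fraction of the full dose received at position x_cm across the field.

    Assumes the closing aperture edge starts at x = field_width_cm and moves
    to x = 0 at constant speed, so each point is irradiated until the edge
    passes over it. The point at x = 0 is covered last and gets 100%.
    """
    if not 0.0 <= x_cm <= field_width_cm:
        raise ValueError("position lies outside the treatment field")
    return 1.0 - x_cm / field_width_cm
```

Under those assumptions the exposure falls off linearly, for example `relative_exposure(5.0, 10.0)` gives `0.5`, which is the linear profile a physical wedge could not deliver.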
I don't think it's reasonable to expect nurses to wait an undocumented 8 seconds after changing modes to avoid a race condition. That goes far past "utmost care". Are pilots expected to never overlap command inputs? Are they allowed to engage the flaps and then activate the spoilers before the flaps are fully deployed?
I'm basing my account on this report as well as the OP: https://hackaday.com/2015/10/26/killed-by-a-machine-the-ther...
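To illustrate the kind of race being discussed, here is a deterministic Python simulation. It is emphatically not the Therac-25's code: the function `run_setup`, the mode names, and the way edits are modeled are all my own simplifications. It only shows how a setup routine that snapshots the mode at the start can end up disagreeing with the screen if the operator edits during the 8-second window.

```python
SETUP_SECONDS = 8  # length of the setup window, taken from the comment above


def run_setup(initial_mode: str,
              edits: list[tuple[float, str]],
              recheck: bool) -> tuple[str, str]:
    """Simulate one setup pass.

    edits is a list of (time_s, new_mode) changes the operator makes on screen.
    Returns (mode the hardware was configured for, mode shown on screen).
    """
    configured = initial_mode  # snapshot taken when setup starts at t = 0
    on_screen = initial_mode
    for t, new_mode in sorted(edits):
        on_screen = new_mode
        if t >= SETUP_SECONDS:
            # Edits after setup has finished are handled as a fresh setup.
            configured = new_mode
        # Edits made during the window update the screen only.
    if recheck and configured != on_screen:
        # Defensive version: compare against the screen and redo the setup.
        configured = on_screen
    return configured, on_screen
```

Without the recheck, an edit at 3 seconds leaves the hardware set up for one mode while the screen shows another. With the recheck, or with an edit made after the window, the two agree.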
I’ll use a terrible analogy to make the point: if someone tells you “that’s bad” when you pull the trigger on a revolver playing Russian roulette, and you pull the trigger five times without any apparent consequence despite being told “don’t do that” each time, are you completely free of responsibility if you pull the trigger a sixth time and die?
I'm not sure that's a valid analogy. Isn't it the case rather that the revolver said "that's bad" but a person said "it's fine, ignore that, it always does that"? Which, in my experience, happens all the time in the workplace. If you didn't trust other people you work with, it would defeat the purpose of being in an organization at all.
Not to mention, if pulling the trigger repeatedly was required for normal operations, you really can't blame the operator at all.
If you are doing work that could immediately cause someone’s death, I feel you have a duty to double check and not just go with the flow. Ymmv and I’m not claiming my POV is more right, just that it’s my POV.
Look at the "unusual" choice of averaging inputs from the pilot and co-pilot that helped lead to AF 447. It's very reasonable to argue it's bad design, but it was the responsibility of the pilots to know how it worked.
And also between pilots flying a plane they are certified on and nurses who are probably less specialized.
I found this the most telling sentence...here the UX was so terrible the operator routinely chose to break the rules... but on some occasions it killed people
It's not just cryptic error messages. Pretty much anything that requires the attention of people will end up being ignored eventually. For example:
I've also read about an anesthesiologist who turned off the alarms because they annoyed him. One day he failed to secure the endotracheal tube during a surgery, it came off and nobody noticed. The result was cardiac arrest, brain damage, multiple organ failure, sepsis and death.
Monitoring hardware is very sensitive so it will fire off alarms if anything changes, no matter how small. The more sensitive a test is, the more false positives you get. This is extremely demanding of a health care professional's attention, which in practice is multiplexed between countless patients.
Up to 99% of these alarms and messages will do nothing but get in the way of people. These represent false positives, disconnected cables, and other minor failures that don't represent a real danger and can be easily fixed. People will get used to the alarms, and will learn to ignore them.
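The base-rate arithmetic behind this is easy to check. The Python sketch below applies Bayes' rule to an alarm; the specific numbers (99% sensitivity, 90% specificity, a real problem present in 1 out of 1000 monitoring intervals) are assumptions I picked for illustration, not measurements.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Probability that an alarm corresponds to a real event (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed numbers, for illustration only.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.90, prevalence=0.001)
```

With those assumed numbers `ppv` comes out at roughly 1%, meaning about 99 out of 100 alarms are false even though the monitor itself almost never misses a real event.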
Perhaps, but your example is not supporting evidence. The PACU alarms were muted, precisely because they were so hard to ignore.
> Monitoring hardware is very sensitive so it will fire off alarms if anything changes, no matter how small. The more sensitive a test is, the more false positives you get.
This is fixable. The problem is the same as the one in the Therac-25 case ... severe and inconsequential alerts are indistinguishable.
Here's a particularly enjoyable piece of literature crafted around an instance of alarm fatigue: https://gutenberg.ca/ebooks/smithcordwainer-deadladyofclownt...
Raise your hand if you've made this type of mistake many times in the past. Most of us have the luxury of not having our software bugs affect human lives.
It's a shame they've removed the hardware safety controls. I don't think I'd even feel comfortable programming such a powerful tool without such circuit breakers.