> Like removing sensors because of a supplier/cost issue
There was more than one AOA sensor. Not hooking the other one up to the MCAS system could hardly be a cost issue. Nor is it an issue of software vs. hardware. The fact that it was implemented in software had nothing to do with MCAS's problems. The software was not buggy, nor was it a workaround. What was wrong was the specification of how the software should work.
I was bringing the context back to the article of this thread, not talking about Boeing there. Sorry if that led to confusion.
You can note that I acknowledged there were multiple AOA sensors in other replies. Further, they were already “hooked up” to MCAS, but Boeing made this safety critical redundancy requirement an option within the software. That’s bad practice, full stop.
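To make concrete what "making a safety-critical redundancy requirement an option within the software" can look like, here is a minimal, purely hypothetical Python sketch; the function name, the flag, and the threshold are my own inventions for illustration and are not taken from Boeing's actual design:

```python
# Hypothetical illustration only -- not Boeing's code. It shows how a
# safety-critical cross-check can quietly become "optional" when it is
# gated behind a software configuration flag.

DISAGREE_THRESHOLD_DEG = 5.5  # assumed threshold, for illustration only


def select_aoa(left_aoa_deg: float, right_aoa_deg: float,
               cross_check_enabled: bool) -> tuple[float, bool]:
    """Return the AOA value fed downstream and whether a disagreement was flagged.

    If the optional cross-check is disabled, the function silently trusts a
    single sensor: the redundancy exists in hardware but is bypassed in software.
    """
    if cross_check_enabled:
        disagree = abs(left_aoa_deg - right_aoa_deg) > DISAGREE_THRESHOLD_DEG
        # A safer spec would refuse to act on AOA at all when the sensors
        # disagree, rather than merely flagging it.
        return left_aoa_deg, disagree
    # Cross-check "optioned out": single-sensor data flows through unchecked.
    return left_aoa_deg, False


if __name__ == "__main__":
    # One vane has failed high: 25.0 deg vs 2.0 deg on the other side.
    print(select_aoa(25.0, 2.0, cross_check_enabled=True))   # (25.0, True)
    print(select_aoa(25.0, 2.0, cross_check_enabled=False))  # (25.0, False)
```

The second sensor is physically present either way; whether it actually protects anything is decided by a configuration flag buried in software, which is exactly the practice being objected to here.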
I still maintain that the software was 100% a mitigation for a hardware change, and I think there's plenty of other evidence supporting that. E.g., had they not updated their engines, would MCAS have been installed? If the answer is no, then it was a mitigation for a risk introduced by a hardware change.
I’ll say it one more time just to be clear: I’m not saying the concept was bad. I’m saying their design philosophy and implementation were bad. They could have used software mitigation within the right process/philosophical framework just fine. What doesn’t work is using software as an “easy” risk mitigation strategy when you don’t understand the risk or the processes necessary to fully mitigate it. The problem is that software is a seductive fix: it looks relatively easy and cheap on the surface, but if your design philosophy and processes aren’t equipped to implement it effectively, that “easy” fix is rolling the dice.
> it was a mitigation to a risk introduced by a hardware change
It wasn't really a risk. It was to make it behave the same.
Allow me to explain something. A jet airliner is full of hardware and software adjustments to the flying characteristics. For example, look at the wing. What do you think the flaps and slats are for? They are to completely change the shape of the wing, because a low speed wing is very very different from a high speed wing. There are also systems to prevent asymmetric flaps, as that would tear the wings off.
The very existence of the stab trim is to adjust the flying characteristics. The stab trim has an automatic travel limiter to constrain the travel as the speed increases because, you guessed it, full travel at high speed will rip the tail off.
The control columns are connected to a "feel computer" which pushes back on the stick to make the airplane feel consistent from low speed to high speed. This feel computer can be mechanical or software. Pilots fly by the force feedback on the stick, not its travel. At high speed the airplane is, aerodynamically, a completely different airplane; without the feel computer masking that, pilots would promptly rip the airplane apart.
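For readers who haven't run into the idea, here is a toy Python sketch of the kind of speed schedule such a feel system implements, whether in hydraulics or in code; the gain, density, and force-limit numbers are invented for illustration and are not specific to any aircraft type. The same speed-scheduling idea applies to the stab trim travel limiter mentioned above.

```python
# Toy sketch of a "feel" schedule: stick force grows with dynamic pressure so
# the airplane feels consistent to the pilot across the speed range. The gain
# and limit below are invented and not specific to any aircraft type.

RHO_SEA_LEVEL = 1.225  # kg/m^3, standard-day air density


def feel_force_newtons(stick_deflection: float, airspeed_ms: float,
                       gain: float = 0.05, max_force: float = 350.0) -> float:
    """Force pushed back on the column for a given deflection and airspeed.

    stick_deflection: normalized column travel, -1.0 .. +1.0
    airspeed_ms: airspeed in m/s
    """
    q = 0.5 * RHO_SEA_LEVEL * airspeed_ms ** 2     # dynamic pressure, Pa
    force = gain * q * stick_deflection            # stick gets stiffer when fast
    return max(-max_force, min(max_force, force))  # artificial-feel force limit


if __name__ == "__main__":
    # The same half-deflection pull: roughly 75 N at 70 m/s, but clipped at the
    # 350 N feel limit at 250 m/s, discouraging inputs that would overstress
    # the airframe.
    print(feel_force_newtons(0.5, 70.0))
    print(feel_force_newtons(0.5, 250.0))
```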
There are plenty more of these. The MCAS concept is no different in any substantive way.
Your thesis that using software to run it poses some unusual risk is simply dead wrong. What was wrong with the MCAS system was:
1. reliance on only one sensor
2. too much travel authority
3. it should have shut itself off if the pilot countermanded it
What was also wrong was:
a. Pilots did not use the stab trim cutoff switch like they were trained to
b. The EA pilots did not follow the Emergency Airworthiness Directive sent to them, which described the two-step process to counter MCAS runaway
There weren't any software bugs in MCAS. The software was implemented according to the specification. The specification for it was wrong, in points 1..3 above.
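To pin down what "the specification was wrong" means in practice, here is a hypothetical Python sketch of an activation gate that addresses points 1..3; every name, threshold, and limit is invented purely for illustration, and the real MCAS logic (and its post-grounding revision) is considerably more involved:

```python
# Hypothetical sketch, not actual MCAS logic: an activation gate that addresses
# the three specification flaws listed above. All thresholds are invented.

DISAGREE_LIMIT_DEG = 5.0       # (1) assumed sensor cross-check threshold
MAX_TOTAL_NOSE_DOWN_DEG = 2.5  # (2) assumed cap on cumulative trim authority
AOA_TRIGGER_DEG = 14.0         # assumed activation threshold


def mcas_like_command_deg(left_aoa: float, right_aoa: float,
                          commanded_so_far: float,
                          pilot_trim_opposing: bool,
                          increment: float = 0.6) -> float:
    """Additional nose-down stabilizer trim to command, in degrees."""
    # (1) Require both AOA sources to agree before acting at all.
    if abs(left_aoa - right_aoa) > DISAGREE_LIMIT_DEG:
        return 0.0
    # (3) Yield to the pilot; a real design would latch this and stand down.
    if pilot_trim_opposing:
        return 0.0
    aoa = (left_aoa + right_aoa) / 2.0
    if aoa < AOA_TRIGGER_DEG:
        return 0.0
    # (2) Never exceed a bounded total travel authority.
    remaining = MAX_TOTAL_NOSE_DOWN_DEG - commanded_so_far
    return max(0.0, min(increment, remaining))
```

The point is that the change is in when the function is allowed to act at all, not in how faithfully the code implements whatever it was told to do.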
P.S. Mechanical/hydraulic computers have their own problems: component wear, dirt, water getting in and freezing, jamming, poor maintenance, temperature effects on behavior, vibration, leaks, etc. Software does not have those problems. The rudder PCU valve on the 737 had a very weird hardover problem that took years to figure out. It turned out to be caused by thermal shock.
At various times in my past I've been a private pilot, airframe mechanic, flight-control-computer engineer, aerospace test & evaluation software quality engineer, and aerospace software safety manager. I've even worked with Boeing. So I am quite familiar with these concepts.
>It wasn't really a risk. It was to make it behave the same.
Hard disagree here. The fact that it did not behave the same, and that this led to mishaps, shows there was a real risk. That risk could have been mitigated in various ways (e.g., engineering controls via hardware or software, administrative controls via training, etc.), but it wasn't. By downplaying the risk as not credible, you're making the same mistake.
>There weren't any software bugs in MCAS. The software was implemented according to the specification.
I'm not claiming there were bugs. That seems to be a mischaracterization of how software fails: there are more failure modes than just "bugs". Software can be built to spec and still be wrong. This is the difference between verification and validation. Verification means "you built it right" (i.e., it meets the spec), while validation means "you built the right thing" (i.e., it does what we want). You need both, and in this instance there's a strong case they didn't "build the right thing" because their perspective was wrong.
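A contrived Python illustration of that distinction, with made-up function names and thresholds: the code below is verifiably correct against its (flawed) spec, yet a validation review against the real-world need would still reject it.

```python
# Contrived example of verification vs. validation. The (flawed) "spec" says:
# act on sensor_a alone. The code meets that spec, so verification passes,
# but the spec ignores the redundant sensor_b, so the system still does the
# wrong thing and validation fails.

def should_activate(sensor_a: float, threshold: float) -> bool:
    """Flawed spec: activate whenever sensor_a exceeds the threshold."""
    return sensor_a > threshold


def test_verification_meets_spec():
    # "Did we build it right?" Yes: the behavior matches the written spec.
    assert should_activate(15.0, 14.0) is True
    assert should_activate(10.0, 14.0) is False


def test_validation_builds_the_right_thing():
    # "Did we build the right thing?" No: a failed sensor_a reading of 70 deg
    # should be rejected because sensor_b reads 2 deg, but the spec never
    # asked for a cross-check, so the system activates anyway.
    sensor_a, sensor_b = 70.0, 2.0
    assert abs(sensor_a - sensor_b) > 10.0          # the two sensors clearly disagree
    assert should_activate(sensor_a, 14.0) is True  # meets the spec, wrong outcome
```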
>Your thesis that using software to run it poses some unusual risk is simply dead wrong.
My thesis is that they didn't know how to effectively characterize the software risk because, as you point out, software risks are different from the risks of mechanical failure. Software doesn't wear out or display time-variant hazard rates like mechanical systems. Rather, it incurs "interaction failures." The prevalence of these failures tends to grow exponentially as the number of systems the software touches increases. It's a network effect of using software to control and coordinate more and more processes, and it's distinct from failures caused by buggy software. That's why we need to shift our thinking away from the mechanical-reliability paradigm when dealing with software risk; Nancy Leveson has some very accessible write-ups on this idea. There's nothing wrong with using software to mitigate risk, as long as you're actually characterizing that risk effectively. If I keep thinking about software reliability with the same hardware mentality you're displaying, I'll let all those risks fall through the cracks. They may have verified that the software met its specs, but I could also claim they didn't properly validate it, because you usually can't without understanding the larger systemic context in which it operates.
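One back-of-the-envelope way to see why those interaction risks pile up so quickly, using nothing more than counting (my own illustration, not Leveson's formalism):

```python
from math import comb

# Back-of-the-envelope intuition for "interaction failures": as software
# coordinates more subsystems, the number of ways those subsystems can
# interact -- and therefore interact wrongly -- explodes. Pairwise interfaces
# grow quadratically; interactions among any group of two or more components
# grow exponentially (2**n - n - 1 such groups for n components).

for n in (4, 8, 16, 32):
    pairwise = comb(n, 2)
    groups = 2 ** n - n - 1
    print(f"{n:>2} components: {pairwise:>4} pairwise interfaces, "
          f"{groups:>13,} multi-component interactions")
```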
So what does that mean in the context of Boeing and, in a broader sense, Tesla? Boeing did not capture these interaction risks because they had an overly simplified idea of the risk and its mitigations. They did not capture the total system interactions because they were myopically focused on the software/controls interface. They did not capture the software/sensor interface risk (even though their hazard analysis identified that risk and required redundant sensor input). They did not capture the software/human interface risk, which led to confusion in the cockpit. They thought it was a "simple fix". Tesla, likewise, is trying to mitigate one risk (supplier/cost risk) with software, and TFA seems to implicate them in not appropriately characterizing the new risks introduced by that approach. I'm saying that is the result of a faulty design philosophy that downplays (or is ignorant of) those risks.