
No, it indicates that the problem domain is sufficiently dangerous that the risk of leaving a known error in place must be balanced against the risk of a fix introducing a different, unknown error. There were ~500,000 787 flights in 2017 with an average of ~200 people per flight. The 737-MAX problems resulted in 385 fatalities, roughly two full crashes' worth, so if a fix had a 1 in 250,000 per-flight chance of causing a different error that could result in a fatal crash, it would be worse than the 737-MAX problems. Do you have confidence that the systems you have worked on have processes in place to guarantee less than a 1 in 250,000 chance that a fix would cause another error? If not, are you aware of any organization whose development practices you have first-hand knowledge of that you are confident could give such a guarantee? That is the risk analysis that must be done when making a fix.
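
A rough Python sketch of that break-even arithmetic (the flight count, passenger count, and fatality figures are just the estimates above, not official statistics):

  # Back-of-the-envelope: how rare must a fix-induced fatal failure be
  # before it is no worse than the 737-MAX outcome?
  flights_per_year = 500_000        # approx. 787 flights in 2017
  passengers_per_flight = 200       # approx. average load
  max_fatalities = 385              # 737-MAX total across both crashes

  # 385 fatalities at ~200 people per flight is roughly two fatal crashes.
  equivalent_crashes = max_fatalities / passengers_per_flight   # ~1.9

  # Per-flight failure probability that would produce the same toll over
  # one year of 787 flights (rounded to 1 in 250,000 above).
  threshold = equivalent_crashes / flights_per_year
  print(f"break-even per-flight risk: 1 in {1 / threshold:,.0f}")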

To be fair, this somewhat overstates the requirements, since not all systems are critical and not all errors cause critical problems. In addition, the risk must be balanced against the alternative, in this case the risk of relying on a reboot being done every 51 days, so you would need to analyze the failure probability and possible consequences of the status quo and compare that against the possible failure modes of a software fix.

As an addendum to the risk analysis, the above analysis covered only one year of errors, on a per-flight basis. If you expect the 787 to fly for ~30 years, then the fix must not cause two crashes over those 30 years, which works out to a 1 in 7,500,000 per-flight chance. The average flight is ~5,000 km, which is ~4-5 hours per flight, for a total flight time of ~60,000,000 hours. A plane takes ~3 minutes to fall from cruising altitude, so the fleet-wide downtime budget is 6 minutes per 60,000,000 hours, a 1 in 600,000,000 downtime fraction. That is 99.9999998% uptime: 8 9s, 6,000x the holy grail of 5 9s availability in the cloud industry, and 60,000x the availability guaranteed by the AWS SLA (again, somewhat overstated, since you need correlated failures to amount to roughly 3 minutes of continuous failure, but that depends on an analysis of mean time between failures and mean time to resolution which I do not have access to).
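
The same arithmetic extended over a service life, again as a rough Python sketch (every number here is one of the estimates above):

  # Extend the estimate to a ~30-year service life.
  flights_per_year = 500_000
  years = 30
  total_flights = flights_per_year * years               # 15,000,000

  # Budget: no more than two fix-induced crashes over the whole period.
  per_flight_budget = 2 / total_flights                  # 1 in 7,500,000

  hours_per_flight = 4                                   # ~5,000 km at ~4-5 h
  total_hours = total_flights * hours_per_flight         # ~60,000,000 h

  # ~3 minutes to fall from cruising altitude, so two crashes correspond
  # to ~6 minutes of fleet-wide "downtime" over the whole lifetime.
  downtime_fraction = (6 / 60) / total_hours              # 1 in 600,000,000
  print(f"per-flight budget : 1 in {total_flights / 2:,.0f}")
  print(f"downtime fraction : 1 in {1 / downtime_fraction:,.0f}")
  print(f"required uptime   : {(1 - downtime_fraction) * 100:.7f}%")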




> .. if a fix had a 1 in 250,000 chance of causing a different error that could result in a fatal crash then it would be worse than the 737-MAX problems.

That's an improper calculation. It's a slight nuance, but it makes orders of magnitude of difference. A better formulation is:

If a fix had a 1 in 250,000 chance of causing a different error that would result in a fatal crash then it would be worse than the 737-MAX problems.

And the converse:

If a fix had a 1 in 250,000 chance of causing a different error that could result in a fatal crash then it could be worse than the 737-MAX problems.

The MAX's MCAS generated errors far more often than 1 in 250k, and a few of them resulted in crashes.


Yes. I was being slightly sloppy with my wording; I meant "would" and "would".


Well, what I mean is that perhaps such critical software should be carefully rewritten to rigorous architectural and QA standards so it doesn't have to rely on an ugly hack in the first place.


Rewriting software is not only costly and subject to breakage, but for some of these systems requires an absolutely monumental FAA recertification process.

The cost of recertifying software shouldn't be the sole reason not to rewrite something, but I imagine you've been part of a rewrite where not quite everything worked as intended, even with a lot of tests.

"The devil you know" very much applies to software that controls such life-critical functions and flying airplanes. If a pilot knows how to work around something, introducing something they may not know how to work around could be the difference between life and death.

Why did two 737-MAXes crash and the fleet get grounded? New systems were introduced that pilots didn't know how to address, and seemingly could not work around, despite the engineers who designed them not wanting that outcome.

A rewrite, even with the most rigorous architectural and QA standards, is not a panacea.


I agree, critical software should be written to a high quality standard. In fact, I will take it one step further and say that critical software must be written to an OBJECTIVE quality level that is sufficient for the problem at hand. If that level cannot be reached, where we are confident that the risk is mitigated to the desired degree, then the software should not be accepted, no matter how hard they tried or how much they adhered to "best practices". We do not let people build bridges just because they tried hard. They have to demonstrate that the bridge is safe, and, if nobody knows how to build a safe bridge in a given situation, then the bridge is NOT BUILT.

To then circle back to airplane software, the standards of original development are even higher than the one I stated above. There are ~10,000,000 flights in the US per year according to the FAA, and for at least the 10 years before the 737-MAX problems (I believe closer to 20), software had not been implicated in a single passenger air fatality. That means that in over 100,000,000 flights there were only two fatal crashes due to software (not even in the US, so we would actually need to include global flight data, but I will not bother with that since I am unaware of the count of software-related fatalities in other countries) for a total fleet-wide failure rate of 1 in 50,000,000, 7 9s on a per-flight basis. If we use the per-time basis I used above, there are ~25,000,000 flight-hours per year, so over 10 years that is 6 minutes in 250,000,000 hours, a 1 in 2,500,000,000 downtime fraction, or 99.99999996% uptime: 9 9s, 25,000x gold-standard server availability, 250,000x the availability guaranteed by the AWS SLA. Also note that with servers we can use independent replicas to gain redundancy, allowing failure probabilities to multiply (a 1 in 100 failure rate for each server means the chance of both failing at the same time, assuming independence, is 1 in 10,000), but the same does not apply to airplanes, since every airplane must succeed on its own.
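
For completeness, the same back-of-the-envelope style applied to that fleet-wide record in Python (approximate FAA figures, and the usual idealized independence assumption for the server example):

  # Fleet-wide record for the ~10 years before the 737-MAX accidents.
  flights_per_year = 10_000_000        # approx. US flights per year (FAA)
  years = 10
  total_flights = flights_per_year * years               # 100,000,000
  software_crashes = 2                                   # the two MAX accidents

  per_flight_rate = software_crashes / total_flights     # 1 in 50,000,000

  flight_hours_per_year = 25_000_000
  total_hours = flight_hours_per_year * years             # 250,000,000 h
  downtime_fraction = (6 / 60) / total_hours              # 1 in 2,500,000,000
  print(f"per-flight rate : 1 in {1 / per_flight_rate:,.0f}")
  print(f"uptime          : {(1 - downtime_fraction) * 100:.8f}%")

  # Servers gain availability from independent replicas: two servers that
  # each fail 1 time in 100 fail together (assuming independence) only
  # 1 time in 10,000. Airplanes get no such benefit; every airframe must
  # succeed on its own.
  p_single = 1 / 100
  print(f"replicated pair : 1 in {1 / p_single ** 2:,.0f}")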

The thing to understand is that the software problems we are seeing are not necessarily an indication that their standards are lower than those prevailing in the software industry and that they should adopt its practices. It could be, and likely is, that the OBJECTIVE quality level we require is extremely high and they have not been able to achieve it of late. This obviously does not excuse their problems since, as I stated above, they must reach the OBJECTIVE quality level we require; it is just an observation that maybe it is not because they are incompetent, maybe they are really, really good and the problem is just really, really, really hard.



