Hacker News
Boeing's software fix for the 737 Max overwhelms the plane's computer (moonofalabama.org)
73 points by sra77 on July 8, 2019 | 70 comments



From the article

-----------

Boeing says that it can again fix the software to avoid the problem the FAA just found. It is doubtful that this will be possible. The software load is already right at the border, if not above the physical capabilities of the current flight control computers. The optimization potential of the software is likely minimal.

MCAS was a band aid. Due to the new engine position the 737 MAX version had changed its behavior compared to the older 737 types even though it still used the older types' certification. MCAS was supposed to correct that. The software fix for MCAS is another band aid on top of it. The fix for the software fix that Boeing now promises to solve the problem the FAA pilot found, is the third band aid over the same wound. It is doubtful that it will stop the bleeding.

The flight control computers the 737 MAX and NG use were developed in the early to mid 1990s. There are no off-the-shelf solutions for higher performance.

Boeing's latest announced time frame for bringing the grounded 737 MAX planes back into the air is "mid December". In view of this new problem one is inclined to ask "which year?"

-----------

Ouch...

Introducing a new CPU into an airplane like the 737 will require another whole series of tests and certifications.

And right about now, Boeing management is thinking we could use some of those senior engineers we laid off because we thought they were not needed as our products are now mature.


Damn, I hope those senior engineers charge Boeing an arm and a leg when they are asked to come back.


What's the over/under on the executives and senior management who made that decision getting fired for it?


Why should the executives suffer any repercussions?

It’s not like they wrote any of the doomed software or anything.

They desperately need their multi-million dollar bonuses so that they can afford to continue to live such lavish lifestyles.

What’s wrong with you people? You act like a few lost lives have some value or something.....

/s


I know military jets have unstable flight characteristics, which give them an advantage in maneuverability but require a working flight computer.

But I have problems accepting this fact for civilian planes, especially if the manufacturer just wanted to save development costs. Because you could simply design a more stable plane. Not trivial, but also not impossible.


Then you probably don't want to fly in a T-tail jet. [0] [1] Or a helicopter [2]. Or in a jetliner at cruise. [3] [4]

The fact is, every single airplane ever built is unstable in the wrong circumstances. They stall, they spin, they lose lift, they suffer mechanical failures. Aeronautical engineering, rigorous maintenance, and thorough pilot training have resulted in the safest transportation system that has ever existed, but the only completely stable aircraft is one that's sitting in a hangar.

[0] https://en.wikipedia.org/wiki/T-tail

[1] https://aviation.stackexchange.com/questions/1400/how-do-con...

[2] https://aviation.stackexchange.com/questions/35764/are-helic...

[3] https://en.wikipedia.org/wiki/Coffin_corner_(aerodynamics)

[4] https://www.faa.gov/regulations_policies/advisory_circulars/...


By stable I meant gliding properties. If the center of mass is moved too close to the center of lift, planes tend to pitch up and then are in danger of stalling. Some people suggested the heavy engines might be responsible for this, but I believe this was misinformation.


Yep. Military jets also crash a lot.


I mean it would just be an additional layer of safety. Didn't find my ejection seat on the last flight I took.

Another commenter actually said that it isn't really true that the plane is inherently unstable. If so, the error was probably with regulations about classification and Boeing wanting to take a shortcut.


Does this mean that one of the reasons MCAS relied on a single sensor could be the already saturated CPU? Mindblowing.


I thought the extra sensor was an optional upgrade?...


No, that was the AoA indicator in the cockpit. The second AoA sensor was always there, MCAS just didn't consult it.
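For illustration, the difference between consulting one vane and cross-checking two can be sketched as below. This is not Boeing's code: the function names, the 5.5 degree disagreement threshold, and the 12 degree activation angle are all assumptions picked for the example.

```python
# Hypothetical thresholds, chosen only for illustration.
MAX_DISAGREEMENT_DEG = 5.5
ACTIVATION_AOA_DEG = 12.0


def single_sensor_allowed(aoa_deg: float) -> bool:
    """Single-sensor variant: one faulty vane is enough to trigger."""
    return aoa_deg >= ACTIVATION_AOA_DEG


def cross_checked_allowed(aoa_left_deg: float, aoa_right_deg: float) -> bool:
    """Two-sensor variant: act only if both vanes agree and both read high."""
    if abs(aoa_left_deg - aoa_right_deg) > MAX_DISAGREEMENT_DEG:
        # The vanes disagree, so the data is not trusted and nothing is commanded.
        return False
    return min(aoa_left_deg, aoa_right_deg) >= ACTIVATION_AOA_DEG
```

With a stuck left vane reading 74.5 degrees and a healthy right vane reading 15.3 degrees, `single_sensor_allowed(74.5)` triggers while `cross_checked_allowed(74.5, 15.3)` stays passive.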


Simple question: Why doesn't Boeing upgrade to i386 for their new planes? I heard it's available for aerospace applications.


Simple speculation: The software makes heavy use of 80286 protected mode and its quirks, which in an embedded hi-rel environment is fine. An aerospace 80386 doesn't necessarily have higher throughput (remember, these are in-order-execution architectures) if run at the same clock rate. And you probably can't simply increase the clock rate, since some timing may depend on the CPU clock, due to busy-loop waiting, or some sensor readout/communication using the CPU clock for protocol timing.
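The busy-loop point can be made concrete with a little arithmetic. A minimal sketch, assuming a delay loop with a fixed iteration count; the iteration count, cycles per iteration, and clock rates are hypothetical numbers, not anything known about the actual flight control computer.

```python
def busy_loop_delay_s(iterations: int, cycles_per_iteration: int, clock_hz: float) -> float:
    """Wall-clock time spent in a delay loop with a fixed iteration count."""
    return iterations * cycles_per_iteration / clock_hz


# Hypothetical loop, calibrated to wait 1 ms on a 12.5 MHz part.
ITERATIONS = 1250
CYCLES_PER_ITERATION = 10

delay_original_s = busy_loop_delay_s(ITERATIONS, CYCLES_PER_ITERATION, 12.5e6)
delay_faster_s = busy_loop_delay_s(ITERATIONS, CYCLES_PER_ITERATION, 25e6)

# Doubling the clock halves the delay, so any protocol timing that
# relied on the 1 ms wait would now run at twice the intended rate.
print(delay_original_s, delay_faster_s)
```

Code that derives its timing from an independent timer rather than from instruction counts would not have this problem, but that is a property of how the software was written, not of the CPU.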


IIRC there were huge swathes of 'gotchas' with running 286 pmode code in 386 pmode, to the point where lots of code simply wouldn't work at all.

(One thing that springs to mind was that the way addresses running past the end of a segment worked was different)


The cost of a batch of CPUs of 286 complexity on a more modern process can't be that high compared to the development of a new aircraft (which the MAX kind of is).


I don’t have evidence, but most likely, it is to avoid the related costs and risks of certification of this « new » CPU for its intended usage. Not only the CPU but all the software stack needs to be re-proven safe, including compilers and test tools.

Note: this is not necessarily a stupid decision.


A simpler solution would be to use the same CPU done with a slightly newer process to allow a higher clock rate. That would ease the recertification process.

If you think they are at best 40 MHz 286 parts, the best that was available by then, running software derived from previous releases that ran on slower chips, there is plenty of room for improvement.

IIRC, you can get a modern 65816 (the 16-bit descendant of the 6502) in 300+ MHz. And I wouldn't expect it to be on a cutting edge process.

That considered, it's entirely possible most of the time is being consumed by reading the sensors. In that case, the sensor communication would need to be upgraded and recertified.


Yes, I'm familiar with second-source 6502. Also, I've seen the Rabbit2000 processor, it's basically a Z80 running at 50 MHz. But I wasn't aware that 80286 has any modern second-source clones. Are there?


I don't know, but it wouldn't be too hard. Or expensive, in relation to bleeding edge semiconductors.

Fun thing: I remember a 40 MHz 286 desktop at one point. Wikipedia tells me they only went to 25 MHz. That's some overclocking...

Fun fact #2: with a feature size of 1.5 µm, one could fit an entire 6502 done in 7 nm process on top of a single 80286 transistor.
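A rough check of that fun fact, under loud assumptions: the 6502 transistor count used here (about 3,510) is a commonly cited figure, "7 nm" is treated as a literal feature size even though it is really a node name, and area is assumed to scale with the square of the feature size.

```python
FEATURE_286_M = 1.5e-6      # 1.5 µm, from the comment above
FEATURE_MODERN_M = 7e-9     # "7 nm", treated as a literal length (assumption)
TRANSISTORS_6502 = 3510     # commonly cited count (assumption)

# Linear shrink factor, then area factor assuming area ~ (feature size)^2.
linear_ratio = FEATURE_286_M / FEATURE_MODERN_M
area_ratio = linear_ratio ** 2

# How many small transistors fit in the footprint of one large one,
# and whether that is enough for a whole 6502.
fits = area_ratio >= TRANSISTORS_6502
print(round(linear_ratio), round(area_ratio), fits)
```

The linear ratio comes out at roughly 214 and the area ratio at roughly 46,000, so under these simplifications the claim holds with about an order of magnitude to spare.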


Forget about upgrading - they could just press the Turbo button on that 286. That's what it's there for!


On my boxes, pressing the Turbo button downclocked them to make Zaxxon playable.


Thanks for making me chuckle early in the morning.


That's a great comment!


With that comes a large amount of verification scope and effort. It is much easier to verify inside and out a small, non complex design running on an incredibly well understood simple platform.


This smells like more lives are about to be lost if Boeing doesn't get their act together


No.

A few executives might shave a fraction of a penny off their multi-million dollar annual bonuses. And a lot of lower level workers are going to lose their jobs.

But no more lives will be lost if Boeing fails to consolidate their defecation on this matter.


Could this be considered unacknowledged technical debt?


Maybe we're just ignorant of the available chips for this use case. Otherwise it just reads like they've been kicking the can of system redesign/refactoring down the road for decades. It appears that short-term thinking abounds in every industry.


I'm a developer with extensive experience in SIL-4 programming, specifically for life-critical systems (rail transport). I've built software subsystems that are used in 38 countries around the world to keep the trains safe.

The issue is the extensive costs of changing from one CPU type to another. Certification for these systems is a multi-year process and can cost millions of dollars even before any kind of success is guaranteed.

There are still very old CPU's out there, functioning just fine for decades. Safety programming doesn't just swap out parts like consumer computing does - it takes a lot of work to change CPU's.

I don't think Boeing has been kicking the can down the road on the upgrade. I do think they've been trying to cut costs and exploit their customers by offering extra safety features as upgrades, rather than making them standard. It's interesting to note that some of their customers don't require such stringent safety features in the regions they operate - i.e. this is as much of a legislative issue as anything else. It could very well be that Ethiopia doesn't have the same safety requirements encoded in its laws governing flight as France does, so Boeing offers different features not just according to budget but also legislation - although we are sure to see that change rapidly now.


I think if Boeing (or Airbus - what do they use? Faster chips or better algorithms or ...?) really wanted that extra processing power, they would've pushed to get the next chip in line to be certified for aerospace applications.

The fact that they neglected to do this... I don't know, public perception and Boeing's own representation seemed to paint a picture of "computer" and "chip"-aversion up to the point when they no longer could ignore the issue by designing a plane that needed to have it in order to be certified the way they wanted it to. And by then it was of course already way too late to certify another chip...

Then again, on the other hand: Boeing designs a lot of military planes, too, aren't these almost always exclusively fly-by-wire? Shouldn't they have the know-how in these things as well? Or is there a no information-exchange policy in effect between their military and civilian teams?


Something like that, thinking about it as a kind of senescence, arriving at a point where they realize that the system is depleted. It needs to be planned for.


Definitely - it's important to keep in mind that from an engineering perspective, the 737 has probably already reached EOL, and Boeing has been considering developing a replacement for years (Boeing Y1 - https://en.wikipedia.org/wiki/Boeing_Yellowstone_Project), but that was delayed until Airbus forced their hand with the A320 neo, so they decided to offer what now looks like a "quick & dirty" upgrade of the 737 to keep it competitive. So upgrading the flight computers was probably out of the question with the given development time.


As has been discussed on HN, this whole 737 MAX flaw is not a software problem. Boeing wants you to believe it is. The flaw is actually a physical design problem of the aircraft. Even if the software were to work perfectly, the plane is still flawed. The only real fix is not to fly the 737 MAX.


This is entirely false. Stop spreading disinformation.

Myth: the 737 MAX 8 is not inherently stable, has relaxed stability, etc. Fact: it's very much inherently stable.

Myth: the 737 MAX 8 is easier to stall than other planes. Fact: no it isn't.

The pitch-up characteristic of the MAX 8 is less strong than of e.g. the 757 and that plane flies just fine.

The actual problem with the MAX 8 is that Boeing added MCAS to allow it to share a type rating with the rest of the 737 family (allowing existing 737 pilots to fly the MAX 8 without additional training), and they fucked up MCAS. There's a number of solutions on the table, including removing rather than fixing MCAS and giving up the 737 type rating.

I am continuously astounded that even on HN people are focusing on news cycle bullshit about inherent instability instead of the actual issues with Boeing/FAA that caused this situation.

Evidence so far suggests that MCAS was originally a non-critical system that was found to be too weak during flight testing, and given significantly more pitch authority. For whatever reasons, this didn't trigger the reclassification of MCAS as a critical system and it all went downhill from there.

Here's a pair of sources slightly more credible than the bullshit news cycle:

https://www.youtube.com/channel/UCphqjYZxxzjNbONVmY-0J7Q

https://www.youtube.com/channel/UCwpHKudUkP5tNgmMdexB3ow


If a software system has to be designed to make up for aircraft design problems, namely the too large engines that break the limits for ground clearance in the normal 737NG config, then the mentioned issues are not "entirely false" and "disinformation".

Boeing needed to counter the A320neo and they needed to take shortcuts that ended up killing people. I wouldn't be surprised if this were management failure and the actual engineers at Boeing have always been throwing around copious WTFs when building the 737 MAX 8.

"Hey engineers, improve the 737 to be X% more fuel efficient and make sure it doesn't need to be reclassified as new type. Have fun!"


> they needed to take shortcuts that ended up killing people.

No. They chose to take shortcuts.


I stand corrected.


> I am continuously astounded that even on HN people are focusing on news cycle bullshit about inherent instability instead of the actual issues with Boeing/FAA that caused this situation.

How is it surprising that people are sceptical of claimed native flight characteristics after the way Boeing has handled communications after the crashes? I don't think that there are any independent facts around, even airline pilots who had been flying MAX from day one would not have much (if any) experience at the edges of the flight envelope. One data point we do know is that some people at Boeing were apparently very concerned about MCAS ever being off.


Skeptical, absolutely! But it doesn't help to make up new facts.


The engines are too BIG for the airframe! Why don't people get it?


> The pitch-up characteristic of the MAX 8 is less strong than of e.g. the 757 and that plane flies just fine.

The 757 ( and 737NG etc ) has constant pitch rate. The Max does not, which is the root of the problem.

The actual forces involved aren't pertinent.


Ironically, I believe that your information is also inaccurate. MCAS is required in order to meet federal airworthiness requirements. Without it, in certain flight conditions the back-pressure on the control yoke gets less as the yoke is pulled further back. It's like over-steer in a car, and is simply not allowed. Yes, the plane is dynamically stable; if you leave the yoke in one position it won't pitch up even more. However, the forces must be corrected somehow, regardless of type rating concerns.

"The 737 MAX was a bit too easy to pull into a stall when flying with high AoA and making abrupt maneuvers. The larger engines for the MAX hung further forward from the wing, added a destabilizing aerodynamic area ahead of the center of gravity, destabilizing the pitch moment curve at high AoA.

Boeing and the certification authority, FAA, decided added margins was called for. Boeing added a pitch augmentation at high AoA called Maneuvering Characteristics Augmentation System, MCAS.

The aircraft should trim nose down to increase the stick force needed once it passed into the light grey area where the base aircraft had a region of less stability. Before the augmentation, the pilot felt if the aircraft wanted to fly into the stall, it got easier to increase the AoA after 12°AoA. With the augmentation the felt extra force was the same for the first and last part of the curve before the maximum lift was achieved at stall (and stall warning kicked in)." [0]

The manner of the fix (MCAS transparently pushing the nose down) was designed to avoid pilot retraining and thus keep the same type rating.

Edit: The fact that the 737-Max needs a handling tweak is not a failure. Modern planes have all kinds of these tweaks, whether aerodynamic (such as strakes), mechanical (stick shakers) or enabled in software. As the cited article continues: "So far so good. It's common an aircraft’s flight control system has fixes to stability margin changes in different parts of the flight envelope." The problem is that Boeing had a pretty severe collapse of its systems engineering regime.

"The implementation for the 737 MAX had two problems, however:

- The fault checking of the triggering AoA signal was not rigorous enough. This problem has been discussed a lot. No need to add anything.

- The judgment the pilots would identify a problem with the augmentation as a trim runaway and shut the trim off was wrong. Why the pilots didn’t see MCAS rogue actions as a trim runaway is poorly understood."

(The article was published in February. Since then lots of information has come to light about how MCAS determinedly fought correction, and the huge mental and physical loads imposed on the pilots.)

Edit 2: FAA regulation mandating increasing elevator forces for all transport aircraft: FAR §25.253 High-speed characteristics, (a) Speed increase and recovery characteristics, (3):

With the airplane trimmed at any speed up to VMO/MMO [maximum operating airspeed], there must be no reversal of the response to control input about any axis at any speed up to VDF/MDF [maximum airspeed demonstrated in testing]. Any tendency to pitch, roll, or yaw must be mild and readily controllable, using normal piloting techniques. When the airplane is trimmed at VMO/MMO, the slope of the elevator control force versus speed curve need not be stable at speeds greater than VFC/MFC [maximum control airspeed], but there must be a push force at all speeds up to VDF/MDF and there must be no sudden or excessive reduction of elevator control force as VDF/MDF is reached. [1]

[0] https://leehamnews.com/2019/02/08/bjorns-corner-pitch-stabil...

[1] https://www.ecfr.gov/cgi-bin/text-idx?node=14:1.0.1.3.11#se1...


"MCAS is required in order to meet federal airworthiness requirements"

Cite this. Specifically this. The rest of your comment agrees with mine without this being true, and I have not seen any evidence of this being true.


Hmm. OK, better wording would be "Handling mitigation such as MCAS is required in order...". I've seen several news articles, such as the one cited, which reported that the yoke forces decreased near stall AoA in certain flight regimes. Here's one from the NYT: [0] Originally MCAS was implemented for rather extreme maneuvers, and required both the AoA sensor and a G-force sensor to agree. Later, it was discovered that low-speed stalls also had yoke-force problems, and the control authority was increased, and the G-force requirement dropped.

[0] https://www.nytimes.com/2019/06/01/business/boeing-737-max-c...


And as stated in countless well cited articles, this was done in order to maintain the 737 type rating and make the MAX 8 handle like other 737. Not to meet regulations that were otherwise unmet.


I don’t think this is correct.

Your reference suggests that this tweak is to match the expectations of pilots certified to fly 737s, which handle in a certain way.


See edit 2, citing regulation which must be met by all transport aircraft. I think also §25.255 Out-of-trim characteristics, (b) (1) The stick force vs. g curve must have a positive slope at any speed up to and including VFC/MFC


I don’t see how the FAA's assessment that the 737 MAX was easier to put into a stall for greater than 12 degrees AoA is the same as a “reversal of controls” or a “sudden or excessive reduction of elevator control” stipulated in 25.253.

What am I missing?

I am not an aviation expert in any sense.


"Reversal of control force", not "reversal of controls". There's also section 25.175, which stipulates "The stick force curve must have a stable slope at speeds"... and gives speed ranges for climb, cruise, approach, and landing. For example:

§25.175 Demonstration of static longitudinal stability.

Static longitudinal stability must be shown as follows:

(a) Climb. The stick force curve must have a stable slope at speeds between 85 and 115 percent of the speed at which the airplane—

(1) Is trimmed, with—

(i) Wing flaps retracted;

(ii) Landing gear retracted;

(iii) Maximum takeoff weight; and

(iv) 75 percent of maximum continuous power for reciprocating engines or the maximum power or thrust selected by the applicant as an operating limitation for use during climb for turbine engines; and

(2) Is trimmed at the speed for best rate-of-climb except that the speed need not be less than 1.3 VSR

These are transports, not fighters. The basic idea is that you want it to be harder work to make a sharper maneuver, and want the aircraft to naturally level out. If the force required decreases for a steeper pitch/roll/yaw, then the plane will naturally want to intensify the maneuver. It's like a car with oversteer--let go of the steering wheel and it will make a sharper turn. As the article cited said, the FAA and Boeing's test pilots weren't happy with the yoke forces in certain situations.
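The "stable slope" idea can be illustrated with a toy check. This is only a sketch of the concept, not the regulatory test method: the function name and the force samples are hypothetical, and the samples are simply assumed to be ordered from gentlest to sharpest maneuver.

```python
from typing import Sequence


def has_stable_slope(stick_forces: Sequence[float]) -> bool:
    """True if every step to a sharper maneuver needs strictly more force."""
    return all(
        later > earlier
        for earlier, later in zip(stick_forces, stick_forces[1:])
    )


# Hypothetical yoke forces (arbitrary units), ordered by increasing
# maneuver severity.
stable_curve = [10.0, 14.0, 18.0, 22.0, 26.0]
lightening_curve = [10.0, 14.0, 18.0, 17.0, 15.0]

print(has_stable_slope(stable_curve))      # force keeps rising
print(has_stable_slope(lightening_curve))  # force drops off near the end
```

The second curve is the "oversteer" case: past a certain point the pilot has to pull less to go further, which is the behavior a handling augmentation is meant to remove.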


Thanks for clearing that up.

Now is it unusual for sensors and computer controls to help meet the airworthiness requirements?

I’m guessing it’s definitely not preferable.

I also think everyone is agreed that Boeing (the company and decision makers) really fucked up here, but it seems like a chain of bad decisions at every stage has played a part in this disaster.


There's all kinds of these artificial enhancements in aircraft. They range from "stick shakers" [0] (which cause the control stick/yoke to shake when the plane detects a near-stall) to yaw dampers (which counteract "Dutch Roll" in swept-wing planes) [1] to full-blown "you tell me what you want and I'll keep the plane from tumbling out of the sky" control systems for fighters [2]. There are also "flight envelope protection" systems which prevent pilots from overstressing aircraft. What's uncommon is for them to lack redundancy. Yeah, Boeing has much bigger problems than just MCAS.

[0] https://en.wikipedia.org/wiki/Stick_shaker

[1] https://aviation.stackexchange.com/questions/6391/what-is-th...

[2] https://en.wikipedia.org/wiki/Lockheed_F-117_Nighthawk#cite_...


It's a problem with the entire process. The hardware, software, safety as an option, lack of training, risk assessment, design, alarms that aren't easily identifiable, features that can't be over-ridden.

There's issues throughout practically every step and they want people to believe that they're just going to push a software update and everything will be 100% fixed.


Yeah, it seems that Boeing hit a hard design limit with the 737 airframe. Main issue is the low ground clearance which makes it impossible, one could argue, to fit new engines. Which would mean that the 737 reached its end-of-life. That must scare the shit out of Boeing execs, and rightly so when your bread and butter product needs to be replaced by something new.


Aviation is about managing the risks. Even a flawed airplane like the 737NG (a four-decade-old airframe when designed, with a whole bunch of hacks from cockpit to elevator) can become very safe, with 0.2 hull losses per million flights. The Classic had its rudder issue, and it was fixed. This will also get fixed by a combination of engineering, code, checklists and training. It won't be cheap but it will get fixed.


Or admit it's a hardware and aircraft design problem and go through the exponentially more expensive process of fixing it.


What’s that old quote...

“It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It.”


That explains so much.


Is the 80286 still in production? Or is it old stock on a shelf somewhere that is used?


AFAIK, Airbus has stockpiled huge amounts of legacy CPUs in climate controlled conditions. Boeing must have done the same


Was really curious as to what Airbus was using, thanks for this! Why not just use the latest CPUs though? I can't imagine the cost would be relatively significant in the greater scheme of things. Are the legacy CPU's really better for this use-case?


Aero industry moves at a very slow pace. The state of the art F-22 Raptor uses intel 386 cpu and the F-35 lightning upgraded to more "modern" PowerPC CPUs, similar to the ones on legacy Macs.

Planes don't need insane multitasking processing power like our smartphones or PCs. They mostly do signal processing and sensor fusion in a tight loop which is quite trivial even for legacy CPUs as it's basic flight math equations which results in highly optimized code.
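One way to picture the "tight loop" is as a fixed frame with a time budget. The sketch below is purely illustrative: the task names, their execution times, and the 20 ms frame period are hypothetical and say nothing about the real 737 software.

```python
from typing import Dict


def frame_utilization(task_times_ms: Dict[str, float], frame_period_ms: float) -> float:
    """Fraction of one fixed frame consumed by running every task once."""
    return sum(task_times_ms.values()) / frame_period_ms


FRAME_PERIOD_MS = 20.0  # hypothetical frame length

# Hypothetical worst-case execution times per frame.
tasks = {"read_sensors": 6.0, "control_laws": 9.0, "output_commands": 3.0}
utilization = frame_utilization(tasks, FRAME_PERIOD_MS)

# Adding one more task to a nearly full frame pushes it over budget.
tasks_with_extra = dict(tasks, cross_check=3.0)
utilization_with_extra = frame_utilization(tasks_with_extra, FRAME_PERIOD_MS)

print(utilization, utilization_with_extra)
```

A legacy CPU is fine as long as the total stays comfortably below 1.0; the concern raised in the article is what happens when new work is added to a frame that has little slack left.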

In terms of aero chips, basic is always better as you want silicon that's been tried and tested for decades, so that you have a deep understanding of its quirks and bugs and know the code execution is reliable.


The redesign and compliance costs are enormous. That said, these costs are something Boeing could afford many times over.

Manpower could also be a bottleneck. Design talent is not what it used to be in the first world after much of production moved overseas.


I was trying to investigate this about Airbus but without much luck. I even found they use an 80186 with another Motorola processor for cross-checking values. I also wonder whether Airbus has "more" processors for different tasks. I also read their A340 uses the 80386. That might mean they have the expertise and groundwork in place to migrate the A320 to the i386, if they haven't already (hard to find data). The 'bus is a much more digitalised plane and 20 years younger, so I would guess they've got something different or a bit more modern.


Probably minuscule when stacked next to the costs of grounding all 737MAXs for half a year and the negative publicity of multiple planes catastrophically nosediving out of the sky


>Are the legacy CPU's really better for this use-case?

Yes. Legacy CPU's have the advantage of years and years of testing. Newer CPU's are not as reliable for safety-critical systems inasmuch as not all the kinks have been worked out, and there may be catastrophic bugs in newer designs that won't be discovered until mass deployment on consumer markets.

Meltdown and Spectre have sent shock-waves through the safety-computing industry. I wouldn't want to fly on any plane that is running on the latest-generation Intel chips - they're just not settled yet.


They may find a physical fix for it later. Like adding weight where it can help. Even then, perhaps nobody wants to fly with it plus it can't offer the same performance.


It amazes me how Pilots are required mainly for the purposes of being able to act as a last resort against a computer which has gone haywire, and the public feel better thinking that the pilot can "take control" and then at the same time aircraft manufacturers are removing the ability for the said Pilots to override the computer. That's progress!


Mods: Can you remove the #more from the submission URL? The fragment makes the browser scroll down which skips the first half of the article.


Sure.



