Hacker News
Incident: Airbus A330 at Taipei, primary computers failed on touchdown (2020) (avherald.com)
300 points by akamaka 45 days ago | 135 comments



For everyone commenting about different hardware failure modes and possible solutions: take a look at the Airbus « flight control laws », e.g. https://apstraining.com/wp-content/uploads/FCS-Airbus-Quick-... They govern what happens to the flight controls and which protections are available in different scenarios. In this case the system worked: the computers crashed and the aircraft reverted to direct law, which provides manual braking control.

Off topic, but I found this very interesting: « The deceleration performance of the occurrence flight between 6,600 feet and 7,300 feet from the threshold of runway 10 deteriorated. It may be due to paint marking and rubber deposit on the touchdown zone of runway 28. »


Build-up of rubber from the tires of landing airplanes is a routine problem with runway maintenance. Typically the airport authority will use a friction-measurement device at regular intervals to determine when the surface friction of the runway has fallen below required parameters, and resurface. But rain makes things much worse and might have combined with rubber build-up to produce a particularly slick section.

Flying light aircraft, I was taught to avoid landing right in the touchdown zone just to have better traction. I'm not sure if this is actually useful or just Old Pilot Superstition but it has some logic to it, and putting a Skyhawk down on a 14k foot runway you have a lot of room for eccentric opinions (similar to landing just a touch off centerline so the nosewheel isn't on the painted stripes, another one I've heard people mention as a "best bad practice").


My dad was a commercial pilot, and taught me to fly at a young age. One of the bits of sage advice was something along the lines of "if your nosegear has two or more wheels, try to get it bang on the centre line, every time. Otherwise, miss a little tiny bit. The lights go THUMP-THUMP-THUMP and eventually drive you mad or, combined with a little gust of wind, cause you to hop once more".

I think there's a lot of Old Pilot Superstition with more than a dollop of truth in it.


I don't think anything too bad could happen when landing a light aircraft on a slippery but long runway. I was doing some winter flying, landing a PA-28 on a slippery runway (compressed snow), and I didn't notice any difference until I started pressing the brakes, which had virtually no effect on deceleration. On touchdown you have a lot of aerodynamic authority on your controls, so runway friction doesn't really matter unless you have a super strong crosswind that could blow you off the runway, I guess.


If I recall rightly Concorde pilots often took control from autoland and landed it slightly off the center line.

This was to avoid upsetting the passengers and to ensure the champagne didn't get shaken too much.


I don't think you were allowed any service items during landing, so this can't have been true.


Could be referring to the champagne still in bottles. How long does it take for a "shaken" bottle to return to "won't explode everywhere when you open it" state, anyway? Does the low pressure in the cabin make it more prone to erupt?


It's very odd they don't do anything to speed up the wheels or apply a coating of water to them on dry days to avoid extreme wear and excess rubber for future planes landing.

Even a passive design with the shape of the tread could help substantially reduce the amount of rubber scraped off the wheel on each landing.


Apparatus to spin up wheels was analyzed exhaustively, early on. They determined that the added weight and complexity did not pay.

That is not to say that some clever physical design could not help. But the idea of spinning up wheels is well-known to gear engineers. Tire wear is a substantial expense, so any bright ideas would be welcomed, and eventually tried out.


Seems like one could design the tread (or an extra rubber tab) to catch the wind and self spin.


And, as I said, these ideas have been explored in depth, and abandoned. Nothing has changed in tires or landing gear, since, to merit going back over everything that was tried.

Suppose your stuff got the tire spinning at the circumferential rate of 20 knots. The plane touches down at 130+ knots. Is the decrease in rubber deposited noticeable? How fast would you need the tire to be spinning to make a noticeable difference? How much does your extra apparatus weigh? How much extra space does it take in the wheel well?

The extra weight has to be carried throughout the flight. Space in the wheel well is tightly constrained, because it is near the center of gravity of the aircraft, where cargo space is most valuable, and where the wing spar crosses through the fuselage.

Everybody thinks of spinning up wheels. Thus far nobody has had answers to the questions that would justify acting on the idea.
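To put rough numbers on the 20-knot question, here is a minimal Python sketch under a deliberately simple model (all assumptions, not from the thread): a rigid wheel, a constant friction force at the contact patch while the tyre spins up, and aircraft speed unchanged over the fraction of a second that spin-up takes. Under that model the slip speed falls linearly to zero and the energy dissipated by sliding works out to 0.5 * I * (omega_f - omega_0)**2, so it scales with the square of the speed mismatch left at touchdown. Treating dissipated energy as a proxy for rubber lost is itself an assumption, and the function names are illustrative.

```python
KNOT_TO_MS = 0.514444  # metres per second per knot


def scrub_energy_fraction(touchdown_kt: float, prespin_kt: float) -> float:
    """Fraction of the no-pre-spin scrub energy still dissipated at the contact patch.

    Toy model: rigid wheel, constant friction force during spin-up, aircraft speed
    constant while the tyre spins up. Dissipated energy = 0.5 * I * (omega_f - omega_0)**2,
    so the fraction is the squared ratio of remaining speed mismatch.
    """
    if not 0 <= prespin_kt <= touchdown_kt:
        raise ValueError("pre-spin speed must be between 0 and the touchdown speed")
    return ((touchdown_kt - prespin_kt) / touchdown_kt) ** 2


def scrub_energy_joules(inertia_kgm2: float, radius_m: float,
                        touchdown_kt: float, prespin_kt: float) -> float:
    """Energy dissipated while the tyre spins up, in joules (same toy model)."""
    slip_omega = (touchdown_kt - prespin_kt) * KNOT_TO_MS / radius_m  # rad/s still to gain
    return 0.5 * inertia_kgm2 * slip_omega ** 2


# The comment's example: pre-spin to 20 kt, touch down at 130 kt.
frac = scrub_energy_fraction(130, 20)
print(f"{frac:.2f} of the original scrub energy remains")  # prints 0.72 ...
```

Under these assumptions, pre-spinning to 20 kt still leaves roughly 72% of the scrub energy, and the sketch says nothing about the weight and wheel-well space the apparatus would cost.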


Well, using sprinklers to wet the initial part of the tarmac would work. But the safety folks don't appreciate the nuance that a runway where just the first 300 feet is wet is different from a runway that is fully wet.


Thanks, more or less the reality check I was hoping for :)


So you are suggesting hydroplaning?


> The deceleration performance of the occurrence flight between 6,600 feet and 7,300 feet from the threshold of runway 10 deteriorated. It may be due to paint marking and rubber deposit on the touchdown zone of runway 28

Just to clarify for others: this is the same runway used in two different directions (west to east and east to west; multiply the number by 10 and you get the magnetic heading in degrees, rounded, so roughly 100° vs 280°). It's 8,500 ft long, and it looks like there was a loss of braking performance around the touchdown point for traffic landing the other way on that runway, where rubber had accumulated from those landings (which is pretty normal and monitored, as pointed out by leecb below), if that makes any sense.
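A small Python sketch of the two conversions in that explanation: reciprocal runway designators differ by 18 (wrapping within 01 to 36), and a distance measured from one threshold becomes "length minus distance" from the other. It assumes the 8,500 ft length quoted above and ignores displaced thresholds; the function names are illustrative.

```python
def reciprocal_runway(designator: int) -> int:
    """Designator for the same strip flown in the opposite direction (differs by 18)."""
    if not 1 <= designator <= 36:
        raise ValueError("runway designators run from 01 to 36")
    return (designator + 18 - 1) % 36 + 1


def approx_heading_deg(designator: int) -> int:
    """Rough magnetic heading implied by a designator (designator x 10)."""
    return designator * 10


def from_opposite_threshold(runway_length_ft: float, distance_ft: float) -> float:
    """Convert a distance measured from one threshold into one measured from the other."""
    return runway_length_ft - distance_ft


print(reciprocal_runway(10), approx_heading_deg(10), approx_heading_deg(28))  # 28 100 280
# The report's 6,600-7,300 ft band from the runway 10 threshold, seen from runway 28:
print([from_opposite_threshold(8500, d) for d in (6600, 7300)])  # [1900, 1200]
```

With those assumptions, the degraded band sits 1,200 to 1,900 ft from the runway 28 threshold, which is where traffic landing on 28 would typically be touching down.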


> accumulated rubber from those landings

Rubber accumulation on runways is a normal and expected situation. Part of runway maintenance at airports serving larger aircraft includes regular removal of the rubber using high pressure water or chemicals.

https://en.wikipedia.org/wiki/Airfield_rubber_removal


I'm still always surprised just how much rubber is allowed to accumulate.

Here's a snapshot of one of the Zurich runways: https://www.google.com/maps/@47.4792249,8.5411849,660m/data=...

Scroll a little NW or SE to see what the actual color of the runway is underneath all the rubber.


I'm not surprised; look how invasive the cleaning techniques are.


Not going to lie, I never thought this was a thing that'd need doing, but on reflection it seems so obvious I'm kinda mad it never occurred to me.


It's unfortunate they don’t calculate how big a difference it made. They’d calculated a ~170 ft margin and stopped with a 30 ft margin, and I’m curious what their braking delta was for that 700 ft of deterioration.

« With three FCPCs inoperative, actual remaining runway distance (30 feet margin) of the occurrence flight was shorter than the calculated value (172 feet margin), possibly due to tailwinds, runway conditions, and manual braking as these factors might increase the braking distance. »

I get sucked into how all the little things that are not right make such a big difference in these kind of scenarios.


Complex systems, and aerospace in particular, are dominated by little things. Failures or near misses never have a single cause; those are all designed for. When failure does happen, it is almost always a conspiracy of several things put together, any one of which, had it been absent, would most likely have prevented the incident.

There is a lot that practices involving life critical complex systems can teach other fields.


Agreed, and let's not lose sight of the fact that...this was a successful landing. Everyone walked away from it and the passengers likely never knew anything went wrong.

Four hopefully independent things went wrong here: three computer failures plus less-than-documented braking performance. Additionally, there were at least two aggravating circumstances: wet conditions and a tailwind landing. Part of the investigation is to find out if there was a fourth or fifth system failure (poor pilot reaction time? Rubber level on the runway unacceptable? Either is plausible, but far from indicated by the report so far).
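The word "independent" carries a lot of weight for redundancy. A hedged Python sketch: if each of three computers fails with probability p independently, all three fail with probability p**3; the beta-factor model used in reliability engineering instead assumes a fraction beta of each unit's failure probability comes from a shared cause that takes out every unit at once. The values of p and beta are hypothetical. The report excerpt later in the thread, with all three PRIMs faulting in the same second, looks more like the second case.

```python
def p_all_fail_independent(p_single: float, n: int) -> float:
    """Probability that all n redundant units fail if their failures are independent."""
    return p_single ** n


def p_all_fail_common_cause(p_single: float, n: int, beta: float) -> float:
    """Beta-factor model: a fraction `beta` of each unit's failure probability is a
    shared cause that fails all n units at once; the remainder is independent."""
    p_common = beta * p_single                  # one event disables every unit
    p_indep_all = ((1 - beta) * p_single) ** n  # every unit fails on its own
    # Union of two independent events: P(A) + P(B) - P(A) * P(B)
    return p_common + p_indep_all - p_common * p_indep_all


p = 1e-4  # hypothetical per-landing failure probability of one computer
print(p_all_fail_independent(p, 3))
print(p_all_fail_common_cause(p, 3, beta=0.01))
```

With these made-up inputs the independent case is about 1e-12 and the common-cause case about 1e-6, so the shared cause dominates even when beta is small.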

One extra thing going wrong here, in the wrong direction (anything that impaired braking or pilot reaction time) would likely have led to loss of life. Investigating near misses like this, and not only being exercised once five things go wrong and a hundred people die, is a sign of a healthy safety culture.


Overshooting the runway isn't that bad; it's expected in emergency scenarios. In those cases, minor injuries and extensive damage to the aircraft are accepted in favor of something worse. Usually you find rubble pits and soft earth beyond the end of a runway for the nose gear to dig into. A 10-20 ft drop on the nose gear will hurt, but the aircraft is now braking with its entire front, not just a rubber wheel. The Mayday TV show covered a few incidents where the aircraft overshot and that happened.


> the passengers likely never knew anything went wrong.

If the deceleration varied as much as the report said, then the passengers probably could tell. That was very different from a normal landing. If that wasn't enough, being towed to the terminal might have given them another clue.

> One extra thing going wrong here, in the wrong direction (anything that impaired braking or pilot reaction time) would likely have led to loss of life.

Well, it would have led to the plane running off the end of the runway. What's past the end of that particular runway? A cliff? A river? Or just another 100 meters of grass leading up to a perimeter fence, then more flat ground for some distance beyond it? If it doesn't overrun the runway too far (say, 100 or 200 meters), no, that doesn't seem likely to lead to loss of life - not unless there's some specific hazard there.


According to the first comment on TFA, it's in a built-up area. I checked Google Maps [0] and it's pretty bad: 50 m of tarmac (no EMAS, I think), 50 m of grass, then a minor road with commercial buildings.

Regardless, that should be considered another aggravating circumstance, not an additional failure. When deciding whether this was a close call, you can treat it as if every runway is wet, in a permanent tailwind, and immediately followed by a wooden building full of schoolchildren.

[0] https://maps.app.goo.gl/u4eFmqDAysMVbbWC8


Oh, absolutely. You treat it as if there is no margin.

And you don't put wooden buildings full of schoolchildren immediately past the end of runways.


Unless I am mistaken, runway 10 is west to east, so it would have gone through the parking lot and then into the river. That's also the direction I remember most planes landing in.


I read this in an old 1950s-era Reader's Digest (Harper's subscribers only). Interesting (fictional) article about a jet that was doomed before takeoff by an accumulation of small things, none of which would individually have caused an incident but which together were fatal. https://harpers.org/archive/1957/09/the-jet-that-crashed-bef...

Edit: Accessible version of the article here. https://books.google.co.in/books?id=2gk-2rP34LIC&pg=RA2-PA14...


The flight data recorder chart on page 114 of the Chinese-language report[0] seems to record that the pilots hit the brakes very hard in the last few seconds, maybe when they realized they were close to overrunning the runway.

That caused a sudden peak of -0.47g deceleration, so it looks as if they could have used the brakes to slow the plane earlier on the runway, but maybe they didn't because they were expecting they could sort out the problem with their spoilers and reverse thrust and stop it in the normal way, without giving the passengers a shock.

[0] https://www.ttsb.gov.tw/media/4912/ci-202-final-report_chine...

edit: that chart's x-axis is the aircraft's position on the 8,500 ft runway, and most of the recording traces seem to begin at the point when it had finally touched down with all wheels.
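A rough feel for why earlier braking would matter: at constant deceleration the stopping distance is v^2 / (2a), so it scales inversely with deceleration. A minimal Python sketch; constant deceleration is a simplifying assumption, and the 60 kt speed and 0.2 g comparison value are hypothetical. Only the 0.47 g figure comes from the comment above.

```python
G = 9.80665          # standard gravity, m/s^2
KT_TO_MS = 0.514444  # metres per second per knot
M_TO_FT = 3.28084    # feet per metre


def stopping_distance_ft(speed_kt: float, decel_g: float) -> float:
    """Distance to stop from speed_kt at a constant deceleration of decel_g (d = v^2 / 2a)."""
    v = speed_kt * KT_TO_MS  # m/s
    a = decel_g * G          # m/s^2
    return v * v / (2 * a) * M_TO_FT


for g in (0.2, 0.47):
    print(f"{g} g from 60 kt: {stopping_distance_ft(60, g):.0f} ft")
```

Whatever the speed, stopping at 0.2 g takes 0.47 / 0.2 = 2.35 times as much runway as stopping at 0.47 g, so under this simple model harder braking applied earlier would have bought considerably more margin.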


> hit the brakes very hard in the last few seconds

In direct law, is it possible to brake too hard and have the tires skid on the runway? Is it possible that the pilots were afraid of this?

I ask because people are characterizing direct law as "less smart". So how smart is it? Smart enough to include ABS?


I was a passenger on a JetBlue flight in the winter of 2000/1, when air traffic was being diverted from JFK due to a snowstorm, but for whatever reason (fuel?) we had to land there anyway. The runway was basically unplowed at that point. As soon as we touched down there was a sensation that we were heading into a skid. I wouldn't really call it a braking skid... the plane just started turning away from its direction of motion. The pilot somehow managed to slow down before we went off the side of the runway. We ended up stuck in deep snow, partially in a ditch. After what seemed to be some attempts at getting us out of the snow, we were evacuated down the slide. A couple of people movers were brought out, which we got into and which themselves became stuck in the snow on the runway for an hour... fun times.


Oh, 100%! Anti-skid is only available in alternate and normal laws. See section 6.5, Braking system, of this document.

https://www.smartcockpit.com/docs/A330_Flight_Deck_and_Syste...


Similar naming scheme, but it's actually the normal and alternate brake systems, powered by different hydraulic systems.

Not to be confused with flight control laws, normal, alternate and direct. One isn't necessarily related to the other.


I don’t know for sure but I think the article or the comments said that ABS was still operative.

However, I’d add that the pilots would not have had time to fully appreciate the situation, probably didn’t know that all three systems had failed, hadn’t had time to run checklists, and so they might have been worried about the possibility that ABS had also failed. All the other stopping systems had failed after all.

Remember this happened as they touched down, which is a pretty high workload environment at the best of times.


But that page does not mention direct law. Wouldn't anti-skid protection depend on the switch on the landing gear panel shown in 6.9?


I don't know, but the computer had failed, the runway was wet, and they were confused, so yes, it's not unlikely they were also afraid of skidding.


Aircraft anti-skid is more complicated than a car. There are a lot of hydraulics and sensors working with extremely high loads.

To simplify how things work, there are two sets of systems in the plane.

There are the computer controls that move all of the surfaces. They have no connection to the various other sensors, because they only move things as they are ordered to. These form the base fly-by-wire system.

On the top layer, you have the three flight computers handling all of the automation: a triply redundant system with all of the plane's sensors available to it, not just the sensors in the braking system.

So, the ABS is handled by the flight computers and the flight computers are their own backups. The automation will attempt to degrade gracefully into alternate law. If they all go, the plane reverts to direct law and the pilots interact with the base fly-by-wire system which has little to no automation.

Failure of the flight computers is rare. Every other part of the braking system is less reliable than the flight computers.

From another angle, if a failure caused the flight computers to crash, is there also a problem with the braking system? By having a clear, defined failure mode, a pilot only needs to be trained to land the plane with manual braking.

When the flight computers drop out, the fly-by-wire system becomes directly connected with the flight controls. This includes the anti-skid functionality.

With the above design, it doesn't make a lot of sense to add a layer of complexity between the flight computers and the fly-by-wire system. The redundant flight computers are the backup systems. They can run with partial functionality. If they all fail, the plane reverts to a known, extremely well-tested, working system.

There are multiple overlapping failure modes already. Adding another failure mode is a bad idea.


> To simplify how things work, there are two sets of systems in the plane.

If the plane was not simplified, how many systems would there be? It seems to me that the goal is not simplicity, but automation and safety. Otherwise, there would still be flight engineers.

> This includes the anti-skid functionality.

Fly-by-wire includes anti-skid? Direct law includes anti-skid? What is 'this' referring to?

> If they all fail, the plane reverts to a known, extremely well-tested, working system.

Well tested compared to what? Well tested compared to flying with flight computers?

> Adding another failure mode is a bad idea.

Why is the anti-skid toggle a physical switch if it is only available under some control laws? Its physical presence means pilots will think it is always available. Isn't a misunderstanding a failure mode in itself?


How hard is it to make an airliner lose grip? There is a lot of weight above the wheels and a lot of downforce from the wings.


> a lot of downforce from the wings

Not unless the plane is upside down, surely...


Don’t call me Shirley!


If the spoilers are up, yeah.


I wonder how much of it was just being « used » to autobrake assistance, and what felt right.


I don't entirely grok the vocabulary, but it sounds like the computers each said "This input is outside of limits, something must be wrong with me" so shut off for safety. But is there a mechanism that says "we can't all be wrong, it must be the sensor", to avoid situations like this?


It sounds like it was more complex than that. The computers shut down because of a COM-MON order mismatch. In other words, there is a watchdog in the system that monitors the three computers and their orders (control surface movements, etc. - in this case rudder inputs). There are 3 computers and one of them is designated COM (command) while another is designated MON (monitoring). If the values of the inputs that the computers want to send to the actuators diverge too much for too long, the command computer is disconnected and another computer is designated command. In this case it sounds like the command and monitoring computers had a significant delay between them when they transitioned from flight law (settings used in flight) to ground law (settings used while on the ground) while the pilot was also applying rudder (which is very common during landing, to correct for crosswind). When on the ground, the rudder is set to not deflect as much from the pilot's inputs compared to when in the air. It sounds like normally the computers switch to ground law close enough in time that the resulting mismatch is ignored, but in this case the significant delay caused a "split brain" situation and the watchdog disconnected all 3 computers.

Perhaps the root cause is an issue in the weight-on-wheel sensor inputs that the computers use to transition. The computers need to use redundant sensors so they don't all rely on the same faulty sensor, but maybe the sensors are not appropriately cross-fed or the sensor input combination logic is too different between the computers.

The problem with not disconnecting a computer in this situation is that bad inputs from that computer may also cause a crash. But maybe if no other computer is available to take its place the logic should be different, at least during takeoff/landing.
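A toy Python model of the "diverge too much for too long" watchdog described above, assuming (all hypothetical) a ground-law rudder gain half the flight-law gain, a 0.2 mismatch threshold and a three-sample persistence limit. Each channel switches to ground law at its own sample, so whether the monitor trips depends on both the channel asynchronism and the pedal input, in the spirit of the report's "high COM/MON channels asynchronism and the pilot pedal inputs" wording. None of these numbers or names are Airbus values.

```python
from dataclasses import dataclass
from typing import Optional

FLIGHT_GAIN, GROUND_GAIN = 1.0, 0.5  # hypothetical rudder order per unit of pedal


@dataclass
class ComMonMonitor:
    """Trips when |COM - MON| exceeds `threshold` for more than `max_samples` samples in a row."""
    threshold: float
    max_samples: int
    _count: int = 0

    def update(self, com_order: float, mon_order: float) -> bool:
        if abs(com_order - mon_order) > self.threshold:
            self._count += 1
        else:
            self._count = 0  # agreement resets the persistence counter
        return self._count > self.max_samples  # True means "disconnect"


def rudder_order(pedal: float, on_ground: bool) -> float:
    return pedal * (GROUND_GAIN if on_ground else FLIGHT_GAIN)


def first_trip(pedal: float, com_switch_at: int, mon_switch_at: int,
               steps: int = 20) -> Optional[int]:
    """Sample at which the monitor disconnects, or None.
    Each channel enters ground law at its own sample index."""
    monitor = ComMonMonitor(threshold=0.2, max_samples=3)
    for t in range(steps):
        com = rudder_order(pedal, on_ground=t >= com_switch_at)
        mon = rudder_order(pedal, on_ground=t >= mon_switch_at)
        if monitor.update(com, mon):
            return t
    return None


print(first_trip(0.6, 5, 6))   # small lag, mismatch too brief: None
print(first_trip(0.6, 5, 12))  # large lag with pedal input: 8
print(first_trip(0.2, 5, 12))  # same lag, little pedal, mismatch under threshold: None
```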


> Perhaps the root cause is an issue in the weight-on-wheel sensor inputs that the computers use to transition. The computers need to use redundant sensors so they don't all rely on the same faulty sensor, but maybe the sensors are not appropriately cross-fed or the sensor input combination logic is too different between the computers.

This would indeed explain the discrepancies between COM/MON which is what they called asynchronicity (which is a bit different from where my mind went when I read that word the first time). There was no particular fault found in the computers.

Maybe the wet runway factored in a bit (or some other issue with the runway itself), but that is a common occurrence around the world, and this was a first for the whole A330/A340 family.

> The problem with not disconnecting a computer in this situation is that bad inputs from that computer may also cause a crash. But maybe if no other computer is available to take its place the logic should be different, at least during takeoff/landing.

Yes; the only thing I would add is that they failed in cascade within one second of each other, 3 seconds after the main gear touched down (and before the nose gear touched down):

> Three seconds after the main gear touched down, autobrake system fault was recorded on FDR. One second later, PRIM1/PRIM2/PRIM3 faults were recorded at the same time and the spoilers retracted, as the ground spoiler function was lost.

Whether all three should be allowed to fail within one second and then default back to direct law (manual) is a hard question. In this precise case it seems like trying again might have worked, but would it always be safe to retry? I don't know. This is a super critical phase of flight, so retrying doesn't sound safe.

Apparently they would have needed two functional FCPCs for autobrake:

> Ground spoilers function requires at least one functional FCPC, arming autobrake requires at least two functional FCPCs, deployment of thrust reversers require unlock signal from either FCPC1 or FCPC3
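The three requirements quoted from the report can be written as a small Python function, which makes it easy to see what is left with zero working FCPCs. This models only those three quoted conditions; the real logic surely depends on other inputs too, and the function name is illustrative.

```python
from typing import Dict, Set


def available_functions(healthy_fcpcs: Set[int]) -> Dict[str, bool]:
    """Deceleration aids available for a given set of working FCPCs (numbered 1-3),
    using only the three requirements quoted from the report."""
    return {
        "ground_spoilers": len(healthy_fcpcs) >= 1,              # at least one FCPC
        "autobrake_arming": len(healthy_fcpcs) >= 2,             # at least two FCPCs
        "thrust_reverser_unlock": bool(healthy_fcpcs & {1, 3}),  # FCPC1 or FCPC3
    }


print(available_functions({1, 2, 3}))
print(available_functions({2}))    # spoilers only: no autobrake, no reverser unlock
print(available_functions(set()))  # this incident: everything unavailable
```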


I’m gonna re-explain what you’ve written in language I’m more familiar with to check my understanding, and hopefully you can correct any errors I make.

As I understand it, the A330 has three primary flight computers, all observing the same inputs (which might come from different physical sensors monitoring the same thing?) and producing outputs, also known as “orders”, for other systems in the plane, like actuators.

Of these three computers, one acts as the primary command computer (COM), one as a monitoring computer (MON), and the third is a spare and normally ignored. Only orders from the COM machine are sent to downstream systems like actuators.

There’s a separate watchdog system that monitors the outputs (orders) of both the COM and MON, and if their order values diverge by too much for too long, it shuts down the COM computer and passes control to the MON computer and the spare. As part of this process one of those two computers becomes COM the other MON. I assume the size of the divergence determines how long they can diverge for. A large divergence is only allowable for a very short period of time, and a small divergence is allowed for longer.

In addition to all of this, the computers have different operating modes which change how they respond to inputs. In this case the relevant states are “normal law” for in the air and “ground law” for on the ground. One thing that’s different between these states is how tightly coupled the pilot’s rudder inputs are to the rudder orders sent to the actuator. In the air the rudder is less tightly coupled than on the ground, e.g. commanding full left rudder in the air results in less physical movement of the rudder than the same command on the ground.

When the plane lands, detected by a pressure sensor in the landing gear, the computers transition from “normal law” to “ground law”. Which for some outputs, like the rudder, might result in a step change in outputs (orders) from the computers.

So in this specific scenario what happened is that the flight computers for some reason didn’t transition between “normal law” and “ground law” simultaneously (or close enough to simultaneously). So the COM computer significantly changed its rudder output as a result of changing law, but the MON computer didn’t, because it hadn’t changed law yet (the inverse of this is also possible). As a result they were producing very large differences in rudder orders, resulting in the monitoring watchdog killing the COM computer and failing over to the spare. Where the above situation happened a second time, resulting in all computers being shut down.

Is all of the above correct?

All of this does make me wonder: if changing law can result in computer outputs quickly changing, doesn’t that make law changes inherently dangerous? If you’re a pilot landing a plane applying significant rudder inputs, doesn’t the above mean that those inputs will have a vastly different effect once the wheels touch the runway?


[Private pilot here]

> doesn’t that make law changes inherently dangerous?

Well, yeah, but landing is inherently dangerous for the exact same reason: the system dynamics change suddenly when the wheels touch the ground. That happens (obviously) as a consequence of the laws of physics whether or not you have a computer in the loop. So this is a risk that just goes with the territory.


> If you’re a pilot landing a plane applying significant rudder inputs, doesn’t the above mean that those inputs will have a vastly different effect once the wheels touch the runway?

IANAP so hopefully someone will correct me, but from my understanding in this case the answer would be “yes, they will have a different effect, as they should”.

Rudder is used in crosswind landings to maintain runway alignment while still in air; once enough weight is on wheels and speed is low enough rudder quickly loses its effectiveness and control shifts to wheel steering and brakes.

That said, I suspect it doesn’t help that the flight computer sort of has a binary context flag (either we are in the air or on the ground). It might have simplified some of the business logic, but it does not seem to map well to reality at a crucial moment. Imagined in slow motion, the system doesn’t just flip a state but goes through a spectrum.
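One way to picture the "spectrum" idea, as a hypothetical Python sketch rather than anything Airbus is known to do: instead of switching the rudder gain on a binary on-ground flag, blend the flight and ground gains by a ground fraction between 0 and 1 (which might, for example, be derived from several weight-on-wheels readings). Both gains and the notion of a ground fraction are assumptions.

```python
FLIGHT_GAIN = 1.0  # hypothetical rudder order per unit of pedal in flight
GROUND_GAIN = 0.5  # hypothetical, smaller deflection on the ground


def rudder_gain_binary(on_ground: bool) -> float:
    """Step change: the whole gain difference appears the instant the flag flips."""
    return GROUND_GAIN if on_ground else FLIGHT_GAIN


def rudder_gain_blended(ground_fraction: float) -> float:
    """Linear blend from FLIGHT_GAIN (0 = airborne) to GROUND_GAIN (1 = fully on the ground)."""
    f = min(max(ground_fraction, 0.0), 1.0)  # clamp to [0, 1]
    return (1 - f) * FLIGHT_GAIN + f * GROUND_GAIN


for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f, rudder_gain_binary(f >= 0.5), rudder_gain_blended(f))
```

The design point: if two channels disagree slightly about how far into the transition they are, say by 0.1, their blended orders differ by only 0.1 * (1.0 - 0.5) = 0.05 per unit of pedal instead of jumping by the full 0.5. Whether such blending would be acceptable for certification is a separate question.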


> So in this specific scenario what happened is that the flight computers for some reason didn’t transition between “normal law” and “ground law” simultaneously (or close enough to simultaneously). So the COM computer significantly changed its rudder output as a result of changing law, but the MON computer didn’t, because it hadn’t changed law yet (the inverse of this is also possible). As a result they were producing very large differences in rudder orders, resulting in the monitoring watchdog killing the COM computer and failing over to the spare. Where the above situation happened a second time, resulting in all computers being shut down.

Possible solution: always designate COM/MON computers which agree on the mode: flight or ground. Only disable a primary COM computer if it disagrees with the MON computer while both are running on the same mode.
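A minimal Python sketch of that proposal, assuming each healthy computer simply reports "flight" or "ground": pick the first pair that agrees on mode as COM/MON. With three computers and only two possible modes, some pair always agrees, although agreement on mode alone does not tell you which computers are right. The function name is illustrative.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple


def pick_com_mon(modes: Dict[int, str]) -> Optional[Tuple[int, int]]:
    """Return the first (COM, MON) pair, by computer number, whose modes match, or None."""
    for a, b in combinations(sorted(modes), 2):
        if modes[a] == modes[b]:
            return (a, b)
    return None


print(pick_com_mon({1: "ground", 2: "flight", 3: "ground"}))  # (1, 3)
print(pick_com_mon({1: "flight", 2: "ground"}))               # None
```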

> There’s a separate watchdog system that monitors the outputs (orders) of both the COM and MON, and if their order values diverge by too much for too long, it shuts down the COM computer and passes control to the MON computer and the spare.

So, if the MON computer is faulty it will always disable the 3 computers?


> Possible solution: always designate COM/MON computers which agree on the mode: flight or ground. Only disable a primary COM computer if it disagrees with the MON computer while both are running on the same mode.

I’m not sure that helps; how do you know that the COM computer is correct and MON isn’t? Ultimately you only really care if the two computers are trying to make the plane do different things. If they’re in different modes but producing the same outputs, I’m not sure how much you would care.

> So, if the MON computer is faulty it will always disable the 3 computers?

I’m just interpreting what I’ve read. If you know better then please do tell us.


Actually, I know nothing about the subject; please don't take my comments as authoritative. Sorry, I should have made that clear.

About the first proposal: redundancy using majority of votes is well known.

Second, GP said:

> There’s a separate watchdog system that monitors the outputs (orders) of both the COM and MON, and if their order values diverge by too much for too long, it shuts down the COM computer and passes control to the MON computer and the spare.

What I read from this is: COM differs from MON; watchdog disables COM and uses MON and spare as a new COM/MON. But if previous MON was faulty, it will still differ from spare (except if both are failing in a sufficiently similar way).
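That reasoning can be checked with a toy Python model of the failover in the restatement above (COM always blamed, old MON promoted to COM, next spare becomes MON). This is the thread's understanding, not a confirmed description of the A330 logic; the tolerance and output values are made up.

```python
from typing import List


def surviving_pair(outputs: List[float], tolerance: float) -> List[int]:
    """Indices of computers still flying after the watchdog cascade.

    outputs[i] is the order computed by computer i; index 0 starts as COM,
    1 as MON, the rest are spares. On a COM/MON mismatch the COM is always
    the one disconnected. An empty list means no pair is left (direct law).
    """
    queue = list(range(len(outputs)))
    while len(queue) >= 2:
        com, mon = queue[0], queue[1]
        if abs(outputs[com] - outputs[mon]) <= tolerance:
            return queue  # COM and MON agree: keep flying
        queue.pop(0)      # watchdog disconnects the COM
    return []             # fewer than two computers: no COM/MON pair


print(surviving_pair([5.0, 1.0, 1.0], 0.1))  # faulty COM: [1, 2]
print(surviving_pair([1.0, 5.0, 1.0], 0.1))  # faulty MON: []
```

Under this model a faulty MON takes the healthy COM down with it and then disagrees with the spare as well, so all three are lost.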


Another possible solution: if the mode change is not unanimous, ask the pilots! The pilots might not know, (e.g. when landing in fog - famous accidents happened because of this), but at that point a go round or diversion to another airport seems to be the safest plan of action.


The pilot in command has ~no spare capacity during touchdown, full focus is required for rudder and stick. Especially during crosswind landings. The other pilot probably shouldn't be expected to make such a decision in a second out of the blue.


The copilot could press a button as soon as the plane lands, if and only if he is certain the plane touched down. Then, if the computers can't decide, use that input for a decision. Literally asking and then awaiting a reply is slow indeed.


Is MON one of the three, or is it a fourth computer? If it's one of the three, how does the third computer get disabled?


As I understand it... and I am not an expert, but have been exposed to some similar systems, just not with Airbus... I believe the following is correct at a systems design level:

* There are three flight 'computers' (boxes) (it's more complex than that, but that complexity is not germane to your question)

* each box has two entirely different motherboards with different processors and independent software inside of it

* each motherboard takes the same inputs and calculates the appropriate outputs.

* if those outputs disagree, inside of the same box, you get a COM/MON fault and the box/system takes an appropriate action...such as disengaging

* once all of THAT happens in a single box... the boxes are also looking to see if all three boxes are agreeing with each other. This is where you get 'voting'.

* if all three boxes agree, great! If two agree, disregard the third. If none agree, execute fault fallbacks.

* If you run out of computers doing things that make sense - shut the computers off and make really loud noises to alert the pilots they are on their own

So... you have the computer agreeing with itself, and then you have the computers agreeing with each other. Both are important/critical for fault tolerance.
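A minimal Python sketch of the two levels described above: each box compares its own two lanes, then the boxes that passed are voted against each other. The tolerance, the use of a median to combine agreeing boxes, and the function names are assumptions for illustration, not the Airbus implementation.

```python
from statistics import median
from typing import List, Optional, Tuple


def box_output(lane_a: float, lane_b: float, tol: float) -> Optional[float]:
    """One box: two independent lanes compute the same order.
    Disagreement beyond tol is a COM/MON fault and the box drops out."""
    if abs(lane_a - lane_b) > tol:
        return None
    return lane_a  # lane A commands, lane B only monitors


def voted_order(boxes: List[Tuple[float, float]], tol: float) -> Optional[float]:
    """Vote across boxes: keep boxes that agree with at least one other box.
    None means nothing trustworthy is left (fall back, e.g. to direct law)."""
    healthy = [o for o in (box_output(a, b, tol) for a, b in boxes) if o is not None]
    agreeing = [
        x for i, x in enumerate(healthy)
        if any(abs(x - y) <= tol for j, y in enumerate(healthy) if j != i)
    ]
    if len(agreeing) < 2:
        return None
    return median(agreeing)


print(voted_order([(1.0, 1.0), (1.0, 1.0), (5.0, 5.0)], 0.1))  # one box outvoted: 1.0
print(voted_order([(1.0, 3.0), (1.0, 1.0), (1.0, 1.0)], 0.1))  # one box self-fails: 1.0
print(voted_order([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)], 0.1))  # no agreement: None
```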


Makes sense. Thanks.


I'm not sure there was a sensor issue here. It appears that the Flight Computer monitoring routine for the Rudder position somehow caused all three computers to crash. This was somehow exacerbated by the pilot's rudder inputs, which I don't fully understand.

> the combination of a high COM/MON channels asynchronism and the pilot pedal inputs resulted in the rudder order difference between the two channels to exceed the monitoring threshold

The flight computer failure resulted in the inability to use two braking mechanisms: thrust reversers (make air from the engine go forwardsish) and spoilers (stop the wing from producing lift, putting more weight on the wheels and making the brakes more effective).
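Why spoilers help the brakes, as a rough Python sketch: the friction-limited braking force is about mu times the load on the wheels, and any lift the wing still produces subtracts from that load. The friction coefficient, mass and lift share are hypothetical, and real anti-skid braking is more complicated than a single mu.

```python
G = 9.80665  # standard gravity, m/s^2


def max_braking_force_n(mass_kg: float, lift_n: float, mu: float) -> float:
    """Friction-limited braking force in newtons: mu * (weight - lift), floored at zero."""
    wheel_load = max(mass_kg * G - lift_n, 0.0)
    return mu * wheel_load


mass = 180_000.0  # hypothetical landing mass, kg
mu_wet = 0.3      # hypothetical wet-runway friction coefficient
weight = mass * G
for lift_share in (0.0, 0.3):  # spoilers deployed vs. wing still carrying 30% of the weight
    print(lift_share, round(max_braking_force_n(mass, lift_share * weight, mu_wet)))
```

In this toy model, a wing still carrying 30% of the weight cuts the available braking force by the same 30%.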


The thing that bugs me is "did the pilots engage those systems manually?"

And

How was their reaction time doing so compared to someone who would be used to doing so via muscle memory on the regular?

Automating things is nice and all, but there is something to be said for keeping manual skills sharp.


Well in this case the entire system failed safe after a pretty catastrophic failure of the automated systems.

So on the whole I would say this incident demonstrates that the current safety standards, contingency plans, and pilot training all work as needed. I don’t think there’s anything here to suggest that the pilots’ manual skills are rusty.

As for the specific systems that didn’t activate: they didn’t activate because they require a positive indication from the flight computers that it’s safe to activate, something that probably can’t be overridden by the pilots. That’s why the planning process for flights requires pilots to assume they won’t work, and to ensure the runway is long enough for the worst possible scenario.


> So on the whole I would say this incident demonstrates that the current safety standards, contingency plans, and pilot train all work as needed. I don’t think there’s anything here to suggest that the pilots manual skills are rusty

According to the article they had 30 feet of runway remaining when they brought the plane to a halt. So, yeah I guess, everything worked out, but I wouldn't say that was indicative of a well oiled machine.


Well yes, that’s one of the criticisms in the final report. That the flight plans didn’t provide adequate additional runway length to handle this specific runway in the rain. I think there may have also been criticism of the airport runway maintenance with too much rubber build up on the runway.

All of which resulted in this plane having less margin for error than it should have had. But that’s why we have safety factors: to account for human error and natural deviation in the environment. In this case that safety factor prevented this incident from being more serious, and the post-mortem has identified areas for improvement, which will no doubt be instituted.

To me, all of this points to extremely robust safety procedures that have prevented the loss of life in an extreme and unusual scenario, and is capable of analysing the outcome to find ways of further improving safety so it’s capable of surviving even more extreme situations.

Expecting any safety system to cope perfectly with every scenario is unrealistic, but one that handles pretty much all of them without any serious injury or loss of life is clearly working very well.


To put things in perspective, an A330 lands at 140 kts, which translates to 236 ft/sec, which means that if the plane had travelled along the runway at 140 kts for 130 milliseconds longer, then they'd have a runway excursion.

Now, they would have decelerated before the malfunction, so maybe a 500 millisecond slower reaction time would have caused the plane to leave the runway. Maybe a 3 second delay in reacting would have resulted in the plane hitting a building or a wall.

That doesn't sound like a robust safety factor to me.
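The arithmetic above as a quick Python check. It assumes a constant 140 kt, which (as the comment itself says) overstates the speed near the end of the rollout, so the real time margin was somewhat larger than this.

```python
# 1 knot = 1 nautical mile per hour, and 1 nautical mile = 6076.12 ft
FT_PER_S_PER_KNOT = 6076.12 / 3600

def seconds_to_cover(distance_ft: float, speed_kt: float) -> float:
    """Time to cover distance_ft at a constant speed_kt (no deceleration)."""
    return distance_ft / (speed_kt * FT_PER_S_PER_KNOT)

speed_ft_s = 140 * FT_PER_S_PER_KNOT
print(f"140 kt = {speed_ft_s:.1f} ft/s")  # 140 kt = 236.3 ft/s
for margin_ft in (30, 33):
    ms = seconds_to_cover(margin_ft, 140) * 1000
    print(f"{margin_ft} ft of margin = {ms:.0f} ms at 140 kt")
# 30 ft of margin = 127 ms at 140 kt
# 33 ft of margin = 140 ms at 140 kt
```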


You’re making a lot of assumptions about the braking capability of this plane. Notably you’re assuming that the plane was completely incapable of stopping in a shorter distance, ignoring pilot reaction time.

The report mentions that the pilots failed to apply maximum manual braking till they were a good distance down the runway. A reasonable interpretation of this fact is that the pilots were afraid of locking up the wheels by accident, so they were applying the minimum braking they thought they could get away with. It was only when they were getting towards the end of the runway that the pilots realised they really needed to brake a tad harder.

In short, if the runway had been an extra 300 feet long, the pilots would have still stopped the plane within 30 ft of the end. It’s perfectly natural and completely rational for people to consume every ounce of available safety factor when you’re dealing with the unknown. So measuring the consumed safety factor after the fact isn’t inherently useful or indicative of what the actual minimum safety margin needed to be.

Additionally, a runway excursion is not inherently dangerous, provided you don’t do it at high speed. It’s never going to be very good for the plane, but leaving the runway doesn’t automatically result in all the passengers dying.


This report doesn't indicate any fault or blame at all on the pilots: they reacted with reasonable actions to the circumstances as they arose. The only possible nit one might pick is the PF's delay in asking for the PM's assistance in braking.

But, yeah, the report does fault (kinda) the runway conditions and insufficient margins in the flight path/plan to account for an extreme such as this. The fact that a terrible tragedy did not have to occur to make those identifications seems to indicate a very well functioning system to me.


That's how I read it too. The discrepancy was deemed "I'm broken and need to shut down" instead of something more benign, in a condition they didn't realize might happen. That's a critical difference, since all 3 will think "I'm broken", as happened here.

That stuff can be hard to get right. Glad they found this one with 10 meters to spare.


It seems that flight planning is done on the basis that the loss of all flight computers is possible, and to make sure that your runway is long enough to accommodate that situation.

So thankfully the loss of all three computers isn’t inherently dangerous, as this incident demonstrates. Based on my reading of the report, the primary takeaway from this (apart from an issue with the flight computers’ software) is that the flight plans and the airport’s runway preparation weren’t conservative enough, so they ended up with slightly less safety margin than they expected in this scenario.


> loss of all three computers isn’t inherently dangerous

You seem to be using a different definition of "dangerous" from the rest of us. Loss of systems that pilots are used to depending on is inherently dangerous. After that, degrees of luck, skill, and conservative planning become major factors in the outcome.

The loss of spoilers and reversers is itself dangerous. In slightly different conditions, e.g. snow, the aircraft would not have stopped where it did.

There was a Lockheed aircraft that ran off the end of a runway at an above-usual touchdown speed: the spoilers would not go up for lack of weight on the wheels, and the brakes had no traction because the spoilers were not up.


I don’t know what to say. Pilots are trained for this exact situation, flight plans are designed for this situation. Every reasonable measure is taken to ensure that this exact situation isn’t unnecessarily more dangerous than it absolutely needs to be. As proven by this exact incident.

You can talk about hypotheticals like snow, but this is a commercial airport. They will either have snow clearing equipment or they’ll redirect planes if they can’t clear the snow. It’s also Taipei where it basically never snows around the city. Simply put if it was unsafe to land this plane in this state with snow (and snow was believed to fall that day), then the plane wouldn’t even take off.

Lockheed aircraft aren’t passenger airliners; they have a completely different risk profile and risk appetite. I imagine Lockheed aircraft occasionally get taken down by enemy fire, but we don’t build commercial aircraft to handle that situation.


"[Not] unnecessarily more dangerous" is very far from the same as "not dangerous".

If you cannot imagine this identical failure occurring at a different airport, in different ground conditions, I don't know what to say. You might as well say Sully splashing down in the Hudson without loss of life was unsurprising because he was trained for emergencies. (And, incidentally, airlines do not, in fact, train pilots for "all engines failed"; it is considered too unlikely and too unsurvivable.)

Lockheed did make many, many passenger aircraft, and the failure I cited was, in fact, in a passenger aircraft. If you cannot understand changes in the aircraft business landscape, I don't know what to say.


> (And, incidentally, airlines do not, in fact, train pilots for "all engines failed"; it is considered too unlikely and too unsurvivable.)

Nope, that is wrong. Even before Sully, pilots were trained for All Engines Out. Aircraft even have systems to handle this, like the RAT (Ram Air Turbine), which deploys on loss of power to enable critical systems like control surfaces and basic navigational gear (GPS, Radar, Artificial Horizon). Prop planes will feather all engines to maximize gliding distance. All Engines Out is definitely survivable and in some cases was even recoverable in flight.

After Sully, the practice of training for All Engines Out was expanded to even lower altitudes and earlier in the takeoff procedure (as no one had lost all engines that low before). You can read that in the FAA report (and the Mayday episode summarizing it).

There are other episodes too, like cases where the aircraft ran out of fuel (Gimli Glider) or flew through volcanic ash (British Airways Flight 9).


My brother is an airline pilot. He is given exactly zero hours of simulator time for "All engines out". That flights have survived the event does not contradict the fact.


Well, probably ask your brother again, because All Engines Out is required for a pilots license in the US and most other countries. Even Helicopter pilots have to train in the simulator for engine failure (autorotation landing).

It must come up in training at least once, and every aircraft has an "all engine failure" checklist for this exact situation (the FAA recommended the addition of an "all engine failure at low altitude" checklist as well, which I believe has occurred).

Either your brother is incorrect about the simulator requirement, forgot about it or is flying a two-seater Cessna.

You can also verify this by watching videos from popular pilots on YouTube, such as Mentour or 74Crew.


He flies 747s at the moment. Those have 4 engines. But he has logged a lot of 2-engine airliner time.

Yes, there is a checklist to pull out if all the engines fail. But, as I already said, the airline allows him exactly zero minutes of simulator time for it. I questioned him very closely about this.

People flying single-engine light aircraft have to think about engine failure all the time; but there are no single-engine airliners.


All engine failures in commercial airlines happen about once every two years.

Since 1953 there have been ~38 incidents where an airliner has been forced to glide (I.e. complete failure of all propulsion). That’s 38 incidents in 68 years. The incident rate has been falling while flight numbers have been increasing.

Since 2003 (it’s hard to get data before then) there have been ~600,000,000 passenger flights, and seven gliding incidents. So the odds of being on a plane with an all engine failure are around 1:80,000,000. Winning the lottery is around 1:45,000,000.
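Reproducing the arithmetic with the comment's own figures, taken at face value and not independently checked. 600 million flights over 7 incidents comes out closer to 1 in 86 million than 1 in 80 million, which doesn't change the point.

```python
# Figures as quoted in the comment above; not independently verified.
incidents_since_1953 = 38
years_since_1953 = 68
flights_since_2003 = 600_000_000
gliding_incidents_since_2003 = 7

years_between_incidents = years_since_1953 / incidents_since_1953
flights_per_incident = flights_since_2003 / gliding_incidents_since_2003

print(f"one all-engine failure every {years_between_incidents:.1f} years")
print(f"about 1 in {flights_per_incident:,.0f} flights")
# one all-engine failure every 1.8 years
# about 1 in 85,714,286 flights
```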

As failure cases go, that’s really not bad. In the US around 104 people die every day in traffic accidents.

So if you wanna get hot and bothered about public safety, I would suggest you start there. Rather than criticising the safety procedures of the safest form of transport.


I have not, in fact, criticized the safety procedures of aviation. I have corrected your absurd claim that aviation emergencies are "not dangerous". It is an uncontroversial fact that people have died in aviation emergencies.


Of course I can imagine an identical failure happening at a different airport with different ground conditions. Just like I can imagine that there would be even more conservative safety margins to go along with the possibility of worse ground conditions.

Do you think there’s a single flight plan used for every plane and airport? Of course there isn’t; a new flight plan is created for every single flight, with margin built in to handle the expected conditions in flight and at landing. So if you’re landing at an airport that gets snow, you increase your required runway allowance to ensure that a loss of all three flight computers doesn’t become dangerous.


Not dying when you missed seeing a stop sign and cruised through an intersection does not demonstrate anything positive about your planning. It only means you were lucky. That the plane stopped with only 10 meters to spare does not demonstrate a lack of danger; it demonstrates how easily the result could have turned out very, very different. Being only one second later applying brakes would have used up another 300 ft of runway.

If you imagine that an identical landing would not have been attempted on a snowy day, you know nothing about airline operations. And, if you don't understand the role of luck in averted disasters, it is a good thing you don't have any actual responsibility.


Go and read the report again. It clearly mentions that the pilots didn’t apply maximum braking till they were a good way down the runway.

That strongly suggests the pilots were worried about locking up their wheels, and thus were applying the minimum braking they thought they could get away with, up until they realised they were running out of runway.

The runway could have been an extra 600ft long, and they still would have only stopped within 30ft of the end, because it’s quite clear the pilots were trying to use up as much of the runway as they thought they could get away with. A perfectly reasonable approach when you don’t know how hard you can brake without causing a loss of traction.

You shouldn’t read so much into the amount of spare runway left when you’re dealing with a situation where consuming every spare inch is the safest course of action.


I am so glad you have no responsibility for public safety.


What can I say. The report and remedial actions pretty much agree with what I’ve said. So those who are in charge of public safety are taking a very different stance to you.

Guess we’re all screwed, probably explains why air travel has such an atrocious safety record compared to other forms of transport.


> "we can't all be wrong, it must be the sensor", to avoid situations like this?

Not an expert, but my understanding is that typically with these systems they take a poll and vote: if 1/3 disagree it's ignored; if 2/3 disagree they scream and switch to manual/fallback simpler systems.


For a binary signal, it's impossible for all three to mismatch. For a more analog signal, all three will generally mismatch to some extent whether it's in time or in space or both.

Fault tolerance and fault detection are two separate but often coupled concepts. Systems can be designed to be inherently tolerant to a fault without detecting the fault (and good designs often are). All faults need to be detected eventually, though, so that they can be repaired before more faults occur and compound into a broken system.

There are typically very tight timing requirements for fault tolerance, and significantly looser timing requirements for detection. As a result, you can often solve them differently. In a 3 string system with voting, it's often the case that the median signal is used for control without any interpretation of "goodness". That strategy works fine for short time periods of fault tolerance, as two strings would have to produce bad signals for the system to be affected. Separate from the median voting control path, you would then have a variety of consistency checking algorithms looking at the three signals and trying to intelligently determine whether any of the strings have failed. Those algorithms are often stateful and complicated, and rely on heavy filtering to avoid false positives.
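A rough Python sketch of that split: a stateless mid-value select on the control path, and a separate, slower, stateful detector. The threshold and persistence count are arbitrary assumptions, and real consistency monitors are far more elaborate than a consecutive-sample counter.

```python
from statistics import median

def select(a: float, b: float, c: float) -> float:
    """Control path: mid-value select with no judgement about which string is good.
    A single bad string can only pull the output within the range of the good two."""
    return median([a, b, c])

class MiscompareDetector:
    """Detection path, kept separate from control: a string is declared failed only
    after it differs from the mid value by more than `threshold` for `persistence`
    consecutive samples (crude filtering against false positives)."""

    def __init__(self, threshold: float, persistence: int):
        self.threshold = threshold
        self.persistence = persistence
        self.counts = [0, 0, 0]

    def update(self, values: tuple) -> list:
        mid = median(values)
        failed = []
        for i, v in enumerate(values):
            # count consecutive miscompares; any good sample resets the count
            self.counts[i] = self.counts[i] + 1 if abs(v - mid) > self.threshold else 0
            if self.counts[i] >= self.persistence:
                failed.append(i)
        return failed

det = MiscompareDetector(threshold=0.5, persistence=3)
for s in [(1.0, 1.1, 9.0), (1.0, 1.0, 9.0), (1.1, 1.0, 9.0)]:
    print(select(*s), det.update(s))
# 1.1 []
# 1.0 []
# 1.1 [2]
```

The bad string never affects control, and it only gets flagged once the miscompare has persisted; a one-sample glitch resets the counter.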

When a fault is detected, at minimum it needs to be communicated to an operator. In some cases, the detected fault will also trigger a "fault response", i.e. disabling the offending computer.

In this case, it sounds like maybe a fault detection algorithm had a false positive that disabled the computer, and the same algorithm was running on all three computers.

Despite there being 3 computers, it doesn't sound like this is a 3 string voting system. Rather, each of the three computers is independently able to control the system. The 3 strings exist for redundancy rather than for fault tolerance. Fault tolerance is provided by having two computers that cross-check everything they do, and the third computer is there so that the cross-checking is fault tolerant. Two string redundancy is very common in automotive and aerospace.
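A minimal sketch of that redundancy-not-voting arrangement: no vote, just a priority chain of self-checking computers, with the first healthy one in command. The names `PRIM1`..`PRIM3`, their priority order, and the fallback label are illustrative assumptions. It also shows why a common-mode false positive is so nasty: redundancy only helps if the computers fail independently.

```python
from typing import Optional

# Hypothetical names; each computer is assumed to be internally self-checking
# (two channels cross-checking each other) and to report only "healthy or not".
PRIORITY = ("PRIM1", "PRIM2", "PRIM3")

def computer_in_command(healthy: dict) -> Optional[str]:
    """First healthy computer in priority order, or None if none is left."""
    for name in PRIORITY:
        if healthy.get(name, False):
            return name
    return None

def control_mode(healthy: dict) -> str:
    active = computer_in_command(healthy)
    return f"normal ({active})" if active else "fallback (direct law)"

print(control_mode({"PRIM1": False, "PRIM2": True, "PRIM3": True}))   # normal (PRIM2)
# The same monitor tripping in all three at once empties the chain in one go:
print(control_mode({"PRIM1": False, "PRIM2": False, "PRIM3": False}))  # fallback (direct law)
```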


Indeed, it seems like the computers were not in agreement:

> the combination of a high COM/MON channels asynchronism

In this case, since the automation failed, the plane reverted to "Direct law", where the pilot's inputs directly control the plane, instead of passing through computer checks first.


How can you tell the difference between 1/3 disagreeing and 2/3 disagreeing?

If 2/3 disagree, don't they just agree with each other, and therefore it still only looks like 1/3 disagreeing?


1/3 disagrees - two sets of outputs are converging, one is diverging

2/3 disagrees - all 3 sets of output diverge from each other; there is no clear course of action
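To make the distinction concrete: with a tolerance, you look at which pairs agree. One agreeing pair means a single outlier; no agreeing pair means there's no majority to follow. The tolerance value is an assumption, and the "ambiguous" case (A close to B, B close to C, A far from C) is something analog signals add that a simple 1/3 vs 2/3 framing glosses over.

```python
from itertools import combinations

def classify(outputs: list, tolerance: float) -> str:
    """Classify three redundant outputs by which pairs agree within tolerance."""
    agreeing = [(i, j) for i, j in combinations(range(3), 2)
                if abs(outputs[i] - outputs[j]) <= tolerance]
    if len(agreeing) == 3:
        return "all agree"
    if len(agreeing) == 1:
        # two channels converge; the one not in the agreeing pair is the outlier
        outlier = ({0, 1, 2} - set(agreeing[0])).pop()
        return f"channel {outlier} disagrees"
    if not agreeing:
        return "no majority"
    # A agrees with B and B with C, but A and C are too far apart
    return "ambiguous"

print(classify([1.0, 1.02, 3.0], tolerance=0.1))  # channel 2 disagrees
print(classify([1.0, 2.0, 3.0], tolerance=0.1))   # no majority
```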


The 2020 should be removed from the title. The article is mostly new information from the final report:

> On Sep 3rd 2021 Taiwan's ASC released their final report in Chinese and their English Executive Summary concluding the probable causes of the incident were: …


Reminds me of the Ariane 5 maiden flight: redundant hardware does not protect against unsanitized inputs. I don't know much about the design of Airbus aircraft, but I think it's sensible to specify that the last primary computer runs in a different mode than the first two, simply because a double hardware failure is very, very unlikely. So it boils down to either a software bug or unsanitized inputs. Assuming that, the last computer should provide some basic but safe functions.


I've seen bad user input, one very unusual char, bring down a whole self-healing geographically redundant cluster of machines running an expensive commercial cluster software. Any connected devices can bring each other down.


That sounds like a great war story if you’re open to telling it :)


Story, story, story, story!


Well, that’s kind of what happened. The primary flight computers all failed, so the plane fell back to “direct law”, which means pilot input is directly transmitted to control systems without a computer attempting to interpret intent or protect against pilot error.

As a consequence a lot of the more advanced functions became disabled because they rely on the primary flight computer to indicate when it’s safe for them to operate. Presumably because accidental operation of those features in flight is extremely dangerous.


Why can't thrust reversers and spoilers be engaged without computer supervision? The article just says that ground spoilers require one of the three flight computers to remain operative, autobrake needs two, reversers needs one out of two specific units to be running to unlock.


All flight controls on the A330 are fly-by-wire - they're connected via computers. The most critical surfaces have secondary, dumb controllers that can take over if the primary flight computers fail, but less critical systems may not be connected to those backup systems - it's expected that a triple fault of the primary flight computers should be very rare, and manual braking is available as a backup after all.


Right. In this case we have a completely-unprecedented triple failure of the primary flight control system causing loss of auto-brake, spoilers and thrust reversers; a wet runway with less than expected braking action; a tailwind landing and still a happy outcome.

Honestly defense-in-depth seems to be working OK here.


And note that for all aircraft the landing distance is calculated _without_ thrust reversers or spoilers, and thus is based on braking performance alone.

Some of the challenge here was the seconds spent at near landing speed without any braking, after the autobrake failed and before manual braking began, as that'll eat up runway distance fast.
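Why those seconds matter, as idealised kinematics: every second spent at touchdown speed before braking starts adds speed × time to the ground roll, on top of the braking distance v²/2a. The deceleration value is an arbitrary assumption standing in for wheel brakes alone (no reversers or spoilers), not the A330's actual performance on that runway.

```python
FT_PER_S_PER_KNOT = 6076.12 / 3600  # 1 kt = 1 nmi/h, 1 nmi = 6076.12 ft

def stopping_distance_ft(speed_kt: float, delay_s: float, decel_ft_s2: float) -> float:
    """Idealised ground roll: constant speed for delay_s with no braking,
    then a constant-deceleration stop (v^2 / 2a)."""
    v = speed_kt * FT_PER_S_PER_KNOT
    return v * delay_s + v ** 2 / (2 * decel_ft_s2)

# decel_ft_s2 = 8.0 is an arbitrary illustrative value, NOT from the report.
for delay_s in (0, 2, 4):
    print(delay_s, round(stopping_distance_ft(140, delay_s, decel_ft_s2=8.0)))
```

Whatever deceleration you plug in, at 140 kt each second of delay costs roughly 236 ft before the brakes even start working.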


It's also worth stating (for the OP) that thrust reversers are potentially really dangerous -- if they deploy in flight without being commanded to, the aircraft can -- or, more likely if the engine computer does not detect it and shut itself down, will become barely controllable, with previous fatal outcomes. [1]

Two or more FCPCs failing is strong enough evidence that a "faecal fan incident" may be occurring that the risk of deploying them is just not worth it, especially as (as the parent points out) they are effectively optional equipment and the FAA requires you to assume they are inoperative when planning.

What they do save is fuel and time -- taxi time to the terminal, permit the use of high-speed runway turnoffs, etc.

[1] https://en.wikipedia.org/wiki/Lauda_Air_Flight_004


[flagged]


Hell yeah downvote me for calling out despicable behavior. These are the best kinds of downvotes.


Like if this happened to a flight in Florida, we would be making Florida Man memes and shit. It's clear there is a double standard here.


Yes, it worked because there was a direct controller to fall back to. But it's worrying for two reasons. First, it wasn’t really a triple failure of the redundant flight computers; it seems to point to a single failure of the monitoring logic.

The other thing is that the report notes the aircraft came perilously close to a disaster, calling out other specific factors that could easily have eaten up the remaining runway margin.


Based on the article it seems this is a safety feature to prevent the accidental activation of those functions in flight.

I imagine there’s zero reasons for using ground spoilers or reverse thrusters in the air, and doing so would probably cause a complete loss of aircraft.

So to make sure that can’t happen, the functions require a positive “it’s safe to activate now” signal from the flight computers.

So the functions themselves are quite capable of operating without the flight computers. They simply refuse to do so unless a flight computer has given them the all clear.
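A sketch of that default-deny idea, using the requirements quoted from the article a few comments up (ground spoilers need one of the three flight computers, autobrake two, reversers one of two specific units). Which two units unlock the reversers isn't stated, so here they're just a count; the field names and the on-ground gate are my assumptions. Losing the computers means losing the permission, so the failure biases toward "not deployed".

```python
from dataclasses import dataclass

@dataclass
class Status:
    prims_healthy: int           # primary flight computers still running (0..3)
    reverser_units_healthy: int  # of the two units able to unlock the reversers (0..2)
    on_ground: bool              # hypothetical ground signal

def permitted(s: Status) -> dict:
    """Default-deny interlock: a function is allowed only with a positive go-ahead.
    Counts follow the article as quoted above; gating on on_ground is an assumption."""
    return {
        "ground_spoilers": s.on_ground and s.prims_healthy >= 1,
        "autobrake": s.on_ground and s.prims_healthy >= 2,
        "reversers": s.on_ground and s.reverser_units_healthy >= 1,
    }

print(permitted(Status(prims_healthy=0, reverser_units_healthy=0, on_ground=True)))
# {'ground_spoilers': False, 'autobrake': False, 'reversers': False}
```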


Japan Airlines Flight 350 accident in 1982: at 164 ft/50 m AGL, the captain, suffering a schizophrenic attack, disengaged the A/P, activated the reversers, and pushed his control column hard forward. 24 deaths, 150 survivors.

ugh.


> I imagine there’s zero reasons for using ground spoilers or reverse thrusters in the air, and doing so would probably cause a complete loss of aircraft.

It has, on several occasions, although I'm not aware of any where it was a commanded deployment of reversers in flight.

https://en.wikipedia.org/wiki/Lauda_Air_Flight_004 https://en.wikipedia.org/wiki/TAM_Transportes_Aéreos_Regiona...


Like I said, zero reasons for using reverse thrusters in the air. Complete loss of aircraft if you do.


Actually, the spoilers can be used to varying degrees during flight, especially when the plane is descending and needs to slow down as well.


Accidental deployment of thrust reversers is a bad thing and incidents of that kind have killed a lot of people. It seems better to bias the failure modes toward non-deployment. The reversers are not critical systems, and aircraft are permitted to operate even if their reversers are not operational. They mainly save wear and tear on the tires and brakes.


In other words, it's more important to ensure they don't deploy when they shouldn't, than that they do deploy when requested.


This was one of the reasons why TAM 3054 crashed in São Paulo.


> The crew applied maximum manual braking and managed to stop the aircraft 10 meters/33 feet ahead of the runway end (runway length 2600 meters/8530 feet).

Hope the crew had their brown pants on.


From the cockpit, 33’ must have looked right under their nose.


Songshan's not a large airport, it's mostly used for domestic flights, and a few nearby international destinations. Single runway, immediate obstacles on both ends (no matter which way you land). Although, runway 10 is eastbound & it's just a fence and small parking lot, minor traffic, and then you land in the river.

Sounds like it would have been stressful.


I wonder how the pilots would have reacted if the computers had crashed 10-20 seconds earlier. Would they have landed or would they go up again to wait for reboot?


Difficult to say, since the computer crash was linked directly to having just touched down. 10-20 seconds before, there was no disagreement, because none of the computers had detected that the wheels were on the ground.


I wonder what would have happened if they tried to go around? The TO/GA is configured ahead of time in the FMS… so if the flight computers are out, I wonder if there’s some hard-coded performance targets for the engines in a go around situation.


There was barely a second between the first indication of a primary computer fault and the call for reversers to deploy. Once you select reverse thrust, you cannot go around under any circumstances.


It’s an interesting scenario. Your wheels touch the ground; you engage reverse thrust and it instantly fails; you are rolling at dangerous landing speed but now can’t go around?

Relatedly, I wonder why the pilots did not engage the regular brakes ASAP.


Not quite, reading around it seems that flight planning requires you to assume that reverse thrusters and ground spoilers don’t work.

So you land, your reverse thrusters immediately fail, and you now have to stop with your manual brakes alone, consuming the braking distance you originally planned for, with presumably a bit of margin on top.


They would push throttle manually in max power and perform the maneuver by hand.


This could probably use a 2020 in the title


Final report was just issued this week.


Maybe explains why there were only 87 pax in a 330 too.


I note there was a tailwind serious enough to reduce braking performance, and also that it is possible to land in the opposite direction on the runway in question. Is the practice of landing into the wind ignored within certain parameters?


In general, aircraft are certified to land/takeoff with tailwinds (and crosswinds, etc.) up to a certain speed. You don't want to be changing runway direction all the time, after all.

Ultimately, it's up to the crew on board to determine the landing distance required for the current conditions (including the wind—whatever direction that is!), and if they don't believe the conditions are safe they should not proceed with the landing. They can, of course, request the opposite runway, though obviously if there is frequent traffic ATC is likely to refuse unless they want to change the direction for all traffic.


> The root cause was determined to be an undue triggering of the rudder order COM/MON monitoring concomitantly in the 3 FCPC. At the time of the aircraft lateral control flight law switching to lateral ground law at touch down, the combination of a high COM/MON channels asynchronism and the pilot pedal inputs resulted in the rudder order difference between the two channels to exceed the monitoring threshold. The FCPC1 failed first.

In lay-programmers terms, this sounds like a race condition. Is that correct?
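Whatever you call it, the mechanism in the quote can be sketched as a toy Python model: the two channels compute the same (hypothetical) rudder law, but MON works from inputs slightly older than COM. With the pedal still, any skew is harmless; with the pedal moving, the difference is roughly gain × pedal rate × skew, and a large enough skew pushes it over the threshold. Every number here (gain, rate, skew, threshold) is invented, and the real trigger also involved the switch to ground law at touchdown, which this ignores.

```python
from typing import Optional

def rudder_order(pedal_deg: float) -> float:
    # Hypothetical control law: rudder order proportional to pedal deflection.
    return 1.5 * pedal_deg

def pedal_at(t: float, rate_deg_s: float) -> float:
    # Hypothetical pilot input: pedal pushed at a constant rate from t = 1.0 s.
    return max(0.0, rate_deg_s * (t - 1.0))

def first_trip(skew_s: float, rate_deg_s: float, threshold_deg: float,
               dt: float = 0.01, duration_s: float = 2.0) -> Optional[float]:
    """First time the COM and MON rudder orders differ by more than threshold_deg,
    with MON computing from inputs skew_s seconds older than COM; None if never."""
    for k in range(int(duration_s / dt) + 1):
        t = k * dt
        com = rudder_order(pedal_at(t, rate_deg_s))
        mon = rudder_order(pedal_at(t - skew_s, rate_deg_s))
        if abs(com - mon) > threshold_deg:
            return t
    return None

print(first_trip(skew_s=0.05, rate_deg_s=0.0, threshold_deg=1.0))   # None: skew, no pedal
print(first_trip(skew_s=0.01, rate_deg_s=20.0, threshold_deg=1.0))  # None: pedal, small skew
print(first_trip(skew_s=0.05, rate_deg_s=20.0, threshold_deg=1.0))  # trips shortly after t = 1.0
```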


Not quite. The computers disagreed; they didn't race. A likely cause might be that the sensors responsible for detecting touchdown were rusty or had failed. The computers aren't racing towards ground law here; they switched to ground law at different times, sufficiently far apart that the monitoring watchdog turned them all off for commanding the aircraft to do three different things.

Think of it more like a website hosted by three servers with a replicated database. A monitoring tool checks that the number of entries in each server's database is the same. If one server has a different number, the tool turns it off, and it now requires a human to bootstrap the database again. If two servers disagree, the website goes offline, because it's no longer possible to tell which server has correct data, and not operating is cheaper than the man hours required to puzzle the data back together.

In this case, a likely cause might be that the network latency between the three servers suddenly got very large (cable damage, maybe?) and now all three no longer agree on the dataset. The servers are shut down.


Feels like there might be more to the story than is being let on.

Curious why the PF has to specifically ask before landing for the spoilers to be called, to tell if he's touched down. AFAIK it's actually common Airbus procedure to call out spoilers, reversers and deceleration after landing. Also, telling whether or not you've touched down isn't really a problem, even for the smoothest/softest of landings. Then there's the PM calling out "center line" at 130 feet...


Commercial piloting is hours of boredom with few moments of raw sheer terror.


Odd that there is no information from Airbus there.

FYI: the comment section contains some interesting background.


Well there is a single bullet point from/about Airbus

> 7. Following the occurrence, Airbus reviewed its in-service experience, and confirmed that no other triple PRIM fault at touchdown event had been reported on A330/A340 aircraft family since entry into service. The A330/A340 fleet fitted with electrical rudder has accumulated 8.7 millions of Flight Cycles and 44.3 millions of Flight Hours (in-service data from April 2020).


So, does this mean that the outcome was, Airbus reviewed the situation and the data, and made no changes or corrections to the system?


There will be a software update.

Read page 9 of the exec summary linked in the article.


Good thing they failed on touchdown and not up in the air.


Airbus: what is it doing now?


Good thing that this was on a smaller airplane, otherwise it could have been a larger disaster.



