Ah yes, the Fundamental Failure Mode Theorem[0] at work: complex systems usually operate in failure mode.
Or, "That's why I'm skeptical of people who look at some catastrophic failure of a complex system and say, "Wow, the odds of this happening are astronomical. Five different safety systems had to fail simultaneously!" What they don't realize is that one or two of those systems are failing all the time, and it's up to the other three systems to prevent the failure from turning into a disaster." [1]
See also United 232 http://en.wikipedia.org/wiki/United_Airlines_Flight_232. The DC-10 has three separate hydraulic systems, so the probability of all three failing is p^3, right? But wait, the lines are all grouped together through the tail, meaning that if one is severed there, the other two are likely to be severed as well.
You might conclude from this that airliner designers are fools. An airliner is a very complex machine, and can have unexpected interactions between its components. Look at the other side:
1. the airframe held together despite an explosion at the back. The rudder and horizontal stabilizer stayed on.
2. the aircrew figured out how to control the airplane with no hydraulics, i.e. there was still some redundancy in the system.
3. the landing gear was designed so it could be extended and locked with no hydraulic power, and that worked
4. if the airplane or aircrew had been any less capable, nobody would have survived
5. electric power stayed on
And, airframe companies learn from these disasters, which is why airplane travel is incredibly safe. Boeing airliners, for example, do not locate critical components inline with the turbines. Hydraulic lines do not extend past the inboard engine. There are a number of other improvements as well.
Having worked on 757 flight controls for three years, I can assure you that none of the engineers want any part of a defective design. None want to make any decisions that lead to a smoking hole in the ground. An awful lot of effort is spent poring over the designs again and again looking for mistakes.
I don't know if they existed at design time, but hydraulic fuses could have prevented draining of all the fluid. That makes the design take longer, increases weight, introduces new scenarios such as the fuse activating when it shouldn't, increases cost, and increases maintenance & parts (cost of ownership).
As always with these matters, there is no one true correct answer, but rather a very complicated set of tradeoffs and probability estimates. In hindsight it is easy to see designs as defective, but they could all have been done in good faith.
No offense meant to aircraft designers; my point is just that it's a common fallacy to assume that probabilistic events are independent when they are in fact dependent in some ways. In the case of debris penetrating the tail, if one hydraulic line is severed, the probability of the other two also being severed is high. Therefore the probability of a triple failure is not equal to p^3; you must add to that the probability of this single catastrophic event occurring.
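To put rough numbers on the dependence point, here is a minimal sketch. The values of `p` and `q` are made up purely for illustration (they are not real DC-10 figures), and the model assumes one shared event that takes out all three systems at once, with independent failures otherwise.

```python
def p_triple_independent(p: float) -> float:
    """Probability that three independent systems all fail, each with probability p."""
    return p ** 3


def p_triple_with_common_cause(p: float, q: float) -> float:
    """Probability of losing all three systems when a single shared event
    (probability q) knocks out all of them, and otherwise they fail independently."""
    return q + (1 - q) * p ** 3


p = 1e-3  # hypothetical per-flight failure probability of one hydraulic system
q = 1e-7  # hypothetical per-flight probability of the common-cause event

independent = p_triple_independent(p)
dependent = p_triple_with_common_cause(p, q)

print(f"independent estimate: {independent:.3e}")
print(f"with common cause:    {dependent:.3e}")
print(f"ratio:                {dependent / independent:.1f}")
```

With these assumed inputs the common-cause term dominates: the naive p^3 estimate understates the risk by roughly a factor of a hundred, even though the shared event is itself very rare.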
That post from The Old New Thing[1] makes a fascinating point. Something like that ought to be required reading for anyone who does anything even vaguely engineering-like.
- what is the total number of failures that can happen?
- what is the probability of each of those failures occurring?
- how many sets of those can combine into a catastrophic failure?
And then from those numbers you can derive the probability of a catastrophic failure occurring.
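One way to sketch that derivation is to list the sets of failures that are each sufficient to bring the system down (fault-tree people call these cut sets) and sum the probability of every state in which at least one set has fully failed. The component names and probabilities below are hypothetical, and the calculation assumes the components fail independently.

```python
from itertools import product


def p_system_failure(p_fail, cut_sets):
    """Probability that at least one cut set has every member failed.

    p_fail:   dict mapping component name -> failure probability
    cut_sets: list of sets of component names; if all members of any one
              set fail together, the whole system fails
    Assumes components fail independently.
    """
    names = list(p_fail)
    total = 0.0
    # Enumerate every combination of working/failed components (2^n states).
    for state in product([False, True], repeat=len(names)):
        failed = {name for name, is_failed in zip(names, state) if is_failed}
        if any(cut_set <= failed for cut_set in cut_sets):
            prob = 1.0
            for name, is_failed in zip(names, state):
                prob *= p_fail[name] if is_failed else 1.0 - p_fail[name]
            total += prob
    return total


# Hypothetical numbers: A and B back each other up, C is a single point of failure.
p_fail = {"A": 0.1, "B": 0.2, "C": 0.5}
cut_sets = [{"A", "B"}, {"C"}]
result = p_system_failure(p_fail, cut_sets)
print(f"P(system failure) = {result:.2f}")  # 1 - (1 - 0.1*0.2) * (1 - 0.5) = 0.51
```

Brute-force enumeration only works for a handful of components, but it keeps the arithmetic visible.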
Yes, but the thing that the Fundamental Failure Mode Theorem tries to draw attention to is "what is the probability of those failures occurring".
i.e. it's possible that some of those failures have already occurred, and you just haven't noticed because the redundant systems are being redundant and preventing the overall system from failing catastrophically. Or you have noticed, but think that the redundant systems are sufficient, not realising how much closer they bring you to a single point of failure. So the probability of your whole system failing is higher than you'd expect, because you already have failures which you think are p < 1 (possibly p << 1) but are actually p = 1.
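A toy calculation of that effect, assuming five independent safety layers that must all fail for a catastrophe and a made-up per-layer failure probability of 0.01:

```python
from math import prod


def p_all_layers_fail(layer_probs):
    """Probability that every safety layer fails, assuming independent layers."""
    return prod(layer_probs)


p = 0.01  # hypothetical failure probability of a single layer

# What the designers computed: five healthy layers.
as_designed = p_all_layers_fail([p] * 5)

# What is actually flying: two layers have already failed unnoticed (p = 1).
as_operated = p_all_layers_fail([1.0, 1.0, p, p, p])

print(f"as designed: {as_designed:.1e}")
print(f"as operated: {as_operated:.1e}")
print(f"ratio:       {as_operated / as_designed:.0f}")
```

Under these assumptions two silent failures make the catastrophe about ten thousand times more likely than the design figure suggests, while nothing visibly changes from the outside.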
In the case of the Gimli Glider, they had two independent FQIS systems and a floatstick in case of a single failure, and a rule that the airplane was non-serviceable in case of both failing.
On the flight in question, one FQIS was non-serviceable. The second was serviceable but had been switched off, but due to a miscommunication it was thought that the no-fly rule had been overridden and the plane was OK with a floatstick measurement. Further, if the fuel calculation from the floatstick measurement had been correct, they would have refueled the plane and no-one would ever have heard about Air Canada Flight 143 because everything would have been fine.
The problem was that they were knowingly operating in a failure mode, without either FQIS and disregarding the no-fly rule, and thinking that the floatstick measurement was sufficient. Therefore it only needed one further failure - miscalculating the amount of fuel from the floatstick - to bring about disaster.
Well, we can observe that the probability of a catastrophic failure in an airliner is in fact extremely low, since they happen extremely infrequently. So even if systems are operating in a failure mode, there appears to still be enough redundancy left to lower the odds enormously.
I sort of take the opposite approach with comments that the odds of such a failure are astronomical. Given how safe airliners are these days, anything that causes a bad emergency with one must be an extremely unlikely event. If it weren't, more airliners would crash than actually do. I've seen this going around with MH370, for example. People will dismiss an idea for what caused the disappearance with a comment that such an event is extremely unlikely. Well sure, pretty much by definition, whatever caused it has to have odds of something like a billion to one.
I'd also like to bring up a minor quibble, in that Air Canada 143 did not end in disaster. It certainly came close, and should be regarded as a serious incident with lessons to be learned, but ultimately everybody survived and the airplane was returned to service, precisely because there weren't quite enough failures to cause a disaster. Various things went wrong, but the pilots managed to stop the chain of events by successfully responding to the in-flight emergency. The ability of the airplane to continue flying and somewhat functioning after fuel exhaustion is a type of redundancy, and it ultimately saved it.
Transat 236 is an amazing story. How lucky were they to run out of fuel, out in the middle of the Atlantic, but be able to glide to the Azores?
If it had had a few hundred pounds less of fuel, it would have ditched in the middle of the ocean and likely many of the passengers would have died.
"the cockpit warning system sounded again with the "all engines out" sound, a long "bong" that no one in the cockpit could recall having heard before and that was not covered in flight simulator training."
I hope that's something that IS now covered. Why would you never train for that scenario?
Mine did, actually. My instructor used a clipboard to block my field of view, and walked me through coming to a controlled stop with minimal visibility.
In Germany, part of the training in many schools involves getting to the nearest autobahn and going at least 120mph to teach you driving at such speeds. Yes, there are training courses which don't do that, but it's not uncommon.
From what I've heard the driving/licensing test in Germany is actually a test - in the United States the written test questions show a picture of a stop sign and ask what it means (there are then three multiple choice options).
Yeah, it involves quite complicated questions about driving (a diagram with yourself, a pedestrian, a cyclist, a tram, and 3 other cars - you have to specify who goes in what order), and also technical questions, like "your temperature gauge is going into the red zone - what do you do?" or "on a winter day your windows get foggy from inside when driving - what is the safest solution?", and also general questions about laws: "what is the BAC limit?", "how fast can you go on the autobahn when the conditions are poor?" (answer: 130km/h (~80mph)).
While I feel that US driver education and testing has many faults, I don't want everyone to have the impression that the test is as simple as 'What does a STOP sign mean?'. Hyperbole is good for humor but can be misleading in a discussion about actual solutions.
While it has been some time since my driver's test, I remember somewhat complicated questions about right-of-way, dealing with vehicle problems, handling adverse driving conditions, and responding to potential accident situations. Of course, part of the problem I have with our driver education system is how much it varies among the states. Most of what is shared deals with the operations of highway driving, while I think there need to be more requirements for overall safe driving.
Also, I was a little incredulous at first that 80 mph is the answer for 'poor' conditions. But, I suppose I have driven on 80 mph interstates here under extremely heavy rain while everyone maintained the speed limit. I suppose it's just one of those things where we have a different sense of scale.
While it may have been in isolation, the "What does a STOP sign mean?" question is a question that I actually had on my written test.
I think it was phrased a bit differently, but that was the essential question. The multiple choice answers were also extremely leading with one being even more obviously correct than you would expect.
In California you can also take the test three times in a row, and if you fail every time, you can take it another three times immediately if you pay $20. Since there aren't many questions, it's basically impossible to fail.
U.S.? If so, me too, and I think we have horribly insufficient driving instruction here. Maybe it varies by state, but yeah, we didn't go on the freeway, I didn't really learn how to parallel park, I didn't learn anything about highway etiquette (e.g. keep right unless passing), etc.
I wish we were required to have many, many more training hours before being licensed, especially in the colder climates with icy conditions on the regular.
When I first had my license, I ran up the back of someone. After checking that the car wasn't too badly damaged and exchanging details I decided to drive the car the rest of the way home.
Was cruising along the freeway, when the bonnet (hood) flipped up. Thankfully, I could see through the gap, and managed to pull the car over safely.
Hoods flipping up is a comparatively common event in rally races. I've read that many teams specifically design the hood to leave a large gap if it flips up, thus allowing the driver to continue down the stage. This happened to a top driver at the latest Sno*Drift rally here in the US, and the video is pretty fun to watch.
I actually remember them saying that hoods are designed to leave a small gap of visibility at the lowest part of the windshield when they fly up. I'm tall, and the instructor pointed out that I'd have to lean over/down to be able to see through it.
But I can think of a dozen other emergency situations they didn't cover.
If any of you are really interested in these incidents, they have recently been added to Netflix under the title "Air Disasters", and you can find many more on YouTube from https://www.youtube.com/user/aircrashofficial and similar uploaders!
It is the pilot's responsibility to ensure that there is enough fuel for the flight, so that's no big surprise, especially since the fuel gauges were inoperative and they did not attempt to verify fuel level any other way. They weren't the only ones at fault, though.
No, it wasn't clearly the pilot's responsibility. FTA:
[The safety board] noted that Air Canada "neglected to
assign clearly and specifically the responsibility for
calculating the fuel load in an abnormal situation,"[4]
finding that the airline had failed to reallocate the
task of checking fuel load that had been the
responsibility of the flight engineer on older
(three-crew) aircraft. The safety board also said that
Air Canada needed to keep more spare parts, including
replacements for the defective fuel quantity indicator,
in its maintenance inventory, as well as provide better,
adequate training on the metric system to its pilots and
fuelling personnel.
The way around blaming them is that it has a horrible systemic effect to do so. In the future pilots are less likely to admit fault, less likely to provide details in investigations and so the organization has less data to learn with.
Instead look at what happened and say "if things had been different this couldn't have happened" and then make those things a reality. Maybe the answer is putting a sticker with metric/imperial conversion on the tank or something, so pilots aren't confused when checking with dipstick.
> The way around blaming them is that it has a horrible systemic effect to do so.
That's why it's a good idea in complex systems to pre-assign to each requirement a responsible party. Otherwise, you could just use the "systemic effect" argument to either blame every party or no party in the system, neither of which is very useful.
The fuel requirement was pre-assigned (for every other model of plane in the Air Canada fleet) to the flight engineer. Who that requirement was assigned to for this one model of plane wasn't really clear in the article.
It was never explicitly said. The plane used to have a 3-person crew and had moved to a 2-person crew (missing the flight engineer). This left the check unassigned. It is mentioned in the 2nd paragraph under the Investigation header.
If you are running an organization that deals with a lot of complexity (airlines, web systems etc.), it's generally not a good idea to blame anyone but the system. If you look at everything systemically, then the organization continually learns. You have to trust the people in the system to be coming to work in good faith, if you can't do that you have other issues.
I can see where it looks that way, but I don't think the effects are like that in practice. From the Wikipedia page, it appears that the career effects of this for the pilots involved were fairly minor, given the mitigating circumstances.
I don't really follow. How are you going to recommend changes which help the pilots make correct decisions if you don't say that this event occurred because these pilots didn't make a correct decision?
You trust that the pilots made the best decision they could given the data they had, and you find a way to give them better data next time. If they took a known risk, you ask why they thought it was ok to take that risk.
If you think they were truly acting in bad faith, you have other issues.
Sorry, maybe I wasn't being clear. I don't think they are the same thing, my point is more that if you think someone is acting in bad faith you should fire them, but assuming they aren't, you shouldn't assign the fault to them, because then the only solution is "just do better next time." The organization learns nothing if you assume the pilot is at fault. The entire point here is that the pilot is not at fault, which leaves the door open to fixing the system. It's all about enabling organizational learning.
That makes no sense. Assigning fault to the pilot could mean that your training is inadequate, or that your rules for rest or drugs aren't sufficient, or that they need to be more explicitly empowered to resist external schedule or financial pressures, or any number of other things.
There's a wide range of reasonable "it's the pilot's fault" judgments which don't involve firing them or just shrugging your shoulders and saying "do better".
I guess maybe this is just an issue of semantics. I don't view the pilot being at fault if the training is inadequate, I view the training as being at fault.
I was at that airfield last summer for a cadet course (I'm an air cadet) and I'll be going again this summer to get my glider pilot's licence; just like the pilot of the Gimli Glider did so many years ago! I can't wait!
The pilot (or co-pilot), I don't recall, was also an Air Cadet at that airfield, which is how he knew that it was there, a good long strip, and well maintained.
I hope the Americans don't use this as an excuse to stick with their old school system. I guess in one way they are closer to the Kingdom of England than us Canadians are.
In the aviation world it's really mixed. ICAO uses imperial and metric measurements for different things. Feet are used for altitude and knots for speed (with nautical miles for distance), but temperature is always measured in centigrade.
It's convenient to use feet because planes are stacked in 500' increments and not 152.4m increments. VFR (visual flight rules) flights are usually on the 500's (eg. 3500', 4500', etc. depending on heading) whereas IFR (instrument flight rules) flights are on the 000's (4000', 5000', etc.).
Knots are convenient because 1 knot is 1 nautical mile per hour, and 1 nautical mile is equal to 1 minute of arc (1/60 of a degree) along a great circle. If you're flying anywhere far away this ends up being important, as a great circle is the shortest route between any two places.
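The arc-minute relationship makes great-circle distances easy to compute: find the central angle between the two points, convert it to arc minutes, and that is the distance in nautical miles. A minimal sketch using the haversine formula, assuming a perfectly spherical Earth (the function name is just for illustration):

```python
import math


def great_circle_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two points given in
    degrees, treating the Earth as a perfect sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula for the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    central_angle_deg = math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))
    # 60 arc minutes per degree, one nautical mile per arc minute.
    return central_angle_deg * 60.0


# One degree of longitude along the equator is 60 arc minutes.
print(round(great_circle_nm(0.0, 0.0, 0.0, 1.0), 3))
# Equator to pole is 90 degrees of arc.
print(round(great_circle_nm(0.0, 0.0, 90.0, 0.0), 3))
```

No Earth radius appears anywhere in the calculation, which is the convenience being described: at a given speed in knots, the flight time falls straight out of the angle.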
Oddly enough, the metric system is useful with temperature because the standard lapse rate is 2 degrees per 1000' of altitude. So if you had to climb from 6000' to 8000' on an IFR flight plan, you would usually drop 4 degrees centigrade, which might be significant if it was raining out and it dropped below freezing. Having water on your wings and climbing up to an altitude where it's freezing is going to make you have a really bad day.
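The lapse-rate arithmetic above is simple enough to sketch. This uses the rule-of-thumb 2 degrees per 1000 feet quoted in the comment; the starting temperature of 3 degrees is a made-up example, and real atmospheres deviate from the standard figure.

```python
LAPSE_RATE_C_PER_1000_FT = 2.0  # rule-of-thumb figure quoted above


def temp_after_altitude_change(temp_c, from_ft, to_ft):
    """Estimated outside air temperature after climbing or descending,
    assuming the standard lapse rate holds between the two altitudes."""
    return temp_c - LAPSE_RATE_C_PER_1000_FT * (to_ft - from_ft) / 1000.0


# Hypothetical: 3 C and raining at 6000 ft, cleared to climb to 8000 ft.
temp_at_8000 = temp_after_altitude_change(3.0, 6000, 8000)
print(temp_at_8000)                         # -1.0
print("icing risk:", temp_at_8000 <= 0.0)   # True
```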
Remember that we live on a sphere (or something approximating one -- it's a sphere that bulges). A great circle route actually is a straight line, it's just that on a 2D map it looks like you're constantly turning.
After reading the whole article I'm convinced that learning metric from primary school age up would have avoided this accident.
"In this case, the weight of a litre (known as "specific gravity") was 0.803 kg. (...) Between the ground crew and pilots, they arrived at an incorrect conversion factor of 1.77, the weight of a litre of fuel in pounds."
See, here's where metric sanity checks become relevant; they simply aren't possible with imperial because none of the units are compatible with each other.
In metric, I just know, without any conversion factors to memorize, that 1kg water equals 1l at room temperature. Everyone with high school education should also know that gasoline products are lighter. Now I can do a sanity check - since the weight of one liter should be less than 1kg, there's no way 1.77 is right. If two well educated people do this math, at least one of them should see the mistake.
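That sanity check can be written down in a few lines. The two factors come from the quote above; the 10,000 litre load is a made-up figure for illustration, not the actual flight's numbers.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound


def looks_like_kg_per_litre(factor):
    """Fuel is lighter than water (1 kg per litre), so a plausible
    kg-per-litre factor has to be below 1."""
    return 0.0 < factor < 1.0


correct_factor = 0.803   # kg per litre, from the quoted article
mistaken_factor = 1.77   # actually pounds per litre, from the quoted article

print(looks_like_kg_per_litre(correct_factor))    # True
print(looks_like_kg_per_litre(mistaken_factor))   # False

# The mistaken factor is the same density expressed in pounds.
print(round(mistaken_factor * LB_TO_KG, 3))

# Hypothetical load: what the crew would believe versus what is really on board.
litres = 10_000
believed_kg = litres * mistaken_factor          # read as kilograms
actual_kg = litres * mistaken_factor * LB_TO_KG  # really pounds, converted
print(f"believed: {believed_kg:.0f} kg, actual: {actual_kg:.0f} kg")
```

Using the pounds factor as if it were kilograms overstates the fuel on board by a factor of about 2.2, which is why a quick "must be below 1" check would have flagged it.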
Although imperial measurements are used in the UK in legal situations (mph/yards to next junction etc), they're actually almost never used in engineering fields. In fact, I'd say that Canadian engineers use imperial measurements far more than UK ones do. The influence of the US?
I really can't wait for us to start using a sane system. While we're at it, could we also start writing our dates in a sane way? Putting the month before the year is beyond ridiculous.
On-Topic: Anything that good hackers would find interesting. That includes more than hacking and startups. If you had to reduce it to a sentence, the answer might be: anything that gratifies one's intellectual curiosity.
Also,
Please don't submit comments complaining that a submission is inappropriate for the site. If you think something is spam or offtopic, flag it by going to its page and clicking on the "flag" link. (Not all users will see this; there is a karma threshold.) If you flag something, please don't also comment that you did.
[0] http://en.wikipedia.org/wiki/Systemantics#System_failure
[1] http://blogs.msdn.com/b/oldnewthing/archive/2008/04/16/83984...