Amazon cargo plane crashes in Texas (wsbtv.com)
361 points by Trisell 4 months ago | 297 comments

I have gained such respect for the FAA and the NTSB in that every crash is followed up with a root cause analysis that all other industries should be in awe of.

When this is all over, we'll know exactly what went wrong. We'll see the FAA issue guidance or rule changes that will ensure it doesn't happen again. And then we'll somehow see an even further reduction in airline crashes.

I hope that in a few more years, as more companies join the industry, we may see a similar pattern start to evolve in the space launch industry.

Not root cause analysis. Complex systems fail in complex ways. If you want to read more about how this is done, try Normal Accidents (Perrow), Human Error (Reason)... but you're going to land on Engineering a Safer World (Leveson) eventually. You can get a free copy from the author's web site: http://sunnyday.mit.edu/safer-world/index.html

There are tech shops that use these sorts of techniques for every major incident, and for collections of related near-misses. If you want to improve past 4 9s reliability, or if you're considering taking traffic that could wreck a life if its availability or secrecy fails, that only seems appropriate.
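For a sense of scale on "4 9s", here is a rough sketch of the arithmetic. It assumes availability is measured as a fraction of wall-clock time over a 365-day year; the function name is made up for illustration.

```python
# Illustrative arithmetic only: converts "N nines" of availability into
# the downtime budget that level allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for N nines (for example, 4 means 99.99%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Four nines works out to a bit under an hour of downtime per year, so a single long incident can use up the whole budget.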

The Navy likes to keep things simple and calls it the Swiss cheese model.[1] It’s almost never a single factor, but a combination of factors. The holes line up and an accident happens.
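A toy way to see why stacked layers help is below. It assumes each layer fails independently, which real investigations often find is not true, and the per-layer probabilities are invented for illustration.

```python
from math import prod


def accident_probability(hole_probabilities):
    """Chance that every defensive layer fails at once, assuming independence."""
    return prod(hole_probabilities)


# Hypothetical chance that each layer (say checklist, crew cross-check, alarm)
# misses a given hazard.
layers = [0.01, 0.05, 0.1]

print(accident_probability(layers))      # all three layers in place
print(accident_probability(layers[:2]))  # the same system with the alarm removed
```

Removing any one layer raises the combined probability by the inverse of that layer's miss rate, which is why a single weak slice still matters.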

I think it’s also important to emphasize that getting to this level of safety is so much more cultural than technical. The ability to be open about failure and for people to feel safe in communicating to investigators is critical. The Navy would run two parallel investigations into an incident, one focused on Safety, where anything said was confidential and couldn’t be used in the administrative investigation (FNAEB) that could result in career ending consequences.[2]

Though I worry a bit with the constant wars and punishing deployments this culture may be heading in the wrong direction. [3] One of my XOs said, “One generation of Admirals will make their names breaking the Navy, and the next will make their names putting it back together.”

I have immense respect for the FAA and NTSB. They are trying to infuse this safety ethos into the fledgling drone industry. No easy task.

1. https://en.wikipedia.org/wiki/Swiss_cheese_model
2. https://www.airwarriors.com/community/threads/my-fnaeb-exper...
3. https://www.theatlantic.com/technology/archive/2018/10/the-n...

I just wanted to reiterate that I've found the Swiss cheese model one of the simplest and most effective communication tools for thinking about and communicating to laypeople how various events happen where they want to attribute 'cause' to a single factor, decision, etc. In my experience it can be doubly difficult to get people to think in this way once culture has got them thinking differently (I despise regression analysis now, because it breeds a layperson culture of both an illusion of understanding, and that individual variables really have individual coefficients that can be simply manipulated).

My first introduction to it was watching a documentary on air crash investigation as a child, and I've never forgotten it since.

I don't even work in 'safety' per se (currently I do analytical and data work for banks and financial regulation, so one could argue that is a kind of safety), but it just keeps coming up again and again because in the real world, most big human disasters are multi-causal chains, because evolution (in a social sense) will breed out those disasters that are both big in their detriments and simple in their causes.

The JAGMAN investigation is the legal/administrative investigation.

The Safety Investigation Board is separate, and privileged.

FNAEB investigates the crew, and usually follows a Class A mishap ($2m damage or hull loss, or death/permanent disabling injury) but can be triggered by near-miss or pilot flathatting etc.

A friend who is a retired Naval aviator said the #1 reason aviators get permanently grounded from the fleet is refusal to accept responsibility for mistakes that they were clearly responsible for. Things like forgetting to set the altimeter correctly before takeoff, causing a serious near-miss or pilot deviation. Honestly and fully admitting their mistakes is more likely to result in a FNAEB returning them to the fleet. The Asoh Defense comes to mind.[0]

[0] https://en.wikipedia.org/wiki/Japan_Airlines_Flight_2#The_%2...

That’s a great example of a safety practice that nets out unsafe: better to ask why the altimeter requires setting and the plane can be started without it!

Thanks for the reminder, three investigations. It’s been a while and FNAEB was always a terrifying word that imprinted on my brain. Accepting responsibility was definitely a key part of Naval aviation.

I do wonder about balance in drone safety, though...

...for instance, Zipline is doing amazing things in Rwanda improving the medical (okay, blood) logistics system drastically. They're legitimately saving lives regularly. But what they're doing is practically illegal in the US due to the way safety regulations are put together (although they have indeed achieved a very high degree of safety and work closely with ATC, etc... it's not the wild west). Some of that is logical... Rwanda is a developing country with a great need that overcomes a lot of safety concerns. But doubtless lives could be saved if similar drones were allowed in the US.

Second: FAA regulations have already slowed the development of electric aircraft in the US which has significant climate consequences and thus can indirectly lead to lives lost...

...that all said, I think the NTSB and FAA do a good job. Particularly the NTSB.

Zipline is doing good work. I think we will get there, it’s just going to take time. Acting FAA administrator Dan Elwell gave a great speech at InterDrone last year about the safety aspect.[1] Personally, I think in some ways parts of industry are slowing down progress. State and local governments will be key players, and the FAA knows this, but industry seems to believe they shouldn’t be involved and that airspace below 400’ should be like Class E & G airspace with no rights for property owners. If the Uniform Law Commission settles on 200’ that could be a good thing. [2] Then assuming the FAA authorizes a system as safe, private property owners could conduct BVLOS flights over their property (e.g. ranches, mines). Or assuming state and local are involved, they could authorize flights over public routes. The airspace below 400’ will more likely resemble Class A airspace than anything else and will have to involve state and local for planning. No magical UTM solution will solve it alone. [3] It will be a combination of technology and operations.

[1] https://www.interdrone.com/news/dan-elwell-speaks-to-audienc...

[2] https://unmanned-aerial.com/drone-industry-responds-to-draft...

[3] https://www.utm.arc.nasa.gov/upp-industry-workshop/UTM%20PP%...

The US already has a good blood distribution system so I doubt many lives would be saved by adding drones. Our major problem is convincing enough eligible people to donate.

The US, like the rest of the world, has about a 7 percent spoilage rate. Rwanda, due to the just-in-time drone delivery network covering the ~entire country, has virtually zero. And rural hospitals in the US have more of a logistics challenge than you might think. Quality and access to care suffers a lot, and maternal mortality has actually gone up in recent years. Something like Zipline could significantly help.

>Though I worry a bit with the constant wars and punishing deployments this culture may be heading in the wrong direction.

Maybe 'constant wars' is already the wrong direction.

If anyone is interested in the book, its webpage has moved.

The new MIT Press web page is: https://mitpress.mit.edu/books/engineering-safer-world

Since the book is open access, the direct link to the book (again from MITPress webpage) is https://www.dropbox.com/s/dwl3782mc6fcjih/8179.pdf?dl=1

Engineering a Safer World was a truly career changing book for me, and I am so thankful for bts and mstone for introducing it to me.

Would you say it is highly relevant for those interested in AI safety?

What's your threat model when you say "AI safety"? Which scenarios are you attempting to prevent?

Imagine if we did that with software generally.

Every time there's a data breach, an agency would investigate with the same thoroughness they do in aviation.

Then, National Geographic could have a spin-off show called Data Breach Investigation (their aviation show is called Air Crash Investigation in the UK; I think it's called different things in different countries).

A man can dream...

We would be limited to 20 pieces of software that were very reliable.

The problem with software, which makes it so unlike hardware, is that we can use so much of it, the diversity is huge.

> ... software, ... unlike hardware, ...

Everything is software now. So much firmware (which is software), and so much hardware that cannot run without software. Your refrigerator, your dishwasher, your car, airliners -- the trend is for everything to be software now.

We can build reliable software when we want to. It just has limited scope and/or undergoes more expensive development procedures.

It is easy to build a finite state machine for a dishwasher that is fairly reliable. Likewise, consumer jet code that hits safety considerations is heavily tested and audited...and the FAA cares about it whenever it causes a plane to crash, leading to a lot of regulations on how the code is vetted.
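As a rough illustration of how small such a state machine can be, here is a sketch. The states and event names are made up, not taken from any real appliance firmware, and any event that is not expected in the current state is treated as a fault.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    FILLING = auto()
    WASHING = auto()
    DRAINING = auto()
    DONE = auto()
    FAULT = auto()


# Every legal (state, event) pair is listed explicitly; nothing else is allowed.
TRANSITIONS = {
    (State.IDLE, "start"): State.FILLING,
    (State.FILLING, "water_full"): State.WASHING,
    (State.WASHING, "timer_done"): State.DRAINING,
    (State.DRAINING, "water_empty"): State.DONE,
    (State.DONE, "door_opened"): State.IDLE,
}


def step(state, event):
    """Return the next state; an unexpected event sends the machine to FAULT."""
    return TRANSITIONS.get((state, event), State.FAULT)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state


print(run(["start", "water_full", "timer_done", "water_empty"]))
print(run(["start", "door_opened"]))
```

Because the whole behavior is one small table, every state and event combination can be enumerated and tested, which is much of what makes this kind of software easy to get right.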

Sure, software is everywhere, but the physical properties of hardware are fundamentally different when it comes to QA, risk analysis, general testing.

But when all your hardware has firmware, you've lost that advantage.

Depends how complex the firmware is.

The trick is not to create reliable software. The trick is to create systems that can tolerate complete failure of the software. This is the secret to airplane reliability.

(Although they still do their best to make the software reliable, they do not bet lives on its reliability.)
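One way to picture this is a comparator in front of two independently written channels. This is only a sketch of the general idea, not how any particular aircraft does it; the function names and the tolerance are assumptions.

```python
def compare_channels(channel_a, channel_b, sensor_value, tolerance=0.01):
    """Run two independent channels and disengage if either fails or they disagree.

    Returns (command, engaged). When engaged is False the command is None and
    control is assumed to revert to a human or a mechanical backup.
    """
    try:
        a = channel_a(sensor_value)
        b = channel_b(sensor_value)
    except Exception:
        # A crash in either channel is treated like any other failure.
        return None, False
    if abs(a - b) > tolerance:
        # The comparator cannot tell which channel is wrong, so it trusts neither.
        return None, False
    return (a + b) / 2, True


def healthy(x):
    return 2 * x


def also_healthy(x):
    return x + x


def buggy(x):
    return 1 / 0  # stands in for a complete software failure


print(compare_channels(healthy, also_healthy, 1.0))
print(compare_channels(healthy, buggy, 1.0))
```

The point of the design is that the comparator is far simpler than either channel, so the system's safety does not depend on the complicated code being correct.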

Safe Systems from Unreliable Parts https://www.digitalmars.com/articles/b39.html

Designing Safe Software Systems part 2 https://www.digitalmars.com/articles/b40.html

Your comment reminds me of this article about the Space Shuttle code that I come back to every few years: https://www.fastcompany.com/28121/they-write-right-stuff

This popped up recently: https://news.ycombinator.com/item?id=19180280

It's about a guy doing similar things, with model rocketry.

It seems though that the resiliency is all due to field tests/ spectacular failure in the latter. (It doesn't detract from the project at all though).

I'm curious whether they applied the same level of scrutiny to the libraries, compilers, and operating system their code used. And firmware.

All of that probably has to be done in order for the AP-101 to be flight certified. As far as I can recall, they wrote their own operating system and didn't rely on any external libraries.

Even when it was new, the AP-101 was part of a family of flight computers that NASA had already been using, and the same is true for the HAL/S compiler used for most of the code.

If you want to explore how it all worked, this SE thread contains some great information and also serves as a good launching point for further research:


People named a computer HAL, and sent it to space??

One of the tools that helped people and other computers into space was named, in part, HAL.

I like the sentiment, but there are big differences.

In software, you can poke and prod at the running application to get it to do something it shouldn't. To me, this is a bit like yanking wires that you know are important to see what happens to the plane in mid-flight. Another analogy I use is to imagine pulling the steering wheel off a car while someone's driving it.

I think modern software testing and security analysis is probably far and away better than what the NTSB provides because we already have systems in place to cope with crashes.

Imagine if we did that with car crashes, which kill 40,000 people every single year in the USA.

NTSB investigates hundreds of "major" highway accidents per year. Typically accidents with multiple fatalities and at least one of the vehicles being operated for hire. For example NTSB investigated the Schoharie limousine crash where 20 people died.

Graduated drivers license laws for young drivers, age-21 drinking laws, smart airbag technology, rear high-mounted brake lights, commercial drivers licenses, and improved school bus construction standards all came about at least partially due to NTSB recommendations.

This is a good discussion point: we could do this with car crashes, but I would guess that most causal factors can be summed up as "driver inattention".

How would you solve that? The FAA can issue all sorts of regulations requiring rest breaks, fatigue detection, etc. -- be sure you're ready for that level of intervention before you ask for this level of investigation.

Car crashes are actually the perfect example, because car fatalities aren't just a function of engineering a safe and sturdy car: human inattention can't be eliminated, but it also isn't independent of design.

For instance, how many people ride in the car? What is the average car trip length? Why and when are people driving? What are the speed profiles and the lines of sight, both in the car and in the environment? Does traffic all flow one way? Can a car ever land in oncoming traffic if it loses control? Are traffic barriers designed to bring an out-of-control car safely to rest with good line of sight to the accident? Are features keeping drivers alert or distracting them?

The difference between lots of fender benders with a bit of panel beating and constant fatalities can come down to multiple causes, many outside of engineering a safe car.

"be sure you're ready for that level of intervention before you ask for this level of investigation."

There's no good reason not to, for example, have every car use a breathalyzer to start the engine -- those driving drunk on occasion would protect themselves from DUIs, lives would be saved, etc. Or to implement other, similar deep regulations. But the ideology around cars ("freedom"), vs. planes ("safety"), is too different to allow for such levels of intervention.

The good reason not to install a breathalyzer in every car is that I don't drive drunk and would vote against any politician foolish enough to suggest something so intrusive.

You don't drive drunk, but other people do. Breathalysers are only put in AFTER an offense. To eliminate the "first generation" drunk driving accidents would require preemptive installation of breathalysers on every car.

The way they are implemented now would be similar to only installing seatbelts in cars where the driver has already been in an accident. Similar sentiment was around when seatbelt laws were put in place ("I'm a good driver"), but the seatbelt protects you when you crash yourself as well as when someone else crashes into you!

In the same way, breathalysers (ignition interlocks) for everyone wouldn't be "for you" as much as for everyone to prevent them from drinking and crashing into you.

I hadn't thought about widespread ignition interlocks, and obviously there would be a lot of public pushback, and it would be quite intrusive, but I could see it helping reduce traffic fatalities dramatically.

I don't care how much it would reduce traffic fatalities. I am absolutely opposed to this level of nanny state overreach. And I trust that enough other Americans agree with me to prevent this kind of nonsense from making it into law.

What I would support is a further reduction in the allowed BAC, and stricter enforcement of traffic laws in general.

If you don't drive drunk, what do you lose from starting the engine via breathalyzer, rather than via key turn? Why is one more intrusive than the other? Most governments already require you to carry insurance, have a license/training to drive, wear seat belts -- in what way is this different?

Cars could never be invented in today's legal environment.

About 80% to 90% of reports would state it was operator error on the part of one or more of the drivers involved. Cars are pretty safe when people aren't drooling, lead-footed maniacs.

Huh? 80-90%? More like 99.x%. Modern cars are extremely reliable, and even if they do fail (usually due to poor maintenance), that rarely causes a crash, since a disabled car can easily pull over to the side of the road, unlike aircraft.

Almost all crashes are ultimately caused by driver error. Doing FAA/NTSB-style investigation on them would be pointless, because that's exactly what they'd find: people were driving recklessly, inattentively, etc. And there's nothing that can be done about that, because we as a society absolutely refuse to have serious driver training and any standards for driver conduct. We can't even agree on whether the left lane is for passing or not! Try driving in Germany sometime and you'll find that what we accept from drivers here in the US is really quite awful.

To be fair, probably 80-90% of NTSB aviation investigations final reports contain the words “pilot error”.

The term "pilot error" is outdated, and suggests blame. "Human performance factors" is the usual way to describe such mishaps.

That seems like a pretty good idea. Federally mandated breach investigation when more than X people are compromised. Guidelines are put in place that can be audited.

Among the many different things required by the GDPR, CCPA and US state-level data privacy regulations are:

1. Data breach disclosures
2. Reporting/public statements on whether the vulnerability or path that caused the breach has been fixed.

Note, this might seem a little weird (like "of course we've fixed the breach!") but often data breaches aren't detected for _months_ after they occur. There's plenty of companies who didn't know they were breached until they were on HaveIBeenPwned.

It's not NTSB level thoroughness, but I think it's a healthy start in that direction.

No agency needed. Companies do this internally already... right?

If and how extensively is culturally driven in an organization. Regulations, regardless of who imposes them, are almost always necessary. The unfortunate thing is that so many regulations are formed without support from the consumers of those regulations which is why they are often backwards or onerous.

Gun regulations are a great example of good intentions, bad implementation due in part to ignorance, and industry resistance/interference. Another is PCI, which is a self-regulation that's often too vague or too specific to be useful.

Honestly that sounds like hell. I sure as hell didn’t become a developer so I could get thrown in jail for buggy code. Building software is something no one really knows how to do and we are still in the infancy of our field. I don’t think it would be fair to impose the same standards that real engineers have on such a nascent and chaotic field.

The software industry has been a thing for over 60 years now, with the first commercial software company being founded in 1955 (https://en.wikipedia.org/wiki/Software_industry#History).

The first commercial aviation company was formed in 1909 (https://en.wikipedia.org/wiki/Airline#History), the FAA was founded in 1958 and the NTSB in 1967....

At some point the "we're in our infancy" stage has to end.

The guidelines for safe aviation have way lower Kolmogorov complexity than the guidelines for safe “everything that software can do”-iation. It’s actually close to a subset when you think about it.

Well ideally it would end when we know what the fuck we are doing... The number of bugs I find in all sorts of software (with Apple being one of the worst offenders) is frankly embarrassing and a pockmark on the entire industry.

If planes were as reliable as software they would be the leading cause of death in the world.

I would recommend reading "Developing Safety-Critical Software: A Practical Guide for Aviation Software and DO-178C Compliance".

The software planes use is reliable.

We know how to build reliable software. When it really matters, because lives are on the line or because failures cost real money, it is done.

What we don’t know is how to write reliable software while also delivering an MVP for antsy VCs and shipping feature updates every week.

And that’s fine. Not all software needs extreme reliability. The problem is that a lot of software gets categorized as “doesn’t need it” when it really shouldn’t be, like all those systems holding sensitive personal information.

One could argue that it’s precisely because there are no standards that the field is so chaotic.

I’m not sure what the right balance is - the more standards and compliance there are, the more innovation suffers. On the other hand, without standards or legal responsibility, it makes sense for companies to play fast and loose with security, privacy, and stability in favor of features - a consumer is going to pick a better product based on UX, features, and cost, not based on security they have no way of judging.

>I sure as hell didn’t become a developer so I could get thrown in jail for buggy code...

You'd better stay away from any company that requires FDA approval. Believe it or not, every change is signed. By you. By the FDA compliance person. Etc etc. I'll give you two guesses as to how they decide who goes to prison if God forbid something in the software is shown to have caused a fatality? Or, worse, a series of fatalities?

That sort of regulation happens even in the software industry. It just depends on the purpose of the software.

I don't think developers get thrown in jail, even for a bug that kills multiple people. The one time I've heard of people getting thrown in jail it was upper management, for trying to game the FDA audit and approval process. They got led out the door in handcuffs by US Marshals.

Source: Worked in FDA-regulated software for six years, at two different places.

>I'll give you two guesses as to how they decide who goes to prison if God forbid something in the software is shown to have caused a fatality?

Would you mind just telling us instead? Thanks.

See IEC 62304:2006 (https://www.iso.org/standard/38421.html) and the related standards which govern the formal processes which medical device software must meet for FDA and CE approval (as well as other national standards).

The whole process from requirements, specifications, high and low level design, implementation, validation and verification and the rest of the lifecycle requires stringent oversight, including documentation and signoffs. The signatures on those documents have legal meaning and accountability for the engineers who did the analysis and review at each stage.

Look at this from an economics perspective: do people in these sorts of positions get paid more than those in comparable positions in other fields? I'd never heard of legal liability for software engineers. Has anyone ever been convicted or otherwise been held accountable when there was a failure?

That’s really funny actually, I used to work as a developer for the FDA for their researchers. But since we were just publishing meaningless papers, I was probably safe.

Nobody is thrown in jail ever for an airplane failure analysis. Every human failure is considered a process failure that gets fixed by improving the processes or the machinery.

> Nobody is thrown in jail ever for a airplane failure analysis

...in the United States. I definitely recall watching multiple episodes of Air Disasters [1] where surviving pilots in crashes in other countries that killed passengers went to jail for things that in the US would have at most resulted in a suspended license and demotion or termination from their airline.

For example, Air France Flight 296 in 1988. Another example is Gol Transportes Aéreos Flight 1907 in 2006 in Brazil.

[1] AKA Air Emergency, Mayday, or Air Crash Investigation, depending on what country you watch in and what channel it is on.

...and the accident investigation data and reports are expressly restricted from being used in legal proceedings (Chicago Convention, Annex 13) with the intention that the process is for preventing accidents, not for prosecuting scapegoats.

If the investigations of any particular mishap were to indicate that some particular person or persons had been criminally negligent, it would be no surprise to see those people prosecuted eventually. If another team of investigators has to write a different report, they will.

Not sure if I agree that no one really knows how to do it, but yeah, software is not as disciplined as something like civil engineering. And for most things, it doesn’t need to be. For others, their teams operate much more slowly and with more testing and QA - think of a basic web app developer versus someone writing software for Boeing aircraft. The jobs are completely different. For most web app developers, screwing up doesn’t endanger anyone’s life.

I mean we can kind of make software, but it always has bugs and even the biggest companies who have (supposedly) the best developers consistently make products that don’t work very well at all. Software quality looks really bad compared to, say, bridges or cars or airplanes.

Respectfully, I call bullshit on this.

We know some very good ways to build reliable software. It's just that most industries can't / don't want to pay the costs associated with doing so.

The don't-know / don't-want distinction is functionally irrelevant, but I have a minor tic about it because asserting the former is often used to avoid admitting to the latter.

Legacy IBM and ATT, for all their cluster%_$^!s, were amazing at engineering reliable systems (software and hardware). Maybe that's not practical nowadays, outside of heavily regulated industries like military / aerospace, because markets have greater numbers of competitors. But we do know how.

An accurate and modern truth would probably be "We don't know how to quickly build reliable software at low cost."

I mean, clearly, if we really wanted to, we could build all of our software with Coq and have it be formally verified. But we don’t. The Curry-Howard isomorphism proves that code and math are equivalent and since we can prove a bunch of stuff with math, we should also be able to prove that our software systems are bug free.

And so you’re right, maybe my statement is a little bit of a simplification, but not by much. The amount of man hours it would take to prove that the Linux kernel had no bugs according to some specification would require an absolutely prohibitive amount of time. As such formal verification is so prohibitive that only the most trivial systems could ever be proved.

This is pretty similar to saying that we don’t know how to build bug-free code. We need a paradigm change for things to get better, and I’m not sure it will happen or if it’s even possible.

I think of it in slightly different terms. Yes, we could build software that reliably. (Perhaps not totally bug free, but, say, with 1/100th of the bugs it currently has.) But that takes a lot more time and effort (say, 100 times as much, as a rough number). That means, because it would take 100 times as much effort to produce each piece of software, we'd only have 1/100th as much software.

Would that be a better world?

Sure, I hate it when my word processor crashes. On the other hand, I really like having a word processor at all. It has value for me, even in a somewhat buggy state, more value than a typewriter does. Would I give up having a word processor in order to have, say, a much less buggy OS? I don't think I would.

I hate the bugs. But the perfect may be the enemy of the "good enough to get real work done".

I look at that as the efficiency distinction.

It goes without saying that if we had more time, there would be fewer bugs.

But there are also things (tooling, automation, code development processes) that can decrease the amount of bugs without ballooning development time.

Things that decrease bugs_created per unit_of_development_time.

Automated test suites with good coverage, or KASAN [1, mentioned on here recently]. Code scanners, languages that prohibit unsafe development operations or guard them more closely, and automated fuzzing too.

[1] https://www.kernel.org/doc/html/v4.14/dev-tools/kasan.html

It is not about tools. It is about culture and process. Look at something like SQLite for a real example of living up to the requirements of aviation software: 100% branch coverage, extensive testing, multiple independent platforms in the test suite, etc.

You don’t need anything fancy – but you do need to understand your requirements and thoroughly simulate and verify the intended behavior.

Do you know that first hand, or is this a case that you can see the cracks in the software industry because you are part of it?

I worked as a mechanic years ago, and I feel the same way about cars as you (and I) do about software. Everything is duct taped together and it's a miracle any of it works as well as it does.

Safety regulations and tests are gamed to all hell, reliability almost always seems to come from an evolutionary method of "just keep using what works and change what doesn't" with hastily tacked on explanations as to why it works, and marketing is so divorced from reality that people assume significantly more is happening in the car than really is.

Now I don't know anything about civil engineering or aircraft, but I have a hunch it's not as well put together as it may seem at first glance.

I still think that software is behind those other fields, and I'm honestly still on the fence if regulations would significantly help or would just slow things down, but it's absolutely the "wild west" in software.

I'm not a "real" engineer, but many of my cousins are (civil engineers) and everything about a bridge, say, is modeled and proven ahead of time.

I work as a developer and every day I feel like this isn't the right way to do it. I'm not sure what the right way is, but being a software developer I probably have less faith in it than the general population. Something just feels wrong about how we do things.

Imagine building a bridge without a specification. It would be criminally irresponsible. It has been known for decades now that proper specification is absolutely necessary to produce correct software. This is practically a tautology, because there isn't a formally meaningful way to define "correct" besides "satisfies some specified behaviors." Good unit testing does this in an ad hoc way, because the tests serve as a (usually incomplete) specification of the desired behavior. That said we must remember that testing can never prove the absence of bugs, only their presence. The only practicable ways to prove the absence of bugs in nontrivial programs are automatic formal verification or constructive proof using formally defined semantics.
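To make the "tests as an incomplete specification" point concrete, here is a small sketch. The two properties together are one common specification of sorting; the function names are made up for illustration.

```python
from collections import Counter


def is_ordered(xs):
    """Property 1: every element is <= the element after it."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_permutation(xs, ys):
    """Property 2: the output contains exactly the input's elements."""
    return Counter(xs) == Counter(ys)


def good_sort(items):
    return sorted(items)


def bad_sort(items):
    return []  # trivially "ordered", but it drops every element


sample = [3, 1, 2, 1]

# A test suite that only checks property 1 accepts both implementations.
assert is_ordered(good_sort(sample))
assert is_ordered(bad_sort(sample))

# Adding property 2 to the specification is what exposes the broken one.
assert is_permutation(good_sort(sample), sample)
assert not is_permutation(bad_sort(sample), sample)
```

Even with both properties, the checks only cover the one sample input, which is the sense in which testing shows the presence of bugs rather than their absence.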

Of course for most commercial software correctness is so irrelevant that it's left undefined. All that matters is pleasantness, that is to say does it generally please the user, perhaps by attracting customers or investors.

Airplanes have software. It works very well without bugs. Because that is what is demanded in the aviation industry.

Look at the Equifax hack. The vulnerability was known, but no one cared to apply the patch. In engineering, you choose two among better, faster, or cheaper.

It's clear in IT we always choose faster and cheaper. No one went to jail for the Equifax hack except that one low level manager for insider trading.

This industry has been around for 60 years, and has had the benefit for that entire time of the experience from the previous several hundred years of people developing what is modern engineering. It isn't really young or nascent anymore.

There are large sectors in this industry where rigorous engineering effort isn't really applicable. It seems to me, though, that this has effectively been used as an excuse to avoid any genuine rigor. It isn't helpful that so many programmers have such huge egos.

Then you shouldn’t be a developer?

Do you have a better way of doing things? Because you formally prove that the code you write has no bugs? Give me a break. You are as bad as the rest of us. Maybe you haven't realized how bad you are, but that just means you have some learning to do.

If you enjoy reading about risk and root cause analysis in general try the venerable Usenet group comp.risks or read the digest web version at http://catless.ncl.ac.uk/risks/. It's one of those internet sinkholes that can eat hours of your time.

Check out The Design of Everyday Things by Don Norman too. He talks very highly of the NTSB and their accident investigation reports. The root cause isn’t usually one single error, but a series of mistakes that collectively caused the accident.

I love this newsletter but they switched to sending out content wrapped in pre tags with baked-in newlines a few years ago and the value of fighting through wasn't worth it.

Why does this matter? Screen readers or something else?

The newlines are baked into the rss feed too, which breaks my newsreader. I've talked to the maintainer but they didn't want to change the stack they use to produce it.

The safety culture of agencies like the FAA, the NTSB, and international counterparts such as the EU's EASA is amazing, but not without its downsides.

Certification costs have been going up as a portion of overall development costs[1]; this leads to more expensive planes, less competition, and less technological development.

There's some amount of safety that's counterproductive, since people will just opt for, e.g., driving, which is much less safe, but the way these agencies are set up means they can't ever address that question. They're never going to declare a system unreasonably safe and expensive.

So we really should be careful about wishing for aviation-style safety to be applied to other industries.

1. https://aviationweek.com/commercial-aviation/what-certificat...

The space industry already has a long-established tradition of doing similar failure investigations.

Yeah, I really don't get his comment there. Whenever there is a crash, the launch provider usually does not launch again until the RUD is fully investigated. The analysis is as exhaustive as (if not much more so than) in the aero industry.

The space industry already does this. In fact sometimes the same people and agencies are involved. It can take years of investigation to return to flight after a serious accident of unknown cause.

I'd be interested in hearing about it applied to totally different industries, particularly software.

John Allspaw applied concepts from The Field Guide to Understanding Human Error to software post mortems. When I was at Etsy, he taught a class explaining this whole concept. We read the book and discussed concepts like the Fundamental Attribution Error.

I've found it very beneficial, and the concepts we learned have helped me in almost every aspect of understanding the complicated world we live in. I've taken these concepts to two other companies now to great effect.

So much fantastic reading I hadn't seen before in this discussion. Mucho thanks to all for sharing!

Incidentally, Amazon does this. Particularly, they've adopted into the culture the concept of "good intentions don't work, mechanisms do." A decent summary can be found at this link:

For example, if an admin bungles a copy/paste shell command and it causes an outage, instead of punishing the admin for not wanting hard enough to do it correctly, they'll change the process so it doesn't rely on admins copy/pasting shell commands.

That's awesome. I'm guessing it's limited to software Amazon themselves develops and doesn't extend to software deployed to AWS?

How could it? AWS will happily run whatever code you want, using whatever good or bad development practices you care to choose.

Fair. More just wondering if they have any sort of process for investigating data breaches involving software hosted on their servers.

Look at the “shared responsibility model” - if someone got in through the host then I guess they’d investigate. If you leak your SSH private key it’s on you.

It’s not their software to investigate.

if you're into process safety, the CSB has a number of youtube videos on disasters and their causes https://www.youtube.com/user/USCSB

As someone working in the aircraft industry, I hope to see this approach with motor vehicles. I think every car should have a 'black box' as well as mandatory dash cams.

Strangely enough, dash cams are actually illegal or heavily restricted in a number of countries (notably some EU member states).

Note that neither Fukushima nor Deepwater Horizon followed basic aircraft engineering principles: No single point of failure will cause the loss of the airplane.

Um, I thought Fukushima was not caused by any single point of failure either: it was caused by an earthquake and a tsunami happening together (admittedly, the earthquake caused the tsunami). The earthquake caused the reactor to automatically shut down, which would have been fine, except then they got hit with a tsunami that topped their seawall and flooded the basement, shutting down the emergency generator, which disabled the pumps which cooled the reactor.

The earthquake didn't cause any damage. The single point of failure was the seawall being breached. The rest was a zipper of each failure causing the next. The tragedy is this zipper could have been halted at many places.

For example, the hydrogen overpressure in the reactor (caused by previous failures) was vented through an overpressure valve into a pipe. The pipe exited into the enclosed reactor building, which led to an explosion.

The pipe should have been vented to the exterior. The cost of that would have been insignificant, like numerous other zipper-stops.

Deepwater Horizon was a big zipper, too, none of which would have been costly to prevent.

Also, the emergency generators should have been put on platforms so they wouldn't flood. Critical machinery should have been located far enough away from the reactor that it could be worked on without fear of radiation.

I wish we did it in auto investigations.

Except for TWA 800

Boeing 767 crashes or emergencies are extremely rare: https://en.wikipedia.org/wiki/Category:Accidents_and_inciden...

Most are related to human intervention such as terrorism, pilot error, fuel error and only a very small amount are mechanical errors. Of the 12 problems, only 5 were mechanical error. I wonder what the cause of this will be.

Those numbers are outrageous on their own. But what's even more amazing is that not all emergencies result in crashes. Air Canada Flight 143 ran out of fuel halfway into the journey and there were no fatalities! They literally glided to a motor track. That's so incredible a modern movie would have trouble getting its audience to suspend disbelief for it.

The cause analysis of how they ended up with half a tank of fuel is crazy. The crew loading the fuel mistook lbs for kg. The plane's fuel indicators were faulty and required a manual procedure before takeoff. The engineer communicated that to a pilot and the pilot misunderstood. That pilot relayed his misunderstanding to the next pilot and that next pilot misunderstood the misunderstanding. And then that next pilot misunderstood the checklist indicating whether it was legal to fly with the non-functioning fuel gauge. And the flight engineer didn't catch the lbs/kg mistake. Then during a brief stopover in Ottawa, the captain thought to re-measure the fuel with the floatstick, but still used the incorrect conversion factor.

Why don't they just ditch imperial measurements altogether?

Also, that description sounds like tracing a bug through a few layers of software where there wasn't proper error handling along the way, just trusting the data being passed around.
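To illustrate the "just trusting the data being passed around" analogy, here is a rough sketch in Python of the usual software fix: never pass a bare number between layers, pass a value that carries its unit. The `Mass` type, `fuel_shortfall`, and the figures in the usage are hypothetical and chosen for illustration; they are not the actual Flight 143 numbers or any real fuelling procedure.

```python
from dataclasses import dataclass

KG_PER_LB = 0.45359237  # exact, by definition of the international pound


@dataclass(frozen=True)
class Mass:
    """A mass stored internally in kilograms; callers must state their unit."""
    kilograms: float

    @classmethod
    def from_kg(cls, value: float) -> "Mass":
        return cls(value)

    @classmethod
    def from_lb(cls, value: float) -> "Mass":
        return cls(value * KG_PER_LB)


def fuel_shortfall(required: Mass, on_board: Mass) -> Mass:
    """How much fuel still has to be loaded. Rejects unit-less numbers."""
    if not isinstance(required, Mass) or not isinstance(on_board, Mass):
        raise TypeError("fuel quantities must carry a unit; got a bare number")
    return Mass(max(0.0, required.kilograms - on_board.kilograms))


if __name__ == "__main__":
    required = Mass.from_kg(20_000)  # illustrative figure
    reading = 10_000                 # the same reading, interpreted two ways
    print(fuel_shortfall(required, Mass.from_kg(reading)).kilograms)
    print(fuel_shortfall(required, Mass.from_lb(reading)).kilograms)
    # fuel_shortfall(required, reading) would raise TypeError instead of guessing
```

The same reading gives very different shortfalls depending on the unit, and the bare number is refused at the boundary rather than silently trusted by the next layer.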

Amusingly, the reason for the confusion was that Canada was at that time in the process of converting from the Imperial system to metric, and some things still had to be converted manually!

If I'm remembering correctly, they were in the process of switching from imperial to metric at the time.

I wonder how many lives are lost each year (not just in aviation but globally) because the US refuses to convert to the metric system.

> I wonder how many lives are lost each year (not just in aviation but globally) because the US refuses to convert to the metric system.

Keep in mind that the U.S. is not the only place where measures are mixed. If anything, British people end up using more measures than Americans. Here in Canada, we have a lot of things read out in both, because of the immense sticking power of customary measures.

U.S. organizations are actively encouraged and (for the most part) fully allowed to adopt SI measures, but it's not like switching road sides, where everything makes sense after you hear "drive on the right now".

Keep in mind that powered flight was basically entirely developed by Americans and pre-SI Brits, and already had more than fifty years of history by the time either pioneering country adopted SI in any significant way. It should not surprise you in the slightest that an American plane expected to hear fuel mass given as weight in pounds; in fact, it should astonish you that anyone is trying to measure it in kilograms in North America.

I'm British, so you don't need to explain using mixed systems, however in any kind of academic, engineering or scientific contexts metric units are used almost exclusively. Yes, people still refer to height in feet and inches a lot, and we'll drink a pint (a real pint! None of your mini-pints!), but that does change over time. For example older people will use stone and pounds for their weight but being younger the doc doesn't bother telling me my weight in anything but kilograms so I don't even know my weight except in Kg.

Even in Britain, in almost all engineering contexts, SI units would be assumed. Your point about Americans assuming imperial units is precisely why it's so dangerous that metrification hasn't happened there.

If the US went metric, then there would no longer be any need to assume anything but metric anywhere, and it would be safer for it.

Yes, there's a long standing tradition, but it only takes a couple of generations to completely change, and that would likely be hastened by being the last large country to do so and having a large cultural footprint on the rest of the world.

If the US did convert to metric, in 3 or 4 generations the whole world would be metric, and there wouldn't be any "wrong assumptions", and it is making bad assumptions that leads to preventable failures.

If China can manage to go metric, the US can too.

> however in any kind of academic, engineering or scientific contexts metric units are used almost exclusively

This is true in the United States as well, but fueling a commercial airplane is not an academic, scientific, or even an engineering exercise. It is as routine and pedestrian as refueling a bus.

> Your point about Americans assuming imperial units is precisely why it's so dangerous that metrification hasn't happened there.

Or maybe it's an argument for why "metrification" is itself dangerous. The problem is the transition, not being on either side of the fence. It seems to me that the only safe mode of transition is first to dual readout, and only then to SI-exclusive.

Don't get me wrong, I like SI units, I use them every day, and I don't long to spend any time multiplying and dividing by irregular fractions; I just don't like the dismissive "shoulda been metric" rhetoric that floats around everywhere; as though you can just stop selling letter paper and force everyone to use A-series compatible envelopes, and convert the clean (if baroque) markings on the paper products to bizarre decimal fractions of a g/m² without any empathy for the old guard.

Some people care about how Britain still drives on the wrong side of the road, but that doesn't mean they should berate you every time a drunk roadtripping Frenchman turns out into oncoming traffic.

A British bus will be refuelled in litres, which I think was the point.

> as though you can just stop selling letter paper and force everyone to use A-series compatible envelopes, and convert the clean (if baroque) markings on the paper products to bizarre decimal fractions of a g/m² without any empathy for the old guard.

The first part was done in Britain in the 1950s, but with the weights rounded to convenient metric numbers.

> A British bus will be refuelled in litres, which I think was the point.

Refuelled in litres, but with efficiency measured in miles per gallon. It's kind of a mess.

That's the case for cars, but I wouldn't be certain it's the same for buses, especially large fleets of public transport buses.

All three British bus manufacturers give all measurements except speed in metric, although none give an actual fuel consumption. (Just "20% better" etc.)

[1], as the first reference I can find, uses both l/100km and mpg.

[1] https://www.whatdotheyknow.com/request/fuel_consumption_of_p...

Miles per US gallon or miles per imperial gallon?

The latter, miles per UK gallon. Watch out for that, because if you don't know better, you'll read about some car in the UK and wonder why it has such astounding fuel economy!
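For anyone who wants to check the gallon trap for themselves, a small sketch in Python. The function names are invented for illustration; the constants are the legally defined sizes of the two gallons and of the international mile.

```python
LITRES_PER_IMPERIAL_GALLON = 4.54609   # exact, UK definition
LITRES_PER_US_GALLON = 3.785411784     # exact, 231 cubic inches
KM_PER_MILE = 1.609344                 # exact, international mile


def mpg_imperial_to_us(mpg_imperial: float) -> float:
    """Same car, same fuel: a smaller gallon takes you fewer miles."""
    return mpg_imperial * LITRES_PER_US_GALLON / LITRES_PER_IMPERIAL_GALLON


def mpg_imperial_to_l_per_100km(mpg_imperial: float) -> float:
    """Convert miles per imperial gallon to litres per 100 km."""
    km_per_litre = mpg_imperial * KM_PER_MILE / LITRES_PER_IMPERIAL_GALLON
    return 100.0 / km_per_litre


if __name__ == "__main__":
    uk_figure = 50.0  # a UK brochure number, chosen for illustration
    print(round(mpg_imperial_to_us(uk_figure), 2))           # about 41.63 US mpg
    print(round(mpg_imperial_to_l_per_100km(uk_figure), 2))  # about 5.65 l/100km
```

So a UK "50 mpg" car is roughly a 42 mpg car in US terms, which accounts for most of the apparent miracle.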

Even in academia doesn't the US use CGS, and most of the rest of the world uses SI/MKS? We can't use the same physics textbooks.

I took high-school and college physics in the US around 30 years ago. It was all SI then, and I suspect it still is.

> If the US went metric, then there would no longer be any need to assume anything but metric

That might be a tiny bit optimistic. I'm in my mid 50s, and UK schooling taught me nothing but metric. I vaguely remember some sort of half-hearted campaign and leaflets about it; I think that was before we joined the EU. It's 2019, and we are still not there yet.

I'm not sure the last popular holdouts are ever likely to change at this point.

I'm surprised, but thankful, that the Brexit lunacy hasn't (yet) brought calls to return to Imperial and £sd. :)

> I'm surprised, but thankful, that the Brexit lunacy hasn't (yet) brought calls to return to Imperial and £sd. :)

Return of pounds and ounces? Britain might allow firms to use imperial measures after Brexit https://www.telegraph.co.uk/news/2017/02/17/return-pounds-ou...

EXCLUSIVE: Experts Slam "Bonkers" Proposal To Re-Introduce Imperial Measurements After Brexit, In Letters To Government http://www.gizmodo.co.uk/2017/04/exclusive-experts-slam-bonk...

Now that we are to be a sovereign nation again, we must bring back imperial units https://www.telegraph.co.uk/news/2017/04/01/now-sovereign-na...

Sigh. Thankfully I'd missed that. Not surprised it was Heffer, I think he'd like to send us back to about 1912. Gizmodo piece was fairly entertaining though. :)

Fortunately the post-boomer “sanity” generation are not going to accept imperial units other than pint (for beer), regardless of what people pining for a past in which they were relevant may want.

> Keep in mind that powered flight was basically entirely developed by Americans and pre-SI Brits

The french played a very dominant role as well.

> The french played a very dominant role as well.

As far as I've known, that largely involved a tremendous number of English channel crossings, but I will look more into that so I don't stay ignorant; thanks. ;- )

So a Frenchman, creator of the first practical monoplane, crosses the English Channel before a Brit does... but the Channel is named for the English so the Brits get the credit? Nice dude...

Here are a few places for you to begin your research:

Aside: How many American-made components do you see with big bold French labels on them? https://en.wikipedia.org/wiki/File:Fdr_sidefront.jpg

Fuselage, Empennage, Ailerons...

Hangar, Mayday (m'aider), Bombardier, Airport...

Hopefully you're being deliberately ironic, but your comment indicates a purely anglo-centric perspective. Continental European endeavour irrelevant except when it intersects with the homeland!

The loss of MCO is actually a very interesting story about both engineering culture (engineers noticed a problem but the organization didn’t respond to investigate/fix it) and the conversion to metric. The root came from NASA deciding to use metric on this project and a contractor using the wrong units. But it’s the kind of thing that the process should be able to catch and handle. It bugs me to dismiss this failure as “haha why doesn’t NASA use metric” because not only is it a direct result of NASA switching to metric, there are really good engineering lessons that get ignored.

Lives would also be lost in the ensuing confusion if US were to switch to the metric system.

Indeed. It's only due to the skill of the pilots that lives weren't lost during the Gimli Glider incident.

Only partially. Part of it was pure luck, that no one was on the track at that moment when the plane landed on it. There were people at the track at the time, doing races, but the plane just happened to come down at the right time.

They ended up fixing it afterwards and it continued to fly with Air Canada until 2008 and it went up for auction in 2013[0].

[0] https://business.financialpost.com/transportation/air-canada...

The Gimli Glider: https://en.wikipedia.org/wiki/Gimli_Glider

the pilots were initially punished (which is outrageous), then lauded, for their skills.

Why is it outrageous that they were punished? Yes they saved the day but they had a very important role in allowing the mistake to happen in the first place.

The FAA has a tendency to punish deliberate acts or serious lack of judgement, i.e. "I'll fly under this bridge, it'll be awesome."

When mistakes happen by accident, they tend to give people lots of slack to encourage honest accounts so that they and the rest of the community can learn from the mistake. This is the reason behind things like the NASA form, which is sometimes seen as a "get out of jail free" card. You write a detailed account of what happened after certain kinds of accidents/incidents, and if they conclude that you weren't being too stupid, or it wasn't a repeat issue that you should have learned from, they'll skip certain minor enforcement actions against you.

Because pilots are humans, and humans make these types of human-mistakes generally. If your safety protocol relies on humans being perfect, you better resign yourself to a decent number of crashes. Better to build a system where openness around unintentional mistakes is encouraged, and then design that system to be redundant even in the face of human error.

I agree, but in this case the automatic systems to avoid human error were the primary systems, and the humans knew they were malfunctioning before takeoff, and yet failed to double check for human error and flew the flight despite it being illegal to do so with the automated system malfunctioning. So yes you want to design your system to limit the impact of human error, but the human pilots also bear a very large responsibility too. Not as much as the airline for sure, but I could see why the initial verdict would lean toward negligence from the sheer number of lapses in judgement that were made.

Eh, I think most people can understand gliding pretty easily. What would really mess with their heads is a helicopter losing its engine and autorotating down to safety.

Helicopters are often easier to land than planes. Landing a helicopter without power is part of learning how to fly a helicopter in the first place... https://youtu.be/pL1-QH7eQAY

Landing a 767 without power, not so much... https://youtu.be/GlkCofOyxUA

While true, it's worth noting that mandatory auto-rotation training is due to the fact that helicopters are inherently unstable aircraft whose turbine engines have the habit of stalling at the worst possible moment.

> "The thing is, helicopters are different from planes. An airplane by its nature wants to fly, and if not interfered with too strongly by unusual events or by a deliberately incompetent pilot, it will fly. A helicopter does not want to fly. It is maintained in the air by a variety of forces and controls working in opposition to each other, and if there is any disturbance in this delicate balance the helicopter stops flying immediately and disastrously. There is no such thing as a gliding helicopter. This is why being a helicopter pilot is so different from being an airplane pilot and why, in general, airplane pilots are open, clear eyed, buoyant extroverts, and helicopter pilots are brooders, introspective anticipators of trouble. They know that if something bad has not happened, it is about to."

-Harry Reasoner

> whose turbine engines have the habit of stalling at the worst possible moment

Helicopter turbine engines operate on exactly the same principles as the 90,000lb thrust turbofan under the wing of a 777. They are incomprehensibly reliable.

Things that lead to helicopter autorotation are usually mechanical failures of the transmission.

Then there is the Harrier Jump Jet.

Then there is the challenge of landing a Harrier Jump Jet on an aircraft carrier.

Then there is the challenge of landing a Harrier Jump Jet on an aircraft carrier that happens to be postage stamp sized and in the South Atlantic high seas during an actual war, the Falklands War.

Now I want a video where this happens instead of just someone talking over some pictures.

This still seems amazing and unbelievable to me.

Google “helicopter autorotation” and have your pick.

Here’s one with an outside view, which I think is easier to grasp for non-pilots: https://youtu.be/twGid07JR9s

Basically, the helicopter “falls” down and slightly forward, driving the main rotor disc (potential energy turning into disc rotational energy), then the pilot flares the helo, slowing the forward speed and taking energy from the rotor disc to slow the vertical descent, finally resulting in the aircraft settling to the Earth at a slow speed. Just like gliding an airplane, it’s a one-shot kind of a thing (except in training where you generally keep the engine running and available).

Note that the rotor disc is connected to the engine via a one-way clutch, so the engine can drive the rotor but not vice-versa. Otherwise, a seized engine would result in a rotor disc stoppage (and an uncontrolled descent).

Here's a Smarter Every Day video on the topic. The live demonstration starts at the 6-minute mark.

> Landing a 767 without power, not so much... https://youtu.be/GlkCofOyxUA

That's a bit misleading. The Gimli Glider landed without thrust but almost certainly had electrical and hydraulic power. Same deal with the British Airways 777 that crashed at Heathrow.

Meanwhile, LATAM landed a 777 with no electrical power[1] not that long ago. That's a whole different ballgame.

1: http://avherald.com/h?article=4c1cc3f6&opt=0

Practicing how to land an airplane without power is part of a basic private pilot's license.

Yes, including several people who worked on game engines and whom I have frequently cursed out...

While definitely a cool factor, I'm not sure it would be so mindblowing. As a kid, we always called maple seeds 'helicopters', and their descent is a similar concept.

Planes gliding to a safe landing after losing power isn't something anyone should be harboring disbelief of to begin with. I've experienced it twice myself.

The surprising part of that story was the seemingly endless string of fuck-ups that resulted in them running out of fuel mid-flight.

>Planes gliding to a safe landing after losing power isn't something anyone should be harboring disbelief of to begin with. I've experienced it twice myself.

You must have been really lucky.

Sure, planes are perfectly capable of gliding to a safe landing after losing power. The problem is that there usually isn't a safe place nearby on the ground to land them at. So instead of a smooth, safe landing on some tarmac, they have a horrible crash-landing onto some kind of ground, if they're lucky. If they're not lucky, there's only mountains around and they just crash into them.

The nice thing about helicopters losing power is that a helicopter only needs a very small patch of level ground to land safely. Airplanes need an airstrip of some kind, and big airplanes need really long ones. Usually, if they're lucky, there's a decently straight highway close by.

It helped that the Captain (I believe) was ex-military and an expert glider pilot. They landed at a mothballed RCAF base that was being used for drag racing. Amazing story.

In addition to the captain's glider experience, the first officer just happened to have served at the RCAF station where they eventually landed. So they had at least a couple of extra things working in their favour that day.

They are reading the Wikipedia article, without attribution.

Wow, I find those numbers simply amazing.

The design, construction, operation, and regulation of commercial aircraft is the peak of safe and effective infrastructure technology. It’s truly spectacular.

We obviously have no idea, but my first thought was fraudulent goods not properly labeled (batteries specifically).

ValuJet went down in the Everglades for a similar reason.

Makes me feel better because I flew on a 767 today that must not have been updated since 1992 (original seats with ashtray, original overhead lights and window shades now not working, etc) and I was wondering a little about airworthiness.

Be relieved that the parts of the plane which are actually critical to flight receive far more maintenance attention than the passenger amenities.

Around Seattle we say: if it's not Boeing, I'm not going.

For reliable updates and objective information, as always, consult avherald. For this accident: http://avherald.com/h?article=4c497c3c&opt=0

Airliners.net is usually interesting to follow as well. Mostly pilots posting: https://www.airliners.net/forum/viewtopic.php?f=3&t=1416323

The FlightRadar image in that thread is pretty telling: https://pbs.twimg.com/media/D0HZO8GX4AAvxAL.jpg

At the last readout they were at an altitude of 1325ft, with a vertical descent of 29504 feet per minute (~335mph). So that reading was (probably) roughly 2-3 seconds before hitting the ground.
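The arithmetic behind those two figures, as a quick Python sketch. It assumes the descent rate stayed constant after the last readout and that the ground is at roughly zero feet on the reported altitude scale, neither of which the readout itself confirms; the function names are just for illustration.

```python
FEET_PER_MILE = 5280


def fpm_to_mph(feet_per_minute: float) -> float:
    """Feet per minute to miles per hour."""
    return feet_per_minute * 60 / FEET_PER_MILE


def seconds_to_ground(altitude_ft: float, descent_fpm: float) -> float:
    """Time to reach zero altitude at a constant descent rate."""
    return altitude_ft / (descent_fpm / 60)


if __name__ == "__main__":
    print(round(fpm_to_mph(29504), 2))               # 335.27
    print(round(seconds_to_ground(1325, 29504), 2))  # 2.69
```

Both numbers agree with the estimate: about 335 mph straight down, and a bit under three seconds to go.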

PPRuNe is also quite informative, although the ratio of pilots to peanut gallery has decreased over time.

For those unfamiliar (like myself), what is this?

Air traffic control. You hear them talking to, then losing contact with the flight. They refer to the flight mostly as "591 Heavy", though you'll also hear "3591".

There's several references. Around 27:09 you can hear them talk about looking for a lost aircraft.

Around 10:06 the controllers start asking if aircraft are picking up "ELT's", Emergency Locator Transmitters

Oh cool, thanks. I didn't realize this chatter is public.

Aircraft communication, including ATC, is mostly done using plain old VHF radio. You can buy a decent handheld air band radio for $200. Ancient technology, but it works well enough.

Or if you’re lazy there’s LiveATC, an iOS app that lets you stream these channels.

Also handy if you’re not right next to the transmitter. Air band is mostly line of sight, so it’s common to be able to hear all the airliners but not hear the ATC talking to them.

You can get a decent handheld VHF walkie that lets you also tune in on ATC for a quarter of that price.

An RTL-SDR [0] is even better. For $30 you can tune in to anything from 500 kHz up to 1766 MHz.

[0]: https://www.rtl-sdr.com/buy-rtl-sdr-dvb-t-dongles/

True, works well enough, and also it's a disaster waiting to happen for very obvious cybersecurity reasons

I’m not finding those reasons to be as obvious as you, perhaps you could elaborate.

What’s stopping someone from impersonating ATC?

If I understand correctly, a foggy day + “cleared for takeoff” is all that’s needed for a malicious actor to kill hundreds of people

A resultant accident would also require:

ATC to ignore the false transmission and take no action.

The departing pilots to accept a new voice and a clearly inferior/weak radio clearance as fact without verifying.

The departing pilots to not have situational awareness of an aircraft previously cleared to land on their runway.

The departing pilots to still not check final approach on the just-falsely-cleared runway before taxiing into position.

The landing pilots to have ignored the false transmission (also from a weak/inferior radio and new voice) clearing an aircraft onto their intended runway.

The landing pilots to not be watching the runway when they break out at minimums. (assuming your foggy day is a worst-case scenario)

The departing aircraft to have already started a takeoff roll and be more than 1,000' down the runway. (aircraft "touchdown zone" is not at the beginning of the runway)

Possible? Yes. Not in my top ten fears as a pilot. Much of ATC is a collaboration between professionals, not a dictatorship. It's really an amazing thing to participate in.

I prefer the crowd-sourcing safety of a completely open channel. Everyone is listening to everyone and can intercede if something weird is happening, including other pilots. Having a completely private channel but stolen credentials (we are all aware this happens all the time) means that you have to completely trust the voice on the other line.

> If I understand correctly, a foggy day + “cleared for takeoff” is all that’s needed for a malicious actor to kill hundreds of people

Only if the real ATC really, really drops the ball.

It's happened a few times, and leads to an immediate "who the fuck was that on this frequency?", and that's likely to result in the pilots in the area treating it like a comms outage.

See https://aviation.stackexchange.com/questions/44279/what-prev...

It clearly hasn’t become too much of a problem yet, but I feel pretty concerned that, given my understanding, it appears there’s only one layer of defense against this type of attack. The response requires 1) the ATC to figure out what happened, 2) the ATC to promptly cancel the takeoff clearance, and 3) the pilot receiving+responding to the cancelled clearance with enough time.

Too many things in that chain can go wrong, especially so given this would all need to happen in just a few seconds. A sophisticated attacker might even be able to jam the signal right after they give the fake clearance or (not entirely certain this is possible) use a highly directional transmitter that would allow the targeted plane to receive the message but not others.

I’m definitely not an expert in this area, so I wouldn’t be surprised if I missed something, but if I didn’t, this appears to be an astonishingly large vulnerability.

It’s just simply something that isn’t as big of a deal as you’re thinking. Hell, we have problems today with idiots on frequency that are technically qualified to be there but are gumming up the works.

When was the last time you authenticated that construction worker directing traffic on the ground?

Pilots fly without a control tower all the time. They’re also the final authority to the safe operation of that aircraft. If anything is amiss, we’ll do something else. Maybe that’ll mean contacting a different facility on a different frequency, or declare lost comms via transponder and go to our filed alternate while things are worked out.

Try listening to LiveATC for an uncontrolled field on a nice weekend day. (Or even a towered airport like KCMA on a Saturday at noon.) It’s controlled chaos and yet we all make it work.

A completely different situation, but a Las Vegas controller had a stroke while on duty not long ago. You can hear in the transmissions that as the situation goes on the pilots stop obeying and begin verifying instructions: https://youtube.com/watch?reload=9&v=Jv1kmuFOhWk

Not much, but I suspect building and running a crypto infrastructure in a secure way for everything from tiny privately owned aircraft to international carriers isn't easy.

The same thing that’s stopping people from impersonating a police officer.

Perhaps he's referring to the MITM attack from Die Hard II...?

There's also ADS-B and ACARS, both unencrypted data from the aircraft. I assume the cost/benefit to encrypt voice and data has been looked at and dismissed.

How so? You can still jam digital radio waves, and "upgrading" to a more secure system would be hella expensive, as you would need to retrofit the world's planes with the new system.

Any aircraft retrofit is crazy expensive. FAA certification is tedious, and has to be redone even for tiny changes. And, aircraft don't make money on the ground.

Security, for sure, but where’s the cyber?

The picture I have is that we should move to digital, jam-resistant spread spectrum with certificates, encryption, and some VHF fail-safe. Also, voice communication is high-workload and not error-free; many boilerplate communications, like reading out altimeter settings, could be text messages or automated. I imagine work is being done, but I'm not familiar with it.

Air Traffic Control recording in the vicinity of the crash.

One of the radio apps I downloaded for my phone last year included ATC channels as well as police channels. I found it oddly calming and reassuring listening to Chicago police channel, since it proved that someone who gave a damn was always out there, taking care of us.

I find listening to ATC communications similarly soothing, even when it is regarding accidents or problems. The poise and professionalism of ATC is pretty astounding.

My go-to is this YouTube channel, which highlights various bits of ATC communication and overlays maps as it happens https://www.youtube.com/channel/UCuedf_fJVrOppky5gl3U6QQ

What was the app called?

That's terrible, and it could have easily been worse had the crash happened closer to Houston.

VASAviation (Youtuber that publishes ATC recordings and plots them on a map with weather) for this incident:

https://www.youtube.com/watch?v=0cn58iVuzBY [2mins30]

I can’t comment on the cause, but it reminds me of a UPS cargo plane fire in Dubai. Lithium batteries can’t be extinguished using oxygen-deprivation methods. It is typically the kind of goods that are shipped by plane by mistake.

It was featured in an Air Crash Investigation episode named “Fatal Delivery”. https://en.wikipedia.org/wiki/UPS_Airlines_Flight_6

Interesting link, thank you. They mention that the debris is spread out over three miles of shallow water. I don't know anything about aviation, but it seems like this would suggest that the plane broke apart above ground.

How did you get that from the link? I read 'eyewitnesses report it going in nose first'.

In one of the yellow highlighted sections

The crash scene extends over a distance of 3 miles in shallow waters up to 5 feet deep.

Given the velocity of impact, it's not unusual for an intact plane to literally explode into tiny pieces upon impact, that continue traveling for quite some distance.

I was surprised to learn Amazon has its own planes. I always assumed that cargo planes carried cargo from a variety of companies.

Amazon doesn’t own them, Atlas Air owns and operates them for Amazon. Groups like American Airlines Cargo and Emirates Cargo do have mixed loads though, but these Prime jets are more akin to UPS and FedEx jets that serve a single user.

Interesting fact, thanks for sharing.

When I lived near a major cargo airport for FedEx, I often got Prime stuff delivered via FedEx, which I assumed was intermingled. I didn't realize Amazon also had its own literal fleet.

Was really strange seeing this last night on the news, I was only about 20 miles away. Very curious what the cause was.

I thought the domain was webtv.com for a second. (WebTV was how my paternal grandparents used the Internet.)

Hopefully they have a contingency plan to keep those Prime parcels on time.

Edit: They must have contingency plans for such scenarios. That's part of the job. But I think that it is not possible to keep to the initial schedule, so I'm guessing that the plan is to re-despatch ASAP to minimise delays while displaying the standard "delayed, apologies" notice to customers.

Are people downvoting because they think you’re being heartless? Is this not a valid concern that businesses have to deal with even in the midst of dealing with the tragic loss of life?

This is not a valid concern. The cost of refunding all the packages will be trivial compared to the cost of the plane, Amazon will apologize to the customers and refund or offer replacement, no special contingency plan needed.

Indeed. We (not Amazon) had customer parcels on UPS 1354 and, upon learning of the crash, set about figuring out which parcels were possibly/likely on the plane, starting re-orders for them, and, upon getting final confirmation from UPS, notifying customers of the delay, that a re-order was already in progress, and when they should expect to receive it. (Our products are all customized and so we didn’t want to wait for the final confirmation to begin making things right.)

As far as I know, zero customers complained about the obviously tragic and unusual circumstances of the delay.

For Amazon, they almost surely know with immediate certainty what was on the plane, and are mostly selling from stock, making the resolution easier in some ways (though larger scale on the other hand).

We treated this response as an ad-hoc process and have zero plans to systemize the response to a once every few decades issue.

Total loss of shipment is a valid concern in the supply chain and transport industries...

Remember also that the total loss of the plane impacts more than this single flight. Like airlines, I am sure it had a utilisation schedule, and now it's gone so they'll have to account for that.

That's why they do have contingency plans... And that's why they develop them with a cool head. Because when disaster strikes people tend to get emotional.

Edit: Wikipedia says that Amazon Air operates 39 planes. Sudden loss of one has to have an impact that has to be planned.

Sudden loss of one is unlikely enough that they probably don't have a specific plan ahead of time.

Sudden loss due to crash = unlikely (fortunately)

Grounded for unknown timespan due to some kind of severe damage = does happen from time to time (good thing, because otherwise, crashes would become more likely...)

Since the effect on the logistics chain is identical (at least apart from that single shipment in question), you just use the contingency plan for the latter when the former happens. And for the latter, you MUST have a plan.

The plan there involves sending the same cargo through another plane or similar.

They likely don’t need a specific plan. Airplanes go offline for heavy maintenance checks periodically and the carrier needs capacity to cover for that (or other AOG [broken airplane-“airplane on ground”] scenarios).

You're saying that airlines/delivery services like UPS don't plan and prepare for air disasters... That's laughable frankly.

Anyone who operates a whole fleet of 767 has a plan in case of a crash. That's the most elementary professionalism.

This isn't an airline. Airlines operate many more flights with lots of people on them.

Atlas is an airline, with 112 aircraft. Airlines aren't limited to passenger transport - https://en.wikipedia.org/wiki/Cargo_airline.

That's certainly smaller than Delta's ~800 planes, but they're certainly not tiny.

The problem is the phrasing I think, it seems too self-interested.

That said, Amazon is a logistical beast and regularly deals with transport failures (semi-crashes, lost containers, train derailments, fires, etc) so I don't feel that this would necessarily warrant any out of normal response.

I do know 3 people on board died, and yes that's tragic. But is there a need to comment to say and repeat it again and again? This is really the fake, superficial side of social media I hate. The disaster planning and recovery angle seemed on point to me on HN. This is not Facebook or Reddit...

I've seen a press release where Amazon replaced all packages when a small van got into an accident 15 years ago. Since the plane crash is national news they'll likely track the shipment tracking numbers and resend replacement packages free of charge.

wow dude


I'm sure you have excellent reasons for your strong position on the plight of the disabled—both for how much you know about it and how strongly you feel. But approaching an HN discussion with a stridently rhetorical approach only leads to flamewars, and those are against the site guidelines. A softer touch is needed in order to preserve the quality of good conversation, which is the purpose of these threads. Among other things, that involves listening to and respecting others—even if they're wrong and ignorant—and stopping short of getting aggressive (e.g. personal attacks or calling names), even though the level of the discourse here can indeed be frustrating.

If you'd please review https://news.ycombinator.com/newsguidelines.html and take the spirit of this site more to heart, we'd be grateful. It's in your interest too, because then your views will be more likely to change hearts and minds. Providing correct information in a neutral way may not persuade the person you're arguing with, but it will win over the larger readership who don't comment.

We detached this subthread from https://news.ycombinator.com/item?id=19239100 and marked it off-topic.

This is you searching for ways to be offended.

The result of this crash is three known fatalities. When focusing on whether deliveries were disrupted, rather than the known loss of life, the appearance is one of heartlessness.

Amazon deliveries are surprisingly reliable when my expectation is to get a new lamp or even a shipment of toilet paper. But if my life relied on their never being delayed, I would already be dead about 5-10% of the time, and it doesn't even take a plane crash.

This is an incredible stance to be taking: weighing the unlikely potential death of someone relying on known unreliable delivery schedules against three confirmed casualties from a freak accident. You want to talk about callousness?

Chains of consequence are very common. Especially in accidents. I'm defending the idea that Amazon might plan around accidents. That's neither strange nor callous, it's a form of conscientiousness, nothing more. Once again, your life isn't the same as others. Disabled people do die from taxis not arriving, messages not being passed on, etc - because nothing is set up for us, services are not backstopped because able people don't need the backstop. Society isn't built for us, so we have to take our chances all the time, with no failsafe being provided. In some cases, every day. The disabled rarely have the financial resources to be truly safe. Your comment is ableism at its finest.

> Your comment is ableism at its finest.

No, it's an internal locus of control. If I went into the Disability Olympics I'm guaranteed at least the bronze. Believe you me when I say we could share some war stories about how the world is not set up to accommodate everyone.

That said, my point is that this is not the place to discuss that. It is not acceptable to go into a thread about people who did die in a way they could not have foreseen, and try to force the discussion onto people who could have died. Especially when your scenario hinges on poor planning and reliance on a known unreliable service for critical needs.

You're claiming ableism and saying that the world does not accommodate you, while callously downplaying the lives that were lost by people who were trying to do exactly that. If Amazon deliveries are what you rely on to survive, how can you brush off the deaths of those working to bring you those goods?

Nobody is obligated to talk only about the crew deaths. Arguably it's sanctimonious and tacky to make a big show of "respect" for strangers who died. (Or else you had better publicly mourn everybody else who died from less newsworthy causes.) Meanwhile somebody alive shared a personal experience of being disabled, while making a fairly minor point, and you kind of shat on them.

I'm not sure we read the same thread here. I only brought up the casualties in response to the statement that if you aren't focused on the shipping delay here, you're callous and don't care about the disabled. Being not particularly well off on that front myself, I found that to be an unreasonable and selfish comment, one that attempted to hijack a tragedy to further someone's unrelated agenda. I would fully accept that I could have read into it something that was not there, had it not been for the followup "ableism" comment immediately afterward.

If they had approached from a different angle, something like "Remember that this can have knock on effects, my sister is waiting on her insulin delivery and after a particularly rough week a missed delivery could be fatal" it wouldn't have ruffled feathers. Instead it was phrased such that anyone who hadn't considered that concept was morally deficient. You claim to be upset by my "sanctimonious" behavior and yet view the post I responded to as just "sharing a personal experience"?

> Chains of consequence are very common. Especially in accidents. I'm defending the idea that Amazon might plan around accidents.

They almost certainly do plan for them. They will find out what shipments were on the plane, update the tracking status for those orders, and either refund those customers or ship replacements to them.

It will take some time for Amazon to find out what was on that particular plane, and so even for those customers for whom they decide to ship replacements rather than provide refunds there is no way they will get their items by the original due date.

...and this was a plane crash, which is probably the best case for this. With a plane crash, especially a plane that is specifically contracted for Amazon shipments, Amazon finds out quickly that they have lost a shipment, and it is then just a matter of finding out which packages were on that plane as opposed to other planes sent out from the same airport.

In the far more common cases, such as delays in the rail system, or weather or natural disasters closing down shipping routes, (1) it will take longer for Amazon to even find out there is a problem, and (2) the same problem might affect an immediate shipment of a replacement.

It is simply not possible with existing technology, even with the resources of an Amazon, to build the shipping system you seem to want: one that will allow people to order essential items two days before they must have them, and always receive them.

Amazon Prime delivery isn't reliable enough for those use cases anyway. Prime packages arrive late all the time. If you might die from a small delivery delay, you need to order things earlier or find a different source.


There are well known ways to build reliable systems out of unreliable ones through the use of slack and redundancy.

If you need something to live, you don't wait until you have a 48 hour supply and then rely on two day delivery, you keep a two week or more supply so that if delivery fails you have plenty of time to realize this and source it from somewhere else before it becomes an emergency.

And if it does become an emergency, there are emergency services for that. Go to a hospital and they will have or fly in just about anything. This is almost certainly far more expensive than not running everything down to the wire to begin with, but at least you don't die.
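To put rough numbers on the slack argument above, here is a minimal sketch. The delivery-time distribution is entirely made up for illustration (it is not Amazon data), and `stockout_probability` is a hypothetical helper name. It only shows how quickly the chance of running out drops as the on-hand buffer grows.

```python
from fractions import Fraction

# Hypothetical delivery-time distribution: days until arrival -> probability.
# These numbers are invented for illustration only.
DELIVERY_DAYS = {
    2: Fraction(90, 100),   # arrives on the promised day
    3: Fraction(5, 100),    # one day late
    5: Fraction(3, 100),    # a few days late
    10: Fraction(2, 100),   # badly delayed (weather, lost shipment, ...)
}


def stockout_probability(buffer_days, distribution=DELIVERY_DAYS):
    """Probability that the order arrives after the on-hand supply runs out.

    buffer_days is how many days of supply are left when the order is placed.
    """
    return sum(p for days, p in distribution.items() if days > buffer_days)


if __name__ == "__main__":
    for buffer_days in (2, 3, 7, 14):
        print(buffer_days, float(stockout_probability(buffer_days)))
```

Under these assumed numbers, ordering with only two days of supply left means running out about one time in ten, while a two-week buffer covers every delay in the table. It says nothing about whether someone can afford to hold that buffer, which is a separate question.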

Wow. How often do I have to repeat that disabled people are not able-bodied and don't have the same choices the abled do. Disabled people rarely earn an income above poverty, they can't necessarily stock items (nor can needs always be anticipated.) They can't hire a service to shop for them. Again, what's my second source that delivers to my door?

As for ER - seriously ill disabled people simply can't go to ER every time there's a problem that might go badly wrong - they don't have the health, and will also get black-balled by the ER if they do this too often, in which case the quality of service they get becomes abysmal. This is probably the most difficult, heart-breaking choice they face. With many health conditions, symptoms are so various that if you don't show up at ER falsely fairly often you're risking your life. But if you do show up at ER frequently, that too risks your life, if you're labelled as a bad patient. Your comment blames the victim, repeatedly. Ableism all day long.

Many health conditions include cognitive deficits, even if only from extreme fatigue. Not to mention that head injuries are a kind of disability, and social support for head injuries isn't better than for other disabilities, it's considerably worse.

Again, I'm being swatted for suggesting that it might be a good thing for Amazon to plan around accidents. I'm very willing to believe that this doesn't affect able bodied people nearly as much as it affects many of the one in five Americans with a disability. Good on ya. Now open your hearts.

There are dozens of services that will deliver to your door or via FedEx/UPS/DHL/USPS.

It seems like your main complaint is that "guaranteed two day delivery" is a money back guarantee rather than an actual damages guarantee that would give suppliers a greater incentive to achieve 99.999999% reliability instead of 99%. But there are services that will satisfy those guarantees, and they're more expensive than Amazon precisely because they have to price in the cost of taking the steps necessary to achieve that level of reliability.

"Fast, reliable, cheap; pick two" is not a conspiracy against the disabled.
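A back-of-the-envelope sketch of why that gap is expensive: if each delivery channel independently fails 1% of the time, how many parallel channels does it take before the chance of all of them failing drops to 0.000001%? Independence is a strong assumption (a storm can delay every carrier at once), and `channels_needed` is a hypothetical helper, so treat the result as a lower bound rather than a design.

```python
from fractions import Fraction


def channels_needed(per_channel_failure, target_failure):
    """Smallest number of independent channels whose combined failure
    probability is at or below target_failure.

    Assumes channels fail independently, which real shipping routes do not.
    """
    if not (0 < per_channel_failure < 1) or target_failure <= 0:
        raise ValueError("need 0 < per_channel_failure < 1 and target_failure > 0")
    n = 1
    combined = per_channel_failure
    while combined > target_failure:
        n += 1
        combined *= per_channel_failure  # all n channels must fail together
    return n


if __name__ == "__main__":
    # 99% reliable channels, aiming for 99.999999% overall
    print(channels_needed(Fraction(1, 100), Fraction(1, 10**8)))
```

Exact fractions are used instead of floats so the boundary case (0.01 to the fourth power is exactly the target) is not thrown off by rounding. Even in this idealized case every order ships four times, which is the cost that has to be priced in.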

"guaranteed two day delivery" is not a money back guarantee. It's just a slogan. Maybe half of my Prime packages make it to my door within two days. Amazon only really guarantees that you'll get the package eventually.

It's a money back guarantee because you can cancel/return the item and get your money back. They also happen to provide the same guarantee in the event you decide you don't want the item for other reasons.

When a company tells me 2-day delivery is guaranteed or my money back, I expect to get my money back when the delivery takes longer than 2 days. I don't expect to have to return the item. What's the point of a guarantee if I'd be in the exact same situation without it?

Suppose you buy some balloons for a party that's in three days. They arrive in four days. By the time you receive them, the party's over and you have no use for them. If there was no delivery guarantee, you not only didn't have them when you needed them, now you do have them when you don't need them and are out the money. With the guarantee, you send them back and get a refund.

It also allows you to send back items that arrive late even if you do still want them, allowing you to punish them by returning the item only to go buy it from a competitor.

You said Amazon lets you return them without the delivery guarantee. What does the delivery guarantee add here?

I pay $100 for Prime so I can get items I want to keep long-term faster. What good does being able to return them do me? If the guarantee meant anything I'd get some of the $100 I paid for the 2-day shipping back for items that arrived late and weren't returned.

> You said Amazon lets you return them without the delivery guarantee. What does the delivery guarantee add here?

It adds "missed delivery date" to the list of valid reasons to return the item. In theory you could have a guarantee that allows returns because you didn't like the item but not because it was delivered later than estimated -- hard to enforce and kind of silly, but that doesn't mean the scrupulous person who only makes returns for agreed upon reasons wouldn't be getting a money-back delivery guarantee when you add it.

> I pay $100 for Prime so I can get items I want to keep long-term faster. What good does being able to return them do me?

It allows you to punish them by returning the late item only to go buy it from a competitor.

> If the guarantee meant anything I'd get some of the $100 I paid for the 2-day shipping back for items that arrived late and weren't returned.

The $100 pays for the "2-day delivery in most cases" shipping service.

If it's worth that much as-offered then they don't need to let you keep the item for free on top of that. If it isn't then why are you paying for it?

> It allows you to punish them by returning the late item only to go buy it from a competitor.

They know most people won't do that most of the time because that increases the inconvenience they've already suffered from the late package. The guarantee should benefit me, not increase the damage I've suffered from the late package.

> The $100 pays for the "2-day delivery in most cases" shipping service.

I don't get 2-day delivery in most cases. They fail to deliver within that window more than half the time.

> If it's worth that much as-offered then they don't need to let you keep the item for free on top of that.

I'm not suggesting that I get to keep the item for free. I'm saying that I should get a partial refund on the shipping when I pay extra for guaranteed 2-day shipping and the item takes more than 2 days to arrive.

> Again, I'm being swatted for suggesting that it might be a good thing for Amazon to plan around accidents.

you're being swatted for the disingenuous grandstanding of suggesting they should have a plan, when you already know what the plan of _every_ shipping company is[1], and you know that the portions relevant to you will be covered in the terms and conditions you skip over, despite the supposed importance of this subject to you.

maybe you want them to have a different plan, but that's not what you're saying.

[1]: refund shipping costs, and pay out whatever insurance was purchased on the shipment, subject to some contract-specified delay.

He did suggest an alternative: order earlier.

Plane crashes are very rare, but train derailments are not. Unexpected bad weather closing off routes to trucks is not. Natural disasters closing roads and airports is not.

Some small percentage of shipments will be delayed, and there is nothing anyone can do about it. If you need something by a hard deadline, and it is going to be shipped other than perhaps locally, then you must order early enough that if it is delayed you have time to seek an alternative source that can still make the deadline.

If you use Prime two day shipping timed for delivery on your deadline, you will eventually be screwed because you won't find out about a shipment glitch until it is too late to switch to a backup other than picking up at a local store that stocks the item.

The rate of missed delivery dates from Prime won't be appreciably affected by this incident.

I sell through FBA and see how much stuff gets returned for "missed fulfilment promise". The rates might surprise you. I've also sold medical products and once got a feedback crediting it with saving someone's life (was a blood pressure monitor).

It’s a great thing to keep in mind, but you make it sound like everything else in the whole supply chain process is perfect and this is the only thing that can go wrong.

All supply chain operations have many levels of risk attached to them, and you always need to manage it. The main risk to avoid is depending on a shipment arriving on time. A plane crashing, a fire in a delivery center, a last-mile truck crashing, heavy snow blocking roads, etc. And that’s just shipping: there’s also the risk of the insurance company getting confused about your prescriptions, a credit card getting denied, the pharma company having production issues, etc. You also have risks with your local pharmacy, even if you aren’t disabled: again, roads may be closed, it may burn down, staff can catch some serious infection that forces a closure, etc.

If you need critical meds, you need to make sure you always have enough of a buffer, as there are many things that can always go wrong.

Items, not meds. See my other comments here. You clearly have no idea how close to the edge, financially and in many other ways, society pushes the disabled, nor how frequently. Society is not built for us, and genuine consideration - as the discussion here shows - is extremely scant.

I am very well aware that my health and safety are commonly at risk; therefore I don't find the suggestion that a firm might actually plan around an accident to be bizarre or antisocial.

Oh, trust me, I know how close to financial edge medical issues can get you.

But, there's tons of things that can go wrong with any company operations, and for some they'll have plan b to meet contract, for others they refund, for others they have insurance, and for some of them - it's just an inherent risk of running a business, that they have to roll with.

It's unreasonable to expect any company to operate risk free, with 100% guarantee of fulfilling contract every single time - there's not a single company on earth that can do that. And as a customer, one needs to be aware of that.

But what do you want Amazon to do? Send out two (or more) packages for every order on distinct routes just in case one gets delayed?

I understand that you are advocating for a minority group, and that's noble. But the expectation that it's not okay for Amazon to miss a delivery due to a plane crash is irrational. It has nothing to do with "ableism".
