> Nonetheless, the official said the incident shows that the Air Force and the Army have a serious training problem that needs to be corrected. "We need to know how our equipment works; when the battery is changed, it defaults to his own location," the official said. "We've got to make sure our people understand this."
I hope that's not the mentality of the military today -- that special forces operators in the field need to remember UI/UX details of the software they use... rather than having the software developer add relatively simple safeguards, such as confirmation boxes.
It's also a lesson in how software development and engineering can go awry without close feedback from the intended audience. There are on-the-field realities -- such as the battery going low -- where even if the developers are aware of them, they can't easily predict how the intended user will react in those scenarios... with tragic results, in this case.
If you can't wrap your brain around that statement... go thank a soldier.
Again let me emphasize I'm not saying the software is OK as is. It should be fixed, somehow. But at the same time, the military can't afford to just, say, stop using the unit entirely until then, and wait until all equivalently serious bugs are fixed. Not having the software 'cause it's in the shop can be fatal too.
This puts a high premium on the efficiency of bug-finding, especially spec bugs of new systems. My intuition is that this could improve by a lot.
Another thing I wonder about: this happened in 2001, an era in which, IIRC, portable devices were few and sleep/power-down modes were not as well executed as they are today. Yes, today, if I were to power off my mobile device, then replace its battery, and power it back on... I would expect things to be more or less how I left them (which is a little unrealistic, depending on the application)... but back then, I could see how a developer would think: "Well, of course everyone knows system state gets reset when the device is powered down for any reason."
John R Fox's citation:
"As the Germans continued to press the attack towards the area that Lieutenant Fox occupied, he adjusted the artillery fire closer to his position. Finally he was warned that the next adjustment would bring the deadly artillery right on top of his position. After acknowledging the danger, Lieutenant Fox insisted that the last adjustment be fired as this was the only way to defeat the attacking soldiers. Later, when a counterattack retook the position from the Germans, Lieutenant Fox's body was found with the bodies of approximately 100 German soldiers."
Pretty sure you can find some more if you look.
 - http://www.cmohs.org/recipient-detail/3431/thacker-brian-mil...
 - http://www.cmohs.org/recipient-detail/2744/fox-john-r.php
You'd hope they would add some sort of safety to prevent this sort of thing, though.
(I know, I know, don't complain about the downvoting.)
Those aren't mutually exclusive alternatives. Special forces operators (and soldiers more generally) need to understand the operational characteristics of the equipment they actually have.
This does not mean that suboptimal UI/UX in military equipment shouldn't be addressed, it should.
As an example, consider the Therac-25 incident. It was a radiotherapy machine designed to operate in two modes - direct exposure to a low-power electron beam, or firing a high-power beam at a set of targets to produce X-rays. Its predecessors used a loosely-coupled system - when the targets were not in place, the high-power mode was electrically disconnected; in other words, the safety system was separate from the control system. This was switched to a tightly coupled system, where the computer also served as the safety interlock. A particular sequence of inputs combined with a race condition could result in the targets (and sensors) being unlocked and removed while the beam fired at high power. In the predecessor machines, which ran the same software, the hardware interlocks meant this bug never caused an exposure. When the software was reused in the Therac-25, with the interlocks gone, the bug surfaced.
No direct relation to this incident, of course. The controller should have a specific warning message and confirmation dialog before firing within a minimum safe distance. Probably wasn't in the specs though.
Confirmation dialogs are dead UX. Users have been trained by internet browsers with pop-ups and other poorly designed pieces of software with poorly worded dialog boxes to completely ignore dialog boxes. I've tested this several times. Users don't even consciously register that the dialog box ever appeared. You can watch them, right over their shoulder, and as soon as they close a dialog, they will turn to you and ask, "What do I do now?" You will ask, "What did the pop-up box say to do?", and they will respond, "What pop-up box?" Walk them through the process again, don't pre-warn them about when the dialog appears, and they will do it again.
This is also why crapware is so easy to install on any system that has wizard-based installers.
Do not use dialog boxes. It's better to hire two testing teams and set them against each other to break the software. Test and test and test again, then disallow bad behavior. The "plugger" example should not have allowed issuing a fire command on its own location, unless some explicit, completely out-of-band, one-time-use option enabled it. Actually, better yet, it should not re-initialize the target location on start-up with the device's location at all. Just clear out the target; when the user tries to fire, they will see there is no target and probably realize, "oh, must be because I changed the batteries". There is a reason they are called "fail-safes". You create failure modes that err on the side of safety. It's better for the device to fail to fire than to fire at the wrong target.
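To make the fail-safe concrete, here is a minimal Python sketch of that behaviour. All names and coordinates here are hypothetical, invented for illustration -- this is not the actual plugger's interface. The point is only the failure mode: on power-up the target is cleared rather than defaulted to own position, and firing with no target is refused.

```python
class TargetingUnit:
    """Toy model of a fail-safe targeting device (hypothetical interface)."""

    def __init__(self):
        self.own_position = self._read_gps()
        self.target = None  # deliberately NOT defaulted to own position

    def _read_gps(self):
        return (31.6, 65.7)  # placeholder coordinates

    def on_power_up(self):
        # Fail safe: any previous target is forgotten after a power
        # cycle (e.g. a battery change), never silently guessed.
        self.target = None

    def set_target(self, coords):
        self.target = coords

    def fire(self):
        if self.target is None:
            # Failing to fire is the safe failure mode.
            return "REFUSED: no target set (re-enter coordinates)"
        return "FIRING at {}".format(self.target)

unit = TargetingUnit()
unit.on_power_up()           # simulate a battery change
print(unit.fire())           # refused: no target
unit.set_target((31.9, 65.9))
print(unit.fire())           # fires only at an explicitly entered target
```

The operator who forgets the power-cycle semantics now gets a refusal instead of a round on their own position.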
I don't care if you can't afford it. If you cannot afford a large testing infrastructure, then you can't afford to make safety-critical software at all. You cannot solve this problem with software.
These situations come about when you have a management infrastructure that cares more about feature lists than correctness. I can hear them now, "we need to move faster", and "if it works, don't fix it."
There are many actions that could be very harmful in some situations but are occasionally required (e.g. dealing with websites that get a certificate for `www.mysite.net` instead of `mysite.net`). Much of safety engineering is about ensuring these are not encountered in day-to-day routine, but such measures are never perfect, and they fare much worse in open, dynamic environments (like the military context in which this case occurred).
Militaries do typically have radio protocols to reduce the risk of artillery targeting unintended locations. I agree that good testers should have caught this bug, but there are hundreds of corner cases and you will always miss some of them. Domestic electric equipment is designed to prevent live wires from being exposed, but RCDs are still a thing.
If the dialog has OWN POSITION in big red blinking letters they will understand.
Never firing on your position can cost lives too.
I've seen both engineers and product managers use this as a crutch. And, frankly, as a PM I can say the blame generally lies with the product manager. This is our job. It is to understand how this technology will be put in practice by users and to make sure we are properly distilling and prioritizing the needs of those users into requirements for the engineers.
Sounds like just about every law or regulation written since the dawn of civilization.
Not sure if it's a solvable problem. Heck, even nature has crap like this popping up (a human embryo grows a functional tail at one point, and sometimes it stays around).
While I agree with the general point about specifications, we are talking about a weapon of war here. Interrupting a firing procedure for maintenance and then not checking the co-ordinates is exactly the kind of careless mistake I'd expect to get someone killed. At least it defaulted to the location of the receiver, preventing further errors, rather than a semi-random target.
I did not find anything in the official Washington Post archive: http://www.washingtonpost.com/wp-adv/archives/front.htm
This search shows other articles by that author on the Washington Post site during that time, but not this specific one: https://www.google.com/search?q=%22vernon+loeb%22+%22kandaha...
It looks like this happened in December 2001, and the speculation that it was caused by the battery change was released in March 2002. The Christian Science Monitor reported on it in December, when the cause was unknown. This PDF has some more information on the Washington Post article.
Feb 2, 2002. U.S. Soldiers Recount Smart Bomb's Blunder
Mar 24, 2002. 'Friendly Fire' Deaths Traced to Dead Battery; Taliban Targeted, but U.S. Forces Killed
However, I think this is one of those (many) cases where it is an "operator error" which a better design could have prevented.
In other words, I think it was a plain bug. Somewhere in the fire() function there should have been some check for:
distance(current_gps_coordinates, target_coordinates) < min_safety_distance
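As a sketch of what such a check might look like -- the function and constant names are invented here, and real fire-control software would use proper geodesy and doctrine-defined danger-close distances -- a haversine distance against a minimum-safe-distance threshold would reject a target at the operator's own coordinates:

```python
import math

MIN_SAFE_DISTANCE_M = 600  # hypothetical danger-close threshold, in metres

def distance_m(a, b):
    # Haversine great-circle distance between two (lat, lon) pairs,
    # in metres, assuming a spherical Earth of radius 6371 km.
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def fire(current_gps_coordinates, target_coordinates):
    # The check the comment above describes: refuse to fire inside
    # the minimum safe distance of our own position.
    if distance_m(current_gps_coordinates, target_coordinates) < MIN_SAFE_DISTANCE_M:
        raise ValueError("target within minimum safe distance of own position")
    return "firing at {}".format(target_coordinates)

own = (31.6, 65.7)  # placeholder coordinates
try:
    fire(own, own)  # target defaulted to own position after battery change
except ValueError as e:
    print("blocked:", e)
```

With this in place, the battery-change default would produce an error instead of a strike, regardless of whether the operator remembered the UI quirk.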
In the article, I explain why this is going to be a bigger problem in the near future, and speculate that some simulation-based approach could, perhaps, do the trick. But I am not sure.
Any thoughts about that?
Having some form of 'interactive spec' where stakeholders can 'play with' the system to verify intended behaviour is a really interesting idea. Of course it is used in software development all the time at varying levels of fidelity (wireframes, mockups, interactive prototypes and so on). But I think here is the rub... the better the quality of the simulation, the greater the amount of effort, until it becomes approximately equal to the cost of just building the real system.
Maybe with enough automation and tooling...
From what I have seen (I looked a bit at what people are doing in UAVs, autonomous vehicles and so on - see e.g. http://blog.foretellix.com/2015/07/03/my-impressions-from-th...), I think a lot can be done to improve both price and performance.
In other words, I think it is possible to invent new tools / methodologies to make simulation (especially high-level simulation) easier, and especially to get a lot more out of it, at all stages of design / verification / maintenance.
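One cheap version of high-level simulation is brute-force scenario enumeration: model the device very coarsely, generate event sequences, and check a safety property against each one. The toy Python sketch below (all names hypothetical) deliberately encodes the reported bug -- a battery change resets the target to the device's own location -- and the enumeration finds the fatal sequence automatically:

```python
from itertools import product

OWN = "own_position"

class BuggyPlugger:
    """Coarse model of the device, including the reported default bug."""
    def __init__(self):
        self.target = None
    def handle(self, event):
        if event == "set_target":
            self.target = "enemy_position"
        elif event == "battery_change":
            self.target = OWN  # the default-to-own-location bug
        elif event == "fire":
            return self.target  # where the round actually goes
        return None

def violates_safety(events):
    # Safety property: the device never fires at its own position.
    device = BuggyPlugger()
    for e in events:
        shot = device.handle(e)
        if e == "fire" and shot == OWN:
            return True
    return False

EVENTS = ["set_target", "battery_change", "fire"]
bad = [seq for seq in product(EVENTS, repeat=3) if violates_safety(seq)]
print(bad[0])  # ('set_target', 'battery_change', 'fire')
```

This only works because the model is tiny; the open question the comments above raise is how much tooling it takes to make this kind of scenario coverage tractable for realistic specs.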