Count to ten when a plane goes down (johncbeck.tumblr.com)
1112 points by UltraMagnus on July 21, 2014 | 273 comments



Firstly, firing the intern doesn't make sense - it was a mistake waiting to happen and he just happened to do it at the wrong time.

Secondly, the punishment meted out should be:

1. Proportional to the degree of carelessness (in this case not that much, since he accidentally hit a wrong key adjacent to the right one; he didn't mow down anybody while driving drunk)

2. Inversely proportional to the likelihood of the error (in this case the likelihood was very high, since the all-terminal reset key was (a) uncovered and single-press, and (b) right next to the single-terminal reset key).

3. Proportional to intention (this was a completely unintentional error)

If you say that the punishment should also depend on the degree of damage, I would say that the responsibility for managing the risk of such damage wasn't his but that of the person responsible for implementing such a high-risk design. If such a person is not around, find the person who approved such a design. Government departments are usually very good with paper trails.


The author has actually responded to this on his blog: http://johncbeck.tumblr.com/post/92502108047/so-what-did-you....

    So what did you do after you got fired from the embassy?

    What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)
So I guess it wasn't a big deal after all.


I think you're wrong to ignore the consequences of the actions as an input to the punishment. A small amount of unintentional carelessness that causes huge damage could still be punished. One could argue that a certain degree of mindfulness in critical situations is a job requirement, and casually making a careless -- though unintended -- mistake demonstrates a lack of mindfulness indicating that the person is not properly qualified for the job.

I understand that we don't want a culture that fires people for making the sort of mistake anyone might make. But to be so careless on a day that is clearly an exception where something more important than standard business procedure is going on, can't you at least see why firing the intern for such a lack of mindfulness might at least make sense, even if you disagree with it?

I interned for a government organization that maintains hydroelectric dams and the software that controls them throughout the Southeastern US. A careless mistake could -- in the worst case -- cause blackouts, cost the company millions of dollars, or even cost lives (if the data-control feedback loop caused a turbine to spin up at the wrong time or to fail to shut off in an emergency). And, as is quite common in organizations with non-software-engineers running the show, the development processes were entirely haphazard. The environment was such that it would be really easy for me to push unreviewed code, or to make a stupid deployment mistake, or to be careless in a number of ways that the system didn't protect me against.

But it was OK, because they hired smart, competent people who understand the need to triple-check, if necessary, before committing. People who understood the gravity of the situation, and who didn't phone it in if they weren't feeling it that day. If I demonstrated that I wasn't one of those people, I would fully expect to be fired.


> I think you're wrong to ignore the consequences of the actions as an input to the punishment.

This equivocates on "consequences" of actions, though. It's obvious that the consequences of hitting F7 before the incident were understood by all responsible to be low enough that any intern could be expected to make the right decision. After the incident, the consequences of hitting F7 were sharply increased such that no future intern would ever be allowed to make that decision. But then you can't make an argument that assumes "consequences" were the same at both points in time.

We commit this fallacy all the time, probably because we're designed by evolution to reassess the morality of an action based on consequences. It works as a social heuristic for shaming or rewarding people, but it makes no rational sense that the morality of an action should retroactively change based on future consequences. You can see similar behavior in our rewarding athletes for profound genetic advantages, or punishing criminals for profound genetic deficits. The consequences somehow redeem or condemn, and they should do neither.


No, the consequences were the same before and after the incident: a total system reboot. The varying factor here was temporal: it was usually a low-risk action when the office was empty, a high-risk one when the office was full.

The negligence on the intern's part was to make decisions and act without regard for risk as if he was in the low-risk window despite the evidence he was actually in the high-risk one (all the already-active PCs).

It makes perfect sense that the punishment should reflect inappropriate regard being given for known consequences. That's what negligence is.


> No, the consequences were the same before and after

I'm talking about the perceived consequences, not the actual consequences. The fallacy here is to perceive low consequences at one point in time, perceive high consequences at a later time and then try to change history such that low consequences were never really perceived.

> The negligence on the intern's part was to make decisions and act without regard for risk as if he was in the low-risk window despite the evidence he was actually in the high-risk one (all the already-active PCs).

He was in a perceived low-risk window. The perceived consequence of accidental reboot was already figured in and was already perceived to be low. Else why would the F7 key be next to F6? It is certainly unfair to expect someone to perceive high-risk when everyone else perceives low-risk.

> It makes perfect sense that the punishment should reflect inappropriate regard being given for known consequences. That's what negligence is.

The perceived consequences were low-risk, therefore the known consequences were low-risk.


> He was in a perceived low-risk window.

... because he was negligent. "Oh, all the computers are already on? That only happens when Washington's waiting on something. Oh well, I'll carry on like this was any other low-risk morning"

> Else why would the F7 key be next to F6?

Same reason why "rm -rf " is one keystroke away from disaster. Perceived risk has nothing to do with it.


> "Oh, all the computers are already on? That only happens when Washington's waiting on something. Oh well, I'll carry on like this was any other low-risk morning"

Because those situations were also low-risk mornings. He only saw that pattern when people left late. He had no reason to expect that people would be working early in the morning because that situation had never occurred. Further, a secretary playing a computer game in the morning suggests business as usual, no one working.

> He was in a perceived low-risk window. ... because he was negligent

No, someone else set up the computers and software with F6 and F7 command functions side by side and then evaluated the entire network as low-risk for interns under all situations. It is perfectly reasonable for an intern to take the same low-risk perspective as his superiors.

> Same reason why "rm -rf " is one keystroke away from disaster. Perceived risk has nothing to do with it.

Perceived risk has everything to do with it. It is inconceivable today that an intern would have unrestricted access to a company's file system and be literally a few keystrokes from disaster. The key reason is that perceived risk is now much closer to actual risk. In 1983, no one had a clue about the kinds of things that could go wrong. Understanding real risk is a painstaking process requiring time, trial and error.


> it was OK, because they hired smart, competent people who understand the need to triple-check, if necessary, before committing. People who understood the gravity of the situation, and who didn't phone it in if they weren't feeling it that day.

In my experience that is not nearly sufficient for implementing any process that can't tolerate errors. It is necessary to have conscientious people of course, but they still are humans. Given the opportunity for 2,000 hours a year, year after year, they will screw up.

Humans are very bad at following procedures. For recent examples, consider the people operating our nuclear missiles and those protecting our bomb-grade nuclear materials. If even they don't have enough motivation to follow procedure ...


I got a strong impression from the article that he had no idea there were extra people there, or that there was even a critical situation. He wrote as though he was doing a mundane daily task, and said he was really surprised his boss was even there. Given those details, he had no reason to have a heightened sense of awareness. He also mentions that a reset of all workstations should have had no impact at the time.

In this situation, the secretary playing the game is just as culpable as the intern, which is to say, not really responsible.


Well, he mentioned that he noticed an exceptional circumstance right when he arrived -- notably, that computers which were usually his responsibility to turn on were already on. You could argue that a less negligent (and more aware) individual would have extrapolated that into a heightened sense of awareness.


There is a wholly different level of responsibility and ability to ensure quality in your scenario that is simply not present in the article.

In one case accidentally pressing the wrong key deleted incredibly important data, while in your case you have plenty of time to review and ensure quality at your leisure.


He's used a longer version of that story to view this as a management lesson – it's definitely not just “blame the intern”:

http://globis.jp/774-2


Sorta weird how the 2 articles differ from one another about being "fired".

Count to ten article:

> I, naturally, felt terrible and was, appropriately, fired.

Honesty Wins article:

> But, naturally, that day was my last day of work at the American Embassy. But, not because I was fired; although, I might have been fired if that day didn’t just happen to be the last scheduled day of my summer internship.


Bureaucratic jiu-jitsu:

http://johncbeck.tumblr.com/post/92502108047/so-what-did-you...

“Oh, don't worry boss, we sacked the fool who made that mistake!”


I find it surprising that the exact key he fat-fingered wasn't burned into his long-term memory. Not that it would have been helpful to him down the line; actually, the level of obsessing it would have taken to remember it this many years later would have been quite counterproductive.

It is simply that, given the simplicity and consequences of the error, it is the type of thing I generally see people beating themselves up over until they cannot forget it.

(In case you are saying to yourself that he did remember the key, look at the two versions: in one he says F6 is machine reboot and F7 is all reboot, and in the other he says F7 is machine reboot and F8 is all reboot. So while I hope he knew the keys' functions then, he has since either forgotten the exact key or is substituting F keys for storytelling purposes.)


I bet this entire comment thread would be completely different with this additional context. Thanks, it certainly casts a better light on the situation!


Punishment is not a good way to correct behavior.

Your number 2 is particularly wrong. For punishment to work in affecting behavior, you can't punish for a very unlikely event, especially accidental.

Punishment changes behavior by making people anxious and afraid of the punishment. If you punish something that's very unlikely, it does no good. It's like if pushing ctrl-F restarted the stations, and the guy has never pushed ctrl-F before, and never been punished for it. It's very unlikely that he would ever push ctrl-F. But he happens to trip getting up and accidentally hit ctrl-F while he's catching his balance. Does that warrant a heavier punishment? It's more unlikely, certainly, but what would the punishment change about his behavior?

Punishment works because you are afraid of it. It works because you want to avoid it. But accidents don't happen because of defiance or a rational decision making process.

If you were punished moderately and frequently for a common mistake like hitting F7, it could correct behavior because you would be more vigilant when hitting F6. Having it be proportional to the degree of carelessness in terms of correcting behavior is not important. If someone is more careless, they will get more frequent punishment. If the punishment is too strong, it will just make people fearful instead of correcting the behavior.

Firing someone who consistently makes mistakes is a corrective action, not punitive.

Punishment is generally more of a cultural thing and less of a means of correcting an issue. Punishment is expected, so it's delivered. In western culture we have a particular need to find someone responsible and punish them. Rarely though do you feel "I don't want to get punished, so I am going to do this right." but it's not uncommon to think "I don't want to get punished so I'll avoid this altogether."

Corrective behavior is better when it's not punitive. Look at the design of the software, correct that problem. Look at the systems that allowed this to happen, correct them. Work with the staff and find out why this could happen, help them correct it. If people are punished for writing the software poorly, they're just going to cover up the flaws that they find instead of bringing them to light to correct them. If staff are punished for making mistakes, they're going to hide them instead of seeing if they can fix them.

Punishment is often just a game to abdicate responsibility. "Oh, it wasn't my fault. It was his fault. The proof that it is his fault is that he got punished for it. I've done my part to solve this problem."

Especially in complex environments like corporations and government, I think that the last thing you should do is look for a person to blame. Instead of looking for the person responsible for implementing the design, or the person who approved it, look at why it was implemented and how it was approved. Instead of pinning it on an individual, pin it on a system.

I think you should only look at an individual if they are committing malfeasance for the purpose of benefiting themselves outside of the system. If the person approved the design because they weren't aware of the potential risk, then find out why. If they approved it because there was supposed to be another safeguard to stop it from accidentally happening, find out why that wasn't there. If they approved it because they gave the contract to a friend who wasn't the best choice, and overlooked issues for a cut, then go ahead and blame them.

If there's a problem with the person, say the designer was just irreconcilably bad, then remove him. If it's a problem with training, then train him. If it was something he did as a greenhorn in the past, and now he's much better, then for God's sake don't punish him for a mistake he made years ago when he was put into a project that was more important than the skills he was hired with, unless he grossly lied about his skills.


The author states they felt it was appropriate when they were fired. In what world would it be appropriate to get fired for a single, simple, incredibly easy to make mistake? Doubly insane when there were exactly zero safeguards in place to prevent the mistake from being made.


According to his next post:

What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)


Well, that clears it up - and it's quite a clever way to tick the arbitrary "someone took the fall" box.


Itoh should have been fired.


Why? How did Itoh demonstrate that the company would be better without him than with him?


He was the one who was more responsible for the incident than the intern.


As acdha above notes, he was. But the author incredulously blurted out that he himself had hit the button when he was told his supervisor had been fired over the event (maybe ten minutes later).

The reality was that his boss took the fall for him, which is awesome and terrible. Much of the discussion in this thread has been a tempest in a teapot due to missing context.

His supervisor took the fall to protect him, he was fired on paper, but it was his last day anyway, and he did actually get to work at the embassy again, as it really was a simple innocent mistake.

Though certainly one with serious, long-lasting consequences.

Edit: http://globis.jp/774-2


OK, probably I'm just dumb and have poor reading comprehension (and will get downvoted again for asking a simple question), but can you explain why Itoh was responsible?

It seems that the translators could have saved their work more regularly -- perhaps they hold some of the blame. Obviously the poster could have thought a bit before hitting the button -- he holds all the blame for the resetting of all the terminals. How is Itoh "more responsible"?


My reasoning is that Mr. Itoh put an intern in charge of a system that could cause major damage. It's like giving the intern keys to your AWS console and shitting your pants when he terminates all your EBS root disks that you didn't back up.

Mr. Beck wasn't culpable because he didn't understand the full effects of his actions or the tension of the current situation. Itoh should not have let Beck in the door that morning and he should not have given that much power to the intern.


Makes sense (and is convenient). The people at the embassy had to have someone to blame, and report back to their superiors that they had dealt with the situation appropriately.


The world where you know that a) you have an incredibly powerful key with no safeguards at your fingertips and b) you might be in a breaking news situation, and nonetheless you go for the key right next to the dangerous one carelessly enough that you miss?

Think about it: Unix is equally as "insane". If you're the guy on the console who meant to clean out some crap dir and accidentally typoed "rm -rf /" and then caused an international crisis you're going to get fired too.

Then years later HN will call for Dennis Ritchie to get fired instead.


Also, the situation where somebody has to be fired.

I imagine that someone wanted someone's head, so whose head should it have been? The guy who wrote the system couldn't be fired; he was in a different company. And maybe a macro had been assigned to that key, so it wasn't his fault anyway.


The person in charge of minimizing risk to their internal systems.

Unfortunately, most small companies have no-one who fills that role, or if they do, it's the same person who both has the power to fire others, and is unwilling to entertain the notion that they themselves are at fault.


Except he didn't know there was a breaking news situation. He said pushing the wrong button wouldn't normally be a big deal.


He said he came in to find the system running, and the only time that happened was when Washington was waiting for info.

And even if you don't buy that, if pushing the button's not a big deal it doesn't need all the safeguards everyone's yelling for. (Had such safeguards been in place he might equally well have seen them, thought "oh, nobody's in, this will do what I want anyway" and approved it).


That button needs a safeguard even if it's not an international incident. Even if only one person would lose a few hours' work from an all-nighter, work for something that's not so important, it's still someone's work.

And given that it's right next to a 'single terminal reset' key, it should be immediately obvious to anyone who's ever used a keyboard - mistakes can and do happen, even when you're fluent.


And yet, to this day, Firefox has both Ctrl-Q (close all Firefox windows without prompting) and its neighbor Ctrl-W (close current tab) and refuses to change that or provide remappable keyboard shortcuts. One of the biggest UI failures I'm aware of in 2014.

https://bugzilla.mozilla.org/show_bug.cgi?id=52821


This in combination with "Restore all tabs on startup" not being the default is a disaster.

The first thing I do on a new Firefox installation is enable "Restore all tabs on startup".


Even without that flag enabled, you can restore all tabs when closing/reopening Firefox. History > Restore Closed tabs.


Thanks, good to know.


This is in all apps in OS X, that's the standard keymapping.


Ctrl-Q exits on Thunderbird too. Since in MS Outlook that combination marks a mail as read, for keyboard-heavy users switching between the two, it is no fun.


One might wonder if that workstation showed the online and offline status of all the computers.


The worst unix disaster I ever saw happened to one of my co-workers. He was working on a client machine, logged in as root because he needed to compile and install some complicated software. As he was working, he did an ls -l /bin and copy-pasted it to a text editor so he could make sure everything was installed correctly. Unfortunately, after returning to his console, he accidentally hit paste. Most of /bin was actually symlinked somewhere else. As you know, ls shows symlinks like this:

lrwxrwxrwx 1 root root 20 Apr 27 17:02 cc -> /etc/alternatives/cc

Guess what happens when you paste a whole list of those into a console as root?


That's fascinating.

To prevent this from happening to me, I've added the following line to my `.Xdefaults`

    URxvt.perl-ext: confirm-paste
(I'm using `rxvt-unicode`)

http://i.imgur.com/joHRXaH.png


The worst unix disaster ever? Could you elaborate for non-unix people such as myself?


The important character here is the '>'. This redirects output to a file and overwrites the file. The lrwxrwxrwx will only print an error, but the redirect to the target executable will erase the target.

For example:

  $ echo "asdf" > foo
  $ cat foo
  asdf
  $ lrwxrwxrwx 1 root root -> foo
  lrwxrwxrwx: command not found
  $ cat foo
  $
So basically, this zero'd out every executable on the system.
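
One hedge against this particular accident, sketched here as an illustration rather than anything the client machine is known to have had: the shell's `noclobber` option makes a plain `>` refuse to overwrite an existing file. The scratch directory and variable names below are made up for the demo.

```shell
# Sketch only: demo_dir, target and paste_status are hypothetical names.
demo_dir=$(mktemp -d)            # scratch directory so nothing real is touched
target="$demo_dir/foo"
echo "asdf" > "$target"

set -o noclobber                 # same as 'set -C'

# Simulate one pasted line of 'ls -l' output. With noclobber on, the
# redirect fails before the (nonexistent) command is even attempted.
paste_status=0
( lrwxrwxrwx 1 root root - > "$target" ) 2>/dev/null || paste_status=$?

set +o noclobber                 # restore the default behaviour

cat "$target"                    # the original contents are still there
```

With `noclobber` set, deliberately overwriting a file takes `>|` instead of `>`. It only covers the redirect; it would do nothing about pasted lines that happen to be valid destructive commands.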


Yep, that's bad.

Out of curiosity, what was the solution to fix all that?


Possibly a cp or an scp from a remote system with a working set of binaries.


scp was in /usr/bin, so we could at least copy enough basics from another system and recover the rest from a backup. Needless to say we lost the client contract.


Oh no, that redirect!


But typing "rm -rf /" is significantly harder to do accidentally than typing F7 instead of F6.


Not really, a lot of novice unix users are in the habit of removing files with the -rf switch. I cringe every time I see it.

The command "rm -rf ~/blue/" is just a single press of the space key away from including "rm -rf /", in the form "rm -rf ~/blue /"
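
To see why that one space matters, here is a small sketch that prints the argument list the way `rm` would receive it. `show_args` is a hypothetical helper, and `/tmp/blue` stands in for `~/blue` so the output does not depend on anyone's home directory.

```shell
# Hypothetical helper: print each argument on its own line, bracketed,
# so the word boundaries chosen by the shell are visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Intended command: the flags plus one path argument.
show_args -rf /tmp/blue/

# One stray space: two path arguments, and the second is the root directory.
show_args -rf /tmp/blue /
```

Nothing is deleted here; the helper only echoes what it was given.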


On any modern system it's actually "sudo rm -rf / --no-preserve-root" and then entering your password while staring at the command.

"rm -rf ~/blue /" will not come close to deleting / unless you are in the habit of running every command as sudo, even ignoring the presence of --no-preserve-root


Much, much, much worse is "rm -rf ~ /blue". I don't give a crap about 99% of the stuff outside of $HOME, but of course, the stuff in $HOME is the stuff that's trivial to destroy.


You're missing the point:

    $ cd dir where there are source files and temp files
    $ rm *.tmp # or so you think
    '.tmp not found'
    # too bad


Except when: (these are terrible lessons to learn)

1. You type it into the wrong system (D'oh)

2. You have run `mount --bind / /somewhere/else` then `rm -rf /somewhere` a week later

:(


It boggles my mind that --one-file-system is not the default :/


> Not really, a lot of novice unix users are in the habit of removing files with the -rf switch. I cringe every time I see it.

Every few days I remind myself of this... then I have to delete another directory with a git repository in it, and end up adding the -f again


I run into this all the time, too. Now my -rf usage is almost always wrapped by this:

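    rmgit()
    {
        # Show the repository state first so uncommitted work is visible.
        git status
        # Single-keypress confirmation (these read flags are bash-specific).
        read -p "Are you sure? " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]
        then
            # Trailing -rf relies on GNU rm's option reordering;
            # BSD/macOS rm needs "rm -rf .git" instead.
            rm .git -rf
        fi
    }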


I think this would be a bit better for interactive cases. Note: written just now, I haven't actually felt the need for this safeguard... yet.

    rmrf()
    {
        (echo "The following files are going to be deleted!!!"
         for FILE in "$@"; do
             echo "<<<" "$FILE" ">>>"
         done) | less
        read -p "Are you sure? " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]
        then
            rm -rf "$@"
        fi
    }


Doesn't help against network filesystems mounted in subdirectories. --one-file-system (which really ought to be the default) prevents this.


What would be the "good" alternative? I often try "rmdir", or "rm -r" if the directory is not empty, and very often there are some "protected files", so I add -f. So I end up launching "rm -rf" directly.


Watch it fail first, verify that the failure makes sense, check to see if there's a way to delete one file with -f before deleting the rest with -r. Use -rf only as a last resort, and only by appending the f to an already-failed command whose syntax you've validated.


More easily, "rm -rf ~" with a premature "Enter"


Put the -rf after the directory to avoid this in future.

Premature enter just results in:

[web@server /]$ rm ~

rm: cannot remove `/home/web': Is a directory


Hehe, I did something like that once, except that I typed "rm -rf ~ /blue/". There was no /blue/, but I managed to wipe out my home directory, and I did not have a backup. :-| It was on my personal machine, so at least I did not delete anybody else's files, but I still got burned hard enough to learn a valuable lesson.


The / key was right next to Enter on a lot of old keyboards. It was quite easy to type 'rm -rf /tmp/garbage*' and have a simple fumble turn it into 'rm -rf /'. I mean, there's this guy I know, he did that once.


There's also the chance of it happening when writing a script

Such as this classic:

https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...


Ten years prior to that, Apple had a similar bug in one of its installer scripts on OS X. I have a hard time finding much about it online now, because it happened at a time when OS X and the Internet were a lot less popular than today, but what I recall is that an unexpectedly customized installation directory (say with spaces or one level closer to "/" than the default) would cause the installer to delete a whole lot of things.


I once installed GGClient in C:/Program Files/ instead of any particular folder

so of course, I said "I'll just uninstall it from there and install it in the correct folder" and it proceeded to delete C:/Program Files/


There's also this one from Pool of Radiance's 2001 release:

http://www.rpgfan.com/news/2001/1416.html

Uninstall wipes out your Windows directory.


I still have nightmares from this buggy mess. To add insult to injury the shop didn't want to take the game back afterwards.


The one that I did only a few months ago was something like

    $ cp -r path/to/some/directory path/to/very/important/directory
    $ (run some commands to verify copy did what I wanted)
    $ rm -r path/to/some/directory path/to/very/important/directory
Of course, all I had meant to do was delete `path/to/some/directory`, but I just pressed 'up' in my history and switched `cp` to `rm`. Of course I hit Ctrl-C in an instant, but my FS was already hosed...


Not really, on my keyboard at least / is directly next to . and you could feasibly be clearing out a directory or something with rm -rf .


It's my habit to never use the -f flag until I get those annoying confirmation messages. I <CTL>-c to cancel that command, then scroll up and add the flag to run again. I think this is a good habit? Anyway, the worst thing I've done along these lines was resetting a dev DB that had seen considerable un-backed-up configuration work. I couldn't blame rm for that.


Eh, "rm -rf $TEMPDIR/$TEMPFILE" in a shell script is just a couple typos away from deleting everything on the network. Yet I've seen people put crap like that in build scripts even after they've previously inadvertently deleted half the network drives.

Fortunately, despite rm's poor choice of options and bash's poor default handling of variable name typos and the obvious PEBKAC, backups saved the day here.

People are human. Policy ought to reflect this.
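
A minimal sketch of one way to make that kind of line fail closed, assuming a POSIX-style shell: the `${VAR:?message}` expansion aborts when the variable is unset or empty, so a typo in a variable name stops the command instead of collapsing the path. `safe_cleanup` is a hypothetical name; `TEMPDIR` and `TEMPFILE` are the variables from the example above.

```shell
# Hypothetical cleanup helper; not taken from any real build script.
safe_cleanup() {
    # The subshell contains the abort: if either variable is unset or
    # empty, ${VAR:?} exits the subshell with an error and rm never runs.
    (
        target="${TEMPDIR:?TEMPDIR is unset or empty}/${TEMPFILE:?TEMPFILE is unset or empty}"
        rm -rf -- "$target"
    )
}
```

Running scripts with `set -u` catches unset variables as well, but not variables that are set to an empty string, which is why the sketch uses the `:?` form.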


Be honest, who doesn't have

rm -rf *

in their shell history? Now it's just one accidental twitch away.


I suspect everyone tried to remove all dotfiles and dotfolders with rm -r .* as well...


Many years ago I wrote a kernel module for my own use in response to a similar incident. It checked to see if the calling process was deleting a file called ".landmine" and killed the calling process if it was.

Far from perfect - it depended on the order of deletion - but a more general solution than preserve root. Of course it still requires the user to mark things they consider "important".
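
For illustration only, a userland approximation of the same idea as a shell wrapper rather than a kernel module. `guarded_rm` is a hypothetical name, and this is not the original module's code; it checks for the marker before deleting anything, so it does not depend on deletion order, but it only protects deletions that go through the wrapper.

```shell
# Hypothetical wrapper: refuse to delete any tree that contains a
# ".landmine" marker file.
guarded_rm() {
    for path in "$@"; do
        # Look for a marker anywhere under the path about to be removed.
        marker=$(find "$path" -name .landmine 2>/dev/null | head -n 1)
        if [ -n "$marker" ]; then
            echo "refusing to delete $path: found $marker" >&2
            return 1
        fi
    done
    # No markers found in any argument, so go ahead.
    rm -rf -- "$@"
}
```

As with the original, the user still has to mark what they consider important.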


Are you trying to say it's OK because Unix behaves analogously?

It is definitely a problem with Unix also.


It is. Which is why everyone in Unix who types "rm -rf " then types their next character _very carefully_ and reads the line before committing.

I'm trying to say that when you've got something dangerous without safeguards, you take care around it. Not taking care of known-dangerous things and causing severe damages as a result is an arguably good case for dismissal.


The proper answer is to type "rm <dir> -rf" otherwise you're risking a stray strike on enter.


The other proper answer is to have a good backup and recovery system.


> The other proper answer is to have a good backup and recovery system.

And, of course, that should've been the solution to OP's incident with Korean Air Lines Flight 007. Backups, surprisingly, are scarcely mentioned at all in this whole thread.


rm -rf / is not quite the same as the f7 key restarting all machines sitting right next to the f6 key to restart a single machine.

you cannot fat finger rm -rf /


  # rm -rf /tmp/bla

  # rm -rf / tmp/bla


no


You can fat-finger enter before you're done typing.


that's a pretty fat fucking finger...


You forgot the part where the only reason he was fucking with the F6 key in the first place was to play a game. That's irresponsible and grounds for firing.


Actually, from the article he was a system administrator and another employee had been playing a game which froze her own terminal. The author did nothing wrong except press the wrong button (and to your point: not report his coworker for playing games on her terminal in her free-time).


> not report his coworker for playing games on her terminal in her free-time).

I was enlisted in the Marines, MOS as a programmer (4063), 1989 - 1993. I never really programmed, but spent my time as a small computer support guy.

Computer games were officially forbidden, but unofficially tolerated, provided one was discreet. I suspect the same 'don't ask don't tell' policy applied to EUCE at the embassy in question.

Sea story. My team was once directed by our boss, the Major, to 'sweep' the command for 'games' and remove them from computers. This took the better part of two weeks, and was massively unpopular with our peers. 'A Marine On Duty Has No Friends', we repeated to ourselves. We even got into the spirit of things and deleted games from _our_ computers.

Near the end of this evolution I hand-carried some paper into my Major's office. He was, yes, playing a computer game.

He did at least have the grace to look embarrassed.


He publicly embarrassed USG, POTUS and a major US ally. SK is going to call up the state dept and demand an explanation. Someone has to be fired. This isn't some startup in California where everyone just plays it cool. The termination of his boss and his boss's boss and his boss's boss's boss all the way up were probably considered as well.


I think you will find they embarrassed themselves.

Responsibility flows upwards, not downwards. It's just unfortunate that the people at the bottom are often carrying the people above far more than they should...


He was probably fired because they realized they couldn't put a summer intern in control of such a critical system. I would make the same call.


If you go with this line of reasoning, whoever put him in that position should also be fired, and their boss should be fired for putting someone in charge who made such a poor decision in the first place.


The person that should be fired is always the person who has responsibility for the amount of budget represented by the loss.

e.g. No intern that needs permission to get a box of pencils from the supply closet should ever be fired for putting a mistake into production that costs a company $100,000. If a company loses $100,000 on a mistake, you look to the person in the hierarchy who manages budgets of that size. It's their job to make sure the safeguards are in place to prevent losses like that.

In government it's difficult but not impossible to put a dollar value on losses like this. In this case, whoever was in charge of that network, and could request budget to build safeguards (whether software or training) against such mishaps, was ultimately responsible. Firing the intern is just shit rolling downhill.


Firing the intern means that the story you just spun to your boss about how this all occurred won't be contradicted by the intern and you might not get fired.


Interesting and very sensible comment - it's one of the few here that adds some real value to the discussion.


Only part of that makes sense. The person putting an intern in such a position of control should be reprimanded, but it wouldn't make much sense for the next level higher because there isn't a blatant mistake. Hiring someone that turns out to make a mistake isn't as blatant as giving an intern the power to shut down a mission critical system.


They might have been, we just don't know.


That may have happened.


I do agree with you.

An equally sufficient solution would have been to install a safety switch on any button with that much importance. Something like this but probably smaller, or just a plastic cover that fit over the F7 key: http://www.thinkgeek.com/product/15a5/

A fireable offense would be lying about the action or trying to cover it up.

Should the lady who asked for the reset be fired for playing a game and asking him to reset the computer?

Should the technician that didn't install some sort of safety be fired for not foreseeing this issue?

He would be much less likely to make the same mistake in the future than the person who would replace him.

If there are terminals that could erase a presidential report and there is no backup available, you send a non-critical staff member to guard every one of those terminals, or at least put a sticky-note in the middle of the monitor.

I'd say several other people deserved to be fired for this, but the intern was not one of them.


Appropriate or not, it's probably what was to be expected in what I imagine even in 1981 was not the most enlightened HR management regime (the US foreign service). Also, 32 years is a lot of time to wash away the bitterness of having been unfairly fired from a summer job, especially if you, as the author, ended up doing pretty well for yourself.

Also, whether or not it was appropriate is completely irrelevant to the story being told.


I expect the HR management back then was more enlightened than it is now - the quick-to-pounce media and politically instigated witch hunts (terrorism, save-the-kids, etc.) have ensured that cover-your-ass is more and more a necessity.


That sounds like rose-colored glasses. The Cold War had its fair share of witch hunts. Granted, this was the '80s, not the '50s, but let's not forget McCarthyism was based around democracy vs communism.

"That Korean announcement and the slow response by the US President—both caused by delayed real information—caused decades of conspiracy theories."


HR was called "Personnel" in those days. It was different.

The reality here is that an intern doesn't have civil service protection, so it is quick and easy to dispose of them. Going after the supervisor may take longer, so if rapid action is needed, they'll fire the first person who serves "at the pleasure of" the executive.


Oh, I don't think people are any worse - I just think there's more adherence to the letter of whatever regulations there are now. The principles of zero-tolerance (three strikes...) have been widely applied, regardless of the nuance of some situation.

Note also that it turns out the OP was "fired" but immediately rehired too...


On the other hand, they're hiring people to do these things, not computer programs. Shouldn't we expect them to notice that when doing routine-thing-x, it's awfully easy to accidentally do catastrophically-dangerous-thing-y, and thus it would be a very good idea to be extremely slow and deliberate when doing routine-thing-x?

I have to routinely create and drop databases on my local system. Our production databases, which I also have to connect to, contain hundreds, maybe thousands, of person-months of work. I realized that it would be a good idea, before issuing DROP DATABASE commands, to deliberately stop and double-check what server I'm connected to. Luckily, I haven't screwed that one up yet.
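A minimal sketch of that stop-and-double-check habit turned into code, in Python. Everything here is an assumption for illustration: the `LOCAL_HOSTS` set, the function names, and the rule that you have to retype the database name. It doesn't talk to any real database; it only decides whether a drop should go ahead.

```python
# Hosts treated as "my local system" (assumption; adjust to taste).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def drop_is_allowed(host: str, database: str, typed_name: str) -> bool:
    """Allow a drop only on a local host, and only if the name was retyped."""
    on_local_host = host.strip().lower() in LOCAL_HOSTS
    name_confirmed = typed_name == database
    return on_local_host and name_confirmed


def build_drop_statement(host: str, database: str, typed_name: str) -> str:
    """Return the DROP statement, or raise if the safety checks fail."""
    # Reject odd names outright, since the name is pasted into SQL text below.
    if not database.isidentifier():
        raise ValueError(f"suspicious database name: {database!r}")
    if not drop_is_allowed(host, database, typed_name):
        raise PermissionError(f"refusing to drop {database!r} on {host!r}")
    return f"DROP DATABASE {database}"
```

The point of retyping the name is the same as the deliberate pause: it forces a second look at what you are connected to before anything irreversible happens.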


I sure am glad that I never accidentally pushed an unlabeled and unprotected "get fired immediately" button. If it is important to not have all the workstations on site shut down at once, go to the control terminal and disconnect the keyboard before the system startup employee comes in without any clue as to what is going on and starts his ordinary daily routine. Maybe write a note and wedge it into his keys?

Based solely on the shortened account, it was not appropriate at all to fire him. Convincing him that it was is just doubly inappropriate. There may be more to the story, but as it is, it looks like angry scapegoating against a hapless, lowest-level employee.


When so many people higher up are given incomplete information or even downright embarrassed on the world stage because of one simple mistake, I feel it is appropriate. It is still, however, doubly insane that one person's stray keystroke can do all of that.


Insane, but somebody had to pay for that screw up and you can bet your ass it wasn't going to be the guy managing the newbie 23 year old. It'll be the newbie 23 year old himself.


attention to detail is a more valuable skill than people realize


IMO clearly he is not telling us the entire story.


In the world of bureaucratic need for scapegoats.


> In what world

Japan, I guess? I've never been there but the story was consistent with my impression of their work culture.


This was the American Embassy, which follows American work culture.

Also, in Japanese companies, it's basically impossible to fire people. They can, however, be assigned to a desk in a windowless room and be given nothing to do for several years, until they take the hint and "voluntarily" quit.


c.f. http://www.nytimes.com/2013/08/17/business/global/layoffs-il...

Suffice it to say that I am aware of situations created by a societal expectation of lifetime employment which make the above article look positively sane. (And I recently learned that, in some cases, what I had assumed was just an ironclad social contract actually is legally enforceable, which blows my mind.)


Not true. See correction to article.


The correction is unclear... Since there are always exceptions, of course it's not the case that "dismissing a permanent employee (正社員) is always illegal". But there's in fact a (somewhat vague but broad) provision in labour law and also precedents that make it very difficult to legally fire a permanent employee in normal circumstances.

Basically you can legally fire a permanent employee in the same sense that a civil servant in most countries can be fired: if the employee does something egregious, like stealing from the company, not showing up for a long period of time with no reason, etc. Certainly not for incompetence, or even if the company has been in the red for several years in a row.

E.g. Japan Airlines went basically bankrupt (technically a restructuring) and even so they had trouble laying off part of the staff.


The Japanese way is good in the sense that, as long as you have Internet, you could make your startup without worrying about putting a roof over your head or finding an office space to work from, and you get a still-full salary to bootstrap it without having to put in the time.

You even get access to a pool of other soon-to-be-available engineers to work with if you're stuck with other poor sods in the room.

Definitely a different scenario from being suddenly kicked out the door by security right before the weekend with a box of your belongings and, if you're lucky, a tiny check so you don't starve until next week.


I highly doubt you are allowed to retain ownership of anything you create on the job though.


It was the AMEMB in Tokyo, though the basic principles apply to any bureaucracy answering to political masters. Interns don't get AFSA (or, at the time, AFGE) union representation, and somebody's head was going to roll for that mistake, even though the company that programmed a non-confirmed global reset into a single keypress was truly at fault. Fair? Nope. Inevitable? Yep.


Lots of fields are like that. You generally get paid better because of the risk. (I hope he was!)


How about when Russia, after years of refusing, returned the data recorders to South Korea - made a press spectacle of it - and then South Korea discovered, once the press was gone, that the recorders were empty and missing the data tapes.

Or the US navy crew who received medals after shooting down the Iranian airline.

Once there is loss of life, it is 100% politics afterwards with little to no practicality, just look at all the mass shootings where there were zero changes afterwards. We simply do not value life, it is politics first.


> Or the US navy crew who received medals after shooting down the Iranian airline.

You make it seem like they received the medal for having shot down the plane. In reality, those who were awarded medals were awarded Tour of Duty medals for their time spent in a combat zone. I believe the distinction is important, particularly since that class of medals is routinely awarded to individuals during their time in the military.


If a police officer shot and killed innocent bystanders, should they get achievement awards for doing their job otherwise?

My answer would be no, you failed at your job regardless.

Same thing with military.


They didn't fail. They were ordered to shoot down a plane, and they shot it down.


That's a very narrow minded position.

Gosh I hope you never screw up even once, cause you'll never live it down.


I assume you are American. Would you be so generous if the majority of the people killed were Americans?


Wtf does being American have to do with anything? And no, it doesn't make a lick of difference. The given situation was a fighter pilot who shot down a commercial airliner under orders later at some point in his career being given a medal for some completely, 100% unrelated reason. Maybe he deserved to be tried for war crimes. Maybe he also deserved that medal. Can you see that this is not cognitive dissonance?


It wasn't actually. Iran Air Flight 655 was shot down by an SM-2MR surface-to-air missile. It's in the first paragraph of the Wikipedia article.

This is an insanely important distinction, since a fighter pilot would have had the opportunity to visually inspect the craft beforehand. In this case, the mistake was due to incorrect system information which misidentified the aircraft as something other than a commercial airliner.


He didn't screw up once, he was criticised for being over aggressive and initiating fights on multiple occasions.


Who are you talking about?


Would you say that making a mistake in software is equal to taking a human life?


Depends on what the software does.


Not sure why the downvotes, but I think for the application software that probably 98% of us here write, it isn't as important as we'd like to believe in comparison to a person's life.

Either that or life is really really cheap.


If you were the developer for a Therac-25, yes, yes I would.


The issue for most news stories is that even when the truth comes out, the great majority of people will never hear the actual facts. One reason is that news stations quickly move on from the story. The bigger issue, in my opinion, is that people won't believe the new, correct facts since the old ones will have been ingrained in their heads. Solving both these issues would be really helpful for society, but they are obviously damn hard to solve, since we haven't really gotten anywhere in this space.


When there is such a chaotic news story, I usually switch from news to Wikipedia. That has all the facts and continues the story even after all the media have lost interest.


It's a sign of poor management that someone has to be fired when something goes wrong. Outages are learning situations for all involved, and it is widely held that responsibility lies not with the person who took the action that caused an outage, but with all involved.

See John Allspaw's Swiss Cheese Theory : http://www.kitchensoap.com/2012/02/10/each-necessary-but-onl... .

[ Edit: I guess it's not Allspaw's model, but he applies it to systems engineering rather well - http://en.wikipedia.org/wiki/Swiss_cheese_model ]

"Accidents emerge from a confluence of conditions and occurrences that are usually associated with the pursuit of success, but in this combination—each necessary but only jointly sufficient—able to trigger failure instead."

The person who pushed the button is not at fault, the manager is not at fault, the guy who designed the button is not at fault - all are jointly responsible.

Blaming the intern does, however, reflect extremely poorly on Itoh and everyone else in the chain of command. A superior who demands retribution for a simple mistake that happened to cause him or her pain is basically worthless.

But, I forget, we're talking about Ronald Reagan.


I remember this time sequence very well because I was living in Taiwan when the incident happened. Yes, people who lived in east Asian time zones saw news reports that appeared to be based on knowledgeable sources that the plane might have landed safely with all passengers alive. This explanation of why the Western-aligned diplomats and military officials based in east Asia didn't have complete information when they were interviewed by the press is quite interesting, and explains puzzling memories I have from that day.


>And let’s hope that there is no stupid 23-year-old with his finger on an important keyboard in this information chain.

No. This is something you would read in Design of Everyday Things where Don Norman would totally shame the engineers who made that system. Software shouldn't be designed with the assumption that no one makes errors.


What I find incredible to believe is that this problem could have happened without the F7 erroneous keystroke by a human. A simple power outage could have resulted in this exact same catastrophe.

Why didn't the backups work? System wasn't "robust" enough. (Did I just use the word "robust"?)


Alan Cooper's About Face is a great book on interaction design for computers and one of his axioms is "Hide the escape lever". Basically make sure the ejector seat control isn't right next to the throttle.


Really brave of the author to share this story. I know most people would be afraid to admit this kind of a public "mistake".


Further reading for those who want to be disabused of the concept of human errors:

https://en.wikipedia.org/wiki/The_Design_of_Everyday_Things


> That Korean announcement and the slow response by the US President — both caused by delayed real information — caused decades of conspiracy theories.

I appreciate that the OP was a part of the situation, but conspiracy theories were not caused by this.

It was a time of very high tension between the US and the Soviet Union. So when a plane veers off course into not just Soviet airspace, but into an explicitly cordoned off top secret area, ignores all communication attempts, ignores the presence of fighter jets and just keeps on flying, then the situation itself is fertile soil for conspiracy theories.


'It was a time of very high tension' doesn't quite capture how different it was.

Through the glass of a yellow newspaper box, the Miami News headline that the Soviets had shot down a plane carrying a Congressman. My first thought was "This is the war." Not 'a' but 'the'. The primary stance of the US military was squared off against the USSR and had been for more than 30 years.


Wow. I got goosebumps when I read that article. I'm old enough to actually remember when KAL 007 was shot down, and while I wasn't old enough to hear about the conspiracy theories, I do remember the thing about people being safe and landing in Russia. To think that this was just a small mistake on the part of someone, which caused an international ripple effect, and who later blogged about it is really something incredible.


"With great power comes great responsibility."

Incidentally, "features" like this are why I don't trust systems that have some centralised control - IMHO giving any one individual (or organisation, in many cases these days) such power over others is not a good thing.


Scapegoat. The ritual expulsion of the evil spirits wrapped neatly in a little parcel to appease the elders and thereby prevent them blaming each other - harmony continues in the hall of power. Meanwhile the problem was in the process, not the employee, so nothing has been fixed, and the guy who had learned the lesson is no longer there, and so the problem will recur with the next lamb to the slaughter.


I disagree that it was appropriate that you were fired, but interesting story all around.


For security reasons I think that it might have been justified, considering the events that had just taken place. Still a crappy way to go out though.


What security reasons? Firing the author didn't change what had already happened.


People were bat shit crazy in the middle of that Cold War. If someone randomly decided to turn off machines without notice, even if they said "whoops accident, my bad", their actions would have instantly been thought of as sabotage.

I'm not agreeing with the outcome.


The fact that he wasn't too concerned about having accidentally reset all the computers in the building suggests that he may not have had an appropriate temperament/attitude for a sysadmin managing critical systems.


Or, you know, he had a perfectly good reason to think that accidentally resetting all the computers in the building at that time would not be a problem:

"Not long after I arrived in my office, I received a call from a secretary in the Agriculture Department who liked to play a computer game before her workday started. Her favorite game had a bug that regularly froze her workstation. [...] I realized that I had mistakenly hit F7 and reset all the workstations in the embassy. This realization didn’t bother me much, because no one except the Agriculture section secretary was usually on the computer system this early in the morning."

I'm sure I'd have thought something like: "Phew! Glad I made that mistake now, rather than at 11am when everyone was half-way through their morning's work. Likely no harm done at all, and I'm going to be really careful with that command in the future. Yup, definitely dodged a bullet there..."


Yes, he had a pretty good reason to think that probably no major damage was done, and this was sufficient to comfort him. This kind of carelessness about the possibility of causing harm or having caused harm suggests to me that he wasn't taking his responsibility as seriously as he should.


you're really reading too much into a simple story, from over 20 years ago, etc.

In any case, we already know where your line of thought leads: risk-averse cultures ultimately stagnate and wither away.


I often suspect that most of the work involved in keeping a power hierarchy going, is involved with trying to pretend that this kind of shit doesn't happen all the time.


And then conspiracy theorists latch onto this kind of shit, but believe that it must be malicious silliness...

Why didn't Reagan respond immediately? Well, he was waiting to hear from Chancellor Gorkon that the KAL flight had been successfully beamed aboard and was en route to Pluto, of course... Clearly they'd have their shit together better so it couldn't have been a 23-year old rebooting all the computers accidentally and wiping out hours of critical work -- that would just be ridiculous...


Conspiracy theorists are just 20th and 21st century prophets, really. They search for meaning in an all too often meaningless world.

It's comforting to think that people can control the direction of every choice in the world, and that someone is at the helm.

It's uncomfortable to think about the daily series of random, unconnected decisions that drive the direction of our species.


I'm not sure that it is more comforting to think that there is someone at the helm, as much as anyone who aspires to be considered to be at the helm has to keep pushing that story, so it gets repeated more often and with better special effects than the story about there not being anyone at the helm.

Actually being in control of stuff is very difficult, but convincing people that you are in control of stuff is pretty easy as we are all suckers for narrative. The main ways to disrupt a power narrative is to spread other narratives or for a situation to occur that upsets the existing narrative, so getting people to make up new ones. This explains why totalitarian governments can collapse so quickly, which wouldn't be possible if the people running them were actually in control of anything.


My first thought after reading the article was that it was ridiculous to fire/scapegoat the author for hitting the wrong key, too. This has happened to me before, where a single keystroke (in my case, a line break in a config file) caused me to take down a production system. My punishment? Designing a more robust system that would protect itself from a badly formatted config file. To this day, ten years later, a similar error has not been repeated, despite several attempts by people to push bad config files to our production systems. If I had been fired instead, no doubt a similar, but perhaps not exact, error would have been repeated every year or so.
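For anyone wondering what "protect itself from a badly formatted config file" can look like, here is a rough Python sketch. The real format isn't described above, so this assumes a plain `key=value` file with `#` comments; the function names are made up for illustration. The idea is simply to validate before accepting, and to keep the running config when validation fails.

```python
def find_config_errors(text: str) -> list[str]:
    """Return one message per malformed line; an empty list means it parsed."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        key, separator, _value = stripped.partition("=")
        # A stray line break in the middle of a value leaves a line with no "=".
        if not separator or not key.strip():
            errors.append(f"line {number}: expected key=value, got {stripped!r}")
    return errors


def accept_or_keep(new_text: str, current_text: str) -> str:
    """Return the new config if it validates, otherwise keep the current one."""
    return current_text if find_config_errors(new_text) else new_text
```

With this shape, a pushed file containing `path=/var/` followed by a stray `log` line is rejected and the running config stays in place, instead of production going down.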

If I had made the same mistake twice without any attempts to fix the situation long term, then, yes, I think that would have been a fire-able offense.

If you're working with people who care primarily about their own positions and egos without regard to the team as a whole, well, be prepared to be thrown under the bus when it comes time for those people to protect themselves.


Thanks for posting this. I found https://news.ycombinator.com/item?id=8062683 yesterday but yours appears to be the direct link to the author's blog, which I had missed.


Great story....thanks for your willingness to share.


> On this day, I highlighted her workstation and hit the F6 key to reset. But my screen went temporarily black and then seemed to be starting again. I realized that I had mistakenly hit F7 and reset all the workstations in the embassy.

Ugh.

Those with automation capabilities: keep this lesson in mind, because it will happen to you in production one day. 'dsh -a reboot' is incredibly easy to type and can have disastrous effects. Creating abstraction layers around common admin tasks can help catch simple mistakes and give prompts before dangerous behavior.
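A rough Python sketch of such an abstraction layer, under stated assumptions: the list of dangerous words, the function names, and the confirmation phrase are all invented here, and the only `dsh` option it knows about is the `-a` from the command quoted above. It doesn't execute anything; it only decides whether a command line may proceed.

```python
import shlex

# Words that make a fleet-wide command worth a second look (assumed list).
DANGEROUS_WORDS = {"reboot", "shutdown", "halt", "poweroff"}
CONFIRM_PHRASE = "all machines"


def is_fleet_wide_danger(command_line: str) -> bool:
    """True for a dsh invocation that targets everything and looks destructive."""
    args = shlex.split(command_line)
    if not args or args[0] != "dsh":
        return False
    targets_all = "-a" in args[1:]
    looks_destructive = any(arg in DANGEROUS_WORDS for arg in args[1:])
    return targets_all and looks_destructive


def may_proceed(command_line: str, ask=input) -> bool:
    """Prompt before dangerous fleet-wide commands; pass everything else through."""
    if not is_fleet_wide_danger(command_line):
        return True
    answer = ask(f"{command_line!r} will hit ALL machines. "
                 f"Type {CONFIRM_PHRASE!r} to continue: ")
    return answer.strip() == CONFIRM_PHRASE
```

Asking for a typed phrase rather than a single `y` is deliberate: a one-key confirmation is just another F7 sitting next to F6.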


I hope you fought that firing...

... incompetence like that comes from having F6 next to F7 and no checks or authorisation needed for a potentially dangerous action etc. Processes should be designed for people to make the common mistakes... it's what they do.


nm. just seen the follow up. :)


"My boss, a >> Japanese << computer engineer named Itoh, poked his head in the door. "

hmmm, I am pretty sure Mr. Itoh was not Japanese working in the American embassy. I am pretty sure he was American.


If he was a first gen American his culture would have been greatly shaped by Japanese culture.


I thought the headline meant to count to ten when a plane goes down...while you are in it!


Me too. I still don't know how counting to 10 will help you press the right key on your keyboard though.


Reset all computers in the embassy with F7? No warning prompt?

Fire the idiot who wrote that function.


In fairness, it was a different world back then. There were so few people administering computer networks that you could generally assume someone who was doing so had been thoroughly trained; and the thing about highly trained people is that they tend to view things like failsafes and safeties as pointless time-wasters.

"I know what I'm doing when I hit F7, but the damn system makes me sit there for 30 seconds before it does what I told it to do! Piece of junk."

The result was that software in that era tended to come with a lot more sharp edges. The age of the Recycle Bin that would save you from yourself didn't arrive until administering systems became something the general public was expected to do.


Ahhh....the days of sharp tools, no failsafes, and young programmers or admins.

I recall that time I wrote a batch manager for the VAX 11/780 at Caltech High Energy Physics. It consisted of a program to monitor the batch queue and start jobs as scheduled ("BATch MANager", or "BATMAN"), and a program for users to submit jobs ("Run Overnight Batch INput", or "ROBIN").

The configuration file for BATMAN was stored in /etc/batman.

During development, I occasionally had to "rm /etc/batman". Of course, out of habit, as soon as I typed "/etc/" my fingers would automatically type "passwd", and once I did not catch this in time. Oops. It happened to be a Sunday morning at around 7AM, and I had to call the other admin, who handled backups, to come in and restore that file. He was annoyed.

The second time I did this, he was pretty pissed.

The third time, I fortunately had been working at the terminal we had in the machine room, and managed to shut down power to the machine before the write buffers were flushed, and the file was OK after fsck. I didn't have to deal with an angry co-administrator that time. Just angry physicists.

The other admin (Norman Wilson, in case anyone knows him or he reads HN) then made a link named /etc/safe_from_tzs to /etc/passwd to stop my nonsense once and for all.

That worked until the first time I wanted to overwrite /etc/batman instead of rm it.

That led to a cron job that maintained a copy of /etc/passwd in a separate file, and periodically checked to see if it were missing or misformatted, and restored it if so.
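For the curious, a modern Python approximation of that kind of watchdog. This is not the original job: the paths are passed in as parameters, the names are made up, and "misformatted" is assumed to mean "some non-blank line doesn't have seven colon-separated fields". Run it periodically from cron or similar.

```python
import shutil
from pathlib import Path

PASSWD_FIELDS = 7  # name:password:uid:gid:gecos:home:shell


def looks_valid(path: Path) -> bool:
    """True if the file exists and every non-blank line has seven fields."""
    if not path.is_file():
        return False
    lines = [line for line in path.read_text().splitlines() if line.strip()]
    return bool(lines) and all(len(line.split(":")) == PASSWD_FIELDS for line in lines)


def check_and_restore(live: Path, backup: Path) -> str:
    """Refresh the backup from a good live file, or restore a bad live file."""
    if looks_valid(live):
        shutil.copy2(live, backup)  # live copy is fine: keep the backup current
        return "backed up"
    if looks_valid(backup):
        shutil.copy2(backup, live)  # live copy is missing or mangled: put it back
        return "restored"
    return "no good copy"
```

Note the ordering: the backup is only refreshed from a live file that passes the check, so a mangled file never overwrites the last good copy.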


One would think after the first two times you'd find a better way to do this, realizing your infrequent but habitual mistake. Why didn't you change any of your practices after the first two screw ups?


Not speaking for tzs, but back in the good ol' days, everybody was pretty busy. A lot of software got written by operators a little bit here and there in between running jobs and moving paper around the building and that sort of thing; paid programmers were frequently dealing with change requests from business departments, all of whom wanted their thing done yesterday; and depending on the size of the organization, there might be a PFY or two, but they generally weren't allowed anywhere near production hardware.

For example, one of my early jobs in IT involved running batch programs that produced reports on a mainframe designed for the punch card era. It had moved on from punch cards, but all of the batch jobs still expected them as input, so they were stored instead as "digital cards" in the job files themselves. The operator -- me -- would be responsible for bringing a job up on the terminal, changing each occurrence of some two-letter code in each card file to some other two-letter code, some date code to some other date code, and so on. Each batch job might be just one step of half a dozen or so required to produce paper printouts from the database. The terminal emulator did not have a find & replace function. Naturally, I screwed up jobs on a regular basis.

This mainframe ran mainly on COBOL74. Over the course of a lot of unpaid overtime, a few hours here and there for several weeks, I gradually wrote a variable interpolator in COBOL that could be called as the first step of a batch file and would replace all occurrences of a variable tag with an input parameter passed to the job. Instead of pulling up a job file and replacing a bunch of two-letter codes, you'd just run the job with the two-letter code as a parameter, and this program would rewrite all of the data cards in the batch file. COBOL has no string operators or a string data type, but I found a way to abuse some system calls to make it work.
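The same idea is a few lines in a language with string handling. A Python sketch, with the caveat that the `&NAME&` tag syntax and the function name are invented here; the original tag format isn't described above.

```python
import re

# Hypothetical tag syntax: an upper-case name between ampersands, e.g. &REGION&.
TAG = re.compile(r"&([A-Z]+)&")


def interpolate_cards(cards: list[str], params: dict[str, str]) -> list[str]:
    """Replace every tag in every card image with the matching job parameter."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            # Fail loudly rather than run a job with a tag left in the data.
            raise KeyError(f"no value supplied for tag &{name}&")
        return params[name]

    return [TAG.sub(substitute, card) for card in cards]
```

So `interpolate_cards(["REPORT &REGION& &DATE&"], {"REGION": "NE", "DATE": "810901"})` gives back `["REPORT NE 810901"]`, and a missing parameter stops the job before any card is rewritten.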

So it took weeks to fix the most common operator error in that shop.

IT staff spend more time on Facebook, Reddit, HN, and online gaming now than we ever had available for fixing processes back in the day.


I'm not absolutely sure on the number of times. It is possible that I only rm'ed it twice, and then Norman made the link to end that. This would have been around 1981, so there has been some memory fade.


are you that angry co-administrator? just curious


hahaha :) I didn't mean the comment to sound angry - nope not the co-admin. I do have a running interest in stuff like continuous improvement, organizational excellence etc. so consider this field research!


> you could generally assume someone who was doing so had been thoroughly trained

No amount of training can prevent something like this. It's like today's browsers where the tab can be closed with ctrl+w and the whole window with ctrl+q. It doesn't matter how many times you've done it and how used you are to the position of the 'w'. One day you will close the whole window by accident.


Personally I agree. Mistakes happen, everyone has accidentally hit the wrong key at one point or another in their life. I was pretty surprised how seemingly fine he was with being fired. At the same time, I guess the net result of the mistake was big enough that it did kind of require a response, and it has been about 30 years since it happened.


IMHO, firing someone who owns up to a keystroke mistake like that is wrong. Good managers fix the problem, weak managers fix blame.

Root cause analysis + countermeasure might have boiled down to "operator error due to shitty interface" + "we will tape a guard over F7 key, since it will never ever get fixed in software"


> the net result of the mistake was big enough that it did kind of require a response

That's a dangerous way of thinking. 9/11 was big enough that it required a response. Not sure if we'll ever reverse the airport security stupidity that was such a response.


This is a result of a way of thinking called "Politician Fallacy": "We need to do something. This is something. Therefore, we need to do this". Of course, 9/11 required a response - however it didn't require just any response, it required appropriate response. TSA is not one, and it starts to be more and more clear to more and more people. OTOH, firing somebody who caused the network to go down at the critical moment may be entirely appropriate - one of your responsibilities is not to make such mistakes, you failed at it, you're fired.


I never understood the rationale of "you made a mistake, so you're fired". By making a mistake, the employee has increased her value in that she will never make that mistake again. If you're going to replace the employee you have to pay to hire someone even better (to recoup costs of talent hunt, training) and someone who somehow won't make a typo. It just seems like a situation that is strictly worse than keeping the current employee.


>>> By making a mistake, the employee has increased her value in that she will never make that mistake again

This is a far-reaching conclusion. That assumes that a) no mistakes can be prevented before they happen for the first time and b) every mistake can be prevented after making it. The truth of either is far from obvious. Moreover, it is routine in our culture that severe mistakes are punished - e.g. if you make a mistake of driving drunk and cause harm, you'd probably be punished, not lauded as a model citizen since you'd never make the mistake again.

Moreover, if no punishment follows the mistake, why the mistake would not be repeated? What would be the motivation to avoid the repetition of the mistake - do you assume the sympathy for the co-workers would be enough? It is not always a sufficient motivator.

>>> It just seems like a situation that is strictly worse than keeping the current employee.

That assumes employees are a fungible commodity, and if you pay the same money you always get the same one. This is not true - you can find an employee who would be more attentive, or one with more experience.


If you believe that you can find an employee that is more attentive or with more experience, and you are not laying off your employees right now to find those better employees, what the hell is going on? Are you just hanging out, basically sitting there knowing you have suboptimal employees, and eagerly waiting for them to fuck up so you have an excuse to axe them? You know your employees are (dun dun dun) capable of mistakes but it's expensive to lay them off so you're watching like a hawk for when you get to upgrade them?

The key difference here to me is mistakes vs negligence. Employee makes a typo -> It's a mistake. Not severe, negligent incompetence. It's a learning experience. The company is worse off by firing that person who has experience

If someone is slacking off? Yeah, fire them, that's not a mistake, that's negligence. You email a colleague in another time zone asking for help and they ignore you because you didn't CC their manager? Yeah, fucking fire that person.

I mean, in fact we have an industry based around the fact that people make mistakes: it's called software testing. Should we be firing developers when they make a mistake (i.e. their code has more than zero bugs)? That would be ridiculous. You're not even punishing them in that case - they're going to use their current salary at your shop to leverage a higher salary at the new place they (effortlessly) land at, whereas you're going to spend tens of thousands of dollars to hire that mythical developer that you should have fired this guy for a year ago?


> By making a mistake, the employee has increased her value in that she will never make that mistake again.

Personal experience tells me this is not always true.


You're right, I shouldn't've used "never". But that mistake is an experience and people learn from mistakes. Now that person is less likely to make the mistake again.

I mean, the goal of the business is to create value / profit and find people who add value to your organization. Not to judge and suss out people who you discover are capable of making a mistake and saying "AHA! I FOUND YOU! You were an imposter all along not worthy of paying! Time to start from scratch again!"


IMO the fact that we have multiple examples of these kinds of accident-prone key pairs does partially exonerate whoever did this particular F6/F7 bit.


I can't count how many times I accidentally hit the Save Game button instead of the Load Game button in Half-Life. The keys were literally, like, right next to each other.


...the whole window with ctrl+q.

OMG I've never done that but now that I know about it I'm very afraid. If I do it tomorrow I'm blaming you.


Thankfully, Chrome has a built-in feature to prevent this from happening (on OSX at least). Just go to Chrome > Warn before Quitting and make sure there's a checkmark next to the option.

Now, if you accidentally press Cmd + Q, it should prompt a "Hold Cmd + Q to Quit" instead of actually quitting.


Or, Settings > On Startup... "Continue where you left off". This will restore your tabs after launching Chrome.


I disabled that warning because it's annoying every time I want to close the browser, even without the dangerous keyboard shortcut. If it happens, you can go to the menu and find "recent tabs" or just ctrl+shift+t.


If you do this in Firefox, you can go to History -> Restore previous session. This will bring up all the tabs you had open last time it quit.


This helps: https://addons.mozilla.org/en-US/firefox/addon/customizable-...

Also, I made a mistake, it doesn't close the window. It closes all windows at once. Be afraid!


ctrl + q very appropriately quits the application. If this came as a ticket to me, I'd close it as working as expected.


Traditionally applications have asked the user if they're sure they want to quit. That's a no-no these days, but it's still a reasonable choice in situations where the cost of quitting might be high (there's unsaved content, or the app takes a long time to start, or it's impossible to persist the current state of the application).

For some time, the Chrome team refused to implement a 'Sure you want to quit?' popup due to a general anti-popups consensus. They also refused to implement a checkbox to enable that behavior due to a general anti-configuration consensus. They've since relented on the latter.


Mozilla Firefox used to have a setting under options/preferences to disable loading images and loading javascript. People complained and said this should be removed as not enough people use it and that people who use it can create/use an add-on to do the same.

It is not easy to find a right balance between providing adequate functionality while avoiding information overload. The web is still evolving. We are learning and we will do better (overall) as time goes by. :)


... and then you'd spin it off as a feature (or design review) request for better handling of accidental quit actions, right?


I like this idea. I will try to remember this because it is the perfect answer in situations like these...


Ctrl-Q typically preserves the session though, so all non-ajax sites will come back as they are, including partially filled forms.


Yeah, if you don't care about privacy and save such data to disk as active session, active browser tabs, history, cookies, etc. Those should be RAM only for privacy reasons and never stored to any medium which can store those for extended periods.


I've done it enough times that I was finally motivated to figure out to set browser.showQuitWarning=true in about:config.


The other reason I enable browser.showQuitWarning in Firefox is because you can Save and Quit instead of just quitting.


No, this isn't true at all.

The power switch on the IBM PCs was way at the back so that people couldn't unintentionally reset the computer. The same thinking went into Ctrl-Alt-Del, which was a combination that people wouldn't accidentally hit.

So having a system where F7 would reboot the entire system was pretty dumb, even in the early 80s.


Probably not. The power switch was at the back because that was where the power supply was.

IBM didn't put much thought into safety. You could blow up the early IBMs if you turned on the monitor (screen) before the CPU box.


I can't find information why the design(s) were as they were. The design of Ctrl-Alt-Del was intentionally unintentional.

Gates noted Ctrl-Alt-Del should have been one button, not three [1]. David Bradley, the inventor of the trifecta, did make it deliberately difficult to reboot, however, it was also originally an Easter Egg which made it to production [2].

[1] http://www.theverge.com/2013/9/26/4772680/bill-gates-admits-... [2] https://en.wikipedia.org/wiki/Control-Alt-Delete#History


Ctrl-Alt-Del was an excellent choice for its intended purpose. It can be easily typed at the keyboard (no need to fiddle with the back of the computer) but it's very unlikely to be something you hit accidentally.

I'm not sure a single key would have been a good idea. "The reboot key" just sounds like a mistake waiting to happen. I've seen enough stories on the 'net of laptops with power buttons in terrible places on the keyboard to get a glimpse.

The mistake was using it for the Windows NT screen lock/unlock. Changing the "reboot your computer" sequence into the "start using my computer" sequence is a rather nonsensical (ignoring implementation) choice.


> "The reboot key" just sounds like a mistake waiting to happen.

It already did happen: I've used keyboards with shutdown and reboot keys, and yes, it's a terrible idea.

My alma mater used to run programming competitions in a lab where the workstations had the reset button exactly at knee height. This wasn't a problem normally, when you're sitting under the desk next to the computer; but when you have three or four people gathered around the screen, kicking the reset button was a definite possibility. Eventually taping the cover of a calculator over this button became part of our regular routine.


I actually really like the added knowledge that my login screen and password input box is a system one...


Meanwhile, the Apple II+ had the reset button right above the return key.


I was about to argue, then I realized my first computer was the Apple IIe. Reset button offset a bit to the right. http://images.cdn.fotopedia.com/flickr-144862832-hd.jpg


And on the IIe or later you had to press Ctrl+Open Apple+Reset. (Similar to the IBM PC.) On the earlier models, if you mistyped, BEEP, the system rebooted.


Another alternative is confirmation fatigue.

As in: I hit F6, "Do you want to reboot this?" dialog pops, I hit 'Y', "Do you really really want to reboot this?", I hit 'Y' again.

Instead of actually reading what it says, you just instead press F6-Y-Y in quick succession.

Modern interfaces sometimes make you type some kind of string to confirm, but most either use a password (like sudo) or some hardcoded string that everyone eventually memorizes.

But even today, Windows 7 only makes you click that one button in UAC, and most people probably do it without even thinking about it.


Whenever I implement a bulk delete feature I tell the user how many records they're about to delete and ask them to type it back in.

If it's possible they're trying to delete data from the wrong place (say, an administrative account that manages many customers) another safeguard is to have them select the name of the context (customer name, etc.) out of a list of four or five nonsense alternatives.

The user experience tends to involve a lot of double takes and rereading, which is precisely what I want.
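
A minimal Python sketch of the type-the-count-back check described above. The function names and the prompt wording are made up for illustration; this is one way it could look, not the actual implementation.

```python
def confirm_bulk_delete(record_count: int, typed: str) -> bool:
    """Return True only if the user typed back the exact number of records."""
    try:
        return int(typed.strip()) == record_count
    except ValueError:
        # Anything that is not a whole number ("y", "yes", "") is a refusal.
        return False


def bulk_delete(records: list, ask=input) -> int:
    """Delete all records only after the count is typed back.

    `ask` is a hypothetical hook (defaults to input) so the prompt can be
    swapped out. Returns the number of records deleted.
    """
    count = len(records)
    typed = ask(f"You are about to delete {count} records. Type the number to confirm: ")
    if not confirm_bulk_delete(count, typed):
        return 0
    records.clear()
    return count
```

Because the expected answer changes with every deletion, a reflexive "Y" never matches and the user has to read the number first.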


In another comment, derefr points out that GitHub similarly makes you type the name of the repo when you delete it.


I have idly considered addressing that problem when it really matters by asking multiple random questions whose answers need to be some combination of "Y" and "N" to proceed. With the result that you simply cannot engage in muscle memory.

Anyone using the app would hate me.
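
For what it's worth, a rough Python sketch of that idea. The question bank and function names are invented for illustration, and a real prompt set would need more thought.

```python
import random

# Hypothetical question bank: each prompt is paired with the answer
# that means "go ahead". Half expect "Y", half expect "N".
PROMPTS = [
    ("Proceed with the reboot?", "Y"),
    ("Cancel the reboot?", "N"),
    ("Is it OK to interrupt every logged-in user?", "Y"),
    ("Should the workstations be left running?", "N"),
]


def make_challenge(rng: random.Random, n: int = 3) -> list:
    """Pick n distinct prompts in random order so the required keys vary."""
    return rng.sample(PROMPTS, n)


def passes(challenge: list, answers: list) -> bool:
    """True only if every answer matches the expected one for its prompt."""
    if len(answers) != len(challenge):
        return False
    return all(
        answer.strip().upper() == expected
        for (_, expected), answer in zip(challenge, answers)
    )
```

With three prompts drawn from a bank that has only two "Y" answers, blindly hitting Y-Y-Y can never pass, and the right sequence changes from one attempt to the next.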


When you delete a github repo, you have to retype the fully-qualified name of the repo you want to delete. I think that's exactly the right level of annoyance: it makes sure that if you're mistaken about where you are, you realize it, and that if you're making a typo, you'll have to make it the same way twice.


Not as much as we hate the person who made the decision to prevent phones/computers from turning on immediately when the battery is empty, even if they're plugged in :P

If I ever meet that person IRL... I might even go so far as to make a tasteless joke about committing physical violence in retaliation for the hassle they've caused me.


There are good reasons for this

- When the battery has just started charging, the voltage will not be high enough for the phone to actually work, because the draw from the battery exceeds the plug pack input

- Sometimes when transmitting, the phone uses more power for a fraction of a second than the power pack can deliver. This surge of energy could come from the battery, but the battery is empty so it won't work correctly

- Having some amount of battery means the phone can soft-off correctly when the plug is removed suddenly. The alternative is an un-expected hard off which is usually bad. The user might experience data loss.

There are a bunch of grey areas around low voltage, such as flash writes failing marginally or radio not working correctly or partial saves. Much easier for the engineers and perhaps more reliable for the users to make them wait just a little.


Sounds logical, but then why on earth do many phones turn the screen on right after plugging in?


> There are good reasons for this

They all seem pretty bad to me.

> When the battery has just started charging, the voltage will not be high enough for the phone to actually work, because the draw from the battery exceeds the plug pack input

What type of battery has an innate "draw"? They need a certain voltage and have a certain internal resistance, but it's easy to efficiently increase the effective internal resistance by boosting voltage with switched capacitor circuits (or whatever). If there's a "smart battery manager" you can bet the hardware to do this is already there.

"Draw" would be an excuse if you were hooking things up manually to a car battery. It's not an excuse in the highly integrated environment of a cellphone where corrective circuitry is dirt cheap (and free relative to what's already probably there).

> Sometimes when transmitting, the phone uses more power for a fraction of a second than the power pack can deliver.

That's what capacitors are for. They're almost certainly more efficient, too. Efficiency slumps away from the optimal I,V much faster for batteries than for capacitors.

> Having some amount of battery means the phone can soft-off correctly when the plug is removed suddenly. The alternative is an un-expected hard off which is usually bad. The user might experience data loss.

I'm pretty sure this is the actual reason why it's done. It's an awful reason.

First of all, you claim that "an un-expected hard off is usually bad". WTF? Does your ext4 linux partition usually die when you hard-off it? I've probably hard-offed ext4+linux 1000 times, never had any problems. I would go to great pains to avoid hard-offing a production server but you must acknowledge that in the age of solid journaled filesystems, hard-offs almost never lead to actual bad consequences, especially for light usage patterns. I'm sure it's worse on some hardware configurations but I've never met a system where it got all the way to "usually bad" territory.

On the other hand, having a power manager lock me out of my phone for 15 minutes after I determine I need to use it has led to loss of data. Significant loss of data. And worse. Pictures that were never taken, phone calls that were delayed at significant inconvenience, the inability to look up contacts for others... these are real world negative consequences that are 1000x more important than a .1% chance of filesystem corruption times, say, a 20% chance of actual power failure. It seems hopelessly myopic to suggest that the cost/benefit trivially favors the prevention of uncommon filesystem errors over addressing the immediate and possibly time-sensitive needs of the user.

I think that whichever organizations choose to implement the lockout feature are doing a massive disservice to their customers, foisting significant hassle upon them in order to save a few pennies/customer of repair costs, if that. Your arguments haven't convinced me otherwise.


I believe UAC works well even when a human always hits OK. The point is to make sure a human is there and trying to do something and not some malicious code.


This is very true, which begs the question - why was a 23-year-old summer intern placed in charge of the embassy computer system, or even given access to its central console? He may have self-taught facility with computers, but I find it hard to believe he'd have much experience at that age with the sort of large, mission-critical institutional computer system described in the article.

I wonder if he had a supervisor sysadmin that he was working under, but given how he described his boss, that seems unlikely as well.


You might be surprised. What I observed during that time was that a lot of, if not most, senior people were clueless about what computers were actually doing. The potential "reach" of that intern might just not have occurred to people.

And we still have national security / diplomacy disasters resulting from relatively low level people having access to vital computer systems beyond anybody's imagination.


You're invoking an annoying and ridiculously overused false dichotomy that is as false today as it was 30 years ago. An interface that has fail-safes does not have to be annoying and clunky. In fact, interfaces that are annoying and clunky are a great contributor to human mistakes, because they require a lot of rote action, which encourages people not to pay attention and work on auto pilot.

If something has changed over these years, it's the overall understanding of design principles and their popularization. (Thanks, Don Norman and other people in the field!)

Training has nothing to do with it. Even if you can train a person to work with a badly designed system without making mistakes (often), designing the system well in the first place is almost always significantly easier and cheaper.

For example, accidental key presses can be easily prevented by requiring the user to type a command of reasonable length. Typing "reboot-all-workstations" is not that difficult, but it would definitely prevent the incident described in the article.
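
A small Python sketch of what that could look like. The key names, command names and the dispatch function are assumptions for illustration, not how the embassy system worked.

```python
# Hypothetical mapping: single keys are only ever bound to low-impact actions.
SAFE_KEYS = {"F6": "reset-this-workstation"}

# Destructive actions have no key binding; they must be typed out in full.
DESTRUCTIVE_COMMANDS = {"reboot-all-workstations"}


def dispatch(user_input: str) -> str:
    """Map operator input to an action name, or 'ignored' if unrecognized."""
    text = user_input.strip()
    if text in SAFE_KEYS:
        return SAFE_KEYS[text]
    if text in DESTRUCTIVE_COMMANDS:
        return text
    # A stray neighbouring key (e.g. F7) or a partial command does nothing.
    return "ignored"
```

A slip from F6 to F7 now does nothing at all, while the building-wide reboot is still only one deliberate line of typing away.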


I think he was pointing out why it was built that way, not why it was that way. Today we know it is a false dichotomy, he was pointing out, back then, the world thought it wasn't.


You know, I think that was the point of the comment. I might just be reading it wrong, but it seemed to me that he was decrying such interfaces, using obviously hyperbolic language like "pointless time-wasters" (in the context of something other people believed) and "the damn system makes me sit there for 30 seconds". It sounded to me like he was even slightly making fun of that worldview.


You are correct, making fun of it was my intention. I suppose next time I will have to hang a "WARNING: SARCASM" sign on my comment, to make sure everybody gets it :-D


You're not taking into account the fact that what is considered a bad and clunky interface today may have been the best in class 30 years ago.


Even allowing for the era, providing the opportunity for a possibly disastrous effect to take place due to someone dropping the keyboard or knocking it with their elbow is inexcusable. Accidents happen - they should be minimized and steps should be taken towards quick recovery but they should be expected.


It wasn't until Windows 95 (or was it 3.1?) that usability and standardization were taken very seriously in the PC world. I recall vaguely using an 80's-era DOS-based word processor at my father's office, and pressing "F1" for help, and wiping out my work.


If you visit countryside museums with old agricultural or workshop equipment, you can verify they are basically all maiming devices. Things used even in the 1950s. In one machine you push a piece of wood downwards and there's a blade that hacks slices off the bottom. Real tough men pushed right till the end so there was no waste. They often had some missing fingertips. Some really small changes made those accidents avoidable, like using some other pieces to hold the worked piece.


Holy moly. That's almost like the one Far Side cartoon with a "Wings Stay On" "Wings Fall Off" switch. http://imgur.com/AosYvGn


There was probably something like a warning prompt that came up, but he may have been used to disregarding the message because it was always what he intended to do.


Two servers side by side. Broken KVM that only switched the monitor. The screen was Windows NT, but the keyboard in front of the monitor was connected to another server running OS/2. Ctrl-Alt-Del. The Windows screen did nothing, but the sudden hard drive activity on the OS/2 server told me what I needed to know. Not thirty seconds later, I had visitors. I was around 21 at the time, so yeah... Experience.


You know what's totally plausible? That hitting F6 and F7 prompted for confirmation, but using the same prompt - and he just hit "Y", "Return" as he'd done 1000 times before (just like everybody does with the UAC dialog in Windows 7 today), and that bit didn't make it into the story because it's totally irrelevant to the point he's making. Heck, it probably wasn't even F6 or F7. If he's a normal human, he likely can't remember.


It is possible there was a warning for both functions and the operator just acknowledged the warning assuming it was for the single reset without reading it. This is a common problem with warning too often in interfaces.


It's still pretty obviously a horrible idea to have the two keys right next to each other with confirmations that are even remotely similar.


You must have missed the main point of this essay, so I'll reiterate here.

"take a deep breath, count to ten"


Stole the words right out of my keyboard. Who puts the 'reset every workstation in the building' button right next to the 'reset just one workstation' button with no confirmation prompt?


He might have been that time's IT Administrator? We've come a long way over these past 30 years, but think about it for a second, if you're an IT Administrator today handling all the office machines, you probably have the power to click 1 button to turn them all off? (Depending on the setup of course..) There's obviously more access rights involved today.

Doesn't answer "who" put the button in there, I'm just thinking out-loud! :)


> if you're an IT Administrator today handling all the office machines, you probably have the power to click 1 button to turn them all off?

To err is human. To push the error to thousands of instances online at once - that's devops.


The issue wasn't whether he was allowed to press it; it was just way too easy to press accidentally.

Even if I have the authority to press the big red button, I'd like it to be behind glass and far from the light switch.


It could very well have been a minicomputer system where he was on the operator console and the other "computers" were terminals.


    crontab -e
    crontab -r


crontab -ri


I don't remember how it is on the newer versions, but try using fdisk in an old linux distro and see how many confirmations you get when deleting a partition.

Or just the old rm -rf /

Older stuff asks for far fewer confirmations.


No thanks. It makes me shudder to just see this printed in the comments section!


I was leaving my old job and handing in my MacBook Pro. After getting permission from the network guys (who were going to format it anyway) I ran rm -rf / as root.

It was surprisingly boring, taking a very long time to delete all my files (I should have deleted them first). Eventually it got around to deleting fonts, which caused things to render a little strange, but after 60 minutes nothing much had changed and it was still chugging, so we shut it down and went to the bar for my last "Friday night drinks".


I did the same thing but had remotely SSH mounted some of our production servers.

Won't make that mistake again.


I managed to run (sudo) mv /* /tmp one day.

I was trying to move everything out of a folder to tmp (mv ./* /tmp), which was fine. Then I cd to /, and, wanting to rerun the command I'd run right before the mv, pressed up twice and enter quickly...

Well, it failed once it finished moving /bin/mv to tmp. Of course cp, and a lot of other helpful commands come before mv alphabetically.

It wasn't too bad, just needed a boot from a live disc to move everything back, but I still get nervous whenever / and * are in the same command.


I did it to one of my linux boxes once, but caught it before it got everything.

It made for a very trippy and insightful experience. Commands that had already been run (like cp, ls, mv, cd, etc.) worked fine since they were in memory but other ones wouldn't run (/bin was one of the first to go).

I tried for about an hour to bootstrap the system back to normal by copying files off CD/etc but in the end the damage was too bad for my (very amateur) experience and I ended up reinstalling.


Some fun exploration of a `rm -rf`'d system: http://lambdaops.com/rm-rf-remains

discussed here last month https://news.ycombinator.com/item?id=7892247


It's still the same on some Linux distros at least. I wiped a Slackware install last year carelessly attempting to fdisk an external drive. I caught it a few moments in, but it took a bit of doing to get the data back and I had to re-install.


Function first, failsafes second. Or fifth. Or twentieth.


`parted` always makes me nervous. It's probably not so bad but the changes are instant.


That was one of my reactions too (yeah it was a different age). My other reaction was about the total loss due to a restart - in what could have easily happened by faults in many other systems - presumably there was no way to save data as work went on, no journaling or anything. Indeed this was a different age.


It's easy to have this attitude now. But if you ask people who were in IT 30 years ago, they'll tell you that systems in those days had _many_ sharp edges. You were expected to know your way around, and the consequences of mistakes were pretty severe.


I was in IT 30 years ago, and this was never the attitude. What's different today:

1. we know a heluva lot more about human factors design

2. we have a heluva lot more excess computer power that can be devoted to human factors


I'd add: 3. many more people are using computers.


Some people are saying that a warning would have helped.

Remember that warning functions are easy to ignore. Never use a warning when you mean undo.

http://alistapart.com/article/neveruseawarning


That won't help here. You cannot "undo" a workstation reset, or any other action that results in a state propagation.


Sure you can.

Pressing the button would start a timer. While the timer is running, the user would have the opportunity to review their selection (maybe even with a simulation of what effect the selection has) and could undo the request if necessary. Only after the timer expires would the action actually be taken.

This is how the "undo send" feature of Gmail works.

http://mashable.com/2010/08/22/how-to-undo-send-in-gmail/
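
A minimal Python sketch of the timer approach, using the standard library's threading.Timer. The DelayedAction class and its method names are made up for illustration; this is not how Gmail implements it.

```python
import threading


class DelayedAction:
    """Run `action` after `delay` seconds unless undo() is called first."""

    def __init__(self, action, delay: float):
        # threading.Timer runs `action` on its own thread once `delay` elapses.
        self._timer = threading.Timer(delay, action)

    def start(self) -> None:
        """Begin the countdown; the action has not happened yet."""
        self._timer.start()

    def undo(self) -> None:
        """Cancel the pending action; has no effect once it has already run."""
        self._timer.cancel()

    def wait(self) -> None:
        """Block until the timer thread finishes (fired or cancelled)."""
        self._timer.join()
```

Usage would be something like `pending = DelayedAction(reboot_all, 30.0); pending.start()`, with `pending.undo()` wired to a cancel key, where `reboot_all` is whatever hypothetical function does the real work. Nothing irreversible happens until the window closes.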


I don't know whether armchair usability trivia helps here. Not every system deals with ephemeral web drivel; some systems interact with the real world and have impact.


Are you saying that usability does not need to be considered in the design of critical systems?

Human factors grew out of the need to build safe and error-resistant weaponry in World War II. Poor attention to human factors and user interface design was a factor in the Three Mile Island disaster.

http://en.wikipedia.org/wiki/Human_factors#In_aviation

http://en.wikipedia.org/wiki/Three_Mile_Island_accident#Huma...

With respect, to call this armchair usability and to imply that some users are just stupid is to completely misunderstand what usability is.


My thoughts exactly. With that sort of "global" function, you would expect some sort of countdown timer prompt on each terminal that could be cancelled.


Think about the resource constraints of systems back then.


agreed 100% (if there truly wasn't a prompt). OP took the fall because the head administrator was too ashamed to admit that their system was so poorly designed as to allow this to happen.



