Secondly, the punishment meted out should be:
1. Proportional to the degree of carelessness (in this case, not much: he accidentally hit a key adjacent to the right one; he didn't mow anybody down while driving drunk)
2. Inversely proportional to the likelihood of the error (in this case the likelihood was very high, since the reset key was (a) uncovered and single-press, and (b) right next to the single-machine reset key).
3. Proportional to intention (this was a completely unintentional error)
If you say that the punishment should also depend on the degree of damage, I would say that the responsibility for managing the risk of such damage wasn't his, but that of the person responsible for implementing such a high-risk design. If such a person is not around, find the person who approved the design. Government departments are usually very good with paper trails.
So what did you do after you got fired from the embassy?
What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)
I understand that we don't want a culture that fires people for making the sort of mistake anyone might make. But to be so careless on a day that is clearly an exception where something more important than standard business procedure is going on, can't you at least see why firing the intern for such a lack of mindfulness might at least make sense, even if you disagree with it?
I interned for a government organization that maintains hydroelectric dams and the software that controls them throughout the Southeastern US. A careless mistake could -- in the worst case -- cause blackouts, cost the company millions of dollars, or even cost lives (if the data-control feedback loop caused a turbine to spin up at the wrong time or to fail to shut off in an emergency). And, as is quite common in organizations with non-software-engineers running the show, the development processes were entirely haphazard. The environment was such that it would be really easy for me to push unreviewed code, or to make a stupid deployment mistake, or to be careless in a number of ways that the system didn't protect me against.
But it was OK, because they hired smart, competent people who understand the need to triple-check, if necessary, before committing. People who understood the gravity of the situation, and who didn't phone it in if they weren't feeling it that day. If I demonstrated that I wasn't one of those people, I would fully expect to be fired.
This equivocates on "consequences" of actions, though. It's obvious that the consequences of hitting F7 before the incident were understood by all responsible to be low enough that any intern could be expected to make the right decision. After the incident, the consequences of hitting F7 were sharply increased such that no future intern would ever be allowed to make that decision. But then you can't make an argument that assumes "consequences" were the same at both points in time.
We make this fallacy all the time probably because we're designed by evolution to reassess the morality of an action based on consequences. It works as a social heuristic for shaming or rewarding people but it makes no rational sense that the morality of an action should retroactively change based on future consequences. You can see similar behavior in our rewarding athletes for profound genetic advantages, or punishing criminals for profound genetic deficits. The consequences somehow redeem or condemn, and they should do neither.
The negligence on the intern's part was to make decisions and act without regard for risk as if he was in the low-risk window despite the evidence he was actually in the high-risk one (all the already-active PCs).
It makes perfect sense that the punishment should reflect inappropriate regard being given for known consequences. That's what negligence is.
I'm talking about the perceived consequences, not the actual consequences. The fallacy here is to perceive low consequences at one point in time, perceive high consequences at a later time and then try to change history such that low consequences were never really perceived.
> The negligence on the intern's part was to make decisions and act without regard for risk as if he was in the low-risk window despite the evidence he was actually in the high-risk one (all the already-active PCs).
He was in a perceived low-risk window. The perceived consequence of accidental reboot was already figured in and was already perceived to be low. Else why would the F7 key be next to F6? It is certainly unfair to expect someone to perceive high-risk when everyone else perceives low-risk.
> It makes perfect sense that the punishment should reflect inappropriate regard being given for known consequences. That's what negligence is.
The perceived consequences were low-risk, therefore the known consequences were low-risk.
... because he was negligent. "Oh, all the computers are already on? That only happens when Washington's waiting on something. Oh well, I'll carry on like this was any other low-risk morning"
> Else why would the F7 key be next to F6?
Same reason why "rm -rf " is one keystroke away from disaster. Perceived risk has nothing to do with it.
Because those situations were also low-risk mornings. He only saw that pattern when people left late. He had no reason to expect that people would be working early in the morning because that situation had never occurred. Further, a secretary playing a computer game in the morning suggests business as usual, no one working.
> He was in a perceived low-risk window. ... because he was negligent
No, someone else set up the computers and software with F6 and F7 command functions side by side and then evaluated the entire network as low-risk for interns under all situations. It is perfectly reasonable for an intern to take the same low-risk perspective as his superiors.
> Same reason why "rm -rf " is one keystroke away from disaster. Perceived risk has nothing to do with it.
Perceived risk has everything to do with it. It is inconceivable today that an intern would have unrestricted access to a company's file system and be literally a few keystrokes from disaster. The key reason for that is because perceived risk now is much closer to actual risk. In 1983, no one had a clue about the kinds of things that could go wrong. Understanding real risk is a painstaking process requiring time, trial and error.
In my experience that is not nearly sufficient for implementing any process that can't tolerate errors. It is necessary to have conscientious people of course, but they still are humans. Given the opportunity for 2,000 hours a year, year after year, they will screw up.
Humans are very bad at following procedures. For recent examples, consider the people operating our nuclear missiles and those protecting our bomb-grade nuclear materials. If even they don't have enough motivation to follow procedure ...
In this situation, the secretary playing the game is just as culpable as the intern, which is to say, not really responsible.
In one case accidentally pressing the wrong key deleted incredibly important data, while in your case you have plenty of time to review and ensure quality at your leisure.
Count to ten article:
> I, naturally, felt terrible and was, appropriately, fired.
Honesty Wins article:
> But, naturally, that day was my last day of work at the American Embassy. But, not because I was fired; although, I might have been fired if that day didn’t just happen to be the last scheduled day of my summer internship.
“Oh, don't worry boss, we sacked the fool who made that mistake!”
It is simply that, given the simplicity and the consequences of the error, it is the type of thing I generally see people beating themselves up over to the point that they can never forget it.
(In case you are saying to yourself "but he did remember the key": look at the two versions. In one he says F6 reboots the machine and F7 reboots everything; in the other he says F7 reboots the machine and F8 reboots everything. So while I hope he knew the keys' functions at the time, he has since forgotten the exact keys, or is substituting F-keys for storytelling purposes.)
Your number 2 is particularly wrong. For punishment to work in affecting behavior, you can't punish for a very unlikely event, especially accidental.
Punishment changes behavior by making people anxious and afraid of the punishment. If you punish something that's very unlikely, it does no good. It's like if pushing ctrl-F restarted the stations, and the guy has never pushed ctrl-F before, and never been punished for it. It's very unlikely that he would ever push ctrl-F. But he happens to trip getting up and accidentally hit ctrl-F while he's catching his balance. Does that warrant a heavier punishment? It's more unlikely, certainly, but what would the punishment change about his behavior?
Punishment works because you are afraid of it. It works because you want to avoid it. But accidents don't happen because of defiance or a rational decision making process.
If you were punished moderately and frequently for a common mistake like hitting F7, it could correct behavior, because you would be more vigilant when hitting F6. In terms of correcting behavior, making the punishment proportional to the degree of carelessness is not important: someone who is more careless will simply be punished more often. And if the punishment is too strong, it will just make people fearful instead of correcting the behavior.
Firing someone who consistently makes mistakes is a corrective action, not punitive.
Punishment is generally more of a cultural thing and less a means of correcting an issue. Punishment is expected, so it's delivered. In Western culture we have a particular need to find someone responsible and punish them. Rarely, though, do you think "I don't want to get punished, so I am going to do this right," but it's not uncommon to think "I don't want to get punished, so I'll avoid this altogether."
Corrective behavior is better when it's not punitive. Look at the design of the software, correct that problem. Look at the systems that allowed this to happen, correct them. Work with the staff and find out why this could happen, help them correct it. If people are punished for writing the software poorly, they're just going to cover up the flaws that they find instead of bringing them to light to correct them. If staff are punished for making mistakes, they're going to hide them instead of seeing if they can fix them.
Punishment is often just a game to abdicate responsibility. "Oh, it wasn't my fault. It was his fault. The proof that it is his fault is that he got punished for it. I've done my part to solve this problem."
Especially in complex environments like corporations and government, I think the last thing you should do is look for a person to blame. Instead of looking for the person responsible for implementing the design, or the person who approved it, look at why it was implemented and how it was approved. Instead of pinning it on an individual, pin it on a system.
I think you should only look at an individual if they are committing malfeasance for their own benefit outside the system. If the person approved the design because they weren't aware of the potential risk, then find out why. If they approved it because there was supposed to be another safeguard to stop it from happening accidentally, find out why that wasn't there. If they approved it because they gave the contract to a friend who wasn't the best choice, and overlooked issues for a cut, then go ahead and blame them.
If there's a problem with the person (say the designer was just irredeemably bad), then remove him. If it's a problem with training, then train him. If it was something he did as a greenhorn in the past, and now he's much better, then for God's sake don't punish him for a mistake he made years ago when he was put on a project that demanded more than the skills he was hired with, unless he grossly lied about those skills.
> What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)
The reality was that his boss took the fall for him, which is awesome and terrible. Much of the discussion in this thread has been a tempest in a teapot due to missing context.
His supervisor took the fall to protect him, he was fired on paper, but it was his last day anyway, and he did actually get to work at the embassy again, as it really was a simple innocent mistake.
Though certainly one with serious, long-lasting consequences.
It seems that the translators could have saved their work more regularly -- perhaps they hold some of the blame. Obviously the poster could have thought a bit before hitting the button -- he holds all the blame for the resetting of all the terminals. How is Itoh "more responsible"?
Mr. Beck wasn't culpable because he didn't understand the full effects of his actions or the tension of the current situation. Itoh should not have let Beck in the door that morning, and he should not have given that much power to an intern.
Think about it: Unix is equally as "insane". If you're the guy on the console who meant to clean out some crap dir and accidentally typoed "rm -rf /" and then caused an international crisis you're going to get fired too.
Then years later HN will call for Dennis Ritchie to get fired instead.
I imagine that someone wanted someone's head, so whose head should it have been? The guy who wrote the system couldn't be fired; he was at a different company. And maybe a macro had been assigned to that key, so it wasn't his fault anyway.
Unfortunately, most small companies have no-one who fills that role, or if they do, it's the same person who both has the power to fire others, and is unwilling to entertain the notion that they themselves are at fault.
And even if you don't buy that, if pushing the button's not a big deal it doesn't need all the safeguards everyone's yelling for. (Had such safeguards been in place he might equally well have seen them, thought "oh, nobody's in, this will do what I want anyway" and approved it).
And given that it's right next to a 'single terminal reset' key, it should be immediately obvious to anyone who's ever used a keyboard - mistakes can and do happen, even when you're fluent.
The first thing I do on a new Firefox installation is enable "Restore all tabs on startup".
lrwxrwxrwx 1 root root 20 Apr 27 17:02 cc -> /etc/alternatives/cc
Guess what happens when you paste a whole list of those into a console as root?
To prevent this from happening to me, I've added the following line to my `.Xdefaults`
$ echo "asdf" > foo
$ cat foo
asdf
$ lrwxrwxrwx 1 root root -> foo
lrwxrwxrwx: command not found
$ cat foo
$
Out of curiosity, what was the solution to fix all that?
The command "rm -rf ~/blue/" is just a single space away from being equivalent to "rm -rf /": "rm -rf ~/blue /"
"rm -rf ~/blue /" will not come close to deleting / unless you are in the habit of running every command as sudo, even ignoring the presence of --no-preserve-root
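The difference a single space makes is easy to see without deleting anything. This sketch (the `show_args` helper is my stand-in for `rm`, not a real tool) prints the argument list instead:

```shell
# Print each argument rm would receive, instead of deleting anything.
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

show_args -rf "$HOME/blue/"    # one target directory
show_args -rf "$HOME/blue" /   # two targets, and one of them is /
```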
$ cd dir        # source files and temp files live here
$ rm * .tmp     # you think you typed `rm *.tmp` -- note the stray space
rm: cannot remove '.tmp': No such file or directory
# too bad, `*` already matched everything
1. You type it into the wrong system (D'oh)
2. You have run `mount --bind / /somewhere/else` then `rm -rf /somewhere` a week later
Every few days I remind myself of this... then I have to delete another directory with a git repository in it, and end up adding the -f again.
read -p "Are you sure? " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
    rm -rf .git
fi
(echo "The following files are going to be deleted!!!"
for FILE in "$@"; do
    echo "<<<" "$FILE" ">>>"
done) | less
read -p "Are you sure? " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
    rm -rf "$@"
fi
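On GNU coreutils, `rm -I` gives much the same once-per-invocation protection without a wrapper: it prompts a single time before removing more than three files or before recursing. For interactive shells it can be made the default:

```shell
# In ~/.bashrc: prompt once before mass or recursive deletes (GNU rm only).
alias rm='rm -I'
```

Unlike `rm -i`, it doesn't nag per file, so people are less tempted to alias it away.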
Premature enter just results in:
[web@server /]$ rm ~
rm: cannot remove `/home/web': Is a directory
Such as this classic:
so of course, I said "I'll just uninstall it from there and install it in the correct folder"
and it proceeded to delete C:/Program Files/
Uninstall wipes out your Windows directory.
$ cp -r path/to/some/directory path/to/very/important/directory
$ (run some commands to verify copy did what I wanted)
$ rm -r path/to/some/directory path/to/very/important/directory
Fortunately, despite rm's poor choice of options and bash's poor default handling of variable name typos and the obvious PEBKAC, backups saved the day here.
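The variable-typo half of that is a settable default: `set -u` (or `bash -u`) makes bash treat an unset variable as a fatal error instead of silently expanding it to an empty string. A minimal sketch (the `typo_dir` name is invented):

```shell
# Without -u, the unset $typo_dir expands to "", so the path becomes /out:
bash -c 'target="$typo_dir/out"; echo "would rm -r $target"'

# With -u, the same typo aborts before any rm could run:
bash -uc 'target="$typo_dir/out"; echo "would rm -r $target"' \
    || echo "caught the typo, nothing deleted"
```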
People are human. Policy ought to reflect this.
rm -rf *
in their shell history? Now it's just one accidental twitch away.
Far from perfect - it depended on the order of deletion - but a more general solution than preserve root. Of course it still requires the user to mark things they consider "important".
It is definitely a problem with Unix also.
I'm trying to say that when you've got something dangerous without safeguards, you take care around it. Not taking care around known-dangerous things and causing severe damage as a result is an arguably good case for dismissal.
And, of course, that should've been the solution to OP's incident with the Korean Airlines flight 007. Backups, surprisingly, are scarcely mentioned at all in this whole thread.
> you cannot fat finger rm -rf /
# rm -rf /tmp/bla
# rm -rf / tmp/bla
I was enlisted in the Marines, with a programmer MOS (4063), from 1989 to 1993. I never really programmed, but spent my time as a small-computer support guy.
Computer games were officially forbidden but unofficially tolerated, provided one was discreet. I suspect the same 'don't ask, don't tell' policy applied to EUCE at the embassy in question.
Sea story. My team was once directed by our boss, the Major, to 'sweep' the command for 'games' and remove them from computers. This took the better part of two weeks, and was massively unpopular with our peers. 'A Marine On Duty Has No Friends', we repeated to ourselves. We even got into the spirit of things and deleted games from _our_ computers.
Near the end of this evolution I hand-carried some paper into my Major's office. He was, yes, playing a computer game.
He did at least have the grace to look embarrassed.
Responsibility flows upwards, not downwards. It's just unfortunate that the people at the bottom are often carrying the people above far more than they should...
e.g. No intern that needs permission to get a box of pencils from the supply closet should ever be fired for putting a mistake into production that costs a company $100,000. If a company loses $100,000 on a mistake, you look to the person in the hierarchy who manages budgets of that size. It's their job to make sure the safeguards are in place to prevent losses like that.
In government it's difficult but not impossible to put a dollar value on losses like this. In this case, whoever was in charge of that network, and could request budget to build safeguards (whether software or training) against such mishaps, was ultimately responsible. Firing the intern is just shit rolling downhill.
An equally sufficient solution would have been to install a safety switch on any button with that much importance. Something like this but probably smaller, or just a plastic cover that fit over the F7 key: http://www.thinkgeek.com/product/15a5/
A fireable offense would be lying about the action or trying to cover it up.
Should the lady who asked for the reset be fired for playing a game and asking him to reset the computer?
Should the technician that didn't install some sort of safety be fired for not foreseeing this issue?
He would be much less likely to make the same mistake in the future than the person who would replace him.
If there are terminals that could erase a presidential report and there is no backup available, you send a non-critical staff member to guard every one of those terminals, or at least put a sticky-note in the middle of the monitor.
I'd say several other people deserved to be fired for this, but the intern was not one of them.
Also, whether or not it was appropriate is completely irrelevant to the story being told.
"That Korean announcement and the slow response by the US President—both caused by delayed real information—caused decades of conspiracy theories."
The reality here is that an intern doesn't have civil service protection, so it is quick and easy to dispose of them. Going after the supervisor may take longer, so if rapid action is needed, they'll fire the first person who serves "at the pleasure of" the executive.
Note also that it turns out the OP was "fired" but immediately rehired too...
I have to routinely create and drop databases on my local system. Our production databases, which I also have to connect to, contain hundreds, maybe thousands, of person-months of work. I realized that it would be a good idea, before issuing DROP DATABASE commands, to deliberately stop and double-check what server I'm connected to. Luckily, I haven't screwed that one up yet.
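That stop-and-check step can be pushed into a wrapper so it can't be skipped. A hedged sketch: the `drop_db` function and its localhost-only rule are my invention, and it prints the SQL rather than running it (in real use you'd pipe the output to your client):

```shell
# Refuse to emit DROP DATABASE unless the target host is local.
drop_db() {
    host="$1"
    db="$2"
    case "$host" in
        localhost|127.0.0.1) ;;    # local server: allowed
        *) echo "refusing to drop '$db' on non-local host '$host'" >&2
           return 1 ;;
    esac
    echo "DROP DATABASE \"$db\";"  # sketch: print the SQL instead of executing it
}

drop_db localhost scratch_db                 # allowed
drop_db prod.example.com scratch_db || true  # refused, returns nonzero
```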
Based solely on the shortened account, it was not appropriate at all to fire him. Convincing him that it was is just doubly inappropriate. There may be more to the story, but as it is, it looks like angry scapegoating against a hapless, lowest-level employee.
Japan, I guess? I've never been there but the story was consistent with my impression of their work culture.
Also, in Japanese companies, it's basically impossible to fire people. They can, however, be assigned to a desk in a windowless room and be given nothing to do for several years, until they take the hint and "voluntarily" quit.
Suffice it to say that I am aware of situations created by a societal expectation of lifetime employment which make the above article look positively sane. (And I recently learned that, in some cases, what I had assumed was just an ironclad social contract actually is legally enforceable, which blows my mind.)
Basically you can legally fire a permanent employee in the same sense that a civil servant in most countries can be fired: if the employee does something egregious, like stealing from the company, not showing up for a long period of time with no reason, etc. Certainly not for incompetence, or even if the company has been in the red for several years in a row.
E.g. Japan Airlines went basically bankrupt (technically a restructuring) and even so they had trouble laying off part of the staff.
You even get access to a pool of other soon-to-be-available engineers to work with if you're stuck with other poor sods in the room.
Definitely a different scenario from being suddenly marched out the door by security right before the weekend with a box of your belongings and, if you're lucky, a tiny check so you don't starve until next week.
Or the US Navy crew who received medals after shooting down the Iranian airliner.
Once there is loss of life, it is 100% politics afterwards with little to no practicality, just look at all the mass shootings where there were zero changes afterwards. We simply do not value life, it is politics first.
You make it seem like they received the medals for having shot down the plane. In reality, those who were awarded medals received tour-of-duty medals for their time spent in a combat zone. I believe the distinction is important, particularly since that class of medals is routinely awarded to individuals during their time in the military.
My answer would be no, you failed at your job regardless.
Same thing with military.
Gosh I hope you never screw up even once, cause you'll never live it down.
This is an insanely important distinction, since a fighter pilot would have had the opportunity to visually inspect the craft beforehand. In this case, by contrast, the accident was due to incorrect system information that misidentified the aircraft as something other than a commercial airliner.
Either that or life is really really cheap.
See John Allspaw's Swiss Cheese Theory : http://www.kitchensoap.com/2012/02/10/each-necessary-but-onl... .
[ Edit: I guess it's not Allspaw's model, but he applies it to systems engineering rather well - http://en.wikipedia.org/wiki/Swiss_cheese_model ]
"Accidents emerge from a confluence of conditions and occurrences that are usually associated with the pursuit of success, but in this combination—each necessary but only jointly sufficient—able to trigger failure instead."
The person who pushed the button is not at fault, the manager is not at fault, the guy who designed the button is not at fault - all are jointly responsible.
Blaming the intern does, however, reflect extremely poorly on Itoh and everyone else in the chain of command. A superior who demands retribution for a simple mistake that happened to cause him or her pain is basically worthless.
But, I forget, we're talking about Ronald Reagan.
No. This is something you would read about in The Design of Everyday Things, where Don Norman would totally shame the engineers who made that system. Software shouldn't be designed with the assumption that no one makes errors.
Why didn't the backups work? System wasn't "robust" enough. (Did I just use the word "robust"?)
I appreciate that the OP was a part of the situation, but conspiracy theories were not caused by this.
It was a time of very high tension between the US and the Soviet Union. So when a plane veers off course into not just Soviet airspace but an explicitly cordoned-off top-secret area, ignores all communication attempts, ignores the presence of fighter jets, and just keeps on flying, the situation itself is fertile soil for conspiracy theories.
Through the glass of a yellow newspaper box, I saw the Miami News headline: the Soviets had shot down a plane carrying a Congressman. My first thought was "This is the war." Not 'a' but 'the'. The primary stance of the US military had been squared off against the USSR for more than 30 years.
Incidentally, "features" like this are why I don't trust systems that have some centralised control - IMHO giving any one individual (or organisation, in many cases these days) such power over others is not a good thing.
I'm not agreeing with the outcome.
"Not long after I arrived in my office, I received a call from a secretary in the Agriculture Department who liked to play a computer game before her workday started. Her favorite game had a bug that regularly froze her workstation. [...] I realized that I had mistakenly hit F7 and reset all the workstations in the embassy. This realization didn’t bother me much, because no one except the Agriculture section secretary was usually on the computer system this early in the morning."
I'm sure I'd have thought something like: "Phew! Glad I made that mistake now, rather than at 11am when everyone was half-way through their morning's work. Likely no harm done at all, and I'm going to be really careful with that command in the future. Yup, definitely dodged a bullet there..."
In any case, we already know where your line of thought leads: risk-averse cultures ultimately stagnate and wither away.
Why didn't Reagan respond immediately? Well, he was waiting to hear from Chancellor Gorkon that the KAL flight had been successfully beamed aboard and was en route to Pluto, of course... Clearly they'd have their shit together better so it couldn't have been a 23-year old rebooting all the computers accidentally and wiping out hours of critical work -- that would just be ridiculous...
It's comforting to think that people can control the direction of every choice in the world, and that someone is at the helm.
It's uncomfortable to think about the daily series of random, unconnected decisions that drive the direction of our species.
Actually being in control of stuff is very difficult, but convincing people that you are in control of stuff is pretty easy as we are all suckers for narrative. The main ways to disrupt a power narrative is to spread other narratives or for a situation to occur that upsets the existing narrative, so getting people to make up new ones. This explains why totalitarian governments can collapse so quickly, which wouldn't be possible if the people running them were actually in control of anything.
If I had made the same mistake twice without any attempts to fix the situation long term, then, yes, I think that would have been a fire-able offense.
If you're working with people who care primarily about their own positions and egos without regard to the team as a whole, well, be prepared to be thrown under the bus when it comes time for those people to protect themselves.
Those with automation capabilities: keep this lesson in mind, because it will happen to you in production one day. 'dsh -a reboot' is incredibly easy to type and can have disastrous effects. Creating abstraction layers around common admin tasks can help catch simple mistakes and give prompts before dangerous behavior.
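A minimal shape for such an abstraction layer (the `fleet_all` name is hypothetical, and this sketch echoes the `dsh` command instead of executing it):

```shell
# Require typed confirmation before any cluster-wide command.
fleet_all() {
    printf 'About to run "%s" on ALL hosts. Type ALL to confirm: ' "$*" >&2
    read -r reply
    if [ "$reply" = "ALL" ]; then
        echo "dsh -a $*"      # sketch: echo the command instead of exec'ing it
    else
        echo "aborted" >&2
        return 1
    fi
}
```

Requiring the operator to type a whole word, rather than just hit y, is the point: it breaks the muscle-memory loop that makes 'dsh -a reboot' so easy to fire off.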
... incompetence like that comes from having F6 next to F7, with no checks or authorization needed for a potentially dangerous action. Processes should be designed for people making common mistakes... it's what they do.
hmmm, I am pretty sure Mr. Itoh was not Japanese working in the American embassy. I am pretty sure he was American.
Fire the idiot who wrote that function.
"I know what I'm doing when I hit F7, but the damn system makes me sit there for 30 seconds before it does what I told it to do! Piece of junk."
The result was that software in that era tended to come with a lot more sharp edges. The age of the Recycle Bin that would save you from yourself didn't arrive until administering systems became something the general public was expected to do.
I recall that time I wrote a batch manager for the VAX 11/780 at Caltech High Energy Physics. It consisted of a program to monitor the batch queue and start jobs as scheduled ("BATch MANager", or "BATMAN"), and a program for users to submit jobs ("Run Overnight Batch INput", or "ROBIN").
The configuration file for BATMAN was stored in /etc/batman.
During development, I occasionally had to "rm /etc/batman". Of course, out of habit, as soon as I typed "/etc/" my fingers would automatically type "passwd", and once I did not catch this in time. Oops. It happened to be a Sunday morning at around 7AM, and I had to call the other admin, who handled backups, to come in and restore that file. He was annoyed.
The second time I did this, he was pretty pissed.
The third time, I fortunately had been working at the terminal we had in the machine room, and managed to shut down power to the machine before the write buffers were flushed, and the file was OK after fsck. I didn't have to deal with an angry co-admininstrator that time. Just angry physicists.
The other admin (Norman Wilson, in case anyone knows him or he reads HN) then made a link named /etc/safe_from_tzs to /etc/passwd to stop my nonsense once and for all.
That worked until the first time I wanted to overwrite /etc/batman instead of rm it.
That led to a cron job that maintained a copy of /etc/passwd in a separate file, and periodically checked to see if it were missing or misformatted, and restored it if so.
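That cron job might have looked roughly like this (the function name, paths, and the "misformatted" test here are my assumptions; the real check was presumably stricter):

```shell
# Keep a copy of a passwd-style file; restore it when the original
# goes missing or loses its root entry.
guard_passwd() {
    src="$1"
    bak="$2"
    if [ -s "$src" ] && grep -q '^root:' "$src"; then
        cp "$src" "$bak"    # file looks sane: refresh the backup
    else
        cp "$bak" "$src"    # missing or malformed: restore it
    fi
}
```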
For example, one of my early jobs in IT involved running batch programs that produced reports on a mainframe designed for the punch card era. It had moved on from punch cards, but all of the batch jobs still expected them as input, so they were stored instead as "digital cards" in the job files themselves. The operator -- me -- would be responsible for bringing a job up on the terminal, changing each occurrence of some two-letter code in each card file to some other two-letter code, some date code to some other date code, and so on. Each batch job might be just one step of half a dozen or so required to produce paper printouts from the database. The terminal emulator did not have a find & replace function. Naturally, I screwed up jobs on a regular basis.
This mainframe ran mainly on COBOL74. Over the course of a lot of unpaid overtime, a few hours here and there for several weeks, I gradually wrote a variable interpolator in COBOL that could be called as the first step of a batch file and would replace all occurrences of a variable tag with an input parameter passed to the job. Instead of pulling up a job file and replacing a bunch of two-letter codes, you'd just run the job with the two-letter code as a parameter, and this program would rewrite all of the data cards in the batch file. COBOL has no string operators or a string data type, but I found a way to abuse some system calls to make it work.
So it took weeks to fix the most common operator error in that shop.
IT staff spend more time on Facebook, Reddit, HN, and online gaming now than we ever had available for fixing processes back in the day.
No amount of training can prevent something like this. It's like today's browsers, where the tab can be closed with ctrl+w and the whole window with ctrl+q. It doesn't matter how many times you've done it or how used you are to the position of the 'w'. One day you will close the whole window by accident.
Root cause analysis + countermeasure might have boiled down to "operator error due to shitty interface" + "we will tape a guard over F7 key, since it will never ever get fixed in software"
That's a dangerous way of thinking. 9/11 was big enough that it required a response. Not sure if we'll ever reverse the airport security stupidity that was such a response.
This is a far-reaching conclusion. It assumes that a) no mistake can be prevented before it happens for the first time and b) every mistake can be prevented after making it. The truth of either is far from obvious. Moreover, it is routine in our culture that severe mistakes are punished - e.g. if you make the mistake of driving drunk and cause harm, you'd probably be punished, not lauded as a model citizen on the grounds that you'd never make the mistake again.
Moreover, if no punishment follows the mistake, why would the mistake not be repeated? What would be the motivation to avoid repeating it - do you assume that sympathy for the co-workers would be enough? It is not always a sufficient motivator.
>>> It just seems like a situation that is strictly worse than keeping the current employee.
That assumes employees are a fungible commodity, and that if you pay the same money you always get the same one. This is not true - you can find an employee who is more attentive, or one with more experience.
The key difference here to me is mistakes vs. negligence. An employee makes a typo -> it's a mistake, not severe, negligent incompetence. It's a learning experience. The company is worse off firing the person who now has that experience.
If someone is slacking off? Yeah, fire them, that's not a mistake, that's negligence. You email a colleague in another time zone asking for help and they ignore you because you didn't CC their manager? Yeah, fucking fire that person.
I mean, in fact we have an industry built around the fact that people make mistakes: it's called software testing. Should we be firing developers when they make a mistake (i.e. their code has more than zero bugs)? That would be ridiculous. You're not even punishing them in that case - they're going to use their current salary at your shop to leverage a higher salary at the new place they (effortlessly) land, whereas you're going to spend tens of thousands of dollars trying to hire that mythical bug-free developer you supposedly should have had instead of this guy a year ago.
Personal experience tells me this is not always true.
I mean, the goal of the business is to create value / profit and find people who add value to your organization. Not to judge and suss out people who turn out to be capable of making a mistake and say "AHA! I FOUND YOU! You were an imposter all along, not worthy of paying! Time to start from scratch again!"
OMG I've never done that but now that I know about it I'm very afraid. If I do it tomorrow I'm blaming you.
Now, if you accidentally press Cmd + Q, it should prompt a "Hold Cmd + Q to Quit" instead of actually quitting.
Also, I made a mistake, it doesn't close the window. It closes all windows at once. Be afraid!
For some time, the Chrome team refused to implement a 'Sure you want to quit?' popup due to a general anti-popups consensus. They also refused to implement a checkbox to enable that behavior due to a general anti-configuration consensus. They've since relented on the latter.
It is not easy to find the right balance between providing adequate functionality and avoiding information overload. The web is still evolving. We are learning, and we will do better (overall) as time goes by. :)
The power switch on the IBM PC was way at the back so that people couldn't unintentionally reset the computer. The same thinking went into Ctrl-Alt-Del, which was a combination that people wouldn't accidentally hit.
So having a system where F7 would reboot the entire system was pretty dumb, even in the early 80s.
IBM didn't put much thought into safety. You could blow up the early IBM PCs if you turned on the monitor (screen) before the CPU box.
Gates noted Ctrl-Alt-Del should have been one button, not three. David Bradley, the inventor of the trifecta, did make it deliberately difficult to reboot; however, it was also originally an Easter egg which made it into production.
I'm not sure a single key would have been a good idea. "The reboot key" just sounds like a mistake waiting to happen. I've seen enough stories on the 'net of laptops with power buttons in terrible places on the keyboard to get a glimpse.
The mistake was using it for the Windows NT screen lock/unlock. Changing the "reboot your computer" sequence into the "start using my computer" sequence is a rather nonsensical (ignoring implementation) choice.
It already did happen: I've used keyboards with shutdown and reboot keys, and yes, it's a terrible idea.
My alma mater used to run programming competitions in a lab where the workstations had the reset button exactly at knee height. This wasn't a problem normally, when you're sitting under the desk next to the computer; but when you have three or four people gathered around the screen, kicking the reset button was a definite possibility. Eventually taping the cover of a calculator over this button became part of our regular routine.
As in: I hit F6, "Do you want to reboot this?" dialog pops, I hit 'Y', "Do you really really want to reboot this?", I hit 'Y' again.
Instead of actually reading what it says, you just instead press F6-Y-Y in quick succession.
Modern interfaces sometimes make you type some kind of string to confirm, but most either use a password (like sudo) or some hardcoded string that everyone eventually memorizes.
But even today, Windows 7 only makes you click that one button in UAC, and most people probably do it without even thinking about it.
If it's possible they're trying to delete data from the wrong place (say, an administrative account that manages many customers) another safeguard is to have them select the name of the context (customer name, etc.) out of a list of four or five nonsense alternatives.
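That safeguard is simple to wire up. A minimal sketch in Python (the names and the `choose` callback standing in for the UI layer are my own, not from any particular product):

```python
import random

def confirm_context(real_name, decoys, choose):
    """Ask the user to pick the real target out of nonsense alternatives.

    real_name: the context actually being deleted (e.g. a customer name)
    decoys: plausible-looking wrong answers
    choose: callable that presents the options and returns the one picked
            (stands in for the UI layer in this sketch)
    """
    options = decoys + [real_name]
    random.shuffle(options)        # so the real one isn't always in the same spot
    picked = choose(options)
    return picked == real_name     # only proceed on an exact match
```

Making the decoys near-misses of real names ("Acme Co" vs. "Acme Corp") is what forces the double take.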
The user experience tends to involve a lot of double takes and rereading, which is precisely what I want.
Anyone using the app would hate me.
If I ever meet that person IRL... I might even go so far as to make a tasteless joke about committing physical violence in retaliation for the hassle they've caused me.
- When the battery has just started charging, the voltage will not be high enough for the phone to actually work, because the draw from the battery exceeds the plug pack input
- Sometimes when transmitting, the phone uses more power for a fraction of a second than the power pack can deliver. This surge of energy could come from the battery, but the battery is empty so it won't work correctly
- Having some amount of battery means the phone can soft-off correctly when the plug is removed suddenly. The alternative is an unexpected hard off, which is usually bad. The user might experience data loss.
There are a bunch of grey areas around low voltage, such as flash writes failing marginally or radio not working correctly or partial saves. Much easier for the engineers and perhaps more reliable for the users to make them wait just a little.
They all seem pretty bad to me.
> When the battery has just started charging, the voltage will not be high enough for the phone to actually work, because the draw from the battery exceeds the plug pack input
What type of battery has an innate "draw"? They need a certain voltage and have a certain internal resistance, but it's easy to efficiently increase the effective internal resistance by boosting voltage with switched capacitor circuits (or whatever). If there's a "smart battery manager" you can bet the hardware to do this is already there.
"Draw" would be an excuse if you were hooking things up manually to a car battery. It's not an excuse in the highly integrated environment of a cellphone where corrective circuitry is dirt cheap (and free relative to what's already probably there).
> Sometimes when transmitting, the phone uses more power for a fraction of a second than the power pack can deliver.
That's what capacitors are for. They're almost certainly more efficient, too. Efficiency slumps away from the optimal I,V much faster for batteries than for capacitors.
> Having some amount of battery means the phone can soft-off correctly when the plug is removed suddenly. The alternative is an un-expected hard off which is usually bad. The user might experience data loss.
I'm pretty sure this is the actual reason why it's done. It's an awful reason.
First of all, you claim that "an un-expected hard off is usually bad". WTF? Does your ext4 linux partition usually die when you hard-off it? I've probably hard-offed ext4+linux 1000 times, never had any problems. I would go to great pains to avoid hard-offing a production server but you must acknowledge that in the age of solid journaled filesystems, hard-offs almost never lead to actual bad consequences, especially for light usage patterns. I'm sure it's worse on some hardware configurations but I've never met a system where it got all the way to "usually bad" territory.
On the other hand, having a power manager lock me out of my phone for 15 minutes after I determine I need to use it has led to loss of data. Significant loss of data. And worse. Pictures that were never taken, phone calls that were delayed at significant inconvenience, the inability to look up contacts for others... these are real world negative consequences that are 1000x more important than a .1% chance of filesystem corruption times, say, a 20% chance of actual power failure. It seems hopelessly myopic to suggest that the cost/benefit trivially favors the prevention of uncommon filesystem errors over addressing the immediate and possibly time-sensitive needs of the user.
I think that whichever organizations choose to implement the lockout feature are doing a massive disservice to their customers, foisting significant hassle upon them in order to save a few pennies/customer of repair costs, if that. Your arguments haven't convinced me otherwise.
I wonder if he had a supervisor sysadmin that he was working under, but given how he described his boss, that seems unlikely as well.
And we still have national security / diplomacy disasters resulting from relatively low level people having access to vital computer systems beyond anybody's imagination.
If something has changed over these years, it's the overall understanding of design principles and their popularization. (Thanks, Don Norman and other people in the field!)
Training has nothing to do with it. Even if you can train a person to work with a badly designed system without making mistakes (often), designing the system well in the first place is almost always significantly easier and cheaper.
For example, accidental key presses can be easily prevented by requiring the user to type a command of reasonable length. Typing "reboot-all-workstations" is not that difficult, but it would definitely prevent the incident described in the article.
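A minimal sketch of that typed-confirmation gate in Python (the function and parameter names are invented for illustration; in real use `prompt_input` would just be `input`):

```python
def confirm_dangerous(expected, prompt_input):
    """Require the operator to type the full command name before acting.

    expected: the exact phrase to type, e.g. "reboot-all-workstations"
    prompt_input: callable returning what the user typed (input() in real use)
    """
    typed = prompt_input(f"Type '{expected}' to proceed: ")
    return typed.strip() == expected   # a single stray keypress can't match
```

The point isn't the comparison, it's the length: no adjacent-key slip can produce a 23-character exact match by accident.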
"take a deep breath, count to ten"
Doesn't answer "who" put the button in there, I'm just thinking out-loud! :)
To err is human. To push the error to thousands of instances online at once - that's devops.
Even if I have the authority to press the big red button, I'd like it to be behind glass and far from the light switch.
Or just the old rm -rf /
Older stuff asks for far fewer confirmations.
It was surprisingly boring, taking a very long time to delete all my files (I should have deleted them first). Eventually it got around to deleting fonts, which caused things to render a little strange, but after 60 minutes nothing much had changed and it was still chugging, so we shut it down and went to the bar for my last "Friday night drinks".
Won't make that mistake again.
I was trying to move everything out of a folder to tmp (mv ./* /tmp), which was fine. Then I cd to /, and, wanting to rerun the command I'd run right before the mv, pressed up twice and enter quickly...
Well, it failed once it finished moving /bin/mv to tmp. Of course, cp and a lot of other helpful commands come before mv alphabetically.
It wasn't too bad, just needed a boot from a live disc to move everything back, but I still get nervous whenever / and * are in the same command.
It made for a very trippy and insightful experience. Commands that had already been run (like cp, ls, mv, cd, etc.) worked fine since they were in memory but other ones wouldn't run (/bin was one of the first to go).
I tried for about an hour to bootstrap the system back to normal by copying files of CD/etc but in the end the damage was too bad for my (very amateur) experience and I ended up reinstalling.
discussed here last month https://news.ycombinator.com/item?id=7892247
1. we know a helluva lot more about human factors design
2. we have a helluva lot more excess computer power that can be devoted to human factors
Remember that warning functions are easy to ignore. Never use a warning when you mean undo.
Pressing the button would start a timer. While the timer is running, the user would have the opportunity to review their selection (maybe even with a simulation of what effect the selection has) and could undo the request if necessary. Only after the timer expires would the action actually be taken.
This is how the "undo send" feature of Gmail works.
Human factors grew out of the need to build safe and error-resistant weaponry in World War II. Poor attention to human factors and user interface design was a factor in the Three Mile Island disaster.
With respect, to call this armchair usability and to imply that some users are just stupid is to completely misunderstand what usability is.