
I never understood the attitude of some companies that fire an employee immediately if they make a mistake such as accidentally deleting some files. If you keep this employee, then you can be pretty sure he'll never make that mistake again. If you fire him and hire someone else, that person might not have had the learning experience of completely screwing up a system.

I think that employees actually make fewer mistakes and are more productive if they don't have to worry about being fired for making a mistake.




There is a great quote from Tom Watson Jr (IBM CEO):

> A young executive had made some bad decisions that cost the company several million dollars. He was summoned to Watson’s office, fully expecting to be dismissed. As he entered the office, the young executive said, “I suppose after that set of mistakes you will want to fire me.” Watson was said to have replied,

> “Not at all, young man, we have just spent a couple of million dollars educating you.” [1]

It all depends on how leadership views employee growth.

[1] http://the-happy-manager.com/articles/characteristic-of-lead...


There's a story about Bill Clinton's early years that is similar. He became governor at 32 and had ambitious plans; increasing the gas tax to fix the roads was one of them. The tax passed, and Clinton subsequently lost re-election. He was stung by the loss, since he had been a fairly popular governor despite the gas tax hike. A few years later he decided to run again and went all over the state to talk to voters. In one small town he came across a man and introduced himself. The man said, "I know who you are, you're the sumbitch that raised the gas tax!" Clinton replied, "I guess I can't count on your vote then." The man said, "Oh, I'll still vote for you." Shocked, Clinton asked why. The man grinned and said, "'Cause I know you'll never do that again!"


That's not really the same though. Did Clinton actually manage to fix the roads? If he did, that wasn't a mistake and voters were simply retaliating for a tax increase.


> Did Clinton actually manage to fix the roads? ... and voters were simply retaliating for a tax increase.

Not a great argument, because many people view maintaining the roads as one of the primary responsibilities of local government (in the USA). If it cannot properly budget and allocate money, then regardless of whether the tax increase worked, it was the wrong way to fix the problem. With this mindset, government can fix every problem by raising taxes... which is not acceptable to most people.


That argument doesn't necessarily make sense. You can't budget properly and allocate funds if you have no funds. Look at all the countries with a high level of social services: they collect a lot of tax.

If people really think that the government can maintain roads with money it doesn't have, I don't know what to say.


> You can't budget properly and allocate funds if you have no funds. Look at all the countries with a high level of social services. They collect a lot of tax.

If their primary purpose is to take care of the roads, that should be one of the first items funded with the taxes they already collect; therein lies the problem people have. It's not that they had no money; it was allocated so poorly that they ended up in the negative relative to the needs required of them. We are not a country with a lot of social services; we have very few. It's a case of the government not doing its job well and taking more money to cover that fact up.


My point is that services don't come from thin air. There may have been things besides the roads that needed to be funded every year, leaving no surplus to cover the roads. You may even have priorities important enough that, even if running them is inefficient, you need to fund them anyway while you try to improve efficiency. Introducing a tax so that you can finally fund a project is not, at face value, a bad idea.

I am not familiar with this particular instance. But the story about Clinton as it stands is not really relevant. Much like this sub-thread.


> If people really think that the government can maintain roads with no money

Do you really think they brought in "no" money? That's ridiculous. The government should figure out how to waste less of the existing taxes before demanding more.


More simply, setting tax rates is part of budgeting.


If I did not budget properly, would it be acceptable to ask my boss for more money? Every time? Or is it my fault for not budgeting properly? I'd probably be fired if I did this.


That seems like an unrelated question. I thought we were talking about Clinton's one-time budget to improve roads that included a tax increase to cover it. Clinton wasn't governor in the previous term, he wasn't the one that under-budgeted the roads originally.


> he wasn't the one that under-budgeted the roads originally.

I'm not sure who caused the budget deficit in the first place, but he is the one that took more money from citizens to fix a problem that should have been fixed by reallocating existing funds.


No one thinks they can fix the roads with no money. Rather, they think they can fix the roads with the amount of money they have.

That said, governments do spend money that doesn't exist as a matter of routine. That's why the Fed exists.


> I never understood the attitude of some companies that fire an employee immediately if they make a mistake such as accidentally deleting some files. If you keep this employee, then you can be pretty sure he'll never make that mistake again.

I did fire an employee who deleted the entire CVS repository.

Actually, as you say, I didn't fire him for deleting the repo. I fired him the second time he deleted the entire repo.

However, there's a silver lining: this is what led us (actually Ian Taylor IIRC) to write the CVS remote protocol (client/server source control). Before that it was all over NFS, though the perp in question had actually logged into the machine and done rm -rf on it directly(!).

(Nowadays we have better approaches than CVS but this was the mid 90s)


What the hell. How do people just go around throwing rm -rf's so willy-nilly?


Campfire horror story time! Back in 2009 we were outsourcing our ops to a consulting company, who managed to delete our app database... more than once.

The first time it happened, we didn't understand what, exactly, had caused it. The database directory was just gone, and it seemed to have disappeared around 11pm. I (not they!) discovered this, and we scrambled to recover the data. We had replication, but for some reason the guy on call wasn't able to restore from the replicas -- he was standing in for our regular ops guy, who was away on site with another customer -- so after he'd struggled for a while, I said screw it, let's just restore the last dump, which fortunately had run an hour earlier. After some time we were able to get a new master set up, though we had lost an hour of data. Everyone went to bed around 1am and things were fine, the users were forgiving, and it seemed like a one-time accident. They promised that setting up a new replication slave would happen the next day.

Then, the next day, at exactly 11pm, the exact same thing happened. This obviously pointed to a regular maintenance job as the culprit. It turned out the script they used to rotate database backup files did an "rm -rf" of the database directory by accident! Again we scrambled to fix it. This time the dump was 4 hours old, and there was no slave we could promote to master. We restored the last dump, and I spent the night writing and running a tool that reconstructed the most important data from our logs (fortunately we logged a great deal, including the content of things users were creating). I was able to go to bed around 5am. The following afternoon, our main guy was called back to help fix things and set up replication. He had to travel back to the customer, and the last thing he told the other guy was: "Remember to disable the cron job".

Then at 10pm... well, take a guess. Kaboom, no database. Turns out they were using Puppet for configuration management, and when the on-call guy had fixed the cron job, he hadn't edited Puppet; he'd edited the crontab on the machine manually. So Puppet ran 15 mins later and put the destructive cron job back in. This time we called everyone, including the CEO. The department head cut his vacation short and worked until 4am restoring the master from the replication logs.

We then fired the company (which filed for bankruptcy not too long after), got a ton of money back (we threatened to sue for damages), and took over the ops side of things ourselves. Haven't lost a database since.


Mine is from back when I was a sysadmin at the local computer club. We had two Unix machines (a VAX 11/750 and a DECstation of some model). We had a set of terminals connected to the VAX and people were using the DECstation by connecting to it using telnet (this was before ssh).

What happened was that one morning when people were logging in to the DECstation they noticed that things didn't quite work. Pretty much everything they normally did (like running Emacs, compiling things, etc) worked, but other, slightly more rare things just didn't work. The binaries seemed to be missing. It was all very strange.

We spent some time looking into it and finally we figured out what had happened. During some maintenance, the root directory of the DECstation had been NFS-mounted on the VAX, and the mount point was under /tmp. I don't remember who did it, but it's not unlikely that it was me. During the night, the /tmp cleanup script had run on the VAX, which deleted all files that had an atime (last access time) of more than 5 days. This meant that all files the DECstation needed to run, and all the files that were used during normal operation, were still there, but anything slightly less common had been deleted.
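
The cleanup job itself was probably nothing fancier than a nightly find; a hypothetical reconstruction (the real script is long gone):

    # Hypothetical sketch of a typical nightly /tmp cleanup of that era.
    # With the DECstation's root NFS-mounted under /tmp, it happily descends
    # into it and removes anything not accessed in the last 5 days.
    find /tmp -type f -atime +5 -exec rm -f {} \;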

This obviously taught me some lessons, such as never mount anything under /tmp, never NFS-mount the root directory of anything, and never NFS-mount anything with root write permissions. The most important thing about sysadmin disasters is that you learn something from them.


When disk space is limited and you are working with large files, you need to clean up after yourself. And humans make mistakes. I am not sure if this still does anything in newer rm, but it used to be a common mistake:

    $ rm -rf / home/myusername/mylargedir/
(note the extra space after slash)

The real solution consists of:

    * backups (which are restored periodically to ensure they contain everything)
    * proper process which makes accidental removal harder (DVCS & co.)


Day 1 in my first job in the UK I ran an "update crucial_table set crucial_col = null" without a where clause on production. Turned out there were no backups. Luckily the previous day's staging env had come down from live, so that saved most of the data.

What most people don't realize is that very few places have a real (tested) backup system.

_goes off to check backups_


I had a coworker who would always run dangerous manual SQL like this within a transaction ... and would always mentally compare the "rows affected" with what he thought it should be before committing.

And then commit it.

It's a good habit.


My workflow for modifying production data is:

   1) Write a select statement capturing the rows you want to modify and verify them by eyeball
   2) (Optional) Modify that statement to select the unchanged rows into a temp table to be deleted in a few days
   3) Wrap the statement from step 1 in a transaction
   4) Modify the statement into the update or delete
   5) Check that rowcounts haven't changed from step 1
   6) Copy-and-paste the final statement into your ticketing or dev tracking system
   7) Run the final statement

It may be overkill, but the amount of grief it can save is immeasurable.
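
For the curious, a minimal sketch of steps 1-5 against a made-up table (names are hypothetical; the pattern is the point):

    -- 1) Eyeball the rows you intend to change
    SELECT id, last_name FROM users WHERE id = 42;

    -- 2) (Optional) Stash the unchanged rows somewhere recoverable for a few days
    CREATE TABLE users_undo_tmp AS SELECT * FROM users WHERE id = 42;

    -- 3-5) Wrap the change in a transaction and compare row counts before committing
    BEGIN;
    UPDATE users SET last_name = 'Smith' WHERE id = 42;
    -- if the affected-rows count doesn't match step 1, ROLLBACK instead
    COMMIT;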


I have never done what the GP describes but I consider myself very lucky as it's a very common mistake. I have heard enough horror stories to always keep that concern in the back of my mind.

I do what your coworker did and it's a great feeling when you get the "451789356 rows updated" message inside a transaction where you are trying to change Stacy's last name after her wedding and all you have to do is run a ROLLBACK.

Then it's time to go get a coffee and thank your deity of choice.


One of PostgreSQL's best features is transactional DDL: You can run "drop table" etc. in a transaction and roll back afterwards. This has saved me a few times. It also makes it trivial to write atomic migration scripts: Rename a column, drop a table, update all the rows, whatever you want -- it will either all be committed or not committed at all. Surprisingly few databases support this. (Oracle doesn't, last I checked.)
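
A quick illustration of what that buys you in psql (table and column names are made up):

    -- PostgreSQL: DDL participates in transactions, so this is fully undoable
    BEGIN;
    ALTER TABLE orders RENAME COLUMN total TO total_cents;
    DROP TABLE orders_archive;
    -- changed your mind? nothing has actually taken effect yet
    ROLLBACK;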


MySQL's console can also warn you if you issue a potentially destructive statement without a WHERE clause: http://dev.mysql.com/doc/refman/5.7/en/mysql-tips.html#safe-...
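
Per those docs, you opt in when starting the client (user and database names here are made up):

    # Start the mysql client in safe-updates mode (also spelled --i-am-a-dummy).
    # UPDATE and DELETE statements with no key-based WHERE clause and no LIMIT
    # are then rejected instead of silently rewriting the whole table.
    mysql --safe-updates -u appuser -p appdb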


The `--i-am-a-dummy` flag, which I wish were called `--i-am-prudent` because we all are dummies.


It works for more than databases.

- With shells, I prefix risky commands on production machines with #, especially when cutting and pasting

- Same for committing stuff into VCS, especially when I'm cherrypicking diffs to commit

- Before running find with -delete, run with -print first. Many other utilities have dry-run modes
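
A minimal sketch of those habits (paths are invented):

    # Dry run first: print exactly what would be removed
    find /var/tmp/build-cache -name '*.o' -mtime +30 -print

    # Same expression with -delete, only once the -print output looks right
    find /var/tmp/build-cache -name '*.o' -mtime +30 -delete

    # Neutered paste: the leading '#' means a stray Enter does nothing
    # rm -rf /srv/app/releases/2016-01-*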


I do a select first using the where clause I intend to use to get the row count.

Then open a transaction around the update with that same where clause, check the total number of rows updated matches the earlier select, then commit.

This approach definitely reduces your level of anxiety when working on a critical database.


My practice is to do:

  UPDATE ImportantTable SET
    ImportantColumn = ImportantColumn
  WHERE Condition = True
Check the rows affected, then change it to:

  UPDATE ImportantTable SET
    ImportantColumn = NewValue
  WHERE Condition = True


Not doing this is like juggling with knives. I cringe every time I see a colleague doing it.


Lots of people "do backups", not many have a "disaster recovery plan" and very few have ever practised their disaster recovery plan.

Years ago we had an intern for a time, and he set up our backup scripts on production servers. He left after a time, we deleted his user, and went on our merry way. Months later, we discover the backups had been running under his user account, so they hadn't been running at all since he left. A moment of "too busy" led to a week of very, very busy.


I've done that where crucial_col happened to be the password hash column.

We managed to restore all but about a dozen users from backup, and sent a sheepish email to the rest asking them to reset their passwords.


Yup, I did something like that command once, to a project developed over 3 months by 5 people without a backup policy (university group project). Luckily, this was in the days when most of the work was done on non-networked computers, so we cobbled everything together from partial source trees on floppies, hunkered down for a week to hammer out code and got back to where we were before. It's amazing how fast you can write a piece of code when you've already written it once before.

That was the day I started backing up everything.


I am finding more and more that the 'f' is not required. Just 'rm -r' will get you there usually, and so I'm trying to get into the habit of only doing the minimum required. Unfortunately, git repos require the -f.


Accidents like these have happened to me enough times that my .bashrc contains this in all my machines:

    alias rm='echo "This is not the command you are looking for."; false'
I install trash-cli and use that instead.

Of course this does not prevent other kinds of accidents, like calling dd to write on top of the /home partition... ok, I am a mess :)


> The real solution is comprised of

* making "--preserve-root" the default... :-)


Nowadays, with the low price of disk space and the high price of time, it's much cheaper to buy new disk drives than to pay people to delete files. And safer!


I did something similar to my personal server using rsync.

> cd /mnt/backup

> sudo rsync -a --delete user@remote:some/$dir/ $dir/

Only to see the local machine become pretty much empty when $dir was not set.

Funny to see Apache etc. still running in memory despite all its files being gone.
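
For what it's worth, bash can be made to refuse that failure mode; a sketch assuming the same variable name:

    # ${dir:?} aborts with an error if dir is unset or empty, instead of
    # silently expanding to nothing and turning the destination "$dir/" into "/"
    sudo rsync -a --delete "user@remote:some/${dir:?}/" "${dir:?}/"

    # Or for a whole script: treat any use of an unset variable as an error
    set -u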


On Linux, if a process is holding those file handles open, the kernel doesn't actually free the data until the last handle is closed. You can dig into /proc, find the file descriptor, cat the contents back out, and restore whatever is still running, as long as you don't kill the process.

For next time Apache is hosting a phantom root dir. ;) These things happen to all of us. We just have to be prepared.
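
Roughly, the rescue looks like this (the PID and fd numbers are whatever you find on the box):

    # Find processes still holding deleted files open
    lsof 2>/dev/null | grep '(deleted)'

    # Inspect the open descriptors of, say, the Apache process (pid 1234)
    ls -l /proc/1234/fd/

    # Copy the still-referenced contents back out before the process exits
    cat /proc/1234/fd/5 > /tmp/recovered-httpd.conf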


Ahh, learned something new. Informative comment.


> before that it was all over NFS, though the perp in question had actually logged into the machine and done rm -rf on it directly(!).

With NFS Version 3, aka NeFS, instead of using rlogin to rm -rf on the server, the perp could have sent a PostScript program to the server that runs in the kernel, to rapidly and efficiently delete the entire CVS tree without requiring any network traffic or even any context switches! ;)

http://www.donhopkins.com/home/nfs3_0.pdf

The Network Extensible File System protocol(NeFS) provides transparent remote access to shared file systems over networks. The NeFS protocol is designed to be machine, operating system, network architecture, and transport protocol independent. This document is the draft specification for the protocol. It will remain in draft form during a period of public review. Italicized comments in the document are intended to present the rationale behind elements of the design and to raise questions where there are doubts. Comments and suggestions on this draft specification are most welcome.

The Network File System The Network File System (NFS™* ) has become a de facto standard distributed file system. Since it was first made generally available in 1985 it has been licensed by more than 120 companies. If the NFS protocol has been so successful why does there need to be NeFS ? Because the NFS protocol has deficiencies and limitations that become more apparent and troublesome as it grows older.

1. Size limitations.

The NFS version 2 protocol limits filehandles to 32 bytes, file sizes to the magnitude of a signed 32 bit integer, timestamp accuracy to 1 second. These and other limits need to be extended to cope with current and future demands.

2. Non-idempotent procedures.

A significant number of the NFS procedures are not idempotent. In certain circumstances these procedures can fail unexpectedly if retried by the client. It is not always clear how the client should recover from such a failure.

3. Unix®† bias.

The NFS protocol was designed and first implemented in a Unix environment. This bias is reflected in the protocol: there is no support for record-oriented files, file versions or non-Unix file attributes. This bias must be removed if NFS is to be truly machine and operating system independent.

4. No access procedure.

Numerous security problems and program anomalies are attributable to the fact that clients have no facility to ask a server whether they have permission to carry out certain operations.

5. No facility to support atomic filesystem operations.

For instance the POSIX O_EXCL flag makes a requirement for exclusive file creation. This cannot be guaranteed to work via the NFS protocol without the support of an auxiliary locking service. Similarly there is no way for a client to guarantee that data written to a file is appended to the current end of the file.

6. Performance.

The NFS version 2 protocol provides a fixed set of operations between client and server. While a degree of client caching can significantly reduce the amount of client-server interaction, a level of interaction is required just to maintain cache consistency and there yet remain many examples of high client-server interaction that cannot be reduced by caching. The problem becomes more acute when a client’s set of filesystem operations does not map cleanly into the set of NFS procedures.

1.2 The Network Extensible File System

NeFS addresses the problems just described. Although a draft specification for a revised version of the NFS protocol has addressed many of the deficiencies of NFS version 2, it has not made non-Unix implementations easier, nor does it provide opportunities for performance improvements. Indeed, the extra complexity introduced by modifications to the NFS protocol makes all implementations more difficult. A revised NFS protocol does not appear to be an attractive alternative to the existing protocol.

Although it has features in common with NFS, NeFS is a radical departure from NFS. The NFS protocol is built according to a Remote Procedure Call model (RPC) where filesystem operations are mapped across the network as remote procedure calls. The NeFS protocol abandons this model in favor of an interpretive model in which the filesystem operations become operators in an interpreted language. Clients send their requests to the server as programs to be interpreted. Execution of the request by the server’s interpreter results in the filesystem operations being invoked and results returned to the client. Using the interpretive model, filesystem operations can be defined more simply. Clients can build arbitrarily complex requests from these simple operations.


Surely you've heard of at least these arguments:

- Employee was error prone and this mistake was just the biggest one to make headlines. Could be from incompetence or apathy.

- Impacted clients demanded the employee at-fault be terminated.

- Deterrence: fire one guy, everyone else knows to take that issue seriously. Doesn't Google do this? If you leak something to press, you're fired, then a company email goes out "Hey we canned dude for running his mouth..."

It's better to engage the known and perhaps questionable justifications than to "never understand".


Case 1: It's fine to fire individuals for ongoing performance issues (though you must make clear to those who remain the number and types of issues the individual already had, and the steps that had been taken to help them rectify their performance).

Case 2: no competent manager would fire an employee who made a mistake to satisfy clients. They may move the employee to a role away from that client, but it would be insanity to allow the most unreasonable clients to dictate who gets fired. Any manager who does what you suggest should expect to have lost all credibility in the eyes of their team.

Case 3a: A leak to the press is a purposeful action. Firing for cause is perfectly reasonable. Making a mistake is not a purposeful action.

Case 3b: If you want to convey that a particular type of mistake is serious, don't do so by firing people. Do so with investments in education, process, and other tools that reduce the risk of the mistake occurring, and the harm when the mistake occurs. Firing somebody will backfire badly, as many of your best employees will self-select away from your most important projects, and away from your company, as they won't want to be in a situation where years of excellent performance can be erased with a single error.


Case 2: Agreed, but not everyone is lucky enough to work for a competent manager. And managers don't fit neatly within competent and incompetent buckets. Under external or higher pressure ("his job or your job") a normally decent manager might make that call.

Case 3a: Good distinction, a conscious leak is not a mistake. It's possible for a leak to be accidental though, say under alcohol, lost laptop, or just caught off guard by a clever inquisitor.

Case 3b: Firing has the effects you mention, but it also has the effect of assigning gravity to that error. I'm not claiming the benefits outweigh the drawbacks, but some managers do.

I'm not a proponent of the above, but it's good to understand the possible rationale behind these decisions.


Firing someone over making a mistake is never a good idea.

If you're going to have firing offenses, spell those out: e.g. breaking the law, violating some set of rules in the handbook, whatever, so that people can at least know there's a process or sensibility to the actions.

If people can be fired for making a mistake, and that wasn't laid out at the outset, then they're just not gonna trust the stability of your workplace.


Firing for mistakes can make sense in the context of a small company that has to pay enough to rectify the mistake that it significantly impacts the budget. If this cost needs to be recouped, it is only fair that it be recouped from the salary preserved by terminating the responsible party. We're not all megacorps.

This is going to depend on the severity, cost, budget, importance of the role filled, etc., but I think it's probably one of the only semi-plausible justifications for firing based on things that do not reflect a serious and ongoing competency or legal issue.


That's nonsense.

A mistake is made, and a material loss has been incurred. This sucks. Been there, done that, didn't get the t-shirt because we couldn't afford such a luxury. I watched my annual bonus evaporate because of somebody else's cock-up.

But there's no reason to believe that firing the mistake-maker is the best move. Maybe the right move is to find money somewhere else (cutting a bunch of discretionary spending, pushing some expenses into the future, reducing some investment), or maybe it's to ask a few people to take pay cuts in return for some deferred comp. Or maybe it's to lay off somebody who didn't do anything wrong, but who provides less marginal value to the company.

But it'd be one hell of a coincidence if, after an honest to god mistake, the best next action was to fire the person who made the mistake. After all, if they were in a position to screw your company that hard, they almost certainly had a history of being talented, highly valued, and trustworthy. If they weren't good, you wouldn't have put them in a place where failure is so devastating.


>But there's no reason to believe that firing the mistake-maker is the best move.

Yeah, I'm not saying it necessarily or even probably is. I'm saying that reality sometimes makes it so that we have to make these compromises.


>Firing for mistakes can make sense in the context of a small company that has to pay enough to rectify the mistake that it significantly impacts the budget. If this cost needs to be recouped, it is only fair that it be recouped from the salary preserved by terminating the responsible party.

What was the fired person doing? Presumably they were performing required work otherwise the company wouldn't have been paying them in the first place.

That means you now need to pay to replace them, which costs more than keeping an existing employee. Or you could divide their responsibilities among the remaining employees, but if you thought you could do that you would have already laid them off without waiting for them to mess something up.


If you're going to let your clients decide when you fire someone, you have some enormous issues. Take the person off their account, sure, but how in hell does a client get to make your HR decisions?


> Deterrence

If I see someone getting axed for making a mistake, I'd be making a mistake if I didn't immediately start firing up the resume machine.


> Doesn't Google do this? If you leak something to press, you're fired, then a company email goes out "Hey we canned dude for running his mouth..."

I've never heard of this happening. I've heard of people fired for taking photographs (or stealing prototypes!) of confidential products and handing them to journalists.


Leaking something to the press is an entirely different class of failure than a technical screw up.


"Why would I fire you? I just paid $10M to train you to not that make mistake again!"


I had a great boss (founder of the company) who said, after I had just screwed up, "There is not a mistake you can make that I haven't already made. Just don't make the same mistake twice."


Reminds me of the boot-camp story of the nervous recruit inquiring of his sergeant:

"Sir beg my pardon for asking but why did you give Smith 50 press-ups for asking a question? You said that there were no stupid questions. Sir."

"I gave him the press-ups the SECOND time he asked. Will you need to ask again?"


That's awful. So they train people never to ask for clarification or a refresher if they misunderstand or forget, so instead they go on to make a far worse mistake acting on incorrect information.


A clarification question is not asking the same question twice. The point seems to be that you should pay attention.


I believe that is the lesson from the story, but I don't believe the lesson from the story == the lesson irl.

The IRL takeaway is that if you don't open your mouth, you don't get punished. If you do open your mouth, you might get punished.


It's incentives. The true benefit to the company comes when people can make mistakes and learn from them. But often, the forces on management are not in alignment. Imagine Manager Mark's direct report, Ryan, commits a big and public error. And then 1.5 years later Ryan commits another public error.

"What kind of Mickey Mouse show is Manager Mark running over there?" asks colleague Claire, "Isn't that guy Ryan the same one that screwed up the TPS reports last year?"

On the other hand, if Mark fires Ryan, then Mark is a decisive manager. Even if the total number of major errors ends up higher, he won't risk becoming known as a manager who lets people screw up multiple times.


"From the Earth to the Moon" - a great series about the space race (Tom Hanks made it after Apollo 13) - has a scene about this:

https://www.youtube.com/watch?v=XuL-_yOOJck


Just like when a new executive is brought into a failing company: the company still fails, but the executive is awarded a nice severance package.


Sounds like the life story of Yahoo.



