
There's a mantra at my company that you can't assign blame for a problem to a particular person. If one person is capable of breaking your system, you have a bad system. The focus isn't on finding the one person or the one mistake that caused it, but fixing the process so one person or one mistake can't wreak that much havoc. I think it's a very good philosophy.

I remember the huge AWS outage that was caused by one engineer fat-fingering a command. Instead of firing him, they put policies in place so that it can't happen again.

Why would they fire him? He's the one person in the company who's never going to make that mistake again.

"we built a workflow that allowed one person to ruin the company."

that's one heckuva excuse, dude.

Bought a book mentioned in another thread about understanding system failures.

If the conclusion blames an individual then 100% of the time the real problem is with the system that gave them that much power.

What's the book? Would like to add to my reading list :)

And here I thought I could avoid grabbing it from my shelf...

"The Field Guide to Understanding 'Human Error'" by Sidney Dekker.

Has this mantra been stress-tested in the real world with a large scale data breach?

Edit: to add to this, what I mean is: it's great that (some) companies have this culture internally. It remains to be seen whether the mantra would survive a sufficiently large scandal. Maybe that's when the legal team comes in with the damage control plan, as outlined in another comment by @justboxing.

I work for AWS. We haven't had a breach, but consider that S3 outage not too long ago, which was due to one engineer fat-fingering a command. Rather than blaming or disciplining that person, AWS changed the process so that people aren't manually typing in those commands.
