
Be kind (2016) - tosh
https://www.briangilham.com/be-kind
======
coldtea
Recently, I was asked if I was going to fire an employee who made a mistake
that cost the company $600,000. No, I replied, I just spent $600,000 training
him. Why would I want somebody else to hire his experience?

(Thomas J. Watson, IBM CEO)

~~~
bpchaps
An old coworker at a clearing firm once accidentally rebooted two _very_
critical servers during a _very_ critical time. From what I'm told, it ended
up costing the company about a billion dollars in fees and fines.

He managed to keep his job... just with no more prod access. :)

~~~
foota
What's a clearing?

~~~
koolba
Clearing (and settlement) is the process of delivering securities and
receiving the cash after a securities transaction (ex: stock purchase).

------
orev
Disappointed that one of the lessons was not “don’t deploy on Friday and then
immediately run out the door.” I know most of you will say that that shouldn’t
be an issue if you have proper tests, devops, etc., but this type of thing is
the reason that Ops usually controls the releases. Yes, I know, Ops is
obsolete and can go suck a lemon, but it’s stuff like this that shows the
wisdom of the older ways.

So yeah, be nice to Ops too, because they actually have experience in stuff
like this and one weekend of downtime is not an appropriate price to pay for
every developer to learn a lesson.

~~~
peterwwillis
Actually, what I learned is that being afraid to deploy on Friday means you're
lacking in testing, verification, auto healing and rollback processes.

Also, if something goes down in a way that requires a human to work on the
weekend, it should result in a postmortem, and all of the components in the
deployment chain related to the failure should be evaluated, with new tasks to
fix their causes. If it happens multiple times, all project work should stop
until it's fixed.

This of course is balanced against how much failure your business can
tolerate. If the service goes down and nobody loses money, do you really need
your engineers working overtime to fix it?

~~~
klodolph
> Actually, what I learned is that being afraid to deploy on Friday means
> you're lacking in testing, verification, auto healing and rollback
> processes.

Our philosophy is that if nothing ever breaks in production, you are being too
conservative with your controls and development. Put another way: you have a
fixed pool of resources to allocate between stability and new features, and
(near) 100% testing/verification/auto healing/rollback coverage means that too
much of it is going to stability and not enough to new features. Running a
service too close to 100% uptime also causes pathologies in downstream
services, and if you never have to fix anything manually, the skills you need
to fix things manually will atrophy.

Or, for our service,

\- There should be a pager with 24-hour coverage, because our service is
critical,

\- That pager _should_ receive some pages but not too many, so operations
stays sharp but not burdened,

\- Automation and service improvements should eliminate the sources of most
pages, and new development should create entirely new problems to solve,

\- If the service uptime is too high, it should be periodically taken down
manually to simulate production failures, and development controls should be
reevaluated to see if they are too restrictive.

Eliminating all production errors takes a long time and a lot of effort. Yes,
we are spending that effort, but the only way this process will actually
“finish” is if the product is dead and no more development is being done. The
operations and development teams can then be disbanded and reallocated to more
profitable work. A healthy product lifecycle, in general (and not in every
case), should see production errors until around the time the team is
downsized to just a couple of engineers doing maintenance.

Google calls this an "error budget". We have something similar where I work.
[https://landing.google.com/sre/book/chapters/embracing-risk.html](https://landing.google.com/sre/book/chapters/embracing-risk.html)
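
For a concrete sense of scale, here is a back-of-the-envelope error-budget
calculation (the 99.9% SLO and the 30-day month are illustrative numbers, not
figures from this thread):

    # Error budget implied by an availability SLO (illustrative values)
    slo=99.9                           # monthly availability target, percent
    total_minutes=$((30 * 24 * 60))    # minutes in a 30-day month: 43200
    awk -v slo="$slo" -v total="$total_minutes" \
        'BEGIN { printf "budget: %.1f minutes of downtime/month\n", total * (100 - slo) / 100 }'
    # prints: budget: 43.2 minutes of downtime/month

While the budget is unspent, the idea is that you keep shipping; once it is
gone, effort shifts to stability work.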

You can phrase it as “afraid to deploy on Friday”, but I think “afraid to
cause outages in production” indicates that the blast radius of your errors is
too large or that you’re being too conservative.

~~~
ASalazarMX
Since that pager seems to prevent sleep (24-hour coverage, receiving some
pages but not too many), it mustn't be very popular among employees.

I prefer my midnight emergencies at a minimum.

~~~
klodolph
> Since that pager seems to prevent sleep…

The product has 24/7 pager coverage, but that does not mean that one person
has the pager the whole time! At any given time the pager is covered by two or
three people in different time zones. The way my team is structured, I will
only get paged after midnight if someone else drops the page. And I only have
a rotation for one week every couple months or so.

There are definitely employees who don’t enjoy having the pager, but we get
compensated for holding the pager with comp time or cash (our choice). The
comp time adds up to something like 3 weeks per year, and yes, there are
people who take it all as vacation. No, these people are not passed over for
promotions. No, this is not Europe.

So the trade-off is that seven weeks a year you carry your laptop with you
everywhere you go, maybe do one or two extra hours of work those weeks, and
don’t go to movies or plays, and then you get three extra weeks off. Yes, it's
popular. People like pager duty because they get to spend extra time with
their families, because they like to go camping, or because they want the
extra cash.

I have _once_ been paged after midnight.

~~~
mmt
> People like pager duty because they get to spend extra time with their
> families, because they like to go camping, or because they want the extra
> cash.

Adequately compensating on-call is, of course, the right way to do it. All
sorts of considerations that would otherwise be problems, such as how to
ensure a "fair" rotation, magically go away [1].

Unfortunately, it's vanishingly rare, at least among "silicon valley" startups
(and maybe all tech companies). I suspect it's one of those pieces of Ops
wisdom that's vanished from the startup ecosystem because Ops, in general, is
viewed as obsolete, especially by CTOs who are really Chief Software
Development Officers.

Insofar as adequate compensation is a prerequisite for all your other
suggestions, its absence makes them non-starters at such companies.

[1] Although I suppose if the compensation is too generous, there may still
end up being complaints about unfairness in allocation.

------
js2
There's another "be kind" story that's also been submitted to HN in the past:

[http://boz.com/articles/be-kind.html](http://boz.com/articles/be-kind.html)

\- Brian recalls a time he was treated with kindness, and remembers to be kind
to others.

\- Boz recalls a time he was almost fired due to not being kind, and remembers
to be kind to others.

So remember, be kind!

Previously:

\- 2475 points, 2 years ago:
[https://news.ycombinator.com/item?id=12707606](https://news.ycombinator.com/item?id=12707606)
(Brian)

\- 1198 points, 3 years ago:
[https://news.ycombinator.com/item?id=9534310](https://news.ycombinator.com/item?id=9534310)
(Boz)

------
dave_aiello
What's great about this post is that he made the important points and ended
quickly. Too many articles like this go on a lot longer.

~~~
tyrex2017
amen

------
duck
I wiped out some pretty important databases (i.e. our ERP) at my first job.
While it was extremely stressful, I don't remember being scared of being fired
(who else was going to fix it?). The one thing I learned, beyond not doing
that again, was that staying confident keeps everyone else from worrying, and
that buys you a good bit of grace with leadership teams.

I still laugh thinking about how the president, who had never shown any
emotion before and was as serious as they come, brought in a Burger King meal
for me while I was up late working on the restore.

------
chris_mc
List of things I've done in my first ~9 months at my company:

\- Messed up the Software Engineering dept.'s Jira when I tried to customize
my own dept.'s Jira too much. Took hours to fix, during which time the
Software Engineering dept. couldn't do any ticketing actions.

\- Ran `sudo shutdown -h now` on a remote battery control PC because my
terminal was still logged into it via SSH from 5 hours earlier. I was trying
to shut down my laptop at the end of the day, and I don't like using the
buttons when I have Tilda hotkeyed to 'F1'. We had to send a technician to the
site 2 days later at a cost of several hundred bucks, plus the battery was not
operational for 2 days during the most operationally important stretch we'd
had in recent months, so we lost money there. I've done several more things of
this sort that required a tech, too.

\- Forecast that a certain day would be the best day to do a specific
operational test, but then fucked up inputting the ISO-format date/time string
so it started at 7PM on a Saturday rather than 7AM (I know the format pretty
well now: `2018-08-24T23:15:00-07:00`; see the sketch right after this list).

\- Forecast that a certain day would be the best day to do a specific
operational test, but fucked up the script, so it miscalculated everything and
the forecast turned out to be worse than useless (lost the company $30k over 4
hours).
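
On the date mixup: ISO 8601 uses a 24-hour clock, so 7 AM is T07 and 7 PM is
T19. A minimal sanity check with GNU `date` (the timestamps are illustrative,
not the ones from the actual test):

    # ISO 8601 is a 24-hour clock: 7 AM is T07:00, 7 PM is T19:00
    date -d '2018-08-25T07:00:00-07:00'   # a Saturday, 7 AM Pacific
    date -d '2018-08-25T19:00:00-07:00'   # the same Saturday, 7 PM Pacific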

Luckily, my company was fine with all of this and we learned a ton from it
(other people made similar mistakes, too), so it was useful in some way. I am
also way more careful and deliberate about everything now: no more "iterative-
keyboard-banging" (my original programming style) in Python while connected
directly to the production database!

~~~
slx26
Have to say, reading this comment made me nervous.

------
stareatgoats
In my experience, workplaces that actually value mistakes are rare. That is,
workplaces that not only tolerate them, but value them as _learning
opportunities_, as I believe they should be, and as the linked article
proposes. Such workplaces do exist (I've heard), but not in my own anecdotal
experience. Mistakes have always tended to be treated as something shameful,
something to be hidden unless disclosure is absolutely unavoidable, and
something to accuse people of, unless the offender happens to be a management
favorite or something similar.

It's come to the point where I've acquired a nagging suspicion that this is
how it needs to be. That 'to be kind' will always be icing on the cake so to
speak, no more. Maybe I've grown too cynical.

------
adamc
In my experience, believing in people is far more likely to get them to "live
up" to your belief than is the alternative. This is a great story about using
a mistake as a lesson, but also about building a strong and cohesive work
culture.

------
Sonnol53
At the end of the day, knowing that your coworkers don't have bad intentions
really helps. I have a great manager who gives me room to learn and to push
myself to work harder.

~~~
softinio
unfortunately some do have bad intentions (it may even be unconsciously)due to
their own insecurities.

------
ImaCake
My job involves doing a lot of work on other people's projects. The nature of
the work often means a simple mistake will result in days or weeks of work
being lost.

This article reflects the difference in how people have treated me when I have
told them I made a mistake. Now, when I make a mistake, I tell those who
treated me kindly without hesitation. But when someone has been less than nice
about past failures, I will consider not telling them at all.

------
nunobrito
Good story. I also screwed up something at work in my early 20s. I expected to
get fired; unexpectedly, I got a reaction similar to the one in this story.

It was a wake-up call. It taught me to be far more careful when deploying and
testing software.

------
dools
I had a similar experience when I accidentally ran `chmod -R 666 /` as root on
a company server that ran the website and mail for the whole company.

The admin taught me about find and xargs, a lesson I have remembered for the
last 15 years.
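
The repair presumably looked something like the following: `find` can give
directories (which need their execute bits back) different treatment from
regular files, which a blanket recursive chmod cannot. A sketch with
illustrative paths and modes, not the admin's actual commands:

    # Restore sane permissions after a recursive chmod 666
    # (path and modes are illustrative)
    find /var/www -type d -print0 | xargs -0 chmod 755   # directories need +x to be traversable
    find /var/www -type f -print0 | xargs -0 chmod 644   # regular files generally don't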

------
hobls
The longer I work, the more convinced I become that kindness is critical.
22-year-old me would not agree, but he was wrong.

------
sakopov
Thought I'd read this before, and sure enough, here's the previous discussion
of this post [1]. The way I look at it, if you're not nice to someone who
screwed up, it's probably because you've never royally screwed anything up
yourself; rest assured, your time is coming. We are all humbled at some point
in our careers. Mine came 10 years in, and I'll never forget it.

[1]
[https://news.ycombinator.com/item?id=12707606](https://news.ycombinator.com/item?id=12707606)

------
chiru59
What kind of blogging platform is this website built on? Also, the theme is
amazing; is it available for download?

~~~
wemdyjreichert
[https://builtwith.com/detailed/briangilham.com](https://builtwith.com/detailed/briangilham.com)

------
wbobeirne
Simple stuff, but I feel I need to be reminded of this every now and then.
Thanks for the repost.

------
nosefrog
There are two kinds of lessons you can learn when you mess up:

1\. Be more careful next time so that you're less likely to make the same
mistake again.

2\. Fix the system so that nobody can make that kind of mistake again.

Learning lesson #1 means that _you're_ less likely to make the same mistake,
but learning lesson #2 will prevent you _and your team_ from making that same
mistake.

For something that's as easy to test as "is the site working", the real lesson
there is that you set up your deployment system so that the website needs to
respond to a health check before the deploy finishes.
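
A minimal sketch of that kind of gate, assuming a plain shell deploy script
(`/healthz`, the retry policy, and the deploy and rollback steps are all
made-up placeholders):

    #!/bin/sh
    # Finish the deploy only once the site actually answers
    # (all names here are placeholders)
    push_new_build                   # hypothetical: ship the new version
    for attempt in $(seq 1 30); do
        if curl -fsS https://example.com/healthz > /dev/null; then
            echo "health check passed; deploy complete"
            exit 0
        fi
        sleep 2
    done
    echo "health check never passed; rolling back" >&2
    roll_back_build                  # hypothetical: restore the previous version
    exit 1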

(I realize this is nitpicking and isn't the point of the article, just thought
I'd mention it :P)

------
mikestew
_I talked about the need for proper QA. About thoroughly testing my changes._

Though somewhat beside the point: if you have a dedicated test team, know that
I don't trust my test infrastructure to catch _all_ your screw-ups any more
than you have confidence that there won't be any. If it passes the tests, I'm
pretty confident we didn't break anything, but it can wait until Monday,
right?

On to the point: everyone has to do this once. We've all got a story (I have
an anthology). If you're confident a lesson has been learned, there's no
reason to belabor the point.

------
ggm
I have been kind and have been the recipient of kindness.

I have also been frighteningly, unforgivably unkind and I can tell you, being
kind is better. I still shudder when I think about how unkind I was.

------
gumby
There's always a risk. We kept the person who deleted our company's entire
source repo ... but they didn't learn, and it happened a second time.

Good thing IT kept excellent backups! And despite this, I didn't "learn" to
fire on the first fuckup.

------
garganzol
This is not just kindness. That's what I call professional teamwork.

------
RickJWagner
Nice read, with a good conclusion. We should all heed this one.

------
eludwig
"Hello babies. Welcome to Earth. It's hot in the summer and cold in the
winter. It's round and wet and crowded. On the outside, babies, you've got a
hundred years here. There's only one rule that I know of, babies-"God damn it,
you've got to be kind."

-Kurt Vonnegut

