
Building an online community around learning from incidents (2019) - vinnyglennon
https://www.learningfromincidents.io/blog/learning-from-incidents-in-software
======
closeparen
I've noticed an interesting phenomenon in postmortem review meetings: whenever
someone suggests an additional manual step or bureaucratic approval that might
have helped, this suggestion is _always_ accepted. When it's patently
ridiculous, the action item will just languish in a queue until it's purged in
the next ticketing system migration. But in the moment, the whole room's
sentiment is "yes of course, obviously we should have been doing that, how
irresponsible of us." No one wants to talk costs, proportionality, error
budget, etc. I've stopped bringing it up for fear of seeming reckless. But
it's not like we're the space program.

Anyone seen this in their postmortem culture? How did you deal with it?

~~~
asymptotic
There is a quote from Jeff Bezos: "Good intentions never work, you need good
mechanisms to make anything happen." Hence at Amazon (ideally), any suggestion
that a manual step or bureaucratic approval could improve a process is
naturally followed up with "But what happens when good intentions fail? We
need a plan for a mechanism." At least that's the theory, and how Amazon and
AWS avoid accruing manual steps with no plan to automate them.

------
sokoloff
Read John Allspaw’s doc (the first reference in the article).

It’s absolutely critical to create a culture where, when Kirin makes an
error, they have no fear and every incentive to explain what they did, when
they did it, why they thought it was the right course of action, what
happened when they did it, etc.

