
What Makes a Good Runbook? - mooreds
https://www.transposit.com/blog/2019.11.14-what-makes-a-good-runbook/
======
cle
The best runbook is the one that doesn’t exist. Automating operational work
vastly decreases risk, and gives you mechanisms to address many of the points
in this article (you can write audit logs, tests to ensure your automation
works when you need it, etc.).
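
As an illustration of the audit-log and test points above, here is a minimal Python sketch of an automated runbook step. The names `run_step` and `audit_log`, and the shape of the record, are hypothetical and chosen only for this example. A real system would likely write to durable storage rather than an in-memory list.

```python
import json
import time


def run_step(name, action, audit_log, actor="automation"):
    """Run one runbook step and append a JSON audit record of the outcome.

    `action` is any zero-argument callable; `audit_log` is any object with
    an `append` method (a plain list here, purely as an assumption).
    """
    record = {"step": name, "actor": actor, "started_at": time.time()}
    try:
        record["result"] = action()
        record["status"] = "ok"
    except Exception as exc:  # record the failure instead of hiding it
        record["status"] = "failed"
        record["error"] = str(exc)
    # default=str keeps the log writable even if the result is not JSON-native
    audit_log.append(json.dumps(record, default=str))
    return record["status"] == "ok"


def test_run_step_records_success_and_failure():
    """A test like this can run at deploy time to check the automation itself."""
    log = []

    def failing_action():
        raise RuntimeError("service unreachable")

    assert run_step("noop", lambda: "done", log) is True
    assert run_step("restart", failing_action, log) is False
    assert [json.loads(line)["status"] for line in log] == ["ok", "failed"]
```

Running the test as part of each deploy is one way to catch broken automation before it is needed, which is the property a written runbook cannot offer on its own.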

If you need them, make them as short as possible. It’s an architectural smell
if your runbooks are long and complicated—part of good design is making sure
you can easily operate your system.

~~~
alttab
What if the automation breaks?

Zero ops is a good goal, as it builds the right incentives, but even the most
mature systems need a good runbook.

~~~
cle
That’s effectively the same as having an incorrect or out-of-date runbook,
except that it is much less likely to happen with automation if you write
tests that verify the automation works every time you deploy.

There are always some break-the-glass mechanisms that should be documented
(such as how to log on to a host). I’m just saying that in mission-critical
systems, you can dramatically reduce risk by minimizing the amount of manual
work you have to do, because less manual work means fewer chances for mistakes.

~~~
theshadowknows
I agree. Especially when you’re using vendor platforms for things. Salesforce
is a great example. Sometimes things just stop working. And it’s a good idea
to know which levers to pull and which buttons to press (and in what order) to
get things moving again.

