
Open-Sourcing Our Incident Response Documentation - kungfudoi
https://www.pagerduty.com/blog/incident-response-documentation/
======
zwischenzug
Interesting. I ran the support function for some of the world's busiest
gambling sites' backends. A lot of this looks very familiar!

Probably the most significant thing I learned was the power of automation
through Incident Models. I spent 7 months of my own time, full time, writing
them for the previous two years' major incidents. This changed my life, as I
simply stopped getting called, and juniors only escalated to me when the docs
were faulty.

~~~
brsanthu
Any details you can share about the kinds of things you automated and the tools you used?

~~~
zwischenzug
Oh, the automation I'm speaking of there was the automation of human behaviour
through following a script/checklist. It helped everyone - the juniors felt
more confident about escalating, and they learned from it too.

(I'd read The Checklist Manifesto, which I can't recommend enough, BTW).

In more trad automation, I got a little obsessed by automating the
construction of environments, but in a human-readable way:

[http://ianmiell.github.io/shutit/](http://ianmiell.github.io/shutit/)

and Docker (wrote this book):

[https://www.amazon.com/Docker-Practice-Ian-Miell/dp/16172927...](https://www.amazon.com/Docker-Practice-Ian-Miell/dp/1617292729/)

See also:

[https://medium.com/@zwischenzugs](https://medium.com/@zwischenzugs)

I should really write up those experiences; the passage of time has given me
more perspective on them. I don't work in ops anymore :)

------
remh
That's a really well written, extensive guide. Thanks to the pagerduty folks
for sharing this.

------
UseStrict
That is an incredibly well documented guide. I develop monitoring software and
am part of an active on call roster so I've seen both sides. I'm surprised how
much information overlaps.

------
the_arun
Thanks for the documentation. Is it available on GitHub instead of as a zip file?

~~~
the_arun
Never mind. After some digging, I found them here:
[https://github.com/PagerDuty/incident-response-docs](https://github.com/PagerDuty/incident-response-docs)

