I used to work for a very high-level director who was promoted many, many times (probably VP level+, easily $300k/yr total comp, probably 80 indirect reports overall in the org, probably 10-20 years experience) whose entire incident-handling playbook was "how quickly can we roll it back / why hasn't it been rolled back yet / have you tried rolling it back yet?"
It's weird for it to be their _entire_ playbook, but most of the outages I've exacerbated got that way because I tried to fix things in a panic instead of just rolling back and then taking stock.
I often have to work hard to convince people of all experience levels that rolling back first is the best way forward:
- "It's just a little bug I can just fix it [and definitely won't make it worse with code that I haven't tested as rigorously right?]"
- "My KPI/bonus/project plan relies on this going out today"
- "My code is fine it's the infrastructure [that I didn't warn] that can't handle it. They need to fix their side now."
I don't know about your VP, but "how fast can we get back to before it was broken?" is reasonably the first thing you should be asking.
Incident response should always be: (1) get people enacting the last-resort disaster recovery plan and rollback, while (2) we see if we can recover from where we stand.
Doing #1 puts a serious bound on how bad it can get.
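To make the "do both at once" shape concrete, here's a minimal sketch in Python. The `start_rollback` and `attempt_forward_fix` functions are hypothetical placeholders for whatever your deploy tooling actually does; the point is just that the rollback track starts immediately and isn't gated on the investigation.

```python
# Minimal sketch of the two-track response above. start_rollback() and
# attempt_forward_fix() are hypothetical stand-ins for real deploy tooling.
import concurrent.futures
import time

ROLLBACK_DEADLINE_S = 10 * 60  # assumed cap on how long we let this run


def start_rollback() -> str:
    """Track 1: revert to the last known-good release (placeholder)."""
    time.sleep(1)  # stand-in for the real rollback
    return "rolled back to last known-good"


def attempt_forward_fix() -> str:
    """Track 2: investigate / try to recover in place (placeholder)."""
    time.sleep(2)  # stand-in for diagnosis and a possible forward fix
    return "forward fix applied"


def respond_to_incident() -> str:
    # Start both tracks immediately; the investigation never blocks the rollback.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        tracks = {
            pool.submit(start_rollback): "rollback",
            pool.submit(attempt_forward_fix): "forward fix",
        }
        # Whichever track finishes first ends the incident; the rollback
        # track is the hard upper bound on time-to-recovery.
        done, _ = concurrent.futures.wait(
            tracks,
            timeout=ROLLBACK_DEADLINE_S,
            return_when=concurrent.futures.FIRST_COMPLETED,
        )
        first = next(iter(done))
        return f"recovered via {tracks[first]}: {first.result()}"


if __name__ == "__main__":
    print(respond_to_incident())
```

In practice the "tracks" are people and runbooks rather than threads, but the structure is the same: the rollback is already in motion by the time anyone finishes diagnosing.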
This really depends on the type of business you have, though. I work for a CDN; our outages are usually caused by one of our network peers/providers borking things, so there is nothing to roll back.
For sure, and you're not going to be able to roll back a failed power supply. I'm just saying it's a totally reasonable first, and maybe even second, question.