I used to work for a very high level director who was promoted many, many times ...

ketralnis · on Feb 21, 2023

It's weird for it to be their _entire_ playbook, but most of the outages I've exacerbated got worse because I panicked and tried to fix things instead of just rolling back and then taking stock.

I often have to work hard to convince people of all experience levels that rolling back is the best way forward.

- "It's just a little bug, I can just fix it [and definitely won't make it worse with code that I haven't tested as rigorously, right?]"

- "My KPI/bonus/project plan relies on this going out today"

- "My code is fine, it's the infrastructure [that I didn't warn] that can't handle it. They need to fix their side now."

I don't know about your VP, but "how fast can we get back to before it was broken?" is reasonably the first thing you should be asking.

bombcar · on Feb 21, 2023

Incident response should always be: (1) get people enacting the final disaster recovery plan and rollback whilst we (2) see if we can recover from where we stand.

Doing #1 puts some serious bounds on how bad it can get.
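To put rough numbers on that bound, here is a minimal sketch. It assumes both tracks start at the same moment and that service is restored as soon as either one finishes. The function name and the minute values are made up for illustration.

```python
from typing import Optional


def outage_minutes(rollback_eta: float, fix_eta: Optional[float]) -> float:
    """Minutes until service is restored when both tracks start together.

    rollback_eta: how long the rollback / disaster recovery track takes.
    fix_eta: how long the in-place fix takes, or None if it never lands.
    """
    if fix_eta is None:
        # The fix attempt failed outright, so the rollback is what saves us.
        return rollback_eta
    # Whichever track finishes first ends the outage.
    return min(rollback_eta, fix_eta)


# Hypothetical scenarios, all with a 45-minute rollback already underway.
quick_fix = outage_minutes(45, 10)     # the fix lands first
slow_fix = outage_minutes(45, 180)     # the fix drags on, rollback wins
failed_fix = outage_minutes(45, None)  # the fix never works
```

However the fix attempt goes, the outage never runs longer than the rollback ETA. Without track #1 running, the worst case is open-ended.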

spydum · on Feb 21, 2023

I find it's usually the same people or teams responsible for both, so it's hard to do them in parallel.

cortesoft · on Feb 21, 2023

This probably really depends on the type of business you have. I work for a CDN, and our outages are usually caused by one of our network peers/providers borking things. There is nothing to roll back.

ketralnis · on Feb 21, 2023

For sure, and you're not going to be able to roll back a failed power supply. I'm just saying it's a totally reasonable first and maybe even second question.

lordnacho · on Feb 21, 2023

"It works on my career"

cududa · on Feb 21, 2023

“And that’s how docker was born”

packetslave · on Feb 21, 2023

If you're a director with 10-20 years experience and you're making $300k/year, you're REALLY doing it wrong.

NobleLie · on Feb 21, 2023

"Let me run the reverse reverse migration script again"

Spivak · on Feb 21, 2023

Woe be upon thine fools that change code and database schema simultaneously.
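One common way to avoid that is to ship the schema change and the code change as separate steps, keeping the schema change purely additive so the old code keeps working. A rough sketch using Python's built-in `sqlite3`, where the table, column, and function names are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")


def old_code_read(db):
    # The currently deployed code: only knows about `name`.
    return [row[0] for row in db.execute("SELECT name FROM users ORDER BY id")]


def new_code_read(db):
    # The next release: prefers `display_name`, falls back to `name`.
    query = "SELECT COALESCE(display_name, name) FROM users ORDER BY id"
    return [row[0] for row in db.execute(query)]


# Deploy 1: schema only. Adding a nullable column does not break old code.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
after_schema_change = old_code_read(conn)

# Deploy 2: code only. The new code works whether or not the column is filled.
before_backfill = new_code_read(conn)
conn.execute("UPDATE users SET display_name = 'Ada L.' WHERE name = 'ada'")
after_backfill = new_code_read(conn)

# Rolling back deploy 2 means redeploying the old code. No reverse migration.
after_code_rollback = old_code_read(conn)
```

Because each deploy changes only one thing, rolling back the code never requires undoing the schema, and the old column can be dropped later once nothing reads it.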