
I used to work for a very high-level director who had been promoted many, many times (probably VP level+, easily $300k/yr total comp, probably 80 indirect reports across the org, probably 10-20 years of experience) whose entire incident-handling philosophy was "how quickly can we roll it back / why hasn't it been rolled back yet / have you tried rolling it back yet?"



It's weird for it to be their _entire_ playbook, but most outages that I've exacerbated got worse because I panickedly tried to fix things instead of just rolling back and then taking stock.

I often have to work hard to convince people of all experience levels that it's the best way forward.

- "It's just a little bug, I can just fix it [and definitely won't make it worse with code that I haven't tested as rigorously, right?]"

- "My KPI/bonus/project plan relies on this going out today"

- "My code is fine it's the infrastructure [that I didn't warn] that can't handle it. They need to fix their side now."

I don't know about your VP, but "how fast can we get back to before it was broken?" is reasonably the first thing you should be asking.


Incident response should always be: (1) get people enacting the final disaster recovery plan and rollback whilst we (2) see if we can recover from where we stand.

Doing #1 puts some serious boundaries on how bad it can get
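A minimal sketch of that two-track response, assuming each track can be wrapped as a function that returns True once service is healthy again. `respond_to_incident`, `rollback_deploy`, and `hotfix` are hypothetical names, not any real tool's API. Both tracks run concurrently, whichever restores service first is reported, and a track that raises counts as failed, so the rollback still bounds the damage when the quick fix goes wrong.

```python
import concurrent.futures as cf
import time
from typing import Callable


def respond_to_incident(rollback: Callable[[], bool], fix_forward: Callable[[], bool]) -> str:
    """Run the rollback and the fix-forward attempt in parallel.

    Returns the name of the first track that reports success, or "neither".
    """
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        tracks = {pool.submit(rollback): "rollback", pool.submit(fix_forward): "fix_forward"}
        for fut in cf.as_completed(tracks):
            try:
                ok = fut.result()
            except Exception:
                ok = False  # a crashing track counts as a failed track
            if ok:
                # Leaving the with-block still waits for the other track to finish.
                return tracks[fut]
    return "neither"


if __name__ == "__main__":
    def rollback_deploy() -> bool:  # hypothetical: redeploying the previous build takes a moment
        time.sleep(0.1)
        return True

    def hotfix() -> bool:  # hypothetical: the untested "little bug" fix doesn't work
        return False

    print(respond_to_incident(rollback_deploy, hotfix))  # prints: rollback
```

Because `ThreadPoolExecutor` shuts down with `wait=True` when the `with` block exits, the function only returns after both tracks have stopped, which suits a rollback you would not want abandoned halfway.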


I find it's usually the same people or teams responsible for both, so it's hard to do them in parallel.


This probably really depends on the type of business you have. I work for a CDN, our outages are usually caused by one of our network peers/providers borking things. There is nothing to rollback.


For sure, and you're not going to be able to roll back a failed power supply. I'm just saying it's a totally reasonable first and maybe even second question


"It works on my career"


“And that’s how docker was born”


If you're a director with 10-20 years experience and you're making $300k/year, you're REALLY doing it wrong.


"Let me run the reverse reverse migration script again"


Woe be upon thine fools that change code and database schema simultaneously.
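One common way to avoid this is the expand/contract (a.k.a. parallel change) pattern: ship the schema change and the code change as separate steps, each of which can be rolled back on its own. Below is a minimal sketch using Python's built-in `sqlite3`; the `users` table and the `fullname` to `display_name` rename are hypothetical. The expand step only adds a column, so old code keeps working, and the new code writes both columns, so rolling the code back loses nothing.

```python
import sqlite3
from typing import Optional


def expand(conn: sqlite3.Connection) -> None:
    """Schema-only step: add the new column and backfill it.

    Additive, so code that only knows about `fullname` keeps working.
    """
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")


def save_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Code-only step: write BOTH columns, so rolling this code back loses nothing."""
    conn.execute(
        "INSERT OR REPLACE INTO users (id, fullname, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def load_name(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Prefer the new column, falling back to the old one for rows not yet backfilled."""
    row = conn.execute(
        "SELECT COALESCE(display_name, fullname) FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


# Contract step (NOT run here): drop `fullname` in a later, separate deploy,
# only once the new code is stable and a rollback to the old code is no longer expected.

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
    conn.execute("INSERT INTO users (id, fullname) VALUES (1, 'Ada')")  # row written by old code
    expand(conn)
    save_user(conn, 2, "Grace")  # row written by new code
    print(load_name(conn, 1), load_name(conn, 2))  # prints: Ada Grace
    # The old code's query still sees every row, so a code rollback is safe.
    print(conn.execute("SELECT fullname FROM users ORDER BY id").fetchall())  # prints: [('Ada',), ('Grace',)]
```

The point of the ordering is that at no moment does a single rollback require undoing both the code and the schema together.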



