
"just roll it back" /s

Anybody want to guess root cause?

Do we have a "root cause" bingo card?

DNS

Database

What else is super likely?




What an odd comment. As a software engineer, my professional guess is the computers aren’t working the way they should.


I used to work for a very high-level director who was promoted many, many times (probably VP level+, $300k/yr easily total comp, probably 80 indirect reports overall in the org, probably 10-20 years experience) whose entire incident-handling playbook was "how quickly can we roll it back/why hasn't it been rolled back yet/have you tried rolling it back yet".


It's weird for it to be their _entire_ playbook, but most outages that I've exacerbated were because I tried to fix things in a panic instead of just rolling it back and then taking stock.

I often have to work hard to convince people of all experience levels that it's the best way forward.

- "It's just a little bug I can just fix it [and definitely won't make it worse with code that I haven't tested as rigorously right?]"

- "My KPI/bonus/project plan relies on this going out today"

- "My code is fine it's the infrastructure [that I didn't warn] that can't handle it. They need to fix their side now."

I don't know about your VP, but "how fast can we get back to before it was broken?" is reasonably the first thing you should be asking.


Incident response should always be: (1) get people enacting the final disaster recovery plan and rollback whilst we (2) see if we can recover from where we stand.

Doing #1 puts some serious boundaries on how bad it can get.


I find it's usually the same people or teams responsible for both, which makes it hard to do them in parallel.


This probably depends a lot on the type of business you have. I work for a CDN; our outages are usually caused by one of our network peers/providers borking things. There is nothing to roll back.


For sure, and you're not going to be able to roll back a failed power supply. I'm just saying it's a totally reasonable first and maybe even second question.


"It works on my career"


“And that’s how docker was born”


If you're a director with 10-20 years experience and you're making $300k/year, you're REALLY doing it wrong.


"Let me run the reverse reverse migration script again"


Woe be upon thine fools that change code and database schema simultaneously.


You missed the other two common ones: permissions change and a disk filled up somewhere.

Before finding out the dead simple failure mode and fix, engineers need to spend countless hours diving into the most technically complex scenarios that might be happening but are irrelevant. Then they can reset permissions or add disk space or add back a DNS entry.


Reminds me of the old sysadmin who always made a file 10% the size of the disk named .root-emergency or similar. Disk filled up? Delete the file, get some breathing time, fix the problem, recreate the file.
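
A rough sketch of that trick in Python, for anyone who wants to try it. The function names and the 10% default are my own assumptions, not what that sysadmin actually ran. It writes real zeros rather than truncating to a size, since a sparse file would not reserve any blocks.

```python
import os
import shutil


def ballast_size(path: str, fraction: float = 0.10) -> int:
    """Bytes equal to `fraction` of the total size of the filesystem holding `path`."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(shutil.disk_usage(path).total * fraction)


def create_ballast(file_path: str, size: int, chunk: int = 1024 * 1024) -> None:
    """Write `size` bytes of zeros so the blocks are actually allocated."""
    zeros = bytes(chunk)
    with open(file_path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        # Push the data to disk so the space is really claimed.
        f.flush()
        os.fsync(f.fileno())


def release_ballast(file_path: str) -> None:
    """Delete the ballast file to get breathing room; no-op if it is already gone."""
    try:
        os.remove(file_path)
    except FileNotFoundError:
        pass
```

Something like `create_ballast("/.root-emergency", ballast_size("/"))` would set it up (the path is hypothetical). On filesystems with compression enabled, a file of zeros may take up little or no real space, so this probably needs incompressible data there.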


Isn't that what ext filesystems on Linux do? IIRC the reserved portion is 5% which can be dropped if you need some headroom.


Yeah, the root-reserved blocks are tunable.

Won't save you if someone's running-as-root reporting job goes rogue and fills up the disk, though, while the file might... I mean, obviously one ought not have done that in the first place, but the real world is a whole thing.
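
If you want to see how much is actually being held back, here is a minimal sketch using `os.statvfs` (Unix only). The helper names are mine. The idea is that `f_bfree` counts every free block while `f_bavail` counts only the ones unprivileged users can take, so the gap is roughly the root-reserved part.

```python
import os


def reserved_bytes(path: str = "/") -> int:
    """Approximate free space only privileged processes can use on the filesystem holding `path`."""
    st = os.statvfs(path)
    # f_bfree: all free blocks; f_bavail: free blocks available to unprivileged users.
    # Clamp at zero in case a filesystem reports the two inconsistently.
    return max(0, st.f_bfree - st.f_bavail) * st.f_frsize


def free_for_users(path: str = "/") -> int:
    """Free space an ordinary (non-root) process can still write into."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize
```

On ext filesystems the reserved percentage itself is changed with `tune2fs -m`. As noted above, none of this helps when the process filling the disk is running as root, because root is allowed to eat the reserve too.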


Yep. Often a system crash is caused by logging, which often logs ... as root.


Try SCE to AUX.


Investor left their bottle of tequila sitting on the delete key?


that sounds Twitter-Musk-esque, amirite?


It's from the series Silicon Valley


Seems to be limited to loading comments in-app. Posts are loading fine and Reddit.com is showing comment threads.

Busted API deployment?


You should add "BGP config problems" and "cryptolocker".


A Tesla fire in Newark, CA took out an internet backbone?


Or a North American Fiber-Seeking Backhoe.


Just for fun I've bet on hardware failure.


Cat walking on keyboard?


It’s always DNS.


Pre-IPO jitters?



