That Time I Accidentally Terminated 600 Instances (vertis.io)
2 points by vertis 7 months ago | 3 comments

My favourite story along those lines is the time I knocked out a money transfer service for an entire country for a couple of hours.

I received a ticket to shut down some servers and thought, "Finally they decom'ed those blades. I should update them first so I can use them for other things", so I sent the blades to update through OA. 15 minutes later I got a call from a borderline angry voice asking why the servers were down. I responded with "Probably because there is some fuckup", in businessese. After another 15 minutes it became apparent that the ticket was actually to shut down some no-longer-needed services with very similar names, by shutting down the VMs which hosted them.

The best part? The ticket for the shutdown shouldn't even have been routed to me in the first place, because VMs were handled by another team.

The worst? You definitely don't want to interrupt the firmware update process.

About ten years ago, we had a guy on his very first day accidentally destroy the S3 bucket hosting all our assets. There were only like 8 of us at that point, all in one small room, and we heard “Oh no!”

We had backups, but they took a couple of hours to restore, so we had a decent chunk of downtime.

Ouch. That would suck.

