I did something similar ~18 months ago: I brought up the wrong window and deleted a table at the bottom of the relationship hierarchy (via the SQL Server Management Studio GUI no less, so I had to click through). Half a second later I realized I was on the wrong server and tried to cancel the operation. My stomach churned. It took me a moment to snap out of it (the table had something like 200 million rows of ephemeral data scraped from the web), but I came to the conclusion that the data was either completely gone forever, or I could talk to co-workers and see what could be done.
Someone suggested we restore the database from its last backup (which we take daily, with a few weeks' retention), then do a "SELECT ... INTO ... FROM" to re-fill the table (sketched below). We lost a couple of hours of data and were soon back online.
I am sure there are more nuggets to take away from this story, but what I took away was: always maintain a robust set of backups, and let people know about the problem so you can at least try to work out a solution.
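For anyone who hasn't had to do this on SQL Server, the fix looked roughly like the sketch below. The database, table and file names here are made up, and note that SELECT ... INTO recreates the target table (so it works even after a DROP), but indexes and constraints have to be rebuilt separately.

  -- Restore the last full backup side by side under a new name (hypothetical paths/names).
  RESTORE DATABASE Warehouse_Restored
  FROM DISK = N'D:\backups\Warehouse_full.bak'
  WITH MOVE N'Warehouse'     TO N'D:\data\Warehouse_Restored.mdf',
       MOVE N'Warehouse_log' TO N'D:\data\Warehouse_Restored_log.ldf';

  -- Re-create and re-fill the dropped table from the restored copy.
  SELECT *
  INTO   Warehouse.dbo.ScrapedPages
  FROM   Warehouse_Restored.dbo.ScrapedPages;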
Yup, clicked "drop table" in phpMyAdmin ~15 years ago, after a party night, because the button was right next to "optimize".
BTW: the table belonged to an insurance company, and nearly all of the accidents sent in that day were deleted, around 500 "customers"... sweaty palms is an understatement.
A while ago a small company I was working for was acquired by a huge national company. Through "synergy" I ended up working in BI, doing ETL from the systems we used into their DB2 warehouses. When there were errors, we'd go through the error log by hand and manually insert the erroneous rows. During this transition I was given a set of credentials that, apparently, were also used in the actual daily production job, without me being aware. Well, I managed to lock that account out with 3 bad password attempts. It was a huge ordeal; everyone up to my director and VP was reprimanded. No one publicly blamed me, because it was obviously stupid to run production jobs with credentials you hand to some guy watching ETL that day, but I was still quite embarrassed.
Absolutely. Having done this both blind and with a well-documented process, I can say the latter is far smoother.
The restore process should be written in an incident response run book and practiced by the on-call people. (Maybe in your quarterly disaster recovery tests?)
The terror of data loss fades significantly when you come across a clean, one-page restore process and you’re able to get the data back without a lot of effort.
- read-only connections / users in general
- append _prod / _stage to the database name to avoid ambiguity (and ensure psql is configured to show the database name at the prompt)
- configure editors to color-code sessions where possible (red for prod, amber for stage, and green for dev/local)
- if a manual data modification is necessary, use a transaction and have verification commands prewritten to confirm the intended outcome (see the sketch after this list)
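On the last point, what has worked for me is writing the whole change as a script before touching prod, with the verification queries sitting right next to the modification and COMMIT left as a deliberate final step. A rough sketch (the table and predicate are made up):

  BEGIN;

  -- Prewritten check: how many rows should be affected? (expected: 42)
  SELECT count(*) FROM orders
  WHERE  status = 'stuck' AND updated_at < now() - interval '1 day';

  UPDATE orders
  SET    status = 'cancelled'
  WHERE  status = 'stuck' AND updated_at < now() - interval '1 day';
  -- psql prints the affected row count; if it doesn't match the check above, ROLLBACK.

  -- Prewritten verification of the intended outcome (expected: 0)
  SELECT count(*) FROM orders
  WHERE  status = 'stuck' AND updated_at < now() - interval '1 day';

  COMMIT;  -- or ROLLBACK; if anything looks off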
In terms of underlying infrastructure, I also very much appreciate deletion protection in Terraform providers for any data stores, since it forces a two-phase destruction (one plan/apply to disable the protection and a second to actually destroy).
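On AWS, for example, the RDS resources expose this directly. A minimal sketch, assuming the hashicorp/aws provider (resource names are made up; credentials and networking omitted):

  resource "aws_db_instance" "orders" {
    identifier        = "orders-prod"
    engine            = "postgres"
    instance_class    = "db.t3.medium"
    allocated_storage = 100

    # Phase one: the delete call is rejected while this is true.
    # Flip it to false and apply, and only then can a second apply destroy the instance.
    deletion_protection = true

    lifecycle {
      # Belt and braces at the Terraform level: any plan that would destroy this
      # resource is rejected until this line is removed from the config.
      prevent_destroy = true
    }
  }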