Hacker News new | past | comments | ask | show | jobs | submit login
How I destroyed the company's DB (zaidesanton.substack.com)
14 points by thunderbong 5 months ago | hide | past | favorite | 17 comments



> The program I used (DBeaver) saw the empty 3rd line, and ignored the 4th line.

A great example of software causing grief by trying to be too clever.


Yeah, I'm firmly against software that tries to helpfully fix mistakes and figure out what you meant. Anything that enforces semi-colons wouldn't have that issue.

Every now and then some story appears on HN complaining that tools can give you an error message and how to fix it instead of just fixing it for you. But when it just fixes it for you, you get problems like this.

Obviously many things went wrong here and the article covers them. But it seems to just gloss over the fact that their SQL client could have prevented it as well...


Dbeaver has a lot of UI problems; although I haven't decided if that is DBeaver's fault or that providing a DB client is a crushingly hard problem.

Although I would have expected a "WARNING - update with no WHERE" nag dialog; I seem to recall it does that from time to time.


Jetbrains IDEs have a cool feature that lets you assign different colors for the console tabs for each db. In my setup, the tab and toolbar for the production db are bright pink (could even change the background to pink but that’s a little too harsh)


PyCharm + DB plugin also has a feature to draw a green rectangle around the query about to run.


I saw a DBA drop a production database once. In a minor miracle they were at the tail end of moving storage onto a san, so there was a several hour down time and a small data loss but not a large impact.

I have had some whoppers of screw ups... lost 250k of actual money (paid out).

Its never about the mistake. Its about what you do in response to it. OWN IT, step up, be on point for the fix; it might not be easy but your making a statement about what kind of person you are.

The Mark Twain quote "Humor is tragedy plus time" is truth. The sooner after things are back to normal, and new processes are in place embrace the jokes... Its a signal to everyone that things are OK!


I've done something similar before, though it was a client's database. They needed an empty copy of their database for a second instance of the software they were running, so I had the tool I was using create a massive DDL-script, then created the new database, selected it and ran the DDL-script in there. The script ran without errors, but all the tables in the original database were now empty. The script that was created by the tool started off with more than a screen full of comments, so I overlooked the "USING [original database]" statement that was the actual start of the script. Because their backup strategy didn't work well, it took until late that night before the person responsible for backups managed to pull the most recent backup (from end of the day before) from tape which I could then restore.


I do wonder if people over-react to database deletions. I see them from time to time and they're generally not big deals in practice. If the recovery systems get a little bit smarter it is conceivable that running SQL against prod is an acceptable thing.

The real problem is silent or partial corruption. Deleting everything gets detected almost immediately; often minutes after the query is run. Silent data corruption can spread for months and be unrecoverable without thoughtful design. But silent data corruption can come from reviewed code too.


Oof, that can be scarily easy to do.

DBeaver by default has a warning you have to accept pop up if you try to run an UPDATE without a where clause. Good reason not to disable that warning.



There are two types of DBs: ones that have been destroyed and repaired, and others that don’t know it yet.

Thankfully it was easily recoverable and the author came out on the other side with some valuable lessons and a bit more wisdom. An easy mistake to make if the process isn’t there to prevent it, at which point it becomes a systemic problem and not a personnel problem.


Minimal process: Edit SQL in editor, have peer review it, debug in your head, copy&paste with 4-eyes.

Compliant process: Write Ticket, approve/review ticket, (give DB access to developer/ops) or CI/CD run SQL with migration, (revoke access), close ticket.

Why? Because I've deleted portions of a live db when I thought I was on testing.


This is why I try not to have permissions to prod.

Sure, in some sense as the AWS account owner and tech lead, I’m generally part of the trust chain and could grant them to myself. But there’s only misery available by having them bestowed normally.

I want it to be effort to touch prod, preferably in a way that involves a second person — because it keeps me from being careless. Which I (as a human) often am.

Can’t break what you can’t touch!


I made this same mistake when I first started out as a developer (forgot the where in a sql statement on production). Both times, we had backups and could easily restore all of the corrupt records.

I haven't made that mistake again (it's been 15+ years).


You should always run a SELECT query first to see that you are operating on the right data. Do a manual sanity check.

Then simply replace SELECT with DELETE and you're done.


PSA, if you’re doing update query, replace update with select *, make a cup of coffee and then look at the results. Both on rows returned as the execution time.


The difference between disaster and near miss is whether you started with BEGIN.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: