
Ask HN: What was your worst technical mistake? - kevinmannix
As an intern, I had a manager tell me that each developer will make at least one giant mistake during their career. The important thing is to have a plan of action, be transparent, and ensure that a similar mistake won't be made again.

Every now and then a story gains some traction that deals with an employee mistake, whether a small typo or a system design issue that wreaks havoc. Most recently, the story of a fresh-faced developer who dropped production while setting up his dev environment [0].

I'm sure there are many more stories that could inform others about what not to do or what warning signs to look for. What was your biggest mistake, how did you remedy the situation, and what were the repercussions?

[0]: http://uk.businessinsider.com/worst-first-day-ever-reddit-2017-6
======
existencebox
I share this story from time to time whenever this question comes up. I'm
probably a broken record at this point but I've always thought it important to
set expectations clearly for new devs by being open about my own failures; and
after the recent reddit post it seems about time to braindump once again.

I deleted /etc on a live, user facing, production cluster once.

Wrote a script to determine OS, settings, a bunch of other bits, and then
configure the node appropriately. I sanity checked it for BSD, ubuntu, debian,
RHEL, all the machines I thought it would run on.

Turns out there was a Solaris cluster.

Long story short: the software I was configuring installed differently on
Solaris, and my script did not properly audit/validate. Upon not finding the
right subdirectories during a traversal, it declared itself done while still
sitting in /etc and nuked the entire dir.

The joking lesson I take from this is a quote from my sysadmin mentor:
"Don't miss."

Less glibly, and more actionably,

- Enumerate your edge cases and failure modes rigorously, both from a "what do
I expect" and a "what if" perspective. (Kinda under this bucket: UNDERSTAND
YOUR GODDAMN SPEC, AGGRESSIVELY; this is true both in ops and dev.)

- Write your code with the EXPECTATION that bits will fail, and have it self
audit.

- rm * is a big hammer. For all the press dd gets, rm * (and rm -rf) should be
used with care and proper precaution, ESPECIALLY if automated. Keep extra
"mental flags" up and take extra care whenever you see rm *'s and such in
your code.

- PHASED ROLLOUTS.
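
To make the "self audit" and "big hammer" points concrete, here is a minimal sketch in Python of a delete helper that checks where it is before removing anything. This is not the original script. The names `guarded_rmtree`, `audit_delete_target`, the `PROTECTED` list, and the `.managed-by-installer` marker file are all made up for illustration, and a real deny-list would need tailoring to the hosts involved.

```python
import shutil
from pathlib import Path

# Hypothetical deny-list for illustration; adjust for the systems you manage.
PROTECTED = {Path("/"), Path("/etc"), Path("/usr"), Path("/var"), Path("/home")}


class UnsafeDeleteError(Exception):
    """Raised when a delete target fails the audit."""


def audit_delete_target(target: Path, allowed_root: Path, marker: str) -> Path:
    """Return the resolved target if it passes every check, else raise."""
    resolved = target.resolve()
    root = allowed_root.resolve()
    if resolved in PROTECTED:
        raise UnsafeDeleteError(f"{resolved} is a protected directory")
    # The target must sit strictly below the root we expect to work in.
    if resolved == root or root not in resolved.parents:
        raise UnsafeDeleteError(f"{resolved} is not strictly inside {root}")
    # Positive evidence that we are where we think we are: a marker file
    # (an assumed convention) that the install step would have written.
    if not (resolved / marker).is_file():
        raise UnsafeDeleteError(f"{resolved} lacks marker file {marker!r}")
    return resolved


def guarded_rmtree(target, allowed_root, marker=".managed-by-installer",
                   dry_run=True):
    """Audit the target, then report (dry run) or actually remove it."""
    resolved = audit_delete_target(Path(target), Path(allowed_root), marker)
    if dry_run:
        return f"would remove {resolved}"
    shutil.rmtree(resolved)
    return f"removed {resolved}"
```

The default is a dry run, so an automated caller has to opt in to destruction explicitly. A failed traversal that leaves the script sitting in /etc would trip the checks and raise instead of being treated as "done".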

I'm sure there are more learnings, but those are what come to mind at a
thought.

To answer the latter half of your question, the repercussion (and remedy) was
my boss telling me: "whelp, you get to send out an outage email, and learn
how to rebuild a cluster" (not before calling the other sysadmins into the
room, having a brief moment of "let's point and laugh" and then sharing their
own explosions, some of which made mine pale in comparison :) )

------
itamarst
An employee who dropped production on their first day is not at fault; it's
the company's fault. I have a similar but not quite as bad story: deploying
code that _almost_ brought down our company's main customer. My fault, but the
organization was at fault too (though to be fair, we had ops people who shut
it off when it caused problems).

So two thoughts:

1. How bad the outcome is doesn't necessarily reflect on how big a mistake
something is. Software is so complex that even small hard-to-avoid mistakes
can cause big problems... and sometimes big mistakes only cause trivial
problems. So while big mistakes make good stories (and I'm sure people will
post some), every mistake is worth learning from.

2. Most problems are, in the end, not an individual's fault. It's a whole
system that failed. So don't just look for what you can do better, though
that's important. Figure out where the system broke, and how to make the
system better.

If you want to more deliberately learn from mistakes, Gary Klein's book "The
Power of Intuition" is really useful.

(I am BTW writing a weekly email with mistakes I've made both programming and
in my career - the story I mentioned above is the first email you'd get, and I
just sent out the 41st, with plenty more mistakes to come. 20+ years of coding
and still more mistakes to make!
https://softwareclown.com if you're interested.)

------
jonrgrover
I used inheritance rather than composition when writing a wrapper around
DataTable (before extension methods existed) in C#. I fixed it later for
future companies, but the original ran about ten times slower than it should
have, and it killed the product. A little while later I offered to come back
to the company to fix the mistake. It would only have taken 2 to 3 hours, but
by then the product was dead.

