
Ask HN: What was your worst “oops” moment? - kenrose
Given the recent GitLab data incident, I think many of us likely have scar tissue from a time we did something regrettable in production.

- Accidentally nuked the production database?
- Deployed a not easily reversible change?
- Decommissioned irrecoverable infrastructure?

In the interest of empathy and #HugOps, what are your horror stories? Share them here.
======
bsvalley
I isolated myself for 5 months (unpaid) to build an app based on a simple idea
I had. I went OCD on it until it was done and live. Passion took over and I
really wanted to see it live. I realized a few weeks later that no one actually
needs it. I was so convinced by the quality of the app, that people would want
to use it... I bypassed the most important thing - does it really bring any
value? Not really. I was totally blind coding for 5 months... Do your market
validation first and do not necessarily fall in love with your own product.
This is not something I could have built in a month, I knew it was a big
investment up front. Oops!

~~~
romanhn
I did something very similar - left my job a bunch of years back to create a
social network for angel investors. I wrote up some lessons from my failure
here:
[https://github.com/rshekhtm/VentureTap](https://github.com/rshekhtm/VentureTap),
with this paragraph most relevant to this discussion:

"The biggest non-technical takeaway for me was that development in a vacuum
can sink a project that may have otherwise stood a chance. Confusing secrecy
for competitive advantage, I was very guarded about sharing my thoughts with
other people and in fact did not solicit feedback until after the first cut of
the code was released. Having read books on angel investing, I thought I
understood the space - if only I had been aware of the concepts of lean
software development! If nothing else, I gained appreciation for transparency
and communication, albeit too late to make a difference for VentureTap."

Oops :)

------
mattbgates
Worked on a project for almost a year with PHP 5.6 (I started the project
knowing PHP 7.0 was already out there for anyone to use). This might have been
fine to leave for a little project, but since it is a SaaS and likely will
require updates in the future, figured I'd switch it over to PHP 7.0 (better
security, speeds, etc.). Upon the switch, there was nothing but errors. Turned
out that some database code I was using wasn't compatible with PHP 7.0. Had to
go through every file that touched the database and update it with new code.
Luckily the project is still in development and hadn't yet reached public use,
so "oops" for using outdated code, but "awesome" because updating the code
from PHP 5.6 to PHP 7.0 took about a week... and now I have more peace of
mind. It'll be years before PHP 8.0 comes out, right? Hehe.

------
AznHisoka
Not really tech-related, but a few years ago, I took an Uber to another city
for a concert, and accidentally used my company's credit card to pay for the
trip.

What made it worse was I told my boss I was sick that day, and couldn't get
out of bed to go to work.

~~~
usaphp
You could say you went to a doctor you knew in that city. The Uber receipt
doesn't show what exactly you did there, does it?

~~~
dmichulke
That appointment with Dr. Dre at 10pm?!

------
codegeek
Not necessarily a "tech" oops moment, but many years ago, I accidentally
boarded the wrong flight and realized it only after fighting over the same seat
with another passenger for a few minutes. Luckily, I was able to get off in time
and run to my real flight. This is how the conversation went:

Me: Sir, this is my seat.

Person: Are you sure? I think this seat is mine.

Me (annoyed but confident): Please check your boarding pass

Person: Ok let me check

...In the meantime, someone yells from another seat, "You guys both going to
Austin, Texas, right?"

That was my oops moment. I was not going to Texas, of course. That seat was
halfway through the plane and people were still getting in. :)

~~~
victorhn
How was this not detected at the boarding gate?

~~~
codegeek
Yeah, I was surprised too. I only vaguely remember now, but the person checking
the boarding pass just took it, and I'm not sure they actually ran it through
the scanner. Sometimes they just manually tear off one portion of the boarding pass.

------
nvr219
Accidentally cut over to a new production db before making sure the old db was
fully synced. Noticed 2 DAYS later.
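
A minimal sketch of the kind of check that could catch this before cutover:
compare per-table row counts between the old and new databases. It uses
in-memory SQLite only so that it runs anywhere; the table names and the
`unsynced_tables` helper are hypothetical, and matching counts are a weak
signal at best (they say nothing about row contents).

```python
import sqlite3


def row_counts(conn, tables):
    """Return {table: row count} for each named table."""
    return {
        t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
        for t in tables
    }


def unsynced_tables(old_conn, new_conn, tables):
    """Return {table: (old count, new count)} for tables whose counts differ."""
    old = row_counts(old_conn, tables)
    new = row_counts(new_conn, tables)
    return {t: (old[t], new[t]) for t in tables if old[t] != new[t]}


# Demo with two throwaway databases; "orders" and "users" are made-up names.
old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")
for db in (old_db, new_db):
    db.execute("CREATE TABLE orders (id INTEGER)")
    db.execute("CREATE TABLE users (id INTEGER)")

old_db.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
new_db.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])  # one row behind
old_db.executemany("INSERT INTO users VALUES (?)", [(1,)])
new_db.executemany("INSERT INTO users VALUES (?)", [(1,)])

mismatches = unsynced_tables(old_db, new_db, ["orders", "users"])
if mismatches:
    print("not safe to cut over:", mismatches)
```

A stronger version would compare checksums or replication positions rather
than counts, but even this cheap check fails loudly instead of two days later.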

------
beat
Long long ago in a previous century, I worked on a project with a "seasonal"
production cycle (develop for six months, run it for six months, etc). I spent
a dev cycle on a project to pull scanned image storage out of Sybase BLOBs and
into the filesystem, for a massive performance boost. This meant restructuring
our disk and db sizing, since the vast majority of our db storage was tied up
in those BLOBs. After months of dev and careful testing, it was ready to go.

It was my crowning achievement on my first big programming job, a massive mark
on the system. And it was intended to be my last - I had taken an offer in
another city. The first day of "production season" was my last day at the job
I'd held for four and a half years.

So when production started on Friday morning, the DBA noticed something bad -
the databases were filling _far_ faster than expected. Like, "We are going to
run out of space in four hours" kind of faster. The space was supposed to be
enough to last for three months, with room to spare! Oops.

So instead of spending my last day saying goodbye to all my friends and
colleagues, I spent it in a blind panic, trying to figure out _what the hell
is happening_? I'd tested so thoroughly!

But my tests were tiny in scale compared to the scope of the system in
production. One scanner running for an hour or so is nothing, compared to
twenty of them running 24/7. I couldn't even begin to dent the db size, even
with the apparent bug. I missed something.

After a few hours of panic, I found the bug. It was a latent bug dating to the
very beginning of the system, caused by the difference between NULL and 0 in a
C pointer. You say there's no difference? Well, allow me to retort! The Sybase
db library would, given a BLOB column in a table, automatically allocate 8k of
space for it if you passed in a 0 pointer for content, but would not allocate
if you passed in NULL. Right there in one of the thirty-odd _excellent_ Sybase
paper manuals (they really were excellent), if you read it. It never showed
before then, because those 8k allocations were swamped by the size of the real
images being inserted in most rows.
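
A back-of-the-envelope sketch of why a small test could hide this: a fixed 8k
allocation per row is barely visible with one scanner running for an hour, but
adds up fast across twenty scanners running around the clock. Only the 8k
figure and the scanner counts come from the story; the scan rate and the
database size below are made-up numbers for illustration.

```python
# Rough capacity projection. Everything except the 8k allocation and the
# scanner counts is hypothetical.

ALLOCATION_BYTES = 8 * 1024  # per-row allocation described above


def wasted_bytes(scanners, hours, rows_per_scanner_hour):
    """Bytes consumed by the fixed per-row allocation alone."""
    return int(scanners * hours * rows_per_scanner_hour) * ALLOCATION_BYTES


def hours_until_full(free_bytes, scanners, rows_per_scanner_hour):
    """Hours until the fixed allocations alone exhaust the free space."""
    per_hour = scanners * rows_per_scanner_hour * ALLOCATION_BYTES
    return free_bytes / per_hour


ROWS_PER_SCANNER_HOUR = 3_000  # assumed scan rate
FREE_BYTES = 2 * 1024**3       # assumed 2 GiB of database space

test_waste = wasted_bytes(1, 1, ROWS_PER_SCANNER_HOUR)
prod_hours = hours_until_full(FREE_BYTES, 20, ROWS_PER_SCANNER_HOUR)

print(f"one-scanner, one-hour test wastes {test_waste / 1024**2:.1f} MiB")
print(f"twenty scanners fill the space in {prod_hours:.1f} hours")
```

With these assumed numbers the test run wastes a couple dozen MiB, easy to
miss, while production runs out of room in a matter of hours.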

That was a remarkable day.

And in the end, I feel vindicated, too. I had argued strenuously to remove the
BLOB column completely, but the DBA (a cautious fellow) resisted, because he
wasn't sure what other impact that might cause in other code. Lava Flow
antipattern, right there. If we'd removed the column completely, as I
suggested, the bug would never have occurred.

------
bbcbasic
Went to a client meeting one day early. Not my fault entirely as the organiser
didn't bother to give me the change of date.

------
cr0sh
I was 18 years old, and working at my first software development job. These
were the waning days of dumb terminals, leased lines, and modems. We had no
"test environment" - test was dev, test was production.

The system was an RS6000 minicomputer (actually, it was like an overgrown PC)
running AIX, and I was remotely logged into it over a 9600 baud leased line,
direct to the client. It was their production system for Arizona's indigent
healthcare system - AHCCCS. My employer was a mom-n-pop shop that developed
the system, in PICK (we used UniVerse on AIX).

All I recall today was that I was testing some kind of data conversion routine
(or something of that nature) on the database for the transaction data; that
is, the claims transactions. These were input via hand-entry and/or scanned
document (usually a combination). My testing consisted of running the routine
(which read from live data, but wrote to "test" database records), checking
the output, clearing the test records, tweaking the code, then re-running the
routine again - until it was right. You can see where this is going...

So there I was, doing a repetitious task, getting into a mindless rhythm -
when disaster struck: I cleared the production tables. Oops.

I held my head, and let out a loud moan. People looked up from their desks.
Before they could ask me what was wrong, all of the phones in the office lit
up. Every single phone was ringing - because of me.

My supervisor and I were called into the owner's office and asked what had
happened. Head hung, I explained the issue, and he asked my supervisor why it
had happened. I thought heads were about to roll: I was a kid, and my supervisor
(thinking back to then) was not that much older (late 20s). The owner asked my supervisor
what could be done; he thought quickly, realized that tape backups from the
night before had been made, and that the table I blew away could be -mostly-
reconstructed from two other tables I hadn't touched. A bit of fast coding, a
restore from backup, and most of the data could be restored by morning,
leaving their data entry people to re-enter a few hours worth from paper
records.

Amazingly, we weren't fired on the spot, but told to get to work, and make it
quick. We pair coded at that point, built and tested the conversion routines
meticulously on our development box, then moved them to production to run. We
backed up the remaining data, and ran the conversion on a test batch, then
combed thru the output to make sure it looked right. We ended up staying thru
the night to the next morning, but it got done, and our company continued on
with the client. They were one of the bigger insurance companies in the state
at the time; I have no idea how much money was lost due to my slip - but it
wasn't insignificant, I'm certain.

I learned valuable lessons from that experience: the importance of backups
(extremely recent ones!), the need for a testing area, never to use
production for testing, to check twice before pressing the return
key on a "delete everything" command, and how to be fair and calm in the face
of disaster toward a very junior member of a team (that one from the owner).
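
One way the "check twice" lesson could be built into tooling, as a sketch
only: a wrapper that refuses to clear a table in production and makes the
caller repeat the table name before anything is deleted. The environment
names, table names, and the `clear_table` helper are all hypothetical, and
SQLite stands in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical guard: environment and table names are assumptions made for
# this sketch, not details of the system described above.
SAFE_ENVIRONMENTS = {"dev", "test"}


class RefusedError(Exception):
    """Raised when a destructive operation is blocked."""


def clear_table(conn, table, environment, confirm=""):
    """Delete all rows from `table`, but only outside production.

    The caller must repeat the table name in `confirm`, which forces a
    second look at what is about to be wiped. Returns the rows removed.
    """
    if environment not in SAFE_ENVIRONMENTS:
        raise RefusedError(f"refusing to clear {table!r} in {environment!r}")
    if confirm != table:
        raise RefusedError(f"confirmation {confirm!r} does not match {table!r}")
    # Table names cannot be bound as SQL parameters, so validate the name
    # against the schema before interpolating it.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise RefusedError(f"unknown table {table!r}")
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    conn.execute(f'DELETE FROM "{table}"')
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER)")
conn.execute("CREATE TABLE test_claims (id INTEGER)")
conn.executemany("INSERT INTO claims VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO test_claims VALUES (?)", [(1,), (2,), (3,)])

deleted = clear_table(conn, "test_claims", "test", confirm="test_claims")
print("cleared test rows:", deleted)

try:
    clear_table(conn, "claims", "production", confirm="claims")
except RefusedError as err:
    print("blocked:", err)
```

A guard like this does not replace separate test and production systems; it
only makes the mindless-rhythm mistake harder to commit.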

Let's just say I haven't repeated that mistake in the 20+ years since; looking
back on it now as an older and more experienced developer, it was definitely
one of those "why would we fire you now? You've just been taught a very
important lesson for a lot of money, which you'll never repeat again!" kind of
moments.

That was my first software development job, the one that launched my career
only 6 months out of high school with no degree. It was a great place to
learn many lessons which have remained with me since.

