
Confessions of your worst WTF moment - superted
http://stackoverflow.com/questions/63668/confessions-of-your-worst-wtf-moment
======
edw519
During the week before Christmas, I had to rush a quick change into production
that would allow us to split orders into multiple shipments. The software was
fine, but an admin accidentally booted the wrong server (that had a bad test
version of the software) to send feeds to the UPS label printing server.

Two days later, 272 packages arrived at one customer's house in Minnesota.

She had a good laugh, and we managed to fix the problem in a couple of days
without too much additional expense.

Lessons learned:

1. If something is wrong, I'd rather have it crash than give wrong results.

2. Get a good version control system.

3. Have good policies and procedures.

4. Don't change _anything_ after December 1.

5. Don't ask me to do a rush job if I'm busy on Hacker News.

~~~
knotty66
I had a similar situation once where a character set encoding bug in a cron
script I wrote and was told to put live late on a Friday caused one lucky
customer to receive an SMS message every minute for over 60 hours.

I got away with that one.

------
kingofspain
I've been fairly lucky over the years (touch wood!) and for most hiccups I've
had the wherewithal to get it fixed, restored or whatever before anyone
noticed...

...Except the time I worked for a web dev company that had just set up a
sideline selling _product_ [1]. At that time we received maybe 5 orders per
day and while I was integrating the payment gateway automation, we had to
manually enter card info into a web interface. We had a _lot_ of attempted
orders from Nigeria so very strict verification was enabled and if you
mistyped a cardholder name or address even slightly then it would fail.

So one day we get an order for around £150. I enter this guy's card info into
the page and hit send. Bam! "Not accepted: please try again". I re-check his
info and it's fine except my capitalisation is slightly different. I adjust
and try again. Same message. I try a couple more variations with no luck. I
phone the customer and find his card actually has 'Mr' on it. I try that and
nothing. I try 'Mr.'. And so on and so on. Eventually I give up and tell him
to send a cheque but we'll dispatch today.

Couple of hours later we get a call from an irate Mr Johnny Customer. His bank
have just informed him that we've charged 16 x £150 to his card. Turns out
that the address verification would only fail _after_ a charge attempt was
made - and then a request made immediately to void. However, for debit cards,
the void process can take 3 days and his card was useless in the meantime.

Not too bad in the scheme of things, but this error was over 3 times my
monthly wage at the time and I'd only been there 2 weeks.

[1] specifics removed as they've since gone on to become quite a big deal
(from the solid foundations I built I presume!) and Google can be a harsh
mistress.

------
arethuza
Fortunately I have to go back over 20 years for mine.

It was a missing $. It was meant to be something like this inside a shell loop:

    
    
        mv $i $.old
    

Except I had

    
    
        mv $i .old
    

This was a payroll database where I had just moved all fifty files in the
database to a single file.

No problem - tape backups were up to date! They had three tapes.

First tape failed to restore.

Second tape failed to restore.

Third tape worked!

I learned a lot from that little adventure.
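
For anyone who wants a guard against this kind of rename loop, here is a minimal sketch in Python rather than shell. It assumes the intent was to append a suffix to each file name; the function name `rename_with_suffix` and the `.old` default are made up for illustration. The point is only that the destination is derived from the source name and that an existing destination is never silently overwritten.

```python
from pathlib import Path


def rename_with_suffix(paths, suffix=".old"):
    """Rename each path to <name><suffix>, refusing to overwrite anything.

    Hypothetical helper: the destination is always built from the source
    name, so two different files can never collapse into one target.
    """
    renamed = []
    for src in map(Path, paths):
        dst = src.with_name(src.name + suffix)
        # On POSIX, Path.rename silently replaces an existing file,
        # so check first and stop loudly instead.
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

Calling it with a list of fifty files would either produce fifty `.old` files or stop at the first collision, which is a much cheaper failure than reaching for the tapes.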

------
danilocampos
My third iPhone project, and the first truly big whack of MVC code I'd ever
written.

Everything was tested and working, but I didn't want to release yet. Had to
optimize a meaty, graphical tableview that was choppy and dropping frames.

Got it glassy smooth, tested it lightly, then released.

Press was solid, sales were outstanding, everything was going well. (edit: I
forgot: This was especially exciting news because I was about to leave a safe
job for a big adventure)

But the app was crashing _and destroying any data the user had input before
the crash._

I had to pull the app while I figured out what was wrong, destroying the
product's amazing momentum and killing my sales rank. I didn't care. I felt
terrible, taking people's money with a broken product.

Came down to one over-released button object, _one line of code_, along with
a boneheaded assumption of when I should commit persistent data.

I donated the entirety of my early sales to charity and chalked up the
experience to the importance of rigorous QA, no matter how trivial a change
seems.

~~~
ananthrk
"I donated the entirety of my early sales to charity and chalked up the
experience to the importance of rigorous QA, no matter how trivial a change
seems."

WOW. Respect.

------
die_sekte
For class, I built an RSA implementation that was susceptible to frequency
analysis. That wasn't so bad, considering that I was a pupil; however, the fact
that I got an A+ for it was slightly strange.

------
jbarham
My first job out of university was working on the mainframe systems for the
Canada Pension Plan (equivalent to US Social Security). Every now and then
some upstanding citizen would go in to register for their retirement benefits
only to be told that unfortunately, as far as the computer was concerned, they
were already dead. So we would run the "Lazarus" routine to resurrect them.

That was in the mid-90s but I'm sure the same thing is still happening today.
Pensioners will eventually die, but that old COBOL code never will! :)

------
phaedrus
I wrote a piece of network test software for NASA on an internship. Now, I
must here stop and say that the Johnson Space Center outsources their IT
security to a contracting company, who is based out of another state, and who
is batshit paranoid and completely unwilling to admit the existence of
anything but Windows and Office. I had to escalate up three levels just to get
authorization to hook up a computer running this scary thing called "Linux" to
their network.

Anyway I built a test computer with two NICs in it: one was connected to the
official network so I could get internet for Linux updates and to do research,
and the other was connected to my private test network. While testing the
software I wrote, which is capable of sending low-level "raw" ethernet
packets, I sent 10,000 maliciously malformed IP packets from a MAC address of
"00 00 00 00 00 00" to make sure I couldn't crash the other copy of my program
receiving it across the test network no matter what it received.

Unfortunately, after hitting enter I realized I'd typed the wrong ethN port,
and actually sent the 10,000 malformed packets across the official network.
They weren't directed at anything, but they did reach the switch and probably
triggered an IDS. Oops.

I found out later that the IT people, not content to just turn off my access,
actually drove out and physically disconnected my ethernet cable from their
switch!

------
RyanMcGreal
The probability of this discussion thread ultimately containing at least one
`rm -rf` must be close to 1.

~~~
tseabrooks
I'll bite:

In my undergrad Operating Systems class we used Minix and were expected to rewrite
various parts of the system ourselves as our class projects. We were all given
our own systems in a smallish lab used only for this class so we didn't need
to save anything. About halfway through the semester one of my classmates
mistyped a cd command, didn't realize it failed, and quickly did an rm -rf.
This resulted in all their work for the semester being lost. Because we had
all been given our own machines nothing had been saved. After this the rest of
the class started backing up our work daily.

------
mootothemax
Not _worst_, but certainly embarrassing. A good many years ago, I was working
for a small company, and we got to the stage where we could invest in a couple
of high-end servers. I set everything up, started to test and couldn't log in
to the web front-end: the new server was crashing as soon as I entered my
login credentials. My non-technical boss was starting to get a bit confused
about why we'd spent all of that money...

The login screen required 3 numbers from the user's PIN to be entered, with the
positions selected at random. So, pick a number at random between 1 and 10 (maximum PIN
size), then pick another random number which hasn't already been generated,
and do the same for the third number.

Problem: the random number generator object was created afresh for each
number. The default seed value for the generator was the current time, but
with no greater precision than seconds. Alarm bells started to ring.

The old server was so slow that the loop could run over several seconds, so it
got a fresh pseudorandom number each time. The new server had no such problems
running our code: every generator was seeded within the same second and returned
the same number, so the loop never found a second distinct one. IIS noticed the
infinite loop and killed the thread before it had a chance to affect the rest of
the server.

30 seconds after realising this, I moved the object creation out of the loop,
and life was good again. Whoops.
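
A rough sketch of the fix described above, in Python rather than whatever the original ran on (the post doesn't say). The names `pick_pin_positions` and `rng` are invented for illustration. Two things are assumed: the generator is created once and reused, and distinct positions are drawn in a single call instead of with a retry-until-different loop.

```python
import random


def pick_pin_positions(rng, pin_length=10, count=3):
    """Return `count` distinct 1-based PIN positions in ascending order.

    `rng.sample` draws without replacement, so there is no retry loop
    that could spin forever if the generator keeps repeating itself.
    """
    return sorted(rng.sample(range(1, pin_length + 1), count))


# One generator for the whole process, created once rather than per draw.
rng = random.Random()
positions = pick_pin_positions(rng)

# The original bug in miniature: two generators built from the same seed
# produce the same first value, so "pick again until different" never ends.
same_seed_first = random.Random(1234).randint(1, 10)
same_seed_second = random.Random(1234).randint(1, 10)
```

Here `same_seed_first` and `same_seed_second` are always equal, which is what a seconds-resolution time seed gives you on a machine fast enough to finish the loop within one second.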

------
jacquesm
Spending a day figuring out why a new routine wasn't called only to realize
there is another copy in /usr/local/bin of the same program that took
precedence in the path.

Having the cleaning lady unplug the air conditioner to vacuum inside a
glass-enclosed server cage (fancy office) and walking in there in the morning.
It was like walking into a wall of heat; amazingly, most of the machines still
worked, but we did end up replacing all of them.

~~~
jsankey
This reminds me of uni days, when I had a habit of compiling a quick snippet
to a binary called "test", and would be left mystified when I ran it and it
just exited immediately. I hate to admit how long it took me, on multiple
occasions, to realise I was running /usr/bin/test (more familiar to me as
"[").

Eventually I wised up and dropped that habit :).

------
zandorg
I tried to get some open source webserver code into a Gnutella (P2P software)
program called Gnucleus.

The maintainer put the code in the repository but never enabled it.

The WTF moment? When Morpheus (P2P company) took Gnucleus and rebadged it as
Morpheus Preview Edition.

My code was in there, orphaned as nothing ran it, and it got downloaded 100
million times. So I missed my chance and didn't get any money because Gnucleus
was GPL.

WTF?

------
masterponomo
When I was a junior programmer at a bank, I had an assignment to change some
reports in the plastics issue/reissue system. I used made-up cardholder names,
such as Malaguena Splunt, Maxie Terwilliger, and all the Beverly Hillbillies.
I ran my test, checked the reports, and thought I was done. Then a phone call
from security informed me that my cards were ready. That's when I found out
that running a test would actually result in test plastics being created.
Since fraudsters could get hold of plastics, the person who created them was
required to go to the plastics area (located in a vault with armed guards) and
fill out a form for each plastic, including the name on the card and the
purpose of the individual test plastic. I spent a very uncomfortable half day
writing up forms for my cast of stupid names, under the watchful eyes of the
security guards.

------
legooolas
Setting up a new switch in the office, and to show the pretty light-show on
startup, rebooted the switch. Turns out that the newer Procurve 1810Gs don't
automatically save the config (you have to tell them to after making changes),
and they don't have loop protection on by default...

So reboot -> no loop protection and that 3-cable LACP trunk I'd just put them
on was suddenly a bit of a loop[1]...

Moral of the story : Don't assume that (what seems like) a minor version
update in a product doesn't change the behaviour substantially (the 1800G
switches save the config when you make changes, the 1810Gs have it as an
explicit step).

[1] A gigabit loop on a completely flat network, so there was essentially a
gigabit of broadcast traffic pounding at the ~100 machines on the network...

------
lt
Several years ago, computers running Win9X in the company I worked for would
stop booting, complaining about missing loader or somesuch. IT was puzzled and
couldn't figure out what was going on - no data was missing and hardware
seemed fine. A Windows repair would fix it until it happened again a few weeks
later with someone else.

This happened for a year or so until I was making changes to some code I had
written in the past, in a method that would delete all files from a temp
folder it had created. I noticed a rare case where it could fail to get the
folder name and non-recursively clean up the drive root instead (wiping some
Windows boot files and mostly nothing else).

I then learned why you validate arguments and check return codes.
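
To make that lesson concrete, here is a minimal sketch in Python of a non-recursive cleanup that validates its argument first. The original code and its language aren't given, so the function name `clean_temp_folder` and its exact checks are assumptions for illustration: an empty folder name or a drive root is rejected before anything is deleted.

```python
from pathlib import Path


def clean_temp_folder(folder):
    """Delete the regular files directly inside `folder` (non-recursive).

    Hypothetical helper. Refuses to run when the folder name is empty
    (for example because an earlier lookup failed) or resolves to a
    drive or filesystem root.
    """
    if not folder:
        raise ValueError("temp folder name is empty; refusing to delete anything")
    path = Path(folder).resolve()
    if path == Path(path.anchor):
        raise ValueError(f"{path} is a drive or filesystem root; refusing to clean it")
    if not path.is_dir():
        raise NotADirectoryError(str(path))
    removed = []
    for entry in path.iterdir():
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

An empty string is the interesting case: without the first check it would quietly resolve to the current directory, which is the same family of mistake as falling back to the drive root.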

------
MaysonL
I had an off-by-one error, for the case of 4n+3 rows in a half-tone picture
display. I learned about it when I arrived at the office one afternoon and my
boss called me into his office, handed me a copy of TIME, and told me to look
at page 57. The only reason I didn't get fired was that that release also had
a (lossless) compression algorithm that allowed the magazine to push back
their photo deadline by a day. (I've always wondered how good my ~50%
compression was for pictures, when you only have enough RAM to look at one
scan line at a time). Luckily the customers noticed the problem before
printing more than a few copies.

------
keltex
One of my clients wanted to be able to send an email "blast" to all 50,000 of
their customers via their intranet application. They wanted to be able to
include an attachment with the email. I wrote the code and told them it was
ready.

A frantic call came in at midnight. The whole email system was down.
Unfortunately they had attached a 2 MB PDF file to each email. So our little
underpowered Exchange server was desperately trying to send 100 GB of email
down our T1 (plus dealing with all the bounce-backs).
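
A back-of-the-envelope check of those numbers, as a small Python sketch. The function name `blast_estimate` is made up; the only figure not in the post is the nominal T1 line rate of 1.544 Mbit/s. It uses decimal units and ignores attachment encoding, protocol overhead and the bounce-backs, so the real figure would have been worse.

```python
def blast_estimate(recipients, attachment_mb, link_bits_per_second):
    """Return (total gigabytes, days to send) for one attachment per recipient."""
    total_bytes = recipients * attachment_mb * 1_000_000
    seconds = total_bytes * 8 / link_bits_per_second
    return total_bytes / 1_000_000_000, seconds / 86_400


T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate

total_gb, days = blast_estimate(50_000, 2, T1_BITS_PER_SECOND)
print(f"{total_gb:.0f} GB, about {days:.1f} days on a saturated T1")
# prints "100 GB, about 6.0 days on a saturated T1"
```

So even with the line fully saturated and nothing else using it, the blast needed most of a week to drain.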

------
masterponomo
Another bank/plastics mishap in my yute: one of my colleagues had a PIN pad on
her desk. I ran my own credit card through it for a $1.00 charge, and it was
approved. I tried a few more times, approved each time. Finally I got a "pick
up card" response. I said, "Hey look, the test system wants to pick up my
card." My co-worker: "That's connected to production, not test." Oops!

------
andrewingram
Hm, nothing too severe. Once I didn't check that the config files used a dummy
smtp server before running the unit tests for sending order confirmation
emails. It was probably a bad idea to be using real data too :)

Just required an apologetic email to customers, no real fallout.

