Ask HN: What's the worst thing your code has done? - Procrastes
======
gozur88
I wrote some code to manage kits in a warehouse. Like, a customer would order
a kit that required A, B, and C. Then the picker would get sent to those three
locations and put it all in a box and onto the conveyor belt for shipping.

The problem was the warehouse owner wanted partial kitting. So if the
warehouse only had (in our example) A and B, the code would send the picker to
put A and B into a box, then direct him to drop it off in a special partial
kit area. When C was back in stock, the system would have the workers fill out
the partial kits and ship them. This way if a kit required a dozen items and
you were just waiting for one to arrive, you could get most of the work done
beforehand.

The problem was now A and B are in boxes and not in "inventory". So when
someone orders a kit that contains A, B, and D the A and B bins are empty (as
all items A and B are already part of a kit and thus not available) and the
code would direct him to put D in a box and put it in the partial kit area.
Eventually the D bin is empty, so when an order comes for a kit that requires
D and E, we get another flood of partial kits, all going to the same location
(which was just a square painted on the warehouse floor).

Anyway, long story short, if the right few items were out of stock and the
right orders came in the right sequence, nearly the entire inventory of the
warehouse ended up in a giant pile of boxes that was too large for the workers
to sort through even when the needed items arrived.

Everything was humming along just peachy for weeks and then BAM! Red faces all
around. It took days for them to put all the inventory back into the proper
bins and fix all the data, and that probably cost into seven figures, all
told.

In my defense, I wasn't the last one to touch that module.

~~~
mapster
what was the best solution for them? out of stock creates work backorder &
potential log jam req. overtime and temp workers, but avoids the out-of-stock
false positive. in the end, more happy kit owners.

~~~
gozur88
It's been a while, but I think we just put a hard limit on the number of
partial kits.
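
A minimal sketch of what such a hard limit could look like. This is not the original warehouse code; `plan_pick`, `stock`, `open_partial_kits` and `max_partial_kits` are hypothetical names chosen for illustration.

```python
def plan_pick(kit_items, stock, open_partial_kits, max_partial_kits):
    """Decide how to handle one kit order.

    Returns ("full", items), ("partial", in_stock_items) or ("backorder", []).
    All names here are hypothetical; `stock` maps item -> quantity on hand.
    """
    available = [item for item in kit_items if stock.get(item, 0) > 0]
    if len(available) == len(kit_items):
        # Everything is in stock: pick and ship the whole kit.
        return ("full", available)
    if available and open_partial_kits < max_partial_kits:
        # Some items are missing, but the partial-kit area still has room.
        return ("partial", available)
    # Cap reached (or nothing in stock): leave inventory in its bins.
    return ("backorder", [])
```

With the cap in place, a run of out-of-stock items turns into backorders instead of draining every bin into the partial kit area.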

------
jnord
In the early days of my career I had to modify some code for a PLC that
operated on a car production line. The modified code took too long to run so a
watchdog process assumed the code had frozen and performed an emergency
shutdown of the hydraulics of the line's welding robots. Six cars were damaged
when the heavy robot arms crashed and buckled car roofs, and the one-car-
every-45-seconds production line ground to a halt for 15 minutes.

~~~
euyyn
Emergency shutdown meaning all the huge robot arms fall down, instead of just
freezing in place, doesn't sound like a good idea to start with :)

~~~
bigger_cheese
Hydraulics are scary things: a lot of stored energy can be released very
quickly if one of them gives out suddenly. I've heard horror stories about
limb amputations etc. due to a sudden release of hydraulic pressure sending
shrapnel flying across the room.

If there is a fault, the safest thing for them to do is bleed out energy
slowly. Unfortunately, in this case it sounds like that crushed the
'obstruction' in the process.

My industrial plant code screwup story was not caused by me but was pretty
impressive: what was supposed to be a "simple firewall change" knocked out
communication between two interlinked parts of our plant, which stopped the
line and caused a big delay, with a few million dollars lost. I believe the
root cause was that someone fat-fingered the addition of a new firewall rule
and we ended up dropping every incoming packet.

------
AnimalMuppet
We had a microwave generator that was used to cook cancers in living patients.
We'd ask for a given power, and we had the ability to read back how much power
we actually got. But we didn't check that the power we read back was something
reasonable. When an op amp failed, the generator produced full power whenever
we asked for any power at all. The patient literally got hot enough to emit
smoke.

Thank God, the patient was a pig. We hadn't made it into clinical use yet.
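
A rough sketch of the missing readback check, for illustration only. The tolerances and the `set_power`, `read_power` and `shutdown` callables are assumptions, not the actual generator interface, and a software check like this would complement a hardware interlock rather than replace one.

```python
def check_power_readback(requested_w, measured_w, rel_tol=0.10, abs_tol_w=1.0):
    """Return True if the measured power is plausibly what was requested.

    The tolerances are made-up illustration values, not clinical limits.
    """
    allowed = max(abs_tol_w, rel_tol * requested_w)
    return abs(measured_w - requested_w) <= allowed


def apply_power(requested_w, set_power, read_power, shutdown):
    """Command a power level, then verify the readback.

    `set_power`, `read_power` and `shutdown` are hypothetical hardware hooks.
    On a mismatch the generator is shut down and an error is raised.
    """
    set_power(requested_w)
    measured_w = read_power()
    if not check_power_readback(requested_w, measured_w):
        shutdown()
        raise RuntimeError(
            f"power readback {measured_w} W does not match request {requested_w} W"
        )
    return measured_w
```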

~~~
CapacitorSet
>When an op amp failed, the generator produced full power whenever we asked
for any power at all.

Huh, I'd expect sensitive systems like these to have some sort of hardware
redundancy/voting system.

~~~
sjg007
The Therac system is well known... hardware interlock.

------
chrisbennet
Many years ago, when the earth's crust was still cooling, I wrote an
application to generate tool paths for the milling machines my employer made.
Milling machines use a cutting tool that looks something like a drill bit
except that it cuts on the side of the tool instead of the tip.

One day I was told that my software had a bug. The tool wasn't being retracted
(pulled out of the material being cut) before being rapidly moved to a new
location. As a result, the cutting tool was being broken off.

I asked [I think it was our application engineer] if we sold the replacement
tools to the customer and I was told "yes". Then I asked him: "Then isn't
breaking off tools kind of a _feature_?"

"Just fix it Chris. Just fix it."

~~~
LarryPage
And as the earth's crust finally cooled, you fixed it.

------
vortico
Not me, but
[https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...](https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac#diff-3fbb47e318cd8802bd325e7da9aaabe8L351)

~~~
amingilani
Oh my god, I laughed for a good 5 minutes at this one. The comments just add
to the hilarity.

------
davimack
Not mine, but one I ran into. This is on an automated testing rig for
microwave devices, which are odd things - you don't have wires for microwaves,
you have wave guides, which are basically tubes which you can pipe the
microwave through, and which are incredibly fiddly to get situated properly.
So, to test one of these things, you're likely to get a failure and not have
any discernible reason for it failing - you'll tear it down and not find any
problems, put it back together and it'll work just fine.

Well, the engineer writing the test code knew these devices were odd, and that
sometimes they'd just fail. So, s/he put in an if block to the effect that,
"if this fails once, run the test 30 times and, if it passes 25/30 times, call
it a pass." So, every now and again, the entire automated testing line comes
to a halt and sits there for 31x the amount of time it should take, and it's
not a short test (maybe sat there 30 minutes each iteration).

------
zaptheimpaler
I wrote some code that was pulling batches of events off a queue, doing some
processing and then writing them out to HDFS.

The inner loop was something like:

    
    
        while message:
            converted_event = Event()
            for event in message.events():
                converted_event.set_fields(event)
                write_to_hdfs(converted_event)
      

Can you spot the bug? Led to a month of corrupted data before I noticed..

The `set_fields` method does not clear all fields, so every event had more and
more junk data than the one before it. All because I thought I would be clever
and get some performance gains by initializing `converted_event` outside the
inner loop.
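
A small sketch of the fix, assuming the simplest remedy is to build a fresh object per event. The `Event` class below is a hypothetical stand-in (its `fields` dict is an assumption), and the HDFS write is replaced by collecting results in a list so the example runs on its own.

```python
class Event:
    """Hypothetical stand-in for the converted event type."""

    def __init__(self):
        self.fields = {}

    def set_fields(self, raw):
        # Like the method described above: sets fields, never clears old ones.
        self.fields.update(raw)


def convert_batch(raw_events):
    """Convert each raw event using a fresh Event, so no fields leak across."""
    converted = []
    for raw in raw_events:
        event = Event()  # created inside the loop, one per input event
        event.set_fields(raw)
        converted.append(dict(event.fields))
    return converted
```

Reusing one `Event` across iterations would have carried the field `a` from the first input into the second output; creating the object inside the loop avoids that.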

------
flukus
Working on school software I forgot to add "and IsDeceased = 0" to a query.
Turns out parents don't like getting notifications about their dead child's
truancy.

~~~
kazinator
A database with dead kids that have to be tested for in every damn query is a
pretty nasty database.

Maybe there should be a separate database of historic students who used to go
to that school, and currently enrolled.

It's not just "isDeceased", but "goesToThisSchool". Nobody wants to get some
notification from a school about something, when their kid doesn't go there
any more for any reason.

~~~
dragonwriter
> Maybe there should be a separate database of historic students who used to
> go to that school, and currently enrolled.

Or, rather than duplicating data, just use a view with appropriate criteria to
limit to currently active, living students for most queries. But a developer
that's called in to build a query generally isn't going to get a lot of mileage
out of suggesting rearchitecting the database, in either of those ways.

~~~
flukus
> But a developer that's called in to build a query generally isn't going to
> get a lot of mileage out of suggesting rearchitecting the database, in
> either of those ways.

It's like you were there :)

It was a third party product so changing the structure was out of the
question. We had some views but they pulled in the entire database and ended
up with so much duplicate and irrelevant information they were unusable.

I tried creating a clean set of views like "v_currentStudents" that could then
be joined on for information relevant to the current report. I even built a
small test suite for them, but getting the support devs (who I was covering for
when this happened) to change their cowboy ways was too hard. Management
didn't like the idea either; it cut into the billable hours.
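
A sketch of the view idea using Python's built-in `sqlite3`, purely for illustration. The table layout and the `IsEnrolled` column are assumptions; only `IsDeceased` and the `v_currentStudents` name come from the comments above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical students table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE students (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            IsDeceased INTEGER NOT NULL DEFAULT 0,
            IsEnrolled INTEGER NOT NULL DEFAULT 1
        );
        -- Reports query the view, so the filter lives in exactly one place.
        CREATE VIEW v_currentStudents AS
            SELECT id, name FROM students
            WHERE IsDeceased = 0 AND IsEnrolled = 1;
        INSERT INTO students (name, IsDeceased, IsEnrolled) VALUES
            ('Alice', 0, 1), ('Bob', 1, 0), ('Carol', 0, 0);
        """
    )
    return conn


def students_to_notify(conn):
    """Return names of students who should still receive notifications."""
    rows = conn.execute("SELECT name FROM v_currentStudents ORDER BY name")
    return [name for (name,) in rows]
```

A report that joins on the view cannot forget the `IsDeceased = 0` condition, because the condition is no longer written per query.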

------
carvin
I was an intern at a university security lab working on a 7-month project.
Early on, I figured it would be a good idea to use SVN to save my work, so I
set up a repository and did a few commits but quickly stopped maintaining the
repo.

One hour before the end of my internship, I was ready to leave, my work done
and ready to be used by the next person taking over the project. I wanted to
clean up my files and documentation so it was all tidy, and I tried to commit
my work. Of course SVN could not commit because the repo and my work had
nothing left in common. So I typed (on a Linux system): svn delete to clean up
the repo so that I could push my files... I lost months of work and I was not
able to recover my lost files from the file system... I had to leave for my
country of origin since this internship was part of an exchange program. I
felt so bad about it, it still haunts me.

~~~
acidus
Don't let it drown you!

------
tatersolid
I once wrote a server "clean up" script that moved all *.log files older than
a few hours to an archive.

Someone else added it to a group policy for all corporate servers, including
all our Exchange servers, where the active database transaction logs are named
*.log.

~~~
hluska
If we're ever in the same city, I owe you a beverage! Great story.

------
throwawaysntc
My code probably contributed to the financial crash of 2007/8.

Unfortunately, I cannot share many details except that I wrote code that was
meant to manage the amount of risk that a certain really big financial
institution was supposed to take. My code may or may not have shipped after I
left that institution. If it did ship, maybe it did not do what it was
supposed to do. If it did not ship, maybe it failed to replace the broken
system that it was supposed to replace. Either way, months after I left, the
head of the institution acknowledged on TV that they were taking on more risk
than they intended to.

------
istotex
On the last project I was working on, I built a backend on Node.js v4 for an
online course site. For a long time I was trying to convince our team leader
to switch to Node v6, since it supported ES6 and I couldn't wait to use the
new JavaScript features such as classes. However, he was always reluctant
to make the switch, since there were other priorities at the time.

At some point, I found out that inserting 'use strict' at the beginning of
each Node.js module enabled the experimental ES6 (harmony) features in Node
v4. Needless to say, I was super excited and immediately started using classes
and other ES6 goodies everywhere, even refactoring already existing modules.

Shortly after that, we noticed that our servers were leaking memory and started
crashing almost every day. At the time, I had no idea what the problem was -
and believe me I tried everything to find a solution - until a couple of
months later we switched to Node v6, and everything miraculously returned
to normal. In the meantime though, during those 2 dreadful months between v4
and v6, we had to set up cron to restart our servers every single day at
04:00...

Never use experimental features.

~~~
jefozabuss
Never use experimental features ... in production

------
Procrastes
I'll kick it off with my own. I've had a few, but the most dramatic was when I
once changed the wrong line in a configuration script and ripped a three-ton
(U.S.) mixer out of a concrete floor.

~~~
euyyn
Don't leave it there! Details!

~~~
Procrastes
I was working on a control system for cattle feed mills. We had to wire into
the system sensor-by-sensor and actuator-by-actuator as they continued to make
feed. We started out with the entire system simulated, then gradually ended up
with a fully live system.

I (thought I) set an actuator running an auger (screw) that offloaded the feed
from the mixer into a leg (12 meter tall vertical screw) to "run always." That
should be safe, right? The auger runs all the time, carrying away anything that
dumps into it. What I had actually done was set a hay belt to "run always." It
was stuffing the mixer with more and more hay until it was a solid mass inside
the box.

Everything seemed fine when we started that next batch of feed... then the
mixer started. The lights dimmed and there was this shriek of metal and a bang
from the mill floor. We shut down and went out to see this huge mixer hanging
off a drive chain at a 45 degree angle from the floor. Bolt heads the size of
manhole covers had sheared off and were lying nearby. Fortunately no one had
been standing nearby. I don't know if my memory matches reality, but I recall
a light from one of the skylights shining down on it in the grain dust like a
spotlight.

I was pretty sure this was going to be my last day on the job.

I walked over to stand next to the Mill Manager, a salty fellow named Marvin
with three fingers on his right hand. Marvin looked up the chain and back down
to the bolts on the floor and said "Yep, it'll do that."

Two workers lowered it down and welded the bolt heads back in place like they
did it every day.

I was with the company for five years. I don't recall ever having a support
call from that mill after we finished the installation.

------
LorenPechtel
Wasn't actually my fault: My code ordered the factory to errantly produce
several thousand dollars worth of left-hinged doors. (A guy who should have
known better set a bunch of flags that messed up its hinge-determination
logic. Anything that was supposed to be produced as one left and one right got
produced as two left instead.) As everything was build-to-order it's unlikely
any got used at least for their intended purposes. (I still have a few unused
doors around--put some casters on them and you have a nice looking rolling
wooden platform. The laser printer on the floor beside me is sitting on one of
those.)

------
ioddly
When I was a teenager, I crashed a MUD hosting server by forking a process in
a loop. The admin kindly explained ulimit to me. (This was before VPSes were a
thing).

I was so mortified, I guess it stuck well enough that that's the worst thing
off the top of my head.

But it seems like I'm an underachiever based on this thread.

~~~
AnimalMuppet
Being an underachiever on this thread may make you an overachiever at writing
code...

------
MarkMMullin
Desperately sought just an extra 4K of RAM to see if a LISP expert system
would get through a diagnosis on a Huge Aircrash Firefinder maintenance guide
- had a kernel license, dug around and found a magic flag for a 4K block -
tested it, seemed OK, put it out in the field, and the first time it ran, it
grabbed that extra 4K and was instantly rewarded with a "Panic: out of swap
space" and the whole damn thing dropped dead :-(

~~~
kazinator
> _Huge Aircrash_

Is that a jab at Hughes Aircraft? :) Looks like "Firefinder" is some kind of
radar system developed by them.

~~~
MarkMMullin
Friendly jab - it was old when I started my career in the '80s :-) And yeah,
Firefinder is a radar.

------
sidlls
Helped the armed forces of my country kill people.

~~~
amingilani
So your code worked as intended, and the intention was to kill people since it
was used by the military? You sir, have written the most destructive code on
this thread.

I'm sorry for how you must feel.

~~~
sidlls
Yeah. I had no idea at the time what its purpose was, either, and found out
about it after the fact.

It isn't a pleasant thing to live with.

~~~
szemet
Then you did not want to hurt anyone.

Imagine someone who works as a knife grinder. If he does his job right the
knives will be much more dangerous, they may cause accidents or even some will
be used intentionally as a deadly weapon.

Then considering these possibilities: an ethical knife grinder should do a
shitty job, should quit, or should live in self-reproach?

It may be more complicated if you are a gunsmith. But those guns are used by
your customers - so to what extent will your ethical evaluation depend on
their actions in this case?

For example, if your guns are used in an arms race, and eventually they help
avoid war, then you are a saint? If your guns are used in a victorious war
then you are a hero? And if they are used for killing innocents then you are
an evil person? Or you should be judged by the average probabilities of the
global gun usage? Or what?

~~~
dragonwriter
> Imagine someone who works as a knife grinder. If he does his job right the
> knives will be much more dangerous, they may cause accidents

Poorly sharpened knives are _far_ more likely to cause accidental injuries,
and serious ones, than well-sharpened knives. At least in kitchen use, though
I'd expect the “a dull knife is more likely to fail to cut what you meant,
slide off, and strike something else” effect would apply in most uses of
knives.

~~~
szemet
Maybe. It is also possible that better knives cause fewer but more serious
accidents, and then it is hard to compare the two. But I stop now, because
what we have now are just plausible hypotheses without any real evidence -
theoretical knife science waiting for confirmation... ;)

~~~
bmelton
OSHA[1] recommends keeping knives sharp to prevent restaurant and kitchen
maladies from occurring.

The Ohio Bureau of Worker's Compensation[2] recommends the same.

The Bureau of Industrial and Labor Statistics[3] cites dull knives as a common
cause for injury, and recommends keeping knives 'sharp and in good trim' to
prevent accidents.

In short, "a sharp knife is a safe knife" isn't hokum. When you're pushing a
knife into something, you're storing and releasing kinetic energy. A sharper
knife requires less kinetic energy to begin cutting the object, which is
ostensibly dangerous, but not as dangerous as a failure to cut, which releases
all that kinetic energy in uncontrollable fashion.

Past that, in the event that you do get cut by a knife, a sharper knife makes
a cleaner cut, which means easier healing, easier care, and (if dire enough)
easier reattachment. Oh, and less scarring to boot.

[1] -
[https://www.osha.gov/SLTC/youth/restaurant/knives_foodprep.h...](https://www.osha.gov/SLTC/youth/restaurant/knives_foodprep.html)

[2] - [https://www.bwc.ohio.gov/downloads/blankpdf/SafetyTalk-Preve...](https://www.bwc.ohio.gov/downloads/blankpdf/SafetyTalk-Preventingcuts.pdf)

[3] -
[https://books.google.com/books?id=W0M4AQAAMAAJ&pg=PA190&lpg=...](https://books.google.com/books?id=W0M4AQAAMAAJ&pg=PA190&lpg=PA190&dq=statistical+safety+of+sharp+knives+vs.+dull&source=bl&ots=sE_YjoOvmf&sig=axVDXg30fYXHqpobV6UtROwNiFo&hl=en&sa=X&ved=0ahUKEwiaubGdo5PVAhWD8CYKHfKVAV04ChDoAQgmMAE#v=onepage&q=statistical%20safety%20of%20sharp%20knives%20vs.%20dull&f=false)

------
canada_dry
Almost got me fired on the spot.

One of my first implementations at a bank many years ago... bunch of 'C'
levels are in the main branch for my first big launch demo...

Tap a few keys...

    
    
          **ERR ** HELL FROZE OVER!
    
    

LPT: never use this in an else case.

~~~
kazinator
So this was something like:

    
    
        default:  /* unreachable case */
           assert(0 && "hell musta just froze over");
    

that type of thing? Impossible case throwing funny error message?

------
tejtm
Exactly what I told it to do. Which seemed perfectly reasonable to me ... but
had my boss running down the hall muttering something about damage control.
Seems not all biologists liked receiving letters introducing them to other
biologists whose results on some marker or another differed in some
nontrivial way.

------
kafkaesq
Made people rich, who definitely didn't deserve it.

------
donatj
I wrote code for a domain squatter ad control system as my very first task at
my very first job out of college. I am not proud and honestly didn't realize
what it was until I got pretty far into it.

------
ams6110
Not my code, but I was involved in cleaning up the aftermath. Financial
company, a programmer had made a one line change to clean up some working
directory at the end of a program. Something like

    
    
      "rm -rf /var/scratchdir /"
    

Yeah the space was a typo. Wasn't running as root but was able to make a
pretty big mess regardless.

~~~
seanwilson
I move things to /tmp now instead of deleting them. Where the margin of error
is a single character "rm" is just too risky.

------
aivarsk
I developed and maintained CI scripts for a large modular C++ application 10+
years ago. Someone added `rm -rf $(SOME_TEMP_DIR)/` to global Makefile that
was run before building anything. My CI scripts did not set SOME_TEMP_DIR...

Came to work the next day, nightly build still had not finished on slave
servers, and I got errors about a non-existent home folder when I tried to log
in.

What made it worse was that every server mounted an NFS share that contained
fingerprints and binaries of different versions of software modules built on
different platforms.

Killed all slaves, restored the NFS share from week old backups on tapes, tens
of developers could not create new versions of software and send previous
versions/patches to customers for a while.

~~~
vortico
This is about the third time this has happened in this thread. What's the
reason for writing the final "/" and not just `rm -rf $(SOME_TEMP_DIR)`?

------
allenrb
Really hoping there's at least one Ariane 5 avionics engineer who reads HN...

------
tj-teej
This one's a doozy

I was working on Cloud Management software for a Private Cloud at a major tech
company in SV. We had software which would reserve Prod IP space for
hypervisors, e.g. this hardware SKU can support up to 5 VMs, therefore it
needs to reserve 5 IP addresses in the corresponding subnet.

Turned out the API call to reserve the IP space from the IP Manager wasn't
asynchronous and because the manager tried to get consecutive space, the
runtime increased exponentially with the requested # of IP addresses.

In preparation for Holiday traffic, we were onboarding a new SKU of Hardware.
This hardware supported more tenants and so instead of requesting 7 IP
addresses per HV, now we're asking for 15. This took the latency of a call to
the IP Manager from 3-5 seconds to 5-10 minutes. To round off the perfect
storm, the code was retrying requests which failed, without propagating the
failure to the Cloud Admins using the software.

One day in October, I received a panicky call from our Capacity manager,
customers are trying to spin-up VMs but are being told there's no IP space
left. He knows we've onboarded all the racks, and he's done the math on the
subnets (which are showing as fully reserved), and there still isn't IP
space...WTF!!

Turned out the IP manager's VIP was cutting off requests after a few minutes,
(never a possibility when reserving only 7 spaces) but the reservation process
wasn't stopping, the IP was being reserved, marked as in-use, but never
actually making it to the networking service to be used by VMs.

Solution: At 2am on a Friday night I ran a script to manually mark tens of
thousands of production IP records as not-in-use in the IP manager, purely
based on grepping through logs from my service, and nslookups. But don't
worry, we pinged each IP just to be safe :)
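
A sketch of the piece that was missing from the retry path: a bounded number of attempts, with the final failure raised to the caller instead of swallowed. `reserve`, `ReservationError` and the parameter names are hypothetical; this is not the actual IP manager client.

```python
import time


class ReservationError(Exception):
    """Raised when a reservation still fails after all retries (hypothetical)."""


def reserve_with_retries(reserve, attempts=3, delay_s=0.0, sleep=time.sleep):
    """Call `reserve()` up to `attempts` times, then propagate the failure.

    `reserve` is any zero-argument callable that raises on failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return reserve()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # back off before the next attempt
    # Surface the failure so operators see it instead of silent retries.
    raise ReservationError(
        f"reservation failed after {attempts} attempts"
    ) from last_error
```

This only addresses visibility of the failure. The half-completed reservations described above would still need the reservation itself to be rolled back or made idempotent.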

------
kazinator
I ran a BBS on an 8-bit microcomputer in the 1980's. I wrote everything
myself, including low-level modem drivers in assembly code.

I had some code which handled a temporary loss of carrier. It would poll for
the carrier to come back for a few moments, otherwise indicate to layers
higher up that carrier is lost, so the user can be logged out.

Problem is, in that piece of code, I forgot to pop something off the stack
that I pushed onto the stack. I had a user who was a bit of a cracker. I got a
note from the guy, "I got into your operating system by dialing touch tones
while connected".

Dialing a touch tone interrupted the carrier sense in the modem, triggering
that code with the bad stack handling that would crash the BBS program,
leaving the I/O hooks still connected to the modem driver, giving the caller
full access to the system.

This didn't reproduce during the usual case when the carrier was lost
permanently, only when it recovered.

------
baccredited
Can anyone top this one:

    
    
      -  function initMultiowned(address[] _owners, uint _required) {
      +  function initMultiowned(address[] _owners, uint _required) internal {
    

This bug led directly to over $30 million being stolen yesterday. Not
my code, but impressive nonetheless.

Hackers have stolen $32 million in Ethereum in the second heist this week
[http://www.businessinsider.com/report-hackers-stole-32-milli...](http://www.businessinsider.com/report-hackers-stole-32-million-in-ethereum-after-a-parity-breach-2017-7)

Fix initialisation bug.
[https://github.com/paritytech/parity/commit/e06a1e8dd9cfd8bf...](https://github.com/paritytech/parity/commit/e06a1e8dd9cfd8bf5d87d24b11aee0e8f6ff9aeb)

------
arunmp
About fifteen years back I wrote a shell script which runs in the background
and which is supposed to send an email to the administrator with the log file
every time it ends up with an error. The trouble was, it was an infinite loop
(being a background process!) and there was some error. I forgot to tell the
code to end in case there was an error. Very dutifully, the program clogged
up the company mail server completely with thousands of mails with error logs
over the weekend: no emails coming in or going out, and one very angry
administrator.
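
A sketch of the missing exit, written in Python rather than shell for illustration. `step` and `notify` are hypothetical callables standing in for the background job and the mail-sending code.

```python
def run_until_error(step, notify, max_iterations=None):
    """Run `step` repeatedly; on the first exception, notify once and stop.

    Returns False if the loop stopped because of an error, True if it ran
    `max_iterations` times without one (None means loop forever).
    """
    count = 0
    while max_iterations is None or count < max_iterations:
        try:
            step()
        except Exception as exc:
            notify(f"job stopped after error: {exc}")
            return False  # end the loop instead of mailing on every pass
        count += 1
    return True
```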

------
khedoros1
I investigated this bug: Backup system, using a tree data structure where the
root was a hash describing a backup, and the leaves were variable-size chunks
of data. Backing up a virtual machine, it would process only the changed
areas, and re-build that section of tree. Roughly 1 in a few million backups
silently lopped off a branch of the tree, a couple levels up. Customers have
thousands of VMs, we have thousands of customers. Silent data corruption,
somewhere, every day. Rarely-triggered off-by-one errors in unreproducible
data suck.

------
oldsklgdfth
I was tasked to write a restart function for a desktop application. At the
time I was straight out of college with no idea what I was doing, so I asked
the lead for some direction.

He told me to:

- write out a script that waits 1 second and then runs the application
- run the script in a separate process
- kill the application

I bet that code is still there. It works, but damn is that cringy.

------
seanwilson
Not mine but I've seen someone do the classic of having a Bash script with
something like "rm -rf $PATH/" where if you run the script without $PATH set
it'll wipe out the whole drive if it has permissions. Took out a CI server but
luckily we had backups.

Edit: OK, this seems like a very common issue!

~~~
PhasmaFelis
The Linux version of Steam had one of these for a bit. People were _not_
happy.

~~~
seanwilson
Hmm, what's the lesson here to stop this common and very high impact bug then?
Never delete directories using Bash scripts + whatever delete function you do
use should be locked down to only ever being allowed to act in your app's
subfolder + empty path strings aren't allowed?
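
One possible shape for such a locked-down delete, sketched with `pathlib` and `shutil`. The function name `safe_rmtree` and the `allowed_root` parameter are made up for this example; it assumes `target` is passed as a string.

```python
import shutil
from pathlib import Path


def safe_rmtree(target, allowed_root):
    """Delete `target` only if it is a non-empty path strictly inside `allowed_root`."""
    # An empty string would otherwise resolve to the current directory.
    if not target or not str(target).strip():
        raise ValueError("refusing to delete: empty path")
    root = Path(allowed_root).resolve()
    path = Path(target).resolve()
    # The target must sit below the root; the root itself is also refused.
    if path == root or root not in path.parents:
        raise ValueError(f"refusing to delete {path}: not inside {root}")
    shutil.rmtree(path)
```

An unset variable then produces an error instead of a request to delete `/`.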

------
mattbgates
I was testing on shared hosting once and got stuck in a loop, crashing the
entire server and everyone on it. I had to get the host to reset it because it
just wasn't going to ever end. They weren't mad and didn't penalize me or
anything, just told me to be careful.

------
juli1pb
system("rm -rf $dir/")

I forgot to check my inputs. Ran in production for a backup system.

------
andrewstuart
Been unused and irrelevant.

------
SirLJ
Had a bug in my stock market scan and missed a trade that would have netted me
20% - easily the biggest trade of the year...

------
twovi
rsync -avz project_files/ root@192.168.0.1:/

Essentially production was not acceptable for a little bit....

------
imaginenore
Accidentally removed our corporate ID from the ad code, very high traffic
website. So the ads displayed, but we were not getting paid for the clicks.
$140K lost in a few hours. At the time that was almost double my yearly
salary.

Nobody got fired, because we had a QA team, and their testing procedure didn't
test for something like that.

