Hacker News
Ask HN: What's the worst thing your code has done?
89 points by Procrastes on July 17, 2017 | 97 comments

I wrote some code to manage kits in a warehouse. Like, a customer would order a kit that required A, B, and C. Then the picker would get sent to those three locations and put it all in a box and onto the conveyor belt for shipping.

The problem was the warehouse owner wanted partial kitting. So if the warehouse only had (in our example) A and B, the code would send the picker to put A and B into a box, then direct him to drop it off in a special partial kit area. When C was back in stock, the system would have the workers fill out the partial kits and ship them. This way if a kit required a dozen items and you were just waiting for one to arrive, you could get most of the work done beforehand.

The problem was now A and B are in boxes and not in "inventory". So when someone orders a kit that contains A, B, and D the A and B bins are empty (as all items A and B are already part of a kit and thus not available) and the code would direct him to put D in a box and put it in the partial kit area. Eventually the D bin is empty, so when an order comes for a kit that requires D and E, we get another flood of partial kits, all going to the same location (which was just a square painted on the warehouse floor).

Anyway, long story short, if the right few items were out of stock and the right orders came in the right sequence, nearly the entire inventory of the warehouse ended up in a giant pile of boxes that was too large for the workers to sort through even when the needed items arrived.

Everything was humming along just peachy for weeks and then BAM! Red faces all around. It took days for them to put all the inventory back into the proper bins and fix all the data, and that probably cost into seven figures, all told.

In my defense, I wasn't the last one to touch that module.

Sounds like the problem was the requirements, not the implementation...

What was the best solution for them? Out-of-stock creates a work backorder and a potential logjam requiring overtime and temp workers, but it avoids the out-of-stock false positive. In the end, more happy kit owners.

It's been a while, but I think we just put a hard limit on the number of partial kits.
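In toy-model form, the cascade and that hard limit look something like this sketch (the item names, stock levels, and cap value are all invented for illustration):

```python
# Toy model of the partial-kit cascade; item names, stock levels, and the
# cap are all invented for illustration.
stock = {"A": 5, "B": 5, "C": 0, "D": 5, "E": 5}
partial_pile = []   # boxes parked in the painted square on the floor
MAX_PARTIALS = 3    # the eventual fix: a hard cap on open partial kits

def pick_kit(parts):
    """Pick a kit; park it as a partial if parts are missing, unless the pile is full."""
    missing = [p for p in parts if stock.get(p, 0) <= 0]
    if missing and len(partial_pile) >= MAX_PARTIALS:
        return "backorder"   # refuse to open yet another partial kit
    for p in parts:
        if stock.get(p, 0) > 0:
            stock[p] -= 1    # stock leaves the bins even for incomplete kits
    if missing:
        partial_pile.append([p for p in parts if p not in missing])
        return "partial"
    return "shipped"

# C is out of stock, so every A+B+C kit drains A and B into the pile
# until the cap stops the cascade.
results = [pick_kit(["A", "B", "C"]) for _ in range(5)]
```

Without the `MAX_PARTIALS` check, every kit blocked on C keeps draining A and B out of inventory, which is exactly the runaway pile described above.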

In the early days of my career I had to modify some code for a PLC that operated on a car production line. The modified code took too long to run so a watchdog process assumed the code had frozen and performed an emergency shutdown of the hydraulics of the line's welding robots. Six cars were damaged when the heavy robot arms crashed and buckled car roofs, and the one-car-every-45-seconds production line ground to a halt for 15 minutes.

A similar PLC story: A friend was an intern at a well-known coffee manufacturer. He was writing code for new sensors which were to be installed on the assembly-line machines later that year. There was a staging machine that his team used in the same room, and they would log into the machine and push their code to run tests.

He sent the machine an update and rebooted it, yet the staging machine behaved no differently from the previous version. Moments later a supervisor ran into the room, yelled at him to get his hands off the keyboard, reached over him, ran some commands, and disappeared into another room. A few minutes later he announced that my friend had logged into the main assembly-line machines and rebooted them with code that used a sensor that didn't exist yet, which stopped the entire chain for ~20 minutes. The company suffered $250k in production losses during that time.

How come an intern has access to the main production line PLCs?

This reminds me of a poor intern's story posted a few days back on Reddit: https://www.reddit.com/r/cscareerquestions/comments/6ez8ag/a...

It must have been an error in the network's access restrictions, because I agree, he shouldn't have had the ability to get to that part of the network. Maybe it was a routing error, since he claims he didn't switch the machine destination before uploading.

Emergency shutdown meaning all the huge robot arms fall down, instead of just freezing in place, doesn't sound like a good idea to start with :)

Hydraulics are scary things; there can be a lot of stored energy released very quickly if one of them gives out suddenly. I've heard horror stories about limb amputations and the like due to a sudden release of hydraulic pressure sending shrapnel flying across the room.

If there is a fault, the safest thing to do is bleed out the energy slowly; unfortunately, in this case it sounds like that crushed the 'obstruction' in the process.

My industrial plant code screwup story was not caused by me, but it was pretty impressive: what was supposed to be a "simple firewall change" knocked out communication between two interlinked parts of our plant, which stopped the line and caused a big delay, with a few million dollars lost. I believe the root cause was that someone fat-fingered the addition of a new firewall rule and we ended up dropping every incoming packet.

Cutting power = you know that the arms will go slack.

Freezing = you hope that whatever the problem is, maintaining power doesn't make it worse.

Ah yes, PLC programming. I expect there will be a lot of stories from PLC programmers in this thread ;)

"It is easy to make a mistake but if you really want to stuff it up you need a PLC". :)

Hey! I'm working at my first software internship and I will very soon be doing some PLC programming on moderately important chemical equipment. Do you have any tips on how to not spectacularly fuck everything up?

I found it helpful to not change "complex" outputs directly, but rather call a function that does that for me and handles the complexity in one place. For instance, if a brake must be removed before activating a motor, rather than pasting the same code every time you need to use the motor, wrap that in a small function block and use that.

In general, make the code easy to interpret, especially when debugging - which means organizing your code and using simple abstractions. State machines can be useful because they're easy to interpret if you comment your states, leading to high-level understandings like "the machine is waiting for an item" rather than "it will do something when I0.5 goes high".
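A rough sketch of both ideas in ordinary Python (the I/O names and states are invented; real PLC code would be ladder logic or structured text):

```python
# Hypothetical I/O image; real PLC outputs would live in ladder/ST, not Python.
outputs = {"brake": True, "motor": False}

def start_motor():
    # The only place allowed to touch the motor: the brake is always released first.
    outputs["brake"] = False
    outputs["motor"] = True

def stop_motor():
    outputs["motor"] = False
    outputs["brake"] = True

def step(state, inputs):
    # One scan of a tiny state machine; states read as sentences, not I/O bits.
    if state == "WAITING_FOR_ITEM" and inputs["item_present"]:
        start_motor()
        return "CONVEYING"
    if state == "CONVEYING" and inputs["item_at_exit"]:
        stop_motor()
        return "WAITING_FOR_ITEM"
    return state

state = step("WAITING_FOR_ITEM", {"item_present": True, "item_at_exit": False})
```

The point is that every caller that needs the motor goes through `start_motor`/`stop_motor`, so the brake interlock lives in exactly one spot, and the state names tell you what the machine is doing.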

That sounds pretty spectacular, alright. Although, I can't help wondering what it would have sounded like to set the robot arms to playing the William Tell Overture as a restart test. I mean the cars were already messed up...

We had a microwave generator that was used to cook cancers in living patients. We'd ask for a given power, and we had the ability to read back how much power we actually got. But we didn't check that the power we read back was something reasonable. When an op amp failed, the generator produced full power whenever we asked for any power at all. The patient literally got hot enough to emit smoke.

Thank God, the patient was a pig. We hadn't made it into clinical use yet.

>When an op amp failed, the generator produced full power whenever we asked for any power at all.

Huh, I'd expect sensitive systems like these to have some sort of hardware redundancy/voting system.

The Therac system is well known... hardware interlock.

Although that is true, the point is that the software could have done something (in this particular failure case) since it had the means to monitor the actual power. Like "Oh shit, there is too much power; something is wrong; shut everything down".

I'd been looking for a new way to roast whole hogs... My search has ended here.

Holy crap! That kind of failure (where some governing component drops out) is one of my favorite nightmares. Did you add an RF meter to the design?

No. We added code that, if the power was too far off from what we asked for, tried to kill power three different ways, plus alerted the operator. It was a bit tricky, because the power is never exactly what you ask for (variable impedance match, plus noise).

> We added code that, if the power was too far off from what we asked for, tried to kill power three different ways, plus alerted the operator.

My first worry was that your measurement would be wrong, not that the power wouldn't be killed! Any redundancy on that side? Or was it not necessary?

The specific issue that we encountered was that the power was measured correctly, but was out of control. At that point, not being able to kill the power is a very real concern.

If we measured wrong, we could either be high or low. If we measured high (that is, the reading is higher than reality), we would either turn down the power until it read right, or else kill power completely. If we read low, though, IIRC we would limit how high we'd turn up the gain to try to get the power we were asking for.

There was also a feedback loop based on temperature. If the power was double what we asked for, the temperature would quickly climb, and we'd reduce power. It would have worked, even with inaccurate power readings, though not as smoothly as it should with accurate power readings. But when we got 20 times the power we asked for (due to the power control failure), it was too much too fast.
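The read-back check might be sketched like this (the tolerance and wattages are hypothetical; the real system tried to kill power three ways and alerted the operator):

```python
# Hypothetical watchdog on the power read-back; thresholds are invented.
TOLERANCE = 1.5  # readings are never exact (variable impedance match, plus noise)

def check_power(requested_w, measured_w):
    """Return 'ok', or 'fault' meaning: kill power and alert the operator."""
    if requested_w <= 0:
        # Any real output with nothing requested is itself a fault.
        return "fault" if measured_w > 1.0 else "ok"
    ratio = measured_w / requested_w
    if ratio > TOLERANCE or ratio < 1.0 / TOLERANCE:
        return "fault"  # e.g. the failed op amp delivering ~20x the request
    return "ok"

status = check_power(10.0, 200.0)  # the op-amp failure mode
```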

Congratulations. You've found out there is value in having real programmable fuses instead of control electronics.

Usually the failure mode in these kinds of systems should be to emit nothing...

Did the police show up?

For a pig? No.


Many years ago, when the earth's crust was still cooling, I wrote an application to generate tool paths for the milling machines my employer made. Milling machines use a cutting tool that looks something like a drill bit, except that it cuts on the side of the tool instead of the tip.

One day I was told that my software had a bug. The tool wasn't being retracted (pulled out of the material being cut) before being rapidly moved to a new location. As a result, the cutting tool was being broken off.

I asked [I think it was our application engineer] if we sold the replacement tools to the customer and I was told "yes". Then I asked him: "Then isn't breaking off tools kind of a feature?"

"Just fix it Chris. Just fix it."

And as the earth's crust finally cooled, you fixed it.

Oh my god, I laughed for a good 5 minutes on this one. The comments just add to the hilarity.

That is a thing of beauty.

Not mine, but one I ran into. This is on an automated testing rig for microwave devices, which are odd things: you don't have wires for microwaves, you have waveguides, which are basically tubes you can pipe the microwaves through, and which are incredibly fiddly to get situated properly. So, testing one of these things, you're likely to get a failure with no discernible reason for it; you'll tear it down and not find any problems, put it back together, and it'll work just fine.

Well, the engineer writing the test code knew these devices were odd, and that sometimes they'd just fail. So, s/he put in an if block to the effect that, "if this fails once, run the test 30 times and, if it passes 25/30 times, call it a pass." So, every now and again, the entire automated testing line comes to a halt and sits there for 31x the amount of time it should take, and it's not a short test (maybe sat there 30 minutes each iteration).
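The retry logic could at least have been bounded by wall-clock time, something like this sketch (function names and the budget are made up; the 25-of-30 threshold is from the story):

```python
import time

def run_with_retries(test, passes_needed=25, attempts=30, budget_s=60.0):
    """Retry a flaky test, but stop once a wall-clock budget is exhausted."""
    if test():
        return True                          # the common case: pass first time
    deadline = time.monotonic() + budget_s   # the original code had no such bound
    passes = 0
    for _ in range(attempts):
        if time.monotonic() > deadline:
            return False                     # stop stalling the whole line
        if test():
            passes += 1
    return passes >= passes_needed

result = run_with_retries(lambda: True)
```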

I wrote some code that was pulling batches of events off a queue, doing some processing and then writing them out to HDFS.

The inner loop was something like:

    while message:
      converted_event = Event()
      for event in message.events():
        converted_event.set_fields(event)
        write_to_hdfs(converted_event)  # hypothetical sink; we wrote events to HDFS
Can you spot the bug? It led to a month of corrupted data before I noticed...

The `set_fields` method does not clear all fields, so every event had more and more junk data than the one before it. All because I thought I would be clever and get some performance gains by initializing `converted_event` outside the inner loop.
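A sketch of the fix: construct the event inside the loop, at the cost of one allocation per event (`Event` here is a stand-in for the real class):

```python
class Event:
    """Stand-in for the real event class whose set_fields only adds, never clears."""
    def __init__(self):
        self.fields = {}

    def set_fields(self, data):
        self.fields.update(data)  # updates in place; old fields stick around

def convert(messages):
    out = []
    for message in messages:
        for event in message:
            converted_event = Event()  # fresh object per event: no leftover junk
            converted_event.set_fields(event)
            out.append(converted_event.fields)
    return out

result = convert([[{"a": 1}], [{"b": 2}]])
```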

Working on school software, I forgot to add "and IsDeceased = 0" to a query. Turns out parents don't like getting notifications about their dead child's truancy.

A database with dead kids that have to be tested for in every damn query is a pretty nasty database.

Maybe there should be a separate database for historic students who used to go to that school, distinct from the currently enrolled.

It's not just "isDeceased", but "goesToThisSchool". Nobody wants to get a notification from a school about something when their kid doesn't go there any more, for any reason.

> Maybe there should be a separate database for historic students who used to go to that school, distinct from the currently enrolled.

Or, rather than duplicating data, just use a view with appropriate criteria to limit most queries to currently active, living students. But a developer that's called in to build a query generally isn't going to get a lot of mileage out of suggesting rearchitecting the database in either of those ways.

> But a developer that's called in to build a query generally isn't going to get a lot of mileage out of suggesting rearchitecting the database in either of those ways.

It's like you were there :)

It was a third-party product, so changing the structure was out of the question. We had some views, but they pulled in the entire database and ended up with so much duplicate and irrelevant information that they were unusable.

I tried creating a clean set of views like "v_currentStudents" that could then be joined on for information relevant to the current report. I even built a small test suite for them, but getting the support devs (who I was covering for when this happened) to change their cowboy ways was too hard. Management didn't like the idea either; it cut into the billable hours.

Oh, holy shit. This wins the thread, as far as I'm concerned.

I was an intern at a university security lab working on a 7-month project. Early on, I figured it would be a good idea to use SVN to save my work, so I set up a repository and did a few commits, but quickly stopped maintaining the repo.

One hour before the end of my internship, I was ready to leave, my work done, ready to be used by the next person taking over the project. I wanted to clean up my files and documentation so it was all tidy, and I tried to commit my work. Of course SVN couldn't commit, because the repo and my work had nothing left in common. So I typed (on a Linux system) svn delete to clean up the repo so that I could push my files... I lost months of work and was not able to recover the lost files from the file system. I had to leave for my country of origin, since this internship was part of an exchange program. I felt so bad about it; it still haunts me.

Don't let it drown you!

I once wrote a server "clean up" script that moved all .log files older than a few hours to an archive.

Someone else added it to a group policy for all corporate servers, including all our Exchange servers, where the active database transaction logs are named .log.

If we're ever in the same city, I owe you a beverage! Great story.

My code probably contributed to the financial crash of 2007/8.

Unfortunately, I cannot share many details, except that I wrote code that was meant to manage the amount of risk that a certain really big financial institution was supposed to take. My code may or may not have shipped after I left that institution. If it did ship, maybe it did not do what it was supposed to do. If it did not ship, maybe it failed to replace the broken system it was supposed to replace. Either way, months after I left, the head of the institution acknowledged on TV that they were taking on more risk than they intended to.

On the last project I was working on, I built a backend on Node.js v4 for an online course site. For a long time I tried to convince our team leader to switch to Node v6, since it supported ES6 and I couldn't wait to use the new JavaScript features, like classes. However, he was always reluctant to make the switch, since there were other priorities at the time.

At some point, I found out that inserting 'use strict' at the beginning of each Node.js module enabled the experimental ES6 (harmony) features in Node v4. Needless to say, I was super excited and immediately started using classes and other ES6 goodies everywhere, even refactoring already existing modules.

Shortly after that, we noticed that our servers were leaking memory and started crashing almost every day. At the time, I had no idea what the problem was (and believe me, I tried everything to find a solution) until a couple of months later we switched to Node v6, and everything miraculously returned to normal. In the meantime, though, during those two dreadful months between v4 and v6, we had to set up cron to restart our servers every single day at 04:00...

Never use experimental features.

Never use experimental features ... in production

I'll kick it off with my own. I've had a few, but the most dramatic was when I once changed the wrong line in a configuration script and ripped a three-ton (U.S.) mixer out of a concrete floor.

Don't leave it there! Details!

I was working on a control system for cattle feed mills. We had to wire into the system sensor-by-sensor and actuator-by-actuator as they continued to make feed. We started out with the entire system simulated, then gradually ended up with a fully live system.

I (thought I) set an actuator running an auger (screw) that offloaded the feed from the mixer into a leg (a 12-meter-tall vertical screw) to "run always." That should be safe, right? The auger runs all the time, carrying away anything that dumps into it. What I had actually done was set a hay belt to "run always"; it was stuffing the mixer with more and more hay until it was a solid mass inside the box.

Everything seemed fine when we started that next batch of feed... then the mixer started. The lights dimmed and there was this shriek of metal and a bang from the mill floor. We shut down and went out to see this huge mixer hanging off a drive chain at a 45-degree angle from the floor. Bolt heads the size of manhole covers had sheared off and were lying nearby. Fortunately no one had been standing close by. I don't know if my memory matches reality, but I recall light from one of the skylights shining down on it through the grain dust like a spotlight.

I was pretty sure this was going to be my last day on the job.

I walked over to stand next to the Mill Manager, a salty fellow named Marvin with three fingers on his right hand. Marvin looked up the chain and back down to the bolts on the floor and said "Yep, it'll do that."

Two workers lowered it down and welded the bolt heads back in place like they did it every day.

I was with the company for five years. I don't recall ever having a support call from that mill after we finished the installation.

> three-ton (U.S.)

tons and tonnes/'metric tons' are roughly interchangeable, like yards and meters. :)

Wasn't actually my fault: my code ordered the factory to errantly produce several thousand dollars' worth of left-hinged doors. (A guy who should have known better set a bunch of flags that messed up its hinge-determination logic. Anything that was supposed to be produced as one left and one right got produced as two lefts instead.) As everything was build-to-order, it's unlikely any got used, at least for their intended purpose. (I still have a few unused doors around; put some casters on them and you have a nice-looking rolling wooden platform. The laser printer on the floor beside me is sitting on one of those.)

When I was a teenager, I crashed a MUD hosting server by forking a process in a loop. The admin kindly explained ulimit to me. (This was before VPSes were a thing).

I was so mortified, I guess it stuck well enough that that's the worst thing off the top of my head.

But it seems like I'm an underachiever based on this thread.

Being an underachiever on this thread may make you an overachiever at writing code...

Desperately sought just an extra 4K of RAM to see if a LISP expert system would get through a diagnosis on a Huge Aircrash Firefinder maintenance guide - had a kernel license, dug around and found a magic flag for a 4K block - tested it, seemed OK, put it out in the field, and the first time it ran, it grabbed that extra 4K and was instantly rewarded with a "Panic: out of swap space" and the whole damn thing dropped dead :-(

> Huge Aircrash

Is that a jab at Hughes Aircraft? :) Looks like "Firefinder" is some kind of radar system developed by them.

Friendly jab - it was old when I started my career in the '80s :-) And yeah, firefinder is a radar

Helped the armed forces of my country kill people.

So your code worked as intended, and the intention was to kill people since it was used by the military? You sir, have written the most destructive code on this thread.

I'm sorry for how you must feel.

Yeah. I had no idea at the time what its purpose was, either and found out about it after the fact.

It isn't a pleasant thing to live with.

Then you did not want to hurt anyone.

Imagine someone who works as a knife grinder. If he does his job right, the knives will be much more dangerous; they may cause accidents, or some may even be used intentionally as a deadly weapon.

Considering these possibilities, should an ethical knife grinder do a shitty job, quit, or live in self-reproach?

It may be more complicated if you are a gunsmith. But those guns are used by your customers, so to what extent does your ethical evaluation depend on their actions in this case?

For example, if your guns are used in an arms race and eventually help avoid a war, are you a saint? If your guns are used in a victorious war, are you a hero? And if they are used for killing innocents, are you an evil person? Or should you be judged by the average probabilities of global gun usage? Or what?

> Imagine someone who works as a knife grinder. If he does his job right, the knives will be much more dangerous; they may cause accidents

Poorly sharpened knives are far more likely to cause accidental injuries, and serious ones, than well-sharpened knives. At least in kitchen use, though I'd expect the “a dull knife is more likely to fail to cut what you meant, slide off, and strike something else” effect would apply in most uses of knives.

Maybe. It is also possible that better knives cause fewer but more serious accidents, and then it is hard to compare the two. But I'll stop now, because what we have are just plausible hypotheses without any real evidence: theoretical knife science waiting for confirmation... ;)

OSHA[1] recommends keeping knives sharp to prevent restaurant and kitchen maladies from occurring.

The Ohio Bureau of Worker's Compensation[2] recommends the same.

The Bureau of Industrial and Labor Statistics[3] cites dull knives as a common cause for injury, and recommends keeping knives 'sharp and in good trim' to prevent accidents.

In short, "a sharp knife is a safe knife" isn't hokum. When you're pushing a knife into something, you're storing and releasing kinetic energy. A sharper knife requires less kinetic energy to begin cutting the object, which is ostensibly dangerous, but not as dangerous as a failure to cut, which releases all that kinetic energy in uncontrollable fashion.

Past that, in the event that you do get cut by a knife, a sharper knife makes a cleaner cut, which means easier healing, easier care, and (if dire enough) easier reattachment. Oh, and less scarring to boot.

[1] - https://www.osha.gov/SLTC/youth/restaurant/knives_foodprep.h...

[2] - https://www.bwc.ohio.gov/downloads/blankpdf/SafetyTalk-Preve...

[3] - https://books.google.com/books?id=W0M4AQAAMAAJ&pg=PA190&lpg=...

I don't feel responsible. I do feel like having a bit of remorse has been useful for me. It's helped me be much more discriminating in my choice of employer.

Almost anything can be repurposed for killing, so take it easy.

Suppose you work on the GNU grep program. Harmless, right?

Some regime could use that to grep out a list of innocent people to put on a hit list.

If you had no idea what the purpose was, that means the program had conceivable purposes of various kinds, not related to killing, just like grep.

If you develop something that is pretty much only for killing, obviously so, then you know, right? Or else you are capable of incredible denial.

This is interesting. Or is this just a "joke" (albeit dark) using an alternative definition of "worst"?

The person asking wasn't specific about what he or she meant. I understand usually that means in the "how did your code fail in some spectacularly bad way?" sense but I took some liberty to answer.

Almost got me fired on the spot.

One of my first implementations at a bank many years ago... bunch of 'C' levels are in the main branch for my first big launch demo...

Tap a few keys...


LPT: never use this in an else case.

So this was something like:

    default:  /* unreachable case */
       assert(0 && "hell musta just froze over");
that type of thing? Impossible case throwing funny error message?

Exactly what I told it to do. Which seemed perfectly reasonable to me... but it had my boss running down the hall muttering something about damage control; seems not all biologists liked receiving letters introducing them to other biologists whose results on some marker or another differed in some non-trivial way.

Made people rich, who definitely didn't deserve it.

I wrote code for a domain squatter ad control system as my very first task at my very first job out of college. I am not proud and honestly didn't realize what it was until I got pretty far into it.

Not my code, but I was involved in cleaning up the aftermath. Financial company, a programmer had made a one line change to clean up some working directory at the end of a program. Something like

  "rm -rf /var/scratchdir /"
Yeah the space was a typo. Wasn't running as root but was able to make a pretty big mess regardless.

I move things to /tmp now instead of deleting them. Where the margin of error is a single character, "rm" is just too risky.

I did something like that in some code in a from-scratch embedded Linux distro, maybe nine years ago:

  rm -rf $MISPELED_ROOT_DIR/lib/
Oops, the script didn't have "set -u", and I happened to run that as root. So, /lib directory gone.

I managed to recover that machine by copying libs from another one running the same distro.

I developed and maintained CI scripts for a large modular C++ application 10+ years ago. Someone added `rm -rf $(SOME_TEMP_DIR)/` to a global Makefile that was run before building anything. My CI scripts did not set SOME_TEMP_DIR...

Came to work the next day: the nightly build still had not finished on the slave servers, and they had errors about a non-existent home folder when I tried to log in.

What made it worse was that every server mounted an NFS share that contained fingerprints and binaries of different versions of software modules built on different platforms.

Killed all slaves, restored the NFS share from week-old backups on tape; tens of developers could not create new versions of software or send previous versions/patches to customers for a while.

This is about the third time this has happened in this thread. What's the reason for writing the final "/" and not just `rm -rf $(SOME_TEMP_DIR)`?

Really hoping there's at least one Ariane 5 avionics engineer who reads HN...

This one's a doozy.

I was working on Cloud Management software for a Private Cloud at a major tech company in SV. We had software which would reserve Prod IP space for hypervisors, e.g. this hardware SKU can support up to 5 VMs, therefore it needs to reserve 5 IP addresses in the corresponding subnet.

Turned out the API call to reserve the IP space from the IP Manager wasn't asynchronous and because the manager tried to get consecutive space, the runtime increased exponentially with the requested # of IP addresses.

In preparation for Holiday traffic, we were onboarding a new SKU of Hardware. This hardware supported more tenants and so instead of requesting 7 IP addresses per HV, now we're asking for 15. This took the latency of a call to the IP Manager from 3-5 seconds to 5-10 minutes. To round off the perfect storm, the code was retrying requests which failed, without propagating the failure to the Cloud Admins using the software.

One day in October, I received a panicky call from our Capacity manager, customers are trying to spin-up VMs but are being told there's no IP space left. He knows we've onboarded all the racks, and he's done the math on the subnets (which are showing as fully reserved), and there still isn't IP space...WTF!!

Turned out the IP manager's VIP was cutting off requests after a few minutes, (never a possibility when reserving only 7 spaces) but the reservation process wasn't stopping, the IP was being reserved, marked as in-use, but never actually making it to the networking service to be used by VMs.

Solution: At 2am on a Friday night I ran a script to manually mark tens of thousands of production IP records as not-in-use in the IP manager, purely based on grepping through logs from my service, and nslookups. But don't worry, we pinged each IP just to be safe :)
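The general shape of the fix is a deadline plus failure propagation, something like this sketch (the API shape and names are hypothetical):

```python
def reserve_ips(request_fn, count, timeout_s=30.0):
    """Reserve IPs with a deadline, propagating failure instead of retrying silently."""
    try:
        ips = request_fn(count, timeout=timeout_s)
    except TimeoutError as exc:
        # Surface the failure to the cloud admins; a half-done reservation must
        # be rolled back or flagged, never marked in-use and forgotten.
        raise RuntimeError(f"reservation of {count} IPs timed out") from exc
    if len(ips) != count:
        raise RuntimeError(f"asked for {count} IPs, got {len(ips)}")
    return ips
```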

I ran a BBS on an 8-bit microcomputer in the 1980's. I wrote everything myself, including low-level modem drivers in assembly code.

I had some code which handled a temporary loss of carrier. It would poll for the carrier to come back for a few moments, otherwise indicate to layers higher up that carrier is lost, so the user can be logged out.

Problem is, in that piece of code, I forgot to pop something off the stack that I pushed onto the stack. I had a user who was a bit of a cracker. I got a note from the guy, "I got into your operating system by dialing touch tones while connected".

Dialing a touch tone interrupted the carrier sense in the modem, triggering that code with the bad stack handling that would crash the BBS program, leaving the I/O hooks still connected to the modem driver, giving the caller full access to the system.

This didn't reproduce during the usual case when the carrier was lost permanently, only when it recovered.

Can anyone top this one:

  -  function initMultiowned(address[] _owners, uint _required) {
  +  function initMultiowned(address[] _owners, uint _required) internal {
This bug led directly to over $30 million being stolen yesterday. Not my code, but impressive nonetheless.

Hackers have stolen $32 million in Ethereum in the second heist this week http://www.businessinsider.com/report-hackers-stole-32-milli...

Fix initialisation bug. https://github.com/paritytech/parity/commit/e06a1e8dd9cfd8bf...

About fifteen years back I wrote a shell script which ran in the background and was supposed to email the administrator the log file every time it ended with an error. The trouble was, it was an infinite loop (being a background process!) and there was some error. I forgot to tell the code to stop in case of an error. Very dutifully, the program clogged up the company mail server completely with thousands of error-log emails over the weekend: no emails coming in or going out, and one very angry administrator.
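A bounded version of that alerting loop might look like this sketch (names are hypothetical): cap the number of error mails instead of alerting on every iteration.

```python
def monitor(statuses, send_mail, max_mails=3):
    """Mail the admin on errors, but never more than max_mails times."""
    sent = 0
    for status in statuses:          # stand-in for the background polling loop
        if status == "error":
            if sent < max_mails:
                send_mail("job failed, see attached log")
                sent += 1
            else:
                break                # bail out instead of flooding the mail server
    return sent

sent_box = []
n = monitor(["ok", "error", "error", "error", "error", "error"], sent_box.append)
```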

I investigated this bug: backup system, using a tree data structure where the root was a hash describing a backup and the leaves were variable-size chunks of data. Backing up a virtual machine, it would process only the changed areas and rebuild that section of tree. Roughly 1 in a few million backups silently lopped off a branch of the tree, a couple of levels up. Customers have thousands of VMs; we have thousands of customers. Silent data corruption, somewhere, every day. Rarely triggered off-by-one errors in unreproducible data suck.

I was tasked to write a restart function for a desktop application. At the time I was straight out of college with no idea what I was doing, so I asked the lead for some direction.

He told me to:

- write out a script that waits 1 second and then runs the application
- run the script in a separate process
- kill the application

I bet that code is still there. It works, but damn is that cringy.

Not mine, but I've seen someone do the classic of having a Bash script with something like "rm -rf $PATH/", where if you run the script without $PATH set it'll wipe out the whole drive if it has permissions. Took out a CI server, but luckily we had backups.

Edit: OK, this seems like a very common issue!

The Linux version of Steam had one of these for a bit. People were not happy.

Hmm, what's the lesson here to stop this common and very high impact bug then? Never delete directories using Bash scripts + whatever delete function you do use should be locked down to only ever being allowed to act in your app's subfolder + empty path strings aren't allowed?
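One common defense, sketched in Python rather than Bash (the allowed root here is a hypothetical app-owned directory): resolve the target path and refuse to delete anything that isn't strictly inside that root, which also rejects empty or unset paths.

```python
from pathlib import Path
import shutil

ALLOWED_ROOT = Path("/var/myapp/scratch")  # hypothetical app-owned directory

def safe_rmtree(target):
    """Delete target only if it resolves to something strictly inside ALLOWED_ROOT."""
    p = Path(target).resolve()
    root = ALLOWED_ROOT.resolve()
    if p == root or root not in p.parents:
        # Catches "", "/", unset variables resolved to cwd, and ../ escapes.
        raise ValueError(f"refusing to delete {p}: not inside {root}")
    shutil.rmtree(p)
```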

I was testing on a Shared Hosting once and got stuck in a loop, crashing the entire server and everyone on it. I had to get the host to reset it because it just wasn't going to ever end. They weren't mad and didn't penalize me or anything, just told me to be careful.

system("rm -rf $dir/")

I forgot to check my inputs. Ran in production for a backup system.

Been unused and irrelevant.

Had a bug in my stock market scan and missed a trade that would have netted me 20% - easy the biggest trade of the year...

rsync -avz project_files/ root@

Essentially production was not accessible for a little bit...

Accidentally removed our corporate ID from the ad code, very high traffic website. So the ads displayed, but we were not getting paid for the clicks. $140K lost in a few hours. At the time that was almost double my yearly salary.

Nobody got fired, because we had a QA team, and their testing procedure didn't test for something like that.
