Unix Administration Horror Stories (1993) (yak.net)
59 points by amyjess on Sept 2, 2015 | 35 comments



I started out with no experience, working for a company that provided telephony services.

Our company bought out a telephony billing software company (we had been leasing the software previously). The deal included the guy who had written the first release and every subsequent one, by himself, for years. This matters because it explains the mindset: there was a low level of trust between us.

So that guy who wrote the software didn't want us messing with his servers/workstations. His solution? To remove the default route. As we were located in different cities, it meant we could only access his systems when he would add routes, which he would remove afterward.

My company needed access to a server one day, and they asked if I could remotely get that going.

Since we paid the bills, we owned the internet router, and I realized I could hop from that router to his servers and add the default route at will.

This worked great and everyone was happy (or didn't know we were doing it), except for me. I was determined to automate that.

So I wrote up a script that would log in to the router, then the server, then add the route, and log off.

However, what I didn't understand was that, without wait states between commands, it was possible under the right circumstances to overflow a router. This was exactly what happened while I was testing my script... A bunch of garbage flashed across the screen, my telnet session disconnected, and suddenly I could no longer ping their router.

Now we had to tell that guy what happened, so he could meet the phone company engineer on site to reprogram the router.

Just because something works a few times doesn't mean it's good to go...
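
A minimal sketch of the missing wait states, assuming a hypothetical helper named send_paced whose output is fed to whatever client drives the session (the helper name, the delay, and the commands are illustrative, not the original script):

```shell
# send_paced DELAY CMD...: print each command on its own line, pausing
# DELAY seconds after each one so the far end has time to process it.
# Hypothetical helper for illustration only.
send_paced() {
    delay=$1
    shift
    for cmd in "$@"; do
        printf '%s\n' "$cmd"
        sleep "$delay"
    done
}
```

A fixed delay is the crudest form of pacing. Waiting for the device's prompt before sending the next line, as tools like expect do, is more robust.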


One of the first things they told me on my first day as an operator in the data-center of a big phone company was: "Remember, in Unix it is not Ctrl + C, Ctrl + V but Ctrl + Insert, Shift + Insert".

One night, in my second week there, I saw some errors on the console coming from a billing process that had been running for a couple of days. I called the on-call engineer in charge of that process and he asked me if I could copy-paste the errors and email them to him...

That month, about 2 million clients received the bills one day too late.


I'm so used to X11 remapping it to Ctrl+Shift+C, Ctrl+Shift+V that I didn't know about this.

This is unfortunate, because Chrome will map Ctrl+Shift+C to the developer console, meaning I open the page inspector instead of copying the line, leading to much pain and consternation on my part.


Not rm related, but I was recently trying to revive a NeXTStation, and installing from CD left me with a system that booted to single user with a read-only file system. The /etc/fstab had been written with the CD as sda and the hard drive as sdb. I couldn't create a mount point like /tmp/foo because of the read-only file system. (I tried to mount a floppy on /tmp, but got the same read-only file system error.)

On a whim I dumped the strings from the mount command and noticed two hard coded paths: /tmp/mnta and /tmp/mntb. Sure enough I was able to mount the root file system (again) read-write to update the file.

No emacs at this point, but I did have vi. However, vi creates a file under /tmp, and that directory wasn't writable... So I ended up using sed to mangle my fstab.

Out of curiosity, how would Windows deal with a similar problem (a corrupted registry?). Any Windows stories of triumph?


Could someone explain what the sccs account does on DEC?

On creative uses of rm(1) I will never forget my first IT job. I was sitting next to the huge vault door that led to our servers. The boss was in the room next to mine. Small operation of just 4 people and I was 19 years old.

Suddenly the boss comes running past my desk and into the server room, seconds later one of our FreeBSD web servers goes down.

After some panting and insistent question asking I find out that he accidentally ran rm -rf /home &. Apparently he hit the ampersand instead of typing out the users directory that he wanted to remove. So his solution to this was to run to the server and pull the power. :)

Not going to debate on whether his solution was better than killing the pid because we had to recover some user dirs from tapes anyways so it was inevitable.

And before anyone mentions it, yes it had HW RAID with a battery. ;)

Edit: Ok I immediately realized sccs could be source control but the poster in the thread doesn't really make it clear how it affected the system so I wasn't sure.


>Apparently he hit [something else] instead of typing out the [...] directory that he wanted to remove.

This exact reason is why I always and without fail type

  rm /some/directory/^A^[[C^[[C -rf
(where ^A means Ctrl+A and ^[[C means right arrow key)

instead of

  rm -rf /some/directory/


I once typed something like:

  echo "test mail message" > /usr/bin/qmail

instead of:

  echo "test mail message" | /usr/bin/qmail

and went on holiday. Many lessons learned.
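
One guard against this class of typo is the shell's noclobber option (set -C), which makes a plain > refuse to overwrite an existing file. A minimal sketch, with try_overwrite as a hypothetical wrapper used only to demonstrate the behaviour:

```shell
# try_overwrite FILE TEXT: attempt 'echo TEXT > FILE' with noclobber enabled.
# Runs in a subshell so the option does not leak into the caller.
# Returns non-zero (and leaves FILE untouched) if FILE already exists.
try_overwrite() {
    (
        set -C                 # noclobber: '>' will not truncate existing files
        echo "$2" > "$1"
    ) 2>/dev/null
}
```

With noclobber on, >| still forces an overwrite when one is really intended.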


why would you use absolute path for qmail?


This was years ago - about 15 I think - and I can't actually remember why. I could lie and say it was something that djb wrote in the manual about best practice at the time but I think it was more likely habit.


Not trusting root's path is defensive practice.

What's 'qmail'?


If root's PATH environment variable can be compromised, then you might as well ditch the whole system.


It may not be common practice (I typically rely on $PATH). But it is, as I noted, defensive.

There are numerous ways in which $PATH can change. Shared profiles and other users modifying root's init among them. Intent isn't necessarily malicious. It can still be unexpected.


There was a Steam script that had a nasty habit of running '# rm /*' due to an empty script variable.
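
A minimal sketch of how a script can defend against an empty variable before deleting, assuming a hypothetical helper named safe_clean (this is not the vendor's script, just an illustration of the guard):

```shell
# safe_clean DIR: remove the visible contents of DIR, but refuse to run
# when DIR is empty or "/". Hypothetical helper for illustration only.
safe_clean() {
    dir=$1
    case "$dir" in
        ""|"/")
            echo "refusing to clean '$dir'" >&2
            return 1
            ;;
    esac
    # ${dir:?} is a second line of defence: the expansion fails with an
    # error if dir is unset or empty, so the path can never collapse to "/*".
    rm -rf -- "${dir:?}"/*
}
```

The ${var:?} form is standard parameter expansion, so it can be dropped into an existing rm line without restructuring the script.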


I do not trust vendor scripts.

Really fscking annoying: scripts that include inline binary data.


The first google result for qmail:

> qmail is a mail transfer agent (MTA) that runs on Unix. It was written, starting December 1995, by Daniel J. Bernstein as a more secure replacement for the popular Sendmail program; qmail's source code is in the public domain.


Wrong question.

Not "what is the application 'qmail'" (I'm quite aware, thanks). But "what does the token 'qmail' evaluate to in the shell".

If you prefer: "which qmail".


That's a good question. The mail client (most boxes use mailx to implement it these days) is normally just 'mail', and qmail would normally be in an sbin dir.


Note: the question applies to all executed commands.


Sorry, I misunderstood you. I retract my snark :)


Habit from scripting. Remember that users can modify their own paths, possibly to prioritize ~/bin over everything else (which may have unexpected behavior).
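
A small sketch of that shadowing effect, assuming a hypothetical helper named which_wins that reports what a command name resolves to once a given directory is placed first in PATH:

```shell
# which_wins NAME DIR: print the path NAME resolves to when DIR is
# prepended to PATH. Runs in a subshell so the caller's PATH is untouched.
# Hypothetical helper for illustration only.
which_wins() {
    (
        PATH="$2:$PATH"
        command -v "$1"
    )
}
```

For example, if ~/bin contains an executable called ls, then which_wins ls ~/bin prints the copy in ~/bin rather than the system one, which is exactly what an absolute path in a script sidesteps.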


One of my first jobs was as first, second and third line support at our university. The support group had an ethos of being pro-active and creative - so when I saw a zero-day for local root on Solaris 5.8(?) posted to bugtraq - my first inclination was to a) test it b) test if the sys.admin group was aware/had patched the servers. Since we ran just our own linux boxes, I reached for the first Solaris box on hand, the employee file (and maybe mail?) server. With some 2 (two)-10.000 user accounts. Exploit compiled, ran... and the box crashed.

Thus I learned that exploits can be unreliable - not binary get root or nothing. The phone call down to the sys.admins was a fantastic serving of humblepie. At least they were happy about the heads-up about needing to patch the servers...

I now also know not to run random code from the Internet (even if I had at least read through it) - and that some people kiddie-proof their public zero-days in order to save busy sys.admins some time putting out fires lit by naive support staff and users...


My own horror story: I mounted an NFS share to /tmp while working, and forgot to unmount it later.

Think about it.

Keep thinking about it.

Still keep thinking about it.

Still not sure why that's dangerous?

Give up?

tmpwatch removes files from /tmp whose modification time is older than a certain number of days.

I owned up, but a colleague swore he had evidence to prove it wasn't me, even though I was sure it was.


Sorry to hear it! Did you mount it to /tmp directly, or a subdirectory under /tmp? The reason I ask is the manpage says that tmpreaper should not switch filesystems, so I would think it shouldn't ever descend to /tmp/mountpoint :/


/tmp/mountpoint. IIRC it was tmpwatch on RHEL 4.
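
For comparison, a sketch of an age-based cleanup that stays on one filesystem, written with plain find rather than tmpwatch or tmpreaper (purge_old is a hypothetical name, and this is not how either tool is implemented):

```shell
# purge_old DIR DAYS: delete regular files under DIR whose modification
# time is more than DAYS days old. -xdev stops find from descending into
# directories that live on a different filesystem than DIR itself.
# Hypothetical helper for illustration only.
purge_old() {
    find "$1" -xdev -type f -mtime +"$2" -exec rm -f -- {} +
}
```

Note that -xdev only protects mounts below the starting directory. If the share is mounted directly on the starting directory, everything on it is fair game.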


In the rm section of the horror stories, one story hints at, and another explicitly notes, that

> .* expands to ../*

Not on my box it doesn't. Was this true in some (rather cruel) past?


Actually it simply expands to .., so rm -rf .* happily deletes ../, therefore ./, etc. If you're in /tmp at the time you're making this mistake, it will clobber /. Current GNU rm may be more careful, though, and warn you of unintended consequences.


Actually, that was my point: for me, .* does not expand to .. or .; the special entries "." and ".." appear to not be considered during globbing.

After closer inspection, it turns out that this is because I'm a zsh user:

  % zsh -c 'echo .*'
  zsh:1: no matches found: .*
  % bash -c 'echo .*'
  . ..
Indeed, zsh's manual states:

> No filename generation pattern matches the files `.' or `..'.

… and I've just added another entry to my list of reasons as to why I love zsh.


Same safe behaviour happens when using bash and BSD rm (OS X):

    $ mkdir foo
    $ cd foo
    $ touch 1
    $ touch 2
    $ touch 3
    $ rm *
    $ ls -la .
    total 0
    drwxr-xr-x   2 mike  wheel   68  3 Sep 14:41 .
    drwxrwxrwt  18 root  wheel  612  3 Sep 14:41 ..


I thought the spec on rm says it cannot delete the directory you are currently in? http://pubs.opengroup.org/onlinepubs/009604499/utilities/rm....

Plus, it mentions . and .. as no-nos


I think it is not a rm thing but a bash (or sh) thing. The rm command receives the parameter after the wildcards have been expanded.

Unix in the past was for the tough men...


I'm not sure, but my personal horror story on OS X was when I typed mkdir ~ then tried to delete my new folder. Needless to say, I'm now paranoid about backups ;-)


Things did used to work this way. As a result of some early trauma I have always written it as 'rm -rf .??*' to avoid matching '.' or '..'.
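
One caveat with '.??*' is that it also skips two-character names such as '.a'. A sketch of a pattern pair that covers those while still never matching '.' or '..', wrapped in a hypothetical list_dotfiles helper so it only prints instead of deleting:

```shell
# list_dotfiles DIR: print the hidden entries in DIR, one per line,
# never "." or "..". Hypothetical helper for illustration only.
#   .[!.]*  matches a dot followed by any character except a dot
#   ..?*    matches two dots followed by at least one more character
list_dotfiles() {
    (
        cd "$1" || exit 1
        for f in .[!.]* ..?*; do
            # An unmatched pattern is left as literal text; skip it.
            [ -e "$f" ] || [ -L "$f" ] || continue
            printf '%s\n' "$f"
        done
    )
}
```

Printing the expansion first, as here or with a plain echo, is a cheap way to see what a glob will hit before handing it to rm.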


If you'd care for some fiction, the BOFH [0] series is tangentially related.

[0] http://bofh.ntk.net/BOFH/index.php


I see a lot of `it bites if you forget this very simple glob thing' on Unix.

Do such horror stories exist for Windows?


Tangentially, the fellow who set up yak.net was my (and many other GaTech students') first Unix mentor in the mid-80s. I owe him a pretty big debt.



