
Unix Administration Horror Stories (1993) - amyjess
http://www.yak.net/carmen/unix_horror_stories
======
Smushman
I started out with no experience working for a company that provided services
for telephony.

Our company bought out a telephony billing software company (we had been
leasing the software previously). The deal included the guy who had written
the first release and every subsequent one, by himself, for years. This
matters as it explains the mindset - there was a low level of trust between us.

So that guy who wrote the software didn't want us messing with his
servers/workstations. His solution? To remove the default route. As we were
located in different cities, it meant we could only access his systems when he
would add routes, which he would remove afterward.

My company needed access to a server one day, and they asked if I could
remotely get that going.

Since we paid the bills, we owned the internet router, and I realized I could
hop from that router to his servers and add the default route at will.

This worked great and everyone was happy (or didn't know we were doing it),
except for me. I was determined to automate that.

So I wrote up a script that would log in to the router, then the server, then
add the route, and log off.

However, what I didn't understand was that without wait states between commands
it was possible under the right circumstances to overflow a router. This was
exactly what happened while I was testing my script... A bunch of garbage
flashed across the screen, my telnet session disconnected, and suddenly I
could no longer ping their router.

Now we had to tell that guy what happened, so he could meet the phone company
engineer on site to reprogram the router.

Just because something works a few times doesn't mean it's good to go...
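
For what it's worth, the missing wait states can be sketched in a few lines of POSIX shell. This is only an illustration, not the script from the story: `send_paced` is a made-up helper, and the delay and the commands in the usage comment are placeholders, not anything a particular router is known to need.

```shell
#!/bin/sh
# send_paced DELAY CMD...: print each command on its own line, then sleep
# DELAY seconds so the far end has time to process it before the next one.
send_paced() {
    delay=$1
    shift
    for cmd in "$@"; do
        printf '%s\n' "$cmd"
        sleep "$delay"
    done
}

# Hypothetical usage (host and commands are placeholders):
#   send_paced 2 'first command' 'second command' 'exit' | telnet router.example
```

The point is just that each line goes out on its own, with a pause, instead of the whole script arriving in one burst.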

------
6t6t6
One of the first things they told me on my first day as an operator in the
data-center of a big phone company was: "Remember, in Unix it is not Ctrl + C,
Ctrl + V but Ctrl + Insert, Shift + Insert".

One night, in my second week there, I saw some errors on the console coming
from a billing process that had been running for a couple of days. I
called the on-call engineer in charge of that process and he asked me if I
could copy-paste the errors and email them to him...

That month, about 2 million clients received the bills one day too late.

~~~
photosinensis
I'm so used to X11 remapping it to Ctrl+Shift+C, Ctrl+Shift+V that I didn't
know about this.

This is unfortunate, because Chrome will map Ctrl+Shift+C to the developer
console, meaning I open the page inspector instead of copying the line,
leading to much pain and consternation on my part.

------
salgernon
Not rm related, but I was recently trying to revive a NeXTStation, and
installing from CD left me with a system that booted to single user with a
read-only file system. The /etc/fstab had been written with the CD as sda and
the hard drive as sdb. I couldn't create a mount point, like /tmp/foo, because
of the read-only file system. (I tried to mount a floppy on /tmp, but got the
same read-only file system error.)

On a whim I dumped the strings from the mount command and noticed two hard
coded paths: /tmp/mnta and /tmp/mntb. Sure enough I was able to mount the root
file system (again) read-write to update the file.

No emacs at this point. I did have vi, but it creates a /tmp/file and that
directory isn't writable... So I ended up using sed to mangle my
fstab.
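
Roughly the kind of sed edit I mean, as a sketch: sed streams the file to stdout, so it needs no scratch file in /tmp. The `swap_devices` helper, the `__TMP__` placeholder, and the exact device strings are illustrative assumptions, not the real NeXT entries.

```shell
#!/bin/sh
# swap_devices FILE: print FILE with every "sda" and "sdb" exchanged.
# A placeholder token keeps the two substitutions from undoing each other.
swap_devices() {
    sed -e 's/sda/__TMP__/g' \
        -e 's/sdb/sda/g' \
        -e 's/__TMP__/sdb/g' "$1"
}

# Hypothetical usage: write the result next to the original, then swap it in.
#   swap_devices /etc/fstab > /etc/fstab.new && mv /etc/fstab.new /etc/fstab
```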

Out of curiosity, how would Windows deal with a similar problem (corrupted
registry?). Any Windows stories of triumphs?

------
INTPenis
Could someone explain what the sccs account does on DEC?

On creative uses of rm(1) I will never forget my first IT job. I was sitting
next to the huge vault door that led to our servers. The boss was in the room
next to mine. Small operation of just 4 people and I was 19 years old.

Suddenly the boss comes running past my desk and into the server room, seconds
later one of our FreeBSD web servers goes down.

After some panting and insistent question asking I find out that he
accidentally ran rm -rf /home &. Apparently he hit the ampersand instead of
typing out the users directory that he wanted to remove. So his solution to
this was to run to the server and pull the power. :)

Not going to debate on whether his solution was better than killing the pid
because we had to recover some user dirs from tapes anyways so it was
inevitable.

And before anyone mentions it, yes it had HW RAID with a battery. ;)

Edit: Ok I immediately realized sccs could be source control but the poster in
the thread doesn't really make it clear how it affected the system so I wasn't
sure.

~~~
sweetcakes_2600
>Apparently he hit [something else] instead of typing out the [...] directory
that he wanted to remove.

This exact reason is why I always and without fail type

      rm /some/directory/^A^[[C^[[C -rf

(where ^A means Ctrl+A and ^[[C means right arrow key)

instead of

      rm -rf /some/directory/

------
camperman
I once typed something like:

echo "test mail message" > /usr/bin/qmail

instead of:

echo "test mail message" | /usr/bin/qmail

and went on holiday. Many lessons learned.
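
One guard against that particular slip is the shell's noclobber option (`set -C`), which makes `>` refuse to overwrite an existing file. A small sketch, run against a scratch file rather than a real mailer; the `scratch` directory and the `mailer` file name are stand-ins I made up.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
printf 'pretend binary\n' > "$scratch/mailer"

set -C   # noclobber: '>' refuses to overwrite an existing file

# The mistaken redirect now fails instead of replacing the file.
if (echo "test mail message" > "$scratch/mailer") 2>/dev/null; then
    echo "overwritten"
else
    echo "refused"
fi

set +C   # back to the default behaviour
cat "$scratch/mailer"
```

With noclobber on, a deliberate overwrite has to be spelled `>|`, which is hard to type by accident.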

~~~
gruez
Why would you use an absolute path for qmail?

~~~
dredmorbius
Not trusting root's path is defensive practice.

What's 'qmail'?

~~~
ciupicri
If root's PATH environment variable can be compromised, then you might as well
ditch the whole system.

~~~
dredmorbius
It may not be _common_ practice (I _typically_ rely on $PATH). But it _is_,
as I noted, _defensive_.

There are numerous ways in which $PATH can change. Shared profiles and other
users modifying root's init among them. Intent isn't necessarily malicious. It
can still be unexpected.

~~~
rhinoceraptor
There was a Steam script that had a nasty habit of running '# rm /*' due to an
empty script variable.
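
The usual defence against the empty-variable case is the `${var:?}` expansion, which aborts instead of expanding to nothing. A minimal sketch, not the actual Steam script; `clean_dir` is a hypothetical helper.

```shell
#!/bin/sh
# clean_dir DIR: remove everything inside DIR, refusing an empty argument.
clean_dir() {
    # ${1:?message} stops with an error if $1 is unset or empty, so the
    # glob below can never degrade to a bare "/*".
    target=${1:?clean_dir: directory argument is empty}
    rm -rf -- "$target"/*
}
```

Called with an empty string, a non-interactive shell prints the message and exits before rm ever runs.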

~~~
dredmorbius
I _do not_ trust vendor scripts.

Really fscking annoying: scripts that include inline binary data.

------
e12e
One of my first jobs was as first, second and third line support at our
university. The support group had an ethos of being pro-active and creative -
so when I saw a zero-day for local root on Solaris 5.8(?) posted to bugtraq -
my first inclination was to a) test it b) test if the sys.admin group was
aware/had patched the servers. Since we ran just our own linux boxes, I
reached for the first Solaris box on hand, the employee file (and maybe mail?)
server. With some 2 (two)-10.000 user accounts. Exploit compiled, ran... and
the box crashed.

Thus I learned that exploits can be unreliable - not binary get root or
nothing. The phone call down to the sys.admins was a fantastic serving of
humblepie. At least they were happy about the heads-up about needing to patch
the servers...

I now _also_ know not to run random code from the Internet (even if I had at
least read through it) - and that some people kiddie-proof their public zero-
days in order to save busy sys.admins some time putting out fires lit by naive
support staff and users...

------
nailer
My own horror story: I mounted an NFS share to /tmp while working, and forgot
to unmount it later.

Think about it.

Keep thinking about it.

Still keep thinking about it.

Still not sure why that's dangerous?

Give up?

tmpwatch removes files from /tmp whose modification time is older than a
certain number of days.

I owned up, but a colleague _swore_ he had evidence to prove it wasn't me,
even though I was sure it was.

~~~
e1ven
Sorry to hear it! Did you mount it to /tmp directly, or a subdirectory under
/tmp? The reason I ask is the manpage says that tmpreaper should not switch
filesystems, so I would think it shouldn't ever descend to /tmp/mountpoint :/
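
To illustrate what "not switching filesystems" looks like, here is a sketch using find's `-xdev`, which stays on the filesystem of the starting directory. It only lists candidates and is not how tmpwatch or tmpreaper is implemented; `list_stale` is a name I made up.

```shell
#!/bin/sh
# list_stale DIR DAYS: print regular files under DIR that have not been
# modified for more than DAYS days, without descending into directories
# that live on a different mounted filesystem.
list_stale() {
    find "$1" -xdev -type f -mtime +"$2" -print
}

# Hypothetical usage: list_stale /tmp 10
```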

~~~
nailer
/tmp/mountpoint. IIRC it was tmpwatch on RHEL 4.

------
deathanatos
In the rm section of horror, one story hints at, and another explicitly
notes,

> .* expands to ../*

Not on my box it doesn't. Was this true in some (rather cruel) past?

~~~
wazoox
Actually it simply expands to .., so rm -rf .* happily deletes ../, therefore
./, etc. If you're in /tmp at the time you're making this mistake, it will
clobber /. Current GNU rm may be more careful, though, and warn you of
unintended consequences.

~~~
deathanatos
Actually, that was my point: for me, .* does not expand to .. or .; the
special entries "." and ".." appear to not be considered during globbing.

After closer inspection, it turns out that this is because I'm a zsh user:

      % zsh -c 'echo .*'
      zsh:1: no matches found: .*
      % bash -c 'echo .*'
      . ..

Indeed, zsh's manual states:

> _No filename generation pattern matches the files `.' or `..'._

… and I've just added another entry to my list of reasons as to why I love
zsh.
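
For shells where .* does pick up "." and "..", a pair of patterns that skips them can be sketched like this. It assumes a POSIX sh (in zsh the unmatched pattern would raise an error instead), and `list_dotfiles` is a made-up helper name.

```shell
#!/bin/sh
# list_dotfiles DIR: print the hidden entries in DIR, one per line,
# never "." or "..". The body runs in a subshell so the cd does not leak.
list_dotfiles() (
    cd "$1" || exit 1
    # .[!.]* : a dot followed by any character except a dot
    # ..?*   : two dots followed by at least one more character
    for f in .[!.]* ..?*; do
        # An unmatched pattern is left as literal text; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        printf '%s\n' "$f"
    done
)
```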

~~~
nailer
Same safe behaviour happens when using bash and BSD rm (OS X):

        $ mkdir foo
        $ cd foo
        $ touch 1
        $ touch 2
        $ touch 3
        $ rm *
        $ ls -la .
        total 0
        drwxr-xr-x   2 mike  wheel   68  3 Sep 14:41 .
        drwxrwxrwt  18 root  wheel  612  3 Sep 14:41 ..

------
mmosta
If you'd care for some fiction, the BOFH [0] series is tangentially related.

[0] http://bofh.ntk.net/BOFH/index.php

------
johnchristopher
I see a lot of `it bites if you forget this very simple glob thing' on Unix.

Do such horror stories exist for Windows?

------
kjs3
Tangentially, the fellow who set up yak.net was my (and many other GaTech
students') first Unix mentor in the mid-80s. I owe him a pretty big debt.

