
How “Exit Traps” Can Make Bash Scripts More Robust and Reliable - striking
http://redsymbol.net/articles/bash-exit-traps/
======
sigil
Traps are neat but beware, they aren't completely reliable. You can't trap
SIGKILL or SIGSTOP. In the article's examples, a SIGKILL would (1) leave temp
directories around, (2) fail to restart a service, and (3) leave an expensive
AWS resource running.

Remember that SIGKILL isn't always the result of a human typing `kill -9`
either: the Linux OOM killer sends it; all unixes potentially send it during
shutdown and runlevel switching; programs like timeout(1) send it as a last
resort.

Here are some other ways to approach the 3 examples:

1) Avoid temp files and directories if you can. Sometimes you can't, but
anecdotally I come across LOTS of shell scripts that create a temp file when
they could have used a pipe. Bonus: pipes are fast.

2) Ensuring a service comes back up after maintenance: use a process
supervisor with automatic restarts, and have the service script grab a startup
lockfile first thing. Use a blocking flock(1) or setlock(8) and discard the
lock fd immediately afterwards. To bring the service down for maintenance,
grab the startup lockfile, stop the service, then do your thing. Once your
maintenance script exits -- through any means, including SIGKILL -- the kernel
automatically releases the lock and the hitherto blocked service continues
starting up.

3) Capping expensive resources: if the EC2 instance truly is temporary, why
not impose a timeout and police all such instances out-of-band, with an alarm?
The article is right that omissions of this kind can be $$$.
[https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori...](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html)
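
For (2), a rough sketch of the startup-lockfile idea, assuming util-linux flock(1) is available. The lock path, the function names, and the commented-out service and supervisor commands are made up for illustration; adapt them to your own setup.

```shell
#!/usr/bin/env bash
# Sketch only: LOCK and the commented-out commands are hypothetical.
LOCK="${LOCK:-${TMPDIR:-/tmp}/myservice.startup.lock}"

# Service start script: wait until nobody holds the startup lock, then
# drop the lock fd right away so the running service never holds it.
service_start() {
    exec 9>"$LOCK"     # open (and create) the lockfile on fd 9
    flock 9            # blocks while a maintenance script holds the lock
    exec 9>&-          # discard the lock fd immediately
    echo "service starting"
    # exec /usr/local/bin/myservice    # hypothetical service binary
}

# Maintenance script: grab the lock first, then stop the service.
maintenance() {
    exec 8>"$LOCK"
    flock 8            # held for as long as this process keeps fd 8 open
    # sv down myservice                # hypothetical supervisor command
    echo "doing maintenance"
    # No explicit unlock: the kernel drops the lock when the process
    # exits, whatever the cause (SIGKILL included).
}
```

In practice each function would be its own script, so the lock lifetime is tied to the lifetime of the maintenance process.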

~~~
tzs
> Traps are neat but beware, they aren't completely reliable. You can't trap
> SIGKILL or SIGSTOP.

A long time ago, there actually was a sneaky way on some Unixes to trap
SIGKILL. If a program was being run under ptrace then any signal would pause
the program and alert the program that was doing the trace--even SIGKILL.

So I made a program that I named "sh" and carefully made to have the same
memory size as /bin/sh, that just forked and exec'ed another program of mine
under ptrace. The other program was named "superman". Whenever my fake sh
received notification that "superman" had received a signal, it would write
the number of the signal into a variable in superman's address space, and then
make it so "superman" continued but with the signal changed to SIGINT. The
SIGINT handler in "superman" would check that variable to see the real
signal, and print an appropriate smart remark.

I started this running, then went to the head system admin/system programmer
and told him something was wrong and I couldn't kill my program. After seeing
that ^C and ^\ did nothing useful he logged into another terminal, became
root, found "superman" with ps, and did a kill -9.

The look on his face was priceless when "superman" just printed something like
"SIGKILL is not strong enough to harm a Kryptonian!" and continued running.

I was a little sad when later Unixes made SIGKILL kill processes being traced.

~~~
sigil
I love this story. Thanks for sharing!

------
redsymbol
Author of article here (but not submitter). I wrote this years ago, and it's
fun to see it pop up on HN like this every year or two!

Here's another bash article people like:

[http://redsymbol.net/articles/unofficial-bash-strict-mode/](http://redsymbol.net/articles/unofficial-bash-strict-mode/)

~~~
tzahola
In the past few years, whichever company I was working at, I was relentlessly
evangelising the unofficial bash strict mode!

~~~
v_lisivka
Bash strict mode forces you to write clean scripts, with each potential error
handled. I have used it for more than a decade, and I wrote a lot of bash
scripts at Bazaarvoice.

BTW, I developed a small set of common functions for strict mode, called
bash-modules. Can you look at it and provide some feedback? See
[https://github.com/vlisivka/bash-modules](https://github.com/vlisivka/bash-modules).

------
hossbeast
Your process can always segfault, or the machine lose power. A more robust way
of handling this kind of thing is to always clean up the leftovers from
previous runs, at the start of the program. This also has the advantage that
the state from the most recent run is always available, for debugging /
analysis.
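
A minimal sketch of that clean-at-start pattern, assuming the script keeps all its scratch state under one directory. The directory name and the `start_fresh` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the state directory name is a hypothetical example.
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}/myscript.state}"

# Remove whatever the previous run left behind, then start fresh.
# Nothing is deleted on exit, so the latest run's files stay around
# for debugging.
start_fresh() {
    rm -rf -- "$STATE_DIR"
    mkdir -p -- "$STATE_DIR"
}

start_fresh
echo "run started at $(date)" > "$STATE_DIR/run.log"
```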

Also, on Linux you can use unnamed temp files; see O_TMPFILE in
[http://man7.org/linux/man-pages/man2/open.2.html](http://man7.org/linux/man-pages/man2/open.2.html)

~~~
jwilk
How do you use O_TMPFILE from a shell script?

~~~
hossbeast
You can't use it directly from bash, but it's readily available in perl, with
the POSIX module.

Even in bash, you can unlink your temp files before you use them, and pass
them around by file descriptor rather than by name.
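
Roughly like this, as a sketch. The fd numbers 3 and 4 are arbitrary choices. There is still a short window between `mktemp` and `rm` in which a kill would leave the file behind.

```shell
#!/usr/bin/env bash
# Sketch only: fd numbers 3 and 4 are arbitrary.
tmp="$(mktemp)"
exec 3>"$tmp" 4<"$tmp"   # one descriptor for writing, one for reading
rm -f -- "$tmp"          # unlink the name; the data lives on until the
                         # descriptors are closed or the process dies

echo "scratch data" >&3  # write through the descriptor, not the name
exec 3>&-                # done writing

read -r line <&4         # read it back through the other descriptor
exec 4<&-                # last close: the kernel frees the storage
```

Two descriptors are opened up front because bash has no way to rewind a descriptor after writing to it.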

~~~
jwilk
> it's readily available in perl, with the POSIX module

Oh? I don't see anything like this in the Perl codebase.

On the other hand, O_TMPFILE is available in Python since 3.4:

[https://docs.python.org/3/whatsnew/3.4.html#os](https://docs.python.org/3/whatsnew/3.4.html#os)

------
pedro84
I've used a variation of this that catches other signals as well:

    
    
      trap 'rc=$?; trap "" EXIT; cleanup $rc; exit $rc' INT TERM QUIT HUP
      trap 'cleanup; exit' EXIT

~~~
eridius
This is unnecessary. The EXIT trap fires on any exit from the shell, not just
graceful exit.

~~~
jwilk
The behavior varies with shell:

    
    
        $ cat test_trap 
        trap 'echo exiting' EXIT 
        kill $$
        
        $ bash test_trap
        exiting
        Terminated
        
        $ ksh test_trap
        exiting
        Terminated
    
        $ mksh test_trap
        exiting
        
        $ dash test_trap
        Terminated
        
        $ zsh test_trap
        Terminated

~~~
eridius
This article is specifically about Bash. I'm not surprised that `trap`
behavior varies per shell, and I'd recommend reading the documentation on it.

------
Ysx
Beware that you can only have one EXIT handler. I've used this in the past
figuring it's analogous to Go's `defer`, but unfortunately not.

E.g., this script:

    
    
        #!/bin/bash
        trap 'echo Handler 1' EXIT
        trap 'echo Handler 2' EXIT
    

Will only call 'Handler 2' on exit.
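
One possible workaround, as a sketch: keep a list of commands and run them all from a single EXIT trap. The `defer` and `run_deferred` names are my own hypothetical helpers, not a bash builtin.

```shell
#!/usr/bin/env bash
# Sketch only: `defer` and `run_deferred` are hypothetical helper names.
deferred=()

# Queue a command string to run at exit.
defer() {
    deferred+=("$1")
}

# Run queued commands in reverse order, like Go's defer.
run_deferred() {
    local i
    for (( i = ${#deferred[@]} - 1; i >= 0; i-- )); do
        eval "${deferred[i]}"
    done
    deferred=()
}

trap run_deferred EXIT   # the single real EXIT handler

defer 'echo Handler 1'
defer 'echo Handler 2'
```

On exit this prints "Handler 2" and then "Handler 1". Since the queued strings go through `eval`, they need the same quoting care as a normal `trap` argument.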

~~~
orivej
I wrote `defer` (or rather `atexit`) for Bash:
[https://github.com/orivej/bash-traps/blob/master/atexit.sh](https://github.com/orivej/bash-traps/blob/master/atexit.sh)

------
natecavanaugh
The author mentions that he's always discovering new use cases for this, and
it does look handy. I wonder if there are any cookbook/recipe style documents
around that showcase some of those?

------
fao_
I get an SSL error for this page:

    
    
      redsymbol.net uses an invalid security certificate. The 
      certificate is only valid for the following names: 
        mobilewebup.com, www.mobilewebup.com 
      The certificate expired on 08 July 2011, 04:16. 
      The current time is 14 January 2018, 02:14.
      Error code: SSL_ERROR_BAD_CERT_DOMAIN

~~~
eridius
It's an http: link. Why are you trying to load it as https:?

~~~
simcop2387
Bad detection by HTTPS Everywhere or a similar browser extension is what I'd
expect.

~~~
fao_
Oh wow. Thank you for that. I forgot I had set it to force HTTPS.

------
sigzero
Korn Shell has trap as well.

~~~
kps
Although many of bash's extensions were borrowed from ksh (occasionally even
compatibly), 'trap' was in the original Bourne sh.

