Defer for Shell (2020) (cedwards.xyz)
91 points by harporoeder on June 20, 2023 | hide | past | favorite | 49 comments


> This is known as idempotency.

No? Idempotency is ensuring that calling the same function with the same input N times will have the exact same effect as calling it once. Adding this defer does _not_ guarantee idempotency, because it has no impact on the body of the script. You can provide just as much idempotency by running `rm -rf /tmp/$script_dir*` on the first line.


https://arslan.io/2019/07/03/how-to-write-idempotent-bash-sc...

I am not sure how authoritative this is but I immediately thought of this post. (It's the vim-go guy)


While this is great, this only cleans up temporary files on a clean exit (well, depending on the shell), and no trap can catch SIGKILL. I much prefer cleaning up my files _before_ I use them, like:

```
touch 'temp.txt'
exec 3< 'temp.txt'
exec 4> 'temp.txt'
rm -f 'temp.txt'
```

Now I have file descriptors 3 and 4 available for reading and writing, respectively, and even if I pulled the plug out from my desktop the file will not be anywhere on my filesystem.
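In use, the trick looks something like this (a sketch; the filename and contents are illustrative):

```shell
#!/bin/sh
# Sketch: the file is unlinked, but both descriptors still reference
# the same inode, so a write via fd 4 is visible via fd 3.
touch 'temp.txt'
exec 3< 'temp.txt'
exec 4> 'temp.txt'
rm -f 'temp.txt'

echo 'hello' >&4   # write through fd 4
cat <&3            # reads back "hello" through fd 3
```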


There is still a small window between touch and rm in which the file might not be cleaned up. There is no easy way around it with standard shell and coreutils, but there could be a tool that would use memfd_create(2) and exec the user's command, passing the /dev/fd/XXX filename somehow.

A simple case could work like this:

  $ memfile /usr/bin/echo
  /dev/fd/3
  $


Can you use something similar for lock files?

For instance, I have some script which looks like this:

    function cleanup() {
        pkill -P $$  # Kill child processes.
        rm -f myscript.lock
    }
    trap cleanup exit

    lockfile -r 0 myscript.lock  # Try to create lock file once.
    if [[ $? != 0 ]]; then
        echo ":("
        exit 1
    fi
    
    # Work work...
    cleanup
Like you said, the cleanup handler is not invoked if the script receives a SIGKILL. Since the lock file needs to be present in the FS (contrary to a tmp file), maybe this is some inherent limitation, but I would be curious if some better approach exists.


If you're only trying to coordinate among sub-processes of your top-level script (ones that are at some point forked from the root script and duplicate its file descriptors), there's probably a way to get an approach similar to mine working, though I'd have to think a bit about exactly how to do so. Probably something along the lines of using a FIFO/pipe file in a similar way that I use a normal file in the original snippet.

If you're trying to coordinate multiple invocations of the same script, you're out of luck trying to adapt my approach - something has to manage a resource for multiple unrelated processes, and thus they're all going to need a stable identifier of some sort to access it. There might be something mildly better than a filesystem-based lockfile, but it's almost certainly not worth the complexity/unfamiliarity to other people who might use your code. You're probably best off putting a file in a dedicated temporary directory like `/tmp` and relying on out-of-band cleanup to catch things like SIGKILL that will fall through the cracks of your normal cleanup.
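One other option for the multiple-invocation case, as a hedged aside: flock(1) from util-linux (not POSIX, but widely available) takes the lock on an open file descriptor, and the kernel releases it when the process dies, even on SIGKILL, so a stale lock file on disk stops mattering. A minimal sketch, with an illustrative path:

```shell
#!/bin/bash
# Sketch: kernel-managed lock; released automatically on process death,
# including SIGKILL. The lock file path is illustrative.
exec 9> /tmp/myscript.lock
if ! flock -n 9; then      # non-blocking attempt
    echo "another instance is running" >&2
    exit 1
fi
# Work work... the lock is held for the lifetime of fd 9.
```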


Yeah my use case is unfortunately the latter.

Maybe the out-of-band cleanup you suggest could be a simple daemon invoked with `$$` and `path/to/myscript.lock`, which would then poll for a process with the given PID and delete the lock file once it's gone.
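Such a cleaner could be as small as this (a hypothetical sketch; the argument order and polling interval are made up):

```shell
#!/bin/sh
# Hypothetical lock reaper: $1 is the script's PID, $2 its lock file.
# kill -0 only checks for the process's existence; it sends no signal.
pid=$1 lock=$2
while kill -0 "$pid" 2>/dev/null; do
    sleep 1
done
rm -f "$lock"
```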


I believe the EXIT trap is not clearly defined in POSIX, so you should also trap the other signals you care about (SIGINT, SIGTERM, ...)


Same here. I'm maintaining a script which generates a file, sets up a loopback device and dm-crypt, then mounts the resulting map. Each of these operations except the mount can be done in a way that the kernel will undo them once they're no longer needed, i.e. unmounting the filesystem leads to the dm-crypt map being discarded, which leads to the loopback device being removed and the space reserved by the file being freed. This even persists past the exit of the script.


If I understand correctly, this works more like a pipe and not a file in the sense that you can only read what you wrote once, right? But a nice trick nonetheless.


> you can only read what you wrote once

In the way I’ve written it, yes. But you can open as many file descriptors as you need before you `rm` the file, and they each will keep their own position in the file. You can’t do a dynamic number of reads through the file (at least without dumping the contents somewhere else first), but a fixed number greater than one is absolutely possible.
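For example, two read descriptors opened before the rm each keep an independent offset (a sketch; using the read builtin, which consumes exactly one line):

```shell
#!/bin/bash
# Sketch: each descriptor tracks its own position in the unlinked file.
printf 'one\ntwo\n' > t.txt
exec 3< t.txt 4< t.txt
rm -f t.txt

read -r a <&3   # a=one (fd 3 now sits at line two)
read -r b <&4   # b=one (fd 4 kept its own offset at the start)
echo "$a $b"    # prints: one one
```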


Ah, nice. Thanks!


Heh! Not sure if it's a cosmic coincidence, but I posted something (https://news.ycombinator.com/item?id=36410781) similar based on https://github.com/bashup/events a few hours ago in another thread.

The version here's much simpler, which has its merit if you need the performance or have a simple (append-only, exit-only) use-case.

bashup.events supports removing handlers, one-time events, pseudo-promises, and more. It's easy to use it for something like this, but also for modular deferred init/load and more.

(linked comment has a little more detail on how I use it.)


I wrote something similar some years ago: https://codereview.stackexchange.com/questions/67417/an-atex...

I do not like the way you build expressions by concatenating strings. This is the driving force behind SQL injections.


> I do not like the way you build expressions by concatenating strings.

Unfortunately this is the only option if you're trying to preserve POSIX shell compatibility - arrays are implementation-specific additions to the specifications.

I do not like the way Bash has become synonymous with shell, and appreciate people making an effort to make their scripts work across a large variety of different systems.


I'm adding this to my dotfiles because it just seems so dang useful, but this really seems to be the intersection of elegance and hackery that's the epitome of shell scripting. A close second would be that "fork bomb" that's just a bunch of :{}|{} or something weird like that, but `defer` here is actually useful!


:() { :|:& };:

It's easy to read if you know what's happening - it's defining a function called ":"


well damn, I never really did look into why this is a forkbomb but this just made it extremely clear in one short sentence. thanks


This seems conceptually similar to python's `with` statement.

https://docs.python.org/3/reference/compound_stmts.html#the-...


Yeah, with also provides a nice finalization call, but there are a few differences:

- with requires the subject to implement a particular interface.
- multiple with calls either nest, or they execute in the order they're written, as soon as you exit the block. Multiple withs nest uncomfortably deep.

Go's defer, and this shell one, turn statements into a stack where the last defer executes first on exit.


contextlib from the standard library, specifically contextlib.ExitStack, provides the rest of the patterns you might want to use with 'with', including removing the need for nested 'with' blocks, and non-lexical finalization orders.


More like C and Python's atexit



I’m trying to figure out a way this can backfire horribly but falling short of it. Curious to test the interaction with set -e if your defer command also errors. Very clever.


Well, for one thing, it doesn't quote arguments correctly.

If you do this...

    touch "a b"
...it will create one file called "a b".

But if you do this...

    defer touch "a b"
...it will create two files called "a" and "b".


  #!/bin/sh

  DEFER=

  defer() {
    local i
    local cmdline=""

    for arg in ${@+"$@"} ; do
      [ -n "$cmdline" ] && cmdline="$cmdline "
      case $arg in
      *"'"* )
        case $arg in
        *[\"\$\\]* )
          cmdline="$cmdline'$(printf "%s" "$arg" | sed -e "s/'/'\\\\''/g")'"
          ;;
        * )
          cmdline="$cmdline\"$arg\""
          ;;
        esac
        ;;
      *'"'* | *['$*?[(){};&|<>#']* | '~'* )
        cmdline="$cmdline'$arg'"
        ;;
      *' '* | *'        '* ) # literal tab between second pair of quotes
        cmdline="$cmdline\"$arg\""
        ;;
      * )
        cmdline="$cmdline$arg"
        ;;
      esac
    done
    DEFER="$cmdline; $DEFER"
    # printf "DEFER=%s\n" "$DEFER" # debugging
    trap "command $DEFER" EXIT
  }
(The quoting code is complicated because it's taken from a program that quotes "neatly". Words that don't need quoting aren't quoted, and single or double quotes are used, whichever is nicer. It's not valuable if all you do is execute, but it could help the debug printf be more understandable.)


No need for the gobbledygook. Just use printf %q, which will quote neatly, too.

    defer() {
       new_cmd="$(printf "%q " "$@")"
       DEFER="${new_cmd% }; $DEFER"
       trap "command $DEFER" EXIT
    }
Yeah, %q is not in POSIX, so this won't work with e.g. dash – but trying to stick to POSIX is almost always a dumb idea.
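As a quick sanity check (sketch), the %q-quoted string survives being flattened into one string and re-evaluated:

```shell
#!/bin/bash
# Sketch: printf %q escapes the space, so eval sees a single argument.
cd "$(mktemp -d)"
q=$(printf '%q ' touch "a b")   # q is now: touch a\ b
eval "$q"
ls                              # prints one entry: a b
```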


(And, BTW, the original code, for all its escaping gymnastics, will only work correctly in bash anyway)


Did you spot anything in the for loop that is Bash specific?

That was taken from code that doesn't use local and which absolutely has to work in various other shells.

I just happened to plant it into a function that uses the local feature. That is not just a Bash feature; it's in the Korn Shell, and Dash has it too.


The trap EXIT only does the right thing in bash (run on any exit condition, try-finally-style). For dash and unfortunately even for zsh, you'd have to specify multiple signal handlers:

     > bash -c 'trap "echo cleaning up" EXIT; while :; do sleep 1; done'
     ^Ccleaning up
     > dash -c 'trap "echo cleaning up" EXIT; while :; do sleep 1; done'
     ^C
     > zsh -c 'trap "echo cleaning up" EXIT; while :; do sleep 1; done'
     ^C
What makes it even more irritating is that you need to manually clear the handlers and kill yourself if you trap INT etc, so you can't just add a few more signals to the trap clause above either.
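The manual clear-and-rekill looks roughly like this (a sketch; the cleanup body is illustrative):

```shell
#!/bin/sh
# Sketch: clean up on INT/TERM, then clear the traps and re-raise the
# signal so the parent sees the correct termination status.
cleanup() { rm -f /tmp/myscript.tmp; }   # illustrative
trap cleanup EXIT
for sig in INT TERM; do
    trap "cleanup; trap - $sig EXIT; kill -s $sig \$\$" "$sig"
done
```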

IMO the lack of [[, printf %q and sane trap semantics, plus the fact that >99.9% of systems that have a posix shell also have bash means that outside of very specific circumstances (e.g. it needs to run in busybox) trying to limit oneself to posix sh is ill-advised.


In general, limiting yourself to POSIX has the benefit that you think harder what not to do in the shell.

If something can't be done easily in POSIX scripting, it's probably a poor fit for shell programming, even if it can be done nicely in Bash.

Writing scripts in Bash only makes sense when you have these assumption:

1. Every system you care about has the FOSS language "GNU Bash" installed.

2. At least one of those systems has no other FOSS language you could use for programming that is vastly better than Bash.

3. At least one system installation meeting condition (2) is locked down; nothing can be installed.

If just one of these doesn't hold, you have one reason or another not to use Bash. (If just 3 doesn't hold, you do have a reason to use it: installing things on target systems creates a dependency and requires space. That has to be weighed against being stuck with shell scripting.)


> If something can't be done easily in POSIX scripting, it's probably a poor fit for shell programming,

I can see where you are coming from, because I used to think along similar lines. In particular, whilst Bash is an ugly and baroque mess, Bourne shell, at its core, is actually a simple and elegant design, marred by a handful of irritating flaws. Since neither is particularly suitable for writing robust code, it's very tempting to say: stick to sh, and where that does not suffice, pick a better language.

Unfortunately, I was forced to conclude that a) often there isn't a great alternative to shell scripting and b) bash is sufficiently ubiquitous and POSIX did a sufficiently bad job that POSIX sh is generally not worth bothering with.

There are a handful of features, none of which are unique to bash (ksh has all of them, but sadly lost out), but all of which are absent from POSIX sh, that make it significantly easier to write robust scripts. The crucial ones, I think, are:

1. sane trap EXIT (ksh also has this, zsh sadly doesn't but has "always")

2. process substitution (ksh and zsh also have this)

3. printf %q (ksh and zsh also have this)

Not needing to use ugly cruft like [ "x$ans" = xyes ] is also good, but less essential.

Take your code for example (and you're clearly someone with hardcore unix skills): it's much more complicated, slower, and brittle all due to the absence of 1&3.

Is that an indication that you shouldn't write anything that needs to do clean-up in shell script? I don't think so. People write lots of really useful stuff with bash all the time (devenv, nix build steps etc.). I'd argue that all three of your conditions typically hold for anything that's well expressed in < 1 page of bash. The reasons are:

A. When sh is available, so is bash in almost all cases.

B. Any competent devops or sysadmin person can handle bash, and so can a lot of normal developers. The same is not true for any of the typically plausible replacement languages (python, perl, ruby, ...). Furthermore, all of these have trouble expressing some shell idioms equally well and, apart from perl5 which is fairly fossilized at this point, bring significant versioning problems.

C. It's not just about being locked down, it's also about not wanting to add extra attack-surface, maintenance & mental overhead, bloat etc. If you are Jane Street and use Ocaml for everything throughout your org, you can probably (and profitably!) avoid almost all bash scripting, but that's a fairly unique situation.

Basically, my recommendation is to avoid shell scripts for anything where it does not have a clear advantage over python or similar. And where it does, to avoid most of the extra functionality that bash offers over sh for scripting, but to make use of at least 1-3 above where they are a natural solution. Because emulating any of these with posix shell is both painful and error-prone, and all of them are very often useful.


Here's my solution to the problem I pointed out.

The basic idea is to let the shell do expansion / word splitting on the to-be-deferred command when defer() is called, then save the resulting words into an array so no further expansion happens. This should avoid running into quoting problems in the first place.

    #! /bin/bash

    defer_run() {
      while [ ${#DEFER_OFFSETS[@]} -gt 0 ]
      do
        # Get / remove last element of offset and length arrays.
        offset=${DEFER_OFFSETS[-1]}; unset DEFER_OFFSETS[-1]
        len=${DEFER_LENGTHS[-1]}; unset DEFER_LENGTHS[-1]

        # Get a slice of the main array corresponding to one deferred
        # command's word list. Execute that word list as a command.
        "${DEFER_LIST[@]:$offset:$len}"
      done
    }

    defer() {
      declare -g -a DEFER_LIST DEFER_OFFSETS DEFER_LENGTHS
      trap defer_run EXIT

      DEFER_OFFSETS+=( ${#DEFER_LIST[@]} )
      DEFER_LENGTHS+=( $# )
      DEFER_LIST+=( "$@" )
    }

    defer echo a  b
    defer echo "c  d"
This prints:

    c  d
    a b
The quoted string "c  d" is preserved all the way through as one argument to echo. The non-quoted ones work as expected too.

Note that since bash doesn't have lists of lists, I've stuck all the word-lists of all the deferred commands together into one giant array, and I've saved offsets and lengths so I can separate them out again.

Since expansion doesn't happen again after you've deferred a command, you can't expand (say) a variable when the deferred command runs. But if you want to do that, define a function and defer it, or just use eval:

    defer eval 'echo $myvar'


You mean it doesn't quote arguments automatically; it is safe so long as you always quote the command:

    defer 'touch "a b"'
(a usability pitfall I know, but probably not fixable easily)

edit: maybe escaping could be added though, using sed?

    DEFER="; ${DEFER}"
    for arg in "$@"; do
        DEFER="$(printf -- %s "${arg}" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/") ${DEFER}"
    done


    printf "%q" "$VAR"
P.S. quotes matter, surprisingly or unsurprisingly because “bash”


I suspect that this works only for bash and won't work in other shells.


It works in bash and zsh, which are pretty much the only bourne-like shells that matter.


It's not lexically scoped, so if you (try to) use this in a function, all cleanups are deferred until exit rather than when the function returns.
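In bash specifically, a RETURN trap can approximate function-scoped cleanup (a sketch; bash-only, and the handler clears itself so it doesn't fire on later function returns):

```shell
#!/bin/bash
# Sketch: the RETURN trap fires when the function returns, not at
# script exit. The temp file's path is baked into the trap string
# at set time, so the local variable going out of scope is harmless.
work() {
    local tmp
    tmp=$(mktemp)
    trap "rm -f '$tmp'; trap - RETURN" RETURN
    echo "working with $tmp"
}
work
# the temp file is already gone here, before script exit
```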


So it won't work if you source a script instead of running it?


As written it is vulnerable to shell injection. See the GitHub link at the very end for a fix.


I know, so far, only two languages that use defer: Go and Nim.



Perl has experimental support for defer: https://perldoc.perl.org/perlsyn#defer-blocks


Swift, but it is bound to narrower scopes, not just the function scope like Go. How does Nim scope defer statements?


In Nim, defer is a macro that rewrites the scope into a "try: finally:" statement. The 'finally' block is always executed, even if an exception was raised.

https://nim-lang.org/docs/manual.html#exception-handling-def...


Yeah, my main question was about the scope of the defer but it appears that it's block scoped like Swift, not function scoped like Go.


Also Swift!


Zig has it too.


and harelang.org



