
When Bash scripts bite - yminsky
https://blogs.janestreet.com/when-bash-scripts-bite/
======
mambodog
I highly recommend ShellCheck[0] if you're writing any bash. With the warnings
and stylistic advice it provides, I feel like I can actually be confident that
my scripts are doing what I think they're doing.

[0]:
[https://github.com/koalaman/shellcheck](https://github.com/koalaman/shellcheck)

~~~
helb
I second this, it's a great tool.

However, don't let it give you a false sense of security.

Shellcheck probably won't say anything about the "biting" parts from the
article – because they are valid, just behaving a bit differently from what
the user expects…

That goes for any linter or static analyser, I guess.

~~~
bjackman
Dunno about this particular case but I'd say linters are exactly for finding
things that are valid but behave differently than the user expects.

E.g. "if (x = 1)" is valid C, but I'd be pretty unimpressed if a linter didn't
flag it up!

------
peff
This doesn't mention one of my favorite bash gotchas, which is using `set -e`
with `pipefail` at all. Try this:

    
    
        set -euo pipefail
        yes | head
    

That will consistently exit because `yes` gets SIGPIPE and quits. Which is
expected, but triggers a script exit. But more exciting is that something
like:

    
    
        generate_data | head
    

only _sometimes_ fails. It's a race that depends on whether generate_data is
able to stuff all of its data into the pipe buffer before head calls close().

EDIT: I seemed to remember sharing this bug not too long ago, and indeed I
did. pixelbeat responded with some interesting links:
[https://news.ycombinator.com/item?id=13940628](https://news.ycombinator.com/item?id=13940628)
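
One way to live with this under `set -euo pipefail` is to explicitly allow
exit status 141 (128 + SIGPIPE) from the pipeline; a minimal sketch:

```shell
set -euo pipefail

# Accept SIGPIPE death of the producer (exit 141 = 128 + 13) as success;
# any other failure still aborts the script.
yes | head -n 3 > /dev/null || [ $? -eq 141 ]

echo "survived the pipeline"
```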

~~~
xyzzy_plugh
I don't know that this obviates the need to check PIPESTATUS after the
statement.

As another poster mentioned, set -e and set -o pipefail are crutches and tell
me the script is sloppy.

------
Shivatron
As an aside, I like reading Jane Street's blog. They're one of the few
companies in our space (I'm also in "automated trading") that discusses even
non-proprietary stuff openly. When you work in an information vacuum, it's
comforting to know that presumably similar people face the same challenges.

When the author wrote "a particular production bash script (if that doesn't
sound horrifying, hopefully it will by the end of this post)," I couldn't help
but smile...

~~~
foota
They're a pretty neat company from what I've seen, they also host a puzzle
site here:
[https://www.janestreet.com/puzzles/](https://www.janestreet.com/puzzles/)

I interviewed there, but unfortunately didn't make the cut.

~~~
nailer
Same here. Interviewed with Jane St a long time ago, got rejected in the end
but came away massively impressed.

------
networked
The error handling is one reason I prefer Tcl to Bash or POSIX sh if I need a
nontrivial script that shells out to other programs. Tcl handles errors,
substitution, etc. properly (unsurprisingly) by default:

    
    
      > set foo [exec /bin/true][exec /bin/false][exec /bin/true]
      child process exited abnormally
          while executing
      "exec /bin/false"
    
      > exec true | false | true
      child process exited abnormally
          while executing
      "exec true | false | true"
    
      > set sp "hello world"; exec echo $sp; # No need to quote $sp there.
      hello world
    

Not all shell-like things are as convenient to do in Tcl as they are in sh
(the most significant difference to me is that you cannot pipe to or from
functions), and it is more verbose, but because everything is a string in Tcl
I find that it integrates with *nix (or Windows!) command line programs better
than other scripting languages. E.g.,

    
    
      > lmap x [split [exec ps | tail -n +2] \n] {lindex $x 0}
      4540 5767 5768 31161
    

What happens here is that you take the output of `ps | tail -n +2`, split it
on newlines, then map over it treating each line (that is something like "5814
pts/0 00:00:00 ps") as a list and taking the first element in it. The result
is a list of PIDs (a string containing the PIDs separated by whitespace).
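
For comparison, roughly the same pipeline in plain sh (the PID list will
differ run to run, of course):

```shell
# First column of every ps line except the header: a list of PIDs.
ps | tail -n +2 | awk '{print $1}'
```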

I can recommend trying Tcl to anyone fighting Bash who doesn't want to replace
it with Python/Ruby/etc. If you try it, though, use version 8.6 or at the very
least 8.5. The previous versions are EOL but are still common in the wild. If
a recent Tcl is not available on your system, you can build a self-contained
static binary interpreter with
[http://kitcreator.rkeene.org/kitcreator](http://kitcreator.rkeene.org/kitcreator).

~~~
blacksqr
>you cannot pipe to or from functions

You, like many others, underestimate Tcl. I present to you, command pipes for
Tcl:

[http://wiki.tcl.tk/17419](http://wiki.tcl.tk/17419)

Tcl can be any language you want, because it's already the language you need.

~~~
networked
>You, like many others, underestimate Tcl.

Not in this case, at least :-), since on that wiki page there is one
implementation of command pipes that I wrote and a link to another (in
fptools).

What I mean when I say that "you cannot pipe to or from functions" is that
while you can pipe data from one external process to another in an [exec]
command, you can't pipe it from an external process to a proc or vice versa,
which would be analogous to what the POSIX shell can do. Functions and
external processes are less similar in Tcl than they are in sh.

The best you can do with a simple command pipe implementation like those on
the wiki page is pipe data from [exec foo] to [bar $arg] to [exec baz <<
$arg], which, unlike [exec foo | baz], will read foo's entire output before
passing it along. The only real solution that I know to allow you to treat
snippets of Tcl code and external processes as basically interchangeable in a
gradually read pipeline is rkeene's pipethread library. I am a big fan of it,
but using an external dependency that isn't in your operating system's package
repositories adds considerable friction to writing and distributing a simple
script. (I do hope pipethread ends up in Tcllib.)

------
unhammer
Not that people should be expected to know this, but here's an idiom that does
work:

    
    
        if res="$(ldap-query-for-valid-users)"; then
            echo "($res)" > "/tmp/all-users.sexp"
        else
            handle_failure
        fi
        
    

I second the recommendation to use
[https://github.com/koalaman/shellcheck](https://github.com/koalaman/shellcheck)
– you really shouldn't be writing shell scripts without it – but in this case
it doesn't seem to handle the issue (with default settings at least).

~~~
CyberShadow
Since 'set -e' is used already, a simpler variation is sufficient:

    
    
      result=$(ldap-query-for-valid-users)
      echo "($result)" > "/tmp/all-users.sexp"
    

Decoupling the command substitution from the other invocation allows the error
to be detected.

------
ben0x539
Slightly more useful than `set -e` is `set -E` and `trap exit ERR`.

But still there's basically no way to make this consistently useful.

Even if you religiously set -e/E in every scope just in case, if you're
anywhere in a scope inside the non-final operand of a chain of &&/||s, or
inside the conditional expression of a control structure, -e/E will just do
nothing. You can't turn it any more on; you just don't get early termination
on errors, no matter how many nested function calls you actually are removed
from the original ||. It's not great.
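
A small illustration of that suppression, using a hypothetical `check`
function:

```shell
set -e

check() {
    false            # set -e is suppressed for the whole body...
    echo "reached"   # ...because check is called as an if condition
}

if check; then       # check returns 0 (the status of its last echo)
    echo "ok"
fi
```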

------
dmoreno
I was just thinking how cool it would be to have a Python module to really
ease writing bash-style scripts, with less overhead than the normal
`subprocess` methods have... And there it was, just a Google search away:
[https://pypi.python.org/pypi/sh/1.12.13](https://pypi.python.org/pypi/sh/1.12.13)

~~~
annnnd
Ok, looks great, but... I wonder what "gotchas" this module (and Plumbum) has?
Does anyone use this in production?

EDIT: just learned that PBS is now sh.py, so I removed it from the comment.

~~~
daenz
Author of sh.py here. It seems to be used by many people in production around
the world. I've actively maintained it since 2011, and it supports python
2.6-3.6, inclusive. Most of the gotchas have been worked out by now, but the
FAQ covers the most common stumbling blocks:
[http://amoffat.github.io/sh/sections/faq.html](http://amoffat.github.io/sh/sections/faq.html)

~~~
annnnd
Thank you! Can I suggest that you link to the documentation from your PyPI
page? Nice work btw.

~~~
radiowave
Yes please do that. I failed to find the documentation when I had a brief
click around earlier today. (Mind I wasn't trying very hard.)

------
dtoma
Does error handling in your bash scripts ever become annoying enough to
warrant a rewrite in, say, OCaml?

~~~
viraptor
Definitely not OCaml, but usually python or ruby will do the job better.

~~~
BoorishBears
We use Node at work now. The fact is, it's not worth trying to get everyone to
write good bash scripts. JS doesn't have nearly as many footguns, and there's
a library for just about anything you could want to do.

~~~
mbrock
Error handling in asynchronous JavaScript is quite notorious for being easy to
screw up.

~~~
nilliams
True for async code, but if you use something like ShellJS [1] (which would be
an easy way to rewrite a bash script that was getting out of hand), all
commands are synchronous, so error handling is just try/catch.

[1]
[https://www.npmjs.com/package/shelljs](https://www.npmjs.com/package/shelljs)

------
zwischenzug
I was in a meeting with a major vendor and a bunch of fintech leads recently
and major vendor's techie said '...no-one likes bash scripts...'

After the meeting I said to colleagues, 'I quite like bash scripts, actually',
and they all said 'I thought that too...'

~~~
oblio
Well, shell scripts are extremely convenient. It's easy to write very simple
ones since they're a natural extension of shell usage.

So it's really easy to go from

    $ command1
    $ command2

to:

    #!/bin/bash
    command1
    command2

As a result, everybody "likes" bash scripts since they're so easy to write,
initially.

And these shell scripts will work well in 80% of situations and you can get
them to 90-95% just by sprinkling a few ifs around.

Once these scripts become longer and more complex and you want to make them
easier to read and DRYer, that's when the pain begins. Or when you need to
handle a bit of logic that would be trivial with better data structures such
as arrays, dictionaries, or sets...

You can work around these issues, of course, but almost every workaround is
either ugly or a hack (as someone was joking, "elegant hacks" for older Unix
hackers, "gross hacks" for younger ones :) ).

~~~
growse
I agree - in my experience bash is extraordinarily easy to write the first
time. It becomes a nightmare to maintain, because most people aren't actually
that familiar with how bash error handling, flow logic etc. actually works.
It's bad because it fails in surprising, non-conventional ways.

~~~
kagamine
I just over-log absolutely everything. If it fails log it. If it succeeds and
you might need to know about it, log it. That's not great error handling, but
it helps with debugging.

~~~
majewsky
What about "set -x"? Same effect, but you only have to flip the switch once.

------
ben0x539
Another fun one is that

    
    
        set -e
        export x=$(false)
        echo ok
    

prints ok, but

    
    
        set -e
        export x
        x=$(false)
        echo ok
    

exits early because of the `false`.

~~~
unhammer
But here [http://www.shellcheck.net/](http://www.shellcheck.net/) gives

    
    
        Line 4:
        export x=$(false)
               ^-- SC2155: Declare and assign separately to avoid masking return values.

------
jwilk

        echo ... > "/tmp/all-users.sexp"
    

No, this is not a secure way to create temporary files.

Please use mktemp(1).

~~~
darkerside
Honest question: what security principles does this violate?

~~~
Karunamon
/tmp generally has more open permissions than the rest of the file system, and
if you're using a known file name, you're open to that file being abused by an
attacker.

I know of no good reason to not use mktemp.

------
RijilV
What this really boils down to is that you should avoid subshells where
possible and handle this with either variable assignment or streams (pipes).

E.g.:

    set -euo pipefail

    foo() {
        false
        echo "hello world"
    }

    variable=$(foo)  # fails

    foo | do_stuff   # fails

My preference is to handle things as streams.

------
shmerl
Yes, subshells can be tricky. Don't rely on stuff like pipefail, rather check
return codes and react accordingly. Bash also provides a way to read return
codes of piped commands. See PIPESTATUS in man bash.
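
A minimal sketch of reading it (bash-specific):

```shell
# Run a pipeline where the middle command fails:
true | false | true

# PIPESTATUS holds one exit status per pipeline member:
echo "${PIPESTATUS[@]}"   # prints: 0 1 0
```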

------
10165

        echo ($(ldap-query-for-valid-users)) > /tmp/all-users.sexp
    

should be something like

    
    
        x=$(ldap-query-for-valid-users);
        test ${#x} -gt 0||exec echo no valid users >&2;
        echo \("$x"\) > /tmp/all-users.sexp;
    

This way they would get the message "no valid users" to stderr and the script
would exit. According to the blog post that is what they wanted.

Alternatively,

    
    
        x=$(ldap-query-for-valid-users);
        test ${#x} -gt 0||exit 100
        echo \("$x"\) > /tmp/all-users.sexp;
    

if they prefer a nonzero exit code to a message to stderr.

------
infinity0
A related annoyance is that when you write a for-loop in a Makefile, every
iteration gets run even if one of them fails, due to how the shell calculates
exit codes of for-loops.

So most of the time you should do "set -e; for XXX". Otherwise your Makefile
loops will "succeed" incorrectly.
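
The underlying shell behavior is easy to demonstrate: a for-loop's exit status
is that of its last iteration, so earlier failures are silently discarded:

```shell
for i in 1 2 3; do
    if [ "$i" -eq 2 ]; then false; else true; fi
done
echo "loop status: $?"   # prints 0 - the failure at i=2 is lost
```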

------
iamNumber4
How about just writing safe code.

Really... Check your damn return values. set -e is a crutch of a sloppy
programmer.

Ok, yes; you can do this really slick thing in one line by stringing together
a bunch of commands. However, just because you can does not mean you should.

Bash makes it simple with 'if ! <command>; then <failure commands>; fi'. Try
not to string ten things together. Keep the conditional true state in your
script's execution flow.
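
A minimal sketch of that pattern, with a trivial stand-in command:

```shell
# Check each step explicitly rather than leaning on set -e:
if ! greeting=$(echo "hello"); then
    echo "command failed" >&2
    exit 1
fi
echo "got: $greeting"
```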

~~~
jstimpfle
That means that you have to write everything to temporary files first. If you
want to do that then maybe Python is already more convenient.

~~~
kps
> That means that you have to write everything to temporary files first.

Using the example from the post,

    
    
        echo "($(ldap-query-for-valid-users))" > "/tmp/all-users.sexp"
    

you have at least a couple alternatives to an additional temporary file.

    
    
        if ! valid_users=$(ldap-query-for-valid-users)
        then
          ... failure case ...
        fi
        echo "($valid_users)" >/tmp/all-users.sexp
    

or

    
    
        {
            echo -n '('
            if ! ldap-query-for-valid-users
            then
                ... failure case ...
            fi
            echo ')'
        } >/tmp/all-users.sexp

~~~
btilly
And when multiple bash scripts do that, it is easy to have them accidentally
overwrite each other's temp files. Plus an attacker can play with temp files
in a number of interesting ways to turn their use into an attack.

~~~
kps
I was just illustrating the error checking using commands functionally
equivalent to the original post. Temporary files are best avoided entirely
unless some command actually requires a seekable file, in which case use
mktemp(1) or equivalent (and 'trap 0' to clean up after yourself).

------
wdroz
I use Xonsh[0] for scripting now. I explain why in my blog[1], basically
because we are in 2017 and I like python.

[0]: [http://xon.sh/](http://xon.sh/)

[1]: [https://william-droz.com/xonsh-a-modern-shell-that-enable-
py...](https://william-droz.com/xonsh-a-modern-shell-that-enable-python-in-
your-terminal.html)

~~~
therealmarv
then why not use python? Installing another shell just for running another
script seems wrong unless it is only for you personally and you are ok with
this.

~~~
wdroz
Xonsh lets you interact with other bash commands.

    
    
      >>> out = $(echo @(x + ' ' + y))
      >>> out
      'xonsh party\n'
      >>> @("ech" + "o") "hey"
      hey
    

In raw python, you have to play with subprocess by yourself.

------
jstimpfle
I don't think the problem is (ba)sh the language, but rather the idioms it
implements.

It's usually a bad idea to begin writing the result before knowing that all
input is there. That's like a server announcing a 200 OK and beginning to
stream, only later detecting I/O error. It's difficult to deal with such a
server as a client.

More generally, in pipelines we deal with pairs of programs that are only
connected by a text stream, with no possibility to communicate out-of-band
conditions. In `PRODUCER | CONSUMER`, PRODUCER can't tell from a sigpipe
whether CONSUMER crashed or has read all the data it needs. And CONSUMER can't
know whether PRODUCER crashed or if it should take action on the results.

Most scripts don't really need the take-action-immediately level of
concurrency. It's sometimes nice to be concurrent (use multiple CPUs at once).
But the cases where the processed data can't be buffered at least in a
temporary file before taking action are really rare.
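
A minimal sketch of that buffer-first approach, with `seq` standing in for a
real producer:

```shell
# Collect the producer's complete output in a temp file first; only act
# on it if the producer succeeded, so a mid-stream crash can't hand the
# consumer a truncated result.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

if seq 5 > "$tmp"; then
    wc -l < "$tmp"
fi
```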

------
supremesaboteur
Some more edge cases :
[http://mywiki.wooledge.org/BashFAQ/105](http://mywiki.wooledge.org/BashFAQ/105)

------
therealmarv
When I see very long bash scripts I get angry. There is a reason why perl &
later python were invented and they are both in 98% of cases available on Unix
compatible systems.

Not that I'm not ok with short bash scripts, or with systems where you only
have bash... but if your Unix-compatible system only has bash, there is
something else wrong with your system, in my point of view.

~~~
arghwhat
I'd much rather have 200 lines of bash than the 300-400 lines of python to
replace it.

I have recently rewritten a lot of Python and Perl (over 10kLOC) into a small
collection of 10-ish bash scripts that weigh in at about 700LOC total. There
are some tasks where Bash is the only appropriate tool.

Much simpler code, much easier to read, and a _lot_ faster too.

~~~
omribahumi
How is bash faster than Python?

~~~
arghwhat
When what you need is a pipe, Python won't do anything positive to
performance. Plus, CPython is horrendously slow in general.

Other than that, the simpler design also allowed greater parallelism.

------
sambe
Yeah, naively using errexit won't get you the whole way. BashFAQ, ShellCheck
and StackOverflow have quite a bit of information, and you can often write
things that are pretty robust if you take in all the information. Particular
cases that come to mind are:

1) Command substitution. You can catch this with an explicit error handler
inherited by child processes.

2) pipefail SIGPIPE false positives. Pretty hairy, and command-dependent
whether this is a "real" error. Often it isn't, so you can work around it by
ignoring SIGPIPE.

3) Process substitution. As far as I know, there is no way to work around this
whilst still using the convenience syntax. You have to carefully use explicit
named pipes and carefully wait on the PIDs (carefully!). Maybe still with
races...

In my experience, you can write moderately robust shell scripts if you care
enough and use all these flags and linters. But by the time you are at this
stage you probably shouldn't be using shell scripting. More like training to
spot problems in other people's code.

------
a_bonobo
This works but it gets ugly when you have to use 'set -e' everywhere:

      set -e
      foo() {
          /bin/false
          echo "foo"
      }
      echo "$(set -e; foo)"

------
mannykannot
The use of global values within programs is generally deprecated, for good
reasons. The use of global values that modify the semantics of its interpreter
probably deserves even more skepticism.

The inheritance of these flags by subshells, which may have been written
assuming different semantics, would be potentially even more problematic, so I
think bash is right in this case, though arguably the inheritance could be
limited to subshell code defined in the same file.

------
memracom
set -euo pipefail should NEVER be used in a bash script. If you feel tempted
to do so, then you are trying to make the shell scripting language be
something that it is not.

This is the time for a full scale programming language such as Python or
perhaps Groovy on the JVM or Go language. When you need to write robust code,
use the tools that were created for writing robust code.

Bash just has too many quirks.

Note that this is related to the most common way that people build a Big Ball
of Mud. You have a simple app and you need a couple of features so you add
them on. Rinse and repeat. Before too long you have an app that does too much
and was never designed/architected to do that much stuff. You are probably
ignoring a number of techniques for integrating functionality in large apps
such as message queueing, microservices, separate libraries or packages,
multiple languages.

Shell scripts suffer the same trajectory towards too much complexity. When you
see it happening, and before the task gets too complex, replace the script
with an app and apply all the normal software engineering techniques to make
it robust.

~~~
vorg
The good thing about Apache Groovy (on the JVM) is when the big ball of mud
starts forming, because its syntax is close enough to Java's, it's easy to
quickly split parts off and rewrite it in Java. Only the lambda syntax and
some other inconsistencies like the behavior of == are different.

------
equalunique
I wasn't sure about the title of this post, but chose to view it when I saw
janestreet.com.

------
known
Use sh -c

-c Read commands from the command_string operand instead of from the standard input. Special parameter 0 will be set from the command_name operand and the positional parameters ($1, $2, etc.) set from the remaining argument operands.
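
For example (the operands after the command string populate $0 and the
positional parameters):

```shell
sh -c 'echo "name=$0 first=$1 second=$2"' myscript foo bar
# prints: name=myscript first=foo second=bar
```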

------
tobyhinloopen
I still think Bash scripts (or unix scripts, or shell scripts, or whatever
they are called) are awful.

------
user5994461
Use ansible instead of bash.

Everything bash can do, ansible can do it better.

------
draw_down
There are so many awful things about shell scripting. It's always so difficult
to do simple things with it.

------
nialv7
Why write new bash scripts in this day and age...

~~~
riquito
They are fast, ubiquitous, zero-dependency, short, expressive... most of the
time. The fact is that we don't have anything better to glue commands together
(AFAIK).

~~~
Slackwise
> 0-dependencies

Actually, quite the opposite. All that shell scripts do is glue together other
programs they depend on.

That makes them not portable. You can't just move a bash script from Linux to
macOS, because macOS ships an ancient pre-GPLv3 version of bash and non-GNU
versions of all the other utils.

~~~
jstimpfle
Just use POSIX-compatible sh instead of bash. It's also not hard to use mostly
POSIX-compatible commands (grep, sed, cat, cp, mkdir ...).

For scripts where you need something unstandardized, say, ImageMagick, you
would wind up with a dependency in most other languages as well.

