
Delegating complex treatments to filters in shell programs (2015) - pwg
https://michipili.github.io/shell/2015/04/12/shell-filter-programming.html
======
jstimpfle
Yes, the criticized code is definitely bad. It glues two user-defined constructs
together (is-in-list and a variable), and it relies on unquoted variables and
shell splitting.

But you don't necessarily need to shell out to awk / perl / grep / whatever.

    
    
        # Usage: is_in_list THING ITEM...
        # Succeeds (returns 0) if THING equals one of the ITEMs.
        is_in_list() {
            local thing=$1
            shift
            while [ $# -gt 0 ] ; do
                [ "$1" = "$thing" ] && return 0
                shift
            done
            return 1
        }
    

Now just do `is_in_list "$thing" "$@"` where the positional array `"$@"` is
the list (or by all means use unquoted `$COMPILER_VERSIONS`, i.e. unsafe
shell-splitting).
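
For example, with a made-up space-separated `$COMPILER_VERSIONS`:

    
    
        COMPILER_VERSIONS="4.8 4.9 5.1"
        # the unquoted expansion deliberately relies on shell splitting
        if is_in_list 4.9 $COMPILER_VERSIONS; then
            echo "version supported"
        fi
    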

Another possibility, where the list is given as a space-separated string, is a
case statement. That approach is logically equivalent to the awk version, but
it doesn't fork, so it is much more performant.

    
    
        is_in_list() {
            # $1 is the thing to look for, $2 the space-separated list;
            # quoting $1 keeps glob characters in the needle from being
            # treated as patterns
            case $2 in
            "$1"|*" $1 "*|*" $1"|"$1 "*) return 0;;
            *) return 1;;
            esac
        }
    

Not tested, but that's the idea.
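
A quick check of the idea, with a made-up list:

    
    
        is_in_list gcc "clang gcc tcc" && echo found    # prints "found"
        is_in_list icc "clang gcc tcc" || echo missing  # prints "missing"
    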

~~~
falsedan
Bash is all about shelling out: it's a shell. I would happily use grep here,
and if I had an array:

    
    
      # -z: NUL-delimited input, -F: literal match, -x: match the whole element
      printf "%s\0" "$@" | grep -qzFx needle
    

> _much more performant_

I believe that performance-critical code shouldn't be written in bash. I have
seen pipelines be much faster than tweaked bash functions, simply because they
run in parallel & thus the cost of forking is paid once, up-front.
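
A sketch of the contrast, with a made-up file and pattern:

    
    
      # fork per item: a grep process for every line of input
      hits=0
      while IFS= read -r line; do
          printf '%s\n' "$line" | grep -q error && hits=$((hits + 1))
      done < app.log
      
      # fork paid once: a single process scans the whole file
      hits=$(grep -c error app.log)
    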

~~~
jstimpfle
I agree with you. I think we are just speaking about different things.

I was thinking more about the things that happen in a configure script, for
example. Not a "hot loop" like a big grep over millions of lines or something.

There are situations in shell code where the difference between a case match
or a string-suffix replacement written in shell and a process spawn matters:
not only because forking a process for one simple string operation is wasteful
(cost: about 1 ms), but also because of the semantic problems that come with
child processes (error handling).
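
For instance, a suffix replacement can be done with parameter expansion
instead of spawning sed (a sketch; the variable name is made up):

    
    
        # in-shell: no fork, and no child process to error-check
        base=${file%.c}
        
        # child process: one fork per call, and sed's exit status is
        # easy to overlook inside the command substitution
        base=$(printf '%s\n' "$file" | sed 's/\.c$//')
    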

------
falsedan
There's another advantage to using filters in pipelines: each part of the
pipeline runs in parallel!

I recently rewrote some bash code to replace

    
    
      for subdir in $names
      do
        for parent in $(parents_of $dir)
        do
          [ -d "$parent/$subdir" ] && …
        done
      done
    

with

    
    
      parents_of $dir | append_subdirs $names | is_dir | xargs …
    

I expected a modest speed improvement (due to not calculating the parents of a
dir repeatedly), but was surprised to see it was twice as fast!
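
The helper functions aren't shown, but a filter like `is_dir` might look
something like this (a sketch):

    
    
      # reads candidate paths on stdin, one per line, and passes
      # through only those that name directories
      is_dir() {
          while IFS= read -r path; do
              [ -d "$path" ] && printf '%s\n' "$path"
          done
      }
    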

------
10165
I think the author's first sentence regarding "novice" shell programmers may
also apply to the other submission about shell scripts currently on the front
page.

For example, C idioms often combine several operations into a single line,
i.e., "nesting". Kernighan suggested nesting in an early C tutorial:

    
    
       while ( putchar( getchar( ) ) != '\0' )
    

In the shell, maybe it is better to test each "expression" on a line by
itself. Or maybe not; I do it anyway.

More often, I see that other shell scripters on the web prefer to nest as many
commands as they can, perhaps to reduce the number of lines.

For example,

    
    
        variable=$(command1 $(command2));
    

This could be alternatively expressed as something like

    
    
       variable1=$(command2);
       # can now test variable1 before proceeding to the next line
       variable2=$(command1 "$variable1");
    
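
A sketch of what that intermediate test might look like, with the same
hypothetical commands:

    
    
       variable1=$(command2) || exit 1   # a plain assignment passes through command2's exit status
       if [ -z "$variable1" ]; then
           echo "command2 produced no output" >&2
           exit 1
       fi
       variable2=$(command1 "$variable1")
    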

The result of nesting is subshells, and a degree of complexity that I am not
sure reluctant, occasional shell scripters are prepared to think about.

And if I am not mistaken, that was at least part of the problem Jane Street
had in the other submission.

