
Unix Wildcards Gone Wild (2014) - veganjay
https://www.defensecode.com/public/DefenseCode_Unix_WildCards_Gone_Wild.txt
======
okdana
>Simple trick behind this technique is that when using shell wildcards,
especially asterisk (...), Unix shell will interpret files beginning with
hyphen (-) character as command line arguments to executed command/program.

To be clear, the shell isn't really 'interpreting' anything here — it knows
absolutely nothing about the argument conventions of whatever program you're
running[0]. All it's doing is passing a list of arguments to an executable;
how the executable deals with that is up to it.

And this isn't only a shell problem — it's an issue when you pass arbitrary
arguments to ANY external utility in ANY language. For example, maybe you have
some kind of tool that calls grep:

    
    
        # Perl
        exec '/usr/bin/grep', '-R', $input, 'mydir';
    
        # PHP (with Symfony Process)
        (new Process(['/usr/bin/grep', '-R', $input, 'mydir']))->run();
    
        # Python
        subprocess.run(['/usr/bin/grep', '-R', input, 'mydir'])
    
        # C
        execv("/usr/bin/grep", (char *[]) {"grep", "-R", input, "mydir", NULL});
    

All of these are safe in the sense that there's no risk of shell command
injection (in fact, only the PHP one even calls the shell), but NONE of them
are safe against this problem, where the input value might begin with a
hyphen. In this grep scenario it's probably not a security issue, but it can
produce confusing results for the user if nothing else.

As others mentioned, you must ALWAYS use '--' to mark the end of option
processing when you do something like this.
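To make that concrete for the grep example above, here is a minimal sketch (the 'mydir' directory and the hostile input value are illustrative) showing that '--' forces a leading-hyphen value to be treated as a literal pattern rather than an option:

```shell
mkdir -p mydir
printf 'hello\n' > mydir/file.txt

input='--help'   # hostile/awkward input beginning with a hyphen

# Without '--', grep would parse "$input" as its own --help option.
# With '--', it is just a pattern to search for:
grep -R -- "$input" mydir || echo 'no match, but no option injection either'
```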

[0] Technically the shell knows about the conventions of its built-ins, but
even those mostly handle arguments in a similar fashion to external utilities,
where an arbitrary argv is fed to a function and then parsed using some
getopts-like method.

~~~
jwilk
You shouldn't assume grep is in /usr/bin. On many systems it's in /bin.

Perl's exec and Python's subprocess already search for executables in PATH, so
you don't need to provide the full path.

In C, you can use execvp() instead of execv().

(No idea about PHP…)

~~~
okdana
Right, it was just a random example that popped into my head. You would
probably also want to think twice about using -R (at least with GNU grep),
about hard-coding the search directory, &c.

All of PHP's process-execution methods (except one, which is unusable on many
systems) go through the shell, so by virtue of that they also search the PATH.

------
Pete_D
I didn't see this mentioned in the article, but you can protect against this
in your own scripts by including a -- before user-supplied parameters, which
prevents them from being treated as flags.

    
    
        $ rm -- -rf
        rm: cannot remove '-rf': No such file or directory
    

(see Guideline 10,
[http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_...](http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap12.html#tag_12_02))

~~~
jwilk
Alternatively:

    
    
      $ rm ./-rf

~~~
zouhair
That's if you know such a file exists in said folder.
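You don't actually need to know the hostile filename in advance; prefixing the glob itself with ./ covers every expansion, since no resulting argument can then start with '-'. A minimal sketch (the 'demo' directory is illustrative):

```shell
mkdir -p demo && cd demo
touch -- '-rf' normal.txt   # '--' lets touch create the hyphen-named file

# './*' expands to ./-rf and ./normal.txt, neither of which
# looks like an option to rm:
rm ./*

cd ..
ls -A demo   # prints nothing: both files are gone
```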

------
derefr
This, plus the problem of dealing with files with spaces, newlines, tabs, etc.
in their names, are what draw the line for me between "things I am willing to
write as a shell script" and "things that will make me use a scripting
language† with its own file-manipulation system-call builtins."

A three-line script that takes one file as an argument? Sure, write that in
bash.

A script that processes N files passed as arguments? Or that processes a
_listing_ of files piped in from stdin? And uses branching logic on the files
themselves? Bash goes out the window.

Sure, it's _possible_ to write shell scripts to do that. But why? Unless
you're targeting a PDP11 or the Busybox-linked /bin/sh in your initramfs, is
shell scripting really the _best_ option?

---

† Personally, I reach for Ruby in these cases; its stdlib Pathname and
FileUtils classes are pretty solid for this use-case, and Ruby interpreters
are now roughly everywhere I want to deploy.

Ruby _is_ kind of clumsy for doing all the command piping parts of scripting,
though; especially since there's no easy way short of wrapping Kernel.system
in your own function to make a Ruby script crash on a nonzero subshell exit
status.

I'd love to know if there was a language that was like Ruby in terms of the
syntax and semantics around strings, arrays, and files; but then was "just
like writing a shell script" in the sense of being able to just use binary
names to make "function calls" that are really synchronous subshell spawns and
exit-code checks in one (but which still use the variable-passing semantics of
the language where the resulting exec(2) ARGV is concerned.) Probably a Ruby
DSL is the simplest way to get that—but if it's not in the stdlib, I'm not
going to be able to use it for DevOps scripting :/

~~~
LukeShu
> _A script that processes N files passed as arguments?_
    
    
        for filename in "$@"; do
            ...
        done
    

> _Or that processes a listing of files piped in from stdin?_

That has the tricky bit of "what about filenames that contain newlines" that
would come up in an implementation in _any_ language. Since the one byte not
allowed in filenames is '\0', let's say that the listing uses '\0' to separate
filenames.

    
    
        while IFS= read -r -d '' filename; do
            ...
        done
    

There are things that are hard in Bash... but the things you mentioned aren't
them.
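For instance, pairing that loop with find's -print0 gives a pipeline that survives any legal filename, spaces and newlines included. A sketch (bash-only, since read -d '' is a bashism; the printf stands in for real per-file work):

```shell
mkdir -p loopdemo
touch loopdemo/plain.txt 'loopdemo/has space.txt'

# find emits NUL-separated names; the loop consumes them one at a time.
# IFS= keeps leading/trailing whitespace in the names intact:
find loopdemo -type f -print0 |
while IFS= read -r -d '' filename; do
    printf 'got: %s\n' "$filename"   # one 'got:' line per file
done
```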

> _but which lets you strictly control the ARGV and env of those function
> calls without the whole thing passing through a TCL-like stringification and
> re-tokenization._

I'm pretty sure that's just Bash, if you remember to wrap everything in
double-quotes.

~~~
derefr
That second example is write-only code, and becomes more so the more things you
have to do with the filename. What if you have to reassemble a few of these
into an array, filter that array by matching tokens within it, and then feed
those arguments through as NUL-separated stdin to another binary, _with_ error
checking all along the way to make sure you never call the final command with
invalid arguments?

In bash, that's a 50-line monstrosity. In Ruby (or Python, or even Perl), it's
~5 clean, readable lines.

> I'm pretty sure that's just Bash, if you remember to wrap everything in
> double-quotes.

Now try composing a command-line in Bash out of those "simply double-quoted
values" to pass to ssh as a command to start an ssh exec channel with, where
the command also contains a `su -c` stanza.

Again, in another language, that's just a repeated application of a
shellescape function as you're composing things. In bash... I have no idea.

---

These are both examples of the type of code you have to write _all the time_
in Bash when dealing with file manipulation, because any file could have any
byte-sequence (except NUL) in its name, and you're dealing with the results of
calling commands that want to pipe those filenames out to you as raw delimited
data ( _hopefully_ NUL-delimited.)

Let's say you're writing a script that copies a bunch of files into a dir, and
then archives them into a tar file. Are you sure your command reads and
recreates both relative and absolute symlinks correctly, in a way that they'll
unpack correctly for the tar file? In most scripting languages, you've got
functions like Pathname#relative_path_from(dir) and Pathname#clean_path. To
get the "correct" symlink in bash, you've got `readlink -f` and a comparison
and then a bunch of string manipulations using `basename` and `dirname`.

Or how about a "recursively collapse empty or _pseudo-empty_ directory"
function? Where, by "pseudo-empty", I mean "empty except for .DS_Store;
Thumbs.db; Desktop.ini; those resource-fork files on FAT volumes/SMB shares
that start with ._; or other empty or pseudo-empty directories". That's not a
function anything ships with, but it's ~10 lines of Ruby. Try to implement it
in Bash! (Yes, you can decide whether to collapse the _whole hierarchy_ in one
pass with `find`. But the mission here is to remove all the empty _and_
pseudo-empty dirs found in the hierarchy, like an enhanced version of `find
-empty`.)

(Notice that these are both examples of things where a Bash programmer would
be likely to say "why not write a new binary in C that does this thing, and
then call it from your script"? Well, because the whole reason I'm stuck
writing in a scripting language is that this code needs to be _portable_ to
unknown target CPU ISAs and OS ABIs, silly.)

There is a reason that the default kind of cloud-init script is a YAML file,
rather than a Bash script. There is a reason that Vagrantfiles aren't Bash; a
reason that Ansible is less popular than Chef or Puppet or Fabric; a reason
why every /bin/init that isn't sysvinit has soundly beaten sysvinit. And it's
why, even in Ansible, or autotools, or any package manager's pre-install and
post-install hooks, the parts where you _do file manipulation_ are abstracted
out, either being done through YAML (Ansible), or through domain-specific
file-manipulation _binaries_ (dpkg) or through pre-written and thoroughly-
checked file-manipulation Bash scripts (autotools).

And that reason is that doing file manipulation directly in Bash is not worth
the time and effort, and _especially_ not worth the maintenance burden in a
codebase shared with other programmers who are probably not Bash experts.

~~~
LukeShu
> _That second example is write-only code,_

Tacking on the `-d ''` flag to `read` makes it write-only!?

> In bash, that's a 50-line monstrosity. In Ruby (or Python, or even Perl),
> it's ~5 clean, readable lines.

As an example, let's take the input, batch it into groups of 5, filter out
files that don't contain "token1" or "token2", and pass those as a NUL-
separated list to stdin of another binary.

I wrote that up. It came out to 10 lines in either Ruby or Bash.
[https://pastebin.com/EksRxmVJ](https://pastebin.com/EksRxmVJ) It's certainly
not a "50-line monstrosity".

> _Again, in another language, that's just a repeated application of a
> shellescape function as you're composing things. In bash... I have no idea._

In Bash, "shellescape" is pronounced "printf %q" (and shelljoin is pronounced
"printf '%q '"). In Bash, that's similarly just repeated application of printf
%q:

    
    
        base_cmd=('whatever' 'arg with space')
        nested_cmd=(sh -c "$(printf '%q ' "${base_cmd[@]}")")
        ssh user@hostname "$(printf '%q ' "${nested_cmd[@]}")"
    

For direct comparison with Ruby:

    
    
        escaped=$(printf '%q ' "${base_cmd[@]}")
        escaped=base_cmd.map{|a|shellescape(a)}.join(" ")
        escaped=shelljoin(base_cmd)
    

Or, Bash 4.4 gained @Q as a shorthand for that:

    
    
        base_cmd=('whatever' 'arg with space')
        nested_cmd=(sh -c "${base_cmd[*]@Q}")
        ssh user@hostname "${nested_cmd[*]@Q}"
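Those nested quotations round-trip exactly. A self-contained check of the same technique without the ssh hop (bash-only, since it uses arrays; the base command is illustrative):

```shell
base_cmd=(printf '%s\n' 'arg with space' '-n')

# %q-escape every element, then let a child shell re-tokenize it.
# The child's argv comes out identical to base_cmd:
escaped=$(printf '%q ' "${base_cmd[@]}")
bash -c "$escaped"
# prints:
#   arg with space
#   -n
```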
    

> _In most scripting languages, you've got functions like
> Pathname#relative_path_from(dir) and Pathname#clean_path._

You mean $(realpath --relative-to=DIR) and $(realpath)?
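(A quick illustration, assuming GNU coreutils realpath is available; the 'rpdemo' tree is made up:)

```shell
mkdir -p rpdemo/a/b/c

realpath rpdemo/a/b/c                          # absolute, canonicalized path
realpath --relative-to=rpdemo/a rpdemo/a/b/c   # prints: b/c
```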

> _Or how about a "recursively collapse empty or pseudo-empty directory"
> function? ... it's ~10 lines of Ruby. Try to implement it in Bash!..._

And it's 6 lines of Bash?

    
    
        find \( -not -type d \
                -not -name .DS_Store \
                -not -name Thumbs.db \
                -not -name Desktop.ini \
                -not -name '._*' \
             \) -printf '%h\0%p\0' | sort -uz
    

Or were you looking for a listing of pseudo-empty directories?

    
    
        comm -z23 <(find -type d -print0 | sort -z) \
                  <(find \( -not -type d \
                            -not -name .DS_Store \
                            -not -name Thumbs.db \
                            -not -name Desktop.ini \
                            -not -name '._*' \
                         \) -printf '%h\0%p\0' | sort -uz)
    
    

Like I said, there are things that are hard to do in Bash, but I'm not certain
you've pointed any of them out.

> _There is a reason that the default kind of cloud-init script is a YAML
> file, rather than a Bash script._

And there's also a reason why a "GNU/Linux distro" is essentially a giant pile
of shell scripts (maybe embedded into Makefiles).

> _a reason why every /bin/init that isn't sysvinit has soundly beaten
> sysvinit._

I mean, I'm happy with systemd on my systems, but: OpenRC is a giant pile of
shell scripts.

> _the parts where you do file manipulation are abstracted out, ... through
> domain-specific file-manipulation binaries (dpkg)_

Great example, because the core of Debian is the "rules" file for each
package, which is... a Makefile, containing shell.

Or, look at Arch Linux, the core of which is the "PKGBUILD" file for each
package, which is... a Bash script.

> _or through pre-written and thoroughly-checked file-manipulation Bash
> scripts (autotools)._

Autoconf is targeting shells that _aren't Bash_. It's pre-written and
thoroughly-checked shell that contains wonky workarounds, in order to be
compatible with buggy shells from 25 years ago.

If you're assuming "modern shell / utilities", file manipulation in Bash
_isn't hard_.

~~~
jwilk
> You mean $(realpath --relative-to=DIR) and $(realpath)?

For historical reasons, some old Debian-based distros (such as Ubuntu 14.04
LTS "Trusty Tahr") don't ship with coreutils' realpath. Instead, they have a
separate, somewhat incompatible realpath package, which doesn't have
--relative-to.

Also, beware that $() strips trailing newlines.

~~~
LukeShu
Good mention! I knew that realpath is a relatively new addition to GNU
coreutils, but I didn't realize that older Debians had an incompatible
implementation of it.

With a single $(realpath ...), chomping the trailing newline is fine, as
realpath will insert a trailing newline after the filename; but definitely
something to be aware of with other commands.

~~~
jwilk
Newline in realpath (and many other commands) output is mostly for aesthetics.
It doesn't help with taming $(), because $() strips _all_ trailing newlines,
including the ones that were actually part of the pathname. You would have
to resort to something as ugly as:

    
    
       f=$(realpath ... && printf 'x')
       f=${f%x}
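The stripping being worked around there is easy to demonstrate: $() removes every trailing newline, including ones printf emitted deliberately:

```shell
v=$(printf 'name\n\n\n')   # three trailing newlines emitted...
printf '%s\n' "${#v}"      # ...but ${#v} is 4: all of them were stripped
```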

------
dwheeler
Here are some of my pages on attacks via filenames and how to counter them...

[https://www.dwheeler.com/essays/fixing-unix-linux-
filenames....](https://www.dwheeler.com/essays/fixing-unix-linux-
filenames.html)

[https://www.dwheeler.com/essays/filenames-in-
shell.html](https://www.dwheeler.com/essays/filenames-in-shell.html)

It is not really just a shell problem.

------
pixelbeat__
This is one of the reasons GNU ls now quotes names with unsafe characters by
default: the common case of copying and pasting output from ls into commands
is safe, and if you're typing the argument yourself, there's a visual
indication of a potential issue.

If you want the older, unsafe defaults, you can add -N to your ls alias.

------
jwilk
Is there a more comprehensive list of command line options that can be abused
in argument injection attacks?

~~~
jacquesm
One nasty trick is web pages that show a benign code snippet that expands to
something evil upon being cut-and-pasted.

~~~
craftyguy
or my personal favorite are folks who recommend this crap:

    
    
        curl https://<whatever> | bash
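A common mitigation, if you must run such an installer, is to download it to a file first so you can actually read it (and so a truncated download can't half-execute). A sketch, using a stand-in script since the URL above is elided:

```shell
# Stand-in for: curl -fsSLo install.sh https://<whatever>
printf 'echo installed\n' > install.sh

cat install.sh     # read what you're about to run...
bash install.sh    # ...then execute it as a separate, deliberate step
```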

~~~
Spivak
This really isn't as bad as people make it seem. If you're trusting the
vendor and aren't going to read the code anyway, it's no less safe than
installing a package.

I don't _like_ installations like this because they're messy, hard to track,
harder to clean up, don't lend themselves to mass deployments, don't provide
any integrity verification, the list goes on, but they're not particularly
unsafe.

These things exist because there aren't good packaging options that don't
require root. Maybe we'll get there eventually.

~~~
jwilk
One nasty thing about piping to shell is that the server can detect that this
is happening:

[https://news.ycombinator.com/item?id=11532599](https://news.ycombinator.com/item?id=11532599)

