Quoting (and splitting) is basic but eludes a lot of people including developers. I'd guess around one quarter to one third of all questions / issues raised with the command line tools I've maintained relate to quoting and so don't actually have anything to do with the tool in question.
I think it's important for anyone — developers especially — using a shell to understand basic shell principles.
A shell is a REPL with some special filesystem-related helpers: looking through $PATH for any +x file that matches the name of the first word entered; performing glob expansion (*, ?, {foo,bar}...) on all but the first word; backticks; environment variable scoping (the difference between `export FOO=bar` and `FOO=bar ./my_program`); the presence of "." and ".." in the FS; and so forth.
Once someone wraps their head around the execution model, they should be golden.
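To make the environment variable scoping point concrete, here is a minimal sketch in POSIX sh. FOO is just an illustrative name, and `sh -c` stands in for `./my_program`; both are assumptions for the sake of the example.

```shell
# FOO is an illustrative name; sh -c stands in for ./my_program.
unset FOO

# 1. Plain assignment: a shell variable that is not exported,
#    so a child process does not see it.
FOO=bar
plain=$(sh -c 'printf %s "${FOO-unset}"')

# 2. Prefix assignment: placed in the environment of that one command only.
unset FOO
prefixed=$(FOO=bar sh -c 'printf %s "${FOO-unset}"')
afterwards=${FOO-unset}

# 3. export: every child started from now on inherits it.
export FOO=bar
exported=$(sh -c 'printf %s "${FOO-unset}"')

# Prints: plain=unset prefixed=bar afterwards=unset exported=bar
printf 'plain=%s prefixed=%s afterwards=%s exported=%s\n' \
    "$plain" "$prefixed" "$afterwards" "$exported"
```

The prefix form is the one to reach for when a setting should affect a single invocation and nothing after it.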
If we're talking about bash, I think it's a mistake to refer to any of this as "glob expansion", a term which doesn't appear in the bash man page. Rather, it differentiates between multiple types of expansion and the order in which they take place. Quote:
The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.
On systems that can support it, there is an additional expansion available: process substitution.
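A small sketch of why that order matters, written for POSIX sh so it should behave the same in bash. The scratch directory and the file names are made up for the illustration.

```shell
# Scratch directory so pathname expansion has known files to match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.txt b.txt

pattern='*.txt'

# Unquoted: parameter expansion yields *.txt, then word splitting runs,
# then pathname expansion turns that word into the two matching names.
set -- $pattern
unquoted_count=$#

# Quoted: only parameter expansion happens, so the literal pattern survives.
set -- "$pattern"
quoted_count=$#
quoted_value=$1

# Tilde expansion comes before parameter expansion, so a ~ that arrives
# through a variable is never expanded.
home_ref='~'
set -- $home_ref
tilde_value=$1

# Prints: unquoted=2 quoted=1 (*.txt) tilde=~
printf 'unquoted=%s quoted=%s (%s) tilde=%s\n' \
    "$unquoted_count" "$quoted_count" "$quoted_value" "$tilde_value"
```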
Furthermore, expansion absolutely does work on the first word and it's not obvious to me why someone might think it doesn't. The simplest example is probably tilde expansion, i.e. the ability to execute ~/bin/foo or ~someone/bin/bar. But the other types work too, e.g. brace expansion:
$ ls l1
ls: l1: No such file or directory
$ l{s,1}
ls: l1: No such file or directory
$ touch l1
$ l{s,1}
l1
Command substitution:
$ $(echo l)s -l l1
-rw-r--r-- 1 user wheel 0 10 Dec 14:12 l1
Arithmetic expansion:
$ perl$((4+1)).$((36/2)) -E 'say "foo"'
foo
Even pathname expansion - here, `fi*` expands to `find` from the current directory, which is then found in $PATH and executed:
$ ls -l
$ touch find foo bar quux
$ fi* . -name f\*
./foo
./find
> I think it's a mistake to refer to any of this as "glob expansion"
Since we are being pedantic regarding the terms, while I can't really find "glob expansion" in the Bash manual either, it documents a bunch of "glob" settings: "noglob", "GLOBIGNORE", "dotglob", "extglob"...[1]
FWIW, I expected to find these in the Bash info page on Ubuntu, but curiously, "info bash" does not bring up the info page (just the fallback manpage), while "info find" does work. Whatever happened to (tex)info pages?
The * character is addressed in pattern matching rather than parameter expansion.
From the bash manpage (version 5.0):
The special pattern characters have the following meanings:
* Matches any string, including the null string. When the globstar shell option is enabled, and * is used in a pathname expansion context, two adjacent *s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a /, two adjacent *s will match only directories and subdirectories.
The GNU bash manual describes the '-f' option as "Disable filename expansion (globbing)."
"Globbing" is how * expansion is generally referenced in other documents, if not often within the bash manpage or info pages themselves.
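For reference, a quick sketch of what that '-f' option does in practice. The scratch directory and file names are invented for the example; `set -f` itself is standard POSIX sh.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch one two three

# Globbing enabled (the default): * becomes the three file names.
set -- *
globbing_on=$#

# Globbing disabled: * is passed through as a literal word.
set -f                # same effect as: set -o noglob
set -- *
globbing_off=$#
literal=$1
set +f                # turn globbing back on

# Prints: on=3 off=1 literal=*
printf 'on=%s off=%s literal=%s\n' "$globbing_on" "$globbing_off" "$literal"
```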
> ...expansion absolutely does work on the first word
As was already pointed out, you are right. What it does not do is match executables in the $PATH, which is what I was, for some reason, trying (badly) to communicate.
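A sketch of that distinction, assuming default shell options (no nullglob or failglob) and a `find` somewhere in $PATH. The pattern in the first word is matched against files in the current directory only; $PATH is consulted afterwards, with whatever word the expansion produced.

```shell
empty=$(mktemp -d)
cd "$empty" || exit 1

# Nothing here matches fi*, so the pattern stays the literal word "fi*".
# The shell then looks for a command with that exact name, not for "find".
status=0
fi* . -name 'f*' 2>/dev/null || status=$?

# Once a file called "find" exists here, fi* expands to "find",
# and only then is that name looked up in $PATH.
touch find foo
found=$(fi* . -name 'f*' | sort)

# Prints: status=127, then ./find and ./foo on separate lines
printf 'status=%s\n%s\n' "$status" "$found"
```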
For me, it is a mess no matter how you look at it.
It is because there are so many layers. The way commands are parsed is already not that easy, but then the command itself has its own rules, and it is not uncommon to pass a script as an argument.
For example, if you type "sh -c ls", first it is parsed into exec("sh", "-c", "ls"), then "sh" parses its arguments "-c" and "ls", then that new shell parses the command string "ls", and finally the actual "ls" program is launched.
This is simple, but there are quirks at every step. First come all the shell parsing issues, with globs, interpolation and all that. Then there is how the command parses its own arguments (be careful about "-"). Then it's back to the shell parser for the sub-command, and finally "ls" itself has its quirks, like locales for instance.
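The layering can be seen with a two-line experiment. This is only a sketch: the variable name is arbitrary, and the point is the difference between letting the inner shell re-parse a value and handing it over as a separate argument.

```shell
name='two words'

# The outer shell expands $name inside the double quotes, so the inner
# shell receives the text:  set -- two words; echo $#
# and parses it from scratch, splitting the name in two.
reparsed=$(sh -c "set -- $name; echo \$#")

# Here the command string is a fixed literal. The value travels as a
# separate argument ($1 of the inner shell) and is never parsed as code.
passed=$(sh -c 'echo "$#"' sh "$name")

# Prints: reparsed=2 passed=1
printf 'reparsed=%s passed=%s\n' "$reparsed" "$passed"
```

Passing data as arguments after the command string keeps it out of the second round of parsing entirely, which is usually easier than trying to quote for two parsers at once.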
I have done Perl (line noise), C++ (memory unsafe with too many features), PHP (injections everywhere), JS (with IE6 support) and none of these languages make me more uncomfortable than UNIX shells.
I think one reason for this is that quotes and wildcard expansion work very differently across platforms. On Windows it's the tool's responsibility to do wildcard expansion and parse quotes, but on UNIX the shell usually does wildcard expansion and (AFAIK) also removes quotes before invoking the tool. This is especially "fun" in Python scripts which invoke command line tools and that must work across UNIX-oids and Windows, or when trying to run UNIX command line tools on the Windows command line.
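On the UNIX side this is easy to check. In the sketch below, show_args is a hypothetical stand-in for any command line tool (a shell function rather than a real binary, but arguments reach it the same way); it prints each argument it actually received.

```shell
# show_args is a hypothetical stand-in for a command line tool: it prints
# each argument it received, wrapped in <>.
show_args() {
    for arg in "$@"; do
        printf '<%s>' "$arg"
    done
    printf '\n'
}

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.log b.log

# The quotes are consumed by the shell; the tool never sees them.
quotes=$(show_args "hello world" 'single' plain)

# The wildcard is expanded by the shell unless it is quoted.
wildcard=$(show_args *.log)
protected=$(show_args '*.log')

# Prints three lines:
#   <hello world><single><plain>
#   <a.log><b.log>
#   <*.log>
printf '%s\n' "$quotes" "$wildcard" "$protected"
```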
On Windows, even splitting arguments is done by the application. WinMain, for example, gets a single string; that string is not glued together by code below WinMain, it is simply how processes work on Windows, and CreateProcess and the other process-creation APIs likewise take a single command line. SSHv2 has a similar quirk: the command line for request_exec is just one string, which gets executed using <user's shell> -c <string goes here>, so the exact semantics depend on the shell set for the user. This might have been an attempt at lowest-common-denominator engineering for Windows.
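The SSH quirk can be simulated locally. In this sketch, fake_remote is a hypothetical helper, not anything SSH provides: it assumes the client simply joins its arguments with spaces (as I believe OpenSSH's does) and that the user's shell is sh.

```shell
# fake_remote is a hypothetical stand-in for an SSH exec request: it joins
# its arguments with spaces and gives the single string to sh -c.
fake_remote() {
    sh -c "$*"
}

# Local quoting is gone by the time the string is built, so the remote
# shell sees:  printf %s, my file   and splits the name.
lost=$(fake_remote printf '%s,' 'my file')

# A second, inner layer of quotes survives into the string:
#   printf %s, 'my file'
kept=$(fake_remote printf '%s,' "'my file'")

# Prints: lost=[my,file,] kept=[my file,]
printf 'lost=[%s] kept=[%s]\n' "$lost" "$kept"
```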
Overall, the Windows command line is, and always has been, broken and unsound due to being based on the flawed DOS batch model.
The article manages to make this point in the most hopelessly long-winded way possible, going mile after mile on "guess what's wrong! did you guess yet?" and fails to fully cover a nuanced subject in the process.
Informative writing gets to the point instead of playing gotcha games.
This is why it's generally a bad idea to let values go unquoted whenever quoting isn't strictly necessary. It sounds great - people will have to type less! - but in practice it just means they don't know exactly when they have to quote and they make mistakes.
Thankfully most languages didn't make this mistake but some did: CMake, YAML, Bash, TCL. All notoriously awful.
It's a fair point. I think the best solution would be for shells to have two modes - interactive and batch mode. Batch mode would require strict quoting.
Alternatively it could let you type without quotes and then automatically quote it for you (e.g. you press shift-enter and it auto-quotes it).
It's fine for interactive use because you can debug errors when they happen. The issue is with scripts.
Sometimes you need to deliberately not quote, though. For example, the output of a program might be a list of files, which I then might want to operate on using "for filename in $files". If I do this, I need a guarantee that the file names contain no whitespace (or glob characters), but this is usually not a particularly onerous requirement for a closed system that doesn't have untrusted filename inputs. Splitting data using text separated by whitespace is part of the essence of Unix (see TAOUP). If you want to ban this kind of thing too, then you might as well not permit shell scripting at all.
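For what it's worth, deliberate splitting can be made a bit more controlled. A minimal sketch, assuming the program emits one file name per line; the file names are invented, and names that themselves contain newlines would still break it.

```shell
# Hypothetical program output: two file names, one per line,
# one of which contains a space.
files='report.txt
my notes.txt'

# Default IFS: splits on the space as well, so the loop sees three words.
naive=0
for filename in $files; do
    naive=$((naive + 1))
done

# Controlled splitting: newline-only IFS, and globbing switched off so a
# name containing * or ? is not expanded either.
old_ifs=$IFS
IFS='
'
set -f
careful=0
for filename in $files; do
    careful=$((careful + 1))
done
set +f
IFS=$old_ifs

# Prints: naive=3 careful=2
printf 'naive=%s careful=%s\n' "$naive" "$careful"
```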
The problem with “works most of the time” is that it stops working when you’re not around. So it might be fine for a one-off shell command, but you never know what the next guy will pass into it.
The biggest problem I see with people dealing with Tcl is that they carry over the assumption that {} delimits scope (or something else lexical). Of course, this is wildly untrue in Tcl, where {} is simply a "quote" operator.
This stuffs people up horribly until they unlearn the idea that {} has semantic meaning.
But I do not see an advantage either way. I think you are attributing to quoting some issues that are actually caused by type-punning. The quoting rules of Tcl are simple. I mostly encountered problems when people do things like:
# x is not actually a list, but may appear that way if a and b are nice
set x "$a $b"
# Proper way:
set x [list $a $b]
But I fail to see how quoting could address or alleviate this.
> Thankfully most languages didn't make this mistake but some did: CMake, YAML, Bash, TCL.
As well as Perl and PHP, though both have since recanted (Perl started 20 years ago with strict mode, which IIRC forbids barewords in most contexts; and PHP deprecated barewords in 7.2 and removed them in 8).
Off the top of my head, the only places strict Perl allows a bareword are the left-hand side of the => fat comma operator and pre-declared subroutines (IIRC, I never could quite remember the rules for that).