
Quoting command line arguments the wrong way (2011) - moopling
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
======
makecheck
If an API has a problem, you fix the API. If necessary, you release a free
library to back-port the improved API to older OS versions as well. You come
up with a fixed version, make it as convenient as possible, call it the “new
standard”, officially deprecate the alternatives, and _then_ write a blog
post. Except the blog post would only need two lines of sample code showing
how easy it is to work around the problem now.

Not this. Frankly, few developers will even know about the need for careful
coding such as this, and even fewer will actually do it because it will muck
up each and every program with dozens of lines of extra stuff to work around a
deficient part of the PLATFORM.
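For what it's worth, the "dozens of lines" are mechanical enough to live in one small library routine. Here is a Python sketch of the quoting rules the article describes for the MSVC CRT parser (Python's standard library ships the same logic as `subprocess.list2cmdline`, which the test below compares against):

```python
import subprocess

def quote_arg(arg: str) -> str:
    """Quote one argument for the MSVC CRT command-line parser
    (a sketch mirroring the rules subprocess.list2cmdline uses)."""
    need_quote = (" " in arg) or ("\t" in arg) or not arg
    out = []
    n_bs = 0  # run of backslashes not yet emitted
    for c in arg:
        if c == "\\":
            n_bs += 1
        elif c == '"':
            # Backslashes preceding a quote are doubled, the quote escaped.
            out.append("\\" * (n_bs * 2 + 1) + '"')
            n_bs = 0
        else:
            out.append("\\" * n_bs + c)
            n_bs = 0
    out.append("\\" * n_bs)
    if need_quote:
        # Trailing backslashes are doubled so the closing quote survives.
        return '"' + "".join(out) + "\\" * n_bs + '"'
    return "".join(out)

print(quote_arg('he said "hi"'))  # "he said \"hi\""
```

The point stands either way: this belongs in the platform, not copy-pasted into every program.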

~~~
tetrep
I'm absolutely amazed that Windows doesn't offer a process spawning API that
takes an array of strings as arguments[0], if only because that's exactly how
a C program expects them anyway.

[0]: [https://linux.die.net/man/3/exec](https://linux.die.net/man/3/exec)

~~~
dom0
These APIs _do_ exist, they just don't work that way:

> These functions appear to be precisely what we need: they take an arbitrary
> number of distinct command line arguments and promise to launch a
> subprocess. Unfortunately and counter-intuitively, these functions do not
> quote or process these arguments: instead, they’re all concatenated into a
> single string, with arguments separated by spaces
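A quick illustration of the hazard being quoted here (Python used purely for demonstration; its `subprocess.list2cmdline` implements the MSVC CRT quoting rules):

```python
import subprocess

args = ["myprog.exe", "my file.txt", "backup dir"]

naive = " ".join(args)  # what the _spawn* family effectively does
print(naive)            # myprog.exe my file.txt backup dir  (five "words")

safe = subprocess.list2cmdline(args)  # quotes args containing spaces
print(safe)             # myprog.exe "my file.txt" "backup dir"
```

With the naive join, the child's argv parser sees five arguments instead of three; the boundaries are simply gone.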

~~~
Dylan16807
Or in other words, those APIs _don't_ exist, and mentioning a function that
has the same C type but does something totally different is trivia at best.

~~~
dom0
... which is exactly my point. They apparently exist, and would even appear to
work correctly, until some scrutiny is applied.

~~~
Dylan16807
> They apparently exist[...]until some scrutiny is applied.

But that's the opposite of " _does_ exist"?

------
rusanu
Looking _now_ at how Windows handles console application arguments, it sure
looks broken. But you have to put yourself in the mindset of ca. 1990 and think what
Windows applications looked like back then, and what was the model Microsoft
was betting on. Arguments were passed via DDE[0], and then later all the bets
were on OLE[1] and finally COM[2]. System components were all the time
accessed via in-process DLLs communicating with services over LRPC[3]. In this
world, the command line, the pipe philosophy and the 'less is more' mindset
were not only not welcome, they were the _adversary_.

Even when it was finally acknowledged that the command shell needs some love
too, the answer was PowerShell, which yet again defined an _object_ interface
between cmdlets[4].

[0]
[https://en.wikipedia.org/wiki/Dynamic_Data_Exchange](https://en.wikipedia.org/wiki/Dynamic_Data_Exchange)
[1]
[https://en.wikipedia.org/wiki/Object_Linking_and_Embedding](https://en.wikipedia.org/wiki/Object_Linking_and_Embedding)
[2]
[https://en.wikipedia.org/wiki/Component_Object_Model](https://en.wikipedia.org/wiki/Component_Object_Model)
[3]
[https://en.wikipedia.org/wiki/Local_Procedure_Call](https://en.wikipedia.org/wiki/Local_Procedure_Call)
[4]
[https://en.wikipedia.org/wiki/PowerShell#Pipeline](https://en.wikipedia.org/wiki/PowerShell#Pipeline)

~~~
sukilot
That's a long way of saying that Microsoft spent many years promoting
overcomplicated bad ideas over simple correct ones.

~~~
wongarsu
>promoting overcomplicated bad ideas over simple correct ones.

Encode the program name together with all arguments as one big string,
discarding all type safety; spawn a process for your favorite shell; let that
shell process the string in order to spawn one or more processes; in each
process have the standard library parse the arguments into an array called
argv; have your average process call out to yet another library to parse
these argument strings into flags and parameters, print a non-standardized,
potentially localized string if an error occurs, and, if the error is fatal,
exit with a semi-standardized return code. The calling program then tries to
make sense of the output of the called program, often by fuzzy-matching
against known output.

That's the standard way it's been done since the beginning of Unix. Some APIs
skip the shell, but that's a minor detail in all this. If this sounds simple
and like the obviously best solution, then congratulations. The Windows
developers disagree and tried (and continue to try) to find a better way. They
have mostly failed so far, but I think we should thank them for at least
trying to innovate.

~~~
arielb1
If you use some API other than system(3) - even if you literally use a shell-
script - it's:

1) serialize data structure to semi-typed array of strings

2) pass array of strings directly to target program

3) have target program parse the array of strings to flags & parameters

With no shell or C library touching the command line arguments at all.

The only way this could be more direct would be if you used JSON instead of
arrays of strings, web-API style.
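The three steps above can be sketched in Python (using the interpreter itself as the target program so the example is self-contained):

```python
import subprocess
import sys

# 1) data structure -> semi-typed array of strings
argv = ["has space", "-rf", 'quote"inside']

# 2) the array goes straight to the target program: no shell involved
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])"] + argv,
    capture_output=True, text=True)

# 3) the target parses its argv itself; every argument arrived intact
print(child.stdout)  # ['has space', '-rf', 'quote"inside']
```

No quoting pass ever happens, so spaces, quotes, and dash-prefixed strings all survive verbatim.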

------
ajross
Everyone does this "wrong" because every app does this differently. The core
reason Windows command-line options aren't (or at least shouldn't be) used to
pass complicated data is that, way back when, DOS simply provided a single
command-line string to the executed program and let it parse it itself. So no
two command-line parsers are the same. The glitch with escaping here is merely
one symptom of a broader problem.

Unix got this right by forcing the shell to provide the kernel a pre-parsed
list of strings, so the only insanity the tool integrator needs to understand
is the shell's quoting syntax. Which is still insane. But it's only insane in
one particular way.

~~~
pwdisswordfish
Unix got something right in that you can unambiguously pass a list of separate
strings to launched processes. However, it does nothing to ensure unambiguous
_meaning_ of those strings.

This is for example why you should avoid giving your files such cute names as
'-rf'.

~~~
quotemstr
> This is for example why you should avoid giving your files such cute names
> as '-rf'.

The kernel should ban these names. I'm a big fan of dwheeler's proposal for
fixing filenames: see
[http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html](http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html)

There is no goddamn reason why a filename should be able to contain, say, LF,
DEL, or BEL. None whatsoever.

~~~
kalleboo
> There is no goddamn reason why a filename should be able to contain, say,
> LF, DEL, or BEL. None whatsoever.

OK you want ASCII 0x07 to be disallowed. Should a filename be allowed to
contain "㜇"? (U+3707)

~~~
kazinator
That's not a problem because the UTF-8 encoding of U+3707 will absolutely not
contain any USASCII control characters, or any special shell or filesystem
characters. It will all be bytes in the range 0x80-0xFF.
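This is easy to check (Python for illustration):

```python
s = "\u3707"  # the character kalleboo asked about
encoded = s.encode("utf-8")
print(encoded.hex())                    # e39c87
print(all(b >= 0x80 for b in encoded))  # True: no ASCII bytes appear
```

Every byte of the three-byte UTF-8 sequence has the high bit set, so no control, shell, or path character can collide with it.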

~~~
kalleboo
There are other encodings than UTF-8, though. Which is kind of my point. If
you have your file system set to UTF-16 (doesn't NTFS do this?) then a 0x07
byte will be present.

~~~
quotemstr
I _also_ believe that filesystems should require that all filenames be fully
normalized UTF-8. I don't think the benefits (slight, IMHO) of allowing
filenames to be arbitrary byte strings outweigh the costs of code complexity
and security problems.

------
pwdisswordfish
This is quite a good example of the kind of problems you run into when you
follow the philosophy of representing all data in informally specified, ad-hoc
text formats. Everyone thinks they can just roll their own parser/serialiser,
which they then neglect to test thoroughly enough, creating subtle bugs when
the serialisation side forgets to escape data somewhere, or the parsing side
doesn't even provide any way to escape grammar-significant characters.
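The cure is a tested serialize/parse pair that is known to round-trip. For POSIX shell syntax, Python ships one (`shlex.join`/`shlex.split`; the former exists since Python 3.8):

```python
import shlex

args = ["grep", "-e", "a b", "it's", "$HOME", ""]
cmd = shlex.join(args)           # one safely quoted string
print(cmd)
assert shlex.split(cmd) == args  # the pair round-trips exactly
```

Spaces, apostrophes, `$`-expansion candidates, and even the empty string all survive, which is precisely what hand-rolled escaping tends to get wrong.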

~~~
grymoire1
I think this indicates a huge difference between Microsoft/UNIX mindsets.
Microsoft allowed

    
    
       rename *.txt *.bak
    

To do this, the "rename" command had to understand how to parse the asterisk
character while being familiar with the contents of the directory. This makes
it difficult to create a replacement "rename" command, or any new command that
parses wildcards.

In the Unix environment, the shell expands the asterisk to all files matching
that pattern and then passes those names to the "rename" command, which never
sees the asterisk. It's therefore trivial to create a new "rename" utility,
because it doesn't need to parse wildcards. However, renaming all .txt files
to .bak is awkward on a Unix system.
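The division of labor described above can be shown in miniature (Python standing in for the shell's glob expansion):

```python
import pathlib
import tempfile

d = pathlib.Path(tempfile.mkdtemp())
(d / "a.txt").touch()
(d / "b.txt").touch()

# The *caller* expands the pattern; the rename step sees concrete names only.
for p in sorted(d.glob("*.txt")):
    p.rename(p.with_suffix(".bak"))

print(sorted(x.name for x in d.iterdir()))  # ['a.bak', 'b.bak']
```

The rename step never parses a wildcard; it only ever handles fully expanded names, which is what makes replacement utilities trivial to write.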

~~~
sukilot

        find . -name '*.txt' -exec mv {} {}.bak \;
    

That's imperfect because find doesn't provide a nice way to do the pattern
replacement, but it's a pretty simple pattern.

If this use case were important, it's simple enough to write a purpose-built
program that generated command-line argument pairs according to a spec.

~~~
Asooka
I would probably do

    
    
       find -name '*.txt' -exec bash -c 'mv -nv "$0" "${0/.txt/.bak}"' {} \;
    

That's not at all user-friendly, but these are supposed to be programmer
tools... If you want user-friendly file operations on UNIX command line, use
midnight commander. It can do mass rename, etc.

------
joeyh
That's about Windows, but many uses of system() that involve non-static
strings also probably get quoting wrong.

Of course, avoiding the shell is the best way to avoid the problem. Sometimes,
you can't avoid the shell.

My preferred shell quoting method, for Unix, is to wrap each parameter in
single quotes. Then only single quotes inside a parameter are a problem; they
can be replaced with '"'"'

Probably a lot of things use double quotes and perhaps try to escape $ and '
and " but miss details to do with \ and perhaps other characters that some
shells treat specially.

Another way is to pass the filename in the environment: system("rm -rf
\"$DIR\"")

~~~
pwdisswordfish
> Sometimes, you can't avoid the shell.

What circumstances have you got in mind?

~~~
tene
My favorite awkward environment to cite is running a command remotely over
ssh. As far as I've been able to tell from casual testing, without having read
the source code yet, ssh does something very similar to what Windows does
here: it just glues everything together with spaces and passes the result to
the remote shell for interpretation, so you have to deal with the shell and
provide your own quoting.
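Concretely: because ssh joins its trailing arguments with spaces and hands the string to the remote login shell, you have to pre-quote for that shell yourself. A sketch (the host name is hypothetical, so the actual ssh invocation is left commented out):

```python
import shlex

remote_cmd = ["touch", "/tmp/a file"]  # argument containing a space
wire = shlex.join(remote_cmd)          # quote for the remote shell
print(wire)                            # touch '/tmp/a file'

# subprocess.run(["ssh", "example.com", wire])  # hypothetical host
```

Without the quoting step, the remote shell would see `touch /tmp/a file` and create two files instead of one.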

~~~
dozzie
>>> Sometimes, you can't avoid the shell.

>> What circumstances have you got in mind?

> My favorite awkward environment to cite is running a command remotely over
> ssh.

Dude, if you run remote commands by calling them through SSH, you didn't just
get things backwards: you fucked things up heavily. SSH was never designed as
a batch, unsupervised tool, despite many people using it as such.

Remote code that is parametrized should be run exactly as that: as a remote
procedure call, a technique known for _over thirty years_ now. One of the
reasons is quoting (because for a non-interactive SSH call the command needs
to be quoted exactly twice if run from a shell, and exactly once when run from
exec()), but there are also problems with distributing keys, maintaining
usable home directories, and disabling unnecessary things that are enabled by
default (port forwarding, among others), and that _doesn't_ exhaust the list
of issues.

A proper RPC protocol, like XML-RPC (which was released _twenty years ago_ and
is still usable while being quite simple), covers quoting -- or actually,
_serializing_ data -- without programmers worrying whether they got their list
of metacharacters right and did enough passes for things to work correctly.

On the other hand, I'm not surprised that people do this through SSH (and a
variant of this stupidity: adding the _apache_ user to sudoers so a web panel
can add firewall rules). After all, I've never seen an easy-to-use RPC server
that takes all its procedures from its configuration. I needed to write such a
thing myself (once in Perl, as xmlrpcd, and recently in Python, with a custom
protocol that can do a little more, as harpd of the HarpCaller project).

~~~
tene
You're entirely right; I completely agree.

And yet, sometimes you're working in an environment that already has a good
ssh configuration for other reasons, and you're very low on engineering time
that you can invest into something, and ssh is good enough for a first pass
implementation. Alternately, you may be working on some kind of ad-hoc data
collection or maintenance task that's not going to become part of any long-
term infrastructure (or will be replaced by something better), and you don't
yet have any better systems in place to run ad-hoc programs across the
cluster.

I completely agree with you that good RPC is a much better foundation to build
reliable systems on.

[edited to add]: HarpCaller looks like a pretty interesting project, and
similar to several things I've considered building in the past. Nice work.

~~~
dozzie
> sometimes you're working in an environment that already has a good ssh
> configuration for other reasons, and you're very low on engineering time
> that you can invest into something, and ssh is good enough for a first pass
> implementation.

I would have agreed before I wrote xmlrpcd. After writing it, I, its author,
have no excuse for using SSH as an RPC protocol. Though I'm not good at the
marketing side, so I understand that people just don't know about such tools.

> Alternately, you may be working on some kind of ad-hoc data collection or
> maintenance task that's not going to become part of any long-term
> infrastructure (or will be replaced by something better), and you don't yet
> have any better systems in place to run ad-hoc programs across the cluster.

Honestly, this is yet another matter.

To properly manage a set of servers, one needs three different services[&],
each for a different thing. One service is for running predefined procedures
(that can possibly be parametrized) -- this is what HarpCaller and the earlier
xmlrpcd are for. Another service is for managing configuration and scheduled
jobs -- this is the place for CFEngine and Puppet. Then there is what you just
said: a tool for running commands defined in an ad-hoc manner and collecting
their output _synchronously_. Of the three, the first and second don't match
how SSH works and is used, but for the last one it actually makes sense.

[&] It doesn't _have_ to be three services, but we don't have one that would
cover all three in a uniform way.

> [edited to add]: HarpCaller looks like a pretty interesting project, and
> similar to several things I've considered building in the past. Nice work.

Thank you. I'm quite proud of how it turned out, and the middle part of it was
an excellent pretext for me to write something for production use in Erlang.

------
nilved
I'm going to use this opportunity to ask a question I've been thinking about
for a long time. Why do we have both environment variables and command line
arguments? They are the same thing, except one is key-to-value and one is
positional and often needs to be parsed by hand in an ad-hoc fashion. I don't
think that people should use command line arguments when environment variables
are an option, and I'm not aware of any use cases where they are not an
option.

~~~
Animats
A more interesting question is, why do we pass in command arguments and
environment variables to a subprocess, but only get an integer status code
back? The original reason comes from the way fork/exec was implemented in
PDP-11 UNIX. But that was a while ago. At some point, the subprocess concept
should have been extended to handle return values, like all other forms of
function call. "exit" should have an argc/argv, which get passed back to the
caller.

~~~
quotemstr
It's always bugged me that the POSIX exit status can only really communicate
seven bits of information reliably. (The high bit is overloaded with
signal-exit information by the shell.) Windows does it better: there, you at
least get 32 bits, which is enough for an HRESULT.

Also, I've never understood where this exit(-1) idiom comes from. It's
nonsense.
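The exit(-1) nonsense is easy to demonstrate: the kernel keeps only the low 8 bits of the status, so -1 arrives as 255 (Python shown; the observed value assumes a POSIX host):

```python
import subprocess
import sys

# The child calls exit(-1); the parent sees only the low 8 bits.
r = subprocess.run([sys.executable, "-c", "import sys; sys.exit(-1)"])
print(r.returncode)  # 255 on POSIX
```

So exit(-1) is indistinguishable from exit(255), and any "negative error code" scheme silently collapses into the 0-255 range.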

------
quotemstr
Oh, hi. This is my article. I'm a bit sad that I lost edit rights when I left
Microsoft.

~~~
ronsor
You should always host your own blog and then post a summary on the MSDN
blogs -- always have control over your content.

------
zkhalique
Oh brother, this is one of the reasons UNIX was much more developer-friendly.

~~~
agentgt
Actually, Bash does have some issues with command-line arguments when doing
variable expansion. For example, with Java apps it is actually fairly
difficult to pass -DsomeParameter="something with a space in it" when doing
variable expansion.

~~~
kbp
Could you clarify what you mean? What's wrong with -DsomeParameter="$var"? Or
am I misunderstanding?

~~~
agentgt
I'm trying to find the exact situation but basically if you have something
like:

    
    
       PROPS="-Dprop1=foo bar -Dprop2=bar foo"
    
       java $PROPS SomeClass.class
    

You can try to escape with backslashes and whatnot, but it becomes fairly hard
if not impossible to expand multiple parameters correctly from a single
variable. I believe one solution is to use arrays, and another is to not use
arguments at all and instead rely on other configuration mechanisms (env
variables, files, etc.).

You see this often rear its ugly head with daemon scripts.

~~~
colemickens
It's a bit hard to understand your exact scenario here (it's basically the
same shell-lexing problem...), but I think this is what you're looking for.

I left an example of an argument with a space in there (the last one).

    
    
        PROPS=("-Dprop1=foo" "bar" "-Dprop2=bar foo")
        java "${PROPS[@]}" SomeClass.class
    

(The quotes around `${PROPS[@]}` are important.) Do note, there is an
unfortunate edge case here: if PROPS is empty, you'll get a false `""` arg
passed to java. There's a less pleasant syntax that avoids that issue, but I
don't recall it off-hand.

(edit: Please read the replies to my post, I didn't think about the fact that
this syntax is bash specific. Thanks to those who pointed it out)

~~~
agentgt
Oh, I am sure there are ways around it, but the big issue is that almost all
of the daemon scripts out there don't do it. That is, you can't just set some
configuration in /etc/default/some_daemon, because the script will try to
concatenate the command.

I tried to find a failsafe solution once while rewriting a daemon script and
just gave up.

------
saurik
A friend of mine had to fix this recently for PowerShell, which had a
regression that caused arguments passed to programs to be incorrectly escaped
in the executed commands.

[https://github.com/PowerShell/PowerShell/pull/2182](https://github.com/PowerShell/PowerShell/pull/2182)

It is extremely common to get this wrong. Apache Portable Runtime even gets
this wrong :/. (I haven't submitted a patch for it yet, but I intend to: I ran
into it a couple of months ago and then got distracted after working around it
in my program by predicting what incorrect escaping APR would perform and
compensating by adding quotes and escape characters to the input I pass to
their open-process function... my build is statically linked, so I don't feel
bad about this temporary hack ;P.)

~~~
quotemstr
Wow. Microsoft really has changed for the better since I left. Not a
coincidence, I'm sure. ;-)

------
kazinator
This should be "quoting command line arguments the right way for passage into
applications developed using Microsoft Visual C, and linked to its C run-time
library that parses the command line string and calls main or wmain".

There is no general correct way to quote command line arguments in Windows,
because every application receives just a character string which it parses
however it wants.

There is no single specification for the syntax by which arguments are
delimited within the command string.

------
jdright
I don't really know what to say about this without risking being downvoted.
But this coming from a Microsoft blog is a little... awkward. DOS heritage +
really bad shell implementations, well... I avoid the hell out of using the
command line on any Windows. Luckily there is mintty.

~~~
marcosdumay
> well.. I avoid the hell out of using command line on any Windows

How do you avoid it if it's the only API available for spawning processes on
Windows?

~~~
alkonaut
High-level APIs could easily provide this, e.g. the C#

    
    
        Process.Start(executable, args);
    

It takes a single string as args and has no overload taking an array of args
that would format them safely and correctly so that the receiving process sees
the same strings in its args vector.

So while this seems pretty easily fixable, it hasn't been fixed.

~~~
marcosdumay
The right way to do it would be to change the OS API to accept an array of
strings, and turn this function into an overlay that has its string parameter
parsed and broken apart in user space.

But yes, if MS ever fixes this, it's more likely that they'll go the route you
described. I can't wait to see how many bug reports with crazy descriptions it
will create, and how many more overlays will be written to fix the bugs while
preserving backward compatibility.

------
vram22
IIRC, Microsoft C (quite some years ago - I had worked on a product using it)
had different variants of functions to spawn or execute a process from another
one - like the exec family: execlp, execvp, execvpe, execlpe, etc. - which
varied in things like fixed vs. variable number of args, passing an
environment vs. not, etc. I also remember reading about it in a Waite
Publications book, The Microsoft C Bible, by Nabajyoti Barkakati. Not sure if
the issues mentioned in the OP could be solved if those functions were
present - I'd need to check.

Edit: and the DOS / Windows exec family of functions was likely derived from
Unix's exec().

------
Pxtl
What I find frustrating is how many MS tools freak out at the sight of a
quoted path, when quoted paths are exactly what "copy filename with path" (or
whatever the context-menu command is called) gives you.

------
wfunction
I recall seeing different behavior between the C runtime and
CommandLineToArgvW. I don't remember what the difference was, but I remember
it driving me nuts.

------
jaclaz
[2011]

------
grymoire1
Oh. I was confused until I realized the publishing site was Microsoft.
Apparently "Everyone" only refers to Windows programmers, and
Unix/Mac/whatever programmers do not exist in this universe.

~~~
chc
It's an MSDN article. Most of the articles on Apple's developer site are
equally inapplicable to Windows development without explicitly saying so.

