
How Command Line Parameters Are Parsed (2016) - nikbackm
http://daviddeley.com/autohotkey/parameters/parameters.htm
======
m1el
Windows command line parsing is _crazy_. I've recently stumbled onto this
post, explaining various pitfalls in parsing command line arguments in
windows:

[http://www.windowsinspired.com/how-a-windows-programs-
splits...](http://www.windowsinspired.com/how-a-windows-programs-splits-its-
command-line-into-individual-arguments/)

~~~
zenojevski
The first example

    
    
      "She said "you cant do this", didnt she?"
    

Parses the same way in Bash:

    
    
      $ for A in "She said "you can\'t do this\!", didn't she?"; do echo -$A-; done
      -She said you-
      -can't-
      -do-
      -this!, didn't she?-
    

So I'd have thought this would be a given. It doesn't look so unintuitive –
though I'm not sure if the results are the same for the same reasons.

So I went to take a look at the Bash source code, which is (expectedly) pretty
hairy. Top-of-stack quoting characters are referenced throughout, to enable a
`pass_next_character` that seems conceptually similar to
`ignore_special_chars`.

------
gumby
The Unix section is missing a case: when a shell is invoked from login,
argv[0] has a '-' prepended.

I understand not many people write a program suitable as a top-level shell
(for use in /etc/passwd), but for completeness...

~~~
leni536
Is this an actual separate case? How are top level shells invoked? Isn't this
just a specific call to execve?

~~~
dfox
It is a good example of how argv[0] is just another normal argument that is
passed by the caller of execve() and does not necessarily have to be related
to the actual filename of the executed program in any way.

~~~
TeMPOraL
Not sure if it still works that way, but years ago I used to overwrite the
argv[0] param with my own data to change how the process appears in lists
such as the one shown by the 'w' utility.

~~~
GauntletWizard
Yep. There's a tool in djb's daemontools that uses this behavior:
[https://cr.yp.to/daemontools/readproctitle.html](https://cr.yp.to/daemontools/readproctitle.html)
. It's kind of a pathological case, but it's a cool way to ensure that you
can get _some_ data from your processes, even on a totally horked system, as
well as being fairly convenient from an old-school administration perspective
(it gives you the ability to see what errors a program has encountered with
just ps or top, which are already likely to be the first commands run when
something's wrong).

------
laurent123456
It'd be interesting to know why Microsoft implemented parameter parsing that
way. Maybe they decided to let the executable parse its own parameters as an
optimisation: if the executable doesn't care about its arguments (or doesn't
have any, like most GUI programs), an extra step is saved.

Except now it's clear it was a premature optimisation: if it takes 10 pages
to document the parsing behaviour, it has probably also cost millions of
wasted developer-hours fixing weird corner cases.

Maybe at that time they thought, "well programmers can just call
GetCommandLine() to get the args in a consistent way".

~~~
digi_owl
I suspect it is a remnant of the DOS days.

~~~
userbinator
The "command line as a single string" dates to CP/M if not earlier:

[https://en.wikipedia.org/wiki/Zero_page_(CP/M)](https://en.wikipedia.org/wiki/Zero_page_\(CP/M\))

I don't think it's an optimisation, but a design choice of simplicity; in
fact, CP/M (and DOS) would parse the first two arguments if present, and put
the results into the filename fields of the two default FCBs, allowing
programs which take two filename arguments (traditionally, an input and an
output) to be implemented easily without having any commandline parsing logic
of their own. More info on that here:

[http://www.gaby.de/cpm/manuals/archive/cpm22htm/ch5.htm](http://www.gaby.de/cpm/manuals/archive/cpm22htm/ch5.htm)

It would be interesting to trace the origins of command-line parameter passing
design even further, to the mainframe OSes that came before UNIX, but I'm not
so familiar with them.

Edit: OpenVMS appears to also pass a plain string to the process, which is
then parsed within:
[http://hoffmanlabs.org/vmsfaq/vmsfaq_015.html](http://hoffmanlabs.org/vmsfaq/vmsfaq_015.html)
(section 10.3)

------
_pmf_
> (Note: Avoid using printf when printing parameters, as the parameter may
> contain a % which would be interpreted by printf as a conversion specifier.
> The program may crash with a bizarre error, such as "runtime error R6002 -
> Floating point not loaded".)

It would be safe to use

    
    
        printf("arg[%d] = %s\n", i, argv[i]);

~~~
delinka
It should say: Avoid using user input as your format string. Never pass
command line parameters as the format argument to the printf family of
functions.

------
amelius
An error which I still think is ridiculous: argument list too long.

~~~
tankenmate
In times past, the command line was pushed onto the user stack of the new
process, before the process was started. This meant you were limited to the
default available stack, plus or minus whatever else needed to be pushed onto
the stack for a starting process. In older kernels, sbrk() (to grow a
process's memory) could only be called on a process that already existed, not
one that was being created. This meant your argument list could indeed be too
long.

~~~
amelius
I was vaguely aware of this, but whatever the reason, it's ridiculous that in
this day and age of gigabytes of memory, I still can't use shell wildcards in
directories with a decently large number of files.

~~~
tankenmate
The limit on Linux is usually a bit under 2MB (ARG_MAX); other Unices will no
doubt vary, and on Windows it's 32KB. And I'm sure you're aware that xargs is
your friend.

~~~
vram22
Yes, xargs works on Unixen. I wonder if there is any equivalent on Windows
(other than using Unix xargs via Cygwin etc.), or whether you'd have to write
your own? This point hadn't occurred to me before, but it could be useful.

------
billpg
I wish Unix and co. did it the Windows way and left it to the invoked process
to parse. I have an app that takes an SQL query on the command line, and I
have to remember that I can't just type SELECT * because the shell thinks it
knows what I mean by *.

~~~
daxelrod
Note that even on Windows, you still have to deal with the shell interpreting
what you type.

The biggest downside of the Windows way, though, is that there stops being
one answer to "how do I pass arbitrary strings on the command line to another
process?"

Sure, most programs use CommandLineToArgvW, but it can still be hard to find
information on how a particular program parses its command line.

[https://blogs.msdn.microsoft.com/twistylittlepassagesallalik...](https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-
quotes-command-line-arguments-the-wrong-way/) is an overview of the fun that
is command line handling on Windows.

~~~
zokier
> Note that even on Windows, you still have to deal with the shell
> interpreting what you type.

Hypothetically, that is not strictly true. You could create a shell that is a
very thin wrapper around CreateProcess and does no interpretation or parsing
of the child's arguments. A Unixy shell, by contrast, _must_ do at least some
degree of parsing in order to split the arguments.

------
webuser321
A more interesting comparison would be glibc vs other libc.

What are the significant differences?

Is "flexibility", for lack of a better word, really a desirable property for
entering command line arguments?

Should parsing arguments, as opposed to entering them, be simple or complex?

Are there better ways to pass arguments/parameters than on the commandline?

Besides config files, which themselves may introduce parsing complexities.

Consider passing "arguments" as environment variables, e.g. as in
daemontools' envdir. Variables can be read from "files" in a chroot
directory.

~~~
majewsky
> Variables can be read from "files" in a chroot directory.

Reading environment variables from process memory is less expensive (requiring
only memory access) than reading files, which requires at least three syscalls
(open, read, close) and thus multiple context switches, and, worst of all,
disk access (unless you're putting the files in a tmpfs).

> Are there better ways to pass arguments/parameters than on the commandline?

If such a system were designed today, data would probably be strongly typed
(passed as structured objects) rather than stringly typed (passed as text). In
fact, to my understanding, this is what Powershell does. I think that the
conceptual simplicity of using free-form text is a virtue for Unix more than a
burden, but that's also an argument of taste.

~~~
webuser321
"... which requires at least three syscalls..."

As does every program that reads from a config file.

One does not have to use files to set the variables, of course. This just
works well for long-running programs, e.g. daemons.

"... unless you're putting files in a tmpfs)"

The files are always in mfs.

As for strongly typed data and passing data instead of text, I prefer k to
Powershell. There is also a uniformity to APL built-in functions. The number
of arguments is limited. PS is quite slow on startup and too verbose.

As for "free-form", sometimes rearranging arguments (as glibc is capable of)
can cause more complexity than is warranted.

Consider a program like signify. Then consider gpg.

The best command line program "interfaces", IMO, are the ones with the fewest
options and the least possible variability in arguments. Best interface is no
interface, etc.

One of the obvious benefits for UNIX programs of this nature is portability.

------
smilekzs
Another perspective (as a rhetorical question): Why does "command line
argument(s)" have to be an array of strings, rather than a single string?
Like, consider an alternative universe where C had `int main(const char*
arg)`.

~~~
pwdisswordfish
You mean, like this alternative universe? [https://msdn.microsoft.com/en-
us/library/windows/desktop/ms6...](https://msdn.microsoft.com/en-
us/library/windows/desktop/ms633559\(v=vs.85\).aspx)

Flipping this around, why does the 'command line' have to be a string (or a
list of strings) that my program has to parse, rather than a ready-to-be-used
data structure? Consider instead an alternative universe where the command-
line arguments to your program are represented with something like a JSON
object. Or better, a struct that the command-line interpreter can type-check.

------
chrismatheson
Just the TOC on this made me laugh, haven't even read the rest yet

------
tetha
Mh. Is this a fair comparison? The Unix part is massively simplified, while
the Windows part goes into greater and greater detail about the
implementation of that entry point.

Bash command line splitting alone is nasty, let alone the 6 other shells.
Every language under the sun has a dozen and twelve libraries to parse argv[]
into something useful.

If you document every nook, cranny, and special case of those libraries, the
list for Unix will dwarf the special cases of Windows. In fact, judging by
the man page, I think bash command line parsing alone could dwarf the Windows
section.

~~~
laumars
The reason the Windows section is longer is that on Windows it's the
application that splits argv[]. An application receives its parameters as one
long string and splits that string into argv[] itself, which means there
isn't a whole lot of consistency in how argv[] gets populated.

Whereas on Linux / UNIX, parameters arrive at the application already split.
Your point about the shells is a good one, but at least on Linux you only
need to learn the nuances of your preferred shell (though in my experience
all the main ones follow the same rules reliably), knowing that you won't hit
weird issues like quotes being included in argv[] by some applications but
not others despite formatting the parameters in exactly the same way.

From a personal point of view, I find working with parameters on Windows a
highly frustrating affair compared with Linux. However, I appreciate that the
issue is one of inheritance rather than design, and Microsoft cannot easily
change something like this without breaking a thousand things in subtle and
often unexpected ways.

