Conventions for Command Line Options (nullprogram.com)
124 points by zdw on Aug 1, 2020 | 106 comments



Just, whatever you do, please please PLEASE PLEASE support `--help`. I don't mean `-h` or `-help`, I mean `--help`.

There's only one thing the long flag can possibly mean: give me the help text. I understand that your program might prefer the `-help` style, but many do not. And do you know how I figure out which style your program likes? That's right, I use `--help`. I have to use `--help` rather than just `-help` because of GNU userspace tools, among others. It seems unlikely they're going to suddenly clean up their act this decade, so I have to default to starting with `--help`.

So it's very frustrating when the response to `program --help` is "Argument --help not understood, try -help for help." This is then often followed by me saying indecent things about the stupid program.


Even better: support both -h and --help.

Also, don't pipe the help output to stderr if the user requested help. "prog --help | less" not working is annoying.
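For a hand-rolled parser, something like this minimal Python sketch (program name and usage text made up) keeps both cases right: help that was asked for goes to stdout with exit status 0, while usage errors go to stderr:

    import sys

    USAGE = "usage: prog [-h|--help] [--verbose] FILE..."

    def main(argv):
        if "-h" in argv or "--help" in argv:
            print(USAGE)                    # requested help goes to stdout...
            sys.exit(0)                     # ...and exits successfully
        if not argv:
            print(USAGE, file=sys.stderr)   # usage errors go to stderr
            sys.exit(2)
        # ... real work ...

    if __name__ == "__main__":
        main(sys.argv[1:])

That way "prog --help | less" works, and scripts can still distinguish a successful help request from a botched invocation.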


Or worse - you pass --help and the script runs and starts doing stuff


And is then not robust against being stopped via ctrl c


also --version (and -v if possible, although that can be mapped to --verbose)


Traditional UNIX®️ options for help are -? and -h. --long-options are a horrid GNU-ism and shunned by clean UNIX®️ compliant programs, because such programs come with detailed, professionally written manual pages which contain extensive SYNOPSIS and EXAMPLES sections.

Implementing --long-options makes the poor users type much more, hurting ergonomics and the users' long-term productivity and efficiency.


I think long human-readable options are very nice when invoking a command in a script.

A (hypothetical) script can invoke rsync like this:

  rsync -Oogx
but long options make things more readable:

  rsync \
    --omit-dir-times \
    --owner \
    --group \
    --one-file-system
It also helps when you're wondering if '-d' means debug or directory.

(although yes, it can allow options to be added to a program ad-nauseum)


-? - what are you smoking? :)

That is a glob in most shells, try putting a file named "-a" in your current directory and see what happens...

The proper way to write it would be -\?


It goes without saying that it has to be escaped in most shells, but that is the traditional option for help on UNIX®️. You would be well advised to educate yourself on the history of UNIX®️ before coming up with "what are you smoking?"


I see nothing wrong with extensive SYNOPSIS and EXAMPLES sections for programs using --long-options.

The fish shell extracts completion information from man pages, so that the user does not have to type so much and they get useful output, pulled from the manpage, no less.


I write a lot of commandline utilities and scripts for automating tasks. I find environment variables a much simpler way to provide and accept key/value pairs than arguments. Env vars don't depend on order; if present, they always have a value (even if it's the empty string); they're also inherited by subprocesses, which is usually a bonus too (especially compared to passing values to sub-commands via arguments, e.g. when optional arguments may or may not have been given).

Using arguments for key/value pairs allows invalid states, where the key is given but not the value. It can also cause subsequent, semantically distinct, options to get swallowed up in place of the missing value. They also force us to depend on the order of arguments (i.e. since values must follow immediately after their keys), despite it being otherwise irrelevant (at least, if we've split on '--'). I also see no point in concatenating options: it introduces redundant choices, which imposes a slight mental burden that we're better off without.

The only advice I wholeheartedly encourage from this article is (a) use libraries rather than hand-rolling (although, given my choices above, argv is usually sufficient!) and (b) allow a '--' argument for disambiguating flag arguments from arbitrary string arguments (which might otherwise parse as flags).
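As a rough illustration of that style, here's a minimal Python sketch (the MYTOOL_ variable names are hypothetical):

    import os
    import sys

    def main():
        # Each key/value option is read from the environment exactly once;
        # absent variables fall back to a default.
        out_dir = os.environ.get("MYTOOL_OUT_DIR", "./out")
        verbose = os.environ.get("MYTOOL_VERBOSE", "") != ""

        # Free-form/positional arguments still come from argv.
        inputs = sys.argv[1:]
        if verbose:
            print(f"writing {len(inputs)} inputs to {out_dir}", file=sys.stderr)

    if __name__ == "__main__":
        main()

Invoked as e.g. MYTOOL_VERBOSE=1 mytool a.txt b.txt; exported variables are also inherited by anything the tool spawns.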


Some tangential musings:

I can't help but see parallels here between cmdline arguments vs. environment variables for programs, and keyword arguments vs. dynamic binding for functions in a program (particularly in a Lisp one).

That is, $ program --foo=bar vs. $ FOO=bar program

seems analogous to:

  (function :foo "bar")
  ;; vs.
  (let ((*foo* "bar"))
    (function))
When writing code (particularly Lisp), keyword arguments are preferred to dynamic binding because the function signature then explicitly lists the arguments it uses, and dynamic binding (the in-process phenomenon analogous to inheriting environment from a parent process) is seen as dangerous, a source of external state that may be difficult to trace/spot in the code.

I suppose the latter argument applies to env vars as well - you can accidentally pass values differing from expectation because you didn't know a parent process changed them. The former doesn't, because processes don't have a "signature" specifying their options, at least not in a machine-readable form. Which is somewhat surprising - pretty much all software written over the past decades follows the pattern of accepting argc, argv[] (and env[]), and yet the format of arguments is entirely hidden inside the actual program. I wonder why there isn't a way to specify accepted arguments as e.g. metadata of the executable file?


Lisp will at least warn you if you bind a variable with earmuffs that isn't declared special though. Biggest downside to environment variables is the lack of warning if you misspell something.


Interesting that you bring up Lisp and dynamic scope, since I've previously combined env vars with Racket's "parameters" (AKA dynamic bindings): https://lobste.rs/s/gnniei/setenv_fiasco_2015#c_owz2pp


Yeah, I just can't stop thinking about envs / dynamic binding when someone brings up the other.

Speaking of using both direct and dynamic arguments, a common pattern in (Common) Lisp is using dynamically-bound variables (so-called "special" variables) to provide default values. Since default values in (Common) Lisp can be given as any expression, which gets evaluated at function-call time, I can write this:

  (defun function (&key (some-keyword-arg *some-special*))
    ...)
And then, if I call this function with no arguments, i.e. (function), the some-keyword-arg will pull its value from the current dynamic value of *some-special*. In process land, this would be equivalent to having the command line parser use values in environment variables as defaults for arguments that were not provided.


In my head there's been a hierarchy for a long time.

When I build command line utilities and think about the way they'll be used, I tend to use configuration files for things that will change very slowly over time, environment variables as a way to override default behaviors, and command line arguments to specify things that will often vary from invocation to invocation. In fact, most of the time, I use the environment variables either for development/testing features that I don't really intend to expose to most users, or for credentials that don't get persisted.

It's never occurred to me to use environment variables as a primary way to configure an application. I'll have to noodle on that for a while. My gut says that it's enough of a deviation from the Unix convention that I probably won't use that.
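A rough sketch of that hierarchy in Python (the file name, section, and MYTOOL_TIMEOUT variable are made up): the config file supplies a slow-changing default, an environment variable can override it, and a command line argument wins over both:

    import argparse
    import configparser
    import os

    def load_timeout():
        # 1. Slow-changing default from a config file.
        cfg = configparser.ConfigParser()
        cfg.read(os.path.expanduser("~/.mytool.ini"))
        timeout = cfg.getint("defaults", "timeout", fallback=30)

        # 2. Environment variable overrides the config file.
        if "MYTOOL_TIMEOUT" in os.environ:
            timeout = int(os.environ["MYTOOL_TIMEOUT"])

        # 3. Command line argument wins over both.
        parser = argparse.ArgumentParser()
        parser.add_argument("--timeout", type=int, default=timeout)
        return parser.parse_args().timeout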


> My gut says that it's enough of a deviation from the Unix convention that I probably won't use that.

It's not completely unheard of. Checking the man pages for a few I remember being affected:

* `ls` looks at LS_COLORS

* `grep` looks at GREP_OPTIONS, GREP_COLOR, GREP_COLORS, as well as the not-grep-specific LC_ALL, LC_COLLATE, LANG, etc, and POSIXLY_CORRECT

* `find` looks at LC_*, LANG, and POSIXLY_CORRECT as well, plus TZ and PATH for some of its other options

Looks like the program-specific ones LS_* and GREP_* can be overridden by options directly on the command, but the others don't have such options.


I feel like I didn't make "primary" clear enough - it would be like using an environment variable for the path for ls. That's gonna be a big no from me, dog.


> it would be like using an environment variable for the path for ls. That's gonna be a big no from me, dog

Me too. My point about env vars was limited to key/value options. I'm happy to use positional arguments when names are irrelevant, e.g. `ls foo/` is better than `DIR=foo/ ls` or `ls --dir foo/` or `ls --dir=foo/` or whatever, since the latter just adds noise and increases the chance we'll get something wrong.

I'm also happy to use arguments for flags which are either present or absent, where values are irrelevant, e.g. `ls --all` rather than e.g. `ALL=1 ls` or `ls --all=yes` or whatever. In this case the presence of a value is even more harmful, since the program might only be checking for the variable's presence, in which case `ALL=0 ls` would be misleading; plus we'd have to guess whether the value should be 0/1, yes/no, y/n, on/off, enable/disable, etc.


These are all GNU/Linuxisms and are nowhere to be found in a real UNIX®️ like illumos, Solaris, HP-UX or IRIX64.

This is also a good example of why having GNU/Linux as one's first OS instead of a real UNIX®️ is so toxic.


Is this a joke?


No, it's a judgment on GNU, GNU/Linux and the horrid incompetence of contemporary information technology industry.


There are a few reasons I don't like environment variables for this: the first is that a random env variable can influence the operation of a program. Do "export foo=bar" and then maybe 30 minutes later a program unexpectedly picks up "foo". It's also hard to inspect; the output of "env" tends to be somewhat long, and you don't know which ones the program will pick up on. Flags are much clearer and more explicit.

There's also the issue that typo'ing "foo" as "fooo" will silently fail. Okay, that's a simple example, but some tools have "PROGRAMNAME_LONGDESCRIPTION". Being in all-caps also makes it harder for me to spot typos. You generally want your env vars to be long, to ensure they're unique.
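One mitigation, sketched here with a hypothetical MYTOOL_ prefix: since all of a program's variables share a prefix, it can warn about any prefixed variable it doesn't recognize, which catches most misspellings:

    import os
    import sys

    KNOWN = {"MYTOOL_VERBOSE", "MYTOOL_OUT_DIR", "MYTOOL_TIMEOUT"}

    for name in os.environ:
        if name.startswith("MYTOOL_") and name not in KNOWN:
            print(f"warning: unrecognized variable {name}", file=sys.stderr)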


> It's also hard to inspect; the output of "env" tends to be somewhat long, and you don't know which the program will pick up on. Flags are much clearer and more explicit.

I don't think I disagree with your post, but this particular point seems to be trivially solved by prefixing the variable name with the program name?

IOW: `env | grep '^GIT_'` to see what you're passing to `git`.


Yeah, fair enough; that's what I do as well.


I’m curious if your preference for environment variables is only for writing programs or for using programs as well. From a user’s perspective, would you prefer that common commands use environment variables to get options? For example, would you prefer a “find” command that uses environment variables instead of command line options?


> For example, would you prefer a “find” command that uses environment variables instead of command line options?

I use 'find' so much that my muscle memory would hurt if it changed, but it's actually a really interesting example. Its use of arguments to build a domain-specific language is a cool hack, but pretty horrendous; e.g.

    find . -type f -a -not \( -name \*.htm -o -name \*.html \)
We can compare this to tools like 'jq', which use a DSL but keep it inside a single string argument:

    key=foo val=bar jq -n '{(env.key): env.val}'
Note that 'jq' also accepts key/value pairs via commandline arguments, but that requires triplets of arguments, e.g.

    jq -n --arg key foo --arg val bar '{(env.key): env.val}'


The second example didn't work for jq 1.6; I had to write it as

  jq -n --arg key foo --arg val bar '{($key): $val}'


Yes you're right. I spotted it myself, but only after I couldn't edit anymore :(


That would depend on the target. The grandparent talks about automation, so the target for their utilities is most likely the system, not a user.

When writing automation routines, env vars are often better in my experience.

For tools that will be manually run by a user or admin, CLI options are better.


I think there's a place for both, but env vars can make things really annoying to troubleshoot. Already, I often print all env vars before running automated commands, and it can be a mess to dig through. Culling down environment variables when spawning a subprocess is difficult. Bad flags can error immediately if you made a typo. I've often misspelled an env var, and it's hard to tell that it did nothing (I think I recently saw a case where trailing whitespace like "FOO " was the source of a years-long bug). `FOO=BAR cmd` is also weird for history (although that's mostly a tooling issue).


It seems equivalent to long options requiring a value. Also, if you mistype an environment variable, you won't get warned about it.


Another nice thing about env vars is that they don't appear in the process table, unlike argv. For many applications, that makes them a suitable place to inject secrets, and makes them convenient to inject via wrapper tools.

For instance, the fact that the aws(1) cli tool and the AWS SDKs by default take IAM creds from AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, etc. means that you can write programs like aws-vault (https://github.com/99designs/aws-vault) which can acquire a temporary AWS session and then execute another command with the creds injected in the standard AWS_... env vars. For instance:

    aws-vault exec prod-read -- aws s3 ls
    aws-vault exec stage-admin -- ./my-script-that-uses-the-aws-sdk-internally
Also, passing arguments as env vars is part of the "12-factor app" design (https://12factor.net/config). That page has some good guidance.


The environment does appear in the process table: see `ps e` on Debian at https://manpages.debian.org/buster/procps/ps.1.en.html or ps -e on BSD at https://www.freebsd.org/cgi/man.cgi?query=ps&apropos=0&sekti... or pargs -e on Solaris at https://www.unix.com/man-page/opensolaris/1/pargs


It's protected by at least uid, but `/proc/$PID/environ` does exist, and it might be exposed other ways, too.


That's actually a bad thing for non-secure arguments, and environment variables aren't much better for secure arguments.


Quick comment, not an expert, but environment variables keep their state after the program is called. From a functional programming perspective, or just for my own sanity, wouldn't it be better to keep state to a minimum?


You're right that mutable state should be kept to a minimum, but immutable state is fine. There usually isn't much need to mutate env vars.

Some thoughts/remarks:

- If we don't change a variable/value then it's immutable. Some languages let us enforce this (e.g. 'const'), which is nice, but we shouldn't worry too much if our language doesn't.

- We're violating 'single source of truth' if we have some parts of our code reading the env var, and some reading from a normal language variable. This also applies to arguments, though.

- Reading from an env var is an I/O effect, which we should minimise.

- We can solve the last 2 problems by reading each env var once, up-front, then passing the result around as a normal value (i.e. functional core, imperative shell)

- Env vars are easy to shadow, e.g. if our script uses variables FOO and BAR, we can use a different value for BAR within a sub-command, e.g. in bash:

    BAR=abc someCommand
This will inherit 'FOO' but use a different value of 'BAR'. This isn't mutable state, it's a nested context, more like:

    let foo  = 123
        bar  = "hello"
        baz  = otherCommand
        quux = let bar = "abc"
                in someCommand
As TeMPOraL notes, env vars are more like dynamically scoped variables, whilst most language variables are lexically scoped.


That's only if you `export` the vars. You can do `FOO=1 BAR=2 cmd` and FOO and BAR will only have those values for the process and children. This is isomorphic to `cmd --foo=1 --bar=2`.


Absolutely, this is one of the reasons that environment variables are a nightmare to work with


The beautiful thing about environment variables is that you can read them whenever you actually need them in your program. They pierce a hole through your call stack to the point that you need them. Command line arguments, on the other hand, have to be passed from the main function down through however many levels of the call stack until you reach the point where you need them.


I think that's an advantage. You know what goes in and out, you can see where it's passed and used. It provides transparency and fewer accidental behavioral changes.

E.g. I had a really weird bug in a script; git seemed to break for no reason, until I figured out that some other script had exported GIT_DIR, which pointed to the git top-level directory. The name of the variable isn't bad if you want to save that directory. But git uses GIT_DIR as the location of the .git directory it should look at when it is defined.

Using environment variables is kind of like using global state, it can often cause weird behavior at a distance without a clear path of influence if you don't know exactly which environment variables are used by all your scripts.


I'd say that's an example of "shotgun parsing", which is a bad idea since we might have already performed some irrevocable actions by the time we notice part of the input is missing/invalid ( https://stackoverflow.com/questions/50852226/what-does-shotg... ). It's better to check for and parse all input up-front https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

I write a lot of tools in Haskell, where 'getEnv' is an "IO action" whose type prevents us using it inside most code. The simplest way to handle this ends up with a "functional core, imperative shell" model, e.g. most of my CLIs have the form (where 'interact' takes a pure String -> String function and uses it to transform stdio):

    main = do
      foo <- either error id . parseFoo <$> getEnv "FOO"
      interact (doTheThing foo)

    doTheThing :: Foo -> String -> String
    doTheThing = ...
This is also trivial to test. Some languages make it easy to replace env vars inside a single expression (e.g. bash), but others require us to set and reset the global environment, which is painful and error prone (especially if tests are run concurrently).


In many languages you can always get at the argv list; e.g. sys.argv in Python, os.Args in Go, ARGV in Ruby, etc.

It's not universal though; you can't in C or shell scripts for example (although, if you really want to, it's not hard to assign a global). I never had the need to inspect commandline arguments beyond the main() function though.


The same thing can be said about global variables. I think any sane developer would agree it's an anti-pattern.


Mutating global variables is certainly an anti-pattern, but I'm not so convinced that global constants are a problem; i.e. if we treat env vars like immutable variables (in fact I treat most variables as immutable).


If the command line arguments are getting to be too much, the next step is to use a config file, like an INI file, and parse that.

Very simple and removes all the extensive command line clutter, especially if many settings stay the same.

I think environment variables are not very user friendly and not that visible.


Config files introduce a whole slew of complexity: race conditions, a whole bunch of new error conditions (permission denied, parse errors, read-only filesystems, disk full, etc.); system- or user-wide config files are spooky action-at-a-distance, which can make problems hard to track down, makes commands non-deterministic and may impact reproducibility (e.g. on other people's machines).

If commands insist on taking options via a config file, my preferred way to call them is to generate the config in-place via process substitution, e.g.

    someCommand -c <(printf 'foo\n\tbar="%s"\n\tbaz="%s"\n' "$BAR" "$BAZ")
Although this isn't always possible, e.g. if 'someCommand' tries seeking around the config file rather than just parsing it in one pass.


Libraries can provide validation of key:value options. For example, in Python-land 'click' provides 'nargs' and 'type' parameters for the common case, and if those don't suffice, a custom coerce/validate callback.
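For instance, a small sketch using those click parameters (the option names are invented):

    import click

    def check_port(ctx, param, value):
        # Custom coerce/validate callback: reject out-of-range ports.
        if value is not None and not 0 < value < 65536:
            raise click.BadParameter("must be between 1 and 65535")
        return value

    @click.command()
    @click.option("--point", nargs=2, type=float)           # exactly two floats
    @click.option("--port", type=int, callback=check_port)  # validated via callback
    def main(point, port):
        click.echo(f"point={point} port={port}")

    if __name__ == "__main__":
        main()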


Most cli parsers don't fully support the author's suggested model because it means you can't parse argv without knowing the flags in advance.

For example, the author suggests that a cli parser should be able to understand these two as the same thing:

    program -abco output.txt
    program -abcooutput.txt
That's only doable if by the time you're parsing argv, you already know `-a`, `-b`, and `-c` don't take a value, and `-o` does take a value.

But this is a pain. All it gets you is the ability to save one space character, in exchange for a much more complex argv-parsing process. The `-abco output.txt` form can be parsed without such additional context, and is already a pretty nice user interface.

For those of us who aren't working on ls(1), there's no shame in having a less-sexy but easier-to-build-and-debug cli interface.


But every single command line parser I have ever used (and I have used many over the past 25 years in many programming languages) does in fact know before parsing that a, b, and c don't take a value and o does: accepting the grammar for the flags and then parsing them that way is like the only job of the parser?
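For what it's worth, even Python's bare-bones getopt works this way: you hand it the flag grammar up front ("abco:" below says -a, -b, and -c are flags and -o takes a value) and it accepts both spellings. A quick sketch:

    import getopt

    # "abco:" declares -a, -b, -c as flags and -o as taking a value.
    for argv in (["-abco", "output.txt"], ["-abcooutput.txt"]):
        opts, rest = getopt.getopt(argv, "abco:")
        print(opts, rest)
    # Both print: [('-a', ''), ('-b', ''), ('-c', ''), ('-o', 'output.txt')] []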


I suppose the only reason why you wouldn't want to have a centralized description of the command line grammar is if you're using flags which alter the set of acceptable flags. Like e.g. if you put all git subcommands into one single executable - the combinations and meanings of commandline flags would become absurdly complicated.


This is usually handled by adding support for that sort of syntax to the parsing library. The result is typically not complicated: you have a parser for the global options, then, if using subcommands, in effect a map of subcommand name to parser for options for that command.
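In argparse terms, that map looks roughly like this (subcommand and option names invented):

    import argparse

    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--verbose", action="store_true")   # global option

    sub = parser.add_subparsers(dest="command", required=True)

    clone = sub.add_parser("clone")      # each subcommand gets its own parser
    clone.add_argument("url")
    clone.add_argument("--depth", type=int)

    fetch = sub.add_parser("fetch")
    fetch.add_argument("--all", action="store_true")

    args = parser.parse_args(["--verbose", "clone", "https://example.com/repo", "--depth", "1"])
    print(args.command, args.url, args.depth, args.verbose)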


Some programs may allow for plugins which can have their own options. But you may want to provide an option for where to load the plugins from.


Why would someone ever want to use the second syntax? (Genuine question, not rhetorical!)

It seems using = for optional option arguments is only allowed with long form. Why is that? Wouldn't it be nicer to be able to write "program -c=blue" rather than "program -cblue"?


The second syntax is useful when you want to ask a program to pass an argument to another program. Having to pass two arguments is a lot more annoying, since you'd often then have to prefix each one.


Very opinionated, and IMO without enough justification.

The assertions around short options with arguments - conjoining with other short options, for example - are actively harmful to legibility in scripts, since there's no lexical distinction between the argument and extra short options. I don't recommend using that syntax when alternatives are available and I deliberately don't support it when implementing ad-hoc argument parsing (typically when commands are executed as parsing proceeds).


Counterexamples where this is good for legibility:

    tar -xf archive.tar.gz

    tar -czf archive.tar.gz dir/


I'm guessing the comment was talking about examples like this:

    program -abcdefg.txt
Just from reading this, you can't tell where the flags end and the filename begins unless you have all the flags and their arities memorized.


Worse than that, it adds an unstable convention to an otherwise stable interface. What if there used to be a, b flags and then they add a c flag? Who knows what your command does now. It's just bad all around.


It’s not unstable or ambiguous at all.

    -abcdefg.txt
In your hypothetical -b takes an argument which means it consumes all the characters following it. Anything following b is never interpreted as a flag. There is no danger of -c changing the parsing. Even if b takes an optional argument it still consumes all characters following it.


Ah, you are right.


That’s always true even before you talk about option combining.

    python scr.py -a -v
Interpreting this command correctly requires you to know that python passes everything after the first positional argument to the script.

    ansible play.yml -v
Where ansible itself consumes the -v argument.

It gets even worse with subcommands.

    sudo -k command -a subcommand -v
Options parsing is nice because the arity of options is always 0 or 1.


> tar -czf dir/

You forgot the output filename.

The thing I like about tar is that you only need to learn two options: 'c' and 'x':

    tar c somedir | gzip > somedir.tar.gz
    cat somedir.tar.gz | gunzip | tar x


Thank you! Fixed!


> short options with arguments - conjoining with other short options, for example - are actively harmful to legibility in scripts

Agreed, which is why you shouldn't do that :-) They're a lot easier to type though. I consider short and long arguments as solving different problems: the short are short so it's easy for interactive usage (especially for common options), and the long ones are for scripting (or discoverability, especially with things like zsh's completion).

So I usually use "curl -s" on the commandline, and "curl --silent" when writing scripts.


I really do like argparse.

It will cleanly do just about anything you need done, including nice stuff like long/short options, default values, required options, types like type=int, help for every option, and even complicated stuff like subcommands.

And the namespace stuff is clever, so you can reference arg.debug instead of arg['debug']
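A small sketch showing a few of those features together (option names made up):

    import argparse

    parser = argparse.ArgumentParser(description="demo tool")
    parser.add_argument("-n", "--count", type=int, default=1, help="how many times")
    parser.add_argument("-o", "--output", required=True, help="output file")
    parser.add_argument("--debug", action="store_true", help="enable debug output")

    args = parser.parse_args()
    if args.debug:  # attribute access on the namespace, not args['debug']
        print(f"count={args.count}, writing to {args.output}")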


I always found argparse did argument parsing well enough but it felt clunky when you need something more complicated like lots of subcommands. I find myself using it exclusively when I'm trying to avoid dependencies outside the Python standard library.

My choice of argument parsing in Python is Click. It has next to no rough edges and it's a breath of fresh air compared to argparse. I recently recommended it to a colleague who fell in love with it with minimal persuasion from me. I recommend it highly.

[1] https://click.palletsprojects.com/en/7.x/
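A minimal sketch of the decorator style, including a command group for subcommands, with invented command and option names:

    import click

    @click.group()
    @click.option("--verbose", is_flag=True)
    def cli(verbose):
        """Top-level command."""
        if verbose:
            click.echo("verbose mode on")

    @cli.command()
    @click.option("--count", default=1, show_default=True, help="Number of greetings.")
    @click.argument("name")
    def hello(count, name):
        """Greet NAME COUNT times."""
        for _ in range(count):
            click.echo(f"Hello {name}!")

    if __name__ == "__main__":
        cli()

Run as e.g. `cli --verbose hello --count 2 Alice`; Click generates --help text for the group and for each subcommand.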


I feel like argh and plac preceded/inspired Click.

Also, it's not Python but in Nim there is https://github.com/c-blake/cligen which also does spellcheck/typo suggestions, and allows --kebab-case --camelCase or --snake_case for long options, among other bells & whistles.


Spell check is something I'd love to see in Click. As complicated as git can be, I always liked the spell check it has for subcommands.

As for the different cases, I personally avoid using camel or snake case purely because I don't need to reach for the shift key. Maybe some people like it, but I find it jarring.


Agreed vis a vis SHIFT.

cligen also has a global config file [1] to control syntax options like whether '-c=val' requires an '=' (and if it should be '=' or ':', etc.) which has been discussed here, various colorization/help rendering controls, etc. Shell scripts can always export CLIGEN=/dev/null to force more strict and/or default modes.

[1] https://github.com/c-blake/cligen/wiki/Example-Config-File


That is a fascinating project. I will look at it.

What I wonder is -- there are some very nice third-party libraries for Python, and I wonder why they don't make it into the standard library. It would be nice if there were a way to get "a tide that raises all ships".


I don't disagree, but I think there's a fair amount of friction when maintaining standard libraries rather than third party ones just because of the implied stability. Your versioning is tied to python itself so it's probably kind of dull having to work with an API that you can't improve simply by bumping the version of the library.


Try Typer (https://typer.tiangolo.com/) sometime, which is built on Click.


At a glance it looks a bit too simple. I didn't look too far into it, but it seems to be missing short options, prompting for options, and using environment variables for options. As it's built on Click, I'd guess you can call into Click to do those, but at that point I don't see a major benefit Typer is providing.


Look, much of the purpose of posting things in a public forum is not to directly converse with the person to whom you are replying, but to add something relevant and interesting to third-party readers of the thread.

It's up to you whether you're interested in the software package linked in the comment or not, but it is harmful to third-party readers who might potentially be interested in it when you inexplicably reply with unsubstantiated lies about it, which they might take as true.


A comment actually directed toward third-party readers would read like "For people who like Click but could also use X, try Typer." Using the second-person imperative to say 'Try this' challenges your interlocutor to ask "Why should I?", to evaluate it for their own use case, to say "You shouldn't have recommended this to me with no qualification, it's not for everyone."


To be clear, not only is the package in question not "missing short options, prompting for options and using environment variables for options", there is not even the slightest indication on its docs site that it might be missing such features.

I'm not harmed by the reckless indifference of that comment, because I know a decent amount about these libraries, and it's not my business if people want to self-harm by believing false things, but people scrolling the thread are potentially harmed ("damn, I like the syntax of this, but it doesn't support short options?").
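For the record, here's a sketch exercising those three features through Typer's Option parameters (the option and variable names are made up):

    import typer

    def main(
        name: str = typer.Option(..., "--name", "-n", help="short option -n"),
        token: str = typer.Option(..., envvar="APP_TOKEN", prompt=True),
    ):
        typer.echo(f"hello {name}")

    if __name__ == "__main__":
        typer.run(main)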


> program -iinput.txt -ooutput.txt

What good is that? Who wants to save a space? Given -abcfoo.txt, I can't tell whether it's -abcf oo.txt or -abc foo.txt. So that's a definite drawback, and the benefit is?


Right, for all we know, that's 8 options, two of them duplicates. It's also harder to implement because now I have to keep a symbol table (or I have to refer to it more frequently). I can't see much of a utility argument here except terseness for its own sake. In that case, why don't we just pack our own bytes?


> Go’s [...] intentionally deviates from the conventions.

Sigh

Of course it does.


My impression is that nobody bothers with the Go flag package and most people use the POSIX-compatible pflag [1] library for argument parsing, usually via the awesome cobra [2] CLI framework.

Or they just use viper [3] for both command line and general configuration handling. No point reinventing the wheel or trying to remember some weird non-standard quirks.

[1] https://github.com/spf13/pflag

[2] https://github.com/spf13/cobra

[3] https://github.com/spf13/viper


Nobody.. except Hashicorp :(


I always wondered who to blame for the single-dash long arguments in Terraform.


Go -options sound like what X11 uses.


Well it does keep things simple, and as a side effect removes the cognitive burden of choosing a certain argument style (for both the developer and user).


It definitely doesn't make things simpler. Maybe in a vacuum, it would have been simpler. But it's not in a vacuum, and it breaks with well established conventions, so we have to remember which special snowflake programs decided not to do what we expect.


> When grouped, the option accepting an argument must be last ... program -abcooutput.txt

Good grief, no.

There may be some utilities out there which work this way, but it is not convention and should not be regarded as one.

Single letter options should almost always take an argument as a separate argument.

Some traditional utilities allow one or more arguments to be extracted as they are scanning through a "clump" of single letter options:

  -abc b-arg c-arg
Implementations of tar are like this.

Newly written utilities should not allow an option to clump with others if it requires an argument. Only Boolean single-letter options should clump.

Under no circumstances should an option clump if its argument is part of the same argument string. For instance consider the GCC option -Dfoo=bar for defining a preprocessor macro, and the -E option doing preprocessing only. Even if -Dfoo=bar is last, we don't want it to clump with -E as -EDfoo=bar --- and it doesn't.

But, in the first place, even if it did, we don't want to be looking to C compilers for convention, unless we are specifically making a C compiler driver that needs to be compatible.


Some other commenters have mentioned environment variables as input.

IMO there are broadly two types of command: plumbing and porcelain. There's a certain amount of convention and culture in distinguishing them and I'm not going to try to argue the culture boundary...

For the commands which are plumbing (by whatever culture's rules), the following apply:

* They are designed to interact with other plumbing: pipes, machine-comprehension, etc

* Exit code 0 for success, anything else for error. Don't try to be clever.

* You can determine precisely and unambiguously what the behaviour will be, from the invocation alone. Under no circumstances may anything modify this; no configuration, no environment.

For the commands which are porcelain (by the same culture's rules, for consistency), the following apply:

* Try to be convenient for the user, but don't sacrifice consistency.

* If plumbing returns a failure which isn't specifically handled properly, either something is buggy or the user asked for something Not Right; clean up and abort.

* Environment and configuration might modify things, but on the command line there must be the option to state 'use this, irrespective of what anything else says' without knowing any details of what the environment or configuration currently say.

To make things more exciting, some binaries might be considered porcelain or plumbing contextually, depending on parameters... (Yes, everyone sane would rather this weren't the case.)


Do I add more to this code just for convention? The command line option parsing (or a broken ParseOptions dependency) will become orders of magnitude larger and more complex than what the program does.

  usage = 0
  argc = len(sys.argv)
  if argc == 2 and sys.argv[1] == "-r":
   hex2bin()
  else:
   if argc == 2 and sys.argv[1].startswith('-w', 0, 2):
    s = sys.argv[1][2::]
   elif argc == 3 and sys.argv[1] == '-w':
    s = sys.argv[2]
   elif argc >= 2:
    usage = 1
   if usage == 0:
    try:
     width = int(s)
    except ValueError:
     print("Error: invalid, -w {}".format(s))
     usage = 1
    except NameError:
     width = 40
   if usage == 0:
    bin2hex(width)
   else:
    print("usage: mondump [-r | -w width]")
    print("       Convert binary to hex or do the reverse.")
    print("            -r reverse operation: convert hex to binary.")
    print("            -w maximum width: fit lines within width (default is 40.)")
  sys.exit(usage)


No, you take code away by using argparse! Handles all this GNU longopt and argument parsing stuff for you, and autogenerates the --help display. Probably something like this:

    import argparse
    import sys

    def auto_int(x): return int(x, 0)  # http://stackoverflow.com/questions/25513043/

    def main(argv):
        parser = argparse.ArgumentParser()
        parser.add_argument('-r', dest='reverse', action='store_true', help='reverse operation: convert hex to binary')
        parser.add_argument('-w', default=40, dest='width', type=auto_int, help='maximum width: fit lines within width (default is %(default)s)')
        options = parser.parse_args(argv)
        if options.reverse: hex2bin()
        else: bin2hex(options.width)

    if __name__ == '__main__': main(sys.argv[1:])


I recently ran into a case I hadn't seen before with Python's argparse. Multiple arguments in a single option, e.g. "--foo bar daz" with --foo set to nargs='*', swallows both bar and daz, where I would have expected to have to explicitly specify "--foo bar --foo daz" to get that behaviour. I guess this is a side effect of treating non-option arguments the same as dash-prefixed arguments, but I have no idea what the "standard" to expect with this is.
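A small sketch of the difference (the option name is made up): nargs='*' greedily consumes the following non-option arguments, while action='append' collects repeated occurrences:

    import argparse

    p1 = argparse.ArgumentParser()
    p1.add_argument("--foo", nargs="*")
    print(p1.parse_args(["--foo", "bar", "daz"]))           # Namespace(foo=['bar', 'daz'])

    p2 = argparse.ArgumentParser()
    p2.add_argument("--foo", action="append")
    print(p2.parse_args(["--foo", "bar", "--foo", "daz"]))  # Namespace(foo=['bar', 'daz'])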

Otherwise, my main bugbear is software using underscore instead of dash for long-names, and especially applications or suites that mix these cases.

I really like the simplicity that docopt somewhat forces you into, which avoids most of these tricky edge cases, but am seeing less and less usage nowadays of it.


Hmm, if you configure an option to take all args following, why would one be surprised by that?


I use clap/structopt in Rust for all my CLIs ever since those libraries came out, and it's just stupid easy to make a nicely functioning CLI. You define a data structure which holds the arguments a user will pass to your program, and you can annotate fields to give them short and long names. You can also define custom parsing logic via plain functions if the user is passing something exotic.

At the end you can parse arguments into your data structure in one line. At that point the input has been validated and you now have a type safe representation of it. Well-formatted help text comes for free.


I very much like the subcommand paradigm git implements:

    mycmd <global args> subcommand <subcommand specific args>
However, I haven't found a C++ library that implements this properly with an easy-to-use API (and what I mean by easy to use is: specifying the structure of cmd line options should be easy and natural, and retrieving options specified by the user should be easy, and the "synopsis" part of the man page should be auto-generated).

If anyone knows of one, would love to hear about it.


Have you tried https://github.com/CLIUtils/CLI11 ? The subcommand example can be found at https://github.com/CLIUtils/CLI11/blob/master/examples/subco... . It can't generate man pages though


Hello there!

Yep, it's some kind of an advertisement, but for an open source project.

I gained some experience with parsing command line options when working on an open source traffic simulation named SUMO (http://sumo.dlr.de) and decided to re-code the options library. It works similarly to Python's argparse library - the options are defined first, then you may retrieve the values in a type-aware way.

You may find more information on my blog pages: http://krajzewicz.de/blog/command-line-options.php

The library itself is hosted on GitHub:

- C++ version: https://github.com/dkrajzew/optionslib_cpp

- Java version: https://github.com/dkrajzew/optionslib_java

Sincerely, Daniel


I wish he had mentioned which C++ option parsers stick to the rules he outlined.


Argparse doesn’t need sys.argv by default. The main gripe about it was also a bit odd.

Why would one want an option with an optional argument? And then complain about ambiguity? argparse is flexible on purpose but it doesn’t mean one needs to use every feature. optparse is still around for those who desire more discipline.

click is nice though I don’t care for multiple decorator driven libs in one project.


Nice to see these guidelines all laid out.

I grew up with the DOS / Windows convention of slashes, and clicked the link hoping there'd be mention of it.


For Python, the click library is fantastic and follows all the good practices explained in the blog post.


program -c # omitted program -cblue # provided program -c blue # omitted

This is confusing


Your comment is confusing due to missing line breaks :-) Try indenting the lines with two spaces.

And I agree, I would prefer:

  program -c=blue


Using the equals sign for long options has become convention, but using the equals sign for short options is very unusual.

I find "-c blue" the most intuitive of the possibilities. It does mean that you can't have an optional argument for short options.


Too late for that. Was on mobile app.

program -c # omitted program -cblue # provided program -c blue # omitted

This doesn't make sense. How is "-c blue" omitted when blue looks just like an argument to -c?


If the argument to "-c" is optional, the parser doesn't know if "blue" is an optional argument to "-c" or a separate argument to "program".

I think the takeaway is to avoid options with optional arguments. If the argument to -c was required, "-c blue" would be fine.



