Just, whatever you do, please please PLEASE PLEASE support `--help`. I don't mean `-h` or `-help`, I mean `--help`.
There's only one thing the long flag can possibly mean: give me the help text. I understand that your program might prefer the `-help` style, but many do not. And do you know how I figure out which style your program likes? That's right, I use `--help`. I have to use `--help` rather than just `-help` because of GNU userspace tools, among others. It seems unlikely they're going to suddenly clean up their act this decade, so I have to default to starting with `--help`.
So it's very frustrating when the response to `program --help` is "Argument --help not understood, try -help for help." This is then often followed by me saying indecent things about the stupid program.
Traditional UNIX® options for help are -? and -h. --long-options are a horrid GNU-ism, shunned by clean UNIX®-compliant programs, because such programs come with detailed, professionally written manual pages which contain extensive SYNOPSIS and EXAMPLES sections.
Implementing --long-options makes the poor users type much more, hurting ergonomics and the users' long-term productivity and efficiency.
It goes without saying that -? has to be escaped in most shells, but that is the traditional option for help on UNIX®. You would be well advised to educate yourself on the history of UNIX® before coming up with "what are you smoking?"
I see nothing wrong with extensive SYNOPSIS and EXAMPLES sections for programs using --long-options.
The fish shell extracts completion information from man pages, so that the user does not have to type so much and they get useful output, pulled from the manpage, no less.
I write a lot of commandline utilities and scripts for automating tasks. I find environment variables a much simpler way to provide and accept key/value pairs than arguments. Env vars don't depend on order; if present, they always have a value (even if it's the empty string); they're also inherited by subprocesses, which is usually a bonus too (especially compared to passing values to sub-commands via arguments, e.g. when optional arguments may or may not have been given).
Using arguments for key/value pairs allows invalid states, where the key is given but not the value. It can also cause subsequent, semantically distinct, options to get swallowed up in place of the missing value. They also force us to depend on the order of arguments (i.e. since values must follow immediately after their keys), despite it being otherwise irrelevant (at least, if we've split on '--'). I also see no point in concatenating options: it introduces redundant choices, which imposes a slight mental burden that we're better off without.
The only advice I wholeheartedly encourage from this article is (a) use libraries rather than hand-rolling (although, given my choices above, argv is usually sufficient!) and (b) allow a '--' argument for disambiguating flag arguments from arbitrary string arguments (which might otherwise parse as flags).
I can't help but see parallels here between cmdline arguments vs. environment variables for programs, and keyword arguments vs. dynamic binding for functions in a program (particularly in a Lisp one).
That is,
    $ program --foo=bar
vs.
    $ FOO=bar program
seems analogous to:
    (function :foo "bar")
    ;; vs.
    (let ((*foo* "bar"))
      (function))
When writing code (particularly Lisp), keyword arguments are preferred to dynamic binding because the function signature then explicitly lists the arguments the function uses, while dynamic binding (the in-process phenomenon analogous to inheriting environment from a parent process) is seen as dangerous, a source of external state that may be difficult to trace/spot in the code.
I suppose the latter argument applies to env vars as well - you can accidentally pass values differing from expectation because you didn't know a parent process changed them. The former doesn't, because processes don't have a "signature" specifying their options, at least not in a machine-readable form. Which is somewhat surprising - pretty much all software written over the past decades follows the pattern of accepting argc, argv[] (and env[]), and yet the format of arguments is entirely hidden inside the actual program. I wonder why there isn't a way to specify accepted arguments as e.g. metadata of the executable file?
Lisp will at least warn you if you bind a variable with earmuffs that isn't declared special though. Biggest downside to environment variables is the lack of warning if you misspell something.
Yeah, I just can't stop thinking about one of envs / dynamic binding whenever someone brings up the other.
Speaking of using both direct and dynamic arguments, a common pattern in (Common) Lisp is using dynamically-bound variables (so-called "special" variables) to provide default values. Since default values in (Common) Lisp can be given as any expression, which gets evaluated at function-call time, I can write this:
    (defun some-function (&key (some-keyword-arg *some-special*))
      ...)
And then, if I call this function with no arguments, i.e. (some-function), some-keyword-arg will pull its value from the current dynamic value of *some-special*. In process land, this would be equivalent to having the command line parser use values in environment variables as defaults for arguments that were not provided.
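In Python, a minimal sketch of that equivalent (option and variable names made up) might be:
    import argparse
    import os

    # sketch: an env var supplies the default for an option that wasn't given
    parser = argparse.ArgumentParser()
    parser.add_argument('--foo', default=os.environ.get('FOO', 'fallback'))
    print(parser.parse_args().foo)  # FOO=bar prog.py prints 'bar'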
In my head there's been a hierarchy for a long time.
When I build command line utilities and think about the way they'll be used, I tend to use configuration files for things that will change very slowly over time, environment variables as a way to override default behaviors, and command line arguments to specify things that will often vary from invocation to invocation. In fact, most of the time, I use the environment variables either for development/testing features that I don't really intend to expose to most users, or for credentials that don't get persisted.
It's never occurred to me to use environment variables as the primary way to configure an application. I'll have to noodle on that for a while. My gut says that it's enough of a deviation from the Unix convention that I probably won't use that.
I feel like I didn't make "primary" clear enough - it would be like using an environment variable for the path for ls. That's gonna be a big no from me, dog.
> it would be like using an environment variable for the path for ls. That's gonna be a big no from me, dog
Me too. My point about env vars was limited to key/value options. I'm happy to use positional arguments when names are irrelevant, e.g. `ls foo/` is better than `DIR=foo/ ls` or `ls --dir foo/` or `ls --dir=foo/` or whatever, since the latter just adds noise and increases the chance we'll get something wrong.
I'm also happy to use arguments for flags which are either present or absent, where values are irrelevant, e.g. `ls --all` rather than e.g. `ALL=1 ls` or `ls --all=yes` or whatever. In this case the presence of a value is even more harmful, since the program might only be checking for the variable's presence, in which case `ALL=0 ls` would be misleading; plus we'd have to guess whether the value should be 0/1, yes/no, y/n, on/off, enable/disable, etc.
There are a few reasons I don't like environment variables for this: the first is that a random env variable can influence the operation of a program. Do "export foo=bar" and then maybe 30 minutes later a program unexpectedly picks up "foo". It's also hard to inspect; the output of "env" tends to be somewhat long, and you don't know which the program will pick up on. Flags are much clearer and more explicit.
There's also the issue that typo'ing "foo" as "fooo" will silently fail. Okay, that's a simple example, but some tools have "PROGRAMNAME_LONGDESCRIPTION". Being in all-caps also makes it hard for me to spot typos. You generally want your env vars to be long, to ensure they're unique.
> It's also hard to inspect; the output of "env" tends to be somewhat long, and you don't know which the program will pick up on. Flags are much clearer and more explicit.
I don't think I disagree with your post, but this particular point seems to be trivially solved by prefixing the variable name with the program name?
IOW: `env | grep '^GIT_'` to see what you're passing to `git`.
I’m curious if your preference for environment variables is only for writing programs or for using programs as well. From a user’s perspective, would you prefer that common commands use environment variables to get options? For example, would you prefer a “find” command that uses environment variables instead of command line options?
> For example, would you prefer a “find” command that uses environment variables instead of command line options?
I use 'find' so much that my muscle memory would hurt if it changed, but it's actually a really interesting example. Its use of arguments to build a domain-specific language is a cool hack, but pretty horrendous; e.g.
find . -type f -a -not \( -name \*.htm -o -name \*.html \)
We can compare this to tools like 'jq', which use a DSL but keep it inside a single string argument:
key=foo val=bar jq -n '{(env.key): env.val}'
Note that 'jq' also accepts key/value pairs via commandline arguments, but that requires triplets of arguments, e.g.
jq -n --arg key foo --arg val bar '{($key): $val}'
I think there's a place for both, but env vars can make things really annoying to troubleshoot. Already, I often print all env vars before running automated commands, and it can be a mess to dig through. Culling down environment variables when spawning a subprocess is difficult. Bad flags can error immediately if you made a typo. I've often misspelled an env var, and it's hard to tell that it did nothing (I think I recently saw a case where trailing whitespace, like "FOO ", was the source of a years-long bug). `FOO=BAR cmd` is also weird for history (although that's mostly a tooling issue).
Another nice thing about env vars is that they don't appear in the process table, unlike argv. For many applications, that makes them a suitable place to inject secrets, and makes them convenient to inject via wrapper tools.
For instance, the fact that the aws(1) cli tool and the AWS SDKs by default take IAM creds from AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, etc. means that you can write programs like aws-vault (https://github.com/99designs/aws-vault) which can acquire a temporary AWS session and then execute another command with the creds injected in the standard AWS_... env vars. For instance:
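A typical invocation looks something like this (the profile name and wrapped command here are made up):
    # aws-vault acquires temporary creds for the profile, then runs the
    # command with AWS_ACCESS_KEY_ID etc. injected into its environment
    aws-vault exec my-profile -- aws s3 ls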
Quick comment, not an expert, but environment vars keep their state after the program is called. From a functional programming perspective, or just for my own sanity, wouldn't it be preferable to keep state to a minimum?
You're right that mutable state should be kept to a minimum, but immutable state is fine. There usually isn't much need to mutate env vars.
Some thoughts/remarks:
- If we don't change a variable/value then it's immutable. Some languages let us enforce this (e.g. 'const'), which is nice, but we shouldn't worry too much if our language doesn't.
- We're violating 'single source of truth' if we have some parts of our code reading the env var, and some reading from a normal language variable. This also applies to arguments, though.
- Reading from an env var is an I/O effect, which we should minimise.
- We can solve the last 2 problems by reading each env var once, up-front, then passing the result around as a normal value (i.e. functional core, imperative shell)
- Env vars are easy to shadow, e.g. if our script uses variables FOO and BAR, we can use a different value for BAR within a sub-command, e.g. in bash:
BAR=abc someCommand
This will inherit 'FOO' but use a different value of 'BAR'. This isn't mutable state, it's a nested context, more like:
    let foo  = 123
        bar  = "hello"
        baz  = otherCommand
        quux = let bar = "abc"
               in someCommand
As TeMPOraL notes, env vars are more like dynamically scoped variables, whilst most language variables are lexically scoped.
That's only if you `export` the vars. You can do `FOO=1 BAR=2 cmd` and FOO and BAR will only have those values for the process and children. This is isomorphic to `cmd --foo=1 --bar=2`.
The beautiful thing about environment variables is that you can read them whenever you actually need them in your program. They pierce a hole through your call stack to the point where you need them. By contrast, with command line arguments you need to pass the values from the main function down to wherever in the call stack they're needed.
I think that's an advantage. You know what goes in and out, you can see where it's passed and used. It provides transparency and fewer accidental behavioral changes.
E.g. I had a really weird bug in a script: git seemed to break for no reason, until I figured out that some other script had exported GIT_DIR, which pointed to the git top-level directory. The name of the variable isn't bad if you want to save that directory. But when GIT_DIR is defined, git uses it as the location of the .git directory it should look at.
Using environment variables is kind of like using global state, it can often cause weird behavior at a distance without a clear path of influence if you don't know exactly which environment variables are used by all your scripts.
I write a lot of tools in Haskell, where 'getEnv' is an "IO action" whose type prevents us using it inside most code. The simplest way to handle this ends up with a "functional core, imperative shell" model, e.g. most of my CLIs have the form (where 'interact' takes a pure String -> String function and uses it to transform stdio):
    main = do
      foo <- either error id . parseFoo <$> getEnv "FOO"
      interact (doTheThing foo)
    doTheThing :: Foo -> String -> String
    doTheThing = ...
This is also trivial to test. Some languages make it easy to replace env vars inside a single expression (e.g. bash), but others require us to set and reset the global environment, which is painful and error prone (especially if tests are run concurrently).
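For comparison, in Python a reasonably painless way to scope an override to a single expression is mock.patch.dict, which restores the environment on exit:
    import os
    import unittest.mock

    # temporarily override FOO; the previous environment is restored afterwards
    with unittest.mock.patch.dict(os.environ, {'FOO': 'bar'}):
        assert os.environ['FOO'] == 'bar'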
In many languages you can always get at the argv list; e.g. sys.argv in Python, os.Args in Go, ARGV in Ruby, etc.
It's not universal though; you can't in C or shell scripts for example (although, if you really want to, it's not hard to assign a global). I never had the need to inspect commandline arguments beyond the main() function though.
Mutating global variables is certainly an anti-pattern, but I'm not so convinced that global constants are a problem; i.e. if we treat env vars like immutable variables (in fact I treat most variables as immutable).
Config files introduce a whole slew of complexity: race conditions, a whole bunch of new error conditions (permission denied, parse errors, read-only filesystems, disk full, etc.); system- or user-wide config files are spooky action-at-a-distance, which can make problems hard to track down, makes commands non-deterministic and may impact reproducibility (e.g. on other people's machines).
If commands insist on taking options via a config file, my preferred way to call them is to generate the config in-place via process substitution, e.g.
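Something like this, in bash (the tool name and config contents are invented for illustration):
    # generate the config on the fly; <(...) hands the tool a readable path
    sometool --config <(printf 'key = value\nverbose = true\n')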
Libraries can provide validation of key:value options. For example in python-land ‘click’ provides ‘nargs’ and ‘type’ parameters for the common case, and if those don’t suffice a custom coerce/validate callback.
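A minimal sketch of that (option names invented):
    import click

    # click checks arity and types before the command body runs
    @click.command()
    @click.option('--pair', nargs=2, type=str, help='a key and a value')
    @click.option('--retries', type=int, default=3)
    def run(pair, retries):
        click.echo('pair=%r retries=%d' % (pair, retries))

    if __name__ == '__main__':
        run()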
Most cli parsers don't fully support the author's suggested model because it means you can't parse argv without knowing the flags in advance.
For example, the author suggests that a cli parser should be able to understand these two as the same thing:
program -abco output.txt
program -abcooutput.txt
That's only doable if by the time you're parsing argv, you already know `-a`, `-b`, and `-c` don't take a value, and `-o` does take a value.
But this is a pain. All it gets you is the ability to save one space character, in exchange for a much more complex argv-parsing process. The `-abco output.txt` form can be parsed without such additional context, and is already a pretty nice user interface.
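To make the required context concrete, here's a rough Python sketch (option names invented) of parsing a short-option cluster when you already know which options take values:
    TAKES_VALUE = {'o'}              # assumption: only -o takes a value

    def parse_cluster(arg, rest):
        opts, i = {}, 1              # skip the leading '-'
        while i < len(arg):
            c = arg[i]
            if c in TAKES_VALUE:
                # the value is the rest of this arg, or else the next arg
                opts[c] = arg[i + 1:] or (rest.pop(0) if rest else None)
                break
            opts[c] = True           # boolean flag
            i += 1
        return opts

    # both give {'a': True, 'b': True, 'c': True, 'o': 'output.txt'}
    parse_cluster('-abcooutput.txt', [])
    parse_cluster('-abco', ['output.txt'])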
For those of us who aren't working on ls(1), there's no shame in having a less-sexy but easier-to-build-and-debug cli interface.
But every single command line parser I have ever used (and I have used many over the past 25 years, in many programming languages) does in fact know before parsing that a, b, and c don't take a value and o does: accepting the grammar for the flags and then parsing them in this way is like the only job of the parser?
I suppose the only reason why you wouldn't want to have a centralized description of the command line grammar is if you're using flags which alter the set of acceptable flags. Like e.g. if you put all git subcommands into one single executable - the combinations and meanings of commandline flags would become absurdly complicated.
This is usually handled by adding support for that sort of syntax to the parsing library. The result is typically not complicated: you have a parser for the global options, then, if using subcommands, in effect a map of subcommand name to parser for options for that command.
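With Python's argparse, for instance, that structure looks roughly like this (subcommand and option names invented):
    import argparse

    parser = argparse.ArgumentParser(prog='mytool')
    parser.add_argument('--verbose', action='store_true')  # global option
    sub = parser.add_subparsers(dest='command')            # the name -> parser map
    fetch = sub.add_parser('fetch')                        # per-subcommand parser
    fetch.add_argument('--url', required=True)
    push = sub.add_parser('push')
    push.add_argument('--force', action='store_true')
    print(parser.parse_args())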
Why would someone ever want to use the second syntax? (Genuine question, not rhetorical!)
It seems using = for optional option arguments is only allowed with long form. Why is that? Wouldn't it be nicer to be able to write "program -c=blue" rather than "program -cblue"?
The second syntax is for when you want to ask a program to pass an argument to another program. Having to pass 2 arguments is a lot more annoying, since you'd often then have to prefix each one.
Very opinionated, and IMO without enough justification.
The assertions around short options with arguments - conjoining with other short options, for example - are actively harmful to legibility in scripts, since there's no lexical distinction between the argument and extra short options. I don't recommend using that syntax when alternatives are available and I deliberately don't support it when implementing ad-hoc argument parsing (typically when commands are executed as parsing proceeds).
Worse than that, it adds an unstable convention to an otherwise stable interface. What if there used to be a, b flags and then they add a c flag? Who knows what your command does now. It's just bad all around.
In your hypothetical -b takes an argument which means it consumes all the characters following it. Anything following b is never interpreted as a flag. There is no danger of -c changing the parsing. Even if b takes an optional argument it still consumes all characters following it.
> short options with arguments - conjoining with other short options, for example - are actively harmful to legibility in scripts
Agreed, which is why you shouldn't do that :-) They're a lot easier to type though. I consider short and long arguments as solving different problems: the short are short so it's easy for interactive usage (especially for common options), and the long ones are for scripting (or discoverability, especially with things like zsh's completion).
So I usually use "curl -s" on the commandline, and "curl --silent" when writing scripts.
It (argparse) will cleanly do just about anything you need done, including nice stuff like long/short options, default values, required options, types like type=int, help for every option, and even complicated stuff like subcommands.
And the namespace stuff is clever, so you can reference arg.debug instead of arg['debug']
I always found argparse did argument parsing well enough but it felt clunky when you need something more complicated like lots of subcommands. I find myself using it exclusively when I'm trying to avoid dependencies outside the Python standard library.
My choice of argument parsing in Python is Click. It has next to no rough edges and it's a breath of fresh air compared to argparse. I recently recommended it to a colleague who fell in love with it with minimal persuasion from me. I recommend it highly.
I feel like argh and plac preceded/inspired Click.
Also, it's not Python but in Nim there is https://github.com/c-blake/cligen which also does spellcheck/typo suggestions, and allows --kebab-case --camelCase or --snake_case for long options, among other bells & whistles.
Spell check is something I'd love to see in Click. As complicated as git can be, I always liked the spell check it has for subcommands.
As for the different cases, I personally avoid using camel or snake case purely because I don't need to reach for the shift key. Maybe some people like it, but I find it jarring.
cligen also has a global config file [1] to control syntax options like whether '-c=val' requires an '=' (and if it should be '=' or ':', etc.) which has been discussed here, various colorization/help rendering controls, etc. Shell scripts can always export CLIGEN=/dev/null to force more strict and/or default modes.
What I wonder is -- there are some very nice third-party libraries for Python, and I wonder why they don't make it into the standard library. It would be nice if there were a way to get "a tide that raises all ships".
I don't disagree, but I think there's a fair amount of friction when maintaining standard libraries rather than third party ones just because of the implied stability. Your versioning is tied to python itself so it's probably kind of dull having to work with an API that you can't improve simply by bumping the version of the library.
At a glance it looks a bit too simple. I didn't look too far into it, but it seems to be missing short options, prompting for options and using environment variables for options. As it's built on click, I'd guess you can call into click to do those, but at that point I don't see a major benefit typer is providing.
Look, much of the purpose of posting things in a public forum is not to directly converse with the person to whom you are replying, but to add something relevant and interesting to third-party readers of the thread.
It's up to you whether you're interested in the software package linked in the comment or not, but it is harmful to third-party readers who might potentially be interested in it when you inexplicably reply with unsubstantiated lies about it, which they might take as true.
A comment actually directed toward third-party readers would read like "For people who like Click but could also use X, try Typer." Using the second-person imperative to say 'Try this' challenges your interlocutor to ask "Why should I?", to evaluate it for their own use case, to say "You shouldn't have recommended this to me with no qualification, it's not for everyone."
To be clear, not only is the package in question not "missing short options, prompting for options and using environment variables for options", there is not even the slightest indication on its docs site that it might be missing such features.
I'm not harmed by the reckless indifference of that comment, because I know a decent amount about these libraries, and it's not my business if people want to self-harm by believing false things, but people scrolling the thread are potentially harmed ("damn, I like the syntax of this, but it doesn't support short options?").
What good is that? Who wants to save a space? Given -abcfoo.txt, I can't tell whether it's -abcf oo.txt or -abc foo.txt. So that's a definite drawback, and the benefit is?
Right, for all we know, that's 8 options and two duplicates. It's also harder to implement, because now I have to keep a symbol table (or refer to it more frequently). I can't see much of a utility argument here except terseness for its own sake. In that case, why don't we just pack our own bytes?
My impression is that nobody bothers with the Go flag package and most people use the POSIX-compatible pflag [1] library for argument parsing, usually via the awesome cobra [2] cli framework.
Or they just use viper [3] for both command line and general configuration handling. No point reinventing the wheel or trying to remember some weird non-standard quirks.
Well it does keep things simple, and as a side effect removes the cognitive burden of choosing a certain argument style (for both the developer and user).
It definitely doesn't make things simpler. Maybe in a vacuum, it would have been simpler. But it's not in a vacuum, and it breaks with well established conventions, so we have to remember which special snowflake programs decided not to do what we expect.
> When grouped, the option accepting an argument must be last ... program -abcooutput.txt
Good grief, no.
There may be some utilities out there which work this way, but it is not convention and should not be regarded as one.
Single letter options should almost always take an argument as a separate argument.
Some traditional utilities allow one or more arguments to be extracted as they are scanning through a "clump" of single letter options:
-abc b-arg c-arg
Implementations of tar are like this.
Newly written utilities should not allow an option to clump with others if it requires an argument. Only Boolean single-letter options should clump.
Under no circumstances should an option clump if its argument is part of the same argument string. For instance consider the GCC option -Dfoo=bar for defining a preprocessor macro, and the -E option doing preprocessing only. Even if -Dfoo=bar is last, we don't want it to clump with -E as -EDfoo=bar --- and it doesn't.
But, in the first place, even if it did, we don't want to be looking to C compilers for convention, unless we are specifically making a C compiler driver that needs to be compatible.
Some other commenters have mentioned environment variables as input.
IMO there are broadly two types of command: plumbing and porcelain. There's a certain amount of convention and culture in distinguishing them and I'm not going to try to argue the culture boundary...
For the commands which are plumbing (by whatever culture's rules), the following apply:
* They are designed to interact with other plumbing: pipes, machine-comprehension, etc
* Exit code 0 for success, anything else for error. Don't try to be clever.
* You can determine precisely and unambiguously what the behaviour will be, from the invocation alone. Under no circumstances may anything modify this; no configuration, no environment.
For the commands which are porcelain (by the same culture's rules, for consistency), the following apply:
* Try to be convenient for the user, but don't sacrifice consistency.
* If plumbing returns a failure which isn't specifically handled properly, either something is buggy or the user asked for something Not Right; clean up and abort.
* Environment and configuration might modify things, but on the command line there must be the option to state 'use this, irrespective of what anything else says' without knowing any details of what the environment or configuration currently say.
To make things more exciting, some binaries might be considered porcelain or plumbing contextually, depending on parameters...
(Yes, everyone sane would rather this weren't the case.)
Do I add more to this code just for convention? The command line option parsing (or broken ParseOptions dependency) will become orders of magnitude larger and more complex than what the program does.
    usage = 0
    argc = len(sys.argv)
    if argc == 2 and sys.argv[1] == "-r":
        hex2bin()
    else:
        if argc == 2 and sys.argv[1].startswith('-w', 0, 2):
            s = sys.argv[1][2::]
        elif argc == 3 and sys.argv[1] == '-w':
            s = sys.argv[2]
        elif argc >= 2:
            usage = 1
        if usage == 0:
            try:
                width = int(s)
            except ValueError:
                print("Error: invalid, -w {}".format(s))
                usage = 1
            except NameError:
                width = 40
        if usage == 0:
            bin2hex(width)
        else:
            print("usage: mondump [-r | -w width]")
            print(" Convert binary to hex or do the reverse.")
            print(" -r reverse operation: convert hex to binary.")
            print(" -w maximum width: fit lines within width (default is 40.)")
            sys.exit(usage)
No, you take code away by using argparse! Handles all this GNU longopt and argument parsing stuff for you, and autogenerates the --help display. Probably something like this:
    import argparse
    import sys
    def auto_int(x): return int(x, 0)  # http://stackoverflow.com/questions/25513043/
    def main(argv):
        parser = argparse.ArgumentParser()
        parser.add_argument('-r', dest='reverse', action='store_true',
                            help='reverse operation: convert hex to binary')
        parser.add_argument('-w', default=40, dest='width', type=auto_int,
                            help='maximum width: fit lines within width (default is %(default)s.)')
        options = parser.parse_args(argv)
        if options.reverse: hex2bin()
        else: bin2hex(options.width)
    if __name__ == '__main__': main(sys.argv[1:])
I recently ran into a case I hadn't seen before with Python's argparse. Multiple arguments in a single option, e.g. "--foo bar daz" with --foo's nargs set to '*', swallows both bar and daz, where I would have expected to have to explicitly specify "--foo bar --foo daz" to get that behaviour. I guess this is a side effect of treating non-option arguments the same as dash-prefixed arguments, but I have no idea what the "standard" to expect with this is?
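For reference, the behaviour in question (a quick sketch):
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--foo', nargs='*')
    print(parser.parse_args(['--foo', 'bar', 'daz']))  # Namespace(foo=['bar', 'daz'])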
Otherwise, my main bugbear is software using underscore instead of dash for long-names, and especially applications or suites that mix these cases.
I really like the simplicity that docopt somewhat forces you into, which avoids most of these tricky edge cases, but I'm seeing less and less usage of it nowadays.
I use clap/structopt in Rust for all my CLIs ever since those libraries came out, and it's just stupid easy to make a nicely functioning CLI. You define a data structure which holds the arguments a user will pass to your program, and you can annotate fields to give them short and long names. You can also define custom parsing logic via plain functions if the user is passing something exotic.
At the end you can parse arguments into your data structure in one line. At that point the input has been validated and you now have a type safe representation of it. Well-formatted help text comes for free.
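With clap's derive API, a rough sketch looks like this (field names invented; assumes a recent clap with the derive feature enabled):
    use clap::Parser;

    /// Doc comments become help text.
    #[derive(Parser)]
    struct Args {
        /// input file
        #[arg(short, long)]
        input: String,
        /// enable verbose output
        #[arg(short, long)]
        verbose: bool,
    }

    fn main() {
        let args = Args::parse(); // parse + validate in one line
        println!("{} {}", args.input, args.verbose);
    }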
I very much like the subcommand paradigm git implements:
mycmd <global args> subcommand <subcommand specific args>
However, I haven't found a C++ library that implements this properly with an easy-to-use API (and what I mean by easy to use is: specifying the structure of cmd line options should be easy and natural, and retrieving options specified by the user should be easy, and the "synopsis" part of the man page should be auto-generated).
If anyone knows of one, would love to hear about it.
Yep, it's some kind of an advertisement, but for an open source project.
I gained some experience with parsing command line options when working on an open source traffic simulation named SUMO (http://sumo.dlr.de) and decided to re-code the options library. It works similarly to Python's argparse library - the options are defined first, then you may retrieve the values in a type-aware way.
Argparse doesn’t need sys.argv by default. The main gripe about it was also a bit odd.
Why would one want an option with an optional argument? And then complain about ambiguity? argparse is flexible on purpose but it doesn’t mean one needs to use every feature. optparse is still around for those who desire more discipline.
click is nice though I don’t care for multiple decorator driven libs in one project.