
Parsing command line arguments using a finite state machine and backtracking - elasticdog
https://jawher.me/2015/01/18/parsing-command-line-arguments-finite-state-machine-backtracking/
======
userbinator
That was interesting to read and certainly a different approach to solving a
problem, but the biggest question I had in my mind while reading the article
was "is it really necessary to create all this complexity --- and how does it
compare to the traditional getopt()?"

I've used getopt() many times, even implemented its functionality for fun, and
parsing command-line arguments has never been something I thought was
particularly difficult or complex.
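
For comparison, here is a rough sketch of the getopt-style approach applied to the cp-like usage string from the article, using Python's standard `getopt` module. The flag letters come from that usage string; the `parse_cp` helper and the hand-written checks are hypothetical and only illustrate that getopt itself knows which flags exist, not how they relate to each other.

```python
import getopt


def parse_cp(argv):
    """Parse cp-like arguments; return (flags, sources, destination)."""
    # getopt only validates that each single-letter flag is known.
    opts, rest = getopt.getopt(argv, "RHLPfinapvX")
    flags = {name.lstrip("-") for name, _ in opts}

    # Relationships between flags must be coded by hand (assumed rules,
    # read off the usage string "[-R [-H | -L | -P]] [-fi | -n] ...").
    if flags & {"H", "L", "P"} and "R" not in flags:
        raise ValueError("-H, -L and -P require -R")
    if "n" in flags and flags & {"f", "i"}:
        raise ValueError("-n cannot be combined with -f or -i")
    if len(rest) < 2:
        raise ValueError("need at least one SRC and a DST")
    return flags, rest[:-1], rest[-1]
```

Calling `parse_cp(["-R", "-H", "a", "b", "c"])` yields the flag set, the sources `["a", "b"]` and the destination `"c"`; the constraint checks are the part a declarative usage string would otherwise carry.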

~~~
taeric
I would think the biggest benefit is a fuller understanding of what you
built. There are probably some efficiency benefits, too.

Of course, this is also the biggest drawback, I would think. You rarely have
the luxury of understanding every part of your program at this level. And
something like command-line parsing is probably not a bottleneck in your
program anyway.

------
fintler
I ended up writing a blog post on how to handle cp arguments a while back. I
started coding before really thinking about it -- I mean, it's cp, how hard
could it be (famous last words)?

[http://blog.typeobject.com/thinking-out-loud-file-copy-
tool-...](http://blog.typeobject.com/thinking-out-loud-file-copy-tool-
arguments)

It turns out the arguments to cp are really tough to reason about if you just
naively dive into it. The approach in this article is much better than mine.
It feels like I was fumbling around in the dark.

If you're interested in seeing my implementation (it's buggy, but handles most
cases), it can be found at:

[https://github.com/hpc/dcp/blob/master/src/handle_args.c](https://github.com/hpc/dcp/blob/master/src/handle_args.c)

The entry point to handle_args.c is DCOPY_parse_path_args, which is called by
the main() in:

[https://github.com/hpc/dcp/blob/master/src/dcp.c](https://github.com/hpc/dcp/blob/master/src/dcp.c)

~~~
smhenderson
I thought about the same thing with regard to cp. I've been using OpenBSD a lot
lately and, having its source installed, I wanted to take a look at their coding
style and such. I picked cp for my first glance as I thought it should be
simple enough to understand quickly.

As you say, its option parsing makes up the bulk of the cp.c code, with
utils.c doing most of the actual file copying. It turns out it's pretty tough
to offer a lot of options when some of them are mutually exclusive. Obvious
in hindsight but interesting to me at the time I was reading through it.

[http://cvsweb.openbsd.org/cgi-
bin/cvsweb/~checkout~/src/bin/...](http://cvsweb.openbsd.org/cgi-
bin/cvsweb/~checkout~/src/bin/cp/cp.c?rev=1.37&content-type=text/plain)

------
nine_k
Why write an FSM and backtracking explicitly? Use parser combinators; they
basically do the same thing, but express it in a clearer, grammar-friendly
way: [http://theorangeduck.com/page/you-could-have-invented-
parser...](http://theorangeduck.com/page/you-could-have-invented-parser-
combinators)

~~~
jawher
Parser combinators would certainly have been easier for me, the library author,
but not necessarily so for the end user.

Contrast how it is done now:

    
    
      [-R [-H | -L | -P]] [-fi | -n] [-apvX] SRC... DST
    

With a parser combinator based approach:

    
    
      and(
        optional(and('-R', or('-H', '-L', '-P'))),
        optional(or(
                    '-fi', 
                    '-n'
                 )
        ),
        optional('-apvX'),
        repeatable('SRC'),
        'DST')
    

And this doesn't even handle the fact that the order of options is not
important.
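
To make the comparison concrete, here is a minimal sketch of what such combinators could look like in Python, with backtracking expressed through generators. All names (`lit`, `seq`, `alt`, `optional`, `repeatable`, `any_arg`, `matches`) are hypothetical and not taken from any library; the spec is simplified to `[-R [-H | -L | -P]] [-f | -i | -n] SRC... DST`, so the `-apvX` group and combined flags like `-fi` are left out.

```python
def lit(token):
    """Match exactly one argument equal to `token`."""
    def parse(args, i):
        if i < len(args) and args[i] == token:
            yield i + 1
    return parse


def any_arg():
    """Match one positional argument (anything not starting with '-')."""
    def parse(args, i):
        if i < len(args) and not args[i].startswith("-"):
            yield i + 1
    return parse


def seq(*parsers):
    """Match all parsers in order; seq() with no parsers matches nothing consumed."""
    def parse(args, i):
        if not parsers:
            yield i
            return
        rest = seq(*parsers[1:])
        for j in parsers[0](args, i):
            yield from rest(args, j)
    return parse


def alt(*parsers):
    """Try each alternative in turn; this is where backtracking happens."""
    def parse(args, i):
        for p in parsers:
            yield from p(args, i)
    return parse


def optional(p):
    return alt(p, seq())


def repeatable(p):
    """Match `p` one or more times."""
    def parse(args, i):
        for j in p(args, i):
            yield j
            if j > i:  # guard against parsers that consume nothing
                yield from parse(args, j)
    return parse


cp = seq(
    optional(seq(lit("-R"), optional(alt(lit("-H"), lit("-L"), lit("-P"))))),
    optional(alt(lit("-f"), lit("-i"), lit("-n"))),
    repeatable(any_arg()),  # SRC...
    any_arg(),              # DST
)


def matches(parser, argv):
    """True if some parse consumes the whole argument list."""
    return any(end == len(argv) for end in parser(argv, 0))
```

With this sketch, `matches(cp, ["-R", "-H", "a", "b"])` succeeds while `matches(cp, ["-f", "-R", "a", "b"])` does not, which illustrates the ordering limitation: a plain sequence of combinators fixes the order in which options may appear.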

~~~
jpdarago
For that I think you can do something similar to what mpc for C
([https://github.com/orangeduck/mpc](https://github.com/orangeduck/mpc)) and
LPeg for Lua
([http://www.inf.puc-rio.br/~roberto/lpeg/](http://www.inf.puc-rio.br/~roberto/lpeg/))
do, which is to provide the parsing machinery and write a small DSL with the
same machinery for the users.

------
TheLoneWolfling
How does this avoid the exponential-time worst-case complexity of backtracking
approaches to walking NFAs?
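
For context, one standard way to sidestep that blow-up is to advance the whole set of live NFA states in lockstep instead of exploring one path at a time. Whether the article's library does anything like this is not stated in this thread. The sketch below is a generic illustration in Python; the `nfa_accepts` function and the toy automaton for `(a | a a)* b` are made up for the example, and epsilon transitions are left out for brevity.

```python
def nfa_accepts(transitions, start, accepting, tokens):
    """Simulate an NFA without backtracking by tracking all live states.

    `transitions` maps (state, token) to a set of next states.
    Each token is consumed once against the current state set, so for a
    fixed automaton the running time is linear in the number of tokens.
    """
    current = {start}
    for tok in tokens:
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, tok), ())
        }
        if not current:  # no live state left, so no path can accept
            return False
    return bool(current & accepting)


# Toy NFA for (a | a a)* b: from state 0, an "a" may either complete a
# single-"a" group (back to 0) or start a two-"a" group (go to 1).
TOY = {
    (0, "a"): {0, 1},
    (1, "a"): {0},
    (0, "b"): {2},
}
```

A naive backtracking matcher on forty `a` tokens with no trailing `b` would try every way of splitting them into groups before failing; `nfa_accepts(TOY, 0, {2}, ["a"] * 40)` rejects after forty set updates.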

