gemini - kennedy.gemi.dev

💾 Archived View for dmerej.info › blog › 0022-docopt-v-argparse.gmi captured on 2022-04-28 at 17:29:22. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

2016, Sep 24 - Dimitri Merejkowsky
License: CC By 4.0

Say you have a program named `foo` that is supposed to be used from the command line, written in Python.

It provides lots of various features, so you decide to group them into subcommands, like `git` or `mercurial` does for instance.

So your program can be called in the following ways:

$ foo bar --baz --num-jobs=4
runs bar(baz=True, num_jobs=4)

$ foo read file.in --verbose
runs read("file.in", verbose=True)

Assuming you are using Python for this, which library should you use to implement command line parsing?

An history of command line parsing in Python

getopt

Back in the old days, the answer was getopt

Basically, you would call the `getopt()` function with some weird string, like this:

"""
Usage: foo [options]

Options:

  -h, --help           Display this help message
  -o, --output <PATH>  Output path
  -v,--verbose         Be verbose

"""

import sys
import getopt

def usage():
    print(__doc__)


def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
    except getopt.GetoptError as err:
        # print help information and exit:
        sys.exit(err)  # will print something like "option -a not recognized"

    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit(0)
        elif o in ("-o", "--output"):
          ...

Here `ho:v` means you recognize the short options `-h`, `-o` and `-v`, and the colon after the `o` means that `-o` must be followed by an argument (so `foo -o -v` does not work, but `foo -o foo.out` does) Note that you have to use a list of strings to have long options recognized too. (Here `-h` and `--help` will have the same meaning)

Note the call to the `usage()` function. The *usage* of a command line program is simply a string that describes how the program should be called.

Also note how you specify `-o,--output` **three** times: once in the docstring, once it the `getopt()` call and finally once in the `main()` function.

optparse

Then came optparse[1].

1: https://docs.python.org/3/library/optparse.html

The idea was to get rid of the weird syntax to define long and short options, and also to generate the help message and usage automatically.

To use `optparse` you create a *parser*, then call the `add_option` method for each pair of long and short options you want to support.

Then you invoke the `parse_args()` method and it returns a tuple of two items:

The first item is a class (`optparse.Values`) that contains the values of the options as attributes.

The second item is simply the list of positional arguments.

Note the difference between *options* and *arguments* here.

When using `optparse`, everything that is optional starts with a dash (`-`), and is called an *option*, and everything that is required simply comes after all the options and is called a *positional argument*, or *argument* for short.

There's no way to have "optional arguments" or "required options".

The `optparse` authors explain why they choose to enforce this behavior in the documentation:

What are options for[2]
What are positional arguments for[3]

It contains some interesting points about command line API design, so I suggest you read it, even if you won't be using `optparse`.

2: https://docs.python.org/3/library/optparse.html#what-are-options-for

3: https://docs.python.org/3/library/optparse.html#what-are-positional-arguments-for

By default, the attribute name of the parsed option will be computed from its long form.

`--foo-bar=42` gives `opts.foo_bar = 42`.

But you can use `add_option("--input", dest="input_file"`) to override the name of the attribute.

If the option does not take an argument, its parsed value will be a boolean, and you can use `action="store_true"` or `action="store_false"`. You can set its default value by calling `parser.set_defaults()`.

So, if you have a `foo` program that takes a required input file as argument, and an output file as an option, you'll have to write something like:

parser = OptionParser()
parser.add_option("-o", "--output", help="path to the output file")

opts, args = parser.parse_args()

if len(args) != 1:
    sys.exit("Missing one required argument")

input_file = args[0]

if opts.output:
    output_file = opts.output
else:
    output_file = input_file + ".out"

`optparse` has a lot of nice features:

you don't have to write the usage or the help message yourself, it's generated automatically

you can provide default values and types for your options

you can group them together

and so on ...

argparse

argparse[4] is a direct successor of `optparse`.

4: https://docs.python.org/3/howto/argparse.html

The idea is to be more flexible that `optparse` and allowing to have "optional arguments" and "required options", because, let's face it, sometimes it's really convenient. For instance, for a dangerous operation, you may want the `--force` option to be required :P

Anyway, `argparse` uses the same API that `optparse`, except that:

`add_option` becomes `add_argument` and can be used both for *options* and *positional arguments*

`parse_args` returns only one item (an `argparse.Namespace` instance) for the values of both the options and the positional arguments.

Porting from `optparse` to `argparse` is thus quite easy. For instance, the previous example becomes:

parser = ArgumentParser()
# it's an option because it starts with a dash:
parser.add_argument("-o", "--output", help="path to the output file")
# it's a *required* positional argument
parser.add_argument("input", dest="input_file")

args = parser.parse_args()

input_file = args.input_file

if args.output:
    output_file = args.output
else:
    output_file = input_file + ".out"

But you can have optional positional arguments, by using `add_argument("foo", nargs="?")`, or required options, with `add_argument("--force", action="store_true", required=True)`

Also, `argparse` supports what are called *subparsers*, which means you can have complex API that runs different *commands* depending on the first argument. You do this by calling `parser.add_subparsers()`:

parser = argparse.ArgumentParser()
action_parsers = parser.add_subparsers(title="available actions")
bar_parser = action_parsers.add_parser("bar")
...
read_parser = action_parsers.add_parser("read")
...

This makes it possible to implement the example we started with:

$ foo bar --baz --num-jobs=4

$ foo read file.in --verbose

A nifty hack is to use `set_defaults()` to dispatch the code after parsing, like this:

bar_parser.set_defaults(action="do_bar")
...
read_parser.set_defaults(action="do_read")
...

if args.action == "do_bar":
    ...
elif args.action == "do_read":
    ...

By the way, `argparse` started its life as a standalone third-party library, but is now part of Python standard library since Python 2.7.

The official Python documentation for `getopt` and `optparse` both suggest to use `argparse` instead.

docopt

`docopt` came with a quite interesting new idea. What if, instead of generating the help message automatically, we *parsed* directly the usage string?

After all, the string usage does have a pretty normalized format. (It's even part of the "IEE Std 1003.1" standard, also knows as ... POSIX)

Instead of writing tons of boring boilerplate (instantiate the parser, call `add_argument` a bunch of times, then `parse_args`), what if you could just call a function using the usage string?

This is how you use `docopt`:

"""
Usage:
  foo bar [--baz] [--num-jobs=<jobs>]
  foo read <input_file> [--verbose]

Options:
  --baz              Do some baz stuff
  --verbose          Be verbose
  --num-jobs=<jobs>  number of jobs [default: 1]

"""

import docopt

opts = docopt.docopt(__doc__)
print(opts)

Note that the return value of `docopt` is a simple Python dictionary.

The keys will be `'--baz'`, `'baz'`, or `'<baz>'` depending on the context, and the values will always be strings, except when a subcommand is used.

So for instance you would have:

$ foo bar --num-jobs=42
{'--baz': False,
 '--num-jobs': '42',
 '--verbose': False,
 '<input_file>': None,
 'bar': True,
 'read': False}
% foo read input.txt
{'--baz': False,
 '--num-jobs': '1',
 '--verbose': False,
 '<input_file>': 'input.txt',
 'bar': False,
 'read': True}

docopt v argparse

OK, so by now it should be pretty clear your best option is between `argparse` and `docopt`

Note: there's also other projects like clik[5] that do things differently, but:

5: http://click.pocoo.org

I never used it.
I've been told not to use it[6]
This article is already quite long ...

Where docopt is better

Shorter code

Kind of obvious, right? All you have to do is import `docopt` and call `docopt.docopt()` with the usage string.

6: http://xion.io/post/programming/python-dont-use-click.html

Abide PEP

The PEP 257[7] says:

7: https://www.python.org/dev/peps/pep-0257

The docstring of a script (a stand-alone program) should be usable as its

"usage" message, printed when the script is invoked with incorrect or missing

arguments (or perhaps with a "-h" option, for "help"). Such a docstring should

document the script's function and command line syntax, environment variables,

and files. Usage messages can be fairly elaborate (several screens full) and

should be sufficient for a new user to use the command properly, as well as a

complete quick reference to all options and arguments for the sophisticated

user.

`docopt` is perfect for this.

Full control of the --help output

You can do this with `argparse` of course, but doing so is not trivial. You have to implement an `argparse` "formatter class", and the names are ugly:

argparse.RawDescriptionHelpFormatter
argparse.RawTextHelpFormatter
argparse.ArgumentDefaultsHelpFormatter
argparse.MetavarTypeHelpFormatter

Ugh.

You can also prevent `argparse` from generating the help message automatically, but then you have to re-implement handling `-h,--help` yourself.

Arbitrary conditions

Usage: my_program (--either-this <and-that> | <or-this>)

There isn't any way to do this directly with argparse.

You'll have to do it manually, like so:

parser = argparse.ArgumentParser()
parser.add_argument("--either-this")
parser.add_argument("or_this", nargs="?")

args = args.parse_args():
if not any((args.either_this, args.or_this)):
    sys.exit("...")

Grouping exclusive options

With `docopt`:

Usage: foo go (--up | --down)

With `argparse`, you must use `add_mutually_exclusive_group()` which is quite verbose ...

Maintainability

If you configure your `argparse.ArgumentParser` from several portions of your code, you risk defining the same option twice, which will lead to this kind of errors:

argparse.ArgumentError: argument -f/--format: conflicting option string: -f

and this gets old fast ...

Available for many languages

Although the reference implementation of `docopt` is in Python, the library has been ported to quite a few languages.

The official list is on github

Where argparse is better

In the standard library

Actually, this has both pros and cons.

Pros: you don't have to install anything, it will just work with your existing Python installation.
Cons: since it's part of the standard library, you'll have to wait for a new Python release to get bug fixes or new features.

No DSL

`docopt` is able to do its magic because it uses a DSL (domain specific language)

That's why you only need a few lines of code to use it, because the information you need is parsed from an other language, dedicated to this task.

Quite ironically, this is the same approach used by `getopt`. The weird `"ho:v"` string is *also* a DSL ...

But using a DSL comes with a few cons:

For instance, you need *2* spaces between the option name and its description for `docopt` to work:

foo

  --this-is-long Long description
  --short        Do it short

It's hard to see where the problem is here ...

Also, if `docopt` cannot parse your usage string, you'll get a cryptic error message with no clue about what line is wrong:

"""
Usage:
  naval_fate ship new <name>...
  naval_fate ship <name> move <x> <y> [--speed=<kn>]
  naval_fate mine (set|remove <x> <y> [--moored|--drifting]
  naval_fate ship shoot <x> <y>
"""


from docopt import docopt
opts = docopt(__doc__)

$ python naval_fate.py

docopt.DocoptLanguageError: unmatched '('

That's what you get when creating a new language: you have to take care of many subtle details and problems that comes with it...

See an article from ESR[8] for some discussion around this topic.

8: http://esr.ibiblio.org/?p=7032

But this is a bit philosophical, let's get to more concrete stuff.

Error messages

`docopt` will *always* print the usage if it can't parse the command line, with no explanation.

`argparse` will generate much more information:

$ naval_fate move
invalid choice: 'move' (choose from 'ship', 'mine')
$ naval_fate ship new
the following arguments are required: names
$ naval_fate ship move bounty 5 fortytwo
argument y: invalid float value: 'fortytwo'

There's a Pull Request[9] to fix this, though.

9: https://github.com/docopt/docopt/pull/63

Options that only make sense for one operating system

Yes, this is a contrived example:

if os.name == "nt":
    parser.add_argument("--on-mingw")

Still, you get the idea: since you're using Python to build your parser, you have access to the full features of the language, instead of being constrained by the DSL.

Naming

I really like the fact I have a two possible "names" : one for command line API, and an other one for the attribute of the returned value.

For instance, here is a pattern I often use:

def do_foo(raises=True):
    ok = try_something()
    if not ok and raises:
        raise FooFailed()

Using `argparse`, I can do:

parser.add_argument("--ignore-errors", dest="raises")
args = parser.parse_args()
do_foo(raises=args.raises) # I really don't care what the flag is called here
                           # for all I know, it could be --i-know-what-i-am-doing

With `docopt`, I would have to write:

opts = docopt.docopt(__doc__)
do_foo(raises=opts["--ignore-errors"])

which I find less readable, but maybe it's a matter of taste.

An other example is when you need a `--no-foo` option.

I prefer using:

parser.add_argument("--no-foo", dest="foo", action="store_false")
parser.set_defaults(foo=True)
...
if args.foo:
   ...

rather than `if not opts["--no-foo"]:`

Types

With `docopt`, all you get is a dictionary with strings as keys and values. (And sometimes booleans when you have subparsers)

With argparse, you can force values to be integers, floats, existing files, or even your own type.

For instance:

def valid_filename(value)
    if value in [".", ".."]:
        raise Error("Invalid name: %s" % value)

    bad_chars = r'<>:"/\|?*'
    for bad_char in bad_chars:
        if bad_char in value: