The Tao of Option Parsing

Optik was explicitly designed to encourage the creation of programs with straightforward, conventional command-line interfaces. To that end, it supports only the most common command-line syntax and semantics conventionally used under UNIX. If you are unfamiliar with these conventions, read this document to acquaint yourself with them.

Terminology

argument

a string entered on the command-line, and passed by the shell to execl() or execv(). In Python, arguments are elements of sys.argv[1:] (sys.argv[0] is the name of the program being executed). UNIX shells also use the term "word".

It is occasionally desirable to substitute an argument list other than sys.argv[1:], so you should read "argument" as "an element of sys.argv[1:], or of some other list provided as a substitute for sys.argv[1:]".

option

an argument used to supply extra information to guide or customize the execution of a program. There are many different syntaxes for options; the traditional UNIX syntax is a hyphen ("-") followed by a single letter, e.g. "-x" or "-F". Also, traditional UNIX syntax allows multiple options to be merged into a single argument, e.g. "-x -F" is equivalent to "-xF". The GNU project introduced "--" followed by a series of hyphen-separated words, e.g. "--file" or "--dry-run". These are the only two option syntaxes provided by Optik.

Some other option syntaxes that the world has seen include:

  • a hyphen followed by a few letters, e.g. "-pf" (this is not the same as multiple options merged into a single argument)
  • a hyphen followed by a whole word, e.g. "-file" (this is technically equivalent to the previous syntax, but they aren't usually seen in the same program)
  • a plus sign followed by a single letter, or a few letters, or a word, e.g. "+f", "+rgb"
  • a slash followed by a letter, or a few letters, or a word, e.g. "/f", "/file"

These option syntaxes are not supported by Optik, and they never will be. This is deliberate: the first three are non-standard on any environment, and the last only makes sense if you're exclusively targeting VMS, MS-DOS, and/or Windows.

option argument

an argument that follows an option, is closely associated with that option, and is consumed from the argument list when that option is. With Optik, option arguments may either be in a separate argument from their option:

-f foo
--file foo

or included in the same argument:

-ffoo
--file=foo

Typically, a given option either takes an argument or it doesn't. Lots of people want an "optional option arguments" feature, meaning that some options will take an argument if they see it, and won't if they don't. This is somewhat controversial, because it makes parsing ambiguous: if "-a" takes an optional argument and "-b" is another option entirely, how do we interpret "-ab"? Because of this ambiguity, Optik does not support this feature.

positional argument
something leftover in the argument list after options have been parsed, i.e. after options and their arguments have been parsed and removed from the argument list.
required option
an option that must be supplied on the command-line; note that the phrase "required option" is self-contradictory in English. Optik doesn't prevent you from implementing required options, but doesn't give you much help at it either. See examples/required_1.py and examples/required_2.py in the Optik source distribution for two ways to implement required options with Optik.

For example, consider this hypothetical command-line:

prog -v --report /tmp/report.txt foo bar

"-v" and "--report" are both options. Assuming that --report takes one argument, "/tmp/report.txt" is an option argument. "foo" and "bar" are positional arguments.

What are options for?

Options are used to provide extra information to tune or customize the execution of a program. In case it wasn't clear, options are usually optional. A program should be able to run just fine with no options whatsoever. (Pick a random program from the UNIX or GNU toolsets. Can it run without any options at all and still make sense? The main exceptions are find, tar, and dd -- all of which are mutant oddballs that have been rightly criticized for their non-standard syntax and confusing interfaces.)

Lots of people want their programs to have "required options". Think about it. If it's required, then it's not optional! If there is a piece of information that your program absolutely requires in order to run successfully, that's what positional arguments are for.

As an example of good command-line interface design, consider the humble cp utility, for copying files. It doesn't make much sense to try to copy files without supplying a destination and at least one source. Hence, cp fails if you run it with no arguments. However, it has a flexible, useful syntax that does not require any options at all:

cp SOURCE DEST
cp SOURCE ... DEST-DIR

You can get pretty far with just that. Most cp implementations provide a bunch of options to tweak exactly how the files are copied: you can preserve mode and modification time, avoid following symlinks, ask before clobbering existing files, etc. But none of this distracts from the core mission of cp, which is to copy either one file to another, or several files to another directory.

What are positional arguments for?

Positional arguments are for those pieces of information that your program absolutely, positively requires to run.

A good user interface should have as few absolute requirements as possible. If your program requires 17 distinct pieces of information in order to run successfully, it doesn't much matter how you get that information from the user -- most people will give up and walk away before they successfully run the program. This applies whether the user interface is a command-line, a configuration file, or a GUI: if you make that many demands on your users, most of them will simply give up.

In short, try to minimize the amount of information that users are absolutely required to supply -- use sensible defaults whenever possible. Of course, you also want to make your programs reasonably flexible. That's what options are for. Again, it doesn't matter if they are entries in a config file, widgets in the "Preferences" dialog of a GUI, or command-line options -- the more options you implement, the more flexible your program is, and the more complicated its implementation becomes. Too much flexibility has drawbacks as well, of course; too many options can overwhelm users and make your code much harder to maintain.