Thoughts on built-in documentation for command-line utilities (Part II)


Here comes the second part of my inquiry into design patterns of built-in command documentation. As advertised, this part covers BSD’s as well as BusyBox’ way of providing built-in documentation with their utilities and discusses some best-practice guidelines towards the end.

The BSD way

Contrary to GNU, OpenBSD and BSD systems in general take a radically minimalistic approach to built-in command documentation: When an invocation error occurs, BSD utilities usually display no more than an error message followed by a very concise usage summary for the respective command, all on standard error. There are no equivalents of GNU’s --help and --version options, which means that there’s no way to request built-in documentation independently of making an error. But then, being able to do that on BSD would, arguably, not make too much sense anyway. For one, BSD utilities don’t need to carry their own version information because they are simply cat, cp, ls, etc. from a particular operating system release, e.g., OpenBSD 7.0. And the usage summary they display on invocation error is, in most cases, just a plain-text version of the SYNOPSIS section from the respective manual page, which will often be sufficient for a user to figure out the problem, especially when it is preceded by an error message.

Another thing that sets the BSD approach apart from GNU’s is that neither error messages nor built-in help are localized.

As a first example, let’s look at what happens when ls is run with an unknown option on OpenBSD:

$ ls -b
ls: unknown option -- b
usage: ls [-1AaCcdFfgHhikLlmnopqRrSsTtux] [file ...]

As can be seen here, the built-in help text for ls on this system comprises exactly one line. The same goes for many other utilities on OpenBSD and BSD in general. Coming up with reasonable line count statistics in this case is quite tricky, though. OpenBSD’s base system, for example, contains quite a few tools that either are not developed within the project or originated outside the BSD sphere, and that follow a different scheme, e.g., less, libtool, or Perl. Some tools maintained by the OpenBSD project, such as the openssl utility from LibreSSL, act differently as well.

I’ve had a closer look at the /bin, /sbin, and /usr/bin directories on OpenBSD 7.0 and found that, for the genuine OpenBSD utilities in there that follow the scheme shown above, the usage summary line counts range between 1 and 33, the average being around 2.[1] This means that someone using those utilities will hardly ever need a pager (or the terminal’s scrollback buffer) to view their invocation help. But then, this doesn’t apply to OpenBSD’s base system as a whole because, of those three directories, /usr/bin alone already contains 17 utilities for which the usage summary line count is higher than 66.

Additionally, programs installed through OpenBSD’s package system are, naturally, not bound to following any particular scheme for built-in documentation. This also means that, as far as the consistency of such documentation goes, OpenBSD inevitably fares hardly better than the average Linux system.

And then, the common BSD way of providing built-in command documentation does have some flaws of its own as well.

As already mentioned, BSD utilities usually don’t provide any way of accessing their built-in documentation explicitly. And while it could very well be argued that there’s no real need for this feature, being able to take a quick peek at a command’s usage summary without having to render an entire manual page surely has its uses. In fact, it’s sufficiently popular among BSD users for there to be a well-established hack to trick command-line utilities into displaying their synopsis, namely running the respective command with -? as the only argument in order to cause an invocation error, e.g.:

$ cp -?
cp: unknown option -- ?
usage: cp [-afipv] [-R [-H | -L | -P]] source target
       cp [-afipv] [-R [-H | -L | -P]] source ... directory

This works for most BSD commands, but not for all of them. Some, like /bin/kill or shar, won’t treat -? as an option, which means that running them with -? as the only argument won’t trigger an invocation error. Awk, on the other hand, will recognize -? as an unknown option, but simply ignore it instead of erroring out and showing its synopsis. All in all, OpenBSD’s /bin, /sbin, and /usr/bin contain about 20 BSD-style utilities for which the -? hack doesn’t work. So, the user has to know in advance whether it will work for the utility in question.

To be fair, though, some of the inconsistency here is due to a couple of standard utilities not providing any help text by design, a problem that exists with GNU userspace tools just as well. However, all GNU utilities that do provide built-in documentation will show it when run with --help or --version, respectively. I.e., whether the user will get help text with --help and version information with --version in such a case does not depend on any further characteristics of the program in question. The same can’t be said about running BSD utilities with -?. But then, the latter is merely a hack, while the former is a standard in GNU userland.

Speaking of inconsistency, one thing that strikes me as rather odd is that a considerable number of commands in the BSD toolbox won’t always provide an error message before displaying their usage summary when they are invoked incorrectly. More precisely, this concerns the case of commands that require arguments being run without any argument. E.g., invoking cp without arguments on a BSD system only displays the command’s usage summary:

$ cp
usage: cp [-R [-H | -L | -P]] [-f | -i] [-alNpv] src target
       cp [-R [-H | -L | -P]] [-f | -i] [-alNpv] src1 ... srcN directory

It’s unclear to me why it would make sense to omit the error message in this case. Even though the error here is obvious from the usage summary, there should still be an error message. After all, the return code of that command being 1 clearly marks it as an erroneous invocation and, as such, it should trigger the display of a proper notification. Not doing that is simply inconsistent. Apart from cp, commands displaying this behavior include cut, mv, rm, mkdir, and ln, just to name a few. Incidentally, the GNU counterparts of these tools all provide an error message in this case. And so do some other tools on OpenBSD, notably rcs and related commands, such as ci, co, and rcsdiff.

On top of that, OpenBSD’s base system contains commands that, going by their synopsis, could be called without arguments, but return 1 and display their usage summary when that is done. One of those commands is atrm:

$ atrm                                                            
usage: atrm [-afi] [[job] [name] ...]
$ echo $?
1

Last but not least, BSD’s built-in command documentation lacks sufficient formatting. As mentioned earlier, the usage summaries BSD utilities display on erroneous invocation are usually mere plain-text versions of the respective manual page’s SYNOPSIS section. And one rather important aspect that doesn’t carry over into these plain-text versions is the clear distinction between regular text, literal command input, and parameters within such input. The GNU project solves this quite nicely by using sentence case for regular text and making parameters uppercase. The annoying thing about GNU in this case is that parameters wind up in uppercase in its manual pages as well.

The BusyBox way

BusyBox’ approach to built-in command documentation is somewhere in between GNU and BSD, at least when built with the CONFIG_SHOW_USAGE and CONFIG_FEATURE_VERBOSE_USAGE options enabled. On invocation error, BusyBox commands usually display an error message followed by a line denoting the BusyBox release in use as well as the build version[2], which is in turn followed by a more or less BSD-style usage summary, a short description of what the command does, and an option summary. Let’s look at BusyBox cp for an example:

$ cp -b cat.png /media/DONKEY/
cp: invalid option -- 'b'
BusyBox v1.33.1 (2022-02-07 22:14:38 CET) multi-call binary.

Usage: cp [-arPLHpfilsTu] SOURCE... DEST

Copy SOURCE(s) to DEST

	-a	Same as -dpR
	-R,-r	Recurse
	-d,-P	Preserve symlinks (default if -R)
	-L	Follow all symlinks
	-H	Follow symlinks on command line
	-p	Preserve file attributes if possible
	-f	Overwrite
	-i	Prompt before overwrite
	-l,-s	Create (sym)links
	-T	Treat DEST as a normal file
	-u	Copy only newer files

The user can also request to see a command’s built-in documentation by running it with the --help option. Unfortunately, doing that also sends the output to standard error when it should instead go to standard output in this case[3], with the exception of the busybox command itself, which does send its help text to standard output.

Another small nuisance with BusyBox commands is that, like many BSD tools, they don’t display an error message when they fail because of required arguments missing from their invocation.

As far as the content of BusyBox’ built-in command documentation goes, I’m not sure whether including version information in the output of --help instead of providing a separate way to access it is such a good idea. It feels slightly noisy to me, but the annoyance is quite bearable.

On the positive side, BusyBox, like GNU, makes good use of letter case to keep things distinguishable in plain text. In contrast to mostly all-lowercase command input, parameters are written in capital letters, and regular text uses sentence case. BusyBox’ use of sentence case is even slightly more consistent than GNU’s because option descriptions always start with a capital letter as well. Then again, just as GNU, and for reasons remaining somewhat unclear to me, BusyBox also uses uppercase letters for parameters in its manual page (at least in the applet descriptions), instead of making use of *roff’s formatting capabilities.

Finally, in the case of BusyBox, including an option summary in built-in command help works pretty well, in my experience. The main reason for this is that BusyBox commands tend to have way fewer options than their GNU counterparts, which means an option summary usually won’t add too many lines. In addition to that, and contrary to GNU, BusyBox’ option summaries are well formatted. And, in BusyBox’ context, including an option summary also makes good sense: since BusyBox is a multi-call binary, its manual page contains all the information on all the commands in the toolbox, which isn’t exactly handy when you’re trying to find information on a particular option of a certain command.

I was originally going to provide line count statistics for BusyBox’ built-in documentation as well, but then, a reasonable lack of motivation persuaded me to leave that to the reader. It wouldn’t have added anything substantial to what has already been said here, anyway. So, I decided to rather get into the interesting part of this whole story early and offer a small compendium of suggestions on how to actually design and implement built-in documentation properly.

Best-practice guidelines

So, let’s extract a few guidelines from what has been covered so far. Note, though, that all of the following is written on the premise that accurate, comprehensive, and well-written manual pages are already being provided for general reference. Also, as should be clear by now, I’m well aware that the following suggestions are not necessarily a good fit for all software. But if the piece of software in question is some independently developed tool or tool set, they should fit pretty well.

Before getting into the actual guidelines, however, let me take a step back and look at the simplest possible way of dealing with the problem of built-in command documentation. And that is, of course, not to provide it in the first place. It’s a fair approach, actually, because it gets a delicate problem out of the way without sacrificing all that much in terms of usability. After all, providing well-written manual pages and no built-in documentation is certainly more user-friendly than serving a mix of half-baked manual pages, comprehensive documentation in a different (and worse) format, and, on top of that, built-in documentation that is merely a partial plain-text translation of a respective command’s manual page without proper formatting.

That said, an added bit of convenience can make a noticeable difference in terms of workflow efficiency, especially when trying to determine a utility’s release version. Without built-in documentation, there is no convenient, portable way of doing that. You would have to rely either on (parsing) manual pages, or on obtaining the relevant information through the package system somehow, both of which make checking for a specific version of a utility a rather cumbersome and potentially expensive operation.

So, here’s a summary of things to keep in mind when implementing built-in documentation for a command-line utility.

1. Accessibility

1.1 Make it very fast

As mentioned in the first part of this little study, displaying a command’s built-in documentation should be very fast. And it should also be easy on system resources. Therefore, the whole operation should be brought down to merely displaying some static text, as far as that’s reasonable.[4] If a program needs to do a lot of pre-loading or pre-checking for its normal operation, this should be reduced to the absolutely necessary minimum when putting out built-in documentation. As a somewhat arbitrary rule of thumb: Putting out documentation that is built into a utility really shouldn’t take longer than a quarter of a second (on a cold start) and preferably happen considerably faster.
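A minimal shell sketch of this ordering, assuming a hypothetical utility called frob (the names frob_main and frob_init and the version number are made up for illustration): the meta options are answered before any start-up work is done, so requesting documentation costs little more than printing a few lines.

```shell
# Hypothetical utility "frob"; all names here are illustrative only.
FROB_NAME='frob'
FROB_VERSION='1.2.0'
FROB_INITIALIZED=0

# Stand-in for costly start-up work (reading configuration files,
# probing devices, loading plugins, ...). Here it only sets a flag.
frob_init() {
    FROB_INITIALIZED=1
}

frob_main() {
    # Answer meta options first: they need nothing but static text.
    case "${1-}" in
        --help)
            printf 'Usage:\n    %s FILE...\n' "$FROB_NAME"
            return 0
            ;;
        --version)
            printf '%s %s\n' "$FROB_NAME" "$FROB_VERSION"
            return 0
            ;;
    esac

    # Only now pay for the expensive part of normal operation.
    frob_init
    # ... actual work on "$@" would follow here ...
}
```

Whatever frob_init stands for in a real program is skipped entirely when only documentation is requested.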

1.2 Make it available on user request

Apart from a command’s invocation help showing up on invalid invocations, built-in documentation should also be available via command-line options and, where appropriate, through subcommands. The best way of making that happen is to show invocation help when the command is run with the --help option or help subcommand and to show version information when it is run with the --version option or version subcommand. A utility may offer shortcuts for these – I tend to use -h, h, -V, and v, respectively – but should still support the long form as the standard meta options or subcommands, i.e., options and subcommands that make a utility display information about itself.

The reason version information should be separate from invocation help is that this reduces the noise level of both outputs and thus makes it easier to use a utility’s version information in scripts.
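Here is one possible way of accepting both the long meta options and short aliases like -h and -V in a shell script, again for a hypothetical utility called frob with a made-up version number. Because --version prints nothing but a single line holding the name and the version number, another script can pick that line apart without having to filter out any help text.

```shell
# Hypothetical utility "frob"; name and version are illustrative.
FROB_NAME='frob'
FROB_VERSION='1.2.0'

frob() {
    case "${1-}" in
        -h|--help)
            # Invocation help only: no version information in here.
            printf 'Usage:\n    %s FILE...\n    %s -h | -V\n' \
                "$FROB_NAME" "$FROB_NAME"
            ;;
        -V|--version)
            # Version information only: one line, name and number.
            printf '%s %s\n' "$FROB_NAME" "$FROB_VERSION"
            ;;
        *)
            : # normal operation would go here
            ;;
    esac
}
```

A calling script could then get at the bare number with standard tools, e.g., version=$(frob --version | cut -d ' ' -f 2).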

The rationale for using --help and --version is pretty straightforward: First, they are way less ambiguous than any single-letter option could ever be. It’s highly unlikely that any sane program for which these options are valid will not interpret them as meaning ‘show help text’ and ‘show version information’. That is certainly also due to the fact that they are already part of a widely adopted standard, namely GNU’s standard for designing command-line interfaces.

Single-letter options, on the other hand, are a swamp of ambiguity here, as I’ve already demonstrated in an article on parsing command-line arguments in 2019:

A -h option, [for instance], might or might not [mean ‘show help’]. For example, it has a very different meaning in chown and chgrp, where it is used to handle symbolic links.
Similarly, -v might increase output verbosity and -V show a program’s version, or vice versa (like with chattr), or both might be interpreted as something entirely different. See, for example, the definitions of -v in grep or awk, and that of -V in GNU's implementation of tar.

On top of that, supporting double-dashed long options doesn’t even violate POSIX, as long as POSIX-compliant short variants are provided as well.

Finally, with --help and --version already in place as the standard meta options, it seems only logical to use the exact same names with the double dash removed for the corresponding subcommands.

Now, there is a case to be made for subcommand-based interfaces not to provide --help and --version, but only offer the respective subcommands.[5] The reason this actually makes good sense is that --help and --version are effectively subcommands already because, unlike “pure” options, they provide operations of their own instead of merely influencing aspects of a set operation, like removing a file or listing files within a directory. To make this more clear: Running, e.g., GNU rm on a file with any of its options except --help and --version will always remove a file unless an error occurs. Running the same command with either the --help or --version option will not remove any files. That even includes cases where --help or --version are given at an arbitrary position within an otherwise (to that point) valid invocation. For example:

$ ls
e-mail_rant  song68.mp3
$ rm e-mail_rant --version
rm (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Richard M. Stallman,
and Jim Meyering.

Having things work this way is not POSIX-compliant. But it might be a good idea nonetheless.
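To illustrate, here is a rough shell sketch of that behavior for a hypothetical utility called frob: all arguments up to an optional -- are scanned for --help and --version before anything else happens, and the first meta option found wins. This only approximates what GNU utilities achieve through getopt_long’s argument permutation; it is not taken from coreutils, and all names in it are made up.

```shell
# Hypothetical utility "frob"; names and version are illustrative.
FROB_NAME='frob'
FROB_VERSION='1.2.0'

# Print the first meta option ("help" or "version") found among the
# arguments; a "--" end-of-options marker stops the search.
frob_find_meta() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            --)        return 1 ;;
            --help)    printf 'help\n'; return 0 ;;
            --version) printf 'version\n'; return 0 ;;
        esac
    done
    return 1
}

frob() {
    local meta
    if meta=$(frob_find_meta "$@"); then
        # A meta option anywhere short-circuits normal operation,
        # so none of the named files is touched.
        case "$meta" in
            help)    printf 'Usage:\n    %s FILE...\n' "$FROB_NAME" ;;
            version) printf '%s %s\n' "$FROB_NAME" "$FROB_VERSION" ;;
        esac
        return 0
    fi
    # Stand-in for the actual work on the files.
    printf '%s: normal operation\n' "$FROB_NAME"
}
```

With this, frob e-mail_rant --version only prints the version line, while frob -- --version treats --version as an ordinary operand.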

Anyway, simply providing --help and --version as alternate ways to invoke the corresponding subcommand is certainly a better solution than introducing yet another interface standard that is hardly any different from the ones already in use.
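For a subcommand-based tool, this could look like the following shell sketch of a hypothetical frobctl (its subcommands and version number are invented): the option spellings are merely extra labels on the help and version branches of the dispatcher, so there is only one implementation of each. The short forms h, v, -h, and -V are the optional shortcuts mentioned earlier.

```shell
# Hypothetical subcommand-based tool "frobctl"; names are illustrative.
FROBCTL_NAME='frobctl'
FROBCTL_VERSION='0.4.1'

frobctl_help() {
    printf 'Usage:\n'
    printf '    %s start NAME\n' "$FROBCTL_NAME"
    printf '    %s stop NAME\n' "$FROBCTL_NAME"
    printf '    %s help | version\n' "$FROBCTL_NAME"
}

frobctl() {
    # Subcommands are primary; --help and --version (and the short
    # forms) are just alternate spellings leading to the same branches.
    case "${1-}" in
        help|h|--help|-h)
            frobctl_help
            ;;
        version|v|--version|-V)
            printf '%s %s\n' "$FROBCTL_NAME" "$FROBCTL_VERSION"
            ;;
        start|stop)
            # Stand-in for the real subcommands.
            printf '%s: would %s %s\n' "$FROBCTL_NAME" "$1" "${2-}"
            ;;
        '')
            printf '%s: missing subcommand\n' "$FROBCTL_NAME" >&2
            frobctl_help >&2
            return 1
            ;;
        *)
            printf '%s: unknown subcommand: %s\n' "$FROBCTL_NAME" "$1" >&2
            frobctl_help >&2
            return 1
            ;;
    esac
}
```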

1.3 Use stdout and stderr appropriately

If the user provides an incorrect command invocation, put out a usage summary on stderr, after the (mandatory) error message, because, in this case, the usage summary is part of the program’s diagnostic output.

If, on the other hand, the user requests to see either invocation help or version information by using the respective option or subcommand, the result is regular or “conventional”[6] output and must therefore go to stdout. Apart from being wrong conceptually, not doing it this way will require redirecting that output for any use case beyond just viewing invocation help or version information.

One such use case is checking for a particular version of a utility. So, for example, if you wanted to make sure stat on a particular system is GNU stat because a Bash script you wrote depends on it, you could simply do that inside the script, by looking for a particular string found in GNU stat’s --version output:

_stat_is_gnu() {
  ## Tell whether stat is GNU stat
  ## (stderr is discarded so that a non-GNU stat rejecting --version
  ## doesn't clutter the terminal)

  local err=0
  local gnu_version='.*(GNU coreutils).*'

  stat --version 2>/dev/null | head -n 1 | grep -q -- "$gnu_version" || \
    { printf 'GNU version of `stat` command required\n' >&2; err=1; }

  [[ "$err" -gt 0 ]] && return 1
  return 0
}

Now, if the output of stat --version were to go to stderr instead, it would have to be redirected before being piped to head (stat --version 2>&1), because a pipe only connects stat’s stdout, not its stderr, to head’s stdin. And having to do that every time that output is to be used programmatically would be quite a nuisance.
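Putting both cases together, here is a minimal shell sketch for a hypothetical utility called frob: requested help goes to stdout with exit status 0, while an invalid invocation, including one with missing operands, produces an error message followed by the synopsis on stderr and exit status 1. The choice of 1 merely matches the BSD behavior described earlier; other non-zero values would work just as well.

```shell
# Hypothetical utility "frob"; names are illustrative.
FROB_NAME='frob'

frob_synopsis() {
    printf 'Usage:\n    %s FILE...\n    %s -h | --help\n' \
        "$FROB_NAME" "$FROB_NAME"
}

frob() {
    if [ "$#" -eq 0 ]; then
        # Missing operand: still an error, so still an error message.
        printf '%s: missing file operand\n' "$FROB_NAME" >&2
        frob_synopsis >&2
        return 1
    fi
    case "$1" in
        -h|--help)
            # Requested by the user: regular output, stdout, success.
            frob_synopsis
            return 0
            ;;
        -?*)
            # Unknown option: diagnostic output, so everything on stderr.
            printf '%s: unknown option: %s\n' "$FROB_NAME" "$1" >&2
            frob_synopsis >&2
            return 1
            ;;
    esac
    # ... normal operation on "$@" would follow here ...
}
```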

2. Content

Before we get into the details of what information invocation help and version information should contain, let me stress once again that erroneous command invocations should always trigger a meaningful error message. After all, error messages are also a form of built-in command documentation, and actually the most important one.

2.1 Invocation help

Invocation help should usually only contain a more or less BSD-style usage summary, that is: a reasonable plain-text translation of the SYNOPSIS section from the utility’s manual page. (See 3. Formatting and style for what makes a plain-text translation reasonable.)

I’m saying “usually” because there is one exception that appears quite useful to me. And that is having the first line of the output hold a short description of what the utility is or does if displaying invocation help happens on user request. As with the usage summary, this piece of information can simply be copied from the manual page because it is already provided in the NAME section.

Including an option summary, though, is usually not a good idea because you will need to add one line per option you introduce, which might lead your tools’ invocation help to grow to a line count that will require paging, which defeats the purpose of built-in help almost entirely.
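To make the difference concrete, here is a sketch for a hypothetical utility called frob: on request, the one-line description from the NAME section comes first, followed by the synopsis; after an invocation error, only the error message and the synopsis are shown, on stderr. The description text and the option letters are invented for illustration and would, in practice, be copied from the manual page.

```shell
# Hypothetical utility "frob"; description and options are made up.
FROB_NAME='frob'
FROB_DESCRIPTION='frobnicate files'   # as in the manual page's NAME section

# Plain-text rendering of the manual page's SYNOPSIS section.
frob_synopsis() {
    printf 'Usage:\n'
    printf '    %s [-fv] [-o OUTFILE] FILE...\n' "$FROB_NAME"
    printf '    %s -h | -V\n' "$FROB_NAME"
}

# Invocation help on user request: one-line description, then synopsis.
frob_help() {
    printf '%s - %s\n\n' "$FROB_NAME" "$FROB_DESCRIPTION"
    frob_synopsis
}

# Invocation help on error: error message and synopsis only, on stderr.
frob_usage_error() {
    printf '%s: %s\n' "$FROB_NAME" "$1" >&2
    frob_synopsis >&2
    return 1
}
```

Note that there is no option summary: the whole requested help stays at five lines.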

2.2 Version information

As already covered in Part I, version information should contain a program’s name, followed by the respective version number – no more, no less.

The program’s name here refers to its actual name, not to whatever handle was used to invoke it. That means that the program itself should have this name hard-coded.
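In a shell script, this boils down to a constant, as in the following sketch for a hypothetical utility called frob with a made-up version number. The second function is a deliberate counter-example: a name derived from $0 reports whatever handle was used, e.g., fb if the script was started through a symlink of that name.

```shell
# Hypothetical utility "frob"; the version number is illustrative.
FROB_NAME='frob'        # hard-coded: the program's actual name
FROB_VERSION='1.2.0'

# Version information: name and version number, nothing else.
frob_version() {
    printf '%s %s\n' "$FROB_NAME" "$FROB_VERSION"
}

# Counter-example: "${0##*/}" is whatever handle was used to invoke
# the script, e.g., "fb" if it was started through a symlink.
frob_version_from_invocation() {
    printf '%s %s\n' "${0##*/}" "$FROB_VERSION"
}
```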

2.3 Localization

As far as localization goes, I’d argue that it’s a good idea to offer non-English versions of built-in documentation as well. All the information it needs to contain can be copied from the NAME and SYNOPSIS sections of a command’s manual page. So, translating the manual page will already provide the material.

That said, translating the manual page itself should be the priority here. And translations should only be published if they’re complete. So, unless you can ship fully translated non-English manual pages, don’t translate built-in documentation either.

Needless to say, translations only provide real benefit when they are of good quality. Publishing bad translations, on the other hand, is nearly as harmful as publishing bad code because, just like bad code, they will inevitably be picked up by someone as a good-enough, working example and spread from there.

So, if you can’t translate things at a sufficient level yourself and don’t have a way to assess the quality of translations contributed by others, it’s better not to translate anything at all.

3. Formatting and style

Generally, follow the POSIX Utility Conventions.

3.1 Line-wrapping and indentation

With version information cut down to a concise one-liner and invocation help reduced to mostly just a command’s synopsis, line-wrapping likely won’t be much of an issue, but it might still be necessary on occasion.

Obviously, lines should be wrapped after a reasonable number of columns. I tend to keep them at a width of no more than 80. While the historical reason for this convention no longer applies, I see no harm in following tradition here. You’ll have to choose some sort of limit anyway.

Closely connected to line-wrapping is indentation. Especially if the text of what is logically a single line in a command’s synopsis needs to be wrapped, its continuation on the next line should be properly indented, meaning at least beyond the command name (including subcommands). See, for example, the built-in synopsis of the OpenSSH client:

usage: ssh [-46AaCfGgKkMNnqsTtVvXxYy] [-B bind_interface]
           [-b bind_address] [-c cipher_spec] [-D [bind_address:]port]
           [-E log_file] [-e escape_char] [-F configfile] [-I pkcs11]
           [-i identity_file] [-J [user@]host[:port]] [-L address]
           [-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port]
           [-Q query_option] [-R address] [-S ctl_path] [-W host:port]
           [-w local_tun[:remote_tun]] destination [command]

How much whitespace to use per indentation level is, ultimately, a question of preference. Sometimes, it will just be given implicitly, as above. That said, I tend to think that using four spaces per level where indentation levels exist explicitly makes for pretty good readability.

Additionally, I recommend putting a line break before the actual synopsis and indenting it one level, like this:

Usage:
    <command> <options> <operands>

This is, arguably, a bit better than starting the synopsis right on the first line because it separates that synopsis from its accompanying meta text more clearly. Doing it this way also uses slightly less horizontal space.
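In a shell script, a here-document keeps such a layout readable in the source as well. The sketch below, for a hypothetical utility called frob with made-up options, indents the synopsis by four spaces, aligns the wrapped continuation line past the command name, and adds a small helper (fits_80_columns, also hypothetical) for checking that no line exceeds 80 columns.

```shell
# Hypothetical utility "frob"; options and operands are made up.
FROB_NAME='frob'

frob_usage() {
    # Unquoted EOF so that $FROB_NAME gets expanded. The continuation
    # line is indented past "    frob ", i.e., by 9 columns.
    cat <<EOF
Usage:
    $FROB_NAME [-fHqv] [-b BLOCKSIZE] [-c CONFIGFILE] [-l LOGFILE]
         [-o OUTFILE] [-t TIMEOUT] SOURCE... DEST
    $FROB_NAME -h | -V
EOF
}

# Succeed only if no line of the given text is wider than 80 columns
# (awk's length() is adequate for plain ASCII text).
fits_80_columns() {
    printf '%s\n' "$1" | awk 'length($0) > 80 { bad = 1 } END { exit bad + 0 }'
}
```

A check like fits_80_columns "$(frob_usage)" could, for instance, run as part of a test suite so that the limit doesn’t erode as options are added.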

3.2 Use letter case for better readability

Use sentence case for regular text, e.g., “Usage:”, and keep parameters all-uppercase in order to make both better distinguishable from mostly all-lowercase literal command input.[7] In other words: As far as letter case goes, just do what BusyBox does.

3.3 Allow single-letter option combining for a more compact synopsis

The POSIX Utility Syntax Guidelines say that

[o]ne or more options without option-arguments, followed by at most one option that takes an option-argument, should be accepted when grouped behind one - delimiter.

What this means is that, for example,

nano -c -H -x -Y none 

should also be allowed to be written as

nano -cHxY none 

This more compact format of writing out single-letter options is, obviously, also useful for writing more compact synopses, which makes for an additional good reason to implement it.
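Shell scripts get this behavior for free because the POSIX getopts builtin already accepts grouped options. The following sketch borrows the nano options from the example above purely as test input; it has nothing to do with how nano itself parses its arguments, and parse_opts is a hypothetical name.

```shell
# Parse the option letters -c, -H, -x and -Y SYNTAX (taken from the
# nano example above) and print a normalized summary. Illustrative only.
parse_opts() {
    local opt summary=''
    OPTIND=1    # reset, so the function can be called repeatedly
    while getopts 'cHxY:' opt; do
        case "$opt" in
            c|H|x) summary="$summary -$opt" ;;
            Y)     summary="$summary -Y $OPTARG" ;;
            *)     return 1 ;;   # getopts has already printed an error
        esac
    done
    shift $((OPTIND - 1))
    printf '%s | operands: %s\n' "${summary# }" "$#"
}
```

Both parse_opts -c -H -x -Y none and parse_opts -cHxY none print -c -H -x -Y none | operands: 0.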

Now, with the above guidelines in place, it should be pretty straightforward to avoid the worst mistakes in built-in documentation.


[1] Unfortunately, my method of gathering the data needed to get these values was very primitive, awfully time-consuming, and quite error-prone, so that, after I had gone through /bin, /sbin, and /usr/bin, I decided to leave /usr/sbin out in order to still be able to handle the data.

[2] By default, the build version will simply contain a timestamp from whenever the compilation was started, but distributions might use a different scheme. E.g., on Devuan Beowulf, BusyBox’ build version is shown as Debian 1:1.30.1-4, which is equivalent to the Debian package version of BusyBox on that system.

[3] See footnote 12 in Part I for a rationale.

[4] Making built-in documentation fully static, i.e., making it a string constant, is arguably not all that reasonable because it requires not only hard-coding the program’s name at least twice, but also hard-coding all needed indentation.

[5] See, for example: https://jmmv.dev/2013/09/cli-design-subcommand-based-interfaces.html

[6] https://pubs.opengroup.org/onlinepubs/9699919799/functions/stdout.html

[7] Uppercase options will still be easily recognizable as literal input because they either consist of a single letter preceded by a hyphen or are represented by a single letter within an option group preceded by a hyphen.