Thoughts on built-in documentation for command-line utilities (Part I)

While the de facto standard form of documentation on Unix-like systems remains manual pages written in some incarnation of the roff formatting language, most command-line utilities in this day and age also provide one form or another of built-in documentation. And while larger software projects, such as GNU or the BSDs, each have their own more or less strictly defined rules or conventions on how to implement such documentation, it’s a complete do-as-you-please affair for those who develop software independently.

But, as nice as not being bound by any pre-defined rules may be, the absence of well-thought-out guidelines also makes it hard to use this freedom wisely. In fact, many command-line utilities offer built-in documentation in a way that makes it hard to use.

The purpose of this article is thus to give careful consideration to the question of how such documentation should be implemented and, at the end, extract a set of guidelines on what makes a sensible implementation. To establish a reasonable basis for that, I will first try to define the actual purpose of built-in command documentation. Then I will take a look mainly at three different projects that each have their own distinct way of providing it (namely GNU, OpenBSD, and BusyBox), discuss their merits and drawbacks, and also briefly comment on a few independent utilities along the way.

As there’s quite a lot of material to be covered here, I have decided to split this article up into two parts. The first one, i.e., this text, will deal with the basic whats and whys of built-in documentation and look at how GNU implements it. The second part will then cover OpenBSD and BusyBox as well as the discussion of some best-practice guidelines.

What is it good for?

Having looked at this from a few different angles, I’d say that the purpose of built-in command documentation essentially is

  1. to present that information about a command which is most immediately useful to someone wanting to run it
  2. to present this information in a very concise form
  3. to make it more readily accessible than the standard documentation

These things obviously overlap to a considerable extent, which means that, on the one hand, a lot of synergistic effect is to be had by getting them all right. On the other hand, the usefulness of built-in documentation is easily hurt by getting one of them wrong.

Now, point 1 offers only a very loose description of what information should actually go into a command’s built-in documentation. My take on this is that, in order to best suit its purpose, built-in documentation should include no more than a short overview of how to invoke the respective command correctly and, in some contexts, concise version information, i.e., the actual program’s name and a version number.

The reason version information should contain a command’s actual program name is that program and invocation names can differ, for a couple of reasons. To give just one example, when I run pg --version on my Devuan system, I will be told that pg is equivalent to less 487 because I have pg set up as an alias for less.

The other point in the above list that deserves some clarification is point 3. So, what does it actually mean to make built-in documentation more readily accessible than the standard documentation? It means making it accessible in less time and with less effort. I.e., if the way to access standard documentation is running man <command>, the user should ideally have to type less than that to access built-in documentation. In addition to that, built-in documentation shouldn’t take longer to appear on the user’s screen than, in our case, a *roff manual page. This is practically a non-issue because printing out plain text is, by nature, a faster operation than rendering a *roff document for display. Also, in this day and age, both displaying built-in documentation and having a manual page rendered are usually split-second operations, so the user normally won’t perceive a significant difference anyway. Still, some command-line tools fail spectacularly at this. For instance, youtube-dl, run with its --help option, takes roughly one and a half seconds to put 428 lines of plain text on the screen on my system. For comparison, running tar --help on that same system emits 378 lines in about 0.005 seconds. Needless to say, both of these help texts are excessively long, but that’s a different issue.
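For readers who want to check such figures on their own system, here is a minimal sketch in POSIX shell. The function name help_lines is made up for illustration, and this is not necessarily how the numbers above were obtained; it merely counts the lines a command emits when run with --help.

```shell
# help_lines CMD: print the number of lines CMD emits when run with --help.
# Standard error is merged into standard output so that tools which print
# their help text there are counted as well.
help_lines() {
    # tr strips the padding some wc implementations put before the number.
    "$1" --help 2>&1 | wc -l | tr -d ' '
}
```

Calling help_lines tar prints the line count for tar. To get a rough idea of how long the help text takes to appear, the same command can be run under the shell's time facility, e.g., time tar --help >/dev/null, where available. Results will differ from system to system.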

The most important takeaway here is that the reason to have built-in documentation is for a user to be able to find some selected information about a command way quicker than it would take them to look it up in the respective manual page. Consequently, this kind of documentation is not to be treated as a substitute for manual pages, which are meant to provide the user with a comprehensive reference manual.

The GNU way

One thing that distinguishes GNU command-line utilities from the standard utilities found on Unix-like systems not using the GNU toolbox is their provision of double-dashed long options. As per the GNU Coding Standards, command-line utilities are required to provide at least two such options: --help and --version.[1]

When a user enters a command with the --help option, the respective utility is supposed to “output brief documentation for how to invoke the program, on standard output, then exit successfully.”[2] In addition to that, utility authors are asked to “place lines giving the email address for bug reports, the package’s home page [...], and the general page for help using GNU programs”[3] “[n]ear the end of the ‘--help’ option’s output”[4]; and they may choose “to mention other appropriate mailing lists and web pages”[5] as well.

While it’s very much debatable whether the latter two are good ideas, a built-in quick overview of how to invoke a command correctly surely is a reasonable thing to have. However, the reality of help text built into GNU utilities is that even that part is often not exactly brief. For coreutils 8.30, the line count of what commands put on the screen when run with --help ranges between 10 and 168, the average being 36.[6] To give a few examples, the help text of GNU ls comprises 128 lines, making it the second largest in the collection; running cat --help displays 28 lines of text; and cp --help weighs in at 87. The main reason these line counts are often quite large is that GNU’s built-in help texts usually (if not always) duplicate the entire description of the respective command’s argument syntax from its manual page. Some (e.g., those of cat, chown, and date) even include usage examples.

I think that, for the most part, what GNU does here is quite the opposite of helpful, for several reasons.

When I start up the terminal emulator in my desktop environment, its standard window dimensions will provide room for 24 lines of text. When I maximize that window vertically, which I usually do for reading or writing larger portions of text (including code), but hardly ever for anything else, that will give me room for 56 lines. I could increase that number to 58 by switching off window decorations. This is at a configured default font size of 15 pixels. However, I almost always increase the font size in maximized terminal windows by two zoom levels (i.e., 4 pixels, in my case), leaving me with 45 lines top-to-bottom when window decorations are removed. 45 is also the value I get for TTY sessions in my setup, where I use the 12x24-pixel variant of the Terminus font.

As it turns out, even in the most generous scenario (58 lines of vertical space), about 15% of GNU’s core utilities will display help texts that don’t fit on my screen. Going with the much more appropriate limit of 45 lines increases that fraction to nearly 21%. On top of the aforementioned cp and ls, these 21% also include, among others, chown, date, dd, du, ln, sort, stat, tail, tr, and [. And that only reflects the situation for coreutils. As already mentioned above, the built-in help text of GNU tar (1.30) amounts to 378 lines, and those of GNU grep (3.3) and GNU xargs (from findutils) comprise 72 and 55 lines, respectively.
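A comparison like this can be approximated with a few lines of POSIX shell. The following is only a sketch under stated assumptions: count_too_long is a hypothetical helper, and the list of commands to check has to be supplied by the caller. It is not the procedure used for the statistics above.

```shell
# count_too_long LIMIT CMD...: report how many of the given commands emit
# more than LIMIT lines when run with --help.
count_too_long() {
    limit=$1
    shift
    total=0
    too_long=0
    for cmd in "$@"; do
        # The arithmetic expansion strips padding some wc versions add.
        lines=$(( $("$cmd" --help 2>/dev/null | wc -l) ))
        total=$((total + 1))
        if [ "$lines" -gt "$limit" ]; then
            too_long=$((too_long + 1))
        fi
    done
    echo "$too_long of $total"
}
```

Running, e.g., count_too_long 45 ls cp cat prints how many of these three commands exceed 45 lines. Passing "$(tput lines)" as the limit checks against the height of the current terminal instead.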

This means that running a GNU utility with the --help option will more or less often result in a wall of text being slammed onto the screen that requires a pager to make sensible use of. But if the whole thing needs to be (re-)run through a pager anyway, one might as well simply go right for the manual page.

And there’s another good reason to do that: Most of the time, the formatting of GNU’s built-in command help is way worse than it would have to be, even for fixed-width plain text. (Just try one of the utilities mentioned above and see for yourself.) On the other hand, manual pages will be presented nicely justified and with reasonable line-wrapping and word-splitting based on the available number of columns.[7] And because they’re done using a formatting language, manual pages allow for beyond-plain-text formatting, which enhances readability significantly.

The situation is better, but not generally different, for --version. For coreutils 8.30, running a command with this option will result in either 7 or 8 lines of output. This is because, besides the actual version information, GNU utilities’ --version output is supposed to include at least a copyright notice, a license statement, “a brief statement that the program is free software”[8], and a warranty disclaimer.[9] A list of the major authors may also be added[10], and virtually all commands in coreutils include it. On top of all that, the output of running a command with the --version option might include even more information about it. This is the case for, e.g., GNU find:

$ find --version
find (GNU findutils)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Eric B. Decker, James Youngman, and Kevin Dalley.

While these few lines will easily fit on a single screen, the problem with how GNU tools present version information should now be obvious: The signal-to-noise ratio is simply awful because all except the first line of a GNU utility’s --version output are quite certainly of no interest to a user wanting to know which version of that utility they are running. Everything but that first line should really just be kept in that utility’s manual page. And, in fact, GNU’s manual pages usually include most of this adjunct information already.
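As long as utilities print all of this, users can cut the output down on their end. Here is a minimal sketch in POSIX shell, assuming that the utility in question puts its name and version number on the first line of its --version output, as GNU tools do. The name short_version is made up for illustration.

```shell
# short_version CMD: print only the first line of CMD's --version output,
# i.e., the part that actually identifies the program and its version.
short_version() {
    "$1" --version 2>/dev/null | head -n 1
}
```

With this in place, short_version find would show just the first line of the output quoted above. For tools that arrange their version output differently, the result may be less useful.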

So, in essence, the way GNU does built-in command documentation combines duplicating excessive amounts of information from the standard documentation[11] with an output format that is generally way less suitable for presenting all this information, used in a way that is unnecessarily hard to handle. On top of that, both man and even info are less to type than --help or --version. (Though, admittedly, the latter would not be that much of an issue if they had gotten it right content-wise.) Sadly, quite a few non-GNU utilities (e.g., bzip2 and URxvt[12]) largely follow this pattern.

Now, I think it’s safe to assume that neither the GNU project as a whole nor the individual developers of GNU tools are deliberately trying to hurt the users of their software. So, there must at least be some perceived merit in implementing built-in command help the way they do it. However, asking about this on their official IRC channel revealed that the two main reasons it’s done this way are that, one, it’s common practice not only for GNU software, and, two, GNU’s manual pages are, in fact, often generated using a tool named help2man, which “produces simple manual pages from the ‘--help’ and ‘--version’ output of other commands.”[13]

Neither of these qualifies as a good reason. And the latter, of course, makes it a necessity to put lots of extraneous information into a command’s built-in documentation and is clearly a consequence of the GNU project discouraging the use of *roff manual pages – the standard form of documentation throughout the Unix-like operating system landscape[14] – in their Coding Standards.[15]

One argument I would have expected to hear is that putting a lot of information into built-in help text would enable the user to do without a manual page, should that be necessary. But then, what would be the actual merit in that? Except for specialized use cases, where there are serious resource constraints, the proper installation of a command-line utility should and will normally include a manual page or other documentation. And while manual pages are indeed heavier than built-in help, having large amounts of the latter certainly isn’t desirable in massively constrained environments either because it adds weight to the utilities.

So, in conclusion, the way GNU software implements built-in command documentation is, for the most part, an example of how not to do that. It does, however, incorporate two features that are pretty useful. The first one is that the user can request built-in documentation explicitly, which is, for example, not the case with BSD utilities. The second one is that GNU utilities’ built-in documentation is, by default, localized.

[1]

[2]

[3] Ibid.

[4] Ibid.

[5] Ibid.

[6] These numbers are based on obtaining the line counts of running all utilities from GNU coreutils 8.30, as packaged for Devuan Beowulf, with the --help option. This excludes coreutils, i.e., the all-in-one-binary variant of GNU’s core utilities, /bin/kill and /usr/bin/uptime, which are provided by procps-ng on this system, and /bin/hostname, which is part of Devuan’s net-tools package. Also excluded were dir and vdir, which are merely equivalents of running ls with a particular set of options, and /usr/bin/test, which is just an alternate way to invoke /usr/bin/[ (and, contrary to the latter, doesn’t provide help text by design). For a list of all tools involved, see coreutils_for_stats.txt. Whenever I make comments about GNU coreutils in the remainder of this text, this is what I’m referring to.

[7] Unfortunately, once a manual page has been rendered, the user is more or less stuck with whatever line width their terminal window or TTY screen allowed for when they ran the man command. Content will not be dynamically re-fit when the window or screen width changes, except for some primitive and not exactly helpful line-wrapping that some pagers may apply.

[8]

[9] Ibid.

[10] Ibid.

[11] The fact that the GNU project doesn’t recognize Unix manual pages as standard documentation and actively discourages their creation and use is deliberately being ignored here. The Info pages they aim to offer instead consist of nothing but fixed-width plain text with hyperlinks, which makes them absolutely inferior to classic manual pages in terms of formatting and thus readability.

[12] URxvt, bzip2, and probably others make things even worse by putting their help text out on standard error. First of all, this is conceptually wrong because when the user requests to see help text, that help text is regular command output and doesn’t qualify as either an error or a diagnostic message. The practical consequence of this is that to pipe the built-in help text of these tools to a pager, the user first has to redirect standard error to standard output, which is quite annoying.

[13]

[14] To be fair, only the man utility is part of POSIX, and its specification doesn’t really say much about what it’s supposed to do and how it’s supposed to do these things. Nevertheless, Unix-like systems in general, even Linux by itself, treat traditional Unix-style manual pages as the standard form of “online” documentation.

[15] Cf. and