Thoughts on built-in documentation for command-line utilities (Part I)
While the de facto standard form of documentation on Unix-like systems remains manual pages written in some incarnation of the roff formatting language, most command-line utilities in this day and age also provide one or another form of built-in documentation. And while larger software projects, such as GNU or the BSDs, each have their own more or less strictly defined rules or conventions on how to implement such documentation, it’s a complete do-as-you-please affair for those who develop software independently.
But, as nice as not being bound by any pre-defined rules may be, the absence of well-thought-out guidelines also makes it hard to use this freedom wisely. In fact, many command-line utilities offer built-in documentation in a way that makes it hard to use.
The purpose of this article is thus to give the question of how such documentation should be implemented careful consideration and, at the end, extract a set of guidelines on what makes a sensible implementation. To establish a reasonable basis for that, I will first try to define the actual purpose of built-in command documentation. Then I will take a look mainly at three different projects that each have their own distinct way of providing it (namely GNU, OpenBSD, and BusyBox), discuss their merits and drawbacks, and also briefly comment on a few independent utilities along the way.
As there’s quite a lot of material to be covered here, I have decided to split this article up into two parts. The first one, i.e., this text, will deal with the basic whats and whys of built-in documentation and look at how GNU implements it. The second part will then cover OpenBSD and BusyBox as well as the discussion of some best-practice guidelines.
What is it good for?
Having looked at this from a few different angles, I’d say that the purpose of built-in command documentation essentially is
a) to present that information about a command which is most immediately useful to someone wanting to run it
b) to present this information in a very concise form
c) to make it more readily accessible than the standard documentation
These things obviously overlap to a considerable extent, which means that, on the one hand, a lot of synergistic effect is to be had by getting them all right. On the other hand, the usefulness of built-in documentation is easily hurt by getting one of them wrong.
Now, point a) offers only a very loose description of what information should actually go into a command’s built-in documentation. My take on this is that, in order to best suit its purpose, built-in documentation should include no more than a short overview of how to invoke the respective command correctly and, in some contexts, concise version information, i.e., the actual program’s name and a version number.
The reason version information should contain a command’s actual program
name is that program and invocation names can differ, for a couple of reasons.
To give just one example, when I run pg --version on my Devuan
system, I will be told that pg is equivalent to less 487 because
I have pg set up as an alias for less.
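As a rough sketch of what such minimal built-in documentation could look like in a shell script: the utility name frob, its version number, and the option letters -h and -V are all made up for illustration, and this is only one of several reasonable ways to do it. The point it shows is that the version line uses a fixed program name, whereas the usage line reflects whatever name the script was invoked under.

```shell
#!/bin/sh
# Hypothetical utility "frob"; name, version, and options are
# assumptions made for this example only.
PROGRAM=frob
VERSION=0.1

usage() {
    # Short invocation overview, using the name the script was called by.
    printf 'usage: %s [-hV] file ...\n' "${0##*/}"
}

version() {
    # Always the actual program name, never $0, so that a symlink or
    # wrapper still reports what is really being run.
    printf '%s %s\n' "$PROGRAM" "$VERSION"
}

main() {
    case ${1-} in
        -h) usage ;;
        -V) version ;;
        *)  usage >&2; return 2 ;;
    esac
}
```

A real script would end with a call such as main "$@"; it is left out here so that the functions can be tried in isolation.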
The other point in the above list that deserves some clarification is c).
So, what does it actually mean to make built-in documentation more readily
accessible than the standard documentation? It means making it accessible in
less time and with less effort. I.e., if the way to access standard
documentation is running man <command>, the user should
ideally have to type less than that to access built-in documentation. In
addition to that, built-in documentation shouldn’t take longer to appear on the
user’s screen than – in our case – a *roff manual page. This
is practically a non-issue because printing out plain text is, by nature, a
faster operation than rendering a *roff document for display. Also, in
this day and age, both displaying built-in documentation and having a manual
page rendered are usually split-second operations, so the user normally won’t
perceive a significant difference anyway. Still, some command-line tools fail
spectacularly at this. For instance, youtube-dl, run with its
--help option, takes roughly one and a half seconds to put 428
lines of plain text on the screen on my system. For comparison, running
tar --help on that same system emits 378 lines in about 0.005
seconds. Needless to say, both of these help texts are excessively long, but
that’s a different issue.
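For anyone wanting to reproduce this kind of measurement, here is a minimal sketch. The helper name help_lines is hypothetical. It merges standard error into standard output because some tools print their help there, which means the count can be inflated if a tool also emits warnings.

```shell
# help_lines CMD [ARG...]: print the number of lines CMD writes.
help_lines() {
    # stdin is closed off so that CMD cannot block waiting for input;
    # tr strips the padding some wc implementations add.
    "$@" 2>&1 </dev/null | wc -l | tr -d ' '
}
```

It would be called as help_lines tar --help, for example. Timing can be checked separately with time tar --help >/dev/null, keeping in mind that sending the output to /dev/null leaves out whatever time the terminal needs to draw the text.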
The most important takeaway here is that the reason to have built-in documentation is for a user to be able to find some selected information about a command much more quickly than they could by looking it up in the respective manual page. Consequently, this kind of documentation is not to be treated as a substitute for manual pages, which are meant to provide the user with a comprehensive reference manual.
The GNU way
One thing that distinguishes GNU command-line utilities from the standard
utilities found on Unix-like systems not using the GNU toolbox is their
provision of double-dashed long options. As per the GNU Coding Standards,
command-line utilities are required to provide at least two such options:
--help and
--version.[1]
When a user enters a command with the --help option, the
respective utility is supposed to “output brief documentation for how to
invoke the program, on standard output, then exit
successfully.”[2] In addition to
that, utility authors are asked to “place lines giving the email address for
bug reports, the package’s home page [...], and the general page for help using
GNU programs”[3] “[n]ear the end
of the ‘--help’ option’s
output”[4]; and they may choose
“to mention other appropriate mailing lists and web
pages”[5] as well.
While it’s very much debatable whether the latter two are good ideas, a
built-in quick overview of how to invoke a command correctly surely is a
reasonable thing to have. However, the reality of help text built into GNU
utilities is that even that part is often not exactly brief. For
coreutils 8.30, the line count of what commands put on the screen when
run with --help ranges between 10 and 168, the average being
36.[6] To give a few examples,
the help text of GNU ls comprises 128 lines, making it the second
largest in the collection, running cat --help displays 28 lines of
text, and cp --help weighs in at 87. The main reason these line
counts are often quite large is that GNU’s built-in help texts usually (if not
always) duplicate the entire description of the respective command’s argument
syntax from its manual page. Some (e.g., those of cat, chown, and
date) even include usage examples.
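Figures like these can be gathered with a few lines of shell. The following is only a sketch of one possible approach, not necessarily how the numbers above were produced. The function name help_stats is hypothetical, and it assumes a list of utility names, one per line, on standard input.

```shell
# help_stats: read utility names (one per line) on stdin and print the
# minimum, maximum, and mean line count of their --help output.
# Help text written to standard error is not counted here.
help_stats() {
    while IFS= read -r util; do
        [ -n "$util" ] || continue
        # </dev/null keeps the utility from eating the list of names.
        "$util" --help 2>/dev/null </dev/null | wc -l
    done | awk '
        NR == 1 || $1 < min { min = $1 }
        NR == 1 || $1 > max { max = $1 }
        { sum += $1 }
        END { if (NR) printf "min=%d max=%d avg=%.0f\n", min, max, sum / NR }
    '
}
```

Given a file with one utility name per line, the call would be help_stats < FILE.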
I think that, for the most part, what GNU does here is quite the opposite of helpful, for several reasons.
When I start up the terminal emulator in my desktop environment, its standard window dimensions will provide room for 24 lines of text. When I maximize that window vertically, which I usually do for reading or writing larger portions of text (including code), but hardly ever for anything else, that will give me room for 56 lines. I could increase that number to 58 by switching off window decorations. This is at a configured default font size of 15 pixels. However, I almost always increase the font size in maximized terminal windows by two zoom levels (i.e., 4 pixels, in my case), leaving me with 45 lines top-to-bottom when window decorations are removed. 45 is also the value I get for TTY sessions in my setup, where I use the 12x24-pixel variant of the Terminus font.
As it turns out, even in the most generous scenario (58 lines of vertical space), about 15% of GNU’s core utilities will display help texts that don’t fit on my screen. Going with the much more appropriate limit of 45 lines increases that fraction to nearly 21%. On top of the aforementioned cp and ls, these 21% also include, among others, chown, date, dd, du, ln, sort, stat, tail, tr, and [. And that only reflects the situation for coreutils. As already mentioned above, the built-in help text of GNU tar (1.30) amounts to 378 lines, and those of GNU grep (3.3) and GNU xargs (from findutils 4.6.0.225-235f) comprise 72 and 55 lines, respectively.
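To check which help texts overflow a given screen height, something along the following lines should do. Again, this is a sketch under assumptions: overlong_help is a made-up name, and the list of utilities is expected on standard input, one name per line.

```shell
# overlong_help LIMIT: read utility names on stdin and print each one
# whose --help output is longer than LIMIT lines, with its line count.
overlong_help() {
    limit=$1
    while IFS= read -r util; do
        [ -n "$util" ] || continue
        n=$("$util" --help 2>/dev/null </dev/null | wc -l | tr -d ' ')
        if [ "$n" -gt "$limit" ]; then
            printf '%s %s\n' "$util" "$n"
        fi
    done
}
```

For the scenarios described above, LIMIT would be 58 or 45. In an interactive session, tput lines reports the current terminal height and could be used in its place.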
This means that running a GNU utility with the --help option
will more or less often result in a wall of text being slammed onto the screen
that requires a pager to make sensible use of. But if the whole thing needs to
be (re-)run through a pager anyway, one might as well simply go right for the
manual page.
And there’s another good reason to do that: Most of the time, the formatting of GNU’s built-in command help is way worse than it would have to be, even for fixed-width plain text. (Just try one of the utilities mentioned above and see for yourself.) On the other hand, manual pages will be presented nicely justified and with reasonable line-wrapping and word-splitting based on the available number of columns.[7] And because they’re done using a formatting language, manual pages allow for beyond-plain-text formatting, which enhances readability significantly.
The situation is better, but not generally different, for
--version. For coreutils 8.30, running a command with this
option will result in either 7 or 8 lines of output. This is because, besides
the actual version information, GNU utilities’ --version output is
supposed to include at least a copyright notice, a license statement, “a brief
statement that the program is free
software”[8], and a warranty
disclaimer.[9] A list of the
major authors may also be
added[10], and virtually all
commands in coreutils include it. On top of all that, the output of
running a command with the --version option might include even
more information about it. This is the case for, e.g., GNU find:
$ find --version
find (GNU findutils) 4.6.0.225-235f
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION FTS(FTS_CWDFD)
CBO(level=2)
While these few lines will easily fit on a single screen, the problem with
how GNU tools present version information should now be obvious: The
signal-to-noise ratio is simply awful because all except the first line of a
GNU utility’s --version output are quite certainly of no interest
to a user wanting to know which version of that utility they are running.
Everything but that first line should really just be kept in that utility’s
manual page. And, in fact, GNU’s manual pages usually include most of this
adjunct information already.
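Until that changes, a user who only wants the version can cut the output down themselves. A small sketch, with short_version being a hypothetical helper name, and relying on the version being on the first line as in the find output above:

```shell
# short_version CMD: print only the first line of CMD's --version output.
short_version() {
    "$@" --version 2>/dev/null | head -n 1
}
```

Running short_version find would then print just the line naming the program and its version.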
So, in essence, the way GNU does built-in command documentation combines
duplicating excessive amounts of information from the standard
documentation[11] with an
output format that is generally way less suitable for presenting all this
information, used in a way that is unnecessarily hard to handle. On top of
that, man and even info are less to type than
--help or --version. (Though, admittedly, the latter
would not be that much of an issue if they had gotten it right content-wise.)
Sadly, quite a few non-GNU utilities (e.g., bzip2 and
URxvt[12]) largely follow
this pattern.
Now, I think it’s safe to assume that neither the GNU project as a whole nor
the individual developers of GNU tools are deliberately trying to hurt the
users of their software. So, there must at least be some conceived merit in
implementing built-in command help the way they do it. However, asking about
this on their official IRC channel revealed that the two main reasons it’s done
this way are that, one, it’s common practice not only for GNU software, and,
two, GNU’s manual pages are, in fact, often generated using a tool named
help2man, which “produces simple manual pages from the
‘--help’ and ‘--version’ output of other
commands.”[13]
Neither of these qualifies as a good reason. And the latter, of course, makes it a necessity to put lots of extraneous information into a command’s built-in documentation and is clearly a consequence of the GNU project discouraging the use of *roff manual pages – the standard form of documentation throughout the Unix-like operating system landscape[14] – in their Coding Standards.[15]
One argument I would have expected to hear is that putting a lot of information into built-in help text would enable the user to do without a manual page, should that be necessary. But then, what would be the actual merit in that? Except for specialized use cases, where there are serious resource constraints, the proper installation of a command-line utility should and will normally include a manual page or other documentation. And while manual pages are indeed heavier than built-in help, having large amounts of the latter certainly isn’t desirable either in massively constrained environments because it adds weight to the utilities.
So, in conclusion, the way GNU software implements built-in command documentation is, for the most part, an example of how not to do that. It does, however, incorporate two features that are pretty useful. The first one is that the user can request built-in documentation explicitly, which is, for example, not the case with BSD utilities. The second one is that GNU utilities’ built-in documentation is, by default, localized.
[1] https://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html#Command_002dLine-Interfaces
[2] https://www.gnu.org/prep/standards/html_node/_002d_002dhelp.html#g_t_002d_002dhelp
[3] Ibid.
[4] Ibid.
[5] Ibid.
[6] These numbers are based on obtaining the line counts of
running all utilities from GNU coreutils 8.30, as packaged for Devuan
Beowulf, with the --help option. This excludes coreutils,
i.e., the all-in-one-binary variant of GNU’s core utilities, /bin/kill
and /usr/bin/uptime, which are provided by procps-ng on this
system, and /bin/hostname, which is part of Devuan’s net-tools
package. Also excluded were dir and vdir, which are merely
equivalents of running ls with a particular set of options, and
/usr/bin/test, which is just an alternate way to invoke
/usr/bin/[ (and, contrary to the latter, doesn’t provide help text by
design). For a list of all tools involved, see
coreutils_for_stats.txt.
Whenever I make comments about GNU coreutils in the remainder of this text,
this is what I’m referring to.
[7] Unfortunately, once a manual page has been rendered, the
user is more or less stuck with whatever line width their terminal window or
TTY screen allowed for when they ran the man command. Content will
not be dynamically re-fit when the window or screen width changes, except for
some primitive and not exactly helpful line-wrapping that some pagers may apply.
[8] https://www.gnu.org/prep/standards/html_node/_002d_002dversion.html
[9] Ibid.
[10] Ibid.
[11] The fact that the GNU project doesn’t recognize Unix manual pages as standard documentation and actively discourages their creation and use is deliberately being ignored here. The Info pages they aim to offer instead consist of nothing but fixed-width plain text with hyperlinks, which makes them absolutely inferior to classic manual pages in terms of formatting and thus readability.
[12] URxvt, bzip2, and probably others make things even worse by putting their help text out on standard error. First of all, this is conceptually wrong because when the user requests to see help text, that help text is regular command output and doesn’t classify as either an error or diagnostic message. The practical consequence of this is that to pipe the built-in help text of these tools to a pager, the user first has to redirect standard error to standard output, which is quite annoying.
[13] https://www.gnu.org/software/help2man/
[14] To be fair, only the man utility is part of POSIX, and its specification doesn’t really say much about what it’s supposed to do and how it’s supposed to do these things. Nevertheless, Unix-like systems in general, even Linux by itself, treat traditional Unix-style manual pages as the standard form of “online” documentation.
[15] Cf. https://www.gnu.org/prep/standards/html_node/GNU-Manuals.html#GNU-Manuals and https://www.gnu.org/prep/standards/html_node/Man-Pages.html#Man-Pages.