No fun with auto-paged ls
A tale of futile optimization.
Something that had recently begun to annoy me when using the command line was that more often than not, when I'd list files with ls, I'd have to re-run the command right after that, this time piping its output to a pager, because the listing didn't fit into the terminal window. As I often prefer to see one file per line instead of the multi-column output ls provides by default, this quickly became a nuisance. So, I set out to find a clean and simple way of having the output of ls piped to a pager automatically whenever it would exceed the number of lines available in a terminal window or at a TTY.
It turned out to be a futile endeavor for reasons that became apparent when I put what I had initially perceived as a viable solution to the test of actual day-to-day command-line usage. But the attempt wasn't entirely without reward. It let me experience firsthand how trying to make something work more efficiently can backfire and break things, it taught me how not to use shell command aliases and why, and it made me aware of the powers of less, the de facto standard pager for Unix-like systems.[1]
But let's see how it all went from the start.
Finding an approach that seems right
My initial idea was to have the output of ls piped to a pager only when that is really necessary, i.e., only when the output really doesn't fit into the terminal window or on the screen at a TTY. That, however, turned out to be a bit too clever to be useful, simply because the available number of lines is not static: It can change from one command to the next. A TTY screen may be split up into multiple virtual screens of different sizes by a terminal multiplexer such as tmux, and X windows may be resized at any time. Thus, implementing that kind of behavior would require a) determining the number of lines available for unpaged output, b) finding out how many lines ls will put out, c) comparing the two, and d) using a pager if the latter is greater than the former. And all of that every time ls is run.
Sure, this can all be implemented in a shell function of just a few lines. Determining the number of available lines, be it on the TTY or inside an X terminal window, is as easy as running tput lines. Depending on your hardware, there might not even be a noticeable difference in execution speed compared to running ls directly. The one thing, however, that really made this approach appear wrong to me was that the only way to reliably pre-count the lines ls will put out is to run it and count them. This means that to use a pager conditionally, ls will always have to be run twice to display the contents of a directory. That surely is a pattern to avoid.
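For illustration, steps a) to d) might look roughly like the following POSIX shell function. The name lscond and the fallback of 24 lines for when tput cannot report a usable value are assumptions of this sketch. It deliberately shows the pattern criticized above: ls runs once just to be counted and then a second time for display. Note also that the count is only accurate for one-entry-per-line listings, because ls prints one entry per line when its output goes to a pipe but may use columns on a terminal.

```shell
# Hypothetical conditional pager for ls (not the lsmore function discussed below).
lscond() {
    # a) Lines available for unpaged output; assume 24 if tput gives nothing usable.
    avail=$(tput lines 2>/dev/null)
    [ "$avail" -gt 0 ] 2>/dev/null || avail=24
    # b) First run of ls, only to count its lines (one entry per line when piped).
    #    Arithmetic expansion strips any padding that wc may add.
    needed=$(( $(ls "$@" | wc -l) ))
    # c) + d) Compare, then run ls a second time, paged only if necessary.
    if [ "$needed" -gt "$avail" ]; then
        ls "$@" | less
    else
        ls "$@"
    fi
}
```

Running lscond -l some_dir would then page the listing only if it is longer than the window, at the price of listing the directory twice.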
The other, blunter approach is to just always use a pager. And after reading some things on the Web,[2] I came up with a potential solution doing exactly that. It involved a simple shell function as well as a few command aliases and mainly relied on features offered by the remarkably versatile less pager.
The code looked like this:
lsmore() {
    ls "$@" | LESS= less -eFiRSX
}
alias l1='lsmore --color=always -p'
alias la='lsmore --color=always -p -A'
alias ll='lsmore --color=always -p -l -h'
alias lla='lsmore --color=always -p -A -l -h'
Now, let's take that apart.
Why aliases won't do
The first thing to explain about this humble script is why it involves a function and is not merely a bunch of command aliases. The answer is simple: It keeps the arguments ls is invoked with inside lsmore up to the user instead of having a particular set of them hard-coded into the command call. "$@" will be replaced by all the arguments the function was called with, each expanded as a separate word, even if it contains spaces. This cannot be achieved simply by using aliases because an alias would have to include piping to less already, meaning that any argument passed to that alias would be interpreted as an argument to less instead of ls.
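The difference between forwarding "$@" and, for example, an unquoted $* is easy to see with a small experiment. The functions show_args, forward_at and forward_star are hypothetical helpers made up for this sketch.

```shell
# Print every argument received, one per line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Forwards its arguments unchanged, as lsmore does with "$@".
forward_at() { show_args "$@"; }

# Forwards them unquoted: they get re-split on whitespace (and are subject to globbing).
forward_star() { show_args $*; }

forward_at -l 'my dir'     # [-l] then [my dir]
forward_star -l 'my dir'   # [-l] then [my] then [dir]
```

Only the quoted "$@" form hands a directory name such as my dir to ls as one argument.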
The following code example gives a taste of the undesired results this would cause. (Don't worry about the options passed to less at this point. They will be explained later and are, in part, merely used here to aid the demonstration. If you're curious, read the manual.)
$ ls
a_dir/  a_file
$ alias ll='ls -l | less -eFiRSX'
$ ll
total 8
drwxr-xr-x 2 msi msi 4096 Jul 14 22:04 a_dir/
-rw-r--r-- 1 msi msi   19 Jul 14 22:05 a_file
$ ll -h
Value is required after -h (--max-back-scroll)
$ ll a_dir/
a_dir/ is a directory
$ ll a_file
Hello, I'm a file.
$
Without any arguments, the alias works just fine. But as soon as arguments are added, its fundamental flaw starts to show: Running ll -h appends -h as an option to less, making it return an error because -h is supposed to set “a maximum number of lines to scroll backward” (says the manual page) and thus requires a number as an argument. Next, specifying the directory name a_dir as an argument results in invoking less on a directory, which won't work because less is a pager, i.e., meant “for paging through text [files] one screenful at a time” (more(1)), not a tool to list files in a directory. Finally, calling ll with the name of the regular file a_file as an operand will display the file's content because it translates to running less on that file.
You might wonder why the output of the ls -l part of the alias isn't visible in the latter two cases. That's simply because less gives file operands precedence over piped input.[3] (The more command behaves differently, on GNU/Linux at least.)
A few more words on aliases
There is a basic rule for shell scripting that says: “Variables hold data. Functions hold code. Don't put code inside variables!”[4] A similar rule can be applied to command aliases versus shell functions: Aliases should contain only simple commands, meaning nothing more than a single command call with the desired options and without any operands. And even those options should be kept to a carefully selected minimum. Everything else is prone to breaking part of your Unix toolbox, as demonstrated above.
Additionally, it is not necessarily a good idea to use the name of a standard – or de facto standard – command as an alias. Doing that is equivalent to replacing that command in your shell environment with whatever you put after the equal sign in the alias definition, i.e., with something that may diverge from the standard behavior of that command considerably. The change will only apply to interactive shell usage and not to scripts, but that may still be quite problematic, for two reasons: First, if you work with aliases that introduce non-standard behavior under standard command names on a daily basis, you'll get used to that non-standard behavior and run a high risk of accidentally expecting it elsewhere until you hit a possibly painful reminder. For comparison: You probably wouldn't go about rebinding all of Vim's command keys to your liking, because that would not only reduce your ability to work efficiently with a standard Vim; you might also accidentally trash your work because some custom keybinding you're used to conflicts with Vim's defaults in a particularly dangerous way. Second, the only two ways to regain a command's standard behavior when you need it would be either to temporarily remove the alias (using unalias) or to invoke that command with the full path to the executable, which you might need to look up first. Neither is exactly convenient.
Also, while I generally wouldn't recommend chaining together several commands in an alias definition in the first place, it is a particularly bad idea to do that under the name of a standard command. Unix tools are traditionally designed to ‘do one thing and do it well’. For example, listing files within a directory is a distinct task from paging. Thus, there are two different tools for these tasks: ls and less (or more, or most). Cramming both of them into an alias definition under the name ls completely breaks that concept. (It will become clear below that this also holds for using a function to combine commands.)
That said, there are harmless and even useful examples, too. For instance, I use cal as an alias of ncal -w simply because I prefer a different output format. But it's essentially still just running cal. So, if I run cal on a system that doesn't have this alias, I will still get meaningful output and not cause any damage. I even use aliases to make cp, mv and rm ask for confirmation before overwriting or deleting files when I'm doing things as root. That seems to be somewhat controversial, but I prefer it that way.
Nonetheless, the Bash Reference Manual makes a fair point when it says: “For almost every purpose, shell functions are preferred over aliases.”[5]
Apropos shell functions
Just as with aliases, function names should generally be chosen in such a way that they don't conflict with commands that already exist on the system and are in the shell's search path because the names of user-defined shell functions will take precedence over those commands. For example, consider the following:
nano() {
    echo "Use vi."
}
Defining this function as part of your shell environment would, unsurprisingly, lead to being told to use vi every time you run nano, as long as you don't provide the full path to the nano executable.
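Before picking a name for a new function or alias, it may be worth checking whether the shell already knows it. The POSIX command -v utility reports aliases, functions, built-ins, reserved words and executables found in PATH alike, so a check could look like this sketch, in which name_is_free is a hypothetical helper.

```shell
# Hypothetical helper: succeeds only if NAME is not yet an alias, function,
# built-in, reserved word or command found in PATH.
name_is_free() {
    ! command -v "$1" >/dev/null 2>&1
}

for name in lsmore nano pg; do
    if name_is_free "$name"; then
        printf '%s: free\n' "$name"
    else
        printf '%s: already taken\n' "$name"
    fi
done
```

The result naturally differs from system to system, which is rather the point of checking.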
Function names should, of course, also be somewhat meaningful, i.e., convey a sense of what a function does. Where all a function does is pipe the output of ls to a pager, the name lsmore does a pretty good job at that. It would have been more exact to call it lsless. I just happened to dislike that as a name, so I chose the closest equivalent.
By the way, a side effect of the lsmore function being very simple is that it is also POSIX-compliant and would therefore run in any POSIX-compatible shell, such as Bash, DASH, KornShell or Z Shell, as long as less is available.
The problem with using less unconditionally
Piping the output of ls to less unconditionally poses a few problems. Ironically, solving them is mostly a matter of bringing the behavior of less closer to that of more, the old standard Unix pager.
The first problem is that, by default, less doesn't exit automatically when it reaches the end of a file. This means that running the following, simplified version of lsmore will always land the user inside a captive interface that will need to be exited manually.
lsmore() {
    ls "$@" | less
}
The whole point of writing lsmore, however, was to increase the efficiency of working on the command line. And effectively creating a variant of ls that the user will always have to exit before being able to continue working in the shell really isn't serving that purpose. Whenever the listing of files fits the window or screen and paging would thus not have been necessary in the first place, the effect of running lsmore should, instead, be indistinguishable from that of running ls directly. In such a case, the pager should add no noise and exit seamlessly.
Fortunately, making less behave this way is easily accomplished by adding the options -F and -X to the command call.
As the manual page explains, -F “causes less to automatically exit if the entire file can be displayed on the first screen.” The problem with that is that less will normally also clear the terminal of its output when it exits. -X prevents that.
While calling less with -FX gets the two major annoyances of running it unconditionally out of the way, hitting q to exit it will still be required whenever the output of ls actually has to be paged. A better solution would be to have less exit automatically when the end of the file list is reached. This is exactly how more behaves. However, this behavior has a notable drawback: If the pager exits immediately at the end of a file, you can't use it to go backwards from the last line. And that in some way defeats the purpose of using a pager.
So, it would be nice if there were some kind of double bottom to prevent that. Having to hit q to leave the pager already provides one.
But there's another, maybe even better solution. less has an option -e that will make it exit whenever it reaches end-of-file two times in a row. This means that when the user has paged or scrolled down to the last line of a file, hitting any key that makes less move forward, such as “cursor down”, “page down” or Enter, will be enough to leave the pager (with the exception of the End key). The downside is that this option is rendered useless if the user scrolls or pages by holding down the respective key the whole time, because less will then hit end-of-file twice immediately after the last line and exit.
Another problem with piping the output of ls to less is that the colorization of that output will normally be lost. Preserving it involves two steps. The first one is to make sure ls will always color its output, i.e., even when it sits at the left end of a pipe. This is why all the aliases in the first code example contain --color=always, which will be passed to the ls command inside lsmore. The second one is to make less output ANSI color escape sequences in raw form instead of displaying them as control characters, which is easily achieved by adding the option -R to the command call.
Additional tweaks
In the final version of lsmore shown in the first code example, less was given two additional options that haven't been discussed yet: -i and -S. -i will make less ignore the case of search terms as long as they are all lowercase. -S will cause it to truncate long lines instead of wrapping them. But you'll still be able to read the truncated parts of a line by scrolling sideways.
Another thing to explain about that code is what LESS= directly before the command call is for. LESS is an environment variable read by the less program each time it is run. It is meant to contain all options that should be passed to the pager automatically. For example, if you wanted less to always number lines, you could simply put export LESS="-N" into ~/.profile. Saying LESS= before running less sets the LESS variable to an empty string for that one command only, i.e., it makes sure the program is really only run with the options specified after the command name.
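The scoping of such a prefix assignment can be checked without involving less at all. The following sketch assumes a LESS value of -N, as in the ~/.profile example above, and uses sh -c merely as a stand-in for any program that reads the variable.

```shell
# Default options for less, as they might be exported from ~/.profile.
export LESS='-N'

# The prefix assignment applies to this one command only: the child sees an empty LESS.
LESS= sh -c 'printf "child sees: [%s]\n" "$LESS"'     # child sees: []

# The shell's own value is left untouched.
printf 'shell still has: [%s]\n' "$LESS"              # shell still has: [-N]
```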
Some sensible aliases
Now, lsmore is a comparatively long command name and thus rather inconvenient to type every time, even where auto-completion is available. (It's not distinct from lsmod until the fifth letter, for example.) So, it's a good idea to create some sensible aliases as shortcuts to calling that function with the most common options for ls.
As I'm using the command line quite a lot, I already had a bunch of aliases for the most common ls command variants in place, looking like this:
alias l1='ls --color=auto -p -1'
alias la='ls --color=auto -p -A'
alias ll='ls --color=auto -p -l -h'
alias lla='ls --color=auto -p -A -l -h'
So, I simply replaced ls with lsmore and then made a few adjustments:
alias l1='lsmore --color=always -p'
alias la='lsmore --color=always -p -A'
alias ll='lsmore --color=always -p -l -h'
alias lla='lsmore --color=always -p -A -l -h'
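As mentioned above, --color=always is needed for preserving colored output in less. The only other change made here is that I omitted -1 from the options in the l1 alias, simply because the output of lsmore will only show one file per line anyway: Inside lsmore, the output of ls goes to a pipe rather than a terminal, in which case ls lists one file per line by default.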
The catch
Depending on your level of shell usage and scripting experience, this may look pretty good. And indeed: It's easy to understand, small, portable and somewhat flexible. But: It's not a viable solution. Let's see why.
The first flaw to point out here is that lsmore doesn't allow for inserting any additional commands between ls and less. In other words, using it means losing the ability to pipe the output of ls to anything else before it is handed over to the pager. So, if, for example, you wanted to have a file list with line numbers and ran ll | nl, you would get numbered lines, but the output would not be paged because nl would work on the output of less rather than that of ls.
And while less can number lines by itself, lsmore doesn't provide any way of adding the respective option to the command call, because the function's arguments are all handed to ls, not to less. That's the second flaw: Options to less are hard-coded. So much for the above-mentioned flexibility of this approach.
The third problem is with output colorization. Preserving colored output required switching colorization on unconditionally and thus overriding ls's automatic mode, which would turn colorization off whenever the command's output is piped to another program or redirected to a file. This means that when the output of lsmore is redirected to a file, that file will contain ANSI color escape sequences if the output of ls contains anything that would normally be colored (directories, for example). In other words, there will likely be junk text in such a file.
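The effect of --color=always can be reproduced with ls alone. The sketch below assumes GNU ls (for the --color option) with its built-in default colors, under which directories are colored; the scratch paths are made up.

```shell
# Scratch directory containing a subdirectory, which GNU ls colors by default.
tmp=$(mktemp -d)
mkdir "$tmp/a_dir"

# Forced color, as in the aliases: escape sequences end up in the file.
ls --color=always "$tmp" > "$tmp.always"
# Automatic mode: ls notices that stdout is not a terminal and stays plain.
ls --color=auto "$tmp" > "$tmp.auto"

# Count lines containing an ESC character (grep -c prints 0 but fails on no match).
esc=$(printf '\033')
n_always=$(grep -c "$esc" "$tmp.always" || true)
n_auto=$(grep -c "$esc" "$tmp.auto" || true)
printf 'lines with escapes: always=%s, auto=%s\n' "$n_always" "$n_auto"

rm -r "$tmp" "$tmp.always" "$tmp.auto"
```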
So, as nice as auto-paging ls might seem at first, it really creates several considerably more dramatic problems than the one it was meant to solve. Compared to that, occasionally having to run ls | less after ls almost seems like a non-issue.
You could, of course, also simply use a file manager to avoid the trouble of having to pipe ls to a pager altogether. It doesn't have to be a graphical one. Midnight Commander comes to mind, but there is, in fact, quite a variety of terminal file managers to choose from these days.
But then, terminal emulators normally provide a searchable scrollback buffer that can usually be inflated beyond reason. Also, columnized output isn't that bad, actually. It does save a lot of space.
Last but not least, if I were looking for something in a directory containing lots of files, I'd just use grep to find it, unless I really weren't sure what I was actually looking for.
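In practice, that can be as simple as filtering the listing instead of paging through it. The scratch directory and file names below are made up for the sketch.

```shell
# Scratch directory with a few made-up files.
tmp=$(mktemp -d)
touch "$tmp/notes.txt" "$tmp/notes-old.txt" "$tmp/report.pdf"

# Only matching entries are printed, so there is nothing left to page through.
ls "$tmp" | grep 'notes'

rm -r "$tmp"
```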
Addendum, 2020-07-17
Ignoring aliases and function definitions in command executions
Speaking of alias and function definitions and how to bypass them, the above article claimed that there were but two ways to bypass an alias that carries the name of an existing command: either using unalias to temporarily undefine it, or providing the full path to the executable. The latter was claimed to be the only way to bypass shell functions.
This is not correct because POSIX-compliant shells provide the command built-in, which, among other things, can be used as a means to bypass alias and function definitions.
So, for example, running
command nano
would suffice to bypass both
alias nano=vi
and
nano() {
    echo "Use vi."
}
in the current shell environment.
The only thing to be aware of when using the command built-in like that is that the executable in question needs to be in the user’s PATH for it to work.
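This is easy to try out. To avoid depending on nano being installed, the sketch below shadows ls instead, a command that is certain to be in PATH; the function body is made up for the demonstration.

```shell
# Shadow the ls executable with a function for the current shell session.
ls() {
    echo "Use vi."
}

ls                # the function runs and prints "Use vi."
command ls -d /   # the function is skipped; the ls found in PATH prints "/"

# Remove the function again so that plain ls works as usual.
unset -f ls
```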
Using pg as an alias of less
Concerning auto-paging the output of ls, the article concluded that it’s best to stop worrying about that and either just keep piping to a pager as needed, or make use of a file manager or the terminal’s scrollback buffer instead. But then, piping to a pager can still be made a little more convenient.
For me, personally, piping ls to less is a bit problematic because I often happen to mistype ls for less and vice versa. To eliminate this problem, I simply declared more as an alias of less at first, since I have no reason to use more. But then, there is no reason to make that alias a four-letter word either.
While trying to come up with something shorter that would make sense, I remembered reading that Unix used to have another pager besides more that was called pg. As a bit of research and an ensuing discussion on the mailing list of The Unix Heritage Society revealed, pg (or rather: the most prominent of several pager implementations going by that name) was officially introduced with the second release of Unix System V (SVR2) in 1984.[6]
Excluded from POSIX since 2001,[7] the pg command is, as it seems, only still readily available in recent releases of one major open-source Unix-like system: illumos. Linux still ships it as part of the util-linux package, but “for backward compatibility only”,[8] and stopped having it built by default with the release of util-linux 2.29 in late 2016.[9] Free-, Net-, and OpenBSD don’t have it.[10]
Given the fact that pg is not part of POSIX, doesn’t exist in the base system of the major BSDs, and is a deprecated command not even built by default anymore on Linux, there’s really no reason not to use it as an alias for less. After all, it is a fundamentally better name for a pager than more, less, or most – the latter two not making any sense except for being clever puns resulting in command names worse than that of the standard utility they aim to improve upon.[11] By contrast, pg for “page(r)” is much like cp for “copy”, mv for “move”, ln for “link” etc., i.e., a reasonable abbreviation for a meaningful name. On English-language keyboards, PgUp and PgDown also seem to be common abbreviations for PageUp and PageDown already. Last but not least, pg is 50% shorter than less, and having to type just two instead of four letters to call a command that is executed manually quite often sure is an improvement.
So, my personal shell command aliases now include
alias more=less
alias pg=less
on any Unix-like system I run.
[1] Though not part of POSIX, less is usually available on Unix-like systems and the default pager for most Linux distributions as well as BSD systems. Fortunately, less is also one of those tools that have a single canonical implementation that basically everyone uses. Therefore, less will be less on virtually any system that ships it, the less version included with BusyBox being a notable exception.
[2] Things to mention here are a post on askubuntu.com that made me aware of why the solution must involve a function, as well as one or two other websites that turned my attention to the command-line options offered by less. Unfortunately, I didn't bookmark those.
[3] This is at the time of writing and might change in the future. I had a bit of e-mail conversation with Mark Nudelman in the course of writing this article and he told me that the question of how less would handle such a case had just never come up before because he wouldn't have expected anyone to both pipe data to less and provide a file name at the same time.
[4] https://mywiki.wooledge.org/BashFAQ/050. By the way, dismissing this idea as nonsensical on the grounds that “everything is code” – or “data” – and then going on about it, as I've had the pleasure to experience firsthand on IRC, is really beside the point. As debatable as the terminology is, it is both clear what is meant by “code” versus “data” in this particular context and easily understandable why that rule makes sense.
[5] https://www.gnu.org/software/bash/manual/html_node/Aliases.html
[6] The discussion on that mailing list revealed quite a bit more than that. However, the actual provenance of the SVR2 pg utility still remained unclear.
[7] Cf. The Open Group Base Specifications Issue 6/IEEE Std 1003.1, 2004 Edition, Rationale volume, chapter C.4 Utilities.
[8] https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/text-utils/pg.c?h=stable/v2.35
[9] Cf. Util-linux 2.29 Release Notes.
[10] In fact, these systems don’t even ship more and instead only provide less by default, which will be invoked in more compatibility mode whenever a user runs the more command because /usr/bin/more is merely a hard link to the less executable.
[11] An explanation of how more got its name is provided by its original author at https://danhalbert.org/more.html.