No fun with auto-paged `ls`. A tale of futile optimization.

2019-07-17, updated 2020-07-17

Something that had recently begun to annoy me when using the command line was that more often than not, when I'd list files with ls, I'd have to re-run the command right after that, but pipe its output to a pager this time, because the listing didn't fit into the terminal window. As I often prefer to see one file per line instead of the multi-column output ls provides by default, this quickly became a nuisance. So, I set out to find a clean and simple way of having the output of ls piped to a pager automatically whenever it would exceed the number of lines available in a terminal window or at a TTY.

It turned out to be a futile endeavor for reasons that came to show when I put what I had initially perceived as a viable solution to the test of actual day-to-day command-line usage. But the attempt wasn't all without reward. It let me experience first hand how trying to make something work more efficiently can backfire and break things, it taught me how not to use shell command aliases and why, and it made me aware of the powers of less, the de facto standard pager for Unix-like systems.^[1]

But let's see how it all went from the start.

Finding an approach that seems right

My initial idea was to have the output of ls piped to a pager only when that is really necessary, i.e., only when the output really doesn't fit into the terminal window or on the screen at a TTY. That, however, turned out to be a bit too clever to be useful simply because the available number of lines is not something static. It changes with every command that is entered. Additionally, a TTY screen may be split up into multiple virtual screens of different sizes by a terminal multiplexer such as tmux, and X windows may be resized at any time. Thus, implementing that kind of behavior would require: a) determining the number of lines available for unpaged output, b) finding out how many lines ls will put out, c) comparing the two and d) using a pager if the latter is greater than the former. And all of that every time ls is run.

Sure, this can all be implemented in a shell function of just a few lines. Determining the number of available lines, be it on the TTY or inside an X terminal window, is as easy as running tput lines. Depending on your hardware, there might not even be a noticeable difference in execution speed compared to running ls directly. The one thing, however, that really made this approach appear wrong to me was that the only way to reliably pre-count the lines ls will put out is to run it and count them. This means that to use a pager conditionally, ls will always have to be run twice to display the contents of a directory. That surely is a pattern to avoid.
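To make the cost of that pattern concrete, the conditional approach might look roughly like the following sketch. The function name lsifneeded and the fallback of 24 lines are made up for illustration; this is not code I ended up using, and it ignores the lines taken up by the prompt.

```shell
# Hypothetical sketch of the conditional approach, not part of the final solution.
lsifneeded() {
  # a) Number of lines available; fall back to 24 if tput cannot tell.
  avail=$(tput lines 2>/dev/null || echo 24)
  [ "$avail" -gt 0 ] 2>/dev/null || avail=24

  # b) First run of ls, only to count the lines it would print.
  count=$(( $(ls "$@" | wc -l) ))

  # c) + d) Second run of ls, paged only if the listing would not fit.
  if [ "$count" -gt "$avail" ]; then
    ls "$@" | less
  else
    ls "$@"
  fi
}
```

Note that ls appears twice in the function body, which is exactly the double execution described above.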

The other, more blunt approach is to just always use a pager. And after reading some things on the Web^[2], I came up with a potential solution doing exactly that. It involved a simple shell function as well as a few command aliases and mainly relied on features offered by the remarkably versatile less pager.

The code looked like this:

lsmore() {
  ls "$@" | LESS= less -eFiRSX
}

alias l1='lsmore --color=always -p'
alias la='lsmore --color=always -p -A'
alias ll='lsmore --color=always -p -l -h'
alias lla='lsmore --color=always -p -A -l -h'

Now, let's take that apart.

Why aliases won't do

The first thing to explain about this humble script is why it involves a function and is not merely a bunch of command aliases. The answer is simple: It keeps the arguments ls is invoked with inside lsmore up to the user instead of having a particular set of them hard-coded into the command call. "$@" will be replaced by all the arguments the function was called with, expanded as individual entities. This cannot be achieved simply by using aliases because an alias would have to include piping to less already, meaning that any argument passed to that alias would be interpreted as an argument to less instead of ls.
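What "expanded as individual entities" means can be seen in a small experiment. The three helper functions below are hypothetical and exist only for this demonstration; the counts in the comments are what sh-style shells such as Bash and dash print with the default IFS.

```shell
# Hypothetical helpers to show how "$@" preserves argument boundaries.
count_args()    { echo "$#"; }         # prints the number of arguments received
pass_quoted()   { count_args "$@"; }   # each argument is passed on as one word
pass_unquoted() { count_args $@; }     # arguments are split again at whitespace

pass_quoted   "a file" b    # prints 2
pass_unquoted "a file" b    # prints 3
```

This is why lsmore quotes "$@": a file name containing spaces reaches ls as a single operand.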

The following code example gives a taste of the undesired results that would cause. (Don't worry about the options passed to less at this point. They will be explained later and are, in part, merely used here to aid the demonstration. If you're curious, read the manual.)

$ ls
a_dir/  a_file
$ alias ll='ls -l | less -eFiRSX'
$ ll
total 8
drwxr-xr-x 2 msi msi 4096 Jul 14 22:04 a_dir/
-rw-r--r-- 1 msi msi   19 Jul 14 22:05 a_file
$ ll -h
Value is required after -h (--max-back-scroll)
$ ll a_dir/
a_dir/ is a directory
$ ll a_file
Hello, I'm a file.
$

Without any arguments, the alias works just fine. But as soon as arguments are added, its fundamental flaw starts to show: Running ll -h appends -h as an option to less, making it return an error because -h is supposed to set “a maximum number of lines to scroll backward” (says the manual page) and thus requires a number as an argument. Next, specifying the directory name a_dir as an argument results in invoking less on a directory, which won't work because less is a pager, i.e., meant “for paging through text [files] one screenful at a time” (more(1)), not a tool to list files in a directory. Finally, calling ll with the name of the regular file a_file as an operand will display the file's content because it translates to running less on that file.

You might wonder why the output of the ls -l part of the alias isn't visible in the latter two cases. That's simply because less gives file operands precedence over piped input.^[3] (The more command behaves differently, on GNU/Linux at least.)

A few more words on aliases

There is a basic rule for shell scripting that says: “Variables hold data. Functions hold code. Don't put code inside variables!”^[4] Sort of a similar rule can be applied to command aliases compared to shell functions: Aliases should contain only simple commands, meaning nothing more than a single command call with the desired options and without any operands. And even those options should be kept down to a carefully selected minimum. Everything else is prone to breaking part of your Unix toolbox, as has been demonstrated above.

Additionally, it is not necessarily a good idea to use the name of a standard – or de facto standard – command as an alias. Doing that is equivalent to replacing that command in your shell environment with whatever you put after the equal sign in the alias definition, i.e., by something that may diverge from the standard behavior of that command considerably. The change will only apply to interactive shell usage and not to scripts, but that may still be quite problematic, for two reasons: First, working with aliases that introduce non-standard behavior under standard command names on a daily basis, you'll get used to that non-standard behavior and run a high risk of accidentally expecting it elsewhere until you hit a possibly painful reminder. For comparison: You probably wouldn't go about rebinding all of Vim's command keys to your liking because that would not only reduce your ability to work efficiently with a standard Vim, you might also accidentally trash your work because some custom keybinding you're used to conflicts with Vim's defaults in a particularly dangerous way. Second, the only two ways to regain a command's standard behavior in cases you need it would either be to temporarily remove the alias (using unalias) or to invoke that command with the full path to the executable, which you might need to look up before. None of that is exactly convenient.

Also, while I generally wouldn't recommend chaining together several commands in an alias definition in the first place, it is a particularly bad idea to do that under the name of a standard command. Unix tools are traditionally designed to ‘do one thing and do it well’. For example, listing files within a directory is a distinct task from paging. Thus, there are two different tools for these tasks: ls and less (or more, or most). Cramming both of them into an alias definition under the name ls completely breaks that concept. (It will become clear below that this also holds for using a function to combine commands.)

That said, there are harmless and even useful examples, too. For instance, I use cal as an alias of ncal -w simply because I prefer a different output format. But it's essentially still just running cal. So, if I run cal on a system that doesn't have this alias, I will still get meaningful output and not cause any damage. I even use aliases to make cp, mv and rm ask for confirmation before overwriting or deleting files when I'm doing things as root. That seems to be somewhat controversial, but I prefer it that way.

Nonetheless, the Bash Reference Manual is making a fair point when it says: “For almost every purpose, shell functions are preferred over aliases.”^[5]

Apropos shell functions

Just as with aliases, function names should generally be chosen in such a way that they don't conflict with commands that already exist on the system and are in the shell's search path because the names of user-defined shell functions will take precedence over those commands. For example, consider the following:

nano() {
  echo "Use vi."
}

Defining this function as part of your shell environment would, unsurprisingly, lead to being told to use vi every time you run nano, as long as you don't provide the full path to the nano executable.

Function names should, of course, also be somewhat meaningful, i.e., convey a sense of what a function does. Where all a function does is pipe the output of ls to a pager, the name lsmore does a pretty good job at that. It would have been more exact to call it lsless. I just happened to dislike that as a name, so I chose the closest equivalent.

By the way, a side effect of the lsmore function being very simple is that it is also POSIX-compliant and would therefore run in any POSIX-compatible shell, such as Bash, DASH, KornShell or Z Shell, as long as less is available.

The problem with using `less` unconditionally

Piping the output of ls to less unconditionally poses a few problems. Ironically, solving them is mostly a matter of bringing the behavior of less closer to that of more, the old standard Unix pager.

The first problem is that, by default, less doesn't exit automatically when it reaches the end of a file. This means that running the following, simplified version of lsmore will always land the user inside a captive interface that will need to be exited manually.

lsmore() {
  ls "$@" | less
}

The whole point of writing lsmore, however, was to increase the efficiency of working on the command line. And effectively creating a variant of ls that the user will always have to exit before being able to continue working in the shell really isn't serving that purpose. Whenever the listing of files fits the window or screen and paging would thus not have been necessary in the first place, the effect of running lsmore should, instead, be indistinguishable from that of running ls directly. The pager should not add any noise in such a case and exit seamlessly.

Fortunately, making less behave this way is easily accomplished by adding the options -F and -X to the command call. As the manual page explains, -F “causes less to automatically exit if the entire file can be displayed on the first screen.” The problem with that is that less will normally also clear the terminal of its output when it exits. -X prevents that.

While calling less with -FX gets the major two annoyances of running it unconditionally out of the way, hitting q to exit it will still be required whenever the output of ls actually has to be paged. A better solution would be to have less exit automatically when the end of the file list is reached. This is exactly how more behaves. However, this behavior has a notable drawback: If the pager exits immediately at the end of a file, you can't use it to go backwards from the last line. And that in some way defeats the purpose of using a pager.

So, it would be nice to have some kind of safety net to prevent that. Having to hit q to leave the pager already provides one. But there's another, maybe even better solution. less has an option -e that will make it exit whenever it reaches end-of-file two times in a row. This means that when the user has paged or scrolled down to the last line of a file, hitting any key that makes less move forward, such as “cursor down”, “page down” or Enter, will be enough to leave the pager (with the exception of the END key). The downside is that this option is rendered useless if the user scrolls or pages holding down the respective key the whole time because it will make less hit end-of-file twice immediately after the last line and exit.

Another problem with piping the output of ls to less is that the colorization of that output will normally be lost. Preserving it involves two steps. The first one is to make sure ls will always color its output, i.e., even when it sits at the left end of a pipe. This is why all the aliases in the first code example contain --color=always, which will be passed to the ls command inside lsmore. The second one is to make less allow for ANSI color escape sequences in its output, which is easily achieved by adding the option -R to the command call.

Additional tweaks

In the final version of lsmore shown in the first code example, less was given two additional options that haven't been discussed yet: -i and -S. -i will make less ignore the case of search terms as long as they are all lowercase. -S will cause it to truncate long lines instead of wrapping them. But you'll still be able to read the truncated parts of a line by scrolling sideways.

Another thing to explain about that code is what LESS= directly before the command call is for. LESS is an environment variable read by the less program each time it is run. It is meant to contain all options that should be passed to the pager automatically. For example, if you wanted less to always number lines, you could simply put export LESS="-N" into ~/.profile. Saying LESS= before running less simply resets the LESS variable to being empty for the following command, i.e., it will make sure the program is really only run with the options specified after the command name.
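The effect of such a one-off assignment can be checked without involving less at all. In the following sketch, sh -c merely stands in for any child process, and the value -N is the example from above, not a recommendation.

```shell
LESS="-N"      # pretend this came from ~/.profile
export LESS

sh -c 'echo "child sees: [$LESS]"'          # prints: child sees: [-N]
LESS= sh -c 'echo "child sees: [$LESS]"'    # prints: child sees: []
echo "shell still has: [$LESS]"             # prints: shell still has: [-N]
```

The assignment in front of the command only affects the environment of that one command; the value in the calling shell stays untouched.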

Some sensible aliases

Now, lsmore is a comparatively long command name and thus rather inconvenient to type every time, even where auto-completion is available. (It's not distinct from lsmod until the fifth letter, for example.) So, it's a good idea to create some sensible aliases as shortcuts to calling that function with the most common options for ls.

As I'm using the command line quite a lot, I already had a bunch of aliases for the most common ls command variants in place, looking like this:

alias l1='ls --color=auto -p -1'
alias la='ls --color=auto -p -A'
alias ll='ls --color=auto -p -l -h'
alias lla='ls --color=auto -p -A -l -h'

So, I simply replaced ls with lsmore and then made a few adjustments:

alias l1='lsmore --color=always -p'
alias la='lsmore --color=always -p -A'
alias ll='lsmore --color=always -p -l -h'
alias lla='lsmore --color=always -p -A -l -h'

As mentioned above, --color=always is needed for preserving colored output in less. The only other change made here is that I omitted -1 from the options in the l1 alias simply because ls prints one file per line anyway whenever its output goes to a pipe instead of a terminal, as it always does inside lsmore.

The catch

Depending on your level of shell usage and scripting experience, this may be looking pretty good. And indeed: It's easy to understand, small, portable and somewhat flexible. But: It's not a viable solution. Let's see why.

The first flaw to point out here is that lsmore doesn't allow for inserting any additional commands between ls and less. In other words, using it means losing the ability to pipe the output of ls to anything else before it is handed over to the pager. So, if, for example, you wanted to have a file list with line numbers and ran ll | nl, you would get numbered lines, but the output would not be paged because nl would work on the output of less rather than that of ls.

And while less can number lines by itself, lsmore doesn't provide any way of adding the respective option to the command call because the function's argument array is handed to ls, not to less. That's the second flaw: Options to less are hard-coded. So much for the above-mentioned flexibility of this approach.

The third problem is with output colorization. Preserving colored output required switching colorization on unconditionally and thus overriding ls's automatic mode, which would turn colorization off whenever the command's output is piped to another program or redirected to a file. This means that when the output of lsmore is redirected to a file, that file will contain ANSI color escape sequences if the output of ls contains anything that would normally be colored (directories, for example). In other words, there will likely be junk text in such a file.
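The difference between the two color modes is easy to observe. The sketch below assumes GNU ls (the --color option is not part of POSIX); the temporary directory and the name a_dir are made up for the demonstration. Command substitution is used instead of a file here, but since neither is a terminal, ls treats both the same way.

```shell
# Assumes GNU ls. Directory names are for illustration only.
tmp=$(mktemp -d)
mkdir "$tmp/a_dir"

plain=$(ls --color=auto "$tmp")      # auto: no colors when not writing to a terminal
colored=$(ls --color=always "$tmp")  # always: escape sequences are emitted regardless

printf '%s\n' "$plain"   | cat -v    # prints: a_dir
printf '%s\n' "$colored" | cat -v    # shows escape sequences as ^[[... around a_dir

rm -r "$tmp"
```

Redirecting the second command to a file would store those escape sequences along with the file names.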

So, as nice as auto-paging ls might seem at first, it really creates several considerably more dramatic problems than the one it was meant to solve. Compared to that, occasionally having to run ls | less after ls almost seems like a non-issue.

You could, of course, also simply use a file manager to avoid the trouble of having to pipe ls to a pager altogether. It doesn't have to be a graphical one. Midnight Commander comes to mind, but there is, in fact, quite a variety of terminal file managers to choose from these days.

But then, terminal emulators normally provide a searchable scrollback buffer that can usually be inflated beyond reason. Also, columnized output isn't that bad, actually. It does save a lot of space.

Last but not least, if I were looking for something in a directory containing lots of files, I'd just use grep to find it unless I really wasn't sure what I was actually looking for.

Addendum, 2020-07-17

Ignoring aliases and function definitions in command executions

In discussing alias and function definitions and how to bypass them, the above article claimed that there were but two ways to bypass an alias that carries the name of an existing command: either using unalias to temporarily undefine it, or providing the full path to the executable. The latter was claimed to be the only way to bypass shell functions.

This is not correct because POSIX-compliant shells provide the command built-in, which, among other things, can be used as a means to bypass alias and function definitions.

So, for example, running

command nano

would suffice to override both

alias nano=vi

and

nano() {
  echo "Use vi."
}

in the current shell environment.

The only thing to be aware of when using the command built-in like that is that the executable in question needs to be in the user’s PATH for it to work.
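The same can be tried safely with a command that has no side effects. In this sketch, a hypothetical function shadows uname, which is assumed to be installed and in PATH, as it is on practically every Unix-like system.

```shell
# Hypothetical function shadowing a harmless standard command.
uname() {
  echo "shadowed"
}

via_function=$(uname)          # calls the shell function
via_command=$(command uname)   # skips the function and runs the real utility

echo "$via_function"           # prints: shadowed
echo "$via_command"            # prints the system name as reported by the real uname
```

Running unset -f uname afterwards removes the function again.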

Using `pg` as an alias of `less`

Concerning auto-paging the output of ls, the article concluded that it’s best to stop worrying about that and either just keep piping to a pager as needed, or make use of a file manager or the terminal’s scrollback buffer instead. But then, piping to a pager can still be made a little more convenient.

For me, personally, piping ls to less is a bit problematic because I often happen to mistype ls for less and vice versa. To eliminate this problem, I simply declared more as an alias of less, at first, since I have no reason to use more. But then, there is no reason to make that alias a four-letter word either.

While trying to come up with something shorter that would make sense, I remembered reading that Unix used to have another pager besides more that was called pg. As a bit of research and an ensuing discussion on the mailing list of The Unix Heritage Society revealed, pg (or rather: the most prominent of several pager implementations going by that name) was officially introduced with the second release of Unix System V (SVR2) in 1984.^[6]

Excluded from POSIX since 2001,^[7] the pg command is, as it seems, only still readily available in recent releases of one major open-source Unix-like system: illumos. Linux still ships it as part of the util-linux package, but “for backward compatibility only”,^[8] and stopped having it built by default with the release of util-linux 2.29 in late 2016.^[9] Free-, Net-, and OpenBSD don’t have it.^[10]

Given the fact that pg is not part of POSIX, doesn’t exist in the base system of the major BSDs, and is a deprecated command not even built by default anymore on Linux, there’s really no reason not to use it as an alias for less. After all, it is a fundamentally better name for a pager than more, less, or most – the latter two not making any sense except for being clever puns resulting in command names worse than that of the standard utility they aim to improve upon.^[11] Instead, pg for “page(r)” is much like cp for “copy”, mv for “move”, ln for “link” etc., i.e., a reasonable abbreviation for a meaningful name. On English-language keyboards, PgUp and PgDown also seem to be common abbreviations for PageUp and PageDown already. Last but not least, pg is 50% shorter than less, and having to type just two instead of four letters to call a command that is executed manually quite often sure is an improvement.

So, my personal shell command aliases now include

alias more=less
alias pg=less

on any Unix-like system I run.

[1] Though not part of POSIX, less is usually available on Unix-like systems and the default pager for most Linux distributions as well as BSD systems. Fortunately, less is also one of those tools that have a single canonical implementation that basically everyone uses. Therefore, less will be less on virtually any system that ships it, the less version included with BusyBox being a notable exception.

[2] Things to mention here are a post on askubuntu.com that made me aware of why the solution must involve a function as well as one or two other websites that turned my attention to the command-line options offered by less. Unfortunately, I didn't bookmark those.

[3] This is at the time of writing and might change in the future. I had a bit of e-mail conversation with Mark Nudelman in the course of writing this article and he told me that the question of how less would handle such a case had just never come up before because he wouldn't have expected anyone to both pipe data to less and provide a file name at the same time.

[4] https://mywiki.wooledge.org/BashFAQ/050. By the way, dismissing this idea as nonsensical on the grounds that “everything is code” – or “data” – and then going on about it, as I've had the pleasure to experience first hand on IRC, is really beside the point. As much as the terminology is debatable, it is both clear to be seen what is meant by “code” versus “data” in this particular context and easily understandable why that rule makes sense.

[5] https://www.gnu.org/software/bash/manual/html_node/Aliases.html

[6] The discussion on that mailing list revealed quite a bit more than that. However, the actual provenance of the SVR2 pg utility still remained unclear.

[7] Cf. The Open Group Base Specifications Issue 6/IEEE Std 1003.1, 2004 Edition, Rationale volume, chapter C.4 Utilities.

[8] https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/text-utils/pg.c?h=stable/v2.35

[9] Cf. Util-linux 2.29 Release Notes.

[10] In fact, these systems don’t even ship more and instead only provide less by default, which will be invoked in more compatibility mode whenever a user runs the more command because /usr/bin/more is merely a hard link to the less executable.

[11] An explanation of how more got its name is provided by its original author at https://danhalbert.org/more.html.
