💾 Archived View for thrig.me › blog › 2022 › 10 › 31 › that-terrible-glob.gmi captured on 2023-03-20 at 18:26:26. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

That Terrible Glob

The shell * glob has a variety of problems that probably should be guarded against. First up, we have a perhaps surprising interaction with command line arguments. That is, someone malicious can create

    $ touch ./-rf

and then if anyone runs

    $ rm *

in that directory the actual command run is

    $ mkdir empty && cd empty
    $ ls
    $ touch ./-rf
    $ touch foo bar
    $ ls
    -rf bar foo
    $ echo rm *
    rm -rf bar foo

as the glob has helpfully matched the filename and the shell has helpfully supplied it as a flag to the program. This attack requires a malicious local user who knows the code involved (ktrace can be instructive here, if the source is missing) and which flags to touch, and where, to trip up a command.

The guard against this attack is to always delimit where the flags are and where the arguments are, especially in scripts that run unattended in who knows what directory:

    #  |--- flags go here ---|    |--- and the arguments go here ---|
    rm   -f                    --   *

Here the usual getopt -- flag (not every command supports this convention) is supplied to indicate "here endeth the flags", protecting the command from a glob turning any argument into a flag. Such verbosity might be annoying, but if it's hidden off in scripts doing who knows what who knows where, I would rather be on the safe side. On an unrelated note, as a sysadmin I'm usually the one cleaning up any disasters, so there is some bias.
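The difference can be checked in a scratch directory. What follows is a minimal sketch and not anything from a real script: the function name guarded_rm and the file names are made up for illustration, and it assumes an rm that follows the -- convention.

```shell
#!/bin/sh
# guarded_rm is a hypothetical helper: remove the files in the
# current directory with the end of the flags marked explicitly.
guarded_rm() {
    rm -f -- *
}

# Demo in a throwaway directory; the subshell keeps the caller's
# working directory unchanged.
(
    demo=$(mktemp -d) || exit 1
    cd "$demo" || exit 1
    mkdir keep
    touch ./-rf foo keep/inner
    # rm complains about the directory "keep" and exits non-zero,
    # because -rf was treated as a filename and not as flags.
    guarded_rm 2>/dev/null || echo "rm refused to remove a directory"
    ls -A
    cd / && rm -rf -- "$demo"
)
```

With the guard in place the files -rf and foo are removed while the directory keep and its contents survive; without the --, an rm that parses -rf as flags would have taken keep along as well.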

    $ cd .. && rm -rf -- empty

The observant may notice that I take pains to check every chdir call for failure. Any guesses as to why?
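One possible answer, sketched with made-up names (purge, precious): if the chdir fails and nothing checks it, the commands that follow run in whatever directory the script was already in.

```shell
#!/bin/sh
# purge is a hypothetical cleanup routine; $1 is the directory to
# empty. The body runs in a subshell, so the caller never moves.
purge() (
    # Without this check a failed cd would leave rm running in the
    # caller's current directory.
    cd -- "$1" || exit 1
    rm -f -- *
)

(
    work=$(mktemp -d) || exit 1
    cd "$work" || exit 1
    touch precious
    # The target does not exist, so purge must give up early.
    purge "$work/no-such-dir" 2>/dev/null || echo "chdir failed, nothing removed"
    ls
    cd / && rm -rf -- "$work"
)
```

Drop the || exit 1 from purge and the same call would remove precious instead.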

Bare Glob

The bare glob is another pitfall for many shells, though some (ZSH) do not suffer from it, by default. The problem is that many shells are sloppy, and if the glob matches nothing, the glob is passed through as-is. ZSH instead throws an error.

    $ mkdir empty && cd empty
    $ echo *
    *
    $ touch foo
    $ echo *
    foo
    $ rm foo
    $ echo *
    *

That is, the glob will behave differently depending on whether it matches anything, or not. In other words, a script may be one file touch (or rm) away from behaving in a totally different fashion than one might expect. This is probably not a big deal in an interactive shell, but could be terrible in an unattended script doing (or not doing?) something somewhere. This sort of thing is one of the many reasons why I drop sh as a programming language very quickly.

    $ exec zsh
    % rm foo
    % echo *
    zsh: no matches found: *
    % touch foo
    % echo *
    foo
    % rm foo
    % echo *(N)

    % exec ksh
    $ cd .. && rm -rf -- empty

The ZSH *(N) pattern is handy if you are dealing with a list of optional files, usually something along the lines of:

    for file in zsh.*(N); source $file
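Shells without the (N) qualifier can get roughly the same effect by checking that each loop item exists before using it. A minimal sketch for sh-type shells that pass an unmatched glob through; list_optional is a made-up name, and it prints the names where the ZSH loop would source them.

```shell
#!/bin/sh
# list_optional is a hypothetical helper: print each file matching
# $1.* in the current directory, or nothing if there are none.
list_optional() {
    for file in "$1".*; do
        # An unmatched glob arrives here as the literal pattern,
        # so skip anything that does not actually exist.
        [ -e "$file" ] || [ -L "$file" ] || continue
        printf '%s\n' "$file"
    done
}

(
    demo=$(mktemp -d) || exit 1
    cd "$demo" || exit 1
    list_optional zsh          # no matches, so nothing is printed
    touch zsh.aliases zsh.env
    list_optional zsh          # now both names are printed
    cd / && rm -rf -- "$demo"
)
```

The extra -L test keeps dangling symlinks in the list, since -e alone reports them as missing.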

Some programs will garbage in/out a bare * (echo), others might do some sort of hopefully compatible glob expansion of their own (git?), and others... well who knows what will happen.

An example may help:

    remote_files=`ssh remote ls -d *`

Yes, this code is terrible. I've seen worse.

Anyways, this code (or something like it, I forget the exact original) assumes that no local files will ever match the glob. Should anything create a local file that does match (say, a coredump, someone making a backup file, a temporary file that is not cleaned up as expected, whatever), then the code will behave differently:

    $ mkdir empty && cd empty
    $ remote_files=`ssh remote ls -d *`
    $ touch scxbedrlxnig.tmp
    $ remote_files=`ssh remote ls -d *`
    ls: scxbedrlxnig.tmp: No such file or directory

This instance could be corrected by escaping the glob \* so that it is only unescaped on the other side of the SSH and becomes a glob there, but parsing ls(1) is also terrible, and you'd probably want an array for a list of files, but arrays in the shell are terrible, so I'd probably move to some other language besides sh here.
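The effect of the escape can be tried without a remote host. In this sketch fake_ssh is a made-up stand-in that imitates only one thing ssh does, namely joining its arguments with spaces and handing the string to a second shell; the directories and file names are likewise invented, and the ls parsing remains as terrible as before.

```shell
#!/bin/sh
# fake_ssh is a stand-in for ssh so the example runs without a
# network (an assumption, not a real tool). Like ssh, it joins its
# arguments with spaces and gives the result to a second shell,
# which here runs in the directory named by $REMOTE_DIR.
fake_ssh() {
    shift    # discard the host name
    ( cd "$REMOTE_DIR" && sh -c "$*" )
}

(
    REMOTE_DIR=$(mktemp -d) || exit 1
    here=$(mktemp -d) || exit 1
    touch "$REMOTE_DIR/a.log" "$REMOTE_DIR/b.log"
    cd "$here" || exit 1
    touch scxbedrlxnig.tmp

    # Unescaped: the local shell expands * to scxbedrlxnig.tmp,
    # which does not exist on the "remote" side, so ls fails.
    fake_ssh remote ls -d * 2>/dev/null || echo "local expansion broke the listing"

    # Escaped: the * survives until the second shell expands it.
    fake_ssh remote ls -d \*

    cd / && rm -rf -- "$REMOTE_DIR" "$here"
)
```

Quoting the glob as '*' would do the same job as the backslash.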

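    #!/usr/bin/perl
    # List the remote directory over SFTP: no glob, no ls(1) parsing.
    use 5.32.0;
    use warnings;
    use Net::SFTP::Foreign;
    # autodie makes any SFTP failure fatal instead of silently ignored
    my $sftp = Net::SFTP::Foreign->new( "remote", autodie => 1 );
    # ls returns a reference to an array of hashes, one per directory entry
    for my $ref ( @{ $sftp->ls } ) {
        say $ref->{filename};
    }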

Efficient Portability

At the C level globs aren't very portable, or at least I recall that my globbing code never did work aright on Linux (and I was never motivated enough to make it portable). There are also features that the system glob(3) probably won't support, like the ZSH ** for recursion. Yet another issue is that glob may perform a stat(2) (process tracing will reveal whether this is the case), so using glob may be less efficient than, say, fts_open(3) and then performing regular expression checks on the filenames. This becomes more important as the quantity of files to process goes up.

That is, the shell glob is great for quick questions such as "where is the source code for the uptime program?" on OpenBSD

    $ which uptime
    /usr/bin/uptime
    $ ls /usr/src/usr.bin/uptime
    ls: /usr/src/usr.bin/uptime: No such file or directory
    $ ls /usr/src/usr.bin/*/uptime*
    /usr/src/usr.bin/w/uptime.1

but not so much if there are millions of files to deal with.
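When a tree is too large for a glob to be comfortable, one shell-level option is find(1), which walks the tree and prints each matching name as it goes instead of building the whole list first. A rough sketch, where find_named and the scratch paths are made up for illustration; note that unlike the */uptime* glob it descends to any depth.

```shell
#!/bin/sh
# find_named is a hypothetical helper: print every path under $1
# whose final component matches the pattern $2. The pattern is
# quoted so that find, not the shell, does the matching.
find_named() {
    find "$1" -name "$2" -print
}

(
    src=$(mktemp -d) || exit 1
    mkdir "$src/w" "$src/who"
    touch "$src/w/uptime.1" "$src/w/w.c" "$src/who/who.c"
    find_named "$src" 'uptime*'
    rm -rf -- "$src"
)
```

Whether this is actually faster than a glob on a given system is something to measure, not assume.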
