💾 Archived View for thrig.me › blog › 2022 › 12 › 14 › duplicate-environment-variables.gmi captured on 2023-05-24 at 18:27:35. Gemini links have been rewritten to link to archived content

Duplicate Environment Variables

A common problem is convincing programmers that duplicate environment variables are possible on unix; most programmers interact with the environment through an interface that gives the impression that environment variables are unique in their Platonic splendor.

    $ env FOO=bar perl -E 'say $ENV{FOO}'
    bar
    $ env FOO=bar cfu 'printf("%s\n", getenv("FOO"))'
    bar
    $ env FOO=bar perl -E '$ENV{FOO} = "baz"; say $ENV{FOO}'
    baz
    $ env FOO=bar cfu 'setenv("FOO", "baz",1);printf("%s\n", getenv("FOO"))'
    baz

See? FOO only has one value that changes when you update it. Therefore, environment variables are unique. Q.E.D.

Wrong!

Abstractions Are, Like, You Know, Everywhere

%ENV (a hash or associative array or dictionary, typical in those filthy scripting languages) and getenv(3) (the C library function interface) are abstractions built on top of something. What is that something? A computer! That models a PDP-11! No, closer to home. Various approaches work here, such as delving into the code for getenv, or reading documentation such as environ(7), which may mention something along the lines of

    NAME
         environ - user environment

    SYNOPSIS
         extern char **environ;

    DESCRIPTION
         An array of strings called the "environment" is made available
         by execve(2) when a process begins. By convention these strings
         have the form name=value.

That's from OpenBSD; the documentation on other unixlikes may vary. But the gist is that **environ is an array of strings, which if you know anything about C might look a lot like *argv[] or the equivalent **argv for the arguments given to a program.

    // argarg - print two args
    #include <stdio.h>
    int main(int argc, char **argv) {
        if (argc > 2) {
            printf("%s %s\n", argv[1], argv[2]);
        }
        return 0;
    }

Thus we can have duplicate entries in **argv, a point that few will dispute:

    $ make argarg
    cc -O2 -pipe    -o argarg argarg.c
    $ ./argarg foo foo
    foo foo

Given this, what do you think **environ might allow by way of duplicates? This takes a bit more work to set up, but luckily someone wrote a small program that helpfully creates duplicate environment variables, somewhere under the glorious mess that is

https://thrig.me/src/scripts.git

With dupenv, we can wrap env, here merely to report what environment variables are set, and see if two FOO exist.

    $ dupenv FOO=bar FOO=baz env | grep FOO
    FOO=bar
    FOO=baz

Nope, FOO is not Platonic. More like contingent arising... I digress.
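To see the same thing from inside a program rather than through env, one can walk the vector directly instead of asking getenv. What follows is a minimal sketch, not code from the scripts repository; env_count and entry_has_name are names invented here for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if entry has the form name=... for exactly this name
 * (so FOO does not match FOOD=x). */
static int entry_has_name(const char *entry, const char *name)
{
    size_t len = strlen(name);
    return strncmp(entry, name, len) == 0 && entry[len] == '=';
}

/* Count the entries for name in a NULL-terminated vector of
 * name=value strings, such as environ. */
size_t env_count(char *const *env, const char *name)
{
    size_t n = 0;
    for (; *env != NULL; env++)
        if (entry_has_name(*env, name))
            n++;
    return n;
}
```

With `extern char **environ;` declared, a program run as `dupenv FOO=bar FOO=baz ./prog` that calls env_count(environ, "FOO") should get 2 back, where getenv would have shown only one value.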

    $ dupenv SHELL=/bin/sh SHELL=/bin/ed env | grep \^SHELL
    SHELL=/bin/ksh
    SHELL=/bin/sh
    SHELL=/bin/ed

Whose shell is it, anyways?
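The output above has three entries: what is presumably the SHELL that was already in the environment, followed by the two new ones. The source of dupenv is not shown here, so the following is only a guess at the general technique: build a new vector that holds the existing entries plus the extras, with no attempt at de-duplication. The function name env_append is made up for this sketch.

```c
#include <stdlib.h>

/* Build a new NULL-terminated vector holding every entry of base
 * followed by the n strings in extra. Nothing is de-duplicated.
 * The strings are shared, not copied; the caller frees only the
 * returned array. Returns NULL if allocation fails. */
char **env_append(char *const *base, char *const *extra, size_t n)
{
    size_t have = 0;
    while (base[have] != NULL)
        have++;

    char **out = malloc((have + n + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < have; i++)
        out[i] = base[i];
    for (size_t i = 0; i < n; i++)
        out[have + i] = extra[i];
    out[have + n] = NULL;
    return out;
}
```

The result can then be handed to execve(2) as its third argument, which takes the vector as given.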

The duplication has been known and published since at least the 1990s, though not widely, even among unix users. There have been various band-aids put in place, because if you pick the wrong environment variable or otherwise fail to clean up the list, you get security vulnerabilities that were there for something like 35 years,

https://www.sudo.ws/repos/sudo/rev/d4dfb05db5d7

whoops, and a complicating factor is that different languages have put the band-aid on in different ways, or not at all, and some will pick the first of any duplicated environment variables (C, Go, Perl, sudo, zsh, ...) while others will pick the last of any duplicated environment variables (bash, ksh, ...). Languages also vary as to whether they de-duplicate environment variables, whether a de-duplicated list or the original list is passed to child processes, etc.

    $ dupenv FOO=aaa FOO=ZZZ cfu 'printf("%s\n", getenv("FOO"))'
    aaa
    $ dupenv FOO=aaa FOO=ZZZ ksh -c 'echo $FOO'
    ZZZ
    $ dupenv FOO=aaa FOO=ZZZ zsh -c 'echo $FOO'
    aaa
    $ dupenv FOO=aaa FOO=ZZZ expect -c 'puts "$env(FOO) [exec sh -c {echo $FOO}]"'
    aaa ZZZ
    $ dupenv FOO=aaa FOO=ZZZ python3 -uc 'import os;print(os.environ["FOO"]);os.execvp("env",["env"])' | egrep 'aaa|ZZZ'
    aaa
    FOO=aaa
    FOO=ZZZ
    $ dupenv FOO=aaa FOO=ZZZ perl -E 'say $ENV{FOO};exec qw(sh -c), q{echo $FOO}'
    aaa
    aaa
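The first-versus-last split comes down to how the lookup loop is written. A sketch of both policies over a NULL-terminated vector of name=value strings follows; env_first and env_last are hypothetical names, and no claim is made that any of the programs above are implemented this way.

```c
#include <stddef.h>
#include <string.h>

/* Value of the first entry for name, or NULL if there is none. */
const char *env_first(char *const *env, const char *name)
{
    size_t len = strlen(name);
    for (; *env != NULL; env++)
        if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
            return *env + len + 1;   /* stop at the first match */
    return NULL;
}

/* Value of the last entry for name, or NULL if there is none. */
const char *env_last(char *const *env, const char *name)
{
    size_t len = strlen(name);
    const char *found = NULL;
    for (; *env != NULL; env++)
        if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
            found = *env + len + 1;  /* later matches overwrite earlier ones */
    return found;
}
```

Given FOO=aaa followed by FOO=ZZZ, env_first returns aaa and env_last returns ZZZ, the same disagreement seen above between cfu and ksh.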

Buyer beware?

What Am I Bewaring?

Good question! One may note that some of the above languages pass duplicate environment variables to programs they run--garbage in, garbage out--and that some tools use the last of the duplicate values instead of the first. This is wiggle room for an attacker, and perhaps enough wiggle room to embiggen the CVE list. What would happen if say, hypothetically, you have some Python code that runs some bash scripts, and the bash scripts see completely different values for PATH or LD_PRELOAD or who knows what other environment variables? What could an attacker do with that difference? Could there be an information leak, or an escalation of privileges?
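One defensive option for a program that launches others is to clean the list before passing it on. The sketch below compacts a vector in place and keeps the first entry for each name. Keeping the first is an assumption made here (it agrees with what getenv returned in the examples above), and env_dedup_first and name_len are invented names.

```c
#include <stddef.h>
#include <string.h>

/* Length of the name part of a name=value entry
 * (the whole string if there is no '='). */
static size_t name_len(const char *entry)
{
    const char *eq = strchr(entry, '=');
    return eq ? (size_t)(eq - entry) : strlen(entry);
}

/* Compact a NULL-terminated vector in place so that only the first
 * entry for each name survives. Returns the number of entries kept.
 * Quadratic in the number of entries, which is fine for a sketch. */
size_t env_dedup_first(char **env)
{
    size_t kept = 0;
    for (size_t i = 0; env[i] != NULL; i++) {
        size_t len = name_len(env[i]);
        int seen = 0;
        /* compare against the entries already kept */
        for (size_t j = 0; j < kept && !seen; j++)
            seen = name_len(env[j]) == len &&
                   strncmp(env[j], env[i], len) == 0;
        if (!seen)
            env[kept++] = env[i];
    }
    env[kept] = NULL;
    return kept;
}
```

The cleaned vector can then be given to execve(2), so that the child sees one value per name regardless of which end of the list it reads from.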

Recap

Are the programmers in error? At a certain level of abstraction, no. In C, using only the getenv and setenv interface, the environment will not appear to contain duplicates, and this will in most cases not cause a problem. In Go or Perl, where the environment list is de-duplicated, it is even more true that environment variables are unique, though Go does let one create a new []string with duplicates for the syscall.Exec call.

There are security ramifications.

tags #perl #c #go #unix
