MOUNT_NAMESPACES(7)                                                     Linux Programmer's Manual                                                    MOUNT_NAMESPACES(7)

NAME
       mount_namespaces - overview of Linux mount namespaces

DESCRIPTION
       For an overview of namespaces, see namespaces(7).

       Mount  namespaces  provide  isolation of the list of mounts seen by the processes in each namespace instance.  Thus, the processes in each of the mount namespace
       instances will see distinct single-directory hierarchies.

       The views provided by the /proc/[pid]/mounts, /proc/[pid]/mountinfo, and /proc/[pid]/mountstats files (all described in proc(5)) correspond to  the  mount  name‐
       space in which the process with the PID [pid] resides.  (All of the processes that reside in the same mount namespace will see the same view in these files.)
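
       As a minimal sketch of how to check whether two processes share these views, one can compare the /proc/[pid]/ns/mnt symbolic links described in
       namespaces(7).  The function name same_mntns is our own, chosen for illustration; reading the link of another user's process may require privilege.

```shell
# Succeed (exit status 0) if two processes reside in the same mount namespace.
# Each argument is a PID or the literal string "self".
same_mntns() {
    a=$(readlink "/proc/$1/ns/mnt") || return 2   # e.g. "mnt:[4026531840]"
    b=$(readlink "/proc/$2/ns/mnt") || return 2
    [ "$a" = "$b" ]
}
```

       For example, "same_mntns self 1 && echo same" reports whether the calling shell resides in the same mount namespace as PID 1.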

       A  new  mount  namespace is created using either clone(2) or unshare(2) with the CLONE_NEWNS flag.  When a new mount namespace is created, its mount list is ini‐
       tialized as follows:

       *  If the namespace is created using clone(2), the mount list of the child's namespace is a copy of the mount list in the parent process's mount namespace.

       *  If the namespace is created using unshare(2), the mount list of the new namespace is a copy of the mount list in the caller's previous mount namespace.

       Subsequent modifications to the mount list (mount(2) and umount(2)) in either mount namespace will not (by default) affect the mount list seen in the other name‐
       space (but see the following discussion of shared subtrees).

SHARED SUBTREES
       After  the implementation of mount namespaces was completed, experience showed that the isolation that they provided was, in some cases, too great.  For example,
       in order to make a newly loaded optical disk available in all mount namespaces, a mount operation was required in each namespace.  For this use case, and others,
       the  shared  subtree feature was introduced in Linux 2.6.15.  This feature allows for automatic, controlled propagation of mount and unmount events between name‐
       spaces (or, more precisely, between the mounts that are members of a peer group that are propagating events to one another).

       Each mount is marked (via mount(2)) as having one of the following propagation types:

       MS_SHARED
              This mount shares events with members of a peer group.  Mount and unmount events immediately under this mount will propagate to the other mounts that  are
              members  of  the  peer  group.   Propagation  here means that the same mount or unmount will automatically occur under all of the other mounts in the peer
              group.  Conversely, mount and unmount events that take place under peer mounts will propagate to this mount.

       MS_PRIVATE
              This mount is private; it does not have a peer group.  Mount and unmount events do not propagate into or out of this mount.

       MS_SLAVE
              Mount and unmount events propagate into this mount from a (master) shared peer group.  Mount and unmount events under this mount do not propagate  to  any
              peer.

              Note  that  a mount can be the slave of another peer group while at the same time sharing mount and unmount events with a peer group of which it is a mem‐
              ber.  (More precisely, one peer group can be the slave of another peer group.)

       MS_UNBINDABLE
              This is like a private mount, and in addition this mount can't be bind mounted.  Attempts to bind mount this mount (mount(2) with the MS_BIND  flag)  will
              fail.

              When a recursive bind mount (mount(2) with the MS_BIND and MS_REC flags) is performed on a directory subtree, any unbindable mounts within the subtree are
              automatically pruned (i.e., not replicated) when replicating that subtree to produce the target subtree.

       For a discussion of the propagation type assigned to a new mount, see NOTES.

       The propagation type is a per-mount-point setting; some mounts may be marked as shared (with each shared mount being a member of a distinct  peer  group),  while
       others are private (or slaved or unbindable).

       Note  that  a  mount's  propagation type determines whether mounts and unmounts of mounts immediately under the mount are propagated.  Thus, the propagation type
       does not affect propagation of events for grandchildren and further removed descendant mounts.  What happens if the mount itself is unmounted  is  determined  by
       the propagation type that is in effect for the parent of the mount.

       Members are added to a peer group when a mount is marked as shared and either:

       *  the mount is replicated during the creation of a new mount namespace; or

       *  a new bind mount is created from the mount.

       In both of these cases, the new mount joins the peer group of which the existing mount is a member.

       A  new  peer  group  is  also  created when a child mount is created under an existing mount that is marked as shared.  In this case, the new child mount is also
       marked as shared and the resulting peer group consists of all the mounts that are replicated under the peers of parent mounts.

       A mount ceases to be a member of a peer group either when the mount is explicitly unmounted, or when it is implicitly unmounted because a mount namespace
       is removed (because it has no more member processes).

       The  propagation  type of the mounts in a mount namespace can be discovered via the "optional fields" exposed in /proc/[pid]/mountinfo.  (See proc(5) for details
       of this file.)  The following tags can appear in the optional fields for a record in that file:

       shared:X
              This mount is shared in peer group X.  Each peer group has a unique ID that is automatically generated by the kernel, and all  mounts  in  the  same  peer
              group will show the same ID.  (These IDs are assigned starting from the value 1, and may be recycled when a peer group ceases to have any members.)

       master:X
              This mount is a slave to shared peer group X.

       propagate_from:X (since Linux 2.6.26)
              This  mount is a slave and receives propagation from shared peer group X.  This tag will always appear in conjunction with a master:X tag.  Here, X is the
              closest dominant peer group under the process's root directory.  If X is the immediate master of the mount, or if there is no dominant  peer  group  under
              the same root, then only the master:X field is present and not the propagate_from:X field.  For further details, see below.

       unbindable
              This is an unbindable mount.

       If none of the above tags is present, then this is a private mount.
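
       The tag rules above can be sketched as a small filter over mountinfo records.  The function name prop_type and the labels it prints are our own choices;
       the sketch assumes only that the optional fields begin at field 7 and end at the "-" separator, as described in proc(5).

```shell
# Read mountinfo records on stdin; print each mount point and its propagation type.
prop_type() {
    awk '{
        shared = 0; slave = 0; unbind = 0
        # Optional fields run from field 7 up to the "-" separator (if present).
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)          shared = 1
            else if ($i ~ /^master:/)     slave = 1
            else if ($i == "unbindable")  unbind = 1
        }
        if (unbind)                t = "unbindable"
        else if (slave && shared)  t = "slave+shared"
        else if (slave)            t = "slave"
        else if (shared)           t = "shared"
        else                       t = "private"    # no tags at all
        print $5, t
    }'
}
```

       Typical usage would be "prop_type < /proc/self/mountinfo".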

   MS_SHARED and MS_PRIVATE example
       Suppose that on a terminal in the initial mount namespace, we mark one mount as shared and another as private, and then view the mounts in /proc/self/mountinfo:

           sh1# mount --make-shared /mntS
           sh1# mount --make-private /mntP
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime

       From  the  /proc/self/mountinfo output, we see that /mntS is a shared mount in peer group 1, and that /mntP has no optional tags, indicating that it is a private
       mount.  The first two fields in each record in this file are the unique ID for this mount, and the mount ID of the parent mount.  We  can  further  inspect  this
       file to see that the parent mount of /mntS and /mntP is the root directory, /, which is mounted as private:

           sh1# cat /proc/self/mountinfo | awk '$1 == 61' | sed 's/ - .*//'
           61 0 8:2 / / rw,relatime
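
       The parent lookup performed by hand above can be sketched as a helper that follows the second field (parent mount ID) back to the record with that ID.
       The function name parent_of is hypothetical; if several mounts are stacked on the same mount point, the sketch simply uses the last matching record.

```shell
# Print the mount point of the parent of the mount at MOUNTPOINT.
# Usage: parent_of MOUNTPOINT < /proc/self/mountinfo
parent_of() {
    awk -v target="$1" '
        { point[$1] = $5 }              # map mount ID -> mount point
        $5 == target { parent = $2 }    # remember the parent mount ID
        END { if (parent in point) print point[parent] }
    '
}
```

       Nothing is printed when the parent is not listed in the input, as is the case for the root mount.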

       On a second terminal, we create a new mount namespace where we run a second shell and inspect the mounts:

           $ PS1='sh2# ' sudo unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime

       The  new  mount  namespace  received a copy of the initial mount namespace's mounts.  These new mounts maintain the same propagation types, but have unique mount
       IDs.  (The --propagation unchanged option prevents unshare(1) from marking all mounts as private when creating a new mount namespace, which it does by default.)

       In the second terminal, we then create submounts under each of /mntS and /mntP and inspect the set-up:

           sh2# mkdir /mntS/a
           sh2# mount /dev/sdb6 /mntS/a
           sh2# mkdir /mntP/b
           sh2# mount /dev/sdb7 /mntP/b
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime
           178 222 8:22 / /mntS/a rw,relatime shared:2
           230 225 8:23 / /mntP/b rw,relatime

       From the above, it can be seen that /mntS/a was created as shared (inheriting this setting from its parent mount) and /mntP/b was created as a private mount.

       Returning to the first terminal and inspecting the set-up, we see that the new mount created under the shared mount /mntS propagated to its peer  mount  (in  the
       initial mount namespace), but the new mount created under the private mount /mntP did not propagate:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime
           179 77 8:22 / /mntS/a rw,relatime shared:2
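
       To see which mounts are peers, one can select the records that carry the same shared:X tag.  The following is a minimal sketch (the function name peers
       is our own) that reads mountinfo records on standard input, so that the files of processes in several mount namespaces can be concatenated.

```shell
# Print the mount ID and mount point of every record tagged shared:GROUP.
# Usage: peers GROUP < mountinfo-records
peers() {
    awk -v tag="shared:$1" '{
        # Only the optional fields (field 7 up to "-") are examined.
        for (i = 7; i <= NF && $i != "-"; i++)
            if ($i == tag) print $1, $5
    }'
}
```

       For instance, "cat /proc/self/mountinfo /proc/PID/mountinfo | peers 1", where PID is a placeholder for a process in the second namespace, would list
       the /mntS mount of each namespace.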

   MS_SLAVE example
       Making  a  mount a slave allows it to receive propagated mount and unmount events from a master shared peer group, while preventing it from propagating events to
       that master.  This is useful if we want to (say) receive a mount event when an optical disk is mounted in the master shared peer group (in  another  mount  name‐
       space), but want to prevent mount and unmount events under the slave mount from having side effects in other namespaces.

       We can demonstrate the effect of slaving by first marking two mounts as shared in the initial mount namespace:

           sh1# mount --make-shared /mntX
           sh1# mount --make-shared /mntY
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2

       On a second terminal, we create a new mount namespace and inspect the mounts:

           sh2# unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime shared:2

       In the new mount namespace, we then mark one of the mounts as a slave:

           sh2# mount --make-slave /mntY
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2

       From the above output, we see that /mntY is now a slave mount that is receiving propagation events from the shared peer group with the ID 2.

       Continuing in the new namespace, we create submounts under each of /mntX and /mntY:

           sh2# mkdir /mntX/a
           sh2# mount /dev/sda3 /mntX/a
           sh2# mkdir /mntY/b
           sh2# mount /dev/sda5 /mntY/b

       When  we  inspect the state of the mounts in the new mount namespace, we see that /mntX/a was created as a new shared mount (inheriting the "shared" setting from
       its parent mount) and /mntY/b was created as a private mount:

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime

       Returning to the first terminal (in the initial mount namespace), we see that the mount /mntX/a propagated to the peer (the shared /mntX), but the mount  /mntY/b
       was not propagated:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3

       Now we create a new mount under /mntY in the first shell:

           sh1# mkdir /mntY/c
           sh1# mount /dev/sda1 /mntY/c
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3
           178 133 8:1 / /mntY/c rw,relatime shared:4

       When  we  examine the mounts in the second mount namespace, we see that in this case the new mount has been propagated to the slave mount, and that the new mount
       is itself a slave mount (to peer group 4):

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime
           179 169 8:1 / /mntY/c rw,relatime master:4
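
       The slave relationships in such output can be extracted mechanically.  The following sketch (with the hypothetical function name slaves) prints each
       slave mount together with the ID of its master peer group, taken from the master:X tag.

```shell
# Read mountinfo records on stdin; print "MOUNTPOINT MASTER-GROUP" for each slave mount.
slaves() {
    awk '{
        for (i = 7; i <= NF && $i != "-"; i++)
            if ($i ~ /^master:/)
                print $5, substr($i, 8)   # strip the 7-character "master:" prefix
    }'
}
```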

   MS_UNBINDABLE example
       One of the primary purposes of unbindable mounts is to avoid the "mount explosion" problem when repeatedly performing bind mounts of a higher-level subtree at  a
       lower-level mount.  The problem is illustrated by the following shell session.

       Suppose we have a system with the following mounts:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY

       Suppose  furthermore that we wish to recursively bind mount the root directory under several users' home directories.  We do this for the first user, and inspect
       the mounts:

           # mount --rbind / /home/cecilia/
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY

       When we repeat this operation for the second user, we start to see the explosion problem:

           # mount --rbind / /home/henry
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY

       Under /home/henry, we have not only recursively added the /mntX and /mntY mounts, but also the recursive mounts of those  directories  under  /home/cecilia  that
       were created in the previous step.  Upon repeating the step for a third user, it becomes obvious that the explosion is exponential in nature:

           # mount --rbind / /home/otto
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY
           /dev/sda1 on /home/otto/home/cecilia
           /dev/sdb6 on /home/otto/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/cecilia/mntY
           /dev/sda1 on /home/otto/home/henry
           /dev/sdb6 on /home/otto/home/henry/mntX
           /dev/sdb7 on /home/otto/home/henry/mntY
           /dev/sda1 on /home/otto/home/henry/home/cecilia
           /dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY

       The  mount explosion problem in the above scenario can be avoided by making each of the new mounts unbindable.  The effect of doing this is that recursive mounts
       of the root directory will not replicate the unbindable mounts.  We make such a mount for the first user:

           # mount --rbind --make-unbindable / /home/cecilia

       Before going further, we show that unbindable mounts are indeed unbindable:

           # mkdir /mntZ
           # mount --bind /home/cecilia /mntZ
           mount: wrong fs type, bad option, bad superblock on /home/cecilia,
                  missing codepage or helper program, or other error

                  In some cases useful info is found in syslog - try
                  dmesg | tail or so.

       Now we create unbindable recursive bind mounts for the other two users:

           # mount --rbind --make-unbindable / /home/henry
           # mount --rbind --make-unbindable / /home/otto

       Upon examining the list of mounts, we see there has been no explosion of mounts, because the unbindable mounts were not replicated under each user's directory:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY
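
       The mount counts in the two scenarios can be checked with a little arithmetic.  In this sketch (the function names are our own), each plain recursive
       bind mount of / doubles the mount list, whereas each unbindable one replicates only the original mounts.

```shell
# Number of mounts after N recursive bind mounts of /, starting from BASE mounts.
# Usage: mounts_plain BASE N
mounts_plain() {            # every rbind copies all existing mounts: BASE * 2^N
    total=$1 n=$2
    while [ "$n" -gt 0 ]; do
        total=$((total * 2))
        n=$((n - 1))
    done
    echo "$total"
}

# Usage: mounts_unbindable BASE N
mounts_unbindable() {       # every rbind copies only the BASE mounts: BASE * (N + 1)
    echo $(( $1 * ($2 + 1) ))
}
```

       With the three initial mounts and three users, "mounts_plain 3 3" yields 24 and "mounts_unbindable 3 3" yields 12, matching the listings above.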

   Propagation type transitions
       The following table shows the effect that applying a new propagation type (i.e., mount --make-xxxx) has on the existing propagation type of a  mount.   The  rows
       correspond  to  existing  propagation types, and the columns are the new propagation settings.  For reasons of space, "private" is abbreviated as "priv" and "un‐
       bindable" as "unbind".

                     make-shared   make-slave      make-priv  make-unbind
       ─────────────┬───────────────────────────────────────────────────────
       shared       │shared        slave/priv [1]  priv       unbind
       slave        │slave+shared  slave [2]       priv       unbind
       slave+shared │slave+shared  slave           priv       unbind
       private      │shared        priv [2]        priv       unbind
       unbindable   │shared        unbind [2]      priv       unbind

       Note the following details about the table:

       [1] If a shared mount is the only mount in its peer group, making it a slave automatically makes it private.

       [2] Slaving a nonshared mount has no effect on the mount.
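
       The table and its notes can be encoded as a lookup function.  This is only a sketch of the table, not of the kernel's implementation; the function
       name transition and its optional third argument (whether a shared mount has other peers, defaulting to "yes") are assumptions made for illustration.

```shell
# Usage: transition CURRENT NEW [HAS_PEERS]
#   CURRENT: shared | slave | slave+shared | private | unbindable
#   NEW:     shared | slave | private | unbindable
transition() {
    case "$2" in
        private|unbindable)
            echo "$2" ;;                      # always replaces the old type
        shared)
            case "$1" in
                slave|slave+shared) echo "slave+shared" ;;
                *)                  echo "shared" ;;
            esac ;;
        slave)
            case "$1" in
                shared)             # note [1]: a lone shared mount becomes private
                    if [ "${3:-yes}" = yes ]; then echo slave; else echo private; fi ;;
                slave+shared)       echo slave ;;
                *)                  echo "$1" ;;   # note [2]: nonshared mounts are unchanged
            esac ;;
        *)
            return 1 ;;                       # unknown propagation setting
    esac
}
```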

   Bind (MS_BIND) semantics
       Suppose that the following command is performed:

           mount --bind A/a B/b

       Here, A is the source mount, B is the destination mount, a is a subdirectory path under the mount point A, and b is a subdirectory path under the mount point  B.
       The propagation type of the resulting mount, B/b, depends on the propagation types of the mounts A and B, and is summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬──────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         invalid

       Note  that  a  recursive bind of a subtree follows the same semantics as for a bind operation on each mount in the subtree.  (Unbindable mounts are automatically
       pruned at the target mount point.)

       For further details, see Documentation/filesystems/sharedsubtree.txt in the kernel source tree.

   Move (MS_MOVE) semantics
       Suppose that the following command is performed:

           mount --move A B/b

       Here, A is the source mount, B is the destination mount, and b is a subdirectory path under the mount point B.  The propagation type of the resulting mount, B/b,
       depends on the propagation types of the mounts A and B, and is summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬─────────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         unbindable

       Note: moving a mount that resides under a shared mount is invalid.

       For further details, see Documentation/filesystems/sharedsubtree.txt in the kernel source tree.
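
       The bind and move tables differ in a single cell (an unbindable source moved to a nonshared destination), so both can be sketched with one lookup
       function.  The name result_type and its argument spellings are hypothetical; the sketch reproduces the tables only and does not model the rule
       about moving a mount that resides under a shared mount.

```shell
# Usage: result_type OP DEST SRC
#   OP:   bind | move
#   DEST: shared | nonshared
#   SRC:  shared | private | slave | unbindable
result_type() {
    case "$3" in
        unbindable)
            # Binding is never allowed; moving is allowed only to a nonshared destination.
            if [ "$1" = move ] && [ "$2" = nonshared ]; then
                echo unbindable
            else
                echo invalid
            fi ;;
        shared)
            echo shared ;;
        private)
            if [ "$2" = shared ]; then echo shared; else echo private; fi ;;
        slave)
            if [ "$2" = shared ]; then echo "slave+shared"; else echo slave; fi ;;
        *)
            return 1 ;;                       # unknown source type
    esac
}
```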

   Mount semantics
       Suppose that we use the following command to create a mount:

           mount device B/b

       Here, B is the destination mount, and b is a subdirectory path under the mount point B.  The propagation type of the resulting mount, B/b, follows the same rules
       as for a bind mount, where the propagation type of the source mount is considered always to be private.

   Unmount semantics
       Suppose that we use the following command to tear down a mount:

           umount A

       Here, A is a mount on B/b, where B is the parent mount and b is a subdirectory path under the mount point B.  If B  is  shared,  then  all  most-recently-mounted
       mounts at b on mounts that receive propagation from mount B and do not have submounts under them are unmounted.

   The /proc/[pid]/mountinfo propagate_from tag
       The  propagate_from:X  tag is shown in the optional fields of a /proc/[pid]/mountinfo record in cases where a process can't see a slave's immediate master (i.e.,
       the pathname of the master is not reachable from the filesystem root directory) and so cannot determine the chain of propagation between the mounts it can see.

       In the following example, we first create a two-link master-slave chain between the mounts /mnt, /tmp/etc, and /mnt/tmp/etc.  Then the chroot(1) command is  used
       to  make the /tmp/etc mount point unreachable from the root directory, creating a situation where the master of /mnt/tmp/etc is not reachable from the (new) root
       directory of the process.

       First, we bind mount the root directory onto /mnt and then bind mount /proc at /mnt/proc so that after the later chroot(1) the proc(5) filesystem remains visible
       at the correct location in the chroot-ed environment.

           # mkdir -p /mnt/proc
           # mount --bind / /mnt
           # mount --bind /proc /mnt/proc

       Next, we ensure that the /mnt mount is a shared mount in a new peer group (with no peers):

           # mount --make-private /mnt  # Isolate from any previous peer group
           # mount --make-shared /mnt
           # cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5

       Next, we bind mount /mnt/etc onto /tmp/etc:

           # mkdir -p /tmp/etc
           # mount --bind /mnt/etc /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:102

       Initially, these two mounts are in the same peer group, but we then make /tmp/etc a slave of /mnt/etc, and then make /tmp/etc shared as well, so that it can
       propagate events to the next slave in the chain:

           # mount --make-slave /tmp/etc
           # mount --make-shared /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102

       Then we bind mount /tmp/etc onto /mnt/tmp/etc.  Again, the two mounts are initially in the same peer group, but we then make /mnt/tmp/etc a slave of /tmp/etc:

           # mkdir -p /mnt/tmp/etc
           # mount --bind /tmp/etc /mnt/tmp/etc
           # mount --make-slave /mnt/tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102
           273 239 8:2 /etc /mnt/tmp/etc ... master:105

       From the above, we see that /mnt is the master of the slave /tmp/etc, which in turn is the master of the slave /mnt/tmp/etc.

       We then chroot(1) to the /mnt directory, which renders the mount with ID 267 unreachable from the (new) root directory:

           # chroot /mnt

       When we examine the state of the mounts inside the chroot-ed environment, we see the following:

           # cat /proc/self/mountinfo | sed 's/ - .*//'
           239 61 8:2 / / ... shared:102
           248 239 0:4 / /proc ... shared:5
           273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102

       Above, we see that the mount with ID 273 is a slave whose master is the peer group 105.  The mount point for that master is unreachable, and so a propagate_from
       tag is displayed, indicating that the closest dominant peer group (i.e., the nearest reachable mount in the slave chain) is the peer group with the ID 102
       (corresponding to the /mnt mount point before the chroot(1) was performed).

VERSIONS
       Mount namespaces first appeared in Linux 2.4.19.

CONFORMING TO
       Namespaces are a Linux-specific feature.

NOTES
       The propagation type assigned to a new mount depends on the propagation type of the parent mount.  If the mount has a parent (i.e., it is a non-root mount point)
       and  the  propagation  type of the parent is MS_SHARED, then the propagation type of the new mount is also MS_SHARED.  Otherwise, the propagation type of the new
       mount is MS_PRIVATE.

       Notwithstanding the fact that the default propagation type for new mounts is in many cases MS_PRIVATE, MS_SHARED is typically more useful.  For this reason,
       systemd(1) automatically remounts all mounts as MS_SHARED on system startup.  Thus, on most modern systems, the default propagation type is in practice MS_SHARED.

       Since, when one uses unshare(1) to create a mount namespace, the goal is commonly to provide full isolation of the mounts in the new namespace, unshare(1) (since
       util-linux version 2.27) in turn reverses the step performed by systemd(1), by making all mounts private in the new namespace.  That is, unshare(1) performs  the
       equivalent of the following in the new mount namespace:

           mount --make-rprivate /

       To prevent this, one can use the --propagation unchanged option to unshare(1).

       An  application  that  creates a new mount namespace directly using clone(2) or unshare(2) may desire to prevent propagation of mount events to other mount name‐
       spaces (as is done by unshare(1)).  This can be done by changing the propagation type of mounts in the new namespace to either MS_SLAVE or  MS_PRIVATE,  using  a
       call such as the following:

           mount(NULL, "/", MS_SLAVE | MS_REC, NULL);

       For a discussion of propagation types when moving mounts (MS_MOVE) and creating bind mounts (MS_BIND), see Documentation/filesystems/sharedsubtree.txt.

   Restrictions on mount namespaces
       Note the following points with respect to mount namespaces:

       [1] Each  mount namespace has an owner user namespace.  As explained above, when a new mount namespace is created, its mount list is initialized as a copy of the
           mount list of another mount namespace.  If the new namespace and the namespace from which the mount list was copied are owned by different  user  namespaces,
           then the new mount namespace is considered less privileged.

       [2] When creating a less privileged mount namespace, shared mounts are reduced to slave mounts.  This ensures that mounts and unmounts performed in less privileged
           mount namespaces will not propagate to more privileged mount namespaces.

       [3] Mounts that come as a single unit from a more privileged mount namespace are locked together and may not be separated in a less privileged  mount  namespace.
           (The unshare(2) CLONE_NEWNS operation brings across all of the mounts from the original mount namespace as a single unit, and recursive mounts that propagate
           between mount namespaces propagate as a single unit.)

           In this context, "may not be separated" means that the mounts are locked so that they may not be individually unmounted.  Consider the following example:

               $ sudo sh
               # mount --bind /dev/null /etc/shadow
               # cat /etc/shadow       # Produces no output

           The above steps, performed in a more privileged mount namespace, have created a  bind  mount  that  obscures  the  contents  of  the  shadow  password  file,
           /etc/shadow.   For  security  reasons, it should not be possible to unmount that mount in a less privileged mount namespace, since that would reveal the con‐
           tents of /etc/shadow.

           Suppose we now create a new mount namespace owned by a new user namespace.  The new mount namespace will inherit copies of all of the mounts from the  previ‐
           ous mount namespace.  However, those mounts will be locked because the new mount namespace is less privileged.  Consequently, an attempt to unmount the mount
           fails as shown in the following step:

               # unshare --user --map-root-user --mount \
                              strace -o /tmp/log \
                              umount /etc/shadow
               umount: /etc/shadow: not mounted.
               # grep '^umount' /tmp/log
               umount2("/etc/shadow", 0)     = -1 EINVAL (Invalid argument)

           The error message from umount(8) is a little confusing, but the strace(1) output reveals that the underlying umount2(2) system call failed with the error
           EINVAL, which is the error that the kernel returns to indicate that the mount is locked.

           Note, however, that it is possible to stack (and unstack) a mount on top of one of the inherited locked mounts in a less privileged mount namespace:

               # echo 'aaaaa' > /tmp/a    # File to mount onto /etc/shadow
               # unshare --user --map-root-user --mount \
                   sh -c 'mount --bind /tmp/a /etc/shadow; cat /etc/shadow'
               aaaaa
               # umount /etc/shadow

           The  final  umount(8)  command  above, which is performed in the initial mount namespace, makes the original /etc/shadow file once more visible in that name‐
           space.

       [4] Following on from point [3], note that it is possible to unmount an entire subtree of mounts that propagated as a unit into a  less  privileged  mount  name‐
           space, as illustrated in the following example.

           First,  we create new user and mount namespaces using unshare(1).  In the new mount namespace, the propagation type of all mounts is set to private.  We then
           create a shared bind mount at /mnt, and a small hierarchy of mounts underneath that mount.

               $ PS1='ns1# ' sudo unshare --user --map-root-user \
                                      --mount --propagation private bash
               ns1# echo $$        # We need the PID of this shell later
               778501
               ns1# mount --make-shared --bind /mnt /mnt
               ns1# mkdir /mnt/x
               ns1# mount --make-private -t tmpfs none /mnt/x
               ns1# mkdir /mnt/x/y
               ns1# mount --make-private -t tmpfs none /mnt/x/y
               ns1# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               986 83 8:5 /mnt /mnt rw,relatime shared:344
               989 986 0:56 / /mnt/x rw,relatime
               990 989 0:57 / /mnt/x/y rw,relatime

           Continuing in the same shell session, we then create a second shell in a new user namespace and a new (less privileged) mount namespace and check  the  state
           of the propagated mounts rooted at /mnt.

               ns1# PS1='ns2# ' unshare --user --map-root-user \
                                      --mount --propagation unchanged bash
               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime

           Of  note  in  the above output is that the propagation type of the mount /mnt has been reduced to slave, as explained in point [2].  This means that submount
           events will propagate from the master /mnt in "ns1", but propagation will not occur in the opposite direction.
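
           The propagation type can be read from the optional fields of /proc/[pid]/mountinfo (field 7 onward, up to the "-" separator; see proc(5)): a shared:N tag
           marks a shared mount, a master:N tag marks a slave mount, and a mount with no such tag is private.  As a rough sketch, the hypothetical helper below (the
           name propagation_types is an assumption, not part of any tool) labels each mount read from standard input.  It also accepts lines whose tail has been
           stripped by the sed(1) command used in these examples.

```shell
# propagation_types
# Read mountinfo-format lines on stdin; print each mount point followed
# by its propagation type(s), or "private" if no tag is present.
# (Hypothetical helper, for illustration only.)
propagation_types() {
    awk '{
        tags = ""
        # Optional fields start at field 7 and end at the "-" separator
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)          tags = tags " shared"
            else if ($i ~ /^master:/)     tags = tags " slave"
            else if ($i == "unbindable")  tags = tags " unbindable"
        }
        if (tags == "") tags = " private"
        print $5 tags
    }'
}
```

           A typical invocation would be propagation_types < /proc/self/mountinfo.  A mount that is both a slave and shared carries both tags and is reported with
           both labels.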

           From a separate terminal window, we then use nsenter(1) to enter the mount and user namespaces corresponding to "ns1".  In that terminal window, we then  re‐
           cursively bind mount /mnt/x at the location /mnt/ppp.

               $ PS1='ns3# ' sudo nsenter -t 778501 --user --mount
               ns3# mount --rbind --make-private /mnt/x /mnt/ppp
               ns3# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               986 83 8:5 /mnt /mnt rw,relatime shared:344
               989 986 0:56 / /mnt/x rw,relatime
               990 989 0:57 / /mnt/x/y rw,relatime
               1242 986 0:56 / /mnt/ppp rw,relatime
               1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518

           Because  the propagation type of the parent mount, /mnt, was shared, the recursive bind mount propagated a small subtree of mounts under the slave mount /mnt
           into "ns2", as can be verified by executing the following command in that shell session:

               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime
               1244 1239 0:56 / /mnt/ppp rw,relatime
               1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518

           While it is not possible to unmount a part of the propagated subtree (/mnt/ppp/y) in "ns2", it is possible to unmount the entire subtree,  as  shown  by  the
           following commands:

               ns2# umount /mnt/ppp/y
               umount: /mnt/ppp/y: not mounted.
               ns2# umount -l /mnt/ppp      # Succeeds...
               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime
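
           The extent of such a subtree can be worked out from the mount ID and parent ID fields (fields 1 and 2) of /proc/[pid]/mountinfo.  The following sketch
           uses a hypothetical helper, subtree_of, and assumes that each parent mount is listed before its children, as in the listings above.

```shell
# subtree_of MOUNTPOINT
# Read mountinfo-format lines on stdin; print MOUNTPOINT and every mount
# below it, following the parent-ID links (fields 1 and 2).
# Assumes parents are listed before their children.
# (Hypothetical helper, for illustration only.)
subtree_of() {
    awk -v top="$1" '
        $5 == top || ($2 in member) { member[$1] = 1; print $5 }
    '
}
```

           Run in "ns2" before the unmount as subtree_of /mnt/ppp < /proc/self/mountinfo, this would list /mnt/ppp and /mnt/ppp/y, the mounts that were detached
           together.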

       [5] The settings of the mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the "atime" flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME) become locked when
           propagated from a more privileged to a less privileged mount namespace, and may not be changed in the less privileged mount namespace.

           This point is illustrated in the following example where, in a more privileged mount namespace, we create a bind mount that is marked as read-only.  For  se‐
           curity reasons, it should not be possible to make the mount writable in a less privileged mount namespace, and indeed the kernel prevents this:

               $ sudo mkdir /mnt/dir
               $ sudo mount --bind -o ro /some/path /mnt/dir
               $ sudo unshare --user --map-root-user --mount \
                              mount -o remount,rw /mnt/dir
               mount: /mnt/dir: permission denied.
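
           Whether a mount is currently read-only can be checked from the per-mount options in field 6 of /proc/[pid]/mountinfo.  The following is a minimal sketch;
           the function name is_readonly is hypothetical, and the check looks only at the per-mount options, not at the superblock options that follow the "-"
           separator.

```shell
# is_readonly MOUNTPOINT
# Read mountinfo-format lines on stdin; succeed (exit status 0) if the
# last mount listed at MOUNTPOINT has "ro" among its per-mount options.
# (Hypothetical helper, for illustration only.)
is_readonly() {
    awk -v mp="$1" '
        $5 == mp { found = 1; ro = (("," $6 ",") ~ /,ro,/) }
        END      { exit !(found && ro) }
    '
}
```

           For example, is_readonly /mnt/dir < /proc/self/mountinfo should succeed both before and after the failed remount attempt shown above.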

       [6] A file or directory that is a mount point in one namespace but not in another may be renamed, unlinked, or removed (rmdir(2)) in the mount namespace in
           which it is not a mount point (subject to the usual permission checks).  Consequently, the mount point is removed in the mount namespace where it was a
           mount point.

           Previously (before Linux 3.18), attempting to unlink, rename, or remove a file or directory that was a mount point in another mount namespace would result in
           the error EBUSY.  That behavior had technical problems of enforcement (e.g., for NFS) and permitted denial-of-service attacks against more  privileged  users
           (i.e., preventing individual files from being updated by bind mounting on top of them).

EXAMPLES
       See pivot_root(2).

SEE ALSO
       unshare(1),  clone(2),  mount(2),  mount_setattr(2),  pivot_root(2),  setns(2),  umount(2),  unshare(2),  proc(5), namespaces(7), user_namespaces(7), findmnt(8),
       mount(8), pam_namespace(8), pivot_root(8), umount(8)

       Documentation/filesystems/sharedsubtree.txt in the kernel source tree.

Linux                                                                          2021-08-27                                                            MOUNT_NAMESPACES(7)