diff --git a/engine/security/rootless.md b/engine/security/rootless.md index 2fcf2d6d1c..61d94cd392 100644 --- a/engine/security/rootless.md +++ b/engine/security/rootless.md @@ -5,30 +5,34 @@ title: Run the Docker daemon as a non-root user (Rootless mode) --- Rootless mode allows running the Docker daemon and containers as a non-root -user, for the sake of mitigating potential vulnerabilities in the daemon and +user to mitigate potential vulnerabilities in the daemon and the container runtime. -Rootless mode does not require root privileges even for installation of the -Docker daemon, as long as [the prerequisites](#prerequisites) are satisfied. +Rootless mode does not require root privileges even during the installation of +the Docker daemon, as long as the [prerequisites](#prerequisites) are met. -Rootless mode was introduced in Docker Engine 19.03. +Rootless mode was introduced in Docker Engine v19.03. -> **Note**: -> Rootless mode is an experimental feature and has [limitations](#known-limitations). +> **Note** +> +> Rootless mode is an experimental feature and has some limitations. For details, +> see [Known limitations](#known-limitations). ## How it works + Rootless mode executes the Docker daemon and containers inside a user namespace. This is very similar to [`userns-remap` mode](userns-remap.md), except that -with `userns-remap` mode, the daemon itself is running with root privileges, whereas in -rootless mode, both the daemon and the container are running without root privileges. +with `userns-remap` mode, the daemon itself is running with root privileges, +whereas in rootless mode, both the daemon and the container are running without +root privileges. -Rootless mode does not use binaries with SETUID bits or file capabilities, +Rootless mode does not use binaries with `SETUID` bits or file capabilities, except `newuidmap` and `newgidmap`, which are needed to allow multiple UIDs/GIDs to be used in the user namespace. 
## Prerequisites -- `newuidmap` and `newgidmap` need to be installed on the host. These commands +- You must install `newuidmap` and `newgidmap` on the host. These commands are provided by the `uidmap` package on most distros. - `/etc/subuid` and `/etc/subgid` should contain at least 65,536 subordinate @@ -43,14 +47,15 @@ testuser $ grep ^$(whoami): /etc/subuid testuser:231072:65536 $ grep ^$(whoami): /etc/subgid -testuser::231072:65536 +testuser:231072:65536 ``` ### Distribution-specific hint -> Note: Using Ubuntu kernel is recommended. +> Note: We recommend that you use the Ubuntu kernel. #### Ubuntu + - No preparation is needed. - `overlay2` storage driver is enabled by default @@ -65,32 +70,37 @@ testuser::231072:65536 - To use the `overlay2` storage driver (recommended), run `sudo modprobe overlay permit_mounts_in_userns=1` ([Debian-specific kernel patch, introduced in Debian 10](https://salsa.debian.org/kernel-team/linux/blob/283390e7feb21b47779b48e0c8eb0cc409d2c815/debian/patches/debian/overlayfs-permit-mounts-in-userns.patch)). - Put the configuration to `/etc/modprobe.d` for persistence. + Add the configuration to `/etc/modprobe.d` for persistence. -- Known to work on Debian 9 and 10. +- Known to work on Debian 9 and 10. `overlay2` is only supported since Debian 10 and needs `modprobe` configuration described above. #### Arch Linux + - Add `kernel.unprivileged_userns_clone=1` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system` #### openSUSE + - `sudo modprobe ip_tables iptable_mangle iptable_nat iptable_filter` is required. This might be required on other distros as well depending on the configuration. - Known to work on openSUSE 15. #### Fedora 31 and later + - Fedora 31 uses cgroup v2 by default, which is not yet supported by the containerd runtime. Run `sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"` to use cgroup v1. -- `sudo dnf install -y iptables` might be needed. 
+- You might need `sudo dnf install -y iptables`. #### CentOS 8 -- `sudo dnf install -y iptables` might be needed. + +- You might need `sudo dnf install -y iptables`. #### CentOS 7 + - Add `user.max_user_namespaces=28633` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system`. @@ -98,13 +108,13 @@ testuser::231072:65536 Run the daemon directly without systemd: `dockerd-rootless.sh --experimental --storage-driver vfs` -- Known to work on CentOS 7.7. Older releases require extra configuration +- Known to work on CentOS 7.7. Older releases require additional configuration steps. - CentOS 7.6 and older releases require [COPR package `vbatts/shadow-utils-newxidmap`](https://copr.fedorainfracloud.org/coprs/vbatts/shadow-utils-newxidmap/) to be installed. - CentOS 7.5 and older releases require running - `sudo grubby --update-kernel=ALL --args="user_namespace.enable=1"` and reboot. + `sudo grubby --update-kernel=ALL --args="user_namespace.enable=1"` and a reboot following this. ## Known limitations @@ -116,11 +126,11 @@ testuser::231072:65536 - Checkpoint - Overlay network - Exposing SCTP ports -- To use `ping` command, see [Routing ping packets](#routing-ping-packets) -- To expose privileged TCP/UDP ports (< 1024), see [Exposing privileged ports](#exposing-privileged-ports) +- To use the `ping` command, see [Routing ping packets](#routing-ping-packets). +- To expose privileged TCP/UDP ports (< 1024), see [Exposing privileged ports](#exposing-privileged-ports). - `IPAddress` shown in `docker inspect` and is namespaced inside RootlessKit's network namespace. This means the IP address is not reachable from the host without `nsenter`-ing into the network namespace. -- Host network (`docker run --net=host`) is namespaced inside RootlessKit as well. +- Host network (`docker run --net=host`) is also namespaced inside RootlessKit. 
## Install @@ -131,9 +141,9 @@ $ curl -fsSL https://get.docker.com/rootless | sh ``` Make sure to run the script as a non-root user. -To install Rootless Docker as the root user, see [Manual installation](#manual-installation) steps. +To install Rootless Docker as the root user, see the [Manual installation](#manual-installation) steps. -The script will show the environment variables that are needed to be set: +The script shows the environment variables that you need to set: ```console $ curl -fsSL https://get.docker.com/rootless | sh @@ -153,16 +163,20 @@ export DOCKER_HOST=unix:///run/user/1001/docker.sock ``` ### Manual installation + To install the binaries manually without using the installer, extract `docker-rootless-extras-.tar.gz` along with `docker-.tar.gz`: from [https://download.docker.com/linux/static/stable/x86_64/](https://download.docker.com/linux/static/stable/x86_64/){: target="_blank" class="_" } -If you already have Docker daemon running as the root, you only need to extract `docker-rootless-extras-.tar.gz`. -The archive can be extracted under an arbitrary directory listed in the `$PATH`. e.g. `/usr/local/bin`, or `$HOME/bin`. +If you already have the Docker daemon running as root, you only need to +extract `docker-rootless-extras-.tar.gz`. The archive can be extracted +under any directory listed in `$PATH`, for example `/usr/local/bin` +or `$HOME/bin`. ### Nightly channel -To install a nightly version of Rootless Docker, execute the installation script with `CHANNEL="nightly"`: +To install a nightly version of Rootless Docker, run the installation script +using `CHANNEL="nightly"`: ```console $ curl -fsSL https://get.docker.com/rootless | CHANNEL="nightly" sh @@ -198,11 +212,13 @@ $ dockerd-rootless.sh --experimental --storage-driver vfs As Rootless mode is experimental, you need to run `dockerd-rootless.sh` with `--experimental`. -You also need `--storage-driver vfs` unless using Ubuntu or Debian 10 kernel. 
-You don't need to care about these flags if you manage the daemon using systemd, as -these flags are automatically added to the systemd unit file. + +You also need `--storage-driver vfs` unless you are using the Ubuntu or Debian 10 +kernel. You don't need to specify these flags if you manage the daemon using +systemd, as these flags are automatically added to the systemd unit file. Remarks about directory paths: + - The socket path is set to `$XDG_RUNTIME_DIR/docker.sock` by default. `$XDG_RUNTIME_DIR` is typically set to `/run/user/$UID`. - The data dir is set to `~/.local/share/docker` by default. @@ -211,8 +227,9 @@ Remarks about directory paths: used by the client) by default. Other remarks: + - The `dockerd-rootless.sh` script executes `dockerd` in its own user, mount, - and network namespaces. You can enter the namespaces by running + and network namespaces. You can enter the namespaces by running `nsenter -U --preserve-credentials -n -m -t $(cat $XDG_RUNTIME_DIR/docker.pid)`. - `docker info` shows `rootless` in `SecurityOptions` - `docker info` shows `none` as `Cgroup Driver` @@ -221,13 +238,15 @@ Other remarks: You need to specify the socket path explicitly. -To specify the socket path via `$DOCKER_HOST`: +To specify the socket path using `$DOCKER_HOST`: + ```console $ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock $ docker run -d -p 8080:80 nginx ``` -To specify the socket path via `docker context`: +To specify the socket path using `docker context`: + ```console $ docker context create rootless --description "for rootless mode" --docker "host=unix://$XDG_RUNTIME_DIR/docker.sock" rootless @@ -238,24 +257,24 @@ Current context is now "rootless" $ docker run -d -p 8080:80 nginx ``` -## Tips +## Best practices ### Rootless Docker in Docker -To run Rootless Docker inside "rootful" Docker, use `docker:-dind-rootless` -image instead of `docker:-dind` image. 
+To run Rootless Docker inside "rootful" Docker, use the `docker:-dind-rootless` +image instead of `docker:-dind`. ```console $ docker run -d --name dind-rootless --privileged docker:19.03-dind-rootless --experimental ``` -`docker:-dind-rootless` image runs as a non-root user (UID 1000). +The `docker:-dind-rootless` image runs as a non-root user (UID 1000). However, `--privileged` is required for disabling seccomp, AppArmor, and mount masks. -### Expose Docker API socket via TCP +### Expose Docker API socket through TCP -To expose the Docker API socket via TCP, you need to launch `dockerd-rootless.sh` +To expose the Docker API socket through TCP, you need to launch `dockerd-rootless.sh` with `DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS="-p 0.0.0.0:2376:2376/tcp"`. ```console @@ -265,9 +284,9 @@ $ DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS="-p 0.0.0.0:2376:2376/tcp" \ --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem ``` -### Expose Docker API socket via SSH +### Expose Docker API socket through SSH -To expose the Docker API socket via SSH, you need to make sure `$DOCKER_HOST` +To expose the Docker API socket through SSH, you need to make sure `$DOCKER_HOST` is set on the remote host. ```console @@ -299,22 +318,22 @@ Or add `net.ipv4.ip_unprivileged_port_start=0` to `/etc/sysctl.conf` (or In Docker 19.03, rootless mode ignores cgroup-related `docker run` flags such as `--cpus`, `--memory`, --pids-limit`. -However, traditional `ulimit` and [`cpulimit`](https://github.com/opsengine/cpulimit) -can be still used, though they work in process-granularity rather than in container-granularity, -and can be arbitrary disabled by the container process. +However, you can still use the traditional `ulimit` and [`cpulimit`](https://github.com/opsengine/cpulimit), +though they work in process-granularity rather than in container-granularity, +and can be arbitrarily disabled by the container process. -e.g. 
-- To limit CPU usage to 0.5 cores (akin to `docker run --cpus 0.5): +For example: + +- To limit CPU usage to 0.5 cores (similar to `docker run --cpus 0.5`): `docker run cpulimit --limit=50 --include-children ` - -- To limit max VSZ to 64MiB (akin to `docker run --memory 64m`): +- To limit max VSZ to 64MiB (similar to `docker run --memory 64m`): `docker run sh -c "ulimit -v 65536; "` - To limit max number of processes to 100 per namespaced UID 2000 - (akin to `docker run --pids-limit=100): + (similar to `docker run --pids-limit=100`): `docker run --user 2000 --ulimit nproc=100 ` -### Changing network stack +### Changing the network stack `dockerd-rootless.sh` uses [slirp4netns](https://github.com/rootless-containers/slirp4netns) (if installed) or [VPNKit](https://github.com/moby/vpnkit) as the network stack @@ -329,41 +348,42 @@ and set `$DOCKERD_ROOTLESS_ROOTLESSKIT_NET=lxc-user-nic`. ## Troubleshooting -### Troubles during starting the daemon -#### `[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: operation not permitted` +### Errors when starting the Docker daemon -This error happens mostly when the value of `/proc/sys/kernel/unprivileged_userns_clone ` is set to 0: +**[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: operation not permitted** + +This error occurs mostly when the value of `/proc/sys/kernel/unprivileged_userns_clone` is set to 0: ```console $ cat /proc/sys/kernel/unprivileged_userns_clone 0 ``` -To fix the issue, add `kernel.unprivileged_userns_clone=1` to +To fix this issue, add `kernel.unprivileged_userns_clone=1` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system`. 
-#### `[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: no space left on device` +**[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: no space left on device** -This error happens mostly when the value of `/proc/sys/user/max_user_namespaces` is too small: +This error occurs mostly when the value of `/proc/sys/user/max_user_namespaces` is too small: ```console $ cat /proc/sys/user/max_user_namespaces 0 ``` -To fix the issue, add `user.max_user_namespaces=28633` to +To fix this issue, add `user.max_user_namespaces=28633` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system`. -#### `[rootlesskit:parent] error: failed to setup UID/GID map: failed to compute uid/gid map: No subuid ranges found for user 1001 ("testuser")` +**[rootlesskit:parent] error: failed to setup UID/GID map: failed to compute uid/gid map: No subuid ranges found for user 1001 ("testuser")** -This error happens when `/etc/subuid` and `/etc/subgid` are not configured. +This error occurs when `/etc/subuid` and `/etc/subgid` are not configured. See [Prerequisites](#prerequisites). -See [Prerequisites](#prerequisites). +**could not get XDG_RUNTIME_DIR** -#### `could not get XDG_RUNTIME_DIR` -This error happens when `$XDG_RUNTIME_DIR` is not set. +This error occurs when `$XDG_RUNTIME_DIR` is not set. + +On a non-systemd host, you need to create a directory and then set the path: -On a non-systemd host, you need to create a directory and set the path by yourself: ```console $ export XDG_RUNTIME_DIR=$HOME/.docker/xrd $ rm -rf $XDG_RUNTIME_DIR @@ -372,14 +392,14 @@ $ dockerd-rootless.sh --experimental ``` > **Note**: -> You have to remove the directory on every logout. +> You must remove the directory every time you log out. -On a systemd host, login to the host via `pam_systemd` (see below). +On a systemd host, log in to the host using `pam_systemd` (see below). 
The value is automatically set to `/run/user/$UID` and cleaned up on every logout. -#### `systemctl --user` fails with `Failed to connect to bus: No such file or directory` +**`systemctl --user` fails with `Failed to connect to bus: No such file or directory`** -This error happens mostly when you switched from the root user to an non-root user with `sudo`: +This error occurs mostly when you switch from the root user to a non-root user with `sudo`: ```console # sudo -iu testuser @@ -387,53 +407,55 @@ $ systemctl --user start docker Failed to connect to bus: No such file or directory ``` -Instead of `sudo -iu `, you need to login via `pam_systemd`, e.g. -- Login via the graphic console +Instead of `sudo -iu `, you need to log in using `pam_systemd`. For example: + +- Log in through the graphical console - `ssh @localhost` - `machinectl shell @` -#### The daemon does not start up automatically +**The daemon does not start up automatically** You need `sudo loginctl enable-linger $(whoami)` to enable the daemon to start up automatically. See [Usage](#usage). #### `rootless mode is supported only when running in experimental mode` -This error happens when the daemon was launched without `--experimental`. +This error occurs when the daemon is launched without the `--experimental` flag. See [Usage](#usage). -### Troubles during `docker pull` -#### `docker: failed to register layer: Error processing tar file(exit status 1): lchown : invalid argument` +### `docker pull` errors -This error happens when the number of available entries in `/etc/subuid` or `/etc/subgid` is not sufficient. -The number of required entries vary across images, but having 65,536 entries is enough for most images. +**docker: failed to register layer: Error processing tar file(exit status 1): lchown <FILE>: invalid argument** -See [Prerequisites](#prerequisites). +This error occurs when the number of available entries in `/etc/subuid` or +`/etc/subgid` is not sufficient. 
The number of entries required varies across +images. However, 65,536 entries are sufficient for most images. See +[Prerequisites](#prerequisites). -### Errors during `docker run` +### `docker run` errors -#### `--cpus`, `--memory`, and `--pids-limit` are ignored +**`--cpus`, `--memory`, and `--pids-limit` are ignored** -Expected behavior in Docker 19.03. -See [Limiting resources](#limiting-resources). +This is expected behavior in Docker 19.03. For more information, see [Limiting resources](#limiting-resources). -#### `Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.` +**Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.** -This error happens mostly when the host is running with cgroup v2. -See [Fedora 31 or later](#fedora-31-or-later) to switch the host to use cgroup v1. +This error occurs mostly when the host is running with cgroup v2. See the section +[Fedora 31 and later](#fedora-31-and-later) for information on switching the host +to use cgroup v1. -### Networking +### Networking errors -#### `docker run -p` fails with `cannot expose privileged port ...` +**`docker run -p` fails with `cannot expose privileged port`** -`docker run -p` fails with this error when an privileged port (< 1024) is specified as the host port. +`docker run -p` fails with this error when a privileged port (< 1024) is specified as the host port. ```console $ docker run -p 80:80 nginx:alpine docker: Error response from daemon: driver failed programming external connectivity on endpoint focused_swanson (9e2e139a9d8fc92b37c36edfa6214a6e986fa2028c0cc359812f685173fa6df7): Error starting userland proxy: error while calling PortManager.AddPort(): cannot expose privileged port 80, you might need to add "net.ipv4.ip_unprivileged_port_start=0" (currently 1024) to /etc/sysctl.conf, or set CAP_NET_BIND_SERVICE on rootlesskit binary, or choose a larger port number (>= 1024): listen tcp 0.0.0.0:80: bind: permission denied. 
``` -When this error happened, consider using an unprivileged port instead, e.g. 8080 instead of 80. +If you see this error, consider using an unprivileged port instead, for example 8080 instead of 80. ```console $ docker run -p 8080:80 nginx:alpine @@ -441,7 +463,7 @@ $ docker run -p 8080:80 nginx:alpine To allow exposing privileged ports, see [Exposing privileged ports](#exposing-privileged-ports). -#### ping doesn't work +**ping doesn't work** Ping does not work when `/proc/sys/net/ipv4/ping_group_range` is set to `1 0`: @@ -450,14 +472,14 @@ $ cat /proc/sys/net/ipv4/ping_group_range 1 0 ``` -See [Routing ping packets](#routing-ping-packets). +For details, see [Routing ping packets](#routing-ping-packets). -#### `IPAddress` shown in `docker inspect` is unreachable +**`IPAddress` shown in `docker inspect` is unreachable** -Expected behavior, as the daemon is namespaced inside RootlessKit's network namespace. -Use `docker run -p` instead. +This is expected behavior, as the daemon is namespaced inside RootlessKit's +network namespace. Use `docker run -p` instead. -#### `--net=host` doesn't listen ports on the host network namespace +**`--net=host` doesn't listen on ports in the host network namespace** -Expected behavior, as the daemon is namespaced inside RootlessKit's network namespace. -Use `docker run -p` instead. +This is expected behavior, as the daemon is namespaced inside RootlessKit's +network namespace. Use `docker run -p` instead.