diff --git a/engine/security/rootless.md b/engine/security/rootless.md index 343c8a5d2d..2fcf2d6d1c 100644 --- a/engine/security/rootless.md +++ b/engine/security/rootless.md @@ -56,7 +56,7 @@ testuser::231072:65536 - `overlay2` storage driver is enabled by default ([Ubuntu-specific kernel patch](https://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/commit/fs/overlayfs?id=3b7da90f28fe1ed4b79ef2d994c81efbc58f1144)). -- Known to work on Ubuntu 16.04 and 18.04. +- Known to work on Ubuntu 16.04, 18.04, and 20.04. #### Debian GNU/Linux - Add `kernel.unprivileged_userns_clone=1` to `/etc/sysctl.conf` (or @@ -85,12 +85,10 @@ testuser::231072:65536 - Fedora 31 uses cgroup v2 by default, which is not yet supported by the containerd runtime. Run `sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"` to use cgroup v1. - -#### Fedora 30 -- No preparation is needed +- `sudo dnf install -y iptables` might be needed. #### CentOS 8 -- No preparation is needed +- `sudo dnf install -y iptables` might be needed. #### CentOS 7 - Add `user.max_user_namespaces=28633` to `/etc/sysctl.conf` (or @@ -112,17 +110,17 @@ testuser::231072:65536 - Only `vfs` graphdriver is supported. However, on Ubuntu and Debian 10, `overlay2` and `overlay` are also supported. - - Following features are not supported: - Cgroups (including `docker top`, which depends on the cgroups) - AppArmor - Checkpoint - Overlay network - Exposing SCTP ports - - To use `ping` command, see [Routing ping packets](#routing-ping-packets) - - To expose privileged TCP/UDP ports (< 1024), see [Exposing privileged ports](#exposing-privileged-ports) +- `IPAddress` shown in `docker inspect` and is namespaced inside RootlessKit's network namespace. + This means the IP address is not reachable from the host without `nsenter`-ing into the network namespace. +- Host network (`docker run --net=host`) is namespaced inside RootlessKit as well. ## Install @@ -133,6 +131,7 @@ $ curl -fsSL https://get.docker.com/rootless | sh ``` Make sure to run the script as a non-root user. +To install Rootless Docker as the root user, see [Manual installation](#manual-installation) steps. The script will show the environment variables that are needed to be set: @@ -153,10 +152,26 @@ export DOCKER_HOST=unix:///run/user/1001/docker.sock # ``` +### Manual installation To install the binaries manually without using the installer, extract `docker-rootless-extras-.tar.gz` along with `docker-.tar.gz`: from [https://download.docker.com/linux/static/stable/x86_64/](https://download.docker.com/linux/static/stable/x86_64/){: target="_blank" class="_" } +If you already have Docker daemon running as the root, you only need to extract `docker-rootless-extras-.tar.gz`. +The archive can be extracted under an arbitrary directory listed in the `$PATH`. e.g. `/usr/local/bin`, or `$HOME/bin`. + +### Nightly channel + +To install a nightly version of Rootless Docker, execute the installation script with `CHANNEL="nightly"`: + +```console +$ curl -fsSL https://get.docker.com/rootless | CHANNEL="nightly" sh +``` + +The raw binary archives are available at: +- https://master.dockerproject.org/linux/x86_64/docker-rootless-extras.tgz +- https://master.dockerproject.org/linux/x86_64/docker.tgz + ## Usage ### Daemon @@ -204,11 +219,23 @@ Other remarks: ### Client -You need to set the socket path explicitly. +You need to specify the socket path explicitly. +To specify the socket path via `$DOCKER_HOST`: ```console $ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock -$ docker run -d nginx +$ docker run -d -p 8080:80 nginx +``` + +To specify the socket path via `docker context`: +```console +$ docker context create rootless --description "for rootless mode" --docker "host=unix://$XDG_RUNTIME_DIR/docker.sock" +rootless +Successfully created context "rootless" +$ docker context use rootless +rootless +Current context is now "rootless" +$ docker run -d -p 8080:80 nginx ``` ## Tips @@ -238,9 +265,20 @@ $ DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS="-p 0.0.0.0:2376:2376/tcp" \ --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem ``` +### Expose Docker API socket via SSH + +To expose the Docker API socket via SSH, you need to make sure `$DOCKER_HOST` +is set on the remote host. + +```console +$ ssh -l 'echo $DOCKER_HOST' +unix:///run/user/1001/docker.sock +$ docker -H ssh://@ run ... +``` + ### Routing ping packets -`ping` command does not work by default. +On some distributions, `ping` does not work by default. Add `net.ipv4.ping_group_range = 0 2147483647` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system` to allow using `ping`. @@ -258,11 +296,23 @@ Or add `net.ipv4.ip_unprivileged_port_start=0` to `/etc/sysctl.conf` (or ### Limiting resources -Currently, rootless mode ignores cgroup-related `docker run` flags such as -`--cpus` and `memory`. +In Docker 19.03, rootless mode ignores cgroup-related `docker run` flags such as +`--cpus`, `--memory`, --pids-limit`. However, traditional `ulimit` and [`cpulimit`](https://github.com/opsengine/cpulimit) -can be still used, though they work in process-granularity rather than in container-granularity. +can be still used, though they work in process-granularity rather than in container-granularity, +and can be arbitrary disabled by the container process. + +e.g. +- To limit CPU usage to 0.5 cores (akin to `docker run --cpus 0.5): + `docker run cpulimit --limit=50 --include-children ` + +- To limit max VSZ to 64MiB (akin to `docker run --memory 64m`): + `docker run sh -c "ulimit -v 65536; "` + +- To limit max number of processes to 100 per namespaced UID 2000 + (akin to `docker run --pids-limit=100): + `docker run --user 2000 --ulimit nproc=100 ` ### Changing network stack @@ -271,8 +321,143 @@ can be still used, though they work in process-granularity rather than in contai by default. These network stacks run in userspace and might have performance overhead. -See [RootlessKit documentation](https://github.com/rootless-containers/rootlesskit/tree/v0.7.0#network-drivers) for further information. +See [RootlessKit documentation](https://github.com/rootless-containers/rootlesskit/tree/v0.9.5#network-drivers) for further information. Optionally, you can use `lxc-user-nic` instead for the best performance. -To use `lxc-user-nic`, you need to edit [`/etc/lxc/lxc-usernet`](https://github.com/rootless-containers/rootlesskit/tree/v0.7.0#--netlxc-user-nic-experimental) +To use `lxc-user-nic`, you need to edit [`/etc/lxc/lxc-usernet`](https://github.com/rootless-containers/rootlesskit/tree/v0.9.5#--netlxc-user-nic-experimental) and set `$DOCKERD_ROOTLESS_ROOTLESSKIT_NET=lxc-user-nic`. + +## Troubleshooting + +### Troubles during starting the daemon +#### `[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: operation not permitted` + +This error happens mostly when the value of `/proc/sys/kernel/unprivileged_userns_clone ` is set to 0: + +```console +$ cat /proc/sys/kernel/unprivileged_userns_clone +0 +``` + +To fix the issue, add `kernel.unprivileged_userns_clone=1` to +`/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system`. + +#### `[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: no space left on device` + +This error happens mostly when the value of `/proc/sys/user/max_user_namespaces` is too small: + +```console +$ cat /proc/sys/user/max_user_namespaces +0 +``` + +To fix the issue, add `user.max_user_namespaces=28633` to +`/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system`. + +#### `[rootlesskit:parent] error: failed to setup UID/GID map: failed to compute uid/gid map: No subuid ranges found for user 1001 ("testuser")` + +This error happens when `/etc/subuid` and `/etc/subgid` are not configured. + +See [Prerequisites](#prerequisites). + +#### `could not get XDG_RUNTIME_DIR` +This error happens when `$XDG_RUNTIME_DIR` is not set. + +On a non-systemd host, you need to create a directory and set the path by yourself: +```console +$ export XDG_RUNTIME_DIR=$HOME/.docker/xrd +$ rm -rf $XDG_RUNTIME_DIR +$ mkdir -p $XDG_RUNTIME_DIR +$ dockerd-rootless.sh --experimental +``` + +> **Note**: +> You have to remove the directory on every logout. + +On a systemd host, login to the host via `pam_systemd` (see below). +The value is automatically set to `/run/user/$UID` and cleaned up on every logout. + +#### `systemctl --user` fails with `Failed to connect to bus: No such file or directory` + +This error happens mostly when you switched from the root user to an non-root user with `sudo`: + +```console +# sudo -iu testuser +$ systemctl --user start docker +Failed to connect to bus: No such file or directory +``` + +Instead of `sudo -iu `, you need to login via `pam_systemd`, e.g. +- Login via the graphic console +- `ssh @localhost` +- `machinectl shell @` + +#### The daemon does not start up automatically + +You need `sudo loginctl enable-linger $(whoami)` to enable the daemon to start +up automatically. See [Usage](#usage). + +#### `rootless mode is supported only when running in experimental mode` + +This error happens when the daemon was launched without `--experimental`. +See [Usage](#usage). + +### Troubles during `docker pull` +#### `docker: failed to register layer: Error processing tar file(exit status 1): lchown : invalid argument` + +This error happens when the number of available entries in `/etc/subuid` or `/etc/subgid` is not sufficient. +The number of required entries vary across images, but having 65,536 entries is enough for most images. + +See [Prerequisites](#prerequisites). + +### Errors during `docker run` + +#### `--cpus`, `--memory`, and `--pids-limit` are ignored + +Expected behavior in Docker 19.03. +See [Limiting resources](#limiting-resources). + +#### `Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.` + +This error happens mostly when the host is running with cgroup v2. +See [Fedora 31 or later](#fedora-31-or-later) to switch the host to use cgroup v1. + +### Networking + +#### `docker run -p` fails with `cannot expose privileged port ...` + +`docker run -p` fails with this error when an privileged port (< 1024) is specified as the host port. + +```console +$ docker run -p 80:80 nginx:alpine +docker: Error response from daemon: driver failed programming external connectivity on endpoint focused_swanson (9e2e139a9d8fc92b37c36edfa6214a6e986fa2028c0cc359812f685173fa6df7): Error starting userland proxy: error while calling PortManager.AddPort(): cannot expose privileged port 80, you might need to add "net.ipv4.ip_unprivileged_port_start=0" (currently 1024) to /etc/sysctl.conf, or set CAP_NET_BIND_SERVICE on rootlesskit binary, or choose a larger port number (>= 1024): listen tcp 0.0.0.0:80: bind: permission denied. +``` + +When this error happened, consider using an unprivileged port instead, e.g. 8080 instead of 80. + +```console +$ docker run -p 8080:80 nginx:alpine +``` + +To allow exposing privileged ports, see [Exposing privileged ports](#exposing-privileged-ports). + +#### ping doesn't work + +Ping does not work when `/proc/sys/net/ipv4/ping_group_range` is set to `1 0`: + +```console +$ cat /proc/sys/net/ipv4/ping_group_range +1 0 +``` + +See [Routing ping packets](#routing-ping-packets). + +#### `IPAddress` shown in `docker inspect` is unreachable + +Expected behavior, as the daemon is namespaced inside RootlessKit's network namespace. +Use `docker run -p` instead. + +#### `--net=host` doesn't listen ports on the host network namespace + +Expected behavior, as the daemon is namespaced inside RootlessKit's network namespace. +Use `docker run -p` instead.