From 1976c2178c6317400fdbdc0933570c5ef9248ab6 Mon Sep 17 00:00:00 2001
From: Akihiro Suda
Date: Fri, 26 Jun 2020 15:24:49 +0900
Subject: [PATCH] v20.10 docs for cgroup v2 and rootless

* Docker now supports cgroup v2 (both rootful and rootless)
* Rootless mode graduated from experimental
* New storage driver: fuse-overlayfs

Signed-off-by: Akihiro Suda
---
 config/containers/runmetrics.md               | 58 ++++++++++-
 engine/install/fedora.md                      | 13 +--
 engine/security/rootless.md                   | 99 ++++++++++++-------
 storage/storagedriver/overlayfs-driver.md     |  2 +
 .../storagedriver/select-storage-driver.md    | 10 ++
 5 files changed, 134 insertions(+), 48 deletions(-)

diff --git a/config/containers/runmetrics.md b/config/containers/runmetrics.md
index 0c16f0cb79..8772d1ccc9 100644
--- a/config/containers/runmetrics.md
+++ b/config/containers/runmetrics.md
@@ -55,6 +55,18 @@ $ grep cgroup /proc/mounts
 
 ### Enumerate cgroups
 
+The file layout of cgroups is significantly different between v1 and v2.
+
+If `/sys/fs/cgroup/cgroup.controllers` is present on your system, you are using v2,
+otherwise you are using v1.
+Refer to the subsection that corresponds to your cgroup version.
+
+> **Note**
+>
+> As of 2020, Fedora is the only well-known Linux distribution that uses cgroup v2 by default.
+> Fedora has used cgroup v2 by default since Fedora 31.
+
+#### cgroup v1
 You can look into `/proc/cgroups` to see the different control group subsystems
 known to the system, the hierarchy they belong to, and how many groups they contain.
 
@@ -64,6 +76,41 @@ the hierarchy mountpoint. `/` means the process has not been assigned to a
 group, while `/lxc/pumpkin` indicates that the process is a member of a
 container named `pumpkin`.
 
+#### cgroup v2
+
+On cgroup v2 hosts, the content of `/proc/cgroups` isn't meaningful.
+See `/sys/fs/cgroup/cgroup.controllers` to see the available controllers.
+
+### Changing cgroup version
+
+Changing the cgroup version requires rebooting the entire system.
+
+On systemd-based systems, cgroup v2 can be enabled by adding `systemd.unified_cgroup_hierarchy=1`
+to the kernel cmdline.
+To revert the cgroup version to v1, you need to set `systemd.unified_cgroup_hierarchy=0` instead.
+
+If the `grubby` command is available on your system (e.g. on Fedora), the cmdline can be modified as follows:
+
+```console
+$ sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
+```
+
+If the `grubby` command is not available, edit the `GRUB_CMDLINE_LINUX` line in `/etc/default/grub`
+and run `sudo update-grub`.
+
+### Running Docker on cgroup v2
+
+Docker has supported cgroup v2 experimentally since Docker 20.10.
+Running Docker on cgroup v2 also requires the following conditions to be satisfied:
+* containerd: v1.4 or later
+* runc: v1.0.0-rc91 or later
+* Kernel: v4.15 or later (v5.2 or later is recommended)
+
+Note that the cgroup v2 mode behaves slightly differently from the cgroup v1 mode:
+* The default cgroup driver (`dockerd --exec-opt native.cgroupdriver`) is "systemd" on v2, "cgroupfs" on v1.
+* The default cgroup namespace mode (`docker run --cgroupns`) is "private" on v2, "host" on v1.
+* The `docker run` flags `--oom-kill-disable` and `--kernel-memory` are discarded on v2.
+
 ### Find the cgroup for a given container
 
 For each container, one cgroup is created in each hierarchy. On
@@ -78,10 +125,19 @@ in `docker ps`, its long ID might be something like
 look it up with `docker inspect` or `docker ps --no-trunc`.
 
 Putting everything together to look at the memory metrics for a Docker
-container, take a look at `/sys/fs/cgroup/memory/docker//`.
+container, take a look at the following paths:
+- `/sys/fs/cgroup/memory/docker//` on cgroup v1, `cgroupfs` driver
+- `/sys/fs/cgroup/memory/system.slice/docker-.scope/` on cgroup v1, `systemd` driver
+- `/sys/fs/cgroup/docker/` on cgroup v2, `cgroupfs` driver
+- `/sys/fs/cgroup/system.slice/docker-.scope/` on cgroup v2, `systemd` driver
 
 ### Metrics from cgroups: memory, CPU, block I/O
 
+> **Note**
+>
+> This section is not yet updated for cgroup v2.
+> For further information about cgroup v2, refer to [the kernel documentation](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html).
+
 For each subsystem (memory, CPU, and block I/O), one or more pseudo-files
 exist and contain statistics.
 
diff --git a/engine/install/fedora.md b/engine/install/fedora.md
index 2f2d4f165b..0a7acae4db 100644
--- a/engine/install/fedora.md
+++ b/engine/install/fedora.md
@@ -160,22 +160,13 @@ $ sudo dnf config-manager \
     Docker is installed but not started. The `docker` group is created, but no users
     are added to the group.
 
-3. Cgroups Exception:
-    For Fedora 31 and higher, you need to enable the [backward compatibility for Cgroups](https://fedoraproject.org/wiki/Common_F31_bugs#Other_software_issues).
-
-    ```bash
-    $ sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
-    ```
-
-    After running the command, you must reboot for the changes to take effect.
-
-4. Start Docker.
+3. Start Docker.
 
     ```bash
     $ sudo systemctl start docker
     ```
 
-5. Verify that Docker Engine is installed correctly by running the `hello-world`
+4. Verify that Docker Engine is installed correctly by running the `hello-world`
     image.
 
     ```bash
diff --git a/engine/security/rootless.md b/engine/security/rootless.md
index 543c3e0f6f..e0645f32e7 100644
--- a/engine/security/rootless.md
+++ b/engine/security/rootless.md
@@ -11,12 +11,8 @@ the container runtime.
 Rootless mode does not require root privileges even during the installation of
 the Docker daemon, as long as the [prerequisites](#prerequisites) are met.
 
-Rootless mode was introduced in Docker Engine v19.03.
-
-> **Note**
->
-> Rootless mode is an experimental feature and has some limitations. For details,
-> see [Known limitations](#known-limitations).
+Rootless mode was introduced in Docker Engine v19.03 as an experimental feature.
+Rootless mode graduated from experimental in Docker Engine v20.10.
 
 ## How it works
 
@@ -78,35 +74,35 @@ testuser:231072:65536
 
 #### Arch Linux
 
+- Installing `fuse-overlayfs` is recommended. Run `sudo pacman -S fuse-overlayfs`.
+
 - Add `kernel.unprivileged_userns_clone=1` to `/etc/sysctl.conf` (or
   `/etc/sysctl.d`) and run `sudo sysctl --system`
 
 #### openSUSE
 
+- Installing `fuse-overlayfs` is recommended. Run `sudo zypper install -y fuse-overlayfs`.
+
 - `sudo modprobe ip_tables iptable_mangle iptable_nat iptable_filter` is required.
   This might be required on other distros as well depending on the configuration.
 
 - Known to work on openSUSE 15.
 
-#### Fedora 31 and later
+#### CentOS 8 and Fedora
 
-- Fedora 31 uses cgroup v2 by default, which is not yet supported by the containerd runtime.
-  Run `sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"`
-  to use cgroup v1.
-- You might need `sudo dnf install -y iptables`.
-
-#### CentOS 8
+- Installing `fuse-overlayfs` is recommended. Run `sudo dnf install -y fuse-overlayfs`.
 
 - You might need `sudo dnf install -y iptables`.
 
+- Known to work on CentOS 8 and Fedora 32.
+
 #### CentOS 7
 
 - Add `user.max_user_namespaces=28633` to `/etc/sysctl.conf` (or
   `/etc/sysctl.d`) and run `sudo sysctl --system`.
 
 - `systemctl --user` does not work by default.
-  Run the daemon directly without systemd:
-  `dockerd-rootless.sh --experimental --storage-driver vfs`
+  Run `dockerd-rootless.sh` directly without systemd.
 
 - Known to work on CentOS 7.7.
   Older releases require additional configuration steps.
@@ -118,10 +114,12 @@ testuser:231072:65536
 
 ## Known limitations
 
-- Only `vfs` graphdriver is supported. However, on Ubuntu and Debian 10,
-  `overlay2` and `overlay` are also supported.
+- Only the following storage drivers are supported:
+  - `overlay2` (only on Ubuntu and Debian 10 hosts)
+  - `fuse-overlayfs` (only if running with kernel 4.18 or later, and `fuse-overlayfs` is installed)
+  - `vfs`
+- Cgroup is supported only when running with cgroup v2 and systemd. See [Limiting resources](#limiting-resources).
 - Following features are not supported:
-  - Cgroups (including `docker top`, which depends on the cgroups)
   - AppArmor
   - Checkpoint
   - Overlay network
@@ -206,16 +204,8 @@ $ sudo loginctl enable-linger $(whoami)
 To run the daemon directly without systemd, you need to run
 `dockerd-rootless.sh` instead of `dockerd`:
 
-```console
-$ dockerd-rootless.sh --experimental --storage-driver vfs
-```
-
-As Rootless mode is experimental, you need to run
-`dockerd-rootless.sh` with `--experimental`.
-
-You also need `--storage-driver vfs` unless you are using Ubuntu or Debian 10
-kernel. You don't need to care about these flags if you manage the daemon using
-systemd, as these flags are automatically added to the systemd unit file.
+On Docker 19.03, you had to run `dockerd-rootless.sh` with `--experimental`.
+The `--experimental` flag is no longer needed since Docker 20.10.
 
 Remarks about directory paths:
 
@@ -232,7 +222,6 @@ Other remarks:
   and network namespaces. You can enter the namespaces by running
   `nsenter -U --preserve-credentials -n -m -t $(cat $XDG_RUNTIME_DIR/docker.pid)`.
 - `docker info` shows `rootless` in `SecurityOptions`
-- `docker info` shows `none` as `Cgroup Driver`
 
 ### Client
 
@@ -265,13 +254,19 @@ To run Rootless Docker inside "rootful" Docker, use the `docker:-dind-r
 image instead of `docker:-dind`.
 
 ```console
-$ docker run -d --name dind-rootless --privileged docker:19.03-dind-rootless --experimental
+$ docker run -d --name dind-rootless --privileged docker:20.10-dind-rootless
 ```
 
 The `docker:-dind-rootless` image runs as a non-root user (UID 1000).
 However, `--privileged` is required for disabling seccomp, AppArmor, and mount
 masks.
 
+To run Docker 19.03 in Docker, the `--experimental` flag is needed:
+
+```console
+$ docker run -d --name dind-rootless --privileged docker:19.03-dind-rootless --experimental
+```
+
 ### Expose Docker API socket through TCP
 
 To expose the Docker API socket through TCP, you need to launch `dockerd-rootless.sh`
@@ -314,11 +309,39 @@ Or add `net.ipv4.ip_unprivileged_port_start=0` to `/etc/sysctl.conf` (or
 `/etc/sysctl.d`) and run `sudo sysctl --system`.
 
 ### Limiting resources
+Limiting resources with cgroup-related `docker run` flags such as `--cpus`, `--memory`, and `--pids-limit`
+is supported only when running with cgroup v2 and systemd.
+See [Changing cgroup version](../../config/containers/runmetrics.md) to enable cgroup v2.
 
-In Docker 19.03, rootless mode ignores cgroup-related `docker run` flags such as
-`--cpus`, `--memory`, `--pids-limit`.
+If `docker info` shows `none` as `Cgroup Driver`, the conditions are not satisfied.
+When these conditions are not satisfied, rootless mode ignores the cgroup-related `docker run` flags.
+See [Limiting resources without cgroup](#limiting-resources-without-cgroup) for workarounds.
 
-However, you can still use the traditional `ulimit` and [`cpulimit`](https://github.com/opsengine/cpulimit),
+If `docker info` shows `systemd` as `Cgroup Driver`, the conditions are satisfied.
+However, typically, only the `memory` and `pids` controllers are delegated to non-root users by default.
+
+```console
+$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
+memory pids
+```
+
+To allow delegation of all controllers, you need to change the systemd configuration as follows:
+
+```console
+# mkdir -p /etc/systemd/system/user@.service.d
+# cat > /etc/systemd/system/user@.service.d/delegate.conf << EOF
+[Service]
+Delegate=cpu cpuset io memory pids
+EOF
+# systemctl daemon-reload
+```
+
+> **Note**
+>
+> Delegating `cpuset` requires systemd 244 or later.
+
+#### Limiting resources without cgroup
+Even when cgroup is not available, you can still use the traditional `ulimit` and [`cpulimit`](https://github.com/opsengine/cpulimit),
 though they work in process-granularity rather than in container-granularity,
 and can be arbitrarily disabled by the container process.
 
@@ -388,7 +411,7 @@ On a non-systemd host, you need to create a directory and then set the path:
 $ export XDG_RUNTIME_DIR=$HOME/.docker/xrd
 $ rm -rf $XDG_RUNTIME_DIR
 $ mkdir -p $XDG_RUNTIME_DIR
-$ dockerd-rootless.sh --experimental
+$ dockerd-rootless.sh
 ```
 
 > **Note**:
@@ -420,9 +443,11 @@ up automatically. See [Usage](#usage).
 
 **`dockerd` fails with "rootless mode is supported only when running in experimental mode"**
 
-This error occurs when the daemon is launched without the `--experimental` flag.
+This error occurs when the daemon is launched without the `--experimental` flag on Docker 19.03.
 See [Usage](#usage).
 
+The `--experimental` flag is no longer needed since Docker 20.10.
+
 ### `docker pull` errors
 
 **docker: failed to register layer: Error processing tar file(exit status 1): lchown <FILE>: invalid argument**
@@ -436,7 +461,9 @@ images. However, 65,536 entries are sufficient for most images. See
 
 **`--cpus`, `--memory`, and `--pids-limit` are ignored**
 
-This is an expected behavior in Docker 19.03. For more information, see [Limiting resources](#limiting-resources).
+This is expected behavior in cgroup v1 mode.
+To use these flags, the host needs to be configured to enable cgroup v2.
+For more information, see [Limiting resources](#limiting-resources).
 
 **Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.**
 
diff --git a/storage/storagedriver/overlayfs-driver.md b/storage/storagedriver/overlayfs-driver.md
index 6b4031d7bd..3ea00b1fbd 100644
--- a/storage/storagedriver/overlayfs-driver.md
+++ b/storage/storagedriver/overlayfs-driver.md
@@ -21,6 +21,8 @@ storage driver as `overlay` or `overlay2`.
 > For more information about differences between `overlay` vs `overlay2`, check
 > [Docker storage drivers](select-storage-driver.md).
 
+> **Note**: For the `fuse-overlayfs` driver, check the [Rootless mode documentation](../../engine/security/rootless.md).
+
 ## Prerequisites
 
 OverlayFS is the recommended storage driver, and supported if you meet the following
diff --git a/storage/storagedriver/select-storage-driver.md b/storage/storagedriver/select-storage-driver.md
index 16a1e5ae31..0aa857d4ee 100644
--- a/storage/storagedriver/select-storage-driver.md
+++ b/storage/storagedriver/select-storage-driver.md
@@ -34,6 +34,11 @@ Docker supports the following storage drivers:
   Linux distributions, and requires no extra configuration.
 * `aufs` was the preferred storage driver for Docker 18.06 and older, when
   running on Ubuntu 14.04 on kernel 3.13 which had no support for `overlay2`.
+* `fuse-overlayfs` is preferred only for running Rootless Docker
+  on a host that does not provide support for rootless `overlay2`.
+  On Ubuntu and Debian 10, the `fuse-overlayfs` driver does not need to be
+  used, because `overlay2` works even in rootless mode.
+  See the [Rootless mode documentation](../../engine/security/rootless.md).
 * `devicemapper` is supported, but requires `direct-lvm` for production
   environments, because `loopback-lvm`, while zero-configuration, has very
   poor performance. `devicemapper` was the recommended storage driver for
@@ -98,6 +103,10 @@ release. It is recommended that users of the `overlay` storage driver migrate to
 release. It is recommended that users of the `devicemapper` storage driver migrate to
 `overlay2`.
 
+> **Note**
+>
+> The comparison table above does not apply to Rootless mode.
+> For the drivers available in Rootless mode, see [the Rootless mode documentation](../../engine/security/rootless.md).
 
 When possible, `overlay2` is the recommended storage driver. When installing
 Docker for the first time, `overlay2` is used by default. Previously, `aufs` was
@@ -147,6 +156,7 @@ backing filesystems.
 | Storage driver        | Supported backing filesystems |
 |:----------------------|:------------------------------|
 | `overlay2`, `overlay` | `xfs` with ftype=1, `ext4`    |
+| `fuse-overlayfs`      | any filesystem                |
 | `aufs`                | `xfs`, `ext4`                 |
 | `devicemapper`        | `direct-lvm`                  |
 | `btrfs`               | `btrfs`                       |
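
The detection rule that this patch adds to `runmetrics.md` (cgroup v2 if `/sys/fs/cgroup/cgroup.controllers` exists, cgroup v1 otherwise) could be sketched as a small shell function. This is only an illustrative sketch: the function name `cgroup_version` and its optional directory argument are assumptions made for this example and are not part of Docker or of this patch.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Docker): report the cgroup version
# using the rule from runmetrics.md. The optional first argument is an
# assumption added here so the check can be pointed at a scratch directory;
# it defaults to the usual cgroup mountpoint.
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        # cgroup.controllers exists at the hierarchy root: unified hierarchy.
        echo "v2"
    else
        echo "v1"
    fi
}
```

Calling `cgroup_version` with no argument checks the real `/sys/fs/cgroup` mount. On a v2 host, `cat /sys/fs/cgroup/cgroup.controllers` then lists the available controllers, as the patch describes.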