---
description: Learn how to optimize your use of Btrfs driver.
keywords: container, storage, driver, Btrfs
title: BTRFS storage driver
aliases:
- /storage/storagedriver/btrfs-driver/
---
Btrfs is a copy-on-write filesystem that supports many advanced storage
technologies, making it a good fit for Docker. Btrfs is included in the
mainline Linux kernel.

Docker's `btrfs` storage driver leverages many Btrfs features for image and
container management. Among these features are block-level operations, thin
provisioning, copy-on-write snapshots, and ease of administration. You can
combine multiple physical block devices into a single Btrfs filesystem.

This page refers to Docker's Btrfs storage driver as `btrfs` and the overall
Btrfs filesystem as Btrfs.

> [!NOTE]
>
> The `btrfs` storage driver is only supported with Docker Engine CE on SLES,
> Ubuntu, and Debian systems.
## Prerequisites
`btrfs` is supported if you meet the following prerequisites:
- `btrfs` is only recommended with Docker CE on Ubuntu or Debian systems.
- Changing the storage driver makes any containers you have already
created inaccessible on the local system. Use `docker save` to save images,
and push existing images to Docker Hub or a private repository, so that you
do not need to re-create them later.
- `btrfs` requires a dedicated block storage device such as a physical disk. This
block device must be formatted for Btrfs and mounted into `/var/lib/docker/`.
The configuration instructions below walk you through this procedure. By
default, the SLES `/` filesystem is formatted with Btrfs, so for SLES, you do
not need to use a separate block device, but you can choose to do so for
performance reasons.
- `btrfs` support must exist in your kernel. To check this, run the following
command:
```console
$ grep btrfs /proc/filesystems
btrfs
```
- To manage Btrfs filesystems at the level of the operating system, you need the
`btrfs` command. If you don't have this command, install the `btrfsprogs`
package (SLES) or `btrfs-tools` package (Ubuntu).
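
The kernel and tooling checks above can be scripted. The following is a minimal sketch, not an official Docker tool: the function names `fs_supported` and `btrfs_cli_available` are illustrative, and the optional file argument exists only so the check can be exercised against a sample file.

```bash
#!/usr/bin/env bash
# Sketch: check the kernel and userspace prerequisites described above.
# The function names are illustrative, not part of Docker or btrfs-progs.

# Succeeds if filesystem type "$1" is listed in a /proc/filesystems-style file.
# The optional second argument overrides the file (useful for testing).
fs_supported() {
  awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
    "${2:-/proc/filesystems}"
}

# Succeeds if the btrfs userspace command is available on PATH.
btrfs_cli_available() {
  command -v btrfs >/dev/null 2>&1
}
```

For example, `fs_supported btrfs && btrfs_cli_available && echo "prerequisites met"` prints a message only when both checks pass.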
## Configure Docker to use the btrfs storage driver
This procedure is essentially identical on SLES and Ubuntu.
1. Stop Docker.
2. Copy the contents of `/var/lib/docker/` to a backup location, then empty
the contents of `/var/lib/docker/`:
```console
$ sudo cp -au /var/lib/docker /var/lib/docker.bk
$ sudo rm -rf /var/lib/docker/*
```
3. Format your dedicated block device or devices as a Btrfs filesystem. This
example assumes that you are using two block devices called `/dev/xvdf` and
`/dev/xvdg`. Double-check the block device names because this is a
destructive operation.
```console
$ sudo mkfs.btrfs -f /dev/xvdf /dev/xvdg
```
There are many more options for Btrfs, including striping and RAID. See the
[Btrfs documentation](https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices).
4. Mount the new Btrfs filesystem on the `/var/lib/docker/` mount point. You
can specify any of the block devices used to create the Btrfs filesystem.
```console
$ sudo mount -t btrfs /dev/xvdf /var/lib/docker
```
> [!NOTE]
>
> Make the change permanent across reboots by adding an entry to
> `/etc/fstab`.
5. Copy the contents of `/var/lib/docker.bk` to `/var/lib/docker/`.
```console
$ sudo cp -au /var/lib/docker.bk/* /var/lib/docker/
```
6. Configure Docker to use the `btrfs` storage driver. This is required even
though `/var/lib/docker/` is now using a Btrfs filesystem.
Edit or create the file `/etc/docker/daemon.json`. If it is a new file, add
the following contents. If it is an existing file, add the key and value
only, being careful to end the line with a comma if it isn't the final
line before an ending curly bracket (`}`).
```json
{
"storage-driver": "btrfs"
}
```
See all storage options for each storage driver in the
[daemon reference documentation](/reference/cli/dockerd/#options-per-storage-driver).
7. Start Docker. When it's running, verify that `btrfs` is being used as the
storage driver.
```console
$ docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 17.03.1-ce
Storage Driver: btrfs
Build Version: Btrfs v4.4
Library Version: 101
<...>
```
8. When you are ready, remove the `/var/lib/docker.bk` directory.
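
Step 4 recommends adding an `/etc/fstab` entry so the mount survives reboots. The following is a minimal sketch of how such an entry could be generated. It assumes you mount by filesystem UUID with default options. `fstab_entry` is an illustrative helper, not a standard tool.

```bash
#!/usr/bin/env bash
# Sketch: build an /etc/fstab line for the Btrfs filesystem mounted in step 4.
# fstab_entry is an illustrative helper name, not a standard tool.

# Prints an fstab line for the filesystem with UUID "$1" mounted at "$2".
# The last two fields are 0: no dump backup and no boot-time fsck pass.
fstab_entry() {
  printf 'UUID=%s %s btrfs defaults 0 0\n' "$1" "$2"
}
```

One possible use is `fstab_entry "$(sudo blkid -s UUID -o value /dev/xvdf)" /var/lib/docker | sudo tee -a /etc/fstab`. Review the resulting line before rebooting. After Docker starts, `docker info --format '{{.Driver}}'` is a shorter way to print only the active storage driver.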
## Manage a Btrfs volume
One of the benefits of Btrfs is the ease of managing Btrfs filesystems without
the need to unmount the filesystem or restart Docker.

When space gets low, Btrfs automatically expands the volume in chunks of
roughly 1 GB.

To add a block device to a Btrfs volume, use the `btrfs device add` and
`btrfs filesystem balance` commands.
```console
$ sudo btrfs device add /dev/xvdh /var/lib/docker
$ sudo btrfs filesystem balance /var/lib/docker
```
> [!NOTE]
>
> While you can do these operations with Docker running, performance suffers.
> It might be best to plan an outage window to balance the Btrfs filesystem.
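
After adding a device, it can help to confirm which devices the filesystem now spans. The following is a minimal sketch. It assumes that `btrfs filesystem show` prints one line per member device that starts with `devid` and ends with the device path. `list_btrfs_devices` is an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch: extract member device paths from `btrfs filesystem show` output.
# Assumption: each member device is reported on a line whose first field is
# "devid" and whose last field is the device path.

# Reads `btrfs filesystem show` output on stdin, prints one device per line.
list_btrfs_devices() {
  awk '$1 == "devid" { print $NF }'
}
```

For example, `sudo btrfs filesystem show /var/lib/docker | list_btrfs_devices` should list the newly added device alongside the original ones.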
## How the `btrfs` storage driver works
The `btrfs` storage driver works differently from other
storage drivers in that your entire `/var/lib/docker/` directory is stored on a
Btrfs volume.
### Image and container layers on-disk
Information about image layers and writable container layers is stored in
`/var/lib/docker/btrfs/subvolumes/`. This subdirectory contains one directory
per image or container layer, with the unified filesystem built from a layer
plus all its parent layers. Subvolumes are natively copy-on-write and have space
allocated to them on-demand from an underlying storage pool. They can also be
nested and snapshotted. The diagram below shows 4 subvolumes. 'Subvolume 2' and
'Subvolume 3' are nested, whereas 'Subvolume 4' shows its own internal directory
tree.

![Subvolume example](images/btfs_subvolume.webp?w=350&h=100)

Only the base layer of an image is stored as a true subvolume. All the other
layers are stored as snapshots, which only contain the differences introduced
in that layer. You can create snapshots of snapshots as shown in the diagram
below.

![Snapshots diagram](images/btfs_snapshots.webp?w=350&h=100)

On disk, snapshots look and feel just like subvolumes, but in reality they are
much smaller and more space-efficient. Copy-on-write is used to maximize storage
efficiency and minimize layer size, and writes in the container's writable layer
are managed at the block level. The following image shows a subvolume and its
snapshot sharing data.

![Snapshot and subvolume sharing data](images/btfs_pool.webp?w=450&h=200)

For maximum efficiency, when a container needs more space, it is allocated in
chunks of roughly 1 GB in size.

Docker's `btrfs` storage driver stores every image layer and container in its
own Btrfs subvolume or snapshot. The base layer of an image is stored as a
subvolume whereas child image layers and containers are stored as snapshots.
This is shown in the diagram below.

![Btrfs container layers](images/btfs_container_layer.webp?w=600)

The high-level process for creating images and containers on Docker hosts
running the `btrfs` driver is as follows:
1. The image's base layer is stored in a Btrfs _subvolume_ under
`/var/lib/docker/btrfs/subvolumes`.
2. Subsequent image layers are stored as a Btrfs _snapshot_ of the parent
layer's subvolume or snapshot, but with the changes introduced by this
layer. These differences are stored at the block level.
3. The container's writable layer is a Btrfs snapshot of the final image layer,
with the differences introduced by the running container. These differences
are stored at the block level.
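
The layering steps above can be mimicked by hand with the `btrfs` command. The following is a minimal sketch, not how Docker drives Btrfs internally. The mount point and layer names are hypothetical, and `run` and `build_layers` are illustrative helpers. Setting `DRY_RUN=1` prints the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the subvolume and snapshot layout by hand.
# This only mirrors the on-disk layout; it is not Docker's implementation.

# Runs a command, or prints it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_layers ROOT BASE [CHILD...]
# Creates BASE as a subvolume, then each CHILD as a snapshot of the layer
# before it, so the last CHILD plays the role of the container layer.
build_layers() {
  local root=$1 parent=$2 layer
  shift 2
  run btrfs subvolume create "$root/$parent"
  for layer in "$@"; do
    run btrfs subvolume snapshot "$root/$parent" "$root/$layer"
    parent=$layer
  done
}
```

For example, `DRY_RUN=1 build_layers /mnt/scratch base layer1 container` prints one `subvolume create` command followed by two `subvolume snapshot` commands. To run it for real, use a scratch Btrfs mount rather than `/var/lib/docker/`, and inspect the result with `sudo btrfs subvolume list /mnt/scratch`.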
## How container reads and writes work with `btrfs`
### Reading files
A container is a space-efficient snapshot of an image. Metadata in the snapshot
points to the actual data blocks in the storage pool. This is the same as with
a subvolume. Therefore, reads performed against a snapshot are essentially the
same as reads performed against a subvolume.
### Writing files
As a general caution, writing and updating a large number of small files with
Btrfs can result in slow performance.

Consider three scenarios where a container opens a file for write access with
Btrfs.
#### Writing new files
Writing a new file to a container invokes an allocate-on-demand operation to
allocate a new data block to the container's snapshot. The file is then written
to this new space. The allocate-on-demand operation is native to all writes
with Btrfs and is the same as writing new data to a subvolume. As a result,
writing new files to a container's snapshot operates at native Btrfs speeds.
#### Modifying existing files
Updating an existing file in a container is a copy-on-write operation
(redirect-on-write is the Btrfs terminology). The original data is read from
the layer where the file currently exists, and only the modified blocks are
written into the container's writable layer. Next, the Btrfs driver updates the
filesystem metadata in the snapshot to point to this new data. This behavior
incurs minor overhead.
#### Deleting files or directories
If a container deletes a file or directory that exists in a lower layer, Btrfs
masks the existence of the file or directory in the lower layer. If a container
creates a file and then deletes it, this operation is performed in the Btrfs
filesystem itself and the space is reclaimed.
## Btrfs and Docker performance
There are several factors that influence Docker's performance under the `btrfs`
storage driver.
> [!NOTE]
>
> Many of these factors are mitigated by using Docker volumes for write-heavy
> workloads, rather than relying on storing data in the container's writable
> layer. However, in the case of Btrfs, Docker volumes still suffer from these
> drawbacks if `/var/lib/docker/volumes/` is backed by Btrfs.
### Page caching
Btrfs doesn't support page cache sharing. This means that each process
accessing the same file copies the file into the Docker host's memory. As a
result, the `btrfs` driver may not be the best choice for high-density use cases
such as PaaS.
### Small writes
Containers that perform many small writes can make poor use of Btrfs chunks.
Starting and stopping many containers in a short period of time produces the
same usage pattern. This can prematurely fill the Btrfs filesystem and lead to
out-of-space conditions on your Docker host. Use `btrfs filesystem show` to
closely monitor the amount of free space on your Btrfs device.
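
Free-space monitoring can be automated. The following is a minimal sketch. It assumes the output of `btrfs filesystem usage -b` contains a `Device unallocated:` line followed by a byte count. The function names and any threshold you pass in are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: warn when little unallocated space is left on a Btrfs filesystem.
# Assumption: `btrfs filesystem usage -b` prints a "Device unallocated:" line
# whose value is a plain byte count.

# Reads `btrfs filesystem usage -b` output on stdin, prints unallocated bytes.
unallocated_bytes() {
  awk -F: '/Device unallocated/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# low_on_space UNALLOCATED MIN
# Succeeds when fewer than MIN bytes remain unallocated.
low_on_space() {
  [ "$1" -lt "$2" ]
}
```

For example, `free=$(sudo btrfs filesystem usage -b /var/lib/docker | unallocated_bytes)` followed by `low_on_space "$free" 5368709120 && echo "less than 5 GiB unallocated"` reports when the pool of unallocated chunks is running out.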
### Sequential writes
Btrfs uses a journaling technique when writing to disk. This can impact the
performance of sequential writes, reducing performance by up to 50%.
### Fragmentation
Fragmentation is a natural byproduct of copy-on-write filesystems like Btrfs.
Many small random writes can compound this issue. Fragmentation can manifest as
CPU spikes when using SSDs or head thrashing when using spinning disks. Either
of these issues can harm performance.

If your Linux kernel version is 3.9 or higher, you can enable the `autodefrag`
feature when mounting a Btrfs volume. Test this feature on your own workloads
before deploying it into production, as some tests have shown a negative impact
on performance.
### SSD performance
Btrfs includes native optimizations for SSD media. To enable these features,
mount the Btrfs filesystem with the `-o ssd` mount option. These optimizations
improve SSD write performance by skipping work, such as seek optimizations,
that doesn't apply to solid-state media.
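
To confirm whether mount options such as `autodefrag` or `ssd` are active on a mounted filesystem, you can inspect its option list. The following is a minimal sketch. `has_mount_option` is an illustrative helper that matches whole options only, so `nossd` does not count as `ssd`.

```bash
#!/usr/bin/env bash
# Sketch: test whether a mount option is present in a comma-separated list,
# such as the one printed by `findmnt -n -o OPTIONS --target <path>`.

# has_mount_option OPTIONS NAME
# Succeeds if NAME appears as a complete entry in OPTIONS.
has_mount_option() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `opts=$(findmnt -n -o OPTIONS --target /var/lib/docker)` followed by `has_mount_option "$opts" ssd && echo "ssd enabled"` reports whether the SSD optimizations are in effect.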
### Balance Btrfs filesystems often
Use operating system utilities such as a `cron` job to balance the Btrfs
filesystem regularly, during non-peak hours. This reclaims unallocated blocks
and helps to prevent the filesystem from filling up unnecessarily. You can't
rebalance a totally full Btrfs filesystem unless you add additional physical
block devices to the filesystem.
See the [Btrfs
Wiki](https://btrfs.wiki.kernel.org/index.php/Balance_Filters#Balancing_to_fix_filesystem_full_errors).
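
A scheduled balance could look like the following minimal sketch. It uses `btrfs balance start` with usage filters, which limits the balance to chunks that are at most the given percentage full. The 50 percent threshold, the script path in the comment, and the helper names are assumptions for illustration. Setting `DRY_RUN=1` prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: balance a Btrfs filesystem, touching only partly used chunks.
# Example /etc/cron.d entry (hypothetical script path), Sundays at 03:00:
#   0 3 * * 0 root /usr/local/sbin/balance-docker.sh

# Runs a command, or prints it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# balance_docker_fs MOUNTPOINT USAGE_PERCENT
# Rewrites data and metadata chunks that are at most USAGE_PERCENT full.
balance_docker_fs() {
  run btrfs balance start "-dusage=$2" "-musage=$2" "$1"
}
```

A script run by `cron` as root could then call `balance_docker_fs /var/lib/docker 50`. Lower percentages finish faster but reclaim less space.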
### Use fast storage
Solid-state drives (SSDs) provide faster reads and writes than spinning disks.
### Use volumes for write-heavy workloads
Volumes provide the best and most predictable performance for write-heavy
workloads. This is because they bypass the storage driver and don't incur any
of the potential overheads introduced by thin provisioning and copy-on-write.
Volumes have other benefits, such as allowing you to share data among
containers and persisting even when no running container is using them.
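
As an illustration, a named volume can be attached with the `--mount` flag of `docker run`. The following is a minimal sketch. The volume name, target path, and the helper `volume_mount_arg` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build the value for `docker run --mount` that attaches a named
# volume. volume_mount_arg is an illustrative helper, not a Docker command.

# volume_mount_arg VOLUME TARGET
# Prints a --mount value that mounts named volume VOLUME at path TARGET.
volume_mount_arg() {
  printf 'type=volume,source=%s,target=%s\n' "$1" "$2"
}
```

For example, after `docker volume create appdata`, a write-heavy container could be started with `docker run -d --mount "$(volume_mount_arg appdata /data)" <image>`. By default the volume's data lives under `/var/lib/docker/volumes/`, so the performance benefit depends on what backs that directory.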
## Related information
- [Volumes](../volumes.md)
- [Understand images, containers, and storage drivers](index.md)
- [Select a storage driver](select-storage-driver.md)