build: update gc documentation

Signed-off-by: David Karlsson <35727626+dvdksn@users.noreply.github.com>
David Karlsson 2024-12-11 05:36:06 +01:00
parent 3f815b9811
commit 841ca61ae2
1 changed files with 249 additions and 55 deletions

@ -8,36 +8,155 @@ aliases:
While [`docker builder prune`](/reference/cli/docker/builder/prune.md) While [`docker builder prune`](/reference/cli/docker/builder/prune.md)
or [`docker buildx prune`](/reference/cli/docker/buildx/prune.md) or [`docker buildx prune`](/reference/cli/docker/buildx/prune.md)
commands run at once, garbage collection runs periodically and follows an commands run at once, Garbage Collection (GC) runs periodically and follows an
ordered list of prune policies. ordered list of prune policies. The BuildKit daemon clears the build cache when
the cache size becomes too big, or when the cache age expires.
Garbage collection runs in the BuildKit daemon. The daemon clears the build For most users, the default GC behavior is sufficient and doesn't require any
cache when the cache size becomes too big, or when the cache age expires. The intervention. Advanced users, particularly those working with large-scale
following sections describe how you can configure both the size and age builds, self-managed builders, or constrained storage environments, might
parameters by defining garbage collection policies. benefit from customizing these settings to better align with their workflow
needs. The following sections explain how GC works and provide guidance on
tailoring its behavior through custom configuration.
Each of the policy's parameters corresponds with a `docker buildx prune` command line ## Garbage collection policies
argument. Details can be found in the
`docker buildx prune` [documentation](/reference/cli/docker/buildx/prune.md). GC policies define a set of rules that determine how the build cache is managed
and cleaned up. These policies include criteria for when to remove cache
entries, such as the age of the cache, the amount of space being used, and the
type of cache records to prune.
Each GC policy is evaluated in sequence, starting with the most specific
criteria, and proceeds to broader rules if previous policies do not free up
enough cache. This lets BuildKit prioritize cache entries, preserving the most
valuable cache while ensuring the system maintains performance and
availability.
For example, say you have the following GC policies:
1. Find "stale" cache records that haven't been used in the past 48 hours, and
delete records until there's a maximum of 5GB of "stale" cache left.
2. If the build cache size exceeds 10GB, delete records until the total cache
size is no more than 10GB.
The first rule is more specific, prioritizing stale cache records and setting a
lower limit for a less valuable type of cache. The second rule imposes a higher
hard limit that applies to any type of cache records. With these policies, if
you have 11GB worth of build cache, where:
- 7GB of which is "stale" cache
- 4GB is other, more valuable cache
A GC sweep would delete 2GB of stale cache as part of the 1st policy, leaving
5GB of stale cache and 9GB in total, meaning the 2nd policy does not need to
clear any more cache.
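The ordered evaluation of policies can be sketched in a few lines of Python. This is a simplified model for illustration only, not BuildKit's actual algorithm: the `Record`, `Policy`, and `sweep` names are hypothetical, sizes are whole gigabytes, and the oldest matching records are assumed to be deleted first. The cache here differs from the example above (6GB stale, 8GB recently used) so that both rules have work to do.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    size_gb: int
    idle_hours: int  # hours since the record was last used


@dataclass
class Policy:
    keep_gb: int             # delete matching records until they total at most this
    min_idle_hours: int = 0  # only records idle at least this long match


def sweep(records, policies):
    """Apply each policy in order and return the records that survive."""
    kept = list(records)
    for policy in policies:
        matching = [r for r in kept if r.idle_hours >= policy.min_idle_hours]
        matching.sort(key=lambda r: r.idle_hours, reverse=True)  # oldest first
        size = sum(r.size_gb for r in matching)
        for record in matching:
            if size <= policy.keep_gb:
                break
            kept.remove(record)
            size -= record.size_gb
    return kept


# Hypothetical cache: 6GB of stale records plus 8GB of recently used records.
stale = [Record(f"stale-{i}", 1, 60 + i) for i in range(6)]
fresh = [Record(f"fresh-{i}", 1, 1 + i) for i in range(8)]

policies = [
    Policy(keep_gb=5, min_idle_hours=48),  # rule 1: cap stale cache at 5GB
    Policy(keep_gb=10),                    # rule 2: cap total cache at 10GB
]

remaining = sweep(stale + fresh, policies)
print(sum(r.size_gb for r in remaining))  # 10
```

In this run, rule 1 removes 1GB of stale cache (13GB left), and rule 2 then removes the three oldest remaining records to bring the total down to 10GB.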
The default GC policies are (approximately):
1. Remove cache that can be easily regenerated, such as build contexts from
local directories or remote Git repositories, and cache mounts, if it hasn't
been used for more than 48 hours.
2. Remove cache that hasn't been used in a build for more than 60 days.
3. Remove unshared cache that exceeds the build cache size limit. Unshared
cache records refer to layer blobs that are not used by other resources
(typically, as image layers).
4. Remove any build cache that exceeds the build cache size limit.
The precise algorithm and the means of configuring the policies differ slightly
depending on what kind of builder you're using. Refer to
[Configuration](#configuration) for more details.
## Configuration ## Configuration
Depending on the [driver](../builders/drivers/_index.md) used by your builder instance, > [!NOTE]
the garbage collection will use a different configuration file. > If you're satisfied with the default garbage collection behavior and don't
> need to fine-tune its settings, you can skip this section. Default
> configurations work well for most use cases and require no additional setup.
If you're using the [`docker` driver](../builders/drivers/docker.md), garbage collection Depending on the type of [build driver](../builders/drivers/_index.md) you use,
can be configured in the [Docker Daemon configuration](/reference/cli/dockerd.md#daemon-configuration-file). you will use different configuration files to change the builder's GC settings:
file:
- If you use the default builder for Docker Engine (the `docker` driver), use
the [Docker daemon configuration file](#docker-daemon-configuration-file).
- If you use a custom builder, use a [BuildKit configuration file](#buildkit-configuration-file).
### Docker daemon configuration file
If you're using the default [`docker` driver](../builders/drivers/docker.md),
GC is configured in the [`daemon.json` configuration file](/reference/cli/dockerd.md#daemon-configuration-file),
or if you use Docker Desktop, in [**Settings > Docker Engine**](/manuals/desktop/settings-and-maintenance/settings.md).
The following snippet shows the default builder configuration for the `docker`
driver for Docker Desktop users:
```json
{
"builder": {
"gc": {
"defaultKeepStorage": "20GB",
"enabled": true
}
}
}
```
The `defaultKeepStorage` option configures the size limit of the build cache,
which influences the GC policies. The default policies for the `docker` driver
work as follows:
1. Remove ephemeral, unused build cache older than 48 hours if it exceeds 13.8%
of `defaultKeepStorage`, or at minimum 512MB.
2. Remove unused build cache older than 60 days.
3. Remove unshared build cache that exceeds the `defaultKeepStorage` limit.
4. Remove any build cache that exceeds the `defaultKeepStorage` limit.
Given the Docker Desktop default value for `defaultKeepStorage` of 20GB, the
default GC policies resolve to:
```json ```json
{ {
"builder": { "builder": {
"gc": { "gc": {
"enabled": true, "enabled": true,
"defaultKeepStorage": "10GB",
"policy": [ "policy": [
{ "keepStorage": "10GB", "filter": ["unused-for=2200h"] }, {
{ "keepStorage": "50GB", "filter": ["unused-for=3300h"] }, "keepStorage": "2.764GB",
"filter": [
"unused-for=48h",
"type==source.local,type==exec.cachemount,type==source.git.checkout"
]
},
{ "keepStorage": "20GB", "filter": ["unused-for=1440h"] },
{ "keepStorage": "20GB" },
{ "keepStorage": "20GB", "all": true }
]
}
}
}
```
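The `keepStorage` value of the first policy above can be approximated from the rule described earlier (13.8% of `defaultKeepStorage`, with a 512MB floor). The following Python sketch is an illustration under stated assumptions, not the daemon's implementation: the function name is hypothetical, units are assumed to be decimal (1GB = 10^9 bytes), and 13.8% is treated as an exact figure, so the result may differ slightly from the resolved value shown in the configuration.

```python
GB = 10**9  # assumption: decimal units
MB = 10**6


def ephemeral_keep_bytes(default_keep_storage, percent=13.8, floor=512 * MB):
    """Size limit for the first default policy: a share of defaultKeepStorage, with a floor."""
    return max(round(default_keep_storage * percent / 100), floor)


print(ephemeral_keep_bytes(20 * GB) / GB)  # roughly 2.76
print(ephemeral_keep_bytes(2 * GB) / MB)   # 512.0, the floor applies
```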
The easiest way to tweak the build cache configuration for the `docker` driver
is to adjust the `defaultKeepStorage` option:
- Increase the limit if you think the GC is too aggressive.
- Decrease the limit if you need to preserve space.
If you need even more control, you can define your own GC policies directly.
The following example defines a more conservative GC configuration with the
following policies:
1. Remove unused cache entries older than 1440 hours, or 60 days, if build cache exceeds 50GB.
2. Remove unshared cache entries if build cache exceeds 50GB.
3. Remove any cache entries if build cache exceeds 100GB.
```json
{
"builder": {
"gc": {
"enabled": true,
"defaultKeepStorage": "50GB",
"policy": [
{ "keepStorage": "0", "filter": ["unused-for=1440h"] },
{ "keepStorage": "0" },
{ "keepStorage": "100GB", "all": true } { "keepStorage": "100GB", "all": true }
] ]
} }
@ -45,52 +164,127 @@ file:
} }
``` ```
For other drivers, garbage collection can be configured using the Policies 1 and 2 here set `keepStorage` to `0`, which means they'll fall back
[BuildKit configuration](../buildkit/toml-configuration.md) file: to the default limit of 50GB as defined by `defaultKeepStorage`.
### BuildKit configuration file
For build drivers other than `docker`, GC is configured using a
[`buildkitd.toml`](../buildkit/toml-configuration.md) configuration file. This
file uses the following high-level configuration options that you can use to
tweak the thresholds for how much disk space BuildKit should use for cache:
| Option | Description | Default value |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------- |
| `reservedSpace` | The minimum amount of disk space BuildKit is allowed to allocate for cache. Usage below this threshold will not be reclaimed during garbage collection. | 10% of total disk space or 10GB (whichever is lower) |
| `maxUsedSpace` | The maximum amount of disk space that BuildKit is allowed to use. Usage above this threshold will be reclaimed during garbage collection. | 60% of total disk space or 100GB (whichever is lower) |
| `minFreeSpace` | The amount of disk space that must be kept free. | 20GB |
You can set these options either as number of bytes, a unit string (for
example, `512MB`), or as a percentage of the total disk size. Changing these
options influences the default GC policies used by the BuildKit worker. With
the default thresholds, the GC policies resolve as follows:
```toml ```toml
# Global defaults
[worker.oci] [worker.oci]
gc = true gc = true
gckeepstorage = 10000 reservedSpace = "10GB"
maxUsedSpace = "100GB"
minFreeSpace = "20%"
# Policy 1
[[worker.oci.gcpolicy]] [[worker.oci.gcpolicy]]
keepBytes = 512000000
keepDuration = 172800
filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout" ] filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout" ]
keepDuration = "48h"
maxUsedSpace = "512MB"
# Policy 2
[[worker.oci.gcpolicy]]
keepDuration = "1440h" # 60 days
reservedSpace = "10GB"
maxUsedSpace = "100GB"
# Policy 3
[[worker.oci.gcpolicy]]
reservedSpace = "10GB"
maxUsedSpace = "100GB"
# Policy 4
[[worker.oci.gcpolicy]] [[worker.oci.gcpolicy]]
all = true all = true
keepBytes = 1024000000 reservedSpace = "10GB"
maxUsedSpace = "100GB"
``` ```
## Default policies In practical terms, this means:
Default garbage collection policies apply to all builders if not set: - Policy 1: If the build cache exceeds 512MB, BuildKit removes cache records
for local build contexts, remote Git contexts, and cache mounts that haven't
been used in the last 48 hours.
- Policy 2: If disk usage exceeds 100GB, unshared build cache older than 60
days is removed, ensuring at least 10GB of disk space is reserved for cache.
- Policy 3: If disk usage exceeds 100GB, any unshared cache is removed,
ensuring at least 10GB of disk space is reserved for cache.
- Policy 4: If disk usage exceeds 100GB, all cache—including shared and
internal records—is removed, ensuring at least 10GB of disk space is reserved
for cache.
```text `reservedSpace` has the highest priority in defining the lower limit for build
GC Policy rule#0: cache size. If `maxUsedSpace` or `minFreeSpace` would define a lower value, the
All: false minimum cache size would never be brought below `reservedSpace`.
Filters: type==source.local,type==exec.cachemount,type==source.git.checkout
Keep Duration: 48h0m0s If both `reservedSpace` and `maxUsedSpace` are set, a GC sweep results in a
Keep Bytes: 512MB cache size between those thresholds. For example, if `reservedSpace` is set to
GC Policy rule#1: 10GB, and `maxUsedSpace` is set to 20GB, the resulting amount of cache after a
All: false GC run is less than 20GB, but at least 10GB.
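One way to picture how the three thresholds interact is the following Python sketch. It is a simplified model based only on the description above, not BuildKit's source: the `keep_limit` function and its parameters are hypothetical names, and a value of `0` is assumed to mean "not set".

```python
GB = 10**9  # assumption: decimal units


def keep_limit(cache_size, disk_free, reserved_space, max_used_space, min_free_space):
    """Return the cache size (in bytes) that a sweep aims for in this simplified model."""
    limit = cache_size  # by default, nothing needs to be reclaimed
    if max_used_space:
        limit = min(limit, max_used_space)
    if min_free_space and disk_free < min_free_space:
        # Reclaim enough cache to restore the required amount of free space.
        limit = min(limit, cache_size - (min_free_space - disk_free))
    # reservedSpace has the highest priority: never aim below it.
    return max(limit, reserved_space)


# 25GB of cache, plenty of free disk: maxUsedSpace (20GB) is the limit.
print(keep_limit(25 * GB, 100 * GB, 10 * GB, 20 * GB, 0) // GB)       # 20

# Only 2GB free with minFreeSpace of 20GB would suggest 7GB,
# but reservedSpace keeps the target at 10GB.
print(keep_limit(25 * GB, 2 * GB, 10 * GB, 20 * GB, 20 * GB) // GB)   # 10
```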
Keep Duration: 1440h0m0s
Keep Bytes: 26GB You can also define completely custom GC policies. Custom policies also let you
GC Policy rule#2: define filters, which let you pinpoint the types of cache entries that a given
All: false policy is allowed to prune.
Keep Bytes: 26GB
GC Policy rule#3: #### Custom GC policies in BuildKit
All: true
Keep Bytes: 26GB Custom GC policies let you fine-tune how BuildKit manages its cache, and give
you full control over cache retention based on criteria such as cache type,
duration, or disk space thresholds. If you need full control over the cache
thresholds and how cache records should be prioritized, defining custom GC
policies is the way to go.
To define a custom GC policy, use the `[[worker.oci.gcpolicy]]` configuration
block in `buildkitd.toml`. Each policy defines the thresholds that will be used
for that policy. The global values for `reservedSpace`, `maxUsedSpace`, and
`minFreeSpace` do not apply if you use custom policies.
Here's an example configuration:
```toml
# Custom GC Policy 1: Remove unused local contexts older than 24 hours
[[worker.oci.gcpolicy]]
filters = ["type==source.local"]
keepDuration = "24h"
reservedSpace = "5GB"
maxUsedSpace = "50GB"
# Custom GC Policy 2: Remove remote Git contexts older than 30 days
[[worker.oci.gcpolicy]]
filters = ["type==source.git.checkout"]
keepDuration = "720h"
reservedSpace = "5GB"
maxUsedSpace = "30GB"
# Custom GC Policy 3: Aggressively clean all cache if disk usage exceeds 90GB
[[worker.oci.gcpolicy]]
all = true
reservedSpace = "5GB"
maxUsedSpace = "90GB"
``` ```
- `rule#0`: if build cache uses more than 512MB delete the most easily In addition to the `reservedSpace`, `maxUsedSpace`, and `minFreeSpace` thresholds,
reproducible data after it has not been used for 2 days. when defining a GC policy you have two additional configuration options:
- `rule#1`: remove any data not used for 60 days.
- `rule#2`: keep the unshared build cache under cap.
- `rule#3`: if previous policies were insufficient start deleting internal data
to keep build cache under cap.
> [!NOTE] - `all`: By default, BuildKit will exclude some cache records from being pruned
> during GC. Setting this option to `true` will allow any cache records to be
> `Keep Bytes` defaults to 10% of the size of the disk. If the disk size cannot pruned.
> be determined, it uses 2GB as a fallback. - `filters`: Filters let you specify specific types of cache records that a GC
policy is allowed to prune.