mirror of https://github.com/docker/docs.git

Update garbage collection in DTR (#4598)

parent 3f0c3c3297, commit 2788c56ce7

@@ -1,101 +1,85 @@

---
title: Garbage collection
description: Configure garbage collection in Docker Trusted Registry
keywords: registry, garbage collection, gc, space, disk space
---

You can configure Docker Trusted Registry to automatically delete unused image
layers, thus saving you disk space. This process is also known as garbage
collection.
|
|
## How DTR deletes unused layers

First you configure DTR to run a garbage collection job on a fixed schedule. At
the scheduled time:

1. DTR becomes read-only. Images can be pulled, but pushes are not allowed.
2. DTR identifies and marks all unused image layers.
3. DTR deletes the marked image layers.

Since this process puts DTR in read-only mode and is CPU-intensive, you should
run garbage collection jobs outside business peak hours.
|
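To make step 1 concrete, here is a minimal Python sketch of read-only mode. The `Registry` class, `ReadOnlyError`, and the `read_only()` context manager are hypothetical names used for illustration, not DTR's code. In the sketch, pushes are rejected while a garbage collection job holds the flag and pulls keep working. The registry always returns to read-write mode when the job ends.

```python
import threading
from contextlib import contextmanager


class ReadOnlyError(Exception):
    """Raised when a push arrives while garbage collection is running."""


class Registry:
    """Hypothetical in-memory registry used only to illustrate read-only mode."""

    def __init__(self):
        self._lock = threading.Lock()
        self._read_only = False
        self._blobs = {}  # digest -> content

    @contextmanager
    def read_only(self):
        # Hold read-only mode for the duration of a GC job and restore
        # read-write mode afterwards, even if the job raises an error.
        with self._lock:
            self._read_only = True
        try:
            yield self
        finally:
            with self._lock:
                self._read_only = False

    def push(self, digest, content):
        with self._lock:
            if self._read_only:
                raise ReadOnlyError("pushes are disabled while garbage collection runs")
            self._blobs[digest] = content

    def pull(self, digest):
        # Pulls are served whether or not GC is running.
        with self._lock:
            return self._blobs[digest]


if __name__ == "__main__":
    registry = Registry()
    registry.push("sha256:aaa", b"layer")
    with registry.read_only():
        print(registry.pull("sha256:aaa"))
        try:
            registry.push("sha256:bbb", b"new layer")
        except ReadOnlyError as err:
            print("push rejected:", err)
    registry.push("sha256:bbb", b"new layer")  # accepted again after GC
```

The `finally` clause matters in this design. A job that fails or is stopped part-way should not leave the registry stuck in read-only mode.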
|
## Schedule garbage collection

Navigate to the **Settings** page and choose **Garbage collection**.

Select for how long the garbage collection job should run:

* Until done: Run the job until all unused image layers are deleted.
* For x minutes: Only run the garbage collection job for a maximum of x minutes
  at a time.
* Never: Never delete unused image layers.
|
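One way to picture the "For x minutes" option is a sweep loop with a time budget. This Python sketch is an assumption about the behavior, not DTR's implementation. The `sweep_with_budget` function and its parameters are hypothetical, and so is the idea that leftover files wait for the next scheduled run. Passing `max_minutes=None` corresponds to "Until done". "Never" would simply mean the job is not scheduled at all.

```python
import time


def sweep_with_budget(candidates, delete, max_minutes=None, clock=time.monotonic):
    """Delete unused files until done, or until the time budget is spent.

    candidates:  digests already identified as unused
    delete:      callback that removes one file
    max_minutes: None for "Until done", a number for "For x minutes"
    clock:       time source in seconds (injectable for testing)
    Returns the digests left over when the budget ran out.
    """
    pending = list(candidates)
    deadline = None if max_minutes is None else clock() + max_minutes * 60
    for index, digest in enumerate(pending):
        # Check the budget before each deletion so the job overruns
        # by at most one delete.
        if deadline is not None and clock() >= deadline:
            return pending[index:]
        delete(digest)
    return []
```

The `clock` parameter lets you exercise the time limit with a fake time source instead of waiting for real minutes to pass.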
|
Once you select for how long to run the garbage collection job, you can
configure its schedule using the cron format. The schedule is evaluated in UTC,
so convert your off-peak hours from local time when you set it.
|
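Because the schedule is evaluated in UTC, it can help to convert a local off-peak time before you enter it. The helper below is a hypothetical sketch with two assumptions. First, it assumes the common five-field cron layout (`minute hour day-of-month month day-of-week`, where 0 means Sunday). Second, it assumes a fixed UTC offset. Check the exact format that the DTR settings page expects before relying on it. Note also that a fixed UTC schedule shifts the local run time by an hour when daylight saving time starts or ends.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_cron(hour, minute, utc_offset_hours, weekday=None):
    """Build a five-field cron expression, in UTC, for a local wall-clock time.

    utc_offset_hours: local offset from UTC, e.g. -5 for UTC-05:00
    weekday:          cron numbering (0 = Sunday ... 6 = Saturday), None = daily
    """
    if weekday is not None and not 0 <= weekday <= 6:
        raise ValueError("weekday must be between 0 (Sunday) and 6 (Saturday)")
    tz = timezone(timedelta(hours=utc_offset_hours))
    # 2023-01-01 was a Sunday, so adding `weekday` days lands on that cron weekday.
    local = datetime(2023, 1, 1, hour, minute, tzinfo=tz) + timedelta(days=weekday or 0)
    utc = local.astimezone(timezone.utc)
    # Python counts Monday as 0; cron counts Sunday as 0.
    dow = "*" if weekday is None else str((utc.weekday() + 1) % 7)
    return f"{utc.minute} {utc.hour} * * {dow}"
```

For example, `local_to_utc_cron(22, 0, -5, weekday=6)` asks for Saturday 22:00 at UTC-05:00. It returns `"0 3 * * 0"`, which is 03:00 UTC on Sunday, because the conversion crosses midnight.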
|
Once everything is configured, you can choose **save & start** to immediately
run the garbage collection job, or just **save** to run the job on the next
scheduled interval.
|
|
## Stop the garbage collection job

Once the garbage collection job starts running, a banner is displayed on the
web UI explaining that users can't push images. If you're an administrator, you
can click the banner to stop the garbage collection job. Stopping the job
returns DTR to read-write mode, so pushes work again.
|
|
## Under the hood

Each image stored in DTR is made up of multiple files:

* A list of image layers that represent the image filesystem.
* A configuration file that contains the architecture of the image and other
  metadata.
* A manifest file containing the list of all layers and the configuration file
  for an image.
|
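To see how a manifest ties the other files together, the Python sketch below reads a manifest and collects every digest it references. It assumes the Docker image manifest v2, schema 2 JSON layout, in which `config` and each entry in `layers` carry a `digest` field. The `referenced_digests` helper is hypothetical. The digests and sizes in the example are placeholders, not real image data.

```python
import json


def referenced_digests(manifest_json):
    """Return the config digest and every layer digest listed in a manifest."""
    manifest = json.loads(manifest_json)
    digests = {manifest["config"]["digest"]}
    digests.update(layer["digest"] for layer in manifest["layers"])
    return digests


if __name__ == "__main__":
    # Placeholder manifest: digests and sizes are made up for illustration.
    example = json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "config": {
            "mediaType": "application/vnd.docker.container.image.v1+json",
            "size": 1470,
            "digest": "sha256:config-placeholder",
        },
        "layers": [
            {
                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                "size": 2811969,
                "digest": "sha256:layer-1-placeholder",
            },
        ],
    })
    print(sorted(referenced_digests(example)))
```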
|
All these files are stored in a content-addressable way: the name of each file
is the sha256 hash of the file's content. This means that if two image tags
have exactly the same content, DTR only stores the image content once, even if
the tag names are different.

As an example, if `wordpress:4.8` and `wordpress:latest` have the same content,
they are only stored once. If you delete one of these tags, the other tag
remains and can still be pulled.

Because of this, when users delete an image tag, DTR can't immediately delete
the underlying files of that tag, since other tags might still use the same
files.
|
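The following minimal Python sketch shows content addressing and why deleting a tag does not free space. `BlobStore` is a hypothetical, in-memory stand-in, not DTR's storage. It names each blob after the sha256 digest of its content. As a simplification, each tag points straight at one blob rather than at a manifest that lists layers and a configuration file.

```python
import hashlib


class BlobStore:
    """Hypothetical content-addressable store (an illustration, not DTR's storage)."""

    def __init__(self):
        self.blobs = {}  # digest -> content
        self.tags = {}   # tag name -> digest

    def put(self, content: bytes) -> str:
        # The blob's name is the sha256 digest of its content, so identical
        # content always maps to the same name and is stored only once.
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)
        return digest

    def tag(self, name: str, content: bytes) -> str:
        self.tags[name] = self.put(content)
        return self.tags[name]

    def delete_tag(self, name: str) -> None:
        # Only the tag is removed; other tags may still use the same blob.
        del self.tags[name]

    def get(self, name: str) -> bytes:
        return self.blobs[self.tags[name]]


if __name__ == "__main__":
    store = BlobStore()
    store.tag("wordpress:4.8", b"same image content")
    store.tag("wordpress:latest", b"same image content")
    print(len(store.blobs))  # one blob shared by both tags
    store.delete_tag("wordpress:latest")
    print(store.get("wordpress:4.8"))
```

Even after both tags are deleted, the blob stays in `store.blobs`. Reclaiming that space is the job of garbage collection.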
|
To delete unused image layers, DTR:

1. Becomes read-only, to make sure that no one can push an image and change the
   underlying files in the filesystem while the job runs.
2. Checks all the manifest files and keeps a record of the files that are
   referenced.
3. Deletes every file that is never referenced. No image tag uses such a file,
   so it can be safely deleted.

This is the basis of a "mark and sweep" collection.
|
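Here is a simplified Python sketch of the mark and sweep steps above. It assumes the registry is already read-only (step 1), so storage can't change mid-run. `collect_garbage` and its inputs are hypothetical and are not DTR's internal API. The inputs are tags mapped to manifest digests, and manifests mapped to the digests they list.

```python
def collect_garbage(tags, manifests, stored, delete):
    """Mark-and-sweep over a content-addressable store.

    tags:      tag name -> manifest digest
    manifests: manifest digest -> set of config and layer digests it lists
    stored:    every digest currently in storage
    delete:    callback that removes one stored file
    Returns the deleted digests in sorted order.
    """
    # Mark: record every manifest a tag points to and every file it lists.
    marked = set()
    for manifest_digest in tags.values():
        marked.add(manifest_digest)
        marked.update(manifests.get(manifest_digest, ()))

    # Sweep: a file that was never marked is not used by any tag.
    deleted = []
    for digest in sorted(stored):
        if digest not in marked:
            delete(digest)
            deleted.append(digest)
    return deleted


if __name__ == "__main__":
    tags = {"blog:1.11.0": "sha256:m1"}  # a tag pointing at sha256:m2 was deleted
    manifests = {
        "sha256:m1": {"sha256:cfg1", "sha256:l1", "sha256:l2"},
        "sha256:m2": {"sha256:cfg2", "sha256:l2", "sha256:l3"},
    }
    stored = set(manifests) | set().union(*manifests.values())
    print(collect_garbage(tags, manifests, stored, delete=lambda digest: None))
```

In this example only `sha256:cfg2`, `sha256:l3`, and the orphaned manifest `sha256:m2` are removed. The shared layer `sha256:l2` survives because the manifest that is still tagged lists it.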
|
## Where to go next

* [Deploy DTR caches](deploy-caches/index.md)