Update garbage collection in DTR (#4598)

Joao Fernandes 2017-09-12 11:31:37 -07:00 committed by GitHub
parent 3f0c3c3297
commit 2788c56ce7
1 changed file with 54 additions and 70 deletions


@@ -1,101 +1,85 @@
 ---
-description: Configure garbage collection in Docker Trusted Registry
-title: Docker Trusted Registry 2.2 Garbage Collection
+title: Garbage collection
+description: Save disk space by configuring the garbage collection settings in
+Docker Trusted Registry
 keywords: registry, garbage collection, gc, space, disk space
 ---
-#### TL;DR
-1. Garbage Collection (GC) reclaims disk space from your storage by deleting
-unused layers
-2. GC can be configured to run automatically with a cron schedule, and can also
-be run manually. Only admins can configure these
-3. When GC runs DTR will be placed in read-only mode. Pulls will work but
-pushes will fail
-4. The UI will show when GC is running, and an admin can stop GC within the UI
-**Important notes**
-The GC cron schedule is set to run in **UTC time**. Containers typically run in
-UTC time (unless the system time is mounted), therefore remember that the cron
-schedule will run based off of UTC time when configuring.
-GC puts DTR into read only mode; pulls succeed while pushes fail. Pushing an
-image while GC runs may lead to undefined behavior and data loss, therefore
-this is disabled for safety. For this reason it's generally best practice to
-ensure GC runs in the early morning on a Saturday or Sunday night.
+You can configure Docker Trusted Registry to automatically delete unused image
+layers, thus saving you disk space. This process is also known as garbage collection.
+## How DTR deletes unused layers
+First you configure DTR to run a garbage collection job on a fixed schedule. At
+the scheduled time:
+1. DTR becomes read-only. Images can be pulled, but pushes are not allowed.
+2. DTR identifies and marks all unused image layers.
+3. DTR deletes the marked image layers.
+Since this process puts DTR in read-only mode and is CPU-intensive, you should
+run garbage collection jobs outside business peak hours.
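Both versions of this text say that garbage collection makes DTR read-only: pulls succeed and pushes fail. One plausible reason, stated here as an assumption rather than taken from DTR's source, is that a push uploads layers before the manifest that references them, so a layer uploaded mid-collection could look unreferenced and be deleted. The sketch uses a hypothetical in-memory `Registry` class (not a DTR API) to show only the read-only guard.

```python
class Registry:
    """Toy in-memory registry (hypothetical; not DTR's implementation)."""

    def __init__(self):
        self.blobs = {}          # digest -> bytes
        self.read_only = False   # set to True while garbage collection runs

    def push(self, digest, data):
        # Writes are rejected during GC so no new references can appear
        # between the mark phase and the sweep phase.
        if self.read_only:
            raise PermissionError("registry is read-only: garbage collection is running")
        self.blobs[digest] = data

    def pull(self, digest):
        # Reads never change storage, so they stay safe during GC.
        return self.blobs[digest]


registry = Registry()
registry.push("sha256:aaa", b"layer")
registry.read_only = True                 # a GC job starts
print(registry.pull("sha256:aaa"))        # b'layer'
try:
    registry.push("sha256:bbb", b"new layer")
except PermissionError as err:
    print(err)                            # the push is refused
registry.read_only = False                # the GC job finishes
```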
-## Setting up garbage collection
-You can set up GC if you're an admin by hitting "Settings" in the UI then
-choosing "Garbage Collection". By default, GC will be disabled, showing this
-screen:
+## Schedule garbage collection
+Navigate to the **Settings** page, and choose **Garbage collection**.
 ![](../../images/garbage-collection-1.png){: .with-border}
-Here you can configure GC to run **until it's done** or **with a timeout**.
-The timeout ensures that your registry will be in read-only mode for a maximum
-amount of time.
-Select an option (either "Until done" or "For N minutes") and you'll have the
-option to configure GC to run via a cron job, with several default crons
-provided:
+Select for how long the garbage collection job should run:
+* Until done: Run the job until all unused image layers are deleted.
+* For x minutes: Only run the garbage collection job for a maximum of x minutes
+at a time.
+* Never: Never delete unused image layers.
+Once you select for how long to run the garbage collection job, you can
+configure its schedule (in UTC time) using the cron format.
 ![](../../images/garbage-collection-2.png){: .with-border}
-You can also choose "Do not repeat" to disable the cron schedule entirely.
+Once everything is configured you can chose to **save & start** to immediately
+run the garbage collection job, or just **save** to run the job on the next
+scheduled interval.
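The schedule is interpreted in UTC, so an admin who wants collection to run late on a weekend night in local time has to convert that time first, and the conversion can move the job to a different weekday. The hypothetical helper `utc_cron` does that conversion for a fixed UTC offset and emits a standard five-field cron expression. Whether DTR's cron field expects exactly this dialect (for example, whether it takes a leading seconds field) is an assumption to check in the DTR UI.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the admin's local zone is a fixed UTC-7
# offset (US Pacific Daylight Time). A fixed offset needs no tz database
# but does not follow daylight-saving changes.
LOCAL = timezone(timedelta(hours=-7))

# Cron day names, indexed in datetime.weekday() order (0 = Monday).
CRON_DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def utc_cron(local_weekday, hour, minute, tz=LOCAL):
    """Return a weekly five-field cron expression, in UTC, for a job that
    should start at hour:minute on local_weekday (0 = Monday) in tz."""
    # Any date with the right weekday works; 2017-09-11 was a Monday.
    local = datetime(2017, 9, 11 + local_weekday, hour, minute, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * {CRON_DAYS[utc.weekday()]}"


# 11:00 PM Saturday at UTC-7 is 06:00 Sunday in UTC.
print(utc_cron(5, 23, 0))  # 0 6 * * SUN
```

Anchoring to a known Monday keeps the weekday arithmetic inside `datetime`, which handles the day rollover. For a real time zone with daylight saving, `zoneinfo.ZoneInfo` would be the better `tz`, keeping in mind that the resulting UTC hour then differs by an hour between summer and winter.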
-Once the cron schedule has been configured (or disabled), you have the option to
-the schedule ("Save") or save the schedule *and* start GC immediately ("Save
-& Start").
-## Stopping GC while it's running
-When GC runs the garbage collection settings page looks as follows:
+## Stop the garbage collection job
+Once the garbage collection job starts running, a banner is displayed on the
+web UI explaining that users can't push images. If you're an administrator, you can click the banner to stop the garbage
+collection job.
 ![](../../images/garbage-collection-3.png){: .with-border}
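The removed text says that stopping a running job safely shuts it down and returns the registry to read-write mode, and the added text offers a "For x minutes" limit. A minimal sketch of how a sweep loop could honor both, assuming a hypothetical `run_gc` function and a registry object with `read_only` and `blobs` attributes (neither is DTR's actual code): it checks for a stop request or an expired deadline between deletions, and a `finally` block always restores read-write mode.

```python
import time
from types import SimpleNamespace


def run_gc(registry, unreferenced, stop_requested, max_seconds=None):
    """Delete unreferenced blobs until done, stopped, or timed out.

    Hypothetical sketch: stop_requested is a callable that returns True
    once an admin asks to stop. Returns the digests actually deleted.
    """
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    deleted = []
    registry.read_only = True
    try:
        for digest in unreferenced:
            # Check between deletions, so stopping never interrupts a
            # single delete halfway through.
            if stop_requested():
                break
            if deadline is not None and time.monotonic() >= deadline:
                break
            registry.blobs.pop(digest, None)
            deleted.append(digest)
    finally:
        # Done, stopped, timed out, or failed: pushes must work again.
        registry.read_only = False
    return deleted


reg = SimpleNamespace(read_only=False, blobs={"a": b"1", "b": b"2", "c": b"3"})
calls = iter([False, True])  # the admin clicks "stop" before the second delete
deleted = run_gc(reg, ["a", "b", "c"], lambda: next(calls))
print(deleted, reg.read_only)  # ['a'] False
```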
-Note the global banner visible to all users, ensuring everyone knows that GC is
-running.
-An admin can stop the current GC process by hitting "Stop". This safely shuts
-down the running GC job and moves the registry into read-write mode, ensuring
-pushes work as expected.
-## How does garbage collection work?
-### Background: how images are stored
+## Under the hood
 Each image stored in DTR is made up of multiple files:
-- A list of "layers", which represent the image's filesystem
-- The "config" file, which dictates the OS, architecture and other image
-metadata
-- The "manifest", which is pulled first and lists all layers and the config file
-for the image.
-All of these files are stored in a content-addressable manner. We take the
-sha256 hash of the file's content and use the hash as the filename. This means
-that if tag `example.com/user/blog:1.11.0` and `example.com/user/blog:latest`
-use the same layers we only store them once.
-### How this impacts GC
-Let's continue from the above example, where `example.com/user/blog:latest` and
-`example.com/user/blog:1.11.0` point to the same image and use the same layers.
-If we delete `example.com/user/blog:latest` but *not*
-`example.com/user/blog:1.11.0` we expect that `example.com/user/blog:1.11.0`
-can still be pulled.
+* A list of image layers that represent the image filesystem.
+* A configuration file that contains the architecture of the image and other
+metadata.
+* A manifest file containing the list of all layers and configuration file for
+an image.
+All these files are stored in a content-addressable way in which the name of
+the file is the result of hashing the file's content. This means that if two
+image tags have exactly the same content, DTR only stores the image content
+once, even if the tag name is different.
+As an example, if `wordpress:4.8` and `wordpress:latest` have the same content,
+they will only be stored once. If you delete one of these tags, the other won't
+be deleted.
+This means that when users delete an image tag, DTR can't delete the underlying
+files of that image tag since it's possible that there are other tags that
+also use the same files.
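A short sketch of content-addressable storage as both versions describe it: each blob is named by the SHA-256 digest of its bytes, so storing identical content twice yields one blob, and deleting a tag only removes a name. `BlobStore` is hypothetical, and for brevity each tag points straight at a single blob, whereas in a real registry a tag points to a manifest that lists layers.

```python
import hashlib


class BlobStore:
    """Hypothetical content-addressable store keyed by SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # "sha256:<hex>" -> bytes

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so no second copy is kept.
        self.blobs.setdefault(digest, data)
        return digest


store = BlobStore()
tags = {}
layer = b"wordpress filesystem layer"
tags["wordpress:4.8"] = store.put(layer)
tags["wordpress:latest"] = store.put(layer)
print(len(store.blobs))  # 1

# Deleting one tag removes only the name; the shared blob stays for the other.
del tags["wordpress:latest"]
print(tags["wordpress:4.8"] in store.blobs)  # True
```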
-This means that we can't delete layers when tags or manifests are deleted.
-Instead, we need to pause writing and take reference counts to see how many
-times a file is used. If the file is never used only then is it safe to delete.
-This is the basis of our "mark and sweep" collection:
-1. Iterate over all manifests in registry and record all files that are
-referenced
-2. Iterate over all file stored and check if the file is referenced by any
-manifest
-3. If the file is *not* referenced, delete it
+To delete unused image layers, DTR:
+1. Becomes read-only to make sure that no one is able to push an image, thus
+changing the underlying files in the filesystem.
+2. Check all the manifest files and keep a record of the files that are
+referenced.
+3. If a file is never referenced, that means that no image tag uses it, so it
+can be safely deleted.
+## Where to go next
+* [Deploy DTR caches](deploy-caches/index.md)
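The mark-and-sweep steps listed in both versions can be sketched as one function over plain dictionaries. This is a simplified, hypothetical model rather than DTR's implementation: `manifests` maps each manifest digest to the set of files it lists, `storage` maps file digests to contents, and the caller is assumed to have already put the registry in read-only mode.

```python
def mark_and_sweep(manifests, storage):
    """Delete every stored file that no manifest references.

    Hypothetical sketch: mutates storage and returns the deleted digests.
    """
    # Mark: record every file referenced by any manifest.
    referenced = set()
    for files in manifests.values():
        referenced.update(files)

    # Sweep: a file that no manifest references is unused and can go.
    unreferenced = [digest for digest in storage if digest not in referenced]
    for digest in unreferenced:
        del storage[digest]
    return sorted(unreferenced)


storage = {
    "sha256:base": b"base layer",
    "sha256:app-v1": b"app layer v1",
    "sha256:app-v2": b"app layer v2",
    "sha256:cfg": b"config",
}
manifests = {
    "sha256:m1": {"sha256:base", "sha256:app-v1", "sha256:cfg"},
    "sha256:m2": {"sha256:base", "sha256:app-v2", "sha256:cfg"},
}
del manifests["sha256:m1"]  # the manifest behind an old tag is deleted
print(mark_and_sweep(manifests, storage))  # ['sha256:app-v1']
print(sorted(storage))  # ['sha256:app-v2', 'sha256:base', 'sha256:cfg']
```

Because the shared `sha256:base` and `sha256:cfg` files are still listed by the remaining manifest, the sweep keeps them, which matches the point that deleting one tag must not break another.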