Add GC guide to DTR (#1767)

* Add GC guide to DTR

* Add border to images
This commit is contained in:
Tony Holdstock-Brown 2017-02-17 17:10:39 -08:00 committed by Joao Fernandes
parent c35a42a1a3
commit cda96b5b61
5 changed files with 103 additions and 0 deletions

View File

@ -1323,6 +1323,8 @@ manuals:
title: Set up vulnerability scans
- path: /datacenter/dtr/2.2/guides/admin/configure/deploy-a-cache/
title: Deploy a cache
- path: /datacenter/dtr/2.2/guides/admin/configure/garbage-collection/
title: Garbage collection
- sectiontitle: Manage users
section:
- path: /datacenter/dtr/2.2/guides/admin/manage-users/

View File

@ -0,0 +1,101 @@
---
description: Configure garbage collection in Docker Trusted Registry
keyworkds: docker, registry, garbage collection, gc, space, disk space
title: Docker Trusted Registry 2.2 Garbage Collection
---
#### TL;DR
1. Garbage Collection (GC) reclaims disk space from your storage by deleting
unused layers
2. GC can be configured to run automatically with a cron schedule, and can also
be run manually. Only admins can configure these
3. When GC runs DTR will be placed in read-only mode. Pulls will work but
pushes will fail
4. The UI will show when GC is running, and an admin can stop GC within the UI
**Important notes**
The GC cron schedule is set to run in **UTC time**. Containers typically run in
UTC time (unless the system time is mounted), therefore remember that the cron
schedule will run based off of UTC time when configuring.
GC puts DTR into read only mode; pulls succeed while pushes fail. Pushing an
image while GC runs may lead to undefined behaviour and data loss, therefore
this is disabled for safety. For this reason it's generally best practice to
ensure GC runs in the early morning on a Saturday or Sunday night.
## Setting up garbage collection
You can set up GC if you're an admin by hitting "Settings" in the UI then
choosing "Garbage Collection". By default, GC will be disabled, showing this
screen:
![](../../images/garbage-collection-1.png){: .with-border}
Here you can configure GC to run **until it's done** or **with a timeout**.
The timeout ensures that your registry will be in read-only mode for a maximum
amount of time.
Select an option (either "Until done" or "For N minutes") and you'll have the
option to configure GC to run via a cron job, with several default crons
provided:
![](../../images/garbage-collection-2.png){: .with-border}
You can also choose "Do not repeat" to disable the cron schedule entirely.
Once the cron schedule has been configured (or disabled), you have the option to
the schedule ("Save") or save the schedule *and* start GC immediately ("Save
& Start").
## Stopping GC while it's running
When GC runs the garbage collection settings page looks as follows:
![](../../images/garbage-collection-3.png){: .with-border}
Note the global banner visible to all users, ensuring everyone knows that GC is
running.
An admin can stop the current GC process by hitting "Stop". This safely shuts
down the running GC job and moves the registry into read-write mode, ensuring
pushes work as expected.
## How does garbage collection work?
### Background: how images are stored
Each image stored in DTR is made up of multiple files:
- A list of "layers", which represent the image's filesystem
- The "config" file, which dictates the OS, architecture and other image
metadata
- The "manifest", which is pulled first and lists all layers and the config file
for the image.
All of these files are stored in a content-addressible manner. We take the
sha256 hash of the file's content and use the hash as the filename. This means
that if tag `example.com/user/blog:1.11.0` and `example.com/user/blog:latest`
use the same layers we only store them once.
### How this impacts GC
Let's continue from the above example, where `example.com/user/blog:latest` and
`example.com/user/blog:1.11.0` point to the same image and use the same layers.
If we delete `example.com/user/blog:latest` but *not*
`example.com/user/blog:1.11.0` we expect that `example.com/user/blog:1.11.0`
can still be pulled.
This means that we can't delete layers when tags or manifests are deleted.
Instead, we need to pause writing and take reference counts to see how many
times a file is used. If the file is never used only then is it safe to delete.
This is the basis of our "mark and sweep" collection:
1. Iterate over all manifests in registry and record all files that are
referenced
2. Iterate over all file stored and check if the file is referenced by any
manifest
3. If the file is *not* referenced, delete it

Binary file not shown.

After

Width:  |  Height:  |  Size: 50 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 58 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 51 KiB