mirror of https://github.com/docker/docs.git
Update garbage-collection.md
- Update the "Under the hood" section as per @davidswu.
parent 1bd0010618
commit 9b24c86429
|
|
@@ -61,34 +61,49 @@ with v2.6.0. If you clicked **Save & Start** previously, verify that the garbage
|
||||||
|
|
||||||
## Under the hood
|
## Under the hood
|
||||||
|
|
||||||
|
## Under the hood
|
||||||
|
|
||||||
Each image stored in DTR is made up of multiple files:
|
Each image stored in DTR is made up of multiple files:
|
||||||
|
|
||||||
* A list of image layers that represent the image filesystem.
|
* A list of image layers that are unioned to represent the image filesystem
|
||||||
* A configuration file that contains the architecture of the image and other
|
* A configuration file that contains the architecture of the image and other
|
||||||
metadata.
|
metadata
|
||||||
* A manifest file containing the list of all layers and configuration file for
|
* A manifest file containing the list of all layers and configuration file for
|
||||||
an image.
|
an image
|
||||||
|
|
||||||
All these files are stored in a content-addressable way in which the name of
|
All these files are tracked in DTR's metadata store in RethinkDB. These files
|
||||||
the file is the result of hashing the file's content. This means that if two
|
are tracked in a content-addressable way such that a file corresponds to
|
||||||
image tags have exactly the same content, DTR only stores the image content
|
a cryptographic hash of the file's content. This means that if two image tags hold exactly the same content,
|
||||||
once, even if the tag name is different.
|
DTR only stores the image content once, even if the tag names are different. Because the hash
|
||||||
|
is cryptographic, collisions are nearly impossible.
|
||||||
|
|
||||||
As an example, if `wordpress:4.8` and `wordpress:latest` have the same content,
|
As an example, if `wordpress:4.8` and `wordpress:latest` have the same content,
|
||||||
they will only be stored once. If you delete one of these tags, the other won't
|
the content will only be stored once. If you delete one of these tags, the other won't
|
||||||
be deleted.
|
be deleted.
|
||||||
|
|
||||||
This means that when users delete an image tag, DTR can't delete the underlying
|
This means that when you delete an image tag, DTR cannot delete the underlying
|
||||||
files of that image tag since it's possible that there are other tags that
|
files of that image tag since other tags may also use the same files.
|
||||||
also use the same files.
|
|
||||||
|
|
||||||
To delete unused image layers, DTR:
|
To facilitate online garbage collection, DTR makes a couple of changes to how it uses the storage
|
||||||
1. Becomes read-only to make sure that no one is able to push an image, thus
|
backend that are distinct from the open-source Docker registry:
|
||||||
changing the underlying files in the filesystem.
|
1. Layer links – the references within repository directories to
|
||||||
2. Check all the manifest files and keep a record of the files that are
|
their associated blobs – are no longer in the storage backend. DTR instead stores these references in RethinkDB so that it can enumerate
|
||||||
referenced.
|
them during the marking phase of garbage collection.
|
||||||
3. If a file is never referenced, that means that no image tag uses it, so it
|
|
||||||
can be safely deleted.
|
2. Any layers created after an upgrade to 2.6 are no longer content-addressed in
|
||||||
|
the storage backend. Many cloud provider backends do not give the sequential
|
||||||
|
consistency guarantees required to deal with the simultaneous deleting and
|
||||||
|
re-pushing of a layer in a predictable manner. To account for this, DTR assigns
|
||||||
|
each newly pushed layer a unique ID and performs the translation from content hash
|
||||||
|
to ID in RethinkDB.
|
||||||
|
|
||||||
|
To delete unused files, DTR does the following:
|
||||||
|
1. Establish a cutoff time
|
||||||
|
1. Mark each referenced manifest file with a timestamp. When manifest files
|
||||||
|
are pushed to DTR, they are also marked with a timestamp
|
||||||
|
2. Sweep each manifest file that does not have a timestamp after the cutoff time
|
||||||
|
3. If a file is never referenced – which means no image tag uses it – delete the file
|
||||||
|
4. Repeat the process for blob links and blob descriptors.
|
||||||
|
|
||||||
## Where to go next
|
## Where to go next
|
||||||
|
|
||||||
|
|
|
||||||
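The content-addressed storage and the cutoff-based mark-and-sweep steps described in the new "Under the hood" text can be illustrated with a small sketch. This is a toy in-memory model, not DTR's implementation: the `Registry` class, its method names, and the logical clock are all hypothetical, plain dictionaries stand in for RethinkDB and the storage backend, and the translation from content hash to unique layer ID is not modeled.

```python
import hashlib


class Registry:
    """Toy in-memory model (hypothetical); not DTR's real data structures."""

    def __init__(self):
        self.blobs = {}      # digest -> bytes (content-addressed storage)
        self.tags = {}       # tag name -> digest
        self.marked_at = {}  # digest -> logical time of last push or mark
        self.clock = 0       # logical clock standing in for wall-clock time

    def _tick(self):
        self.clock += 1
        return self.clock

    def push(self, tag, content):
        digest = hashlib.sha256(content).hexdigest()
        self.blobs[digest] = content           # identical content is stored once
        self.tags[tag] = digest
        self.marked_at[digest] = self._tick()  # pushes are timestamped too
        return digest

    def delete_tag(self, tag):
        # Removes only the name; the content stays until garbage collection.
        del self.tags[tag]

    def collect_garbage(self):
        cutoff = self._tick()                      # establish a cutoff time
        for digest in set(self.tags.values()):
            self.marked_at[digest] = self._tick()  # mark referenced content
        # Sweep everything that was not stamped after the cutoff.
        stale = [d for d in self.blobs if self.marked_at[d] <= cutoff]
        for digest in stale:
            del self.blobs[digest]
            del self.marked_at[digest]
        return stale


reg = Registry()
first = reg.push("wordpress:4.8", b"layer-data")
second = reg.push("wordpress:latest", b"layer-data")
print(first == second, len(reg.blobs))  # same digest, one stored copy

reg.delete_tag("wordpress:4.8")
print(reg.collect_garbage())            # nothing swept: still tagged as latest

reg.delete_tag("wordpress:latest")
print(reg.collect_garbage() == [first]) # now unreferenced, so it is swept
```

Because anything pushed after the cutoff carries a newer timestamp, a sweep in this model never removes content that arrived while collection was running, which is the property that lets collection happen without a read-only window.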