so that the same file path is stored only once in the cache file.
After this change, the cache file measured on the fedora:{38,39,40}
images is on average ~6% smaller.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 59ac03970d)
use a bloom filter to speed up lookup of digests in a cache file.
The biggest advantage is that it reduces page faults with the mmap'ed
cache file.
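
As an illustration of the idea only, here is a minimal bloom filter
sketch. The type, the FNV-based double hashing and the sizes are
assumptions made for this example; they are not the on-disk layout or
the hash functions used by the cache file. A negative answer lets the
caller skip touching the mmap'ed digest table entirely.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a hypothetical in-memory filter, not the cache file format.
type bloomFilter struct {
	bits []uint64
	k    uint64 // number of bit positions probed per key
}

func newBloomFilter(nBits int, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (nBits+63)/64), k: k}
}

// positions derives k bit positions from the two halves of one FNV-1a hash.
func (b *bloomFilter) positions(key []byte) []uint64 {
	h := fnv.New64a()
	h.Write(key)
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32
	n := uint64(len(b.bits)) * 64
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % n
	}
	return out
}

func (b *bloomFilter) add(key []byte) {
	for _, p := range b.positions(key) {
		b.bits[p/64] |= uint64(1) << (p % 64)
	}
}

// maybeContains returns false only if the key was definitely never added.
func (b *bloomFilter) maybeContains(key []byte) bool {
	for _, p := range b.positions(key) {
		if b.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloomFilter(1024, 3)
	f.add([]byte("sha256:1111"))
	// A key that was added is always reported as possibly present.
	fmt.Println(f.maybeContains([]byte("sha256:1111")))
}
```

A positive answer can be a false positive, so it still has to be
confirmed with a real lookup.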
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit e9a96e022d)
use the binary representation for a given digest; this helps reduce
the file size by ~25%.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 33472545fb)
The getString() function was used to extract string values, but it
doesn't handle escaped characters. Replace it with iter.ReadString(),
which is slower but handles escaped characters correctly.
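
To illustrate the class of bug, the sketch below compares a naive
extraction that only strips the quotes with a real JSON decode. Both
helpers are hypothetical, and encoding/json is used here as a stand-in
for the JSON iterator used in the code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// naiveString strips the surrounding quotes and leaves escape
// sequences untouched (a stand-in for the old behavior).
func naiveString(raw []byte) string {
	return string(raw[1 : len(raw)-1])
}

// decodedString decodes the JSON string, resolving escape sequences.
func decodedString(raw []byte) (string, error) {
	var s string
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	raw := []byte(`"dir/with\"quote\u0041"`)
	s, err := decodedString(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("naive:   %s\n", naiveString(raw))
	fmt.Printf("decoded: %s\n", s)
}
```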
Closes: https://github.com/containers/storage/issues/1878
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit f388a77afb)
reduce the memory usage of the process by not loading the layers'
cache files entirely in memory.
Memory mapped files do not need to be fully loaded in memory, and
they can be shared among multiple instances of Podman.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 080dbaf67b)
include the chunk length in the generated file location format.
This is meant to facilitate the use of the cache by external tools,
which may not know the chunk size.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
gofumpt is a superset of gofmt, enabling some more code formatting
rules.
This commit is brought to you by
gofumpt -w .
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
We now use the golang error wrapping format specifier `%w` instead of the
deprecated github.com/pkg/errors package.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
if the layer cache doesn't already exist, automatically create it from
the layer TOC.
commit 10697a05a2 introduced this
regression.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.
The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.
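
A simplified sketch of the detection side, working on an in-memory
buffer: it reports runs of zero bytes that are at least as long as a
threshold. The type and function names are assumptions for the
example, and the real code operates on streams rather than on a single
slice.

```go
package main

import "fmt"

// hole describes a run of zero bytes (hypothetical type).
type hole struct {
	offset, length int
}

// findHoles returns the runs of zero bytes that are at least minSize long.
func findHoles(data []byte, minSize int) []hole {
	var holes []hole
	start := -1 // start of the current zero run, or -1 if none
	for i, b := range data {
		if b == 0 {
			if start < 0 {
				start = i
			}
			continue
		}
		if start >= 0 && i-start >= minSize {
			holes = append(holes, hole{start, i - start})
		}
		start = -1
	}
	// Handle a zero run that extends to the end of the buffer.
	if start >= 0 && len(data)-start >= minSize {
		holes = append(holes, hole{start, len(data) - start})
	}
	return holes
}

func main() {
	data := make([]byte, 4096)
	data[0] = 1    // data before the hole
	data[4095] = 1 // data after the hole
	fmt.Println(findHoles(data, 1024))
}
```

Runs shorter than the threshold are left as ordinary data, since
recording each of them would cost more than writing the zeros.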
Closes: https://github.com/containers/storage/issues/1091
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
avoid parsing each JSON TOC file for the layers in the local storage,
but attempt to create a lookaside cache in a custom format that is
faster to load (and potentially mmap'able).
The same cache is used to look up files, chunks and candidates for
deduplication with hard links.
There are 3 kinds of digests stored:
- digest(file.payload)
- digest(digest(file.payload) + file.UID + file.GID + file.mode + file.xattrs)
- digest(i) for each i in chunks(file.payload)
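
The three kinds can be sketched as follows. The struct, the use of
SHA-256, the way the metadata fields are joined and the fixed-size
chunking are all assumptions made for this example; the actual
serialization and chunk boundaries come from the TOC.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fileEntry is a hypothetical, reduced view of a TOC entry.
type fileEntry struct {
	payload  []byte
	uid, gid int
	mode     uint32
	xattrs   string // assumed to be already serialized
}

func digest(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// payloadDigest identifies files with the same content.
func payloadDigest(f fileEntry) string {
	return digest(f.payload)
}

// hardLinkDigest also covers the metadata, so it only matches files
// that could be deduplicated with a hard link.
func hardLinkDigest(f fileEntry) string {
	s := fmt.Sprintf("%s:%d:%d:%o:%s", payloadDigest(f), f.uid, f.gid, f.mode, f.xattrs)
	return digest([]byte(s))
}

// chunkDigests returns one digest per chunk (fixed-size chunks here).
func chunkDigests(payload []byte, chunkSize int) []string {
	var out []string
	for off := 0; off < len(payload); off += chunkSize {
		end := off + chunkSize
		if end > len(payload) {
			end = len(payload)
		}
		out = append(out, digest(payload[off:end]))
	}
	return out
}

func main() {
	a := fileEntry{payload: []byte("hello world"), uid: 0, gid: 0, mode: 0o644}
	b := a
	b.uid = 1000
	fmt.Println(payloadDigest(a) == payloadDigest(b))   // same content
	fmt.Println(hardLinkDigest(a) == hardLinkDigest(b)) // different owner
	fmt.Println(len(chunkDigests(a.payload, 4)))
}
```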
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
try to reuse an existing cache object, instead of creating it for
every layer.
Set a time limit on how long it can be reused, so that stale
references are cleaned up.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>