53 Commits

Author SHA1 Message Date
Kir Kolyshkin cfe8024bff ci: add codespell
1. Move codespell config out of Makefile, simplify (remove unused
   stuff).

2. Fix found issues (using codespell -w).

3. Add a codespell CI job.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-10-15 17:39:07 -07:00
Giuseppe Scrivano bcceac5657 chunked: prevent using an empty cache
readCacheFileFromMemory() returns nil, nil when the version
mismatches.  Do not attempt to use the cache if it was not
loaded.  Ignoring the layer will ensure that the cache will be
recreated with the correct version.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-10-13 00:16:37 +02:00
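The caller-side check described in bcceac5657 can be sketched as follows. The cache layout (an 8-byte little-endian version header), cacheVersion, readCacheFromMemory and loadLayerCache are hypothetical stand-ins, not the containers/storage code. The only point illustrated is that a (nil, nil) result means "not loaded" and must not be used.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// cacheVersion is a hypothetical on-disk format version.
const cacheVersion = 3

// cacheFile is a hypothetical stand-in for a parsed cache file.
type cacheFile struct {
	payload []byte
}

// readCacheFromMemory expects an 8-byte little-endian version header.
// On a version mismatch it returns (nil, nil): the file is stale, not corrupt.
func readCacheFromMemory(buf []byte) (*cacheFile, error) {
	if len(buf) < 8 {
		return nil, errors.New("cache file too short")
	}
	if binary.LittleEndian.Uint64(buf[:8]) != cacheVersion {
		return nil, nil
	}
	return &cacheFile{payload: buf[8:]}, nil
}

// loadLayerCache reports ok=false when the cache was not loaded, so the
// caller ignores the layer and the cache can later be recreated.
func loadLayerCache(buf []byte) (c *cacheFile, ok bool, err error) {
	c, err = readCacheFromMemory(buf)
	if err != nil {
		return nil, false, err
	}
	if c == nil {
		return nil, false, nil // never use (or dereference) an unloaded cache
	}
	return c, true, nil
}

func main() {
	stale := make([]byte, 8)
	binary.LittleEndian.PutUint64(stale, cacheVersion-1)
	_, ok, err := loadLayerCache(stale)
	fmt.Println(ok, err) // false <nil>
}
```

Returning (nil, nil) keeps a stale cache from being reported as a failure, at the cost of requiring every caller to check for nil.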
Giuseppe Scrivano e3664d50e0 chunked: ignore ErrLayerUnknown when creating cache
ignore the error if the layer is deleted while we are processing it
without holding a lock on the store.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-09-30 10:12:43 +02:00
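A minimal sketch of the pattern in e3664d50e0, assuming a hypothetical errLayerUnknown sentinel and an in-memory map in place of the store: the lookup wraps the sentinel with %w, and the cache builder uses errors.Is to skip only that error while still returning any other one.

```go
package main

import (
	"errors"
	"fmt"
)

// errLayerUnknown is a hypothetical sentinel standing in for the store's
// "layer not known" error.
var errLayerUnknown = errors.New("layer not known")

// readTOC is a hypothetical store lookup; without a lock on the store the
// layer may disappear between listing and reading.
func readTOC(layers map[string][]byte, id string) ([]byte, error) {
	toc, ok := layers[id]
	if !ok {
		return nil, fmt.Errorf("reading TOC for %q: %w", id, errLayerUnknown)
	}
	return toc, nil
}

// buildCaches records one entry per layer and silently skips layers that
// were deleted concurrently; any other error is still returned.
func buildCaches(layers map[string][]byte, ids []string) (map[string]int, error) {
	caches := make(map[string]int)
	for _, id := range ids {
		toc, err := readTOC(layers, id)
		if err != nil {
			if errors.Is(err, errLayerUnknown) {
				continue
			}
			return nil, err
		}
		caches[id] = len(toc)
	}
	return caches, nil
}

func main() {
	layers := map[string][]byte{"a": []byte("toc-a")}
	caches, err := buildCaches(layers, []string{"a", "deleted"})
	fmt.Println(len(caches), err) // 1 <nil>
}
```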
Giuseppe Scrivano e8ea2b2bb0 chunked: fix reuse of the layers cache
the global singleton was never updated, causing the cache to always be
recreated for each layer.

It is not possible to hold the layersCache mutex for the entire load(),
since load() calls into some store APIs: that would deadlock, because
findDigestInternal() is already called while some store locks are
held.

Another benefit is that now only one goroutine at a time can run
load(), which prevents parallel calls to load() from doing the same
work.

Closes: https://github.com/containers/storage/issues/2023

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-09-19 09:33:27 +02:00
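The locking scheme described in e8ea2b2bb0 can be illustrated with two locks: one that serializes load() and one that guards the cache contents and is never held while calling into the store. This is a simplified sketch; layersCache, getLayersCache and the listLayers callback (standing in for store APIs) are hypothetical and do not reproduce the real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// layersCache is a hypothetical, simplified per-process cache.
type layersCache struct {
	loadMu sync.Mutex   // serializes load(): only one goroutine does the work
	mu     sync.RWMutex // guards layers and built; never held across store calls
	layers map[string]string
	built  int // number of per-layer cache entries created so far
}

var (
	cacheMu     sync.Mutex
	cacheSingle *layersCache
)

// getLayersCache returns the process-wide cache, creating it once and
// storing it in the global so that later callers reuse it.
func getLayersCache(listLayers func() []string) *layersCache {
	cacheMu.Lock()
	c := cacheSingle
	if c == nil {
		c = &layersCache{layers: make(map[string]string)}
		cacheSingle = c // the step whose absence forced a rebuild every time
	}
	cacheMu.Unlock()
	c.load(listLayers)
	return c
}

func (c *layersCache) load(listLayers func() []string) {
	c.loadMu.Lock()
	defer c.loadMu.Unlock()

	ids := listLayers() // stands in for store APIs; c.mu is not held here

	c.mu.Lock()
	defer c.mu.Unlock()
	for _, id := range ids {
		if _, ok := c.layers[id]; !ok {
			c.layers[id] = "cache-for-" + id
			c.built++
		}
	}
}

// lookup only takes the read lock, so it never waits on store calls in load.
func (c *layersCache) lookup(id string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.layers[id]
	return v, ok
}

func main() {
	list := func() []string { return []string{"l1", "l2"} }
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			getLayersCache(list)
		}()
	}
	wg.Wait()
	c := getLayersCache(list)
	c.mu.RLock()
	fmt.Println(c.built) // 2: each layer's entry was built exactly once
	c.mu.RUnlock()
}
```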
Giuseppe Scrivano b1d948930f chunked: drop timeout mechanism for cache
it is not clear if it is needed, so simplify it.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-09-19 09:33:26 +02:00
Miloslav Trmač db3139d8f0 Beautify a comment
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-09-05 19:53:51 +02:00
Miloslav Trmač cd7809d38b Be explicit about impact of not writing caches
A follow-up to https://github.com/containers/storage/pull/2031 .

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-09-03 20:42:58 +02:00
Giuseppe Scrivano 05334bc4cf chunked: do not write cache file to RO store
if the layer is R/O, do not write a cache file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-07-16 14:41:42 +02:00
Jan Rodák b7a2328b7e Fix errcheck: error return value of `unix.Munmap` is not checked
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
2024-07-09 14:34:21 +02:00
Giuseppe Scrivano b89a34a789 chunked: check for digest len validity
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-10 15:20:47 +02:00
Giuseppe Scrivano adde691d56 chunked: check vdata length
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-10 15:20:47 +02:00
Giuseppe Scrivano 58151fed05 chunked: specify a maximum length for the tags len
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-10 15:20:47 +02:00
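The three length checks above (b89a34a789, adde691d56, 58151fed05) share one pattern: validate every length read from the file before slicing with it, so a corrupt file yields an error instead of a panic. The sketch assumes a made-up header layout (1-byte digest length, then 4-byte little-endian tags and vdata lengths) and a made-up maxTagsLen; it is not the actual cache-file format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical limits for this sketch, not the real cache-file values.
const (
	maxTagsLen = 1 << 20 // refuse absurdly large tag tables
	digestLen  = 32      // raw sha256 digest size
)

// header follows a hypothetical layout:
// digestLen(1) | tagsLen(4, LE) | vdataLen(4, LE) | tags | vdata
type header struct {
	tags  []byte
	vdata []byte
}

func parseHeader(buf []byte) (*header, error) {
	if len(buf) < 9 {
		return nil, errors.New("header too short")
	}
	if int(buf[0]) != digestLen {
		return nil, fmt.Errorf("invalid digest length %d", buf[0])
	}
	tagsLen := binary.LittleEndian.Uint32(buf[1:5])
	vdataLen := binary.LittleEndian.Uint32(buf[5:9])
	if tagsLen > maxTagsLen {
		return nil, fmt.Errorf("tags length %d exceeds maximum %d", tagsLen, maxTagsLen)
	}
	rest := buf[9:]
	// Compare as uint64 so no length can wrap around before the check.
	if uint64(tagsLen) > uint64(len(rest)) {
		return nil, errors.New("tags length exceeds file size")
	}
	tags, rest := rest[:tagsLen], rest[tagsLen:]
	if uint64(vdataLen) > uint64(len(rest)) {
		return nil, errors.New("vdata length exceeds file size")
	}
	return &header{tags: tags, vdata: rest[:vdataLen]}, nil
}

func main() {
	buf := []byte{32, 2, 0, 0, 0, 1, 0, 0, 0, 'a', 'b', 'c'}
	h, err := parseHeader(buf)
	fmt.Println(string(h.tags), string(h.vdata), err) // ab c <nil>
}
```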
Colin Walters 6371f588df chunked: Fix two minor linter issues
My IDE runs a linter by default, and these two show up.
For the file one, it's because `Fd()` returns `uintptr`
which is unsigned and can't be negative.  IOW, a `File`
object should always be a valid opened fd.

Signed-off-by: Colin Walters <walters@verbum.org>
2024-06-07 10:15:11 -04:00
Miloslav Trmač b5413c2bd6 Move the tar-split digest value into the TOC
... so that we can uniquely identify partially-pulled layers
by the TOC digest.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-05-14 10:53:03 +02:00
Miloslav Trmač 8dd381ecf3 Refactor unmarshalTOC to use a switch
This is a micro-optimization (we call strings.ToLower only
once), but more importantly it will make it easier to add
more fields.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2024-05-14 10:53:03 +02:00
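A sketch of the switch-based dispatch from 8dd381ecf3, with strings.ToLower called once per key; it also matches the lowercase tag names of f3a7e9c1ce. tocEntry and setField are hypothetical, and the sketch works on already-decoded values, whereas the commits in this log indicate the real parser reads a json-iterator stream.

```go
package main

import (
	"fmt"
	"strings"
)

// tocEntry is a hypothetical subset of a TOC entry.
type tocEntry struct {
	Name   string
	Size   int64
	Digest string
}

// setField lowercases the key once and dispatches with a switch; adding a
// field is one more case. Unknown keys and mistyped values are ignored.
func setField(e *tocEntry, key string, value any) {
	switch strings.ToLower(key) {
	case "name":
		if s, ok := value.(string); ok {
			e.Name = s
		}
	case "size":
		if n, ok := value.(int64); ok {
			e.Size = n
		}
	case "digest":
		if s, ok := value.(string); ok {
			e.Digest = s
		}
	}
}

func main() {
	var e tocEntry
	setField(&e, "Name", "usr/bin/ls")
	setField(&e, "SIZE", int64(142312))
	fmt.Printf("%+v\n", e)
}
```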
Giuseppe Scrivano 881107d836 chunked: skip cache file for non-partial layers
if the layer does not have a manifest TOC, just ignore it instead of
raising a warning.  There is no need to create a cache file since
there is no manifest file to parse.

Closes: https://github.com/containers/storage/issues/1909

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-24 17:18:33 +02:00
Giuseppe Scrivano e63e3002e1 chunked: downgrade loading cache file msg to info
it can happen for various reasons, for example a new cache file
format; in that case the file is recreated with the latest version.
This is internal only and should not be displayed by default.

Closes: https://github.com/containers/storage/issues/1905

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-23 17:36:18 +02:00
Giuseppe Scrivano 065a2f3321 chunked: bump version number for cache file
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 21:28:16 +02:00
Giuseppe Scrivano 9619a53b91 chunked: store file locations as binary
it reduces the cache file size by ~3%.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 21:28:16 +02:00
Giuseppe Scrivano 59ac03970d chunked: store file names separately
so that the same file path is stored only once in the cache file.

After this change, the cache file measured on the fedora:{38,39,40}
images is on average ~6% smaller.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 21:28:16 +02:00
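Storing each path once, as in 59ac03970d, amounts to an interning table: records hold a small index instead of repeating the string. The nameTable type below is a hypothetical illustration, not the on-disk format.

```go
package main

import "fmt"

// nameTable is a hypothetical interning table: each distinct path is stored
// once, and records refer to it by a small integer index.
type nameTable struct {
	names []string
	index map[string]uint32
}

func newNameTable() *nameTable {
	return &nameTable{index: make(map[string]uint32)}
}

// add returns the index of path, appending the string only the first time.
func (t *nameTable) add(path string) uint32 {
	if i, ok := t.index[path]; ok {
		return i
	}
	i := uint32(len(t.names))
	t.names = append(t.names, path)
	t.index[path] = i
	return i
}

func main() {
	t := newNameTable()
	// The same path is referenced by several digests (whole file, chunks),
	// but its bytes are stored only once.
	refs := []uint32{t.add("/usr/bin/ls"), t.add("/usr/bin/ls"), t.add("/etc/os-release")}
	fmt.Println(refs, len(t.names)) // [0 0 1] 2
}
```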
Giuseppe Scrivano e9a96e022d chunked: use a bloom filter to speedup lookup
use a bloom filter to speed up lookup of digests in a cache file.

The biggest advantage is that it reduces page faults on the mmap'ed
cache file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 21:28:16 +02:00
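A minimal Bloom filter sketch for e9a96e022d. The bit-array size, the number of hash functions and the FNV-1a double-hashing scheme are arbitrary choices for illustration, not what containers/storage uses. A negative answer is definite, so most misses never touch the larger mmap'ed digest index; a positive answer still requires the real lookup.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a hypothetical minimal Bloom filter with m bits and k
// positions per key, derived by double hashing from one 64-bit FNV-1a hash.
type bloomFilter struct {
	bits []uint64
	m    uint64
	k    uint64
}

func newBloomFilter(m, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

func (b *bloomFilter) positions(key []byte) []uint64 {
	h := fnv.New64a()
	h.Write(key)
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32|1 // odd h2 so the step is never zero
	pos := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		pos[i] = (h1 + i*h2) % b.m
	}
	return pos
}

func (b *bloomFilter) add(key []byte) {
	for _, p := range b.positions(key) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// mayContain returns false only if key was definitely never added.
func (b *bloomFilter) mayContain(key []byte) bool {
	for _, p := range b.positions(key) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := newBloomFilter(1024, 4)
	bf.add([]byte("sha256:aaaa"))
	fmt.Println(bf.mayContain([]byte("sha256:aaaa"))) // true
	// Usually false; true here would be a false positive.
	fmt.Println(bf.mayContain([]byte("sha256:bbbb")))
}
```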
Giuseppe Scrivano e6793e394c chunked: store file offset and length in binary format
it helps reduce the cache file size by ~25%.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 21:28:15 +02:00
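To illustrate why binary offsets and lengths (e6793e394c) take less space than decimal text, the sketch encodes them as unsigned varints with encoding/binary. The varint layout, encodeLocation and decodeLocation are assumptions made for this example; the actual cache file may use a different binary encoding.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strconv"
)

// encodeLocation appends offset and length as unsigned varints (a
// hypothetical layout, not the real cache-file encoding).
func encodeLocation(dst []byte, offset, length uint64) []byte {
	dst = binary.AppendUvarint(dst, offset)
	return binary.AppendUvarint(dst, length)
}

// decodeLocation reads one (offset, length) pair and returns the rest of buf.
func decodeLocation(buf []byte) (offset, length uint64, rest []byte, err error) {
	offset, n := binary.Uvarint(buf)
	if n <= 0 {
		return 0, 0, nil, errors.New("invalid offset varint")
	}
	length, m := binary.Uvarint(buf[n:])
	if m <= 0 {
		return 0, 0, nil, errors.New("invalid length varint")
	}
	return offset, length, buf[n+m:], nil
}

func main() {
	bin := encodeLocation(nil, 1234567, 4096)
	text := strconv.FormatUint(1234567, 10) + ":" + strconv.FormatUint(4096, 10)
	fmt.Println(len(bin), len(text)) // 5 12
	off, length, _, err := decodeLocation(bin)
	fmt.Println(off, length, err) // 1234567 4096 <nil>
}
```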
Giuseppe Scrivano 33472545fb chunked: store digest in binary format
use the binary representation of a digest; it helps reduce the file
size by ~25%.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 21:28:15 +02:00
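Converting a digest from its "sha256:" plus 64-hex-character string form to raw bytes drops the prefix and halves the hex payload (71 bytes down to 32), which is where the savings reported in 33472545fb come from. This sketch handles only sha256; digestToBinary is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// digestToBinary converts "sha256:<64 hex chars>" to its 32 raw bytes.
func digestToBinary(d string) ([]byte, error) {
	const prefix = "sha256:"
	if !strings.HasPrefix(d, prefix) {
		return nil, fmt.Errorf("unsupported digest %q", d)
	}
	raw, err := hex.DecodeString(d[len(prefix):])
	if err != nil {
		return nil, err
	}
	if len(raw) != sha256.Size {
		return nil, fmt.Errorf("invalid sha256 length %d", len(raw))
	}
	return raw, nil
}

func main() {
	d := fmt.Sprintf("sha256:%x", sha256.Sum256([]byte("hello")))
	raw, err := digestToBinary(d)
	fmt.Println(len(d), len(raw), err) // 71 32 <nil>
}
```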
Giuseppe Scrivano 397943be44 chunked: move cache file generation to separate function
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 11:23:07 +02:00
Giuseppe Scrivano f388a77afb chunked: fix unmarshaling of file names
The getString() function was used to extract string values, but it
doesn't handle escaped characters.  Replace it with iter.ReadString(),
which is slower but handles escaped characters correctly.

Closes: https://github.com/containers/storage/issues/1878

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-09 22:03:13 +02:00
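The bug fixed in f388a77afb is easy to reproduce: taking the raw bytes between the quotes of a JSON string leaves escape sequences such as \" or \u00e9 undecoded. The sketch uses the standard library's encoding/json as a stand-in for json-iterator's iter.ReadString(); naiveString is a hypothetical shortcut that, like the old getString(), ignores escapes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// naiveString returns the bytes between the quotes, escapes left as-is.
func naiveString(raw []byte) string {
	return string(bytes.Trim(raw, `"`))
}

// decodeString decodes a JSON string literal properly, resolving escapes.
func decodeString(raw []byte) (string, error) {
	var s string
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	raw := []byte(`"usr/share/doc/a\"b\u00e9.txt"`)
	good, _ := decodeString(raw)
	fmt.Println(naiveString(raw)) // usr/share/doc/a\"b\u00e9.txt
	fmt.Println(good)             // usr/share/doc/a"bé.txt
}
```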
Giuseppe Scrivano 080dbaf67b chunked: use mmap to load cache files
reduce the process's memory usage by not loading the layers' cache
files entirely into memory.

Memory-mapped files can be shared among multiple instances of Podman,
and they are not fully loaded into memory.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-03-20 20:51:37 +01:00
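A sketch of mmap-based loading (080dbaf67b) using the standard library's syscall.Mmap, which is available only on Linux and other Unix systems. mmapFile is hypothetical: it maps the file read-only and MAP_SHARED, and returns an unmap function whose error the caller checks, in the spirit of the errcheck fix in b7a2328b7e.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mmapFile maps path read-only and shared. Pages are faulted in lazily on
// access, and processes mapping the same file share page-cache pages.
func mmapFile(path string) (data []byte, unmap func() error, err error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, nil, err
	}
	defer f.Close() // the mapping stays valid after the descriptor is closed

	st, err := f.Stat()
	if err != nil {
		return nil, nil, err
	}
	if st.Size() == 0 {
		// A zero-length mmap fails with EINVAL, so return an empty result.
		return nil, func() error { return nil }, nil
	}
	data, err = syscall.Mmap(int(f.Fd()), 0, int(st.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return nil, nil, fmt.Errorf("mmap %s: %w", path, err)
	}
	return data, func() error { return syscall.Munmap(data) }, nil
}

func main() {
	// Map the running executable, just to have a non-empty file at hand.
	data, unmap, err := mmapFile(os.Args[0])
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(data) > 0)
	// Check the Munmap error instead of silently discarding it.
	if err := unmap(); err != nil {
		fmt.Println("munmap:", err)
	}
}
```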
Kir Kolyshkin f7e661fecc pkg/chunked: rename metadata to cacheFile
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-03-20 17:40:12 +01:00
Giuseppe Scrivano f6356d6ccd chunked: refactor private fields to internal struct
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-03-20 15:47:37 +01:00
Giuseppe Scrivano 9c4ea20528 chunked: add chunk size to cache file
include the chunk length in the generated file location format.
This makes it easier for external tools, which may not know the chunk
size, to use the cache.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-02-26 20:58:52 +01:00
Giuseppe Scrivano 4d6767d078 chunked: reject unexpected data after TOC
handle cases where there is unexpected data following the
manifest in the JSON document.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-11-29 11:51:49 +01:00
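Rejecting trailing data (4d6767d078) can be expressed with the standard library's json.Decoder: decode one value, then require that the next token is io.EOF. The toc type is a trimmed-down hypothetical, and the real code uses json-iterator rather than encoding/json.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// toc is a hypothetical, trimmed-down TOC document.
type toc struct {
	Version int               `json:"version"`
	Entries []json.RawMessage `json:"entries"`
}

// parseTOC decodes exactly one JSON document and rejects anything but
// whitespace after it.
func parseTOC(data []byte) (*toc, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var t toc
	if err := dec.Decode(&t); err != nil {
		return nil, err
	}
	// After the document, the next token must be the end of input.
	if _, err := dec.Token(); !errors.Is(err, io.EOF) {
		return nil, errors.New("unexpected data after TOC")
	}
	return &t, nil
}

func main() {
	_, err := parseTOC([]byte(`{"version":1,"entries":[]}` + "\n"))
	fmt.Println(err) // <nil>
	_, err = parseTOC([]byte(`{"version":1,"entries":[]} garbage`))
	fmt.Println(err) // unexpected data after TOC
}
```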
Giuseppe Scrivano 1b36426046 chunked: reuse json iterator
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-11-29 11:33:41 +01:00
Giuseppe Scrivano b737dc6caf chunked: provide digest for empty files
if the file doesn't have a digest but its size is 0, we can hard-code
the well-known sha256 digest of empty content.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-10-16 13:09:45 +02:00
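Since the sha256 of zero bytes is a constant, the fallback in b737dc6caf needs no hashing at run time. fileDigest is a hypothetical helper; main only confirms the constant against crypto/sha256.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// emptyDigest is the well-known sha256 of zero bytes.
const emptyDigest = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

// fileDigest returns the TOC digest, substituting emptyDigest for empty
// files that carry none. Non-empty files without a digest still get "".
func fileDigest(tocDigest string, size int64) string {
	if tocDigest == "" && size == 0 {
		return emptyDigest
	}
	return tocDigest
}

func main() {
	computed := fmt.Sprintf("sha256:%x", sha256.Sum256(nil))
	fmt.Println(computed == emptyDigest)          // true
	fmt.Println(fileDigest("", 0) == emptyDigest) // true
}
```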
Giuseppe Scrivano a50bb95770 chunked: support writing files in a flat dir format
so that they can be stored by their digest.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-07-04 17:45:41 +02:00
Giuseppe Scrivano f3a7e9c1ce chunked: convert tag name to lowercase
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-06-17 00:31:40 +02:00
Giuseppe Scrivano d53cea918d chunked: fix reading modtime from the TOC
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-06-17 00:31:39 +02:00
Kir Kolyshkin a4d8f720a2 Format sources with gofumpt
gofumpt is a superset of gofmt, enabling some more code formatting
rules.

This commit is brought to you by

	gofumpt -w .

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-05-26 16:17:31 -07:00
Daniel J Walsh a3204cf7e8 Move to golang 1.18 and later
GitHub is reporting security issues in older versions of golang.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-04-03 15:26:54 -04:00
Miloslav Trmač a1ccc9d862 Use os.WriteFile instead of ioutil.WriteFile
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2022-09-12 16:31:34 +02:00
Miloslav Trmač 7635db182b Use io.ReadAll instead of ioutil.ReadAll
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2022-09-12 16:30:46 +02:00
Sascha Grunert 3455d12729 Switch to golang native error wrapping
We now use the golang error wrapping format specifier `%w` instead of the
deprecated github.com/pkg/errors package.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-07-07 13:22:46 +02:00
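A small illustration of the switch in 3455d12729: context is added with fmt.Errorf and %w, and callers can still test the wrapped cause with errors.Is. readCache and cacheMissing are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// readCache adds context with %w, so the underlying error stays
// inspectable with errors.Is and errors.As.
func readCache(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		// Previously: errors.Wrapf(err, "reading cache file %s", path)
		// from github.com/pkg/errors.
		return nil, fmt.Errorf("reading cache file %s: %w", path, err)
	}
	return data, nil
}

// cacheMissing reports whether err, possibly wrapped, means "not found".
func cacheMissing(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	_, err := readCache("/nonexistent/cache")
	fmt.Println(err)
	fmt.Println(cacheMissing(err)) // true
}
```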
Giuseppe Scrivano 26c561f9a6 chunked: fix read layer cache
if the layer cache doesn't already exist, automatically create it from
the layer TOC.

commit 10697a05a2 introduced this
regression.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-02-14 12:36:00 +01:00
Giuseppe Scrivano 198820877c pkg/chunked: add support for sparse files
automatically detect holes in sparse files (the threshold is hardcoded
at 1 KB for now) and add this information to the manifest file.

The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.

Closes: https://github.com/containers/storage/issues/1091

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 13:32:13 +01:00
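The sparse-file handling in 198820877c has two halves: the sender finds runs of zeros at least the threshold long, and the receiver skips them instead of writing zeros. This sketch uses os.File.Seek and os.File.Truncate, which wrap lseek and ftruncate, in place of unix.Seek and unix.Ftruncate. findHoles, writeSparse and the hole type are hypothetical, and reading "1 KB" as 1024 bytes is an assumption. Whether the skipped ranges become real holes depends on the filesystem.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// holeThreshold assumes the commit's "1 KB" means 1024 bytes.
const holeThreshold = 1024

// hole is a hypothetical description of a zero-filled range.
type hole struct {
	offset, length int64
}

// findHoles returns every run of zero bytes that is at least threshold long.
func findHoles(data []byte, threshold int) []hole {
	var holes []hole
	start := -1 // start of the current zero run, or -1 if not in a run
	for i := 0; i <= len(data); i++ {
		if i < len(data) && data[i] == 0 {
			if start < 0 {
				start = i
			}
			continue
		}
		// Any open run ends at i, on a non-zero byte or at the end of data.
		if start >= 0 && i-start >= threshold {
			holes = append(holes, hole{offset: int64(start), length: int64(i - start)})
		}
		start = -1
	}
	return holes
}

// writeSparse writes data to a freshly created file, seeking over the holes
// instead of writing zeros, then truncates to the full length so that a
// trailing hole still counts toward the file size.
func writeSparse(f *os.File, data []byte, holes []hole) error {
	var pos int64
	for _, h := range holes {
		if _, err := f.Write(data[pos:h.offset]); err != nil {
			return err
		}
		pos = h.offset + h.length
		if _, err := f.Seek(pos, io.SeekStart); err != nil {
			return err
		}
	}
	if _, err := f.Write(data[pos:]); err != nil {
		return err
	}
	return f.Truncate(int64(len(data)))
}

func main() {
	data := make([]byte, 8192)
	copy(data, "header")
	copy(data[6000:], "trailer")
	holes := findHoles(data, holeThreshold)
	fmt.Println(holes) // [{6 5994} {6007 2185}]

	f, err := os.CreateTemp("", "sparse-*")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if err := writeSparse(f, data, holes); err != nil {
		fmt.Println(err)
		return
	}
	if st, err := f.Stat(); err == nil {
		fmt.Println(st.Size()) // 8192
	}
}
```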
Giuseppe Scrivano fd89b93ef3 chunked: add tests for the cache
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 13:32:13 +01:00
Giuseppe Scrivano ab25eafc17 cache: parse the correct field for offset
commit 10697a05a2 introduced the issue.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 12:15:08 +01:00
Giuseppe Scrivano 96c0403bb1 cache: store correctly the digestLen field
commit 10697a05a2 introduced the issue.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 12:15:08 +01:00
Giuseppe Scrivano 31b28dbedf chunked: use a RWMutex for the cache
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
Giuseppe Scrivano 0621da79cc chunked: improve json parsing
reduce the number of allocations done by the parser by reading into a
bytes.Buffer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
Giuseppe Scrivano 048f7c08ad chunked: use json-iterator
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
Giuseppe Scrivano 10697a05a2 chunked: implement lookaside cache
avoid parsing the JSON TOC file of every layer in the local storage;
instead, create a lookaside cache in a custom format that is faster to
load (and potentially mmap'able).

The same cache is used to look up files, chunks and candidates for
deduplication with hard links.

There are 3 kinds of digests stored:

- digest(file.payload)
- digest(digest(file.payload) + file.UID + file.GID + file.mode + file.xattrs)
- digest(i) for each i in chunks(file.payload)

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
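The three kinds of digests listed in 10697a05a2 can be sketched as below. The byte serialization of UID, GID, mode and xattrs inside metadataDigest is invented for this example (the commit does not specify it), and fileEntry is hypothetical; only the overall shape (content digest, content-plus-metadata digest, per-chunk digests) follows the commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// fileEntry is a hypothetical subset of a TOC entry.
type fileEntry struct {
	payload  []byte
	uid, gid uint32
	mode     uint32
	xattrs   map[string]string
	chunks   [][]byte // the payload split into chunks
}

func digest(b []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(b))
}

// metadataDigest hashes the payload digest plus ownership, mode and xattrs.
// Xattr keys are sorted so the result does not depend on map order.
func metadataDigest(f fileEntry) string {
	h := sha256.New()
	h.Write([]byte(digest(f.payload)))
	var buf [12]byte
	binary.LittleEndian.PutUint32(buf[0:], f.uid)
	binary.LittleEndian.PutUint32(buf[4:], f.gid)
	binary.LittleEndian.PutUint32(buf[8:], f.mode)
	h.Write(buf[:])
	keys := make([]string, 0, len(f.xattrs))
	for k := range f.xattrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		h.Write([]byte(k + "\x00" + f.xattrs[k] + "\x00"))
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil))
}

// cacheDigests returns the three kinds of keys: content only, content plus
// metadata (a hard-link deduplication candidate key), and one per chunk.
func cacheDigests(f fileEntry) (content, withMeta string, chunks []string) {
	for _, c := range f.chunks {
		chunks = append(chunks, digest(c))
	}
	return digest(f.payload), metadataDigest(f), chunks
}

func main() {
	f := fileEntry{
		payload: []byte("abcdef"),
		mode:    0o644,
		chunks:  [][]byte{[]byte("abc"), []byte("def")},
	}
	content, withMeta, chunks := cacheDigests(f)
	fmt.Println(content)
	fmt.Println(withMeta)
	fmt.Println(len(chunks)) // 2
}
```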
Giuseppe Scrivano 526c57d8b0 chunked: reuse cache
try to reuse an existing cache object instead of creating a new one
for every layer.

Set a time limit on how long it can be reused, so that stale
references are cleaned up.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-07 21:28:16 +01:00