36 Commits

Author SHA1 Message Date
Giuseppe Scrivano e6a34f1f88
chunked: bump version number for cache file
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 065a2f3321)
2024-04-20 13:18:00 +02:00
Giuseppe Scrivano 53cb4adf9e
chunked: store file locations as binary
it reduces the cache file size by ~3%.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 9619a53b91)
2024-04-20 13:18:00 +02:00
Giuseppe Scrivano 950cac9339
chunked: store file names separately
so that the same file path is stored only once in the cache file.

After this change, the cache file measured on the fedora:{38,39,40}
images is on average ~6% smaller.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 59ac03970d)
2024-04-20 13:18:00 +02:00
Giuseppe Scrivano 98836f2647
chunked: use a bloom filter to speedup lookup
use a bloom filter to speed up lookup of digests in a cache file.

The biggest advantage is that it reduces page faults with the mmap'ed
cache file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit e9a96e022d)
2024-04-20 13:18:00 +02:00
Giuseppe Scrivano 1cafa743c2
chunked: store file offset and length in binary format
it helps reduce the cache file size by ~25%.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit e6793e394c)
2024-04-20 13:18:00 +02:00
Giuseppe Scrivano 0d6e102042
chunked: store digest in binary format
use the binary representation for a given digest; it helps reduce
the file size by ~25%.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 33472545fb)
2024-04-20 13:18:00 +02:00
Giuseppe Scrivano 554c639d41
chunked: move cache file generation to separate function
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 397943be44)
2024-04-20 13:17:59 +02:00
Giuseppe Scrivano 69bedaa9a2
chunked: fix unmarshaling of file names
The getString() function was used to extract string values, but it
doesn't handle escaped characters.  Replace it with iter.ReadString()
that is slower but handles escaped characters correctly.

Closes: https://github.com/containers/storage/issues/1878

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit f388a77afb)
2024-04-20 13:17:59 +02:00
Giuseppe Scrivano 7b5a939b3b
chunked: use mmap to load cache files
reduce the memory usage of the process by not loading the layers'
cache files entirely into memory.

The memory-mapped files can be shared among multiple instances of
Podman, and they do not need to be fully loaded in memory.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit 080dbaf67b)
2024-04-20 13:17:59 +02:00
Kir Kolyshkin fbd6ec62dc
pkg/chunked: rename metadata to cacheFile
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit f7e661fecc)
2024-04-20 13:17:59 +02:00
Giuseppe Scrivano 9d309f6d0c
chunked: refactor private fields to internal struct
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit f6356d6ccd)
2024-04-20 13:17:58 +02:00
Giuseppe Scrivano 9c4ea20528
chunked: add chunk size to cache file
include the chunk length in the generated file location format.
This enhancement is designed to facilitate the use of the cache by external
tools which may not have knowledge of the chunk size.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-02-26 20:58:52 +01:00
Giuseppe Scrivano 4d6767d078
chunked: reject unexpected data after TOC
handle cases where there is unexpected data following the
manifest in the JSON document.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-11-29 11:51:49 +01:00
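
One way to express this kind of check is sketched below with the standard library's `encoding/json` rather than the json-iterator package the code base uses, so the calls differ from the real change. The `toc` type and `parseStrict` function are hypothetical stand-ins. After decoding one document, the only acceptable next token is end of input.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// toc is a hypothetical, heavily reduced stand-in for the manifest type.
type toc struct {
	Version int `json:"version"`
}

// parseStrict decodes exactly one JSON document and rejects anything
// other than whitespace after it.
func parseStrict(doc string) (*toc, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	var t toc
	if err := dec.Decode(&t); err != nil {
		return nil, err
	}
	// Anything other than io.EOF means extra content followed the document.
	if _, err := dec.Token(); !errors.Is(err, io.EOF) {
		return nil, errors.New("unexpected data after manifest")
	}
	return &t, nil
}

func main() {
	_, err := parseStrict(`{"version":1} {"version":2}`)
	fmt.Println(err)
}
```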
Giuseppe Scrivano 1b36426046
chunked: reuse json iterator
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-11-29 11:33:41 +01:00
Giuseppe Scrivano b737dc6caf
chunked: provide digest for empty files
if the file doesn't have a digest but its size is 0, we can hard code
the known sha256 digest.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-10-16 13:09:45 +02:00
Giuseppe Scrivano a50bb95770
chunked: support writing files in a flat dir format
so that they can be stored by their digest

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-07-04 17:45:41 +02:00
Giuseppe Scrivano f3a7e9c1ce
chunked: convert tag name to lowercase
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-06-17 00:31:40 +02:00
Giuseppe Scrivano d53cea918d
chunked: fix reading modtime from the TOC
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-06-17 00:31:39 +02:00
Kir Kolyshkin a4d8f720a2
Format sources with gofumpt
gofumpt is a superset of gofmt, enabling some more code formatting
rules.

This commit is brought to you by

	gofumpt -w .

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-05-26 16:17:31 -07:00
Daniel J Walsh a3204cf7e8
Move to golang 1.18 and later
GitHub.com is reporting security issues on older versions of
golang.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-04-03 15:26:54 -04:00
Miloslav Trmač a1ccc9d862
Use os.WriteFile instead of ioutil.WriteFile
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2022-09-12 16:31:34 +02:00
Miloslav Trmač 7635db182b
Use io.ReadAll instead of ioutil.ReadAll
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2022-09-12 16:30:46 +02:00
Sascha Grunert 3455d12729
Switch to golang native error wrapping
We now use the golang error wrapping format specifier `%w` instead of the
deprecated github.com/pkg/errors package.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-07-07 13:22:46 +02:00
Giuseppe Scrivano 26c561f9a6
chunked: fix read layer cache
if the layer cache doesn't already exist, automatically create it from
the layer TOC.

commit 10697a05a2 introduced this regression.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-02-14 12:36:00 +01:00
Giuseppe Scrivano 198820877c
pkg/chunked: add support for sparse files
automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.

The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.

Closes: https://github.com/containers/storage/issues/1091

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 13:32:13 +01:00
Giuseppe Scrivano fd89b93ef3
chunked: add tests for the cache
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 13:32:13 +01:00
Giuseppe Scrivano ab25eafc17
cache: parse the correct field for offset
commit 10697a05a2 introduced the issue.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 12:15:08 +01:00
Giuseppe Scrivano 96c0403bb1
cache: store correctly the digestLen field
commit 10697a05a2 introduced the issue.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-13 12:15:08 +01:00
Giuseppe Scrivano 31b28dbedf
chunked: use a RWMutex for the cache
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
Giuseppe Scrivano 0621da79cc
chunked: improve json parsing
reduce the number of allocations done by the parser by reading into a
bytes.Buffer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
Giuseppe Scrivano 048f7c08ad
chunked: use json-iterator
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
Giuseppe Scrivano 10697a05a2
chunked: implement lookaside cache
avoid parsing each json TOC file for the layers in the local storage,
but attempt to create a lookaside cache in a custom format faster to
load (and potentially be mmap'able).

The same cache is used to lookup files, chunks and candidates for
deduplication with hard links.

There are 3 kinds of digests stored:

- digest(file.payload)
- digest(digest(file.payload) + file.UID + file.GID + file.mode + file.xattrs)
- digest(i) for each i in chunks(file payload)

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-10 11:27:42 +01:00
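
The three digest kinds listed above can be sketched as follows. Everything about the encoding here is an assumption made for illustration: the `fileMeta` type, the textual concatenation of metadata fields, SHA-256 as the hash, and fixed-size chunks. The commit message does not specify any of these.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fileMeta holds the metadata that enters the second digest (hypothetical type).
type fileMeta struct {
	UID, GID int
	Mode     uint32
	Xattrs   map[string]string
}

func hexDigest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// payloadDigest is digest(file.payload).
func payloadDigest(payload []byte) string { return hexDigest(payload) }

// metadataDigest is digest(digest(file.payload) + UID + GID + mode + xattrs).
// The textual encoding below is an assumption of this sketch.
func metadataDigest(payload []byte, m fileMeta) string {
	keys := make([]string, 0, len(m.Xattrs))
	for k := range m.Xattrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order so the digest is deterministic
	s := fmt.Sprintf("%s:%d:%d:%o", payloadDigest(payload), m.UID, m.GID, m.Mode)
	for _, k := range keys {
		s += ":" + k + "=" + m.Xattrs[k]
	}
	return hexDigest([]byte(s))
}

// chunkDigests is digest(i) for each i in chunks(file payload),
// using fixed-size chunks for simplicity.
func chunkDigests(payload []byte, chunkSize int) []string {
	var out []string
	for start := 0; start < len(payload); start += chunkSize {
		end := start + chunkSize
		if end > len(payload) {
			end = len(payload)
		}
		out = append(out, hexDigest(payload[start:end]))
	}
	return out
}

func main() {
	payload := []byte("0123456789")
	a := fileMeta{UID: 0, GID: 0, Mode: 0o644}
	b := fileMeta{UID: 1000, GID: 1000, Mode: 0o644}
	fmt.Println(metadataDigest(payload, a) == metadataDigest(payload, b))
	fmt.Println(len(chunkDigests(payload, 4)))
}
```

Under these assumptions, two files with identical content but different ownership share the payload digest and differ in the metadata digest, which is what makes the latter usable when looking for hard link candidates.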
Giuseppe Scrivano 526c57d8b0
chunked: reuse cache
try to reuse an existing cache object, instead of creating it for
every layer.

Set a time limit on how long it can be reused, so as to clean up stale
references.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-07 21:28:16 +01:00
Giuseppe Scrivano be4e8f622d
chunked: move copy logic to storage_linux.go
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-07 21:28:16 +01:00
Giuseppe Scrivano bfd9c8046e
chunked: chunk deduplication
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-01-07 21:28:15 +01:00
Giuseppe Scrivano f18141fa76
chunked: move cache to separate file
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-12-24 13:28:25 +01:00