I was just reading the code and I have a mental checklist item
for "invoking open without O_CLOEXEC" that triggered here.
(See also e.g.
https://github.com/containers/composefs/pull/185#discussion_r1322925050
)
Leaking file descriptors across exec has security-relevant
consequences for us; see CVE-2024-21626 for example.
This isn't the only place in this codebase that is missing the flag;
I'm just using this targeted PR to test the waters for more PRs.
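
For illustration only, a minimal Linux-only sketch of the pattern (not
the code touched by this PR): it opens a file with O_CLOEXEC set at open
time and then reads the flag back with fcntl. The helper names
openCloexec, isCloexec and checkCloexec are hypothetical, and the
standard syscall package stands in for golang.org/x/sys/unix.

```go
package main

import (
	"fmt"
	"syscall"
)

// openCloexec opens path read-only and sets close-on-exec atomically at open
// time, so there is no window in which a concurrent fork+exec inherits the fd.
func openCloexec(path string) (int, error) {
	return syscall.Open(path, syscall.O_RDONLY|syscall.O_CLOEXEC, 0)
}

// isCloexec reports whether FD_CLOEXEC is set on fd (raw fcntl, Linux only).
func isCloexec(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.FD_CLOEXEC != 0, nil
}

// checkCloexec opens path, inspects the flag, and closes the descriptor.
func checkCloexec(path string) (bool, error) {
	fd, err := openCloexec(path)
	if err != nil {
		return false, err
	}
	defer syscall.Close(fd)
	return isCloexec(fd)
}

func main() {
	ok, err := checkCloexec("/dev/null")
	fmt.Println(ok, err)
}
```

Note that os.Open already adds O_CLOEXEC on its own; the checklist item
is about raw open/openat calls, where the caller has to pass the flag.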
Signed-off-by: Colin Walters <walters@verbum.org>
Increase the threshold for auto-merging parts from 128 to 1024. This change
aims to reduce the number of parts in an HTTP multi-range request, thus
increasing the likelihood that the server will accept the request.
The previous threshold of 128 often resulted in a large number of small
ranges, which could lead to HTTP multi-range requests being rejected by
servers due to the excessive number of parts.
It partially addresses the reported issue.
Reported-by: https://github.com/containers/storage/issues/1928
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Other TOC formats don't fill the data in.
For now, this only increases memory usage, but we will
need the data soon.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This code path is almost never triggered because
the annotations are present; and it was broken until recently.
Remove it to simplify the code and analysis.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Make it structurally clear that the code is all using the same value,
making it less likely for the verifier and other uses to get out of sync.
Also avoids some redundant parsing and error paths.
The conversion path looks longer, but that's just moving the parsing
from the called function (which is redundant for other callers).
Should not change behavior.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Make it structurally clear that the code is all using the same value,
making it less likely for the verifier and other uses to get out of sync.
Also avoids some redundant parsing and error paths.
Should not change behavior.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
if the file is created using the object-store flat directory format,
there is no need to set its inode attributes, since they are
ignored anyway when creating the composefs binary blob.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
improve the function that combines neighbor chunks. Instead of using
the number of parts, which also includes local files, use only the
number of chunks that must be retrieved from the network.
In addition, introduce a threshold limit to merge chunks so that we
further reduce the number of requested ranges.
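
A rough sketch of the idea, under the assumption that two neighboring
ranges are merged whenever the gap between them is at most a fixed
number of bytes. The byteRange type, mergeRanges and the gap-based rule
are hypothetical; the heuristic in the actual code may differ.

```go
package main

import "fmt"

// byteRange is a hypothetical type for one requested range:
// the bytes [Offset, Offset+Length) of the remote blob.
type byteRange struct {
	Offset, Length uint64
}

// mergeRanges coalesces ranges (sorted by Offset) whose gap is at most maxGap.
// The bytes in the gap are downloaded too and must be discarded by the caller.
func mergeRanges(sorted []byteRange, maxGap uint64) []byteRange {
	var out []byteRange
	for _, r := range sorted {
		if n := len(out); n > 0 {
			last := &out[n-1]
			end := last.Offset + last.Length
			if r.Offset <= end+maxGap {
				// Close enough: extend the previous range to cover r.
				if newEnd := r.Offset + r.Length; newEnd > end {
					last.Length = newEnd - last.Offset
				}
				continue
			}
		}
		out = append(out, r)
	}
	return out
}

func main() {
	missing := []byteRange{{0, 100}, {150, 50}, {5000, 10}}
	fmt.Println(mergeRanges(missing, 1024))
}
```

The trade-off is extra bytes on the wire in exchange for fewer parts in
the multi-range request.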
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
move the check for `enable_partial_images` to GetDiffer so that it
doesn't attempt any operation if the feature is disabled.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
enable pulling partial images by default; it is still possible to
disable the feature through the configuration file.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
even if we validated the full layer, report the TOC Digest as well so
the upper layer can use both.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
store the UncompressedDigest when the original tarball was converted
to zstd:chunked, since its diffID was computed and validated.
In this way the layer can be reused as any other layer that was fully
retrieved and validated.
Before this change, a layer that was converted to zstd:chunked was
always retrieved again since it does not have a TOC Digest.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it prevents clobbering the chunk .Size element later. This field was
ignored previously, but composefs uses it to retrieve the file size.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
reject a layer if it contains both a zstd:chunked and an eStargz TOC
since there are no guarantees that the two TOCs are consistent.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
drop the rootless argument from DefaultStoreOptions and
UpdateStoreOptions since this can be retrieved internally through the
unshare package.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
so that the users of the function can get access to the already
unmarshalled TOC instead of having to unmarshal it again.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the "convert_images" option is set in the configuration file, then
convert traditional images to the chunked format on the fly.
This is very expensive at the moment since the entire zstd:chunked
file is created and then processed.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
change the file format to store the tar-split as part of the
zstd:chunked image. This will allow clients to rebuild the entire
tarball without having to download it fully.
also store the uncompressed digest for the tarball, so that it can be
stored into the storage database.
Needs: https://github.com/containers/image/pull/1976
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
gofumpt is a superset of gofmt, enabling some more code formatting
rules.
This commit is brought to you by
gofumpt -w .
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Two error messages suggest that podman-system-migrate is a binary that
can be run, when the command is "podman system migrate".
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of the
deprecated github.com/pkg/errors package.
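
As a small illustration of the pattern (loadConfig and wrapsNotExist are
hypothetical helpers, not functions from this repository): an error
wrapped with `%w` keeps its cause reachable through errors.Is and
errors.As.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// loadConfig is a hypothetical helper that always fails, wrapping the
// underlying cause with %w so that callers can still inspect it.
func loadConfig(path string) error {
	return fmt.Errorf("loading %q: %w", path, fs.ErrNotExist)
}

// wrapsNotExist reports whether err has fs.ErrNotExist somewhere in its chain.
func wrapsNotExist(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	err := loadConfig("/etc/containers/storage.conf")
	fmt.Println(err, wrapsNotExist(err))
}
```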
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
drop host deduplication by just looking at the file path. It could be
useful in very specific use cases, but it is too expensive for generic
images. If the need arises, we first need to create an index of the
files that we can deduplicate so there is no need to calculate the
checksum on the fly.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
after the missing parts are merged, it is necessary to recalculate the
chunks to request from the server.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Based on a conversation on the Podman mailing list:
Mentioning podman-system-migrate in the error message may help users
resolve their issues faster.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.
The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.
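
The receiver side can be sketched roughly as follows, using os.File
methods in place of the raw unix calls. appendHole and writeSparseDemo
are illustrative names, and the sketch assumes the file is written
sequentially, so truncating to the new offset never drops data.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// appendHole advances the file offset by size bytes without writing zeros and
// then truncates the file to the new offset, so the size covers the hole even
// if nothing is written after it.
func appendHole(f *os.File, size int64) error {
	off, err := f.Seek(size, io.SeekCurrent)
	if err != nil {
		return err
	}
	return f.Truncate(off)
}

// writeSparseDemo writes "ab", a 4096-byte hole and "cd" to a temporary file
// and returns the bytes read back from it.
func writeSparseDemo() ([]byte, error) {
	f, err := os.CreateTemp("", "sparse-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.Write([]byte("ab")); err != nil {
		return nil, err
	}
	if err := appendHole(f, 4096); err != nil {
		return nil, err
	}
	if _, err := f.Write([]byte("cd")); err != nil {
		return nil, err
	}
	return os.ReadFile(f.Name())
}

func main() {
	data, err := writeSparseDemo()
	fmt.Println(len(data), err)
}
```

Reading the file back returns zeros for the hole; whether the hole is
actually left unallocated on disk depends on the file system.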
Closes: https://github.com/containers/storage/issues/1091
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
avoid parsing each json TOC file for the layers in the local storage,
but attempt to create a lookaside cache in a custom format that is faster
to load (and potentially mmap'able).
The same cache is used to look up files, chunks and candidates for
deduplication with hard links.
There are 3 kinds of digests stored:
- digest(file.payload)
- digest(digest(file.payload) + file.UID + file.GID + file.mode + file.xattrs)
- digest(i) for each i in chunks(file.payload)
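
To make the three kinds concrete, a sketch under loud assumptions:
SHA-256 as the digest, a made-up string serialization for the metadata,
and fixed-size chunks. None of these are taken from the cache format
itself, and fileMeta and the function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fileMeta is a hypothetical container for the metadata mixed into the digest.
type fileMeta struct {
	UID, GID int
	Mode     uint32
	Xattrs   map[string]string
}

func hexDigest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// payloadDigest is digest(file.payload).
func payloadDigest(payload []byte) string { return hexDigest(payload) }

// metadataDigest is digest(digest(file.payload) + UID + GID + mode + xattrs).
// The serialization below is an assumption; xattr keys are sorted so that the
// result does not depend on map iteration order.
func metadataDigest(payload []byte, m fileMeta) string {
	keys := make([]string, 0, len(m.Xattrs))
	for k := range m.Xattrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	s := fmt.Sprintf("%s:%d:%d:%o", payloadDigest(payload), m.UID, m.GID, m.Mode)
	for _, k := range keys {
		s += fmt.Sprintf(":%s=%s", k, m.Xattrs[k])
	}
	return hexDigest([]byte(s))
}

// chunkDigests is digest(i) for each chunk i, using fixed-size chunks for
// simplicity. chunkSize must be greater than zero.
func chunkDigests(payload []byte, chunkSize int) []string {
	var out []string
	for start := 0; start < len(payload); start += chunkSize {
		end := start + chunkSize
		if end > len(payload) {
			end = len(payload)
		}
		out = append(out, hexDigest(payload[start:end]))
	}
	return out
}

func main() {
	p := []byte("abcdef")
	m := fileMeta{UID: 0, GID: 0, Mode: 0o644}
	fmt.Println(payloadDigest(p), metadataDigest(p, m), len(chunkDigests(p, 4)))
}
```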
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
try to reuse an existing cache object, instead of creating it for
every layer.
Set a time limit on how long it can be reused so that stale
references are cleaned up.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it solves a problem where the discard could be performed before the
compression handler was closed (through a deferred call).
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
add a fallback mechanism when openat2 is not supported by the
underlying kernel.
If a call to openat2 fails with ENOSYS, then fall back to a user space
lookup. Generally the user space lookup is not safe, since symlink
lookups are vulnerable to TOCTOU attacks, but in this case where the
rootfs is being created, there are no other processes modifying it,
so such lookups can be considered safe.
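
The control flow can be sketched like this. The opener type and the two
stubs are hypothetical stand-ins for the openat2-based and the user
space lookups; only the ENOSYS check reflects what is described above.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// opener is a hypothetical signature for "open name beneath directory dirfd".
type opener func(dirfd int, name string) (int, error)

// openWithFallback tries the fast path first and uses the slow path only when
// the kernel reports that the system call does not exist (ENOSYS). Any other
// error is returned to the caller unchanged.
func openWithFallback(fast, slow opener, dirfd int, name string) (int, error) {
	fd, err := fast(dirfd, name)
	if errors.Is(err, syscall.ENOSYS) {
		return slow(dirfd, name)
	}
	return fd, err
}

// stubNoOpenat2 simulates a kernel without openat2.
func stubNoOpenat2(dirfd int, name string) (int, error) {
	return -1, syscall.ENOSYS
}

// stubUserspace simulates a successful user space lookup.
func stubUserspace(dirfd int, name string) (int, error) {
	return 42, nil
}

func main() {
	fd, err := openWithFallback(stubNoOpenat2, stubUserspace, 3, "etc/passwd")
	fmt.Println(fd, err)
}
```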
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when dealing with a symlink, open the parent directory and use the
symlink basename to set its attributes.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when creating a new file, handle the case where any of the parent
directories are missing and create them automatically if needed.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
since we now support reading additional IDs with libsubid, clarify
that the /etc/subuid and /etc/subgid files are honored only when
shadow-utils is configured to use them.
[NO TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Using unix.AT_EMPTY_PATH requires CAP_DAC_READ_SEARCH. Use an
equivalent variant that uses /proc/self/fd that can be used with
rootless.
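
A Linux-only sketch, assuming the call in question is linkat(2), whose
man page documents this equivalence. linkByFd and demoLink are
hypothetical names, and a raw system call is used because the standard
syscall package has no linkat wrapper that takes flags.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
	"unsafe"
)

const atSymlinkFollow = 0x400 // AT_SYMLINK_FOLLOW on Linux

// linkByFd creates newPath as a hard link to the file behind fd. It resolves
// the magic symlink /proc/self/fd/<fd> with AT_SYMLINK_FOLLOW instead of
// passing AT_EMPTY_PATH, so no extra capability is needed.
func linkByFd(fd uintptr, newPath string) error {
	oldp, err := syscall.BytePtrFromString(fmt.Sprintf("/proc/self/fd/%d", fd))
	if err != nil {
		return err
	}
	newp, err := syscall.BytePtrFromString(newPath)
	if err != nil {
		return err
	}
	cwd := -100 // AT_FDCWD on Linux
	_, _, errno := syscall.Syscall6(syscall.SYS_LINKAT,
		uintptr(cwd), uintptr(unsafe.Pointer(oldp)),
		uintptr(cwd), uintptr(unsafe.Pointer(newp)),
		atSymlinkFollow, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// demoLink links an open file to a second name in a temporary directory and
// reports whether both names refer to the same inode.
func demoLink() (bool, error) {
	dir, err := os.MkdirTemp("", "linkat-*")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	src, dst := filepath.Join(dir, "a"), filepath.Join(dir, "b")
	if err := os.WriteFile(src, []byte("data"), 0o644); err != nil {
		return false, err
	}
	f, err := os.Open(src)
	if err != nil {
		return false, err
	}
	defer f.Close()
	if err := linkByFd(f.Fd(), dst); err != nil {
		return false, err
	}
	a, err := os.Stat(src)
	if err != nil {
		return false, err
	}
	b, err := os.Stat(dst)
	if err != nil {
		return false, err
	}
	return os.SameFile(a, b), nil
}

func main() {
	same, err := demoLink()
	fmt.Println(same, err)
}
```

This variant needs /proc to be mounted.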
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the option ostree_repos is set:
    [storage.options]
    pull_options = {enable_partial_images = "true", ostree_repos = "/foo:/bar"}
then attempt to deduplicate from the specified list of OSTree repositories.
In order to be usable, an OSTree repository must be configured to track
the checksum of its files' payload (payload link), which is disabled by
default:
    ostree config --repo=/path/to/repo set core.payload-link-threshold N
Where N is the minimum size for files to be tracked by their payload
and must be a nonzero value.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Using unix.AT_EMPTY_PATH requires CAP_DAC_READ_SEARCH. Use an
equivalent variant that uses /proc/self/fd that can be used with
rootless.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
before deduplicating with hard links make sure the two files share the
same UID, GID, file mode and extended attributes.
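
The check matters because hard links share a single inode, so the
linked file would otherwise inherit the metadata of the candidate. A
minimal sketch, where fileAttrs and canHardLink are hypothetical names
rather than the ones used in the code:

```go
package main

import "fmt"

// fileAttrs is a hypothetical view of the metadata that must match.
type fileAttrs struct {
	UID, GID int
	Mode     uint32
	Xattrs   map[string]string
}

// canHardLink reports whether two files with identical payload can share an
// inode, i.e. whether UID, GID, mode and extended attributes are all equal.
func canHardLink(a, b fileAttrs) bool {
	if a.UID != b.UID || a.GID != b.GID || a.Mode != b.Mode {
		return false
	}
	if len(a.Xattrs) != len(b.Xattrs) {
		return false
	}
	for k, v := range a.Xattrs {
		if w, ok := b.Xattrs[k]; !ok || w != v {
			return false
		}
	}
	return true
}

func main() {
	a := fileAttrs{UID: 0, GID: 0, Mode: 0o644}
	b := fileAttrs{UID: 0, GID: 0, Mode: 0o755}
	fmt.Println(canHardLink(a, a), canHardLink(a, b))
}
```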
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
in addition to zstd:chunked, add support for the estargz format.
estargz is maintained at github.com/containerd/stargz-snapshotter.
Images using estargz can be used on old clients and registries that
have no support for the zstd compression algorithm.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>