if the compressed digest was validated, as it happens when
'pull_options = {convert_images = "true"}' is set, then store it as
well so that reusing the blob by its compressed digest works.
Previously, when an image converted to zstd:chunked was pulled a
second time, it would not be recognized by its compressed digest,
forcing the image to be pulled again.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
SetBigData itself calls saveFor; so doing that before raises
fewer questions about stale data / stepping over each other.
The change in timing is externally-observable, but should hopefully
not matter much in practice, because this code is typically called
from layerStore.create as a part of an atomic create+populate operation
protected by incompleteFlag.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Add a race-condition-free alternative to using CreateLayer and
ApplyDiffFromStagingDirectory, ensuring the store is locked for the
entire duration while the layer is being created and populated.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
enforce that the stagingDirectory must have the same value as the
diffOutput.Target variable. This allows simplifying the internal API.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This allows us to correctly set (CompressedDigest, CompressedSize)
when copying data from another layer; in that case we don't have the
compressed data, so computing the size from compressedCounter
sets an incorrect value.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
introduce the TOCDigest field for a layer. TOCDigest is designed to
store the digest of the Table of Contents (TOC) of the blob.
It is useful when the UncompressedDigest cannot be validated during a
partial image pull, but the TOC itself is validated.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
AFAICS this call is intended to "remap" the parent layer's contents to the
desired IDMappings; but when there is no parent layer, there is
nothing to remap.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
change the file format to store the tar-split as part of the
zstd:chunked image. This will allow clients to rebuild the entire
tarball without having to download it fully.
also store the uncompressed digest for the tarball, so that it can be
stored into the storage database.
Needs: https://github.com/containers/image/pull/1976
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Handle old-fashioned ID mappings when looking at layers. Nowadays,
we'll use an idmapped mount if we can, but we shouldn't blow up if we
had to chown a layer because we couldn't use an idmapped mount.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
tarLogger calls the provided callback in a separate
goroutine, and that can happen after tarLogger.Write
returns; tarLogger.Close is required to ensure the callbacks
have all been correctly called, and the created uidLog and gidLog
values can be consumed.
So, move most of the IO pipeline that is formed around the
layer stream into a nested function that terminates earlier, notably
so that the "defer idLogger.Close()" is called at the appropriate time.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
We will want to move the next part of the code
into a closure; move variables that will be
accessed outside of that section.
Should not change behavior.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
AFAICS that can't fail with current pgzip; and
pgzip.NewWriter also calls NewWriteLevel, but it just
swallows the error.
Any failure would therefore be very unexpected;
report it instead of suppressing it.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
gofumpt is a superset of gofmt, enabling some more code formatting
rules.
This commit is brought to you by
gofumpt -w .
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The lockfile's write record is now updated prior to the actual write
operation. This ensures that, in the event of an unexpected
termination, other processes are correctly notified of an initiated
write operation.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The documentation says
> The new Buffer takes ownership of buf, and the
> caller should not use buf after this call.
so use the more directly applicable, and simpler, bytes.Reader
instead, to avoid this potentially risky use.
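A minimal sketch of the safer alternative, assuming the slice only needs to be read: bytes.NewReader never writes to or regrows the slice, so the caller can keep using it. readAll is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readAll consumes data through a read-only view; unlike bytes.NewBuffer,
// bytes.NewReader does not take ownership of the slice.
func readAll(data []byte) ([]byte, error) {
	return io.ReadAll(bytes.NewReader(data))
}

func main() {
	data := []byte("tar-split metadata")
	out, err := readAll(data)
	// data is still valid and unchanged after the call.
	fmt.Println(string(out), err, string(data))
}
```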
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Implement ListLayers() for the aufs, btrfs, and devicemapper drivers,
along with a unit test for them.
Stop filtering out directories with names that aren't 64-hex chars in
vfs and overlay ListLayers() implementations, which is more a convention
than a hard rule.
Have layerStore.Wipe() try to remove remaining listed layers after it
removes the layers that the layerStore knew of.
Close() a dangling ReadCloser in NaiveCreateFromTemplate.
Switch from using plain defer to using t.Cleanup() to handle deleting
layers that tests create, and have the addManyLayers() test function do
so as well.
Remove vfs.CopyDir, which near as I can tell isn't referenced anywhere.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
We previously started "pulling up" images when we changed their names,
and started denying the presence of images in read-only stores which
shared their ID with an image in the read-write store, so that it would
be possible to "remove" names from an image in read-only storage. We
forgot about the Flags field, so start pulling that up, too.
Do all of the above when we're asked to create an image, since denying
the presence of images with the same ID in read-only stores would
prevent us from finding the image by any of the names that it "had" just
a moment before we created the new record.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
When updateNames() copies an image's record from a read-only store into
the read-write store, copy the accompanying data as well.
Add fields for setting data items at creation-time to LayerOptions,
ImageOptions, and ContainerOptions to make this easier for us and our
consumers.
Replace the store-specific Create() (and the one CreateWithFlags() and
Put()) with private create() and put() methods, since they're not
intended for consumption outside of this package, and add Flags to the
options structures we pass into those methods. In create() methods,
make copies of those passed-in options structures before modifying any
of their contents.
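The "copy before modifying" step might look roughly like this. The options type and its fields are hypothetical simplifications, not the real LayerOptions, ImageOptions, or ContainerOptions; the point is only that maps and slices need their own copies so the caller's structure is left untouched.

```go
package main

import "fmt"

// options is a hypothetical, simplified stand-in for the options structures.
type options struct {
	Flags   map[string]interface{}
	BigData []string
}

// create copies the passed-in options, including the map and slice they
// reference, before modifying anything.
func create(opts *options) options {
	local := options{Flags: map[string]interface{}{}}
	if opts != nil {
		for k, v := range opts.Flags {
			local.Flags[k] = v
		}
		local.BigData = append([]string(nil), opts.BigData...)
	}
	local.Flags["incomplete"] = true // visible only in the private copy
	return local
}

func main() {
	caller := &options{Flags: map[string]interface{}{"origin": "ro-store"}}
	created := create(caller)
	fmt.Println(len(caller.Flags), len(created.Flags))
}
```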
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
In that case, we can just get read locks, confirm that nothing has changed,
and continue; no need for any serialization on exclusively holding
loadMut / inProcessLock.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Instead of basing this on exclusive loading via loadMut (which was incorrect,
because contrary to the original design, the r.layerspathModified
check in r.Modified() could trigger during the lifetime of a read lock)
use a very traditional read-write lock to protect the fields of imageStore.
Also explicitly document how concurrent access to fields of imageStore
is managed.
Note that for the btrfs and zfs graph drivers, Diff() can trigger
Mount() and unmount() in a way that violates the locking design.
That's not fixed in this PR.
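For readers unfamiliar with the pattern, a traditional read-write lock protecting a store's fields can be sketched like this. toyImageStore and its field are hypothetical; the real imageStore has more state and different methods.

```go
package main

import (
	"fmt"
	"sync"
)

// toyImageStore is a hypothetical, much reduced model of a store whose
// in-memory fields are protected by a read-write lock.
type toyImageStore struct {
	mu     sync.RWMutex
	byName map[string]string // image name -> image ID; protected by mu
}

// lookup only reads, so it takes the lock in shared mode.
func (s *toyImageStore) lookup(name string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	id, ok := s.byName[name]
	return id, ok
}

// setName modifies the map, so it takes the lock exclusively.
func (s *toyImageStore) setName(name, id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byName[name] = id
}

func main() {
	s := &toyImageStore{byName: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.setName(fmt.Sprintf("img%d", i), "id")
			s.lookup("img0")
		}(i)
	}
	wg.Wait()
	_, ok := s.lookup("img3")
	fmt.Println(ok)
}
```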
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This should be fixed, it just seems too hard to do without
breaking API (and performance).
So, just be clear about that to warn future readers.
It's tracked in https://github.com/containers/storage/issues/1379 .
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
We can't safely do that because the read-only callers don't allow us
to write to layerStore state.
Luckily, with the recent changes to Mounted, we don't really need to
reload in those places.
Also, fairly extensively document the locking design or implications
for users.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Instead of reading that value, releasing the mount lock,
and then unmounting, provide a "conditional" unmount mode.
And use that in the existing "loop unmounting" code.
That's at least safer against concurrent processes unmounting
the same layer. But the callers that try to "really unmount"
the layer in a loop are still possibly racing against other processes
trying to mount the layer in the meantime.
I'm not quite sure that we need the "conditional" parameter as an
explicit choice; it seems fairly likely that Umount() should just fail
with ErrLayerNotMounted for all !force callers. I chose to use the flag
to be conservative WRT possible unknown constraints.
Similarly, it's not very clear to me that the unmount loops need to exist;
maybe they should just be unmount(force=true, conditional=true).
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The lockfile we use properly handles the case that we Touch() it. In
other words, a later Modified() call will return false.
However, we're also looking at the mtime, which was failing. This
uses the new AtomicWriteFileWithOpts() feature to also record the
mtime of the file we write on updates.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This was using the graphDriver field without locks, and the graph driver itself,
while the implementation assumed exclusivity.
Luckily all callers are actually holding the layer store lock for writing, so
use that for exclusion. (layerStore already seems to extensively assume
that locking the layer store for writing guarantees exclusive access to the graph driver,
and because we always recreate a layer store after recreating the graph driver,
that is true in practice.)
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Only update recorded LastWrite values _after_ we successfully reload
container/image/layer stores; so, if the reload fails, the functions
will keep failing instead of using obsolete (and possibly partially loaded
and completely invalid) data.
Also, further improve mount change tracking, so that if layerStore.load()
loads it, we don't reload it afterwards.
This does not change the store.graphLock users; they will need to be cleaned up
more comprehensively, and separately, in the future.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
If the mounts file is changed, don't reload all of the layers
as well. This is more efficient, and it will allow us to better
track/update r.lastWrite and r.mountsLastWrite in the future.
Existing code calling reloadMountsIfChanged() indicates that this
must already be safe.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
They are a shared state across all users of the *LockFile in the process,
and therefore incorrect to use for any single consumer for change tracking.
Direct users to use the new GetLastWrite, ModifiedSince, and RecordWrite,
instead, and convert all c/storage users.
In addition to just being correct, the new API is also more efficient:
- We now initialize stores with GetLastWrite before loading state;
so, we don't trigger an immediate reload on the _second_ use of a store,
due to the problematic semantics of .Modified().
- Unlike Modified() and Touch(), the new API can be safely used without
locking *LockFile.stateMutex, for a tiny speedup.
The conversion is, for now, trivial, just immediately updating the lastWrite
value after the ModifiedSince call. That will get better.
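To show why per-consumer tokens work where shared state does not, here is a toy in-memory model. It borrows the method names GetLastWrite, ModifiedSince, and RecordWrite from the text, but the types, signatures, and counter-based token are assumptions made for this sketch and not the real *LockFile implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// lastWrite is a toy token identifying one write.
type lastWrite uint64

// toyLock models only the write record of a lock shared by many consumers.
type toyLock struct {
	mu      sync.Mutex
	counter lastWrite
}

func (l *toyLock) GetLastWrite() lastWrite {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.counter
}

func (l *toyLock) RecordWrite() lastWrite {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.counter++
	return l.counter
}

// ModifiedSince compares against the caller's own token, so one consumer's
// check does not hide a write from another consumer.
func (l *toyLock) ModifiedSince(prev lastWrite) (lastWrite, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.counter, l.counter != prev
}

// consumer keeps its own record of the last write it has seen.
type consumer struct {
	lock    *toyLock
	seen    lastWrite
	reloads int
}

// newConsumer initializes the token before loading state, so the first
// refresh does not trigger a spurious reload.
func newConsumer(lock *toyLock) *consumer {
	return &consumer{lock: lock, seen: lock.GetLastWrite()}
}

func (c *consumer) refresh() {
	current, modified := c.lock.ModifiedSince(c.seen)
	if modified {
		c.reloads++ // a real store would reload its state here
		c.seen = current
	}
}

func main() {
	lock := &toyLock{}
	a, b := newConsumer(lock), newConsumer(lock)
	a.refresh() // nothing written yet
	lock.RecordWrite()
	a.refresh()
	b.refresh() // b still notices the write although a checked first
	fmt.Println(a.reloads, b.reloads)
}
```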
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
- Don't exit after saving only one of the locations
- Only touch the lock once if saving both of the locations
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This looks in the container store for existing data dirs with ids not in
the container files and removes them. It also adds an (optional) driver
method to list available layers, then uses this and compares it to the
layers json file and removes layers that are not referenced.
Losing track of containers and layers can potentially happen in the
case of some kind of unclean shutdown, but mainly it happens at reboot
when using transient storage mode. Such users are recommended to run
a garbage collect at boot.
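The comparison step amounts to a set difference between what the driver lists and what the store knows about. A minimal sketch, with unreferencedLayers as a hypothetical helper and plain strings standing in for layer IDs:

```go
package main

import (
	"fmt"
	"sort"
)

// unreferencedLayers returns the IDs that exist on disk (as listed by the
// driver) but are not recorded in the layer store, in sorted order.
func unreferencedLayers(listed []string, known map[string]struct{}) []string {
	var stale []string
	for _, id := range listed {
		if _, ok := known[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	listed := []string{"c", "a", "b"}
	known := map[string]struct{}{"a": {}}
	// The returned IDs are the candidates for removal.
	fmt.Println(unreferencedLayers(listed, known))
}
```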
Signed-off-by: Alexander Larsson <alexl@redhat.com>
We store information about the layers in two files, layers.json and
volatile-layers.json. This allows us to treat the two stores
differently, for example saving the volatile json with the NoSync
option, which is faster (but less robust).
In normal mode we store only layers for the containers that are marked
volatile (`--rm`), as these are not expected to persist anyway. This way
information about such containers is faster to save, and in case
of an unclean shutdown we only risk loss of information about other
such volatile containers.
In transient mode all container layers (but not image layers) are
stored in the volatile json, and additionally it is stored in tmpfs.
This improves performance as well as automatically making sure no
container layers are persisted across boots.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
... instead of failing for duplicate names, and instead of
ignoring the "incomplete" state of layers.
Try up to 3 times if other writers are creating inconsistent
state in the meantime.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
- In read-write stores, fail even readers if there are incomplete
layers. Right now that would increase observed failures (OTOH
with better consistency), but we'll fix that again soon
- Decide whether to save read-write stores strictly based on
the need to clean up, instead of cleaning up opportunistically
(which is less predictable).
- Correctly return the right error, depending on whether there
are duplicate layers
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... as an error value instead of just a boolean. That
will allow extending the logic to more kinds of inconsistencies,
and consolidates the specifics of the inconsistency knowledge
into a single place.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... instead of combining control flow branches; we will change
behavior on some of them.
Should not change behavior.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>