Motivated by the need for a working `make lint` on Fedora 37 (beta).
Most changes come from the new `gofmt` standards.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This allows us to add a simple stub for FreeBSD which returns -1,
leading WaitForExit to fall back to the sleep loop approach.
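A minimal sketch of such a stub, assuming the helper is named getConmonPidFd (the real name in libpod may differ):

```go
//go:build freebsd

package libpod

// Stub: FreeBSD has no pidfd equivalent, so return -1 and let
// WaitForExit fall back to its polling sleep loop.
func (c *Container) getConmonPidFd() int {
	return -1
}
```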
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Make sure to wait for the container to exit after kill. While the
cleanup process will eventually take care of transitioning the state, we
need to guarantee to the user that the container is in the expected
state once the (kill) command has finished.
The issue could be observed in a flaking test (#16142) where
`podman rm -f -t0` failed because the preceding `podman kill`
left the container in the "running" state, which ultimately confused
the "stop" backend.
Note that we should only wait for the container to exit when SIGKILL is
being used. Other signals have different semantics.
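A rough sketch of the intended control flow; the names below are illustrative, not the exact libpod code:

```go
// Illustrative sketch only.
func (c *Container) kill(ctx context.Context, signal uint) error {
	if err := c.ociRuntime.KillContainer(c, signal, false); err != nil {
		return err
	}
	// Only SIGKILL guarantees termination; other signals have
	// different semantics, so do not wait for them.
	if signal == uint(unix.SIGKILL) {
		_, err := c.WaitForExit(ctx, 250*time.Millisecond)
		return err
	}
	return nil
}
```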
[NO NEW TESTS NEEDED] as I do not know how to reliably reproduce the
issue. If #16142 stops flaking, we are good.
Fixes: #16142
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Improve the error message reported in #16142, where the container is
reported to be in the wrong state but we do not know which one. This is not
a fix for #16142 but will hopefully aid in better understanding what's
going on if it flakes again.
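For illustration, the improved error could look something like this (exact wording and placement are assumptions):

```go
// Sketch: report the observed state, not just "wrong state".
if c.state.State != define.ContainerStateRunning {
	return fmt.Errorf("container %s is in state %s, expected %s: %w",
		c.ID(), c.state.State, define.ContainerStateRunning,
		define.ErrCtrStateInvalid)
}
```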
[NO NEW TESTS NEEDED] as hitting the condition is inherently racy.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This gets c.config.Spec.Linux.Resources, with some nil checks.
Using this means less open-coding of the nil checks. Moreover, the
existing user of this field in moveConmonToCgroupAndSignal() was
using ctr.Spec().Linux.Resources instead, and the Spec() call
is very expensive.
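A minimal sketch of such a helper (name and placement are assumptions; `spec` is the OCI runtime-spec package):

```go
// LinuxResources returns the container's resource limits from the
// cached config, or nil if the spec does not set them.
func (c *Container) LinuxResources() *spec.LinuxResources {
	if c.config.Spec != nil && c.config.Spec.Linux != nil {
		return c.config.Spec.Linux.Resources
	}
	return nil
}
```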
[NO NEW TESTS NEEDED] Just minor performance effects
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This just gets ctr.config.Spec.Process.Terminal with some nil checks,
allowing several places that open-coded this to use the helper.
In particular, this helps the code in
pkg/domain/infra/abi/terminal.StartAttachCtr(), which used to do:
`ctr.Spec().Process.Terminal`, which looks fine, but actually causes
a deep json copy in the `ctr.Spec()` call that takes over 3 msec.
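A minimal sketch of the helper, assuming it is named Terminal():

```go
// Terminal reports whether the container was created with a terminal,
// using the cached config instead of the expensive deep copy in Spec().
func (c *Container) Terminal() bool {
	if c.config.Spec != nil && c.config.Spec.Process != nil {
		return c.config.Spec.Process.Terminal
	}
	return false
}
```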
[NO NEW TESTS NEEDED] Just minor performance effects
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Fix the "stop" on-failure action by not removing the transient systemd
timer and service during container stop. Removing the service will
in turn cause systemd to terminate the Podman process attempting to
stop the container and hence leave it in the "stopping" state.
Instead move the removal into the restart sequence.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When you run "podman run foo" we attach to the container, which essentially
blocks until the container process exits. When that happens podman immediately
calls Container.WaitForExit(), but at this point the exit value has not
yet been written to the db by conmon. This means that we almost always
hit the "check for exit state; sleep 250msec" loop in WaitForExit(),
delaying the exit of podman run by 250 msec.
More recent kernels (>= 5.3) support the pidfd_open() syscall, which
lets you open an fd representing a pid and then poll on it to wait
until the process exits. We can use this to have the first sleep
be exactly as long as is needed for conmon to exit (if we know its pid).
If for whatever reason there are still issues, we fall back to the old
sleep loop on later iterations.
This makes "time podman run fedora true" about 200msec faster.
[NO NEW TESTS NEEDED]
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Make sure that the on-failure actions only kick in once the health check
has passed its retries. Also fix race conditions on reading/writing the
log.
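A sketch of the intended gating, with illustrative field names:

```go
// Sketch: only fire the on-failure action once the failing streak
// has reached the configured retry count.
if hcResult == define.HealthCheckFailure {
	failingStreak++
	if failingStreak >= c.config.HealthCheckConfig.Retries {
		// run the configured on-failure action
		// (none, kill, restart, stop)
	}
}
```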
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When shutting down the image engine we always wait for the image
event goroutine to finish writing any outstanding events. However,
the loop for that always waits 100msec every iteration. This means
that (depending on the phase) shutdown is always delayed up to 100msec.
This delays "podman run" particularly badly, because podman is run twice
(once for the run and once for cleanup via a conmon callback).
Changing the image loop to exit immediately when a libimageEventsShutdown
event is received (but first checking for any outstanding events to write)
improves podman run times by about 100msec on average.
Note: We can't just block on the event loop reading the shutdown event
anymore; we need to wait until it has read and processed any outstanding
events, so we now send the shutdown event and then block waiting for the
channel to be closed by the event loop.
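A self-contained sketch of the handshake: the sender pushes a shutdown event and then blocks on a channel the loop closes once everything before it has been written (names are illustrative):

```go
package main

import "fmt"

type event struct {
	name     string
	shutdown bool
}

// eventLoop writes events until it reads the shutdown event, then
// closes done so the sender knows all prior events were processed.
func eventLoop(events <-chan event, done chan<- struct{}) {
	for ev := range events {
		if ev.shutdown {
			close(done)
			return
		}
		fmt.Println("writing event:", ev.name)
	}
}

func main() {
	events := make(chan event)
	done := make(chan struct{})
	go eventLoop(events, done)

	events <- event{name: "image pull"}
	// Because the channel is FIFO, every earlier event has already
	// been received once this send completes; no fixed sleep needed.
	events <- event{shutdown: true}
	<-done
}
```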
[NO NEW TESTS NEEDED]
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This moves the cgroup code to pod_internal_linux.go and adds a no-op
stub for FreeBSD.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
We should not keep the netns if there was a cleanup problem. Deleting
the netns will also delete the virtual links inside and thus make the IPs
available again for the next use.
context: https://github.com/containers/netavark/issues/302
[NO NEW TESTS NEEDED] This is very hard to trigger reliably, and it would
need to work with both CNI and netavark. It mostly happens because of
specific bugs, but those will be fixed, and then this test would fail.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Include the digest of the image in `podman container inspect`. The image
digest is a key piece of information for auditing as it defines the
identity of an image. This way, it can be determined whether a container
used an image with a given CVE etc.
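With this change, something along the lines of `podman container inspect --format '{{.ImageDigest}}' <container>` (assuming the new field is exposed as `ImageDigest`) can map a container back to the exact image it was created from.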
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
There is an existing wrapper for unix.Unmount(..., MNT_DETACH) in
util_linux.go, but it filters all errors; for volumes, we only want
to filter EINVAL. The existing libpod.Unmount seems to only have one
call site so perhaps these can be merged.
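A sketch of the volume-side variant (the function name `detachUnmount` is hypothetical):

```go
package libpod

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// detachUnmount lazily detaches the mount but, unlike the existing
// wrapper, only ignores EINVAL ("not mounted") instead of all errors.
func detachUnmount(mountPoint string) error {
	err := unix.Unmount(mountPoint, unix.MNT_DETACH)
	if err != nil && err != unix.EINVAL {
		return fmt.Errorf("unmounting volume at %s: %w", mountPoint, err)
	}
	return nil
}
```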
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Auto-updates using the "registry" policy require containers to be created
with a fully-qualified image reference. Short names are not supported
due to the ambiguity of their source registry. Initially, container
creation errored out for non-fully-qualified images, but it seems that
Podman has regressed.
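For example, `podman create --label io.containers.autoupdate=registry quay.io/org/image:tag` should succeed, while the same command with a short name such as `image:tag` should once again error out at creation time.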
Fixes: #15879
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This allows tools like Cockpit to know that the pod in question
has also been updated, so they can refresh the list of containers
in the pod.
Fixes #15408
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We added the concept of image volumes in 2.2.0, to support
inspecting an image from within a container. However, this is a
strictly read-only mount, with no modification allowed.
By contrast, the new `image` volume driver creates a c/storage
container as its underlying storage, so we have a read/write
layer. This, in and of itself, is not especially interesting, but
what it will enable in the future is. If we add a new command to
allow these image volumes to be committed, we can now distribute
volumes - and changes to them - via a standard OCI image registry
(which is rather new and quite exciting).
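For instance, such a volume can be created with something like `podman volume create --driver image --opt image=quay.io/org/image:tag myvol` (the exact option name is an assumption).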
Future work in this area:
- Add support for `podman volume push` (commit volume changes and
push resulting image to OCI registry).
- Add support for `podman volume pull` (currently, we require
that the image a volume is created from be already pulled; it
would be simpler if we had a dedicated command that did the
pull and made a volume from it)
- Add support for scratch images (make an empty image on demand
to use as the base of the volume)
- Add UOR support to `podman volume push` and
`podman volume pull` to enable both with non-image volume
drivers
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
According to https://systemd.io/CONTAINER_INTERFACE/, systemd will try to
take control of /dev/ttyN if exported, which can cause conflicts with the
host's tty in privileged containers. Thus we will not expose these to
privileged containers in systemd mode, as this is a bad idea according to
systemd's maintainers.
Additionally, this commit adds a bats regression test to check that no
/dev/ttyN devices are present in a privileged container in systemd mode.
This fixes https://github.com/containers/podman/issues/15878
Signed-off-by: Dan Čermák <dcermak@suse.com>
Package `io/ioutil` was deprecated in Go 1.16, preventing podman from
building under Fedora 37. Fortunately, functionally identical
replacements are provided by the packages `io` and `os`. Replace all
usage of all `io/ioutil` symbols with appropriate substitutions
according to the golang docs.
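The mechanical substitutions are the ones documented in the standard library, for example:

```go
// ioutil.ReadAll   -> io.ReadAll
// ioutil.ReadFile  -> os.ReadFile
// ioutil.WriteFile -> os.WriteFile
// ioutil.ReadDir   -> os.ReadDir (note: returns []fs.DirEntry,
//                     not []fs.FileInfo)
// ioutil.TempDir   -> os.MkdirTemp
// ioutil.TempFile  -> os.CreateTemp
// ioutil.NopCloser -> io.NopCloser
// ioutil.Discard   -> io.Discard
package main

import (
	"fmt"
	"os"
)

func main() {
	data, err := os.ReadFile("/etc/hostname") // was ioutil.ReadFile
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(string(data))
}
```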
Signed-off-by: Chris Evich <cevich@redhat.com>
This also moves the logic for resolving paths in running and stopped
containers to container_copy_linux.go.
On FreeBSD, we can execute the function argument to joinMountAndExec
directly using host-relative paths since the host mount namespace
includes all the container mounts.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
The logic that treats running containers differently from stopped
containers is not needed on FreeBSD where the container mounts live in
a global mount namespace.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
There is no option in SELinux labeling to relabel only the top level of
a directory. The option is to either label the path shared or not
shared. Change the code to make sure future engineers do not assume that
a recurse option can work.
[NO NEW TESTS NEEDED]
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This makes setting EffectiveCaps and BoundingCaps conditional on whether
the capabilities field in the spec is non-nil. This allows 'podman inspect'
to work on FreeBSD.
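A sketch of the nil guard; field names follow the OCI runtime spec, the surrounding variables are illustrative:

```go
// Sketch of the guard; ctrSpec is the container's OCI runtime spec.
if caps := ctrSpec.Process.Capabilities; caps != nil {
	data.EffectiveCaps = caps.Effective
	data.BoundingCaps = caps.Bounding
}
// On FreeBSD the Capabilities field is nil, so inspect now simply
// leaves these fields empty instead of dereferencing a nil pointer.
```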
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Originally, during pod removal, we locked every container in the
pod at once, did a number of validity checks to ensure everything
was safe, and then removed all the containers in the pod.
A deadlock was recently discovered with this approach. In brief,
we cannot lock the entire pod (or much more than a single
container at a time) without causing a deadlock. As such, we
converted to an approach where we just looped over each container
in the pod, removing them individually. Unfortunately, this
removed a lot of the validity checking of the earlier approach,
allowing for a lot of unintended bad things. Infra containers
could be removed while containers in the pod still depended on
them, for example.
There's no easy way to do validity checks while in a simple loop,
so I implemented a version of our graph-traversal logic that
currently handles pod start. This version acts in the reverse
order of startup: startup starts from containers which depend on
nothing and moves outwards, while removal acts on containers which
have nothing depending on them and moves inwards. By doing graph
traversal, we can guarantee that nothing is removed while
something that depends on it still exists - so the infra
container should be the last thing in a pod that is removed, for
example.
In the (unlikely) case that a graph of the pod's containers
cannot be built (most likely impossible without database editing),
the old method of pod removal has been retained to ensure that
even misbehaving pods can be forcibly evicted from the state.
I'm fairly confident that this resolves the problem, but there
are a lot of assumptions around dependency structure built into
the original pod removal code and I am not 100% sure I have
captured all of them.
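A self-contained sketch of the reverse-order traversal (a Kahn-style topological sort; names are illustrative): a container is only removed once nothing depending on it remains, so the infra container goes last.

```go
package main

import "fmt"

// removalOrder returns containers ordered so that each one is removed
// before anything it depends on; deps maps a container to the
// containers it depends on.
func removalOrder(deps map[string][]string) []string {
	// dependents[x] = number of containers that depend on x.
	dependents := map[string]int{}
	for ctr := range deps {
		dependents[ctr] = 0
	}
	for _, ds := range deps {
		for _, d := range ds {
			dependents[d]++
		}
	}
	var order, queue []string
	for ctr, n := range dependents {
		if n == 0 { // nothing depends on it: safe to remove first
			queue = append(queue, ctr)
		}
	}
	for len(queue) > 0 {
		ctr := queue[0]
		queue = queue[1:]
		order = append(order, ctr)
		for _, d := range deps[ctr] {
			if dependents[d]--; dependents[d] == 0 {
				queue = append(queue, d)
			}
		}
	}
	return order
}

func main() {
	// ctr -> containers it depends on; both depend on infra.
	deps := map[string][]string{
		"infra": {},
		"web":   {"infra"},
		"db":    {"infra"},
	}
	fmt.Println(removalOrder(deps)) // e.g. [web db infra]
}
```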
Fixes #15526
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The process of saving the OCI spec is not particularly
reboot-safe. Normally, this doesn't matter, because we recreate
the spec every time a container starts, but if one was to reboot
(or SIGKILL, or otherwise fatally interrupt) Podman in the middle
of writing the spec to disk, we can end up with a malformed spec
that sticks around until the container is next started. Some
Podman commands want to read the latest version of the spec off
disk (to get information only populated after a container is
started), and will break in the case that a partially populated
spec is present. Swap to just ignoring these errors (with a
logged warning, to let folks know something went wrong) so we
don't break important commands like `podman inspect` in these
cases.
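A sketch of the new behavior; the helper and the spec path field are illustrative:

```go
// Illustrative sketch; helper and path names are assumptions.
func (c *Container) specFromDisk() *spec.Spec {
	s := new(spec.Spec)
	data, err := os.ReadFile(c.specPath)
	if err == nil {
		err = json.Unmarshal(data, s)
	}
	if err != nil {
		// A reboot or SIGKILL mid-write can leave a truncated spec
		// behind; warn instead of breaking `podman inspect`.
		logrus.Warnf("Error retrieving spec for container %s: %v", c.ID(), err)
		return nil
	}
	return s
}
```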
[NO NEW TESTS NEEDED] The provided reproducer involves repeatedly
rebooting the system.
Signed-off-by: Matthew Heon <mheon@redhat.com>