Commit Graph

4204 Commits

Author SHA1 Message Date
Matt Heon 458ba5a8af Fix `podman stop` and `podman run --rmi`
This started off as an attempt to make `podman stop` on a
container started with `--rm` actually remove the container,
instead of just cleaning it up and waiting for the cleanup
process to finish the removal.

In the process, I realized that `podman run --rmi` was rather
broken. It was only done as part of the Podman CLI, not the
cleanup process (meaning it only worked with attached containers)
and the way it was wired meant that I was fairly confident that
it wouldn't work if I did a `podman stop` on an attached
container run with `--rmi`. I rewired it to use the same
mechanism that `podman run --rm` uses, so it should be a lot more
durable now, and I also wired it into `podman inspect` so you can
tell that a container will remove its image.

Tests have been added for the changes to `podman run --rmi`. No
tests for `stop` on a `run --rm` container as that would be racy.

Fixes #22852
Fixes RHEL-39513

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-08-20 09:51:18 -04:00
Paul Holzinger 6c0d94328f
golangci-lint: remove most skip dirs
Now that we have proper !remote tags set everywhere we can just rely on
that and do not need to skip any dirs.
Also on linux do not lint three times, one remote run is enough.
We still have to skip the test dir for windows/macos though or we need
to add linux build tags there everywhere as well. This seems simpler.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:28 +02:00
Paul Holzinger c17daf2b09
update golangci-lint to 1.60.1
Fixes newly spotted issues around printf() formats and using os.Setenv()
in tests.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:28 +02:00
openshift-merge-bot[bot] 98d52b6131
Merge pull request #23646 from Luap99/wait-remove
podman wait: allow waiting for removal of containers
2024-08-16 15:14:34 +00:00
Paul Holzinger 80639df27a
podman wait: allow waiting for removal of containers
By default wait only waits for the exit of a container, there is really
no way to make it wait for the removal too when the container was
created with --rm. I thought I found a clever way in 8a943311db but this
is not working race free. While it works most of the time any other
parallel process might call syncContainer() before the cleanup process
holds the lock until it removes it. As such the wait hack to only update
the state and not sync the exit file did not work so we can drop that.

However the test wants to wait for the removal to happen by the cleanup
process and we can already say --condition=removing to do this but this
will throw an error if the ctr was removed instead of counting this as
success so fix that as well.

Fixes #23640

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-16 15:44:02 +02:00
Paul Holzinger ddece758a4
libpod: remove UpdateContainerStatus()
There are two major problems with UpdateContainerStatus()
First, it can deadlock when the state json is too big as it tries to
read stderr until EOF but it will never hit EOF as long as the runtime
process is alive. This means if the runtime json is too big to fit into
the pipe buffer we deadlock ourselves.
Second, the function modifies the container state struct and even adds
an exit code to the db however when it is called from the stop() code
path we will be unlocked here.

While the first problem is easy to fix the second one not so much. And
when we cannot update the state there is no point in reading from the
runtime in the first place, as such remove the function as it does more
harm than good.

And add some warnings to the functions that might be called unlocked.

Fixes #22246

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-16 15:34:16 +02:00
openshift-merge-bot[bot] 734c4b98d4
Merge pull request #23519 from Luap99/netns-cleanup
update c/common to add some netns cleanup fixes
2024-08-15 12:39:22 +00:00
Paul Holzinger 94fd5fe6f7
libpod: remove duplicated HasVolume() check
removeVolume() already does the same check so we do not need it twice.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
Paul Holzinger a65aecd260
podman volume rm --force: fix ABBA deadlock
We cannot take the volume lock first and then the container locks. Other
code paths always have to first lock the container and then lock the
volumes, i.e. to mount/umount them. As such locking the volume first can
always result in ABBA deadlocks.
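
The lock-ordering rule described above can be illustrated with a minimal sketch. The `ctr` and `volume` types and both functions are hypothetical stand-ins (libpod's real locks are not plain `sync.Mutex` values); the point is only that the forced removal path never holds the volume lock while acquiring a container lock.

```go
package main

import (
	"fmt"
	"sync"
)

// ctr and volume are hypothetical stand-ins; libpod's real locks are not
// plain sync.Mutex values.
type ctr struct {
	mu      sync.Mutex
	removed bool
}

type volume struct {
	mu      sync.Mutex
	mounts  int
	removed bool
}

// mountVolume follows the common order: container lock, then volume lock.
func mountVolume(c *ctr, v *volume) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v.mu.Lock()
	defer v.mu.Unlock()
	v.mounts++
}

// forceRemoveVolume never holds the volume lock while taking a container
// lock, so it cannot form an ABBA cycle with mountVolume.
func forceRemoveVolume(c *ctr, v *volume) {
	// Step 1: handle the container while holding only its lock.
	c.mu.Lock()
	c.removed = true
	c.mu.Unlock()

	// Step 2: only now take the volume lock.
	v.mu.Lock()
	v.removed = true
	v.mu.Unlock()
}

// runConcurrently exercises both paths in parallel and returns once all
// goroutines have finished.
func runConcurrently(n int) *volume {
	c, v := &ctr{}, &volume{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); mountVolume(c, v) }()
		go func() { defer wg.Done(); forceRemoveVolume(c, v) }()
	}
	wg.Wait()
	return v
}

func main() {
	v := runConcurrently(100)
	fmt.Println(v.removed, v.mounts)
}
```

If `forceRemoveVolume` instead took `v.mu` first and then `c.mu`, two goroutines could each hold one lock and wait forever for the other.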

To fix this move the lock down after the container removal. The removal
code is racy regardless of the lock as the volume lock on create is no
longer taken since commit 3cc9db8626 due to another deadlock there.

Fixes #23613

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
Paul Holzinger 30eb6b6aae
libpod: do not stop pod on init ctr exit
Init containers are meant to exit early before other containers are
started. Thus stopping the infra container in such case is wrong.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
Paul Holzinger 8a943311db
libpod: simplify WaitForExit()
The current code did several complicated state checks that simply do not
work properly on a fast restarting container. It uses a special case for
--restart=always but forgot to take care of --restart=on-failure which
always hangs for 20s until it runs into the timeout.

The old logic also used to call CheckConmonRunning() but synced the
state before which means it may check a new conmon every time and thus
misses exits.

To fix this, the new code is much simpler: check the conmon pid, and if
it is no longer running, check the exit file and get the exit code.

This is related to #23473 but I am not sure if this fixes it because we
cannot reproduce.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
Daniel J Walsh 25d66d97d2
Additional potential race condition on os.Readdir
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-08-12 11:38:02 -04:00
Paul Holzinger ecf88f17b6
libpod: reset state error on init
If we manage to init/start a container successfully we should unset any
previously stored state errors. Otherwise a user might be confused why
there is an error in the state about some old error even though the
container works/runs.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-12 14:30:48 +02:00
Paul Holzinger 4aaa5cb6f0
stopIfOnlyInfraRemains: log all errors
Log all stopping errors for each container so we actually see the real
cause.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-12 12:11:26 +02:00
Paul Holzinger 78cb1e28cb
libpod: do not save expected stop errors in ctr state
If we try to stop a container that is not running or paused we get an
ErrCtrStateInvalid or ErrCtrStopped error. As podman stop is idempotent
this is not a user visible error at all so we should also never log it
in the container state.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-12 12:09:01 +02:00
Paul Holzinger f276d53532
libpod: fix broken saveContainerError()
We cannot unlock then lock again without syncing the state as this will
then save a potentially old state causing very bad things, such as
double netns cleanup issues.

The fix here is simple: move the saveContainerError() under the same
lock. The comment about the re-lock is just wrong. Not doing this under
the same lock would cause us to update the error after something else
changed the container already.

Most likely this was caused by a misunderstanding of how Go defers work.
Given they run Last In - First Out (LIFO) it is safe as long as our
defer function is after the defer unlock() call.
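
The LIFO ordering of deferred calls can be shown with a minimal sketch. The `container` type, its fields, and the `stop()` method are hypothetical and only record the order in which the deferred functions run; this is not the libpod code.

```go
package main

import (
	"fmt"
	"sync"
)

// container is a hypothetical stand-in for a libpod container.
type container struct {
	mu       sync.Mutex
	stateErr string
	events   []string
}

// stop is a hypothetical operation that always fails, to show when the
// error gets saved relative to the unlock.
func (c *container) stop() (retErr error) {
	c.mu.Lock()
	// Registered first, so it runs last.
	defer func() {
		c.events = append(c.events, "unlock")
		c.mu.Unlock()
	}()
	// Registered second, so it runs first, while the lock is still held.
	defer func() {
		if retErr != nil {
			c.stateErr = retErr.Error()
			c.events = append(c.events, "save error")
		}
	}()
	return fmt.Errorf("stop failed")
}

func main() {
	c := &container{}
	_ = c.stop()
	fmt.Println(c.events) // prints [save error unlock]
}
```

Because the error-saving defer is registered after the unlock defer, it runs before the lock is released, so no other process can change the state in between.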

I think this issue is very bad and might have caused a variety of other
weird flakes. In fact I am confident that this fixes the double cleanup
errors.

Fixes #21569
Also fixes the netns removal ENOENT issues seen in #19721.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-12 11:19:47 +02:00
Paul Holzinger f2a03e5753
libpod: cleanupNetwork() return error
Return the error not just log as the caller can then decide to log this
and exit > 0. I also removed the c.valid check as I do not see what the
purpose of this would be. c.valid is only false when the ctr was removed
but then we should never get there as Cleanup() will not work on a
container in removing state.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-09 10:57:24 +02:00
Matt Heon eb7ce80cf9 Create volume path before state initialization
Strictly speaking we don't need the path yet, but it existing
prevents a lot of strangeness in our path-checking logic to
validate the current Podman configuration, as it was the only
path that might not exist this early in init.

Fixes #23515

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-08-06 13:42:09 -04:00
openshift-merge-bot[bot] 69862b7251
Merge pull request #23460 from Luap99/cleanup-term
libpod: inhibit SIGTERM during cleanup()
2024-08-06 11:38:27 +00:00
Giuseppe Scrivano 3ae1568933
libpod: fix volume copyup with idmap
if idmap is specified for a volume, reverse the mappings when copying
up from the container, so that the original permissions are maintained.

Closes: https://github.com/containers/podman/issues/23467

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-08-01 22:49:27 +02:00
Giuseppe Scrivano 61def05cd9
libpod: avoid hang on errors
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-08-01 22:49:27 +02:00
openshift-merge-bot[bot] 803ef5c16f
Merge pull request #23384 from edsantiago/root-namespace
CI: enable root user namespaces
2024-08-01 10:32:16 +00:00
Paul Holzinger 7610cedc80
libpod: inhibit SIGTERM during cleanup()
The network cleanup cannot handle it when it is killed half way through
as it spits out a bunch of errors in that case on the next cleanup attempt.
Try to avoid getting into such a state and ignore sigterm during this
section.
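
As a rough illustration of keeping SIGTERM from interrupting a critical section, the sketch below uses only the standard `os/signal` package. It is an assumption-laden stand-in, not Podman's mechanism: `withSigtermHeld` is a hypothetical helper, and instead of ignoring the signal outright it holds it back and reports afterwards whether one arrived.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// withSigtermHeld (hypothetical) runs critical while SIGTERM is routed to a
// channel instead of terminating the process. It reports whether a SIGTERM
// arrived in the meantime so the caller can act on it afterwards.
func withSigtermHeld(critical func()) bool {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGTERM)
	// Stop relaying to ch once the critical section is done.
	defer signal.Stop(ch)

	critical()

	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	cleaned := false
	pending := withSigtermHeld(func() { cleaned = true })
	fmt.Println(cleaned, pending)
}
```

As the commit message notes, nothing of this kind helps against SIGKILL, which cannot be caught.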

Of course we still can get SIGKILL so we should work on fixing the
underlying problems in network cleanup but let's see if this helps us
with the CI flakes in the meantime.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-31 19:00:16 +02:00
Paul Holzinger 4c3531a1a4
fix network cleanup flake in play kube
When using service containers and play kube we create a complicated set
of dependencies.

First in a pod all conmon/container cgroups are part of one slice, that
slice will be removed when the entire pod is stopped resulting in
systemd killing all processes that were part of it.

Now the issue here is around the working of stopPodIfNeeded() and
stopIfOnlyInfraRemains(), once a container is cleaned up it will check
if the pod should be stopped depending on the pod ExitPolicy. If this is
the case it will stop all containers in that pod. However in our flaky
test we called podman pod kill which logically killed all containers
already. Thus the logic now thinks on cleanup it must stop the pod and
calls into pod.stopWithTimeout(). Then there we try to stop but because
all containers are already stopped it just throws errors and never gets
to the point where it would call Cleanup(). So the code does not do
cleanup and eventually calls removePodCgroup() which will cause all
conmon and other podman cleanup processes of this pod to be killed.

Thus the podman container cleanup process was likely killed while
actually trying to do the proper cleanup which leaves us in a bad
state.

Following commands such as podman pod rm will try to do the cleanup again
as they see it was not completed but then fail as they are unable to
recover from the partial cleanup state.

Long term network cleanup needs to be more robust and ideally should be
idempotent to handle cases where cleanup was killed in the middle.

Fixes #21569

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-31 16:59:43 +02:00
Paul Holzinger 77081df8cd
libpod: bind ports before network setup
We bind ports to ensure there are no conflicts and we leak them into
conmon to keep them open. However we bound the ports after the network
was set up so it was possible for a second network setup to overwrite
the firewall configs of a previous container as it failed only later
when binding the port. As such we must ensure we bind before the network
is set up.
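
The ordering argument above can be sketched with the standard `net` package. `startContainer` and the `setupNetwork` callback are hypothetical names for illustration; the sketch only shows that binding first makes a port conflict fail before any network setup runs.

```go
package main

import (
	"fmt"
	"net"
)

// startContainer (hypothetical) binds the port first and only then runs the
// network setup, so a conflict is detected before setupNetwork can touch
// any shared configuration.
func startContainer(addr string, setupNetwork func() error) (net.Listener, error) {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("binding %s: %w", addr, err)
	}
	if err := setupNetwork(); err != nil {
		l.Close()
		return nil, err
	}
	return l, nil
}

func main() {
	setups := 0
	setup := func() error { setups++; return nil }

	// Port 0 asks the kernel for any free port.
	first, err := startContainer("127.0.0.1:0", setup)
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	defer first.Close()

	// The second start targets the same port and must fail before setup runs.
	_, err = startContainer(first.Addr().String(), setup)
	fmt.Println(err != nil, setups)
}
```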

This is not so simple because we still have to take care of
PostConfigureNetNS bool in which case the network set up happens after
we launch conmon. Thus we end up with two different conditions.

Also it is possible that we "leak" the ports that are set on the
container until the garbage collector will close them. This is not
perfect but the alternative is adding special error handling on each
function exit after prepare until we start conmon which is a lot of work
to do correctly.

Fixes https://issues.redhat.com/browse/RHEL-50746

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-30 14:39:08 +02:00
Giuseppe Scrivano b59918e536
libpod: force rootfs for OCI path with idmap
when a --rootfs is specified with idmap, always use the specified
rootfs since we need a new mount on top of the original directory.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-07-27 19:25:10 +02:00
Giuseppe Scrivano 61f0230c31
kube: record infra user namespace
if there is an annotation that specifies the user namespace for the
infra container, then make sure it is used for the entire pod.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-07-24 12:10:48 +02:00
Giuseppe Scrivano e97bb79b7a
kube: invert branches
it increases readability as it doesn't need the negation, and the
first branch is shorter.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-07-24 12:10:47 +02:00
Daniel J Walsh 7768cf235e
Run codespell on source
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-07-23 07:28:23 -04:00
openshift-merge-bot[bot] 249d042035
Merge pull request #23343 from Luap99/fix-hc-output
libpod: correctly capture healthcheck output
2024-07-22 12:18:34 +00:00
Paul Holzinger b6b61a6a49
libpod: add hidden env to set sqlite timeout
Some users want to experiment with different timeout values.

Fixes #23236

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-22 12:59:00 +02:00
Paul Holzinger 55b6e4c3e8
podman pod stats: fix race when ctr process exits
Like commit 55749af0c7 but for podman *pod* stats not the normal podman
stats. We must ignore ErrCtrStopped here as well as this will happen
when the container process exited.

While at it remove a useless argument from the function as it was always
nil and restructure the logic flow to make it easier to read.

Fixes #23334

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-22 10:30:42 +02:00
Paul Holzinger 5e8884ab0d
libpod: correctly capture healthcheck output
Using the scanner is just unnecessarily complicated and buggy as it will
not read the final line with a newline. There is also the problem that
it happens in a separate goroutine so it could lose output if we read
the array before the scanner was done.

The API accepts a Writer so we can just directly use a bytes.Buffer
which captures all output in memory without the need of another
goroutine.

This also means that now we always include the final newline in the
output. I checked with docker and they do the same so this is good.
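
A minimal sketch of capturing output through a `bytes.Buffer` follows. `runCheck` is a hypothetical stand-in for whatever call streams the healthcheck output into an `io.Writer`; it is not the libpod API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// runCheck is a hypothetical stand-in for a call that streams command
// output into the given writer.
func runCheck(w io.Writer) error {
	_, err := io.WriteString(w, "first line\nlast line\n")
	return err
}

// captureOutput collects everything written by runCheck in memory. No extra
// goroutine is involved, so nothing can be read before it was written.
func captureOutput() (string, error) {
	var buf bytes.Buffer
	if err := runCheck(&buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := captureOutput()
	fmt.Printf("%q %v\n", out, err)
}
```

The buffer keeps the bytes exactly as written, which is why the final newline is now part of the captured output.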

Fixes #23332

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-19 15:16:55 +02:00
openshift-merge-bot[bot] 88c68a4b58
Merge pull request #23271 from giuseppe/drop-unmount-for-overlay-storage
test: podman system service doesn't leak mount on termination
2024-07-15 12:20:11 +00:00
Giuseppe Scrivano fbc4768a00
libpod: shutdown Stop waits for handlers completion
wait for handlers currently being processed.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-07-15 11:41:28 +02:00
Giuseppe Scrivano 6832a35f65
libpod: cleanup store at shutdown
shutdown the containers store so that the home directory mount is not
leaked when "podman system service" exits.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-07-15 11:41:28 +02:00
Paul Holzinger 3280da0500
fix race conditions in start/attach logic
The current code did something like this:
lock()
getState()
unlock()

if state != running
  lock()
  getState() == running -> error
  unlock()

This of course is wrong because between the first unlock() and second
lock() call another process could have modified the state. This meant
that sometimes you would get a weird error on start because the internal
setup errored as the container was already running.

In general any state check without holding the lock is incorrect and
will result in race conditions. As such refactor the code to combine
both StartAndAttach and Attach() into one function that can handle both.
With that we can move the running check into the locked code.

Also use a typed error for this specific error case so the callers can
check and ignore the specific error when needed. This also allows us to
fix races in the compat API that did a similar racy state check.
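
The pattern of checking the state under the same lock that guards the action, combined with a typed error, can be sketched as follows. The `container` type, `errAlreadyRunning`, and `isAlreadyRunning` are hypothetical names for illustration, not the identifiers used in libpod.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAlreadyRunning is a hypothetical sentinel error for this sketch.
var errAlreadyRunning = errors.New("container already running")

// container is a hypothetical stand-in for a libpod container.
type container struct {
	mu      sync.Mutex
	running bool
}

// start checks the state and acts on it while holding the lock, so no other
// caller can change the state between the check and the update.
func (c *container) start() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.running {
		return fmt.Errorf("starting container: %w", errAlreadyRunning)
	}
	c.running = true
	return nil
}

// isAlreadyRunning lets callers detect the specific case and ignore it.
func isAlreadyRunning(err error) bool {
	return errors.Is(err, errAlreadyRunning)
}

func main() {
	c := &container{}
	fmt.Println(c.start())                   // first start succeeds
	fmt.Println(isAlreadyRunning(c.start())) // second start reports the typed error
}
```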

This commit slightly changes how we output the result, previously a
start on an already running container would never print the id/name of the
container which is confusing and sort of breaks idempotence. Now it will
include the output except when --all is used. Then it only reports the
ids that were actually started.

Fixes #23246

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-12 15:11:34 +02:00
openshift-merge-bot[bot] 04bd415c74
Merge pull request #23167 from mheon/fix_rhel_37948
Ignore result of EvalSymlinks on ENOENT
2024-07-11 20:13:02 +00:00
Matt Heon 830e550073 Ignore result of EvalSymlinks on ENOENT
When the path does not exist, filepath.EvalSymlinks returns an
empty string - so we can't just ignore ENOENT, we have to discard
the result if an ENOENT is returned.
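
A minimal sketch of that handling, assuming a hypothetical helper named `resolvePath`: on a not-exist error the returned string from `filepath.EvalSymlinks` is discarded and the original path is kept.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path/filepath"
)

// resolvePath (hypothetical) resolves symlinks but falls back to the
// original path when it does not exist yet.
func resolvePath(path string) (string, error) {
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			// Discard the result and keep the caller's path.
			return path, nil
		}
		return "", err
	}
	return resolved, nil
}

func main() {
	p, err := resolvePath("/nonexistent-dir-for-example/child")
	fmt.Println(p, err)
}
```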

Should fix Jira issue RHEL-37948

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-07-11 09:39:56 -04:00
Farya L. Maerten c819c7a973 create runtime's worker queue before queuing any job
It seems that if some background tasks are queued in libpod's Runtime before the worker's channel is set up (eg. in the refresh phase), they are not executed later on, but the workerGroup's counter is still ticked up. This leads podman to hang when the imageEngine is shut down, since it waits for the workerGroup to be done.
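
The ordering requirement can be sketched with a channel and a `sync.WaitGroup`. The `workQueue` type and its methods are hypothetical and much simpler than libpod's Runtime; the sketch only shows that the channel and worker exist before any job is counted, so the final wait can complete.

```go
package main

import (
	"fmt"
	"sync"
)

// workQueue is a hypothetical, simplified background job queue.
type workQueue struct {
	jobs chan func()
	wg   sync.WaitGroup
}

// newWorkQueue creates the channel and starts the worker before any job can
// be queued, so every counted job is eventually executed.
func newWorkQueue() *workQueue {
	q := &workQueue{jobs: make(chan func(), 10)}
	go func() {
		for job := range q.jobs {
			job()
			q.wg.Done()
		}
	}()
	return q
}

// queue counts the job and hands it to the worker.
func (q *workQueue) queue(job func()) {
	q.wg.Add(1)
	q.jobs <- job
}

// shutdown stops accepting jobs and waits for the queued ones to finish.
func (q *workQueue) shutdown() {
	close(q.jobs)
	q.wg.Wait()
}

func main() {
	q := newWorkQueue()
	done := 0
	for i := 0; i < 3; i++ {
		q.queue(func() { done++ })
	}
	q.shutdown()
	fmt.Println(done)
}
```

If the counter were incremented for a job that never reaches a worker, `shutdown` would block in `Wait()` forever, which matches the hang described in the commit message.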

fixes containers/podman#22984

Signed-off-by: Farya Maerten <me@ltow.me>
2024-07-09 11:15:29 +02:00
Paul Holzinger 62956ac192
libpod: first delete container then cidfile
I am seeing a weird flake in my parallel system test PR. The issue is
that system units generated by podman systemd generate leave a container
in the Removing state behind.

As far as I can tell the problem seems to be that the cleanup process
is killed while it tries to remove the container from the db. Because
the cidfile was removed before the ExecStopPost=podman rm ... process no
longer had access to the cidfile and reported no error because it runs
with --ignore.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-07-05 10:27:42 +02:00
Paul Holzinger 6db8ff7f7b
libpod/container_top_linux.c: fix missing header
As this file uses open it needs to include fcntl.h.
This should fix the build error seen on epel9[1], not sure why it works
on the other platforms.

[1] https://download.copr.fedorainfracloud.org/results/packit/containers-podman-23113/epel-9-aarch64/07672197-podman/builder-live.log.gz

Fixes 65ed96585d ("podman top: join the container userns")

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-06-27 10:50:17 +02:00
Paul Holzinger 65ed96585d
podman top: join the container userns
When we execute ps(1) in the container and the container uses a userns
with a different id mapping the user id field will be wrong.

To fix this we must join the userns in such case.

Fixes #22293

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-06-26 11:10:56 +02:00
Paul Holzinger def182d396
restore: fix missing network setup
The restore code path never called completeNetworkSetup() and this means
that hosts/resolv.conf files were not populated. This fix is simply to
call this function. There is a big catch here. Technically this is
supposed to be called after the container is created but before it is
started. There is no such thing for restore, the container runs right
away. This means that if we do the call afterwards there is a short
interval where the file is still empty. Thus I decided to call it
before, which means it does not work with PostConfigureNetNS (userns),
but as this does not work today anyway I don't see it as a problem.

Fixes #22901

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-06-24 18:52:02 +02:00
openshift-merge-bot[bot] bf2de4177b
Merge pull request #23064 from giuseppe/podman-pass-timeout-stop-to-systemd
container: pass StopTimeout to the systemd slice
2024-06-23 14:57:55 +00:00
Giuseppe Scrivano 49eb5af301
libpod: intermediate mount if UID not mapped into the userns
if the current user is not mapped into the new user namespace, use an
intermediate mount to allow the mount point to be accessible instead
of opening up all the parent directories for the mountpoint.

Closes: https://github.com/containers/podman/issues/23028

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
Giuseppe Scrivano 08a8429459
libpod: avoid chowning the rundir to root in the userns
so it is possible to remove the code to make the entire directory
world accessible.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
Giuseppe Scrivano c81f075f43
libpod: do not chmod bind mounts
when the new mount API is available, the OCI runtime doesn't require
that each parent directory for a bind mount must be accessible.
Instead it is opened in the initial user namespace and passed down to
the container init process.

This requires that the kernel supports the new mount API and that the
OCI runtime uses it.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
Giuseppe Scrivano 094bc673ef
libpod: unlock the thread if possible
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
Giuseppe Scrivano 7d22f04f56
container: pass KillSignal and StopTimeout to the systemd scope
so that they are honored when systemd terminates the scope.

Closes: https://issues.redhat.com/browse/RHEL-16375

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 13:46:08 +02:00