Return the error instead of just logging it, so the caller can decide
to log it and exit > 0. I also removed the c.valid check as I do not see
what its purpose would be: c.valid is only false when the ctr was
removed, but then we should never get there as Cleanup() will not work
on a container in removing state.
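A minimal sketch of the pattern with hypothetical names (not the actual
libpod signatures): the helper returns the error instead of swallowing
it, and the caller decides whether to log it and exit non-zero.

package main

import (
    "errors"
    "fmt"
    "os"
)

// cleanupContainer is a hypothetical stand-in for the cleanup helper:
// it returns the error instead of logging it itself.
func cleanupContainer(id string) error {
    if id == "" {
        return errors.New("no container id given")
    }
    // ... perform the actual cleanup ...
    return nil
}

func main() {
    // The caller owns the policy: log the error and exit > 0.
    if err := cleanupContainer(""); err != nil {
        fmt.Fprintf(os.Stderr, "cleanup: %v\n", err)
        os.Exit(1)
    }
}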
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Strictly speaking we don't need the path yet, but its existence
prevents a lot of strangeness in our path-checking logic that validates
the current Podman configuration, as it was the only path that might
not exist this early in init.
Fixes #23515
Signed-off-by: Matt Heon <mheon@redhat.com>
if idmap is specified for a volume, reverse the mappings when copying
up from the container, so that the original permissions are maintained.
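A rough sketch of what reversing the mapping means here, using a
simplified hypothetical idMap type rather than the real c/storage
types: swap the container and host sides of each entry so host-side IDs
seen during the copy-up translate back to the IDs the container
expects.

package main

import "fmt"

// idMap mirrors the usual ContainerID/HostID/Size triple (hypothetical type).
type idMap struct {
    ContainerID int
    HostID      int
    Size        int
}

// reverse swaps the container and host sides of each mapping entry.
func reverse(m []idMap) []idMap {
    out := make([]idMap, 0, len(m))
    for _, e := range m {
        out = append(out, idMap{ContainerID: e.HostID, HostID: e.ContainerID, Size: e.Size})
    }
    return out
}

func main() {
    uids := []idMap{{ContainerID: 0, HostID: 100000, Size: 65536}}
    fmt.Println(reverse(uids)) // [{100000 0 65536}]
}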
Closes: https://github.com/containers/podman/issues/23467
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The network cleanup cannot handle being killed halfway through, as it
spits out a bunch of errors on the next cleanup attempt in that case.
Try to avoid getting into such a state and ignore SIGTERM during this
section.
Of course we can still get SIGKILL, so we should work on fixing the
underlying problems in network cleanup, but let's see if this helps us
with the CI flakes in the meantime.
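A minimal sketch of the idea using the standard library, not the exact
libpod shutdown handler: ignore SIGTERM around the critical section and
restore the default disposition afterwards.

package main

import (
    "fmt"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Ignore SIGTERM while the non-reentrant cleanup runs...
    signal.Ignore(syscall.SIGTERM)

    // ... critical section: tear down firewall rules, netns, etc.
    fmt.Println("running network cleanup")
    time.Sleep(100 * time.Millisecond)

    // ... then restore the default disposition. SIGKILL can still
    // interrupt us, so the cleanup itself should become idempotent.
    signal.Reset(syscall.SIGTERM)
}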
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When using service containers and play kube we create a complicated set
of dependencies.
First, in a pod all conmon/container cgroups are part of one slice.
That slice will be removed when the entire pod is stopped, resulting in
systemd killing all processes that were part of it.
Now the issue here is around the workings of stopPodIfNeeded() and
stopIfOnlyInfraRemains(): once a container is cleaned up it will check
whether the pod should be stopped depending on the pod ExitPolicy. If
this is the case it will stop all containers in that pod. However in
our flaky test we called podman pod kill, which logically killed all
containers already. Thus the logic now thinks on cleanup it must stop
the pod and calls into pod.stopWithTimeout(). There we try to stop, but
because all containers are already stopped it just throws errors and
never gets to the point where it would call Cleanup(). So the code does
not do cleanup and eventually calls removePodCgroup(), which will cause
all conmon and other podman cleanup processes of this pod to be killed.
Thus the podman container cleanup process was likely killed while
actually trying to do the proper cleanup, which leaves us in a bad
state.
Following commands such as podman pod rm will try to do the cleanup
again as they see it was not completed, but then fail as they are
unable to recover from the partial cleanup state.
Long term, network cleanup needs to be more robust and ideally should
be idempotent to handle cases where cleanup was killed in the middle.
Fixes #21569
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We bind ports to ensure there are no conflicts, and we leak them into
conmon to keep them open. However, we bound the ports after the network
was set up, so it was possible for a second network setup to overwrite
the firewall configs of a previous container, as the second setup only
failed later when binding the port. As such we must ensure we bind
before the network is set up.
This is not so simple because we still have to take care of the
PostConfigureNetNS bool, in which case the network setup happens after
we launch conmon. Thus we end up with two different conditions.
Also it is possible that we "leak" the ports that are set on the
container until the garbage collector closes them. This is not perfect,
but the alternative is adding special error handling on each function
exit after prepare until we start conmon, which is a lot of work to do
correctly.
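A sketch of the reservation idea, assuming plain net.Listen rather than
the actual port-binding helpers: bind the host port first so a
conflicting container fails before any firewall rules are written, then
keep the listener open so the port stays reserved.

package main

import (
    "fmt"
    "net"
    "os"
)

func main() {
    // Reserve the published port before doing any network setup.
    ln, err := net.Listen("tcp", "0.0.0.0:8080")
    if err != nil {
        // A conflict is detected here, before firewall rules are touched.
        fmt.Fprintf(os.Stderr, "port already in use: %v\n", err)
        os.Exit(1)
    }
    defer ln.Close()

    // ... network setup (firewall rules, netns) happens only after the
    // bind succeeded; the open listener is later leaked into conmon so
    // the port stays reserved for the container's lifetime.
    fmt.Println("port reserved, safe to configure the network")
}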
Fixes https://issues.redhat.com/browse/RHEL-50746
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
when a --rootfs is specified with idmap, always use the specified
rootfs since we need a new mount on top of the original directory.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if there is an annotation that specifies the user namespace for the
infra container, then make sure it is used for the entire pod.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Like commit 55749af0c7 but for podman *pod* stats, not the normal
podman stats. We must ignore ErrCtrStopped here as well, as this will
happen when the container process exited.
While at it, remove a useless argument from the function as it was
always nil, and restructure the logic flow to make it easier to read.
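A hedged sketch of the error-filtering pattern; errCtrStopped stands in
for libpod's ErrCtrStopped and the surrounding loop is purely
illustrative.

package main

import (
    "errors"
    "fmt"
)

// errCtrStopped stands in for libpod's ErrCtrStopped sentinel.
var errCtrStopped = errors.New("container is stopped")

func getStats(name string) (int, error) {
    if name == "exited" {
        return 0, errCtrStopped
    }
    return 42, nil
}

func main() {
    for _, ctr := range []string{"running", "exited"} {
        stat, err := getStats(ctr)
        if err != nil {
            // The container process may exit between listing and
            // querying; that is not an error for pod stats.
            if errors.Is(err, errCtrStopped) {
                continue
            }
            fmt.Println("error:", err)
            continue
        }
        fmt.Println(ctr, stat)
    }
}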
Fixes #23334
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Using the scanner is just unnecessarily complicated and buggy, as it
will not read the final line with its newline. There is also the
problem that it happens in a separate goroutine, so it could lose
output if we read the array before the scanner was done.
The API accepts a Writer, so we can just directly use a bytes.Buffer
which captures all output in memory without the need for another
goroutine.
This also means that now we always include the final newline in the
output. I checked with docker and they do the same, so this is good.
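A small sketch of the simplification, assuming an API that accepts an
io.Writer (the function name is illustrative): write directly into a
bytes.Buffer instead of scanning a pipe in a goroutine, so no trailing
newline is dropped and no racy read is possible.

package main

import (
    "bytes"
    "fmt"
    "io"
)

// runAndCapture stands in for an API that takes an io.Writer for output.
func runAndCapture(w io.Writer) error {
    _, err := io.WriteString(w, "hello\nworld\n") // final newline preserved
    return err
}

func main() {
    var buf bytes.Buffer
    if err := runAndCapture(&buf); err != nil {
        panic(err)
    }
    // All output, including the final newline, is already in memory;
    // no goroutine or bufio.Scanner is needed.
    fmt.Printf("%q\n", buf.String())
}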
Fixes #23332
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
shut down the containers store so that the home directory mount is not
leaked when "podman system service" exits.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The current code did something like this:
lock()
getState()
unlock()

if state != running
    lock()
    getState() == running -> error
    unlock()
This of course is wrong because between the first unlock() and second
lock() call another process could have modified the state. This meant
that sometimes you would get a weird error on start, because the
internal setup errored as the container was already running.
In general, any state check without holding the lock is incorrect and
will result in race conditions. As such, refactor the code to combine
both StartAndAttach() and Attach() into one function that can handle
both. With that we can move the running check into the locked code.
Also use a typed error for this specific error case so that the callers
can check for and ignore the specific error when needed. This also
allows us to fix races in the compat API that did a similar racy state
check.
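A minimal sketch of the corrected pattern with hypothetical names and
an in-process mutex standing in for the container lock (not the real
StartAndAttach signature): the state check happens while the lock is
held, and the typed error lets callers filter it with errors.Is.

package main

import (
    "errors"
    "fmt"
    "sync"
)

// errCtrAlreadyRunning is a typed sentinel the caller can check for.
var errCtrAlreadyRunning = errors.New("container is already running")

type container struct {
    mu      sync.Mutex
    running bool
}

// startAndAttach checks the state while holding the lock, so no other
// process can change it between the check and the start.
func (c *container) startAndAttach() error {
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.running {
        return errCtrAlreadyRunning
    }
    c.running = true
    return nil
}

func main() {
    c := &container{running: true}
    if err := c.startAndAttach(); err != nil {
        if errors.Is(err, errCtrAlreadyRunning) {
            fmt.Println("ignoring:", err) // caller decides to ignore
            return
        }
        panic(err)
    }
}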
This commit slightly changes how we output the result: previously a
start on an already running container would never print the id/name of
the container, which is confusing and sort of breaks idempotence. Now
it will include the output, except when --all is used; then it only
reports the ids that were actually started.
Fixes #23246
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When the path does not exist, filepath.EvalSymlinks returns an
empty string - so we can't just ignore ENOENT, we have to discard
the result if an ENOENT is returned.
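A hedged sketch of the corrected handling (resolvePath is a
hypothetical helper): discard the empty result when
filepath.EvalSymlinks reports ENOENT, instead of using it.

package main

import (
    "errors"
    "fmt"
    "io/fs"
    "path/filepath"
)

// resolvePath returns the resolved path, or the original path unchanged
// when it does not exist yet (the empty result must not be used).
func resolvePath(path string) (string, error) {
    resolved, err := filepath.EvalSymlinks(path)
    if err != nil {
        if errors.Is(err, fs.ErrNotExist) {
            return path, nil // keep the original, not the empty string
        }
        return "", err
    }
    return resolved, nil
}

func main() {
    p, err := resolvePath("/does/not/exist")
    if err != nil {
        panic(err)
    }
    fmt.Println(p)
}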
Should fix Jira issue RHEL-37948
Signed-off-by: Matt Heon <mheon@redhat.com>
It seems that if some background tasks are queued in libpod's Runtime
before the worker's channel is set up (e.g. in the refresh phase), they
are not executed later on, but the workerGroup's counter is still
ticked up. This leads podman to hang when the imageEngine is shut down,
since it waits for the workerGroup to be done.
Fixes containers/podman#22984
Signed-off-by: Farya Maerten <me@ltow.me>
I am seeing a weird flake in my parallel system test PR. The issue is
that system units generated by podman generate systemd leave a
container in the Removing state behind.
As far as I can tell the problem seems to be that the cleanup process
is killed while it tries to remove the container from the db. Because
the cidfile was removed before that, the ExecStopPost=podman rm ...
process no longer had access to the cidfile and reported no error,
because it runs with --ignore.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When we execute ps(1) in the container and the container uses a userns
with a different id mapping, the user id field will be wrong.
To fix this we must join the userns in that case.
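Podman does this with its own re-exec machinery; a conceptually
equivalent sketch using nsenter(1) and a placeholder PID looks like
this.

package main

import (
    "fmt"
    "os"
    "os/exec"
)

func main() {
    // PID of the container's init process (placeholder value).
    ctrPid := "12345"

    // Join the container's user (and PID) namespace before running ps,
    // so the UID column reflects the container's ID mapping.
    cmd := exec.Command("nsenter", "--target", ctrPid, "--user", "--pid", "ps", "aux")
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Run(); err != nil {
        fmt.Fprintln(os.Stderr, "ps in userns failed:", err)
        os.Exit(1)
    }
}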
Fixes #22293
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The restore code path never called completeNetworkSetup(), which means
that hosts/resolv.conf files were not populated. The fix is simply to
call this function. There is a big catch here: technically this is
supposed to be called after the container is created but before it is
started. There is no such thing for restore; the container runs right
away. This means that if we do the call afterwards there is a short
interval where the file is still empty. Thus I decided to call it
before, which makes it not work with PostConfigureNetNS (userns), but
as that does not work today anyway I don't see it as a problem.
Fixes #22901
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
if the current user is not mapped into the new user namespace, use an
intermediate mount to allow the mount point to be accessible instead
of opening up all the parent directories for the mountpoint.
Closes: https://github.com/containers/podman/issues/23028
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when the new mount API is available, the OCI runtime doesn't require
that each parent directory for a bind mount be accessible. Instead the
mount source is opened in the initial user namespace and the resulting
fd is passed down to the container init process.
This requires that the kernel supports the new mount API and that the
OCI runtime uses it.
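A sketch of the flow the new mount API enables (Linux-only, needs a
recent kernel and golang.org/x/sys/unix; simplified compared to what
the OCI runtime actually does, and the /tmp paths are placeholders).

package main

import (
    "fmt"
    "os"

    "golang.org/x/sys/unix"
)

func main() {
    src, dst := "/tmp/src", "/tmp/dst"

    // open_tree(2): grab a detached copy of the source mount while we
    // are still in the initial user namespace, so parent directories do
    // not need to be accessible from inside the container.
    fd, err := unix.OpenTree(unix.AT_FDCWD, src,
        unix.OPEN_TREE_CLONE|unix.OPEN_TREE_CLOEXEC|unix.AT_RECURSIVE)
    if err != nil {
        fmt.Fprintln(os.Stderr, "open_tree:", err)
        os.Exit(1)
    }
    defer unix.Close(fd)

    // The fd can be passed to the container init process; move_mount(2)
    // then attaches it at the destination.
    if err := unix.MoveMount(fd, "", unix.AT_FDCWD, dst,
        unix.MOVE_MOUNT_F_EMPTY_PATH); err != nil {
        fmt.Fprintln(os.Stderr, "move_mount:", err)
        os.Exit(1)
    }
}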
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
add a new flag that allows overriding the pull options configured in
the storage.conf file.
e.g.: --pull-option="enable_partial_images=false" can be specified to
Podman to disable partial pulls even if they are enabled there.
Leave it as a hidden configuration flag for now since the API itself
is marked as experimental in c/storage.
Currently c/storage doesn't honor the overrides; this is being fixed
with https://github.com/containers/storage/pull/1966
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If a container was stopped and we try to start it before we called
cleanup, it tried to reuse the network, which caused a panic as the
pasta code cannot deal with that. It is also never correct, as the
netns must be created by the runtime in case custom user namespaces are
used. As such the proper thing is to clean the netns up first.
Also change an e2e test to report better errors. It is not directly
related to this change, but it failed on v1 of this patch so we noticed
the ugly error message it produced. Thanks to Ed for the fix.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This avoids dereferencing c.config.Spec.Linux if it is nil, which is the
case on FreeBSD.
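A tiny illustration of the guard, using stand-in types rather than the
actual OCI spec structs: check the Linux section before dereferencing,
since it is nil on FreeBSD.

package main

import "fmt"

// Minimal stand-ins for the OCI spec types used here.
type linuxConfig struct{ CgroupsPath string }
type spec struct{ Linux *linuxConfig }

func cgroupsPath(s *spec) string {
    // On FreeBSD the Linux section is nil, so guard before dereferencing.
    if s == nil || s.Linux == nil {
        return ""
    }
    return s.Linux.CgroupsPath
}

func main() {
    fmt.Printf("%q\n", cgroupsPath(&spec{})) // "" instead of a panic
}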
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
This fixes a regression added in commit 4fd84190b8: because the name
was overwritten by the createTimer() call, the removeTransientFiles()
call removed the new timer and not the startup healthcheck. And then
when the container was stopped we leaked it, as the wrong unit name was
in the state.
A new test has been added to ensure the logic works and we never leak
the system timers.
Fixes #22884
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a `podman system check` that performs consistency checks on local
storage, optionally removing damaged items so that they can be
recreated.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Debian's hostname(5) man page states "The file should contain a single
newline-terminated hostname string."
[NO NEW TESTS NEEDED]
Fixes #22729
Signed-off-by: Bo Wang <wangbob@uniontech.com>
When an empty volume is mounted into a container, Docker will
chown that volume appropriately for use in the container. Podman
does this as well, but there are differences in the details. In
Podman, a chown is presently a one-and-done deal; in Docker, it
will continue so long as the volume remains empty. Mount it into a
dozen containers but never add content, and the chown occurs every
time. The chown is also linked to copy-up; it will always occur
when a copy-up occurred, even though the volume is then no longer
empty.
This PR changes our logic to (mostly) match Docker's.
For some reason, the chowning also stops if the volume is chowned
to root at any point. This feels like a Docker bug, but as they
say, bug for bug compatible.
In retrospect, using bools for NeedsChown and NeedsCopyUp was a
mistake. Docker isn't actually tracking this stuff; they're just
doing a copy-up and permissions change unconditionally as long as
the volume is empty. They also have the two linked as one
operation, seemingly, despite happening at very different times
during container init. Replicating that in our stateful system is
nontrivial, hence the need for the new CopiedUp field. Basically,
we never want to chown a volume with contents in it, except if
that data is a result of a copy-up that resulted from mounting
into the current container. Tracking who did the copy-up is the
easiest way to do this.
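A hedged sketch of the resulting decision logic; the field names follow
the description above, but this is not the actual libpod code.

package main

import "fmt"

// volume mirrors the state described above (hypothetical struct).
type volume struct {
    Empty      bool   // no content in the volume
    CopiedUp   bool   // contents came from a copy-up
    CopiedUpBy string // container that performed the copy-up
    OwnerUID   int    // current owner of the volume root
}

// needsChown reports whether mounting into container ctr should chown
// the volume.
func needsChown(v volume, ctr string) bool {
    if v.Empty {
        return true // empty volumes are (re)chowned on every mount
    }
    if v.OwnerUID == 0 {
        return false // once chowned to root, chowning stops (Docker behavior)
    }
    // Non-empty volumes are only chowned when the contents come from a
    // copy-up performed for this container.
    return v.CopiedUp && v.CopiedUpBy == ctr
}

func main() {
    v := volume{CopiedUp: true, CopiedUpBy: "ctr1", OwnerUID: 1000}
    fmt.Println(needsChown(v, "ctr1")) // true: this container's copy-up
    fmt.Println(needsChown(v, "ctr2")) // false: contents not from this container
}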
Fixes #22571
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Now WaitForExit returns the exit code as stored in the db instead of
returning an error when the container was removed.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>