Kill is a fast syscall, so we can reduce the sleep time from 100ms to
10ms in hope to speed things up a bit.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Commit 067442b570 improved stopping/killing a container by detecting
whether the cleanup process has already fired and changed the state of
the container. Further improve on that by returning early instead of
trying to wait for the PID to finish. At that point we know that the
container has exited but the previous PID may have been recycled
already by the kernel.
[NO NEW TESTS NEEDED] - the absence of the two flaking tests recorded
in #17142 will tell.
Fixes: #17142
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add a comment when SIGKILL is being used. It may help future readers
better comprehend what's going on and why.
[NO NEW TESTS NEEDED] - cannot test a comment :^)
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Move the stopSignal decl into the branch where it's actually used.
[NO NEW TESTS NEEDED] as it's just a small refactor.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The code can be simplified by using a timer directly.
[NO NEW TESTS NEEDED] - should not change behavior.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Reserved annotations are used internally by Podman and would effect
nothing when run with Kubernetes so we should not be generating these
annotations.
Fixes: https://github.com/containers/podman/issues/17105
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The container lock is released before stopping/killing which implies
certain race conditions with, for instance, the cleanup process changing
the container state to stopped, exited or other states.
The (remaining) flakes seen in #16142 and #15367 strongly indicate a
race in between the stopping/killing a container and the cleanup
process. To fix the flake make sure to ignore invalid-state errors.
An alternative fix would be to change `KillContainer` to not return such
errors at all but commit c77691f06f indicates an explicit desire to
have these errors being reported in the sig proxy.
[NO NEW TESTS NEEDED] as it's a race already covered by the system
tests.
Fixes: #16142Fixes: #15367
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Every time I look at a container-removal issue I wonder why the
container isn't locked directly here, so let's add a comment here.
I am not sure whether I would be better if callers took care of
locking but for now the comment will safe the future me and probably
other readers some time.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This is a cleaner solution and guarantees the variables
will be used before they are initialized.
[NO NEW TESTS NEEDED]
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The StoppedByUser variable indicates that the container was
requested to stop by a user. It's used to prevent restart policy
from firing (so that a restart=always container won't restart if
the user does a `podman stop`. The problem is we were setting it
*very* late in the stop() function. Originally, this was fine,
but after the changes to add the new Stopping state, the logic
that triggered restart policy was firing before StoppedByUser was
even set - so the container would still restart.
Setting it earlier shouldn't hurt anything and guarantees that
checks will see that the container was stopped manually.
Fixes#17069
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
While manually playing with --service-container, I encountered a number
of too verbose logs. For instance, there's no need to error-log when
the service-container has already been stopped.
For testing, add a new kube test with a multi-pod YAML which will
implicitly show that #17024 is now working.
Fixes: #17024
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Every podman command is paying the price for this compile even when they
don't use the Regex, this will speed up start of podman by a little.
[NO NEW TESTS NEEDED] Existing tests should catch issues.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
follow-up to 6886e80b45
when "podman -rm -f" is used on a container in "stopping" state, also
make sure it is terminated before removing it from the local storage.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
check that the container has a valid pid before attempting to use
kill($PID, 0) on it. If the PID==0, it means the container is already
stopped.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Do not allow for removing the service container unless all associated
pods have been removed. Previously, the service container could be
removed when all pods have exited which can lead to a number of issues.
Now, the service container is treated like an infra container and can
only be removed along with the pods.
Also make sure that a pod is unlinked from the service container once
it's being removed.
Fixes: #16964
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
if the container has no pid namespace, they are not killed when the
container process ends. In this case, attempt to kill them in the
same way.
The problem was noticed with toolbox where the exec'ed sessions are
not terminated when the container is stopped, blocking the system
shutdown.
[NO NEW TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
in several top-level API functions. These are the first line of
the function that contains them, which makes sense; we want to
capture any error returned by the function. However, making this
the first defer means that it is the last thing to run after the
function returns - meaning that the container's
`defer c.lock.Unlock()` has already fired, leading to a chance we
modify the container without holding its lock.
We could move the function around so it's no longer the first
defer, but then we'd have to call it twice (immediately after
`defer c.lock.Unlock()` if the container is not batched, and a
second time in a new `else` block right after the lock/sync call
to make sure we handle batched containers). Seems simpler to just
leave it like this.
[NO NEW TESTS NEEDED] Can't really test for DB corruption easily.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When you use podman logs with --until and --follow it should exit after
the requested until time and not keep hanging forever.
This fixes the behavior for the k8s-file backend.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When you use podman logs with --until and --follow it should exit after
the requested until time and not keep hanging forever.
To make this work I reworked the code to use the better journald event
reading code for logs as well. this correctly uses the sd_journal API
without having to compare the cursors to find the EOF.
The same problems exists for the k8s-file driver, I will fix this in the
next commit.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Instead of reading the full journal which can be expensive we can seek
based on the time.
If you have a journald with many podman events just compare the time
`time podman events --since 1s --stream=false` with and without this
patch.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The `containerCouldBeLogging` bool should not be false by default, when
--since is used we seek in the journal and can miss the start event so
that bool would stay false forever. This means that a running container
is not followed even when it should.
To fix this we can just set the `containerCouldBeLogging` bool based on
the current contianer state.
Fixes#16950
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
do not allow removing containers that are in the stopping state,
otherwise it can lead to a race condition where a "podman rm" removes
the container from the storage while another process is stopping the
same container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2155828
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This change aims to store an error message to the ContainerState struct
with the last known error from the Start, StartAndAttach, and Stop OCI
Runtime functions.
The goal was to act in accordance with Docker's behavior.
Fixes: #13729
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
This means we store things like config.json and the secret files
also on tmpfs, lowering wear on disk and leaving less stuff on disk
on an unclean shutdown.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This allows use to use STDOUT directly without having to call open
again, also this makes the export API endpoint much more performant
since it no longer needs to copy to a temp file.
I noticed that there was no export API test so I added one.
And lastly opening /dev/stdout will not work on windows.
Fixes#16870
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
always create a user namespace when running with euid != 0 since the
user is not owning the current mount namespace.
This issue happened on a Kubernetes cluster, where the pod was running
privileged but the UID was not 0, as it was configured in the image
itself.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This should simplify the db logic. We no longer need a extra db bucket
for the netns, it is still supported in read only mode for backwards
compat. The old version required us to always open the netns before we
could attach it to the container state struct which caused problem in
some cases were the netns was no longer valid.
Now we use the netns as string throughout the code, this allow us to
only open it when needed reducing possible errors.
[NO NEW TESTS NEEDED] Existing tests should cover it and it is only a
flake so hard to reproduce the error.
Fixes#16140
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We should have done this much earlier, most of the times CNI networks
just mean networks so I changed this and also fixed some function
names. This should make it more clear what actually refers to CNI and
what is just general network backend stuff.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When we read logs there can be full or partial lines, when it is full we
need to append a newline, thus the message length must be incremented by
one.
Fixes#16856
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Also update vendor of containers/storage and image
Cleanup display of added/dropped capabilties as well
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Also fix a number of duplicate words. Yet disable the new `dupword`
linter as it displays too many false positives.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Now that the OCI runtime specs have support for idmapped mounts, let's
use them instead of relying on the custom annotation in crun.
Also add the mechanism to specify the mapping to use. Pick the same
format used by crun so it won't be a breaking change for users that
are already using it.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Init containers are removed once they exit, but podman
reports and error that the container does not exist, when
it was previously removed. Stop reporting missing containers
when removing.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
With the 4.0 network rewrite I introduced a regression in 094e1d70de.
It only covered the case where a checkpoint is restored via --import.
The normal restore path was not covered since the static ip/mac are now
part in an extra db bucket. This commit fixes that by changing the config
in the db.
Note that there were no test for --ignore-static-ip/mac so I added a big
system test which should cover all cases (even the ones that already
work). This is not exactly pretty but I don't have to enough time to
come up with something better at the moment.
Fixes#16666
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
subpath allows for only a subdirecty of a volumes data to be mounted in the container
add support for the named volume type sub path with others to follow.
resolves#12929
Signed-off-by: Charlie Doern <cbddoern@gmail.com>
When stopping the transient systemd timer/unit which powers running
health checks, make sure to ignore its dependencies. It turns out
that we're otherwise running into a timeout when running a container in
a systemd unit and reboot.
An alternative may be to further tweak some attributes/options when
creating the timer/unit via systemd-run but it seems safe to just ignore
the dependencies and stop.
[NO NEW TESTS NEEDED] - we don't yet have means to test reboots.
Fixes: #14531
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When labes map is too big we may get syslog entry truncated.
This breaks JSON parsing making event loading impossible.
[NO NEW TESTS NEEDED]
Signed-off-by: Matej Vasek <mvasek@redhat.com>
The podman healthchecks are implemented using systemd timers, this works
great but it will never work on non systemd distros. Currently the logic
always assumes systemd is available and will fail with an error, so users
are forced to always run with `--no-healthcheck` to disable healthchecks
that are defined in an image for example. This is annoying and IMO
unnecessary, we should just default to no healthcheck on these systems.
First, use the systemd build tag to disable it at build time if this tag
is not used.
Second, use make sure systemd is used as init before trying
to use healthchecks. This could be the case when we are run in a container.
[NO NEW TESTS NEEDED] We do not have any non systemd VMs in CI AFAIK.
Fixes#16644
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This just calls GC on the local storage, which will remove any leftover
directories from previous containers that are not in the podman db anymore.
This is useful primarily for transient store mode, but can also help in
the case of an unclean shutdown.
Also adds some e2e test to ensure prune --external works.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This brings a performance improvement to `podman run` on top of the
other transient_store improvements in containers/storage:
Transient mode without transient bolt_db:
Benchmark 1: bin/podman run --transient-store=true --rm --pull=never --network=host --security-opt seccomp=unconfined fedora true
Time (mean ± σ): 130.6 ms ± 5.8 ms [User: 44.4 ms, System: 25.9 ms]
Range (min … max): 122.6 ms … 143.7 ms 21 runs
Transient mode with transient bolt_db:
Benchmark 1: bin/podman run --transient-store=true --rm --pull=never --network=host --security-opt seccomp=unconfined fedora true
Time (mean ± σ): 100.3 ms ± 5.3 ms [User: 40.5 ms, System: 24.9 ms]
Range (min … max): 93.0 ms … 111.6 ms 29 runs
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This handles the transient store options from the container/storage
configuration in the runtime/engine.
Changes are:
* Print transient store status in `podman info`
* Print transient store status in runtime debug output
* Add --transient-store argument to override config option
* Propagate config state to conmon cleanup args so the callback podman
gets the same config.
Note: This doesn't really change any behaviour yet (other than the changes
in containers/storage).
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Later changes will need to access it earlier, so move its creation to
just after the creation of StaticDir.
Note: For whatever reason this we created twice before, but we now
only do it once.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
The containers should be able to write to tmpfs mounted directories.
Also cleanup output of podman kube generate to not show default values.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When the new `events_container_create_inspect_data` option is enabled in
containers.conf set the `ContainersInspectData` event field for each
container-create event.
The data was requested for the purpose of auditing (e.g., intrusion
detection).
Jira: https://issues.redhat.com/browse/RUN-1702
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Startup healthchecks are similar to K8S startup probes, in that
they are a separate check from the regular healthcheck that runs
before it. If the startup healthcheck fails repeatedly, the
associated container is restarted.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Fix swapped NetInput and NetOutput container stats. This resulted
in `podman stats` showing outgoing traffic as NetInput and incoming
traffic as NetOutput. This change might be visible or cause problems
for users who are actively relying on those stats for monitoring reasons.
[NO NEW TEST NEEDED]
Signed-off-by: Ingo Becker <ingo@orgizm.net>
Since mountStorage and createNetNS run in parallel, the directory file
descriptors used by mountStorage were (rarely) propagated to the CNI
plugins. On FreeBSD, the CNI bridge plugin needs to make changes to the
network jail. This fails if there are any descriptors to open directories
to protect against host directories being visible to the jail's chroot.
Adding O_CLOEXEC to the unix.Open call in openDirectory ensures that these
descriptors are not visible to podman's child processes.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
When restarting a container, clean up the healthcheck state by removing
the old log on disk. Carrying over the old state can lead to various
issues, for instance, in a wrong failing streak and hence wrong
behaviour after the restart.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2144754
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Some error reporting logic got lost from (*Container).prepare during the
port. This adds the missing logic, similar to the Linux version.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Including the RawInput in the API output is meaningless.
Fixes: #16497
[NO NEW TESTS NEEDED]
Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
when reading from the /proc/$PID/cgroup file, treat ESRCH in the same
way as ENOENT since the kernel returns ESRCH if the file was opened
correctly but the target process exited before the open could be
performed.
Closes: https://github.com/containers/podman/issues/16383
[NO NEW TESTS NEEDED] it is a race condition that is difficult to
reproduce.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
On FreeBSD, the path argument to shm_open is not a filesystem path and we
must use shm_unlink to remove it. This changes the Linux build to also use
shm_unlink which avoids assuming that shared memory segments live in
/dev/shm.
Signed-off-by: Doug Rabson <dfr@rabson.org>
Whenever image has `arch` and `os` configured for `wasm/wasi` switch to
`crun-wasm` as extension if applicable, following will only work if
`podman` is using `crun` as default runtime.
[NO NEW TESTS NEEDED]
[NO TESTS NEEDED]
A test for this can be added when `crun-wasm` is part of CI.
Signed-off-by: Aditya R <arajan@redhat.com>
This reports the correct package versions in 'podman info' for conmon and
ociRuntime on FreeBSD which is needed for the 005-info system test.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Conceptually equivalent to networking by means of slirp4netns(1),
with a few practical differences:
- pasta(1) forks to background once networking is configured in the
namespace and quits on its own once the namespace is deleted:
file descriptor synchronisation and PID tracking are not needed
- port forwarding is configured via command line options at start-up,
instead of an API socket: this is taken care of right away as we're
about to start pasta
- there's no need for further selection of port forwarding modes:
pasta behaves similarly to containers-rootlessport for local binds
(splice() instead of read()/write() pairs, without L2-L4
translation), and keeps the original source address for non-local
connections like slirp4netns does
- IPv6 is not an experimental feature, and enabled by default. IPv6
port forwarding is supported
- by default, addresses and routes are copied from the host, that is,
container users will see the same IP address and routes as if they
were in the init namespace context. The interface name is also
sourced from the host upstream interface with the first default
route in the routing table. This is also configurable as documented
- sandboxing and seccomp(2) policies cannot be disabled
- only rootless mode is supported.
See https://passt.top for more details about pasta.
Also add a link to the maintained build of pasta(1) manual as valid
in the man page cross-reference checks: that's where the man page
for the latest build actually is -- it's not on Github and it doesn't
match any existing pattern, so add it explicitly.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Remove the container/pod ID file along with the container/pod. It's
primarily used in the context of systemd and are not useful nor needed
once a container/pod has ceased to exist.
Fixes: #16387
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
there is already the same check when using cgroupfs, but not when
using the systemd cgroup backend. The check is needed to avoid a
confusing error from the OCI runtime.
Closes: https://github.com/containers/podman/issues/16376
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Only want to report if user created local customized storage in
/etc/containers/storage.conf or in
$HOME/.config/containers/storage.conf, when resetting storage.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>