Commit Graph

3847 Commits

Author SHA1 Message Date
Paul Holzinger 8a90765b90
filters: use new FilterID function from c/common
Remove code duplication and use the new FilterID function from
c/common. Also remove the duplicated ComputeUntilTimestamp in podman use
the one from c/common as well.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-13 17:49:41 +02:00
OpenShift Merge Robot 2a947c2f4b
Merge pull request #18869 from vrothberg/debug-18860
container wait: indicate timeout in error
2023-06-13 09:38:52 -04:00
Valentin Rothberg c0ab293131 container wait: indicate timeout in error
When waiting for a container, there may be a time window where conmon
has already exited but the container hasn't been fully cleaned up.
In that case, we give the container at most 20 seconds to be fully
cleaned up.  We cannot wait forever since conmon may have been killed or
something else went wrong.

After the timeout, we optimistically assume the container to be cleaned
up and its exit code to present.  If no exit code can be found, we
return an error.

Indicate in the error whether the timeout kicked in to help debug
(transient) errors and flakes (e.g., #18860).

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-06-13 13:48:29 +02:00
Toshiki Sonoda 6f821634ad libpod: Podman info output more network information
podman info prints the network information about binary path,
package version, program version and DNS information.

Fixes: #18443

Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
2023-06-13 11:19:29 +09:00
OpenShift Merge Robot 3cae574ab2
Merge pull request #18507 from mheon/fix_rm_depends
Fix `podman rm -fa` with dependencies
2023-06-12 13:27:34 -04:00
Paul Holzinger ab502fc5c4
criu: return error when checking for min version
There is weird issue #18856 which causes the version check to fail.
Return the underlying error in these cases so we can see it and debug
it.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-12 15:29:21 +02:00
Matthew Heon 2ebc9004f4 Ignore spurious warnings when killing containers
There are certain messages logged by OCI runtimes when killing a
container that has already stopped that we really do not care
about when stopping a container. Due to our architecture, there
are inherent races around stopping containers, and so we cannot
guarantee that *we* are the people to kill it - but that doesn't
matter because Podman only cares that the container has stopped,
not who delivered the fatal signal.

Unfortunately, the OCI runtimes don't understand this, and log
various warning messages when the `kill` command is invoked on a
container that was already dead. These cause our tests to fail,
as we now check for clean STDERR when running Podman. To work
around this, capture STDERR for the OCI runtime in a buffer only
for stopping containers, and go through and discard any of the
warnings we identified as spurious.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-08 09:19:47 -04:00
Matthew Heon f1ecdca4b6 Ensure our mutexes handle recursive locking properly
We use shared-memory pthread mutexes to handle mutual exclusion
in Libpod. It turns out that these have configurable options for
how to handle a recursive lock (IE, a thread trying to lock a
lock that the same thread had previously locked). The mutex can
either deadlock, or allow the duplicate lock without deadlocking.
Default behavior is, helpfully, unspecified, so if not explicitly
set there is no clear indication of which of these behaviors will
be seen. Unfortunately, today is the first I learned of this, so
our initial implementation did *not* explicitly set our preferred
behavior.

This turns out to be a major problem with a language like Golang,
where multiple goroutines can (and often do) use the same OS
thread. So we can have two goroutines trying to stop the same
container, and if the no-deadlock mutex behavior is in use, both
threads will successfully acquire the lock because the C library,
not knowing about Go's lightweight threads, sees the same PID
trying to lock a mutex twice, and allows it without question.

It appears that, at least on Fedora/RHEL/Debian libc, the default
(unspecified) behavior of the locks is the non-deadlocking
version - so, effectively, our locks have been of questionable
utility within the same Podman process for the last four years.
This is somewhat concerning.

What's even more concerning is that the Golang-native sync.Mutex
that was also in use did nothing to prevent the duplicate locking
(I don't know if I like the implications of this).

Anyways, this resolves the major issue of our locks not working
correctly by explicitly setting the correct pthread mutex
behavior.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-07 14:09:12 -04:00
Matthew Heon a750cd9876 Fix a race removing multiple containers in the same pod
If the first container to get the pod lock is the infra container
it's going to want to remove the entire pod, which will also
remove every other container in the pod. Subsequent containers
will get the pod lock and try to access the pod, only to realize
it no longer exists - and that, actually, the container being
removed also no longer exists.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-07 14:09:12 -04:00
Matthew Heon 0e47465e4a Discard errors when a pod is already removed
This was causing some CI flakes. I'm pretty sure that the pods
being removed already isn't a bug, but just the result of another
container in the pod removing it first - so no reason not to
ignore the errors.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-07 14:09:12 -04:00
Matthew Heon 398e48a24a Change Inherit to use a pointer to a container
This fixes a lint issue, but I'm keeping it in its own commit so
it can be reverted independently if necessary; I don't know what
side effects this may have. I don't *think* there are any
issues, but I'm not sure why it wasn't a pointer in the first
place, so there may have been a reason.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-07 14:09:07 -04:00
OpenShift Merge Robot c99d42b8e4
Merge pull request #18798 from edsantiago/fix_filters
filters: better handling of id=
2023-06-07 12:31:11 -04:00
Ed Santiago 992093ae91 filters: better handling of id=
For filter=id=XXX (containers, pods) and =ctr-ids=XXX (pods):

  if XXX is only hex characters, treat it as a PREFIX
  otherwise, treat it as a REGEX

Add tests. Update documentation. And fix an incorrect help message.

Fixes: #18471

Signed-off-by: Ed Santiago <santiago@redhat.com>
2023-06-07 05:29:06 -06:00
Matt Heon 944673c883 Address review feedback and add manpage notes
The inspect format for `.LockNumber` needed to be documented.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-06-06 11:04:59 -04:00
Matt Heon 4fda7936c5 `system locks` now reports held locks
To debug a deadlock, we really want to know what lock is actually
locked, so we can figure out what is using that lock. This PR
adds support for this, using trylock to check if every lock on
the system is free or in use. Will really need to be run a few
times in quick succession to verify that it's not a transient
lock and it's actually stuck, but that's not really a big deal.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-06-05 19:34:36 -04:00
Matt Heon 0948c078c2 Add a new hidden command, podman system locks
This is a general debug command that identifies any lock
conflicts that could lead to a deadlock. It's only intended for
Libpod developers (while it does tell you if you need to run
`podman system renumber`, you should never have to do that
anyways, and the next commit will include a lot more technical
info in the output that no one except a Libpod dev will want).
Hence, hidden command, and only implemented for the local driver
(recommend just running it by SSHing into a `podman machine` VM
in the unlikely case it's needed by remote Podman).

These conflicts should normally never happen, but having a
command like this is useful for debugging deadlock conditions
when they do occur.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-06-05 14:47:12 -04:00
Matt Heon 1013696ad2 Add number of free locks to `podman info`
This is a nice quality-of-life change that should help to debug
situations where someone runs out of locks (usually when a bunch
of unused volumes accumulate).

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-06-05 14:00:40 -04:00
Matt Heon 3b39eb1333 Include lock number in pod/container/volume inspect
Being able to easily identify what lock has been allocated to a
given Libpod object is only somewhat useful for debugging lock
issues, but it's trivial to expose and I don't see any harm in
doing so.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-06-05 12:28:50 -04:00
David Gibson b2c0006706 pasta: Correct handling of unknown protocols
setupPasta() has logic to handle forwarding of TCP or UDP ports.  It has
what looks like logic to give an error if trying to forward ports of any
other protocol.  However, there's a straightforward error in this that it
will in fact only give the error if you try to use a protocol called
"default".  Other unknown protocols will fall through and result in a
nonsensical pasta command line which will almost certainly cause a cryptic
error later on.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2023-06-05 12:21:08 +10:00
Matthew Heon 2c9f18182a The removeContainer function now accepts a struct
We had something like 6 different boolean options (removing a
container turns out to be rather complicated, because there are a
million-odd things that want to do it), and the function
signature was getting unreasonably large. Change to a struct to
clean things up.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-01 16:27:27 -04:00
Matthew Heon ef1a22cdea Fix a deadlock when removing pods
The infra container would try to remove the pod, despite the pod
already being in the process of being removed - oops. Add a check
to ensure we don't try and remove the pod when called by the
`podman pod rm` command.

Also, wire up noLockPod - it wasn't previously wired in, which is
concerning, and could be related?

Finally, make a few minor fixes to un-break lint.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-01 16:27:25 -04:00
Matthew Heon 8cb5d39d43 Pods now return what containers were removed with them
This probably should have been in the API since the beginning,
but it's not too late to start now.

The extra information is returned (both via the REST API, and to
the CLI handler for `podman rm`) but is not yet printed - it
feels like adding it to the output could be a breaking change?

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-01 16:24:59 -04:00
Matthew Heon bc1a31ce6d Make RemoveContainer return containers and pods removed
This allows for accurate reporting of dependency removal, but the
work is still incomplete: pods can be removed, but do not report
the containers they removed as part of said removal. Will add
this in a subsequent commit.

Major note: I made ignoring no-such-container errors automatic
once it has been determined that a container did exist in the
first place. I can't think of any case where this would not be a
TOCTOU - IE, no reason not to ignore them. The `--ignore` option
to `podman rm` should still retain meaning as it will ignore
errors from containers that didn't exist in the first place.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-01 16:24:56 -04:00
Matthew Heon e8d7456278 Add an API for removing a container and dependencies
This is the initial stage of implementation. The current API
functions but does not report the additional containers and pods
removed. This is necessary to properly display results to the
user after `podman rm --all`.

The existing remove-dependencies code has been removed in favor
of this more native solution.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-06-01 15:32:50 -04:00
OpenShift Merge Robot a7e23d341d
Merge pull request #18756 from Luap99/tz
libpod: fix timezone handling
2023-06-01 14:16:20 -04:00
OpenShift Merge Robot e91f6f16bf
Merge pull request #15867 from boaz0/closes_15754
Fix: display online_cpus in compat REST API
2023-06-01 11:03:14 -04:00
Paul Holzinger 34c258b419
libpod: fix timezone handling
The current way of bind mounting the host timezone file has problems.
Because /etc/localtime in the image may exist and is a symlink under
/usr/share/zoneinfo it will overwrite the targetfile. That confuses
timezone parses especially java where this approach does not work at
all. So we end up with an link which does not reflect the actual truth.

The better way is to just change the symlink in the image like it is
done on the host. However because not all images ship tzdata we cannot
rely on that either. So now we do both, when tzdata is installed then
use the symlink and if not we keep the current way of copying the host
timezone file in the container to /etc/localtime.

Also note that we need to rebuild the systemd image to include tzdata in
order to test this as our images do not contain the tzdata by default.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2149876

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-01 11:04:13 +02:00
Jan Hendrik Farr f097728891 set max ulimits for rootless on each start
Signed-off-by: Jan Hendrik Farr <github@jfarr.cc>
2023-05-31 09:20:31 +00:00
Boaz Shuster 5c7d50f08c Fix: display online_cpus in compat REST API
Signed-off-by: Boaz Shuster <boaz.shuster.github@gmail.com>
2023-05-31 07:41:30 +03:00
Valentin Rothberg 08b0d93ea3 kube play: exit-code propagation
Implement means for reflecting failed containers (i.e., those having
exited non-zero) to better integrate `kube play` with systemd.  The
idea is to have the main PID of `kube play` exit non-zero in a
configurable way such that systemd's restart policies can kick in.

When using the default sdnotify-notify policy, the service container
acts as the main PID to further reduce the resource footprint.  In that
case, before stopping the service container, Podman will lookup the exit
codes of all non-infra containers.  The service will then behave
according to the following three exit-code policies:

 - `none`: exit 0 and ignore containers (default)
 - `any`: exit non-zero if _any_ container did
 - `all`: exit non-zero if _all_ containers did

The upper values can be passed via a hidden `kube play
--service-exit-code-propagation` flag which can be used by tests and
later on by Quadlet.

In case Podman acts as the main PID (i.e., when at least one container
runs with an sdnotify-policy other than "ignore"), Podman will continue
to wait for the service container to exit and reflect its exit code.

Note that this commit also fixes a long-standing annoyance of the
service container exiting non-zero.  The underlying issue was that the
service container had been stopped with SIGKILL instead of SIGTERM and
hence exited non-zero.  Fixing that was a prerequisite for the exit-code
propagation to work but also improves the integration of `kube play`
with systemd and hence Quadlet with systemd.

Jira: issues.redhat.com/browse/RUN-1776
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-05-25 14:46:34 +02:00
Valentin Rothberg 6dbc138339 prune exit codes only when container doesn't exist
Make sure to prune container exit codes only when the associated
container does not exist anymore.  This is needed when checking if any
container in kube-play exited non-zero and a building block for the
below linked Jira card.

[NO NEW TESTS NEEDED] - there are no unit tests for exit code pruning.

Jira: https://issues.redhat.com/browse/RUN-1776
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-05-25 13:14:27 +02:00
OpenShift Merge Robot 688e6dbef1
Merge pull request #18640 from HirazawaUi/add-pasta-to-podman-info
podman: Add pasta to podman info
2023-05-25 06:55:04 -04:00
binghongtao 977b3cdbf6
podman: Add pasta to podman info
[NO NEW TESTS NEEDED]

Fixes: #18561

Signed-off-by: binghongtao <695097494plus@gmail.com>
2023-05-25 00:39:52 +08:00
OpenShift Merge Robot fe64f79469
Merge pull request #18636 from mtrmac/cleanupStorage-error
Fix, and reduce repetitiveness, in container cleanup error handling
2023-05-23 07:43:01 -04:00
OpenShift Merge Robot ca7d0128b2
Merge pull request #18619 from vyasgun/pr/events-volume-name
fix: event --filter volume=vol-name should compare the event name with volume name
2023-05-23 02:42:57 -04:00
Miloslav Trmač 032d4a95f0 Consolidate error handling in Runtime.removeContainer
Use a helper to handle the cleanupErr logic instead of
copy&pasting it EIGHT times.

Also modifies the returned errors to be wrapped with a context,
and changes the text of the logged errors a bit.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-05-22 19:14:06 +02:00
Miloslav Trmač f556e58bb0 Consolidate error handling in Container.cleanupStorage
Use a shared helper instead of copy&pasting the handling
of cleanupErr EIGHT times.

This changes the wording of logged error text, and the error
in one case, a bit.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-05-22 19:14:06 +02:00
Miloslav Trmač 4969c552ec Fix reporting errors on container unmount
[NO NEW TESTS NEEDED]
... because testing this would require us to intentionally
create an inconsistent state, which should ideally not be possible...
(and because at this point I don't even know what the reported failure
was.)

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-05-22 19:11:56 +02:00
OpenShift Merge Robot af8d19dc2e
Merge pull request #18581 from vrothberg/fix-18572
wait: look for exit code in stopped state
2023-05-22 11:51:14 -04:00
Gunjan Vyas 5f29c7bf98 fix: podman event --filter volume=vol-name should compare the event name with volume name
Fixes: https://github.com/containers/podman/issues/18618

Signed-off-by: Gunjan Vyas <vyasgun20@gmail.com>
2023-05-22 19:11:15 +05:30
Valentin Rothberg 1b9272a060 wait: look for exit code in stopped state
Make sure to look for the container's exit code when it's in stopped
state.  With `--restart=always`, the container seems to stay in the
stopped state which led the wait logic to loop until the 20 seconds
timeout for the cleanup process to have finished kicks in.

Also defensively make sure to loop when the container is in stopped
state but no exit code has been written yet.

Add a regression test to make sure Podman doesn't wait more than 20
seconds.  Even on a CI machine under high load I expect it to take much
much much less than that, so I do not expect this test to flake in the
future.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-05-22 14:53:19 +02:00
Erik Sjölund 685c736185 source code comments and docs: fix typos, language, Markdown layout
- fix a/an before noun
- fix loose -> lose
- fix "the the"
- fix lets -> let's
- fix Markdown layout
- fix a few typos
- remove unnecessary text in troubleshooting.md

Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
2023-05-22 07:52:16 +02:00
OpenShift Merge Robot a8291227de
Merge pull request #18620 from HirazawaUi/find_slirp4netns_from_helper_binaries_dir
podman: Added find slirp4netns binary file from helper_binaries_dir
2023-05-20 06:18:07 -04:00
binghongtao 29749362a0
podman: Added find slirp4netns binary file from helper_binaries_dir
[NO NEW TESTS NEEDED]

Fixes: #18568
Signed-off-by: binghongtao <695097494plus@gmail.com>
2023-05-20 03:17:22 +08:00
Giuseppe Scrivano 7c53a463b2
stats: get mem limit from the cgroup
b25b330306 introduced this behaviour.

It was fine at the time because we didn't support "container update",
so the limit could not be changed at runtime.  Since it is not
possible to change the memory limit at runtime, read the limit as
reported from the cgroup.

https://github.com/containers/crun/pull/1217 is required for crun.

Closes: https://github.com/containers/podman/issues/18621

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-05-19 14:59:43 +02:00
Daniel J Walsh 13f787842d
Fix handling of .containenv on tmpfs
Fixes: https://github.com/containers/podman/issues/18531

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-05-13 06:03:21 -04:00
OpenShift Merge Robot c307aeba37
Merge pull request #18506 from nalind/so-much-diffsize
libpod/Container.rootFsSize(): use recorded image sizes
2023-05-10 06:08:12 -04:00
OpenShift Merge Robot 7a5daa0df3
Merge pull request #18492 from daw1012345/main
Ensure the consistent setting of the HOME env variable on container start
2023-05-10 05:34:02 -04:00
Dawid Kulikowski 01e20818cc
Ensure the consistent setting of the HOME env variable on container start
Signed-off-by: Dawid Kulikowski <git@dawidkulikowski.pl>
2023-05-09 16:34:28 +02:00
Valentin Rothberg 1fb3cdf8a8 sqlite: disable WAL mode
As shown in #17831, WAL mode plays a role in causing `database is locked`
errors.  Those are errors, in theory, should not happen as the DB should
busy wait.  mattn/go-sqlite3/issues/274 has some comments indicating
that the busy handler behaves differently in WAL mode which may be an
explanation to the error.

For now, let's disable WAL mode and only re-enable it when we have
clearer understanding of what's going on.  The upstream issue along with
the SQLite documentation do not give me the clear guidance that I would
need.

[NO NEW TESTS NEEDED] - flake is only reproducible in CI.

Fixes: #18356
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-05-09 15:54:26 +02:00