Return the error instead of just logging it, so the caller can decide
to log it and exit > 0. I also removed the c.valid check as I do not see
what its purpose would be: c.valid is only false when the ctr was
removed, but then we should never get there as Cleanup() will not work
on a container in removing state.
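A minimal sketch of the pattern with hypothetical names (not the actual
libpod signatures): the helper returns the error instead of swallowing
it, and the caller decides whether to log it and exit non-zero.

package main

import (
    "errors"
    "fmt"
    "os"
)

// cleanupContainer is a hypothetical stand-in for the cleanup helper:
// it returns the error instead of logging it itself.
func cleanupContainer(id string) error {
    if id == "" {
        return errors.New("no container id given")
    }
    // ... perform the actual cleanup ...
    return nil
}

func main() {
    // The caller owns the policy: log the error and exit > 0.
    if err := cleanupContainer(""); err != nil {
        fmt.Fprintf(os.Stderr, "cleanup: %v\n", err)
        os.Exit(1)
    }
}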
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Strictly speaking we don't need the path yet, but its existence
prevents a lot of strangeness in our path-checking logic that validates
the current Podman configuration, as it was the only path that might
not exist this early in init.
Fixes #23515
Signed-off-by: Matt Heon <mheon@redhat.com>
if idmap is specified for a volume, reverse the mappings when copying
up from the container, so that the original permissions are maintained.
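A rough sketch of what reversing the mapping means here, using a
simplified hypothetical idMap type rather than the real c/storage
types: swap the container and host sides of each entry so host-side IDs
seen during the copy-up translate back to the IDs the container
expects.

package main

import "fmt"

// idMap mirrors the usual ContainerID/HostID/Size triple (hypothetical type).
type idMap struct {
    ContainerID int
    HostID      int
    Size        int
}

// reverse swaps the container and host sides of each mapping entry.
func reverse(m []idMap) []idMap {
    out := make([]idMap, 0, len(m))
    for _, e := range m {
        out = append(out, idMap{ContainerID: e.HostID, HostID: e.ContainerID, Size: e.Size})
    }
    return out
}

func main() {
    uids := []idMap{{ContainerID: 0, HostID: 100000, Size: 65536}}
    fmt.Println(reverse(uids)) // [{100000 0 65536}]
}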
Closes: https://github.com/containers/podman/issues/23467
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The network cleanup cannot handle being killed halfway through, as it
spits out a bunch of errors on the next cleanup attempt in that case.
Try to avoid getting into such a state and ignore SIGTERM during this
section.
Of course we can still get SIGKILL, so we should work on fixing the
underlying problems in network cleanup, but let's see if this helps us
with the CI flakes in the meantime.
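A minimal sketch of the idea using the standard library, not the exact
libpod shutdown handler: ignore SIGTERM around the critical section and
restore the default disposition afterwards.

package main

import (
    "fmt"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Ignore SIGTERM while the non-reentrant cleanup runs...
    signal.Ignore(syscall.SIGTERM)

    // ... critical section: tear down firewall rules, netns, etc.
    fmt.Println("running network cleanup")
    time.Sleep(100 * time.Millisecond)

    // ... then restore the default disposition. SIGKILL can still
    // interrupt us, so the cleanup itself should become idempotent.
    signal.Reset(syscall.SIGTERM)
}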
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When using service containers and play kube we create a complicated set
of dependencies.
First, in a pod all conmon/container cgroups are part of one slice.
That slice will be removed when the entire pod is stopped, resulting in
systemd killing all processes that were part of it.
Now the issue here is around the workings of stopPodIfNeeded() and
stopIfOnlyInfraRemains(): once a container is cleaned up it will check
whether the pod should be stopped depending on the pod ExitPolicy. If
this is the case it will stop all containers in that pod. However in
our flaky test we called podman pod kill, which logically killed all
containers already. Thus the logic now thinks on cleanup it must stop
the pod and calls into pod.stopWithTimeout(). There we try to stop, but
because all containers are already stopped it just throws errors and
never gets to the point where it would call Cleanup(). So the code does
not do cleanup and eventually calls removePodCgroup(), which will cause
all conmon and other podman cleanup processes of this pod to be killed.
Thus the podman container cleanup process was likely killed while
actually trying to do the proper cleanup, which leaves us in a bad
state.
Following commands such as podman pod rm will try to do the cleanup
again as they see it was not completed, but then fail as they are
unable to recover from the partial cleanup state.
Long term, network cleanup needs to be more robust and ideally should
be idempotent to handle cases where cleanup was killed in the middle.
Fixes #21569
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We bind ports to ensure there are no conflicts, and we leak them into
conmon to keep them open. However, we bound the ports after the network
was set up, so it was possible for a second network setup to overwrite
the firewall configs of a previous container, as the second setup only
failed later when binding the port. As such we must ensure we bind
before the network is set up.
This is not so simple because we still have to take care of the
PostConfigureNetNS bool, in which case the network setup happens after
we launch conmon. Thus we end up with two different conditions.
Also it is possible that we "leak" the ports that are set on the
container until the garbage collector closes them. This is not perfect,
but the alternative is adding special error handling on each function
exit after prepare until we start conmon, which is a lot of work to do
correctly.
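A sketch of the reservation idea, assuming plain net.Listen rather than
the actual port-binding helpers: bind the host port first so a
conflicting container fails before any firewall rules are written, then
keep the listener open so the port stays reserved.

package main

import (
    "fmt"
    "net"
    "os"
)

func main() {
    // Reserve the published port before doing any network setup.
    ln, err := net.Listen("tcp", "0.0.0.0:8080")
    if err != nil {
        // A conflict is detected here, before firewall rules are touched.
        fmt.Fprintf(os.Stderr, "port already in use: %v\n", err)
        os.Exit(1)
    }
    defer ln.Close()

    // ... network setup (firewall rules, netns) happens only after the
    // bind succeeded; the open listener is later leaked into conmon so
    // the port stays reserved for the container's lifetime.
    fmt.Println("port reserved, safe to configure the network")
}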
Fixes https://issues.redhat.com/browse/RHEL-50746
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
when a --rootfs is specified with idmap, always use the specified
rootfs since we need a new mount on top of the original directory.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if there is an annotation that specifies the user namespace for the
infra container, then make sure it is used for the entire pod.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Like commit 55749af0c7 but for podman *pod* stats, not the normal
podman stats. We must ignore ErrCtrStopped here as well, as this will
happen when the container process exited.
While at it, remove a useless argument from the function as it was
always nil, and restructure the logic flow to make it easier to read.
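A hedged sketch of the error-filtering pattern; errCtrStopped stands in
for libpod's ErrCtrStopped and the surrounding loop is purely
illustrative.

package main

import (
    "errors"
    "fmt"
)

// errCtrStopped stands in for libpod's ErrCtrStopped sentinel.
var errCtrStopped = errors.New("container is stopped")

func getStats(name string) (int, error) {
    if name == "exited" {
        return 0, errCtrStopped
    }
    return 42, nil
}

func main() {
    for _, ctr := range []string{"running", "exited"} {
        stat, err := getStats(ctr)
        if err != nil {
            // The container process may exit between listing and
            // querying; that is not an error for pod stats.
            if errors.Is(err, errCtrStopped) {
                continue
            }
            fmt.Println("error:", err)
            continue
        }
        fmt.Println(ctr, stat)
    }
}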
Fixes #23334
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Using the scanner is just unnecessarily complicated and buggy, as it
will not read the final line with its newline. There is also the
problem that it happens in a separate goroutine, so it could lose
output if we read the array before the scanner was done.
The API accepts a Writer, so we can just directly use a bytes.Buffer
which captures all output in memory without the need for another
goroutine.
This also means that now we always include the final newline in the
output. I checked with docker and they do the same, so this is good.
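A small sketch of the simplification, assuming an API that accepts an
io.Writer (the function name is illustrative): write directly into a
bytes.Buffer instead of scanning a pipe in a goroutine, so no trailing
newline is dropped and no racy read is possible.

package main

import (
    "bytes"
    "fmt"
    "io"
)

// runAndCapture stands in for an API that takes an io.Writer for output.
func runAndCapture(w io.Writer) error {
    _, err := io.WriteString(w, "hello\nworld\n") // final newline preserved
    return err
}

func main() {
    var buf bytes.Buffer
    if err := runAndCapture(&buf); err != nil {
        panic(err)
    }
    // All output, including the final newline, is already in memory;
    // no goroutine or bufio.Scanner is needed.
    fmt.Printf("%q\n", buf.String())
}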
Fixes #23332
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
shut down the containers store so that the home directory mount is not
leaked when "podman system service" exits.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The current code did something like this:
lock()
getState()
unlock()

if state != running
    lock()
    getState() == running -> error
    unlock()
This of course is wrong because between the first unlock() and second
lock() call another process could have modified the state. This meant
that sometimes you would get a weird error on start, because the
internal setup errored as the container was already running.
In general, any state check without holding the lock is incorrect and
will result in race conditions. As such, refactor the code to combine
both StartAndAttach() and Attach() into one function that can handle
both. With that we can move the running check into the locked code.
Also use a typed error for this specific error case so that the callers
can check for and ignore the specific error when needed. This also
allows us to fix races in the compat API that did a similar racy state
check.
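A minimal sketch of the corrected pattern with hypothetical names and
an in-process mutex standing in for the container lock (not the real
StartAndAttach signature): the state check happens while the lock is
held, and the typed error lets callers filter it with errors.Is.

package main

import (
    "errors"
    "fmt"
    "sync"
)

// errCtrAlreadyRunning is a typed sentinel the caller can check for.
var errCtrAlreadyRunning = errors.New("container is already running")

type container struct {
    mu      sync.Mutex
    running bool
}

// startAndAttach checks the state while holding the lock, so no other
// process can change it between the check and the start.
func (c *container) startAndAttach() error {
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.running {
        return errCtrAlreadyRunning
    }
    c.running = true
    return nil
}

func main() {
    c := &container{running: true}
    if err := c.startAndAttach(); err != nil {
        if errors.Is(err, errCtrAlreadyRunning) {
            fmt.Println("ignoring:", err) // caller decides to ignore
            return
        }
        panic(err)
    }
}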
This commit slightly changes how we output the result: previously a
start on an already running container would never print the id/name of
the container, which is confusing and sort of breaks idempotence. Now
it will include the output, except when --all is used; then it only
reports the ids that were actually started.
Fixes #23246
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When the path does not exist, filepath.EvalSymlinks returns an
empty string - so we can't just ignore ENOENT, we have to discard
the result if an ENOENT is returned.
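A hedged sketch of the corrected handling (resolvePath is a
hypothetical helper): discard the empty result when
filepath.EvalSymlinks reports ENOENT, instead of using it.

package main

import (
    "errors"
    "fmt"
    "io/fs"
    "path/filepath"
)

// resolvePath returns the resolved path, or the original path unchanged
// when it does not exist yet (the empty result must not be used).
func resolvePath(path string) (string, error) {
    resolved, err := filepath.EvalSymlinks(path)
    if err != nil {
        if errors.Is(err, fs.ErrNotExist) {
            return path, nil // keep the original, not the empty string
        }
        return "", err
    }
    return resolved, nil
}

func main() {
    p, err := resolvePath("/does/not/exist")
    if err != nil {
        panic(err)
    }
    fmt.Println(p)
}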
Should fix Jira issue RHEL-37948
Signed-off-by: Matt Heon <mheon@redhat.com>
It seems that if some background tasks are queued in libpod's Runtime
before the worker's channel is set up (e.g. in the refresh phase), they
are not executed later on, but the workerGroup's counter is still
ticked up. This leads podman to hang when the imageEngine is shut down,
since it waits for the workerGroup to be done.
Fixes containers/podman#22984
Signed-off-by: Farya Maerten <me@ltow.me>
I am seeing a weird flake in my parallel system test PR. The issue is
that system units generated by podman generate systemd leave a
container in the Removing state behind.
As far as I can tell the problem seems to be that the cleanup process
is killed while it tries to remove the container from the db. Because
the cidfile was removed before that, the ExecStopPost=podman rm ...
process no longer had access to the cidfile and reported no error,
because it runs with --ignore.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When we execute ps(1) in the container and the container uses a userns
with a different id mapping, the user id field will be wrong.
To fix this we must join the userns in that case.
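Podman does this with its own re-exec machinery; a conceptually
equivalent sketch using nsenter(1) and a placeholder PID looks like
this.

package main

import (
    "fmt"
    "os"
    "os/exec"
)

func main() {
    // PID of the container's init process (placeholder value).
    ctrPid := "12345"

    // Join the container's user (and PID) namespace before running ps,
    // so the UID column reflects the container's ID mapping.
    cmd := exec.Command("nsenter", "--target", ctrPid, "--user", "--pid", "ps", "aux")
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Run(); err != nil {
        fmt.Fprintln(os.Stderr, "ps in userns failed:", err)
        os.Exit(1)
    }
}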
Fixes #22293
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The restore code path never called completeNetworkSetup(), which means
that hosts/resolv.conf files were not populated. The fix is simply to
call this function. There is a big catch here: technically this is
supposed to be called after the container is created but before it is
started. There is no such thing for restore; the container runs right
away. This means that if we do the call afterwards there is a short
interval where the file is still empty. Thus I decided to call it
before, which makes it not work with PostConfigureNetNS (userns), but
as that does not work today anyway I don't see it as a problem.
Fixes #22901
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
if the current user is not mapped into the new user namespace, use an
intermediate mount to allow the mount point to be accessible instead
of opening up all the parent directories for the mountpoint.
Closes: https://github.com/containers/podman/issues/23028
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when the new mount API is available, the OCI runtime doesn't require
that each parent directory for a bind mount be accessible. Instead the
mount source is opened in the initial user namespace and the resulting
fd is passed down to the container init process.
This requires that the kernel supports the new mount API and that the
OCI runtime uses it.
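A sketch of the flow the new mount API enables (Linux-only, needs a
recent kernel and golang.org/x/sys/unix; simplified compared to what
the OCI runtime actually does, and the /tmp paths are placeholders).

package main

import (
    "fmt"
    "os"

    "golang.org/x/sys/unix"
)

func main() {
    src, dst := "/tmp/src", "/tmp/dst"

    // open_tree(2): grab a detached copy of the source mount while we
    // are still in the initial user namespace, so parent directories do
    // not need to be accessible from inside the container.
    fd, err := unix.OpenTree(unix.AT_FDCWD, src,
        unix.OPEN_TREE_CLONE|unix.OPEN_TREE_CLOEXEC|unix.AT_RECURSIVE)
    if err != nil {
        fmt.Fprintln(os.Stderr, "open_tree:", err)
        os.Exit(1)
    }
    defer unix.Close(fd)

    // The fd can be passed to the container init process; move_mount(2)
    // then attaches it at the destination.
    if err := unix.MoveMount(fd, "", unix.AT_FDCWD, dst,
        unix.MOVE_MOUNT_F_EMPTY_PATH); err != nil {
        fmt.Fprintln(os.Stderr, "move_mount:", err)
        os.Exit(1)
    }
}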
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
add a new flag that allows overriding the pull options configured in
the storage.conf file.
e.g.: --pull-option="enable_partial_images=false" can be specified to
Podman to disable partial pulls even if they are enabled there.
Leave it as a hidden configuration flag for now since the API itself
is marked as experimental in c/storage.
Currently c/storage doesn't honor the overrides; this is being fixed
with https://github.com/containers/storage/pull/1966
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If a container was stopped and we try to start it before we called
cleanup, it tried to reuse the network, which caused a panic as the
pasta code cannot deal with that. It is also never correct, as the
netns must be created by the runtime in case custom user namespaces are
used. As such the proper thing is to clean the netns up first.
Also change an e2e test to report better errors. It is not directly
related to this change, but it failed on v1 of this patch so we noticed
the ugly error message it produced. Thanks to Ed for the fix.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This avoids dereferencing c.config.Spec.Linux if it is nil, which is the
case on FreeBSD.
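A tiny illustration of the guard, using stand-in types rather than the
actual OCI spec structs: check the Linux section before dereferencing,
since it is nil on FreeBSD.

package main

import "fmt"

// Minimal stand-ins for the OCI spec types used here.
type linuxConfig struct{ CgroupsPath string }
type spec struct{ Linux *linuxConfig }

func cgroupsPath(s *spec) string {
    // On FreeBSD the Linux section is nil, so guard before dereferencing.
    if s == nil || s.Linux == nil {
        return ""
    }
    return s.Linux.CgroupsPath
}

func main() {
    fmt.Printf("%q\n", cgroupsPath(&spec{})) // "" instead of a panic
}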
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
This fixes a regression added in commit 4fd84190b8: because the name
was overwritten by the createTimer() call, the removeTransientFiles()
call removed the new timer and not the startup healthcheck. And then
when the container was stopped we leaked it, as the wrong unit name was
in the state.
A new test has been added to ensure the logic works and we never leak
the system timers.
Fixes #22884
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a `podman system check` that performs consistency checks on local
storage, optionally removing damaged items so that they can be
recreated.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Debian's hostname(5) man page states "The file should contain a single
newline-terminated hostname string."
[NO NEW TESTS NEEDED]
Fixes #22729
Signed-off-by: Bo Wang <wangbob@uniontech.com>
When an empty volume is mounted into a container, Docker will
chown that volume appropriately for use in the container. Podman
does this as well, but there are differences in the details. In
Podman, a chown is presently a one-and-done deal; in Docker, it
will continue so long as the volume remains empty. Mount it into a
dozen containers but never add content, and the chown occurs every
time. The chown is also linked to copy-up; it will always occur
when a copy-up occurred, even though the volume is then no longer
empty.
This PR changes our logic to (mostly) match Docker's.
For some reason, the chowning also stops if the volume is chowned
to root at any point. This feels like a Docker bug, but as they
say, bug for bug compatible.
In retrospect, using bools for NeedsChown and NeedsCopyUp was a
mistake. Docker isn't actually tracking this stuff; they're just
doing a copy-up and permissions change unconditionally as long as
the volume is empty. They also have the two linked as one
operation, seemingly, despite happening at very different times
during container init. Replicating that in our stateful system is
nontrivial, hence the need for the new CopiedUp field. Basically,
we never want to chown a volume with contents in it, except if
that data is a result of a copy-up that resulted from mounting
into the current container. Tracking who did the copy-up is the
easiest way to do this.
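A hedged sketch of the resulting decision logic; the field names follow
the description above, but this is not the actual libpod code.

package main

import "fmt"

// volume mirrors the state described above (hypothetical struct).
type volume struct {
    Empty      bool   // no content in the volume
    CopiedUp   bool   // contents came from a copy-up
    CopiedUpBy string // container that performed the copy-up
    OwnerUID   int    // current owner of the volume root
}

// needsChown reports whether mounting into container ctr should chown
// the volume.
func needsChown(v volume, ctr string) bool {
    if v.Empty {
        return true // empty volumes are (re)chowned on every mount
    }
    if v.OwnerUID == 0 {
        return false // once chowned to root, chowning stops (Docker behavior)
    }
    // Non-empty volumes are only chowned when the contents come from a
    // copy-up performed for this container.
    return v.CopiedUp && v.CopiedUpBy == ctr
}

func main() {
    v := volume{CopiedUp: true, CopiedUpBy: "ctr1", OwnerUID: 1000}
    fmt.Println(needsChown(v, "ctr1")) // true: this container's copy-up
    fmt.Println(needsChown(v, "ctr2")) // false: contents not from this container
}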
Fixes #22571
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Now WaitForExit returns the exit code as stored in the db instead of
returning an error when the container was removed.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>