23331 Commits

Author SHA1 Message Date
Ed Santiago 9c3921ca58 CI: parallel-safe namespaces system test
An easy one :-)

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-08-21 05:36:04 -06:00
openshift-merge-bot[bot] 43fe3ebaf3
Merge pull request #23670 from mheon/fix_stop_and_rmi
Fix `podman stop` and `podman run --rmi`
2024-08-20 21:45:51 +00:00
openshift-merge-bot[bot] 096b52c987
Merge pull request #23669 from lsm5/packit-fedora-all
[skip-ci] Packit: update targets for propose-downstream
2024-08-20 20:26:17 +00:00
openshift-merge-bot[bot] 8be89caf46
Merge pull request #23675 from ruihe774/fix-pod-cgroups
Add key CgroupsMode in Quadlet container unit
2024-08-20 18:47:16 +00:00
openshift-merge-bot[bot] a3adad98af
Merge pull request #23676 from ruihe774/infra-name
quadlet: set infra name to %s-infra
2024-08-20 15:52:53 +00:00
Misaki Kasumi 1ccccde183 quadlet: add key CgroupsMode
Signed-off-by: Misaki Kasumi <misakikasumi@outlook.com>
2024-08-20 22:09:36 +08:00
Matt Heon 458ba5a8af Fix `podman stop` and `podman run --rmi`
This started off as an attempt to make `podman stop` on a
container started with `--rm` actually remove the container,
instead of just cleaning it up and waiting for the cleanup
process to finish the removal.

In the process, I realized that `podman run --rmi` was rather
broken. It was only done as part of the Podman CLI, not the
cleanup process (meaning it only worked with attached containers)
and the way it was wired meant that I was fairly confident that
it wouldn't work if I did a `podman stop` on an attached
container run with `--rmi`. I rewired it to use the same
mechanism that `podman run --rm` uses, so it should be a lot more
durable now, and I also wired it into `podman inspect` so you can
tell that a container will remove its image.

Tests have been added for the changes to `podman run --rmi`. No
tests for `stop` on a `run --rm` container as that would be racy.

Fixes #22852
Fixes RHEL-39513

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-08-20 09:51:18 -04:00
Misaki Kasumi e5c91ff03a quadlet: set infra name to %s-infra
e.g.: if the pod name is systemd-awd, the name of its infra container will be systemd-awd-infra

Signed-off-by: Misaki Kasumi <misakikasumi@outlook.com>
2024-08-20 18:20:02 +08:00
openshift-merge-bot[bot] 426aac362e
Merge pull request #23666 from rhatdan/apple
Do not segfault on hard stop
2024-08-19 18:02:00 +00:00
Lokesh Mandvekar 76e1bbb57d
[skip-ci] Packit: update targets for propose-downstream
When a new Fedora target (currently 41) is branched from rawhide, the
`fedora-latest` Packit target will point to fedora-41, while
`fedora-latest-stable` will point to `fedora-40`. Once fedora-41 has been
released, `fedora-latest` and `fedora-latest-stable` will both point to
fedora-41.

So, to have Packit continue to create PRs for Fedora 40 once Fedora 41
has released, it's best to set the target back to `fedora-all`.

Caution: `fedora-all` will create v5.x PRs for Fedora-39 until it goes
EOL. Since dist-git PRs need to be merged manually, we can just manually
close F39 PRs.

Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>
2024-08-19 11:37:15 -04:00
Daniel J Walsh fc30620cdb
Do not segfault on hard stop
Podman machine on macOS can segfault on hard stop.

Fixes: 23654

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-08-19 11:14:30 -04:00
openshift-merge-bot[bot] 7899358ec9
Merge pull request #23662 from edsantiago/disable-ginkgo-flake-retry
CI: disable ginkgo flake retries
2024-08-19 13:54:29 +00:00
openshift-merge-bot[bot] 8068bb2fc8
Merge pull request #23650 from Luap99/e2e-systemd-rm
test/e2e: rm systemd start test
2024-08-19 13:21:26 +00:00
openshift-merge-bot[bot] 8fd0d90669
Merge pull request #23663 from Luap99/criu-update
vendor: update go-criu to latest
2024-08-19 13:15:57 +00:00
Ed Santiago 145c7511aa CI: disable ginkgo flake retries
As discussed in Aug 13 Cabal, we are almost at a point where
e2e tests are reliably passing on the first try. Let's try to
keep things that way, and not hide future flakes.

Closes: #17967

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-08-19 06:48:15 -06:00
Paul Holzinger b755a1c60b
vendor: update go-criu to latest
There is no new version yet but we like to use the new code[1] to debug
a flake[2] in the podman CI. It will not fix it but the new error might
give us a better idea what is going on.

[1] https://github.com/checkpoint-restore/go-criu/pull/175
[2] https://github.com/containers/podman/issues/18856

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 13:51:56 +02:00
openshift-merge-bot[bot] 84126fdba1
Merge pull request #23614 from Luap99/lint-1.60.1
update golangci-lint to 1.60.1
2024-08-19 11:19:06 +00:00
Paul Holzinger 84a85319e1
golangci-lint: make darwin linting happy
Fix one minor issue with vfkit error handling. First, checking err !=
nil OR errors.Is() is pointless, as the err != nil check is already true
whenever errors.Is() matches. Second, nilerr complains because we return
nil when we hit an error branch; in this case this is correct because an
error means the VM is stopped.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:29 +02:00
Paul Holzinger 666d839157
golangci-lint: make windows linting happy
qemu cannot be compiled on windows anyway, so make sure we do not try to
compile the parts that the typechecker complains about there.
Also, all the e2e test files are only used on linux.
pkg/machine/wsl also reports some errors, but too many for me to fix
now. One minor problem was fixed in pkg/machine/machine_windows.go.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:29 +02:00
Paul Holzinger cd2a4c7cac
test/e2e: remove kernel version check
We need something newer than 4.14 anyway now for most Podman functions.
This is breaking linting on windows as the function doesn't work there.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:29 +02:00
Paul Holzinger 6c0d94328f
golangci-lint: remove most skip dirs
Now that we have proper !remote tags set everywhere we can just rely on
that and do not need to skip any dirs.
Also on linux do not lint three times, one remote run is enough.
We still have to skip the test dir for windows/macos though or we need
to add linux build tags there everywhere as well. This seems simpler.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:28 +02:00
Paul Holzinger 942f789a88
set !remote build tags where needed
The new golangci-lint version 1.60.1 has problems with typecheck when
linting remote files. We have certain packages that should never be
included in remote, but typecheck tries to compile all of them, which
never works, and it seems to ignore the exclude files we gave it.

To fix this the proper way is to mark all packages we only use locally
with !remote tags. This is a bit ugly but more correct. I also moved the
DecodeChanges() code around as it is called from the client so the
handles package which should only be remote doesn't really fit anyway.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:28 +02:00
Paul Holzinger c17daf2b09
update golangci-lint to 1.60.1
Fixes newly spotted issues around printf() formats and using os.Setenv()
in tests.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-19 11:41:28 +02:00
openshift-merge-bot[bot] 5f069d8742
Merge pull request #23648 from containers/renovate/github.com-vbauerster-mpb-v8-8.x
fix(deps): update module github.com/vbauerster/mpb/v8 to v8.8.1
2024-08-16 16:31:24 +00:00
Paul Holzinger 57016f5cc3
test/e2e: rm systemd start test
We have a lot of systemd and quadlet based tests in the system tests.
This test doesn't seem very useful and it seems to flake so just remove
it.

Fixes #23480

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-16 17:33:30 +02:00
openshift-merge-bot[bot] 98d52b6131
Merge pull request #23646 from Luap99/wait-remove
podman wait: allow waiting for removal of containers
2024-08-16 15:14:34 +00:00
renovate[bot] e2e2763b0e
fix(deps): update module github.com/vbauerster/mpb/v8 to v8.8.1
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-08-16 15:06:28 +00:00
openshift-merge-bot[bot] 670b245b67
Merge pull request #23644 from Luap99/update-container-status
libpod: remove UpdateContainerStatus()
2024-08-16 15:03:26 +00:00
Paul Holzinger 80639df27a
podman wait: allow waiting for removal of containers
By default, wait only waits for the exit of a container; there is really
no way to make it wait for the removal too when the container was
created with --rm. I thought I had found a clever way in 8a943311db, but
it is not race-free. While it works most of the time, any other
parallel process might call syncContainer() before the cleanup process
holds the lock until it removes it. As such, the wait hack to only update
the state and not sync the exit file did not work, so we can drop that.

However, the test wants to wait for the removal to happen by the cleanup
process. We can already say --condition=removing to do this, but this
will throw an error if the ctr was removed instead of counting this as
success, so fix that as well.

Fixes #23640

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-16 15:44:02 +02:00
Paul Holzinger ddece758a4
libpod: remove UpdateContainerStatus()
There are two major problems with UpdateContainerStatus().
First, it can deadlock when the state json is too big, as it tries to
read stderr until EOF but it will never hit EOF as long as the runtime
process is alive. This means if the runtime json is too big to fit into
the pipe buffer we deadlock ourselves.
Second, the function modifies the container state struct and even adds
an exit code to the db; however, when it is called from the stop() code
path we will be unlocked here.

While the first problem is easy to fix, the second one is not so much. And
when we cannot update the state there is no point in reading from the
runtime in the first place, so remove the function as it does more
harm than good.

Also add some warnings to the functions that might be called unlocked.

Fixes #22246

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-16 15:34:16 +02:00
openshift-merge-bot[bot] e8410b8395
Merge pull request #23645 from Luap99/mount-race
podman mount: fix storage/libpod ctr race
2024-08-16 13:24:24 +00:00
Paul Holzinger 7a7aec355b
podman mount: fix storage/libpod ctr race
When we create a container we first create it in the storage then in the
libpod db so there is a tiny window where it is seen as storage ctr but
then by the time we mount it we see it was a libpod container.

Fixes #23637

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-16 13:42:05 +02:00
openshift-merge-bot[bot] 8c132cc388
Merge pull request #23595 from edsantiago/parallel-safe-random-free-port
CI: system tests: make random_free_port() parallel-safe
2024-08-16 11:15:09 +00:00
openshift-merge-bot[bot] f69ede1138
Merge pull request #23636 from edsantiago/safename-252
CI: quadlet tests: make parallel-safe
2024-08-16 08:30:06 +00:00
openshift-merge-bot[bot] 951f774864
Merge pull request #23635 from crd477/patch-1
remove trailing comma in example
2024-08-15 20:15:51 +00:00
openshift-merge-bot[bot] 48c8994984
Merge pull request #23630 from Gchbg/dockerscript
Fix podman-docker.sh under -eu shells
2024-08-15 20:13:17 +00:00
openshift-merge-bot[bot] 85780ce114
Merge pull request #23632 from edsantiago/safename-610
CI: format test: make parallel-safe
2024-08-15 20:10:23 +00:00
Ed Santiago 480d43748a CI: quadlet tests: make parallel-safe
The usual, safename instead of hardcoded names or random_string.
And remove some rmi statements: we no longer clean up pause_image.

Been working great in #23275 all week.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-08-15 10:56:51 -06:00
Ed Santiago 420bd16a21 CI: system tests: make random_free_port() parallel-safe
...by using a crude port lock-and-reserve mechanism. This is
a small cherrypick from code that has been working in #23275
over dozens of CI runs. Am separating out into a small PR
because it's stable, harmless to serial runs, and will
simplify the eventual review of #23275.

Closes: #23488

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-08-15 10:04:51 -06:00
Chad Dougherty 478b262f9b
remove trailing comma in example
Signed-off-by: Chad Dougherty <crd@acm.org>
2024-08-15 11:21:27 -04:00
Ed Santiago 1a1d2646df CI: format test: make parallel-safe
Use safename instead of hardcoded object names. Requires moving
a test table down, into the function itself instead of global,
because the table needs to know object names.

Also: sneak in a workaround for dealing with quay flakes (in
image search). The local registry is allowing almost all tests
to pass even when quay is down, but this one test still needs
to hit quay.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-08-15 08:34:26 -06:00
Georgi Chulkov 004c040ca2 Fix podman-docker.sh under -eu shells (fixes #23628)
Signed-off-by: Georgi Chulkov <git@gch.bg>
2024-08-15 17:15:52 +03:00
openshift-merge-bot[bot] 734c4b98d4
Merge pull request #23519 from Luap99/netns-cleanup
update c/common to add some netns cleanup fixes
2024-08-15 12:39:22 +00:00
openshift-merge-bot[bot] b28290278b
Merge pull request #23601 from Luap99/wait
libpod: simplify WaitForExit()
2024-08-15 12:25:35 +00:00
Paul Holzinger 6fb10421fb
docs: update podman-wait man page
Waiting now actually makes sure to exit on first container exit. Also
note that it does not wait for --rm to have the container removed at
this point.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 13:32:41 +02:00
Paul Holzinger 94fd5fe6f7
libpod: remove duplicated HasVolume() check
removeVolume() already does the same check so we do not need it twice.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
Paul Holzinger a65aecd260
podman volume rm --force: fix ABBA deadlock
We cannot take the volume lock first and then the container locks. Other
code paths always have to lock the container first and then lock the
volumes, i.e. to mount/umount them. As such, locking the volume first can
always result in ABBA deadlocks.

To fix this, move the lock down after the container removal. The removal
code is racy regardless of the lock, as the volume lock on create is no
longer taken since commit 3cc9db8626 due to another deadlock there.

Fixes #23613

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
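
The lock-ordering rule behind a65aecd260 above can be shown with a minimal Go sketch. The `store` type and its methods are hypothetical stand-ins, not libpod types. The point is only that no code path holds the volume lock while waiting for the container lock.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in holding one container lock and one
// volume lock.
type store struct {
	ctrLock sync.Mutex
	volLock sync.Mutex
	mounts  int
	removed int
}

// mountVolume follows the established order: container lock, then volume lock.
func (s *store) mountVolume() {
	s.ctrLock.Lock()
	defer s.ctrLock.Unlock()
	s.volLock.Lock()
	defer s.volLock.Unlock()
	s.mounts++
}

// removeContainer takes and releases only the container lock.
func (s *store) removeContainer() {
	s.ctrLock.Lock()
	defer s.ctrLock.Unlock()
}

// forceRemoveVolume removes the dependent container first and takes the
// volume lock only afterwards, so it never waits for the container lock
// while holding the volume lock.
func (s *store) forceRemoveVolume() {
	s.removeContainer()
	s.volLock.Lock()
	defer s.volLock.Unlock()
	s.removed++
}

// run exercises both paths concurrently; with a consistent lock order it
// always terminates.
func run(s *store, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); s.mountVolume() }()
		go func() { defer wg.Done(); s.forceRemoveVolume() }()
	}
	wg.Wait()
}

func main() {
	s := &store{}
	run(s, 100)
	fmt.Println(s.mounts, s.removed)
}
```

Had forceRemoveVolume taken volLock before calling removeContainer, one goroutine could hold volLock and wait for ctrLock while another holds ctrLock and waits for volLock.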
Paul Holzinger b6beed9f76
test/system: fix network cleanup restart test
Now that on-failure exits right away the test is racy as the
RestartCount is not at the value we expect as the container is still
restarting in the background. As such add a timer based approach.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
Paul Holzinger 30eb6b6aae
libpod: do not stop pod on init ctr exit
Init containers are meant to exit early before other containers are
started. Thus stopping the infra container in such case is wrong.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00
Paul Holzinger 8a943311db
libpod: simplify WaitForExit()
The current code did several complicated state checks that simply do not
work properly on a fast restarting container. It used a special case for
--restart=always but forgot to take care of --restart=on-failure, which
always hung for 20s until it ran into the timeout.

The old logic also used to call CheckConmonRunning() but synced the
state before, which means it may check a new conmon every time and thus
miss exits.

To fix this, the new code is much simpler: check the conmon pid, and if
it is no longer running then check the exit file and get the exit code.

This is related to #23473 but I am not sure if this fixes it because we
cannot reproduce.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-08-15 11:07:27 +02:00