The netns dir has a special logic to bind mout itself and make itslef
shared. This code here didn't which lead to catastrophic bug during
netns unmounting as we were unable to unmount the netns as the mount got
duplicated and had the wrong parent mount. This caused us to loop forever
trying to remove the file.
Fixes https://issues.redhat.com/browse/RHEL-59620Fixes#23685
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This removes the need for a tricky/fragile namespace workaround.
Huge thanks to Paul for discovering documentation on the
Registry container, and how to override config.yml settings:
https://distribution.github.io/distribution/about/configuration/#override-specific-configuration-options
Drive-by: consistentize quotes in -eVAR="value". Minor, but
makes them all easier to read with emacs/vi syntax highlighting.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The "rm on stopping containers" test is flaking under high load,
probably because I bumped up two timeouts in the healthcheck
container that it relies on. Bump up this test's timeout as well.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...not just when running parallel Bats, because Bats
does not provide any way to know if we're parallel.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...of high system load (such as when running parallel tests).
Allow time for services to reach desired state, by retrying
a few times in a loop.
Signed-off-by: Ed Santiago <santiago@redhat.com>
There is no reason to disallow exposed sctp ports at all. As root we can
publish them find and as rootless it should error later anyway.
And for the case mentioned in the issue it doesn't make sense as the
port is not even published thus it is just part of the metadata which is
totally in all cases.
Fixes#23911
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Like we do in system tests now check for netns leaks in e2e as well. Now
because things run in parallel and this dir is shared we cannot test
after each test only once per suite. This will be a PITA to debug if
leaks happen as the netns files do not contain the container ID and are
just random bytes (maybe we should change this?)
Fixes#23715
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This fixes the problem where even as root we check the netns files from
root. But in order to catch any rootless bugs we must check the rootless
files from $XDG_RUNTIME_DIR/netns.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
...or at least as much as possible. Some tests cannot
be run in parallel due to #23750: "--events-backend=file"
does not actually work the way a naïve user would intuit.
Stop/die events are asynchronous, and can be gathered
by *ANY OTHER* podman process running after it, and if
that process has the default events-backend=journal,
that's where the event will be logged. See #23987 for
further discussion.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Need --layers=false in podman build, otherwise a buildah race
can trigger "layer not known" failures:
https://github.com/containers/buildah/issues/5674
Signed-off-by: Ed Santiago <santiago@redhat.com>
When running parallel, multiple tests could be trying to start
the registry at once. Make this parallel-safe.
Also, use a safer port range for the registry. Something
outside of /proc/sys/net/ipv4/ip_local_port_range
Sorry, I'm including a FIXME section that I haven't investigated
deeply enough.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add a few best-practices examples, and add a whole section
describing the dos and donts of writing parallel-safe tests.
Signed-off-by: Ed Santiago <santiago@redhat.com>
For tests run in parallel, show file number as |nnn| (vs [nnn])
Teach logformatter to distinguish the two, adding 'p' to anchors
in parallel tests. Necessary because in this scheme we run bats
twice, thus see 'ok 1' twice, and we want to differentiate them.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Workaround for #23292, where simultaneous 'pod create' commands
will all start a podman-build of the pause image, but only
one of them will be tagged, and the others will leak <none>
images.
Signed-off-by: Ed Santiago <santiago@redhat.com>
* podman manifest remove doesn't accept references as descriptions of
what to remove from a list or index; only use digests in the man page
* podman manifest remove only removes one thing at a time; correct the
man page examples
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
The issue is closed and I recently fixed a number of races (bf74797c69)
in the remote attach API that sound like exactly like the same error
that was mentioned in issue #9597.
As such I think this works, if it start flaking again we can revert this
or better fix the actual bug.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
As it turns on things are not so simple after all...
In podman-py it was reported[1] that waiting might hang, per our docs wait
on multiple conditions should exit once the first one is hit and not all
of them. However because the new wait logic never checked if the context
was cancelled the goroutine kept running until conmon exited and because
we used a waitgroup to wait for all of them to finish it blocked until
that happened.
First we can remove the waitgroup as we only need to wait for one of
them anyway via the channel. While this alone fixes the hang it would
still leak the other goroutine. As there is no way to cancel a goroutine
all the code must check for a cancelled context in the wait loop to no
leak.
Fixes 8a943311db ("libpod: simplify WaitForExit()")
[1] https://github.com/containers/podman-py/issues/425
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We were only splitting on tabs, not spaces, so we returned just a
single line most of the time, not an array of the fields in the
output of `ps`. Unfortunately, some of these fields are allowed
to contain spaces themselves, which makes things complicated, but
we got lucky in that Docker took the simplest possible solution
and just assumed that only one field would contain spaces and it
would always be the last one, which is easy enough to duplicate
on our end.
Fixes#23981
Signed-off-by: Matt Heon <mheon@redhat.com>
For the past two months we've been splitting system tests
into two categories: those that CAN be run in parallel,
and those that CANNOT. Much work has been done to replace
hardcoded names (mycontainer, mypod) with safename().
Hundreds of test runs, in CI and on Ed's laptop, have
proven this approach viable.
make {local,remote}system now runs in two steps: first
the serial ones, then the parallel ones. hack/bats will
now recognize the 'ci:parallel' tag and add --jobs (nprocs).
This requires some tweaking of leak_check, because there
can be umpteen tests running (affecting image/container/pod/etc
state) when any given test completes.
Rules for enabling parallelization in tests:
* use unique container/pod/volume/network names (safename)
* do not run 'podman rm -a' or 'rmi -a'
* never use the -l (--latest) option
* do not run 'podman ps/images' and expect precise output
Signed-off-by: Ed Santiago <santiago@redhat.com>