The recent Fedora kernel 6.11.4 has a problem with IPv6 networks [1].
This is not a podman bug at all but rather a kernel regression. I can
reproduce the issue easily by running this test.
Given that many users were hit by this, add the test to the distro-level
gating which runs in the Fedora openQA framework; that way we should
hopefully catch a bad kernel like this in the future and prevent it
from reaching stable.
[1] https://github.com/containers/podman/issues/24374
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Quadlet tests and some systemd tests leak unit files, as
reported by 'systemctl list-units --failed'. Clean them up.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The startup service is special because we have to transition from the
startup unit to the normal unit, and in order to do so we kill ourselves
(as we run as part of the service). This means we always exited 1, which
causes systemd to consider the unit failed and not remove the transient
unit unless "reset-failed" is called. As there is no process around to
call that, we cannot really do this; instead exit(0), which makes more
sense.
Of course we could try to reset-failed the unit later, but the code for
that seems more complicated than it is worth.
Add a new test from Ed that ensures we check for all healthcheck units,
not just the timer, to avoid leaks. I slightly modified it to provide a
better error on leaks.
Fixes: 0bbef4b830 ("libpod: rework shutdown handler flow")
Fixes: #24351
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
As an internal consistency check, the pasta tests check for duplicated
test cases by grepping a log file for a parsed test id. However, they
use grep -F for this, which performs a substring match, not an exact
match. Some tests generate an id which is a substring of another test's
id, so when test order is randomised this can cause a spurious failure.
In practice this can happen when running the tests in parallel with
very high concurrency (e.g. -j 100).
Fix this by adding the -x option to grep, which only accepts full-line
exact matches.
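To illustrate (with a made-up test id):
$ echo tcp_host_to_ns_1 | grep -F tcp_host_to_ns     # substring: matches
tcp_host_to_ns_1
$ echo tcp_host_to_ns_1 | grep -Fx tcp_host_to_ns    # full line: no match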
Fixes: https://github.com/containers/podman/issues/24342
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The additional image store feature assumes that images / layers
in the additional store never go away, but we do remove the store
after this test. Try to repair the store.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Historically, non-schema1 images had a deterministic image ID == config digest.
With zstd:chunked, we don't want to deduplicate layers pulled by consuming the
full tarball and layers partially pulled based on TOC, because we can't cheaply
ensure equivalence; so, image IDs for images where a TOC was used differ.
To accommodate that, compare images using their config digests, not their image IDs.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
When looking up the current-store image ID, do that
from the same output where we verify that the ID is from the
current store, instead of listing images twice.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The test got the stores' RW status backwards.
Before zstd:chunked, both image IDs should be the same, so this used
to make no difference.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
when the current soft limit is higher than the new value, ulimit fails
to set the hard limit (tested on Rawhide):
[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument
to avoid the problem, also set the soft limit:
[root@rawhide ~]# ulimit -n -H
12345678
[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument
[root@rawhide ~]# ulimit -n -SH 1048575
[root@rawhide ~]# ulimit -n -H
1048575
commit 71d5ee0e04 introduced the issue.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
As documented in the issue, there is no way to wait for system units
from the user session[1]. This causes problems for rootless quadlet
units as they might be started before the network is fully up. While
this was always the case and thus was never really noticed, the main
thing that triggered a bunch of errors was the switch to pasta.
Pasta requires the network to be fully up in order to correctly select
the right "template" interface based on the routes. If it cannot find a
suitable interface it just fails and we cannot start the container,
understandably leading to a lot of frustration from users.
As there is no sign of any movement on the systemd issue, we work
around it here by using our own user unit that checks whether the
system session's network-online.target is ready.
Now, testing this is a bit complicated. While we do correctly test both
the root and rootless generators since commit ada75c0bb8, the resulting
Wants=/After= lines differ between them, and there is no logic in the
test files themselves to match root- or rootless-specific output. One
idea was to use `assert-key-is-rootless/root`, but that seemed like
more duplication for little reason, so use a regex that allows both so
the check always passes. To still have some test coverage, add a check
in the system test that asks systemd whether we did indeed get the
right dependencies, where we can check for an exact root/rootless name
match.
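For example (unit names here are illustrative), the system test can
verify the dependency with something like:
systemctl --user show --property=Wants,After myquadlet.service \
    | grep -q podman-user-wait-network-online.service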
[1] https://github.com/systemd/systemd/issues/3312
Fixes #22197
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
...for debugging #24147, because "md5sum mismatch" is not
the best way to troubleshoot bytestream differences.
socat is run in the container, so this requires building a
new testimage (20241011). Bump to new CI VMs[1] which include it.
[1] https://github.com/containers/automation_images/pull/389
Signed-off-by: Ed Santiago <santiago@redhat.com>
The current mypod hack breaks down when running individual tests:
$ hack/bats 010   <-- barfs because it does not want pause-image!
Reason: Bats does not provide any official way to tell if tests
are being run in parallel.
Workaround: use an undocumented way.
Signed-off-by: Ed Santiago <santiago@redhat.com>
since the effect would be to lower the rlimits when their definition
is higher than the default value.
The test doesn't fail on the previous version, unless the system is
configured with a nofile ulimit higher than the default value.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2317721
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
In Debian, EST and MST7MDT are gone by default, having moved to a
special package[1]. Instead of also installing that package in the
images, let's use different timezones in the test.
[1] 42c0008f86
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This command sequence causes SizeRootFs to change on foo:
podman tag foo newimagename
podman save ... newimagename
podman load ...
Solution: get foo completely out of the picture. Use an
airgapped image: new image, new digest, new everything.
Fixes: #23756
Signed-off-by: Ed Santiago <santiago@redhat.com>
There's an important reason why the healthcheck container in the 055-rm
test uses 'sleep infinity' and not 'top'. Document it.
And, the test itself wasn't actually working as intended. Make
it safer by confirming that the container actually enters
the "stopping" state.
Signed-off-by: Ed Santiago <santiago@redhat.com>
the kernel checks that both the uid and the gid are mapped inside the
user namespace, not only the uid:
/**
 * privileged_wrt_inode_uidgid - Do capabilities in the namespace work over the inode?
 * @ns: The user namespace in question
 * @idmap: idmap of the mount @inode was found from
 * @inode: The inode in question
 *
 * Return true if the inode uid and gid are within the namespace.
 */
bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
                                 struct mnt_idmap *idmap,
                                 const struct inode *inode)
{
        return vfsuid_has_mapping(ns, i_uid_into_vfsuid(idmap, inode)) &&
               vfsgid_has_mapping(ns, i_gid_into_vfsgid(idmap, inode));
}
for this reason, improve the check for hasCurrentUserMapped to verify
that the gid is also mapped, and if it is not, use an intermediate
mount for the container rootfs.
Closes: https://github.com/containers/podman/issues/24159
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Potential race between starting socat (which creates a socket
file) and processes accessing said socket. Or maybe not. I
dunno, I'm grasping at straws. This is an elusive flake.
Fixes: #23798 (I hope)
Signed-off-by: Ed Santiago <santiago@redhat.com>
Although podman has moved on from CNI, RHEL has not. Make
sure that builds on RHEL test the desired network backend(s).
Effective immediately, gating.yaml on all RHEL branches
must set CI_DESIRED_NETWORK (=cni or =netavark)
Signed-off-by: Ed Santiago <santiago@redhat.com>
Change getUnitDirs to maintain a slice in addition to the map and return the slice
Add helper functions to make the code more readable
Adjust unit tests
Restore system test
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
Yield to reality: if $XDG_RUNTIME_DIR is unset, assume a
reasonable default (rootless only). This clears up a
common failure in Fedora gating tests, and will probably
prevent future time wasters.
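Presumably the conventional systemd location, i.e. something like:
XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR:-/run/user/$(id -u)}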
Signed-off-by: Ed Santiago <santiago@redhat.com>
Primary motivator: 'curl -v' format changes in f42
Drive-bys:
* 127.0.0.1, not localhost
* use wait_for_port, not sleep
* show curl commands and their output, to ease debugging failures
* better failure assertions
Signed-off-by: Ed Santiago <santiago@redhat.com>
These flags can affect the output of the HealthCheck log. Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via `podman inspect`.
It is also limited to the last five executions and the first 500 characters per execution.
This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic `healthcheck service failed` record.
- The `--health-log-destination` flag sets the destination of the HealthCheck log.
  - `none`: (default behavior) `HealthCheckResults` are stored in overlay containers. (For example: `$runroot/healthcheck.log`)
  - `directory`: creates a log file named `<container-ID>-healthcheck.log` with JSON `HealthCheckResults` in the specified directory.
  - `events_logger`: The log will be written using the logging mechanism set by events_logger. It also saves the log to a default directory, for performance on systems with a large number of logs.
- The `--health-max-log-count` flag sets the maximum number of attempts in the HealthCheck log file.
  - A value of `0` indicates an infinite number of attempts in the log file.
  - The default value is `5` attempts in the log file.
- The `--health-max-log-size` flag sets the maximum length of the log stored.
  - A value of `0` indicates an infinite log length.
  - The default value is `500` log characters.
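For example (image name and log path are illustrative):
podman run -d --name web \
    --health-cmd 'curl -fsS http://localhost/ || exit 1' \
    --health-log-destination /var/log/container-health \
    --health-max-log-count 10 \
    --health-max-log-size 1024 \
    my-web-image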
Add --health-max-log-count flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-max-log-size flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-log-destination flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
The various pasta port forwarding tests run a socat server inside a
container, then connect to it from a socat client on the host. Currently
we have the server bind to the same specific address within the container
as we connect to on the host.
That's not quite what we want. For "tap" tests where the traffic goes over
pasta's L2 link to the container it's fine, though unnecessary. For
"loopback" tests where traffic is forwarded by pasta at the L4 socket
level, however, it's not quite right. In this case the address used is
either 127.0.0.1 or ::. That's correct and as needed for the host side
address we're connecting to. However on the container side, this only
works because of an odd and arguably undesirable behaviour of pasta: we use
the fact that we have an L4 socket within the container to make such
"spliced" L4 connections appear as if they come from loopback within the
container. A container will generally expect its loopback address to be
only accessible from within the container, and this odd behaviour may be
changed in pasta in future.
In any case, the binding of the container side server is unnecessary, so
simply remove it.
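Illustratively (not the exact test code), the container-side server
changes from
# before: bound to the same address the host-side client connects to
socat -u TCP4-LISTEN:$port,bind=127.0.0.1 OPEN:$out,create
# after: no bind option needed; listen on any address
socat -u TCP4-LISTEN:$port OPEN:$out,create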
Link: https://github.com/containers/podman/issues/24045
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mostly just switch to safename. Rewrite setup() to guarantee
unique service file names, atomically created.
* IMPORTANT NOTE: enabling parallelization on these tests
triggers #24010 ("fragment file" flake), but only on my
f40 laptop. I have never seen the flake in Cirrus despite
many many runs in #23275. I am submitting this for review
and merging because even though _something_ is broken,
this breakage is unlikely to affect our CI.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Any test that uses --events-backend=file cannot be run in parallel
due to #23750. This seems to be a hard block, unfixable.
For all other tests, enable ci:parallel.
And, bring in timing fixes from #23600. Thanks, @Honny1!
Signed-off-by: Ed Santiago <santiago@redhat.com>
The format test flakes when quay is down, because we've
been doing 'podman search $IMAGE', which is a quay image.
Solution: check if local registry is running, and use it.
We don't need a real image.
Signed-off-by: Ed Santiago <santiago@redhat.com>
(where possible. Not all tests are parallelizable).
And, refactor two complicated tests into one. This one
is hard to review, sorry.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The netns dir has special logic to bind mount itself and make itself
shared. This code here didn't, which led to a catastrophic bug during
netns unmounting: we were unable to unmount the netns as the mount got
duplicated and had the wrong parent mount. This caused us to loop
forever trying to remove the file.
Fixes https://issues.redhat.com/browse/RHEL-59620
Fixes #23685
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This removes the need for a tricky/fragile namespace workaround.
Huge thanks to Paul for discovering documentation on the
Registry container, and how to override config.yml settings:
https://distribution.github.io/distribution/about/configuration/#override-specific-configuration-options
Drive-by: consistentize quotes in -eVAR="value". Minor, but
makes them all easier to read with emacs/vi syntax highlighting.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The "rm on stopping containers" test is flaking under high load,
probably because I bumped up two timeouts in the healthcheck
container that it relies on. Bump up this test's timeout as well.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...not just when running parallel Bats, because Bats
does not provide any way to know if we're parallel.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...of high system load (such as when running parallel tests).
Allow time for services to reach desired state, by retrying
a few times in a loop.
Signed-off-by: Ed Santiago <santiago@redhat.com>
This fixes the problem where, even when running rootless, we checked
the netns files from root. In order to catch any rootless bugs we must
check the rootless files in $XDG_RUNTIME_DIR/netns.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This test is currently disabled due to several issues, only some of which
are described in the existing comments. Add some more details to clarify
the situation.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This name for the tests is misleading, since in the default configuration
podman will already configure a forwarding address, which could forward
to either another local forwarder or an external nameserver on the host
side. What this test is really about is explicitly configuring the pasta
DNS forwarding address. Rename accordingly.
The IPv4 version of the test doesn't use the podman --dns option, only
the pasta --dns-forward option. This exercises the podman behaviour that
pasta --dns-forward options are added to /etc/resolv.conf automatically.
However there could also be other things in /etc/resolv.conf, so the
nslookup might not use the custom forwarding address for the lookup.
To fix that, split the test into two parts: one verifying that the custom
address is in /etc/resolv.conf and another performing the nslookup with an
explicit server address to make sure we exercise the pasta side as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In both the "Basic nameserver lookup" and "Local forwarder, IPv4" pasta
tests, we check whether DNS resolution is working by running "nslookup
127.0.0.1" in the container and checking if 1.0.0.127.in-addr.arpa is in
the output.
1.0.0.127.in-addr.arpa isn't the expected result of the resolution though,
it's just the DNS name that nslookup translates 127.0.0.1 into. The
test mostly works, because nslookup echoes that on successful lookups.
However, it could also echo it in certain sorts of failure, so it's not a
very reliable test.
Furthermore, resolving 127.0.0.1 from a nameserver is a rather strange
thing to do. It's done that way because RFC1912[0] suggests it should
always resolve, even for nameservers on a disconnected network. But, this
doesn't really appear to be true in practice: a number of resolvers return
NXDOMAIN. The test still works by accident, because nslookup seems to
echo the name above as part of the error message.
Change to instead looking up one of the root servers by name. This does
now rely on access to the global DNS during tests, but other podman tests
attempt to resolve google.com, so that should be ok. One of the root
servers is about as close to universal resolvability as it's possible to
get.
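That is, roughly:
# old: relies on nslookup echoing the reverse name, even on failure
nslookup 127.0.0.1
# new: look up one of the root servers by name
nslookup a.root-servers.net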
[0] https://datatracker.ietf.org/doc/html/rfc1912#section-4.1
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The idea behind the "External resolver" tests is simply to check that we
can contact a nameserver, regardless of its configuration. To this end
the "IPv4" version looks up 127.0.0.1 which RFC1912[0] suggests should
always be resolvable.
The IPv6 version instead looks up [::1]. While it makes sense for
that to be resolvable in a similar way, there appear to be quite a few
nameservers which do not resolve it, making this test flaky.
Furthermore the idea behind resolving [::1] is that it should make
nslookup prefer to resolve over IPv6. That appears to be very
unreliable at best. Since making a different query doesn't actually
exercise anything different in pasta, drop the test.
The remaining IPv4 test isn't really specific to an "external" resolver,
it's simply checking that we can contact some sort of resolver with the
default podman configuration. Rename accordingly, and run it regardless of
IPv4 connectivity on the host: we can still query a nameserver about an
IPv4 address, even if we only have IPv6 connectivity ourselves.
[0] https://datatracker.ietf.org/doc/html/rfc1912#section-4.1
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The "Local forwarder, IPv4" pasta test, amongst other things, checks that
podman's default DNS forwarding address - 169.254.0.1 - appears in the
container's /etc/resolv.conf. That's not really related to anything else
going on in that test (which is about _changing_ that default address).
So, move it into its own test case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
...or at least as much as possible. Some tests cannot
be run in parallel due to #23750: "--events-backend=file"
does not actually work the way a naïve user would intuit.
Stop/die events are asynchronous, and can be gathered
by *ANY OTHER* podman process running after it, and if
that process has the default events-backend=journal,
that's where the event will be logged. See #23987 for
further discussion.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Need --layers=false in podman build, otherwise a buildah race
can trigger "layer not known" failures:
https://github.com/containers/buildah/issues/5674
Signed-off-by: Ed Santiago <santiago@redhat.com>
When running parallel, multiple tests could be trying to start
the registry at once. Make this parallel-safe.
Also, use a safer port range for the registry. Something
outside of /proc/sys/net/ipv4/ip_local_port_range
Sorry, I'm including a FIXME section that I haven't investigated
deeply enough.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add a few best-practices examples, and add a whole section
describing the dos and don'ts of writing parallel-safe tests.
Signed-off-by: Ed Santiago <santiago@redhat.com>
For tests run in parallel, show file number as |nnn| (vs [nnn])
Teach logformatter to distinguish the two, adding 'p' to anchors
in parallel tests. Necessary because in this scheme we run bats
twice, thus see 'ok 1' twice, and we want to differentiate them.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Workaround for #23292, where simultaneous 'pod create' commands
will all start a podman-build of the pause image, but only
one of them will be tagged, and the others will leak <none>
images.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The issue is closed, and I recently fixed a number of races (bf74797c69)
in the remote attach API that sound exactly like the error that was
mentioned in issue #9597.
As such I think this works now; if it starts flaking again we can
revert this or, better, fix the actual bug.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
For the past two months we've been splitting system tests
into two categories: those that CAN be run in parallel,
and those that CANNOT. Much work has been done to replace
hardcoded names (mycontainer, mypod) with safename().
Hundreds of test runs, in CI and on Ed's laptop, have
proven this approach viable.
make {local,remote}system now runs in two steps: first
the serial ones, then the parallel ones. hack/bats will
now recognize the 'ci:parallel' tag and add --jobs (nprocs).
This requires some tweaking of leak_check, because there
can be umpteen tests running (affecting image/container/pod/etc
state) when any given test completes.
Rules for enabling parallelization in tests:
* use unique container/pod/volume/network names (safename)
* do not run 'podman rm -a' or 'rmi -a'
* never use the -l (--latest) option
* do not run 'podman ps/images' and expect precise output
Signed-off-by: Ed Santiago <santiago@redhat.com>
...and remove one old skip() for older debian, but leave
two others in place and mark that they're still a problem.
Signed-off-by: Ed Santiago <santiago@redhat.com>
convert the owner UID and GID into the user namespace only when
":idmap" mount is used.
This changes the behaviour of :idmap with an empty volume. Now the
existing directory ownership is copied up as in the other case.
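A rough illustration (rootful, with an illustrative mapping):
# with :idmap, the volume's ownership is mapped through the container's
# user namespace, so the uid/gid shown below are container-relative
podman run --rm --uidmap 0:100000:65536 --gidmap 0:100000:65536 \
    -v myvol:/data:idmap alpine ls -ln /data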
Closes: https://github.com/containers/podman/issues/23347
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Use safename. Add ci:parallel tags. Use a random port, not
hardcoded 9999. Do not remove pause image. And especially
do not "rm -a" anything.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...because it requires 100% control and knowledge of the
state of all images, containers, and volumes.
Use safename anyway, just in case we ever have a leak from here.
I'm finding safename sooooooo helpful when reading journal.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add ci:parallel tags; move one non-parallel-safe test to
another networking-test file; and a few drive-by fixes
Signed-off-by: Ed Santiago <santiago@redhat.com>
Use safename for containers and pods. Add ci:parallel tags.
And reenable distro-integration tests that had been skipped
due to a container-selinux bug that is now fixed.
Signed-off-by: Ed Santiago <santiago@redhat.com>
pasta added a new --map-guest-addr option that maps an address to the
actual host IP. This is exactly what we need for the
host.containers.internal entry. So we now make use of this option by
default, but we still have to keep the exclude fallback because the
option is very new and some users/distros will not have it yet.
This also fixes an issue where the --dns-forward IPs were not used when
using the bridge network mode. This is only relevant when not using
aardvark-dns, as aardvark-dns already used the proper IPs from the
rootless netns resolv.conf file.
Fixes #19213
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When you sort by repository, a user most likely also wants the tags to
be sorted. At the very least this gives stable output, as the order
could otherwise change with each podman tag/pull, even when the same
tag name is kept.
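Concretely:
# tags within each repository now come back sorted as well
podman images --sort repository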
Fixes #23803
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The kube generate command can now generate a yaml for
the Job kind and the kube play command can create a pod
and containers with podman when passed a Job yaml.
Add relevant tests and docs for this.
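For example (names are illustrative):
# generate a Job yaml from an existing container, then play it back
podman kube generate --type job -f myjob.yaml myctr
podman kube play myjob.yaml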
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Use safename, and add ci:parallel tags to all tests. (One
test was running "podman wait -l", which cannot work in
parallel. I chose to change it to "wait $cname", and
lose the -l testing)
Signed-off-by: Ed Santiago <santiago@redhat.com>
Where possible, use safename and add ci:parallel tags.
One test runs "podman kill -a", which would be unwise to run
in parallel with other tests.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add 'ci:parallel' tags to a few easy places. And, two
small easily-reviewed safename or random-port additions.
These have been working fine in #23275. I want to stop
carrying them there so I can work on simplifying my PR.
Signed-off-by: Ed Santiago <santiago@redhat.com>
- replace random_string with safename in container/network names
- add ci:parallel tags where possible.
- where not possible, add explanations
- fix a userns leak
Signed-off-by: Ed Santiago <santiago@redhat.com>
Workaround (NOT A FIX) for pasta issue #23482, wherein
podman logs includes a waitpid: ESRCH warning. Consensus
seems to be that this is a bug in socat.
Signed-off-by: Ed Santiago <santiago@redhat.com>
- fix a few missing safenames
- eliminate 'container rm -a'
- when running ps, do substring match, not exact
- where possible, add ci:parallel tags
- when not possible, explain
Also, fix a completely broken inspect test
Signed-off-by: Ed Santiago <santiago@redhat.com>
This started off as an attempt to make `podman stop` on a
container started with `--rm` actually remove the container,
instead of just cleaning it up and waiting for the cleanup
process to finish the removal.
In the process, I realized that `podman run --rmi` was rather
broken. It was only done as part of the Podman CLI, not the
cleanup process (meaning it only worked with attached containers)
and the way it was wired meant that I was fairly confident that
it wouldn't work if I did a `podman stop` on an attached
container run with `--rmi`. I rewired it to use the same
mechanism that `podman run --rm` uses, so it should be a lot more
durable now, and I also wired it into `podman inspect` so you can
tell that a container will remove its image.
Tests have been added for the changes to `podman run --rmi`. No
tests for `stop` on a `run --rm` container as that would be racy.
Fixes #22852
Fixes RHEL-39513
Signed-off-by: Matt Heon <mheon@redhat.com>
By default, wait only waits for the exit of a container; there is really
no way to make it also wait for the removal when the container was
created with --rm. I thought I found a clever way in 8a943311db but it
is not race free. While it works most of the time, any other parallel
process might call syncContainer() before the cleanup process gets the
lock and removes the container. As such, the wait hack of only updating
the state and not syncing the exit file did not work, so we can drop it.
However, the test wants to wait for the removal to happen by the cleanup
process, and we can already pass --condition=removing to do this. But
that throws an error if the ctr was already removed instead of counting
it as success, so fix that as well.
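So the test can now do something like:
# succeeds both while the container is being removed and once it is gone
podman wait --condition=removing $cname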
Fixes #23640
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The usual, safename instead of hardcoded names or random_string.
And remove some rmi statements: we no longer clean up pause_image.
Been working great in #23275 all week.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...by using a crude port lock-and-reserve mechanism. This is
a small cherrypick from code that has been working in #23275
over dozens of CI runs. Am separating out into a small PR
because it's stable, harmless to serial runs, and will
simplify the eventual review of #23275.
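The mechanism is roughly (a sketch, not the exact helper code):
while :; do
    port=$(shuf -i 20000-30000 -n 1)            # candidate port
    lock="$BATS_SUITE_TMPDIR/reserved-port.$port"
    # atomic create-if-absent: whoever creates the file owns the port
    if (set -o noclobber; echo $$ > "$lock") 2>/dev/null; then
        break
    fi
done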
Closes: #23488
Signed-off-by: Ed Santiago <santiago@redhat.com>
Use safename instead of hardcoded object names. Requires moving
a test table down, into the function itself instead of global,
because the table needs to know object names.
Also: sneak in a workaround for dealing with quay flakes (in
image search). The local registry is allowing almost all tests
to pass even when quay is down, but this one test still needs
to hit quay.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Now that on-failure exits right away, the test is racy: the
RestartCount is not yet at the value we expect while the container is
still restarting in the background. As such, add a timer-based approach.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The current code did several complicated state checks that simply do not
work properly on a fast-restarting container. It used a special case for
--restart=always but forgot to take care of --restart=on-failure, which
always hung for 20s until it ran into the timeout.
The old logic also used to call CheckConmonRunning() but synced the
state first, which means it may have checked a new conmon every time and
thus missed exits.
The new code is much simpler: check the conmon pid; if it is no longer
running, check the exit file and get the exit code.
This is related to #23473 but I am not sure if this fixes it because we
cannot reproduce the issue.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When will I learn not to dismiss something as "easy"?
Anyhow, this doesn't actually change anything parallel-wise
but it does reduce a race condition seen on heavily-loaded
slow systems, wherein a container goes into unhealthy before
we want it to. This version isn't perfect; I don't think
there's an ideal fix for this.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Only one test can be parallelized. Do so, and add a comment
to the other one explaining why it can't be.
Also, add some missing error-message checks.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Very few changes needed, all of them simple.
It is impossible to parallelize this entire file, because "stop -a".
Add tags to tests that can be parallelized, and comments to those
that can't.
Signed-off-by: Ed Santiago <santiago@redhat.com>
When the cidfile does not exist and --ignore is set, the CLI parser
skips the file without error and we call into the backend code without
any names at all. This should logically be a NOP, but on remote it
caused all containers to be returned, which made podman stop stop
everything in this case.
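That is, this must be a NOP, not "stop all containers":
podman --remote stop --ignore --cidfile /file/does/not/exist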
Fixes #23554
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Do not rely on an arbitrary delay in order to ensure the port was bound
in the container. Instead this approach checks if the port is bound in
the netns and only then starts the client. This speeds up the entire
test file by 50% but more importantly in parallel testing it solves
hangs as the timeout there was unreliable.
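A sketch of the idea (the real helper may differ):
# wait for the container to actually bind the port, rather than sleeping
port_hex=$(printf '%04X' $port)
for _ in $(seq 1 50); do
    podman exec $cname grep -q ":$port_hex " /proc/net/tcp && break
    sleep 0.1
done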
Fixes #23471
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add support for the ServiceName key for all unit types
Extend the PodInfo struct into UnitInfo to consolidate all prepopulated data into a single map
Use the NodesInfo map instead of the resourceName
Update the UnitInfo in the convert function instead of returning it
No need to replace the extension anymore, just remove it
All e2e tests with dependencies on other Quadlet files moved to a separate section
Add the capability of overriding the service name in the test
Add e2e tests for the new functionality
Adjust integration tests
Update the MAN page
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
Use safename for containers, volumes, images.
Build a temporary scratch image for podman image mount, so
we can safely mount/umount it (instead of $IMAGE) without
risk of other parallel tests umounting it.
Fixed some oopsies ("$vol1" is empty string, so, NOP test)
And... an experiment. I'm leaving in my 'ci:parallel' tags
and notes, so I don't have to carry them in #23275. This
is harmless, basically just noisy comments. The drawback
is, if for some reason #23275 does not pan out, I'll have
to go back and remove those tags. Right now I'm feeling
pretty comfortable about this parallelization approach tho.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Use safename instead of hardcoded "test"
Start registry once, in setup_file(), instead of requiring
individual tests to do so.
Add explicit --authfile arg to a bunch of places that now need it
Minor cleanup and improvements in test descriptions. I may have
gotten a little carried away here, but if this test ever fails
these additions will make someone's life much easier.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Strictly speaking we don't need the path yet, but having it exist
prevents a lot of strangeness in our path-checking logic that validates
the current Podman configuration, as it was the only path that might
not exist this early in init.
Fixes #23515
Signed-off-by: Matt Heon <mheon@redhat.com>
if idmap is specified for a volume, reverse the mappings when copying
up from the container, so that the original permissions are maintained.
Closes: https://github.com/containers/podman/issues/23467
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
BATS teardown logs are unreadable, making it almost impossible
to see tiny "Leaked this-or-that" messages.
Solution: new _run_podman_quiet() helper, replaces run_podman
in a small number of cases within teardown. Clunky, and
duplicative, sorry.
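Conceptually (a sketch, not the exact helper):
function _run_podman_quiet() {
    # like run_podman, but discard output; still fail loudly on error
    local output
    output=$($PODMAN "$@" 2>&1) || die "podman $* failed: $output"
}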
New helper for leak_check, basically spits out warnings (and
bumps error count) if it sees any output whatsoever from
individual "podman XXX ls" commands.
Signed-off-by: Ed Santiago <santiago@redhat.com>
We bind ports to ensure there are no conflicts, and we leak them into
conmon to keep them open. However, we bound the ports after the network
was set up, so it was possible for a second network setup to overwrite
the firewall configs of a previous container, as the failure only came
later when binding the port. As such we must ensure we bind before the
network is set up.
This is not so simple because we still have to take care of the
PostConfigureNetNS bool, in which case the network setup happens after
we launch conmon. Thus we end up with two different conditions.
Also, it is possible that we "leak" the ports that are set on the
container until the garbage collector closes them. This is not perfect,
but the alternative is adding special error handling on each function
exit after prepare until we start conmon, which is a lot of work to do
correctly.
Fixes https://issues.redhat.com/browse/RHEL-50746
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
I broke the kube external storage test in the course of my
safename PR: _write_test_yaml() with no command generated
a pod that did not trigger the conditions required for
this test.
Solution: run a container (top). Add new checks to prevent
this gap from happening again.
Signed-off-by: Ed Santiago <santiago@redhat.com>
These test steps check the automount feature with multiple images for
the following items:
1. Multiple images can be automounted with a yaml file.
2. If the same path exists in several images, the last one should
   trump.
3. The volume is mounted read-only in the container.
4. The volumes are only mounted in the specific container, not in the
   whole pod.
Signed-off-by: Yiqiao Pu <ypu@redhat.com>
validate that a "podman generate" and "podman play" cycle restores the
specified user namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The tests didn't actually check anything, because default_ifname
requires an IP version argument to work; thus pasta_iface was empty.
Add new checks to prevent this kind of error again.
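I.e.:
pasta_iface=$(default_ifname)      # wrong: empty, no IP version given
pasta_iface=$(default_ifname 4)    # correct: interface of the IPv4 default route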
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The test assumes that if there is more than one IP on the host, we
should be able to set host.containers.internal. This, however, is not
how the logic works in the code. What it actually does is check all IPs
in the rootless-netns; it knows it cannot use any of those IPs, and
that includes any podman bridge IPs.
You can reproduce the error when you have only one IPv4 address on the
host: run a container as root in the background, then run the test:
hack/bats --rootless 505:host.containers.internal
So the failure here was that a podman container was already running as
root on the default bridge, thus the test saw 2 IPs; but the rootless
run also uses the same subnet for its bridge, and the code knew that IP
would not work either. I could have added another special condition to
the test, but the better way to work around it is to create a new
network. A new network makes sure there are no conflicting subnets
assigned, so the test will pass.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>