This way has a huge disadvantage: the user will not see an error when
they use a non-existent option. Another disadvantage is that if we add
more options within podman, they might collide with the names chosen by
plugins. Such issues might be hard to debug.
The advantage is that the usage is very nice:
--network bridge:opt1=val1,opt2=val2.
Alternatively, we could put this behind `opt=`, which is harder to use,
but would solve all issues above:
--network bridge:opt=opt1=val1,opt=opt2=val2
Signed-off-by: Michael Zimmermann <sigmaepsilon92@gmail.com>
Final cleanup. Has been working fine in #23257 for weeks.
Not much gain here, but every little bit helps.
Signed-off-by: Ed Santiago <santiago@redhat.com>
All the backend work was done a while back for image volumes, so
this is effectively just plumbing the option in for volumes in
the parser logic. We do need to change the return type of the
volume parser as it only worked on spec.Mount before (which does
not have subpath support, so we'd have to pass it as an option
and parse it again) but that is cleaner than the alternative.
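A rough usage sketch (the option spelling and image name here are
illustrative assumptions, not quoted from this change):
podman run --rm --mount type=volume,source=mydata,destination=/data,subpath=reports quay.io/example/app ls /data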
Fixes #20661
Signed-off-by: Matt Heon <mheon@redhat.com>
First, creating a global file /etc/system-fips was never a good idea for
testing as it affects other running tests at the same time.
And as of a recent change to FIPS mounts[1] we no longer use the file so
the test breaks with c/common v0.61. Instead it uses the kernel file
/proc/sys/crypto/fips_enabled which requires the real fips mode to be
activated and that in turn requires a reboot. As such this is not
something that can be tested in upstream CI like that.
[1] https://github.com/containers/common/pull/2174
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Previous version was badly broken: it relied on 'make'
rebuilding a file under cwd, which is a no-no; and, in
the case where we don't have a source directory, just
blindly hoped that there'd be a system-installed .service
file with the correct path to podman.
Solution:
. if running in source directory, run sed directly into
destination service file in $UNIT_DIR. This is ugly
duplication of a line in Makefile.
. if NOT running in a source directory, check $PODMAN:
. if it's /usr/bin/podman, continue. Include a warning
that will be shown only on test failure.
. otherwise skip, because we don't know what we're testing
Signed-off-by: Ed Santiago <santiago@redhat.com>
* treadmill script: handle an obscure corner case
wherein the script would bail because it thought
there were no buildah-vendor changes.
* two new test skips
* update the diffs; line-number changes due to buildah
PRs touching helpers.bash
Signed-off-by: Ed Santiago <santiago@redhat.com>
- fix issues found by recvcheck
- exclude k8s files from recvcheck
- remove two dropped linters, gomnd and execinquery
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Up to now this test has been run using:
PODMAN_TIMEOUT=2 run_podman kube play ...
...and this gives podman time to start the pod before getting
the signal.
When run in parallel, under heavy load, the above command seems
to time out before podman has gotten its act together. Weird
things happen, like weird exit status and (most crucially)
zombie containers.
Solution: wait for container to actually start before we kill it.
Signed-off-by: Ed Santiago <santiago@redhat.com>
These tests verify that podman successfully adds (or
fails to add) a connection to an SSH server based on
the entries in the `~/.ssh/known_hosts` file.
In particular `system connection add` should succeed if:
- there is no `known_hosts` file
- `known_hosts` has an entry that matches the first protocol/key returned
by the SSH server
- `known_hosts` has an entry for another SSH server, not for the target server
It should fail if the `known_hosts` file has an entry for
the target server that matches the protocol but not the key.
Depends on containers/common#2212
Fixes #23575
Signed-off-by: Mario Loriedo <mario.loriedo@gmail.com>
Regression test for #23550. Setting the TZDIR env var should make no
difference for the `local` timezone, as this is not a real timezone name
that is resolved from that directory.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add support for inspecting Mounts which include SubPaths.
Handle SubPaths for kubernetes image volumes.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This commit resolves an issue where network creation and removal events were not being logged in `podman events`. A new function has been introduced in the `events` package to ensure consistent logging of network lifecycle events. This update will allow users to track network operations more effectively through the event log, improving visibility and aiding in debugging network-related issues.
Fixes: #24032
Signed-off-by: Sainath Sativar <Sativar.sainath@gmail.com>
By default today, the container is always started if its pod is also
started. This prevents creating custom setups with systemd where
containers in a pod could be started through their `[Install]` section.
We add a key `StartWithPod=`, enabled by default, that allows one to
disable that behavior.
Disabling it prevents the pod service from changing the state of the
container service.
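A minimal sketch of the intended use (file, pod, and image names are
made up for illustration):
# app.container
[Container]
Image=quay.io/example/app:latest
Pod=mypod.pod
StartWithPod=false
[Install]
WantedBy=default.target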
Fixes #24401
Signed-off-by: Farya L. Maerten <me@ltow.me>
API clients expect the status code quickly, otherwise they can time out.
If we do not flush, we may not write the header immediately but only
when further logs are sent.
Fixes #23712
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
One of the problems with the Events() API was that you had to call it in
a new goroutine. This meant that the error returned by it had to be read
back via a second channel. This caused other bugs in the past, but here
the biggest problem is that basic errors such as invalid since/until
options were not directly returned to the caller.
It meant that in the API we were not able to write http code 200 quickly,
because we always waited for the first event or error from the
channels. This in turn made some clients unhappy, as they assume the
server hangs and time out if no such events are generated.
To fix this we restructure the entire event flow. First, we spawn the
goroutine inside the eventer Read() function so that not all the callers
have to. Then we can return the basic error quickly, without the
goroutine. The caller then checks the error like any normal function,
and the API can use it to decide which status code to return.
Second, we now return errors/events on one channel; the callers can then
decide to ignore or log them, which makes things a bit clearer.
Fixes c46884aa93 ("podman events: check for an error after we finish reading events")
Fixes #23712
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We never want a toolchain directive, as the default is to use the same
version as the go directive. So the only purpose of toolchain is to
force a newer compiler than necessary, which we do not want: we are
being built by many different distributions, and blocking builds that
would otherwise work fine is just not helpful to anyone.
Also update the go.mod comments to remind people that there should be no
toolchain. The make vendor target will now guarantee this, so the CI
will fail otherwise.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Debug for #23913: I thought, if we have no idea which process is nuking
the volume, then we need to figure this out. As there is no reproducer
we can (ab)use the cleanup tracer: simply trace all unlink syscalls to
see which process deletes our special named volume. Given that the
volume name is used as a path on the fs and is deleted on volume rm, we
should hopefully know exactly which process deleted it next time.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
In preparation for maybe some day being able to run build tests
in parallel.
SUPER IMPORTANT NOTE! BUILD TESTS CANNOT BE PARALLELIZED YET!
buildah, when run in parallel, barfs with:
race: parallel builds: copying...committing...creating... layer not known
Until this is fixed, podman-build can never be run in parallel.
See https://github.com/containers/buildah/issues/5674
This PR is simply cleaning things up so, if/when that day comes,
the ensuing parallelize PR will be short & sweet.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The recent fedora kernel 6.11.4 has a problem with ipv6 networks [1].
This is not a podman bug at all but rather a kernel regression. I can
reproduce the issue easily by running this test.
Given that many users were hit by this, add it to the distro-level
gating which runs in the fedora openQA framework, so that we hopefully
catch a bad kernel like this in the future and prevent it from going
into stable.
[1] https://github.com/containers/podman/issues/24374
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Quadlet tests and some systemd tests leak unit files, as
reported by 'systemctl list-units --failed'. Clean them up.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The startup service is special because we have to transition from
startup to the normal unit. And in order to do so we kill ourselves (as
we are run as part of the service). This means we always exited 1,
which causes systemd to consider us failed and not remove the transient
unit unless "reset-failed" is called. As there is no process around to
do that, we cannot really do this, thus make us exit(0), which makes
more sense.
Of course we could try to reset-failed the unit later, but the code for
that seems more complicated than it is worth.
Add a new test from Ed that ensures we check for all healthcheck units
not just the timer to avoid leaks. I slightly modified it to provide a
better error on leaks.
Fixes: 0bbef4b830 ("libpod: rework shutdown handler flow")
Fixes: #24351
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Clarify, expand, fix a typo. These are the instructions
shown when the **patching** step fails, typically when
buildah's helpers.bash is changed in a way that conflicts
with our make-it-work-in-podman patches.
Signed-off-by: Ed Santiago <santiago@redhat.com>
This fixes two problems. First, if a port is published and exposed it
should not be shown twice; it is enough to show the published one.
Second, if there is a huge range, the ports were not grouped, making the
output basically unreadable. Now we group exposed ports like we do with
the normal published ports.
Fixes #23317
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
As an internal consistency check, the pasta tests check for duplicated test
cases by grepping a log file for a parsed test id. However it uses
grep -F for the purpose which will not perform an exact match, but a
substring match. There are some tests which generate an id which is a
substring of the id for other tests, so when test order is randomised, this
can cause a spurious failure. This can happen in practice when running
the test in parallel with very high concurrency (e.g. -j 100).
Fix this by adding the -x option to grep, which only checks for full line
exact matches.
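Roughly, the duplicate-id check becomes something like this (variable
names are illustrative):
grep -Fqx "$test_id" "$log_file"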
Fixes: https://github.com/containers/podman/issues/24342
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The additional image store feature assumes that images / layers
in the additional store never go away, but we do remove the store after
this test. Try to repair the store.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Historically, non-schema1 images had a deterministic image ID == config digest.
With zstd:chunked, we don't want to deduplicate layers pulled by consuming the
full tarball and layers partially pulled based on TOC, because we can't cheaply
ensure equivalence; so, image IDs for images where a TOC was used differ.
To accommodate that, compare images using their configs digests, not using image IDs.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
When looking up the current-store image ID, do that
from the same output where we verify that the ID is from the
current store, instead of listing images twice.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The test got the stores' RW status backwards.
Before zstd:chunked, both image IDs should be the same, so this used
to make no difference.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
when the current soft limit is higher than the new value, ulimit fails
to set the hard limit, as shown here (tested on Rawhide):
[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument
to avoid the problem, also set the soft limit:
[root@rawhide ~]# ulimit -n -H
12345678
[root@rawhide ~]# ulimit -n -H 1048575
-bash: ulimit: open files: cannot modify limit: Invalid argument
[root@rawhide ~]# ulimit -n -SH 1048575
[root@rawhide ~]# ulimit -n -H
1048575
commit 71d5ee0e04 introduced the issue.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
There is no good reason for the special case: kube and pod units
definitely need it. Volume and network units maybe not, but for
consistency we add it there as well. This makes the docs much easier to
write and understand for users, as the behavior will not differ.
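Illustratively, a generated unit now always carries the dependency
(root case shown; the rootless case uses a different unit, see below):
[Unit]
Wants=network-online.target
After=network-online.target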
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
As documented in the issue, there is no way to wait for system units
from the user session[1]. This causes problems for rootless quadlet
units, as they might be started before the network is fully up. While
this was always the case and thus was never really noticed, the main
thing that triggered a bunch of errors was the switch to pasta.
Pasta requires the network to be fully up in order to correctly select
the right "template" interface based on the routes. If it cannot find a
suitable interface it just fails and we cannot start the container,
understandably leading to a lot of frustration from users.
As there is no sign of any movement on the systemd issue, we work around
it here by using our own user unit that checks whether the system
session network-online.target is ready.
Now for testing it is a bit complicated. While we do now correctly test
the root and rootless generators since commit ada75c0bb8, the resulting
Wants/After= lines differ between them, and there is no logic in the
testfiles themselves to say whether root or rootless is expected and
match specifics. One idea was to use `assert-key-is-rootless/root`, but
that seemed like more duplication for little reason, so use a regex and
allow both so it always passes. To still have some test coverage, add a
check in the system test that asks systemd whether we did indeed get the
right dependencies, where we can check for an exact root/rootless name
match.
[1] https://github.com/systemd/systemd/issues/3312
Fixes #22197
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Two flakes seen in the last three months. One of them was in
August, so it's not related to ongoing criu-4.0 problems.
Suspected cause: race waiting for "podman run --rm" container
to transition from stopped to removed.
Solution: allow a 5-second grace period, retrying every second.
Also: add explanations to the Expect()s, remove unnecessary
code, and tighten up the CID check.
Signed-off-by: Ed Santiago <santiago@redhat.com>
I'm assuming this was buildah#5595: the COMMENT field moved around.
Deal with it, and add a few more checks while we're at it.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...for debugging #24147, because "md5sum mismatch" is not
the best way to troubleshoot bytestream differences.
socat is run on the container, so this requires building a
new testimage (20241011). Bump to new CI VMs[1] which include it.
[1] https://github.com/containers/automation_images/pull/389
Signed-off-by: Ed Santiago <santiago@redhat.com>
The current mypod hack breaks down when running individual tests:
$ hack/bats 010 <<< barfs because it does not want pause-image!
Reason: Bats does not provide any official way to tell if tests
are being run in parallel.
Workaround: use an undocumented way.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The special handling to return the exit code after the container has
been removed should only be done if there are no special conditions
requested. If a user asked for running or any other state, returning the
exit code immediately with a success response is just wrong. We only
want to allow that so the remote client can fetch the exit code without
races.
Fixes b3829a2932 ("libpod API: make wait endpoint better against rm races")
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
By default golang programs exit 2 on special exit signals that can be
caught and produce a stack trace. However, this behavior can be
modified via GOTRACEBACK=crash[1]: in that case the program does not
exit(2) but rather sends itself SIGABRT, so the parent sees the signal
exit and our test sees that as exit code 134, 128 + 6 (SIGABRT), like
most shells do.
As it turns out, GOTRACEBACK=crash is the default mode on all fedora and
RHEL rpm builds, as they patch the build with a special
"rpm_crashtraceback" go build tag.
While that change is old and has existed for a very long time, it was
never caught until commit 5e240ab1f5, which switched the old
ExitWithError() check that accepted anything > 0 to accepting just 2.
And as CI only tests upstream builds, which are built without
rpm_crashtraceback, we did not catch it in CI either. Only once a user
actually ran a distro build against the source e2e tests did it fail.
I would like to highlight that running distro builds against upstream
e2e tests is not something we really support or plan to support, but
given this is an easy fix I decided to just fix it here, as any user
with GOTRACEBACK=crash set would face the same issue.
While I touch this test, remove the unnecessary RestoreArtifact() call,
which is not needed at all as we do nothing with the image; it just
slows the test down for no reason.
[1] https://pkg.go.dev/runtime#section-sourcefiles
Fixes #24213
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
since the effect would be to lower the rlimits when their definition
is higher than the default value.
The test doesn't fail on the previous version, unless the system is
configured with a nofile ulimit higher than the default value.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2317721
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
They no longer work in the latest image update; it is not clear why,
and I do not have the time to debug that stuff. I opened #24230 to
track it.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
In debian, EST and MST7MDT are gone by default and moved to a special
package[1]. Instead of also installing that in the images, let's use
different timezones in the test.
[1] 42c0008f86
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Run pasta with --trace and a log file to see if the hangs are caused by
pasta not correctly closing connections as assumed in #24219.
As the log is super verbose, do not log it by default; I added some
extra logic to make sure it is only logged when the test fails.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This command sequence causes SizeRootFs to change on foo:
podman tag foo newimagename
podman save ... newimagename
podman load ...
Solution: get foo completely out of the picture. Use an
airgapped image: new image, new digest, new everything.
Fixes: #23756
Signed-off-by: Ed Santiago <santiago@redhat.com>
Quadlet inserts network-online.target Wants/After dependencies to ensure pulling works.
Those systemd statements cannot be subsequently reset.
In the cases where those dependencies are not wanted, we add a new
configuration item called `DefaultDependencies=` in a new section called
[Quadlet]. This section is shared between different unit types.
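A minimal sketch of opting out (everything except the new key is
illustrative):
# app.container
[Quadlet]
DefaultDependencies=false
[Container]
Image=quay.io/example/app:latest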
Fixes #24193
Signed-off-by: Farya L. Maerten <me@ltow.me>
There's an important reason why the healthcheck container in 055-rm
test uses 'sleep infinity' and not 'top'. Document it.
And, the test itself wasn't actually working as intended. Make
it safer by confirming that the container actually enters
the "stopping" state.
Signed-off-by: Ed Santiago <santiago@redhat.com>
When we are activated by systemd, the code assumed that we had a valid
URL, which was not the case, so it failed to parse the URL, which caused
the info call to fail all the time.
This fixes two problems: first, add the scheme to the systemd-activated
listener URL so it can be parsed correctly; second, simply do not parse
it as a URL, as all we care about in the info call is whether it is unix
and whether the file path exists.
Fixes #24152
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Undoing some of my own work here from #24090 now that we have the
ExposedPorts field implemented in inspect. I considered a revert
of that patch, but it's still needed as without it we'd be
including exposed ports when --net=container which is not
correct.
Basically, exposed ports for a container should always go in the
new ExposedPorts field we added. They sometimes go in the Ports
field in NetworkSettings, but only when the container is not
net=host and not net=container. We were always including exposed
ports, which was not correct, but is an easy logical fix.
Also required is a test change to correct the expected behavior
as we were testing for incorrect behavior.
Fixes https://issues.redhat.com/browse/RHEL-60382
Signed-off-by: Matt Heon <mheon@redhat.com>
the kernel checks that both the uid and the gid are mapped inside the
user namespace, not only the uid:
/**
 * privileged_wrt_inode_uidgid - Do capabilities in the namespace work over the inode?
 * @ns: The user namespace in question
 * @idmap: idmap of the mount @inode was found from
 * @inode: The inode in question
 *
 * Return true if the inode uid and gid are within the namespace.
 */
bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
				 struct mnt_idmap *idmap,
				 const struct inode *inode)
{
	return vfsuid_has_mapping(ns, i_uid_into_vfsuid(idmap, inode)) &&
	       vfsgid_has_mapping(ns, i_gid_into_vfsgid(idmap, inode));
}
for this reason, improve the check for hasCurrentUserMapped to verify
that the gid is also mapped, and if it is not, use an intermediate
mount for the container rootfs.
Closes: https://github.com/containers/podman/issues/24159
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Similar to github.com/containers/buildah/pull/5761 but not
security critical as Podman does not have an expectation that
mounts are scoped (the ability to write a --mount option is
already the ability to mount arbitrary content into the container
so sneaking arbitrary options into the mount doesn't have
security implications). Still, bad practice to let users inject
anything into the mount command line so let's not do that.
Signed-off-by: Matt Heon <mheon@redhat.com>
This commit was automatically cherry-picked
by buildah-vendor-treadmill v0.3
from the buildah vendor treadmill PR, #13808
* Fix conflict caused by Ed's local-registry PR in buildah
* Wire in "new" --retry and --retry-delay, these existed for longer
but where non functional.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Potential race between starting socat (which creates a socket
file) and processes accessing said socket. Or maybe not. I
dunno, I'm grasping at straws. This is an elusive flake.
Fixes: #23798 (I hope)
Signed-off-by: Ed Santiago <santiago@redhat.com>
Although podman has moved on from CNI, RHEL has not. Make
sure that builds on RHEL test the desired network backend(s).
Effective immediately, gating.yaml on all RHEL branches
must set CI_DESIRED_NETWORK (=cni or =netavark)
Signed-off-by: Ed Santiago <santiago@redhat.com>
A field we missed versus Docker. Matches the format of our
existing Ports list in the NetworkConfig, but only includes
exposed ports (and maps these to struct{}, as they never go to
real ports on the host).
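As a hedged illustration of the shape only (exact placement in the
inspect output is not shown here), an exposed port appears as a map key
with an empty value: "ExposedPorts": {"8080/tcp": {}}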
Fixes https://issues.redhat.com/browse/RHEL-60382
Signed-off-by: Matt Heon <mheon@redhat.com>
There is no reason to validate the args here. First, podman may change
the syntax, so this is just duplication that may hurt us long term. It
also added special handling of some options that just does not make
sense, i.e. removing 0.0.0.0; podman should really be the only parser
here. And more importantly, the validation prevents variables from
being used.
Fixes #24081
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Previously, we didn't bother including exposed ports in the
container config when creating a container with --net=host. Per
Docker this isn't really correct; host-net containers are still
considered to have exposed ports, even though that specific
container can be guaranteed to never use them.
We could just fix this for host containers, but we might as well
make it generic. This patch unconditionally adds exposed ports to
the container config - it was previously conditional on a network
namespace being configured. The behavior of `podman inspect` with
exposed ports when using `--net=container:` has also been
corrected. Previously, we used exposed ports from the container
sharing its network namespace, which was not correct. Now, we use
regular port bindings from the namespace container, but exposed
ports from our own container.
Fixes https://issues.redhat.com/browse/RHEL-60382
Signed-off-by: Matt Heon <mheon@redhat.com>
Change getUnitDirs to maintain a slice in addition to the map and return the slice
Add helper functions to make the code more readable
Adjust unit tests
Restore system test
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
Yield to reality: if $XDG_RUNTIME_DIR is unset, assume a
reasonable default (rootless only). This clears up a
common failure in Fedora gating tests, and will probably
prevent future time wasters.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Primary motivator: 'curl -v' format changes in f42
Drive-bys:
* 127.0.0.1, not localhost
* use wait_for_port, not sleep
* show curl commands and their output, to ease debugging failures
* better failure assertions
Signed-off-by: Ed Santiago <santiago@redhat.com>
These flags can affect the output of the HealthCheck log. Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via `podman inspect`.
It is also limited to the last five executions and the first 500 characters per execution.
This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic `healthcheck service failed` record.
- The `--health-log-destination` flag sets the destination of the HealthCheck log.
- `none`: (default behavior) `HealthCheckResults` are stored in overlay containers. (For example: `$runroot/healthcheck.log`)
- `directory`: creates a log file named `<container-ID>-healthcheck.log` with JSON `HealthCheckResults` in the specified directory.
- `events_logger`: The log will be written with the logging mechanism set by events_logger. It also saves the log to a default directory, for performance on a system with a large number of logs. (See the usage sketch after this list.)
- The `--health-max-log-count` flag sets the maximum number of attempts in the HealthCheck log file.
- A value of `0` indicates an infinite number of attempts in the log file.
- The default value is `5` attempts in the log file.
- The `--health-max-log-size` flag sets the maximum length of the log stored.
- A value of `0` indicates an infinite log length.
- The default value is `500` log characters.
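A combined usage sketch (image name, command, and paths are
illustrative):
podman run -d --name web \
  --health-cmd "curl -fsS http://localhost:8080/ || exit 1" \
  --health-log-destination /var/log/podman-healthchecks \
  --health-max-log-count 10 \
  --health-max-log-size 1024 \
  quay.io/example/web:latest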
Add --health-max-log-count flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-max-log-size flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-log-destination flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
The various pasta port forwarding tests run a socat server inside a
container, then connect to it from a socat client on the host. Currently
we have the server bind to the same specific address within the container
as we connect to on the host.
That's not quite what we want. For "tap" tests where the traffic goes over
pasta's L2 link to the container it's fine, though unnecessary. For
"loopback" tests where traffic is forwarded by pasta at the L4 socket
level, however, it's not quite right. In this case the address used is
either 127.0.0.1 or ::. That's correct and as needed for the host side
address we're connecting to. However on the container side, this only
works because of an odd and arguably undesirable behaviour of pasta: we use
the fact that we have an L4 socket within the container to make such
"spliced" L4 connections appear as if they come from loopback within the
container. A container will generally expect its loopback address to be
only accessible from within the container, and this odd behaviour may be
changed in pasta in future.
In any case, the binding of the container side server is unnecessary, so
simply remove it.
Link: https://github.com/containers/podman/issues/24045
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mostly just switch to safename. Rewrite setup() to guarantee
unique service file names, atomically created.
* IMPORTANT NOTE: enabling parallelization on these tests
triggers #24010 ("fragment file" flake), but only on my
f40 laptop. I have never seen the flake in Cirrus despite
many many runs in #23275. I am submitting this for review
and merging because even though _something_ is broken,
this breakage is unlikely to affect our CI.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Any test that uses --events-backend=file cannot be run in parallel
due to #23750. This seems to be a hard block, unfixable.
All other tests, enable ci:parallel.
And, bring in timing fixes #23600. Thanks, @Honny1!
Signed-off-by: Ed Santiago <santiago@redhat.com>
The format test flakes when quay is down, because we've
been doing 'podman search $IMAGE', which is a quay image.
Solution: check if local registry is running, and use it.
We don't need a real image.
Signed-off-by: Ed Santiago <santiago@redhat.com>
(where possible. Not all tests are parallelizable).
And, refactor two complicated tests into one. This one
is hard to review, sorry.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Use os.ReadDir recursively instead of filepath.WalkDir
Use map instead of list to easily find looped Symlinks
Update existing tests and add a more elaborate one
Update the man page
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
The netns dir has special logic to bind mount itself and make itself
shared. This code here didn't, which led to a catastrophic bug during
netns unmounting, as we were unable to unmount the netns: the mount got
duplicated and had the wrong parent mount. This caused us to loop
forever trying to remove the file.
Fixes https://issues.redhat.com/browse/RHEL-59620
Fixes #23685
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This removes the need for a tricky/fragile namespace workaround.
Huge thanks to Paul for discovering documentation on the
Registry container, and how to override config.yml settings:
https://distribution.github.io/distribution/about/configuration/#override-specific-configuration-options
Drive-by: consistentize quotes in -eVAR="value". Minor, but
makes them all easier to read with emacs/vi syntax highlighting.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The "rm on stopping containers" test is flaking under high load,
probably because I bumped up two timeouts in the healthcheck
container that it relies on. Bump up this test's timeout as well.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...not just when running parallel Bats, because Bats
does not provide any way to know if we're parallel.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...of high system load (such as when running parallel tests).
Allow time for services to reach desired state, by retrying
a few times in a loop.
Signed-off-by: Ed Santiago <santiago@redhat.com>
There is no reason to disallow exposed sctp ports at all. As root we can
publish them fine, and as rootless it should error later anyway.
And for the case mentioned in the issue it doesn't make sense, as the
port is not even published; thus it is just part of the metadata, which
is totally fine in all cases.
Fixes #23911
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Like we do in system tests, now check for netns leaks in e2e as well.
Because things run in parallel and this dir is shared, we cannot check
after each test, only once per suite. This will be a PITA to debug if
leaks happen, as the netns files do not contain the container ID and are
just random bytes (maybe we should change this?).
Fixes #23715
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This fixes the problem where, even when running rootless, we checked the
netns files from the root location. But in order to catch any rootless
bugs we must check the rootless
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This test is currently disabled due to several issues, only some of which
are described in the existing comments. Add some more details to clarify
the situation.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This name for the tests is misleading, since in the default configuration
podman will already configure a forwarding address, which could forward
to either another local forwarder or an external nameserver on the host
side. What this test is really about is explicitly configuring the pasta
DNS forwarding address. Rename accordingly.
The IPv4 version of the test doesn't use the podman --dns option, only
the pasta --dns-forward option. This exercises the podman behaviour that
pasta --dns-forward options are added to /etc/resolv.conf automatically.
However there could also be other things in /etc/resolv.conf, so the
nslookup might not use the custom forwarding address for the lookup.
To fix that, split the test into two parts: one verifying that the custom
address is in /etc/resolv.conf and another performing the nslookup with an
explicit server address to make sure we exercise the pasta side as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In both the "Basic nameserver lookup" and "Local forwarder, IPv4" pasta
tests, we check whether DNS resolution is working by running "nslookup
127.0.0.1" in the container and checking if 1.0.0.127.in-addr.arpa is in
the output.
1.0.0.127.in-addr.arpa isn't the expected result of the resolution though,
it's just the DNS name that nslookup will translate 127.0.0.1 into. The
test mostly works, because nslookup echoes that on successful lookups.
However, it could also echo it in certain sorts of failure, so it's not a
very reliable test.
Furthermore, resolving 127.0.0.1 from a nameserver is a rather strange
thing to do. It's done that way because RFC1912[0] suggests it should
always resolve, even for nameservers on a disconnected network. But, this
doesn't really appear to be true in practice: a number of resolvers return
NXDOMAIN. That works by accident because nslookup seems to echo the
name above as part of the error message.
Change to instead looking up one of the root servers by name. This does
now rely on access to the global DNS during tests, but other podman tests
attempt to resolve google.com, so that should be ok. One of the root
servers is about as close to universal resolvability as it's possible to
get.
[0] https://datatracker.ietf.org/doc/html/rfc1912#section-4.1
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The idea behind the "External resolver" tests is simply to check that we
can contact a nameserver, regardless of this configuration. To this end
the "IPv4" version looks up 127.0.0.1 which RFC1912[0] suggests should
always be resolvable.
The IPv6 version instead looks up [::1]. While it makes sense for
that to be resolvable in a similar way, there appear to be quite a few
nameservers which do not resolve it, making this test flaky.
Furthermore the idea behind resolving [::1] is that it should make
nslookup prefer to resolve over IPv6. That appears to be very
unreliable at best. Since making a different query doesn't actually
exercise anything different in pasta, drop the test.
The remaining IPv4 test isn't really specific to an "external" resolver,
it's simply checking that we can contact some sort of resolver with the
default podman configuration. Rename accordingly, and run it regardless of
IPv4 connectivity on the host: we can still query a nameserver about an
IPv4 address, even if we only have IPv6 connectivity ourselves.
[0] https://datatracker.ietf.org/doc/html/rfc1912#section-4.1
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The "Local forwarder, IPv4" pasta test, amongst other things, checks that
podman's default DNS forwarding address - 169.254.0.1 - appears in the
container's /etc/resolv.conf. That's not really related to anything else
going on in that test (which is about _changing_ that default address).
So, move it into its own test case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
...or at least as much as possible. Some tests cannot
be run in parallel due to #23750: "--events-backend=file"
does not actually work the way a naïve user would intuit.
Stop/die events are asynchronous, and can be gathered
by *ANY OTHER* podman process running after it, and if
that process has the default events-backend=journal,
that's where the event will be logged. See #23987 for
further discussion.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Need --layers=false in podman build, otherwise a buildah race
can trigger "layer not known" failures:
https://github.com/containers/buildah/issues/5674
Signed-off-by: Ed Santiago <santiago@redhat.com>
When running parallel, multiple tests could be trying to start
the registry at once. Make this parallel-safe.
Also, use a safer port range for the registry. Something
outside of /proc/sys/net/ipv4/ip_local_port_range
Sorry, I'm including a FIXME section that I haven't investigated
deeply enough.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add a few best-practices examples, and add a whole section
describing the dos and don'ts of writing parallel-safe tests.
Signed-off-by: Ed Santiago <santiago@redhat.com>
For tests run in parallel, show file number as |nnn| (vs [nnn])
Teach logformatter to distinguish the two, adding 'p' to anchors
in parallel tests. Necessary because in this scheme we run bats
twice, thus see 'ok 1' twice, and we want to differentiate them.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Workaround for #23292, where simultaneous 'pod create' commands
will all start a podman-build of the pause image, but only
one of them will be tagged, and the others will leak <none>
images.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The issue is closed, and I recently fixed a number of races (bf74797c69)
in the remote attach API that sound exactly like the same error
that was mentioned in issue #9597.
As such I think this works; if it starts flaking again we can revert
this or, better, fix the actual bug.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
As it turns out, things are not so simple after all...
In podman-py it was reported[1] that waiting might hang. Per our docs,
wait on multiple conditions should exit once the first one is hit, not
wait for all of them. However, because the new wait logic never checked
whether the context was cancelled, the goroutine kept running until
conmon exited, and because we used a waitgroup to wait for all of them
to finish, it blocked until that happened.
First, we can remove the waitgroup, as we only need to wait for one of
them anyway via the channel. While this alone fixes the hang, it would
still leak the other goroutine. As there is no way to cancel a
goroutine, all the code must check for a cancelled context in the wait
loop to not leak.
Fixes 8a943311db ("libpod: simplify WaitForExit()")
[1] https://github.com/containers/podman-py/issues/425
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We were only splitting on tabs, not spaces, so we returned just a
single line most of the time, not an array of the fields in the
output of `ps`. Unfortunately, some of these fields are allowed
to contain spaces themselves, which makes things complicated, but
we got lucky in that Docker took the simplest possible solution
and just assumed that only one field would contain spaces and it
would always be the last one, which is easy enough to duplicate
on our end.
Fixes #23981
Signed-off-by: Matt Heon <mheon@redhat.com>
For the past two months we've been splitting system tests
into two categories: those that CAN be run in parallel,
and those that CANNOT. Much work has been done to replace
hardcoded names (mycontainer, mypod) with safename().
Hundreds of test runs, in CI and on Ed's laptop, have
proven this approach viable.
make {local,remote}system now runs in two steps: first
the serial ones, then the parallel ones. hack/bats will
now recognize the 'ci:parallel' tag and add --jobs (nprocs).
This requires some tweaking of leak_check, because there
can be umpteen tests running (affecting image/container/pod/etc
state) when any given test completes.
Rules for enabling parallelization in tests:
* use unique container/pod/volume/network names (safename)
* do not run 'podman rm -a' or 'rmi -a'
* never use the -l (--latest) option
* do not run 'podman ps/images' and expect precise output
Signed-off-by: Ed Santiago <santiago@redhat.com>
...and remove one old skip() for older debian, but leave
two others in place and mark that they're still a problem.
Signed-off-by: Ed Santiago <santiago@redhat.com>
podman-remote events are not flushed, so order is not guaranteed.
This results in CI flakes. Only on Debian, for reasons unknown.
Make the network-connection events test more lenient when remote.
Closes: #23634 (but does not actually fix it)
Signed-off-by: Ed Santiago <santiago@redhat.com>
convert the owner UID and GID into the user namespace only when
":idmap" mount is used.
This changes the behaviour of :idmap with an empty volume. Now the
existing directory ownership is copied up as in the other case.
Closes: https://github.com/containers/podman/issues/23347
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Use safename. Add ci:parallel tags. Use a random port, not
hardcoded 9999. Do not remove pause image. And especially
do not "rm -a" anything.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...because it requires 100% control and knowledge of the
state of all images, containers, and volumes.
Use safename anyway, just in case we ever have a leak from here.
I'm finding safename sooooooo helpful when reading journal.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add ci:parallel tags; move one non-parallel-safe test to
another networking-test file; and a few drive-by fixes
Signed-off-by: Ed Santiago <santiago@redhat.com>
Use safename for containers and pods. Add ci:parallel tags.
And reenable distro-integration tests that had been skipped
due to a container-selinux bug that is now fixed.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Minor bump. Fedora VMs now include ShellCheck, so we can
remove the 'dnf install' at CI run time.
Also, FWIW, Debian *vark are now at 1.12 (from 1.9)
VMs built in https://github.com/containers/automation_images/pull/385
Signed-off-by: Ed Santiago <santiago@redhat.com>