Send a SIGTERM to the server process instead of killing it, so it has
time to do a proper cleanup and doesn't leak the home mount.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Shut down the containers store so that the home directory mount is not
leaked when "podman system service" exits.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Adds a `NetworkAlias=` key to both .container and .pod quadlet files,
which translates to the `--network-alias` option of `podman run` and
`podman pod create` respectively. It can be repeated multiple times.
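For example, a minimal hypothetical .container file using the new key; each `NetworkAlias=` becomes one `--network-alias` on the generated command line:

```bash
# hypothetical rootless quadlet file; unit and alias names are made up
cat > ~/.config/containers/systemd/web.container <<'EOF'
[Container]
Image=localhost/imagename
NetworkAlias=web
NetworkAlias=frontend
EOF
```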
Signed-off-by: Félix Saparelli <felix@passcod.name>
Old netavark versions only supported iptables; however, a newer version
on the host might use nftables. This breaks the networking tests here,
as the two are not compatible and a reboot would be needed to fix that.
Because that is not possible for our tests, always force the iptables
driver to keep the tests working.
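One way to pin the driver, assuming the containers.conf `firewall_driver` option (the exact mechanism used by the tests may differ):

```bash
# hedged sketch: force netavark to stay on iptables via a containers.conf drop-in
mkdir -p /etc/containers/containers.conf.d
cat > /etc/containers/containers.conf.d/99-firewall.conf <<'EOF'
[network]
firewall_driver = "iptables"
EOF
```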
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
netavark can use either iptables or nftables as its firewall driver, so
when we flush rules, make sure we try both backends to keep the test
working when the default switches to nftables.
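A sketch of what "try both" can look like; the exact tables flushed in the test may differ:

```bash
# flush whichever backend is in use; ignore errors from the one that is absent
iptables -F 2>/dev/null || true          # iptables firewall driver
iptables -t nat -F 2>/dev/null || true
nft flush ruleset 2>/dev/null || true    # nftables firewall driver
```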
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Stop using iptables to check anything; it does not work rootless and
will no longer work with nftables, which will be used in the future.
Also fix up the tests that say podman run to actually use podman run,
and then just check via inspect that the ports are set correctly.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This commit gets tests working under the new local-registry system:
* amend a few image names, mostly just sticking to a consistent
list of those images in our registry cache. Mostly minor
tag updates.
* trickier: pull_test: change some error messages, and remove
a test that's now a NOP. Basically, with a local (unprotected)
registry we always get "404 manifest unknown"; with a real
registry we'll get "403 I can't tell you".
* trickiest: seccomp_test: build our own images at run time,
with our desired labels. Until now we've been pulling
prebuilt images, but those will not copy to the local
cache registry. Something about v1? Anyhow, I gave up
trying to cache them, and the workaround is straightforward.
Also took the liberty of strengthening a few error-message checks.
Signed-off-by: Ed Santiago <santiago@redhat.com>
As of https://github.com/containers/automation_images/pull/357
our CI VMs include a local registry preloaded with all(*)
images used in tests.
* where "all" means "most".
This commit installs a new registries.conf that redirects docker
and quay to the new local registry. The hope is that this will
reduce CI flakes.
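The redirect presumably looks something like this (registry port and file name are illustrative):

```bash
# hedged sketch of a registries.conf drop-in pointing the big registries
# at the local cache registry
cat > /etc/containers/registries.conf.d/99-local-cache.conf <<'EOF'
[[registry]]
prefix = "quay.io"
location = "127.0.0.1:60333"
insecure = true

[[registry]]
prefix = "docker.io"
location = "127.0.0.1:60333"
insecure = true
EOF
```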
Since tests change over time, and new tests may require new
images, this commit also adds a mechanism for pulling in
remote images at test run time. Obviously this negates
the purpose of the cache, since it introduces a flake
pain point. The idea is: DO NOT DO THIS UNLESS ABSOLUTELY
NECESSARY, and then, if we have to do this, hurry up and
spin new CI VMs that include the new image(s).
Signed-off-by: Ed Santiago <santiago@redhat.com>
Run root e2e & system tests using composefs on rawhide.
Write magic settings to storage.conf. That part is easy.
e2e tests, however, ignore storage.conf. They require everything
to be specified on the command line. And "everything", in the
case of composefs, includes a long complicated --pull-options
string which in turn requires containers-storage PR 1966
which, as of this writing, is finally vendored into podman.
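A hedged sketch of the storage.conf side; the exact key names are assumptions based on containers-storage's composefs support, not a copy of the CI settings:

```bash
# append the composefs "magic settings" to storage.conf (keys assumed)
cat >> /etc/containers/storage.conf <<'EOF'
[storage.options.overlay]
use_composefs = "true"
EOF
```

The e2e runs then have to pass the equivalent settings as command-line flags on every invocation.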
Signed-off-by: Ed Santiago <santiago@redhat.com>
This test flakes frequently and its status is completely ignored in CI.
At the time of this commit, nobody has stepped up to debug or fix it.
Drop the test.
Signed-off-by: Chris Evich <cevich@redhat.com>
When a system has one IPv4 and one IPv6 address, hostname -I will show
both, causing a failure in the case where only one address is expected.
To fix this, stop using hostname -I and use ip -4 to list only v4
addresses, then use jq to filter the output accordingly.
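A sketch of the replacement, relying on ip(8)'s JSON output mode (the jq filter is illustrative):

```bash
# list only IPv4 addresses as JSON, then extract the local address fields
ip -j -4 addr show | jq -r '.[].addr_info[].local'
```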
Fixes #23227
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The code currently tried to avoid joining the userns from conmon
directly, and instead joined it only to read the pid file, which was
then sent back to us so we could join the userns. Per the comment, this
was done because we could not read the pid file; however, this is no
longer true as of commit 49eb5af301, and the file is now always owned by
the real user. This means we can remove this special logic and join the
namespace directly there. A test has been added to check the rejoin
logic with a custom uidmapping.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
- fix test name to reflect that it's not pasta-only
(followup from #21563)
- in one podman-update test run in OpenQA, defer assertion
failures so we can gather better data on regressions.
This would've been helpful in diagnosing bz2281805.
- add an error-message check to one test that needed it
(found by accident)
- add distro-integration test tag to a handful of new tests,
so they run in OpenQA. Found via 'git diff 33891e8 test/system'
and scanning for '^\+@test '. I only added tests that IMO
have some risk of interacting poorly with kernel or systemd
updates, e.g. quadlet, modules, tmpfs+noswap.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The test checks that the default volume is not on tmpfs; however, what
it should really check is that the volume is on our container storage
fs. It is possible that users run the storage on top of tmpfs, so this
test always failed there.
The better check is to compare the fs of the graphroot and the volume.
Unfortunately, for unknown reasons, stat -f -c %T returns UNKNOWN and
not the actual fs. I have no idea why; to work around that we now parse
/proc/mounts manually for the fs. Not nice, but at least it works
correctly.
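A sketch of the manual lookup (helper name made up; naive about mount points that share a string prefix):

```bash
# fs_type PATH: print the fs type of the mount containing PATH,
# by longest-prefix match over /proc/mounts
fs_type() {
    local path=$1 best="" type=""
    while read -r _ mnt fstype _; do
        if [[ $path == "$mnt"* && ${#mnt} -gt ${#best} ]]; then
            best=$mnt type=$fstype
        fi
    done </proc/mounts
    echo "$type"
}
```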
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When a user asks for specific devices, we should still add them and not
ignore them just because privileged adds all of them.
Most notably, if you set --device /dev/null:/dev/test you expect
/dev/test in the container; however, as we ignored them, this was not
the case. Another side effect is that the input was not validated at
all. This led to confusion, as described in the issue.
Fixes #23132
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Quadlet did not previously have support for log options. This merge
allows Quadlet to handle log options and correctly pass those values
through to `podman run` for Container and Kube types.
Syntactically consistent with existing parameters:
```ini
[Container]
Image=localhost/imagename
LogOpt=path=/var/log/container/mycontainer.json
LogOpt=size=10mb
```
Signed-off-by: Brett Calliss <brett@obligatory.email>
When the user specifies a Containerfile or Dockerfile with the -f flag in podman build, and the file does not exist, the error should be intuitive to the user.
Fixed: #22940
Signed-off-by: Kevin Cui <bh@bugs.cc>
When we execute ps(1) in the container and the container uses a userns
with a different id mapping, the user id field will be wrong.
To fix this, we must join the userns in such cases.
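The mismatch is easy to see with podman top's user/huser descriptors (image and mapping illustrative):

```bash
podman run -d --name m --uidmap 0:100000:65536 alpine sleep 100
# "user" must be resolved inside the container's userns, "huser" on the host
podman top m user huser pid
```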
Fixes #22293
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The docker API uses only a single arg for platform; multiple platforms
are given as a comma-separated list.
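Against the compat API this looks roughly like the following (socket path, API version, and context tarball are illustrative):

```bash
# a single "platform" query parameter with comma-separated values
curl --unix-socket "$XDG_RUNTIME_DIR/podman/podman.sock" -X POST \
  -H 'Content-Type: application/x-tar' --data-binary @ctx.tar \
  "http://d/v1.41/build?platform=linux/amd64,linux/arm64"
```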
Fixes #22071
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Two tests have been skipped for a long time because they flaked too
much; nobody cares about them, and they are mostly just debugging
endpoints, so they are not critical either.
The "of 2 seconds" test isn't useful either: it waits up to 30s for the
exit, so it doesn't actually verify a proper timeout. Additionally, we
have similar checks in the system test "podman system service -
CORS enabled in logs", so I consider this safe to remove.
Fixes #12624
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add some test steps to quadlet - ContainerName. These steps are used to
ensure the default configuration for quadlet-generated service files
sends stdout/stderr/syslog to journald.
Signed-off-by: Yiqiao Pu <ypu@redhat.com>
The restore code path never called completeNetworkSetup(), which means
that hosts/resolv.conf files were not populated. The fix is simply to
call this function. There is a big catch here: technically this is
supposed to be called after the container is created but before it is
started. There is no such thing for restore; the container runs right
away. This means that if we do the call afterwards, there is a short
interval where the file is still empty. Thus I decided to call it
before, which means it does not work with PostConfigureNetNS (userns),
but as that does not work today anyway, I don't see it as a problem.
Fixes #22901
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The pod was set after we checked the namespace, and the namespace code
only checked the --pod flag but didn't consider the --pod-id-file
option. Fix the check to first set the pod option on the spec, then use
that for the namespace. Also make sure we always use an empty default;
otherwise it would be impossible in the backend to know whether a user
requested a specific userns or not, i.e. even with PODMAN_USERNS set, a
container should still get the userns from the pod and not use the env
var in this case. Therefore unset it from the default cli value.
There are more issues here around --pod-id-file and cli validation,
which does not consider the option as conflicting with --userns like
--pod does, but I decided to fix the bug at hand and not try to fix the
entire mess, which would most likely take days.
Fixes #22931
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Do not return a 200 status code before we know whether there will be an
error. Delay writing the status code until we send the first response;
that way we can set an error code inside the loop when we get an error
on the first try, e.g. because an invalid descriptor was used.
Fixes #22986
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When we failed to do anything, we should return 500; the 409 code has a
special meaning to the client, as it uses a different error format. As
such, the remote client was not able to unmarshal the error correctly
and just returned an empty string.
Fixes #22989
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
With (esp. Debian) CI VM images built by
https://github.com/containers/automation_images/pull/338, CI no longer
tests with runc nor cgroups v1. Add logic to fail under these
conditions. Prune back high-level YAML/script envvars and logic formerly
required to support these things.
Signed-off-by: Chris Evich <cevich@redhat.com>
When a user specifies an invalid connection in CONTAINER_CONNECTION,
podman should return a proper error saying so. Currently it ignored the
error, and rootFlags() just exited early without defining any flags.
This then caused a panic when trying to use the flags later.
In order to address this, first store the connection error in the
PodmanConfig struct and do not abort right away during flag setup. This
is important, as the user might have specified a flag with a valid
remote connection. As such, we check all flags, and only when none were
given do we return the connection error.
Also, while at it, I noticed that the default connection reported via
podman --help was wrong, as it only used the old containers.conf field
for it and did not consider the podman-connections.json default.
New regression tests have been added to make sure it behaves correctly.
This fixes the problem reported in PR #22997.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a test to verify that restoring a container in a Pod works when
the `container restore --pod` option is used with Pod *name* (this
functionality was previously limited to support only full Pod ID).
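Roughly the flow being tested (pod options and paths are illustrative; checkpoint support in pods may require reduced namespace sharing):

```bash
podman pod create --name mypod --share uts
podman run -d --pod mypod --name c1 alpine sleep 1000
podman container checkpoint --export /tmp/c1.tar.gz c1
# restoring by pod *name* now works, not only by full pod ID
podman container restore --pod mypod --import /tmp/c1.tar.gz
```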
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
PR #22821 (CI speedup) was overly aggressive in one kube test.
It's flaking. Bump up the timeout from 3s to 4s.
Signed-off-by: Ed Santiago <santiago@redhat.com>
At the end of all tests always check for leaks. That should make us more
robust against adding tests at the end that would leak stuff otherwise.
TODO: something seems wrong with bats when returning an error in
teardown_suite(); it prints a warning:
bats warning: Executed <NUM+1> instead of expected <NUM> tests
Also, the output is formatted weirdly in this case, with the podman
args split over multiple lines.
But the test fails as expected, so I don't think it is a problem.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
While these are not really slow, they still take about 100-250ms when I
time them locally. Given they are run for every test, this adds up
quickly. Looking at CI logs I can see that the timings for skipped
tests are all in the 600ms range, so I think it is safe to assume that
these functions need to get faster.
We have over 670 test cases currently, so we are talking about over
400s spent in these functions in CI. This allows for big gains.
Now, overall this is a tricky trade-off: while all tests should clean up
after themselves, there is no guarantee of that, and as such errors can
be leaked into other tests, making debugging much harder. To work at
least a bit against this, teardown checks whether the test was
successful and only skips the podman commands based on that. Without it,
a single flake could cause all following tests to fail.
As such, this commit does the proper setup once on suite start and then
only after a test has failed.
In order for this to work at all we have to fix all leaks first, see
previous commits. And then for the future keep a very strong eye on
this during reviews.
Also add a PODMAN_BATS_LEAK_CHECK option.
By default, tests must clean up themselves, and to speed up CI we no
longer do any cleanup in teardown by default. However, there are still
many cases where we might have to debug a leak, so add a new
PODMAN_BATS_LEAK_CHECK env option that can be set and should cause
teardown to fail if the test did not clean up properly.
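Usage sketch (test selector illustrative):

```bash
# fail teardown if the test left containers/pods/volumes/images behind
PODMAN_BATS_LEAK_CHECK=1 hack/bats 010-images
```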
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Remove leaking containers and remove unnecessary push/pull arguments:
push tried to push its argument as an image, which makes no sense, and
pull tried to pull its argument as an image, which is also wrong.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This is the same as what --squash-all is doing, and we already support
--squash with --layers=true since this is the default.
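So, per this change, these two invocations should be equivalent:

```bash
podman build --squash-all -t demo .
podman build --squash --layers=false -t demo .
```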
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If a container was stopped, and we try to start it before we called
cleanup, it tried to reuse the network, which caused a panic, as the
pasta code cannot deal with that. It is also never correct, as the netns
must be created by the runtime in case custom user namespaces are used.
As such, the proper thing is to clean the netns up first.
Also change an e2e test to report better errors. It is not directly
related to this change, but it failed on v1 of this patch, so we noticed
the ugly error message it produced. Thanks to Ed for the fix.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This fixes a regression added in commit 4fd84190b8: because the name was
overwritten by the createTimer() timer call, the removeTransientFiles()
call removed the new timer and not the startup healthcheck one. Then,
when the container was stopped, we leaked it, as the wrong unit name was
in the state.
A new test has been added to ensure the logic works and we never leak
the systemd timers.
Fixes #22884
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The buildah build kill trick is bad, as we have to sleep and wait to
avoid flakes, which takes time. Instead, it is possible to redo this
build part manually with buildah commands. It is not trivial and harder
to understand, but it saves 2-3s, so I think it is worth it.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
First, as root, don't wait 5s for the timeout; 1s is enough. Also switch
to the curl --max-time option instead; that way we know we do not kill
curl before it has possibly had the chance to do anything.
Second, combine the podman inspect commands into one. This makes the
test faster by over one second, as we save a bunch of podman commands.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Another case of a container that does not exit on SIGTERM, so we waste
10s. Now, because our container reacts to SIGTERM and exits 0, the
systemd unit status changes to inactive instead of failed.
And most importantly, add Notify=yes, because the socat call always
failed as the default is to not leak the notify socket into the
container.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It is not clear at all why the count of 30 was chosen; this seems like a
lot and of course takes quite a while. The test takes over 16s in CI.
To speed it up, reduce the count to 10. I think this should still be
good enough to ensure there are no races.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It makes the test a bit uglier, but I cannot see a good way to speed
this up otherwise. I changed the created test to only start/stop the
container once instead of on every test case iteration. This makes it
about 2s faster locally.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Overall, just combine several container runs into one. Every RUN
instruction runs a new container, which is quite expensive, so chain the
commands together. The same goes for the podman runs.
I could have combined a bit more, but I think this leaves it still
readable. This speeds up the test by about 4s locally, from 8s before.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Use podman version over podman info, because info has to query a lot of
internal state, e.g. container and image counts, so it is slower than a
simple version call. This speeds the test up by about 600ms locally.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Issue #11825 was fixed a long time ago. Also we no longer test
cni/dnsname so there is really no point in having this.
Speeds up the test by 1 second.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Another case of a container that does not exit on SIGTERM, so we waste
10s. Now, because our container reacts to SIGTERM and exits 0, the
systemd unit status changes to inactive instead of failed.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We don't have to loop twice for the stat call; we can just stat both
dirs at once. This means we only have to create half of the containers,
so the test is twice as fast.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Testing `podman system check` requires that we have a way to
intentionally introduce storage corruptions. Add a hidden `podman
testing` command that provides the necessary internal logic in
subcommands. Stub out the tunnel implementation for now.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Debian's man (5) hostname page states: "The file should contain a
single newline-terminated hostname string."
[NO NEW TESTS NEEDED]
fix #22729
Signed-off-by: Bo Wang <wangbob@uniontech.com>
The e2e tests already depend on skopeo anyway, and pulling an image of
over 300MB is not helpful for flakes; most importantly, we see ENOSPC
flakes. I see them around the skopeo test, so I assume the big image is
pushing the tmpfs limits, causing other tests running in parallel to
start failing because of it.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The expectation with --cgroups=disabled is that the current cgroup is
used by the container.
Currently the --cgroups=disabled is passed directly to the OCI
runtime, but it doesn't stop Podman from creating a new cgroup when it
doesn't own the current one.
Closes: https://github.com/containers/podman/issues/20910
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The condition doesn't work when the runtime to use is specified
through its absolute path, as the error message then contains that path.
Simplify the check and just look for "read from the init process".
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The command does not react to SIGTERM, so kube down needs to wait 10s.
To fix it, use a command that does; also, write the yaml directly
instead of doing the podman create && kube generate dance.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Use a command that stops on SIGTERM, not sleep; that way the tests can
continue to use podman kube down without waiting for the full stop
timeout every time.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This test is by far the slowest one, taking over a minute; the reason is
that it checks every single podman command for shell completions.
The test is useful, but it does not need to check the "..." argument 3
times; testing it a second time, to make sure not only the first arg is
completed, is enough. This change makes it about 15 seconds faster.
Long term we should move this test out of the main system tests together
with other cli-only tests, as they do not need to run on each OS, etc.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
I don't understand why the current logic used podman logs; all we care
about is the container output, and we can just read the same thing with
an attached podman run. Of course we have to move it into the
background, but it did the same with logs.
This also allows us to remove the extra log-driver checks, and because
podman logs seems to be much slower than the extra run, we save over 10s
with this change.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Use only one retry and a short stop timeout to speed them up. I am not
sure if this will cause flakes; I have not seen any after trying for
some time, so I think this works just as well, and it is about 2-3
seconds faster for both tests.
If it does start to flake, we can revert this commit or write the test
differently.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Do not wait 5 seconds, just stop the container directly.
This speeds up the test by more than 4 seconds.
One could make the case here that we want to check podman wait but
there are so many other podman wait tests that it should not matter.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Instead of iterating over all tmp dirs and creating test containers for
each one, we can just pass all files to one touch call. With that we
have to create far fewer containers while still checking the same thing.
This speeds up the test by about 4 seconds.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The test used sleep to synchronize log output between both containers,
which is slow. There is actually no way to guarantee the ordering on
the reading side, so just remove the sleeps and check that the lines
within the same container are in the right order.
Trying to preserve the original ordering is just not possible if we
speed up the test, as it would flake too often.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
There is no reason for this check to wait 4 seconds for the container to
run; instead, make sure we have a running process and then stop it
directly with -t0 to not have any delay.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
As agreed in Planning meeting of 2024-03-20, Podman 5.x will
drop support for cgroups v1 and for runc. Make it so.
CI images built in https://github.com/containers/automation_images/pull/338
Signed-off-by: Ed Santiago <santiago@redhat.com>
This container did not react to SIGTERM, thus we always waited 10s for
it to stop. Also, do not wait 2s for the logs; instead use a retry loop.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The test does a normal stop on a command that does not react to SIGTERM.
As I cannot fix the system stop logic, use a command that does. This
saves us 10s, as it no longer waits for the timeout.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Both tests take 10s longer than they need to because they run the sleep
command in the container, which does not react to SIGTERM; as such,
podman waits 10s before killing it with SIGKILL.
To fix it, just stop them with podman rm -fa -t0 to avoid the wait, and
do not use podman kube down, as we cannot set a timeout there. podman
kube down is still covered in many other tests, so this is not an issue.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
IMO it is not important to cover each case with each sdnotify policy. To
speed the tests up, we run each exit-code case only once instead of
twice, switching the sdnotify policy between cases. This way we save 50%
of the runs and should still have sufficient coverage.
Before, it took around 24 seconds; with this it is around 12 seconds.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
There is really no point in waiting 10s for the kill; let's use 2s,
which should be good enough to observe the timing.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This test waited 15 seconds before sending SIGTERM for no good reason;
we can just make the timeout shorter. Also make sure the podman command
quits on SIGTERM by looking for the output message.
While at it, fix the tests to use $PODMAN_TMPDIR, not /tmp, and define
the yaml in the test instead of creating it via podman create && podman
kube generate && podman rm, which is a bit slower as we have to call
three podman commands for it.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Merge two podman event tests into one to speed them up, as they did
mostly the same anyway. This way we only have to do the setup/teardown
once and only run one container.
Second, add the --since option, because reading the journal can be slow
if you have thousands of event entries. This is not so critical in CI,
as we run on fresh systems, but on local dev machines I have almost 100k
events in the journal, so parsing all of them makes this test slow (like
30s); with this change I can get it under 1s.
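The --since usage looks roughly like this (window value illustrative):

```bash
# scan only recent journal entries instead of the whole journal
podman events --since 1m --stream=false
```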
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
.build files allow building an image via Quadlet. The keys from a .build
file are translated to arguments of a `podman build` command by Quadlet.
The minimal keys for a .build file are `ImageTag=` plus a context
directory, see `SetWorkingDirectory=`, or a `File=` pointing to a
Containerfile.
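For example, a minimal hypothetical .build file along those lines:

```bash
cat > ~/.config/containers/systemd/demo.build <<'EOF'
[Build]
ImageTag=localhost/demo:latest
# use the unit's own directory as build context
SetWorkingDirectory=unit
File=Containerfile
EOF
```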
After sorting .build files into the Quadlet dependency order, there
remains a possible dependency cycle issue between .volume and .build
files: A .volume can have `Image=some.build`, and a .build can have
`Volume=some.volume:/some/volume`.
We solve this dependency cycle by prefilling resourceNames with all
image names from .build files before converting all the unit files.
This results in an issue for the test suite, though: for .volumes
depending on *.image or *.build, we need to copy these additional
dependencies to the test's quadletDir, otherwise the test will fail.
This is necessary, because `handleImageSource()` actually needs to know
the image name defined in the referenced *.{build,image} file. It cannot
fall back on the default names, as it is done for networks or volumes,
for example.
Signed-off-by: Johannes Maibaum <jmaibaum@gmail.com>
Defining a timer with a fixed interval is not a good idea, as we first
have to wait until the timer triggers; with an interval of every two
seconds, we have to wait at least 2s for it to start. Much worse, it
means the timer triggers the unit over and over, which seems to cause
some sort of race with the output check. I have seen this test run
10-60s, which does not make much sense.
Switching the timer to trigger once on start seems to make the test run
consistently in 7s locally for me, so this is much better.
There still is the question of whether we really have to test this at
all on each upstream PR, but I left it for now.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It takes over 10 seconds for this test, as it uses --wait 5 twice, which
runs into the timeout. IMO this test is just redundant, as it is already
covered much better in the e2e tests. Thus remove it here.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The new c/image version returns a slightly different error message[1],
so make the tests use the new one.
[1] https://github.com/containers/image/pull/2408
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When an empty volume is mounted into a container, Docker will
chown that volume appropriately for use in the container. Podman
does this as well, but there are differences in the details. In
Podman, a chown is presently a one-and-done deal; in Docker, it
will continue so long as the volume remains empty. Mount it into a
dozen containers but never add content, and the chown occurs every
time. The chown is also linked to copy-up; it will always occur
when a copy-up occurred, despite the volume then no longer being empty.
This PR changes our logic to (mostly) match Docker's.
For some reason, the chowning also stops if the volume is chowned
to root at any point. This feels like a Docker bug, but as they
say, bug for bug compatible.
In retrospect, using bools for NeedsChown and NeedsCopyUp was a
mistake. Docker isn't actually tracking this stuff; they're just
doing a copy-up and permissions change unconditionally as long as
the volume is empty. They also have the two linked as one
operation, seemingly, despite happening at very different times
during container init. Replicating that in our stateful system is
nontrivial, hence the need for the new CopiedUp field. Basically,
we never want to chown a volume with contents in it, except if
that data is a result of a copy-up that resulted from mounting
into the current container. Tracking who did the copy-up is the
easiest way to do this.
Fixes #22571
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Now WaitForExit returns the exit code as stored in the db instead of
returning an error when the container was removed.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If a container unit starts on boot with a dependency on `default.target`,
the image unit may start too soon, before the network is ready. This
causes the unit to fail to pull the image.
- Add a dependency on `network-online.target` to make sure image pulls
don't fail.
See https://github.com/containers/podman/issues/21873
- Document the hardcoded dependency on `network-online.target` for image
units and explain how it can be overridden if necessary.
- tests/e2e/quadlet: Add `assert-last-key-regex`
Required to test the `After=` override in the [Unit] section
See https://github.com/containers/podman/pull/22057#issuecomment-2008959993
- quadlet/unitfile: add a prependUnitLine method
Requirements on networks should be inserted at the top of the
section so the user can override them.
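Because the default is prepended, a unit should be able to override it with the usual systemd list-reset idiom (a hedged sketch; the service name is made up):

```bash
cat > ~/.config/containers/systemd/app.image <<'EOF'
[Unit]
# empty assignment clears the prepended default dependency...
After=
# ...then set a custom ordering
After=my-network-setup.service

[Image]
Image=quay.io/example/app:latest
EOF
```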
Signed-off-by: jbtrystram <jbtrystram@redhat.com>
The default_addr shell function in test/system/helpers.network is used to
get the host's default address, which is used in a number of pasta
networking tests. However, in certain circumstances it can incorrectly
pick a deprecated address as the primary address. Correct it to exclude
those.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We have to exclude the IPs in the rootless netns, as they are not the
host. Now, that fix only works if there is more than one IP available on
the host; if there is only one, we do not set the entry at all, which I
consider better, as failing to resolve this name is a much better error
for users than connecting to a wrong IP. It also matches what
--network pasta already does.
The test is a bit more complicated than I would like; however, it must
deal with both cases (one IP, more than one), so there is no way around
it, I think.
Fixes #22653
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
They are not run in CI and to my knowledge are not used by anyone; we
have many more and better tests in test/e2e and test/system that should
cover everything done in these scripts, so just delete them to not
confuse contributors.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
First, point users to hack/bats for running the tests locally. Second,
remove TODO.md, as it doesn't contain any helpful information: basically
all the missing tests there have been added, so it does not serve any
purpose and is misleading.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Wait for the healthy status on the thread where the container lock is
held. Otherwise, if it is performed from a goroutine, a different
thread is used (since the runtime.LockOSThread() call doesn't have any
effect), causing pthread_mutex_unlock() to fail with EPERM.
Closes: https://github.com/containers/podman/issues/22651
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The function that's handing us events will return an error after closing
the channel over which it's sending events, and its caller (in its own
goroutine) will then send that error over another channel.
The logic that started the goroutine is likely to notice that the events
channel is closed before noticing that the error channel has a result
for it to read, so any error that would have been communicated would be
lost.
When we finish reading events, check if the reader returned an error
before telling our caller that there was no error.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
The v5 API made a breaking change for podman inspect; this means that an
old client could no longer parse the result from the new 5.X server. The
other way around, new client and old server, already worked.
As it turned out, several users ran into this; one way to hit it is
using an old 4.X podman machine which now pulls a newer coreos with
podman 5.0. But there are also other users running into it.
In order to keep the API working, we now have a version check and return
the old v4-compatible payload, so the old remote client can still work
against a newer server, thus removing any major breaking change for an
old client.
Fixes #22657
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Under some circumstances podman might be executed with a different argv0
than the actual path to the podman binary. This breaks the reexec logic,
as it tried to exec argv0, which failed.
This is visible when using podmansh as a login shell, which gets the
special -podmansh as argv0 to signal the shell that it is a login shell.
To fix this, we can simply use /proc/self/exe as the command path, which
is much more robust, and the argv array is still passed correctly.
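The idea, sketched in shell (the actual fix lives in the Go reexec code):

```bash
# argv[0] may be anything, e.g. "-podmansh" for login shells, but
# /proc/self/exe always resolves to the running binary
exec /proc/self/exe "$@"
```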
Fixes #22672
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Final followup to #22270. That PR added a temporary convention
allowing a new form of ExitWithError(), one with an exit code
and stderr substring. In order to allow bite-size progress,
the old no-args form was still allowed. This PR removes
support for no-args ExitWithError().
This PR also adds one piece of new functionality: passing ""
(empty string) as the stderr arg means "expect exit code
but fail if there's anything at all in stderr".
Signed-off-by: Ed Santiago <santiago@redhat.com>
Follow-up to commit eaf60c7fe7: with the toolbox image removal it is
possible to run all tests from tmpfs.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles only one file, test/e2e/rmi_test.go , because
my changes are significant enough to merit individual review.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles all remaining test/e2e/r*_test.go
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles test/e2e/s*_test.go
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles a subset of test/e2e/run_xxx_test.go
Signed-off-by: Ed Santiago <santiago@redhat.com>
The scenario for inducing this is as follows:
1. Start a container with a long stop timeout and a PID1 that
ignores SIGTERM
2. Use `podman stop` to stop that container
3. Simultaneously, in another terminal, kill -9 `pidof podman`
(the container is now in ContainerStateStopping)
4. Now kill that container's Conmon with SIGKILL.
5. No commands are able to move the container from Stopping to
Stopped now.
The cause is a logic bug in our exit-file handling logic. Conmon
being dead without an exit file causes no change to the state.
Add handling for this case that tries to clean up, including
stopping the container if it still seems to be running.
Fixes #19629
Signed-off-by: Matt Heon <mheon@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles all remaining test/e2e/p*_test.go
Signed-off-by: Ed Santiago <santiago@redhat.com>
This never tested what it said it did: the command line was wrong, so
`,ro=false` was taken as the image, causing an error. What this actually
should care about is that a glob is taken as-is and not evaluated.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Followup to #22270 : wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
A small number of tests were broken, as in, not actually testing
what they claimed to be testing. I've done my best to fix those.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles test/e2e/play_kube_test.go
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles test/e2e/v*_test.go
Signed-off-by: Ed Santiago <santiago@redhat.com>
The image is way too big (over 800MB), which slows tests down, as we
always have to pull it; the tests themselves are also super slow due to
the entrypoint logic that we don't care about. We should be testing for
features needed and not specific tools.
I think the current changes should have similar coverage in terms of
podman features; it no longer tests toolbox, but IMO this never was a
task for podman CI tests.
The main driver for this is to make the tests run entirely based on
tmpfs, and this image is just too much[1].
[1] https://github.com/containers/podman/pull/22533
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Some programs have their configuration files relative to the user's
home. It would be convenient to be able to mount these into the
container, but that requires expansion of `~` or `$HOME` in a label.
This commit adds support for that for the `runlabel` command.
Signed-off-by: Dan Čermák <dcermak@suse.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
This commit handles a subset of test/e2e/pod_xxxx_test.go
(I stopped before this grew too huge for review)
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #22270: wherever possible/practical, extend command
error checks to include explicit exit status codes and error strings.
Signed-off-by: Ed Santiago <santiago@redhat.com>
..to match the version in root dir, to get rid of the mismatch
warning on every ginkgo run.
I still don't understand why renovatebot isn't doing this.
(Also, touch a file under e2e, to force tests to run)
Signed-off-by: Ed Santiago <santiago@redhat.com>
Because the test left the image mounted, the cleanup failed to remove
the tmpdir, as it contained an active mount point. Thus, ensure we
unmount the image again to prevent this leak.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Using /tmp means this file will be leaked and not deleted; switch to
using the per-test tempdir, which is removed after the test.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
TMPDIR is typically /tmp which is typically(*) a tmpfs.
This PR ignores $TMPDIR when $CI is defined, forcing all
e2e tests to set up one central working directory in /var/tmp
instead.
Also, lots of cleanup.
(*) For many years, up to and still including the time of
this PR, /tmp on Fedora CI VMs is actually NOT tmpfs,
it is just / (root). This is nonstandard and undesirable.
Efforts are underway to remove this special case.
Signed-off-by: Ed Santiago <santiago@redhat.com>