When podman restarts config values within the Engine are lost.
Add --hook-dirs arguments as appropriate to the cleanup command
so that hooks are preserved on restarts due to the on-restart setting
Tests: add a check that prestart/poststop hooks ran every time after 2
restarts.
`wait_for_restart_count` was re-used to wait for restarts and moved to
helpers file.
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Fixes: #17935
This commit removes the code to build a local pause
image from the Containerfile. It is replaced with
code to find the catatonit binary and include it in
the Rootfs.
This removes the need to build a local pause container
image.
The same logic is also applied to createServiceContainer
which is originally also based on the pause image.
Fixes: #23292
Signed-off-by: Jan Kaluza <jkaluza@redhat.com>
Historically, non-schema1 images had a deterministic image ID == config digest.
With zstd:chunked, we don't want to deduplicate layers pulled by consuming the
full tarball and layers partially pulled based on TOC, because we can't cheaply
ensure equivalence; so, image IDs for images where a TOC was used differ.
To accommodate that, compare images using their configs digests, not using image IDs.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
...for debugging #24147, because "md5sum mismatch" is not
the best way to troubleshoot bytestream differences.
socat is run on the container, so this requires building a
new testimage (20241011). Bump to new CI VMs[1] which include it.
[1] https://github.com/containers/automation_images/pull/389
Signed-off-by: Ed Santiago <santiago@redhat.com>
The current mypod hack breaks down when running individual tests:
$ hack/bats 010 <<< barfs because it does not want pause-image!
Reason: Bats does not provide any official way to tell if tests
are being run in parallel.
Workaround: use an undocumented way.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Yield to reality: if $XDG_RUNTIME_DIR is unset, assume a
reasonable default (rootless only). This clears up a
common failure in Fedora gating tests, and will probably
prevent future time wasters.
Signed-off-by: Ed Santiago <santiago@redhat.com>
...not just when running parallel Bats, because Bats
does not provide any way to know if we're parallel.
Signed-off-by: Ed Santiago <santiago@redhat.com>
For tests run in parallel, show file number as |nnn| (vs [nnn])
Teach logformatter to distinguish the two, adding 'p' to anchors
in parallel tests. Necessary because in this scheme we run bats
twice, thus see 'ok 1' twice, and we want to differentiate them.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Workaround for #23292, where simultaneous 'pod create' commands
will all start a podman-build of the pause image, but only
one of them will be tagged, and the others will leak <none>
images.
Signed-off-by: Ed Santiago <santiago@redhat.com>
For the past two months we've been splitting system tests
into two categories: those that CAN be run in parallel,
and those that CANNOT. Much work has been done to replace
hardcoded names (mycontainer, mypod) with safename().
Hundreds of test runs, in CI and on Ed's laptop, have
proven this approach viable.
make {local,remote}system now runs in two steps: first
the serial ones, then the parallel ones. hack/bats will
now recognize the 'ci:parallel' tag and add --jobs (nprocs).
This requires some tweaking of leak_check, because there
can be umpteen tests running (affecting image/container/pod/etc
state) when any given test completes.
Rules for enabling parallelization in tests:
* use unique container/pod/volume/network names (safename)
* do not run 'podman rm -a' or 'rmi -a'
* never use the -l (--latest) option
* do not run 'podman ps/images' and expect precise output
Signed-off-by: Ed Santiago <santiago@redhat.com>
...by using a crude port lock-and-reserve mechanism. This is
a small cherrypick from code that has been working in #23275
over dozens of CI runs. Am separating out into a small PR
because it's stable, harmless to serial runs, and will
simplify the eventual review of #23275.
Closes: #23488
Signed-off-by: Ed Santiago <santiago@redhat.com>
BATS teardown logs are unreadable, making it almost impossible
to see tiny "Leaked this-or-that" messages.
Solution: new _run_podman_quiet() helper, replaces run_podman
in a small number of cases within teardown. Clunky, and
duplicative, sorry.
New helper for leak_check, basically spits out warnings (and
bumps error count) if it sees any output whatsoever from
individual "podman XXX ls" commands.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Make safename() invocations consistent within the same
test. This puts the onus on the caller to add a unique
element when calling multiple times, e.g. "ctr1-$(safename)".
This is not too much of a burden. Major benefit is making
it easy for a reader to associate containers, pods, volumes,
images within a given test.
And, use dashes, not underscores. "podman generate kube"
removes underscores, making it very difficult to do
things like "podman inspect $podname" (because we need
to generate "$podname_with_underscores_removed")
Signed-off-by: Ed Santiago <santiago@redhat.com>
Many system tests use hardcoded names for containers, images,
and everything. This has worked because system tests run
serially. It will not work if we ever run in parallel.
Create a new safename() helper, and use it as follows:
myctr=c_$(safename)
myvol1=v1_$(safename)
...
Find current instances of hardcoded names, and replace
with safe ones.
Whether or not we ever end up parallelizing system tests,
this is simply good practice.
There are far too many instances to fix in one (reviewable) PR.
This is commit 1 of N.
Signed-off-by: Ed Santiago <santiago@redhat.com>
At the end of all tests always check for leaks. That should make us more
robust against adding tests at the end that would leak stuff otherwise.
TODO: something seems wrong with bats when returning an error in
teardown_suite(), it prints a warning:
bats warning: Executed <NUM+1> instead of expected <NUM> tests
And also the output is formatted weirdly in this case where the podman
args are split over multiple lines.
But the test fails as expected so I don't think it is a problem.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
While these are not really slow they still take about 100-250ms if I
time this locally. Given they are run for every test this adds up
quickly. Looking at CI logs I can see the timings for skipped
tests are all in 600ms range. So I think it is safe to assume that these
functions need to get faster.
We have over 670 test cases currently so we talk about over 400s spend
in these functions in CI. This allows for big gains.
Now overall this is a tricky trade of, while all tests should cleanup
after themselves there is no guarantee for that as such errors can be
leaked into other tests making debugging much harder. To work at least a
bit against this teardown checks if the test was successful and only
skips the podman commands bases on that. Without it a single flake could
cause all following tets to fail.
As such this commit does the proper setup once one suite start then only
after a test failed.
In order for this to work at all we have to fix all leaks first, see
previous commits. And then for the future keep a very strong eye on
this during reviews.
Also add a PODMAN_BATS_LEAK_CHECK option
By default test must cleanup themselves and to speed up CI we no longer
do any cleanup in teardown by default. However there is still many cases
where we might have to debug a leak so add a new PODMAN_BATS_LEAK_CHECK
env option that can be set and should cause teardown to fail if the test
did not cleanup properly.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Testing `podman system check` requires that we have a way to
intentionally introduce storage corruptions. Add a hidden `podman
testing` command that provides the necessary internal logic in
subcommands. Stub out the tunnel implementation for now.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Simply because it's been a while since the last testimage
build, and I want to confirm that our image build process
still works.
Added /home/podman/healthcheck. This saves us having to
podman-build on each healthcheck test. Removed now-
unneeded _build_health_check_image helper.
testimage: bump alpine 3.16.2 to 3.19.0
systemd-image: f38 to f39
- tzdata now requires dnf **install**, not reinstall
(this is exactly the sort of thing I was looking for)
PROBLEMS DISCOVERED:
- in e2e, fedoraMinimal is now == SYSTEMD_IMAGE. This
screws up some of the image-count tests (CACHE_IMAGES).
- "alter tarball" system test now barfs with tar < 1.35.
TODO: completely replace fedoraMinimal with SYSTEMD_IMAGE
in all tests.
Signed-off-by: Ed Santiago <santiago@redhat.com>
- tmpfs + noswap test: requires noswap feature in kernel.
Check for it, and skip if unimplemented. (Root only.
Rootless test works regardless of kernel).
- podman generate systemd tests: always use --files option,
because otherwise the "DEPRECATED" warning gets written
to the systemd unit file.
- kube play tests: yikes. Fix longstanding bugs when checking
for containers running. This revealed a longstanding bug
in one test: multi-pod YAML never actually worked. Fixed now.
- run_podman(): that new check-for-warnings code we added
in #19878, duh, I skipped it on Debian but should've skipped
when *runc*. Do so now and update the comment. Requires
minor surgery to podman_runtime() helper to avoid
infinite recursion.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Our test registry (used for login & local registry tests)
was being run using the standard podman tmpdir, hence the
standard podman database, This was then getting clobbered
in the 330-corrupt-images test, which runs "system reset".
We just didn't know this was happening. Until we added
a registry test after the system reset. Oops.
Solution: new helper function podman_isolation_opts()
sets --root, --runroot, *and --tmpdir*. Refactor all
existing --root/--runroot usages. Document.
Next problem: the "network reload" test in 500-networking.bats
did not (could not) know about our registry port, so the
"iptables -F" command reverted that to DROP, so the subsequent
podman-auth in 700-play timed out.
Solution: add a podman-isolated "network reload" to start_registry().
Final problem, because, really, those weren't enough: a BATS
bug where running with --filter-tags would set IFS=',' in setup_suite
which in turn has catastrophic consequences:
https://github.com/bats-core/bats-core/issues/812
See #20966 for details of the failure and further conversation.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Very rare flake, probably caused by my nemesis, podman run -d
Solution: keep the sleep-1 (vs using nanosecond resolution),
but make sure we first wait for the output from the container.
Also, bump down the iteration delay in wait_for_output, from 5s to 1.
Thanks to Paul for noticing that.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #20797 (defer assertion failures). The bail-now()
helper was being defined only in setup() ... and some tests,
particularly 001-basic.bats, define their own minimalist setup().
Symptom was "bail-now: command not found", which still caused
test to fail (so no failures were hidden) but led to concern
and wasted time when analyzing failures.
Solution: add one more definition of bail-now(), in outer scope.
There is still one pathological case I'm not addressing: a
bats file that defines its own teardown() which does not invoke
basic_teardown(), then has a test that runs defer-assertion-failures
without a followup immediate-assertion-failures. This would lead
to failures that are never seen. Since teardown() without basic_teardown()
is invalid, I choose not to worry about this case.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add support for .pod unit files with only PodmanArgs, GlobalArgs, ContainersConfModule and PodName
Add support for linking .container units with .pod ones
Add e2e and system tests
Add to man page
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
Some system tests run deep loops:
for x in a b c; do
for y in d e f; do
.... check condition $x + $y
Normally, if one of these fails, game over. This can be frustrating
to a developer looking for failure patterns.
Here we introduce a new defer-assertion-failure function, meant
to be called before loops like these. Everything is the same,
except that tests will continue running even after failure.
When test finishes, or if test runs immediate-assertion-failure,
a new message indicates that multiple tests failed:
FAIL: X test assertions failed. Search for 'FAIL': above this line.
Signed-off-by: Ed Santiago <santiago@redhat.com>
We're only testing vfs in CI. That's bad. #18822 tried to
remedy that but that only worked on system tests, not e2e.
Here we introduce CI_DESIRED_STORAGE, to be set in .cirrus.yml
in the same vein as all the other CI_DESIRED_X. Since it's 2023
we default to overlay, testing vfs only in priorfedora.
Fixes required:
- e2e tests:
- in cleanup, umount ROOT/overlay to avoid leaking mounts
- system tests:
- fix a few badly-written tests that assumed/hardcoded overlay
- buildx test: add weird exception to device-number test
- mount tests: add special case code for vfs
- unprivileged test: disable one section that is N/A on vfs
Signed-off-by: Ed Santiago <santiago@redhat.com>
This is something I've long wanted in logs: an indicator of
which bats file the test lives in. As of v1.7.0 there is
now a way to do that, BATS_TEST_NAME_PREFIX. Use it. Logs
now look like:
ok 14 [001] podman - shutdown engines
ok 15 [005] podman info - basic test
...
not ok 195 [065] podman cp - dot notation ....
(As a bonus, we can remove the super-long "test blah blah pasta"
duplication from 505.bats).
Also, removed no-longer-necessary (fingers crossed) debug code
for the recently fixed containers-storage umount/EINVAL flake.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #20394. For years (since BATS 1.5) we've been
seeing and ignoring nasty red warnings at the end of every
system test run. Thanks for fixing it, @giuseppe! But it
broke down in the '?' case when $expected_rc is empty:
test/system/helpers.bash: line 345: [: -eq: unary operator expected
Simple fix.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Part of RUN-1906.
Followup to #19878 (check stderr in system tests): allow_warnings()
and require_warning() functions to make sure no unexpected messages
fall through the cracks.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #20016:
- remove obsolete (misleading) comment
- prune dangling <none>:<none> image
Also, in kube test, rmi pause_image to avoid nasty red warnings
Also, ouch, fix a stupid that I introduced in #19878: the PODMAN
command path got dropped from log messages.
Signed-off-by: Ed Santiago <santiago@redhat.com>
With few exceptions, commands that exit 0 should not emit any
messages with level=warning or =error. Let's start enforcing
that in run_podman.
Allow one-off exceptions, typically when we're testing an
actual warning condition (usual case: "podman stop" where it
times out to SIGKILL). Exceptions are specified via:
run_podman 0+w subcommand...
^^^---- or, rarely, 0+e
"0" stands for "expect exit status 0", which is the default
so it's implicit anyway. The +w / +e (or even +we) is the
new part. I have added it to tests where necessary.
And, because life is what it is, add two global exceptions:
- Debian. Because runc has too many flakes.
- kube. Ditto. Kube commands emit lots of nasty error
messages (yes, level=error) that don't seem to affect
results.
Similar to #18442
Signed-off-by: Ed Santiago <santiago@redhat.com>
Unexplained infrequent flakes in sdnotify system tests,
waiting for READY=1.
Hypothesis: race condition between the container sending
the READY string and that string making it through conmon
and socat into the log file.
Solution: don't just check once; keep trying in a loop.
Write a reusable wait_for_file_content() helper function,
and clean up a bunch more tests as long as we're at it.
Fixes: #19724
Signed-off-by: Ed Santiago <santiago@redhat.com>
- the "podman {run,exec} /etc" test: runc now spits out
"is a directory" instead of "permission denied". And,
on exec, exits 255 instead of 126. Deal with it.
- workaround for https://github.com/containers/skopeo/issues/823
(skopeo XDG bug): always make sure XDG is defined for skopeo
Signed-off-by: Ed Santiago <santiago@redhat.com>
To silence my find-obsolete-skips script, remove the '#'
from the following issues in skip messages:
#11784#15013#15025#17433#17436#17456
Also update the messages to reflect the fact that the issues
will never be fixed.
Also remove ubuntu skips: we no longer test ubuntu.
Also remove one buildah skip that is no longer applicable:
Fixes: #17520
Signed-off-by: Ed Santiago <santiago@redhat.com>
The podman-login tests have accumulated much cruft over the
years, because that's the only place where we run a local
registry, and the process was crufty: we actually start/stopped
the registry as the first & last tests of the file. Meaning,
you couldn't do 'hack/bats 150:just-one-test' because that
would skip the registry start. And just now, a completely
unrelated test has had to be shoved into the login file.
This PR revamps the whole thing, by adding a new registry helper
module that can be used anywhere. And, once the registry is
started, it just stays running until the end of tests. (This
requires BATS 1.7 or greater).
Signed-off-by: Ed Santiago <santiago@redhat.com>
Add new _prefetch helper for fetching and caching images.
Use it in a few places, most importantly 120-load.bats
where our teardown() now runs 'rmi -af'.
Reason: in #17911 we discovered that podman save + load do
not actually preserve the image: annotations and other metadata
are lost. This means that a test which runs after 120-load.bats
is operating on a different $IMAGE than a test which runs before.
This is not a problem except in very obscure corner cases, like
one fixed in #18542, but it seems irresponsible to just handwave
that issue away
The _prefetch function uses skopeo for fetching and saving
images, because skopeo preserves digests and metadata.
[Side note for posterity: I tried amending basic_setup() to
always rmi -a + prefetch, instead of the current images -a +
rmi unwanted ones. That slowed down system tests by 10 minutes,
presumably because loads are much slower than queries. I reverted
that change and am documenting it as a reminder of why we do things
the way we do.]
Signed-off-by: Ed Santiago <santiago@redhat.com>
for #18514: if we get a timeout in teardown(), run and show
the output of podman system locks
for #18831: if we hit unmount/EINVAL, nothing will ever work
again, so signal all future tests to skip.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The current way of bind mounting the host timezone file has problems.
Because /etc/localtime in the image may exist and is a symlink under
/usr/share/zoneinfo it will overwrite the targetfile. That confuses
timezone parses especially java where this approach does not work at
all. So we end up with an link which does not reflect the actual truth.
The better way is to just change the symlink in the image like it is
done on the host. However because not all images ship tzdata we cannot
rely on that either. So now we do both, when tzdata is installed then
use the symlink and if not we keep the current way of copying the host
timezone file in the container to /etc/localtime.
Also note that we need to rebuild the systemd image to include tzdata in
order to test this as our images do not contain the tzdata by default.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2149876
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Instrument system tests in hopes of tracking down #17216,
the unlinkat-ebusy-hosed flake.
Oh, also, timestamp.awk: timestamps have always been UTC, but
add a 'Z' to make it unambiguous.
Signed-off-by: Ed Santiago <santiago@redhat.com>
In run_podman(), display a nanosecond-level timestamp next to
each command and its output.
Because this clutters the results, teach logformatter to grok
these new timestamps, strip them, and display a more human-readable
time delta in the left-hand timestamp column. logformatter started off
as a mess and is now, well, 🤮. I'm sorry. I just hope its results
make it worthwhile.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Several tweaks to see if we can track down #17216, the unlinkat-ebusy
flake:
- teardown(): if a cleanup command fails, display it and its
output to the debug channel. This should never happen, but
it can and does (see #18180, dependent containers). We
need to know about it.
- selinux tests: use unique pod names. This should help when
scanning journal logs.
- many tests: add "-f -t0" to "pod rm"
And, several unrelated changes caught by accident:
- images-commit-with-comment test: was leaving a stray image
behind. Clean it up, and make a few more readability tweaks
- podman-remote-group-add test: add an explicit skip()
when not remote. (Otherwise, test passes cleanly on
podman local, which is misleading)
- lots of container cleanup and/or adding "--rm" to run commands,
to avoid leaving stray containers
Signed-off-by: Ed Santiago <santiago@redhat.com>
- fix a typo that was resulting in a test being a NOP, and
add actual testing to it.
- fix two Expects() with incorrectly-ordered actual/expects
- remove leading whitespace from an It() test name
- To(BeTrue()) is evil. Wherever possible, replace it with
useful string or field checks. When not possible, use
the annotation field to indicate what failed. I got
carried away here, #sorrynotsorry
- remove unused system-test code
Signed-off-by: Ed Santiago <santiago@redhat.com>
On cgroup v1 we need to mount only the systemd named hierarchy as
writeable, so we configure the OCI runtime to mount /sys/fs/cgroup as
read-only and on top of that bind mount /sys/fs/cgroup/systemd.
But when we use a private cgroupns, we cannot do that since we don't
know the final cgroup path.
Also, do not override the mount if there is already one for
/sys/fs/cgroup/systemd.
Closes: https://github.com/containers/podman/issues/17727
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
...safer, too: the big change is using 'mapfile' to split
multiline strings; this preserves empty lines, making it
easy to see spurious (or missing) blank lines in output.
Another change is to indent the expected-output string
consistently, for readability.
Then, to handle \r (CR) and other control characters, use
bash %q to format special chars. But %q makes\ it\ hard\ to
read\ lines\ with\ spaces, so strip off those backslashes.
This makes assert() much larger and uglier, but this is
code that shouldn't be touched often.
Finally, because these are big changes to critical code,
write a complicated regression test suite for assert().
Signed-off-by: Ed Santiago <santiago@redhat.com>
Tests constantly fail with zero indication of why. Fix that.
- add correct default for $QUADLET path
- add check to make sure it exists
- log quadlet commands and their output
Signed-off-by: Ed Santiago <santiago@redhat.com>
Red Hat registry is too unreliable. (As of this writing
in January 2023, quay.io is not much better, but this is
a new flake. Ubi has been flaking for a year or more).
Instead of UBI, use the new systemd-image added to system tests
in #16814. Since this reduces the number of cached images,
a few unrelated tests (image count) need to be tweaked.
And, sigh, Fedora systemd colorizes boot messages by default,
causing a failure where we don't see an expected Reached Target
message. I don't want to rely on ASCII formatting codes, so
I've updated the build-systemd-image script so it disables
systemd colors, and have built a new systemd-image:20230106.
Made a few small usability improvements to the script as well.
Closes: #16695
Signed-off-by: Ed Santiago <santiago@redhat.com>