Quadlet tests and some systemd tests leak unit files, as
reported by 'systemctl list-units --failed'. Clean them up.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Mostly just switch to safename. Rewrite setup() to guarantee
unique service file names, atomically created.
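A minimal sketch of what "unique, atomically created" can look like in setup() (names, paths, and the mktemp approach are hypothetical, not the actual implementation):
```
setup() {
    basic_setup
    # mktemp creates the file atomically, so parallel tests cannot collide
    quadlet_file=$(mktemp --tmpdir="$PODMAN_TMPDIR" --suffix=.container "$(safename)-XXXXXX")
}
```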
* IMPORTANT NOTE: enabling parallelization on these tests
triggers #24010 ("fragment file" flake), but only on my
f40 laptop. I have never seen the flake in Cirrus despite
many many runs in #23275. I am submitting this for review
and merging because even though _something_ is broken,
this breakage is unlikely to affect our CI.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The tests didn't actually check anything, because default_ifname requires
an IP version argument to work. As a result, pasta_iface was empty; add new
checks to prevent this kind of error in the future.
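Roughly, the corrected call plus the new guard might look like this (illustrative only):
```
pasta_iface=$(default_ifname 4)
assert "$pasta_iface" != "" "pasta_iface must not be empty"
```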
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
- fix test name to reflect that it's not pasta-only
(followup from #21563)
- in one podman-update test run in OpenQA, defer assertion
failures so we can gather better data on regressions.
This would've been helpful in diagnosing bz2281805.
- add an error-message check to one test that needed it
(found by accident)
- add distro-integration test tag to a handful of new tests,
so they run in OpenQA. Found via 'git diff 33891e8 test/system'
and scanning for '^\+@test '. I only added tests that IMO
have some risk of interacting poorly with kernel or systemd
updates, e.g. quadlet, modules, tmpfs+noswap.
Signed-off-by: Ed Santiago <santiago@redhat.com>
This container did not react to SIGTERM, so we always waited 10s for it
to stop. Also, do not wait a fixed 2s for the logs; use a retry loop instead.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The test does a normal stop on a command that does not react to SIGTERM.
As I cannot fix the system stop logic, use a command which does react. This
saves us 10s as it no longer waits for the timeout.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Three infrequent flakes. Add debug code to help track
them down if/when they happen again.
And, for one of them, fix a logic bug that will save us 8-10s
on system test runs.
Signed-off-by: Ed Santiago <santiago@redhat.com>
This vendors the latest c/common version, including making Pasta
the default rootless network provider. That broke a number of
tests, which have been fixed as part of this PR.
Also includes a change to network stats logic, which simplifies
the code a bit and makes it actually work with Pasta.
Signed-off-by: Matt Heon <mheon@redhat.com>
Simply because it's been a while since the last testimage
build, and I want to confirm that our image build process
still works.
Added /home/podman/healthcheck. This saves us having to
podman-build on each healthcheck test. Removed now-
unneeded _build_health_check_image helper.
testimage: bump alpine 3.16.2 to 3.19.0
systemd-image: f38 to f39
- tzdata now requires dnf **install**, not reinstall
(this is exactly the sort of thing I was looking for)
PROBLEMS DISCOVERED:
- in e2e, fedoraMinimal is now == SYSTEMD_IMAGE. This
screws up some of the image-count tests (CACHE_IMAGES).
- "alter tarball" system test now barfs with tar < 1.35.
TODO: completely replace fedoraMinimal with SYSTEMD_IMAGE
in all tests.
Signed-off-by: Ed Santiago <santiago@redhat.com>
- tmpfs + noswap test: requires noswap feature in kernel.
Check for it, and skip if unimplemented. (Root only.
Rootless test works regardless of kernel).
- podman generate systemd tests: always use --files option,
because otherwise the "DEPRECATED" warning gets written
to the systemd unit file.
- kube play tests: yikes. Fix longstanding bugs when checking
for containers running. This revealed a longstanding bug
in one test: multi-pod YAML never actually worked. Fixed now.
- run_podman(): that new check-for-warnings code we added
in #19878: duh, I skipped it on Debian but should've skipped
when the runtime is *runc*. Do so now and update the comment. Requires
minor surgery to the podman_runtime() helper to avoid
infinite recursion.
Signed-off-by: Ed Santiago <santiago@redhat.com>
When a systemd-related system test fails, we usually get:
systemctl start foo
FAILED exit status 1, try 'systemctl --status' or 'journalctl -xe'
That makes it impossible to debug flakes.
Solution: new systemctl_start() [note underscore], to be used
instead of systemctl <SPACE> start. On failure, will run log
commands.
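A minimal sketch of the idea (not the exact implementation):
```
systemctl_start() {
    if systemctl start "$@"; then
        return
    fi
    # on failure, dump diagnostics so flakes can actually be debugged
    echo "# 'systemctl start $*' failed"
    systemctl status "$@" || true
    journalctl -xe        || true
    false
}
```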
Signed-off-by: Ed Santiago <santiago@redhat.com>
Part of RUN-1906.
Followup to #19878 (check stderr in system tests): allow_warnings()
and require_warning() functions to make sure no unexpected messages
fall through the cracks.
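Hypothetical usage (the exact warning pattern here is just an example):
```
run_podman stop -t0 mycontainer
require_warning "StopSignal SIGTERM failed to stop container .* in 0 seconds, resorting to SIGKILL"
```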
Signed-off-by: Ed Santiago <santiago@redhat.com>
With few exceptions, commands that exit 0 should not emit any
messages with level=warning or =error. Let's start enforcing
that in run_podman.
Allow one-off exceptions, typically when we're testing an
actual warning condition (usual case: "podman stop" where it
times out to SIGKILL). Exceptions are specified via:
run_podman 0+w subcommand...
^^^---- or, rarely, 0+e
"0" stands for "expect exit status 0", which is the default
so it's implicit anyway. The +w / +e (or even +we) is the
new part. I have added it to tests where necessary.
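For example, a test that expects the stop-to-SIGKILL warning would be written roughly as (container name illustrative):
```
run_podman 0+w stop -t1 mycontainer
```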
And, because life is what it is, add two global exceptions:
- Debian. Because runc has too many flakes.
- kube. Ditto. Kube commands emit lots of nasty error
messages (yes, level=error) that don't seem to affect
results.
Similar to #18442
Signed-off-by: Ed Santiago <santiago@redhat.com>
To silence my find-obsolete-skips script, remove the '#'
from the following issues in skip messages:
#11784 #15013 #15025 #17433 #17436 #17456
Also update the messages to reflect the fact that the issues
will never be fixed.
Also remove ubuntu skips: we no longer test ubuntu.
Also remove one buildah skip that is no longer applicable:
Fixes: #17520
Signed-off-by: Ed Santiago <santiago@redhat.com>
Remove an outdated comment on the absence of exit-code propagation when
running K8s workloads in systemd. The `podman-kube@` systemd template
is using the default restart policy of the system. The exit-code
propagation is tested in other tests, so we can keep the logic as is.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Implement means for reflecting failed containers (i.e., those having
exited non-zero) to better integrate `kube play` with systemd. The
idea is to have the main PID of `kube play` exit non-zero in a
configurable way such that systemd's restart policies can kick in.
When using the default sdnotify-notify policy, the service container
acts as the main PID to further reduce the resource footprint. In that
case, before stopping the service container, Podman will lookup the exit
codes of all non-infra containers. The service will then behave
according to the following three exit-code policies:
- `none`: exit 0 and ignore containers (default)
- `any`: exit non-zero if _any_ container did
- `all`: exit non-zero if _all_ containers did
These values can be passed via a hidden `kube play
--service-exit-code-propagation` flag which can be used by tests and
later on by Quadlet.
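For example (invocation sketch; the file name is arbitrary):
```
$ podman kube play --service-exit-code-propagation=any ./workload.yaml
```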
In case Podman acts as the main PID (i.e., when at least one container
runs with an sdnotify-policy other than "ignore"), Podman will continue
to wait for the service container to exit and reflect its exit code.
Note that this commit also fixes a long-standing annoyance of the
service container exiting non-zero. The underlying issue was that the
service container had been stopped with SIGKILL instead of SIGTERM and
hence exited non-zero. Fixing that was a prerequisite for the exit-code
propagation to work but also improves the integration of `kube play`
with systemd and hence Quadlet with systemd.
Jira: issues.redhat.com/browse/RUN-1776
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
On cgroup v1 we need to mount only the systemd named hierarchy as
writeable, so we configure the OCI runtime to mount /sys/fs/cgroup as
read-only and on top of that bind mount /sys/fs/cgroup/systemd.
But when we use a private cgroupns, we cannot do that since we don't
know the final cgroup path.
Also, do not override the mount if there is already one for
/sys/fs/cgroup/systemd.
Closes: https://github.com/containers/podman/issues/17727
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
It makes little sense to create a log line string from the entry just to
parse it again into a LogLine. We have the typed fields, so we can
assemble the LogLine directly; this makes things simpler and more
efficient.
Also, entries from the passthrough driver do not use the CONTAINER_ID_FULL
field; instead we can just access c.ID() directly.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The passthrough driver is designed for use in systemd units. By default
we can expect systemd to log the output to journald, unless the unit sets
different StandardOutput/StandardError settings.
At the moment podman logs just errors out when the passthrough driver is
used. With this change we will read the journal for the unit messages.
The logic is actually very similar to the existing one, we just need to
change the filter. We now filter by SYSTEMD_UNIT, which equals the
container cgroup; this allows us to actually filter on a per-container
basis even when multiple containers are started in the same unit, i.e.
via podman-kube@.service.
The only difference a user will see is that journald merges
stdout/err into one stream, so we lose the separation there.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Only enforce the passthrough log driver for Quadlet. Commit 68fbebf
introduced a regression on the `podman-kube@` template as `podman logs`
stopped working and settings from containers.conf were ignored.
Fixes: #17482
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Replace existing tab indentations with spaces, and add
a test to CI to prevent new ones from sneaking in.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Commit 4fa307f149 fixed a number of issues in the sdnotify proxies.
Whenever a container runs with a custom sdnotify policy, the proxies
need to keep running, which in turn required Podman to run and wait for
the service container to stop. Improve on that behavior and set the
service container as the main PID (instead of Podman) when no container
needs sdnotify.
Fixes: #17345
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Increase the loop range from 5 to 20 to make sure we give the service
enough time to transition to inactive. Other tests have the same range
with 0.5 seconds sleeps, so I expect the new value to be sufficient and
consistent.
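The resulting wait loop is roughly the following (illustrative; variable names are not the actual ones):
```
for i in $(seq 1 20); do
    run systemctl is-active "$service_name"
    if [[ "$output" == "inactive" ]]; then
        break
    fi
    sleep 0.5
done
```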
Fixes: #17093
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Reasoning
---------
When the log-driver is passthrough, the journal socket is passed to the containers as-is, which has two advantages:
1. journald can see who the actual sender of the log event is,
rather than thinking everything comes from the conmon process
2. conmon will not have to copy all the log data
Code Changes
------------
If log-driver was not set by the user and service-container is set, use
passthrough as the default log-driver.
Update the system tests
- explicitly set logdriver in sdnotify and play tests
- podman-kube template test: Verify the default log driver for service-container
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
As outlined in #16076, a subsequent BARRIER *may* follow the READY
message sent by a container. To correctly imitate the behavior of
systemd's NOTIFY_SOCKET, the notify proxies spun up by `kube play` must
hence process messages for the entirety of the workload.
We know that the workload is done and that all containers and pods have
exited when the service container exits. Hence, all proxies are closed
at that time.
The above changes imply that Podman runs for the entirety of the
workload and will henceforth act as the MAINPID when running inside of
systemd. Prior to this change, the service container acted as the
MAINPID which is now not possible anymore; Podman would be killed
immediately on exit of the service container and could not clean up.
The kube template now correctly transitions to inactive instead of
failed in systemd.
Fixes: #16076
Fixes: #16515
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Basically, in the timeout loop where we checked for a new CID
on the restarted container, we were running 'podman inspect'
(not 'inspect --format ID'), and comparing the full hundred-line
output against a single-line CID string.
While I'm in here, add 'c_' prefix to container to make it
easier for my old eyes to recognize "oh, that's a container name"
vs "is that a name? a SHA? a woozle?"
Signed-off-by: Ed Santiago <santiago@redhat.com>
Make sure that the on-failure actions only kick in once the health check
has passed its retries. Also fix race conditions on reading/writing the
log.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Emit a warning to the user when generating a unit with --new on a
container that was created with a custom --restart policy. As shown
in #15284, a custom --restart policy in that case can lead to issues
on system shutdown where systemd attempts to nuke the unit but Podman
keeps on restarting the container.
Fixes: #15284
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
- basic : add actual log-level tests
- events : clean up, add --format tests
- systemd : reorder proxy args for legibility
- auto-update : fix missing timeout that could lead to hang
Signed-off-by: Ed Santiago <santiago@redhat.com>
For systems that have extreme robustness requirements (edge devices,
particularly those in difficult-to-access environments), it is important
that applications continue running in all circumstances. When the
application fails, Podman must restart it automatically to provide this
robustness. Otherwise, these devices may require customer IT to
physically gain access to restart, which can be prohibitively difficult.
Add a new `--on-failure` flag that supports four actions:
- **none**: Take no action.
- **kill**: Kill the container.
- **restart**: Restart the container. Do not combine the `restart`
action with the `--restart` flag. When running inside of
a systemd unit, consider using the `kill` or `stop`
action instead to make use of systemd's restart policy.
- **stop**: Stop the container.
To remain backwards compatible, **none** is the default action.
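Hypothetical invocation, assuming the flag spelling used in this commit message (image and health command are examples):
```
$ podman run -d --name web \
    --health-cmd "curl -f http://localhost:8080 || exit 1" \
    --on-failure=kill \
    quay.io/some/image
```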
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add auto-update support to `podman kube play`. Auto-update policies can
be configured for:
* the entire pod via the `io.containers.autoupdate` annotation
* a specific container via the `io.containers.autoupdate/$name` annotation
To make use of rollbacks, the `io.containers.sdnotify` policy should be
set to `container` such that the workload running _inside_ the container
can send the READY message via the NOTIFY_SOCKET once ready. For
further details on auto updates and rollbacks, please refer to the
specific article [1].
Since auto updates and rollbacks are based on Podman's systemd integration,
the k8s YAML must be executed in the `podman-kube@` systemd template.
For further details on how to run k8s YAML in systemd via Podman, please
refer to the specific article [2].
An exemplary k8s YAML may look as follows:
```YAML
apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.containers.autoupdate: "local"
    io.containers.autoupdate/b: "registry"
  labels:
    app: test
  name: test_pod
spec:
  containers:
  - command:
    - top
    image: alpine
    name: a
  - command:
    - top
    image: alpine
    name: b
```
[1] https://www.redhat.com/sysadmin/podman-auto-updates-rollbacks
[2] https://www.redhat.com/sysadmin/kubernetes-workloads-podman-systemd
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
See https://github.com/containers/conmon/pull/352
As of a few days ago, Ubuntu still hadn't built a fixed conmon.
Just skip the test until we get a fixed Ubuntu or until we
figure out a better solution to the test-something-RHEL8ish
problem.
UPDATE: WEIRD: this 'skip' triggered a baffling failure
on Ubuntu: the "Kubernetes only allows 63 characters"
warning message stopped appearing, on Ubuntu only, which
then caused the kube-generate tests to fail because they
actually checked for that. The message doesn't appear
because generate-kube is no longer spitting out a line
for org.opencontainers.image.base.digest/CONTAINER.
(Why this line is gone, I don't know, and choose not
to investigate). Solution: stop checking for the kube-63
warning. It's just not that important.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #14957, which added a new test that doesn't
actually belong in the 250-systemd.bats file. It was
copy-pasted from another test that doesn't belong there.
Move both tests to a new .bats file, because (1) they
need a custom cleanup, and (2) one of the tests should
very definitely run under podman-remote, and the 250
bats file has a global skip_if_remote().
Signed-off-by: Ed Santiago <santiago@redhat.com>
Docker supports -H and --host for specifying the listening socket. Podman
should also support them in order to match the Docker CLI.
These will not be documented since Podman defaults to using the
--url option.
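So the following should all be equivalent (socket path is an example):
```
$ podman --url unix:///run/podman/podman.sock ps
$ podman -H unix:///run/podman/podman.sock ps
$ podman --host unix:///run/podman/podman.sock ps
```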
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
With the upcoming plans of introducing a podman-kube command with
various subcommands, rename the podman-play-kube systemd template
to podman-kube before releasing it.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
I noticed 'rmi -a' in a test. I tried to fix it. Hilarity ensued.
'rmi -a' is evil: it forces a fresh pull of our test image,
which in turn almost guarantees a flake some day. We avoid
it, but once in a while it slips in.
While fixing it, I noticed a bevy of other problems that
needed cleanup.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Commit 1951ff168a introduced a check so
that conmon is not moved to a new cgroup when podman is running inside
of a systemd service. This is helpful for integrating podman into systemd,
so that the spawned conmon lives in the same cgroup as the service
that created it.
Unfortunately this breaks when the podman daemon is running in a systemd
service, since the same check is in place and thus all the conmon processes
end up in the same cgroup as the podman daemon. When the podman
daemon's systemd service stops, the conmon processes are terminated,
along with the containers they monitor.
Improve the check to exclude podman running as a daemon.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2052697
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
`podman auto-update` is now properly exercised in the system tests, so
we can safely remove the outdated TODO.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Support running `podman play kube` in systemd by exploiting the
previously added "service containers". During `play kube`, a service
container is started before all the pods and containers, and is stopped
last. The service container communicates its conmon PID via sdnotify.
Add a new systemd template to dispatch such k8s workloads. The argument
of the template is the path to the k8s file. Note that the path must be
escaped for systemd not to bark:
Let's assume we have a `top.yaml` file in the home directory:
```
$ escaped=$(systemd-escape ~/top.yaml)
$ systemctl --user start podman-play-kube@$escaped.service
```
Closes: https://issues.redhat.com/browse/RUN-1287
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Problem: the system test 'is()' checker was poorly thought out.
For example, there is no way to check for inequality or for
absence of a substring.
Solution, step 1: introduce new assert(), copied almost verbatim
from buildah, where it has been successful in addressing the
gaps in is().
The logical next step is to search the tests for 'die' and
for 'run', looking for negative assertions which we can
replace with assert(). There were a lot, and in the process
I found a number of ugly bugs in the tests themselves. I've
taken the liberty of fixing these.
Important note: at this time we have both assert() and is().
Replacing all instances of is() would be impossible to review.
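Illustrative examples of checks that is() could not express (patterns and names are examples):
```
assert "$output" != "$cid"         "output must not be the bare container ID"
assert "$output" !~ "level=error"  "no error messages expected"
```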
Signed-off-by: Ed Santiago <santiago@redhat.com>
systemd expects the container_uuid environment variable to be set
when it is running in a container.
Fixes: https://github.com/containers/podman/issues/13187
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When running podman inside systemd user units, it is possible that
systemd kills the rootless netns slirp4netns process because it was
started in the default unit cgroup. When the unit is stopped all
processes in that cgroup are killed. Since the slirp4netns process is
run once for all containers it should not be killed. To make sure
systemd will not kill the process we move it to the user.slice.
Fixes #13153
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Move the check to after the cgroup manager is set, so as to correctly detect
--cgroup-manager=cgroupfs and not raise a warning about dbus not
being present.
Closes: https://github.com/containers/podman/issues/12802
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Replace `multi-user.target` with `default.target` across the code base.
It seems like the multi-user one is not available for (rootless) users
on F35 anymore, which causes issues in all kinds of ways, for instance,
when enabling the podman.service or generated systemd units.
Fixes: #12438
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add a new flag to set the start timeout for a generated systemd unit.
To make naming consistent, add a new --stop-timeout flag as well and let
the previous --time map to it.
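Example invocation (container name and values are arbitrary):
```
$ podman generate systemd --new --start-timeout 70 --stop-timeout 10 mycontainer
```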
Fixes: #11618
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>