Red Hat registry is too unreliable. (As of this writing
in January 2023, quay.io is not much better, but this is
a new flake. Ubi has been flaking for a year or more).
Instead of UBI, use the new systemd-image added to system tests
in #16814. Since this reduces the number of cached images,
a few unrelated tests (image count) need to be tweaked.
And, sigh, Fedora systemd colorizes boot messages by default,
causing a failure where we don't see an expected Reached Target
message. I don't want to rely on ASCII formatting codes, so
I've updated the build-systemd-image script so it disables
systemd colors, and have built a new systemd-image:20230106.
Made a few small usability improvements to the script as well.
Closes: #16695
Signed-off-by: Ed Santiago <santiago@redhat.com>
...based on f37, not f31. And make it fedora-minimal so it's
smaller. And clean up dnf so it's even smaller. And tag it
with our proper YMD tag, and commit the script that builds it.
This broke the system-df tests. In the process of resolving
that, I found those tests a little lacking. So, improve their
coverage a little bit.
Signed-off-by: Ed Santiago <santiago@redhat.com>
This adds basic container and volume system tests for quadlet. These
install and run actual systemd units and ensure they work.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
The 900-ssh test is not an actual test, and I'm unable to
figure out how to make it one. Skip it for now, but add a
bunch of FIXMEs some someone can come in later and actually
implement it.
Also removed lots of dead code and misleading comments.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The main helpers.bash file is rather bloated and it's difficult to
find stuff there. Move networking functions to their own helper
file.
While at it, apply a consistent style, and rearrange logically
related functions into sections.
Suggested-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Using bash /dev/tcp/ pseudo-device files to probe for bound ports has
indeed the advantage of simplicity, but comes with a few drawbacks:
- it will actually send data to unsuspecting services that might be
running in the same network namespace as the tests, possibly
causing unwanted interactions
- it doesn't allow for UDP probing
- it makes it impossible to clearly distinguish between different
address bindings
Replace that approach with a new helper, port_is_bound(), that uses
procfs entries at /proc/net to detect bound ports, without the need
for active probing.
We can now implement optional parameters in callers, to check if a
port if free for binding to a given address, including any IPv4
(0.0.0.0) or any IPv6 (::0) address, and for a given protocol, TCP
or UDP.
Extend random_free_port() and random_free_port_range() to support
that.
The implementation of one function in the file
test/system/helpers.bash, namely ipv6_to_procfs(), and the
implementation of the corresponding own test, delimited by the
markers "# BEGIN ipv6_to_procfs" and "# END ipv6_to_procfs" in the
file test/system/helpers.c was provided, on the public forum at:
https://github.com/containers/podman/pull/16141
by Ed Santiago <santiago@redhat.com>, who expressly invited me to
include them in this code submission.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently, wait_for_port() duplicates the check logic implemented by
port_is_free().
Add an optional argument to port_is_free(), representing the bound
address to check, and call it, dropping the direct check in
wait_for_port().
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
It looks like #16132 was my fault: a missing 'wait' for a container
to exit. Let's see if this fixes the flake.
And, while poking through flake logs, I found another missing wait.
And... in wait_for_output(), address a potential race.
Signed-off-by: Ed Santiago <santiago@redhat.com>
One of the system tests was creating a volume and not cleaning up
after itself. Fix that: do cleanup in the test itself. And, add
a 'volume rm -af' to global teardown() to leave things clean for
the next tests.
Also, OOPS! Correct some instances of 'podman' in two system
tests to 'run_podman'. And remove an unused (misleading) variable.
And, one more: in auto-update test, unit file, use $PODMAN,
not /usr/bin/podman
UGH! Yet one more: found/fixed a 'run<space>podman'
Signed-off-by: Ed Santiago <santiago@redhat.com>
PR #16141 introduces a new network type, "pasta". Its tests
rely on running 'ip -j' and socat in the container. Add them.
Also: bump to alpine 3.16.2 (from 3.16.0)
Also: clean up apk cache, this saves us 2MB+ in the image
Also (unrelated): clean up two broken uses of '$(< ...)' that
are causing tests to blow up under bats 1.8 on my laptop
New testimage is 20221018 and, sigh, is 12.7MB (up 4MB).
Signed-off-by: Ed Santiago <santiago@redhat.com>
Trying to catch the wiley metacopy flake: add a debug
condition to run_podman, in system tests, to log all
instances in which output includes the metacopy warning.
The idea is to detect the very first time it happens,
and see what is triggering it.
Signed-off-by: Ed Santiago <santiago@redhat.com>
For systems that have extreme robustness requirements (edge devices,
particularly those in difficult to access environments), it is important
that applications continue running in all circumstances. When the
application fails, Podman must restart it automatically to provide this
robustness. Otherwise, these devices may require customer IT to
physically gain access to restart, which can be prohibitively difficult.
Add a new `--on-failure` flag that supports four actions:
- **none**: Take no action.
- **kill**: Kill the container.
- **restart**: Restart the container. Do not combine the `restart`
action with the `--restart` flag. When running inside of
a systemd unit, consider using the `kill` or `stop`
action instead to make use of systemd's restart policy.
- **stop**: Stop the container.
To remain backwards compatible, **none** is the default action.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This exposed a nasty bug in our system-test setup: Ubuntu (runc)
was writing a scratch containers.conf file, and setting CONTAINERS_CONF
to point to it. This was well-intentionedly introduced in #10199 as
part of our long sad history of not testing runc. What I did not
understand at that time is that CONTAINERS_CONF is **dangerous**:
it does not mean "I will read standard containers.conf and then
override", it means "I will **IGNORE** standard containers.conf
and use only the settings in this file"! So on Ubuntu we were
losing all the default settings: capabilities, sysctls, all.
Yes, this is documented in containers.conf(5) but it is such
a huge violation of POLA that I need to repeat it.
In #14972, as yet another attempt to fix our runc crisis, I
introduced a new runc-override mechanism: create a custom
/etc/containers/containers.conf when OCI_RUNTIME=runc.
Unlike the CONTAINERS_CONF envariable, the /etc file
actually means what you think it means: "read the default
file first, then override with the /etc file contents".
I.e., we get the desired defaults. But I didn't remember
this helpers.bash workaround, so our runc testing has
actually been flawed: we have not been testing with
the system containers.conf. This commit removes the
no-longer-needed and never-actually-wanted workaround,
and by virtue of testing the cap-drops in kube generate,
we add a regression test to make sure this never happens
again.
It's a little scary that we haven't been testing capabilities.
Also scary: this PR requires python, for converting yaml to json.
I think that should be safe: python3 'import yaml' and 'json'
works fine on a RHEL8.7 VM from 1minutetip.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Integrate sd-notify policies into `kube play`. The policies can be
configured for all contianers via the `io.containers.sdnotify`
annotation or for indidivual containers via the
`io.containers.sdnotify/$name` annotation.
The `kube play` process will wait for all containers to be ready by
waiting for the individual `READY=1` messages which are received via
the `pkg/systemd/notifyproxy` proxy mechanism.
Also update the simple "container" sd-notify test as it did not fully
test the expected behavior which became obvious when adding the new
tests.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
implement new ssh interface into podman
this completely redesigns the entire functionality of podman image scp,
podman system connection add, and podman --remote. All references to golang.org/x/crypto/ssh
have been moved to common as have native ssh/scp execs and the new usage of the sftp package.
this PR adds a global flag, --ssh to podman which has two valid inputs `golang` and `native` where golang is the default.
Users should not notice any difference in their everyday workflows if they continue using the golang option. UNLESS they have been using an improperly verified ssh key, this will now fail. This is because podman was incorrectly using the
ssh callback method to IGNORE the ssh known hosts file which is very insecure and golang tells you not yo use this in production.
The native paths allows for immense flexibility, with a new containers.conf field `SSH_CONFIG` that specifies a specific ssh config file to be used in all operations. Else the users ~/.ssh/config file will be used.
podman --remote currently only uses the golang path, given its deep interconnection with dialing multiple clients and urls.
My goal after this PR is to go back and abstract the idea of podman --remote from golang's dialed clients, as it should not be so intrinsically connected. Overall, this is a v1 of a long process of offering native ssh, and one that covers some good ground with podman system connection add and podman image scp.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
new file: test/e2e/config_arm64.go
Tests that fail on aarch64 have been skipped with
`skip_if_aarch64`.
Co-authored-by: Chris Evich <cevich@redhat.com>
Co-authored-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>
* Correct spelling and typos.
* Improve language.
Co-authored-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
Wrong variable. And, wrong index range. And, wrong bash
syntax for extracting end_port. And, add explicit check
for valid range, because die() inside 'foo=$(...)' will not
actually die. And, refactor some confusing code. And,
reformat/clean up a confusing and too-wide comment.
Fixes: #14854
Signed-off-by: Ed Santiago <santiago@redhat.com>
The test must ensure that all ports in the range are free not just
the first. This flakes often because port 5355 is always in use by
systemd-resolved on fedora.
Fixes#14716
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Changes:
- use --timestamp option to produce 'created' stamps
that can be reliably tested in the image-history test
- podman now supports manifest & multiarch run, so we
no longer need buildah
- bump up base alpine & busybox images
This turned out to be WAY more complicated than it should've been,
because:
- alpine 3.14 fixed 'date -Iseconds' to include a colon in
the TZ offset ("-07:00", was "-0700"). This is now consistent
with GNU date's --iso-8601 format, yay, so we can eliminate
a minor workaround.
- with --timestamp, all ADDed files are set to that timestamp,
including the custom-reference-timestamp file that many tests
rely on. So we need to split the build into two steps. But:
- ...with a two-step build I need to use --squash-all, not --squash, but:
- ... (deep sigh) --squash-all doesn't work with --timestamp (#14536)
so we need to alter existing tests to deal with new image layers.
- And, long and sordid story relating to --rootfs. TL;DR that option
only worked by a miracle relating to something special in one
specific test image; it doesn't work with any other images. Fix
seems to be complicated, so we're bypassing with a FIXME (#14505).
And, unrelated:
- remove obsolete skip and workaround in run-basic test (dating
back to varlink days)
- add a pause-image cleanup to avoid icky red warnings in logs
Fixes: #14456
Signed-off-by: Ed Santiago <santiago@redhat.com>
I have seen some system tests flake waiting for a container to
transition into a specific running state. My theory is that
the waiting time was not sufficient on nodes under high load.
Hence, increase the waiting time. Also replace the break with
a return to spare some cycles to redundantly compare with the
already checked state.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Support running `podman play kube` in systemd by exploiting the
previously added "service containers". During `play kube`, a service
container is started before all the pods and containers, and is stopped
last. The service container communicates its conmon PID via sdnotify.
Add a new systemd template to dispatch such k8s workloads. The argument
of the template is the path to the k8s file. Note that the path must be
escaped for systemd not to bark:
Let's assume we have a `top.yaml` file in the home directory:
```
$ escaped=$(systemd-escape ~/top.yaml)
$ systemctl --user start podman-play-kube@$escaped.service
```
Closes: https://issues.redhat.com/browse/RUN-1287
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add the notion of a "service container" to play kube. A service
container is started before the pods in play kube and is (reverse)
linked to them. The service container is stopped/removed *after*
all pods it is associated with are stopped/removed.
In other words, a service container tracks the entire life cycle
of a service started via `podman play kube`. This is required to
enable `play kube` in a systemd unit file.
The service container is only used when the `--service-container`
flag is set on the CLI. This flag has been marked as hidden as it
is not meant to be used outside the context of `play kube`. It is
further not supported on the remote client.
The wiring with systemd will be done in a later commit.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This reverts commit cc3790f332.
We can't change rootful to rootfull because `rootful` is written into the machine config. Changing this will break json unmarshalling, which will break existing machines.
[NO NEW TESTS NEEDED]
Signed-off-by: Ashley Cui <acui@redhat.com>
We are inconsistent on the name, we should stick with rootfull.
[NO NEW TESTS NEEDED] Existing tests should handle this and no tests for
machines exists yet.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Problem: the system test 'is()' checker was poorly thought out.
For example, there is no way to check for inequality or for
absence of a substring.
Solution, step 1: introduce new assert(), copied almost verbatim
from buildah, where it has been successful in addressing the
gaps in is().
The logical next step is to search the tests for 'die' and
for 'run', looking for negative assertions which we can
replace with assert(). There were a lot, and in the process
I found a number of ugly bugs in the tests themselves. I've
taken the liberty of fixing these.
Important note: at this time we have both assert() and is().
Replacing all instances of is() would be impossible to review.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Make sure to ignore local {container,docker}ignore files when building a
local pause image. Otherwise, we may mistakenly not be able to copy
catatonit into the build container.
Fixes: #13529
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When a test which creates a network fail it will not remove the network.
The teardown logic should remove the networks. Since there is no --all
option for network rm we use network prune --force.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This comment refers to overiding $PODMAN although the code below does
nothing of the sort. Presumbly the comment has been outdated by altering
the containers.conf / $CONTAINERS_CONF instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix handling of "bind" and "tmpfs" olumes to actually work.
Allow bind, tmpfs local volumes to work in rootless mode.
Also removed the string "error" from all error messages that begine with it.
All Podman commands are printed with Error:, so this causes an ugly
stutter.
Fixes: https://github.com/containers/podman/issues/12013
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Podman image scp should never enter the Podman UserNS unless it needs to. This allows for
a sudo exec.Command to transfer images to and from rootful storage. If this command is run using sudo,
the simple sudo podman save/load does not work, machinectl/su is necessary here.
This modification allows for both rootful and rootless transfers, and an overall change of scp to be
more of a wrapper function for different load and save calls as well as the ssh component
Signed-off-by: cdoern <cdoern@redhat.com>
Some containers require certain user account(s) to exist within the
container when they are run. This option will allow callers to add a
bunch of passwd entries from the host to the container even if the
entries are not in the local /etc/passwd file on the host.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935831
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add --time flag to podman container rm
Add --time flag to podman pod rm
Add --time flag to podman volume rm
Add --time flag to podman network rm
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Fix day-one sloppiness: when I first wrote this framework
it compared strings using 'expr', not '=', to be more
forgiving of extra cruft in output. This was a bad decision.
It means that warnings or additional text are ignored:
is "all is ok, NOT!" "all is ok" <-- this would pass
Solution: tighten up the 'is' check. Use '=' (direct
compare) first. If it fails, look for wild cards ('*')
or character classes ('[') in the expect string. If
so, and only then, use 'expr'. And, thanks to a clever
suggestion from Luap99, include '(using expr)' in the
error message when we do so; this could make it easier
for a developer to understand a string mismatch.
This change exposes a lot of instances in which we weren't
doing proper comparisons. Fix those. Thankfully, there
weren't as many as I'd feared.
Also, and completely unrelated, add '-T' flag to bats
helper, for showing timing results. (I will open this
as a separate PR if requested. I too find it offensive
to jumble together unrelated commits.)
Signed-off-by: Ed Santiago <santiago@redhat.com>
skip the test "podman selinux: shared context in (some) namespaces" on
cgroupsv1 when running as rootless since the tests requires
--pid=container:.
If the container runtime cannot use cgroupsv1 and the container has no
pid namespace. then it is not possible to correctly terminate the
container. Without a cgroup or a pid namespace, the runtime has no
control on what processes are in the container.
Closes: https://github.com/containers/podman/issues/11785
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Wow did I screw up. #10982 introduced (at my suggestion) a
new wait_for_port() helper, with the goal of eliminating a
race condition. It didn't work.
First: wait_for_port() tests by connecting to the port, which
is a Bad Idea when you have a one-shot server that exits upon
the first connection closing. We should've caught that, but:
Second: I wrote wait_for_port() for a non-BATS test framework,
and used the conventional file descriptor 3. BATS uses fd3
for internal control. Overriding that made the test silently
just disappear, no "not ok" message, no warnings, nothing
except vanishing into the ether.
Third: this was caught by my log-colorizer script, which
loudly yelled "WARNING: expected 234" (tests) at the
bottom of the log. Unfortunately, since this wasn't
my PR, I didn't actually look at the test logs.
Solution: we can't use wait_for_port() in the network port
test. Use wait_for_output() instead, triggering on the
'listening' message emitted by netcat in the container.
Also: fix wait_for_port() to use fd5 instead of 3. Although
no code currently uses wait_for_port() as of this PR, it's
a useful helper that we may want to keep.
Signed-off-by: Ed Santiago <santiago@redhat.com>
It was observed during periodic testing, this test can fail due to the
container process being not fully running and listening on the expected
port:
```
[+1069s] not ok 220 podman networking: port with --userns=keep-id
[+1069s] # (in test file test/system/500-networking.bats, line 144)
[+1069s] # `echo "$teststring" | nc 127.0.0.1 $myport' failed
[+1069s] # # /var/tmp/go/src/github.com/containers/podman/bin/podman rm
--all --force
[+1069s] # # /var/tmp/go/src/github.com/containers/podman/bin/podman ps
--all --external --format {{.ID}} {{.Names}}
[+1069s] # # /var/tmp/go/src/github.com/containers/podman/bin/podman
images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
[+1069s] # quay.io/libpod/testimage:20210610 9f9ec7f2fdef
[+1069s] # # /var/tmp/go/src/github.com/containers/podman/bin/podman run
-d --userns=keep-id -p 127.0.0.1:54322:54322
quay.io/libpod/testimage:20210610 nc -l -n -v -p 54322
[+1069s] #
252c562c9a3c96892d867d1d72fb52b2efdfe62855ebedbccd2d281c472c2988
[+1069s] # Ncat: No route to host.
```
Fix this by using a new `wait_for_port()` function (thanks @edsantiago)
before attempting to communicate with the service.
Signed-off-by: Chris Evich <cevich@redhat.com>
TL;DR podman needs "arm64" as arch, not "arm64v8".
Unexpurgated version: docker.io publishes ${ARCH}/alpine for
several values of ARCH. Unfortunately, the arm64 one is
called "arm64v8", which is sensible, but podman needs the
--arch value of the manifest to be exactly "arm64". So we
need to special-case this value in our loop. Do so, and
build/publish a new 20210610 testimage. Use that in tests
moving forward.
And, since we need to jump through the same hoops to build
the nonlocal image, include it in the build loop instead
of as a tacked-on comment. Try to be helpful by determining
the next-available numeric tag.
And: don't push anything by default. Instead, just tell
the user what buildah-push commands to run.
And: refactor $PODMAN_NONLOCAL_IMAGE_TAG, to make it easier
for the RHEL-arch-testing folx to override using envariables
instead of inplace-sed. (Not that they should ever need to
override again, because this is the final multiarch commit
that should be forevermore perfect and need no further commits
ever again).
And, finally, bump up to latest alpine/busybox images.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Some CI systems set $OCI_RUNTIME as a way to override the
default crun. Integration (e2e) tests honor this, but system
tests were not aware of the convention; this means we haven't
been testing system tests with runc, which means RHEL gating
tests are now failing.
The proper solution would be to edit containers.conf on CI
systems. Sorry, that would involve too much CI-VM work.
Instead, this PR detects $OCI_RUNTIME and creates a dummy
containers.conf file using that runtime.
Add: various skips for tests that don't work with runc.
Refactor: add a helper function so we don't need to do
the complicated 'podman info blah blah .OCIRuntime.blah'
thing in many places.
BUG: we leave a tmp file behind on exit.
Signed-off-by: Ed Santiago <santiago@redhat.com>
The RHEL multi-arch team informed me that we were missing
aarch64; add it, using the new name (arm64v8).
(This is from last week, so the image date tag does not
match today's date. I was waiting for confirmation that
things were working).
Signed-off-by: Ed Santiago <santiago@redhat.com>