We no longer write containers.conf; instead, system connections and
farms are written to a new file called podman-connections.conf.
This is a major rework and I had to change a lot of things to get this
to compile again with my c/common changes.
It is a breaking change for users, as connections/farms added before this
commit can no longer be removed or modified directly. However, because
the logic keeps reading from containers.conf, the old connections can
still be used to connect to a remote host.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Cut, added in Go 1.18, is a cleaner & more performant API than SplitN(_, _, 2).
Previously applied this refactoring to buildah:
https://github.com/containers/buildah/pull/5239
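A minimal sketch of the refactoring pattern, assuming the strings package variants of Cut and SplitN. The helper names parseEnvSplitN and parseEnvCut are hypothetical and not taken from the podman tree:

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnvSplitN splits "KEY=VALUE" the pre-Go-1.18 way: it allocates a
// slice and needs an explicit length check.
func parseEnvSplitN(s string) (key, value string, ok bool) {
	parts := strings.SplitN(s, "=", 2)
	if len(parts) != 2 {
		return s, "", false
	}
	return parts[0], parts[1], true
}

// parseEnvCut does the same with strings.Cut, which returns the text
// before and after the first separator plus a "found" flag.
func parseEnvCut(s string) (key, value string, ok bool) {
	return strings.Cut(s, "=")
}

func main() {
	for _, s := range []string{"PATH=/usr/bin", "A=b=c", "NOVALUE"} {
		k1, v1, ok1 := parseEnvSplitN(s)
		k2, v2, ok2 := parseEnvCut(s)
		fmt.Printf("%q: SplitN=(%q, %q, %v) Cut=(%q, %q, %v)\n", s, k1, v1, ok1, k2, v2, ok2)
	}
}
```

Both helpers return the same values for all three inputs; the Cut version avoids the intermediate slice and the length check.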
Signed-off-by: Philip Dubé <philip@peerdb.io>
The test "podman start container by systemd" fails on systems where
rootless users do not have access to journald. Therefore,
skip the part that reads the journal with journalctl.
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
A number of tests start a container then immediately run podman stop.
This frequently flakes with:
StopSignal SIGTERM failed to stop [...] in 10 seconds, resorting to SIGKILL
Likely reason: container is still initializing, and its process
has not yet set up its signal handlers.
Solution: if possible (containers running "top"), wait for "Mem:"
to indicate that top is running. If not possible (pods / catatonit),
sleep half a second.
Intended to fix some of the flakes cataloged in #20196 but I'm
leaving that open in case we see more. These are hard to identify
just by looking in the code.
Signed-off-by: Ed Santiago <santiago@redhat.com>
We're only testing vfs in CI. That's bad. #18822 tried to
remedy that but that only worked on system tests, not e2e.
Here we introduce CI_DESIRED_STORAGE, to be set in .cirrus.yml
in the same vein as all the other CI_DESIRED_X. Since it's 2023
we default to overlay, testing vfs only in priorfedora.
Fixes required:
- e2e tests:
  - in cleanup, umount ROOT/overlay to avoid leaking mounts
- system tests:
  - fix a few badly-written tests that assumed/hardcoded overlay
  - buildx test: add weird exception to device-number test
  - mount tests: add special case code for vfs
  - unprivileged test: disable one section that is N/A on vfs
Signed-off-by: Ed Santiago <santiago@redhat.com>
Followup to #20318: now that sqlite is the podman default,
enforce that in CI as well. Test boltdb only in Prior Fedora.
In the process, discovered & cleaned up some duplication
and unused YAML anchors.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Problem: frequent CI flakes of the form:
Error: cannot listen on the TCP port: listen tcp4 :5355: bind: address already in use
Always 5355.
Cause: systemd-resolve listens on 5355, but not on 127.0.0.1. So
when GetPort() tries its is-it-in-use check by binding localhost,
it succeeds; but then podman binds * and fails.
Solution: GetPort(): test by binding 0.0.0.0.
Also, improve the failure message.
Signed-off-by: Ed Santiago <santiago@redhat.com>
When you run e2e tests locally they use CNI unless the NETWORK_BACKEND
env was set to netavark. Because our main focus is on netavark we should
test it by default.
For local tests this should help to prevent CNI/netavark conflicts, as I
assume most systems on which people run tests are on netavark by now.
For CI testing we hardcode NETWORK_BACKEND there to test both netavark
(on current fedora) and CNI (prior fedora). Make sure to switch the
logic in the CI setup to reflect that.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Commit 2 of 2: fixes to get tests passing
Mostly reverting back to Exit(0) on tests that produce stderr,
adding stderr checks when those are missing.
One pretty big exception: "run check dns" test was completely
broken in many respects. It should never have worked under CNI,
but was passing because nslookup in that alpine image was
checking /etc/hosts. This has been fixed in subsequent alpine
images, which we're now using in this test (CITEST_IMAGE).
Signed-off-by: Ed Santiago <santiago@redhat.com>
Shortcuts like unix:path and unix:/path do not work everywhere,
so make sure to use unix://path when quoting the URL (or address).
Signed-off-by: Anders F Björklund <anders.f.bjorklund@gmail.com>
Do not use podman info/version as they are expensive and clutter the log
for no reason. Just checking if we can connect to the socket should be
good enough and much faster.
Fix the nonexistent error checking so that we actually see a useful
error when this does not work.
Also change the retry interval: why wait 2s for a retry? Take 100ms
steps instead.
Fixes #19010
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Like LockTmpDir use a random tmpdir for this directory. Make sure it is
set for all parallel ginkgo processes.
Also, GinkgoT().TempDir() will automatically remove the directory at the
end, so we do not need to worry about cleanup.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
AFAIK the latest podman will not even run on RHEL 7 anymore; in any case
we do not need these tests to run there.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Using the OS tempdir here is not good. This defaults to /tmp, which means
the initial podman test setup uses these paths:
`--root /tmp/root --runroot /tmp/runroot and --tmpdir /tmp`
Thus we create many files directly under /tmp. Also, they were never
removed and thus leaked out. When running as root and then later as
rootless, this would fail due to permission problems.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
There is no need to buffer them all into an array then write them once
at the end. Just write directly to the file.
Fixes #19104
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add header comment suggesting podman network create instead.
Stop using it in checkpoint tests. Turned out to be much more
complicated than expected.
Also, fix two issues caught while scanning the code:
- remove obsolete f28-and-earlier code.
- remove seccomp workaround needed for RHEL7
Signed-off-by: Ed Santiago <santiago@redhat.com>
For tests that use '--ip XX', random IP allocation is not
working well. Switch instead to a deterministic algorithm
with CPU affinity and a fudge factor for CNI.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Ginkgo test names can have more than two levels: there can be
a nested series of Describes() before the final It(). (e.g.,
quadlet_test.go). Handle that.
Before: we just assumed that the third-or-maybe-fourth line
after a "-----" divider was the test name.
Now: examine every line after the "-----" divider, until the
first empty line. Lines with /path/to/source/file are ignored,
lines with text strings are assembled together to make anchors.
This is still imperfect but it's much better than before.
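The new line-scanning rule can be sketched as follows. This is not the actual script: testNameAfterDivider is a hypothetical name, and treating any line that starts with "/" as a source-file line is an assumption made for the example:

```go
package main

import (
	"fmt"
	"strings"
)

// testNameAfterDivider takes the lines that follow a "-----" divider and
// joins the text lines up to the first empty line, skipping source paths.
func testNameAfterDivider(lines []string) string {
	var parts []string
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" {
			break // the first empty line ends the test-name block
		}
		// Assumption: a source-file line starts with "/".
		if strings.HasPrefix(line, "/") {
			continue
		}
		parts = append(parts, line)
	}
	return strings.Join(parts, " ")
}

func main() {
	lines := []string{
		"Podman quadlet",
		"/path/to/source/file.go:100",
		"  positive tests",
		"  /path/to/source/file.go:200",
		"    generates a unit",
		"",
		"this line is not part of the name",
	}
	fmt.Println(testNameAfterDivider(lines))
}
```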
SPECIAL NOTE: in order to allow linking to timing results
in the AfterSuite, I've changed the test name from Leaf to Full.
This will now be a much longer string, and hence much less
readable, but I'm inclined to think it's more correct. Please
review carefully and lmk if I should revert.
Finally, as an unrelated add-on, add links (at top) to original
log, journal, and (if applicable) podman-remote server logs.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Intended to fix an obscure, unlikely race condition in which (I
think) two parallel jobs called GetPort() and were assigned the
same port.
Also, add actual proper testing to two HTTP-registry tests, and
Skip a third that's a waste of cycles (filed #18768)
Signed-off-by: Ed Santiago <santiago@redhat.com>
"image rm concurrent" test is still failing, even after #18664:
Error: no contents in "/tmp/podman_test967723851/Dockerfile"
Probable cause: the images are built in parallel, and p.BuildImage()
writes one single Dockerfile. (This almost certainly renders the
test less effective than intended, since the generated images
might end up being identical).
Solution: write and use a uniquely-named Dockerfile
Signed-off-by: Ed Santiago <santiago@redhat.com>
There is no reason to define the same code in every file; just
use global nodes. This diff should speak for itself.
CleanupSecrets()/Volume() no longer call Cleanup() directly: the
global AfterEach node always calls Cleanup(), so this is no longer
necessary. If one AfterEach() node fails, the others will still run.
Also always unset the CONTAINERS_CONF env vars. This prevents people
from forgetting to unset them. And fix the special CONTAINERS_CONF logic
in the system connection tests: we do not want to preserve
CONTAINERS_CONF anyway, so just remove this logic.
Ginkgo orders the BeforeEach and AfterEach nodes. BeforeEach nodes are
executed from the outermost defined to the innermost. This means our global
BeforeEach always runs first, and only then the inner one (in the Describe()
function in each file). For AfterEach the order is inverted, from the inner
to the outer.
Also see https://onsi.github.io/ginkgo/#organizing-specs-with-container-nodes
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a workaround for #18180 so the ginkgo work can be merged without
being blocked by the issue. Please revert this commit when the issue
is fixed.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This never worked when ginkgo runs with more than one thread; we use 3
in CI. The problem is that the SynchronizedAfterSuite() function accepts
two functions. The first one is run for each ginkgo node, while the
second one is run only once for the whole suite.
Because the timings are stored in a slice, and thus in memory, we lose all
timings from the other nodes, as they were only reported on node 1.
Moving the printing into the first function solves this but causes the
problem that the result is no longer sorted. To fix this, we let
each node write its results to a tmp file and only then let the final
after-suite function collect the timings from all these files,
sort them, and print the output like we did before.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Just like Cleanup(), they should check the error codes.
While doing this it became clear that some volume tests were calling
Cleanup() twice, so remove this.
Instead make sure they call Cleanup() themselves so callers only need to
make one call. This is required because we cannot use Expect().To() before
doing all the cleanup: an error causes a panic, which results in an early
return, thus missing potentially important cleanup.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It looks like AfterEach() is now executed even after Skip(). This is a
good idea, because the fact that it didn't before caused us to leak tmp
directories. However, if Skip() is called before the podmanTest is
initialized, it will now result in a panic. To fix it, simply prevent such
a panic by checking the pointer against nil and doing nothing in that case.
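A minimal sketch of the nil guard, using a hypothetical testEnv type as a stand-in for the real test struct. Where exactly the check lives in the suite is not shown here:

```go
package main

import "fmt"

// testEnv is a hypothetical stand-in for the per-test podman state.
type testEnv struct {
	tempDir string
	cleaned bool
}

// Cleanup is safe to call on a nil pointer, which is what happens when a
// test is skipped before the state was ever initialized.
func (t *testEnv) Cleanup() {
	if t == nil {
		return
	}
	t.cleaned = true
	fmt.Println("cleaning up", t.tempDir)
}

func main() {
	var uninitialized *testEnv
	uninitialized.Cleanup() // no panic: the nil check returns early

	env := &testEnv{tempDir: "/tmp/example"}
	env.Cleanup()
}
```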
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Directly writing to stdout/err is not safe when run in parallel.
Ginkgo v2 fixed this by buffering and syncing the output so it
is not mangled between tests.
This means we should use the GinkgoWriter everywhere to make sure the
output stays in sync.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Only check exit codes last; otherwise, in case of errors, it will return
early and miss other commands.
Also explicitly stop before rm; rm is not working in all cases (#18180).
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We blindly trust these commands to work, but as it turns out they do not
under certain circumstances.
The "podman run ipcns ipcmk container test" can be used to make this fail
reliably: if a container has dependencies, the order of rm --all may
cause it to fail because the containers are deleted in the wrong order.
This is the only one I found so far. Adding this will uncover many more
such problems. Without proper cleanup we leak processes, and ginkgo v2
will block because of them.
Of course this cannot be merged without fixing these issues.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Ref: https://pkg.go.dev/math/rand@go1.20#Seed
Note: For `runtime_test.go`, this test case was never actually doing
what appears to be its intent. Fixing it to work as intended would
require incredibly libpod-invasive changes. Do the least-bad thing and
simply confirm that consecutive generated names are different.
Signed-off-by: Chris Evich <cevich@redhat.com>
If a unit is not active, the exit code from systemctl is 3. Thus this
test always failed because it checked the error.
Fix this by checking the exit code and removing the unnecessary output
parsing.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Adapt the test so that it passes even if the
podman binary path is not `/usr/local/bin/podman`.
[NO NEW TESTS NEEDED]
Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
Since commit f250560a80 the play kube command uses its own network.
This is racy by design because we create the network followed by
creating/running pod/containers. This means in the meantime another
prune or reset process could wipe out the network config because we have
to share the network config directory by design in the test.
The problem is we only have one host netns which is shared between
tests. If the network config dir is not shared we cannot make conflict
checks for interface names and ip address. This results in different
tests trying to use the same interface and/or ip address which will
cause runtime failures in CNI and netavark.
The only solution I see is to make sure only the reset/prune tests are
using a custom network dir. This makes sure they do not wipe configs
that are otherwise required by other parallel running tests.
Fixes #17946
Signed-off-by: Paul Holzinger <pholzing@redhat.com>