XDG_RUNTIME_DIR is needed for two groups of reasons when Toolbx is used
rootless.
First, it's important for toolbox(1) itself to work rootless because it
needs to place several files:
* The 'lock' file to synchronize Podman migrations.
* The initialization stamp file to synchronize the container's entry
point with the user-facing 'enter' and 'run' commands running on the
host operating system.
* The generated Container Device Interface specification.
These files need to be separate for the toolbox(1) processes run by the
system tests, those run by the user for 'normal' use, and concurrent
invocations of the tests.
Therefore, it's better to use a custom XDG_RUNTIME_DIR that's within the
sandbox offered by Bats [1]. The sandbox is clearly labelled as being
used by Bats, is unique for each invocation, and Bats takes care of
cleaning everything up once it has finished running.
Note that XDG_RUNTIME_DIR's Unix access mode MUST be 0700 [2]. For
example, Ubuntu 22.04 and 24.04 Desktop have a umask of 0002, and if an
access mode is not explicitly specified, XDG_RUNTIME_DIR will be created
with 0775. That will cause dbus-daemon(1) to fail with:
Unable to set up transient service directory: XDG_RUNTIME_DIR
"/var/tmp/bats-run-4XQL6i/suite/xdg-runtime-dir" can be written by
others (mode 040775)
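A minimal sketch of creating such a directory, assuming a Bash helper in
the test suite. The make_runtime_dir name is hypothetical, and the
xdg-runtime-dir sub-directory name is taken from the error message
above. Passing the mode to mkdir(1) explicitly keeps a permissive umask
from widening it:

```shell
# Hypothetical helper: create a private runtime directory below a parent
# directory, such as the one Bats exposes as BATS_SUITE_TMPDIR.
make_runtime_dir() {
    local parent="$1"
    local dir="$parent/xdg-runtime-dir"

    # An explicit numeric mode is applied as-is to the new directory, so a
    # umask of 0002 cannot turn it into 0775.
    mkdir -m 0700 "$dir" || return 1

    printf '%s\n' "$dir"
}
```

In setup_suite(), this could be used as:
  XDG_RUNTIME_DIR="$(make_runtime_dir "$BATS_SUITE_TMPDIR")"
  export XDG_RUNTIME_DIR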
Second, XDG_RUNTIME_DIR is used to propagate things like the user D-Bus,
PipeWire and Wayland sockets from the host to the container. These
don't need to be separated. However, if a custom XDG_RUNTIME_DIR is
used then those sockets that are used by the system tests, such as the
user D-Bus socket, have to be replicated.
Therefore, a custom D-Bus instance is run to offer the user D-Bus socket
with a configuration similar to that of the host OS. The dbus-daemon(1)
implementation is used for the sake of simplicity. It creates the
socket itself based on the configuration, unlike dbus-broker-launch(1)
where the socket must be separately created and passed to it by its
parent.
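The following sketch shows what such a setup could look like. The
write_dbus_config helper and the dbus.conf file name are hypothetical,
and the policy is modelled on the permissive defaults of a stock session
bus rather than copied from the actual test suite:

```shell
# Hypothetical helper: write a dbus-daemon(1) configuration that listens on
# the conventional user bus path inside the given runtime directory.
write_dbus_config() {
    local runtime_dir="$1"
    local config="$runtime_dir/dbus.conf"

    # The heredoc is unquoted on purpose, so that $runtime_dir is expanded.
    cat >"$config" <<EOF
<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <type>session</type>
  <listen>unix:path=$runtime_dir/bus</listen>
  <policy context="default">
    <allow send_destination="*" eavesdrop="true"/>
    <allow eavesdrop="true"/>
    <allow own="*"/>
  </policy>
</busconfig>
EOF

    printf '%s\n' "$config"
}
```

The daemon could then be started with:
  $ dbus-daemon --config-file="$config" --fork --print-pid
... keeping the printed PID so that the instance can be stopped during
teardown. dbus-daemon(1) creates the 'bus' socket itself on start-up.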
However, Podman can't use systemd as the cgroups manager with this D-Bus
instance, as the bus wasn't started by the user systemd instance. So, a
custom containers.conf(5) is used to change the cgroups manager to
cgroupfs. The only other options in the containers.conf(5) are those
that are common across Fedora 41 and 42, and Ubuntu 22.04 and 24.04.
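A sketch of the relevant part of such a containers.conf(5), written from
a hypothetical Bash helper. Only the cgroups manager setting comes from
the text above; the write_containers_conf name and the file location are
assumptions:

```shell
# Hypothetical helper: write a containers.conf(5) that makes Podman manage
# cgroups through cgroupfs instead of systemd.
write_containers_conf() {
    local conf="$1"

    cat >"$conf" <<'EOF'
[engine]
cgroup_manager = "cgroupfs"
EOF
}
```

Podman can be pointed at the file through the CONTAINERS_CONF
environment variable:
  write_containers_conf "$BATS_SUITE_TMPDIR/containers.conf"
  export CONTAINERS_CONF="$BATS_SUITE_TMPDIR/containers.conf"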
[1] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[2] https://specifications.freedesktop.org/basedir-spec/latest/
https://github.com/containers/toolbox/pull/1652
The XDG_CACHE_HOME environment variable is supposed to default to
$HOME/.cache [1], just as it did in the test suite, and this location is
meant to be used as a cache for 'normal' use by the user. Test suites
generally don't qualify as 'normal' use.
One expects that deleting the cache shouldn't affect 'normal' use other
than degrading performance. However, deleting these temporary files
used by the test suite will cause actual breakage. Even if the user
doesn't manually delete the cache, two concurrent invocations of the
test suite can do so or lead to other unexpected collisions, because the
paths are constant across multiple invocations.
Therefore, it's better to limit the scope of the test suite's temporary
files within the sandbox offered by Bats [2]. The sandbox is clearly
labelled as being used by Bats, is unique for each invocation, and Bats
takes care of cleaning everything up once it has finished running.
Note that there's no need for the system-test-storage sub-directory
under BATS_SUITE_TMPDIR. So it was left out.
[1] https://specifications.freedesktop.org/basedir-spec/latest/
[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html
https://github.com/containers/toolbox/pull/1645
The test suite has expanded to 415 system tests. These tests can be
very I/O intensive, because many of them copy OCI images from the test
suite's image cache directory to its local containers/storage store,
create containers, and then delete everything to run the next test with
a clean slate. This makes the system tests slow.
Unfortunately, Zuul's max-job-timeout setting defaults to an upper limit
of 3 hours or 10800 seconds for jobs [1], and this is what Software
Factory uses [2]. So, there comes a point beyond which the CI can't be
prevented from timing out by increasing the timeout.
One way of scaling past this maximum time limit is to run the tests in
parallel across multiple nodes. This has been implemented by splitting
the system tests into different groups, which are run separately by
different nodes.
First, the tests were grouped into those that test commands and options
accepted by the toolbox(1) binary, and those that test the runtime
environment within the Toolbx containers. The first group has more
tests, but runs faster, because many of them test error handling and
don't do much I/O.
The runtime environment tests take especially long on Fedora Rawhide
nodes, which are often slower than the stable Fedora nodes, possibly
because Rawhide uses Linux kernels that are built with debugging
enabled. Therefore, this group of tests was further split for Rawhide
nodes by the Toolbx images they use. Apart from reducing the number of
tests in each group, this should also reduce the amount of time spent in
downloading the images.
The split has been implemented with Bats' tagging system that is
available from Bats 1.8.0 [3]. Fortunately, commit 87eaeea6f0
already added a dependency on Bats >= 1.10.0. So, there's nothing to
worry about.
At the moment, Bats doesn't expose the tags being used to run the test
suite to setup_suite() and teardown_suite() [4]. Therefore, the
TOOLBX_TEST_SYSTEM_TAGS environment variable was used to optimize the
contents of setup_suite().
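As an illustration, setup_suite() could consult the variable through a
small helper like the one below. The suite_has_tag name, the tag names
and the comma-separated format of TOOLBX_TEST_SYSTEM_TAGS are
assumptions made for this sketch, not necessarily what the test suite
does:

```shell
# Hypothetical helper: succeed if the comma-separated list in
# TOOLBX_TEST_SYSTEM_TAGS contains the given tag. An empty or unset list is
# treated as "no filter", so every tag matches.
suite_has_tag() {
    local wanted="$1"
    local tags="${TOOLBX_TEST_SYSTEM_TAGS:-}"

    if [ -z "$tags" ]; then
        return 0
    fi

    # Surround both sides with commas so that only whole tags match.
    case ",$tags," in
        *",$wanted,"*)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
```

setup_suite() could then skip work that the selected group doesn't need,
with the run started as:
  $ TOOLBX_TEST_SYSTEM_TAGS=runtime-environment \
    bats --filter-tags runtime-environment ./test/system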
[1] https://zuul-ci.org/docs/zuul/latest/tenants.html
[2] Commit 83f28c52e4
    https://github.com/containers/toolbox/commit/83f28c52e47c2d44
    https://github.com/containers/toolbox/pull/1548
[3] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[4] https://github.com/bats-core/bats-core/issues/1006
https://github.com/containers/toolbox/pull/1551
The paths to bats-assert and bats-support are broken if bats(1) is
invoked from any location other than the parent directory of the 'test'
directory. For example, Podman's downstream Fedora CI invokes the tests
as:
$ cd /path/to/toolbox/test/system
$ bats .
... and it led to [1]:
1..306
# test suite: Set up
# Missing dependencies
# Forgot to run 'git submodule init' and 'git submodule update' ?
# test suite: Tear down
not ok 1 setup_suite
# (from function `setup_suite' in test file ./setup_suite.bash, line 33)
# `return 1' failed
# bats warning: Executed 1 instead of expected 306 tests
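One way to make such a check independent of the current working
directory is to resolve the helper libraries against an explicit
directory. This is a sketch, not the actual fix. The check_bats_libs
name and the libs/ layout are assumptions, while the messages mirror the
output above:

```shell
# Hypothetical helper: verify that the Bats helper libraries exist below
# the given directory, instead of relying on the current working directory.
check_bats_libs() {
    local base_dir="$1"

    if [ -f "$base_dir/libs/bats-assert/load.bash" ] \
       && [ -f "$base_dir/libs/bats-support/load.bash" ]; then
        return 0
    fi

    echo "Missing dependencies" >&2
    echo "Forgot to run 'git submodule init' and 'git submodule update' ?" >&2
    return 1
}
```

In a test file, the directory can come from BATS_TEST_DIRNAME, which
Bats sets to the directory containing the file being run.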
Fallout from 2c09606603
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2263968
https://github.com/containers/toolbox/pull/1448
These files aren't marked as executable, and shouldn't be, because they
aren't meant to be standalone executable scripts. They're meant to be
part of a test suite driven by Bats. Therefore, it doesn't make sense
for them to have shebangs, because it gives the opposite impression.
The shebangs were actually being used by external tools like Coverity to
deduce the shell when running shellcheck(1). ShellCheck's inline
'shell' directive, such as '# shellcheck shell=bash' at the top of a
file, is a more obvious way to achieve that.
https://github.com/containers/toolbox/pull/1363
The setup_suite.bash file is meant to be written in Bash, and is not
supposed to have any Bats-specific syntax. That's why it has the *.bash
suffix, not *.bats. If Bats finds a setup_suite.bash file, when running
the test suite, it uses Bash's source(1) builtin to read the file.
This is a cosmetic change. First, the Bats syntax is a superset of the
Bash syntax, so it didn't make a difference to external tools like
Coverity that use the shebang to deduce the shell for shellcheck(1).
Secondly, setup_suite.bash isn't meant to be an executable script and,
hence, the shebang has no effect on how the file is used. However, it's
still a commonly used hint about the contents of the file, and it's
better to be accurate than misleading.
A subsequent commit will replace the shebangs in the test suite with
ShellCheck's 'shell' directives.
Fallout from 7a387dcc8b
https://github.com/containers/toolbox/pull/1363
First, it's not a good idea to use awk(1) as a grep(1) replacement.
Unless one really needs the AWK programming language, it's better to
stick to grep(1) because it's simpler.
Secondly, it's better to look for a specific os-release(5) field instead
of looking for the occurrence of 'rawhide' anywhere in the file, because
it lowers the possibility of false positives.
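A sketch of the difference, using a hypothetical is_fedora_rawhide
helper. The choice of REDHAT_BUGZILLA_PRODUCT_VERSION as the field to
match is an assumption made for illustration, and may not be the field
that was actually used:

```shell
# Hypothetical helper: succeed only if the chosen os-release(5) field is
# set to 'rawhide'. Anchoring the pattern to one whole line avoids matching
# 'rawhide' in unrelated fields, such as a free-form PRETTY_NAME.
is_fedora_rawhide() {
    local os_release="${1:-/etc/os-release}"

    # Values in os-release(5) may or may not be quoted.
    grep -E -q '^REDHAT_BUGZILLA_PRODUCT_VERSION="?rawhide"?$' "$os_release"
}
```

A bare search for 'rawhide' would also succeed on a file that only
mentions the word in some other field.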
https://github.com/containers/toolbox/pull/1336
We wasted some time trying to get the tests running locally, when all we
were missing were the 'git submodule ...' commands.
Add some more obvious hints about this possible stumbling block.
Note that Bats cautions against printing outside the @test, setup* or
teardown* functions [1]. In this case, doing so leads to the first line
of the error output going missing, when using the pretty formatter for
human consumption:
$ bats --formatter pretty ./test/system
✗ setup_suite
Forgot to run 'git submodule init' and 'git submodule update' ?
bats warning: Executed 1 instead of expected 191 tests
191 tests, 1 failure, 190 not run
[1] https://bats-core.readthedocs.io/en/stable/writing-tests.html
https://github.com/containers/toolbox/pull/1298
Signed-off-by: Matthias Clasen <mclasen@redhat.com>
The 000-setup.bats and 999-teardown.bats files were added [1] at a time
when Bats didn't offer any hooks for suite-wide setup and teardown.
That changed in Bats 1.7.0, which introduced the setup_suite and
teardown_suite hooks [2]. These hooks make it easier to run a subset of
the tests, which is a good thing.
In the past, to run a subset of the tests, one had to do:
$ bats ./test/system/000-setup.bats ./test/system/002-help.bats \
./test/system/999-teardown.bats
Now, one only has to do:
$ bats ./test/system/002-help.bats
Commit e22a82fec8 already added a dependency on Bats >= 1.7.0.
Therefore, it should be exploited wherever possible to simplify things.
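For reference, a minimal setup_suite.bash could look like the sketch
below. The function names are the hooks that Bats looks for. The
EXAMPLE_SUITE_DIR variable and the directory it points to are made up
for illustration:

```shell
# setup_suite.bash

setup_suite() {
    # Runs once, before the first test of the run. The variable is exported
    # so that the tests can see it.
    export EXAMPLE_SUITE_DIR="${BATS_SUITE_TMPDIR:?}/example"
    mkdir -p "$EXAMPLE_SUITE_DIR"
}

teardown_suite() {
    # Runs once, after the last test of the run.
    rm -rf "${EXAMPLE_SUITE_DIR:?}"
}
```

With the hooks in place, the shared state is prepared and removed no
matter which of the *.bats files are passed to bats(1).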
[1] Commit 54a2ca1ead
    https://github.com/containers/toolbox/issues/751
[2] Bats commit fb467ec3f04e322a
    https://github.com/bats-core/bats-core/issues/39
    https://bats-core.readthedocs.io/en/stable/writing-tests.html
https://github.com/containers/toolbox/pull/1317