The test suite has expanded to 415 system tests. These tests can be
very I/O intensive, because many of them copy OCI images from the test
suite's image cache directory to its local container/storage store,
create containers, and then delete everything to run the next test with
a clean slate. This makes the system tests slow.
Unfortunately, Zuul's max-job-timeout setting defaults to an upper limit
of 3 hours or 10800 seconds for jobs [1], and this is what Software
Factory uses [2]. So, once the test suite takes longer than that,
raising the timeout can no longer keep the CI from timing out.
One way of scaling past this maximum time limit is to run the tests in
parallel across multiple nodes. This has been implemented by splitting
the system tests into different groups, which are run separately by
different nodes.
First, the tests were grouped into those that test commands and options
accepted by the toolbox(1) binary, and those that test the runtime
environment within the Toolbx containers. The first group has more
tests, but runs faster, because many of them test error handling and
don't do much I/O.
The runtime environment tests take especially long on Fedora Rawhide
nodes, which are often slower than the stable Fedora nodes, possibly
because Rawhide uses Linux kernels that are built with debugging
enabled. Therefore, this group of tests was further split for Rawhide
nodes by the Toolbx images they use. Apart from reducing the number of
tests in each group, this should also reduce the amount of time spent
downloading the images.
The split has been implemented with Bats' tagging system, which has been
available since Bats 1.8.0 [3]. Commit 87eaeea6f0 already added a
dependency on Bats >= 1.10.0, so no new requirement is introduced.
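
As a rough illustration of how such a split can be expressed, the sketch
below writes a small Bats file with tagged tests and then runs only one
group. The tag names, the file name and the test bodies are made up for
this example and are not necessarily the ones used by the test suite;
only the '# bats test_tags=' directive and the --filter-tags option come
from Bats [3].

```shell
#!/usr/bin/env bash

# Scratch directory for the example test file.
workdir="$(mktemp -d)"

# Three placeholder tests, each tagged with the group it belongs to.
# All tag names here are illustrative assumptions.
cat >"$workdir/example.bats" <<'EOF'
# bats test_tags=commands-options
@test "commands: placeholder" {
  true
}

# bats test_tags=runtime-environment,arch-fedora
@test "environment: placeholder using a Fedora image" {
  true
}

# bats test_tags=runtime-environment,ubuntu
@test "environment: placeholder using an Ubuntu image" {
  true
}
EOF

# A comma-separated list passed to --filter-tags selects the tests that
# carry all of the listed tags, so this runs only the third test.
if command -v bats >/dev/null 2>&1; then
  bats --filter-tags runtime-environment,ubuntu "$workdir/example.bats"
else
  echo "bats is not installed; skipping the run"
fi
```

Each node would then invoke Bats with a different --filter-tags value,
so that the groups together cover the whole suite.
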
At the moment, Bats doesn't expose the tags being used to run the test
suite to setup_suite() and teardown_suite() [4]. Therefore, the
TOOLBX_TEST_SYSTEM_TAGS environment variable was used to optimize the
contents of setup_suite().
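
A minimal sketch of that idea follows. It assumes that
TOOLBX_TEST_SYSTEM_TAGS holds the same comma-separated tags that were
passed to Bats. The group_is_selected helper, the tag names and the echo
lines standing in for the real preparation steps are hypothetical and
are not taken from the test suite.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when the tag named by "$1" is part of
# this run. An unset or empty TOOLBX_TEST_SYSTEM_TAGS means that the
# whole suite is being run, so every tag counts as selected.
group_is_selected() {
  local group="$1"
  local tags="${TOOLBX_TEST_SYSTEM_TAGS:-}"

  if [ -z "$tags" ]; then
    return 0
  fi

  # Wrap both sides in commas so that only whole tags match.
  case ",$tags," in
    *",$group,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sketch of a setup_suite() that prepares only what the selected group
# needs. The echo lines are placeholders for the I/O-heavy steps.
setup_suite() {
  echo "common setup"

  if group_is_selected "arch-fedora"; then
    echo "cache fedora images"
  fi

  if group_is_selected "ubuntu"; then
    echo "cache ubuntu images"
  fi
}
```

With TOOLBX_TEST_SYSTEM_TAGS set to "runtime-environment,ubuntu", this
sketch would skip the Fedora step. With the variable unset, it would
perform every step.
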
[1] https://zuul-ci.org/docs/zuul/latest/tenants.html
[2] Commit 83f28c52e4
    https://github.com/containers/toolbox/commit/83f28c52e47c2d44
    https://github.com/containers/toolbox/pull/1548
[3] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[4] https://github.com/bats-core/bats-core/issues/1006

https://github.com/containers/toolbox/pull/1551