This uses the same approach taken by Flatpak [1] to ensure that the
certificates from certificate authorities (or CAs) that are available
inside a Toolbx container are kept synchronized with the host operating
system. Any program that uses PKCS #11 to access CA certificates should
see the same ones both inside the container and on the host.
During every 'enter' and 'run' command, toolbox(1) ensures that an
instance of 'p11-kit server' is running on the host listening on a local
file system socket that's accessible to both the container and the host.
If an instance is already running, then a second one is not created.
The location of the socket is injected into the container through the
P11_KIT_SERVER_ADDRESS environment variable.
Just like Flatpak, the singleton 'p11-kit server' process is not
terminated when the last 'enter' or 'run' command exits.
The Toolbx container's entry point configures it to use the
p11-kit-client.so PKCS #11 module instead of the usual p11-kit-trust.so
module. This talks to the 'p11-kit server' instance running on the host
over the socket instead of reading the CA certificates that are present
inside the container.
However, unlike Flatpak, this doesn't use D-Bus to set up the
communication between the container and the host, because when invoked
as 'sudo toolbox ...' there's no user or session D-Bus instance
available for the root user.
This set-up is skipped if 'p11-kit server' can't be run on the host, or
if the /etc/pkcs11/modules directory for configuring PKCS #11 modules or
p11-kit-client.so are missing inside the container. None of these are
considered hard dependencies to accommodate size-constrained OSes like
Fedora CoreOS that might not have 'p11-kit server', and existing Toolbx
containers and old images that might not have p11-kit-client.so.
The UBI-based toolbox images haven't yet been updated to contain
p11-kit-client.so. Until that happens, containers created from them
won't have access to the CA certificates from the host.
The CI needs to be run without 'p11-kit server' because the lingering
singleton process causes Bats to hang when tearing down the suite of
system tests [2]. To terminate the 'p11-kit server' instance run by the
system tests, it needs to be distinguishable from the instance run by
'normal' use of Toolbx by the user. One way to do this is to isolate
the host operating system's XDG_RUNTIME_DIR from the system tests.
Unfortunately, this is easier said than done [3]. So, this workaround
has to suffice until the problem is solved.
On the Ubuntu 22.04 CI nodes, it's not possible to remove the p11-kit
package that provides 'p11-kit server', because it leads to:
$ sudo dpkg --purge p11-kit
dpkg: dependency problems prevent removal of p11-kit:
adoptium-ca-certificates depends on p11-kit.
Therefore, as a workaround only the /usr/libexec/p11-kit/p11-kit-server
binary that provides the 'server' command is removed. The rest of the
p11-kit package is left untouched.
[1] Flatpak commit 66b2ff40f7caf3a7
https://github.com/flatpak/flatpak/commit/66b2ff40f7caf3a7https://github.com/flatpak/flatpak/pull/1757https://github.com/p11-glue/p11-kit/issues/68
[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[3] https://github.com/containers/toolbox/pull/1652https://github.com/containers/toolbox/issues/626
Currently, the CI is failing because 'go mod download' is encountering an
expired TLS certificate:
$ go mod download
go: github.com/spf13/viper@v1.10.1 requires
go.opencensus.io@v0.23.0: unrecognized import path "go.opencensus.io":
https fetch: Get "https://go.opencensus.io/?go-get=1": tls: failed to
verify certificate: x509: certificate has expired or is not yet valid:
current time 2025-01-23T17:00:16+01:00 is after 2025-01-21T03:43:04Z
Therefore, disable the TLS certificate check until the certificate gets
updated or the dependency gets removed [1].
[1] https://pkg.go.dev/cmd/go#hdr-Environment_variableshttps://github.com/containers/toolbox/pull/1611
The toolbox(1) binary always relies on the PATH environment variable to
find the podman(1) and skopeo(1) binaries. There's no way to override
those with the PODMAN and SKOPEO environment variables, and they only
affect any direct use of podman(1) and skopeo(1) within the test suite.
Therefore, offering the PODMAN and SKOPEO environment variables in their
current form is needlessly confusing and misleading, and can lead to
surprises arising from different podman(1) and skopeo(1) binaries being
used in different places. Either toolbox(1) should also honour them or
the test suite shouldn't offer them. The former is more complicated
without any obvious need for it, so the latter was chosen.
https://github.com/containers/toolbox/pull/1592
The test suite has expanded to 415 system tests. These tests can be
very I/O intensive, because many of them copy OCI images from the test
suite's image cache directory to its local container/storage store,
create containers, and then delete everything to run the next test with
a clean slate. This makes the system tests slow.
Unfortunately, Zuul's max-job-timeout setting defaults to an upper limit
of 3 hours or 10800 seconds for jobs [1], and this is what Software
Factory uses [2]. So, there comes a point beyond which the CI can't be
prevented from timing out by increasing the timeout.
One way of scaling past this maximum time limit is to run the tests in
parallel across multiple nodes. This has been implemented by splitting
the system tests into different groups, which are run separately by
different nodes.
First, the tests were grouped into those that test commands and options
accepted by the toolbox(1) binary, and those that test the runtime
environment within the Toolbx containers. The first group has more
tests, but runs faster, because many of them test error handling and
don't do much I/O.
The runtime environment tests take especially long on Fedora Rawhide
nodes, which are often slower than the stable Fedora nodes. Possibly
because Rawhide uses Linux kernels that are built with debugging
enabled, which makes it slower. Therefore, this group of tests were
further split for Rawhide nodes by the Toolbx images they use. Apart
from reducing the number of tests in each group, this should also reduce
the amount of time spent in downloading the images.
The split has been implemented with Bats' tagging system that is
available from Bats 1.8.0 [3]. Fortunately, commit 87eaeea6f0
already added a dependency on Bats >= 1.10.0. So, there's nothing to
worry about.
At the moment, Bats doesn't expose the tags being used to run the test
suite to setup_suite() and teardown_suite() [4]. Therefore, the
TOOLBX_TEST_SYSTEM_TAGS environment variable was used to optimize the
contents of setup_suite().
[1] https://zuul-ci.org/docs/zuul/latest/tenants.html
[2] Commit 83f28c52e4https://github.com/containers/toolbox/commit/83f28c52e47c2d44https://github.com/containers/toolbox/pull/1548
[3] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[4] https://github.com/bats-core/bats-core/issues/1006https://github.com/containers/toolbox/pull/1551
The Zuul executor got updated from Ansible 2.13.7 to 2.15.10, which now
has support for DNF5 [1] and the previous DNF5 Change [2] for Fedora 39
is now aiming at Fedora 41 (and Rawhide) [3]. Unfortunately, Ansible's
'dnf5' module is still under development and doesn't seem to match the
state of DNF5 in Fedora Rawhide, which causes:
TASK [Install RPM packages]
fedora-rawhide | ERROR
fedora-rawhide | {
fedora-rawhide | "failures": [],
fedora-rawhide | "msg": "Could not import the libdnf5 python module
using /usr/bin/python3 (3.12.3 (main, Apr 17 2024, 00:00:00) [GCC
14.0.1 20240411 (Red Hat 14.0.1-0)]). Please install
python3-libdnf5 package or ensure you have specified the correct
ansible_python_interpreter. (attempted
['/usr/libexec/platform-python', '/usr/bin/python3',
'/usr/bin/python2', '/usr/bin/python'])"
fedora-rawhide | }
Trying to explicitly install python3-libdnf5, as suggested above, using
Ansible's 'command' module before using the 'package' module to install
the Toolbx dependencies, still ends up with:
TASK [Install RPM packages]
fedora-rawhide | MODULE FAILURE:
fedora-rawhide | Traceback (most recent call last):
fedora-rawhide | File "<stdin>", line 107, in <module>
fedora-rawhide | File "<stdin>", line 99, in _ansiballz_main
fedora-rawhide | File "<stdin>", line 47, in invoke_module
fedora-rawhide | File "<frozen runpy>", line 226, in run_module
fedora-rawhide | File "<frozen runpy>", line 98, in _run_module_code
fedora-rawhide | File "<frozen runpy>", line 88, in _run_code
fedora-rawhide | File "/tmp/ansible_ansible.legacy.dnf5_payload_kecazv78/ansible_ansible.legacy.dnf5_payload.zip/ansible/modules/dnf5.py",
line 708, in <module>
fedora-rawhide | File "/tmp/ansible_ansible.legacy.dnf5_payload_kecazv78/ansible_ansible.legacy.dnf5_payload.zip/ansible/modules/dnf5.py",
line 704, in main
fedora-rawhide | File "/tmp/ansible_ansible.legacy.dnf5_payload_kecazv78/ansible_ansible.legacy.dnf5_payload.zip/ansible/modules/dnf5.py",
line 487, in run
fedora-rawhide | AttributeError: 'Base' object has no attribute
'load_config_from_file'
Therefore, force the use of DNF4 when an Ansible job is being attempted
more than once [4].
[1] Ansible commit a81b787a05100986
https://github.com/ansible/ansible/commit/a81b787a05100986https://github.com/ansible/ansible/issues/78898
[2] https://fedoraproject.org/wiki/Changes/ReplaceDnfWithDnf5
[3] https://fedoraproject.org/wiki/Changes/SwitchToDnf5
[4] https://zuul-ci.org/docs/zuul/latest/job-content.html#var-zuul.attemptshttps://github.com/containers/toolbox/pull/1509
Ansible's built-in 'package' module doesn't show any details when
installing the RPMs. All that can be seen is:
TASK [Install RPM packages]
fedora-rawhide | changed
Therefore, there's no way to know what version of the packages got
installed.
In this case, not knowing the Bats version being used by the CI makes it
difficult to know why the tests are generating this spew on Fedora
Rawhide [1]:
TASK [Run system tests]
test/system/libs/helpers.bash: line 7: TEMP_BASE_DIR: readonly variable
test/system/libs/helpers.bash: line 8: TEMP_STORAGE_DIR: readonly variable
test/system/libs/helpers.bash: line 10: IMAGE_CACHE_DIR: readonly variable
test/system/libs/helpers.bash: line 11: ROOTLESS_PODMAN_STORE_DIR: readonly variable
test/system/libs/helpers.bash: line 12: ROOTLESS_PODMAN_RUNROOT_DIR: readonly variable
test/system/libs/helpers.bash: line 13: PODMAN_STORE_CONFIG_FILE: readonly variable
test/system/libs/helpers.bash: line 14: DOCKER_REG_ROOT: readonly variable
test/system/libs/helpers.bash: line 15: DOCKER_REG_CERTS_DIR: readonly variable
test/system/libs/helpers.bash: line 16: DOCKER_REG_AUTH_DIR: readonly variable
test/system/libs/helpers.bash: line 17: DOCKER_REG_URI: readonly variable
test/system/libs/helpers.bash: line 18: DOCKER_REG_NAME: readonly variable
test/system/libs/helpers.bash: line 21: PODMAN: readonly variable
test/system/libs/helpers.bash: line 22: TOOLBX: readonly variable
test/system/libs/helpers.bash: line 23: SKOPEO: readonly variable
...
fedora-rawhide | 1..340
[1] https://github.com/bats-core/bats-core/pull/904https://github.com/containers/toolbox/pull/1482
The system tests download several images when setting up the test suite,
and cache them for later use by the tests [1]. This saves time and
avoids hitting rate limits imposed by OCI registries by not downloading
the same images repeatedly for several tests, but at the cost of
increased use of storage space to cache the images.
The images are cached under BATS_TMPDIR. It defaults to the TMPDIR
environment variable, and if that's not set then to /tmp [2]. Normally,
TMPDIR isn't set, and the images end up getting cached under /tmp. Now,
/tmp is typically on tmpfs backed by RAM or swap, which means that it
should be used for smaller size-bounded files only, and /var/tmp should
be used for everything else [3].
The images are big enough that a collection of them can't be described
as smaller and size-bounded, and it led to:
1..306
# test suite: Set up
# test suite: Tear down
not ok 1 setup_suite
# (from function `setup_suite' in test file ./setup_suite.bash, line
55)
# `_pull_and_cache_distro_image fedora "$((system_version-1))" ||
false' failed
# Failed to cache image registry.fedoraproject.org/fedora-toolbox:40
to /tmp/bats-run-IPz4Cn/image-cache/fedora-toolbox-40
# time="2024-02-19T11:41:43Z" level=fatal msg="copying system image
from manifest list: writing blob: write
/tmp/bats-run-IPz4Cn/image-cache/fedora-toolbox-40/dir-put-blob607392514:
no space left on device"
# bats warning: Executed 1 instead of expected 306 tests
So, change the default location of the BATS_TMPDIR environment variable
to /var/tmp by setting TMPDIR.
[1] Commit 50683c9d9ahttps://github.com/containers/toolbox/commit/50683c9d9a78adc9https://github.com/containers/toolbox/pull/375
[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[3] https://systemd.io/TEMPORARY_DIRECTORIES/https://github.com/containers/toolbox/pull/1462
This is meant to make the project more searchable on the Internet. More
and more people have been pointing out that "toolbox" is terribly
difficult to search for, and it's impossible to find any decent
Internet real estate by that name.
Some exceptions:
* The code repository is still https://github.com/containers/toolbox.
It will be renamed after giving a heads-up to other contributors.
* The name of the binary is still 'toolbox'. The name is embedded
into existing Toolbx containers as their entry point, which is bind
mounted from the host operating system when the containers are
started. Trivially renaming the binary will prevent these
containers from starting.
* For similar reasons, the TOOLBOX_PATH environment variable is still
the same.
* For similar reasons, the profile.d file to be read by the shell on
start-up is still called toolbox.sh.
* The label used to identify Toolbx containers and images is still
called com.github.containers.toolbox. There are many existing
Toolbx containers, and many Toolbx images beyond the control of the
Toolbx project that use this label to identity themselves. Simply
renaming the label will prevent these containers and images from
being recognized.
* The names of the built-in Toolbx images still retain the word
'toolbox'. Images under the new name need to be published on the
OCI registries and the toolbox(1) binary needs to be taught to
handle both old and new names, wherever necessary, for backwards
compatibility.
* The stamp file used to identify Toolbx containers is still called
/run/.toolboxenv because it's used by various external programs and
users to identify Toolbx containers.
* The OSC 777 escape sequence to track and preserve the user's current
Toolbx container [1] still emits 'toolbox' as the name of the
container runtime. Changing the escape sequence can break terminal
emulation applications, like Prompt [2], that consume it. Hence, it
needs to be done carefully.
* The runtime directories at /run/toolbox, when used as root, and
$XDG_RUNTIME_DIR/toolbox, when used rootless, weren't renamed.
When used as root, /run/toolbox is embedded into existing Toolbx
containers as a bind mount from the host. Trivially renaming the
path will prevent these containers from starting.
Secondly, both these paths are used to synchronize container
start-up. If the paths are trivially renamed, and the toolbox(1)
binary is updated and used without stopping all existing containers,
then it won't be able to enter containers that were already started.
Strictly speaking, this scenario isn't supported, since updates are
always expected to be "offline" [3]. However, it's worth noting
because solving the previous problem might also address this.
* The configuration file for RPM is still called
/usr/lib/rpm/macros.d/macros.toolbox.
[1] https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/17
[2] https://gitlab.gnome.org/chergert/prompt
[3] https://www.freedesktop.org/software/systemd/man/latest/systemd.offline-updates.htmlhttps://github.com/containers/toolbox/issues/1399
Ansible's built-in 'package' module doesn't show any details when
installing the RPMs. All that can be seen is:
TASK [Install RPM packages]
fedora-rawhide | changed
Therefore, there's no way to know what version of the packages got
installed.
In this case, not knowing the go-md2man(1) version being used by the CI
makes it difficult to know why the tests are failing on Fedora Rawhide
and Fedora 39 with:
not ok 3 help: Command 'help' in 177ms
# (from function `assert_line' in file
test/system/libs/bats-assert/src/assert.bash, line 479,
# in test file test/system/002-help.bats, line 48)
# `assert_line --index 0 --partial "toolbox(1)"' failed
# /usr/bin/man
#
# -- line does not contain substring --
# index : 0
# substring : toolbox(1)
# line : troff:<standard input>:33: warning: cannot select font
'C'
# --
#
It could be either because the CI is still using an older version of
go-md2man(1) [1,2], or that there's some other problem.
[1] Fedora golang-github-cpuguy83-md2man commit 117806d50e401c19
https://src.fedoraproject.org/rpms/golang-github-cpuguy83-md2man/c/117806d50e401c19https://src.fedoraproject.org/rpms/golang-github-cpuguy83-md2man/pull-request/3
[2] go-md2man commit d85280db9b54b574
https://github.com/cpuguy83/go-md2man/commit/d85280db9b54b574https://github.com/cpuguy83/go-md2man/issues/99https://github.com/containers/toolbox/pull/1386
The Zuul executor contains Ansible 2.13.7 whose 'dnf' module is not
working as it should with Fedora Rawhide because of the DNF5 Change [1].
Unlike DNF4, DNF5 no longer pulls in the python3-dnf RPM, which causes:
TASK [Install RPM packages]
fedora-rawhide | ERROR
fedora-rawhide | {
fedora-rawhide | "msg": "Could not import the dnf python module
using /usr/bin/python3 (3.12.0b3 (main, Jun 21 2023, 00:00:00)
[GCC 13.1.1 20230614 (Red Hat 13.1.1-4)]). Please install
`python3-dnf` or `python2-dnf` package or ensure you have
specified the correct ansible_python_interpreter. (attempted
['/usr/libexec/platform-python', '/usr/bin/python3',
'/usr/bin/python2', '/usr/bin/python'])",
fedora-rawhide | "results": []
fedora-rawhide | }
This adds a workaround that explicitly installs the python3-dnf RPM
using Ansible's 'command' module. It should be removed after Zuul
contains a newer release of Ansible.
[1] https://fedoraproject.org/wiki/Changes/ReplaceDnfWithDnf5https://github.com/containers/toolbox/pull/1338
Signed-off-by: Daniel Pawlik <dpawlik@redhat.com>
Ansible's 'shell' module is almost exactly like the 'command' module,
except that it runs the command through a command line shell so that
environment variables like HOSTNAME and operations like '*', '<' and '>'
work. None of those things are necessary are here. Hence, it's better
to use the 'command' module as elsewhere.
Note that, unlike Ansible's 'shell' module, the 'command' module doesn't
support inline scripts. So, each command needs to be in its own
separate task.
https://github.com/containers/toolbox/pull/1318
This uses 'skopeo inspect' to get the size of the image on the registry,
which is usually less than the size of the image in a local
containers/storage image store after download (eg., 'podman images'),
because they are kept compressed on the registry. Skopeo >= 1.10.0 is
needed to retrieve the sizes [1].
However, this doesn't add a hard dependency on Skopeo to accommodate
size-constrained operating systems like Fedora CoreOS. If skopeo(1) is
missing or too old, then the size of the image won't be shown, but
everything else would continue to work as before.
Some changes by Debarshi Ray.
[1] Skopeo commit d9dfc44888ff71a6
https://github.com/containers/skopeo/commit/d9dfc44888ff71a6https://github.com/containers/skopeo/issues/641https://github.com/containers/toolbox/issues/752
Signed-off-by: Nieves Montero <nmontero@redhat.com>
This is meant to roughly replicate the build environments used by
downstream distributors to build toolbox(1). These can be restricted in
odd ways compared to a fully featured environment where toolbox(1) is
actually going to be used. eg., the inability to use podman(1) in the
case of Fedora or not having subordinate user and group ID ranges in the
case of openSUSE.
It's important to ensure that toolbox(1) can be built by downstream
distributors without any unnecessary hassle.
https://github.com/containers/podman/issues/17657https://github.com/containers/toolbox/issues/1246
On enterprise FreeIPA set-ups, the subordinate user and group IDs are
provided by SSSD's sss plugin for the GNU Name Service Switch (or NSS)
functionality of the GNU C Library. They are not listed in /etc/subuid
and /etc/subgid. Therefore, its necessary to use libsubid.so to check
the subordinate ID ranges.
The CGO interaction with libsubid.so is loosely based on 'readSubid' in
github.com/containers/storage/pkg/idtools [1].
However, unlike 'readSubid', this code considers the absence of any
range (ie., nRanges == 0) to be an error as well.
More importantly, this code uses dlopen(3) and friends to dynamically
load the symbols from libsubid.so, instead of linking to libsubid.so at
build-time and having the dependency noted in the /usr/bin/toolbox
binary. This is done because libsubid.so itself depends on several
other shared libraries, and indirect dependencies can't be influenced
by the RUNPATH [2] embedded in the /usr/bin/toolbox binary [3]. Hence,
when the binary is used inside Toolbx containers (eg., as the entry
point), those indirect dependencies won't be picked from the host's
runtime against which the binary was built. This can render the binary
useless due to ABI compatibility issues. Using dlopen(3) avoids this
problem, especially because libsubid.so is only used when running on the
host.
Care was taken to not load and link libsubid.so twice to separately
validate the subordinate ID ranges for the user and the group. Note
that libsubid_init() must be passed a FILE pointer for logging.
Otherwise, it will create it's own for logging, and there's no way to
close it during dlclose(3).
Version 4 of the libsubid.so API/ABI [4] was released in Shadow 4.10,
which is newer than the versions shipped on RHEL 8 and Debian 10 [5],
and even that newer version had some problems [6]. Therefore, support
for older versions, with the relevant workarounds, is necessary.
Fortunately, the oldest that needs to be support is Shadow 4.9 because
that's when libsubid.so was introduced [7].
Note that SUBID_ABI_VERSION was only introduced with version 4 of the
libsubid.so API/ABI released in Shadow 4.10 [8]. The first release of
libsubid.so in Shadow 4.9 already had an ABI version of 3.0.0 [9], since
it was bumped a few times during development, so that's what's assumed
when SUBID_ABI_VERSION is absent.
This code doesn't set the public variables Prog and shadow_logfd that
older Shadow versions used to expect for logging, because from Shadow
4.9 onwards there's a separate function [4,10] to specify these. This
can be changed if there are libsubid.so versions in the wild that really
do need those public variables to be set.
Finally, ISO C99 is required because of the use of <stdbool.h> in the
libsubid.so API.
Some changes by Debarshi Ray.
[1] https://github.com/containers/storage/blob/main/pkg/idtools/idtools_supported.go
[2] https://man7.org/linux/man-pages/man8/ld.so.8.html
[3] Commit 6063eb27b9https://github.com/containers/toolbox/issues/821
[4] Shadow commit 32f641b207f6ddff
https://github.com/shadow-maint/shadow/commit/32f641b207f6ddffhttps://github.com/shadow-maint/shadow/issues/443
[5] https://packages.debian.org/source/buster/shadow
[6] Shadow commit 79157cbad87f42cd
https://github.com/shadow-maint/shadow/commit/79157cbad87f42cdhttps://github.com/shadow-maint/shadow/issues/465
[7] Shadow commit 0a7888b1fad613a0
https://github.com/shadow-maint/shadow/commit/0a7888b1fad613a0https://github.com/shadow-maint/shadow/issues/154
[8] Shadow commit 0c9f64140852e8d5
https://github.com/shadow-maint/shadow/commit/0c9f64140852e8d5https://github.com/shadow-maint/shadow/pull/449
[9] Shadow commit 3d670ba7ed58f910
https://github.com/shadow-maint/shadow/commit/3d670ba7ed58f910https://github.com/shadow-maint/shadow/issues/339
[10] Shadow commit 2b22a6909dba60d
https://github.com/shadow-maint/shadow/commit/2b22a6909dba60dhttps://github.com/shadow-maint/shadow/issues/325https://github.com/containers/toolbox/issues/1074
Signed-off-by: Martin Jackson <martjack@redhat.com>
Building Toolbx requires a C compiler [1], which defaults to GCC on
Fedora and CentOS Stream. It's good to explicitly require it, so that
it doesn't go missing from the build.
Showing the version of the C compiler is a big help when debugging weird
build problems involving the toolchain. A following commit will use CGO
to link to libsubid.so, which will only increase the relevance of the C
compiler.
[1] Commit c8aaed52c5https://github.com/containers/toolbox/pull/923https://github.com/containers/toolbox/pull/1218
This will be used by the subsequent commit to have a separate set of
dependencies for CentOS Stream 9 builds. eg., unlike Fedora, CentOS
Stream 9 doesn't have the ShellCheck, bats and fish RPMs.
https://github.com/containers/toolbox/pull/1171
Currently, the standard error and output streams of the child commands
invoked by 'meson test' are redirected to a separate log file. When the
tests fail, it's difficult, or maybe even impossible, to access this
file from the Zuul CI, and all that can be seen is something like:
1/7 shellcheck src/go-build-wrapper OK 0.04s
2/7 shellcheck profile.d/toolbox.sh FAIL 0.06s exit status 1
>>> MALLOC_PERTURB_=241 /usr/bin/shellcheck
--shell=sh
/home/zuul-worker/src/github.com/containers/toolbox/builddir/../profile.d/toolbox.sh
3/7 go fmt FAIL 0.05s exit status 1
>>> MALLOC_PERTURB_=209 /usr/bin/python3
/home/zuul-worker/src/github.com/containers/toolbox/src/meson_go_fmt.py
/home/zuul-worker/src/github.com/containers/toolbox/src
4/7 codespell FAIL 0.31s exit status 65
>>> MALLOC_PERTURB_=180 /usr/bin/codespell
--check-filenames
--check-hidden
--context 3
--exclude-file /home/zuul-worker/src/github.com/containers/toolbox/.codespellexcludefile
--skip /home/zuul-worker/src/github.com/containers/toolbox/builddir
--skip /home/zuul-worker/src/github.com/containers/toolbox/.git
--skip /home/zuul-worker/src/github.com/containers/toolbox/test/system/libs/bats-assert
--skip /home/zuul-worker/src/github.com/containers/toolbox/test/system/libs/bats-support
/home/zuul-worker/src/github.com/containers/toolbox
5/7 shellcheck toolbox (deprecated) FAIL 1.09s exit status 1
>>> MALLOC_PERTURB_=233 /usr/bin/shellcheck
/home/zuul-worker/src/github.com/containers/toolbox/builddir/../toolbox
6/7 go test OK 1.89s
7/7 go vet OK 17.60s
This doesn't have enough information to understand what caused the tests
to fail on non-interactive CI environments.
Not redirecting the standard error and output streams of the child
commands invoked by 'meson test' will readily reveal more details about
the test failures and remove the need to find the log file created by
Meson.
https://github.com/containers/toolbox/pull/1171
Different versions of ShellCheck and codespell may treat the same code
base differently. eg., these tools are currently being used on Fedora
36 as part of the 'unit tests', but CentOS Stream 9 has newer versions
that are stricter and catch several new problems.
Knowing the versions of the tools used in the tests helps to understand
these differences, and is a step towards testing on CentOS Stream 9.
https://github.com/containers/toolbox/pull/1199
Currently, 'meson compile' and 'meson install' were being invoked from
pre-run playbooks. This meant that a genuine build failure from either
of those commands would be shown as a RETRY_LIMIT failure by the CI.
This was misleading. It made it look as if the failure was caused by
some transient networking problem or that the CI node was too slow due
to momentary heavy load, whereas the failure was actually due to a
problem in the Toolbx sources. A genuine problem in the sources should
be reflected as a FAILURE, not RETRY_LIMIT.
However, it's worth noting that 'meson compile' invokes 'go build',
which downloads all the Go modules required by the Toolbx sources. This
is worth retaining in the pre-run playbooks since it primarily depends
on Internet infrastructure beyond the Toolbx sources.
As a nice side-effect, the CI no longer gets mysteriously stuck like
this while the Go modules are being downloaded:
TASK [Build Toolbox]
ci-node-36 | ninja: Entering directory
`/home/zuul-worker/src/github.com/containers/toolbox/builddir'
...
ci-node-36 | [8/13] Generating doc/toolbox-rmi.1 with a custom command
ci-node-36 | [9/13] Generating doc/toolbox-run.1 with a custom command
ci-node-36 | [10/13] Generating doc/toolbox.conf.5 with a custom
command
ci-node-36 | [11/13] Generating src/toolbox with a custom command
https://github.com/containers/toolbox/pull/1158
... at https://containertoolbx.org/install/
There are some minor benefits to always invoking meson(1), as opposed to
directly invoking the underlying build backend, like 'ninja'.
It's one less command to be aware of. Secondly, in theory, Meson can be
used with backends other than Ninja (see 'meson configure'), even though
Ninja is the most likely option for building Toolbx because it's only
supported on Linux.
https://github.com/containers/toolbox/pull/1142
The -Dmigration_path_for_coreos_toolbox option enables a different code
path that's currently not tested by the CI at all. In fact, since it's
a build-time option, the corresponding code path is not even built by
the CI.
To properly support the -Dmigration_path_for_coreos_toolbox option, it
needs to be covered by the CI. This is a step in that direction by
running the unit tests on it.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the build and installation steps for builds with
and without the -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the installation of RPM packages, Git submodule
handling, and the listing of various debug and version information for
builds with and without -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
It's only necessary to call 'systemd-tmpfiles --create' when building
and installing from source on the host operating system.
It's not needed when using a pre-built binary downstream package,
because:
* When 'meson install' is called as part of building the package,
that's not when the temporary files need to be created. They need
to be created when the binary package is later downloaded and
installed by the user.
* Downstream tools can sometimes handle it automatically. eg., on
Fedora, the systemd RPM installs a trigger that tells RPM to call
'systemd-tmpfiles --create' automatically when a tmpfiles.d snippet
is installed.
It's also not needed when installing inside a toolbox container because
the files that 'systemd-tmpfiles --create' is supposed to create are
meant to be on the host.
Downstream distributors set the DESTDIR environment variable when
building their packages. Therefore, it's used to detect when a
downstream package is being built.
Unfortunately, environment variables are messy and, generally, Meson
doesn't support accessing them inside its scripts [1]. Therefore, this
adds a spurious build-time dependency on systemd for downstream
distributors. However, that's probably not a big problem because all
supported downstream operating systems are already expected to use
systemd for the tmpfiles.d(5) snippets to work.
[1] https://github.com/mesonbuild/meson/issues/9https://github.com/containers/toolbox/issues/955
Some downstream distributors like RHEL don't have patchelf(1). Relying
on patchelf(1) during the build will make it difficult for such
downstreams to distribute Toolbox.
Fortunately, the path of the dynamic linker (ie., PT_INTERP) is
hardcoded in the ABI specification of each architecture [1]. This means
that Toolbox's build system can keep it's own architecture to dynamic
linker mapping, and specify it during the build through the GNU ld
linker's --dynamic-linker flag, as opposed to using a tool like
patchelf(1) to change the path of the dynamic linker in the built
binary to the one inside /run/host. Currently, the list of
architectures covers the ones that Fedora builds for.
[1] https://sourceware.org/glibc/wiki/ABIListhttps://github.com/containers/toolbox/pull/942
PR #897 made adjustmnets to the Toolbx binary that it requires presence
of /run/host in both the host filesystem and the filesystem in
a container.
The presence of the directory is assured by systemd-tmpfiles by
running it before the binary is started for the first time. For the run
to be effective 'data/tmpfiles.d/toolbox.conf' has to be installed in
a location visible to systemd-tmpfiles. Therefore, the call to
'systemd-tmpfiles --create' had to be placed after the install step.
https://github.com/containers/toolbox/pull/898