Currently, 'go build' is failing on Fedora 42 Workstation:
$ meson compile -C builddir --verbose
...
/path/src/go-build-wrapper /path/src /path/builddir src/toolbox 0.1.1
cc /lib64/ld-linux-x86-64.so.2 false
go: updates to go.mod needed; to update it:
go mod tidy
ninja: build stopped: subcommand failed.
... with Go version:
$ go version
go version go1.24.3 linux/amd64
$ rpm -q golang
golang-1.24.3-2.fc42.x86_64
Strangely, the CI hasn't been failing on Fedora 42 with the same Go
version [1].
Starting from Go version 1.21.0, Go started using an explicit 0 micro
version instead of skipping it - compare Go 1.20 and 1.21.0 [2]. It
looks like recent versions of Go are pedantic about using the exact
version number.
[1] https://github.com/containers/toolbox/pull/1657
[2] https://github.com/golang/go/releases/tag/go1.20
https://github.com/golang/go/releases/tag/go1.21.0
https://github.com/containers/toolbox/pull/1659
This uses the same approach taken by Flatpak [1] to ensure that the
certificates from certificate authorities (or CAs) that are available
inside a Toolbx container are kept synchronized with the host operating
system. Any program that uses PKCS #11 to access CA certificates should
see the same ones both inside the container and on the host.
During every 'enter' and 'run' command, toolbox(1) ensures that an
instance of 'p11-kit server' is running on the host listening on a local
file system socket that's accessible to both the container and the host.
If an instance is already running, then a second one is not created.
The location of the socket is injected into the container through the
P11_KIT_SERVER_ADDRESS environment variable.
Just like Flatpak, the singleton 'p11-kit server' process is not
terminated when the last 'enter' or 'run' command exits.
The Toolbx container's entry point configures it to use the
p11-kit-client.so PKCS #11 module instead of the usual p11-kit-trust.so
module. This talks to the 'p11-kit server' instance running on the host
over the socket instead of reading the CA certificates that are present
inside the container.
However, unlike Flatpak, this doesn't use D-Bus to set up the
communication between the container and the host, because when invoked
as 'sudo toolbox ...' there's no user or session D-Bus instance
available for the root user.
This set-up is skipped if 'p11-kit server' can't be run on the host, or
if the /etc/pkcs11/modules directory for configuring PKCS #11 modules or
p11-kit-client.so are missing inside the container. None of these are
considered hard dependencies to accommodate size-constrained OSes like
Fedora CoreOS that might not have 'p11-kit server', and existing Toolbx
containers and old images that might not have p11-kit-client.so.
The UBI-based toolbox images haven't yet been updated to contain
p11-kit-client.so. Until that happens, containers created from them
won't have access to the CA certificates from the host.
The CI needs to be run without 'p11-kit server' because the lingering
singleton process causes Bats to hang when tearing down the suite of
system tests [2]. To terminate the 'p11-kit server' instance run by the
system tests, it needs to be distinguishable from the instance run by
'normal' use of Toolbx by the user. One way to do this is to isolate
the host operating system's XDG_RUNTIME_DIR from the system tests.
Unfortunately, this is easier said than done [3]. So, this workaround
has to suffice until the problem is solved.
On the Ubuntu 22.04 CI nodes, it's not possible to remove the p11-kit
package that provides 'p11-kit server', because it leads to:
$ sudo dpkg --purge p11-kit
dpkg: dependency problems prevent removal of p11-kit:
adoptium-ca-certificates depends on p11-kit.
Therefore, as a workaround only the /usr/libexec/p11-kit/p11-kit-server
binary that provides the 'server' command is removed. The rest of the
p11-kit package is left untouched.
[1] Flatpak commit 66b2ff40f7caf3a7
https://github.com/flatpak/flatpak/commit/66b2ff40f7caf3a7
https://github.com/flatpak/flatpak/pull/1757
https://github.com/p11-glue/p11-kit/issues/68
[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[3] https://github.com/containers/toolbox/pull/1652
https://github.com/containers/toolbox/issues/626
A subsequent commit will use this to give Toolbx containers access to
the certificates from certificate authorities on the host.
The ideal goal is to ensure that all supported Toolbx containers and
images have p11-kit-client.so in them. In practice, some of them never
will, either because it's an existing container or an older version of
an image that was already present in the local containers/storage image
store, or because the operating system is too old.
Therefore, there needs to be a way to check at runtime if a Toolbx
container has p11-kit-client.so or not.
https://github.com/containers/toolbox/issues/626
A subsequent commit will use this to give Toolbx containers access to
the certificates from certificate authorities on the host.
This changes the user-visible error message from:
$ toolbox --verbose list
...
DEBU Migrating to newer Podman: failed to create migration lock file
/run/user/1000/toolbox/migrate.lock: open
/run/user/1000/toolbox/migrate.lock: no such file or directory
Error: failed to create migration lock file
... to:
$ toolbox --verbose list
...
DEBU Migrating to newer Podman: failed to create lock file
/run/user/1000/toolbox/migrate.lock: open
/run/user/1000/toolbox/migrate.lock: no such file or directory
Error: failed to create lock file
Or, from:
$ toolbox --verbose list
...
DEBU Migrating to newer Podman: failed to acquire migration lock on
/run/user/1000/toolbox/migrate.lock: bad file descriptor
Error: failed to acquire migration lock
... to:
$ toolbox --verbose list
...
DEBU Migrating to newer Podman: failed to acquire lock on
/run/user/1000/toolbox/migrate.lock: bad file descriptor
Error: failed to acquire lock
This is admittedly less specific without the debug logs, but it's
probably alright because it's such an unlikely error.
https://github.com/containers/toolbox/issues/626
The system tests can be very I/O intensive, because many of them copy
OCI images from the test suite's image cache directory to its local
container/storage store, create containers, and then delete everything
to run the next test with a clean slate. This makes them slow.
The runtime environment tests, which include the environment variable
tests, are particularly slow because they don't skip the I/O even when
testing error handling. This makes them a good target for
optimizations.
The environment variable tests query the values of different environment
variables from different containers without changing their state.
Therefore, a lot of disk I/O can be avoided by creating these containers
only once for all the tests.
This can reduce the time needed to run the environment variable tests
from almost 26 minutes to almost 9 minutes.
https://github.com/containers/toolbox/pull/1646
The XDG_CACHE_HOME environment variable is supposed to default to
$HOME/.cache [1], just as it did in the test suite, and this location is
meant to be used as a cache for 'normal' use by the user. Test suites
generally don't qualify as 'normal' use.
One expects that deleting the cache shouldn't affect 'normal' use other
than degrading performance. However, deleting these temporary files
used by the test suite will cause actual breakage. Even if the user
doesn't manually delete the cache, two concurrent invocations of the
test suite can do so or lead to other unexpected collisions, because the
paths are constant across multiple invocations.
Therefore, it's better to limit the scope of the test suite's temporary
files within the sandbox offered by Bats [2]. The sandbox is clearly
labelled as being used by Bats, is unique for each invocation, and Bats
takes care of cleaning everything up once it has finished running.
Note that there's no need for the system-test-storage sub-directory
under BATS_SUITE_TMPDIR. So it was left out.
[1] https://specifications.freedesktop.org/basedir-spec/latest/
[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html
https://github.com/containers/toolbox/pull/1645
The p11-kit-modules package in Ubuntu provides p11-kit-client.so, but
the /etc/pkcs11/modules directory that's necessary to configure p11-kit
to use p11-kit-client.so is not created by any package.
It's better to ensure that the /etc/pkcs11/modules directory exists in
the image, instead of having the Toolbx container's entry point create
it at runtime, because it can be a confirmation that p11-kit was built
to read the module configuration from this location.
This should have been part of commit aa8507730d.
https://github.com/containers/toolbox/issues/626
The /etc/pkcs11 directory and /etc/pkcs11/pkcs11.conf.example file are
created by the p11-kit package in Arch Linux, and the libp11-kit package
provides p11-kit-client.so. However, the /etc/pkcs11/modules directory
that's necessary to configure p11-kit to use p11-kit-client.so is not
created by any package.
It's better to ensure that the /etc/pkcs11/modules directory exists in
the image, instead of having the Toolbx container's entry point create
it at runtime, because it can be a confirmation that p11-kit was built
to read the module configuration from this location.
This should have been part of commit 259de86c8f.
https://github.com/containers/toolbox/issues/626
It's been a while since it's been necessary to read the ID field from
os-release(5) outside this package or the VARIANT_ID field anywhere at
all. Therefore, it's time to adjust the code to reflect this reality.
Fallout from 8caa7cd828
https://github.com/containers/toolbox/pull/1642
The system tests can be very I/O intensive, because many of them copy
OCI images from the test suite's image cache directory to its local
container/storage store, create containers, and then delete everything
to run the next test with a clean slate. This makes them slow.
The runtime environment tests, which include the D-Bus tests, are
particularly slow because they don't skip the I/O even when testing
error handling. This makes them a good target for optimizations.
The D-Bus tests check if methods can be called across the user or
session and system D-Bus instances from different containers without
changing their state. Therefore, a lot of disk I/O can be avoided by
creating these containers only once for all the tests.
This can reduce the time needed to run the D-Bus tests from almost 10
minutes to almost 5 minutes.
https://github.com/containers/toolbox/pull/1641
This will reduce the size of the src/pkg/utils/utils.go file and make it
easier to specify which part of the code base is maintained by whom.
https://github.com/containers/toolbox/pull/1639
The system tests can be very I/O intensive, because many of them copy
OCI images from the test suite's image cache directory to its local
container/storage store, create containers, and then delete everything
to run the next test with a clean slate. This makes them slow.
The runtime environment tests, which include the networking tests, are
particularly slow because they don't skip the I/O even when testing
error handling. This makes them a good target for optimizations.
The networking tests check the behaviour and configuration of the
network in different containers without changing their state.
Therefore, a lot of disk I/O can be avoided by creating these containers
only once for all the tests.
This can reduce the time needed to run the networking tests from almost
15 minutes to almost 6 minutes.
https://github.com/containers/toolbox/pull/1637
The libp11-kit package was added to the arch-toolbox image to ensure the
presence of p11-kit-client.so. Currently, the package is already pulled
in by various dependencies, like the gnutls and p11-kit packages.
Therefore, it doesn't increase the size of the base image, but serves as
a safeguard against any inadvertent changes.
A subsequent commit will use this to give Toolbx containers access to
the certificates from certificate authorities on the host. This commit
was kept separate from the changes to toolbox(1) to ensure that the
arch-toolbox image is ready before that happens.
https://github.com/containers/toolbox/issues/626
The system tests can be very I/O intensive, because many of them copy
OCI images from the test suite's image cache directory to its local
container/storage store, create containers, and then delete everything
to run the next test with a clean slate. This makes them slow.
The runtime environment tests, which include the group and user tests,
are particularly slow because they don't skip the I/O even when testing
error handling. This makes them a good target for optimizations.
The group and user tests check the group and user configuration in
different containers without changing their state. Therefore, a lot of
disk I/O can be avoided by creating these containers only once for all
the tests.
This can reduce the time needed to run the group and user tests from
almost 22 minutes to almost 5 minutes.
https://github.com/containers/toolbox/pull/1635
The system tests can be very I/O intensive, because many of them copy
OCI images from the test suite's image cache directory to its local
container/storage store, create containers, and then delete everything
to run the next test with a clean slate. This makes them slow.
The tests for toolbox(1) invocations forwarded to the host, which
include the help tests, are particularly slow because they never skip
the I/O. This makes them a good target for optimizations.
The help tests for toolbox(1) invocations forwarded to the host use the
same default Toolbx container to invoke toolbox(1) from without changing
its state. Therefore, a lot of disk I/O can be avoided by creating the
default container only once for all those tests.
This can reduce the time needed to run the help tests from almost 7
minutes to a few seconds.
https://github.com/containers/toolbox/pull/1635
Now that there's a website at https://containertoolbx.org/ it makes more
sense to link to it instead of the code repository. The website is a
superset of the code repository and contains a lot more useful
information for someone who is not familiar with the Toolbx project.
https://github.com/containers/toolbox/pull/1632
When the fmt.Fprintf() [1] function is used to write to a
strings.Builder [2] instance, it uses the io.Writer [3] interface, which
is the strings.Builder.Write() method. This method is practically the
same as the strings.Builder.WriteString() method, other than the fact
that the former accepts a slice of bytes and the latter accepts a
string. So, the difference is the initial call to fmt.Fprintf().
Therefore, unless format verbs [4] are needed to build the string,
fmt.Fprintf() can be replaced with strings.Builder.WriteString(). It
reduces one function call and is shorter to type.
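The equivalence can be seen in a small stand-alone sketch. The function names and the strings are made up for illustration and don't come from the toolbox(1) sources.

```go
package main

import (
	"fmt"
	"strings"
)

// withFprintf builds the string only through fmt.Fprintf, which reaches
// the builder through its io.Writer implementation, the Write method.
func withFprintf(name string) string {
	var builder strings.Builder
	fmt.Fprintf(&builder, "Container: ")
	fmt.Fprintf(&builder, "%s\n", name)
	return builder.String()
}

// withWriteString calls WriteString directly for the constant part and
// keeps fmt.Fprintf only where a format verb is actually needed.
func withWriteString(name string) string {
	var builder strings.Builder
	builder.WriteString("Container: ")
	fmt.Fprintf(&builder, "%s\n", name)
	return builder.String()
}

func main() {
	a := withFprintf("fedora-toolbox-40")
	b := withWriteString("fedora-toolbox-40")
	fmt.Println(a == b) // both variants build the same string
}
```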
Fallout from the following:
* e390f15469
* 7542f5fc86
* e58992066f
* 8dd2f8e80a
* 063bdf965f
[1] https://pkg.go.dev/fmt#Fprintf
[2] https://pkg.go.dev/strings#Builder
[3] https://pkg.go.dev/io#Writer
[4] https://pkg.go.dev/fmt
https://github.com/containers/toolbox/pull/1632
This will prevent any silly bug in getting the initialization stamp path
from breaking the communication protocol between the 'enter' or 'run'
commands on the host and the Toolbx container's entry point process.
https://github.com/containers/toolbox/pull/1633
This is meant to reduce the size of the initContainer() function that
implements the heart of the 'init-container' command.
The debug log and error message were tweaked to match the name of the
function and for consistency with the configureRPM() function.
https://github.com/containers/toolbox/pull/1631
The runtime directory is needed a few times during the course of
commonly used Toolbx commands. It's used at start-up for all commands
except 'completion' and 'init-container' to synchronize the invocation
of 'podman system migrate'. The entry point (i.e., 'init-container')
uses it to read the generated Container Device Interface specification
and create the initialization stamp file. The 'enter' and 'run'
commands use it to write the CDI specification and twice to detect the
creation of the initialization stamp file.
Since the runtime directory is always the same within a process, there's
no need to repeatedly go through all the steps of parsing the user and
group IDs, creating the directory, setting its ownership, and logging
the name of the directory. Once the directory is successfully created,
its path can be cached and returned for subsequent use.
In case an error occurred while setting up the runtime directory,
subsequent attempts to get it will go through all the steps again. This
doesn't matter much in practice because toolbox(1) can't continue in the
absence of a working runtime directory.
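The caching described above can be sketched as follows. The names are hypothetical, and setUpRuntimeDirectory() is only a stand-in for the real steps of parsing the IDs, creating the directory, setting its ownership and logging its name.

```go
package main

import (
	"errors"
	"fmt"
)

// cachedRuntimeDirectory holds the path after the first successful
// set-up. An empty string means that nothing has been cached yet.
var cachedRuntimeDirectory string

// setUpCalls counts how often the expensive steps ran. It exists only
// to make the effect of the cache visible in this sketch.
var setUpCalls int

// setUpRuntimeDirectory is a hypothetical stand-in for the real work.
func setUpRuntimeDirectory(uid int) (string, error) {
	setUpCalls++
	if uid < 0 {
		return "", errors.New("invalid user ID")
	}
	return fmt.Sprintf("/run/user/%d/toolbox", uid), nil
}

// getRuntimeDirectory returns the cached path if there is one. Errors
// are not cached, so a failed attempt is repeated in full next time.
func getRuntimeDirectory(uid int) (string, error) {
	if cachedRuntimeDirectory != "" {
		return cachedRuntimeDirectory, nil
	}

	path, err := setUpRuntimeDirectory(uid)
	if err != nil {
		return "", err
	}

	cachedRuntimeDirectory = path
	return path, nil
}

func main() {
	_, err := getRuntimeDirectory(-1) // fails, nothing is cached
	fmt.Println(err != nil, setUpCalls)

	path, _ := getRuntimeDirectory(1000) // succeeds, the path is cached
	fmt.Println(path, setUpCalls)

	path, _ = getRuntimeDirectory(1000) // served from the cache
	fmt.Println(path, setUpCalls)
}
```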
https://github.com/containers/toolbox/pull/1624
... for CVE-2024-0135 or GHSA-9v84-cc9j-pxr6, CVE-2024-0136 or
GHSA-vcfp-63cx-4h59, and CVE-2024-0137 or GHSA-frhw-w3wm-6cw4.
The src/go.sum file was updated with 'go mod tidy'.
https://github.com/containers/toolbox/pull/1614
The indirect dependencies in the src/go.mod file, and the src/go.sum
file were updated with 'go mod tidy'.
The src/go.sum file was skipped from the codespell test because it's
generated with 'go mod tidy'. Otherwise codespell would complain:
: github.com/spf13/viper v1.15.0
h1:js3yy885G8xwJa6iOISGFwd+qlUo5AvyXb7CiihdtiU=
> github.com/spf13/viper v1.15.0/go.mod
h1:fFcTBJxvhhzSJiZy8n+PeW6t8l+KeT/uTARa0jHOQLA=
: github.com/stretchr/objx v0.1.0/go.mod
h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
./src/go.sum:384: KeT ==> key, kept
https://github.com/containers/toolbox/pull/1612
Currently, the CI is failing because 'go mod download' is encountering an
expired TLS certificate:
$ go mod download
go: github.com/spf13/viper@v1.10.1 requires
go.opencensus.io@v0.23.0: unrecognized import path "go.opencensus.io":
https fetch: Get "https://go.opencensus.io/?go-get=1": tls: failed to
verify certificate: x509: certificate has expired or is not yet valid:
current time 2025-01-23T17:00:16+01:00 is after 2025-01-21T03:43:04Z
Therefore, disable the TLS certificate check until the certificate gets
updated or the dependency gets removed [1].
[1] https://pkg.go.dev/cmd/go#hdr-Environment_variables
https://github.com/containers/toolbox/pull/1611
The system tests can be very I/O intensive, because many of them copy
OCI images from the test suite's image cache directory to its local
container/storage store, create containers, and then delete everything
to run the next test with a clean slate. This makes them slow.
In the case of these two particular tests, toolbox(1) is supposed to
validate the command line options before trying to find the image. So,
there's no need to copy the image from the test suite's image cache
directory to its local container/storage store.
If the toolbox(1) code breaks, then either it won't throw the expected
error or it will download the image before validating the command line
options. The first possibility will be easily detected. The other
could have been harder to notice, but, fortunately, commit
55c0e63786 added variants of these tests without the --assumeyes
option and there are other tests to ensure that images cannot be
downloaded without that option. So, any unexpected attempts to download
the image will be caught by those variants of these tests.
Fallout from 32b147b9ff
https://github.com/containers/toolbox/pull/1595
It shouldn't be necessary to use the --assumeyes option when creating a
Toolbx container, if the corresponding image is already present in the
local containers/storage image store. It's harmful to test it with the
option, even when it shouldn't be needed, because it's off by default
and most users won't enable it.
Therefore, it's better to test the most common scenario that most users
will encounter.
https://github.com/containers/toolbox/pull/1595
The toolbox(1) binary always relies on the PATH environment variable to
find the podman(1) and skopeo(1) binaries. There's no way to override
those with the PODMAN and SKOPEO environment variables, and they only
affect any direct use of podman(1) and skopeo(1) within the test suite.
Therefore, offering the PODMAN and SKOPEO environment variables in their
current form is needlessly confusing and misleading, and can lead to
surprises arising from different podman(1) and skopeo(1) binaries being
used in different places. Either toolbox(1) should also honour them or
the test suite shouldn't offer them. The former is more complicated
without any obvious need for it, so the latter was chosen.
https://github.com/containers/toolbox/pull/1592
Note that github.com/briandowns/spinner 1.18.1 introduced an undesired
dependency on github.com/mattn/go-isatty for the IsTerminal() API, which
was later removed in 1.23.1 [1]. Fewer dependencies are always good
because they reduce the amount of code in use.
Therefore, this is a step towards using github.com/briandowns/spinner
1.23.1. Instead of bumping it straight to its final desired version,
doing it in smaller steps makes it easier to bisect any uncaught
regressions in future.
The src/go.sum file was updated with 'go mod tidy'.
[1] github.com/briandowns/spinner commit 8f269dd04fbfe236
https://github.com/briandowns/spinner/commit/8f269dd04fbfe236
https://github.com/briandowns/spinner/pull/156
https://github.com/containers/toolbox/pull/1584
If the NVIDIA Persistence Daemon is used, then 'enter' fails with:
$ sudo systemctl start nvidia-persistenced.service
$ toolbox enter
Error: mount: /run/nvidia-persistenced/socket: mount point does not exist.
dmesg(1) may have more information after failed mount system call.
failed to apply mount from Container Device Interface for NVIDIA
This is due to the socket at /run/nvidia-persistenced/socket being
listed in the Container Device Interface specification when the NVIDIA
Persistence Daemon is used.
Fallout from 6e848b250b
https://github.com/containers/toolbox/issues/1572
If the proprietary NVIDIA driver is installed, particularly
libnvidia-ml.so.1, but the kernel driver is not being used, then 'enter'
fails with:
$ toolbox enter
Error: failed to initialize NVIDIA Management Library
This was tested on Fedora 39 Workstation with the proprietary NVIDIA
driver from RPM Fusion, which makes it possible to easily disable the
driver without uninstalling it [1].
Note that, with and without this change, there's a delay of a few
seconds inside nvmlInit() from the NVIDIA Management Library.
[1] https://rpmfusion.org/Howto/NVIDIA
https://github.com/containers/toolbox/issues/1573
The working directory from which bats(1) is invoked might not be part of
the Toolbx container. For example, the downstream Fedora CI invokes the
tests as:
$ cd /path/to/toolbox/test/system
$ bats .
... and it led to:
not ok 8 help: Try unknown command (forwarded to host)
# tags: commands-options
# (from function `assert_line' in file
./libs/bats-assert/src/assert.bash, line 488,
# in test file ./002-help.bats, line 135)
# `assert_line --index 0
"Error: unknown command \"foo\" for \"toolbox\""' failed
#
# -- line differs --
# index : 0
# expected : Error: unknown command "foo" for "toolbox"
# actual : Error: crun: chdir to `/usr/share/toolbox/test/system`:
No such file or directory: OCI runtime attempted to invoke a
command that was not found
# --
#
https://github.com/containers/toolbox/pull/1560
The system tests can be very I/O intensive, because many of them copy
OCI images from the test suite's image cache directory to its local
container/storage store, create containers, and then delete everything
to run the next test with a clean slate. This makes them slow.
The runtime environment tests, which include the resource limit tests,
are particularly slow because they don't skip the I/O even when testing
error handling. This makes them a good target for optimizations.
The resource limit tests query the values for different resources from
the same default container without changing its state. Therefore, a lot
of disk I/O can be avoided by creating the default container only once
for all the tests.
This can save as much as 30 minutes.
https://github.com/containers/toolbox/pull/1552
The test suite has expanded to 415 system tests. These tests can be
very I/O intensive, because many of them copy OCI images from the test
suite's image cache directory to its local container/storage store,
create containers, and then delete everything to run the next test with
a clean slate. This makes the system tests slow.
Unfortunately, Zuul's max-job-timeout setting defaults to an upper limit
of 3 hours or 10800 seconds for jobs [1], and this is what Software
Factory uses [2]. So, there comes a point beyond which the CI can't be
prevented from timing out by increasing the timeout.
One way of scaling past this maximum time limit is to run the tests in
parallel across multiple nodes. This has been implemented by splitting
the system tests into different groups, which are run separately by
different nodes.
First, the tests were grouped into those that test commands and options
accepted by the toolbox(1) binary, and those that test the runtime
environment within the Toolbx containers. The first group has more
tests, but runs faster, because many of them test error handling and
don't do much I/O.
The runtime environment tests take especially long on Fedora Rawhide
nodes, which are often slower than the stable Fedora nodes, possibly
because Rawhide uses Linux kernels that are built with debugging
enabled. Therefore, this group of tests was further split for Rawhide
nodes by the Toolbx images they use. Apart
from reducing the number of tests in each group, this should also reduce
the amount of time spent in downloading the images.
The split has been implemented with Bats' tagging system that is
available from Bats 1.8.0 [3]. Fortunately, commit 87eaeea6f0
already added a dependency on Bats >= 1.10.0. So, there's nothing to
worry about.
At the moment, Bats doesn't expose the tags being used to run the test
suite to setup_suite() and teardown_suite() [4]. Therefore, the
TOOLBX_TEST_SYSTEM_TAGS environment variable was used to optimize the
contents of setup_suite().
[1] https://zuul-ci.org/docs/zuul/latest/tenants.html
[2] Commit 83f28c52e4
https://github.com/containers/toolbox/commit/83f28c52e47c2d44
https://github.com/containers/toolbox/pull/1548
[3] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[4] https://github.com/bats-core/bats-core/issues/1006
https://github.com/containers/toolbox/pull/1551
Using the word 'containerized' gives the false impression of heightened
security. As if it's a mechanism to run untrusted software in a
sandboxed environment without access to the user's private data (such as
$HOME), hardware peripherals (such as cameras and microphones), etc.
That's not what Toolbx is for.
Toolbx aims to offer an interactive command line environment for
development and troubleshooting the host operating system, without
having to install software on the host. That's all. It makes no
promise about security beyond what's already available in the usual
command line environment on the host that everybody is familiar with.
https://github.com/containers/toolbox/issues/1020
Mention that Toolbx is meant for system administrators to troubleshoot
the host operating system. The word 'debugging' is often used in the
context of software development, and hence most readers might not
interpret it as 'troubleshooting'.
https://github.com/containers/toolbox/pull/1549
Use 'software development' instead of just 'development' when
introducing Toolbx. The additional context makes it more understandable
to the reader.
https://github.com/containers/toolbox/pull/1549
The '-z now' flag, which is the opposite of '-z lazy', is unsupported as
an external linker flag [1], because of how the NVIDIA Container Toolkit
stack uses dlopen(3) to load libcuda.so.1 and libnvidia-ml.so.1 at
runtime [2,3].
The NVIDIA Container Toolkit stack doesn't use dlsym(3) to obtain the
address of a symbol at runtime before using it. It links against
undefined symbols at build-time available through a CUDA API definition
embedded directly in the CGO code or a copy of nvml.h. It relies upon
lazily deferring function call resolution to the point when dlopen(3) is
able to load the shared libraries at runtime, instead of doing it when
toolbox(1) is started.
This is unlike how Toolbx itself uses dlopen(3) and dlsym(3) to load
libsubid.so at runtime.
Compare the output of:
$ nm /path/to/toolbox | grep ' subid_init'
... with those from:
$ nm /path/to/toolbox | grep ' nvmlGpuInstanceGetComputeInstanceProfileInfoV'
U nvmlGpuInstanceGetComputeInstanceProfileInfoV
$ nm /path/to/toolbox | grep ' nvmlDeviceGetAccountingPids'
U nvmlDeviceGetAccountingPids
Using '-z now' as an external linker flag forces the dynamic linker to
resolve all symbols when toolbox(1) is started, and leads to:
$ toolbox
toolbox: symbol lookup error: toolbox: undefined symbol:
nvmlGpuInstanceGetComputeInstanceProfileInfoV
With the recent expansion of the test suite, it's necessary to increase
the timeout for the Fedora nodes to prevent the CI from timing out.
Fallout from 6e848b250b
[1] NVIDIA Container Toolkit commit 1407ace94ab7c150
https://github.com/NVIDIA/nvidia-container-toolkit/commit/1407ace94ab7c150
https://github.com/NVIDIA/go-nvml/issues/18
https://github.com/NVIDIA/nvidia-container-toolkit/issues/49
[2] https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/internal/cuda
[3] https://github.com/NVIDIA/go-nvml/blob/main/README.md
https://github.com/NVIDIA/go-nvml/tree/main/pkg/dl
https://github.com/NVIDIA/go-nvml/tree/main/pkg/nvml
https://github.com/containers/toolbox/pull/1548
Commit 87eaeea6f0 already added a dependency on Bats >= 1.10.0,
which is present on Fedora >= 39. Therefore, it should be exploited
wherever possible to simplify things.
Currently, the CI has been frequently timing out on stable Fedora nodes.
So, increase the timeout from 1 hour 50 minutes to 2 hours to avoid
that.
For what it's worth, the timeout for Fedora Rawhide nodes is 2 hours 10
minutes and it seems enough.
https://github.com/containers/toolbox/pull/1546
The proprietary NVIDIA driver has a kernel space part and a user space
part, and they must always have the same matching version. Sometimes,
the host operating system might end up with mismatched parts. One
reason could be that the different third-party repositories used to
distribute the driver might be incompatible with each other, e.g., in
the case of Fedora it could be RPM Fusion and NVIDIA's own repository.
This shows up in the systemd journal as:
$ journalctl --dmesg
...
kernel: NVRM: API mismatch: the client has the version 555.58.02, but
NVRM: this kernel module has the version 560.35.03. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
...
Without any special handling of this scenario, users would be presented
with a very misleading error:
$ toolbox enter
Error: failed to get Container Device Interface containerEdits for
NVIDIA
Instead, improve the error message to be more self-documenting:
$ toolbox enter
Error: the proprietary NVIDIA driver's kernel and user space don't
match
Check the host operating system and systemd journal.
https://github.com/containers/toolbox/pull/1541
Note that github.com/NVIDIA/go-nvlib > 0.2.0 isn't API compatible with
github.com/NVIDIA/nvidia-container-toolkit 1.15.0. The next release of
nvidia-container-toolkit is 1.16.0 and it requires go-nvlib 0.6.0.
Therefore, these two Go modules need to be updated together.
The src/go.sum file was updated with 'go mod tidy'.
https://github.com/containers/toolbox/pull/1540
When 'toolbox run' is invoked on the host, an exit code of 127 from
'podman exec' means either that the specified command couldn't be found
or that the working directory didn't exist. The only way to tell these
two scenarios apart is to actually look inside the container.
Secondly, Toolbx containers always have an executable toolbox(1) binary
at /usr/bin/toolbox and it's assumed that /usr/bin is always part of the
PATH environment variable.
When 'toolbox run toolbox ...' is invoked, the inner toolbox(1)
invocation will be forwarded back to the host by the Toolbx container's
/usr/bin/toolbox, which is always present as an executable. Hence, if
the outer 'podman exec' on the host fails with an exit code of 127,
then it doesn't mean that the container didn't have a toolbox(1)
executable, but that some subordinate process started by the container's
toolbox(1) failed with that exit code.
Therefore, handle this as a special case to avoid losing the exit code.
Otherwise, it leads to:
$ toolbox run toolbox run non-existent-command
bash: line 1: exec: non-existent-command: not found
Error: command non-existent-command not found in container
fedora-toolbox-40
$ echo "$?"
0
Instead, it will now be:
$ toolbox run toolbox run non-existent-command
bash: line 1: exec: non-existent-command: not found
Error: command non-existent-command not found in container
fedora-toolbox-40
$ echo "$?"
127
https://github.com/containers/toolbox/issues/957
https://github.com/containers/toolbox/pull/1052
When 'toolbox run' is invoked on the host, an exit code of 126 from
'podman exec' means that the specified command couldn't be invoked
because it's not an executable. For example, the command was actually
a directory. Note that this doesn't mean that the command couldn't be
found. That's denoted by exit code 127.
Secondly, Toolbx containers always have an executable toolbox(1) binary
at /usr/bin/toolbox and it's assumed that /usr/bin is always part of the
PATH environment variable.
When 'toolbox run toolbox ...' is invoked, the inner toolbox(1)
invocation will be forwarded back to the host by the Toolbx container's
/usr/bin/toolbox, which is always present as an executable. Hence, if
the outer 'podman exec' on the host fails with an exit code of 126,
then it doesn't mean that the container didn't have a working toolbox(1)
executable, but that some subordinate process started by the container's
toolbox(1) failed with that exit code.
Therefore, handle this as a special case to avoid showing an extra error
message. Otherwise, it leads to:
$ toolbox run toolbox run /etc
bash: line 1: /etc: Is a directory
bash: line 1: exec: /etc: cannot execute: Is a directory
Error: failed to invoke command /etc in container fedora-toolbox-40
Error: failed to invoke command toolbox in container fedora-toolbox-40
$ echo "$?"
126
Instead, it will now be:
$ toolbox run toolbox run /etc
bash: line 1: /etc: Is a directory
bash: line 1: exec: /etc: cannot execute: Is a directory
Error: failed to invoke command /etc in container fedora-toolbox-40
$ echo "$?"
126
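The special handling of exit codes 126 and 127 can be sketched roughly
as below. This is only an illustration under stated assumptions: the
exitError type, the handleExecFailure function and the workDirExists
parameter (which stands in for actually looking inside the container)
are hypothetical names, and only the error strings are taken from the
output shown in these commit messages.

```go
package main

import "fmt"

// exitError is a hypothetical type pairing an exit code with an optional
// message. A nil err means "propagate the code and print nothing more".
type exitError struct {
	code int
	err  error
}

// handleExecFailure maps a non-zero exit code from 'podman exec' to an
// exitError. workDirExists is an assumption of this sketch; it replaces
// the check that would look inside the container.
func handleExecFailure(command, container, workDir string, workDirExists bool, code int) *exitError {
	// The container always has /usr/bin/toolbox, so 126 or 127 here came
	// from a subordinate process that has already reported the problem.
	if command == "toolbox" && (code == 126 || code == 127) {
		return &exitError{code: code}
	}

	switch code {
	case 126:
		return &exitError{
			code: code,
			err:  fmt.Errorf("failed to invoke command %s in container %s", command, container),
		}
	case 127:
		if !workDirExists {
			return &exitError{
				code: code,
				err:  fmt.Errorf("directory %s not found in container %s", workDir, container),
			}
		}
		return &exitError{
			code: code,
			err:  fmt.Errorf("command %s not found in container %s", command, container),
		}
	}

	// Any other exit code is passed through untouched.
	return &exitError{code: code}
}

func main() {
	for _, command := range []string{"toolbox", "non-existent-command"} {
		e := handleExecFailure(command, "fedora-toolbox-40", "/home/user", true, 127)
		fmt.Printf("%s: exit code %d, message: %v\n", command, e.code, e.err)
	}
}
```

In both cases the exit code survives. Only the extra error message is
dropped when the command is toolbox(1) itself.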
https://github.com/containers/toolbox/issues/957
https://github.com/containers/toolbox/pull/1052
The test suite uses its own separate local containers/storage store to
isolate itself from the default store, so that the tests' interactions
with containers and images don't affect anything else. This is done by
using the CONTAINERS_STORAGE_CONF environment variable [1] to specify a
separate storage.conf(5) file [2].
Therefore, when running the test suite, the CONTAINERS_STORAGE_CONF
environment variable must be preserved when forwarding toolbox(1)
invocations inside containers to the host. Otherwise, the initial
toolbox(1) invocation on the host and the forwarded invocation running
on the host won't use the same local containers/storage store.
This problem only impacts test cases that cover toolbox(1) code paths
that invoke podman(1).
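A minimal sketch of preserving selected environment variables when
building the environment for a forwarded invocation is shown below.
The preservedEnv helper is a hypothetical name used for illustration
only, and how the resulting entries are handed to the host is left
out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// preservedEnv returns the NAME=value entries from environ whose names
// are listed in names. It is a hypothetical helper for this sketch.
func preservedEnv(environ, names []string) []string {
	keep := make(map[string]bool, len(names))
	for _, name := range names {
		keep[name] = true
	}

	var preserved []string
	for _, entry := range environ {
		name, _, found := strings.Cut(entry, "=")
		if found && keep[name] {
			preserved = append(preserved, entry)
		}
	}

	return preserved
}

func main() {
	names := []string{"CONTAINERS_STORAGE_CONF"}
	for _, entry := range preservedEnv(os.Environ(), names) {
		fmt.Println("forwarding", entry)
	}
}
```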
[1] https://docs.podman.io/en/latest/markdown/podman.1.html
[2] https://manpages.debian.org/testing/containers-storage/containers-storage.conf.5.en.html
https://github.com/containers/toolbox/issues/957
https://github.com/containers/toolbox/pull/1052
This will make it easier to propagate the exit codes of subordinate
processes through an exitError instance, when toolbox(1) is invoked
inside a container and the invocation is forwarded to the host.
Cobra doesn't honour the root command's SilenceErrors, if an error
occurred when parsing the command line for a command, even though the
command was found. However, Cobra does honour SilenceErrors, if the
error occurred afterwards.
Therefore, to avoid setting SilenceErrors in each and every command, it
was set in the PersistentPreRunE hook (ie., preRun), which is called
after all command line parsing has been successfully completed.
https://github.com/containers/toolbox/issues/957
It shouldn't be necessary to use the --assumeyes option when creating a
Toolbx container, if the corresponding image is already present in the
local containers/storage image store. It's harmful to test it with the
option, even when it shouldn't be needed, because it's off by default
and most users won't enable it.
Therefore, it's better to test the most common scenario that most users
will encounter.
https://github.com/containers/toolbox/pull/1536
It's far more consistent and understandable if all tests start with a
clean state without any containers or images present. Otherwise, the
subtle side-effects of having some image left behind from a previous
test can lead to surprises, and there's no need to spend time wondering
whether some tests should only clean up the containers or both
containers and images.
This additional work of cleaning up the images for all tests makes it
necessary to increase the timeout for all Fedora nodes to prevent the CI
from timing out.
https://github.com/containers/toolbox/pull/1526
This uses the NVIDIA Container Toolkit [1] to generate a Container
Device Interface specification [2] on the host during the 'enter' and
'run' commands. The specification is saved as JSON in the runtime
directories at /run/toolbox or $XDG_RUNTIME_DIR/toolbox to make it
available to the Toolbx container's entry point. The environment
variables in the specification are directly passed to 'podman exec',
while the hooks and mounts are handled by the entry point.
Toolbx containers already have access to all the devices in the host
operating system's /dev, and containers share the kernel space driver
with the host. So, this is only about making the user space driver
available to the container. It's done by bind mounting the files
mentioned in the generated CDI specification from the host to the
container, and then updating the container's dynamic linker cache.
This neither depends on 'nvidia-ctk cdi generate' to generate the
Container Device Interface specification nor on 'podman create --device'
to consume it.
The main problem with nvidia-ctk and 'podman create' is that the
specification must be saved in /etc/cdi or /var/run/cdi, both of which
require root access, for it to be visible to 'podman create --device'.
Toolbx containers are often used rootless, so requiring root privileges
for hardware support, something that's not necessary on the host, will
be a problem.
Secondly, updating the toolbox(1) binary won't let existing containers
use the proprietary NVIDIA driver, because 'podman create' only affects
new containers.
Therefore, toolbox(1) uses the Go APIs used by 'nvidia-ctk cdi generate'
and 'podman create --device' to generate, save, load and apply the CDI
specification itself. This removes the need for root privileges due to
/etc/cdi or /var/run/cdi, and makes the driver available to existing
containers.
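To illustrate how the saved specification might be consumed, here is a
rough sketch that loads a CDI-style JSON document with encoding/json
and separates the environment variables from the mounts. It doesn't
use the NVIDIA Container Toolkit or CDI Go APIs that toolbox(1) relies
on. The struct types, the loadSpec function and the sample values are
assumptions made for this example, and the JSON field names follow the
CDI specification as best understood here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The types below model only the parts of a CDI specification that this
// sketch needs. They are illustrative, not the upstream Go types.
type mount struct {
	HostPath      string   `json:"hostPath"`
	ContainerPath string   `json:"containerPath"`
	Options       []string `json:"options,omitempty"`
}

type hook struct {
	HookName string   `json:"hookName"`
	Path     string   `json:"path"`
	Args     []string `json:"args,omitempty"`
}

type containerEdits struct {
	Env    []string `json:"env,omitempty"`
	Mounts []mount  `json:"mounts,omitempty"`
	Hooks  []hook   `json:"hooks,omitempty"`
}

type spec struct {
	Version        string         `json:"cdiVersion"`
	Kind           string         `json:"kind"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

// loadSpec parses a specification that was saved as JSON.
func loadSpec(data []byte) (*spec, error) {
	var s spec
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("failed to parse CDI specification: %w", err)
	}
	return &s, nil
}

// sample is made-up data for demonstration purposes only.
const sample = `{
  "cdiVersion": "0.5.0",
  "kind": "nvidia.com/gpu",
  "containerEdits": {
    "env": ["EXAMPLE_VARIABLE=value"],
    "mounts": [
      {
        "hostPath": "/usr/lib64/libexample.so.1",
        "containerPath": "/usr/lib64/libexample.so.1",
        "options": ["ro", "bind"]
      }
    ]
  }
}`

func main() {
	s, err := loadSpec([]byte(sample))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Environment variables go to 'podman exec'.
	for _, env := range s.ContainerEdits.Env {
		fmt.Println("environment variable for 'podman exec':", env)
	}

	// Mounts are left for the entry point to bind mount.
	for _, m := range s.ContainerEdits.Mounts {
		fmt.Printf("bind mount %s to %s\n", m.HostPath, m.ContainerPath)
	}
}
```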
Until Bats 1.10.0, 'run --keep-empty-lines' had a bug where it counted
the trailing newline on the last line as a separate line [3]. However,
Bats 1.10.0 is only available in Fedora >= 39 and is absent from Fedora
38.
Based on an idea from Ievgen Popovych.
[1] https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
https://github.com/NVIDIA/nvidia-container-toolkit
[2] https://github.com/cncf-tags/container-device-interface
[3] Bats commit 6648e2143bffb933
https://github.com/bats-core/bats-core/commit/6648e2143bffb933
https://github.com/bats-core/bats-core/issues/708
https://github.com/containers/toolbox/issues/116
The Zuul executor got updated from Ansible 2.13.7 to 2.15.10, which now
has support for DNF5 [1] and the previous DNF5 Change [2] for Fedora 39
is now aiming at Fedora 41 (and Rawhide) [3]. Unfortunately, Ansible's
'dnf5' module is still under development and doesn't seem to match the
state of DNF5 in Fedora Rawhide, which causes:
TASK [Install RPM packages]
fedora-rawhide | ERROR
fedora-rawhide | {
fedora-rawhide | "failures": [],
fedora-rawhide | "msg": "Could not import the libdnf5 python module
using /usr/bin/python3 (3.12.3 (main, Apr 17 2024, 00:00:00) [GCC
14.0.1 20240411 (Red Hat 14.0.1-0)]). Please install
python3-libdnf5 package or ensure you have specified the correct
ansible_python_interpreter. (attempted
['/usr/libexec/platform-python', '/usr/bin/python3',
'/usr/bin/python2', '/usr/bin/python'])"
fedora-rawhide | }
Trying to explicitly install python3-libdnf5, as suggested above, using
Ansible's 'command' module before using the 'package' module to install
the Toolbx dependencies, still ends up with:
TASK [Install RPM packages]
fedora-rawhide | MODULE FAILURE:
fedora-rawhide | Traceback (most recent call last):
fedora-rawhide | File "<stdin>", line 107, in <module>
fedora-rawhide | File "<stdin>", line 99, in _ansiballz_main
fedora-rawhide | File "<stdin>", line 47, in invoke_module
fedora-rawhide | File "<frozen runpy>", line 226, in run_module
fedora-rawhide | File "<frozen runpy>", line 98, in _run_module_code
fedora-rawhide | File "<frozen runpy>", line 88, in _run_code
fedora-rawhide | File "/tmp/ansible_ansible.legacy.dnf5_payload_kecazv78/ansible_ansible.legacy.dnf5_payload.zip/ansible/modules/dnf5.py",
line 708, in <module>
fedora-rawhide | File "/tmp/ansible_ansible.legacy.dnf5_payload_kecazv78/ansible_ansible.legacy.dnf5_payload.zip/ansible/modules/dnf5.py",
line 704, in main
fedora-rawhide | File "/tmp/ansible_ansible.legacy.dnf5_payload_kecazv78/ansible_ansible.legacy.dnf5_payload.zip/ansible/modules/dnf5.py",
line 487, in run
fedora-rawhide | AttributeError: 'Base' object has no attribute
'load_config_from_file'
Therefore, force the use of DNF4 when an Ansible job is being attempted
more than once [4].
[1] Ansible commit a81b787a05100986
https://github.com/ansible/ansible/commit/a81b787a05100986
https://github.com/ansible/ansible/issues/78898
[2] https://fedoraproject.org/wiki/Changes/ReplaceDnfWithDnf5
[3] https://fedoraproject.org/wiki/Changes/SwitchToDnf5
[4] https://zuul-ci.org/docs/zuul/latest/job-content.html#var-zuul.attempts
https://github.com/containers/toolbox/pull/1509
When a Toolbx container is started for the first time and the entry
point invokes 'passwd --delete root' to actually remove the password for
root, passwd(1) writes the following to its standard error stream:
passwd: Note: deleting a password also unlocks the password.
This doesn't happen when the same container is stopped and started once
again.
Since passwd(1) writes directly to its standard error stream without
going through Logrus, the corresponding log entry in 'podman logs'
doesn't have a 'level' key, and is assumed by the log parser in the
'enter' and 'run' commands to be an error. If the entry point doesn't
actually encounter an error, then this confusion doesn't have any
user-visible effect. However, if the entry point does encounter an
error after this point, then the message from passwd(1) gets prepended
to it and presented to the user:
$ toolbox enter
Error: passwd: Note: deleting a password also unlocks the password.
failed to set KCM as the default Kerberos credential cache
Prevent this by intercepting the standard error stream of passwd(1) and
making it go through Logrus when passwd(1) fails. Losing this particular
message when passwd(1) actually succeeds in removing the password is not
a big problem, because it's somewhat redundant.
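The idea of intercepting a child's standard error stream and reporting
it only on failure can be sketched as follows. This is a minimal
illustration and not the actual change: runCapturingStderr is a
hypothetical helper, sh(1) stands in for passwd(1), and the standard
library's log package stands in for Logrus.

```go
package main

import (
	"bytes"
	"log"
	"os/exec"
)

// runCapturingStderr runs a command with its standard error stream
// redirected into a buffer, so that nothing reaches the parent's own
// standard error stream directly.
func runCapturingStderr(name string, args ...string) (string, error) {
	var stderr bytes.Buffer

	cmd := exec.Command(name, args...)
	cmd.Stderr = &stderr

	err := cmd.Run()
	return stderr.String(), err
}

func main() {
	// The command below succeeds, so its note on stderr is dropped.
	stderr, err := runCapturingStderr("sh", "-c", "echo 'note on stderr' >&2")
	if err != nil {
		// Only a failure sends the captured text through the logger.
		log.Printf("command failed: %v: %s", err, stderr)
	}
}
```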
Fallout from 815d7f6035
https://github.com/containers/toolbox/issues/750
The SHELL environment variable goes mysteriously missing from the
runtime environment of the GitHub Actions workflow [1]. This breaks the
'create' and 'enter' commands with:
Error: failed to get the current user's default shell
... and therefore tests involving them can't be run until this is
resolved.
It's been a year since this problem was first encountered and no
solution is in sight. Therefore, it will be better to work around this
by explicitly setting the SHELL environment variable on Ubuntu 22.04 to
increase the number of tests run by the CI.
The 'list' tests couldn't be enabled due to:
$ bats test/system
...
not ok 110 list: Containers and images
# (from function `assert_line' in file
test/system/libs/bats-assert/src/assert.bash, line 479,
# in test file test/system/102-list.bats, line 502)
# `assert_line --index 1 --partial
"registry.fedoraproject.org/fedora-toolbox:34"' failed
#
# -- line does not contain substring --
# index : 1
# substring : registry.fedoraproject.org/fedora-toolbox:34
# line : 5c5b1421750d quay.io/toolbx/ubuntu-toolbox:22.04
28 hours ago
# --
#
...
The 'run' tests couldn't be enabled due to:
$ bats --print-output-on-failure --verbose-run test/system
...
not ok 134 run: 'sudo id' inside the default container
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# in test file test/system/104-run.bats, line 208)
# `assert_success' failed
# ~ ~/work/toolbox/toolbox/containers/toolbox
# stderr:
# runner is not in the sudoers file. This incident will be reported.
#
# -- command failed --
# status : 1
# output :
# --
#
...
The 'user' tests couldn't be enabled due to:
$ bats test/system
...
not ok 243 user: runner in passwd(5) inside the default container
# (from function `assert_line' in file
test/system/libs/bats-assert/src/assert.bash, line 509,
# in test file test/system/206-user.bats, line 190)
# `assert_line --regexp
"^$USER::$user_id_real:$user_id_real:$user_gecos:$HOME:$SHELL$"'
failed
# ~ ~/work/toolbox/toolbox/containers/toolbox
#
# -- no output line matches regular expression --
# regexp : ^runner::1001:1001:,,,:/home/runner:/bin/bash$
# output (27 lines):
# root:x:0:0:root:/root:/bin/bash
# ...
# runner::1001:127::/home/runner:/bin/bash
# --
#
...
The 'ulimit' tests couldn't be enabled due to:
$ bats test/system
...
not ok 271 ulimit: real-time non-blocking time (hard) in 3504ms
# (from function `assert_line' in file
test/system/libs/bats-assert/src/assert.bash, line 488,
# in test file test/system/210-ulimit.bats, line 43)
# `assert_line --index 0 "$limit"' failed
# ~ ~/work/toolbox/toolbox/containers/toolbox
#
# -- line differs --
# index : 0
# expected : unlimited
# actual :
# --
#
...
The 'dbus' tests couldn't be enabled due to:
$ bats --print-output-on-failure --verbose-run test/system
...
not ok 206 dbus: session bus inside the default container
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# in test file test/system/211-dbus.bats, line 50)
# `assert_success' failed
# ~ ~/work/toolbox/toolbox/containers/toolbox
# stderr:
# bash: line 1: exec: gdbus: not found
# Error: command gdbus not found in container ubuntu-toolbox-22.04
#
# -- command failed --
# status : 127
# output :
# --
#
...
[1] https://github.com/orgs/community/discussions/59413
https://github.com/containers/toolbox/pull/1507
The test was earlier rewritten in commit b0beb68255 with custom
code in the hope that it would make it more reliable. The test has
proven to be reliable in recent times, and the cause for its past
unreliability is unclear. Therefore, it will be better to remove the
custom code in favour of the standard Bats helpers for the sake of
consistency and readability.
Until Bats 1.10.0, 'run --keep-empty-lines' had a bug where it counted
the trailing newline on the last line as a separate line [1]. However,
Bats 1.10.0 is only available in Fedora >= 39 and is absent from Fedora
38.
[1] Bats commit 6648e2143bffb933
https://github.com/bats-core/bats-core/commit/6648e2143bffb933
https://github.com/bats-core/bats-core/issues/708
https://github.com/containers/toolbox/pull/1506
Until now, if the entry point of a Toolbx container encountered an
error, while starting the container as part of the 'enter' and 'run'
commands, the specific error wouldn't be presented to the user by those
commands. Instead, the user would have to use 'podman start --attach'
or 'podman logs' to retrieve it. The same applied to the debug logs
coming from the entry point.
The lack of relevant information and insight made it difficult for users
to debug their containers or file high quality bug reports.
This is addressed by using 'podman logs', as part of the 'enter' and
'run' commands to fetch everything from the entry point's standard error
and output streams, which means both debug logs and errors, separating
them out, and presenting them to the user depending on the chosen debug
or verbosity level.
The debug logs from 'podman logs' are in the logfmt format [1], because
that's the default behaviour of Logrus when a terminal device is not
attached [2]. Logs without a 'level' key are assumed to be errors.
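The rule that entries without a 'level' key are treated as errors can
be illustrated with the small sketch below. The parseLogfmt and
classify functions are hypothetical and handle only simple key=value
pairs with optional double-quoted values. It doesn't claim to be a
complete logfmt parser or the parser used by the 'enter' and 'run'
commands.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLogfmt splits a line into key=value pairs. Values may be wrapped
// in double quotes with backslash escapes. It is a simplified sketch.
func parseLogfmt(line string) map[string]string {
	fields := make(map[string]string)
	i := 0
	for i < len(line) {
		if line[i] == ' ' {
			i++ // skip separators between pairs
			continue
		}

		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		key := line[start:i]
		if i >= len(line) || line[i] == ' ' {
			fields[key] = "" // bare word without a value
			continue
		}
		i++ // skip '='

		if i < len(line) && line[i] == '"' {
			start = i
			i++
			for i < len(line) && line[i] != '"' {
				if line[i] == '\\' {
					i++ // skip the escaped character
				}
				i++
			}
			i++ // include the closing quote
			if i > len(line) {
				i = len(line)
			}
			value, err := strconv.Unquote(line[start:i])
			if err != nil {
				value = line[start:i] // keep malformed values verbatim
			}
			fields[key] = value
			continue
		}

		start = i
		for i < len(line) && line[i] != ' ' {
			i++
		}
		fields[key] = line[start:i]
	}
	return fields
}

// classify returns the level and message for a log line. A line without
// a 'level' key is assumed to be an error and is returned whole.
func classify(line string) (string, string) {
	fields := parseLogfmt(line)
	if level, ok := fields["level"]; ok {
		return level, fields["msg"]
	}
	return "error", line
}

func main() {
	lines := []string{
		`time="2024-01-01T00:00:00Z" level=debug msg="Example debug message"`,
		`passwd: Note: deleting a password also unlocks the password.`,
	}
	for _, line := range lines {
		level, message := classify(line)
		fmt.Printf("%s: %s\n", level, message)
	}
}
```

The second line in the example is the passwd(1) message mentioned in an
earlier commit. It has no 'level' key, so it lands in the error bucket.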
It's assumed that the 'podman logs' process can crash or terminate,
because the entry point crashed or got killed due to an out-of-band
'podman stop' or encountered an error. Under such circumstances, the
'enter' and 'run' commands will terminate immediately if they have
already read any error coming from the entry point. Otherwise, they
will wait for the timeout.
If the entry point successfully initializes the Toolbx container, then
'enter' and 'run' will cancel the 'podman logs' process, parse and show
any pending logs, and then terminate.
It's possible to detect the creation of the initialization stamp file
before all that was written by the entry point has been read from
'podman logs', causing some or all of the debug logs not to be shown as
part of 'enter' and 'run'. This is because the creation of the
initialization stamp file is detected by a quick inotify(7) watch within
the 'enter' and 'run' processes, while the logs slowly change hands
across multiple entities - from the entry point to conmon(8) to the
systemd journal to 'podman logs'.
[1] https://brandur.org/logfmt
[2] https://pkg.go.dev/github.com/sirupsen/logrus#section-readme
https://github.com/containers/toolbox/issues/750
A subsequent commit will show the debug logs and errors from a Toolbx
container's entry point as part of the 'enter' and 'run' commands. To
test this behaviour, it will be necessary to intentionally fail the entry
point.
Moreover, container start-up is a concurrent operation. If the entry
point fails too early, then it will be caught by the 'podman inspect'
right after the 'podman start' before the inotify(7) watches are put in
place. Otherwise, it will be handled by the timeout. Therefore, it
will be necessary to shake out any bugs arising out of unexpected races.
To address this, two environment variables have been introduced:
* TOOLBX_DELAY_ENTRY_POINT
* TOOLBX_FAIL_ENTRY_POINT
The TOOLBX_DELAY_ENTRY_POINT environment variable can be set to a
positive integer during the 'create' command to add a delay, in terms of
seconds, when the Toolbx container's entry point is started by 'enter'
and 'run'.
Similarly, if the TOOLBX_FAIL_ENTRY_POINT environment variable is set to
a positive integer during the 'create' command, the entry point will
later fail during 'enter' and 'run'. The error message will have only
one line if its value is one, else it will have two.
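A rough sketch of how an entry point could interpret these two
variables is given below. It assumes that the values have already been
carried into the entry point's environment, which isn't shown. The
function names and the error text are hypothetical, and only the
"positive integer", "delay in seconds" and "one or two lines"
behaviour comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"time"
)

// positiveInt parses value and returns 0 for anything that isn't a
// positive integer, including an empty string.
func positiveInt(value string) int {
	n, err := strconv.Atoi(value)
	if err != nil || n <= 0 {
		return 0
	}
	return n
}

// entryPointDelay converts the variable's value into a delay in seconds.
func entryPointDelay(value string) time.Duration {
	return time.Duration(positiveInt(value)) * time.Second
}

// entryPointFailure returns nil when no failure was requested, an error
// with one line when the value is one, and two lines otherwise. The
// wording of the messages is made up for this sketch.
func entryPointFailure(value string) error {
	n := positiveInt(value)
	switch {
	case n == 0:
		return nil
	case n == 1:
		return errors.New("entry point failed as requested")
	default:
		return errors.New("entry point failed as requested\nthis is the second line")
	}
}

func main() {
	time.Sleep(entryPointDelay(os.Getenv("TOOLBX_DELAY_ENTRY_POINT")))

	if err := entryPointFailure(os.Getenv("TOOLBX_FAIL_ENTRY_POINT")); err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(1)
	}
}
```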
https://github.com/containers/toolbox/issues/750
The subsequent commit will use this to show the debug logs and errors
from a Toolbx container's entry point as part of the 'enter' and 'run'
commands.
https://github.com/containers/toolbox/issues/750
Currently, the 'enter' and 'run' commands poll at one second intervals
to check if the Toolbx container's entry point has created the
initialization stamp file to indicate that the container has been
initialized. This came from the POSIX shell implementation [1], where
it was relatively easier to poll than to use inotify(7) to monitor the
file system.
The problem with polling is that the interval is always going to be
either too short and waste resources or too long and cause delays. The
current one second interval is sufficiently long to add a noticeable
delay to the 'enter' and 'run' commands.
It will be better to use inotify(7) to monitor the file system, which is
quite easy to do with the Go implementation, so that the commands can
proceed as soon as the initialization stamp file is available, instead
of waiting for the polling interval to pass.
There's a fallback to polling, as before, when the operating system is
suffering from a shortage of resources needed for inotify(7). This code
path can be forced through the TOOLBX_RUN_USE_POLLING environment
variable for testing. Setting this environment variable disables some
code to ensure that the polling ticker is actually used, because,
otherwise, the race between the creation and detection of the
initialization stamp file makes it difficult to test the fallback.
[1] Commit d3e0f3df06
https://github.com/containers/toolbox/commit/d3e0f3df06d3f5ac
https://github.com/containers/toolbox/pull/305
https://github.com/containers/toolbox/issues/1070
There's no immediate desire to make Toolbx work on operating systems
that don't use forward slashes as path separators. However, there's
also no reason not to use the standard library.
https://github.com/containers/toolbox/pull/1495
Currently, the 'enter' and 'run' commands always invoke 'podman start'
even if the Toolbx container's entry point is already running. There's
no need for that. The commands already invoke 'podman inspect' to find
out if the org.freedesktop.Flatpak.SessionHelper D-Bus service needs to
be started. Thus, they already have what is needed to find out if the
container is stopped and 'podman start' is necessary before it can be
used with 'podman exec', or if it's already running.
The unconditional 'podman start' invocation was followed by a second
'podman inspect' invocation to find out if the 'podman start' managed to
start the container's entry point. There's no need for this second
'podman inspect' either, just like the 'podman start', when it's already
known from the first 'podman inspect' that the container is running.
The extra 'podman start' and 'podman inspect' invocations are
sufficiently expensive to add a noticeable overhead to the 'enter' and
'run' commands. It's common to use a container that's already running,
just like having multiple terminals within the same working directory,
and terminal emulation applications like Ptyxis try to make it easier to
do so [1]. Therefore, it's worth optimizing this code path.
[1] https://gitlab.gnome.org/chergert/ptyxis
https://flathub.org/apps/app.devsuite.Ptyxis
https://github.com/containers/toolbox/issues/1070
This makes it possible to confine the details of detecting a Toolbx
container within the podman package, because it was not possible to use
podman.IsToolboxContainer() when listing all the Toolbx containers.
https://github.com/containers/toolbox/pull/1491
Unmarshal the JSON from 'podman inspect --format json --type container'
directly inside podman.InspectContainer() to confine the details within
the podman package.
The JSON samples for the unit tests were taken using the default Toolbx
container on versions of Fedora that shipped a specific Podman and
Toolbx version. This accounts for differences in the JSON caused by
different major versions of Podman and the way different Toolbx versions
set up the containers.
One exception was Fedora 28, which had Podman 1.1.2 and Toolbx 0.0.9,
which was the last Toolbx version before 'toolbox init-container' became
the entry point for all Toolbx containers [1]. However, the default
Toolbx image is no longer available from registry.fedoraproject.org.
Hence, the image for Fedora 29 was used.
The minimum required Podman version is 1.6.4 [2], and the Go
implementation has been encouraging users to create containers with
Toolbx version 0.0.17 or newer [3]. The versions used to collect the
JSON samples for the unit tests were chosen accordingly. They don't
exhaustively cover all possible supported and unsupported version
combinations, but hopefully enough to be useful.
[1] Commit 8b84b5e460
https://github.com/containers/toolbox/commit/8b84b5e4604921fa
https://github.com/debarshiray/toolbox/pull/160
[2] Commit 8e80dd5db1
https://github.com/containers/toolbox/commit/8e80dd5db1e6f40b
https://github.com/containers/toolbox/pull/1253
[3] Commit 238f2451e7
https://github.com/containers/toolbox/commit/238f2451e7d7d54a
https://github.com/containers/toolbox/pull/318
https://github.com/containers/toolbox/pull/1490
A subsequent commit will switch to unmarshalling the JSON returned from
'podman inspect --format json --type container' directly inside
podman.InspectContainer() to confine the details within the podman
package and make it easier to write unit tests for it. eg., it requires
tracking changes to the JSON output across different Podman versions.
Unfortunately, the JSON from 'podman inspect --type container' and
'podman ps --all' are considerably different and it will be awkward to
use the same implementation of the json.Unmarshaler interface [1] for
both. One option is to have two different concrete types separately
implement json.Unmarshaler to handle the differences in the JSON, and
then hide these concrete types behind a Container interface that
provides access to the values parsed from the JSON.
[1] https://pkg.go.dev/encoding/json#Unmarshaler
https://github.com/containers/toolbox/pull/1490
In future, it will be good if podman.Inspect() returned a Container or
Image object instead of a []map[string]interface{} that the caller has
to parse. This is because parsing the []map[string]interface{} involves
tracking changes in the JSON output by different Podman versions, and
it's better to limit such details to the podman package.
Splitting podman.Inspect() into two separate functions for containers
and images is one way of achieving that.
https://github.com/containers/toolbox/pull/1487
Currently, it's not possible to create a Toolbx container from an image
without a name:
$ podman build --squash images/fedora/f39
STEP 1/21: FROM registry.fedoraproject.org/fedora:39
STEP 2/21: ARG NAME=fedora-toolbox
STEP 3/21: ARG VERSION=39
...
--> 2f9bdf11c8d4
2f9bdf11c8d4d7674dfb17d8edcfd13475d8636077f1a6208ecd616de77d7f80
$ toolbox create --image 2f9bdf11c8d4
Error: empty RepoTag for image 2f9bdf11c8d4
The image's fully qualified name is fetched from its RepoTags for purely
cosmetic reasons to show a precise human-readable name in the debug logs
and 'podman inspect --type container'. Therefore, there's no reason to
fail the creation of a Toolbx container in the absence of it.
Note that an image without a name will have an empty RepoTags array in
the JSON returned by 'podman inspect --format json --type image'. It's
different from not having a RepoTags array at all in the JSON, which may
or may not be indicative of a more serious problem and will continue to
fail the creation of the Toolbx container as before.
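The distinction between an empty RepoTags array and a missing one can
be sketched as follows. The imageName function and its fallback
parameter are hypothetical, and the JSON is simplified to a single
object for the sake of the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// imageName returns the first entry in RepoTags. An empty array isn't an
// error and yields the fallback, while a missing RepoTags key is.
func imageName(inspectJSON []byte, fallback string) (string, error) {
	var image map[string]json.RawMessage
	if err := json.Unmarshal(inspectJSON, &image); err != nil {
		return "", err
	}

	raw, ok := image["RepoTags"]
	if !ok {
		return "", errors.New("missing RepoTags")
	}

	var repoTags []string
	if err := json.Unmarshal(raw, &repoTags); err != nil {
		return "", err
	}

	if len(repoTags) == 0 {
		// The image has no name. Fall back to what the caller gave.
		return fallback, nil
	}

	return repoTags[0], nil
}

func main() {
	samples := []string{
		`{"RepoTags":["registry.fedoraproject.org/fedora-toolbox:39"]}`,
		`{"RepoTags":[]}`,
		`{}`,
	}
	for _, sample := range samples {
		name, err := imageName([]byte(sample), "2f9bdf11c8d4")
		fmt.Printf("name=%q err=%v\n", name, err)
	}
}
```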
https://github.com/containers/toolbox/pull/1486
It's less of a user-facing operation, and more of a backend one. A
subsequent commit will improve the error handling when getting the fully
qualified name of an image from its RepoTags to handle cases where the
'create' command is used with an image without a name.
https://github.com/containers/toolbox/pull/1486
This builds on top of commit 56fcb0b4d2.
The toolboxContainer type has been renamed to Container and moved into
the podman package.
There is nothing Toolbx specific about the type - it represents any
container returned by 'podman ps'. The containers are only later
filtered for Toolbx containers.
Secondly, having the Container type inside the podman package makes it
possible to encapsulate the unmarshalling of the JSON within the package
without exposing the raw JSON to outside consumers. This is desirable
because the unmarshalling involves tracking changes in the JSON output
by different Podman versions, and it's better to limit such details to
the podman package.
https://github.com/containers/toolbox/pull/1485
It's better to avoid single letter variables in general, because they
are so hard to grep for.
This will make the subsequent commit easier to read.
https://github.com/containers/toolbox/pull/1485
This builds on top of commit e772207831.
Currently, the JSON from 'podman ps --format json' gets unmarshalled
into a []map[string]interface{} in podman.GetContainers, where the maps
in the slice represent containers. Each map is then marshalled back
into JSON and then again unmarshalled into a toolboxContainer type.
This is wasteful. The toolboxContainer type already implements the
json.Unmarshaler interface [1], since commit e772207831. Hence,
the entire JSON from 'podman ps --format json' can be directly
unmarshalled into a slice of toolboxContainers without involving the
[]map[string]interface{}.
A subsequent commit will move the toolboxContainer type into the podman
package to more tightly encapsulate the unmarshalling of the JSON. So,
as an intermediate step in that direction, the podman.GetContainers
function has been temporarily changed to return the entire JSON.
[1] https://pkg.go.dev/encoding/json#Unmarshaler
https://github.com/containers/toolbox/pull/1485
Ansible's built-in 'package' module doesn't show any details when
installing the RPMs. All that can be seen is:
TASK [Install RPM packages]
fedora-rawhide | changed
Therefore, there's no way to know what version of the packages got
installed.
In this case, not knowing the Bats version being used by the CI makes it
difficult to know why the tests are generating this spew on Fedora
Rawhide [1]:
TASK [Run system tests]
test/system/libs/helpers.bash: line 7: TEMP_BASE_DIR: readonly variable
test/system/libs/helpers.bash: line 8: TEMP_STORAGE_DIR: readonly variable
test/system/libs/helpers.bash: line 10: IMAGE_CACHE_DIR: readonly variable
test/system/libs/helpers.bash: line 11: ROOTLESS_PODMAN_STORE_DIR: readonly variable
test/system/libs/helpers.bash: line 12: ROOTLESS_PODMAN_RUNROOT_DIR: readonly variable
test/system/libs/helpers.bash: line 13: PODMAN_STORE_CONFIG_FILE: readonly variable
test/system/libs/helpers.bash: line 14: DOCKER_REG_ROOT: readonly variable
test/system/libs/helpers.bash: line 15: DOCKER_REG_CERTS_DIR: readonly variable
test/system/libs/helpers.bash: line 16: DOCKER_REG_AUTH_DIR: readonly variable
test/system/libs/helpers.bash: line 17: DOCKER_REG_URI: readonly variable
test/system/libs/helpers.bash: line 18: DOCKER_REG_NAME: readonly variable
test/system/libs/helpers.bash: line 21: PODMAN: readonly variable
test/system/libs/helpers.bash: line 22: TOOLBX: readonly variable
test/system/libs/helpers.bash: line 23: SKOPEO: readonly variable
...
fedora-rawhide | 1..340
[1] https://github.com/bats-core/bats-core/pull/904
https://github.com/containers/toolbox/pull/1482
The system tests download several images when setting up the test suite,
and cache them for later use by the tests [1]. This saves time and
avoids hitting rate limits imposed by OCI registries by not downloading
the same images repeatedly for several tests, but at the cost of
increased use of storage space to cache the images.
The images are cached under BATS_TMPDIR. It defaults to the TMPDIR
environment variable, and if that's not set then to /tmp [2]. Normally,
TMPDIR isn't set, and the images end up getting cached under /tmp. Now,
/tmp is typically on tmpfs backed by RAM or swap, which means that it
should be used for smaller size-bounded files only, and /var/tmp should
be used for everything else [3].
The images are big enough that a collection of them can't be described
as smaller and size-bounded, and it led to:
1..306
# test suite: Set up
# test suite: Tear down
not ok 1 setup_suite
# (from function `setup_suite' in test file ./setup_suite.bash, line
55)
# `_pull_and_cache_distro_image fedora "$((system_version-1))" ||
false' failed
# Failed to cache image registry.fedoraproject.org/fedora-toolbox:40
to /tmp/bats-run-IPz4Cn/image-cache/fedora-toolbox-40
# time="2024-02-19T11:41:43Z" level=fatal msg="copying system image
from manifest list: writing blob: write
/tmp/bats-run-IPz4Cn/image-cache/fedora-toolbox-40/dir-put-blob607392514:
no space left on device"
# bats warning: Executed 1 instead of expected 306 tests
So, change the default location of the BATS_TMPDIR environment variable
to /var/tmp by setting TMPDIR.
[1] Commit 50683c9d9a
https://github.com/containers/toolbox/commit/50683c9d9a78adc9
https://github.com/containers/toolbox/pull/375
[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[3] https://systemd.io/TEMPORARY_DIRECTORIES/
https://github.com/containers/toolbox/pull/1462
The working directory from which bats(1) is invoked might not be part of
the Toolbx container. eg., Podman's downstream Fedora CI invokes the
tests as:
$ cd /path/to/toolbox/test/system
$ bats .
... and it led to [1]:
not ok 110 run: Smoke test with true(1)
# (from function `assert_output' in file
./libs/bats-assert/src/assert.bash, line 255,
# in test file ./104-run.bats, line 38)
# `assert_output ""' failed
#
# -- output differs --
# expected (0 lines):
#
# actual (3 lines):
# Error: crun: chdir to `/usr/share/toolbox/test/system`: No such
file or directory: OCI runtime attempted to invoke a command that
was not found
# Error: directory /usr/share/toolbox/test/system not found in
container fedora-toolbox-41
# Using /home/testuser instead.
# --
#
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2263968
https://github.com/containers/toolbox/pull/1457
The paths to bats-assert and bats-support are broken, if bats(1) is
invoked from any other location than the parent directory of the 'tests'
directory. eg., Podman's downstream Fedora CI invokes the tests as:
$ cd /path/to/toolbox/test/system
$ bats .
... and it led to [1]:
1..306
# test suite: Set up
# Missing dependencies
# Forgot to run 'git submodule init' and 'git submodule update' ?
# test suite: Tear down
not ok 1 setup_suite
# (from function `setup_suite' in test file ./setup_suite.bash, line 33)
# `return 1' failed
# bats warning: Executed 1 instead of expected 306 tests
Fallout from 2c09606603
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2263968
https://github.com/containers/toolbox/pull/1448
This is meant to make the project more searchable on the Internet. More
and more people have been pointing out that "toolbox" is terribly
difficult to search for, and it's impossible to find any decent
Internet real estate by that name.
Some exceptions:
* The code repository is still https://github.com/containers/toolbox.
It will be renamed after giving a heads-up to other contributors.
* The name of the binary is still 'toolbox'. The name is embedded
into existing Toolbx containers as their entry point, which is bind
mounted from the host operating system when the containers are
started. Trivially renaming the binary will prevent these
containers from starting.
* For similar reasons, the TOOLBOX_PATH environment variable is still
the same.
* For similar reasons, the profile.d file to be read by the shell on
start-up is still called toolbox.sh.
* The label used to identify Toolbx containers and images is still
called com.github.containers.toolbox. There are many existing
Toolbx containers, and many Toolbx images beyond the control of the
Toolbx project that use this label to identify themselves. Simply
renaming the label will prevent these containers and images from
being recognized.
* The names of the built-in Toolbx images still retain the word
'toolbox'. Images under the new name need to be published on the
OCI registries and the toolbox(1) binary needs to be taught to
handle both old and new names, wherever necessary, for backwards
compatibility.
* The stamp file used to identify Toolbx containers is still called
/run/.toolboxenv because it's used by various external programs and
users to identify Toolbx containers.
* The OSC 777 escape sequence to track and preserve the user's current
Toolbx container [1] still emits 'toolbox' as the name of the
container runtime. Changing the escape sequence can break terminal
emulation applications, like Prompt [2], that consume it. Hence, it
needs to be done carefully.
* The runtime directories at /run/toolbox, when used as root, and
$XDG_RUNTIME_DIR/toolbox, when used rootless, weren't renamed.
When used as root, /run/toolbox is embedded into existing Toolbx
containers as a bind mount from the host. Trivially renaming the
path will prevent these containers from starting.
Secondly, both these paths are used to synchronize container
start-up. If the paths are trivially renamed, and the toolbox(1)
binary is updated and used without stopping all existing containers,
then it won't be able to enter containers that were already started.
Strictly speaking, this scenario isn't supported, since updates are
always expected to be "offline" [3]. However, it's worth noting
because solving the previous problem might also address this.
* The configuration file for RPM is still called
/usr/lib/rpm/macros.d/macros.toolbox.
[1] https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/17
[2] https://gitlab.gnome.org/chergert/prompt
[3] https://www.freedesktop.org/software/systemd/man/latest/systemd.offline-updates.html
https://github.com/containers/toolbox/issues/1399
With the recent expansion of the test suite, it's necessary to increase
the timeout for the Fedora 38 and 39 nodes to prevent the CI from timing
out.
https://github.com/containers/toolbox/pull/1445
It is difficult for downstream to track dependencies when they are
automatic. This results in shell completion scripts not being
installed in packages because builders do not have the right
dependency. This commit adds meson feature arguments to guard those
dependencies so that downstream distributions can use
`-Dauto_features=enabled`.
For more explanation, see rule #3 of:
https://blogs.gnome.org/mcatanzaro/2022/07/15/best-practices-for-build-options/
https://github.com/containers/toolbox/pull/1442
Signed-off-by: Valentin David <me@valentindavid.com>
It doesn't make sense to show the image download prompt when the
standard input or output stream is redirected to something other than a
terminal device.
During such non-interactive use, there's no way for the user to see the
prompt and the size of the image and then make a decision based on them.
The decision has to be made differently and earlier. The user will
either never download or always download or will use 'skopeo inspect'
to decide for themselves.
Secondly, when the input and output are not connected to a terminal, the
terminal escape sequences and the terminal-specific ioctl(2) requests
used to show the prompt won't work anyway.
Some changes by Debarshi Ray.
https://github.com/containers/toolbox/pull/1428
Prompt is a new terminal emulation application [1] designed for a
container-oriented desktop that implements the OSC 777 escape sequence
to track and preserve the user's current Toolbx container [2]. Hence,
Fedora's fork of GNOME Terminal is no longer the only one to offer this.
The implementation in Prompt is already better because it has a
user-visible setting to disable this integration with Toolbx, in case
the user doesn't want it. Therefore, it's time to let users of all host
operating systems enjoy this feature.
This reverts commits a3e8d8d12b and
3a96feba47.
[1] https://gitlab.gnome.org/chergert/prompt
[2] https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/17
https://github.com/containers/toolbox/issues/218
Without a sufficiently large buffer, the discard function doesn't read
the pending input fast enough, which causes multiple lines indicating
"passed and discarded input" to show up.
I used an already defined constant [0] for the buffer size to avoid
introducing yet another magic constant.
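The effect of the buffer size can be sketched as follows. This is only an illustration and not the code from this commit: the discard and readsNeeded functions are hypothetical, and bytes.MinRead is assumed to be the constant in question because it's the only one exported by the bytes package.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// discard reads and throws away everything from reader using a buffer of
// the given size. It returns the number of bytes discarded and the number
// of reads that returned data.
func discard(reader io.Reader, bufferSize int) (int, int, error) {
	buffer := make([]byte, bufferSize)
	total, reads := 0, 0
	for {
		n, err := reader.Read(buffer)
		if n > 0 {
			total += n
			reads++
		}
		if err == io.EOF {
			return total, reads, nil
		}
		if err != nil {
			return total, reads, err
		}
	}
}

// readsNeeded reports how many reads it takes to discard the input.
func readsNeeded(input string, bufferSize int) int {
	_, reads, _ := discard(strings.NewReader(input), bufferSize)
	return reads
}

func main() {
	input := strings.Repeat("x", 2048)
	fmt.Println(readsNeeded(input, 1))             // 2048 reads
	fmt.Println(readsNeeded(input, bytes.MinRead)) // 4 reads, since bytes.MinRead is 512
}
```

A larger buffer drains the same input in far fewer iterations, which is the property the change relies on.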
[0] https://pkg.go.dev/bytes#pkg-constants
https://github.com/containers/toolbox/pull/1427
On some Toolbx images with systemd-resolved(8), like the fedora-toolbox
images for Fedora 39 onwards, /etc/resolv.conf can end up being a
symbolic link inside the container that expects the host operating
system to also use systemd-resolved(8):
$ ls -l /etc/resolv.conf
lrwxrwxrwx. 1 root root 39 Nov 28 08:50 /etc/resolv.conf ->
../run/systemd/resolve/stub-resolv.conf
This happens because systemd-resolved(8) already makes /etc/resolv.conf
a symbolic link inside the image, and, hence, the container's entry
point doesn't change it to point at the host's copy of the file at
/run/host/etc/resolv.conf. Instead, it's left pointing at the host's
copy of the files maintained by systemd-resolved(8) under
/run/systemd/resolve, which happen to be also available inside the
container [1].
If the host OS doesn't use systemd-resolved(8), like Red Hat Enterprise
Linux 9, then this leads to a dangling symbolic link and breaks DNS
queries.
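Detecting such a dangling symbolic link can be sketched in Go as below. This is a minimal illustration and not the entry point's actual code: the isDanglingSymlink and demo functions are hypothetical, and the demo recreates the situation in a scratch directory with a made-up target instead of touching /etc/resolv.conf.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isDanglingSymlink reports whether path is a symbolic link whose target
// doesn't exist.
func isDanglingSymlink(path string) (bool, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return false, err
	}
	if info.Mode()&fs.ModeSymlink == 0 {
		return false, nil
	}
	// os.Stat follows the link, so it fails when the target is missing.
	if _, err := os.Stat(path); err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return true, nil
		}
		return false, err
	}
	return false, nil
}

// demo creates a resolv.conf link pointing at a missing file inside a
// temporary directory and checks it.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "resolv-conf-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	link := filepath.Join(dir, "resolv.conf")
	if err := os.Symlink("missing/stub-resolv.conf", link); err != nil {
		return false, err
	}
	return isDanglingSymlink(link)
}

func main() {
	dangling, err := demo()
	fmt.Println(dangling, err)
}
```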
Note that the presence of systemd-resolved(8) in the recent
fedora-toolbox images is a regression caused by the ToolbxReleaseBlocker
Change [2] for Fedora 39 where the image was rewritten in terms of
fedora-kickstarts and pungi-fedora instead of a Container/Dockerfile.
By mistake, systemd crept in as an RPM needed by the image [3], which
in turn pulled in the systemd-resolved RPM as a weak dependency [4].
Hopefully, that will get fixed. However, it's also not practical to
keep track of all the Toolbx images out there in the wild, so it's
wise to make toolbox(1) more resilient to such things.
This will have the downside of overwriting some custom user-made
modifications to the container's /etc/resolv.conf. While that's
unfortunate, it's more important to have Toolbx images produce working
containers on a wide range of host OSes. It will be better to come up
with a more explicit way to support custom user-made modifications to
the container's configuration. Perhaps with a persistent stamp file.
[1] Commit af602c7d22
https://github.com/containers/toolbox/commit/af602c7d227617d2
https://github.com/containers/toolbox/pull/707
[2] https://fedoraproject.org/wiki/Changes/ToolbxReleaseBlocker
[3] fedora-kickstarts commit 48e2c3b5598de32f
https://pagure.io/fedora-kickstarts/c/48e2c3b5598de32f
[4] fedora-kickstarts commit 49306cb6eada8777
https://pagure.io/fedora-kickstarts/c/49306cb6eada8777
https://github.com/containers/toolbox/issues/1410
It takes 'skopeo inspect' a few seconds to fetch the image size from the
remote registry, and while that happens the user can't interact with the
image download prompt:
$ toolbox create
Image required to create toolbox container.
<wait for a few seconds>
Download registry.fedoraproject.org/fedora-toolbox:39 (359.8MB)? [y/N]:
This feels awkward because it's not clear to the user what's going on
during those few seconds. Moreover, while knowing the image size can be
convenient at times, for example when disk space and network bandwidth
are limited, it's not always important.
It will be better if 'skopeo inspect' ran in the background, while
waiting for the user to respond to the image download prompt, and once
the image size has been fetched, the image download prompt can be
updated to include it.
So, initially:
$ toolbox create
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:39 ( ... MB)? [y/N]:
... and then once the size is available:
$ toolbox create
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:39 (359.8MB)? [y/N]:
If skopeo(1) is missing or too old, then the prompt can continue without
the size, as it did before:
$ toolbox create
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:39 [y/N]:
The placeholder for the missing image size (ie., ' ... MB') was chosen
to have seven characters, so that it matches the most common sizes. The
human-readable representation of the image size is capped at four
significant digits [1]. Unless it's a round number like 1KB or 1.2MB, it
will likely use all four digits and the decimal point, which is five
characters. Then two more for the unit, because it's very unlikely that
there will be an image that's less than 1KB in size and will be shown in
bytes with a B. That makes it seven characters in total.
Updating the image download prompt with the results of 'skopeo inspect'
is vulnerable to races. At the same time as the terminal's cursor is
being moved to the beginning of the current line to overwrite the
earlier prompt with the new one, the user can keep typing and keep
moving the cursor forward. This competition over the cursor can lead to
awkward outcomes.
For example, the prompt can overwrite the characters typed in by the
user, leaving characters in the terminal's input buffer waiting for the
user to hit ENTER, even though they are not visible on the screen.
Another example is that hitting BACKSPACE can end up deleting parts of
the prompt, instead of stopping at the edge.
This is solved by putting the terminal device into non-canonical mode
input and disabling the echoing of input characters, while the prompt is
being updated. This prevents input from moving the terminal's cursor
forward, and from accumulating in the terminal's input buffer even if
it might not be visible. Any input during this interim period is
discarded and replaced by '...', and a fresh new prompt is shown in the
following line.
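The change to the terminal settings can be sketched in Go as below. This is only an illustration using the standard library's syscall package on Linux, not the code from this commit: the promptUpdateMode and demo functions are hypothetical, the VMIN and VTIME values are an assumption, and the calls that fetch the settings from the terminal and apply them back are left out.

```go
package main

import (
	"fmt"
	"syscall"
)

// promptUpdateMode returns a copy of the saved terminal settings with
// canonical mode input and echoing switched off.
func promptUpdateMode(saved syscall.Termios) syscall.Termios {
	updated := saved
	updated.Lflag &^= syscall.ICANON | syscall.ECHO
	// Assumption: let read(2) return as soon as a single byte arrives.
	updated.Cc[syscall.VMIN] = 1
	updated.Cc[syscall.VTIME] = 0
	return updated
}

// demo starts from made-up settings with ICANON, ECHO and ISIG set, and
// reports whether ICANON and ECHO were cleared and ISIG was left alone.
func demo() (canonicalOff, echoOff, signalsKept bool) {
	var saved syscall.Termios
	saved.Lflag = syscall.ICANON | syscall.ECHO | syscall.ISIG
	updated := promptUpdateMode(saved)
	canonicalOff = updated.Lflag&syscall.ICANON == 0
	echoOff = updated.Lflag&syscall.ECHO == 0
	signalsKept = updated.Lflag&syscall.ISIG != 0
	return canonicalOff, echoOff, signalsKept
}

func main() {
	fmt.Println(demo())
}
```

Because promptUpdateMode works on a copy, the saved settings stay untouched and can be applied again once the new prompt is in place.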
In practice, this race shouldn't be too common. It can only happen if
the user is typing right when the prompt is being updated, which is
unlikely because it's only supposed to be a short 'yes' or 'no' input.
The use of the context.Cause and context.WithCancelCause functions [2]
requires Go >= 1.20. Bumping the Go version in src/go.mod then requires
a 'go mod tidy'. Otherwise, it leads to:
$ meson compile -C builddir --verbose
...
/home/rishi/devel/containers/git/toolbox/src/go-build-wrapper
/home/rishi/devel/containers/git/toolbox/src
/home/rishi/devel/containers/git/toolbox/builddir src/toolbox
0.0.99.4 cc /lib64/ld-linux-x86-64.so.2 false
go: updates to go.mod needed; to update it:
go mod tidy
ninja: build stopped: subcommand failed.
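How context.WithCancelCause and context.Cause fit together can be sketched as below. This is a minimal illustration and not the code from this commit: fetchSize is a hypothetical stand-in for 'skopeo inspect', and errUserResponded and promptAnsweredEarly are made-up names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errUserResponded is a hypothetical cause that's recorded when the user
// answers the prompt before the image size is known.
var errUserResponded = errors.New("user responded before the image size was fetched")

// fetchSize stands in for 'skopeo inspect'. It gives up as soon as the
// context is cancelled and reports why.
func fetchSize(ctx context.Context, delay time.Duration) (string, error) {
	select {
	case <-time.After(delay):
		return "359.8MB", nil
	case <-ctx.Done():
		return "", context.Cause(ctx)
	}
}

// promptAnsweredEarly simulates the user answering the prompt while the
// size is still being fetched in the background.
func promptAnsweredEarly() error {
	ctx, cancel := context.WithCancelCause(context.Background())
	defer cancel(nil)

	result := make(chan error, 1)
	go func() {
		_, err := fetchSize(ctx, time.Hour)
		result <- err
	}()

	// The user has typed a response, so the size is no longer needed.
	cancel(errUserResponded)
	return <-result
}

func main() {
	err := promptAnsweredEarly()
	fmt.Println(errors.Is(err, errUserResponded))
}
```

Recording a cause lets the caller tell a deliberate cancellation apart from a failure to fetch the size.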
[1] https://pkg.go.dev/github.com/docker/go-units#HumanSize
[2] https://pkg.go.dev/context
https://github.com/containers/toolbox/issues/752
https://github.com/containers/toolbox/issues/1263
A subsequent commit will use this to ensure that the user can still
interact with the image download prompt while 'skopeo inspect' fetches
the image size from the remote registry. Initially, the prompt will be
shown without the image size. If the user responds before the size is
fetched, then the pending 'skopeo inspect' will be cancelled.
https://github.com/containers/toolbox/issues/752
https://github.com/containers/toolbox/issues/1263
A subsequent commit will use this to ensure that the user can still
interact with the image download prompt while 'skopeo inspect' fetches
the image size from the remote registry. Initially, the prompt will be
shown without the image size. If the user responds before the size is
fetched, then the pending 'skopeo inspect' will be cancelled.
https://github.com/containers/toolbox/issues/752
https://github.com/containers/toolbox/issues/1263
A subsequent commit will use this to ensure that the user can still
interact with the image download prompt while 'skopeo inspect' fetches
the image size from the remote registry. To do this, at some point, the
terminal device will be put into non-canonical mode input and the
echoing of input characters will be disabled to retain full control of
the cursor position.
https://github.com/containers/toolbox/issues/752
https://github.com/containers/toolbox/issues/1263
A subsequent commit will use this to ensure that the user can still
interact with the image download prompt while 'skopeo inspect' fetches
the image size from the remote registry.
To do this, at some point, the terminal device will be put into
non-canonical mode input and the echoing of input characters will be
disabled to retain full control of the cursor position. Unfortunately,
this will require access to the full termios(3) struct that isn't given
by golang.org/x/term, and, hence, the code needs to be written using the
underlying termios(3) API.
This future code will have enough overlap with the IsTerminal API from
golang.org/x/term that it doesn't make sense to use a separate module
(ie., golang.org/x/term) for it.
https://github.com/containers/toolbox/issues/752
https://github.com/containers/toolbox/issues/1263
A subsequent commit will use this to ensure that the user can still
interact with the image download prompt while 'skopeo inspect' fetches
the image size from the remote registry.
Initially, the prompt will be shown without the image size. Once the
size has been fetched, the older prompt will be cancelled and a new one
will be shown that includes the size. While the prompt is getting
updated, the terminal device will be put into non-canonical mode input
and the echoing of input characters will be disabled to retain full
control of the cursor position. Once the new prompt is in place, the
previous state of the terminal will be restored. However, anything that
was typed in the interim will be discarded to avoid surprising the user
with invisible input.
Even though this code is only expected to be used to read from the
standard input stream when it's connected to a terminal device, the use
of poll(2) here was tested with FIFOs or named pipes and regular files
as well, in case they might be necessary in future.
https://github.com/containers/toolbox/issues/752
https://github.com/containers/toolbox/issues/1263
A subsequent commit will use this to ensure that the user can still
interact with the image download prompt while 'skopeo inspect' fetches
the image size from the remote registry. Initially, the prompt will be
shown without the image size. Once the size has been fetched, the older
prompt will be cancelled and a new one will be shown that includes the
size.
Even though this code is only expected to be used to read from the
standard input stream when it's connected to a terminal device, the use
of poll(2) here was tested with FIFOs or named pipes and regular files
as well, in case they might be necessary in future.
An eventfd(2) file descriptor expects an 8-byte or 64-bit integer value
to be given to write(2) to increase its counter by that amount [1]. In
C, it could be phrased as:
uint64_t one = 1;
write (eventfd, &one, sizeof (one));
However, Go's wrapper for write(2) expects a sequence of bytes (ie.,
[]byte), and not an arbitrary memory address [2]. Therefore, the
'encoding/binary' package [3] is used to encode the integer into a byte
sequence as a varint.
Even though a varint-encoded 64-bit integer takes a maximum of 10
bytes, as defined by binary.MaxVarintLen64, 1 byte is enough to encode
the number 1 as an unsigned 64-bit integer [4]. That's enough to fit
into a byte sequence of length 8 to satisfy what an eventfd(2) file
descriptor expects. Ultimately, it doesn't matter exactly what value
the receiving end assigns to the number given to write(2), as long as
it's not zero.
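The encoding described above can be sketched as below. This is only an illustration of how the buffer is laid out: the eventfdIncrement function is a hypothetical name, and the write(2) to the actual eventfd(2) file descriptor is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// eventfdIncrement returns the 8-byte buffer that would be handed to
// write(2) to increase the counter of an eventfd(2) file descriptor.
func eventfdIncrement() []byte {
	buffer := make([]byte, 8)
	// The varint encoding of 1 is the single byte 0x01. The remaining
	// seven bytes keep their zero value.
	binary.PutUvarint(buffer, 1)
	return buffer
}

func main() {
	buffer := eventfdIncrement()
	fmt.Printf("% x\n", buffer) // 01 00 00 00 00 00 00 00
	// Read back as a little-endian 64-bit integer, the value is 1.
	fmt.Println(binary.LittleEndian.Uint64(buffer))
}
```

On a big-endian host the same eight bytes would be read as a different number, but it would still be non-zero, which is all that matters here.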
[1] https://man7.org/linux/man-pages/man2/eventfd.2.html
[2] https://pkg.go.dev/golang.org/x/sys/unix#Write
[3] https://pkg.go.dev/encoding/binary
[4] https://protobuf.dev/programming-guides/encoding/
https://github.com/containers/toolbox/issues/752
https://github.com/containers/toolbox/issues/1263
Currently, inotify(7) is used to keep /etc/timezone inside the Toolbx
container synchronized with the host operating system's /etc/localtime.
However, /etc/timezone is only there for compatibility with Java. The
vast majority of non-ancient code bases use /etc/localtime, which does
not need inotify(7) to stay synchronized.
Therefore, it's not worth preventing the container from starting when
the operating system is suffering from a shortage of resources needed
for inotify(7). Especially because this shortage can be caused by a bug
in another program that's consuming too many inotify(7) instances and
watches.
https://github.com/containers/toolbox/issues/1329
Any image or container that has APT or systemd may have /etc/kernel.
eg., the arch-toolbox and ubuntu-toolbox images.
https://github.com/containers/toolbox/pull/1409
Signed-off-by: Penn Bauman <me@pennbauman.com>
Christian Hergert requested this. He is working on improving the
integration of Toolbx with the terminal emulation stack in GNOME and
Fedora, and he is using Fedora Linux Asahi Remix for his work.
https://github.com/containers/toolbox/pull/1413
The arguments for the D-Bus method are accepted separately by gdbus(1)
without any options. Therefore, they shouldn't be indented by another
additional level.
https://github.com/containers/toolbox/pull/1412
There's no need to run these cat(1) and gdbus(1) invocations through a
shell (ie., 'sh -c'), because there's no shell expansion that needs to
be performed.
These are unlike cases where shell expansion does need to be performed.
eg., 'readlink /proc/$$/ns/user', where the $$ needs to be expanded.
Fallout from 58134f8497 and a0514cba12
https://github.com/containers/toolbox/pull/1412
GNU roff 1.23 stopped remapping unescaped Hyphen-Minus (ie., - or 0x2D)
characters in the input to Hyphen-Minus in the output. Instead, it
follows the specified behaviour of converting unescaped Hyphen-Minus
characters in the input to Hyphen (ie., ‐ or 0x2010) in the output. To
get Hyphen-Minus characters in the output, one needs to escape the
Hyphen-Minus with a backslash (ie., \-) in the input [1].
Therefore, the command line options documented in the manuals are no
longer prefixed with the Hyphen-Minus character that's needed to
actually use them. This breaks copying and pasting from the manuals and
searching within them.
Unfortunately, escaping the Hyphen-Minus characters in Markdown doesn't
have the intended effect of having Hyphen-Minus in the generated manual
pages [2]. Therefore, this is worked around by having the tests check
for both Hyphen-Minus and Hyphen.
Note that some operating system distributions, like Debian, have
reverted this change from GNU roff, but others haven't. So, unless it
can be guaranteed that the manuals will always have Hyphen-Minus
regardless of which GNU roff version or variant is being used, the tests
need to check for both.
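The idea of accepting either character can be sketched in Go as below. The system tests themselves are written for Bats, so this is only an illustration of the comparison and not the test code: the containsOption function and the sample lines are made up.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	hyphenMinus = "\u002D" // - as typed on the command line
	hyphen      = "\u2010" // ‐ as emitted by GNU roff 1.23 for an unescaped -
)

// containsOption reports whether a line of rendered manual text contains
// the given option, regardless of which of the two characters was used
// to render it.
func containsOption(line, option string) bool {
	normalized := strings.ReplaceAll(line, hyphen, hyphenMinus)
	return strings.Contains(normalized, option)
}

func main() {
	fmt.Println(containsOption("  \u2010\u2010assumeyes, \u2010y", "--assumeyes")) // true
	fmt.Println(containsOption("  --assumeyes, -y", "--assumeyes"))                // true
}
```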
[1] https://lwn.net/Articles/947941/
https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00001.html
https://git.savannah.gnu.org/cgit/groff.git/tree/PROBLEMS?h=1.23.0#n82
[2] https://github.com/cpuguy83/go-md2man/issues/101
https://github.com/containers/toolbox/pull/1398
This should finally ensure that the fedora-toolbox image doesn't have
any package that had its content, such as documentation or translations,
stripped out by the fedora base image.
Until now, missing-docs had a hand-maintained list of packages that had
their content stripped out by the fedora base image. These packages are
reinstalled when building the fedora-toolbox image to restore the lost
content. Unfortunately, this list was incomplete because it was only
updated when someone noticed that something was missing.
Now, the list is generated with:
$ rpm --all --query --state --queryformat "PACKAGE: %{NAME}\n"
... to ensure that it's always complete.
The existing built-in test to ensure that the desired files are actually
present in the final image was extended to cover some of those that were
absent. A new built-in test, based on the above rpm(1) command, was
added as a fallback to ensure that the final image doesn't have any
package with missing content.
Only the images for currently maintained Fedoras (ie., 37, 38 and 39)
were updated.
As suggested by Brian Campbell.
https://github.com/containers/toolbox/issues/603
The shadow-utils package has always been part of the fedora base image.
It's explicitly listed in extra-packages as a safeguard against losing
useradd(8) and usermod(8) by mistake because they are needed by the
entry point of a Toolbx container [1]. Hence, the need to restore the
shadow-utils documentation that was stripped out in the base image.
Only the images for currently maintained Fedoras (ie., 37, 38 and 39)
were updated.
[1] Commit c6772f0f11
https://github.com/containers/toolbox/commit/c6772f0f112e8004
https://github.com/containers/toolbox/pull/1394
Ansible's built-in 'package' module doesn't show any details when
installing the RPMs. All that can be seen is:
TASK [Install RPM packages]
fedora-rawhide | changed
Therefore, there's no way to know what version of the packages got
installed.
In this case, not knowing the go-md2man(1) version being used by the CI
makes it difficult to know why the tests are failing on Fedora Rawhide
and Fedora 39 with:
not ok 3 help: Command 'help' in 177ms
# (from function `assert_line' in file
test/system/libs/bats-assert/src/assert.bash, line 479,
# in test file test/system/002-help.bats, line 48)
# `assert_line --index 0 --partial "toolbox(1)"' failed
# /usr/bin/man
#
# -- line does not contain substring --
# index : 0
# substring : toolbox(1)
# line : troff:<standard input>:33: warning: cannot select font
'C'
# --
#
It could be either because the CI is still using an older version of
go-md2man(1) [1,2], or that there's some other problem.
[1] Fedora golang-github-cpuguy83-md2man commit 117806d50e401c19
https://src.fedoraproject.org/rpms/golang-github-cpuguy83-md2man/c/117806d50e401c19
https://src.fedoraproject.org/rpms/golang-github-cpuguy83-md2man/pull-request/3
[2] go-md2man commit d85280db9b54b574
https://github.com/cpuguy83/go-md2man/commit/d85280db9b54b574
https://github.com/cpuguy83/go-md2man/issues/99
https://github.com/containers/toolbox/pull/1386
Until now, some of the names of the tests were too long, and had
inconsistent and verbose wording. This made it difficult to look at
them and get a gist of all the scenarios being tested. The names are
like headings. They shouldn't be too long, should capture the primary
objective of the test and be consistent in their wording.
Note that the term 'usage screen' was particularly confusing. Prior to
commit 3dc106e10a, 'usage screen' in the names of the tests also
referred to the very brief listing of the commands and options that's
shown by 'toolbox help' and 'toolbox --help' in the absence of man(1).
In the context of this change, the term referred to the brief two line
error message that's shown when an unknown command or flag is used. So,
it will be good to not use it anymore.
https://github.com/containers/toolbox/pull/1386
Commit 5e63e9ec9b added a 'help' command to show the toolbox(1)
manual or a manual page for a specific command, and made the --help flag
identical to it. Therefore it's misleading to say that the --help flag
should show the usage screen. The usage screen is a brief listing of
the commands and options, which isn't the same thing as the more
detailed manuals.
Later, after this test was written, commit 40fc1689a3 added a
fallback for host operating systems without man(1), like Fedora CoreOS,
that would show a very brief usage screen with only the most common
commands.
To make it more confusing, the test was checking for a string that's
common to both the toolbox(1) manual and the fallback brief usage screen
that might be shown by 'toolbox --help'. This meant that it was neither
able to distinguish between the code paths nor ensure that they were
working as intended.
This was resolved by adapting the existing 'toolbox --help' test to
strictly ensure that it's showing the toolbox(1) manual when man(1) is
present, and by adding a new test to strictly ensure that it's showing
the fallback brief usage screen when man(1) is absent.
Until Bats 1.10.0, 'run --keep-empty-lines' had a bug where it counted
the trailing newline on the last line as a separate line [1]. However,
Bats 1.10.0 is only available in Fedora >= 39 and is absent from Fedoras
37 and 38.
Fallout from b27795a03e
[1] Bats commit 6648e2143bffb933
https://github.com/bats-core/bats-core/commit/6648e2143bffb933
https://github.com/bats-core/bats-core/issues/708
https://github.com/containers/toolbox/pull/1386
Until now, only the packages that are present in the fedora base image,
and had their documentation stripped out, were being tested for the
availability of documentation. There were no tests for the extra
packages that get added to the base image to form the fedora-toolbox
image.
The util-linux and xz packages were picked as examples for these new
tests. The xz package is a particularly good example because it has
translations for its manuals. It can help test that the fedora-toolbox
image is localized just like Fedora Silverblue and Workstation.
Only the images for currently maintained Fedoras (ie., 37, 38 and 39)
were updated.
https://github.com/containers/toolbox/pull/1384
Any system-wide customization to Bash's history facilities done through
a custom /etc/profile.d configuration snippet on the host operating
system gets lost inside the Toolbx container.
This is because Toolbx doesn't know what name to expect for the custom
/etc/profile.d snippet on the host, and, hence, can't give access to it
through a bind mount or symbolic link inside the container. The user
can definitely set up their own symbolic link inside the container to a
snippet inside /run/host/etc/profile.d. However, it's tedious to do
that for all containers, and the user may not even know that they are
missing the customization until they notice something wrong with the
history, which is shared across all containers and the host, and at that
point they might have already lost commands that they can't easily
reconstruct.
Therefore, it's worth trying to improve the situation by default.
This tries to preserve the environment variables used to customize
Bash's history facilities [1] across the host operating system and
Toolbx container. It assumes that the Bash start-up scripts inside the
container won't overwrite any of the propagated variables, which might
not always be the case [2].
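Propagating the variables can be sketched in Go as below. This is a minimal illustration and not the code from this commit: the names in historyVariables are variables documented by Bash [1], but whether this is exactly the set that gets preserved is an assumption, and historyEnvArgs is a hypothetical helper that builds '--env NAME=value' arguments for the variables that are set.

```go
package main

import (
	"fmt"
	"os"
)

// historyVariables lists Bash variables that customize its history
// facilities. Assumption: the exact set that's preserved may differ.
var historyVariables = []string{
	"HISTCONTROL",
	"HISTFILE",
	"HISTFILESIZE",
	"HISTIGNORE",
	"HISTSIZE",
	"HISTTIMEFORMAT",
}

// historyEnvArgs builds "--env NAME=value" arguments for every history
// variable that the lookup function reports as set.
func historyEnvArgs(lookup func(string) (string, bool)) []string {
	var args []string
	for _, name := range historyVariables {
		if value, ok := lookup(name); ok {
			args = append(args, "--env", name+"="+value)
		}
	}
	return args
}

func main() {
	// The output depends on the environment of the calling process.
	fmt.Println(historyEnvArgs(os.LookupEnv))
}
```

Variables that aren't set on the host are skipped, so the defaults inside the container stay in effect for them.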
[1] https://www.gnu.org/software/bash/manual/html_node/Bash-History-Facilities.html
https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html
[2] https://pagure.io/setup/pull-request/48
https://github.com/containers/toolbox/issues/1359
Currently, if 'skopeo copy ...' fails to download and cache an OCI image
during setup_suite(), the test suite doesn't immediately fail, but
continues. It only fails later when trying to set up the Docker
registry and contains a lot of noise:
not ok 1 setup_suite
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# from function `_setup_docker_registry' in file
test/system/libs/helpers.bash, line 211,
# from function `setup_suite' in test file
test/system/setup_suite.bash, line 59)
# `_setup_docker_registry' failed
# Failed to cache image registry.fedoraproject.org/fedora-toolbox:38
to /tmp/bats-run-GyTP7A/image-cache/fedora-toolbox-38
#
# -- command failed --
# status : 1
# output : time="2023-09-25T12:19:52+02:00" level=fatal
msg="initializing source
docker://registry.fedoraproject.org/fedora-toolbox:38-foo:
reading manifest 38-foo in
registry.fedoraproject.org/fedora-toolbox: manifest unknown"
# --
#
# Failed to cache image quay.io/toolbx/arch-toolbox:latest to
/tmp/bats-run-GyTP7A/image-cache/arch-toolbox-latest
#
# -- command failed --
# status : 1
# output : time="2023-09-25T12:20:48+02:00" level=fatal
msg="initializing source
docker://quay.io/toolbx/arch-toolbox:latest-foo: reading
manifest latest-foo in quay.io/toolbx/arch-toolbox: manifest
unknown"
# --
#
# Failed to cache image registry.fedoraproject.org/fedora-toolbox:34
to /tmp/bats-run-GyTP7A/image-cache/fedora-toolbox-34
#
# -- command failed --
# status : 1
# output : time="2023-09-25T12:21:42+02:00" level=fatal
msg="initializing source
docker://registry.fedoraproject.org/fedora-toolbox:34-foo:
reading manifest 34-foo in
registry.fedoraproject.org/fedora-toolbox: manifest unknown"
# --
#
# ...
#
# -- command failed --
# status : 1
# output : time="2023-09-25T12:26:33+02:00" level=fatal
msg="determining manifest MIME type for
dir:/tmp/bats-run-GyTP7A/image-cache/fedora-toolbox-34: open
/tmp/bats-run-GyTP7A/image-cache/fedora-toolbox-34/manifest.json:
no such file or directory"
# --
#
# docker-registry
# 27fa141e291e64e4c7a148c88ddab219ff2bfb5802a2982dc4188dc11f41692d
# Untagged: quay.io/toolbox_tests/registry:latest
# Deleted: fea5a12cde107bb407bc44ede6dd9edea1d2b4171cd8e52b0cb330bf45e517e1
It makes it look as if the root cause of the failure is related to
setting up the Docker registry, which it isn't, and all that noise makes
it difficult to spot the actual problem.
Instead, from now on, it will be more obvious:
not ok 1 setup_suite
# (from function `setup_suite' in test file
test/system/setup_suite.bash, line 44)
# `_pull_and_cache_distro_image "$system_id" "$system_version" ||
false' failed
# Failed to cache image registry.fedoraproject.org/fedora-toolbox:38
to /tmp/bats-run-62b8CU/image-cache/fedora-toolbox-38
# time="2023-09-25T13:55:42+02:00" level=fatal msg="initializing
source docker://registry.fedoraproject.org/fedora-toolbox:38-foo:
reading manifest 38-foo in
registry.fedoraproject.org/fedora-toolbox: manifest unknown"
Note that Bats' 'run' helper [1] isn't designed to work inside
setup_suite(). eg., 'run --separate-stderr' doesn't work because
BATS_TEST_TMPDIR isn't defined.
[1] https://bats-core.readthedocs.io/en/stable/writing-tests.html
https://github.com/containers/toolbox/pull/1377
If setup_suite() fails for some reason, then an unrelated message from
'podman system reset' would show up:
not ok 1 setup_suite
# (from function `setup_suite' in test file
test/system/setup_suite.bash, line 43)
# `_pull_and_cache_distro_image foo || false' failed
# Requested distro (foo) does not have a matching image
# A "/home/rishi/.cache/toolbox/system-test-storage/storage.conf"
config file exists.
# Remove this file if you did not modify the configuration.
This extra error message from 'podman system reset' serves no purpose
because it's not related to the cause of the setup_suite() failure.
It's just noise and it's better to silence it.
https://github.com/containers/toolbox/pull/1375
If setup_suite() fails for some reason, causing the Docker registry to
not be created, then an unrelated message from 'podman stop' would show
up:
not ok 1 setup_suite
# (from function `setup_suite' in test file
test/system/setup_suite.bash, line 43)
# `_pull_and_cache_distro_image foo || false' failed
# Requested distro (foo) does not have a matching image
# Error: no container with name or ID "docker-registry" found: no such
container
# ...
# ...
This extra error message from 'podman stop' serves no purpose because
it's not related to the cause of the setup_suite() failure. It's just
noise and it's better to silence it.
https://github.com/containers/toolbox/pull/1375
Contrary to what the documentation might seem to imply [1], Bats' 'fail'
helper only aborts a test case under certain circumstances. eg., it
aborts when called from setup_suite() itself, but not from within a
child function of it, and when called from a @test case, but not from
within the 'run' helper.
If 'fail' is called from within 'run', then the code after it will
continue to execute. The test case will only fail if 'run' eventually
records a non-zero exit code that's then caught by 'assert_success' [2].
Similarly, it doesn't abort if called from within a child function in
setup_suite().
Currently, _pull_and_cache_distro_image() is a child function called
from setup_suite(). So 'fail' won't abort if an invalid distribution is
specified.
Fortunately, pull_distro_image() is being called from within @test
cases, but outside 'run'. So, there's no problem with it now. However,
some future code changes can unknowingly alter this reality and it too
can run into unexpected behaviour.
Therefore, it's better to be safe, and explicitly specify a non-zero
exit code after 'fail'. It will ensure that it works as expected under
all circumstances.
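The pattern can be sketched in plain shell, outside Bats. The fail()
below is a simplified stand-in for the bats-support helper that only
prints its message and returns 1, and the check_distro_*() functions are
hypothetical names made up for this illustration. None of it is taken
from the Toolbx test suite.

```shell
# Simplified stand-in for the bats-support 'fail' helper (an assumption:
# it only prints the message and returns a non-zero exit code).
fail() {
    echo "$*" >&2
    return 1
}

# Without an explicit exit code, execution continues past 'fail' when
# nothing (such as errexit) reacts to its return value.
check_distro_unsafe() {
    if [ "$1" != "fedora" ]; then
        fail "Requested distro ($1) does not have a matching image"
    fi
    echo "continuing with $1"
}

# With an explicit 'return 1', the function stops no matter how it was
# called.
check_distro_safe() {
    if [ "$1" != "fedora" ]; then
        fail "Requested distro ($1) does not have a matching image"
        return 1
    fi
    echo "continuing with $1"
}

check_distro_unsafe foo || true
check_distro_safe foo || echo "stopped early"
```

Here check_distro_unsafe() still reaches its last line for an invalid
distribution, whereas check_distro_safe() stops right after the error
message.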
[1] https://github.com/bats-core/bats-support
[2] https://github.com/bats-core/bats-assert
https://github.com/containers/toolbox/pull/1375
Currently, if a Toolbx container's entry point fails to initialize the
container, there's no way to see the debug logs and error messages from
the entry point:
not ok 106 container: Check container starts without issues
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# in test file test/system/103-container.bats, line 39)
# `assert_success' failed
#
# -- command failed --
# status : 1
# output :
# --
#
Instead, from now on, they will be visible:
not ok 106 container: Check container starts without issues
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# in test file test/system/103-container.bats, line 39)
# `assert_success' failed
#
# -- command failed --
# status : 1
# output (90 lines):
# Failed to initialize container fedora-toolbox-38
# level=debug msg="Running as real user ID 0"
# level=debug msg="Resolved absolute path to the executable as
/usr/bin/toolbox"
# level=debug msg="TOOLBOX_PATH is /opt/bin/toolbox"
# level=debug msg="Migrating to newer Podman"
# level=debug msg="Migration not needed: running inside a container"
# level=debug msg="Setting up configuration"
# ...
# --
#
https://github.com/containers/toolbox/pull/1374
Bats' 'run' helper returns with an exit code of 0 even when the command
that it was given to run failed with a non-zero exit code [1]. This is
to enable making further assertions about the command after 'run' has
finished. If there's nothing that checks for failures, then it will
continue as if everything is alright.
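This behaviour can be illustrated without Bats. The run() below is a
deliberately simplified stand-in for the Bats helper, written only for
this sketch. It stores the exit code and output of the command in
'status' and 'output', and always returns 0 itself.

```shell
# Simplified stand-in for the Bats 'run' helper (an assumption for
# illustration; the real helper does much more).
run() {
    status=0
    output="$("$@" 2>&1)" || status=$?
    return 0
}

run false
echo "run returned $?, but the command exited with $status"

# Nothing fails unless the caller checks 'status' explicitly.
if [ "$status" -ne 0 ]; then
    echo "command failed with status $status" >&2
fi
```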
Therefore, currently, if 'podman logs' fails, there's no indication of
it and the test only fails later because it thinks that the container
failed to initialize:
not ok 106 container: Check container starts without issues
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# in test file test/system/103-container.bats, line 39)
# `assert_success' failed
#
# -- command failed --
# status : 1
# output :
# --
#
Instead, from now on, it will be more obvious:
not ok 106 container: Check container starts without issues
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# in test file test/system/103-container.bats, line 39)
# `assert_success' failed
#
# -- command failed --
# status : 125
# output (2 lines):
# Failed to invoke '/usr/bin/podman logs'
# Error: no container with name or ID "foo" found: no such container
# --
#
One alternative was to use 'assert_success' [2] to assert that the
command given to 'run' succeeded. That would show the 'podman logs'
failure as:
not ok 106 container: Check container starts without issues
# (from function `assert_success' in file
test/system/libs/bats-assert/src/assert.bash, line 114,
# in test file test/system/103-container.bats, line 39)
# `assert_success' failed
#
# -- command failed --
# status : 1
# output (29 lines):
#
# -- command failed --
# status : 125
# output : Error: no container with name or ID "foo" found: no such
container
# --
#
# ...
#
# -- command failed --
# status : 125
# output : Error: no container with name or ID "foo" found: no such
container
# --
# --
#
However, it's a bit too noisy because of the 'assert_success' not
terminating container_started() and continuing to loop for the remaining
attempts.
[1] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[2] https://github.com/bats-core/bats-assert
https://github.com/containers/toolbox/pull/1372
A subsequent commit will use this variable to set the return value for a
different condition. Therefore, the name needs to be changed to suit
the purpose.
https://github.com/containers/toolbox/pull/1372
Until Bats 1.10.0, 'run' with options had a bug where it would overwrite
the value of the 'i' variable even outside 'run' [1].
In these particular instances, no options are being passed to 'run',
and, hence, currently there's no problem. However, in case a future
commit adds an option, then it could lead to hard-to-debug problems.
eg., --separate-stderr sets 'i' to 1, --show-output-of-passing-tests
sets it to 2, etc.. Therefore, depending on the flag and the loop, the
loop might get terminated prematurely or run infinitely or something
else.
Moreover, Bats 1.10.0 is only available in Fedora >= 39 and is absent
from Fedoras 37 and 38. Therefore, it's not possible to consider this
bug fixed.
Hence, it's better to preemptively work around it to avoid any future
issues.
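The hazard can be reproduced in plain shell. clobbering_helper() below
is a hypothetical function that, like the buggy 'run', assigns to a
global 'i'. Renaming the caller's loop variable is one possible
workaround, and it's an assumption about the approach because the text
above doesn't spell out the actual change.

```shell
# Hypothetical helper that leaks an assignment to the global 'i', in the
# same way as 'run' with options did before Bats 1.10.0.
clobbering_helper() {
    i=1
}

# A loop meant to run three times. The extra check on 'count' is only a
# guard to stop this demonstration from looping forever.
count=0
i=0
while [ "$i" -lt 3 ] && [ "$count" -lt 10 ]; do
    clobbering_helper
    count=$((count + 1))
    i=$((i + 1))
done
echo "loop over 'i' ran $count times"

# Using a counter name that the helper doesn't touch avoids the problem.
count_safe=0
attempt=0
while [ "$attempt" -lt 3 ]; do
    clobbering_helper
    count_safe=$((count_safe + 1))
    attempt=$((attempt + 1))
done
echo "loop over 'attempt' ran $count_safe times"
```

The first loop never reaches 3 because the helper keeps resetting 'i',
so it only stops at the guard. The second loop runs exactly three times.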
[1] Bats commit 502dc47dd063c187
https://github.com/bats-core/bats-core/commit/502dc47dd063c187
https://github.com/bats-core/bats-core/issues/726
https://github.com/containers/toolbox/pull/1373
Otherwise https://www.shellcheck.net/ would complain:
Line 141:
assert_line --index 0 "~/.bash_profile read"
^------------------^ SC2088 (warning): Tilde
does not expand in quotes.
Use $HOME.
See: https://www.shellcheck.net/wiki/SC2088
This is a false positive. There's no need for the tilde to be expanded
because it's not being used for any file system operation. It's merely
a human-readable string.
However, it's easier to change the string to use $HOME than littering
the file with ShellCheck's inline 'disable' directives.
https://github.com/containers/toolbox/pull/1366
These files aren't marked as executable, and shouldn't be, because they
aren't meant to be standalone executable scripts. They're meant to be
part of a test suite driven by Bats. Therefore, it doesn't make sense
for them to have shebangs, because it gives the opposite impression.
The shebangs were actually being used by external tools like Coverity to
deduce the shell when running shellcheck(1). Shellcheck's inline
'shell' directive is a more obvious way to achieve that.
https://github.com/containers/toolbox/pull/1363
The setup_suite.bash file is meant to be written in Bash, and is not
supposed to have any Bats-specific syntax. That's why it has the *.bash
suffix, not *.bats. If Bats finds a setup_suite.bash file, when running
the test suite, it uses Bash's source(1) builtin to read the file.
This is a cosmetic change. The Bats syntax is a superset of the Bash
syntax. Therefore, it didn't make a difference to external tools like
Coverity that use the shebang to deduce the shell for shellcheck(1).
Secondly, setup_suite.bash isn't meant to be an executable script and,
hence, the shebang has no effect on how the file is used. However, it's
still a commonly used hint about the contents of the file, and it's
better to be accurate than misleading.
A subsequent commit will replace the shebangs in the test suite with
ShellCheck's 'shell' directives.
Fallout from 7a387dcc8b
https://github.com/containers/toolbox/pull/1363
It's one less invocation of an external command, which is good because
spawning a new process is generally expensive.
One positive side-effect of this is that on some Active Directory
set-ups, the entry point no longer fails with:
Error: failed to remove password for user login@company.com: failed
to invoke passwd(1)
... because of:
# passwd --delete login@company.com
passwd: Libuser error at line: 210 - name contains invalid char `@'.
This is purely an accident, and isn't meant to be an intentional change to
support Active Directory. Tools like useradd(8) and usermod(8) from
Shadow aren't meant to work with Active Directory users, and, hence, it
can still break in other ways. For that, one option is to expose $USER
from the host operating system to the Toolbx container through a Varlink
interface that can be used by nss-systemd inside the container.
Based on an idea from Si.
https://github.com/containers/toolbox/issues/585
Until now, configureUsers() was pushing the burden of deciding whether
to add a new user or modify an existing one on the callers, even though
it can trivially decide itself. Involving the caller loosens the
encapsulation of the user configuration logic by spreading it across
configureUsers() and its caller, and adds an extra function parameter
that needs to be carefully set and is vulnerable to programmer errors.
Fallout from 9ea6fe5852
https://github.com/containers/toolbox/pull/1356
These tests assume that the group and user information on the host
operating system can be provided by different plugins for the GNU Name
Service Switch (or NSS) functionality of the GNU C Library. eg., on
enterprise FreeIPA set-ups. However, it's expected that everything
inside the Toolbx container will be provided by /etc/group, /etc/passwd,
/etc/shadow, etc..
While /etc/group and /etc/passwd can be read by any user, /etc/shadow
can only be read by root. However, it's awkward to use sudo(8) in the
test cases involving /etc/shadow, because they ensure that root and
$USER don't need passwords to authenticate inside the container, and
sudo(8) itself depends on that. If sudo(8) is used, the test suite can
behave unexpectedly if Toolbx didn't set up the container correctly.
eg., it can get blocked waiting for a password.
Hence, 'podman unshare' is used instead to enter the container's initial
user namespace, where $USER from the host appears as root. This is
sufficient because the test cases only need to read /etc/shadow inside
the Toolbx container.
https://github.com/containers/toolbox/pull/1355
Sometimes locations such as /var/lib/flatpak, /var/lib/systemd/coredump
and /var/log/journal sit on security hardened mount points that are
marked as 'nosuid,nodev,noexec' [1]. In such cases, when Toolbx is used
rootless, an attempt to bind mount these locations read-only at runtime
with mount(8) fails because of permission problems:
# mount --rbind -o ro <source> <containerPath>
mount: <containerPath>: filesystem was mounted, but any subsequent
operation failed: Unknown error 5005.
(Note that the above error message from mount(8) was subsequently
improved to show something more meaningful than 'Unknown error' [2].)
The problem is that 'init-container' is running inside the container's
mount and user namespace, and the source paths were mounted inside the
host's namespace with 'nosuid,nodev,noexec'. The above mount(8) call
tries to remove the 'nosuid,nodev,noexec' flags from the mount point and
replace them with only 'ro', which is something that can't be done from
a child namespace.
Note that this doesn't fail when Toolbx is running as root. This is
because the container uses the host's user namespace and is able to
remove the 'nosuid,nodev,noexec' flags from the mount point and replace
them with only 'ro'. Even though it doesn't fail, the flags shouldn't
get replaced like that inside the container, because it removes the
security hardening of those mount points.
There's actually no benefit in bind mounting these paths as read-only.
It was historically done this way 'just to be safe' because a user isn't
expected to write to these locations from inside a container. However,
Toolbx doesn't intend to provide any heightened security beyond what's
already available on the host.
Hence, it's better to get out of the way and leave it to the permissions
on the source location from the host operating system to guard the
castle. This is accomplished by not passing any file system options to
mount(8) [1].
Based on an idea from Si.
[1] https://man7.org/linux/man-pages/man8/mount.8.html
[2] util-linux commit 9420ca34dc8b6f0f
https://github.com/util-linux/util-linux/commit/9420ca34dc8b6f0f
https://github.com/util-linux/util-linux/pull/2376
https://github.com/containers/toolbox/issues/911
The '[' and 'test' implementations from GNU coreutils don't support '-v'
as a way to check if a shell variable is set [1]. Only Bash's built-in
implementations do.
This is quite confusing and makes it difficult to find out what '-v'
actually does. eg., 'man --all test' only shows the manual for the GNU
coreutils version, which doesn't list '-v' [1], and, 'man --all [' only
shows the manual for Bash's built-ins, which also doesn't list '-v'.
One has to go to the bash(1) manual to find it [2].
Elsewhere in the code base [3], the same thing is accomplished with '-z'
and parameter substitution, which are more widely supported and, hence,
easier to find documentation for.
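A minimal sketch of the '-z' and parameter substitution idiom follows.
EXAMPLE_VAR is a hypothetical name, and the exact form used elsewhere in
the code base may differ. The expansion ${EXAMPLE_VAR+x} yields 'x' only
when the variable is set, even if it's set to an empty string.

```shell
unset EXAMPLE_VAR
if [ -z "${EXAMPLE_VAR+x}" ]; then
    echo "EXAMPLE_VAR is unset"
fi

EXAMPLE_VAR=""
if [ -z "${EXAMPLE_VAR+x}" ]; then
    echo "EXAMPLE_VAR is unset"
else
    echo "EXAMPLE_VAR is set, possibly to an empty string"
fi
```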
[1] https://manpages.debian.org/testing/coreutils/test.1.en.html
[2] https://linux.die.net/man/1/bash
[3] Commit 84ae385f33
https://github.com/containers/toolbox/pull/1334
https://github.com/containers/toolbox/pull/1341
'[' is a command that's the same as 'test' and they might be implemented
as standalone executables or shell built-ins. Therefore, the negation
(ie., '!') has to cover the entire command to operate on its exit code.
Instead, if it's written as '[ ! ... ]', then the negation becomes an
argument to '[', which isn't the same thing.
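The difference can be made visible with a compound expression. This is
a contrived sketch that isn't taken from the test suite. It uses the
'-o' operator only because it makes the two forms disagree.

```shell
# Negating the whole command: '!' inverts the exit code of '['.
if ! [ "a" = "a" -o "b" = "b" ]; then
    echo "outer negation: true"
else
    echo "outer negation: false"
fi

# Negation as an argument: '[' applies '!' only to the first comparison.
if [ ! "a" = "a" -o "b" = "b" ]; then
    echo "inner negation: true"
else
    echo "inner negation: false"
fi
```

The first form negates the result of the entire test, while the second
one evaluates 'not (a = a), or (b = b)', which is true.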
Fallout from 54a2ca1ead
https://github.com/containers/toolbox/pull/1341
First, it's not a good idea to use awk(1) as a grep(1) replacement.
Unless one really needs the AWK programming language, it's better to
stick to grep(1) because it's simpler.
Secondly, it's better to look for a specific os-release(5) field instead
of looking for the occurrence of 'rawhide' anywhere in the file, because
it lowers the possibility of false positives.
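The difference between the two kinds of matches can be sketched as
follows. The is_rawhide() helper is hypothetical, and the choice of the
REDHAT_SUPPORT_PRODUCT_VERSION field is an assumption made for this
illustration because the text above doesn't name the field.

```shell
# Hypothetical helper: succeeds only if one specific os-release(5) field
# is set to 'rawhide'. The choice of field is an assumption.
is_rawhide() {
    grep -q '^REDHAT_SUPPORT_PRODUCT_VERSION=rawhide$' "$1"
}

# Sample file for a stable release that merely mentions 'rawhide' in an
# unrelated field.
sample="$(mktemp)"
cat >"$sample" <<'EOF'
ID=fedora
VERSION_ID=38
REDHAT_SUPPORT_PRODUCT_VERSION=38
DOCUMENTATION_URL="https://example.org/docs/rawhide/"
EOF

# Matching anywhere in the file gives a false positive.
if grep -q rawhide "$sample"; then
    echo "loose match: looks like Rawhide"
fi

# Matching the specific field doesn't.
if is_rawhide "$sample"; then
    echo "field match: Rawhide"
else
    echo "field match: not Rawhide"
fi

rm -f "$sample"
```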
https://github.com/containers/toolbox/pull/1336
The Zuul executor contains Ansible 2.13.7 whose 'dnf' module is not
working as it should with Fedora Rawhide because of the DNF5 Change [1].
Unlike DNF4, DNF5 no longer pulls in the python3-dnf RPM, which causes:
TASK [Install RPM packages]
fedora-rawhide | ERROR
fedora-rawhide | {
fedora-rawhide | "msg": "Could not import the dnf python module
using /usr/bin/python3 (3.12.0b3 (main, Jun 21 2023, 00:00:00)
[GCC 13.1.1 20230614 (Red Hat 13.1.1-4)]). Please install
`python3-dnf` or `python2-dnf` package or ensure you have
specified the correct ansible_python_interpreter. (attempted
['/usr/libexec/platform-python', '/usr/bin/python3',
'/usr/bin/python2', '/usr/bin/python'])",
fedora-rawhide | "results": []
fedora-rawhide | }
This adds a workaround that explicitly installs the python3-dnf RPM
using Ansible's 'command' module. It should be removed after Zuul
contains a newer release of Ansible.
[1] https://fedoraproject.org/wiki/Changes/ReplaceDnfWithDnf5
https://github.com/containers/toolbox/pull/1338
Signed-off-by: Daniel Pawlik <dpawlik@redhat.com>
First, it's not a good idea to use awk(1) as a grep(1) replacement.
Unless one really needs the AWK programming language, it's better to
stick to grep(1) because it's simpler.
Secondly, it's better to look for a specific os-release(5) field instead
of looking for the occurrence of 'rawhide' anywhere in the file, because
it lowers the possibility of false positives.
https://github.com/containers/toolbox/pull/1332
The following caveats must be noted:
* Podman sets the Toolbx container's soft limit for the maximum number
of open file descriptors to the host's hard limit, which is often
greater than the host's soft limit [1].
* The ulimit(1) options -P, -T, -b, and -k don't work on Fedora 38
because the corresponding resource arguments for getrlimit(2) are
absent from the operating system. These are RLIMIT_NPTS,
RLIMIT_PTHREAD, RLIMIT_SBSIZE and RLIMIT_KQUEUES respectively.
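One way to observe the first caveat is to print the soft and hard
limits for open file descriptors, once on the host and once inside the
container, and compare them. This is only a sketch using the standard
ulimit built-in.

```shell
# Soft and hard limits for the maximum number of open file descriptors.
soft="$(ulimit -S -n)"
hard="$(ulimit -H -n)"
echo "soft limit: $soft"
echo "hard limit: $hard"
```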
[1] https://github.com/containers/podman/issues/17681
https://github.com/containers/toolbox/issues/213
The current approach of extracting the VERSION_ID field from
os-release(5) assumes that the value is not quoted. There's no
guarantee that this will be the case. It only happens to be so on
Fedora by chance, and is different on Ubuntu:
$ cat /etc/os-release
...
VERSION_ID="22.04"
...
This means that "22.04", including the double quotes, is read as the
value of VERSION_ID on Ubuntu, not 22.04. This is wrong because this
value can't be used as is in image and container names. There's no
image called quay.io/toolbx/ubuntu-toolbox:"22.04" and double quotes are
not allowed in container names.
Instead, use the same approach as profile.d/toolbox.sh and the old POSIX
shell implementation that doesn't rely on the quoting of the
os-release(5) values.
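The quoting problem can be sketched as follows. The naive extraction
only approximates the kind of text processing described above, and
sourcing the file in a subshell is an assumption about how a
quoting-independent approach might look.

```shell
# Sample os-release(5) file mimicking Ubuntu's quoting.
os_release="$(mktemp)"
cat >"$os_release" <<'EOF'
ID=ubuntu
VERSION_ID="22.04"
EOF

# Naive text extraction keeps the double quotes.
naive="$(grep '^VERSION_ID=' "$os_release" | cut -d = -f 2)"
echo "naive: $naive"

# Sourcing the file in a subshell lets the shell strip the quotes.
sourced="$(. "$os_release" && echo "$VERSION_ID")"
echo "sourced: $sourced"

rm -f "$os_release"
```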
Fallout from b27795a03e
https://github.com/containers/toolbox/pull/1320
The current approach of selecting all the os-release(5) fields that have
'ID' in their name (eg., ID, VERSION_ID, PLATFORM_ID, VARIANT_ID, etc.)
and then picking the first one, assumes that the ID field will always be
placed above the others in os-release(5). There's no guarantee that
this will be the case. It only happens to be so on Fedora by chance,
and is different on Ubuntu:
$ cat /etc/os-release
...
VERSION_ID="22.04"
...
ID=ubuntu
ID_LIKE=debian
...
This means that "22.04" is read as the value of ID on Ubuntu, which is
clearly wrong.
Instead, use the same approach as profile.d/toolbox.sh and the old POSIX
shell implementation that doesn't rely on the order of the os-release(5)
fields.
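The ordering problem can be reproduced with a small sample file. The
grep(1) and head(1) pipeline below only approximates the approach
described above, and reading the ID variable after sourcing the file is
an assumption about how an order-independent lookup might be written.

```shell
# Sample os-release(5) file with VERSION_ID placed above ID, as on Ubuntu.
os_release="$(mktemp)"
cat >"$os_release" <<'EOF'
VERSION_ID="22.04"
ID=ubuntu
ID_LIKE=debian
EOF

# Order-dependent: the first field with 'ID' in its name is VERSION_ID.
first_match="$(grep ID "$os_release" | head -n 1 | cut -d = -f 2)"
echo "first match: $first_match"

# Order-independent: source the file and read the ID variable by name.
by_name="$(. "$os_release" && echo "$ID")"
echo "by name: $by_name"

rm -f "$os_release"
```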
Fallout from 54a2ca1ead
https://github.com/containers/toolbox/pull/1320
Ansible's 'shell' module is almost exactly like the 'command' module,
except that it runs the command through a command line shell so that
environment variables like HOSTNAME and operations like '*', '<' and '>'
work. None of those things are necessary here. Hence, it's better
to use the 'command' module as elsewhere.
Note that, unlike Ansible's 'shell' module, the 'command' module doesn't
support inline scripts. So, each command needs to be in its own
separate task.
https://github.com/containers/toolbox/pull/1318
We wasted some time trying to get the tests running locally, when all we
were missing were the 'git submodule ...' commands.
Add some more obvious hints about this possible stumbling block.
Note that Bats cautions against printing outside the @test, setup* or
teardown* functions [1]. In this case, doing so leads to the first line
of the error output going missing, when using the pretty formatter for
human consumption:
$ bats --formatter pretty ./test/system
✗ setup_suite
Forgot to run 'git submodule init' and 'git submodule update' ?
bats warning: Executed 1 instead of expected 191 tests
191 tests, 1 failure, 190 not run
[1] https://bats-core.readthedocs.io/en/stable/writing-tests.html
https://github.com/containers/toolbox/pull/1298
Signed-off-by: Matthias Clasen <mclasen@redhat.com>
The 000-setup.bats and 999-teardown.bats files were added [1] at a time
when Bats didn't offer any hooks for suite-wide setup and teardown.
That changed in Bats 1.7.0, which introduced the setup_suite and
teardown_suite hooks. These hooks make it easier to run a subset of the
tests, which is a good thing.
In the past, to run a subset of the tests, one had to do:
$ bats ./test/system/000-setup.bats ./test/system/002-help.bats \
./test/system/999-teardown.bats
Now, one only has to do:
$ bats ./test/system/002-help.bats
Commit e22a82fec8 already added a dependency on Bats >= 1.7.0.
Therefore, it should be exploited wherever possible to simplify things.
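A setup_suite.bash file using these hooks might look roughly like the
sketch below. The function names setup_suite and teardown_suite are the
hooks provided by Bats. The bodies and the SUITE_READY variable are
placeholders invented for this illustration.

```shell
# test/system/setup_suite.bash (sketch; the bodies are placeholders)

# Runs once before the first test file.
setup_suite() {
    SUITE_READY=1
    export SUITE_READY
}

# Runs once after the last test file.
teardown_suite() {
    unset SUITE_READY
}
```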
[1] Commit 54a2ca1ead
https://github.com/containers/toolbox/issues/751
[2] Bats commit fb467ec3f04e322a
https://github.com/bats-core/bats-core/issues/39
https://bats-core.readthedocs.io/en/stable/writing-tests.html
https://github.com/containers/toolbox/pull/1317
Bats 1.7.0 emits a warning if a feature that is only available starting
from a certain version of Bats onwards is used without specifying that
version [1]:
BW02: Using flags on `run` requires at least BATS_VERSION=1.5.0. Use
`bats_require_minimum_version 1.5.0` to fix this message.
(from function `bats_warn_minimum_guaranteed_version' in file
/usr/lib/bats-core/warnings.bash, line 32,
from function `run' in file
/usr/lib/bats-core/test_functions.bash, line 227,
in test file test/system/001-version.bats, line 27)
Note that bats_require_minimum_version itself is only available from
Bats 1.7.0 [2]. Hence, even though the specific feature here (using
flags on 'run') only requires Bats >= 1.5.0, in practice Bats >= 1.7.0
is needed. Fortunately, commit e22a82fec8 already added a
dependency on Bats >= 1.7.0. So, there's nothing to worry about.
[1] Bats commit 82002bb6c1a5c418
https://github.com/bats-core/bats-core/issues/556
https://bats-core.readthedocs.io/en/stable/warnings/BW02.html
[2] Bats commit 71d6b71cebc3d32b
https://github.com/bats-core/bats-core/issues/556
https://bats-core.readthedocs.io/en/stable/warnings/BW02.html
https://github.com/containers/toolbox/pull/1315
Commit e22a82fec8 already added a dependency on Bats >= 1.7.0,
which is present on Fedora >= 36. Therefore, it should be exploited
wherever possible to simplify things.
Earlier, when the line counts were checked only with Bats >= 1.7.0,
there was a need to separately check the whole standard error and
output streams with 'assert_output' for the tests to be useful on
Fedora 35, which only had Bats 1.5.0. Now that the line counts are
being checked unconditionally, there's no need for that anymore.
Note that bats_require_minimum_version itself is only available from
Bats 1.7.0 [1].
[1] Bats commit 71d6b71cebc3d32b
https://github.com/bats-core/bats-core/issues/556
https://bats-core.readthedocs.io/en/stable/warnings/BW02.html
https://github.com/containers/toolbox/pull/1314
This allows using the 'distro' option to create and enter Arch Linux
containers. Due to Arch's rolling-release model, the 'release' option
isn't required. If 'release' is used, the accepted values are 'latest'
and 'rolling'.
https://github.com/containers/toolbox/pull/1311
Operating system distributions like Arch Linux that follow a
rolling-release model don't have the concept of a release. The latest
snapshot is the only available release.
A subsequent commit will add built-in support for Arch Linux. Hence,
the code can no longer assume that every distribution will have a
matching release.
Note that just because an operating system distribution may not have the
concept of a release, it doesn't mean that it will accept an invalid
'release' option.
https://github.com/containers/toolbox/pull/1311
The VERSION_ID field in os-release(5) is optional [1]. It's absent on
Arch Linux, which follows a rolling-release model and uses the BUILD_ID
field instead:
BUILD_ID=rolling
A subsequent commit will add built-in support for Arch Linux. Hence,
the code to get the default release from the host operating system can
no longer assume the presence of the VERSION_ID field in os-release(5).
Note that the arch-toolbox image is tagged with 'latest', in accordance
with OCI conventions, not 'rolling' [2,3], which is the os-release(5)
BUILD_ID. Therefore, it will be wise to use 'latest' as the default
release on Arch Linux, to simplify how the default release matches with
the default image's tag. This means that an os-release(5) field can't be
used for the default release on Arch.
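The resulting logic can be sketched in shell, even though the real
implementation lives in the Go code. default_release() is a
hypothetical helper invented for this illustration. It hard-codes
'latest' for Arch Linux and falls back to VERSION_ID elsewhere.

```shell
# Hypothetical helper: prints the default release for the os-release(5)
# file given as the first argument.
default_release() {
    (
        unset ID VERSION_ID
        . "$1"
        if [ "${ID:-}" = "arch" ]; then
            # Matches the tag of the arch-toolbox image, not BUILD_ID.
            echo "latest"
        else
            echo "${VERSION_ID:-}"
        fi
    )
}
```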
[1] https://www.freedesktop.org/software/systemd/man/os-release.html
[2] Commit 2568528cb7
https://github.com/containers/toolbox/pull/861
[3] Commit a4e5861ae5
https://github.com/containers/toolbox/pull/1308
https://github.com/containers/toolbox/pull/1303
Until now, the Arch Linux image was being published at
quay.io/toolbx-images/archlinux-toolbox:latest. This renames the image
to arch-toolbox [1] to match the os-release(5) ID on Arch, and changes
the location to quay.io/toolbx/arch-toolbox:latest.
Build and push when there are changes in the 'images/arch' directory
or in the GitHub workflow itself, as well as at 00:00 every Monday.
[1] Commit 2568528cb7
https://github.com/containers/toolbox/pull/861
https://github.com/containers/toolbox/pull/1308
Since Sericea is an official variant of Fedora, it should have an
official welcome message like the other ones.
https://github.com/containers/toolbox/pull/1293
Signed-off-by: Jakub Sierżęga <jakub.sierzega@comarch.com>
Until now, the Ubuntu images (versions 16.04, 18.04, 20.04, 22.04 and
22.10) were published at quay.io/toolbx-images/ubuntu-toolbox:22.04,
etc.. This changes the location to quay.io/toolbx/ubuntu-toolbox:22.04
and builds an image for Ubuntu 23.04 that was added recently [1].
Build and push when there are changes in the `images/ubuntu` directory
or in the GitHub workflow itself, as well as every other week (7th and
21st days of a month to be precise).
The toolbox(1) code and the system tests will be switched to the new
location after the first round of images are available.
[1] Commit 3cfb6bf888
https://github.com/containers/toolbox/pull/1292
https://github.com/containers/toolbox/pull/483
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
This is the definition of the arch-toolbox image for Arch Linux that
plays well with Toolbx.
Today, it's published at quay.io/toolbx-images/archlinux-toolbox:latest,
but the name of the published image will be changed to arch-toolbox [1]
to match the os-release(5) ID on Arch Linux. The convention of naming
the Toolbx images according to the os-release(5) ID is deeply ingrained
in the Toolbx code base. It will be better to keep things simple by
continuing that practice, instead of adding a one-off exception.
Maintenance of this image has been passed to Morten Linderud.
[1] https://github.com/toolbx-images/images/pull/82
https://github.com/containers/toolbox/pull/861
This was recently introduced with `ubuntu-advantage-tools`, and it tries
to poke at some system services, which introduces annoying delays and
messages. Even if the services are present (on an Ubuntu host) and
systemd is accessible (in a rootful container), that still wouldn't be
appropriate.
https://github.com/containers/toolbox/pull/1291
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
This partly reflects the value of the 'maintainer' LABELs of the current
images. Oliver is the original author, but he has lots of other duties
these days, and wanted me to help him co-maintain the images.
Note that the toolbox image definitions for RHEL do need a maintainer
who is a Red Hat employee. Otherwise they won't be able to actually
build and publish the images at registry.access.redhat.com.
https://github.com/containers/toolbox/pull/1288
The phrase 'using a custom image' is awkward because it makes it sound
as if the image plays an important role in 'enter' and 'run'. That's
not true.
Also, titles are sweeter when they are shorter.
https://github.com/containers/toolbox/pull/1281
Signed-off-by: Nils Lindemann <nilslindemann@tutanota.com>
When a specific Toolbx container is selected by name for 'enter' and
'run', it's not necessary that the container was created using a custom
image. The container could have also been created using one of the
built-in images.
Secondly, the phrase 'using a custom image' is awkward because it makes
it sound as if the image plays an important role in 'enter' and 'run'.
That's not true.
Finally, titles are sweeter when they are shorter.
https://github.com/containers/toolbox/pull/1281
Signed-off-by: Nils Lindemann <nilslindemann@tutanota.com>
These are the definitions of the ubuntu-toolbox images for Ubuntus
16.04, 18.04, 20.04, 22.04 and 22.10 that play well with Toolbx. For
example, they offer password-less sudo, are able to resolve their own
hostnames, have SELinux masked off, etc.. At the moment, these are
already published at
quay.io/toolbx-images/ubuntu-toolbox:22.04 and such.
https://github.com/containers/toolbox/pull/483
https://github.com/containers/toolbox/pull/1284
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
fmt.Scanf [1] is fragile when it comes to space-separated input. It
stores successive space-separated values into successive arguments as
determined by the format string. This breaks with untrusted input that
can have an unknown number of space-separated values.
Here are some examples:
$ toolbox create
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:39 (294.8MB)?
[y/N]: no no not at all
$ no not at all
bash: no: command not found...
$ toolbox create
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:39 (294.8MB)?
[y/N]: foo bar
Download registry.fedoraproject.org/fedora-toolbox:39 (294.8MB)?
[y/N]: Download registry.fedoraproject.org/fedora-toolbox:39
(294.8MB)? [y/N]:
Instead this is what should happen:
$ toolbox create
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:39 (294.8MB)?
[y/N]: no no not at all
Download registry.fedoraproject.org/fedora-toolbox:39 (294.8MB)?
[y/N]: foo bar
Download registry.fedoraproject.org/fedora-toolbox:39 (294.8MB)?
[y/N]:
Fallout from 936f22ff15
[1] https://pkg.go.dev/fmt#Scanf
https://github.com/containers/toolbox/pull/1279
It's quite obvious what the corresponding code is doing, and it isn't
any harder to understand than the rest of the code that's not commented.
https://github.com/containers/toolbox/pull/1282
This is a quick sanity check with 'podman images' to ensure that all the
images are in place before running 'list'. Other tests already do this,
so this change makes these two tests consistent with the rest.
https://github.com/containers/toolbox/pull/1273
This is the 'simple' case of having a well-known Toolbx image (ie.,
not a copy, not an image without a name, not a non-Toolbx image). It's
good to ensure that the default image works as expected with 'list'
before moving on to more complex scenarios.
https://github.com/containers/toolbox/pull/1278
Currently, some of the names of the tests are too long, and have
inconsistent and verbose wording. This makes it difficult to look at
them and get the gist of all the scenarios being tested. The names are
like headings. They shouldn't be too long, should capture the primary
objective of the test and be consistent in their wording.
https://github.com/containers/toolbox/pull/1276
Currently, some of the names of the tests are too long, and have
inconsistent and verbose wording. This makes it difficult to look at
them and get the gist of all the scenarios being tested. The names are
like headings. They shouldn't be too long, should capture the primary
objective of the test and be consistent in their wording.
https://github.com/containers/toolbox/pull/1271
Toolbx was conceived to address the needs of Fedora Linux. Even though
it works on host operating systems outside the Fedora family, it hasn't
treated them with the same importance as Fedora Linux and derivatives
like Red Hat Enterprise Linux. Subsequent commits will change that by
adding first-class support for host operating systems beyond the Fedora
universe, eg., Arch Linux and Ubuntu.
The current Toolbx maintainers, Ondřej Míchal and myself, are Fedora
developers and don't have the bandwidth to drive changes and track down
bugs in OSes outside the Fedora family. Therefore, maintenance of some
parts of the code base will be delegated to contributors from those
other OS communities.
This is a step in that direction by clearly specifying which part of the
code base is maintained by whom.
https://github.com/containers/toolbox/pull/1268
Currently, some of the names of the tests are too long, and have
inconsistent and verbose wording. This makes it difficult to look at
them and get the gist of all the scenarios being tested. The names are
like headings. They shouldn't be too long, should capture the primary
objective of the test and be consistent in their wording.
https://github.com/containers/toolbox/pull/1265
This uses 'skopeo inspect' to get the size of the image on the registry,
which is usually less than the size of the image in a local
containers/storage image store after download (eg., 'podman images'),
because they are kept compressed on the registry. Skopeo >= 1.10.0 is
needed to retrieve the sizes [1].
However, this doesn't add a hard dependency on Skopeo to accommodate
size-constrained operating systems like Fedora CoreOS. If skopeo(1) is
missing or too old, then the size of the image won't be shown, but
everything else would continue to work as before.
Some changes by Debarshi Ray.
[1] Skopeo commit d9dfc44888ff71a6
https://github.com/containers/skopeo/commit/d9dfc44888ff71a6
https://github.com/containers/skopeo/issues/641
https://github.com/containers/toolbox/issues/752
Signed-off-by: Nieves Montero <nmontero@redhat.com>
Bind mounting the locations at runtime doesn't really have anything to
do with whether /run/host/etc is present inside the Toolbx container.
The only possible exception could have been /etc/machine-id, but it
isn't, because the bind mount is only performed if the source at
/run/host/etc/machine-id is present.
This is a historical mistake that has persisted for a long time, since,
in practice, /run/host/etc will almost always exist inside the Toolbx
container. It's time to finally correct it.
Fallout from 9436bbece0
https://github.com/containers/toolbox/pull/1255
The --monitor-host option was added to the 'init-container' command in
commit 8b84b5e460 to accommodate Podman versions older than 1.2.0
that didn't have the '--dns none' and '--no-hosts' options for
'podman create'. These options are necessary to keep the Toolbx
container's /etc/resolv.conf and /etc/hosts files synchronized with
those of the host.
Note that Podman 1.2.0 was already available a few months before
commit 8b84b5e460 introduced the --monitor-host option. The
chances of someone using an older Podman back then were already on the
decline, and it's very unlikely that a container created with such a
Podman has survived till this date.
Commit b6b484fa79 raised the minimum required Podman version to
1.4.0, and made the '--dns none' and '--no-hosts' options a hard
requirement. The minimum required Podman version was again raised
recently in commit 8e80dd5db1 to 1.6.4. Therefore, these days,
there's no need to separately use the --monitor-host option of
'init-container' for newly created containers to indicate that the
Podman version wasn't older than 1.2.0.
Given all this, it's time to stop using the --monitor-host option of
'init-container', and assume that it's always set. The option is still
accepted to retain compatibility with existing Toolbx containers.
For containers that were created with the --monitor-host option, a
deprecation notice will be shown as:
$ podman start --attach CONTAINER
Flag --monitor-host has been deprecated, it does nothing
...
https://github.com/containers/toolbox/pull/617
So far the minimum required Podman version was 1.4.0, based on what used
to be available in RHEL 7. These days, Podman 1.6.4 is old enough to be
in RHEL 7.9. Hence it's time to bump the baseline.
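For illustration only, a baseline check of this kind could look roughly
like the sketch below. It uses a hypothetical versionAtLeast() helper
built on the standard library, and assumes plain dotted numeric versions
without suffixes. It's not the code used by toolbox(1).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minimumPodmanVersion mirrors the new baseline named above.
const minimumPodmanVersion = "1.6.4"

// parseVersion splits a dotted numeric version like "1.6.4" into integers.
func parseVersion(version string) ([]int, error) {
	fields := strings.Split(version, ".")
	numbers := make([]int, 0, len(fields))
	for _, field := range fields {
		number, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", version, err)
		}
		numbers = append(numbers, number)
	}
	return numbers, nil
}

// versionAtLeast is a hypothetical helper. It compares the versions one
// component at a time, treating missing components as zero.
func versionAtLeast(version, minimum string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(v) || i < len(m); i++ {
		var a, b int
		if i < len(v) {
			a = v[i]
		}
		if i < len(m) {
			b = m[i]
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil
}

func main() {
	for _, version := range []string{"1.4.0", "1.6.4", "1.10.0"} {
		ok, err := versionAtLeast(version, minimumPodmanVersion)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("Podman %s is new enough: %t\n", version, ok)
	}
}
```

Comparing the components as integers matters here, because a plain string
comparison would wrongly place 1.10.0 before 1.6.4.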
https://github.com/containers/toolbox/pull/1253
This is meant to roughly replicate the build environments used by
downstream distributors to build toolbox(1). These can be restricted in
odd ways compared to a fully featured environment where toolbox(1) is
actually going to be used. eg., the inability to use podman(1) in the
case of Fedora or not having subordinate user and group ID ranges in the
case of openSUSE.
It's important to ensure that toolbox(1) can be built by downstream
distributors without any unnecessary hassle.
https://github.com/containers/podman/issues/17657
https://github.com/containers/toolbox/issues/1246
Ever since commit bafbbe81c9, the shell completions are generated
while building Toolbx using the 'completion' command. This involves
running toolbox(1) itself, and hence validating the subordinate user and
group ID ranges.
Unfortunately, some build environments, like openSUSE's, don't have
subordinate ID ranges set up. Therefore, it's better to not validate
the subordinate ID ranges when generating the shell completions, since
they are generated by Cobra itself and subordinate ID ranges are not
involved at all.
Note that subordinate ID ranges may be needed when the generated shell
completions are actually used in interactive command line environments.
The shell completions invoke the hidden '__complete' command to get the
results that are presented to the user, and, if needed, the subordinate
ID ranges will continue to be used by podman(1) as part of that.
Some changes by Debarshi Ray.
https://github.com/containers/toolbox/issues/1246
https://github.com/containers/toolbox/pull/1249
Having a separate convenience function reduces the indentation levels by
at least one, and sometimes two, and makes it easy to have more detailed
debug logs.
This will make the subsequent commit easier to read.
https://github.com/containers/toolbox/issues/1246
Ever since commit bafbbe81c9, the shell completions are generated
while building Toolbx using the 'completion' command. This involves
running toolbox(1) itself, and hence invoking 'podman version' to decide
if 'podman system migrate' is needed or not.
Unfortunately, some build environments, like Fedora's, are set up inside
a chroot(2) or systemd-nspawn(1) or similar, where 'podman version' may
not work because it does various things with namespaces(7) and clone(2)
that can, under certain circumstances, encounter an EPERM.
Therefore, it's better to avoid using podman(1) when generating the
shell completions, especially since they are generated by Cobra itself
and podman(1) is not involved at all.
Note that podman(1) is needed when the generated shell completions are
actually used in interactive command line environments. The shell
completions invoke the hidden '__complete' command to get the results
that are presented to the user, and, if needed, 'podman system migrate'
will continue to be run as part of that.
This partially reverts commit f3e005d014 because podman(1) is now only
an optional runtime dependency for the system tests.
https://github.com/containers/podman/issues/17657
It's better not to use the global flag variables beyond the top-level
RunE functions, because sometimes the lower-level functions are re-used
from other files within the 'cmd' package. In this case,
createContainer(), and hence pullImage(), is also used in src/cmd/run.go
to implement the 'run' command. However, the 'run' command doesn't have
a --authflags option.
Since the default value of the flag is the zero value of the type, which
is a NOP in the code, it's likely that the code was still correct, but
it will be better to maintain some discipline here to highlight the
inputs needed by the lower-level functions. Otherwise, things can get
tangled up.
Fallout from ecd1ced719
https://github.com/containers/toolbox/pull/1240
Just like /run/systemd/sessions makes it possible to get the seat for a
session ID, /run/systemd/users can make it possible to get the seat and
the session ID for a user's UID.
The absence of /run/systemd/users inside Toolbx containers isn't
currently causing problems for any use-case, but it seems very close
to the sort of things that were necessary to run a non-nested display
server from within a Toolbx container on a virtual terminal. It's not
impossible that in future some implementation details of the display
server stack may make /run/systemd/users necessary.
https://github.com/containers/toolbox/issues/992
Not having sd_booted(3) work inside Toolbx containers isn't currently
causing problems for any use-case. However, it did come in handy when
investigating how to run a non-nested display server from within a
Toolbx container on a virtual terminal, because it's necessary for
'systemd --user' to realize that the host operating system was booted
with systemd.
https://github.com/containers/toolbox/issues/992
Signed-off-by: Sebastian Wick <sebastian.wick@redhat.com>
Podman creates a private cgroup namespace for containers on cgroups v2
by default. The host's cgroupfs is mounted at /sys/fs/cgroup giving an
inconsistent view of the cgroups. Toolbx doesn't intend to provide a
segregated security domain. So, there is no need for a cgroup namespace
and Toolbx containers can just use the host's namespace.
Having a private cgroup namespace for containers isn't currently causing
problems for any use-case, but it did come in handy when investigating
how to run a non-nested display server from within a Toolbx container on
a virtual terminal. Since this requires a change to the 'podman create'
arguments, it's not going to have an effect on existing containers, and
re-creating containers is annoying for users. So, it might be better to
get ahead of the curve and do it preemptively.
https://github.com/containers/toolbox/issues/992
Signed-off-by: Sebastian Wick <sebastian.wick@redhat.com>
This is needed by display servers for creating udev device enumerators
that match against tags.
https://github.com/containers/toolbox/issues/992
Signed-off-by: Jonas Ådahl <jadahl@gmail.com>
This is the full definition of the UBI-based toolbox image published for
RHEL 9.1 [1] at registry.access.redhat.com/ubi9/toolbox:9.1. Note that
the Dockerfile used to build this image was already available to the
public [2], but didn't include all the files necessary to build it.
However, this has some minor deviations from the published image. The
FROM line has been changed to registry.access.redhat.com/ubi9:9.1 so
that it can be built outside Red Hat's build system and always points to
the desired RHEL version. The extra-packages file doesn't have
gnupg2-smime because it doesn't seem to be actually part of the UBI RPM
repositories, and it's not clear how it works inside Red Hat's build
system. Otherwise, 'podman build' fails with:
STEP 11/14: RUN dnf -y install $(<extra-packages)
...
Last metadata expiration check: 0:00:23 ago on Tue Feb 7 18:50:13...
...
No match for argument: gnupg2-smime
...
Error: Unable to find a match: gnupg2-smime
Error: building at STEP "RUN dnf -y install $(<extra-packages)": while
running runtime: exit status 1
[1] https://catalog.redhat.com/software/containers/ubi9/toolbox/61532d7dd2c7f84a4d2ed86b
[2] https://catalog.redhat.com/software/containers/ubi9/toolbox/61532d7dd2c7f84a4d2ed86b?container-tabs=dockerfile
https://github.com/containers/toolbox/pull/1232
The canonical copy of README.md contains banners and labels in the
header that aren't useful when the file is shipped as part of the
images. Hence, those were removed.
Only the images for currently maintained Fedoras (ie., 36, 37 and 38)
were updated.
https://github.com/containers/toolbox/pull/1231
It turns out that at least since Fedora 30 [1], the gnupg2 package has
been part of the fedora base image, because it's required by the dnf
package:
dnf -> python3-dnf -> python3-libdnf -> libdnf -> gpgme -> gnupg2
Hence, the need to restore the gnupg2 documentation that was stripped
out in the base image.
Only the images for currently maintained Fedoras (ie., 36, 37 and 38)
were updated.
[1] It's difficult to find out if the gnupg2 package wasn't part of the
fedora base image before Fedora 30, because those images are no
longer available from registry.fedoraproject.org.
https://github.com/containers/toolbox/pull/1228
Building an OCI image leads to so much spew that it's hard to notice if
something unexpected happened, and as seen in the previous commit [1],
unexpected things do happen.
Therefore, this adds a built-in test to ensure that the desired files
are actually present in the final image. Right now it only checks the
presence of some representative manuals to ensure that the packages
listed in the 'missing-docs' file really do get reinstalled, and the
documentation that was stripped out in the base image really does get
restored.
Only the images for currently maintained Fedoras (ie., 36, 37 and 38)
were updated.
[1] Commit 1fc50176c9
https://github.com/containers/toolbox/pull/1226
https://github.com/containers/toolbox/pull/1226
The RPM packages in the base 'fedora' image can be older than those
currently available in the DNF 'updates' repository [1], but at the same
time newer than those available in the DNF 'fedora' repository [1]. The
first part happens because the base image isn't updated as often as the
individual packages, so the 'updates' repository can have newer RPMs.
The second part happens because the base image does get updated after a
stable Fedora has been released, and hence can have newer RPMs than the
'fedora' repository.
This is complicated by the fact that packages can get pulled directly
from Fedora's Koji build system into the base 'fedora' image before
they make it to one of the well-known repositories like 'fedora' or
'updates' [1]. These packages are marked as having come from the
koji-override-0 repository.
All that combined can lead to unexpected behaviour when DNF is invoked
to reinstall or swap the RPM packages in the base image. Some examples
below.
The base fedora:36 image contains glibc-minimal-langpack-2.35-20.fc36
that came from koji-override-0, while 'fedora' and 'updates' have
glibc-all-langpacks-2.35-4.fc36 and glibc-all-langpacks-2.35-22.fc36
respectively. This leads to:
STEP 8/15: RUN dnf -y swap glibc-minimal-langpack glibc-all-langpacks
Last metadata expiration check: 0:00:03 ago on Wed Feb 1 12:37:04...
Dependencies resolved.
======================================================================
Package Arch Version Repository
======================================================================
Installing:
glibc-all-langpacks x86_64 2.35-4.fc36 fedora
Removing:
glibc-minimal-langpack x86_64 2.35-20.fc36 @koji-override-0
Downgrading:
glibc x86_64 2.35-4.fc36 fedora
glibc-common x86_64 2.35-4.fc36 fedora
That's unexpected. Instead of upgrading all the glibc sub-packages to
the latest version from 'updates', it's downgrading them to the older
version from 'fedora'.
Similarly, the base fedora:36 image has bash-5.2.9-2.fc36.x86_64 from
koji-override-0, and there is bash-5.2.15-1.fc36.x86_64 in 'updates'.
This leads to:
STEP 10/15: RUN dnf -y reinstall $(<missing-docs)
Last metadata expiration check: 0:00:06 ago on Wed Feb 1 12:37:04...
Package acl available, but not installed.
No match for argument: acl
Installed package bash-5.2.9-2.fc36.x86_64 (from koji-override-0) not
available.
That's unexpected. Instead of upgrading bash to the latest version from
'updates', it's simply skipping the 'reinstall', which means that the
documentation that was stripped out in the base image doesn't get
restored.
Updating all the RPM packages in the base 'fedora' image to match the
contents of the 'updates' repository before making any changes to the
image's package set will avoid such unexpected behaviour.
Only the images for currently maintained Fedoras (ie., 36, 37 and 38)
were updated.
[1] https://docs.fedoraproject.org/en-US/quick-docs/repositories/
https://github.com/containers/toolbox/pull/1226
The URLs for the RHEL Toolbx images based on the Red Hat Universal Base
Images (or UBI) are a bit more complicated to construct, in comparison
to the URLs for Fedora's fedora-toolbox images. It's not enough to just
concatenate the registry, the image's basename and the release. Some
parts of the URL depend on the release's major number, which requires
custom code.
So far, the release's major number was hard coded to 8 since only RHEL 8
Toolbx containers were supported.
To support other RHEL major releases, it's necessary to have custom code
to construct the URLs for the Toolbx images.
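As a sketch of what such custom code might look like, assuming the
registry.access.redhat.com/ubi<major>/toolbox:<release> layout holds for
every supported major release. The ubiToolboxImage() helper is
hypothetical and the 8.7 release is only an illustrative value.

```go
package main

import (
	"fmt"
	"strings"
)

// ubiToolboxImage is a hypothetical helper. Unlike the Fedora images, the
// repository part of the URL depends on the major number of the release.
func ubiToolboxImage(release string) (string, error) {
	major, minor, found := strings.Cut(release, ".")
	if !found || major == "" || minor == "" {
		return "", fmt.Errorf("release %q is not in the <major>.<minor> format", release)
	}
	// Assumption: the registry and the 'ubi<major>/toolbox' basename follow
	// the same pattern for all RHEL major releases.
	return fmt.Sprintf("registry.access.redhat.com/ubi%s/toolbox:%s", major, release), nil
}

func main() {
	for _, release := range []string{"8.7", "9.1"} {
		image, err := ubiToolboxImage(release)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(image)
	}
}
```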
https://github.com/containers/toolbox/issues/1065
On enterprise FreeIPA set-ups, the subordinate user and group IDs are
provided by SSSD's sss plugin for the GNU Name Service Switch (or NSS)
functionality of the GNU C Library. They are not listed in /etc/subuid
and /etc/subgid. Therefore, it's necessary to use libsubid.so to check
the subordinate ID ranges.
The CGO interaction with libsubid.so is loosely based on 'readSubid' in
github.com/containers/storage/pkg/idtools [1].
However, unlike 'readSubid', this code considers the absence of any
range (ie., nRanges == 0) to be an error as well.
More importantly, this code uses dlopen(3) and friends to dynamically
load the symbols from libsubid.so, instead of linking to libsubid.so at
build-time and having the dependency noted in the /usr/bin/toolbox
binary. This is done because libsubid.so itself depends on several
other shared libraries, and indirect dependencies can't be influenced
by the RUNPATH [2] embedded in the /usr/bin/toolbox binary [3]. Hence,
when the binary is used inside Toolbx containers (eg., as the entry
point), those indirect dependencies won't be picked from the host's
runtime against which the binary was built. This can render the binary
useless due to ABI compatibility issues. Using dlopen(3) avoids this
problem, especially because libsubid.so is only used when running on the
host.
Care was taken to not load and link libsubid.so twice to separately
validate the subordinate ID ranges for the user and the group. Note
that libsubid_init() must be passed a FILE pointer for logging.
Otherwise, it will create its own for logging, and there's no way to
close it during dlclose(3).
Version 4 of the libsubid.so API/ABI [4] was released in Shadow 4.10,
which is newer than the versions shipped on RHEL 8 and Debian 10 [5],
and even that newer version had some problems [6]. Therefore, support
for older versions, with the relevant workarounds, is necessary.
Fortunately, the oldest that needs to be supported is Shadow 4.9 because
that's when libsubid.so was introduced [7].
Note that SUBID_ABI_VERSION was only introduced with version 4 of the
libsubid.so API/ABI released in Shadow 4.10 [8]. The first release of
libsubid.so in Shadow 4.9 already had an ABI version of 3.0.0 [9], since
it was bumped a few times during development, so that's what's assumed
when SUBID_ABI_VERSION is absent.
This code doesn't set the public variables Prog and shadow_logfd that
older Shadow versions used to expect for logging, because from Shadow
4.9 onwards there's a separate function [4,10] to specify these. This
can be changed if there are libsubid.so versions in the wild that really
do need those public variables to be set.
Finally, ISO C99 is required because of the use of <stdbool.h> in the
libsubid.so API.
Some changes by Debarshi Ray.
[1] https://github.com/containers/storage/blob/main/pkg/idtools/idtools_supported.go
[2] https://man7.org/linux/man-pages/man8/ld.so.8.html
[3] Commit 6063eb27b9
https://github.com/containers/toolbox/issues/821
[4] Shadow commit 32f641b207f6ddff
https://github.com/shadow-maint/shadow/commit/32f641b207f6ddff
https://github.com/shadow-maint/shadow/issues/443
[5] https://packages.debian.org/source/buster/shadow
[6] Shadow commit 79157cbad87f42cd
https://github.com/shadow-maint/shadow/commit/79157cbad87f42cd
https://github.com/shadow-maint/shadow/issues/465
[7] Shadow commit 0a7888b1fad613a0
https://github.com/shadow-maint/shadow/commit/0a7888b1fad613a0
https://github.com/shadow-maint/shadow/issues/154
[8] Shadow commit 0c9f64140852e8d5
https://github.com/shadow-maint/shadow/commit/0c9f64140852e8d5
https://github.com/shadow-maint/shadow/pull/449
[9] Shadow commit 3d670ba7ed58f910
https://github.com/shadow-maint/shadow/commit/3d670ba7ed58f910
https://github.com/shadow-maint/shadow/issues/339
[10] Shadow commit 2b22a6909dba60d
https://github.com/shadow-maint/shadow/commit/2b22a6909dba60d
https://github.com/shadow-maint/shadow/issues/325
https://github.com/containers/toolbox/issues/1074
Signed-off-by: Martin Jackson <martjack@redhat.com>
Building Toolbx requires a C compiler [1], which defaults to GCC on
Fedora and CentOS Stream. It's good to explicitly require it, so that
it doesn't go missing from the build.
Showing the version of the C compiler is a big help when debugging weird
build problems involving the toolchain. A following commit will use CGO
to link to libsubid.so, which will only increase the relevance of the C
compiler.
[1] Commit c8aaed52c5
https://github.com/containers/toolbox/pull/923
https://github.com/containers/toolbox/pull/1218
Ever since commit bafbbe81c9, the shell completions are generated
using the Toolbx binary, and the 'completion' sub-directory no longer
has any source code, but only the build scripts to invoke the Toolbx
binary to generate them. This is a good opportunity to simplify the
layout of this Git repository by reducing the number of sub-directories.
The file containing the Bash completions had to be renamed to avoid
colliding with the name of the Toolbx binary, since they are both
generated in the same sub-directory.
https://github.com/containers/toolbox/pull/1216
The Meson adapter scripts are simple enough that they don't need
detailed descriptions for their command line arguments. The benefits
don't justify the cost of formulating succinct descriptions.
https://github.com/containers/toolbox/pull/1216
The errors should be propagated up the call chain either verbatim or by
wrapping them with all relevant context when necessary (as long as they
don't violate the API boundaries).
The errors should be logged only when there's a break in the upward
propagation, either because they need to be reformatted before being
shown to the user or because they would expose implementation details
that aren't part of the API contract. Not logging the errors in such
cases might make it difficult to debug problems later on.
https://github.com/containers/toolbox/pull/1202
Currently, the titles of the manuals are rendered with a pair of empty
parentheses and no section title:
toolbox(1)() toolbox(1)()
NAME
toolbox - Tool for containerized command line environments...
However, they should be:
toolbox(1) General Commands Manual toolbox(1)
NAME
toolbox - Tool for containerized command line environments...
This is because the troff generated by go-md2man from Markdown has a
faulty invocation of the .TH macro [1]:
.nh
.TH toolbox(1)
.SH NAME
.PP
toolbox - Tool for containerized command line environments on Linux
It should be:
.nh
.TH toolbox 1
.SH NAME
.PP
toolbox - Tool for containerized command line environments on Linux
Original patch from Andrew Denton for Podman [2].
[1] https://www.gnu.org/software/groff/manual/groff.html
[2] Podman commit 63c779a857b55b00
https://github.com/containers/podman/pull/15621
https://github.com/containers/toolbox/pull/1210
Otherwise https://www.shellcheck.net/ would complain:
Line 2479:
shift
^---^ SC2317 (info): Command appears to be unreachable. Check usage
(or ignore if invoked indirectly).
See: https://www.shellcheck.net/wiki/SC2317
Fedora Rawhide now has ShellCheck-0.9.0, which flags these new problems,
while so far it only had ShellCheck-0.8.0.
ShellCheck is correct that this is unreachable code. However, given the
lack of built-in command line parsing facilities in POSIX shell, this
code pattern has so far turned out to be quite handy. It's flexible
enough to be able to handle different combinations of commands and
options, and is easy to read. Trying to 'fix' the code will likely
cause more problems than it will solve.
Moreover, the POSIX shell implementation was replaced by the Go
implementation quite a long time ago. It's no longer maintained and has
been kept only for historical reasons. Therefore, it's not worth
spending any significant amount of time on it.
https://github.com/containers/toolbox/pull/1211
The name of a node in a nodeset is meant to be a human-readable name. A
name with an obscure prefix like 'ci-node-' makes it look more profound
than it really is.
https://github.com/containers/toolbox/pull/1206
The 'unit tests' are no longer just unit tests. They also run a bunch
of static analysis tools like ShellCheck, codespell, gofmt and 'go vet'.
Since newer versions of these tools are generally better at catching
problems in the codebase, it will be better to run the 'unit tests' on
Fedora Rawhide with the latest versions than older stable Fedoras.
The timeout for the 'unit tests' needs to be increased because Fedora
Rawhide is slower than stable Fedoras. Currently, the timeout for the
'unit tests' running on Fedora 36 is 10 minutes. Increasing it to 20
minutes when running on Fedora Rawhide wasn't enough, so maybe 30 will
be sufficient.
Note that this is only feasible because the Fedora Rawhide builds are
now more robust against stale DNF caches [1]. Otherwise, it wouldn't
have been wise to use Fedora Rawhide to test anything which isn't also
being tested elsewhere, because the Fedora Rawhide builds might have
stayed broken for extended periods of time due to reasons completely
unrelated to Toolbx.
[1] Commit 995c6d175e
https://github.com/containers/toolbox/pull/1201
https://github.com/containers/toolbox/pull/1206
This will be used by the subsequent commit to have a separate set of
dependencies for CentOS Stream 9 builds. eg., unlike Fedora, CentOS
Stream 9 doesn't have the ShellCheck, bats and fish RPMs.
https://github.com/containers/toolbox/pull/1171
Currently, the standard error and output streams of the child commands
invoked by 'meson test' are redirected to a separate log file. When the
tests fail, it's difficult, or maybe even impossible, to access this
file from the Zuul CI, and all that can be seen is something like:
1/7 shellcheck src/go-build-wrapper OK 0.04s
2/7 shellcheck profile.d/toolbox.sh FAIL 0.06s exit status 1
>>> MALLOC_PERTURB_=241 /usr/bin/shellcheck
--shell=sh
/home/zuul-worker/src/github.com/containers/toolbox/builddir/../profile.d/toolbox.sh
3/7 go fmt FAIL 0.05s exit status 1
>>> MALLOC_PERTURB_=209 /usr/bin/python3
/home/zuul-worker/src/github.com/containers/toolbox/src/meson_go_fmt.py
/home/zuul-worker/src/github.com/containers/toolbox/src
4/7 codespell FAIL 0.31s exit status 65
>>> MALLOC_PERTURB_=180 /usr/bin/codespell
--check-filenames
--check-hidden
--context 3
--exclude-file /home/zuul-worker/src/github.com/containers/toolbox/.codespellexcludefile
--skip /home/zuul-worker/src/github.com/containers/toolbox/builddir
--skip /home/zuul-worker/src/github.com/containers/toolbox/.git
--skip /home/zuul-worker/src/github.com/containers/toolbox/test/system/libs/bats-assert
--skip /home/zuul-worker/src/github.com/containers/toolbox/test/system/libs/bats-support
/home/zuul-worker/src/github.com/containers/toolbox
5/7 shellcheck toolbox (deprecated) FAIL 1.09s exit status 1
>>> MALLOC_PERTURB_=233 /usr/bin/shellcheck
/home/zuul-worker/src/github.com/containers/toolbox/builddir/../toolbox
6/7 go test OK 1.89s
7/7 go vet OK 17.60s
This doesn't have enough information to understand what caused the tests
to fail on non-interactive CI environments.
Not redirecting the standard error and output streams of the child
commands invoked by 'meson test' will readily reveal more details about
the test failures and remove the need to find the log file created by
Meson.
https://github.com/containers/toolbox/pull/1171
Otherwise codespell would complain:
: @test "create: Try to create a container with invalid custom name...
> run $TOOLBOX -y create "ßpeci@l.Nam€"
:
./test/system/101-create.bats:57: Nam ==> Name
CentOS Stream 9 has codespell-2.2.1, while so far the 'unit tests' were
being run on Fedora 36, which only has codespell-2.1.0.
This is a step towards testing on CentOS Stream 9.
https://github.com/containers/toolbox/pull/1200
CentOS Stream 9 has codespell-2.2.1, while so far the 'unit tests' were
being run on Fedora 36, which only has codespell-2.1.0.
This is a step towards testing on CentOS Stream 9.
Fallout from ecd1ced719
https://github.com/containers/toolbox/pull/1200
Otherwise codespell would complain:
: {"/tmp", "/run/host/tmp", "rslave"},
> {"/var/lib/flatpak", "/run/host/var/lib/flatpak", "ro"},
: {"/var/lib/libvirt", "/run/host/var/lib/libvirt", ""},
./src/cmd/initContainer.go:61: ro ==> to, row, rob, rod, roe, rot
CentOS Stream 9 has codespell-2.2.1, while so far the 'unit tests' were
being run on Fedora 36, which only has codespell-2.1.0.
This is a step towards testing on CentOS Stream 9.
https://github.com/containers/toolbox/pull/1200
Otherwise https://www.shellcheck.net/ would complain:
Line 86:
term_just_first_character="${TERM%$term_without_first_character}"
^-- SC2295 (info): Expansions inside
${..} need to be quoted
separately, otherwise they match
as patterns.
See: https://www.shellcheck.net/wiki/SC2295
CentOS Stream 9 has ShellCheck-0.8.0, while so far the 'unit tests' were
being run on Fedora 36, which only has ShellCheck-0.7.2.
This is a step towards testing on CentOS Stream 9.
https://github.com/containers/toolbox/pull/1200
CentOS Stream 9 has golang-1.19.2, while so far the 'unit tests' were
being run on Fedora 36, which only has golang-1.18.8.
This is a step towards testing on CentOS Stream 9.
https://github.com/containers/toolbox/pull/1199
CentOS Stream 9 has codespell-2.2.1, while so far the 'unit tests' were
being run on Fedora 36, which only has codespell-2.1.0.
This is a step towards testing on CentOS Stream 9.
Fallout from 708fa593e2
https://github.com/containers/toolbox/pull/1199
Different versions of ShellCheck and codespell may treat the same code
base differently. eg., these tools are currently being used on Fedora
36 as part of the 'unit tests', but CentOS Stream 9 has newer versions
that are stricter and catch several new problems.
Knowing the versions of the tools used in the tests helps to understand
these differences, and is a step towards testing on CentOS Stream 9.
https://github.com/containers/toolbox/pull/1199
Note that 'run --keep-empty-lines' counts the trailing newline on the
last line as a separate line.
Until Bats 1.7.0, 'run --keep-empty-lines' had a bug where even when a
command produced no output, it would report a line count of one [1] due
to a stray line feed character. This needs to be conditionalized, since
Fedora 35 has Bats 1.5.0.
[1] https://github.com/bats-core/bats-core/issues/573
https://github.com/containers/toolbox/issues/1043
Currently, if an image was copied with:
$ skopeo copy \
containers-storage:registry.fedoraproject.org/fedora-toolbox:36 \
containers-storage:localhost/fedora-toolbox:36
... or:
$ podman tag \
registry.fedoraproject.org/fedora-toolbox:36 \
localhost/fedora-toolbox:36
... then it would show up twice in 'list' with the same name, and in the
wrong order.
Either as:
$ toolbox list --images
IMAGE ID IMAGE NAME CREATED
2110dbbc33d2 localhost/fedora-toolbox:36 1 day...
e085805ade4a registry.access.redhat.com/ubi8/toolbox:latest 1 day...
2110dbbc33d2 localhost/fedora-toolbox:36 1 day...
70cbe2ce60ca registry.fedoraproject.org/fedora-toolbox:34 1 day...
... or as:
$ toolbox list --images
IMAGE ID IMAGE NAME CREATED
2110dbbc33d2 registry.fedoraproject.org/fedora-toolbox:36 1 day...
e085805ade4a registry.access.redhat.com/ubi8/toolbox:latest 1 day...
2110dbbc33d2 registry.fedoraproject.org/fedora-toolbox:36 1 day...
70cbe2ce60ca registry.fedoraproject.org/fedora-toolbox:34 1 day...
The correct output should be similar to 'podman images', and be sorted
in ascending order of the names:
$ toolbox list --images
IMAGE ID IMAGE NAME CREATED
2110dbbc33d2 localhost/fedora-toolbox:36 1 day...
e085805ade4a registry.access.redhat.com/ubi8/toolbox:latest 1 day...
70cbe2ce60ca registry.fedoraproject.org/fedora-toolbox:34 1 day...
2110dbbc33d2 registry.fedoraproject.org/fedora-toolbox:36 1 day...
The problem is that, in these situations, 'podman images --format json'
returns separate identical JSON collections for each copy of the image,
and all of those copies have multiple names:
[
{
"Id": "2110dbbc33d2",
...
"Names": [
"localhost/fedora-toolbox:36",
"registry.fedoraproject.org/fedora-toolbox:36"
],
...
},
{
"Id": "e085805ade4a",
...
"Names": [
"registry.access.redhat.com/ubi8/toolbox:latest"
],
...
},
{
"Id": "2110dbbc33d2",
...
"Names": [
"localhost/fedora-toolbox:36",
"registry.fedoraproject.org/fedora-toolbox:36"
],
...
},
{
"Id": "70cbe2ce60ca",
...
"Names": [
"registry.fedoraproject.org/fedora-toolbox:34"
],
...
}
]
The image objects need to be flattened to have only one unique name per
copy, but with the same ID, and then sorted to ensure the right order.
Note that the ordering was already broken since commit 2369da5d31,
which started using 'podman images --sort repository'. Podman can sort
by either the image's repository or tag, but not by the unified name,
which is what Toolbx needs. Therefore, even without copied images,
Toolbx really does need to sort the images itself.
Prior to commit 2369da5d31, the ordering was correct, but copied
images would only show up once.
Fallout from 2369da5d31
This reverts parts of commit 67e210378e.
https://github.com/containers/toolbox/issues/1043
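The flattening and sorting described above can be sketched roughly as
follows. This is only an illustration, not the code from the commit: the
Image type and the flattenAndSort function are hypothetical names, and
images without any names are simply skipped here.

```go
package main

import (
	"fmt"
	"sort"
)

// Image is a hypothetical, trimmed-down stand-in for one entry of
// 'podman images --format json'.
type Image struct {
	ID    string
	Names []string
}

// flattenAndSort returns one Image per unique (ID, name) pair, sorted in
// ascending order of the names. Images without names are ignored in this
// sketch.
func flattenAndSort(images []Image) []Image {
	type key struct{ id, name string }
	seen := make(map[key]bool)
	var flat []Image

	for _, image := range images {
		for _, name := range image.Names {
			k := key{id: image.ID, name: name}
			if seen[k] {
				// Identical copy reported again by Podman.
				continue
			}
			seen[k] = true
			flat = append(flat, Image{ID: image.ID, Names: []string{name}})
		}
	}

	sort.SliceStable(flat, func(i, j int) bool {
		return flat[i].Names[0] < flat[j].Names[0]
	})

	return flat
}

func main() {
	copied := []string{
		"localhost/fedora-toolbox:36",
		"registry.fedoraproject.org/fedora-toolbox:36",
	}
	images := []Image{
		{ID: "2110dbbc33d2", Names: copied},
		{ID: "e085805ade4a", Names: []string{"registry.access.redhat.com/ubi8/toolbox:latest"}},
		{ID: "2110dbbc33d2", Names: copied},
		{ID: "70cbe2ce60ca", Names: []string{"registry.fedoraproject.org/fedora-toolbox:34"}},
	}

	for _, image := range flattenAndSort(images) {
		fmt.Println(image.ID, image.Names[0])
	}
}
```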
With the recent expansion of the test suite, it's necessary to increase
the timeout for Fedora Rawhide nodes to prevent the CI from timing out.
https://github.com/containers/toolbox/pull/1195
If an image was copied with:
$ skopeo copy \
containers-storage:registry.fedoraproject.org/fedora-toolbox:36 \
containers-storage:localhost/fedora-toolbox:36
... or:
$ podman tag \
registry.fedoraproject.org/fedora-toolbox:36 \
localhost/fedora-toolbox:36
... then the image ID is only shown once in 'podman images --quiet',
not twice.
A subsequent commit will use this to write tests to ensure that copied
images are correctly handled.
https://github.com/containers/toolbox/issues/1043
Note that 'run --keep-empty-lines' counts the trailing newline on the
last line as a separate line.
Until Bats 1.7.0, 'run --keep-empty-lines' had a bug where even when a
command produced no output, it would report a line count of one [1] due
to a stray line feed character. This needs to be conditionalized, since
Fedora 35 has Bats 1.5.0.
[1] https://github.com/bats-core/bats-core/issues/573
https://github.com/containers/toolbox/pull/1192
Note that 'run --keep-empty-lines' counts the trailing newline on the
last line as a separate line.
Until Bats 1.7.0, 'run --keep-empty-lines' had a bug where even when a
command produced no output, it would report a line count of one [1] due
to a stray line feed character. This needs to be conditionalized, since
Fedora 35 has Bats 1.5.0.
[1] https://github.com/bats-core/bats-core/issues/573
https://github.com/containers/toolbox/pull/1192
A subsequent commit will test the order in which images with and without
names are listed. It's logical for that test to come after the one
about the basic support for images without names.
https://github.com/containers/toolbox/pull/1192
Skopeo was already listed, so it didn't make sense to leave out the
others. It's useful to give the user a heads-up to make it obvious what
the requirements are.
https://github.com/containers/toolbox/pull/1194
This was making it difficult to read the Bats assertions on test
failures, by polluting it with unexpected and irrelevant output from
'podman images'. For example [1]:
not ok 39 list: Images with and without names in 12332ms
# (from function `assert' in file test/system/libs/bats-assert/src/assert.bash, line 46,
# in test file test/system/102-list.bats, line 126)
# `assert [ ${#stderr_lines[@]} -eq 0 ]' failed
# REPOSITORY TAG IMAGE ID CREATED SIZE
# registry.fedoraproject.org/fedora-toolbox 35 862705390e8b 4 weeks ago 332 MB
# REPOSITORY TAG IMAGE ID CREATED SIZE
# registry.fedoraproject.org/fedora-toolbox 35 862705390e8b 4 weeks ago 332 MB
# registry.fedoraproject.org/fedora-toolbox 34 70cbe2ce60ca 7 months ago 354 MB
#
# -- assertion failed --
# expression : [ 1 -eq 0 ]
# --
#
Fallout from 7973181136
[1] https://github.com/containers/toolbox/pull/1192
https://github.com/containers/toolbox/pull/1193
This builds on top of commit 0465d78fd9034ce9.
The toolboxImage type has been renamed to Image and moved into the
podman package.
There is nothing Toolbx specific about the type - it represents any
image returned by 'podman images'. The images are only later filtered
for Toolbx images.
Secondly, having the Image type inside the podman package makes it
possible to encapsulate the unmarshalling of the JSON within the package
without exposing the raw JSON to outside consumers. This is desirable
because the unmarshalling involves tracking changes in the JSON output
by different Podman versions, and it's better to limit such details to
the podman package.
https://github.com/containers/toolbox/pull/1190
It's better to avoid single letter variables in general, because they
are so hard to grep for.
This will make the subsequent commit easier to read.
https://github.com/containers/toolbox/pull/1190
This builds on top of commit e772207831.
Currently, the JSON from 'podman images --format json' gets unmarshalled
into a []map[string]interface{} in podman.GetImages, where the maps in
the slice represent images. Each map is then marshalled back into JSON
and then again unmarshalled into a toolboxImage type.
This is wasteful. The toolboxImage type already implements the
json.Unmarshaler interface [1], since commit e772207831. Hence,
the entire JSON from 'podman images --format json' can be directly
unmarshalled into a slice of toolboxImages without involving the
[]map[string]interface{}.
A subsequent commit will move the toolboxImage type into the podman
package to more tightly encapsulate the unmarshalling of the JSON. So,
as an intermediate step in that direction, the podman.GetImages function
has been temporarily changed to return the entire JSON.
[1] https://pkg.go.dev/encoding/json#Unmarshaler
https://github.com/containers/toolbox/pull/1190
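As a rough illustration of the direct unmarshalling described above, the
sketch below decodes a JSON array straight into a slice of a type that
implements json.Unmarshaler. The fields of toolboxImage and the
parseImages helper are assumptions made for this example. The real type
tracks more of Podman's JSON output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolboxImage is a hypothetical reduction of the type named above.
type toolboxImage struct {
	ID    string
	Names []string
}

// UnmarshalJSON implements json.Unmarshaler, so encoding/json calls it for
// every element when decoding into a []toolboxImage.
func (image *toolboxImage) UnmarshalJSON(data []byte) error {
	var raw struct {
		ID    string   `json:"Id"`
		Names []string `json:"Names"`
	}

	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}

	image.ID = raw.ID
	image.Names = raw.Names
	return nil
}

// parseImages decodes the whole JSON array in one pass, without going
// through an intermediate []map[string]interface{}.
func parseImages(data []byte) ([]toolboxImage, error) {
	var images []toolboxImage
	if err := json.Unmarshal(data, &images); err != nil {
		return nil, err
	}

	return images, nil
}

func main() {
	data := []byte(`[{"Id": "2110dbbc33d2", "Names": ["localhost/fedora-toolbox:36"]}]`)

	images, err := parseImages(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	for _, image := range images {
		fmt.Println(image.ID, image.Names)
	}
}
```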
Commit ae43560d45 had added a test with a similar intention. When
the test suite is run on a Fedora Rawhide host, it tests whether the
containers for the two previous stable Fedora releases start or not.
Fedora N-2 reaches End of Life 4 weeks after Fedora N is released [1].
So, testing the containers for Fedora Rawhide and the two previous
stable releases on a Fedora Rawhide host is a decent test of general
backwards compatibility.
However, as seen recently [2], this isn't enough to catch some known
ABI compatibility issues [3,4]. These involve toolbox binaries built
on hosts with newer toolchains that aren't meant to be run against
containers with older runtimes. A targeted test is needed to defend
against these scenarios.
The fedora-toolbox:34 image has glibc-2.33, which is old enough to be
unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.
[1] https://docs.fedoraproject.org/en-US/releases/
[2] https://github.com/containers/toolbox/pull/1180
[3] Commit 6063eb27b9
https://github.com/containers/toolbox/issues/821
[4] Commit 6ad9c63180
https://github.com/containers/toolbox/issues/529
https://github.com/containers/toolbox/pull/1187
Fedora 32 reached End of Life on 25th May 2021:
https://docs.fedoraproject.org/en-US/releases/eol/
That's quite old because right now Fedora 35 is nearing its End of Life.
Since the tests are intended for Toolbx, not the Fedora infrastructure,
it will be better to use a newer image, because images that are too old
can get lost from registry.fedoraproject.org. The fedora-toolbox:34
image can be a drop-in replacement for the fedora-toolbox:32 image for
the purposes of this test suite, and has the advantage of being newer.
Note that fedora-toolbox:34 is also old enough to test that the toolbox
binary runs against its build-time ABI from the host, and not the
Toolbx container's ABI, when it's invoked as the entry point of the
container [1,2]. This is important because the subsequent commit will
add a test to ensure that.
[1] Commit 6063eb27b9
https://github.com/containers/toolbox/issues/821
[2] Commit 6ad9c63180
https://github.com/containers/toolbox/issues/529
https://github.com/containers/toolbox/pull/1187
Otherwise, there's so much spew from 'go test', including the successful
tests, that the actual failures don't stand out.
Note that the different steps involved in building the code base are
much more interdependent. Hence, some extra verbosity
can help understand what caused a build failure on non-interactive build
environments. In contrast, the runtime outputs from each test case are
a lot more isolated and independent from one another. The additional
verbosity from successful tests doesn't really help understand why a
particular test failed.
https://github.com/containers/toolbox/pull/1186
Currently, only a so-called high-confidence subset of the default checks
in 'go vet' are being run by 'go test' [1]. Since 'go vet' is part of
the core Go tools, it's worth trying to use more of it. After all,
golangci-lint, which is currently being run through a GitHub Action,
is running the default 'go vet' checks as one of its linters [2].
It's good to have as much of the testing wrapped inside 'meson test', as
possible, because it's easier to run locally and on other non-GitHub CI
environments like those of downstream distributors.
[1] https://pkg.go.dev/cmd/go/internal/test
[2] https://golangci-lint.run/usage/linters/
    https://golangci-lint.run/usage/linters/#govet
https://github.com/containers/toolbox/pull/1186
In the past, before commit d323143c46, there was either a
dummy 'return' statement or a self-documenting 'panic' that said that
the code should not be reached. Since neither golangci-lint nor
'go vet' likes those, a comment is the only option left.
Note that the core Go tools like 'go vet' [1], but also 'go lint' [2],
explicitly don't intend to add fine-grained configuration options,
including inline directives or pragmas, to silence specific warnings.
That's something golangci-lint offers [3], to the extent that it's
supported by its linters [4]. However, golangci-lint also uses 'go vet'
as one of those linters, so it's the same problem all over again.
Therefore, between the two extremes of leaving the code difficult to
read and using a very big hammer to disable a needlessly big chunk of
'go vet', a comment is the least worst option.
[1] https://github.com/golang/go/issues/17058
    https://github.com/golang/go/issues/18432
[2] https://github.com/golang/lint/issues/263
[3] https://golangci-lint.run/usage/false-positives/
[4] https://golangci-lint.run/usage/linters/
Fallout from d323143c46
https://github.com/containers/toolbox/pull/1185
Using the word 'containerized' gives the false impression of heightened
security. As if it's a mechanism to run untrusted software in a
sandboxed environment without access to the user's private data (such as
$HOME), hardware peripherals (such as cameras and microphones), etc.
That's not what Toolbx is for.
Toolbx aims to offer an interactive command line environment for
development and troubleshooting the host operating system, without
having to install software on the host. That's all. It makes no
promise about security beyond what's already available in the usual
command line environment on the host that everybody is familiar with.
https://github.com/containers/toolbox/issues/1020
Mention that Toolbx is meant for system administrators to troubleshoot
the host operating system. The word 'debugging' is often used in the
context of software development, and hence most readers might not
interpret it as 'troubleshooting'.
https://github.com/containers/toolbox/pull/1182
Otherwise, every zsh instance on Fedora Kinoite and Silverblue was
running into:
/etc/profile.d/toolbox.sh:30: bad substitution
... because case modification with "${VARIANT_ID^}" is undefined in
POSIX shell [1], and doesn't work with Z shell.
Fedora Silverblue got its own PRETTY_NAME (and VARIANT and VARIANT_ID)
starting from Fedora 32 [2]. Therefore, it's better to use PRETTY_NAME
and let the downstream distributor of the host operating system decide
how it should be presented to the user, instead of coming up with a
custom string. eg., PRETTY_NAME isn't the same as "Fedora $VARIANT" on
Fedora Silverblue.
One nice side-effect of this is that while VARIANT and VARIANT_ID are
optional fields, PRETTY_NAME has a well-defined fallback value of
'Linux' [3]. This makes this a little less specific to Fedora Kinoite
and Silverblue.
The rest of the welcome text was reformatted to prevent it from getting
too wide depending on the contents of PRETTY_NAME.
Fallout from 3641a0032f
[1] https://www.shellcheck.net/wiki/SC3059
[2] https://pagure.io/workstation-ostree-config/c/c18ef957d11862d32f362722931dbfdf1f5beb0d
[3] https://www.freedesktop.org/software/systemd/man/os-release.html
https://github.com/containers/toolbox/issues/1017
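The PRETTY_NAME fallback mentioned above can be illustrated with a small
Go sketch. The actual change lives in a shell script, so this is only an
approximation of the idea: the prettyName function is a hypothetical
helper, and its quote handling is deliberately simplistic compared to the
full os-release(5) syntax.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// prettyName extracts PRETTY_NAME from the contents of an os-release file.
// It falls back to "Linux" when the field is missing or empty. Shell-style
// escapes inside the value are not handled in this sketch.
func prettyName(osRelease string) string {
	scanner := bufio.NewScanner(strings.NewReader(osRelease))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "PRETTY_NAME=") {
			continue
		}

		value := strings.TrimPrefix(line, "PRETTY_NAME=")
		value = strings.Trim(value, `"'`)
		if value != "" {
			return value
		}
	}

	return "Linux"
}

func main() {
	// Sample contents made up for this example.
	sample := "NAME=\"Fedora Linux\"\nVARIANT_ID=silverblue\n"
	fmt.Println(prettyName(sample))
}
```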
On a couple of occasions the relevant tests didn't get triggered because
some files weren't listed [1], and on another a commit forgot to update
the list of files [2].
The objective of the CI is to reduce stress for the maintainers, and
make it easy for contributors to find out if their changes work or not.
Missing tests don't help with that, and there's no need to optimize the
tests like this unless there's a real problem to be solved.
[1] Commit deca452b27
Commit 5c27d73021
[2] Commit b1743c4927
This reverts commit c28d902089.
https://github.com/containers/toolbox/pull/1168
Some OSTree based systems, such as Endless OS, don't ship with
/usr/lib/os-release, and the os-release(5) manual says [1]:
The file /etc/os-release takes precedence over /usr/lib/os-release.
Applications should check for the former, and exclusively use its data
if it exists, and only fall back to /usr/lib/os-release if it is
missing.
[1] https://www.freedesktop.org/software/systemd/man/os-release.html
https://github.com/containers/toolbox/pull/692
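The precedence rule quoted above amounts to picking the first file that
exists. Here is a minimal Go sketch of that rule. The firstExisting
helper is a hypothetical name, and the code that actually needed this
fix may be written quite differently.

```go
package main

import (
	"fmt"
	"os"
)

// firstExisting returns the first path from candidates that exists on the
// file system, preserving the order of precedence given by the caller.
func firstExisting(candidates ...string) (string, error) {
	for _, path := range candidates {
		if _, err := os.Stat(path); err == nil {
			return path, nil
		}
	}

	return "", fmt.Errorf("none of %v exist", candidates)
}

func main() {
	// /etc/os-release takes precedence; /usr/lib/os-release is only a
	// fallback for when the former is missing.
	path, err := firstExisting("/etc/os-release", "/usr/lib/os-release")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println("using", path)
}
```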
This is a precursor to checking that higher valued exit codes from the
command running inside the container are retained, and commands like
test(1) can be used with 'toolbox run ...' in subsequent test cases.
https://github.com/containers/toolbox/pull/1163
Currently, some of the names of the tests were too long, and had
inconsistent and verbose wording. This made it difficult to look at
them and get a gist of all the scenarios being tested. The names are
like headings. They shouldn't be too long, should capture the primary
objective of the test and be consistent in their wording.
https://github.com/containers/toolbox/pull/1161
Currently, 'meson compile' and 'meson install' were being invoked from
pre-run playbooks. This meant that a genuine build failure from either
of those commands would be shown as a RETRY_LIMIT failure by the CI.
This was misleading. It made it look as if the failure was caused by
some transient networking problem or that the CI node was too slow due
to momentary heavy load, whereas the failure was actually due to a
problem in the Toolbx sources. A genuine problem in the sources should
be reflected as a FAILURE, not RETRY_LIMIT.
However, it's worth noting that 'meson compile' invokes 'go build',
which downloads all the Go modules required by the Toolbx sources. This
is worth retaining in the pre-run playbooks since it primarily depends
on Internet infrastructure beyond the Toolbx sources.
As a nice side-effect, the CI no longer gets mysteriously stuck like
this while the Go modules are being downloaded:
TASK [Build Toolbox]
ci-node-36 | ninja: Entering directory
`/home/zuul-worker/src/github.com/containers/toolbox/builddir'
...
ci-node-36 | [8/13] Generating doc/toolbox-rmi.1 with a custom command
ci-node-36 | [9/13] Generating doc/toolbox-run.1 with a custom command
ci-node-36 | [10/13] Generating doc/toolbox.conf.5 with a custom
command
ci-node-36 | [11/13] Generating src/toolbox with a custom command
https://github.com/containers/toolbox/pull/1158
This mirrors the --preserve-fds option of Podman.
Converting an unsigned 'uint', which is what Podman uses for its
--preserve-fds option, to a string is surprisingly annoying.
strconv.Itoa [1] takes a signed 'int', which would require a cast, and
there's no unsigned counterpart. There's strconv.FormatUint [2] which
takes an unsigned 'uint64', which is better, but would still require a
cast.
So, fmt.Sprint [3] it is, if the cast is to be avoided. It's more
expensive than the other two functions, but there's no need to worry
unless it's proven to be a performance bottleneck.
Some changes by Debarshi Ray.
[1] https://pkg.go.dev/strconv#Itoa
[2] https://pkg.go.dev/strconv#FormatUint
[3] https://pkg.go.dev/fmt#Sprint
https://github.com/containers/toolbox/issues/1066
Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
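For reference, the three ways of converting a 'uint' to a string that
were weighed in the commit above look like this side by side. The
function names are made up for this comparison. Only fmt.Sprint avoids
an explicit conversion of the argument.

```go
package main

import (
	"fmt"
	"strconv"
)

// withItoa needs a conversion to a signed int.
func withItoa(fds uint) string {
	return strconv.Itoa(int(fds))
}

// withFormatUint stays unsigned, but still needs a conversion to uint64.
func withFormatUint(fds uint) string {
	return strconv.FormatUint(uint64(fds), 10)
}

// withSprint takes the uint as-is, at the cost of going through the more
// general formatting machinery of the fmt package.
func withSprint(fds uint) string {
	return fmt.Sprint(fds)
}

func main() {
	var preserveFDs uint = 3

	fmt.Println(withItoa(preserveFDs))
	fmt.Println(withFormatUint(preserveFDs))
	fmt.Println(withSprint(preserveFDs))
}
```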
Commit a22d7821cb ensured that a nested pseudo-terminal device is
only created for the process running inside the container, if the Toolbx
binary's standard input and output streams are connected to a terminal.
Therefore, 'echo ...' no longer ends with an unwanted extra carriage
return when terminal devices are absent - there's only a line feed for
the trailing newline. Hence, there's no need to use the -n flag to skip
the trailing newline.
This reverts parts of commit 16b0c5d88f.
https://github.com/containers/toolbox/issues/157
It seems that as new test cases got developed they got appended towards
the end of the file. Now that there are a non-trivial number of test
cases, it's difficult to look at the file and get a gist of all the
scenarios being tested.
It will be better to have some logical grouping -- starting with the
most basic functionality, then moving on to more advanced features,
and then finally the errors.
This is a step towards that.
https://github.com/containers/toolbox/pull/1155
Here's some historical context to understand what's going on.
In the past, before commit a22d7821cb, Podman's standard error
stream was only revealed when --verbose was used.
During that time, the standard error and output streams of the process
running inside the Toolbx container, but not 'podman exec ...' itself,
were merged into the standard output stream read and revealed by the
Toolbx binary.
Then commit a22d7821cb ensured that a nested pseudo-terminal
device is only created for the process running inside the container, if
the Toolbx binary's standard input and output streams are connected to a
terminal. This meant that the standard error stream of the container
process stayed separate from the standard output stream received by the
Toolbx binary, when terminal devices were absent. The errors from
'podman exec ...' itself continued to be separate as before.
However, Toolbx only read and revealed the standard error stream of the
spawned 'podman exec ...' process when --verbose was used. This meant
that all the errors from the container process got lost in the absence
of --verbose. This was an unintended change in behaviour caused by
commit a22d7821cb that got addressed in the subsequent commit
7cba807e45, but with yet another unintended change in behaviour.
Commit 7cba807e45 started reading and revealing the standard
error stream of the spawned 'podman exec ...' process unconditionally.
This caused the errors from both Podman and the container process to be
revealed unconditionally, which is a problem.
Podman is an implementation detail of Toolbx. Therefore, Toolbx users
shouldn't be directly exposed to errors from Podman, unless they are
using --verbose to debug a problem. On the other hand, the container
process is the outcome of a command specified by the user. So, the user
does expect to see what's going on with it.
That's the unintended change in behaviour this commit tries to fix.
Unfortunately, when Toolbx is being used non-interactively (ie., no
terminal devices), the errors from the process running inside the
Toolbx container and the errors from 'podman exec ...' itself are part
of the same standard error stream received by Toolbx. It's impossible
to distinguish between the two without deeper changes.
Hence, this commit only focuses on interactive use (ie., terminals are
present), which is where the visual appearance and presentation of error
messages really matter. Non-interactive use is programmatic use, so the
visuals don't matter so much.
Fallout from 7cba807e45
https://github.com/containers/toolbox/pull/1154
The outcome of checking whether the standard input and output of the
current invocation of toolbox are connected to a terminal device is
going to stay constant for the life cycle of the process. So, checking
it repeatedly in a loop when falling back to a different command or
working directory is wasteful.
Secondly, it prevents secondary logic like this from intermingling with
the code that actually assembles the list of arguments. This makes it
easier to get a quick gist of the final command and its structure.
Fallout from a22d7821cb
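The idea of checking for a terminal once and keeping that logic out of
the argument assembly can be sketched as below. Everything here is a
simplified assumption: execArgs is a hypothetical helper, the list of
fallback commands is invented, and the terminal check itself is replaced
by a constant. In real code it could come from something like
term.IsTerminal in golang.org/x/term.

```go
package main

import "fmt"

// execArgs assembles the arguments for 'podman exec'. Whether a terminal is
// present is passed in, so that this function only deals with the structure
// of the final command.
func execArgs(container string, hasTerminal bool, command []string) []string {
	args := []string{"exec", "--interactive"}
	if hasTerminal {
		args = append(args, "--tty")
	}

	args = append(args, container)
	args = append(args, command...)
	return args
}

func main() {
	// Assumption: computed once per process, since the answer can't change
	// while falling back to a different command or working directory.
	hasTerminal := false

	// Hypothetical fallback commands, tried one after the other.
	fallbacks := [][]string{{"/bin/zsh"}, {"/bin/bash"}}
	for _, command := range fallbacks {
		fmt.Println(execArgs("fedora-toolbox-36", hasTerminal, command))
	}
}
```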
This needs a directory that's going to be present on the host operating
system across various configurations of all supported distributions,
such as the hosts running the CI, but not inside the Toolbx containers.
It looks like /etc/kernel is present on both Debian and Fedora, but
absent from the fedora-toolbox images. On a Debian 10 server, it's
owned by several packages:
$ dpkg-query --search /etc/kernel
dkms, systemd, grub2-common, initramfs-tools, apt: /etc/kernel
... while on Fedora 36 Workstation:
$ rpm --file --query /etc/kernel
systemd-udev-250.8-1.fc36.x86_64
Currently, there's no way to get assert_line to use the stderr_lines
array [1]. This is worked around by assigning stderr_lines to the
'lines' array.
[1] https://github.com/bats-core/bats-assert/issues/42
https://github.com/containers/toolbox/pull/1153
It seems that as new test cases got developed they got appended towards
the end of the file. Now that there are a non-trivial number of test
cases, it's difficult to look at the file and get a gist of all the
scenarios being tested.
It will be better to have some logical grouping -- starting with the
most basic functionality, then moving on to more advanced features,
and then finally the errors.
This is a step towards that.
https://github.com/containers/toolbox/pull/1152
Currently, commands invoked using 'toolbox run' have a different
environment than the interactive environment offered by 'toolbox enter'.
This is because 'toolbox run' was invoking the commands using something
like this:
$ bash -c 'exec "$@"' bash [COMMAND]
... whereas, 'toolbox enter' was using something like this:
$ bash -c 'exec "$@"' bash bash --login
In the first case, the helper Bash shell is a non-interactive non-login
shell. This means that it doesn't read any of the usual start-up files,
and, hence, it doesn't pick up anything that's specified in them. It
runs with the default environment variables set up by Podman and the
Toolbx image, plus the environment variables set by Toolbx itself.
In the second case, even though the helper Bash shell is still the same
as the first, it eventually invokes a login shell, which runs the usual
set of start-up files and picks up everything that's specified in them.
Therefore, to ensure parity, 'toolbox run' should always have a login
shell in the call chain inside the Toolbx container.
The easiest option is to always use a helper shell that's a login shell
with 'toolbox run', but not 'toolbox enter' so as to avoid reading the
same start-up files twice, due to two login shells in the call chain.
It will still end up reading the same start-up files twice, if someone
tried to invoke a login shell through 'toolbox run', which is fine.
It's very difficult to be sure that the user is invoking a login shell
through 'toolbox run', and it's not what most users will be doing.
https://github.com/containers/toolbox/issues/1076
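The difference between the two call chains can be pictured with a small
sketch that assembles the helper shell's arguments. This is not the
actual Toolbx code: helperShellArgs is a hypothetical function, and
passing --login to the helper Bash shell is an assumption about how the
"helper shell that's a login shell" could be spelled.

```go
package main

import "fmt"

// helperShellArgs builds the command line for the helper Bash shell that
// eventually execs the user's command. With loginShell set, the helper
// itself reads the usual start-up files.
func helperShellArgs(loginShell bool, command []string) []string {
	args := []string{"bash"}
	if loginShell {
		args = append(args, "--login")
	}

	// The second "bash" becomes $0 of the embedded command string.
	args = append(args, "-c", `exec "$@"`, "bash")
	args = append(args, command...)
	return args
}

func main() {
	// 'toolbox run': the helper is a login shell.
	fmt.Println(helperShellArgs(true, []string{"ls", "-l"}))

	// 'toolbox enter': the helper is not a login shell, because the command
	// is itself a login shell.
	fmt.Println(helperShellArgs(false, []string{"bash", "--login"}))
}
```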
For the most part, this fixes a minor cosmetic issue for users, but it
does make the code less misleading to read for those hacking on Toolbx.
Further details below.
Commands are invoked inside a Toolbx from a helper shell invoked by
capsh(1). Unless capsh(1) is built with custom options, the helper
shell is always bash, not /bin/sh:
$ capsh --caps="" -- -c 'echo "$(readlink /proc/$$/exe)"'
/usr/bin/bash
( The possibility of capsh(1) using a different shell, other than Bash,
through a custom build option is ignored for the time being. If there
really are downstream distributors who do that, then this can be
addressed one way or another. )
Secondly, the name assigned to the embedded command string's '$0' should
only be the basename of the helper shell's binary, not the full path, to
match the usual behaviour:
$ bash -c 'exec foo'
bash: line 1: exec: foo: not found
With 'toolbox run' it was:
$ toolbox run foo
/bin/sh: line 1: exec: foo: not found
Error: command foo not found in container fedora-toolbox-36
https://github.com/containers/toolbox/pull/1147
Using 'true' is likely going to be quicker than launching the entire
shell (ie., /bin/sh).
Note that 'toolbox run' already invokes a wrapper shell via capsh(1)
before invoking the user-specified command. So, this was the second
instance of a shell.
https://github.com/containers/toolbox/pull/1145
It was decided in commit 950f510872 that golang.org/x/* would be
used for the IsTerminal() API, not github.com/mattn/go-isatty. However,
github.com/mattn/go-isatty had crept in through commits f49df914f4
and a22d7821cb.
The size savings seem to have been lost, because with Go 1.18.6, the
binary size actually grew from 9410616 bytes to 9410912. However, it
seems better to stick to packages from the golang.org domain, whenever
possible.
https://github.com/containers/toolbox/pull/1144
... at https://containertoolbx.org/install/
There are some minor benefits to always invoking meson(1), as opposed to
directly invoking the underlying build backend, like 'ninja'.
It's one less command to be aware of. Secondly, in theory, Meson can be
used with backends other than Ninja (see 'meson configure'), even though
Ninja is the most likely option for building Toolbx because it's only
supported on Linux.
https://github.com/containers/toolbox/pull/1142
If 'systemd-tmpfiles --create' is called as a non-root user, then it
causes:
--- stdout ---
Calling systemd-tmpfiles --create ...
--- stderr ---
Failed to open directory 'cryptsetup': Permission denied
Failed to open directory 'certs': Permission denied
Failed to create directory or subvolume "/var/spool/cups/tmp":
Permission denied
...
...
...
Traceback (most recent call last):
File "toolbox/meson_post_install.py", line 26, in <module>
subprocess.run(['systemd-tmpfiles', '--create'], check=True)
File "/usr/lib64/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['systemd-tmpfiles',
'--create']' returned non-zero exit status 73.
Since systemd-tmpfiles(8) can't be used like this as a non-root user,
there's no point in calling it and needlessly failing the build.
Unfortunately, Meson doesn't seem to offer a way to get the process'
effective UID inside its scripts. Therefore, this leaves a spurious
build-time dependency on systemd when building as a non-root user.
https://github.com/containers/toolbox/pull/1140
The bash-completion and fish dependencies were already optional - the
shell completions for Bash and fish won't be generated and installed if
they are absent; and there's no dependency required for Z shell. So the
install_completions build option wasn't reducing the dependency burden.
The build option was a way to disable the generation and installation of
the shell completions, regardless of whether the necessary dependencies
are present or not. The only use-case for this is when installing to a
non-system-wide prefix while hacking on Toolbox as a non-root user,
because the locations for the completions advertised by the shells' APIs
might not be accessible. Being able to disable the completions prevents
the installation from failing.
A different way of ensuring a smooth developer experience for a Toolbx
hacker is to offer a way to change the locations where the shell
completions are installed, which is necessary and beneficial for other
use-cases.
Z shell, unlike Bash's bash-completion.pc and fish's fish.pc, doesn't
offer an API to detect the location for the shell completions. This
means that Debian and Fedora use different locations [1, 2]. Namely,
/usr/share/zsh/vendor-completions and /usr/share/zsh/site-functions.
An option to specify the locations for the shell completions can
optimize the build, if there's an alternate API for the location that
doesn't involve using bash-completion.pc and fish.pc as build
dependencies. eg., Fedora provides the _tmpfilesdir RPM macro to
specify the location for vendor-supplied tmpfiles.d(5) files, which
makes it possible to avoid having systemd.pc as a build dependency [3].
Fallout from bafbbe81c9
[1] Debian zsh commit bf0a44a8744469b5
https://salsa.debian.org/debian/zsh/-/commit/bf0a44a8744469b5
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620452
[2] https://src.fedoraproject.org/rpms/zsh/blob/f37/f/zsh.spec
[3] Fedora toolbox commit 9bebde5bb60f36e3
https://src.fedoraproject.org/rpms/toolbox/c/9bebde5bb60f36e3
https://github.com/containers/toolbox/pull/1123
https://github.com/containers/toolbox/pull/840
Noticed today that `man xargs` was returning the POSIX manpage instead
of the one shipped by `findutils`.
Signed-off-by: Jonathan Lebon <jonathan@jlebon.com>
The following packages have also been added to Fedora 38 image:
mesa-dri-drivers
mesa-vulkan-drivers
vulkan-loader
Fix up the Fedora 38 image to match the changes made earlier to Fedora 37.
Signed-off-by: Nieves Montero <nmontero@redhat.com>
This new package allows the user to set a locale inside the
toolbox and makes locale-dependent commands work.
https://github.com/containers/toolbox/issues/60
Signed-off-by: Nieves Montero <nmontero@redhat.com>
In 54a2ca1 image caching has been done by first pulling using Podman and
then moving the image from the local container store to a directory. The
pull to the local container store can be skipped and instead we can use
Skopeo to directly save the pulled image into a directory.
On my machine this reduced the time of the system test setup "test" by
about 50 seconds. This speed-up largely depends on the available network
connection, though.
The following packages have also been added to images f36 and f35:
mesa-dri-drivers
mesa-vulkan-drivers
vulkan-loader
https://github.com/containers/toolbox/pull/1124
Signed-off-by: Nieves Montero <nmontero@redhat.com>
The following packages have been added to the
image to make OpenGL and Vulkan work:
mesa-dri-drivers
mesa-vulkan-drivers
vulkan-loader
https://github.com/containers/toolbox/issues/1110
Signed-off-by: Nieves Montero <nmontero@redhat.com>
If systemd-tmpfiles(8) couldn't be spawned, then the attempted command
is already included in the traceback:
Traceback (most recent call last):
File "toolbox/meson_post_install.py", line 26, in <module>
subprocess.run(['systemd-tmpfiles', '--create'], check=True)
File "/usr/lib64/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['systemd-tmpfiles',
'--create']' returned non-zero exit status 73.
https://github.com/containers/toolbox/pull/1122
In short, it's a lot of effort to cover all possible exceptions that can
be thrown, and things work reasonably well even without handling them.
Since this is just part of the build, there's no point in complicating
things for aesthetic reasons.
More details below.
First, not every runtime error leads to a subprocess.CalledProcessError.
It's only thrown if the spawned process returns with a non-zero exit
code. There can be other problems. eg., if the gofmt file isn't
executable then a PermissionError is thrown that's currently not
handled, and the wrapper Python script returns with a non-zero exit
code:
Traceback (most recent call last):
File "toolbox/src/meson_go_fmt.py", line 28, in <module>
gofmt = subprocess.run(['gofmt', '-d', source_dir],
capture_output=True, check=True)
File "/usr/lib64/python3.10/subprocess.py", line 501, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib64/python3.10/subprocess.py", line 969, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib64/python3.10/subprocess.py", line 1845, in
_execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'gofmt'
Second, when a subprocess.CalledProcessError is thrown, the wrapper
Python script will still return with a non-zero exit code with an
understandable error message, even if the exception isn't handled. eg.,
if 'meson install' is called without the adequate permissions, then
systemd-tmpfiles(8) will return with a non-zero exit code, which shows
up as:
--- stdout ---
Calling systemd-tmpfiles --create ...
--- stderr ---
Failed to open directory 'cryptsetup': Permission denied
Failed to open directory 'certs': Permission denied
Failed to create directory or subvolume "/var/spool/cups/tmp":
Permission denied
...
...
...
Traceback (most recent call last):
File "toolbox/meson_post_install.py", line 26, in <module>
subprocess.run(['systemd-tmpfiles', '--create'], check=True)
File "/usr/lib64/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['systemd-tmpfiles',
'--create']' returned non-zero exit status 73.
Similarly, if there are problems generating the shell completions:
--- stderr ---
Error: unknown command "__completion" for "toolbox"
Run 'toolbox --help' for usage.
exit status 1
Traceback (most recent call last):
File "toolbox/completion/generate_completions.py", line 35, in
<module>
output = subprocess.run(['go', 'run', '.', '__completion',
completion_type], check=True)
File "/usr/lib64/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['go', 'run', '.',
'__completion', 'bash']' returned non-zero exit status 1.
https://github.com/containers/toolbox/pull/1122
Cobra provides a default command 'completion' that is always visible.
The reverted change caused an additional command 'completion' to show up
in the list because the then called command '__completion' didn't
override the default one. This became apparent due to d69ce6794b
dynamically generating completion arguments for the 'help' command.
This reverts commit 4469774fb1.
https://github.com/containers/toolbox/pull/1055
https://github.com/containers/toolbox/pull/1121
By default, the value of the 'build_by_default' argument is determined
by the value of the 'install' argument, which was set to 'true' once the
Go implementation was considered stable enough for end users.
Fallout from 0b3c66434e
https://github.com/containers/toolbox/pull/1116
When describing the --authfile option, the word 'private' is used to
refer to images needing authentication. Using the same word shortens
the text so that the word 'custom' can be used in the same way as in the
other examples.
https://github.com/containers/toolbox/pull/1107
It turns out that Viper's custom error implementations use non-pointer
receivers, whereas often people assume pointer receivers. This can
cause confusion when trying to use errors.As(...) with those errors [1].
Secondly, Viper may or may not throw ConfigFileNotFoundError depending
on its build tags.
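The receiver mismatch can be illustrated without pulling in Viper itself.
The following is a minimal sketch, assuming a made-up error type
(configNotFoundError) that, like the Viper errors described above,
implements Error() on a value receiver. It is not Viper's actual type.

```go
package main

import (
	"errors"
	"fmt"
)

// configNotFoundError is a hypothetical stand-in for an error type that
// implements Error() on a value receiver instead of a pointer receiver.
type configNotFoundError struct {
	name string
}

func (e configNotFoundError) Error() string {
	return fmt.Sprintf("config file %q not found", e.name)
}

// load always fails, and returns the error as a value, not as a pointer.
func load(name string) error {
	return fmt.Errorf("loading configuration: %w", configNotFoundError{name: name})
}

// isNotFound uses a value target, which matches how the error was returned.
func isNotFound(err error) bool {
	var target configNotFoundError
	return errors.As(err, &target)
}

// isNotFoundPtr uses a pointer target. It compiles, because the pointer type
// also implements error, but it never matches the value in the chain.
func isNotFoundPtr(err error) bool {
	var target *configNotFoundError
	return errors.As(err, &target)
}

func main() {
	err := load("toolbox.conf")
	fmt.Println("value target matches:", isNotFound(err))
	fmt.Println("pointer target matches:", isNotFoundPtr(err))
}
```

The pointer variant is the easy mistake to make, because it fails silently
instead of failing to compile.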
[1] https://github.com/spf13/viper/issues/1139
https://github.com/containers/toolbox/pull/1105
Currently, the container name and release are only validated if they
were specified as command line options. Neither the value of release
in the configuration file nor the container name generated from an
image are validated.
There's also a lot of repeated code in the command front-ends to
validate the container name and release. This opens the door for
mistakes. Any adjustment to the code must be repeated elsewhere, and
there are subtle interactions and overlaps between the validation code
and the code to resolve container and image names.
It's worth noting that the container and image name resolution happens
for both the command line and configuration file options, and generates
the container name from the image when necessary.
Therefore, validating everything while resolving cleans up the command
front-ends and increases the coverage of the validation.
This introduces the use of sentinel error values and custom error
implementations to identify the different errors that can occur while
resolving the container and images, so that they can be appropriately
shown to the user.
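As a rough illustration of the two kinds of errors, here is a minimal
sketch. The names (ErrContainerNameInvalid, ReleaseError, validate), the
regular expressions and the hint text are all assumptions made for this
example, and are not taken from the Toolbx sources.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// ErrContainerNameInvalid is a hypothetical sentinel error value. Callers
// compare against it with errors.Is.
var ErrContainerNameInvalid = errors.New("container name is invalid")

// ReleaseError is a hypothetical custom error implementation. It carries
// extra data that the front-end can show to the user.
type ReleaseError struct {
	Release string
	Hint    string
}

func (e *ReleaseError) Error() string {
	return fmt.Sprintf("invalid release %q: %s", e.Release, e.Hint)
}

// Both patterns are assumptions chosen for the example.
var (
	containerNameRegexp = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9_.-]*$`)
	releaseRegexp       = regexp.MustCompile(`^[1-9][0-9]*$`)
)

// validate checks both values in one place, so that the command front-ends
// don't have to repeat the checks.
func validate(container, release string) error {
	if !containerNameRegexp.MatchString(container) {
		return ErrContainerNameInvalid
	}
	if !releaseRegexp.MatchString(release) {
		return &ReleaseError{Release: release, Hint: "expected a positive integer"}
	}
	return nil
}

func main() {
	err := validate("-bad-name", "38")
	fmt.Println(errors.Is(err, ErrContainerNameInvalid))

	err = validate("fedora-toolbox-38", "thirty-eight")
	var releaseErr *ReleaseError
	if errors.As(err, &releaseErr) {
		fmt.Println(releaseErr.Hint)
	}
}
```

A sentinel value is enough when the caller only needs to know which error
occurred. A custom type is useful when the message shown to the user
depends on details of the failure.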
https://github.com/containers/toolbox/pull/1101
Currently, if an invalid or unsupported string is specified as the
distro on the command line or in the configuration file, then it would
silently fall back to Fedora. This shouldn't happen.
It should only fall back to Fedora when no distro was specified and
there's no supported Toolbox image matching the host operating system.
If a distro was explicitly specified then it should either be supported
or it should error out.
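The intended rules can be sketched as follows. This is only an
illustration: the function name resolveDistro, the set of supported
distros and the error text are assumptions, not the actual code.

```go
package main

import (
	"fmt"
)

// supportedDistros is a hypothetical list used only for this example.
var supportedDistros = map[string]bool{
	"fedora": true,
	"rhel":   true,
}

// distroFallback is the last resort when nothing else matches.
const distroFallback = "fedora"

// resolveDistro applies the rules described above:
//   - an explicitly requested distro must be supported, or it's an error
//   - without a request, the host's distro is used if it's supported
//   - otherwise, the fallback is used
func resolveDistro(requested, host string) (string, error) {
	if requested != "" {
		if !supportedDistros[requested] {
			return "", fmt.Errorf("distribution %s is unsupported", requested)
		}
		return requested, nil
	}

	if supportedDistros[host] {
		return host, nil
	}

	return distroFallback, nil
}

func main() {
	fmt.Println(resolveDistro("", "rhel"))
	fmt.Println(resolveDistro("", "arch"))
	fmt.Println(resolveDistro("foo", "fedora"))
}
```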
The test cases were resurrected from commit 8b6418d8aa.
https://github.com/containers/toolbox/issues/937
https://github.com/containers/toolbox/pull/1080
The terms 'default' and 'fallback' are used to mean very specific
things in this context.
The 'default' values are those that are used when the 'create', 'enter'
and 'run' commands were used without any option. These values are
picked to match the host operating system.
However, if there's no supported Toolbox image matching the host
operating system, and no options were provided to the 'create', 'enter'
and 'run' commands, then the 'fallback' values are used as a last
resort.
Consistently using this terminology leads to a clear mental model and
makes the code easier to read.
This rough arrangement of the code was already being used for
'release', and has now been extended to 'container name prefix'
and 'distro'. The suffix for the 'fallback' values was simplified to
'Fallback', instead of 'DefaultFallback'.
https://github.com/containers/toolbox/issues/937
https://github.com/containers/toolbox/pull/1080
Figuring out the container name prefix for a given image only needs to
happen as part of resolving the final Toolbox container name from the
given command line and configuration options.
Fallout from c990fb43ca
https://github.com/containers/toolbox/pull/1098
Otherwise, Meson complains:
completion/meson.build:4: WARNING: Project targeting '>= 0.58.0' but
tried to use feature deprecated since '0.56.0':
dependency.get_pkgconfig_variable. use
dependency.get_variable(pkgconfig : ...) instead
Fallout from bafbbe81c9
https://github.com/containers/toolbox/pull/1096
The -Dmigration_path_for_coreos_toolbox option enables a different code
path that's currently not tested by the CI at all. In fact, since it's
a build-time option, the corresponding code path is not even built by
the CI.
To properly support the -Dmigration_path_for_coreos_toolbox option, it
needs to be covered by the CI. This is a step in that direction by
running the unit tests on it.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the build and installation steps for builds with
and without the -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the installation of RPM packages, Git submodule
handling, and the listing of various debug and version information for
builds with and without -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
This will provide a path forward to those who stumble across the POSIX
shell implementation and don't know how to use the Go implementation.
https://github.com/containers/toolbox/pull/1094
The subsequent commit will touch the POSIX shell implementation, and
hence ShellCheck needs to be run on it.
As long as the POSIX shell implementation is part of the Git repository,
ShellCheck needs to keep running on it, unless it causes some serious
problems. The ShellCheck test is very fast, and the reassurance and
mental peace that it provides is invaluable.
This reverts commit 8c1d441916.
https://github.com/containers/toolbox/pull/1094
This isn't causing any problems at the moment. However, the test can
break if the order in which the command line arguments are validated
changes. eg., if the presence of a command is checked before the
release, then the error message will be different.
Fallout from 8b6418d8aa
https://github.com/containers/toolbox/pull/1091
From now on, Debarshi Ray <debarshir@gnome.org> will show up as
Debarshi Ray <rishi@fedoraproject.org>.
Toolbox isn't quite a GNOME project (it doesn't use elements from the
GNOME platform, like GLib), even though it's part of the same
ecosystem and many Toolbox contributors are also GNOME contributors.
Toolbox was conceived to improve the developer experience on Fedora
Silverblue, and expanded over time to cover other use-cases (eg.,
troubleshooting the operating system) and Fedora editions (eg., CoreOS
and Workstation). Even though there's a growing number of users on
other distributions, they are not the primary reason for Toolbox to
exist.
Toolbox heavily depends on Podman, and as a result is more aligned with
the Containers organization on GitHub than anything else, which is
driven, to a large degree, by Fedora contributors.
Hence, my desire to use my Fedora identity.
https://github.com/containers/toolbox/pull/1083
When a command is executed with toolbox run and it returns a non-zero
exit code, it is just ignored if that exit code is not handled. This
prevents users from identifying errors when executing commands in toolbox.
With this fix, the exit codes of the invoked command are propagated
and returned by 'toolbox run'. This includes even exit codes returned
by Podman on error.
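A minimal sketch of the general technique, using only the Go standard
library. The helper name runAndGetExitCode is made up for this example,
and 'sh' stands in for the real 'podman exec' invocation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// runAndGetExitCode runs a command with the standard streams attached, and
// returns the exit code of the child instead of swallowing it. An error is
// returned only if the command could not be run at all.
func runAndGetExitCode(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	err := cmd.Run()
	if err == nil {
		return 0, nil
	}

	// A non-zero exit status shows up as an *exec.ExitError.
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}

	return 1, fmt.Errorf("failed to invoke %s: %w", name, err)
}

func main() {
	code, err := runAndGetExitCode("sh", "-c", "exit 3")
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
	}

	// A real front-end would finish with os.Exit(code) to propagate it.
	fmt.Println("exit code:", code)
}
```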
https://github.com/containers/toolbox/pull/1013
Co-authored-by: Ondřej Míchal <harrymichal@seznam.cz>
Without stderr being attached, the stderr output of the invoked command
goes into stdout.
Behaviour before:
; output="$(toolbox run /etc)"
Error: failed to invoke command /etc in container <name-of-container>
; echo -e "$output"
/bin/sh: line 1: /etc: Is a directory
/bin/sh: line 1: exec: /etc: cannot execute: Is a directory
Behaviour after:
; output="$(toolbox run /etc)"
/bin/sh: line 1: /etc: Is a directory
/bin/sh: line 1: exec: /etc: cannot execute: Is a directory
Error: failed to invoke command /etc in container <name-of-container>
; echo -e "$output"
https://github.com/containers/toolbox/pull/1013
Passing '--tty' to 'podman exec' unconditionally causes Podman to
allocate a pseudo-TTY for the command execution. This causes problems
with piping (values not being piped in and values being piped out with
carriage return at the end of a line). The solution is to track the
presence of a terminal on stdin/stdout and based on its presence use the
'--tty' flag.
Original behaviour:
; echo foo | toolbox run less
; toolbox run echo foo | od -c
0000000 f o o \r \n
0000005
New behaviour:
; echo foo | toolbox run less
foo
; toolbox run echo foo | od -c
0000000 f o o \n
0000004
As seen in the first example, where a value is piped in, the value only
gets printed to stdout. That's not ideal from the point of view of using
'less' (or similar tools), but it's still a move forward.
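The idea can be sketched with the standard library alone. The helpers
below (isCharDevice, execArgs) are made up for this example. Checking for a
character device is only an approximation of a terminal check, eg.,
/dev/null is also a character device. The IsTerminal function from
golang.org/x/term is the more precise choice.

```go
package main

import (
	"fmt"
	"os"
)

// isCharDevice reports whether the file is a character device, which is a
// rough approximation of "is a terminal".
func isCharDevice(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// execArgs builds the arguments for 'podman exec', adding '--tty' only
// when a terminal is present.
func execArgs(tty bool, container string, command ...string) []string {
	args := []string{"exec", "--interactive"}
	if tty {
		args = append(args, "--tty")
	}
	args = append(args, container)
	return append(args, command...)
}

func main() {
	tty := isCharDevice(os.Stdin) && isCharDevice(os.Stdout)
	fmt.Println(execArgs(tty, "fedora-toolbox-38", "echo", "foo"))
}
```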
Based on a discussion in Podman's bugtracker[0].
Fixes https://github.com/containers/toolbox/issues/157
Fixes https://github.com/containers/toolbox/issues/848
[0] https://github.com/containers/podman/issues/9718
https://github.com/containers/toolbox/pull/1013
Calling 'podman system cleanup' causes problems with containers/images
in a separate Podman root. Despite being stored elsewhere, they are
still under Podman's influence and the cleanup removes them. Also,
running containers (outside the scope of the tests) still got affected
by this call and e.g., lost the ability to follow terminal size changes.
Despite the raised concerns, to ensure proper cleanup of any Podman
state, the reset still needs to be done. Thus, do it only once during
the test suite teardown, moving the potential source of problems to a
single position.
https://github.com/containers/toolbox/pull/1024
The previous commit added a means of generating the completion scripts
and this one plugs that into the build system.
A new build option 'install_completions' has been introduced. Set to
'True' by default.
Completions for bash and fish use pkg-config for getting the preferred
install locations for the completions. If the packages are not
available, fallbacks are in place.
The 'completion' subdir has been kept to work around the ideology of
Meson that does not allow creating/outputting files in subdirectories nor
using the output of custom_target() in install_data().
https://github.com/containers/toolbox/pull/840
Cobra (the CLI library) has advanced support for generating shell
completion. It supports Bash, Zsh, Fish and PowerShell. This offering
covers the majority of use cases with some exceptions, of course.
The generated completion scripts have one behavioral difference when
compared to the existing solution: flags (--xxx) are not shown by
default. The user needs to type '-' first to get the completion.
https://github.com/containers/toolbox/pull/840
Co-authored-by: Ondřej Míchal <harrymichal@seznam.cz>
Using a non-supported distribution via `--distro` resulted in a silent
fallback to the Fedora image, which confuses users. When a faulty release
format was used with `--release` a message without any hint about the
correct format was shown to the user.
This separates distro and release parsing into two chunks that have
greater control over error reporting and provides more accurate error
reports for the user.
Fixes https://github.com/containers/toolbox/issues/937
https://github.com/containers/toolbox/pull/977
The deprecated golang.org/x/crypto/ssh/terminal API was replaced with
golang.org/x/term. Now, every invocation of 'go build' insists on
updating src/go.mod to drop the 'indirect' marker from
golang.org/x/term.
Fallout from d323143c46
https://github.com/containers/toolbox/pull/982
It's only necessary to call 'systemd-tmpfiles --create' when building
and installing from source on the host operating system.
It's not needed when using a pre-built binary downstream package,
because:
* When 'meson install' is called as part of building the package,
that's not when the temporary files need to be created. They need
to be created when the binary package is later downloaded and
installed by the user.
* Downstream tools can sometimes handle it automatically. eg., on
Fedora, the systemd RPM installs a trigger that tells RPM to call
'systemd-tmpfiles --create' automatically when a tmpfiles.d snippet
is installed.
It's also not needed when installing inside a toolbox container because
the files that 'systemd-tmpfiles --create' is supposed to create are
meant to be on the host.
Downstream distributors set the DESTDIR environment variable when
building their packages. Therefore, it's used to detect when a
downstream package is being built.
Unfortunately, environment variables are messy and, generally, Meson
doesn't support accessing them inside its scripts [1]. Therefore, this
adds a spurious build-time dependency on systemd for downstream
distributors. However, that's probably not a big problem because all
supported downstream operating systems are already expected to use
systemd for the tmpfiles.d(5) snippets to work.
[1] https://github.com/mesonbuild/meson/issues/9
https://github.com/containers/toolbox/issues/955
For the sake of greater control over the testing of images and for having an
infrastructure for hosting images that are not endorsed by the distributions.
The images are to be rebuilt every day at midnight.
https://github.com/containers/toolbox/pull/973
Defining the YAML anchor as part of the Rawhide tests, instead of the
Fedora 34 test, will prevent it from getting lost by mistake when
Fedora 34 reaches its End of Life.
https://github.com/containers/toolbox/pull/971
Currently, the CI has been frequently timing out when running the unit
tests. It's possible that the current 5 minute timeout isn't enough,
because it's significantly lower than the 20 minute timeout on stable
Fedoras for the system tests.
Increase the timeout to 10 minutes to see if that makes the CI more
stable.
https://github.com/containers/toolbox/pull/970
Currently, the CI has been frequently timing out on Fedora Rawhide
nodes, and it's not clear why that is. One possibility is that this is
due to Rawhide using Linux kernels that are built with debugging
enabled, which makes it slower than released Fedoras. So it might be a
matter of just increasing the timeout.
Currently, the timeout for stable Fedoras is 20 minutes, and that for
Rawhide is 22 minutes. An attempt to increase the Rawhide timeout to 30
minutes didn't succeed, so maybe 45 minutes will be sufficient.
https://github.com/containers/toolbox/pull/964
Version 1.1.2 of Cobra included a change[0] that alters how custom
usage functions are handled.
Example of the wrong behaviour:
$ toolbox --foo
Error: unknown flag: --foo
Run 'toolbox --help' for usage.
Error: Run 'toolbox --help' for usage.
Desired behaviour:
$ toolbox --foo
Error: unknown flag: --foo
Run 'toolbox --help' for usage.
A workaround is to define a template string for the usage instead. The
template uses the templating language of Go[1]. See the default
template string in version 1.2.1[2].
Because the template is set only once, the executableBase needs to be
set before the template is applied. That required the move of
setUpGlobals() into init() of the cmd package. This is a better place
for the function call as init() is called earlier than Execute()[3].
Upstream issue: https://github.com/spf13/cobra/issues/1532
[0] https://github.com/spf13/cobra/pull/1044
[1] https://pkg.go.dev/text/template
[2] https://github.com/spf13/cobra/blob/v1.2.1/command.go#L491
[3] https://golang.org/doc/effective_go#init
https://github.com/containers/toolbox/pull/917
This will be used by the subsequent commit to add a page to document
the configuration file, which should go into section 5 of the manual.
https://github.com/containers/toolbox/pull/963
Not all files are equal when it comes to testing. Unit tests are related
strictly to the source code and documentation changes do not concern it.
System tests have a wider range of influence but documentation and some
other areas also do not concern them.
I'm unsure about the effect of this change on the periodic pipeline
execution.
https://github.com/containers/toolbox/pull/948
github.com/coreos/toolbox bind mounts the entire /run from the host
operating system into the toolbox container. Due to this, when run
rootful, the /run/.containerenv created by Podman inside the container
is also seen on the host. This confuses Toolbox into thinking that it's
running inside a container, even when it's running on the host.
This is an attempt to differentiate between a toolbox container and
the host by looking at the 'container' environment variable, so that
the user can be presented with a more helpful error message.
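One possible way to express the check is sketched below. It assumes that
the 'container' environment variable is non-empty inside a container and
unset on the host. The function names are made up for this example.

```go
package main

import (
	"fmt"
	"os"
)

// insideContainer reports whether the process looks like it's running
// inside a container: the marker file exists and 'container' is set.
func insideContainer(markerExists bool, containerEnv string) bool {
	return markerExists && containerEnv != ""
}

// markerLeakedToHost reports the confusing case described above: the
// marker file is visible, but the environment says this is the host.
func markerLeakedToHost(markerExists bool, containerEnv string) bool {
	return markerExists && containerEnv == ""
}

func main() {
	_, err := os.Stat("/run/.containerenv")
	markerExists := err == nil
	containerEnv := os.Getenv("container")

	switch {
	case insideContainer(markerExists, containerEnv):
		fmt.Println("running inside a container")
	case markerLeakedToHost(markerExists, containerEnv):
		fmt.Println("running on the host, but /run/.containerenv is present")
	default:
		fmt.Println("running on the host")
	}
}
```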
https://bugzilla.redhat.com/show_bug.cgi?id=1998191
https://github.com/containers/toolbox/pull/951
Commit 6c86cabbe5 changed the command line interface to behave
a lot more like that of github.com/coreos/toolbox, which makes things
easier for those switching over from it. Make it conditional so that
only those OS distributors who truly need it may enable it, and
restore the previous behaviour as the default.
The tests were updated to test the default behaviour that the vast
majority of users would be seeing. Ideally, the test suite would be run
twice with the migration path turned off and on. However, that would
require a more intrusive surgery of the test suite and likely make it
slower. It might not be worth the hassle because of the small number
of users who should be using the migration path.
Note that the copyright and license notices really must use C++-style
// line comments, because build constraints can only be preceded by
blank lines and other line comments. C-style /* */ block comments can't
precede the build constraints.
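A minimal sketch of such a file layout follows. The notice text is a
placeholder, and the tag name migration_path_for_coreos_toolbox is assumed
to mirror the name of the build option. The modern //go:build syntax is
used here.

```go
// Placeholder for the copyright and license notice. It has to use //
// line comments, because a /* */ block comment in this position would
// stop the line below from being treated as a build constraint.

//go:build !migration_path_for_coreos_toolbox

package main

import "fmt"

// migrationPathEnabled is false in this file, which is only built when
// the tag is absent. A sibling file guarded by the opposite constraint
// would define it as true.
const migrationPathEnabled = false

func main() {
	fmt.Println("migration path enabled:", migrationPathEnabled)
}
```

Building with 'go build -tags migration_path_for_coreos_toolbox' would
exclude this file and select the sibling one instead.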
This reverts commit ca899c8a56 and parts
of commit 3aeb7cf288.
[1] go help buildconstraint
https://pkg.go.dev/cmd/go#hdr-Build_constraints
https://github.com/containers/toolbox/pull/951
This will be used by the subsequent commit to highlight some of the
more common commands that a new user is likely to be interested in, when
none has been specified.
https://github.com/containers/toolbox/pull/951
Commit 6c86cabbe5 changed the command line interface to behave
a lot more like that of github.com/coreos/toolbox, which makes things
easier for those switching over from it.
However, it makes things confusing for the vast majority of users who
have never used coreos/toolbox. The Toolbox CLI aims to be friendly to
new users by being self-documenting and offering a smooth onboarding
experience. It's jarring to new users when 'toolbox', without any
commands specified, suggests that it needs to perform a big download.
It's difficult to document two different sets of CLIs, and if the
manuals don't mention the second behaviour, then it just leaves the
users even more confused.
Hence, it will be good to keep the migration path for coreos/toolbox
behind a build-time option, so that only those OS distributors who
truly need it may enable it without impacting others. Fortunately,
coreos/toolbox doesn't have any manuals, which means that there's no
need to conditionalize the documentation.
This commit merely adds the build-time option. Subsequent commits will
use this to actually conditionalize the code.
https://github.com/containers/toolbox/pull/951
Some downstream distributors like RHEL don't have patchelf(1). Relying
on patchelf(1) during the build will make it difficult for such
downstreams to distribute Toolbox.
Fortunately, the path of the dynamic linker (ie., PT_INTERP) is
hardcoded in the ABI specification of each architecture [1]. This means
that Toolbox's build system can keep its own architecture to dynamic
linker mapping, and specify it during the build through the GNU ld
linker's --dynamic-linker flag, as opposed to using a tool like
patchelf(1) to change the path of the dynamic linker in the built
binary to the one inside /run/host. Currently, the list of
architectures covers the ones that Fedora builds for.
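The mapping itself is small. Here is a sketch in Go, purely for
illustration, since the real mapping lives in the build system. Only two
architectures are listed, and the function names are made up.

```go
package main

import (
	"fmt"
	"path"
)

// dynamicLinkers maps an architecture to the PT_INTERP path from its ABI.
// This is an illustrative subset, not the full list.
var dynamicLinkers = map[string]string{
	"aarch64": "/lib/ld-linux-aarch64.so.1",
	"x86_64":  "/lib64/ld-linux-x86-64.so.2",
}

// hostDynamicLinker returns the path of the host's dynamic linker as seen
// from inside a container, ie., prefixed with /run/host.
func hostDynamicLinker(arch string) (string, error) {
	linker, ok := dynamicLinkers[arch]
	if !ok {
		return "", fmt.Errorf("no dynamic linker known for architecture %s", arch)
	}
	return path.Join("/run/host", linker), nil
}

// linkerFlag returns the flag to hand to the C compiler driver, so that
// GNU ld records the given path as the dynamic linker.
func linkerFlag(arch string) (string, error) {
	linker, err := hostDynamicLinker(arch)
	if err != nil {
		return "", err
	}
	return "-Wl,--dynamic-linker=" + linker, nil
}

func main() {
	fmt.Println(linkerFlag("x86_64"))
	fmt.Println(linkerFlag("riscv64"))
}
```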
[1] https://sourceware.org/glibc/wiki/ABIList
https://github.com/containers/toolbox/pull/942
Now that there's a website at https://containertoolbx.org/ it makes
more sense to link to it instead of linking back to the same location
where the README.md resides.
The Toolbox repository was moved to the 'containers' organization some
time ago already[0]. Containers marked with the label:
com.github.debarshiray.toolbox=true
will remain supported but new containers will not be created with it.
https://github.com/containers/toolbox/pull/510
[0] de5e5df9b7
We need to know if the latest changes in the libc (that is dynamically
linked to the binary) cause problems in containers based on older
releases of Fedora.
The estimate of the version numbers is very crude and does not follow
the upstream schedule. That should not be a problem, though.
A part of an existing test has been reused and made into a helper
function to implement this.
This increases the run time of the test suite on Rawhide which already
takes longer than the same test suite on released versions of Fedora.
Make up for it by increasing the timeout by 2 minutes.
https://github.com/containers/toolbox/pull/899
The 'die' function is a remnant from times before the system tests
rewrite. It served for writing an error message and then failing
the test. Since the rewrite it is no longer present. Instead, simply
use 'false' in case a caching step fails.
Fallout from da6b6a7c5a
https://github.com/containers/toolbox/pull/899
This will pair with a future change to `shell.Run()` so that we capture
the child process stderr.
But actually this change on its own is enough since `shell.Run()`
provides an error message when the invoked command was not found or when
some other unknown error has happened.
Before:
Error: failed to remove password for user walters
After:
Error: failed to remove password for user walters: passwd(1) not found
which helps immediately pinpoint the problem.
I didn't try to go through and change *all* the `shell.Run()`
invocations, but if accepted I may do it (or someone else can).
https://github.com/containers/toolbox/pull/945
Currently, the entry point of a Toolbox container runs updatedb(8) on
start-up, which can be very I/O intensive. This might be a hindrance
when troubleshooting performance problems on a host, or when
re-creating containers somewhat more frequently.
Users can install the mlocate RPM and restart their containers to
enable locate(1).
Only the images for currently maintained Fedoras (ie., 34, 35 and 36)
were updated.
https://github.com/containers/toolbox/pull/938
There's no need to specify a CMD in a Toolbox image because it's
specified by 'toolbox create', through 'podman create', when creating a
container.
A CMD was specified [1] because the Fedora Container Guidelines
require it [2]. The idea behind the guidelines is that the right
thing should happen when one runs:
$ podman run <image>
However, that only makes sense for images targeting single service
containers. Toolbox containers and images are different - they are not
meant to be used like that to run a single one-off service.
Conceptually, 'running' a Toolbox container is expected to provide the
user with a reasonable interactive command line experience. Arguably,
that means offering something like /bin/bash, not /bin/sh.
Also, note that when the CMD was introduced [1], Toolbox containers
were actually created, through 'podman create', with /bin/sh as their
entry points. So, it did make some sense. However, things have changed
since then [3]. The entry point is now 'toolbox init-container'. It's
not possible to mention it in the Toolbox image because the
/usr/bin/toolbox binary isn't present in the image, and it's not meant
to be present.
Therefore, today, /bin/sh is simply not the right fit for a Toolbox
image's CMD. A better option would be /bin/bash.
Note that the fedora base images have their CMD set to /bin/bash, which
is inherited by the fedora-toolbox images.
So, there are two options. Either repeat the same CMD in the
fedora-toolbox images and satisfy the guidelines, or take some
liberties and let the CMD be inherited from the fedora base images.
This commit takes the latter option. People tend to use the
fedora-toolbox images as the starting point for other custom Toolbox
images, sometimes for other operating system distributions. It's
better to keep them minimal to avoid implying extra requirements. In
this case, the CMD is an abstract concept, and the actual entry point
is 'toolbox init-container' as specified by 'toolbox create'.
Specifying /bin/bash might discourage people from creating custom
images that are only meant to have /bin/zsh.
Also, note that the current CMD was actually '/bin/sh -c /bin/sh', not
/bin/sh. Unless a CMD is specified as an array of command line
arguments, it's passed as a single argument to '/bin/sh -c' [4]. So,
this:
CMD foo bar
... is the same as:
CMD [ "/bin/sh", "-c", "foo bar" ]
Only the images for currently maintained Fedoras (ie., 34 and 35) were
updated.
This reverts commit 5cc2678a36.
[1] Commit 5cc2678a36
[2] https://docs.fedoraproject.org/en-US/containers/guidelines/creation/
[3] Commit 8b84b5e460
https://github.com/containers/toolbox/pull/160
[4] https://docs.docker.com/engine/reference/builder/#cmd
https://github.com/containers/toolbox/issues/885
Instead of typing out two function names to set up the test environment,
type out only one. We never know if a new set up function will show up.
https://github.com/containers/toolbox/pull/818
This allows running the test suite without having to worry about blasting
the whole local state of Podman.
This is done by creating a configuration file with a custom path for the
storage of Podman and specifying the config file using an env var.
The used location for the temporary storage is located either under
XDG_CACHE_HOME and if the one is not defined, $HOME/.cache is used
instead. The data are namespaced. This follows the XDG Base Directory
Specification[0]. Other locations could be /tmp or /run but those
locations usually use tmpfs and that filesystem can not be used by
Podman[1] due to missing features in tmpfs.
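The lookup order described above can be sketched in Go as follows. The
test suite itself is written for Bats, so this is only an illustration;
the function name and the "toolbox-tests" namespace are made up for the
example and are not taken from the actual change.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// testStorageDir returns the directory for the temporary Podman storage.
// The getenv parameter exists only to make the lookup easy to exercise;
// pass os.Getenv in real use. "toolbox-tests" is a hypothetical namespace.
func testStorageDir(getenv func(string) string) string {
	base := getenv("XDG_CACHE_HOME")
	if base == "" {
		// Fall back to the default from the XDG Base Directory Specification.
		base = filepath.Join(getenv("HOME"), ".cache")
	}
	return filepath.Join(base, "toolbox-tests")
}

func main() {
	fmt.Println(testStorageDir(os.Getenv))
}
```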
https://github.com/containers/toolbox/pull/818
[0] https://specifications.freedesktop.org/basedir-spec/latest/index.html
[1] https://github.com/containers/podman/issues/10693#issuecomment-863007516
Currently, on Fedora, a nested instance of Z shell inside a Toolbox
container renders the PS1 like this:
\[\]⬢\[\][\u@\h \W]\$
Notice that Z shell doesn't like that the terminal escape sequences
for the foreground colour are wrapped in '\[' and '\]' [1], and doesn't
understand the special characters like '\u' and '\h'.
This is fixed by making the PS1 specific to the shell. The prompt for
Z shell is based on the default prompt used on Fedora, just like the
one for Bash.
Note that this only affects nested instances of Z shell because of the
way the start-up scripts for Z shell are written on Fedora. Toolbox
invokes the top-level shell as a login shell, and for those the PS1 set by
profile.d/toolbox.sh is overwritten by the operating system's default
in /etc/zshrc. See:
https://bugzilla.redhat.com/show_bug.cgi?id=2026749
[1] Commit bc1a816ea3
https://github.com/debarshiray/toolbox/issues/190
https://github.com/containers/toolbox/pull/936
All these tools were only used by the POSIX shell implementation. The
Go implementation never used them.
Note that the test suite still invokes id(1) inside a container.
However, it's not a user-visible requirement, and hence is not a hard
requirement for Toolbox images.
https://github.com/containers/toolbox/pull/934
The util-linux package was added to ensure the presence of the mount(8)
command. Currently the package is already pulled in by various
dependencies. Therefore, it doesn't increase the size of the image, but
serves as a safeguard against any inadvertent changes.
Note that starting from Fedora 35 onwards, the fedora base images no
longer have mount(8), which increases the importance of this change.
Only the images for currently maintained Fedoras (ie., 34 and 35) were
updated.
https://github.com/containers/toolbox/issues/929
It's true that the fedora base images no longer come with
coreutils-single, but they used to, and the ubi base images still do.
Therefore, it's worth being extra defensive about this.
It's better to make the build system execute one extra redundant
command than expose users to a bug because of a change that snuck in
unnoticed.
Only the images for currently maintained Fedoras (ie., 34 and 35) were
updated.
This reverts commit 033ed71ec1.
https://github.com/containers/toolbox/pull/931
When running rootless, files and directories bind mounted from the
host operating system can have their ownership listed as
nobody:nobody. This is because the UIDs and GIDs that actually own
those locations are not available inside the container.
Some distribution packages are particular about the file ownerships of
some of these locations. eg., Fedora's filesystem, flatpak and
libvirt-libs RPMs. Encountering nobody:nobody as the owner can fail
package management transactions involving such packages leading to
unforeseen consequences.
Therefore, configure RPM to leave these locations alone.
https://github.com/containers/toolbox/pull/640
The location for public shared libraries can change from one operating
system distribution to another. eg., while Fedora uses /usr/lib and
/usr/lib64, depending on the hardware architecture, Debian uses paths
like /usr/lib/x86_64-linux-gnu. Therefore, it's best not to assume
anything and ask the toolchain.
https://github.com/containers/toolbox/pull/923
Unlike the following test this one tests using the content of the
toolbox(1) manual page in man. man has to be present in PATH for this
test to be relevant.
Also, this changes the text used to test the output. The current text
can be found in the added short help message and that causes the test
to pass even though it should not. Instead, look for the text in the
"header" of the manual page.
https://github.com/containers/toolbox/pull/837
Fedora CoreOS systems do not have the man command installed. Running
toolbox --help on such a system results in a "man(1) not found" error.
As a compromise for systems without man, we added a simple help text
showing the most commonly used toolbox commands and a URL that directs
users to the Toolbox website where they can find the manuals in Markdown
format.
Fixes #713
https://github.com/containers/toolbox/pull/837
pkg/utils has been in Go Toolbox since its birth. Along the way it
accumulated a number of functions where a few of them are purely CLI
related. Since the majority of functions in the package are related to
some "deeper" functionality in Toolbox, it makes more sense to move the
selected few to package cmd. This will make pkg/utils a bit leaner and
create a dedicated space for cmd utility functions to live in.
In the process the error creation functions no longer require the
executableBase argument to be passed to them.
https://github.com/containers/toolbox/pull/819
These tests need to be implemented in the future, but they require some
magic with socat or similar tools, because entering a container creates
a new subshell that is hard to monitor from a bash script. Better
not to forget them.
https://github.com/containers/toolbox/pull/915
The path of the dynamic linker (ie., PT_INTERP), as specified in an
architecture's ABI, often starts with /lib or /lib64, not /usr/lib or
/usr/lib64. eg., it's /lib/ld-linux-aarch64.so.1 for aarch64 and
/lib64/ld-linux-x86-64.so.2 for x86_64.
Unfortunately, until very recently [1], only the host's /usr was
present inside a toolbox container's /run/host, not /lib or /lib64.
Therefore, simply prepending /run/host to the /usr/bin/toolbox
binary's existing PT_INTERP entry wouldn't locate the host's dynamic
linker inside the toolbox container. This broke backwards compatibility
with every container out there, except the ones created with the
current development version in Git.
To restore backwards compatibility, the /lib and /lib64 symbolic links
must be resolved to their respective locations inside /usr.
The following caveats must be noted:
* With glibc, even the basename of the path of the dynamic linker as
specified in an architecture's ABI, is a symbolic link to a file
named ld-<glibc-version>.so. However, this file can't be used as
the PT_INTERP entry, because its name will change when glibc is
updated and the PT_INTERP entry will become invalid until the
/usr/bin/toolbox binary is rebuilt.
* On Debian, a path like /lib64/ld-linux-x86-64.so.2 doesn't resolve
to something inside /usr/lib64. Instead it ends up inside
/usr/lib/x86_64-linux-gnu through a series of symbolic links:
- /lib64 -> usr/lib64
- /usr/lib64/ld-linux-x86-64.so.2
-> /lib/x86_64-linux-gnu/ld-2.28.so
- /lib -> usr/lib
* It's assumed that a symbolic link with the basename specified in
the ABI lives in the same directory as the actual dynamic linker
binary named ld-<glibc-version>.so.
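The resolution described by the caveats above can be sketched in Go as
follows. This is only an illustration under the stated assumption that
the ABI-named symbolic link sits next to the real binary; the function
name is hypothetical and the actual build may do this differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// interpreterInUsr combines the directory of the fully resolved dynamic
// linker with the basename specified in the ABI. Keeping the ABI basename
// avoids depending on the versioned ld-<glibc-version>.so file name.
func interpreterInUsr(abiPath, resolvedPath string) string {
	return filepath.Join(filepath.Dir(resolvedPath), filepath.Base(abiPath))
}

func main() {
	abiPath := "/lib64/ld-linux-x86-64.so.2"

	// Follow every symbolic link, including /lib and /lib64 themselves.
	resolved, err := filepath.EvalSymlinks(abiPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot resolve dynamic linker:", err)
		return
	}

	fmt.Println(interpreterInUsr(abiPath, resolved))
}
```

With the Debian layout listed above, the resolved path ends in
ld-2.28.so inside /usr/lib/x86_64-linux-gnu, and the function returns
the ld-linux-x86-64.so.2 path in that same directory.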
Fallout from 6063eb27b9
[1] Commit d03a5fee80
https://github.com/containers/toolbox/pull/827
https://github.com/containers/toolbox/issues/821
PR #897 made adjustments to the Toolbx binary so that it requires the
presence of /run/host in both the host filesystem and the filesystem in
a container.
The presence of the directory is assured by systemd-tmpfiles by
running it before the binary is started for the first time. For the run
to be effective 'data/tmpfiles.d/toolbox.conf' has to be installed in
a location visible to systemd-tmpfiles. Therefore, the call to
'systemd-tmpfiles --create' had to be placed after the install step.
https://github.com/containers/toolbox/pull/898
There is no significant benefit in keeping this configuration separated.
Now the to-be installed packages are tracked in a single place and the
test playbooks only call the relevant tests.
This was pointed out in 6063eb27b9
https://github.com/containers/toolbox/pull/898
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.
In the past, this worked perfectly well with the POSIX shell
implementation because it got interpreted by whichever /bin/sh was
available. However, the Go implementation can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.
The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
$ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
__libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_detach
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_create
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_attr_getstacksize
This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.
Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.
Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.
Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.
Based on ideas from Alexander Larsson and Ray Strode.
[1] Commit 6ad9c63180
https://github.com/containers/toolbox/pull/534
[2] glibc commit 035c012e32c11e84
https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
https://sourceware.org/bugzilla/show_bug.cgi?id=23323
https://github.com/containers/toolbox/issues/821
The subsequent commit will add an entry to create a /run/host symbolic
link on the host that points to /, and it will require explicitly
skipping some of the columns. Doing the same for the existing entry
will make the file more readable.
https://github.com/containers/toolbox/issues/821
Currently, 'toolbox enter' can get into a loop if the user tried to
run something inside the shell that didn't exist, and quit immediately
afterwards:
$ toolbox enter
⬢$ foo
bash: foo: command not found
⬢$
logout
Error: command /bin/bash not found in container fedora-toolbox-34
Using /bin/bash instead.
⬢$
This is because:
* The shell forwards the exit code of the last command that was
invoked as its own exit code. If the last command that was
attempted was absent then this exit code is 127.
* 'podman exec' uses 127 as the exit code when it can't invoke the
command. If it's able to successfully invoke the command, it
forwards the exit code of the command itself.
Therefore, in the above example 'podman exec' itself returns with an
exit code of 127 even though both the working directory and the command
that were passed to it were present. Hence, it's necessary to
explicitly check if the requested command was really absent before
attempting the fallbacks.
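The decision can be sketched in Go as follows. The function name and the
commandExists probe are hypothetical; the point is only that exit code
127 on its own is ambiguous and needs a second check.

```go
package main

import "fmt"

// needsFallback reports whether the fallbacks should be attempted after
// 'podman exec' returned exitCode. The commandExists probe (hypothetical)
// is consulted only for 127, since every other code is unambiguous.
func needsFallback(exitCode int, commandExists func() bool) bool {
	if exitCode != 127 {
		return false
	}

	// 127 can come from 'podman exec' failing to invoke the command, or
	// from the shell forwarding the status of its last command.
	return !commandExists()
}

func main() {
	present := func() bool { return true }
	absent := func() bool { return false }

	fmt.Println(needsFallback(127, present)) // the shell forwarded 127
	fmt.Println(needsFallback(127, absent))  // the command is really missing
}
```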
Fallout from 4536e2c8c2
https://github.com/containers/toolbox/pull/872
Due to Docker rate limiting we can not rely on docker.io for
retrieving the images.
This was detected when executing our tests for podman fedora
gating pipeline. Our busybox image was not downloaded and
one of the list tests was failing.
Using the current working directory for cache is not a good solution
since the test files may reside in a location that is unwritable (e.g.,
/usr/share). The `BATS_RUN_TMPDIR` variable should point to a location
that is sure to be writeable from the test suite.
https://github.com/containers/toolbox/pull/850
It looks like there are some oddities with Viper [1]. The errors can't
be examined with errors.As [2] and Viper doesn't actually throw
ConfigFileNotFoundError if a configuration file is not found. Secondly,
there's no way to find out if a key was actually specified in a
configuration file. The InConfig API doesn't return 'true' even if a
key was mentioned in a configuration file, and the IsSet API returns
'true' even if the key was only set via SetDefault in the code.
Some changes by Debarshi Ray.
[1] https://pkg.go.dev/github.com/spf13/viper
[2] https://blog.golang.org/go1.13-errors
https://github.com/containers/toolbox/pull/828
https://github.com/containers/toolbox/pull/851
A subsequent commit will add support for configuration files, which can
override the default toolbox image. Since this override affects all
commands, it effectively ends up adding a fourth option to the 'enter'
command, other than the existing options to change the distribution,
release and container. This makes it a lot more difficult to reason
when only 'toolbox enter --release N' is enough to enter the created
container.
https://github.com/containers/toolbox/pull/828
https://github.com/containers/toolbox/pull/851
The 'toolbox run' command has one downside: all newlines contain
a carriage return (CR). This is caused by the unconditional use of the
--tty option in `podman exec`[0]. In these particular tests this can be
worked around by not printing a newline at all.
Another workaround is to check only the last line of the output.
[0] https://github.com/containers/podman/issues/9718
https://github.com/containers/toolbox/pull/843
The output of `podman build` has changed a bit. Each line of log
describing the build is now in the format of:
- STEP i/n: msg
instead of:
- STEP i: msg
where i is the current step and n the maximum number of steps.
The exact format is not important for the purpose of testing Toolbox, so
we may fall back to partial string testing.
Also the last step ("COMMIT") seems to no longer be considered a step,
so just check for the word.
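For illustration only, a check that tolerates both formats could look
like this in Go. The tests themselves use partial string matching in
Bats; the regular expression below is an alternative sketch, and the
names in it are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// stepPattern accepts both "STEP i: msg" and "STEP i/n: msg".
var stepPattern = regexp.MustCompile(`^STEP \d+(/\d+)?: `)

// isStepLine reports whether a line of build output describes a step.
func isStepLine(line string) bool {
	return stepPattern.MatchString(line)
}

func main() {
	for _, line := range []string{
		"STEP 1: FROM scratch",
		"STEP 1/3: FROM scratch",
		"COMMIT localhost/example",
	} {
		fmt.Printf("%q: %v\n", line, isStepLine(line))
	}
}
```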
https://github.com/containers/toolbox/pull/846
Having the entire host file system hierarchy mounted inside a toolbox
container gives the containers a more complete environment that's
resilient against future changes in the layout of the file system
hierarchy and the need for giving access to new paths to support new
use-cases. Otherwise, one would have to create a new container to get
access to any path that lies outside the /boot, /etc, /run, /tmp, /usr
and /var directories.
As a nice side-effect, this also simplifies the bind mount handling
code.
https://github.com/containers/toolbox/pull/827
Turns out the braces do not need to be escaped.
The equivalent code in the POSIX shell implementation was:
echo "$image" | grep "^[a-f0-9]\{6,64\}$"
There the braces had to be escaped because it was using grep(1) with
basic regular expressions (ie., without the --extended-regexp flag),
where the meta-characters ?, +, {, |, ( and ) lose their special
meaning unless they are escaped.
However, that was grep(1), and this is Go's regexp package.
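A small Go sketch of the difference, with hypothetical names. In Go's
regexp syntax an unescaped brace pair is a repetition count, and
escaping the braces turns them into literal characters.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageIDPattern uses unescaped braces as a repetition count: 6 to 64
// lowercase hexadecimal characters.
var imageIDPattern = regexp.MustCompile(`^[a-f0-9]{6,64}$`)

// escapedPattern keeps the escapes needed by grep(1) with basic regular
// expressions. In Go the escaped braces match literal '{' and '}'.
var escapedPattern = regexp.MustCompile(`^[a-f0-9]\{6,64\}$`)

// looksLikeImageID is a hypothetical helper used only for illustration.
func looksLikeImageID(image string) bool {
	return imageIDPattern.MatchString(image)
}

func main() {
	fmt.Println(looksLikeImageID("0123456789ab"))
	fmt.Println(escapedPattern.MatchString("0123456789ab"))
	fmt.Println(escapedPattern.MatchString("a{6,64}"))
}
```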
Fallout from dd947016b3
https://github.com/containers/toolbox/pull/825
The regexp.MatchString [1] API returns an error only when the regular
expression is faulty, and the boolean return value tells if a match was
found. In this case, the regular expression is baked into the code as a
string literal. So, unless there's a programmer error, it should always
be valid.
Fallout from dd947016b3
[1] https://golang.org/pkg/regexp/#MatchString
https://github.com/containers/toolbox/pull/825
When installing to a non-system-wide prefix as a non-root user, the
tmpfilesdir path defined by systemd might not be accessible. Overriding
the path helps to prevent the installation from failing.
https://github.com/containers/toolbox/pull/717
This makes 'toolbox', without any commands specified, behave a lot like
'toolbox enter'. When there aren't any toolbox containers, it will
offer to create a new container matching the same parameters passed to
the command. If there's just one toolbox container available, then it
will fall back to it.
This makes the command line interface a lot more similar to that of
github.com/coreos/toolbox, which makes things easier for those
switching over from it.
Some changes by Debarshi Ray.
https://github.com/containers/toolbox/pull/811
SELinux is always meant to be disabled. The exact location of the code
is a historical accident and isn't meant to imply that SELinux might
be optionally enabled.
https://github.com/containers/toolbox/pull/814
Avoid phrases like "shortcoming of container configuration", because
it makes one wonder why a known shortcoming is even being used or not
being fixed. Immutability also has its advantages for certain
use-cases, and it's beyond the scope of this manual to have a full
blown discussion about the pros and cons of OCI containers. Interested
readers can research that on their own.
https://github.com/containers/toolbox/pull/814
This builds upon commit ea452d7ced.
The configuration of a toolbox container is a higher level topic than
the entry point, and the entry point is mentioned as one part of it.
Therefore, putting the section on toolbox set-up earlier in the text
makes it nicely flow from the DESCRIPTION section into the Entry Point
sub-section.
Emphasize the user-visible features of a toolbox container, and not
the underlying implementation details, and avoid using too much jargon
about container technology.
https://github.com/containers/toolbox/pull/814
It was a deliberate decision to have entry point documented in both
toolbox-create(1) and toolbox-init-container(1). For technical
documentation it's sometimes good to repeat the same thing if it's
sufficiently important. Either to refresh the user's memory or to draw
their attention to it. Having to traverse too many references can get
disorienting. eg., parts of README.md are already repeated in
toolbox(1).
In this case, the entry point is very directly related to the create
command because the command sets it up, and unlike HTML documents,
it's awkward to follow links from manuals.
This reverts parts of commit ea452d7ced.
https://github.com/containers/toolbox/pull/814
The DESCRIPTION already explains the details of the set-up on Fedora,
so there's no need to be so specific here. Plus, conceptually, it's not
meant to be Fedora-specific. Fedora is just an example and happens to
be the most well-supported one at the moment, but that will change.
https://github.com/containers/toolbox/pull/814
Some aspects of the Fedora image are described in toolbox-create(1),
but the exact URL of the image is an implementation detail. As Toolbox
grows, it will become unwieldy to describe these details in the
top-level manual.
https://github.com/containers/toolbox/pull/814
The manuals for the individual commands were already listed above.
The entry point of toolbox containers is prominently documented in
toolbox-create(1) and toolbox-init-container(1). It's not clear why
someone who has just come across toolbox(1) would want to know about
the entry point. It's, after all, an implementation detail. They
probably don't even know what's an entry point to begin with. The
top-level manual should give the reader an overall view of the tool
from a user's perspective, and let the other manuals draw them into the
finer details of things.
https://github.com/containers/toolbox/pull/814
It's good to document the --log-level and --log-podman flags because
they can give us some flexibility with the logging in future, but it's
still desirable to keep --verbose (and the -vv trick) in the manual.
Toolbox is still a small enough code base that not too many log levels
are actually needed, yet. The complexity of remembering which log
level reveals which detail soon starts to outweigh the simplicity of
dumping as much as possible, since there aren't that many log messages
to begin with. It's a lot easier to type and remember things like
--verbose, -v and -vv, than their newer counterparts, and they are a
reasonably widely used convention (eg., flatpak, nmap, ssh, etc.).
If some day Toolbox grows to have a significantly larger number of log
messages, then it's possible that --verbose would be of less use, but
that's not the case today.
https://github.com/containers/toolbox/pull/814
Currently, the 'enter' command involves two extra invocations of
'podman exec' to detect if the user's chosen shell and current working
directory are present inside the toolbox container. Each invocation is
sufficiently expensive to add a noticeable overhead to the 'enter' and
'run' commands. Moreover, file system operations being inherently racy,
it's always better to detect errors and handle them instead of trying
to pre-emptively avoid them.
Therefore, this shuffles the code around to attempt the non-fallback
invocation, and then handle the errors by attempting a series of
fallbacks for the command and the current working directory.
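The "try first, fall back on error" flow can be sketched in Go as
follows. All names, including the errNotFound sentinel, are hypothetical
and only illustrate the control flow, not the actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel for "absent in the container".
var errNotFound = errors.New("not found in container")

// runWithFallbacks runs each candidate in order. No existence check is
// made up front: the next candidate is tried only after the previous one
// failed with errNotFound. Any other error is returned immediately.
func runWithFallbacks(candidates []string, run func(string) error) (string, error) {
	for _, candidate := range candidates {
		err := run(candidate)
		if err == nil {
			return candidate, nil
		}
		if !errors.Is(err, errNotFound) {
			return "", err
		}
	}

	return "", errNotFound
}

func main() {
	// Pretend that only /bin/bash exists inside the container.
	run := func(command string) error {
		if command != "/bin/bash" {
			return errNotFound
		}
		return nil
	}

	used, err := runWithFallbacks([]string{"/bin/zsh", "/bin/bash"}, run)
	fmt.Println(used, err)
}
```

In the common case where the first candidate works, run is called
exactly once, which is where the saving comes from.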
Unfortunately, in case of a missing command, capsh(1) adds an extra
error message that seems difficult to get rid of:
$ toolbox enter
/bin/sh: /bin/zsh: No such file or directory
Error: command /bin/zsh not found in container fedora-toolbox-34
Using /bin/bash instead.
https://github.com/containers/toolbox/pull/813
This will be used by the subsequent commit to optimize the 'enter' and
'run' commands in the non-fallback case, by attempting the fallback
only if an error was encountered by the main 'podman exec' invocation,
as opposed to pre-emptively setting up the fallback.
https://github.com/containers/toolbox/pull/813
The reason for setting FParseErrWhitelist.UnknownFlags to 'true' was to
prepare for a future when the 'init-container' command would have fewer
options than it does now.
However, there's no need to prepare for it, because the version of
toolbox(1) that's bind mounted into the container is the same as the
one on the host. So, FParseErrWhitelist.UnknownFlags can be set in
future if, or when, the number of flags do get reduced.
This reverts commit 5c2086e9ea.
https://github.com/containers/toolbox/pull/807
This builds upon commit eedfdda535, which added more information
to the error messages presented to the user by including the errors
thrown by the lower layers of the code.
However, if the errors are being thrown by external modules, or are
coming from functions that are too many layers below, then it is
difficult to guarantee their contents. They might be duplicating
information added by the upper layers, or in extreme cases might even
contain JSON blobs, simply because it made sense for the API contracts
of the functions generating the errors.
Therefore, it's better to put them in the debug logs to retain control
over what gets displayed to the user under normal (ie., non-debug)
operation.
https://github.com/containers/toolbox/pull/809
Even though SilenceUsage is set to 'true', to have full control over
what gets shown in the case of an error, there is still (at least?)
one occasion in which the usage function set using SetUsageFunc (ie.,
rootUsage) is used - when an unknown flag is used. For example,
'toolbox --foo'. Oddly enough, an unknown command won't lead to
rootUsage. eg., 'toolbox foo'.
Since rootUsage uses executableBase, that variable needs to be set
earlier, which means that setUpGlobals needs to run before rootUsage.
It turns out that the PersistentPreRunE hook (ie., preRun) doesn't get
invoked when an unknown flag is encountered. Therefore, we can't put
setUpGlobals inside preRun.
This reverts commit 6bbbedf675.
https://github.com/containers/toolbox/pull/802
Some people create images manually. If such created images are recognized
as toolbox images (they have the proper labels) but do not have
a name/tag then 'toolbox list' will panic due to index being out of
range.
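A minimal Go sketch of the kind of guard that avoids such a panic. The
image type, the function name and the "<none>" placeholder are
assumptions made for the example.

```go
package main

import "fmt"

// image is a simplified, hypothetical stand-in for a listed image.
type image struct {
	ID    string
	Names []string
}

// displayName returns the first name of the image. Indexing Names[0]
// without the length check would panic for images without a name or tag.
func displayName(img image) string {
	if len(img.Names) == 0 {
		return "<none>"
	}

	return img.Names[0]
}

func main() {
	fmt.Println(displayName(image{ID: "0123456789ab"}))
	fmt.Println(displayName(image{ID: "ba9876543210", Names: []string{"localhost/custom:latest"}}))
}
```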
https://github.com/containers/toolbox/pull/800
Since /etc/machine-id is bind mounted into the toolbox container from
the host operating system, it doesn't make sense to make it mandatory
for images to have that file. Apparently, (some?) Arch Linux images
don't have /etc/machine-id.
Since a missing containerPath for a directory is handled the same way,
there's no reason not to do the same for regular files. It will make
life a bit easier for those creating toolbox images for different
distributions.
https://github.com/containers/toolbox/pull/710
Errors thrown from 'toolbox init-container' are usually not shown to
the user. One has to use 'podman start --attach ...' to see them.
Therefore, it's worth adding the extra bit of information to the error.
https://github.com/containers/toolbox/pull/710
A subsequent commit will handle a missing containerPath when bind
mounting a regular file like /etc/machine-id. Therefore, it's better to
explicitly state that this code is dealing with a directory.
https://github.com/containers/toolbox/pull/710
Not having the corresponding image for UBI toolbox containers show up
in 'toolbox list' is a rough edge. However, the whole UBI feature is
a bit experimental. It's about a gratis RHEL environment getting
created in a jiffy on any host, which is something that hasn't been
done before, and those containers also suffer from various shortcomings
because of the limited package set of UBI.
So it's not that big of a problem if it takes a release or two to
hammer out the details. Especially since it's likely that there will
be a special Toolbox-specific image that's created out of the UBI RPM
repositories, which will likely have the com.github.containers.toolbox
label.
There's also the issue that 0.1.0 needs to be finished, and for that
the churn needs to be kept down. Changing the labels can very
likely lead to compatibility issues in the future, because of which it
either can't be removed for a while or the wrong images start to get
listed. Some of the older labels have finally been removed, so it's
better not to add more to the list.
In short, this problem will likely fix itself in the coming months, so
it's wise not to create complications trying to rush through a fix.
This reverts commits 1df36591d0 and
e09de9f3e5.
https://github.com/containers/toolbox/issues/753
UBI[0] does not have the recommended Toolbox labels used to track whether
an image/container is truly a toolbox image/container. Thankfully, they
have a number of labels to choose from that we can use to identify the
image. The "com.redhat.component=ubi8-container" seems to be ideal.
The approach of using the UBI8 label introduces one problem though. If
we were to use only one set of labels for both images and containers,
containers created with Podman and not Toolbox from UBI8 would also be
marked as toolbox containers. This is not desired and therefore there
are now two sets of labels: one for images, where the new label has been
added, and another for containers, which stays the same.
Since the rewrite of the system test suite[0] we've relied on the Zuul
playbooks for taking care of caching images using Skopeo for increasing
the reliability of the tests (in the past the instability of the Fedora
registry caused problems). This state is problematic if we want to use
the tests in other environments than the Zuul CI. This moves the caching
from Zuul into the system tests.
Currently, Bats does not officially support suite-wide setup and
teardown functions. The solution I chose was to add two new test files
that are executed before and after all tests. This may complicate the
execution of cherry-picked tests but that is not a very common use case
anyway.
The tests are now to some extent capable of adjusting to the host
environment. This is meant in the sense of: I'm running on RHEL, the
"default image" is UBI; I'm running on Fedora, the "default image" is
fedora-toolbox. This mechanism relies on os-release, which is the same
as what Toolbox itself uses.
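The mechanism can be sketched in Go as follows. The tests are written
for Bats, so this is only an illustration; the function names are made
up, and "ubi" and "fedora-toolbox" are used as short image names
without claiming these are the exact references used by the tests.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// osReleaseID extracts the ID field from the contents of an os-release
// file. It returns an empty string if the field is missing.
func osReleaseID(content string) string {
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, "ID=") {
			// Values may be wrapped in single or double quotes.
			return strings.Trim(strings.TrimPrefix(line, "ID="), `"'`)
		}
	}

	return ""
}

// defaultImage maps a host ID to the image the tests should default to.
// The second return value is false for hosts that are not recognized.
func defaultImage(id string) (string, bool) {
	switch id {
	case "fedora":
		return "fedora-toolbox", true
	case "rhel":
		return "ubi", true
	}

	return "", false
}

func main() {
	id := osReleaseID("NAME=\"Fedora Linux\"\nID=fedora\nVERSION_ID=34\n")
	image, ok := defaultImage(id)
	fmt.Println(id, image, ok)
}
```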
[0] https://github.com/containers/toolbox/pull/517
https://github.com/containers/toolbox/pull/774
The fedora-toolbox:32 image is the first of the images in the renamed
toolbox image repository[0]. With the change we can drop the
pull_image_old() function because it was kept only for the old image.
It seems that a newer version of ShellCheck checks the validity of
variable names (SC2153). This caused a false positive, so I silenced it.
[0] https://github.com/containers/toolbox/pull/615
https://github.com/containers/toolbox/pull/780
The system test refactor[0] replaced the 'run_toolbox' helper function
with 'run toolbox', which is a normal invocation of Toolbox. This makes
it impossible to override the Toolbox used during the tests using an
env var.
[0] https://github.com/containers/toolbox/pull/693
Instead of executing 'podman ps|images' several times in a row, call
them only once and get output with all images/containers. Then, filter
out the JSON using labels and keep images/containers only with matching
labels.
This simplifies the code significantly and cuts down the execution time
of 'toolbox list'. The speed gain is noticeable:
- the system has 5 images and 10 containers
Before patch: ~1.45s
After patch: ~0.85s
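The single-pass filtering can be sketched in Go as follows. The JSON
field names ("Id", "Labels") and the type and function names are
assumptions made for the example; the label is the one mentioned
elsewhere in this log.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry holds the fields of interest from one element of the JSON
// output. The field names are assumed, not taken from the actual code.
type entry struct {
	ID     string            `json:"Id"`
	Labels map[string]string `json:"Labels"`
}

// filterByLabels decodes the complete JSON output once and keeps only
// the entries that carry at least one of the given labels.
func filterByLabels(data []byte, labels []string) ([]entry, error) {
	var all []entry
	if err := json.Unmarshal(data, &all); err != nil {
		return nil, err
	}

	var matched []entry
	for _, e := range all {
		for _, label := range labels {
			if _, ok := e.Labels[label]; ok {
				matched = append(matched, e)
				break
			}
		}
	}

	return matched, nil
}

func main() {
	data := []byte(`[
		{"Id": "aaa", "Labels": {"com.github.containers.toolbox": "true"}},
		{"Id": "bbb", "Labels": null}
	]`)

	matched, err := filterByLabels(data, []string{"com.github.containers.toolbox"})
	fmt.Println(len(matched), err)
}
```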
- Update "See also" sections
Toolbox has not used Buildah for a considerable time now[0]. We can stop
referencing it in the "See also" sections of the documentation.
In some places mention podman command man pages where they are relevant.
- Add section about toolbox images/containers
Toolbox only supports certain OCI images. These should be documented.
Also, document the change of fedora-toolbox image name.
- Add a section about toolbox container setup
Toolbox containers are specifically configured OCI containers. This
should be documented so that users know what they're using.
- Remove redundant part of documentation
The description of what `toolbox init-container` does is already in
toolbox-init-container(1). There's no need to have it in
toolbox-create(1). Instead, replace the text with a hint to visit the
other part of documentation.
- Clarify behaviour of --image option
The fact that Toolbox by default tries to pull from the Fedora
registry[1] should be noted.
- Update synopsis & description of commands
Mention options passed to `podman exec`. Remove redundant paragraph
about container names (is already dealt with in toolbox-create(1)).
There's no need to mention the name of the default container on Fedora
since Toolbox now also supports RHEL.
Mention the default image used on unrecognised systems.
Emphasize the fact that toolboxes are not a fully sandboxed environment.
Update the wording of the description and split it into a few
subsections.
The description of the --monitor-host was inaccurate and while the
option will go away in the future[2], it is currently present and
should be better documented.
[0] https://github.com/containers/toolbox/pull/160
[1] https://registry.fedoraproject.org
[2] https://github.com/containers/toolbox/pull/617
https://github.com/containers/toolbox/pull/512
Since v0.0.91[0] Toolbox throws an error if $PWD is not available in a
toolbox. While this fixes the problem with 'toolbox enter/run' silently
failing to enter/exec in a container, it still requires an action to be
made by the user. I believe it is better to handle such situations more
gracefully by falling back to entering the user's home folder + printing
a warning about doing so.
[0] https://github.com/containers/toolbox/pull/370
Following patches were made:
- Use toolbox for listing containers/images (assumes the existence of
cut and tail)
- Suggest containers for cmd enter
- Don't suggest --container option
- Update global options
- Don't suggest cmd if already specified
The preferred way to provide the name of a container in commands
enter & create is via an argument.
Since the rewrite in Go, Toolbox provides the --log-level & --log-podman
options. These options deprecate the --verbose & --very-verbose options.
With this change, the completion script removes already used global
options from the list, handles cases with different options better and
suggests log levels for the --log-level option.
Toolbox can't be used with multiple commands.
The spinner needs to be explicitly stopped before showing the example
'enter' command for using the container. Otherwise, it gets misprinted:
$ toolbox create foo
Creating container foo: / Created container: foo
Enter with: toolbox enter foo
A comment was added to highlight this, since it might not be obvious at
first sight.
Due to such potential quirks, it might be better to keep the spinner
somewhat tightly encapsulated with the code that necessitates it, which
in this case is 'podman create'. For instance, we already need to be
careful to avoid enclosing the pullImage function with a spinner
because it carries its own.
The code lying between the 'podman pull' and the 'podman create' is so
light that a human user isn't able to discern the absence of a
spinner. So, it seems worth leaning towards ease of understanding and
avoiding potential traps.
This reverts commit 3aaa1d30f1.
https://github.com/containers/toolbox/pull/746
Shell Toolbox was replaced by the Go implementation quite a while
ago. It is kept in the repository but is no longer actively developed.
There is no need to continue checking it with ShellCheck.
https://github.com/containers/toolbox/pull/733
Calling "meson builddir" makes Meson create a build directory called
"builddir". It does not make it build the project. A subsequent call to
"meson compile" or "ninja" needs to be made. This subtle detail causes
a minor (purely visual) discrepancy in the CI output. Fix this for both
unit-test & system-test job definitions.
We now have some Go unit tests[0] and we should use them. With a new
test case added to Meson, the existing CI job called "shellcheck" no
longer has an accurate name, so it has been renamed to "unit-test".
Also, the job is now more important and therefore should also be used
for gating.
[0] https://github.com/containers/toolbox/pull/474
https://github.com/containers/toolbox/pull/730
The init-container command uses several flags. In the future we'd like
to minimize their number. In order to be able to do that without
breaking systems with older versions of Toolbox, the command can't error
out due to usage of unknown flags.
https://github.com/containers/toolbox/pull/724
Too many appends. Instead, put the required sequence into a single array
and append only the variable parts.
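As a rough illustration of that pattern (not the actual code from the commit), the fixed arguments can live in a single slice literal and only the variable parts get appended. buildCreateArgs and the specific arguments chosen here are assumptions made for the example.

```go
package main

import "fmt"

// buildCreateArgs is a hypothetical helper. The fixed sequence of arguments
// is written once as a slice literal; only the parts that vary between
// invocations are appended afterwards.
func buildCreateArgs(container, image string, extra []string) []string {
	args := []string{
		"create",
		"--hostname", "toolbox",
		"--name", container,
	}
	args = append(args, extra...) // variable options, e.g. volumes
	args = append(args, image)    // the image always comes last
	return args
}

func main() {
	extra := []string{"--volume", "/usr:/run/host/usr:ro"}
	fmt.Println(buildCreateArgs("foo", "fedora-toolbox:33", extra))
}
```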
Instead of calling "init-container" with "--verbose", call it rather
with "--log-level debug".
Showing a spinner after much of the work of creating a toolbox is
already done (even though that work is not really time consuming) does
not make much sense.
When a spinner is started successfully, a stop command is deferred.
There's no need to stop it additionally.
A while ago, 'podman build' stopped supporting COPY with relative
symbolic links [1]. Therefore, these image definitions can't be used
without first temporarily removing the symbolic links, which is
annoying.
The downside is that the copies of README.md now have to be separately
updated, which isn't that big of a hassle compared to the problem that
it fixes.
[1] https://github.com/containers/buildah/issues/1952
https://github.com/containers/toolbox/pull/723
When taking ownership of the runtime directory or the initialization
stamp file inside it, it was assumed that the user's GID and UID were
the same. However that might not always be the case.
Note that this commit doesn't use the GID passed from the host to the
toolbox container's entry point to configure the user inside the
container. That is actually more difficult than it sounds. The manual
for useradd(8) says that the group specified by the '--gid' flag must
actually exist.
https://github.com/containers/toolbox/issues/664
Since Fedora 33, `nano` is the default editor[0]. It needs to be
included in the fedora-toolbox image to have the standard Fedora
experience inside the container.
[0] https://fedoraproject.org/wiki/Changes/UseNanoByDefault
CoreOS recently made /boot read-only[0]. This caused an issue with
starting containers because /boot was mounted only with option rslave
but missed the ro option. This caused a permission issue.
This scenario is very similar to the one with /usr on Fedora Silverblue.
The solution is to check the mount options of the path, determine
whether it is mounted with the rw or the ro option, and then add that
option to the mount options of the --volume option in 'podman create'.
Fixes: https://github.com/coreos/fedora-coreos-tracker/issues/734
[0] 1de21ffa98
https://github.com/containers/toolbox/pull/712
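The check described above might look something like the following sketch. mountFlag and volumeArg are hypothetical names, and the /run/host prefix and the rslave option are assumptions taken from the surrounding text; the real implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// mountFlag inspects a comma-separated list of mount options, such as the
// OPTIONS column printed by findmnt, and returns "ro" if the file system is
// mounted read-only and "rw" otherwise.
func mountFlag(options string) string {
	for _, opt := range strings.Split(options, ",") {
		if opt == "ro" {
			return "ro"
		}
	}
	return "rw"
}

// volumeArg builds a value for the --volume option of 'podman create' that
// carries over the read-only or read-write state of the host path.
func volumeArg(hostPath, options string) string {
	return fmt.Sprintf("%s:/run/host%s:%s,rslave", hostPath, hostPath, mountFlag(options))
}

func main() {
	fmt.Println(volumeArg("/boot", "ro,relatime,seclabel"))
}
```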
On Fedora Silverblue 33 the output of 'findmnt --noheadings --output
OPTIONS /usr' is:
ro,relatime,seclabel,ssd,space_cache,subvolid=257,subvol=/root
(Fedora uses btrfs as its default filesystem since version 33[0]). But
when you make the current deployment mutable using 'ostree admin unlock'
the output of the command changes to something like this:
ro,relatime,seclabel,ssd,space_cache,subvolid=257,subvol=/root
rw,relatime,seclabel,lowerdir=usr,upperdir=/var/tmp/ostree-unlock-ovl.JLXHQ0/upper,workdir=/var/tmp/ostree-unlock-ovl.JLXHQ0/work
This causes utils.GetMountOptions to error out preventing a successful
creation of a container with 'toolbox create' when the deployment is
unlocked.
For Toolbox the first line is the more relevant one because even though
/usr is technically writeable, it will cease to be after a reboot. This
is the current behaviour of utils.GetMountOptions. Thanks to that, I
think it's safe to remove the length check that prevents creating a
container when the current deployment is unlocked.
[0] https://fedoraproject.org/wiki/Changes/BtrfsByDefault
https://github.com/containers/toolbox/pull/554
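To illustrate the idea of using only the first line of the findmnt output and not rejecting multi-line output, here is a small sketch. firstMountOptions is a hypothetical function and is not the actual utils.GetMountOptions; the sample output is shortened from the one quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// firstMountOptions returns the first line of the given findmnt output.
// Extra lines, like the overlay that appears after 'ostree admin unlock',
// are ignored instead of being treated as an error.
func firstMountOptions(findmntOutput string) (string, error) {
	lines := strings.Split(strings.TrimSpace(findmntOutput), "\n")
	if lines[0] == "" {
		return "", errors.New("no mount options found")
	}
	return lines[0], nil
}

func main() {
	output := "ro,relatime,seclabel\nrw,relatime,seclabel,lowerdir=usr\n"
	options, err := firstMountOptions(output)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(options)
}
```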
Since commit b27795a03e, each section of the test suite starts
and ends with a clean Podman state. This includes removing all images
from the local containers storage. Therefore, the images get downloaded
multiple times during the course of the test suite.
This commit restores the earlier behaviour where the images would get
downloaded only once, by copying them to separate directories outside
the local containers storage and then restoring them when the tests
are run.
https://github.com/containers/toolbox/pull/517
https://github.com/containers/toolbox/pull/704
The POSIX shell Toolbox was replaced by the Go implementation
quite a long time ago. On several occasions, people have created PRs
that still update it, or ended up using it by mistake when building from
source.
It was not clear that the POSIX shell implementation has been
deprecated and is no longer maintained.
https://github.com/containers/toolbox/pull/698
A lot of issues are about toolbox containers not starting up. In such
cases the output of `podman start --attach` is required to see what is
going on. It would be easier if users provided this information right
when they are filing the issue.
https://github.com/containers/toolbox/pull/699
Without this I get an error:
```
$ meson -Dprofile_dir=/etc/profile.d builddir
The Meson build system
Version: 0.55.3
Source dir: /home/user/toolbox
Build dir: /home/user/toolbox/builddir
Build type: native build
Project name: toolbox
Project version: 0.0.97
meson.build:1:0: ERROR: Unknown compiler(s): ['cc', 'gcc', 'clang', 'pgcc', 'icc']
The follow exceptions were encountered:
Running "cc --version" gave "[Errno 2] No such file or directory: 'cc'"
Running "gcc --version" gave "[Errno 2] No such file or directory: 'gcc'"
Running "clang --version" gave "[Errno 2] No such file or directory: 'clang'"
Running "pgcc --version" gave "[Errno 2] No such file or directory: 'pgcc'"
Running "icc --version" gave "[Errno 2] No such file or directory: 'icc'"
A full log can be found at /home/user/toolbox/builddir/meson-logs/meson-log.txt
```
The bats-support[0] and bats-assert[1] libraries extend the
capabilities of bats[2]. Mainly, bats-assert is very useful for clean
checking of values/outputs/return codes.
Apart from updating the cases to use the libraries, the test cases have
been restructured in a way that they don't depend on each other anymore.
This required major changes in the helpers.bats file.
Overall, the tests are cleaner to read and easier to extend due to the
test cases being independent.
Some slight changes were made to the test cases themselves. These should
not alter their final behaviour.
There will be a follow-up commit that will take care of downloading
the tested images locally and caching them using Skopeo to speed up the
tests and try to resolve the network problems that we experienced in the
past when pulling the images.
[0] https://github.com/bats-core/bats-support
[1] https://github.com/bats-core/bats-assert
[2] https://github.com/bats-core/bats-core
The Go implementation prefers a newer syntax for assigning a custom
name to a toolbox container. The --container option is still supported
for backwards compatibility, but the manuals should show the new
workflow.
https://github.com/containers/toolbox/pull/681
The Go implementation prefers a newer syntax for assigning a custom
name to a toolbox container. The --container option is still supported
for backwards compatibility, but the manuals should show the new
workflow.
https://github.com/containers/toolbox/pull/678
Ever since version 0.0.10, all newly created toolbox containers use a
reflexive entry point [1] and don't need a user-specific customized
image. Older containers that don't use a reflexive entry point were
deprecated in version 0.0.17 [2], and aren't even supported in the Go
implementation.
Therefore, it's time to finally update the manuals to document the
current way of doing things. Since the reflexive entry point is a key
feature of toolbox containers, some text was added to explain why it's
necessary and what it does.
[1] Commit 8b84b5e460
https://github.com/containers/toolbox/pull/160
[2] Commit 9dc5281430
https://github.com/containers/toolbox/pull/336
https://github.com/containers/toolbox/pull/677
While Toolbox's test suite explicitly uses --shell=sh when running
shellcheck(1) on profile.d/toolbox.sh, external tools like Coverity
can't be expected to do the same. So they complain:
Line 1:
[ "$BASH_VERSION" != "" ] || [ "$ZSH_VERSION" != "" ] || return 0
^-- SC2148: Tips depend on target shell and yours is unknown. Add a
shebang or a 'shell' directive.
See: https://github.com/koalaman/shellcheck/wiki/SC2148
https://github.com/containers/toolbox/pull/673
On Arch Linux and Ubuntu hosts, /etc/localtime is an absolute symbolic
link to /usr/share/zoneinfo/SomeTimeZone. So, inside the container,
/run/host/etc/localtime also has /usr/share/zoneinfo/SomeTimeZone as
its target.
https://github.com/containers/toolbox/issues/622
2021-01-12 21:03:10 +01:00
267 changed files with 24912 additions and 7102 deletions
@ -28,7 +28,7 @@ If applicable, add screenshots to help explain your problem.
**Output of `toolbox --version` (v0.0.90+)**
e.g., `toolbox version 0.0.90`
**Toolbox package info (`rpm -q toolbox`)**
**Toolbx package info (`rpm -q toolbox`)**
e.g., `toolbox-0.0.18-2.fc32.noarch`
**Output of `podman version`**
@ -49,4 +49,6 @@ e.g., Fedora Silverblue 32
**Additional context**
Add any other context about the problem here.
When did the issue start occurring? After an update (what packages were updated)?
If the issue is about operating with containers/images (creating, using, deleting,..), share here what image you used. If you're unsure, share here the output of `toolbox list -i` (shows all toolbox images on your system).
If the issue is about operating with containers/images (creating, using, deleting,..), share here what image you used. If you're unsure, share here the output of `toolbox list -i` (shows all Toolbx images on your system).
If you see an error message saying: `Error: invalid entry point PID of container <name-of-container>`, add to the ticket output of command `podman start --attach <name-of-container>`.
or somewhere else. In such cases we'd like you to still report the bug and
help somewhere else (e.g., chat channels). In such cases we'd like you to still report the bug and
share with us any info that could be gathered from those places
## Writing a Bug Report
@ -55,14 +37,14 @@ When writing a bug report:
reproduce it.
- **Describe the behavior you received and what you expected** - Sometimes it
may not be clear what the *right* behavior should look like.
- **Provide info about the version of used software** - What version of Toolbox
- **Provide info about the version of used software** - What version of Toolbx
and Podman do you use?
- **Provide info about your system** - What distribution do you use? Which
desktop environment? Is it a VM or a real machine?
# Making Suggestions
Toolbox is not feature-complete and some of it's functionality is not-there-yet.
Toolbx is not feature-complete and some of it's functionality is not-there-yet.
We are thankful for all suggestions and ideas but be ready that your suggestion
may be rejected.
@ -81,7 +63,7 @@ may be rejected.
When writing a suggestion:
- **Use a clear and descriptive title**
- **Describe the idea** - What parts of Toolbox does it affect? Is it a major
- **Describe the idea** - What parts of Toolbx does it affect? Is it a major
functionality or a minor tweak?
- **Provide step-by-step description of the suggested behavior** so that we
will understand.
@ -90,13 +72,13 @@ When writing a suggestion:
# First Contribution
Toolbox is written in [Go](https://golang.org) and uses [Meson](https://mesonbuild.com)
Toolbx is written in [Go](https://golang.org) and uses [Meson](https://mesonbuild.com)
as it's buildsystem.
Instructions for building Toolbox from source are in our [README](https://github.com/containers/toolbox/blob/master/README.md).
Instructions for building Toolbx from source are in our [README](https://github.com/containers/toolbox/blob/main/README.md).
> You may not need to build the project from source if your contribution is not
> related to the code of Toolbox itself (e.g., documentation, updating CI
> related to the code of Toolbx itself (e.g., documentation, updating CI
> config, playing with image definitions,...).
Here are some ideas of what you could contribute with:
@ -106,18 +88,18 @@ Here are some ideas of what you could contribute with:
- Write tests - Go has [tools](https://golang.org/pkg/testing/) for writing tests.
There are also [some](https://github.com/stretchr/testify) [libraries](https://github.com/onsi/ginkgo)
used for creating even more sophisticated tests.
- Play with custom images - Toolbox currently officially works with Fedora-based
- Play with custom images - Toolbx currently officially works with Fedora-based
images. Ultimately there should be a wide variety of supported distro images.
You can help with testing other people's image definitions or creating your
own. **Beware**, maintainers still don't have a clear idea of how the image
infrustructure should look like.
- Write documentation - Some functions in Toolbox's code don't have comments and
it's not very clear what they do. Toolbox has it's [documentation](https://docs.fedoraproject.org/en-US/fedora-silverblue/toolbox/)
infrastructure should look like.
- Write documentation - Some functions in Toolbx's code don't have comments and
it's not very clear what they do. Toolbx has it's [documentation](https://docs.fedoraproject.org/en-US/fedora-silverblue/toolbox/)
hosted by Fedora. It's not very large and could use some attention.
- Hack on the code and share the result - Seriously! Sometimes random ideas are
the best.
Toolbox currently does not have an infrastructure for translations. You can help
Toolbx currently does not have an infrastructure for translations. You can help
us to set it up!
# Pull Requests
@ -133,9 +115,10 @@ documentation, code comments and much more.
code you're contributing to, consider opening another PR if you want to
implement it yourself or file an issue so that somebody else can pick it up.
- Update documentation to reflect your changes - Manual pages can be found in
directory `doc`. If your changes affect Toolbox's [documentation](https://docs.fedoraproject.org/en-US/fedora-silverblue/toolbox/),
directory `doc`. If your changes affect Toolbx's [documentation](https://docs.fedoraproject.org/en-US/fedora-silverblue/toolbox/),
consider creating a PR there (but to save yourself time, you can do it
after your changes are accepted), too.
- After creating a PR add to the bottom of all your commits a link to the PR. This helps the future maintainers find discussions around the changes.
## After Creating a Pull Request
@ -145,20 +128,19 @@ your efforts! We really appreciate them! Sometimes we may be stuck in different
parts of our lives.
If it takes us a very long time to even respond to your Pull Request, you can
try to @ping us, request a review or try to reach to us on IRC ([Freenode](https://freenode.net/);
#silverblue, #containers, #fedora-devel,..) or [Fedora Forum](https://discussion.fedoraproject.org).
try to @ping us at our communication channels (see section #Communication).
Toolbox has a simple CI (Continuos Integration) setup for running system tests (
can be found under directory `test/system`). Their goal is to check if your
changes don't affect adversely Toolbox's functionality. Sometimes these tests
##
Toolbx has a CI (Continuous Integration) setup for running tests. Their goal is to check if your
changes don't affect adversely Toolbx's functionality. Sometimes these tests
mail fail with a false-positive. If you are not sure about the outcome of the
tests, reach out to the maintainers!
tests, you can try to trigger a new test run by writing a comment with text `recheck` (really just that). If the issue persists, reach out to the maintainers!
Toolbox's CI system is [Zuul](https://zuul-ci.org/) hosted at [softwarefactory](https://softwarefactory-project.io/).
Toolbx's CI system is [Zuul](https://zuul-ci.org/) hosted at [softwarefactory](https://softwarefactory-project.io/). The CI is defined using [Ansible](https://www.ansible.com) playbooks. For more information on writing Zuul jobs see their [documentation](https://zuul-ci.org/docs/zuul/reference/user.html).
# Little Style Guide
Toolbox is written in [Go](https://golang.org) and uses its default set of tools
Toolbx is written in [Go](https://golang.org) and uses its default set of tools
including `gofmt` and `golint`.
Here are some good materials to learn from about the way how to write nice and
@ -175,3 +157,8 @@ If you are using Visual Studio Code, there are [plugins](https://marketplace.vis
that include all this functionality and throw a warning if you're doing
something wrong.
# Communication
The Toolbx team hangs-out at a dedicated Matrix channel: [#toolbx:matrix.org](https://matrix.to/#/#toolbx:matrix.org).
For Fedora-specific discussions you can visit their [wiki](https://docs.fedoraproject.org/en-US/project/join/) to learn about the means to contact the community.
By default, Toolbox uses Go modules and all the required Go packages are
automatically downloaded as part of the build. There's no need to worry about
the Go dependencies, unless the build environment doesn't have network access
or any such peculiarities.
## Goals and Use Cases
### High Level Goals
- Provide a CLI convenience interface to run containers (via `podman`) easily
- Support for Developer and Debugging/Management use cases
- Support for multiple distros
- toolbox package in multiple distros
- toolbox containers for multiple distros
### Non-Goals - Anti Use Cases
- Supporting multiple container runtimes. `toolbox` will use `podman` exclusively
- Adding significant features on top of `podman`
- Significant feature requests should be driven into `podman` upstream
- To run containers that aren't tightly integrated with the host
- i.e. extremely sandboxed containers become specific to the user quickly
### Developer Use Cases
- I’m a developer hacking on source code and building/testing code
- Most cases: user doesn't need root, rootless containers work fine
- Some cases: user needs root for testing
- Desktop Development:
- developers need things like dbus, display, etc, to be forwarded into the toolbox
- Headless Development:
- toolbox works properly in headless environments (no display, etc)
- Need development tools like gdb, strace, etc to work
### Debugging/System management Use Cases
- Inspecting Host Processes/Kernel
- Typically need root access
- Need bpftrace, strace on host processes to work
- Ideally even do things like helping get kernel-debuginfo data for the host kernel
- Managing system services
- systemctl restart foo.service
- journalctl
- Managing updates to the host
- rpm-ostree
- dnf/yum (classic systems)
### Specific environments
- Fedora Silverblue
- Silverblue comes with a subset of packages and discourages host software changes
- Users need a toolbox container as a working environment
- Future: use toolbox container by default when a user opens a shell
- Fedora CoreOS
- Similar to silverblue, but non-graphical and smaller package set
- RHEL CoreOS
- Similar to Fedora CoreOS. Based on RHEL content and the underlying OS for OpenShift
- Need to [use default authfile on pull](https://github.com/coreos/toolbox/pull/58/commits/413f83f7240d3c31121b557bfd55e489fad24489)
- Need to ensure compatibility with the rhel7/support-tools container
- currently not a toolbox image, opportunity for collaboration
- Alignment with `oc debug node/` (OpenShift)
- `oc debug node` opens a shell on a kubernetes node
- Value in having a consistent environment for both `toolbox` in debugging mode and `oc debug node`
## Distro support
By default, Toolbox creates the container using an
[OCI](https://www.opencontainers.org/) image called
`<ID>-toolbox:<VERSION-ID>`, where `<ID>` and `<VERSION-ID>` are taken from the
host's `/usr/lib/os-release`. For example, the default image on a Fedora 33
host would be `fedora-toolbox:33`.
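
As a rough sketch of how such a default name could be derived, the following Go snippet builds it from os-release style data. `defaultImage` is a hypothetical function written for this illustration and is not Toolbox's actual code; it reads the `ID` and `VERSION_ID` keys from a string instead of opening `/usr/lib/os-release`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// defaultImage derives "<ID>-toolbox:<VERSION-ID>" from the contents of an
// os-release file passed in as a string.
func defaultImage(osRelease string) (string, error) {
	values := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(osRelease))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		key, value, found := strings.Cut(line, "=")
		if !found {
			continue
		}
		// Values may be quoted, e.g. VERSION_ID="33".
		values[key] = strings.Trim(value, `"'`)
	}

	id, version := values["ID"], values["VERSION_ID"]
	if id == "" || version == "" {
		return "", fmt.Errorf("ID or VERSION_ID missing from os-release data")
	}
	return fmt.Sprintf("%s-toolbox:%s", id, version), nil
}

func main() {
	image, err := defaultImage("NAME=Fedora\nID=fedora\nVERSION_ID=33\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(image)
}
```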
This default can be overridden by the `--image` option in `toolbox create`,
but operating system distributors should provide an adequately configured
default image to ensure a smooth user experience.
## Image requirements
Toolbox customizes newly created containers in a certain way. This requires
certain tools and paths to be present and have certain characteristics inside
the OCI image.
Tools:
* `getent(1)`
* `id(1)`
* `ln(1)`
* `mkdir(1)`: for hosts where `/home` is a symbolic link to `/var/home`
* `passwd(1)`
* `readlink(1)`
* `rm(1)`
* `rmdir(1)`: for hosts where `/home` is a symbolic link to `/var/home`
* `sleep(1)`
* `test(1)`
* `touch(1)`
* `unlink(1)`
* `useradd(8)`
* `usermod(8)`
Paths:
* `/etc/host.conf`: optional, if present not a bind mount
* `/etc/hosts`: optional, if present not a bind mount
* `/etc/krb5.conf.d`: directory, not a bind mount
* `/etc/localtime`: optional, if present not a bind mount
* `/etc/resolv.conf`: optional, if present not a bind mount
* `/etc/timezone`: optional, if present not a bind mount
Toolbox enables `sudo(8)` access inside containers. The following is necessary
for that to work:
* The image should have `sudo(8)` enabled for users belonging to either the
`sudo` or `wheel` groups, and the group itself should exist. File an
[issue](https://github.com/containers/toolbox/issues/new) if you really need
support for a different group. However, it's preferable to keep this list as
short as possible.
* The image should allow empty passwords for `sudo(8)`. This can be achieved
by either adding the `nullok` option to the `PAM(8)` configuration, or by
add the `NOPASSWD` tag to the `sudoers(5)` configuration.
Since Toolbox only works with OCI images that fulfill certain requirements,
it will refuse images that aren't tagged with
`com.github.containers.toolbox="true"` and
`com.github.debarshiray.toolbox="true"` labels. These labels are meant to be
used by the maintainer of the image to indicate that they have read this
document and tested that the image works with Toolbox. You can use the
following snippet in a Dockerfile for this:
```Dockerfile
LABEL com.github.containers.toolbox="true" \
com.github.debarshiray.toolbox="true"
```
[](https://www.archlinux.org/packages/extra/x86_64/toolbox/)