Cut is a cleaner & more performant api relative to SplitN(_, _, 2) added in go 1.18
Previously applied this refactoring to buildah:
https://github.com/containers/buildah/pull/5239
Signed-off-by: Philip Dubé <philip@peerdb.io>
We shouldn't hardcode `~/.local` - we should use the internal
config helper APIs which honor the XDG_DATA_DIR etc. standard
environment variables.
Signed-off-by: Colin Walters <walters@verbum.org>
Changes SSH key behavior such that there is a single persisted key for all
machines across all providers. If there is no key that is located at
`.local/share/containers/podman/machine/` then it is created. The keys are
not deleted when the last machine on the host is removed.
The main motivation for this change is it leads to fewer files created on the
host as a result of vm configuration. Having `n` machines on your system doesn't
result in `2n` machine-related files in `.ssh` on your system anymore.
As a result of ssh keys being persisted by default, the `--save-keys` flag
on `podman machine rm` will no longer be supported.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Add a new `no-dereference` mount option supported by crun 1.11+ to
re-create/copy a symlink if it's the source of a mount. By default the
kernel will resolve the symlink on the host and mount the target.
As reported in #20098, there are use cases where the symlink structure
must be preserved by all means.
Fixes: #20098
Fixes: issues.redhat.com/browse/RUN-1935
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When we walk the /dev tree we need to lookup all device paths. Now in
order to get the major and minor version we have to actually stat each
device. This can again fail of course. There is at least a race between
the readdir at stat call so it must ignore ENOENT errors to avoid
the race condition as this is not a user problem. Second, we should
also not return other errors and just log them instead, returning an
error means stopping the walk and returning early which means inspect
fails with an error which would be bad.
Also there seems to be cases were ENOENT will be returned all the time,
e.g. when a device is forcefully removed. In the reported bug this is
triggered with iSCSI devices.
Because the caller does already lookup the device from the created map
it reports a warning there if the device is missing on the host so it
is not a problem to ignore a error during lookup here.
[NO NEW TESTS NEEDED] Requires special device setup to trigger
consistentlyand we cannot do that in CI.
Fixes https://issues.redhat.com/browse/RHEL-11158
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Motivation
===========
This feature aims to make --uidmap and --gidmap easier to use, especially in rootless podman setups.
(I will focus here on the --gidmap option, although the same applies for --uidmap.)
In rootless podman, the user namespace mapping happens in two steps, through an intermediate mapping.
See https://docs.podman.io/en/latest/markdown/podman-run.1.html#uidmap-container-uid-from-uid-amount
for further detail, here is a summary:
First the user GID is mapped to 0 (root), and all subordinate GIDs (defined at /etc/subgid, and
usually >100000) are mapped starting at 1.
One way to customize the mapping is through the `--gidmap` option, that maps that intermediate mapping
to the final mapping that will be seen by the container.
As an example, let's say we have as main GID the group 1000, and we also belong to the additional GID 2000,
that we want to make accessible inside the container.
We first ask the sysadmin to subordinate the group to us, by adding "$user:2000:1" to /etc/subgid.
Then we need to use --gidmap to specify that we want to map GID 2000 into some GID inside the container.
And here is the first trouble:
Since the --gidmap option operates on the intermediate mapping, we first need to figure out where has
podman placed our GID 2000 in that intermediate mapping using:
podman unshare cat /proc/self/gid_map
Then, we may see that GID 2000 was mapped to intermediate GID 5. So our --gidmap option should include:
--gidmap 20000:5:1
This intermediate mapping may change in the future if further groups are subordinated to us (or we stop
having its subordination), so we are forced to verify the mapping with
`podman unshare cat /proc/self/gid_map` every time, and parse it if we want to script it.
**The first usability improvement** we agreed on #18333 is to be able to use:
--gidmap 20000:@2000:1
so podman does this lookup in the parent user namespace for us.
But this is only part of the problem. We must specify a **full** gidmap and not only what we want:
--gidmap 0:0:5 --gidmap 5:6:15000 --gidmap 20000:5:1
This is becoming complicated. We had to break the gidmap at 5, because the intermediate 5 had to
be mapped to another value (20000), and then we had to keep mapping all other subordinate ids... up to
close to the maximum number of subordinate ids that we have (or some reasonable value). This is hard
to explain to someone who does not understand how the mappings work internally.
To simplify this, **the second usability improvement** is to be able to use:
--gidmap "+20000:@2000:1"
where the plus flag (`+`) states that the given mapping should extend any previous/default mapping,
overriding any previous conflicting assignment.
Podman will set that mapping and fill the rest of mapped gids with all other subordinated gids, leading
to the same (or an equivalent) full gidmap that we were specifying before.
One final usability improvement related to this is the following:
By default, when podman gets a --gidmap argument but not a --uidmap argument, it copies the mapping.
This is convenient in many scenarios, since usually subordinated uids and gids are assigned in chunks
simultaneously, and the subordinated IDs in /etc/subuid and /etc/subgid for a given user match.
For scenarios with additional subordinated GIDs, this map copying is annoying, since it forces the user
to provide a --uidmap, to prevent the copy from being made. This means, that when the user wants:
--gidmap 0:0:5 --gidmap 5:6:15000 --gidmap 20000:5:1
The user has to include a uidmap as well:
--gidmap 0:0:5 --gidmap 5:6:15000 --gidmap 20000:5:1 --uidmap 0:0:65000
making everything even harder to understand without proper context.
For this reason, besides the "+" flag, we introduce the "u" and "g" flags. Those flags applied to a
mapping tell podman that the mapping should only apply to users or groups, and ignored otherwise.
Therefore we can use:
--gidmap "+g20000:@2000:1"
So the mapping only applies to groups and is ignored for uidmaps. If no "u" nor "g" flag is assigned
podman assumes the mapping applies to both users and groups as before, so we preserve backwards compatibility.
Co-authored-by: Tom Sweeney <tsweeney@redhat.com>
Signed-off-by: Sergio Oller <sergioller@gmail.com>
Users want to mount a tmpfs file system with secrets, and make
sure the secret is never saved into swap. They can do this either
by using a ramfs tmpfs mount or by passing `noswap` option to
a tmpfs mount.
Fixes: https://github.com/containers/podman/issues/19659
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add new --farm flag to podman system connection add so that
a user can add a new connection to a farm immediately.
Update system connection remove such that when a connection is
removed, the connection is also removed from any farms that have it.
Add docs and tests for these changes.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Compat api for containers/stop should take -1 value
Add support for `podman stop --time -1`
Add support for `podman restart --time -1`
Add support for `podman rm --time -1`
Add support for `podman pod stop --time -1`
Add support for `podman pod rm --time -1`
Add support for `podman volume rm --time -1`
Add support for `podman network rm --time -1`
Fixes: https://github.com/containers/podman/issues/17542
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
GetKeepIDMapping never read the gid (as it intended) but reused the uid.
Most likely a typo that never bothered anybody as uid and gid usually
match.
Signed-off-by: Simon Brakhane <simon@brakhane.net>
Remove code duplication and use the new FilterID function from
c/common. Also remove the duplicated ComputeUntilTimestamp in podman use
the one from c/common as well.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Fixes: https://github.com/containers/podman/issues/18239
[NO NEW TESTS NEEDED]
@test "podman build -f test" in test/system/070-build.bats
Will test this. This was passing when run on a local system since
the remote end was using the clients path to read the Containerfile
The issue is it would not work in a podman machine since the
Containerfile would/should be a different path.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
In rootFsSize(), instead of calculating the size of the diff for every
layer of the container's base image, ask the storage library for the sum
of the values it recorded when it first wrote those layers.
In a similar fashion, teach rwSize() to use the library's
ContainerSize() method instead of trying to roll its own.
Replace calls to pkg/util.SizeOfPath() with calls to
github.com/containers/storage/pkg/directory.Size(), which does the same
thing.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Add --restart flag to pod create to allow users to set the
restart policy for the pod, which applies to all the containers
in the pod. This reuses the restart policy already there for
containers and has the same restart policy options.
Add "never" to the restart policy options to match k8s syntax.
It is a synonym for "no" and does the exact same thing where the
containers are not restarted once exited.
Only the containers that have exited will be restarted based on the
restart policy, running containers will not be restarted when an exited
container is restarted in the same pod (same as is done in k8s).
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Currently --tmpdir changes the location of the pause.pid file. this
causes issues because the c code in pkg/rootless does not know about
that. I tried to fix this[1] by fixing the c code to not use the
shortcut. While this fix worked it will result in many pause processes
leaking in the integrration tests.
Commit ab88632 added this behavior but following the disccusion it was
never the intention that we end up having more than one pause process.
The issues that was trying to fix was caused by somthing else AFAICT,
the main problem seems to be that the pause.pid file parent directory
may not be created when we try to create the pid file so it failed with
ENOENT. This patch fixes it by creating this directory always and revert
the change to no longer depend on the tmpdir value.
With this commit we now always use XDG_RUNTIME_DIR/libpod/tmp/pause.pid
for all podman processes. This allows the c shortcut to work reliably
and should therefore improve perfomance over my other approach.
A system test is added to ensure we see the right behavior and that
podman system migrate actually stops the pause process. Thanks to Ed
Santiago for the improved test to make it work for both `catatonit` and
`podman pause`.
This should fix the issues with namespace missmatches that we can see in
CI as flakes.
[1] https://github.com/containers/podman/pull/18057Fixes#18057
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
add a function to securely mount a subpath inside a volume. We cannot
trust that the subpath is safe since it is beneath a volume that could
be controlled by a separate container. To avoid TOCTOU races between
when we check the subpath and when the OCI runtime mounts it, we open
the subpath, validate it, bind mount to a temporary directory and use
it instead of the original path.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
copy the current mapping into a new user namespace, and run into a
separate user namespace.
Closes: https://github.com/containers/podman/issues/17337
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
As @mheon pointed out in PR #17055[^1], isVirtualConsoleDevice() does
not only matches VT device paths but also devices named like
/dev/tty0abcd.
This causes that non VT device paths named /dev/tty[0-9]+[A-Za-z]+ are
not mounted into privileged container and systemd containers accidentally.
This is an unlikely issue because the Linux kernel does not use device
paths like that.
To make it failproof and prevent issues in unlikely scenarios, change
isVirtualConsoleDevice() to exactly match ^/dev/tty[0-9]+$ paths.
Because it is not possible to match this path exactly with Glob syntax,
the path is now checked with strings.TrimPrefix() and
strconv.ParseUint().
ParseUint uses a bitsize of 16, this is sufficient because the max
number of TTY devices is 512 in Linux 6.1.5.
(Checked via 'git grep -e '#define' --and -e 'TTY_MINORS').
The commit also adds a unit-test for isVirtualConsoleDevice().
Fixes: f4c81b0aa5 ("Only prevent VTs to be mounted inside...")
[^1]: https://github.com/containers/podman/pull/17055#issuecomment-1378904068
Signed-off-by: Fabian Holler <mail@fholler.de>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Drop support for remote use-cases when `.containerignore` or
`.dockerignore` is a symlink pointing to arbitrary location on host.
Signed-off-by: Aditya R <arajan@redhat.com>
Until Podman v4.3, privileged rootfull containers would expose all the
host devices to the container while rootless ones would exclude
`/dev/ptmx` and `/dev/tty*`.
When 5a2405ae1b ("Don't mount /dev/tty* inside privileged containers
running systemd") landed, rootfull containers started excluding all the
`/dev/tty*` devices when the container would be running in systemd
mode, reducing the disparity between rootless and rootfull containers
when running in this mode.
However, this commit regressed some legitimate use cases: exposing
non-virtual-terminal tty devices (modems, arduinos, serial
consoles, ...) to the container, and the regression was addressed in
f4c81b0aa5 ("Only prevent VTs to be mounted inside privileged
systemd containers").
This now calls into question why all tty devices were historically
prevented from being shared to the rootless non-privileged containers.
A look at the podman git history reveals that the code was introduced
as part of ba430bfe5e ("podman v2 remove bloat v2"), and obviously
was copy-pasted from some other code I couldn't find.
In any case, we can easily guess that this check was put for the same
reason 5a2405ae1b was introduced: to prevent breaking the host
environment's consoles. This also means that excluding *all* tty
devices is overbearing, and should instead be limited to just virtual
terminals like we do on the rootfull path.
This is what this commit does, thus making the rootless codepath behave
like the rootfull one when in systemd mode.
This leaves `/dev/ptmx` as the main difference between the two
codepath. Based on the blog post from the then-runC maintainer[1] and
this Red Hat bug[2], I believe that this is intentional and a needed
difference for the rootless path.
Closes: #16925
Suggested-by: Fabian Holler <mail@fholler.de>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
[1]: https://www.cyphar.com/blog/post/20160627-rootless-containers-with-runc
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=501718
While mounting virtual console devices in a systemd container is a
recipe for disaster (I experienced it first hand), mounting serial
console devices, modems, and others should still be done by default
for privileged systemd-based containers.
v2, addressing the review from @fho:
- use backticks in the regular expression to remove backslashes
- pre-compile the regex at the package level
- drop IsVirtualTerminalDevice (not needed for a one-liner)
v3, addressing the review from @fho and @rhatdan:
- re-introduce a private function for matching the device names
- use path.Match rather than a regex not to slow down startup time
Closes#16925.
Fixes: 5a2405ae1b ("Don't mount /dev/tty* inside privileged...")
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
support using keep-id when only one mapping is available to the
rootless user.
When there is only one id available (e.g. there are no additional IDs
set in /etc/subuid and /etc/subgid for the unprivileged user), then
only add the identity mapping $ID -> $ID, leaving unmapped other IDs
in the user namespace.
[NO NEW TESTS NEEDED] it needs a configuration with only one ID
available.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Add a new annotation to allow the user to point to a local tar file
If the annotation is present, import the file's content into the volume
Add a flag to PlayKubeOptions to note remote requests
Fail when trying to import volume content in remote requests
Add the annotation to the documentation
Add an E2E test to the new annotation
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
Truncate the container and pod ID files instead of throwing an error.
The main motivation is to prevent redundant work when starting systemd
units. Throwing an error when the file already exists is not preventing
races or file corruptions, so let's leave that to the user which in
almost all cases are generated (and tested) systemd units.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>