add a function to securely mount a subpath inside a volume. We cannot
trust that the subpath is safe since it is beneath a volume that could
be controlled by a separate container. To avoid TOCTOU races between
when we check the subpath and when the OCI runtime mounts it, we open
the subpath, validate it, bind mount to a temporary directory and use
it instead of the original path.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
copy the current mapping into a new user namespace, and run into a
separate user namespace.
Closes: https://github.com/containers/podman/issues/17337
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
As @mheon pointed out in PR #17055[^1], isVirtualConsoleDevice() does
not only matches VT device paths but also devices named like
/dev/tty0abcd.
This causes that non VT device paths named /dev/tty[0-9]+[A-Za-z]+ are
not mounted into privileged container and systemd containers accidentally.
This is an unlikely issue because the Linux kernel does not use device
paths like that.
To make it failproof and prevent issues in unlikely scenarios, change
isVirtualConsoleDevice() to exactly match ^/dev/tty[0-9]+$ paths.
Because it is not possible to match this path exactly with Glob syntax,
the path is now checked with strings.TrimPrefix() and
strconv.ParseUint().
ParseUint uses a bitsize of 16, this is sufficient because the max
number of TTY devices is 512 in Linux 6.1.5.
(Checked via 'git grep -e '#define' --and -e 'TTY_MINORS').
The commit also adds a unit-test for isVirtualConsoleDevice().
Fixes: f4c81b0aa5 ("Only prevent VTs to be mounted inside...")
[^1]: https://github.com/containers/podman/pull/17055#issuecomment-1378904068
Signed-off-by: Fabian Holler <mail@fholler.de>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Drop support for remote use-cases when `.containerignore` or
`.dockerignore` is a symlink pointing to arbitrary location on host.
Signed-off-by: Aditya R <arajan@redhat.com>
Until Podman v4.3, privileged rootfull containers would expose all the
host devices to the container while rootless ones would exclude
`/dev/ptmx` and `/dev/tty*`.
When 5a2405ae1b ("Don't mount /dev/tty* inside privileged containers
running systemd") landed, rootfull containers started excluding all the
`/dev/tty*` devices when the container would be running in systemd
mode, reducing the disparity between rootless and rootfull containers
when running in this mode.
However, this commit regressed some legitimate use cases: exposing
non-virtual-terminal tty devices (modems, arduinos, serial
consoles, ...) to the container, and the regression was addressed in
f4c81b0aa5 ("Only prevent VTs to be mounted inside privileged
systemd containers").
This now calls into question why all tty devices were historically
prevented from being shared to the rootless non-privileged containers.
A look at the podman git history reveals that the code was introduced
as part of ba430bfe5e ("podman v2 remove bloat v2"), and obviously
was copy-pasted from some other code I couldn't find.
In any case, we can easily guess that this check was put for the same
reason 5a2405ae1b was introduced: to prevent breaking the host
environment's consoles. This also means that excluding *all* tty
devices is overbearing, and should instead be limited to just virtual
terminals like we do on the rootfull path.
This is what this commit does, thus making the rootless codepath behave
like the rootfull one when in systemd mode.
This leaves `/dev/ptmx` as the main difference between the two
codepath. Based on the blog post from the then-runC maintainer[1] and
this Red Hat bug[2], I believe that this is intentional and a needed
difference for the rootless path.
Closes: #16925
Suggested-by: Fabian Holler <mail@fholler.de>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
[1]: https://www.cyphar.com/blog/post/20160627-rootless-containers-with-runc
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=501718
While mounting virtual console devices in a systemd container is a
recipe for disaster (I experienced it first hand), mounting serial
console devices, modems, and others should still be done by default
for privileged systemd-based containers.
v2, addressing the review from @fho:
- use backticks in the regular expression to remove backslashes
- pre-compile the regex at the package level
- drop IsVirtualTerminalDevice (not needed for a one-liner)
v3, addressing the review from @fho and @rhatdan:
- re-introduce a private function for matching the device names
- use path.Match rather than a regex not to slow down startup time
Closes#16925.
Fixes: 5a2405ae1b ("Don't mount /dev/tty* inside privileged...")
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
support using keep-id when only one mapping is available to the
rootless user.
When there is only one id available (e.g. there are no additional IDs
set in /etc/subuid and /etc/subgid for the unprivileged user), then
only add the identity mapping $ID -> $ID, leaving unmapped other IDs
in the user namespace.
[NO NEW TESTS NEEDED] it needs a configuration with only one ID
available.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Add a new annotation to allow the user to point to a local tar file
If the annotation is present, import the file's content into the volume
Add a flag to PlayKubeOptions to note remote requests
Fail when trying to import volume content in remote requests
Add the annotation to the documentation
Add an E2E test to the new annotation
Signed-off-by: Ygal Blum <ygal.blum@gmail.com>
Truncate the container and pod ID files instead of throwing an error.
The main motivation is to prevent redundant work when starting systemd
units. Throwing an error when the file already exists is not preventing
races or file corruptions, so let's leave that to the user which in
almost all cases are generated (and tested) systemd units.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Motivated to have a working `make lint` on Fedora 37 (beta).
Most changes come from the new `gofmt` standards.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
According to https://systemd.io/CONTAINER_INTERFACE/, systemd will try take
control over /dev/ttyN if exported, which can cause conflicts with the host's tty
in privileged containers. Thus we will not expose these to privileged containers
in systemd mode, as this is a bad idea according to systemd's maintainers.
Additionally, this commit adds a bats regression test to check that no /dev/ttyN
are present in a privileged container in systemd mode
This fixes https://github.com/containers/podman/issues/15878
Signed-off-by: Dan Čermák <dcermak@suse.com>
`os.ReadDir` was added in Go 1.16 as part of the deprecation of `ioutil`
package. It is a more efficient implementation than `ioutil.ReadDir`.
Reference: https://pkg.go.dev/io/ioutil#ReadDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
add two new options to the keep-id user namespace option:
- uid: allow to override the UID used inside the container.
- gid: allow to override the GID used inside the container.
For example, the following command will map the rootless user (that
has UID=0 inside the rootless user namespace) to the UID=11 inside the
container user namespace:
$ podman run --userns=keep-id:uid=11 --rm -ti fedora cat /proc/self/uid_map
0 1 11
11 0 1
12 12 65525
Closes: https://github.com/containers/podman/issues/15294
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
network and container prune could not handle the label!=... filter. vendor in c/common to fix this and
add some podman level handling to make everything run smoothly
resolves#14182
Signed-off-by: Charlie Doern <cdoern@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
add two new options to the volume create command: copy and nocopy.
When nocopy is specified, the files from the container image are not
copied up to the volume.
Closes: https://github.com/containers/podman/issues/14722
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The nolintlint linter does not deny the use of `//nolint`
Instead it allows us to enforce a common nolint style:
- force that a linter name must be specified
- do not add a space between `//` and `nolint`
- make sure nolint is only used when there is actually a problem
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
If a privileged container is running, stops, and the devices on the host
change, such as a USB device is unplugged, then a container would no
longer start. Previously, the devices from the host were only being
added to the container once: when the container was created. Now, this
happens every time the container starts.
I did this by adding a boolean to the container config that indicates
whether to mount all of the devices or not, which can be set via an option.
During spec generation, if the `MountAllDevices` option is set in the
container config, all host devices are added to the container.
Additionally, a couple of functions from `pkg/specgen/generate/config_linux.go`
were moved into `pkg/util/utils_linux.go` as they were needed in
multiple packages.
Closes#13899
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
Rather than assuming a filesystem path, the API service URI is recorded
in the libpod runtime configuration and then reported as requested.
Note: All schemes other than "unix" are hard-coded to report URI exists.
Fixes#12023
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
golang.org/x/crypto/ssh/terminal is deprecated. The package was moved to
golang.org/x/term. golang.org/x/crypto/ssh/terminal was already just
calling golang.org/x/term itslef so there are no functional changes.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
From a security point of view, it would be nice to be able to map a
rootless usernamespace that does not use your own UID within the
container.
This would add protection against a hostile process escapping the
container and reading content in your homedir.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
`--mount` should allow setting driver specific options using
`volume-opt` when `type=volume` is set.
This ensures parity with docker's `volume-opt`.
Signed-off-by: Aditya R <arajan@redhat.com>
WalkDir should be faster the Walk, since we often do
not need to stat files.
[NO NEW TESTS NEEDED] Existing tests should find errors.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When podman gets an error it prints out "Error: " before
printing the error string. If the error message starts with
error, we end up with
Error: error ...
This PR Removes all of these stutters.
logrus.Error() also prints out that this is an error, so no need for the
error stutter.
[NO NEW TESTS NEEDED]
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
To prevent duplication and potential bugs we should use the same
GetRuntimeDir function that is used in c/common.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Previously, devices with a major/minor number >256 would fail to be
detected. Switch to using bitwise conversion (similar to
sys/sysmacros in C).
[NO NEW TESTS NEEDED]
Signed-off-by: Robb Manes <robbmanes@protonmail.com>