simplify code using auto cleanup functions
[NO NEW TESTS NEEDED] it is a refactoring of existing code
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
since we now support reading additional IDs with libsubid, clarify
that the /etc/subuid and /etc/subgid files are honored only when
shadow-utils is configured to use them.
[NO TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The go logic already prevents podman from joining the userns for machine
commands but the c shortcut code did not.
[NO TESTS NEEDED]
Fixes#11731
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Dealing with os.Signal channels seems more like an art than science
since signals may get lost. os.Notify doesn't block on an unbuffered
channel, so users are expected to know what they're doing or hope for
the best.
In the recent past, I've seen a number of flakes and BZs on non-amd64
architectures where I was under the impression that signals may got
lost, for instance, during stop and exec.
[NO TESTS NEEDED] since this is art.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
avoid a zombie process if on the first launch Podman creates a long
living process, such as "podman system service -t 0".
The `r` variable was overriden thus causing the waitpid to fail and
not clean up the intermediate process.
Closes: https://github.com/containers/podman/issues/10575
[NO TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the root mount '/' is not mounted as MS_SHARED, print a
warning, otherwise new mounts that are created in the host won't be
propagated to the rootless mount namespace.
Closes: https://github.com/containers/podman/issues/10946
[NO TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
commit ab88632835 changed the path for
the pause.pid file but didn't update the same path in the C code.
This prevented Podman to take the fast path when the userns is already
created and to join it without re-execing itself.
Fix the path in the C code as well so we can join the rootless
user+mount namespace without having to re-exec Podman.
[NO TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
sort.Search returns the smallest index, so provide the available IDs
in decreasing order.
It fixes an issue when splitting the current mappings over multiple
available IDs.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Users coming e.g. from Docker do not always read the manual and
expect podman to not require sudo or uidmap, for them the default
message is not very helpful:
Error: Cannot connect to the Podman socket, make sure there is a Podman REST API service running.:
cannot find newuidmap: exec: "newuidmap": executable file not found in $PATH
Adding a bit more context to this would help to nudge them into the
right direction and tell them what to look for in the documentation:
command required for rootless mode with multiple IDs: exec: "newuidmap": executable file not found in $PATH
Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk>
[NO TESTS NEEDED]
when creating a user namespace, attempt to create it first by copying
the current mappings and then fallback to the other methods:
1) use newidmap tools and ...
2) create a user namespace with a single user mapped.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
So rootless setup could use this condition in parent and child, child
podman should adjust LISTEN_PID to its self PID.
Add system test for systemd socket activation
Signed-off-by: pendulm <lonependulm@gmail.com>
since we already have an exported function that does the check,
refactor the code to use it instead of duplicating the logic.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We missed bumping the go module, so let's do it now :)
* Automated go code with github.com/sirkon/go-imports-rename
* Manually via `vgrep podman/v2` the rest
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
writing to the id map fails when an extent overlaps multiple mappings
in the parent user namespace:
$ cat /proc/self/uid_map
0 1000 1
1 100000 65536
$ unshare -U sleep 100 &
[1] 1029703
$ printf "0 0 100\n" | tee /proc/$!/uid_map
0 0 100
tee: /proc/1029703/uid_map: Operation not permitted
This limitation is particularly annoying when working with rootless
containers as each container runs in the rootless user namespace, so a
command like:
$ podman run --uidmap 0:0:2 --rm fedora echo hi
Error: writing file `/proc/664087/gid_map`: Operation not permitted: OCI permission denied
would fail since the specified mapping overlaps the first
mapping (where the user id is mapped to root) and the second extent
with the additional IDs available.
Detect such cases and automatically split the specified mapping with
the equivalent of:
$ podman run --uidmap 0:0:1 --uidmap 1:1:1 --rm fedora echo hi
hi
A fix has already been proposed for the kernel[1], but even if it
accepted it will take time until it is available in a released kernel,
so fix it also in pkg/rootless.
[1] https://lkml.kernel.org/lkml/20201203150252.1229077-1-gscrivan@redhat.com/
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
In case os.Open[File], os.Mkdir[All], ioutil.ReadFile and the like
fails, the error message already contains the file name and the
operation that fails, so there is no need to wrap the error with
something like "open %s failed".
While at it
- replace a few places with os.Open, ioutil.ReadAll with
ioutil.ReadFile.
- replace errors.Wrapf with errors.Wrap for cases where there
are no %-style arguments.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
when newidmap is not installed the code would hit the
reexec_in_user_namespace_wait code and wait for the child process to
be terminated. The child process is blocked waiting on the w pipe.
So make sure to unblock the child process first and then clean it up.
Closes: https://github.com/containers/podman/issues/7776
Signed-off-by: Giuseppe Scrivano <giuseppe@scrivano.org>
Currently, we're not cleanup up after ourselves when fileOutput is nil.
This patch fixes that.
Signed-off-by: Jonathan Dieter <jonathan.dieter@spearline.com>
I'm not sure if this is an OS-specific issue, but on CentOS 8, if `path`
doesn't exist, this hangs while waiting to read from this socket, even
though the socket is closed by the `reexec_in_user_namespace`. Switching
to a pipe fixes the problem, and pipes shouldn't be an issue since this is
Linux-specific code.
Signed-off-by: Jonathan Dieter <jonathan.dieter@spearline.com>
when setting up the user namespace do not ignore errors from
newuidmap/newgidmap if there are mappings configured.
The single user mapping is a fallback only when there are not mappings
specified for the user.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
There are many use cases where you want to just mount an image
without creating a container on it. For example you might want
to just examine the content in an image after you pull it for
security analysys. Or you might want to just use the executables
on the image without running it in a container.
The image is mounted readonly since we do not want people changing
images.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We should default to the user name unmount rather then the internal
name of umount.
Also User namespace was not being handled correctly. We want to inform
the user that if they do a mount when in rootless mode that they have
to be first in the podman unshare state.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules. While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.
Move the go module to new major version and change all imports to
`github.com/containers/libpod/v2`. The renaming of the imports
was done via `gomove` [1].
[1] https://github.com/KSubedi/gomove
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
commit 788fdc685b introduced a race
where the target process dies before the child process opens the
namespace files. Move the open before the fork so if it fails the
parent process can attempt to join a different container instead of
failing.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
since we join directly the conmon user namespace, there is no need to
look up its parent user namespace, as we can safely assume it is the
init namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when /proc is mounted with hidepid=1 a process doesn't see processes
from the outer user namespace. This causes an issue reading the
cmdline from the parent process.
To address it, always read the command line from /proc/self instead of
using /proc/PARENT_PID.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we need to store the pause process PID file so that it can be re-used
later.
commit e9dc212092 introduced this
regression.
Closes: https://github.com/containers/libpod/issues/5246
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if there are more than FD_SETSIZE open fds passed down to the Podman
process, the initialization code could crash as it attempts to store
them into a fd_set. Use an array of fd_set structs, each of them
holding only FD_SETSIZE file descriptors.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the pause process doesn't exist and we try to join a conmon
namespace, make sure the process still exists. Otherwise re-create
the user namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
make sure the rootless env variables are set also when we are joining
directly the user+mount namespace without creating a new process.
It is required by pkg/unshare in containers/common.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
the renameat2 syscall might be defined in the C library but lacking
support in the kernel.
In such case, let it fallback to open(O_CREAT)+rename as it does on
systems lacking the definition for renameat2.
Closes: https://github.com/containers/libpod/issues/4570
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the pause process cannot be joined, remove the pause.pid while
keeping a lock on it, and try to recreate it.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If we don't do this, we print WARN level messages that we should
not be printing by default.
Up one WARN message to ERROR so it still shows up by default.
Fixes: #4115Fixes: #4012
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
detect if the current user namespace doesn't match the configuration
in the /etc/subuid and /etc/subgid files.
If there is a mismatch, raise a warning and suggest the user to
recreate the user namespace with "system migrate", that also restarts
the containers.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
unfortunately rootless won't work without cgo, as most of the
implementation is in C, but at least allow to build libpod.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
do not attempt to join the rootless namespace if it is running already
with euid == 0.
Closes: https://github.com/containers/libpod/issues/3463
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Currently pause process blocks all signals which may cause its
termination, including SIGTERM. This behavior hangs init(1) during
system shutdown, until pause process gets SIGKILLed after some grace
period. To avoid this hanging, SIGTERM is excluded from list of blocked
signals.
Fixes#3440
Signed-off-by: Danila Kiver <danila.kiver@mail.ru>
at least on Fedora 30 it creates the /run/user/UID directory for the
user logged in via ssh.
This needs to be done very early so that every other check when we
create the default configuration file will point to the correct
location.
Closes: https://github.com/containers/libpod/issues/3410
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
To avoid unnecessary warnings and errors in the future I'd like to
propose building all cgo related sources with `-Wall -Werror`. This
commit fixes some warnings which came up in `shm_lock.c`, too.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
The second argument of `execlp` should be of type `char *`, so we need
to add an additional argument there.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
on old kernels the ioctl NS_GET_PARENT is not available.
Handle the error code and immediately return the same fd. It should
be fine now that we use the namespace resolution using the conmon pid,
so the namespace parent resolution is just a safety measure.
Closes: https://github.com/containers/libpod/issues/2968
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we are allowed to use only signal safe functions between a fork of a
multithreaded application and the next execve. Since setenv(3) is not
signal safe, block signals. We are already doing it for creating a
new namespace.
This is mostly a cleanup since reexec_in_user_namespace_wait is used
only only to join existing namespaces when we have not a pause.pid
file.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
after we read from the pause PID file, NUL terminate the buffer to
avoid reading garbage from the stack.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
move the logic for joining existing namespaces down to the rootless
package. In main_local we still retrieve the list of conmon pid files
and use it from the rootless package.
In addition, create a temporary user namespace for reading these
files, as the unprivileged user might not have enough privileges for
reading the conmon pid file, for example when running with a different
uidmap and root in the container is different than the rootless user.
Closes: https://github.com/containers/libpod/issues/3187
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
add a shortcut for joining immediately the namespace so we don't need
to re-exec Podman.
With the pause process simplificaton, we can now attempt to join the
namespaces as soon as Podman starts (and before the Go runtime kicks
in), so that we don't need to re-exec and use just one process.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
use a pause process to keep the user and mount namespace alive.
The pause process is created immediately on reload, and all successive
Podman processes will refer to it for joining the user&mount
namespace.
This solves all the race conditions we had on joining the correct
namespaces using the conmon processes.
As a fallback if the join fails for any reason (e.g. the pause process
was killed), then we try to join the running containers as we were
doing before.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we were previously closing as many FDs as they were open when we first
started Podman in the range (3-MAX-FD). This would cause issues if
there were empty intervals, as these FDs are later on used by the
Golang runtime. Store exactly what FDs were first open in a fd_set,
so that we can close exactly the FDs that were open at startup.
Closes: https://github.com/containers/libpod/issues/2964
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we were previously proxying all the signals, but doing that for
SIGTSTP prevented the main process to be stopped by the tty.
Closes: https://github.com/containers/libpod/issues/2775
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>