when /proc is mounted with hidepid=1 a process doesn't see processes
from the outer user namespace. This causes an issue reading the
cmdline from the parent process.
To address it, always read the command line from /proc/self instead of
using /proc/PARENT_PID.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
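A minimal sketch of the approach described above, reading the command
line from /proc/self/cmdline and splitting it on NUL bytes. The function
names read_self_cmdline and split_cmdline are hypothetical and are not
taken from the Podman sources; the buffer handling is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read /proc/self/cmdline into buf (at most
   size - 1 bytes) and NUL-terminate it.  Returns the number of bytes
   read, or -1 on error.  */
ssize_t read_self_cmdline(char *buf, size_t size)
{
    size_t total = 0;
    int fd;

    assert(size >= 1);
    fd = open("/proc/self/cmdline", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    while (total < size - 1) {
        ssize_t n = read(fd, buf + total, size - 1 - total);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;
        total += (size_t)n;
    }
    close(fd);
    buf[total] = '\0';
    return (ssize_t)total;
}

/* Hypothetical helper: split a NUL-separated buffer into an argv-style
   array.  The pointers refer into buf; argv is NULL-terminated and
   holds at most max_args - 1 arguments.  Returns the argument count.  */
size_t split_cmdline(char *buf, size_t len, char **argv, size_t max_args)
{
    size_t argc = 0;
    size_t i = 0;

    assert(max_args >= 1);
    while (i < len && argc + 1 < max_args) {
        argv[argc++] = &buf[i];
        /* Advance to just past the NUL that ends this argument.  */
        while (i < len && buf[i] != '\0')
            i++;
        i++;
    }
    argv[argc] = NULL;
    return argc;
}
```

Because /proc/self always refers to the calling process, this lookup
does not depend on the parent PID being visible under hidepid.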
we need to store the pause process PID file so that it can be re-used
later.
commit e9dc212092 introduced this
regression.
Closes: https://github.com/containers/libpod/issues/5246
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if there are more than FD_SETSIZE open fds passed down to the Podman
process, the initialization code could crash as it attempts to store
them into a fd_set. Use an array of fd_set structs, each of them
holding only FD_SETSIZE file descriptors.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
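A rough sketch of the "array of fd_set structs" idea from the message
above. The struct and function names (fd_bitmap, fd_bitmap_add, ...) are
hypothetical, chosen for illustration; the real code may organize this
differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/select.h>

/* Hypothetical container: one fd_set per block of FD_SETSIZE fds.  */
struct fd_bitmap {
    fd_set *sets;
    size_t n_sets;
};

/* Allocate enough fd_set structs to cover descriptors 0..max_fd.  */
int fd_bitmap_init(struct fd_bitmap *b, int max_fd)
{
    size_t i;

    assert(max_fd >= 0);
    b->n_sets = (size_t)max_fd / FD_SETSIZE + 1;
    b->sets = calloc(b->n_sets, sizeof(fd_set));
    if (b->sets == NULL)
        return -1;
    for (i = 0; i < b->n_sets; i++)
        FD_ZERO(&b->sets[i]);
    return 0;
}

/* Record fd: the block index selects the fd_set, the remainder is the
   bit inside it, so no single fd_set ever sees a value >= FD_SETSIZE.  */
void fd_bitmap_add(struct fd_bitmap *b, int fd)
{
    assert(fd >= 0 && (size_t)fd / FD_SETSIZE < b->n_sets);
    FD_SET(fd % FD_SETSIZE, &b->sets[fd / FD_SETSIZE]);
}

/* Return 1 if fd was recorded, 0 otherwise.  */
int fd_bitmap_contains(struct fd_bitmap *b, int fd)
{
    if (fd < 0 || (size_t)fd / FD_SETSIZE >= b->n_sets)
        return 0;
    return FD_ISSET(fd % FD_SETSIZE, &b->sets[fd / FD_SETSIZE]) != 0;
}

void fd_bitmap_free(struct fd_bitmap *b)
{
    free(b->sets);
    b->sets = NULL;
    b->n_sets = 0;
}
```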
if the pause process doesn't exist and we try to join a conmon
namespace, make sure the process still exists. Otherwise re-create
the user namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
make sure the rootless env variables are also set when we join the
user+mount namespace directly, without creating a new process.
It is required by pkg/unshare in containers/common.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
the renameat2 syscall might be defined in the C library but lacking
support in the kernel.
In such a case, let it fall back to open(O_CREAT)+rename as it does on
systems lacking the definition for renameat2.
Closes: https://github.com/containers/libpod/issues/4570
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the pause process cannot be joined, remove the pause.pid while
keeping a lock on it, and try to recreate it.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If we don't do this, we print WARN level messages that we should
not be printing by default.
Up one WARN message to ERROR so it still shows up by default.
Fixes: #4115
Fixes: #4012
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
detect if the current user namespace doesn't match the configuration
in the /etc/subuid and /etc/subgid files.
If there is a mismatch, raise a warning and suggest that the user
recreate the user namespace with "system migrate", which also restarts
the containers.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
unfortunately rootless won't work without cgo, as most of the
implementation is in C, but at least allow libpod to be built.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
do not attempt to join the rootless namespace if it is running already
with euid == 0.
Closes: https://github.com/containers/libpod/issues/3463
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Currently the pause process blocks all signals that may cause its
termination, including SIGTERM. This behavior hangs init(1) during
system shutdown, until the pause process gets SIGKILLed after some
grace period. To avoid this hang, SIGTERM is excluded from the list of
blocked signals.
Fixes #3440
Signed-off-by: Danila Kiver <danila.kiver@mail.ru>
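One way to express "block everything except SIGTERM" is sketched below.
This is an assumption about the mechanism: it uses a signal mask, while
the pause process may instead ignore a fixed list of signals through
sigaction. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: fill set with every signal, then take SIGTERM
   back out so that it keeps its default (terminating) behavior.  */
int build_pause_sigmask(sigset_t *set)
{
    assert(set != NULL);
    if (sigfillset(set) < 0)
        return -1;
    if (sigdelset(set, SIGTERM) < 0)
        return -1;
    return 0;
}

/* Apply the mask to the calling thread.  SIGKILL and SIGSTOP cannot be
   blocked; the kernel silently ignores them in the mask.  */
int block_pause_signals(void)
{
    sigset_t set;

    if (build_pause_sigmask(&set) < 0)
        return -1;
    return sigprocmask(SIG_BLOCK, &set, NULL);
}
```

With SIGTERM left unblocked, init can terminate the process during
shutdown without waiting for the SIGKILL grace period.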
at least on Fedora 30 it creates the /run/user/UID directory for the
user logged in via ssh.
This needs to be done very early so that every other check when we
create the default configuration file will point to the correct
location.
Closes: https://github.com/containers/libpod/issues/3410
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
To avoid unnecessary warnings and errors in the future I'd like to
propose building all cgo related sources with `-Wall -Werror`. This
commit fixes some warnings which came up in `shm_lock.c`, too.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
The second argument of `execlp` should be of type `char *`, so we need
to add an additional argument there.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
on old kernels the ioctl NS_GET_PARENT is not available.
Handle the error code and immediately return the same fd. It should
be fine now that we use the namespace resolution using the conmon pid,
so the namespace parent resolution is just a safety measure.
Closes: https://github.com/containers/libpod/issues/2968
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we are allowed to use only async-signal-safe functions between a fork
of a multithreaded application and the next execve. Since setenv(3) is
not async-signal-safe, block signals. We are already doing it for
creating a
new namespace.
This is mostly a cleanup since reexec_in_user_namespace_wait is used
only to join existing namespaces when we do not have a pause.pid
file.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
after we read from the pause PID file, NUL terminate the buffer to
avoid reading garbage from the stack.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
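The fix above amounts to the pattern sketched here: read at most
sizeof(buf) - 1 bytes, then write the terminating NUL before parsing.
read_pid_file is a hypothetical name and the 32-byte buffer is an
assumption, not the size used in Podman.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a decimal PID from path.
   Returns the PID, or -1 on any error.  */
long read_pid_file(const char *path)
{
    char buf[32];
    char *end;
    ssize_t n;
    long pid;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    do {
        /* Leave one byte free for the terminating NUL.  */
        n = read(fd, buf, sizeof(buf) - 1);
    } while (n < 0 && errno == EINTR);
    close(fd);
    if (n <= 0)
        return -1;

    /* Without this, strtol could run past the bytes actually read and
       parse whatever happened to be on the stack.  */
    buf[n] = '\0';

    errno = 0;
    pid = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || pid <= 0)
        return -1;
    return pid;
}
```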
move the logic for joining existing namespaces down to the rootless
package. In main_local we still retrieve the list of conmon pid files
and use it from the rootless package.
In addition, create a temporary user namespace for reading these
files, as the unprivileged user might not have enough privileges for
reading the conmon pid file, for example when running with a different
uidmap and root in the container is different than the rootless user.
Closes: https://github.com/containers/libpod/issues/3187
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
add a shortcut for immediately joining the namespace so we don't need
to re-exec Podman.
With the pause process simplification, we can now attempt to join the
namespaces as soon as Podman starts (and before the Go runtime kicks
in), so that we don't need to re-exec and use just one process.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
use a pause process to keep the user and mount namespace alive.
The pause process is created immediately on reload, and all successive
Podman processes will refer to it for joining the user&mount
namespace.
This solves all the race conditions we had on joining the correct
namespaces using the conmon processes.
As a fallback if the join fails for any reason (e.g. the pause process
was killed), then we try to join the running containers as we were
doing before.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we were previously closing as many FDs as were open when we first
started Podman in the range (3-MAX-FD). This would cause issues if
there were empty intervals, as these FDs are later on used by the
Golang runtime. Store exactly what FDs were first open in a fd_set,
so that we can close exactly the FDs that were open at startup.
Closes: https://github.com/containers/libpod/issues/2964
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we were previously proxying all the signals, but doing that for
SIGTSTP prevented the main process from being stopped by the tty.
Closes: https://github.com/containers/libpod/issues/2775
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
simplify the rootless implementation to use a single user namespace
for all the running containers.
This makes the rootless implementation behave more like root Podman,
where each container is created in the host environment.
There are multiple advantages to it: 1) much simpler implementation as
there is only one namespace to join. 2) we can join namespaces owned
by different containers. 3) commands like ps won't be limited in which
containers they can access, as previously we either had access to the
storage from a new namespace or access to /proc when running from the
host. 4) rootless varlink works. 5) there are only two ways to enter
a namespace, either by creating a new one if no containers are
running or joining the existing one from any container.
Containers created by older Podman versions must be restarted.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
in the few places where we care about skipping the storage
initialization, we can simply use the process effective UID, instead
of relying on a global boolean flag.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
from _LIBPOD to _CONTAINERS. The same change was done in buildah
unshare.
This is necessary for podman to detect we are running in a rootless
environment and work properly from a "buildah unshare" session.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
to protect against regressions, we need to add a few gating tasks:
* build with varlink
* build podman-remote
* build podman-remote-darwin
we already have a gating task for building without varlink
Signed-off-by: baude <bbaude@redhat.com>
we were playing safe and did not allow any container to have fewer than
65536 mappings. There are a couple of reasons to change it:
- it prevented libpod from working in an environment where
newuidmap/newgidmap are not available, or not configured.
- it did not allow using different partitions of subuids, where each
user has fewer than 65536 ids available.
Hopefully this change in containers/storage:
https://github.com/containers/storage/pull/303
will make errors clearer if there are not enough IDs for the image
that is being used.
Closes: https://github.com/containers/libpod/issues/1651
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
let the process running as euid != 0 pass down an argument to the
process running in the user namespace. This will be useful for
commands like rm -a that need to join different namespaces, so that
we can re-exec separately for each of them.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we use "podman info" to reconfigure the runtime after a reboot, but we
don't propagate the error message back if something goes wrong.
Closes: https://github.com/containers/libpod/issues/2584
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
from the clone man page:
    On the cris and s390 architectures, the order of the first two
    arguments is reversed:
        long clone(void *child_stack, unsigned long flags,
                   int *ptid, int *ctid,
                   unsigned long newtls);
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1672714
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
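A sketch of how the argument order quoted above can be handled in a raw
clone wrapper. The names raw_clone and clone_and_wait are hypothetical,
and the use of the __s390__ and __CRIS__ compiler macros to detect those
architectures is an assumption about how the check is written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: invoke the raw clone syscall, swapping the
   first two arguments on the architectures that expect the child
   stack first.  */
pid_t raw_clone(unsigned long flags, void *child_stack)
{
#if defined(__s390__) || defined(__CRIS__)
    return (pid_t)syscall(SYS_clone, child_stack, flags);
#else
    return (pid_t)syscall(SYS_clone, flags, child_stack);
#endif
}

/* Usage example: with SIGCHLD as the only flag and a NULL stack the
   call behaves like fork().  Returns the child's exit status, or -1.  */
int clone_and_wait(void)
{
    int status;
    pid_t pid = raw_clone(SIGCHLD, NULL);

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: exit immediately */
    assert(pid > 0);            /* parent from here on */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing the arguments in the wrong order on s390 would hand the flags to
the kernel as a stack pointer, which is why the swap has to happen at
the syscall boundary.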
if any of the mapping tools for setting up the user namespace fail,
then include their output in the error message.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when joining an existing namespace, we were not maintaining the
current working directory, causing commands like export -o to fail
when they weren't referring to absolute paths.
Closes: https://github.com/containers/libpod/issues/2381
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Add the possibility to join directly the user and mount namespace
without looking up the parent of the user namespace.
We need this in order to be able to join the conmon process, as the
mount namespace is kept alive only there.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it was reported on IRC that Podman on Ubuntu failed as
newuidmap/newgidmap were not installed by default.
Raise an error if we are not allowing single mappings (used only by
the test suite) and any of the binaries is missing.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Need to return an error pointing the user in the right direction if
rootless podman fails because there are no /etc/subuid or /etc/subgid
files.
Also fix up man pages to better describe rootless podman.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
the issue is caused by the Go runtime, which interferes with the process
signals, overriding SIGSETXID and SIGCANCEL which are used internally
by glibc. They are used to inform all the threads to update their
stored uid/gid information. This causes a hang on the set*id glibc
wrappers since the handler installed by glibc is never invoked.
Since we are running with only one thread, we don't really need to
update other threads or even the current thread as we are not using
getuid/getgid before the execvp.
Closes: https://github.com/containers/libpod/issues/1625
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Most container images assume there are at least 65536 UIDs/GIDs
available. Raise an error if there are not enough IDs allocated to
the current user.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1520
Approved by: rhatdan
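The check described above can be illustrated by summing the ranges
listed for a user in /etc/subuid-style text ("name:start:count" per
line) and comparing the total with 65536. This is only a sketch: the
function names are hypothetical, and libpod itself performs this check
in Go, not with this code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse an unsigned decimal number at *p, which must lie before end.
   Advances *p past the digits.  Returns 0 on success, -1 on error.  */
static int parse_number(const char **p, const char *end, unsigned long *out)
{
    char *stop;

    if (*p >= end || !isdigit((unsigned char)**p))
        return -1;
    errno = 0;
    *out = strtoul(*p, &stop, 10);
    if (errno != 0)
        return -1;
    *p = stop;
    return 0;
}

/* Hypothetical helper: total number of IDs that "name:start:count"
   lines in text grant to user.  Malformed lines are skipped.  */
unsigned long count_sub_ids(const char *text, const char *user)
{
    unsigned long total = 0;
    const char *line = text;
    size_t ulen;

    assert(text != NULL && user != NULL);
    ulen = strlen(user);
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        const char *end = eol ? eol : line + strlen(line);

        if ((size_t)(end - line) > ulen
            && strncmp(line, user, ulen) == 0 && line[ulen] == ':') {
            const char *p = line + ulen + 1;
            unsigned long start, count;

            if (parse_number(&p, end, &start) == 0 && p < end && *p == ':') {
                p++;
                if (parse_number(&p, end, &count) == 0)
                    total += count;
            }
        }
        line = eol ? eol + 1 : end;
    }
    return total;
}

/* Return 1 if user has at least 65536 IDs available, 0 otherwise.  */
int has_enough_ids(const char *text, const char *user)
{
    return count_sub_ids(text, user) >= 65536UL;
}
```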
change the tests to use chroot to set a numeric UID/GID.
Go syscall.Credential doesn't change the effective UID/GID of the
process.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1372
Approved by: mheon
Manage the case where the main process of the container creates and
joins a new user namespace.
In this case we want to join only the first child in the new
hierarchy, which is the user namespace that was used to create the
container.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1331
Approved by: rhatdan
We cannot re-exec into a new user namespace to gain privileges and
access an existing container, as the new namespace is not the owner of
the existing container.
"unshare" is used to join the user namespace of the target container.
The current implementation assumes that the main process of the
container didn't create a new user namespace.
Since in the setup phase we are not running with euid=0, we must skip
the setup for containers/storage.
Closes: https://github.com/containers/libpod/issues/1329
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1331
Approved by: rhatdan
Lookup the current username by UID if the USER env variable is not
set.
Reported in: https://github.com/projectatomic/libpod/issues/1092
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1217
Approved by: rhatdan
It is required only when directly configuring the user namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1200
Approved by: rhatdan
Most images won't work without multiple uids/gids. Error out
immediately if multiple ids are not available.
The error message when the user is not present in /etc/sub{g,u}id looks
like:
    $ bin/podman run --rm -ti alpine echo hello
    ERRO[0000] No subuid ranges found for user "gscrivano"
Closes: https://github.com/projectatomic/libpod/issues/1087
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1097
Approved by: rhatdan
unshare the mount namespace as well when creating a user namespace so
that we are the owner of the mount namespace and we can mount FUSE
file systems on Linux 4.18. Tested on Fedora Rawhide:
    podman --storage-opt overlay.fuse_program=/usr/bin/fuse-overlayfs run alpine echo hello
    hello
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
use execvp instead of exec so that we keep the PATH environment
variable and the lookup for the "podman" executable works.
Closes: https://github.com/projectatomic/libpod/issues/1070
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1072
Approved by: mheon
The files were split apart by b96be3af (changes to allow for darwin
compilation, 2018-06-20, #1015), but the C import and two functions
left in rootless.go are all Linux-specific as well. This commit moves
all of the pre-b96be3af rootless.go into rootless_linux.go, just
adding the '// +build linux' header (b96be3af also scrambled the + in
that header) and keeping the new GetRootlessUID from a1545fe6
(rootless: add function to retrieve the original UID, 2018-07-05, #1048).
Signed-off-by: W. Trevor King <wking@tremily.us>
Closes: #1034
Approved by: baude
this should represent the last major changes to get darwin to **compile**. again,
the purpose here is to get darwin to compile so that we can eventually implement a
ci task that would protect against regressions for darwin compilation.
i have left the manual darwin compilation largely static still and in fact now only
interject (manually) two build tags to assist with the build. trevor king has great
ideas on how to make this better and i will defer final implementation of those
to him.
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1047
Approved by: rhatdan
After we re-exec in the userNS os.Getuid() returns the new UID (= 0)
which is not what we want to use.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1048
Approved by: mheon
When running podman as a non-root user, always create a userNS and let
the OCI runtime use it.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #936
Approved by: rhatdan