Commit Graph

81 Commits

Author SHA1 Message Date
Giuseppe Scrivano 0b57e77d7c
libpod: support for cgroup namespace
allow a container to run in a new cgroup namespace.

When running in a new cgroup namespace, the current cgroup appears to
be the root, so that there is no way for the container to access
cgroups outside of its own subtree.

By default it uses --cgroup=host to keep the previous behavior.

To create a new namespace, --cgroup=private must be provided.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-18 10:32:25 +02:00
Matthew Heon c91bc31570 Populate inspect with security-opt settings
We can infer no-new-privileges. For now, manually populate
seccomp (can't infer what file we sourced from) and
SELinux/Apparmor (hard to tell if they're enabled or not).

Signed-off-by: Matthew Heon <mheon@redhat.com>
2019-07-17 16:48:38 -04:00
Matthew Heon 1e3e99f2fe Move the HostConfig portion of Inspect inside libpod
When we first began writing Podman, we ran into a major issue
when implementing Inspect. Libpod deliberately does not tie its
internal data structures to Docker, and stores most information
about containers encoded within the OCI spec. However, Podman
must present a CLI compatible with Docker, which means it must
expose all the information in 'docker inspect' - most of which is
not contained in the OCI spec or libpod's Config struct.

Our solution at the time was the create artifact. We JSON'd the
complete CreateConfig (a parsed form of the CLI arguments to
'podman run') and stored it with the container, restoring it when
we needed to run commands that required the extra info.

Over the past month, I've been looking more at Inspect, and
refactored large portions of it into Libpod - generating them
from what we know about the OCI config and libpod's (now much
expanded, versus previously) container configuration. This patch
comes close to completing the process, moving the last part of
inspect into libpod and removing the need for the create
artifact.

This improves libpod's compatibility with non-Podman containers.
We no longer require an arbitrarily-formatted JSON blob to be
present to run inspect.

Fixes: #3500

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-17 16:48:38 -04:00
Giuseppe Scrivano 2f0ed531c7
spec: rework --ulimit host
it seems enough to not specify any ulimit block to maintain the host
limits.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-17 13:01:21 +02:00
OpenShift Merge Robot 9d87945005
Merge pull request #3563 from giuseppe/fix-single-mapping-rootless
spec: fix userns with less than 5 gids
2019-07-12 22:31:37 +02:00
Giuseppe Scrivano d74db186a8
spec: fix userns with less than 5 gids
when the container is running in a user namespace, check if gid=5 is
available, otherwise drop the option gid=5 for /dev/pts.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-12 11:35:03 +02:00
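The check described in d74db186a8 can be illustrated with a minimal sketch. It is not the code from the commit: the `idMap` type and the `gidMapped` and `devptsOptions` helpers are hypothetical names introduced here, and the sketch assumes the gid mappings of the user namespace are already available as a slice.

```go
package main

import (
	"fmt"
	"strings"
)

// idMap is a hypothetical stand-in for one gid mapping entry:
// Size consecutive IDs starting at ContainerID inside the namespace.
type idMap struct {
	ContainerID, HostID, Size int
}

// gidMapped reports whether gid is covered by any of the mappings.
func gidMapped(gid int, maps []idMap) bool {
	for _, m := range maps {
		if gid >= m.ContainerID && gid < m.ContainerID+m.Size {
			return true
		}
	}
	return false
}

// devptsOptions returns opts unchanged when gid 5 is mapped in the
// namespace, and otherwise returns a copy without the "gid=5" option.
func devptsOptions(opts []string, maps []idMap) []string {
	if gidMapped(5, maps) {
		return opts
	}
	kept := make([]string, 0, len(opts))
	for _, o := range opts {
		if o != "gid=5" {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	opts := []string{"nosuid", "noexec", "newinstance", "mode=0620", "gid=5"}
	// A single mapping, as in a rootless setup with only one gid.
	single := []idMap{{ContainerID: 0, HostID: 1000, Size: 1}}
	// A mapping range that includes gid 5.
	full := []idMap{{ContainerID: 0, HostID: 100000, Size: 65536}}
	fmt.Println(strings.Join(devptsOptions(opts, single), ","))
	fmt.Println(strings.Join(devptsOptions(opts, full), ","))
}
```

With a single mapping the option is dropped; with a range that covers gid 5 the options are left untouched.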
OpenShift Merge Robot 2b64f88446
Merge pull request #3491 from giuseppe/rlimit-host
podman: add --ulimit host
2019-07-11 21:35:37 +02:00
baude e053e0e05e first pass of corrections for golangci-lint
Signed-off-by: baude <bbaude@redhat.com>
2019-07-10 15:52:17 -05:00
Giuseppe Scrivano fb88074e68
podman: add --ulimit host
add a simple way to copy ulimit values from the host.

if --ulimit host is used then the current ulimits in place are copied
to the container.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-08 19:22:54 +02:00
Giuseppe Scrivano 5d25a4793d
util: drop IsCgroup2UnifiedMode and use it from cgroups
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-06-26 13:17:04 +02:00
Giuseppe Scrivano 14fe39968f
rootless: force resources to be nil on cgroup v1
force the resources block to be empty instead of having default
values.

Regression introduced by 8e88461511

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-05-20 21:45:05 +02:00
Daniel J Walsh db218e7162
Don't set apparmor if --privileged
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-05-20 09:11:16 -04:00
Giuseppe Scrivano 8e88461511
rootless, spec: allow resources with cgroup v2
We were always raising an error when the rootless user attempted to
set up resources, but this is not the case anymore with cgroup v2.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-05-13 10:48:16 +02:00
Matthew Heon 606cee93bf Move handling of ReadOnlyTmpfs into new mounts code
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-01 10:19:05 -04:00
Matthew Heon 9ee50fe2c7 Migrate to unified volume handling code
Unify handling for the --volume, --mount, --volumes-from, --tmpfs
and --init flags into a single file and set of functions. This
will greatly improve readability and maintainability.

Further, properly handle superseding and conflicting mounts. Our
current patchwork has serious issues when mounts conflict, or
when a mount from --volumes-from or an image volume should be
overwritten by a user volume or named volume.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-01 10:19:05 -04:00
Matthew Heon 4540458a5e Remove non-config fields from CreateConfig
The goal here is to keep only the configuration directly used to
build the container in CreateConfig, and scrub temporary state
and helpers that we need to generate. We'll keep those internally
in MakeContainerConfig.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-01 10:16:23 -04:00
Matthew Heon 869466eb25 Add a new function for converting a CreateConfig
Right now, there are two major API calls necessary to turn a
filled-in CreateConfig into the options and OCI spec necessary to
make a libpod Container. I'm intending on refactoring both of
these extensively to unify a few things, so make a common
frontend to both that will prevent API changes from leaking out
of the package.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-01 10:16:23 -04:00
James Cassell 354d80626a auto pass http_proxy into container
Signed-off-by: James Cassell <code@james.cassell.me>
2019-04-30 17:29:29 -04:00
Daniel J Walsh 3a4be4b66c
Add --read-only-tmpfs options
The --read-only-tmpfs option causes podman to mount tmpfs on /run, /tmp, /var/tmp
if the container is running in read-only mode.

The default is true, so you would need to execute a command like

--read-only --read-only-tmpfs=false to turn off this behaviour.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-04-26 12:29:10 -04:00
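The interaction between the two flags in 3a4be4b66c can be summarized in a small sketch. The function name `readOnlyTmpfsMounts` is hypothetical and not taken from the commit; only the three paths and the default come from the commit message.

```go
package main

import "fmt"

// readOnlyTmpfsMounts returns the paths that would receive a tmpfs mount.
// Per the commit message, tmpfs mounts are added only when the container
// is read-only and --read-only-tmpfs (default true) has not been disabled.
func readOnlyTmpfsMounts(readOnly, readOnlyTmpfs bool) []string {
	if !readOnly || !readOnlyTmpfs {
		return nil
	}
	return []string{"/run", "/tmp", "/var/tmp"}
}

func main() {
	// --read-only (with --read-only-tmpfs left at its default of true)
	fmt.Println(readOnlyTmpfsMounts(true, true))
	// --read-only --read-only-tmpfs=false
	fmt.Println(readOnlyTmpfsMounts(true, false))
}
```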
Giuseppe Scrivano 2c9c40dc82
spec: mask /sys/kernel when bind mounting /sys
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-04-11 15:55:34 +02:00
Giuseppe Scrivano 42eb9eaf29
oci: add /sys/kernel to the masked paths
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-04-11 15:52:36 +02:00
Matthew Heon 1fdc89f616 Drop LocalVolumes from the database
We were never using it. It's actually a potentially quite sizable
field (very expensive to decode an array of structs!). Removing
it should do no harm.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-04-04 12:27:20 -04:00
Matthew Heon 7309e38ddd Add handling for new named volumes code in pkg/spec
Now that named volumes must be explicitly enumerated rather than
passed in with all other volumes, we need to split normal and
named volumes up before passing them into libpod. This PR does
this.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-04-04 12:26:29 -04:00
TomSweeneyRedHat 8f418f1568 Vendor docker/docker, fsouza and more #2
Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>

Vendors in fsouza/docker-client, docker/docker and
a few more related. Of particular note, changes to the TweakCapabilities()
function from docker/docker along with the parse.IDMappingOptions() function
from Buildah. Please pay particular attention to the related changes in
the call from libpod to those functions during the review.

Passes baseline tests.
2019-03-13 11:40:39 -04:00
Daniel J Walsh de12f45688
Fix SELinux on host shared systems in userns
Currently if you turn on --net=host on a rootless container
and have selinux-policy installed in the image, tools running with
SELinux will see that the system is SELinux enabled in rootless mode.

This patch mounts a tmpfs over /sys/fs/selinux blocking this behaviour.

This patch also fixes the fact that if you shared --pid=host we were not
masking over certain /proc paths.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-03-11 15:17:22 -04:00
Giuseppe Scrivano 0f5ae3c5af
podman: fix ro bind mounts if no* opts are on the source
This is a workaround for the runc issue:

https://github.com/opencontainers/runc/issues/1247

If the source of a bind mount has any of nosuid, noexec or nodev, be
sure to propagate them to the bind mount so that when runc tries to
remount using MS_RDONLY, these options are also used.

Closes: https://github.com/containers/libpod/issues/2312

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-02-25 18:56:09 +01:00
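The workaround in 0f5ae3c5af can be sketched roughly as follows. This is an illustration rather than the commit's code: `propagateSourceOptions` is a hypothetical helper, and the flag constants are defined locally to mirror the Linux `statfs` mount flags (`ST_NOSUID`, `ST_NODEV`, `ST_NOEXEC`). In real code the flags would presumably be read from the bind mount source with a `statfs` call; here they are passed in so the logic stays portable.

```go
package main

import "fmt"

// Local copies of the Linux statfs mount flag values
// (ST_NOSUID, ST_NODEV, ST_NOEXEC), defined here for illustration.
const (
	stNoSuid uint64 = 0x2
	stNoDev  uint64 = 0x4
	stNoExec uint64 = 0x8
)

// propagateSourceOptions returns a copy of opts with nosuid, nodev and
// noexec appended for every flag that is set on the source mount and not
// already requested, so that a later read-only remount keeps them.
func propagateSourceOptions(opts []string, sourceFlags uint64) []string {
	wanted := []struct {
		flag uint64
		name string
	}{
		{stNoSuid, "nosuid"},
		{stNoDev, "nodev"},
		{stNoExec, "noexec"},
	}
	have := make(map[string]bool, len(opts))
	for _, o := range opts {
		have[o] = true
	}
	out := append([]string{}, opts...)
	for _, w := range wanted {
		if sourceFlags&w.flag != 0 && !have[w.name] {
			out = append(out, w.name)
		}
	}
	return out
}

func main() {
	// Source filesystem mounted with nosuid and noexec.
	fmt.Println(propagateSourceOptions([]string{"bind", "ro"}, stNoSuid|stNoExec))
}
```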
Giuseppe Scrivano e2970ea62d
rootless: do not override /dev/pts if not needed
when running in rootless mode we were unconditionally overriding
/dev/pts to get rid of gid=5.  This is not needed when multiple gids
are present in the namespace, which is always the case except when
running the test suite with only one mapping.  So change it to check
how many gids are present before overriding the default mount.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-02-06 15:31:20 +01:00
Giuseppe Scrivano 8156f8c694
rootless: fix --pid=host without --privileged
When using --pid=host don't try to cover /proc paths, as they are
coming from the /proc bind mounted from the host.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-01-18 17:12:28 +01:00
Valentin Rothberg edb285d176 apparmor: apply default profile at container initialization
Apply the default AppArmor profile at container initialization to cover
all possible code paths (i.e., podman-{start,run}) before executing the
runtime.  This allows moving most of the logic into pkg/apparmor.

Also make the loading and application of the default AppArmor profile
version-independent by checking for the `libpod-default-` prefix and
over-writing the profile in the run-time spec if needed.

The initial run-time spec of the container differs a bit from the
applied one when having started the container, which results in
displaying a potentially outdated AppArmor profile when inspecting
a container.  To fix that, load the container config from the file
system if present and use it to display the data.

Fixes: #2107
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2019-01-09 22:18:11 +01:00
Daniel J Walsh 43686072d3
Update vendor of runc
Updating the vendor of runc to pull in some fixes that we need.
In order to get this vendor to work, we needed to update the vendor
of docker/docker, which causes all sorts of issues, just to fix
the docker/pkg/sysinfo.  Rather than doing this, I pulled pkg/sysinfo
into libpod and fixed the code locally.

I then switched the use of docker/pkg/sysinfo to libpod/pkg/sysinfo.

I also switched out the docker/pkg/mount to containers/storage/pkg/mount

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-01-04 14:54:59 -05:00
Daniel J Walsh df99522c67
Fixes to handle /dev/shm correctly.
We had two problems with /dev/shm. First, if you mounted the
container read-only, then /dev/shm was also mounted read-only.
This is a bug: a tmpfs directory should be read/write within
a read-only container.

The second problem is that we were ignoring a user-mounted /dev/shm
from the host.

If the user specified

podman run -d -v /dev/shm:/dev/shm ...

We were dropping this mount and still using the internal mount.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-12-24 09:03:53 -05:00
Daniel J Walsh 1ad6f9af15
Allow users to specify a directory for additional devices
Podman will search through the directory and will add any device
nodes that it finds.  If no devices are found we return an error.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-12-21 10:28:14 -05:00
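The directory search described in 1ad6f9af15 might look roughly like the sketch below. `findDevices` and `errNoDevices` are hypothetical names, and the real implementation may differ, for example in how it treats unreadable subdirectories; this version simply aborts on the first walk error.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// errNoDevices is returned when the walk finds no device nodes.
var errNoDevices = errors.New("no device nodes found")

// findDevices walks dir recursively and collects the paths of all
// character and block device nodes. It returns an error if the walk
// fails or if no device node is found.
func findDevices(dir string) ([]string, error) {
	var devices []string
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // abort on the first error in this sketch
		}
		if d.Type()&fs.ModeDevice != 0 {
			devices = append(devices, path)
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	if len(devices) == 0 {
		return nil, fmt.Errorf("%s: %w", dir, errNoDevices)
	}
	return devices, nil
}

func main() {
	dir := "/dev"
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	devices, err := findDevices(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, d := range devices {
		fmt.Println(d)
	}
}
```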
Giuseppe Scrivano 4203df69ac
rootless: add new netmode "slirp4netns"
so that inspect reports the correct network configuration.

Closes: https://github.com/containers/libpod/issues/1453

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-27 21:10:16 +01:00
Daniel J Walsh 57a8c2e5e8
Mount proper cgroup for systemd to manage inside of the container.
We are still requiring oci-systemd-hook to be installed in order to run
systemd within a container.  This patch properly mounts

/sys/fs/cgroup/systemd/libpod_parent/libpod-UUID on /sys/fs/cgroup/systemd inside of container.

Since we need the UUID of the container, we needed to move Systemd to be a config option of the
container.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-10-15 16:19:11 -04:00
OpenShift Merge Robot 3c31e176c7
Merge pull request #1557 from rhatdan/systemd
Don't tmpcopyup on systemd cgroup
2018-10-04 09:54:51 -07:00
Giuseppe Scrivano abde1ef0ef
rootless: raise an error when trying to use cgroups
https://github.com/containers/libpod/issues/1429#issuecomment-424040416

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-10-01 09:33:12 +02:00
Daniel J Walsh 87c255f29f
Don't tmpcopyup on systemd cgroup
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-09-29 06:00:47 +02:00
Daniel J Walsh 52c1365f32 Add --mount option for `create` & `run` command
Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Closes: #1524
Approved by: mheon
2018-09-21 21:33:41 +00:00
Daniel J Walsh fbfcc7842e Add new field to libpod to indicate whether or not to use labelling
Also update some missing libpod.conf options in the man pages.

Fix sort order of security options and add a note about disabling
labeling.

When a process requests a new label, libpod needs to reserve all
labels to make sure that there are no conflicts.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Closes: #1406
Approved by: mheon
2018-09-20 16:01:29 +00:00
Matthew Heon e4770b8289 Small updates to OCI spec generation
Firstly, when adding the privileged catch-all resource device,
first remove the spec's default catch-all resource device.

Second, remove our default rootfs propagation config - Docker
does not set this by default, so I don't think we should either.

Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1491
Approved by: TomSweeneyRedHat
2018-09-17 22:13:42 +00:00
Daniel J Walsh 31294799c4
Don't mount /dev/* if user mounted /dev
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-09-14 13:28:19 -04:00
Matthew Heon e2137cd009 Swap default mount propagation from private to rprivate
This matches Docker behavior more closely and should resolve an
issue we were seeing with /sys mounts

Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1465
Approved by: rhatdan
2018-09-13 21:35:44 +00:00
Matthew Heon ccc4a339cd Respect user-added mounts over default spec mounts
When there was a conflict between a user-added volume and a mount
already in the spec, we previously respected the mount already in
the spec and discarded the user-added mount. This is counter to
expected behavior - if I volume-mount /dev into the container, I
expect it will override the default /dev in the container, and
not be ignored.

Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1419
Approved by: TomSweeneyRedHat
2018-09-07 17:50:58 +00:00
Matthew Heon 2e89e5a204 Ensure we do not overlap mounts in the spec
When user-specified volume mounts overlap with mounts already in
the spec, remove the mount in the spec to ensure there are no
conflicts.

Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1419
Approved by: TomSweeneyRedHat
2018-09-07 17:50:58 +00:00
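The behavior described in ccc4a339cd and 2e89e5a204 (a user-added mount replaces a spec mount with the same destination) can be sketched as below. The `mount` type and `mergeMounts` function are hypothetical, and the sketch only handles exact destination matches after path cleaning. Whether the actual code also treats nested destinations as conflicts is not stated in the commit messages.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mount is a simplified, hypothetical stand-in for an OCI spec mount.
type mount struct {
	Source      string
	Destination string
}

// mergeMounts removes every spec mount whose destination is also the
// destination of a user mount, then appends the user mounts, so the
// user-added mount wins on conflict.
func mergeMounts(specMounts, userMounts []mount) []mount {
	userDest := make(map[string]bool, len(userMounts))
	for _, m := range userMounts {
		userDest[filepath.Clean(m.Destination)] = true
	}
	merged := make([]mount, 0, len(specMounts)+len(userMounts))
	for _, m := range specMounts {
		if !userDest[filepath.Clean(m.Destination)] {
			merged = append(merged, m)
		}
	}
	return append(merged, userMounts...)
}

func main() {
	spec := []mount{
		{Source: "tmpfs", Destination: "/dev"},
		{Source: "proc", Destination: "/proc"},
	}
	user := []mount{{Source: "/dev", Destination: "/dev"}}
	for _, m := range mergeMounts(spec, user) {
		fmt.Printf("%s -> %s\n", m.Source, m.Destination)
	}
}
```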
Daniel J Walsh 27ca091c08
Add proper support for systemd inside of podman
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-08-31 14:42:32 -04:00
Matthew Heon 6a46af571e Set nproc in containers unless explicitly overridden
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1355
Approved by: rhatdan
2018-08-28 17:32:24 +00:00
Matthew Heon f86f5d3e59 Do not set max open files by default if we are rootless
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1355
Approved by: rhatdan
2018-08-28 17:32:24 +00:00
Matthew Heon 9da94c454f Set default max open files in spec
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>

Closes: #1355
Approved by: rhatdan
2018-08-28 17:32:24 +00:00
Giuseppe Scrivano 663ee91eec Fix Mount Propagation
Default mount propagation inside of containers should be private

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Closes: #1305
Approved by: mheon
2018-08-27 13:26:28 +00:00
Giuseppe Scrivano 5f0a1c1ff8 rootless: fix --pid=host
Unfortunately this is not enough to get it working as runc doesn't
allow bind mounting /proc.

Depends on: https://github.com/opencontainers/runc/pull/1832

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #1349
Approved by: rhatdan
2018-08-27 12:49:32 +00:00