Commit Graph

728 Commits

Author SHA1 Message Date
OpenShift Merge Robot 626dfdb613
Merge pull request #3691 from baude/infoeventlogger
add eventlogger to info
2019-08-05 15:23:05 +02:00
OpenShift Merge Robot e2f38cdaa4
Merge pull request #3310 from gabibeyer/rootlessKata
rootless: Rearrange setup of rootless containers ***CIRRUS: TEST IMAGES***
2019-08-05 14:26:04 +02:00
Daniel J Walsh 66485c80fc
Don't log errors to the screen when XDG_RUNTIME_DIR is not set
Drop errors to debug when trying to setup the runtimetmpdir.  If the tool
can not setup a runtime dir, it will error out with a correct message
no need to put errors on the screen, when the tool actually succeeds.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-08-04 06:50:47 -04:00
baude 63eef5a234 add eventlogger to info
to help with future debugging, we now display the type of event logger
being used inside podman info -> host.

Signed-off-by: baude <bbaude@redhat.com>
2019-08-02 20:05:27 -05:00
OpenShift Merge Robot 5370c53c9c
Merge pull request #3692 from haircommander/play-caps
Add Capability support to play kube
2019-08-02 10:42:46 +02:00
OpenShift Merge Robot e3240daa47
Merge pull request #3551 from mheon/fix_memory_leak
Fix memory leak with exit files
2019-08-02 03:44:43 +02:00
OpenShift Merge Robot 1bbcb2fc56
Merge pull request #3458 from rhatdan/volume
Use buildah/pkg/parse volume parsing rather then internal version
2019-08-01 23:24:03 +02:00
Peter Hunt 834107c82e Add capability functionality to play kube
Take capabilities written in a kube and add to a container
adapt test suite and write cap-add/drop tests

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-08-01 15:47:45 -04:00
Matthew Heon 6bbeda6da5 Pass on events-backend config to cleanup processes
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-08-01 12:37:24 -04:00
OpenShift Merge Robot 6f62dac163
Merge pull request #3341 from rhatdan/exit
Add new exit codes to rm & rmi for running containers & dependencies
2019-08-01 13:37:19 +02:00
Daniel J Walsh e7aca5568a
Use buildah/pkg/parse volume parsing rather then internal version
We share this code with buildah, so we should eliminate the podman
version.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-08-01 03:48:16 -04:00
Daniel J Walsh 5370d9cb76
Add new exit codes to rm & rmi for running containers & dependencies
This enables programs and scripts wrapping the podman command to handle
'podman rm' and 'podman rmi' failures caused by paused or running
containers or due to images having other child images or dependent
containers. These errors are common enough that it makes sense to have
a more machine readable way of detecting them than parsing the standard
error output.

Signed-off-by: Ondrej Zoder <ozoder@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-08-01 03:40:29 -04:00
Matthew Heon 7dd1df4323 Retrieve exit codes for containers via events
As we previously removed our exit code retrieval code to stop a
memory leak, we need a new way of doing this. Fortunately, events
is able to do the job for us.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-31 17:28:42 -04:00
Matthew Heon ebacfbd091 podman: fix memleak caused by renaming and not deleting
the exit file

If the container exit code needs to be retained, it cannot be retained
in tmpfs, because libpod runs in a memcg itself so it can't leave
traces with a daemon-less design.

This wasn't a memleak detectable by kmemleak for example. The kernel
never lost track of the memory and there was no erroneous refcounting
either. The reference count dependencies however are not easy to track
because when a refcount is increased, there's no way to tell who's
still holding the reference. In this case it was a single page of
tmpfs pagecache holding a refcount that kept pinned a whole hierarchy
of dying memcg, slab kmem, cgropups, unrechable kernfs nodes and the
respective dentries and inodes. Such a problem wouldn't happen if the
exit file was stored in a regular filesystem because the pagecache
could be reclaimed in such case under memory pressure. The tmpfs page
can be swapped out, but that's not enough to release the memcg with
CONFIG_MEMCG_SWAP_ENABLED=y.

No amount of more aggressive kernel slab shrinking could have solved
this. Not even assigning slab kmem of dying cgroups to alive cgroup
would fully solve this. The only way to free the memory of a dying
cgroup when a struct page still references it, would be to loop over
all "struct page" in the kernel to find which one is associated with
the dying cgroup which is a O(N) operation (where N is the number of
pages and can reach billions). Linking all the tmpfs pages to the
memcg would cost less during memcg offlining, but it would waste lots
of memory and CPU globally. So this can't be optimized in the kernel.

A cronjob running this command can act as workaround and will allow
all slab cache to be released, not just the single tmpfs pages.

    rm -f /run/libpod/exits/*

This patch solved the memleak with a reproducer, booting with
cgroup.memory=nokmem and with selinux disabled. The reason memcg kmem
and selinux were disabled for testing of this fix, is because kmem
greatly decreases the kernel effectiveness in reusing partial slab
objects. cgroup.memory=nokmem is strongly recommended at least for
workstation usage. selinux needs to be further analyzed because it
causes further slab allocations.

The upstream podman commit used for testing is
1fe2965e4f (v1.4.4).

The upstream kernel commit used for testing is
f16fea666898dbdd7812ce94068c76da3e3fcf1e (v5.2-rc6).

Reported-by: Michele Baldessari <michele@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

<Applied with small tweaks to comments>
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-31 17:28:42 -04:00
Gabi Beyer 80dcd4bebc rootless: Rearrange setup of rootless containers
In order to run Podman with VM-based runtimes unprivileged, the
network must be set up prior to the container creation. Therefore
this commit modifies Podman to run rootless containers by:
  1. create a network namespace
  2. pass the netns persistent mount path to the slirp4netns
     to create the tap inferface
  3. pass the netns path to the OCI spec, so the runtime can
     enter the netns

Closes #2897

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
2019-07-30 23:28:52 +00:00
Daniel J Walsh 141c7a5165
Vendor in buildah 1.9.2
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-07-30 16:48:18 -04:00
OpenShift Merge Robot 0c4dfcfe57
Merge pull request #3639 from giuseppe/user-ns-container
podman: support --userns=ns|container
2019-07-26 15:06:06 +02:00
Giuseppe Scrivano 1d72f651e4
podman: support --userns=ns|container
allow to join the user namespace of another container.

Closes: https://github.com/containers/libpod/issues/3629

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-25 23:04:55 +02:00
Giuseppe Scrivano ba5741e398
pods: do not to join a userns if there is not any
do not attempt to join the user namespace if the pod is running in the
host user namespace.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-25 23:04:54 +02:00
samc24 d6ea4b4139 Improved hooks monitoring
...to work for specific edge cases with a simpler solution.
Re-reads hooks directories after any changes are detected by the watchers.
Added monitoring test for adding a different invalid hook to primary directory.
Some issues with prior code:
- ReadDir would stop when it encounters an invalid hook, rather than registering an error but continuing to read the valid hook.
- Wouldn’t account for Rename and Chmod events.
- After doing a mv of the hooks file instead of rm, it would still think the hooks file is in the directory, but it has been moved to another location.
- If a hook file was renamed, it would register the renamed file as a separate hook and not delete the original, so it would then execute the hook twice - once for the renamed file, and once for the original name which it did not delete.

Signed-off-by: samc24 <sam.chaturvedi24@gmail.com>
2019-07-25 09:52:45 -04:00
OpenShift Merge Robot eae9a009b2
Merge pull request #3624 from haircommander/conmon-exec-with-remote-exec
Add remote exec
2019-07-24 13:16:21 +02:00
Peter Hunt 01a8483a59 refactor to reduce duplicated error parsing
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 16:49:04 -04:00
Peter Hunt 82dce36fb6 remove debug prints
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 16:49:04 -04:00
Matthew Heon 7a85317ccb Re-add int64 casts for ctime
The variables here are 64-bit on 64-bit builds, so the linter
recommends stripping them. Unfortunately, they're 32-bit on
32-bit builds, so stripping them breaks that. Readd with a nolint
to convince it to not break.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2019-07-23 15:43:40 -04:00
Peter Hunt d59f083637 always send generic error in case io fails
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:30:15 -04:00
Peter Hunt 9e69285704 only use stdin if specified
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:30:15 -04:00
Peter Hunt 74ab273e91 buffer errChan
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:30:15 -04:00
Peter Hunt a4041dafae move handleTerminalAttach to generic build
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:30:14 -04:00
Peter Hunt 638b73a046 remove unnecessary conversions
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:29:33 -04:00
Peter Hunt 5bf99a82ff add detach keys support for remote
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:29:33 -04:00
Peter Hunt 479eeac62c move editing of exitCode to runtime
There's no way to get the error if we successfully get an exit code (as it's just printed to stderr instead).
instead of relying on the error to be passed to podman, and edit based on the error code, process it on the varlink side instead

Also move error codes to define package

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:29:33 -04:00
Peter Hunt 35ba77e040 Update e2e tests for remote exec
including changing -l to the container id
and separating a case of setting the env that remote can't handle

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:29:33 -04:00
Peter Hunt 2a474c88c9 Finish up remote exec implementation
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-23 13:29:33 -04:00
baude a793bccae6 golangci-lint cleanup
a PR slipped through without running the new linter.  this cleans things
up for the master branch.

Signed-off-by: baude <bbaude@redhat.com>
2019-07-23 10:13:04 -05:00
OpenShift Merge Robot 26749204d5
Merge pull request #3621 from baude/golangcilint4
golangci-lint phase 4
2019-07-23 10:21:41 +02:00
baude 0c3038d4b5 golangci-lint phase 4
clean up some final linter issues and add a make target for
golangci-lint. in addition, begin running the tests are part of the
gating tasks in cirrus ci.

we cannot fully shift over to the new linter until we fix the image on
the openshift side.  for short term, we will use both

Signed-off-by: baude <bbaude@redhat.com>
2019-07-22 15:44:04 -05:00
Peter Hunt a1a79c08b7 Implement conmon exec
This includes:
	Implement exec -i and fix some typos in description of -i docs
	pass failed runtime status to caller
	Add resize handling for a terminal connection
	Customize exec systemd-cgroup slice
	fix healthcheck
	fix top
	add --detach-keys
	Implement podman-remote exec (jhonce)
	* Cleanup some orphaned code (jhonce)
	adapt remote exec for conmon exec (pehunt)
	Fix healthcheck and exec to match docs
		Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error
		Use these different errors in branching for exit code in healthcheck and exec
	Set conmon to use new api version

Signed-off-by: Jhon Honce <jhonce@redhat.com>

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-22 15:57:23 -04:00
OpenShift Merge Robot 3b52e4d0b5
Merge pull request #3562 from baude/golangcilint3
golangci-lint round #3
2019-07-22 13:13:50 +02:00
baude db826d5d75 golangci-lint round #3
this is the third round of preparing to use the golangci-lint on our
code base.

Signed-off-by: baude <bbaude@redhat.com>
2019-07-21 14:22:39 -05:00
Daniel J Walsh f7f66f6a88
Remove debug message
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-07-20 16:20:48 -04:00
Daniel J Walsh 8ae97b2f57
Add support for listing read/only and read/write images
When removing --all images prune images only attempt to remove read/write images,
ignore read/only images

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-07-19 06:59:49 -04:00
OpenShift Merge Robot deb087d7b1
Merge pull request #3443 from adrianreber/rootfs-changes-migration
Include changes to the container's root file-system in the checkpoint archive
2019-07-19 02:38:26 +02:00
OpenShift Merge Robot 2254a35d3a
Merge pull request #3593 from giuseppe/rootless-privileged-devices
rootless: add host devices with --privileged
2019-07-18 19:50:22 +02:00
OpenShift Merge Robot 1065548f91
Merge pull request #3584 from QiWang19/pssize
podman-remote make --size optional in ps
2019-07-18 18:04:47 +02:00
Giuseppe Scrivano 350ede1eeb
rootless: add rw devices with --privileged
when --privileged is specified, add all the devices that are usable by
the user.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730773

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-18 17:07:50 +02:00
OpenShift Merge Robot ade0d8778f
Merge pull request #3509 from giuseppe/cgroup-namespace
libpod: support for cgroup namespace
2019-07-18 16:14:52 +02:00
Qi Wang c244c347b1 podman-remote make --size optional in ps
Close #3578 Add `size` field to PsOpts in podman remote to receive size as an option.

Signed-off-by: Qi Wang <qiwan@redhat.com>
2019-07-18 09:34:19 -04:00
Sascha Grunert 27ebd7d6f0
Add DefaultContent API to retrieve apparmor profile content
The default apparmor profile is not stored on disk which causes
confusion when debugging the content of the profile. To solve this, we
now add an additional API which returns the profile as byte slice.

Signed-off-by: Sascha Grunert <sgrunert@suse.com>
2019-07-18 13:14:02 +02:00
Giuseppe Scrivano 0b57e77d7c
libpod: support for cgroup namespace
allow a container to run in a new cgroup namespace.

When running in a new cgroup namespace, the current cgroup appears to
be the root, so that there is no way for the container to access
cgroups outside of its own subtree.

By default it uses --cgroup=host to keep the previous behavior.

To create a new namespace, --cgroup=private must be provided.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-07-18 10:32:25 +02:00
OpenShift Merge Robot 7488ed6d9a
Merge pull request #3522 from mheon/nix_the_artifact
Move the HostConfig portion of Inspect inside libpod
2019-07-18 09:23:47 +02:00