There were two problems with preserve fds.
libpod didn't open the fds before passing _OCI*PIPE to conmon. This caused libpod to talk on the preserved fds, rather than the pipes, with conmon talking on the pipes. This caused a hang.
Libpod also didn't convert an int to string correctly, so it would further fail.
Fix these and add a unit test to make sure we don't regress in the future
Note: this test will not pass on crun until crun supports --preserve-fds
Signed-off-by: Peter Hunt <pehunt@redhat.com>
if slirp4netns supports sandboxing, enable it.
It automatically creates a new mount namespace where slirp4netns will
run and have limited access to the host resources.
It needs slirp4netns 0.4.1.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We should not be throwing errors because the operation we wanted
to perform is already done. Now, it is definitely strange that a
container is actually unmounted, but shows as mounted in the DB -
if this reoccurs in a way where we can investigate, it's worth
tearing into.
Fixes#4033
Signed-off-by: Matthew Heon <mheon@redhat.com>
If you are running a rootless container on cgroupV1
you can not pause the container. We need to report the proper error
if this happens.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This change matches what is happening on the podman local side
and should eliminate a race condition.
Also exit commands on the server side should start to return to client.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We have leaked the exit number codess all over the code, this patch
removes the numbers to constants.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
if we register the resize func too early, it attempts to read from the 'ctl' file before it exists. this causes the func to error, and the resize to not go through.
Fix this by registering resize func later for conmon. This, along with a conmon fix, will allow exec to know the terminal size at startup
Signed-off-by: Peter Hunt <pehunt@redhat.com>
when executing a healthcheck, we were not cleaning up after exec's use
of a socket. we now remove the socket file and ignore if for reason it
does not exist.
Fixes: #3962
Signed-off-by: baude <bbaude@redhat.com>
When --cgroupns=private is used we need to mount a new cgroup file
system so that it points to the correct namespace.
Needs: https://github.com/containers/crun/pull/88
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when running in rootless mode and using systemd as cgroup manager
create automatically a systemd scope when the user doesn't own the
current cgroup.
This solves a couple of issues:
on cgroup v2 it is necessary that a process before it can moved to a
different cgroup tree must be in a directory owned by the unprivileged
user. This is not always true, e.g. when creating a session with su
-l.
Closes: https://github.com/containers/libpod/issues/3937
Also, for running systemd in a container it was before necessary to
specify "systemd-run --scope --user podman ...", now this is done
automatically as part of this PR.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This will be used when we allow 'podman ps' to display info on
storage containers instead of Libpod containers.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Lookup was written before volume states merged, but merged after,
and CI didn't catch the obvious failure here. Without a valid
state, we try to unmarshall into a null pointer, and 'volume rm'
is completely broken because of it.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Podman is not the only user of containers/storage, and as such we
cannot rely on our database as the sole source of truth when
pruning images. If images do not show as in use from Podman's
perspective, but subsequently fail to remove because they are
being used by a container, they're probably being used by Buildah
or another c/storage client.
Since the images in question are in use, we shouldn't error on
failure to prune them - we weren't supposed to prune them in the
first place.
Fixes: #3983
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This is mostly used with Systemd, which really wants to manage
CGroups itself when managing containers via unit file.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This change adds the following annotation to every container created by
podman:
```json
"Annotations": {
"io.containers.manager": "libpod"
}
```
Target of this annotaions is to indicate which project in the containers
ecosystem is the major manager of a container when applications share
the same storage paths. This way projects can decide if they want to
manipulate the container or not. For example, since CRI-O and podman are
not using the same container library (libpod), CRI-O can skip podman
containers and provide the end user more useful information.
A corresponding end-to-end test has been adapted as well.
Relates to: https://github.com/cri-o/cri-o/pull/2761
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Previously, we only did this for volumes created at the same time
as the container. However, this is not correct behavior - Docker
does so for all named volumes, even those made with
'podman volume create' and mounted into a container later.
Fixes#3945
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This isn't included in Docker, but seems handy enough.
Use the new API for 'volume rm' and 'volume inspect'.
Fixes#3891
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We want to get podman info to tell us about the version of
the mount program to help us diagnose issues users are having.
Also if in rootless mode and slirp4netns is installed reveal package
info on slirp4netns.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If c/storage paths are explicitly set to "" (the empty string) it
will use compiled-in defaults. However, it won't tell us this via
`storage.GetDefaultStoreOptions()` - we just get the empty string
(which can put our defaults, some of which are relative to
c/storage, in a bad spot).
Hardcode a sane default for cases like this. Furthermore, add
some sanity checks to paths, to ensure we don't use relative
paths for core parts of libpod.
Fixes#3952
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When we fail to remove a container's SHM, that's an error, and we
need to report it as such. This may be part of our lingering
storage woes.
Also, remove MNT_DETACH. It may be another cause of the storage
removal failures.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When volume options and the local volume driver are specified,
the volume is intended to be mounted using the 'mount' command.
Supported options will be used to volume the volume before the
first container using it starts, and unmount the volume after the
last container using it dies.
This should work for any local filesystem, though at present I've
only tested with tmpfs and btrfs.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We need to be able to track the number of times a volume has been
mounted for tmpfs/nfs/etc volumes. As such, we need a mutable
state for volumes. Add one, with the expected update/save methods
in both states.
There is backwards compat here, in that older volumes without a
state will still be accepted.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In upcoming commits, we're going to turn on the backends for
these fields. Volumes with these set will act fundamentally
differently from other volumes. There will probably be validation
required for each field.
Until now, though, we've freely allowed creation of volumes with
these set - they just did nothing. So we have no idea what could
be in the DB with old volumes.
Change the struct tags so we don't have to worry about old,
unvalidated data. We'll start fresh with new volumes.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
use the inotify backend to be notified on the container exit instead
of polling continuosly the runtime. Polling the runtime slowns
significantly down the podman execution time for short lived
processes:
$ time bin/podman run --rm -ti fedora true
real 0m0.324s
user 0m0.088s
sys 0m0.064s
from:
$ time podman run --rm -ti fedora true
real 0m4.199s
user 0m5.339s
sys 0m0.344s
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when cni returns a list of dns servers, we should add them under the
right conditions. the defined conditions are as follows:
- if the user provides dns, it and only it are added.
- if not above and you get a cni name server, it is added and a
forwarding dns instance is created for what was in resolv.conf.
- if not either above, the entries from the host's resolv.conf are used.
Signed-off-by: baude <bbaude@redhat.com>
Signed-off-by: baude <bbaude@redhat.com>
If I mount, say, /usr/bin into my container - I expect to be able
to run the executables in that mount. Unconditionally applying
noexec would be a bad idea.
Before my patches to change mount options and allow exec/dev/suid
being set explicitly, we inferred the mount options from where on
the base system the mount originated, and the options it had
there. Implement the same functionality for the new option
handling.
There's a lot of performance left on the table here, but I don't
know that this is ever going to take enough time to make it worth
optimizing.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Previously, we explicitly set noexec/nosuid/nodev on every mount,
with no ability to disable them. The 'mount' command on Linux
will accept their inverses without complaint, though - 'noexec'
is counteracted by 'exec', 'nosuid' by 'suid', etc. Add support
for passing these options at the command line to disable our
explicit forcing of security options.
This also cleans up mount option handling significantly. We are
still parsing options in more than one place, which isn't good,
but option parsing for bind and tmpfs mounts has been unified.
Fixes: #3819Fixes: #3803
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This will require a 'podman system renumber' after being applied
to get lock numbers for existing volumes.
Add the DB backend code for rewriting volume configs and use it
for updating lock numbers as part of 'system renumber'.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Decompose() returns an error defined in CNI which has been removed
upstream because it had no in-tree (eg in CNI) users.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Support generating systemd unit files for a pod. Podman generates one
unit file for the pod including the PID file for the infra container's
conmon process and one unit file for each container (excluding the infra
container).
Note that this change implies refactorings in the `pkg/systemdgen` API.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add the digestfile option to the push command so the digest can
be stored away in a file when requested by the user. Also have added
a debug statement to show the completion of the push.
Emulates Buildah's https://github.com/containers/buildah/pull/1799/files
Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
Before, if the container was run with a specified user that wasn't root, exec would fail because it always set to root unless respecified by user.
instead, inherit the user from the container start.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
drop the pkg/firewall module and start using the firewall CNI plugin.
It requires an updated package for CNI plugins.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
podman stats does not work in rootless environments with cgroups V1.
Fix error message and document this fact.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This is a breaking change and modifies the resulting image name when
pulling from an directory via `oci:...`.
Without this patch, the image names pulled via a local directory got
processed incorrectly, like this:
```
> podman pull oci:alpine
> podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/oci alpine 4fa153a82426 5 weeks ago 5.85 MB
```
We now use the same approach as in the corresponding [buildah fix][1] to
adapt the behavior for correct `localhost/` prefixing.
[1]: https://github.com/containers/buildah/pull/1800
After applying the patch the same OCI image pull looks like this:
```
> ./bin/podman pull oci:alpine
> podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/alpine latest 4fa153a82426 5 weeks ago 5.85 MB
```
End-to-end tests have been adapted as well to cover the added scenario.
Relates to: https://github.com/containers/buildah/issues/1797
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
add ability to not activate sd_notify when running under varlink as it
causes deadlocks and hangs.
Fixes: #3572
Signed-off-by: baude <bbaude@redhat.com>
in the case where the host has a large journald, iterating the journal
without using a Match is very poor performance. this might be a
temporary fix while we figure out why the systemd library does not seem to
behave properly.
Signed-off-by: baude <bbaude@redhat.com>
Remove GraphDriver.Data.MergedDir from the result of podman inspect if the container not mounte. Because the /var/lib/containers/.../merged directory is no longer created by default; it only exists during the scope of podman mount.
Signed-off-by: Qi Wang <qiwan@redhat.com>
JSON optimizes it out in that case anyways, so don't waste cycles
doing an Itoa (and Atoi on the decode side).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We weren't actually storing this, so we'd lose the exit code for
containers run with --rm or force-removed while running if the
journald backend for events was in use.
Fixes#3795
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Set the string to "libpod/VERSION" so that we don't use the unspecific
default of "Go-http-client/xxx".
Fixes#3788
Signed-off-by: Stefan Becker <chemobejk@gmail.com>
when creating the default libpod.conf file, be sure the default OCI
runtime is cherry picked from the system configuration.
Closes: https://github.com/containers/libpod/issues/3781
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Requirement from https://github.com/containers/libpod/issues/3575#issuecomment-512238393
Added --pull for podman create and pull to match the newly added flag in docker CLI.
`missing`: default value, podman will pull the image if it does not exist in the local.
`always`: podman will always pull the image.
`never`: podman will never pull the image.
Signed-off-by: Qi Wang <qiwan@redhat.com>
After restoring a container with a different name (ID) the ConmonPidFile
was still pointing to the path of the original container.
This means that the last restored container will overwrite the
ConmonPidFile of the original container. It was also not possible to
restore a container with a new name (ID) if the original container was
not running.
The ConmonPidFile is only changed if the ConmonPidFile starts with the
value of RunRoot. This assumes that if RunRoot is part of ConmonPidFile
the user did not specify --conmon-pidfile' during run or create.
Signed-off-by: Adrian Reber <areber@redhat.com>
in the case where we rmi an image that has only one reponame, we print
out an untagged reponame message.
$ sudo podman rmi busybox
Untagged: docker.io/library/busybox:latest
Deleted: db8ee88ad75f6bdc74663f4992a185e2722fa29573abcc1a19186cc5ec09dceb
Signed-off-by: baude <bbaude@redhat.com>
it looks like the core-os systemd library has some issue when using
seektail and add match. this patch works around that shortcoming for
the time being.
Fixes: #3616
Signed-off-by: baude <bbaude@redhat.com>
commit 223fe64dc0 introduced the
regression.
When running on cgroups v1, bind mount only /sys/fs/cgroup/systemd as
rw, as the code did earlier.
Also, simplify the rootless code as it doesn't require any special
handling when using --systemd.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1737554
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
A container restored from an exported checkpoint did not have its
StartedTime set. Which resulted in a status like 'Up 292 years ago'
after the restore.
This just sets the StartedTime to time.Now() if a container is restored
from an exported checkpoint.
Signed-off-by: Adrian Reber <areber@redhat.com>
Old versions of conmon have a bug where they create the exit file before
closing open file descriptors causing a race condition when restarting
containers with open ports since we cannot bind the ports as they're not
yet closed by conmon.
Killing the old conmon PID is ~okay since it forces the FDs of old
conmons to be closed, while it's a NOP for newer versions which should
have exited already.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
we should be looking for the libpod.conf file in /usr/share/containers
and not in /usr/local. packages of podman should drop the default
libpod.conf in /usr/share. the override remains /etc/containers/ as
well.
Fixes: #3702
Signed-off-by: baude <bbaude@redhat.com>
Begin to separate the internal structures and frontend for
inspect on volumes. We can't rely on keeping internal data
structures for external presentation - separating presentation
and internal data format is good practice.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
There are two cases logdriver can be empty, if it wasn't set by libpod, or if the user did --log-driver ""
The latter case is an odd one, and the former is very possible and already handled for LogPath.
Instead of printing an error for an entirely reasonable codepath, let's supress the error
Signed-off-by: Peter Hunt <pehunt@redhat.com>
If a container is restored multiple times from an exported checkpoint
with the help of '--import --name', the restore will fail if during
'podman run' a static container IP was set with '--ip'. The user can
tell the restore process to ignore the static IP with
'--ignore-static-ip'.
Signed-off-by: Adrian Reber <areber@redhat.com>
close https://bugzilla.redhat.com/show_bug.cgi?id=1732280
From the bug Podman search returns 25 results even when limit option `--limit` is larger than 25(maxQueries). They want Podman to return `--limit` results.
This PR fixes the number of output result.
if --limit not set, return MIN(maxQueries, len(res))
if --limit is set, return MIN(option, len(res))
Signed-off-by: Qi Wang <qiwan@redhat.com>
Somehow this managed to slip through the cracks, but this is
definitely something inspect should print.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The `$PATH` environment variable will now used as fallback if no valid
runtime or conmon path matches. The debug logs has been updated to state
the used executable.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
when running on a cgroups v2 system, do not bind mount
the named hierarchy /sys/fs/cgroup/systemd as it doesn't exist
anymore. Instead bind mount the entire /sys/fs/cgroup.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When forcibly removing a container, we are initiating an explicit
stop of the container, which is not reflected in 'podman events'.
Swap to using our standard 'stop()' function instead of a custom
one for force-remove, and move the event into the internal stop
function (so internal calls also register it).
This does add one more database save() to `podman remove`. This
should not be a terribly serious performance hit, and does have
the desirable side effect of making things generally safer.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We need this specifically for tests, but others may find it
useful if they don't explicitly need events and don't want the
performance implications of using them.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
the exit file
If the container exit code needs to be retained, it cannot be retained
in tmpfs, because libpod runs in a memcg itself so it can't leave
traces with a daemon-less design.
This wasn't a memleak detectable by kmemleak for example. The kernel
never lost track of the memory and there was no erroneous refcounting
either. The reference count dependencies however are not easy to track
because when a refcount is increased, there's no way to tell who's
still holding the reference. In this case it was a single page of
tmpfs pagecache holding a refcount that kept pinned a whole hierarchy
of dying memcg, slab kmem, cgropups, unrechable kernfs nodes and the
respective dentries and inodes. Such a problem wouldn't happen if the
exit file was stored in a regular filesystem because the pagecache
could be reclaimed in such case under memory pressure. The tmpfs page
can be swapped out, but that's not enough to release the memcg with
CONFIG_MEMCG_SWAP_ENABLED=y.
No amount of more aggressive kernel slab shrinking could have solved
this. Not even assigning slab kmem of dying cgroups to alive cgroup
would fully solve this. The only way to free the memory of a dying
cgroup when a struct page still references it, would be to loop over
all "struct page" in the kernel to find which one is associated with
the dying cgroup which is a O(N) operation (where N is the number of
pages and can reach billions). Linking all the tmpfs pages to the
memcg would cost less during memcg offlining, but it would waste lots
of memory and CPU globally. So this can't be optimized in the kernel.
A cronjob running this command can act as workaround and will allow
all slab cache to be released, not just the single tmpfs pages.
rm -f /run/libpod/exits/*
This patch solved the memleak with a reproducer, booting with
cgroup.memory=nokmem and with selinux disabled. The reason memcg kmem
and selinux were disabled for testing of this fix, is because kmem
greatly decreases the kernel effectiveness in reusing partial slab
objects. cgroup.memory=nokmem is strongly recommended at least for
workstation usage. selinux needs to be further analyzed because it
causes further slab allocations.
The upstream podman commit used for testing is
1fe2965e4f (v1.4.4).
The upstream kernel commit used for testing is
f16fea666898dbdd7812ce94068c76da3e3fcf1e (v5.2-rc6).
Reported-by: Michele Baldessari <michele@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
<Applied with small tweaks to comments>
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In order to run Podman with VM-based runtimes unprivileged, the
network must be set up prior to the container creation. Therefore
this commit modifies Podman to run rootless containers by:
1. create a network namespace
2. pass the netns persistent mount path to the slirp4netns
to create the tap inferface
3. pass the netns path to the OCI spec, so the runtime can
enter the netns
Closes#2897
Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
NixOS links the current system state to `/run/current-system`, so we
have to add these paths to the configuration files as well to work out
of the box.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Touch up a number of formating issues for XDG_RUNTIME_DIRS in a number
of man pages. Make use of the XDG_CONFIG_HOME environment variable
in a rootless environment if available, or set it if not.
Also added a number of links to the Rootless Podman config page and
added the location of the auth.json files to that doc.
Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
We should not be fuzzy matching on volume names. Docker doesn't
do it, and it doesn't make much sense. Everything requires exact
matches for names - only IDs allow partial matches.
Fixes#3635
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When a command (like `ps`) requests no store be created, but also
requires a refresh be performed, we have to ignore its request
and initialize the store anyways to prevent segfaults. This work
was done in #3532, but that missed one thing - initializing a
storage service. Without the storage service, Podman will still
segfault. Fix that oversight here.
Fixes#3625
Signed-off-by: Matthew Heon <mheon@redhat.com>
There's no way to get the error if we successfully get an exit code (as it's just printed to stderr instead).
instead of relying on the error to be passed to podman, and edit based on the error code, process it on the varlink side instead
Also move error codes to define package
Signed-off-by: Peter Hunt <pehunt@redhat.com>
clean up some final linter issues and add a make target for
golangci-lint. in addition, begin running the tests are part of the
gating tasks in cirrus ci.
we cannot fully shift over to the new linter until we fix the image on
the openshift side. for short term, we will use both
Signed-off-by: baude <bbaude@redhat.com>
This includes:
Implement exec -i and fix some typos in description of -i docs
pass failed runtime status to caller
Add resize handling for a terminal connection
Customize exec systemd-cgroup slice
fix healthcheck
fix top
add --detach-keys
Implement podman-remote exec (jhonce)
* Cleanup some orphaned code (jhonce)
adapt remote exec for conmon exec (pehunt)
Fix healthcheck and exec to match docs
Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error
Use these different errors in branching for exit code in healthcheck and exec
Set conmon to use new api version
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Currently the pull message on failure is UGLY. This patch removes a lot of the noice
when pulling an image from multiple registries to make the user experience better.
Our current messages are way too verbose and need to be dampened down. Still has
verbose mode if you turn on log-level=debug.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When removing --all images prune images only attempt to remove read/write images,
ignore read/only images
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We have another patch running to do the same for exit files, with
a much more in-depth explanation of why it's necessary. Suffice
to say that persistent files in tmpfs tied to container CGroups
lead to significant memory allocations that last for the lifetime
of the file.
Based on a patch by Andrea Arcangeli (aarcange@redhat.com).
Signed-off-by: Matthew Heon <mheon@redhat.com>
We can infer no-new-privileges. For now, manually populate
seccomp (can't infer what file we sourced from) and
SELinux/Apparmor (hard to tell if they're enabled or not).
Signed-off-by: Matthew Heon <mheon@redhat.com>
Our previous method (just read the PID that we spawned) doesn't
work - Conmon double-forks to daemonize, so we end up with a PID
pointing to the first process, which dies almost immediately.
Reading from the PID file gets us the real PID.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When we first began writing Podman, we ran into a major issue
when implementing Inspect. Libpod deliberately does not tie its
internal data structures to Docker, and stores most information
about containers encoded within the OCI spec. However, Podman
must present a CLI compatible with Docker, which means it must
expose all the information in 'docker inspect' - most of which is
not contained in the OCI spec or libpod's Config struct.
Our solution at the time was the create artifact. We JSON'd the
complete CreateConfig (a parsed form of the CLI arguments to
'podman run') and stored it with the container, restoring it when
we needed to run commands that required the extra info.
Over the past month, I've been looking more at Inspect, and
refactored large portions of it into Libpod - generating them
from what we know about the OCI config and libpod's (now much
expanded, versus previously) container configuration. This path
comes close to completing the process, moving the last part of
inspect into libpod and removing the need for the create
artifact.
This improves libpod's compatability with non-Podman containers.
We no longer require an arbitrarily-formatted JSON blob to be
present to run inspect.
Fixes: #3500
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
An image with "HEALTHCHECK CMD ['']" is valid but as there is no command
defined the healthcheck will fail. Reject such a configuration.
Fixes#3507
Signed-off-by: Stefan Becker <chemobejk@gmail.com>
- remove duplicate check, already called in HealthCheck()
- reject zero-length command list and empty command string as errorneous
- support all Docker command list keywords: NONE, CMD or CMD-SHELL
- use Docker default "/bin/sh -c" for CMD-SHELL
Fixes#3507
Signed-off-by: Stefan Becker <chemobejk@gmail.com>
Using pod removal worked, but container removal was missing the
most critical step - the actual removal. Must have been
accidentally removed during a refactor.
Fixes#3556
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The newly added functionality to include the container's root
file-system changes into the checkpoint archive can now be explicitly
disabled. Either during checkpoint or during restore.
If a container changes a lot of files during its runtime it might be
more effective to migrated the root file-system changes in some other
way and to not needlessly increase the size of the checkpoint archive.
If a checkpoint archive does not contain the root file-system changes
information it will automatically be skipped. If the root file-system
changes are part of the checkpoint archive it is also possible to tell
Podman to ignore these changes.
Signed-off-by: Adrian Reber <areber@redhat.com>
One of the last limitations when migrating a container using Podman's
'podman container checkpoint --export=/path/to/archive.tar.gz' was
that it was necessary to manually handle changes to the container's root
file-system. The recommendation was to mount everything as --tmpfs where
the root file-system was changed.
This extends the checkpoint export functionality to also include all
changes to the root file-system in the checkpoint archive. The
checkpoint archive now includes a tarstream of the result from 'podman
diff'. This tarstream will be applied to the restored container before
restoring the container.
With this any container can now be migrated, even it there are changes
to the root file-system.
There was some discussion before implementing this to base the root
file-system migration on 'podman commit', but it seemed wrong to do
a 'podman commit' before the migration as that would change the parent
layer the restored container is referencing. Probably not really a
problem, but it would have meant that a migrated container will always
reference another storage top layer than it used to reference during
initial creation.
Signed-off-by: Adrian Reber <areber@redhat.com>
The newly added function GetDiffTarStream() mirrors the GetDiff()
function. It tries to get the correct layer ID from getLayerID()
and it filters out containerMounts from the tarstream. Thus the
behavior is the same as GetDiff(), but it returns a tarstream.
This also adds the function ApplyDiffTarStream() to apply the tarstream
generated by GetDiffTarStream().
These functions are targeted to support container migration with
root file-system changes.
Signed-off-by: Adrian Reber <areber@redhat.com>
During 'podman container checkpoint' the finished time was not set. This
resulted in a strange container status after checkpointing:
Exited (0) 292 years ago
During checkpointing FinishedTime is now set to time.now().
Signed-off-by: Adrian Reber <areber@redhat.com>
now that dbus authentication works fine from a user namespace (systemd
241 works fine), we can enable rootless healthchecks.
It uses "systemd-run --user" for creating the healthcheck timer and
communicates with the user instance of systemd listening at
$XDG_RUNTIME_DIR/systemd/private.
Closes: https://github.com/containers/libpod/issues/3523
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
an internal change in libpod will soon required the ability to lookup
the last container event using the continer name or id and the type of
event. this pr is in preperation for that need.
Signed-off-by: baude <bbaude@redhat.com>
The conmon pidfile is crucial for podman-generated systemd units, because
these units rely on it for determining service's main process ID.
With this change, every container has ConmonPidFile set (at least to
default value).
Signed-off-by: Danila Kiver <danila.kiver@mail.ru>
When specifying a podman command with a partial ID, container and pod
commands matches respectively only containers or pods IDs in the BoltDB.
Fixes: #3487
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
This is needed for dual stack IPv6 support within CRI-O. Because the API
changed within OCICNI, we have to adapt the internal linux networking as
well.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Specifically, we were needlessly doing a double lookup to find which config mounts were user volumes. Improve this by refactoring a bit of code from inspect
Signed-off-by: Peter Hunt <pehunt@redhat.com>
If we don't do this, we can leak locks on every failure, and that
is very, very bad - can render Podman unusable without a 'system
renumber' being run.
Signed-off-by: Matthew Heon <mheon@redhat.com>
some podman commands do not require the use of a container/image store.
in those cases, it is more effecient to not open the store, because that
results in having to also close the store which can be costly when the
system is under heavy write I/O loads.
Signed-off-by: baude <bbaude@redhat.com>
at least on Fedora 30 it creates the /run/user/UID directory for the
user logged in via ssh.
This needs to be done very early so that every other check when we
create the default configuration file will point to the correct
location.
Closes: https://github.com/containers/libpod/issues/3410
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we had a regression where the rootless user tried to use the global
configuration file. We should not try to use the global configuration
when running in rootless but only cherry-pick some settings from there
when creating the file for the first time.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Conmon has moved out of cri-o and into it's own dedicated repository.
This commit updates configuration and definitions which referenced
the old cri-o based paths.
Signed-off-by: Chris Evich <cevich@redhat.com>
This fixes some of our handling of images which have no layers, i.e.,
those whose TopLayer is set to an empty value.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
When a container is attached upon start, the WaitGroup counter may
never be decremented if an error is raised before start, causing
the caller to hang.
Synchronize with the start & attach goroutine using a channel, to be
able to detect failures before start.
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
the compilation demands of having libpod in main is a burden for the
remote client compilations. to combat this, we should move the use of
libpod structs, vars, constants, and functions into the adapter code
where it will only be compiled by the local client.
this should result in cleaner code organization and smaller binaries. it
should also help if we ever need to compile the remote client on
non-Linux operating systems natively (not cross-compiled).
Signed-off-by: baude <bbaude@redhat.com>
Restoring a container from a checkpoint archive creates a complete
new root file-system. This file-system needs to have the correct SELinux
label or most things in that restored container will fail. Running
processes are not as problematic as newly exec()'d process (internally
or via 'podman exec').
This patch tells the storage setup which label should be used to mount
the container's root file-system.
Signed-off-by: Adrian Reber <areber@redhat.com>
Instead of only tracking that a container is restored from
a checkpoint locally in runtime_ctr.go this adds a flag to the
Container structure.
Upcoming patches to correctly label the root file-system mount-point
need also to know if a container is restored from a checkpoint.
Instead of passing a parameter around a lot of functions, this
adds that information to the Container structure.
Signed-off-by: Adrian Reber <areber@redhat.com>
This likely broke when we made containers able to detect that
they shared a network namespace and grab ports from the
dependency container - prior to that, we could grab ports without
concern for conflict, only the infra container had them. Now, all
containers in a pod will return the same ports, so we have to
work around this.
Fixes#3408
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We weren't properly populating the container's OCI Runtime in
Batch(), causing segfaults on attempting to access it. Add a test
to make sure we actually catch cases like this in the future.
Fixes#3411
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In Go templating, we use the names of fields, not the JSON struct
tags. To ensure templating works are expected, we need the two to
match.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Go templating is incapable of dealing with pointers, so when we
moved to Docker compatible mounts JSON, we broke it. The solution
is to not use pointers in this part of inspect.
Signed-off-by: Matthew Heon <mheon@redhat.com>
To avoid unnecessary warnings and errors in the future I'd like to
propose building all cgo related sources with `-Wall -Werror`. This
commit fixes some warnings which came up in `shm_lock.c`, too.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Use name of the default runtime, instead of the OCIRuntime config
option, which may include a full path.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Instead, use a less expensive read-only transaction to see if the
DB is ready for use (it probably is), and only fire the expensive
RW transaction if absolutely necessary.
Signed-off-by: Matthew Heon <mheon@redhat.com>