Support generating systemd unit files for a pod. Podman generates one
unit file for the pod including the PID file for the infra container's
conmon process and one unit file for each container (excluding the infra
container).
Note that this change implies refactorings in the `pkg/systemdgen` API.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add the digestfile option to the push command so the digest can
be stored away in a file when requested by the user. Also have added
a debug statement to show the completion of the push.
Emulates Buildah's https://github.com/containers/buildah/pull/1799/files
Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
Before, if the container was run with a specified user that wasn't root, exec would fail because it always set to root unless respecified by user.
instead, inherit the user from the container start.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
drop the pkg/firewall module and start using the firewall CNI plugin.
It requires an updated package for CNI plugins.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
podman stats does not work in rootless environments with cgroups V1.
Fix error message and document this fact.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This is a breaking change and modifies the resulting image name when
pulling from an directory via `oci:...`.
Without this patch, the image names pulled via a local directory got
processed incorrectly, like this:
```
> podman pull oci:alpine
> podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/oci alpine 4fa153a82426 5 weeks ago 5.85 MB
```
We now use the same approach as in the corresponding [buildah fix][1] to
adapt the behavior for correct `localhost/` prefixing.
[1]: https://github.com/containers/buildah/pull/1800
After applying the patch the same OCI image pull looks like this:
```
> ./bin/podman pull oci:alpine
> podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/alpine latest 4fa153a82426 5 weeks ago 5.85 MB
```
End-to-end tests have been adapted as well to cover the added scenario.
Relates to: https://github.com/containers/buildah/issues/1797
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
add ability to not activate sd_notify when running under varlink as it
causes deadlocks and hangs.
Fixes: #3572
Signed-off-by: baude <bbaude@redhat.com>
in the case where the host has a large journald, iterating the journal
without using a Match is very poor performance. this might be a
temporary fix while we figure out why the systemd library does not seem to
behave properly.
Signed-off-by: baude <bbaude@redhat.com>
Remove GraphDriver.Data.MergedDir from the result of podman inspect if the container not mounte. Because the /var/lib/containers/.../merged directory is no longer created by default; it only exists during the scope of podman mount.
Signed-off-by: Qi Wang <qiwan@redhat.com>
JSON optimizes it out in that case anyways, so don't waste cycles
doing an Itoa (and Atoi on the decode side).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We weren't actually storing this, so we'd lose the exit code for
containers run with --rm or force-removed while running if the
journald backend for events was in use.
Fixes#3795
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Set the string to "libpod/VERSION" so that we don't use the unspecific
default of "Go-http-client/xxx".
Fixes#3788
Signed-off-by: Stefan Becker <chemobejk@gmail.com>
when creating the default libpod.conf file, be sure the default OCI
runtime is cherry picked from the system configuration.
Closes: https://github.com/containers/libpod/issues/3781
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Requirement from https://github.com/containers/libpod/issues/3575#issuecomment-512238393
Added --pull for podman create and pull to match the newly added flag in docker CLI.
`missing`: default value, podman will pull the image if it does not exist in the local.
`always`: podman will always pull the image.
`never`: podman will never pull the image.
Signed-off-by: Qi Wang <qiwan@redhat.com>
After restoring a container with a different name (ID) the ConmonPidFile
was still pointing to the path of the original container.
This means that the last restored container will overwrite the
ConmonPidFile of the original container. It was also not possible to
restore a container with a new name (ID) if the original container was
not running.
The ConmonPidFile is only changed if the ConmonPidFile starts with the
value of RunRoot. This assumes that if RunRoot is part of ConmonPidFile
the user did not specify --conmon-pidfile' during run or create.
Signed-off-by: Adrian Reber <areber@redhat.com>
in the case where we rmi an image that has only one reponame, we print
out an untagged reponame message.
$ sudo podman rmi busybox
Untagged: docker.io/library/busybox:latest
Deleted: db8ee88ad75f6bdc74663f4992a185e2722fa29573abcc1a19186cc5ec09dceb
Signed-off-by: baude <bbaude@redhat.com>
it looks like the core-os systemd library has some issue when using
seektail and add match. this patch works around that shortcoming for
the time being.
Fixes: #3616
Signed-off-by: baude <bbaude@redhat.com>
commit 223fe64dc0 introduced the
regression.
When running on cgroups v1, bind mount only /sys/fs/cgroup/systemd as
rw, as the code did earlier.
Also, simplify the rootless code as it doesn't require any special
handling when using --systemd.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1737554
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
A container restored from an exported checkpoint did not have its
StartedTime set. Which resulted in a status like 'Up 292 years ago'
after the restore.
This just sets the StartedTime to time.Now() if a container is restored
from an exported checkpoint.
Signed-off-by: Adrian Reber <areber@redhat.com>
Old versions of conmon have a bug where they create the exit file before
closing open file descriptors causing a race condition when restarting
containers with open ports since we cannot bind the ports as they're not
yet closed by conmon.
Killing the old conmon PID is ~okay since it forces the FDs of old
conmons to be closed, while it's a NOP for newer versions which should
have exited already.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
we should be looking for the libpod.conf file in /usr/share/containers
and not in /usr/local. packages of podman should drop the default
libpod.conf in /usr/share. the override remains /etc/containers/ as
well.
Fixes: #3702
Signed-off-by: baude <bbaude@redhat.com>
Begin to separate the internal structures and frontend for
inspect on volumes. We can't rely on keeping internal data
structures for external presentation - separating presentation
and internal data format is good practice.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
There are two cases logdriver can be empty, if it wasn't set by libpod, or if the user did --log-driver ""
The latter case is an odd one, and the former is very possible and already handled for LogPath.
Instead of printing an error for an entirely reasonable codepath, let's supress the error
Signed-off-by: Peter Hunt <pehunt@redhat.com>
If a container is restored multiple times from an exported checkpoint
with the help of '--import --name', the restore will fail if during
'podman run' a static container IP was set with '--ip'. The user can
tell the restore process to ignore the static IP with
'--ignore-static-ip'.
Signed-off-by: Adrian Reber <areber@redhat.com>
close https://bugzilla.redhat.com/show_bug.cgi?id=1732280
From the bug Podman search returns 25 results even when limit option `--limit` is larger than 25(maxQueries). They want Podman to return `--limit` results.
This PR fixes the number of output result.
if --limit not set, return MIN(maxQueries, len(res))
if --limit is set, return MIN(option, len(res))
Signed-off-by: Qi Wang <qiwan@redhat.com>
Somehow this managed to slip through the cracks, but this is
definitely something inspect should print.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The `$PATH` environment variable will now used as fallback if no valid
runtime or conmon path matches. The debug logs has been updated to state
the used executable.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
when running on a cgroups v2 system, do not bind mount
the named hierarchy /sys/fs/cgroup/systemd as it doesn't exist
anymore. Instead bind mount the entire /sys/fs/cgroup.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When forcibly removing a container, we are initiating an explicit
stop of the container, which is not reflected in 'podman events'.
Swap to using our standard 'stop()' function instead of a custom
one for force-remove, and move the event into the internal stop
function (so internal calls also register it).
This does add one more database save() to `podman remove`. This
should not be a terribly serious performance hit, and does have
the desirable side effect of making things generally safer.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We need this specifically for tests, but others may find it
useful if they don't explicitly need events and don't want the
performance implications of using them.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
the exit file
If the container exit code needs to be retained, it cannot be retained
in tmpfs, because libpod runs in a memcg itself so it can't leave
traces with a daemon-less design.
This wasn't a memleak detectable by kmemleak for example. The kernel
never lost track of the memory and there was no erroneous refcounting
either. The reference count dependencies however are not easy to track
because when a refcount is increased, there's no way to tell who's
still holding the reference. In this case it was a single page of
tmpfs pagecache holding a refcount that kept pinned a whole hierarchy
of dying memcg, slab kmem, cgropups, unrechable kernfs nodes and the
respective dentries and inodes. Such a problem wouldn't happen if the
exit file was stored in a regular filesystem because the pagecache
could be reclaimed in such case under memory pressure. The tmpfs page
can be swapped out, but that's not enough to release the memcg with
CONFIG_MEMCG_SWAP_ENABLED=y.
No amount of more aggressive kernel slab shrinking could have solved
this. Not even assigning slab kmem of dying cgroups to alive cgroup
would fully solve this. The only way to free the memory of a dying
cgroup when a struct page still references it, would be to loop over
all "struct page" in the kernel to find which one is associated with
the dying cgroup which is a O(N) operation (where N is the number of
pages and can reach billions). Linking all the tmpfs pages to the
memcg would cost less during memcg offlining, but it would waste lots
of memory and CPU globally. So this can't be optimized in the kernel.
A cronjob running this command can act as workaround and will allow
all slab cache to be released, not just the single tmpfs pages.
rm -f /run/libpod/exits/*
This patch solved the memleak with a reproducer, booting with
cgroup.memory=nokmem and with selinux disabled. The reason memcg kmem
and selinux were disabled for testing of this fix, is because kmem
greatly decreases the kernel effectiveness in reusing partial slab
objects. cgroup.memory=nokmem is strongly recommended at least for
workstation usage. selinux needs to be further analyzed because it
causes further slab allocations.
The upstream podman commit used for testing is
1fe2965e4f (v1.4.4).
The upstream kernel commit used for testing is
f16fea666898dbdd7812ce94068c76da3e3fcf1e (v5.2-rc6).
Reported-by: Michele Baldessari <michele@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
<Applied with small tweaks to comments>
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In order to run Podman with VM-based runtimes unprivileged, the
network must be set up prior to the container creation. Therefore
this commit modifies Podman to run rootless containers by:
1. create a network namespace
2. pass the netns persistent mount path to the slirp4netns
to create the tap inferface
3. pass the netns path to the OCI spec, so the runtime can
enter the netns
Closes#2897
Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
NixOS links the current system state to `/run/current-system`, so we
have to add these paths to the configuration files as well to work out
of the box.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Touch up a number of formating issues for XDG_RUNTIME_DIRS in a number
of man pages. Make use of the XDG_CONFIG_HOME environment variable
in a rootless environment if available, or set it if not.
Also added a number of links to the Rootless Podman config page and
added the location of the auth.json files to that doc.
Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
We should not be fuzzy matching on volume names. Docker doesn't
do it, and it doesn't make much sense. Everything requires exact
matches for names - only IDs allow partial matches.
Fixes#3635
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When a command (like `ps`) requests no store be created, but also
requires a refresh be performed, we have to ignore its request
and initialize the store anyways to prevent segfaults. This work
was done in #3532, but that missed one thing - initializing a
storage service. Without the storage service, Podman will still
segfault. Fix that oversight here.
Fixes#3625
Signed-off-by: Matthew Heon <mheon@redhat.com>
There's no way to get the error if we successfully get an exit code (as it's just printed to stderr instead).
instead of relying on the error to be passed to podman, and edit based on the error code, process it on the varlink side instead
Also move error codes to define package
Signed-off-by: Peter Hunt <pehunt@redhat.com>
clean up some final linter issues and add a make target for
golangci-lint. in addition, begin running the tests are part of the
gating tasks in cirrus ci.
we cannot fully shift over to the new linter until we fix the image on
the openshift side. for short term, we will use both
Signed-off-by: baude <bbaude@redhat.com>
This includes:
Implement exec -i and fix some typos in description of -i docs
pass failed runtime status to caller
Add resize handling for a terminal connection
Customize exec systemd-cgroup slice
fix healthcheck
fix top
add --detach-keys
Implement podman-remote exec (jhonce)
* Cleanup some orphaned code (jhonce)
adapt remote exec for conmon exec (pehunt)
Fix healthcheck and exec to match docs
Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error
Use these different errors in branching for exit code in healthcheck and exec
Set conmon to use new api version
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Currently the pull message on failure is UGLY. This patch removes a lot of the noice
when pulling an image from multiple registries to make the user experience better.
Our current messages are way too verbose and need to be dampened down. Still has
verbose mode if you turn on log-level=debug.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When removing --all images prune images only attempt to remove read/write images,
ignore read/only images
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We have another patch running to do the same for exit files, with
a much more in-depth explanation of why it's necessary. Suffice
to say that persistent files in tmpfs tied to container CGroups
lead to significant memory allocations that last for the lifetime
of the file.
Based on a patch by Andrea Arcangeli (aarcange@redhat.com).
Signed-off-by: Matthew Heon <mheon@redhat.com>
We can infer no-new-privileges. For now, manually populate
seccomp (can't infer what file we sourced from) and
SELinux/Apparmor (hard to tell if they're enabled or not).
Signed-off-by: Matthew Heon <mheon@redhat.com>
Our previous method (just read the PID that we spawned) doesn't
work - Conmon double-forks to daemonize, so we end up with a PID
pointing to the first process, which dies almost immediately.
Reading from the PID file gets us the real PID.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When we first began writing Podman, we ran into a major issue
when implementing Inspect. Libpod deliberately does not tie its
internal data structures to Docker, and stores most information
about containers encoded within the OCI spec. However, Podman
must present a CLI compatible with Docker, which means it must
expose all the information in 'docker inspect' - most of which is
not contained in the OCI spec or libpod's Config struct.
Our solution at the time was the create artifact. We JSON'd the
complete CreateConfig (a parsed form of the CLI arguments to
'podman run') and stored it with the container, restoring it when
we needed to run commands that required the extra info.
Over the past month, I've been looking more at Inspect, and
refactored large portions of it into Libpod - generating them
from what we know about the OCI config and libpod's (now much
expanded, versus previously) container configuration. This path
comes close to completing the process, moving the last part of
inspect into libpod and removing the need for the create
artifact.
This improves libpod's compatability with non-Podman containers.
We no longer require an arbitrarily-formatted JSON blob to be
present to run inspect.
Fixes: #3500
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
An image with "HEALTHCHECK CMD ['']" is valid but as there is no command
defined the healthcheck will fail. Reject such a configuration.
Fixes#3507
Signed-off-by: Stefan Becker <chemobejk@gmail.com>
- remove duplicate check, already called in HealthCheck()
- reject zero-length command list and empty command string as errorneous
- support all Docker command list keywords: NONE, CMD or CMD-SHELL
- use Docker default "/bin/sh -c" for CMD-SHELL
Fixes#3507
Signed-off-by: Stefan Becker <chemobejk@gmail.com>
Using pod removal worked, but container removal was missing the
most critical step - the actual removal. Must have been
accidentally removed during a refactor.
Fixes#3556
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The newly added functionality to include the container's root
file-system changes into the checkpoint archive can now be explicitly
disabled. Either during checkpoint or during restore.
If a container changes a lot of files during its runtime it might be
more effective to migrated the root file-system changes in some other
way and to not needlessly increase the size of the checkpoint archive.
If a checkpoint archive does not contain the root file-system changes
information it will automatically be skipped. If the root file-system
changes are part of the checkpoint archive it is also possible to tell
Podman to ignore these changes.
Signed-off-by: Adrian Reber <areber@redhat.com>
One of the last limitations when migrating a container using Podman's
'podman container checkpoint --export=/path/to/archive.tar.gz' was
that it was necessary to manually handle changes to the container's root
file-system. The recommendation was to mount everything as --tmpfs where
the root file-system was changed.
This extends the checkpoint export functionality to also include all
changes to the root file-system in the checkpoint archive. The
checkpoint archive now includes a tarstream of the result from 'podman
diff'. This tarstream will be applied to the restored container before
restoring the container.
With this any container can now be migrated, even it there are changes
to the root file-system.
There was some discussion before implementing this to base the root
file-system migration on 'podman commit', but it seemed wrong to do
a 'podman commit' before the migration as that would change the parent
layer the restored container is referencing. Probably not really a
problem, but it would have meant that a migrated container will always
reference another storage top layer than it used to reference during
initial creation.
Signed-off-by: Adrian Reber <areber@redhat.com>
The newly added function GetDiffTarStream() mirrors the GetDiff()
function. It tries to get the correct layer ID from getLayerID()
and it filters out containerMounts from the tarstream. Thus the
behavior is the same as GetDiff(), but it returns a tarstream.
This also adds the function ApplyDiffTarStream() to apply the tarstream
generated by GetDiffTarStream().
These functions are targeted to support container migration with
root file-system changes.
Signed-off-by: Adrian Reber <areber@redhat.com>
During 'podman container checkpoint' the finished time was not set. This
resulted in a strange container status after checkpointing:
Exited (0) 292 years ago
During checkpointing FinishedTime is now set to time.now().
Signed-off-by: Adrian Reber <areber@redhat.com>
now that dbus authentication works fine from a user namespace (systemd
241 works fine), we can enable rootless healthchecks.
It uses "systemd-run --user" for creating the healthcheck timer and
communicates with the user instance of systemd listening at
$XDG_RUNTIME_DIR/systemd/private.
Closes: https://github.com/containers/libpod/issues/3523
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
an internal change in libpod will soon required the ability to lookup
the last container event using the continer name or id and the type of
event. this pr is in preperation for that need.
Signed-off-by: baude <bbaude@redhat.com>
The conmon pidfile is crucial for podman-generated systemd units, because
these units rely on it for determining service's main process ID.
With this change, every container has ConmonPidFile set (at least to
default value).
Signed-off-by: Danila Kiver <danila.kiver@mail.ru>
When specifying a podman command with a partial ID, container and pod
commands matches respectively only containers or pods IDs in the BoltDB.
Fixes: #3487
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
This is needed for dual stack IPv6 support within CRI-O. Because the API
changed within OCICNI, we have to adapt the internal linux networking as
well.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Specifically, we were needlessly doing a double lookup to find which config mounts were user volumes. Improve this by refactoring a bit of code from inspect
Signed-off-by: Peter Hunt <pehunt@redhat.com>
If we don't do this, we can leak locks on every failure, and that
is very, very bad - can render Podman unusable without a 'system
renumber' being run.
Signed-off-by: Matthew Heon <mheon@redhat.com>
some podman commands do not require the use of a container/image store.
in those cases, it is more effecient to not open the store, because that
results in having to also close the store which can be costly when the
system is under heavy write I/O loads.
Signed-off-by: baude <bbaude@redhat.com>
at least on Fedora 30 it creates the /run/user/UID directory for the
user logged in via ssh.
This needs to be done very early so that every other check when we
create the default configuration file will point to the correct
location.
Closes: https://github.com/containers/libpod/issues/3410
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
we had a regression where the rootless user tried to use the global
configuration file. We should not try to use the global configuration
when running in rootless but only cherry-pick some settings from there
when creating the file for the first time.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Conmon has moved out of cri-o and into it's own dedicated repository.
This commit updates configuration and definitions which referenced
the old cri-o based paths.
Signed-off-by: Chris Evich <cevich@redhat.com>
This fixes some of our handling of images which have no layers, i.e.,
those whose TopLayer is set to an empty value.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
When a container is attached upon start, the WaitGroup counter may
never be decremented if an error is raised before start, causing
the caller to hang.
Synchronize with the start & attach goroutine using a channel, to be
able to detect failures before start.
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
the compilation demands of having libpod in main is a burden for the
remote client compilations. to combat this, we should move the use of
libpod structs, vars, constants, and functions into the adapter code
where it will only be compiled by the local client.
this should result in cleaner code organization and smaller binaries. it
should also help if we ever need to compile the remote client on
non-Linux operating systems natively (not cross-compiled).
Signed-off-by: baude <bbaude@redhat.com>
Restoring a container from a checkpoint archive creates a complete
new root file-system. This file-system needs to have the correct SELinux
label or most things in that restored container will fail. Running
processes are not as problematic as newly exec()'d process (internally
or via 'podman exec').
This patch tells the storage setup which label should be used to mount
the container's root file-system.
Signed-off-by: Adrian Reber <areber@redhat.com>
Instead of only tracking that a container is restored from
a checkpoint locally in runtime_ctr.go this adds a flag to the
Container structure.
Upcoming patches to correctly label the root file-system mount-point
need also to know if a container is restored from a checkpoint.
Instead of passing a parameter around a lot of functions, this
adds that information to the Container structure.
Signed-off-by: Adrian Reber <areber@redhat.com>
This likely broke when we made containers able to detect that
they shared a network namespace and grab ports from the
dependency container - prior to that, we could grab ports without
concern for conflict, only the infra container had them. Now, all
containers in a pod will return the same ports, so we have to
work around this.
Fixes#3408
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We weren't properly populating the container's OCI Runtime in
Batch(), causing segfaults on attempting to access it. Add a test
to make sure we actually catch cases like this in the future.
Fixes#3411
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In Go templating, we use the names of fields, not the JSON struct
tags. To ensure templating works are expected, we need the two to
match.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Go templating is incapable of dealing with pointers, so when we
moved to Docker compatible mounts JSON, we broke it. The solution
is to not use pointers in this part of inspect.
Signed-off-by: Matthew Heon <mheon@redhat.com>
To avoid unnecessary warnings and errors in the future I'd like to
propose building all cgo related sources with `-Wall -Werror`. This
commit fixes some warnings which came up in `shm_lock.c`, too.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Use name of the default runtime, instead of the OCIRuntime config
option, which may include a full path.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Instead, use a less expensive read-only transaction to see if the
DB is ready for use (it probably is), and only fire the expensive
RW transaction if absolutely necessary.
Signed-off-by: Matthew Heon <mheon@redhat.com>
We may want to ship configurations including more than one
runtime configuration - for example, crun and runc and kata, all
configured. However, we don't want to make these extra runtimes
hard requirements, so let's not fatally error when we can't find
their executables.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Allow Podman containers to request to use a specific OCI runtime
if multiple runtimes are configured. This is the first step to
properly supporting containers in a multi-runtime environment.
The biggest changes are that all OCI runtimes are now initialized
when Podman creates its runtime, and containers now use the
runtime requested in their configuration (instead of always the
default runtime).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
For docker scripting compatibility, allow for json-file logging when creating args for conmon. That way, when json-file is supported, that case can be easily removed.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
When available, using the on-disk spec will show full mount
options in use when the container is running, which can differ
from mount options provided in the original spec - on generating
the final spec, for example, we ensure that some form of root
propagation is set.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
While we're at it, rewrite how we populate it. There were several
potential segfaults in the optional spec.Process block, and a few
fields not being populated correctly versus 'docker inspect'.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Extend kill's error message to include the container's ID and state.
This address cases where error messages caused by other containers
may confuse users.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Currently we report cgroupmanager default as systemd, even if the user modified
the libpod.conf. Also cgroupmanager does not work in rootless mode. This
PR correctly identifies the default cgroup manager or reports it is not supported.
Also add homeDir to correctly get the homedir if the $HOME is not set. Will
attempt to get Homedir out of /etc/passwd.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
- PREFIX is now passed saved in the binary at build-time so that default
paths match installation paths.
- ETCDIR is also overridable in a similar way.
- DESTDIR is now applied on top of PREFIX for install/uninstall steps.
Previously, a DESTDIR=/foo PREFIX=/bar make would install into /bar,
rather than /foo/bar.
Signed-off-by: Lawrence Chan <element103@gmail.com>
This flag switches to removing containers directly from c/storage
and is mostly used to remove orphan containers.
It's a superior solution to our former one, which attempted
removal from storage under certain circumstances and could, under
some conditions, not trigger.
Also contains the beginning of support for storage in `ps` but
wiring that in is going to be a much bigger pain.
Fixes#3329.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We're no longer using either of these JSON libraries, dropped
them in favor of jsoniter. We can't completely remove ffjson as
c/storage uses it and can't easily migrate, but we can make sure
that libpod itself isn't doing anything with them anymore.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
add a new configuration `runtime_supports_json` to list what OCI
runtimes support the --log-format=json option. If the runtime is not
listed here, libpod will redirect stdout/stderr from the runtime
process.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We were formerly dumping spec.Mount structs, with no care as to
whether it was user-generated or not - a relic of the very early
days when we didn't know whether a user made a mount or not.
Now that we do, match our output to Docker's dedicated mount
struct.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This way a tool can determine if the container exists or not, but is in the
wrong state.
Since 126 is documeted as:
**_126_** if the **_contained command_** cannot be invoked
It makes sense that the container would exit with this state.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When using slirp4netns, be sure the built-in DNS server is the first
one to be used.
Closes: https://github.com/containers/libpod/issues/3277
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The storage driver and the storage options in storage.conf should
match, but if you change the storage driver via the command line
then we need to nil out the default storage options from storage.conf.
If the user wants to change the storage driver and use storage options,
they need to specify them on the command line.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The option to restore a container from an external checkpoint archive
(podman container restore -i /tmp/checkpoint.tar.gz) restores a
container with the same name and same ID as id had before checkpointing.
This commit adds the option '--name,-n' to 'podman container restore'.
With this option the restored container gets the name specified after
'--name,-n' and a new ID. This way it is possible to restore one
container multiple times.
If a container is restored with a new name Podman will not try to
request the same IP address for the container as it had during
checkpointing. This implicitly assumes that if a container is restored
from a checkpoint archive with a different name, that it will be
restored multiple times and restoring a container multiple times with
the same IP address will fail as each IP address can only be used once.
Signed-off-by: Adrian Reber <areber@redhat.com>
This commit adds an option to the checkpoint command to export a
checkpoint into a tar.gz file as well as importing a checkpoint tar.gz
file during restore. With all checkpoint artifacts in one file it is
possible to easily transfer a checkpoint and thus enabling container
migration in Podman. With the following steps it is possible to migrate
a running container from one system (source) to another (destination).
Source system:
* podman container checkpoint -l -e /tmp/checkpoint.tar.gz
* scp /tmp/checkpoint.tar.gz destination:/tmp
Destination system:
* podman pull 'container-image-as-on-source-system'
* podman container restore -i /tmp/checkpoint.tar.gz
The exported tar.gz file contains the checkpoint image as created by
CRIU and a few additional JSON files describing the state of the
checkpointed container.
Now the container is running on the destination system with the same
state just as during checkpointing. If the container is kept running
on the source system with the checkpoint flag '-R', the result will be
that the same container is running on two different hosts.
Signed-off-by: Adrian Reber <areber@redhat.com>
This adds a couple of function in structure members needed in the next
commit to make container migration actually work. This just splits of
the function which are not modifying existing code.
Signed-off-by: Adrian Reber <areber@redhat.com>
Let's put inspect structs where they're actually being used. We
originally made pkg/inspect to solve circular import issues.
There are no more circular import issues.
Image structs remain for now, I'm focusing on container inspect.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Remove this IsNotExist out which was added along with the rest of this
block in f6a2b6bf2b (hooks: Add pre-create hooks for runtime-config
manipulation, 2018-11-19, #1830). Besides the obvious "hook directory
does not exist", it was swallowing the less-obvious "hook command does
not exist". And either way, folks are likely going to want non-zero
podman exits when we fail to load a hook directory they explicitly
pointed us towards.
Signed-off-by: W. Trevor King <wking@tremily.us>
Add a journald reader that translates the journald entry to a k8s-file formatted line, to be added as a log line
Note: --follow with journald hasn't been implemented. It's going to be a larger undertaking that can wait.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
since we now enter the user namespace prior to read the conmon.pid, we
can write the conmon.pid file again to the runtime dir.
This reverts commit 6c6a865436.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when running in rootless mode, be sure psgo is honoring the user
namespace settings for huser and hgroup.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Commit 27f9e23a0b already prevents setting the profile when creating
the spec but we also need to avoid loading and setting the profile when
creating the container.
Fixes: #3112
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
replace two usage of kwait.ExponentialBackoff in favor of WaitForFile
that uses inotify when possible.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
enable polling also when using inotify. It is generally useful to
have it as under high load inotify can lose notifications. It also
solves a race condition where the file is created while the watcher
is configured and it'd wait until the timeout and fail.
Closes: https://github.com/containers/libpod/issues/2942
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Mainly add support for podman build using --overlay mounts.
Updates containers/image also adds better support for new registries.conf
file.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
`string.Split()` splits into slice of size greater than 2
which may result in loss of environment variables
fixes#3132
Signed-off-by: Divyansh Kamboj <kambojdivyansh2000@gmail.com>
use a pause process to keep the user and mount namespace alive.
The pause process is created immediately on reload, and all successive
Podman processes will refer to it for joining the user&mount
namespace.
This solves all the race conditions we had on joining the correct
namespaces using the conmon processes.
As a fallback if the join fails for any reason (e.g. the pause process
was killed), then we try to join the running containers as we were
doing before.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
once the default event logger was removed from libpod.conf, we need to
set the default based on whether the systemd build tag is used or not.
Signed-off-by: baude <bbaude@redhat.com>
StartAndAttach() runs start() in a goroutine, which can allow it
to fire after the caller returns - and thus, after the defer to
unlock the container lock has fired.
The start() call _must_ occur while the container is locked, or
else state inconsistencies may occur.
Fixes#3114
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
print a clearer error message when an unprivileged user attempts to
create a network using CNI.
Closes: https://github.com/containers/libpod/issues/3118
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If the systemd development files are not present on the system which
builds podman, then `podman events` will error on runtime creation.
Beside this, a warning will be printed when compiling podman.
This commit mainly exists because projects which depend on libpod
would not need the podman event support and therefore do not need to
rely on the systemd headers.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
When using CGroupfs, we see races during pod removal between
removing the CGroup and the cleanup process starting (in the
CGroup, thus preventing removal).
The simplest way to avoid this is to prevent the forking of the
cleanup process. Conveniently, we can do this via the CGroup that
we already created for Conmon - we just need to update the PID
limit to 0, which completely inhibits new forks.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Instead of rewriting the logic, reuse the standard logic we use
for removing containers, which is much better tested.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Ensure that, if an error occurs somewhere along the way when we
remove a pod, it's preserved until the end and returned, even as
we continue to remove the pod.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Removing a pod must first removal all containers in the pod.
Libpod requires the state to remain consistent at all times, so
references to a deleted pod must all be cleansed first.
Pods can have many containers in them. We presently iterate
through all of them, and if an error occurs trying to clean up
and remove any single container, we abort the entire operation
(but cannot recover anything already removed - pod removal is not
an atomic operation).
Because of this, if a removal error occurs partway through, we
can end up with a pod in an inconsistent state that is no longer
usable. What's worse, if the error is in the infra container, and
it's persistent, we get zombie pods - completely unable to be
removed.
When we saw some of these same issues with containers not in
pods, we modified the removal code there to aggressively purge
containers from the database, then try to clean up afterwards.
Take the same approach here, and make cleanup errors nonfatal.
Once we've gone ahead and removed containers, we need to see
pod deletion through to the end - we'll log errors but keep
going.
Also, fix some other small things (most notably, we didn't make
events for the containers removed).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
After a reboot, when we refresh Podman's state, we retrieved the
lock from the fresh SHM instance, but we did not mark it as
allocated to prevent it being handed out to other containers and
pods.
Provide a method for marking locks as in-use, and use it when we
refresh Podman state after a reboot.
Fixes#2900
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We want to start supporting the registries.conf format.
Also start showing blocked registries in podman info
Fix sorting so all registries are listed together in podman info.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The on-failure restart option supports restarting only a given
number of times. To do this, we need one additional field in the
DB to track restart count (which conveniently fills a field in
Inspect we weren't populating), plus some plumbing logic.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This field indicates that a container was explciitly stopped by
an API call, and did not exit naturally. It's used when
implementing restart policy for containers.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Fallback to executing ps(1) in case we hit an unknown psgo descriptor.
This ensures backwards compatibility with docker-top, which was purely
ps(1) driven.
Also support comma-separated descriptors as input.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>