When running on a node with Cgroups v2, default to using `crun` instead
of `runc`. Note that this only impacts the hard-coded default config.
No user config will be over-written.
Fixes: #4463
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Currently podman generate kube does not generate the correct RunAsUser and RunAsGroup
options in the yaml file. This patch fixes this.
This patch also make `podman play kube` use the RunAdUser and RunAsGroup options if
they are specified in the yaml file.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
`go get github.com/cri-o/ocicni@deac903fd99b6c52d781c9f42b8db3af7dcfd00a`
I had to fix compilation errors in libpod/networking_linux.go
---
ocicni.Networks has changed from string to the structure NetAttachment
with the member Name (the former string value) and the member Ifname
(optional).
I don't think we can make use of Ifname here, so I just map the array of
structures to array of strings - e.g. dropping Ifname.
---
The function GetPodNetworkStatus no longer returns Result but it returns
the wrapper structure NetResult which contains the former Result plus
NetAttachment (Network name and Interface name).
Again, I don't think we can make use of that information here, so I
just added `.Result` to fix the build.
---
Issue: #1136
Signed-off-by: Jakub Filak <jakub.filak@sap.com>
If user specifies --detach-keys="", this will disable the feature.
Adding define.DefaultDetachKeys to help screen to help identify detach keys.
Updated man pages with additonal information.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When pulling an unqualified reference (e.g., `fedora`) make sure that
the reference is not using a non-docker transport to avoid iterating
over the search registries and trying to pull from them.
Fixes: #4434
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
If the kube.yaml specifieds the SELinux type or Level, we need the container
to be launched with the correct label.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
the pidWaitTimeout is already a Duration so do not multiply it again
by time.Millisecond.
Closes: https://github.com/containers/libpod/issues/4344
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Pull in changes to pkg/secrets/secrets.go that adds the
logic to disable fips mode if a pod/container has a
label set.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
change the default to -1, so that we can change the semantic of
"--tail 0" to not print any existing log line.
Closes: https://github.com/containers/libpod/issues/4396
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Refactor the `RuntimeConfig` along with related code from libpod into
libpod/config. Note that this is a first step of consolidating code
into more coherent packages to make the code more maintainable and less
prone to regressions on the long runs.
Some libpod definitions were moved to `libpod/define` to resolve
circular dependencies.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
There were many situations that made exec act funky with input. pipes didn't work as expected, as well as sending input before the shell opened.
Thinking about it, it seemed as though the issues were because of how os.Stdin buffers (it doesn't). Dropping this input had some weird consequences.
Instead, read from os.Stdin as bufio.Reader, allowing the input to buffer before passing it to the container.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
command.Start() just starts the command. That catches some
errors, but the nasty ones - bad options and similar - happen
when the command runs. Use CombinedOutput() instead - it waits
for the command to exit, and thus catches non-0 exit of the
`mount` command (invalid options, for example).
STDERR from the `mount` command is directly used, which isn't
necessarily the best, but we can't really get much more info on
what went wrong.
Fixes#4303
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
always create a new cgroup for conmon also when running as rootless.
We were previously creating one only when necessary, but that behaves
differently than root containers.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Currently podman play kube is not using the system default seccomp.json file.
This PR will use the default or override location for podman play.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Generate an image's RepoDigests list using all applicable digests, and
refrain from outputting a digest in the tag column of the "images"
output.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Be prepared to report multiple image digests for images which contain
multiple manifests but, because they continue to have the same set of
layers and the same configuration, are considered to be the same image.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Add --override-arch and --override-os as hidden flags, in line with the
global flag names that skopeo uses, so that we can test behavior around
manifest lists without having to conditionalize more of it by arch.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
When an image can be opened as an ImageSource but not an Image, handle
the case where it's an image list all by itself, the case where it's an
image for a different architecture/OS combination, or the case where
it's both.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Move to containers/image v5 and containers/buildah to v1.11.4.
Replace an equality check with a type assertion when checking for a
docker.ErrUnauthorizedForCredentials in `podman login`.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
read the slirp4netns stderr and propagate it in the error when the
process fails.
Replace: https://github.com/containers/libpod/pull/4338
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We have a lot of checks for container state scattered throughout
libpod. Many of these need to ensure the container is in one of a
given set of states so an operation may safely proceed.
Previously there was no set way of doing this, so we'd use unique
boolean logic for each one. Introduce a helper to standardize
state checks.
Note that this is only intended to replace checks for multiple
states. A simple check for one state (ContainerStateRunning, for
example) should remain a straight equality, and not use this new
helper.
Signed-off-by: Matthew Heon <mheon@redhat.com>
when running in systemd mode on cgroups v1, make sure the
/sys/fs/cgroup/systemd/release_agent is masked otherwise the container
is able to modify it and execute scripts on the host.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When you try and create a new volume with the name of a volume
that already exists, you presently get a thoroughly unhelpful
error from `mkdir` as the volume attempts to create the
directory it will be mounted at. An EEXIST out of mkdir is not
particularly helpful to Podman users - it doesn't explain that
the name is already taken by another volume.
The solution here is potentially racy as the runtime is not
locked, so someone else could take the name while we're still
getting things set up, but that's a narrow timing window, and we
will still return an error - just an error that's not as good as
this one.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
if the cgroup manager is set to systemd, detect if dbus is available,
otherwise fallback to --cgroup-manager=cgroupfs.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Everything else is a flag to mount, but "uid" and "gid" are not.
We need to parse them out of "o" and handle them separately.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
make sure the user overrides are stored in the configuration file when
first created.
Closes: https://github.com/containers/libpod/issues/2659
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Previously, when `podman run` encountered a volume mount without
separate source and destination (e.g. `-v /run`) we would assume
that both were the same - a bind mount of `/run` on the host to
`/run` in the container. However, this does not match Docker's
behavior - in Docker, this makes an anonymous named volume that
will be mounted at `/run`.
We already have (more limited) support for these anonymous
volumes in the form of image volumes. Extend this support to
allow it to be used with user-created volumes coming in from the
`-v` flag.
This change also affects how named volumes created by the
container but given names are treated by `podman run --rm` and
`podman rm -v`. Previously, they would be removed with the
container in these cases, but this did not match Docker's
behaviour. Docker only removed anonymous volumes. With this patch
we move to that model as well; `podman run -v testvol:/test` will
not have `testvol` survive the container being removed by `podman
rm -v`.
The sum total of these changes let us turn on volume removal in
`--rm` by default.
Fixes: #4276
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When a container is created with a given OCI runtime, but then it
is uninstalled or removed from the configuration file, Libpod
presently reacts very poorly. The EvictContainer code can
potentially remove these containers, but we still can't see them
in `podman ps` (aside from the massive logrus.Errorf messages
they create).
Providing a minimal OCI runtime implementation for missing
runtimes allows us to behave better. We'll be able to retrieve
containers from the database, though we still pop up an error for
each missing runtime. For containers which are stopped, we can
remove them as normal.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
network statistics cannot be collected for rootless network devices with
the current implementation. for now, we return nil so that stats will
at least for users.
Fixes:#4268
Signed-off-by: baude <bbaude@redhat.com>
The json field is called `Image` while the go field is called `ImageID`,
tricking users into filtering for `Image` which ultimately results in an
error. Hence, rename the field to `Image` to align json and go.
To prevent podman users from regressing, rename `Image` to `ImageID` in
the specified filters. Add tests to prevent us from regressing. Note
that consumers of the go API that are using `ImageID` are regressing;
ultimately we consider it to be a bug fix.
Fixes: #4193
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Also, ensure that we don't try to mount them without root - it
appears that it can somehow not error and report that mount was
successful when it clearly did not succeed, which can induce this
case.
We reuse the `--force` flag to indicate that a volume should be
removed even after unmount errors. It seems fairly natural to
expect that --force will remove a volume that is otherwise
presenting problems.
Finally, ignore EINVAL on unmount - if the mount point no longer
exists our job is done.
Fixes: #4247Fixes: #4248
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In some cases, conmon can fail without writing logs. Change the wording
of the error message from
"error reading container (probably exited) json message"
to
"container create failed (no logs from conmon)"
to have a more helpful error message that is more consistent with other
errors at that stage of execution.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Previously, `podman checkport restore` with exported containers,
when told to create a new container based on the exported
checkpoint, would create a new container, with a new container
ID, but not reset CGroup path - which contained the ID of the
original container.
If this was done multiple times, the result was two containers
with the same cgroup paths. Operations on these containers would
this have a chance of crossing over to affect the other one; the
most notable was `podman rm` once it was changed to use the --all
flag when stopping the container; all processes in the cgroup,
including the ones in the other container, would be stopped.
Reset cgroups on restore to ensure that the path matches the ID
of the container actually being run.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This is a horrible hack to work around issues with Fedora 31, but
other distros might need it to, so we'll move it upstream.
I do not recommend this functionality for general use, and the
manpages and other documentation will reflect this. But for some
upgrade cases, it will be the only thing that allows for a
working system.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
For future work, we need multiple implementations of the OCI
runtime, not just a Conmon-wrapped runtime matching the runc CLI.
As part of this, do some refactoring on the interface for exec
(move to a struct, not a massive list of arguments). Also, add
'all' support to Kill and Stop (supported by runc and used a bit
internally for removing containers).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
when runc returns an error about not being v2 complient, catch the error
and logrus an actionable message for users.
Signed-off-by: baude <bbaude@redhat.com>
This requires updating all import paths throughout, and a matching
buildah update to interoperate.
I can't figure out the reason for go.mod tracking
github.com/containers/image v3.0.2+incompatible // indirect
((go mod graph) lists it as a direct dependency of libpod, but
(go list -json -m all) lists it as an indirect dependency),
but at least looking at the vendor subdirectory, it doesn't seem
to be actually used in the built binaries.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This ensures that containers that didn't require an evict will be
dealt with normally, and we only break out evict for containers
that refuse to be removed by normal means.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
fixes a segfault when slirp4netns is not installed and the slirp sync
pipe is not created.
Closes: https://github.com/containers/libpod/issues/4113
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
A true result from reexec.Init() isn't an error, but it indicates that
main() should exit with a success exit status.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Add ability to evict a container when it becomes unusable. This may
happen when the host setup changes after a container creation, making it
impossible for that container to be used or removed.
Evicting a container is done using the `rm --force` command.
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
CNI expects that a DELETE be run before re-creating container
networks. If a reboot occurs quickly enough that containers can't
stop and clean up, that DELETE never happens, and Podman
currently wipes the old network info and thinks the state has
been entirely cleared. Unfortunately, that may not be the case on
the CNI side. Some things - like IP address reservations - may
not have been cleared.
To solve this, manually re-run CNI Delete on refresh. If the
container has already been deleted this seems harmless. If not,
it should clear lingering state.
Fixes: #3759
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In order to run Podman with VM-based runtimes unprivileged, the
network must be set up prior to the container creation. Therefore
this commit modifies Podman to run rootless containers by:
1. create a network namespace
2. pass the netns persistent mount path to the slirp4netns
to create the tap inferface
3. pass the netns path to the OCI spec, so the runtime can
enter the netns
Closes#2897
Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
If a user upgrades to a machine that defaults to a cgroups V2 machine
and has a libpod.conf file in their homedir that defaults to OCI Runtime runc,
then we want to change it one time to crun.
runc as of this point does not work on cgroupV2 systems. This patch will
eventually be removed but is needed until runc has support.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If the HOME environment variable is not set, make sure it is set to
the configuration found in the container /etc/passwd file.
It was previously depending on a runc behavior that always set HOME
when it is not set. The OCI runtime specifications do not require
HOME to be set so move the logic to libpod.
Closes: https://github.com/debarshiray/toolbox/issues/266
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We've been seeing a lot of issues (ref: #4061, but there are
others) where Podman hiccups on trying to start a container,
because some temporary files have been retained and Conmon will
not overwrite them.
If we're calling start() we can safely assume that we really want
those files gone so the container starts without error, so invoke
the cleanup routine. It's relatively cheap (four file removes) so
it shouldn't hurt us that much.
Also contains a small simplification to the removeConmonFiles
logic - we don't need to stat-then-remove when ignoring ENOENT is
fine.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
There were two problems with preserve fds.
libpod didn't open the fds before passing _OCI*PIPE to conmon. This caused libpod to talk on the preserved fds, rather than the pipes, with conmon talking on the pipes. This caused a hang.
Libpod also didn't convert an int to string correctly, so it would further fail.
Fix these and add a unit test to make sure we don't regress in the future
Note: this test will not pass on crun until crun supports --preserve-fds
Signed-off-by: Peter Hunt <pehunt@redhat.com>
if slirp4netns supports sandboxing, enable it.
It automatically creates a new mount namespace where slirp4netns will
run and have limited access to the host resources.
It needs slirp4netns 0.4.1.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We should not be throwing errors because the operation we wanted
to perform is already done. Now, it is definitely strange that a
container is actually unmounted, but shows as mounted in the DB -
if this reoccurs in a way where we can investigate, it's worth
tearing into.
Fixes#4033
Signed-off-by: Matthew Heon <mheon@redhat.com>
If you are running a rootless container on cgroupV1
you can not pause the container. We need to report the proper error
if this happens.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This change matches what is happening on the podman local side
and should eliminate a race condition.
Also exit commands on the server side should start to return to client.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We have leaked the exit number codess all over the code, this patch
removes the numbers to constants.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
if we register the resize func too early, it attempts to read from the 'ctl' file before it exists. this causes the func to error, and the resize to not go through.
Fix this by registering resize func later for conmon. This, along with a conmon fix, will allow exec to know the terminal size at startup
Signed-off-by: Peter Hunt <pehunt@redhat.com>
when executing a healthcheck, we were not cleaning up after exec's use
of a socket. we now remove the socket file and ignore if for reason it
does not exist.
Fixes: #3962
Signed-off-by: baude <bbaude@redhat.com>
When --cgroupns=private is used we need to mount a new cgroup file
system so that it points to the correct namespace.
Needs: https://github.com/containers/crun/pull/88
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when running in rootless mode and using systemd as cgroup manager
create automatically a systemd scope when the user doesn't own the
current cgroup.
This solves a couple of issues:
on cgroup v2 it is necessary that a process before it can moved to a
different cgroup tree must be in a directory owned by the unprivileged
user. This is not always true, e.g. when creating a session with su
-l.
Closes: https://github.com/containers/libpod/issues/3937
Also, for running systemd in a container it was before necessary to
specify "systemd-run --scope --user podman ...", now this is done
automatically as part of this PR.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This will be used when we allow 'podman ps' to display info on
storage containers instead of Libpod containers.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Lookup was written before volume states merged, but merged after,
and CI didn't catch the obvious failure here. Without a valid
state, we try to unmarshall into a null pointer, and 'volume rm'
is completely broken because of it.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Podman is not the only user of containers/storage, and as such we
cannot rely on our database as the sole source of truth when
pruning images. If images do not show as in use from Podman's
perspective, but subsequently fail to remove because they are
being used by a container, they're probably being used by Buildah
or another c/storage client.
Since the images in question are in use, we shouldn't error on
failure to prune them - we weren't supposed to prune them in the
first place.
Fixes: #3983
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This is mostly used with Systemd, which really wants to manage
CGroups itself when managing containers via unit file.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This change adds the following annotation to every container created by
podman:
```json
"Annotations": {
"io.containers.manager": "libpod"
}
```
Target of this annotaions is to indicate which project in the containers
ecosystem is the major manager of a container when applications share
the same storage paths. This way projects can decide if they want to
manipulate the container or not. For example, since CRI-O and podman are
not using the same container library (libpod), CRI-O can skip podman
containers and provide the end user more useful information.
A corresponding end-to-end test has been adapted as well.
Relates to: https://github.com/cri-o/cri-o/pull/2761
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
Previously, we only did this for volumes created at the same time
as the container. However, this is not correct behavior - Docker
does so for all named volumes, even those made with
'podman volume create' and mounted into a container later.
Fixes#3945
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This isn't included in Docker, but seems handy enough.
Use the new API for 'volume rm' and 'volume inspect'.
Fixes#3891
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We want to get podman info to tell us about the version of
the mount program to help us diagnose issues users are having.
Also if in rootless mode and slirp4netns is installed reveal package
info on slirp4netns.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If c/storage paths are explicitly set to "" (the empty string) it
will use compiled-in defaults. However, it won't tell us this via
`storage.GetDefaultStoreOptions()` - we just get the empty string
(which can put our defaults, some of which are relative to
c/storage, in a bad spot).
Hardcode a sane default for cases like this. Furthermore, add
some sanity checks to paths, to ensure we don't use relative
paths for core parts of libpod.
Fixes#3952
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When we fail to remove a container's SHM, that's an error, and we
need to report it as such. This may be part of our lingering
storage woes.
Also, remove MNT_DETACH. It may be another cause of the storage
removal failures.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When volume options and the local volume driver are specified,
the volume is intended to be mounted using the 'mount' command.
Supported options will be used to volume the volume before the
first container using it starts, and unmount the volume after the
last container using it dies.
This should work for any local filesystem, though at present I've
only tested with tmpfs and btrfs.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We need to be able to track the number of times a volume has been
mounted for tmpfs/nfs/etc volumes. As such, we need a mutable
state for volumes. Add one, with the expected update/save methods
in both states.
There is backwards compat here, in that older volumes without a
state will still be accepted.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In upcoming commits, we're going to turn on the backends for
these fields. Volumes with these set will act fundamentally
differently from other volumes. There will probably be validation
required for each field.
Until now, though, we've freely allowed creation of volumes with
these set - they just did nothing. So we have no idea what could
be in the DB with old volumes.
Change the struct tags so we don't have to worry about old,
unvalidated data. We'll start fresh with new volumes.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
use the inotify backend to be notified on the container exit instead
of polling continuosly the runtime. Polling the runtime slowns
significantly down the podman execution time for short lived
processes:
$ time bin/podman run --rm -ti fedora true
real 0m0.324s
user 0m0.088s
sys 0m0.064s
from:
$ time podman run --rm -ti fedora true
real 0m4.199s
user 0m5.339s
sys 0m0.344s
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when cni returns a list of dns servers, we should add them under the
right conditions. the defined conditions are as follows:
- if the user provides dns, it and only it are added.
- if not above and you get a cni name server, it is added and a
forwarding dns instance is created for what was in resolv.conf.
- if not either above, the entries from the host's resolv.conf are used.
Signed-off-by: baude <bbaude@redhat.com>
Signed-off-by: baude <bbaude@redhat.com>
If I mount, say, /usr/bin into my container - I expect to be able
to run the executables in that mount. Unconditionally applying
noexec would be a bad idea.
Before my patches to change mount options and allow exec/dev/suid
being set explicitly, we inferred the mount options from where on
the base system the mount originated, and the options it had
there. Implement the same functionality for the new option
handling.
There's a lot of performance left on the table here, but I don't
know that this is ever going to take enough time to make it worth
optimizing.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Previously, we explicitly set noexec/nosuid/nodev on every mount,
with no ability to disable them. The 'mount' command on Linux
will accept their inverses without complaint, though - 'noexec'
is counteracted by 'exec', 'nosuid' by 'suid', etc. Add support
for passing these options at the command line to disable our
explicit forcing of security options.
This also cleans up mount option handling significantly. We are
still parsing options in more than one place, which isn't good,
but option parsing for bind and tmpfs mounts has been unified.
Fixes: #3819Fixes: #3803
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This will require a 'podman system renumber' after being applied
to get lock numbers for existing volumes.
Add the DB backend code for rewriting volume configs and use it
for updating lock numbers as part of 'system renumber'.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Decompose() returns an error defined in CNI which has been removed
upstream because it had no in-tree (eg in CNI) users.
Signed-off-by: Dan Williams <dcbw@redhat.com>