added support for --pid flag. User can specify ns:file, pod, private, or host.
container returns an error since you cannot point the ns of the pods infra container
to a container outside of the pod.
Signed-off-by: cdoern <cdoern@redhat.com>
There was an race condition when calling `GetRootlessCNINetNs()`. It
created the rootless cni directory before it got locked. Therefore
another process could have called cleanup and removed this directory
before it was used resulting in errors. The lockfile got moved into the
XDG_RUNTIME_DIR directory to prevent a panic when the parent dir was
removed by cleanup.
Fixes#10930Fixes#10922
To make this even more robust `GetRootlessCNINetNs()` will now return
locked. This guarantees that we can run `Do()` after `GetRootlessCNINetNs()`
before another process could have called `Cleanup()` in between.
[NO TESTS NEEDED] CI is flaking, hopefully this will fix it.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Fix issue 10929 : `[Regression in 3.2.0] CNI-in-slirp4netns DNS gets broken when running a rootful container after running a rootless container`
When /etc/resolv.conf on the host is a symlink to /run/systemd/resolve/stub-resolv.conf,
we have to mount an empty filesystem on /run/systemd/resolve in the child namespace,
so as to isolate the directory from the host mount namespace.
Otherwise our bind-mount for /run/systemd/resolve/stub-resolv.conf is unmounted
when systemd-resolved unlinks and recreates /run/systemd/resolve/stub-resolv.conf on the host.
[NO TESTS NEEDED]
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
On EOF of STDIN, we need to perform a one-sided close of the
attach connection on the client side, to ensure that STDIN
finishing will also cause the exec session to terminate, instead
of hang.
Fixes#7360
Signed-off-by: Matthew Heon <mheon@redhat.com>
If mounting to existing directory the uid/gid should be preserved.
Primary uid/gid of container shouldn't be used.
Signed-off-by: Matej Vasek <mvasek@redhat.com>
We should not be exposing the store outside of Libpod. We want to
encapsulate it as an internal implementation detail - there's no
reason functions outside of Libpod should directly be
manipulating container storage. Convert the last use to invoke a
method on Libpod instead, and remove the function.
[NO TESTS NEEDED] as this is just a refactor.
Signed-off-by: Matthew Heon <mheon@redhat.com>
The rootless cni namespace needs a valid /etc/resolv.conf file. On some
distros is a symlink to somewhere under /run. Because the kernel will
follow the symlink before mounting, it is not possible to mount a file
at exactly /etc/resolv.conf. We have to ensure that the link target will
be available in the rootless cni mount ns.
Fixes#10855
Also fixed a bug in the /var/lib/cni directory lookup logic. It used
`filepath.Base` instead of `filepath.Dir` and thus looping infinitely.
Fixes#10857
[NO TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a new service reaper package. Podman currently does not reap all
child processes. The slirp4netns and rootlesskit processes are not
reaped. The is not a problem for local podman since the podman process
dies before the other processes and then init will reap them for us.
However with podman system service it is possible that the podman
process is still alive after slirp died. In this case podman has to reap
it or the slirp process will be a zombie until the service is stopped.
The service reaper will listen in an extra goroutine on SIGCHLD. Once it
receives this signal it will try to reap all pids that were added with
`AddPID()`. While I would like to just reap all children this is not
possible because many parts of the code use `os/exec` with `cmd.Wait()`.
If we reap before `cmd.Wait()` things can break, so reaping everything
is not an option.
[NO TESTS NEEDED]
Fixes#9777
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
First, make podman diff accept optionally a second argument. This allows
the user to specify a second image/container to compare the first with.
If it is not set the parent layer will be used as before.
Second, podman container diff should only use containers and podman
image diff should only use images. Previously, podman container diff
would use the image when both an image and container with this name
exists.
To make this work two new parameters have been added to the api. If they
are not used the previous behaviour is used. The same applies to the
bindings.
Fixes#10649
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Pull the trigger on the `pkg/registries` package which acted as a proxy
for `c/image/pkg/sysregistriesv2`. Callers should be using the packages
from c/image directly, if needed at all.
Also make use of libimage's SystemContext() method which returns a copy
of a system context, further reducing the risk of unintentionally
altering global data.
[NO TESTS NEEDED]
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Create the /etc and /etc/mtab directories with the
correct ownership based on what the UID and GID is
for the container. This was causing issue when starting
the infra container with userns as the /etc directory
wasn't being created with the correct ownership.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Added logic and handling for two new Podman pod create Flags.
--cpus specifies the total number of cores on which the pod can execute, this
is a combination of the period and quota for the CPU.
--cpuset-cpus is a string value which determines of these available cores,
how many we will truly execute on.
Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
added Avg Cpu calculation and CPU up time to podman stats. Adding different feature sets in different PRs, CPU first.
resolves#9258
Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
`syncContainer()` requires the container to be locked, otherwise we can
end up with undefined behavior.
[NO TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Podman does not need to watch the cni config directory. If a network is
not found in the cache, OCICNI will reload the networks anyway and thus
even podman system service should work as expected.
Also include a change to not mount a "new" /var by default in the
rootless cni ns, instead try to use /var/lib/cni first and then the
parent dir. This allows users to store cni configs under /var/... which
is the case for the CI compose test.
[NO TESTS NEEDED]
Fixes#10686
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Commit 84b55eec27 attempted to fix a race waiting for the container
died event. Previously, Podman slept for duration of the polling
frequence which I considerred to be a mistake. As it turns out, I was
mistaken since the file logger will, in fact, NOT read until EOF and
then stop logging but stop logging immediately _after_ it woke up.
[NO TESTS NEEDED] as the race condition cannot be hit reliably.
Fixes: #10675
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Fix the suprious "Error: nil" messages. Also add some more context to
logged error messages which makes error sources more obvious.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Previously podman failed when run in an environment where 127.0.0.53 is
the only nameserver but systemd-resolved is not used directly.
In practice this happened when podman was run within an alpine container
that used the host's network and the host was running systemd-resolved.
This fix makes podman ignore a file not found error when reading /run/systemd/resolve/resolv.conf.
Closes#10733
[NO TESTS NEEDED]
Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
Users are complaining about read/only /var/tmp failing
even if TMPDIR=/tmp is set.
This PR Fixes: https://github.com/containers/podman/issues/10698
[NO TESTS NEEDED] No way to test this.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When starting a process with `podman exec -it` the terminal is resized
after the process is started. To fix this allow exec start to accept the
terminal height and width as parameter and let it resize right before
the process is started.
Fixes#10560
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The container name should have the slirp interface ip set in /etc/hosts
and not the gateway ip. Commit c8dfcce6db introduced this regression.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1972073
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Network connect/disconnect has to call the cni plugins when the network
namespace is already configured. This is the case for `ContainerStateRunning`
and `ContainerStateCreated`. This is important otherwise the network is
not attached to this network namespace and libpod will throw errors like
`network inspection mismatch...` This problem happened when using
`docker-compose up` in attached mode.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Podman uses the volume option map to check if it has to mount the volume
or not when the container is started. Commit 28138dafcc added to uid
and gid options to this map, however when only uid/gid is set we cannot
mount this volume because there is no filesystem or device specified.
Make sure we do not try to mount the volume when only the uid/gid option
is set since this is a simple chown operation.
Also when a uid/gid is explicity set, do not chown the volume based on
the container user when the volume is used for the first time.
Fixes#10620
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When making Exec Cleanup processes mandatory, I introduced a race
wherein attached exec sessions could be cleaned up and removed by
the cleanup process before the frontend had a chance to get their
exit code. Fortunately, we've dealt with this issue before in
containers, and the same solution can be applied here. I added an
event for an exec session's process exiting, `exec_died` (Docker
has an identical event, so this actually improves our
compatibility there) that includes the exit code of the exec
session. If the race happens and the exec session no longer
exists when we go to remove it, pick up exit code from the event
and exit cleanly.
Signed-off-by: Matthew Heon <mheon@redhat.com>
We were previously only doing this for detached exec. I don't
know why we did that, but I don't see any reason not to extend it
to all exec sessions - it guarantees that we will always clean up
exec sessions, even if the original `podman exec` process died.
[NO TESTS NEEDED] because I don't really know how to test this
one.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Unfortunately --pre-checkpointing never worked as intended and recent
changes to runc have shown that it is broken.
To create a pre-checkpoint CRIU expects the paths between the
pre-checkpoints to be a relative path. If having a previous checkpoint
it needs the be referenced like this: --prev-images-dir ../parent
Unfortunately Podman was giving runc (and CRIU) an absolute path.
Unfortunately, again, until March 2021 CRIU silently ignored if
the path was not relative and switch back to normal checkpointing.
This has been now fixed in CRIU and runc and running pre-checkpoint
with the latest runc fails, because runc already sees that the path is
absolute and returns an error.
This commit fixes this by giving runc a relative path.
This commit also fixes a second pre-checkpointing error which was just
recently introduced.
So summarizing: pre-checkpointing never worked correctly because CRIU
ignored wrong parameters and recent changes broke it even more.
Now both errors should be fixed.
[NO TESTS NEEDED]
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <adrian@lisas.de>
when looking up the container cgroup, ignore named hierarchies since
containers running systemd as payload will create a sub-cgroup and
move themselves there.
Closes: https://github.com/containers/podman/issues/10602
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Checkpointed containers started with --privileged fail during restore
with:
Error: error creating container storage: ProcessLabel and Mountlabel must either not be specified or both specified
This commit fixes it by not setting the labels when restoring a
privileged container.
[NO TESTS NEEDED]
Signed-off-by: Adrian Reber <areber@redhat.com>
When 127.0.0.53 is the only nameserver in /etc/resolv.conf assume
systemd-resolved is used. This is better because /etc/resolv.conf does
not have to be symlinked to /run/systemd/resolve/stub-resolv.conf in
order to use systemd-resolved.
[NO TESTS NEEDED]
Fixes: #10570
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Fix a race in the k8s-file logs driver. When "following" the logs,
Podman will print the container's logs until the end. Previously,
Podman logged until the state transitioned into something non-running
which opened up a race with the container still running, possibly in
the "stopping" state.
To fix the race, log until we've seen the wait event for the specific
container. In that case, conmon will have finished writing all logs to
the file, and Podman will read it until EOF.
Further tweak the integration tests for testing `logs -f` on a running
container. Previously, the test only checked for one of two lines
stating that there was a race. Indeed the race was in using `run --rm`
where a log file may be removed before we could fully read it.
Fixes: #10596
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
The checkpoint archive compression was hardcoded to `archive.Gzip`.
There have been requests to make the used compression algorithm
selectable. There was especially the request to not compress the
checkpoint archive to be able to create faster checkpoints when not
compressing it.
This also changes the default from `gzip` to `zstd`. This change should
not break anything as the restore code path automatically handles
whatever compression the user provides during restore.
Signed-off-by: Adrian Reber <areber@redhat.com>
The containers /etc/resolv.conf allways preserved the ipv6 nameserves
from the host even when the container did not supported ipv6. Check
if the cni result contains an ipv6 address or slirp4netns has ipv6
support enabled and only add the ipv6 nameservers when this is the case.
The test needs to have an ipv6 nameserver in the hosts /etc/hosts but we
should never mess with this file on the host. Therefore the test is
skipped when no ipv6 is detected.
Fixes#10158
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
While different filters are applied in conjunction, the same filter (but
with different values) should be applied in disjunction. This allows,
for instance, to query the events of two containers.
Fixes: #10507
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Using the gvproxy application on the host, we can now port forward from
the machine vm on the host. It requires that 'gvproxy' be installed in
an executable location. gvproxy can be found in the
containers/gvisor-tap-vsock github repo.
[NO TESTS NEEDED]
Signed-off-by: Brent Baude <bbaude@redhat.com>
Move the creation of the channel outside of the sub-routine to fix a
data race between writing the channel (implicitly by calling
EventChannel()) and using that channel in libimage.
[NO TESTS NEEDED]
Fixes: #10459
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
After #8906, there is a potential race condition in container
removal of running containers with `--rm`. Running containers
must first be stopped, which was changed to unlock the container
to allow commands like `podman ps` to continue to run while
stopping; however, this also means that the cleanup process can
potentially run before we re-lock, and remove the container from
under us, resulting in error messages from `podman rm`. The end
result is unchanged, the container is still cleanly removed, but
the `podman rm` command will seem to have failed.
Work around this by pinging the database after we stop the
container to make sure it still exists. If it doesn't, our job is
done and we can exit cleanly.
Signed-off-by: Matthew Heon <mheon@redhat.com>
When the containers.conf field "NetNS" is set to "Bridge" and the
"RootlessNetworking" field is set to "cni", Podman will now
handle rootless in the same way it does root - all containers
will be joined to a default CNI network, instead of exclusively
using slirp4netns.
If no CNI default network config is present for the user, one
will be auto-generated (this also works for root, but it won't be
nearly as common there since the package should already ship a
config).
I eventually hope to remove the "NetNS=Bridge" bit from
containers.conf, but let's get something in for Brent to work
with.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Fix a race in journald driver. Following the logs implies streaming
until the container is dead. Streaming happened in one goroutine,
waiting for the container to exit/die and signaling that event happened
in another goroutine.
The nature of having two goroutines running simultaneously is pretty
much the core of the race condition. When the streaming goroutines
received the signal that the container has exitted, the routine may not
have read and written all of the container's logs.
Fix this race by reading both, the logs and the events, of the container
and stop streaming when the died/exited event has been read. The died
event is guaranteed to be after all logs in the journal which guarantees
not only consistencty but also a deterministic behavior.
Note that the journald log driver now requires the journald event
backend to be set.
Fixes: #10323
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Fix a data race between creating and using the libimage-events channel.
[NO TESTS NEEDED] since it really depends on the scheduler and we
couldn't hit the race so far.
Fixes: #10459
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
ErrOCIRuntimeNotFound error is misleading. Try to make it more
understandable to the user that the OCI Runtime IE crun or runc is not
missing, but the command they attempted to run within the container is
missing.
[NO TESTS NEEDED] Regular tests should handle this.
Fixes: https://github.com/containers/podman/issues/10432
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Creating a macvlan network with the subnet or ipRange option should set
the ipam plugin type to `host-local`. We also have to insert the default
route.
Fixes#10283
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
When attempting to copy files into and out of running containers
within the host pidnamespace, the code was attempting to join the
host pidns again, and getting an error. This was causing the podman
cp command to fail. Since we are already in the host pid namespace,
we should not be attempting to join. This PR adds a check to see if
the container is in NOT host pid namespace, and only then attempts to
join.
Fixes: https://github.com/containers/podman/issues/9985
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Support UID, GID, Mode options for mount type secrets. Also, change
default secret permissions to 444 so all users can read secret.
Signed-off-by: Ashley Cui <acui@redhat.com>
This change adds the entry `host.containers.internal` to the `/etc/hosts`
file within a new containers filesystem. The ip address is determined by
the containers networking configuration and points to the gateway address
for the containers networking namespace.
Closes#5651
Signed-off-by: Baron Lenardson <lenardson.baron@gmail.com>
Docker allows relabeling of any volume passed in via -v, even
including named volumes. This normally isn't an issue at all,
given named volumes get the right label for container access
automatically, but this becomes an issue when volume plugins are
involved - these aren't managed by Podman, and may well be
unaware of SELinux labelling. We could automatically relabel
these volumes on creation, but I'm still reluctant to do that
(feels like it could break things). Instead, let's allow :z and
:Z to be used with named volumes, so users can explicitly request
relabel of a volume plugin-backed volume.
We also get :U at the same time. I don't see any real need for it
but it also doesn't seem to hurt, so I didn't bother disabling
it.
Fixes#10273
Signed-off-by: Matthew Heon <mheon@redhat.com>
Allow podman network reload to be run as rootless user. While it is
unlikely that the iptable rules are flushed inside the rootless cni
namespace, it could still happen. Also fix podman network reload --all
to ignore errors when a container does not have the bridge network mode,
e.g. slirp4netns.
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
We should create the /etc/mtab->/proc/mountinfo link
so that mount command will work within the container.
Docker does this by default.
Fixes: https://github.com/containers/podman/issues/10263
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The initial version of libimage changed the order of layers which has
now been restored to remain backwards compatible.
Further changes:
* Fix a bug in the journald logging which requires to strip trailing
new lines from the message. The system tests did not pass due to
empty new lines. Triggered by changing the default logger to
journald in containers/common.
* Fix another bug in the journald logging which embedded the container
ID inside the message rather than the specifid field. That surfaced
in a preceeding whitespace of each log line which broke the system
tests.
* Alter the system tests to make sure that the k8s-file and the
journald logging drivers are executed.
* A number of e2e tests have been changed to force the k8s-file driver
to make them pass when running inside a root container.
* Increase the timeout in a kill test which seems to take longer now.
Reasons are unknown. Tests passed earlier and no signal-related
changes happend. It may be CI VM flake since some system tests but
other flaked.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
When a container is automatically restarted due its restart policy and
the container used the slirp4netns netmode, the slirp4netns process
died. This caused the container to lose network connectivity.
To fix this we have to start a new slirp4netns process.
Fixes#8047
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
Developers asked for a deterministic field to verify if podman is
running via API or linked directly to libpod library.
$ podman info --format '{{.Host.ServiceIsRemote}}'
false
$ podman-remote info --format '{{.Host.ServiceIsRemote}}'
true
$ podman --remote info --format '{{.Host.ServiceIsRemote}}'
true
* docs/conf.py formatted via black
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Commit 728b73d7c4 introduced a regression. Containers created with a
previous version do no longer start successfully. The problem is that
the PidFile in the container config is empty for those containers. If
the PidFile is empty we have to set it to the previous default.
[NO TESTS NEEDED] We should investigate why the system upgrade test did
not caught this.
Fixes#10274
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
In the case of generate kube the auto-update labels will be converted into kube annotations and for play kube they will be converted back to labels since that's what podman understands
Signed-off-by: Eduardo Vega <edvegavalerio@gmail.com>
Revert : https://github.com/containers/podman/pull/9895
Turns out that if Docker is in --selinux-enabeled, it still relabels if
the user tells the system to, even if running a --privileged container
or if the selinux separation is disabled --security-opt label=disable.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Env var secrets are env vars that are set inside the container but not
commited to and image. Also support reading from env var when creating a
secret.
Signed-off-by: Ashley Cui <acui@redhat.com>
filepath.Dir in some cases returns `.` symbol and calling this function
again returns same result. In such cases this function
never returns and causes some operations to stuck forever.
Closes#10216
Signed-off-by: Slava Bacherikov <slava@bacher09.org>
extend to pods the existing check whether the cgroup is usable when
running as rootless with cgroupfs.
commit 17ce567c68 introduced the
regression.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
volatile containers are a storage optimization that disables *sync()
syscalls for the container rootfs.
If a container is created with --rm, then automatically set the
volatile storage flag as anyway the container won't persist after a
reboot or machine crash.
[NO TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>