477 Commits

Author SHA1 Message Date
Giuseppe Scrivano 49eb5af301
libpod: intermediate mount if UID not mapped into the userns
if the current user is not mapped into the new user namespace, use an
intermediate mount to allow the mount point to be accessible instead
of opening up all the parent directories for the mountpoint.

Closes: https://github.com/containers/podman/issues/23028

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
Giuseppe Scrivano 08a8429459
libpod: avoid chowning the rundir to root in the userns
so it is possible to remove the code to make the entire directory
world accessible.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-06-21 18:01:26 +02:00
Paul Holzinger e8ea1e7632
libpod: do not leak systemd hc startup unit timer
This fixes a regression added in commit 4fd84190b8. Because the name was
overwritten by the createTimer() call, the removeTransientFiles()
call removed the new timer and not the startup healthcheck. Then,
when the container was stopped, we leaked it because the wrong unit name
was in the state.

A new test has been added to ensure the logic works and we never leak
the systemd timers.

Fixes #22884

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-06-04 18:03:46 +02:00
Matthew Heon 046c0e5fc2 Only stop chowning volumes once they're not empty
When an empty volume is mounted into a container, Docker will
chown that volume appropriately for use in the container. Podman
does this as well, but there are differences in the details. In
Podman, a chown is presently a one-and-done deal; in Docker, it
will continue so long as the volume remains empty. Mount into a
dozen containers but never add content, and the chown occurs every
time. The chown is also linked to copy-up; it will always occur
when a copy-up occurred, despite the volume now not being empty.
This PR changes our logic to (mostly) match Docker's.

For some reason, the chowning also stops if the volume is chowned
to root at any point. This feels like a Docker bug, but as they
say, bug for bug compatible.

In retrospect, using bools for NeedsChown and NeedsCopyUp was a
mistake. Docker isn't actually tracking this stuff; they're just
doing a copy-up and permissions change unconditionally as long as
the volume is empty. They also have the two linked as one
operation, seemingly, despite happening at very different times
during container init. Replicating that in our stateful system is
nontrivial, hence the need for the new CopiedUp field. Basically,
we never want to chown a volume with contents in it, except if
that data is a result of a copy-up that resulted from mounting
into the current container. Tracking who did the copy-up is the
easiest way to do this.

Fixes #22571

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2024-05-22 17:47:01 -04:00
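The rule stated in commit 046c0e5fc2 above (never chown a volume with contents, unless those contents come from a copy-up triggered by the current container) can be sketched as a small decision function. This is a simplified illustration, not libpod's code: the `volumeState` type, its fields, and `shouldChown` are hypothetical names, and the "stops once chowned to root" quirk is deliberately not modeled.

```go
package main

import "fmt"

// volumeState is a hypothetical, simplified stand-in for the volume state
// discussed in the commit message. The field names are assumptions.
type volumeState struct {
	empty      bool   // the volume currently has no contents
	copiedUpBy string // ID of the container whose mount did the copy-up, "" if none
}

// shouldChown reports whether mounting the volume into container ctrID
// should chown it: always while the volume is empty, and otherwise only
// when the contents came from a copy-up done for this same container.
func shouldChown(v volumeState, ctrID string) bool {
	if v.empty {
		return true
	}
	return v.copiedUpBy != "" && v.copiedUpBy == ctrID
}

func main() {
	fmt.Println(shouldChown(volumeState{empty: true}, "ctr1"))        // empty volume
	fmt.Println(shouldChown(volumeState{copiedUpBy: "ctr1"}, "ctr1")) // our own copy-up
	fmt.Println(shouldChown(volumeState{copiedUpBy: "ctr2"}, "ctr1")) // someone else's data
}
```

Tracking who performed the copy-up, rather than a plain boolean, is what lets the second and third cases be told apart.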
Giuseppe Scrivano b06c58b4a5
libpod: wait for healthy on main thread
wait for the healthy status on the thread where the container lock is
held.  Otherwise, if it is performed from a goroutine, a different
thread is used (since the runtime.LockOSThread() call doesn't have any
effect), causing pthread_mutex_unlock() to fail with EPERM.

Closes: https://github.com/containers/podman/issues/22651

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-05-14 22:55:02 +02:00
openshift-merge-bot[bot] 0c09421f85
Merge pull request #22641 from mheon/handle_stopping_loop
Ensure that containers do not get stuck in stopping
2024-05-13 12:32:40 +00:00
Giuseppe Scrivano 8433a01aa2
Revert "container stop: kill conmon"
This reverts commit 909ab59419.

The workaround was added almost 5 years ago to work around an issue
with old conmon releases.  It is safe to assume such ancient conmon
releases are not used anymore.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-05-09 22:49:14 +02:00
Matt Heon 3fa8e98a31 Ensure that containers do not get stuck in stopping
The scenario for inducing this is as follows:
1. Start a container with a long stop timeout and a PID1 that
   ignores SIGTERM
2. Use `podman stop` to stop that container
3. Simultaneously, in another terminal, kill -9 `pidof podman`
   (the container is now in ContainerStateStopping)
4. Now kill that container's Conmon with SIGKILL.
5. No commands are able to move the container from Stopping to
   Stopped now.

The cause is a logic bug in our exit-file handling logic. Conmon
being dead without an exit file causes no change to the state.
Add handling for this case that tries to clean up, including
stopping the container if it still seems to be running.

Fixes #19629

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-05-09 11:17:24 -04:00
Matt Heon 4fd84190b8 Add a random suffix to healthcheck unit names
Systemd dislikes it when we rapidly create and remove a transient
unit. Solution: If we change the name every time, it's different
enough that systemd is satisfied and we stop having errors trying
to restart the healthcheck.

Generate a random 32-bit integer, and add it (formatted as hex)
to the end of the unit name to do this. As a result, we now have
to store the unit name in the database, but it does make
backwards compat easy - if the unit name in the DB is empty, we
revert to the old behavior because the timer was created by old
Podman.

Should resolve RHEL-26105

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-05-03 11:45:05 -04:00
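A minimal sketch of the naming scheme from commit 4fd84190b8 above: append a random 32-bit integer, formatted as hex, to the unit name, and fall back to the old name when nothing is stored. The function names, the `-` separator, the use of `crypto/rand`, and the assumption that the legacy unit name equals the container ID are illustrative choices, not taken from the commit.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// unitNameWithSuffix returns base with a random 32-bit value appended in
// hex, so that each transient unit gets a fresh name.
func unitNameWithSuffix(base string) (string, error) {
	var buf [4]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%x", base, binary.BigEndian.Uint32(buf[:])), nil
}

// storedOrLegacyName returns the unit name recorded in the database, or the
// legacy name (assumed here to be the container ID) when the stored value
// is empty because the timer was created by an older version.
func storedOrLegacyName(ctrID, stored string) string {
	if stored == "" {
		return ctrID
	}
	return stored
}

func main() {
	name, err := unitNameWithSuffix("0123abcd")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
	fmt.Println(storedOrLegacyName("0123abcd", ""))
}
```

Because the generated name cannot be recomputed later, it has to be persisted, which is why the commit stores it in the database.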
Paul Holzinger 83dbbc3a51
Replace golang.org/x/exp/slices with slices from std
Use "slices" from the standard library, this package was added in go
1.21 so we can use it now.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-04-23 11:16:40 +02:00
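For reference, the standard-library package mentioned in commit 83dbbc3a51 above is imported as plain `slices` (Go 1.21 or newer). The example below is only an illustration; the helper name and the sample values are made up.

```go
package main

import (
	"fmt"
	"slices" // standard library since Go 1.21, replaces golang.org/x/exp/slices
)

// isKnownRuntime is a hypothetical helper showing a membership test
// with slices.Contains.
func isKnownRuntime(runtimes []string, name string) bool {
	return slices.Contains(runtimes, name)
}

func main() {
	runtimes := []string{"crun", "runc"}
	fmt.Println(isKnownRuntime(runtimes, "crun")) // true
	fmt.Println(isKnownRuntime(runtimes, "kata")) // false
}
```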
openshift-merge-bot[bot] c2cadfb5c5
Merge pull request #22322 from mheon/update_the_config
Make `podman update` changes persistent
2024-04-22 07:50:48 +00:00
Giuseppe Scrivano 5656ad40b1
libpod: use fileutils.(Le|E)xists
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2024-04-19 09:52:14 +02:00
Matt Heon 482ef7bfcf Add support for updating restart policy
This is something Docker does, and we did not do until now. Most
difficult/annoying part was the REST API, where I did not really
want to modify the struct being sent, so I made the new restart
policy parameters query parameters instead.

Testing was also a bit annoying, because testing restart policy
always is.

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-04-17 08:23:51 -04:00
Matt Heon be3f075402 Make `podman update` changes persistent
The logic here is more complex than I would like, largely due to
the behavior of `podman inspect` for running containers. When a
container is running, `podman inspect` will source as much as
possible from the OCI spec used to run that container, to grab
up-to-date information on things like devices. We don't want to
change this, it's definitely the right behavior, but it does make
updating a running container inconvenient: we have to rewrite the
OCI spec as part of the update to make sure that `podman inspect`
will read the correct resource limits.

Also, make update emit events. Docker does it, we should as well.

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-04-17 08:23:50 -04:00
lvyaoting 59ee130048 chore: fix function names in comment
Signed-off-by: lvyaoting <lvyaoting@outlook.com>
2024-04-08 11:36:50 +08:00
Paul Holzinger 15b8bb72a8
libpod: restart always reconfigure the netns
Always tear down the network; trying to reuse the netns has caused
a significant number of bugs in this code. It also never worked
for containers with user namespaces. So once and for all simplify this
by never reusing the netns. Originally this was done to have a faster
restart of containers but with netavark now we are much faster so it
shouldn't be that noticeable in practice. It also makes more sense to
reconfigure the netns as it is likely that the container exited due
to some broken network state, in which case reusing would just cause more
harm than good.

The main motivation for this change was the pasta change to use
--dns-forward by default, as the restarted container had no idea what
nameserver to use because pasta just kept running.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-03-19 12:21:18 +01:00
Urvashi Mohnani 667311c7d5 Use persist dir for oom file
Conmon writes the exit file and oom file (if container
was oom killed) to the persist directory. This directory
is retained across reboots as well.
Update podman to create a persist-dir/ctr-id for the exit
and oom files for each container to be written to. The oom
state of container is set after reading the files
from the persist-dir/ctr-id directory.
The exit code still continues to read the exit file from
the exits directory.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2024-02-12 09:13:39 -05:00
Matt Heon 72f1617fac Bump Go module to v5
Moving from Go module v4 to v5 prepares us for public releases.

Move done using gomove [1] as with the v3 and v4 moves.

[1] https://github.com/KSubedi/gomove

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-02-08 09:35:39 -05:00
openshift-merge-bot[bot] 5e081e47aa
Merge pull request #21332 from rhatdan/timezone
Reuse timezone code from containers/common
2024-02-08 14:13:40 +00:00
openshift-merge-bot[bot] 8a6165e592
Merge pull request #21522 from Luap99/restart-userns
fix userns + restart policy with slirp4netns
2024-02-08 10:41:54 +00:00
Matt Heon 3cf2f8ccf4 Handle more states during refresh
We were preserving ContainerStateExited, which is better than
nothing, but definitely not correct. A container that ran at any
point during the last boot should be moved to Exited state to
preserve the fact that it was run at least once. This means we
have to convert Running, Stopped, Stopping, Paused containers to
exited as well.

Signed-off-by: Matt Heon <mheon@redhat.com>
2024-02-07 08:33:56 -05:00
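The state mapping described in commit 3cf2f8ccf4 above can be sketched as follows. The type and constant names are hypothetical, and resetting every other state to a "configured" state is an assumption made for the example; the commit message only covers the states that become Exited.

```go
package main

import "fmt"

// containerState is a hypothetical, reduced set of container states.
type containerState int

const (
	stateConfigured containerState = iota
	stateCreated
	stateRunning
	stateStopped
	stateStopping
	statePaused
	stateExited
)

// stateAfterRefresh maps the state recorded before a reboot to the state
// used after the refresh. Any container that ran during the last boot is
// moved to Exited; everything else is reset to Configured (an assumption).
func stateAfterRefresh(s containerState) containerState {
	switch s {
	case stateRunning, stateStopped, stateStopping, statePaused, stateExited:
		return stateExited
	default:
		return stateConfigured
	}
}

func main() {
	fmt.Println(stateAfterRefresh(statePaused) == stateExited)      // ran during last boot
	fmt.Println(stateAfterRefresh(stateCreated) == stateConfigured) // never ran
}
```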
Paul Holzinger 7d15bc2efb
fix userns + restart policy with slirp4netns
Currently we deadlock in the slirp4netns setup code as we try to
configure a nonexistent netns. The problem happens because we tear
down the netns in the userns case correctly since commit bbd6281ecc but
that introduces this slirp4netns problem. The code does a proper new
network setup later so we should only use the short cut when not in a
userns.

Fixes #21477

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-02-06 13:50:07 +01:00
Daniel J Walsh fcae702205
Reuse timezone code from containers/common
Replaces: https://github.com/containers/podman/pull/21077

[NO NEW TESTS NEEDED] Existing tests should handle this.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2024-02-06 07:09:16 -05:00
Oleksandr Redko 8bdf77aa20 Refactor: replace StringInSlice with slices.Contains
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2024-01-05 16:25:56 +02:00
Oleksandr Redko 2a2d0b0e18 chore: delete obsolete // +build lines
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2024-01-04 11:53:38 +02:00
Dan Čermák 5c7f745468
Remove deprecated field ContainerState.NetworkStatusOld
This field drags in a dependency on CNI and thereby blocks us from disabling CNI
support via a build tag

[NO NEW TESTS NEEDED]

Signed-off-by: Dan Čermák <dcermak@suse.com>
2023-12-12 17:09:39 +01:00
Daniel J Walsh c8f262fec9
Use idtools.SafeChown and SafeLchown everywhere
If we get an error chowning a file or directory to a UID/GID pair
for something like ENOTSUP or EPERM, then we should ignore it as long as the UID/GID
pair on disk is correct.

Fixes: https://github.com/containers/podman/issues/20801

[NO NEW TESTS NEEDED]

Since this is difficult to test and existing tests should be sufficient
to ensure no regression.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-11-27 20:41:56 -05:00
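The idea behind commit c8f262fec9 above, tolerating a failed chown when ownership on disk is already correct, might look roughly like the sketch below. This is not the idtools implementation; `safeChown` and `chownTempFileToSelf` are hypothetical names, and the code assumes a Unix-like system where `FileInfo.Sys()` yields a `*syscall.Stat_t`.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// safeChown tries to chown path and ignores a failure if the file already
// has the requested owner and group on disk.
func safeChown(path string, uid, gid int) error {
	err := os.Chown(path, uid, gid)
	if err == nil {
		return nil
	}
	// The chown failed (for example with EPERM or ENOTSUP). Check whether
	// the ownership on disk already matches what was requested.
	info, statErr := os.Stat(path)
	if statErr != nil {
		return err
	}
	if st, ok := info.Sys().(*syscall.Stat_t); ok {
		if int(st.Uid) == uid && int(st.Gid) == gid {
			return nil
		}
	}
	return err
}

// chownTempFileToSelf exercises safeChown on a temporary file using the
// current user's own UID and GID.
func chownTempFileToSelf() error {
	f, err := os.CreateTemp("", "safechown")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	return safeChown(f.Name(), os.Getuid(), os.Getgid())
}

func main() {
	if err := chownTempFileToSelf(); err != nil {
		fmt.Println("chown failed:", err)
		return
	}
	fmt.Println("ownership is as requested")
}
```

The original error is returned whenever the on-disk ownership cannot be confirmed, so real failures are still reported.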
Daniel J Walsh ddd6cdfd77
Ignore SELinux relabel on unsupported file systems
We were ignoring relabel requests on certain unsupported
file systems and not on others; this changes to consistently
logrus.Debug ENOTSUP file systems.

Fixes: https://github.com/containers/podman/discussions/20745

Still needs some work on the Buildah side.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-11-22 09:25:38 -05:00
openshift-ci[bot] 77d2658201
Merge pull request #20369 from cgiradkar/Issue-16759-docs
Define better error message for container name conflicts with external storage
2023-10-30 10:22:00 +00:00
Valentin Rothberg e966c86d98 container.conf: support attributed string slices
All `[]string`s in containers.conf have now been migrated to attributed
string slices which require some adjustments in Buildah and Podman.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-10-27 12:44:33 +02:00
Paul Holzinger bad25da92e
libpod: add !remote tag
This should never be pulled into the remote client.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-24 12:11:34 +02:00
Daniel J Walsh 8876380af9
not mounted layers should be reported as info not error
We are seeing a potential race condition that produces a
message about a removed container, which could also be
caused by a non-mounted container. This change
should clarify which is causing it.

Also, if the container does not exist, just warn the user
instead of reporting an error; there is not much the user can do.

Fixes: https://github.com/containers/podman/issues/19702

[NO NEW TESTS NEEDED]

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-10-23 16:25:13 -04:00
Chetan Giradkar 2d65e57ae6 Define better error message for container name conflicts with external storage.
Updated the error message to suggest that the user pass the --replace option to instruct Podman to replace the existing external container with a newly created one.

closes #16759

Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
2023-10-18 12:52:02 +01:00
Paul Holzinger bbd6281ecc
libpod: restart+userns cleanup netns correctly
When a userns and netns is used we need to let the runtime create the
netns otherwise the netns is not owned by the right userns and thus
the capabilities would not be correct.

The current restart logic tries to reuse the netns which is fine if no
userns is used, but when one is used we set up a new netns (which is
correct) but forgot to clean up the old netns. This resulted in leaked
network namespaces and because no teardown was ever called leaked ipam
assignments, thus a quickly restarting container will run out of ip
space very fast.

Fixes #18615

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-10-17 17:25:50 +02:00
Giuseppe Scrivano 8ac2aa7938
container: always check if mountpoint is mounted
when running as a service, the c.state.Mounted flag could get out of
sync if the container is cleaned up through the cleanup process.

To avoid this, always check if the mountpoint is really present before
skipping the mount.

[NO NEW TESTS NEEDED]

Closes: https://github.com/containers/podman/issues/17042

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-10-09 17:20:22 +02:00
Giuseppe Scrivano b8f6a12d01
libpod: create the cgroup pod before containers
When a container is created and it is part of a pod, we ensure the pod
cgroup exists so limits can be applied on the pod cgroup.

Closes: https://github.com/containers/podman/issues/19175

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-09-08 14:58:48 +02:00
Valentin Rothberg d70f15cc0a start(): don't defer event
We'd otherwise emit the start event long after the actual start of the
container when --sdnotify=healthy.  I missed adding the change to commit
0cfd12786f.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-07-26 13:57:37 +02:00
Valentin Rothberg 0cfd12786f add "healthy" sdnotify policy
Add a new "healthy" sdnotify policy that instructs Podman to send the
READY message once the container has turned healthy.

Fixes: #6160
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-07-25 11:17:44 +02:00
Doug Rabson f8213a6d53 libpod: don't make a broken symlink for /etc/mtab on FreeBSD
This file has not been present in BSD systems since 2.9.1 BSD and as far
as I remember /proc/mounts has never existed on BSD systems.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <dfr@rabson.org>
2023-07-10 12:41:41 +01:00
Fang-Pen Lin dd81f7ac61
Pass in correct cwd value for hooks exe
Signed-off-by: Fang-Pen Lin <hello@fangpenlin.com>
2023-06-26 23:49:08 -07:00
Valentin Rothberg db37d66cd1 make image listing more resilient
Handle more TOCTOUs operating on listed images.  Also pull in
containers/common/pull/1520 and containers/common/pull/1522 which do the
same on the internal layer tree.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2216700
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-06-26 16:34:26 +02:00
Paul Holzinger 614c962c23
use libnetwork/slirp4netns from c/common
Most of the code moved there, so use it from there and remove it here.

Some extra changes are required here. This is a bit of a mess. The pipe
handling makes this a bit more difficult.

[NO NEW TESTS NEEDED] This is just a rework, existing tests must pass.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-22 11:16:13 +02:00
Paul Holzinger 810c97bd85
libpod: write /etc/{hosts,resolv.conf} once
My PR[1] to remove PostConfigureNetNS is blocked on other things, so I want
to split this change out. It reduces the complexity when generating
/etc/hosts and /etc/resolv.conf as now we always write these files after
we set up the network. This means we can get the actual IP from the netns,
which is important.

[NO NEW TESTS NEEDED] This is just a rework.

[1] https://github.com/containers/podman/pull/18468

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-21 15:33:42 +02:00
Paul Holzinger 34c258b419
libpod: fix timezone handling
The current way of bind mounting the host timezone file has problems.
Because /etc/localtime in the image may exist and be a symlink into
/usr/share/zoneinfo, it will overwrite the target file. That confuses
timezone parsers, especially Java, where this approach does not work at
all. So we end up with a link which does not reflect the actual truth.

The better way is to just change the symlink in the image like it is
done on the host. However because not all images ship tzdata we cannot
rely on that either. So now we do both: when tzdata is installed we
use the symlink, and if not we keep the current way of copying the host
timezone file in the container to /etc/localtime.

Also note that we need to rebuild the systemd image to include tzdata in
order to test this as our images do not contain the tzdata by default.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2149876

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-06-01 11:04:13 +02:00
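The two-path approach from commit 34c258b419 above (symlink when the image ships tzdata, copy the host file otherwise) can be sketched like this. It is a simplified illustration under several assumptions: `setLocaltime` and `demo` are hypothetical names, the fallback writes the file directly instead of bind mounting it, and paths are joined naively without the symlink-escape hardening a real container engine needs.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// setLocaltime configures <rootfs>/etc/localtime for zone tz and reports
// which strategy was used: "symlink" or "copy".
func setLocaltime(rootfs, hostZoneDir, tz string) (string, error) {
	target := filepath.Join("/usr/share/zoneinfo", tz)
	localtime := filepath.Join(rootfs, "etc", "localtime")
	if err := os.MkdirAll(filepath.Dir(localtime), 0o755); err != nil {
		return "", err
	}
	// Drop whatever file or symlink the image shipped.
	if err := os.Remove(localtime); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return "", err
	}
	// tzdata present in the image: point the symlink at the zone file.
	if _, err := os.Stat(filepath.Join(rootfs, target)); err == nil {
		return "symlink", os.Symlink(target, localtime)
	}
	// No tzdata in the image: fall back to copying the host zone file.
	data, err := os.ReadFile(filepath.Join(hostZoneDir, tz))
	if err != nil {
		return "", err
	}
	return "copy", os.WriteFile(localtime, data, 0o644)
}

// demo builds a throwaway rootfs, optionally with tzdata, and runs setLocaltime.
func demo(withTzdata bool) (string, error) {
	rootfs, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(rootfs)
	hostZones, err := os.MkdirTemp("", "zoneinfo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(hostZones)
	if err := os.WriteFile(filepath.Join(hostZones, "UTC"), []byte("TZif"), 0o644); err != nil {
		return "", err
	}
	if withTzdata {
		dir := filepath.Join(rootfs, "usr", "share", "zoneinfo")
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(filepath.Join(dir, "UTC"), []byte("TZif"), 0o644); err != nil {
			return "", err
		}
	}
	return setLocaltime(rootfs, hostZones, "UTC")
}

func main() {
	for _, withTzdata := range []bool{true, false} {
		mode, err := demo(withTzdata)
		fmt.Println(withTzdata, mode, err)
	}
}
```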
Miloslav Trmač f556e58bb0 Consolidate error handling in Container.cleanupStorage
Use a shared helper instead of copy&pasting the handling
of cleanupErr EIGHT times.

This changes the wording of logged error text, and the error
in one case, a bit.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-05-22 19:14:06 +02:00
Miloslav Trmač 4969c552ec Fix reporting errors on container unmount
[NO NEW TESTS NEEDED]
... because testing this would require us to intentionally
create an inconsistent state, which should ideally not be possible...
(and because at this point I don't even know what the reported failure
was.)

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-05-22 19:11:56 +02:00
Nalin Dahyabhai c400cc7ead libpod/Container.rootFsSize(): use recorded image sizes
In rootFsSize(), instead of calculating the size of the diff for every
layer of the container's base image, ask the storage library for the sum
of the values it recorded when it first wrote those layers.

In a similar fashion, teach rwSize() to use the library's
ContainerSize() method instead of trying to roll its own.

Replace calls to pkg/util.SizeOfPath() with calls to
github.com/containers/storage/pkg/directory.Size(), which does the same
thing.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
2023-05-09 09:33:37 -04:00
Giuseppe Scrivano 70870895b7
libpod: improve errors management in cleanupStorage
fix some issues with the handling of errors, we print an error only
when there is already one set to be returned.  Also the first error is
not printed, since it is reported back to the caller of the function.

Improve some messages with more context that can be helpful when
things go wrong.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-04-28 11:51:06 +02:00
Giuseppe Scrivano 5592dc12f9
libpod: report unmount idmapped rootfs errors
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-04-28 11:46:34 +02:00
Paul Holzinger edb64f8a76
libpod: stop containers with --restart=always
Commit 1ab833fb73 improved the situation but it is still not enough.
If you run short-lived containers with --restart=always podman is
basically permanently restarting them. The only way to stop this is
podman stop. However podman stop does not do anything when the
container is already in a not running state. While this makes sense we
should still mark the container as explicitly stopped by the user.

Together with the change in shouldRestart() which now checks for
StoppedByUser this makes sure the cleanup process is not going to start
it back up again.

A simple reproducer is:
```
podman run --restart=always --name test -d alpine true
podman stop test
```
Then check if the container is still running. The behavior is very
flaky; it took me about 20 podman stop tries before I finally hit the
correct window where it was stopped permanently.
With this patch it worked on the first try.

Fixes #18259

[NO NEW TESTS NEEDED] This is super flaky and hard to correctly test
in CI. My ginkgo v2 work seems to trigger this in play kube tests so
that should catch at least some regressions. Also this may be something
that should be tested at podman test days by users (#17912).

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-20 11:23:05 +02:00
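The interaction described in commit edb64f8a76 above, where an explicit stop by the user overrides the restart policy, can be sketched as a small predicate. The signature, the `restartPolicy` type, and the set of policies are assumptions for illustration; only the names shouldRestart() and StoppedByUser come from the commit message.

```go
package main

import "fmt"

// restartPolicy is a hypothetical, reduced set of restart policies.
type restartPolicy string

const (
	policyNo        restartPolicy = "no"
	policyAlways    restartPolicy = "always"
	policyOnFailure restartPolicy = "on-failure"
)

// shouldRestart reports whether the cleanup process should start the
// container again. A container explicitly stopped by the user is never
// restarted, whatever its policy says.
func shouldRestart(policy restartPolicy, exitCode int, stoppedByUser bool) bool {
	if stoppedByUser {
		return false
	}
	switch policy {
	case policyAlways:
		return true
	case policyOnFailure:
		return exitCode != 0
	default:
		return false
	}
}

func main() {
	// A short-lived container with --restart=always keeps restarting...
	fmt.Println(shouldRestart(policyAlways, 0, false))
	// ...until the user stops it, even if it had already exited by then.
	fmt.Println(shouldRestart(policyAlways, 0, true))
}
```

Checking the flag first is what closes the race in the reproducer: the stop only needs to record the user's intent, not catch the container while it is running.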