This factors out the check for cgroupsv2 unified mode into a
platform-specific file and stops podman from generating a (harmless)
warning every time it is run on FreeBSD.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
I believe the previous code meant to use cmd.Run instead of cmd.Start.
The issue is that cmd.Start returns before the command has finished
executing, so the conditional body checking for the stderr of the
command never gets executed.
Move the cmd.Start call into its own conditional, which checks whether
the process could be started. Then consume stderr, check for some
specific strings in the output, and finally continue with the rest of
the code.
Signed-off-by: Keith Johnson <kj@ubergeek42.com>
Fix the following issues:
- create container API handler ignores Annotations from HostConfig
- inspect container API handler does not provide Annotations as
part of HostConfig
Signed-off-by: diplane <diplane3d@gmail.com>
Always tear down the network; trying to reuse the netns has caused
a significant number of bugs in this code. It also never worked
for containers with user namespaces. So, once and for all, simplify this
by never reusing the netns. Originally this was done to make container
restarts faster, but with netavark we are now much faster, so it
shouldn't be that noticeable in practice. It also makes more sense to
reconfigure the netns, as it is likely that the container exited due to
some broken network state, in which case reusing it would just cause
more harm than good.
The main motivation for this change was the pasta change to use
--dns-forward by default. Because pasta just kept running, the restarted
container had no idea which nameserver to use.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
By default we just ignored any localhost resolvers; this is problematic
for anyone with more complicated DNS setups, e.g. split DNS with
systemd-resolved. To address this we now make use of the built-in DNS
proxy in pasta. As such, we now need to set the default nameserver IP.
A second change is the option to exclude certain IPs when generating the
host.containers.internal IP. With that, we no longer set it to the same
IP as is used in the netns. The fix is not perfect, as it could mean that
on a system with a single IP we no longer add the entry; however, given
that the previous entry was incorrect anyway, this seems like the better
behavior.
Fixes #22044
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The annotations should be maintained by CRI-O itself to decouple the
projects from a dependency perspective.
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Commit 03f6589f3 added basic support for pull-error event from libimage
but it contains several problems:
1. Storing the error as an error type prevents it from being unmarshalled,
thus change it to a string
2. The error was never propagated from the libimage event to the podman
event struct
3. The error message was not wired into the CLI and API
This commit fixes these problems.
Fixes #21458
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When performing a system reset with containers that run something where
a soft kill won't work (like sleep), containers will wait 10 seconds
before being terminated with SIGKILL. But for a forceful action like
system reset, we should set no timeout at all so containers stop
quickly and do not wait on a timeout.
Fixes#21874
Signed-off-by: Brent Baude <bbaude@redhat.com>
This vendors the latest c/common version, including making Pasta
the default rootless network provider. That broke a number of
tests, which have been fixed as part of this PR.
Also includes a change to network stats logic, which simplifies
the code a bit and makes it actually work with Pasta.
Signed-off-by: Matt Heon <mheon@redhat.com>
The ID field in the Event struct is duplicated for no reason: since the
Details struct is directly embedded in the Event, the ID field is
basically duplicated on the same level. Removing this one
should be safe and make no change to the resulting JSON.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This effectively fixes errors like "unable to upgrade to tcp, received
409" (see #19930) in the special case where podman itself is running
rootful but inside a container which itself is rootless.
[NO NEW TESTS NEEDED]
Signed-off-by: Romain Geissler <romain.geissler@amadeus.com>
The reserved annotation io.podman.annotations.volumes-from is made public to let users define volumes-from, so that one container can mount volumes of other containers.
The annotation format is: io.podman.annotations.volumes-from/tgtCtr: "srcCtr1:mntOpts1;srcCtr2:mntOpts2;..."
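The annotation value can be parsed roughly as shown below. This sketch assumes that entries are separated by `;`, that the container name is separated from its options by the first `:`, and that options are optional. `parseVolumesFrom` and `volumesFromSource` are hypothetical names. Splitting entries on `;` keeps comma-separated mount options such as `ro,z` intact.

```go
package main

import (
	"fmt"
	"strings"
)

// volumesFromSource is one entry of the annotation value (hypothetical type).
type volumesFromSource struct {
	Container string
	Options   string // may itself contain commas, e.g. "ro,z"
}

// parseVolumesFrom splits "srcCtr1:mntOpts1;srcCtr2:mntOpts2" into entries.
func parseVolumesFrom(value string) []volumesFromSource {
	var out []volumesFromSource
	for _, entry := range strings.Split(value, ";") {
		if entry == "" {
			continue // tolerate a trailing separator
		}
		ctr, opts, _ := strings.Cut(entry, ":")
		out = append(out, volumesFromSource{Container: ctr, Options: opts})
	}
	return out
}

func main() {
	key := "io.podman.annotations.volumes-from/tgtCtr"
	fmt.Println(key, parseVolumesFrom("db:ro,z;cache"))
}
```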
Fixes: containers#16819
Signed-off-by: Vikas Goel <vikas.goel@gmail.com>
If the target mount path already exists and the container uses a user
namespace, correctly map the target UID/GID to the host values before
attempting a chown.
Closes: https://github.com/containers/podman/issues/21608
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Conmon writes the exit file and the oom file (if the container
was OOM killed) to the persist directory. This directory
is retained across reboots as well.
Update podman to create a persist-dir/ctr-id directory for each
container, into which the exit and oom files are written. The OOM
state of the container is set after reading the files
from the persist-dir/ctr-id directory.
The exit code is still read from the exit file in
the exits directory.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
The move was done using gomove [1], as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
We were preserving ContainerStateExited, which is better than
nothing, but definitely not correct. A container that ran at any
point during the last boot should be moved to the Exited state to
preserve the fact that it was run at least once. This means we
have to convert Running, Stopped, Stopping, and Paused containers to
Exited as well.
Signed-off-by: Matt Heon <mheon@redhat.com>
When the interface_name attribute in the containers.conf file is set to "device", set the interface names inside containers to the network_interface names of the respective networks.
The change applies to macvlan and ipvlan networks only. The interface_name attribute value has no impact on any other types of networks.
If the interface name is set in the user request, that takes precedence.
Fixes: #21313
Signed-off-by: Vikas Goel <vikas.goel@gmail.com>
This mirrors how the Docker API handles things, allowing us to be
more compatible with Docker and more verbose on the Libpod API.
Stats are given per network interface in the container, but
still aggregated for `podman stats` and `podman pod stats`
display (so the CLI does not change, only the Libpod and Compat
APIs).
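The CLI-side aggregation amounts to summing the per-interface counters. This sketch uses a hypothetical `netStats` type with only rx/tx byte counts. The real API structures carry more fields.

```go
package main

import "fmt"

// netStats is a hypothetical per-interface counter set.
type netStats struct {
	RxBytes, TxBytes uint64
}

// aggregate sums per-interface stats into one total, so a display that
// shows a single network column keeps working unchanged.
func aggregate(perIface map[string]netStats) netStats {
	var total netStats
	for _, s := range perIface {
		total.RxBytes += s.RxBytes
		total.TxBytes += s.TxBytes
	}
	return total
}

func main() {
	stats := map[string]netStats{
		"eth0": {RxBytes: 1500, TxBytes: 900},
		"eth1": {RxBytes: 500, TxBytes: 100},
	}
	fmt.Printf("%+v\n", aggregate(stats)) // {RxBytes:2000 TxBytes:1000}
}
```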
Signed-off-by: Matt Heon <mheon@redhat.com>
During system shutdown, Podman should go down gracefully, meaning
that we have time to spawn cleanup processes which remove any
containers set to autoremove. Unfortunately, this isn't always
the case. If we get a SIGKILL because the system is going down
immediately, we can't recover from this, and the autoremove
containers are not removed.
However, we can pick up any leftover autoremove containers when
we refresh the DB state, which is the first thing Podman does
after a reboot. By detecting any autoremove containers that have
actually run (a container that was created but never run doesn't
need to be removed) at that point and removing them, we keep the
fresh boot clean, even if Podman was terminated abnormally.
Fixes #21482
[NO NEW TESTS NEEDED] This requires a reboot to realistically
test.
Signed-off-by: Matt Heon <mheon@redhat.com>
Currently we deadlock in the slirp4netns setup code as we try to
configure a non-existent netns. The problem happens because, since
commit bbd6281ecc, we correctly tear down the netns in the userns case,
but that introduces this slirp4netns problem. The code does a proper new
network setup later, so we should only use the shortcut when not in a
userns.
Fixes #21477
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Podman v5 will not support cgroups-v1. This commit will print a warning
if it detects a cgroups-v1 system. The warning can be hidden by setting
envvar `PODMAN_CGROUPSV1_WARNING`.
This warning is patched out for RHEL 9 builds as cgroups-v1 will still
be supported on RHEL 9 systems.
Resolves: https://issues.redhat.com/browse/RUN-1957
[NO NEW TESTS NEEDED]
Co-authored-by: Ed Santiago <santiago@redhat.com>
Co-authored-by: Sascha Grunert <sgrunert@redhat.com>
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
The current field separator of the inspect annotation, a comma, conflicts with the mount options of --volumes-from, as the mount options themselves can be comma-separated.
Signed-off-by: Vikas Goel <vikas.goel@gmail.com>
The update to runc broke creation of devices for containers in
the pod cgroup. We don't support the device cgroup for pods at
present, so just disable it for now, resolving the issue.
Thanks to Giuseppe for finding this one!
[NO NEW TESTS NEEDED] This is a fix for broken tests
Signed-off-by: Matt Heon <mheon@redhat.com>
When inspecting a container that does not define any health check, the Health field should be nil. This matches Docker behavior.
Signed-off-by: Ashley Cui <acui@redhat.com>
This is one of the breaking changes in Podman 5.0: removing the
ability to create new instances of the old Bolt database. This
does not remove support for the database entirely, as existing
Bolt databases will still be usable, but all new installs will
use SQLite after this point - if Bolt is forced by config, we'll
just error.
We don't have plans to outright remove the Bolt code. If that
were to happen, it'd be Podman 6.0 at least, and a significant
enough change it'd warrant a lot of discussion and planning. We
do intend to start winding down support of BoltDB, though, and
new features may be added only to SQLite from here on.
I have added an escape hatch via an undocumented environment
variable that allows us to continue testing BoltDB in CI (and, if
necessary, locally) but I don't want this to be used for any
purpose except continued testing of the old DB to ensure we don't
break it.
Signed-off-by: Matt Heon <mheon@redhat.com>
When preparing container inspection output, ensure we actually have masked paths to work with.
These are only available on Linux, and we can no longer assume Linux, as we now also support FreeBSD.
Fixes #21117
Signed-off-by: Ben Cooksley <bcooksley@kde.org>
Initial impetus was #20958 (ps --format .Label abc). This is
a complicated solution to a simple-seeming problem.
The problem: .Label is a cobra *function*, something I did not
know about nor handle.
Solution: recognize cobra functions. Switch to __complete,
not __completeNoDesc, so we can see the number of arguments
required. Invent new man-page format for documenting functions.
And, finally, start enforcing how functions (and cobra structs)
are documented.
This discovered a never-used completion function, .Recycle(),
in podman-events. Remove it.
[NO NEW TESTS NEEDED] - the .go change is an excision of dead code.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Before this, for some special Podman commands (system reset,
system migrate, system renumber), Podman would create a first
Libpod runtime to do initialization and flag parsing, then stop
that runtime and create an entirely new runtime to perform the
actual task. This is an artifact of the pre-Podman 2.0 days, when
there was almost no indirection between Libpod and the CLI, and
we only used one runtime because we didn't need a second runtime
for flag parsing and basic init.
This system was clunky, and apparently, very buggy. When we
migrated to SQLite, some logic was introduced where we'd select a
different database location based on whether or not Libpod's
StaticDir was manually set - which differed between the first
invocation of Libpod and the second. So we'd get a different
database for some commands (like `system reset`) and they would
not be able to see existing containers, meaning they would not
function properly.
The immediate cause is obviously the SQLite behavior, but I'm
certain there's a lot more baggage hiding behind this multiple
Libpod runtime logic, so let's just refactor it out. It doesn't
make sense, and complicates the code. Instead, make Reset,
Renumber, and Migrate methods of the libpod Runtime. For Reset
and Renumber, we can shut the runtime down afterwards to achieve
the desired effect (no valid runtime after). Then pipe all of
them through the ContainerEngine so cmd/podman can access them.
As part of this, remove the SystemEngine part of pkg/domain. This
was supposed to encompass these "special" commands, but every
command in SystemEngine is actually a ContainerEngine command.
Reset, Renumber, Migrate - they all need a full Libpod and access
to all containers. There's no point to a separate engine if it
just wraps Libpod in the exact same way as ContainerEngine. This
consolidation saves us a bit more code and complexity.
Signed-off-by: Matt Heon <mheon@redhat.com>
It looks like we had some logic for this from #10789 but it does
not appear to have ever worked; we can't pull external containers
out of the DB, so the ContainerRm call failed unconditionally.
Instead, just handle them in Libpod when we're removing images.
We're removing every image, so setting Force when removing images
should get rid of all external containers. It's a little later in
the process than the current (nonfunctional) solution is but I
can't think of a reason why that would be bad.
[NO NEW TESTS NEEDED] We do not currently test `system reset`.
We should probably reevaluate that at some point this year.
Fixes https://issues.redhat.com/browse/RHEL-21261
Signed-off-by: Matt Heon <mheon@redhat.com>
strings.Cut, added in Go 1.18, is a cleaner and more performant API than SplitN(_, _, 2).
Previously applied this refactoring to buildah:
https://github.com/containers/buildah/pull/5239
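The following illustrates the kind of rewrite involved. It is not code taken from the PR, and `parseKVSplitN` and `parseKVCut` are hypothetical helpers. Both return the same results. Cut does so without allocating a slice or requiring a length check.

```go
package main

import (
	"fmt"
	"strings"
)

// Before: SplitN allocates a slice and the caller must check its length.
func parseKVSplitN(s string) (key, value string, ok bool) {
	parts := strings.SplitN(s, "=", 2)
	if len(parts) != 2 {
		return parts[0], "", false
	}
	return parts[0], parts[1], true
}

// After: Cut returns both halves and a found flag directly.
func parseKVCut(s string) (key, value string, ok bool) {
	return strings.Cut(s, "=")
}

func main() {
	for _, in := range []string{"LABEL=a=b", "NOVALUE"} {
		k1, v1, ok1 := parseKVSplitN(in)
		k2, v2, ok2 := parseKVCut(in)
		fmt.Println(k1 == k2, v1 == v2, ok1 == ok2) // true true true
	}
}
```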
Signed-off-by: Philip Dubé <philip@peerdb.io>
The conditional expression duplicates the
code above, so remove it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[NO NEW TESTS NEEDED]
Signed-off-by: Egor Makrushin <emakrushin@astralinux.ru>
* Add BaseHostsFile to container configuration
* Do not copy /etc/hosts file from host when creating a container using Docker API
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
This field drags in a dependency on CNI and thereby blocks us from disabling CNI
support via a build tag
[NO NEW TESTS NEEDED]
Signed-off-by: Dan Čermák <dcermak@suse.com>
As stated in the [PR for crun](https://github.com/containers/crun/pull/1372),
HostID is what is being mapped here, so we should be checking `HostID` instead of `ContainerID`. `v.ContainerID` here is the ID of the owner of files on the filesystem, which can be totally unrelated to the UID maps.
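The range check can be sketched as follows, under the assumption that the value being tested is a host-side ID. The `IDMap` shape (ContainerID, HostID, Size) follows the usual ID-mapping convention, and `hostIDIsMapped` is a hypothetical helper. The membership test has to use the HostID side of each range. Testing against ContainerID compares against the wrong side of the mapping.

```go
package main

import "fmt"

// IDMap maps ContainerID..ContainerID+Size-1 to HostID..HostID+Size-1.
type IDMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// hostIDIsMapped reports whether hostID falls inside one of the mapped
// host ranges. Hypothetical helper for illustration only.
func hostIDIsMapped(maps []IDMap, hostID int) bool {
	for _, m := range maps {
		if hostID >= m.HostID && hostID < m.HostID+m.Size {
			return true
		}
	}
	return false
}

func main() {
	// Typical rootless layout: container 0 -> host 1000,
	// container 1..65536 -> host 100000..165535.
	maps := []IDMap{
		{ContainerID: 0, HostID: 1000, Size: 1},
		{ContainerID: 1, HostID: 100000, Size: 65536},
	}
	fmt.Println(hostIDIsMapped(maps, 1000), hostIDIsMapped(maps, 0)) // true false
}
```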
Signed-off-by: Karuboniru <yanqiyu01@gmail.com>
Use the new rootlessnetns logic from c/common, drop the podman code
here and make use of the new much simpler API.
ref: https://github.com/containers/common/pull/1761
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a new option --preserve-fd that allows specifying a list of FDs to
pass down to the container.
It is similar to --preserve-fds, but it allows specifying a list of FDs
instead of the maximum FD number to preserve.
--preserve-fd and --preserve-fds are mutually exclusive.
It requires crun since runc would complain if any fd below
--preserve-fds is not preserved.
Closes: https://github.com/containers/podman/issues/20844
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When Podman starts, it checks a number of critical runtime paths
against stored values in the database to make sure that existing
containers are not broken by a configuration change. We recently
made some changes to this logic to make our handling of some
options more sane (StaticDir in particular was set based on other
passed options in a way that was not particularly sane) which has
made the logic more sensitive to paths with symlinks. As a simple
fix, handle symlinks properly in our DB vs runtime comparisons.
The BoltDB bits are uglier because very, very old Podman versions
sometimes did not stuff a proper value in the database and
instead used the empty string. SQLite is new enough that we don't
have to worry about such things.
Fixes #20872
Signed-off-by: Matt Heon <mheon@redhat.com>
Right now, we always use a private UTS namespace on FreeBSD. This should
be made optional but implementing that cleanly needs a FreeBSD extension
to the OCI runtime config. The process for that is starting
(https://github.com/opencontainers/tob/pull/133) but in the meantime,
assume that the UTS namespace is private on FreeBSD.
This moves the Linux-specific namespace logic to
container_internal_linux.go and adds a FreeBSD stub.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
When committing containers to create new images, accept a container
config blob being passed in the body of the API request by adding a
Config field to our API structures. Populate it from the body of
requests that we receive, and use its contents as the body of requests
that we make.
Make the libpod commit endpoint split "changes" values at newlines, just
like the compat endpoint does.
Pass both the config blob and the "changes" slice to buildah's Commit()
API, so that it can handle cases where they overlap or conflict.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Only one process can write to the sqlite db at a time; if another
process tries to use it at that time, it fails and a "database is locked"
error is returned. If this happens, sqlite should keep retrying until it
can write. To do that we can just set the _busy_timeout option. A 100s
timeout should be enough even on slower systems, but not too much in
case there is a deadlock, so it still returns in a reasonable time.
[NO NEW TESTS NEEDED] I think we strongly need to consider some form of
parallel stress testing to catch bugs like this.
Fixes #20809
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
In FreeBSD-14.0, it is possible to configure a jail's network settings
from outside the jail using ifconfig and route's new '-j' option. This
removes the need for a separate jail to own the container's vnet.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
When InitialDelaySeconds in the kube yaml is set for a healthcheck,
don't update the healthcheck status until those initial delay seconds are over.
We were already waiting before updating for a failing healthcheck, but when
the healthcheck was successful during the initial delay time, the status was
immediately updated as healthy.
This is misleading to the users wondering why their healthcheck takes
much longer to fail for a failing case while it is quick to succeed for
a healthy case. It also doesn't match what the k8s InitialDelaySeconds
does. This change is only for kube play, podman healthcheck run is
unaffected.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Added an additional check for whether the event type is remove, and set the correct exit code.
Since it was getting difficult to maintain the omitempty notation for Event->ContainerExitCode, changing the type from int to *int gives us the ability to check that ContainerExitCode is not nil and continue operations from there.
closes #19124
Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
If a transaction is started it must either be committed or rolled back.
The function uses defer to call `tx.Rollback()` if an error is
returned. However, it also called `tx.Commit()`, and afterwards further
errors could be returned, which means it tries to roll back an already
committed transaction, which cannot work.
The fix is to make sure tx.Commit() is the last call in that function.
see https://github.com/containers/podman/issues/20731
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We have to Commit() the transaction. Note this is only in a rare pod
remove code path and very unlikely to ever be used.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
If we get an error chowning a file or directory to a UID/GID pair,
for something like ENOTSUP or EPERM, then we should ignore it as long as
the UID/GID pair on disk is correct.
Fixes: https://github.com/containers/podman/issues/20801
[NO NEW TESTS NEEDED]
This is difficult to test, and existing tests should be sufficient
to ensure no regression.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
These functions are not used anymore in the codebase, so drop them.
[NO NEW TESTS NEEDED] no new functionalities are added
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
We were ignoring relabel requests on certain unsupported
file systems and not on others; this change consistently
logs ENOTSUP file systems with logrus.Debug.
Fixes: https://github.com/containers/podman/discussions/20745
Still needs some work on the Buildah side.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add a new `no-dereference` mount option supported by crun 1.11+ to
re-create/copy a symlink if it's the source of a mount. By default the
kernel will resolve the symlink on the host and mount the target.
As reported in #20098, there are use cases where the symlink structure
must be preserved by all means.
Fixes: #20098
Fixes: issues.redhat.com/browse/RUN-1935
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
We are only using imageID on that branch, so it is
more consistent.
Should not change behavior; in callers, either
both are set or neither.
[NO NEW TESTS NEEDED]
Signed-off-by: Miloslav Trmač <mitr@redhat.com>