When the OCI runtime tries to set certain cgroup settings and gets a
"no such file or directory" error, the wrapper ends up reporting a
bogus error like:
```
Request Failed(Internal Server Error): open io.max: No such file or directory: OCI runtime command not found error
{"cause":"OCI runtime command not found error","message":"open io.max: No such file or directory: OCI runtime command not found error","response":500}
```
On first reading of this, you would think the OCI runtime (crun or runc)
was not found. But the error is actually reporting
"message":"open io.max: No such file or directory",
which is what we want the user to concentrate on.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If you use additional stores and pull the same image into the writable
store, you can end up with the same image twice. This causes
`image exists` to return the wrong result. It should return true in
this situation rather than an error.
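A hedged sketch of the intended behavior (the image name and store layout are only examples):
```
# with an additional read-only image store configured in storage.conf and the
# same image also present in the writable store:
podman image exists docker.io/library/busybox
echo $?   # should print 0 (image exists), not fail with a duplicate-image error
```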
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This adds the database backend for network aliases. Aliases are
additional names for a container that are used with the CNI
dnsname plugin - the container will be accessible by these names
in addition to its name. Aliases are allowed to change over time
as the container connects to and disconnects from networks.
Aliases are implemented as another bucket in the database to
register all aliases, plus two buckets for each container (one to
hold connected CNI networks, a second to hold its aliases). The
aliases are only unique per-network, so the global and
per-container aliases buckets have a sub-bucket for each CNI
network that has aliases, and the aliases are stored within that
sub-bucket. Aliases are formatted as alias (key) to container ID
(value) in both cases.
Three DB functions are defined for aliases: retrieving current
aliases for a given network, setting aliases for a given network,
and removing all aliases for a given network.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Make a distinction between pods that are completely running (all
containers running) and those that have some containers going,
but not all, by introducing an intermediate state between Stopped
and Running called Degraded. A Degraded pod has at least one, but
not all, containers running; a Running pod has all containers
running.
First step to a solution for #7213.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When we create a container, we assign a cgroup parent based on
the current cgroup manager in use. This parent is only usable
with the cgroup manager the container is created with, so if the
default cgroup manager is later changed or overridden, the
container will not be able to start.
To solve this, store the cgroup manager that created the
container in container configuration, so we can guarantee a
container with a systemd cgroup parent will always be started
with systemd cgroups.
Unfortunately, this is very difficult to test in CI, due to the
fact that we hard-code cgroup manager on all invocations of
Podman in CI.
Fixes #7830
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
In `podman container rm` and `podman image rm`, the commands
exit with error code 1 if the object does not exist.
This PR implements similar functionality for volumes, networks, and pods.
Similarly, if volumes or networks are in use by other containers, the
commands exit with code 2.
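A sketch of the intended exit codes (object names are illustrative):
```
podman volume rm no-such-volume
echo $?   # 1 - the volume does not exist

podman volume rm volume-still-in-use
echo $?   # 2 - the volume is in use by a container
```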
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This is very useful for debugging cgroups v2, especially on
rootless - we need to ensure people are correctly using systemd
cgroups in these cases.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The `podman ps --all` command will now show containers that
are under the control of other c/storage container systems, and
the new `ps --external` option (also available as `--storage`) will show
only containers that are in c/storage but are not controlled by libpod.
In the examples below, the '*-working-container' entries were created
by Buildah.
```
podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9257ef8c786c docker.io/library/busybox:latest ls /etc 8 hours ago Exited (0) 8 hours ago gifted_jang
d302c81856da docker.io/library/busybox:latest buildah 30 hours ago storage busybox-working-container
7a5a7b099d33 localhost/tom:latest ls -alF 30 hours ago Exited (0) 30 hours ago hopeful_hellman
01d601fca090 localhost/tom:latest ls -alf 30 hours ago Exited (1) 30 hours ago determined_panini
ee58f429ff26 localhost/tom:latest buildah 33 hours ago storage alpine-working-container
podman ps --external
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d302c81856da docker.io/library/busybox:latest buildah 30 hours ago external busybox-working-container
ee58f429ff26 localhost/tom:latest buildah 33 hours ago external alpine-working-container
```
Signed-off-by: TomSweeneyRedHat <tsweeney@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This allows manually tweaking the configuration for cgroup v2.
We will expose some of the options as individual flags in the future
(e.g. the new memory knobs), but for now add the more generic
--cgroup-conf mechanism for maximum control over the cgroup
configuration.
OCI specs change: https://github.com/opencontainers/runtime-spec/pull/1040
Requires: https://github.com/containers/crun/pull/459
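A hedged usage sketch (the key/value pair is just one example of a cgroup v2 setting, and it requires a runtime with --cgroup-conf support, e.g. a recent crun):
```
# pass an arbitrary cgroup v2 setting straight through to the container's cgroup
podman run --rm --cgroup-conf memory.high=1073741824 fedora \
    cat /sys/fs/cgroup/memory.high
```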
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Because a pod's network information is dictated by the infra container
at creation, a container cannot be created with network attributes.
This has been difficult for users to understand. We now return an error
when a container being created inside a pod passes any of the following
attributes:
* static IP (v4 and v6)
* static MAC
* ports -p (e.g. -p 8080:80)
* exposed ports (e.g. 222-225)
* publish ports from image -P
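For example, a command along these lines (pod and image names are illustrative) is now rejected up front instead of silently ignoring the port mapping:
```
podman pod create --name mypod -p 8080:80
podman run --pod mypod -p 8080:80 nginx
# error: port mappings must be set at pod creation time (exact message may differ)
```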
Signed-off-by: Brent Baude <bbaude@redhat.com>
The define package under Libpod is intended to be an extremely
minimal package, including constants and very little else.
However, as a result of some legacy code, it was dragging in all
of libpod/image (and, less significantly, the util package).
Fortunately, this was just to ensure that error constants were
not duplicated, and there's nothing preventing us from
importing in the other direction and keeping libpod/define free
of dependencies.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
--umask sets the umask inside the container.
It defaults to 0022.
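A quick sketch (the image is illustrative):
```
podman run --rm --umask 0077 alpine sh -c umask
# 0077

podman run --rm alpine sh -c umask
# 0022 (the default)
```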
Co-authored-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Ashley Cui <acui@redhat.com>
This allows us to determine if the container auto-detected that
systemd was in use, and correctly activated systemd integration.
Use this to wire up some integration tests to verify that systemd
integration is working properly.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We had a field for this in the inspect data, but it was never
being populated. Because of this, `podman pod inspect` stopped
showing port bindings (and other infra container settings). Add
code to populate the infra container inspect data, and add a test
to ensure we don't regress again.
Signed-off-by: Matthew Heon <mheon@redhat.com>
--sdnotify container|conmon|ignore
With "conmon", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes, to mark the service as ready.
With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.
This is the default, which is closest to the behavior podman has done in the past.
The "ignore" option removes NOTIFY_SOCKET from the environment, so neither podman nor
any child processes will talk to systemd.
This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.
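A hedged sketch of the three modes (names and images are illustrative):
```
# conmon notifies systemd: MAINPID is sent and "ready" is advertised once the
# OCI runtime has started the container, so no PIDFile= or cidfile is needed
podman run -d --name web --sdnotify=conmon nginx

# the container owns the notification socket and reports its own readiness
podman run -d --name app --sdnotify=container myapp

# keep NOTIFY_SOCKET away from podman and the container entirely
podman run -d --name batch --sdnotify=ignore myjob
```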
Signed-off-by: Joseph Gooch <mrwizard@dok.org>
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules. While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.
Move the go module to a new major version and change all imports to
`github.com/containers/libpod/v2`. The renaming of the imports
was done via `gomove` [1].
[1] https://github.com/KSubedi/gomove
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
The --tz flag sets the timezone inside the container.
It can be set to an IANA timezone as well as `local` to match the host machine.
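For example:
```
podman run --rm --tz=Asia/Tokyo alpine date
podman run --rm --tz=local alpine date   # uses the host machine's timezone
```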
Signed-off-by: Ashley Cui <acui@redhat.com>
The infra/abi code for pods was written in a flawed way, assuming
that the map[string]error containing individual container errors
was only set when the global error for the pod function was nil;
that is not accurate, and we are actually *guaranteed* to set the
global error when any individual container errors. Thus, we'd
never actually include individual container errors, because the
infra code assumed that err being set meant everything failed and
no container operations were attempted.
We were originally setting the cause of the error to something
nonsensical ("container already exists"), so I made a new error
indicating that some containers in the pod failed. We can then
ignore that error when building the report on the pod operation
and actually return errors from individual containers.
Unfortunately, this exposed another weakness of the infra code,
which was discarding the container IDs. Errors from individual
containers are not guaranteed to identify which container they
came from, hence the use of map[string]error in the Pod API
functions. Rather than restructuring the structs we return from
pkg/infra, I just wrapped the returned errors with a message
including the ID of the container.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Throw an error if a specified tag does not exist. Also make sure that
the user input is normalized as we already do for `podman tag`.
To prevent regressions, add a set of end-to-end and systemd tests.
Last but not least, update the docs and add bash completions.
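Assuming this refers to `podman untag` (implied by the comparison with `podman tag`), a hedged sketch:
```
podman untag alpine:latest no/such:tag
# now fails with an error because the tag is not attached to the image,
# instead of silently succeeding
```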
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
When the container uses journald logging, we don't want to
automatically use the same driver for its exec sessions. If we do
we will pollute the journal (particularly in the case of
healthchecks) with large amounts of undesired logs. Instead,
force exec session logs to file for now; we can add a log-driver
flag later (we'll probably want to add a `podman logs` command
that reads exec session logs at the same time).
As part of this, add support for the new 'none' logs driver in
Conmon. It will be the default log driver for exec sessions, and
can be optionally selected for containers.
Great thanks to Joe Gooch (mrwizard@dok.org) for adding support
to Conmon for a null log driver, and wiring it in here.
Fixes #6555
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Add a `CreateCommand` field to the pod config which includes the entire
`os.Args` at pod-creation. Similar to the already existing field in a
container config, we need this information to properly generate generic
systemd unit files for pods. It's a prerequisite to support the `--new`
flag for pods.
Also add the `CreateCommand` to the pod-inspect data, which can come in
handy for debugging, general inspection and certainly for the tests that
are added along with the other changes.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
This came out of a conversation with Valentin about
systemd-managed Podman. He discovered that unit files did not
properly handle cases where Conmon was dead - the ExecStopPost
`podman rm --force` line was not actually removing the container,
but interestingly, adding a `podman cleanup --rm` line would
remove it. Both of these commands do the same thing (minus the
`podman cleanup --rm` command not force-removing running
containers).
Without a running Conmon instance, the container process is still
running (assuming you killed Conmon with SIGKILL and it had no
chance to kill the container it managed), but you can still kill
the container itself with `podman stop` - Conmon is not involved,
only the OCI Runtime. (`podman rm --force` and `podman stop` use
the same code to kill the container). The problem comes when we
want to get the container's exit code - we expect Conmon to make
us an exit file, which it's obviously not going to do, being
dead. The first `podman rm` would fail because of this, but
importantly, it would (after failing to retrieve the exit code
correctly) set container status to Exited, so that the second
`podman cleanup` process would succeed.
To make sure the first `podman rm --force` succeeds, we need to
catch the case where Conmon is already dead, and instead of
waiting for an exit file that will never come, immediately set
the Stopped state and return an error that can be caught and
handled.
Signed-off-by: Matthew Heon <mheon@redhat.com>
This is step 1 toward self-discovery of remote ssh connections. We add a remotesocket struct to info to detect what the socket path might be.
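For example (assuming the struct is surfaced as `.Host.RemoteSocket`; the field name and path are illustrative):
```
podman info --format '{{ .Host.RemoteSocket.Path }}'
# /run/user/1000/podman/podman.sock
```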
Co-authored-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Brent Baude <bbaude@redhat.com>
Currently we are displaying the seconds since the epoch;
this will change to displaying a date, similar to `podman version`.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
By moving a couple of variables from libpod to libpod/define
I am able to shrink the podman-remote-* executables by another megabyte.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We're now able to build a static podman binary based on a custom nix
derivation. This is integrated into Cirrus as well; a later target
would be to provide a self-contained static binary bundle which can be
installed on any x86_64 Linux system.
Fixes: https://github.com/containers/libpod/issues/1399
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
This one was a massive pain to track down.
The original symptom was an error message from rootless Podman
trying to make a container in a pod. I unfortunately did not look
at the error message closely enough to realize that the namespace
in question was the cgroup namespace (the reproducer pod was
explicitly set to only share the network namespace), else this
would have been quite a bit shorter.
I spent considerable effort trying to track down differences
between the inspect output of the two containers, and when that
failed I was forced to resort to diffing the OCI specs. That
finally proved fruitful, and I was able to determine what should
have been obvious all along: the container was joining the cgroup
namespace of the infra container when it really ought not to
have.
From there, I discovered a variable collision in pod config. The
UsePodCgroup variable means "create a parent cgroup for the pod
and join containers in the pod to it". Unfortunately, it is very
similar to UsePodUTS, UsePodNet, etc, which mean "the pod shares
this namespace", so an accessor was accidentally added for it
that indicated the pod shared the cgroup namespace when it really
did not. Once I realized that, it was a quick fix - add a bool to
the pod's configuration to indicate whether the cgroup ns was
shared (distinct from UsePodCgroup) and use that for the
accessor.
Also included are fixes for `podman inspect` and
`podman pod inspect` that fix them to actually display the state
of the cgroup namespace (for container inspect) and what
namespaces are shared (for pod inspect). Either of those would
have made tracking this down considerably quicker.
Fixes #6149
Signed-off-by: Matthew Heon <mheon@redhat.com>
Add the `podman generate kube` and `podman play kube` command. The code
has largely been copied from Podman v1 but restructured to not leak the
K8s core API into the (remote) client.
Both commands are added in the same commit to allow for enabling the
tests at the same time.
Move some exports from `cmd/podman/common` to the appropriate places in
the backend to avoid circular dependencies.
Move definitions of label annotations to `libpod/define` and set the
security-opt labels in the frontend to make kube tests pass.
Implement rest endpoints, bindings and the tunnel interface.
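A round-trip sketch (container and file names are illustrative):
```
# export an existing container (or pod) as Kubernetes YAML
podman generate kube mycontainer > mycontainer.yaml

# recreate it from that YAML, locally or on another machine
podman play kube mycontainer.yaml
```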
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* Introduced define.ErrImageInUse to assist in determining the exit code
without resorting to string searches.
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Add more default options parsing.
Switch to using --time as opposed to --timeout to better match Docker.
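For example (a hedged sketch; the container name is illustrative):
```
podman stop --time 5 mycontainer   # previously --timeout: wait up to 5 seconds before killing
```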
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This will replace the structs in use in libpod, which cannot be
used as they are also directly involved in the database
representation of pods and cannot be moved out of Libpod.
Signed-off-by: Matthew Heon <mheon@redhat.com>
The current implementation of info, while typed, is done very loosely. We need stronger types for our APIv2 implementation and bindings.
Signed-off-by: Brent Baude <bbaude@redhat.com>
Add the ability to attach to a running container. The tunnel side of this is not enabled yet, as we still have work to do on the endpoints and plumbing.
Add the ability to exec a command in a running container. The tunnel side is also being deferred for the same reason.
Signed-off-by: Brent Baude <bbaude@redhat.com>
The `pause:3.1` image has wrong configs for non-amd64 architectures, as
they all claim to be amd64 [1]. The issue has now been fixed in the
latest `pause:3.2`.
[1] https://github.com/kubernetes/kubernetes/issues/87325
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
As part of the rework of exec sessions, we need to address them
independently of containers. In the new API, we need to be able
to fetch them by their ID, regardless of what container they are
associated with. Unfortunately, our existing exec sessions are
tied to individual containers; there's no way to tell what
container a session belongs to and retrieve it without getting
every exec session for every container.
This adds a pointer to the container an exec session is
associated with to the database. The sessions themselves are
still stored in the container.
Exec-related APIs have been restructured to work with the new
database representation. The originally monolithic API has been
split into a number of smaller calls to allow more fine-grained
control of lifecycle. Support for legacy exec sessions has been
retained, but in a deprecated fashion; we should remove this in
a few releases.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
As part of the rework of exec sessions, we want to split Create
and Start - and, as a result, we need to keep everything needed
to start exec sessions in the struct, not just the bare minimum
for tracking running ones.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This reverts commit d3d97a25e8.
This does not resolve the issues we expected it would, and has
some unexpected side effects with the upcoming exec rework.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Before, we were using -1 as a bogus value in podman to signify something went wrong when reading from a conmon pipe. However, conmon uses negative values to indicate the runtime failed, and to return the runtime's exit code.
Instead, we should use a bogus value that is actually bogus. Define that value in the define package as MinInt32 (-1<<31 - 1), which is outside the range of possible PIDs (-1<<31).
Signed-off-by: Peter Hunt <pehunt@redhat.com>
We can easily tell if we're going to deadlock by comparing lock
IDs before actually taking the lock. Add a few checks for this in
common places where deadlocks might occur.
This does not yet cover pod operations, where detection is more
difficult (and costly) due to the number of locks involved
being higher than 2.
Also, add some error wrapping on the Podman side, so we can tell
people to use `system renumber` when it occurs.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When Libpod removes a container, there is the possibility that
removal will not fully succeed. The most notable problems are
storage issues, where the container cannot be removed from
c/storage.
When this occurs, we were faced with a choice. We can keep the
container in the state, appearing in `podman ps` and available for
other API operations, but likely unable to do any of them as it's
been partially removed. Or we can remove it very early and clean
up after it's already gone. We have, until now, used the second
approach.
The problem that arises is intermittent problems removing
storage. We end up removing a container, failing to remove its
storage, and ending up with a container permanently stuck in
c/storage that we can't remove with the normal Podman CLI, can't
use the name of, and generally can't interact with. A notable
cause is when Podman is hit by a SIGKILL midway through removal,
which can consistently cause `podman rm` to fail to remove
storage.
We now add a new state for containers that are in the process of
being removed, ContainerStateRemoving. We set this at the
beginning of the removal process. It notifies Podman that the
container cannot be used anymore, but preserves it in the DB
until it is fully removed. This will allow Remove to be run on
these containers again, which should successfully remove storage
if it failed the first time.
Fixes #3906
Signed-off-by: Matthew Heon <mheon@redhat.com>
Refactor the `RuntimeConfig` along with related code from libpod into
libpod/config. Note that this is a first step of consolidating code
into more coherent packages to make the code more maintainable and less
prone to regressions in the long run.
Some libpod definitions were moved to `libpod/define` to resolve
circular dependencies.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Also, ensure that we don't try to mount them without root - it
appears that it can somehow not error and report that mount was
successful when it clearly did not succeed, which can induce this
case.
We reuse the `--force` flag to indicate that a volume should be
removed even after unmount errors. It seems fairly natural to
expect that --force will remove a volume that is otherwise
presenting problems.
Finally, ignore EINVAL on unmount - if the mount point no longer
exists our job is done.
Fixes: #4247
Fixes: #4248
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Add ability to evict a container when it becomes unusable. This may
happen when the host setup changes after container creation, making it
impossible for that container to be used or removed.
Evicting a container is done using the `rm --force` command.
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
We have leaked the exit code numbers all over the code; this patch
moves the numbers into constants.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This is mostly used with Systemd, which really wants to manage
CGroups itself when managing containers via unit file.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When we fail to remove a container's SHM, that's an error, and we
need to report it as such. This may be part of our lingering
storage woes.
Also, remove MNT_DETACH. It may be another cause of the storage
removal failures.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
There's no way to get the error if we successfully get an exit code (as it's just printed to stderr instead).
Instead of relying on the error being passed to podman and editing based on the error code, process it on the varlink side instead.
Also move error codes to the define package.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
This includes:
* Implement exec -i and fix some typos in description of -i docs
* Pass failed runtime status to caller
* Add resize handling for a terminal connection
* Customize exec systemd-cgroup slice
* Fix healthcheck
* Fix top
* Add --detach-keys
* Implement podman-remote exec (jhonce)
* Cleanup some orphaned code (jhonce)
* Adapt remote exec for conmon exec (pehunt)
* Fix healthcheck and exec to match docs
* Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error
* Use these different errors in branching for exit code in healthcheck and exec
* Set conmon to use new api version
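A hedged sample of the reworked exec in use (container name and detach keys are illustrative):
```
# interactive exec with a terminal and custom detach keys
podman exec -i -t --detach-keys ctrl-p,ctrl-q mycontainer /bin/sh

# the same against a remote podman service
podman-remote exec -i -t mycontainer /bin/sh
```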
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
The compilation demands of having libpod in main are a burden for the
remote client compilations. To combat this, we should move the use of
libpod structs, vars, constants, and functions into the adapter code,
where it will only be compiled by the local client.
This should result in cleaner code organization and smaller binaries. It
should also help if we ever need to compile the remote client on
non-Linux operating systems natively (not cross-compiled).
Signed-off-by: baude <bbaude@redhat.com>