Since podman doesn't set/use the needed service env
variable, always set enableServiceLinks to false in
the generated kube yaml.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
If we get a SIGTERM immediately after Conmon starts but before we
record its PID in the database, we end up leaking a Conmon and
associated OCI runtime process. To prevent this, inhibit shutdown
using the logic we originally wrote to guard against similar issues
during container creation.
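As a rough sketch (the helpers below are illustrative, not the actual code), the critical section is bracketed like this:
```go
package example

import "sync"

// inhibit/uninhibit stand in for the shutdown package's Inhibit and
// Uninhibit; a real handler would take the write side of the lock.
var shutdownLock sync.RWMutex

func inhibit()   { shutdownLock.RLock() }
func uninhibit() { shutdownLock.RUnlock() }

// startConmonSafely sketches the fix: no shutdown can run between
// starting Conmon and recording its PID in the database.
func startConmonSafely(start func() (int, error), record func(int) error) error {
	inhibit()
	defer uninhibit()

	pid, err := start()
	if err != nil {
		return err
	}
	return record(pid)
}
```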
[NO NEW TESTS NEEDED] No real way to test this I can think of.
Fixes #15557
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When a kube yaml has a volume set as empty dir, podman
will create an anonymous volume with the empty dir name and
attach it to the containers running in the pod. When the pod
is removed, the empty dir volume created is also removed.
Add tests and docs for this as well.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
I managed to miss this while factoring out moveConmonToCgroupAndSignal.
Perhaps the signalling part should move to the caller instead?
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Compat: Treat already attached networks as a no-op
Applies only to containers in the created state; containers in the running state still return an error.
Co-authored-by: Alessandro Rossi <al.rossi87@gmail.com>
Co-authored-by: Brent Baude <bbaude@redhat.com>
Co-authored-by: Jason T. Greene <jason.greene@redhat.com>
Signed-off-by: Alessandro Rossi <al.rossi87@gmail.com>
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
Commit 30e7cbccc1 accidentally added a deadlock as Podman was waiting
for the exit code to show up when the container transitioned to stopped.
Code paths that require the exit code to be written (by the cleanup
process) should already be using `(*Container).Wait()` in a deadlock
free way.
[NO NEW TESTS NEEDED] as I did not manage to find a reproducer that
would work in CI. Ultimately, it's a race condition.
Fixes: #15492
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Don't add the same annotations as the pod yaml to the
service yaml as they are not needed.
[NO NEW TESTS NEEDED]
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
The Linux implementation uses /proc/stat; the FreeBSD equivalent is
quite different, as this information is exposed via sysctl.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Also, do a general cleanup of all the timeout code. Changes
include:
- Convert from int to *uint where possible. Timeouts cannot be
negative, hence the uint change; and a timeout of 0 is valid,
so we need a new way to detect that the user set a timeout
(hence, pointer; see the sketch after this list).
- Change name in the database to avoid conflicts between new data
type and old one. This will cause timeouts set with 4.2.0 to be
lost, but considering nobody is using the feature at present
(and the lack of validation means we could have invalid,
negative timeouts in the DB) this feels safe.
- Ensure volume plugin timeouts can only be used with volumes
created using a plugin. Timeouts on the local driver are
nonsensical.
- Remove the existing test, as it did not use a volume plugin.
Write a new test that does.
The actual plumbing of the containers.conf timeout is one line
in volume_api.go; the remainder is the cleanup described above.
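For illustration, a minimal sketch of why the pointer is needed (names are made up):
```go
package main

import "fmt"

// applyTimeout is a made-up stand-in for the real plumbing.
func applyTimeout(t *uint) {
	if t == nil {
		fmt.Println("no timeout set; fall back to containers.conf")
		return
	}
	// 0 is a valid, explicit timeout: an int could not distinguish
	// "unset" from "set to 0", and uint rules out negative values.
	fmt.Printf("user set timeout to %d\n", *t)
}

func main() {
	var zero uint
	applyTimeout(nil)   // user did not set a timeout
	applyTimeout(&zero) // user explicitly set a timeout of 0
}
```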
Signed-off-by: Matthew Heon <mheon@redhat.com>
For FreeBSD, we need the name of the 'network jail' which is the parent
of all containers in a pod. Having a separate jail for the network
configuration also simplifies the implementation of CNI plugins so we
use this pattern for solitary containers as well as pods.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
This also adds FreeBSD equivalents to the functions moved to
oci_conmon*_linux.go. For openUnixSocket, we create a temporary symlink
to shorten the path to something that fits into sockaddr_un.
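A rough sketch of the trick (not the actual implementation):
```go
package example

import (
	"net"
	"os"
	"path/filepath"
)

// dialLongUnixSocket sketches the workaround: sun_path in sockaddr_un
// is limited to roughly 104 bytes on FreeBSD, so we connect through a
// short temporary symlink that the kernel resolves.
func dialLongUnixSocket(longPath string) (net.Conn, error) {
	tmpDir, err := os.MkdirTemp("", "sock")
	if err != nil {
		return nil, err
	}
	// The symlink is only needed at connect time.
	defer os.RemoveAll(tmpDir)

	short := filepath.Join(tmpDir, "s")
	if err := os.Symlink(longPath, short); err != nil {
		return nil, err
	}
	return net.Dial("unix", short)
}
```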
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
This function depends on linux-specific functionality in /proc/fd to
allow connecting to local domain sockets with pathnames too long for
sockaddr_un.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
The O_PATH flag is a recent addition to the open syscall and is not
present in darwin or in FreeBSD releases before 13.1. The constant is
not present in the FreeBSD version of x/sys/unix since that package
supports FreeBSD 12.3 and later.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
This removes a use of state.NetNS, a linux-specific field defined in
container_linux.go, from the generic container_internal.go, allowing
the latter to build on non-linux platforms.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Note: this makes info.go linux-only since it mixes linux-specific and
generic code. This should be addressed in a separate refactoring PR.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Integrate sd-notify policies into `kube play`. The policies can be
configured for all containers via the `io.containers.sdnotify`
annotation or for individual containers via the
`io.containers.sdnotify/$name` annotation.
The `kube play` process will wait for all containers to be ready by
waiting for the individual `READY=1` messages which are received via
the `pkg/systemd/notifyproxy` proxy mechanism.
Also update the simple "container" sd-notify test as it did not fully
test the expected behavior which became obvious when adding the new
tests.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The notify socket can now either be specified via an environment
variable or programmatically (where the env is ignored). The
notify mode and the socket are now also displayed in `container inspect`,
which comes in handy for debugging and allows for proper testing.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
bump github.com/container-orchestrated-devices/container-device-interface from 0.4.0 to 0.5.0
This requires that the cdi.Registry be instantiated with AutoRefresh disabled for CLI clients.
[NO NEW TESTS NEEDED]
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Allow the cleanup process (and others) to transition the container from
`stopping` to `exited`. This fixes a race condition detected in #14859
where the cleanup process kicks in _before_ the stopping process can
read the exit file. Prior to this fix, the cleanup process left the
container in the `stopping` state and removed the conmon files, such
that the stopping process also left the container in this state as it
could not read the exit files. Hence, `podman wait` timed out (see the
23-second execution time of the test [1]) due to the unexpected/invalid
state and the test failed.
Further turn the warning during stop to a debug message since it's a
natural race due to the daemonless/concurrent architecture and nothing
to worry about.
[NO NEW TESTS NEEDED] since we can only monitor if #14859 continues
flaking or not.
[1] https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/6210434704343040/html/sys-remote-fedora-36-rootless-host.log.html#t--00205
Fixes: #14859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Improve the error message when looking up the exit code of a container.
The state of the container may help us track down #14859 which flakes
rarely and is impossible to reproduce on my machine.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
added the following flags and handling for podman pod create
--memory-swap
--cpuset-mems
--device-read-bps
--device-write-bps
--blkio-weight
--blkio-weight-device
--cpu-shares
given the new backend for systemd in c/common, all of these can now be exposed to pod create.
most of the heavy lifting (nearly all) is done within c/common. However, some rewiring needed to be done here
as well!
Signed-off-by: Charlie Doern <cdoern@redhat.com>
do not attempt to lock all containers on pod rm since it can cause
deadlocks when other podman cleanup processes are attempting to lock
the same containers in a different order.
[NO NEW TESTS NEEDED]
Closes: https://github.com/containers/podman/issues/14929
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
PR https://github.com/containers/common/pull/1071 moved `pkg/hooks` to
`c/common`, hence remove it from podman and use `pkg/hooks` from
`c/common`.
[NO NEW TESTS NEEDED]
Signed-off-by: Aditya R <arajan@redhat.com>
Update the init container type default to once instead
of always to match k8s behavior.
Add a new annotation that can be used to change the init
ctr type in the kube yaml.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
When running a single podman logs this is not really important since we
will exit when we finish reading the logs. However for the system
service this is very important. Leaking goroutines will cause
increased memory and CPU usage over time.
Both the event and log backends have goroutine leaks with both the
file and journald drivers.
The journald backend has the problem that journal.Wait(IndefiniteWait)
will block until we get a new journald event. So when a client closes
the connection the goroutine would still wait until there is a new
journal entry. To fix this we just wait for a maximum of 5 seconds,
after that we can check if the client connection was closed and exit
correctly in this case.
For the file backend we can fix this by waiting for either the log line
or context cancel at the same time. Currently it would block waiting for
new log lines and only check afterwards if the client closed the
connection and thus hang forever if there are no new log lines.
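A minimal sketch of that fix, with illustrative names:
```go
package example

import "context"

// streamLogs waits for the next log line and for client cancellation
// at the same time instead of blocking on the line channel alone.
func streamLogs(ctx context.Context, lines <-chan string, send func(string)) error {
	for {
		select {
		case line, ok := <-lines:
			if !ok {
				return nil // log stream ended
			}
			send(line)
		case <-ctx.Done():
			// Client closed the connection; the goroutine exits
			// instead of hanging until the next log line.
			return ctx.Err()
		}
	}
}
```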
[NO NEW TESTS NEEDED] I am open to ideas how we can test memory leaks in
CI.
To test manually run a container like this:
`podman run --log-driver $driver --name test -d alpine sh -c 'i=1; while [ "$i" -ne 1000 ]; do echo "line $i"; i=$((i + 1)); done; sleep inf'`
where `$driver` can be either `journald` or `k8s-file`.
Then start the podman system service and use:
`curl -m 1 --output - --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock -v 'http://d/containers/test/logs?follow=1&since=0&stderr=1&stdout=1' &>/dev/null`
to get the logs from the API and then it closes the connection after 1 second.
Now run the curl command several times and check the memory usage of the service.
Fixes #14879
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
NFS servers will throw an ENOTSUPP error if you attempt to
chown a directory to the same UID and GID as the directory
already has. If volumes are stored on NFS directories this
throws an ugly error and then works on the next try.
Bottom line: don't chown directories that already have the correct
UID and GID.
Fixes: https://github.com/containers/podman/issues/14766
[NO NEW TESTS NEEDED] Difficult to setup an NFS Server in testing.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
[NO NEW TESTS NEEDED]
An empty path to the runtime binary was printed instead of the real path.
Before fix:
TRAC[0000] found runtime ""
TRAC[0000] found runtime ""
After:
TRAC[0000] found runtime "/usr/bin/crun"
TRAC[0000] found runtime "/usr/bin/runc"
Signed-off-by: Mikhail Khachayants <khachayants@arrival.com>
* Correct spelling and typos.
* Improve language.
Co-authored-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
While for some call paths we may be doing this redundantly, we need to
make sure the exit code is always read at this point.
[NO NEW TESTS NEEDED] as I do not manage to reproduce the issue which
is very likely caused by a code path not writing the exit code when
running concurrently.
Fixes: #14859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
If a pod is created without net sharing, allow adding
separate ports for each container to the kube yaml
and also set the pod level hostname correctly if the
uts namespace is not being shared.
Add a warning if the default namespace sharing options
have been modified by the user.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Since conmon-rs also uses this code we moved it to c/common. Now podman
uses it from there as well, preventing duplication.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
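For illustration, the shape of the conversion (the call site is made up):
```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// wrapExample shows the new idiom; the message is made up.
// Before: errors.Wrapf(err, "starting container %s", id)
func wrapExample(id string, err error) error {
	return fmt.Errorf("starting container %s: %w", id, err)
}

func main() {
	err := wrapExample("abc123", fs.ErrNotExist)
	// %w preserves the wrapped error, so errors.Is/As keep working.
	fmt.Println(errors.Is(err, fs.ErrNotExist)) // true
}
```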
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
add support for the --uts flag in pod create, allowing users to avoid
issues with default values in containers.conf.
uts follows the same format as other namespace flags:
--uts=private (default), --uts=host, --uts=ns:PATH
resolves #13714
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Previously, if a container had healthchecks disabled in the
docker-compose.yml file and the user did a `podman inspect <container>`,
they would have an incorrect output:
```
"Healthcheck":{
"Test":[
"CMD-SHELL",
"NONE"
],
"Interval":30000000000,
"Timeout":30000000000,
"Retries":3
}
```
After a quick change, the correct output is now the result:
```
"Healthcheck":{
"Test":[
"NONE"
]
}
```
Additionally, I extracted the hard-coded strings that were used for
comparisons into constants in `libpod/define` to prevent a similar issue
from recurring.
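A hypothetical sketch of the shape of the check; the constant and function names below are assumed, not the actual code:
```go
package example

// Assumed constants, in the spirit of the ones extracted to libpod/define.
const (
	healthTestNone     = "NONE"
	healthTestCmdShell = "CMD-SHELL"
)

// normalizeHealthCheckTest collapses a disabled check to just ["NONE"]
// so that inspect does not report intervals, timeouts, or retries for it.
func normalizeHealthCheckTest(test []string) []string {
	if len(test) == 2 && test[0] == healthTestCmdShell && test[1] == healthTestNone {
		return []string{healthTestNone}
	}
	return test
}
```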
Closes: #14493
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
Make sure `Sync()` handles state transitions and exit codes correctly.
The function was only being called when batching which could render
containers in an unusable state when running concurrently with other
state-altering functions/commands since the state must be re-read from
the database before acting upon it.
Fixes: #14761
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
using the new resource backend, implement podman pod create --memory which enables
users to modify memory.max inside of the parent cgroup (the pod), implicitly impacting all
children unless overridden
Signed-off-by: Charlie Doern <cdoern@redhat.com>
PR containers/podman/pull/14449 had an outdated base. Merging it broke
builds.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
the new version of runc has the same check in place and it
automatically resumes the container if it is paused. So when Podman
tries to resume it again, it fails since the container is not in the
paused state.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2100740
[NO NEW TESTS NEEDED] the CI doesn't use a new runc on cgroup v1 systems.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
[NO NEW TESTS NEEDED] now that podman's cgroup config tries to initialize controllers, cgroupfs errors out on pod creation.
We need to mimic the behavior that used to exist and only create the cgroup when running as rootful.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
add two new options to the volume create command: copy and nocopy.
When nocopy is specified, the files from the container image are not
copied up to the volume.
Closes: https://github.com/containers/podman/issues/14722
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
the two operations are equivalent since securejoin.SecureJoin() has
already resolved the symlinks. Prefer the Lstat version though, to make
sure symlinks are never followed and we do not end up using a path on
the host.
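A minimal sketch of the preferred form (not the actual call site):
```go
package example

import (
	"os"

	securejoin "github.com/cyphar/filepath-securejoin"
)

// statVolumePath: SecureJoin has already resolved every symlink inside
// root, so Lstat and Stat see the same file, and Lstat guarantees a
// symlink is never followed out of root onto the host.
func statVolumePath(root, unsafePath string) (os.FileInfo, error) {
	resolved, err := securejoin.SecureJoin(root, unsafePath)
	if err != nil {
		return nil, err
	}
	return os.Lstat(resolved)
}
```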
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
avoid any I/O operation on the volume if the source directory is empty.
This is useful on network file systems (since CAP_DAC_OVERRIDE is not
honored) where the root user might not have enough privileges to
perform an I/O operation on the NFS mount but the user running inside
the container has.
[NO NEW TESTS NEEDED] it needs a setup with a network file system
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Previously, health status events were not being generated at all. Both
the API and `podman events` now generate health_status events.
```
{"status":"health_status","id":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","from":"localhost/healthcheck-demo:latest","Type":"container","Action":"health_status","Actor":{"ID":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","Attributes":{"containerExitCode":"0","image":"localhost/healthcheck-demo:latest","io.buildah.version":"1.26.1","maintainer":"NGINX Docker Maintainers \u003cdocker-maint@nginx.com\u003e","name":"healthcheck-demo"}},"scope":"local","time":1656082205,"timeNano":1656082205882271276,"HealthStatus":"healthy"}
```
```
2022-06-24 11:06:04.886238493 -0400 EDT container health_status ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63 (image=localhost/healthcheck-demo:latest, name=healthcheck-demo, health_status=healthy, io.buildah.version=1.26.1, maintainer=NGINX Docker Maintainers <docker-maint@nginx.com>)
```
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
currently, setting any sort of resource limit in a pod does nothing. With the newly refactored creation process in c/common, podman can now set resources at a pod level,
meaning that resource related flags can now be exposed to podman pod create.
cgroupfs and systemd are both supported with varying degrees of completeness. cgroupfs is a much simpler process and one that is virtually complete for all resource types; the flags now just need to be added. systemd on the other hand
has to be handled via the dbus api, meaning that the limits need to be passed as recognized properties to systemd. The properties added so far are the ones that podman pod create supports as well as `cpuset-mems`, as this will
be the next flag I work on.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
This bug was introduced in https://github.com/containers/podman/pull/8906.
When we use 'podman rm/restart/stop/kill etc...' command to
the container running with --rm, the OCI runtime directory
remains at /run/<runtime name> (root user) or
/run/user/<user id>/<runtime name> (rootless user).
This bug could cause other bugs.
For example, when we checkpoint the container running with
--rm (podman checkpoint --export) and restore it
(podman restore --import) with crun, the error message
"Error: OCI runtime error: crun: container `<container id>`
already exists" is output.
This error is caused by an attempt to restore the container with
the same container ID as the remaining OCI runtime's container ID.
Therefore, fix this by having the cleanupRuntime() function remove
the OCI runtime directory, even if the container has already been
removed by the --rm option.
Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
Libpod requires that all volumes are stored in the libpod db. Because
volumes can be created outside of podman via volume plugins, the db will
not know about all available volumes. The podman volume reload command allows users to
sync the libpod db with their external volume plugins. All new volumes
from the plugin are also created in the libpod db and when a volume from
the db no longer exists it will be removed if possible.
There are some problems:
- naming conflicts, in this case we only use the first volume we found.
This is not deterministic.
- race conditions, we have no control over the volume plugins. It is
possible that the volumes changed while we run this command.
Fixes #14207
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
There is no need for an extra parameter if the body is set. We can just
check the interface for nil.
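A small sketch of the simplification (the function is made up):
```go
package example

import "io"

// doRequest previously would have taken (body io.Reader, hasBody bool);
// the nil check on the interface carries the same information.
func doRequest(body io.Reader) {
	if body != nil {
		// attach the body, set content headers, etc.
	}
}
```
One Go caveat worth remembering with this pattern: a typed nil (say, a nil *bytes.Buffer stored in the interface) is not == nil, so callers must pass a literal nil.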
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Firstly: don't prune exit codes after a refresh - instead, clear
the table entirely. We are guaranteed that all containers are
gone after a refresh, we should not worry about exit codes given
this.
Secondly: alter the way pruning was done. We were updating the DB
by calling Update from within an existing View, and stacking an
RW transaction on top of an existing RO one seems dodgy; further,
modifying a bucket while iterating over it with ForEach is
undefined behavior.
Hopefully this will resolve our CI issues.
Signed-off-by: Matthew Heon <mheon@redhat.com>
This commit addresses three intertwined bugs to fix an issue when using
Gitlab runner on Podman. The three bug fixes are not split into
separate commits as tests won't pass otherwise; avoidable noise when
bisecting future issues.
1) Podman conflated states: even when asking to wait for the `exited`
state, Podman returned as soon as a container transitioned to
`stopped`. The issues surfaced in Gitlab tests to fail [1] as
`conmon`'s buffers have not (yet) been emptied when attaching to a
container right after a wait. The race window was extremely narrow,
and I only managed to reproduce with the Gitlab runner [1] unit
tests.
2) The clearer separation between `exited` and `stopped` revealed a race
condition predating the changes. If a container is configured for
autoremoval (e.g., via `run --rm`), the "run" process competes with
the "cleanup" process running in the background. The window of the
race condition was sufficiently large that the "cleanup" process has
already removed the container and storage before the "run" process
could read the exit code and hence waited indefinitely.
Address the exit-code race condition by recording exit codes in the
main libpod database. Exit codes can now be read from a database.
When waiting for a container to exit, Podman first waits for the
container to transition to `exited` and will then query the database
for its exit code. Outdated exit codes are pruned during cleanup
(i.e., non-performance critical) and when refreshing the database
after a reboot. An exit code is considered outdated when it is older
than 5 minutes.
While the race condition predates this change, the waiting process
has apparently always been fast enough in catching the exit code due
to issue 1): `exited` and `stopped` were conflated. The waiting
process hence caught the exit code after the container transitioned
to `stopped` but before it `exited` and got removed.
3) With 1) and 2), Podman is now waiting for a container to properly
transition to the `exited` state. Some tests did not pass after 1)
and 2) which revealed the third bug: `conmon` was executed with its
working directory pointing to the OCI runtime bundle of the
container. The changed working directory broke resolving relative
paths in the "cleanup" process. The "cleanup" process error'ed
before actually cleaning up the container and waiting "main" process
ran indefinitely - or until hitting a timeout. Fix the issue by
executing `conmon` with the same working directory as Podman.
Note that fixing 3) *may* address a number of issues we have seen in the
past where for *some* reason cleanup processes did not fire.
[1] https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27119#note_970712864
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
[MH: Minor reword of commit message]
Signed-off-by: Matthew Heon <mheon@redhat.com>
We should just silently fall through. The log was flooding the
system-service logs when running Gitlab runner.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
* Replace "setup", "lookup", "cleanup", "backup" with
"set up", "look up", "clean up", "back up"
when used as verbs. Replace also variations of those.
* Improve language in a few places.
Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
expose the --shm-size flag to podman pod create and add proper handling and inheritance
for the option.
resolves #14609
Signed-off-by: Charlie Doern <cdoern@redhat.com>
commit 1951ff168a introduced a check so
that conmon is not moved to a new cgroup when podman is running inside
of a systemd service. This is helpful to integrate podman in systemd
so that the spawned conmon lives in the same cgroup as the service
that created it.
Unfortunately this breaks when the podman daemon is running in a systemd
service, since the same check is in place and thus all the conmon
processes end up in the same cgroup as the podman daemon. When the
podman daemon systemd service stops, the conmon processes are
terminated, as are the containers they monitor.
Improve the check to exclude podman running as a daemon.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2052697
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The nolintlint linter does not deny the use of `//nolint`.
Instead, it allows us to enforce a common nolint style:
- force that a linter name must be specified
- do not add a space between `//` and `nolint`
- make sure nolint is only used when there is actually a problem
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Lint the systemd code and fix the reported problems.
The remoteclient tag is no longer used so I just removed it.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
if /run is on a volume, do not create the file /run/.containerenv as it
would leak outside of the container.
Closes: https://github.com/containers/podman/issues/14577
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Previously, if a container was not running, and the user ran the `podman
stats` command, an error would be reported: `Error: container state
improper`.
Podman now reports stats as the fields' default values for their
respective types if the container is not running:
```
$ podman stats --no-stream demo
ID NAME CPU % MEM USAGE / LIMIT MEM % NET IO BLOCK IO PIDS CPU TIME AVG CPU %
4b4bf8ce84ed demo 0.00% 0B / 0B 0.00% 0B / 0B 0B / 0B 0 0s 0.00%
```
Closes: #14498
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
implement podman pod clone, a command to create an exact copy of a pod while changing
certain config elements.
Currently supported flags are:
--name change the pod name
--destroy remove the original pod
--start run the new pod on creation
and all infra-container related flags from podman pod create (namespaces etc)
resolves #12843
Signed-off-by: cdoern <cdoern@redhat.com>
Add a new `--overwrite` flag to `podman cp` to allow for overwriting in
case existing users depend on the behavior; they will have a workaround.
By default, the flag is turned off to be compatible with Docker and to
have a more sane behavior.
Fixes: #14420
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
add an option to configure the driver timeout when creating a volume.
The default is 5 seconds but this value is too small for some custom drivers.
Signed-off-by: cdoern <cdoern@redhat.com>
this patch includes additional host namespace checks when creating a ctr as well
as fixes to the tests to check /proc/self/ns/net
see #14461
Signed-off-by: cdoern <cdoern@redhat.com>
Previous PR #12394 tried to address this, but made a mistake:
containers that have just exited do not move to the Exited state
but rather the Stopped state - as such, the code would never have
run (there is no way we start `podman kill`, and the container
transitions to Exited while we are doing it - that requires
holding the container lock, which Kill already does).
Fix the code to check Stopped as well (we omit Exited entirely
but it's a cheap check and our state logic could change in the
future). Also, return an error, instead of exiting cleanly - the
Kill failed, after all. ErrCtrStateInvalid is already handled by
the sig-proxy logic so there won't be issues.
[NO NEW TESTS NEEDED] This fixes a race that I cannot reproduce
myself, and I have no idea how we'd repro in CI.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Podman and Buildah should use the same code to generate the resolv.conf
file. This mostly moved the podman code into c/common and created a
better API for it so buildah can use it as well.
[NO NEW TESTS NEEDED] All existing tests should continue to pass.
Fixes #13599 (There is no way to test this in CI without breaking the
hosts resolv.conf)
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When a container with a userns is created the network setup is special.
Normally the netns is set up before the oci runtime container is created,
however with a userns the container is created first and then the network
is set up. In the second case we never saved the container state
afterwards. Because of this, podman inspect would not show the network
info and network teardown would not happen.
This worked with local podman because there was a save() call later in the
code path which then also saved the network status. But in the podman API
code path this save never happened thus all containers started via API had
this problem.
Fixes #14465
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
make the error clearer and state that images created by other tools
might not be visible to Podman when it overrides the graph driver.
Closes: https://github.com/containers/podman/issues/13970
[NO NEW TESTS NEEDED]
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If a privileged container is running, stops, and the devices on the host
change, such as a USB device being unplugged, then a container would no
longer start. Previously, the devices from the host were only being
added to the container once: when the container was created. Now, this
happens every time the container starts.
I did this by adding a boolean to the container config that indicates
whether to mount all of the devices or not, which can be set via an option.
During spec generation, if the `MountAllDevices` option is set in the
container config, all host devices are added to the container.
Additionally, a couple of functions from `pkg/specgen/generate/config_linux.go`
were moved into `pkg/util/utils_linux.go` as they were needed in
multiple packages.
Closes #13899
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
A similar feature was added for named overlay volumes here: https://github.com/containers/podman/pull/12712
This PR mimics that feature for anonymous volumes.
Often users want their anonymous overlayed volumes to be `non-volatile` in nature,
meaning that the same `upper` dir can be re-used by one or more
containers while the overall nature of the volumes remains overlay,
so work done is still on an overlay, not on the actual volume.
This PR adds support for more advanced options, i.e. custom `workdir`
and `upperdir` for overlayed volumes, so that users can re-use `workdir`
and `upperdir` across new containers as well.
Usage
```console
podman run -it -v /some/path:/data:O,upperdir=/path/persistant/upper,workdir=/path/persistant/work alpine sh
```
Signed-off-by: Aditya R <arajan@redhat.com>
Firstly, reset is now managed by the runtime itself as a part of
initialization. This ensures that it can be used even with
runtimes that would otherwise fail to be created - most notably,
when the user has changed a core path
(runroot/root/tmpdir/staticdir).
Secondly, we now attempt a best-effort removal even if the store
completely fails to be configured.
Third, we now hold the alive lock for the entire reset operation.
This ensures that no other Podman process can start while we are
running a system reset, and removes any possibility of a race
where a user tries to create containers or pull images while we
are trying to perform a reset.
[NO NEW TESTS NEEDED] we do not test reset last I checked.
Fixes #9075
Signed-off-by: Matthew Heon <mheon@redhat.com>
The backend should not convert partial lines to full log lines. While
this works for most cases it cannot work when the last line is partial
since it will just be lost. The frontend logic can already display
partial lines correctly. The journald driver also works correctly since
it does no such conversion.
Fixes #14458
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When a container does not use the default podman netns, for example
--network none or --network ns:/path a restore would fail because the
specgen check validates that c.config.StaticMAC is nil but the
unmarshaller sets it to an empty slice.
While we could make the check use len() > 0 I feel like it is more
common to check with != nil for ip and mac addresses.
Adding omitempty tag makes the json marshal/unmarshal work correctly.
This should not cause any issues.
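A self-contained sketch of the round trip; the MAC type here is a simplified stand-in for the real network type:
```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// mac mimics a MAC type with custom text (un)marshalling.
type mac net.HardwareAddr

func (m mac) MarshalText() ([]byte, error) {
	return []byte(net.HardwareAddr(m).String()), nil
}

func (m *mac) UnmarshalText(text []byte) error {
	if len(text) == 0 {
		*m = mac{} // empty but non-nil: this is what broke the nil check
		return nil
	}
	hw, err := net.ParseMAC(string(text))
	*m = mac(hw)
	return err
}

type config struct {
	// omitempty keeps a nil MAC out of the JSON entirely, so it is
	// still nil after a marshal/unmarshal round trip.
	StaticMAC mac `json:"staticMAC,omitempty"`
}

func main() {
	data, _ := json.Marshal(config{})
	var c config
	_ = json.Unmarshal(data, &c)
	fmt.Println(c.StaticMAC == nil) // true with omitempty; false without
}
```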
Fixes #14389
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Hardcoding the interface name is a bad idea. We have no control over the
actual interface name since the user can change it.
The correct thing is to read them from the network status. Since the
container can have more than one interface we have to add up the RX/TX
values. The other values are currently not used.
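A minimal sketch of the summing, with assumed field names:
```go
package example

// ifaceStats carries assumed field names for illustration.
type ifaceStats struct {
	RxBytes, TxBytes uint64
}

// sumTraffic adds up traffic over every interface in the network
// status instead of reading one hardcoded device name.
func sumTraffic(interfaces map[string]ifaceStats) (rx, tx uint64) {
	for _, s := range interfaces {
		rx += s.RxBytes
		tx += s.TxBytes
	}
	return rx, tx
}
```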
For podman 5.0 we should change it so that the API can return the
statistics per interface and the client should sum the TX/RX for the
command output. This is what docker is doing.
Fixes #13824
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Make sure to wait for the systemd operations to finish when
starting/stopping healthcheck timers and services. Also make
sure to stop the timer before the service to avoid a race
with the timer.
[NO NEW TESTS NEEDED] since it is a non-functional change and existing
tests are expected to pass.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Refactor populating uptime field to use standard library parsing and
math for populating the hour, minute, seconds fields.
Note: the go-humanize package does not cover time.Duration, just
time.Time.
```release-note
NONE
```
[NO NEW TESTS NEEDED]
Signed-off-by: Jhon Honce <jhonce@redhat.com>
With conmon-rs on the horizon, we need to disentangle Libpod from
legacy Conmon to the greatest extent possible. There are
definitely opportunities for code sharing between the two, but we
have to assume the implementations will be largely disjoint given
the different architectures.
Fortunately, most of the work has already been done in the past.
The conmon-managed OCI runtime mostly sits behind an interface,
with a few exceptions - the most notable of those being attach.
This PR thus moves Attach behind the interface, to ensure that we
can have attach implementations that don't use our existing unix
socket streaming if necessary.
Still to-do is conmon cleanup. There's a lot of code that removes
Conmon-specific files, or kills the Conmon PID, and all of it
will need to be refactored behind the interface.
[NO NEW TESTS NEEDED] Just moving some things around.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The Authorization field lists the plugins for granting access to the
Docker daemon. This field will always be nil for Podman as there is no
daemon. The field is included for compatibility.
```release-note
NONE
```
[NO NEW TESTS NEEDED]
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Mostly, just removing the comments. These either have been done,
or are no longer a good idea.
No code changes. [NO NEW TESTS NEEDED] as such.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Most of these are no longer relevant, just drop the comments.
Most notable change: allow `podman kill` on paused containers.
Works just fine when I test it.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Simplify the work-queue implementation by using a wait group. Once all
queued work items are done, the channel can be closed.
The system tests revealed a flake (i.e., #14351) which indicated that
the service container does not always get stopped which suggests a race
condition when queuing items. Those items are queued in a goroutine to
prevent potential deadlocks if the queue ever filled up too quickly.
The race condition in question is that if a work item queues another,
the goroutine for queuing may not be scheduled fast enough and the
runtime shuts down; it seems to happen fairly easily on the slow CI
machines. The wait group fixes this race and allows for simplifying
the code.
Also increase the queue's buffer size to 10 to make things slightly
faster.
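A minimal sketch of the pattern (not libpod's actual implementation):
```go
package example

import "sync"

// workQueue sketches the wait-group-based queue.
type workQueue struct {
	wg    sync.WaitGroup
	items chan func()
}

// newWorkQueue starts a single worker; the buffer lets a running work
// item queue follow-up items without blocking on the busy worker.
func newWorkQueue(buffer int) *workQueue {
	q := &workQueue{items: make(chan func(), buffer)}
	go func() {
		for f := range q.items {
			f()
			q.wg.Done()
		}
	}()
	return q
}

// Queue registers the item before sending it, so a work item that
// queues another keeps the counter above zero and Shutdown cannot
// return before the follow-up work ran.
func (q *workQueue) Queue(f func()) {
	q.wg.Add(1)
	q.items <- f
}

// Shutdown waits until all work, including re-queued items, is done.
func (q *workQueue) Shutdown() {
	q.wg.Wait()
	close(q.items)
}
```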
[NO NEW TESTS NEEDED] as we are fixing a flake.
Fixes: #14351
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Change the TODO note to NOTE to actually reflect what it is:
breadcrumbs in case we want to add filtering in the future.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
For various (mostly legacy) reasons, Podman presently maintains a
unified namespace for pods and containers - IE, we cannot have
both a pod and a container named "test" at the same time. To
implement this, we use a global database table of every pod and
container ID (and another of every pod and container name).
These entries should be added when containers/pods are added, and
removed when containers/pods are removed, with the database's
transactional integrity providing a guarantee that this is
batched with the overall removal and that the DB should remain
sane and consistent no matter what. As such, we treat a dangling
ID as a hard error that stops the use of Podman.
Unfortunately, we had someone run into this last Friday. I'm
still not certain how exactly their DB got into this state, but
without further clarification there, we can consider removing the
error and making Podman instead clean up and remove any dangling
IDs, which should restore Podman to a serviceable state. Drop an
error message if we do this, though, because people should know
that the DB is in a bad state.
[NO NEW TESTS NEEDED] it is deliberately impossible to produce a
configuration that would test this without hex-editing the DB
file.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Create an auto-update event for each invocation, independent of whether
images and containers are updated or not. Those events will be indicated
the events already but users will now know why.
Fixes: #14283
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The init binary until now has been bind-mounted to /dev/init which
breaks when bind-mounting to /dev. Instead mount the init to
/run/podman-init. The reasoning for using /run is that it is already
used for other runtime data such as secrets.
Fixes: #14251
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The completion suggested incorrect values for `podman events --filter
type=`. It should only list types, not the event statuses. Also make sure
to use the constants instead of duplicating the strings.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Support running `podman play kube` in systemd by exploiting the
previously added "service containers". During `play kube`, a service
container is started before all the pods and containers, and is stopped
last. The service container communicates its conmon PID via sdnotify.
Add a new systemd template to dispatch such k8s workloads. The argument
of the template is the path to the k8s file. Note that the path must be
escaped for systemd not to bark:
Let's assume we have a `top.yaml` file in the home directory:
```
$ escaped=$(systemd-escape ~/top.yaml)
$ systemctl --user start podman-play-kube@$escaped.service
```
Closes: https://issues.redhat.com/browse/RUN-1287
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Code is not directly reading XDG_RUNTIME_DIR, it is reading a value in
the state that may initially be from XDG_RUNTIME_DIR, but then is
overridden by a value from the boltdb that podman stores some state in.
XDG_RUNTIME_DIR and the RunRoot path may not have the same value, so
complaining about XDG_RUNTIME_DIR here may cause confusion when trying
to debug things.
[NO TESTS NEEDED]
Signed-off-by: Kevin Downey <hiredman@thelastcitadel.com>
Removing exec sessions is guaranteed to evict them from the DB,
but in the case of a zombie process (or similar) it may error and
block removal of the container. A subsequent run of `podman rm`
would succeed (because the exec sessions have been purged from
the DB), which is potentially confusing to users. So let's just
continue, instead of erroring out, if removing exec sessions
fails.
[NO NEW TESTS NEEDED] I wouldn't want to spawn a zombie in our
test VMs even if I could.
Fixes #14252
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Send the main PID only once. Previously, `(*Container).start()` and
the conmon handler sent them ~simultaneously and went into a race.
I noticed the issue while debugging a WIP PR.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add the notion of a "service container" to play kube. A service
container is started before the pods in play kube and is (reverse)
linked to them. The service container is stopped/removed *after*
all pods it is associated with are stopped/removed.
In other words, a service container tracks the entire life cycle
of a service started via `podman play kube`. This is required to
enable `play kube` in a systemd unit file.
The service container is only used when the `--service-container`
flag is set on the CLI. This flag has been marked as hidden as it
is not meant to be used outside the context of `play kube`. It is
further not supported on the remote client.
The wiring with systemd will be done in a later commit.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
currently tags cause a panic due to an uninitialized map. Initialize the map
and add parsing to make sure we are only tagging with journald
resolves #13356
Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
Reading the networks requires an extra db operation. Most c.Config() callers
do not need them so create a new function which returns the config with
networks.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We no longer create the temporary directory as `libpod_test_*`.
The directory returned by `t.TempDir()` is TestPostDeleteHooks/001
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.
Prior to this commit, temporary directories created using `ioutil.TempDir`
needed to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate, e.g.
```go
defer func() {
	if err := os.RemoveAll(dir); err != nil {
		t.Fatal(err)
	}
}()
```
is also tedious, but `t.TempDir` handles this for us nicely.
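For comparison, a minimal sketch of the same test setup with `t.TempDir`:
```go
package example

import "testing"

func TestExample(t *testing.T) {
	// The directory is removed automatically when the test and all
	// its subtests complete, even on failure; no defer is needed.
	dir := t.TempDir()
	_ = dir
}
```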
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Rather than assuming a filesystem path, the API service URI is recorded
in the libpod runtime configuration and then reported as requested.
Note: All schemes other than "unix" are hard-coded to report that the URI exists.
Fixes #12023
Signed-off-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
when reading from the attach socket, treat ECONNRESET in the same way
as EOF.
[NO NEW TESTS NEEDED]
Closes: https://github.com/containers/podman/issues/11446
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
since the network config is a string map, json.unmarshal does not recognize
the config and spec as the same entity, so we need to map this option manually
resolves #13713
Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
In support of podman machine and its desktop counterpart, we have added
new stats to podman info.
For storage, we have added GraphRootAllocated and GraphRootUsed in
bytes.
For CPUs, we have added user, system, and idle percents based on
/proc/stat.
Fixes: #13876
Signed-off-by: Brent Baude <bbaude@redhat.com>
Add the notion of an "exit policy" to a pod. This policy controls the
behaviour when the last container of a pod exits. Initially, there are
two policies:
- "continue" : the pod continues running. This is the default policy
when creating a pod.
- "stop" : stop the pod when the last container exits. This is the
default behaviour for `play kube`.
In order to implement the deferred stop of a pod, add a worker queue to
the libpod runtime. The queue will pick up work items and in this case
helps resolve deadlocks that would otherwise occur if we attempted to
stop a pod during container cleanup.
Note that the default restart policy of `play kube` is "Always". Hence,
in order to really solve #13464, the YAML files must set a custom
restart policy; the tests use "OnFailure".
Fixes: #13464
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Since networks must always be read from the db bucket directly we should
unset them in config to prevent callers from accidentally using them.
I already tried this but it didn't work because the networks were unset
after the config was marshalled.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When a container is run in the host network namespace we have to keep
the same resolv.conf content and not use the systemd-resolve detection
logic.
But also make sure we still allow --dns options.
Fixes #14055
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The files /etc/hosts, /etc/hostname and /etc/resolv.conf should always
be owned by the root user in the container. This worked correctly for
/etc/hostname and /etc/hosts but not for /etc/resolv.conf.
A container run with --userns keep-id would have the resolv.conf file
owned by the current container user, which is wrong.
Consolidate some common code in a new helper function to make the code
cleaner.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The errcheck linter makes sure that errors are always checked and not
ignored by accident. It spotted a lot of unchecked errors, mostly in the
tests but also some real problems in the code.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
It solves a race where a container cleanup process launched because of
the container process exiting normally would hang.
It also solves a problem when running as rootless on cgroup v1 since
it is not possible to force pids.max = 1 on conmon to limit spawning
the cleanup process.
Partially copied from https://github.com/containers/podman/pull/13403
Related to: https://github.com/containers/podman/issues/14057
[NO NEW TESTS NEEDED] it doesn't add any new functionality
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
`pod.CgroupPath()` currently includes a codepath that is never accessed,
which is supposed to start the infra ctr and obtain the cgroup path from
there. That is never necessary/safe because p.state.CgroupPath is never empty.
[NO NEW TESTS NEEDED]
Signed-off-by: cdoern <cbdoer23@g.holycross.edu>