This is something Docker does, and we did not do until now. Most
difficult/annoying part was the REST API, where I did not really
want to modify the struct being sent, so I made the new restart
policy parameters query parameters instead.
Testing was also a bit annoying, because testing restart policy
always is.
Signed-off-by: Matt Heon <mheon@redhat.com>
This includes migrating from cdi.GetRegistry() to cdi.Configure() and
cdi.GetDefaultCache() as applicable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Healthchecks defined in a .yaml file as livenessProbe did not have any
effect. They were executing as intended, containers were marked as
unhealthy, yet no action was taken. This was never the intended
behaviour, as observed by the comment:
> if restart policy is in place, ensure the health check enforces it
A minimal example is tracked in containers/podman#20903 [1] with the
following YAML:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubi-httpd-24
spec:
  restartPolicy: Always
  containers:
    - name: ubi8-httpd
      image: registry.access.redhat.com/rhscl/httpd-24-rhel7:2.4-217
      livenessProbe:
        httpGet:
          path: "/"
          port: 8081
```
By passing down the restart policy (and using constants instead of
actually wrong hard-coded ones), Podman actually restarts the container
now.
[1]: https://github.com/containers/podman/issues/20903
Closes #20903.
Signed-off-by: Jasmin Oster <nachtjasmin@posteo.de>
The reserved annotation io.podman.annotations.volumes-from is made public to let users define volumes-from so that one container can mount the volumes of other containers.
The annotation format is: io.podman.annotations.volumes-from/tgtCtr: "srcCtr1:mntOpts1;srcCtr2:mntOpts;..."
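As a rough illustration of the value format above, here is a minimal Go sketch that splits such a value into its entries. The `volumesFromEntry` type and `parseVolumesFrom` function are hypothetical names for this example and are not Podman's implementation; the sketch also assumes that the mount options after the colon may be omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// volumesFromEntry is a hypothetical type for this sketch, not a Podman type.
type volumesFromEntry struct {
	Source  string // container whose volumes are mounted
	Options string // comma-separated mount options, possibly empty
}

// parseVolumesFrom splits a value of the form
// "srcCtr1:mntOpts1;srcCtr2:mntOpts2" into its entries. Entries are
// separated by ';' so that the options themselves may contain commas.
func parseVolumesFrom(value string) ([]volumesFromEntry, error) {
	var entries []volumesFromEntry
	for _, item := range strings.Split(value, ";") {
		if item == "" {
			continue // tolerate an empty value or a trailing separator
		}
		// Assumption: the options part after ':' is optional.
		src, opts, _ := strings.Cut(item, ":")
		if src == "" {
			return nil, fmt.Errorf("missing source container in %q", item)
		}
		entries = append(entries, volumesFromEntry{Source: src, Options: opts})
	}
	return entries, nil
}

func main() {
	entries, err := parseVolumesFrom("db:ro,z;cache")
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("source=%s options=%q\n", e.Source, e.Options)
	}
}
```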
Fixes: containers#16819
Signed-off-by: Vikas Goel <vikas.goel@gmail.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
The pasta network mode has been added in podman v4.4 and this causes a
conflict with named networks that could also be called "pasta". To not
break anything we had special logic to prefer the named network over the
network mode. Now with 5.0 we can break this and remove this awkward
special handling from the code.
Containers created with 4.X that use a named network pasta will
continue to work fine; this change only affects the creation of new
containers with a named network pasta, which will now always use the
network mode pasta instead. We now also block the creation of networks
with the name "pasta".
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The current field separator of the inspect annotation, a comma, conflicts with the mount options of --volumes-from, as the mount options themselves can be comma-separated.
Signed-off-by: Vikas Goel <vikas.goel@gmail.com>
SpecGen is our primary container creation abstraction, and is
used to connect our CLI to the Libpod container creation backend.
Because container creation has a million options (I exaggerate
only slightly), the struct is composed of several other structs,
many of which are quite large.
The core problem is that SpecGen is also an API type - it's used
in remote Podman. There, we have a client and a server, and we
want to respect the server's containers.conf. But how do we tell
what parts of SpecGen were set by the client explicitly, and what
parts were not? If we're not using nullable values, an explicit
empty string and a value never being set are identical - and we
can't tell if it's safe to grab a default from the server's
containers.conf.
Fortunately, we only really need to do this for booleans. An
empty string is sufficient to tell us that a string was unset
(even if the user explicitly gave us an empty string for an
option, filling in a default from the config file is acceptable).
This makes things a lot simpler. My initial attempt at this
changed everything, including strings, and it was far larger and
more painful.
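The distinction between "unset" and "explicitly false" can be sketched in Go as follows. This is only an illustration: `createOptions`, `serverDefaults` and `applyDefaults` are hypothetical stand-ins, not the real SpecGen or containers.conf types.

```go
package main

import "fmt"

// createOptions is a hypothetical, heavily reduced stand-in for a
// SpecGen-like struct sent from client to server.
type createOptions struct {
	// ReadOnly is a pointer so that "never set" (nil) differs from
	// "explicitly false".
	ReadOnly *bool
	// Timezone stays a plain string: "" is treated as unset.
	Timezone string
}

// serverDefaults is a hypothetical stand-in for values read from the
// server's containers.conf.
type serverDefaults struct {
	ReadOnly bool
	Timezone string
}

// applyDefaults fills in only the fields the client left unset.
func applyDefaults(opts *createOptions, def serverDefaults) {
	if opts.ReadOnly == nil {
		v := def.ReadOnly
		opts.ReadOnly = &v
	}
	if opts.Timezone == "" {
		opts.Timezone = def.Timezone
	}
}

func main() {
	def := serverDefaults{ReadOnly: true, Timezone: "UTC"}
	explicitFalse := false

	unset := createOptions{}
	set := createOptions{ReadOnly: &explicitFalse}
	applyDefaults(&unset, def)
	applyDefaults(&set, def)

	// The unset field picks up the server default; the explicit false survives.
	fmt.Println(*unset.ReadOnly, *set.ReadOnly)
}
```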
Also, begin the first steps of removing all uses of
containers.conf defaults from client-side. Two are gone entirely,
the rest are marked as remove-when-possible.
[NO NEW TESTS NEEDED] This is just a refactor.
Signed-off-by: Matt Heon <mheon@redhat.com>
Some OCI runtimes (cf. [1]) may tolerate container images that don't
specify an entrypoint even if no entrypoint is given on the command
line. In those cases, it's annoying for the user to have to pass a ""
argument to podman.
If no entrypoint is given, make the behavior the same as if an empty ""
entrypoint was given.
[1] https://github.com/containers/crun-vm
Signed-off-by: Alberto Faria <afaria@redhat.com>
This avoids nil pointer exceptions in the subsequent code that tries to access the runtimeSpec returned from SpecGenToOCI.
[NO NEW TESTS NEEDED]
Signed-off-by: Sebastian Mosbach <sm453@cam.ac.uk>
strings.Cut, added in Go 1.18, is a cleaner and more performant API than SplitN(_, _, 2).
Previously applied this refactoring to buildah:
https://github.com/containers/buildah/pull/5239
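For illustration, a small side-by-side of the two patterns on a key=value string. The helper names `splitWithSplitN` and `splitWithCut` are made up for this sketch and do not appear in the Podman code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWithSplitN is the older pattern: split into at most two parts and
// inspect the length of the resulting slice.
func splitWithSplitN(kv string) (key, value string, found bool) {
	parts := strings.SplitN(kv, "=", 2)
	if len(parts) != 2 {
		return kv, "", false
	}
	return parts[0], parts[1], true
}

// splitWithCut does the same with strings.Cut, which returns both halves
// and a "found" flag directly, without allocating a slice.
func splitWithCut(kv string) (key, value string, found bool) {
	return strings.Cut(kv, "=")
}

func main() {
	for _, kv := range []string{"PATH=/usr/bin", "EMPTY=", "NOVALUE", "A=b=c"} {
		k1, v1, ok1 := splitWithSplitN(kv)
		k2, v2, ok2 := splitWithCut(kv)
		fmt.Printf("%q: SplitN=(%q, %q, %t) Cut=(%q, %q, %t)\n", kv, k1, v1, ok1, k2, v2, ok2)
	}
}
```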
Signed-off-by: Philip Dubé <philip@peerdb.io>
* Add BaseHostsFile to container configuration
* Do not copy /etc/hosts file from host when creating a container using Docker API
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
Use the new rootlessnetns logic from c/common, drop the podman code
here and make use of the new much simpler API.
ref: https://github.com/containers/common/pull/1761
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a new option --preserve-fd that allows specifying a list of FDs to
pass down to the container.
It is similar to --preserve-fds, but it takes a list of FDs
instead of the maximum FD number to preserve.
--preserve-fd and --preserve-fds are mutually exclusive.
It requires crun since runc would complain if any fd below
--preserve-fds is not preserved.
Closes: https://github.com/containers/podman/issues/20844
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
In FreeBSD-14.0, it is possible to configure a jail's network settings
from outside the jail using ifconfig and route's new '-j' option. This
removes the need for a separate jail to own the container's vnet.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
When InitialDelaySeconds in the kube yaml is set for a healthcheck,
don't update the healthcheck status till those initial delay seconds are over.
We were waiting to update for a failing healthcheck, but when the healthcheck
was successful during the initial delay time, the status was being updated as healthy
immediately.
This is misleading to the users wondering why their healthcheck takes
much longer to fail for a failing case while it is quick to succeed for
a healthy case. It also doesn't match what the k8s InitialDelaySeconds
does. This change is only for kube play, podman healthcheck run is
unaffected.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
1. Set the marker to the current virtual machine type instead of fixed qemu.
2. Update containers/common
[NO NEW TESTS NEEDED]
Signed-off-by: Black-Hole1 <bh@bugs.cc>
This updates the container-device-interface dependency to v0.6.2 and renames the import to
tags.cncf.io/container-device-interface to make use of the new vanity URL.
[NO NEW TESTS NEEDED]
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Docker allows the passing of -1 to indicate the maximum limit
allowed for the current process.
Fixes: https://github.com/containers/podman/issues/19319
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
All `[]string`s in containers.conf have now been migrated to attributed
string slices which require some adjustments in Buildah and Podman.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This is not used at all but causes a libimage import for non linux
builds which causes bloat for them, with the new !remote tag this is no
longer possible and we have to remove it to fix the build.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When the hostNetwork option is set to true in the k8s yaml,
set the pod's hostname to the name of the machine/node as is
done in k8s. Also set the utsns to host.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Use the new FindInitBinary() function to look up the init binary; this
allows the use of helper_binaries_dir in containers.conf[1]
[NO NEW TESTS NEEDED]
[1] https://github.com/containers/common/issues/1110
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add support for DefaultMode for configMaps and secrets.
This allows users to set the file permissions for files
created with their volume mounts. Adheres to k8s defaults.
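A minimal sketch of the defaulting rule, assuming the Kubernetes behaviour of falling back to 0644 when defaultMode is not given. `resolveFileMode` is a hypothetical helper, and masking to the permission bits is a choice made for this example only.

```go
package main

import (
	"fmt"
	"os"
)

// k8sDefaultMode is the Kubernetes default for configMap and secret
// volumes when defaultMode is omitted.
const k8sDefaultMode os.FileMode = 0o644

// resolveFileMode returns the permissions to use for files created from
// a configMap or secret volume. A nil pointer means "not set in the YAML".
func resolveFileMode(defaultMode *int32) os.FileMode {
	if defaultMode == nil {
		return k8sDefaultMode
	}
	// Assumption for this sketch: keep only the permission bits.
	return os.FileMode(*defaultMode) & os.ModePerm
}

func main() {
	restricted := int32(0o400)
	fmt.Printf("unset: %04o, explicit: %04o\n",
		uint32(resolveFileMode(nil)), uint32(resolveFileMode(&restricted)))
}
```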
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Add --rdt-class=COS to the create and run command to enable the
assignment of a container to a Class of Service (COS). The COS
represents a part of the cache based on the Cache Allocation Technology
(CAT) feature that is part of Intel's Resource Director Technology
(Intel RDT) feature set. By assigning a container to a COS, all PIDs of
the container only have access to the cache space defined for this COS.
The COS has to be pre-configured based on the resctrl kernel driver.
cat_l2 and cat_l3 flags in /proc/cpuinfo represent CAT support for cache
level 2 and 3 respectively.
Signed-off-by: Wolfgang Pross <wolfgang.pross@intel.com>
Container ports defined with containerPort were exposed by default
even though kubernetes interprets them as mostly informative.
Closes #17028
Signed-off-by: Peter Werner <wpw.peter@gmail.com>
commit cf364703fc changed the way
/sys/fs/cgroup is mounted when there is not a netns and it now honors
the ro flag. The mount was created using a bind mount that is a
problem when using a cgroup namespace, fix that by mounting a fresh
cgroup file system.
Closes: https://github.com/containers/podman/issues/20073
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
First do not lint pkg/domain/infra/abi with the remote tag as this is
only local code.
Then mark the cacheLibImage field as unused, this should be an unused
stub for the remote client so that we do not leak libimage.
The linter sees that with the remote tag so we need to silence that
warning.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
These files should never be included on the remote client. They are only
there to finalize the spec on the server side.
This makes sure it will not get reimported by accident and bloat the
remote client again.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This is the last place where the remote client pulls in libimage, with
this the podman-remote binary size decreases from 44788 KB to
39424 KB (not stripped).
This change simply fixes that by gating it behind the remote build tag.
Of course it would be a bit cleaner to never leak libimage into
pkg/specgen and only have it in pkg/specgen/generate. But this would be
much more involved with big changes so I went with the easy and quick
way instead.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
commit 8b4a79a744 introduced
oom_score_adj clamping when the container oom_score_adj value is lower
than the current one in a rootless environment. Move the check to
init() time so it is performed every time the container starts and not
only when it is created. It is more robust if the oom_score_adj value
is changed for the current user session.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Add support to kube play for the TerminationGracePeriodSeconds
field by sending its value to podman's stopTimeout.
Add support to kube generate to generate TerminationGracePeriodSeconds
if stopTimeout is set for a container (will ignore podman's default).
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
This allows using --share-parent with --infra=false, so that the
containers in the pod can share the parent cgroup.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The logic here makes little sense, basically the /tmp and /var/tmp are
always set noexec, while /run is not. I don't see a reason to set any
of the three noexec by default.
Fixes: https://github.com/containers/podman/issues/19886
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
allow the image to specify an empty list of capabilities, currently
podman chokes when the io.containers.capabilities specified in an
image does not contain at least one capability.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when running rootless, if the specified oom_score_adj for the
container process is lower than the current value, clamp it to the
current value and print a warning.
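The clamping rule itself is small enough to sketch in Go. `clampOOMScoreAdj` is a hypothetical helper for illustration, and the current value is hard-coded in this example rather than read from the system (on Linux it is available in /proc/self/oom_score_adj).

```go
package main

import (
	"fmt"
	"os"
)

// clampOOMScoreAdj returns the value to apply and whether the requested
// value had to be raised. An unprivileged process cannot lower its
// oom_score_adj, so a lower request is clamped to the current value.
func clampOOMScoreAdj(requested, current int) (value int, clamped bool) {
	if requested < current {
		return current, true
	}
	return requested, false
}

func main() {
	current := 200 // assumed current value for this example
	requested := -500

	value, clamped := clampOOMScoreAdj(requested, current)
	if clamped {
		fmt.Fprintf(os.Stderr, "warning: requested oom_score_adj %d is lower than the current value, using %d\n",
			requested, value)
	}
	fmt.Println(value)
}
```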
Closes: https://github.com/containers/podman/issues/19829
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
getImageFromSpec has just made exactly the same Inspect call.
[NO NEW TESTS NEEDED]: This adds no new functionality, and
it's hard to test that a duplicate call didn't happen without
(intrusive and hard-to-maintain) mocks.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This is a regression for #18052.
When podman ignores the resource limits, s.ResourceLimits needs to be
nil.
[NO NEW TESTS NEEDED]
Signed-off-by: Toshiki Sonoda <sonoda.toshiki@fujitsu.com>
Fixes a bug where `podman kube play` fails to set a container's Umask
to the default 0022, and sets it to 0000 instead.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
This changes /run to /var/run for .containerenv and secrets in FreeBSD
containers for consistency with FreeBSD path conventions. Linux
containers running on FreeBSD hosts continue to use /run for compatibility.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
This just sets the flag in the runtime spec - the actual implementation
is in the OCI runtime.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
We do not allow volumes and mounts to be placed at the same
location in the container, with create-time checks to ensure this
does not happen. User-added conflicts cannot be resolved (if the
user adds two separate mounts to, say, /myapp, we can't resolve
that contradiction and error), but for many other volume sources,
we can solve the contradiction ourselves via a priority
hierarchy. Image volumes come first, and are overridden by the
`--volumes-from` flag, which is overridden by user-added mounts,
etc, etc. The problem here is that we were not properly handling
volumes-from overriding image volumes. An inherited volume from
--volumes-from would supersede an image volume, but an inherited
mount would not. Solution is fortunately simple - just clear out
the map entry for the other type when adding volumes-from
volumes.
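A minimal sketch of the two-map bookkeeping, with both maps keyed by destination path. The types and method names here are hypothetical simplifications, not the actual Podman structures.

```go
package main

import "fmt"

// Hypothetical simplified types for this sketch.
type namedVolume struct{ Name, Dest string }
type bindMount struct{ Source, Dest string }

// mountSet tracks volumes and mounts separately, keyed by destination.
type mountSet struct {
	volumes map[string]namedVolume
	mounts  map[string]bindMount
}

func newMountSet() *mountSet {
	return &mountSet{
		volumes: map[string]namedVolume{},
		mounts:  map[string]bindMount{},
	}
}

// addInheritedVolume records a volume from --volumes-from and clears any
// entry of the other type at the same destination.
func (m *mountSet) addInheritedVolume(v namedVolume) {
	delete(m.mounts, v.Dest)
	m.volumes[v.Dest] = v
}

// addInheritedMount records a mount from --volumes-from and clears any
// entry of the other type at the same destination.
func (m *mountSet) addInheritedMount(b bindMount) {
	delete(m.volumes, b.Dest)
	m.mounts[b.Dest] = b
}

func main() {
	m := newMountSet()
	// An image volume at /data, added first (lowest priority).
	m.volumes["/data"] = namedVolume{Name: "image-vol", Dest: "/data"}
	// An inherited bind mount at the same destination replaces it.
	m.addInheritedMount(bindMount{Source: "/srv/data", Dest: "/data"})
	fmt.Println(len(m.volumes), len(m.mounts))
}
```

With a single map over a sum type, the two `delete` calls would be unnecessary, which is the simplification the paragraph below wishes for.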
Makes me wish for Rust sum types - conflict resolution would be a
lot simpler if we could use a sum type for volumes and bind
mounts and thus have a single map instead of two maps, one for
each type.
Fixes #19529
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
On FreeBSD, each container has its own devfs instance with a ruleset
that controls what the container can see. To expose devices to a
container we add rules to its devfs to make the requested devices
visible. For privileged containers, we use 'ruleset=0' which makes
everything visible.
This shares the ParseDevice function with Linux so it moves to
config_common.go from config_linux.go.
Signed-off-by: Doug Rabson <dfr@rabson.org>
The defaults for TERM=xterm were removed from c/common; accordingly, TERM=xterm is now added here when the tty flag is set.
Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
The intention of --read-only-tmpfs=false when in --read-only mode was to
not allow any processes inside of the container to write content
anywhere, unless the caller also specified a volume or a tmpfs. Having
/dev and /dev/shm writable breaks this assumption.
Fixes: https://github.com/containers/podman/issues/12937
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add a new "healthy" sdnotify policy that instructs Podman to send the
READY message once the container has turned healthy.
Fixes: #6160
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Adds any required "wiring" to ensure the reserved annotations are supported by
`podman kube play`.
Additionally fixes a bug where, when inspected, containers created using
the `--publish-all` flag had a field `.HostConfig.PublishAllPorts` whose
value always evaluated to `false`.
Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
When using 'podman run --rootfs ...', the image passed to SpecGenToOCI
may be nil - in this case, fall back to "freebsd" for the container OS.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
When working on Linux emulation on FreeBSD, I assumed that
SpecGenerator.ImageOS was always populated from the image's OS value but
in fact, this value comes from the CLI --os flag if set, otherwise "".
This broke running FreeBSD native containers unless --os=freebsd was
also set. Fix the problem by getting the value from the image itself.
This is a strong incentive for me to complete a stalled project to enable
podman system tests on FreeBSD.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
Make sure we use the config field to know if we should use pasta or
slirp4netns as default.
While at it fix broken code which sets the default at two different
places, also do not set in Validate() as this should not modify the
specgen IMO, so set it directly before that.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This adds define.BindOptions to declare the mount options for bind-like
mounts (nullfs on FreeBSD). Note: this mirrors identical declarations in
buildah and it may be preferable to use buildah's copies throughout
podman.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
This is limited to images that don't depend on complex cgroup or capability
setups but does cover enough functionality to be useful.
[NO NEW TESTS NEEDED]
Signed-off-by: Doug Rabson <dfr@rabson.org>
we were silently ignoring --device-cgroup-rule in rootless mode. Make
sure an error is returned if the user tries to use it.
Closes: https://github.com/containers/podman/issues/18698
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This fixes a lint issue, but I'm keeping it in its own commit so
it can be reverted independently if necessary; I don't know what
side effects this may have. I don't *think* there are any
issues, but I'm not sure why it wasn't a pointer in the first
place, so there may have been a reason.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This probably should have been in the API since the beginning,
but it's not too late to start now.
The extra information is returned (both via the REST API, and to
the CLI handler for `podman rm`) but is not yet printed - it
feels like adding it to the output could be a breaking change?
Signed-off-by: Matthew Heon <matthew.heon@pm.me>