play kube function not respecting selinux options in kube yaml, all options were
being mapped to role.
fixes issue 8710
Signed-off-by: Steven Taylor <steven@taylormuff.co.uk>
We need an extra field in the pod infra container config. We may
want to reevaluate that struct at some point, as storing network
modes as bools will rapidly become unsustainable, but that's a
discussion for another time. Otherwise, straightforward plumbing.
Fixes#9165
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
when creating kubernetes yaml from containers and pods, we should honor
any custom dns settings the user provided. in the case of generate kube,
these would be provided by --dns, --dns-search, and --dns-opt. if
multiple containers are involved in the generate, the options will be
cumulative and unique with the exception of dns-opt.
when replaying a kube file that has kubernetes dns information, we now
also add that information to the pod creation.
the options for dnspolicy is not enabled as there seemed to be no direct
correlation between kubernetes and podman.
Fixes: #9132
Signed-off-by: baude <bbaude@redhat.com>
A container's workdir can be specified via the CLI via `--workdir` and
via an image config with the CLI having precedence.
Since images have a tendency to specify workdirs without necessarily
shipping the paths with the root FS, make sure that Podman creates the
workdir. When specified via the CLI, do not create the path, but check
for its existence and return a human-friendly error.
NOTE: `crun` is performing a similar check that would yield exit code
127. With this change, however, Podman performs the check and yields
exit code 126. Since this is specific to `crun`, I do not consider it
to be a breaking change of Podman.
Fixes: #9040
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
when using the compatibility api to create containers, now reflect the
use of k8s-file as json-file so that clients, which are
unaware of k8s-file, can work. specifically, if the container is using
k8s-file as the log driver, we change the log type in container
inspection to json-file. These terms are used interchangably in other
locations in libpod/podman.
this fixes log messages in compose as well.
[NO TESTS NEEDED]
Signed-off-by: baude <bbaude@redhat.com>
partially revert 95c45773d7
restrict the cases where /sys is bind mounted from the host.
The heuristic doesn't detect all the cases where the bind mount is not
necessary, but it is an improvement on the previous version where /sys
was always bind mounted for rootless containers unless --net none was
specified.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when using the bindings to *only* make a connection, the binary was
rough 28MB. This PR reduces it down to 11. There is more work to do
but it will come in a secondary PR.
Signed-off-by: baude <bbaude@redhat.com>
We now set Entrypoint when interpeting the image Entrypoint (or yaml.Command)
and Command when interpreting image Cmd (or yaml.Args)
This change is kind of breaking because now checking Config.Cmd won't return
the full command, but only the {cmd,args}.
Adapt the tests to this change as well
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
`staticcheck` is a golang code analysis tool. https://staticcheck.io/
This commit fixes a lot of problems found in our code. Common problems are:
- unnecessary use of fmt.Sprintf
- duplicated imports with different names
- unnecessary check that a key exists before a delete call
There are still a lot of reported problems in the test files but I have
not looked at those.
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
This PR takes the settings from containers.conf and uses
them. This works on the podman local but does not fix the
issue for podman remote or for APIv2. We need a way
to specify optionalbooleans when creating containers.
Fixes: https://github.com/containers/podman/issues/8843
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Follow-up to commit (1ad796677e). The build on mips is still
failing because SIGWINCH was not defined in the signal pkg.
Also stat_t.Rdev is unit32 on mips so we need to typecast.
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
When I launch a container with --userns=keep-id the rootless processes
should have no caps by default even if I launch the container with
--privileged. It should only get the caps if I specify by hand the
caps I want leaked to the process.
Currently we turn off capeff and capamb, but not capinh. This patch
treats capinh the same way as capeff and capamb.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When adding the HOSTNAME environment variable, only do so if it
is not already present in the spec. If it is already present, it
was likely added by the user, and we should honor their requested
value.
Fixes#8886
Signed-off-by: Matthew Heon <mheon@redhat.com>
When running a privileged container and `SeccompProfilePath` is empty no seccomp profile should be applied.
(Previously this was the case only if `SeccompProfilePath` was set to a non-empty default path.)
Closes#8849
Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
when HostNetwork is true in the pod spec.
Also propagate whether host network namespace should be used for containers.
Add test for HostNetwork setting in kubeYaml.
The infra configuration should reflect the setting.
Signed-off-by: Benedikt Ziemons <ben@rs485.network>
when neither yaml.Args nor yaml.Command are specified, we should use the entrypoint and cmd from the image.
update the tests to cover this and another case (both args and command are specified).
use the registry image instead of redis, as it has both an entrypoint and command specified.
update the documentation around this handling to hopefully prevent regressions and confusion.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
The existing code prevents containers.conf default sysctls from
being added if the container uses a host namespace. This patch
expands that to not just host namespaces, but also *shared*
namespaces - so we never modify another container's (or a pod's)
namespaces without being explicitly directed to do so by the
user.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Handle the ALL Flag when running with an account as a user.
Currently we throw an error when the user specifies
podman run --user bin --cap-add all fedora echo hello
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
we must honor systempaths=unconfined also for read-only paths, as
Docker does:
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The existing logic (Range > 0) always triggered, because range is
guaranteed to be at least 1 (a single port has a range of 1, a
two port range (e.g. 80-81) has a range of 2, and so on). As such
this could cause ports that had a host port assigned to them by
the user to randomly assign one instead.
Fixes#8650Fixes#8651
Signed-off-by: Matthew Heon <mheon@redhat.com>
When creating a container, do not clear the input-image name before
looking up image names. Also add a regression test.
Fixes: #8558
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
We can't mount sysfs as rootless unless we manage the network
namespace. Problem: slirp4netns is now creating and managing a
network namespace separate from the OCI runtime, so we can't
mount sysfs in many circumstances. The `crun` OCI runtime will
automatically handle this by falling back to a bind mount, but
`runc` will not, so we didn't notice until RHEL gating tests ran
on the new branch.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Docker defines an option of "default" which means to
use the default network. We should support this with
the same code path as --network="".
This is important for compatibility with the Docker API.
Fixes: https://github.com/containers/podman/issues/8544
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add the mask and unmask option to the --security-opt flag
to allow users to specify paths to mask and unmask in the
container. If unmask=ALL, this will unmask all the paths we
mask by default.
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Instead of being interpreted as an argument to the boolean flag,
the 'true' is being intepreted as the Podman command to be run -
so we're trying to run `podman true`, which does not exist. This
causes the cleanup command to fail when `--log-level=debug` is
set, so containers are not cleaned up or removed.
This problem is easily reproduced with any command combining the
`--rm`, `-d`, and `--log-level=debug` flags - the command will
execute and exit, but the container will not be removed.
Separate, but worth looking into later: the errors we get on
trying `podman true` with any flags are terrible - if you just
type `podman true` you get a quite sane "Unrecognized command"
error, but if you try `podman true --rm` you get an "unknown flag
--rm" error - which makes very little sense given the command
itself doesn't exist.
Signed-off-by: Matthew Heon <mheon@redhat.com>
In k8s a persistent volume claim (PVC) allow pods to define a volume
by referencing the name of a PVC. The PVC basically contains criterias
that k8s then use to select which storage source it will use for the
volume.
Podman only provide one abtracted storage, the named volumes, and
create them if they don't exists yet. So this patch simply use a
volume with the name of the PVC.
Signed-off-by: Alban Bedel <albeu@free.fr>
Replace the simple map of names to paths with a map of names to a struct
to allow passing more parameters. Also move the code to parse the volumes
to its own file to avoid making the playKubePod() function overly complex.
Finally rework the kube volumes test to also be ready to support more
volume types.
Signed-off-by: Alban Bedel <albeu@free.fr>
Currently we don't document which end of the podman-remote client server
operations uses the containers.conf. This PR begins documenting this
and then testing to make sure the defaults follow the rules.
Fixes: https://github.com/containers/podman/issues/7657
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
we need to migrate play kube away from using the old container creation
method. the new approach is specgen and this aligns play kube with
container creation in the rest of podman.
Signed-off-by: baude <bbaude@redhat.com>
podman can now support adding network aliases when running containers
(--network-alias). It requires an updated dnsname plugin as well as an
updated ocicni to work properly.
Signed-off-by: baude <bbaude@redhat.com>
Setting port mappings only works when CNI is configuring our
network (or slirp4netns, in the rootless case). This is not the
case with `--net=host`, `--net=container:`, and joining the
network namespace of the pod we are part of. Instead of allowing
users to do these things and then be confused why they do
nothing, let's match Docker and return a warning that your port
mappings will do nothing.
Signed-off-by: Matthew Heon <mheon@redhat.com>
if --userns=keep-id is specified and not --user is specified, take the
unprivileged capabilities code path so that ambient capabilities are
honored in the container.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the kernel supports ambient capabilities (Linux 4.3+), also set
them when running with euid != 0.
This is different that what Moby does, as ambient capabilities are
never set.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Most of the builtin golang functions like os.Stat and
os.Open report errors including the file system object
path. We should not wrap these errors and put the file path
in a second time, causing stuttering of errors when they
get presented to the user.
This patch tries to cleanup a bunch of these errors.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Containers that share IPC Namespaces share each others
/dev/shm, which means a private /dev/shm needs to be setup
for the infra container.
Added a system test and an e2e test to make sure the
/dev/shm is shared.
Fixes: https://github.com/containers/podman/issues/8181
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add a new "image" mount type to `--mount`. The source of the mount is
the name or ID of an image. The destination is the path inside the
container. Image mounts further support an optional `rw,readwrite`
parameter which if set to "true" will yield the mount writable inside
the container. Note that no changes are propagated to the image mount
on the host (which in any case is read only).
Mounts are overlay mounts. To support read-only overlay mounts, vendor
a non-release version of Buildah.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Record the correct image name when creating a container by using the
resolved image name if present. Otherwise, default to using the first
available name or an empty string in which case the image must have been
referenced by ID.
Fixes: #8082
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
when using the compatibility layer to create containers, it used code paths to the pkg/spec which is the old implementation of containers. it is error prone and no longer being maintained. rather that fixing things in spec, migrating to specgen usage seems to make the most sense. furthermore, any fixes to the compat create will not need to be ported later.
Signed-off-by: baude <bbaude@redhat.com>
When a container uses --net=host the default hostname is set to
the host's hostname. However, we were not creating any entries
in `/etc/hosts` despite having a hostname, which is incorrect.
This hostname, for Docker compat, will always be the hostname of
the host system, not the container, and will be assigned to IP
127.0.1.1 (not the standard localhost address).
Also, when `--hostname` and `--net=host` are both passed, still
use the hostname from `--hostname`, not the host's hostname (we
still use the host's hostname by default in this case if the
`--hostname` flag is not passed).
Fixes#8054
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This fixes the issue that a simple port range should map to a random
port range from the host to the container, if no host port range is
specified. For example this fails without applying the patch:
```
> podman run -it -p 6000-6066 alpine
Error: cannot listen on the TCP port: listen tcp4 :53: bind: address already in use
```
The issue is that only the first port is randomly chosen and all
following in the range start by 0 and increment. This is now fixed by
tracking the ranges and then incrementing the random port if necessary.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
In case os.Open[File], os.Mkdir[All], ioutil.ReadFile and the like
fails, the error message already contains the file name and the
operation that fails, so there is no need to wrap the error with
something like "open %s failed".
While at it
- replace a few places with os.Open, ioutil.ReadAll with
ioutil.ReadFile.
- replace errors.Wrapf with errors.Wrap for cases where there
are no %-style arguments.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Docker supports log-opt max_size and so does conmon (ALthough poorly).
Adding support for this allows users to at least make sure their containers
logs do not become a DOS vector.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
All containers within a Pod need to run with the same SELinux
label, unless overwritten by the user.
Also added a bunch of SELinux tests to make sure selinux labels
are correct on namespaces.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If the container uses the /dev/fuse device, attempt to load the fuse
kernel module first so that nested containers can use it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1872240
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If run&create image returns error: image contains manifest list, not a runnable image, find the local image that has digest matching the digest from the list and use the image from local storage for the command.
Signed-off-by: Qi Wang <qiwan@redhat.com>
change capabilities handling to reflect what docker does.
Bounding: set to caplist
Inheritable: set to caplist
Effective: if uid != 0 then clear; else set to caplist
Permitted: if uid != 0 then clear; else set to caplist
Ambient: clear
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
If user sets namespace to host, then default sysctls need to be ignored
that are specific to that namespace.
--net=host ignore sysctls that begin with net.
--ipc=host ignore fs.mqueue
--uts=host ignore kernel.domainname and kernel.hostname
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add a bunch of tests to ensure that --volumes-from
works as expected.
Also align the podman create and run man page.
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
Currently infr-command and --infra-image commands are ignored
from the user. This PR instruments them and adds tests for
each combination.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Record the user-specified "raw" image name in the SpecGenerator, so we
can pass it along to the config when creating a container. We need a
separate field as the image name in the generator may be set to the
ID of the previously pulled image - ultimately the cause of #7404.
Reverting the image name from the ID to the user input would not work
since "alpine" for pulling iterates over the search registries in the
registries.conf but looking up "alpine" normalizes to
"localhost/alpine".
Recording the raw-image name directly in the generator was the best of
the options I considered as no hidden magic from search registries or
normalizations (that may or may not change in the future) can interfere.
The auto-update backend enforces that the raw-image name is a
fully-qualified reference, so we need to worry about that in the front
end.
Fixes: #7407
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
During the redesign of podman 2.0, we dropped the support for --oom-score-adj.
Test for this flag was bogus and thus passing when it was broken.
Basically just need to set the value in the spec.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1877187
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If we select "unconfined" as AppArmor profile, then we should not error
even if the host does not support it at all. This behavior has been
fixed and a corresponding e2e test has been added as well.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
podman needs to use the environment settings in containers.conf
when setting up the containers.
Also host environment variables should be relative to server side
not the client.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The seccomp/containers-golang library is not maintained any more and we
should stick to containers/common.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
We have a lot of 'cannot stat %s' errors in our codebase. These
are terrible and confusing and utterly useless without context.
Add some context to a few of them so we actually know what part
of the code is failing.
Signed-off-by: Matthew Heon <mheon@redhat.com>
it allows to manually tweak the configuration for cgroup v2.
we will expose some of the options in future as single
options (e.g. the new memory knobs), but for now add the more generic
--cgroup-conf mechanism for maximum control on the cgroup
configuration.
OCI specs change: https://github.com/opencontainers/runtime-spec/pull/1040
Requires: https://github.com/containers/crun/pull/459
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
In podman 1.0 if you executed a command like:
podman run --user dwalsh --cap-add net_bind_service alpine nc -l 80
It would work, and the user dwalsh would get the capability, in
podman 2.0, only root and the binding set gets the capability.
This change restores us back to the way podman 1.0 worked.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Most Libpod containers are made via `pkg/specgen/generate` which
includes code to generate an appropriate exit command which will
handle unmounting the container's storage, cleaning up the
container's network, etc. There is one notable exception: pod
infra containers, which are made entirely within Libpod and do
not touch pkg/specgen. As such, no cleanup process, network never
cleaned up, bad things can happen.
There is good news, though - it's not that difficult to add this,
and it's done in this PR. Generally speaking, we don't allow
passing options directly to the infra container at create time,
but we do (optionally) proxy a pre-approved set of options into
it when we create it. Add ExitCommand to these options, and set
it at time of pod creation using the same code we use to generate
exit commands for normal containers.
Fixes#7103
Signed-off-by: Matthew Heon <mheon@redhat.com>