Currently we are sending over pids-limits from the user even if they
never modified the defaults. The pids limit should be set at the server
side unless modified by the user.
This issue has led to failures on systems that were running with cgroups V1.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
In `podman inspect` output for containers and pods, we include
the command that was used to create the container. This is also
used by `podman generate systemd --new` to generate unit files.
With remote podman, the generated create commands were incorrect
since we sourced directly from os.Args on the server side, which
was guaranteed to be `podman system service` (or some variant
thereof). The solution is to pass the command along in the
Specgen or PodSpecgen, where we can source it from the client's
os.Args.
This will still be VERY iffy for mixed local/remote use (doing a
`podman --remote run ...` on a remote client then a
`podman generate systemd --new` on the server on the same
container will not work, because the `--remote` flag will slip
in) but at the very least the output of `podman inspect` will be
correct. We can look into properly handling `--remote` (parsing
it out would be a little iffy) in a future PR.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We are using these dependencies just to get the device from path.
These dependencies no longer build on Windows, so simply cloning
the deviceFromPath function, we can eliminate the need for this
vendoring.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If I enter a continer with --userns keep-id, my UID will be present
inside of the container, but most likely my user will not be defined.
This patch will take information about the user and stick it into the
container.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
--sdnotify container|conmon|ignore
With "conmon", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.
With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.
This is the default, which is closest to the behavior podman has done in the past.
The "ignore" option removes NOTIFY_SOCKET from the environment, so neither podman nor
any child processes will talk to systemd.
This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.
Signed-off-by: Joseph Gooch <mrwizard@dok.org>
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules. While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.
Move the go module to new major version and change all imports to
`github.com/containers/libpod/v2`. The renaming of the imports
was done via `gomove` [1].
[1] https://github.com/KSubedi/gomove
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
--tz flag sets timezone inside container
Can be set to IANA timezone as well as `local` to match host machine
Signed-off-by: Ashley Cui <acui@redhat.com>
I didn't believe that this was actually legal, but it looks like
it is. And, unlike our previous understanding (host port being
empty means just use container port), empty host port actually
carries the same meaning as `--expose` + `--publish-all` (that
is, assign a random host port to the given container port). This
requires a significant rework of our port handling code to handle
this new case. I don't foresee this being commonly used, so I
optimized having a fixed port number as fast path, which this
random assignment code running after the main port handling code
only if necessary.
Fixes#6806
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Also make sure that the limits we set for rootless are not higher than
what we'd set for root containers.
Rootless containers failed to start when the calling user already
had ulimit (e.g. on NOFILE) set.
This is basically a cherry-pick of 76f8efc0d0 into specgen
Signed-off-by: Ralf Haferkamp <rhafer@suse.com>
We have a flag, --syslog, for telling logrus to log to syslog as
well as to the terminal. Previously, this flag also set the exit
command for containers to use `--syslog` (otherwise all output
from exit commands is lost). I attempted to replicate this with
Podman v2.0, but quickly ran into circular import hell (the flag
is defined in cmd/podman, I needed it in cmd/podman/containers,
cmd/podman imports cmd/podman/containers already, etc). Instead,
let's just set the syslog flag automatically on
`--log-level=debug` so we log exit commands automatically when
debug-level logs are requested. This is consistent with Conmon
and seems to make sense.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The `--privileged` flag does not conflict with `--group-add`
(this one was breaking Toolbox) and does not conflict with most
parts of `--security-opt` (this was breaking Openstack).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Currently podman run --userns keep-id --user root:root fedora id
The --user flag is ignored. Removing this makes the code work correctly.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We initially believed that implementing this required support for
restarting containers after reboot, but this is not the case.
The unless-stopped restart policy acts identically to the always
restart policy except in cases related to reboot (which we do not
support yet), but it does not require that support for us to
implement it.
Changes themselves are quite simple, we need a new restart policy
constant, we need to remove existing checks that block creation
of containers when unless-stopped was used, and we need to update
the manpages.
Fixes#6508
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
These were part of Podman v1.9, but were lost in the transition
to using Specgen to create containers. Most resource limits are
checked via the sysinfo package to ensure they are safe to use
(the cgroup is mounted, kernel support is present, etc) and
removed if not safe. Further, bounds checks are performed to
ensure that values are valid.
Ensure these warnings are printed client-side when they occur.
This part is a little bit gross, as it happens in pkg/infra and
not cmd/podman, which is largely down to how we implemented
`podman run` - all the work is done in pkg/infra and it returns
only once the container has exited, and we need warnings to print
*before* the container runs. The solution here, while inelegant,
avoid the need to extensively refactor our handling of run.
Should fix blkio-limit warnings that were identified by the FCOS
test suite.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Add an `--infra-conmon-pidfile` flag to `podman-pod-create` to write the
infra container's conmon process ID to a specified path. Several
container sub-commands already support `--conmon-pidfile` which is
especially helpful to allow for systemd to access and track the conmon
processes. This allows for easily tracking the conmon process of a
pod's infra container.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add a `CreateCommand` field to the pod config which includes the entire
`os.Args` at pod-creation. Similar to the already existing field in a
container config, we need this information to properly generate generic
systemd unit files for pods. It's a prerequisite to support the `--new`
flag for pods.
Also add the `CreateCommand` to the pod-inspect data, which can come in
handy for debugging, general inspection and certainly for the tests that
are added along with the other changes.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Systemd enablement has to happen on the server side, since we need
check if the image is running systemd.
Also need to make sure user setting the StopSignal is not overriden on the
server side. But if not set and using systemd, we set it correctly.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When we moved to the new Namespace types in Specgen, we made a
distinction between taking a namespace from a pod, and taking it
from another container. Due to this new distinction, some code
that previously worked for both `--pod=$ID` and
`--uts=container:$ID` has accidentally become conditional on only
the latter case. This happened for Hostname - we weren't properly
setting it in cases where the container joined a pod.
Fortunately, this is an easy fix once we know to check the
condition.
Also, ensure that `podman pod inspect` actually prints hostname.
Fixes#6494
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The biggest obstacle here was cleanup - we needed a way to remove
detached exec sessions after they exited, but there's no way to
tell if an exec session will be attached or detached when it's
created, and that's when we must add the exit command that would
do the removal. The solution was adding a delay to the exit
command (5 minutes), which gives sufficient time for attached
exec sessions to retrieve the exit code of the session after it
exits, but still guarantees that they will be removed, even for
detached sessions. This requires Conmon 2.0.17, which has the new
`--exit-delay` flag.
As part of the exit command rework, we can drop the hack we were
using to clean up exec sessions (remove them as part of inspect).
This is a lot cleaner, and I'm a lot happier about it.
Otherwise, this is just plumbing - we need a bindings call for
detached exec, and that needed to be added to the tunnel mode
backend for entities.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The cleanup command creation logic is made public as part of this
and wired such that we can call it both within SpecGen (to make
container exit commands) and from the ABI detached exec handler.
Exit commands are presently only used for detached exec, but
theoretically could be turned on for all exec sessions if we
wanted (I'm declining to do this because of potential overhead).
I also forgot to copy the exit command from the exec config into
the ExecOptions struct used by the OCI runtime, so it was not
being added.
There are also two significant bugfixes for exec in here. One is
for updating the status of running exec sessions - this was
always failing as I had coded it to remove the exit file *before*
reading it, instead of after (oops). The second was that removing
a running exec session would always fail because I inverted the
check to see if it was running.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We were accidentally setting incorrect defaults for the network
namespace for rootless `pod create` when infra containers were
not being created. This should resolve that issue.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
some small fix ups for binding tests and then make them required.
update containers-common
V2 bindings tests were failing because of changes introduced in commit
a2ad5bb.
Fix some typos.
Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>
in the case where the specgen attribute for Env and Labels are nil, we should should then make the map IF we have labels and envs that need to be added.
Signed-off-by: Brent Baude <bbaude@redhat.com>
Interpreting CNI networks was a bit broken, and it was causing
rootless `podman pod create` to fail. Also, we were missing the
`--net` alias for `--network`, so add that.
Fixes#6119
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
There are three different priorities for applying env variables:
1) environment/config file environment variables
2) image's config
3) user overrides (--env)
The third kind are known to the client, while the default config and image's
config is handled by the backend.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add the `podman generate kube` and `podman play kube` command. The code
has largely been copied from Podman v1 but restructured to not leak the
K8s core API into the (remote) client.
Both commands are added in the same commit to allow for enabling the
tests at the same time.
Move some exports from `cmd/podman/common` to the appropriate places in
the backend to avoid circular dependencies.
Move definitions of label annotations to `libpod/define` and set the
security-opt labels in the frontend to make kube tests pass.
Implement rest endpoints, bindings and the tunnel interface.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
To try and identify differences between Podman v1.9 and master,
I ran a series of `podman run` commands with various flags
through each, then inspecting the resulting containers and diffed
the inspect JSON between each. This identified a number of issues
which are fixed in this PR.
In order of discovery:
- Podman v2 gave short names for images, where Podman v1 gave the
fully-qualified name. Simple enough fix (get image tags and use
the first one if they're available)
- The --restart flag was not being parsed correctly when a number
of retries was specified. Parsing has been corrected.
- The -m flag was not setting the swap limit (simple fix to set
swap in that case if it's not explicitly set by the user)
- The --cpus flag was completely nonfunctional (wired in its
logic)
Tests have been added for all of these to catch future
regressions.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
As part of this, make a major change to the type we use to
represent port mappings in SpecGen (from using existing OCICNI
structs to using our own custom one). This struct has the
advantage of supporting ranges, massively reducing traffic over
the wire for Podman commands using them (for example, the
`podman run -p 5000-6000` command will now send only one struct
instead of 1000). This struct also allows us to easily validate
which ports are in use, and which are not, which is necessary for
--expose.
Once we have parsed the ports from the new struct, we can produce
an accurate map including all currently requested ports, and use
that to determine what ports need to be exposed (some requested
exposed ports may already be included in a mapping from --publish
and will be ignored) and what open ports on the host we can map
them to.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Currently we are setting the maximum limits for rootful podman containers,
no reason not to set them by default for rootless users as well
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We should not be overwriting the Specgen's Command and Entrypoint
when building the final command to pass in the OCI spec. Both of
these will be provided to Libpod for use in `podman inspect` and
committing containers, and both must be set to the user's input,
not overwritten by the image if unset.
Fix this by moving command generation into OCI spec generation
and not modifying the SpecGenerator when we do so.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This should complete Podmanv2's support for volume-related flags.
Most code was sourced from the old pkg/spec implementation with
modifications to account for the split between frontend flags
(volume, mount, tmpfs) and the backend flags implemented here.
Also enables tests for podman run with volumes
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We do not want to join pod namespaces if no infra container is
present. A pod may claim it shares namespaces without an infra
container (I'll take an action item to fix that - it really
should not be allowed), which was tripping up our default
namespace code and forcing us to try and join the namespaces of
the nonexistant infra container.
Signed-off-by: Matthew Heon <mheon@redhat.com>
This enables the --volume, --mount, and --tmpfs flags in
Podmanv2. It does not enable init-related flags, image volumes,
and --volumes-from.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Namespaces have now been changed to properly handle all cases.
Spec handling code for namespaces was consolidated in a single
function.
Still missing:
- Image ports
- Pod namespaces likely still broken in Podmanv2
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
if the image specifies both the image and entrypoint, we need to account for that and preprend the entrypoint to the command. this only happens if no user command and entrypoint were supplied.
Signed-off-by: Brent Baude <bbaude@redhat.com>
Add more default options parsing
Switch to using --time as opposed to --timeout to better match Docker.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If user sets capabilities list we need handle minimal capabilities.
Also handle seccomp-policy being passed in.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
We were not handling the parsing of --ip. This pr adds validation
checks and now will support the flag.
Move validation to the actual parsing of the network flags.
We should only parse the dns flags if the user changed them. We don't
want to pass default options if set in containers.conf to the server.
Potential for duplicating defaults.
Add support for --dns-opt flag passing
Begin handling of --network flag, although we don't have a way right now
to translate a string into a specgen.Namespace.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
* Enable running podman V2 rootless
* Fixed cobra.PersistentPreRunE usage in all the commands
* Leveraged cobra.PersistentPreRunE/cobra.PersistentPostRunE to manage:
* rootless
* trace (--trace)
* profiling (--cpu-profile)
* initializing the registry copies of Image/Container engines
* Help and Usage templates autoset for all sub-commands
Signed-off-by: Jhon Honce <jhonce@redhat.com>
create a container in podmanv2 using specgen approach. this is the core implementation and still has quite a bit of code commented out specifically around volumes, devices, and namespaces. need contributions from smes on these parts.
Signed-off-by: Brent Baude <bbaude@redhat.com>
using the factory approach similar to container, we now create pods based on a pod spec generator. wired up the podmanv2 pod create command, podcreatewithspec binding, simple binding test, and apiv2 endpoint.
also included some code refactoring as it introduced as easy circular import.
Signed-off-by: Brent Baude <bbaude@redhat.com>
when a network is not provided, we should set a default mode based on rootless or rootfull.
Fixes: #5366
Signed-off-by: Brent Baude <bbaude@redhat.com>
Add support to auto-update containers running in systemd units as
generated with `podman generate systemd --new`.
`podman auto-update` looks up containers with a specified
"io.containers.autoupdate" label (i.e., the auto-update policy).
If the label is present and set to "image", Podman reaches out to the
corresponding registry to check if the image has been updated. We
consider an image to be updated if the digest in the local storage is
different than the one of the remote image. If an image must be
updated, Podman pulls it down and restarts the container. Note that the
restarting sequence relies on systemd.
At container-creation time, Podman looks up the "PODMAN_SYSTEMD_UNIT"
environment variables and stores it verbatim in the container's label.
This variable is now set by all systemd units generated by
`podman-generate-systemd` and is set to `%n` (i.e., the name of systemd
unit starting the container). This data is then being used in the
auto-update sequence to instruct systemd (via DBUS) to restart the unit
and hence to restart the container.
Note that this implementation of auto-updates relies on systemd and
requires a fully-qualified image reference to be used to create the
container. This enforcement is necessary to know which image to
actually check and pull. If we used an image ID, we would not know
which image to check/pull anymore.
Fixes: #3575
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
during container creation, if no network is provided, we need to add a default value so the container can be later started.
use apiv2 container creation for RunTopContainer instead of an exec to the system podman. RunTopContainer now also returns the container id and an error.
added a libpod commit endpoint.
also, changed the use of the connections and bindings slightly to make it more convenient to write tests.
Fixes: 5366
Signed-off-by: Brent Baude <bbaude@redhat.com>
This patch allows users to specify the list of capabilities required
to run their container image.
Setting a image/container label "io.containers.capabilities=setuid,setgid"
tells podman that the contained image should work fine with just these two
capabilties, instead of running with the default capabilities, podman will
launch the container with just these capabilties.
If the user or image specified capabilities that are not in the default set,
the container will print an error message and will continue to run with the
default capabilities.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Before Libpod supported named volumes, we approximated image
volumes by bind-mounting in per-container temporary directories.
This was handled by Libpod, and had a corresponding database
entry to enable/disable it.
However, when we enabled named volumes, we completely rewrote the
old implementation; none of the old bind mount implementation
still exists, save one flag in the database. With nothing
remaining to use it, it has no further purpose.
Signed-off-by: Matthew Heon <mheon@redhat.com>
this uses the specgen structure to create containers rather than the outdated createconfig. right now, only the apiv2 create is wired up. eventually the cli will also have to be done.
Signed-off-by: Brent Baude <bbaude@redhat.com>
warning: the naming of this might change as well as the location.
this is a build on a PR from mheon from last year that proposes a shift from our current approach of creating containers based on the arbitrarily made createconfig. the new approach would be to have a specification that is detached from the podman cli. the spec could then be generated and used to make a container. this theoretically is the beginning of a long-needed refactor involving how we get from the cli -> libpod | apiv2 -> libpod with code re-use and less duplication.
the intent is to build the apiv2 container creation based on this approach only. wiring to the podman cli will happen after the fact.
Signed-off-by: Brent Baude <bbaude@redhat.com>
The current Libpod pkg/spec has become a victim of the better
part of three years of development that tied it extremely closely
to the current Podman CLI. Defaults are spread across multiple
places, there is no easy way to produce a CreateConfig that will
actually produce a valid container, and the logic for generating
configs has sprawled across at least three packages.
This is an initial pass at a package that generates OCI specs
that will supersede large parts of the current pkg/spec. The
CreateConfig will still exist, but will effectively turn into a
parsed CLI. This will be compiled down into the new SpecGenerator
struct, which will generate the OCI spec and Libpod create
options.
The preferred integration point for plugging into Podman's Go API
to create containers will be the new CreateConfig, as it's less
tied to Podman's command line. CRI-O, for example, will likely
tie in here.
Signed-off-by: Matthew Heon <mheon@redhat.com>