When running under systemd there is no need to create yet another
cgroup for the container.
With conmon-delegated the current cgroup will be split into two
sub-cgroups:
- supervisor
- container
The supervisor cgroup will hold conmon and the podman process, while
the container cgroup is used by the OCI runtime (using the cgroupfs
backend).
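A rough sketch of the intended use (assuming the mode is wired up
through the --cgroups flag of podman-run, under a systemd unit with
delegation enabled):
podman run --cgroups=conmon-delegated --name ctr fedora sleep 100
systemd-cgls    # the current cgroup now contains the supervisor and container sub-cgroups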
Closes: https://github.com/containers/libpod/issues/6400
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The dangling filter determines whether a volume is dangling - i.e.,
it has no containers attached using it. Unlike our other filters,
this one is a boolean - it must be true or false, not arbitrary
values.
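For example:
podman volume ls --filter dangling=true     # volumes with no containers attached
podman volume ls --filter dangling=false    # volumes currently in use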
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Throw an error if a specified tag does not exist. Also make sure that
the user input is normalized as we already do for `podman tag`.
To prevent regressions, add a set of end-to-end and systemd tests.
Last but not least, update the docs and add bash completions.
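A quick illustration (assuming the command in question is
`podman untag`; image and tag names are arbitrary):
podman untag alpine alpine:nosuchtag    # now errors: the tag does not exist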
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
When going through the output of `podman inspect` to try and
identify another issue, I noticed that Podman 2.0 was setting
StopSignal to 0 on containers by default. After chasing it
through the command line and SpecGen, I determined that we were
actually not setting a default in Libpod, which is strange
because I swear we used to do that. I re-added the disappeared
default and now all is well again.
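A quick sanity check (container name illustrative):
podman inspect --format '{{.Config.StopSignal}}' mycontainer
This should now report the default stop signal (SIGTERM) rather than 0.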
Also, while I was looking for the bug in SpecGen, I found a bunch
of TODOs that have already been done. Eliminate the comments for
these.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
specify the mappings in the container configuration to the storage
when creating the container so that the correct mappings can be
configured.
Regression introduced with Podman 2.0.
Closes: https://github.com/containers/libpod/issues/6735
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This incorporates code from PRs #6591 and #6614 but does not use
event channels to detect container state; instead it uses timers
with a defined wait duration before calling t.StopAtEOF() to
ensure the last log entry is output before a container exits.
The polling interval is set to 250 milliseconds, based on the polling
interval defined in hpcloud/tail here:
https://github.com/hpcloud/tail/blob/v1.0.0/watch/polling.go#L117
Co-authored-by: Qi Wang <qiwan@redhat.com>
Signed-off-by: jgallucci32 <john.gallucci.iv@gmail.com>
This will allow containers that connect to the network namespace to
use the container name directly.
For example, you can do something like:
podman run -ti --name foobar fedora ping foobar
While we can do this with hostname now, this seems more natural.
Also, if another container connects to this container's network, it
can do:
podman run --network container:foobar fedora ping foobar
and connect to the original container, without having to discover the name.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When multiple connections are monitoring events via the remote API, the inotify in the hpcloud library seems unable to consistently send events. Switching from inotify to poll seems to clear this up.
Fixes: #6664
Signed-off-by: Brent Baude <bbaude@redhat.com>
Allow wildcards in the search term. Note that not all registries
support wildcards and it may only work with v1 registries.
Note that searching implies figuring out if the specified search term
includes a registry. If no registry is detected, the search term
will be used against all configured "unqualified-search-registries" in
the registries.conf. The parsing logic considers a registry to be the
substring before the first slash `/`.
With these changes we now not only support wildcards but arbitrary
input; ultimately it's up to the registries to decide whether they
support given input or not.
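For example (the registry name is illustrative; whether a given
registry honors the wildcard is up to the registry itself):
podman search 'registry.example.com/foo*'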
Fixes: bugzilla.redhat.com/show_bug.cgi?id=1846629
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
As part of APIv2 Attach, we need to be able to attach to freshly
created containers (in ContainerStateConfigured). This isn't
something Libpod is interested in supporting, so we use Init() to
get the container into ContainerStateCreated, in which attach is
possible. Problem: Init() will fail if dependencies are not
started, so a fresh container in a fresh pod will fail. The
simplest solution is to extend the existing recursive start code
from Start() to Init(), allowing dependency containers to be
started when we initialize the container (optionally, controlled
via bool).
Also, update some comments in container_api.go to make it more
clear how some of our major API calls work.
Fixes #6646
Signed-off-by: Matthew Heon <mheon@redhat.com>
We initially believed that implementing this required support for
restarting containers after reboot, but this is not the case.
The unless-stopped restart policy acts identically to the always
restart policy except in cases related to reboot (which we do not
support yet), but it does not require that support for us to
implement it.
The changes themselves are quite simple: we need a new restart
policy constant, we need to remove existing checks that block
creation of containers when unless-stopped was used, and we need to
update the manpages.
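Usage matches the other restart policies:
podman run -d --restart=unless-stopped fedora sleep 1000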
Fixes #6508
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
When the container uses journald logging, we don't want to
automatically use the same driver for its exec sessions. If we do
we will pollute the journal (particularly in the case of
healthchecks) with large amounts of undesired logs. Instead,
force exec sessions logs to file for now; we can add a log-driver
flag later (we'll probably want to add a `podman logs` command
that reads exec session logs at the same time).
As part of this, add support for the new 'none' logs driver in
Conmon. It will be the default log driver for exec sessions, and
can be optionally selected for containers.
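For example, to opt a container out of logging entirely (requires a
Conmon with 'none' support):
podman run --log-driver=none fedora echo hello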
Great thanks to Joe Gooch (mrwizard@dok.org) for adding support
to Conmon for a null log driver, and wiring it in here.
Fixes #6555
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This fixes a condition where a container is removed while
following the logs and prints an error when the container
is removed forcefully.
Signed-off-by: jgallucci32 <john.gallucci.iv@gmail.com>
Fixes an issue with the previous PR where a container would exit while following logs but the log tail continued to follow. This creates a subroutine which checks the state of the container and instructs the tailLog to stop when it reaches EOF.
Tested the following conditions:
* Tail and follow logs of running container
* Tail and follow logs of stopped container
* Tail and follow logs of running container which exits after some time
Signed-off-by: jgallucci32 <john.gallucci.iv@gmail.com>
fix the check for c.state.NetNS == nil. Its value is changed in the
first code block, so the condition is always true in the second one
and we end up running slirp4netns twice.
Closes: https://github.com/containers/libpod/issues/6538
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Create a new template for generating a pod unit file. Eventually, this
allows for treating and extending pod and container generation
separately.
The `--new` flag now also works on pods.
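For example, to generate a unit for a pod (and its containers) from
the pod's create command (pod name illustrative):
podman generate systemd --new mypod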
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Refactor the systemd-unit generation code and move all the logic into
`pkg/systemd/generate`. The code was already hard to maintain but I
found it impossible to wire the `--new` logic for pods in all the chaos.
The code refactoring in this commit will make maintaining the code
easier and should make it easier to extend as well. Further changes and
refactorings may still be needed, but they will be easier.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add an `--infra-conmon-pidfile` flag to `podman-pod-create` to write the
infra container's conmon process ID to a specified path. Several
container sub-commands already support `--conmon-pidfile` which is
especially helpful to allow for systemd to access and track the conmon
processes. This allows for easily tracking the conmon process of a
pod's infra container.
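For example (paths and names are illustrative):
podman pod create --name mypod --infra-conmon-pidfile /run/mypod-conmon.pid
A systemd unit can then point PIDFile= at that path to track the pod.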
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add a `CreateCommand` field to the pod config which includes the entire
`os.Args` at pod-creation. Similar to the already existing field in a
container config, we need this information to properly generate generic
systemd unit files for pods. It's a prerequisite to support the `--new`
flag for pods.
Also add the `CreateCommand` to the pod-inspect data, which can come in
handy for debugging, general inspection and certainly for the tests that
are added along with the other changes.
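A quick way to see it (assuming the field is exposed at the top level
of the inspect output; pod name illustrative):
podman pod inspect --format '{{.CreateCommand}}' mypod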
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Two areas needed tweaking to accomplish this: port parsing and
binding ports on the host.
Parsing is an obvious problem - we have to accommodate an IPv6
address enclosed by [] as well as a normal IPv4 address. It was
slightly complicated by the fact that we previously just counted
the number of colons in the whole port definition (a thousand
curses on whoever in the IPv6 standard body decided to reuse
colons for address separators), but it did not end up being that
bad.
Libpod also (optionally) binds ports on the host to prevent their
reuse by host processes. This code was IPv4 only for TCP, and
bound to both for UDP (which I'm fairly certain is not correct,
and has been adjusted). This just needed protocols adjusted to
read "tcp4"/"tcp6" and "udp4"/"udp6" based on what we wanted to
bind to.
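With this in place, bindings like the following should work
(addresses and ports are illustrative):
podman run -p '[::1]:8080:80' fedora
podman run -p '[::1]:5353:53/udp' fedora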
Fixes #5715
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
do not set the hostname when joining a UTS namespace, as it could be
owned by a different userns.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when running in a new userns, make sure the resolv.conf and hosts
files bind mounted from another container are accessible to root in
the userns.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This came out of a conversation with Valentin about
systemd-managed Podman. He discovered that unit files did not
properly handle cases where Conmon was dead - the ExecStopPost
`podman rm --force` line was not actually removing the container,
but interestingly, adding a `podman cleanup --rm` line would
remove it. Both of these commands do the same thing (minus the
`podman cleanup --rm` command not force-removing running
containers).
Without a running Conmon instance, the container process is still
running (assuming you killed Conmon with SIGKILL and it had no
chance to kill the container it managed), but you can still kill
the container itself with `podman stop` - Conmon is not involved,
only the OCI Runtime. (`podman rm --force` and `podman stop` use
the same code to kill the container). The problem comes when we
want to get the container's exit code - we expect Conmon to make
us an exit file, which it's obviously not going to do, being
dead. The first `podman rm` would fail because of this, but
importantly, it would (after failing to retrieve the exit code
correctly) set container status to Exited, so that the second
`podman cleanup` process would succeed.
To make sure the first `podman rm --force` succeeds, we need to
catch the case where Conmon is already dead, and instead of
waiting for an exit file that will never come, immediately set
the Stopped state and return an error that can be caught and
handled.
Signed-off-by: Matthew Heon <mheon@redhat.com>
When we moved to the new Namespace types in Specgen, we made a
distinction between taking a namespace from a pod, and taking it
from another container. Due to this new distinction, some code
that previously worked for both `--pod=$ID` and
`--uts=container:$ID` has accidentally become conditional on only
the latter case. This happened for Hostname - we weren't properly
setting it in cases where the container joined a pod.
Fortunately, this is an easy fix once we know to check the
condition.
Also, ensure that `podman pod inspect` actually prints hostname.
Fixes #6494
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
This is step 1 toward self-discovery of remote ssh connections. We add a remotesocket struct to info to detect what the socket path might be.
Co-authored-by: Jhon Honce <jhonce@redhat.com>
Signed-off-by: Brent Baude <bbaude@redhat.com>
Podman containers can specify that they get their network
namespace from another container. This is automatic in pods, but
any container can do it.
The problem is that these containers are not guaranteed to have a
network namespace of their own; it is perfectly valid to join the
network namespace of a --net=host container, and both containers
will end up in the host namespace. The code for obtaining network
stats did not account for this, and could cause segfaults as a
result. Fortunately, the fix is simple - the function we use to
get said stats already performs appropriate checks, so we just
need to recursively call it.
Fixes #5652
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The biggest obstacle here was cleanup - we needed a way to remove
detached exec sessions after they exited, but there's no way to
tell if an exec session will be attached or detached when it's
created, and that's when we must add the exit command that would
do the removal. The solution was adding a delay to the exit
command (5 minutes), which gives sufficient time for attached
exec sessions to retrieve the exit code of the session after it
exits, but still guarantees that they will be removed, even for
detached sessions. This requires Conmon 2.0.17, which has the new
`--exit-delay` flag.
As part of the exit command rework, we can drop the hack we were
using to clean up exec sessions (remove them as part of inspect).
This is a lot cleaner, and I'm a lot happier about it.
Otherwise, this is just plumbing - we need a bindings call for
detached exec, and that needed to be added to the tunnel mode
backend for entities.
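With the plumbing in place, a detached exec is simply (assuming the
flag is exposed as -d/--detach; names are illustrative):
podman exec -d mycontainer sleep 30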
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
* Support the `X-Registry-Auth` http-request header.
* The content of the header is a base64 encoded JSON payload which can
either be a single auth config or a map of auth configs (user+pw or
token) with the corresponding registries being the keys. Vanilla
Docker, projectatomic Docker and the bindings are transparently
supported (see the sketch after this list).
* Add a hidden `--registries-conf` flag. Buildah exposes the same
flag, mostly for testing purposes.
* Do all credential parsing in the client (i.e., `cmd/podman`) and
pass the username and password to the backend instead of unparsed
credentials.
* Add a `pkg/auth` which handles most of the heavy lifting.
* Go through the authentication-handling code of most commands, bindings
and endpoints. Migrate them to the new code and fix issues as seen.
A final evaluation and more tests are still required *after* this
change.
* The manifest-push endpoint is missing certain parameters and should
use the ABI function instead. Adding auth-support isn't really
possible without these parts working.
* The container commands and endpoints (i.e., create and run) have not
been changed yet. The APIs don't yet account for the authfile.
* Add authentication tests to `pkg/bindings`.
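A sketch of what a client request looks like (socket path, endpoint
and credentials are all illustrative; the Docker-compat image-pull
endpoint is used here):
AUTH=$(echo -n '{"username":"user","password":"secret"}' | base64 -w 0)
curl --unix-socket /run/podman/podman.sock \
     -H "X-Registry-Auth: ${AUTH}" \
     -X POST 'http://d/images/create?fromImage=registry.example.com/foo'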
Fixes: #6384
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Currently we are displaying the seconds since the epoch;
this will change to displaying the date, similar to `podman version`.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
By moving a couple of variables from libpod/libpod to libpod/libpod/define
I am able to shrink the podman-remote-* executables by another megabyte.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The cleanup command creation logic is made public as part of this
and wired such that we can call it both within SpecGen (to make
container exit commands) and from the ABI detached exec handler.
Exit commands are presently only used for detached exec, but
theoretically could be turned on for all exec sessions if we
wanted (I'm declining to do this because of potential overhead).
I also forgot to copy the exit command from the exec config into
the ExecOptions struct used by the OCI runtime, so it was not
being added.
There are also two significant bugfixes for exec in here. One is
for updating the status of running exec sessions - this was
always failing as I had coded it to remove the exit file *before*
reading it, instead of after (oops). The second was that removing
a running exec session would always fail because I inverted the
check to see if it was running.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We need to be able to use cleanup processes to remove exec
sessions as part of detached exec. This PR adds that ability. A
new flag is added to `podman container cleanup`, `--exec`, to
specify an exec session to be cleaned up.
As part of this, ensure that `ExecCleanup` can clean up exec
sessions that were running, but have since exited. This ensures
that we can come back to an exec session that was running but has
since stopped, and clean it up.
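For example (the session ID placeholder stands in for a real ID):
podman container cleanup --exec <session-id> mycontainer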
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
As part of the massive exec rework, I stubbed out a function for
non-detached exec, which is implemented here. It's largely
similar to the existing exec functions, but missing a few pieces.
This also involves implementing a new OCI runtime call for
detached exec. Again, very similar to the other functions, but
with a few missing pieces.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
These are required for detached exec, where they will be used to
clean up and remove exec sessions when they exit.
As part of this, move all Exec related functionality for the
Conmon OCI runtime into a separate file; the existing one was
around 2000 lines.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Specifying `-n=ctr-name` tells conmon to log CONTAINER_NAME=name if the log driver is journald.
Add this, and a test!
Also, refactor the args slice creation to not append() unnecessarily.
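With the journald driver in use, a container's log lines can then be
queried by name (container name illustrative):
podman run --log-driver=journald --name myctr fedora echo hi
journalctl CONTAINER_NAME=myctr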
Signed-off-by: Peter Hunt <pehunt@redhat.com>
In FIPS Mode we expect to work off of the Mountpath, not the Rundir path.
This was causing FIPS Mode checks to fail.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
During the initial workup of HTTP exec, I duplicated most of the
existing exec handling code so I could work on it without
breaking normal exec (and compare what I was doing to the normal
version). Now that it's done and working, we can switch over to
the refactored version and ditch the original, removing a lot of
duplicated code.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
The usual flow for exec is going to be:
- Create exec session
- Start and attach to exec session
- Exec session exits, attach session terminates
- Client does an exec inspect to pick up exit code
The safest point to remove the exec session, without doing any
database changes to track stale sessions, is to remove during the
last part of this - the single inspect after the exec session
exits.
This is definitely different from Docker (which retains the exec
session for up to 10 minutes after it exits, where we discard it
immediately) but should be close enough not to be noticeable in
regular usage.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
With APIv2, we cannot guarantee that exec sessions will be
removed cleanly on exit (Docker does not include an API for
removing exec sessions, instead using a timer-based reaper which
we cannot easily replicate). This is part 1 of a 2-part approach
to providing a solution to this. This ensures that exec sessions
will be reaped, at the very least, on container restart, which
takes care of any that were not properly removed during the run
of a container.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
If not overridden, we should use the attach configuration given
when the exec session was first created.
Also, setting streams should not conflict with a TTY - the two
are allowed together with Attach and should be allowed together
here.
Signed-off-by: Matthew Heon <mheon@redhat.com>
This is heavily based off the existing exec implementation, but
does not presently share code with it, to try and ensure we don't
break anything.
Still to do:
- Add code sharing with existing exec implementation
- Wire in the frontend (exec HTTP endpoint)
- Move all exec-related code in oci_conmon_linux.go into a new
file
- Investigate code sharing between HTTP attach and HTTP exec.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Cleaning up the OCI runtime is not allowed in the Removing state.
To ensure it is actually cleaned up, when calling cleanup() as
part of removing a container, do so before we set the Removing
state, so that the removal can succeed.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Theoretically these should never happen, but it never hurts to be
sure and check. Add a check to one, make the other one a
create-if-not-exist (it was just adding, not checking the
contents).
Signed-off-by: Matthew Heon <mheon@redhat.com>
Some runtimes (e.g. Kata containers) seem to object to having us
unmount storage before the container is removed from the runtime.
This is an easy fix (change the order of operations in cleanup)
and seems to make more sense than the way we were doing things.
Signed-off-by: Matthew Heon <mheon@redhat.com>
* Add ErrLostSync to report loss of sync when de-muxing the stream
* Add logrus.SetLevel(logrus.DebugLevel) when `go test -v` given
* Add context to debugging messages
Signed-off-by: Jhon Honce <jhonce@redhat.com>
We're now able to build a static podman binary based on a custom nix
derivation. This is integrated into Cirrus CI as well; a later goal
is to provide a self-contained static binary bundle which can be
installed on any 64-bit Linux system.
Fixes: https://github.com/containers/libpod/issues/1399
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
I realized that setting NetworkMode to private when we are making
a network namespace but not configuring it with CNI or Slirp is
wrong; that's considered `--net=none` not `--net=private`. At the
same time, realized that we actually store whether Slirp is in
use, so we can be more specific than just "default" and instead
say slirp4netns or bridge.
Signed-off-by: Matthew Heon <mheon@redhat.com>
This one was a massive pain to track down.
The original symptom was an error message from rootless Podman
trying to make a container in a pod. I unfortunately did not look
at the error message closely enough to realize that the namespace
in question was the cgroup namespace (the reproducer pod was
explicitly set to only share the network namespace), else this
would have been quite a bit shorter.
I spent considerable effort trying to track down differences
between the inspect output of the two containers, and when that
failed I was forced to resort to diffing the OCI specs. That
finally proved fruitful, and I was able to determine what should
have been obvious all along: the container was joining the cgroup
namespace of the infra container when it really ought not to
have.
From there, I discovered a variable collision in pod config. The
UsePodCgroup variable means "create a parent cgroup for the pod
and join containers in the pod to it". Unfortunately, it is very
similar to UsePodUTS, UsePodNet, etc, which mean "the pod shares
this namespace", so an accessor was accidentally added for it
that indicated the pod shared the cgroup namespace when it really
did not. Once I realized that, it was a quick fix - add a bool to
the pod's configuration to indicate whether the cgroup ns was
shared (distinct from UsePodCgroup) and use that for the
accessor.
Also included are fixes for `podman inspect` and
`podman pod inspect` that fix them to actually display the state
of the cgroup namespace (for container inspect) and what
namespaces are shared (for pod inspect). Either of those would
have made tracking this down considerably quicker.
Fixes #6149
Signed-off-by: Matthew Heon <mheon@redhat.com>
If the first time you run podman in a user account is via `su - USER`,
and the second time you run it as the logged-in USER, podman fails
because it is not handling the tmpdir definition in the database.
This PR fixes this problem.
vendor containers/common v0.11.1
This should fix a couple of issues we have seen in podman 1.9.1
with handling of libpod.conf.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add the `podman generate kube` and `podman play kube` command. The code
has largely been copied from Podman v1 but restructured to not leak the
K8s core API into the (remote) client.
Both commands are added in the same commit to allow for enabling the
tests at the same time.
Move some exports from `cmd/podman/common` to the appropriate places in
the backend to avoid circular dependencies.
Move definitions of label annotations to `libpod/define` and set the
security-opt labels in the frontend to make kube tests pass.
Implement rest endpoints, bindings and the tunnel interface.
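Typical round trip (pod name illustrative):
podman generate kube mypod > mypod.yaml
podman play kube mypod.yaml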
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>