Quadlet was doing some custom handling of uid/gid remapping, originating
from pre --userns=auto support, including its own user for getting subuids
which kinda conflicts with the "container" user used for that.
This drops all the old support for id remapping in favour of a new set
of keys that more directly map to the podman run options.
We have essentially 3 modes now:
```
RemapUsers=manual
RemapUid=0:10000:10
RemapUid=10:20000:10
RemapGid=0:10000:10
RemapGid=10:20000:10
```
This maps to --uidmap and --gidmap options.
```
RemapUsers=auto
```
This maps to --userns=auto. You can additionally specify RemapUid,
RemapGid and RemapUidSize, which get applied as options to the
--userns podman option.
```
RemapUsers=keep-id
```
This maps to --userns=keep-id and only works for user units.
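As a rough illustration of the three modes, the translation from keys to podman arguments could be sketched as below. The `remapConfig` type and `usernsArgs` function are hypothetical names made up for this sketch, not the actual quadlet code, and the `uidmapping=`, `gidmapping=` and `size=` option spellings are assumed to follow the podman-run documentation for `--userns=auto`.

```go
package main

import (
	"fmt"
	"strings"
)

// remapConfig is a hypothetical, simplified view of the keys described above.
type remapConfig struct {
	Mode    string   // RemapUsers: "manual", "auto" or "keep-id"
	UIDs    []string // RemapUid values
	GIDs    []string // RemapGid values
	UIDSize string   // RemapUidSize, only used in "auto" mode
}

// usernsArgs translates the keys into podman run arguments.
func usernsArgs(c remapConfig) ([]string, error) {
	switch c.Mode {
	case "manual":
		var args []string
		for _, m := range c.UIDs {
			args = append(args, "--uidmap", m)
		}
		for _, m := range c.GIDs {
			args = append(args, "--gidmap", m)
		}
		return args, nil
	case "auto":
		// Extra keys become comma-separated options of --userns=auto.
		var opts []string
		for _, m := range c.UIDs {
			opts = append(opts, "uidmapping="+m)
		}
		for _, m := range c.GIDs {
			opts = append(opts, "gidmapping="+m)
		}
		if c.UIDSize != "" {
			opts = append(opts, "size="+c.UIDSize)
		}
		if len(opts) == 0 {
			return []string{"--userns=auto"}, nil
		}
		return []string{"--userns=auto:" + strings.Join(opts, ",")}, nil
	case "keep-id":
		return []string{"--userns=keep-id"}, nil
	}
	return nil, fmt.Errorf("unsupported RemapUsers value %q", c.Mode)
}

func main() {
	args, err := usernsArgs(remapConfig{
		Mode: "manual",
		UIDs: []string{"0:10000:10", "10:20000:10"},
		GIDs: []string{"0:10000:10", "10:20000:10"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(args, " "))
}
```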
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Attempts to fix #16419
podman generate systemd --restart-sec pod
^now generates RestartSec= both in pod service file and in container service file.
podman generate systemd --restart-sec container
^now generates RestartSec= in container service file.
Signed-off-by: Veronika Fuxova <vfuxova@redhat.com>
This is much better for the systemd case because we pass the journal
socket fds directly to the container. This means less copying of the
logs, but it also means the journal will correctly get the peer
process id when it tries to extract things like the name of what
is logging something.
With this we correctly name the logging process rather than claim
everything comes from conmon.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This makes much more sense for typical service loads, and can
easily be reverted by `ReadOnly=no`.
Also updates and adds various tests for this.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
The notify proxy has a watcher to check whether the container has left
the running state. In that case, Podman should stop waiting for the
ready message to prevent a deadlock. Fix this watcher by adding a
loop.
Fixes the deadlock in #16076 surfacing in a timeout. The underlying
issue persists though. Also use a timer in the select statement to
prevent the goroutine from running unnecessarily long.
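A minimal sketch of such a watcher loop, assuming a polling state check: `waitReady`, `isRunning` and `errContainerStopped` are hypothetical names for illustration and do not mirror the actual notify proxy code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errContainerStopped is a hypothetical sentinel error for this sketch.
var errContainerStopped = errors.New("container is not running anymore")

// waitReady blocks until ready is signalled, the container leaves the
// running state, or the timeout expires. isRunning stands in for a
// container state lookup.
func waitReady(ready <-chan struct{}, isRunning func() bool, poll, timeout time.Duration) error {
	deadline := time.NewTimer(timeout)
	defer deadline.Stop()
	ticker := time.NewTicker(poll)
	defer ticker.Stop()

	// Loop so the state is re-checked on every tick instead of only once.
	for {
		select {
		case <-ready:
			return nil
		case <-deadline.C:
			return errors.New("timed out waiting for the READY message")
		case <-ticker.C:
			if !isRunning() {
				return errContainerStopped
			}
		}
	}
}

func main() {
	ready := make(chan struct{})
	// The container "exits" right away and READY never arrives.
	err := waitReady(ready, func() bool { return false }, 10*time.Millisecond, time.Second)
	fmt.Println(err)
}
```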
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Truncate the container and pod ID files instead of throwing an error.
The main motivation is to prevent redundant work when starting systemd
units. Throwing an error when the file already exists is not preventing
races or file corruptions, so let's leave that to the user, which in
almost all cases is a generated (and tested) systemd unit.
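The difference between the two behaviours comes down to the open flags. The sketch below assumes a hypothetical `writeIDFile` helper; it is not the function Podman uses.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeIDFile (hypothetical helper) writes id to path and truncates any
// previous content. Using os.O_EXCL instead of os.O_TRUNC would make the
// call fail when the file already exists.
func writeIDFile(path, id string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(id); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo writes two IDs to the same file and returns what is left on disk.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "idfile-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "ctr.ctr-id")
	for _, id := range []string{"first-container-id", "second-id"} {
		if err := writeIDFile(path, id); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}
```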
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This way we don't have to use the `ExecCondition=podman volume exist`,
which saves one process start.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reduce the number of top-level packages in ./pkg by moving quadlet
packages under ./pkg/systemd.
[NO NEW TESTS NEEDED] - no functional change.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Starting listening for the READY messages on the sdnotify proxies before
starting the Pod. Otherwise, we may be missing messages.
[NO NEW TESTS NEEDED] as it's hard to test this very narrow race.
Related to but may not be fixing #16076.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Package `io/ioutil` was deprecated in golang 1.16, preventing podman from
building under Fedora 37. Fortunately, functionally identical
replacements are provided by the packages `io` and `os`. Replace all
usage of all `io/ioutil` symbols with appropriate substitutions
according to the golang docs.
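For reference, a small sketch of the most common substitutions; the `roundTrip` function is only an illustration and is not taken from the Podman tree.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip exercises the io and os replacements for common io/ioutil helpers.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "ioutil-demo") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(content), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}
	onDisk, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", err
	}
	all, err := io.ReadAll(strings.NewReader(string(onDisk))) // was ioutil.ReadAll
	if err != nil {
		return "", err
	}
	return string(all), nil
}

func main() {
	out, err := roundTrip("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```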
Signed-off-by: Chris Evich <cevich@redhat.com>
The read deadline may yield the READY message to be lost in space.
Instead, use a more Go-idiomatic alternative by using two goroutines;
one reading from the connection, the other watching the container.
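The two-goroutine pattern could look roughly like the sketch below. `waitForReady`, `containerDone` and `demo` are assumptions made for illustration; the real proxy works on a unix datagram socket rather than the in-memory pipe used here.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// waitForReady reads from conn in one goroutine while a second goroutine
// watches containerDone. Whichever finishes first decides the result.
func waitForReady(conn net.Conn, containerDone <-chan struct{}) error {
	errCh := make(chan error, 2) // buffered so neither goroutine blocks on send
	stop := make(chan struct{})
	defer close(stop) // releases the watcher once we return

	go func() {
		buf := make([]byte, 1024)
		for {
			n, err := conn.Read(buf)
			if err != nil {
				errCh <- err
				return
			}
			if strings.Contains(string(buf[:n]), "READY=1") {
				errCh <- nil
				return
			}
		}
	}()

	go func() {
		select {
		case <-containerDone:
			errCh <- errors.New("container exited before sending READY=1")
		case <-stop:
		}
	}()

	err := <-errCh
	conn.Close() // unblocks the reader if it is still waiting
	return err
}

// demo wires up an in-memory connection to exercise both outcomes.
func demo(sendReady bool) error {
	client, server := net.Pipe()
	defer client.Close()
	done := make(chan struct{})
	if sendReady {
		go client.Write([]byte("READY=1"))
	} else {
		close(done) // simulate the container leaving the running state
	}
	return waitForReady(server, done)
}

func main() {
	fmt.Println(demo(true))
	fmt.Println(demo(false))
}
```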
[NO NEW TESTS NEEDED] since existing tests are exercising this
functionality already.
Fixes: #15800
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This reverts commit c20abf12c7. In the
absence of `ExecStop` step, systemd will send the stop/kill signals to
the main PID while I assumed that systemd would jump directly to an
ExecStopPost step instead.
Hence revert the commit to let Podman take care of stopping rather than
systemd.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Drop the ExecStop step to simplify the generated units a bit.
The extra ExecStopPost step was added by commit e5c3432944. If the
main PID (i.e., conmon) is killed, systemd will not execute ExecStop
(since the main PID is already down) but only execute the *Post steps.
Credits to the late Ulrich Obergfell for tracking this issue down; he is
missed.
The ExecStop step can safely be dropped since the Post step will take care of
stopping (and removing) in any case.
Context: #15686
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Two PRs have been merged causing a failure in one unit test.
Fix the unit test to turn CI green again.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
When creating a new pod without the `--name` flag, e.g.:
`podman pod create foobar`
it will get the name `foobar` implicitly and this will be recorded in the
`podCreateArgs`. Unfortunately, the implicit name only works if it appears as
the **last** argument of the startup command.
With 6e2e3a78ed we started appending the pod
security policy to the startCommand, resulting in the following `ExecStartPre=`
line:
```
/usr/bin/podman pod create --infra-conmon-pidfile %t/pod-foobar.pid --pod-id-file %t/pod-foobar.pod-id foobar --exit-policy=stop
```
This fails to launch, as the `pod create` command expects only a single
non-flag parameter, but it assumes that `exit-policy=stop` is a second and
terminates immediately instead.
This fixes https://github.com/containers/podman/issues/15592
Signed-off-by: Dan Čermák <dcermak@suse.com>
Change the dependencies from a pod unit to its associated container
units from `Requires` to `Wants` to prevent the entire pod from
transitioning to a failed state. Restart policies for individual
containers can be configured separately.
Also make sure that the pod's RunRoot is always set.
Fixes: #14546
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Emit a warning to the user when generating a unit with --new on a
container that was created with a custom --restart policy. As shown
in #15284, a custom --restart policy in that case can lead to issues
on system shutdown where systemd attempts to nuke the unit but Podman
keeps on restarting the container.
Fixes: #15284
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Podman adds an Error: to every error message. So starting an error
message with "error" ends up being reported to the user as
Error: error ...
This patch removes the stutter.
Also ioutil.ReadFile errors report the Path, so wrapping the err message
with the path causes a stutter.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Add auto-update support to `podman kube play`. Auto-update policies can
be configured for:
* the entire pod via the `io.containers.autoupdate` annotation
* a specific container via the `io.containers.autoupdate/$name` annotation
To make use of rollbacks, the `io.containers.sdnotify` policy should be
set to `container` such that the workload running _inside_ the container
can send the READY message via the NOTIFY_SOCKET once ready. For
further details on auto updates and rollbacks, please refer to the
specific article [1].
Since auto updates and rollbacks are based on Podman's systemd integration,
the k8s YAML must be executed in the `podman-kube@` systemd template.
For further details on how to run k8s YAML in systemd via Podman, please
refer to the specific article [2].
An exemplary k8s YAML may look as follows:
```YAML
apiVersion: v1
kind: Pod
metadata:
  annotations:
      io.containers.autoupdate: "local"
      io.containers.autoupdate/b: "registry"
  labels:
    app: test
  name: test_pod
spec:
  containers:
  - command:
    - top
    image: alpine
    name: a
  - command:
    - top
    image: alpine
    name: b
```
[1] https://www.redhat.com/sysadmin/podman-auto-updates-rollbacks
[2] https://www.redhat.com/sysadmin/kubernetes-workloads-podman-systemd
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Integrate sd-notify policies into `kube play`. The policies can be
configured for all containers via the `io.containers.sdnotify`
annotation or for individual containers via the
`io.containers.sdnotify/$name` annotation.
The `kube play` process will wait for all containers to be ready by
waiting for the individual `READY=1` messages which are received via
the `pkg/systemd/notifyproxy` proxy mechanism.
Also update the simple "container" sd-notify test as it did not fully
test the expected behavior which became obvious when adding the new
tests.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Add a new package for proxying notify sockets and waiting for the
READY=1 message to appear. May be subject to further changes in
future commits.
Tests make sure that it behaves properly.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
podman run/create can accept `-h <hostname>` as argument. When parsing
flags -h throws a help requested error from pflag. To prevent this
error we have to define the help flag.
Fixes #15124
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When a container was created with `--sdnotify value` we would remove
this arg instead of using it like with `--sdnotify=value`.
Also when the arg is set to ignore we should force conmon in order to
make the resulting Type=notify units work.
Fixes #15052
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
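A small sketch of the pattern: `readConfig` and `missing` are hypothetical helpers, shown only to illustrate that errors wrapped with `%w` remain inspectable via `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// readConfig wraps the underlying error with %w so callers can still
// inspect it. With github.com/pkg/errors this would have been a
// errors.Wrap-style call.
func readConfig(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading config: %w", err)
	}
	return data, nil
}

// missing reports whether reading path failed because it does not exist.
func missing(path string) bool {
	_, err := readConfig(path)
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	fmt.Println(missing("/nonexistent/podman-example.conf"))
}
```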
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
* Replace "setup", "lookup", "cleanup", "backup" with
"set up", "look up", "clean up", "back up"
when used as verbs. Replace also variations of those.
* Improve language in a few places.
Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
Unless specified in the create command of the pod, enforce the exit
policy to "stop". With "stop", a pod is stopped when the last container
exits and does not continue running. This behavior integrates much
better into systemd which is now able to tell whether the service
running as pod is actually running/active or not.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The linter ensures a common code style.
- use switch/case instead of else if
- use if instead of switch/case for single case statement
- add space between comment and text
- detect the use of defer with os.Exit()
- use short form var += "..." instead of var = var + "..."
- detect problems with append()
```
newSlice := append(orgSlice, val)
```
This could lead to nasty bugs because the orgSlice will be changed in
place if it has enough capacity to hold the new elements. Thus
newSlice might not be a copy.
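The aliasing problem can be reproduced in a few lines. The sketch below is a standalone illustration, not code from the Podman tree; `safeAppend` is a hypothetical helper showing one way to avoid it.

```go
package main

import "fmt"

// aliased reports whether two appends to the same slice with spare
// capacity end up sharing (and overwriting) the same memory.
func aliased() bool {
	backing := make([]int, 3, 4) // len 3, cap 4: one spare slot
	a := append(backing, 1)      // reuses the backing array
	b := append(backing, 2)      // reuses it again and overwrites a[3]
	return a[3] == b[3]
}

// safeAppend copies first, so the result never shares memory with s.
func safeAppend(s []int, v int) []int {
	out := make([]int, len(s), len(s)+1)
	copy(out, s)
	return append(out, v)
}

func main() {
	fmt.Println("appends alias each other:", aliased())
}
```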
Of course most of the changes are just cosmetic and do not cause any
logic errors but I think it is a good idea to enforce a common style.
This should help maintainability.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Fixes: #13337
I added a newline only for options, i.e., arguments beginning with "-"
[NO NEW TESTS NEEDED]
Signed-off-by: Abhijeet Kasurde <akasurde@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When podman gets an error it prints out "Error: " before
printing the error string. If the error message starts with
error, we end up with
Error: error ...
This PR removes all of these stutters.
logrus.Error() also prints out that this is an error, so no need for the
error stutter.
[NO NEW TESTS NEEDED]
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
When podman generate systemd is invoked, it previously did not check if
container-prefix or pod-prefix are empty. When these are empty, the file name
starts with the separator, which is hyphen by default. This results in files
like '-containername.service'.
The code now checks if these prefixes are empty. If they are, the filename no
longer adds a separator. Instead, it uses name or ID of the container or pod.
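The check can be sketched as follows; `unitName` is a hypothetical helper and not the function used in the generator.

```go
package main

import "fmt"

// unitName (hypothetical helper) joins prefix and name with the separator
// but omits the separator when the prefix is empty.
func unitName(prefix, separator, nameOrID string) string {
	if prefix == "" {
		return nameOrID + ".service"
	}
	return prefix + separator + nameOrID + ".service"
}

func main() {
	fmt.Println(unitName("container", "-", "containername"))
	fmt.Println(unitName("", "-", "containername"))
}
```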
Closes #13272
Signed-off-by: Nirmal Patel <npate012@gmail.com>
This commit includes:
* Handlers for generate systemd unit
with manually defined dependencies such as:
Wants=, After= and Requires=
* The new unit and e2e tests for checking generated systemd units
for container and pod with custom dependencies
* Documented descriptions for custom dependencies options
Signed-off-by: Eugene (Evgenii) Shubin <esendjer@gmail.com>
Replace `multi-user.target` with `default.target` across the code base.
It seems like the multi-user one is not available for (rootless) users
on F35 anymore, which is causing issues in all kinds of ways, for instance,
enabling the podman.service or generated systemd units.
Fixes: #12438
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add a new flag to set the start timeout for a generated systemd unit.
To make naming consistent, add a new --stop-timeout flag as well and let
the previous --time map to it.
Fixes: #11618
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Handle custom restart policies of containers when generating the unit
files; those should be set on the unit level and removed from ExecStart
flags.
Fixes: #11438
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
`generate systemd --new` is looking at the "create command" of the
container/pod which is simply the os.Args at creation time.
It does not work on containers or pods created via the REST API since
the create command is not set. `--new` cannot work on such containers and
pods since there is no reliable way to reverse-map their configs to
command-line arguments of podman.
Fixes: #11370
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Commit 9ac5267 changed the type of the generated systemd units from
`forking` to `notify`. It further stopped using `--cidfile` and instead
intended systemd to take care of stopping the container, which turned
out to be a bad idea.
Systemd will send the stop/kill signals to conmon which in turn may exit
non-zero, depending on the signal, and ultimately breaking container
cleanup.
Hence, we need to use --cidfile again and let podman stop and remove the
container to make sure that everything's in order.
Fixes: #11304
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
This reverts commit 70801b3d71.
It turns out that letting systemd handle stopping the container is not
working as I thought it would. Conmon is receiving the stop/kill signals
and may exit non-zero, which in turn lets the systemd service transition
into the `failed` state.
We need to get back to letting Podman stop the containers and do a
partial revert of commit 9ac5267 which removed using --cidfile.
Happening in a following commit.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Commit 9ac5267598 changed the type of the generated systemd units from
forking to notify. Part of these changes was also removing the need to
pass any information via the file system (e.g., PIDFILE, container ID).
That in turn implies that systemd takes care of stopping the container.
By default, systemd first sends a SIGTERM and after a certain timeout,
it'll send a SIGKILL. That's pretty much what Podman is doing, unless
the container was created with a custom stop signal which is the case
when the --stop-signal flag was used or systemd is mounted.
Account for that by using systemd's KillSignal option which allows for
changing SIGTERM to another signal. Also make sure that we're using the
correct timeout for units generated with --new.
Fixes: #11304
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Add support for simple rollbacks during `podman auto-update`. Rollbacks
are enabled by default. If a systemd unit cannot be restarted after an
update, the previous image will be retagged and the unit will be
restarted a second time.
Add system tests for rollbacks. Also fix a bug in the restart sequence;
we have to use the channel to actually know whether the restart was
successful or not.
NOTE: To make rollbacks really useful, users must run their containers
with `--sdnotify=container` such that the containers send the ready
message over the (mounted) socket. This way, restarting the systemd
units during auto update will block until the message has been received
(or a timeout kicked in).
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Require the network to be online in all (generated) systemd units to
make sure that containers and Podman run only after the network has been
fully configured.
Fixes: #10655
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
We should not be exposing the store outside of Libpod. We want to
encapsulate it as an internal implementation detail - there's no
reason functions outside of Libpod should directly be
manipulating container storage. Convert the last use to invoke a
method on Libpod instead, and remove the function.
[NO TESTS NEEDED] as this is just a refactor.
Signed-off-by: Matthew Heon <mheon@redhat.com>
LISTEN_FDNAMES is optional, the docs for sd_listen_fds() say:
This information is read from the $LISTEN_FDNAMES variable, which
**may** contain a colon-separated list of names.
emphasis mine (indeed, the cited coreos code also suggests it is optional).
This actually results in a bug, since the default
/contrib/systemd/system/podman.socket file doesn't set a
FileDescriptorName=. podman when run with this systemd configuration
*always* starts in unix socket mode since SocketActivated() will return
false because the name is missing.
The bug is a race with a very small window: between when podman does the
unlink() and when it re-binds the socket later in the code, requests made
during this time will fail since nothing is listening. There's another
small race when the service stops and systemd realizes it and starts
listening again.
However small this window, we managed to hit it :).
Let's fix this by ignoring LISTEN_FDNAMES. Since the code in
cmd/podman/system/service_abi.go:restService() ignores this value anyway
when setting up the socket activated stuff, there's no real loss here.
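A minimal sketch of a socket-activation check that only looks at LISTEN_PID and LISTEN_FDS, as the sd_listen_fds() protocol requires. The `socketActivated` signature below is an assumption for illustration and differs from the real `SocketActivated()`; the environment lookup is injected to keep the sketch easy to exercise.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// socketActivated reports whether systemd passed at least one listening
// file descriptor to the process with the given pid. LISTEN_FDNAMES is
// deliberately not consulted because it is optional.
func socketActivated(getenv func(string) string, pid int) bool {
	listenPID, err := strconv.Atoi(getenv("LISTEN_PID"))
	if err != nil || listenPID != pid {
		return false
	}
	nfds, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || nfds < 1 {
		return false
	}
	return true
}

func main() {
	fmt.Println(socketActivated(os.Getenv, os.Getpid()))
}
```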
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Change the type of units generated with --new from "forking" to
"notify". This brings Podman closer to systemd and opens up
Podman to a number of use cases (see #5572).
Units generated without --new remain with `type=forking`. I
experimented a bit with adding a `--sdnotify` flag to `podman start` but
it doesn't really work well since we're competing with the default
sdnotify mode set during container creation.
Fixes: #5572
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Commit 748826fc88 fixed a bug where slow mounting of the runroot was
causing issues when the units are started at boot. The fix was to add
the container's runroot to the required mounts; the graph root has been
added as well.
Hard-coding the run- and graphroot to the required mounts, however,
breaks the portability of units generated with --new. Those units are
intended to be running on any machine as, theoretically, any user.
Make the mounts portable by using the `%t` macro for the run root.
Since the graphroot's location varies across root and ordinary users,
drop it from the list of required mounts. The graphroot was not causing
issues.
Fixes: #10493
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
The systemd unit generated with --new loses the environment variables
when the create command only contains the key without the value. Since
podman tries to look up those values from the environment, the unit can
fail.
This commit ensures that we will add the environment variables to the
unit file when this is the case. The container environment variables are
looked up in the container spec.
Fixes #10101
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
podman generate systemd --new inserts extra idfile arguments. The
generated unit can break when the user did provide their own idfile
arguments as they overwrite the arguments added by generate systemd.
This also happens when a user tries to generate the systemd unit on
a container already created with a --new unit. This should now
create an identical unit. The solution is to remove all user provided
idfile arguments.
This commit also ensures that we do not remove arguments that are part
of the container's entrypoint.
Fixes #9776
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
It is rare but possible that storage locations for the graphroot and the
runroot are not mounted at boot time, and therefore might race when
doing container operations. An example we've seen in the wild is that a
slow tmpfs mount for the runroot would suddenly mount over /run, causing
the container to lose all currently-running data, requiring a system
refresh to get it back.
This patch adds RequiresMountsFor= to the systemd.unit header to ensure
the paths for both the graphroot and runroot are mounted prior to
starting any generated unit files.
Signed-off-by: Robb Manes <rmanes@redhat.com>
Some packages used by the remote client imported the libpod package.
This is not wanted because it adds unnecessary bloat to the client and
also causes problems with platform specific code (linux only), see #9710.
The solution is to move the used functions/variables into extra packages
which do not import libpod.
This change shrinks the remote client size more than 6MB compared to the
current master.
[NO TESTS NEEDED]
I have no idea how to test this properly but with #9710 the cross
compile should fail.
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
We missed bumping the go module, so let's do it now :)
* Automated go code with github.com/sirkon/go-imports-rename
* Manually via `vgrep podman/v2` the rest
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
The unit generation accidentally escaped the %t in the pod id file path.
This is a regression caused by #9178. This was not caught by the tests
because the test itself was wrong. It used a full path instead of the
systemd variable %t like the actual code does.
Fixes #9373
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
In a systemd unit dollar and percent signs are used for variables. A backslash
is used for escape sequences. If any of these characters are used in the create
command we have to properly escape them so systemd does not try to interpret them.
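A sketch of the escaping step, assuming the usual systemd rules (`$$` for a literal dollar, `%%` for a literal percent, `\\` for a literal backslash). `escapeSystemdArg` here is a hypothetical helper; the real implementation may additionally deal with quoting.

```go
package main

import (
	"fmt"
	"strings"
)

// systemdEscaper doubles the characters systemd would otherwise interpret.
var systemdEscaper = strings.NewReplacer(
	`\`, `\\`,
	`$`, `$$`,
	`%`, `%%`,
)

// escapeSystemdArg (hypothetical helper) escapes one command-line argument
// for use in an Exec* line of a unit file.
func escapeSystemdArg(arg string) string {
	return systemdEscaper.Replace(arg)
}

func main() {
	fmt.Println(escapeSystemdArg(`echo $HOME %t C:\tmp`))
}
```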
Fixes #9176
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
If the container create command contains an argument with double
curly braces the golang template parsing can fail since it tries
to interpret the value as variable. To fix this change the default
delimiter for the internal template to `{{{{`.
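The effect of changing the delimiters can be seen in a short sketch using `text/template`. The `render` function and the template text are illustrative only; the closing delimiter `}}}}` is an assumption, since only the opening one is named above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes a template that uses {{{{ and }}}} as delimiters, so a
// literal {{.State}} in the text passes through untouched.
func render(text string, data any) (string, error) {
	tmpl, err := template.New("unit").Delims("{{{{", "}}}}").Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(
		"ExecStart={{{{.Cmd}}}} --format {{.State}}",
		map[string]string{"Cmd": "/usr/bin/podman ps"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```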
Fixes #9034
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
First, use the pflag library to parse the flags. With this we can
handle all corner cases such as -td or --detach=false.
Second, preserve the root args with --new. They are used for all podman
commands in the unit file. (e.g. podman --root /tmp run alpine)
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
Systemd is now complaining or mentioning /var/run as a legacy directory.
For many years, /var/run has been a symlink to /run on almost all
distributions, so make the change to the default.
Partial fix for https://github.com/containers/podman/issues/8369
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
`KillMode=none` has been deprecated in systemd and is now throwing big
warnings when being used. Users have reported the issues upstream
(see #8615) and on the mailing list.
This deprecation was mainly motivated by an abusive use of third-party
vendors causing all kinds of undesired side-effects. For instance, busy
mounts that delay reboot.
After talking to the systemd team, we came up with the following plan:
**Short term**: we can use TimeoutStopSec and remove KillMode=none which
will default to cgroup.
**Long term**: we want to change the type to sdnotify. The plumbing for
Podman is done but we need it for conmon. Once sdnotify is working, we
can get rid of the pidfile handling etc. and let Podman handle it.
Michal Seklatar came up with a nice idea that Podman increase the time
out on demand. That's a much cleaner way than hard-coding the time out
in the unit as suggested in the short-term solution.
This change is executing the short-term plan and sets a minimum timeout
of 60 seconds. User-specified timeouts are added to that.
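Expressed as code, the computation is a simple sum. The constant and function names below are hypothetical and only restate the rule from the paragraph above.

```go
package main

import "fmt"

// minTimeoutStopSec is the minimum stop timeout in seconds described above.
const minTimeoutStopSec = 60

// timeoutStopSec (hypothetical helper) adds the user-specified timeout to
// the minimum to get the value for TimeoutStopSec=.
func timeoutStopSec(userTimeout uint) uint {
	return minTimeoutStopSec + userTimeout
}

func main() {
	fmt.Printf("TimeoutStopSec=%d\n", timeoutStopSec(10))
}
```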
Fixes: #8615
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Our users are missing certain warning messages that would
make debugging issues with Podman easier.
For example if you do a podman build with a Containerfile
that contains the SHELL directive, the directive is silently
ignored.
If you run with the log-level warn you get a warning message explaining
what happened.
$ podman build --no-cache -f /tmp/Containerfile1 /tmp/
STEP 1: FROM ubi8
STEP 2: SHELL ["/bin/bash", "-c"]
STEP 3: COMMIT
--> 7a207be102a
7a207be102aa8993eceb32802e6ceb9d2603ceed9dee0fee341df63e6300882e
$ podman --log-level=warn build --no-cache -f /tmp/Containerfile1 /tmp/
STEP 1: FROM ubi8
STEP 2: SHELL ["/bin/bash", "-c"]
STEP 3: COMMIT
WARN[0000] SHELL is not supported for OCI image format, [/bin/bash -c] will be ignored. Must use `docker` format
--> 7bd96fd25b9
7bd96fd25b9f755d8a045e31187e406cf889dcf3799357ec906e90767613e95f
These messages will no longer be lost, when we default to WARNing level.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Allow automatic generation for shell completion scripts
with the internal cobra functions (requires v1.0.0+).
This should replace the handwritten completion scripts
and even adds support for fish. With this approach it is
less likely that completions and code are out of sync.
We can now create the scripts with
- podman completion bash
- podman completion zsh
- podman completion fish
To test the completion run:
source <(podman completion bash)
The same works for podman-remote and podman --remote and
it will complete your remote containers/images with
the correct endpoints values from --url/--connection.
The completion logic is written in go and provided by the
cobra library. The completion functions live in
`cmd/podman/completion/completion.go`.
The unit test at cmd/podman/shell_completion_test.go checks
if each command and flag has an autocompletion function set.
This prevents that commands and flags have no shell completion set.
This commit does not replace the current autocompletion scripts.
Closes #6440
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>