Commit Graph

3796 Commits

Author SHA1 Message Date
Paul Holzinger 95557a532e
libpod: do not Cleanup() more than once
If the container was already cleaned up we should not try to do it
again. Podman stop will always try to call Cleanup() if you look at the
podman event log and just keep calling podman stop --all you see a
cleanup event every time. This is not wanted. Also in case of the host
pidns we report a error every single time, see the linked issue.

Fixes #18460

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-05-04 13:53:40 +02:00
Paul Holzinger 19aabf440e
remote: exec inspect update exec session status
The remote API will wait 300s by default before conmon will call the
cleanup. In the meantime when you inspect an exec session started with
ExecStart() (so not attached) and it did exit we do not know that. If
a caller inspects it they think it is still running. To prevent this we
should sync the session based on the exec pid and update the state
accordingly.

For a reproducer see the test in this commit or the issue.

Fixes #18424

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-05-03 14:54:00 +02:00
Urvashi Mohnani fa1ba17bc1 Update kube gen & play to use pod restart policy
Podman kube generate now uses the pod's restart policy
when generating the kube yaml. If generating from containers
only, use the restart policy of the first non-init container.
Podman kube play applies the pod restart policy from the yaml
file to the pod. The containers within a pod inherit this restart
policy.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2023-05-02 10:30:07 -04:00
Urvashi Mohnani 0fef113a4b Add {{.Restarts}} to podman ps
Add Restarts column to the podman ps output to show how many times a
container was restarted based on its restart policy. This column will be
displayed when --format={{.Restarts}}.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2023-05-02 10:30:07 -04:00
Urvashi Mohnani edbeee5238 Add --restart flag to pod create
Add --restart flag to pod create to allow users to set the
restart policy for the pod, which applies to all the containers
in the pod. This reuses the restart policy already there for
containers and has the same restart policy options.
Add "never" to the restart policy options to match k8s syntax.
It is a synonym for "no" and does the exact same thing where the
containers are not restarted once exited.
Only the containers that have exited will be restarted based on the
restart policy, running containers will not be restarted when an exited
container is restarted in the same pod (same as is done in k8s).

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2023-05-02 10:29:58 -04:00
OpenShift Merge Robot 3f5f906903
Merge pull request #18376 from Luap99/swagger-info
[CI:DOCS] swagger: fix Info name conflict
2023-05-02 04:39:58 -04:00
Giuseppe Scrivano 70870895b7
libpod: improve errors management in cleanupStorage
fix some issues with the handling of errors, we print an error only
when there is already one set to be returned.  Also the first error is
not printed, since it is reported back to the caller of the function.

Improve some messages with more context that can be helpful when
things go wrong.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-04-28 11:51:06 +02:00
Giuseppe Scrivano 5592dc12f9
libpod: report unmount idmapped rootfs errors
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-04-28 11:46:34 +02:00
OpenShift Merge Robot 19152fa349
Merge pull request #18326 from cevich/f38_update
Cirrus: Update CI VM Image to F38
2023-04-27 12:42:56 -04:00
Paul Holzinger 0a92b399df
swagger: fix Info name conflict
go swagger has a flat namespace so it doesn't handle name conflicts at
all. The libpod info response uses the Info struct from some docker dep
instead. Because we cannot change the docker dependency simply rename
the Info struct, but only via swagger comment not the go actual struct.

I verified locally that this works.

Fixes #18228

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-27 14:44:18 +02:00
Sascha Grunert 512b39475b
Update c/common and avoid setting umask
We can now use the new API for creating files and directories without
setting the umask to allow parallel usage of those methods.

This patch also bumps c/common for that.

[NO NEW TESTS NEEDED]

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-04-27 08:59:44 +02:00
Chris Evich 221e3023f6
Fix rand.Seed() deprecation in golang 1.20
Ref: https://pkg.go.dev/math/rand@go1.20#Seed

Note: For `runtime_test.go`, this test-case was never actually doing
what appears as it's intent .  Fixing it to work as intended would be
require incredibly libpod-invasive changes.  Do the least-worse thing and
simply confirm that consecutive generated names are different.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-04-26 14:55:03 -04:00
Valentin Rothberg bbe9d61c49 sqlite: move first read into a transaction
According to an old upstream issue [1]: "If the first statement after
BEGIN DEFERRED is a SELECT, then a read transaction is started.
Subsequent write statements will upgrade the transaction to a write
transaction if possible, or return SQLITE_BUSY."

So let's move the first SELECT under the same transaction as the table
initialization.

[NO NEW TESTS NEEDED] as it's a hard to cause race.

[1] https://github.com/mattn/go-sqlite3/issues/274#issuecomment-1429054597

Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-04-25 16:01:49 +02:00
openshift-ci[bot] 9a750045ea
Merge pull request #18212 from rhatdan/docker
Specify format to buildah before commit
2023-04-20 17:12:29 +00:00
OpenShift Merge Robot f57020149a
Merge pull request #18267 from Luap99/always-stop
libpod: stop containers with --restart=always
2023-04-20 07:16:49 -04:00
Daniel J Walsh 81621ce8af
Specify format to buildah before commit
If user specifies commit --format, we were not setting it before
commit, this caused warning messages that made no sense to be
printed that made no sense.

Fixes: https://github.com/containers/podman/issues/17773

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-04-20 06:24:35 -04:00
Paul Holzinger edb64f8a76
libpod: stop containers with --restart=always
Commit 1ab833fb73 improved the situation but it is still not enough.
If you run short lived containers with --restart=always podman is
basically permanently restarting them. To only way to stop this is
podman stop. However podman stop does not do anything when the
container is already in a not running state. While this makes sense we
should still mark the container as explicitly stopped by the user.

Together with the change in shouldRestart() which now checks for
StoppedByUser this makes sure the cleanup process is not going to start
it back up again.

A simple reproducer is:
```
podman run --restart=always --name test -d alpine true
podman stop test
```
then check if the container is still running, the behavior is very
flaky, it took me like 20 podman stop tries before I finally hit the
correct window were it was stopped permanently.
With this patch it worked on the first try.

Fixes #18259

[NO NEW TESTS NEEDED] This is super flaky and hard to correctly test
in CI. MY ginkgo v2 work seems to trigger this in play kube tests so
that should catch at least some regressions. Also this may be something
that should be tested at podman test days by users (#17912).

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-20 11:23:05 +02:00
OpenShift Merge Robot 6a360eaab5
Merge pull request #18234 from containers/renovate/github.com-opencontainers-runtime-spec-1.x
fix(deps): update module github.com/opencontainers/runtime-spec to v1.1.0-rc.2
2023-04-20 05:17:40 -04:00
Paul Holzinger f2cec73486
libpod: fix TestPostDeleteHooks do not depend on version
It really doesn't make sense to match the version one to one,
this just requires us to update it every time manually.
Use a regex instead.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-19 15:29:52 +02:00
Paul Holzinger 7e4cd22acb
libpod: configureNetNS() tear down on errors
Make sure to tear down the netns again on errors. This is needed when a
later call fails and we do not have already stored the netns in the
container state.

[NO NEW TESTS NEEDED] My ginkgo-v2 PR will catch problem like this once
merged.

Fixes #18205

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-18 15:18:05 +02:00
Paul Holzinger 4f93a6eee4
libpod: rootlessNetNs.Cleanup() fix error message
The wrong error was logged.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-18 15:14:22 +02:00
Miloslav Trmač e9356ba206 Don't use bytes.NewBuffer to read data
The documentation says
> The new Buffer takes ownership of buf, and the
> caller should not use buf after this call.

so use the more directly applicable, and simpler, bytes.Reader instead, to avoid this potentially risky use.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2023-04-14 22:40:47 +02:00
Paul Holzinger bab95de9a2
rootless: make sure we only use a single pause process
Currently --tmpdir changes the location of the pause.pid file. this
causes issues because the c code in pkg/rootless does not know about
that. I tried to fix this[1] by fixing the c code to not use the
shortcut. While this fix worked it will result in many pause processes
leaking in the integrration tests.

Commit ab88632 added this behavior but following the disccusion it was
never the intention that we end up having more than one pause process.
The issues that was trying to fix was caused by somthing else AFAICT,
the main problem seems to be that the pause.pid file parent directory
may not be created when we try to create the pid file so it failed with
ENOENT. This patch fixes it by creating this directory always and revert
the change to no longer depend on the tmpdir value.

With this commit we now always use XDG_RUNTIME_DIR/libpod/tmp/pause.pid
for all podman processes. This allows the c shortcut to work reliably
and should therefore improve perfomance over my other approach.

A system test is added to ensure we see the right behavior and that
podman system migrate actually stops the pause process. Thanks to Ed
Santiago for the improved test to make it work for both `catatonit` and
`podman pause`.

This should fix the issues with namespace missmatches that we can see in
CI as flakes.

[1] https://github.com/containers/podman/pull/18057

Fixes #18057

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-11 10:57:46 +02:00
Daniel J Walsh c4e79fc169
Fix up codespell errors
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-04-08 00:53:38 -04:00
OpenShift Merge Robot 4880f6379f
Merge pull request #18076 from nalind/read-idmaps
libpod.storageService.CreateContainerStorage(): retrieve ID maps
2023-04-06 05:59:59 -04:00
Nalin Dahyabhai e4aad8f0f4 libpod.storageService.CreateContainerStorage(): retrieve ID maps
When creating storage for a container using ID maps, read the ID maps
that are assigned to the container from the returned container
structure, rather than from the options structure that we passed to the
storage library, which it previously modified in error.

[NO NEW TESTS NEEDED]

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
2023-04-05 17:58:30 -04:00
Urvashi Mohnani d0ffb87925 Fix invalid pod name and hostname during kube generate
Kube generate on pods was not checking for any underscores
in the pod name so was creating a kube yaml with an invalid
pod name when there were underscores present.
The hostname for the pod is set to the podname by default. There
is no need to set that to the container's name or the pod name
again in the generated yaml. So removed that field unless a hostname
was set for the container by the user.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2023-04-05 17:43:02 -04:00
OpenShift Merge Robot ac1d297fc7
Merge pull request #18024 from Luap99/netns-recover
rootless netns: recover from invalid netns
2023-04-04 15:02:30 -04:00
OpenShift Merge Robot 98933456bc
Merge pull request #17950 from umohnani8/deployments
Support Deployment generation with kube generate
2023-04-03 11:08:29 -04:00
Paul Holzinger 2051e54e01
rootless netns: recover from invalid netns
I made a change in c/common[1] to prevent duplicates in netns names.
This now causes problem in podman[2] where the rootless netns will no
longer work after the netns got invalid but the underlying path still
exists. AFAICT this happens when the podman pause process got killed and
we are now in a different user namespace.

While I do not know what causes this, this commit should make it at
least possible to recover from this situation automatically as it used
to be before[1].

the problem with that is that containers started before it will not be
able to talk to contianers started after this. A restart of the previous
container will fix it but this was also the case before.

[NO NEW TESTS NEEDED]

[1] https://github.com/containers/common/pull/1381
[2] https://github.com/containers/podman/issues/17903#issuecomment-1494169843

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-04-03 16:21:02 +02:00
Alexis Couvreur 21febcb5cf docs: add `starting` to `HealthCheckResults.Status`
Signed-off-by: Alexis Couvreur <alexiscouvreur.pro@gmail.com>
2023-04-02 02:02:11 -04:00
Giuseppe Scrivano 4d56292e7a
libpod: mount safely subpaths
add a function to securely mount a subpath inside a volume.  We cannot
trust that the subpath is safe since it is beneath a volume that could
be controlled by a separate container.  To avoid TOCTOU races between
when we check the subpath and when the OCI runtime mounts it, we open
the subpath, validate it, bind mount to a temporary directory and use
it instead of the original path.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-03-31 19:48:03 +02:00
Urvashi Mohnani 4f90194068 Support Deployment generation with kube generate
The podman kube generate command can now generate a
Deployment kind when the --ype flag is set to deployment.
By default, a Pod spec will be generated if --type flag is
not set.
Add --replicas flag to kube generate to allow users to set
the value of replicas in the generated yaml when generating a
Deployment kind.
Add e2e and minikube tests for this feature.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2023-03-31 13:34:38 -04:00
Valentin Rothberg f131eaa74a auto-update: stop+start instead of restart sytemd units
It turns out the restart is _not_ a stop+start but keeps certain
resources open and is subject to some timeouts that may differ across
distributions' default settings.

[NO NEW TESTS NEEDED] as I have absolutely no idea how to reliably cause
the failure/flake/race.

Also ignore ENOENTS of the CID file when removing a container which has
been identified of actually fixing #17607.

Fixes: #17607
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-29 11:31:35 +02:00
OpenShift Merge Robot 9369a3c336
Merge pull request #17963 from Luap99/slirp-dns-userns
fix slirp4netns resolv.conf ip with a userns
2023-03-28 21:57:03 +02:00
Paul Holzinger 81e5bffc32
fix slirp4netns resolv.conf ip with a userns
When a userns is set we setup the network after the bind mounts, at the
point where resolv.conf is generated we do not yet know the subnet.
Just like the other dns servers for bridge networks we need to add the
ip later in completeNetworkSetup()

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2182052

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2023-03-28 15:52:33 +02:00
Valentin Rothberg cdb5b3e990 sqlite: do not `Ping()` after connecting
`Ping()` requires the DB lock, so we had to move it into a transaction
to fix #17859. Since we try to access the DB directly afterwards, I
prefer to let that fail instead of paying the cost of a transaction
which would lock the DB for _all_ processes.

[NO NEW TESTS NEEDED] as it's a hard to reproduce race.

Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-28 11:27:43 +02:00
OpenShift Merge Robot 8bd9109fb8
Merge pull request #17917 from mheon/fix_17905
Ensure that SQLite state handles name-ID collisions
2023-03-27 07:48:37 -04:00
Matt Heon 7daab31f1f Ensure that SQLite state handles name-ID collisions
If a container with an ID starting with "db1" exists, and a
container named "db1" also exists, and they are different
containers - if I run `podman inspect db1` the container named
"db1" should be inspected, and there should not be an error that
multiple containers matched the name or id "db1". This was
already handled by BoltDB, and now is properly managed by SQLite.

Fixes #17905

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-24 15:09:25 -04:00
Matt Heon e061cb968c Fix a race around SQLite DB config validation
The DB config is a single-row table, and the first Podman process
to run against the database creates it. However, there was a race
where multiple Podman processes, started simultaneously, could
try and write it. Only the first would succeed, with subsequent
processes failing once (and then running correctly once re-ran),
but it was happening often in CI and deserves fixing.

[NO NEW TESTS NEEDED] It's a CI flake fix.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-23 19:48:27 -04:00
Valentin Rothberg b31d9e15f2 sqlite: do not use shared cache
SQLite developers consider it a misfeature [1], and after turning it on,
we saw a new set of flakes.  Let's turn it off and trust the developers
[1] that WAL mode is sufficient for our purposes.

Turning the shared cache off also makes the DB smaller and faster.

[NO NEW TESTS NEEDED]

[1] https://sqlite.org/forum/forumpost/1f291cdca4

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-22 15:44:38 +01:00
OpenShift Merge Robot 6b9f3140fa
Merge pull request #17874 from mheon/sqlite_fixes
Sqlite fixes
2023-03-22 08:13:29 -04:00
Daniel J Walsh 5f274e45f2
Run make codespell
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2023-03-21 16:00:54 -04:00
Matthew Heon 3925cd653b Drop SQLite max connections
The SQLite transaction lock Valentin found is (slightly) faster.
So let's go with that.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-21 14:20:34 -04:00
Valentin Rothberg 0fbc325156 sqlite: set connection attributes on open
The symptoms in #17859 indicate that setting the PRAGMAs in individual
EXECs outside of a transaction can lead to concurrency issues and
failures when the DB is locked.  Hence set all PRAGMAs when opening
the connection.  Move them into individual constants to improve
documentation and readability.

Further make transactions exclusive as #17859 also mentions an error
that the DB is locked during a transaction.

[NO NEW TESTS NEEDED] - existing tests cover the code.

Fixes: #17859
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>

<MH: Cherry-picked on top of my branch>

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-21 12:51:31 -04:00
Matthew Heon 9f0e0e8331 Fix database locked errors with SQLite
I was searching the SQLite docs for a fix, but apparently that
was the wrong place; it's a common enough error with the Go
frontend for SQLite that the fix is prominently listed in the API
docs for go-sqlite3. Setting cache mode to 'shared' and using a
maximum of 1 simultaneous open connection should fix.

Performance implications of this are unclear, but cache=shared
sounds like it will be a benefit, not a curse.

[NO NEW TESTS NEEDED] This fixes a flake with concurrent DB
access.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-21 09:57:56 -04:00
Valentin Rothberg 9563415430 fix --health-on-failure=restart in transient unit
As described in #17777, the `restart` on-failure action did not behave
correctly when the health check is being run by a transient systemd
unit.  It ran just fine when being executed outside such a unit, for
instance, manually or, as done in the system tests, in a scripted
fashion.

There were two issue causing the `restart` on-failure action to
misbehave:

1) The transient systemd units used the default `KillMode=cgroup` which
   will nuke all processes in the specific cgroup including the recently
   restarted container/conmon once the main `podman healthcheck run`
   process exits.

2) Podman attempted to remove the transient systemd unit and timer
   during restart.  That is perfectly fine when manually restarting the
   container but not when the restart itself is being executed inside
   such a transient unit.  Ultimately, Podman tried to shoot itself in
   the foot.

Fix both issues by moving the restart logic in the cleanup process.
Instead of restarting the container, the `healthcheck run` will just
stop the container and the cleanup process will restart the container
once it has turned unhealthy.

Fixes: #17777
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-20 13:56:00 +01:00
Matthew Heon 94f905a503 Fix SQLite DB schema migration code
It now can safely run on bare databases, before any tables are
created.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2023-03-17 13:24:53 -04:00
Matt Heon 6142c16a9c Ensure SQLite places uses the runroot in transient mode
Transient mode means the DB should not persist, so instead of
using the GraphRoot we should use the RunRoot instead.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-15 14:45:28 -04:00
Matt Heon 2ec11b16ab Fix various integration test issues with SQLite state
Two main changes:
- The transient state tests relied on BoltDB paths, change to
  make them agnostic
- The volume code in SQLite wasn't retrieving and setting the
  volume plugin for volumes that used one.

Signed-off-by: Matt Heon <mheon@redhat.com>
2023-03-15 14:45:18 -04:00