Commit Graph

41 Commits

Author SHA1 Message Date
Valentin Rothberg 9563415430 fix --health-on-failure=restart in transient unit
As described in #17777, the `restart` on-failure action did not behave
correctly when the health check is being run by a transient systemd
unit.  It ran just fine when being executed outside such a unit, for
instance, manually or, as done in the system tests, in a scripted
fashion.

There were two issue causing the `restart` on-failure action to
misbehave:

1) The transient systemd units used the default `KillMode=cgroup` which
   will nuke all processes in the specific cgroup including the recently
   restarted container/conmon once the main `podman healthcheck run`
   process exits.

2) Podman attempted to remove the transient systemd unit and timer
   during restart.  That is perfectly fine when manually restarting the
   container but not when the restart itself is being executed inside
   such a transient unit.  Ultimately, Podman tried to shoot itself in
   the foot.

Fix both issues by moving the restart logic in the cleanup process.
Instead of restarting the container, the `healthcheck run` will just
stop the container and the cleanup process will restart the container
once it has turned unhealthy.

Fixes: #17777
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-03-20 13:56:00 +01:00
Valentin Rothberg 9d1c153cfc ps: query health check in batch mode
Also do not return (and immediately suppress) an error if no health
check is defined for a given container.

Makes listing 100 containers around 10 percent faster.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2023-01-25 11:24:18 +01:00
Paul Holzinger 28774f18c5
disable healthchecks automatically on non systemd systems
The podman healthchecks are implemented using systemd timers, this works
great but it will never work on non systemd distros. Currently the logic
always assumes systemd is available and will fail with an error, so users
are forced to always run with `--no-healthcheck` to disable healthchecks
that are defined in an image for example. This is annoying and IMO
unnecessary, we should just default to no healthcheck on these systems.

First, use the systemd build tag to disable it at build time if this tag
is not used.
Second, use make sure systemd is used as init before trying
to use healthchecks. This could be the case when we are run in a container.

[NO NEW TESTS NEEDED] We do not have any non systemd VMs in CI AFAIK.

Fixes #16644

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2022-12-05 20:58:30 +01:00
Matthew Heon d16129330d Add support for startup healthchecks
Startup healthchecks are similar to K8S startup probes, in that
they are a separate check from the regular healthcheck that runs
before it. If the startup healthcheck fails repeatedly, the
associated container is restarted.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2022-11-28 13:30:29 -05:00
Valentin Rothberg 02040089a6 health checks: make on-failure action retry aware
Make sure that the on-failure actions only kick in once the health check
has passed its retries.  Also fix race conditions on reading/writing the
log.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2022-10-07 15:43:48 +02:00
Chris Evich d968f3fe09
Replace deprecated ioutil
Package `io/ioutil` was deprecated in golang 1.16, preventing podman from
building under Fedora 37.  Fortunately, functionality identical
replacements are provided by the packages `io` and `os`.  Replace all
usage of all `io/ioutil` symbols with appropriate substitutions
according to the golang docs.

Signed-off-by: Chris Evich <cevich@redhat.com>
2022-09-20 15:34:27 -04:00
Valentin Rothberg aad29e759c health check: add on-failure actions
For systems that have extreme robustness requirements (edge devices,
particularly those in difficult to access environments), it is important
that applications continue running in all circumstances. When the
application fails, Podman must restart it automatically to provide this
robustness. Otherwise, these devices may require customer IT to
physically gain access to restart, which can be prohibitively difficult.

Add a new `--on-failure` flag that supports four actions:

- **none**: Take no action.

- **kill**: Kill the container.

- **restart**: Restart the container.  Do not combine the `restart`
               action with the `--restart` flag.  When running inside of
               a systemd unit, consider using the `kill` or `stop`
               action instead to make use of systemd's restart policy.

- **stop**: Stop the container.

To remain backwards compatible, **none** is the default action.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
2022-09-09 13:02:05 +02:00
Sascha Grunert 251d91699d
libpod: switch to golang native error wrapping
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.

[NO NEW TESTS NEEDED]

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-07-05 16:06:32 +02:00
Jake Correnti 5633ef1d15 Docker-compose disable healthcheck properly handled
Previously, if a container had healthchecks disabled in the
docker-compose.yml file and the user did a `podman inspect <container>`,
they would have an incorrect output:

```
"Healthcheck":{
   "Test":[
      "CMD-SHELL",
      "NONE"
   ],
   "Interval":30000000000,
   "Timeout":30000000000,
   "Retries":3
}
```

After a quick change, the correct output is now the result:
```
"Healthcheck":{
   "Test":[
      "NONE"
   ]
}
```

Additionally, I extracted the hard-coded strings that were used for
comparisons into constants in `libpod/define` to prevent a similar issue
from recurring.

Closes: #14493

Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
2022-07-05 08:02:22 -04:00
openshift-ci[bot] 278afae1de
Merge pull request #14705 from jakecorrenti/show-health-status-event
Show Health Status events
2022-06-27 17:49:27 +00:00
Jake Correnti 0c1a3b70f5 Show Health Status events
Previously, health status events were not being generated at all. Both
the API and `podman events` will generate health_status events.

```
{"status":"health_status","id":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","from":"localhost/healthcheck-demo:latest","Type":"container","Action":"health_status","Actor":{"ID":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","Attributes":{"containerExitCode":"0","image":"localhost/healthcheck-demo:latest","io.buildah.version":"1.26.1","maintainer":"NGINX Docker Maintainers \u003cdocker-maint@nginx.com\u003e","name":"healthcheck-demo"}},"scope":"local","time":1656082205,"timeNano":1656082205882271276,"HealthStatus":"healthy"}
```
```
2022-06-24 11:06:04.886238493 -0400 EDT container health_status ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63 (image=localhost/healthcheck-demo:latest, name=healthcheck-demo, health_status=healthy, io.buildah.version=1.26.1, maintainer=NGINX Docker Maintainers <docker-maint@nginx.com>)
```

Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
2022-06-27 10:44:53 -04:00
Erik Sjölund aa4279ae15 Fix spelling "setup" -> "set up" and similar
* Replace "setup", "lookup", "cleanup", "backup" with
  "set up", "look up", "clean up", "back up"
  when used as verbs. Replace also variations of those.

* Improve language in a few places.

Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
2022-06-22 18:39:21 +02:00
Aditya R 4f77331c9d
healthcheck, libpod: Read healthcheck event output from os pipe
It seems we are ignoring output from healthcheck session.
Open a valid pipe to healthcheck session in order read its output.

Use common pipe for both `stdout/stderr` since that was the previous
behviour as well.

Signed-off-by: Aditya R <arajan@redhat.com>
2022-02-04 21:15:03 +05:30
Valentin Rothberg bd09b7aa79 bump go module to version 4
Automated for .go files via gomove [1]:
`gomove github.com/containers/podman/v3 github.com/containers/podman/v4`

Remaining files via vgrep [2]:
`vgrep github.com/containers/podman/v3`

[1] https://github.com/KSubedi/gomove
[2] https://github.com/vrothberg/vgrep

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2022-01-18 12:47:07 +01:00
Paul Holzinger db44addf97
sync container state before reading the healthcheck
The health check result is stored in the container state. Since the
state can change or might not even be set we have to retrive the current
state before we try to read the health check result.

Fixes #11687

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-09-22 17:40:16 +02:00
Valentin Rothberg 5dded6fae7 bump go module to v3
We missed bumping the go module, so let's do it now :)

* Automated go code with github.com/sirkon/go-imports-rename
* Manually via `vgrep podman/v2` the rest

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-02-22 09:03:51 +01:00
Paul Holzinger 69ab67bf90 Enable golint linter
Use the golint linter and fix the reported problems.

[NO TESTS NEEDED]

Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
2021-02-11 23:01:49 +01:00
Daniel J Walsh 831d7fb0d7
Stop excessive wrapping of errors
Most of the builtin golang functions like os.Stat and
os.Open report errors including the file system object
path. We should not wrap these errors and put the file path
in a second time, causing stuttering of errors when they
get presented to the user.

This patch tries to cleanup a bunch of these errors.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-10-30 05:34:04 -04:00
Daniel J Walsh a5e37ad280
Switch all references to github.com/containers/libpod -> podman
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-07-28 08:23:45 -04:00
Matthew Heon 90e547ec1a Do not print an error message on non-0 exec exit code
This was added with an earlier exec rework, and honestly is very
confusing. Podman is printing an error message, but the error had
nothing to do with Podman; it was the executable we ran inside
the container that errored, and per `podman run` convention we
should set the Podman exit code to the process's exit code and
print no error.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2020-07-21 13:28:40 -04:00
Valentin Rothberg 8489dc4345 move go module to v2
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules.  While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.

Move the go module to new major version and change all imports to
`github.com/containers/libpod/v2`.  The renaming of the imports
was done via `gomove` [1].

[1] https://github.com/KSubedi/gomove

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-07-06 15:50:12 +02:00
Brent Baude 141b34f6be Fix remote integration for healthchecks
the one remaining test that is still skipped do to missing exec function

Signed-off-by: Brent Baude <bbaude@redhat.com>
2020-05-20 14:43:01 -05:00
Brent Baude fadd011a80 separate healthcheck and container log paths
instead of using the container log path to derive where to put the healthchecks, we now put them into the rundir to avoid collision of health check log files when the log path is set by user.

Fixes: #5915

Signed-off-by: Brent Baude <bbaude@redhat.com>
2020-04-27 16:05:48 -05:00
Brent Baude 4d895dcb54 v2podman attach and exec
add the ability to attach to a running container.  the tunnel side of this is not enabled yet as we have work on the endpoints and plumbing to do yet.

add the ability to exec a command in a running container.  the tunnel side is also being deferred for same reason.

Signed-off-by: Brent Baude <bbaude@redhat.com>
2020-04-05 15:54:51 -05:00
Brent Baude 2fa78938a9 podmanv2 container inspect
add ability to inspect a container

Signed-off-by: Brent Baude <bbaude@redhat.com>
2020-03-26 15:54:26 -05:00
Matthew Heon 118e78c5d6 Add structure for new exec session tracking to DB
As part of the rework of exec sessions, we need to address them
independently of containers. In the new API, we need to be able
to fetch them by their ID, regardless of what container they are
associated with. Unfortunately, our existing exec sessions are
tied to individual containers; there's no way to tell what
container a session belongs to and retrieve it without getting
every exec session for every container.

This adds a pointer to the container an exec session is
associated with to the database. The sessions themselves are
still stored in the container.

Exec-related APIs have been restructured to work with the new
database representation. The originally monolithic API has been
split into a number of smaller calls to allow more fine-grained
control of lifecycle. Support for legacy exec sessions has been
retained, but in a deprecated fashion; we should remove this in
a few releases.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-03-18 11:02:14 -04:00
Valentin Rothberg 67165b7675 make lint: enable gocritic
`gocritic` is a powerful linter that helps in preventing certain kinds
of errors as well as enforcing a coding style.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-01-13 14:27:02 +01:00
Dmitry Smirnov 8d928d525f codespell: spelling corrections
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
2019-11-13 08:15:00 +11:00
Peter Hunt 1df4dba0a0 Switch to bufio Reader for exec streams
There were many situations that made exec act funky with input. pipes didn't work as expected, as well as sending input before the shell opened.
Thinking about it, it seemed as though the issues were because of how os.Stdin buffers (it doesn't). Dropping this input had some weird consequences.
Instead, read from os.Stdin as bufio.Reader, allowing the input to buffer before passing it to the container.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-10-31 11:20:12 -04:00
Matthew Heon 6f630bc09b Move OCI runtime implementation behind an interface
For future work, we need multiple implementations of the OCI
runtime, not just a Conmon-wrapped runtime matching the runc CLI.

As part of this, do some refactoring on the interface for exec
(move to a struct, not a massive list of arguments). Also, add
'all' support to Kill and Stop (supported by runc and used a bit
internally for removing containers).

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-10-10 10:19:32 -04:00
Peter Hunt a1a79c08b7 Implement conmon exec
This includes:
	Implement exec -i and fix some typos in description of -i docs
	pass failed runtime status to caller
	Add resize handling for a terminal connection
	Customize exec systemd-cgroup slice
	fix healthcheck
	fix top
	add --detach-keys
	Implement podman-remote exec (jhonce)
	* Cleanup some orphaned code (jhonce)
	adapt remote exec for conmon exec (pehunt)
	Fix healthcheck and exec to match docs
		Introduce two new OCIRuntime errors to more comprehensively describe situations in which the runtime can error
		Use these different errors in branching for exit code in healthcheck and exec
	Set conmon to use new api version

Signed-off-by: Jhon Honce <jhonce@redhat.com>

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-07-22 15:57:23 -04:00
baude db826d5d75 golangci-lint round #3
this is the third round of preparing to use the golangci-lint on our
code base.

Signed-off-by: baude <bbaude@redhat.com>
2019-07-21 14:22:39 -05:00
Stefan Becker 5ed2de158f healthcheck: reject empty commands
An image with "HEALTHCHECK CMD ['']" is valid but as there is no command
defined the healthcheck will fail. Reject such a configuration.

Fixes #3507

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
2019-07-16 07:01:43 +03:00
Stefan Becker dd0ea08cef healthcheck: improve command list parser
- remove duplicate check, already called in HealthCheck()
- reject zero-length command list and empty command string as errorneous
- support all Docker command list keywords: NONE, CMD or CMD-SHELL
- use Docker default "/bin/sh -c" for CMD-SHELL

Fixes #3507

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
2019-07-16 07:01:43 +03:00
baude 8561b99644 libpod removal from main (phase 2)
this is phase 2 for the removal of libpod from main.

Signed-off-by: baude <bbaude@redhat.com>
2019-06-27 07:56:24 -05:00
Matthew Heon 1be345bd9d Begin to break up pkg/inspect
Let's put inspect structs where they're actually being used. We
originally made pkg/inspect to solve circular import issues.
There are no more circular import issues.

Image structs remain for now, I'm focusing on container inspect.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-06-03 15:54:53 -04:00
baude 0b6bb6a3d3 enable podman-remote on windows
build a podman-remote binary for windows that allows users to use the
remote client on windows and interact with podman on linux system.

Signed-off-by: baude <bbaude@redhat.com>
2019-04-30 15:28:39 -05:00
Ed Santiago a07b2c2c60 (minor): fix misspelled 'Healthcheck'
Signed-off-by: Ed Santiago <santiago@redhat.com>
2019-04-10 09:43:56 -06:00
baude bb69004b8c podman health check phase3
podman will not start a transient service and timer for healthchecks.
this handles the tracking of the timing for health checks.

added the 'started' status which represents the time that a container is
in its start-period.

the systemd timing can be disabled with an env variable of
DISABLE_HC_SYSTEMD="true".

added filter for ps where --filter health=[starting, healthy, unhealthy]
can now be used.

Signed-off-by: baude <bbaude@redhat.com>
2019-03-22 14:58:44 -05:00
baude 03716cf7f3 healtcheck phase 2
integration of healthcheck into create and run as well as inspect.
healthcheck enhancements are as follows:

* add the following options to create|run so that non-docker images can
define healthchecks at the container level.
  * --healthcheck-command
  * --healthcheck-retries
  * --healthcheck-interval
  * --healthcheck-start-period

* podman create|run --healthcheck-command=none disables healthcheck as
described by an image.
* the healthcheck itself and the healthcheck "history" can now be
observed in podman inspect
* added the wiring for healthcheck history which logs the health history
of the container, the current failed streak attempts, and log entries
for the last five attempts which themselves have start and stop times,
result, and a 500 character truncated (if needed) log of stderr/stdout.

The timings themselves are not implemented in this PR but will be in
future enablement (i.e. next).

Signed-off-by: baude <bbaude@redhat.com>
2019-03-12 14:29:18 -05:00
baude 598bde52d0 podman healthcheck run (phase 1)
Add the ability to manually run a container's healthcheck command.
This is only the first phase of implementing the healthcheck.
Subsequent pull requests will deal with the exposing the results and
history of healthchecks as well as the scheduling.

Signed-off-by: baude <bbaude@redhat.com>
2019-03-05 14:03:55 -06:00