Previously, the HealthCheck exec session would not terminate on timeout, allowing the healthcheck to run indefinitely.
Fixes: https://issues.redhat.com/browse/RHEL-86096
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Go initializes unset values to the zero value of their type. This means that the destination of the log is an empty string and the count and size are set to 0. However, a size and count of 0 mean unbounded, and that is not the default behavior.
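As a rough sketch of the intended behavior (struct and field names are invented here, not the actual Podman types), the zero values have to be replaced with the documented defaults rather than being treated as limits:
```go
package main

import "fmt"

// Hypothetical illustration only (names invented). Go initializes unset
// struct fields to their zero value, so an untouched config reads as
// destination "", count 0, size 0. Since 0 would mean "unbounded", fall
// back to the documented defaults explicitly.

type healthLogConfig struct {
	Destination string
	MaxCount    uint
	MaxSize     uint
}

func applyHealthLogDefaults(cfg *healthLogConfig) {
	if cfg.Destination == "" {
		cfg.Destination = "local" // default: keep results in the container status
	}
	if cfg.MaxCount == 0 {
		cfg.MaxCount = 5 // keep only the last five results
	}
	if cfg.MaxSize == 0 {
		cfg.MaxSize = 500 // cap each stored result at 500 characters
	}
}

func main() {
	cfg := healthLogConfig{}
	applyHealthLogDefaults(&cfg)
	fmt.Printf("%+v\n", cfg) // {Destination:local MaxCount:5 MaxSize:500}
}
```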
Fixes: https://github.com/containers/podman/issues/25473
Fixes: https://issues.redhat.com/browse/RHEL-83262
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
When starting a container, consider healthcheck errors fatal. That way
users know when systemd-run failed to set up the timer to run the
healthcheck, and we don't get into a state where the container is running
but the healthcheck is not.
This also fixes the broken error reporting from the systemd-run exec: if
the binary could not be run, the output was just empty, leaving users
with no idea what failed.
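A hypothetical sketch of the idea (not the actual Podman code): capture the systemd-run output and return it as part of a fatal error, so a failed timer setup is visible to the user:
```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Hypothetical sketch: run systemd-run to create the healthcheck timer and
// treat any failure as fatal, including the command output in the error so
// the user can see why the timer could not be set up.
func createHealthcheckTimer(unitName, interval, containerID string) error {
	cmd := exec.Command("systemd-run", "--unit", unitName,
		"--on-unit-inactive="+interval, "--timer-property=AccuracySec=1s",
		"podman", "healthcheck", "run", containerID)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("systemd-run failed: %w: %s", err, strings.TrimSpace(string(out)))
	}
	return nil
}

func main() {
	if err := createHealthcheckTimer("hc-example", "30s", "mycontainer"); err != nil {
		fmt.Println(err) // the systemd-run output is now part of the message
	}
}
```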
Fixes #25034
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
New flags in `podman update` can change the HealthCheck configuration of a started container without having to restart or recreate it.
This can help determine why a given container suddenly started failing HealthCheck without interfering with the services it provides. For example, reconfigure HealthCheck to keep logs longer than the usual last X results, store logs to other destinations, etc.
Fixes: https://issues.redhat.com/browse/RHEL-60561
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
The startup service is special because we have to transition from
startup to the normal unit. And in order to do so we kill ourselves (as
we are run as part of the service). This means we always exited 1, which
causes systemd to consider the unit failed and not remove the transient unit
unless "reset-failed" is called. As there is no process around to do
that, we cannot really do this, so make the process exit(0), which makes more
sense.
Of course we could try to reset-failed the unit later, but the code for
that seems more complicated than it is worth.
Add a new test from Ed that ensures we check for all healthcheck units,
not just the timer, to avoid leaks. I slightly modified it to provide a
better error on leaks.
Fixes: 0bbef4b830 ("libpod: rework shutdown handler flow")
Fixes: #24351
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
These flags can affect the output of the HealthCheck log. Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via `podman inspect`.
It is also limited to the last five executions and the first 500 characters per execution.
This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic `healthcheck service failed` record.
- The `--health-log-destination` flag sets the destination of the HealthCheck log.
- `none`: (default behavior) `HealthCheckResults` are stored in overlay containers. (For example: `$runroot/healthcheck.log`)
- `directory`: creates a log file named `<container-ID>-healthcheck.log` with JSON `HealthCheckResults` in the specified directory.
- `events_logger`: The log will be written using the logging mechanism set by `events_logger`. It also saves the log to a default directory, for performance on systems with a large number of logs.
- The `--health-max-log-count` flag sets the maximum number of attempts in the HealthCheck log file.
- A value of `0` indicates an infinite number of attempts in the log file.
- The default value is `5` attempts in the log file.
- The `--health-max-log-size` flag sets the maximum length of the log stored.
- A value of `0` indicates an infinite log length.
- The default value is `500` log characters. (A sketch of how both limits are applied follows below.)
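A minimal sketch of how the two limits might be applied, with invented names and 0 meaning "unlimited" for both; this is illustrative only, not the actual implementation:
```go
package main

import "fmt"

// Illustrative sketch only (names invented): truncate each result to the
// size limit and keep only the newest maxCount entries, where 0 disables
// the respective limit.

type healthCheckLogEntry struct {
	Start, End, Output string
	ExitCode           int
}

func appendHealthLog(log []healthCheckLogEntry, entry healthCheckLogEntry, maxCount, maxSize uint) []healthCheckLogEntry {
	if maxSize > 0 && len(entry.Output) > int(maxSize) {
		entry.Output = entry.Output[:maxSize] // truncate the stored output
	}
	log = append(log, entry)
	if maxCount > 0 && len(log) > int(maxCount) {
		log = log[len(log)-int(maxCount):] // keep only the newest entries
	}
	return log
}

func main() {
	var log []healthCheckLogEntry
	entry := healthCheckLogEntry{Output: "some very long healthcheck output..."}
	log = appendHealthLog(log, entry, 5, 500)
	fmt.Println(len(log), len(log[0].Output))
}
```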
Add --health-max-log-count flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-max-log-size flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-log-destination flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Using the scanner is just unnecessarily complicated and buggy, as it will
not read the final line with its newline. There is also the problem that
it happens in a separate goroutine, so it could lose output if we read
the array before the scanner was done.
The API accepts a Writer, so we can just directly use a bytes.Buffer,
which captures all output in memory without the need for another
goroutine.
This also means that we now always include the final newline in the
output. I checked with Docker and it does the same, so this is good.
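A small standalone illustration of the same idea using os/exec from the standard library (Podman's exec API similarly accepts an io.Writer):
```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// A bytes.Buffer works directly as the output Writer and captures
// everything in memory: no scanner, no extra goroutine, and the trailing
// newline is preserved as-is.
func main() {
	var buf bytes.Buffer
	cmd := exec.Command("echo", "hello")
	cmd.Stdout = &buf
	cmd.Stderr = &buf // shared buffer for both streams
	if err := cmd.Run(); err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Printf("%q\n", buf.String()) // "hello\n" -- trailing newline included
}
```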
Fixes #23332
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This fixes a regression added in commit 4fd84190b8: because the name was
overwritten by the createTimer() call, the removeTransientFiles()
call removed the new timer and not the startup healthcheck timer. Then,
when the container was stopped, we leaked it, as the wrong unit name was
in the state.
A new test has been added to ensure the logic works and we never leak
the system timers.
Fixes #22884
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Checking if the file exists before opening it anyway is really pointless;
it needs an extra syscall and is in theory racy, as the file might have
been changed between the two calls. We can simply ignore the ENOENT
error on the ReadFile call.
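A minimal standalone example of the pattern (the path and helper name are made up):
```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// Skip the extra Stat() and just read the file, treating "does not exist"
// as an empty (but valid) result instead of an error.
func readHealthLog(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil // no log yet: not an error
	}
	return data, err
}

func main() {
	data, err := readHealthLog("/nonexistent/healthcheck.log")
	fmt.Println(len(data), err) // 0 <nil>
}
```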
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When the field is set to false we should never log healthcheck events.
Fixes https://issues.redhat.com/browse/RHEL-18987
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We already know the status of the healthcheck in the caller, so calling
healthCheckStatus() just makes the event code sync the container state
and reread the healthcheck file for no reason.
It is much better to directly pass the status down to the event call.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
When InitialDelaySeconds in the kube yaml is set for a healthcheck,
don't update the healthcheck status till those initial delay seconds are over.
We were waiting to update for a failing healthcheck, but when the healthcheck
was successful during the initial delay time, the status was being updated as healthy
immediately.
This is misleading to the users wondering why their healthcheck takes
much longer to fail for a failing case while it is quick to succeed for
a healthy case. It also doesn't match what the k8s InitialDelaySeconds
does. This change is only for kube play, podman healthcheck run is
unaffected.
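A rough sketch of the gating idea with invented names, not the actual kube play code:
```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical sketch (names invented): do not record any healthcheck
// status, healthy or unhealthy, until InitialDelaySeconds have elapsed
// since the container started, matching the Kubernetes semantics.
func shouldUpdateHealthStatus(startedAt time.Time, initialDelaySeconds int) bool {
	delay := time.Duration(initialDelaySeconds) * time.Second
	return time.Since(startedAt) >= delay
}

func main() {
	started := time.Now().Add(-10 * time.Second)
	fmt.Println(shouldUpdateHealthStatus(started, 30)) // false: still in the initial delay
	fmt.Println(shouldUpdateHealthStatus(started, 5))  // true: delay has passed
}
```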
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
HC events were firing as part of the `exec` call, before it had
even been decided whether the HC succeeded or failed. As such,
the status was not going to be correct any time there was a
change (e.g. the first event after a container went from healthy to
unhealthy would still read healthy). Move the event into the
actual Healthcheck function and throw it in a defer to make sure
it happens at the very end, after logs are written.
Ignores several conditions that did not log previously (container
in question does not have a healthcheck, or an internal failure
that should not really happen).
Still not a perfect solution. This relies on the HC log being
written, when instead we could just get the status straight from
the function writing the event - so if we fail to write the log,
we can still report a bad status. But if the log wasn't written,
we're in bad shape regardless - `podman ps` would disagree with
the event written, for example.
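A simplified, hypothetical sketch of the flow (type and method names invented):
```go
package main

import "fmt"

// Hypothetical sketch: decide the status, write the log, and only then emit
// the event; the defer guarantees the event fires at the very end, even on
// early returns.

type container struct{ id string }

func (c *container) newHealthCheckEvent(status string) {
	fmt.Printf("event: container %s health_status=%s\n", c.id, status)
}

func (c *container) runHealthCheck() (status string, err error) {
	defer func() {
		if status != "" { // no event for containers without a healthcheck or internal errors
			c.newHealthCheckEvent(status)
		}
	}()

	// ... execute the check command and append the result to the log ...
	status = "healthy"
	return status, nil
}

func main() {
	c := &container{id: "ae498ac3aa6c"}
	if _, err := c.runHealthCheck(); err != nil {
		fmt.Println(err)
	}
}
```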
Fixes #19237
Signed-off-by: Matt Heon <mheon@redhat.com>
As described in #17777, the `restart` on-failure action did not behave
correctly when the health check is being run by a transient systemd
unit. It ran just fine when being executed outside such a unit, for
instance, manually or, as done in the system tests, in a scripted
fashion.
There were two issues causing the `restart` on-failure action to
misbehave:
1) The transient systemd units used the default `KillMode=cgroup` which
will nuke all processes in the specific cgroup including the recently
restarted container/conmon once the main `podman healthcheck run`
process exits.
2) Podman attempted to remove the transient systemd unit and timer
during restart. That is perfectly fine when manually restarting the
container but not when the restart itself is being executed inside
such a transient unit. Ultimately, Podman tried to shoot itself in
the foot.
Fix both issues by moving the restart logic into the cleanup process.
Instead of restarting the container, `healthcheck run` will just
stop the container, and the cleanup process will restart the container
once it has turned unhealthy.
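A very rough, hypothetical sketch of the split between the two processes (all names invented, details simplified):
```go
package main

import "fmt"

// Hypothetical sketch of the new flow. The healthcheck process only stops
// the container; the restart happens later in the cleanup process, which
// runs outside the transient systemd unit being torn down.

type container struct {
	name            string
	onFailureAction string // "none", "kill", "restart", "stop"
}

func (c *container) stop()  { fmt.Println("stopping", c.name) }
func (c *container) start() { fmt.Println("restarting", c.name) }

// Called from `podman healthcheck run` once the check turned unhealthy.
func (c *container) processHealthCheckFailure() {
	if c.onFailureAction == "restart" {
		c.stop() // do NOT restart here: we may be inside the unit being killed
	}
}

// Called from the cleanup process after the container has stopped.
func (c *container) cleanup() {
	if c.onFailureAction == "restart" {
		c.start()
	}
}

func main() {
	c := &container{name: "web", onFailureAction: "restart"}
	c.processHealthCheckFailure()
	c.cleanup()
}
```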
Fixes: #17777
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Also do not return (and immediately suppress) an error if no health
check is defined for a given container.
Makes listing 100 containers around 10 percent faster.
[NO NEW TESTS NEEDED]
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
The podman healthchecks are implemented using systemd timers. This works
great, but it will never work on non-systemd distros. Currently the logic
always assumes systemd is available and will fail with an error, so users
are forced to always run with `--no-healthcheck` to disable healthchecks
that are defined in an image, for example. This is annoying and IMO
unnecessary; we should just default to no healthcheck on these systems.
First, use the systemd build tag to disable it at build time if this tag
is not used.
Second, make sure systemd is used as init before trying to use
healthchecks; it may not be, for example, when we are run in a container.
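A hypothetical sketch of how the two layers could look: a file gated by the systemd build tag, plus a runtime check that systemd is actually init (checking /run/systemd/system is the conventional way to detect that):
```go
//go:build systemd

package healthcheck

// Hypothetical sketch: this file is only compiled with "-tags systemd";
// a build without the tag would compile a stub instead. At runtime, also
// verify that systemd is actually the init system, e.g. when running
// inside a container.

import "os"

func systemdIsInit() bool {
	// systemd creates this directory when it is running as PID 1.
	fi, err := os.Stat("/run/systemd/system")
	return err == nil && fi.IsDir()
}

func healthChecksSupported() bool {
	return systemdIsInit()
}
```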
[NO NEW TESTS NEEDED] We do not have any non systemd VMs in CI AFAIK.
Fixes #16644
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Startup healthchecks are similar to K8S startup probes, in that
they are a separate check from the regular healthcheck that runs
before it. If the startup healthcheck fails repeatedly, the
associated container is restarted.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Make sure that the on-failure actions only kick in once the health check
has passed its retries. Also fix race conditions on reading/writing the
log.
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Package `io/ioutil` was deprecated in golang 1.16, preventing podman from
building under Fedora 37. Fortunately, functionally identical
replacements are provided by the packages `io` and `os`. Replace all
usage of `io/ioutil` symbols with appropriate substitutions
according to the golang docs.
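For reference, the usual substitutions from the Go documentation, shown in a small runnable example:
```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// The usual one-to-one replacements from the deprecation notes:
//   ioutil.ReadFile  -> os.ReadFile
//   ioutil.WriteFile -> os.WriteFile
//   ioutil.ReadAll   -> io.ReadAll
//   ioutil.TempDir   -> os.MkdirTemp
//   ioutil.TempFile  -> os.CreateTemp
func main() {
	dir, err := os.MkdirTemp("", "example")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "hello.txt")
	if err := os.WriteFile(path, []byte("hello\n"), 0o600); err != nil {
		panic(err)
	}

	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	data, err := io.ReadAll(f)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(data))
}
```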
Signed-off-by: Chris Evich <cevich@redhat.com>
For systems that have extreme robustness requirements (edge devices,
particularly those in difficult to access environments), it is important
that applications continue running in all circumstances. When the
application fails, Podman must restart it automatically to provide this
robustness. Otherwise, these devices may require customer IT to
physically gain access to restart, which can be prohibitively difficult.
Add a new `--on-failure` flag that supports four actions:
- **none**: Take no action.
- **kill**: Kill the container.
- **restart**: Restart the container. Do not combine the `restart`
action with the `--restart` flag. When running inside of
a systemd unit, consider using the `kill` or `stop`
action instead to make use of systemd's restart policy.
- **stop**: Stop the container.
To remain backwards compatible, **none** is the default action.
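A minimal, hypothetical sketch of how the action could be dispatched once the retries are exhausted (names invented):
```go
package main

import "fmt"

// Hypothetical sketch: dispatch over the four supported on-failure actions
// once the healthcheck has exhausted its retries.

type container struct{ name string }

func (c *container) kill() error    { fmt.Println("killing", c.name); return nil }
func (c *container) stop() error    { fmt.Println("stopping", c.name); return nil }
func (c *container) restart() error { fmt.Println("restarting", c.name); return nil }

func (c *container) processOnFailureAction(action string) error {
	switch action {
	case "none":
		return nil // default: take no action
	case "kill":
		return c.kill()
	case "restart":
		return c.restart()
	case "stop":
		return c.stop()
	default:
		return fmt.Errorf("unsupported on-failure action %q", action)
	}
}

func main() {
	c := &container{name: "edge-app"}
	_ = c.processOnFailureAction("restart")
}
```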
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
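A small standalone example of the standard-library style:
```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// Wrapping with %w keeps the error chain intact, so errors.Is/errors.As
// still work without github.com/pkg/errors.
func main() {
	_, err := os.Open("/no/such/file")
	wrapped := fmt.Errorf("running healthcheck: %w", err)
	fmt.Println(errors.Is(wrapped, fs.ErrNotExist)) // true
}
```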
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Previously, if a container had healthchecks disabled in the
docker-compose.yml file and the user did a `podman inspect <container>`,
they would have an incorrect output:
```
"Healthcheck":{
"Test":[
"CMD-SHELL",
"NONE"
],
"Interval":30000000000,
"Timeout":30000000000,
"Retries":3
}
```
After a quick change, the correct output is now the result:
```
"Healthcheck":{
"Test":[
"NONE"
]
}
```
Additionally, I extracted the hard-coded strings that were used for
comparisons into constants in `libpod/define` to prevent a similar issue
from recurring.
Closes: #14493
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
Previously, health status events were not being generated at all. Both
the API and `podman events` now generate health_status events.
```
{"status":"health_status","id":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","from":"localhost/healthcheck-demo:latest","Type":"container","Action":"health_status","Actor":{"ID":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","Attributes":{"containerExitCode":"0","image":"localhost/healthcheck-demo:latest","io.buildah.version":"1.26.1","maintainer":"NGINX Docker Maintainers \u003cdocker-maint@nginx.com\u003e","name":"healthcheck-demo"}},"scope":"local","time":1656082205,"timeNano":1656082205882271276,"HealthStatus":"healthy"}
```
```
2022-06-24 11:06:04.886238493 -0400 EDT container health_status ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63 (image=localhost/healthcheck-demo:latest, name=healthcheck-demo, health_status=healthy, io.buildah.version=1.26.1, maintainer=NGINX Docker Maintainers <docker-maint@nginx.com>)
```
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
* Replace "setup", "lookup", "cleanup", "backup" with
"set up", "look up", "clean up", "back up"
when used as verbs. Also replace variations of those.
* Improve language in a few places.
Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
It seems we are ignoring output from the healthcheck session.
Open a valid pipe to the healthcheck session in order to read its output.
Use a common pipe for both `stdout/stderr`, since that was the previous
behaviour as well.
Signed-off-by: Aditya R <arajan@redhat.com>
The health check result is stored in the container state. Since the
state can change or might not even be set, we have to retrieve the current
state before we try to read the health check result.
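A simplified, hypothetical sketch of the pattern (not the actual libpod types):
```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical sketch: refresh the in-memory state before reading the
// healthcheck result, since the cached copy may be stale or not populated.

type containerState struct{ HealthStatus string }

type container struct {
	lock  sync.Mutex
	state *containerState
}

// stand-in for reloading the state from the database
func (c *container) syncState() error {
	if c.state == nil {
		c.state = &containerState{HealthStatus: "healthy"}
	}
	return nil
}

func (c *container) healthCheckStatus() (string, error) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if err := c.syncState(); err != nil {
		return "", err
	}
	return c.state.HealthStatus, nil
}

func main() {
	c := &container{}
	status, err := c.healthCheckStatus()
	fmt.Println(status, err)
}
```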
Fixes #11687
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We missed bumping the go module, so let's do it now :)
* Automated go code with github.com/sirkon/go-imports-rename
* Manually via `vgrep podman/v2` the rest
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Most of the built-in golang functions like os.Stat and
os.Open report errors that include the file system object
path. We should not wrap these errors and put the file path
in a second time, causing stuttering of errors when they
get presented to the user.
This patch tries to clean up a bunch of these errors.
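A small standalone example of the stuttering this avoids:
```go
package main

import (
	"fmt"
	"os"
)

// os.Open already includes the path in its error, so wrapping it with the
// path again produces stuttering output for the user.
func main() {
	_, err := os.Open("/etc/foo.conf")

	// Stuttering: the path appears twice.
	fmt.Println(fmt.Errorf("unable to read /etc/foo.conf: %w", err))
	// unable to read /etc/foo.conf: open /etc/foo.conf: no such file or directory

	// Better: let the original error speak for itself.
	fmt.Println(err)
	// open /etc/foo.conf: no such file or directory
}
```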
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This was added with an earlier exec rework, and honestly is very
confusing. Podman is printing an error message, but the error had
nothing to do with Podman; it was the executable we ran inside
the container that errored, and per `podman run` convention we
should set the Podman exit code to the process's exit code and
print no error.
Signed-off-by: Matthew Heon <mheon@redhat.com>
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules. While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.
Move the go module to new major version and change all imports to
`github.com/containers/libpod/v2`. The renaming of the imports
was done via `gomove` [1].
[1] https://github.com/KSubedi/gomove
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Instead of using the container log path to derive where to put the healthchecks, we now put them into the rundir to avoid collisions of health check log files when the log path is set by the user.
Fixes: #5915
Signed-off-by: Brent Baude <bbaude@redhat.com>
Add the ability to attach to a running container. The tunnel side of this is not enabled yet, as we still have work to do on the endpoints and plumbing.
Add the ability to exec a command in a running container. The tunnel side is also being deferred for the same reason.
Signed-off-by: Brent Baude <bbaude@redhat.com>
As part of the rework of exec sessions, we need to address them
independently of containers. In the new API, we need to be able
to fetch them by their ID, regardless of what container they are
associated with. Unfortunately, our existing exec sessions are
tied to individual containers; there's no way to tell what
container a session belongs to and retrieve it without getting
every exec session for every container.
This adds a pointer to the container an exec session is
associated with to the database. The sessions themselves are
still stored in the container.
Exec-related APIs have been restructured to work with the new
database representation. The originally monolithic API has been
split into a number of smaller calls to allow more fine-grained
control of lifecycle. Support for legacy exec sessions has been
retained, but in a deprecated fashion; we should remove this in
a few releases.
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
`gocritic` is a powerful linter that helps in preventing certain kinds
of errors as well as enforcing a coding style.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
There were many situations that made exec act funky with input. Pipes didn't work as expected, nor did sending input before the shell opened.
Thinking about it, it seemed as though the issues were because of how os.Stdin buffers (it doesn't). Dropping this input had some weird consequences.
Instead, read from os.Stdin as a bufio.Reader, allowing the input to buffer before passing it to the container.
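A minimal sketch of the approach with a made-up helper:
```go
package main

import (
	"bufio"
	"io"
	"os"
)

// Wrap os.Stdin (which does no buffering of its own) in a bufio.Reader so
// input typed or piped in before the session is ready is buffered rather
// than dropped, then stream it to the container's stdin.
func streamStdin(containerStdin io.Writer) error {
	reader := bufio.NewReader(os.Stdin)
	_, err := io.Copy(containerStdin, reader)
	return err
}

func main() {
	// For illustration, echo buffered stdin back to stdout.
	_ = streamStdin(os.Stdout)
}
```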
Signed-off-by: Peter Hunt <pehunt@redhat.com>