New flags in a `podman update` can change the configuration of HealthCheck when the container is started, without having to restart or recreate the container.
This can help determine why a given container suddenly started failing HealthCheck without interfering with the services it provides. For example, reconfigure HealthCheck to keep logs longer than the usual last X results, store logs to other destinations, etc.
Fixes: https://issues.redhat.com/browse/RHEL-60561
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add error check during tmpfile close.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
If volume ls was called while another volume was removed at the right
time it could have failed with "no such volume" as we did not ignore
such error during listing. As we list things and this no longer exists
the correct thing is to ignore the error and continue like we do with
containers, pods, etc...
This was pretty easy to reproduce with these two commands running in
different terminals:
while :; do bin/podman volume create test && bin/podman volume rm test || break; done
while :; do bin/podman volume ls || break ; done
I have a slight feeling that this might solve #23913 but I am not to
sure there so I am not adding a Fixes here.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
API clients expect the status code quickly otherwise they can time out.
If we do not flush we may not write the header immediately and only when
futher logs are send.
Fixes#23712
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
One of the problems with the Events() API was that you had to call it in
a new goroutine. This meant the the error returned by it had to be read
back via a second channel. This cuased other bugs in the past but here
the biggest problem is that basic errors such as invalid since/until
options were not directly returned to the caller.
It meant in the API we were not able to write http code 200 quickly
because we always waited for the first event or error from the
channels. This in turn made some clients not happy as they assume the
server hangs on time out if no such events are generated.
To fix this we resturcture the entire event flow. First we spawn the
goroutine inside the eventer Read() function so not all the callers have
to. Then we can return the basic error quickly without the goroutine.
The caller then checks the error like any normal function and the API
can use this one to decide which status code to return.
Second we now return errors/event in one channel then the callers can
decide to ignore or log them which makes it a bit more clear.
Fixes c46884aa93 ("podman events: check for an error after we finish reading events")
Fixes#23712
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The special handling to return the exit code after the container has
been removed should only be done if there are no special conditions
requested. If a user asked for running or nay other state returning the
exit code immediately with a success response is just wrong. We only
want to allow that so the remote client can fetch the exit code without
races.
Fixes b3829a2932 ("libpod API: make wait endpoint better against rm races")
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The close is replaced in the body of the error condition.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
Prior to this commit, many scp functions existed without option structs, which would make extending functionality (adding new options) impossible without breaking changes, or without adding redundant wrapper functions.
This commit adds in new option types for various scp related functions, and changes those functions' signatures to use the new options.
This commit also modifies the `ImageEngine.Scp()` function's interface to use the new opts.
The commit also renames the existing `ImageScpOptions` entity type to `ScpTransferImageOptions`. This is because the previous `ImageScpOptions` was inaccurate, as it is not the actual options for `ImageEngine.Scp()`. `ImageEngine.Scp()` should instead receive `ImageScpOptions`.
This commit should not change any behavior, however it will break the existing functions' signatures.
Signed-off-by: Zachary Hanham <z.hanham00@gmail.com>
In the common scenario of podman-remote run --rm the API is required to
attach + start + wait to get exit code. This has the problem that the
wait call races against the container removal from the cleanup process
so it may not get the exit code back. However we keep the exit code
around for longer than the container so we can just look it up in the
endpoint. Of course this only works when we get a full id as param but
podman-remote will do that.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This commit was automatically cherry-picked
by buildah-vendor-treadmill v0.3
from the buildah vendor treadmill PR, #13808
* Fix conflict caused by Ed's local-registry PR in buildah
* Wire in "new" --retry and --retry-delay, these existed for longer
but where non functional.
Signed-off-by: Ed Santiago <santiago@redhat.com>
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
A field we missed versus Docker. Matches the format of our
existing Ports list in the NetworkConfig, but only includes
exposed ports (and maps these to struct{}, as they never go to
real ports on the host).
Fixes https://issues.redhat.com/browse/RHEL-60382
Signed-off-by: Matt Heon <mheon@redhat.com>
These flags can affect the output of the HealtCheck log. Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via `podman inspect`.
It is also limited to the last five executions and the first 500 characters per execution.
This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic `healthcheck service failed` record.
- The `--health-log-destination` flag sets the destination of the HealthCheck log.
- `none`: (default behavior) `HealthCheckResults` are stored in overlay containers. (For example: `$runroot/healthcheck.log`)
- `directory`: creates a log file named `<container-ID>-healthcheck.log` with JSON `HealthCheckResults` in the specified directory.
- `events_logger`: The log will be written with logging mechanism set by events_loggeri. It also saves the log to a default directory, for performance on a system with a large number of logs.
- The `--health-max-log-count` flag sets the maximum number of attempts in the HealthCheck log file.
- A value of `0` indicates an infinite number of attempts in the log file.
- The default value is `5` attempts in the log file.
- The `--health-max-log-size` flag sets the maximum length of the log stored.
- A value of `0` indicates an infinite log length.
- The default value is `500` log characters.
Add --health-max-log-count flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-max-log-size flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-log-destination flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
We were only splitting on tabs, not spaces, so we returned just a
single line most of the time, not an array of the fields in the
output of `ps`. Unfortunately, some of these fields are allowed
to contain spaces themselves, which makes things complicated, but
we got lucky in that Docker took the simplest possible solution
and just assumed that only one field would contain spaces and it
would always be the last one, which is easy enough to duplicate
on our end.
Fixes#23981
Signed-off-by: Matt Heon <mheon@redhat.com>
This is very similar to commit 3280da0500, we cannot check the state
then unlock to then lock again and do the action. Everything must
happen under one lock. To fix this move the code into the HTTPAttach
function in libpod. The locking here is a bit weird because attach
blocks for the lifetime of attach which can be very long so we must
unlock before performing the attach.
Fixes#23757
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This started off as an attempt to make `podman stop` on a
container started with `--rm` actually remove the container,
instead of just cleaning it up and waiting for the cleanup
process to finish the removal.
In the process, I realized that `podman run --rmi` was rather
broken. It was only done as part of the Podman CLI, not the
cleanup process (meaning it only worked with attached containers)
and the way it was wired meant that I was fairly confident that
it wouldn't work if I did a `podman stop` on an attached
container run with `--rmi`. I rewired it to use the same
mechanism that `podman run --rm` uses, so it should be a lot more
durable now, and I also wired it into `podman inspect` so you can
tell that a container will remove its image.
Tests have been added for the changes to `podman run --rmi`. No
tests for `stop` on a `run --rm` container as that would be racy.
Fixes#22852
Fixes RHEL-39513
Signed-off-by: Matt Heon <mheon@redhat.com>
The new golangci-lint version 1.60.1 has problems with typecheck when
linting remote files. We have certain pakcages that should never be
inlcuded in remote but the typecheck tries to compile all of them but
this never works and it seems to ignore the exclude files we gave it.
To fix this the proper way is to mark all packages we only use locally
with !remote tags. This is a bit ugly but more correct. I also moved the
DecodeChanges() code around as it is called from the client so the
handles package which should only be remote doesn't really fit anyway.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The value of the pointer might be changed while creating the container
causing unexpected side effects.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Users do not realize that the entire context directory is being copied
into the podman machine when doing a podman --remote build.
Adding information about the context directory might help them
understand this.
Improves: https://github.com/containers/podman/issues/23287
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
The current code did something like this:
lock()
getState()
unlock()
if state != running
lock()
getState() == running -> error
unlock()
This of course is wrong because between the first unlock() and second
lock() call another process could have modified the state. This meant
that sometimes you would get a weird error on start because the internal
setup errored as the container was already running.
In general any state check without holding the lock is incorrect and
will result in race conditions. As such refactor the code to combine
both StartAndAttach and Attach() into one function that can handle both.
With that we can move the running check into the locked code.
Also use typed error for this specific error case then the callers can
check and ignore the specific error when needed. This also allows us to
fix races in the compat API that did a similar racy state check.
This commit changes slightly how we output the result, previously a
start on already running container would never print the id/name of the
container which is confusing and sort of breaks idempotence. Now it will
include the output except when --all is used. Then it only reports the
ids that were actually started.
Fixes#23246
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Fixes compile issues with new docker changes, then fix all the new
depreciation warnings.
Also there seem to be larger pre-existing problems with the
/containers/json API output as the HostConfig field seems to be missing
but I don't have time to deal with that currently.
Note this does not include changes for the new docker API 1.46.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The docker API uses only a single arg for platform and multiple
platforms are given as comma separated list.
Fixes#22071
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Do not return 200 status code before we know if there will be an error.
Delay writing the status code until we send the first response. That way
we can set an error code inside the loop when we get a error on the
first try, i.e. because an invalid descriptor was used.
Fixes#22986
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When we failed to do anything we should return 500, the 409 code has a
special meaing to the client as it uses a different error format. As
such the remote client was not able to unmarshal the error correctly and
just returned an empty string.
Fixes#22989
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Add a `podman system check` that performs consistency checks on local
storage, optionally removing damaged items so that they can be
recreated.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
The function that's handing us events will return an error after closing
the channel over which it's sending events, and its caller (in its own
goroutine) will then send that error over another channel.
The logic that started the goroutine is likely to notice that the events
channel is closed before noticing that the error channel has a result
for it to read, so any error that would have been communicated would be
lost.
When we finish reading events, check if the reader returned an error
before telling our caller that there was no error.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
The v5 API made a breaking change for podman inspect, this means that
an old client could not longer parse the result from the new 5.X server.
The other way around new client and old server already worked.
As it turned out there were several users that run into this, one case
to hit this is using an old 4.X podman machine wich now pulls a newer
coreos with podman 5.0. But there are also other users running into it.
In order to keep the API working we now have a version check and return
the old v4 compatible payload so the old remote client can still work
against a newer server thus removing any major breaking change for an
old client.
Fixes#22657
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This is something Docker does, and we did not do until now. Most
difficult/annoying part was the REST API, where I did not really
want to modify the struct being sent, so I made the new restart
policy parameters query parameters instead.
Testing was also a bit annoying, because testing restart policy
always is.
Signed-off-by: Matt Heon <mheon@redhat.com>
The Docker endpoint here is kind of a nightmare - accepts a full
Resources block, including a large number of scary things like
devices. But it only documents (and seems to use) a small subset
of those. This implements support for that subset. We can always
extend things to implement more later if we have a need.
Signed-off-by: Matt Heon <mheon@redhat.com>
when listing images through the restful service, consumers want to know
if the image they are listing is a manifest or not because the libpod
endpoint returns both images and manifest lists.
in addition, we now add `arch` and `os` as fields in the libpod endpoint
for image listing as well.
Fixes: #22184Fixes: #22185
Signed-off-by: Brent Baude <bbaude@redhat.com>
Not sure why this only triggers now but this code was broken for a
while. It is racy as reported on the issue but because it changes the
actual map part of the network backend it means it can also alter the
behavior of the network which is very bad.
Fixes#22330
Signed-off-by: Paul Holzinger <pholzing@redhat.com>