This is not needed and was added by during debugging but it turned out
to be something else. We should not lock the thread unless needed
because this just raises question why it is here otherwise.
Also the lock would not do much as we spawn a goroutine below anyway so
it runs on another thread no matter what.
From the review comment by Miloslav but it was merged before I had the
chance to fix it:
https://github.com/containers/podman/pull/24406#discussion_r1828102666
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
One of the problems with the Events() API was that you had to call it in
a new goroutine. This meant the the error returned by it had to be read
back via a second channel. This cuased other bugs in the past but here
the biggest problem is that basic errors such as invalid since/until
options were not directly returned to the caller.
It meant in the API we were not able to write http code 200 quickly
because we always waited for the first event or error from the
channels. This in turn made some clients not happy as they assume the
server hangs on time out if no such events are generated.
To fix this we resturcture the entire event flow. First we spawn the
goroutine inside the eventer Read() function so not all the callers have
to. Then we can return the basic error quickly without the goroutine.
The caller then checks the error like any normal function and the API
can use this one to decide which status code to return.
Second we now return errors/event in one channel then the callers can
decide to ignore or log them which makes it a bit more clear.
Fixes c46884aa93 ("podman events: check for an error after we finish reading events")
Fixes#23712
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
These flags can affect the output of the HealtCheck log. Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via `podman inspect`.
It is also limited to the last five executions and the first 500 characters per execution.
This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic `healthcheck service failed` record.
- The `--health-log-destination` flag sets the destination of the HealthCheck log.
- `none`: (default behavior) `HealthCheckResults` are stored in overlay containers. (For example: `$runroot/healthcheck.log`)
- `directory`: creates a log file named `<container-ID>-healthcheck.log` with JSON `HealthCheckResults` in the specified directory.
- `events_logger`: The log will be written with logging mechanism set by events_loggeri. It also saves the log to a default directory, for performance on a system with a large number of logs.
- The `--health-max-log-count` flag sets the maximum number of attempts in the HealthCheck log file.
- A value of `0` indicates an infinite number of attempts in the log file.
- The default value is `5` attempts in the log file.
- The `--health-max-log-size` flag sets the maximum length of the log stored.
- A value of `0` indicates an infinite log length.
- The default value is `500` log characters.
Add --health-max-log-count flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-max-log-size flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add --health-log-destination flag
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Commit 03f6589f3 added basic support for pull-error event from libimage
but it contains several problems:
1. storing the error as error type prevents it from being unmarshalled,
thus change it to a string
2. the error was never propagated from the libimage event to the podman
event struct
3. the error message was not wired into the cli and API
This commit fixes these problems.
Fixes#21458
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Moving from Go module v4 to v5 prepares us for public releases.
Move done using gomove [1] as with the v3 and v4 moves.
[1] https://github.com/KSubedi/gomove
Signed-off-by: Matt Heon <mheon@redhat.com>
Added additional check for event type to be remove and set the correct exitcode.
While it was getting difficult to maintain the omitempty notation for Event->ContainerExitCode, changing the type from int to int ptr gives us the ability to check for ContainerExitCode to be not nil and continue operations from there.
closes#19124
Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
This fixes a regression caused by commit 7e6e267329, unfortunately this
was not caught during review as for some reason this works fine rootless
and only fails as root.
Because we set the systemd log level to notice in order to hide the unit
started/stopped messages to prevent spamming the journal the issue is
that this now also causes systemd to ignore the events we write to
journald as we also send them as info level.
To fix this we simply send health_status events now on notice level. I
decided against sending all events on notice as I think info is fine for
them. Whenever the notice level is right is of course debatable but
given it may contain the unhealthy message I think having this a notice
should be ok.
The main reason this made it through testing is because we do not rely
on the systemd unit to fire healthchecks in the tests as this is flaky.
There is one test were we rely on it though and I added a check there
to make sure events are displayed correctly when trigger via systemd.
Fixes#20342
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
I noticed this while running some things in parallel, podman events
would show events from other users. Because all events are written to
the journal everybody can see them. So when we read the journal we must
filter events for only the current UID.
To reproduce run `podman events` as user then in another window create a
container as root for example. After this patch it will correctly ignore
these events from other users.
[NO NEW TESTS NEEDED] I don't think we can test with two users at the same
time.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When you use podman logs with --until and --follow it should exit after
the requested until time and not keep hanging forever.
To make this work I reworked the code to use the better journald event
reading code for logs as well. this correctly uses the sd_journal API
without having to compare the cursors to find the EOF.
The same problems exists for the k8s-file driver, I will fix this in the
next commit.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Instead of reading the full journal which can be expensive we can seek
based on the time.
If you have a journald with many podman events just compare the time
`time podman events --since 1s --stream=false` with and without this
patch.
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When labes map is too big we may get syslog entry truncated.
This breaks JSON parsing making event loading impossible.
[NO NEW TESTS NEEDED]
Signed-off-by: Matej Vasek <mvasek@redhat.com>
When the new `events_container_create_inspect_data` option is enabled in
containers.conf set the `ContainersInspectData` event field for each
container-create event.
The data was requested for the purpose of auditing (e.g., intrusion
detection).
Jira: https://issues.redhat.com/browse/RUN-1702
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
This allows tools like Cockpit to know that the pod in question
has also been updated, so they can refresh the list of containers
in the pod.
Fixes#15408
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Currently podman events will just fail with `Error: failed to get journal
cursor: failed to get cursor: cannot assign requested address` when the
journal contains zero podman events.
The problem is that we are using the journal accessors wrong. There is no
need to call GetCursor() and compare them manually. The Next() return an
integer which tells if it moved to the next or not. This means the we can
remove GetCursor() which would fail when there is no entry.
This also includes another bug fix. Previously the logic called Next()
twice for the first entry which caused us to miss the first entry.
To reproduce this issue you can run the following commands:
```
sudo journalctl --rotate
sudo journalctl --vacuum-time=1s
```
Note that this will delete the full journal.
Now run podman events and it fails but with this patch it works.
Now generate a single event, i.e. podman pull alpine, and run
podman events --until 1s.
I am not sure how to get a reliable test into CI, I really do not want
to delete the journal and developer or CI systems.
Fixes second part of #15688
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When running a single podman logs this is not really important since we
will exit when we finish reading the logs. However for the system
service this is very important. Leaking goroutines will cause an
increased memory and CPU ussage over time.
Both the the event and log backend have goroutine leaks with both the
file and journald drivers.
The journald backend has the problem that journal.Wait(IndefiniteWait)
will block until we get a new journald event. So when a client closes
the connection the goroutine would still wait until there is a new
journal entry. To fix this we just wait for a maximum of 5 seconds,
after that we can check if the client connection was closed and exit
correctly in this case.
For the file backend we can fix this by waiting for either the log line
or context cancel at the same time. Currently it would block waiting for
new log lines and only check afterwards if the client closed the
connection and thus hang forever if there are no new log lines.
[NO NEW TESTS NEEDED] I am open to ideas how we can test memory leaks in
CI.
To test manually run a container like this:
`podman run --log-driver $driver --name test -d alpine sh -c 'i=1; while [ "$i" -ne 1000 ]; do echo "line $i"; i=$((i + 1)); done; sleep inf'`
where `$driver` can be either `journald` or `k8s-file`.
Then start the podman system service and use:
`curl -m 1 --output - --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock -v 'http://d/containers/test/logs?follow=1&since=0&stderr=1&stdout=1' &>/dev/null`
to get the logs from the API and then it closes the connection after 1 second.
Now run the curl command several times and check the memory usage of the service.
Fixes#14879
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
We now use the golang error wrapping format specifier `%w` instead of
the deprecated github.com/pkg/errors package.
[NO NEW TESTS NEEDED]
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Previously, health status events were not being generated at all. Both
the API and `podman events` will generate health_status events.
```
{"status":"health_status","id":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","from":"localhost/healthcheck-demo:latest","Type":"container","Action":"health_status","Actor":{"ID":"ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63","Attributes":{"containerExitCode":"0","image":"localhost/healthcheck-demo:latest","io.buildah.version":"1.26.1","maintainer":"NGINX Docker Maintainers \u003cdocker-maint@nginx.com\u003e","name":"healthcheck-demo"}},"scope":"local","time":1656082205,"timeNano":1656082205882271276,"HealthStatus":"healthy"}
```
```
2022-06-24 11:06:04.886238493 -0400 EDT container health_status ae498ac3aa6c63db8b69a37583a6eae1a9cefbdbdbeeadcf8e1d66d745f0df63 (image=localhost/healthcheck-demo:latest, name=healthcheck-demo, health_status=healthy, io.buildah.version=1.26.1, maintainer=NGINX Docker Maintainers <docker-maint@nginx.com>)
```
Signed-off-by: Jake Correnti <jcorrenti13@gmail.com>
Lint the systemd code and fix the reported problems.
The remoteclient tag is no longer used so I just removed it.
[NO NEW TESTS NEEDED]
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Standardize on no-trunc through the code.
Alias notruncate where necessary.
Standardize on the man page display of no-trunc.
Fixes: https://github.com/containers/podman/issues/8941
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
we were adding a negative duration in podman events, causing inputs like
-5s to be correct and 5s to be incorrect.
fixes#11158
Signed-off-by: cdoern <cdoern@redhat.com>
While different filters are applied in conjunction, the same filter (but
with different values) should be applied in disjunction. This allows,
for instance, to query the events of two containers.
Fixes: #10507
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
We missed bumping the go module, so let's do it now :)
* Automated go code with github.com/sirkon/go-imports-rename
* Manually via `vgrep podman/v2` the rest
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
The podman events aren't read until the given timestamp if the
timestamp is in the future. It just reads all events until now
and exits afterwards.
This does not make sense and does not match docker. The correct
behavior is to read all events until the given time is reached.
This fixes a bug where the wrong event log file path was used
when running first time with a new storage location.
Fixes#8694
This also fixes the events api endpoint which only exited when
an error occurred. Otherwise it just hung after reading all events.
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
this enables the ability to connect and disconnect a container from a
given network. it is only for the compatibility layer. some code had to
be refactored to avoid circular imports.
additionally, tests are being deferred temporarily due to some
incompatibility/bug in either docker-py or our stack.
Signed-off-by: baude <bbaude@redhat.com>
Fix the AddMatch/SeekTail conflict. This prevents reading
unnecessary journal entries which could cause errors.
Also wrap the sdjournal errors to provide better error messages.
Fixes#8125
Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
adding the ability to filter evens by the container labels. this requires that container labels be added to the events data being recorded and subsequently read.
Signed-off-by: baude <bbaude@redhat.com>
Fix a potential panic in the events endpoint when parsing the filters
parameter. Values of the filters map might be empty, so we need to
account for that instead of uncondtitionally accessing the first item.
Also apply a similar for race conditions as done in commit f4a2d25c0fca:
Fix a race that could cause read errors to be masked. Masking
such errors is likely to report red herrings since users don't
see that reading failed for some reasons but that a given event
could not be found.
Another race was the handler closing event channel, which could lead to
two kinds of panics: double close, send to close channel. The backend
takes care of that. However, make sure that the backend stops working
in case the context has been cancelled.
Fixes: #6899
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
We weren't actually halting the goroutine that sent events, so it
would continue sending even when the channel closed (the most
notable cause being early hangup - e.g. Control-c on a curl
session). Use a context to cancel the events goroutine and stop
sending events.
Fixes#6805
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
Update the outdated systemd and dbus dependencies which are now provided
as go modules. This will further tighten our dependencies and releases
and pave the way for the upcoming auto-update feature.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
in the case where the host has a large journald, iterating the journal
without using a Match is very poor performance. this might be a
temporary fix while we figure out why the systemd library does not seem to
behave properly.
Signed-off-by: baude <bbaude@redhat.com>
JSON optimizes it out in that case anyways, so don't waste cycles
doing an Itoa (and Atoi on the decode side).
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
We weren't actually storing this, so we'd lose the exit code for
containers run with --rm or force-removed while running if the
journald backend for events was in use.
Fixes#3795
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
it looks like the core-os systemd library has some issue when using
seektail and add match. this patch works around that shortcoming for
the time being.
Fixes: #3616
Signed-off-by: baude <bbaude@redhat.com>
an internal change in libpod will soon required the ability to lookup
the last container event using the continer name or id and the type of
event. this pr is in preperation for that need.
Signed-off-by: baude <bbaude@redhat.com>
once the default event logger was removed from libpod.conf, we need to
set the default based on whether the systemd build tag is used or not.
Signed-off-by: baude <bbaude@redhat.com>
If the systemd development files are not present on the system which
builds podman, then `podman events` will error on runtime creation.
Beside this, a warning will be printed when compiling podman.
This commit mainly exists because projects which depend on libpod
would not need the podman event support and therefore do not need to
rely on the systemd headers.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>