Instrument cleanup tracer to log weird volume removal flake

Debug aid for #23913. Since we have no idea which process is nuking
the volume, we need to find out. As there is no reproducer, we can
(ab)use the cleanup tracer: simply trace all unlink syscalls to see
which process deletes our specially named volume. Because the volume
name is used as a path on the fs and that path is deleted on volume rm,
we should hopefully know exactly which process deleted it the next time
the flake happens.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Paul Holzinger 2024-10-30 18:45:39 +01:00
parent f139bc17b3
commit d633824a95
GPG Key ID: EB145DD938A3CAF2
2 changed files with 16 additions and 1 deletions


@@ -149,3 +149,17 @@ tracepoint:syscalls:sys_enter_write
$offset += $len
}
}
+// HACK: debug for https://github.com/containers/podman/issues/23913
+// The test uses the "ebpf-debug-23913" volume name, and because volume rm
+// will delete the path we can trap the process here to find out who actually
+// deletes it.
+tracepoint:syscalls:sys_enter_unlink*
+/ strcontains(str(args.pathname), "ebpf-debug-23913") /
+{
+    printf("Special issue 23913 volume deleted by pid %d: ", pid);
+    // This can fail to open the file, as it is done in user space and is
+    // thus racy if the process exits quickly.
+    cat("/proc/%d/cmdline", pid);
+    print("");
+}
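
The race mentioned in the comment above can be illustrated from user space. The following is only a sketch, not part of the commit: `show_cmdline` and its optional `proc_root` parameter are hypothetical names introduced here so the behaviour can be tried without a live process. It reads the NUL-separated `cmdline` file for a PID and reports when the file is no longer readable, which is what happens when the process has already exited.

```shell
# Hypothetical helper (not from the commit): print the command line of a PID
# roughly as a user-space reader of /proc would see it.
# $1 = pid, $2 = proc root (assumed parameter for illustration, default /proc)
show_cmdline() {
    pid=$1
    proc_root=${2:-/proc}
    file="$proc_root/$pid/cmdline"
    if [ ! -r "$file" ]; then
        # The process is already gone (or never existed): this is the race.
        echo "pid $pid: cmdline unavailable (process already gone?)"
        return 1
    fi
    # Arguments are NUL-separated; turn the separators into spaces.
    tr '\000' ' ' < "$file"
    echo
}
```

Calling `show_cmdline "$$"` on a Linux system should print the current shell's command line, while a PID that has already exited yields the "unavailable" message and a non-zero status.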


@@ -270,7 +270,8 @@ function _check_no_suggestions() {
random_image_name="i-$(safename)"
random_image_tag=$(random_string 5)
random_network_name="n-$(safename)"
-random_volume_name="v-$(safename)"
+# Do not change the suffix; it is a special debug marker for #23913
+random_volume_name="v-$(safename)-ebpf-debug-23913"
random_secret_name="s-$(safename)"
random_secret_content=$(random_string 30)
secret_file=$PODMAN_TMPDIR/$(random_string 10)
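
To see why the suffix must stay in sync with the tracer filter, here is a minimal sketch of the substring match the tracer relies on. It is not code from the commit: `fake_safename`, `path_has_marker`, and the example volume root are hypothetical, and the `<root>/<volume name>/_data` layout is an assumption used only for illustration.

```shell
# Marker string taken from the diff above; the tracer filters on it.
marker="ebpf-debug-23913"

# Hypothetical stand-in for the test suite's safename helper.
fake_safename() { echo "t123-abcdef"; }

# Same naming scheme as the changed test line.
volume_name="v-$(fake_safename)-$marker"

# Assumed on-disk layout; the root directory here is made up.
volume_path="/tmp/example-root/volumes/$volume_name/_data"

# Substring check, analogous to the strcontains() filter in the tracer.
path_has_marker() {
    case "$1" in
        *"$marker"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With these definitions, `path_has_marker "$volume_path"` succeeds, while a path for a volume without the suffix does not match, so changing the suffix in the test without updating the tracer would silently disable the debug output.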