This commit rewrites the cgroups v2 parsing logic in the get_cgroup function,
which is used to fetch a container's stats. The rewrite was needed because
in some cases the original logic panicked with an index-out-of-bounds error
when parsing paths like
0::/kubepods-besteffort-pod162385e5_7f69_4c38_ba9c_db0a8f02b35e.slice:cri-containerd:278a0aac1fff30dfbc41b4a32ba9de4519928fe7480213dba87aa1498838ef34
We ran into this issue when deleting a Spin container in the Spin shim.
The rewrite replaces direct index access so that errors are properly propagated
to the caller of the function, and adds a few unit tests for the parsing logic.
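A minimal Rust sketch of the non-panicking approach, assuming the standard `hierarchy-ID:controller-list:cgroup-path` line format of /proc/<pid>/cgroup. `parse_cgroup_line` and `CgroupEntry` are hypothetical names, not the actual get_cgroup code. Splitting into at most three fields keeps any colons inside the path intact, and each missing or malformed field becomes an error instead of a panic.

```rust
/// One parsed entry from a /proc/<pid>/cgroup file (hypothetical type).
#[derive(Debug, PartialEq)]
struct CgroupEntry<'a> {
    hierarchy_id: u32,
    controllers: &'a str,
    path: &'a str,
}

/// Hypothetical helper: parse `hierarchy-ID:controller-list:cgroup-path`.
/// `splitn(3, ':')` stops after the second colon, so colons inside the path
/// (as in `...slice:cri-containerd:<id>`) remain part of the path.
fn parse_cgroup_line(line: &str) -> Result<CgroupEntry<'_>, String> {
    let mut parts = line.trim_end().splitn(3, ':');
    // Every missing field is reported as an error rather than indexed blindly.
    let id = parts
        .next()
        .ok_or_else(|| format!("missing hierarchy id in {line:?}"))?;
    let controllers = parts
        .next()
        .ok_or_else(|| format!("missing controller list in {line:?}"))?;
    let path = parts
        .next()
        .ok_or_else(|| format!("missing cgroup path in {line:?}"))?;
    let hierarchy_id = id
        .parse::<u32>()
        .map_err(|e| format!("bad hierarchy id {id:?}: {e}"))?;
    Ok(CgroupEntry { hierarchy_id, controllers, path })
}
```

For the path quoted above, this sketch yields hierarchy ID 0, an empty controller list, and the whole `/kubepods-...slice:cri-containerd:...` string as the path.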
Signed-off-by: jiaxiao zhou <jiazho@microsoft.com>
If /sys/kernel/mm/transparent_hugepage/enabled is set to always, the shim
process uses huge pages, which consumes a lot of memory.
For example:
ps -efo pid,rss,comm | grep shim
PID RSS COMMAND
2614 7464 containerd-shim
I don't think the shim needs huge pages, and turning off the huge page
option saves a lot of memory.
After we set THP_DISABLE=true:
ps -efo pid,comm,rss
PID COMMAND RSS
1629841 containerd-shim 5648
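To check whether a host is in the `always` mode described above, one can read the sysfs file and pick the bracketed entry, which is how the kernel marks the active mode. This is a small sketch; `active_thp_mode` and `read_thp_mode` are hypothetical helpers, not part of the shim.

```rust
/// Hypothetical helper: return the active mode from the contents of
/// /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always [madvise] never".
/// The kernel wraps the selected mode in square brackets.
fn active_thp_mode(contents: &str) -> Option<&str> {
    contents
        .split_whitespace()
        .find(|w| w.starts_with('[') && w.ends_with(']'))
        .map(|w| &w[1..w.len() - 1])
}

/// Read the live setting; returns None if the file is missing or unparsable
/// (for example on kernels built without THP support).
fn read_thp_mode() -> Option<String> {
    let s = std::fs::read_to_string("/sys/kernel/mm/transparent_hugepage/enabled").ok()?;
    active_thp_mode(&s).map(str::to_owned)
}
```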
containerd
|
|--shim1 --start
|
   |--shim2 (this shim will run on the host)
         |
         |--runc create (when containerd sends a create request over ttrpc)
|
|--runc init (this is the pid 1 in container)
We should set thp_disabled=1 in shim1 --start. If we set it in shim2
instead, huge pages are already in use by the time func main() runs, and
setting thp_disabled at that point cannot change them.
So we need to set thp_disabled=1 in shim1: shim2 inherits the settings of
its parent process shim1, so huge pages are already disabled when shim2
starts.
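A sketch of the shim1 side, assuming glibc's prctl(2) wrapper with PR_SET_THP_DISABLE (41) and PR_GET_THP_DISABLE (42) from <linux/prctl.h>. The helper names are hypothetical, not the shim's actual code. Per prctl(2), the flag is inherited by fork(2) children and preserved across execve(2), which is why setting it in shim1 also covers shim2.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

// Option values from <linux/prctl.h>.
const PR_SET_THP_DISABLE: c_int = 41;
const PR_GET_THP_DISABLE: c_int = 42;

extern "C" {
    // glibc's prctl(2) wrapper, declared variadic to match its C prototype.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Hypothetical helper: true if THP is currently disabled for this process.
fn get_thp_disable() -> io::Result<bool> {
    // SAFETY: plain syscall wrapper; unused arguments must be zero.
    let ret = unsafe { prctl(PR_GET_THP_DISABLE, 0 as c_ulong, 0 as c_ulong, 0 as c_ulong, 0 as c_ulong) };
    if ret < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret != 0)
    }
}

/// Hypothetical helper: set or clear the per-process "THP disable" flag.
/// Children spawned afterwards (e.g. shim2) inherit the new value.
fn set_thp_disable(disable: bool) -> io::Result<()> {
    // SAFETY: plain syscall wrapper; arg3..arg5 must be zero.
    let ret = unsafe { prctl(PR_SET_THP_DISABLE, disable as c_ulong, 0 as c_ulong, 0 as c_ulong, 0 as c_ulong) };
    if ret < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}
```

In this sketch shim1 would call get_thp_disable() first, hand the original value on to shim2 (how it is passed, e.g. an environment variable, is an assumption here), and then call set_thp_disable(true) before spawning shim2.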
For runc processes, shim2 needs to restore thp_disabled to the value it had
before the shim changed it, after fork() and before execve(). We use
cmd.pre_exec to do this.
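A sketch of the pre_exec step using std's `CommandExt::pre_exec`. `command_with_thp_restored` is a hypothetical helper, and the original flag value is assumed to have been saved before shim1 disabled THP. The closure runs in the child between fork() and execve(), so it is limited to a single syscall and does not allocate.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::CommandExt;
use std::process::Command;

// Option value from <linux/prctl.h>.
const PR_SET_THP_DISABLE: c_int = 41;

extern "C" {
    // glibc's prctl(2) wrapper, declared variadic to match its C prototype.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Hypothetical helper: build a command (e.g. runc) whose child process
/// restores the original "THP disable" flag just before execve().
fn command_with_thp_restored(program: &str, original_thp_disabled: bool) -> Command {
    let mut cmd = Command::new(program);
    let restore = move || -> io::Result<()> {
        // SAFETY: plain syscall wrapper; arg3..arg5 must be zero.
        let ret = unsafe {
            prctl(PR_SET_THP_DISABLE, original_thp_disabled as c_ulong, 0 as c_ulong, 0 as c_ulong, 0 as c_ulong)
        };
        if ret < 0 {
            Err(io::Error::last_os_error())
        } else {
            Ok(())
        }
    };
    // SAFETY: `restore` runs in the forked child before exec; it makes one
    // syscall and takes no locks, which is what pre_exec requires.
    unsafe {
        cmd.pre_exec(restore);
    }
    cmd
}
```

If the prctl call fails in the child, spawning the command returns that error to the caller instead of running the program with the wrong setting.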
It's a very small change so I figured it's simpler to open a PR than an issue first.
The sync `state` method returns `Container`, but the async one returns `Vec<usize>`. I couldn't find an explanation for why they would differ, so I assume it's a mistake. From a user's perspective, too, I want a `Container` rather than a `Vec<usize>`.
Signed-off-by: Andrew Baxter <i@isandrew.com>
In cgroup v2 we should use the cgroup.procs file when adding a process (https://www.man7.org/linux/man-pages/man7/cgroups.7.html). The add_tasks function was writing to the cgroup.threads file, which is only available in threaded mode. In either case our intent is to add the process, not its individual threads, so we should use add_task_by_tgid. See https://github.com/kata-containers/cgroups-rs/pull/104 for when this was added.
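A minimal sketch of the distinction using plain file I/O, not the cgroups-rs API: `add_process` is a hypothetical helper that writes one PID (thread-group ID) to cgroup.procs, which moves all threads of that process, whereas cgroup.threads takes individual thread IDs.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: move a whole process into the cgroup v2 group at
/// `cgroup_dir` by writing its PID to `cgroup.procs`.
fn add_process(cgroup_dir: &Path, pid: u32) -> io::Result<()> {
    // Open without `create`: in a real cgroup the kernel provides this file,
    // so a missing file means `cgroup_dir` is not a cgroup directory.
    let mut f = OpenOptions::new()
        .write(true)
        .open(cgroup_dir.join("cgroup.procs"))?;
    // Write a single PID; the kernel migrates the whole thread group.
    f.write_all(format!("{pid}\n").as_bytes())
}
```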
Signed-off-by: James Sturtevant <jstur@microsoft.com>