This commit rewrites the cgroups v2 parsing logic in the get_cgroup function,
which is used to fetch the stats of a container. The reason for the rewrite
is that in some cases the original logic would panic with an index out
of bounds error when parsing paths like
0::/kubepods-besteffort-pod162385e5_7f69_4c38_ba9c_db0a8f02b35e.slice:cri-containerd:278a0aac1fff30dfbc41b4a32ba9de4519928fe7480213dba87aa1498838ef34
We ran into this issue when deleting a spin container in the spin shim.
The rewrite replaces index access so that the error is properly propagated to
the caller of the function, and adds a few unit tests for the parsing logic.
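
As an illustration only (this is not the code from the commit), parsing a
/proc/<pid>/cgroup line without index access could look like the sketch
below. The names `parse_cgroup_v2_line` and `ParseError` are hypothetical.
It assumes the `hierarchy-ID:controller-list:cgroup-path` layout, where a
cgroup v2 entry has ID `0` and an empty controller list, and it splits at
most three times because the path itself may contain colons.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// A required field was absent from the line.
    MissingField(&'static str),
    /// The line does not describe the cgroup v2 (unified) hierarchy.
    NotUnified(String),
}

/// Returns the cgroup path from one cgroup v2 line of /proc/<pid>/cgroup.
pub fn parse_cgroup_v2_line(line: &str) -> Result<&str, ParseError> {
    // Split into at most three fields so colons inside the path survive.
    let mut fields = line.trim_end().splitn(3, ':');
    let id = fields.next().ok_or(ParseError::MissingField("hierarchy id"))?;
    let controllers = fields
        .next()
        .ok_or(ParseError::MissingField("controller list"))?;
    let path = fields.next().ok_or(ParseError::MissingField("cgroup path"))?;

    // cgroup v2 entries look like "0::/path".
    if id != "0" || !controllers.is_empty() {
        return Err(ParseError::NotUnified(line.to_string()));
    }
    Ok(path)
}
```

Every missing field becomes an `Err` for the caller instead of a panic.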
Signed-off-by: jiaxiao zhou <jiazho@microsoft.com>
If /sys/kernel/mm/transparent_hugepage/enabled is set to always, the shim
process will use huge pages, which consume a lot of memory.
For example:
ps -efo pid,rss,comm | grep shim
PID RSS COMMAND
2614 7464 containerd-shim
I don't think shim needs to use huge pages, and if we turn off the huge
pages option, we can save a lot of memory resources.
After we set THP_DISABLE=true:
ps -efo pid,comm,rss
PID COMMAND RSS
1629841 containerd-shim 5648
containerd
|
|--shim1 --start
|
|--shim2 (this shim will run on the host)
|
|--runc create (when containerd sends a create request over ttrpc)
|
|--runc init (this is the pid 1 in container)
We should set thp_disabled=1 in shim1 --start. If we set it
in shim2, huge pages are already in use by the time func main() runs,
and setting thp_disabled cannot change pages that are already huge.
So we need to set thp_disabled=1 in shim1 so that shim2 inherits the
setting from its parent process shim1 and has huge pages disabled
from the moment it starts.
For runc processes, we need to set thp_disabled='before' in shim2 after
fork() and before execve(), so we use cmd.pre_exec to do this.
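
A minimal, Linux-only sketch of the pre_exec approach described above. It
is an assumption about how this could be wired up, not the code in this
commit. The helper names `set_thp_disabled`, `thp_disabled` and
`set_thp_before_exec` are hypothetical, and prctl is declared by hand only
to keep the example free of external crates (real code would more likely
use the `libc` crate).

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::CommandExt;
use std::process::Command;

// prctl(2) option numbers from <linux/prctl.h>.
const PR_SET_THP_DISABLE: c_int = 41;
const PR_GET_THP_DISABLE: c_int = 42;

extern "C" {
    // Hand-written declaration of the C library's variadic prctl().
    fn prctl(option: c_int, ...) -> c_int;
}

/// Sets or clears the "THP disable" flag of the calling process.
pub fn set_thp_disabled(disabled: bool) -> io::Result<()> {
    let zero: c_ulong = 0;
    // The remaining prctl arguments must be zero for this option.
    let rc = unsafe { prctl(PR_SET_THP_DISABLE, disabled as c_ulong, zero, zero, zero) };
    if rc == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}

/// Reads the current "THP disable" flag of the calling process.
pub fn thp_disabled() -> io::Result<bool> {
    let zero: c_ulong = 0;
    let rc = unsafe { prctl(PR_GET_THP_DISABLE, zero, zero, zero, zero) };
    if rc == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(rc == 1)
    }
}

/// Arranges for the flag to be set in the child after fork() and before
/// execve(). Which value to pass is the caller's decision.
pub fn set_thp_before_exec(cmd: &mut Command, disabled: bool) -> &mut Command {
    // The closure runs in the forked child, so it only makes a raw syscall.
    unsafe { cmd.pre_exec(move || set_thp_disabled(disabled)) }
}
```

Under these assumptions, shim2 would record the value from `thp_disabled()`
before anything changes it and pass that value to `set_thp_before_exec` for
each runc command.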
It's a very small change so I figured it's simpler to open a PR than an issue first.
The sync `state` method returns `Container` but the async one returns `Vec<usize>`, and I couldn't locate an explanation for why these might be different, so I assume it's a mistake. From a user perspective, too, I want `Container` rather than a usize vec.
Signed-off-by: Andrew Baxter <i@isandrew.com>
In cgroup v2 we should use the cgroup.procs file when adding a process (https://www.man7.org/linux/man-pages/man7/cgroups.7.html). The add_tasks function was writing to the cgroup.threads file, which is only available in threaded mode. In either case our intent is to add the process, not the individual threads, so we should use add_task_by_tgid. See https://github.com/kata-containers/cgroups-rs/pull/104 for when this was added.
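
For reference, adding a whole process to a cgroup v2 group comes down to
writing its PID to that group's cgroup.procs file. The sketch below uses
only the standard library rather than cgroups-rs, and the function name
`add_process` is hypothetical.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Moves the process `pid` (all of its threads) into the cgroup whose
/// directory is `cgroup_dir`, by writing the PID to `cgroup.procs`.
pub fn add_process(cgroup_dir: &Path, pid: u32) -> io::Result<()> {
    let mut procs = OpenOptions::new()
        .write(true)
        .open(cgroup_dir.join("cgroup.procs"))?;
    procs.write_all(pid.to_string().as_bytes())
}
```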
Signed-off-by: James Sturtevant <jstur@microsoft.com>
This commit adds cgroup v2 support for collecting metrics in the shim.
Additionally, it uses the CPU controller instead of the CPUAcct controller
for reporting CPU metrics back to containerd.
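
In cgroup v2 the CPU controller exposes its usage counters in the cpu.stat
file as `key value` lines. As a rough sketch, not the implementation in
this commit, the three usage counters could be read as follows. `CpuStat`
and `parse_cpu_stat` are hypothetical names.

```rust
/// CPU time counters from a cgroup v2 `cpu.stat` file, in microseconds.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct CpuStat {
    pub usage_usec: u64,
    pub user_usec: u64,
    pub system_usec: u64,
}

/// Parses the content of `cpu.stat`, ignoring keys this sketch does not use.
pub fn parse_cpu_stat(content: &str) -> Result<CpuStat, String> {
    let mut stat = CpuStat::default();
    for line in content.lines() {
        let mut parts = line.split_whitespace();
        let (key, value) = match (parts.next(), parts.next()) {
            (Some(k), Some(v)) => (k, v),
            _ => continue, // skip blank or malformed lines
        };
        let slot = match key {
            "usage_usec" => &mut stat.usage_usec,
            "user_usec" => &mut stat.user_usec,
            "system_usec" => &mut stat.system_usec,
            _ => continue, // e.g. nr_periods, nr_throttled
        };
        *slot = value
            .parse()
            .map_err(|e| format!("bad value for {}: {}", key, e))?;
    }
    Ok(stat)
}
```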
Signed-off-by: jiaxiao zhou <jiazho@microsoft.com>
The "read" side of the container stdout/stderr fifo is opened
by containerd, while the "write" side is opened by the
container process, which is a little different from the golang shim.
If containerd shuts down and closes the read fd, the container process
will receive EPIPE when writing to stdout/stderr and then be
killed by the SIGPIPE signal. In this commit, the "read" side is
also opened by the shim so that there is always at least one open
"read" side.
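
One possible way to hold such a "read" side open is sketched below. It is
an assumption about the mechanism, not the shim's actual code.
`open_keepalive_reader` is a hypothetical name, and the O_NONBLOCK value is
the one used on Linux x86_64 and aarch64. The non-blocking flag matters
because a plain read-only open of a fifo blocks until a writer appears.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

// O_NONBLOCK as defined on Linux x86_64 and aarch64 (assumption: other
// platforms may use a different value; the `libc` crate exports it).
const O_NONBLOCK: i32 = 0o4000;

/// Opens the read side of `fifo` without waiting for a writer. As long as
/// the returned `File` is alive, writers do not get EPIPE or SIGPIPE when
/// other readers go away.
pub fn open_keepalive_reader(fifo: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        .custom_flags(O_NONBLOCK)
        .open(fifo)
}
```

The caller has to keep the returned `File` for the lifetime of the
container, since dropping it closes the descriptor again.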
Signed-off-by: Tianyang Zhang <burning9699@gmail.com>
Because the second invocation of the shim doesn't have the containerd pipe passed to it, a shim that wants to communicate over the pipe needs to parse the arguments on its own. With this change the library, which has already parsed the arguments, passes all of them along, allowing shims to use the containerd address.
Signed-off-by: James Sturtevant <jstur@microsoft.com>
This PR follows up on @wedsonaf's #126 and passes the filters through the list input.
Note that this is not backward compatible, but it is a step closer to protocol
conformance.
Bump containerd-shim-protos from 0.2.0 to 0.3.0 to
include changes made in #95.
Because the update in #95 is not compatible we need a major
version update.
Signed-off-by: Tim Zhang <tim@hyper.sh>
We use File::from_raw_fd() to create the tty io files. When an io copy
thread ends, it drops its file object, which closes the fd. Because
we made three file objects from the same fd, the fd is closed
three times. If another opened file has taken this fd number in the
meantime, the second or third drop may close that other file's fd.
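
One way to avoid this kind of double close, shown here as a sketch and not
necessarily what this commit does, is to give every `File` its own
descriptor by duplicating the fd with `File::try_clone`. The function name
`split_tty` is hypothetical.

```rust
use std::fs::File;
use std::io;

/// Turns one owned tty `File` into three independent `File`s. Each one
/// owns a distinct descriptor, so dropping it closes only that descriptor.
pub fn split_tty(tty: File) -> io::Result<(File, File, File)> {
    // try_clone duplicates the underlying descriptor.
    let second = tty.try_clone()?;
    let third = tty.try_clone()?;
    Ok((tty, second, third))
}
```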
Signed-off-by: Zhang Tianyang <burning9699@gmail.com>
This allows us to use the `==` and `!=` operators to compare instances
of `Kind`, which is useful when we require that a snapshot be of some
specific kind (e.g., committed) before performing an operation on it.
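
A small sketch of the kind of check this enables. The variant list and the
`require_committed` helper are illustrative assumptions, not a copy of the
crate's definitions.

```rust
/// Illustrative stand-in for the snapshot `Kind` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Unknown,
    View,
    Active,
    Committed,
}

/// Hypothetical guard: rejects snapshots that are not committed.
pub fn require_committed(kind: Kind) -> Result<(), String> {
    // `!=` is available because `Kind` derives `PartialEq`.
    if kind != Kind::Committed {
        return Err(format!("expected a committed snapshot, got {:?}", kind));
    }
    Ok(())
}
```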
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
This is needed by remote snapshotters: once they report "already
exists", containerd tries to find the snapshot via `list`. If
it's not implemented, the "already exists" trick to prevent
layer download doesn't work.
This is still missing a filtering function, but allows remote
snapshotters to work.
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>