---
reviewers:
- derekwaynecarr
title: Process ID Limits And Reservations
content_type: concept
weight: 40
---

<!-- overview -->

{{< feature-state for_k8s_version="v1.20" state="stable" >}}

Kubernetes allows you to limit the number of process IDs (PIDs) that a
{{< glossary_tooltip term_id="Pod" text="Pod" >}} can use.
You can also reserve a number of allocatable PIDs for each
{{< glossary_tooltip term_id="node" text="node" >}} for use by the operating system
and daemons (rather than by Pods).

<!-- body -->

Process IDs (PIDs) are a fundamental resource on nodes. It is easy to hit the task
limit without hitting any other resource limit, which can then make the host machine
unstable.

Cluster administrators need mechanisms to ensure that Pods running in the cluster
cannot exhaust PIDs and thereby prevent host daemons (such as the
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} or
{{< glossary_tooltip text="kube-proxy" term_id="kube-proxy" >}},
and potentially also the container runtime) from running.
It is also important to limit the PIDs available to each Pod, so that a Pod has
limited impact on other workloads on the same node.

{{< note >}}
On certain Linux installations, the operating system sets the PID limit to a low
default, such as `32768`. Consider raising the value of `/proc/sys/kernel/pid_max`.
{{< /note >}}

You can configure a kubelet to limit the number of PIDs a given Pod can consume.
For example, if your node's host OS is set to use a maximum of `262144` PIDs and you
expect to host fewer than `250` Pods, you can give each Pod a budget of `1000` PIDs
to prevent the Pods from using up that node's overall number of available PIDs.
An administrator who wants to overcommit PIDs, similar to CPU or memory, can do so
as well, at some additional risk. Either way, a single Pod will not be able to bring
the whole machine down. This kind of resource limiting helps to prevent simple fork
bombs from affecting the operation of an entire cluster.

Per-Pod PID limiting allows administrators to protect one Pod from another, but it
does not ensure that the Pods scheduled onto a host, taken together, cannot impact
the node overall. Per-Pod limiting also does not protect the node agents themselves
from PID exhaustion.

You can also reserve a number of PIDs for node overhead, separate from the
allocation to Pods. This is similar to how you can reserve CPU, memory, or other
resources for use by the operating system and other facilities outside of Pods
and their containers.

PID limiting is an important sibling to
[compute resource](/docs/concepts/configuration/manage-resources-containers/)
requests and limits. However, you specify it in a different way: rather than defining
a Pod's resource limit in the `.spec` for a Pod, you configure the limit as a setting
on the kubelet. Pod-defined PID limits are not currently supported.

{{< caution >}}
This means that the limit that applies to a Pod may differ depending on where the
Pod is scheduled. To keep things simple, it is easiest if all Nodes use the same PID
resource limits and reservations.
{{< /caution >}}

## Node PID limits {#node-pid-limits}

Kubernetes allows you to reserve a number of process IDs for system use. To configure
the reservation, use the parameter `pid=<number>` in the `--system-reserved` and
`--kube-reserved` command line options to the kubelet. The values you specify declare
how many process IDs are reserved for the system as a whole and for Kubernetes system
daemons, respectively.

{{< note >}}
Before Kubernetes version 1.20, PID resource limiting with Node-level reservations
required enabling the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`SupportNodePidsLimit`.
{{< /note >}}

## Pod PID limits {#pod-pid-limits}

Kubernetes allows you to limit the number of processes running in a Pod. You specify
this limit at the node level, rather than configuring it as a resource limit for a
particular Pod. Each Node can have a different PID limit.
To configure the limit, you can pass the command line parameter `--pod-max-pids`
to the kubelet, or set `PodPidsLimit` in the kubelet
[configuration file](/docs/tasks/administer-cluster/kubelet-config-file/).

{{< note >}}
Before Kubernetes version 1.20, PID resource limiting for Pods required enabling the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`SupportPodPidsLimit`.
{{< /note >}}

## PID based eviction {#pid-based-eviction}

You can configure the kubelet to start terminating a Pod when it is misbehaving and
consuming an abnormal amount of resources. This feature is called eviction. You can
[Configure Out of Resource Handling](/docs/concepts/scheduling-eviction/node-pressure-eviction/)
for various eviction signals.
Use the `pid.available` eviction signal to configure the threshold for the number of
PIDs used by a Pod. You can set soft and hard eviction policies.
However, even with a hard eviction policy, if the number of PIDs grows very fast,
the node can still become unstable by hitting the node PID limit.
The eviction signal value is calculated periodically and does NOT enforce the limit.

PID limits, both per Pod and per Node, are hard limits.
Once a limit is hit, the workload starts to experience failures when it tries to get
a new PID. This may or may not lead to rescheduling of the Pod, depending on how the
workload reacts to these failures and how liveness and readiness probes are
configured for the Pod. However, if the limits are set correctly, you can guarantee
that other Pods' workloads and system processes will not run out of PIDs when one Pod
is misbehaving.

## {{% heading "whatsnext" %}}

- Refer to the [PID Limiting enhancement document](https://github.com/kubernetes/enhancements/blob/097b4d8276bc9564e56adf72505d43ce9bc5e9e8/keps/sig-node/20190129-pid-limiting.md) for more information.
- For historical context, read
  [Process ID Limiting for Stability Improvements in Kubernetes 1.14](/blog/2019/04/15/process-id-limiting-for-stability-improvements-in-kubernetes-1.14/).
- Read [Managing Resources for Containers](/docs/concepts/configuration/manage-resources-containers/).
- Learn how to [Configure Out of Resource Handling](/docs/concepts/scheduling-eviction/node-pressure-eviction/).
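
As a worked check of the per-Pod PID budget example given earlier on this page, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the figures `262144`, `250`, and `1000` come from that example, while the function name `pid_headroom` and its `reserved` parameter are hypothetical and are not part of any Kubernetes API or tool.

```python
def pid_headroom(pid_max: int, max_pods: int, pod_pid_limit: int, reserved: int = 0) -> int:
    """Return the PIDs left on a node if every Pod uses its full budget.

    pid_max       -- node-wide PID ceiling (for example, kernel.pid_max)
    max_pods      -- number of Pods expected on the node
    pod_pid_limit -- per-Pod PID budget configured on the kubelet
    reserved      -- PIDs set aside for the system and Kubernetes daemons
                     (hypothetical parameter, for illustration only)

    A negative result means PIDs are overcommitted on this node.
    """
    return pid_max - reserved - max_pods * pod_pid_limit


# Figures from the example on this page: 250 Pods with 1000 PIDs each
# on a node that allows 262144 PIDs.
print(pid_headroom(pid_max=262144, max_pods=250, pod_pid_limit=1000))  # 12144
```

A positive result means the per-Pod budgets fit under the node-wide ceiling. A negative result corresponds to the overcommit case mentioned on this page, which carries additional risk.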