---
title: 动态资源分配
content_type: concept
weight: 65
---
<!--
reviewers:
- klueska
- pohly
title: Dynamic Resource Allocation
content_type: concept
weight: 65
-->

<!-- overview -->

{{< feature-state for_k8s_version="v1.27" state="alpha" >}}

<!--
Dynamic resource allocation is an API for requesting and sharing resources
between pods and containers inside a pod. It is a generalization of the
persistent volumes API for generic resources. Third-party resource drivers are
responsible for tracking and allocating resources. Different kinds of
resources support arbitrary parameters for defining requirements and
initialization.
-->
动态资源分配是一个用于在 Pod 之间和 Pod 内部容器之间请求和共享资源的 API。
它是对为通用资源所提供的持久卷 API 的泛化。第三方资源驱动程序负责跟踪和分配资源。
不同类型的资源支持用任意参数来定义需求和初始化。

## {{% heading "prerequisites" %}}

<!--
Kubernetes v{{< skew currentVersion >}} includes cluster-level API support for
dynamic resource allocation, but it [needs to be
enabled](#enabling-dynamic-resource-allocation) explicitly. You also must
install a resource driver for specific resources that are meant to be managed
using this API. If you are not running Kubernetes v{{< skew currentVersion>}},
check the documentation for that version of Kubernetes.
-->
Kubernetes v{{< skew currentVersion >}} 包含用于动态资源分配的集群级 API 支持,
但它需要被[显式启用](#enabling-dynamic-resource-allocation)。
你还必须为此 API 要管理的特定资源安装资源驱动程序。
如果你未运行 Kubernetes v{{< skew currentVersion>}},
请查看对应版本的 Kubernetes 文档。

<!-- body -->

## API

<!--
The `resource.k8s.io/v1alpha2` {{< glossary_tooltip text="API group"
term_id="api-group" >}} provides four types:
-->
`resource.k8s.io/v1alpha2`
{{< glossary_tooltip text="API 组" term_id="api-group" >}}提供四种类型:

<!--
ResourceClass
: Defines which resource driver handles a certain kind of
  resource and provides common parameters for it. ResourceClasses
  are created by a cluster administrator when installing a resource
  driver.

ResourceClaim
: Defines a particular resource instance that is required by a
  workload. Created by a user (lifecycle managed manually, can be shared
  between different Pods) or for individual Pods by the control plane based on
  a ResourceClaimTemplate (automatic lifecycle, typically used by just one
  Pod).

ResourceClaimTemplate
: Defines the spec and some metadata for creating
  ResourceClaims. Created by a user when deploying a workload.

PodSchedulingContext
: Used internally by the control plane and resource drivers
  to coordinate pod scheduling when ResourceClaims need to be allocated
  for a Pod.
-->
ResourceClass
: 定义由哪个资源驱动程序处理某种资源,并为其提供通用参数。
  集群管理员在安装资源驱动程序时创建 ResourceClass。

ResourceClaim
: 定义工作负载所需的特定资源实例。
  由用户创建(手动管理生命周期,可以在不同的 Pod 之间共享),
  或者由控制平面基于 ResourceClaimTemplate 为特定 Pod 创建
  (自动管理生命周期,通常仅由一个 Pod 使用)。

ResourceClaimTemplate
: 定义用于创建 ResourceClaim 的 spec 和一些元数据。
  部署工作负载时由用户创建。

PodSchedulingContext
: 供控制平面和资源驱动程序内部使用,
  在需要为 Pod 分配 ResourceClaim 时协调 Pod 调度。

<!--
Parameters for ResourceClass and ResourceClaim are stored in separate objects,
typically using the type defined by a {{< glossary_tooltip
term_id="CustomResourceDefinition" text="CRD" >}} that was created when
installing a resource driver.
-->
ResourceClass 和 ResourceClaim 的参数存储在单独的对象中,通常使用安装资源驱动程序时创建的
{{< glossary_tooltip term_id="CustomResourceDefinition" text="CRD" >}} 所定义的类型。

<!--
The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a
`resourceClaims` field. Entries in that list reference either a ResourceClaim
or a ResourceClaimTemplate. When referencing a ResourceClaim, all Pods using
this PodSpec (for example, inside a Deployment or StatefulSet) share the same
ResourceClaim instance. When referencing a ResourceClaimTemplate, each Pod gets
its own instance.
-->
`core/v1` 的 `PodSpec` 在 `resourceClaims` 字段中定义 Pod 所需的 ResourceClaim。
该列表中的条目引用 ResourceClaim 或 ResourceClaimTemplate。
当引用 ResourceClaim 时,使用此 PodSpec 的所有 Pod
(例如 Deployment 或 StatefulSet 中的 Pod)共享相同的 ResourceClaim 实例。
引用 ResourceClaimTemplate 时,每个 Pod 都有自己的实例。
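
<!--
As a minimal sketch (not a normative example), a Pod could reference an
already existing ResourceClaim by name through the `source.resourceClaimName`
field; the claim name `shared-cat-claim` below is made up for illustration:
-->
下面是一个最小示意(并非规范示例),展示 Pod 如何通过 `source.resourceClaimName`
字段按名称引用一个已存在的 ResourceClaim;其中的 Claim 名称 `shared-cat-claim`
只是为演示而虚构的:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-shared-claim
spec:
  containers:
  - name: app
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: shared-cat        # 引用下面 resourceClaims 列表中的条目
  resourceClaims:
  - name: shared-cat
    source:
      # 直接引用集群中已存在的 ResourceClaim;
      # 使用此 PodSpec 的所有 Pod 共享同一个 ResourceClaim 实例
      resourceClaimName: shared-cat-claim
```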

<!--
The `resources.claims` list for container resources defines whether a container gets
access to these resource instances, which makes it possible to share resources
between one or more containers.

Here is an example for a fictional resource driver. Two ResourceClaim objects
will get created for this Pod and each container gets access to one of them.
-->
针对容器资源的 `resources.claims` 列表定义容器是否可以访问这些资源实例,
从而可以实现在一个或多个容器之间共享资源。

下面是一个虚构的资源驱动程序的示例。
该示例将为此 Pod 创建两个 ResourceClaim 对象,每个容器都可以访问其中一个。

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: resource.example.com
driverName: resource-driver.example.com
---
apiVersion: cats.resource.example.com/v1
kind: ClaimParameters
metadata:
  name: large-black-cat-claim-parameters
spec:
  color: black
  size: large
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: large-black-cat-claim-template
spec:
  spec:
    resourceClassName: resource.example.com
    parametersRef:
      apiGroup: cats.resource.example.com
      kind: ClaimParameters
      name: large-black-cat-claim-parameters
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cats
spec:
  containers:
  - name: container0
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-0
  - name: container1
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-1
  resourceClaims:
  - name: cat-0
    source:
      resourceClaimTemplateName: large-black-cat-claim-template
  - name: cat-1
    source:
      resourceClaimTemplateName: large-black-cat-claim-template
```

<!--
## Scheduling
-->
## 调度 {#scheduling}

<!--
In contrast to native resources (CPU, RAM) and extended resources (managed by a
device plugin, advertised by kubelet), the scheduler has no knowledge of what
dynamic resources are available in a cluster or how they could be split up to
satisfy the requirements of a specific ResourceClaim. Resource drivers are
responsible for that. They mark ResourceClaims as "allocated" once resources
for it are reserved. This also then tells the scheduler where in the cluster a
ResourceClaim is available.
-->
与原生资源(CPU、RAM)和扩展资源(由设备插件管理,并由 kubelet 公布)不同,
调度器不知道集群中有哪些动态资源,
也不知道如何将它们拆分以满足特定 ResourceClaim 的要求。
资源驱动程序负责这些任务。
资源驱动程序在为 ResourceClaim 保留资源后将其标记为“已分配(Allocated)”。
这也会告诉调度器该 ResourceClaim 在集群中的哪些位置可用。

<!--
ResourceClaims can get allocated as soon as they are created ("immediate
allocation"), without considering which Pods will use them. The default is to
delay allocation until a Pod gets scheduled which needs the ResourceClaim
(i.e. "wait for first consumer").
-->
ResourceClaim 可以在创建时就进行分配(“立即分配”),不用考虑哪些 Pod 将使用它。
默认情况下采用延迟分配,直到需要 ResourceClaim 的 Pod 被调度时
(即“等待第一个消费者”)再进行分配。
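
<!--
As a rough sketch (reusing the fictional driver from the example above, not a
normative example), a standalone ResourceClaim that opts into immediate
allocation could set the `allocationMode` field in its spec:
-->
作为一个粗略的示意(沿用上文虚构的驱动程序,并非规范示例),
一个选择立即分配的独立 ResourceClaim 可以在其 spec 中设置 `allocationMode` 字段:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: large-black-cat-claim
spec:
  resourceClassName: resource.example.com
  # 省略 allocationMode 时,默认为 WaitForFirstConsumer(等待第一个消费者)
  allocationMode: Immediate
  parametersRef:
    apiGroup: cats.resource.example.com
    kind: ClaimParameters
    name: large-black-cat-claim-parameters
```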

<!--
In that mode, the scheduler checks all ResourceClaims needed by a Pod and
creates a PodSchedulingContext object where it informs the resource drivers
responsible for those ResourceClaims about nodes that the scheduler considers
suitable for the Pod. The resource drivers respond by excluding nodes that
don't have enough of the driver's resources left. Once the scheduler has that
information, it selects one node and stores that choice in the
PodSchedulingContext object. The resource drivers then allocate their
ResourceClaims so that the resources will be available on that node. Once that
is complete, the Pod gets scheduled.
-->
在这种模式下,调度器检查 Pod 所需的所有 ResourceClaim,并创建一个 PodSchedulingContext 对象,
通知负责这些 ResourceClaim 的资源驱动程序,告知它们调度器认为适合该 Pod 的节点。
资源驱动程序通过排除没有足够剩余资源的节点来响应调度器。
一旦调度器有了这些信息,它就会选择一个节点,并将该选择存储在 PodSchedulingContext 对象中。
然后,资源驱动程序分配其 ResourceClaim,以便资源在该节点上可用。
完成后,Pod 就会被调度。
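
<!--
To illustrate this coordination, here is a rough sketch (not a normative
example) of what such a PodSchedulingContext object could look like; the node
names and the exact set of fields shown are assumptions for illustration only:
-->
为了说明这种协调机制,下面给出一个粗略的示意(并非规范示例),
展示这样一个 PodSchedulingContext 对象大致的样子;其中的节点名称以及所展示的具体字段均为演示用的假设:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: PodSchedulingContext
metadata:
  name: pod-with-cats       # 与 Pod 同名,由调度器创建
spec:
  potentialNodes:           # 调度器认为适合该 Pod 的候选节点
  - node-1
  - node-2
  selectedNode: node-1      # 调度器最终选择的节点
status:
  resourceClaims:
  - name: cat-0
    unsuitableNodes:        # 资源驱动程序排除的节点
    - node-2
```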

<!--
As part of this process, ResourceClaims also get reserved for the
Pod. Currently ResourceClaims can either be used exclusively by a single Pod or
an unlimited number of Pods.
-->
作为此过程的一部分,ResourceClaim 会为 Pod 保留。
目前,ResourceClaim 可以由单个 Pod 独占使用或不限数量的多个 Pod 使用。

<!--
One key feature is that Pods do not get scheduled to a node unless all of
their resources are allocated and reserved. This avoids the scenario where a Pod
gets scheduled onto one node and then cannot run there, which is bad because
such a pending Pod also blocks all other resources like RAM or CPU that were
set aside for it.
-->
一个关键特性是:除非 Pod 的所有资源都已完成分配和保留,否则 Pod 不会被调度到节点上。
这避免了 Pod 被调度到一个节点但无法在那里运行的情况,
这种情况很糟糕,因为被挂起的 Pod 也会阻塞为其保留的其他资源,如 RAM 或 CPU。

{{< note >}}
<!--
Scheduling of pods which use ResourceClaims is going to be slower because of
the additional communication that is required. Beware that this may also impact
pods that don't use ResourceClaims because only one pod at a time gets
scheduled, blocking API calls are made while handling a pod with
ResourceClaims, and thus scheduling the next pod gets delayed.
-->
由于需要额外的通信,使用 ResourceClaim 的 Pod 的调度将会变慢。
请注意,这也可能会影响不使用 ResourceClaim 的 Pod,因为一次仅调度一个 Pod,
在处理带有 ResourceClaim 的 Pod 时会执行阻塞式 API 调用,
从而推迟下一个 Pod 的调度。
{{< /note >}}

<!--
## Monitoring resources
-->
## 监控资源 {#monitoring-resources}

<!--
The kubelet provides a gRPC service to enable discovery of dynamic resources of
running Pods. For more information on the gRPC endpoints, see the
[resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources).
-->
kubelet 提供了一个 gRPC 服务,以便发现正在运行的 Pod 的动态资源。
有关 gRPC 端点的更多信息,请参阅[资源分配报告](/zh-cn/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources)。

<!--
## Pre-scheduled Pods

When you - or another API client - create a Pod with `spec.nodeName` already set, the scheduler gets bypassed.
If some ResourceClaim needed by that Pod does not exist yet, is not allocated
or not reserved for the Pod, then the kubelet will fail to run the Pod and
re-check periodically because those requirements might still get fulfilled
later.
-->
## 预调度的 Pod {#pre-scheduled-pods}

当你(或别的 API 客户端)创建设置了 `spec.nodeName` 的 Pod 时,调度器将被绕过。
如果 Pod 所需的某个 ResourceClaim 尚不存在、未被分配或未为该 Pod 保留,那么 kubelet
将无法运行该 Pod,并会定期重新检查,因为这些要求可能在以后得到满足。

<!--
Such a situation can also arise when support for dynamic resource allocation
was not enabled in the scheduler at the time when the Pod got scheduled
(version skew, configuration, feature gate, etc.). kube-controller-manager
detects this and tries to make the Pod runnable by triggering allocation and/or
reserving the required ResourceClaims.
-->
这种情况也可能发生在 Pod 被调度时调度器中未启用动态资源分配支持的时候(原因可能是版本偏差、配置、特性门控等)。
kube-controller-manager 能够检测到这一点,并尝试通过触发分配和/或预留所需的 ResourceClaim 来使 Pod 可运行。

<!--
However, it is better to avoid this because a Pod that is assigned to a node
blocks normal resources (RAM, CPU) that then cannot be used for other Pods
while the Pod is stuck. To make a Pod run on a specific node while still going
through the normal scheduling flow, create the Pod with a node selector that
exactly matches the desired node:
-->
然而,最好避免这种情况,因为分配给节点的 Pod 会锁住一些正常的资源(RAM、CPU),
而这些资源在 Pod 被卡住时无法用于其他 Pod。为了让一个 Pod 在特定节点上运行,
同时仍然通过正常的调度流程进行,请在创建 Pod 时使用与期望的节点精确匹配的节点选择算符:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cats
spec:
  nodeSelector:
    kubernetes.io/hostname: name-of-the-intended-node
  ...
```

<!--
You may also be able to mutate the incoming Pod, at admission time, to unset
the `.spec.nodeName` field and to use a node selector instead.
-->
你还可以在准入时变更传入的 Pod,取消设置 `.spec.nodeName` 字段,并改为使用节点选择算符。

<!--
## Enabling dynamic resource allocation
-->
## 启用动态资源分配 {#enabling-dynamic-resource-allocation}

<!--
Dynamic resource allocation is an *alpha feature* and only enabled when the
`DynamicResourceAllocation` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) and the
`resource.k8s.io/v1alpha2` {{< glossary_tooltip text="API group"
term_id="api-group" >}} are enabled. For details on that, see the
`--feature-gates` and `--runtime-config` [kube-apiserver
parameters](/docs/reference/command-line-tools-reference/kube-apiserver/).
kube-scheduler, kube-controller-manager and kubelet also need the feature gate.
-->
动态资源分配是一个 **Alpha 特性**,只有在启用 `DynamicResourceAllocation`
[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)
和 `resource.k8s.io/v1alpha2`
{{< glossary_tooltip text="API 组" term_id="api-group" >}} 时才启用。
有关详细信息,参阅 `--feature-gates` 和 `--runtime-config`
[kube-apiserver 参数](/zh-cn/docs/reference/command-line-tools-reference/kube-apiserver/)。
kube-scheduler、kube-controller-manager 和 kubelet 也需要设置该特性门控。
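
<!--
As a sketch of what this could look like (assuming you manage these component
flags directly; adjust this to however your cluster components are actually
configured), the relevant kube-apiserver flags would be:
-->
下面示意这大致是什么样子(假设你直接管理这些组件的命令行参数;
请根据你的集群组件的实际配置方式进行调整),kube-apiserver 的相关参数为:

```shell
kube-apiserver \
  --feature-gates=DynamicResourceAllocation=true \
  --runtime-config=resource.k8s.io/v1alpha2=true \
  ...
# kube-scheduler、kube-controller-manager 和 kubelet 同样需要:
#   --feature-gates=DynamicResourceAllocation=true
```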

<!--
A quick check whether a Kubernetes cluster supports the feature is to list
ResourceClass objects with:
-->
快速检查 Kubernetes 集群是否支持该功能的方法是列出 ResourceClass 对象:

```shell
kubectl get resourceclasses
```

<!--
If your cluster supports dynamic resource allocation, the response is either a
list of ResourceClass objects or:
-->
如果你的集群支持动态资源分配,则响应是 ResourceClass 对象列表或:

```
No resources found
```

<!--
If not supported, this error is printed instead:
-->
如果不支持,则会输出如下错误:

```
error: the server doesn't have a resource type "resourceclasses"
```

<!--
The default configuration of kube-scheduler enables the "DynamicResources"
plugin if and only if the feature gate is enabled and when using
the v1 configuration API. Custom configurations may have to be modified to
include it.
-->
kube-scheduler 的默认配置仅在启用特性门控且使用 v1 配置 API 时才启用 "DynamicResources" 插件。
自定义配置可能需要被修改才能启用它。
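
<!--
As a rough sketch (not a normative example), a custom scheduler configuration
that explicitly enables this plugin could look like this:
-->
作为一个粗略的示意(并非规范示例),显式启用该插件的自定义调度器配置大致如下:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: DynamicResources
```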

<!--
In addition to enabling the feature in the cluster, a resource driver also has to
be installed. Please refer to the driver's documentation for details.
-->
除了在集群中启用该功能外,还必须安装资源驱动程序。
欲了解详细信息,请参阅驱动程序的文档。

## {{% heading "whatsnext" %}}

<!--
- For more information on the design, see the
  [Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md).
-->
- 了解有关该设计的更多信息,
  请参阅[动态资源分配 KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md)。