doc: extend PodResources API for Dynamic Resource Allocation

Signed-off-by: Moshe Levi <moshele@nvidia.com>
This commit is contained in:
Moshe Levi 2023-03-14 01:58:36 +02:00
parent bd456cf518
commit eaf9199d07
3 changed files with 75 additions and 0 deletions

View File

@ -213,6 +213,7 @@ for these devices:
service PodResourcesLister {
rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
rpc GetAllocatableResources(AllocatableResourcesRequest) returns (AllocatableResourcesResponse) {}
rpc Get(GetPodResourcesRequest) returns (GetPodResourcesResponse) {}
}
```
@ -223,6 +224,14 @@ id of exclusively allocated CPUs, device id as it was reported by device plugins
the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the
information about memory and hugepages reserved for a container.
Starting from Kubernetes v1.27, the `List` enpoint can provide information on resources
of running pods allocated in `ResourceClaims` by the `DynamicResourceAllocation` API. To enable
this feature `kubelet` must be started with the following flags:
```
--feature-gates=DynamicResourceAllocation=true,KubeletPodResourcesDynamiceResources=true
```
```gRPC
// ListPodResourcesResponse is the response returned by List function
message ListPodResourcesResponse {
@ -242,6 +251,7 @@ message ContainerResources {
repeated ContainerDevices devices = 2;
repeated int64 cpu_ids = 3;
repeated ContainerMemory memory = 4;
repeated DynamicResource dynamic_resources = 5;
}
// ContainerMemory contains information about memory and hugepages assigned to a container
@ -267,6 +277,28 @@ message ContainerDevices {
repeated string device_ids = 2;
TopologyInfo topology = 3;
}
// DynamicResource contains information about the devices assigned to a container by Dynamic Resource Allocation
message DynamicResource {
string class_name = 1;
string claim_name = 2;
string claim_namespace = 3;
repeated ClaimResource claim_resources = 4;
}
// ClaimResource contains per-plugin resource information
message ClaimResource {
repeated CDIDevice cdi_devices = 1 [(gogoproto.customname) = "CDIDevices"];
}
// CDIDevice specifies a CDI device information
message CDIDevice {
// Fully qualified CDI device name
// for example: vendor.com/gpu=gpudevice1
// see more details in the CDI specification:
// https://github.com/container-orchestrated-devices/container-device-interface/blob/main/SPEC.md
string name = 1;
}
```
{{< note >}}
cpu_ids in the `ContainerResources` in the `List` endpoint correspond to exclusive CPUs allocated
@ -333,6 +365,36 @@ Support for the `PodResourcesLister service` requires `KubeletPodResources`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
It is enabled by default starting with Kubernetes 1.15 and is v1 since Kubernetes 1.20.
### `Get` gRPC endpoint {#grpc-endpoint-get}
{{< feature-state state="alpha" for_k8s_version="v1.27" >}}
The `Get` endpoint provides information on resources of a running Pod. It exposes information
similar to those described in the `List` endpoint. The `Get` endpoint requires `PodName`
and `PodNamespace` of the running Pod.
```gRPC
// GetPodResourcesRequest contains information about the pod
message GetPodResourcesRequest {
string pod_name = 1;
string pod_namespace = 2;
}
```
To enable this feature, you must start your kubelet services with the following flag:
```
--feature-gates=KubeletPodResourcesGet=true
```
The `Get` endpoint can provide Pod information related to dynamic resources
allocated by the dynamic resource allocation API. To enable this feature, you must
ensure your kubelet services are started with the following flags:
```
--feature-gates=KubeletPodResourcesGet=true,DynamicResourceAllocation=true,KubeletPodResourcesDynamiceResources=true
```
## Device plugin integration with the Topology Manager
{{< feature-state for_k8s_version="v1.18" state="beta" >}}

View File

@ -162,6 +162,12 @@ gets scheduled onto one node and then cannot run there, which is bad because
such a pending Pod also blocks all other resources like RAM or CPU that were
set aside for it.
## Monitoring resources
The kubelet provides a gRPC service to enable discovery of dynamic resources of
running Pods. For more information on the gRPC endpoints, see the
[resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources).
## Limitations
The scheduler plugin must be involved in scheduling Pods which use

View File

@ -125,8 +125,10 @@ For a reference to old feature gates that are removed, please refer to
| `KubeletInUserNamespace` | `false` | Alpha | 1.22 | |
| `KubeletPodResources` | `false` | Alpha | 1.13 | 1.14 |
| `KubeletPodResources` | `true` | Beta | 1.15 | |
| `KubeletPodResourcesGet` | `false` | Alpha | 1.27 | |
| `KubeletPodResourcesGetAllocatable` | `false` | Alpha | 1.21 | 1.22 |
| `KubeletPodResourcesGetAllocatable` | `true` | Beta | 1.23 | |
| `KubeletPodResourcesDynamicResources` | `false` | Alpha | 1.27 | |
| `KubeletTracing` | `false` | Alpha | 1.25 | |
| `LegacyServiceAccountTokenTracking` | `false` | Alpha | 1.25 | |
| `LocalStorageCapacityIsolationFSQuotaMonitoring` | `false` | Alpha | 1.15 | - |
@ -578,9 +580,14 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `KubeletPodResources`: Enable the kubelet's pod resources gRPC endpoint. See
[Support Device Monitoring](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/606-compute-device-assignment/README.md)
for more details.
- `KubeletPodResourcesGet`: Enable the `Get` gRPC endpoint on kubelet's for Pod resources.
This API augments the [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources).
- `KubeletPodResourcesGetAllocatable`: Enable the kubelet's pod resources
`GetAllocatableResources` functionality. This API augments the
[resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources)
- `KubeletPodResourcesDynamiceResources`: Extend the kubelet's pod resources gRPC endpoint to
to include resources allocated in `ResourceClaims` via `DynamicResourceAllocation` API.
See [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more details.
with informations about the allocatable resources, enabling clients to properly
track the free compute resources on a node.
- `KubeletTracing`: Add support for distributed tracing in the kubelet.