doc: extend PodResources API for Dynamic Resource Allocation
Signed-off-by: Moshe Levi <moshele@nvidia.com>
parent bd456cf518
commit eaf9199d07
@@ -213,6 +213,7 @@ for these devices:
service PodResourcesLister {
    rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
    rpc GetAllocatableResources(AllocatableResourcesRequest) returns (AllocatableResourcesResponse) {}
    rpc Get(GetPodResourcesRequest) returns (GetPodResourcesResponse) {}
}
```

@@ -223,6 +224,14 @@ id of exclusively allocated CPUs, device id as it was reported by device plugins
the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the
information about memory and hugepages reserved for a container.

Starting from Kubernetes v1.27, the `List` endpoint can provide information on resources
of running pods allocated in `ResourceClaims` by the `DynamicResourceAllocation` API. To enable
this feature, `kubelet` must be started with the following flags:

```
--feature-gates=DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true
```

```gRPC
// ListPodResourcesResponse is the response returned by List function
message ListPodResourcesResponse {
@@ -242,6 +251,7 @@ message ContainerResources {
    repeated ContainerDevices devices = 2;
    repeated int64 cpu_ids = 3;
    repeated ContainerMemory memory = 4;
    repeated DynamicResource dynamic_resources = 5;
}

// ContainerMemory contains information about memory and hugepages assigned to a container
@@ -267,6 +277,28 @@ message ContainerDevices {
    repeated string device_ids = 2;
    TopologyInfo topology = 3;
}

// DynamicResource contains information about the devices assigned to a container by Dynamic Resource Allocation
message DynamicResource {
    string class_name = 1;
    string claim_name = 2;
    string claim_namespace = 3;
    repeated ClaimResource claim_resources = 4;
}

// ClaimResource contains per-plugin resource information
message ClaimResource {
    repeated CDIDevice cdi_devices = 1 [(gogoproto.customname) = "CDIDevices"];
}

// CDIDevice specifies CDI device information
message CDIDevice {
    // Fully qualified CDI device name
    // for example: vendor.com/gpu=gpudevice1
    // see more details in the CDI specification:
    // https://github.com/container-orchestrated-devices/container-device-interface/blob/main/SPEC.md
    string name = 1;
}
```
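To illustrate how a client might consume the new fields, here is a minimal Go sketch. It does not use the real generated bindings: the structs are hypothetical, simplified mirrors of the `DynamicResource`, `ClaimResource`, and `CDIDevice` messages, and `cdiDeviceNames` and `splitCDIName` are illustrative helpers. The sketch collects every CDI device name reported for one container and splits each name at the first `=` into the kind (for example `vendor.com/gpu`) and the device name (for example `gpudevice1`), checking only the overall shape rather than the full naming rules of the CDI specification.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical local mirrors of the DynamicResource, ClaimResource and
// CDIDevice messages; a real client would use the generated bindings.
type CDIDevice struct {
	Name string // fully qualified CDI name, e.g. "vendor.com/gpu=gpudevice1"
}

type ClaimResource struct {
	CDIDevices []CDIDevice
}

type DynamicResource struct {
	ClassName      string
	ClaimName      string
	ClaimNamespace string
	ClaimResources []ClaimResource
}

// cdiDeviceNames flattens all CDI device names reported for one container,
// keeping the order in which claims and plugins list them.
func cdiDeviceNames(resources []DynamicResource) []string {
	var names []string
	for _, dr := range resources {
		for _, cr := range dr.ClaimResources {
			for _, dev := range cr.CDIDevices {
				names = append(names, dev.Name)
			}
		}
	}
	return names
}

// splitCDIName splits "vendor.com/class=name" into its kind
// ("vendor.com/class") and device name ("name"). It checks only the
// overall shape, not the character rules in the CDI specification.
func splitCDIName(qualified string) (string, string, error) {
	kind, device, found := strings.Cut(qualified, "=")
	if !found || device == "" || !strings.Contains(kind, "/") {
		return "", "", fmt.Errorf("not a fully qualified CDI device name: %q", qualified)
	}
	return kind, device, nil
}

func main() {
	// Example values; the class and claim names are made up.
	resources := []DynamicResource{{
		ClassName:      "example-gpu-class",
		ClaimName:      "gpu-claim",
		ClaimNamespace: "default",
		ClaimResources: []ClaimResource{{
			CDIDevices: []CDIDevice{{Name: "vendor.com/gpu=gpudevice1"}},
		}},
	}}
	for _, name := range cdiDeviceNames(resources) {
		kind, device, err := splitCDIName(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("kind=%s device=%s\n", kind, device)
	}
}
```

Flattening across claims and plugins keeps the helper independent of how many `ResourceClaims` a container references; a monitoring agent that needs per-claim attribution would instead keep `ClaimName` and `ClaimNamespace` alongside each name.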

{{< note >}}
cpu_ids in the `ContainerResources` in the `List` endpoint correspond to exclusive CPUs allocated
@@ -333,6 +365,36 @@ Support for the `PodResourcesLister service` requires `KubeletPodResources`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
It is enabled by default starting with Kubernetes 1.15 and is v1 since Kubernetes 1.20.

### `Get` gRPC endpoint {#grpc-endpoint-get}

{{< feature-state state="alpha" for_k8s_version="v1.27" >}}

The `Get` endpoint provides information on resources of a running Pod. It exposes information
similar to that described for the `List` endpoint. The `Get` endpoint requires the `PodName`
and `PodNamespace` of the running Pod.

```gRPC
// GetPodResourcesRequest contains information about the pod
message GetPodResourcesRequest {
    string pod_name = 1;
    string pod_namespace = 2;
}
```
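If the `KubeletPodResourcesGet` gate is not enabled, a client that only needs one Pod's resources could call `List` and filter the response itself. The Go sketch below shows that fallback under stated assumptions: `PodResources` and `GetPodResourcesRequest` here are simplified, hypothetical stand-ins for the generated types (only names, namespaces and container names are kept), and `findPod` is an illustrative helper, not part of the kubelet API.

```go
package main

import "fmt"

// Hypothetical, simplified mirrors of the request message and of one pod
// entry in a List response; only the fields used below are included.
type GetPodResourcesRequest struct {
	PodName      string
	PodNamespace string
}

type PodResources struct {
	Name       string
	Namespace  string
	Containers []string // container names only, for brevity
}

// findPod mimics a Get call on top of a List result: it returns the entry
// whose name and namespace both match, or an error if there is none.
func findPod(pods []PodResources, req GetPodResourcesRequest) (PodResources, error) {
	if req.PodName == "" || req.PodNamespace == "" {
		return PodResources{}, fmt.Errorf("pod_name and pod_namespace are both required")
	}
	for _, p := range pods {
		if p.Name == req.PodName && p.Namespace == req.PodNamespace {
			return p, nil
		}
	}
	return PodResources{}, fmt.Errorf("pod %s/%s not found", req.PodNamespace, req.PodName)
}

func main() {
	// Two pods share a name but live in different namespaces.
	pods := []PodResources{
		{Name: "web", Namespace: "default", Containers: []string{"nginx"}},
		{Name: "web", Namespace: "staging", Containers: []string{"nginx", "sidecar"}},
	}
	p, err := findPod(pods, GetPodResourcesRequest{PodName: "web", PodNamespace: "staging"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(p.Namespace, p.Name, p.Containers)
}
```

Matching on both fields matters because Pod names are only unique within a namespace, which is also why the request carries both `pod_name` and `pod_namespace`. The trade-off is cost: the fallback transfers the resources of every running Pod on the node just to read one entry.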

To enable this feature, you must start your kubelet services with the following flag:

```
--feature-gates=KubeletPodResourcesGet=true
```

The `Get` endpoint can provide Pod information related to dynamic resources
allocated by the dynamic resource allocation API. To enable this feature, you must
ensure your kubelet services are started with the following flags:

```
--feature-gates=KubeletPodResourcesGet=true,DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true
```
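The gate requirements described above can be summarized as a small lookup. The Go sketch below encodes them for the `List` and `Get` endpoints; `requiredGates`, `missingGates` and `featureGatesFlag` are hypothetical helpers that only work on gate names in a map. They do not read real kubelet flags or configuration, and they leave out `KubeletPodResources`, which is enabled by default.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requiredGates returns the extra feature gates this page names for an
// endpoint ("List" or "Get"; anything else is treated like "List") and for
// whether dynamic resources should be reported. Hypothetical helper.
func requiredGates(endpoint string, dynamic bool) []string {
	var gates []string
	if endpoint == "Get" {
		gates = append(gates, "KubeletPodResourcesGet")
	}
	if dynamic {
		gates = append(gates, "DynamicResourceAllocation", "KubeletPodResourcesDynamicResources")
	}
	sort.Strings(gates)
	return gates
}

// missingGates reports which required gates are not set to true.
func missingGates(endpoint string, dynamic bool, enabled map[string]bool) []string {
	var missing []string
	for _, g := range requiredGates(endpoint, dynamic) {
		if !enabled[g] {
			missing = append(missing, g)
		}
	}
	return missing
}

// featureGatesFlag renders gate names as a --feature-gates flag, sorted so
// the output is deterministic (the kubelet does not care about order).
func featureGatesFlag(gates []string) string {
	sorted := append([]string(nil), gates...)
	sort.Strings(sorted)
	parts := make([]string, len(sorted))
	for i, g := range sorted {
		parts[i] = g + "=true"
	}
	return "--feature-gates=" + strings.Join(parts, ",")
}

func main() {
	fmt.Println(featureGatesFlag(requiredGates("Get", true)))
	fmt.Println(missingGates("List", true, map[string]bool{"DynamicResourceAllocation": true}))
}
```

Run as written, the first line prints the same three gates as the flag shown above, only in sorted order, and the second line reports that `KubeletPodResourcesDynamicResources` is still missing for `List`.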

## Device plugin integration with the Topology Manager

{{< feature-state for_k8s_version="v1.18" state="beta" >}}

@@ -162,6 +162,12 @@ gets scheduled onto one node and then cannot run there, which is bad because
such a pending Pod also blocks all other resources like RAM or CPU that were
set aside for it.

## Monitoring resources

The kubelet provides a gRPC service to enable discovery of dynamic resources of
running Pods. For more information on the gRPC endpoints, see the
[resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources).

## Limitations

The scheduler plugin must be involved in scheduling Pods which use

@@ -125,8 +125,10 @@ For a reference to old feature gates that are removed, please refer to
| `KubeletInUserNamespace` | `false` | Alpha | 1.22 | |
| `KubeletPodResources` | `false` | Alpha | 1.13 | 1.14 |
| `KubeletPodResources` | `true` | Beta | 1.15 | |
| `KubeletPodResourcesDynamicResources` | `false` | Alpha | 1.27 | |
| `KubeletPodResourcesGet` | `false` | Alpha | 1.27 | |
| `KubeletPodResourcesGetAllocatable` | `false` | Alpha | 1.21 | 1.22 |
| `KubeletPodResourcesGetAllocatable` | `true` | Beta | 1.23 | |
| `KubeletTracing` | `false` | Alpha | 1.25 | |
| `LegacyServiceAccountTokenTracking` | `false` | Alpha | 1.25 | |
| `LocalStorageCapacityIsolationFSQuotaMonitoring` | `false` | Alpha | 1.15 | - |

@@ -578,9 +580,14 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `KubeletPodResources`: Enable the kubelet's pod resources gRPC endpoint. See
  [Support Device Monitoring](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/606-compute-device-assignment/README.md)
  for more details.
- `KubeletPodResourcesDynamicResources`: Extend the kubelet's pod resources gRPC endpoint to
  include resources allocated in `ResourceClaims` via the `DynamicResourceAllocation` API.
  See [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more details.
- `KubeletPodResourcesGet`: Enable the `Get` gRPC endpoint of the kubelet's pod resources API.
  This API augments the [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources).
- `KubeletPodResourcesGetAllocatable`: Enable the kubelet's pod resources
  `GetAllocatableResources` functionality. This API augments the
  [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources)
  with information about the allocatable resources, enabling clients to properly
  track the free compute resources on a node.
- `KubeletTracing`: Add support for distributed tracing in the kubelet.