diff --git a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md index a241ccee3d..537bde377c 100644 --- a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md +++ b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md @@ -213,6 +213,7 @@ for these devices: service PodResourcesLister { rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {} rpc GetAllocatableResources(AllocatableResourcesRequest) returns (AllocatableResourcesResponse) {} + rpc Get(GetPodResourcesRequest) returns (GetPodResourcesResponse) {} } ``` @@ -223,6 +224,14 @@ id of exclusively allocated CPUs, device id as it was reported by device plugins the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the information about memory and hugepages reserved for a container. +Starting from Kubernetes v1.27, the `List` endpoint can provide information on resources +of running pods allocated in `ResourceClaims` by the `DynamicResourceAllocation` API.
To enable +this feature, `kubelet` must be started with the following flags: + +``` +--feature-gates=DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true +``` + ```gRPC // ListPodResourcesResponse is the response returned by List function message ListPodResourcesResponse { @@ -242,6 +251,7 @@ message ContainerResources { repeated ContainerDevices devices = 2; repeated int64 cpu_ids = 3; repeated ContainerMemory memory = 4; + repeated DynamicResource dynamic_resources = 5; } // ContainerMemory contains information about memory and hugepages assigned to a container @@ -267,6 +277,28 @@ message ContainerDevices { repeated string device_ids = 2; TopologyInfo topology = 3; } + +// DynamicResource contains information about the devices assigned to a container by Dynamic Resource Allocation +message DynamicResource { + string class_name = 1; + string claim_name = 2; + string claim_namespace = 3; + repeated ClaimResource claim_resources = 4; +} + +// ClaimResource contains per-plugin resource information +message ClaimResource { + repeated CDIDevice cdi_devices = 1 [(gogoproto.customname) = "CDIDevices"]; +} + +// CDIDevice specifies a CDI device information +message CDIDevice { + // Fully qualified CDI device name + // for example: vendor.com/gpu=gpudevice1 + // see more details in the CDI specification: + // https://github.com/container-orchestrated-devices/container-device-interface/blob/main/SPEC.md + string name = 1; +} ``` {{< note >}} cpu_ids in the `ContainerResources` in the `List` endpoint correspond to exclusive CPUs allocated @@ -333,6 +365,36 @@ Support for the `PodResourcesLister service` requires `KubeletPodResources` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled. It is enabled by default starting with Kubernetes 1.15 and is v1 since Kubernetes 1.20.
+### `Get` gRPC endpoint {#grpc-endpoint-get} + +{{< feature-state state="alpha" for_k8s_version="v1.27" >}} + +The `Get` endpoint provides information on resources of a running Pod. It exposes information +similar to that described for the `List` endpoint. The `Get` endpoint requires `PodName` +and `PodNamespace` of the running Pod. + +```gRPC +// GetPodResourcesRequest contains information about the pod +message GetPodResourcesRequest { + string pod_name = 1; + string pod_namespace = 2; +} +``` + +To enable this feature, you must start your kubelet services with the following flag: + +``` +--feature-gates=KubeletPodResourcesGet=true +``` + +The `Get` endpoint can provide Pod information related to dynamic resources +allocated by the dynamic resource allocation API. To enable this feature, you must +ensure your kubelet services are started with the following flags: + +``` +--feature-gates=KubeletPodResourcesGet=true,DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true +``` + ## Device plugin integration with the Topology Manager {{< feature-state for_k8s_version="v1.18" state="beta" >}} diff --git a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md index e1c468f58f..b2bca19c36 100644 --- a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md +++ b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md @@ -162,6 +162,12 @@ gets scheduled onto one node and then cannot run there, which is bad because such a pending Pod also blocks all other resources like RAM or CPU that were set aside for it. +## Monitoring resources + +The kubelet provides a gRPC service to enable discovery of dynamic resources of +running Pods. For more information on the gRPC endpoints, see the +[resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources).
+ ## Limitations The scheduler plugin must be involved in scheduling Pods which use diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 626cc6931f..cb56ed8841 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -125,8 +125,10 @@ For a reference to old feature gates that are removed, please refer to | `KubeletInUserNamespace` | `false` | Alpha | 1.22 | | | `KubeletPodResources` | `false` | Alpha | 1.13 | 1.14 | | `KubeletPodResources` | `true` | Beta | 1.15 | | +| `KubeletPodResourcesGet` | `false` | Alpha | 1.27 | | | `KubeletPodResourcesGetAllocatable` | `false` | Alpha | 1.21 | 1.22 | | `KubeletPodResourcesGetAllocatable` | `true` | Beta | 1.23 | | +| `KubeletPodResourcesDynamicResources` | `false` | Alpha | 1.27 | | | `KubeletTracing` | `false` | Alpha | 1.25 | | | `LegacyServiceAccountTokenTracking` | `false` | Alpha | 1.25 | | | `LocalStorageCapacityIsolationFSQuotaMonitoring` | `false` | Alpha | 1.15 | - | @@ -578,9 +580,14 @@ Each feature gate is designed for enabling/disabling a specific feature: - `KubeletPodResources`: Enable the kubelet's pod resources gRPC endpoint. See [Support Device Monitoring](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/606-compute-device-assignment/README.md) for more details. +- `KubeletPodResourcesGet`: Enable the `Get` gRPC endpoint on the kubelet for Pod resources. + This API augments the [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources). - `KubeletPodResourcesGetAllocatable`: Enable the kubelet's pod resources `GetAllocatableResources` functionality.
This API augments the [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) with informations about the allocatable resources, enabling clients to properly track the free compute resources on a node. +- `KubeletPodResourcesDynamicResources`: Extend the kubelet's pod resources gRPC endpoint to + include resources allocated in `ResourceClaims` via the `DynamicResourceAllocation` API. + See [resource allocation reporting](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more details. - `KubeletTracing`: Add support for distributed tracing in the kubelet.
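
Reviewer note: the `DynamicResource`, `ClaimResource`, and `CDIDevice` messages added in this diff nest three levels deep, so a short consumer-side sketch may help readers of the new docs. The Go structs below are hand-written mirrors of those messages for illustration only; they are an assumption and not the generated kubelet API types, and `cdiDeviceNames` is a hypothetical helper name.

```go
package main

import "fmt"

// Hand-written mirrors of the proto messages added in this diff.
// Assumption: these are illustrative stand-ins, not the generated Go types.
type CDIDevice struct {
	Name string // fully qualified CDI name, e.g. vendor.com/gpu=gpudevice1
}

type ClaimResource struct {
	CDIDevices []CDIDevice
}

type DynamicResource struct {
	ClassName      string
	ClaimName      string
	ClaimNamespace string
	ClaimResources []ClaimResource
}

// cdiDeviceNames (hypothetical helper) flattens the nested structure into
// "namespace/claim: device" strings, one per CDI device.
func cdiDeviceNames(resources []DynamicResource) []string {
	var names []string
	for _, r := range resources {
		for _, cr := range r.ClaimResources {
			for _, d := range cr.CDIDevices {
				names = append(names, fmt.Sprintf("%s/%s: %s", r.ClaimNamespace, r.ClaimName, d.Name))
			}
		}
	}
	return names
}

func main() {
	// Sample data only; the class and claim names are made up.
	resources := []DynamicResource{{
		ClassName:      "gpu-class",
		ClaimName:      "gpu-claim",
		ClaimNamespace: "default",
		ClaimResources: []ClaimResource{{
			CDIDevices: []CDIDevice{{Name: "vendor.com/gpu=gpudevice1"}},
		}},
	}}
	for _, n := range cdiDeviceNames(resources) {
		fmt.Println(n)
	}
}
```

A real client would obtain these values from the `dynamic_resources` field of `ContainerResources` in a `List` or `Get` response, with the feature gates described in the diff enabled.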