Addressed comments, added protocol overview, explained impl differences

commit f1b462b123 (parent 4cbc77e491)
@@ -3,30 +3,31 @@ Device Manager Proposal

<!-- BEGIN MUNGE: GENERATED_TOC -->

* [Motivation](#motivation)
  * [Use Cases](#use-cases)
  * [Objectives](#objectives)
  * [Non Objectives](#non-objectives)
* [Proposed Implementation 1](#proposed-implementation-1)
  * [Vendor story](#vendor-story)
  * [End User story](#end-user-story)
  * [Device Plugin](#device-plugin)
    * [Introduction](#introduction)
    * [Registration](#registration)
    * [Unix Socket](#unix-socket)
    * [Protocol Overview](#protocol-overview)
    * [Protobuf specification](#protobuf-specification)
    * [HealthCheck and Failure Recovery](#healthcheck-and-failure-recovery)
    * [API Changes](#api-changes)
  * [Upgrading your cluster](#upgrading-your-cluster)
* [Proposed Implementation 2](#proposed-implementation-2)
  * [Device Plugin Lifecycle](#device-plugin-lifecycle)
  * [Protobuf API](#protobuf-api)
  * [Failure recovery](#failure-recovery)
  * [Roadmap](#roadmap)
  * [Open Questions](#open-questions-1)
* [Installation](#installation)
* [Versioning](#versioning)
* [References](#references)

<!-- END MUNGE: GENERATED_TOC -->
_Authors:_
@@ -48,7 +49,7 @@ This document describes a vendor independent solution to:

* Discovering and representing external devices
* Making these devices available to the containers requesting them and
  cleaning them up afterwards
* Health Check of these devices

Because devices are vendor dependent and have their own sets of problems
and mechanisms, the solution we describe is a plugin mechanism that may run
@@ -85,33 +86,43 @@ the following simple steps:

1. Advanced scheduling and resource selection (solved through
   [#782](https://github.com/Kubernetes/community/pull/782)).
2. Collecting metrics is not part of this proposal. We will only solve
   Health Check.
# Proposed Implementation 1

## TLDR

At their core, device plugins are simple gRPC servers that may run in a
container deployed through the pod mechanism.

These servers implement the gRPC interface defined later in this design
document and once the device plugin makes itself known to kubelet, kubelet
will interact with the device through three simple functions:
  1. A `ListDevices` function for the kubelet to Discover the devices and
     their properties.
  2. An `Allocate` function which is called before container creation
  3. A `HealthCheck` function to notify Kubelet whenever a device becomes
     unhealthy.

![Process](device-plugin-overview.png)
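The three calls can be modeled as a plain Go interface. This is a hypothetical sketch rather than the generated gRPC stubs; the `FakePlugin` type and its device data are illustrative assumptions:

```go
package main

import "fmt"

// Device mirrors the protobuf Device message described later in this proposal.
type Device struct {
	Kind, Name, Health string
	Properties         map[string]string
}

// DevicePlugin models the three calls Kubelet makes against a plugin.
type DevicePlugin interface {
	ListDevices() []Device         // discovery of devices and their properties
	Allocate(names []string) error // invoked before container creation
	HealthCheck() <-chan Device    // stream of devices whose health changed
}

// FakePlugin is an in-memory stand-in used only to illustrate the flow.
type FakePlugin struct {
	devices []Device
	health  chan Device
}

func (p *FakePlugin) ListDevices() []Device      { return p.devices }
func (p *FakePlugin) HealthCheck() <-chan Device { return p.health }

func (p *FakePlugin) Allocate(names []string) error {
	for _, n := range names {
		found := false
		for _, d := range p.devices {
			if d.Name == n && d.Health == "Healthy" {
				found = true
			}
		}
		if !found {
			return fmt.Errorf("no healthy device named %q", n)
		}
	}
	return nil
}

func main() {
	p := &FakePlugin{
		devices: []Device{{Kind: "nvidia-gpu", Name: "GPU-0", Health: "Healthy"}},
		health:  make(chan Device, 1),
	}
	fmt.Println(len(p.ListDevices()), p.Allocate([]string{"GPU-0"}) == nil)
}
```

A real plugin would expose these methods over the gRPC service defined in the protobuf specification instead of a Go interface.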
## Vendor story

Kubernetes provides to vendors a mechanism called device plugins to:
* advertise devices.
* monitor devices (currently perform health checks).
* hook into the runtime to execute device specific instructions
  (e.g: Clean GPU memory) and instruct Kubelet what are the steps
  to take in order to make the device available in the container.
```go
service DevicePlugin {
  rpc ListDevices(Empty) returns (stream Device) {}
  rpc HealthCheck(Empty) returns (stream Device) {}

  rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
}
```

The gRPC server that the device plugin must implement is expected to
@@ -120,44 +131,44 @@ be advertised on a unix socket in a mounted hostPath (e.g:

Finally, to notify Kubelet of the existence of the device plugin,
the vendor's device plugin will have to make a request to Kubelet's
own gRPC server.
Only then will kubelet start interacting with the vendor's device plugin
through the gRPC apis.
## End User story

When setting up the cluster the admin knows what kind of devices are present
on the different machines and therefore can select what devices they want to
enable.

The cluster admin knows the cluster has NVIDIA GPUs, therefore they deploy
the NVIDIA device plugin through:
`kubectl create -f nvidia.io/device-plugin.yml`
The device plugin lands on all the nodes of the cluster and if it detects that
there are no GPUs it terminates. However, when there are GPUs it reports them
to Kubelet and starts its gRPC server to monitor devices and hook into the
container creation process.

Device Plugins reporting non-GPU Devices are advertised as OIRs of the shape
`extensions.kubernetes.io/vendor-device`; GPUs are advertised as `nvidia-gpu`.
Devices can be selected using the same process as for OIRs in the pod spec.

1. A user submits a pod spec requesting X GPUs (or devices) through OIR
2. The scheduler filters the nodes which do not match the resource requests
3. The pod lands on the node and Kubelet decides which device
   should be assigned to the pod
4. Kubelet calls `Allocate` on the matching Device Plugins
5. The user deletes the pod or the pod terminates
When receiving a pod which requests Devices kubelet is in charge of:
* deciding which device to assign to the pod's containers
  * Note: This will be decided in the future at the scheduler level as
    part of the Resource Class proposal
* Calling the `Allocate` function with the list of devices

The scheduler is still in charge of filtering the nodes which cannot
satisfy the resource requests.
## Device Plugin

@@ -165,13 +176,16 @@ He might in the future be in charge of selecting the device.

The device plugin is structured in 5 parts:
1. Registration: The device plugin advertises its presence to Kubelet
2. ListDevices: Kubelet calls the device plugin to list its devices
3. HealthCheck: The device plugin returns a stream on which it writes when
   a device's health changes
4. Allocate: When creating containers, Kubelet calls the device plugin's
   `Allocate` function so that it can run device specific instructions (gpu
   cleanup, QRNG initialization, ...) and instruct Kubelet how to make the
   device available in the container.
5. Heartbeat: The device plugin polls Kubelet every 5s to know if it's still
   alive and if it has to re-issue a Register request (e.g: Kubelet crashed
   between two heartbeats)
### Registration

@@ -183,7 +197,7 @@ sockets and follow this simple pattern:

1. The device plugin starts its gRPC server
2. The device plugin sends a `RegisterRequest` to Kubelet (through a
   gRPC request)
4. Kubelet starts its Discovery phase and calls `ListDevices` and `HealthCheck`
5. Kubelet answers to the `RegisterRequest` with a `RegisterResponse`
   containing any error Kubelet might have encountered
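The request/response exchange in steps 2 and 5 can be sketched in Go. This is a minimal sketch: the trimmed-down message fields and the version-mismatch error are assumptions modeled on the protobuf specification, not the real generated types:

```go
package main

import "fmt"

// RegisterRequest/RegisterResponse are trimmed-down stand-ins for the
// protobuf messages in this proposal (illustrative only).
type RegisterRequest struct {
	Version    string
	Unixsocket string
	Vendor     string
}

type RegisterResponse struct {
	Version string
	Error   string
}

// register models Kubelet's side of the handshake: it answers the plugin's
// RegisterRequest with a RegisterResponse carrying any error it encountered,
// such as an API version mismatch.
func register(req RegisterRequest, kubeletVersion string) RegisterResponse {
	if req.Version != kubeletVersion {
		return RegisterResponse{
			Version: kubeletVersion,
			Error:   fmt.Sprintf("version mismatch: plugin %s, kubelet %s", req.Version, kubeletVersion),
		}
	}
	return RegisterResponse{Version: kubeletVersion}
}

func main() {
	resp := register(RegisterRequest{Version: "v1alpha1", Vendor: "nvidia"}, "v1alpha1")
	fmt.Println(resp.Error == "")
}
```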
@@ -192,7 +206,7 @@ sockets and follow this simple pattern:

Device Plugins are expected to communicate with Kubelet through gRPC
on a Unix socket.
When starting the gRPC server, they are expected to create a unix socket
at the following host path: `/var/run/kubernetes`.

For non bare metal device plugins this means they will have to mount the folder
as a volume in their pod spec ([see Installation](#installation)).
@@ -217,12 +231,10 @@ not there was an error. The errors may include (but not limited to):

* Vendor is not consistent across discovered devices

Kubelet will then interact with the plugin through the following functions:
* `ListDevices`: List Devices
* `HealthCheck`: Returns a stream that is written to when a Device becomes
  unhealthy
* `Allocate`: Called when creating a container with a list of devices

The device plugin is also expected to periodically call the `Heartbeat` function
exposed by Kubelet and issue a `Registration` request when it either can't reach
@@ -240,11 +252,10 @@ service PluginRegistration {

```go
}

service DevicePlugin {
  rpc ListDevices(Empty) returns (stream Device) {}
  rpc HealthCheck(Empty) returns (stream Device) {}

  rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
}

message RegisterRequest {
```

@@ -262,7 +273,7 @@ message RegisterResponse {

```go
  string version = 1;
  // Kubelet fills this field if it encounters any errors
  // during the registration process or discover process
  string error = 2;
}

message HeartbeatRequest {
```

@@ -274,7 +285,7 @@ message HeartbeatResponse {

```go
  // plugin to either re-register itself or not
  string response = 1;
  // Kubelet fills this field if it encountered any errors
  string error = 2;
}

message AllocateRequest {
```

@@ -288,26 +299,17 @@ message AllocateResponse {

```go
  repeated Mount mounts = 2;
}

// E.g:
// struct Device {
//   Kind: "NVIDIA-gpu"
//   Name: "GPU-fef8089b-4820-abfc-e83e-94318197576e",
//   Health: "Healthy",
//   Properties: {
//     "Family": "Pascal",
//     "Memory": "4G",
//     "ECC" : "True",
//   }
// }
message Device {
  string Kind = 1;
  string Name = 2;
```

@@ -315,15 +317,161 @@ message Device {

```go
  string Vendor = 4;
  map<string, string> properties = 5; // Could be [1, 1.2, 1G]
}
```
### HealthCheck and Failure Recovery

We want Kubelet as well as the Device Plugins to recover from failures
that may happen on any side of this protocol.

At the communication level, gRPC is a very strong piece of software and
is able to ensure that if failure happens it will try its best to recover
through exponential backoff reconnection.

The proposed mechanism intends to replace any device specific handling in
Kubelet. Therefore in general, device plugin failure or upgrade means that
Kubelet is not able to accept any pod requesting a Device until the upgrade
or failure finishes.

If a device fails, the Device Plugin should signal that through the HealthCheck
stream and we expect Kubelet to stop the pod and reschedule it.

If any Device Plugin fails the behavior we expect depends on the task Kubelet
is performing:
* In general we expect Kubelet to remove any devices that are owned by the failed
  device plugin from the resources advertised by the Node status.
* We however do not expect Kubelet to fail or restart any pods or containers
  running that are using these devices.
* If Kubelet is in the process of allocating a device, then it should fail
  the container process and reschedule the Pod.

If the Kubelet fails or restarts, we expect the Device Plugins to know about
it through Kubelet's Heartbeat call, which every Device Plugin should call
every 5s.

When Kubelet fails or restarts it should know what are the devices that are
owned by the different containers and be able to rebuild a list of available
devices.
In the current design, instead of checkpointing this data, we propose to save
this in the API server as this gives introspection capabilities to the user,
has minimal impact on performances and is a minimal change that can be
reverted if we decide to implement checkpointing or a debug API later.

If Kubelet failed and recovered between two Heartbeats we expect it
to answer with a HeartbeatKo answer, signaling the device plugins to register
themselves again against the Kubelet (in case of heartbeat failure
or connection error).
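Rebuilding the list of available devices after a Kubelet restart reduces to a set difference: everything the plugins advertise minus what the API server records as already owned by containers. A hypothetical sketch (function and parameter names are assumptions, not part of the proposal):

```go
package main

import "fmt"

// availableDevices returns the devices a restarted Kubelet may still hand
// out: those advertised by the plugins that no container owns according to
// the state saved in the API server.
func availableDevices(advertised, owned []string) []string {
	ownedSet := map[string]bool{}
	for _, d := range owned {
		ownedSet[d] = true
	}
	var free []string
	for _, d := range advertised {
		if !ownedSet[d] {
			free = append(free, d)
		}
	}
	return free
}

func main() {
	free := availableDevices(
		[]string{"GPU-0", "GPU-1", "GPU-2"}, // reported by ListDevices
		[]string{"GPU-1"},                   // recorded in container statuses
	)
	fmt.Println(free) // [GPU-0 GPU-2]
}
```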
### API Changes

When discovering the devices, Kubelet will be in charge of advertising those
resources to the API server as part of the current kubelet node update protocol.

We will advertise each device returned by the Device Plugin in a new structure
called `Device`.
It is defined as follows:

```golang
// E.g:
// struct Device {
//   Kind: "NVIDIA-gpu"
//   Name: "GPU-fef8089b-4820-abfc-e83e-94318197576e"
//   Health: "Healthy",
//   Properties: {
//     "Family": "Pascal",
//     "Memory": "4G",
//     "ECC" : "True",
//   }
// }
type Device struct {
	Kind       string
	Vendor     string
	Name       string
	Health     DeviceHealthStatus
	Properties map[string]string
}
```

Because the current API (Capacity) can not be extended to support Device,
we will need to create one new attribute in the NodeStatus structure:
* `DevCapacity`: Describing the device capacity of the node

```golang
type NodeStatus struct {
	DevCapacity []Device
}
```

We also introduce the `Devices` field in the Containers status so that users
can know what devices were assigned to the container.

```golang
type ContainerStatus struct {
	Devices []Device
}
```

Note that we will be using OIR to schedule and trigger the device plugin
in parallel.
So when a Device plugin registers two `foo-device` the node status will be
updated to advertise 2 `extensions.kubernetes.io/foo-device`.

If a user wants to trigger the device plugin they only need to request this
OIR in their Pod Spec.
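The OIR advertisement described above is a simple counting step. A sketch under the assumption that devices are grouped by their `Kind` (the function name is illustrative):

```go
package main

import "fmt"

// advertiseOIR turns the device kinds a plugin registered into the OIR
// resource counts the node status would advertise, e.g. two "foo-device"
// become 2 x extensions.kubernetes.io/foo-device.
func advertiseOIR(deviceKinds []string) map[string]int {
	oir := map[string]int{}
	for _, kind := range deviceKinds {
		oir["extensions.kubernetes.io/"+kind]++
	}
	return oir
}

func main() {
	counts := advertiseOIR([]string{"foo-device", "foo-device", "nvidia-gpu"})
	fmt.Println(counts["extensions.kubernetes.io/foo-device"]) // 2
}
```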
## Upgrading your cluster

TLDR: If you are upgrading either Kubelet or any device plugin the safest way
is to drain the node of all pods and upgrade.
However, depending on what you are upgrading and what changes happened, it
is completely possible to only restart just Kubelet or just the device plugin.

### Upgrading Kubelet

This assumes that the Device Plugins running on the nodes fully implement the
protocol and are able to recover from a Kubelet crash.

Then, as long as the Device Plugin API does not change, upgrading Kubelet can be done
seamlessly through a Kubelet restart.

However, as mentioned in the Versioning section, we currently expect the Device
Plugin's API version to match exactly the Kubelet's Device Plugin API version.

Therefore if the Device Plugin API version changes then you will have to change
the Device Plugin too.
Consider draining the node in that case.

When the Device Plugin API becomes a stable feature, versioning should be
backward compatible and even if Kubelet has a different Device Plugin API,
it should not require a Device Plugin upgrade.

### Upgrading Device Plugins

Because we cannot enforce what the different Device Plugins will do, we cannot
say for certain that upgrading a device plugin will not crash any containers
on the node.

It is therefore up to the Device Plugin vendors to specify if the Device Plugins
can be upgraded without impacting any running containers.

As mentioned earlier, the safest way is to drain the node before upgrading
the Device Plugins.
## Difference Between Implementations

The main differences between implementation 1 and 2 are:
* This implementation allows vendors to run device specific code before
  starting the containers requesting these devices.
* This implementation allows users to know what devices were assigned
  to a container
* This implementation does not need checkpointing
* This implementation has a clear separation of concerns: every function
  does one thing and one thing only. Every actor has only one explicit role:
  * Kubelet's gRPC is in charge of keeping track of Device Plugins
  * The Device Plugin's gRPC is in charge of handling devices
# Proposed Implementation 2

The main strategy of this proposed implementation is that we want to start with
@@ -636,7 +784,7 @@ Negotiation would take place in the registration:

4. If the Device Plugin supports the version sent by Kubelet it can and should
   answer the different calls made by Kubelet

# References

* [Enable "kick the tires" support for NVIDIA GPUs in COS](https://github.com/Kubernetes/Kubernetes/pull/45136)
* [Extend experimental support to multiple NVIDIA GPUs](https://github.com/Kubernetes/Kubernetes/pull/42116)