Edits in response to review comments.

This commit is contained in:
Connor Doyle 2018-03-29 07:55:24 -07:00
parent 69402a4ddc
commit 4793277c98
1 changed files with 21 additions and 9 deletions

View File

@ -46,7 +46,6 @@ components.
This proposal provides a mechanism to coordinate fine-grained hardware
resource assignments for different components in Kubernetes.
# Motivation
Multiple components in the Kubelet make decisions about system
@ -73,12 +72,17 @@ with measurable performance ramifications.
- [Support VF interrupt binding to specified CPU][sriov-issue-10]
- [Proposal: CPU Affinity and NUMA Topology Awareness][proposal-affinity]
Note that all of these concerns pertain only to multi-socket systems.
Note that all of these concerns pertain only to multi-socket systems. Correct
behavior requires that the kernel receive accurate topology information from
the underlying hardware (typically via the SLIT table). See section 5.2.16
and 5.2.17 of the
[ACPI Specification](http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf) for more
information.
## Goals
- Allow CPU manager and Device Manager to agree on preferred
NUMA node affinity for containers.
- Arbitrate preferred NUMA node affinity for containers based on input from
CPU manager and Device Manager.
- Provide an internal interface and pattern to integrate additional
topology-aware Kubelet components.
@ -98,7 +102,7 @@ Note that all of these concerns pertain only to multi-socket systems.
- _CNI:_ Changing the Container Networking Interface is out of scope for
this proposal. However, this design should be extensible enough to
accommodate network interface locality if the CNI adds support in the
future. This limitation is potentially mitigated by the possiblity to
future. This limitation is potentially mitigated by the possibility to
use the device plugin API as a stopgap solution for specialized
networking requirements.
@ -136,14 +140,15 @@ affinity.
This proposal is focused on a new component in the Kubelet called the
NUMA Manager. The NUMA Manager implements the pod admit handler
interface and participates in Kubelet pod admission. When the `Admit()`
function is called, the NUMA manager collects NUMA hints from from other
function is called, the NUMA manager collects NUMA hints from other
Kubelet components.
If the NUMA hints are not compatible, the NUMA manager could choose to
reject the pod. The details of what to do in this situation needs more
discussion. For example, the NUMA manager could enforce strict NUMA
alignment for Guaranteed QoS pods. Alternatively, the NUMA manager could
simply provide best-effort NUMA alignment for all pods.
simply provide best-effort NUMA alignment for all pods. The NUMA manager could
use `softAdmitHandler` to keep the pod in `Pending` state.
The NUMA Manager component will be disabled behind a feature gate until
graduation from alpha to beta.
@ -162,6 +167,10 @@ present in all lists. Here is a sketch:
1. Store the first non-empty result and break out early.
1. If no non-empty result exists, return an error.
The behavior when a match does not exist should be configurable. The Kubelet
could support a config option to require strict NUMA assignment when set to
`true`. A `false` value would mean best-effort NUMA alignment.
#### New Interfaces
```go
@ -190,7 +199,9 @@ type Store interface {
// NUMA-related resource assignments. The NUMA manager consults each
// hint provider at pod admission time.
type HintProvider interface {
GetNUMAHints(pod v1.Pod, containerName string) []NUMAMask
// Returns a mask if this hint provider has a preference; otherwise
// returns `_, false` to indicate "don't care".
GetNUMAHints(pod v1.Pod, containerName string) ([]NUMAMask, bool)
}
```
@ -220,7 +231,8 @@ _NUMA Manager instantiation and inclusion in pod admit lifecycle._
interface. Plugins should be able to determine the NUMA node
easily when enumerating supported devices. For example, Linux
exposes the node ID in sysfs for PCI devices:
`/sys/devices/pci*/*/numa_node`.
`/sys/devices/pci*/*/numa_node`. NOTE: this is `-1` on many
public cloud instances and single-node machines.
1. Device Manager calls `GetAffinity()` method of NUMA manager when
deciding device allocation.