Merge pull request #7805 from johnbelamaric/wg-device-mgmt
Add WG Device Management
commit 6046718b31

@@ -130,6 +130,10 @@ aliases:
wg-data-protection-leads:
- xing-yang
- yuxiangqian
wg-device-management-leads:
- johnbelamaric
- klueska
- pohly
wg-lts-leads:
- jeremyrickard
- liggitt

@@ -111,7 +111,6 @@ channels:
- name: digitalocean-k8s
- name: distroless
- name: diversity
- name: dra
- name: draft-dev
- name: draft-users
- name: druid-operator

@@ -549,6 +548,8 @@ channels:
- name: wg-component-standard-mentorship
archived: true
- name: wg-data-protection
- name: wg-device-management
id: C0409NGC1TK
- name: wg-iot-edge
- name: sig-k8s-infra
id: CCK68P2Q2

@@ -57,6 +57,7 @@ members will assume one of the departing members groups.
| [WG API Expression](wg-api-expression/README.md) | Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**) |
| [WG Batch](wg-batch/README.md) | Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**) |
| [WG Data Protection](wg-data-protection/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG Device Management](wg-device-management/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG LTS](wg-lts/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |
| [WG Policy](wg-policy/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG Structured Logging](wg-structured-logging/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |

@@ -57,6 +57,7 @@ The Chairs of the SIG run operations and processes governing the SIG.

The following [working groups][working-group-definition] are sponsored by sig-architecture:
* [WG API Expression](/wg-api-expression)
* [WG Device Management](/wg-device-management)
* [WG LTS](/wg-lts)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)

@@ -47,6 +47,7 @@ The Chairs of the SIG run operations and processes governing the SIG.

The following [working groups][working-group-definition] are sponsored by sig-autoscaling:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)


## Subprojects

@@ -64,6 +64,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
|[API Expression](wg-api-expression/README.md)|[api-expression](https://github.com/kubernetes/kubernetes/labels/wg%2Fapi-expression)|* API Machinery<br>* Architecture<br>|* [Antoine Pelisse](https://github.com/apelisse), Google<br>* [Kevin Wiesmueller](https://github.com/kwiesmueller), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-api-expression)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-api-expression)|* Regular WG Meeting: [Tuesdays at 9:30 PT (Pacific Time) (biweekly)](https://zoom.us/j/94238112084)<br>
|[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Aldo Culquicondor](https://github.com/alculquicondor), Google<br>* [Marcin Wielgus](https://github.com/mwielgus), Google<br>* [Maciej Szulik](https://github.com/soltysh), Red Hat<br>* [Swati Sehgal](https://github.com/swatisehgal), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)|* Regular Meeting ([Calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 15th 2024)s at 3PM CET (Central European Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>* Regular Meeting ([Calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 1st 2024)s at 3PM PT (Pacific Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>
|[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>
|[Device Management](wg-device-management/README.md)|[device-management](https://github.com/kubernetes/kubernetes/labels/wg%2Fdevice-management)|* Architecture<br>* Autoscaling<br>* Network<br>* Node<br>* Scheduling<br>|* [John Belamaric](https://github.com/johnbelamaric), Google<br>* [Kevin Klues](https://github.com/klueska), NVIDIA<br>* [Patrick Ohly](https://github.com/pohly), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-device-management)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-device-management)|* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time) (biweekly)](TBD)<br>
|[LTS](wg-lts/README.md)|[lts](https://github.com/kubernetes/kubernetes/labels/wg%2Flts)|* Architecture<br>* Cluster Lifecycle<br>* K8s Infra<br>* Release<br>* Security<br>* Testing<br>|* [Jeremy Rickard](https://github.com/jeremyrickard), Microsoft<br>* [Jordan Liggitt](https://github.com/liggitt), Google<br>* [Micah Hausler](https://github.com/micahhausler), Amazon<br>|* [Slack](https://kubernetes.slack.com/messages/wg-lts)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-lts)|* Regular WG Meeting: [Tuesdays at 07:00 PT (Pacific Time) (biweekly)](https://zoom.us/j/92480197536?pwd=dmtSMGJRQmNYYTIyZkFlQ25JRngrdz09)<br>
|[Policy](wg-policy/README.md)|[policy](https://github.com/kubernetes/kubernetes/labels/wg%2Fpolicy)|* Architecture<br>* Auth<br>* Multicluster<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Jim Bugwadia](https://github.com/JimBugwadia), Kyverno/Nirmata<br>* [Poonam Lamba](https://github.com/poonam-lamba), Google<br>* [Andy Suderman](https://github.com/sudermanjr), Fairwinds<br>|* [Slack](https://kubernetes.slack.com/messages/wg-policy)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-policy)|* Regular WG Meeting: [Wednesdays at 8:00 PT (Pacific Time) (semimonthly)](https://zoom.us/j/7375677271)<br>
|[Structured Logging](wg-structured-logging/README.md)|[structured-logging](https://github.com/kubernetes/kubernetes/labels/wg%2Fstructured-logging)|* API Machinery<br>* Architecture<br>* Cloud Provider<br>* Instrumentation<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Mengjiao Liu](https://github.com/mengjiao-liu), DaoCloud<br>* [Patrick Ohly](https://github.com/pohly), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-structured-logging)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-structured-logging)|

@@ -72,6 +72,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
## Working Groups

The following [working groups][working-group-definition] are sponsored by sig-network:
* [WG Device Management](/wg-device-management)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)

@@ -53,6 +53,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.

The following [working groups][working-group-definition] are sponsored by sig-node:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)

@@ -63,6 +63,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.

The following [working groups][working-group-definition] are sponsored by sig-scheduling:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)

sigs.yaml
@@ -3345,6 +3345,48 @@ workinggroups:
liaison:
github: pohly
name: Patrick Ohly
- dir: wg-device-management
name: Device Management
mission_statement: >
Enable simple and efficient configuration, sharing, and allocation of accelerators
and other specialized devices.

[Additional context](https://groups.google.com/a/kubernetes.io/g/dev/c/YWXGXe07A5w/m/OqLvdQ47BQAJ)

charter_link: charter.md
stakeholder_sigs:
- Architecture
- Autoscaling
- Network
- Node
- Scheduling
label: device-management
leadership:
chairs:
- github: johnbelamaric
name: John Belamaric
company: Google
- github: klueska
name: Kevin Klues
company: NVIDIA
- github: pohly
name: Patrick Ohly
company: Intel
meetings:
- description: Regular WG Meeting
day: Tuesday
time: "8:30"
tz: PT (Pacific Time)
frequency: biweekly
url: TBD
archive_url: https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?usp=sharing
recordings_url: TBD
contact:
slack: wg-device-management
mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-device-management
liaison:
github: pohly
name: Patrick Ohly
- dir: wg-lts
name: LTS
mission_statement: >

@@ -0,0 +1,42 @@
<!---
This is an autogenerated file!

Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.

To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
--->
# Device Management Working Group

Enable simple and efficient configuration, sharing, and allocation of accelerators and other specialized devices.
[Additional context](https://groups.google.com/a/kubernetes.io/g/dev/c/YWXGXe07A5w/m/OqLvdQ47BQAJ)

The [charter](charter.md) defines the scope and governance of the Device Management Working Group.

## Stakeholder SIGs
* [SIG Architecture](/sig-architecture)
* [SIG Autoscaling](/sig-autoscaling)
* [SIG Network](/sig-network)
* [SIG Node](/sig-node)
* [SIG Scheduling](/sig-scheduling)

## Meetings
*Joining the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-device-management) for the group will typically add invites for the following meetings to your calendar.*
* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time)](TBD) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=8:30&tz=PT%20%28Pacific%20Time%29).
  * [Meeting notes and Agenda](https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?usp=sharing).
  * [Meeting recordings](TBD).

## Organizers

* John Belamaric (**[@johnbelamaric](https://github.com/johnbelamaric)**), Google
* Kevin Klues (**[@klueska](https://github.com/klueska)**), NVIDIA
* Patrick Ohly (**[@pohly](https://github.com/pohly)**), Intel

## Contact
- Slack: [#wg-device-management](https://kubernetes.slack.com/messages/wg-device-management)
- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-device-management)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fdevice-management)
- Steering Committee Liaison: Patrick Ohly (**[@pohly](https://github.com/pohly)**)
<!-- BEGIN CUSTOM CONTENT -->

<!-- END CUSTOM CONTENT -->

@@ -0,0 +1,103 @@
# WG Device Management Charter

This charter adheres to the conventions described in the [Kubernetes Charter
README] and uses the Roles and Organization Management outlined in
[wg-governance].

## Scope

Enable simple and efficient configuration, sharing, and allocation of
accelerators and other specialized devices. This working group focuses on the
APIs, abstractions, and feature designs needed to configure, target, and share
the necessary hardware for both batch and serving (inference) workloads.

### In scope

- Enable efficient utilization of specialized hardware devices. This includes
sharing one or more resources effectively (many workloads sharing a pool of
devices), as well as sharing individual devices effectively (several workloads
dividing up a single device for sharing).
- Enable workload authors to specify “just enough” details about their workload
requirements to ensure it runs optimally, without having to understand exactly
how the infrastructure team has provisioned the cluster.
- Enable the scheduler to choose the correct place to run a workload the vast
majority of the time (rejections should be extremely rare).
- Enable cluster autoscalers and other node auto-provisioning components to
predict whether creating additional resources will satisfy workload needs,
before provisioning those resources.
- Enable the shift from “pods run on nodes” to “workloads consume capacity”.
This allows Kubernetes to provision sets of pods on top of sets of nodes and
specialized hardware, while taking into account the relationships between
those infrastructure components.
- Enable in-node devices as well as network-accessible devices.
- Minimize workload disruption due to hardware failures.
- Address fragmentation of accelerators due to fractional use.
- Additional problems that may be identified and deemed in scope as we gather
use cases and requirements from WG Serving, WG Batch, and other stakeholders.
- Address all of the above with a simple API that is a natural extension
of the existing Kubernetes APIs, and avoids or minimizes any transition
effort.

### Out of Scope

- Higher-level workload controller APIs (for example, the equivalent of
Deployment, StatefulSet, or DaemonSet) for specific types of workloads.
- General resource management requirements not related to devices.

## Deliverables

The WG will coordinate the delivery of KEPs and their implementations by the
participating SIGs. Interim artifacts will include documents capturing use
cases, requirements, and designs; however, all of those will eventually result
in KEPs and code owned by SIGs.

Specifically, we expect to need:

- APIs for publishing resource capacity of in-node and network-accessible
devices, as well as sample code to ease creation of drivers to populate this
information.
- APIs for specifying workload resource requirements with respect to devices.
- APIs, algorithms, and implementations for allocating access to and resources on devices, as well as
persisting the results of those allocations.
- APIs, algorithms, and implementations for allowing administrators to control
and govern access to devices.

## Stakeholders

- SIG Architecture
- SIG Autoscaling
- SIG Network
- SIG Node
- SIG Scheduling

Additionally, a broad set of end users, device vendors, cloud providers,
Kubernetes distribution providers, and ecosystem projects (particularly
autoscaling-related projects) have expressed interest in this effort. There are
five primary groups of stakeholders, from each of which we expect multiple participants:

- Device vendors that manufacture accelerators and other specialized hardware
which they would like to make available to Kubernetes users.
- Kubernetes distribution and managed offering providers that would like to make
specialized hardware available to their users.
- Kubernetes ecosystem projects that help manage workloads utilizing these
accelerators (e.g., Karpenter, Kueue, Volcano).
- End user workload authors that will create workloads that take advantage of
the specialized hardware.
- Cluster administrators that operate and govern clusters containing the
specialized hardware.

## Roles and Organization Management

This working group adheres to the Roles and Organization Management outlined in
[wg-governance] and opts in to updates and modifications to [wg-governance].

## Exit Criteria

The working group will disband when the KEPs resulting from these discussions
have reached a terminal state. When the core functionality for dynamic resource
allocation (DRA) reaches GA, we will evaluate whether the working group should
be disbanded and any remaining KEPs be left to the management of their owning
SIGs.

[wg-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md