Merge pull request #7805 from johnbelamaric/wg-device-mgmt

Add WG Device Management
Kubernetes Prow Robot 2024-04-29 09:27:54 -07:00 committed by GitHub
commit 6046718b31
12 changed files with 200 additions and 1 deletions


@@ -130,6 +130,10 @@ aliases:
  wg-data-protection-leads:
    - xing-yang
    - yuxiangqian
  wg-device-management-leads:
    - johnbelamaric
    - klueska
    - pohly
  wg-lts-leads:
    - jeremyrickard
    - liggitt


@@ -111,7 +111,6 @@ channels:
- name: digitalocean-k8s
- name: distroless
- name: diversity
- name: dra
- name: draft-dev
- name: draft-users
- name: druid-operator
@@ -549,6 +548,8 @@ channels:
- name: wg-component-standard-mentorship
  archived: true
- name: wg-data-protection
- name: wg-device-management
  id: C0409NGC1TK
- name: wg-iot-edge
- name: sig-k8s-infra
  id: CCK68P2Q2


@@ -57,6 +57,7 @@ members will assume one of the departing members groups.
| [WG API Expression](wg-api-expression/README.md) | Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**) |
| [WG Batch](wg-batch/README.md) | Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**) |
| [WG Data Protection](wg-data-protection/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG Device Management](wg-device-management/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG LTS](wg-lts/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |
| [WG Policy](wg-policy/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
| [WG Structured Logging](wg-structured-logging/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |


@@ -57,6 +57,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
The following [working groups][working-group-definition] are sponsored by sig-architecture:
* [WG API Expression](/wg-api-expression)
* [WG Device Management](/wg-device-management)
* [WG LTS](/wg-lts)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)


@@ -47,6 +47,7 @@ The Chairs of the SIG run operations and processes governing the SIG.
The following [working groups][working-group-definition] are sponsored by sig-autoscaling:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
## Subprojects


@@ -64,6 +64,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
|[API Expression](wg-api-expression/README.md)|[api-expression](https://github.com/kubernetes/kubernetes/labels/wg%2Fapi-expression)|* API Machinery<br>* Architecture<br>|* [Antoine Pelisse](https://github.com/apelisse), Google<br>* [Kevin Wiesmueller](https://github.com/kwiesmueller), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-api-expression)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-api-expression)|* Regular WG Meeting: [Tuesdays at 9:30 PT (Pacific Time) (biweekly)](https://zoom.us/j/94238112084)<br>
|[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Aldo Culquicondor](https://github.com/alculquicondor), Google<br>* [Marcin Wielgus](https://github.com/mwielgus), Google<br>* [Maciej Szulik](https://github.com/soltysh), Red Hat<br>* [Swati Sehgal](https://github.com/swatisehgal), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)|* Regular Meeting ([Calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 15th 2024)s at 3PM CET (Central European Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>* Regular Meeting ([Calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 1st 2024)s at 3PM PT (Pacific Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>
|[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>
|[Device Management](wg-device-management/README.md)|[device-management](https://github.com/kubernetes/kubernetes/labels/wg%2Fdevice-management)|* Architecture<br>* Autoscaling<br>* Network<br>* Node<br>* Scheduling<br>|* [John Belamaric](https://github.com/johnbelamaric), Google<br>* [Kevin Klues](https://github.com/klueska), NVIDIA<br>* [Patrick Ohly](https://github.com/pohly), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-device-management)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-device-management)|* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time) (biweekly)](TBD)<br>
|[LTS](wg-lts/README.md)|[lts](https://github.com/kubernetes/kubernetes/labels/wg%2Flts)|* Architecture<br>* Cluster Lifecycle<br>* K8s Infra<br>* Release<br>* Security<br>* Testing<br>|* [Jeremy Rickard](https://github.com/jeremyrickard), Microsoft<br>* [Jordan Liggitt](https://github.com/liggitt), Google<br>* [Micah Hausler](https://github.com/micahhausler), Amazon<br>|* [Slack](https://kubernetes.slack.com/messages/wg-lts)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-lts)|* Regular WG Meeting: [Tuesdays at 07:00 PT (Pacific Time) (biweekly)](https://zoom.us/j/92480197536?pwd=dmtSMGJRQmNYYTIyZkFlQ25JRngrdz09)<br>
|[Policy](wg-policy/README.md)|[policy](https://github.com/kubernetes/kubernetes/labels/wg%2Fpolicy)|* Architecture<br>* Auth<br>* Multicluster<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Jim Bugwadia](https://github.com/JimBugwadia), Kyverno/Nirmata<br>* [Poonam Lamba](https://github.com/poonam-lamba), Google<br>* [Andy Suderman](https://github.com/sudermanjr), Fairwinds<br>|* [Slack](https://kubernetes.slack.com/messages/wg-policy)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-policy)|* Regular WG Meeting: [Wednesdays at 8:00 PT (Pacific Time) (semimonthly)](https://zoom.us/j/7375677271)<br>
|[Structured Logging](wg-structured-logging/README.md)|[structured-logging](https://github.com/kubernetes/kubernetes/labels/wg%2Fstructured-logging)|* API Machinery<br>* Architecture<br>* Cloud Provider<br>* Instrumentation<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Mengjiao Liu](https://github.com/mengjiao-liu), DaoCloud<br>* [Patrick Ohly](https://github.com/pohly), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-structured-logging)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-structured-logging)|


@@ -72,6 +72,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
## Working Groups
The following [working groups][working-group-definition] are sponsored by sig-network:
* [WG Device Management](/wg-device-management)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)


@@ -53,6 +53,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-node:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)


@@ -63,6 +63,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
The following [working groups][working-group-definition] are sponsored by sig-scheduling:
* [WG Batch](/wg-batch)
* [WG Device Management](/wg-device-management)
* [WG Policy](/wg-policy)
* [WG Structured Logging](/wg-structured-logging)


@@ -3345,6 +3345,48 @@ workinggroups:
      liaison:
        github: pohly
        name: Patrick Ohly
  - dir: wg-device-management
    name: Device Management
    mission_statement: >
      Enable simple and efficient configuration, sharing, and allocation of accelerators
      and other specialized devices.

      [Additional context](https://groups.google.com/a/kubernetes.io/g/dev/c/YWXGXe07A5w/m/OqLvdQ47BQAJ)

    charter_link: charter.md
    stakeholder_sigs:
      - Architecture
      - Autoscaling
      - Network
      - Node
      - Scheduling
    label: device-management
    leadership:
      chairs:
        - github: johnbelamaric
          name: John Belamaric
          company: Google
        - github: klueska
          name: Kevin Klues
          company: NVIDIA
        - github: pohly
          name: Patrick Ohly
          company: Intel
    meetings:
      - description: Regular WG Meeting
        day: Tuesday
        time: "8:30"
        tz: PT (Pacific Time)
        frequency: biweekly
        url: TBD
        archive_url: https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?usp=sharing
        recordings_url: TBD
    contact:
      slack: wg-device-management
      mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-device-management
      liaison:
        github: pohly
        name: Patrick Ohly
  - dir: wg-lts
    name: LTS
    mission_statement: >
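
The new `wg-device-management` entry in this hunk still carries `TBD` placeholders for the meeting `url` and `recordings_url`, and the generated README renders them as `(TBD)` links. As a minimal sketch, and not part of the community tooling, a check like the one below could list such fields before a merge. The `find_placeholders` helper and the trimmed `entry` dict are hypothetical; the dict simply mirrors a few fields of the entry as plain Python data.

```python
def find_placeholders(node, path=""):
    """Return dotted paths of every value that is exactly the string "TBD".

    Hypothetical helper: walks nested dicts and lists as they would come
    out of a YAML parser.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            found.extend(find_placeholders(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}[{index}]"))
    elif node == "TBD":
        found.append(path)
    return found


# Trimmed copy of the working group entry, written as a Python literal
# so the sketch needs no YAML library.
entry = {
    "dir": "wg-device-management",
    "label": "device-management",
    "meetings": [
        {
            "description": "Regular WG Meeting",
            "day": "Tuesday",
            "time": "8:30",
            "tz": "PT (Pacific Time)",
            "frequency": "biweekly",
            "url": "TBD",
            "archive_url": "https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?usp=sharing",
            "recordings_url": "TBD",
        }
    ],
    "contact": {"slack": "wg-device-management"},
}

print(find_placeholders(entry))
```

For this entry the helper reports `meetings[0].url` and `meetings[0].recordings_url`, the two fields that need real links once the meeting is scheduled.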


@@ -0,0 +1,42 @@
<!---
This is an autogenerated file!
Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.
To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
--->
# Device Management Working Group
Enable simple and efficient configuration, sharing, and allocation of accelerators and other specialized devices.
[Additional context](https://groups.google.com/a/kubernetes.io/g/dev/c/YWXGXe07A5w/m/OqLvdQ47BQAJ)
The [charter](charter.md) defines the scope and governance of the Device Management Working Group.
## Stakeholder SIGs
* [SIG Architecture](/sig-architecture)
* [SIG Autoscaling](/sig-autoscaling)
* [SIG Network](/sig-network)
* [SIG Node](/sig-node)
* [SIG Scheduling](/sig-scheduling)
## Meetings
*Joining the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-device-management) for the group will typically add invites for the following meetings to your calendar.*
* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time)](TBD) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=8:30&tz=PT%20%28Pacific%20Time%29).
  * [Meeting notes and Agenda](https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?usp=sharing).
  * [Meeting recordings](TBD).
## Organizers
* John Belamaric (**[@johnbelamaric](https://github.com/johnbelamaric)**), Google
* Kevin Klues (**[@klueska](https://github.com/klueska)**), NVIDIA
* Patrick Ohly (**[@pohly](https://github.com/pohly)**), Intel
## Contact
- Slack: [#wg-device-management](https://kubernetes.slack.com/messages/wg-device-management)
- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-device-management)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fdevice-management)
- Steering Committee Liaison: Patrick Ohly (**[@pohly](https://github.com/pohly)**)
<!-- BEGIN CUSTOM CONTENT -->
<!-- END CUSTOM CONTENT -->
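
The README above is generated from the `sigs.yaml` entry. The actual generator lives under `generator/` in the community repository; the following is only a rough Python approximation of how the meeting bullet and the label link appear to be derived from the entry's fields, assuming the percent-encoding seen in the generated links. The function names `render_meeting_line` and `label_url` are hypothetical.

```python
from urllib.parse import quote


def render_meeting_line(meeting: dict) -> str:
    """Approximate the README bullet for one meeting entry."""
    # The time keeps its ":" while the time zone is fully percent-encoded,
    # matching the converter link in the generated README.
    converter = (
        "http://www.thetimezoneconverter.com/"
        f"?t={quote(meeting['time'], safe=':')}"
        f"&tz={quote(meeting['tz'], safe='')}"
    )
    return (
        f"* {meeting['description']}: "
        f"[{meeting['day']}s at {meeting['time']} {meeting['tz']}]"
        f"({meeting['url']}) ({meeting['frequency']}). "
        f"[Convert to your timezone]({converter})."
    )


def label_url(repo: str, label: str) -> str:
    """Build the issue-label link; the "/" in "wg/<label>" becomes %2F."""
    return f"https://github.com/{repo}/labels/{quote('wg/' + label, safe='')}"


meeting = {
    "description": "Regular WG Meeting",
    "day": "Tuesday",
    "time": "8:30",
    "tz": "PT (Pacific Time)",
    "frequency": "biweekly",
    "url": "TBD",
}

print(render_meeting_line(meeting))
print(label_url("kubernetes/community", "device-management"))
```

Because `url` is still `TBD` in the entry, the rendered bullet links to `TBD` until the field is filled in and the README is regenerated.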


@@ -0,0 +1,103 @@
# WG Device Management Charter
This charter adheres to the conventions described in the [Kubernetes Charter
README] and uses the Roles and Organization Management outlined in
[wg-governance].
## Scope
Enable simple and efficient configuration, sharing, and allocation of
accelerators and other specialized devices. This working group focuses on the
APIs, abstractions, and feature designs needed to configure, target, and share
the necessary hardware for both batch and serving (inference) workloads.
### In scope
- Enable efficient utilization of specialized hardware devices. This includes
sharing one or more resources effectively (many workloads sharing a pool of
devices), as well as sharing individual devices effectively (several workloads
dividing up a single device for sharing).
- Enable workload authors to specify “just enough” details about their workload
requirements to ensure it runs optimally, without having to understand exactly
how the infrastructure team has provisioned the cluster.
- Enable the scheduler to choose the correct place to run a workload the vast
majority of the time (rejections should be extremely rare).
- Enable cluster autoscalers and other node auto-provisioning components to
predict whether creating additional resources will satisfy workload needs,
before provisioning those resources.
- Enable the shift from “pods run on nodes” to “workloads consume capacity”.
This allows Kubernetes to provision sets of pods on top of sets of nodes and
specialized hardware, while taking into account the relationships between
those infrastructure components.
- Enable in-node devices as well as network-accessible devices.
- Minimize workload disruption due to hardware failures.
- Address fragmentation of accelerators due to fractional use.
- Additional problems that may be identified and deemed in scope as we gather
use cases and requirements from WG Serving, WG Batch, and other stakeholders.
- Address all of the above with a simple API that is a natural extension
of the existing Kubernetes APIs, and avoids or minimizes any transition
effort.
### Out of Scope
- Higher-level workload controller APIs (for example, the equivalent of
Deployment, StatefulSet, or DaemonSet) for specific types of workloads.
- General resource management requirements not related to devices.
## Deliverables
The WG will coordinate the delivery of KEPs and their implementations by the
participating SIGs. Interim artifacts will include documents capturing use
cases, requirements, and designs; however, all of those will eventually result
in KEPs and code owned by SIGs.

Specifically, we expect to need:
- APIs for publishing resource capacity of in-node and network-accessible
devices, as well as sample code to ease creation of drivers to populate this
information.
- APIs for specifying workload resource requirements with respect to devices.
- APIs, algorithms, and implementations for allocating access to and resources on devices, as well as
persisting the results of those allocations.
- APIs, algorithms, and implementations for allowing administrators to control
and govern access to devices.
## Stakeholders
- SIG Architecture
- SIG Autoscaling
- SIG Network
- SIG Node
- SIG Scheduling

Additionally, a broad set of end users, device vendors, cloud providers,
Kubernetes distribution providers, and ecosystem projects (particularly
autoscaling-related projects) have expressed interest in this effort. There are
five primary groups of stakeholders, and we expect multiple participants from each:
- Device vendors that manufacture accelerators and other specialized hardware
which they would like to make available to Kubernetes users.
- Kubernetes distribution and managed offering providers that would like to make
specialized hardware available to their users.
- Kubernetes ecosystem projects that help manage workloads utilizing these
accelerators (e.g., Karpenter, Kueue, Volcano).
- End user workload authors that will create workloads that take advantage of
the specialized hardware.
- Cluster administrators that operate and govern clusters containing the
specialized hardware.
## Roles and Organization Management
This working group adheres to the Roles and Organization Management outlined in
[wg-governance] and opts in to updates and modifications to [wg-governance].
## Exit Criteria
The working group will disband when the KEPs resulting from these discussions
have reached a terminal state. When the core functionality for dynamic resource
allocation (DRA) reaches GA, we will evaluate whether the working group should
be disbanded and any remaining KEPs be left to the management of their owning
SIGs.
[wg-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md