diff --git a/OWNERS_ALIASES b/OWNERS_ALIASES
index a7deedb12..a9e0b17f8 100644
--- a/OWNERS_ALIASES
+++ b/OWNERS_ALIASES
@@ -130,6 +130,10 @@ aliases:
   wg-data-protection-leads:
     - xing-yang
     - yuxiangqian
+  wg-device-management-leads:
+    - johnbelamaric
+    - klueska
+    - pohly
   wg-lts-leads:
     - jeremyrickard
     - liggitt
diff --git a/communication/slack-config/channels.yaml b/communication/slack-config/channels.yaml
index 8cb9f1349..8ae1973f5 100644
--- a/communication/slack-config/channels.yaml
+++ b/communication/slack-config/channels.yaml
@@ -111,7 +111,6 @@ channels:
   - name: digitalocean-k8s
   - name: distroless
   - name: diversity
-  - name: dra
   - name: draft-dev
   - name: draft-users
   - name: druid-operator
@@ -549,6 +548,8 @@ channels:
   - name: wg-component-standard-mentorship
     archived: true
   - name: wg-data-protection
+  - name: wg-device-management
+    id: C0409NGC1TK
   - name: wg-iot-edge
   - name: sig-k8s-infra
     id: CCK68P2Q2
diff --git a/liaisons.md b/liaisons.md
index 785036c39..e08a51d1c 100644
--- a/liaisons.md
+++ b/liaisons.md
@@ -57,6 +57,7 @@ members will assume one of the departing members groups.
 | [WG API Expression](wg-api-expression/README.md) | Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**) |
 | [WG Batch](wg-batch/README.md) | Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**) |
 | [WG Data Protection](wg-data-protection/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
+| [WG Device Management](wg-device-management/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
 | [WG LTS](wg-lts/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |
 | [WG Policy](wg-policy/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
 | [WG Structured Logging](wg-structured-logging/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |
diff --git a/sig-architecture/README.md b/sig-architecture/README.md
index da653feb0..9c75ccb0e 100644
--- a/sig-architecture/README.md
+++ b/sig-architecture/README.md
@@ -57,6 +57,7 @@ The Chairs of the SIG run operations and processes governing the SIG.

 The following [working groups][working-group-definition] are sponsored by sig-architecture:
 * [WG API Expression](/wg-api-expression)
+* [WG Device Management](/wg-device-management)
 * [WG LTS](/wg-lts)
 * [WG Policy](/wg-policy)
 * [WG Structured Logging](/wg-structured-logging)
diff --git a/sig-autoscaling/README.md b/sig-autoscaling/README.md
index cbfa21886..914233833 100644
--- a/sig-autoscaling/README.md
+++ b/sig-autoscaling/README.md
@@ -47,6 +47,7 @@ The Chairs of the SIG run operations and processes governing the SIG.

 The following [working groups][working-group-definition] are sponsored by sig-autoscaling:
 * [WG Batch](/wg-batch)
+* [WG Device Management](/wg-device-management)

 ## Subprojects

diff --git a/sig-list.md b/sig-list.md
index 4de468be5..7c7de17e7 100644
--- a/sig-list.md
+++ b/sig-list.md
@@ -64,6 +64,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
 |[API Expression](wg-api-expression/README.md)|[api-expression](https://github.com/kubernetes/kubernetes/labels/wg%2Fapi-expression)|* API Machinery<br>* Architecture<br>|* [Antoine Pelisse](https://github.com/apelisse), Google<br>* [Kevin Wiesmueller](https://github.com/kwiesmueller), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-api-expression)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-api-expression)|* Regular WG Meeting: [Tuesdays at 9:30 PT (Pacific Time) (biweekly)](https://zoom.us/j/94238112084)<br>
 |[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Aldo Culquicondor](https://github.com/alculquicondor), Google<br>* [Marcin Wielgus](https://github.com/mwielgus), Google<br>* [Maciej Szulik](https://github.com/soltysh), Red Hat<br>* [Swati Sehgal](https://github.com/swatisehgal), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)|* Regular Meeting ([Calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 15th 2024)s at 3PM CET (Central European Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>* Regular Meeting ([Calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 1st 2024)s at 3PM PT (Pacific Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>
 |[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>
+|[Device Management](wg-device-management/README.md)|[device-management](https://github.com/kubernetes/kubernetes/labels/wg%2Fdevice-management)|* Architecture<br>* Autoscaling<br>* Network<br>* Node<br>* Scheduling<br>|* [John Belamaric](https://github.com/johnbelamaric), Google<br>* [Kevin Klues](https://github.com/klueska), NVIDIA<br>* [Patrick Ohly](https://github.com/pohly), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-device-management)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-device-management)|* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time) (biweekly)](TBD)<br>
 |[LTS](wg-lts/README.md)|[lts](https://github.com/kubernetes/kubernetes/labels/wg%2Flts)|* Architecture<br>* Cluster Lifecycle<br>* K8s Infra<br>* Release<br>* Security<br>* Testing<br>|* [Jeremy Rickard](https://github.com/jeremyrickard), Microsoft<br>* [Jordan Liggitt](https://github.com/liggitt), Google<br>* [Micah Hausler](https://github.com/micahhausler), Amazon<br>|* [Slack](https://kubernetes.slack.com/messages/wg-lts)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-lts)|* Regular WG Meeting: [Tuesdays at 07:00 PT (Pacific Time) (biweekly)](https://zoom.us/j/92480197536?pwd=dmtSMGJRQmNYYTIyZkFlQ25JRngrdz09)<br>
 |[Policy](wg-policy/README.md)|[policy](https://github.com/kubernetes/kubernetes/labels/wg%2Fpolicy)|* Architecture<br>* Auth<br>* Multicluster<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Jim Bugwadia](https://github.com/JimBugwadia), Kyverno/Nirmata<br>* [Poonam Lamba](https://github.com/poonam-lamba), Google<br>* [Andy Suderman](https://github.com/sudermanjr), Fairwinds<br>|* [Slack](https://kubernetes.slack.com/messages/wg-policy)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-policy)|* Regular WG Meeting: [Wednesdays at 8:00 PT (Pacific Time) (semimonthly)](https://zoom.us/j/7375677271)<br>
 |[Structured Logging](wg-structured-logging/README.md)|[structured-logging](https://github.com/kubernetes/kubernetes/labels/wg%2Fstructured-logging)|* API Machinery<br>* Architecture<br>* Cloud Provider<br>* Instrumentation<br>* Network<br>* Node<br>* Scheduling<br>* Storage<br>|* [Mengjiao Liu](https://github.com/mengjiao-liu), DaoCloud<br>* [Patrick Ohly](https://github.com/pohly), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-structured-logging)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-structured-logging)|
diff --git a/sig-network/README.md b/sig-network/README.md
index d96ff386a..99283aa43 100644
--- a/sig-network/README.md
+++ b/sig-network/README.md
@@ -72,6 +72,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
 ## Working Groups

 The following [working groups][working-group-definition] are sponsored by sig-network:
+* [WG Device Management](/wg-device-management)
 * [WG Policy](/wg-policy)
 * [WG Structured Logging](/wg-structured-logging)

diff --git a/sig-node/README.md b/sig-node/README.md
index 2ec627344..4e0dc89ed 100644
--- a/sig-node/README.md
+++ b/sig-node/README.md
@@ -53,6 +53,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.

 The following [working groups][working-group-definition] are sponsored by sig-node:
 * [WG Batch](/wg-batch)
+* [WG Device Management](/wg-device-management)
 * [WG Policy](/wg-policy)
 * [WG Structured Logging](/wg-structured-logging)

diff --git a/sig-scheduling/README.md b/sig-scheduling/README.md
index edeb5b8a8..3901cf69f 100644
--- a/sig-scheduling/README.md
+++ b/sig-scheduling/README.md
@@ -63,6 +63,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.

 The following [working groups][working-group-definition] are sponsored by sig-scheduling:
 * [WG Batch](/wg-batch)
+* [WG Device Management](/wg-device-management)
 * [WG Policy](/wg-policy)
 * [WG Structured Logging](/wg-structured-logging)

diff --git a/sigs.yaml b/sigs.yaml
index b83619d06..bf5538e04 100644
--- a/sigs.yaml
+++ b/sigs.yaml
@@ -3345,6 +3345,48 @@ workinggroups:
   liaison:
     github: pohly
     name: Patrick Ohly
+- dir: wg-device-management
+  name: Device Management
+  mission_statement: >
+    Enable simple and efficient configuration, sharing, and allocation of accelerators
+    and other specialized devices.
+
+    [Additional context](https://groups.google.com/a/kubernetes.io/g/dev/c/YWXGXe07A5w/m/OqLvdQ47BQAJ)
+
+  charter_link: charter.md
+  stakeholder_sigs:
+  - Architecture
+  - Autoscaling
+  - Network
+  - Node
+  - Scheduling
+  label: device-management
+  leadership:
+    chairs:
+    - github: johnbelamaric
+      name: John Belamaric
+      company: Google
+    - github: klueska
+      name: Kevin Klues
+      company: NVIDIA
+    - github: pohly
+      name: Patrick Ohly
+      company: Intel
+  meetings:
+  - description: Regular WG Meeting
+    day: Tuesday
+    time: "8:30"
+    tz: PT (Pacific Time)
+    frequency: biweekly
+    url: TBD
+    archive_url: https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?usp=sharing
+    recordings_url: TBD
+  contact:
+    slack: wg-device-management
+    mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-device-management
+  liaison:
+    github: pohly
+    name: Patrick Ohly
 - dir: wg-lts
   name: LTS
   mission_statement: >
diff --git a/wg-device-management/README.md b/wg-device-management/README.md
new file mode 100644
index 000000000..633fe3163
--- /dev/null
+++ b/wg-device-management/README.md
@@ -0,0 +1,42 @@
+
+# Device Management Working Group
+
+Enable simple and efficient configuration, sharing, and allocation of accelerators and other specialized devices.
+[Additional context](https://groups.google.com/a/kubernetes.io/g/dev/c/YWXGXe07A5w/m/OqLvdQ47BQAJ)
+
+The [charter](charter.md) defines the scope and governance of the Device Management Working Group.
+
+## Stakeholder SIGs
+* [SIG Architecture](/sig-architecture)
+* [SIG Autoscaling](/sig-autoscaling)
+* [SIG Network](/sig-network)
+* [SIG Node](/sig-node)
+* [SIG Scheduling](/sig-scheduling)
+
+## Meetings
+*Joining the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-device-management) for the group will typically add invites for the following meetings to your calendar.*
+* Regular WG Meeting: [Tuesdays at 8:30 PT (Pacific Time)](TBD) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=8:30&tz=PT%20%28Pacific%20Time%29).
+  * [Meeting notes and Agenda](https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?usp=sharing).
+  * [Meeting recordings](TBD).
+
+## Organizers
+
+* John Belamaric (**[@johnbelamaric](https://github.com/johnbelamaric)**), Google
+* Kevin Klues (**[@klueska](https://github.com/klueska)**), NVIDIA
+* Patrick Ohly (**[@pohly](https://github.com/pohly)**), Intel
+
+## Contact
+- Slack: [#wg-device-management](https://kubernetes.slack.com/messages/wg-device-management)
+- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-device-management)
+- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fdevice-management)
+- Steering Committee Liaison: Patrick Ohly (**[@pohly](https://github.com/pohly)**)
+
+
+
diff --git a/wg-device-management/charter.md b/wg-device-management/charter.md
new file mode 100644
index 000000000..00155d881
--- /dev/null
+++ b/wg-device-management/charter.md
@@ -0,0 +1,103 @@
+# WG Device Management Charter
+
+This charter adheres to the conventions described in the [Kubernetes Charter
+README] and uses the Roles and Organization Management outlined in
+[wg-governance].
+
+## Scope
+
+Enable simple and efficient configuration, sharing, and allocation of
+accelerators and other specialized devices. This working group focuses on the
+APIs, abstractions, and feature designs needed to configure, target, and share
+the necessary hardware for both batch and serving (inference) workloads.
+
+### In scope
+
+- Enable efficient utilization of specialized hardware devices. This includes
+  sharing one or more resources effectively (many workloads sharing a pool of
+  devices), as well as sharing individual devices effectively (several workloads
+  dividing up a single device for sharing).
+- Enable workload authors to specify “just enough” details about their workload
+  requirements to ensure it runs optimally, without having to understand exactly
+  how the infrastructure team has provisioned the cluster.
+- Enable the scheduler to choose the correct place to run a workload the vast
+  majority of the time (rejections should be extremely rare).
+- Enable cluster autoscalers and other node auto-provisioning components to
+  predict whether creating additional resources will satisfy workload needs,
+  before provisioning those resources.
+- Enable the shift from “pods run on nodes” to “workloads consume capacity”.
+  This allows Kubernetes to provision sets of pods on top of sets of nodes and
+  specialized hardware, while taking into account the relationships between
+  those infrastructure components.
+- Enable in-node devices as well as network-accessible devices.
+- Minimize workload disruption due to hardware failures.
+- Address fragmentation of accelerators due to fractional use.
+- Additional problems that may be identified and deemed in scope as we gather
+  use cases and requirements from WG Serving, WG Batch, and other stakeholders.
+- Address all of the above with a simple API that is a natural extension
+  of the existing Kubernetes APIs, and avoids or minimizes any transition
+  effort.
+
+### Out of Scope
+
+- Higher-level workload controller APIs (for example, the equivalent of
+  Deployment, StatefulSet, or DaemonSet) for specific types of workloads.
+- General resource management requirements not related to devices.
+
+## Deliverables
+
+The WG will coordinate the delivery of KEPs and their implementations by the
+participating SIGs. Interim artifacts will include documents capturing use
+cases, requirements, and designs; however, all of those will eventually result
+in KEPs and code owned by SIGs.
+
+Specifically, we expect to need:
+
+- APIs for publishing resource capacity of in-node and network-accessible
+  devices, as well as sample code to ease creation of drivers to populate this
+  information.
+- APIs for specifying workload resource requirements with respect to devices.
+- APIs, algorithms, and implementations for allocating access to and resources on devices, as well as
+  persisting the results of those allocations.
+- APIs, algorithms, and implementations for allowing administrators to control
+  and govern access to devices.
+
+## Stakeholders
+
+- SIG Architecture
+- SIG Autoscaling
+- SIG Network
+- SIG Node
+- SIG Scheduling
+
+Additionally, a broad set of end users, device vendors, cloud providers,
+Kubernetes distribution providers, and ecosystem projects (particularly
+autoscaling-related projects) have expressed interest in this effort. There are
+five primary groups of stakeholders from each of which we expect multiple participants:
+
+- Device vendors that manufacture accelerators and other specialized hardware
+  which they would like to make available to Kubernetes users.
+- Kubernetes distribution and managed offering providers that would like to make
+  specialized hardware available to their users.
+- Kubernetes ecosystem projects that help manage workloads utilizing these
+  accelerators (e.g., Karpenter, Kueue, Volcano).
+- End user workload authors that will create workloads that take advantage of
+  the specialized hardware.
+- Cluster administrators that operate and govern clusters containing the
+  specialized hardware.
+
+## Roles and Organization Management
+
+This working group adheres to the Roles and Organization Management outlined in
+[wg-governance] and opts in to updates and modifications to [wg-governance].
+
+## Exit Criteria
+
+The working group will disband when the KEPs resulting from these discussions
+have reached a terminal state. When the core functionality for dynamic resource
+allocation (DRA) reaches GA, we will evaluate whether the working group should
+be disbanded and any remaining KEPs be left to the management of their owning
+SIGs.
+
+[wg-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
+[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md