address feedback from review

Kevin Hannon 2025-01-20 10:10:53 -05:00
parent e478776ec6
commit cfd3cd8284
1 changed files with 30 additions and 18 deletions

@@ -18,7 +18,7 @@ We will breakdown our highlights into Sub Projects, KEPs, talks, community adopt
##### Kueue
Kueue has had 5 releases in 2024.
Kueue has had 5 minor releases in 2024.
- [Release 0.6](https://github.com/kubernetes-sigs/kueue/releases/tag/v0.6.0)
@@ -32,19 +32,25 @@ Kueue has had 5 releases in 2024.
In 2024, the kueue community would like to highlight are Topology aware scheduling, MultiKueue, Kueue Dashboard, KueueCtrl, Deployment/Statefulset integration for serving and Fair sharing.
Topology aware scheduling facilitates scheduling of workloads that take in account data center topology. Workloads benefit from using interconnects that are physically close together.
[Topology aware scheduling](https://kueue.sigs.k8s.io/docs/concepts/topology_aware_scheduling/) facilitates scheduling of workloads that take in account data center topology.
Workloads benefit from using interconnects that are physically close together.
MultiKueue provides a way of dispatching batch workloads to worker clusters. Kueue provides multicluster dispatching for popular batch workloads such as Ray, Job, Kubeflow and JobSet. This feature went beta in 0.9.
[MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) provides a way of dispatching batch workloads to worker clusters.
Kueue provides multicluster dispatching for popular batch workloads such as Ray, Job, Kubeflow and JobSet.
This feature went beta in 0.9.
Kueue Dashboards has been a popular ask for Kueue. Users would like to have a visualization representation of queueing and we are happy to announce that a dashboard has been created for Kueue. This went into kueue in late 2024 and a big focus of 2025 will be to harden this for production.
[Kueue Dashboards](https://github.com/kubernetes-sigs/kueue/tree/release-0.10/cmd/experimental/kueue-viz) has been a popular ask for Kueue.
Users would like to have a visualization representation of queueing and we are happy to announce that a dashboard has been created for Kueue.
This went into kueue in late 2024 and a big focus of 2025 will be to harden this for production.
KueueCtrl provides a cli for creating kueue objects. The plugin is hosted in krew and is easily installed as a kueue plugin.
[KueueCtl](https://kueue.sigs.k8s.io/docs/reference/kubectl-kueue/) provides a cli for creating kueue objects.
The plugin is hosted in krew and is easily installed as a kueue plugin.
Deployment/StatefulSet integration provides an avenue for the usage of Kueue for serving workloads. Serving leads to a need for sharing/preemption of model servers that may leverage accelerators. Kueue provides an integration with popular methods of deploying services (Deployment/StatefulSet).
[Deployment](https://kueue.sigs.k8s.io/docs/tasks/run/deployment/) and [StatefulSet](https://kueue.sigs.k8s.io/docs/tasks/run/statefulset/) integration provides an avenue for the usage of Kueue for serving workloads. Serving leads to a need for sharing/preemption of model servers that may leverage accelerators. Kueue provides an integration with popular methods of deploying services (Deployment/StatefulSet).
##### JobSet
Jobset has had 4 release in 2024.
Jobset has had 4 minor releases in 2024.
- [Release 0.4](https://github.com/kubernetes-sigs/jobset/releases/tag/v0.4.0)
@@ -54,14 +60,14 @@ Jobset has had 4 release in 2024.
- [Release 0.7](https://github.com/kubernetes-sigs/jobset/releases/tag/v0.7.0)
A major achievement of JobSet has been the adoption of JobSet as a component for Kubeflow Training Operator V2.
A major achievement of JobSet has been the adoption of JobSet as a component for [Kubeflow Training Operator](https://github.com/kubeflow/training-operator) V2.
There has been a collaborative effort with the Kubeflow community and the batch community to implement the features needed for this integration.
[Metaflow](https://github.com/Netflix/metaflow/pull/1804) has adopted the use of JobSet for distributed ML training.
##### KJob
[KJob](https://github.com/kubernetes-sigs/kjob?tab=readme-ov-file#kjob) has been started to provide a CLI friendly way for users to submit batch jobs.
[KJob](https://github.com/kubernetes-sigs/kjob) has been started to provide a CLI friendly way for users to submit batch jobs.
The HPC/ML community tend to prefer CLI over YAML so the focus was to provide a templated solution for submitting batch jobs.
Another focus of this project is to provide a smooth transition for Slurm users.
@@ -87,32 +93,39 @@ WG-Batch provided a series of kubernetes enhancements that improved the experien
### Talks
- WG-Batch Update at Kubecon
- Authors: Kevin Hannon and Marcin Wielgus
- Speakers: Kevin Hannon and Marcin Wielgus
- Kubecon NA, Salt Lake City
- [Recording](https://www.youtube.com/watch?v=C2ABOEzZTWg&list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&index=283&pp=iAQB)
- Keynote: MultiCluster Batch Jobs Dispatching with Kueue at CERN
- Authors: Ricardo Rocha and Marcin Wielgus
- Speakers: Ricardo Rocha and Marcin Wielgus
- Kubecon NA, Salt Lake City
- [Recording](https://www.youtube.com/watch?v=xMmskWIlktA&list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&index=193&pp=iAQB)
- Multitenancy and Fairness at Scale with Kueue: A Case Study
- Authors: Aldo Culquicondor and Rajat Phull
- Speakers: Aldo Culquicondor and Rajat Phull
- Kubecon NA, Salt Lake City
- [Recording](https://www.youtube.com/watch?v=GYiuTQCvTx8&list=PLj6h78yzYM2Mvqk_mNejD7kbe3tldxxsr&index=5&pp=iAQB)
- Advanced Resource Management for Running AI/ML Workloads with Kueue
- Authors: Michał Woźniak and Yuki Iwai
- Speakers: Michał Woźniak and Yuki Iwai
- Kubecon EU, Paris
- [Recording](https://www.youtube.com/watch?v=6k_8Go3u8Qk)
- Scale Your Batch / Big Data / AI Workloads Beyond the Kubernetes Scheduler
- Authors: Antonin Stefanutti and Anish Asthana
- Speaker: Antonin Stefanutti and Anish Asthana
- KubeCon EU, Paris
- [Recording](https://www.youtube.com/watch?v=Ij5EAnuF-jk&list=PLj6h78yzYM2PWGv34W6w5ssq1b1meRmY7&index=15&pp=iAQB)
- WG-Batch Update
- Author: Marcin Wielgus
- Speaker: Michał Woźniak and Yuki Iwai
- KubeCon EU, Paris
- [Recording](https://www.youtube.com/watch?v=2D2QSzUnS0M&list=PLj6h78yzYM2N8nw1YcqqKveySH6_0VnI0&index=84&pp=iAQB)
- How the Kubernetes Community is Improving Kubernetes for HPC/AI/ML Workloads
- Author: Kevin Hannon
- FOSDEM 2024
- [Recording](https://live.fosdem.org/watch/ua2118)
### Community adoption
@@ -130,9 +143,8 @@ Operational tasks in [wg-governance.md]:
- [x] WG leaders in [sigs.yaml] are accurate and active, and updated if needed
- [x] Meeting notes and recordings for 2024 are linked from [README.md] and updated/uploaded if needed
- [x] Updates provided to sponsoring SIGs in 2024
- WG-Batch Updates at Kubecon EU 2024
- WG-Batch Updates at Kubecon NA 2024
- [WG-Batch Updates at Kubecon EU 2024](https://www.youtube.com/watch?v=2D2QSzUnS0M&list=PLj6h78yzYM2N8nw1YcqqKveySH6_0VnI0&index=84&pp=iAQB)
- [WG-Batch Updates at Kubecon NA 2024](https://www.youtube.com/watch?v=C2ABOEzZTWg&list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&index=283&pp=iAQB)
[wg-governance.md]: https://git.k8s.io/community/committee-steering/governance/wg-governance.md
[README.md]: https://git.k8s.io/community/wg-batch/README.md
[sigs.yaml]: https://git.k8s.io/community/sigs.yaml