<!---
This is an autogenerated file!
Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.
To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
--->
# Scalability Special Interest Group
SIG Scalability is responsible for defining and driving scalability goals for Kubernetes. We also coordinate and contribute to general system-wide scalability and performance improvements (those not falling within the charter of other individual SIGs) by driving large architectural changes and finding bottlenecks, and we provide guidance and consultation on all scalability- and performance-related aspects of Kubernetes. <br/> We are actively working on finding and removing various scalability bottlenecks, which should allow us to push the system's scalability higher. This may include going beyond 5k nodes in the future: although that is not our priority right now, it is very much within our area of interest, and we are happy to guide and collaborate on any efforts towards that goal as long as they do not sacrifice the overall Kubernetes architecture (e.g. by making it unmaintainable or hard to understand).
The [charter](charter.md) defines the scope and governance of the Scalability Special Interest Group.
## Meetings
*Joining the [mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-scale) for the group will typically add invites for the following meetings to your calendar.*
* Regular SIG Meeting: [Thursdays at 10:30 PT (Pacific Time)](https://zoom.us/j/94252896018?pwd=cTlMMlBoTHZqUEdjRm9VY2NWNUg5dz09) (bi-weekly ([upcoming meeting dates](#upcoming-meeting-dates))). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=10%3A30&tz=PT%20%28Pacific%20Time%29).
* [Meeting notes and Agenda](https://docs.google.com/a/bobsplanet.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?usp=drive_web).
* [Meeting recordings](https://www.youtube.com/watch?v=NDP1uYyom28&list=PL69nYSiGNLP2X-hzNTqyELU6jYS3p10uL).
## Leadership
### Chairs
The Chairs of the SIG run operations and processes governing the SIG.
* Marcel Zieba (**[@marseel](https://github.com/marseel)**), Isovalent
* David (Mengqi) Yu (**[@mengqiy](https://github.com/mengqiy)**), Amazon
### Technical Leads
The Technical Leads of the SIG establish new subprojects, decommission existing
subprojects, and resolve cross-subproject technical issues and decisions.
* Shyam Jeedigunta (**[@shyamjvs](https://github.com/shyamjvs)**), Amazon
* Wojciech Tyczynski (**[@wojtek-t](https://github.com/wojtek-t)**), Google
## Emeritus Leads
* Matt Matejczyk (**[@mm4tt](https://github.com/mm4tt)**)
## Contact
- Slack: [#sig-scalability](https://kubernetes.slack.com/messages/sig-scalability)
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-scale)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fscalability)
- GitHub Teams:
  - [@kubernetes/sig-scalability](https://github.com/orgs/kubernetes/teams/sig-scalability) - General Discussion
  - [@kubernetes/sig-scalability-leads](https://github.com/orgs/kubernetes/teams/sig-scalability-leads) - Leads
  - [@kubernetes/sig-scalability-pr-reviews](https://github.com/orgs/kubernetes/teams/sig-scalability-pr-reviews) - PR Reviews
- Steering Committee Liaison: Antonio Ojea (**[@aojea](https://github.com/aojea)**)
## Subprojects
The following [subprojects][subproject-definition] are owned by sig-scalability:
### inference-perf
[Described below](#inference-perf)
- **Owners:**
  - [kubernetes-sigs/inference-perf/heads/main](https://github.com/kubernetes-sigs/inference-perf/blob/refs/heads/main/OWNERS)
### kubernetes-scalability-and-performance-tests-and-validation
[Described below](#kubernetes-scalability-and-performance-tests-and-validation)
- **Owners:**
  - [kubernetes/community/sig-scalability/processes](https://github.com/kubernetes/community/blob/master/sig-scalability/processes/OWNERS)
  - [kubernetes/kubernetes/test/e2e/scalability](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/scalability/OWNERS)
### kubernetes-scalability-bottlenecks-detection
[Described below](#kubernetes-scalability-bottlenecks-detection)
- **Owners:**
  - [kubernetes/community/sig-scalability/blogs](https://github.com/kubernetes/community/blob/master/sig-scalability/blogs/OWNERS)
### kubernetes-scalability-definition
[Described below](#kubernetes-scalability-definition)
- **Owners:**
  - [kubernetes/community/sig-scalability/configs-and-limits](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/OWNERS)
  - [kubernetes/community/sig-scalability/slos](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/OWNERS)
### kubernetes-scalability-governance
[Described below](#kubernetes-scalability-governance)
- **Owners:**
  - [kubernetes/community/sig-scalability/governance](https://github.com/kubernetes/community/blob/master/sig-scalability/governance/OWNERS)
### kubernetes-scalability-test-frameworks
[Described below](#kubernetes-scalability-test-frameworks)
- **Owners:**
  - [kubernetes/kubernetes/cluster/images/kubemark](https://github.com/kubernetes/kubernetes/blob/master/cluster/images/kubemark/OWNERS)
  - [kubernetes/kubernetes/cmd/kubemark](https://github.com/kubernetes/kubernetes/blob/master/cmd/kubemark/OWNERS)
  - [kubernetes/kubernetes/pkg/kubemark](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubemark/OWNERS)
  - [kubernetes/kubernetes/test/kubemark](https://github.com/kubernetes/kubernetes/blob/master/test/kubemark/OWNERS)
  - [kubernetes/perf-tests](https://github.com/kubernetes/perf-tests/blob/master/OWNERS)
  - [kubernetes/perf-tests/clusterloader2](https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/OWNERS)

[subproject-definition]: https://github.com/kubernetes/community/blob/master/governance.md#subprojects
[working-group-definition]: https://github.com/kubernetes/community/blob/master/governance.md#working-groups
<!-- BEGIN CUSTOM CONTENT -->
# Scalability Regression - Contact Points
SIG Scalability has established a best-effort on-call rotation operating in
CEST/CET business hours (~9:00-18:00). If you have any inquiries about
scalability regressions (e.g. the status of a regression, or whether it should
block the release), please reach out to the current on-call engineer, who can be
found at https://go.k8s.io/oncall.

You can also contact the following SIG members for status updates:
* Antoni Zawody (**[@tosi3k](https://github.com/tosi3k)**), Google
* Jakub Przychodzeń (**[@jprzychodzen](https://github.com/jprzychodzen)**), Google
* Maciej Borsz (**[@mborsz](https://github.com/mborsz)**), Google
* Marcel Zięba (**[@marseel](https://github.com/marseel)**), Google
* Wojciech Tyczynski (**[@wojtek-t](https://github.com/wojtek-t)**), Google
## Upcoming Meeting Dates
Check out [this calendar](https://calendar.google.com/calendar/embed?src=90g85fajsmubf5vp02uhpbvcq8%40group.calendar.google.com) for upcoming meeting dates.
You can use [this link](https://calendar.google.com/calendar?cid=OTBnODVmYWpzbXViZjV2cDAydWhwYnZjcThAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ) to add it to your personal Google calendar.
# Details about SIG-Scalability sub-projects
## Kubernetes scalability definition
Defining what it means to say that "Kubernetes scales".

This includes defining (or approving) individual performance- and
scalability-related SLIs/SLOs, ensuring they are all user-experience oriented
and consistent with each other.

Measuring and publishing the limits within which Kubernetes is expected to scale
(as defined above), and providing recommendations on how to set up clusters in a
scalable and performant way.
* [Kubernetes Scalability SLIs/SLOs](./slos/slos.md).
## Kubernetes scalability governance
Establishing and documenting best practices on how to design and implement
Kubernetes features in a scalable and performant way.
Educating contributors and ensuring these best practices are widely followed.
* [Regressions case study](./governance/scalability-regressions-case-studies.md)
* [Scalability Regressions and Bugs](https://docs.google.com/document/d/1_mqv_T7i5k7_HgcQihEuFdq7ZCIf3AAGyAo9axzdAGI/edit)
## Kubernetes scalability bottlenecks detection
Detecting and documenting scalability bottlenecks and limitations, and driving
the architectural changes needed to eliminate them (where required), either in
collaboration with other SIGs or by delegating non-cross-cutting improvements
directly to individual SIGs.
* [Scalability issues with Services](./blogs/k8s-services-scalability-issues.md)
## Kubernetes scalability test frameworks
Designing and creating frameworks to make scalability and performance testing
of Kubernetes easy and accessible to all contributors.
Different frameworks may help with different aspects of scalability testing,
enabling conscious tradeoffs, e.g. cost vs. accuracy, or real-life vs. more
generalized benchmarking scenarios.
* [Cluster Loader v2](https://github.com/kubernetes/perf-tests/tree/master/clusterloader2)
* [Kubemark](https://github.com/kubernetes/kubernetes/blob/master/cmd/kubemark)
## Kubernetes scalability and performance tests and validation
Ensuring that all tests necessary to validate Kubernetes scalability and
performance exist (ideally by providing easy-to-use frameworks and working
with SIGs to write them), and that the environment and resources to run them are available:
* [Official tests](https://github.com/kubernetes/perf-tests/tree/master/clusterloader2/testing)
* [Testgrid](https://testgrid.k8s.io/sig-scalability)
* [Perfdash](https://perf-dash.k8s.io/)

Ensuring that tests are executed on schedule and that each official Kubernetes
release satisfies all scalability and performance requirements stated in the
"Kubernetes scalability definition".
This also includes designing processes to reduce maintenance work and the number
of scalability and performance regressions:
* [Processes](https://github.com/kubernetes/community/tree/master/sig-scalability/processes)
## Inference perf
TODO: to fill in.
<!-- END CUSTOM CONTENT -->