Merge branch 'master' into fix/helm-incu

Commit: 03ecf9eb26

CLA.md
@@ -1,30 +1,70 @@
### How do I sign the CNCF CLA?
# The Contributor License Agreement

* To sign up as an individual or as an employee of a signed organization, go to https://identity.linuxfoundation.org/projects/cncf
* To sign up as an organization, go to https://identity.linuxfoundation.org/node/285/organization-signup
* To review the CNCF CLA, go to https://github.com/cncf/cla
The [Cloud Native Computing Foundation][CNCF] defines the legal status of the
contributed code in a _Contributor License Agreement_ (CLA).

***
Only original source code from CLA signatories can be accepted into kubernetes.

### After you select one of the options above, please follow the instructions below:
This policy does not apply to [third_party] and [vendor].

**Step 1**: You must sign in with GitHub.
## How do I sign?

**Step 2**: If you are signing up as an employee, you must use your official person@organization.domain email address in the CNCF account registration page.
#### 1. Read

* [CLA for individuals] to sign up as an individual or as an employee of a signed organization.
* [CLA for corporations] to sign as a corporation representative and manage signups from your organization.

#### 2. Sign in with GitHub.

**Step 3**: The email you use on your commits (https://help.github.com/articles/setting-your-email-in-git/) must match the email address you use when signing up for the CNCF account.
Click
* [Individual signup] to sign up as an individual or as an employee of a signed organization.
* [Corp signup] to sign as a corporation representative and manage signups from your organization.

Either signup form looks like this:



#### 3. Enter the correct E-mail address to validate!

The address entered on the form must meet two constraints:

* It __must match__ your [git email] (the output of `git config user.email`)
or your PRs will not be approved!

* It must be your official `person@organization.com` address if you signed up
as an employee of said organization.




#### 4. Look for an email indicating successful signup.

**Step 4**: Once the CLA sent to your email address is signed (or your email address is verified in case your organization has signed the CLA), you should be able to check that you are authorized in any new PR you create.
> The Linux Foundation
>
> Hello,
>
> You have signed CNCF Individual Contributor License Agreement.
> You can see your document anytime by clicking View on HelloSign.
>

Once you have this, the CLA authorizer bot will authorize your PRs.



**Step 5**: The status on your old PRs will be updated when any new comment is made on it.

### I'm having issues with signing the CLA.
## Troubleshooting

If you're facing difficulty with signing the CNCF CLA, please explain your case on https://github.com/kubernetes/kubernetes/issues/27796 and we (@sarahnovotny and @foxish), along with the CNCF will help sort it out.
If you have signup trouble, please explain your case on
the [CLA signing issue] and we (@sarahnovotny and @foxish),
along with the [CNCF] will help sort it out.

Another option: ask for help at `helpdesk@rt.linuxfoundation.org`.

[CNCF]: https://www.cncf.io/community
[CLA signing issue]: https://github.com/kubernetes/kubernetes/issues/27796
[CLA for individuals]: https://github.com/cncf/cla/blob/master/individual-cla.pdf
[CLA for corporations]: https://github.com/cncf/cla/blob/master/corporate-cla.pdf
[Corp signup]: https://identity.linuxfoundation.org/node/285/organization-signup
[Individual signup]: https://identity.linuxfoundation.org/projects/cncf
[git email]: https://help.github.com/articles/setting-your-email-in-git
[third_party]: https://github.com/kubernetes/kubernetes/tree/master/third_party
[vendor]: https://github.com/kubernetes/kubernetes/tree/master/vendor
@@ -1,16 +1,30 @@
# Contributing guidelines
# Contributing to the community repo

This project is for documentation about the community. To contribute to one of
the Kubernetes projects please see the contribution guide for that project.

## How To Contribute
Contributions to this community repository follow a
[pull request](https://help.github.com/articles/using-pull-requests/) (PR)
model:

The contributions here follow a [pull request](https://help.github.com/articles/using-pull-requests/) model with some additional process.
The process is as follows:
#### 1. Submit a PR with your change

1. Submit a pull request with the requested change.
2. Another person, other than a Special Interest Group (SIG) owner, can mark it Looks Good To Me (LGTM) upon successful review. Otherwise feedback can be given.
3. A SIG owner can merge someone else's change into their SIG documentation immediately.
4. Someone cannot immediately merge their own change. To merge your own change, wait 24 hours during the week or 72 hours over a weekend. This allows others the opportunity to review a change.
#### 2. Get an LGTM.

_Note, the SIG Owners decide on the layout for their own sub-directory structure._
Upon successful review, someone will give the PR
an __LGTM__ (_looks good to me_) in the review thread.

#### 3. Allow time for others to see it

Once you have an __LGTM__, please wait 24 hours during
the week or 72 hours over a weekend before you
merge it, to give others (besides your initial reviewer)
time to see it.

__That said, a [SIG lead](sig-list.md) may shortcut this by merging
someone else's change into their SIG's documentation
at any time.__

Edits in SIG sub-directories should follow structure and guidelines set
by the respective SIG leads - see `CONTRIBUTING` instructions in subdirectories.
README.md
@@ -1,71 +1,86 @@
# Kubernetes Community Documentation
# Kubernetes Community

Welcome to the Kubernetes community documentation. Here you can learn about what's happening in the community.
Welcome to the Kubernetes community!

## Slack Chat
This is the starting point for becoming a contributor - improving docs, improving code, giving talks etc.
## Communicating

Kubernetes uses [Slack](http://slack.com) for community discussions.
General communication channels - e.g. filing issues, chat, mailing lists and
conferences are listed on the [communication](communication.md) page.

**Join**: Joining is self-service. Go to [slack.k8s.io](http://slack.k8s.io) to join.
For more specific topics, try a SIG.

**Access**: Once you join, the team can be found at [kubernetes.slack.com](http://kubernetes.slack.com)
## SIGs

**Archives**: Discussions on most channels are archived at [kubernetes.slackarchive.io](http://kubernetes.slackarchive.io). Start archiving by inviting the slackarchive bot to a channel via `/invite @slackarchive`
Kubernetes is a set of projects, each shepherded by a special interest group (SIG).

A first step to contributing is to pick from the [list of kubernetes SIGs](sig-list.md).

To add new channels, contact one of the admins. Currently that includes briangrant, goltermann, jbeda, sarahnovotny and thockin.
A SIG can have its own policy for contribution,
described in a `README` or `CONTRIBUTING` file in the SIG
folder in this repo (e.g. [sig-cli/contributing](sig-cli/contributing.md)),
and its own mailing list, slack channel, etc.

## How Can I help?

## kubernetes mailing lists
Documentation (like the text you are reading now) can
always use improvement!

Many important announcements and discussions end up on the main development group.
There's a [semi-curated list of issues][help-wanted]
that should not need deep knowledge of the system.

kubernetes-dev@googlegroups.com
To dig deeper, read a design doc, e.g. [architecture].

[Google Group](https://groups.google.com/forum/#!forum/kubernetes-dev)
[Pick a SIG](sig-list.md), peruse its associated [cmd] directory,
find a `main()` and read code until you find something you want to fix.

Users of kubernetes trade notes on:
There's always code that can be clarified and variables
or functions that can be renamed or commented.

kubernetes-users@googlegroups.com
There's always a need for more test coverage.

[Google Group](https://groups.google.com/forum/#!forum/kubernetes-users)
## Learn to Build

Links in [contributors/devel/README.md](contributors/devel/README.md)
lead to many relevant topics, including
* [Developer's Guide] - how to start a build/test cycle
* [Collaboration Guide] - how to work together
* [expectations] - what the community expects
* [pull request] policy - how to prepare a pull request

## Making a Pull Request

We recommend that you work on existing issues before attempting
to [develop a new feature].

Find an existing issue (e.g. one marked [help-wanted], or simply
ask a SIG lead for suggestions), and respond on the issue thread
expressing interest in working on it.

This helps other people know that the issue is active, and
hopefully prevents duplicated efforts.

Before submitting a pull request, sign the [CLA].

If you want to work on a new idea of relatively small scope:

1. Submit an issue describing your proposed change to the repo in question.
1. The repo owners will respond to your issue promptly.
1. If your proposed change is accepted,
sign the [CLA],
and start work in your fork.
1. Submit a [pull request] containing a tested change.

## [Weekly Community Video Conference](community/README.md)
[architecture]: https://github.com/kubernetes/kubernetes/blob/master/docs/design/architecture.md
[cmd]: https://github.com/kubernetes/kubernetes/tree/master/cmd
[CLA]: cla.md
[Collaboration Guide]: contributors/devel/development.md
[Developer's Guide]: contributors/devel/development.md
[develop a new feature]: https://github.com/kubernetes/features
[expectations]: contributors/devel/community-expectations.md
[help-wanted]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Ahelp-wanted
[pull request]: contributors/devel/pull-requests.md

The [weekly community meeting](https://zoom.us/my/kubernetescommunity) provides an opportunity for the different SIGs, WGs and other parts of the community to come together. More information about joining the weekly community meeting is available on our [agenda working document](https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#)
[]()

## Special Interest Groups (SIG) and Working Groups

Much of the community activity is organized into a community meeting, numerous SIGs and time-bounded WGs. SIGs follow these [guidelines](governance.md) although each of these groups may operate a little differently depending on their needs and workflow. Each group's material is in its subdirectory in this project.

The community meeting calendar is available as an [iCal to subscribe to](https://calendar.google.com/calendar/ical/cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com/public/basic.ics) (simply copy and paste the url into any calendar product that supports the iCal format) or [html to view](https://calendar.google.com/calendar/embed?src=cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com&ctz=America/Los_Angeles).

| Name | Leads | Group | Slack Channel | Meetings |
|------|-------|-------|---------------|----------|
| [API Machinery](sig-api-machinery/README.md) | [@lavalamp (Daniel Smith, Google)](https://github.com/lavalamp) <br> [@deads2k (David Eads, Red Hat)] (https://github.com/orgs/kubernetes/people/deads2k)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-api-machinery) | [#sig-api-machinery](https://kubernetes.slack.com/messages/sig-api-machinery/) | [Every other Wednesday at 11:00 AM PST](https://staging.talkgadget.google.com/hangouts/_/google.com/kubernetes-sig) |
| [Apps](sig-apps/README.md) | [@michelleN (Michelle Noorali, Deis)](https://github.com/michelleN)<br>[@mattfarina (Matt Farina, HPE)](https://github.com/mattfarina) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-apps) | [#sig-apps](https://kubernetes.slack.com/messages/sig-apps) | [Mondays 9:00AM PST](https://zoom.us/j/4526666954) |
| [Auth](sig-auth/README.md) | [@ erictune (Eric Tune, Google)](https://github.com/erictune)<br> [@ericchiang (Eric Chiang, CoreOS)](https://github.com/orgs/kubernetes/people/ericchiang)<br> [@liggitt (Jordan Liggitt, Red Hat)] (https://github.com/orgs/kubernetes/people/liggitt) <br> [@deads2k (David Eads, Red Hat)] (https://github.com/orgs/kubernetes/people/deads2k) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-auth) | [#sig-auth](https://kubernetes.slack.com/messages/sig-auth/) | Biweekly [Wednesdays at 1100 to 1200 PT](https://zoom.us/my/k8s.sig.auth) |
| [Autoscaling](sig-autoscaling/README.md) | [@fgrzadkowski (Filip Grządkowski, Google)](https://github.com/fgrzadkowski)<br> [@directxman12 (Solly Ross, Red Hat)](https://github.com/directxman12) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-autoscaling) | [#sig-autoscaling](https://kubernetes.slack.com/messages/sig-autoscaling/) | Biweekly (or triweekly) on [Thurs at 0830 PT](https://plus.google.com/hangouts/_/google.com/k8s-autoscaling) |
| [AWS](sig-aws/README.md) | [@justinsb (Justin Santa Barbara)](https://github.com/justinsb)<br>[@kris-nova (Kris Nova)](https://github.com/kris-nova)<br>[@chrislovecnm (Chris Love)](https://github.com/chrislovecnm)<br>[@mfburnett (Mackenzie Burnett)](https://github.com/mfburnett) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) | [#sig-aws](https://kubernetes.slack.com/messages/sig-aws/) | We meet on [Zoom](https://zoom.us/my/k8ssigaws), and the calls are scheduled via the official [group mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) |
| [Big Data](sig-big-data/README.md) | [@zmerlynn (Zach Loafman, Google)](https://github.com/zmerlynn)<br>[@timothysc (Timothy St. Clair, Red Hat)](https://github.com/timothysc)<br>[@wattsteve (Steve Watt, Red Hat)](https://github.com/wattsteve) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data) | [#sig-big-data](https://kubernetes.slack.com/messages/sig-big-data/) | Suspended |
| [CLI](sig-cli/README.md) | [@fabianofranz (Fabiano Franz, Red Hat)](https://github.com/fabianofranz)<br>[@pwittrock (Phillip Wittrock, Google)](https://github.com/pwittrock)<br>[@AdoHe (Tony Ado, Alibaba)](https://github.com/AdoHe) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cli) | [#sig-cli](https://kubernetes.slack.com/messages/sig-cli) | Bi-weekly Wednesdays at 9:00 AM PT on [Zoom](https://zoom.us/my/sigcli) |
| [Cluster Lifecycle](sig-cluster-lifecycle/README.md) | [@lukemarsden (Luke Marsden, Weave)] (https://github.com/lukemarsden) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-lifecycle) | [#sig-cluster-lifecycle](https://kubernetes.slack.com/messages/sig-cluster-lifecycle) | Tuesdays at 09:00 AM PST on [Zoom](https://zoom.us/j/166836624) |
| [Cluster Ops](sig-cluster-ops/README.md) | [@zehicle (Rob Hirschfeld, RackN)](https://github.com/zehicle) <br> [@mikedanese (Mike Danese, Google] (https://github.com/mikedanese) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-ops) | [#sig-cluster-ops](https://kubernetes.slack.com/messages/sig-cluster-ops) | Thursdays at 1:00 PM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/sig-cluster-ops)|
| [Contributor Experience](sig-contribx/README.md) | [@grodrigues3 (Garrett Rodrigues, Google)](https://github.com/Grodrigues3) <br> [@pwittrock (Phillip Witrock, Google)] (https://github.com/pwittrock) <br> [@Phillels (Elsie Phillips, CoreOS)](https://github.com/Phillels) | [Group](https://groups.google.com/forum/#!forum/kubernetes-wg-contribex) | [#wg-contribex] (https://kubernetes.slack.com/messages/wg-contribex) | Biweekly Wednesdays 9:30 AM PST on [zoom] (https://zoom.us/j/4730809290) |
| [Docs] (sig-docs/README.md) | [@pwittrock (Philip Wittrock, Google)] (https://github.com/pwittrock) <br> [@devin-donnelly (Devin Donnelly, Google)] (https://github.com/devin-donnelly) <br> [@jaredbhatti (Jared Bhatti, Google)] (https://github.com/jaredbhatti)| [Group] (https://groups.google.com/forum/#!forum/kubernetes-sig-docs) | [#sig-docs] (https://kubernetes.slack.com/messages/sig-docs) | Tuesdays @ 10:30AM PST on [Zoom](https://zoom.us/j/4730809290) |
| [Federation](sig-federation/README.md) | [@csbell (Christian Bell, Google)](https://github.com/csbell) <br> [@quinton-hoole (Quinton Hoole, Huawei)](https://github.com/quinton-hoole) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-federation) | [#sig-federation](https://kubernetes.slack.com/messages/sig-federation/) | Bi-weekly on Monday at 9:00 AM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/ubernetes) |
| [Instrumentation](sig-instrumentation/README.md) | [@piosz (Piotr Szczesniak, Google)](https://github.com/piosz) <br> [@fabxc (Fabian Reinartz, CoreOS)](https://github.com/fabxc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-instrumentation) | [#sig-instrumentation](https://kubernetes.slack.com/messages/sig-instrumentation) | [Thursdays at 9.30 AM PST](https://zoom.us/j/5342565819) |
| [Network](sig-network/README.md) | [@thockin (Tim Hockin, Google)](https://github.com/thockin)<br> [@dcbw (Dan Williams, Red Hat)](https://github.com/dcbw)<br> [@caseydavenport (Casey Davenport, Tigera)](https://github.com/caseydavenport) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-network) | [#sig-network](https://kubernetes.slack.com/messages/sig-network/) | Thursdays at 2:00 PM PST on [Zoom](https://zoom.us/j/5806599998) |
| [Node](sig-node/README.md) | [@dchen1107 (Dawn Chen, Google)](https://github.com/dchen1107)<br>[@euank (Euan Kemp, CoreOS)](https://github.com/orgs/kubernetes/people/euank)<br>[@derekwaynecarr (Derek Carr, Red Hat)](https://github.com/derekwaynecarr) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-node) | [#sig-node](https://kubernetes.slack.com/messages/sig-node/) | [Tuesdays at 10:00 PT](https://plus.google.com/hangouts/_/google.com/sig-node-meetup?authuser=0) |
| [On Prem](sig-onprem/README.md) | [@josephjacks (Joseph Jacks, Apprenda)] (https://github.com/josephjacks) <br> [@zen (Tomasz Napierala, Mirantis)] (https://github.com/zen)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-on-prem) | [#sig-onprem](https://kubernetes.slack.com/messages/sig-onprem/) | Every second Wednesday at 8 PM PST / 11 PM EST |
| [OpenStack](sig-openstack/README.md) | [@idvoretskyi (Ihor Dvoretskyi, Mirantis)] (https://github.com/idvoretskyi) <br> [@xsgordon (Steve Gordon, Red Hat)] (https://github.com/xsgordon)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack) | [#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/) | Every second Wednesday at 5 PM PDT / 2 PM EDT |
| [PM](project-managers/README.md) | [] ()| [Group](https://groups.google.com/forum/#!forum/kubernetes-pm) | []() | TBD|
| [Rktnetes](sig-rktnetes/README.md) | [@euank (Euan Kemp, CoreOS)] (https://github.com/euank) <br> [@tmrts (Tamer Tas)] (https://github.com/tmrts) <br> [@yifan-gu (Yifan Gu, CoreOS)] (https://github.com/yifan-gu) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-rktnetes) | [#sig-rktnetes](https://kubernetes.slack.com/messages/sig-rktnetes/) | [As needed (ad-hoc)](https://zoom.us/j/830298957) |
| [Scalability](sig-scalability/README.md) | [@lavalamp (Daniel Smith, Google)](https://github.com/lavalamp)<br>[@countspongebob (Bob Wise, Samsung SDS)](https://github.com/countspongebob)<br>[@jbeda (Joe Beda)](https://github.com/jbeda) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scale) | [#sig-scale](https://kubernetes.slack.com/messages/sig-scale/) | [Thursdays at 09:00 PT](https://zoom.us/j/989573207) |
| [Scheduling](sig-scheduling/README.md) | [@davidopp (David Oppenheimer, Google)](https://github.com/davidopp)<br>[@timothysc (Timothy St. Clair, Red Hat)](https://github.com/timothysc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scheduling) | [#sig-scheduling](https://kubernetes.slack.com/messages/sig-scheduling/) | Alternate between Mondays at 1 PM PT and Wednesdays at 12:30 AM PT on [Zoom](https://zoom.us/zoomconference?m=rN2RrBUYxXgXY4EMiWWgQP6Vslgcsn86) |
| [Service Catalog](sig-service-catalog/README.md) | [@pmorie (Paul Morie, Red Hat)](https://github.com/pmorie) <br> [@arschles (Aaron Schlesinger, Deis)](github.com/arschles) <br> [@bmelville (Brendan Melville, Google)](https://github.com/bmelville) <br> [@duglin (Doug Davis, IBM)](https://github.com/duglin)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-service-catalog) | [#sig-service-catalog](https://kubernetes.slack.com/messages/sig-service-catalog/) | [Mondays at 1 PM PST](https://zoom.us/j/7201225346) |
| [Storage](sig-storage/README.md) | [@saad-ali (Saad Ali, Google)](https://github.com/saad-ali)<br>[@childsb (Brad Childs, Red Hat)](https://github.com/childsb) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-storage) | [#sig-storage](https://kubernetes.slack.com/messages/sig-storage/) | Bi-weekly Thursdays 9 AM PST (or more frequently) on [Zoom](https://zoom.us/j/614261834) |
| [Testing](sig-testing/README.md) | [@spiffxp (Aaron Crickenberger, Samsung)](https://github.com/spiffxp)<br>[@ixdy (Jeff Grafton, Google)](https://github.com/ixdy) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-testing) | [#sig-testing](https://kubernetes.slack.com/messages/sig-testing/) | [Tuesdays at 9:30 AM PT](https://zoom.us/j/553910341) |
| [UI](sig-ui/README.md) | [@romlein (Dan Romlein, Apprenda)](https://github.com/romlein)<br> [@bryk (Piotr Bryk, Google)](https://github.com/bryk) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-ui) | [#sig-ui](https://kubernetes.slack.com/messages/sig-ui/) | Wednesdays at 4:00 PM CEST |
| [Windows](sig-windows/README.md) | [@michmike77 (Michael Michael, Apprenda)](https://github.com/michmike)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-windows) | [#sig-windows](https://kubernetes.slack.com/messages/sig-windows) | Bi-weekly Tuesdays at 10:00 AM PT |
### [How to start a SIG](sig-creation-procedure.md)

@@ -0,0 +1,97 @@
# Communication

The Kubernetes community abides by the [CNCF code of conduct]. Here is an excerpt:

> _As contributors and maintainers of this project, and in the interest
> of fostering an open and welcoming community, we pledge to respect
> all people who contribute through reporting issues, posting feature
> requests, updating documentation, submitting pull requests or patches,
> and other activities._

## SIGs

Kubernetes encompasses many projects, organized into [SIGs](sig-list.md).
Some communication has moved into SIG-specific channels - see
a given SIG subdirectory for details.

Nevertheless, below is a list of many general channels, groups
and meetings devoted to Kubernetes.

## Social Media

* [Twitter]
* [Google+]
* [blog]
* Pose questions and help answer them on [Slack][slack.k8s.io] or [Stack Overflow].

Most real time discussion happens at [kubernetes.slack.com];
you can sign up at [slack.k8s.io].

Discussions on most channels are archived at [kubernetes.slackarchive.io].
Start archiving by inviting the _slackarchive_ bot to a
channel via `/invite @slackarchive`.
To add new channels, contact one of the admins
(briangrant, goltermann, jbeda, sarahnovotny and thockin).

## Issues

If you have a question about Kubernetes or have a problem using it,
please start with the [troubleshooting guide].

If that doesn't answer your questions, or if you think you found a bug,
please [file an issue].

## Mailing lists

Development announcements and discussions appear on the Google group
[kubernetes-dev] (send mail to `kubernetes-dev@googlegroups.com`).

Users trade notes on the Google group
[kubernetes-users] (send mail to `kubernetes-users@googlegroups.com`).

## Weekly Meeting

We have a PUBLIC and RECORDED [weekly meeting] every Thursday at 10am US Pacific Time.

Map that to your local time with this [timezone table].

See it on the web at [calendar.google.com], or paste this [iCal url] into any iCal client.

To be added to the calendar items, join the Google group
[kubernetes-community-video-chat] for further instructions.

If you have a topic you'd like to present or would like to see discussed,
please propose a specific date on the [Kubernetes Community Meeting Agenda].

## Conferences

* [kubecon]
* [cloudnativecon]

[blog]: http://blog.kubernetes.io
[calendar.google.com]: https://calendar.google.com/calendar/embed?src=cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com&ctz=America/Los_Angeles
[cloudnativecon]: http://events.linuxfoundation.org/events/cloudnativecon
[CNCF code of conduct]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md
[communication]: https://github.com/kubernetes/community/blob/master/communication.md
[community meeting]: https://github.com/kubernetes/community/blob/master/communication.md#weekly-meeting
[file an issue]: https://github.com/kubernetes/kubernetes/issues/new
[Google+]: https://plus.google.com/u/0/b/116512812300813784482/116512812300813784482
[iCal url]: https://calendar.google.com/calendar/ical/cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com/public/basic.ics
[kubecon]: http://events.linuxfoundation.org/events/kubecon
[Kubernetes Community Meeting Agenda]: https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#
[kubernetes-community-video-chat]: https://groups.google.com/forum/#!forum/kubernetes-community-video-chat
[kubernetes-dev]: https://groups.google.com/forum/#!forum/kubernetes-dev
[kubernetes-users]: https://groups.google.com/forum/#!forum/kubernetes-users
[kubernetes.slackarchive.io]: http://kubernetes.slackarchive.io
[kubernetes.slack.com]: http://kubernetes.slack.com
[Special Interest Group]: https://github.com/kubernetes/community/blob/master/README.md#SIGs
[slack.k8s.io]: http://slack.k8s.io
[Stack Overflow]: http://stackoverflow.com/questions/tagged/kubernetes
[timezone table]: https://www.google.com/search?q=1000+am+in+pst
[troubleshooting guide]: http://kubernetes.io/docs/troubleshooting
[Twitter]: https://twitter.com/kubernetesio
[weekly meeting]: https://zoom.us/my/kubernetescommunity

@@ -1,7 +0,0 @@
# Weekly Community Video Conference

We have PUBLIC and RECORDED [weekly video meetings](https://zoom.us/my/kubernetescommunity) every Thursday at 10am US Pacific Time. You can [find the time in your timezone with this table](https://www.google.com/search?q=1000+am+in+pst).

To be added to the calendar items, join this [google group](https://groups.google.com/forum/#!forum/kubernetes-community-video-chat) for further instructions.

If you have a topic you'd like to present or would like to see discussed, please propose a specific date on the Kubernetes Community Meeting [Working Document](https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#).
@@ -12,7 +12,7 @@ You don't actually need federation for geo-location now, but it helps. The ment

From the enterprise point of view, central IT is in control of, and has knowledge of, where stuff gets deployed. Bob thinks it would be a very bad idea for us to try to solve complex policy ideas and enable them; it's a tar pit. We should just have the primitives of having different regions and be able to say what goes where.

Currently, you either do node labelling which ends up being complex and dependant on discipline. Or you have different clusters and you don't have common namespaces. Some discussion of Intel proposal for cluster metadata.
Currently, you either do node labelling which ends up being complex and dependent on discipline. Or you have different clusters and you don't have common namespaces. Some discussion of Intel proposal for cluster metadata.

Bob's mental model is AWS regions and AZs. For example, if we're building a big Cassandra cluster, you want to make sure that the nodes aren't all in the same zone.
@@ -1,61 +1,14 @@
# Kubernetes Design Overview
# Kubernetes Design Documents and Proposals

Kubernetes is a system for managing containerized applications across multiple
hosts, providing basic mechanisms for deployment, maintenance, and scaling of
applications.
This directory contains Kubernetes design documents and accepted design proposals.

Kubernetes establishes robust declarative primitives for maintaining the desired
state requested by the user. We see these primitives as the main value added by
Kubernetes. Self-healing mechanisms, such as auto-restarting, re-scheduling, and
replicating containers require active controllers, not just imperative
orchestration.
For a design overview, please see [the architecture document](architecture.md).

Kubernetes is primarily targeted at applications composed of multiple
containers, such as elastic, distributed micro-services. It is also designed to
facilitate migration of non-containerized application stacks to Kubernetes. It
therefore includes abstractions for grouping containers in both loosely coupled
and tightly coupled formations, and provides ways for containers to find and
communicate with each other in relatively familiar ways.
Note that a number of these documents are historical and may be out of date or unimplemented.

Kubernetes enables users to ask a cluster to run a set of containers. The system
automatically chooses hosts to run those containers on. While Kubernetes's
scheduler is currently very simple, we expect it to grow in sophistication over
time. Scheduling is a policy-rich, topology-aware, workload-specific function
that significantly impacts availability, performance, and capacity. The
scheduler needs to take into account individual and collective resource
requirements, quality of service requirements, hardware/software/policy
constraints, affinity and anti-affinity specifications, data locality,
inter-workload interference, deadlines, and so on. Workload-specific
requirements will be exposed through the API as necessary.

Kubernetes is intended to run on a number of cloud providers, as well as on
physical hosts.

A single Kubernetes cluster is not intended to span multiple availability zones.
Instead, we recommend building a higher-level layer to replicate complete
deployments of highly available applications across multiple zones (see
[the multi-cluster doc](../admin/multi-cluster.md) and [cluster federation proposal](../proposals/federation.md)
for more details).

Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS
platform and toolkit. Therefore, architecturally, we want Kubernetes to be built
as a collection of pluggable components and layers, with the ability to use
alternative schedulers, controllers, storage systems, and distribution
mechanisms, and we're evolving its current code in that direction. Furthermore,
we want others to be able to extend Kubernetes functionality, such as with
higher-level PaaS functionality or multi-cluster layers, without modification of
core Kubernetes source. Therefore, its API isn't just (or even necessarily
mainly) targeted at end users, but at tool and extension developers. Its APIs
are intended to serve as the foundation for an open ecosystem of tools,
automation systems, and higher-level API layers. Consequently, there are no
"internal" inter-component APIs. All APIs are visible and available, including
the APIs used by the scheduler, the node controller, the replication-controller
manager, Kubelet's API, etc. There's no glass to break -- in order to handle
more complex use cases, one can just access the lower-level APIs in a fully
transparent, composable manner.

For more about the Kubernetes architecture, see [architecture](architecture.md).
TODO: Add the current status to each document and clearly indicate which are up to date.

TODO: Document the [proposal process](../devel/faster_reviews.md#1-dont-build-a-cathedral-in-one-pr).

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()

@@ -0,0 +1,340 @@
Add new patchStrategy to clear fields not present in the patch
=============

Add the tag `patchStrategy:"replaceKeys"`. For a given type that has the tag, all keys/fields missing
from the request will be cleared when patching the object.
A field present in the request will be merged with the live config.

The proposal of Full Union is in [kubernetes/community#388](https://github.com/kubernetes/community/pull/388).

| Capability | Supported By This Proposal | Supported By Full Union |
|---|---|---|
| Auto clear missing fields on patch | X | X |
| Merge union fields on patch | X | X |
| Validate only 1 field set on type | | X |
| Validate discriminator field matches one-of field | | X |
| Support non-union patchKey | X | TBD |
| Support arbitrary combinations of set fields | X | |

## Use cases

- As a user patching a map, I want keys mutually exclusive with those that I am providing to automatically be cleared.

- As a user running kubectl apply, when I update a field in my configuration file,
I want mutually exclusive fields never specified in my configuration to be cleared.

## Examples:

- General Example: Keys in a Union are mutually exclusive. Clear unspecified union values in a Union that contains a discriminator.

- Specific Example: When patching a Deployment .spec.strategy, clear .spec.strategy.rollingUpdate
if it is not provided in the patch so that changing .spec.strategy.type will not fail.

- General Example: Keys in a Union are mutually exclusive. Clear unspecified union values in a Union
that does not contain a discriminator.

- Specific Example: When patching a Pod .spec.volume, clear all volume fields except the one specified in the patch.
## Proposed Changes

### APIs

**Scope**:

| Union Type | Supported |
|---|---|
| non-inlined non-discriminated union | Yes |
| non-inlined discriminated union | Yes |
| inlined union with [patchMergeKey](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#strategic-merge-patch) only | Yes |
| other inlined union | No |

For an inlined union with a patchMergeKey, we move the tag to the parent struct instead of
adding logic to look up the metadata in the Go struct of the inlined union,
because with the latter approach the metadata associated with
the inlined APIs would not be reflected in the OpenAPI schema.

#### Tags

Old tags:

1) `patchMergeKey`:
It is the key used to distinguish the entries in a list of non-primitive types. It must always be
present to perform the merge on a list of non-primitive types, and it will be preserved.

2) `patchStrategy`:
It indicates how to generate and merge a patch for lists. It can be `merge` or `replace`. It is optional for lists.

New tags:

`patchStrategy: replaceKeys`:

We introduce a new value `replaceKeys` for `patchStrategy`.
It indicates that all fields needing to be preserved must be present in the patch;
the fields that are present will be merged with the live object, and all missing fields will be cleared when patching.
#### Examples

1) Non-inlined non-discriminated union:

Type definition:
```go
type ContainerStatus struct {
	...
	// Add patchStrategy:"replaceKeys"
	State ContainerState `json:"state,omitempty" protobuf:"bytes,2,opt,name=state" patchStrategy:"replaceKeys"`
	...
}
```
Live object:
```yaml
state:
  running:
    startedAt: ...
```
Local file config:
```yaml
state:
  terminated:
    exitCode: 0
    finishedAt: ...
```
Patch:
```yaml
state:
  $patch: replaceKeys
  terminated:
    exitCode: 0
    finishedAt: ...
```
Result after merging:
```yaml
state:
  terminated:
    exitCode: 0
    finishedAt: ...
```
2) Non-inlined discriminated union:

Type definition:
```go
type DeploymentSpec struct {
	...
	// Add patchStrategy:"replaceKeys"
	Strategy DeploymentStrategy `json:"strategy,omitempty" protobuf:"bytes,4,opt,name=strategy" patchStrategy:"replaceKeys"`
	...
}
```
Since there are no fields associated with `recreate` in `DeploymentSpec`, I will use a generic example.

Live object:
```yaml
unionName:
  discriminatorName: foo
  fooField:
    fooSubfield: val1
```
Local file config:
```yaml
unionName:
  discriminatorName: bar
  barField:
    barSubfield: val2
```
Patch:
```yaml
unionName:
  $patch: replaceKeys
  discriminatorName: bar
  barField:
    barSubfield: val2
```
Result after merging:
```yaml
unionName:
  discriminatorName: bar
  barField:
    barSubfield: val2
```
3) Inlined union with `patchMergeKey` only.
This case is special, because `Volumes` already has the tag `patchStrategy:"merge"`.
We change the tag to `patchStrategy:"merge|replaceKeys"`.

Type definition:
```go
type PodSpec struct {
	...
	// Add another value "replaceKeys" to patchStrategy
	Volumes []Volume `json:"volumes,omitempty" patchStrategy:"merge|replaceKeys" patchMergeKey:"name" protobuf:"bytes,1,rep,name=volumes"`
	...
}
```
Live object:
```yaml
spec:
  volumes:
  - name: foo
    emptyDir:
      medium:
      ...
```
Local file config:
```yaml
spec:
  volumes:
  - name: foo
    hostPath:
      path: ...
```
Patch:
```yaml
spec:
  volumes:
  - name: foo
    $patch: replaceKeys
    hostPath:
      path: ...
```
Result after merging:
```yaml
spec:
  volumes:
  - name: foo
    hostPath:
      path: ...
```

**Impacted APIs** are listed in the [Appendix](#appendix).
### API server

No required change.
Automatically clearing the missing fields of a patch relies on the Strategic Merge Patch package.
We don't validate in a generic way that only one field of a union is set, nor that the discriminator
field matches the one-of field; we still rely on hardcoded per-field validation.

### kubectl

No required change.
Changes to how the patch is generated rely on the Strategic Merge Patch package.
### Strategic Merge Patch

**Background**
Strategic Merge Patch is a package used by both the client and the server. A typical usage is that a client
calls one function to calculate the patch and the API server calls another function to merge the patch.

We need to make sure the client always sends a patch that includes all of the fields that it wants to keep.
When merging, missing fields of a patch are automatically cleared if the patch has a `$patch: replaceKeys` directive.
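As a rough illustration of that client/server split, here is a minimal Go sketch of computing a strategic merge patch on the client and merging it on the server, assuming the `k8s.io/apimachinery/pkg/util/strategicpatch` and `k8s.io/api/core/v1` import paths; the JSON snippets are illustrative and not taken from this proposal.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1" // assumed import path for the typed Pod API
	"k8s.io/apimachinery/pkg/util/strategicpatch"
)

func main() {
	// Client side: compute a strategic merge patch between the previous and
	// the desired config. The struct tags on v1.Pod (patchStrategy,
	// patchMergeKey) drive how lists and unions are encoded in the patch.
	original := []byte(`{"spec":{"volumes":[{"name":"foo","emptyDir":{}}]}}`)
	modified := []byte(`{"spec":{"volumes":[{"name":"foo","hostPath":{"path":"/data"}}]}}`)
	patch, err := strategicpatch.CreateTwoWayMergePatch(original, modified, v1.Pod{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("patch: %s\n", patch)

	// Server side: merge the patch into the live object. Under this proposal,
	// a `$patch: replaceKeys` directive in the patch would additionally clear
	// the live fields that the patch does not mention.
	live := []byte(`{"spec":{"volumes":[{"name":"foo","emptyDir":{"medium":""}}]}}`)
	merged, err := strategicpatch.StrategicMergePatch(live, patch, v1.Pod{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("merged: %s\n", merged)
}
```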
### Open API

Update the OpenAPI schema.

## Version Skew

The changes are all backward compatible.

Old kubectl vs new server: all behave the same as before, since there is no new directive in the patch.

New kubectl vs old server: all behave the same as before, since the new directive will not be recognized
by the old server and will be dropped in conversion; unchanged fields will not affect the merged result.
# Alternatives Considered

The proposals below are not mutually exclusive with the proposal above, and may be added at some point in the future.

# 1. Add Discriminators in All Unions/OneOf APIs

The original issue is described in kubernetes/kubernetes#35345.

## Analysis

### Behavior

If the discriminator were set, we'd require that the field corresponding to its value were set, and the APIServer (registry) could automatically clear the other fields.

If the discriminator were unset, behavior would be as before -- exactly one of the fields in the union/oneof would be required to be set and the operation would otherwise fail validation.

We should set discriminators by default. This means we need to change the discriminator accordingly when the corresponding union/oneof fields are set and unset.

## Proposed Changes

### API
Add a discriminator field in all unions/oneof APIs. The discriminator should be optional for backward compatibility. In the example below, the field `Type` works as a discriminator.
```go
type PersistentVolumeSource struct {
	...
	// Discriminator for PersistentVolumeSource; it can be "gcePersistentDisk", "awsElasticBlockStore", etc.
	// +optional
	Type *string `json:"type,omitempty" protobuf:"bytes,24,opt,name=type"`
}
```
### API Server

We need to add defaulting logic described in the [Behavior](#behavior) section.
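A sketch of what that defaulting could look like, using the hypothetical `Type` discriminator on `PersistentVolumeSource` from the example above; the function name and the subset of cases shown are illustrative, not part of this proposal.

```go
// defaultPersistentVolumeSourceType is a sketch of the defaulting described
// above: if the hypothetical discriminator is unset, point it at whichever
// union member is currently set. Names are illustrative only.
func defaultPersistentVolumeSourceType(src *PersistentVolumeSource) {
	if src.Type != nil {
		return // already discriminated; validation checks it matches the set field
	}
	switch {
	case src.GCEPersistentDisk != nil:
		t := "gcePersistentDisk"
		src.Type = &t
	case src.AWSElasticBlockStore != nil:
		t := "awsElasticBlockStore"
		src.Type = &t
	// ... one case per union member ...
	}
}
```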
### kubectl

No change required on kubectl.

## Summary

Limitation: server-side automatic clearing of fields based on the discriminator may be unsafe.

# Appendix

## List of Impacted APIs
In `pkg/api/v1/types.go`:
- [`VolumeSource`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L235):
It is inlined. Besides `VolumeSource`, its parent [Volume](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L222) has `Name`.
- [`PersistentVolumeSource`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L345):
It is inlined. Besides `PersistentVolumeSource`, its parent [PersistentVolumeSpec](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L442) has the following fields:
```go
Capacity ResourceList `json:"capacity,omitempty" protobuf:"bytes,1,rep,name=capacity,casttype=ResourceList,castkey=ResourceName"`
// +optional
AccessModes []PersistentVolumeAccessMode `json:"accessModes,omitempty" protobuf:"bytes,3,rep,name=accessModes,casttype=PersistentVolumeAccessMode"`
// +optional
ClaimRef *ObjectReference `json:"claimRef,omitempty" protobuf:"bytes,4,opt,name=claimRef"`
// +optional
PersistentVolumeReclaimPolicy PersistentVolumeReclaimPolicy `json:"persistentVolumeReclaimPolicy,omitempty" protobuf:"bytes,5,opt,name=persistentVolumeReclaimPolicy,casttype=PersistentVolumeReclaimPolicy"`
```
- [`Handler`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L1485):
It is inlined. Besides `Handler`, its parent struct [`Probe`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L1297) also has the following fields:
```go
// +optional
InitialDelaySeconds int32 `json:"initialDelaySeconds,omitempty" protobuf:"varint,2,opt,name=initialDelaySeconds"`
// +optional
TimeoutSeconds int32 `json:"timeoutSeconds,omitempty" protobuf:"varint,3,opt,name=timeoutSeconds"`
// +optional
PeriodSeconds int32 `json:"periodSeconds,omitempty" protobuf:"varint,4,opt,name=periodSeconds"`
// +optional
SuccessThreshold int32 `json:"successThreshold,omitempty" protobuf:"varint,5,opt,name=successThreshold"`
// +optional
FailureThreshold int32 `json:"failureThreshold,omitempty" protobuf:"varint,6,opt,name=failureThreshold"`
```
- [`ContainerState`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L1576):
It is NOT inlined.
- [`PodSignature`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L2953):
It has only one field, but the comment says "Exactly one field should be set". Maybe we will add more in the future? It is NOT inlined.

In `pkg/authorization/types.go`:
- [`SubjectAccessReviewSpec`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/authorization/types.go#L108):
The comment says: `Exactly one of ResourceAttributes and NonResourceAttributes must be set.`
But there are some other non-union fields in the struct.
So this is similar to an INLINED struct.
- [`SelfSubjectAccessReviewSpec`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/authorization/types.go#L130):
It is NOT inlined.

In `pkg/apis/extensions/v1beta1/types.go`:
- [`DeploymentStrategy`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/types.go#L249):
It is NOT inlined.
- [`NetworkPolicyPeer`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/v1beta1/types.go#L1340):
It is NOT inlined.
- [`IngressRuleValue`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/v1beta1/types.go#L876):
It says "exactly one of the following must be set", but it has only one field.
It is inlined. Its parent [`IngressRule`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/v1beta1/types.go#L848) also has the following fields:
```go
// +optional
Host string `json:"host,omitempty" protobuf:"bytes,1,opt,name=host"`
```
@@ -80,7 +80,7 @@ There are two configurations in which it makes sense to run `kube-aggregator`.
`api.mycompany.com/v2` from another apiserver while you update clients. But
you can't serve `api.mycompany.com/v1/frobbers` and
`api.mycompany.com/v1/grobinators` from different apiservers. This restriction
allows us to limit the scope of `kube-aggregator` to a managable level.
allows us to limit the scope of `kube-aggregator` to a manageable level.
* Follow API conventions: APIs exposed by every API server should adhere to [kubernetes API
conventions](../devel/api-conventions.md).
* Support discovery API: Each API server should support the kubernetes discovery API

@@ -160,7 +160,7 @@ Since the actual server which serves client's request can be opaque to the clien
all API servers need to have homogeneous authentication and authorisation mechanisms.
All API servers will handle authn and authz for their resources themselves.
The current authentication infrastructure allows token authentication delegation to the
core `kube-apiserver` and trust of an authentication proxy, which can be fullfilled by
core `kube-apiserver` and trust of an authentication proxy, which can be fulfilled by
`kubernetes-aggregator`.

#### Server Role Bootstrapping
@@ -12,7 +12,7 @@ is no way to achieve this in Kubernetes without scripting inside of a container.

## Constraints and Assumptions

1. The volume types must remain unchanged for backward compatability
1. The volume types must remain unchanged for backward compatibility
2. There will be a new volume type for this proposed functionality, but no
other API changes
3. The new volume type should support atomic updates in the event of an input

@@ -186,15 +186,31 @@ anything preceding it as before.
### Proposed API objects

```go
type Projections struct {
type ProjectedVolumeSource struct {
	Sources []VolumeProjection `json:"sources"`
	DefaultMode *int32 `json:"defaultMode,omitempty"`
	DefaultMode *int32 `json:"defaultMode,omitempty"`
}

type VolumeProjection struct {
	Secret *SecretVolumeSource `json:"secret,omitempty"`
	ConfigMap *ConfigMapVolumeSource `json:"configMap,omitempty"`
	DownwardAPI *DownwardAPIVolumeSource `json:"downwardAPI,omitempty"`
	Secret *SecretProjection `json:"secret,omitempty"`
	ConfigMap *ConfigMapProjection `json:"configMap,omitempty"`
	DownwardAPI *DownwardAPIProjection `json:"downwardAPI,omitempty"`
}

type SecretProjection struct {
	LocalObjectReference
	Items []KeyToPath
	Optional *bool
}

type ConfigMapProjection struct {
	LocalObjectReference
	Items []KeyToPath
	Optional *bool
}

type DownwardAPIProjection struct {
	Items []DownwardAPIVolumeFile
}
```
@@ -203,14 +219,7 @@ type VolumeProjection struct {
Add to the VolumeSource struct:

```go
Projected *Projections `json:"projected,omitempty"`
// (other existing fields omitted for brevity)
```

Add to the SecretVolumeSource struct:

```go
LocalObjectReference `json:"name,omitempty"`
Projected *ProjectedVolumeSource `json:"projected,omitempty"`
// (other existing fields omitted for brevity)
```
@@ -0,0 +1,67 @@
# Exposing annotations via environment downward API

Author: Michal Rostecki \<michal@kinvolk.io\>

## Introduction

Annotations of the pod can be read through the Kubernetes API, but currently
there is no way to pass them to the application inside the container. This means
that annotations can be used by the core Kubernetes services and by users outside
of the Kubernetes cluster, but not by the containerized application itself.

Of course, using the Kubernetes API from an application running inside a container
managed by Kubernetes is technically possible, but that runs counter to
the principles of microservices architecture.

The purpose of this proposal is to allow passing annotations as environment
variables to the container.

### Use-case

The primary use case for this proposal is StatefulSets. There is an idea to expose
the StatefulSet index to the applications running inside the pods managed by a StatefulSet.
Since a StatefulSet creates pods as API objects, passing this index as an
annotation seems to be a valid way to do this. However, to finally pass this
information to the containerized application, we need to pass this annotation on,
which is why the downward API for annotations is needed here.

## API

The exact `fieldPath` to the annotation will have the following syntax:

```
metadata.annotations['annotationKey']
```
Which means that:
- the *annotationKey* will be specified inside brackets (`[`, `]`) and single quotation
marks (`'`)
- if the *annotationKey* contains `[`, `]` or `'` characters inside, they will need to
be escaped (like `\[`, `\]`, `\'`), and having these characters unescaped should result
in a validation error (see the sketch after the examples below)

Examples:
- `metadata.annotations['spec.pod.beta.kubernetes.io/statefulset-index']`
- `metadata.annotations['foo.bar/example-annotation']`
- `metadata.annotations['foo.bar/more\'complicated\]example\[with\'characters"to-escape']`
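To make the escaping rules above concrete, here is a small Go sketch of how a `fieldPath` of this form could be parsed and validated; the package, function name and error messages are hypothetical, not part of this proposal.

```go
package fieldpath

import (
	"fmt"
	"strings"
)

// parseAnnotationFieldPath extracts the annotation key from a fieldPath of the
// form metadata.annotations['<key>']. Escaped \[, \] and \' are unescaped;
// unescaped brackets or quotes inside the key are rejected. Illustrative only.
func parseAnnotationFieldPath(fieldPath string) (string, error) {
	const prefix, suffix = "metadata.annotations['", "']"
	if !strings.HasPrefix(fieldPath, prefix) || !strings.HasSuffix(fieldPath, suffix) {
		return "", fmt.Errorf("unsupported fieldPath %q", fieldPath)
	}
	raw := fieldPath[len(prefix) : len(fieldPath)-len(suffix)]

	var key strings.Builder
	for i := 0; i < len(raw); i++ {
		switch c := raw[i]; c {
		case '\\':
			if i+1 < len(raw) && (raw[i+1] == '[' || raw[i+1] == ']' || raw[i+1] == '\'') {
				key.WriteByte(raw[i+1])
				i++ // skip the escaped character
				continue
			}
			return "", fmt.Errorf("invalid escape sequence in %q", fieldPath)
		case '[', ']', '\'':
			return "", fmt.Errorf("unescaped %q in annotation key of %q", string(c), fieldPath)
		default:
			key.WriteByte(c)
		}
	}
	return key.String(), nil
}
```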
So, assuming that we want to pass the `pod.beta.kubernetes.io/statefulset-index`
annotation as a `STATEFULSET_INDEX` variable, the environment variable definition
will look like:

```
env:
- name: STATEFULSET_INDEX
  valueFrom:
    fieldRef:
      fieldPath: metadata.annotations['spec.pod.beta.kubernetes.io/statefulset-index']
```

## Implementation

In general, this part of the environment downward API will be implemented in the same
place as the other metadata - as a label conversion function.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
@@ -1,84 +1,243 @@
# Kubernetes architecture
# Kubernetes Design and Architecture

A running Kubernetes cluster contains node agents (`kubelet`) and master
components (APIs, scheduler, etc), on top of a distributed storage solution.
This diagram shows our desired eventual state, though we're still working on a
few things, like making `kubelet` itself (all our components, really) run within
containers, and making the scheduler 100% pluggable.
## Overview


Kubernetes is production-grade, open-source infrastructure for the deployment, scaling,
management, and composition of application containers across clusters of hosts, inspired
by [previous work at Google](https://research.google.com/pubs/pub44843.html). Kubernetes
is more than just a “container orchestrator”. It aims to eliminate the burden of orchestrating
physical/virtual compute, network, and storage infrastructure, and enable application operators
and developers to focus entirely on container-centric primitives for self-service operation.
Kubernetes also provides a stable, portable foundation (a platform) for building customized
workflows and higher-level automation.

## The Kubernetes Node
Kubernetes is primarily targeted at applications composed of multiple containers. It therefore
groups containers using *pods* and *labels* into tightly coupled and loosely coupled formations
for easy management and discovery.

When looking at the architecture of the system, we'll break it down to services
that run on the worker node and services that compose the cluster-level control
plane.
## Scope

Kubernetes is a [platform for deploying and managing containers](https://kubernetes.io/docs/whatisk8s/). Kubernetes provides a container runtime, container
orchestration, container-centric infrastructure orchestration, self-healing mechanisms such as health checking and re-scheduling, and service discovery and load balancing.

Kubernetes aspires to be an extensible, pluggable, building-block OSS
platform and toolkit. Therefore, architecturally, we want Kubernetes to be built
as a collection of pluggable components and layers, with the ability to use
alternative schedulers, controllers, storage systems, and distribution
mechanisms, and we're evolving its current code in that direction. Furthermore,
we want others to be able to extend Kubernetes functionality, such as with
higher-level PaaS functionality or multi-cluster layers, without modification of
core Kubernetes source. Therefore, its API isn't just (or even necessarily
mainly) targeted at end users, but at tool and extension developers. Its APIs
are intended to serve as the foundation for an open ecosystem of tools,
automation systems, and higher-level API layers. Consequently, there are no
"internal" inter-component APIs. All APIs are visible and available, including
the APIs used by the scheduler, the node controller, the replication-controller
manager, Kubelet's API, etc. There's no glass to break -- in order to handle
more complex use cases, one can just access the lower-level APIs in a fully
transparent, composable manner.

## Goals

The project is committed to the following (aspirational) [design ideals](principles.md):
* _Portable_. Kubernetes runs everywhere -- public cloud, private cloud, bare metal, laptop --
with consistent behavior so that applications and tools are portable throughout the ecosystem
as well as between development and production environments.
* _General-purpose_. Kubernetes should run all major categories of workloads to enable you to run
all of your workloads on a single infrastructure, stateless and stateful, microservices and
monoliths, services and batch, greenfield and legacy.
* _Meet users partway_. Kubernetes doesn’t just cater to purely greenfield cloud-native
applications, nor does it meet all users where they are. It focuses on deployment and management
of microservices and cloud-native applications, but provides some mechanisms to facilitate
migration of monolithic and legacy applications.
* _Flexible_. Kubernetes functionality can be consumed a la carte and (in most cases) Kubernetes
does not prevent you from using your own solutions in lieu of built-in functionality.
* _Extensible_. Kubernetes enables you to integrate it into your environment and to add the
additional capabilities you need, by exposing the same interfaces used by built-in
functionality.
|
||||
* _Automatable_. Kubernetes aims to dramatically reduce the burden of manual operations. It
|
||||
supports both declarative control by specifying users’ desired intent via its API, as well as
|
||||
imperative control to support higher-level orchestration and automation. The declarative
|
||||
approach is key to the system’s self-healing and autonomic capabilities.
|
||||
* _Advance the state of the art_. While Kubernetes intends to support non-cloud-native
|
||||
applications, it also aspires to advance the cloud-native and DevOps state of the art, such as
|
||||
in the [participation of applications in their own management]
|
||||
(http://blog.kubernetes.io/2016/09/cloud-native-application-interfaces.html). However, in doing
|
||||
so, we strive not to force applications to lock themselves into Kubernetes APIs, which is, for
|
||||
example, why we prefer configuration over convention in the [downward API]
|
||||
(https://kubernetes.io/docs/user-guide/downward-api/). Additionally, Kubernetes is not bound by
|
||||
the lowest common denominator of systems upon which it depends, such as container runtimes and
|
||||
cloud providers. An example where we pushed the envelope of what was achievable was in its [IP
|
||||
per Pod networking model](https://kubernetes.io/docs/admin/networking/#kubernetes-model).
|
||||
|
||||
## Architecture
|
||||
|
||||
A running Kubernetes cluster contains node agents (kubelet) and a cluster control plane (AKA
|
||||
*master*), with cluster state backed by a distributed storage system
|
||||
([etcd](https://github.com/coreos/etcd)).
|
||||
|
||||
### Cluster control plane (AKA *master*)
|
||||
|
||||
The Kubernetes [control plane](https://en.wikipedia.org/wiki/Control_plane) is split
|
||||
into a set of components, which can all run on a single *master* node, or can be replicated
|
||||
in order to support high-availability clusters, or can even be run on Kubernetes itself (AKA
|
||||
[self-hosted](self-hosted-kubernetes.md#what-is-self-hosted)).
|
||||
|
||||
Kubernetes provides a REST API supporting primarily CRUD operations on (mostly) persistent resources, which
|
||||
serve as the hub of its control plane. Kubernetes’s API provides IaaS-like
|
||||
container-centric primitives such as [Pods](https://kubernetes.io/docs/user-guide/pods/),
|
||||
[Services](https://kubernetes.io/docs/user-guide/services/), and [Ingress]
|
||||
(https://kubernetes.io/docs/user-guide/ingress/), and also lifecycle APIs to support orchestration
|
||||
(self-healing, scaling, updates, termination) of common types of workloads, such as [ReplicaSet]
|
||||
(https://kubernetes.io/docs/user-guide/replicasets/) (simple fungible/stateless app manager),
|
||||
[Deployment](https://kubernetes.io/docs/user-guide/deployments/) (orchestrates updates of
|
||||
stateless apps), [Job](https://kubernetes.io/docs/user-guide/jobs/) (batch), [CronJob]
|
||||
(https://kubernetes.io/docs/user-guide/cron-jobs/) (cron), [DaemonSet]
|
||||
(https://kubernetes.io/docs/admin/daemons/) (cluster services), and [StatefulSet]
|
||||
(https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/) (stateful apps).
|
||||
We deliberately decoupled service naming/discovery and load balancing from application
|
||||
implementation, since the latter is diverse and open-ended.
|
||||
|
||||
Both user clients and components containing asynchronous controllers interact with the same API resources, which serve as coordination points, common intermediate representation, and shared state. Most resources contain metadata, including [labels](https://kubernetes.io/docs/user-guide/labels/) and [annotations](https://kubernetes.io/docs/user-guide/annotations/), fully elaborated desired state (spec), including default values, and observed state (status).
|
||||
|
||||
Controllers work continuously to drive the actual state towards the desired state, while reporting back the currently observed state for users and for other controllers.
|
||||
|
||||
While the controllers are [level-based]
|
||||
(http://gengnosis.blogspot.com/2007/01/level-triggered-and-edge-triggered.html) to maximize fault
|
||||
tolerance, they typically `watch` for changes to relevant resources in order to minimize reaction
|
||||
latency and redundant work. This enables decentralized and decoupled
|
||||
[choreography-like](https://en.wikipedia.org/wiki/Service_choreography) coordination without a
|
||||
message bus.
|
||||
|
||||
#### API Server
|
||||
|
||||
The [API server](https://kubernetes.io/docs/admin/kube-apiserver/) serves up the
|
||||
[Kubernetes API](https://kubernetes.io/docs/api/). It is intended to be a relatively simple
|
||||
server, with most/all business logic implemented in separate components or in plug-ins. It mainly
|
||||
processes REST operations, validates them, and updates the corresponding objects in `etcd` (and
|
||||
perhaps eventually other stores). Note that, for a number of reasons, Kubernetes deliberately does
|
||||
not support atomic transactions across multiple resources.
|
||||
|
||||
Kubernetes cannot function without this basic API machinery, which includes:
|
||||
* REST semantics, watch, durability and consistency guarantees, API versioning, defaulting, and
|
||||
validation
|
||||
* Built-in admission-control semantics, synchronous admission-control hooks, and asynchronous
|
||||
resource initialization
|
||||
* API registration and discovery
|
||||
|
||||
Additionally, the API server acts as the gateway to the cluster. By definition, the API server
|
||||
must be accessible by clients from outside the cluster, whereas the nodes, and certainly
|
||||
containers, may not be. Clients authenticate the API server and also use it as a bastion and
|
||||
proxy/tunnel to nodes and pods (and services).
|
||||
|
||||
#### Cluster state store
|
||||
|
||||
All persistent cluster state is stored in an instance of `etcd`. This provides a way to store
|
||||
configuration data reliably. With `watch` support, coordinating components can be notified very
|
||||
quickly of changes.
|
||||
|
||||
|
||||
#### Controller-Manager Server
|
||||
|
||||
Most other cluster-level functions are currently performed by a separate process, called the
|
||||
[Controller Manager](https://kubernetes.io/docs/admin/kube-controller-manager/). It performs
|
||||
both lifecycle functions (e.g., namespace creation and lifecycle, event garbage collection,
|
||||
terminated-pod garbage collection, cascading-deletion garbage collection, node garbage collection)
|
||||
and API business logic (e.g., scaling of pods controlled by a [ReplicaSet]
|
||||
(https://kubernetes.io/docs/user-guide/replicasets/)).
|
||||
|
||||
The application management and composition layer, providing self-healing, scaling, application lifecycle management, service discovery, routing, and service binding and provisioning.
|
||||
|
||||
These functions may eventually be split into separate components to make them more easily
|
||||
extended or replaced.
|
||||
|
||||
#### Scheduler
|
||||
|
||||
|
||||
Kubernetes enables users to ask a cluster to run a set of containers. The scheduler
|
||||
component automatically chooses hosts to run those containers on.
|
||||
|
||||
The scheduler watches for unscheduled pods and binds them to nodes via the `/binding` pod
|
||||
subresource API, according to the availability of the requested resources, quality of service
|
||||
requirements, affinity and anti-affinity specifications, and other constraints.
|
||||
|
||||
Kubernetes supports user-provided schedulers and multiple concurrent cluster schedulers,
|
||||
using the shared-state approach pioneered by [Omega]
|
||||
(https://research.google.com/pubs/pub41684.html). In addition to the disadvantages of
|
||||
pessimistic concurrency described by the Omega paper, [two-level scheduling models]
|
||||
(http://mesos.berkeley.edu/mesos_tech_report.pdf) that hide information from the upper-level
|
||||
schedulers need to implement all of the same features in the lower-level scheduler as required by
|
||||
all upper-layer schedulers in order to ensure that their scheduling requests can be satisfied by
|
||||
available desired resources.
|
||||
|
||||
|
||||
### The Kubernetes Node
|
||||
|
||||
The Kubernetes node has the services necessary to run application containers and
|
||||
be managed from the master systems.
|
||||
|
||||
Each node runs a container runtime (like Docker, rkt or Hyper). The container
|
||||
runtime is responsible for downloading images and running containers.
|
||||
#### Kubelet
|
||||
|
||||
### `kubelet`
|
||||
The most important and most prominent controller in Kubernetes is the Kubelet, which is the
|
||||
primary implementer of the Pod and Node APIs that drive the container execution layer. Without
|
||||
these APIs, Kubernetes would just be a CRUD-oriented REST application framework backed by a
|
||||
key-value store (and perhaps the API machinery will eventually be spun out as an independent
|
||||
project).
|
||||
|
||||
The `kubelet` manages [pods](../user-guide/pods.md) and their containers, their
|
||||
images, their volumes, etc.
|
||||
Kubernetes executes isolated application containers as its default, native mode of execution, as
|
||||
opposed to processes and traditional operating-system packages. Not only are application
|
||||
containers isolated from each other, but they are also isolated from the hosts on which they
|
||||
execute, which is critical to decoupling management of individual applications from each other and
|
||||
from management of the underlying cluster physical/virtual infrastructure.
|
||||
|
||||
### `kube-proxy`
|
||||
Kubernetes provides [Pods](https://kubernetes.io/docs/user-guide/pods/) that can host multiple
|
||||
containers and storage volumes as its fundamental execution primitive in order to facilitate
|
||||
packaging a single application per container, decoupling deployment-time concerns from build-time
|
||||
concerns, and migration from physical/virtual machines. The Pod primitive is key to glean the
|
||||
[primary benefits](https://kubernetes.io/docs/whatisk8s/#why-containers) of deployment on modern
|
||||
cloud platforms, such as Kubernetes.
|
||||
|
||||
Each node also runs a simple network proxy and load balancer (see the
|
||||
[services FAQ](https://github.com/kubernetes/kubernetes/wiki/Services-FAQ) for
|
||||
more details). This reflects `services` (see
|
||||
[the services doc](../user-guide/services.md) for more details) as defined in
|
||||
the Kubernetes API on each node and can do simple TCP and UDP stream forwarding
|
||||
(round robin) across a set of backends.
|
||||
Kubelet also currently links in the [cAdvisor](https://github.com/google/cadvisor) resource monitoring
|
||||
agent.
|
||||
|
||||
Service endpoints are currently found via [DNS](../admin/dns.md) or through
|
||||
environment variables (both
|
||||
[Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and
|
||||
Kubernetes `{FOO}_SERVICE_HOST` and `{FOO}_SERVICE_PORT` variables are
|
||||
supported). These variables resolve to ports managed by the service proxy.
|
||||
#### Container runtime
|
||||
|
||||
## The Kubernetes Control Plane
|
||||
Each node runs a container runtime, which is responsible for downloading images and running containers.
|
||||
|
||||
The Kubernetes control plane is split into a set of components. Currently they
|
||||
all run on a single _master_ node, but that is expected to change soon in order
|
||||
to support high-availability clusters. These components work together to provide
|
||||
a unified view of the cluster.
|
||||
Kubelet does not link in the base container runtime. Instead, we're defining a [Container Runtime Interface]
|
||||
(container-runtime-interface-v1.md) to control the underlying runtime and facilitate pluggability of that layer.
|
||||
This decoupling is needed in order to maintain clear component boundaries, facilitate testing, and facilitate pluggability.
|
||||
Runtimes supported today, either upstream or by forks, include at least docker (for Linux and Windows),
|
||||
[rkt](https://kubernetes.io/docs/getting-started-guides/rkt/),
|
||||
[cri-o](https://github.com/kubernetes-incubator/cri-o), and [frakti](https://github.com/kubernetes/frakti).
|
||||
|
||||
### `etcd`
|
||||
#### Kube Proxy
|
||||
|
||||
All persistent master state is stored in an instance of `etcd`. This provides a
|
||||
great way to store configuration data reliably. With `watch` support,
|
||||
coordinating components can be notified very quickly of changes.
|
||||
The [service](https://kubernetes.io/docs/user-guide/services/) abstraction provides a way to
group pods under a common access policy (e.g., load-balanced). The implementation of this creates
a virtual IP which clients can access and which is transparently proxied to the pods in a Service.
Each node runs a [kube-proxy](https://kubernetes.io/docs/admin/kube-proxy/) process which programs
`iptables` rules to trap access to service IPs and redirect them to the correct backends. This
provides a highly-available load-balancing solution with low performance overhead by balancing
client traffic from a node on that same node.
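As a minimal sketch of the abstraction being proxied (the names, labels, and ports below are placeholders), a Service groups pods by label selector and exposes them behind a single virtual IP and port:

```
apiVersion: v1
kind: Service
metadata:
  name: my-service            # placeholder name
spec:
  selector:
    app: my-app               # pods carrying this label become the backends
  ports:
    - port: 80                # port on the Service's virtual IP
      targetPort: 8080        # container port that traffic is forwarded to
```

kube-proxy then ensures that connections to the virtual IP on port 80 are redirected to port 8080 on one of the selected pods.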
|
||||
|
||||
### Kubernetes API Server
|
||||
Service endpoints are found primarily via [DNS](https://kubernetes.io/docs/admin/dns/).
|
||||
|
||||
The apiserver serves up the [Kubernetes API](../api.md). It is intended to be a
|
||||
CRUD-y server, with most/all business logic implemented in separate components
|
||||
or in plug-ins. It mainly processes REST operations, validates them, and updates
|
||||
the corresponding objects in `etcd` (and eventually other stores).
|
||||
### Add-ons and other dependencies
|
||||
|
||||
### Scheduler
|
||||
A number of components, called [*add-ons*]
|
||||
(https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) typically run on Kubernetes
|
||||
itself:
|
||||
* [DNS](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns)
|
||||
* [Ingress controller](https://github.com/kubernetes/ingress/tree/master/controllers)
|
||||
* [Heapster](https://github.com/kubernetes/heapster/) (resource monitoring)
|
||||
* [Dashboard](https://github.com/kubernetes/dashboard/) (GUI)
|
||||
|
||||
The scheduler binds unscheduled pods to nodes via the `/binding` API. The
|
||||
scheduler is pluggable, and we expect to support multiple cluster schedulers and
|
||||
even user-provided schedulers in the future.
|
||||
### Federation
|
||||
|
||||
### Kubernetes Controller Manager Server
|
||||
|
||||
All other cluster-level functions are currently performed by the Controller
|
||||
Manager. For instance, `Endpoints` objects are created and updated by the
|
||||
endpoints controller, and nodes are discovered, managed, and monitored by the
|
||||
node controller. These could eventually be split into separate components to
|
||||
make them independently pluggable.
|
||||
|
||||
The [`replicationcontroller`](../user-guide/replication-controller.md) is a
|
||||
mechanism that is layered on top of the simple [`pod`](../user-guide/pods.md)
|
||||
API. We eventually plan to port it to a generic plug-in mechanism, once one is
|
||||
implemented.
|
||||
A single Kubernetes cluster may span multiple availability zones.
|
||||
|
||||
However, for the highest availability, we recommend using [cluster federation](federation.md).
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
|
|
|
|||
|
|
@ -18,13 +18,13 @@ Similarly, mature organizations will be able to rely on a centrally managed DNS
|
|||
|
||||
With that in mind, the proposals here will devolve into simply using DNS names that are validated with system installed root certificates.
|
||||
|
||||
## Cluster Location information
|
||||
## Cluster location information (aka ClusterInfo)
|
||||
|
||||
First we define a set of information that identifies a cluster and how to talk to it.
|
||||
First we define a set of information that identifies a cluster and how to talk to it. We will call this ClusterInfo in this document.
|
||||
|
||||
While we could define a new format for communicating the set of information needed here, we'll start by using the standard [`kubeconfig`](http://kubernetes.io/docs/user-guide/kubeconfig-file/) file format.
|
||||
|
||||
It is expected that the `kubeconfig` file will have a single unnamed `Cluster` entry. Other information (especially authentication secrets) must be omitted.
|
||||
It is expected that the `kubeconfig` file will have a single unnamed `Cluster` entry. Other information (especially authentication secrets) MUST be omitted.
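A minimal sketch of such a ClusterInfo `kubeconfig` is shown below; the server address and CA bundle are placeholders. Note the single, unnamed `Cluster` entry and the absence of any user credentials:

```
apiVersion: v1
kind: Config
clusters:
  - name: ""                                      # single, unnamed Cluster entry
    cluster:
      server: https://10.0.0.1:6443               # placeholder API server address
      certificate-authority-data: <base64 CA>     # placeholder root CA bundle
users: []
contexts: []
```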
|
||||
|
||||
### Evolving kubeconfig
|
||||
|
||||
|
|
@ -45,7 +45,7 @@ Additions include:
|
|||
|
||||
**This is to be implemented in a later phase**
|
||||
|
||||
Any client of the cluster will want to have this information. As the configuration of the cluster changes we need the client to keep this information up to date. It is assumed that the information here won't drift so fast that clients won't be able to find *some* way to connect.
|
||||
Any client of the cluster will want to have this information. As the configuration of the cluster changes we need the client to keep this information up to date. The ClusterInfo ConfigMap (defined below) is expected to be a common place to get the latest ClusterInfo for any cluster. Clients should periodically grab this and cache it. It is assumed that the information here won't drift so fast that clients won't be able to find *some* way to connect.
|
||||
|
||||
In exceptional circumstances it is possible that this information may be out of date and a client would be unable to connect to a cluster. Consider the case where a user has kubectl set up and working well and then doesn't run kubectl for quite a while. It is possible that over this time (a) the set of servers will have migrated so that all endpoints are now invalid or (b) the root certificates will have rotated so that the user can no longer trust any endpoint.
|
||||
|
||||
|
|
@ -55,31 +55,35 @@ Now that we know *what* we want to get to the client, the question is how. We w
|
|||
|
||||
### Method: Out of Band
|
||||
|
||||
The simplest way to do this would be to simply put this object in a file and copy it around. This is more overhead for the user, but it is easy to implement and lets users rely on existing systems to distribute configuration.
|
||||
The simplest way to obtain ClusterInfo would be to simply put this object in a file and copy it around. This is more overhead for the user, but it is easy to implement and lets users rely on existing systems to distribute configuration.
|
||||
|
||||
For the `kubeadm` flow, the command line might look like:
|
||||
|
||||
```
|
||||
kubeadm join --cluster-info-file=my-cluster.yaml
|
||||
kubeadm join --discovery-file=my-cluster.yaml
|
||||
```
|
||||
|
||||
Note that TLS bootstrap (which establishes a way for a client to authenticate itself to the server) is a separate issue and has its own set of methods. This command line may have a TLS bootstrap token (or config file) on the command line also.
|
||||
After loading the ClusterInfo from a file, the client MAY look for updated information from the server by reading the `kube-public` `cluster-info` ConfigMap defined below. However, when retrieving this ConfigMap the client MUST validate the certificate chain when talking to the API server.
|
||||
|
||||
**Note:** TLS bootstrap (which establishes a way for a client to authenticate itself to the server) is a separate issue and has its own set of methods. This command line may have a TLS bootstrap token (or config file) on the command line also. For this reason, even though the `--discovery-file` argument is in the form of a `kubeconfig`, it MUST NOT contain client credentials as defined above.
|
||||
|
||||
### Method: HTTPS Endpoint
|
||||
|
||||
If the ClusterInfo information is hosted in a trusted place via HTTPS you can just request it that way. This will use the root certificates that are installed on the system. It may or may not be appropriate based on the user's constraints.
|
||||
If the ClusterInfo information is hosted in a trusted place via HTTPS you can just request it that way. This will use the root certificates that are installed on the system. It may or may not be appropriate based on the user's constraints. This method MUST use HTTPS. Also, even though the payload for this URL is the `kubeconfig` format, it MUST NOT contain client credentials.
|
||||
|
||||
```
|
||||
kubeadm join --cluster-info-url="https://example/mycluster.yaml"
|
||||
kubeadm join --discovery-url="https://example/mycluster.yaml"
|
||||
```
|
||||
|
||||
This is really a shorthand for someone doing something like (assuming we support stdin with `-`):
|
||||
|
||||
```
|
||||
curl https://example.com/mycluster.json | kubeadm join --cluster-info-file=-
|
||||
curl https://example.com/mycluster.json | kubeadm join --discovery-file=-
|
||||
```
|
||||
|
||||
If the user requires some auth to the HTTPS server (to keep the ClusterInfo object private) that can be done in the curl command equivalent. Or we could eventually add it to `kubeadm` directly.
|
||||
After loading the ClusterInfo from a URL, the client MAY look for updated information from the server by reading the `kube-public` `cluster-info` ConfigMap defined below. However, when retrieving this ConfigMap the client MUST validate the certificate chain when talking to the API server.
|
||||
|
||||
**Note:** support for loading from stdin for `--discovery-file` may not be implemented immediately.
|
||||
|
||||
### Method: Bootstrap Token
|
||||
|
||||
|
|
@ -100,7 +104,7 @@ The user experience for joining a cluster would be something like:
|
|||
kubeadm join --token=ae23dc.faddc87f5a5ab458 <address>
|
||||
```
|
||||
|
||||
**Note:** This is logically a different use of the token from TLS bootstrap. We harmonize these usages and allow the same token to play double duty.
|
||||
**Note:** This is logically a different use of the token from its use for authentication during TLS bootstrap. We harmonize these usages and allow the same token to play double duty.
|
||||
|
||||
#### Implementation Flow
|
||||
|
||||
|
|
@ -130,6 +134,8 @@ The first part of the token is the `token-id`. The second part is the `token-se
|
|||
|
||||
This new type of token is different from the current CSV token authenticator that is currently part of Kubernetes. The CSV token authenticator requires an update on disk and a restart of the API server to update/delete tokens. As we prove out this token mechanism we may wish to deprecate and eventually remove that mechanism.
|
||||
|
||||
The `token-id` must be 6 characters and the `token-secret` must be 16 characters. They must be lower case ASCII letters and numbers. Specifically it must match the regular expression: `[a-z0-9]{6}\.[a-z0-9]{16}`. There is no strong reasoning behind this beyond the history of how this has been implemented in alpha versions.
|
||||
|
||||
#### NEW: Bootstrap Token Secrets
|
||||
|
||||
Bootstrap tokens are stored and managed via Kubernetes secrets in the `kube-system` namespace. They have type `bootstrap.kubernetes.io/token`.
|
||||
|
|
@ -138,11 +144,13 @@ The following keys are on the secret data:
|
|||
* **token-id**. As defined above.
|
||||
* **token-secret**. As defined above.
|
||||
* **expiration**. After this time the token should be automatically deleted. This is encoded as an absolute UTC time using RFC3339.
|
||||
* **usage-bootstrap-signing**. Set to `true` to indicate this token should be used for signing bootstrap configs. If omitted or some other string, it defaults to `false`.
|
||||
* **usage-bootstrap-signing**. Set to `true` to indicate this token should be used for signing bootstrap configs. If this is missing from the token secret or set to any other value, the usage is not allowed.
|
||||
* **usage-bootstrap-authentication**. Set to `true` to indicate that this token should be used for authenticating to the API server. If this is missing from the token secret or set to any other value, the usage is not allowed. The bootstrap token authenticator will use this token to auth as a user that is `system:bootstrap:<token-id>` in the group `system:bootstrappers`.
* **description**. An optional free-form description field for denoting the purpose of the token. If users have especially complex token management needs, they are encouraged to use labels and annotations instead of packing machine-readable data into this field.
|
||||
|
||||
These secrets can be named anything but it is suggested that they be named `bootstrap-token-<token-id>`.
|
||||
**Future**: At some point in the future we may add the ability to specify a set of groups that this token part of during authentication. This will allow users to segment off which tokens are allowed to bootstrap which nodes. However, we will restrict these groups under `system:bootstrappers:*` to discourage usage outside of bootstrapping.
|
||||
|
||||
**QUESTION:** Should we also spec out now how we can use this token for TLS bootstrap.
|
||||
These secrets MUST be named `bootstrap-token-<token-id>`. If a token doesn't adhere to this naming scheme it MUST be ignored. The secret MUST also be ignored if the `token-id` key in the secret doesn't match the name of the secret.
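Putting the keys above together, a bootstrap token secret might look like the following sketch; the token values, expiration, and description are illustrative, and `stringData` is used here only for readability:

```
apiVersion: v1
kind: Secret
metadata:
  name: bootstrap-token-abcdef            # MUST be bootstrap-token-<token-id>
  namespace: kube-system
type: bootstrap.kubernetes.io/token
stringData:
  token-id: abcdef                        # 6 lower-case letters and numbers
  token-secret: 0123456789abcdef          # 16 lower-case letters and numbers
  expiration: 2017-06-01T00:00:00Z        # absolute UTC time, RFC3339
  usage-bootstrap-signing: "true"
  usage-bootstrap-authentication: "true"
  description: "Token used to join the initial set of nodes."
```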
|
||||
|
||||
#### Quick Primer on JWS
|
||||
|
||||
|
|
@ -167,11 +175,60 @@ A new well known ConfigMap will be created in the `kube-public` namespace called
|
|||
|
||||
Users configuring the cluster (and eventually the cluster itself) will update the `kubeconfig` key here with the limited `kubeconfig` above.
|
||||
|
||||
A new controller is introduced that will watch for both new/modified bootstrap tokens and changes to the `cluster-info` ConfigMap. As things change it will generate new JWS signatures. These will be saved under ConfigMap keys of the pattern `jws-kubeconfig-<token-id>`.
|
||||
A new controller (`bootstrapsigner`) is introduced that will watch for both new/modified bootstrap tokens and changes to the `cluster-info` ConfigMap. As things change it will generate new JWS signatures. These will be saved under ConfigMap keys of the pattern `jws-kubeconfig-<token-id>`.
|
||||
|
||||
In addition, `jws-kubeconfig-<token-id>-hash` will be set to the MD5 hash of the contents of the `kubeconfig` data. This will be in the form of `md5:d3b07384d113edec49eaa6238ad5ff00`. This is done so that the controller can detect which signatures need to be updated without reading all of the tokens.
|
||||
Another controller (`tokencleaner`) is introduced that deletes tokens that are past their expiration time.
|
||||
|
||||
This controller will also delete tokens that are past their expiration time.
|
||||
Logically these controllers could run as a component in the control plane. But, for the sake of efficiency, they are bundled as part of the Kubernetes controller-manager.
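For concreteness, the resulting `cluster-info` ConfigMap might look roughly like the sketch below; the embedded `kubeconfig` and the JWS value are placeholders, with the real signature produced by the `bootstrapsigner` controller:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: ""
        cluster:
          server: https://10.0.0.1:6443             # placeholder
          certificate-authority-data: <base64 CA>   # placeholder
    users: []
    contexts: []
  jws-kubeconfig-abcdef: <JWS signature over the kubeconfig value, made with token abcdef>
```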
|
||||
|
||||
## `kubeadm` UX
|
||||
|
||||
We extend kubeadm with a set of flags and helper commands for managing and using these tokens.
|
||||
|
||||
### `kubeadm init` flags
|
||||
|
||||
* `--token` If set, this injects the bootstrap token to use when initializing the cluster. If this is unset, then a random token is created and shown to the user. If set explicitly to the empty string then no token is generated or created. This token is used for both discovery and TLS bootstrap by having `usage-bootstrap-signing` and `usage-bootstrap-authentication` set on the token secret.
|
||||
* `--token-ttl` If set, this sets the TTL for the lifetime of this token. Defaults to 0 which means "forever"
|
||||
|
||||
### `kubeadm join` flags
|
||||
|
||||
* `--token` This sets the token for both discovery and bootstrap auth.
|
||||
* `--discovery-url` If set this will grab the cluster-info data (a kubeconfig) from a URL. Due to the sensitive nature of this data, we will only support https URLs. This also supports `username:password@host` syntax for doing HTTP auth.
|
||||
* `--discovery-file` If set, this will load the cluster-info from a file.
|
||||
* `--discovery-token` If set, (or set via `--token`) then we will be using the token scheme described above.
|
||||
* `--tls-bootstrap-token` (not officially part of this spec) This sets the token used to temporarily authenticate to the API server in order to submit a CSR for signing. If `--insecure-experimental-approve-all-kubelet-csrs-for-group` is set to `system:bootstrappers` then these CSRs will be approved automatically for a hands off joining flow.
|
||||
|
||||
Only one of `--discovery-url`, `--discovery-file` or `--discovery-token` can be set. If more than one is set then an error is surfaced and `kubeadm join` exits. Setting `--token` counts as setting `--discovery-token`.
|
||||
|
||||
### `kubeadm token` commands
|
||||
|
||||
`kubeadm` provides a set of utilities for manipulating token secrets in a running server.
|
||||
|
||||
* `kubeadm token create [token]` Creates a token server side. With no options this'll create a token that is used for discovery and TLS bootstrap.
|
||||
* `[token]` The actual token value (in `id.secret` form) to write in. If unset, a random value is generated.
|
||||
* `--usages` A list of usages. Defaults to `signing,authentication`.
|
||||
* If the `signing` usage is specified, the token will be used (by the BootstrapSigner controller in the KCM) to JWS-sign the ConfigMap and can then be used for discovery.
|
||||
* If the `authentication` usage is specified, the token can be used to authenticate for TLS bootstrap.
|
||||
* `--ttl` The TTL for this token. This sets the expiration of the token as a duration from the current time. This is converted into an absolute UTC time as it is written into the token secret.
|
||||
* `--description` Sets the free form description field for the token.
|
||||
* `kubeadm token delete <token-id>|<token-id>.<token-secret>`
|
||||
* Users can either just specify the id or the full token. This will delete the token if it exists.
|
||||
* `kubeadm token list`
|
||||
* List tokens in a table form listing out the `token-id.token-secret`, the TTL, the absolute expiration time, the usages, and the description.
|
||||
* **Question** Support a `--json` or `-o json` way to make this info programmatic? We don't want to recreate `kubectl` here and these aren't plain API objects so we can't reuse that plumbing easily.
|
||||
* `kubeadm token generate` This currently exists but is documented here for completeness. This purely client-side method just generates a random token in the correct form.
|
||||
|
||||
## Implementation Details
|
||||
|
||||
Our documentation (and output from `kubeadm`) should stress to users that when the token is configured for authentication and used for TLS bootstrap (using `--insecure-experimental-approve-all-kubelet-csrs-for-group`) it is essentially a root password on the cluster and should be protected as such. Users should set a TTL to limit this risk. Or, after the cluster is up and running, users should delete the token using `kubeadm token delete`.
|
||||
|
||||
After some back and forth, we decided to keep the separator between the token ID and secret as a `.`. During the 1.6 cycle, at one point `:` was implemented but then reverted.
|
||||
|
||||
See https://github.com/kubernetes/client-go/issues/114 for details on creating a shared package with common constants for this scheme.
|
||||
|
||||
This proposal assumes RBAC to lock things down in a couple of ways. First, it will open up `cluster-info` ConfigMap in `kube-public` so that it is readable by unauthenticated users. Next, it will make it so that the identities in the `system:bootstrappers` group can only be used with the certs API to submit CSRs. After a TLS certificate is created, that identity should be used instead of the bootstrap token.
|
||||
|
||||
The binding of `system:bootstrappers` to the ability to submit certs is not part of the default RBAC configuration. Tools like `kubeadm` will have to explicitly create this binding.
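A sketch of what such RBAC objects could look like; the object names, group subjects, and API version are illustrative, and the exact objects are left to the installing tool (e.g. `kubeadm`):

```
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1    # RBAC API version at the time of writing
metadata:
  name: cluster-info-reader                      # placeholder name
  namespace: kube-public
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cluster-info"]
    verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: cluster-info-reader                      # placeholder name
  namespace: kube-public
subjects:
  - kind: Group
    name: system:unauthenticated                 # allow unauthenticated clients to read cluster-info
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: cluster-info-reader
  apiGroup: rbac.authorization.k8s.io
```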
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
|
|
|
|||
|
|
@ -0,0 +1,168 @@
|
|||
## Refactor Cloud Provider out of Kubernetes Core
|
||||
|
||||
As kubernetes has evolved tremendously, it has become difficult for the different cloud providers (currently 7) to make changes and iterate quickly. Moreover, the cloud providers are constrained by the kubernetes build/release life-cycle. This proposal aims to move towards a kubernetes code base where cloud provider specific code will move out of the core repository and into "official" repositories, where it will be maintained by the cloud providers themselves.
|
||||
|
||||
### 1. Current use of Cloud Provider
|
||||
|
||||
The following components have cloudprovider dependencies
|
||||
|
||||
1. kube-controller-manager
|
||||
2. kubelet
|
||||
3. kube-apiserver
|
||||
|
||||
#### Cloud Provider in Kube-Controller-Manager
|
||||
|
||||
The kube-controller-manager has many controller loops
|
||||
|
||||
- nodeController
|
||||
- volumeController
|
||||
- routeController
|
||||
- serviceController
|
||||
- replicationController
|
||||
- endpointController
|
||||
- resourceQuotaController
|
||||
- namespaceController
|
||||
- deploymentController
|
||||
- etc..
|
||||
|
||||
Among these controller loops, the following are cloud provider dependent.
|
||||
|
||||
- nodeController
|
||||
- volumeController
|
||||
- routeController
|
||||
- serviceController
|
||||
|
||||
The nodeController uses the cloudprovider to check if a node has been deleted from the cloud. If cloud provider reports a node as deleted, then this controller immediately deletes the node from kubernetes. This check removes the need to wait for a specific amount of time to conclude that an inactive node is actually dead.
|
||||
|
||||
The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.
|
||||
|
||||
The routeController configures routes for hosts in the cloud provider.
|
||||
|
||||
The serviceController maintains a list of currently active nodes, and is responsible for creating and deleting LoadBalancers in the underlying cloud.
|
||||
|
||||
#### Cloud Provider in Kubelet
|
||||
|
||||
Moving on to the kubelet, the following cloud provider dependencies exist in kubelet.
|
||||
|
||||
- Find the cloud nodename of the host that kubelet is running on, for the following reasons:
|
||||
1. To obtain the config map for the kubelet, if one already exists
|
||||
2. To uniquely identify current node using nodeInformer
|
||||
3. To instantiate a reference to the current node object
|
||||
- Find the InstanceID, ProviderID, ExternalID, Zone Info of the node object while initializing it
|
||||
- Periodically poll the cloud provider to figure out if the node has any new IP addresses associated with it
|
||||
- It sets a condition that makes the node unschedulable until cloud routes are configured.
|
||||
- It allows the cloud provider to post process DNS settings
|
||||
|
||||
#### Cloud Provider in Kube-apiserver
|
||||
|
||||
Finally, in the kube-apiserver, the cloud provider is used for transferring SSH keys to all of the nodes, and within an admission controller for setting labels on persistent volumes.
|
||||
|
||||
### 2. Strategy for refactoring Kube-Controller-Manager
|
||||
|
||||
In order to create a 100% cloud independent controller manager, the controller-manager will be split into multiple binaries.
|
||||
|
||||
1. Cloud dependent controller-manager binaries
|
||||
2. Cloud independent controller-manager binaries - This is the existing `kube-controller-manager` that is being shipped with kubernetes releases.
|
||||
|
||||
The cloud dependent binaries will run those loops that rely on cloudprovider as a kubernetes system service. The rest of the controllers will be run in the cloud independent controller manager.
|
||||
|
||||
The decision to run entire controller loops, rather than only the very minute parts that rely on cloud provider was made because it makes the implementation simple. Otherwise, the shared datastructures and utility functions have to be disentangled, and carefully separated to avoid any concurrency issues. This approach among other things, prevents code duplication and improves development velocity.
|
||||
|
||||
Note that the controller loop implementations will continue to reside in the core repository. They take cloudprovider.Interface as an input to their constructors. A vendor-maintained cloud-controller-manager binary could link these controllers in, since the in-tree code serves as a reference implementation of the controllers.
|
||||
|
||||
There are four controllers that rely on cloud provider specific code. These are the node controller, service controller, route controller and attach detach controller. Copies of each of these controllers have been bundled together into one binary. The cloud dependent binary registers itself as a controller, and runs the cloud specific controller loops with the user-agent named "external-controller-manager".
|
||||
|
||||
RouteController and serviceController are entirely cloud specific. Therefore, it is really simple to move these two controller loops out of the cloud-independent binary and into the cloud dependent binary.
|
||||
|
||||
NodeController does a lot more than just talk to the cloud. It does the following operations -
|
||||
|
||||
1. CIDR management
|
||||
2. Monitor Node Status
|
||||
3. Node Pod Eviction
|
||||
|
||||
While monitoring node status, if the status reported by kubelet is either 'ConditionUnknown' or 'ConditionFalse', then the controller checks if the node has been deleted from the cloud provider. If it has already been deleted from the cloud provider, then it deletes the node object without waiting for the `monitorGracePeriod` amount of time. This is the only operation that needs to be moved into the cloud dependent controller manager.
|
||||
|
||||
Finally, the attachDetachController is tricky, and it is not simple to disentangle it from the controller-manager. Therefore, this will be addressed with Flex Volumes (discussed in a separate section below).
|
||||
|
||||
### 3. Strategy for refactoring Kubelet
|
||||
|
||||
The majority of the kubelet's calls to the cloud are made during the initialization of the Node object. The other uses are configuring routes (in the case of GCE), scrubbing DNS, and periodically polling for IP addresses.
|
||||
|
||||
All of the above steps, except the Node initialization step can be moved into a controller. Specifically, IP address polling, and configuration of Routes can be moved into the cloud dependent controller manager.
|
||||
|
||||
After discussing with @thockin, scrubbing DNS was found to be redundant, so it can be disregarded; it is being removed.
|
||||
|
||||
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods in incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.

This approach requires the kubelet to be started with known taints. This will make the node unschedulable until these taints are removed. The external cloud controller manager will asynchronously update the node objects and remove the taints.
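A sketch of what a tainted, not-yet-initialized Node object could look like; the taint key is hypothetical and only illustrates the mechanism, with the external cloud controller manager removing the taint once initialization completes:

```
apiVersion: v1
kind: Node
metadata:
  name: node-1                                              # placeholder node name
spec:
  taints:
    - key: node.cloudprovider.kubernetes.io/uninitialized   # hypothetical taint key
      value: "true"
      effect: NoSchedule                                    # keeps pods off the node until removed
```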
|
||||
|
||||
### 4. Strategy for refactoring Kube-ApiServer
|
||||
|
||||
Kube-apiserver uses the cloud provider for two purposes
|
||||
|
||||
1. Distribute SSH Keys - This can be moved to the cloud dependent controller manager
|
||||
2. Admission Controller for PV - This can be refactored using the taints approach used in Kubelet
|
||||
|
||||
### 5. Strategy for refactoring Volumes
|
||||
|
||||
Volumes need cloud providers, but they only need SPECIFIC cloud providers. The majority of volume management logic resides in the controller manager. These controller loops need to be moved into the cloud-controller-manager. The cloud controller manager also needs a mechanism to read parameters for initialization from the cloud config. This can be done via config maps.
|
||||
|
||||
There is an entirely different approach to refactoring volumes - Flex Volumes. There is an undergoing effort to move all of the volume logic from the controller-manager into plugins called Flex Volumes. In the Flex volumes world, all of the vendor specific code will be packaged in a separate binary as a plugin. After discussing with @thockin, this was decidedly the best approach to remove all cloud provider dependency for volumes out of kubernetes core.
|
||||
|
||||
### 6. Deployment, Upgrades and Downgrades
|
||||
|
||||
This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running.
|
||||
|
||||
##### 6.1 Upgrading kubelet and proxy
|
||||
|
||||
The kubelet and proxy run on every node in the kubernetes cluster. Based on your setup (systemd/other), you can follow the normal upgrade steps for them. This change does not affect the kubelet and proxy upgrade steps for your setup.
|
||||
|
||||
##### 6.2 Upgrading plugins
|
||||
|
||||
Plugins such as cni, flex volumes can be upgraded just as you normally upgrade them. This change does not affect the plugin upgrade steps for your setup.
|
||||
|
||||
##### 6.3 Upgrading kubernetes core services

The master node components (kube-controller-manager, kube-scheduler, kube-apiserver, etc.) can be upgraded just as you normally upgrade them. This change does not affect the upgrade steps for these components.
|
||||
|
||||
##### 6.4 Applying the cloud-controller-manager
|
||||
|
||||
This is the only step that is different in the upgrade process. In order to complete the upgrade process, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You need to apply this change using
|
||||
|
||||
```
|
||||
kubectl apply -f cloud-controller-manager.yml
|
||||
```
|
||||
|
||||
This will start the cloud specific controller manager in your kubernetes setup.
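A minimal sketch of what such a deployment descriptor could contain is shown below; the API version, image, flags, and replica count are placeholders, and the actual file provided with this change may differ:

```
apiVersion: extensions/v1beta1            # Deployment API group available at the time of writing
kind: Deployment
metadata:
  name: cloud-controller-manager
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: cloud-controller-manager
    spec:
      hostNetwork: true                   # reach the API server without relying on cluster networking
      containers:
        - name: cloud-controller-manager
          image: example.com/cloud-controller-manager:v1.6   # placeholder image
          command:
            - /cloud-controller-manager
            - --cloud-provider=gce        # placeholder; set to the desired cloud provider
```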
|
||||
|
||||
The downgrade steps are also the same as before for all the components except the cloud-controller-manager. In case of the cloud-controller-manager, the deployment should be deleted using
|
||||
|
||||
```
|
||||
kubectl delete -f cloud-controller-manager.yml
|
||||
```
|
||||
|
||||
### 7. Roadmap
|
||||
|
||||
##### 7.1 Transition plan
|
||||
|
||||
Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and address any issues that they uncover, that we might have missed. It also doubles as a reference implementation to the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager. At this stage, the cloud-controller-manager will still be in "beta" stage and optional.
|
||||
|
||||
Release 1.7: In this release, all of the supported turnups will be converted to use the cloud controller by default. At this point users will still be allowed to opt out. Users will be expected to run the monolithic cloud controller binary. The cloud controller manager will still continue to use the existing library, but code will be factored out to reduce literal duplication between the controller-manager and the cloud-controller-manager. A deprecation announcement will be made to inform users to switch to the cloud-controller-manager.
|
||||
|
||||
Release 1.8: The main change aimed for this release is to break up the various cloud providers into individual binaries. Users will still be allowed to opt-out. There will be a second warning to inform users about the deprecation of the `--cloud-provider` option in the controller-manager.
|
||||
|
||||
Release 1.9: All of the legacy cloud providers will be completely removed in this version
|
||||
|
||||
##### 7.2 Code/Library Evolution
|
||||
|
||||
* Break controller-manager into 2 binaries. One binary will be the existing controller-manager, and the other will only run the cloud specific loops with no other changes. The new cloud-controller-manager will still load all the cloudprovider libraries, and therefore will allow the users to choose which cloud-provider to use.
|
||||
* Move the cloud specific parts of kubelet out using the external admission controller pattern mentioned in the previous sections above.
|
||||
* The cloud controller will then be made into a library. It will take the cloudprovider.Interface as an argument to its constructor. Individual cloudprovider binaries will be created using this library.
|
||||
* Cloud specific operations will be moved out of kube-apiserver using the external admission controller pattern mentioned above.
|
||||
* All cloud specific volume controller loops (attach, detach, provision operation controllers) will be switched to using flex volumes. Flex volumes do not need in-tree cloud specific calls.
|
||||
* As the final step, all of the cloud provider specific code will be moved out of tree.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
@ -0,0 +1,85 @@
|
|||
# Cloud Provider (specifically GCE and AWS) metrics for Storage API calls
|
||||
|
||||
## Goal
|
||||
|
||||
Kubernetes should provide metrics, such as counts and latency percentiles,
for the cloud provider APIs it uses to provision persistent volumes.

In an ideal world, we would want these metrics for all cloud providers
and for all API calls kubernetes makes, but to limit the scope of this feature
we will implement metrics for:

* GCE
* AWS

We will also implement metrics only for storage API calls for now. This feature
does introduce hooks into kubernetes code which can be used to add additional metrics,
but we only focus on storage API calls here.
|
||||
|
||||
## Motivation
|
||||
|
||||
* Cluster admins should be able to monitor Cloud API usage of Kubernetes. It will help
them detect problems in certain scenarios which can blow up the API quota of the Cloud
provider.
* Cluster admins should also be able to monitor the health and latency of the Cloud APIs
on which kubernetes depends.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Metric format and collection
|
||||
|
||||
Metrics emitted from cloud provider will fall under category of service metrics
|
||||
as defined in [Kubernetes Monitoring Architecture](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/monitoring_architecture.md).
|
||||
|
||||
|
||||
The metrics will be emitted using [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and available for collection
|
||||
from `/metrics` HTTP endpoint of kubelet, controller etc. All Kubernetes core components already emit
|
||||
metrics on `/metrics` HTTP endpoint. This proposal merely extends available metrics to include Cloud provider metrics as well.
|
||||
|
||||
|
||||
Any collector which can parse Prometheus metric format should be able to collect
|
||||
metrics from these endpoints.
|
||||
|
||||
A more detailed description of monitoring pipeline can be found in [Monitoring architecture] (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/monitoring_architecture.md#monitoring-pipeline) document.
|
||||
|
||||
|
||||
#### Metric Types
|
||||
|
||||
Since we are interested in count (or rate) and latency percentile metrics for the API calls Kubernetes makes to
the external Cloud Provider, we will use the [Histogram](https://prometheus.io/docs/practices/histograms/) type for
emitting these metrics.

We will be using the `HistogramVec` type so that we can attach dimensions at runtime. Whenever available,
`namespace` will be reported as a dimension with the metric.
|
||||
|
||||
### GCE Implementation
|
||||
|
||||
For GCE we simply use `gensupport.RegisterHook()` to register a function which will be called
when a request is made and when the response returns.

To begin with, we will start emitting the following metrics for GCE. Because these metrics are of type
`Histogram`, both count and latency will be automatically calculated.
|
||||
|
||||
1. gce_instance_list
|
||||
2. gce_disk_insert
|
||||
3. gce_disk_delete
|
||||
4. gce_attach_disk
|
||||
5. gce_detach_disk
|
||||
6. gce_list_disk
|
||||
|
||||
A POC implementation can be found here - https://github.com/kubernetes/kubernetes/pull/40338/files
|
||||
|
||||
### AWS Implementation
|
||||
|
||||
For AWS we will currently use the wrapper type `awsSdkEC2` to intercept all storage API calls and
emit metric datapoints. The reason we are not using the approach used for `aws/log_handler` is that the AWS SDK doesn't use Contexts, and hence we can't pass custom information such as the API call name or namespace to record with the metrics.
|
||||
|
||||
To begin with we will start emitting following metrics for AWS:
|
||||
|
||||
1. aws_attach_volume
|
||||
2. aws_create_tags
|
||||
3. aws_create_volume
|
||||
4. aws_delete_volume
|
||||
5. aws_describe_instance
|
||||
6. aws_describe_volume
|
||||
7. aws_detach_volume
|
||||
|
|
@ -1,101 +1,420 @@
|
|||
# ControllerRef proposal
|
||||
|
||||
Author: gmarek@
|
||||
Last edit: 2016-05-11
|
||||
Status: raw
|
||||
* Authors: gmarek, enisoc
|
||||
* Last edit: [2017-02-06](#history)
|
||||
* Status: partially implemented
|
||||
|
||||
Approvers:
|
||||
- [ ] briangrant
|
||||
- [ ] dbsmith
|
||||
* [ ] briangrant
|
||||
* [ ] dbsmith
|
||||
|
||||
**Table of Contents**
|
||||
|
||||
- [Goal of ControllerReference](#goal-of-setreference)
|
||||
- [Non goals](#non-goals)
|
||||
- [API and semantic changes](#api-and-semantic-changes)
|
||||
- [Upgrade/downgrade procedure](#upgradedowngrade-procedure)
|
||||
- [Orphaning/adoption](#orphaningadoption)
|
||||
- [Implementation plan (sketch)](#implementation-plan-sketch)
|
||||
- [Considered alternatives](#considered-alternatives)
|
||||
* [Goals](#goals)
|
||||
* [Non-goals](#non-goals)
|
||||
* [API](#api)
|
||||
* [Behavior](#behavior)
|
||||
* [Upgrading](#upgrading)
|
||||
* [Implementation](#implementation)
|
||||
* [Alternatives](#alternatives)
|
||||
* [History](#history)
|
||||
|
||||
# Goal of ControllerReference
|
||||
# Goals
|
||||
|
||||
Main goal of `ControllerReference` effort is to solve a problem of overlapping controllers that fight over some resources (e.g. `ReplicaSets` fighting with `ReplicationControllers` over `Pods`), which cause serious [problems](https://github.com/kubernetes/kubernetes/issues/24433) such as exploding memory of Controller Manager.
|
||||
* The main goal of ControllerRef (controller reference) is to solve the problem
|
||||
of controllers that fight over controlled objects due to overlapping selectors
|
||||
(e.g. a ReplicaSet fighting with a ReplicationController over Pods because
|
||||
both controllers have label selectors that match those Pods).
|
||||
Fighting controllers can [destabilize the apiserver](https://github.com/kubernetes/kubernetes/issues/24433),
|
||||
[thrash objects back-and-forth](https://github.com/kubernetes/kubernetes/issues/24152),
|
||||
or [cause controller operations to hang](https://github.com/kubernetes/kubernetes/issues/8598).
|
||||
|
||||
We don't want to have (just) an in-memory solution, as we don’t want a Controller Manager crash to cause massive changes in object ownership in the system. I.e. we need to persist the information about "owning controller".
|
||||
We don't want to have just an in-memory solution because we don't want a
|
||||
Controller Manager crash to cause a massive reshuffling of controlled objects.
|
||||
We also want to expose the mapping so that controllers can be in multiple
|
||||
processes (e.g. for HA of kube-controller-manager) and separate binaries
|
||||
(e.g. for controllers that are API extensions).
|
||||
Therefore, we will persist the mapping from each object to its controller in
|
||||
the API object itself.
|
||||
|
||||
Secondary goal of this effort is to improve performance of various controllers and schedulers, by removing the need for expensive lookup for all matching "controllers".
|
||||
* A secondary goal of ControllerRef is to provide back-links from a given object
|
||||
to the controller that manages it, which can be used for:
|
||||
* Efficient object->controller lookup, without having to list all controllers.
|
||||
* Generic object grouping (e.g. in a UI), without having to know about all
|
||||
third-party controller types in advance.
|
||||
* Replacing certain uses of the `kubernetes.io/created-by` annotation,
|
||||
and potentially enabling eventual deprecation of that annotation.
|
||||
However, deprecation is not being proposed at this time, so any uses that
|
||||
remain will be unaffected.
|
||||
|
||||
# Non goals
|
||||
# Non-goals
|
||||
|
||||
Cascading deletion is not a goal of this effort. Cascading deletion will use `ownerReferences`, which is a [separate effort](garbage-collection.md).
|
||||
* Overlapping selectors will continue to be considered user error.
|
||||
|
||||
`ControllerRef` will extend `OwnerReference` and reuse machinery written for it (GarbageCollector, adoption/orphaning logic).
|
||||
ControllerRef will prevent this user error from destabilizing the cluster or
|
||||
causing endless back-and-forth fighting between controllers, but it will not
|
||||
make it completely safe to create controllers with overlapping selectors.
|
||||
|
||||
# API and semantic changes
|
||||
In particular, this proposal does not address cases such as Deployment or
|
||||
StatefulSet, in which "families" of orphans may exist that ought to be adopted
|
||||
as indivisible units.
|
||||
Since multiple controllers may race to adopt orphans, the user must ensure
|
||||
selectors do not overlap to avoid breaking up families.
|
||||
Breaking up families of orphans could result in corruption or loss of
|
||||
Deployment rollout state and history, and possibly also corruption or loss of
|
||||
StatefulSet application data.
|
||||
|
||||
There will be a new API field in the `OwnerReference` in which we will store an information if given owner is a managing controller:
|
||||
* ControllerRef is not intended to replace [selector generation](selector-generation.md),
|
||||
used by some controllers like Job to ensure all selectors are unique
|
||||
and prevent overlapping selectors from occurring in the first place.
|
||||
|
||||
```
|
||||
OwnerReference {
|
||||
…
|
||||
Controller bool
|
||||
However, ControllerRef will still provide extra protection and consistent
|
||||
cross-controller semantics for controllers that already use selector
|
||||
generation. For example, selector generation can be manually overridden,
|
||||
which leaves open the possibility of overlapping selectors due to user error.
|
||||
|
||||
* This proposal does not change how cascading deletion works.
|
||||
|
||||
Although ControllerRef will extend OwnerReference and rely on its machinery,
|
||||
the [Garbage Collector](garbage-collection.md) will continue to implement
|
||||
cascading deletion as before.
|
||||
That is, the GC will look at all OwnerReferences without caring whether a
|
||||
given OwnerReference happens to be a ControllerRef or not.
|
||||
|
||||
# API
|
||||
|
||||
The `Controller` API field in OwnerReference marks whether a given owner is a
|
||||
managing controller:
|
||||
|
||||
```go
|
||||
type OwnerReference struct {
|
||||
…
|
||||
// If true, this reference points to the managing controller.
|
||||
// +optional
|
||||
Controller *bool
|
||||
}
|
||||
```
|
||||
|
||||
From now on by `ControllerRef` we mean an `OwnerReference` with `Controller=true`.
|
||||
A ControllerRef is thus defined as an OwnerReference with `Controller=true`.
|
||||
Each object may have at most one ControllerRef in its list of OwnerReferences.
|
||||
The validator for OwnerReferences lists will fail any update that would violate
|
||||
this invariant.
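
Because of this invariant, finding an object's ControllerRef is a simple scan over its OwnerReferences. A minimal helper sketch (name hypothetical; assumes `metav1` is the apimachinery `meta/v1` package):

```go
// controllerOf returns the OwnerReference marked as the managing
// controller, or nil if the object is an orphan.
func controllerOf(refs []metav1.OwnerReference) *metav1.OwnerReference {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}
```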
|
||||
|
||||
Most controllers (all that manage collections of things defined by label selector) will have slightly changed semantics: currently controller owns an object if its selector matches object’s labels and if it doesn't notice an older controller of the same kind that also matches the object's labels, but after introduction of `ControllerReference` a controller will own an object iff selector matches labels and the `OwnerReference` with `Controller=true`points to it.
|
||||
# Behavior
|
||||
|
||||
If the owner's selector or owned object's labels change, the owning controller will be responsible for orphaning (clearing `Controller` field in the `OwnerReference` and/or deleting `OwnerReference` altogether) objects, after which adoption procedure (setting `Controller` field in one of `OwnerReferencec` and/or adding new `OwnerReferences`) might occur, if another controller has a selector matching.
|
||||
This section summarizes the intended behavior for existing controllers.
|
||||
It can also serve as a guide for respecting ControllerRef when writing new
|
||||
controllers.
|
||||
|
||||
For debugging purposes we want to add an `adoptionTime` annotation prefixed with `kubernetes.io/` which will keep the time of last controller ownership transfer.
|
||||
## The Three Laws of Controllers
|
||||
|
||||
# Upgrade/downgrade procedure
|
||||
All controllers that manage collections of objects should obey the following
|
||||
rules.
|
||||
|
||||
Because `ControllerRef` will be a part of `OwnerReference` effort it will have the same upgrade/downgrade procedures.
|
||||
1. **Take ownership**
|
||||
|
||||
# Orphaning/adoption
|
||||
A controller should claim *ownership* of any objects it creates by adding a
|
||||
ControllerRef, and may also claim ownership of an object it didn't create,
|
||||
as long as the object has no existing ControllerRef (i.e. it is an *orphan*).
|
||||
|
||||
Because `ControllerRef` will be a part of `OwnerReference` effort it will have the same orphaning/adoption procedures.
|
||||
1. **Don't interfere**
|
||||
|
||||
Controllers will orphan objects they own in two cases:
|
||||
* Change of label/selector causing selector to stop matching labels (executed by the controller)
|
||||
* Deletion of a controller with `Orphaning=true` (executed by the GarbageCollector)
|
||||
A controller should not take any action (e.g. edit/scale/delete) on an object
|
||||
it does not own, except to [*adopt*](#adoption) the object if allowed by the
|
||||
First Law.
|
||||
|
||||
We will need a secondary orphaning mechanism in case of unclean controller deletion:
|
||||
* GarbageCollector will remove `ControllerRef` from objects that no longer points to existing controllers
|
||||
1. **Don't share**
|
||||
|
||||
Controller will adopt (set `Controller` field in the `OwnerReference` that points to it) an object whose labels match its selector iff:
|
||||
* there are no `OwnerReferences` with `Controller` set to true in `OwnerReferences` array
|
||||
* `DeletionTimestamp` is not set
|
||||
and
|
||||
* Controller is the first controller that will manage to adopt the Pod from all Controllers that have matching label selector and don't have `DeletionTimestamp` set.
|
||||
A controller should not count an object it does not own toward satisfying its
|
||||
desired state (e.g. a certain number of replicas), although it may include
|
||||
the object in plans to achieve its desired state (e.g. through adoption)
|
||||
as long as such plans do not conflict with the First or Second Laws.
|
||||
|
||||
By design there are possible races during adoption if multiple controllers can own a given object.
|
||||
## Adoption
|
||||
|
||||
To prevent re-adoption of an object during deletion the `DeletionTimestamp` will be set when deletion is starting. When a controller has a non-nil `DeletionTimestamp` it won't take any actions except updating its `Status` (in particular it won't adopt any objects).
|
||||
If a controller finds an orphaned object (an object with no ControllerRef) that
|
||||
matches its selector, it may try to adopt the object by adding a ControllerRef.
|
||||
Note that whether or not the controller *should* try to adopt the object depends
|
||||
on the particular controller and object.
|
||||
|
||||
# Implementation plan (sketch):
|
||||
Multiple controllers can race to adopt a given object, but only one can win
|
||||
by being the first to add a ControllerRef to the object's OwnerReferences list.
|
||||
The losers will see their adoptions fail due to a validation error as explained
|
||||
[above](#api).
|
||||
|
||||
* Add API field for `Controller`,
|
||||
* Extend `OwnerReference` adoption procedure to set a `Controller` field in one of the owners,
|
||||
* Update all affected controllers to respect `ControllerRef`.
|
||||
If a controller has a non-nil `DeletionTimestamp`, it must not attempt adoption
|
||||
or take any other actions except updating its `Status`.
|
||||
This prevents readoption of objects orphaned by the [orphan finalizer](garbage-collection.md#part-ii-the-orphan-finalizer)
|
||||
during deletion of the controller.
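
Putting these rules together, an adoption attempt might look roughly like the sketch below. All names are hypothetical (the real logic lives in `ControllerRefManager`), the write hook stands in for whatever update/patch call the controller uses, and `controllerOf` is the helper sketched earlier.

```go
type adopter struct {
	apiVersion, kind, name string
	uid                    types.UID
	deletionTimestamp      *metav1.Time                   // set when the controller is being deleted
	update                 func(*metav1.ObjectMeta) error // hypothetical write hook
}

// maybeAdopt attempts to claim an orphan that matches this controller's selector.
func (c *adopter) maybeAdopt(obj *metav1.ObjectMeta) error {
	if c.deletionTimestamp != nil {
		return nil // a controller being deleted must not adopt
	}
	if obj.DeletionTimestamp != nil || controllerOf(obj.OwnerReferences) != nil {
		return nil // the object is being deleted, or someone already owns it
	}
	isController := true
	obj.OwnerReferences = append(obj.OwnerReferences, metav1.OwnerReference{
		APIVersion: c.apiVersion,
		Kind:       c.kind,
		Name:       c.name,
		UID:        c.uid,
		Controller: &isController,
	})
	// If another controller adopted first, the apiserver rejects this write
	// (the validator forbids a second ControllerRef) and this controller backs off.
	return c.update(obj)
}
```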
|
||||
|
||||
Necessary related work:
|
||||
* `OwnerReferences` are correctly added/deleted,
|
||||
* GarbageCollector removes dangling references,
|
||||
* Controllers don't take any meaningful actions when `DeletionTimestamps` is set.
|
||||
## Orphaning
|
||||
|
||||
# Considered alternatives
|
||||
When a controller is deleted, the objects it owns will either be orphaned or
|
||||
deleted according to the normal [Garbage Collection](garbage-collection.md)
|
||||
behavior, based on OwnerReferences.
|
||||
|
||||
In addition, if a controller finds that it owns an object that no longer matches
|
||||
its selector, it should orphan the object by removing itself from the object's
|
||||
OwnerReferences list. Since ControllerRef is just a special type of
|
||||
OwnerReference, this also means the ControllerRef is removed.
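
A sketch of that orphaning step (helper name hypothetical): the controller filters its own entry out of the object's OwnerReferences, which removes the ControllerRef as a side effect.

```go
// removeOwnerRef returns the OwnerReferences list without the entry that
// belongs to the given owner UID.
func removeOwnerRef(refs []metav1.OwnerReference, ownerUID types.UID) []metav1.OwnerReference {
	kept := make([]metav1.OwnerReference, 0, len(refs))
	for _, r := range refs {
		if r.UID != ownerUID {
			kept = append(kept, r)
		}
	}
	return kept
}
```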
|
||||
|
||||
## Watches
|
||||
|
||||
Many controllers use watches to *sync* each controller instance (prompting it to
|
||||
reconcile desired and actual state) as soon as a relevant event occurs for one
|
||||
of its controlled objects, as well as to let controllers wait for asynchronous
|
||||
operations to complete on those objects.
|
||||
The controller subscribes to a stream of events about controlled objects
|
||||
and routes each event to a particular controller instance.
|
||||
|
||||
Previously, the controller used only label selectors to decide which
|
||||
controller to route an event to. If multiple controllers had overlapping
|
||||
selectors, events might be misrouted, causing the wrong controllers to sync.
|
||||
Controllers could also freeze because they keep waiting for an event that
|
||||
already came but was misrouted, manifesting as `kubectl` commands that hang.
|
||||
|
||||
Some controllers introduced a workaround to break ties. For example, they would
|
||||
sort all controller instances with matching selectors, first by creation
|
||||
timestamp and then by name, and always route the event to the first controller
|
||||
in this list. However, that did not prevent misrouting if the overlapping
|
||||
controllers were of different types. It also only worked while controllers
|
||||
themselves assigned ownership over objects using the same tie-break rules.
|
||||
|
||||
Now that controller ownership is defined in terms of ControllerRef,
|
||||
controllers should use the following guidelines for responding to watch events (a sketch of an event router that follows these rules appears after the list):
|
||||
|
||||
* If the object has a ControllerRef:
|
||||
* Sync only the referenced controller.
|
||||
* Update `expectations` counters for the referenced controller.
|
||||
* If an *Update* event removes the ControllerRef, sync any controllers whose
|
||||
selectors match to give each one a chance to adopt the object.
|
||||
* If the object is an orphan:
|
||||
* *Add* event
|
||||
* Sync any controllers whose selectors match to give each one a chance to
|
||||
adopt the object.
|
||||
* Do *not* update counters on `expectations`.
|
||||
Controllers should never be waiting for creation of an orphan because
|
||||
anything they create should have a ControllerRef.
|
||||
* *Delete* event
|
||||
* Do *not* sync any controllers.
|
||||
Controllers should never care about orphans disappearing.
|
||||
* Do *not* update counters on `expectations`.
|
||||
Controllers should never be waiting for deletion of an orphan because they
|
||||
are not allowed to delete objects they don't own.
|
||||
* *Update* event
|
||||
* If labels changed, sync any controllers whose selectors match to give each
|
||||
one a chance to adopt the object.
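
The sketch below routes events according to these guidelines, working only against object metadata. The `enqueue*` and expectations hooks are hypothetical stand-ins for each controller's own work queue and expectations machinery, and `reflect` is the standard library package.

```go
type router struct {
	enqueueByUID       func(uid types.UID)            // sync the referenced controller
	enqueueAllMatching func(labels map[string]string) // sync every controller whose selector matches
	creationObserved   func(uid types.UID)            // hypothetical expectations hook
	deletionObserved   func(uid types.UID)            // hypothetical expectations hook
}

func (r *router) onAdd(obj *metav1.ObjectMeta) {
	if ref := controllerOf(obj.OwnerReferences); ref != nil {
		r.creationObserved(ref.UID)
		r.enqueueByUID(ref.UID)
		return
	}
	// Orphan add: no expectations update; matching controllers may adopt it.
	r.enqueueAllMatching(obj.Labels)
}

func (r *router) onUpdate(old, cur *metav1.ObjectMeta) {
	if ref := controllerOf(cur.OwnerReferences); ref != nil {
		r.enqueueByUID(ref.UID)
		return
	}
	if controllerOf(old.OwnerReferences) != nil {
		// The ControllerRef was removed: matching controllers may adopt.
		r.enqueueAllMatching(cur.Labels)
		return
	}
	if !reflect.DeepEqual(old.Labels, cur.Labels) {
		// Orphan whose labels changed: matching controllers may adopt.
		r.enqueueAllMatching(cur.Labels)
	}
}

func (r *router) onDelete(obj *metav1.ObjectMeta) {
	if ref := controllerOf(obj.OwnerReferences); ref != nil {
		r.deletionObserved(ref.UID)
		r.enqueueByUID(ref.UID)
	}
	// Orphan delete: ignored entirely.
}
```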
|
||||
|
||||
## Default garbage collection policy
|
||||
|
||||
Controllers that used to rely on client-side cascading deletion should set a
|
||||
[`DefaultGarbageCollectionPolicy`](https://github.com/kubernetes/kubernetes/blob/dd22743b54f280f41e68f206449a13ca949aca4e/pkg/genericapiserver/registry/rest/delete.go#L43)
|
||||
of `rest.OrphanDependents` when they are updated to implement ControllerRef.
|
||||
|
||||
This ensures that deleting only the controller, without specifying the optional
|
||||
`DeleteOptions.OrphanDependents` flag, remains a non-cascading delete.
|
||||
Otherwise, the behavior would change to server-side cascading deletion by
|
||||
default as soon as the controller manager is upgraded to a version that performs
|
||||
adoption by setting ControllerRefs.
|
||||
|
||||
Example from [ReplicationController](https://github.com/kubernetes/kubernetes/blob/9ae2dfacf196ca7dbee798ee9c3e1663a5f39473/pkg/registry/core/replicationcontroller/strategy.go#L49):
|
||||
|
||||
```go
|
||||
// DefaultGarbageCollectionPolicy returns Orphan because that was the default
|
||||
// behavior before the server-side garbage collection was implemented.
|
||||
func (rcStrategy) DefaultGarbageCollectionPolicy() rest.GarbageCollectionPolicy {
|
||||
return rest.OrphanDependents
|
||||
}
|
||||
```
|
||||
|
||||
New controllers that don't have legacy behavior to preserve can omit this
|
||||
controller-specific default to use the [global default](https://github.com/kubernetes/kubernetes/blob/2bb1e7581544b9bd059eafe6ac29775332e5a1d6/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L543),
|
||||
which is to enable server-side cascading deletion.
|
||||
|
||||
## Controller-specific behavior
|
||||
|
||||
This section lists considerations specific to a given controller.
|
||||
|
||||
* **ReplicaSet/ReplicationController**
|
||||
|
||||
* These controllers currently only enable ControllerRef behavior when the
|
||||
Garbage Collector is enabled. When ControllerRef was first added to these
|
||||
controllers, the main purpose was to enable server-side cascading deletion
|
||||
via the Garbage Collector, so it made sense to gate it behind the same flag.
|
||||
|
||||
However, in order to achieve the [goals](#goals) of this proposal, it is
|
||||
necessary to set ControllerRefs and perform adoption/orphaning regardless of
|
||||
whether server-side cascading deletion (the Garbage Collector) is enabled.
|
||||
For example, turning off the GC should not cause controllers to start
|
||||
fighting again. Therefore, these controllers will be updated to always
|
||||
enable ControllerRef.
|
||||
|
||||
* **StatefulSet**
|
||||
|
||||
* A StatefulSet will not adopt any Pod whose name does not match the template
|
||||
it uses to create new Pods: `{statefulset name}-{ordinal}`.
|
||||
This is because Pods in a given StatefulSet form a "family" that may use pod
|
||||
names (via their generated DNS entries) to coordinate among themselves.
|
||||
Adopting Pods with the wrong names would violate StatefulSet's semantics.
|
||||
|
||||
Adoption is allowed when Pod names match, so it remains possible to orphan a
|
||||
family of Pods (by deleting their StatefulSet without cascading) and then
|
||||
create a new StatefulSet with the same name and selector to adopt them.
|
||||
|
||||
* **CronJob**
|
||||
|
||||
* CronJob [does not use watches](https://github.com/kubernetes/kubernetes/blob/9ae2dfacf196ca7dbee798ee9c3e1663a5f39473/pkg/controller/cronjob/cronjob_controller.go#L20),
|
||||
so [that section](#watches) doesn't apply.
|
||||
Instead, all CronJobs are processed together upon every "sync".
|
||||
* CronJob applies a `created-by` annotation to link Jobs to the CronJob that
|
||||
created them.
|
||||
If a ControllerRef is found, it should be used instead to determine this
|
||||
link.
|
||||
|
||||
## Created-by annotation
|
||||
|
||||
Aside from the change to CronJob mentioned above, several other uses of the
|
||||
`kubernetes.io/created-by` annotation have been identified that would be better
|
||||
served by ControllerRef because it tracks who *currently* controls an object,
|
||||
not just who originally created it.
|
||||
|
||||
As a first step, the specific uses identified in the [Implementation](#implementation)
|
||||
section will be augmented to prefer ControllerRef if one is found.
|
||||
If no ControllerRef is found, they will fall back to looking at `created-by`.
|
||||
|
||||
# Upgrading
|
||||
|
||||
In the absence of controllers with overlapping selectors, upgrading or
|
||||
downgrading the master to or from a version that introduces ControllerRef
|
||||
should have no user-visible effects.
|
||||
If no one is fighting, adoption should always succeed eventually, so ultimately
|
||||
only the selectors matter on either side of the transition.
|
||||
|
||||
If there are controllers with overlapping selectors at the time of an *upgrade*:
|
||||
|
||||
* Back-and-forth thrashing should stop after the upgrade.
|
||||
* The ownership of existing objects might change due to races during
|
||||
[adoption](#adoption). As mentioned in the [non-goals](#non-goals) section,
|
||||
this can include breaking up families of objects that should have stayed
|
||||
together.
|
||||
* Controllers might create additional objects because they start to respect the
|
||||
["Don't share"](#behavior) rule.
|
||||
|
||||
If there are controllers with overlapping selectors at the time of a
|
||||
*downgrade*:
|
||||
|
||||
* Controllers may begin to fight and thrash objects.
|
||||
* The ownership of existing objects might change due to ignoring ControllerRef.
|
||||
* Controllers might delete objects because they stop respecting the
|
||||
["Don't share"](#behavior) rule.
|
||||
|
||||
# Implementation
|
||||
|
||||
Checked items had been completed at the time of the [last edit](#history) of
|
||||
this proposal.
|
||||
|
||||
* [x] Add API field for `Controller` to the `OwnerReference` type.
|
||||
* [x] Add validator that prevents an object from having multiple ControllerRefs.
|
||||
* [x] Add `ControllerRefManager` types to encapsulate ControllerRef manipulation
|
||||
logic.
|
||||
* [ ] Update all affected controllers to respect ControllerRef.
|
||||
* [ ] ReplicationController
|
||||
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
|
||||
* [x] Don't adopt/manage objects.
|
||||
* [ ] Don't orphan objects.
|
||||
* [x] Include ControllerRef on all created objects.
|
||||
* [x] Set DefaultGarbageCollectionPolicy to OrphanDependents.
|
||||
* [x] Use ControllerRefManager to adopt and orphan.
|
||||
* [ ] Enable ControllerRef regardless of `--enable-garbage-collector` flag.
|
||||
* [ ] Use ControllerRef to map watch events to controllers.
|
||||
* [ ] ReplicaSet
|
||||
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
|
||||
* [x] Don't adopt/manage objects.
|
||||
* [ ] Don't orphan objects.
|
||||
* [x] Include ControllerRef on all created objects.
|
||||
* [x] Set DefaultGarbageCollectionPolicy to OrphanDependents.
|
||||
* [x] Use ControllerRefManager to adopt and orphan.
|
||||
* [ ] Enable ControllerRef regardless of `--enable-garbage-collector` flag.
|
||||
* [ ] Use ControllerRef to map watch events to controllers.
|
||||
* [ ] StatefulSet
|
||||
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
|
||||
* [ ] Include ControllerRef on all created objects.
|
||||
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
|
||||
* [ ] Use ControllerRefManager to adopt and orphan.
|
||||
* [ ] Use ControllerRef to map watch events to controllers.
|
||||
* [ ] DaemonSet
|
||||
* [x] Don't touch controlled objects if DeletionTimestamp is set.
|
||||
* [ ] Include ControllerRef on all created objects.
|
||||
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
|
||||
* [ ] Use ControllerRefManager to adopt and orphan.
|
||||
* [ ] Use ControllerRef to map watch events to controllers.
|
||||
* [ ] Deployment
|
||||
* [x] Don't touch controlled objects if DeletionTimestamp is set.
|
||||
* [x] Include ControllerRef on all created objects.
|
||||
* [x] Set DefaultGarbageCollectionPolicy to OrphanDependents.
|
||||
* [x] Use ControllerRefManager to adopt and orphan.
|
||||
* [ ] Use ControllerRef to map watch events to controllers.
|
||||
* [ ] Job
|
||||
* [x] Don't touch controlled objects if DeletionTimestamp is set.
|
||||
* [ ] Include ControllerRef on all created objects.
|
||||
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
|
||||
* [ ] Use ControllerRefManager to adopt and orphan.
|
||||
* [ ] Use ControllerRef to map watch events to controllers.
|
||||
* [ ] CronJob
|
||||
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
|
||||
* [ ] Include ControllerRef on all created objects.
|
||||
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
|
||||
* [ ] Use ControllerRefManager to adopt and orphan.
|
||||
* [ ] Use ControllerRef to map Jobs to their parent CronJobs.
|
||||
* [ ] Tests
|
||||
* [ ] Update existing controller tests to use ControllerRef.
|
||||
* [ ] Add test for overlapping controllers of different types.
|
||||
* [ ] Replace or augment uses of `CreatedByAnnotation` with ControllerRef.
|
||||
* [ ] `kubectl describe` list of controllers for an object.
|
||||
* [ ] `kubectl drain` Pod filtering.
|
||||
* [ ] Classifying failed Pods in e2e test framework.
|
||||
|
||||
# Alternatives
|
||||
|
||||
The following alternatives were considered:
|
||||
|
||||
* Centralized "ReferenceController" component that manages adoption/orphaning.
|
||||
|
||||
Not chosen because:
|
||||
* Hard to make it work for all imaginable 3rd party objects.
|
||||
* Adding hooks to framework makes it possible for users to write their own
|
||||
logic.
|
||||
|
||||
* Generic "ReferenceController": centralized component that managed adoption/orphaning
|
||||
* Dropped because: hard to write something that will work for all imaginable 3rd party objects, adding hooks to framework makes it possible for users to write their own logic
|
||||
* Separate API field for `ControllerRef` in the ObjectMeta.
|
||||
* Dropped because: nontrivial relationship between `ControllerRef` and `OwnerReferences` when it comes to deletion/adoption.
|
||||
|
||||
Not chosen because:
|
||||
* Complicated relationship between `ControllerRef` and `OwnerReference`
|
||||
when it comes to deletion/adoption.
|
||||
|
||||
# History
|
||||
|
||||
Summary of significant revisions to this document:
|
||||
|
||||
* 2017-02-06 (enisoc)
|
||||
* [Controller-specific behavior](#controller-specific-behavior)
|
||||
* Enable ControllerRef regardless of whether GC is enabled.
|
||||
* [Implementation](#implementation)
|
||||
* Audit whether existing controllers respect DeletionTimestamp.
|
||||
* 2017-02-01 (enisoc)
|
||||
* Clarify existing specifications and add details not previously specified.
|
||||
* [Non-goals](#non-goals)
|
||||
* Make explicit that overlapping selectors are still user error.
|
||||
* [Behavior](#behavior)
|
||||
* Summarize fundamental rules that all new controllers should follow.
|
||||
* Explain how the validator prevents multiple ControllerRefs on an object.
|
||||
* Specify how ControllerRef should affect the use of watches/expectations.
|
||||
* Specify important controller-specific behavior for existing controllers.
|
||||
* Specify necessary changes to default GC policy when adding ControllerRef.
|
||||
* Propose changing certain uses of `created-by` annotation to ControllerRef.
|
||||
* [Upgrading](#upgrading)
|
||||
* Specify ControllerRef-related behavior changes upon upgrade/downgrade.
|
||||
* [Implementation](#implementation)
|
||||
* List all work to be done and mark items already completed as of this edit.
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
|
|
|
|||
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
**Author**: David Ashpole (@dashpole)
|
||||
|
||||
**Last Updated**: 1/19/2017
|
||||
**Last Updated**: 1/31/2017
|
||||
|
||||
**Status**: Proposal
|
||||
|
||||
|
|
@ -21,8 +21,7 @@ This document proposes a design for the set of metrics included in an eventual C
|
|||
- [Metric Requirements:](#metric-requirements)
|
||||
- [Proposed Core Metrics:](#proposed-core-metrics)
|
||||
- [On-Demand Design:](#on-demand-design)
|
||||
- [Implementation Plan](#implementation-plan)
|
||||
- [Rollout Plan](#rollout-plan)
|
||||
- [Future Work](#future-work)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
|
|
@ -51,12 +50,12 @@ The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/mast
|
|||
By publishing core metrics, the kubelet is relieved of its responsibility to provide metrics for monitoring.
|
||||
The third party monitoring pipeline also is relieved of any responsibility to provide these metrics to system components.
|
||||
|
||||
cAdvisor is structured to collect metrics on an interval, which is appropriate for a stand-alone metrics collector. However, many functions in the kubelet are latency-sensitive (eviction, for example), and would benifit from a more "On-Demand" metrics collection design.
|
||||
cAdvisor is structured to collect metrics on an interval, which is appropriate for a stand-alone metrics collector. However, many functions in the kubelet are latency-sensitive (eviction, for example), and would benefit from a more "On-Demand" metrics collection design.
|
||||
|
||||
### Proposal
|
||||
This proposal is to use this set of core metrics, collected by the kubelet, and used solely by kubernetes system components to support "First-Class Resource Isolation and Utilization Features". This proposal is not designed to be an API published by the kubelet, but rather a set of metrics collected by the kubelet that will be transformed, and published in the future.
|
||||
|
||||
The target "Users" of this set of metrics are kubernetes components (though not neccessarily directly). This set of metrics itself is not designed to be user-facing, but is designed to be general enough to support user-facing components.
|
||||
The target "Users" of this set of metrics are kubernetes components (though not necessarily directly). This set of metrics itself is not designed to be user-facing, but is designed to be general enough to support user-facing components.
|
||||
|
||||
### Non Goals
|
||||
Everything covered in the [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) design doc will not be covered in this proposal. This includes the third party metrics pipeline, and the methods by which the metrics found in this proposal are provided to other kubernetes components.
|
||||
|
|
@ -105,7 +104,7 @@ Metrics requirements for "First Class Resource Isolation and Utilization Feature
|
|||
|
||||
### Proposed Core Metrics:
|
||||
This section defines "usage metrics" for filesystems, CPU, and Memory.
|
||||
As stated in Non-Goals, this proposal does not attempt to define the specific format by which these are exposed. For convenience, it may be neccessary to include static information such as start time, node capacities for CPU, Memory, or filesystems, and more.
|
||||
As stated in Non-Goals, this proposal does not attempt to define the specific format by which these are exposed. For convenience, it may be necessary to include static information such as start time, node capacities for CPU, Memory, or filesystems, and more.
|
||||
|
||||
```go
|
||||
// CpuUsage holds statistics about the amount of cpu time consumed
|
||||
|
|
@ -146,17 +145,10 @@ The interface for exposing these metrics within the kubelet contains methods for
|
|||
Implementation:
|
||||
To keep performance bounded while still offering metrics "On-Demand", all calls to get metrics are cached, and a minimum recency is established to prevent repeated metrics computation. Before computing new metrics, the previous metrics are checked to see if they meet the recency requirements of the caller. If the age of the metrics meet the recency requirements, then the cached metrics are returned. If not, then new metrics are computed and cached.
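
A minimal sketch of that pattern (types and names hypothetical; uses the standard `sync` and `time` packages): callers state how stale a result they can tolerate, and the provider only recomputes when the cached copy is older than that.

```go
type Metrics struct {
	// CpuUsage, MemoryUsage, FilesystemUsage, ... (see the core metrics above)
}

type cachedProvider struct {
	mu       sync.Mutex
	last     Metrics
	lastTime time.Time
	compute  func() Metrics // the expensive collection step
}

// Get returns cached metrics if they satisfy the caller's recency
// requirement, and recomputes them otherwise.
func (p *cachedProvider) Get(maxAge time.Duration) Metrics {
	p.mu.Lock()
	defer p.mu.Unlock()
	if time.Since(p.lastTime) <= maxAge {
		return p.last // recent enough for this caller
	}
	p.last = p.compute()
	p.lastTime = time.Now()
	return p.last
}
```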
|
||||
|
||||
## Implementation Plan
|
||||
@dashpole will modify the structure of metrics collection code to be "On-Demand".
|
||||
|
||||
## Future work
|
||||
Suggested, tentative future work, which may be covered by future proposals:
|
||||
- Publish these metrics in some form to a kubelet API endpoint
|
||||
- Obtain all runtime-specific information needed to collect metrics from the CRI.
|
||||
- Kubernetes can be configured to run a default "third party metrics provider" as a daemonset. Possibly standalone cAdvisor.
|
||||
|
||||
## Rollout Plan
|
||||
Once this set of metrics is accepted, @dashpole will begin discussions on the format, and design of the endpoint that exposes them. The node resource metrics endpoint (TBD) will be added alongside the current Summary API in an upcoming release. This should allow concurrent developments of other portions of the system metrics pipeline (metrics-server, for example). Once this addition is made, all other changes will be internal, and will not require any API changes.
|
||||
@dashpole will also start discussions on integrating with the CRI, and discussions on how to provide an out-of-the-box solution for the "third party monitoring" pipeline on the node. One current idea is a standalone verison of cAdvisor, but any third party metrics solution could serve this function as well.
|
||||
- Decide on the format, name, and kubelet endpoint for publishing these metrics.
|
||||
- Integrate with the CRI to allow compatibility with a greater number of runtimes, and to create a better runtime abstraction.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
|
|||
|
|
@ -0,0 +1,127 @@
|
|||
# CRI: Dockershim PodSandbox Checkpoint
|
||||
|
||||
## Umbrella Issue
|
||||
[#34672](https://github.com/kubernetes/kubernetes/issues/34672)
|
||||
|
||||
## Background
|
||||
[Container Runtime Interface (CRI)](../devel/container-runtime-interface.md)
|
||||
is an ongoing project to allow container runtimes to integrate with
|
||||
kubernetes via a newly-defined API.
|
||||
[Dockershim](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/dockershim)
|
||||
is the Docker CRI implementation. This proposal aims to introduce
|
||||
a checkpoint mechanism in dockershim.
|
||||
|
||||
## Motivation
|
||||
### Why do we need checkpoint?
|
||||
|
||||
|
||||
With CRI, Kubelet only passes configurations (SandboxConfig,
ContainerConfig and ImageSpec) when creating a sandbox, container, or
image, and only uses the reference IDs to manage them after creation.
However, the information in these configurations is not only needed at creation time.
|
||||
|
||||
In the case of dockershim with a CNI network plugin, the CNI plugin needs
the same information from PodSandboxConfig at both creation and deletion.
|
||||
|
||||
```
|
||||
Kubelet ---------------------------------
|
||||
| RunPodSandbox(PodSandboxConfig)
|
||||
| StopPodSandbox(PodSandboxID)
|
||||
V
|
||||
Dockershim-------------------------------
|
||||
| SetUpPod
|
||||
| TearDownPod
|
||||
V
|
||||
Network Plugin---------------------------
|
||||
| ADD
|
||||
| DEL
|
||||
V
|
||||
CNI plugin-------------------------------
|
||||
```
|
||||
|
||||
|
||||
In addition, checkpointing helps to improve the reliability of dockershim.
With checkpoints, critical information for disaster recovery can be
preserved. Kubelet makes decisions based on the pod states reported by
runtime shims. Dockershim currently gathers state from the docker
engine. However, in the case of a disaster, the docker engine may lose all
container information, including the reference IDs. Without the necessary
information, kubelet and dockershim cannot perform proper cleanup.
For example, if docker containers are removed underneath kubelet, references
to the allocated IPs and the iptables setup for the pods are also lost.
This leads to resource leaks and potential iptables rule conflicts.
|
||||
|
||||
### Why checkpoint in dockershim?
|
||||
- The CNI specification does not require CNI plugins to be stateful, and it
  does not provide an interface to retrieve state from CNI plugins.
- Currently there is no uniform checkpoint requirement across existing runtime shims.
|
||||
- Need to preserve backward compatibility for kubelet.
|
||||
- Easier to maintain backward compatibility by checkpointing at a lower level.
|
||||
|
||||
## PodSandbox Checkpoint
|
||||
A checkpoint file will be created for each PodSandbox. Files will be
placed under `/var/lib/dockershim/sandbox/`. The file name will be the
corresponding `PodSandboxID`, and the file content will be JSON encoded.
The data structure is as follows:
|
||||
|
||||
```go
|
||||
const schemaVersion = "v1"
|
||||
|
||||
type Protocol string
|
||||
|
||||
// PortMapping is the port mapping configurations of a sandbox.
|
||||
type PortMapping struct {
|
||||
// Protocol of the port mapping.
|
||||
Protocol *Protocol `json:"protocol,omitempty"`
|
||||
// Port number within the container.
|
||||
ContainerPort *int32 `json:"container_port,omitempty"`
|
||||
// Port number on the host.
|
||||
HostPort *int32 `json:"host_port,omitempty"`
|
||||
}
|
||||
|
||||
// CheckpointData contains all types of data that can be stored in the checkpoint.
|
||||
type CheckpointData struct {
|
||||
PortMappings []*PortMapping `json:"port_mappings,omitempty"`
|
||||
}
|
||||
|
||||
// PodSandboxCheckpoint is the checkpoint structure for a sandbox
|
||||
type PodSandboxCheckpoint struct {
|
||||
// Version of the pod sandbox checkpoint schema.
|
||||
Version string `json:"version"`
|
||||
// Pod name of the sandbox. Same as the pod name in the PodSpec.
|
||||
Name string `json:"name"`
|
||||
// Pod namespace of the sandbox. Same as the pod namespace in the PodSpec.
|
||||
Namespace string `json:"namespace"`
|
||||
// Data to checkpoint for pod sandbox.
|
||||
Data *CheckpointData `json:"data,omitempty"`
|
||||
}
|
||||
```
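
A sketch of how such a checkpoint might be written and read back (function names hypothetical; uses `encoding/json`, `io/ioutil`, and `path/filepath`; error handling trimmed for brevity):

```go
const sandboxCheckpointDir = "/var/lib/dockershim/sandbox"

// writeCheckpoint persists the checkpoint as JSON under the sandbox ID.
func writeCheckpoint(id string, cp *PodSandboxCheckpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(filepath.Join(sandboxCheckpointDir, id), data, 0644)
}

// readCheckpoint restores a checkpoint from disk, e.g. during disaster recovery.
func readCheckpoint(id string) (*PodSandboxCheckpoint, error) {
	data, err := ioutil.ReadFile(filepath.Join(sandboxCheckpointDir, id))
	if err != nil {
		return nil, err
	}
	cp := &PodSandboxCheckpoint{}
	return cp, json.Unmarshal(data, cp)
}
```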
|
||||
|
||||
|
||||
## Workflow Changes
|
||||
|
||||
|
||||
`RunPodSandbox` creates checkpoint:
|
||||
```
|
||||
() --> Pull Image --> Create Sandbox Container --> (Create Sandbox Checkpoint) --> Start Sandbox Container --> Set Up Network --> ()
|
||||
```
|
||||
|
||||
`RemovePodSandbox` removes checkpoint:
|
||||
```
|
||||
() --> Remove Sandbox --> (Remove Sandbox Checkpoint) --> ()
|
||||
```
|
||||
|
||||
`ListPodSandbox` needs to include all PodSandboxes as long as their
checkpoint files exist. If a sandbox checkpoint exists but the sandbox
container cannot be found, the PodSandbox object will include the
PodSandboxID, namespace and name. The PodSandbox state will be `PodSandboxState_SANDBOX_NOTREADY`.
|
||||
|
||||
`StopPodSandbox` and `RemovePodSandbox` need to conduct proper error handling to ensure idempotency.
|
||||
|
||||
|
||||
|
||||
## Future extensions
|
||||
This proposal is mainly driven by networking use cases. More data could be added to the checkpoint as needed.
|
||||
|
||||
|
||||
|
||||
|
|
@ -1,8 +1,8 @@
|
|||
# ScheduledJob Controller
|
||||
# CronJob Controller (previously ScheduledJob)
|
||||
|
||||
## Abstract
|
||||
|
||||
A proposal for implementing a new controller - ScheduledJob controller - which
|
||||
A proposal for implementing a new controller - CronJob controller - which
|
||||
will be responsible for managing time based jobs, namely:
|
||||
* once at a specified point in time,
|
||||
* repeatedly at a specified point in time.
|
||||
|
|
@ -23,20 +23,20 @@ There are also similar solutions available, already:
|
|||
|
||||
## Motivation
|
||||
|
||||
ScheduledJobs are needed for performing all time-related actions, namely backups,
|
||||
CronJobs are needed for performing all time-related actions, namely backups,
|
||||
report generation and the like. Each of these tasks should be allowed to run
|
||||
repeatedly (once a day/month, etc.) or once at a given point in time.
|
||||
|
||||
|
||||
## Design Overview
|
||||
|
||||
Users create a ScheduledJob object. One ScheduledJob object
|
||||
Users create a CronJob object. One CronJob object
|
||||
is like one line of a crontab file. It has a schedule of when to run,
|
||||
in [Cron](https://en.wikipedia.org/wiki/Cron) format.
|
||||
|
||||
|
||||
The ScheduledJob controller creates a Job object [Job](job.md)
|
||||
about once per execution time of the scheduled (e.g. once per
|
||||
The CronJob controller creates a [Job](job.md) object
|
||||
about once per execution time of the schedule (e.g. once per
|
||||
day for a daily schedule.) We say "about" because there are certain
|
||||
circumstances where two jobs might be created, or no job might be
|
||||
created. We attempt to make these rare, but do not completely prevent
|
||||
|
|
@ -44,45 +44,45 @@ them. Therefore, Jobs should be idempotent.
|
|||
|
||||
The Job object is responsible for any retrying of Pods, and any parallelism
|
||||
among pods it creates, and determining the success or failure of the set of
|
||||
pods. The ScheduledJob does not examine pods at all.
|
||||
pods. The CronJob does not examine pods at all.
|
||||
|
||||
|
||||
### ScheduledJob resource
|
||||
### CronJob resource
|
||||
|
||||
The new `ScheduledJob` object will have the following contents:
|
||||
The new `CronJob` object will have the following contents:
|
||||
|
||||
```go
|
||||
// ScheduledJob represents the configuration of a single scheduled job.
|
||||
type ScheduledJob struct {
|
||||
// CronJob represents the configuration of a single cron job.
|
||||
type CronJob struct {
|
||||
TypeMeta
|
||||
ObjectMeta
|
||||
|
||||
// Spec is a structure defining the expected behavior of a job, including the schedule.
|
||||
Spec ScheduledJobSpec
|
||||
Spec CronJobSpec
|
||||
|
||||
// Status is a structure describing current status of a job.
|
||||
Status ScheduledJobStatus
|
||||
Status CronJobStatus
|
||||
}
|
||||
|
||||
// ScheduledJobList is a collection of scheduled jobs.
|
||||
type ScheduledJobList struct {
|
||||
// CronJobList is a collection of cron jobs.
|
||||
type CronJobList struct {
|
||||
TypeMeta
|
||||
ListMeta
|
||||
|
||||
Items []ScheduledJob
|
||||
Items []CronJob
|
||||
}
|
||||
```
|
||||
|
||||
The `ScheduledJobSpec` structure is defined to contain all the information how the actual
|
||||
The `CronJobSpec` structure is defined to contain all the information about how the actual
|
||||
job execution will look like, including the `JobSpec` from [Job API](job.md)
|
||||
and the schedule in [Cron](https://en.wikipedia.org/wiki/Cron) format. This implies
|
||||
that each ScheduledJob execution will be created from the JobSpec actual at a point
|
||||
that each CronJob execution will be created from the JobSpec as it exists at the point
|
||||
in time when the execution will be started. This also implies that any changes
|
||||
to ScheduledJobSpec will be applied upon subsequent execution of a job.
|
||||
to CronJobSpec will be applied upon subsequent execution of a job.
|
||||
|
||||
```go
|
||||
// ScheduledJobSpec describes how the job execution will look like and when it will actually run.
|
||||
type ScheduledJobSpec struct {
|
||||
// CronJobSpec describes how the job execution will look like and when it will actually run.
|
||||
type CronJobSpec struct {
|
||||
|
||||
// Schedule contains the schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
|
||||
Schedule string
|
||||
|
|
@ -99,12 +99,12 @@ type ScheduledJobSpec struct {
|
|||
Suspend bool
|
||||
|
||||
// JobTemplate is the object that describes the job that will be created when
|
||||
// executing a ScheduledJob.
|
||||
// executing a CronJob.
|
||||
JobTemplate *JobTemplateSpec
|
||||
}
|
||||
|
||||
// JobTemplateSpec describes the Job that will be created when executing
|
||||
// a ScheduledJob, including its standard metadata.
|
||||
// a CronJob, including its standard metadata.
|
||||
type JobTemplateSpec struct {
|
||||
ObjectMeta
|
||||
|
||||
|
|
@ -119,7 +119,7 @@ type JobTemplateSpec struct {
|
|||
type ConcurrencyPolicy string
|
||||
|
||||
const (
|
||||
// AllowConcurrent allows ScheduledJobs to run concurrently.
|
||||
// AllowConcurrent allows CronJobs to run concurrently.
|
||||
AllowConcurrent ConcurrencyPolicy = "Allow"
|
||||
|
||||
// ForbidConcurrent forbids concurrent runs, skipping next run if previous
|
||||
|
|
@ -131,13 +131,13 @@ const (
|
|||
)
|
||||
```
|
||||
|
||||
`ScheduledJobStatus` structure is defined to contain information about scheduled
|
||||
`CronJobStatus` structure is defined to contain information about cron
|
||||
job executions. The structure holds a list of currently running job instances
|
||||
and additional information about overall successful and unsuccessful job executions.
|
||||
|
||||
```go
|
||||
// ScheduledJobStatus represents the current state of a Job.
|
||||
type ScheduledJobStatus struct {
|
||||
// CronJobStatus represents the current state of a Job.
|
||||
type CronJobStatus struct {
|
||||
// Active holds pointers to currently running jobs.
|
||||
Active []ObjectReference
|
||||
|
||||
|
|
@ -159,7 +159,7 @@ Users must use a generated selector for the job.
|
|||
TODO for beta: forbid manual selectors since they could cause confusion between
|
||||
subsequent jobs.
|
||||
|
||||
### Running ScheduledJobs using kubectl
|
||||
### Running CronJobs using kubectl
|
||||
|
||||
A user should be able to easily start a Scheduled Job using `kubectl` (similarly
|
||||
to running regular jobs). For example to run a job with a specified schedule,
|
||||
|
|
@ -178,21 +178,21 @@ In the above example:
|
|||
|
||||
## Fields Added to Job Template
|
||||
|
||||
When the controller creates a Job from the JobTemplateSpec in the ScheduledJob, it
|
||||
When the controller creates a Job from the JobTemplateSpec in the CronJob, it
|
||||
adds the following fields to the Job:
|
||||
|
||||
- a name, based on the ScheduledJob's name, but with a suffix to distinguish
|
||||
- a name, based on the CronJob's name, but with a suffix to distinguish
|
||||
multiple executions, which may overlap.
|
||||
- the standard created-by annotation on the Job, pointing to the SJ that created it
|
||||
The standard key is `kubernetes.io/created-by`. The value is a serialized JSON object, like
|
||||
`{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ScheduledJob","namespace":"default",`
|
||||
`{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"CronJob","namespace":"default",`
|
||||
`"name":"nightly-earnings-report","uid":"5ef034e0-1890-11e6-8935-42010af0003e","apiVersion":...`
|
||||
This serialization contains the UID of the parent. This is used to match the Job to the SJ that created
|
||||
it.
|
||||
|
||||
## Updates to ScheduledJobs
|
||||
## Updates to CronJobs
|
||||
|
||||
If the schedule is updated on a ScheduledJob, it will:
|
||||
If the schedule is updated on a CronJob, it will:
|
||||
- continue to use the Status.Active list of jobs to detect conflicts.
|
||||
- try to fulfill all recently-passed times for the new schedule, by starting
|
||||
new jobs. But it will not try to fulfill times prior to the
|
||||
|
|
@ -202,16 +202,16 @@ If the schedule is updated on a ScheduledJob, it will:
|
|||
- Example: If you have a schedule to run every hour, change that to 30-minutely, at 31 minutes past the hour,
|
||||
one run will be started immediately for the starting time that has just passed.
|
||||
|
||||
If the job template of a ScheduledJob is updated, then future executions use the new template
|
||||
If the job template of a CronJob is updated, then future executions use the new template
|
||||
but old ones still satisfy the schedule and are not re-run just because the template changed.
|
||||
|
||||
If you delete and replace a ScheduledJob with one of the same name, it will:
|
||||
If you delete and replace a CronJob with one of the same name, it will:
|
||||
- not use any old Status.Active, and not consider any existing running or terminated jobs from the previous
|
||||
ScheduledJob (with a different UID) at all when determining coflicts, what needs to be started, etc.
|
||||
CronJob (with a different UID) at all when determining conflicts, what needs to be started, etc.
|
||||
- If there is an existing Job with the same time-based hash in its name (see below), then
|
||||
new instances of that job will not be able to be created. So, delete it if you want to re-run.
|
||||
with the same name as conflicts.
|
||||
- not "re-run" jobs for "start times" before the creation time of the new ScheduledJobJob object.
|
||||
- not "re-run" jobs for "start times" before the creation time of the new CronJobJob object.
|
||||
- not consider executions from the previous UID when making decisions about what executions to
|
||||
start, or status, etc.
|
||||
- lose the history of the old SJ.
|
||||
|
|
@ -223,11 +223,11 @@ To preserve status, you can suspend the old one, and make one with a new name, o
|
|||
|
||||
### Starting Jobs in the face of controller failures
|
||||
|
||||
If the process with the scheduledJob controller in it fails,
|
||||
and takes a while to restart, the scheduledJob controller
|
||||
If the process with the cronJob controller in it fails,
|
||||
and takes a while to restart, the cronJob controller
|
||||
may miss the time window and it is too late to start a job.
|
||||
|
||||
With a single scheduledJob controller process, we cannot give
|
||||
With a single cronJob controller process, we cannot give
|
||||
very strong assurances about not missing starting jobs.
|
||||
|
||||
With a suggested HA configuration, there are multiple controller
|
||||
|
|
@ -254,10 +254,10 @@ There are three problems here:
|
|||
|
||||
Multiple jobs might be created in the following sequence:
|
||||
|
||||
1. scheduled job controller sends request to start Job J1 to fulfill start time T.
|
||||
1. cron job controller sends request to start Job J1 to fulfill start time T.
|
||||
1. the create request is accepted by the apiserver and enqueued but not yet written to etcd.
|
||||
1. scheduled job controller crashes
|
||||
1. new scheduled job controller starts, and lists the existing jobs, and does not see one created.
|
||||
1. cron job controller crashes
|
||||
1. new cron job controller starts, and lists the existing jobs, and does not see one created.
|
||||
1. it creates a new one.
|
||||
1. the first one eventually gets written to etcd.
|
||||
1. there are now two jobs for the same start time.
|
||||
|
|
@ -286,24 +286,24 @@ This is too hard to do for the alpha version. We will await user
|
|||
feedback to see if the "at most once" property is needed in the beta version.
|
||||
|
||||
This is awkward but possible for a containerized application to ensure on its own, as it needs
|
||||
to know what ScheduledJob name and Start Time it is from, and then record the attempt
|
||||
to know what CronJob name and Start Time it is from, and then record the attempt
|
||||
in a shared storage system. We should ensure it can extract this data from its annotations
|
||||
using the downward API.
|
||||
|
||||
## Name of Jobs
|
||||
|
||||
A ScheduledJob creates one Job at each time when a Job should run.
|
||||
A CronJob creates one Job at each time when a Job should run.
|
||||
Since there may be concurrent jobs, and since we might want to keep failed
|
||||
non-overlapping Jobs around as a debugging record, each Job created by the same ScheduledJob
|
||||
non-overlapping Jobs around as a debugging record, each Job created by the same CronJob
|
||||
needs a distinct name.
|
||||
|
||||
To make the Jobs from the same ScheduledJob distinct, we could use a random string,
|
||||
in the way that pods have a `generateName`. For example, a scheduledJob named `nightly-earnings-report`
|
||||
To make the Jobs from the same CronJob distinct, we could use a random string,
|
||||
in the way that pods have a `generateName`. For example, a cronJob named `nightly-earnings-report`
|
||||
in namespace `ns1` might create a job `nightly-earnings-report-3m4d3`, and later create
|
||||
a job called `nightly-earnings-report-6k7ts`. This is consistent with pods, but
|
||||
does not give the user much information.
|
||||
|
||||
Alternatively, we can use time as a uniquifier. For example, the same scheduledJob could
|
||||
Alternatively, we can use time as a uniquifier. For example, the same cronJob could
|
||||
create a job called `nightly-earnings-report-2016-May-19`.
|
||||
However, for Jobs that run more than once per day, we would need to represent
|
||||
time as well as date. Standard date formats (e.g. RFC 3339) use colons for time.
|
||||
|
|
@ -312,7 +312,7 @@ will annoy some users.
|
|||
|
||||
Also, date strings are much longer than random suffixes, which means that
|
||||
the pods will also have long names, and that we are more likely to exceed the
|
||||
253 character name limit when combining the scheduled-job name,
|
||||
253 character name limit when combining the cron-job name,
|
||||
the time suffix, and pod random suffix.
|
||||
|
||||
One option would be to compute a hash of the nominal start time of the job,
|
||||
|
|
@ -331,5 +331,5 @@ Below are the possible future extensions to the Job controller:
|
|||
types of resources. This relates to the work happening in [#18215](https://issues.k8s.io/18215).
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
@ -0,0 +1,329 @@
|
|||
Custom Metrics API
|
||||
==================
|
||||
|
||||
The new [metrics monitoring vision](monitoring_architecture.md) proposes
|
||||
an API that the Horizontal Pod Autoscaler can use to access arbitrary
|
||||
metrics.
|
||||
|
||||
Similarly to the [master metrics API](resource-metrics-api.md), the new
|
||||
API should be structured around accessing metrics by referring to
|
||||
Kubernetes objects (or groups thereof) and a metric name. For this
|
||||
reason, the API could be useful for other consumers (most likely
|
||||
controllers) that want to consume custom metrics (similarly to how the
|
||||
master metrics API is generally useful to multiple cluster components).
|
||||
|
||||
The HPA can refer to metrics describing all pods matching a label
|
||||
selector, as well as an arbitrary named object.
|
||||
|
||||
API Paths
|
||||
---------
|
||||
|
||||
The root API path will look like `/apis/custom-metrics/v1alpha1`. For
|
||||
brevity, this will be left off below.
|
||||
|
||||
- `/{object-type}/{object-name}/{metric-name...}`: retrieve the given
|
||||
metric for the given non-namespaced object (e.g. Node, PersistentVolume)
|
||||
|
||||
- `/{object-type}/*/{metric-name...}`: retrieve the given metric for all
|
||||
non-namespaced objects of the given type
|
||||
|
||||
- `/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the
|
||||
given metric for all non-namespaced objects of the given type matching
|
||||
the given label selector
|
||||
|
||||
- `/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}`:
|
||||
retrieve the given metric for the given namespaced object
|
||||
|
||||
- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}`: retrieve the given metric for all
|
||||
namespaced objects of the given type
|
||||
|
||||
- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the given
|
||||
metric for all namespaced objects of the given type matching the
|
||||
given label selector
|
||||
|
||||
- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given
|
||||
metric which describes the given namespace.
|
||||
|
||||
For example, to retrieve the custom metric "hits-per-second" for all
|
||||
ingress objects matching "app=frontend" in the namespace "webapp", the
|
||||
request might look like:
|
||||
|
||||
```
|
||||
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/ingress.extensions/*/hits-per-second?labelSelector=app%3Dfrontend
|
||||
|
||||
---
|
||||
|
||||
Verb: GET
|
||||
Namespace: webapp
|
||||
APIGroup: custom-metrics
|
||||
APIVersion: v1alpha1
|
||||
Resource: ingress.extensions
|
||||
Subresource: hits-per-second
|
||||
Name: ResourceAll(*)
|
||||
```
|
||||
|
||||
Notice that getting metrics which describe a namespace follows a slightly
|
||||
different pattern from other resources; since namespaces cannot feasibly
|
||||
have unbounded subresource names (due to collision with resource names,
|
||||
etc), we introduce a pseudo-resource named "metrics", which represents
|
||||
metrics describing namespaces, where the resource name is the metric name:
|
||||
|
||||
```
|
||||
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/metrics/queue-length
|
||||
|
||||
---
|
||||
|
||||
Verb: GET
|
||||
Namespace: webapp
|
||||
APIGroup: custom-metrics
|
||||
APIVersion: v1alpha1
|
||||
Resource: metrics
|
||||
Name: queue-length
|
||||
```
|
||||
|
||||
NB: the branch-node LIST operations (e.g. `LIST
|
||||
/apis/custom-metrics/v1alpha1/namespaces/webapp/pods/`) are unsupported in
|
||||
v1alpha1. They may be defined in a later version of the API.
|
||||
|
||||
API Path Design, Discovery, and Authorization
|
||||
---------------------------------------------
|
||||
|
||||
The API paths in this proposal are designed to a) resemble normal
|
||||
Kubernetes APIs, b) facilitate writing authorization rules, and c)
|
||||
allow for discovery.
|
||||
|
||||
Since the API structure follows the same structure as other Kubernetes
|
||||
APIs, it allows for fine grained control over access to metrics. Access
|
||||
can be controlled on a per-metric basis (each metric is a subresource, so
|
||||
metrics may be whitelisted by allowing access to a particular
|
||||
resource-subresource pair), or granted in general for a namespace (by
|
||||
allowing access to any resource in the `custom-metrics` API group).
|
||||
|
||||
Similarly, since metrics are simply subresources, a normal Kubernetes API
|
||||
discovery document can be published by the adapter's API server, allowing
|
||||
clients to discover the available metrics.
|
||||
|
||||
Note that we introduce the syntax of having a name of ` * ` here since
|
||||
there is no current syntax for getting the output of a subresource on
|
||||
multiple objects.
|
||||
|
||||
API Objects
|
||||
-----------
|
||||
|
||||
The request URLs listed above will return the `MetricValueList` type described
|
||||
below (when a name is given that is not ` * `, the API should simply return a
|
||||
list with a single element):
|
||||
|
||||
```go
|
||||
|
||||
// a list of values for a given metric for some set of objects
|
||||
type MetricValueList struct {
|
||||
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
|
||||
|
||||
// the value of the metric across the described objects
|
||||
Items []MetricValue `json:"items"`
|
||||
}
|
||||
|
||||
// a metric value for some object
|
||||
type MetricValue struct {
|
||||
metav1.TypeMeta `json:",inline"`
|
||||
|
||||
// a reference to the described object
|
||||
DescribedObject ObjectReference `json:"describedObject"`
|
||||
|
||||
// the name of the metric
|
||||
MetricName string `json:"metricName"`
|
||||
|
||||
// indicates the time at which the metrics were produced
|
||||
Timestamp unversioned.Time `json:"timestamp"`
|
||||
|
||||
// indicates the window ([Timestamp-Window, Timestamp]) from
|
||||
// which these metrics were calculated, when returning rate
|
||||
// metrics calculated from cumulative metrics (or zero for
|
||||
// non-calculated instantaneous metrics).
|
||||
WindowSeconds *int64 `json:"window,omitempty"`
|
||||
|
||||
// the value of the metric for this
|
||||
Value resource.Quantity
|
||||
}
|
||||
```
|
||||
|
||||
For instance, the example request above would yield the following object:
|
||||
|
||||
```json
|
||||
{
|
||||
"kind": "MetricValueList",
|
||||
"apiVersion": "custom-metrics/v1alpha1",
|
||||
"items": [
|
||||
{
|
||||
"metricName": "hits-per-second",
|
||||
"describedObject": {
|
||||
"kind": "Ingress",
|
||||
"apiVersion": "extensions",
|
||||
"name": "server1",
|
||||
"namespace": "webapp"
|
||||
},
|
||||
"timestamp": SOME_TIMESTAMP_HERE,
|
||||
"windowSeconds": "10",
|
||||
"value": "10"
|
||||
},
|
||||
{
|
||||
"metricName": "hits-per-second",
|
||||
"describedObject": {
|
||||
"kind": "Ingress",
|
||||
"apiVersion": "extensions",
|
||||
"name": "server2",
|
||||
"namespace": "webapp"
|
||||
},
|
||||
"timestamp": ANOTHER_TIMESTAMP_HERE,
|
||||
"windowSeconds": "10",
|
||||
"value": "15"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Semantics
|
||||
---------
|
||||
|
||||
### Object Types ###
|
||||
|
||||
In order to properly identify resources, we must use resource names
|
||||
qualified with group names (since the group for the requests will always
|
||||
be `custom-metrics`).
|
||||
|
||||
The `object-type` parameter should be the string form of
|
||||
`unversioned.GroupResource`. Note that we do not include version in this;
|
||||
we simply wish to uniquely identify all the different types of objects in
|
||||
Kubernetes. For example, the pods resource (which exists in the un-named
|
||||
legacy API group) would be represented simply as `pods`, while the jobs
|
||||
resource (which exists in the `batch` API group) would be represented as
|
||||
`jobs.batch`.
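As a rough illustration of that naming convention (a stand-in helper for this sketch,
not necessarily the exact library call an adapter would use):

```go
package objecttype

// groupResourceString renders the object-type segment used in request paths:
// the bare resource name for the legacy (un-named) API group, and
// "<resource>.<group>" for everything else.
func groupResourceString(group, resource string) string {
	if group == "" {
		return resource // e.g. "pods"
	}
	return resource + "." + group // e.g. "jobs.batch"
}
```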
|
||||
|
||||
In the case of cross-group object renames, the adapter should maintain
|
||||
a list of "equivalent versions" that the monitoring system uses. This is
|
||||
monitoring-system dependent (for instance, the monitoring system might
|
||||
record all HorizontalPodAutoscalers as in `autoscaling`, but should be
|
||||
aware that HorizontalPodAutoscalers also exist in `extensions`).
|
||||
|
||||
Note that for namespace metrics, we use a pseudo-resource called
|
||||
`metrics`. Since there is no such resource in the legacy API group, this will
|
||||
not clash with any existing resources.
|
||||
|
||||
### Metric Names ###
|
||||
|
||||
Metric names must be able to appear as a single subresource. In particular,
|
||||
metric names, *as passed to the API*, may not contain the characters '%', '/',
|
||||
or '?', and may not be named '.' or '..' (but may contain these sequences).
|
||||
Note, specifically, that URL encoding is not acceptable to escape the forbidden
|
||||
characters, due to issues in the Go URL handling libraries. Otherwise, metric
|
||||
names are open-ended.
|
||||
|
||||
### Metric Values and Timing ###
|
||||
|
||||
There should be only one metric value per object requested. The returned
|
||||
metrics should be the most recently available metrics, as with the resource
|
||||
metrics API. Implementers *should* attempt to return all metrics with roughly
|
||||
identical timestamps and windows (when appropriate), but consumers should also
|
||||
verify that any differences in timestamps are within tolerances for
|
||||
a particular application (e.g. a dashboard might simply display the older
|
||||
metric with a note, while the horizontal pod autoscaler controller might choose
|
||||
to pretend it did not receive that metric value).
|
||||
|
||||
### Labeled Metrics (or lack thereof) ###
|
||||
|
||||
For metrics systems that support differentiating metrics beyond the
|
||||
Kubernetes object hierarchy (such as using additional labels), the metrics
|
||||
systems should have a metric which represents all such series aggregated
|
||||
together. Additionally, implementors may choose to identify the individual
|
||||
"sub-metrics" via the metric name, but this is expected to be fairly rare,
|
||||
since it most likely requires specific knowledge of individual metrics.
|
||||
For instance, suppose we record filesystem usage by filesystem inside the
|
||||
container. There should then be a metric `filesystem/usage`, and the
|
||||
implementors of the API may choose to expose more detailed metrics like
|
||||
`filesystem/usage/my-first-filesystem`.
|
||||
|
||||
### Resource Versions ###
|
||||
|
||||
API implementors should set the `resourceVersion` field based on the
|
||||
scrape time of the metric. The resource version is expected to increment
|
||||
when the scrape/collection time of the returned metric changes. While the
|
||||
API does not support writes, and does not currently support watches,
|
||||
populating resource version preserves the normal expected Kubernetes API
|
||||
semantics.
|
||||
|
||||
Relationship to HPA v2
|
||||
----------------------
|
||||
|
||||
The URL paths in this API are designed to correspond to different source
|
||||
types in the [HPA v2](hpa-v2.md). Specifically, the `pods` source type
|
||||
corresponds to a URL of the form
|
||||
`/namespaces/$NS/pods/*/$METRIC_NAME?labelSelector=foo`, while the
|
||||
`object` source type corresponds to a URL of the form
|
||||
`/namespaces/$NS/$RESOURCE.$GROUP/$OBJECT_NAME/$METRIC_NAME`.
|
||||
|
||||
The HPA then takes the results, aggregates them together (in the case of
|
||||
the former source type), and uses the resulting value to produce a usage
|
||||
ratio.
|
||||
|
||||
The resource source type is taken from the API provided by the
|
||||
"metrics" API group (the master/resource metrics API).
|
||||
|
||||
The HPA will consume the API as a federated API server.
|
||||
|
||||
Relationship to Resource Metrics API
|
||||
------------------------------------
|
||||
|
||||
The metrics presented by this API may be a superset of those present in the
|
||||
resource metrics API, but this is not guaranteed. Clients that need the
|
||||
information in the resource metrics API should use that to retrieve those
|
||||
metrics, and supplement those metrics with this API.
|
||||
|
||||
Mechanical Concerns
|
||||
-------------------
|
||||
|
||||
This API is intended to be implemented by monitoring pipelines (e.g.
|
||||
inside Heapster, or as an adapter on top of a solution like Prometheus).
|
||||
It shares many mechanical requirements with normal Kubernetes APIs, such
|
||||
as the need to support encoding different versions of objects in both JSON
|
||||
and protobuf, as well as acting as a discoverable API server. For these
|
||||
reasons, it is expected that implementors will make use of the Kubernetes
|
||||
genericapiserver code. If implementors choose not to use this, they must
|
||||
still follow all of the Kubernetes API server conventions in order to work
|
||||
properly with consumers of the API.
|
||||
|
||||
Specifically, they must support the semantics of the GET verb in
|
||||
Kubernetes, including outputting in different API versions and formats as
|
||||
requested by the client. They must support integrating with API discovery
|
||||
(including publishing a discovery document, etc).
|
||||
|
||||
Location
|
||||
--------
|
||||
|
||||
The types and clients for this API will live in a separate repository
|
||||
under the Kubernetes organization (e.g. `kubernetes/metrics`). This
|
||||
repository will most likely also house other metrics-related APIs for
|
||||
Kubernetes (e.g. historical metrics API definitions, the resource metrics
|
||||
API definitions, etc).
|
||||
|
||||
Note that there will not be a canonical implementation of the custom
|
||||
metrics API under Kubernetes, just the types and clients. Implementations
|
||||
will be left up to the monitoring pipelines.
|
||||
|
||||
Alternative Considerations
|
||||
--------------------------
|
||||
|
||||
### Quantity vs Float ###
|
||||
|
||||
In the past, custom metrics were represented as floats. In general,
|
||||
however, Kubernetes APIs are not supposed to use floats. The API proposed
|
||||
above thus uses `resource.Quantity`. This adds a bit of encoding
|
||||
overhead, but makes the API line up nicely with other Kubernetes APIs.
|
||||
|
||||
### Labeled Metrics ###
|
||||
|
||||
Many metric systems support labeled metrics, allowing for dimensionality
|
||||
beyond the Kubernetes object hierarchy. Since the HPA currently doesn't
|
||||
support specifying metric labels, this is not supported via this API. We
|
||||
may wish to explore this in the future.
|
||||
|
|
@ -60,6 +60,7 @@ changes:
|
|||
type DaemonSetUpdateStrategy struct {
|
||||
// Type of daemon set update. Can be "RollingUpdate" or "OnDelete".
|
||||
// Default is OnDelete.
|
||||
// +optional
|
||||
Type DaemonSetUpdateStrategyType
|
||||
|
||||
// Rolling update config params. Present only if DaemonSetUpdateStrategy =
|
||||
|
|
@ -68,6 +69,7 @@ type DaemonSetUpdateStrategy struct {
|
|||
// TODO: Update this to follow our convention for oneOf, whatever we decide it
|
||||
// to be. Same as DeploymentStrategy.RollingUpdate.
|
||||
// See https://github.com/kubernetes/kubernetes/issues/35345
|
||||
// +optional
|
||||
RollingUpdate *RollingUpdateDaemonSet
|
||||
}
|
||||
|
||||
|
|
@ -96,51 +98,62 @@ type RollingUpdateDaemonSet struct {
|
|||
// it then proceeds onto other DaemonSet pods, thus ensuring that at least
|
||||
// 70% of original number of DaemonSet pods are available at all times
|
||||
// during the update.
|
||||
// +optional
|
||||
MaxUnavailable intstr.IntOrString
|
||||
}
|
||||
|
||||
// DaemonSetSpec is the specification of a daemon set.
|
||||
type DaemonSetSpec struct {
|
||||
// Note: Existing fields, including Selector and Template are ommitted in
|
||||
// Note: Existing fields, including Selector and Template are omitted in
|
||||
// this proposal.
|
||||
|
||||
// Update strategy to replace existing DaemonSet pods with new pods.
|
||||
// +optional
|
||||
UpdateStrategy DaemonSetUpdateStrategy `json:"updateStrategy,omitempty"`
|
||||
|
||||
// Minimum number of seconds for which a newly created DaemonSet pod should
|
||||
// be ready without any of its container crashing, for it to be considered
|
||||
// available. Defaults to 0 (pod will be considered available as soon as it
|
||||
// is ready).
|
||||
// +optional
|
||||
MinReadySeconds int32 `json:"minReadySeconds,omitempty"`
|
||||
}
|
||||
|
||||
const (
|
||||
// DefaultDaemonSetUniqueLabelKey is the default key of the labels that is added
|
||||
// to daemon set pods to distinguish between old and new pod templates during
|
||||
// DaemonSet update.
|
||||
DefaultDaemonSetUniqueLabelKey string = "pod-template-hash"
|
||||
)
|
||||
// A sequence number representing a specific generation of the template.
|
||||
// Populated by the system. Can be set at creation time. Read-only otherwise.
|
||||
// +optional
|
||||
TemplateGeneration int64 `json:"templateGeneration,omitempty"`
|
||||
}
|
||||
|
||||
// DaemonSetStatus represents the current status of a daemon set.
|
||||
type DaemonSetStatus struct {
|
||||
// Note: Existing fields, including CurrentNumberScheduled, NumberMisscheduled,
|
||||
// DesiredNumberScheduled, NumberReady, and ObservedGeneration are ommitted in
|
||||
// DesiredNumberScheduled, NumberReady, and ObservedGeneration are omitted in
|
||||
// this proposal.
|
||||
|
||||
// UpdatedNumberScheduled is the total number of nodes that are running updated
|
||||
// daemon pod
|
||||
// +optional
|
||||
UpdatedNumberScheduled int32 `json:"updatedNumberScheduled"`
|
||||
|
||||
// NumberAvailable is the number of nodes that should be running the
|
||||
// daemon pod and have one or more of the daemon pod running and
|
||||
// available (ready for at least minReadySeconds)
|
||||
// +optional
|
||||
NumberAvailable int32 `json:"numberAvailable"`
|
||||
|
||||
// NumberUnavailable is the number of nodes that should be running the
|
||||
// daemon pod and have none of the daemon pods running and available
|
||||
// (ready for at least minReadySeconds)
|
||||
// +optional
|
||||
NumberUnavailable int32 `json:"numberUnavailable"`
|
||||
}
|
||||
|
||||
const (
|
||||
// DaemonSetTemplateGenerationKey is the key of the labels that is added
|
||||
// to daemon set pods to distinguish between old and new pod templates
|
||||
// during DaemonSet template update.
|
||||
DaemonSetTemplateGenerationKey string = "pod-template-generation"
|
||||
)
|
||||
```
|
||||
|
||||
### Controller
|
||||
|
|
@ -158,9 +171,13 @@ For each pending DaemonSet updates, it will:
|
|||
1. Check `DaemonSetUpdateStrategy`:
|
||||
- If `OnDelete`: do nothing
|
||||
- If `RollingUpdate`:
|
||||
- Compare spec of the daemon pods from step 1 with DaemonSet
|
||||
`.spec.template.spec` to see if DaemonSet spec has changed.
|
||||
- If DaemonSet spec has changed, compare `MaxUnavailable` with DaemonSet
|
||||
- Find all daemon pods that belong to this DaemonSet using label selectors.
|
||||
- Pods with "pod-template-generation" value equal to the DaemonSet's
|
||||
`.spec.templateGeneration` are new pods, otherwise they're old pods.
|
||||
- Note that pods without "pod-template-generation" labels (e.g. DaemonSet
|
||||
pods created before RollingUpdate strategy is implemented) will be
|
||||
seen as old pods.
|
||||
- If there are old pods found, compare `MaxUnavailable` with DaemonSet
|
||||
`.status.numberUnavailable` to see how many old daemon pods can be
|
||||
killed. Then, kill those pods in the order that unhealthy pods (failed,
|
||||
pending, not ready) are killed first.
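A rough sketch of the old/new pod classification described above (illustrative only;
the stand-in types below are not the controller's actual types):

```go
package daemonset

import "fmt"

// Minimal stand-ins for the fields this sketch needs; the real controller
// operates on the DaemonSet and Pod API types.
type daemonSet struct{ templateGeneration int64 }
type pod struct{ labels map[string]string }

// splitOldNewPods separates daemon pods into "new" (the pod's
// "pod-template-generation" label matches the DaemonSet's
// .spec.templateGeneration) and "old" (everything else, including pods
// created before the label existed).
func splitOldNewPods(ds daemonSet, pods []pod) (newPods, oldPods []pod) {
	want := fmt.Sprint(ds.templateGeneration)
	for _, p := range pods {
		if p.labels["pod-template-generation"] == want {
			newPods = append(newPods, p)
		} else {
			oldPods = append(oldPods, p)
		}
	}
	return newPods, oldPods
}
```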
|
||||
|
|
@ -175,6 +192,12 @@ For each pending DaemonSet updates, it will:
|
|||
|
||||
If DaemonSet Controller crashes during an update, it can still recover.
|
||||
|
||||
#### API Server
|
||||
|
||||
In DaemonSet strategy (pkg/registry/extensions/daemonset/strategy.go#PrepareForUpdate),
|
||||
increase DaemonSet's `.spec.templateGeneration` by 1 if any change is made to
|
||||
DaemonSet's `.spec.template`.
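A minimal sketch of that behavior, using stand-in types rather than the real registry
strategy code:

```go
package daemonset

import "reflect"

// Stand-ins for the relevant DaemonSet spec fields.
type podTemplateSpec struct{ Image string /* containers, labels, ... */ }
type daemonSetSpec struct {
	Template           podTemplateSpec
	TemplateGeneration int64
}

// prepareForUpdate sketches the proposed API-server behavior: bump
// .spec.templateGeneration by 1 whenever .spec.template changes.
func prepareForUpdate(newSpec, oldSpec *daemonSetSpec) {
	newSpec.TemplateGeneration = oldSpec.TemplateGeneration
	if !reflect.DeepEqual(newSpec.Template, oldSpec.Template) {
		newSpec.TemplateGeneration = oldSpec.TemplateGeneration + 1
	}
}
```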
|
||||
|
||||
### kubectl
|
||||
|
||||
#### kubectl rollout
|
||||
|
|
@ -270,21 +293,24 @@ Another way to implement DaemonSet history is through creating `PodTemplates` as
|
|||
snapshots of DaemonSet templates, and then create them in DaemonSet controller:
|
||||
|
||||
- Find existing PodTemplates whose labels are matched by DaemonSet
|
||||
`.spec.selector`
|
||||
`.spec.selector`.
|
||||
- Sort those PodTemplates by creation timestamp and only retain at most
|
||||
`.spec.revisionHistoryLimit` latest PodTemplates (remove the rest)
|
||||
- Find the PodTemplate whose `.template` is the same as DaemonSet
|
||||
`.spec.template`. If not found, create a new PodTemplate from DaemonSet
|
||||
`.spec.template`:
|
||||
- The name will be `<DaemonSet-Name>-<Hash-of-pod-template>`
|
||||
- PodTemplate `.metadata.labels` will have a "pod-template-hash" label,
|
||||
value be the hash of PodTemplate `.template` (note: don't include the
|
||||
"pod-template-hash" label when calculating hash)
|
||||
- PodTemplate `.metadata.annotations` will be copied from DaemonSet
|
||||
`.metadata.annotations`
|
||||
|
||||
Note that when the DaemonSet controller creates pods, those pods will be created
|
||||
with the "pod-template-hash" label.
|
||||
`.spec.revisionHistoryLimit` latest PodTemplates (remove the rest).
|
||||
- Find the PodTemplate whose `.template` is the same as the DaemonSet's
|
||||
`.spec.template`.
|
||||
- If not found, create a new PodTemplate from DaemonSet's
|
||||
`.spec.template`:
|
||||
- The name will be `<DaemonSet-Name>-<template-generation>`
|
||||
- PodTemplate `.metadata.labels` will have a "pod-template-generation"
|
||||
label, with its value set to the DaemonSet's `.spec.templateGeneration`.
|
||||
- PodTemplate will have revision information to avoid triggering
|
||||
unnecessary restarts on rollback, since we only roll forward and only
|
||||
increase templateGeneration.
|
||||
- PodTemplate `.metadata.annotations` will be copied from DaemonSet
|
||||
`.metadata.annotations`.
|
||||
- If the PodTemplate is found, sync its "pod-template-generation" and
|
||||
revision information with current DaemonSet.
|
||||
- DaemonSet creates pods with "pod-template-generation" label.
|
||||
|
||||
PodTemplate may need to be made an admin-only or read-only resource if it's used
|
||||
to store DaemonSet history.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,69 @@
|
|||
# Deploying a default StorageClass during installation
|
||||
|
||||
## Goal
|
||||
|
||||
Usual Kubernetes installation tools should deploy a default StorageClass
|
||||
where it makes sense.
|
||||
|
||||
"*Usual installation tools*" are:
|
||||
|
||||
* cluster/kube-up.sh
|
||||
* kops
|
||||
* kubeadm
|
||||
|
||||
Other "installation tools" can (and should) deploy default StorageClass
|
||||
following easy steps described in this document, however we won't touch them
|
||||
during implementation of this proposal.
|
||||
|
||||
"*Where it makes sense*" are:
|
||||
|
||||
* AWS
|
||||
* Azure
|
||||
* GCE
|
||||
* Photon
|
||||
* OpenStack
|
||||
* vSphere
|
||||
|
||||
Explicitly, there is no default storage class on bare metal.
|
||||
|
||||
## Motivation
|
||||
|
||||
In Kubernetes 1.5, we had "alpha" dynamic provisioning on aforementioned cloud
|
||||
platforms. In 1.6 we want to deprecate this alpha provisioning. In order to keep
|
||||
the same user experience, we need a default StorageClass instance that would
|
||||
provision volumes for PVCs that do not request any special class. As a
consequence, this default StorageClass would provision volumes for PVCs with
the "alpha" provisioning annotation - this annotation would be ignored in 1.6
and the default storage class would be assumed.
|
||||
|
||||
## Design
|
||||
|
||||
1. Kubernetes will ship yaml files for default StorageClasses for each platform
|
||||
as `cluster/addons/storage-class/<platform>/default.yaml` and all these
|
||||
default classes will be distributed together with all other addons in
|
||||
`kubernetes.tar.gz`.
|
||||
|
||||
2. An installation tool will discover which platform it runs on and install the
   appropriate yaml file into the usual directory for the addon manager (typically
|
||||
`/etc/kubernetes/addons/storage-class/default.yaml`).
|
||||
|
||||
3. The addon manager will deploy the storage class into the installed cluster in the
   usual way. We need to update the addon manager not to overwrite any existing object
   in case the cluster admin has manually disabled this default storage class!
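For illustration, a default StorageClass manifest for GCE might look like the
following (the exact annotation, labels, and parameters here are examples of the
expected shape, not necessarily the final content of the shipped addon):

```yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: standard
  annotations:
    # Marks this class as the default for PVCs that request no class.
    storageclass.beta.kubernetes.io/is-default-class: "true"
  labels:
    # EnsureExists keeps the addon manager from overwriting admin changes.
    addonmanager.kubernetes.io/mode: EnsureExists
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
```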
|
||||
|
||||
## Implementation
|
||||
|
||||
* AWS, GCE and OpenStack have a default StorageClass in
|
||||
`cluster/addons/storage-class/<platform>/` - already done in 1.5
|
||||
|
||||
* We need a default StorageClass for vSphere, Azure and Photon in `cluster/addons/storage-class/<platform>`
|
||||
|
||||
* cluster/kube-up.sh scripts need to be updated to install the storage class on appropriate platforms
|
||||
* Already done on GCE, AWS and OpenStack.
|
||||
|
||||
* kops needs to be updated to install the storage class on appropriate platforms
|
||||
* already done for kops on AWS and kops does not support other platforms yet.
|
||||
|
||||
* kubeadm needs to be updated to install the storage class on appropriate platforms (if it is cloud-provider aware)
|
||||
|
||||
* addon manager fix: https://github.com/kubernetes/kubernetes/issues/39561
|
||||
|
|
@ -109,10 +109,10 @@ spec:
|
|||
- containerPort: 2380
|
||||
protocol: TCP
|
||||
env:
|
||||
- Name: duplicate_key
|
||||
Value: FROM_ENV
|
||||
- Name: expansion
|
||||
Value: $(REPLACE_ME)
|
||||
- name: duplicate_key
|
||||
value: FROM_ENV
|
||||
- name: expansion
|
||||
value: $(REPLACE_ME)
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: etcd-env-config
|
||||
|
|
|
|||
|
|
@ -53,7 +53,7 @@ lot of applications and customer use-cases.
|
|||
# Alpha Design
|
||||
|
||||
This section describes the proposed design for
|
||||
[alpha-level](../../docs/devel/api_changes.md#alpha-beta-and-stable-versions) support, although
|
||||
[alpha-level](../devel/api_changes.md#alpha-beta-and-stable-versions) support, although
|
||||
additional features are described in [future work](#future-work).
|
||||
|
||||
## Overview
|
||||
|
|
|
|||
|
|
@ -391,7 +391,7 @@ Include federated replica set name in the cluster name hash so that we get
|
|||
slightly different ordering for different RS. So that not all RS of size 1
|
||||
end up on the same cluster.
|
||||
|
||||
3. Assign minimum prefered number of replicas to each of the clusters, if
|
||||
3. Assign minimum preferred number of replicas to each of the clusters, if
|
||||
there are enough replicas and capacity.
|
||||
|
||||
4. If rebalance = false, assign the previously present replicas to the clusters,
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
Horizontal Pod Autoscaler with Arbitary Metrics
|
||||
Horizontal Pod Autoscaler with Arbitrary Metrics
|
||||
===============================================
|
||||
|
||||
The current Horizontal Pod Autoscaler object only has support for CPU as
|
||||
|
|
|
|||
|
|
@ -425,6 +425,15 @@ from placing new best effort pods on the node since they will be rejected by the
|
|||
On the other hand, the `DiskPressure` condition if true should dissuade the scheduler from
|
||||
placing **any** new pods on the node since they will be rejected by the `kubelet` in admission.
|
||||
|
||||
## Enforcing Node Allocatable
|
||||
|
||||
To enforce [Node Allocatable](./node-allocatable.md), Kubelet primarily uses cgroups.
|
||||
However `storage` cannot be enforced using cgroups.
|
||||
|
||||
Once Kubelet supports `storage` as an `Allocatable` resource, Kubelet will perform evictions whenever the total storage usage by pods exceeds node allocatable.
|
||||
|
||||
If a pod cannot tolerate evictions, then ensure that its `requests` are set and that its usage will not exceed `requests`.
|
||||
|
||||
## Best Practices
|
||||
|
||||
### DaemonSet
|
||||
|
|
|
|||
|
|
@ -11,7 +11,7 @@ a number of dependencies that must exist in its filesystem, including various
|
|||
mount and network utilities. Missing any of these can lead to unexpected
|
||||
differences between Kubernetes hosts. For example, the Google Container VM
|
||||
image (GCI) is missing various mount commands even though the Kernel supports
|
||||
those filesystem types. Similarly, CoreOS Linux intentionally doesn't ship with
|
||||
those filesystem types. Similarly, CoreOS Container Linux intentionally doesn't ship with
|
||||
many mount utilities or socat in the base image. Other distros have a related
|
||||
problem of ensuring these dependencies are present and versioned appropriately
|
||||
for the Kubelet.
|
||||
|
|
@ -38,7 +38,7 @@ mount --rbind /var/lib/kubelet /path/to/chroot/var/lib/kubelet
|
|||
chroot /path/to/kubelet /usr/bin/hyperkube kubelet
|
||||
```
|
||||
|
||||
Note: Kubelet might need access to more directories on the host and we intend to identity mount all those directories into the chroot. A partial list can be found in the CoreOS kubelet-wrapper script.
|
||||
Note: Kubelet might need access to more directories on the host and we intend to identity mount all those directories into the chroot. A partial list can be found in the CoreOS Container Linux kubelet-wrapper script.
|
||||
This logic will also naturally be abstracted so it's no more difficult for the user to run the Kubelet.
|
||||
|
||||
Currently, the Kubelet does not need access to arbitrary paths on the host (as
|
||||
|
|
@ -53,13 +53,13 @@ chroot.
|
|||
|
||||
## Current Use
|
||||
|
||||
This method of running the Kubelet is already in use by users of CoreOS Linux. The details of this implementation are found in the [kubelet wrapper documentation](https://coreos.com/kubernetes/docs/latest/kubelet-wrapper.html).
|
||||
This method of running the Kubelet is already in use by users of CoreOS Container Linux. The details of this implementation are found in the [kubelet wrapper documentation](https://coreos.com/kubernetes/docs/latest/kubelet-wrapper.html).
|
||||
|
||||
## Implementation
|
||||
|
||||
### Target Distros
|
||||
|
||||
The two distros which benefit the most from this change are GCI and CoreOS. Initially, these changes will only be implemented for those distros.
|
||||
The two distros which benefit the most from this change are GCI and CoreOS Container Linux. Initially, these changes will only be implemented for those distros.
|
||||
|
||||
This work will also only initially target the GCE provider and `kube-up` method of deployment.
|
||||
|
||||
|
|
@ -139,7 +139,7 @@ Similarly, for the mount utilities, the [Flex Volume v2](https://github.com/kube
|
|||
|
||||
**Downsides**:
|
||||
|
||||
This requires waiting on other features which might take a signficant time to land. It also could end up not fully fixing the problem (e.g. pushing down port-forwarding to the runtime doesn't ensure the runtime doesn't rely on host utilities).
|
||||
This requires waiting on other features which might take a significant time to land. It also could end up not fully fixing the problem (e.g. pushing down port-forwarding to the runtime doesn't ensure the runtime doesn't rely on host utilities).
|
||||
|
||||
The Flex Volume feature is several releases out from fully replacing the current volumes as well.
|
||||
|
||||
|
|
@ -158,7 +158,7 @@ Currently, there's a `--containerized` flag. This flag doesn't actually remove t
|
|||
|
||||
#### Timeframe
|
||||
|
||||
During the 1.6 timeframe, the changes mentioned in implementation will be undergone for the CoreOS and GCI distros.
|
||||
During the 1.6 timeframe, the changes mentioned in implementation will be undergone for the CoreOS Container Linux and GCI distros.
|
||||
|
||||
Based on the test results and additional problems that may arise, rollout will
|
||||
be determined from there. Hopefully the rollout can also occur in the 1.6
|
||||
|
|
|
|||
|
|
@ -0,0 +1,64 @@
|
|||
# Mount options for mountable volume types
|
||||
|
||||
## Goal
|
||||
|
||||
Enable Kubernetes admins to specify mount options with mountable volumes
|
||||
such as `nfs`, `glusterfs`, or `aws-ebs`.
|
||||
|
||||
## Motivation
|
||||
|
||||
We currently support network filesystems: NFS, Glusterfs, Ceph FS, SMB (Azure file), Quobytes, and local filesystems such as ext[3|4] and XFS.
|
||||
|
||||
Mount time options that are operationally important and have no security implications should be supported. Examples are NFS's TCP mode, versions, lock mode, caching mode; Glusterfs's caching mode; SMB's version, locking, id mapping; and more.
|
||||
|
||||
## Design
|
||||
|
||||
We are going to add support for mount options in PVs as a beta feature to begin with.
|
||||
|
||||
Mount options can be specified as `mountOptions` annotations in PV. For example:
|
||||
|
||||
``` yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolume
|
||||
metadata:
|
||||
name: pv0003
|
||||
annotations:
|
||||
volume.beta.kubernetes.io/mountOptions: "hard,nolock,nfsvers=3"
|
||||
spec:
|
||||
capacity:
|
||||
storage: 5Gi
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
persistentVolumeReclaimPolicy: Recycle
|
||||
nfs:
|
||||
path: /tmp
|
||||
server: 172.17.0.2
|
||||
```
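With that annotation, mounting this NFS volume would roughly correspond to the
following command (illustrative only; the mount target path and the exact invocation
performed by the kubelet's mount helpers will differ):

```
mount -t nfs -o hard,nolock,nfsvers=3 172.17.0.2:/tmp /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/pv0003
```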
|
||||
|
||||
|
||||
## Preventing users from specifying mount options in inline volume specs of Pod
|
||||
|
||||
While mount options enable more flexibility in how volumes are mounted, it can result
|
||||
in user specifying options that are not supported or are known to be problematic when
|
||||
using inline volume specs.
|
||||
|
||||
After much deliberation it was decided that - `mountOptions` as an API parameter will not be supported
|
||||
for inline volume specs.
|
||||
|
||||
### Error handling and plugins that don't support mount option
|
||||
|
||||
Kubernetes ships with volume plugins, such as `configmaps` or `secrets`, that don't support any kind of mount options.
In those cases, to prevent users from submitting volume definitions with bogus mount options, plugins can define an interface function
such as:
|
||||
|
||||
```go
|
||||
func SupportsMountOption() bool {
|
||||
return false
|
||||
}
|
||||
```
|
||||
|
||||
which will be used to validate the PV definition; the API object will *only* be created if it passes the validation. Additionally,
support for user-specified mount options will also be checked when volumes are being mounted.
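A minimal sketch of such a validation check (stand-in types; not the actual
validation code), assuming the annotation key shown in the example above:

```go
package mountoptions

import "fmt"

// Minimal stand-ins for the pieces this sketch needs.
type persistentVolume struct {
	Name        string
	Annotations map[string]string
}

type volumePlugin interface {
	GetPluginName() string
	SupportsMountOption() bool // the interface function proposed above
}

// validateMountOptions sketches the proposed check: a PV carrying the
// mountOptions annotation is rejected if its plugin does not support
// mount options.
func validateMountOptions(pv persistentVolume, plugin volumePlugin) error {
	opts, ok := pv.Annotations["volume.beta.kubernetes.io/mountOptions"]
	if ok && opts != "" && !plugin.SupportsMountOption() {
		return fmt.Errorf("PV %q: mount options %q are not supported by plugin %q",
			pv.Name, opts, plugin.GetPluginName())
	}
	return nil
}
```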
|
||||
|
||||
In other cases, where the plugin supports mount options (such as `NFS` or `GlusterFS`) but mounting fails because of an invalid mount
option or some other error, an Event API object will be created and attached to the appropriate object.
|
||||
|
|
@ -131,7 +131,7 @@ TL;DR;
|
|||
|
||||
### Components should expose their platform
|
||||
|
||||
It should be possible to run clusters with mixed platforms smoothly. After all, bringing heterogenous machines together to a single unit (a cluster) is one of Kubernetes' greatest strengths. And since the Kubernetes' components communicate over HTTP, two binaries of different architectures may talk to each other normally.
|
||||
It should be possible to run clusters with mixed platforms smoothly. After all, bringing heterogeneous machines together to a single unit (a cluster) is one of Kubernetes' greatest strengths. And since the Kubernetes' components communicate over HTTP, two binaries of different architectures may talk to each other normally.
|
||||
|
||||
The crucial thing here is that the components that handle platform-specific tasks (e.g. kubelet) should expose their platform. In the kubelet case, we've initially solved it by exposing the labels `beta.kubernetes.io/{os,arch}` on every node. This way a user may run binaries for different platforms on a multi-platform cluster, but it still requires manual work to apply the label to every manifest.
|
||||
|
||||
|
|
@ -206,7 +206,7 @@ However, before temporarily [deactivating builds](https://github.com/kubernetes/
|
|||
Go 1.5 introduced many changes. To name a few that are relevant to Kubernetes:
|
||||
- C was eliminated from the tree (it was earlier used for the bootstrap runtime).
|
||||
- All processors are used by default, which means we should be able to remove [lines like this one](https://github.com/kubernetes/kubernetes/blob/v1.2.0/cmd/kubelet/kubelet.go#L37)
|
||||
- The garbage collector became more efficent (but also [confused our latency test](https://github.com/golang/go/issues/14396)).
|
||||
- The garbage collector became more efficient (but also [confused our latency test](https://github.com/golang/go/issues/14396)).
|
||||
- `linux/arm64` and `linux/ppc64le` were added as new ports.
|
||||
- The `GO15VENDOREXPERIMENT` was started. We switched from `Godeps/_workspace` to the native `vendor/` in [this PR](https://github.com/kubernetes/kubernetes/pull/24242).
|
||||
- It's not required to pre-build the whole standard library `std` when cross-compiling. [Details](#prebuilding-the-standard-library-std)
|
||||
|
|
@ -448,7 +448,7 @@ ARMv6 | arm | 6 | - | 32-bit
|
|||
ARMv7 | arm | 7 | armhf | 32-bit
|
||||
ARMv8 | arm64 | - | aarch64 | 64-bit
|
||||
|
||||
The compability between the versions is pretty straightforward, ARMv5 binaries may run on ARMv7 hosts, but not vice versa.
|
||||
The compatibility between the versions is pretty straightforward, ARMv5 binaries may run on ARMv7 hosts, but not vice versa.
|
||||
|
||||
## Cross-building docker images for linux
|
||||
|
||||
|
|
|
|||
|
|
@ -1,40 +1,24 @@
|
|||
# Node Allocatable Resources
|
||||
|
||||
**Issue:** https://github.com/kubernetes/kubernetes/issues/13984
|
||||
### Authors: timstclair@, vishh@
|
||||
|
||||
## Overview
|
||||
|
||||
Currently Node.Status has Capacity, but no concept of node Allocatable. We need additional
|
||||
parameters to serve several purposes:
|
||||
Kubernetes nodes typically run many OS system daemons in addition to Kubernetes daemons (kubelet, container runtime, etc.) and user pods.
|
||||
Kubernetes assumes that all the compute resources available, referred to as `Capacity`, in a node are available for user pods.
|
||||
In reality, system daemons use a non-trivial amount of resources and their availability is critical for the stability of the system.
|
||||
To address this issue, this proposal introduces the concept of `Allocatable` which identifies the amount of compute resources available to user pods.
|
||||
Specifically, the kubelet will provide a few knobs to reserve resources for OS system daemons and kubernetes daemons.
|
||||
|
||||
1. Kubernetes metrics provides "/docker-daemon", "/kubelet",
|
||||
"/kube-proxy", "/system" etc. raw containers for monitoring system component resource usage
|
||||
patterns and detecting regressions. Eventually we want to cap system component usage to a certain
|
||||
limit / request. However this is not currently feasible due to a variety of reasons including:
|
||||
1. Docker still uses tons of computing resources (See
|
||||
[#16943](https://github.com/kubernetes/kubernetes/issues/16943))
|
||||
2. We have not yet defined the minimal system requirements, so we cannot control Kubernetes
|
||||
nodes or know about arbitrary daemons, which can make the system resources
|
||||
unmanageable. Even with a resource cap we cannot do a full resource management on the
|
||||
node, but with the proposed parameters we can mitigate really bad resource over commits
|
||||
3. Usage scales with the number of pods running on the node
|
||||
2. For external schedulers (such as mesos, hadoop, etc.) integration, they might want to partition
|
||||
compute resources on a given node, limiting how much Kubelet can use. We should provide a
|
||||
mechanism by which they can query kubelet, and reserve some resources for their own purpose.
|
||||
By explicitly reserving compute resources, the intention is to avoid overcommitting the node and to keep system daemons from competing with user pods.
|
||||
The resources available to system daemons and user pods will be capped based on user specified reservations.
|
||||
|
||||
### Scope of proposal
|
||||
|
||||
This proposal deals with resource reporting through the [`Allocatable` field](#allocatable) for more
|
||||
reliable scheduling, and minimizing resource over commitment. This proposal *does not* cover
|
||||
resource usage enforcement (e.g. limiting kubernetes component usage), pod eviction (e.g. when
|
||||
reservation grows), or running multiple Kubelets on a single node.
|
||||
If `Allocatable` is available, the scheduler will use that instead of `Capacity`, thereby not overcommitting the node.
|
||||
|
||||
## Design
|
||||
|
||||
### Definitions
|
||||
|
||||

|
||||
|
||||
1. **Node Capacity** - Already provided as
|
||||
[`NodeStatus.Capacity`](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html#_v1_nodestatus),
|
||||
this is total capacity read from the node instance, and assumed to be constant.
|
||||
|
|
@ -66,7 +50,7 @@ type NodeStatus struct {
|
|||
Allocatable will be computed by the Kubelet and reported to the API server. It is defined to be:
|
||||
|
||||
```
|
||||
[Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved]
|
||||
[Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved] - [Hard-Eviction-Threshold]
|
||||
```
|
||||
|
||||
The scheduler will use `Allocatable` in place of `Capacity` when scheduling pods, and the Kubelet
|
||||
|
|
@ -89,12 +73,7 @@ The flag will be specified as a serialized `ResourceList`, with resources define
|
|||
--kube-reserved=cpu=500m,memory=5Mi
|
||||
```
|
||||
|
||||
Initially we will only support CPU and memory, but will eventually support more resources. See
|
||||
[#16889](https://github.com/kubernetes/kubernetes/pull/16889) for disk accounting.
|
||||
|
||||
If KubeReserved is not set it defaults to a sane value (TBD) calculated from machine capacity. If it
|
||||
is explicitly set to 0 (along with `SystemReserved`), then `Allocatable == Capacity`, and the system
|
||||
behavior is equivalent to the 1.1 behavior with scheduling based on Capacity.
|
||||
Initially we will only support CPU and memory, but will eventually support more resources like [local storage](#phase-3) and io proportional weights to improve node reliability.
|
||||
|
||||
#### System-Reserved
|
||||
|
||||
|
|
@ -102,48 +81,259 @@ In the initial implementation, `SystemReserved` will be functionally equivalent
|
|||
[`KubeReserved`](#kube-reserved), but with a different semantic meaning. While KubeReserved
|
||||
designates resources set aside for kubernetes components, SystemReserved designates resources set
|
||||
aside for non-kubernetes components (currently this is reported as all the processes lumped
|
||||
together in the `/system` raw container).
|
||||
together in the `/system` raw container on non-systemd nodes).
|
||||
|
||||
## Issues
|
||||
## Kubelet Evictions Thresholds
|
||||
|
||||
To improve the reliability of nodes, kubelet evicts pods whenever the node runs out of memory or local storage.
|
||||
Together, evictions and node allocatable help improve node stability.
|
||||
|
||||
As of v1.5, evictions are based on overall node usage relative to `Capacity`.
|
||||
Kubelet evicts pods based on QoS and user configured eviction thresholds.
|
||||
More details in [this doc](./kubelet-eviction.md#enforce-node-allocatable).
|
||||
|
||||
From v1.6, if `Allocatable` is enforced by default across all pods on a node using cgroups, pods cannot exceed `Allocatable`.
|
||||
Memory and CPU limits are enforced using cgroups, but there is no easy means to enforce storage limits.
|
||||
Enforcing storage limits using Linux Quota is not possible since it's not hierarchical.
|
||||
Once storage is supported as a resource for `Allocatable`, Kubelet has to perform evictions based on `Allocatable` in addition to `Capacity`.
|
||||
|
||||
Note that eviction limits are enforced on pods only and system daemons are free to use any amount of resources unless their reservations are enforced.
|
||||
|
||||
Here is an example to illustrate Node Allocatable for memory:
|
||||
|
||||
Node Capacity is `32Gi`, kube-reserved is `2Gi`, system-reserved is `1Gi`, eviction-hard is set to `<100Mi`
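Plugging these numbers into the formula given earlier:

```
[Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved] - [Hard-Eviction-Threshold]
              = 32Gi - 2Gi - 1Gi - 100Mi
              ≈ 28.9Gi
```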
|
||||
|
||||
For this node, the effective Node Allocatable is `28.9Gi` only; i.e. if kube and system components use up all their reservation, the memory available for pods is only `28.9Gi` and kubelet will evict pods once overall usage of pods crosses that threshold.
|
||||
|
||||
If we enforce Node Allocatable (`28.9Gi`) via top level cgroups, then pods can never exceed `28.9Gi` in which case evictions will not be performed unless kernel memory consumption is above `100Mi`.
|
||||
|
||||
In order to support evictions and avoid memcg OOM kills for pods, we will set the top level cgroup limits for pods to be `Node Allocatable` + `Eviction Hard Thresholds`.
|
||||
|
||||
However, the scheduler is not expected to use more than `28.9Gi` and so `Node Allocatable` on Node Status will be `28.9Gi`.
|
||||
|
||||
If kube and system components do not use up all their reservation, with the above example, pods will face memcg OOM kills from the node allocatable cgroup before kubelet evictions kick in.
|
||||
To better enforce QoS under this situation, Kubelet will apply the hard eviction thresholds on the node allocatable cgroup as well, if node allocatable is enforced.
|
||||
The resulting behavior will be the same for user pods.
|
||||
With the above example, Kubelet will evict pods whenever pods consume more than `28.9Gi` which will be `<100Mi` from `29Gi` which will be the memory limits on the Node Allocatable cgroup.
|
||||
|
||||
## General guidelines
|
||||
|
||||
System daemons are expected to be treated similar to `Guaranteed` pods.
|
||||
System daemons can burst within their bounding cgroups and this behavior needs to be managed as part of kubernetes deployment.
|
||||
For example, Kubelet can have its own cgroup and share `KubeReserved` resources with the Container Runtime.
|
||||
However, Kubelet cannot burst and use up all available Node resources if `KubeReserved` is enforced.
|
||||
|
||||
Users are advised to be extra careful while enforcing `SystemReserved` reservation since it can lead to critical services being CPU starved or OOM killed on the nodes.
|
||||
The recommendation is to enforce `SystemReserved` only if a user has profiled their nodes exhaustively to come up with precise estimates.
|
||||
|
||||
To begin with enforce `Allocatable` on `pods` only.
|
||||
Once adequate monitoring and alerting is in place to track kube daemons, attempt to enforce `KubeReserved` based on heuristics.
|
||||
More on this in [Phase 2](#phase-2-enforce-allocatable-on-pods).
|
||||
|
||||
The resource requirements of kube system daemons will grow over time as more and more features are added.
|
||||
Over time, the project will attempt to bring down utilization, but that is not a priority as of now.
|
||||
So expect a drop in `Allocatable` capacity over time.
|
||||
|
||||
`Systemd-logind` places ssh sessions under `/user.slice`.
|
||||
Its usage will not be accounted for in the nodes.
|
||||
Take into account resource reservation for `/user.slice` while configuring `SystemReserved`.
|
||||
Ideally `/user.slice` should reside under `SystemReserved` top level cgroup.
|
||||
|
||||
## Recommended Cgroups Setup
|
||||
|
||||
Following is the recommended cgroup configuration for Kubernetes nodes.
|
||||
All OS system daemons are expected to be placed under a top level `SystemReserved` cgroup.
|
||||
`Kubelet` and `Container Runtime` are expected to be placed under `KubeReserved` cgroup.
|
||||
The reason for recommending placing the `Container Runtime` under `KubeReserved` is as follows:
|
||||
|
||||
1. A container runtime on Kubernetes nodes is not expected to be used outside of the Kubelet.
|
||||
1. Its resource consumption is tied to the number of pods running on a node.
|
||||
|
||||
Note that the hierarchy below recommends having dedicated cgroups for kubelet and the runtime to individually track their usage.
|
||||
```text
|
||||
|
||||
/ (Cgroup Root)
|
||||
.
|
||||
+..systemreserved or system.slice (Specified via `--system-reserved-cgroup`; `SystemReserved` enforced here *optionally* by kubelet)
|
||||
. . .tasks(sshd,udev,etc)
|
||||
.
|
||||
.
|
||||
+..podruntime or podruntime.slice (Specified via `--kube-reserved-cgroup`; `KubeReserved` enforced here *optionally* by kubelet)
|
||||
. .
|
||||
. +..kubelet
|
||||
. . .tasks(kubelet)
|
||||
. .
|
||||
. +..runtime
|
||||
. .tasks(docker-engine, containerd)
|
||||
.
|
||||
.
|
||||
+..kubepods or kubepods.slice (Node Allocatable enforced here by Kubelet)
|
||||
. .
|
||||
. +..PodGuaranteed
|
||||
. . .
|
||||
. . +..Container1
|
||||
. . . .tasks(container processes)
|
||||
. . .
|
||||
. . +..PodOverhead
|
||||
. . . .tasks(per-pod processes)
|
||||
. . ...
|
||||
. .
|
||||
. +..Burstable
|
||||
. . .
|
||||
. . +..PodBurstable
|
||||
. . . .
|
||||
. . . +..Container1
|
||||
. . . . .tasks(container processes)
|
||||
. . . +..Container2
|
||||
. . . . .tasks(container processes)
|
||||
. . . .
|
||||
. . . ...
|
||||
. . .
|
||||
. . ...
|
||||
. .
|
||||
. .
|
||||
. +..Besteffort
|
||||
. . .
|
||||
. . +..PodBesteffort
|
||||
. . . .
|
||||
. . . +..Container1
|
||||
. . . . .tasks(container processes)
|
||||
. . . +..Container2
|
||||
. . . . .tasks(container processes)
|
||||
. . . .
|
||||
. . . ...
|
||||
. . .
|
||||
. . ...
|
||||
|
||||
```
|
||||
|
||||
`systemreserved` & `kubereserved` cgroups are expected to be created by users.
|
||||
If Kubelet is creating cgroups for itself and the docker daemon, it will create the `kubereserved` cgroup automatically.
|
||||
|
||||
`kubepods` cgroups will be created by kubelet automatically if it is not already there.
|
||||
Creation of `kubepods` cgroup is tied to QoS Cgroup support which is controlled by `--cgroups-per-qos` flag.
|
||||
If the cgroup driver is set to `systemd` then Kubelet will create a `kubepods.slice` via systemd.
|
||||
By default, Kubelet will `mkdir` `/kubepods` cgroup directly via cgroupfs.
|
||||
|
||||
#### Containerizing Kubelet
|
||||
|
||||
If Kubelet is managed using a container runtime, have the runtime create cgroups for kubelet under `kubereserved`.
|
||||
|
||||
### Metrics
|
||||
|
||||
Kubelet identifies its own cgroup and exposes its usage metrics via the Summary metrics API (/stats/summary).
With the docker runtime, kubelet identifies the docker runtime's cgroups too and exposes metrics for them via the Summary metrics API.
|
||||
To provide a complete overview of a node, Kubelet will expose metrics from cgroups enforcing `SystemReserved`, `KubeReserved` & `Allocatable` too.
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1 - Introduce Allocatable to the system without enforcement
|
||||
|
||||
**Status**: Implemented v1.2
|
||||
|
||||
In this phase, Kubelet will support specifying `KubeReserved` & `SystemReserved` resource reservations via kubelet flags.
|
||||
The defaults for these flags will be `""`, meaning zero cpu or memory reservations.
|
||||
Kubelet will compute `Allocatable` and update `Node.Status` to include it.
|
||||
The scheduler will use `Allocatable` instead of `Capacity` if it is available.
|
||||
|
||||
### Phase 2 - Enforce Allocatable on Pods
|
||||
|
||||
**Status**: Targeted for v1.6
|
||||
|
||||
In this phase, Kubelet will automatically create a top level cgroup to enforce Node Allocatable across all user pods.
|
||||
The creation of this cgroup is controlled by `--cgroups-per-qos` flag.
|
||||
|
||||
Kubelet will support specifying the top level cgroups for `KubeReserved` and `SystemReserved` and support *optionally* placing resource restrictions on these top level cgroups.
|
||||
|
||||
Users are expected to specify `KubeReserved` and `SystemReserved` based on their deployment requirements.
|
||||
|
||||
Resource requirements for Kubelet and the runtime are typically proportional to the number of pods running on a node.
|
||||
Once a user has identified the maximum pod density for each of their nodes, they will be able to compute `KubeReserved` using [this performance dashboard](http://node-perf-dash.k8s.io/#/builds).
|
||||
[This blog post](http://blog.kubernetes.io/2016/11/visualize-kubelet-performance-with-node-dashboard.html) explains how the dashboard has to be interpreted.
|
||||
Note that this dashboard provides usage metrics for docker runtime only as of now.
|
||||
|
||||
Support for evictions based on Allocatable will be introduced in this phase.
|
||||
|
||||
New flags introduced in this phase are as follows:
|
||||
|
||||
1. `--enforce-node-allocatable=[pods][,][kube-reserved][,][system-reserved]`
|
||||
|
||||
* This flag will default to `pods` in v1.6.
|
||||
* This flag will be a `no-op` unless `--kube-reserved` and/or `--system-reserved` has been specified.
|
||||
* If `--cgroups-per-qos=false`, then this flag has to be set to `""`. Otherwise it is an error and kubelet will fail.
|
||||
* It is recommended to drain and restart nodes prior to upgrading to v1.6. This is necessary for the `--cgroups-per-qos` feature anyway, which is expected to be turned on by default in `v1.6`.
|
||||
* Users intending to turn off this feature can set this flag to `""`.
|
||||
* Specifying `kube-reserved` value in this flag is invalid if `--kube-reserved-cgroup` flag is not specified.
|
||||
* Specifying `system-reserved` value in this flag is invalid if `--system-reserved-cgroup` flag is not specified.
|
||||
* By including `kube-reserved` or `system-reserved` in this flag's value, and by specifying the following two flags, Kubelet will attempt to enforce the reservations specified via `--kube-reserved` & `system-reserved` respectively.
|
||||
|
||||
2. `--kube-reserved-cgroup=<absolute path to a cgroup>`
|
||||
* This flag helps kubelet identify the control group managing all kube components like Kubelet & container runtime that fall under the `KubeReserved` reservation.
|
||||
* Example: `/kube.slice`. Note that absolute paths are required and systemd naming scheme isn't supported.
|
||||
|
||||
3. `--system-reserved-cgroup=<absolute path to a cgroup>`
|
||||
* This flag helps kubelet identify the control group managing all OS specific system daemons that fall under the `SystemReserved` reservation.
|
||||
* Example: `/system.slice`. Note that absolute paths are required and systemd naming scheme isn't supported.
|
||||
|
||||
4. `--experimental-node-allocatable-ignore-eviction-threshold`
|
||||
* This flag is provided as an `opt-out` option to avoid including Hard eviction thresholds in Node Allocatable which can impact existing clusters.
|
||||
* The default value is `false`.
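For illustration, a kubelet invocation that enforces Node Allocatable on pods and
additionally enforces the kube reservation might combine these flags as follows (all
values are examples only):

```
kubelet --cgroups-per-qos=true \
  --enforce-node-allocatable=pods,kube-reserved \
  --kube-reserved=cpu=500m,memory=1Gi \
  --kube-reserved-cgroup=/podruntime.slice \
  --system-reserved=cpu=500m,memory=1Gi \
  --eviction-hard=memory.available<100Mi
```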
|
||||
|
||||
#### Rollout details
|
||||
|
||||
This phase is expected to improve Kubernetes node stability.
|
||||
However, it requires users to specify non-default values for the `--kube-reserved` & `--system-reserved` flags.
|
||||
|
||||
The rollout of this phase has been long overdue and hence we are attempting to include it in v1.6.
|
||||
|
||||
Since `KubeReserved` and `SystemReserved` continue to have `""` as defaults, the node's `Allocatable` does not change automatically.
|
||||
Since this phase requires node drains (or pod restarts/terminations), it is considered disruptive to users.
|
||||
|
||||
To rollback this phase, set `--enforce-node-allocatable` flag to `""` and `--experimental-node-allocatable-ignore-eviction-threshold` to `true`.
|
||||
The former disables Node Allocatable enforcement on all pods and the latter avoids including hard eviction thresholds in Node Allocatable.
|
||||
|
||||
This rollout in v1.6 might cause the following symptoms:
|
||||
|
||||
1. If the `--kube-reserved` and/or `--system-reserved` flags are also specified, containers may be OOM killed and/or pods may be evicted. This can happen primarily to `Burstable` and `BestEffort` pods since they can no longer use up all the resources available on the node.
|
||||
1. Total allocatable capacity in the cluster is reduced, resulting in pods staying `Pending`, because Hard Eviction Thresholds are included in Node Allocatable.
|
||||
|
||||
##### Proposed Timeline
|
||||
|
||||
```text
|
||||
02/14/2017 - Discuss the rollout plan in sig-node meeting
|
||||
02/15/2017 - Flip the switch to enable pod level cgroups by default
|
||||
02/21/2017 - Merge phase 2 implementation
|
||||
02/27/2017 - Kubernetes Feature complete (i.e. code freeze)
|
||||
03/01/2017 - Send an announcement to kubernetes-dev@ about this rollout along with rollback options and potential issues. Recommend users to set kube and system reserved.
|
||||
03/22/2017 - Kubernetes 1.6 release
|
||||
```
|
||||
|
||||
### Phase 3 - Metrics & support for Storage
|
||||
|
||||
*Status*: Targeted for v1.7
|
||||
|
||||
In this phase, Kubelet will expose usage metrics for `KubeReserved`, `SystemReserved` and `Allocatable` top level cgroups via Summary metrics API.
|
||||
`Storage` will also be introduced as a reservable resource in this phase.
|
||||
|
||||
## Known Issues
|
||||
|
||||
### Kubernetes reservation is smaller than kubernetes component usage
|
||||
|
||||
**Solution**: Initially, do nothing (best effort). Let the kubernetes daemons overflow the reserved
|
||||
resources and hope for the best. If the node usage is less than Allocatable, there will be some room
|
||||
for overflow and the node should continue to function. If the node has been scheduled to capacity
|
||||
for overflow and the node should continue to function. If the node has been scheduled to `allocatable`
|
||||
(worst-case scenario) it may enter an unstable state, which is the current behavior in this
|
||||
situation.
|
||||
|
||||
In the [future](#future-work) we may set a parent cgroup for kubernetes components, with limits set
|
||||
A recommended alternative is to enforce KubeReserved once Kubelet supports it (Phase 2).
|
||||
In the future we may set a parent cgroup for kubernetes components, with limits set
|
||||
according to `KubeReserved`.
|
||||
|
||||
### Version discrepancy
|
||||
|
||||
**API server / scheduler is not allocatable-resources aware:** If the Kubelet rejects a Pod but the
|
||||
scheduler expects the Kubelet to accept it, the system could get stuck in an infinite loop
|
||||
scheduling a Pod onto the node only to have Kubelet repeatedly reject it. To avoid this situation,
|
||||
we will do a 2-stage rollout of `Allocatable`. In stage 1 (targeted for 1.2), `Allocatable` will
|
||||
be reported by the Kubelet and the scheduler will be updated to use it, but Kubelet will continue
|
||||
to do admission checks based on `Capacity` (same as today). In stage 2 of the rollout (targeted
|
||||
for 1.3 or later), the Kubelet will start doing admission checks based on `Allocatable`.
|
||||
|
||||
**API server expects `Allocatable` but does not receive it:** If the kubelet is older and does not
|
||||
provide `Allocatable` in the `NodeStatus`, then `Allocatable` will be
|
||||
[defaulted](../../pkg/api/v1/defaults.go) to
|
||||
`Capacity` (which will yield today's behavior of scheduling based on capacity).
|
||||
|
||||
### 3rd party schedulers
|
||||
|
||||
The community should be notified that an update to schedulers is recommended, but if a scheduler is
|
||||
not updated it falls under the above case of "scheduler is not allocatable-resources aware".
|
||||
|
||||
## Future work
|
||||
|
||||
1. Convert kubelet flags to Config API - Prerequisite to (2). See
|
||||
[#12245](https://github.com/kubernetes/kubernetes/issues/12245).
|
||||
2. Set cgroup limits according to KubeReserved - as described in the [overview](#overview)
|
||||
3. Report kernel usage to be considered with scheduling decisions.
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
|
|||
|
|
@ -0,0 +1,718 @@
|
|||
# Pod Preset
|
||||
|
||||
* [Abstract](#abstract)
|
||||
* [Motivation](#motivation)
|
||||
* [Constraints and Assumptions](#constraints-and-assumptions)
|
||||
* [Use Cases](#use-cases)
|
||||
* [Summary](#summary)
|
||||
* [Prior Art](#prior-art)
|
||||
* [Objectives](#objectives)
|
||||
* [Proposed Changes](#proposed-changes)
|
||||
* [PodPreset API object](#podpreset-api-object)
|
||||
* [Validations](#validations)
|
||||
* [AdmissionControl Plug-in: PodPreset](#admissioncontrol-plug-in-podpreset)
|
||||
* [Behavior](#behavior)
|
||||
* [Examples](#examples)
|
||||
* [Simple Pod Spec Example](#simple-pod-spec-example)
|
||||
* [Pod Spec with `ConfigMap` Example](#pod-spec-with-`configmap`-example)
|
||||
* [ReplicaSet with Pod Spec Example](#replicaset-with-pod-spec-example)
|
||||
* [Multiple PodPreset Example](#multiple-podpreset-example)
|
||||
* [Conflict Example](#conflict-example)
|
||||
|
||||
|
||||
## Abstract
|
||||
|
||||
Describes a policy resource that allows for the loose coupling of a Pod's
|
||||
definition from additional runtime requirements for that Pod. For example,
|
||||
mounting of Secrets, or setting additional environment variables,
|
||||
may not be known at Pod deployment time, but may be required at Pod creation
|
||||
time.
|
||||
|
||||
## Motivation
|
||||
|
||||
Consuming a service involves more than just connectivity. In addition to
|
||||
coordinates to reach the service, credentials and non-secret configuration
|
||||
parameters are typically needed to use the service. The primitives for this
|
||||
already exist, but a gap exists where loose coupling is desired: it should be
|
||||
possible to inject pods with the information they need to use a service on a
|
||||
service-by-service basis, without the pod authors having to incorporate the
|
||||
information into every pod spec where it is needed.
|
||||
|
||||
## Constraints and Assumptions
|
||||
|
||||
1. Future work might require new mechanisms to be made to work with existing
|
||||
controllers such as deployments and replicasets that create pods. Existing
|
||||
controllers that create pods should recreate their pods when a new Pod Injection
|
||||
Policy is added that would affect them.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- As a user, I want to be able to provision a new pod
|
||||
without needing to know the application configuration primitives the
|
||||
services my pod will consume.
|
||||
- As a cluster admin, I want specific configuration items of a service to be
|
||||
withheld visibly from a developer deploying a service, but not to block the
|
||||
developer from shipping.
|
||||
- As an app developer, I want to provision a Cloud Spanner instance and then
|
||||
access it from within my Kubernetes cluster.
|
||||
- As an app developer, I want the Cloud Spanner provisioning process to
|
||||
configure my Kubernetes cluster so the endpoints and credentials for my
|
||||
Cloud Spanner instance are implicitly injected into Pods matching a label
|
||||
selector (without me having to modify the PodSpec to add the specific
|
||||
Configmap/Secret containing the endpoint/credential data).
|
||||
|
||||
|
||||
**Specific Example:**
|
||||
|
||||
1. Database Administrator provisions a MySQL service for their cluster.
|
||||
2. Database Administrator creates secrets for the cluster containing the
|
||||
database name, username, and password.
|
||||
3. Database Administrator creates a `PodPreset` defining the database
|
||||
port as an environment variable, as well as the secrets. See
|
||||
[Examples](#examples) below for various examples.
|
||||
4. A developer of an application can now label their pod with the `Selector`
   labels the Database Administrator gives them, and consume the MySQL
   database without needing to know any of the details from steps 2 and 3.
|
||||
|
||||
### Summary
|
||||
|
||||
The use case we are targeting is to automatically inject into Pods the
|
||||
information required to access non-Kubernetes services, such as an
instance of Cloud Spanner. Accessing external services such as Cloud Spanner
|
||||
may require the Pods to have specific credential and endpoint data.
|
||||
|
||||
Using a Pod Preset means pod template authors do not have to explicitly
set this information for every pod. This way, authors of pod templates consuming a
specific service do not need to know all the details about that service.
|
||||
|
||||
### Prior Art
|
||||
|
||||
Kubernetes already supports accessing the Kubernetes API from
all Pods by injecting the credentials and endpoint data automatically - e.g.
|
||||
injecting the serviceaccount credentials into a volume (via secret) using an
|
||||
[admission controller](https://github.com/kubernetes/kubernetes/blob/97212f5b3a2961d0b58a20bdb6bda3ccfa159bd7/plugin/pkg/admission/serviceaccount/admission.go),
|
||||
and injecting the Service endpoints into environment
|
||||
variables. This is done without the Pod explicitly mounting the serviceaccount
|
||||
secret.
|
||||
|
||||
### Objectives
|
||||
|
||||
The goal of this proposal is to generalize these capabilities so we can introduce
|
||||
similar support for accessing Services running external to the Kubernetes cluster.
|
||||
We can assume that an appropriate Secret and Configmap have already been created
|
||||
as part of the provisioning process of the external service. The need then is to
|
||||
provide a mechanism for injecting the Secret and Configmap into Pods automatically.
|
||||
|
||||
The [ExplicitServiceLinks proposal](https://github.com/kubernetes/community/pull/176),
|
||||
will allow us to decouple where a Service's credential and endpoint information
|
||||
is stored in the Kubernetes cluster from a Pod's intent to access that Service
|
||||
(e.g. in declaring it wants to access a Service, a Pod is automatically injected
|
||||
with the credential and endpoint data required to do so).
|
||||
|
||||
## Proposed Changes
|
||||
|
||||
### PodPreset API object
|
||||
|
||||
This resource is alpha. The policy itself is immutable. The API will be
added to the new group `settings`, and the version is `v1alpha1`.
|
||||
|
||||
```go
|
||||
// PodPreset is a policy resource that defines additional runtime
|
||||
// requirements for a Pod.
|
||||
type PodPreset struct {
|
||||
unversioned.TypeMeta
|
||||
ObjectMeta
|
||||
|
||||
// +optional
|
||||
Spec PodPresetSpec
|
||||
}
|
||||
|
||||
// PodPresetSpec is a description of a pod preset.
|
||||
type PodPresetSpec struct {
|
||||
// Selector is a label query over a set of resources, in this case pods.
|
||||
// Required.
|
||||
Selector unversioned.LabelSelector
|
||||
// Env defines the collection of EnvVar to inject into containers.
|
||||
// +optional
|
||||
Env []EnvVar
|
||||
// EnvFrom defines the collection of EnvFromSource to inject into
|
||||
// containers.
|
||||
// +optional
|
||||
EnvFrom []EnvFromSource
|
||||
// Volumes defines the collection of Volume to inject into the pod.
|
||||
// +optional
|
||||
Volumes []Volume
|
||||
// VolumeMounts defines the collection of VolumeMount to inject into
|
||||
// containers.
|
||||
// +optional
|
||||
VolumeMounts []VolumeMount
|
||||
}
|
||||
```
|
||||
|
||||
#### Validations
|
||||
|
||||
In order for the Pod Preset to be valid it must fulfill the
|
||||
following constraints:
|
||||
|
||||
- The `Selector` field must be defined and cannot be empty. This is how we know
  which pods to inject, so it is required.
|
||||
- The policy must define _at least_ 1 of `Env`, `EnvFrom`, or `Volumes` with
|
||||
corresponding `VolumeMounts`.
|
||||
- If a `Volume` is defined, a corresponding `VolumeMount` must also be defined.
|
||||
- For `Env`, `EnvFrom`, `Volumes`, and `VolumeMounts` all existing API
|
||||
validations are applied.
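
To make these constraints concrete, here is a minimal validation sketch; the
stand-in types and the `validatePodPresetSpec` name are illustrative only and
are not part of the proposed API:

```go
package settings

import "errors"

// Simplified stand-ins for the API types above; illustrative only.
type presetSpec struct {
	Selector     map[string]string // stand-in for a LabelSelector
	Env          []envVar
	EnvFrom      []envFromSource
	Volumes      []volume
	VolumeMounts []volumeMount
}

type (
	envVar        struct{ Name, Value string }
	envFromSource struct{ ConfigMapName string }
	volume        struct{ Name string }
	volumeMount   struct{ Name, MountPath string }
)

// validatePodPresetSpec mirrors the constraints listed above.
func validatePodPresetSpec(spec presetSpec) error {
	if len(spec.Selector) == 0 {
		return errors.New("selector is required and must not be empty")
	}
	if len(spec.Env) == 0 && len(spec.EnvFrom) == 0 && len(spec.Volumes) == 0 {
		return errors.New("at least one of env, envFrom, or volumes must be specified")
	}
	if len(spec.Volumes) > 0 && len(spec.VolumeMounts) == 0 {
		return errors.New("volumes defined without corresponding volumeMounts")
	}
	// The standard per-field API validations for Env, EnvFrom, Volumes,
	// and VolumeMounts would also be applied here.
	return nil
}
```
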
|
||||
|
||||
This resource will be immutable; if you want to change something, delete the
old policy and create a new one. We can make the resource mutable in the
future, but by disallowing mutation now we avoid breaking people later.
|
||||
|
||||
#### Conflicts
|
||||
|
||||
There are a number of edge conditions that might occur at the time of
|
||||
injection. These are as follows:
|
||||
|
||||
- Merging lists with no conflicts: a pod may already have a `Volume`,
  `VolumeMount`, or `EnvVar` defined **exactly** as it is defined in the
  PodPreset. No error occurs since the definitions are identical. The
  motivation is that services which have not yet fully converted to using pod
  injection policies may have duplicated this information, and an error should
  obviously not be thrown if the items that need to be injected already exist
  and are exactly the same.
|
||||
- Merging lists with conflicts: if a PodPreset redefines an `EnvVar` or a
  `Volume` with a different value, an event describing the conflict is recorded
  on the pod and nothing is injected.
|
||||
- Conflicts between `Env` and `EnvFrom`: this also results in an event on the
  pod describing the conflict. Nothing is injected.
|
||||
|
||||
> **Note:** In the case of a conflict nothing will be injected. The entire
> policy is ignored and an event detailing the conflict is recorded on the pod.
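
As an illustration of the merge rule for environment variables, the sketch
below tolerates exact duplicates and rejects redefinitions; the `mergeEnv`
helper is illustrative only and assumes conflict handling as described above:

```go
package settings

import "fmt"

type envVar struct{ Name, Value string }

// mergeEnv merges preset environment variables into a container's existing
// ones. An exact duplicate is ignored; a redefinition with a different value
// is a conflict, in which case nothing from the preset should be applied and
// an event describing the conflict is recorded on the pod.
func mergeEnv(existing, preset []envVar) ([]envVar, error) {
	seen := map[string]string{}
	for _, e := range existing {
		seen[e.Name] = e.Value
	}
	merged := append([]envVar(nil), existing...)
	for _, p := range preset {
		if v, ok := seen[p.Name]; ok {
			if v == p.Value {
				continue // identical definition already present; not a conflict
			}
			return nil, fmt.Errorf("conflict on env var %q: %q != %q", p.Name, v, p.Value)
		}
		merged = append(merged, p)
	}
	return merged, nil
}
```
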
|
||||
|
||||
### AdmissionControl Plug-in: PodPreset
|
||||
|
||||
The **PodPreset** plug-in introspects all incoming pod creation
requests and, based on a matching `Selector`, injects the desired
attributes into the pod.
|
||||
|
||||
For the initial alpha, multiple matching `PodPreset` specs are applied in
order from oldest to newest. Nevertheless, all Pod Injection
Policies in a namespace should be order agnostic, since the order of
application is not guaranteed; users should ensure that policies do not overlap.
However, we can use merge keys to detect some of the conflicts that may occur.
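
For illustration, ordering matching presets from oldest to newest could look
like the sketch below; the `podPreset` stand-in and the `sortByAge` name are
illustrative only:

```go
package settings

import (
	"sort"
	"time"
)

// Illustrative stand-in carrying only what ordering needs.
type podPreset struct {
	Name              string
	CreationTimestamp time.Time
}

// sortByAge orders matching presets oldest first, which is the order in
// which they are applied for the initial alpha.
func sortByAge(presets []podPreset) {
	sort.Slice(presets, func(i, j int) bool {
		return presets[i].CreationTimestamp.Before(presets[j].CreationTimestamp)
	})
}
```
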
|
||||
|
||||
This will not be enabled by default for all clusters, but once it reaches GA it
will be part of the set of strongly recommended plug-ins documented
[here](https://kubernetes.io/docs/admin/admission-controllers/#is-there-a-recommended-set-of-plug-ins-to-use).
|
||||
|
||||
**Why not an Initializer?**
|
||||
|
||||
This will be first implemented as an AdmissionControl plug-in then can be
|
||||
converted to an Initializer once that is fully ready. The proposal for
|
||||
Initializers can be found at [kubernetes/community#132](https://github.com/kubernetes/community/pull/132).
|
||||
|
||||
|
||||
#### Behavior
|
||||
|
||||
This will modify the pod spec. The supported changes to
`Env`, `EnvFrom`, and `VolumeMounts` apply to the container spec of
every container in a pod matching the specified `Selector`. The
changes to `Volumes` apply to the pod spec of all pods matching the `Selector`.
|
||||
|
||||
The resulting modified pod spec will be annotated to show that it was modified by
the `PodPreset`. The annotation is of the form
`podpreset.admission.kubernetes.io/<podpreset name>: "<resource version>"`.
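
A rough sketch of the injection step under these rules is shown below. It uses
simplified stand-in types rather than the real API objects, and assumes that
conflict detection has already passed:

```go
package settings

// Simplified stand-ins; the real plug-in operates on api.Pod and the
// PodPreset type defined earlier in this proposal.
type (
	envVar      struct{ Name, Value string }
	volume      struct{ Name string }
	volumeMount struct{ Name, MountPath string }
)

type container struct {
	Env          []envVar
	VolumeMounts []volumeMount
}

type pod struct {
	Annotations map[string]string
	Containers  []container
	Volumes     []volume
}

type preset struct {
	Name            string
	ResourceVersion string
	Env             []envVar
	VolumeMounts    []volumeMount
	Volumes         []volume
}

// applyPreset injects the preset into every container of the pod and records
// the podpreset.admission.kubernetes.io/<name> annotation. Conflict checking
// (see above) is assumed to have already succeeded.
func applyPreset(p *pod, pp preset) {
	for i := range p.Containers {
		p.Containers[i].Env = append(p.Containers[i].Env, pp.Env...)
		p.Containers[i].VolumeMounts = append(p.Containers[i].VolumeMounts, pp.VolumeMounts...)
	}
	p.Volumes = append(p.Volumes, pp.Volumes...)
	if p.Annotations == nil {
		p.Annotations = map[string]string{}
	}
	p.Annotations["podpreset.admission.kubernetes.io/"+pp.Name] = pp.ResourceVersion
}
```
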
|
||||
|
||||
*Why modify all containers in a pod?*
|
||||
|
||||
Currently there is no concept of labels on specific containers in a pod, which
would be necessary for per-container pod injections. We could add labels
for specific containers, which would allow this and be the best solution for not
injecting into all of them. Container labels have been discussed various times through
multiple issues and proposals, which all converge on this thread on the
[kubernetes-sig-node mailing
list](https://groups.google.com/forum/#!topic/kubernetes-sig-node/gijxbYC7HT8).
In the future, even if container labels were added, we would need to be careful
not to make breaking changes to the current behavior.
|
||||
|
||||
Other solutions include selecting the container to inject into by matching
its name against another field in the `PodPreset` spec, but
this would not scale well and would complicate configuration
management.
|
||||
|
||||
In the future we might question whether we need or want containers to express
|
||||
that they expect injection. At this time we are deferring this issue.
|
||||
|
||||
## Examples
|
||||
|
||||
### Simple Pod Spec Example
|
||||
|
||||
This is a simple example to show how a Pod spec is modified by the Pod
|
||||
Injection Policy.
|
||||
|
||||
**User submitted pod spec:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
spec:
|
||||
containers:
|
||||
- name: website
|
||||
image: ecorp/website
|
||||
ports:
|
||||
- containerPort: 80
|
||||
```
|
||||
|
||||
**Example Pod Preset:**
|
||||
|
||||
```yaml
|
||||
kind: PodPreset
|
||||
apiVersion: settings/v1alpha1
|
||||
metadata:
|
||||
name: allow-database
|
||||
namespace: myns
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
role: frontend
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
**Pod spec after admission controller:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
annotations:
|
||||
podpreset.admission.kubernetes.io/allow-database: "resource version"
|
||||
spec:
|
||||
containers:
|
||||
- name: website
|
||||
image: ecorp/website
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
ports:
|
||||
- containerPort: 80
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
### Pod Spec with `ConfigMap` Example
|
||||
|
||||
This is an example to show how a Pod spec is modified by the Pod Injection
|
||||
Policy that defines a `ConfigMap` for Environment Variables.
|
||||
|
||||
**User submitted pod spec:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
spec:
|
||||
containers:
|
||||
- name: website
|
||||
image: ecorp/website
|
||||
ports:
|
||||
- containerPort: 80
|
||||
```
|
||||
|
||||
**User submitted `ConfigMap`:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: etcd-env-config
|
||||
data:
|
||||
number_of_members: "1"
|
||||
initial_cluster_state: new
|
||||
initial_cluster_token: DUMMY_ETCD_INITIAL_CLUSTER_TOKEN
|
||||
discovery_token: DUMMY_ETCD_DISCOVERY_TOKEN
|
||||
discovery_url: http://etcd_discovery:2379
|
||||
etcdctl_peers: http://etcd:2379
|
||||
duplicate_key: FROM_CONFIG_MAP
|
||||
REPLACE_ME: "a value"
|
||||
```
|
||||
|
||||
**Example Pod Preset:**
|
||||
|
||||
```yaml
|
||||
kind: PodPreset
|
||||
apiVersion: settings/v1alpha1
|
||||
metadata:
|
||||
name: allow-database
|
||||
namespace: myns
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
role: frontend
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
- name: duplicate_key
|
||||
value: FROM_ENV
|
||||
- name: expansion
|
||||
value: $(REPLACE_ME)
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: etcd-env-config
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
- mountPath: /etc/app/config.json
|
||||
readOnly: true
|
||||
name: secret-volume
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
    - name: secret-volume
      secret:
        secretName: config-details
|
||||
```
|
||||
|
||||
**Pod spec after admission controller:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
annotations:
|
||||
podpreset.admission.kubernetes.io/allow-database: "resource version"
|
||||
spec:
|
||||
containers:
|
||||
- name: website
|
||||
image: ecorp/website
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
- mountPath: /etc/app/config.json
|
||||
readOnly: true
|
||||
name: secret-volume
|
||||
ports:
|
||||
- containerPort: 80
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
- name: duplicate_key
|
||||
value: FROM_ENV
|
||||
- name: expansion
|
||||
value: $(REPLACE_ME)
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: etcd-env-config
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
    - name: secret-volume
      secret:
        secretName: config-details
|
||||
```
|
||||
|
||||
### ReplicaSet with Pod Spec Example
|
||||
|
||||
The following example shows that only the pod spec is modified by the Pod
|
||||
Injection Policy.
|
||||
|
||||
**User submitted ReplicaSet:**
|
||||
|
||||
```yaml
|
||||
apiVersion: extensions/v1beta1
|
||||
kind: ReplicaSet
|
||||
metadata:
|
||||
name: frontend
|
||||
spec:
|
||||
replicas: 3
|
||||
selector:
|
||||
matchLabels:
|
||||
tier: frontend
|
||||
matchExpressions:
|
||||
- {key: tier, operator: In, values: [frontend]}
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: guestbook
|
||||
tier: frontend
|
||||
spec:
|
||||
containers:
|
||||
- name: php-redis
|
||||
image: gcr.io/google_samples/gb-frontend:v3
|
||||
resources:
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 100Mi
|
||||
env:
|
||||
- name: GET_HOSTS_FROM
|
||||
value: dns
|
||||
ports:
|
||||
- containerPort: 80
|
||||
```
|
||||
|
||||
**Example Pod Preset:**
|
||||
|
||||
```yaml
|
||||
kind: PodPreset
|
||||
apiVersion: settings/v1alpha1
|
||||
metadata:
|
||||
name: allow-database
|
||||
namespace: myns
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
tier: frontend
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
**Pod spec after admission controller:**
|
||||
|
||||
```yaml
|
||||
kind: Pod
|
||||
metadata:
|
||||
labels:
|
||||
app: guestbook
|
||||
tier: frontend
|
||||
annotations:
|
||||
podpreset.admission.kubernetes.io/allow-database: "resource version"
|
||||
spec:
|
||||
containers:
|
||||
- name: php-redis
|
||||
image: gcr.io/google_samples/gb-frontend:v3
|
||||
resources:
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 100Mi
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
env:
|
||||
- name: GET_HOSTS_FROM
|
||||
value: dns
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
ports:
|
||||
- containerPort: 80
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
### Multiple PodPreset Example
|
||||
|
||||
This is an example to show how a Pod spec is modified by multiple Pod
|
||||
Injection Policies.
|
||||
|
||||
**User submitted pod spec:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
spec:
|
||||
containers:
|
||||
- name: website
|
||||
image: ecorp/website
|
||||
ports:
|
||||
- containerPort: 80
|
||||
```
|
||||
|
||||
**Example Pod Preset:**
|
||||
|
||||
```yaml
|
||||
kind: PodPreset
|
||||
apiVersion: settings/v1alpha1
|
||||
metadata:
|
||||
name: allow-database
|
||||
namespace: myns
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
role: frontend
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
**Another Pod Preset:**
|
||||
|
||||
```yaml
|
||||
kind: PodPreset
|
||||
apiVersion: settings/v1alpha1
|
||||
metadata:
|
||||
name: proxy
|
||||
namespace: myns
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
role: frontend
|
||||
volumeMounts:
|
||||
- mountPath: /etc/proxy/configs
|
||||
name: proxy-volume
|
||||
volumes:
|
||||
- name: proxy-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
**Pod spec after admission controller:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
annotations:
|
||||
podpreset.admission.kubernetes.io/allow-database: "resource version"
|
||||
podpreset.admission.kubernetes.io/proxy: "resource version"
|
||||
spec:
|
||||
containers:
|
||||
- name: website
|
||||
image: ecorp/website
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: cache-volume
|
||||
- mountPath: /etc/proxy/configs
|
||||
name: proxy-volume
|
||||
ports:
|
||||
- containerPort: 80
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
volumes:
|
||||
- name: cache-volume
|
||||
emptyDir: {}
|
||||
- name: proxy-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
### Conflict Example
|
||||
|
||||
This is an example to show how a Pod spec is not modified by the Pod Injection
|
||||
Policy when there is a conflict.
|
||||
|
||||
**User submitted pod spec:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
spec:
  containers:
    - name: website
      image: ecorp/website
      volumeMounts:
        - mountPath: /cache
          name: cache-volume
      ports:
        - containerPort: 80
  volumes:
    - name: cache-volume
      emptyDir: {}
|
||||
```
|
||||
|
||||
**Example Pod Preset:**
|
||||
|
||||
```yaml
|
||||
kind: PodPreset
|
||||
apiVersion: settings/v1alpha1
|
||||
metadata:
|
||||
name: allow-database
|
||||
namespace: myns
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
role: frontend
|
||||
env:
|
||||
- name: DB_PORT
|
||||
value: 6379
|
||||
volumeMounts:
|
||||
- mountPath: /cache
|
||||
name: other-volume
|
||||
volumes:
|
||||
- name: other-volume
|
||||
emptyDir: {}
|
||||
```
|
||||
|
||||
**Pod spec after admission controller will not change because of the conflict:**
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: website
|
||||
labels:
|
||||
app: website
|
||||
role: frontend
|
||||
spec:
  containers:
    - name: website
      image: ecorp/website
      volumeMounts:
        - mountPath: /cache
          name: cache-volume
      ports:
        - containerPort: 80
  volumes:
    - name: cache-volume
      emptyDir: {}
```
|
||||
|
||||
**If we run `kubectl describe...` we can see the event:**
|
||||
|
||||
```
|
||||
$ kubectl describe ...
|
||||
....
|
||||
Events:
|
||||
FirstSeen LastSeen Count From SubobjectPath Reason Message
|
||||
Tue, 07 Feb 2017 16:56:12 -0700 Tue, 07 Feb 2017 16:56:12 -0700 1 {podpreset.admission.kubernetes.io/allow-database } conflict Conflict on pod preset. Duplicate mountPath /cache.
|
||||
```
|
||||
|
|
@ -1,114 +1,480 @@
|
|||
# Pod level resource management in Kubelet
|
||||
# Kubelet pod level resource management
|
||||
|
||||
**Author**: Buddha Prakash (@dubstack), Vishnu Kannan (@vishh)
|
||||
**Authors**:
|
||||
|
||||
**Last Updated**: 06/23/2016
|
||||
1. Buddha Prakash (@dubstack)
|
||||
1. Vishnu Kannan (@vishh)
|
||||
1. Derek Carr (@derekwaynecarr)
|
||||
|
||||
**Status**: Draft Proposal (WIP)
|
||||
**Last Updated**: 02/21/2017
|
||||
|
||||
This document proposes a design for introducing pod level resource accounting to Kubernetes, and outlines the implementation and rollout plan.
|
||||
**Status**: Implementation planned for Kubernetes 1.6
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Pod level resource management in Kubelet](#pod-level-resource-management-in-kubelet)
|
||||
- [Introduction](#introduction)
|
||||
- [Non Goals](#non-goals)
|
||||
- [Motivations](#motivations)
|
||||
- [Design](#design)
|
||||
- [Proposed cgroup hierarchy:](#proposed-cgroup-hierarchy)
|
||||
- [QoS classes](#qos-classes)
|
||||
- [Guaranteed](#guaranteed)
|
||||
- [Burstable](#burstable)
|
||||
- [Best Effort](#best-effort)
|
||||
- [With Systemd](#with-systemd)
|
||||
- [Hierarchy Outline](#hierarchy-outline)
|
||||
- [QoS Policy Design Decisions](#qos-policy-design-decisions)
|
||||
- [Implementation Plan](#implementation-plan)
|
||||
- [Top level Cgroups for QoS tiers](#top-level-cgroups-for-qos-tiers)
|
||||
- [Pod level Cgroup creation and deletion (Docker runtime)](#pod-level-cgroup-creation-and-deletion-docker-runtime)
|
||||
- [Container level cgroups](#container-level-cgroups)
|
||||
- [Rkt runtime](#rkt-runtime)
|
||||
- [Add Pod level metrics to Kubelet's metrics provider](#add-pod-level-metrics-to-kubelets-metrics-provider)
|
||||
- [Rollout Plan](#rollout-plan)
|
||||
- [Implementation Status](#implementation-status)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
This document proposes a design for introducing pod level resource accounting
|
||||
to Kubernetes. It outlines the implementation and associated rollout plan.
|
||||
|
||||
## Introduction
|
||||
|
||||
As of now [Quality of Service(QoS)](../../docs/design/resource-qos.md) is not enforced at a pod level. Excepting pod evictions, all the other QoS features are not applicable at the pod level.
|
||||
To better support QoS, there is a need to add support for pod level resource accounting in Kubernetes.
|
||||
Kubernetes supports container level isolation by allowing users
|
||||
to specify [compute resource requirements](resources.md) via requests and
|
||||
limits on individual containers. The `kubelet` delegates creation of a
|
||||
cgroup sandbox for each container to its associated container runtime.
|
||||
|
||||
We propose to have a unified cgroup hierarchy with pod level cgroups for better resource management. We will have a cgroup hierarchy with top level cgroups for the three QoS classes Guaranteed, Burstable and BestEffort. Pods (and their containers) belonging to a QoS class will be grouped under these top level QoS cgroups. And all containers in a pod are nested under the pod cgroup.
|
||||
Each pod has an associated [Quality of Service (QoS)](resource-qos.md)
|
||||
class based on the aggregate resource requirements made by individual
|
||||
containers in the pod. The `kubelet` has the ability to
|
||||
[evict pods](kubelet-eviction.md) when compute resources are scarce. It evicts
|
||||
pods with the lowest QoS class in order to attempt to maintain stability of the
|
||||
node.
|
||||
|
||||
The proposed cgroup hierarchy would allow for more efficient resource management and lead to improvements in node reliability.
|
||||
This would also allow for significant latency optimizations in terms of pod eviction on nodes with the use of pod level resource usage metrics.
|
||||
This document provides a basic outline of how we plan to implement and rollout this feature.
|
||||
The `kubelet` has no associated cgroup sandbox for individual QoS classes or
|
||||
individual pods. This inhibits the ability to perform proper resource
|
||||
accounting on the node, and introduces a number of code complexities when
|
||||
trying to build features around QoS.
|
||||
|
||||
This design introduces a new cgroup hierarchy to enable the following:
|
||||
|
||||
## Non Goals
|
||||
1. Enforce QoS classes on the node.
|
||||
1. Simplify resource accounting at the pod level.
|
||||
1. Allow containers in a pod to share slack resources within its pod cgroup.
|
||||
For example, a Burstable pod has two containers, where one container makes a
|
||||
CPU request and the other container does not. The latter container should
|
||||
get CPU time not used by the former container. Today, it must compete for
|
||||
scarce resources at the node level across all BestEffort containers.
|
||||
1. Ability to charge per container overhead to the pod instead of the node.
|
||||
This overhead is container runtime specific. For example, `docker` has
|
||||
an associated `containerd-shim` process that is created for each container
|
||||
which should be charged to the pod.
|
||||
1. Ability to charge any memory usage of memory-backed volumes to the pod when
|
||||
an individual container exits instead of the node.
|
||||
|
||||
- Pod level disk accounting will not be tackled in this proposal.
|
||||
- Pod level resource specification in the Kubernetes API will not be tackled in this proposal.
|
||||
## Enabling QoS and Pod level cgroups
|
||||
|
||||
## Motivations
|
||||
To enable the new cgroup hierarchy, the operator must enable the
|
||||
`--cgroups-per-qos` flag. Once enabled, the `kubelet` will start managing
|
||||
inner nodes of the described cgroup hierarchy.
|
||||
|
||||
Kubernetes currently supports container level isolation only and lets users specify resource requests/limits on the containers [Compute Resources](../../docs/design/resources.md). The `kubelet` creates a cgroup sandbox (via it's container runtime) for each container.
|
||||
If the `--cgroup-root` flag is not specified when the `--cgroups-per-qos` flag
is enabled, it will default to `/`. The `kubelet` will parent any cgroups
it creates below that specified value, per the
[node allocatable](node-allocatable.md) design.
|
||||
|
||||
## Configuring a cgroup driver
|
||||
|
||||
There are a few shortcomings to the current model.
|
||||
- Existing QoS support does not apply to pods as a whole. On-going work to support pod level eviction using QoS requires all containers in a pod to belong to the same class. By having pod level cgroups, it is easy to track pod level usage and make eviction decisions.
|
||||
- Infrastructure overhead per pod is currently charged to the node. The overhead of setting up and managing the pod sandbox is currently accounted to the node. If the pod sandbox is a bit expensive, like in the case of hyper, having pod level accounting becomes critical.
|
||||
- For the docker runtime we have a containerd-shim which is a small library that sits in front of a runtime implementation allowing it to be reparented to init, handle reattach from the caller etc. With pod level cgroups containerd-shim can be charged to the pod instead of the machine.
|
||||
- If a container exits, all its anonymous pages (tmpfs) gets accounted to the machine (root). With pod level cgroups, that usage can also be attributed to the pod.
|
||||
- Let containers share resources - with pod level limits, a pod with a Burstable container and a BestEffort container is classified as Burstable pod. The BestEffort container is able to consume slack resources not used by the Burstable container, and still be capped by the overall pod level limits.
|
||||
The `kubelet` will support manipulation of the cgroup hierarchy on
|
||||
the host using a cgroup driver. The driver is configured via the
|
||||
`--cgroup-driver` flag.
|
||||
|
||||
## Design
|
||||
The supported values are the following:
|
||||
|
||||
High level requirements for the design are as follows:
|
||||
- Do not break existing users. Ideally, there should be no changes to the Kubernetes API semantics.
|
||||
- Support multiple cgroup managers - systemd, cgroupfs, etc.
|
||||
* `cgroupfs` is the default driver that performs direct manipulation of the
|
||||
cgroup filesystem on the host in order to manage cgroup sandboxes.
|
||||
* `systemd` is an alternative driver that manages cgroup sandboxes using
|
||||
transient slices for resources that are supported by that init system.
|
||||
|
||||
How we intend to achieve these high level goals is covered in greater detail in the Implementation Plan.
|
||||
Depending on the configuration of the associated container runtime,
|
||||
operators may have to choose a particular cgroup driver to ensure
|
||||
proper system behavior. For example, if operators use the `systemd`
|
||||
cgroup driver provided by the `docker` runtime, the `kubelet` must
|
||||
be configured to use the `systemd` cgroup driver.
|
||||
|
||||
We use the following denotations in the sections below:
|
||||
Implementation of either driver will delegate to the libcontainer library
|
||||
in opencontainers/runc.
|
||||
|
||||
For the three QoS classes
|
||||
`G⇒ Guaranteed QoS, Bu⇒ Burstable QoS, BE⇒ BestEffort QoS`
|
||||
### Conversion of cgroupfs to systemd naming conventions
|
||||
|
||||
For the value specified for the --qos-memory-overcommitment flag
|
||||
`qmo⇒ qos-memory-overcommitment`
|
||||
Internally, the `kubelet` maintains both an abstract and a concrete name
|
||||
for its associated cgroup sandboxes. The abstract name follows the traditional
|
||||
`cgroupfs` style syntax. The concrete name is the name for how the cgroup
|
||||
sandbox actually appears on the host filesystem after any conversions performed
|
||||
based on the cgroup driver.
|
||||
|
||||
Currently the Kubelet highly prioritizes resource utilization and thus allows BE pods to use as many resources as they want, and in case of OOM the BE pods are the first to be killed. We follow this policy as G pods often don't use the full amount of resources they request. By overcommitting the node, the BE pods are able to utilize these leftover resources. In case of OOM the BE pods are evicted by the eviction manager, but there is some latency involved in the pod eviction process, which can be a cause of concern for latency-sensitive servers. On such servers we would want to avoid OOM conditions on the node. Pod level cgroups allow us to restrict the amount of resources available to the BE pods, so reserving the requested resources for the G and Bu pods allows us to avoid invoking the OOM killer.
|
||||
If the `systemd` cgroup driver is used, the `kubelet` converts the `cgroupfs`
|
||||
style syntax into transient slices, and as a result, it must follow `systemd`
|
||||
conventions for path encoding.
|
||||
|
||||
For example, the cgroup name `/burstable/pod123-456` is translated to a
|
||||
transient slice with the name `burstable-pod123_456.slice`. Given how
|
||||
systemd manages the cgroup filesystem, the concrete name for the cgroup
|
||||
sandbox becomes `/burstable.slice/burstable-pod123_456.slice`.
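
A rough sketch of that conversion, assuming the only escaping needed is
replacing dashes inside a path component with underscores; the helper names
are illustrative, and the real `kubelet` delegates this work to libcontainer:

```go
package cm

import (
	"path"
	"strings"
)

// toSystemdSlice converts an abstract cgroupfs-style name such as
// "/burstable/pod123-456" into a systemd transient slice name such as
// "burstable-pod123_456.slice". Illustrative only.
func toSystemdSlice(cgroupfsName string) string {
	parts := strings.Split(strings.Trim(cgroupfsName, "/"), "/")
	for i, p := range parts {
		parts[i] = strings.Replace(p, "-", "_", -1)
	}
	return strings.Join(parts, "-") + ".slice"
}

// concretePath shows where systemd places that slice on the host
// filesystem: each ancestor slice becomes a directory.
func concretePath(slice string) string {
	trimmed := strings.TrimSuffix(slice, ".slice")
	segments := strings.Split(trimmed, "-")
	p := "/"
	for i := range segments {
		p = path.Join(p, strings.Join(segments[:i+1], "-")+".slice")
	}
	return p
}
```

With the example above, `toSystemdSlice("/burstable/pod123-456")` yields
`burstable-pod123_456.slice`, and `concretePath` of that slice yields
`/burstable.slice/burstable-pod123_456.slice`.
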
|
||||
|
||||
We add a flag `qos-memory-overcommitment` to kubelet which would allow users to configure the percentage of memory overcommitment on the node. We have the default as 100, so by default we allow complete overcommitment on the node and let the BE pod use as much memory as it wants, and not reserve any resources for the G and Bu pods. As expected if there is an OOM in such a case we first kill the BE pods before the G and Bu pods.
|
||||
On the other hand, if users want to ensure very predictable tail latency for latency-sensitive servers, they would need to set qos-memory-overcommitment to a really low value (preferably 0). In this case memory resources would be reserved for the G and Bu pods, and BE pods would be able to use only the leftover memory resources.
|
||||
## Integration with container runtimes
|
||||
|
||||
Examples in the next section.
|
||||
The `kubelet` when integrating with container runtimes always provides the
|
||||
concrete cgroup filesystem name for the pod sandbox.
|
||||
|
||||
### Proposed cgroup hierarchy:
|
||||
## Conversion of CPU millicores to cgroup configuration
|
||||
|
||||
For the initial implementation we will only support limits for cpu and memory resources.
|
||||
Kubernetes measures CPU requests and limits in millicores.
|
||||
|
||||
#### QoS classes
|
||||
The following formula is used to convert CPU in millicores to cgroup values:
|
||||
|
||||
A pod can belong to one of the following 3 QoS classes: Guaranteed, Burstable, and BestEffort, in decreasing order of priority.
|
||||
* cpu.shares = (cpu in millicores * 1024) / 1000
|
||||
* cpu.cfs_period_us = 100000 (i.e. 100ms)
|
||||
* cpu.cfs_quota_us = (cpu in millicores * 100000) / 1000
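
Expressed as small helper functions (an illustrative sketch; the kubelet has
equivalent helpers internally, and the names here are not authoritative):

```go
package cm

const (
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
	quotaPeriod   = 100000 // cpu.cfs_period_us, i.e. 100ms
)

// milliCPUToShares converts a CPU request in millicores to cpu.shares.
func milliCPUToShares(milliCPU int64) int64 {
	return milliCPU * sharesPerCPU / milliCPUToCPU
}

// milliCPUToQuota converts a CPU limit in millicores to cpu.cfs_quota_us
// relative to the fixed 100ms period.
func milliCPUToQuota(milliCPU int64) int64 {
	return milliCPU * quotaPeriod / milliCPUToCPU
}
```

For example, 500 millicores maps to 512 shares and a 50000us quota per 100ms period.
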
|
||||
|
||||
#### Guaranteed
|
||||
## Pod level cgroups
|
||||
|
||||
`G` pods will be placed at the `$Root` cgroup by default. `$Root` is the system root i.e. "/" by default and if `--cgroup-root` flag is used then we use the specified cgroup-root as the `$Root`. To ensure Kubelet's idempotent behaviour we follow a pod cgroup naming format which is opaque and deterministic. Say we have a pod with UID: `5f9b19c9-3a30-11e6-8eea-28d2444e470d` the pod cgroup PodUID would be named: `pod-5f9b19c93a3011e6-8eea28d2444e470d`.
|
||||
The `kubelet` will create a cgroup sandbox for each pod.
|
||||
|
||||
The naming convention for the cgroup sandbox is `pod<pod.UID>`. It enables
|
||||
the `kubelet` to associate a particular cgroup on the host filesystem
|
||||
with a corresponding pod without managing any additional state. This is useful
|
||||
when the `kubelet` restarts and needs to verify the cgroup filesystem.
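
For illustration, the naming and parenting might be sketched as follows; the
helper names are hypothetical, and the empty QoS sandbox case corresponds to
Guaranteed pods, which are parented directly under `ROOT` as described later
in this document:

```go
package cm

import "path"

// podCgroupName derives the pod-level cgroup name from the pod UID alone, so
// the kubelet can re-associate cgroups with pods after a restart without
// keeping extra state. Illustrative sketch only.
func podCgroupName(podUID string) string {
	return "pod" + podUID
}

// podCgroupPath parents the pod cgroup under its QoS sandbox ("burstable" or
// "besteffort"); pass an empty qosSandbox for Guaranteed pods, which sit
// directly under the ROOT sandbox.
func podCgroupPath(root, qosSandbox, podUID string) string {
	return path.Join(root, qosSandbox, podCgroupName(podUID))
}
```
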
|
||||
|
||||
__Note__: The cgroup-root flag would allow the user to configure the root of the QoS cgroup hierarchy. Hence cgroup-root would be redefined as the root of QoS cgroup hierarchy and not containers.
|
||||
A pod can belong to one of the following 3 QoS classes in decreasing priority:
|
||||
|
||||
1. Guaranteed
|
||||
1. Burstable
|
||||
1. BestEffort
|
||||
|
||||
The resource configuration for the cgroup sandbox is dependent upon the
|
||||
pod's associated QoS class.
|
||||
|
||||
### Guaranteed QoS
|
||||
|
||||
A pod in this QoS class has its cgroup sandbox configured as follows:
|
||||
|
||||
```
|
||||
/PodUID/cpu.quota = cpu limit of Pod
|
||||
/PodUID/cpu.shares = cpu request of Pod
|
||||
/PodUID/memory.limit_in_bytes = memory limit of Pod
|
||||
pod<UID>/cpu.shares = sum(pod.spec.containers.resources.requests[cpu])
|
||||
pod<UID>/cpu.cfs_quota_us = sum(pod.spec.containers.resources.limits[cpu])
|
||||
pod<UID>/memory.limit_in_bytes = sum(pod.spec.containers.resources.limits[memory])
|
||||
```
|
||||
|
||||
Example:
|
||||
### Burstable QoS
|
||||
|
||||
A pod in this QoS class has its cgroup sandbox configured as follows:
|
||||
|
||||
```
|
||||
pod<UID>/cpu.shares = sum(pod.spec.containers.resources.requests[cpu])
|
||||
```
|
||||
|
||||
If all containers in the pod specify a cpu limit:
|
||||
|
||||
```
|
||||
pod<UID>/cpu.cfs_quota_us = sum(pod.spec.containers.resources.limits[cpu])
|
||||
```
|
||||
|
||||
Finally, if all containers in the pod specify a memory limit:
|
||||
|
||||
```
|
||||
pod<UID>/memory.limit_in_bytes = sum(pod.spec.containers.resources.limits[memory])
|
||||
```
|
||||
|
||||
### BestEffort QoS
|
||||
|
||||
A pod in this QoS class has its cgroup sandbox configured as follows:
|
||||
|
||||
```
|
||||
pod<UID>/cpu.shares = 2
|
||||
```
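
The per-QoS rules above can be summarized in a single sketch. The types and
the `computePodCgroup` name are illustrative stand-ins, not the kubelet's
actual structures:

```go
package cm

// containerResources is a simplified stand-in for a container's requests and
// limits; 0 means the value was not specified.
type containerResources struct {
	CPURequestMilli  int64
	CPULimitMilli    int64
	MemoryLimitBytes int64
}

// podCgroupConfig holds the values written to the pod level cgroup sandbox.
// A nil pointer means the corresponding cgroup value is left unset.
type podCgroupConfig struct {
	CPUShares        int64
	CPUQuota         *int64
	MemoryLimitBytes *int64
}

// computePodCgroup applies the per-QoS rules described above: BestEffort pods
// get the minimum cpu.shares, all other pods get shares from the sum of their
// CPU requests, and the quota / memory limits are only set when every
// container declares the corresponding limit. Illustrative sketch only.
func computePodCgroup(qosClass string, containers []containerResources) podCgroupConfig {
	if qosClass == "BestEffort" {
		return podCgroupConfig{CPUShares: 2}
	}
	var shares, quota, memory int64
	allCPULimits, allMemoryLimits := true, true
	for _, c := range containers {
		shares += c.CPURequestMilli * 1024 / 1000
		if c.CPULimitMilli == 0 {
			allCPULimits = false
		}
		quota += c.CPULimitMilli * 100000 / 1000
		if c.MemoryLimitBytes == 0 {
			allMemoryLimits = false
		}
		memory += c.MemoryLimitBytes
	}
	cfg := podCgroupConfig{CPUShares: shares}
	if allCPULimits {
		cfg.CPUQuota = &quota
	}
	if allMemoryLimits {
		cfg.MemoryLimitBytes = &memory
	}
	return cfg
}
```
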
|
||||
|
||||
## QoS level cgroups
|
||||
|
||||
The `kubelet` defines a `--cgroup-root` flag that is used to specify the `ROOT`
|
||||
node in the cgroup hierarchy below which the `kubelet` should manage individual
|
||||
cgroup sandboxes. It is strongly recommended that users keep the default
|
||||
value for `--cgroup-root` as `/` in order to avoid deep cgroup hierarchies. The
|
||||
`kubelet` creates a cgroup sandbox under the specified path `ROOT/kubepods` per
|
||||
[node allocatable](node-allocatable.md) to parent pods. For simplicity, we will
|
||||
refer to `ROOT/kubepods` as `ROOT` in this document.
|
||||
|
||||
The `ROOT` cgroup sandbox is used to parent all pod sandboxes that are in
|
||||
the Guaranteed QoS class. By definition, pods in this class have cpu and
|
||||
memory limits specified that are equivalent to their requests so the pod
|
||||
level cgroup sandbox confines resource consumption without the need of an
|
||||
additional cgroup sandbox for the tier.
|
||||
|
||||
When the `kubelet` launches, it will ensure a `Burstable` cgroup sandbox
|
||||
and a `BestEffort` cgroup sandbox exist as children of `ROOT`. These cgroup
|
||||
sandboxes will parent pod level cgroups in those associated QoS classes.
|
||||
|
||||
The `kubelet` highly prioritizes resource utilization, and thus
|
||||
allows BestEffort and Burstable pods to potentially consume as many
|
||||
resources that are presently available on the node.
|
||||
|
||||
For compressible resources like CPU, the `kubelet` attempts to mitigate
|
||||
the issue via its use of CPU CFS shares. CPU time is proportioned
|
||||
dynamically when there is contention using CFS shares that attempts to
|
||||
ensure minimum requests are satisfied.
|
||||
|
||||
For incompressible resources, this prioritization scheme can inhibit the
|
||||
ability of a pod to have its requests satisfied. For example, a Guaranteed
|
||||
pod's memory request may not be satisfied if there are active BestEffort
|
||||
pods consuming all available memory.
|
||||
|
||||
As a node operator, I may want to satisfy the following use cases:
|
||||
|
||||
1. I want to prioritize access to compressible resources for my system
|
||||
and/or kubernetes daemons over end-user pods.
|
||||
1. I want to prioritize access to compressible resources for my Guaranteed
|
||||
workloads over my Burstable workloads.
|
||||
1. I want to prioritize access to compressible resources for my Burstable
|
||||
workloads over my BestEffort workloads.
|
||||
|
||||
Almost all operators are encouraged to support the first use case by enforcing
|
||||
[node allocatable](node-allocatable.md) via `--system-reserved` and `--kube-reserved`
|
||||
flags. It is understood that not all operators may feel the need to extend
|
||||
that level of reservation to Guaranteed and Burstable workloads if they choose
|
||||
to prioritize utilization. That said, many users in the community deploy
|
||||
cluster services as Guaranteed or Burstable workloads via a `DaemonSet` and would like a similar
|
||||
resource reservation model as is provided via [node allocatable](node-allocatable)
|
||||
for system and kubernetes daemons.
|
||||
|
||||
For operators that have this concern, the `kubelet` with opt-in configuration
|
||||
will attempt to limit the ability of a pod in a lower QoS tier to burst utilization
|
||||
of a compressible resource that was requested by a pod in a higher QoS tier.
|
||||
|
||||
The `kubelet` will support a flag `experimental-qos-reserved` that
|
||||
takes a set of percentages per incompressible resource that controls how the
|
||||
QoS cgroup sandbox attempts to reserve resources for its tier. It attempts
|
||||
to reserve requested resources to exclude pods in lower QoS classes from
|
||||
using resources requested by higher QoS classes. The flag will accept values
|
||||
in a range from 0-100%, where a value of `0%` instructs the `kubelet` to attempt
|
||||
no reservation, and a value of `100%` will instruct the `kubelet` to attempt to
|
||||
reserve the sum of requested resource across all pods on the node. The `kubelet`
|
||||
initially will only support `memory`. The default value per incompressible
|
||||
resource if not specified is for no reservation to occur for the incompressible
|
||||
resource.
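
A sketch of how such a flag value might be parsed, assuming a comma-separated
`<resource>=<percent>%` format; the function name and the exact syntax are
illustrative only:

```go
package cm

import (
	"fmt"
	"strconv"
	"strings"
)

// parseQOSReserved parses a flag value such as "memory=50%" (comma separated
// for multiple resources) into a map of resource name to reserve percentage.
// Illustrative sketch; only memory is supported initially.
func parseQOSReserved(value string) (map[string]int64, error) {
	reserved := map[string]int64{}
	if value == "" {
		return reserved, nil
	}
	for _, pair := range strings.Split(value, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 || !strings.HasSuffix(kv[1], "%") {
			return nil, fmt.Errorf("invalid qos reservation %q, expected <resource>=<percent>%%", pair)
		}
		pct, err := strconv.ParseInt(strings.TrimSuffix(kv[1], "%"), 10, 64)
		if err != nil || pct < 0 || pct > 100 {
			return nil, fmt.Errorf("invalid percentage in %q: must be 0-100", pair)
		}
		reserved[kv[0]] = pct
	}
	return reserved, nil
}
```
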
|
||||
|
||||
Prior to starting a pod, the `kubelet` will attempt to update the
|
||||
QoS cgroup sandbox associated with the lower QoS tier(s) in order
|
||||
to prevent consumption of the requested resource by the new pod.
|
||||
For example, prior to starting a Guaranteed pod, the Burstable
|
||||
and BestEffort QoS cgroup sandboxes are adjusted. For resource
|
||||
specific details, and concerns, see the sections per resource that
|
||||
follow.
|
||||
|
||||
The `kubelet` will allocate resources to the QoS level cgroup
|
||||
dynamically in response to the following events:
|
||||
|
||||
1. kubelet startup/recovery
|
||||
1. prior to creation of the pod level cgroup
|
||||
1. after deletion of the pod level cgroup
|
||||
1. at periodic intervals, to converge on the desired state described by the
   `experimental-qos-reserved` heuristic.
|
||||
|
||||
All writes to the QoS level cgroup sandboxes are protected via a
|
||||
common lock in the kubelet to ensure we do not have multiple concurrent
|
||||
writers to this tier in the hierarchy.
|
||||
|
||||
### QoS level CPU allocation
|
||||
|
||||
The `BestEffort` cgroup sandbox is statically configured as follows:
|
||||
|
||||
```
|
||||
ROOT/besteffort/cpu.shares = 2
|
||||
```
|
||||
|
||||
This ensures that allocation of CPU time to pods in this QoS class
|
||||
is given the lowest priority.
|
||||
|
||||
The `Burstable` cgroup sandbox CPU share allocation is dynamic based
|
||||
on the set of pods currently scheduled to the node.
|
||||
|
||||
```
|
||||
ROOT/burstable/cpu.shares = max(sum(Burstable pods cpu requests), 2)
|
||||
```
|
||||
|
||||
The Burstable cgroup sandbox is updated dynamically in the exit
|
||||
points described in the previous section. Given the compressible
|
||||
nature of CPU, and the fact that cpu.shares are evaluated via relative
|
||||
priority, the risk of an update being incorrect is minimized as the `kubelet`
|
||||
converges to a desired state. Failure to set `cpu.shares` at the QoS level
|
||||
cgroup would result in `500m` of cpu for a Guaranteed pod to have different
|
||||
meaning than `500m` of cpu for a Burstable pod in the current hierarchy. This
|
||||
is because the default `cpu.shares` value if unspecified is `1024` and `cpu.shares`
|
||||
are evaluated relative to sibling nodes in the cgroup hierarchy. As a consequence,
|
||||
all of the Burstable pods under contention would have a relative priority of 1 cpu
|
||||
unless updated dynamically to capture the sum of requests. For this reason,
|
||||
we will always set `cpu.shares` for the QoS level sandboxes
|
||||
by default as part of roll-out for this feature.
|
||||
|
||||
### QoS level memory allocation
|
||||
|
||||
By default, no memory limits are applied to the BestEffort
and Burstable QoS level cgroups unless an `--experimental-qos-reserved` value
is specified for memory.
|
||||
|
||||
The heuristic that is applied is as follows for each QoS level sandbox:
|
||||
|
||||
```
|
||||
ROOT/burstable/memory.limit_in_bytes =
|
||||
Node.Allocatable - {(summation of memory requests of `Guaranteed` pods)*(reservePercent / 100)}
|
||||
ROOT/besteffort/memory.limit_in_bytes =
|
||||
Node.Allocatable - {(summation of memory requests of all `Guaranteed` and `Burstable` pods)*(reservePercent / 100)}
|
||||
```
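
For illustration, the heuristic can be written as a small helper (an
illustrative sketch; all values are in bytes and `reservePercent` comes from
the flag described above):

```go
package cm

// qosMemoryLimits applies the heuristic above: the Burstable sandbox may use
// node allocatable minus the reserved fraction of Guaranteed requests, and
// the BestEffort sandbox may use node allocatable minus the reserved fraction
// of Guaranteed plus Burstable requests.
func qosMemoryLimits(allocatable, guaranteedRequests, burstableRequests, reservePercent int64) (burstableLimit, bestEffortLimit int64) {
	burstableLimit = allocatable - guaranteedRequests*reservePercent/100
	bestEffortLimit = allocatable - (guaranteedRequests+burstableRequests)*reservePercent/100
	return burstableLimit, bestEffortLimit
}
```
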
|
||||
|
||||
A value of `--experimental-qos-reserved=memory=100%` will cause the
|
||||
`kubelet` to prevent the Burstable and BestEffort cgroups from consuming memory
|
||||
that was requested by a higher QoS class. This increases the risk
|
||||
of inducing OOM on BestEffort and Burstable workloads in favor of increasing
|
||||
memory resource guarantees for Guaranteed and Burstable workloads. A value of
|
||||
`--experimental-qos-reserved=memory=0%` will allow a Burstable
|
||||
and BestEffort QoS sandbox to consume up to the full node allocatable amount if
|
||||
available, but increases the risk that a Guaranteed workload will not have
|
||||
access to requested memory.
|
||||
|
||||
Since memory is an incompressible resource, it is possible that a QoS
|
||||
level cgroup sandbox may not be able to reduce memory usage below the
|
||||
value specified in the heuristic described earlier during pod admission
|
||||
and pod termination.
|
||||
|
||||
As a result, the `kubelet` runs a periodic thread to attempt to converge
|
||||
to this desired state from the above heuristic. If unreclaimable memory
|
||||
usage has exceeded the desired limit for the sandbox, the `kubelet` will
|
||||
attempt to set the effective limit near the current usage to put pressure
|
||||
on the QoS cgroup sandbox and prevent further consumption.
|
||||
|
||||
The `kubelet` will not wait for the QoS cgroup memory limit to converge
|
||||
to the desired state prior to execution of the pod, but it will always
|
||||
attempt to cap the existing usage of QoS cgroup sandboxes in lower tiers.
|
||||
This does mean that the new pod could induce an OOM event at the `ROOT`
|
||||
cgroup, but ideally per our QoS design, the oom_killer targets a pod
|
||||
in a lower QoS class, or eviction evicts a lower QoS pod. The periodic
|
||||
task is then able to converge to the steady desired state so any future
|
||||
pods in a lower QoS class do not impact the pod at a higher QoS class.
|
||||
|
||||
Adjusting the memory limits for the QoS level cgroup sandbox carries
|
||||
greater risk given the incompressible nature of memory. As a result,
|
||||
we are not enabling this function by default, but would like operators
|
||||
that want to value resource priority over resource utilization to gather
|
||||
real-world feedback on its utility.
|
||||
|
||||
As a best practice, operators that want to provide a similar resource
|
||||
reservation model for Guaranteed pods as we offer via enforcement of
|
||||
node allocatable are encouraged to schedule their Guaranteed pods first
|
||||
as it will ensure the Burstable and BestEffort tiers have had their QoS
|
||||
memory limits appropriately adjusted before the node takes on unbounded
workloads.
|
||||
|
||||
## Memory backed volumes
|
||||
|
||||
The pod level cgroup ensures that any writes to a memory backed volume
|
||||
are correctly charged to the pod sandbox even when a container process
|
||||
in the pod restarts.
|
||||
|
||||
All memory backed volumes are removed when a pod reaches a terminal state.
|
||||
|
||||
The `kubelet` verifies that a pod's cgroup is deleted from the
|
||||
host before deleting a pod from the API server as part of the graceful
|
||||
deletion process.
|
||||
|
||||
## Log basic cgroup management
|
||||
|
||||
The `kubelet` will log and collect metrics associated with cgroup manipulation.
|
||||
|
||||
It will log metrics for cgroup create, update, and delete actions.
|
||||
|
||||
## Rollout Plan
|
||||
|
||||
### Kubernetes 1.5
|
||||
|
||||
The support for the described cgroup hierarchy is experimental.
|
||||
|
||||
### Kubernetes 1.6+
|
||||
|
||||
The feature will be enabled by default.
|
||||
|
||||
As a result, we will recommend that users drain their nodes prior
|
||||
to upgrade of the `kubelet`. If users do not drain their nodes, the
|
||||
`kubelet` will act as follows:
|
||||
|
||||
1. If a pod has a `RestartPolicy=Never`, then mark the pod
|
||||
as `Failed` and terminate its workload.
|
||||
1. All other pods that are not parented by a pod-level cgroup
|
||||
will be restarted.
|
||||
|
||||
The `cgroups-per-qos` flag will be enabled by default, but users
may choose to opt out. We may deprecate this opt-out mechanism
|
||||
in Kubernetes 1.7, and remove the flag entirely in Kubernetes 1.8.
|
||||
|
||||
#### Risk Assessment
|
||||
|
||||
The impact of the unified cgroup hierarchy is restricted to the `kubelet`.
|
||||
|
||||
Potential issues:
|
||||
|
||||
1. Bugs
|
||||
1. Performance and/or reliability issues for `BestEffort` pods. This is
|
||||
most likely to appear on E2E test runs that mix/match pods across different
|
||||
QoS tiers.
|
||||
1. User misconfiguration; most notably the `--cgroup-driver` needs to match
|
||||
the expected behavior of the container runtime. We provide clear errors
|
||||
in `kubelet` logs for container runtimes that we include in tree.
|
||||
|
||||
#### Proposed Timeline
|
||||
|
||||
* 01/31/2017 - Discuss the rollout plan in sig-node meeting
|
||||
* 02/14/2017 - Flip the switch to enable pod level cgroups by default
|
||||
* enable existing experimental behavior by default
|
||||
* 02/21/2017 - Assess impacts based on enablement
|
||||
* 02/27/2017 - Kubernetes Feature complete (i.e. code freeze)
|
||||
* opt-in behavior surrounding the feature (`experimental-qos-reserved` support) completed.
|
||||
* 03/01/2017 - Send an announcement to kubernetes-dev@ about the rollout and potential impact
|
||||
* 03/22/2017 - Kubernetes 1.6 release
|
||||
* TBD (1.7?) - Eliminate the option to not use the new cgroup hierarchy.
|
||||
|
||||
This is based on the tentative timeline of the Kubernetes 1.6 release. We need to work out the timeline with the 1.6 release czar.
|
||||
|
||||
## Future enhancements
|
||||
|
||||
### Add Pod level metrics to Kubelet's metrics provider
|
||||
|
||||
Update the `kubelet` metrics provider to include pod level metrics.
|
||||
|
||||
### Evaluate supporting evictions local to QoS cgroup sandboxes
|
||||
|
||||
Rather than induce eviction at `/` or `/kubepods`, evaluate supporting
|
||||
eviction decisions for the unbounded QoS tiers (Burstable, BestEffort).
|
||||
|
||||
## Examples
|
||||
|
||||
The following describes the cgroup representation of a node with pods
|
||||
across multiple QoS classes.
|
||||
|
||||
### Cgroup Hierarchy
|
||||
|
||||
The following identifies a sample hierarchy based on the described design.
|
||||
|
||||
It assumes the flag `--experimental-qos-reserved` is not enabled for clarity.
|
||||
|
||||
```
|
||||
$ROOT
|
||||
|
|
||||
+- Pod1
|
||||
| |
|
||||
| +- Container1
|
||||
| +- Container2
|
||||
| ...
|
||||
+- Pod2
|
||||
| +- Container3
|
||||
| ...
|
||||
+- ...
|
||||
|
|
||||
+- burstable
|
||||
| |
|
||||
| +- Pod3
|
||||
| | |
|
||||
| | +- Container4
|
||||
| | ...
|
||||
| +- Pod4
|
||||
| | +- Container5
|
||||
| | ...
|
||||
| +- ...
|
||||
|
|
||||
+- besteffort
|
||||
| |
|
||||
| +- Pod5
|
||||
| | |
|
||||
| | +- Container6
|
||||
| | +- Container7
|
||||
| | ...
|
||||
| +- ...
|
||||
```
|
||||
|
||||
### Guaranteed Pods
|
||||
|
||||
We have two pods, Pod1 and Pod2, with the Pod Specs given below:
|
||||
|
||||
```yaml
|
||||
|
|
@ -142,32 +508,19 @@ spec:
|
|||
        memory: 2Gi
|
||||
```
|
||||
|
||||
Pod1 and Pod2 are both classified as `G` and are nested under the `Root` cgroup.
|
||||
Pod1 and Pod2 are both classified as Guaranteed and are nested under the `ROOT` cgroup.
|
||||
|
||||
```
|
||||
/Pod1/cpu.quota = 110m
|
||||
/Pod1/cpu.shares = 110m
|
||||
/Pod2/cpu.quota = 20m
|
||||
/Pod2/cpu.shares = 20m
|
||||
/Pod1/memory.limit_in_bytes = 3Gi
|
||||
/Pod2/memory.limit_in_bytes = 2Gi
|
||||
/ROOT/Pod1/cpu.quota = 110m
|
||||
/ROOT/Pod1/cpu.shares = 110m
|
||||
/ROOT/Pod1/memory.limit_in_bytes = 3Gi
|
||||
/ROOT/Pod2/cpu.quota = 20m
|
||||
/ROOT/Pod2/cpu.shares = 20m
|
||||
/ROOT/Pod2/memory.limit_in_bytes = 2Gi
|
||||
```
|
||||
|
||||
#### Burstable
|
||||
#### Burstable Pods
|
||||
|
||||
We have the following resource parameters for the `Bu` cgroup.
|
||||
|
||||
```
|
||||
/Bu/cpu.shares = summation of cpu requests of all Bu pods
|
||||
/Bu/PodUID/cpu.quota = Pod Cpu Limit
|
||||
/Bu/PodUID/cpu.shares = Pod Cpu Request
|
||||
/Bu/memory.limit_in_bytes = Allocatable - {(summation of memory requests/limits of `G` pods)*(1-qom/100)}
|
||||
/Bu/PodUID/memory.limit_in_bytes = Pod memory limit
|
||||
```
|
||||
|
||||
`Note: For the `Bu` QoS when limits are not specified for any one of the containers, the Pod limit defaults to the node resource allocatable quantity.`
|
||||
|
||||
Example:
|
||||
We have two pods, Pod3 and Pod4, with the Pod Specs given below:
|
||||
|
||||
```yaml
|
||||
|
|
@ -207,33 +560,23 @@ spec:
|
|||
memory: 1Gi
|
||||
```
|
||||
|
||||
Pod3 and Pod4 are both classified as `Bu` and are hence nested under the Bu cgroup
|
||||
And for `qom` = 0
|
||||
Pod3 and Pod4 are both classified as Burstable and are hence nested under
|
||||
the Burstable cgroup.
|
||||
|
||||
```
|
||||
/Bu/cpu.shares = 30m
|
||||
/Bu/Pod3/cpu.quota = 150m
|
||||
/Bu/Pod3/cpu.shares = 20m
|
||||
/Bu/Pod4/cpu.quota = 20m
|
||||
/Bu/Pod4/cpu.shares = 10m
|
||||
/Bu/memory.limit_in_bytes = Allocatable - 5Gi
|
||||
/Bu/Pod3/memory.limit_in_bytes = 3Gi
|
||||
/Bu/Pod4/memory.limit_in_bytes = 2Gi
|
||||
/ROOT/burstable/cpu.shares = 30m
|
||||
/ROOT/burstable/memory.limit_in_bytes = Allocatable - 5Gi
|
||||
/ROOT/burstable/Pod3/cpu.quota = 150m
|
||||
/ROOT/burstable/Pod3/cpu.shares = 20m
|
||||
/ROOT/burstable/Pod3/memory.limit_in_bytes = 3Gi
|
||||
/ROOT/burstable/Pod4/cpu.quota = 20m
|
||||
/ROOT/burstable/Pod4/cpu.shares = 10m
|
||||
/ROOT/burstable/Pod4/memory.limit_in_bytes = 2Gi
|
||||
```
|
||||
|
||||
#### Best Effort
|
||||
#### Best Effort pods
|
||||
|
||||
For pods belonging to the `BE` QoS we don't set any quota.
|
||||
|
||||
```
|
||||
/BE/cpu.shares = 2
|
||||
/BE/cpu.quota= not set
|
||||
/BE/memory.limit_in_bytes = Allocatable - {(summation of memory requests of all `G` and `Bu` pods)*(1-qom/100)}
|
||||
/BE/PodUID/memory.limit_in_bytes = no limit
|
||||
```
|
||||
|
||||
Example:
|
||||
We have a pod 'Pod5' having Pod Spec given below:
|
||||
We have a pod, Pod5, having Pod Spec given below:
|
||||
|
||||
```yaml
|
||||
kind: Pod
|
||||
|
|
@ -247,170 +590,12 @@ spec:
|
|||
resources:
|
||||
```
|
||||
|
||||
Pod5 is classified as `BE` and is hence nested under the BE cgroup
|
||||
And for `qom` = 0
|
||||
Pod5 is classified as BestEffort and is hence nested under the BestEffort cgroup
|
||||
|
||||
```
|
||||
/BE/cpu.shares = 2
|
||||
/BE/cpu.quota= not set
|
||||
/BE/memory.limit_in_bytes = Allocatable - 7Gi
|
||||
/BE/Pod5/memory.limit_in_bytes = no limit
|
||||
/ROOT/besteffort/cpu.shares = 2
|
||||
/ROOT/besteffort/cpu.quota= not set
|
||||
/ROOT/besteffort/memory.limit_in_bytes = Allocatable - 7Gi
|
||||
/ROOT/besteffort/Pod5/memory.limit_in_bytes = no limit
|
||||
```
|
||||
|
||||
### With Systemd
|
||||
|
||||
In systemd we have slices for the three top level QoS classes. Further, each pod is a subslice of exactly one of the three QoS slices, and each container in a pod belongs to a scope nested under its qosclass-pod slice.
|
||||
|
||||
Example: We plan to have the following cgroup hierarchy on systemd systems
|
||||
|
||||
```
|
||||
/memory/G-PodUID.slice/containerUID.scope
|
||||
/cpu,cpuacct/G-PodUID.slice/containerUID.scope
|
||||
/memory/Bu.slice/Bu-PodUID.slice/containerUID.scope
|
||||
/cpu,cpuacct/Bu.slice/Bu-PodUID.slice/containerUID.scope
|
||||
/memory/BE.slice/BE-PodUID.slice/containerUID.scope
|
||||
/cpu,cpuacct/BE.slice/BE-PodUID.slice/containerUID.scope
|
||||
```
|
||||
|
||||
### Hierarchy Outline
|
||||
|
||||
- "$Root" is the system root of the node i.e. "/" by default and if `--cgroup-root` is specified then the specified cgroup-root is used as "$Root".
|
||||
- We have a top level QoS cgroup for the `Bu` and `BE` QoS classes.
|
||||
- But we __don't__ have a separate cgroup for the `G` QoS class. `G` pod cgroups are brought up directly under the `Root` cgroup.
|
||||
- Each pod has its own cgroup which is nested under the cgroup matching the pod's QoS class.
|
||||
- All containers brought up by the pod are nested under the pod's cgroup.
|
||||
- system-reserved cgroup contains the system specific processes.
|
||||
- kube-reserved cgroup contains the kubelet specific daemons.
|
||||
|
||||
```
|
||||
$ROOT
|
||||
|
|
||||
+- Pod1
|
||||
| |
|
||||
| +- Container1
|
||||
| +- Container2
|
||||
| ...
|
||||
+- Pod2
|
||||
| +- Container3
|
||||
| ...
|
||||
+- ...
|
||||
|
|
||||
+- Bu
|
||||
| |
|
||||
| +- Pod3
|
||||
| | |
|
||||
| | +- Container4
|
||||
| | ...
|
||||
| +- Pod4
|
||||
| | +- Container5
|
||||
| | ...
|
||||
| +- ...
|
||||
|
|
||||
+- BE
|
||||
| |
|
||||
| +- Pod5
|
||||
| | |
|
||||
| | +- Container6
|
||||
| | +- Container7
|
||||
| | ...
|
||||
| +- ...
|
||||
|
|
||||
+- System-reserved
|
||||
| |
|
||||
| +- system
|
||||
| +- docker (optional)
|
||||
| +- ...
|
||||
|
|
||||
+- Kube-reserved
|
||||
| |
|
||||
| +- kubelet
|
||||
| +- docker (optional)
|
||||
| +- ...
|
||||
|
|
||||
```
|
||||
|
||||
#### QoS Policy Design Decisions
|
||||
|
||||
- This hierarchy highly prioritizes resource guarantees to the `G` over `Bu` and `BE` pods.
|
||||
- By not having a separate cgroup for the `G` class, the hierarchy allows the `G` pods to burst and utilize all of Node's Allocatable capacity.
|
||||
- The `BE` and `Bu` pods are strictly restricted from bursting and hogging resources and thus `G` Pods are guaranteed resource isolation.
|
||||
- `BE` pods are treated as the lowest priority. So for the `BE` QoS cgroup we set cpu shares to the lowest possible value, i.e. 2. This ensures that the `BE` containers get a relatively small share of cpu time.
|
||||
- Also we don't set any quota on the cpu resources as the containers on the `BE` pods can use any amount of free resources on the node.
|
||||
- Having the memory limit of the `BE` cgroup as (Allocatable - summation of memory requests of `G` and `Bu` pods) would result in `BE` pods becoming more susceptible to being OOM killed. As more `G` and `Bu` pods are scheduled, the kubelet will more likely kill `BE` pods, even if the `G` and `Bu` pods are using less than their request, since we will be dynamically reducing the size of the `BE` memory.limit_in_bytes. But this allows for better memory guarantees to the `G` and `Bu` pods.
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
The implementation plan is outlined in the next sections.
|
||||
We will have an 'experimental-cgroups-per-qos' flag to specify whether the user wants to use the QoS-based cgroup hierarchy. The flag will default to false, at least in v1.5.
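For illustration, opting in would look roughly like the kubelet invocation below; the flag spelling comes from this proposal and may still change, and the other flags are the ones referenced elsewhere in this document:

```sh
# Enable the QoS-based cgroup hierarchy (off by default) with the cgroupfs driver.
kubelet --experimental-cgroups-per-qos=true \
        --cgroup-driver=cgroupfs \
        --cgroup-root=/
```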
|
||||
|
||||
#### Top level Cgroups for QoS tiers
|
||||
|
||||
Two top-level cgroups for the `Bu` and `BE` QoS classes are created when the Kubelet starts on a node. All `G` pod cgroups are nested directly under the `Root` by default, so we don't create a top-level cgroup for the `G` class. For raw cgroup systems we would use libcontainer's cgroup manager for general cgroup management (creation/destruction). For systemd we don't yet have equivalent support for slice management in libcontainer, so we will add that support in the Kubelet. These cgroups are created only once, on Kubelet initialization, as part of node setup. On systemd these cgroups are transient units and will not survive a reboot.
|
||||
|
||||
#### Pod level Cgroup creation and deletion (Docker runtime)
|
||||
|
||||
- When a new pod is brought up, its QoS class is first determined.
|
||||
- We add an interface to Kubelet's ContainerManager to create and delete pod level cgroups under the cgroup that matches the pod's QoS class.
|
||||
- This interface will be pluggable. The Kubelet will support both systemd and raw cgroupfs based __cgroup__ drivers. We will use the `--cgroup-driver` flag proposed in the [Systemd Node Spec](kubelet-systemd.md) to specify the cgroup driver.
|
||||
- We inject creation and deletion of pod level cgroups into the pod workers.
|
||||
- As new pods are added, the QoS class cgroup parameters are updated to match the pods' resource requests (a sketch of the resulting layout follows this list).
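As a concrete cgroupfs illustration of the layout the pod workers would maintain, assume a Burstable pod with UID `1234` whose containers request 100Mi of memory in total (the paths and values are assumptions):

```sh
# Pod cgroup nested under its QoS class cgroup (Burstable here).
POD=/sys/fs/cgroup/memory/Bu/pod-1234
mkdir -p "$POD"

# Pod-level limit derived from the pod's containers (100Mi is a made-up value).
echo $((100 * 1024 * 1024)) > "$POD/memory.limit_in_bytes"

# On pod deletion the cgroup is removed once it has no more tasks.
rmdir "$POD"
```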
|
||||
|
||||
#### Container level cgroups
|
||||
|
||||
Have the docker manager create container cgroups under the pod-level cgroups. With the docker runtime, we will pass `--cgroup-parent` using the syntax expected by the cgroup driver the runtime was configured to use.
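For example, the container runtime would be asked to parent a container roughly as follows; the pod path and slice names are placeholders matching the hierarchies sketched earlier:

```sh
# cgroupfs driver: --cgroup-parent takes a plain cgroup path.
docker run --cgroup-parent=/Bu/pod-1234 nginx

# systemd driver: --cgroup-parent must name a slice unit.
docker run --cgroup-parent=Bu-pod1234.slice nginx
```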
|
||||
|
||||
#### Rkt runtime
|
||||
|
||||
We want rkt to create pods under a root QoS class cgroup that the kubelet specifies, and to set the pod-level cgroup parameters described in this proposal itself.
|
||||
|
||||
#### Add Pod level metrics to Kubelet's metrics provider
|
||||
|
||||
Update Kubelet's metrics provider to include Pod level metrics. Use cAdvisor's cgroup subsystem information to determine various Pod level usage metrics.
|
||||
|
||||
`Note: Changes to cAdvisor might be necessary.`
|
||||
|
||||
## Rollout Plan
|
||||
|
||||
This feature will be opt-in in v1.4 and opt-out in v1.5. We recommend that users drain their nodes and opt in before switching to v1.5, so that rolling out the v1.5 kubelet is a no-op.
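A hedged sketch of that rollout for a node named `node-1` (restarting the kubelet itself is distro-specific and omitted):

```sh
# Move workloads off the node, restart the kubelet with the new flag (see above), then readmit it.
kubectl drain node-1 --ignore-daemonsets
# ... restart the kubelet with --experimental-cgroups-per-qos=true ...
kubectl uncordon node-1
```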
|
||||
|
||||
## Implementation Status
|
||||
|
||||
The implementation goals of the first milestone are outlined below.
|
||||
- [x] Finalize and submit Pod Resource Management proposal for the project #26751
|
||||
- [x] Refactor qos package to be used globally throughout the codebase #27749 #28093
|
||||
- [x] Add interfaces for CgroupManager and CgroupManagerImpl which implements the CgroupManager interface and creates, destroys/updates cgroups using the libcontainer cgroupfs driver. #27755 #28566
|
||||
- [x] Inject top level QoS Cgroup creation in the Kubelet and add e2e tests to test that behaviour. #27853
|
||||
- [x] Add PodContainerManagerImpl Create and Destroy methods which implements the respective PodContainerManager methods using a cgroupfs driver. #28017
|
||||
- [x] Have docker manager create container cgroups under pod level cgroups. Inject creation and deletion of pod cgroups into the pod workers. Add e2e tests to test this behaviour. #29049
|
||||
- [x] Add support for updating policy for the pod cgroups. Add e2e tests to test this behaviour. #29087
|
||||
- [ ] Enabling 'cgroup-per-qos' flag in Kubelet: The user is expected to drain the node and restart it before enabling this feature, but as a fallback we also want to allow the user to just restart the kubelet with the cgroup-per-qos flag enabled to use this feature. As a part of this we need to figure out a policy for pods having Restart Policy: Never. More details in this [issue](https://github.com/kubernetes/kubernetes/issues/29946).
|
||||
- [ ] Removing a terminated pod's cgroup: we need to clean up the pod's cgroup once the pod is terminated. More details in this [issue](https://github.com/kubernetes/kubernetes/issues/29927).
|
||||
- [ ] The Kubelet needs to ensure that the cgroup settings are what it expects them to be. If security is not a concern, one can assume that once the kubelet applies cgroup settings successfully, the values will never change unless the kubelet changes them. If security is a concern, the kubelet will have to ensure that the cgroup values meet its requirements, continue to watch for updates to the cgroups via inotify, and re-apply cgroup values if necessary.
|
||||
Updating QoS limits needs to happen before pod cgroup values are updated. When pod cgroups are being deleted, QoS limits have to be updated after the pod cgroup values have been updated for deletion or the pod cgroups have been removed. Given that the kubelet doesn't have any checkpoints and updates to QoS and pod cgroups are not atomic, the kubelet needs to reconcile cgroup status whenever it restarts to ensure that the cgroup values match its expectations.
|
||||
- [ ] [TEST] Opting in to this feature and rolling back should be accompanied by detailed error messages when pods are killed intermittently.
|
||||
- [ ] Add a systemd implementation for Cgroup Manager interface
|
||||
|
||||
|
||||
Other smaller work items that would be good to have before the release of this feature:
|
||||
- [ ] Add Pod UID to the downward api which will help simplify the e2e testing logic.
|
||||
- [ ] Check if the parent cgroups exist and error out if they don't.
|
||||
- [ ] Set the top-level cgroup limits to node Allocatable until we support QoS-level cgroup updates. If the cgroup root is not `/`, set node Allocatable as the cgroup resource limits on the cgroup root.
|
||||
- [ ] Add a NodeResourceAllocatableProvider which returns the amount of allocatable resources on the nodes. This interface would be used both by the Kubelet and ContainerManager.
|
||||
- [ ] Add a top-level feasibility check to ensure that a pod can be admitted on the node by estimating the leftover resources on the node.
|
||||
- [ ] Log basic cgroup management metrics, i.e. creation/deletion.
|
||||
|
||||
|
||||
To better support our requirements, we also needed to make some changes and add features to libcontainer:
|
||||
|
||||
- [x] Allowing or denying all devices by writing 'a' to devices.allow or devices.deny is
|
||||
not possible once the device cgroup has children. Libcontainer doesn't have an option to skip updates on the parent devices cgroup. opencontainers/runc/pull/958
|
||||
- [x] To use libcontainer for creating and managing cgroups in the Kubelet, we want to be able to create a cgroup with no pid attached and, if needed, attach a pid to the cgroup later. But libcontainer did not support cgroup creation without attaching a pid. opencontainers/runc/pull/956
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
|
|||
|
|
@ -0,0 +1,407 @@
|
|||
# Pod Safety, Consistency Guarantees, and Storage Implications
|
||||
|
||||
@smarterclayton @bprashanth
|
||||
|
||||
October 2016
|
||||
|
||||
## Proposal and Motivation
|
||||
|
||||
A pod represents the finite execution of one or more related processes on the
|
||||
cluster. In order to ensure higher level consistent controllers can safely
|
||||
build on top of pods, the exact guarantees around its lifecycle on the cluster
|
||||
must be clarified, and it must be possible for higher order controllers
|
||||
and application authors to correctly reason about the lifetime of those
|
||||
processes and their access to cluster resources in a distributed computing
|
||||
environment.
|
||||
|
||||
To run most clustered software on Kubernetes, it must be possible to guarantee
|
||||
**at most once** execution of a particular pet pod at any time on the cluster.
|
||||
This allows the controller to prevent multiple processes having access to
|
||||
shared cluster resources believing they are the same entity. When a node
|
||||
containing a pet is partitioned, the Pet Set must remain consistent (no new
|
||||
entity will be spawned) but may become unavailable (cluster no longer has
|
||||
a sufficient number of members). The Pet Set guarantee must be strong enough
|
||||
for an administrator to reason about the state of the cluster by observing
|
||||
the Kubernetes API.
|
||||
|
||||
In order to reconcile partitions, an actor (human or automated) must decide
|
||||
when the partition is unrecoverable. The actor may be informed of the failure
|
||||
in an unambiguous way (e.g. the node was destroyed by a meteor) allowing for
|
||||
certainty that the processes on that node are terminated, and thus may
|
||||
resolve the partition by deleting the node and the pods on the node.
|
||||
Alternatively, the actor may take steps to ensure the partitioned node
|
||||
cannot return to the cluster or access shared resources - this is known
|
||||
as **fencing** and is a well understood domain.
|
||||
|
||||
This proposal covers the changes necessary to ensure:
|
||||
|
||||
* Pet Sets can ensure **at most one** semantics for each individual pet
|
||||
* Other system components such as the node and namespace controller can
|
||||
safely perform their responsibilities without violating that guarantee
|
||||
* An administrator or higher level controller can signal that a node
|
||||
partition is permanent, allowing the Pet Set controller to proceed.
|
||||
* A fencing controller can take corrective action automatically to heal
|
||||
partitions
|
||||
|
||||
We will accomplish this by:
|
||||
|
||||
* Clarifying which components are allowed to force delete pods (as opposed
|
||||
to merely requesting termination)
|
||||
* Ensuring system components can observe partitioned pods and nodes
|
||||
correctly
|
||||
* Defining how a fencing controller could safely interoperate with
|
||||
partitioned nodes and pods to safely heal partitions
|
||||
* Describing how shared storage components without innate safety
|
||||
guarantees can be safely shared on the cluster.
|
||||
|
||||
|
||||
### Current Guarantees for Pod lifecycle
|
||||
|
||||
The existing pod model provides the following guarantees:
|
||||
|
||||
* A pod is executed on exactly one node
|
||||
* A pod has the following lifecycle phases:
|
||||
* Creation
|
||||
* Scheduling
|
||||
* Execution
|
||||
* Init containers
|
||||
* Application containers
|
||||
* Termination
|
||||
* Deletion
|
||||
* A pod can only move through its phases in order, and may not return
|
||||
to an earlier phase.
|
||||
* A user may specify an interval on the pod called the **termination
|
||||
grace period** that defines the minimum amount of time the pod will
|
||||
have to complete the termination phase, and all components will honor
|
||||
this interval.
|
||||
* Once a pod begins termination, its termination grace period can only
|
||||
be shortened, not lengthened.
|
||||
|
||||
Pod termination is divided into the following steps:
|
||||
|
||||
* A component requests the termination of the pod by issuing a DELETE
|
||||
to the pod resource with an optional **grace period**
|
||||
* If no grace period is provided, the default from the pod is leveraged
|
||||
* When the kubelet observes the deletion, it starts a timer equal to the
|
||||
grace period and performs the following actions:
|
||||
* Executes the pre-stop hook, if specified, waiting up to **grace period**
|
||||
seconds before continuing
|
||||
* Sends the termination signal to the container runtime (SIGTERM or the
|
||||
container image's STOPSIGNAL on Docker)
|
||||
* Waits 2 seconds, or the remaining grace period, whichever is longer
|
||||
* Sends the force termination signal to the container runtime (SIGKILL)
|
||||
* Once the kubelet observes the container is fully terminated, it issues
|
||||
a status update to the REST API for the pod indicating termination, then
|
||||
issues a DELETE with grace period = 0.
|
||||
|
||||
If the kubelet crashes during the termination process, it will restart the
|
||||
termination process from the beginning (grace period is reset). This ensures
|
||||
that a process is always given **at least** grace period to terminate cleanly.
|
||||
|
||||
A user may re-issue a DELETE to the pod resource specifying a shorter grace
|
||||
period, but never a longer one.
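For example, to shorten the grace period of a pod that is already terminating (the pod name is illustrative):

```sh
# Re-issue the delete with a shorter grace period; a longer value would not be honored.
kubectl delete pod web-0 --grace-period=15
```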
|
||||
|
||||
Deleting a pod with grace period 0 is called **force deletion** and will
|
||||
update the pod with a `deletionGracePeriodSeconds` of 0, and then immediately
|
||||
remove the pod from etcd. Because all communication is asynchronous,
|
||||
force deleting a pod means that the pod processes may continue
|
||||
to run for an arbitrary amount of time. If a higher level component like the
|
||||
StatefulSet controller treats the existence of the pod API object as a strongly
|
||||
consistent entity, deleting the pod in this fashion will violate the
|
||||
at-most-one guarantee we wish to offer for pet sets.
|
||||
|
||||
|
||||
### Guarantees provided by replica sets and replication controllers
|
||||
|
||||
ReplicaSets and ReplicationControllers both attempt to **preserve availability**
|
||||
of their constituent pods over ensuring at most one (of a pod) semantics. So a
|
||||
replica set at scale 1 will immediately create a new pod when it observes an
|
||||
old pod has begun graceful deletion, and as a result at many points in the
|
||||
lifetime of a replica set there will be 2 copies of a pod's processes running
|
||||
concurrently. Only access to exclusive resources like storage can prevent that
|
||||
simultaneous execution.
|
||||
|
||||
Deployments, being based on replica sets, can offer no stronger guarantee.
|
||||
|
||||
|
||||
### Concurrent access guarantees for shared storage
|
||||
|
||||
A persistent volume that references a strongly consistent storage backend
|
||||
like AWS EBS, GCE PD, OpenStack Cinder, or Ceph RBD can rely on the storage
|
||||
API to prevent corruption of the data due to simultaneous access by multiple
|
||||
clients. However, many commonly deployed storage technologies in the
|
||||
enterprise offer no such consistency guarantee, or much weaker variants, and
|
||||
rely on complex systems to control which clients may access the storage.
|
||||
|
||||
If a PV is assigned an iSCSI, Fibre Channel, or NFS mount point and that PV
|
||||
is used by two pods on different nodes simultaneously, concurrent access may
|
||||
result in corruption, even if the PV or PVC is identified as "read write once".
|
||||
PVC consumers must ensure these volume types are *never* referenced from
|
||||
multiple pods without some external synchronization. As described above, it
|
||||
is not safe to use persistent volumes that lack RWO guarantees with a
|
||||
replica set or deployment, even at scale 1.
|
||||
|
||||
|
||||
## Proposed changes
|
||||
|
||||
### Avoid multiple instances of pods
|
||||
|
||||
To ensure that the Pet Set controller can safely use pods and ensure at most
|
||||
one pod instance is running on the cluster at any time for a given pod name,
|
||||
it must be possible to make pod deletion strongly consistent.
|
||||
|
||||
To do that, we will:
|
||||
|
||||
* Give the Kubelet sole responsibility for normal deletion of pods -
|
||||
only the Kubelet in the course of normal operation should ever remove a
|
||||
pod from etcd (only the Kubelet should force delete)
|
||||
* The kubelet must not delete the pod until all processes are confirmed
|
||||
terminated.
|
||||
* The kubelet SHOULD ensure all consumed resources on the node are freed
|
||||
before deleting the pod.
|
||||
* Application owners must be free to force delete pods, but they *must*
|
||||
understand the implications of doing so, and all client UI must be able
|
||||
to communicate those implications.
|
||||
* Force deleting a pod may cause data loss (two instances of the same
|
||||
pod process may be running at the same time)
|
||||
* All existing controllers in the system must be limited to signaling pod
|
||||
termination (starting graceful deletion), and are not allowed to force
|
||||
delete a pod.
|
||||
* The node controller will no longer be allowed to force delete pods -
|
||||
it may only signal deletion by beginning (but not completing) a
|
||||
graceful deletion.
|
||||
* The GC controller may not force delete pods
|
||||
* The namespace controller used to force delete pods, but no longer
|
||||
does so. This means a node partition can block namespace deletion
|
||||
indefinitely.
|
||||
* The pod GC controller may continue to force delete pods on nodes that
|
||||
no longer exist if we treat node deletion as confirming permanent
|
||||
partition. If we do not, the pod GC controller must not force delete
|
||||
pods.
|
||||
* It must be possible for an administrator to effectively resolve partitions
|
||||
manually to allow namespace deletion.
|
||||
* Deleting a node from etcd should be seen as a signal to the cluster that
|
||||
the node is permanently partitioned. We must audit existing components
|
||||
to verify this is the case.
|
||||
* The PodGC controller has primary responsibility for this - it already
|
||||
owns the responsibility to delete pods on nodes that do not exist, and
|
||||
so is allowed to force delete pods on nodes that do not exist.
|
||||
* The PodGC controller must therefore always be running and will be
|
||||
changed to always be running for this responsibility in a >=1.5
|
||||
cluster.
|
||||
|
||||
In the above scheme, force deleting a pod releases the lock on that pod and
|
||||
allows higher level components to proceed to create a replacement.
|
||||
|
||||
It has been requested that force deletion be restricted to privileged users.
|
||||
That limits the application owner in resolving partitions when the consequences
|
||||
of force deletion are understood, and not all application owners will be
|
||||
privileged users. For example, a user may be running a 3 node etcd cluster in a
|
||||
pet set. If pet 2 becomes partitioned, the user can instruct etcd to remove
|
||||
pet 2 from the cluster (via direct etcd membership calls), and because a quorum
|
||||
exists pets 0 and 1 can safely accept that action. The user can then force
|
||||
delete pet 2 and the pet set controller will be able to recreate that pet on
|
||||
another node and have it join the cluster safely (pets 0 and 1 constitute a
|
||||
quorum for membership change).
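A sketch of that etcd scenario; the member ID and pod name are placeholders, and `etcdctl` is run against the surviving members:

```sh
# Remove the partitioned member; pets 0 and 1 still form a quorum for membership changes.
etcdctl member remove 8211f1d0f64f3269

# Knowing the member has been evicted, the application owner force deletes the pet
# so the Pet Set controller can recreate it on another node.
kubectl delete pod etcd-2 --grace-period=0
```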
|
||||
|
||||
This proposal does not alter the behavior of finalizers - instead, it makes
|
||||
finalizers unnecessary for common application cases (because the cluster only
|
||||
deletes pods when safe).
|
||||
|
||||
### Fencing
|
||||
|
||||
The changes above allow Pet Sets to ensure at-most-one pod, but provide no
|
||||
recourse for the automatic resolution of cluster partitions during normal
|
||||
operation. For that, we propose a **fencing controller** which exists above
|
||||
the current controller plane and is capable of detecting and automatically
|
||||
resolving partitions. The fencing controller is an agent empowered to make
|
||||
similar decisions as a human administrator would make to resolve partitions,
|
||||
and to take corresponding steps to prevent a dead machine from coming back
|
||||
to life automatically.
|
||||
|
||||
Fencing controllers most benefit services that are not innately replicated
|
||||
by reducing the amount of time it takes to detect a failure of a node or
|
||||
process, isolate that node or process so it cannot initiate or receive
|
||||
communication from clients, and then spawn another process. It is expected
|
||||
that many StatefulSets of size 1 would prefer to be fenced, given that most
|
||||
real-world applications of size 1 have no alternative for HA
|
||||
except reducing mean-time-to-recovery.
|
||||
|
||||
While the methods and algorithms may vary, the basic pattern would be:
|
||||
|
||||
1. Detect a partitioned pod or node via the Kubernetes API or via external
|
||||
means.
|
||||
2. Decide whether the partition justifies fencing based on priority, policy, or
|
||||
service availability requirements.
|
||||
3. Fence the node or any connected storage using appropriate mechanisms.
|
||||
|
||||
For this proposal we only describe the general shape of detection and how existing
|
||||
Kubernetes components can be leveraged for policy, while the exact implementation
|
||||
and mechanisms for fencing are left to a future proposal. A future fencing controller
|
||||
would be able to leverage a number of systems including but not limited to:
|
||||
|
||||
* Cloud control plane APIs such as machine force shutdown
|
||||
* Additional agents running on each host to force kill process or trigger reboots
|
||||
* Agents integrated with or communicating with hypervisors running hosts to stop VMs
|
||||
* Hardware IPMI interfaces to reboot a host
|
||||
* Rack level power units to power cycle a blade
|
||||
* Network routers, backplane switches, software defined networks, or system firewalls
|
||||
* Storage server APIs to block client access
|
||||
|
||||
to appropriately limit the ability of the partitioned system to impact the cluster.
|
||||
Fencing agents today use many of these mechanisms to allow the system to make
|
||||
progress in the event of failure. The key contribution of Kubernetes is to define
|
||||
a strongly consistent pattern whereby fencing agents can be plugged in.
|
||||
|
||||
To allow users, clients, and automated systems like the fencing controllers to
|
||||
observe partitions, we propose an additional responsibility to the node controller
|
||||
or any future controller that attempts to detect partition. The node controller should
|
||||
add an additional condition to pods that have been terminated due to a node failing
|
||||
to heartbeat that indicates that the cause of the deletion was node partition.
|
||||
|
||||
It may be desirable for users to be able to request fencing when they suspect a
|
||||
component is malfunctioning. It is outside the scope of this proposal but would
|
||||
allow administrators to take an action that is safer than force deletion, and
|
||||
decide at the end whether to force delete.
|
||||
|
||||
How the fencing controller decides to fence is left undefined, but it is likely
|
||||
it could use a combination of pod forgiveness (as a signal of how much disruption
|
||||
a pod author is likely to accept) and pod disruption budget (as a measurement of
|
||||
the amount of disruption already undergone) to measure how much latency between
|
||||
failure and fencing the app is willing to tolerate. Likewise, it can use its own
|
||||
understanding of the latency of the various failure detectors - the node controller,
|
||||
any hypothetical information it gathers from service proxies or node peers, any
|
||||
heartbeat agents in the system - to describe an upper bound on reaction.
|
||||
|
||||
|
||||
### Storage Consistency
|
||||
|
||||
To ensure that shared storage without implicit locking be safe for RWO access, the
|
||||
Kubernetes storage subsystem should leverage the strong consistency available through
|
||||
the API server and prevent concurrent execution for some types of persistent volumes.
|
||||
By leveraging existing concepts, we can allow the scheduler and the kubelet to enforce
|
||||
a guarantee that an RWO volume can be used on at-most-one node at a time.
|
||||
|
||||
In order to properly support region and zone specific storage, Kubernetes adds node
|
||||
selector restrictions to pods derived from the persistent volume. Expanding this
|
||||
concept to volume types that have no external metadata to read (NFS, iSCSI) may
|
||||
result in adding a label selector to PVs that defines the allowed nodes the storage
|
||||
can run on (this is a common requirement for iSCSI, FibreChannel, or NFS clusters).
|
||||
|
||||
Because all nodes in a Kubernetes cluster possess a special node name label, it would
|
||||
be possible for a controller to observe the scheduling decision of a pod using an
|
||||
unsafe volume and "attach" that volume to the node, and also observe the deletion of
|
||||
the pod and "detach" the volume from the node. The node would then require that these
|
||||
unsafe volumes be "attached" before allowing pod execution. Attach and detach may
|
||||
be recorded on the PVC or PV as a new field or materialized via the selection labels.
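A small, hedged illustration of the selection-label idea: the administrator marks the nodes that can reach the storage cluster, and the PV's selector (as in the sequence below) restricts scheduling to them. The label key and node names are placeholders:

```sh
# Label each node that has access to the iSCSI storage cluster.
kubectl label node node-a storagecluster=iscsi-1
kubectl label node node-b storagecluster=iscsi-1
```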
|
||||
|
||||
Possible sequence of operations:
|
||||
|
||||
1. Cluster administrator creates a RWO iSCSI persistent volume, available only to
|
||||
nodes with the label selector `storagecluster=iscsi-1`
|
||||
2. User requests an RWO volume and is bound to the iSCSI volume
|
||||
3. The user creates a pod referencing the PVC
|
||||
4. The scheduler observes the pod must schedule on nodes with `storagecluster=iscsi-1`
|
||||
(alternatively this could be enforced in admission) and binds to node `A`
|
||||
5. The kubelet on node `A` observes the pod references a PVC that specifies RWO which
|
||||
requires "attach" to be successful
|
||||
6. The attach/detach controller observes that a pod has been bound with a PVC that
|
||||
requires "attach", and attempts to execute a compare and swap update on the PVC/PV
|
||||
attaching it to node `A` and pod 1
|
||||
7. The kubelet observes the attach of the PVC/PV and executes the pod
|
||||
8. The user terminates the pod
|
||||
9. The user creates a new pod that references the PVC
|
||||
10. The scheduler binds this new pod to node `B`, which also has `storagecluster=iscsi-1`
|
||||
11. The kubelet on node `B` observes the new pod, but sees that the PVC/PV is bound
|
||||
to node `A` and so must wait for detach
|
||||
12. The kubelet on node `A` completes the deletion of pod 1
|
||||
13. The attach/detach controller observes the first pod has been deleted and that the
|
||||
previous attach of the volume to pod 1 is no longer valid - it performs a CAS
|
||||
update on the PVC/PV clearing its attach state.
|
||||
14. The attach/detach controller observes the second pod has been scheduled and
|
||||
attaches it to node `B` and pod 2
|
||||
15. The kubelet on node `B` observes the attach and allows the pod to execute.
|
||||
|
||||
If a partition occurred after step 11, the attach controller would block waiting
|
||||
for the pod to be deleted, and prevent node `B` from launching the second pod.
|
||||
The fencing controller, upon observing the partition, could signal the iSCSI servers
|
||||
to firewall node `A`. Once that firewall is in place, the fencing controller could
|
||||
break the PVC/PV attach to node `A`, allowing steps 13 onwards to continue.
|
||||
|
||||
|
||||
### User interface changes
|
||||
|
||||
Clients today may assume that force deletions are safe. We must appropriately
|
||||
audit clients to identify this behavior and improve the messages. For instance,
|
||||
`kubectl delete --grace-period=0` could print a warning and require `--confirm`:
|
||||
|
||||
```
|
||||
$ kubectl delete pod foo --grace-period=0
|
||||
warning: Force deleting a pod does not wait for the pod to terminate, meaning
|
||||
your containers will be stopped asynchronously. Pass --confirm to
|
||||
continue
|
||||
```
|
||||
|
||||
Likewise, attached volumes would require new semantics to allow the attachment
|
||||
to be broken.
|
||||
|
||||
Clients should communicate partitioned state more clearly - changing the status
|
||||
column of a pod list to contain the condition indicating NodeDown would help
|
||||
users understand what actions they could take.
|
||||
|
||||
|
||||
## Backwards compatibility
|
||||
|
||||
On an upgrade, pet sets would not be "safe" until the above behavior is implemented.
|
||||
All other behaviors should remain as-is.
|
||||
|
||||
|
||||
## Testing
|
||||
|
||||
All of the above implementations propose to ensure pods can be treated as components
|
||||
of a strongly consistent cluster. Since formal proofs of correctness are unlikely in
|
||||
the foreseeable future, Kubernetes must empirically demonstrate the correctness of
|
||||
the proposed systems. Automated testing of the mentioned components should be
|
||||
designed to expose ordering and consistency flaws in the presence of
|
||||
|
||||
* Master-node partitions
|
||||
* Node-node partitions
|
||||
* Master-etcd partitions
|
||||
* Concurrent controller execution
|
||||
* Kubelet failures
|
||||
* Controller failures
|
||||
|
||||
A test suite that can perform these tests in combination with real world pet sets
|
||||
would be desirable, although possibly non-blocking for this proposal.
|
||||
|
||||
|
||||
## Documentation
|
||||
|
||||
We should document the lifecycle guarantees provided by the cluster in a clear
|
||||
and unambiguous way to end users.
|
||||
|
||||
|
||||
## Deferred issues
|
||||
|
||||
* Live migration continues to be unsupported on Kubernetes for the foreseeable
|
||||
future, and no additional changes will be made to this proposal to account for
|
||||
that feature.
|
||||
|
||||
|
||||
## Open Questions
|
||||
|
||||
* Should node deletion be treated as "node was down and all processes terminated"
|
||||
* Pro: it's a convenient signal that we use in other places today
|
||||
* Con: the kubelet recreates its Node object, so if a node is partitioned and
|
||||
the admin deletes the node, when the partition is healed the node would be
|
||||
recreated, and the processes are *definitely* not terminated
|
||||
* Implies we must alter the pod GC controller to only signal graceful deletion,
|
||||
and only to flag pods on nodes that don't exist as partitioned, rather than
|
||||
force deleting them.
|
||||
* Decision: YES - captured above.
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
@ -30,6 +30,7 @@ implementation-oriented (think control knobs).
|
|||
given the desired state and the current/observed state, regardless of how many
|
||||
intermediate state updates may have been missed. Edge-triggered behavior must be
|
||||
just an optimization.
|
||||
* There should be a CAP-like theorem about the tradeoffs, when driving control loops via polling or events, between simultaneously achieving high performance, reliability, and simplicity -- pick any 2.
|
||||
* Assume an open world: continually verify assumptions and gracefully adapt to
|
||||
external events and/or actors. Example: we allow users to kill pods under
|
||||
control of a replication controller; it just replaces them.
|
||||
|
|
|
|||
|
|
@ -37,7 +37,7 @@ The slave mount namespace is the correct solution for this AFAICS. Until this
|
|||
becomes available in k8s, we will have to have operations restart containers manually
|
||||
based on monitoring alerts.
|
||||
|
||||
1. (From @victorgp) When using CoreOS that does not provides external fuse systems
|
||||
1. (From @victorgp) When using CoreOS Container Linux that does not provides external fuse systems
|
||||
like, in our case, GlusterFS, and you need a container to do the mounts. The only
|
||||
way to see those mounts in the host, hence also visible by other containers, is by
|
||||
sharing the mount propagation.
|
||||
|
|
@ -140,7 +140,7 @@ runtime support matrix and when that will be addressed.
|
|||
distros.
|
||||
|
||||
1. (From @euank) Changing those mountflags may make docker even less stable,
|
||||
this may lock up kernel accidently or potentially leak mounts.
|
||||
this may lock up kernel accidentally or potentially leak mounts.
|
||||
|
||||
|
||||
## Decision
|
||||
|
|
|
|||
|
|
@ -52,7 +52,7 @@ Example use cases for rescheduling are
|
|||
* (note that these last two cases are the only use cases where the first-order intent
|
||||
is to move a pod specifically for the benefit of another pod)
|
||||
* moving a running pod off of a node from which it is receiving poor service
|
||||
* anomalous crashlooping or other mysterious incompatiblity between the pod and the node
|
||||
* anomalous crashlooping or other mysterious incompatibility between the pod and the node
|
||||
* repeated out-of-resource killing (see #18724)
|
||||
* repeated attempts by the scheduler to schedule the pod onto some node, but it is
|
||||
rejected by Kubelet admission control due to incomplete scheduler knowledge
|
||||
|
|
|
|||
|
|
@ -20,7 +20,7 @@ Borg increased utilization by about 20% when it started allowing use of such non
|
|||
|
||||
## Requests and Limits
|
||||
|
||||
For each resource, containers can specify a resource request and limit, `0 <= request <= `[`Node Allocatable`](../proposals/node-allocatable.md) & `request <= limit <= Infinity`.
|
||||
For each resource, containers can specify a resource request and limit, `0 <= request <= `[`Node Allocatable`](../design-proposals/node-allocatable.md) & `request <= limit <= Infinity`.
|
||||
If a pod is successfully scheduled, the container is guaranteed the amount of resources requested.
|
||||
Scheduling is based on `requests` and not `limits`.
|
||||
The pods and its containers will not be allowed to exceed the specified limit.
|
||||
|
|
|
|||
|
|
@ -302,8 +302,8 @@ where a `<CPU-info>` or `<memory-info>` structure looks like this:
|
|||
```yaml
|
||||
{
|
||||
mean: <value> # arithmetic mean
|
||||
max: <value> # minimum value
|
||||
min: <value> # maximum value
|
||||
max: <value> # maximum value
|
||||
min: <value> # minimum value
|
||||
count: <value> # number of data points
|
||||
percentiles: [ # map from %iles to values
|
||||
"10": <10th-percentile-value>,
|
||||
|
|
|
|||
|
|
@ -191,7 +191,7 @@ profiles to be opaque to kubernetes for now.
|
|||
|
||||
The following format is scoped as follows:
|
||||
|
||||
1. `runtime/default` - the default profile for the container runtime
|
||||
1. `docker/default` - the default profile for the container runtime
|
||||
2. `unconfined` - unconfined profile, ie, no seccomp sandboxing
|
||||
3. `localhost/<profile-name>` - the profile installed to the node's local seccomp profile root
|
||||
|
||||
|
|
|
|||
|
|
@ -99,5 +99,5 @@ Kubernetes self-hosted is working today. Bootkube is an implementation of the "t
|
|||
## Known Issues
|
||||
|
||||
- [Health check endpoints for components don't work correctly](https://github.com/kubernetes-incubator/bootkube/issues/64#issuecomment-228144345)
|
||||
- [kubeadm doesn't do self-hosted yet](https://github.com/kubernetes/kubernetes/pull/38407)
|
||||
- [kubeadm does do self-hosted, but isn't tested yet](https://github.com/kubernetes/kubernetes/pull/40075)
|
||||
- The Kubernetes [versioning policy](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/versioning.md) allows for version skew of kubelet and control plane but not skew between control plane components themselves. We must add testing and validation to Kubernetes that this skew works. Otherwise the work to make Kubernetes HA is rather pointless if it can't be upgraded in an HA manner as well.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,196 @@
|
|||
# Kubectl apply subcommands for last-config
|
||||
|
||||
## Abstract
|
||||
|
||||
`kubectl apply` uses the `last-applied-config` annotation to compute
|
||||
the removal of fields from local object configuration files and then
|
||||
send patches to delete those fields from the live object. Reading or
|
||||
updating the `last-applied-config` is complex as it requires parsing
|
||||
out and writing to the annotation. Here we propose a set of porcelain
|
||||
commands for users to better understand what is going on in the system
|
||||
and make updates.
|
||||
|
||||
## Motivation
|
||||
|
||||
What is going on behind the scenes with `kubectl apply` is opaque. Users
|
||||
have to interact directly with annotations on the object to view
|
||||
and make changes. In order to stop having `apply` manage a field on
|
||||
an object, it must be manually removed from the annotation and then be removed
|
||||
from the local object configuration. Users should be able to simply edit
|
||||
the local object configuration and set it as the last-applied-config
|
||||
to be used for the next diff base. Storing the last-applied-config
|
||||
in an annotation adds black magic to `kubectl apply`, and it would
|
||||
help users learn and understand if the value was exposed in a discoverable
|
||||
manner.
|
||||
|
||||
## Use Cases
|
||||
|
||||
1. As a user, I want to be able to diff the last-applied-configuration
|
||||
against the current local configuration to see which changes the command is seeing
|
||||
2. As a user, I want to remove fields from being managed by the local
|
||||
object configuration by removing them from the local object configuration
|
||||
and setting the last-applied-configuration to match.
|
||||
3. As a user, I want to be able to view the last-applied-configuration
|
||||
on the live object that will be used to calculate the diff patch
|
||||
to update the live object from the configuration file.
|
||||
|
||||
## Naming and Format possibilities
|
||||
|
||||
### Naming
|
||||
|
||||
1. *cmd*-last-applied
|
||||
|
||||
Rejected alternatives:
|
||||
|
||||
2. ~~last-config~~
|
||||
3. ~~last-applied-config~~
|
||||
4. ~~last-configuration~~
|
||||
5. ~~last-applied-configuration~~
|
||||
6. ~~last~~
|
||||
|
||||
### Formats
|
||||
|
||||
1. Apply subcommands
|
||||
- `kubectl apply set-last-applied/view-last-applied/diff-last-applied`
|
||||
- a little bit odd to have 2 verbs in a row
|
||||
- improves discoverability to have these as subcommands so they are tied to apply
|
||||
|
||||
Rejected alternatives:
|
||||
|
||||
2. ~~Set/View subcommands~~
|
||||
- `kubectl set/view/diff last-applied`
|
||||
- consistent with other set/view commands
|
||||
- clutters discoverability of set/view commands since these are only for apply
|
||||
- clutters discoverability for last-applied commands since they are for apply
|
||||
3. ~~Apply flags~~
|
||||
- `kubectl apply [--set-last-applied | --view-last-applied | --diff-last-applied]`
|
||||
- Not a fan of these
|
||||
|
||||
## view last-applied
|
||||
|
||||
Porcelain command that retrieves the object and prints the annotation value as yaml or json.
|
||||
|
||||
Prints an error message if the object is not managed by `apply`.
|
||||
|
||||
1. Get the last-applied by type/name
|
||||
|
||||
```sh
|
||||
kubectl apply view-last-applied deployment/nginx
|
||||
```
|
||||
|
||||
```yaml
|
||||
apiVersion: extensions/v1beta1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: nginx
|
||||
spec:
|
||||
replicas: 1
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
run: nginx
|
||||
spec:
|
||||
containers:
|
||||
- image: nginx
|
||||
name: nginx
|
||||
```
|
||||
|
||||
2. Get the last-applied by file, print as json
|
||||
|
||||
```sh
|
||||
kubectl apply view-last-applied -f deployment_nginx.yaml -o json
|
||||
```
|
||||
|
||||
Same as above, but in json
|
||||
|
||||
## diff last-applied
|
||||
|
||||
Porcelain command that retrieves the object and displays a diff against
|
||||
the local configuration
|
||||
|
||||
1. Diff the last-applied
|
||||
|
||||
```sh
|
||||
kubectl apply diff-last-applied -f deployment_nginx.yaml
|
||||
```
|
||||
|
||||
Opens up a 2-way diff in the default diff viewer. This should
|
||||
follow the same semantics as `git diff`. It should accept either a
|
||||
flag `--diff-viewer=meld` or check the environment variable
|
||||
`KUBECTL_EXTERNAL_DIFF=meld`. If neither is specified, the `diff`
|
||||
command should be used.
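For example, using the environment variable form proposed above (`meld` is just one possible external viewer):

```sh
KUBECTL_EXTERNAL_DIFF=meld kubectl apply diff-last-applied -f deployment_nginx.yaml
```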
|
||||
|
||||
This is meant to show the user what they have changed in the configuration
|
||||
since it was last applied, but not what has changed on the server.
|
||||
|
||||
The supported output formats should be `yaml` and `json`, as specified
|
||||
by the `-o` flag.
|
||||
|
||||
A future goal is to provide a 3-way diff with `kubectl apply diff -f deployment_nginx.yaml`.
|
||||
Together these tools would give the user the ability to see what is going
|
||||
on and compare changes made to the configuration file vs other
|
||||
changes made to the server independent of the configuration file.
|
||||
|
||||
## set last-applied
|
||||
|
||||
Porcelain command that sets the last-applied-config annotation as
|
||||
if the local configuration file had just been applied.
|
||||
|
||||
1. Set the last-applied-config
|
||||
|
||||
```sh
|
||||
kubectl apply set-last-applied -f deployment_nginx.yaml
|
||||
```
|
||||
|
||||
Sends a Patch request to set the last-applied-config as if
|
||||
the configuration had just been applied.
|
||||
|
||||
## edit last-applied
|
||||
|
||||
1. Open the last-applied-config in an editor
|
||||
|
||||
```sh
|
||||
kubectl apply edit-last-applied -f deployment_nginx.yaml
|
||||
```
|
||||
|
||||
Since the last-applied-configuration annotation exists only
|
||||
on the live object, this command can alternatively take the
|
||||
kind/name.
|
||||
|
||||
```sh
|
||||
kubectl apply edit-last-applied deployment/nginx
|
||||
```
|
||||
|
||||
Sends a Patch request to set the last-applied-config to
|
||||
the value saved in the editor.
|
||||
|
||||
## Example workflow to stop managing a field with apply - using get/set
|
||||
|
||||
As a user, I want to have the replicas on a Deployment managed by an autoscaler
|
||||
instead of by the configuration.
|
||||
|
||||
1. Check to make sure the live object is up-to-date
|
||||
- `kubectl apply diff-last-applied -f deployment_nginx.yaml`
|
||||
- Expect no changes
|
||||
2. Update the deployment_nginx.yaml by removing the replicas field
|
||||
3. Diff the last-applied-config to make sure the only change is the removal of the replicas field
|
||||
4. Remove the replicas field from the last-applied-config so it doesn't get deleted next apply
|
||||
- `kubectl apply set-last-applied -f deployment_nginx.yaml`
|
||||
5. Verify the last-applied-config has been updated
|
||||
- `kubectl apply view-last-applied -f deployment_nginx.yaml`
|
||||
|
||||
## Example workflow to stop managing a field with apply - using edit
|
||||
|
||||
1. Check to make sure the live object is up-to-date
|
||||
- `kubectl apply diff-last-applied -f deployment_nginx.yaml`
|
||||
- Expect no changes
|
||||
2. Update the deployment_nginx.yaml by removing the replicas field
|
||||
3. Edit the last-applied-config and remove the replicas field
|
||||
- `kubectl apply edit-last-applied deployment/nginx`
|
||||
4. Verify the last-applied-config has been updated
|
||||
- `kubectl apply view-last-applied -f deployment_nginx.yaml`
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
@ -0,0 +1,38 @@
|
|||
# <Title>
|
||||
|
||||
Status: Pending
|
||||
|
||||
Version: Alpha | Beta | GA
|
||||
|
||||
Implementation Owner: TBD
|
||||
|
||||
## Motivation
|
||||
|
||||
<2-6 sentences about why this is needed>
|
||||
|
||||
## Proposal
|
||||
|
||||
<4-6 sentence description of the proposed solution>
|
||||
|
||||
## User Experience
|
||||
|
||||
### Use Cases
|
||||
|
||||
<enumerated list of use cases for this feature>
|
||||
|
||||
<in depth description of user experience>
|
||||
|
||||
<*include full examples*>
|
||||
|
||||
## Implementation
|
||||
|
||||
<in depth description of how the feature will be implemented. in some cases this may be very simple.>
|
||||
|
||||
### Client/Server Backwards/Forwards compatibility
|
||||
|
||||
<define behavior when using a kubectl client with an older or newer version of the apiserver (+-1 version)>
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
<short description of alternative solutions to be considered>
|
||||
|
||||
|
|
@ -21,21 +21,38 @@ in your group;
|
|||
|
||||
2. Create pkg/apis/`<group>`/{register.go, `<version>`/register.go} to register
|
||||
this group's API objects to the encoding/decoding scheme (e.g.,
|
||||
[pkg/apis/authentication/register.go](../../pkg/apis/authentication/register.go) and
|
||||
[pkg/apis/authentication/v1beta1/register.go](../../pkg/apis/authentication/v1beta1/register.go);
|
||||
[pkg/apis/authentication/register.go](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/authentication/register.go)
|
||||
and
|
||||
[pkg/apis/authentication/v1beta1/register.go](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/authentication/v1beta1/register.go);
|
||||
The register files must have a var called SchemeBuilder for the generated code
|
||||
to reference. There must be an AddToScheme method for the installer to
|
||||
reference. You can look at a group under `pkg/apis/...` for example register.go
|
||||
files to use as a template, but do not copy the register.go files under
|
||||
`pkg/api/...`--they are not general.
|
||||
|
||||
3. Add a pkg/apis/`<group>`/install/install.go, which is responsible for adding
|
||||
the group to the `latest` package, so that other packages can access the group's
|
||||
meta through `latest.Group`. You probably only need to change the name of group
|
||||
and version in the [example](../../pkg/apis/authentication/install/install.go)). You
|
||||
need to import this `install` package in {pkg/master,
|
||||
pkg/client/unversioned}/import_known_versions.go, if you want to make your group
|
||||
accessible to other packages in the kube-apiserver binary, binaries that uses
|
||||
the client package.
|
||||
3. Add a pkg/apis/`<group>`/install/install.go, You probably only need to change
|
||||
the name of group and version in the
|
||||
[example](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/authentication/install/install.go)). This
|
||||
package must be imported by the server along with
|
||||
`k8s.io/kubernetes/pkg/api/install`. Import these packages with the blank
|
||||
identifier as they do not have user callable code and exist solely for their
|
||||
initialization side-effects.
|
||||
|
||||
Steps 2 and 3 are mechanical; we plan to autogenerate these using the
|
||||
cmd/libs/go2idl/ tool.
|
||||
|
||||
### Type definitions in `types.go`
|
||||
|
||||
Each type should be an exported struct (have a capitalized name). The struct
|
||||
should have the `TypeMeta` and `ObjectMeta` embeds. There should be a `Spec` and
|
||||
a `Status` field. If the object is solely a data storage object, and will not be
|
||||
modified by a controller, the status field can be left off and the fields inside
|
||||
the `Spec` can be inlined directly into the struct.
|
||||
|
||||
For each top-level type there should also be a `List` struct. The `List` struct should
|
||||
have the `TypeMeta` and `ListMeta` embeds. There should be an `Items` field that
|
||||
is a slice of the defined type.
|
||||
|
||||
### Scripts changes and auto-generated code:
|
||||
|
||||
1. Generate conversions and deep-copies:
|
||||
|
|
|
|||
|
|
@ -1,12 +1,11 @@
|
|||
API Conventions
|
||||
===============
|
||||
|
||||
Updated: 4/22/2016
|
||||
Updated: 2/23/2017
|
||||
|
||||
*This document is oriented at users who want a deeper understanding of the
|
||||
Kubernetes API structure, and developers wanting to extend the Kubernetes API.
|
||||
An introduction to using resources with kubectl can be found in [Working with
|
||||
resources](../user-guide/working-with-resources.md).*
|
||||
An introduction to using resources with kubectl can be found in [the object management overview](https://kubernetes.io/docs/concepts/tools/kubectl/object-management-overview/).*
|
||||
|
||||
**Table of Contents**
|
||||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
|
@ -53,7 +52,7 @@ resources](../user-guide/working-with-resources.md).*
|
|||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
The conventions of the [Kubernetes API](../api.md) (and related APIs in the
|
||||
The conventions of the [Kubernetes API](https://kubernetes.io/docs/api/) (and related APIs in the
|
||||
ecosystem) are intended to ease client development and ensure that configuration
|
||||
mechanisms can be implemented that work across a diverse set of use cases
|
||||
consistently.
|
||||
|
|
@ -75,6 +74,9 @@ kinds would have different attributes and properties)
|
|||
via HTTP to the server. Resources are exposed via:
|
||||
* Collections - a list of resources of the same type, which may be queryable
|
||||
* Elements - an individual resource, addressable via a URL
|
||||
* **API Group** a set of resources that are exposed together. Along
|
||||
with the version, the group is exposed in the "apiVersion" field as "GROUP/VERSION", e.g.
|
||||
"policy.k8s.io/v1".
|
||||
|
||||
Each resource typically accepts and returns data of a single kind. A kind may be
|
||||
accepted or returned by multiple resources that reflect specific use cases. For
|
||||
|
|
@ -83,8 +85,17 @@ to create, update, and delete pods, while a separate "pod status" resource (that
|
|||
acts on "Pod" kind) allows automated processes to update a subset of the fields
|
||||
in that resource.
|
||||
|
||||
Resources are bound together in API groups - each group may have one or more
|
||||
versions that evolve independent of other API groups, and each version within
|
||||
the group has one or more resources. Group names are typically in domain name
|
||||
form - the Kubernetes project reserves use of the empty group, all single
|
||||
word names ("extensions", "apps"), and any group name ending in "*.k8s.io" for
|
||||
its sole use. When choosing a group name, we recommend selecting a subdomain
|
||||
your group or organization owns, such as "widget.mycompany.com".
|
||||
|
||||
Resource collections should be all lowercase and plural, whereas kinds are
|
||||
CamelCase and singular.
|
||||
CamelCase and singular. Group names must be lower case and be valid DNS
|
||||
subdomains.
|
||||
|
||||
|
||||
## Types (Kinds)
|
||||
|
|
@ -114,7 +125,7 @@ the full list. Some objects may be singletons (the current user, the system
|
|||
defaults) and may not have lists.
|
||||
|
||||
In addition, all lists that return objects with labels should support label
|
||||
filtering (see [docs/user-guide/labels.md](../user-guide/labels.md), and most
|
||||
filtering (see [the labels documentation](https://kubernetes.io/docs/user-guide/labels/)), and most
|
||||
lists should support filtering by fields.
|
||||
|
||||
Examples: PodLists, ServiceLists, NodeLists
|
||||
|
|
@ -150,13 +161,23 @@ is independent of the specific resource schema.
|
|||
|
||||
Two additional subresources, `proxy` and `portforward`, provide access to
|
||||
cluster resources as described in
|
||||
[docs/user-guide/accessing-the-cluster.md](../user-guide/accessing-the-cluster.md).
|
||||
[accessing the cluster docs](https://kubernetes.io/docs/user-guide/accessing-the-cluster/).
|
||||
|
||||
The standard REST verbs (defined below) MUST return singular JSON objects. Some
|
||||
API endpoints may deviate from the strict REST pattern and return resources that
|
||||
are not singular JSON objects, such as streams of JSON objects or unstructured
|
||||
text log data.
|
||||
|
||||
A common set of "meta" API objects are used across all API groups and are
|
||||
thus considered part of the server group named `meta.k8s.io`. These types may
|
||||
evolve independent of the API group that uses them and API servers may allow
|
||||
them to be addressed in their generic form. Examples are `ListOptions`,
|
||||
`DeleteOptions`, `List`, `Status`, `WatchEvent`, and `Scale`. For historical
|
||||
reasons these types are part of each existing API group. Generic tools like
|
||||
quota, garbage collection, autoscalers, and generic clients like kubectl
|
||||
leverage these types to define consistent behavior across different resource
|
||||
types, like the interfaces in programming languages.
|
||||
|
||||
The term "kind" is reserved for these "top-level" API types. The term "type"
|
||||
should be used for distinguishing sub-categories within objects or subobjects.
|
||||
|
||||
|
|
@ -181,12 +202,12 @@ called "metadata":
|
|||
|
||||
* namespace: a namespace is a DNS compatible label that objects are subdivided
|
||||
into. The default namespace is 'default'. See
|
||||
[docs/user-guide/namespaces.md](../user-guide/namespaces.md) for more.
|
||||
[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more.
|
||||
* name: a string that uniquely identifies this object within the current
|
||||
namespace (see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)).
|
||||
namespace (see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)).
|
||||
This value is used in the path when retrieving an individual object.
|
||||
* uid: a unique in time and space value (typically an RFC 4122 generated
|
||||
identifier, see [docs/user-guide/identifiers.md](../user-guide/identifiers.md))
|
||||
identifier, see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/))
|
||||
used to distinguish between objects with the same name that have been deleted
|
||||
and recreated
|
||||
|
||||
|
|
@ -213,10 +234,10 @@ not reachable by name) after the time in this field. Once set, this value may
|
|||
not be unset or be set further into the future, although it may be shortened or
|
||||
the resource may be deleted prior to this time.
|
||||
* labels: a map of string keys and values that can be used to organize and
|
||||
categorize objects (see [docs/user-guide/labels.md](../user-guide/labels.md))
|
||||
categorize objects (see [the labels docs](https://kubernetes.io/docs/user-guide/labels/))
|
||||
* annotations: a map of string keys and values that can be used by external
|
||||
tooling to store and retrieve arbitrary metadata about this object (see
|
||||
[docs/user-guide/annotations.md](../user-guide/annotations.md))
|
||||
[the annotations docs](https://kubernetes.io/docs/user-guide/annotations/)
|
||||
|
||||
Labels are intended for organizational purposes by end users (select the pods
|
||||
that match this label query). Annotations enable third-party automation and
|
||||
|
|
@ -277,6 +298,14 @@ cannot vary from the user's desired intent MAY have only "spec", and MAY rename
|
|||
Objects that contain both spec and status should not contain additional
|
||||
top-level fields other than the standard metadata fields.
|
||||
|
||||
Some objects which are not persisted in the system - such as `SubjectAccessReview`
|
||||
and other webhook style calls - may choose to add spec and status to encapsulate
|
||||
a "call and response" pattern. The spec is the request (often a request for
|
||||
information) and the status is the response. For these RPC like objects the only
|
||||
operation may be POST, but having a consistent schema between submission and
|
||||
response reduces the complexity of these clients.
|
||||
|
||||
|
||||
##### Typical status properties
|
||||
|
||||
**Conditions** represent the latest available observations of an object's
|
||||
|
|
@ -322,7 +351,7 @@ Some resources in the v1 API contain fields called **`phase`**, and associated
|
|||
`message`, `reason`, and other status fields. The pattern of using `phase` is
|
||||
deprecated. Newer API types should use conditions instead. Phase was essentially
|
||||
a state-machine enumeration field, that contradicted
|
||||
[system-design principles](../design/principles.md#control-logic) and hampered
|
||||
[system-design principles](../design-proposals/principles.md#control-logic) and hampered
|
||||
evolution, since [adding new enum values breaks backward
|
||||
compatibility](api_changes.md). Rather than encouraging clients to infer
|
||||
implicit properties from phases, we intend to explicitly expose the conditions
|
||||
|
|
@ -346,7 +375,7 @@ only provided with reasonable effort, and is not guaranteed to not be lost.
|
|||
Status information that may be large (especially proportional in size to
|
||||
collections of other resources, such as lists of references to other objects --
|
||||
see below) and/or rapidly changing, such as
|
||||
[resource usage](../design/resources.md#usage-data), should be put into separate
|
||||
[resource usage](../design-proposals/resources.md#usage-data), should be put into separate
|
||||
objects, with possibly a reference from the original object. This helps to
|
||||
ensure that GETs and watch remain reasonably efficient for the majority of
|
||||
clients, which may not need that data.
|
||||
|
|
@ -359,9 +388,9 @@ the reported status reflects the most recent desired status.
|
|||
#### References to related objects
|
||||
|
||||
References to loosely coupled sets of objects, such as
|
||||
[pods](../user-guide/pods.md) overseen by a
|
||||
[replication controller](../user-guide/replication-controller.md), are usually
|
||||
best referred to using a [label selector](../user-guide/labels.md). In order to
|
||||
[pods](https://kubernetes.io/docs/user-guide/pods/) overseen by a
|
||||
[replication controller](https://kubernetes.io/docs/user-guide/replication-controller/), are usually
|
||||
best referred to using a [label selector](https://kubernetes.io/docs/user-guide/labels/). In order to
|
||||
ensure that GETs of individual objects remain bounded in time and space, these
|
||||
sets may be queried via separate API queries, but will not be expanded in the
|
||||
referring object's status.
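For example (a sketch; the controller name and label are illustrative), a client holding a replication controller lists its pods with a separate label-selector query rather than expecting them to be embedded in the controller's status:

```sh
# read the controller's selector, then issue a separate query for the matching pods
kubectl get rc my-nginx -o jsonpath='{.spec.selector}'
kubectl get pods -l app=nginx
```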
|
||||
|
|
@ -698,7 +727,7 @@ labels:
|
|||
All compatible Kubernetes APIs MUST support "name idempotency" and respond with
|
||||
an HTTP status code 409 when a request is made to POST an object that has the
|
||||
same name as an existing object in the system. See
|
||||
[docs/user-guide/identifiers.md](../user-guide/identifiers.md) for details.
|
||||
[the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/) for details.
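A quick way to see this behavior (sketch; `pod.yaml` stands in for any manifest you already have):

```sh
kubectl create -f pod.yaml   # first POST succeeds
kubectl create -f pod.yaml   # second POST is rejected with "AlreadyExists" (HTTP 409)
```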
|
||||
|
||||
Names generated by the system may be requested using `metadata.generateName`.
|
||||
GenerateName indicates that the name should be made unique by the server prior
|
||||
|
|
@ -1296,7 +1325,7 @@ that hard to consistently apply schemas that ensure uniqueness. One just needs
|
|||
to ensure that at least one value of some label key in common differs compared
|
||||
to all other comparable resources. We could/should provide a verification tool
|
||||
to check that. However, development of conventions similar to the examples in
|
||||
[Labels](../user-guide/labels.md) make uniqueness straightforward. Furthermore,
|
||||
[Labels](https://kubernetes.io/docs/user-guide/labels/) makes uniqueness straightforward. Furthermore,
|
||||
relatively narrowly used namespaces (e.g., per environment, per application) can
|
||||
be used to reduce the set of resources that could potentially cause overlap.
|
||||
|
||||
|
|
|
|||
|
|
@ -215,7 +215,7 @@ runs just prior to conversion. That works fine when the user creates a resource
|
|||
from a hand-written configuration -- clients can write either field and read
|
||||
either field, but what about creation or update from the output of GET, or
|
||||
update via PATCH (see
|
||||
[In-place updates](../user-guide/managing-deployments.md#in-place-updates-of-resources))?
|
||||
[In-place updates](https://kubernetes.io/docs/user-guide/managing-deployments/#in-place-updates-of-resources))?
|
||||
In this case, the two fields will conflict, because only one field would be
|
||||
updated in the case of an old client that was only aware of the old field (e.g.,
|
||||
`height`).
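To illustrate the hazard (a hypothetical sketch reusing the `height` field mentioned above; `frobber` is a made-up resource), a merge patch from an old client touches only the field it knows about, leaving the duplicated new field stale:

```sh
# frobber/my-frobber are hypothetical; only the old field name is patched,
# so the server must somehow reconcile the duplicated new field
kubectl patch frobber my-frobber --type=merge -p '{"spec":{"height":10}}'
```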
|
||||
|
|
@ -414,14 +414,10 @@ inefficient).
|
|||
|
||||
The conversion code resides with each versioned API. There are two files:
|
||||
|
||||
- `pkg/api/<version>/conversion.go` containing manually written conversion
|
||||
functions
|
||||
- `pkg/api/<version>/conversion_generated.go` containing auto-generated
|
||||
conversion functions
|
||||
- `pkg/apis/extensions/<version>/conversion.go` containing manually written
|
||||
conversion functions
|
||||
- `pkg/apis/extensions/<version>/conversion_generated.go` containing
|
||||
auto-generated conversion functions
|
||||
conversion functions
|
||||
- `pkg/apis/extensions/<version>/zz_generated.conversion.go` containing
|
||||
auto-generated conversion functions
|
||||
|
||||
Since auto-generated conversion functions are using manually written ones,
|
||||
those manually written should be named with a defined convention, i.e. a
|
||||
|
|
|
|||
|
|
@ -104,7 +104,7 @@ kindness...)
|
|||
PRs should only need to be manually re-tested if you believe there was a flake
|
||||
during the original test. All flakes should be filed as an
|
||||
[issue](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake).
|
||||
Once you find or file a flake a contributer (this may be you!) should request
|
||||
Once you find or file a flake a contributor (this may be you!) should request
|
||||
a retest with "@k8s-bot test this issue: #NNNNN", where NNNNN is replaced with
|
||||
the issue number you found or filed.
|
||||
|
||||
|
|
|
|||
|
|
@ -16,8 +16,7 @@ depending on the point in the release cycle.
|
|||
to set the same label to confirm that no release note is needed.
|
||||
1. `release-note` labeled PRs generate a release note using the PR title by
|
||||
default OR the release-note block in the PR template if filled in.
|
||||
* See the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md) for more
|
||||
details.
|
||||
* See the [PR template](https://github.com/kubernetes/kubernetes/blob/master/.github/PULL_REQUEST_TEMPLATE.md) for more details.
|
||||
* PR titles and body comments are mutable and can be modified at any time
|
||||
prior to the release to reflect a release note friendly message.
|
||||
|
||||
|
|
|
|||
|
|
@ -120,7 +120,7 @@ subdirectories).
|
|||
intended for users that deploy applications or cluster administrators,
|
||||
respectively. Actual application examples belong in /examples.
|
||||
- Examples should also illustrate [best practices for configuration and
|
||||
using the system](../user-guide/config-best-practices.md)
|
||||
using the system](https://kubernetes.io/docs/user-guide/config-best-practices/)
|
||||
|
||||
- Third-party code
|
||||
|
||||
|
|
|
|||
|
|
@ -19,7 +19,7 @@ around code review that govern all active contributors to Kubernetes.
|
|||
### Code of Conduct
|
||||
|
||||
The most important expectation of the Kubernetes community is that all members
|
||||
abide by the Kubernetes [community code of conduct](../../code-of-conduct.md).
|
||||
abide by the Kubernetes [community code of conduct](../../governance.md#code-of-conduct).
|
||||
Only by respecting each other can we develop a productive, collaborative
|
||||
community.
|
||||
|
||||
|
|
@ -42,7 +42,7 @@ contributors are considered to be anyone who meets any of the following criteria
|
|||
than 20 PRs in the previous year.
|
||||
* Filed more than three issues in the previous month, or more than 30 issues in
|
||||
the previous 12 months.
|
||||
* Commented on more than pull requests in the previous month, or
|
||||
* Commented on more than five pull requests in the previous month, or
|
||||
more than 50 pull requests in the previous 12 months.
|
||||
* Marked any PR as LGTM in the previous month.
|
||||
* Have *collaborator* permissions in the Kubernetes github project.
|
||||
|
|
@ -58,7 +58,7 @@ Because reviewers are often the first points of contact between new members of
|
|||
the community and can significantly impact the first impression of the
|
||||
Kubernetes community, reviewers are especially important in shaping the
|
||||
Kubernetes community. Reviewers are highly encouraged to review the
|
||||
[code of conduct](../../code-of-conduct.md) and are strongly encouraged to go above
|
||||
[code of conduct](../../governance.md#code-of-conduct) and are strongly encouraged to go above
|
||||
and beyond the code of conduct to promote a collaborative, respectful
|
||||
Kubernetes community.
|
||||
|
||||
|
|
|
|||
|
|
@ -1,15 +1,17 @@
|
|||
# Development Guide
|
||||
|
||||
This document is intended to be the canonical source of truth for things like
|
||||
supported toolchain versions for building Kubernetes. If you find a
|
||||
requirement that this doc does not capture, please
|
||||
[submit an issue](https://github.com/kubernetes/kubernetes/issues) on github. If
|
||||
you find other docs with references to requirements that are not simply links to
|
||||
this doc, please [submit an issue](https://github.com/kubernetes/kubernetes/issues).
|
||||
This document is the canonical source of truth for things like
|
||||
supported toolchain versions for building Kubernetes.
|
||||
|
||||
Please submit an [issue] on github if you
|
||||
* find a requirement that this doc does not capture,
|
||||
* find other docs with references to requirements that
|
||||
are not simply links to this doc.
|
||||
|
||||
This document is intended to be relative to the branch in which it is found.
|
||||
It is guaranteed that requirements will change over time for the development
|
||||
branch, but release branches of Kubernetes should not change.
|
||||
|
||||
Development branch requirements will change over time, but release branch
|
||||
requirements are frozen.
|
||||
|
||||
## Building Kubernetes with Docker
|
||||
|
||||
|
|
@ -19,189 +21,245 @@ Docker please follow [these instructions]
|
|||
|
||||
## Building Kubernetes on a local OS/shell environment
|
||||
|
||||
Many of the Kubernetes development helper scripts rely on a fairly up-to-date
|
||||
GNU tools environment, so most recent Linux distros should work just fine
|
||||
out-of-the-box. Note that Mac OS X ships with somewhat outdated BSD-based tools,
|
||||
some of which may be incompatible in subtle ways, so we recommend
|
||||
[replacing those with modern GNU tools]
|
||||
(https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/).
|
||||
Kubernetes development helper scripts assume an up-to-date
|
||||
GNU tools environment. Most recent Linux distros should work
|
||||
out-of-the-box.
|
||||
|
||||
### Go development environment
|
||||
Mac OS X ships with outdated BSD-based tools.
|
||||
We recommend installing [OS X GNU tools].
|
||||
|
||||
Kubernetes is written in the [Go](http://golang.org) programming language.
|
||||
To build Kubernetes without using Docker containers, you'll need a Go
|
||||
development environment. Builds for Kubernetes 1.0 - 1.2 require Go version
|
||||
1.4.2. Builds for Kubernetes 1.3 and higher require Go version 1.6.0. If you
|
||||
haven't set up a Go development environment, please follow [these
|
||||
instructions](http://golang.org/doc/code.html) to install the go tools.
|
||||
### etcd
|
||||
|
||||
Set up your GOPATH and add a path entry for go binaries to your PATH. Typically
|
||||
added to your ~/.profile:
|
||||
Kubernetes maintains state in [`etcd`][etcd-latest], a distributed key store.
|
||||
|
||||
Please [install it locally][etcd-install] to run local integration tests.
|
||||
|
||||
### Go
|
||||
|
||||
Kubernetes is written in [Go](http://golang.org).
|
||||
If you don't have a Go development environment,
|
||||
please [set one up](http://golang.org/doc/code.html).
|
||||
|
||||
|
||||
| Kubernetes | requires Go |
|
||||
|----------------|--------------|
|
||||
| 1.0 - 1.2 | 1.4.2 |
|
||||
| 1.3, 1.4 | 1.6 |
|
||||
| 1.5 and higher | 1.7 - 1.7.5 |
|
||||
| | [1.8][go-1.8] not verified as of Feb 2017 |
|
||||
|
||||
After installation, you'll need `GOPATH` defined,
|
||||
and `PATH` modified to access your Go binaries.
|
||||
|
||||
A common setup is
|
||||
```sh
|
||||
export GOPATH=$HOME/go
|
||||
export PATH=$PATH:$GOPATH/bin
|
||||
```
|
||||
|
||||
### Godep dependency management
|
||||
#### Upgrading Go
|
||||
|
||||
Kubernetes build and test scripts use [godep](https://github.com/tools/godep) to
|
||||
Upgrading Go requires specific modification of some scripts and container
|
||||
images.
|
||||
|
||||
- The image for cross compiling in [build/build-image/cross].
|
||||
The `VERSION` file and `Dockerfile`.
|
||||
- Update [dockerized-e2e-runner.sh] to run a kubekins-e2e with the desired Go version.
|
||||
This requires pushing the [e2e][e2e-image] and [test][test-image] images that are `FROM` the desired Go version.
|
||||
- The cross tag `KUBE_BUILD_IMAGE_CROSS_TAG` in [build/common.sh].
|
||||
|
||||
|
||||
#### Dependency management
|
||||
|
||||
Kubernetes build/test scripts use [`godep`](https://github.com/tools/godep) to
|
||||
manage dependencies.
|
||||
|
||||
#### Install godep
|
||||
|
||||
Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is
|
||||
installed on your system (some of godep's dependencies use the mercurial
|
||||
source control system). Use `apt-get install mercurial` or `yum install
|
||||
mercurial` on Linux, or [brew.sh](http://brew.sh) on OS X, or download directly
|
||||
from mercurial.
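For instance (package names per the platforms mentioned above):

```sh
# pick the one that matches your platform
sudo apt-get install mercurial      # Debian/Ubuntu
sudo yum install mercurial          # Fedora/CentOS/RHEL
brew install mercurial              # OS X via Homebrew
```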
|
||||
|
||||
Install godep (may require sudo):
|
||||
|
||||
```sh
|
||||
go get -u github.com/tools/godep
|
||||
```
|
||||
|
||||
Note:
|
||||
At this time, godep version >= v63 is known to work in the Kubernetes project.
|
||||
|
||||
To check your version of godep:
|
||||
|
||||
Check your version; `v63` or higher is known to work for Kubernetes.
|
||||
```sh
|
||||
$ godep version
|
||||
godep v74 (linux/amd64/go1.6.2)
|
||||
godep version
|
||||
```
|
||||
|
||||
Developers planning to managing dependencies in the `vendor/` tree may want to
|
||||
explore alternative environment setups. See
|
||||
[using godep to manage dependencies](godep.md).
|
||||
Developers planning to manage dependencies in the `vendor/` tree may want to
|
||||
explore alternative environment setups. See [using godep to manage dependencies](godep.md).
|
||||
|
||||
### Local build using make
|
||||
|
||||
To build Kubernetes using your local Go development environment (generate linux
|
||||
binaries):
|
||||
|
||||
## Workflow
|
||||
|
||||

|
||||
|
||||
### 1 Fork in the cloud
|
||||
|
||||
1. Visit https://github.com/kubernetes/kubernetes
|
||||
2. Click `Fork` button (top right) to establish a cloud-based fork.
|
||||
|
||||
### 2 Clone fork to local storage
|
||||
|
||||
Per Go's [workspace instructions][go-workspace], place Kubernetes' code on your
|
||||
`GOPATH` using the following cloning procedure.
|
||||
|
||||
Define a local working directory:
|
||||
|
||||
```sh
|
||||
make
|
||||
# If your GOPATH has multiple paths, pick
|
||||
# just one and use it instead of $GOPATH here
|
||||
working_dir=$GOPATH/src/k8s.io
|
||||
```
|
||||
|
||||
You may pass build options and packages to the script as necessary. For example,
|
||||
to build with optimizations disabled for enabling use of source debug tools:
|
||||
> If you already do Go development on github, the `k8s.io` directory
|
||||
> will be a sibling to your existing `github.com` directory.
|
||||
|
||||
Set `user` to match your github profile name:
|
||||
|
||||
```sh
|
||||
make GOGCFLAGS="-N -l"
|
||||
user={your github profile name}
|
||||
```
|
||||
|
||||
Both `$working_dir` and `$user` are mentioned in the figure above.
|
||||
|
||||
Create your clone:
|
||||
|
||||
```sh
|
||||
mkdir -p $working_dir
|
||||
cd $working_dir
|
||||
git clone https://github.com/$user/kubernetes.git
|
||||
# or: git clone git@github.com:$user/kubernetes.git
|
||||
|
||||
cd $working_dir/kubernetes
|
||||
git remote add upstream https://github.com/kubernetes/kubernetes.git
|
||||
# or: git remote add upstream git@github.com:kubernetes/kubernetes.git
|
||||
|
||||
# Never push to upstream master
|
||||
git remote set-url --push upstream no_push
|
||||
|
||||
# Confirm that your remotes make sense:
|
||||
git remote -v
|
||||
```
|
||||
|
||||
#### Define a pre-commit hook
|
||||
|
||||
Please link the Kubernetes pre-commit hook into your `.git` directory.
|
||||
|
||||
This hook checks your commits for formatting, building, doc generation, etc.
|
||||
It requires both `godep` and `etcd` on your `PATH`.
|
||||
|
||||
```sh
|
||||
cd $working_dir/kubernetes/.git/hooks
|
||||
ln -s ../../hooks/pre-commit .
|
||||
```
|
||||
|
||||
### 3 Branch
|
||||
|
||||
Get your local master up to date:
|
||||
|
||||
```sh
|
||||
cd $working_dir/kubernetes
|
||||
git fetch upstream
|
||||
git checkout master
|
||||
git rebase upstream/master
|
||||
```
|
||||
|
||||
Branch from it:
|
||||
```sh
|
||||
git checkout -b myfeature
|
||||
```
|
||||
|
||||
Then edit code on the `myfeature` branch.
|
||||
|
||||
#### Build
|
||||
|
||||
```sh
|
||||
cd $working_dir/kubernetes
|
||||
make
|
||||
```
|
||||
|
||||
To build with optimizations disabled for enabling use of source debug tools:
|
||||
|
||||
```sh
|
||||
make GOGCFLAGS="-N -l"
|
||||
```
|
||||
|
||||
To build binaries for all platforms:
|
||||
|
||||
```sh
|
||||
make cross
|
||||
```
|
||||
|
||||
### How to update the Go version used to test & build k8s
|
||||
|
||||
The kubernetes project tries to stay on the latest version of Go so it can
|
||||
benefit from the improvements to the language over time and can easily
|
||||
bump to a minor release version for security updates.
|
||||
|
||||
Since kubernetes is mostly built and tested in containers, there are a few
|
||||
unique places you need to update the go version.
|
||||
|
||||
- The image for cross compiling in [build/build-image/cross/](https://github.com/kubernetes/kubernetes/blob/master/build/build-image/cross/). The `VERSION` file and `Dockerfile`.
|
||||
- Update [dockerized-e2e-runner.sh](https://github.com/kubernetes/test-infra/blob/master/jenkins/dockerized-e2e-runner.sh) to run a kubekins-e2e with the desired go version, which requires pushing [e2e-image](https://github.com/kubernetes/test-infra/tree/master/jenkins/e2e-image) and [test-image](https://github.com/kubernetes/test-infra/tree/master/jenkins/test-image) images that are `FROM` the desired go version.
|
||||
- The docker image being run in [gotest-dockerized.sh](https://github.com/kubernetes/test-infra/blob/master/jenkins/gotest-dockerized.sh).
|
||||
- The cross tag `KUBE_BUILD_IMAGE_CROSS_TAG` in [build/common.sh](https://github.com/kubernetes/kubernetes/blob/master/build/common.sh)
|
||||
|
||||
## Workflow
|
||||
|
||||
Below, we outline one of the more common git workflows that core developers use.
|
||||
Other git workflows are also valid.
|
||||
|
||||
### Visual overview
|
||||
|
||||

|
||||
|
||||
### Fork the main repository
|
||||
|
||||
1. Go to https://github.com/kubernetes/kubernetes
|
||||
2. Click the "Fork" button (at the top right)
|
||||
|
||||
### Clone your fork
|
||||
|
||||
The commands below require that you have $GOPATH set ([$GOPATH
|
||||
docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put
|
||||
Kubernetes' code into your GOPATH. Note: the commands below will not work if
|
||||
there is more than one directory in your `$GOPATH`.
|
||||
#### Test
|
||||
|
||||
```sh
|
||||
mkdir -p $GOPATH/src/k8s.io
|
||||
cd $GOPATH/src/k8s.io
|
||||
# Replace "$YOUR_GITHUB_USERNAME" below with your github username
|
||||
git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git
|
||||
cd kubernetes
|
||||
git remote add upstream 'https://github.com/kubernetes/kubernetes.git'
|
||||
cd $working_dir/kubernetes
|
||||
|
||||
# Run every unit test
|
||||
make test
|
||||
|
||||
# Run package tests verbosely
|
||||
make test WHAT=pkg/util/cache GOFLAGS=-v
|
||||
|
||||
# Run integration tests, requires etcd
|
||||
make test-integration
|
||||
|
||||
# Run e2e tests
|
||||
make test-e2e
|
||||
```
|
||||
|
||||
### Create a branch and make changes
|
||||
|
||||
```sh
|
||||
git checkout -b my-feature
|
||||
# Make your code changes
|
||||
```
|
||||
|
||||
### Keeping your development fork in sync
|
||||
See the [testing guide](testing.md) and [end-to-end tests](e2e-tests.md)
|
||||
for additional information and scenarios.
|
||||
|
||||
### 4 Keep your branch in sync
|
||||
|
||||
```sh
|
||||
# While on your myfeature branch
|
||||
git fetch upstream
|
||||
git rebase upstream/master
|
||||
```
|
||||
|
||||
Note: If you have write access to the main repository at
|
||||
github.com/kubernetes/kubernetes, you should modify your git configuration so
|
||||
that you can't accidentally push to upstream:
|
||||
### 5 Commit
|
||||
|
||||
```sh
|
||||
git remote set-url --push upstream no_push
|
||||
```
|
||||
|
||||
### Committing changes to your fork
|
||||
|
||||
Before committing any changes, please link/copy the pre-commit hook into your
|
||||
.git directory. This will keep you from accidentally committing non-gofmt'd Go
|
||||
code. This hook will also do a build and test whether documentation generation
|
||||
scripts need to be executed.
|
||||
|
||||
The hook requires both Godep and etcd on your `PATH`.
|
||||
|
||||
```sh
|
||||
cd kubernetes/.git/hooks/
|
||||
ln -s ../../hooks/pre-commit .
|
||||
```
|
||||
|
||||
Then you can commit your changes and push them to your fork:
|
||||
Commit your changes.
|
||||
|
||||
```sh
|
||||
git commit
|
||||
git push -f origin my-feature
|
||||
```
|
||||
You will likely go back to edit, build, and test some more, then `commit --amend`
|
||||
in a few cycles.
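A typical inner loop looks something like this (a sketch; the package path is just an example):

```sh
# edit, rebuild, retest ...
make
make test WHAT=pkg/util/cache
# ... then fold the result into the previous commit
git add -u
git commit --amend --no-edit
```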
|
||||
|
||||
### 6 Push
|
||||
|
||||
When ready to review (or just to establish an offsite backup of your work),
|
||||
push your branch to your fork on `github.com`:
|
||||
|
||||
```sh
|
||||
git push -f origin myfeature
|
||||
```
|
||||
|
||||
### Creating a pull request
|
||||
### 7 Create a pull request
|
||||
|
||||
1. Visit https://github.com/$YOUR_GITHUB_USERNAME/kubernetes
|
||||
2. Click the "Compare & pull request" button next to your "my-feature" branch.
|
||||
3. Check out the pull request [process](pull-requests.md) for more details
|
||||
1. Visit your fork at https://github.com/$user/kubernetes (replace `$user` obviously).
|
||||
2. Click the `Compare & pull request` button next to your `myfeature` branch.
|
||||
3. Check out the pull request [process](pull-requests.md) for more details.
|
||||
|
||||
**Note:** If you have write access, please refrain from using the GitHub UI for creating PRs, because GitHub will create the PR branch inside the main repository rather than inside your fork.
|
||||
_If you have upstream write access_, please refrain from using the GitHub UI for
|
||||
creating PRs, because GitHub will create the PR branch inside the main
|
||||
repository rather than inside your fork.
|
||||
|
||||
### Getting a code review
|
||||
#### Get a code review
|
||||
|
||||
Once your pull request has been opened it will be assigned to one or more
|
||||
reviewers. Those reviewers will do a thorough code review, looking for
|
||||
correctness, bugs, opportunities for improvement, documentation and comments,
|
||||
and style.
|
||||
|
||||
Commit changes made in response to review comments to the same branch on your
|
||||
fork.
|
||||
|
||||
Very small PRs are easy to review. Very large PRs are very difficult to
|
||||
review. Github has a built-in code review tool, which is what most people use.
|
||||
review.
|
||||
|
||||
At the assigned reviewer's discretion, a PR may be switched to use
|
||||
[Reviewable](https://reviewable.k8s.io) instead. Once a PR is switched to
|
||||
Reviewable, please ONLY send or reply to comments through reviewable. Mixing
|
||||
|
|
@ -210,41 +268,39 @@ code review tools can be very confusing.
|
|||
See [Faster Reviews](faster_reviews.md) for some thoughts on how to streamline
|
||||
the review process.
|
||||
|
||||
### When to retain commits and when to squash
|
||||
|
||||
Upon merge, all git commits should represent meaningful milestones or units of
|
||||
work. Use commits to add clarity to the development and review process.
|
||||
#### Squash and Merge
|
||||
|
||||
Before merging a PR, squash any "fix review feedback", "typo", and "rebased"
|
||||
sorts of commits. It is not imperative that every commit in a PR compile and
|
||||
pass tests independently, but it is worth striving for. For mass automated
|
||||
fixups (e.g. automated doc formatting), use one or more commits for the
|
||||
changes to tooling and a final commit to apply the fixup en masse. This makes
|
||||
reviews much easier.
|
||||
|
||||
## Testing
|
||||
|
||||
Three basic commands let you run unit, integration and/or e2e tests:
|
||||
|
||||
```sh
|
||||
cd kubernetes
|
||||
make test # Run every unit test
|
||||
make test WHAT=pkg/util/cache GOFLAGS=-v # Run tests of a package verbosely
|
||||
make test-integration # Run integration tests, requires etcd
|
||||
make test-e2e # Run e2e tests
|
||||
```
|
||||
|
||||
See the [testing guide](testing.md) and [end-to-end tests](e2e-tests.md) for additional information and scenarios.
|
||||
|
||||
## Regenerating the CLI documentation
|
||||
|
||||
```sh
|
||||
hack/update-generated-docs.sh
|
||||
```
|
||||
Upon merge (by either you or your reviewer), all commits left on the review
|
||||
branch should represent meaningful milestones or units of work. Use commits to
|
||||
add clarity to the development and review process.
|
||||
|
||||
Before merging a PR, squash any _fix review feedback_, _typo_, and _rebased_
|
||||
sorts of commits.
|
||||
|
||||
It is not imperative that every commit in a PR compile and pass tests
|
||||
independently, but it is worth striving for.
|
||||
|
||||
For mass automated fixups (e.g. automated doc formatting), use one or more
|
||||
commits for the changes to tooling and a final commit to apply the fixup en
|
||||
masse. This makes reviews easier.
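One way to do that squashing (a sketch, reusing the branch names from earlier in this guide):

```sh
# interactively mark "typo" / "fix review feedback" commits as squash or fixup
git rebase -i upstream/master
# the branch history has been rewritten, so a forced push is required
git push -f origin myfeature
```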
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
||||
[OS X GNU tools]: https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x
|
||||
[build/build-image/cross]: https://github.com/kubernetes/kubernetes/blob/master/build/build-image/cross
|
||||
[build/common.sh]: https://github.com/kubernetes/kubernetes/blob/master/build/common.sh
|
||||
[dockerized-e2e-runner.sh]: https://github.com/kubernetes/test-infra/blob/master/jenkins/dockerized-e2e-runner.sh
|
||||
[e2e-image]: https://github.com/kubernetes/test-infra/tree/master/jenkins/e2e-image
|
||||
[etcd-latest]: https://coreos.com/etcd/docs/latest
|
||||
[etcd-install]: testing.md#install-etcd-dependency
|
||||
<!-- https://github.com/coreos/etcd/releases -->
|
||||
[go-1.8]: https://blog.golang.org/go1.8
|
||||
[go-workspace]: https://golang.org/doc/code.html#Workspaces
|
||||
[issue]: https://github.com/kubernetes/kubernetes/issues
|
||||
[kubectl user guide]: https://kubernetes.io/docs/user-guide/kubectl
|
||||
[kubernetes.io]: https://kubernetes.io
|
||||
[mercurial]: http://mercurial.selenic.com/wiki/Download
|
||||
[test-image]: https://github.com/kubernetes/test-infra/tree/master/jenkins/test-image
|
||||
|
|
|
|||
|
|
@ -137,7 +137,7 @@ make test-e2e-node REMOTE=true IMAGE_PROJECT="<name-of-project-with-images>" IMA
|
|||
```
|
||||
|
||||
Setting up your own host image may require additional steps such as installing etcd or docker. See
|
||||
[setup_host.sh](../../test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests.
|
||||
[setup_host.sh](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node/environment/setup_host.sh) for common steps to set up hosts to run node tests.
|
||||
|
||||
## Create instances using a different instance name prefix
|
||||
|
||||
|
|
@ -202,8 +202,10 @@ related test, Remote execution is recommended.**
|
|||
To enable/disable kubenet:
|
||||
|
||||
```sh
|
||||
make test_e2e_node TEST_ARGS="--disable-kubenet=true" # enable kubenet
|
||||
make test_e2e_node TEST_ARGS="--disable-kubenet=false" # disable kubenet
|
||||
# enable kubenet
|
||||
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin=kubenet --network-plugin-dir=/opt/cni/bin"'
|
||||
# disable kubenet
|
||||
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin= --network-plugin-dir="'
|
||||
```
|
||||
|
||||
## Additional QoS Cgroups Hierarchy level testing
|
||||
|
|
@ -221,9 +223,9 @@ the bottom of the comments section. To re-run just the node e2e tests from the
|
|||
`@k8s-bot node e2e test this issue: #<Flake-Issue-Number or IGNORE>` and **include a link to the test
|
||||
failure logs if caused by a flake.**
|
||||
|
||||
The PR builder runs tests against the images listed in [jenkins-pull.properties](../../test/e2e_node/jenkins/jenkins-pull.properties)
|
||||
The PR builder runs tests against the images listed in [jenkins-pull.properties](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node/jenkins/jenkins-pull.properties)
|
||||
|
||||
The post submit tests run against the images listed in [jenkins-ci.properties](../../test/e2e_node/jenkins/jenkins-ci.properties)
|
||||
The post submit tests run against the images listed in [jenkins-ci.properties](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node/jenkins/jenkins-ci.properties)
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
|
|||
|
|
@ -10,6 +10,8 @@ Updated: 5/3/2016
|
|||
- [Building and Running the Tests](#building-and-running-the-tests)
|
||||
- [Cleaning up](#cleaning-up)
|
||||
- [Advanced testing](#advanced-testing)
|
||||
- [Installing/updating kubetest](#installingupdating-kubetest)
|
||||
- [Extracting a specific version of kubernetes](#extracting-a-specific-version-of-kubernetes)
|
||||
- [Bringing up a cluster for testing](#bringing-up-a-cluster-for-testing)
|
||||
- [Federation e2e tests](#federation-e2e-tests)
|
||||
- [Configuring federation e2e tests](#configuring-federation-e2e-tests)
|
||||
|
|
@ -79,26 +81,26 @@ changing the `KUBERNETES_PROVIDER` environment variable to something other than
|
|||
To build Kubernetes, up a cluster, run tests, and tear everything down, use:
|
||||
|
||||
```sh
|
||||
go run hack/e2e.go -v --build --up --test --down
|
||||
go run hack/e2e.go -- -v --build --up --test --down
|
||||
```
|
||||
|
||||
If you'd like to just perform one of these steps, here are some examples:
|
||||
|
||||
```sh
|
||||
# Build binaries for testing
|
||||
go run hack/e2e.go -v --build
|
||||
go run hack/e2e.go -- -v --build
|
||||
|
||||
# Create a fresh cluster. Deletes a cluster first, if it exists
|
||||
go run hack/e2e.go -v --up
|
||||
go run hack/e2e.go -- -v --up
|
||||
|
||||
# Run all tests
|
||||
go run hack/e2e.go -v --test
|
||||
go run hack/e2e.go -- -v --test
|
||||
|
||||
# Run tests matching the regex "\[Feature:Performance\]"
|
||||
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Feature:Performance\]"
|
||||
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Feature:Performance\]"
|
||||
|
||||
# Conversely, exclude tests that match the regex "Pods.*env"
|
||||
go run hack/e2e.go -v --test --test_args="--ginkgo.skip=Pods.*env"
|
||||
go run hack/e2e.go -- -v --test --test_args="--ginkgo.skip=Pods.*env"
|
||||
|
||||
# Run tests in parallel, skip any that must be run serially
|
||||
GINKGO_PARALLEL=y go run hack/e2e.go --v --test --test_args="--ginkgo.skip=\[Serial\]"
|
||||
|
|
@ -112,13 +114,13 @@ GINKGO_PARALLEL=y go run hack/e2e.go --v --test --test_args="--ginkgo.skip=\[Ser
|
|||
# You can also specify an alternative provider, such as 'aws'
|
||||
#
|
||||
# e.g.:
|
||||
KUBERNETES_PROVIDER=aws go run hack/e2e.go -v --build --up --test --down
|
||||
KUBERNETES_PROVIDER=aws go run hack/e2e.go -- -v --build --up --test --down
|
||||
|
||||
# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for
|
||||
# cleaning up after a failed test or viewing logs. Use -v to avoid suppressing
|
||||
# kubectl output.
|
||||
go run hack/e2e.go -v -ctl='get events'
|
||||
go run hack/e2e.go -v -ctl='delete pod foobar'
|
||||
go run hack/e2e.go -- -v -ctl='get events'
|
||||
go run hack/e2e.go -- -v -ctl='delete pod foobar'
|
||||
```
|
||||
|
||||
The tests are built into a single binary which can be used to deploy a
|
||||
|
|
@ -133,11 +135,60 @@ something goes wrong and you still have some VMs running you can force a cleanup
|
|||
with this command:
|
||||
|
||||
```sh
|
||||
go run hack/e2e.go -v --down
|
||||
go run hack/e2e.go -- -v --down
|
||||
```
|
||||
|
||||
## Advanced testing
|
||||
|
||||
### Installing/updating kubetest
|
||||
|
||||
The logic in `e2e.go` moved out of the main kubernetes repo to test-infra.
|
||||
The remaining code in `hack/e2e.go` installs `kubetest` and sends it flags.
|
||||
It now lives in [kubernetes/test-infra/kubetest](https://github.com/kubernetes/test-infra/tree/master/kubetest).
|
||||
By default `hack/e2e.go` updates and installs `kubetest` once per day.
|
||||
Control the updater behavior with the `--get` and `--old` flags:
|
||||
The `--` flag separates updater and kubetest flags (kubetest flags on the right).
|
||||
|
||||
```sh
|
||||
go run hack/e2e.go --get=true --old=1h -- # Update every hour
|
||||
go run hack/e2e.go --get=false -- # Never attempt to install/update.
|
||||
go install k8s.io/test-infra/kubetest # Manually install
|
||||
go get -u k8s.io/test-infra/kubetest # Manually update installation
|
||||
```
|
||||
### Extracting a specific version of kubernetes
|
||||
|
||||
The `kubetest` binary can download and extract a specific version of kubernetes,
|
||||
including the server, client, and test binaries. The `--extract=E` flag enables this
|
||||
functionality.
|
||||
|
||||
There are a variety of values to pass this flag:
|
||||
|
||||
```sh
|
||||
# Official builds: <ci|release>/<latest|stable>[-N.N]
|
||||
go run hack/e2e.go -- --extract=ci/latest --up # Deploy the latest ci build.
|
||||
go run hack/e2e.go -- --extract=ci/latest-1.5 --up # Deploy the latest 1.5 CI build.
|
||||
go run hack/e2e.go -- --extract=release/latest --up # Deploy the latest RC.
|
||||
go run hack/e2e.go -- --extract=release/stable-1.5 --up # Deploy the 1.5 release.
|
||||
|
||||
# A specific version:
|
||||
go run hack/e2e.go -- --extract=v1.5.1 --up # Deploy 1.5.1
|
||||
go run hack/e2e.go -- --extract=v1.5.2-beta.0 --up # Deploy 1.5.2-beta.0
|
||||
go run hack/e2e.go -- --extract=gs://foo/bar --up # --stage=gs://foo/bar
|
||||
|
||||
# Whatever GKE is using (gke, gke-staging, gke-test):
|
||||
go run hack/e2e.go -- --extract=gke --up # Deploy whatever GKE prod uses
|
||||
|
||||
# Using a GCI version:
|
||||
go run hack/e2e.go -- --extract=gci/gci-canary --up # Deploy the version for next gci release
|
||||
go run hack/e2e.go -- --extract=gci/gci-57 # Deploy the version bound to gci m57
|
||||
go run hack/e2e.go -- --extract=gci/gci-57/ci/latest # Deploy the latest CI build using gci m57 for the VM image
|
||||
|
||||
# Reuse whatever is already built
|
||||
go run hack/e2e.go -- --up # Most common. Note, no extract flag
|
||||
go run hack/e2e.go -- --build --up # Most common. Note, no extract flag
|
||||
go run hack/e2e.go -- --build --stage=gs://foo/bar --extract=local --up # Extract the staged version
|
||||
```
|
||||
|
||||
### Bringing up a cluster for testing
|
||||
|
||||
If you want, you may bring up a cluster in some other manner and run tests
|
||||
|
|
@ -265,7 +316,7 @@ Next, specify the docker repository where your ci images will be pushed.
|
|||
* Compile the binaries and build container images:
|
||||
|
||||
```sh
|
||||
$ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true go run hack/e2e.go -v -build
|
||||
$ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true go run hack/e2e.go -- -v -build
|
||||
```
|
||||
|
||||
* Push the federation container images
|
||||
|
|
@ -280,7 +331,7 @@ The following command will create the underlying Kubernetes clusters in each of
|
|||
federation control plane in the cluster occupying the last zone in the `E2E_ZONES` list.
|
||||
|
||||
```sh
|
||||
$ go run hack/e2e.go -v --up
|
||||
$ go run hack/e2e.go -- -v --up
|
||||
```
|
||||
|
||||
#### Run the Tests
|
||||
|
|
@ -288,13 +339,13 @@ $ go run hack/e2e.go -v --up
|
|||
This will run only the `Feature:Federation` e2e tests. You can omit the `ginkgo.focus` argument to run the entire e2e suite.
|
||||
|
||||
```sh
|
||||
$ go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Feature:Federation\]"
|
||||
$ go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Feature:Federation\]"
|
||||
```
|
||||
|
||||
#### Teardown
|
||||
|
||||
```sh
|
||||
$ go run hack/e2e.go -v --down
|
||||
$ go run hack/e2e.go -- -v --down
|
||||
```
|
||||
|
||||
#### Shortcuts for test developers
|
||||
|
|
@ -364,13 +415,13 @@ at a custom host directly:
|
|||
export KUBECONFIG=/path/to/kubeconfig
|
||||
export KUBE_MASTER_IP="http://127.0.0.1:<PORT>"
|
||||
export KUBE_MASTER=local
|
||||
go run hack/e2e.go -v --test
|
||||
go run hack/e2e.go -- -v --test
|
||||
```
|
||||
|
||||
To control the tests that are run:
|
||||
|
||||
```sh
|
||||
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\"Secrets\""
|
||||
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\"Secrets\""
|
||||
```
|
||||
|
||||
### Version-skewed and upgrade testing
|
||||
|
|
@ -403,7 +454,7 @@ export CLUSTER_API_VERSION=${OLD_VERSION}
|
|||
|
||||
# Deploy a cluster at the old version; see above for more details
|
||||
cd ./kubernetes_old
|
||||
go run ./hack/e2e.go -v --up
|
||||
go run ./hack/e2e.go -- -v --up
|
||||
|
||||
# Upgrade the cluster to the new version
|
||||
#
|
||||
|
|
@ -411,11 +462,11 @@ go run ./hack/e2e.go -v --up
|
|||
#
|
||||
# You can target Feature:MasterUpgrade or Feature:ClusterUpgrade
|
||||
cd ../kubernetes
|
||||
go run ./hack/e2e.go -v --test --check_version_skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]"
|
||||
go run ./hack/e2e.go -- -v --test --check_version_skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]"
|
||||
|
||||
# Run old tests with new kubectl
|
||||
cd ../kubernetes_old
|
||||
go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
|
||||
go run ./hack/e2e.go -- -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
|
||||
```
|
||||
|
||||
If you are just testing version-skew, you may want to just deploy at one
|
||||
|
|
@ -427,14 +478,14 @@ upgrade process:
|
|||
|
||||
# Deploy a cluster at the new version
|
||||
cd ./kubernetes
|
||||
go run ./hack/e2e.go -v --up
|
||||
go run ./hack/e2e.go -- -v --up
|
||||
|
||||
# Run new tests with old kubectl
|
||||
go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh"
|
||||
go run ./hack/e2e.go -- -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh"
|
||||
|
||||
# Run old tests with new kubectl
|
||||
cd ../kubernetes_old
|
||||
go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
|
||||
go run ./hack/e2e.go -- -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
|
||||
```
|
||||
|
||||
## Kinds of tests
|
||||
|
|
@ -480,6 +531,15 @@ breaking changes, it does *not* block the merge-queue, and thus should run in
|
|||
some separate test suites owned by the feature owner(s)
|
||||
(see [Continuous Integration](#continuous-integration) below).
|
||||
|
||||
In order to simplify running component-specific test suites, it may also be
|
||||
necessary to tag tests with a component label. The component may include
|
||||
standard and non-standard tests, so the `[Feature:.+]` label is not sufficient for
|
||||
this purpose. These component labels have no impact on the standard e2e test
|
||||
suites. The following component labels have been defined:
|
||||
|
||||
- `[Volume]`: All tests related to volumes and storage: volume plugins,
|
||||
attach/detach controller, persistent volume controller, etc.
|
||||
|
||||
### Viper configuration and hierarchical test parameters.
|
||||
|
||||
The future of e2e test configuration idioms will be increasingly defined using viper, and decreasingly via flags.
|
||||
|
|
@ -490,7 +550,7 @@ To use viper, rather than flags, to configure your tests:
|
|||
|
||||
- Just add "e2e.json" to the current directory you are in, and define parameters in it... i.e. `"kubeconfig":"/tmp/x"`.
|
||||
|
||||
Note that advanced testing parameters, and hierarchichally defined parameters, are only defined in viper, to see what they are, you can dive into [TestContextType](../../test/e2e/framework/test_context.go).
|
||||
Note that advanced testing parameters and hierarchically defined parameters are only defined in viper; to see what they are, you can dive into [TestContextType](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/test_context.go).
|
||||
|
||||
In time, it is our intent to add or autogenerate a sample viper configuration that includes all e2e parameters, to ship with kubernetes.
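Until then, a minimal hand-written config is enough to get started (a sketch; the `kubeconfig` path reuses the example above):

```sh
# drop an e2e.json into the directory you run the tests from
cat > e2e.json <<EOF
{
  "kubeconfig": "/tmp/x"
}
EOF
go run hack/e2e.go -- -v --test
```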
|
||||
|
||||
|
|
@ -527,13 +587,13 @@ export KUBERNETES_CONFORMANCE_TEST=y
|
|||
export KUBERNETES_PROVIDER=skeleton
|
||||
|
||||
# run all conformance tests
|
||||
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Conformance\]"
|
||||
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Conformance\]"
|
||||
|
||||
# run all parallel-safe conformance tests in parallel
|
||||
GINKGO_PARALLEL=y go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]"
|
||||
GINKGO_PARALLEL=y go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]"
|
||||
|
||||
# ... and finish up with remaining tests in serial
|
||||
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Serial\].*\[Conformance\]"
|
||||
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Serial\].*\[Conformance\]"
|
||||
```
|
||||
|
||||
### Defining Conformance Subset
|
||||
|
|
|
|||
|
|
@ -19,7 +19,7 @@
|
|||
[Gubernator](https://k8s-gubernator.appspot.com/) is a webpage for viewing and filtering Kubernetes
|
||||
test results.
|
||||
|
||||
Gubernator simplifies the debugging proccess and makes it easier to track down failures by automating many
|
||||
Gubernator simplifies the debugging process and makes it easier to track down failures by automating many
|
||||
steps commonly taken in searching through logs, and by offering tools to filter through logs to find relevant lines.
|
||||
Gubernator automates the steps of finding the failed tests, displaying relevant logs, and determining the
|
||||
failed pods and the corresponding pod UID, namespace, and container ID.
|
||||
|
|
@ -83,7 +83,7 @@ included, the "Weave by timestamp" option can weave the selected logs together b
|
|||
|
||||
*Currently Gubernator can only be used with remote node e2e tests.*
|
||||
|
||||
**NOTE: Using Gubernator with local tests will publically upload your test logs to Google Cloud Storage**
|
||||
**NOTE: Using Gubernator with local tests will publicly upload your test logs to Google Cloud Storage**
|
||||
|
||||
To use Gubernator to view logs from local test runs, set the GUBERNATOR tag to true.
|
||||
A URL link to view the test results will be printed to the console.
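For example (a sketch; we assume the node e2e make target shown elsewhere in these docs honors a `GUBERNATOR` variable):

```sh
# assumption: GUBERNATOR is passed as a make variable alongside the usual node e2e flags
make test-e2e-node REMOTE=true GUBERNATOR=true
```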
|
||||
|
|
|
|||
|
|
@ -1,32 +1,3 @@
|
|||
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
<!-- BEGIN STRIP_FOR_RELEASE -->
|
||||
|
||||
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
||||
width="25" height="25">
|
||||
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
||||
width="25" height="25">
|
||||
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
||||
width="25" height="25">
|
||||
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
||||
width="25" height="25">
|
||||
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
|
||||
width="25" height="25">
|
||||
|
||||
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
|
||||
|
||||
If you are using a released version of Kubernetes, you should
|
||||
refer to the docs that go with that version.
|
||||
|
||||
Documentation for other releases can be found at
|
||||
[releases.k8s.io](http://releases.k8s.io).
|
||||
</strong>
|
||||
--
|
||||
|
||||
<!-- END STRIP_FOR_RELEASE -->
|
||||
|
||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||
|
||||
# Container Runtime Interface (CRI) Networking Specifications
|
||||
|
||||
## Introduction
|
||||
|
|
@ -82,4 +53,4 @@ k8s networking requirements are satisfied.
|
|||
|
||||
## Related Issues
|
||||
* Kubelet network plugin for client/server container runtimes [#28667](https://github.com/kubernetes/kubernetes/issues/28667)
|
||||
* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316)
|
||||
* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316)
|
||||
|
|
|
|||
|
|
@ -26,7 +26,7 @@ Heapster will hide the performance cost of serving those stats in the Kubelet.
|
|||
|
||||
Disabling addons is simple. Just ssh into the Kubernetes master and move the
|
||||
addon from `/etc/kubernetes/addons/` to a backup location. More details
|
||||
[here](../../cluster/addons/).
|
||||
[here](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/).
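Concretely, something like the following on the master (a sketch; the addon directory name is illustrative):

```sh
# on the Kubernetes master: park the addon outside /etc/kubernetes/addons/
sudo mkdir -p /tmp/addons-backup
sudo mv /etc/kubernetes/addons/cluster-monitoring /tmp/addons-backup/
```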
|
||||
|
||||
### Which / how many pods?
|
||||
|
||||
|
|
@ -57,11 +57,11 @@ sampling.
|
|||
## E2E Performance Test
|
||||
|
||||
There is an end-to-end test for collecting overall resource usage of node
|
||||
components: [kubelet_perf.go](../../test/e2e/kubelet_perf.go). To
|
||||
components: [kubelet_perf.go](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/kubelet_perf.go). To
|
||||
run the test, simply make sure you have an e2e cluster running (`go run
|
||||
hack/e2e.go -up`) and [set up](#cluster-set-up) correctly.
|
||||
hack/e2e.go -- -up`) and [set up](#cluster-set-up) correctly.
|
||||
|
||||
Run the test with `go run hack/e2e.go -v -test
|
||||
Run the test with `go run hack/e2e.go -- -v -test
|
||||
--test_args="--ginkgo.focus=resource\susage\stracking"`. You may also wish to
|
||||
customise the number of pods or other parameters of the test (remember to rerun
|
||||
`make WHAT=test/e2e/e2e.test` after you do).
|
||||
|
|
|
|||
|
|
@ -7,9 +7,9 @@
|
|||
|
||||
### Traffic sources and responsibilities
|
||||
|
||||
* GitHub Kubernetes [issues](https://github.com/kubernetes/kubernetes/issues)
|
||||
and [pulls](https://github.com/kubernetes/kubernetes/pulls): Your job is to be
|
||||
the first responder to all new issues and PRs. If you are not equipped to do
|
||||
* GitHub Kubernetes [issues](https://github.com/kubernetes/kubernetes/issues):
|
||||
Your job is to be
|
||||
the first responder to all new issues. If you are not equipped to do
|
||||
this (which is fine!), it is your job to seek guidance!
|
||||
|
||||
* Support issues should be closed and redirected to Stackoverflow (see example
|
||||
|
|
@ -35,18 +35,12 @@ This is the only situation in which you should add a priority/* label
|
|||
* Assign any issues related to Vagrant to @derekwaynecarr (and @mention him
|
||||
in the issue)
|
||||
|
||||
* All incoming PRs should be assigned a reviewer.
|
||||
|
||||
* unless it is a WIP (Work in Progress), RFC (Request for Comments), or design proposal.
|
||||
* An auto-assigner [should do this for you] (https://github.com/kubernetes/kubernetes/pull/12365/files)
|
||||
* When in doubt, choose a TL or team maintainer of the most relevant team; they can delegate
|
||||
|
||||
* Keep in mind that you can @ mention people in an issue/PR to bring it to
|
||||
* Keep in mind that you can @ mention people in an issue to bring it to
|
||||
their attention without assigning it to them. You can also @ mention github
|
||||
teams, such as @kubernetes/goog-ux or @kubernetes/kubectl
|
||||
|
||||
* If you need help triaging an issue or PR, consult with (or assign it to)
|
||||
@brendandburns, @thockin, @bgrant0607, @quinton-hoole, @davidopp, @dchen1107,
|
||||
* If you need help triaging an issue, consult with (or assign it to)
|
||||
@brendandburns, @thockin, @bgrant0607, @davidopp, @dchen1107,
|
||||
@lavalamp (all U.S. Pacific Time) or @fgrzadkowski (Central European Time).
|
||||
|
||||
* At the beginning of your shift, please add team/* labels to any issues that
|
||||
|
|
|
|||
|
|
@ -29,7 +29,7 @@ redirect users to Slack. Also check out the
|
|||
|
||||
In general, try to direct support questions to:
|
||||
|
||||
1. Documentation, such as the [user guide](../user-guide/README.md) and
|
||||
1. Documentation, such as the [user guide](https://kubernetes.io/docs/user-guide/) and
|
||||
[troubleshooting guide](http://kubernetes.io/docs/troubleshooting/)
|
||||
|
||||
2. Stackoverflow
|
||||
|
|
|
|||
|
|
@ -13,9 +13,13 @@
|
|||
|
||||
# Pull Request Process
|
||||
|
||||
An overview of how pull requests are managed for kubernetes. This document
|
||||
assumes the reader has already followed the [development guide](development.md)
|
||||
to set up their environment.
|
||||
An overview of how pull requests are managed for kubernetes.
|
||||
|
||||
This document assumes the reader has already followed the
|
||||
[development guide](development.md) to set up their environment,
|
||||
and understands
|
||||
[basic pull request mechanics](https://help.github.com/articles/using-pull-requests).
|
||||
|
||||
|
||||
# Life of a Pull Request
|
||||
|
||||
|
|
@ -50,7 +54,7 @@ For cherry-pick PRs, see the [Cherrypick instructions](cherry-picks.md)
|
|||
at release time.
|
||||
1. `release-note` labeled PRs generate a release note using the PR title by
|
||||
default OR the release-note block in the PR template if filled in.
|
||||
* See the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md) for more
|
||||
* See the [PR template](https://github.com/kubernetes/kubernetes/blob/master/.github/PULL_REQUEST_TEMPLATE.md) for more
|
||||
details.
|
||||
* PR titles and body comments are mutable and can be modified at any time
|
||||
prior to the release to reflect a release note friendly message.
|
||||
|
|
|
|||
|
|
@ -45,8 +45,8 @@ The Release Management Team Lead is the person ultimately responsible for ensuri
|
|||
* Ensures that cherry-picks do not destabilize the branch by either giving the PR enough time to stabilize in master or giving it enough time to stabilize in the release branch before cutting the release.
|
||||
* Cuts the actual [release](https://github.com/kubernetes/kubernetes/releases).
|
||||
|
||||
#### Release Docs Lead
|
||||
* Sets release docs related deadlines for developers and works with Release Management Team Lead to ensure they are widely communicated.
|
||||
#### Docs Lead
|
||||
* Sets docs related deadlines for developers and works with Release Management Team Lead to ensure they are widely communicated.
|
||||
* Sets up release branch for docs.
|
||||
* Pings feature owners to ensure that release docs are created on time.
|
||||
* Reviews/merges release doc PRs.
|
||||
|
|
|
|||
|
|
@ -117,8 +117,8 @@ cluster/kubectl.sh get replicationcontrollers
|
|||
|
||||
### Running a user defined pod
|
||||
|
||||
Note the difference between a [container](../user-guide/containers.md)
|
||||
and a [pod](../user-guide/pods.md). Since you only asked for the former, Kubernetes will create a wrapper pod for you.
|
||||
Note the difference between a [container](https://kubernetes.io/docs/user-guide/containers/)
|
||||
and a [pod](https://kubernetes.io/docs/user-guide/pods/). Since you only asked for the former, Kubernetes will create a wrapper pod for you.
|
||||
However you cannot view the nginx start page on localhost. To verify that nginx is running you need to run `curl` within the docker container (try `docker exec`).
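For example (a sketch; the container ID placeholder and the presence of `curl` inside the image are assumptions):

```sh
# find the nginx container, then curl it from inside its own network namespace
# (assumes the image ships curl; substitute the real container ID)
docker ps | grep nginx
docker exec <container-id> curl -s http://localhost
```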
|
||||
|
||||
You can control the specifications of a pod via a user defined manifest, and reach nginx through your browser on the port specified therein:
|
||||
|
|
|
|||
|
|
@ -9,9 +9,9 @@ and for each Pod, it posts a binding indicating where the Pod should be schedule
|
|||
We are dividing the scheduler into three layers, from a high level:
|
||||
- [plugin/cmd/kube-scheduler/scheduler.go](http://releases.k8s.io/HEAD/plugin/cmd/kube-scheduler/scheduler.go):
|
||||
This is the main() entry that does initialization before calling the scheduler framework.
|
||||
- [pkg/scheduler/scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/scheduler.go):
|
||||
- [plugin/pkg/scheduler/scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/scheduler.go):
|
||||
This is the scheduler framework that handles stuff (e.g. binding) beyond the scheduling algorithm.
|
||||
- [pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/generic_scheduler.go):
|
||||
- [plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/core/generic_scheduler.go):
|
||||
The scheduling algorithm that assigns nodes for pods.
|
||||
|
||||
## The scheduling algorithm
|
||||
|
|
@ -64,7 +64,7 @@ The Scheduler tries to find a node for each Pod, one at a time.
|
|||
- First it applies a set of "predicates" to filter out inappropriate nodes. For example, if the PodSpec specifies resource requests, then the scheduler will filter out nodes that don't have at least that much resources available (computed as the capacity of the node minus the sum of the resource requests of the containers that are already running on the node).
|
||||
- Second, it applies a set of "priority functions"
|
||||
that rank the nodes that weren't filtered out by the predicate check. For example, it tries to spread Pods across nodes and zones while at the same time favoring the least (theoretically) loaded nodes (where "load" - in theory - is measured as the sum of the resource requests of the containers running on the node, divided by the node's capacity).
|
||||
- Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code for this main scheduling loop is in the function `Schedule()` in [plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/generic_scheduler.go)
|
||||
- Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code for this main scheduling loop is in the function `Schedule()` in [plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/core/generic_scheduler.go)
|
||||
|
||||
### Predicates and priorities policies
|
||||
|
||||
|
|
|
|||
|
|
@ -11,7 +11,7 @@ The purpose of filtering the nodes is to filter out the nodes that do not meet c
|
|||
- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of requests of all Pods on the node. To learn more about the resource QoS in Kubernetes, please check [QoS proposal](../design-proposals/resource-qos.md).
|
||||
- `PodFitsHostPorts`: Check if any HostPort required by the Pod is already occupied on the node.
|
||||
- `HostName`: Filter out all nodes except the one specified in the PodSpec's NodeName field.
|
||||
- `MatchNodeSelector`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field and, as of Kubernetes v1.2, also match the `scheduler.alpha.kubernetes.io/affinity` pod annotation if present. See [here](../user-guide/node-selection/) for more details on both.
|
||||
- `MatchNodeSelector`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field and, as of Kubernetes v1.2, also match the `scheduler.alpha.kubernetes.io/affinity` pod annotation if present. See [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details on both.
|
||||
- `MaxEBSVolumeCount`: Ensure that the number of attached ElasticBlockStore volumes does not exceed a maximum value (by default, 39, since Amazon recommends a maximum of 40 with one of those 40 reserved for the root volume -- see [Amazon's documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html#linux-specific-volume-limits)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable.
|
||||
- `MaxGCEPDVolumeCount`: Ensure that the number of attached GCE PersistentDisk volumes does not exceed a maximum value (by default, 16, which is the maximum GCE allows -- see [GCE's documentation](https://cloud.google.com/compute/docs/disks/persistent-disks#limits_for_predefined_machine_types)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable.
|
||||
- `CheckNodeMemoryPressure`: Check if a pod can be scheduled on a node reporting memory pressure condition. Currently, no ``BestEffort`` should be placed on a node under memory pressure as it gets automatically evicted by kubelet.
|
||||
|
|
@ -34,7 +34,7 @@ Currently, Kubernetes scheduler provides some practical priority functions, incl
|
|||
- `SelectorSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service, replication controller, or replica set on the same node. If zone information is present on the nodes, the priority will be adjusted so that pods are spread across zones and nodes.
|
||||
- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label.
|
||||
- `ImageLocalityPriority`: Nodes are prioritized based on locality of images requested by a pod. Nodes with larger size of already-installed packages required by the pod will be preferred over nodes with no already-installed packages required by the pod or a small total size of already-installed packages required by the pod.
|
||||
- `NodeAffinityPriority`: (Kubernetes v1.2) Implements `preferredDuringSchedulingIgnoredDuringExecution` node affinity; see [here](../user-guide/node-selection/) for more details.
|
||||
- `NodeAffinityPriority`: (Kubernetes v1.2) Implements `preferredDuringSchedulingIgnoredDuringExecution` node affinity; see [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details.
|
||||
|
||||
The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). As with predicates, you can combine the above priority functions and assign weight factors (positive numbers) to them as you wish (check [scheduler.md](scheduler.md) for how to customize).
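Reusing the types from the scheduling-loop sketch above, combining weighted priority functions amounts to a weighted sum of the individual scores. The `weightedPriority` name and plain `int` scores are illustrative only, not the scheduler's actual types:

```go
// weightedPriority pairs a priority function (as in the sketch above) with the
// positive weight factor assigned to it in the scheduler configuration.
type weightedPriority struct {
	weight   int
	priority Priority
}

// finalScore combines the configured priority functions into a single node
// score: the weighted sum of the individual scores.
func finalScore(pod Pod, node Node, priorities []weightedPriority) int {
	total := 0
	for _, wp := range priorities {
		total += wp.weight * wp.priority(pod, node)
	}
	return total
}
```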
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,103 @@
|
|||
# Security Release Process
|
||||
|
||||
Kubernetes is a large growing community of volunteers, users, and vendors. The Kubernetes community has adopted this security disclosures and response policy to ensure we responsibly handle critical issues.
|
||||
|
||||
## Product Security Team (PST)
|
||||
|
||||
Security vulnerabilities should be handled quickly and sometimes privately. The primary goal of this process is to reduce the total time users are vulnerable to publicly known exploits.
|
||||
|
||||
The Product Security Team (PST) is responsible for organizing the entire response including internal communication and external disclosure but will need help from relevant developers and release managers to successfully run this process.
|
||||
|
||||
The initial Product Security Team will consist of the following volunteers subscribed to the private [Kubernetes Security](https://groups.google.com/forum/#!forum/kubernetes-security) list. These are the people who have been involved in the initial discussion and volunteered:
|
||||
|
||||
- Brandon Philips `<brandon.philips@coreos.com>` [4096R/154343260542DF34]
|
||||
- Jess Frazelle `<jessfraz@google.com>`
|
||||
- CJ Cullen `<cjcullen@google.com>`
|
||||
- Tim St. Clair `<stclair@google.com>` [4096R/0x5E6F2E2DA760AF51]
|
||||
- Jordan Liggitt `<jliggitt@redhat.com>`
|
||||
|
||||
**Known issues**
|
||||
|
||||
- We haven't specified a way to cycle the Product Security Team, but we need this process deployed quickly as our current process isn't working. I (@philips) will set a deadline of March 1st, 2017 to sort that out.
|
||||
|
||||
## Release Manager Role
|
||||
|
||||
Also included on the private [Kubernetes Security](https://groups.google.com/forum/#!forum/kubernetes-security) list are all [Release Managers](https://github.com/kubernetes/community/wiki).
|
||||
|
||||
It is the responsibility of the PST to add and remove Release Managers as Kubernetes minor releases are created and deprecated.
|
||||
|
||||
## Disclosures
|
||||
|
||||
### Private Disclosure Processes
|
||||
|
||||
The Kubernetes Community asks that all suspected vulnerabilities be privately and responsibly disclosed via the Private Disclosure process available at [https://kubernetes.io/security](https://kubernetes.io/security).
|
||||
|
||||
### Public Disclosure Processes
|
||||
|
||||
If you know of a publicly disclosed security vulnerability please IMMEDIATELY email [kubernetes-security@googlegroups.com](mailto:kubernetes-security@googlegroups.com) to inform the Product Security Team (PST) about the vulnerability so they may start the patch, release, and communication process.
|
||||
|
||||
If possible, the PST will ask the person making the public report whether the issue can be handled via a private disclosure process. If the reporter declines, the PST will move swiftly with the fix and release process. In extreme cases you can ask GitHub to delete the issue, but this generally isn't necessary and is unlikely to make a public disclosure less damaging.
|
||||
|
||||
## Patch, Release, and Public Communication
|
||||
|
||||
For each vulnerability, a member of the PST will volunteer to lead coordination with the Fix Team and Release Managers, and is responsible for sending disclosure emails to the rest of the community. This lead will be referred to as the Fix Lead.
|
||||
|
||||
The role of Fix Lead should rotate round-robin across the PST.
|
||||
|
||||
All of the timelines below are suggestions and assume a Private Disclosure. The Fix Lead drives the schedule using their best judgment based on severity, development time, and release manager feedback. If the Fix Lead is dealing with a Public Disclosure all timelines become ASAP.
|
||||
|
||||
### Fix Team Organization
|
||||
|
||||
These steps should be completed within the first 24 hours of Disclosure.
|
||||
|
||||
- The Fix Lead will work quickly to identify relevant engineers from the affected projects and packages and CC those engineers into the disclosure thread. These selected developers are the Fix Team. A best guess is to invite all assignees in the OWNERS file from the affected packages.
|
||||
- The Fix Lead will get the Fix Team access to private security repos to develop the fix.
|
||||
|
||||
### Fix Development Process
|
||||
|
||||
These steps should be completed within 1-7 days of Disclosure.
|
||||
|
||||
- The Fix Lead and the Fix Team will create a [CVSS](https://www.first.org/cvss/specification-document) score using the [CVSS Calculator](https://www.first.org/cvss/calculator/3.0). The Fix Lead makes the final call on the calculated CVSS; it is better to move quickly than to make the CVSS perfect.
|
||||
- The Fix Team will notify the Fix Lead that work on the fix branch is complete once there are LGTMs on all commits in the private repo from one or more relevant assignees in the relevant OWNERS file.
|
||||
|
||||
If the CVSS score is under 4.0 ([a low severity score](https://www.first.org/cvss/specification-document#i5)) the Fix Team can decide to slow the release process down in the face of holidays, developer bandwidth, etc. These decisions must be discussed on the kubernetes-security mailing list.
|
||||
|
||||
### Fix Disclosure Process
|
||||
|
||||
With the Fix Development underway the Fix Lead needs to come up with an overall communication plan for the wider community. This Disclosure process should begin after the Fix Team has developed a Fix or mitigation so that a realistic timeline can be communicated to users.
|
||||
|
||||
**Disclosure of Forthcoming Fix to Users** (Completed within 1-7 days of Disclosure)
|
||||
|
||||
- The Fix Lead will email [kubernetes-announce@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-announce) and [kubernetes-security-announce@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-security-announce) informing users that a security vulnerability has been disclosed and that a fix will be made available at YYYY-MM-DD HH:MM UTC in the future via this list. This time is the Release Date.
|
||||
- The Fix Lead will include any mitigating steps users can take until a fix is available.
|
||||
|
||||
The communication to users should be actionable. They should know when to block time to apply patches, understand exact mitigation steps, etc.
|
||||
|
||||
**Optional Fix Disclosure to Private Distributors List** (Completed within 1-14 days of Disclosure):
|
||||
|
||||
- The Fix Lead will make a determination with the help of the Fix Team if an issue is critical enough to require early disclosure to distributors. Generally this Private Distributor Disclosure process should be reserved for remotely exploitable or privilege escalation issues. Otherwise, this process can be skipped.
|
||||
- The Fix Lead will email the patches to kubernetes-distributors-announce@googlegroups.com so distributors can prepare builds to be available to users on the day of the issue's announcement. Distributors can ask to be added to this list by emailing kubernetes-security@googlegroups.com and it is up to the Product Security Team's discretion to manage the list.
|
||||
- TODO: Figure out process for getting folks onto this list.
|
||||
- **What if a vendor breaks embargo?** The PST will assess the damage. The Fix Lead will make the call to release earlier or continue with the plan. When in doubt push forward and go public ASAP.
|
||||
|
||||
**Fix Release Day** (Completed within 1-21 days of Disclosure)
|
||||
|
||||
- The Release Managers will ensure all the binaries are built, publicly available, and functional before the Release Date.
|
||||
- TODO: this will require a private security build process.
|
||||
- The Release Managers will create a new patch release branch from the latest patch release tag + the fix from the security branch. As a practical example if v1.5.3 is the latest patch release in kubernetes.git a new branch will be created called v1.5.4 which includes only patches required to fix the issue.
|
||||
- The Fix Lead will cherry-pick the patches onto the master branch and all relevant release branches. The Fix Team will LGTM and merge.
|
||||
- The Release Managers will merge these PRs as quickly as possible. Changes shouldn't be made to the commits, even for a typo in the CHANGELOG, as this will change the git sha of the already-built commits, leading to confusion and potential conflicts as the fix is cherry-picked around branches.
|
||||
- The Fix Lead will request a CVE from [DWF](https://github.com/distributedweaknessfiling/DWF-Documentation) and include the CVSS and release details.
|
||||
- The Fix Lead will email kubernetes-{dev,users,announce,security-announce}@googlegroups.com now that everything is public, announcing the new releases, the CVE number, the location of the binaries, and the relevant merged PRs to get wide distribution and user action. As much as possible this email should be actionable and include links on how to apply the fix to users' environments; this can include links to external distributor documentation.
|
||||
- The Fix Lead will remove the Fix Team from the private security repo.
|
||||
|
||||
### Retrospective
|
||||
|
||||
These steps should be completed 1-3 days after the Release Date. The retrospective process [should be blameless](https://landing.google.com/sre/book/chapters/postmortem-culture.html).
|
||||
|
||||
- The Fix Lead will send a retrospective of the process to kubernetes-dev@googlegroups.com including details on everyone involved, the timeline of the process, links to relevant PRs that introduced the issue, if relevant, and any critiques of the response and release process.
|
||||
- The Release Managers and Fix Team are also encouraged to send their own feedback on the process to kubernetes-dev@googlegroups.com. Honest critique is the only way we are going to get good at this as a community.
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
@ -69,19 +69,19 @@ You can set [go flags](https://golang.org/cmd/go/) by setting the
|
|||
added automatically to these:
|
||||
|
||||
```sh
|
||||
make test WHAT=pkg/api # run tests for pkg/api
|
||||
make test WHAT=./pkg/api # run tests for pkg/api
|
||||
```
|
||||
|
||||
To run multiple targets you need quotes:
|
||||
|
||||
```sh
|
||||
make test WHAT="pkg/api pkg/kubelet" # run tests for pkg/api and pkg/kubelet
|
||||
make test WHAT="./pkg/api ./pkg/kubelet" # run tests for pkg/api and pkg/kubelet
|
||||
```
|
||||
|
||||
In a shell, it's often handy to use brace expansion:
|
||||
|
||||
```sh
|
||||
make test WHAT=pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet
|
||||
make test WHAT=./pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet
|
||||
```
|
||||
|
||||
### Run specific unit test cases in a package
|
||||
|
|
@ -92,10 +92,10 @@ regular expression for the name of the test that should be run.
|
|||
|
||||
```sh
|
||||
# Runs TestValidatePod in pkg/api/validation with the verbose flag set
|
||||
make test WHAT=pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$'
|
||||
make test WHAT=./pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$'
|
||||
|
||||
# Runs tests that match the regex ValidatePod|ValidateConfigMap in pkg/api/validation
|
||||
make test WHAT=pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$"
|
||||
make test WHAT=./pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$"
|
||||
```
|
||||
|
||||
For other supported test flags, see the [golang
|
||||
|
|
@ -130,7 +130,7 @@ To run tests and collect coverage in only one package, pass its relative path
|
|||
under the `kubernetes` directory as an argument, for example:
|
||||
|
||||
```sh
|
||||
make test WHAT=pkg/kubectl KUBE_COVER=y
|
||||
make test WHAT=./pkg/kubectl KUBE_COVER=y
|
||||
```
|
||||
|
||||
Multiple arguments can be passed, in which case the coverage results will be
|
||||
|
|
@ -215,7 +215,7 @@ script to run a specific integration test case:
|
|||
|
||||
```sh
|
||||
# Run integration test TestPodUpdateActiveDeadlineSeconds with the verbose flag set.
|
||||
make test-integration KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$"
|
||||
make test-integration WHAT=./test/integration/pods KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$"
|
||||
```
|
||||
|
||||
If you set `KUBE_TEST_ARGS`, the test case will be run with only the `v1` API
|
||||
|
|
|
|||
|
|
@ -146,7 +146,7 @@ right thing.
|
|||
|
||||
Here are a few pointers:
|
||||
|
||||
+ [E2e Framework](../../test/e2e/framework/framework.go):
|
||||
+ [E2e Framework](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/framework.go):
|
||||
Familiarise yourself with this test framework and how to use it.
|
||||
Amongst others, it automatically creates uniquely named namespaces
|
||||
within which your tests can run to avoid name clashes, and reliably
|
||||
|
|
@ -160,7 +160,7 @@ Here are a few pointers:
|
|||
should always use this framework. Trying other home-grown
|
||||
approaches to avoiding name clashes and resource leaks has proven
|
||||
to be a very bad idea.
|
||||
+ [E2e utils library](../../test/e2e/framework/util.go):
|
||||
+ [E2e utils library](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/util.go):
|
||||
This handy library provides tons of reusable code for a host of
|
||||
commonly needed test functionality, including waiting for resources
|
||||
to enter specified states, safely and consistently retrying failed
|
||||
|
|
@ -178,9 +178,9 @@ Here are a few pointers:
|
|||
+ **Follow the examples of stable, well-written tests:** Some of our
|
||||
existing end-to-end tests are better written and more reliable than
|
||||
others. A few examples of well-written tests include:
|
||||
[Replication Controllers](../../test/e2e/rc.go),
|
||||
[Services](../../test/e2e/service.go),
|
||||
[Reboot](../../test/e2e/reboot.go).
|
||||
[Replication Controllers](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/rc.go),
|
||||
[Services](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/service.go),
|
||||
[Reboot](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/reboot.go).
|
||||
+ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the
|
||||
test library and runner upon which our e2e tests are built. Before
|
||||
you write or refactor a test, read the docs and make sure that you
|
||||
|
|
|
|||
|
|
@ -113,14 +113,15 @@ These are grandfathered in as full projects:
|
|||
- github.com/kubernetes/md-check
|
||||
- github.com/kubernetes/pr-bot - move from mungebot, etc from contrib, currently running in "prod" on github.com/kubernetes
|
||||
- github.com/kubernetes/dashboard
|
||||
- github.com/kubernetes/helm (Graduated from incubator on Feb 2017)
|
||||
- github.com/kubernetes/minikube (Graduated from incubator on Feb 2017)
|
||||
|
||||
**Projects to Incubate But Not Move**
|
||||
|
||||
These projects are young but have significant user-facing docs pointing at their current github.com/kubernetes location. Let's put them through the incubation process but leave them at github.com/kubernetes.
|
||||
|
||||
- github.com/kubernetes/minikube
|
||||
- github.com/kubernetes/charts
|
||||
|
||||
|
||||
**Projects to Move to Incubator**
|
||||
|
||||
- github.com/kubernetes/kube2consul
|
||||
|
|
@ -149,5 +150,6 @@ Large portions of this process and prose are inspired by the Apache Incubator pr
|
|||
## Original Discussion
|
||||
https://groups.google.com/d/msg/kubernetes-dev/o6E1u-orDK8/SAqal_CeCgAJ
|
||||
|
||||
## Future Work
|
||||
## Future Work
|
||||
|
||||
- Expanding potential sources of champions outside of Kubernetes main repo
|
||||
|
|
|
|||
|
|
@ -1,10 +1,11 @@
|
|||
# SIG AWS
|
||||
|
||||
A Special Interest Group for maintaining, supporting, and using Kubernetes on AWS.
|
||||
A Special Interest Group for maintaining, supporting, and using Kubernetes on AWS.
|
||||
|
||||
## Meeting:
|
||||
- Meetings: Scheduled via the official [group mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-aws)
|
||||
- Zoom Link: [SIG AWS](https://zoom.us/my/k8ssigaws)
|
||||
- Agenda: [Google Doc](https://docs.google.com/document/d/1-i0xQidlXnFEP9fXHWkBxqySkXwJnrGJP9OGyP2_P14/edit)
|
||||
|
||||
## Organizers:
|
||||
|
||||
|
|
@ -15,4 +16,4 @@ A Special Interest Group for maintaining, supporting, and using Kubernetes on AW
|
|||
| Kris Nova | [kris-nova](https://github.com/kris-nova) |
|
||||
| Mackenzie Burnett | [mfburnett](https://github.com/mfburnett) |
|
||||
|
||||
The meeting is open to all and we encourage you to join. Feel free to join the zoom call at your convenience.
|
||||
The meeting is open to all and we encourage you to join. Feel free to join the zoom call at your convenience.
|
||||
|
|
|
|||
|
|
@ -1,3 +1,26 @@
|
|||
# NOTE: THE BIG DATA SIG IS INDEFINITELY SUSPENDED, IN FAVOR OF THE ["APPS" SIG](https://github.com/kubernetes/community/blob/master/sig-apps/README.md).
|
||||
# SIG Big Data
|
||||
|
||||
[Old Meeting Notes](https://docs.google.com/document/d/1YhNLN39f5oZ4AHn_g7vBp0LQd7k37azL7FkWG8CEDrE/edit)
|
||||
A Special Interest Group for deploying and operating big data applications (Spark, Kafka, Hadoop, Flink, Storm, etc) on Kubernetes. We focus on integrations with big data applications and architecting the best ways to run them on Kubernetes.
|
||||
|
||||
## Meeting:
|
||||
|
||||
* Meetings: Wednesdays 10:00 AM PST
|
||||
* Video Conference Link: updated in [the official group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data)
|
||||
* Check out the [Agenda and Minutes](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit)! Note: this SIG was operational briefly in 2015. Minutes for those meetings are in [their prior location](https://docs.google.com/document/u/1/d/1YhNLN39f5oZ4AHn_g7vBp0LQd7k37azL7FkWG8CEDrE/edit).
|
||||
* Slack: https://kubernetes.slack.com/messages/sig-big-data/
|
||||
|
||||
## Goals:
|
||||
|
||||
* Design and architect ways to run big data applications effectively on Kubernetes
|
||||
* Discuss ongoing implementation efforts
|
||||
* Discuss resource sharing and multi-tenancy (in the context of big data applications)
|
||||
* Suggest Kubernetes features where we see a need
|
||||
|
||||
## Non-goals:
|
||||
|
||||
* Endorsing any particular tool/framework
|
||||
|
||||
## Organizers:
|
||||
|
||||
* [Anirudh Ramanathan](https://github.com/foxish), Google
|
||||
* [Kenneth Owens](https://github.com/kow3ns), Google
|
||||
|
|
|
|||
|
|
@ -13,7 +13,31 @@ We focus on the development and standardization of the CLI [framework](https://g
|
|||
* Slack: <https://kubernetes.slack.com/messages/sig-cli> ([archive](http://kubernetes.slackarchive.io/sig-cli))
|
||||
* Google Group: <https://groups.google.com/forum/#!forum/kubernetes-sig-cli>
|
||||
|
||||
## Organizers:
|
||||
## Leads
|
||||
|
||||
**Note:** Escalate to these folks if you cannot get help from slack or the Google group
|
||||
|
||||
* Fabiano Franz <ffranz@redhat.com>, Red Hat
|
||||
- slack / github: @fabianofranz
|
||||
* Phillip Wittrock <pwittroc@google.com>, Google
|
||||
* Tony Ado <coolhzb@gmail.com>, Alibaba
|
||||
- slack / github: @pwittrock
|
||||
* Tony Ado <coolhzb@gmail.com>, Alibaba
|
||||
- slack / github: @adohe
|
||||
|
||||
## Contributing
|
||||
|
||||
See [this document](https://github.com/kubernetes/community/blob/master/sig-cli/contributing.md) for contributing instructions.
|
||||
|
||||
## Sig-cli teams
|
||||
|
||||
Mention one or more of
|
||||
|
||||
| Name | Description |
|
||||
|------------------------------------|--------------------------------------------------|
|
||||
|@kubernetes/sig-cli-bugs | For bugs in kubectl |
|
||||
|@kubernetes/sig-cli-feature-requests| For initial discussion of new feature requests |
|
||||
|@kubernetes/sig-cli-proposals | For in depth discussion of new feature proposals |
|
||||
|@kubernetes/sig-cli-pr-reviews | For PR code reviews |
|
||||
|@kubernetes/sig-test-failures | For e2e test flakes |
|
||||
|@kubernetes/sig-cli-misc | For general discussion and escalation |
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,306 @@
|
|||
# Contributing
|
||||
|
||||
The process for contributing code to Kubernetes via the sig-cli [community][community page].
|
||||
|
||||
## TL;DR
|
||||
|
||||
- The sig-cli [community page] lists sig-cli [leads],
|
||||
channels of [communication], and group [meeting] times.
|
||||
- New contributors: please start by adopting an [existing issue].
|
||||
- Request a feature by making an [issue] and mentioning
|
||||
`@kubernetes/sig-cli-feature-requests`.
|
||||
- Write a [design proposal] before starting work on a new feature.
|
||||
- Write [tests]!
|
||||
|
||||
## Before You Begin
|
||||
|
||||
Welcome to the Kubernetes sig-cli contributing guide. We are excited
|
||||
about the prospect of you joining our [community][community page]!
|
||||
|
||||
Please understand that all contributions to Kubernetes require time
|
||||
and commitment from the project maintainers to review the ux, software
|
||||
design, and code. Mentoring and on-boarding new contributors is done
|
||||
in addition to many other responsibilities.
|
||||
|
||||
### Understand the big picture
|
||||
|
||||
- Complete the [Kubernetes Basics Tutorial].
|
||||
- Be familiar with the [kubectl user-facing documentation][kubectl docs].
|
||||
- Read the concept guides starting with the [management overview].
|
||||
|
||||
### Modify your own `kubectl` fork
|
||||
|
||||
Make sure you are ready to immediately get started once you have been
|
||||
assigned a piece of work. Do this right away.
|
||||
|
||||
- Setup your [development environment][development guide].
|
||||
- Look at code:
|
||||
- [kubernetes/cmd/kubectl] is the entry point
|
||||
- [kubernetes/pkg/kubectl] is the implementation
|
||||
- Look at how some of the other commands are implemented
|
||||
- Add a new command to do something simple (see the sketch after this list):
|
||||
- Add `kubectl hello-world`: print "Hello World"
|
||||
- Add `kubectl hello-kubernetes -f file`: Print "Hello \<kind of resource\> \<name of resource\>"
|
||||
- Add `kubectl hello-kubernetes type/name`: Print "Hello \<kind of resource\> \<name of resource\> \<creation time\>"
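A hypothetical sketch of the first exercise above, written with [cobra](https://github.com/spf13/cobra), the command-line library kubectl commands are built on. The `NewCmdHelloWorld` name and the wiring are illustrative; mirror an existing command under [kubernetes/pkg/kubectl] for the real structure:

```go
package cmd

import (
	"fmt"
	"io"

	"github.com/spf13/cobra"
)

// NewCmdHelloWorld returns a cobra command that prints "Hello World".
// Register it next to the existing subcommands, for example with
// cmds.AddCommand(NewCmdHelloWorld(out)).
func NewCmdHelloWorld(out io.Writer) *cobra.Command {
	return &cobra.Command{
		Use:   "hello-world",
		Short: "Print Hello World",
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Fprintln(out, "Hello World")
		},
	}
}
```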
|
||||
|
||||
### Agree to contribution rules
|
||||
|
||||
Follow the [CLA signup instructions](../CLA.md).
|
||||
|
||||
### Adopt an issue
|
||||
|
||||
New contributors can try the following to work on an existing [bug] or [approved design][design repo]:
|
||||
|
||||
- In [slack][slack-messages] (signup [here][slack-signup]),
|
||||
@mention a [lead][leads] and ask if there are any issues you could pick up.
|
||||
Leads can recommend issues that have enough priority to receive PR review bandwidth.
|
||||
- Send an email to the _kubernetes-sig-cli@googlegroups.com_ [group]
|
||||
|
||||
> Subject: New sig-cli contributor _${yourName}_
|
||||
>
|
||||
> Body: Hello, my name is _${yourName}_. I would like to get involved in
|
||||
> contributing to the Kubernetes project. I have read all of the
|
||||
> user documentation listed on the community contributing page.
|
||||
> What should I do next to get started?
|
||||
|
||||
- Attend a sig-cli [meeting] and introduce yourself as looking to get started.
|
||||
|
||||
### Bug lifecycle
|
||||
|
||||
1. An [issue] is filed that
|
||||
- includes steps to reproduce the issue including client / server version,
|
||||
- mentions `@kubernetes/sig-cli-bugs`.
|
||||
2. A [PR] fixing the issue is implemented that
|
||||
- __includes unit and e2e tests__,
|
||||
- incorporates review feedback,
|
||||
- description includes `Closes #<Issue Number>`,
|
||||
- description or comment @mentions `@kubernetes/sig-cli-pr-reviews`.
|
||||
3. Fix appears in the next Kubernetes release!
|
||||
|
||||
## Feature requests
|
||||
|
||||
__New contributors:__ Please start by adopting an [existing issue].
|
||||
|
||||
A feature request is an [issue] mentioning `@kubernetes/sig-cli-feature-requests`.
|
||||
|
||||
To encourage readership, the issue description should _concisely_ (2-4 sentences) describe
|
||||
the problem that the feature addresses.
|
||||
|
||||
### Feature lifecycle
|
||||
|
||||
Working on a feature without getting approval for the user experience
|
||||
and software design often results in wasted time and effort due to
|
||||
decisions around flag-names, command names, and specific command
|
||||
behavior.
|
||||
|
||||
To minimize wasted work and improve communication across efforts,
|
||||
the user experience and software design must be agreed upon before
|
||||
any PRs are sent for code review.
|
||||
|
||||
1. Identify a problem by filing an [issue] (mention `@kubernetes/sig-cli-feature-requests`).
|
||||
2. Submit a [design proposal] and get it approved by a lead.
|
||||
3. Announce the proposal as an [agenda] item for the sig-cli [meeting].
|
||||
- Ensures awareness and feedback.
|
||||
- Should be included in meeting notes sent to the sig-cli [group].
|
||||
4. _Merge_ the proposal PR after approval and announcement.
|
||||
5. A [lead][leads] adds the associated feature to the [feature repo], ensuring that
|
||||
- release-related decisions are properly made and communicated,
|
||||
- API changes are vetted,
|
||||
- testing is completed,
|
||||
- docs are completed,
|
||||
- feature is designated _alpha_, _beta_ or _GA_.
|
||||
6. Implement the code per discussion in [bug lifecycle][bug].
|
||||
7. Update [kubectl concept docs].
|
||||
8. Wait for your feature to appear in the next Kubernetes release!
|
||||
|
||||
|
||||
## Design Proposals
|
||||
|
||||
__New contributors:__ Please start by adopting an [existing issue].
|
||||
|
||||
A design proposal is a single markdown document in the [design repo]
|
||||
that follows the [design template].
|
||||
|
||||
To make one,
|
||||
- Prepare the markdown document as a PR to that repo.
|
||||
- Avoid _Work In Progress_ (WIP) PRs (send it only after
|
||||
you consider it complete).
|
||||
- For early feedback, use the email discussion [group].
|
||||
- Mention `@kubernetes/sig-cli-proposals` in the description.
|
||||
- Mention the related [feature request].
|
||||
|
||||
Expect feedback from 2-3 different sig-cli community members.
|
||||
|
||||
Incorporate feedback and comment [`PTAL`].
|
||||
|
||||
Once a [lead][leads] has agreed (via review commentary) that design
|
||||
and code review resources can be allocated to tackle the proposal, the
|
||||
details of the user experience and design should be discussed in the
|
||||
community.
|
||||
|
||||
This step is _important_; it prevents code churn and thrashing around
|
||||
issues like flag names, command names, etc.
|
||||
|
||||
It is normal for sig-cli community members to push back on feature
|
||||
proposals. sig-cli development and review resources are extremely
|
||||
constrained. Community members are free to say
|
||||
|
||||
- No, not this release (or year).
|
||||
- This is desirable but we need help on these other existing issues before tackling this.
|
||||
- No, this problem should be solved in another way.
|
||||
|
||||
The proposal can be merged into the [design repo] after [lead][leads]
|
||||
approval and discussion as a meeting [agenda] item.
|
||||
|
||||
Then coding can begin.
|
||||
|
||||
## Implementation
|
||||
|
||||
Contributors can begin implementing a feature before any of the above
|
||||
steps have been completed, but _should not send a PR until
|
||||
the [design proposal] has been merged_.
|
||||
|
||||
See the [development guide] for instructions on setting up the
|
||||
Kubernetes development environment.
|
||||
|
||||
Implementation PRs should
|
||||
- mention the issue of the associated design proposal,
|
||||
- mention `@kubernetes/sig-cli-pr-reviews`,
|
||||
- __include tests__.
|
||||
|
||||
Small features and flag changes require only unit/integration tests,
|
||||
while larger changes require both unit/integration tests and e2e tests.
|
||||
|
||||
### Report progress
|
||||
|
||||
_Leads need your help to ensure that progress is made to
|
||||
get the feature into a [release]._
|
||||
|
||||
While working on the issue, leave a weekly update on the issue
|
||||
including:
|
||||
|
||||
1. What's finished?
|
||||
2. What part is being worked on now?
|
||||
3. Anything blocking?
|
||||
|
||||
|
||||
## Documentation
|
||||
|
||||
_Let users know about cool new features by updating user facing documentation._
|
||||
|
||||
Depending on the contributor and size of the feature, this
|
||||
may be done either by the same contributor that implemented the feature,
|
||||
or another contributor who is more familiar with the existing docs
|
||||
templates.
|
||||
|
||||
## Release
|
||||
|
||||
Several weeks before a Kubernetes release, development enters a stabilization
|
||||
period where no new features are merged. For a feature to be accepted
|
||||
into a release, it must be fully merged and tested by this time. If
|
||||
your feature is not fully complete, _including tests_, it will have
|
||||
to wait until the next release.
|
||||
|
||||
## Merge state meanings
|
||||
|
||||
- Merged:
|
||||
- Ready to be implemented.
|
||||
- Unmerged:
|
||||
- Experience and design still being worked out.
|
||||
- Not a high priority issue but may implement in the future: revisit
|
||||
in 6 months.
|
||||
- Unintentionally dropped.
|
||||
- Closed:
|
||||
- Not something we plan to implement in the proposed manner.
|
||||
- Not something we plan to revisit in the next 12 months.
|
||||
|
||||
## Escalation
|
||||
|
||||
### If your bug issue is stuck
|
||||
|
||||
If an issue isn't getting any attention and is unresolved, mention
|
||||
`@kubernetes/sig-cli-bugs`.
|
||||
|
||||
Highlight the severity and urgency of the issue. For severe issues
|
||||
escalate by contacting sig [leads] and attending the [meeting].
|
||||
|
||||
### If your feature request issue is stuck
|
||||
|
||||
If an issue isn't getting any attention and is unresolved, mention
|
||||
`@kubernetes/sig-cli-feature-requests`.
|
||||
|
||||
If a particular issue has a high impact for you or your business,
|
||||
make sure this is clear on the bug, and reach out to the sig leads
|
||||
directly. Consider attending the sig meeting to discuss over video
|
||||
conference.
|
||||
|
||||
### If your PR is stuck
|
||||
|
||||
It may happen that your PR seems to be stuck without clear actionable
|
||||
feedback for a week or longer. A PR _associated with a bug or design
|
||||
proposal_ is much less likely to be stuck than a dangling PR.
|
||||
|
||||
However, if it happens do the following:
|
||||
|
||||
- If your PR is stuck for a week or more because it has never gotten any
|
||||
comments, mention `@kubernetes/sig-cli-pr-reviews` and ask for attention.
|
||||
- If your PR is stuck for a week or more _after_ it got comments, but
|
||||
the attention has died down, mention the reviewer and comment with
|
||||
[`PTAL`].
|
||||
|
||||
If you are still not able to get any attention after a couple days,
|
||||
escalate to sig [leads] by mentioning them.
|
||||
|
||||
### If your design proposal issue is stuck
|
||||
|
||||
It may happen that your design doc gets stuck without getting merged
|
||||
or additional feedback. If you believe that your design is important
|
||||
and has been dropped, or it is not moving forward, please add it to
|
||||
the sig cli bi-weekly meeting [agenda] and mail the [group] saying
|
||||
you'd like to discuss it.
|
||||
|
||||
### General escalation instructions
|
||||
|
||||
See the sig-cli [community page] for points of contact and meeting times:
|
||||
|
||||
- attend the sig-cli [meeting]
|
||||
- message one of the sig leads on [slack][slack-messages] (signup [here][slack-signup])
|
||||
- send an email to the _kubernetes-sig-cli@googlegroups.com_ [group].
|
||||
|
||||
## Use of [@mentions]
|
||||
|
||||
- `@{any lead}` solicit opinion or advice from [leads].
|
||||
- `@kubernetes/sig-cli-bugs` sig-cli centric bugs.
|
||||
- `@kubernetes/sig-cli-pr-reviews` triggers review of code fix PR.
|
||||
- `@kubernetes/sig-cli-feature-requests` flags a feature request.
|
||||
- `@kubernetes/sig-cli-proposals` flags a design proposal.
|
||||
|
||||
[@mentions]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#mentioning-users-and-teams
|
||||
[Kubernetes Basics Tutorial]: https://kubernetes.io/docs/tutorials/kubernetes-basics
|
||||
[PR]: https://help.github.com/articles/creating-a-pull-request
|
||||
[`PTAL`]: https://en.wiktionary.org/wiki/PTAL
|
||||
[agenda]: https://docs.google.com/document/d/1r0YElcXt6G5mOWxwZiXgGu_X6he3F--wKwg-9UBc29I/edit
|
||||
[bug]: #bug-lifecycle
|
||||
[communication]: https://github.com/kubernetes/community/tree/master/sig-cli#communication
|
||||
[community page]: https://github.com/kubernetes/community/tree/master/sig-cli
|
||||
[design proposal]: #design-proposals
|
||||
[design repo]: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/sig-cli
|
||||
[design template]: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/sig-cli/template.md
|
||||
[development guide]: https://github.com/kubernetes/community/blob/master/contributors/devel/development.md
|
||||
[existing issue]: #adopt-an-issue
|
||||
[feature repo]: https://github.com/kubernetes/features
|
||||
[feature request]: #feature-requests
|
||||
[feature]: https://github.com/kubernetes/features
|
||||
[group]: https://groups.google.com/forum/#!forum/kubernetes-sig-cli
|
||||
[issue]: https://github.com/kubernetes/kubernetes/issues
|
||||
[kubectl concept docs]: https://github.com/kubernetes/kubernetes.github.io/tree/master/docs/concepts/tools/kubectl
|
||||
[kubectl docs]: https://kubernetes.io/docs/user-guide/kubectl-overview
|
||||
[kubernetes/cmd/kubectl]: https://github.com/kubernetes/kubernetes/tree/master/cmd/kubectl
|
||||
[kubernetes/pkg/kubectl]: https://github.com/kubernetes/kubernetes/tree/master/pkg/kubectl
|
||||
[leads]: https://github.com/kubernetes/community/tree/master/sig-cli#leads
|
||||
[management overview]: https://kubernetes.io/docs/concepts/tools/kubectl/object-management-overview
|
||||
[meeting]: https://github.com/kubernetes/community/tree/master/sig-cli#meetings
|
||||
[release]: #release
|
||||
[slack-messages]: https://kubernetes.slack.com/messages/sig-cli
|
||||
[slack-signup]: http://slack.k8s.io/
|
||||
[tests]: https://github.com/kubernetes/community/blob/master/contributors/devel/testing.md
|
||||
|
|
@ -19,6 +19,7 @@ and we need to get the monotonically growing PR merge latency and numbers of ope
|
|||
|
||||
## Organizers:
|
||||
* Garrett Rodrigues grod@google.com, @grodrigues3, Google
|
||||
* Elsie Phillips elsie.phillips@coreos.com, @Phillels, CoreOS
|
||||
|
||||
## Issues:
|
||||
* [Detailed backlog](https://github.com/kubernetes/contrib/projects/1)
|
||||
|
|
|
|||
|
|
@ -11,12 +11,11 @@
|
|||
* Announce new SIG on kubernetes-dev@googlegroups.com
|
||||
* Submit a PR to add a row for the SIG to the table in the kubernetes/community README.md file, to create a kubernetes/community directory, and to add any SIG-related docs, schedules, roadmaps, etc. to your new kubernetes/community/SIG-foo directory.
|
||||
|
||||
####
|
||||
**Google Groups creation**
|
||||
#### **Google Groups creation**
|
||||
|
||||
Create Google Groups at [https://groups.google.com/forum/#!creategroup](https://groups.google.com/forum/#!creategroup), following the procedure:
|
||||
|
||||
* Each SIG should have one discussion groups, and a number of groups for mirroring relevant github notificaitons;
|
||||
* Each SIG should have one discussion group, and a number of groups for mirroring relevant github notifications;
|
||||
* Create groups using the name conventions below;
|
||||
* Groups should be created as e-mail lists with at least three owners (including sarahnovotny at google.com and ihor.dvoretskyi at gmail.com);
|
||||
* To add the owners, visit the Group Settings (drop-down menu on the right side), select Direct Add Members on the left side and add Sarah and Ihor via email address (with a suitable welcome message); in Members/All Members select Ihor and Sarah and assign to an "owner role";
|
||||
|
|
@ -44,8 +43,7 @@ Example:
|
|||
* kubernetes-sig-onprem-pr-reviews
|
||||
* kubernetes-sig-onprem-api-reviews
|
||||
|
||||
####
|
||||
**GitHub users creation**
|
||||
#### **GitHub users creation**
|
||||
|
||||
Create the GitHub users at [https://github.com/join](https://github.com/join), using the name convention below.
|
||||
|
||||
|
|
@ -76,8 +74,7 @@ Example:
|
|||
|
||||
NOTE: We have found that Github's notification autocompletion finds the users before the corresponding teams. This is the reason we recommend naming the users `k8s-mirror-foo-*` instead of `k8s-sig-foo-*`. If you previously created users named `k8s-sig-foo-*`, we recommend you rename them.
|
||||
|
||||
####
|
||||
**Create the GitHub teams**
|
||||
#### **Create the GitHub teams**
|
||||
|
||||
Create the GitHub teams at [https://github.com/orgs/kubernetes/new-team](https://github.com/orgs/kubernetes/new-team), using the name convention below. Please, add the GitHub users (created before) to the GitHub teams respectively.
|
||||
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ A Special Interest Group for documentation, doc processes, and doc publishing fo
|
|||
|
||||
## Meeting:
|
||||
* Meetings: Tuesdays @ 10:30AM PST
|
||||
* Zoom Link: https://zoom.us/j/4730809290
|
||||
* Zoom Link: https://zoom.us/j/678394311
|
||||
* Check out the [Agenda and Minutes](https://docs.google.com/document/d/1Ds87eRiNZeXwRBEbFr6Z7ukjbTow5RQcNZLaSvWWQsE/edit)
|
||||
|
||||
## Comms:
|
||||
|
|
@ -12,12 +12,12 @@ A Special Interest Group for documentation, doc processes, and doc publishing fo
|
|||
* Google Group: [kubernetes-sig-docs@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)
|
||||
|
||||
## Goals:
|
||||
* Discuss documentation and docs issues for kubernetes
|
||||
* Plan docs releases for k8s
|
||||
* Suggest improvements to developer onboarding where we see friction
|
||||
* Discuss documentation and docs issues for kubernetes.io
|
||||
* Plan docs releases for kubernetes
|
||||
* Suggest improvements to user onboarding through better documentation on Kubernetes.io
|
||||
* Identify and implement ways to get documentation feedback and metrics
|
||||
* Help people get involved in the kubernetes community
|
||||
* Help community contributors get involved in kubernetes documentation
|
||||
|
||||
## Organizers:
|
||||
* Jared Bhatti <jaredb@google.com>, Google
|
||||
* Devin Donnelly <ddonnelly@google.com>, Google
|
||||
* Jared Bhatti <jaredb@google.com>, Google
|
||||
|
|
|
|||
|
|
@ -4,12 +4,12 @@ This is a SIG focused on Federation of Kubernetes Clusters ("Ubernetes") and rel
|
|||
* Hybrid clouds
|
||||
* Spanning of multiple cloud providers
|
||||
* Application migration from private to public clouds (and vice versa)
|
||||
* ... and other similar subjects.
|
||||
* ... and other similar subjects.
|
||||
|
||||
## Meetings:
|
||||
* Bi-weekly on Mondays @ 9am [America/Los_Angeles](http://time.is/Los_Angeles) (check [the calendar](https://calendar.google.com/calendar/embed?src=cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com&ctz=America/Los_Angeles))
|
||||
* Hangouts link: <https://plus.google.com/hangouts/_/google.com/ubernetes>
|
||||
* [Working Group Notes](https://docs.google.com/document/d/1r0YElcXt6G5mOWxwZiXgGu_X6he3F--wKwg-9UBc29I/edit?usp=sharing)
|
||||
* [Working Group Notes](https://docs.google.com/document/d/18mk62nOXE_MCSSnb4yJD_8UadtzJrYyJxFwbrgabHe8/edit)
|
||||
|
||||
## Communication:
|
||||
* Slack: <https://kubernetes.slack.com/messages/sig-federation> ([archive](http://kubernetes.slackarchive.io/sig-federation))
|
||||
|
|
|
|||
|
|
@ -0,0 +1,43 @@
|
|||
# SIGs and Working Groups
|
||||
|
||||
Most community activity is organized into Special Interest Groups (SIGs),
|
||||
time-bounded Working Groups, and the [community meeting](communication.md#Meeting).
|
||||
|
||||
SIGs follow these [guidelines](governance.md) although each of these groups may operate a little differently
|
||||
depending on their needs and workflow.
|
||||
|
||||
Each group's material is in its subdirectory in this project.
|
||||
|
||||
When the need arises, a [new SIG can be created](sig-creation-procedure.md).
|
||||
|
||||
### Master SIG List
|
||||
|
||||
| Name | Leads | Group | Slack Channel | Meetings |
|
||||
|------|-------|-------|---------------|----------|
|
||||
| [API Machinery](sig-api-machinery/README.md) | [@lavalamp](https://github.com/lavalamp) Daniel Smith, Google <br> [@deads2k](https://github.com/orgs/kubernetes/people/deads2k) David Eads, Red Hat| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-api-machinery) | [#sig-api-machinery](https://kubernetes.slack.com/messages/sig-api-machinery/) | [Every other Wednesday at 11:00 AM PST](https://staging.talkgadget.google.com/hangouts/_/google.com/kubernetes-sig) |
|
||||
| [Apps](sig-apps/README.md) | [@michelleN (Michelle Noorali, Deis)](https://github.com/michelleN)<br>[@mattfarina (Matt Farina, HPE)](https://github.com/mattfarina) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-apps) | [#sig-apps](https://kubernetes.slack.com/messages/sig-apps) | [Mondays 9:00AM PST](https://zoom.us/j/4526666954) |
|
||||
| [Auth](sig-auth/README.md) | [@erictune (Eric Tune, Google)](https://github.com/erictune)<br> [@ericchiang (Eric Chiang, CoreOS)](https://github.com/orgs/kubernetes/people/ericchiang)<br> [@liggitt (Jordan Liggitt, Red Hat)](https://github.com/orgs/kubernetes/people/liggitt) <br> [@deads2k (David Eads, Red Hat)](https://github.com/orgs/kubernetes/people/deads2k) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-auth) | [#sig-auth](https://kubernetes.slack.com/messages/sig-auth/) | Biweekly [Wednesdays at 1100 to 1200 PT](https://zoom.us/my/k8s.sig.auth) |
|
||||
| [Autoscaling](sig-autoscaling/README.md) | [@fgrzadkowski (Filip Grządkowski, Google)](https://github.com/fgrzadkowski)<br> [@directxman12 (Solly Ross, Red Hat)](https://github.com/directxman12) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-autoscaling) | [#sig-autoscaling](https://kubernetes.slack.com/messages/sig-autoscaling/) | Biweekly (or triweekly) on [Thurs at 0830 PT](https://plus.google.com/hangouts/_/google.com/k8s-autoscaling) |
|
||||
| [AWS](sig-aws/README.md) | [@justinsb (Justin Santa Barbara)](https://github.com/justinsb)<br>[@kris-nova (Kris Nova)](https://github.com/kris-nova)<br>[@chrislovecnm (Chris Love)](https://github.com/chrislovecnm)<br>[@mfburnett (Mackenzie Burnett)](https://github.com/mfburnett) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) | [#sig-aws](https://kubernetes.slack.com/messages/sig-aws/) | We meet on [Zoom](https://zoom.us/my/k8ssigaws), and the calls are scheduled via the official [group mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) |
|
||||
| [Big Data](sig-big-data/README.md) | [@foxish (Anirudh Ramanathan, Google)](https://github.com/foxish)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data) | [#sig-big-data](https://kubernetes.slack.com/messages/sig-big-data/) | Wednesdays at 10am PST, link posted in [the official group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data). |
|
||||
| [CLI](sig-cli/README.md) | [@fabianofranz (Fabiano Franz, Red Hat)](https://github.com/fabianofranz)<br>[@pwittrock (Phillip Wittrock, Google)](https://github.com/pwittrock)<br>[@AdoHe (Tony Ado, Alibaba)](https://github.com/AdoHe) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cli) | [#sig-cli](https://kubernetes.slack.com/messages/sig-cli) | Bi-weekly Wednesdays at 9:00 AM PT on [Zoom](https://zoom.us/my/sigcli) |
|
||||
| [Cluster Lifecycle](sig-cluster-lifecycle/README.md) | [@lukemarsden (Luke Marsden, Weave)](https://github.com/lukemarsden) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-lifecycle) | [#sig-cluster-lifecycle](https://kubernetes.slack.com/messages/sig-cluster-lifecycle) | Tuesdays at 09:00 AM PST on [Zoom](https://zoom.us/j/166836624) |
|
||||
| [Cluster Ops](sig-cluster-ops/README.md) | [@zehicle (Rob Hirschfeld, RackN)](https://github.com/zehicle) <br> [@mikedanese (Mike Danese, Google)](https://github.com/mikedanese) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-ops) | [#sig-cluster-ops](https://kubernetes.slack.com/messages/sig-cluster-ops) | Thursdays at 1:00 PM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/sig-cluster-ops)|
|
||||
| [Contributor Experience](sig-contribx/README.md) | [@grodrigues3 (Garrett Rodrigues, Google)](https://github.com/Grodrigues3) <br> [@pwittrock (Phillip Witrock, Google)](https://github.com/pwittrock) <br> [@Phillels (Elsie Phillips, CoreOS)](https://github.com/Phillels) | [Group](https://groups.google.com/forum/#!forum/kubernetes-wg-contribex) | [#wg-contribex](https://kubernetes.slack.com/messages/wg-contribex) | Biweekly Wednesdays 9:30 AM PST on [zoom](https://zoom.us/j/4730809290) |
|
||||
| [Docs](sig-docs/README.md) | [@pwittrock (Philip Wittrock, Google)](https://github.com/pwittrock) <br> [@devin-donnelly (Devin Donnelly, Google)](https://github.com/devin-donnelly) <br> [@jaredbhatti (Jared Bhatti, Google)](https://github.com/jaredbhatti)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-docs) | [#sig-docs](https://kubernetes.slack.com/messages/sig-docs) | Tuesdays @ 10:30AM PST on [Zoom](https://zoom.us/j/678394311) |
|
||||
| [Federation](sig-federation/README.md) | [@csbell (Christian Bell, Google)](https://github.com/csbell) <br> [@quinton-hoole (Quinton Hoole, Huawei)](https://github.com/quinton-hoole) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-federation) | [#sig-federation](https://kubernetes.slack.com/messages/sig-federation/) | Bi-weekly on Monday at 9:00 AM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/ubernetes) |
|
||||
| [Instrumentation](sig-instrumentation/README.md) | [@piosz (Piotr Szczesniak, Google)](https://github.com/piosz) <br> [@fabxc (Fabian Reinartz, CoreOS)](https://github.com/fabxc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-instrumentation) | [#sig-instrumentation](https://kubernetes.slack.com/messages/sig-instrumentation) | [Thursdays at 9.30 AM PST](https://zoom.us/j/5342565819) |
|
||||
| [Network](sig-network/README.md) | [@thockin (Tim Hockin, Google)](https://github.com/thockin)<br> [@dcbw (Dan Williams, Red Hat)](https://github.com/dcbw)<br> [@caseydavenport (Casey Davenport, Tigera)](https://github.com/caseydavenport) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-network) | [#sig-network](https://kubernetes.slack.com/messages/sig-network/) | Thursdays at 2:00 PM PST on [Zoom](https://zoom.us/j/5806599998) |
|
||||
| [Node](sig-node/README.md) | [@dchen1107 (Dawn Chen, Google)](https://github.com/dchen1107)<br>[@euank (Euan Kemp, CoreOS)](https://github.com/orgs/kubernetes/people/euank)<br>[@derekwaynecarr (Derek Carr, Red Hat)](https://github.com/derekwaynecarr) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-node) | [#sig-node](https://kubernetes.slack.com/messages/sig-node/) | [Tuesdays at 10:00 PT](https://plus.google.com/hangouts/_/google.com/sig-node-meetup?authuser=0) |
|
||||
| [On Prem](sig-on-prem/README.md) | [@josephjacks (Joseph Jacks, Apprenda)](https://github.com/josephjacks) <br> [@zen (Tomasz Napierala, Mirantis)](https://github.com/zen)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-on-prem) | [#sig-onprem](https://kubernetes.slack.com/messages/sig-onprem/) | Every two weeks on Wednesday at 9 PM PST / 12 PM EST |
|
||||
| [OpenStack](sig-openstack/README.md) | [@idvoretskyi (Ihor Dvoretskyi, Mirantis)](https://github.com/idvoretskyi) <br> [@xsgordon (Steve Gordon, Red Hat)](https://github.com/xsgordon)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack) | [#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/) | Every second Wednesday at 5 PM PDT / 2 PM EDT |
|
||||
| [PM](sig-pm/README.md) | [@apsinha (Aparna Sinha, Google)](https://github.com/apsinha) <br> [@idvoretskyi (Ihor Dvoretskyi, Mirantis)](https://github.com/idvoretskyi) <br> [@calebamiles (Caleb Miles, CoreOS)](https://github.com/calebamiles)| [Group](https://groups.google.com/forum/#!forum/kubernetes-pm) | [#kubernetes-pm](https://kubernetes.slack.com/messages/kubernetes-pm/) | TBD|
|
||||
| [Rktnetes](sig-rktnetes/README.md) | [@euank (Euan Kemp, CoreOS)](https://github.com/euank) <br> [@tmrts (Tamer Tas)](https://github.com/tmrts) <br> [@yifan-gu (Yifan Gu, CoreOS)](https://github.com/yifan-gu) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-rktnetes) | [#sig-rktnetes](https://kubernetes.slack.com/messages/sig-rktnetes/) | [As needed (ad-hoc)](https://zoom.us/j/830298957) |
|
||||
| [Scalability](sig-scalability/README.md) | [@lavalamp (Daniel Smith, Google)](https://github.com/lavalamp)<br>[@countspongebob (Bob Wise, Samsung SDS)](https://github.com/countspongebob)<br>[@jbeda (Joe Beda)](https://github.com/jbeda) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scale) | [#sig-scale](https://kubernetes.slack.com/messages/sig-scale/) | [Thursdays at 09:00 PT](https://zoom.us/j/989573207) |
|
||||
| [Scheduling](sig-scheduling/README.md) | [@davidopp (David Oppenheimer, Google)](https://github.com/davidopp)<br>[@timothysc (Timothy St. Clair, Red Hat)](https://github.com/timothysc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scheduling) | [#sig-scheduling](https://kubernetes.slack.com/messages/sig-scheduling/) | Alternate between Mondays at 1 PM PT and Wednesdays at 12:30 AM PT on [Zoom](https://zoom.us/zoomconference?m=rN2RrBUYxXgXY4EMiWWgQP6Vslgcsn86) |
|
||||
| [Service Catalog](sig-service-catalog/README.md) | [@pmorie (Paul Morie, Red Hat)](https://github.com/pmorie) <br> [@arschles (Aaron Schlesinger, Deis)](https://github.com/arschles) <br> [@bmelville (Brendan Melville, Google)](https://github.com/bmelville) <br> [@duglin (Doug Davis, IBM)](https://github.com/duglin)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-service-catalog) | [#sig-service-catalog](https://kubernetes.slack.com/messages/sig-service-catalog/) | [Mondays at 1 PM PST](https://zoom.us/j/7201225346) |
|
||||
| [Storage](sig-storage/README.md) | [@saad-ali (Saad Ali, Google)](https://github.com/saad-ali)<br>[@childsb (Brad Childs, Red Hat)](https://github.com/childsb) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-storage) | [#sig-storage](https://kubernetes.slack.com/messages/sig-storage/) | Bi-weekly Thursdays 9 AM PST (or more frequently) on [Zoom](https://zoom.us/j/614261834) |
|
||||
| [Testing](sig-testing/README.md) | [@spiffxp (Aaron Crickenberger, Samsung)](https://github.com/spiffxp)<br>[@ixdy (Jeff Grafton, Google)](https://github.com/ixdy) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-testing) | [#sig-testing](https://kubernetes.slack.com/messages/sig-testing/) | [Tuesdays at 9:30 AM PT](https://zoom.us/j/553910341) |
|
||||
| [UI](sig-ui/README.md) | [@romlein (Dan Romlein, Apprenda)](https://github.com/romlein)<br> [@bryk (Piotr Bryk, Google)](https://github.com/bryk) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-ui) | [#sig-ui](https://kubernetes.slack.com/messages/sig-ui/) | Wednesdays at 4:00 PM CEST |
|
||||
| [Windows](sig-windows/README.md) | [@michmike77 (Michael Michael, Apprenda)](https://github.com/michmike)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-windows) | [#sig-windows](https://kubernetes.slack.com/messages/sig-windows) | Bi-weekly Tuesdays at 9:30 AM PT |
|
||||
|
||||
|
|
@ -18,4 +18,4 @@ We use **sig-onprem** label to track on premise efforts in PRs and issues:
|
|||
|
||||
**Effort tracking document** [On-Prem related projects](https://docs.google.com/spreadsheets/d/1Ca9ZpGXM4PfycYv0Foi7Y4vmN4KVXrGYcJipbH8_xLY/edit#gid=0)
|
||||
|
||||
**Meetings:** Every second Wednesday at 0800 PST (11 AM ET / 5 PM CET) - [Connect using Zoom](https://zoom.us/my/k8s.sig.onprem), [Agenda/Notes](https://docs.google.com/document/d/1AHF1a8ni7iMOpUgDMcPKrLQCML5EMZUAwP4rro3P6sk/edit#)
|
||||
**Meetings:** Every second Wednesday at 0900 PST (12 PM ET / 6 PM CET) - [Connect using Zoom](https://zoom.us/my/k8s.sig.onprem), [Agenda/Notes](https://docs.google.com/document/d/1AHF1a8ni7iMOpUgDMcPKrLQCML5EMZUAwP4rro3P6sk/edit#)
|
||||
|
|
|
|||
|
|
@ -1,21 +1,62 @@
|
|||
# OpenStack SIG
|
||||
|
||||
This is the wiki page of the Kubernetes OpenStack SIG: a special interest group
|
||||
co-ordinating contributions of OpenStack-related changes to Kubernetes.
|
||||
This is the community page of the Kubernetes OpenStack SIG: a special
|
||||
interest group coordinating the cross-community efforts of the OpenStack
|
||||
and Kubernetes communities. This includes OpenStack-related contributions
|
||||
to Kubernetes projects with OpenStack as:
|
||||
* a deployment platform for Kubernetes,
|
||||
* a service provider for Kubernetes,
|
||||
* a collection of applications to run on Kubernetes.
|
||||
|
||||
## Meetings
|
||||
|
||||
Meetings are held every second Wednesday. The meetings occur at
|
||||
1500 UTC or 2100 UTC, alternating.
|
||||
|
||||
To check which time is being used for the upcoming meeting refer to the
|
||||
[Agenda/Notes](https://docs.google.com/document/d/1iAQ3LSF_Ky6uZdFtEZPD_8i6HXeFxIeW4XtGcUJtPyU/edit?usp=sharing_eixpa_nl&ts=588b986f).
|
||||
|
||||
Meeting reminders are also sent to the [kubernetes-sig-openstack](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack)
|
||||
list. Meetings are held on [Zoom](https://zoom.us) in the room at
|
||||
[https://zoom.us/j/417251241](https://zoom.us/j/417251241).
|
||||
|
||||
## Leads
|
||||
|
||||
Steve Gordon (@xsgordon) and Ihor Dvoretskyi (@idvoretskyi)
|
||||
|
||||
## Slack Channel
|
||||
|
||||
[#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/). [Archive](http://kubernetes.slackarchive.io/sig-openstack/)
|
||||
|
||||
|
||||
## Mailing Lists
|
||||
|
||||
The OpenStack SIG has a number of mailing lists; most activities are
|
||||
co-ordinated via the general discussion list with the remainder used for
|
||||
following Github notifications where the SIG is tagged in a comment.
|
||||
|
||||
The general discussion regarding the SIG occurs on the [kubernetes-sig-openstack](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack)
|
||||
mailing list.
|
||||
|
||||
## GitHub Teams

A number of GitHub teams are set up that can be tagged in an issue or PR to
bring it to the relevant team's attention (see the example below the table).
These notifications are also sent to the mailing list attached to each GitHub
team for archival purposes. It is not intended that discussion occur on these
lists directly.

| Name | Archival List |
|------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
|@kubernetes/sig-openstack-api-reviews | [kubernetes-sig-openstack-api-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-api-reviews) |
|@kubernetes/sig-openstack-bugs | [kubernetes-sig-openstack-bugs](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-bugs) |
|@kubernetes/sig-openstack-feature-requests| [kubernetes-sig-openstack-feature-requests](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-feature-requests) |
|@kubernetes/sig-openstack-proposals | [kubernetes-sig-openstack-proposals](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-proposals) |
|@kubernetes/sig-openstack-pr-reviews | [kubernetes-sig-openstack-pr-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-pr-reviews) |
|@kubernetes/sig-openstack-misc | [kubernetes-sig-openstack-misc](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-misc) |
|@kubernetes/sig-openstack-test-failures | [kubernetes-sig-openstack-test-failures](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-test-failures) |

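For illustration, a comment that mentions one of these teams might look like the following (the comment text itself is made up):

```
@kubernetes/sig-openstack-bugs could someone from the team take a look at this?
It appears to be specific to the OpenStack cloud provider.
```

GitHub notifies the members of the mentioned team, and a copy is delivered to the corresponding archival list from the table above.
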
## Issues and Bugs

Relevant [Issues](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen%20label%3Asig%2Fopenstack%20is%3Aissue)
and [Pull Requests](https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen%20is%3Apr%20label%3Asig%2Fopenstack)
are tagged with the **sig/openstack** label.

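The linked searches above simply filter on that label; decoded, the GitHub search queries they use are:

```
is:open label:sig/openstack is:issue
is:open is:pr label:sig/openstack
```
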
**Leads:** Steve Gordon (@xsgordon) and Ihor Dvoretskyi (@idvoretskyi)

**Slack Channel:** [#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/). [Archive](http://kubernetes.slackarchive.io/sig-openstack/)

**Mailing List:** [kubernetes-sig-openstack](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack)

**Meetings:** Meetings are held every second Wednesday. The meetings occur at
1500 UTC or 2100 UTC, alternating. To check which time is being used for the
upcoming meeting, refer to the [Agenda/Notes](https://docs.google.com/document/d/1iAQ3LSF_Ky6uZdFtEZPD_8i6HXeFxIeW4XtGcUJtPyU/edit#).
Meeting reminders are also sent to the mailing list linked above. Meetings are
held on [Zoom](https://zoom.us) in the room at [https://zoom.us/j/417251241](https://zoom.us/j/417251241).
are tagged with the **sig/openstack** label.

@@ -1,21 +0,0 @@
List of the OpenStack Special Interest Group team members.

Use @kubernetes/sig-openstack to mention this team in comments.

* [David Oppenheimer](https://github.com/davidopp) (owner)
* [Steve Gordon](https://github.com/xsgordon) (SIG lead)
* [Ihor Dvoretskyi](https://github.com/idvoretskyi) (SIG lead)
* [Angus Lees](https://github.com/anguslees)
* [Pengfei Ni](https://github.com/feiskyer)
* [Joshua Harlow](https://github.com/harlowja)
* [Stephen McQuaid](https://github.com/stevemcquaid)
* [Huamin Chen](https://github.com/rootfs)
* [David F. Flanders](https://github.com/DFFlanders)
* [Davanum Srinivas](https://github.com/dims)
* [Egor Guz](https://github.com/eghobo)
* [Flavio Percoco Premoli](https://github.com/flaper87)
* [Hongbin Lu](https://github.com/hongbin)
* [Louis Taylor](https://github.com/kragniz)
* [Jędrzej Nowak](https://github.com/pigmej)
* [rohitagarwalla](https://github.com/rohitagarwalla)
* [Russell Bryant](https://github.com/russellb)

@@ -0,0 +1,37 @@
## The Kubernetes PM Group

### Focus

The Kubernetes PM Group focuses on aspects of product management, such as the qualification and successful management of user requests, and aspects of project and program management, such as the continued improvement of the processes used by the Kubernetes community to maintain the Kubernetes Project itself.

Besides helping to discover both what to build and how to build it, the PM Group also helps keep the wheels on this spaceship we are all building together; bringing together people who think about Kubernetes as both a vibrant community of humans and a technical program is another primary focus of this group.

Members of the Kubernetes PM Group can assume [certain additional](https://github.com/kubernetes/community/blob/master/project-managers/README.md) responsibilities to help maintain the Kubernetes Project itself.

It is also important to remember that the role of managing an open source project is very new and largely unscoped for a project as large as Kubernetes; we are learning too, and we are excited to learn how we can best serve the community of users and contributors.

### Common activities
- Collecting and generalizing user feedback to help drive project direction and priorities: delivering on user needs while enforcing vendor neutrality
- Supporting collaboration across the community by working to improve the communication of roadmap and workload of other [Special Interest Groups](https://github.com/kubernetes/community#special-interest-groups-sig-and-working-groups)
- Supporting the continued effort to improve the stability and extensibility of the Kubernetes Project
- Supporting the marketing and promotion of the Kubernetes Project through the [CNCF](https://www.cncf.io/)
- Working with the [Kubernetes Release Team](https://github.com/kubernetes/community/tree/master/contributors/devel/release) to continually ensure a high-quality release of the Kubernetes Project
- Supporting the Kubernetes ecosystem through [the Kubernetes Incubator](https://github.com/kubernetes/community/blob/master/incubator.md)
- Coordinating project-wide policy changes for Kubernetes and the Kubernetes Incubator
- Onboarding large groups of corporate contributors and welcoming them into the Kubernetes Community
- Whatever is needed to help make the project go!

### Contact us
- via [Slack](https://kubernetes.slack.com/messages/kubernetes-pm/)
- via [Google Groups](https://groups.google.com/forum/#!forum/kubernetes-pm)

### Regular Meetings

Every second Tuesday, 8:00 AM PT / 4:00 PM UTC
- [Zoom link](https://zoom.us/j/845373595)
- [Meeting Notes](https://docs.google.com/document/d/1YqIpyjz4mV1jjvzhLx9JYy8LAduedzaoBMjpUKGUJQo/edit?usp=sharing)

### Leaders
- Aparna Sinha apsinha@google.com, Google
- Ihor Dvoretskyi ihor.dvoretskyi@gmail.com, Mirantis
- Caleb Miles caleb.miles@coreos.com, CoreOS

@@ -1,18 +1,34 @@
# Scalability SIG
# SIG Scalability

**Leads:** Bob Wise (@countspongebob) and Joe Beda (@jbeda)
Responsible for answering scalability-related questions such as:

**Slack Channel:** [#sig-scale](https://kubernetes.slack.com/messages/sig-scale/). [Archive](http://kubernetes.slackarchive.io/sig-scale/)
What size clusters do we think we should support with Kubernetes in the short to
medium term? How performant do we think the control system should be at scale?
What resource overhead should the Kubernetes control system reasonably consume?

**Mailing List:** [kubernetes-sig-scale](https://groups.google.com/forum/#!forum/kubernetes-sig-scale)
For more details about our objectives, please review [Scaling And Performance Goals](goals.md).

**Meetings:** Thursdays at 9am pacific. Contact Joe or Bob for invite. [Notes](https://docs.google.com/a/bobsplanet.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?usp=drive_web)
## Organizers
- Bob Wise (@countspongebob), Samsung-CNCT
- Joe Beda (@jbeda), Heptio

**Docs:**
[Scaling And Performance Goals](goals.md)
## Meetings

### Scalability SLAs
- **Every Thursday at 9am pacific.**
- Contact Joe or Bob for invite.
- [Zoom link](https://zoom.us/j/989573207)
- [Agenda items](https://docs.google.com/a/bobsplanet.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?usp=drive_web)

## Slack / Google Groups
- [Slack: #sig-scale](https://kubernetes.slack.com/messages/sig-scale/).
- [Slack Archive](http://kubernetes.slackarchive.io/sig-scale/)

- [kubernetes-sig-scale](https://groups.google.com/forum/#!forum/kubernetes-sig-scale)

## Docs
- [Scaling And Performance Goals](goals.md)

## Scalability SLAs

We officially support two different SLAs:

@@ -10,6 +10,8 @@
[**Meeting Agenda**](http://goo.gl/A0m24V)

[**Meeting Video Playlist**](https://goo.gl/ZmLNX9)

### SIG Mission

Mission: to develop a Kubernetes API for the CNCF service broker and Kubernetes broker implementation.

@@ -25,13 +25,13 @@ Interested in contributing to storage features in Kubernetes? [Please read our g
* [kubernetes-sig-storage-proposals](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-proposals)
* [kubernetes-sig-storage-test-failures](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-test-failures)
* GitHub Teams - These are the teams that should be mentioned on GitHub PRs and Issues:
  * [kubernetes-sig-storage-api-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-api-reviews)
  * [kubernetes-sig-storage-bugs](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-bugs)
  * [kubernetes-sig-storage-feature-requests](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-feature-requests)
  * [kubernetes-sig-storage-misc](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-misc)
  * [kubernetes-sig-storage-pr-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-pr-reviews)
  * [kubernetes-sig-storage-proposals](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-proposals)
  * [kubernetes-sig-storage-test-failures](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-test-failures)
  * [kubernetes-sig-storage-api-reviews](https://github.com/orgs/kubernetes/teams/sig-storage-api-reviews)
  * [kubernetes-sig-storage-bugs](https://github.com/orgs/kubernetes/teams/sig-storage-bugs)
  * [kubernetes-sig-storage-feature-requests](https://github.com/orgs/kubernetes/teams/sig-storage-feature-requests)
  * [kubernetes-sig-storage-misc](https://github.com/orgs/kubernetes/teams/sig-storage-misc)
  * [kubernetes-sig-storage-pr-reviews](https://github.com/orgs/kubernetes/teams/sig-storage-pr-reviews)
  * [kubernetes-sig-storage-proposals](https://github.com/orgs/kubernetes/teams/sig-storage-proposals)
  * [kubernetes-sig-storage-test-failures](https://github.com/orgs/kubernetes/teams/sig-storage-test-failures)
* GitHub Issues
  * [link](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Asig%2Fstorage)
* Documentation for currently supported volume plugins: http://kubernetes.io/docs/user-guide/volumes/

@@ -6,7 +6,8 @@ For folks that prefer reading the docs first, we recommend reading our Storage D
For folks that prefer a video overview, we recommend watching the following videos:
- [The state of state](https://www.youtube.com/watch?v=jsTQ24CLRhI&index=6&list=PLosInM-8doqcBy3BirmLM4S_pmox6qTw3)
- [Kubernetes Storage 101](https://www.youtube.com/watch?v=ZqTHe6Xj0Ek&list=PLosInM-8doqcBy3BirmLM4S_pmox6qTw3&index=38)
- [Storage overview to SIG Apps](https://www.youtube.com/watch?v=DrLGxkFdDNc&feature=youtu.be&t=11m19s)
- [Overview of Basic Volume for SIG Apps](https://youtu.be/DrLGxkFdDNc?t=11m19s)
- [Overview of Dynamic Provisioning for SIG Apps](https://youtu.be/NXUHmxXytUQ?t=10m33s)

Keep in mind that the video overviews reflect the state of the art at the time they were created. In Kubernetes we try very hard to maintain backwards compatibility, but Kubernetes is a fast-moving project and we do add features going forward; attending the Storage SIG meetings and following the Storage SIG Google group are both good ways of continually staying up to speed.

@@ -2,10 +2,17 @@
A special interest group for bringing Kubernetes support to Windows.

## Meeting
* Bi-weekly: Tuesday 1:00 PM EST (10:00 AM PST)
## Meetings
* Bi-weekly: Tuesday 12:30 PM EST (9:30 AM PST)
* Zoom link: [https://zoom.us/my/sigwindows](https://zoom.us/my/sigwindows)
* To get an invite to the meeting, first join the Google group https://groups.google.com/forum/#!forum/kubernetes-sig-windows, then ask the SIG Lead for the current invitation.

## History
* Recorded Meetings Playlist on YouTube: https://www.youtube.com/playlist?list=PL69nYSiGNLP2OH9InCcNkWNu2bl-gmIU4&jct=LZ9EIvD4DGrhr2h4r0ItaBmco7gTgw
* Meeting Notes: https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit#heading=h.kbz22d1yc431

The meeting agenda and notes can be found [here](https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit)
## Get Involved
* Find us on Slack at https://kubernetes.slack.com/messages/sig-windows
* Find us on Google groups https://groups.google.com/forum/#!forum/kubernetes-sig-windows
* Slack History is archived at http://kubernetes.slackarchive.io/sig-windows/