Merge branch 'master' into fix/helm-incu

This commit is contained in:
Brandon Philips 2017-03-07 23:05:20 -08:00 committed by GitHub
commit 03ecf9eb26
86 changed files with 5510 additions and 1132 deletions

CLA.md

@@ -1,30 +1,70 @@
### How do I sign the CNCF CLA?
# The Contributor License Agreement
* To sign up as an individual or as an employee of a signed organization, go to https://identity.linuxfoundation.org/projects/cncf
* To sign up as an organization, go to https://identity.linuxfoundation.org/node/285/organization-signup
* To review the CNCF CLA, go to https://github.com/cncf/cla
The [Cloud Native Computing Foundation][CNCF] defines the legal status of the
contributed code in a _Contributor License Agreement_ (CLA).
***
Only original source code from CLA signatories can be accepted into Kubernetes.
### After you select one of the options above, please follow the instructions below:
This policy does not apply to [third_party] and [vendor].
**Step 1**: You must sign in with GitHub.
## How do I sign?
**Step 2**: If you are signing up as an employee, you must use your official person@organization.domain email address in the CNCF account registration page.
#### 1. Read
* [CLA for individuals] to sign up as an individual or as an employee of a signed organization.
* [CLA for corporations] to sign as a corporation representative and manage signups from your organization.
#### 2. Sign in with GitHub.
**Step 3**: The email you use on your commits (https://help.github.com/articles/setting-your-email-in-git/) must match the email address you use when signing up for the CNCF account.
Click
* [Individual signup] to sign up as an individual or as an employee of a signed organization.
* [Corp signup] to sign as a corporation representative and manage signups from your organization.
Either signup form looks like this:
![CNCFCLA1](http://i.imgur.com/tEk2x3j.png)
#### 3. Enter the correct E-mail address to validate!
The address entered on the form must meet two constraints:
* It __must match__ your [git email] (the output of `git config user.email`)
or your PRs will not be approved!
* It must be your official `person@organization.com` address if you signed up
as an employee of said organization.
![CNCFCLA](http://i.imgur.com/tEk2x3j.png)
![CNCFCLA2](http://i.imgur.com/t3WAtrz.png)
#### 4. Look for an email indicating successful signup.
**Step 4**: Once the CLA sent to your email address is signed (or your email address is verified in case your organization has signed the CLA), you should be able to check that you are authorized in any new PR you create.
> The Linux Foundation
>
> Hello,
>
> You have signed CNCF Individual Contributor License Agreement.
> You can see your document anytime by clicking View on HelloSign.
>
Once you have this, the CLA authorizer bot will authorize your PRs.
![CNCFCLA3](http://i.imgur.com/C5ZsNN6.png)
**Step 5**: The status on your old PRs will be updated when any new comment is made on it.
### I'm having issues with signing the CLA.
## Troubleshooting
If you're facing difficulty with signing the CNCF CLA, please explain your case on https://github.com/kubernetes/kubernetes/issues/27796 and we (@sarahnovotny and @foxish), along with the CNCF will help sort it out.
If you have signup trouble, please explain your case on
the [CLA signing issue] and we (@sarahnovotny and @foxish),
along with the [CNCF] will help sort it out.
Another option: ask for help at `helpdesk@rt.linuxfoundation.org`.
[CNCF]: https://www.cncf.io/community
[CLA signing issue]: https://github.com/kubernetes/kubernetes/issues/27796
[CLA for individuals]: https://github.com/cncf/cla/blob/master/individual-cla.pdf
[CLA for corporations]: https://github.com/cncf/cla/blob/master/corporate-cla.pdf
[Corp signup]: https://identity.linuxfoundation.org/node/285/organization-signup
[Individual signup]: https://identity.linuxfoundation.org/projects/cncf
[git email]: https://help.github.com/articles/setting-your-email-in-git
[third_party]: https://github.com/kubernetes/kubernetes/tree/master/third_party
[vendor]: https://github.com/kubernetes/kubernetes/tree/master/vendor


@@ -1,16 +1,30 @@
# Contributing guidelines
# Contributing to the community repo
This project is for documentation about the community. To contribute to one of
the Kubernetes projects please see the contribution guide for that project.
## How To Contribute
Contributions to this community repository follow a
[pull request](https://help.github.com/articles/using-pull-requests/) (PR)
model:
The contributions here follow a [pull request](https://help.github.com/articles/using-pull-requests/) model with some additional process.
The process is as follows:
#### 1. Submit a PR with your change
1. Submit a pull request with the requested change.
2. Another person, other than a Special Interest Group (SIG) owner, can mark it Looks Good To Me (LGTM) upon successful review. Otherwise feedback can be given.
3. A SIG owner can merge someone else's change into their SIG documentation immediately.
4. Someone cannot immediately merge their own change. To merge your own change wait 24 hours during the week or 72 hours over a weekend. This allows others the opportunity to review a change.
#### 2. Get an LGTM.
_Note, the SIG Owners decide on the layout for their own sub-directory structure._
Upon successful review, someone will give the PR
a __LGTM__ (_looks good to me_) in the review thread.
#### 3. Allow time for others to see it
Once you have an __LGTM__, please wait 24 hours during
the week or 72 hours over a weekend before you
merge it, to give others (besides your initial reviewer)
time to see it.
__That said, a [SIG lead](sig-list.md) may shortcut this by merging
someone else's change into their SIG's documentation
at any time.__
Edits in SIG sub-directories should follow structure and guidelines set
by the respective SIG leads - see `CONTRIBUTING` instructions in subdirectories.

README.md

@@ -1,71 +1,86 @@
# Kubernetes Community Documentation
# Kubernetes Community
Welcome to the Kubernetes community documentation. Here you can learn about what's happening in the community.
Welcome to the Kubernetes community!
## Slack Chat
This is the starting point for becoming a contributor - improving docs, improving code, giving talks etc.
## Communicating
Kubernetes uses [Slack](http://slack.com) for community discussions.
General communication channels - e.g. filing issues, chat, mailing lists and
conferences - are listed on the [communication](communication.md) page.
**Join**: Joining is self-service. Go to [slack.k8s.io](http://slack.k8s.io) to join.
For more specific topics, try a SIG.
**Access**: Once you join, the team can be found at [kubernetes.slack.com](http://kubernetes.slack.com)
## SIGs
**Archives**: Discussions on most channels are archived at [kubernetes.slackarchive.io](http://kubernetes.slackarchive.io). Start archiving by inviting the slackarchive bot to a channel via `/invite @slackarchive`
Kubernetes is a set of projects, each shepherded by a special interest group (SIG).
A first step to contributing is to pick from the [list of kubernetes SIGs](sig-list.md).
To add new channels, contact one of the admins. Currently that includes briangrant, goltermann, jbeda, sarahnovotny and thockin.
A SIG can have its own policy for contribution,
described in a `README` or `CONTRIBUTING` file in the SIG
folder in this repo (e.g. [sig-cli/contributing](sig-cli/contributing.md)),
and its own mailing list, slack channel, etc.
## How Can I help?
## kubernetes mailing lists
Documentation (like the text you are reading now) can
always use improvement!
Many important announcements and discussions end up on the main development group.
There's a [semi-curated list of issues][help-wanted]
that should not need deep knowledge of the system.
kubernetes-dev@googlegroups.com
To dig deeper, read a design doc, e.g. [architecture].
[Google Group](https://groups.google.com/forum/#!forum/kubernetes-dev)
[Pick a SIG](sig-list.md), peruse its associated [cmd] directory,
find a `main()` and read code until you find something you want to fix.
Users of kubernetes trade notes on:
There's always code that can be clarified and variables
or functions that can be renamed or commented.
kubernetes-users@googlegroups.com
There's always a need for more test coverage.
[Google Group](https://groups.google.com/forum/#!forum/kubernetes-users)
## Learn to Build
Links in [contributors/devel/README.md](contributors/devel/README.md)
lead to many relevant topics, including
* [Developer's Guide] - how to start a build/test cycle
* [Collaboration Guide] - how to work together
* [expectations] - what the community expects
* [pull request] policy - how to prepare a pull request
## Making a Pull Request
We recommend that you work on existing issues before attempting
to [develop a new feature].
Find an existing issue (e.g. one marked [help-wanted], or simply
ask a SIG lead for suggestions), and respond on the issue thread
expressing interest in working on it.
This helps other people know that the issue is active, and
hopefully prevents duplicated efforts.
Before submitting a pull request, sign the [CLA].
If you want to work on a new idea of relatively small scope:
1. Submit an issue describing your proposed change to the repo in question.
1. The repo owners will respond to your issue promptly.
1. If your proposed change is accepted,
sign the [CLA],
and start work in your fork.
1. Submit a [pull request] containing a tested change.
## [Weekly Community Video Conference](community/README.md)
[architecture]: https://github.com/kubernetes/kubernetes/blob/master/docs/design/architecture.md
[cmd]: https://github.com/kubernetes/kubernetes/tree/master/cmd
[CLA]: cla.md
[Collaboration Guide]: contributors/devel/development.md
[Developer's Guide]: contributors/devel/development.md
[develop a new feature]: https://github.com/kubernetes/features
[expectations]: contributors/devel/community-expectations.md
[help-wanted]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Ahelp-wanted
[pull request]: contributors/devel/pull-requests.md
The [weekly community meeting](https://zoom.us/my/kubernetescommunity) provides an opportunity for the different SIGs, WGs and other parts of the community to come together. More information about joining the weekly community meeting is available on our [agenda working document](https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/CONTRIBUTING.md?pixel)]()
## Special Interest Groups (SIG) and Working Groups
Much of the community activity is organized into a community meeting, numerous SIGs and time-bounded WGs. SIGs follow these [guidelines](governance.md) although each of these groups may operate a little differently depending on their needs and workflow. Each group's material is in its subdirectory in this project.
The community meeting calendar is available as an [iCal to subscribe to](https://calendar.google.com/calendar/ical/cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com/public/basic.ics) (simply copy and paste the url into any calendar product that supports the iCal format) or [html to view](https://calendar.google.com/calendar/embed?src=cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com&ctz=America/Los_Angeles).
| Name | Leads | Group | Slack Channel | Meetings |
|------|-------|-------|---------------|----------|
| [API Machinery](sig-api-machinery/README.md) | [@lavalamp (Daniel Smith, Google)](https://github.com/lavalamp) <br> [@deads2k (David Eads, Red Hat)] (https://github.com/orgs/kubernetes/people/deads2k)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-api-machinery) | [#sig-api-machinery](https://kubernetes.slack.com/messages/sig-api-machinery/) | [Every other Wednesday at 11:00 AM PST](https://staging.talkgadget.google.com/hangouts/_/google.com/kubernetes-sig) |
| [Apps](sig-apps/README.md) | [@michelleN (Michelle Noorali, Deis)](https://github.com/michelleN)<br>[@mattfarina (Matt Farina, HPE)](https://github.com/mattfarina) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-apps) | [#sig-apps](https://kubernetes.slack.com/messages/sig-apps) | [Mondays 9:00AM PST](https://zoom.us/j/4526666954) |
| [Auth](sig-auth/README.md) | [@ erictune (Eric Tune, Google)](https://github.com/erictune)<br> [@ericchiang (Eric Chiang, CoreOS)](https://github.com/orgs/kubernetes/people/ericchiang)<br> [@liggitt (Jordan Liggitt, Red Hat)] (https://github.com/orgs/kubernetes/people/liggitt) <br> [@deads2k (David Eads, Red Hat)] (https://github.com/orgs/kubernetes/people/deads2k) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-auth) | [#sig-auth](https://kubernetes.slack.com/messages/sig-auth/) | Biweekly [Wednesdays at 1100 to 1200 PT](https://zoom.us/my/k8s.sig.auth) |
| [Autoscaling](sig-autoscaling/README.md) | [@fgrzadkowski (Filip Grządkowski, Google)](https://github.com/fgrzadkowski)<br> [@directxman12 (Solly Ross, Red Hat)](https://github.com/directxman12) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-autoscaling) | [#sig-autoscaling](https://kubernetes.slack.com/messages/sig-autoscaling/) | Biweekly (or triweekly) on [Thurs at 0830 PT](https://plus.google.com/hangouts/_/google.com/k8s-autoscaling) |
| [AWS](sig-aws/README.md) | [@justinsb (Justin Santa Barbara)](https://github.com/justinsb)<br>[@kris-nova (Kris Nova)](https://github.com/kris-nova)<br>[@chrislovecnm (Chris Love)](https://github.com/chrislovecnm)<br>[@mfburnett (Mackenzie Burnett)](https://github.com/mfburnett) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) | [#sig-aws](https://kubernetes.slack.com/messages/sig-aws/) | We meet on [Zoom](https://zoom.us/my/k8ssigaws), and the calls are scheduled via the official [group mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) |
| [Big Data](sig-big-data/README.md) | [@zmerlynn (Zach Loafman, Google)](https://github.com/zmerlynn)<br>[@timothysc (Timothy St. Clair, Red Hat)](https://github.com/timothysc)<br>[@wattsteve (Steve Watt, Red Hat)](https://github.com/wattsteve) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data) | [#sig-big-data](https://kubernetes.slack.com/messages/sig-big-data/) | Suspended |
| [CLI](sig-cli/README.md) | [@fabianofranz (Fabiano Franz, Red Hat)](https://github.com/fabianofranz)<br>[@pwittrock (Phillip Wittrock, Google)](https://github.com/pwittrock)<br>[@AdoHe (Tony Ado, Alibaba)](https://github.com/AdoHe) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cli) | [#sig-cli](https://kubernetes.slack.com/messages/sig-cli) | Bi-weekly Wednesdays at 9:00 AM PT on [Zoom](https://zoom.us/my/sigcli) |
| [Cluster Lifecycle](sig-cluster-lifecycle/README.md) | [@lukemarsden (Luke Marsden, Weave)] (https://github.com/lukemarsden) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-lifecycle) | [#sig-cluster-lifecycle](https://kubernetes.slack.com/messages/sig-cluster-lifecycle) | Tuesdays at 09:00 AM PST on [Zoom](https://zoom.us/j/166836624) |
| [Cluster Ops](sig-cluster-ops/README.md) | [@zehicle (Rob Hirschfeld, RackN)](https://github.com/zehicle) <br> [@mikedanese (Mike Danese, Google] (https://github.com/mikedanese) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-ops) | [#sig-cluster-ops](https://kubernetes.slack.com/messages/sig-cluster-ops) | Thursdays at 1:00 PM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/sig-cluster-ops)|
| [Contributor Experience](sig-contribx/README.md) | [@grodrigues3 (Garrett Rodrigues, Google)](https://github.com/Grodrigues3) <br> [@pwittrock (Phillip Witrock, Google)] (https://github.com/pwittrock) <br> [@Phillels (Elsie Phillips, CoreOS)](https://github.com/Phillels) | [Group](https://groups.google.com/forum/#!forum/kubernetes-wg-contribex) | [#wg-contribex] (https://kubernetes.slack.com/messages/wg-contribex) | Biweekly Wednesdays 9:30 AM PST on [zoom] (https://zoom.us/j/4730809290) |
| [Docs] (sig-docs/README.md) | [@pwittrock (Philip Wittrock, Google)] (https://github.com/pwittrock) <br> [@devin-donnelly (Devin Donnelly, Google)] (https://github.com/devin-donnelly) <br> [@jaredbhatti (Jared Bhatti, Google)] (https://github.com/jaredbhatti)| [Group] (https://groups.google.com/forum/#!forum/kubernetes-sig-docs) | [#sig-docs] (https://kubernetes.slack.com/messages/sig-docs) | Tuesdays @ 10:30AM PST on [Zoom](https://zoom.us/j/4730809290) |
| [Federation](sig-federation/README.md) | [@csbell (Christian Bell, Google)](https://github.com/csbell) <br> [@quinton-hoole (Quinton Hoole, Huawei)](https://github.com/quinton-hoole) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-federation) | [#sig-federation](https://kubernetes.slack.com/messages/sig-federation/) | Bi-weekly on Monday at 9:00 AM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/ubernetes) |
| [Instrumentation](sig-instrumentation/README.md) | [@piosz (Piotr Szczesniak, Google)](https://github.com/piosz) <br> [@fabxc (Fabian Reinartz, CoreOS)](https://github.com/fabxc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-instrumentation) | [#sig-instrumentation](https://kubernetes.slack.com/messages/sig-instrumentation) | [Thursdays at 9.30 AM PST](https://zoom.us/j/5342565819) |
| [Network](sig-network/README.md) | [@thockin (Tim Hockin, Google)](https://github.com/thockin)<br> [@dcbw (Dan Williams, Red Hat)](https://github.com/dcbw)<br> [@caseydavenport (Casey Davenport, Tigera)](https://github.com/caseydavenport) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-network) | [#sig-network](https://kubernetes.slack.com/messages/sig-network/) | Thursdays at 2:00 PM PST on [Zoom](https://zoom.us/j/5806599998) |
| [Node](sig-node/README.md) | [@dchen1107 (Dawn Chen, Google)](https://github.com/dchen1107)<br>[@euank (Euan Kemp, CoreOS)](https://github.com/orgs/kubernetes/people/euank)<br>[@derekwaynecarr (Derek Carr, Red Hat)](https://github.com/derekwaynecarr) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-node) | [#sig-node](https://kubernetes.slack.com/messages/sig-node/) | [Tuesdays at 10:00 PT](https://plus.google.com/hangouts/_/google.com/sig-node-meetup?authuser=0) |
| [On Prem](sig-onprem/README.md) | [@josephjacks (Joseph Jacks, Apprenda)] (https://github.com/josephjacks) <br> [@zen (Tomasz Napierala, Mirantis)] (https://github.com/zen)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-on-prem) | [#sig-onprem](https://kubernetes.slack.com/messages/sig-onprem/) | Every second Wednesday at 8 PM PST / 11 PM EST |
| [OpenStack](sig-openstack/README.md) | [@idvoretskyi (Ihor Dvoretskyi, Mirantis)] (https://github.com/idvoretskyi) <br> [@xsgordon (Steve Gordon, Red Hat)] (https://github.com/xsgordon)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack) | [#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/) | Every second Wednesday at 5 PM PDT / 2 PM EDT |
| [PM](project-managers/README.md) | [] ()| [Group](https://groups.google.com/forum/#!forum/kubernetes-pm) | []() | TBD|
| [Rktnetes](sig-rktnetes/README.md) | [@euank (Euan Kemp, CoreOS)] (https://github.com/euank) <br> [@tmrts (Tamer Tas)] (https://github.com/tmrts) <br> [@yifan-gu (Yifan Gu, CoreOS)] (https://github.com/yifan-gu) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-rktnetes) | [#sig-rktnetes](https://kubernetes.slack.com/messages/sig-rktnetes/) | [As needed (ad-hoc)](https://zoom.us/j/830298957) |
| [Scalability](sig-scalability/README.md) | [@lavalamp (Daniel Smith, Google)](https://github.com/lavalamp)<br>[@countspongebob (Bob Wise, Samsung SDS)](https://github.com/countspongebob)<br>[@jbeda (Joe Beda)](https://github.com/jbeda) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scale) | [#sig-scale](https://kubernetes.slack.com/messages/sig-scale/) | [Thursdays at 09:00 PT](https://zoom.us/j/989573207) |
| [Scheduling](sig-scheduling/README.md) | [@davidopp (David Oppenheimer, Google)](https://github.com/davidopp)<br>[@timothysc (Timothy St. Clair, Red Hat)](https://github.com/timothysc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scheduling) | [#sig-scheduling](https://kubernetes.slack.com/messages/sig-scheduling/) | Alternate between Mondays at 1 PM PT and Wednesdays at 12:30 AM PT on [Zoom](https://zoom.us/zoomconference?m=rN2RrBUYxXgXY4EMiWWgQP6Vslgcsn86) |
| [Service Catalog](sig-service-catalog/README.md) | [@pmorie (Paul Morie, Red Hat)](https://github.com/pmorie) <br> [@arschles (Aaron Schlesinger, Deis)](github.com/arschles) <br> [@bmelville (Brendan Melville, Google)](https://github.com/bmelville) <br> [@duglin (Doug Davis, IBM)](https://github.com/duglin)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-service-catalog) | [#sig-service-catalog](https://kubernetes.slack.com/messages/sig-service-catalog/) | [Mondays at 1 PM PST](https://zoom.us/j/7201225346) |
| [Storage](sig-storage/README.md) | [@saad-ali (Saad Ali, Google)](https://github.com/saad-ali)<br>[@childsb (Brad Childs, Red Hat)](https://github.com/childsb) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-storage) | [#sig-storage](https://kubernetes.slack.com/messages/sig-storage/) | Bi-weekly Thursdays 9 AM PST (or more frequently) on [Zoom](https://zoom.us/j/614261834) |
| [Testing](sig-testing/README.md) | [@spiffxp (Aaron Crickenberger, Samsung)](https://github.com/spiffxp)<br>[@ixdy (Jeff Grafton, Google)](https://github.com/ixdy) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-testing) | [#sig-testing](https://kubernetes.slack.com/messages/sig-testing/) | [Tuesdays at 9:30 AM PT](https://zoom.us/j/553910341) |
| [UI](sig-ui/README.md) | [@romlein (Dan Romlein, Apprenda)](https://github.com/romlein)<br> [@bryk (Piotr Bryk, Google)](https://github.com/bryk) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-ui) | [#sig-ui](https://kubernetes.slack.com/messages/sig-ui/) | Wednesdays at 4:00 PM CEST |
| [Windows](sig-windows/README.md) | [@michmike77 (Michael Michael, Apprenda)](https://github.com/michmike)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-windows) | [#sig-windows](https://kubernetes.slack.com/messages/sig-windows) | Bi-weekly Tuesdays at 10:00 AM PT |
### [How to start a SIG](sig-creation-procedure.md)

communication.md

@@ -0,0 +1,97 @@
# Communication
The Kubernetes community abides by the [CNCF code of conduct]. Here is an excerpt:
> _As contributors and maintainers of this project, and in the interest
> of fostering an open and welcoming community, we pledge to respect
> all people who contribute through reporting issues, posting feature
> requests, updating documentation, submitting pull requests or patches,
> and other activities._
## SIGs
Kubernetes encompasses many projects, organized into [SIGs](sig-list.md).
Some communication has moved into SIG-specific channels - see
a given SIG subdirectory for details.
Nevertheless, below find a list of many general channels, groups
and meetings devoted to Kubernetes.
## Social Media
* [Twitter]
* [Google+]
* [blog]
* Pose questions and help answer them on [Slack][slack.k8s.io] or [Stack Overflow].
Most real time discussion happens at [kubernetes.slack.com];
you can sign up at [slack.k8s.io].
Discussions on most channels are archived at [kubernetes.slackarchive.io].
Start archiving by inviting the _slackarchive_ bot to a
channel via `/invite @slackarchive`.
To add new channels, contact one of the admins
(briangrant, goltermann, jbeda, sarahnovotny and thockin).
## Issues
If you have a question about Kubernetes or have a problem using it,
please start with the [troubleshooting guide].
If that doesn't answer your questions, or if you think you found a bug,
please [file an issue].
## Mailing lists
Development announcements and discussions appear on the Google group
[kubernetes-dev] (send mail to `kubernetes-dev@googlegroups.com`).
Users trade notes on the Google group
[kubernetes-users] (send mail to `kubernetes-users@googlegroups.com`).
## Weekly Meeting
We have a PUBLIC and RECORDED [weekly meeting] every Thursday at 10am US Pacific Time.
Map that to your local time with this [timezone table].
See it on the web at [calendar.google.com], or paste this [iCal url] into any iCal client.
To be added to the calendar items, join the Google group
[kubernetes-community-video-chat] for further instructions.
If you have a topic you'd like to present or would like to see discussed,
please propose a specific date on the [Kubernetes Community Meeting Agenda].
## Conferences
* [kubecon]
* [cloudnativecon]
[blog]: http://blog.kubernetes.io
[calendar.google.com]: https://calendar.google.com/calendar/embed?src=cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com&ctz=America/Los_Angeles
[cloudnativecon]: http://events.linuxfoundation.org/events/cloudnativecon
[CNCF code of conduct]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md
[communication]: https://github.com/kubernetes/community/blob/master/communication.md
[community meeting]: https://github.com/kubernetes/community/blob/master/communication.md#weekly-meeting
[file an issue]: https://github.com/kubernetes/kubernetes/issues/new
[Google+]: https://plus.google.com/u/0/b/116512812300813784482/116512812300813784482
[iCal url]: https://calendar.google.com/calendar/ical/cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com/public/basic.ics
[kubecon]: http://events.linuxfoundation.org/events/kubecon
[Kubernetes Community Meeting Agenda]: https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#
[kubernetes-community-video-chat]: https://groups.google.com/forum/#!forum/kubernetes-community-video-chat
[kubernetes-dev]: https://groups.google.com/forum/#!forum/kubernetes-dev
[kubernetes-users]: https://groups.google.com/forum/#!forum/kubernetes-users
[kubernetes.slackarchive.io]: http://kubernetes.slackarchive.io
[kubernetes.slack.com]: http://kubernetes.slack.com
[Special Interest Group]: https://github.com/kubernetes/community/blob/master/README.md#SIGs
[slack.k8s.io]: http://slack.k8s.io
[Stack Overflow]: http://stackoverflow.com/questions/tagged/kubernetes
[timezone table]: https://www.google.com/search?q=1000+am+in+pst
[troubleshooting guide]: http://kubernetes.io/docs/troubleshooting
[Twitter]: https://twitter.com/kubernetesio
[weekly meeting]: https://zoom.us/my/kubernetescommunity


@@ -1,7 +0,0 @@
# Weekly Community Video Conference
We have PUBLIC and RECORDED [weekly video meetings](https://zoom.us/my/kubernetescommunity) every Thursday at 10am US Pacific Time. You can [find the time in your timezone with this table](https://www.google.com/search?q=1000+am+in+pst).
To be added to the calendar items, join this [google group](https://groups.google.com/forum/#!forum/kubernetes-community-video-chat) for further instructions.
If you have a topic you'd like to present or would like to see discussed, please propose a specific date on the Kubernetes Community Meeting [Working Document](https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#).


@@ -12,7 +12,7 @@ You don't actually need federation for geo-location now, but it helps. The ment
From the enterprise point of view, central IT is in control of, and has knowledge of, where stuff gets deployed. Bob thinks it would be a very bad idea for us to try to solve complex policy ideas and enable them; it's a tar pit. We should just have the primitives of having different regions and be able to say what goes where.
Currently, you either do node labelling which ends up being complex and dependant on discipline. Or you have different clusters and you don't have common namespaces. Some discussion of Intel proposal for cluster metadata.
Currently, you either do node labelling which ends up being complex and dependent on discipline. Or you have different clusters and you don't have common namespaces. Some discussion of Intel proposal for cluster metadata.
Bob's mental model is AWS regions and AZs. For example, if we're building a big Cassandra cluster, you want to make sure that the nodes aren't all in the same zone.


@@ -1,61 +1,14 @@
# Kubernetes Design Overview
# Kubernetes Design Documents and Proposals
Kubernetes is a system for managing containerized applications across multiple
hosts, providing basic mechanisms for deployment, maintenance, and scaling of
applications.
This directory contains Kubernetes design documents and accepted design proposals.
Kubernetes establishes robust declarative primitives for maintaining the desired
state requested by the user. We see these primitives as the main value added by
Kubernetes. Self-healing mechanisms, such as auto-restarting, re-scheduling, and
replicating containers require active controllers, not just imperative
orchestration.
For a design overview, please see [the architecture document](architecture.md).
Kubernetes is primarily targeted at applications composed of multiple
containers, such as elastic, distributed micro-services. It is also designed to
facilitate migration of non-containerized application stacks to Kubernetes. It
therefore includes abstractions for grouping containers in both loosely coupled
and tightly coupled formations, and provides ways for containers to find and
communicate with each other in relatively familiar ways.
Note that a number of these documents are historical and may be out of date or unimplemented.
Kubernetes enables users to ask a cluster to run a set of containers. The system
automatically chooses hosts to run those containers on. While Kubernetes's
scheduler is currently very simple, we expect it to grow in sophistication over
time. Scheduling is a policy-rich, topology-aware, workload-specific function
that significantly impacts availability, performance, and capacity. The
scheduler needs to take into account individual and collective resource
requirements, quality of service requirements, hardware/software/policy
constraints, affinity and anti-affinity specifications, data locality,
inter-workload interference, deadlines, and so on. Workload-specific
requirements will be exposed through the API as necessary.
Kubernetes is intended to run on a number of cloud providers, as well as on
physical hosts.
A single Kubernetes cluster is not intended to span multiple availability zones.
Instead, we recommend building a higher-level layer to replicate complete
deployments of highly available applications across multiple zones (see
[the multi-cluster doc](../admin/multi-cluster.md) and [cluster federation proposal](../proposals/federation.md)
for more details).
Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS
platform and toolkit. Therefore, architecturally, we want Kubernetes to be built
as a collection of pluggable components and layers, with the ability to use
alternative schedulers, controllers, storage systems, and distribution
mechanisms, and we're evolving its current code in that direction. Furthermore,
we want others to be able to extend Kubernetes functionality, such as with
higher-level PaaS functionality or multi-cluster layers, without modification of
core Kubernetes source. Therefore, its API isn't just (or even necessarily
mainly) targeted at end users, but at tool and extension developers. Its APIs
are intended to serve as the foundation for an open ecosystem of tools,
automation systems, and higher-level API layers. Consequently, there are no
"internal" inter-component APIs. All APIs are visible and available, including
the APIs used by the scheduler, the node controller, the replication-controller
manager, Kubelet's API, etc. There's no glass to break -- in order to handle
more complex use cases, one can just access the lower-level APIs in a fully
transparent, composable manner.
For more about the Kubernetes architecture, see [architecture](architecture.md).
TODO: Add the current status to each document and clearly indicate which are up to date.
TODO: Document the [proposal process](../devel/faster_reviews.md#1-dont-build-a-cathedral-in-one-pr).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/README.md?pixel)]()


@@ -0,0 +1,340 @@
Add new patchStrategy to clear fields not present in the patch
=============
Add the tag `patchStrategy:"replaceKeys"`. For a given type that has the tag, all keys/fields missing
from the request will be cleared when patching the object.
A field present in the request will be merged with the live config.
The proposal of Full Union is in [kubernetes/community#388](https://github.com/kubernetes/community/pull/388).
| Capability | Supported By This Proposal | Supported By Full Union |
|---|---|---|
| Auto clear missing fields on patch | X | X |
| Merge union fields on patch | X | X |
| Validate only 1 field set on type | | X |
| Validate discriminator field matches one-of field | | X |
| Support non-union patchKey | X | TBD |
| Support arbitrary combinations of set fields | X | |
## Use cases
- As a user patching a map, I want keys mutually exclusive with those that I am providing to automatically be cleared.
- As a user running kubectl apply, when I update a field in my configuration file,
I want mutually exclusive fields never specified in my configuration to be cleared.
## Examples:
- General Example: Keys in a Union are mutually exclusive. Clear unspecified union values in a Union that contains a discriminator.
- Specific Example: When patching a Deployment .spec.strategy, clear .spec.strategy.rollingUpdate
if it is not provided in the patch so that changing .spec.strategy.type will not fail.
- General Example: Keys in a Union are mutually exclusive. Clear unspecified union values in a Union
that does not contain a discriminator.
- Specific Example: When patching a Pod .spec.volume, clear all volume fields except the one specified in the patch.
## Proposed Changes
### APIs
**Scope**:
| Union Type | Supported |
|---|---|
| non-inlined non-discriminated union | Yes |
| non-inlined discriminated union | Yes |
| inlined union with [patchMergeKey](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#strategic-merge-patch) only | Yes |
| other inlined union | No |
For the inlined union with patchMergeKey, we move the tag to the parent struct instead of
adding logic to look up the metadata in the Go struct of the inlined union,
because the limitation of the latter is that the metadata associated with
the inlined APIs will not be reflected in the OpenAPI schema.
#### Tags
old tags:
1) `patchMergeKey`:
It is the key to distinguish the entries in the list of non-primitive types. It must always be
present to perform the merge on the list of non-primitive types, and will be preserved.
2) `patchStrategy`:
It indicates how to generate and merge a patch for lists. It could be `merge` or `replace`. It is optional for lists.
new tags:
`patchStrategy: replaceKeys`:
We introduce a new value `replaceKeys` for `patchStrategy`.
It indicates that all fields needing to be preserved must be present in the patch.
The fields that are present will be merged with the live object, and all missing fields will be cleared when patching.
#### Examples
1) Non-inlined non-discriminated union:
Type definition:
```go
type ContainerStatus struct {
...
// Add patchStrategy:"replaceKeys"
State ContainerState `json:"state,omitempty" protobuf:"bytes,2,opt,name=state" patchStrategy:"replaceKeys"`
...
}
```
Live object:
```yaml
state:
running:
startedAt: ...
```
Local file config:
```yaml
state:
terminated:
exitCode: 0
finishedAt: ...
```
Patch:
```yaml
state:
$patch: replaceKeys
terminated:
exitCode: 0
finishedAt: ...
```
Result after merging
```yaml
state:
terminated:
exitCode: 0
finishedAt: ...
```
2) Non-inlined discriminated union:
Type definition:
```go
type DeploymentSpec struct {
...
// Add patchStrategy:"replaceKeys"
Strategy DeploymentStrategy `json:"strategy,omitempty" protobuf:"bytes,4,opt,name=strategy" patchStrategy:"replaceKeys"`
...
}
```
Since there are no fields associated with `recreate` in `DeploymentSpec`, I will use a generic example.
Live object:
```yaml
unionName:
discriminatorName: foo
fooField:
fooSubfield: val1
```
Local file config:
```yaml
unionName:
discriminatorName: bar
barField:
barSubfield: val2
```
Patch:
```yaml
unionName:
$patch: replaceKeys
discriminatorName: bar
barField:
barSubfield: val2
```
Result after merging
```yaml
unionName:
discriminatorName: bar
barField:
barSubfield: val2
```
3) Inlined union with `patchMergeKey` only.
This case is special, because `Volumes` already has a tag `patchStrategy:"merge"`.
We change the tag to `patchStrategy:"merge|replaceKeys"`
Type definition:
```go
type PodSpec struct {
...
// Add another value "replaceKeys" to patchStrategy
Volumes []Volume `json:"volumes,omitempty" patchStrategy:"merge|replaceKeys" patchMergeKey:"name" protobuf:"bytes,1,rep,name=volumes"`
...
}
```
Live object:
```yaml
spec:
volumes:
- name: foo
emptyDir:
medium:
...
```
Local file config:
```yaml
spec:
volumes:
- name: foo
hostPath:
path: ...
```
Patch:
```yaml
spec:
volumes:
- name: foo
$patch: replaceKeys
hostPath:
path: ...
```
Result after merging
```yaml
spec:
volumes:
- name: foo
hostPath:
path: ...
```
**Impacted APIs** are listed in the [Appendix](#appendix).
### API server
No change required.
Automatically clearing missing fields of a patch relies on the Strategic Merge Patch package.
We don't validate in a generic way that only one union field is set, nor that the discriminator
field matches the one-of field; we still rely on hardcoded per-field validation.
### kubectl
No change required.
Changes to how the patch is generated rely on the Strategic Merge Patch package.
### Strategic Merge Patch
**Background**
Strategic Merge Patch is a package used by both client and server. A typical usage is that a client
calls the function to calculate the patch and the API server calls another function to merge the patch.
We need to make sure the client always sends a patch that includes all of the fields it wants to keep.
When merging, the missing fields of a patch are automatically cleared if the patch carries the directive `$patch: replaceKeys`.
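To make that client/server split concrete, here is a minimal Go sketch of the two calls, assuming the helpers currently exported from `k8s.io/apimachinery/pkg/util/strategicpatch` (the exact package path has moved between releases, and the struct below is illustrative rather than a real API type):
```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/strategicpatch"
)

// spec stands in for a real API type; the struct tags are what the patch
// package inspects to decide how each field is merged.
type spec struct {
	Containers []container `json:"containers,omitempty" patchStrategy:"merge" patchMergeKey:"name"`
}

type container struct {
	Name  string `json:"name"`
	Image string `json:"image,omitempty"`
}

func main() {
	live := []byte(`{"containers":[{"name":"app","image":"nginx:1.10"}]}`)
	local := []byte(`{"containers":[{"name":"app","image":"nginx:1.11"}]}`)

	// Client side (e.g. kubectl): compute a patch from the live object and
	// the local configuration.
	patch, err := strategicpatch.CreateTwoWayMergePatch(live, local, spec{})
	if err != nil {
		panic(err)
	}

	// Server side: merge the patch into the live object.
	merged, err := strategicpatch.StrategicMergePatch(live, patch, spec{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("patch:  %s\nmerged: %s\n", patch, merged)
}
```
With the proposed `replaceKeys` strategy, the generated patch would additionally carry the `$patch: replaceKeys` directive so that the server clears any fields missing from it.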
### Open API
Update OpenAPI schema.
## Version Skew
The changes are all backward compatible.
Old kubectl vs. new server: behaves the same as before, since there is no new directive in the patch.
New kubectl vs. old server: behaves the same as before, since the new directive will not be recognized
by the old server and will be dropped in conversion; unchanged fields will not affect the merged result.
# Alternatives Considered
The proposals below are not mutually exclusive with the proposal above, and may be added at some point in the future.
# 1. Add Discriminators in All Unions/OneOf APIs
Original issue is described in kubernetes/kubernetes#35345
## Analysis
### Behavior
If the discriminator were set, we'd require that the field corresponding to its value be set, and the API server (registry) could automatically clear the other fields.
If the discriminator were unset, behavior would be as before -- exactly one of the fields in the union/oneof would be required to be set and the operation would otherwise fail validation.
We should set discriminators by default. This means we need to update the discriminator accordingly when the corresponding union/oneof fields are set or unset.
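As a rough sketch of that defaulting logic (the type and field names below are illustrative, not an existing API), a defaulting function could fill in the discriminator based on which union member is set:
```go
// Hypothetical union used only to illustrate the defaulting described above.
type PersistentVolumeSourceSketch struct {
	Type                 *string // discriminator
	GCEPersistentDisk    *struct{ PDName string }
	AWSElasticBlockStore *struct{ VolumeID string }
}

// defaultDiscriminator sets Type to match whichever union member is set,
// so the server can later clear the other members safely.
func defaultDiscriminator(src *PersistentVolumeSourceSketch) {
	switch {
	case src.GCEPersistentDisk != nil:
		t := "gcePersistentDisk"
		src.Type = &t
	case src.AWSElasticBlockStore != nil:
		t := "awsElasticBlockStore"
		src.Type = &t
	}
}
```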
## Proposed Changes
### API
Add a discriminator field in all unions/oneof APIs. The discriminator should be optional for backward compatibility. There is an example below, the field `Type` works as a discriminator.
```go
type PersistentVolumeSource struct {
...
// Discriminator for PersistentVolumeSource; it can be "gcePersistentDisk", "awsElasticBlockStore", etc.
// +optional
Type *string `json:"type,omitempty" protobuf:"bytes,24,opt,name=type"`
}
```
### API Server
We need to add defaulting logic described in the [Behavior](#behavior) section.
### kubectl
No change required on kubectl.
## Summary
Limitation: automatically clearing fields on the server side based on the discriminator may be unsafe.
# Appendix
## List of Impacted APIs
In `pkg/api/v1/types.go`:
- [`VolumeSource`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L235):
It is inlined. Besides `VolumeSource`, its parent [Volume](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L222) has `Name`.
- [`PersistentVolumeSource`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L345):
It is inlined. Besides `PersistentVolumeSource`, its parent [PersistentVolumeSpec](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L442) has the following fields:
```go
Capacity ResourceList `json:"capacity,omitempty" protobuf:"bytes,1,rep,name=capacity,casttype=ResourceList,castkey=ResourceName"`
// +optional
AccessModes []PersistentVolumeAccessMode `json:"accessModes,omitempty" protobuf:"bytes,3,rep,name=accessModes,casttype=PersistentVolumeAccessMode"`
// +optional
ClaimRef *ObjectReference `json:"claimRef,omitempty" protobuf:"bytes,4,opt,name=claimRef"`
// +optional
PersistentVolumeReclaimPolicy PersistentVolumeReclaimPolicy `json:"persistentVolumeReclaimPolicy,omitempty" protobuf:"bytes,5,opt,name=persistentVolumeReclaimPolicy,casttype=PersistentVolumeReclaimPolicy"`
```
- [`Handler`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L1485):
It is inlined. Besides `Handler`, its parent struct [`Probe`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L1297) also has the following fields:
```go
// +optional
InitialDelaySeconds int32 `json:"initialDelaySeconds,omitempty" protobuf:"varint,2,opt,name=initialDelaySeconds"`
// +optional
TimeoutSeconds int32 `json:"timeoutSeconds,omitempty" protobuf:"varint,3,opt,name=timeoutSeconds"`
// +optional
PeriodSeconds int32 `json:"periodSeconds,omitempty" protobuf:"varint,4,opt,name=periodSeconds"`
// +optional
SuccessThreshold int32 `json:"successThreshold,omitempty" protobuf:"varint,5,opt,name=successThreshold"`
// +optional
FailureThreshold int32 `json:"failureThreshold,omitempty" protobuf:"varint,6,opt,name=failureThreshold"`
```
- [`ContainerState`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L1576):
It is NOT inlined.
- [`PodSignature`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/api/v1/types.go#L2953):
It has only one field, but the comment says "Exactly one field should be set". Maybe we will add more in the future? It is NOT inlined.
In `pkg/apis/authorization/types.go`:
- [`SubjectAccessReviewSpec`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/authorization/types.go#L108):
The comment says: `Exactly one of ResourceAttributes and NonResourceAttributes must be set.`
But there are some other non-union fields in the struct,
so this is similar to an INLINED struct.
- [`SelfSubjectAccessReviewSpec`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/authorization/types.go#L130):
It is NOT inlined.
In `pkg/apis/extensions/v1beta1/types.go`:
- [`DeploymentStrategy`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/types.go#L249):
It is NOT inlined.
- [`NetworkPolicyPeer`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/v1beta1/types.go#L1340):
It is NOT inlined.
- [`IngressRuleValue`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/v1beta1/types.go#L876):
It says "exactly one of the following must be set". But it has only one field.
It is inlined. Its parent [`IngressRule`](https://github.com/kubernetes/kubernetes/blob/v1.5.2/pkg/apis/extensions/v1beta1/types.go#L848) also has the following fields:
```go
// +optional
Host string `json:"host,omitempty" protobuf:"bytes,1,opt,name=host"`
```


@@ -80,7 +80,7 @@ There are two configurations in which it makes sense to run `kube-aggregator`.
`api.mycompany.com/v2` from another apiserver while you update clients. But
you can't serve `api.mycompany.com/v1/frobbers` and
`api.mycompany.com/v1/grobinators` from different apiservers. This restriction
allows us to limit the scope of `kube-aggregator` to a managable level.
allows us to limit the scope of `kube-aggregator` to a manageable level.
* Follow API conventions: APIs exposed by every API server should adhere to [kubernetes API
conventions](../devel/api-conventions.md).
* Support discovery API: Each API server should support the kubernetes discovery API
@@ -160,7 +160,7 @@ Since the actual server which serves client's request can be opaque to the clien
all API servers need to have homogeneous authentication and authorisation mechanisms.
All API servers will handle authn and authz for their resources themselves.
The current authentication infrastructure allows token authentication delegation to the
core `kube-apiserver` and trust of an authentication proxy, which can be fullfilled by
core `kube-apiserver` and trust of an authentication proxy, which can be fulfilled by
`kubernetes-aggregator`.
#### Server Role Bootstrapping


@@ -12,7 +12,7 @@ is no way to achieve this in Kubernetes without scripting inside of a container.
## Constraints and Assumptions
1. The volume types must remain unchanged for backward compatability
1. The volume types must remain unchanged for backward compatibility
2. There will be a new volume type for this proposed functionality, but no
other API changes
3. The new volume type should support atomic updates in the event of an input
@@ -186,15 +186,31 @@ anything preceding it as before.
### Proposed API objects
```go
type Projections struct {
type ProjectedVolumeSource struct {
Sources []VolumeProjection `json:"sources"`
DefaultMode *int32 `json:"defaultMode,omitempty"`
DefaultMode *int32 `json:"defaultMode,omitempty"`
}
type VolumeProjection struct {
Secret *SecretVolumeSource `json:"secret,omitempty"`
ConfigMap *ConfigMapVolumeSource `json:"configMap,omitempty"`
DownwardAPI *DownwardAPIVolumeSource `json:"downwardAPI,omitempty"`
Secret *SecretProjection `json:"secret,omitempty"`
ConfigMap *ConfigMapProjection `json:"configMap,omitempty"`
DownwardAPI *DownwardAPIProjection `json:"downwardAPI,omitempty"`
}
type SecretProjection struct {
LocalObjectReference
Items []KeyToPath
Optional *bool
}
type ConfigMapProjection struct {
LocalObjectReference
Items []KeyToPath
Optional *bool
}
type DownwardAPIProjection struct {
Items []DownwardAPIVolumeFile
}
```
@@ -203,14 +219,7 @@ type VolumeProjection struct {
Add to the VolumeSource struct:
```go
Projected *Projections `json:"projected,omitempty"`
// (other existing fields omitted for brevity)
```
Add to the SecretVolumeSource struct:
```go
LocalObjectReference `json:"name,omitempty"`
Projected *ProjectedVolumeSource `json:"projected,omitempty"`
// (other existing fields omitted for brevity)
```


@@ -0,0 +1,67 @@
# Exposing annotations via environment downward API
Author: Michal Rostecki \<michal@kinvolk.io\>
## Introduction
Pod annotations can be read through the Kubernetes API, but currently
there is no way to pass them to the application inside the container. This means
that annotations can be used only by core Kubernetes services and by users outside
of the Kubernetes cluster.
Of course, using the Kubernetes API from an application running inside a container
managed by Kubernetes is technically possible, but that contradicts
the principles of a microservices architecture.
The purpose of this proposal is to allow passing an annotation to the container
as an environment variable.
### Use-case
The primary use case for this proposal is StatefulSets. There is an idea to expose the
StatefulSet index to the applications running inside the pods managed by a StatefulSet.
Since a StatefulSet creates pods as API objects, passing this index as an
annotation seems to be a valid way to do this. However, to finally deliver this
information to the containerized application, the annotation has to reach the container.
That's why the downward API for annotations is needed here.
## API
The exact `fieldPath` to the annotation will have the following syntax:
```
metadata.annotations['annotationKey']
```
This means that:
- the *annotationKey* will be specified inside brackets (`[`, `]`) and single quotation
marks (`'`)
- if the *annotationKey* contains `[`, `]` or `'` characters, they will need to
be escaped (as `\[`, `\]`, `\'`); leaving these characters unescaped should result
in a validation error
Examples:
- `metadata.annotations['spec.pod.beta.kubernetes.io/statefulset-index']`
- `metadata.annotations['foo.bar/example-annotation']`
- `metadata.annotations['foo.bar/more\'complicated\]example\[with\'characters"to-escape']`
So, assuming that we would want to pass the `pod.beta.kubernetes.io/statefulset-index`
annotation as a `STATEFULSET_INDEX` variable, the environment variable definition
will look like:
```
env:
- name: STATEFULSET_INDEX
valueFrom:
fieldRef:
fieldPath: metadata.annotations['spec.pod.beta.kubernetes.io/statefulset-index']
```
## Implementation
In general, this environment downward API part will be implemented in the same
place as the other metadata - as a label conversion function.
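For illustration only (this is not the actual Kubernetes implementation), resolving the proposed `fieldPath` syntax against a pod's annotations could look roughly like the following, including the un-escaping rules described above:
```go
package main

import (
	"fmt"
	"strings"
)

// annotationKeyFromFieldPath returns the annotation key for paths of the form
// metadata.annotations['<key>'], un-escaping \[ \] and \' inside the key.
// This is a sketch; edge cases and validation errors are omitted.
func annotationKeyFromFieldPath(fieldPath string) (string, bool) {
	const prefix, suffix = "metadata.annotations['", "']"
	if !strings.HasPrefix(fieldPath, prefix) || !strings.HasSuffix(fieldPath, suffix) {
		return "", false
	}
	escaped := fieldPath[len(prefix) : len(fieldPath)-len(suffix)]
	r := strings.NewReplacer(`\[`, "[", `\]`, "]", `\'`, "'")
	return r.Replace(escaped), true
}

func main() {
	annotations := map[string]string{
		"spec.pod.beta.kubernetes.io/statefulset-index": "3",
	}
	key, ok := annotationKeyFromFieldPath("metadata.annotations['spec.pod.beta.kubernetes.io/statefulset-index']")
	if ok {
		fmt.Println("STATEFULSET_INDEX =", annotations[key]) // -> 3
	}
}
```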
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/annotations-downward-api.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@@ -1,84 +1,243 @@
# Kubernetes architecture
# Kubernetes Design and Architecture
A running Kubernetes cluster contains node agents (`kubelet`) and master
components (APIs, scheduler, etc), on top of a distributed storage solution.
This diagram shows our desired eventual state, though we're still working on a
few things, like making `kubelet` itself (all our components, really) run within
containers, and making the scheduler 100% pluggable.
## Overview
![Architecture Diagram](architecture.png?raw=true "Architecture overview")
Kubernetes is production-grade, open-source infrastructure for the deployment, scaling,
management, and composition of application containers across clusters of hosts, inspired
by [previous work at Google](https://research.google.com/pubs/pub44843.html). Kubernetes
is more than just a “container orchestrator”. It aims to eliminate the burden of orchestrating
physical/virtual compute, network, and storage infrastructure, and enable application operators
and developers to focus entirely on container-centric primitives for self-service operation.
Kubernetes also provides a stable, portable foundation (a platform) for building customized
workflows and higher-level automation.
## The Kubernetes Node
Kubernetes is primarily targeted at applications composed of multiple containers. It therefore
groups containers using *pods* and *labels* into tightly coupled and loosely coupled formations
for easy management and discovery.
When looking at the architecture of the system, we'll break it down to services
that run on the worker node and services that compose the cluster-level control
plane.
## Scope
Kubernetes is a [platform for deploying and managing containers](https://kubernetes.io/docs/whatisk8s/).
Kubernetes provides a container runtime, container
orchestration, container-centric infrastructure orchestration, self-healing mechanisms such as health checking and re-scheduling, and service discovery and load balancing.
Kubernetes aspires to be an extensible, pluggable, building-block OSS
platform and toolkit. Therefore, architecturally, we want Kubernetes to be built
as a collection of pluggable components and layers, with the ability to use
alternative schedulers, controllers, storage systems, and distribution
mechanisms, and we're evolving its current code in that direction. Furthermore,
we want others to be able to extend Kubernetes functionality, such as with
higher-level PaaS functionality or multi-cluster layers, without modification of
core Kubernetes source. Therefore, its API isn't just (or even necessarily
mainly) targeted at end users, but at tool and extension developers. Its APIs
are intended to serve as the foundation for an open ecosystem of tools,
automation systems, and higher-level API layers. Consequently, there are no
"internal" inter-component APIs. All APIs are visible and available, including
the APIs used by the scheduler, the node controller, the replication-controller
manager, Kubelet's API, etc. There's no glass to break -- in order to handle
more complex use cases, one can just access the lower-level APIs in a fully
transparent, composable manner.
## Goals
The project is committed to the following (aspirational) [design ideals](principles.md):
* _Portable_. Kubernetes runs everywhere -- public cloud, private cloud, bare metal, laptop --
with consistent behavior so that applications and tools are portable throughout the ecosystem
as well as between development and production environments.
* _General-purpose_. Kubernetes should run all major categories of workloads to enable you to run
all of your workloads on a single infrastructure, stateless and stateful, microservices and
monoliths, services and batch, greenfield and legacy.
* _Meet users partway_. Kubernetes doesn't just cater to purely greenfield cloud-native
applications, nor does it meet all users where they are. It focuses on deployment and management
of microservices and cloud-native applications, but provides some mechanisms to facilitate
migration of monolithic and legacy applications.
* _Flexible_. Kubernetes functionality can be consumed a la carte and (in most cases) Kubernetes
does not prevent you from using your own solutions in lieu of built-in functionality.
* _Extensible_. Kubernetes enables you to integrate it into your environment and to add the
additional capabilities you need, by exposing the same interfaces used by built-in
functionality.
* _Automatable_. Kubernetes aims to dramatically reduce the burden of manual operations. It
supports both declarative control by specifying users' desired intent via its API, as well as
imperative control to support higher-level orchestration and automation. The declarative
approach is key to the system's self-healing and autonomic capabilities.
* _Advance the state of the art_. While Kubernetes intends to support non-cloud-native
applications, it also aspires to advance the cloud-native and DevOps state of the art, such as
in the
[participation of applications in their own management](http://blog.kubernetes.io/2016/09/cloud-native-application-interfaces.html).
However, in doing so, we strive not to force applications to lock themselves into Kubernetes APIs,
which is, for example, why we prefer configuration over convention in the
[downward API](https://kubernetes.io/docs/user-guide/downward-api/). Additionally, Kubernetes is not bound by
the lowest common denominator of systems upon which it depends, such as container runtimes and
cloud providers. An example where we pushed the envelope of what was achievable was in its [IP
per Pod networking model](https://kubernetes.io/docs/admin/networking/#kubernetes-model).
## Architecture
A running Kubernetes cluster contains node agents (kubelet) and a cluster control plane (AKA
*master*), with cluster state backed by a distributed storage system
([etcd](https://github.com/coreos/etcd)).
### Cluster control plane (AKA *master*)
The Kubernetes [control plane](https://en.wikipedia.org/wiki/Control_plane) is split
into a set of components, which can all run on a single *master* node, or can be replicated
in order to support high-availability clusters, or can even be run on Kubernetes itself (AKA
[self-hosted](self-hosted-kubernetes.md#what-is-self-hosted)).
Kubernetes provides a REST API supporting primarily CRUD operations on (mostly) persistent resources, which
serve as the hub of its control plane. Kubernetes's API provides IaaS-like
container-centric primitives such as [Pods](https://kubernetes.io/docs/user-guide/pods/),
[Services](https://kubernetes.io/docs/user-guide/services/), and
[Ingress](https://kubernetes.io/docs/user-guide/ingress/), and also lifecycle APIs to support orchestration
(self-healing, scaling, updates, termination) of common types of workloads, such as
[ReplicaSet](https://kubernetes.io/docs/user-guide/replicasets/) (simple fungible/stateless app manager),
[Deployment](https://kubernetes.io/docs/user-guide/deployments/) (orchestrates updates of
stateless apps), [Job](https://kubernetes.io/docs/user-guide/jobs/) (batch),
[CronJob](https://kubernetes.io/docs/user-guide/cron-jobs/) (cron),
[DaemonSet](https://kubernetes.io/docs/admin/daemons/) (cluster services), and
[StatefulSet](https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/) (stateful apps).
We deliberately decoupled service naming/discovery and load balancing from application
implementation, since the latter is diverse and open-ended.
Both user clients and components containing asynchronous controllers interact with the same API resources, which serve as coordination points, common intermediate representation, and shared state. Most resources contain metadata, including [labels](https://kubernetes.io/docs/user-guide/labels/) and [annotations](https://kubernetes.io/docs/user-guide/annotations/), fully elaborated desired state (spec), including default values, and observed state (status).
Controllers work continuously to drive the actual state towards the desired state, while reporting back the currently observed state for users and for other controllers.
While the controllers are [level-based](http://gengnosis.blogspot.com/2007/01/level-triggered-and-edge-triggered.html) to maximize fault
tolerance, they typically `watch` for changes to relevant resources in order to minimize reaction
latency and redundant work. This enables decentralized and decoupled
[choreography-like](https://en.wikipedia.org/wiki/Service_choreography) coordination without a
message bus.
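As an illustration of this watch-driven, level-based pattern, here is a minimal sketch using client-go shared informers (import paths and the resync interval are assumptions for the example, not something this document prescribes):

```go
// A minimal sketch of level-based reconciliation driven by a watch, using
// client-go shared informers. Not the code of any built-in controller.
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The informer watches the API server and maintains a local cache, so the
	// controller reacts to changes rather than polling.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*v1.Pod)
			// A real controller would enqueue a key here and let a worker
			// drive actual state toward desired state.
			fmt.Printf("observed pod %s/%s\n", pod.Namespace, pod.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, podInformer.HasSynced)
	select {} // reconciliation is driven by events from here on
}
```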
#### API Server
The [API server](https://kubernetes.io/docs/admin/kube-apiserver/) serves up the
[Kubernetes API](https://kubernetes.io/docs/api/). It is intended to be a relatively simple
server, with most/all business logic implemented in separate components or in plug-ins. It mainly
processes REST operations, validates them, and updates the corresponding objects in `etcd` (and
perhaps eventually other stores). Note that, for a number of reasons, Kubernetes deliberately does
not support atomic transactions across multiple resources.
Kubernetes cannot function without this basic API machinery, which includes:
* REST semantics, watch, durability and consistency guarantees, API versioning, defaulting, and
validation
* Built-in admission-control semantics, synchronous admission-control hooks, and asynchronous
resource initialization
* API registration and discovery
Additionally, the API server acts as the gateway to the cluster. By definition, the API server
must be accessible by clients from outside the cluster, whereas the nodes, and certainly
containers, may not be. Clients authenticate the API server and also use it as a bastion and
proxy/tunnel to nodes and pods (and services).
#### Cluster state store
All persistent cluster state is stored in an instance of `etcd`. This provides a way to store
configuration data reliably. With `watch` support, coordinating components can be notified very
quickly of changes.
#### Controller-Manager Server
Most other cluster-level functions are currently performed by a separate process, called the
[Controller Manager](https://kubernetes.io/docs/admin/kube-controller-manager/). It performs
both lifecycle functions (e.g., namespace creation and lifecycle, event garbage collection,
terminated-pod garbage collection, cascading-deletion garbage collection, node garbage collection)
and API business logic (e.g., scaling of pods controlled by a
[ReplicaSet](https://kubernetes.io/docs/user-guide/replicasets/)).
This constitutes the application management and composition layer, providing self-healing, scaling, application lifecycle management, service discovery, routing, and service binding and provisioning.
These functions may eventually be split into separate components to make them more easily
extended or replaced.
#### Scheduler
Kubernetes enables users to ask a cluster to run a set of containers. The scheduler
component automatically chooses hosts to run those containers on.
The scheduler watches for unscheduled pods and binds them to nodes via the `/binding` pod
subresource API, according to the availability of the requested resources, quality of service
requirements, affinity and anti-affinity specifications, and other constraints.
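As a sketch of what that binding write looks like (using client-go; the exact `Bind` signature varies between client-go releases, and this is not the scheduler's actual code):

```go
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// bindPod illustrates the scheduler's final step: once a node is chosen, a
// Binding is POSTed to the pod's /binding subresource.
func bindPod(ctx context.Context, client kubernetes.Interface, namespace, pod, node string) error {
	binding := &v1.Binding{
		ObjectMeta: metav1.ObjectMeta{Namespace: namespace, Name: pod},
		Target:     v1.ObjectReference{Kind: "Node", Name: node},
	}
	return client.CoreV1().Pods(namespace).Bind(ctx, binding, metav1.CreateOptions{})
}
```

The same write can also be issued as a raw POST to `/api/v1/namespaces/<namespace>/pods/<pod>/binding`.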
Kubernetes supports user-provided schedulers and multiple concurrent cluster schedulers,
using the shared-state approach pioneered by
[Omega](https://research.google.com/pubs/pub41684.html). In addition to the disadvantages of
pessimistic concurrency described by the Omega paper,
[two-level scheduling models](http://mesos.berkeley.edu/mesos_tech_report.pdf) that hide information from the upper-level
schedulers need to implement all of the same features in the lower-level scheduler as required by
all upper-layer schedulers in order to ensure that their scheduling requests can be satisfied by
the available resources.
### The Kubernetes Node
The Kubernetes node has the services necessary to run application containers and
be managed from the master systems.
Each node runs a container runtime (like Docker, rkt or Hyper). The container
runtime is responsible for downloading images and running containers.
#### Kubelet
### `kubelet`
The most important and most prominent controller in Kubernetes is the Kubelet, which is the
primary implementer of the Pod and Node APIs that drive the container execution layer. Without
these APIs, Kubernetes would just be a CRUD-oriented REST application framework backed by a
key-value store (and perhaps the API machinery will eventually be spun out as an independent
project).
The `kubelet` manages [pods](../user-guide/pods.md) and their containers, their
images, their volumes, etc.
Kubernetes executes isolated application containers as its default, native mode of execution, as
opposed to processes and traditional operating-system packages. Not only are application
containers isolated from each other, but they are also isolated from the hosts on which they
execute, which is critical to decoupling management of individual applications from each other and
from management of the underlying cluster physical/virtual infrastructure.
### `kube-proxy`
Kubernetes provides [Pods](https://kubernetes.io/docs/user-guide/pods/) that can host multiple
containers and storage volumes as its fundamental execution primitive in order to facilitate
packaging a single application per container, decoupling deployment-time concerns from build-time
concerns, and migration from physical/virtual machines. The Pod primitive is key to glean the
[primary benefits](https://kubernetes.io/docs/whatisk8s/#why-containers) of deployment on modern
cloud platforms, such as Kubernetes.
Each node also runs a simple network proxy and load balancer (see the
[services FAQ](https://github.com/kubernetes/kubernetes/wiki/Services-FAQ) for
more details). This reflects `services` (see
[the services doc](../user-guide/services.md) for more details) as defined in
the Kubernetes API on each node and can do simple TCP and UDP stream forwarding
(round robin) across a set of backends.
Kubelet also currently links in the [cAdvisor](https://github.com/google/cadvisor) resource monitoring
agent.
Service endpoints are currently found via [DNS](../admin/dns.md) or through
environment variables (both
[Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and
Kubernetes `{FOO}_SERVICE_HOST` and `{FOO}_SERVICE_PORT` variables are
supported). These variables resolve to ports managed by the service proxy.
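For illustration, a process inside a container could consume these variables as in the hypothetical sketch below (the service name `redis-master` is an example, not one defined in this document):

```go
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	// For a Service named "redis-master", Kubernetes injects these variables
	// into containers started after the Service was created.
	host := os.Getenv("REDIS_MASTER_SERVICE_HOST")
	port := os.Getenv("REDIS_MASTER_SERVICE_PORT")
	fmt.Println("connecting to", net.JoinHostPort(host, port))
}
```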
#### Container runtime
## The Kubernetes Control Plane
Each node runs a container runtime, which is responsible for downloading images and running containers.
The Kubernetes control plane is split into a set of components. Currently they
all run on a single _master_ node, but that is expected to change soon in order
to support high-availability clusters. These components work together to provide
a unified view of the cluster.
Kubelet does not link in the base container runtime. Instead, we're defining a
[Container Runtime Interface](container-runtime-interface-v1.md) to control the underlying runtime and facilitate pluggability of that layer.
This decoupling is needed in order to maintain clear component boundaries, facilitate testing, and facilitate pluggability.
Runtimes supported today, either upstream or by forks, include at least docker (for Linux and Windows),
[rkt](https://kubernetes.io/docs/getting-started-guides/rkt/),
[cri-o](https://github.com/kubernetes-incubator/cri-o), and [frakti](https://github.com/kubernetes/frakti).
### `etcd`
#### Kube Proxy
All persistent master state is stored in an instance of `etcd`. This provides a
great way to store configuration data reliably. With `watch` support,
coordinating components can be notified very quickly of changes.
The [service](https://kubernetes.io/docs/user-guide/services/) abstraction provides a way to
group pods under a common access policy (e.g., load-balanced). The implementation of this creates
a virtual IP which clients can access and which is transparently proxied to the pods in a Service.
Each node runs a [kube-proxy](https://kubernetes.io/docs/admin/kube-proxy/) process which programs
`iptables` rules to trap access to service IPs and redirect them to the correct backends. This provides a highly-available load-balancing solution with low performance overhead by balancing
client traffic from a node on that same node.
### Kubernetes API Server
Service endpoints are found primarily via [DNS](https://kubernetes.io/docs/admin/dns/).
The apiserver serves up the [Kubernetes API](../api.md). It is intended to be a
CRUD-y server, with most/all business logic implemented in separate components
or in plug-ins. It mainly processes REST operations, validates them, and updates
the corresponding objects in `etcd` (and eventually other stores).
### Add-ons and other dependencies
### Scheduler
A number of components, called [*add-ons*](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons), typically run on Kubernetes
itself:
* [DNS](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns)
* [Ingress controller](https://github.com/kubernetes/ingress/tree/master/controllers)
* [Heapster](https://github.com/kubernetes/heapster/) (resource monitoring)
* [Dashboard](https://github.com/kubernetes/dashboard/) (GUI)
The scheduler binds unscheduled pods to nodes via the `/binding` API. The
scheduler is pluggable, and we expect to support multiple cluster schedulers and
even user-provided schedulers in the future.
### Federation
### Kubernetes Controller Manager Server
All other cluster-level functions are currently performed by the Controller
Manager. For instance, `Endpoints` objects are created and updated by the
endpoints controller, and nodes are discovered, managed, and monitored by the
node controller. These could eventually be split into separate components to
make them independently pluggable.
The [`replicationcontroller`](../user-guide/replication-controller.md) is a
mechanism that is layered on top of the simple [`pod`](../user-guide/pods.md)
API. We eventually plan to port it to a generic plug-in mechanism, once one is
implemented.
A single Kubernetes cluster may span multiple availability zones.
However, for the highest availability, we recommend using [cluster federation](federation.md).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/architecture.md?pixel)]()
@ -18,13 +18,13 @@ Similarly, mature organizations will be able to rely on a centrally managed DNS
With that in mind, the proposals here will devolve into simply using DNS names that are validated with system installed root certificates.
## Cluster Location information
## Cluster location information (aka ClusterInfo)
First we define a set of information that identifies a cluster and how to talk to it.
First we define a set of information that identifies a cluster and how to talk to it. We will call this ClusterInfo in this document.
While we could define a new format for communicating the set of information needed here, we'll start by using the standard [`kubeconfig`](http://kubernetes.io/docs/user-guide/kubeconfig-file/) file format.
It is expected that the `kubeconfig` file will have a single unnamed `Cluster` entry. Other information (especially authentication secrets) must be omitted.
It is expected that the `kubeconfig` file will have a single unnamed `Cluster` entry. Other information (especially authentication secrets) MUST be omitted.
### Evolving kubeconfig
@ -45,7 +45,7 @@ Additions include:
**This is to be implemented in a later phase**
Any client of the cluster will want to have this information. As the configuration of the cluster changes we need the client to keep this information up to date. It is assumed that the information here won't drift so fast that clients won't be able to find *some* way to connect.
Any client of the cluster will want to have this information. As the configuration of the cluster changes we need the client to keep this information up to date. The ClusterInfo ConfigMap (defined below) is expected to be a common place to get the latest ClusterInfo for any cluster. Clients should periodically grab this and cache it. It is assumed that the information here won't drift so fast that clients won't be able to find *some* way to connect.
In exceptional circumstances it is possible that this information may be out of date and a client would be unable to connect to a cluster. Consider the case where a user has kubectl set up and working well and then doesn't run kubectl for quite a while. It is possible that over this time (a) the set of servers will have migrated so that all endpoints are now invalid or (b) the root certificates will have rotated so that the user can no longer trust any endpoint.
@ -55,31 +55,35 @@ Now that we know *what* we want to get to the client, the question is how. We w
### Method: Out of Band
The simplest way to do this would be to simply put this object in a file and copy it around. This is more overhead for the user, but it is easy to implement and lets users rely on existing systems to distribute configuration.
The simplest way to obtain ClusterInfo would be to simply put this object in a file and copy it around. This is more overhead for the user, but it is easy to implement and lets users rely on existing systems to distribute configuration.
For the `kubeadm` flow, the command line might look like:
```
kubeadm join --cluster-info-file=my-cluster.yaml
kubeadm join --discovery-file=my-cluster.yaml
```
Note that TLS bootstrap (which establishes a way for a client to authenticate itself to the server) is a separate issue and has its own set of methods. This command line may have a TLS bootstrap token (or config file) on the command line also.
After loading the ClusterInfo from a file, the client MAY look for updated information from the server by reading the `kube-public` `cluster-info` ConfigMap defined below. However, when retrieving this ConfigMap the client MUST validate the certificate chain when talking to the API server.
**Note:** TLS bootstrap (which establishes a way for a client to authenticate itself to the server) is a separate issue and has its own set of methods. This command line may have a TLS bootstrap token (or config file) on the command line also. For this reason, even though the `--discovery-file` argument is in the form of a `kubeconfig`, it MUST NOT contain client credentials as defined above.
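For illustration, a client-side validation of a discovery file along these lines might look like the sketch below (using client-go's `clientcmd` loader; this is not kubeadm's actual implementation):

```go
// Sketch: load a --discovery-file, require a single cluster entry, and
// reject it if it carries client credentials.
package main

import (
	"fmt"
	"os"

	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.LoadFromFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "not a valid kubeconfig:", err)
		os.Exit(1)
	}
	if len(cfg.Clusters) != 1 {
		fmt.Fprintln(os.Stderr, "expected exactly one cluster entry")
		os.Exit(1)
	}
	if len(cfg.AuthInfos) > 0 {
		fmt.Fprintln(os.Stderr, "discovery file MUST NOT contain client credentials")
		os.Exit(1)
	}
	for name, c := range cfg.Clusters {
		fmt.Printf("cluster %q at %s (CA bytes: %d)\n", name, c.Server, len(c.CertificateAuthorityData))
	}
}
```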
### Method: HTTPS Endpoint
If the ClusterInfo information is hosted in a trusted place via HTTPS you can just request it that way. This will use the root certificates that are installed on the system. It may or may not be appropriate based on the user's constraints.
If the ClusterInfo information is hosted in a trusted place via HTTPS you can just request it that way. This will use the root certificates that are installed on the system. It may or may not be appropriate based on the user's constraints. This method MUST use HTTPS. Also, even though the payload for this URL is the `kubeconfig` format, it MUST NOT contain client credentials.
```
kubeadm join --cluster-info-url="https://example/mycluster.yaml"
kubeadm join --discovery-url="https://example/mycluster.yaml"
```
This is really a shorthand for someone doing something like (assuming we support stdin with `-`):
```
curl https://example.com/mycluster.json | kubeadm join --cluster-info-file=-
curl https://example.com/mycluster.json | kubeadm join --discovery-file=-
```
If the user requires some auth to the HTTPS server (to keep the ClusterInfo object private) that can be done in the curl command equivalent. Or we could eventually add it to `kubeadm` directly.
After loading the ClusterInfo from a URL, the client MAY look for updated information from the server by reading the `kube-public` `cluster-info` ConfigMap defined below. However, when retrieving this ConfigMap the client MUST validate the certificate chain when talking to the API server.
**Note:** support for loading from stdin for `--discovery-file` may not be implemented immediately.
### Method: Bootstrap Token
@ -100,7 +104,7 @@ The user experience for joining a cluster would be something like:
kubeadm join --token=ae23dc.faddc87f5a5ab458 <address>
```
**Note:** This is logically a different use of the token from TLS bootstrap. We harmonize these usages and allow the same token to play double duty.
**Note:** This is logically a different use of the token used for authentication for TLS bootstrap. We harmonize these usages and allow the same token to play double duty.
#### Implementation Flow
@ -130,6 +134,8 @@ The first part of the token is the `token-id`. The second part is the `token-se
This new type of token is different from the CSV token authenticator that is currently part of Kubernetes. The CSV token authenticator requires an update on disk and a restart of the API server to update/delete tokens. As we prove out this token mechanism we may wish to deprecate and eventually remove that mechanism.
The `token-id` must be 6 characters and the `token-secret` must be 16 characters. They must be lower case ASCII letters and numbers. Specifically it must match the regular expression: `[a-z0-9]{6}\.[a-z0-9]{16}`. There is no strong reasoning behind this beyond the history of how this has been implemented in alpha versions.
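For example, a format check against the stated regular expression could look like this (illustrative only):

```go
package main

import (
	"fmt"
	"regexp"
)

// bootstrapTokenRe encodes the format above: a 6-character token-id and a
// 16-character token-secret, lower-case ASCII letters and numbers only.
var bootstrapTokenRe = regexp.MustCompile(`^[a-z0-9]{6}\.[a-z0-9]{16}$`)

func main() {
	fmt.Println(bootstrapTokenRe.MatchString("ae23dc.faddc87f5a5ab458")) // true
	fmt.Println(bootstrapTokenRe.MatchString("short.token"))             // false
}
```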
#### NEW: Bootstrap Token Secrets
Bootstrap tokens are stored and managed via Kubernetes secrets in the `kube-system` namespace. They have type `bootstrap.kubernetes.io/token`.
@ -138,11 +144,13 @@ The following keys are on the secret data:
* **token-id**. As defined above.
* **token-secret**. As defined above.
* **expiration**. After this time the token should be automatically deleted. This is encoded as an absolute UTC time using RFC3339.
* **usage-bootstrap-signing**. Set to `true` to indicate this token should be used for signing bootstrap configs. If omitted or some other string, it defaults to `false`.
* **usage-bootstrap-signing**. Set to `true` to indicate this token should be used for signing bootstrap configs. If this is missing from the token secret or set to any other value, the usage is not allowed.
* **usage-bootstrap-authentication**. Set to `true` to indicate that this token should be used for authenticating to the API server. If this is missing from the token secret or set to any other value, the usage is not allowed. The bootstrap token authenticator will use this token to auth as a user that is `system:bootstrap:<token-id>` in the group `system:bootstrappers`.
* **description**. An optional free-form description field for denoting the purpose of the token. If users have especially complex token management needs, they are encouraged to use labels and annotations instead of packing machine-readable data into this field.
These secrets can be named anything but it is suggested that they be named `bootstrap-token-<token-id>`.
**Future**: At some point in the future we may add the ability to specify a set of groups that this token is part of during authentication. This will allow users to segment off which tokens are allowed to bootstrap which nodes. However, we will restrict these groups under `system:bootstrappers:*` to discourage usage outside of bootstrapping.
**QUESTION:** Should we also spec out now how we can use this token for TLS bootstrap?
These secrets MUST be named `bootstrap-token-<token-id>`. If a token doesn't adhere to this naming scheme it MUST be ignored. The secret MUST also be ignored if the `token-id` key in the secret doesn't match the name of the secret.
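Put together, a bootstrap token secret built in Go might look like the sketch below (the token value, expiration, and description are made up for illustration):

```go
package bootstraptoken

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleBootstrapTokenSecret returns a secret following the layout above.
// The values are examples, not real credentials.
func exampleBootstrapTokenSecret() *v1.Secret {
	return &v1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "bootstrap-token-ae23dc", // bootstrap-token-<token-id>
			Namespace: "kube-system",
		},
		Type: "bootstrap.kubernetes.io/token",
		StringData: map[string]string{
			"token-id":                       "ae23dc",
			"token-secret":                   "faddc87f5a5ab458",
			"expiration":                     "2017-06-01T00:00:00Z",
			"usage-bootstrap-signing":        "true",
			"usage-bootstrap-authentication": "true",
			"description":                    "Example token for joining nodes.",
		},
	}
}
```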
#### Quick Primer on JWS
@ -167,11 +175,60 @@ A new well known ConfigMap will be created in the `kube-public` namespace called
Users configuring the cluster (and eventually the cluster itself) will update the `kubeconfig` key here with the limited `kubeconfig` above.
A new controller is introduced that will watch for both new/modified bootstrap tokens and changes to the `cluster-info` ConfigMap. As things change it will generate new JWS signatures. These will be saved under ConfigMap keys of the pattern `jws-kubeconfig-<token-id>`.
A new controller (`bootstrapsigner`) is introduced that will watch for both new/modified bootstrap tokens and changes to the `cluster-info` ConfigMap. As things change it will generate new JWS signatures. These will be saved under ConfigMap keys of the pattern `jws-kubeconfig-<token-id>`.
In addition, `jws-kubeconfig-<token-id>-hash` will be set to the MD5 hash of the contents of the `kubeconfig` data. This will be in the form of `md5:d3b07384d113edec49eaa6238ad5ff00`. This is done so that the controller can detect which signatures need to be updated without reading all of the tokens.
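A sketch of deriving those keys (illustrative; not the controller's actual code):

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// signatureKeys derives, for one token, the ConfigMap key that holds the JWS
// signature and the companion hash key/value described above.
func signatureKeys(tokenID string, kubeconfig []byte) (sigKey, hashKey, hashValue string) {
	sigKey = "jws-kubeconfig-" + tokenID
	hashKey = sigKey + "-hash"
	hashValue = fmt.Sprintf("md5:%x", md5.Sum(kubeconfig))
	return
}

func main() {
	fmt.Println(signatureKeys("ae23dc", []byte("apiVersion: v1\nkind: Config\n")))
}
```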
Another controller (`tokencleaner`) is introduced that deletes tokens that are past their expiration time.
This controller will also delete tokens that are past their expiration time.
Logically these controllers could run as a component in the control plane. But, for the sake of efficiency, they are bundled as part of the Kubernetes controller-manager.
## `kubeadm` UX
We extend kubeadm with a set of flags and helper commands for managing and using these tokens.
### `kubeadm init` flags
* `--token` If set, this injects the bootstrap token to use when initializing the cluster. If this is unset, then a random token is created and shown to the user. If set explicitly to the empty string then no token is generated or created. This token is used for both discovery and TLS bootstrap by having `usage-bootstrap-signing` and `usage-bootstrap-authentication` set on the token secret.
* `--token-ttl` If set, this sets the TTL for the lifetime of this token. Defaults to 0 which means "forever"
### `kubeadm join` flags
* `--token` This sets the token for both discovery and bootstrap auth.
* `--discovery-url` If set this will grab the cluster-info data (a kubeconfig) from a URL. Due to the sensitive nature of this data, we will only support https URLs. This also supports `username:password@host` syntax for doing HTTP auth.
* `--discovery-file` If set, this will load the cluster-info from a file.
* `--discovery-token` If set (or set via `--token`), then we will use the token scheme described above.
* `--tls-bootstrap-token` (not officially part of this spec) This sets the token used to temporarily authenticate to the API server in order to submit a CSR for signing. If `--insecure-experimental-approve-all-kubelet-csrs-for-group` is set to `system:bootstrappers` then these CSRs will be approved automatically for a hands off joining flow.
Only one of `--discovery-url`, `--discovery-file` or `--discovery-token` can be set. If more than one is set then an error is surfaced and `kubeadm join` exits. Setting `--token` counts as setting `--discovery-token`.
### `kubeadm token` commands
`kubeadm` provides a set of utilities for manipulating token secrets in a running server.
* `kubeadm token create [token]` Creates a token server side. With no options this'll create a token that is used for discovery and TLS bootstrap.
* `[token]` The actual token value (in `id.secret` form) to write in. If unset, a random value is generated.
* `--usages` A list of usages. Defaults to `signing,authentication`.
* If the `signing` usage is specified, the token will be used (by the BootstrapSigner controller in the KCM) to JWS-sign the ConfigMap and can then be used for discovery.
* If the `authentication` usage is specified, the token can be used to authenticate for TLS bootstrap.
* `--ttl` The TTL for this token. This sets the expiration of the token as a duration from the current time. This is converted into an absolute UTC time as it is written into the token secret.
* `--description` Sets the free form description field for the token.
* `kubeadm token delete <token-id>|<token-id>.<token-secret>`
* Users can either just specify the id or the full token. This will delete the token if it exists.
* `kubeadm token list`
* List tokens in a table form listing out the `token-id.token-secret`, the TTL, the absolute expiration time, the usages, and the description.
* **Question** Support a `--json` or `-o json` way to make this info programmatic? We don't want to recreate `kubectl` here and these aren't plain API objects so we can't reuse that plumbing easily.
* `kubeadm token generate` This currently exists but is documented here for completeness. This purely client-side method just generates a random token in the correct form.
## Implementation Details
Our documentation (and output from `kubeadm`) should stress to users that when the token is configured for authentication and used for TLS bootstrap (using `--insecure-experimental-approve-all-kubelet-csrs-for-group`) it is essentially a root password on the cluster and should be protected as such. Users should set a TTL to limit this risk. Or, after the cluster is up and running, users should delete the token using `kubeadm token delete`.
After some back and forth, we decided to keep the separator between the token ID and secret as a `.`. During the 1.6 cycle, at one point `:` was implemented but then reverted.
See https://github.com/kubernetes/client-go/issues/114 for details on creating a shared package with common constants for this scheme.
This proposal assumes RBAC to lock things down in a couple of ways. First, it will open up the `cluster-info` ConfigMap in `kube-public` so that it is readable by unauthenticated users. Next, it will make it so that the identities in the `system:bootstrappers` group can only be used with the certs API to submit CSRs. After a TLS certificate is created, that identity should be used instead of the bootstrap token.
The binding of `system:bootstrappers` to the ability to submit certs is not part of the default RBAC configuration. Tools like `kubeadm` will have to explicitly create this binding.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/super-simple-discovery-api.md?pixel)]()
@ -0,0 +1,168 @@
## Refactor Cloud Provider out of Kubernetes Core
As Kubernetes has evolved, it has become difficult for the different cloud providers (currently 7) to make changes and iterate quickly. Moreover, the cloud providers are constrained by the Kubernetes build/release life-cycle. This proposal aims to move towards a Kubernetes code base where cloud provider-specific code lives outside the core repository, in "official" repositories maintained by the cloud providers themselves.
### 1. Current use of Cloud Provider
The following components have cloudprovider dependencies
1. kube-controller-manager
2. kubelet
3. kube-apiserver
#### Cloud Provider in Kube-Controller-Manager
The kube-controller-manager has many controller loops
- nodeController
- volumeController
- routeController
- serviceController
- replicationController
- endpointController
- resourceQuotaController
- namespaceController
- deploymentController
- etc..
Among these controller loops, the following are cloud provider dependent.
- nodeController
- volumeController
- routeController
- serviceController
The nodeController uses the cloudprovider to check if a node has been deleted from the cloud. If the cloud provider reports a node as deleted, then this controller immediately deletes the node from kubernetes. This check removes the need to wait for a specific amount of time to conclude that an inactive node is actually dead.
The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.
The routeController configures routes for hosts in the cloud provider.
The serviceController maintains a list of currently active nodes, and is responsible for creating and deleting LoadBalancers in the underlying cloud.
#### Cloud Provider in Kubelet
Moving on to the kubelet, the following cloud provider dependencies exist in the kubelet.
- Find the cloud nodename of the host that kubelet is running on, for the following reasons:
1. To obtain the config map for the kubelet, if one already exists
2. To uniquely identify current node using nodeInformer
3. To instantiate a reference to the current node object
- Find the InstanceID, ProviderID, ExternalID, and Zone info of the node object while initializing it
- Periodically poll the cloud provider to figure out if the node has any new IP addresses associated with it
- Set a condition that makes the node unschedulable until cloud routes are configured
- Allow the cloud provider to post-process DNS settings
#### Cloud Provider in Kube-apiserver
Finally, in the kube-apiserver, the cloud provider is used for transferring SSH keys to all of the nodes, and within an admission controller for setting labels on persistent volumes.
### 2. Strategy for refactoring Kube-Controller-Manager
In order to create a 100% cloud independent controller manager, the controller-manager will be split into multiple binaries.
1. Cloud dependent controller-manager binaries
2. Cloud independent controller-manager binaries - This is the existing `kube-controller-manager` that is being shipped with kubernetes releases.
The cloud-dependent binaries will run, as a Kubernetes system service, those loops that rely on the cloud provider. The rest of the controllers will be run in the cloud-independent controller manager.
The decision to run entire controller loops, rather than only the small parts that rely on the cloud provider, was made because it keeps the implementation simple. Otherwise, the shared data structures and utility functions would have to be disentangled and carefully separated to avoid any concurrency issues. Among other things, this approach prevents code duplication and improves development velocity.
Note that the controller loop implementation will continue to reside in the core repository. It takes in cloudprovider.Interface as an input in its constructor. A vendor-maintained cloud-controller-manager binary can link these controllers in, since they serve as a reference form of the controller implementation.
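A simplified sketch of that constructor pattern (the interface and controller here are illustrative stand-ins, not the real `cloudprovider` package):

```go
package sketch

// CloudProvider stands in for the real cloudprovider.Interface; only the
// shape of the dependency matters for this illustration.
type CloudProvider interface {
	ProviderName() string
	// ... instances, zones, routes, load balancers, etc.
}

// RouteController is one of the cloud-dependent loops. It takes the cloud
// provider in its constructor, so a vendor-maintained cloud-controller-manager
// binary can link it in.
type RouteController struct {
	cloud CloudProvider
}

func NewRouteController(cloud CloudProvider) *RouteController {
	return &RouteController{cloud: cloud}
}

// Run reconciles cloud routes against Node objects until stopped.
func (rc *RouteController) Run(stopCh <-chan struct{}) {
	<-stopCh
}
```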
There are four controllers that rely on cloud provider specific code. These are the node controller, service controller, route controller, and attach/detach controller. Copies of each of these controllers have been bundled together into one binary. The cloud-dependent binary registers itself as a controller, and runs the cloud-specific controller loops with the user-agent named "external-controller-manager".
RouteController and serviceController are entirely cloud specific. Therefore, it is really simple to move these two controller loops out of the cloud-independent binary and into the cloud dependent binary.
NodeController does a lot more than just talk to the cloud. It does the following operations -
1. CIDR management
2. Monitor Node Status
3. Node Pod Eviction
While monitoring node status, if the status reported by kubelet is either 'ConditionUnknown' or 'ConditionFalse', then the controller checks if the node has been deleted from the cloud provider. If it has already been deleted from the cloud provider, then it deletes the node object without waiting for the `monitorGracePeriod` amount of time. This is the only operation that needs to be moved into the cloud dependent controller manager.
Finally, the attachDetachController is tricky and not simple to disentangle from the controller-manager; therefore, it will be addressed with Flex Volumes (discussed in a separate section below).
### 3. Strategy for refactoring Kubelet
The majority of the calls by the kubelet to the cloud are made during the initialization of the Node object. The other uses are for configuring Routes (in the case of GCE), scrubbing DNS, and periodically polling for IP addresses.
All of the above steps, except the Node initialization step can be moved into a controller. Specifically, IP address polling, and configuration of Routes can be moved into the cloud dependent controller manager.
Scrubbing DNS, after discussion with @thockin, was found to be redundant, so it can be disregarded; it is being removed.
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to pods being scheduled in incompatible zones, and other subtle errors. Therefore, an approach is needed where the kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.
This approach requires kubelet to be started with known taints. This will make the node unschedulable until these taints are removed. The external cloud controller manager will asynchronously update the node objects and remove the taints.
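For illustration, the taint could look roughly like the following (the exact taint key is an assumption made for this example, not something the proposal defines):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	// The kubelet registers its Node with a taint like this; the external
	// cloud controller manager removes it once initialization is complete.
	taint := v1.Taint{
		Key:    "node.cloudprovider.kubernetes.io/uninitialized", // assumed key
		Value:  "true",
		Effect: v1.TaintEffectNoSchedule,
	}
	fmt.Printf("%s=%s:%s\n", taint.Key, taint.Value, taint.Effect)
}
```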
### 4. Strategy for refactoring Kube-ApiServer
Kube-apiserver uses the cloud provider for two purposes
1. Distribute SSH Keys - This can be moved to the cloud dependent controller manager
2. Admission Controller for PV - This can be refactored using the taints approach used in Kubelet
### 5. Strategy for refactoring Volumes
Volumes need cloud providers, but they only need SPECIFIC cloud providers. The majority of volume management logic resides in the controller manager. These controller loops need to be moved into the cloud-controller manager. The cloud controller manager also needs a mechanism to read parameters for initialization from cloud config. This can be done via config maps.
There is an entirely different approach to refactoring volumes - Flex Volumes. There is an undergoing effort to move all of the volume logic from the controller-manager into plugins called Flex Volumes. In the Flex volumes world, all of the vendor specific code will be packaged in a separate binary as a plugin. After discussing with @thockin, this was decidedly the best approach to remove all cloud provider dependency for volumes out of kubernetes core.
### 6. Deployment, Upgrades and Downgrades
This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running.
##### 6.1 Upgrading kubelet and proxy
The kubelet and proxy run on every node in the kubernetes cluster. Based on your setup (systemd/other), you can follow the normal upgrade steps for them. This change does not affect the kubelet and proxy upgrade steps for your setup.
##### 6.2 Upgrading plugins
Plugins such as cni, flex volumes can be upgraded just as you normally upgrade them. This change does not affect the plugin upgrade steps for your setup.
##### 6.3 Upgrading Kubernetes core services
The master node components (kube-controller-manager, kube-scheduler, kube-apiserver, etc.) can be upgraded just as you normally upgrade them. This change does not affect the upgrade steps for these components.
##### 6.4 Applying the cloud-controller-manager
This is the only step that is different in the upgrade process. In order to complete the upgrade process, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You need to apply this change using
```
kubectl apply -f cloud-controller-manager.yml
```
This will start the cloud-specific controller manager in your Kubernetes setup.
The downgrade steps are also the same as before for all the components except the cloud-controller-manager. In case of the cloud-controller-manager, the deployment should be deleted using
```
kubectl delete -f cloud-controller-manager.yml
```
### 7. Roadmap
##### 7.1 Transition plan
Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and address any issues that they uncover, that we might have missed. It also doubles as a reference implementation to the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager. At this stage, the cloud-controller-manager will still be in "beta" stage and optional.
Release 1.7: In this release, all of the supported turnups will be converted to use the cloud controller by default. At this point users will still be allowed to opt out. Users will be expected to run the monolithic cloud controller binary. The cloud controller manager will still continue to use the existing library, but code will be factored out to reduce literal duplication between the controller-manager and the cloud-controller-manager. A deprecation announcement will be made to inform users to switch to the cloud-controller-manager.
Release 1.8: The main change aimed for this release is to break up the various cloud providers into individual binaries. Users will still be allowed to opt-out. There will be a second warning to inform users about the deprecation of the `--cloud-provider` option in the controller-manager.
Release 1.9: All of the legacy cloud providers will be completely removed in this version
##### 7.2 Code/Library Evolution
* Break controller-manager into 2 binaries. One binary will be the existing controller-manager, and the other will only run the cloud specific loops with no other changes. The new cloud-controller-manager will still load all the cloudprovider libraries, and therefore will allow the users to choose which cloud-provider to use.
* Move the cloud specific parts of kubelet out using the external admission controller pattern mentioned in the previous sections above.
* The cloud controller will then be made into a library. It will take the cloudprovider.Interface as an argument to its constructor. Individual cloudprovider binaries will be created using this library.
* Cloud specific operations will be moved out of kube-apiserver using the external admission controller pattern mentioned above.
* All cloud specific volume controller loops (attach, detach, provision operation controllers) will be switched to using flex volumes. Flex volumes do not need in-tree cloud specific calls.
* As the final step, all of the cloud provider specific code will be moved out of tree.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/cloud-provider-refactoring.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
@ -0,0 +1,85 @@
# Cloud Provider (specifically GCE and AWS) metrics for Storage API calls
## Goal
Kubernetes should provide metrics such as counts and latency percentiles
for the cloud provider APIs it uses to provision persistent volumes.
In an ideal world we would want these metrics for all cloud providers
and for all API calls Kubernetes makes, but to limit the scope of this feature
we will implement metrics for:
* GCE
* AWS
We will also implement metrics only for storage API calls for now. This feature
does introduce hooks into Kubernetes code which can be used to add additional metrics,
but we only focus on storage API calls here.
## Motivation
* Cluster admins should be able to monitor Cloud API usage of Kubernetes. It will help
them detect problems in certain scenarios which can blow up the API quota of Cloud
provider.
* Cluster admins should also be able to monitor the health and latency of the Cloud APIs
on which kubernetes depends.
## Implementation
### Metric format and collection
Metrics emitted from the cloud provider will fall under the category of service metrics
as defined in [Kubernetes Monitoring Architecture](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/monitoring_architecture.md).
The metrics will be emitted using [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and available for collection
from the `/metrics` HTTP endpoint of the kubelet, controller manager, etc. All Kubernetes core components already emit
metrics on the `/metrics` HTTP endpoint. This proposal merely extends the available metrics to include Cloud provider metrics as well.
Any collector which can parse Prometheus metric format should be able to collect
metrics from these endpoints.
A more detailed description of the monitoring pipeline can be found in the [Monitoring architecture](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/monitoring_architecture.md#monitoring-pipeline) document.
#### Metric Types
Since we are interested in count (or rate) and latency percentile metrics for the API calls Kubernetes makes to
the external Cloud Provider, we will use the [Histogram](https://prometheus.io/docs/practices/histograms/) type for
emitting these metrics.
We will be using the `HistogramVec` type so that we can attach dimensions at runtime. Whenever available,
`namespace` will be reported as a dimension with the metric.
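As a sketch of that instrumentation (metric and label names here are illustrative, not the final metric set defined below):

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// cloudAPILatency is a HistogramVec so that dimensions such as the request
// name and namespace can be attached at observation time.
var cloudAPILatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "cloudprovider_storage_api_duration_seconds", // illustrative name
		Help: "Latency of cloud provider storage API calls.",
	},
	[]string{"request", "namespace"},
)

func init() {
	prometheus.MustRegister(cloudAPILatency)
}

// Observe records one API call's latency, e.g. Observe("aws_attach_volume", ns, start).
func Observe(request, namespace string, start time.Time) {
	cloudAPILatency.WithLabelValues(request, namespace).Observe(time.Since(start).Seconds())
}
```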
### GCE Implementation
For GCE we simply use `gensupport.RegisterHook()` to register a function which will be called
when a request is made and a response returns.
To begin with, we will start emitting the following metrics for GCE. Because these metrics are of type
`Summary`, both count and latency will be calculated automatically.
1. gce_instance_list
2. gce_disk_insert
3. gce_disk_delete
4. gce_attach_disk
5. gce_detach_disk
6. gce_list_disk
A POC implementation can be found here - https://github.com/kubernetes/kubernetes/pull/40338/files
### AWS Implementation
For AWS, we will currently use the wrapper type `awsSdkEC2` to intercept all storage API calls and
emit metric data points. The reason we are not using the approach used for `aws/log_handler` is that the AWS SDK doesn't use Contexts, and hence we can't pass custom information such as the API call name or namespace to record with metrics.
To begin with, we will start emitting the following metrics for AWS:
1. aws_attach_volume
2. aws_create_tags
3. aws_create_volume
4. aws_delete_volume
5. aws_describe_instance
6. aws_describe_volume
7. aws_detach_volume
@ -1,101 +1,420 @@
# ControllerRef proposal
Author: gmarek@
Last edit: 2016-05-11
Status: raw
* Authors: gmarek, enisoc
* Last edit: [2017-02-06](#history)
* Status: partially implemented
Approvers:
- [ ] briangrant
- [ ] dbsmith
* [ ] briangrant
* [ ] dbsmith
**Table of Contents**
- [Goal of ControllerReference](#goal-of-setreference)
- [Non goals](#non-goals)
- [API and semantic changes](#api-and-semantic-changes)
- [Upgrade/downgrade procedure](#upgradedowngrade-procedure)
- [Orphaning/adoption](#orphaningadoption)
- [Implementation plan (sketch)](#implementation-plan-sketch)
- [Considered alternatives](#considered-alternatives)
* [Goals](#goals)
* [Non-goals](#non-goals)
* [API](#api)
* [Behavior](#behavior)
* [Upgrading](#upgrading)
* [Implementation](#implementation)
* [Alternatives](#alternatives)
* [History](#history)
# Goal of ControllerReference
# Goals
Main goal of `ControllerReference` effort is to solve a problem of overlapping controllers that fight over some resources (e.g. `ReplicaSets` fighting with `ReplicationControllers` over `Pods`), which cause serious [problems](https://github.com/kubernetes/kubernetes/issues/24433) such as exploding memory of Controller Manager.
* The main goal of ControllerRef (controller reference) is to solve the problem
of controllers that fight over controlled objects due to overlapping selectors
(e.g. a ReplicaSet fighting with a ReplicationController over Pods because
both controllers have label selectors that match those Pods).
Fighting controllers can [destabilize the apiserver](https://github.com/kubernetes/kubernetes/issues/24433),
[thrash objects back-and-forth](https://github.com/kubernetes/kubernetes/issues/24152),
or [cause controller operations to hang](https://github.com/kubernetes/kubernetes/issues/8598).
We don't want to have (just) an in-memory solution, as we don't want a Controller Manager crash to cause massive changes in object ownership in the system. I.e. we need to persist the information about the "owning controller".
We don't want to have just an in-memory solution because we don't want a
Controller Manager crash to cause a massive reshuffling of controlled objects.
We also want to expose the mapping so that controllers can be in multiple
processes (e.g. for HA of kube-controller-manager) and separate binaries
(e.g. for controllers that are API extensions).
Therefore, we will persist the mapping from each object to its controller in
the API object itself.
Secondary goal of this effort is to improve performance of various controllers and schedulers, by removing the need for expensive lookup for all matching "controllers".
* A secondary goal of ControllerRef is to provide back-links from a given object
to the controller that manages it, which can be used for:
* Efficient object->controller lookup, without having to list all controllers.
* Generic object grouping (e.g. in a UI), without having to know about all
third-party controller types in advance.
* Replacing certain uses of the `kubernetes.io/created-by` annotation,
and potentially enabling eventual deprecation of that annotation.
However, deprecation is not being proposed at this time, so any uses that
remain will be unaffected.
# Non goals
# Non-goals
Cascading deletion is not a goal of this effort. Cascading deletion will use `ownerReferences`, which is a [separate effort](garbage-collection.md).
* Overlapping selectors will continue to be considered user error.
`ControllerRef` will extend `OwnerReference` and reuse machinery written for it (GarbageCollector, adoption/orphaning logic).
ControllerRef will prevent this user error from destabilizing the cluster or
causing endless back-and-forth fighting between controllers, but it will not
make it completely safe to create controllers with overlapping selectors.
# API and semantic changes
In particular, this proposal does not address cases such as Deployment or
StatefulSet, in which "families" of orphans may exist that ought to be adopted
as indivisible units.
Since multiple controllers may race to adopt orphans, the user must ensure
selectors do not overlap to avoid breaking up families.
Breaking up families of orphans could result in corruption or loss of
Deployment rollout state and history, and possibly also corruption or loss of
StatefulSet application data.
There will be a new API field in the `OwnerReference` in which we will store an information if given owner is a managing controller:
* ControllerRef is not intended to replace [selector generation](selector-generation.md),
used by some controllers like Job to ensure all selectors are unique
and prevent overlapping selectors from occurring in the first place.
```
OwnerReference {
Controller bool
However, ControllerRef will still provide extra protection and consistent
cross-controller semantics for controllers that already use selector
generation. For example, selector generation can be manually overridden,
which leaves open the possibility of overlapping selectors due to user error.
* This proposal does not change how cascading deletion works.
Although ControllerRef will extend OwnerReference and rely on its machinery,
the [Garbage Collector](garbage-collection.md) will continue to implement
cascading deletion as before.
That is, the GC will look at all OwnerReferences without caring whether a
given OwnerReference happens to be a ControllerRef or not.
# API
The `Controller` API field in OwnerReference marks whether a given owner is a
managing controller:
```go
type OwnerReference struct {
// If true, this reference points to the managing controller.
// +optional
Controller *bool
}
```
From now on by `ControllerRef` we mean an `OwnerReference` with `Controller=true`.
A ControllerRef is thus defined as an OwnerReference with `Controller=true`.
Each object may have at most one ControllerRef in its list of OwnerReferences.
The validator for OwnerReferences lists will fail any update that would violate
this invariant.
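For illustration, finding an object's ControllerRef under this invariant is a simple scan (a sketch; client libraries provide similar helpers):

```go
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// controllerRefOf returns the single OwnerReference with Controller=true,
// or nil if the object is an orphan.
func controllerRefOf(obj metav1.Object) *metav1.OwnerReference {
	refs := obj.GetOwnerReferences()
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}
```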
Most controllers (all that manage collections of things defined by label selector) will have slightly changed semantics: currently a controller owns an object if its selector matches the object's labels and if it doesn't notice an older controller of the same kind that also matches the object's labels, but after introduction of `ControllerReference` a controller will own an object iff the selector matches the labels and the `OwnerReference` with `Controller=true` points to it.
# Behavior
If the owner's selector or the owned object's labels change, the owning controller will be responsible for orphaning objects (clearing the `Controller` field in the `OwnerReference` and/or deleting the `OwnerReference` altogether), after which an adoption procedure (setting the `Controller` field in one of the `OwnerReferences` and/or adding new `OwnerReferences`) might occur, if another controller has a matching selector.
This section summarizes the intended behavior for existing controllers.
It can also serve as a guide for respecting ControllerRef when writing new
controllers.
For debugging purposes we want to add an `adoptionTime` annotation prefixed with `kubernetes.io/` which will keep the time of last controller ownership transfer.
## The Three Laws of Controllers
# Upgrade/downgrade procedure
All controllers that manage collections of objects should obey the following
rules.
Because `ControllerRef` will be a part of `OwnerReference` effort it will have the same upgrade/downgrade procedures.
1. **Take ownership**
# Orphaning/adoption
A controller should claim *ownership* of any objects it creates by adding a
ControllerRef, and may also claim ownership of an object it didn't create,
as long as the object has no existing ControllerRef (i.e. it is an *orphan*).
Because `ControllerRef` will be a part of `OwnerReference` effort it will have the same orphaning/adoption procedures.
1. **Don't interfere**
Controllers will orphan objects they own in two cases:
* Change of label/selector causing selector to stop matching labels (executed by the controller)
* Deletion of a controller with `Orphaning=true` (executed by the GarbageCollector)
A controller should not take any action (e.g. edit/scale/delete) on an object
it does not own, except to [*adopt*](#adoption) the object if allowed by the
First Law.
We will need a secondary orphaning mechanism in case of unclean controller deletion:
* The GarbageCollector will remove, from objects, any `ControllerRef` that no longer points to an existing controller
1. **Don't share**
Controller will adopt (set `Controller` field in the `OwnerReference` that points to it) an object whose labels match its selector iff:
* there are no `OwnerReferences` with `Controller` set to true in `OwnerReferences` array
* `DeletionTimestamp` is not set
and
* Controller is the first controller that will manage to adopt the Pod from all Controllers that have matching label selector and don't have `DeletionTimestamp` set.
A controller should not count an object it does not own toward satisfying its
desired state (e.g. a certain number of replicas), although it may include
the object in plans to achieve its desired state (e.g. through adoption)
as long as such plans do not conflict with the First or Second Laws.
By design there are possible races during adoption if multiple controllers can own a given object.
## Adoption
To prevent re-adoption of an object during deletion the `DeletionTimestamp` will be set when deletion is starting. When a controller has a non-nil `DeletionTimestamp` it won't take any actions except updating its `Status` (in particular it won't adopt any objects).
If a controller finds an orphaned object (an object with no ControllerRef) that
matches its selector, it may try to adopt the object by adding a ControllerRef.
Note that whether or not the controller *should* try to adopt the object depends
on the particular controller and object.
# Implementation plan (sketch):
Multiple controllers can race to adopt a given object, but only one can win
by being the first to add a ControllerRef to the object's OwnerReferences list.
The losers will see their adoptions fail due to a validation error as explained
[above](#api).
* Add API field for `Controller`,
* Extend `OwnerReference` adoption procedure to set a `Controller` field in one of the owners,
* Update all affected controllers to respect `ControllerRef`.
If a controller has a non-nil `DeletionTimestamp`, it must not attempt adoption
or take any other actions except updating its `Status`.
This prevents readoption of objects orphaned by the [orphan finalizer](garbage-collection.md#part-ii-the-orphan-finalizer)
during deletion of the controller.
Necessary related work:
* `OwnerReferences` are correctly added/deleted,
* GarbageCollector removes dangling references,
* Controllers don't take any meaningful actions when `DeletionTimestamps` is set.
## Orphaning
# Considered alternatives
When a controller is deleted, the objects it owns will either be orphaned or
deleted according to the normal [Garbage Collection](garbage-collection.md)
behavior, based on OwnerReferences.
In addition, if a controller finds that it owns an object that no longer matches
its selector, it should orphan the object by removing itself from the object's
OwnerReferences list. Since ControllerRef is just a special type of
OwnerReference, this also means the ControllerRef is removed.
## Watches
Many controllers use watches to *sync* each controller instance (prompting it to
reconcile desired and actual state) as soon as a relevant event occurs for one
of its controlled objects, as well as to let controllers wait for asynchronous
operations to complete on those objects.
The controller subscribes to a stream of events about controlled objects
and routes each event to a particular controller instance.
Previously, the controller used only label selectors to decide which
controller to route an event to. If multiple controllers had overlapping
selectors, events might be misrouted, causing the wrong controllers to sync.
Controllers could also freeze because they keep waiting for an event that
already came but was misrouted, manifesting as `kubectl` commands that hang.
Some controllers introduced a workaround to break ties. For example, they would
sort all controller instances with matching selectors, first by creation
timestamp and then by name, and always route the event to the first controller
in this list. However, that did not prevent misrouting if the overlapping
controllers were of different types. It also only worked while controllers
themselves assigned ownership over objects using the same tie-break rules.
Now that controller ownership is defined in terms of ControllerRef,
controllers should use the following guidelines for responding to watch events (a sketch in Go follows the list):
* If the object has a ControllerRef:
* Sync only the referenced controller.
* Update `expectations` counters for the referenced controller.
* If an *Update* event removes the ControllerRef, sync any controllers whose
selectors match to give each one a chance to adopt the object.
* If the object is an orphan:
* *Add* event
* Sync any controllers whose selectors match to give each one a chance to
adopt the object.
* Do *not* update counters on `expectations`.
Controllers should never be waiting for creation of an orphan because
anything they create should have a ControllerRef.
* *Delete* event
* Do *not* sync any controllers.
Controllers should never care about orphans disappearing.
* Do *not* update counters on `expectations`.
Controllers should never be waiting for deletion of an orphan because they
are not allowed to delete objects they don't own.
* *Update* event
* If labels changed, sync any controllers whose selectors match to give each
one a chance to adopt the object.
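The routing rules above can be sketched as follows (`ControllerInstance` and its methods are hypothetical stand-ins for this illustration, not a real Kubernetes interface):

```go
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// ControllerInstance is a hypothetical handle on one controller object
// (e.g. one ReplicaSet) known to the watch machinery.
type ControllerInstance interface {
	UID() string               // UID of the controller object
	Selector() labels.Selector // its label selector
	Enqueue()                  // schedule a sync of this controller
}

// routeEvent decides which controller instances to sync for an Add or Update
// event on a potentially controlled object.
func routeEvent(obj metav1.Object, controllers []ControllerInstance) {
	for _, ref := range obj.GetOwnerReferences() {
		if ref.Controller != nil && *ref.Controller {
			// The object has a ControllerRef: sync only the referenced controller.
			for _, c := range controllers {
				if c.UID() == string(ref.UID) {
					c.Enqueue()
				}
			}
			return
		}
	}
	// Orphan: sync every controller whose selector matches, giving each a
	// chance to adopt the object.
	for _, c := range controllers {
		if c.Selector().Matches(labels.Set(obj.GetLabels())) {
			c.Enqueue()
		}
	}
}
```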
## Default garbage collection policy
Controllers that used to rely on client-side cascading deletion should set a
[`DefaultGarbageCollectionPolicy`](https://github.com/kubernetes/kubernetes/blob/dd22743b54f280f41e68f206449a13ca949aca4e/pkg/genericapiserver/registry/rest/delete.go#L43)
of `rest.OrphanDependents` when they are updated to implement ControllerRef.
This ensures that deleting only the controller, without specifying the optional
`DeleteOptions.OrphanDependents` flag, remains a non-cascading delete.
Otherwise, the behavior would change to server-side cascading deletion by
default as soon as the controller manager is upgraded to a version that performs
adoption by setting ControllerRefs.
Example from [ReplicationController](https://github.com/kubernetes/kubernetes/blob/9ae2dfacf196ca7dbee798ee9c3e1663a5f39473/pkg/registry/core/replicationcontroller/strategy.go#L49):
```go
// DefaultGarbageCollectionPolicy returns Orphan because that was the default
// behavior before the server-side garbage collection was implemented.
func (rcStrategy) DefaultGarbageCollectionPolicy() rest.GarbageCollectionPolicy {
return rest.OrphanDependents
}
```
New controllers that don't have legacy behavior to preserve can omit this
controller-specific default to use the [global default](https://github.com/kubernetes/kubernetes/blob/2bb1e7581544b9bd059eafe6ac29775332e5a1d6/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L543),
which is to enable server-side cascading deletion.
## Controller-specific behavior
This section lists considerations specific to a given controller.
* **ReplicaSet/ReplicationController**
* These controllers currently only enable ControllerRef behavior when the
Garbage Collector is enabled. When ControllerRef was first added to these
controllers, the main purpose was to enable server-side cascading deletion
via the Garbage Collector, so it made sense to gate it behind the same flag.
However, in order to achieve the [goals](#goals) of this proposal, it is
necessary to set ControllerRefs and perform adoption/orphaning regardless of
whether server-side cascading deletion (the Garbage Collector) is enabled.
For example, turning off the GC should not cause controllers to start
fighting again. Therefore, these controllers will be updated to always
enable ControllerRef.
* **StatefulSet**
* A StatefulSet will not adopt any Pod whose name does not match the template
  it uses to create new Pods: `{statefulset name}-{ordinal}`
  (see the name-check sketch after this list).
This is because Pods in a given StatefulSet form a "family" that may use pod
names (via their generated DNS entries) to coordinate among themselves.
Adopting Pods with the wrong names would violate StatefulSet's semantics.
Adoption is allowed when Pod names match, so it remains possible to orphan a
family of Pods (by deleting their StatefulSet without cascading) and then
create a new StatefulSet with the same name and selector to adopt them.
* **CronJob**
* CronJob [does not use watches](https://github.com/kubernetes/kubernetes/blob/9ae2dfacf196ca7dbee798ee9c3e1663a5f39473/pkg/controller/cronjob/cronjob_controller.go#L20),
so [that section](#watches) doesn't apply.
Instead, all CronJobs are processed together upon every "sync".
* CronJob applies a `created-by` annotation to link Jobs to the CronJob that
created them.
If a ControllerRef is found, it should be used instead to determine this
link.
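As referenced in the StatefulSet item above, a minimal sketch of the name check might look as follows; the function name and the regexp-based approach are assumptions for illustration, not the actual StatefulSet controller code.
```go
package main

import (
	"fmt"
	"regexp"
)

// isMember reports whether a Pod name matches the `{statefulset name}-{ordinal}`
// pattern, i.e. whether the Pod is even eligible for adoption by the StatefulSet.
func isMember(statefulSetName, podName string) bool {
	re := regexp.MustCompile("^" + regexp.QuoteMeta(statefulSetName) + `-(\d+)$`)
	return re.MatchString(podName)
}

func main() {
	fmt.Println(isMember("web", "web-0"))     // true: eligible for adoption
	fmt.Println(isMember("web", "web-extra")) // false: never adopted
}
```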
## Created-by annotation
Aside from the change to CronJob mentioned above, several other uses of the
`kubernetes.io/created-by` annotation have been identified that would be better
served by ControllerRef because it tracks who *currently* controls an object,
not just who originally created it.
As a first step, the specific uses identified in the [Implementation](#implementation)
section will be augmented to prefer ControllerRef if one is found.
If no ControllerRef is found, they will fall back to looking at `created-by`.
# Upgrading
In the absence of controllers with overlapping selectors, upgrading or
downgrading the master to or from a version that introduces ControllerRef
should have no user-visible effects.
If no one is fighting, adoption should always succeed eventually, so ultimately
only the selectors matter on either side of the transition.
If there are controllers with overlapping selectors at the time of an *upgrade*:
* Back-and-forth thrashing should stop after the upgrade.
* The ownership of existing objects might change due to races during
[adoption](#adoption). As mentioned in the [non-goals](#non-goals) section,
this can include breaking up families of objects that should have stayed
together.
* Controllers might create additional objects because they start to respect the
["Don't share"](#behavior) rule.
If there are controllers with overlapping selectors at the time of a
*downgrade*:
* Controllers may begin to fight and thrash objects.
* The ownership of existing objects might change due to ignoring ControllerRef.
* Controllers might delete objects because they stop respecting the
["Don't share"](#behavior) rule.
# Implementation
Checked items had been completed at the time of the [last edit](#history) of
this proposal.
* [x] Add API field for `Controller` to the `OwnerReference` type.
* [x] Add validator that prevents an object from having multiple ControllerRefs.
* [x] Add `ControllerRefManager` types to encapsulate ControllerRef manipulation
logic.
* [ ] Update all affected controllers to respect ControllerRef.
* [ ] ReplicationController
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
* [x] Don't adopt/manage objects.
* [ ] Don't orphan objects.
* [x] Include ControllerRef on all created objects.
* [x] Set DefaultGarbageCollectionPolicy to OrphanDependents.
* [x] Use ControllerRefManager to adopt and orphan.
* [ ] Enable ControllerRef regardless of `--enable-garbage-collector` flag.
* [ ] Use ControllerRef to map watch events to controllers.
* [ ] ReplicaSet
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
* [x] Don't adopt/manage objects.
* [ ] Don't orphan objects.
* [x] Include ControllerRef on all created objects.
* [x] Set DefaultGarbageCollectionPolicy to OrphanDependents.
* [x] Use ControllerRefManager to adopt and orphan.
* [ ] Enable ControllerRef regardless of `--enable-garbage-collector` flag.
* [ ] Use ControllerRef to map watch events to controllers.
* [ ] StatefulSet
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
* [ ] Include ControllerRef on all created objects.
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
* [ ] Use ControllerRefManager to adopt and orphan.
* [ ] Use ControllerRef to map watch events to controllers.
* [ ] DaemonSet
* [x] Don't touch controlled objects if DeletionTimestamp is set.
* [ ] Include ControllerRef on all created objects.
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
* [ ] Use ControllerRefManager to adopt and orphan.
* [ ] Use ControllerRef to map watch events to controllers.
* [ ] Deployment
* [x] Don't touch controlled objects if DeletionTimestamp is set.
* [x] Include ControllerRef on all created objects.
* [x] Set DefaultGarbageCollectionPolicy to OrphanDependents.
* [x] Use ControllerRefManager to adopt and orphan.
* [ ] Use ControllerRef to map watch events to controllers.
* [ ] Job
* [x] Don't touch controlled objects if DeletionTimestamp is set.
* [ ] Include ControllerRef on all created objects.
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
* [ ] Use ControllerRefManager to adopt and orphan.
* [ ] Use ControllerRef to map watch events to controllers.
* [ ] CronJob
* [ ] Don't touch controlled objects if DeletionTimestamp is set.
* [ ] Include ControllerRef on all created objects.
* [ ] Set DefaultGarbageCollectionPolicy to OrphanDependents.
* [ ] Use ControllerRefManager to adopt and orphan.
* [ ] Use ControllerRef to map Jobs to their parent CronJobs.
* [ ] Tests
* [ ] Update existing controller tests to use ControllerRef.
* [ ] Add test for overlapping controllers of different types.
* [ ] Replace or augment uses of `CreatedByAnnotation` with ControllerRef.
* [ ] `kubectl describe` list of controllers for an object.
* [ ] `kubectl drain` Pod filtering.
* [ ] Classifying failed Pods in e2e test framework.
# Alternatives
The following alternatives were considered:
* Centralized "ReferenceController" component that manages adoption/orphaning.
Not chosen because:
* Hard to make it work for all imaginable 3rd party objects.
* Adding hooks to framework makes it possible for users to write their own
logic.
* Generic "ReferenceController": centralized component that managed adoption/orphaning
* Dropped because: hard to write something that will work for all imaginable 3rd party objects, adding hooks to framework makes it possible for users to write their own logic
* Separate API field for `ControllerRef` in the ObjectMeta.
* Dropped because: nontrivial relationship between `ControllerRef` and `OwnerReferences` when it comes to deletion/adoption.
Not chosen because:
* Complicated relationship between `ControllerRef` and `OwnerReference`
when it comes to deletion/adoption.
# History
Summary of significant revisions to this document:
* 2017-02-06 (enisoc)
* [Controller-specific behavior](#controller-specific-behavior)
* Enable ControllerRef regardless of whether GC is enabled.
* [Implementation](#implementation)
* Audit whether existing controllers respect DeletionTimestamp.
* 2017-02-01 (enisoc)
* Clarify existing specifications and add details not previously specified.
* [Non-goals](#non-goals)
* Make explicit that overlapping selectors are still user error.
* [Behavior](#behavior)
* Summarize fundamental rules that all new controllers should follow.
* Explain how the validator prevents multiple ControllerRefs on an object.
* Specify how ControllerRef should affect the use of watches/expectations.
* Specify important controller-specific behavior for existing controllers.
* Specify necessary changes to default GC policy when adding ControllerRef.
* Propose changing certain uses of `created-by` annotation to ControllerRef.
* [Upgrading](#upgrading)
* Specify ControllerRef-related behavior changes upon upgrade/downgrade.
* [Implementation](#implementation)
* List all work to be done and mark items already completed as of this edit.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/controller-ref.md?pixel)]()

View File

@ -2,7 +2,7 @@
**Author**: David Ashpole (@dashpole)
**Last Updated**: 1/19/2017
**Last Updated**: 1/31/2017
**Status**: Proposal
@ -21,8 +21,7 @@ This document proposes a design for the set of metrics included in an eventual C
- [Metric Requirements:](#metric-requirements)
- [Proposed Core Metrics:](#proposed-core-metrics)
- [On-Demand Design:](#on-demand-design)
- [Implementation Plan](#implementation-plan)
- [Rollout Plan](#rollout-plan)
- [Future Work](#future-work)
<!-- END MUNGE: GENERATED_TOC -->
@ -51,12 +50,12 @@ The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/mast
By publishing core metrics, the kubelet is relieved of its responsibility to provide metrics for monitoring.
The third party monitoring pipeline also is relieved of any responsibility to provide these metrics to system components.
cAdvisor is structured to collect metrics on an interval, which is appropriate for a stand-alone metrics collector. However, many functions in the kubelet are latency-sensitive (eviction, for example), and would benifit from a more "On-Demand" metrics collection design.
cAdvisor is structured to collect metrics on an interval, which is appropriate for a stand-alone metrics collector. However, many functions in the kubelet are latency-sensitive (eviction, for example), and would benefit from a more "On-Demand" metrics collection design.
### Proposal
This proposal is to use this set of core metrics, collected by the kubelet, and used solely by kubernetes system components to support "First-Class Resource Isolation and Utilization Features". This proposal is not designed to be an API published by the kubelet, but rather a set of metrics collected by the kubelet that will be transformed, and published in the future.
The target "Users" of this set of metrics are kubernetes components (though not neccessarily directly). This set of metrics itself is not designed to be user-facing, but is designed to be general enough to support user-facing components.
The target "Users" of this set of metrics are kubernetes components (though not necessarily directly). This set of metrics itself is not designed to be user-facing, but is designed to be general enough to support user-facing components.
### Non Goals
Everything covered in the [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) design doc will not be covered in this proposal. This includes the third party metrics pipeline, and the methods by which the metrics found in this proposal are provided to other kubernetes components.
@ -105,7 +104,7 @@ Metrics requirements for "First Class Resource Isolation and Utilization Feature
### Proposed Core Metrics:
This section defines "usage metrics" for filesystems, CPU, and Memory.
As stated in Non-Goals, this proposal does not attempt to define the specific format by which these are exposed. For convenience, it may be neccessary to include static information such as start time, node capacities for CPU, Memory, or filesystems, and more.
As stated in Non-Goals, this proposal does not attempt to define the specific format by which these are exposed. For convenience, it may be necessary to include static information such as start time, node capacities for CPU, Memory, or filesystems, and more.
```go
// CpuUsage holds statistics about the amount of cpu time consumed
@ -146,17 +145,10 @@ The interface for exposing these metrics within the kubelet contains methods for
Implementation:
To keep performance bounded while still offering metrics "On-Demand", all calls to get metrics are cached, and a minimum recency is established to prevent repeated metrics computation. Before computing new metrics, the previous metrics are checked to see if they meet the recency requirements of the caller. If the age of the metrics meets the recency requirements, then the cached metrics are returned. If not, then new metrics are computed and cached.
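A minimal Go sketch of such an on-demand, cached getter is shown below; the type and field names are assumptions for illustration and are not the actual kubelet or cAdvisor code.
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// stats is a placeholder for a computed set of metrics.
type stats struct{ collectedAt time.Time }

type cachedProvider struct {
	mu      sync.Mutex
	latest  *stats
	collect func() *stats // the expensive metrics computation
}

// Get returns cached metrics if they are at least as recent as the caller
// requires (maxAge); otherwise it recomputes and caches new metrics.
func (p *cachedProvider) Get(maxAge time.Duration) *stats {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.latest != nil && time.Since(p.latest.collectedAt) <= maxAge {
		return p.latest
	}
	p.latest = p.collect()
	return p.latest
}

func main() {
	p := &cachedProvider{collect: func() *stats { return &stats{collectedAt: time.Now()} }}
	first := p.Get(10 * time.Second)
	second := p.Get(10 * time.Second) // served from the cache
	fmt.Println(first == second)      // true
}
```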
## Implementation Plan
@dashpole will modify the structure of metrics collection code to be "On-Demand".
## Future work
Suggested, tentative future work, which may be covered by future proposals:
- Publish these metrics in some form to a kubelet API endpoint
- Obtain all runtime-specific information needed to collect metrics from the CRI.
- Kubernetes can be configured to run a default "third party metrics provider" as a daemonset. Possibly standalone cAdvisor.
## Rollout Plan
Once this set of metrics is accepted, @dashpole will begin discussions on the format, and design of the endpoint that exposes them. The node resource metrics endpoint (TBD) will be added alongside the current Summary API in an upcoming release. This should allow concurrent developments of other portions of the system metrics pipeline (metrics-server, for example). Once this addition is made, all other changes will be internal, and will not require any API changes.
@dashpole will also start discussions on integrating with the CRI, and discussions on how to provide an out-of-the-box solution for the "third party monitoring" pipeline on the node. One current idea is a standalone verison of cAdvisor, but any third party metrics solution could serve this function as well.
- Decide on the format, name, and kubelet endpoint for publishing these metrics.
- Integrate with the CRI to allow compatibility with a greater number of runtimes, and to create a better runtime abstraction.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

View File

@ -0,0 +1,127 @@
# CRI: Dockershim PodSandbox Checkpoint
## Umbrella Issue
[#34672](https://github.com/kubernetes/kubernetes/issues/34672)
## Background
[Container Runtime Interface (CRI)](../devel/container-runtime-interface.md)
is an ongoing project to allow container runtimes to integrate with
kubernetes via a newly-defined API.
[Dockershim](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/dockershim)
is the Docker CRI implementation. This proposal aims to introduce
checkpoint mechanism in dockershim.
## Motivation
### Why do we need checkpoint?
With CRI, Kubelet only passes configurations (SandboxConfig,
ContainerConfig and ImageSpec) when creating a sandbox, container or
image, and only uses the reference ID to manage them after creation.
However, the information in these configurations is also needed after creation.
In the case of dockershim with a CNI network plugin, the CNI plugin needs
the same information from PodSandboxConfig at both creation and deletion.
```
Kubelet ---------------------------------
| RunPodSandbox(PodSandboxConfig)
| StopPodSandbox(PodSandboxID)
V
Dockershim-------------------------------
| SetUpPod
| TearDownPod
V
Network Plugin---------------------------
| ADD
| DEL
V
CNI plugin-------------------------------
```
In addition, checkpoint helps to improve the reliability of dockershim.
With checkpoints, critical information for disaster recovery could be
preserved. Kubelet makes decisions based on the reported pod states
from runtime shims. Dockershim currently gathers states from docker
engine. However, in case of disaster, the docker engine may lose all
container information, including the reference IDs. Without that
information, kubelet and dockershim cannot conduct proper cleanup.
For example, if docker containers are removed underneath kubelet, references
to the allocated IPs and the iptables setup for the pods are also lost.
This leads to resource leaks and potential iptables rule conflicts.
### Why checkpoint in dockershim?
- The CNI specification does not require CNI plugins to be stateful, nor does
  it provide an interface to retrieve state from CNI plugins.
- Currently there are no uniform checkpoint requirements across existing runtime shims.
- Need to preserve backward compatibility for kubelet.
- Easier to maintain backward compatibility by checkpointing at a lower level.
## PodSandbox Checkpoint
A checkpoint file will be created for each PodSandbox. Files will be
placed under `/var/lib/dockershim/sandbox/`. The file name will be the
corresponding `PodSandboxID`, and the file content will be JSON encoded.
The data structure is as follows:
```go
const schemaVersion = "v1"
type Protocol string
// PortMapping is the port mapping configurations of a sandbox.
type PortMapping struct {
// Protocol of the port mapping.
Protocol *Protocol `json:"protocol,omitempty"`
// Port number within the container.
ContainerPort *int32 `json:"container_port,omitempty"`
// Port number on the host.
HostPort *int32 `json:"host_port,omitempty"`
}
// CheckpointData contains all types of data that can be stored in the checkpoint.
type CheckpointData struct {
PortMappings []*PortMapping `json:"port_mappings,omitempty"`
}
// PodSandboxCheckpoint is the checkpoint structure for a sandbox
type PodSandboxCheckpoint struct {
// Version of the pod sandbox checkpoint schema.
Version string `json:"version"`
// Pod name of the sandbox. Same as the pod name in the PodSpec.
Name string `json:"name"`
// Pod namespace of the sandbox. Same as the pod namespace in the PodSpec.
Namespace string `json:"namespace"`
// Data to checkpoint for pod sandbox.
Data *CheckpointData `json:"data,omitempty"`
}
```
## Workflow Changes
`RunPodSandbox` creates checkpoint:
```
() --> Pull Image --> Create Sandbox Container --> (Create Sandbox Checkpoint) --> Start Sandbox Container --> Set Up Network --> ()
```
`RemovePodSandbox` removes checkpoint:
```
() --> Remove Sandbox --> (Remove Sandbox Checkpoint) --> ()
```
`ListPodSandbox` needs to include all PodSandboxes whose checkpoint files
exist. If a sandbox checkpoint exists but the sandbox container cannot be
found, the returned PodSandbox object will include only the PodSandboxID,
namespace and name, and its state will be `PodSandboxState_SANDBOX_NOTREADY`.
`StopPodSandbox` and `RemovePodSandbox` need to conduct proper error handling to ensure idempotency.
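For illustration, a minimal sketch of the checkpoint read/write path could look like the following; it assumes the `PodSandboxCheckpoint` type defined above, and the helper names are illustrative rather than the actual dockershim implementation.
```go
package dockershim

import (
	"encoding/json"
	"io/ioutil"
	"os"
	"path/filepath"
)

// sandboxCheckpointDir is the directory named in this proposal.
const sandboxCheckpointDir = "/var/lib/dockershim/sandbox"

// writeCheckpoint persists one PodSandboxCheckpoint (defined above) as JSON,
// using the PodSandboxID as the file name.
func writeCheckpoint(podSandboxID string, cp *PodSandboxCheckpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(sandboxCheckpointDir, 0700); err != nil {
		return err
	}
	return ioutil.WriteFile(filepath.Join(sandboxCheckpointDir, podSandboxID), data, 0600)
}

// readCheckpoint loads the checkpoint for a given PodSandboxID, if it exists.
func readCheckpoint(podSandboxID string) (*PodSandboxCheckpoint, error) {
	data, err := ioutil.ReadFile(filepath.Join(sandboxCheckpointDir, podSandboxID))
	if err != nil {
		return nil, err
	}
	cp := &PodSandboxCheckpoint{}
	if err := json.Unmarshal(data, cp); err != nil {
		return nil, err
	}
	return cp, nil
}
```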
## Future extensions
This proposal is mainly driven by networking use cases. More data could be added to the checkpoint in the future.

View File

@ -1,8 +1,8 @@
# ScheduledJob Controller
# CronJob Controller (previously ScheduledJob)
## Abstract
A proposal for implementing a new controller - ScheduledJob controller - which
A proposal for implementing a new controller - CronJob controller - which
will be responsible for managing time based jobs, namely:
* once at a specified point in time,
* repeatedly at a specified point in time.
@ -23,20 +23,20 @@ There are also similar solutions available, already:
## Motivation
ScheduledJobs are needed for performing all time-related actions, namely backups,
CronJobs are needed for performing all time-related actions, namely backups,
report generation and the like. Each of these tasks should be allowed to run
repeatedly (once a day/month, etc.) or once at a given point in time.
## Design Overview
Users create a ScheduledJob object. One ScheduledJob object
Users create a CronJob object. One CronJob object
is like one line of a crontab file. It has a schedule of when to run,
in [Cron](https://en.wikipedia.org/wiki/Cron) format.
The ScheduledJob controller creates a Job object [Job](job.md)
about once per execution time of the scheduled (e.g. once per
The CronJob controller creates a Job object [Job](job.md)
about once per execution time of the schedule (e.g. once per
day for a daily schedule.) We say "about" because there are certain
circumstances where two jobs might be created, or no job might be
created. We attempt to make these rare, but do not completely prevent
@ -44,45 +44,45 @@ them. Therefore, Jobs should be idempotent.
The Job object is responsible for any retrying of Pods, and any parallelism
among pods it creates, and determining the success or failure of the set of
pods. The ScheduledJob does not examine pods at all.
pods. The CronJob does not examine pods at all.
### ScheduledJob resource
### CronJob resource
The new `ScheduledJob` object will have the following contents:
The new `CronJob` object will have the following contents:
```go
// ScheduledJob represents the configuration of a single scheduled job.
type ScheduledJob struct {
// CronJob represents the configuration of a single cron job.
type CronJob struct {
TypeMeta
ObjectMeta
// Spec is a structure defining the expected behavior of a job, including the schedule.
Spec ScheduledJobSpec
Spec CronJobSpec
// Status is a structure describing current status of a job.
Status ScheduledJobStatus
Status CronJobStatus
}
// ScheduledJobList is a collection of scheduled jobs.
type ScheduledJobList struct {
// CronJobList is a collection of cron jobs.
type CronJobList struct {
TypeMeta
ListMeta
Items []ScheduledJob
Items []CronJob
}
```
The `ScheduledJobSpec` structure is defined to contain all the information how the actual
The `CronJobSpec` structure is defined to contain all the information how the actual
job execution will look like, including the `JobSpec` from [Job API](job.md)
and the schedule in [Cron](https://en.wikipedia.org/wiki/Cron) format. This implies
that each ScheduledJob execution will be created from the JobSpec actual at a point
that each CronJob execution will be created from the JobSpec actual at a point
in time when the execution will be started. This also implies that any changes
to ScheduledJobSpec will be applied upon subsequent execution of a job.
to CronJobSpec will be applied upon subsequent execution of a job.
```go
// ScheduledJobSpec describes how the job execution will look like and when it will actually run.
type ScheduledJobSpec struct {
// CronJobSpec describes how the job execution will look like and when it will actually run.
type CronJobSpec struct {
// Schedule contains the schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
Schedule string
@ -99,12 +99,12 @@ type ScheduledJobSpec struct {
Suspend bool
// JobTemplate is the object that describes the job that will be created when
// executing a ScheduledJob.
// executing a CronJob.
JobTemplate *JobTemplateSpec
}
// JobTemplateSpec describes of the Job that will be created when executing
// a ScheduledJob, including its standard metadata.
// a CronJob, including its standard metadata.
type JobTemplateSpec struct {
ObjectMeta
@ -119,7 +119,7 @@ type JobTemplateSpec struct {
type ConcurrencyPolicy string
const (
// AllowConcurrent allows ScheduledJobs to run concurrently.
// AllowConcurrent allows CronJobs to run concurrently.
AllowConcurrent ConcurrencyPolicy = "Allow"
// ForbidConcurrent forbids concurrent runs, skipping next run if previous
@ -131,13 +131,13 @@ const (
)
```
`ScheduledJobStatus` structure is defined to contain information about scheduled
`CronJobStatus` structure is defined to contain information about cron
job executions. The structure holds a list of currently running job instances
and additional information about overall successful and unsuccessful job executions.
```go
// ScheduledJobStatus represents the current state of a Job.
type ScheduledJobStatus struct {
// CronJobStatus represents the current state of a Job.
type CronJobStatus struct {
// Active holds pointers to currently running jobs.
Active []ObjectReference
@ -159,7 +159,7 @@ Users must use a generated selector for the job.
TODO for beta: forbid manual selector since that could cause confusing between
subsequent jobs.
### Running ScheduledJobs using kubectl
### Running CronJobs using kubectl
A user should be able to easily start a Scheduled Job using `kubectl` (similarly
to running regular jobs). For example to run a job with a specified schedule,
@ -178,21 +178,21 @@ In the above example:
## Fields Added to Job Template
When the controller creates a Job from the JobTemplateSpec in the ScheduledJob, it
When the controller creates a Job from the JobTemplateSpec in the CronJob, it
adds the following fields to the Job:
- a name, based on the ScheduledJob's name, but with a suffix to distinguish
- a name, based on the CronJob's name, but with a suffix to distinguish
multiple executions, which may overlap.
- the standard created-by annotation on the Job, pointing to the SJ that created it
The standard key is `kubernetes.io/created-by`. The value is a serialized JSON object, like
`{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ScheduledJob","namespace":"default",`
`{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"CronJob","namespace":"default",`
`"name":"nightly-earnings-report","uid":"5ef034e0-1890-11e6-8935-42010af0003e","apiVersion":...`
This serialization contains the UID of the parent. This is used to match the Job to the SJ that created
it.
## Updates to ScheduledJobs
## Updates to CronJobs
If the schedule is updated on a ScheduledJob, it will:
If the schedule is updated on a CronJob, it will:
- continue to use the Status.Active list of jobs to detect conflicts.
- try to fulfill all recently-passed times for the new schedule, by starting
new jobs. But it will not try to fulfill times prior to the
@ -202,16 +202,16 @@ If the schedule is updated on a ScheduledJob, it will:
- Example: If you have a schedule to run every hour, change that to 30-minutely, at 31 minutes past the hour,
one run will be started immediately for the starting time that has just passed.
If the job template of a ScheduledJob is updated, then future executions use the new template
If the job template of a CronJob is updated, then future executions use the new template
but old ones still satisfy the schedule and are not re-run just because the template changed.
If you delete and replace a ScheduledJob with one of the same name, it will:
If you delete and replace a CronJob with one of the same name, it will:
- not use any old Status.Active, and not consider any existing running or terminated jobs from the previous
ScheduledJob (with a different UID) at all when determining coflicts, what needs to be started, etc.
CronJob (with a different UID) at all when determining conflicts, what needs to be started, etc.
- If there is an existing Job with the same time-based hash in its name (see below), then
new instances of that job will not be able to be created. So, delete it if you want to re-run.
with the same name as conflicts.
- not "re-run" jobs for "start times" before the creation time of the new ScheduledJobJob object.
- not "re-run" jobs for "start times" before the creation time of the new CronJobJob object.
- not consider executions from the previous UID when making decisions about what executions to
start, or status, etc.
- lose the history of the old SJ.
@ -223,11 +223,11 @@ To preserve status, you can suspend the old one, and make one with a new name, o
### Starting Jobs in the face of controller failures
If the process with the scheduledJob controller in it fails,
and takes a while to restart, the scheduledJob controller
If the process with the cronJob controller in it fails,
and takes a while to restart, the cronJob controller
may miss the time window and it is too late to start a job.
With a single scheduledJob controller process, we cannot give
With a single cronJob controller process, we cannot give
very strong assurances about not missing starting jobs.
With a suggested HA configuration, there are multiple controller
@ -254,10 +254,10 @@ There are three problems here:
Multiple jobs might be created in the following sequence:
1. scheduled job controller sends request to start Job J1 to fulfill start time T.
1. cron job controller sends request to start Job J1 to fulfill start time T.
1. the create request is accepted by the apiserver and enqueued but not yet written to etcd.
1. scheduled job controller crashes
1. new scheduled job controller starts, and lists the existing jobs, and does not see one created.
1. cron job controller crashes
1. new cron job controller starts, and lists the existing jobs, and does not see one created.
1. it creates a new one.
1. the first one eventually gets written to etcd.
1. there are now two jobs for the same start time.
@ -286,24 +286,24 @@ This is too hard to do for the alpha version. We will await user
feedback to see if the "at most once" property is needed in the beta version.
This is awkward but possible for a containerized application ensure on it own, as it needs
to know what ScheduledJob name and Start Time it is from, and then record the attempt
to know what CronJob name and Start Time it is from, and then record the attempt
in a shared storage system. We should ensure it could extract this data from its annotations
using the downward API.
## Name of Jobs
A ScheduledJob creates one Job at each time when a Job should run.
A CronJob creates one Job at each time when a Job should run.
Since there may be concurrent jobs, and since we might want to keep failed
non-overlapping Jobs around as a debugging record, each Job created by the same ScheduledJob
non-overlapping Jobs around as a debugging record, each Job created by the same CronJob
needs a distinct name.
To make the Jobs from the same ScheduledJob distinct, we could use a random string,
in the way that pods have a `generateName`. For example, a scheduledJob named `nightly-earnings-report`
To make the Jobs from the same CronJob distinct, we could use a random string,
in the way that pods have a `generateName`. For example, a cronJob named `nightly-earnings-report`
in namespace `ns1` might create a job `nightly-earnings-report-3m4d3`, and later create
a job called `nightly-earnings-report-6k7ts`. This is consistent with pods, but
does not give the user much information.
Alternatively, we can use time as a uniquifier. For example, the same scheduledJob could
Alternatively, we can use time as a uniquifier. For example, the same cronJob could
create a job called `nightly-earnings-report-2016-May-19`.
However, for Jobs that run more than once per day, we would need to represent
time as well as date. Standard date formats (e.g. RFC 3339) use colons for time.
@ -312,7 +312,7 @@ will annoy some users.
Also, date strings are much longer than random suffixes, which means that
the pods will also have long names, and that we are more likely to exceed the
253 character name limit when combining the scheduled-job name,
253 character name limit when combining the cron-job name,
the time suffix, and pod random suffix.
One option would be to compute a hash of the nominal start time of the job,
@ -331,5 +331,5 @@ Below are the possible future extensions to the Job controller:
types of resources. This relates to the work happening in [#18215](https://issues.k8s.io/18215).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/scheduledjob.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/cronjob.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -0,0 +1,329 @@
Custom Metrics API
==================
The new [metrics monitoring vision](monitoring_architecture.md) proposes
an API that the Horizontal Pod Autoscaler can use to access arbitrary
metrics.
Similarly to the [master metrics API](resource-metrics-api.md), the new
API should be structured around accessing metrics by referring to
Kubernetes objects (or groups thereof) and a metric name. For this
reason, the API could be useful for other consumers (most likely
controllers) that want to consume custom metrics (similarly to how the
master metrics API is generally useful to multiple cluster components).
The HPA can refer to metrics describing all pods matching a label
selector, as well as an arbitrary named object.
API Paths
---------
The root API path will look like `/apis/custom-metrics/v1alpha1`. For
brevity, this will be left off below.
- `/{object-type}/{object-name}/{metric-name...}`: retrieve the given
metric for the given non-namespaced object (e.g. Node, PersistentVolume)
- `/{object-type}/*/{metric-name...}`: retrieve the given metric for all
non-namespaced objects of the given type
- `/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the
given metric for all non-namespaced objects of the given type matching
the given label selector
- `/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}`:
retrieve the given metric for the given namespaced object
- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}`: retrieve the given metric for all
namespaced objects of the given type
- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the given
metric for all namespaced objects of the given type matching the
given label selector
- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given
metric which describes the given namespace.
For example, to retrieve the custom metric "hits-per-second" for all
ingress objects matching "app=frontend" in the namespace "webapp", the
request might look like:
```
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/ingress.extensions/*/hits-per-second?labelSelector=app%3Dfrontend
---
Verb: GET
Namespace: webapp
APIGroup: custom-metrics
APIVersion: v1alpha1
Resource: ingress.extensions
Subresource: hits-per-second
Name: ResourceAll(*)
```
Notice that getting metrics which describe a namespace follows a slightly
different pattern from other resources; Since namespaces cannot feasibly
have unbounded subresource names (due to collision with resource names,
etc), we introduce a pseudo-resource named "metrics", which represents
metrics describing namespaces, where the resource name is the metric name:
```
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/metrics/queue-length
---
Verb: GET
Namespace: webapp
APIGroup: custom-metrics
APIVersion: v1alpha1
Resource: metrics
Name: queue-length
```
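For illustration, a small Go helper (hypothetical, not part of any published client) can build the label-selector form of these paths; the label selector is query-escaped, matching the example request shown earlier.
```go
package main

import (
	"fmt"
	"net/url"
)

// customMetricPath builds the "all namespaced objects of a type" form of the
// URL described above. The helper and its parameters are illustrative only.
func customMetricPath(namespace, resource, metric, labelSelector string) string {
	p := fmt.Sprintf("/apis/custom-metrics/v1alpha1/namespaces/%s/%s/*/%s", namespace, resource, metric)
	if labelSelector != "" {
		p += "?labelSelector=" + url.QueryEscape(labelSelector)
	}
	return p
}

func main() {
	fmt.Println(customMetricPath("webapp", "ingress.extensions", "hits-per-second", "app=frontend"))
	// /apis/custom-metrics/v1alpha1/namespaces/webapp/ingress.extensions/*/hits-per-second?labelSelector=app%3Dfrontend
}
```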
NB: the branch-node LIST operations (e.g. `LIST
/apis/custom-metrics/v1alpha1/namespaces/webapp/pods/`) are unsupported in
v1alpha1. They may be defined in a later version of the API.
API Path Design, Discovery, and Authorization
---------------------------------------------
The API paths in this proposal are designed to a) resemble normal
Kubernetes APIs, b) facilitate writing authorization rules, and c)
allow for discovery.
Since the API structure follows the same structure as other Kubernetes
APIs, it allows for fine grained control over access to metrics. Access
can be controlled on a per-metric basis (each metric is a subresource, so
metrics may be whitelisted by allowing access to a particular
resource-subresource pair), or granted in general for a namespace (by
allowing access to any resource in the `custom-metrics` API group).
Similarly, since metrics are simply subresources, a normal Kubernetes API
discovery document can be published by the adapter's API server, allowing
clients to discover the available metrics.
Note that we introduce the syntax of having a name of ` * ` here since
there is no current syntax for getting the output of a subresource on
multiple objects.
API Objects
-----------
The request URLs listed above will return the `MetricValueList` type described
below (when a name is given that is not ` * `, the API should simply return a
list with a single element):
```go
// a list of values for a given metric for some set of objects
type MetricValueList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`

	// the value of the metric across the described objects
	Items []MetricValue `json:"items"`
}

// a metric value for some object
type MetricValue struct {
	metav1.TypeMeta `json:",inline"`

	// a reference to the described object
	DescribedObject ObjectReference `json:"describedObject"`

	// the name of the metric
	MetricName string `json:"metricName"`

	// indicates the time at which the metrics were produced
	Timestamp unversioned.Time `json:"timestamp"`

	// indicates the window ([Timestamp-Window, Timestamp]) from
	// which these metrics were calculated, when returning rate
	// metrics calculated from cumulative metrics (or zero for
	// non-calculated instantaneous metrics).
	WindowSeconds *int64 `json:"window,omitempty"`

	// the value of the metric for this object
	Value resource.Quantity `json:"value"`
}
```
For instance, the example request above would yield the following object:
```json
{
"kind": "MetricValueList",
"apiVersion": "custom-metrics/v1alpha1",
"items": [
{
"metricName": "hits-per-second",
"describedObject": {
"kind": "Ingress",
"apiVersion": "extensions",
"name": "server1",
"namespace": "webapp"
},
"timestamp": SOME_TIMESTAMP_HERE,
"windowSeconds": "10",
"value": "10"
},
{
"metricName": "hits-per-second",
"describedObject": {
"kind": "Ingress",
"apiVersion": "extensions",
"name": "server2",
"namespace": "webapp"
},
"timestamp": ANOTHER_TIMESTAMP_HERE,
"windowSeconds": "10",
"value": "15"
}
]
}
```
Semantics
---------
### Object Types ###
In order to properly identify resources, we must use resource names
qualified with group names (since the group for the requests will always
be `custom-metrics`).
The `object-type` parameter should be the string form of
`unversioned.GroupResource`. Note that we do not include version in this;
we simply wish to uniquely identify all the different types of objects in
Kubernetes. For example, the pods resource (which exists in the un-named
legacy API group) would be represented simply as `pods`, while the jobs
resource (which exists in the `batch` API group) would be represented as
`jobs.batch`.
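A minimal sketch of splitting the `{object-type}` segment into resource and group is shown below; the helper is illustrative, and real adapters would more likely rely on the API machinery's own parsing utilities.
```go
package main

import (
	"fmt"
	"strings"
)

// splitGroupResource splits an {object-type} segment at the first dot:
// "pods" is the pods resource in the un-named legacy group, while
// "jobs.batch" is the jobs resource in the batch group.
func splitGroupResource(objectType string) (resource, group string) {
	if i := strings.Index(objectType, "."); i >= 0 {
		return objectType[:i], objectType[i+1:]
	}
	return objectType, ""
}

func main() {
	r, g := splitGroupResource("jobs.batch")
	fmt.Println(r, g) // jobs batch
	r, g = splitGroupResource("pods")
	fmt.Printf("%s %q\n", r, g) // pods ""
}
```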
In the case of cross-group object renames, the adapter should maintain
a list of "equivalent versions" that the monitoring system uses. This is
monitoring-system dependent (for instance, the monitoring system might
record all HorizontalPodAutoscalers as being in `autoscaling`, but should be
aware that HorizontalPodAutoscalers also exist in `extensions`).
Note that for namespace metrics, we use a pseudo-resource called
`metrics`. Since there is no resource in the legacy API group, this will
not clash with any existing resources.
### Metric Names ###
Metric names must be able to appear as a single subresource. In particular,
metric names, *as passed to the API*, may not contain the characters '%', '/',
or '?', and may not be named '.' or '..' (but may contain these sequences).
Note, specifically, that URL encoding is not acceptable to escape the forbidden
characters, due to issues in the Go URL handling libraries. Otherwise, metric
names are open-ended.
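As an illustration of these constraints, a hypothetical validation helper might look like this (the function is an assumption for this sketch, not part of the API):
```go
package main

import (
	"fmt"
	"strings"
)

// validMetricName checks the constraints above: the name, as passed in the
// URL path, may not contain '%', '/', or '?', and may not be "." or "..".
func validMetricName(name string) bool {
	if name == "" || name == "." || name == ".." {
		return false
	}
	return !strings.ContainsAny(name, "%/?")
}

func main() {
	fmt.Println(validMetricName("hits-per-second")) // true
	fmt.Println(validMetricName("some/metric"))     // false: contains '/'
	fmt.Println(validMetricName("dotted..name"))    // true: may contain ".." as a substring
}
```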
### Metric Values and Timing ###
There should be only one metric value per object requested. The returned
metrics should be the most recently available metrics, as with the resource
metrics API. Implementers *should* attempt to return all metrics with roughly
identical timestamps and windows (when appropriate), but consumers should also
verify that any differences in timestamps are within tolerances for
a particular application (e.g. a dashboard might simply display the older
metric with a note, while the horizontal pod autoscaler controller might choose
to pretend it did not receive that metric value).
### Labeled Metrics (or lack thereof) ###
For metrics systems that support differentiating metrics beyond the
Kubernetes object hierarchy (such as using additional labels), the metrics
systems should have a metric which represents all such series aggregated
together. Additionally, implementors may choose to identify the individual
"sub-metrics" via the metric name, but this is expected to be fairly rare,
since it most likely requires specific knowledge of individual metrics.
For instance, suppose we record filesystem usage by filesystem inside the
container. There should then be a metric `filesystem/usage`, and the
implementors of the API may choose to expose more detailed metrics like
`filesystem/usage/my-first-filesystem`.
### Resource Versions ###
API implementors should set the `resourceVersion` field based on the
scrape time of the metric. The resource version is expected to increment
when the scrape/collection time of the returned metric changes. While the
API does not support writes, and does not currently support watches,
populating resource version preserves the normal expected Kubernetes API
semantics.
Relationship to HPA v2
----------------------
The URL paths in this API are designed to correspond to different source
types in the [HPA v2](hpa-v2.md). Specifically, the `pods` source type
corresponds to a URL of the form
`/namespaces/$NS/pods/*/$METRIC_NAME?labelSelector=foo`, while the
`object` source type corresponds to a URL of the form
`/namespaces/$NS/$RESOURCE.$GROUP/$OBJECT_NAME/$METRIC_NAME`.
The HPA then takes the results, aggregates them together (in the case of
the former source type), and uses the resulting value to produce a usage
ratio.
The resource source type is taken from the API provided by the
"metrics" API group (the master/resource metrics API).
The HPA will consume the API as a federated API server.
Relationship to Resource Metrics API
------------------------------------
The metrics presented by this API may be a superset of those present in the
resource metrics API, but this is not guaranteed. Clients that need the
information in the resource metrics API should use that to retrieve those
metrics, and supplement those metrics with this API.
Mechanical Concerns
-------------------
This API is intended to be implemented by monitoring pipelines (e.g.
inside Heapster, or as an adapter on top of a solution like Prometheus).
It shares many mechanical requirements with normal Kubernetes APIs, such
as the need to support encoding different versions of objects in both JSON
and protobuf, as well as acting as a discoverable API server. For these
reasons, it is expected that implementors will make use of the Kubernetes
genericapiserver code. If implementors choose not to use this, they must
still follow all of the Kubernetes API server conventions in order to work
properly with consumers of the API.
Specifically, they must support the semantics of the GET verb in
Kubernetes, including outputting in different API versions and formats as
requested by the client. They must support integrating with API discovery
(including publishing a discovery document, etc).
Location
--------
The types and clients for this API will live in a separate repository
under the Kubernetes organization (e.g. `kubernetes/metrics`). This
repository will most likely also house other metrics-related APIs for
Kubernetes (e.g. historical metrics API definitions, the resource metrics
API definitions, etc).
Note that there will not be a canonical implementation of the custom
metrics API under Kubernetes, just the types and clients. Implementations
will be left up to the monitoring pipelines.
Alternative Considerations
--------------------------
### Quantity vs Float ###
In the past, custom metrics were represented as floats. In general,
however, Kubernetes APIs are not supposed to use floats. The API proposed
above thus uses `resource.Quantity`. This adds a bit of encoding
overhead, but makes the API line up nicely with other Kubernetes APIs.
### Labeled Metrics ###
Many metric systems support labeled metrics, allowing for dimensionality
beyond the Kubernetes object hierarchy. Since the HPA currently doesn't
support specifying metric labels, this is not supported via this API. We
may wish to explore this in the future.

View File

@ -60,6 +60,7 @@ changes:
type DaemonSetUpdateStrategy struct {
// Type of daemon set update. Can be "RollingUpdate" or "OnDelete".
// Default is OnDelete.
// +optional
Type DaemonSetUpdateStrategyType
// Rolling update config params. Present only if DaemonSetUpdateStrategy =
@ -68,6 +69,7 @@ type DaemonSetUpdateStrategy struct {
// TODO: Update this to follow our convention for oneOf, whatever we decide it
// to be. Same as DeploymentStrategy.RollingUpdate.
// See https://github.com/kubernetes/kubernetes/issues/35345
// +optional
RollingUpdate *RollingUpdateDaemonSet
}
@ -96,51 +98,62 @@ type RollingUpdateDaemonSet struct {
// it then proceeds onto other DaemonSet pods, thus ensuring that at least
// 70% of original number of DaemonSet pods are available at all times
// during the update.
// +optional
MaxUnavailable intstr.IntOrString
}
// DaemonSetSpec is the specification of a daemon set.
type DaemonSetSpec struct {
// Note: Existing fields, including Selector and Template are ommitted in
// Note: Existing fields, including Selector and Template are omitted in
// this proposal.
// Update strategy to replace existing DaemonSet pods with new pods.
// +optional
UpdateStrategy DaemonSetUpdateStrategy `json:"updateStrategy,omitempty"`
// Minimum number of seconds for which a newly created DaemonSet pod should
// be ready without any of its container crashing, for it to be considered
// available. Defaults to 0 (pod will be considered available as soon as it
// is ready).
// +optional
MinReadySeconds int32 `json:"minReadySeconds,omitempty"`
}
const (
// DefaultDaemonSetUniqueLabelKey is the default key of the labels that is added
// to daemon set pods to distinguish between old and new pod templates during
// DaemonSet update.
DefaultDaemonSetUniqueLabelKey string = "pod-template-hash"
)
// A sequence number representing a specific generation of the template.
// Populated by the system. Can be set at creation time. Read-only otherwise.
// +optional
TemplateGeneration int64 `json:"templateGeneration,omitempty"`
}
// DaemonSetStatus represents the current status of a daemon set.
type DaemonSetStatus struct {
// Note: Existing fields, including CurrentNumberScheduled, NumberMissscheduled,
// DesiredNumberScheduled, NumberReady, and ObservedGeneration are ommitted in
// DesiredNumberScheduled, NumberReady, and ObservedGeneration are omitted in
// this proposal.
// UpdatedNumberScheduled is the total number of nodes that are running updated
// daemon pod
// +optional
UpdatedNumberScheduled int32 `json:"updatedNumberScheduled"`
// NumberAvailable is the number of nodes that should be running the
// daemon pod and have one or more of the daemon pod running and
// available (ready for at least minReadySeconds)
// +optional
NumberAvailable int32 `json:"numberAvailable"`
// NumberUnavailable is the number of nodes that should be running the
daemon pod and have none of the daemon pods running and available
// (ready for at least minReadySeconds)
// +optional
NumberUnavailable int32 `json:"numberUnavailable"`
}
const (
// DaemonSetTemplateGenerationKey is the key of the labels that is added
// to daemon set pods to distinguish between old and new pod templates
// during DaemonSet template update.
DaemonSetTemplateGenerationKey string = "pod-template-generation"
)
```
### Controller
@ -158,9 +171,13 @@ For each pending DaemonSet updates, it will:
1. Check `DaemonSetUpdateStrategy`:
- If `OnDelete`: do nothing
- If `RollingUpdate`:
- Compare spec of the daemon pods from step 1 with DaemonSet
`.spec.template.spec` to see if DaemonSet spec has changed.
- If DaemonSet spec has changed, compare `MaxUnavailable` with DaemonSet
- Find all daemon pods that belong to this DaemonSet using label selectors.
- Pods with "pod-template-generation" value equal to the DaemonSet's
`.spec.templateGeneration` are new pods, otherwise they're old pods.
- Note that pods without "pod-template-generation" labels (e.g. DaemonSet
pods created before RollingUpdate strategy is implemented) will be
seen as old pods.
- If there are old pods found, compare `MaxUnavailable` with DaemonSet
`.status.numberUnavailable` to see how many old daemon pods can be
killed. Then, kill those pods in the order that unhealthy pods (failed,
pending, not ready) are killed first.
@ -175,6 +192,12 @@ For each pending DaemonSet updates, it will:
If DaemonSet Controller crashes during an update, it can still recover.
#### API Server
In DaemonSet strategy (pkg/registry/extensions/daemonset/strategy.go#PrepareForUpdate),
increase DaemonSet's `.spec.templateGeneration` by 1 if any change is made to
DaemonSet's `.spec.template`.
### kubectl
#### kubectl rollout
@ -270,21 +293,24 @@ Another way to implement DaemonSet history is through creating `PodTemplates` as
snapshots of DaemonSet templates, and then create them in DaemonSet controller:
- Find existing PodTemplates whose labels are matched by DaemonSet
`.spec.selector`
`.spec.selector`.
- Sort those PodTemplates by creation timestamp and only retain at most
`.spec.revisionHistoryLimit` latest PodTemplates (remove the rest)
- Find the PodTemplate whose `.template` is the same as DaemonSet
`.spec.template`. If not found, create a new PodTemplate from DaemonSet
`.spec.template`:
- The name will be `<DaemonSet-Name>-<Hash-of-pod-template>`
- PodTemplate `.metadata.labels` will have a "pod-template-hash" label,
value be the hash of PodTemplate `.template` (note: don't include the
"pod-template-hash" label when calculating hash)
- PodTemplate `.metadata.annotations` will be copied from DaemonSet
`.metadata.annotations`
Note that when the DaemonSet controller creates pods, those pods will be created
with the "pod-template-hash" label.
`.spec.revisionHistoryLimit` latest PodTemplates (remove the rest).
- Find the PodTemplate whose `.template` is the same as the DaemonSet's
`.spec.template`.
- If not found, create a new PodTemplate from DaemonSet's
`.spec.template`:
- The name will be `<DaemonSet-Name>-<template-generation>`
- PodTemplate `.metadata.labels` will have a "pod-template-generation"
label, value be the same as DaemonSet's `.spec.templateGeneration`.
- PodTemplate will have revision information to avoid triggering
unnecessary restarts on rollback, since we only roll forward and only
increase templateGeneration.
- PodTemplate `.metadata.annotations` will be copied from DaemonSet
`.metadata.annotations`.
- If the PodTemplate is found, sync its "pod-template-generation" and
revision information with current DaemonSet.
- DaemonSet creates pods with "pod-template-generation" label.
PodTemplate may need to be made an admin-only or read only resource if it's used
to store DaemonSet history.

View File

@ -0,0 +1,69 @@
# Deploying a default StorageClass during installation
## Goal
Usual Kubernetes installation tools should deploy a default StorageClass
where it makes sense.
"*Usual installation tools*" are:
* cluster/kube-up.sh
* kops
* kubeadm
Other "installation tools" can (and should) deploy default StorageClass
following easy steps described in this document, however we won't touch them
during implementation of this proposal.
"*Where it makes sense*" are:
* AWS
* Azure
* GCE
* Photon
* OpenStack
* vSphere
Explicitly, there is no default storage class on bare metal.
## Motivation
In Kubernetes 1.5, we had "alpha" dynamic provisioning on aforementioned cloud
platforms. In 1.6 we want to deprecate this alpha provisioning. In order to keep
the same user experience, we need a default StorageClass instance that would
provision volumes for PVCs that do not request any special class. As
consequence, this default StorageClass would provision volumes for PVCs with
"alpha" provisioning annotation - this annotation would be ignored in 1.6 and
default storage class would be assumed.
## Design
1. Kubernetes will ship yaml files for default StorageClasses for each platform
   as `cluster/addons/storage-class/<platform>/default.yaml`, and all these
   default classes will be distributed together with all other addons in
   `kubernetes.tar.gz`.
2. An installation tool will discover on which platform it runs and install the
   appropriate yaml file into the usual directory for the addon manager (typically
   `/etc/kubernetes/addons/storage-class/default.yaml`).
3. The addon manager will deploy the storage class into the installed cluster in
   the usual way. We need to update the addon manager not to overwrite any existing
   object, in case the cluster admin has manually disabled this default storage class.
## Implementation
* AWS, GCE and OpenStack have a default StorageClass in
  `cluster/addons/storage-class/<platform>/` - already done in 1.5
* We need a default StorageClass for vSphere, Azure and Photon in `cluster/addons/storage-class/<platform>`
* cluster/kube-up.sh scripts need to be updated to install the storage class on appropriate platforms
* Already done on GCE, AWS and OpenStack.
* kops needs to be updated to install the storage class on appropriate platforms
* already done for kops on AWS and kops does not support other platforms yet.
* kubeadm needs to be updated to install the storage class on appropriate platforms (if it is cloud-provider aware)
* addon manager fix: https://github.com/kubernetes/kubernetes/issues/39561

View File

@ -109,10 +109,10 @@ spec:
- containerPort: 2380
protocol: TCP
env:
- Name: duplicate_key
Value: FROM_ENV
- Name: expansion
Value: $(REPLACE_ME)
- name: duplicate_key
value: FROM_ENV
- name: expansion
value: $(REPLACE_ME)
envFrom:
- configMapRef:
name: etcd-env-config

View File

@ -53,7 +53,7 @@ lot of applications and customer use-cases.
# Alpha Design
This section describes the proposed design for
[alpha-level](../../docs/devel/api_changes.md#alpha-beta-and-stable-versions) support, although
[alpha-level](../devel/api_changes.md#alpha-beta-and-stable-versions) support, although
additional features are described in [future work](#future-work).
## Overview

View File

@ -391,7 +391,7 @@ Include federated replica set name in the cluster name hash so that we get
slightly different ordering for different RS. So that not all RS of size 1
end up on the same cluster.
3. Assign minimum prefered number of replicas to each of the clusters, if
3. Assign minimum preferred number of replicas to each of the clusters, if
there is enough replicas and capacity.
4. If rebalance = false, assign the previously present replicas to the clusters,

View File

@ -1,4 +1,4 @@
Horizontal Pod Autoscaler with Arbitary Metrics
Horizontal Pod Autoscaler with Arbitrary Metrics
===============================================
The current Horizontal Pod Autoscaler object only has support for CPU as

View File

@ -425,6 +425,15 @@ from placing new best effort pods on the node since they will be rejected by the
On the other hand, the `DiskPressure` condition if true should dissuade the scheduler from
placing **any** new pods on the node since they will be rejected by the `kubelet` in admission.
## Enforcing Node Allocatable
To enforce [Node Allocatable](./node-allocatable.md), Kubelet primarily uses cgroups.
However `storage` cannot be enforced using cgroups.
Once Kubelet supports `storage` as an `Allocatable` resource, Kubelet will perform evictions whenever the total storage usage by pods exceed node allocatable.
If a pod cannot tolerate evictions, then ensure that requests is set and it will not exceed `requests`.
## Best Practices
### DaemonSet

View File

@ -11,7 +11,7 @@ a number of dependencies that must exist in its filesystem, including various
mount and network utilities. Missing any of these can lead to unexpected
differences between Kubernetes hosts. For example, the Google Container VM
image (GCI) is missing various mount commands even though the Kernel supports
those filesystem types. Similarly, CoreOS Linux intentionally doesn't ship with
those filesystem types. Similarly, CoreOS Container Linux intentionally doesn't ship with
many mount utilities or socat in the base image. Other distros have a related
problem of ensuring these dependencies are present and versioned appropriately
for the Kubelet.
@ -38,7 +38,7 @@ mount --rbind /var/lib/kubelet /path/to/chroot/var/lib/kubelet
chroot /path/to/kubelet /usr/bin/hyperkube kubelet
```
Note: Kubelet might need access to more directories on the host and we intend to identity mount all those directories into the chroot. A partial list can be found in the CoreOS kubelet-wrapper script.
Note: Kubelet might need access to more directories on the host and we intend to identity mount all those directories into the chroot. A partial list can be found in the CoreOS Container Linux kubelet-wrapper script.
This logic will also naturally be abstracted so it's no more difficult for the user to run the Kubelet.
Currently, the Kubelet does not need access to arbitrary paths on the host (as
@ -53,13 +53,13 @@ chroot.
## Current Use
This method of running the Kubelet is already in use by users of CoreOS Linux. The details of this implementation are found in the [kubelet wrapper documentation](https://coreos.com/kubernetes/docs/latest/kubelet-wrapper.html).
This method of running the Kubelet is already in use by users of CoreOS Container Linux. The details of this implementation are found in the [kubelet wrapper documentation](https://coreos.com/kubernetes/docs/latest/kubelet-wrapper.html).
## Implementation
### Target Distros
The two distros which benefit the most from this change are GCI and CoreOS. Initially, these changes will only be implemented for those distros.
The two distros which benefit the most from this change are GCI and CoreOS Container Linux. Initially, these changes will only be implemented for those distros.
This work will also only initially target the GCE provider and `kube-up` method of deployment.
@ -139,7 +139,7 @@ Similarly, for the mount utilities, the [Flex Volume v2](https://github.com/kube
**Downsides**:
This requires waiting on other features which might take a signficant time to land. It also could end up not fully fixing the problem (e.g. pushing down port-forwarding to the runtime doesn't ensure the runtime doesn't rely on host utilities).
This requires waiting on other features which might take a significant time to land. It also could end up not fully fixing the problem (e.g. pushing down port-forwarding to the runtime doesn't ensure the runtime doesn't rely on host utilities).
The Flex Volume feature is several releases out from fully replacing the current volumes as well.
@ -158,7 +158,7 @@ Currently, there's a `--containerized` flag. This flag doesn't actually remove t
#### Timeframe
During the 1.6 timeframe, the changes mentioned in implementation will be undergone for the CoreOS and GCI distros.
During the 1.6 timeframe, the changes mentioned in implementation will be undergone for the CoreOS Container Linux and GCI distros.
Based on the test results and additional problems that may arise, rollout will
be determined from there. Hopefully the rollout can also occur in the 1.6

@ -0,0 +1,64 @@
# Mount options for mountable volume types
## Goal
Enable Kubernetes admins to specify mount options with mountable volumes
such as `nfs`, `glusterfs` or `aws-ebs`.
## Motivation
We currently support network filesystems (NFS, Glusterfs, Ceph FS, SMB (Azure file), Quobyte) and local filesystems such as ext[3|4] and XFS.
Mount-time options that are operationally important and have no security implications should be supported. Examples are NFS's TCP mode, versions, lock mode, caching mode; Glusterfs's caching mode; SMB's version, locking, id mapping; and more.
## Design
We are going to add support for mount options in PVs as a beta feature to begin with.
Mount options can be specified via a `mountOptions` annotation on the PV. For example:
``` yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv0003
annotations:
volume.beta.kubernetes.io/mountOptions: "hard,nolock,nfsvers=3"
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
nfs:
path: /tmp
server: 172.17.0.2
```
## Preventing users from specifying mount options in inline volume specs of Pod
While mount options enable more flexibility in how volumes are mounted, they can result
in users specifying options that are not supported or are known to be problematic when
used in inline volume specs.
After much deliberation it was decided that `mountOptions` will not be supported as an API parameter
for inline volume specs.
### Error handling and plugins that don't support mount option
Kubernetes ships with volume plugins, such as `configmaps` or `secrets`, that don't support any kind of mount options.
In those cases, to prevent users from submitting volume definitions with bogus mount options, plugins can define an interface function
such as:
```go
// SupportsMountOption reports whether this plugin accepts user-specified mount options.
func SupportsMountOption() bool {
	return false
}
```
which will be used to validate the PV definition; the API object will be created *only* if it passes the validation. Additionally,
support for user-specified mount options will also be checked when volumes are being mounted.
In other cases where the plugin supports mount options (such as `NFS` or `GlusterFS`) but mounting fails, because of an invalid mount
option or otherwise, an Event API object will be created and attached to the appropriate object.

@ -131,7 +131,7 @@ TL;DR;
### Components should expose their platform
It should be possible to run clusters with mixed platforms smoothly. After all, bringing heterogenous machines together to a single unit (a cluster) is one of Kubernetes' greatest strengths. And since the Kubernetes' components communicate over HTTP, two binaries of different architectures may talk to each other normally.
It should be possible to run clusters with mixed platforms smoothly. After all, bringing heterogeneous machines together to a single unit (a cluster) is one of Kubernetes' greatest strengths. And since the Kubernetes' components communicate over HTTP, two binaries of different architectures may talk to each other normally.
The crucial thing here is that the components that handle platform-specific tasks (e.g. kubelet) should expose their platform. In the kubelet case, we've initially solved it by exposing the labels `beta.kubernetes.io/{os,arch}` on every node. This way a user may run binaries for different platforms on a multi-platform cluster, but it still requires manual work to apply the label to every manifest.
@ -206,7 +206,7 @@ However, before temporarily [deactivating builds](https://github.com/kubernetes/
Go 1.5 introduced many changes. To name a few that are relevant to Kubernetes:
- C was eliminated from the tree (it was earlier used for the bootstrap runtime).
- All processors are used by default, which means we should be able to remove [lines like this one](https://github.com/kubernetes/kubernetes/blob/v1.2.0/cmd/kubelet/kubelet.go#L37)
- The garbage collector became more efficent (but also [confused our latency test](https://github.com/golang/go/issues/14396)).
- The garbage collector became more efficient (but also [confused our latency test](https://github.com/golang/go/issues/14396)).
- `linux/arm64` and `linux/ppc64le` were added as new ports.
- The `GO15VENDOREXPERIMENT` was started. We switched from `Godeps/_workspace` to the native `vendor/` in [this PR](https://github.com/kubernetes/kubernetes/pull/24242).
- It's not required to pre-build the whole standard library `std` when cross-compiling. [Details](#prebuilding-the-standard-library-std)
@ -448,7 +448,7 @@ ARMv6 | arm | 6 | - | 32-bit
ARMv7 | arm | 7 | armhf | 32-bit
ARMv8 | arm64 | - | aarch64 | 64-bit
The compability between the versions is pretty straightforward, ARMv5 binaries may run on ARMv7 hosts, but not vice versa.
The compatibility between the versions is pretty straightforward, ARMv5 binaries may run on ARMv7 hosts, but not vice versa.
## Cross-building docker images for linux

@ -1,40 +1,24 @@
# Node Allocatable Resources
**Issue:** https://github.com/kubernetes/kubernetes/issues/13984
### Authors: timstclair@, vishh@
## Overview
Currently Node.Status has Capacity, but no concept of node Allocatable. We need additional
parameters to serve several purposes:
Kubernetes nodes typically run many OS system daemons in addition to kubernetes daemons like kubelet, runtime, etc. and user pods.
Kubernetes assumes that all the compute resources available, referred to as `Capacity`, in a node are available for user pods.
In reality, system daemons use a non-trivial amount of resources and their availability is critical for the stability of the system.
To address this issue, this proposal introduces the concept of `Allocatable` which identifies the amount of compute resources available to user pods.
Specifically, the kubelet will provide a few knobs to reserve resources for OS system daemons and kubernetes daemons.
1. Kubernetes metrics provides "/docker-daemon", "/kubelet",
"/kube-proxy", "/system" etc. raw containers for monitoring system component resource usage
patterns and detecting regressions. Eventually we want to cap system component usage to a certain
limit / request. However this is not currently feasible due to a variety of reasons including:
1. Docker still uses tons of computing resources (See
[#16943](https://github.com/kubernetes/kubernetes/issues/16943))
2. We have not yet defined the minimal system requirements, so we cannot control Kubernetes
nodes or know about arbitrary daemons, which can make the system resources
unmanageable. Even with a resource cap we cannot do a full resource management on the
node, but with the proposed parameters we can mitigate really bad resource over commits
3. Usage scales with the number of pods running on the node
2. For external schedulers (such as mesos, hadoop, etc.) integration, they might want to partition
compute resources on a given node, limiting how much Kubelet can use. We should provide a
mechanism by which they can query kubelet, and reserve some resources for their own purpose.
By explicitly reserving compute resources, the intention is to avoid overcommitting the node and to keep system daemons from competing with user pods.
The resources available to system daemons and user pods will be capped based on user specified reservations.
### Scope of proposal
This proposal deals with resource reporting through the [`Allocatable` field](#allocatable) for more
reliable scheduling, and minimizing resource over commitment. This proposal *does not* cover
resource usage enforcement (e.g. limiting kubernetes component usage), pod eviction (e.g. when
reservation grows), or running multiple Kubelets on a single node.
If `Allocatable` is available, the scheduler will use that instead of `Capacity`, thereby not overcommitting the node.
## Design
### Definitions
![image](node-allocatable.png)
1. **Node Capacity** - Already provided as
[`NodeStatus.Capacity`](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html#_v1_nodestatus),
this is total capacity read from the node instance, and assumed to be constant.
@ -66,7 +50,7 @@ type NodeStatus struct {
Allocatable will be computed by the Kubelet and reported to the API server. It is defined to be:
```
[Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved]
[Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved] - [Hard-Eviction-Threshold]
```
The scheduler will use `Allocatable` in place of `Capacity` when scheduling pods, and the Kubelet
@ -89,12 +73,7 @@ The flag will be specified as a serialized `ResourceList`, with resources define
--kube-reserved=cpu=500m,memory=5Mi
```
Initially we will only support CPU and memory, but will eventually support more resources. See
[#16889](https://github.com/kubernetes/kubernetes/pull/16889) for disk accounting.
If KubeReserved is not set it defaults to a sane value (TBD) calculated from machine capacity. If it
is explicitly set to 0 (along with `SystemReserved`), then `Allocatable == Capacity`, and the system
behavior is equivalent to the 1.1 behavior with scheduling based on Capacity.
Initially we will only support CPU and memory, but will eventually support more resources like [local storage](#phase-3) and io proportional weights to improve node reliability.
#### System-Reserved
@ -102,48 +81,259 @@ In the initial implementation, `SystemReserved` will be functionally equivalent
[`KubeReserved`](#kube-reserved), but with a different semantic meaning. While KubeReserved
designates resources set aside for kubernetes components, SystemReserved designates resources set
aside for non-kubernetes components (currently this is reported as all the processes lumped
together in the `/system` raw container).
together in the `/system` raw container on non-systemd nodes).
## Issues
## Kubelet Eviction Thresholds
To improve the reliability of nodes, kubelet evicts pods whenever the node runs out of memory or local storage.
Together, evictions and node allocatable help improve node stability.
As of v1.5, evictions are based on overall node usage relative to `Capacity`.
Kubelet evicts pods based on QoS and user-configured eviction thresholds.
More details in [this doc](./kubelet-eviction.md#enforce-node-allocatable).
From v1.6, if `Allocatable` is enforced by default across all pods on a node using cgroups, pods cannot exceed `Allocatable`.
Memory and CPU limits are enforced using cgroups, but there is no easy means to enforce storage limits.
Enforcing storage limits using Linux Quota is not possible since it is not hierarchical.
Once storage is supported as a resource for `Allocatable`, Kubelet has to perform evictions based on `Allocatable` in addition to `Capacity`.
Note that eviction limits are enforced on pods only and system daemons are free to use any amount of resources unless their reservations are enforced.
Here is an example to illustrate Node Allocatable for memory:
Node Capacity is `32Gi`, kube-reserved is `2Gi`, system-reserved is `1Gi`, eviction-hard is set to `<100Mi`
For this node, the effective Node Allocatable is only `28.9Gi`; i.e. if kube and system components use up all their reservation, the memory available for pods is only `28.9Gi` and kubelet will evict pods once overall usage of pods crosses that threshold.
If we enforce Node Allocatable (`28.9Gi`) via top level cgroups, then pods can never exceed `28.9Gi` in which case evictions will not be performed unless kernel memory consumption is above `100Mi`.
In order to support evictions and avoid memcg OOM kills for pods, we will set the top level cgroup limits for pods to be `Node Allocatable` + `Eviction Hard Thresholds`.
However, the scheduler is not expected to use more than `28.9Gi` and so `Node Allocatable` on Node Status will be `28.9Gi`.
If kube and system components do not use up all their reservation, with the above example, pods will face memcg OOM kills from the node allocatable cgroup before kubelet evictions kick in.
To better enforce QoS under this situation, Kubelet will apply the hard eviction thresholds on the node allocatable cgroup as well, if node allocatable is enforced.
The resulting behavior will be the same for user pods.
With the above example, Kubelet will evict pods whenever pods consume more than `28.9Gi`, which is `100Mi` below the `29Gi` memory limit set on the Node Allocatable cgroup.
## General guidelines
System daemons are expected to be treated similarly to `Guaranteed` pods.
System daemons can burst within their bounding cgroups and this behavior needs to be managed as part of kubernetes deployment.
For example, Kubelet can have its own cgroup and share `KubeReserved` resources with the Container Runtime.
However, Kubelet cannot burst and use up all available Node resources if `KubeReserved` is enforced.
Users are advised to be extra careful while enforcing `SystemReserved` reservation since it can lead to critical services being CPU starved or OOM killed on the nodes.
The recommendation is to enforce `SystemReserved` only if a user has profiled their nodes exhaustively to come up with precise estimates.
To begin with, enforce `Allocatable` on `pods` only.
Once adequate monitoring and alerting is in place to track kube daemons, attempt to enforce `KubeReserved` based on heuristics.
More on this in [Phase 2](#phase-2-enforce-allocatable-on-pods).
The resource requirements of kube system daemons will grow over time as more and more features are added.
Over time, the project will attempt to bring down utilization, but that is not a priority as of now.
So expect a drop in `Allocatable` capacity over time.
`Systemd-logind` places ssh sessions under `/user.slice`.
Its usage will not be accounted for on the nodes.
Take resource reservations for `/user.slice` into account while configuring `SystemReserved`.
Ideally `/user.slice` should reside under the `SystemReserved` top level cgroup.
## Recommended Cgroups Setup
Following is the recommended cgroup configuration for Kubernetes nodes.
All OS system daemons are expected to be placed under a top level `SystemReserved` cgroup.
`Kubelet` and `Container Runtime` are expected to be placed under `KubeReserved` cgroup.
The reason for recommending placing the `Container Runtime` under `KubeReserved` is as follows:
1. A container runtime on Kubernetes nodes is not expected to be used outside of the Kubelet.
1. Its resource consumption is tied to the number of pods running on a node.
Note that the hierarchy below recommends having dedicated cgroups for kubelet and the runtime to individually track their usage.
```text
/ (Cgroup Root)
.
+..systemreserved or system.slice (Specified via `--system-reserved-cgroup`; `SystemReserved` enforced here *optionally* by kubelet)
. . .tasks(sshd,udev,etc)
.
.
+..podruntime or podruntime.slice (Specified via `--kube-reserved-cgroup`; `KubeReserved` enforced here *optionally* by kubelet)
. .
. +..kubelet
. . .tasks(kubelet)
. .
. +..runtime
. .tasks(docker-engine, containerd)
.
.
+..kubepods or kubepods.slice (Node Allocatable enforced here by Kubelet)
. .
. +..PodGuaranteed
. . .
. . +..Container1
. . . .tasks(container processes)
. . .
. . +..PodOverhead
. . . .tasks(per-pod processes)
. . ...
. .
. +..Burstable
. . .
. . +..PodBurstable
. . . .
. . . +..Container1
. . . . .tasks(container processes)
. . . +..Container2
. . . . .tasks(container processes)
. . . .
. . . ...
. . .
. . ...
. .
. .
. +..Besteffort
. . .
. . +..PodBesteffort
. . . .
. . . +..Container1
. . . . .tasks(container processes)
. . . +..Container2
. . . . .tasks(container processes)
. . . .
. . . ...
. . .
. . ...
```
`systemreserved` & `kubereserved` cgroups are expected to be created by users.
If Kubelet is creating cgroups for itself and the docker daemon, it will create the `kubereserved` cgroup automatically.
The `kubepods` cgroup will be created by kubelet automatically if it does not already exist.
Creation of `kubepods` cgroup is tied to QoS Cgroup support which is controlled by `--cgroups-per-qos` flag.
If the cgroup driver is set to `systemd` then Kubelet will create a `kubepods.slice` via systemd.
By default, Kubelet will `mkdir` `/kubepods` cgroup directly via cgroupfs.
#### Containerizing Kubelet
If Kubelet is managed using a container runtime, have the runtime create cgroups for kubelet under `kubereserved`.
### Metrics
Kubelet identifies its own cgroup and exposes its usage metrics via the Summary metrics API (`/stats/summary`).
With the docker runtime, kubelet identifies the docker runtime's cgroups too and exposes metrics for them via the Summary metrics API.
To provide a complete overview of a node, Kubelet will expose metrics from cgroups enforcing `SystemReserved`, `KubeReserved` & `Allocatable` too.
## Implementation Phases
### Phase 1 - Introduce Allocatable to the system without enforcement
**Status**: Implemented v1.2
In this phase, Kubelet will support specifying `KubeReserved` & `SystemReserved` resource reservations via kubelet flags.
The defaults for these flags will be `""`, meaning zero cpu or memory reservations.
Kubelet will compute `Allocatable` and update `Node.Status` to include it.
The scheduler will use `Allocatable` instead of `Capacity` if it is available.
### Phase 2 - Enforce Allocatable on Pods
**Status**: Targeted for v1.6
In this phase, Kubelet will automatically create a top level cgroup to enforce Node Allocatable across all user pods.
The creation of this cgroup is controlled by `--cgroups-per-qos` flag.
Kubelet will support specifying the top level cgroups for `KubeReserved` and `SystemReserved` and support *optionally* placing resource restrictions on these top level cgroups.
Users are expected to specify `KubeReserved` and `SystemReserved` based on their deployment requirements.
Resource requirements for Kubelet and the runtime are typically proportional to the number of pods running on a node.
Once a user has identified the maximum pod density for each of their nodes, they will be able to compute `KubeReserved` using [this performance dashboard](http://node-perf-dash.k8s.io/#/builds).
[This blog post](http://blog.kubernetes.io/2016/11/visualize-kubelet-performance-with-node-dashboard.html) explains how the dashboard has to be interpreted.
Note that this dashboard provides usage metrics for docker runtime only as of now.
Support for evictions based on Allocatable will be introduced in this phase.
New flags introduced in this phase are as follows:
1. `--enforce-node-allocatable=[pods][,][kube-reserved][,][system-reserved]`
* This flag will default to `pods` in v1.6.
* This flag will be a `no-op` unless `--kube-reserved` and/or `--system-reserved` has been specified.
* If `--cgroups-per-qos=false`, then this flag has to be set to `""`. Otherwise it is an error and kubelet will fail.
* It is recommended to drain and restart nodes prior to upgrading to v1.6. This is necessary for the `--cgroups-per-qos` feature anyway, which is expected to be turned on by default in `v1.6`.
* Users intending to turn off this feature can set this flag to `""`.
* Specifying `kube-reserved` value in this flag is invalid if `--kube-reserved-cgroup` flag is not specified.
* Specifying `system-reserved` value in this flag is invalid if `--system-reserved-cgroup` flag is not specified.
* By including `kube-reserved` or `system-reserved` in this flag's value, and by specifying the following two flags, Kubelet will attempt to enforce the reservations specified via `--kube-reserved` & `system-reserved` respectively.
2. `--kube-reserved-cgroup=<absolute path to a cgroup>`
* This flag helps kubelet identify the control group managing all kube components like Kubelet & container runtime that fall under the `KubeReserved` reservation.
* Example: `/kube.slice`. Note that absolute paths are required and systemd naming scheme isn't supported.
3. `--system-reserved-cgroup=<absolute path to a cgroup>`
* This flag helps kubelet identify the control group managing all OS specific system daemons that fall under the `SystemReserved` reservation.
* Example: `/system.slice`. Note that absolute paths are required and systemd naming scheme isn't supported.
4. `--experimental-node-allocatable-ignore-eviction-threshold`
* This flag is provided as an `opt-out` option to avoid including Hard eviction thresholds in Node Allocatable which can impact existing clusters.
* The default value is `false`.
#### Rollout details
This phase is expected to improve Kubernetes node stability.
However, it requires users to specify non-default values for the `--kube-reserved` & `--system-reserved` flags.
The rollout of this phase has been long overdue and hence we are attempting to include it in v1.6.
Since `KubeReserved` and `SystemReserved` continue to have `""` as defaults, the node's `Allocatable` does not change automatically.
Since this phase requires node drains (or pod restarts/terminations), it is considered disruptive to users.
To rollback this phase, set `--enforce-node-allocatable` flag to `""` and `--experimental-node-allocatable-ignore-eviction-threshold` to `true`.
The former disables Node Allocatable enforcement on all pods and the latter avoids including hard eviction thresholds in Node Allocatable.
This rollout in v1.6 might cause the following symptoms:
1. If `--kube-reserved` and/or `--system-reserved` flags are also specified, OOM kills of containers and/or evictions of pods. This can happen primarily to `Burstable` and `BestEffort` pods since they can no longer use up all the resources available on the node.
1. Total allocatable capacity in the cluster is reduced, resulting in pods staying `Pending`, because Hard Eviction Thresholds are included in Node Allocatable.
##### Proposed Timeline
```text
02/14/2017 - Discuss the rollout plan in sig-node meeting
02/15/2017 - Flip the switch to enable pod level cgroups by default
02/21/2017 - Merge phase 2 implementation
02/27/2017 - Kubernetes Feature complete (i.e. code freeze)
03/01/2017 - Send an announcement to kubernetes-dev@ about this rollout along with rollback options and potential issues. Recommend users to set kube and system reserved.
03/22/2017 - Kubernetes 1.6 release
```
### Phase 3 - Metrics & support for Storage
*Status*: Targeted for v1.7
In this phase, Kubelet will expose usage metrics for `KubeReserved`, `SystemReserved` and `Allocatable` top level cgroups via Summary metrics API.
`Storage` will also be introduced as a reservable resource in this phase.
## Known Issues
### Kubernetes reservation is smaller than kubernetes component usage
**Solution**: Initially, do nothing (best effort). Let the kubernetes daemons overflow the reserved
resources and hope for the best. If the node usage is less than Allocatable, there will be some room
for overflow and the node should continue to function. If the node has been scheduled to capacity
for overflow and the node should continue to function. If the node has been scheduled to `allocatable`
(worst-case scenario) it may enter an unstable state, which is the current behavior in this
situation.
In the [future](#future-work) we may set a parent cgroup for kubernetes components, with limits set
A recommended alternative is to enforce KubeReserved once Kubelet supports it (Phase 2).
In the future we may set a parent cgroup for kubernetes components, with limits set
according to `KubeReserved`.
### Version discrepancy
**API server / scheduler is not allocatable-resources aware:** If the Kubelet rejects a Pod but the
scheduler expects the Kubelet to accept it, the system could get stuck in an infinite loop
scheduling a Pod onto the node only to have Kubelet repeatedly reject it. To avoid this situation,
we will do a 2-stage rollout of `Allocatable`. In stage 1 (targeted for 1.2), `Allocatable` will
be reported by the Kubelet and the scheduler will be updated to use it, but Kubelet will continue
to do admission checks based on `Capacity` (same as today). In stage 2 of the rollout (targeted
for 1.3 or later), the Kubelet will start doing admission checks based on `Allocatable`.
**API server expects `Allocatable` but does not receive it:** If the kubelet is older and does not
provide `Allocatable` in the `NodeStatus`, then `Allocatable` will be
[defaulted](../../pkg/api/v1/defaults.go) to
`Capacity` (which will yield today's behavior of scheduling based on capacity).
### 3rd party schedulers
The community should be notified that an update to schedulers is recommended, but if a scheduler is
not updated it falls under the above case of "scheduler is not allocatable-resources aware".
## Future work
1. Convert kubelet flags to Config API - Prerequisite to (2). See
[#12245](https://github.com/kubernetes/kubernetes/issues/12245).
2. Set cgroup limits according to KubeReserved - as described in the [overview](#overview)
3. Report kernel usage to be considered with scheduling decisions.
@ -0,0 +1,718 @@
# Pod Preset
* [Abstract](#abstract)
* [Motivation](#motivation)
* [Constraints and Assumptions](#constraints-and-assumptions)
* [Use Cases](#use-cases)
* [Summary](#summary)
* [Prior Art](#prior-art)
* [Objectives](#objectives)
* [Proposed Changes](#proposed-changes)
* [PodPreset API object](#podpreset-api-object)
* [Validations](#validations)
* [AdmissionControl Plug-in: PodPreset](#admissioncontrol-plug-in-podpreset)
* [Behavior](#behavior)
* [Examples](#examples)
* [Simple Pod Spec Example](#simple-pod-spec-example)
* [Pod Spec with `ConfigMap` Example](#pod-spec-with-`configmap`-example)
* [ReplicaSet with Pod Spec Example](#replicaset-with-pod-spec-example)
* [Multiple PodPreset Example](#multiple-podpreset-example)
* [Conflict Example](#conflict-example)
## Abstract
Describes a policy resource that allows for the loose coupling of a Pod's
definition from additional runtime requirements for that Pod. For example,
mounting of Secrets, or setting additional environment variables,
may not be known at Pod deployment time, but may be required at Pod creation
time.
## Motivation
Consuming a service involves more than just connectivity. In addition to
coordinates to reach the service, credentials and non-secret configuration
parameters are typically needed to use the service. The primitives for this
already exist, but a gap exists where loose coupling is desired: it should be
possible to inject pods with the information they need to use a service on a
service-by-service basis, without the pod authors having to incorporate the
information into every pod spec where it is needed.
## Constraints and Assumptions
1. Future work might require new mechanisms to be made to work with existing
controllers such as deployments and replicasets that create pods. Existing
controllers that create pods should recreate their pods when a new Pod Injection
Policy is added that would affect them.
## Use Cases
- As a user, I want to be able to provision a new pod
without needing to know the application configuration primitives of the
services my pod will consume.
- As a cluster admin, I want specific configuration items of a service to be
withheld visibly from a developer deploying a service, but not to block the
developer from shipping.
- As an app developer, I want to provision a Cloud Spanner instance and then
access it from within my Kubernetes cluster.
- As an app developer, I want the Cloud Spanner provisioning process to
configure my Kubernetes cluster so the endpoints and credentials for my
Cloud Spanner instance are implicitly injected into Pods matching a label
selector (without me having to modify the PodSpec to add the specific
Configmap/Secret containing the endpoint/credential data).
**Specific Example:**
1. Database Administrator provisions a MySQL service for their cluster.
2. Database Administrator creates secrets for the cluster containing the
database name, username, and password.
3. Database Administrator creates a `PodPreset` defining the database
port as an environment variable, as well as the secrets. See
[Examples](#examples) below for various examples.
4. Developer of an application can now label their pod with the specified
`Selector` the Database Administrator tells them, and consume the MySQL
database without needing to know any of the details from step 2 and 3.
### Summary
The use case we are targeting is to automatically inject into Pods the
information required to access non-Kubernetes-Services, such as accessing an
instances of Cloud Spanner. Accessing external services such as Cloud Spanner
may require the Pods to have specific credential and endpoint data.
Using a Pod Preset allows pod template authors to not have to explicitly
set information for every pod. This way authors of pod templates consuming a
specific service do not need to know all the details about that service.
### Prior Art
Internally for Kubernetes we already support accessing the Kubernetes api from
all Pods by injecting the credentials and endpoint data automatically - e.g.
injecting the serviceaccount credentials into a volume (via secret) using an
[admission controller](https://github.com/kubernetes/kubernetes/blob/97212f5b3a2961d0b58a20bdb6bda3ccfa159bd7/plugin/pkg/admission/serviceaccount/admission.go),
and injecting the Service endpoints into environment
variables. This is done without the Pod explicitly mounting the serviceaccount
secret.
### Objectives
The goal of this proposal is to generalize these capabilities so we can introduce
similar support for accessing Services running external to the Kubernetes cluster.
We can assume that an appropriate Secret and Configmap have already been created
as part of the provisioning process of the external service. The need then is to
provide a mechanism for injecting the Secret and Configmap into Pods automatically.
The [ExplicitServiceLinks proposal](https://github.com/kubernetes/community/pull/176),
will allow us to decouple where a Service's credential and endpoint information
is stored in the Kubernetes cluster from a Pod's intent to access that Service
(e.g. in declaring it wants to access a Service, a Pod is automatically injected
with the credential and endpoint data required to do so).
## Proposed Changes
### PodPreset API object
This resource is alpha. The policy itself is immutable. The API will be
added to a new group, `settings`, and the version is `v1alpha1`.
```go
// PodPreset is a policy resource that defines additional runtime
// requirements for a Pod.
type PodPreset struct {
unversioned.TypeMeta
ObjectMeta
// +optional
Spec PodPresetSpec
}
// PodPresetSpec is a description of a pod preset.
type PodPresetSpec struct {
// Selector is a label query over a set of resources, in this case pods.
// Required.
Selector unversioned.LabelSelector
// Env defines the collection of EnvVar to inject into containers.
// +optional
Env []EnvVar
// EnvFrom defines the collection of EnvFromSource to inject into
// containers.
// +optional
EnvFrom []EnvFromSource
// Volumes defines the collection of Volume to inject into the pod.
// +optional
Volumes []Volume
// VolumeMounts defines the collection of VolumeMount to inject into
// containers.
// +optional
VolumeMounts []VolumeMount
}
```
#### Validations
In order for the Pod Preset to be valid it must fulfill the
following constraints:
- The `Selector` field must be defined. This is how we know which pods
to inject, so it is required and cannot be empty.
- The policy must define _at least_ 1 of `Env`, `EnvFrom`, or `Volumes` with
corresponding `VolumeMounts`.
- If you define a `Volume`, a corresponding `VolumeMount` must also be defined (see the sketch below).
- For `Env`, `EnvFrom`, `Volumes`, and `VolumeMounts` all existing API
validations are applied.
This resource will be immutable; if you want to change something, you can delete
the old policy and create a new one. We can change this to be mutable in the
future, but by disallowing it now, we will not break people in the future.
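For illustration, a hypothetical PodPreset (its name is made up) that would be rejected by the validations above because it defines a `Volume` without a corresponding `VolumeMount`:
```yaml
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
  name: invalid-preset                # illustrative name
  namespace: myns
spec:
  selector:
    matchLabels:
      role: frontend
  volumes:
  - name: cache-volume                # rejected: no volumeMount references this volume
    emptyDir: {}
```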
#### Conflicts
There are a number of edge conditions that might occur at the time of
injection. These are as follows:
- Merging lists with no conflicts: if a pod already has a `Volume`,
`VolumeMount` or `EnvVar` defined **exactly** as defined in the
PodPreset, no error will occur since they are exactly the same. The
motivation behind this is that services may not have fully converted to using pod
presets yet and may have duplicated information; an error should
obviously not be thrown if the items that need to be injected already exist
and are exactly the same.
- Merging lists with conflicts: if a PodPreset redefines an `EnvVar` or a `Volume`,
an event describing the conflict will be recorded on the pod and
nothing will be injected.
- Conflicts between `Env` and `EnvFrom`: this would result in an error, with an
event on the pod describing the conflict. Nothing would be
injected.
> **Note:** In the case of a conflict nothing will be injected. The entire
> policy is ignored and an event is thrown on the pod detailing the conflict.
### AdmissionControl Plug-in: PodPreset
The **PodPreset** plug-in introspects all incoming pod creation
requests and injects matching pods, based on a `Selector`, with the desired
attributes.
For the initial alpha, the order of precedence for applying multiple
`PodPreset` specs is from oldest to newest. All Pod Injection
Policies in a namespace should be order agnostic; the order of application is
unspecified. Users should ensure that policies do not overlap.
However we can use merge keys to detect some of the conflicts that may occur.
This will not be enabled by default for all clusters, but once GA will be
a part of the set of strongly recommended plug-ins documented
[here](https://kubernetes.io/docs/admin/admission-controllers/#is-there-a-recommended-set-of-plug-ins-to-use).
**Why not an Initializer?**
This will be first implemented as an AdmissionControl plug-in then can be
converted to an Initializer once that is fully ready. The proposal for
Initializers can be found at [kubernetes/community#132](https://github.com/kubernetes/community/pull/132).
#### Behavior
This will modify the pod spec. The supported changes to
`Env`, `EnvFrom`, and `VolumeMounts` apply to the container spec for
all containers in the pod with the specified matching `Selector`. The
changes to `Volumes` apply to the pod spec for all pods matching `Selector`.
The resultant modified pod spec will be annotated to show that it was modified by
the `PodPreset`. This will be of the form
`podpreset.admission.kubernetes.io/<podpreset name>: "<resource version>"`.
*Why modify all containers in a pod?*
Currently there is no concept of labels on specific containers in a pod which
would be necessary for per-container pod injections. We could add labels
for specific containers which would allow this and be the best solution to not
injecting all. Container labels have been discussed various times through
multiple issues and proposals, which all congregate to this thread on the
[kubernetes-sig-node mailing
list](https://groups.google.com/forum/#!topic/kubernetes-sig-node/gijxbYC7HT8).
In the future, even if container labels were added, we would need to be careful
about not making breaking changes to the current behavior.
Other solutions include selecting the container to inject by
matching its name to another field in the `PodPreset` spec, but
this would not scale well and would complicate configuration
management.
In the future we might question whether we need or want containers to express
that they expect injection. At this time we are deferring this issue.
## Examples
### Simple Pod Spec Example
This is a simple example to show how a Pod spec is modified by the Pod
Injection Policy.
**User submitted pod spec:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
ports:
- containerPort: 80
```
**Example Pod Preset:**
```yaml
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
```
**Pod spec after admission controller:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
ports:
- containerPort: 80
env:
- name: DB_PORT
value: 6379
volumes:
- name: cache-volume
emptyDir: {}
```
### Pod Spec with `ConfigMap` Example
This is an example to show how a Pod spec is modified by the Pod Injection
Policy that defines a `ConfigMap` for Environment Variables.
**User submitted pod spec:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
ports:
- containerPort: 80
```
**User submitted `ConfigMap`:**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: etcd-env-config
data:
number_of_members: "1"
initial_cluster_state: new
initial_cluster_token: DUMMY_ETCD_INITIAL_CLUSTER_TOKEN
discovery_token: DUMMY_ETCD_DISCOVERY_TOKEN
discovery_url: http://etcd_discovery:2379
etcdctl_peers: http://etcd:2379
duplicate_key: FROM_CONFIG_MAP
REPLACE_ME: "a value"
```
**Example Pod Preset:**
```yaml
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
- name: duplicate_key
value: FROM_ENV
- name: expansion
value: $(REPLACE_ME)
envFrom:
- configMapRef:
name: etcd-env-config
volumeMounts:
- mountPath: /cache
name: cache-volume
- mountPath: /etc/app/config.json
readOnly: true
name: secret-volume
volumes:
- name: cache-volume
emptyDir: {}
- name: secret-volume
  secret:
    secretName: config-details
```
**Pod spec after admission controller:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
- mountPath: /etc/app/config.json
readOnly: true
name: secret-volume
ports:
- containerPort: 80
env:
- name: DB_PORT
value: 6379
- name: duplicate_key
value: FROM_ENV
- name: expansion
value: $(REPLACE_ME)
envFrom:
- configMapRef:
name: etcd-env-config
volumes:
- name: cache-volume
emptyDir: {}
- name: secret-volume
  secret:
    secretName: config-details
```
### ReplicaSet with Pod Spec Example
The following example shows that only the pod spec is modified by the Pod
Injection Policy.
**User submitted ReplicaSet:**
```yaml
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
name: frontend
spec:
replicas: 3
selector:
matchLabels:
tier: frontend
matchExpressions:
- {key: tier, operator: In, values: [frontend]}
template:
metadata:
labels:
app: guestbook
tier: frontend
spec:
containers:
- name: php-redis
image: gcr.io/google_samples/gb-frontend:v3
resources:
requests:
cpu: 100m
memory: 100Mi
env:
- name: GET_HOSTS_FROM
value: dns
ports:
- containerPort: 80
```
**Example Pod Preset:**
```yaml
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
tier: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
```
**Pod spec after admission controller:**
```yaml
kind: Pod
metadata:
labels:
app: guestbook
tier: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
spec:
containers:
- name: php-redis
image: gcr.io/google_samples/gb-frontend:v3
resources:
requests:
cpu: 100m
memory: 100Mi
volumeMounts:
- mountPath: /cache
name: cache-volume
env:
- name: GET_HOSTS_FROM
value: dns
- name: DB_PORT
value: 6379
ports:
- containerPort: 80
volumes:
- name: cache-volume
emptyDir: {}
```
### Multiple PodPreset Example
This is an example to show how a Pod spec is modified by multiple Pod
Injection Policies.
**User submitted pod spec:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
ports:
- containerPort: 80
```
**Example Pod Preset:**
```yaml
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
```
**Another Pod Preset:**
```yaml
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: proxy
namespace: myns
spec:
selector:
matchLabels:
role: frontend
volumeMounts:
- mountPath: /etc/proxy/configs
name: proxy-volume
volumes:
- name: proxy-volume
emptyDir: {}
```
**Pod spec after admission controller:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
podpreset.admission.kubernetes.io/proxy: "resource version"
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
- mountPath: /etc/proxy/configs
name: proxy-volume
ports:
- containerPort: 80
env:
- name: DB_PORT
value: 6379
volumes:
- name: cache-volume
emptyDir: {}
- name: proxy-volume
emptyDir: {}
```
### Conflict Example
This is an example to show how a Pod spec is not modified by the Pod Injection
Policy when there is a conflict.
**User submitted pod spec:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
ports:
- containerPort: 80
volumes:
- name: cache-volume
emptyDir: {}
```
**Example Pod Preset:**
```yaml
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: other-volume
volumes:
- name: other-volume
emptyDir: {}
```
**Pod spec after admission controller will not change because of the conflict:**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
ports:
- containerPort: 80
volumes:
- name: cache-volume
emptyDir: {}
```
**If we run `kubectl describe...` we can see the event:**
```
$ kubectl describe ...
....
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Tue, 07 Feb 2017 16:56:12 -0700 Tue, 07 Feb 2017 16:56:12 -0700 1 {podpreset.admission.kubernetes.io/allow-database } conflict Conflict on pod preset. Duplicate mountPath /cache.
```

@ -1,114 +1,480 @@
# Pod level resource management in Kubelet
# Kubelet pod level resource management
**Author**: Buddha Prakash (@dubstack), Vishnu Kannan (@vishh)
**Authors**:
**Last Updated**: 06/23/2016
1. Buddha Prakash (@dubstack)
1. Vishnu Kannan (@vishh)
1. Derek Carr (@derekwaynecarr)
**Status**: Draft Proposal (WIP)
**Last Updated**: 02/21/2017
This document proposes a design for introducing pod level resource accounting to Kubernetes, and outlines the implementation and rollout plan.
**Status**: Implementation planned for Kubernetes 1.6
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Pod level resource management in Kubelet](#pod-level-resource-management-in-kubelet)
- [Introduction](#introduction)
- [Non Goals](#non-goals)
- [Motivations](#motivations)
- [Design](#design)
- [Proposed cgroup hierarchy:](#proposed-cgroup-hierarchy)
- [QoS classes](#qos-classes)
- [Guaranteed](#guaranteed)
- [Burstable](#burstable)
- [Best Effort](#best-effort)
- [With Systemd](#with-systemd)
- [Hierarchy Outline](#hierarchy-outline)
- [QoS Policy Design Decisions](#qos-policy-design-decisions)
- [Implementation Plan](#implementation-plan)
- [Top level Cgroups for QoS tiers](#top-level-cgroups-for-qos-tiers)
- [Pod level Cgroup creation and deletion (Docker runtime)](#pod-level-cgroup-creation-and-deletion-docker-runtime)
- [Container level cgroups](#container-level-cgroups)
- [Rkt runtime](#rkt-runtime)
- [Add Pod level metrics to Kubelet's metrics provider](#add-pod-level-metrics-to-kubelets-metrics-provider)
- [Rollout Plan](#rollout-plan)
- [Implementation Status](#implementation-status)
<!-- END MUNGE: GENERATED_TOC -->
This document proposes a design for introducing pod level resource accounting
to Kubernetes. It outlines the implementation and associated rollout plan.
## Introduction
As of now [Quality of Service(QoS)](../../docs/design/resource-qos.md) is not enforced at a pod level. Excepting pod evictions, all the other QoS features are not applicable at the pod level.
To better support QoS, there is a need to add support for pod level resource accounting in Kubernetes.
Kubernetes supports container level isolation by allowing users
to specify [compute resource requirements](resources.md) via requests and
limits on individual containers. The `kubelet` delegates creation of a
cgroup sandbox for each container to its associated container runtime.
We propose to have a unified cgroup hierarchy with pod level cgroups for better resource management. We will have a cgroup hierarchy with top level cgroups for the three QoS classes Guaranteed, Burstable and BestEffort. Pods (and their containers) belonging to a QoS class will be grouped under these top level QoS cgroups. And all containers in a pod are nested under the pod cgroup.
Each pod has an associated [Quality of Service (QoS)](resource-qos.md)
class based on the aggregate resource requirements made by individual
containers in the pod. The `kubelet` has the ability to
[evict pods](kubelet-eviction.md) when compute resources are scarce. It evicts
pods with the lowest QoS class in order to attempt to maintain stability of the
node.
The proposed cgroup hierarchy would allow for more efficient resource management and lead to improvements in node reliability.
This would also allow for significant latency optimizations in terms of pod eviction on nodes with the use of pod level resource usage metrics.
This document provides a basic outline of how we plan to implement and rollout this feature.
The `kubelet` has no associated cgroup sandbox for individual QoS classes or
individual pods. This inhibits the ability to perform proper resource
accounting on the node, and introduces a number of code complexities when
trying to build features around QoS.
This design introduces a new cgroup hierarchy to enable the following:
## Non Goals
1. Enforce QoS classes on the node.
1. Simplify resource accounting at the pod level.
1. Allow containers in a pod to share slack resources within its pod cgroup.
For example, a Burstable pod has two containers, where one container makes a
CPU request and the other container does not. The latter container should
get CPU time not used by the former container. Today, it must compete for
scarce resources at the node level across all BestEffort containers.
1. Ability to charge per container overhead to the pod instead of the node.
This overhead is container runtime specific. For example, `docker` has
an associated `containerd-shim` process that is created for each container
which should be charged to the pod.
1. Ability to charge any memory usage of memory-backed volumes to the pod when
an individual container exits instead of the node.
- Pod level disk accounting will not be tackled in this proposal.
- Pod level resource specification in the Kubernetes API will not be tackled in this proposal.
## Enabling QoS and Pod level cgroups
## Motivations
To enable the new cgroup hierarchy, the operator must enable the
`--cgroups-per-qos` flag. Once enabled, the `kubelet` will start managing
inner nodes of the described cgroup hierarchy.
Kubernetes currently supports container level isolation only and lets users specify resource requests/limits on the containers [Compute Resources](../../docs/design/resources.md). The `kubelet` creates a cgroup sandbox (via it's container runtime) for each container.
The `--cgroup-root` flag if not specified when the `--cgroups-per-qos` flag
is enabled will default to `/`. The `kubelet` will parent any cgroups
it creates below that specified value per the
[node allocatable](node-allocatable.md) design.
## Configuring a cgroup driver
There are a few shortcomings to the current model.
- Existing QoS support does not apply to pods as a whole. On-going work to support pod level eviction using QoS requires all containers in a pod to belong to the same class. By having pod level cgroups, it is easy to track pod level usage and make eviction decisions.
- Infrastructure overhead per pod is currently charged to the node. The overhead of setting up and managing the pod sandbox is currently accounted to the node. If the pod sandbox is a bit expensive, like in the case of hyper, having pod level accounting becomes critical.
- For the docker runtime we have a containerd-shim which is a small library that sits in front of a runtime implementation allowing it to be reparented to init, handle reattach from the caller etc. With pod level cgroups containerd-shim can be charged to the pod instead of the machine.
- If a container exits, all its anonymous pages (tmpfs) gets accounted to the machine (root). With pod level cgroups, that usage can also be attributed to the pod.
- Let containers share resources - with pod level limits, a pod with a Burstable container and a BestEffort container is classified as Burstable pod. The BestEffort container is able to consume slack resources not used by the Burstable container, and still be capped by the overall pod level limits.
The `kubelet` will support manipulation of the cgroup hierarchy on
the host using a cgroup driver. The driver is configured via the
`--cgroup-driver` flag.
## Design
The supported values are the following:
High level requirements for the design are as follows:
- Do not break existing users. Ideally, there should be no changes to the Kubernetes API semantics.
- Support multiple cgroup managers - systemd, cgroupfs, etc.
* `cgroupfs` is the default driver that performs direct manipulation of the
cgroup filesystem on the host in order to manage cgroup sandboxes.
* `systemd` is an alternative driver that manages cgroup sandboxes using
transient slices for resources that are supported by that init system.
How we intend to achieve these high level goals is covered in greater detail in the Implementation Plan.
Depending on the configuration of the associated container runtime,
operators may have to choose a particular cgroup driver to ensure
proper system behavior. For example, if operators use the `systemd`
cgroup driver provided by the `docker` runtime, the `kubelet` must
be configured to use the `systemd` cgroup driver.
We use the following denotations in the sections below:
Implementation of either driver will delegate to the libcontainer library
in opencontainers/runc.
For the three QoS classes
`G⇒ Guaranteed QoS, Bu⇒ Burstable QoS, BE⇒ BestEffort QoS`
### Conversion of cgroupfs to systemd naming conventions
For the value specified for the --qos-memory-overcommitment flag
`qmo⇒ qos-memory-overcommitment`
Internally, the `kubelet` maintains both an abstract and a concrete name
for its associated cgroup sandboxes. The abstract name follows the traditional
`cgroupfs` style syntax. The concrete name is the name for how the cgroup
sandbox actually appears on the host filesystem after any conversions performed
based on the cgroup driver.
Currently the Kubelet highly prioritizes resource utilization and thus allows BE pods to use as much of the resources as they want. And in case of OOM the BE pods are first to be killed. We follow this policy as G pods often don't use the amount of resources they request. By overcommitting the node the BE pods are able to utilize these left-over resources. And in case of OOM the BE pods are evicted by the eviction manager. But there is some latency involved in the pod eviction process which can be a cause of concern in latency-sensitive servers. On such servers we would want to avoid OOM conditions on the node. Pod level cgroups allow us to restrict the amount of resources available to the BE pods. So reserving the requested resources for the G and Bu pods would allow us to avoid invoking the OOM killer.
If the `systemd` cgroup driver is used, the `kubelet` converts the `cgroupfs`
style syntax into transient slices, and as a result, it must follow `systemd`
conventions for path encoding.
For example, the cgroup name `/burstable/pod123-456` is translated to a
transient slice with the name `burstable-pod123_456.slice`. Given how
systemd manages the cgroup filesystem, the concrete name for the cgroup
sandbox becomes `/burstable.slice/burstable-pod123_456.slice`.
We add a flag `qos-memory-overcommitment` to kubelet which would allow users to configure the percentage of memory overcommitment on the node. We have the default as 100, so by default we allow complete overcommitment on the node and let the BE pod use as much memory as it wants, and not reserve any resources for the G and Bu pods. As expected if there is an OOM in such a case we first kill the BE pods before the G and Bu pods.
On the other hand, if a user wants to ensure very predictable tail latency for latency-sensitive servers, they would need to set qos-memory-overcommitment to a really low value (preferably 0). In this case memory resources would be reserved for the G and Bu pods, and BE pods would be able to use only the left-over memory.
## Integration with container runtimes
Examples in the next section.
The `kubelet` when integrating with container runtimes always provides the
concrete cgroup filesystem name for the pod sandbox.
### Proposed cgroup hierarchy:
## Conversion of CPU millicores to cgroup configuration
For the initial implementation we will only support limits for cpu and memory resources.
Kubernetes measures CPU requests and limits in millicores.
#### QoS classes
The following formula is used to convert CPU in millicores to cgroup values:
A pod can belong to one of the following 3 QoS classes: Guaranteed, Burstable, and BestEffort, in decreasing order of priority.
* cpu.shares = (cpu in millicores * 1024) / 1000
* cpu.cfs_period_us = 100000 (i.e. 100ms)
* cpu.cfs_quota_us = quota = (cpu in millicores * 100000) / 1000
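A small Go sketch of those formulas (function names are illustrative; the 100ms CFS period and the integer arithmetic are as stated above):
```go
package main

import "fmt"

const cfsPeriodUs = 100000 // 100ms, the fixed CFS period used above

// milliCPUToShares converts CPU requests in millicores to cpu.shares.
func milliCPUToShares(milliCPU int64) int64 {
	return milliCPU * 1024 / 1000
}

// milliCPUToQuota converts CPU limits in millicores to cpu.cfs_quota_us.
func milliCPUToQuota(milliCPU int64) int64 {
	return milliCPU * cfsPeriodUs / 1000
}

func main() {
	fmt.Println(milliCPUToShares(500)) // 512
	fmt.Println(milliCPUToQuota(500))  // 50000, i.e. 50ms of CPU time per 100ms period
}
```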
#### Guaranteed
## Pod level cgroups
`G` pods will be placed at the `$Root` cgroup by default. `$Root` is the system root, i.e. `/`, by default; if the `--cgroup-root` flag is used then the specified cgroup-root is used as `$Root`. To ensure the Kubelet's idempotent behaviour we follow a pod cgroup naming format that is opaque and deterministic. For example, for a pod with UID `5f9b19c9-3a30-11e6-8eea-28d2444e470d`, the pod cgroup would be named `pod-5f9b19c93a3011e6-8eea28d2444e470d`.
The `kubelet` will create a cgroup sandbox for each pod.
The naming convention for the cgroup sandbox is `pod<pod.UID>`. It enables
the `kubelet` to associate a particular cgroup on the host filesystem
with a corresponding pod without managing any additional state. This is useful
when the `kubelet` restarts and needs to verify the cgroup filesystem.
__Note__: The `cgroup-root` flag would allow the user to configure the root of the QoS cgroup hierarchy. Hence `cgroup-root` would be redefined as the root of the QoS cgroup hierarchy rather than the root of containers.
A pod can belong to one of the following 3 QoS classes in decreasing priority:
1. Guaranteed
1. Burstable
1. BestEffort
The resource configuration for the cgroup sandbox is dependent upon the
pod's associated QoS class.
### Guaranteed QoS
A pod in this QoS class has its cgroup sandbox configured as follows:
```
/PodUID/cpu.quota = cpu limit of Pod
/PodUID/cpu.shares = cpu request of Pod
/PodUID/memory.limit_in_bytes = memory limit of Pod
pod<UID>/cpu.shares = sum(pod.spec.containers.resources.requests[cpu])
pod<UID>/cpu.cfs_quota_us = sum(pod.spec.containers.resources.limits[cpu])
pod<UID>/memory.limit_in_bytes = sum(pod.spec.containers.resources.limits[memory])
```
Example:
### Burstable QoS
A pod in this QoS class has its cgroup sandbox configured as follows:
```
pod<UID>/cpu.shares = sum(pod.spec.containers.resources.requests[cpu])
```
If all containers in the pod specify a cpu limit:
```
pod<UID>/cpu.cfs_quota_us = sum(pod.spec.containers.resources.limits[cpu])
```
Finally, if all containers in the pod specify a memory limit:
```
pod<UID>/memory.limit_in_bytes = sum(pod.spec.containers.resources.limits[memory])
```
### BestEffort QoS
A pod in this QoS class has its cgroup sandbox configured as follows:
```
pod<UID>/cpu.shares = 2
```
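Putting the three classes together, the following hedged Go sketch derives the pod-level values described above from per-container requests and limits. The types and helper are illustrative only; the real kubelet reads these values from the pod spec:
```go
package main

import "fmt"

// Illustrative container resource values in millicores and bytes; a zero
// value means the resource was not specified for that container.
type containerResources struct {
	cpuRequestMilli, cpuLimitMilli int64
	memLimitBytes                  int64
}

type podCgroupConfig struct {
	cpuShares     int64
	cpuQuotaUs    int64 // 0 means "no quota applied"
	memLimitBytes int64 // 0 means "no limit applied"
}

// podCgroupValues derives the pod-level cgroup settings described above:
// cpu.shares from the sum of requests, and a cpu quota / memory limit from
// the sum of limits only when every container specifies one.
func podCgroupValues(containers []containerResources) podCgroupConfig {
	var cfg podCgroupConfig
	var cpuLimitSum, memLimitSum int64
	allCPULimits, allMemLimits := true, true
	for _, c := range containers {
		cfg.cpuShares += c.cpuRequestMilli * 1024 / 1000
		if c.cpuLimitMilli == 0 {
			allCPULimits = false
		} else {
			cpuLimitSum += c.cpuLimitMilli
		}
		if c.memLimitBytes == 0 {
			allMemLimits = false
		} else {
			memLimitSum += c.memLimitBytes
		}
	}
	if cfg.cpuShares < 2 {
		cfg.cpuShares = 2 // BestEffort pods specify no requests: kernel minimum shares
	}
	if allCPULimits {
		cfg.cpuQuotaUs = cpuLimitSum * 100000 / 1000
	}
	if allMemLimits {
		cfg.memLimitBytes = memLimitSum
	}
	return cfg
}

func main() {
	burstable := []containerResources{
		{cpuRequestMilli: 20, cpuLimitMilli: 150, memLimitBytes: 3 << 30},
	}
	fmt.Printf("%+v\n", podCgroupValues(burstable))
}
```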
## QoS level cgroups
The `kubelet` defines a `--cgroup-root` flag that is used to specify the `ROOT`
node in the cgroup hierarchy below which the `kubelet` should manage individual
cgroup sandboxes. It is strongly recommended that users keep the default
value for `--cgroup-root` as `/` in order to avoid deep cgroup hierarchies. The
`kubelet` creates a cgroup sandbox under the specified path `ROOT/kubepods` per
[node allocatable](node-allocatable.md) to parent pods. For simplicity, we will
refer to `ROOT/kubepods` as `ROOT` in this document.
The `ROOT` cgroup sandbox is used to parent all pod sandboxes that are in
the Guaranteed QoS class. By definition, pods in this class have cpu and
memory limits specified that are equivalent to their requests so the pod
level cgroup sandbox confines resource consumption without the need of an
additional cgroup sandbox for the tier.
When the `kubelet` launches, it will ensure a `Burstable` cgroup sandbox
and a `BestEffort` cgroup sandbox exist as children of `ROOT`. These cgroup
sandboxes will parent pod level cgroups in those associated QoS classes.
The `kubelet` highly prioritizes resource utilization, and thus
allows BestEffort and Burstable pods to potentially consume as many
resources as are presently available on the node.
For compressible resources like CPU, the `kubelet` attempts to mitigate
the issue via its use of CPU CFS shares. When there is contention, CPU time
is proportioned dynamically using CFS shares, which attempt to ensure that
minimum requests are satisfied.
For incompressible resources, this prioritization scheme can inhibit the
ability of a pod to have its requests satisfied. For example, a Guaranteed
pod's memory request may not be satisfied if there are active BestEffort
pods consuming all available memory.
As a node operator, I may want to satisfy the following use cases:
1. I want to prioritize access to compressible resources for my system
and/or kubernetes daemons over end-user pods.
1. I want to prioritize access to compressible resources for my Guaranteed
workloads over my Burstable workloads.
1. I want to prioritize access to compressible resources for my Burstable
workloads over my BestEffort workloads.
Almost all operators are encouraged to support the first use case by enforcing
[node allocatable](node-allocatable.md) via `--system-reserved` and `--kube-reserved`
flags. It is understood that not all operators may feel the need to extend
that level of reservation to Guaranteed and Burstable workloads if they choose
to prioritize utilization. That said, many users in the community deploy
cluster services as Guaranteed or Burstable workloads via a `DaemonSet` and would like a similar
resource reservation model as is provided via [node allocatable](node-allocatable.md)
for system and kubernetes daemons.
For operators that have this concern, the `kubelet` with opt-in configuration
will attempt to limit the ability of a pod in a lower QoS tier to burst utilization
of a compressible resource that was requested by a pod in a higher QoS tier.
The `kubelet` will support a flag `experimental-qos-reserved` that
takes a set of percentages per incompressible resource that controls how the
QoS cgroup sandbox attempts to reserve resources for its tier. It attempts
to reserve requested resources to exclude pods from lower QoS classes from
using resources requested by higher QoS classes. The flag will accept values
in a range from 0-100%, where a value of `0%` instructs the `kubelet` to attempt
no reservation, and a value of `100%` will instruct the `kubelet` to attempt to
reserve the sum of requested resource across all pods on the node. The `kubelet`
initially will only support `memory`. The default value per incompressible
resource if not specified is for no reservation to occur for the incompressible
resource.
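For illustration, a sketch of how such a flag value could be parsed (this is not the kubelet's actual flag code; it simply models the described `resource=percentage` semantics):
```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseQOSReserved parses a value such as "memory=50%" (or a comma-separated
// list of resource=percentage pairs) into a map of reserve percentages.
func parseQOSReserved(value string) (map[string]int64, error) {
	reserved := map[string]int64{}
	if value == "" {
		return reserved, nil // default: no reservation for any resource
	}
	for _, pair := range strings.Split(value, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid pair %q, expected resource=percentage", pair)
		}
		pct, err := strconv.ParseInt(strings.TrimSuffix(kv[1], "%"), 10, 64)
		if err != nil || pct < 0 || pct > 100 {
			return nil, fmt.Errorf("percentage in %q must be an integer between 0 and 100", pair)
		}
		reserved[kv[0]] = pct
	}
	return reserved, nil
}

func main() {
	fmt.Println(parseQOSReserved("memory=100%"))
}
```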
Prior to starting a pod, the `kubelet` will attempt to update the
QoS cgroup sandbox associated with the lower QoS tier(s) in order
to prevent consumption of the requested resource by the new pod.
For example, prior to starting a Guaranteed pod, the Burstable
and BestEffort QoS cgroup sandboxes are adjusted. For resource
specific details, and concerns, see the sections per resource that
follow.
The `kubelet` will allocate resources to the QoS level cgroup
dynamically in response to the following events:
1. kubelet startup/recovery
1. prior to creation of the pod level cgroup
1. after deletion of the pod level cgroup
1. at periodic intervals, via a heuristic that converges to the desired
state implied by `experimental-qos-reserved`.
All writes to the QoS level cgroup sandboxes are protected via a
common lock in the kubelet to ensure we do not have multiple concurrent
writers to this tier in the hierarchy.
### QoS level CPU allocation
The `BestEffort` cgroup sandbox is statically configured as follows:
```
ROOT/besteffort/cpu.shares = 2
```
This ensures that allocation of CPU time to pods in this QoS class
is given the lowest priority.
The `Burstable` cgroup sandbox CPU share allocation is dynamic based
on the set of pods currently scheduled to the node.
```
ROOT/burstable/cpu.shares = max(sum(Burstable pods cpu requests), 2)
```
The Burstable cgroup sandbox is updated dynamically in the exit
points described in the previous section. Given the compressible
nature of CPU, and the fact that cpu.shares are evaluated via relative
priority, the risk of an update being incorrect is minimized as the `kubelet`
converges to a desired state. Failure to set `cpu.shares` at the QoS level
cgroup would result in `500m` of cpu for a Guaranteed pod having a different
meaning than `500m` of cpu for a Burstable pod in the current hierarchy. This
is because the default `cpu.shares` value if unspecified is `1024` and `cpu.shares`
are evaluated relative to sibling nodes in the cgroup hierarchy. As a consequence,
all of the Burstable pods under contention would have a relative priority of 1 cpu
unless updated dynamically to capture the sum of requests. For this reason,
we will always set `cpu.shares` for the QoS level sandboxes
by default as part of roll-out for this feature.
### QoS level memory allocation
By default, no memory limits are applied to the BestEffort
and Burstable QoS level cgroups unless an `--experimental-qos-reserved` value
is specified for memory.
The heuristic that is applied is as follows for each QoS level sandbox:
```
ROOT/burstable/memory.limit_in_bytes =
Node.Allocatable - {(summation of memory requests of `Guaranteed` pods)*(reservePercent / 100)}
ROOT/besteffort/memory.limit_in_bytes =
Node.Allocatable - {(summation of memory requests of all `Guaranteed` and `Burstable` pods)*(reservePercent / 100)}
```
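A sketch of that heuristic in Go, with all values in bytes and `reservePercent` being the percentage parsed from the flag (names are illustrative):
```go
package main

import "fmt"

// qosMemoryLimits applies the heuristic above: the Burstable sandbox is capped
// by allocatable minus the reserved fraction of Guaranteed requests, and the
// BestEffort sandbox by allocatable minus the reserved fraction of Guaranteed
// plus Burstable requests.
func qosMemoryLimits(allocatable, guaranteedRequests, burstableRequests, reservePercent int64) (burstableLimit, bestEffortLimit int64) {
	burstableLimit = allocatable - guaranteedRequests*reservePercent/100
	bestEffortLimit = allocatable - (guaranteedRequests+burstableRequests)*reservePercent/100
	return burstableLimit, bestEffortLimit
}

func main() {
	const gi = 1 << 30
	bu, be := qosMemoryLimits(16*gi, 5*gi, 2*gi, 100)
	fmt.Printf("burstable: %d Gi, besteffort: %d Gi\n", bu/gi, be/gi)
	// With 100% reservation: burstable is capped at 11 Gi, besteffort at 9 Gi.
}
```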
A value of `--experimental-qos-reserved=memory=100%` will cause the
`kubelet` to prevent the Burstable and BestEffort cgroups from consuming memory
that was requested by a higher QoS class. This increases the risk
of inducing OOM on BestEffort and Burstable workloads in favor of increasing
memory resource guarantees for Guaranteed and Burstable workloads. A value of
`--experimental-qos-reserved=memory=0%` will allow a Burstable
and BestEffort QoS sandbox to consume up to the full node allocatable amount if
available, but increases the risk that a Guaranteed workload will not have
access to requested memory.
Since memory is an incompressible resource, it is possible that a QoS
level cgroup sandbox may not be able to reduce memory usage below the
value specified in the heuristic described earlier during pod admission
and pod termination.
As a result, the `kubelet` runs a periodic thread to attempt to converge
to this desired state from the above heuristic. If unreclaimable memory
usage has exceeded the desired limit for the sandbox, the `kubelet` will
attempt to set the effective limit near the current usage to put pressure
on the QoS cgroup sandbox and prevent further consumption.
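A minimal sketch of that convergence step, assuming a small amount of headroom above current usage (the headroom value is an assumption, not a kubelet constant):
```go
package main

import "fmt"

// effectiveQoSMemoryLimit captures the convergence step described above: if
// current (unreclaimable) usage already exceeds the desired limit, set the
// effective limit just above current usage to stop further growth, and let
// the periodic reconciler ratchet it down toward the desired value later.
func effectiveQoSMemoryLimit(desiredLimit, currentUsage int64) int64 {
	const headroom = 4 << 20 // 4Mi of slack above current usage (assumed)
	if currentUsage > desiredLimit {
		return currentUsage + headroom
	}
	return desiredLimit
}

func main() {
	fmt.Println(effectiveQoSMemoryLimit(9<<30, 10<<30)) // usage over desired: cap near usage
	fmt.Println(effectiveQoSMemoryLimit(9<<30, 6<<30))  // usage under desired: apply desired
}
```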
The `kubelet` will not wait for the QoS cgroup memory limit to converge
to the desired state prior to execution of the pod, but it will always
attempt to cap the existing usage of QoS cgroup sandboxes in lower tiers.
This does mean that the new pod could induce an OOM event at the `ROOT`
cgroup, but ideally per our QoS design, the oom_killer targets a pod
in a lower QoS class, or eviction evicts a lower QoS pod. The periodic
task is then able to converge to the steady desired state so any future
pods in a lower QoS class do not impact the pod at a higher QoS class.
Adjusting the memory limits for the QoS level cgroup sandbox carries
greater risk given the incompressible nature of memory. As a result,
we are not enabling this function by default, but would like operators
that want to value resource priority over resource utilization to gather
real-world feedback on its utility.
As a best practice, operators that want to provide a similar resource
reservation model for Guaranteed pods as we offer via enforcement of
node allocatable are encouraged to schedule their Guaranteed pods first
as it will ensure the Burstable and BestEffort tiers have had their QoS
memory limits appropriately adjusted before the node takes on unbounded
workloads.
## Memory backed volumes
The pod level cgroup ensures that any writes to a memory backed volume
are correctly charged to the pod sandbox even when a container process
in the pod restarts.
All memory backed volumes are removed when a pod reaches a terminal state.
The `kubelet` verifies that a pod's cgroup is deleted from the
host before deleting a pod from the API server as part of the graceful
deletion process.
## Log basic cgroup management
The `kubelet` will log and collect metrics associated with cgroup manipulation.
It will log metrics for cgroup create, update, and delete actions.
## Rollout Plan
### Kubernetes 1.5
The support for the described cgroup hierarchy is experimental.
### Kubernetes 1.6+
The feature will be enabled by default.
As a result, we will recommend that users drain their nodes prior
to upgrade of the `kubelet`. If users do not drain their nodes, the
`kubelet` will act as follows:
1. If a pod has a `RestartPolicy=Never`, then mark the pod
as `Failed` and terminate its workload.
1. All other pods that are not parented by a pod-level cgroup
will be restarted.
The `cgroups-per-qos` flag will be enabled by default, but users
may choose to opt out. We may deprecate this opt-out mechanism
in Kubernetes 1.7, and remove the flag entirely in Kubernetes 1.8.
#### Risk Assessment
The impact of the unified cgroup hierarchy is restricted to the `kubelet`.
Potential issues:
1. Bugs
1. Performance and/or reliability issues for `BestEffort` pods. This is
most likely to appear on E2E test runs that mix/match pods across different
QoS tiers.
1. User misconfiguration; most notably the `--cgroup-driver` needs to match
the expected behavior of the container runtime. We provide clear errors
in `kubelet` logs for container runtimes that we include in tree.
#### Proposed Timeline
* 01/31/2017 - Discuss the rollout plan in sig-node meeting
* 02/14/2017 - Flip the switch to enable pod level cgroups by default
* enable existing experimental behavior by default
* 02/21/2017 - Assess impacts based on enablement
* 02/27/2017 - Kubernetes Feature complete (i.e. code freeze)
* opt-in behavior surrounding the feature (`experimental-qos-reserved` support) completed.
* 03/01/2017 - Send an announcement to kubernetes-dev@ about the rollout and potential impact
* 03/22/2017 - Kubernetes 1.6 release
* TBD (1.7?) - Eliminate the option to not use the new cgroup hierarchy.
This is based on the tentative timeline of kubernetes 1.6 release. Need to work out the timeline with the 1.6 release czar.
## Future enhancements
### Add Pod level metrics to Kubelet's metrics provider
Update the `kubelet` metrics provider to include pod level metrics.
### Evaluate supporting evictions local to QoS cgroup sandboxes
Rather than induce eviction at `/` or `/kubepods`, evaluate supporting
eviction decisions for the unbounded QoS tiers (Burstable, BestEffort).
## Examples
The following describes the cgroup representation of a node with pods
across multiple QoS classes.
### Cgroup Hierarchy
The following identifies a sample hierarchy based on the described design.
For clarity, it assumes the flag `--experimental-qos-reserved` is not enabled.
```
$ROOT
|
+- Pod1
| |
| +- Container1
| +- Container2
| ...
+- Pod2
| +- Container3
| ...
+- ...
|
+- burstable
| |
| +- Pod3
| | |
| | +- Container4
| | ...
| +- Pod4
| | +- Container5
| | ...
| +- ...
|
+- besteffort
| |
| +- Pod5
| | |
| | +- Container6
| | +- Container7
| | ...
| +- ...
```
### Guaranteed Pods
We have two pods Pod1 and Pod2 having Pod Spec given below
```yaml
@ -142,32 +508,19 @@ spec:
memory: 2Gii
```
Pod1 and Pod2 are both classified as `G` and are nested under the `Root` cgroup.
Pod1 and Pod2 are both classified as Guaranteed and are nested under the `ROOT` cgroup.
```
/Pod1/cpu.quota = 110m
/Pod1/cpu.shares = 110m
/Pod2/cpu.quota = 20m
/Pod2/cpu.shares = 20m
/Pod1/memory.limit_in_bytes = 3Gi
/Pod2/memory.limit_in_bytes = 2Gi
/ROOT/Pod1/cpu.quota = 110m
/ROOT/Pod1/cpu.shares = 110m
/ROOT/Pod1/memory.limit_in_bytes = 3Gi
/ROOT/Pod2/cpu.quota = 20m
/ROOT/Pod2/cpu.shares = 20m
/ROOT/Pod2/memory.limit_in_bytes = 2Gi
```
#### Burstable
#### Burstable Pods
We have the following resource parameters for the `Bu` cgroup.
```
/Bu/cpu.shares = summation of cpu requests of all Bu pods
/Bu/PodUID/cpu.quota = Pod Cpu Limit
/Bu/PodUID/cpu.shares = Pod Cpu Request
/Bu/memory.limit_in_bytes = Allocatable - {(summation of memory requests/limits of `G` pods)*(1-qom/100)}
/Bu/PodUID/memory.limit_in_bytes = Pod memory limit
```
`Note: For the `Bu` QoS when limits are not specified for any one of the containers, the Pod limit defaults to the node resource allocatable quantity.`
Example:
We have two pods Pod3 and Pod4 having Pod Spec given below:
```yaml
@ -207,33 +560,23 @@ spec:
memory: 1Gi
```
Pod3 and Pod4 are both classified as `Bu` and are hence nested under the Bu cgroup
And for `qom` = 0
Pod3 and Pod4 are both classified as Burstable and are hence nested under
the Burstable cgroup.
```
/Bu/cpu.shares = 30m
/Bu/Pod3/cpu.quota = 150m
/Bu/Pod3/cpu.shares = 20m
/Bu/Pod4/cpu.quota = 20m
/Bu/Pod4/cpu.shares = 10m
/Bu/memory.limit_in_bytes = Allocatable - 5Gi
/Bu/Pod3/memory.limit_in_bytes = 3Gi
/Bu/Pod4/memory.limit_in_bytes = 2Gi
/ROOT/burstable/cpu.shares = 30m
/ROOT/burstable/memory.limit_in_bytes = Allocatable - 5Gi
/ROOT/burstable/Pod3/cpu.quota = 150m
/ROOT/burstable/Pod3/cpu.shares = 20m
/ROOT/burstable/Pod3/memory.limit_in_bytes = 3Gi
/ROOT/burstable/Pod4/cpu.quota = 20m
/ROOT/burstable/Pod4/cpu.shares = 10m
/ROOT/burstable/Pod4/memory.limit_in_bytes = 2Gi
```
#### Best Effort
#### Best Effort pods
For pods belonging to the `BE` QoS we don't set any quota.
```
/BE/cpu.shares = 2
/BE/cpu.quota= not set
/BE/memory.limit_in_bytes = Allocatable - {(summation of memory requests of all `G` and `Bu` pods)*(1-qom/100)}
/BE/PodUID/memory.limit_in_bytes = no limit
```
Example:
We have a pod 'Pod5' having Pod Spec given below:
We have a pod, Pod5, having Pod Spec given below:
```yaml
kind: Pod
@ -247,170 +590,12 @@ spec:
resources:
```
Pod5 is classified as `BE` and is hence nested under the BE cgroup
And for `qom` = 0
Pod5 is classified as BestEffort and is hence nested under the BestEffort cgroup
```
/BE/cpu.shares = 2
/BE/cpu.quota= not set
/BE/memory.limit_in_bytes = Allocatable - 7Gi
/BE/Pod5/memory.limit_in_bytes = no limit
/ROOT/besteffort/cpu.shares = 2
/ROOT/besteffort/cpu.quota= not set
/ROOT/besteffort/memory.limit_in_bytes = Allocatable - 7Gi
/ROOT/besteffort/Pod5/memory.limit_in_bytes = no limit
```
### With Systemd
In systemd we have slices for the three top level QoS classes. Further, each pod is a subslice of exactly one of the three QoS slices. Each container in a pod belongs to a scope nested under the qosclass-pod slice.
Example: We plan to have the following cgroup hierarchy on systemd systems
```
/memory/G-PodUID.slice/containerUID.scope
/cpu,cpuacct/G-PodUID.slice/containerUID.scope
/memory/Bu.slice/Bu-PodUID.slice/containerUID.scope
/cpu,cpuacct/Bu.slice/Bu-PodUID.slice/containerUID.scope
/memory/BE.slice/BE-PodUID.slice/containerUID.scope
/cpu,cpuacct/BE.slice/BE-PodUID.slice/containerUID.scope
```
### Hierarchy Outline
- "$Root" is the system root of the node i.e. "/" by default and if `--cgroup-root` is specified then the specified cgroup-root is used as "$Root".
- We have a top level QoS cgroup for the `Bu` and `BE` QoS classes.
- But we __don't__ have a separate cgroup for the `G` QoS class. `G` pod cgroups are brought up directly under the `$Root` cgroup.
- Each pod has its own cgroup which is nested under the cgroup matching the pod's QoS class.
- All containers brought up by the pod are nested under the pod's cgroup.
- system-reserved cgroup contains the system specific processes.
- kube-reserved cgroup contains the kubelet specific daemons.
```
$ROOT
|
+- Pod1
| |
| +- Container1
| +- Container2
| ...
+- Pod2
| +- Container3
| ...
+- ...
|
+- Bu
| |
| +- Pod3
| | |
| | +- Container4
| | ...
| +- Pod4
| | +- Container5
| | ...
| +- ...
|
+- BE
| |
| +- Pod5
| | |
| | +- Container6
| | +- Container7
| | ...
| +- ...
|
+- System-reserved
| |
| +- system
| +- docker (optional)
| +- ...
|
+- Kube-reserved
| |
| +- kubelet
| +- docker (optional)
| +- ...
|
```
#### QoS Policy Design Decisions
- This hierarchy highly prioritizes resource guarantees to `G` pods over `Bu` and `BE` pods.
- By not having a separate cgroup for the `G` class, the hierarchy allows the `G` pods to burst and utilize all of Node's Allocatable capacity.
- The `BE` and `Bu` pods are strictly restricted from bursting and hogging resources and thus `G` Pods are guaranteed resource isolation.
- `BE` pods are treated as lowest priority. So for the `BE` QoS cgroup we set cpu shares to the lowest possible value, i.e. 2. This ensures that the `BE` containers get a relatively small share of cpu time.
- Also, we don't set any quota on the cpu resources, as the containers in `BE` pods can use any amount of free resources on the node.
- Having the memory limit of the `BE` cgroup as (Allocatable - summation of memory requests of `G` and `Bu` pods) would result in `BE` pods becoming more susceptible to being OOM killed. As more `G` and `Bu` pods are scheduled, the kubelet will more likely kill `BE` pods, even if the `G` and `Bu` pods are using less than their request, since we will be dynamically reducing the size of the `BE` memory.limit_in_bytes. But this allows for better memory guarantees to the `G` and `Bu` pods.
## Implementation Plan
The implementation plan is outlined in the next sections.
We will have an `experimental-cgroups-per-qos` flag to specify whether the user wants to use the QoS based cgroup hierarchy. The flag would be set to false by default, at least in v1.5.
#### Top level Cgroups for QoS tiers
Two top level cgroups for the `Bu` and `BE` QoS classes are created when the Kubelet starts to run on a node. All `G` pod cgroups are by default nested under the `Root`, so we don't create a top level cgroup for the `G` class. For raw cgroup systems we would use libcontainer's cgroup manager for general cgroup management (cgroup creation/destruction). But for systemd we don't have equivalent support for slice management in libcontainer yet, so we will be adding support for it in the Kubelet. These cgroups are only created once on Kubelet initialization as a part of node setup. Also, on systemd these cgroups are transient units and will not survive a reboot.
#### Pod level Cgroup creation and deletion (Docker runtime)
- When a new pod is brought up, its QoS class is first determined.
- We add an interface to Kubelet's ContainerManager to create and delete pod level cgroups under the cgroup that matches the pod's QoS class.
- This interface will be pluggable. Kubelet will support both systemd and raw cgroups based __cgroup__ drivers. We will be using the --cgroup-driver flag proposed in the [Systemd Node Spec](kubelet-systemd.md) to specify the cgroup driver.
- We inject creation and deletion of pod level cgroups into the pod workers.
- As new pods are added, the QoS class cgroup parameters are updated to match the resource requests of the pods.
#### Container level cgroups
Have docker manager create container cgroups under pod level cgroups. With the docker runtime, we will pass --cgroup-parent using the syntax expected for the corresponding cgroup-driver the runtime was configured to use.
#### Rkt runtime
We want to have rkt create pods under a root QoS class that kubelet specifies, and set pod level cgroup parameters mentioned in this proposal by itself.
#### Add Pod level metrics to Kubelet's metrics provider
Update Kubelet's metrics provider to include Pod level metrics. Use cAdvisor's cgroup subsystem information to determine various Pod level usage metrics.
`Note: Changes to cAdvisor might be necessary.`
## Rollout Plan
This feature will be opt-in in v1.4 and opt-out in v1.5. We recommend that users drain their nodes and opt in before switching to v1.5, so that the rollout of the v1.5 kubelet is a no-op.
## Implementation Status
The implementation goals of the first milestone are outlined below.
- [x] Finalize and submit Pod Resource Management proposal for the project #26751
- [x] Refactor qos package to be used globally throughout the codebase #27749 #28093
- [x] Add interfaces for CgroupManager and CgroupManagerImpl which implements the CgroupManager interface and creates, destroys/updates cgroups using the libcontainer cgroupfs driver. #27755 #28566
- [x] Inject top level QoS Cgroup creation in the Kubelet and add e2e tests to test that behaviour. #27853
- [x] Add PodContainerManagerImpl Create and Destroy methods which implements the respective PodContainerManager methods using a cgroupfs driver. #28017
- [x] Have docker manager create container cgroups under pod level cgroups. Inject creation and deletion of pod cgroups into the pod workers. Add e2e tests to test this behaviour. #29049
- [x] Add support for updating policy for the pod cgroups. Add e2e tests to test this behaviour. #29087
- [ ] Enabling 'cgroup-per-qos' flag in Kubelet: The user is expected to drain the node and restart it before enabling this feature, but as a fallback we also want to allow the user to just restart the kubelet with the cgroup-per-qos flag enabled to use this feature. As a part of this we need to figure out a policy for pods having Restart Policy: Never. More details in this [issue](https://github.com/kubernetes/kubernetes/issues/29946).
- [ ] Removing a terminated pod's cgroup: We need to clean up the pod's cgroup once the pod is terminated. More details in this [issue](https://github.com/kubernetes/kubernetes/issues/29927).
- [ ] Kubelet needs to ensure that the cgroup settings are what the kubelet expects them to be. If security is not of concern, one can assume that once kubelet applies cgroups setting successfully, the values will never change unless kubelet changes it. If security is of concern, then kubelet will have to ensure that the cgroup values meet its requirements and then continue to watch for updates to cgroups via inotify and re-apply cgroup values if necessary.
Updating QoS limits needs to happen before pod cgroups values are updated. When pod cgroups are being deleted, QoS limits have to be updated after pod cgroup values have been updated for deletion or pod cgroups have been removed. Given that kubelet doesn't have any checkpoints and updates to QoS and pod cgroups are not atomic, kubelet needs to reconcile cgroups status whenever it restarts to ensure that the cgroups values match kubelet's expectation.
- [ ] [TEST] Opting in for this feature and rollbacks should be accompanied by detailed error message when killing pod intermittently.
- [ ] Add a systemd implementation for Cgroup Manager interface
Other smaller work items that would be good to have before the release of this feature.
- [ ] Add Pod UID to the downward api which will help simplify the e2e testing logic.
- [ ] Check if parent cgroups exist and error out if they don't.
- [ ] Set top level cgroup limit to resource allocatable until we support QoS level cgroup updates. If cgroup root is not `/` then set node resource allocatable as the cgroup resource limits on cgroup root.
- [ ] Add a NodeResourceAllocatableProvider which returns the amount of allocatable resources on the nodes. This interface would be used both by the Kubelet and ContainerManager.
- [ ] Add top level feasibility check to ensure that pod can be admitted on the node by estimating left over resources on the node.
- [ ] Log basic cgroup management, i.e. creation/deletion metrics
To better support our requirements we needed to make some changes/add features to libcontainer as well:
- [x] Allowing or denying all devices by writing 'a' to devices.allow or devices.deny is
not possible once the devices cgroup has children. Libcontainer doesn't have the option of skipping updates on the parent devices cgroup. opencontainers/runc/pull/958
- [x] To use libcontainer for creating and managing cgroups in the Kubelet, I would like to just create a cgroup with no pid attached and if need be apply a pid to the cgroup later on. But libcontainer did not support cgroup creation without attaching a pid. opencontainers/runc/pull/956
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-resource-management.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -0,0 +1,407 @@
# Pod Safety, Consistency Guarantees, and Storage Implications
@smarterclayton @bprashanth
October 2016
## Proposal and Motivation
A pod represents the finite execution of one or more related processes on the
cluster. In order to ensure higher level consistent controllers can safely
build on top of pods, the exact guarantees around its lifecycle on the cluster
must be clarified, and it must be possible for higher order controllers
and application authors to correctly reason about the lifetime of those
processes and their access to cluster resources in a distributed computing
environment.
To run most clustered software on Kubernetes, it must be possible to guarantee
**at most once** execution of a particular pet pod at any time on the cluster.
This allows the controller to prevent multiple processes having access to
shared cluster resources believing they are the same entity. When a node
containing a pet is partitioned, the Pet Set must remain consistent (no new
entity will be spawned) but may become unavailable (cluster no longer has
a sufficient number of members). The Pet Set guarantee must be strong enough
for an administrator to reason about the state of the cluster by observing
the Kubernetes API.
In order to reconcile partitions, an actor (human or automated) must decide
when the partition is unrecoverable. The actor may be informed of the failure
in an unambiguous way (e.g. the node was destroyed by a meteor) allowing for
certainty that the processes on that node are terminated, and thus may
resolve the partition by deleting the node and the pods on the node.
Alternatively, the actor may take steps to ensure the partitioned node
cannot return to the cluster or access shared resources - this is known
as **fencing** and is a well understood domain.
This proposal covers the changes necessary to ensure:
* Pet Sets can ensure **at most one** semantics for each individual pet
* Other system components such as the node and namespace controller can
safely perform their responsibilities without violating that guarantee
* An administrator or higher level controller can signal that a node
partition is permanent, allowing the Pet Set controller to proceed.
* A fencing controller can take corrective action automatically to heal
partitions
We will accomplish this by:
* Clarifying which components are allowed to force delete pods (as opposed
to merely requesting termination)
* Ensuring system components can observe partitioned pods and nodes
correctly
* Defining how a fencing controller could safely interoperate with
partitioned nodes and pods to safely heal partitions
* Describing how shared storage components without innate safety
guarantees can be safely shared on the cluster.
### Current Guarantees for Pod lifecycle
The existing pod model provides the following guarantees:
* A pod is executed on exactly one node
* A pod has the following lifecycle phases:
* Creation
* Scheduling
* Execution
* Init containers
* Application containers
* Termination
* Deletion
* A pod can only move through its phases in order, and may not return
to an earlier phase.
* A user may specify an interval on the pod called the **termination
grace period** that defines the minimum amount of time the pod will
have to complete the termination phase, and all components will honor
this interval.
* Once a pod begins termination, its termination grace period can only
be shortened, not lengthened.
Pod termination is divided into the following steps:
* A component requests the termination of the pod by issuing a DELETE
to the pod resource with an optional **grace period**
* If no grace period is provided, the default from the pod is leveraged
* When the kubelet observes the deletion, it starts a timer equal to the
grace period and performs the following actions:
* Executes the pre-stop hook, if specified, waiting up to **grace period**
seconds before continuing
* Sends the termination signal to the container runtime (SIGTERM or the
container image's STOPSIGNAL on Docker)
* Waits 2 seconds, or the remaining grace period, whichever is longer
* Sends the force termination signal to the container runtime (SIGKILL)
* Once the kubelet observes the container is fully terminated, it issues
a status update to the REST API for the pod indicating termination, then
issues a DELETE with grace period = 0.
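For illustration, requesting graceful termination from a client looks like the following hedged sketch. It uses a current client-go signature (which postdates this proposal); the namespace and pod name are placeholders:
```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig; error handling kept minimal.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Request graceful termination with a 30 second grace period. A grace
	// period of 0 would be the "force delete" discussed below, which removes
	// the API object without waiting for the kubelet's confirmation.
	grace := int64(30)
	err = client.CoreV1().Pods("default").Delete(context.TODO(), "my-pod",
		metav1.DeleteOptions{GracePeriodSeconds: &grace})
	if err != nil {
		panic(err)
	}
	fmt.Println("graceful deletion of my-pod requested")
}
```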
If the kubelet crashes during the termination process, it will restart the
termination process from the beginning (grace period is reset). This ensures
that a process is always given **at least** grace period to terminate cleanly.
A user may re-issue a DELETE to the pod resource specifying a shorter grace
period, but never a longer one.
Deleting a pod with grace period 0 is called **force deletion** and will
update the pod with a `deletionGracePeriodSeconds` of 0, and then immediately
remove the pod from etcd. Because all communication is asynchronous,
force deleting a pod means that the pod processes may continue
to run for an arbitrary amount of time. If a higher level component like the
StatefulSet controller treats the existence of the pod API object as a strongly
consistent entity, deleting the pod in this fashion will violate the
at-most-one guarantee we wish to offer for pet sets.
### Guarantees provided by replica sets and replication controllers
ReplicaSets and ReplicationControllers both attempt to **preserve availability**
of their constituent pods over ensuring at most one (of a pod) semantics. So a
replica set at scale 1 will immediately create a new pod when it observes an
old pod has begun graceful deletion, and as a result at many points in the
lifetime of a replica set there will be 2 copies of a pod's processes running
concurrently. Only access to exclusive resources like storage can prevent that
simultaneous execution.
Deployments, being based on replica sets, can offer no stronger guarantee.
### Concurrent access guarantees for shared storage
A persistent volume that references a strongly consistent storage backend
like AWS EBS, GCE PD, OpenStack Cinder, or Ceph RBD can rely on the storage
API to prevent corruption of the data due to simultaneous access by multiple
clients. However, many commonly deployed storage technologies in the
enterprise offer no such consistency guarantee, or much weaker variants, and
rely on complex systems to control which clients may access the storage.
If a PV is assigned an iSCSI, Fibre Channel, or NFS mount point and that PV
is used by two pods on different nodes simultaneously, concurrent access may
result in corruption, even if the PV or PVC is identified as "read write once".
PVC consumers must ensure these volume types are *never* referenced from
multiple pods without some external synchronization. As described above, it
is not safe to use persistent volumes that lack RWO guarantees with a
replica set or deployment, even at scale 1.
## Proposed changes
### Avoid multiple instances of pods
To ensure that the Pet Set controller can safely use pods and ensure at most
one pod instance is running on the cluster at any time for a given pod name,
it must be possible to make pod deletion strongly consistent.
To do that, we will:
* Give the Kubelet sole responsibility for normal deletion of pods -
only the Kubelet in the course of normal operation should ever remove a
pod from etcd (only the Kubelet should force delete)
* The kubelet must not delete the pod until all processes are confirmed
terminated.
* The kubelet SHOULD ensure all consumed resources on the node are freed
before deleting the pod.
* Application owners must be free to force delete pods, but they *must*
understand the implications of doing so, and all client UI must be able
to communicate those implications.
* Force deleting a pod may cause data loss (two instances of the same
pod process may be running at the same time)
* All existing controllers in the system must be limited to signaling pod
termination (starting graceful deletion), and are not allowed to force
delete a pod.
* The node controller will no longer be allowed to force delete pods -
it may only signal deletion by beginning (but not completing) a
graceful deletion.
* The GC controller may not force delete pods
* The namespace controller used to force delete pods, but no longer
does so. This means a node partition can block namespace deletion
indefinitely.
* The pod GC controller may continue to force delete pods on nodes that
no longer exist if we treat node deletion as confirming permanent
partition. If we do not, the pod GC controller must not force delete
pods.
* It must be possible for an administrator to effectively resolve partitions
manually to allow namespace deletion.
* Deleting a node from etcd should be seen as a signal to the cluster that
the node is permanently partitioned. We must audit existing components
to verify this is the case.
* The PodGC controller has primary responsibility for this - it already
owns the responsibility to delete pods on nodes that do not exist, and
so is allowed to force delete pods on nodes that do not exist.
* The PodGC controller must therefore always be running and will be
changed to always be running for this responsibility in a >=1.5
cluster.
In the above scheme, force deleting a pod releases the lock on that pod and
allows higher level components to proceed to create a replacement.
It has been requested that force deletion be restricted to privileged users.
That limits the application owner in resolving partitions when the consequences
of force deletion are understood, and not all application owners will be
privileged users. For example, a user may be running a 3 node etcd cluster in a
pet set. If pet 2 becomes partitioned, the user can instruct etcd to remove
pet 2 from the cluster (via direct etcd membership calls), and because a quorum
exists pets 0 and 1 can safely accept that action. The user can then force
delete pet 2 and the pet set controller will be able to recreate that pet on
another node and have it join the cluster safely (pets 0 and 1 constitute a
quorum for membership change).
This proposal does not alter the behavior of finalizers - instead, it makes
finalizers unnecessary for common application cases (because the cluster only
deletes pods when safe).
### Fencing
The changes above allow Pet Sets to ensure at-most-one pod, but provide no
recourse for the automatic resolution of cluster partitions during normal
operation. For that, we propose a **fencing controller** which exists above
the current controller plane and is capable of detecting and automatically
resolving partitions. The fencing controller is an agent empowered to make
similar decisions as a human administrator would make to resolve partitions,
and to take corresponding steps to prevent a dead machine from coming back
to life automatically.
Fencing controllers most benefit services that are not innately replicated
by reducing the amount of time it takes to detect a failure of a node or
process, isolate that node or process so it cannot initiate or receive
communication from clients, and then spawn another process. It is expected
that many StatefulSets of size 1 would prefer to be fenced, given that most
real-world applications of size 1 have no other alternative for HA
except reducing mean-time-to-recovery.
While the methods and algorithms may vary, the basic pattern would be:
1. Detect a partitioned pod or node via the Kubernetes API or via external
means.
2. Decide whether the partition justifies fencing based on priority, policy, or
service availability requirements.
3. Fence the node or any connected storage using appropriate mechanisms.
For this proposal we only describe the general shape of detection and how existing
Kubernetes components can be leveraged for policy, while the exact implementation
and mechanisms for fencing are left to a future proposal. A future fencing controller
would be able to leverage a number of systems including but not limited to:
* Cloud control plane APIs such as machine force shutdown
* Additional agents running on each host to force kill process or trigger reboots
* Agents integrated with or communicating with hypervisors running hosts to stop VMs
* Hardware IPMI interfaces to reboot a host
* Rack level power units to power cycle a blade
* Network routers, backplane switches, software defined networks, or system firewalls
* Storage server APIs to block client access
to appropriately limit the ability of the partitioned system to impact the cluster.
Fencing agents today use many of these mechanisms to allow the system to make
progress in the event of failure. The key contribution of Kubernetes is to define
a strongly consistent pattern whereby fencing agents can be plugged in.
To allow users, clients, and automated systems like the fencing controllers to
observe partitions, we propose an additional responsibility to the node controller
or any future controller that attempts to detect partition. The node controller should
add an additional condition, indicating that the cause of the deletion was a node
partition, to pods that have been terminated due to a node failing to heartbeat.
It may be desirable for users to be able to request fencing when they suspect a
component is malfunctioning. It is outside the scope of this proposal but would
allow administrators to take an action that is safer than force deletion, and
decide at the end whether to force delete.
How the fencing controller decides to fence is left undefined, but it is likely
it could use a combination of pod forgiveness (as a signal of how much disruption
a pod author is likely to accept) and pod disruption budget (as a measurement of
the amount of disruption already undergone) to measure how much latency between
failure and fencing the app is willing to tolerate. Likewise, it can use its own
understanding of the latency of the various failure detectors - the node controller,
any hypothetical information it gathers from service proxies or node peers, any
heartbeat agents in the system - to describe an upper bound on reaction.
### Storage Consistency
To ensure that shared storage without implicit locking be safe for RWO access, the
Kubernetes storage subsystem should leverage the strong consistency available through
the API server and prevent concurrent execution for some types of persistent volumes.
By leveraging existing concepts, we can allow the scheduler and the kubelet to enforce
a guarantee that an RWO volume can be used on at-most-one node at a time.
In order to properly support region and zone specific storage, Kubernetes adds node
selector restrictions to pods derived from the persistent volume. Expanding this
concept to volume types that have no external metadata to read (NFS, iSCSI) may
result in adding a label selector to PVs that defines the allowed nodes the storage
can run on (this is a common requirement for iSCSI, FibreChannel, or NFS clusters).
Because all nodes in a Kubernetes cluster possess a special node name label, it would
be possible for a controller to observe the scheduling decision of a pod using an
unsafe volume and "attach" that volume to the node, and also observe the deletion of
the pod and "detach" the volume from the node. The node would then require that these
unsafe volumes be "attached" before allowing pod execution. Attach and detach may
be recorded on the PVC or PV as a new field or materialized via the selection labels.
Possible sequence of operations:
1. Cluster administrator creates a RWO iSCSI persistent volume, available only to
nodes with the label selector `storagecluster=iscsi-1`
2. User requests an RWO volume and is bound to the iSCSI volume
3. The user creates a pod referencing the PVC
4. The scheduler observes the pod must schedule on nodes with `storagecluster=iscsi-1`
(alternatively this could be enforced in admission) and binds to node `A`
5. The kubelet on node `A` observes the pod references a PVC that specifies RWO which
requires "attach" to be successful
6. The attach/detach controller observes that a pod has been bound with a PVC that
requires "attach", and attempts to execute a compare and swap update on the PVC/PV
attaching it to node `A` and pod 1
7. The kubelet observes the attach of the PVC/PV and executes the pod
8. The user terminates the pod
9. The user creates a new pod that references the PVC
10. The scheduler binds this new pod to node `B`, which also has `storagecluster=iscsi-1`
11. The kubelet on node `B` observes the new pod, but sees that the PVC/PV is bound
to node `A` and so must wait for detach
12. The kubelet on node `A` completes the deletion of pod 1
13. The attach/detach controller observes the first pod has been deleted and that the
previous attach of the volume to pod 1 is no longer valid - it performs a CAS
update on the PVC/PV clearing its attach state.
14. The attach/detach controller observes the second pod has been scheduled and
attaches it to node `B` and pod 2
15. The kubelet on node `B` observes the attach and allows the pod to execute.
If a partition occurred after step 11, the attach controller would block waiting
for the pod to be deleted, and prevent node `B` from launching the second pod.
The fencing controller, upon observing the partition, could signal the iSCSI servers
to firewall node `A`. Once that firewall is in place, the fencing controller could
break the PVC/PV attach to node `A`, allowing steps 13 onwards to continue.
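The "compare and swap" updates in steps 6 and 13 correspond to the optimistic concurrency the API server already provides via `resourceVersion`: an `Update` fails with a Conflict error if another writer modified the object first. A hedged sketch of recording the attachment on the PVC, using a current client-go API and an invented annotation key (the proposal deliberately leaves the real representation, field vs. annotation vs. labels, open):
```go
package main

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// attachAnnotation is a purely illustrative key, not a real Kubernetes annotation.
const attachAnnotation = "example.kubernetes.io/attached-to-node"

// attachPVCToNode records the attachment on the PVC using optimistic
// concurrency: the Update fails with a Conflict if another writer changed the
// object since we read it, which is the CAS behavior steps 6 and 13 rely on.
func attachPVCToNode(ctx context.Context, client kubernetes.Interface, ns, pvcName, node string) error {
	pvc, err := client.CoreV1().PersistentVolumeClaims(ns).Get(ctx, pvcName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if current, ok := pvc.Annotations[attachAnnotation]; ok && current != node {
		return fmt.Errorf("pvc %s already attached to node %s", pvcName, current)
	}
	if pvc.Annotations == nil {
		pvc.Annotations = map[string]string{}
	}
	pvc.Annotations[attachAnnotation] = node
	_, err = client.CoreV1().PersistentVolumeClaims(ns).Update(ctx, pvc, metav1.UpdateOptions{})
	if apierrors.IsConflict(err) {
		return fmt.Errorf("lost the compare-and-swap race for pvc %s: %v", pvcName, err)
	}
	return err
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	if err := attachPVCToNode(context.TODO(), client, "default", "iscsi-claim", "node-a"); err != nil {
		panic(err)
	}
}
```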
### User interface changes
Clients today may assume that force deletions are safe. We must appropriately
audit clients to identify this behavior and improve the messages. For instance,
`kubectl delete --grace-period=0` could print a warning and require `--confirm`:
```
$ kubectl delete pod foo --grace-period=0
warning: Force deleting a pod does not wait for the pod to terminate, meaning
your containers will be stopped asynchronously. Pass --confirm to
continue
```
Likewise, attached volumes would require new semantics to allow the attachment
to be broken.
Clients should communicate partitioned state more clearly - changing the status
column of a pod list to contain the condition indicating NodeDown would help
users understand what actions they could take.
## Backwards compatibility
On an upgrade, pet sets would not be "safe" until the above behavior is implemented.
All other behaviors should remain as-is.
## Testing
All of the above implementations propose to ensure pods can be treated as components
of a strongly consistent cluster. Since formal proofs of correctness are unlikely in
the foreseeable future, Kubernetes must empirically demonstrate the correctness of
the proposed systems. Automated testing of the mentioned components should be
designed to expose ordering and consistency flaws in the presence of
* Master-node partitions
* Node-node partitions
* Master-etcd partitions
* Concurrent controller execution
* Kubelet failures
* Controller failures
A test suite that can perform these tests in combination with real world pet sets
would be desirable, although possibly non-blocking for this proposal.
## Documentation
We should document the lifecycle guarantees provided by the cluster in a clear
and unambiguous way to end users.
## Deferred issues
* Live migration continues to be unsupported on Kubernetes for the foreseeable
future, and no additional changes will be made to this proposal to account for
that feature.
## Open Questions
* Should node deletion be treated as "node was down and all processes terminated"
* Pro: it's a convenient signal that we use in other places today
* Con: the kubelet recreates its Node object, so if a node is partitioned and
the admin deletes the node, when the partition is healed the node would be
recreated, and the processes are *definitely* not terminated
* Implies we must alter the pod GC controller to only signal graceful deletion,
and only to flag pods on nodes that don't exist as partitioned, rather than
force deleting them.
* Decision: YES - captured above.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-safety.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -30,6 +30,7 @@ implementation-oriented (think control knobs).
given the desired state and the current/observed state, regardless of how many
intermediate state updates may have been missed. Edge-triggered behavior must be
just an optimization.
* There should be a CAP-like theorem regarding the tradeoffs of driving control loops via polling or events when trying to simultaneously achieve high performance, reliability, and simplicity -- pick any 2.
* Assume an open world: continually verify assumptions and gracefully adapt to
external events and/or actors. Example: we allow users to kill pods under
control of a replication controller; it just replaces them.

View File

@ -37,7 +37,7 @@ The slave mount namespace is the correct solution for this AFAICS. Until this
becomes available in k8s, we will have to have operations restart containers manually
based on monitoring alerts.
1. (From @victorgp) When using CoreOS that does not provides external fuse systems
1. (From @victorgp) When using CoreOS Container Linux that does not provides external fuse systems
like, in our case, GlusterFS, and you need a container to do the mounts. The only
way to see those mounts in the host, hence also visible by other containers, is by
sharing the mount propagation.
@ -140,7 +140,7 @@ runtime support matrix and when that will be addressed.
distros.
1. (From @euank) Changing those mountflags may make docker even less stable,
this may lock up kernel accidently or potentially leak mounts.
this may lock up kernel accidentally or potentially leak mounts.
## Decision

View File

@ -52,7 +52,7 @@ Example use cases for rescheduling are
* (note that these last two cases are the only use cases where the first-order intent
is to move a pod specifically for the benefit of another pod)
* moving a running pod off of a node from which it is receiving poor service
* anomalous crashlooping or other mysterious incompatiblity between the pod and the node
* anomalous crashlooping or other mysterious incompatibility between the pod and the node
* repeated out-of-resource killing (see #18724)
* repeated attempts by the scheduler to schedule the pod onto some node, but it is
rejected by Kubelet admission control due to incomplete scheduler knowledge

View File

@ -20,7 +20,7 @@ Borg increased utilization by about 20% when it started allowing use of such non
## Requests and Limits
For each resource, containers can specify a resource request and limit, `0 <= request <= `[`Node Allocatable`](../proposals/node-allocatable.md) & `request <= limit <= Infinity`.
For each resource, containers can specify a resource request and limit, `0 <= request <= `[`Node Allocatable`](../design-proposals/node-allocatable.md) & `request <= limit <= Infinity`.
If a pod is successfully scheduled, the container is guaranteed the amount of resources requested.
Scheduling is based on `requests` and not `limits`.
The pods and its containers will not be allowed to exceed the specified limit.

View File

@ -302,8 +302,8 @@ where a `<CPU-info>` or `<memory-info>` structure looks like this:
```yaml
{
mean: <value> # arithmetic mean
max: <value> # minimum value
min: <value> # maximum value
max: <value> # maximum value
min: <value> # minimum value
count: <value> # number of data points
percentiles: [ # map from %iles to values
"10": <10th-percentile-value>,

View File

@ -191,7 +191,7 @@ profiles to be opaque to kubernetes for now.
The following format is scoped as follows:
1. `runtime/default` - the default profile for the container runtime
1. `docker/default` - the default profile for the container runtime
2. `unconfined` - unconfined profile, ie, no seccomp sandboxing
3. `localhost/<profile-name>` - the profile installed to the node's local seccomp profile root

View File

@ -99,5 +99,5 @@ Kubernetes self-hosted is working today. Bootkube is an implementation of the "t
## Known Issues
- [Health check endpoints for components don't work correctly](https://github.com/kubernetes-incubator/bootkube/issues/64#issuecomment-228144345)
- [kubeadm doesn't do self-hosted yet](https://github.com/kubernetes/kubernetes/pull/38407)
- [kubeadm does do self-hosted, but isn't tested yet](https://github.com/kubernetes/kubernetes/pull/40075)
- The Kubernetes [versioning policy](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/versioning.md) allows for version skew of kubelet and control plane but not skew between control plane components themselves. We must add testing and validation to Kubernetes that this skew works. Otherwise the work to make Kubernetes HA is rather pointless if it can't be upgraded in an HA manner as well.

View File

@ -0,0 +1,196 @@
# Kubectl apply subcommands for last-config
## Abstract
`kubectl apply` uses the `last-applied-config` annotation to compute
the removal of fields from local object configuration files and then
send patches to delete those fields from the live object. Reading or
updating the `last-applied-config` is complex as it requires parsing
out and writing to the annotation. Here we propose a set of porcelain
commands for users to better understand what is going on in the system
and make updates.
## Motivation
What is going on behind the scenes with `kubectl apply` is opaque. Users
have to interact directly with annotations on the object to view
and make changes. In order to stop having `apply` manage a field on
an object, it must be manually removed from the annotation and then be removed
from the local object configuration. Users should be able to simply edit
the local object configuration and set it as the last-applied-config
to be used for the next diff base. Storing the last-applied-config
in an annotation adds black magic to `kubectl apply`, and it would
help users learn and understand if the value was exposed in a discoverable
manner.
## Use Cases
1. As a user, I want to be able to diff the last-applied-configuration
against the current local configuration to see which changes the command is seeing
2. As a user, I want to remove fields from being managed by the local
object configuration by removing them from the local object configuration
and setting the last-applied-configuration to match.
3. As a user, I want to be able to view the last-applied-configuration
on the live object that will be used to calculate the diff patch
to update the live object from the configuration file.
## Naming and Format possibilities
### Naming
1. *cmd*-last-applied
Rejected alternatives:
2. ~~last-config~~
3. ~~last-applied-config~~
4. ~~last-configuration~~
5. ~~last-applied-configuration~~
6. ~~last~~
### Formats
1. Apply subcommands
- `kubectl apply set-last-applied/view-last-applied/diff-last-applied`
- a little bit odd to have 2 verbs in a row
- improves discoverability to have these as subcommands so they are tied to apply
Rejected alternatives:
2. ~~Set/View subcommands~~
- `kubectl set/view/diff last-applied`
- consistent with other set/view commands
- clutters discoverability of set/view commands since these are only for apply
- clutters discoverability for last-applied commands since they are for apply
3. ~~Apply flags~~
- `kubectl apply [--set-last-applied | --view-last-applied | --diff-last-applied]`
- Not a fan of these
## view last-applied
Porcelain command that retrieves the object and prints the annotation value as yaml or json.
Prints an error message if the object is not managed by `apply`.
1. Get the last-applied by type/name
```sh
kubectl apply view-last-applied deployment/nginx
```
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: nginx
spec:
replicas: 1
template:
metadata:
labels:
run: nginx
spec:
containers:
- image: nginx
name: nginx
```
2. Get the last-applied by file, print as json
```sh
kubectl apply view-last-applied -f deployment_nginx.yaml -o json
```
Same as above, but in json
## diff last-applied
Porcelain command that retrieves the object and displays a diff against
the local configuration
1. Diff the last-applied
```sh
kubectl apply diff-last-applied -f deployment_nginx.yaml
```
Opens up a 2-way diff in the default diff viewer. This should
follow the same semantics as `git diff`. It should accept either a
flag `--diff-viewer=meld` or check the environment variable
`KUBECTL_EXTERNAL_DIFF=meld`. If neither is specified, the `diff`
command should be used.
This is meant to show the user what they have changed in the configuration
since it was last applied, but not what has changed on the server.
The supported output formats should be `yaml` and `json`, as specified
by the `-o` flag.
A future goal is to provide a 3-way diff with `kubectl apply diff -f deployment_nginx.yaml`.
Together these tools would give the user the ability to see what is going
on and compare changes made to the configuration file vs other
changes made to the server independent of the configuration file.
## set last-applied
Porcelain command that sets the last-applied-config annotation as if
the local configuration file had just been applied.
1. Set the last-applied-config
```sh
kubectl apply set-last-applied -f deployment_nginx.yaml
```
Sends a Patch request to set the last-applied-config as if
the configuration had just been applied.
## edit last-applied
1. Open the last-applied-config in an editor
```sh
kubectl apply edit-last-applied -f deployment_nginx.yaml
```
Since the last-applied-configuration annotation exists only
on the live object, this command can alternatively take the
kind/name.
```sh
kubectl apply edit-last-applied deployment/nginx
```
Sends a Patch request to set the last-applied-config to
the value saved in the editor.
## Example workflow to stop managing a field with apply - using get/set
As a user, I want to have the replicas on a Deployment managed by an autoscaler
instead of by the configuration.
1. Check to make sure the live object is up-to-date
- `kubectl apply diff-last-applied -f deployment_nginx.yaml`
- Expect no changes
2. Update the deployment_nginx.yaml by removing the replicas field
3. Diff the last-applied-config to make sure the only change is the removal of the replicas field
4. Remove the replicas field from the last-applied-config so it doesn't get deleted next apply
- `kubectl apply set-last-applied -f deployment_nginx.yaml`
5. Verify the last-applied-config has been updated
- `kubectl apply view-last-applied -f deployment_nginx.yaml`
## Example workflow to stop managing a field with apply - using edit
1. Check to make sure the live object is up-to-date
- `kubectl apply diff-last-applied -f deployment_nginx.yaml`
- Expect no changes
2. Update the deployment_nginx.yaml by removing the replicas field
3. Edit the last-applied-config and remove the replicas field
- `kubectl apply edit-last-applied deployment/nginx`
4. Verify the last-applied-config has been updated
- `kubectl apply view-last-applied -f deployment_nginx.yaml`
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/configmap.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -0,0 +1,38 @@
# <Title>
Status: Pending
Version: Alpha | Beta | GA
Implementation Owner: TBD
## Motivation
<2-6 sentences about why this is needed>
## Proposal
<4-6 sentence description of the proposed solution>
## User Experience
### Use Cases
<enumerated list of use cases for this feature>
<in depth description of user experience>
<*include full examples*>
## Implementation
<in depth description of how the feature will be implemented. in some cases this may be very simple.>
### Client/Server Backwards/Forwards compatibility
<define behavior when using a kubectl client with an older or newer version of the apiserver (+-1 version)>
## Alternatives considered
<short description of alternative solutions to be considered>

View File

@ -21,21 +21,38 @@ in your group;
2. Create pkg/apis/`<group>`/{register.go, `<version>`/register.go} to register
this group's API objects to the encoding/decoding scheme (e.g.,
[pkg/apis/authentication/register.go](../../pkg/apis/authentication/register.go) and
[pkg/apis/authentication/v1beta1/register.go](../../pkg/apis/authentication/v1beta1/register.go);
[pkg/apis/authentication/register.go](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/authentication/register.go)
and
[pkg/apis/authentication/v1beta1/register.go](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/authentication/v1beta1/register.go));
The register files must have a var called SchemeBuilder for the generated code
to reference. There must be an AddToScheme method for the installer to
reference. You can look at a group under `pkg/apis/...` for example register.go
files to use as a template, but do not copy the register.go files under
`pkg/api/...`--they are not general.
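As a hedged illustration (the group, file path, and `Widget` types below are
hypothetical, and exact import paths may differ by release), the symbols the
generated code expects look roughly like this:
```go
// pkg/apis/mygroup/register.go (hypothetical group)
package mygroup

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// SchemeGroupVersion identifies the internal version of this group.
var SchemeGroupVersion = schema.GroupVersion{
	Group:   "mygroup.example.com",
	Version: runtime.APIVersionInternal,
}

var (
	// SchemeBuilder collects the functions that register this group's types;
	// generated code references it by this exact name.
	SchemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
	// AddToScheme is what install packages call to register the group.
	AddToScheme = SchemeBuilder.AddToScheme
)

func addKnownTypes(scheme *runtime.Scheme) error {
	// Widget and WidgetList are the hypothetical types sketched under
	// "Type definitions in types.go" below.
	scheme.AddKnownTypes(SchemeGroupVersion, &Widget{}, &WidgetList{})
	return nil
}
```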
3. Add a pkg/apis/`<group>`/install/install.go, which is responsible for adding
the group to the `latest` package, so that other packages can access the group's
meta through `latest.Group`. You probably only need to change the name of group
and version in the [example](../../pkg/apis/authentication/install/install.go)). You
need to import this `install` package in {pkg/master,
pkg/client/unversioned}/import_known_versions.go, if you want to make your group
accessible to other packages in the kube-apiserver binary, binaries that uses
the client package.
3. Add a pkg/apis/`<group>`/install/install.go. You probably only need to change
the name of the group and version in the
[example](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/authentication/install/install.go). This
package must be imported by the server along with
`k8s.io/kubernetes/pkg/api/install`. Import these packages with the blank
identifier as they do not have user callable code and exist solely for their
initialization side-effects.
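A minimal sketch of those blank-identifier imports (the file and package names
here are illustrative; the second path is the authentication example linked
above):
```go
// Somewhere in the apiserver wiring, e.g. an import_known_versions.go file.
package serverwiring

import (
	// Imported solely for their registration side effects.
	_ "k8s.io/kubernetes/pkg/api/install"
	_ "k8s.io/kubernetes/pkg/apis/authentication/install"
)
```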
Steps 2 and 3 are mechanical; we plan to autogenerate them using the
cmd/libs/go2idl/ tool.
### Type definitions in `types.go`
Each type should be an exported struct (have a capitalized name). The struct
should have the `TypeMeta` and `ObjectMeta` embeds. There should be a `Spec` and
a `Status` field. If the object is solely a data storage object, and will not be
modified by a controller, the status field can be left off and the fields inside
the `Spec` can be inlined directly into the struct.
For each top-level type there should also be a `List` struct. The `List` struct should
have the `TypeMeta` and `ListMeta` embeds. There should be an `Items` field that
is a slice of the defined type.
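A sketch of these conventions for a hypothetical `Widget` type (field names and
json tags are illustrative only):
```go
// pkg/apis/mygroup/types.go (hypothetical group)
package mygroup

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Widget is a hypothetical top-level type.
type Widget struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec holds the desired state.
	Spec WidgetSpec `json:"spec,omitempty"`
	// Status holds the observed state, written by a controller.
	Status WidgetStatus `json:"status,omitempty"`
}

// WidgetSpec is the desired state of a Widget.
type WidgetSpec struct {
	Replicas int32 `json:"replicas"`
}

// WidgetStatus is the observed state of a Widget.
type WidgetStatus struct {
	ReadyReplicas int32 `json:"readyReplicas"`
}

// WidgetList is the required List type for Widget.
type WidgetList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`

	// Items is a slice of Widgets.
	Items []Widget `json:"items"`
}
```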
### Scripts changes and auto-generated code:
1. Generate conversions and deep-copies:

View File

@ -1,12 +1,11 @@
API Conventions
===============
Updated: 4/22/2016
Updated: 2/23/2017
*This document is oriented at users who want a deeper understanding of the
Kubernetes API structure, and developers wanting to extend the Kubernetes API.
An introduction to using resources with kubectl can be found in [Working with
resources](../user-guide/working-with-resources.md).*
An introduction to using resources with kubectl can be found in [the object management overview](https://kubernetes.io/docs/concepts/tools/kubectl/object-management-overview/).*
**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
@ -53,7 +52,7 @@ resources](../user-guide/working-with-resources.md).*
<!-- END MUNGE: GENERATED_TOC -->
The conventions of the [Kubernetes API](../api.md) (and related APIs in the
The conventions of the [Kubernetes API](https://kubernetes.io/docs/api/) (and related APIs in the
ecosystem) are intended to ease client development and ensure that configuration
mechanisms can be implemented that work across a diverse set of use cases
consistently.
@ -75,6 +74,9 @@ kinds would have different attributes and properties)
via HTTP to the server. Resources are exposed via:
* Collections - a list of resources of the same type, which may be queryable
* Elements - an individual resource, addressable via a URL
* **API Group** - a set of resources that are exposed together. The group,
along with its version, appears in the "apiVersion" field as "GROUP/VERSION",
e.g. "policy.k8s.io/v1".
Each resource typically accepts and returns data of a single kind. A kind may be
accepted or returned by multiple resources that reflect specific use cases. For
@ -83,8 +85,17 @@ to create, update, and delete pods, while a separate "pod status" resource (that
acts on "Pod" kind) allows automated processes to update a subset of the fields
in that resource.
Resources are bound together in API groups - each group may have one or more
versions that evolve independent of other API groups, and each version within
the group has one or more resources. Group names are typically in domain name
form - the Kubernetes project reserves use of the empty group, all single
word names ("extensions", "apps"), and any group name ending in "*.k8s.io" for
its sole use. When choosing a group name, we recommend selecting a subdomain
your group or organization owns, such as "widget.mycompany.com".
Resource collections should be all lowercase and plural, whereas kinds are
CamelCase and singular.
CamelCase and singular. Group names must be lower case and be valid DNS
subdomains.
## Types (Kinds)
@ -114,7 +125,7 @@ the full list. Some objects may be singletons (the current user, the system
defaults) and may not have lists.
In addition, all lists that return objects with labels should support label
filtering (see [docs/user-guide/labels.md](../user-guide/labels.md), and most
filtering (see [the labels documentation](https://kubernetes.io/docs/user-guide/labels/)), and most
lists should support filtering by fields.
Examples: PodLists, ServiceLists, NodeLists
@ -150,13 +161,23 @@ is independent of the specific resource schema.
Two additional subresources, `proxy` and `portforward`, provide access to
cluster resources as described in
[docs/user-guide/accessing-the-cluster.md](../user-guide/accessing-the-cluster.md).
[accessing the cluster docs](https://kubernetes.io/docs/user-guide/accessing-the-cluster/).
The standard REST verbs (defined below) MUST return singular JSON objects. Some
API endpoints may deviate from the strict REST pattern and return resources that
are not singular JSON objects, such as streams of JSON objects or unstructured
text log data.
A common set of "meta" API objects are used across all API groups and are
thus considered part of the server group named `meta.k8s.io`. These types may
evolve independent of the API group that uses them and API servers may allow
them to be addressed in their generic form. Examples are `ListOptions`,
`DeleteOptions`, `List`, `Status`, `WatchEvent`, and `Scale`. For historical
reasons these types are part of each existing API group. Generic tools like
quota, garbage collection, autoscalers, and generic clients like kubectl
leverage these types to define consistent behavior across different resource
types, like the interfaces in programming languages.
The term "kind" is reserved for these "top-level" API types. The term "type"
should be used for distinguishing sub-categories within objects or subobjects.
@ -181,12 +202,12 @@ called "metadata":
* namespace: a namespace is a DNS compatible label that objects are subdivided
into. The default namespace is 'default'. See
[docs/user-guide/namespaces.md](../user-guide/namespaces.md) for more.
[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more.
* name: a string that uniquely identifies this object within the current
namespace (see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)).
namespace (see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)).
This value is used in the path when retrieving an individual object.
* uid: a unique in time and space value (typically an RFC 4122 generated
identifier, see [docs/user-guide/identifiers.md](../user-guide/identifiers.md))
identifier, see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/))
used to distinguish between objects with the same name that have been deleted
and recreated
@ -213,10 +234,10 @@ not reachable by name) after the time in this field. Once set, this value may
not be unset or be set further into the future, although it may be shortened or
the resource may be deleted prior to this time.
* labels: a map of string keys and values that can be used to organize and
categorize objects (see [docs/user-guide/labels.md](../user-guide/labels.md))
categorize objects (see [the labels docs](https://kubernetes.io/docs/user-guide/labels/))
* annotations: a map of string keys and values that can be used by external
tooling to store and retrieve arbitrary metadata about this object (see
[docs/user-guide/annotations.md](../user-guide/annotations.md))
[the annotations docs](https://kubernetes.io/docs/user-guide/annotations/))
Labels are intended for organizational purposes by end users (select the pods
that match this label query). Annotations enable third-party automation and
@ -277,6 +298,14 @@ cannot vary from the user's desired intent MAY have only "spec", and MAY rename
Objects that contain both spec and status should not contain additional
top-level fields other than the standard metadata fields.
Some objects which are not persisted in the system - such as `SubjectAccessReview`
and other webhook style calls - may choose to add spec and status to encapsulate
a "call and response" pattern. The spec is the request (often a request for
information) and the status is the response. For these RPC like objects the only
operation may be POST, but having a consistent schema between submission and
response reduces the complexity of these clients.
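A purely illustrative sketch of such a call-and-response object (this is not an
existing API type):
```go
package exampleapi

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// WidgetAccessReview is a hypothetical, non-persisted object: the client
// POSTs the Spec (the question) and the server fills in the Status (the
// answer) in the response body.
type WidgetAccessReview struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec carries the request.
	Spec WidgetAccessReviewSpec `json:"spec"`
	// Status carries the response, populated by the server.
	Status WidgetAccessReviewStatus `json:"status,omitempty"`
}

// WidgetAccessReviewSpec is the request half.
type WidgetAccessReviewSpec struct {
	User string `json:"user"`
}

// WidgetAccessReviewStatus is the response half.
type WidgetAccessReviewStatus struct {
	Allowed bool   `json:"allowed"`
	Reason  string `json:"reason,omitempty"`
}
```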
##### Typical status properties
**Conditions** represent the latest available observations of an object's
@ -322,7 +351,7 @@ Some resources in the v1 API contain fields called **`phase`**, and associated
`message`, `reason`, and other status fields. The pattern of using `phase` is
deprecated. Newer API types should use conditions instead. Phase was essentially
a state-machine enumeration field, that contradicted
[system-design principles](../design/principles.md#control-logic) and hampered
[system-design principles](../design-proposals/principles.md#control-logic) and hampered
evolution, since [adding new enum values breaks backward
compatibility](api_changes.md). Rather than encouraging clients to infer
implicit properties from phases, we intend to explicitly expose the conditions
@ -346,7 +375,7 @@ only provided with reasonable effort, and is not guaranteed to not be lost.
Status information that may be large (especially proportional in size to
collections of other resources, such as lists of references to other objects --
see below) and/or rapidly changing, such as
[resource usage](../design/resources.md#usage-data), should be put into separate
[resource usage](../design-proposals/resources.md#usage-data), should be put into separate
objects, with possibly a reference from the original object. This helps to
ensure that GETs and watch remain reasonably efficient for the majority of
clients, which may not need that data.
@ -359,9 +388,9 @@ the reported status reflects the most recent desired status.
#### References to related objects
References to loosely coupled sets of objects, such as
[pods](../user-guide/pods.md) overseen by a
[replication controller](../user-guide/replication-controller.md), are usually
best referred to using a [label selector](../user-guide/labels.md). In order to
[pods](https://kubernetes.io/docs/user-guide/pods/) overseen by a
[replication controller](https://kubernetes.io/docs/user-guide/replication-controller/), are usually
best referred to using a [label selector](https://kubernetes.io/docs/user-guide/labels/). In order to
ensure that GETs of individual objects remain bounded in time and space, these
sets may be queried via separate API queries, but will not be expanded in the
referring object's status.
@ -698,7 +727,7 @@ labels:
All compatible Kubernetes APIs MUST support "name idempotency" and respond with
an HTTP status code 409 when a request is made to POST an object that has the
same name as an existing object in the system. See
[docs/user-guide/identifiers.md](../user-guide/identifiers.md) for details.
[the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/) for details.
Names generated by the system may be requested using `metadata.generateName`.
GenerateName indicates that the name should be made unique by the server prior
@ -1296,7 +1325,7 @@ that hard to consistently apply schemas that ensure uniqueness. One just needs
to ensure that at least one value of some label key in common differs compared
to all other comparable resources. We could/should provide a verification tool
to check that. However, development of conventions similar to the examples in
[Labels](../user-guide/labels.md) make uniqueness straightforward. Furthermore,
[Labels](https://kubernetes.io/docs/user-guide/labels/) make uniqueness straightforward. Furthermore,
relatively narrowly used namespaces (e.g., per environment, per application) can
be used to reduce the set of resources that could potentially cause overlap.

View File

@ -215,7 +215,7 @@ runs just prior to conversion. That works fine when the user creates a resource
from a hand-written configuration -- clients can write either field and read
either field, but what about creation or update from the output of GET, or
update via PATCH (see
[In-place updates](../user-guide/managing-deployments.md#in-place-updates-of-resources))?
[In-place updates](https://kubernetes.io/docs/user-guide/managing-deployments/#in-place-updates-of-resources))?
In this case, the two fields will conflict, because only one field would be
updated in the case of an old client that was only aware of the old field (e.g.,
`height`).
@ -414,14 +414,10 @@ inefficient).
The conversion code resides with each versioned API. There are two files:
- `pkg/api/<version>/conversion.go` containing manually written conversion
functions
- `pkg/api/<version>/conversion_generated.go` containing auto-generated
conversion functions
- `pkg/apis/extensions/<version>/conversion.go` containing manually written
conversion functions
- `pkg/apis/extensions/<version>/conversion_generated.go` containing
auto-generated conversion functions
conversion functions
- `pkg/apis/extensions/<version>/zz_generated.conversion.go` containing
auto-generated conversion functions
Since auto-generated conversion functions are using manually written ones,
those manually written should be named with a defined convention, i.e. a

View File

@ -104,7 +104,7 @@ kindness...)
PRs should only need to be manually re-tested if you believe there was a flake
during the original test. All flakes should be filed as an
[issue](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake).
Once you find or file a flake a contributer (this may be you!) should request
Once you find or file a flake a contributor (this may be you!) should request
a retest with "@k8s-bot test this issue: #NNNNN", where NNNNN is replaced with
the issue number you found or filed.

View File

@ -16,8 +16,7 @@ depending on the point in the release cycle.
to set the same label to confirm that no release note is needed.
1. `release-note` labeled PRs generate a release note using the PR title by
default OR the release-note block in the PR template if filled in.
* See the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md) for more
details.
* See the [PR template](https://github.com/kubernetes/kubernetes/blob/master/.github/PULL_REQUEST_TEMPLATE.md) for more details.
* PR titles and body comments are mutable and can be modified at any time
prior to the release to reflect a release note friendly message.

View File

@ -120,7 +120,7 @@ subdirectories).
intended for users that deploy applications or cluster administrators,
respectively. Actual application examples belong in /examples.
- Examples should also illustrate [best practices for configuration and
using the system](../user-guide/config-best-practices.md)
using the system](https://kubernetes.io/docs/user-guide/config-best-practices/)
- Third-party code

View File

@ -19,7 +19,7 @@ around code review that govern all active contributors to Kubernetes.
### Code of Conduct
The most important expectation of the Kubernetes community is that all members
abide by the Kubernetes [community code of conduct](../../code-of-conduct.md).
abide by the Kubernetes [community code of conduct](../../governance.md#code-of-conduct).
Only by respecting each other can we develop a productive, collaborative
community.
@ -42,7 +42,7 @@ contributors are considered to be anyone who meets any of the following criteria
than 20 PRs in the previous year.
* Filed more than three issues in the previous month, or more than 30 issues in
the previous 12 months.
* Commented on more than pull requests in the previous month, or
* Commented on more than five pull requests in the previous month, or
more than 50 pull requests in the previous 12 months.
* Marked any PR as LGTM in the previous month.
* Have *collaborator* permissions in the Kubernetes github project.
@ -58,7 +58,7 @@ Because reviewers are often the first points of contact between new members of
the community and can significantly impact the first impression of the
Kubernetes community, reviewers are especially important in shaping the
Kubernetes community. Reviewers are highly encouraged to review the
[code of conduct](../../code-of-conduct.md) and are strongly encouraged to go above
[code of conduct](../../governance.md#code-of-conduct) and are strongly encouraged to go above
and beyond the code of conduct to promote a collaborative, respectful
Kubernetes community.

View File

@ -1,15 +1,17 @@
# Development Guide
This document is intended to be the canonical source of truth for things like
supported toolchain versions for building Kubernetes. If you find a
requirement that this doc does not capture, please
[submit an issue](https://github.com/kubernetes/kubernetes/issues) on github. If
you find other docs with references to requirements that are not simply links to
this doc, please [submit an issue](https://github.com/kubernetes/kubernetes/issues).
This document is the canonical source of truth for things like
supported toolchain versions for building Kubernetes.
Please submit an [issue] on github if you
* find a requirement that this doc does not capture,
* find other docs with references to requirements that
are not simply links to this doc.
This document is intended to be relative to the branch in which it is found.
It is guaranteed that requirements will change over time for the development
branch, but release branches of Kubernetes should not change.
Development branch requirements will change over time, but release branch
requirements are frozen.
## Building Kubernetes with Docker
@ -19,189 +21,245 @@ Docker please follow [these instructions]
## Building Kubernetes on a local OS/shell environment
Many of the Kubernetes development helper scripts rely on a fairly up-to-date
GNU tools environment, so most recent Linux distros should work just fine
out-of-the-box. Note that Mac OS X ships with somewhat outdated BSD-based tools,
some of which may be incompatible in subtle ways, so we recommend
[replacing those with modern GNU tools]
(https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/).
Kubernetes development helper scripts assume an up-to-date
GNU tools environment. Most recent Linux distros should work
out-of-the-box.
### Go development environment
Mac OS X ships with outdated BSD-based tools.
We recommend installing [OS X GNU tools].
Kubernetes is written in the [Go](http://golang.org) programming language.
To build Kubernetes without using Docker containers, you'll need a Go
development environment. Builds for Kubernetes 1.0 - 1.2 require Go version
1.4.2. Builds for Kubernetes 1.3 and higher require Go version 1.6.0. If you
haven't set up a Go development environment, please follow [these
instructions](http://golang.org/doc/code.html) to install the go tools.
### etcd
Set up your GOPATH and add a path entry for go binaries to your PATH. Typically
added to your ~/.profile:
Kubernetes maintains state in [`etcd`][etcd-latest], a distributed key-value store.
Please [install it locally][etcd-install] to run local integration tests.
### Go
Kubernetes is written in [Go](http://golang.org).
If you don't have a Go development environment,
please [set one up](http://golang.org/doc/code.html).
| Kubernetes | requires Go |
|----------------|--------------|
| 1.0 - 1.2 | 1.4.2 |
| 1.3, 1.4 | 1.6 |
| 1.5 and higher | 1.7 - 1.7.5 |
| | [1.8][go-1.8] not verified as of Feb 2017 |
After installation, you'll need `GOPATH` defined,
and `PATH` modified to access your Go binaries.
A common setup is
```sh
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
```
### Godep dependency management
#### Upgrading Go
Kubernetes build and test scripts use [godep](https://github.com/tools/godep) to
Upgrading Go requires specific modification of some scripts and container
images.
- The image for cross compiling in [build/build-image/cross] (its `VERSION` file and `Dockerfile`).
- Update [dockerized-e2e-runner.sh] to run a kubekins-e2e with the desired Go version.
This requires pushing the [e2e][e2e-image] and [test][test-image] images that are `FROM` the desired Go version.
- The cross tag `KUBE_BUILD_IMAGE_CROSS_TAG` in [build/common.sh].
#### Dependency management
Kubernetes build/test scripts use [`godep`](https://github.com/tools/godep) to
manage dependencies.
#### Install godep
Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is
installed on your system. (some of godep's dependencies use the mercurial
source control system). Use `apt-get install mercurial` or `yum install
mercurial` on Linux, or [brew.sh](http://brew.sh) on OS X, or download directly
from mercurial.
Install godep (may require sudo):
```sh
go get -u github.com/tools/godep
```
Note:
At this time, godep version >= v63 is known to work in the Kubernetes project.
To check your version of godep:
Check your version; `v63` or higher is known to work for Kubernetes.
```sh
$ godep version
godep v74 (linux/amd64/go1.6.2)
godep version
```
Developers planning to managing dependencies in the `vendor/` tree may want to
explore alternative environment setups. See
[using godep to manage dependencies](godep.md).
Developers planning to manage dependencies in the `vendor/` tree may want to
explore alternative environment setups. See [using godep to manage dependencies](godep.md).
### Local build using make
To build Kubernetes using your local Go development environment (generate linux
binaries):
## Workflow
![Git workflow](git_workflow.png)
### 1 Fork in the cloud
1. Visit https://github.com/kubernetes/kubernetes
2. Click `Fork` button (top right) to establish a cloud-based fork.
### 2 Clone fork to local storage
Per Go's [workspace instructions][go-workspace], place Kubernetes' code on your
`GOPATH` using the following cloning procedure.
Define a local working directory:
```sh
make
# If your GOPATH has multiple paths, pick
# just one and use it instead of $GOPATH here
working_dir=$GOPATH/src/k8s.io
```
You may pass build options and packages to the script as necessary. For example,
to build with optimizations disabled for enabling use of source debug tools:
> If you already do Go development on github, the `k8s.io` directory
> will be a sibling to your existing `github.com` directory.
Set `user` to match your github profile name:
```sh
make GOGCFLAGS="-N -l"
user={your github profile name}
```
Both `$working_dir` and `$user` are mentioned in the figure above.
Create your clone:
```sh
mkdir -p $working_dir
cd $working_dir
git clone https://github.com/$user/kubernetes.git
# or: git clone git@github.com:$user/kubernetes.git
cd $working_dir/kubernetes
git remote add upstream https://github.com/kubernetes/kubernetes.git
# or: git remote add upstream git@github.com:kubernetes/kubernetes.git
# Never push to upstream master
git remote set-url --push upstream no_push
# Confirm that your remotes make sense:
git remote -v
```
#### Define a pre-commit hook
Please link the Kubernetes pre-commit hook into your `.git` directory.
This hook checks your commits for formatting, building, doc generation, etc.
It requires both `godep` and `etcd` on your `PATH`.
```sh
cd $working_dir/kubernetes/.git/hooks
ln -s ../../hooks/pre-commit .
```
### 3 Branch
Get your local master up to date:
```sh
cd $working_dir/kubernetes
git fetch upstream
git checkout master
git rebase upstream/master
```
Branch from it:
```sh
git checkout -b myfeature
```
Then edit code on the `myfeature` branch.
#### Build
```sh
cd $working_dir/kubernetes
make
```
To build with optimizations disabled (enabling use of source debug tools):
```sh
make GOGCFLAGS="-N -l"
```
To build binaries for all platforms:
```sh
make cross
make cross
```
### How to update the Go version used to test & build k8s
The kubernetes project tries to stay on the latest version of Go so it can
benefit from the improvements to the language over time and can easily
bump to a minor release version for security updates.
Since kubernetes is mostly built and tested in containers, there are a few
unique places you need to update the go version.
- The image for cross compiling in [build/build-image/cross/](https://github.com/kubernetes/kubernetes/blob/master/build/build-image/cross/). The `VERSION` file and `Dockerfile`.
- Update [dockerized-e2e-runner.sh](https://github.com/kubernetes/test-infra/blob/master/jenkins/dockerized-e2e-runner.sh) to run a kubekins-e2e with the desired go version, which requires pushing [e2e-image](https://github.com/kubernetes/test-infra/tree/master/jenkins/e2e-image) and [test-image](https://github.com/kubernetes/test-infra/tree/master/jenkins/test-image) images that are `FROM` the desired go version.
- The docker image being run in [gotest-dockerized.sh](https://github.com/kubernetes/test-infra/blob/master/jenkins/gotest-dockerized.sh).
- The cross tag `KUBE_BUILD_IMAGE_CROSS_TAG` in [build/common.sh](https://github.com/kubernetes/kubernetes/blob/master/build/common.sh)
## Workflow
Below, we outline one of the more common git workflows that core developers use.
Other git workflows are also valid.
### Visual overview
![Git workflow](git_workflow.png)
### Fork the main repository
1. Go to https://github.com/kubernetes/kubernetes
2. Click the "Fork" button (at the top right)
### Clone your fork
The commands below require that you have $GOPATH set ([$GOPATH
docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put
Kubernetes' code into your GOPATH. Note: the commands below will not work if
there is more than one directory in your `$GOPATH`.
#### Test
```sh
mkdir -p $GOPATH/src/k8s.io
cd $GOPATH/src/k8s.io
# Replace "$YOUR_GITHUB_USERNAME" below with your github username
git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git
cd kubernetes
git remote add upstream 'https://github.com/kubernetes/kubernetes.git'
cd $working_dir/kubernetes
# Run every unit test
make test
# Run package tests verbosely
make test WHAT=pkg/util/cache GOFLAGS=-v
# Run integration tests, requires etcd
make test-integration
# Run e2e tests
make test-e2e
```
### Create a branch and make changes
```sh
git checkout -b my-feature
# Make your code changes
```
### Keeping your development fork in sync
See the [testing guide](testing.md) and [end-to-end tests](e2e-tests.md)
for additional information and scenarios.
### 4 Keep your branch in sync
```sh
# While on your myfeature branch
git fetch upstream
git rebase upstream/master
```
Note: If you have write access to the main repository at
github.com/kubernetes/kubernetes, you should modify your git configuration so
that you can't accidentally push to upstream:
### 5 Commit
```sh
git remote set-url --push upstream no_push
```
### Committing changes to your fork
Before committing any changes, please link/copy the pre-commit hook into your
.git directory. This will keep you from accidentally committing non-gofmt'd Go
code. This hook will also do a build and test whether documentation generation
scripts need to be executed.
The hook requires both Godep and etcd on your `PATH`.
```sh
cd kubernetes/.git/hooks/
ln -s ../../hooks/pre-commit .
```
Then you can commit your changes and push them to your fork:
Commit your changes.
```sh
git commit
git push -f origin my-feature
```
Likely you'll go back and edit, build, and test some more, then `commit --amend`
in a few cycles.
### 6 Push
When ready to review (or just to establish an offsite backup of your work),
push your branch to your fork on `github.com`:
```sh
git push -f origin myfeature
```
### Creating a pull request
### 7 Create a pull request
1. Visit https://github.com/$YOUR_GITHUB_USERNAME/kubernetes
2. Click the "Compare & pull request" button next to your "my-feature" branch.
3. Check out the pull request [process](pull-requests.md) for more details
1. Visit your fork at https://github.com/$user/kubernetes (replace `$user` obviously).
2. Click the `Compare & pull request` button next to your `myfeature` branch.
3. Check out the pull request [process](pull-requests.md) for more details.
**Note:** If you have write access, please refrain from using the GitHub UI for creating PRs, because GitHub will create the PR branch inside the main repository rather than inside your fork.
_If you have upstream write access_, please refrain from using the GitHub UI for
creating PRs, because GitHub will create the PR branch inside the main
repository rather than inside your fork.
### Getting a code review
#### Get a code review
Once your pull request has been opened it will be assigned to one or more
reviewers. Those reviewers will do a thorough code review, looking for
correctness, bugs, opportunities for improvement, documentation and comments,
and style.
Commit changes made in response to review comments to the same branch on your
fork.
Very small PRs are easy to review. Very large PRs are very difficult to
review. Github has a built-in code review tool, which is what most people use.
review.
At the assigned reviewer's discretion, a PR may be switched to use
[Reviewable](https://reviewable.k8s.io) instead. Once a PR is switched to
Reviewable, please ONLY send or reply to comments through reviewable. Mixing
@ -210,41 +268,39 @@ code review tools can be very confusing.
See [Faster Reviews](faster_reviews.md) for some thoughts on how to streamline
the review process.
### When to retain commits and when to squash
Upon merge, all git commits should represent meaningful milestones or units of
work. Use commits to add clarity to the development and review process.
#### Squash and Merge
Before merging a PR, squash any "fix review feedback", "typo", and "rebased"
sorts of commits. It is not imperative that every commit in a PR compile and
pass tests independently, but it is worth striving for. For mass automated
fixups (e.g. automated doc formatting), use one or more commits for the
changes to tooling and a final commit to apply the fixup en masse. This makes
reviews much easier.
## Testing
Three basic commands let you run unit, integration and/or e2e tests:
```sh
cd kubernetes
make test # Run every unit test
make test WHAT=pkg/util/cache GOFLAGS=-v # Run tests of a package verbosely
make test-integration # Run integration tests, requires etcd
make test-e2e # Run e2e tests
```
See the [testing guide](testing.md) and [end-to-end tests](e2e-tests.md) for additional information and scenarios.
## Regenerating the CLI documentation
```sh
hack/update-generated-docs.sh
```
Upon merge (by either you or your reviewer), all commits left on the review
branch should represent meaningful milestones or units of work. Use commits to
add clarity to the development and review process.
Before merging a PR, squash any _fix review feedback_, _typo_, and _rebased_
sorts of commits.
It is not imperative that every commit in a PR compile and pass tests
independently, but it is worth striving for.
For mass automated fixups (e.g. automated doc formatting), use one or more
commits for the changes to tooling and a final commit to apply the fixup en
masse. This makes reviews easier.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/development.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
[OS X GNU tools]: https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x
[build/build-image/cross]: https://github.com/kubernetes/kubernetes/blob/master/build/build-image/cross
[build/common.sh]: https://github.com/kubernetes/kubernetes/blob/master/build/common.sh
[dockerized-e2e-runner.sh]: https://github.com/kubernetes/test-infra/blob/master/jenkins/dockerized-e2e-runner.sh
[e2e-image]: https://github.com/kubernetes/test-infra/tree/master/jenkins/e2e-image
[etcd-latest]: https://coreos.com/etcd/docs/latest
[etcd-install]: testing.md#install-etcd-dependency
<!-- https://github.com/coreos/etcd/releases -->
[go-1.8]: https://blog.golang.org/go1.8
[go-workspace]: https://golang.org/doc/code.html#Workspaces
[issue]: https://github.com/kubernetes/kubernetes/issues
[kubectl user guide]: https://kubernetes.io/docs/user-guide/kubectl
[kubernetes.io]: https://kubernetes.io
[mercurial]: http://mercurial.selenic.com/wiki/Download
[test-image]: https://github.com/kubernetes/test-infra/tree/master/jenkins/test-image

View File

@ -137,7 +137,7 @@ make test-e2e-node REMOTE=true IMAGE_PROJECT="<name-of-project-with-images>" IMA
```
Setting up your own host image may require additional steps such as installing etcd or docker. See
[setup_host.sh](../../test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests.
[setup_host.sh](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node/environment/setup_host.sh) for common steps to set up hosts to run node tests.
## Create instances using a different instance name prefix
@ -202,8 +202,10 @@ related test, Remote execution is recommended.**
To enable/disable kubenet:
```sh
make test_e2e_node TEST_ARGS="--disable-kubenet=true" # enable kubenet
make test_e2e_node TEST_ARGS="--disable-kubenet=false" # disable kubenet
# enable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin=kubenet --network-plugin-dir=/opt/cni/bin"'
# disable kubenet
make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin= --network-plugin-dir="'
```
## Additional QoS Cgroups Hierarchy level testing
@ -221,9 +223,9 @@ the bottom of the comments section. To re-run just the node e2e tests from the
`@k8s-bot node e2e test this issue: #<Flake-Issue-Number or IGNORE>` and **include a link to the test
failure logs if caused by a flake.**
The PR builder runs tests against the images listed in [jenkins-pull.properties](../../test/e2e_node/jenkins/jenkins-pull.properties)
The PR builder runs tests against the images listed in [jenkins-pull.properties](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node/jenkins/jenkins-pull.properties)
The post submit tests run against the images listed in [jenkins-ci.properties](../../test/e2e_node/jenkins/jenkins-ci.properties)
The post submit tests run against the images listed in [jenkins-ci.properties](https://github.com/kubernetes/kubernetes/tree/master/test/e2e_node/jenkins/jenkins-ci.properties)
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->

View File

@ -10,6 +10,8 @@ Updated: 5/3/2016
- [Building and Running the Tests](#building-and-running-the-tests)
- [Cleaning up](#cleaning-up)
- [Advanced testing](#advanced-testing)
- [Installing/updating kubetest](#installingupdating-kubetest)
- [Extracting a specific version of kubernetes](#extracting-a-specific-version-of-kubernetes)
- [Bringing up a cluster for testing](#bringing-up-a-cluster-for-testing)
- [Federation e2e tests](#federation-e2e-tests)
- [Configuring federation e2e tests](#configuring-federation-e2e-tests)
@ -79,26 +81,26 @@ changing the `KUBERNETES_PROVIDER` environment variable to something other than
To build Kubernetes, up a cluster, run tests, and tear everything down, use:
```sh
go run hack/e2e.go -v --build --up --test --down
go run hack/e2e.go -- -v --build --up --test --down
```
If you'd like to just perform one of these steps, here are some examples:
```sh
# Build binaries for testing
go run hack/e2e.go -v --build
go run hack/e2e.go -- -v --build
# Create a fresh cluster. Deletes a cluster first, if it exists
go run hack/e2e.go -v --up
go run hack/e2e.go -- -v --up
# Run all tests
go run hack/e2e.go -v --test
go run hack/e2e.go -- -v --test
# Run tests matching the regex "\[Feature:Performance\]"
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Feature:Performance\]"
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Feature:Performance\]"
# Conversely, exclude tests that match the regex "Pods.*env"
go run hack/e2e.go -v --test --test_args="--ginkgo.skip=Pods.*env"
go run hack/e2e.go -- -v --test --test_args="--ginkgo.skip=Pods.*env"
# Run tests in parallel, skip any that must be run serially
GINKGO_PARALLEL=y go run hack/e2e.go --v --test --test_args="--ginkgo.skip=\[Serial\]"
@ -112,13 +114,13 @@ GINKGO_PARALLEL=y go run hack/e2e.go --v --test --test_args="--ginkgo.skip=\[Ser
# You can also specify an alternative provider, such as 'aws'
#
# e.g.:
KUBERNETES_PROVIDER=aws go run hack/e2e.go -v --build --up --test --down
KUBERNETES_PROVIDER=aws go run hack/e2e.go -- -v --build --up --test --down
# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for
# cleaning up after a failed test or viewing logs. Use -v to avoid suppressing
# kubectl output.
go run hack/e2e.go -v -ctl='get events'
go run hack/e2e.go -v -ctl='delete pod foobar'
go run hack/e2e.go -- -v -ctl='get events'
go run hack/e2e.go -- -v -ctl='delete pod foobar'
```
The tests are built into a single binary which can be used to deploy a
@ -133,11 +135,60 @@ something goes wrong and you still have some VMs running you can force a cleanup
with this command:
```sh
go run hack/e2e.go -v --down
go run hack/e2e.go -- -v --down
```
## Advanced testing
### Installing/updating kubetest
The logic in `e2e.go` moved out of the main kubernetes repo to test-infra;
it now lives in [kubernetes/test-infra/kubetest](https://github.com/kubernetes/test-infra/tree/master/kubetest).
The remaining code in `hack/e2e.go` installs `kubetest` and sends it flags.
By default `hack/e2e.go` updates and installs `kubetest` once per day.
The `--` flag separates updater flags from kubetest flags (kubetest flags go on
the right). Control the updater behavior with the `--get` and `--old` flags:
```sh
go run hack/e2e.go --get=true --old=1h -- # Update every hour
go run hack/e2e.go --get=false -- # Never attempt to install/update.
go install k8s.io/test-infra/kubetest # Manually install
go get -u k8s.io/test-infra/kubetest # Manually update installation
```
### Extracting a specific version of kubernetes
The `kubetest` binary can download and extract a specific version of kubernetes,
including the server, client, and test binaries. The `--extract=E` flag enables this
functionality.
There are a variety of values to pass this flag:
```sh
# Official builds: <ci|release>/<latest|stable>[-N.N]
go run hack/e2e.go -- --extract=ci/latest --up # Deploy the latest ci build.
go run hack/e2e.go -- --extract=ci/latest-1.5 --up # Deploy the latest 1.5 CI build.
go run hack/e2e.go -- --extract=release/latest --up # Deploy the latest RC.
go run hack/e2e.go -- --extract=release/stable-1.5 --up # Deploy the 1.5 release.
# A specific version:
go run hack/e2e.go -- --extract=v1.5.1 --up # Deploy 1.5.1
go run hack/e2e.go -- --extract=v1.5.2-beta.0 --up # Deploy 1.5.2-beta.0
go run hack/e2e.go -- --extract=gs://foo/bar --up # --stage=gs://foo/bar
# Whatever GKE is using (gke, gke-staging, gke-test):
go run hack/e2e.go -- --extract=gke --up # Deploy whatever GKE prod uses
# Using a GCI version:
go run hack/e2e.go -- --extract=gci/gci-canary --up # Deploy the version for next gci release
go run hack/e2e.go -- --extract=gci/gci-57 # Deploy the version bound to gci m57
go run hack/e2e.go -- --extract=gci/gci-57/ci/latest # Deploy the latest CI build using gci m57 for the VM image
# Reuse whatever is already built
go run hack/e2e.go -- --up # Most common. Note, no extract flag
go run hack/e2e.go -- --build --up # Most common. Note, no extract flag
go run hack/e2e.go -- --build --stage=gs://foo/bar --extract=local --up # Extract the staged version
```
### Bringing up a cluster for testing
If you want, you may bring up a cluster in some other manner and run tests
@ -265,7 +316,7 @@ Next, specify the docker repository where your ci images will be pushed.
* Compile the binaries and build container images:
```sh
$ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true go run hack/e2e.go -v -build
$ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true go run hack/e2e.go -- -v -build
```
* Push the federation container images
@ -280,7 +331,7 @@ The following command will create the underlying Kubernetes clusters in each of
federation control plane in the cluster occupying the last zone in the `E2E_ZONES` list.
```sh
$ go run hack/e2e.go -v --up
$ go run hack/e2e.go -- -v --up
```
#### Run the Tests
@ -288,13 +339,13 @@ $ go run hack/e2e.go -v --up
This will run only the `Feature:Federation` e2e tests. You can omit the `ginkgo.focus` argument to run the entire e2e suite.
```sh
$ go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Feature:Federation\]"
$ go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Feature:Federation\]"
```
#### Teardown
```sh
$ go run hack/e2e.go -v --down
$ go run hack/e2e.go -- -v --down
```
#### Shortcuts for test developers
@ -364,13 +415,13 @@ at a custom host directly:
export KUBECONFIG=/path/to/kubeconfig
export KUBE_MASTER_IP="http://127.0.0.1:<PORT>"
export KUBE_MASTER=local
go run hack/e2e.go -v --test
go run hack/e2e.go -- -v --test
```
To control the tests that are run:
```sh
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\"Secrets\""
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\"Secrets\""
```
### Version-skewed and upgrade testing
@ -403,7 +454,7 @@ export CLUSTER_API_VERSION=${OLD_VERSION}
# Deploy a cluster at the old version; see above for more details
cd ./kubernetes_old
go run ./hack/e2e.go -v --up
go run ./hack/e2e.go -- -v --up
# Upgrade the cluster to the new version
#
@ -411,11 +462,11 @@ go run ./hack/e2e.go -v --up
#
# You can target Feature:MasterUpgrade or Feature:ClusterUpgrade
cd ../kubernetes
go run ./hack/e2e.go -v --test --check_version_skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]"
go run ./hack/e2e.go -- -v --test --check_version_skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]"
# Run old tests with new kubectl
cd ../kubernetes_old
go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
go run ./hack/e2e.go -- -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
```
If you are just testing version-skew, you may want to just deploy at one
@ -427,14 +478,14 @@ upgrade process:
# Deploy a cluster at the new version
cd ./kubernetes
go run ./hack/e2e.go -v --up
go run ./hack/e2e.go -- -v --up
# Run new tests with old kubectl
go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh"
go run ./hack/e2e.go -- -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh"
# Run old tests with new kubectl
cd ../kubernetes_old
go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
go run ./hack/e2e.go -- -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh"
```
## Kinds of tests
@ -480,6 +531,15 @@ breaking changes, it does *not* block the merge-queue, and thus should run in
some separate test suites owned by the feature owner(s)
(see [Continuous Integration](#continuous-integration) below).
In order to simplify running component-specific test suites, it may also be
necessary to tag tests with a component label. The component may include
standard and non-standard tests, so the `[Feature:.+]` label is not sufficient for
this purpose. These component labels have no impact on the standard e2e test
suites. The following component labels have been defined:
- `[Volume]`: All tests related to volumes and storage: volume plugins,
attach/detach controller, persistent volume controller, etc.
### Viper configuration and hierarchical test parameters.
The future of e2e test configuration idioms will be increasingly defined using viper, and decreasingly via flags.
@ -490,7 +550,7 @@ To use viper, rather than flags, to configure your tests:
- Just add "e2e.json" to the current directory you are in, and define parameters in it... i.e. `"kubeconfig":"/tmp/x"`.
Note that advanced testing parameters, and hierarchichally defined parameters, are only defined in viper, to see what they are, you can dive into [TestContextType](../../test/e2e/framework/test_context.go).
Note that advanced testing parameters and hierarchically defined parameters are only defined in viper; to see what they are, you can dive into [TestContextType](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/test_context.go).
In time, it is our intent to add or autogenerate a sample viper configuration that includes all e2e parameters, to ship with kubernetes.
@ -527,13 +587,13 @@ export KUBERNETES_CONFORMANCE_TEST=y
export KUBERNETES_PROVIDER=skeleton
# run all conformance tests
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Conformance\]"
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Conformance\]"
# run all parallel-safe conformance tests in parallel
GINKGO_PARALLEL=y go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]"
GINKGO_PARALLEL=y go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]"
# ... and finish up with remaining tests in serial
go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Serial\].*\[Conformance\]"
go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Serial\].*\[Conformance\]"
```
### Defining Conformance Subset

View File

@ -19,7 +19,7 @@
[Gubernator](https://k8s-gubernator.appspot.com/) is a webpage for viewing and filtering Kubernetes
test results.
Gubernator simplifies the debugging proccess and makes it easier to track down failures by automating many
Gubernator simplifies the debugging process and makes it easier to track down failures by automating many
steps commonly taken in searching through logs, and by offering tools to filter through logs to find relevant lines.
Gubernator automates the steps of finding the failed tests, displaying relevant logs, and determining the
failed pods and the corresponding pod UID, namespace, and container ID.
@ -83,7 +83,7 @@ included, the "Weave by timestamp" option can weave the selected logs together b
*Currently Gubernator can only be used with remote node e2e tests.*
**NOTE: Using Gubernator with local tests will publically upload your test logs to Google Cloud Storage**
**NOTE: Using Gubernator with local tests will publicly upload your test logs to Google Cloud Storage**
To use Gubernator to view logs from local test runs, set the GUBERNATOR tag to true.
A URL link to view the test results will be printed to the console.

View File

@ -1,32 +1,3 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.
Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Container Runtime Interface (CRI) Networking Specifications
## Introduction
@ -82,4 +53,4 @@ k8s networking requirements are satisfied.
## Related Issues
* Kubelet network plugin for client/server container runtimes [#28667](https://github.com/kubernetes/kubernetes/issues/28667)
* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316)
* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316)

View File

@ -26,7 +26,7 @@ Heapster will hide the performance cost of serving those stats in the Kubelet.
Disabling addons is simple. Just ssh into the Kubernetes master and move the
addon from `/etc/kubernetes/addons/` to a backup location. More details
[here](../../cluster/addons/).
[here](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/).
### Which / how many pods?
@ -57,11 +57,11 @@ sampling.
## E2E Performance Test
There is an end-to-end test for collecting overall resource usage of node
components: [kubelet_perf.go](../../test/e2e/kubelet_perf.go). To
components: [kubelet_perf.go](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/kubelet_perf.go). To
run the test, simply make sure you have an e2e cluster running (`go run
hack/e2e.go -up`) and [set up](#cluster-set-up) correctly.
hack/e2e.go -- -up`) and [set up](#cluster-set-up) correctly.
Run the test with `go run hack/e2e.go -v -test
Run the test with `go run hack/e2e.go -- -v -test
--test_args="--ginkgo.focus=resource\susage\stracking"`. You may also wish to
customise the number of pods or other parameters of the test (remember to rerun
`make WHAT=test/e2e/e2e.test` after you do).

View File

@ -7,9 +7,9 @@
### Traffic sources and responsibilities
* GitHub Kubernetes [issues](https://github.com/kubernetes/kubernetes/issues)
and [pulls](https://github.com/kubernetes/kubernetes/pulls): Your job is to be
the first responder to all new issues and PRs. If you are not equipped to do
* GitHub Kubernetes [issues](https://github.com/kubernetes/kubernetes/issues):
Your job is to be
the first responder to all new issues. If you are not equipped to do
this (which is fine!), it is your job to seek guidance!
* Support issues should be closed and redirected to Stackoverflow (see example
@ -35,18 +35,12 @@ This is the only situation in which you should add a priority/* label
* Assign any issues related to Vagrant to @derekwaynecarr (and @mention him
in the issue)
* All incoming PRs should be assigned a reviewer.
* unless it is a WIP (Work in Progress), RFC (Request for Comments), or design proposal.
* An auto-assigner [should do this for you] (https://github.com/kubernetes/kubernetes/pull/12365/files)
* When in doubt, choose a TL or team maintainer of the most relevant team; they can delegate
* Keep in mind that you can @ mention people in an issue/PR to bring it to
* Keep in mind that you can @ mention people in an issue to bring it to
their attention without assigning it to them. You can also @ mention github
teams, such as @kubernetes/goog-ux or @kubernetes/kubectl
* If you need help triaging an issue or PR, consult with (or assign it to)
@brendandburns, @thockin, @bgrant0607, @quinton-hoole, @davidopp, @dchen1107,
* If you need help triaging an issue, consult with (or assign it to)
@brendandburns, @thockin, @bgrant0607, @davidopp, @dchen1107,
@lavalamp (all U.S. Pacific Time) or @fgrzadkowski (Central European Time).
* At the beginning of your shift, please add team/* labels to any issues that

View File

@ -29,7 +29,7 @@ redirect users to Slack. Also check out the
In general, try to direct support questions to:
1. Documentation, such as the [user guide](../user-guide/README.md) and
1. Documentation, such as the [user guide](https://kubernetes.io/docs/user-guide/) and
[troubleshooting guide](http://kubernetes.io/docs/troubleshooting/)
2. Stackoverflow

View File

@ -13,9 +13,13 @@
# Pull Request Process
An overview of how pull requests are managed for kubernetes. This document
assumes the reader has already followed the [development guide](development.md)
to set up their environment.
An overview of how pull requests are managed for kubernetes.
This document assumes the reader has already followed the
[development guide](development.md) to set up their environment,
and understands
[basic pull request mechanics](https://help.github.com/articles/using-pull-requests).
# Life of a Pull Request
@ -50,7 +54,7 @@ For cherry-pick PRs, see the [Cherrypick instructions](cherry-picks.md)
at release time.
1. `release-note` labeled PRs generate a release note using the PR title by
default OR the release-note block in the PR template if filled in.
* See the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md) for more
* See the [PR template](https://github.com/kubernetes/kubernetes/blob/master/.github/PULL_REQUEST_TEMPLATE.md) for more
details.
* PR titles and body comments are mutable and can be modified at any time
prior to the release to reflect a release note friendly message.

View File

@ -45,8 +45,8 @@ The Release Management Team Lead is the person ultimately responsible for ensuri
* Ensures that cherry-picks do not destabilize the branch by either giving the PR enough time to stabilize in master or giving it enough time to stabilize in the release branch before cutting the release.
* Cuts the actual [release](https://github.com/kubernetes/kubernetes/releases).
#### Release Docs Lead
* Sets release docs related deadlines for developers and works with Release Management Team Lead to ensure they are widely communicated.
#### Docs Lead
* Sets docs related deadlines for developers and works with Release Management Team Lead to ensure they are widely communicated.
* Sets up release branch for docs.
* Pings feature owners to ensure that release docs are created on time.
* Reviews/merges release doc PRs.

View File

@ -117,8 +117,8 @@ cluster/kubectl.sh get replicationcontrollers
### Running a user defined pod
Note the difference between a [container](../user-guide/containers.md)
and a [pod](../user-guide/pods.md). Since you only asked for the former, Kubernetes will create a wrapper pod for you.
Note the difference between a [container](https://kubernetes.io/docs/user-guide/containers/)
and a [pod](https://kubernetes.io/docs/user-guide/pods/). Since you only asked for the former, Kubernetes will create a wrapper pod for you.
However, you cannot view the nginx start page on localhost. To verify that nginx is running, you need to run `curl` within the docker container (try `docker exec`).
You can control the specifications of a pod via a user defined manifest, and reach nginx through your browser on the port specified therein:
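For example, one hedged way to do that check from the node (assuming the container is visible in `docker ps` and its image ships `curl`):

```sh
# Find the running nginx container, then fetch the default page from inside it.
CONTAINER_ID=$(docker ps --filter ancestor=nginx --format '{{.ID}}' | head -n 1)
docker exec "$CONTAINER_ID" curl -s http://localhost
```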

View File

@ -9,9 +9,9 @@ and for each Pod, it posts a binding indicating where the Pod should be schedule
At a high level, the scheduler is divided into three layers:
- [plugin/cmd/kube-scheduler/scheduler.go](http://releases.k8s.io/HEAD/plugin/cmd/kube-scheduler/scheduler.go):
This is the main() entry that does initialization before calling the scheduler framework.
- [pkg/scheduler/scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/scheduler.go):
- [plugin/pkg/scheduler/scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/scheduler.go):
This is the scheduler framework that handles stuff (e.g. binding) beyond the scheduling algorithm.
- [pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/generic_scheduler.go):
- [plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/core/generic_scheduler.go):
The scheduling algorithm that assigns nodes for pods.
## The scheduling algorithm
@ -64,7 +64,7 @@ The Scheduler tries to find a node for each Pod, one at a time.
- First it applies a set of "predicates" to filter out inappropriate nodes. For example, if the PodSpec specifies resource requests, then the scheduler will filter out nodes that don't have at least that much resources available (computed as the capacity of the node minus the sum of the resource requests of the containers that are already running on the node).
- Second, it applies a set of "priority functions"
that rank the nodes that weren't filtered out by the predicate check. For example, it tries to spread Pods across nodes and zones while at the same time favoring the least (theoretically) loaded nodes (where "load" - in theory - is measured as the sum of the resource requests of the containers running on the node, divided by the node's capacity).
- Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code for this main scheduling loop is in the function `Schedule()` in [plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/generic_scheduler.go)
- Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code for this main scheduling loop is in the function `Schedule()` in [plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/core/generic_scheduler.go)
### Predicates and priorities policies

View File

@ -11,7 +11,7 @@ The purpose of filtering the nodes is to filter out the nodes that do not meet c
- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of requests of all Pods on the node. To learn more about the resource QoS in Kubernetes, please check [QoS proposal](../design-proposals/resource-qos.md).
- `PodFitsHostPorts`: Check if any HostPort required by the Pod is already occupied on the node.
- `HostName`: Filter out all nodes except the one specified in the PodSpec's NodeName field.
- `MatchNodeSelector`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field and, as of Kubernetes v1.2, also match the `scheduler.alpha.kubernetes.io/affinity` pod annotation if present. See [here](../user-guide/node-selection/) for more details on both.
- `MatchNodeSelector`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field and, as of Kubernetes v1.2, also match the `scheduler.alpha.kubernetes.io/affinity` pod annotation if present. See [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details on both.
- `MaxEBSVolumeCount`: Ensure that the number of attached ElasticBlockStore volumes does not exceed a maximum value (by default, 39, since Amazon recommends a maximum of 40 with one of those 40 reserved for the root volume -- see [Amazon's documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html#linux-specific-volume-limits)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable.
- `MaxGCEPDVolumeCount`: Ensure that the number of attached GCE PersistentDisk volumes does not exceed a maximum value (by default, 16, which is the maximum GCE allows -- see [GCE's documentation](https://cloud.google.com/compute/docs/disks/persistent-disks#limits_for_predefined_machine_types)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable.
- `CheckNodeMemoryPressure`: Check if a pod can be scheduled on a node reporting the memory pressure condition. Currently, no `BestEffort` pods should be placed on a node under memory pressure, as they are automatically evicted by kubelet.
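Both volume-count predicates above read the `KUBE_MAX_PD_VOLS` environment variable. A minimal example of overriding it (the value is arbitrary, and the variable is assumed to be set in the scheduler's environment):

```sh
# Illustrative only: raise the ceiling enforced by MaxEBSVolumeCount /
# MaxGCEPDVolumeCount before the scheduler process starts.
export KUBE_MAX_PD_VOLS=32
```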
@ -34,7 +34,7 @@ Currently, Kubernetes scheduler provides some practical priority functions, incl
- `SelectorSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service, replication controller, or replica set on the same node. If zone information is present on the nodes, the priority will be adjusted so that pods are spread across zones and nodes.
- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label.
- `ImageLocalityPriority`: Nodes are prioritized based on the locality of the images requested by a pod. Nodes with a larger total size of already-installed packages required by the pod are preferred over nodes with few or no such packages installed.
- `NodeAffinityPriority`: (Kubernetes v1.2) Implements `preferredDuringSchedulingIgnoredDuringExecution` node affinity; see [here](../user-guide/node-selection/) for more details.
- `NodeAffinityPriority`: (Kubernetes v1.2) Implements `preferredDuringSchedulingIgnoredDuringExecution` node affinity; see [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details.
The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). As with predicates, you can combine the above priority functions and assign weight factors (positive numbers) to them as you want (check [scheduler.md](scheduler.md) for how to customize).
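As a rough illustration of combining predicates and priority functions with weights, here is a hedged sketch of a scheduler policy file; the exact file format, the set of available names, and the `--policy-config-file` flag should be confirmed against [scheduler.md](scheduler.md):

```sh
# Write an illustrative policy file (names and format are assumptions; verify
# against scheduler.md before use), then point the scheduler at it.
cat > /tmp/scheduler-policy.json <<'EOF'
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsHostPorts"},
    {"name": "PodFitsResources"},
    {"name": "MatchNodeSelector"}
  ],
  "priorities": [
    {"name": "SelectorSpreadPriority", "weight": 1},
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "NodeAffinityPriority", "weight": 2}
  ]
}
EOF
# kube-scheduler ... --policy-config-file=/tmp/scheduler-policy.json
```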

View File

@ -0,0 +1,103 @@
# Security Release Process
Kubernetes is a large, growing community of volunteers, users, and vendors. The Kubernetes community has adopted this security disclosure and response policy to ensure we responsibly handle critical issues.
## Product Security Team (PST)
Security vulnerabilities should be handled quickly and sometimes privately. The primary goal of this process is to reduce the total time users are vulnerable to publicly known exploits.
The Product Security Team (PST) is responsible for organizing the entire response including internal communication and external disclosure but will need help from relevant developers and release managers to successfully run this process.
The initial Product Security Team will consist of four volunteers subscribed to the private [Kubernetes Security](https://groups.google.com/forum/#!forum/kubernetes-security) list. These are the people who have been involved in the initial discussion and volunteered:
- Brandon Philips `<brandon.philips@coreos.com>` [4096R/154343260542DF34]
- Jess Frazelle `<jessfraz@google.com>`
- CJ Cullen `<cjcullen@google.com>`
- Tim St. Clair `<stclair@google.com>` [4096R/0x5E6F2E2DA760AF51]
- Jordan Liggitt `<jliggitt@redhat.com>`
**Known issues**
- We haven't specified a way to cycle the Product Security Team, but we need this process deployed quickly as our current process isn't working. I (@philips) will set a deadline of March 1st, 2017 to sort that out.
## Release Manager Role
Also included on the private [Kubernetes Security](https://groups.google.com/forum/#!forum/kubernetes-security) list are all [Release Managers](https://github.com/kubernetes/community/wiki).
It is the responsibility of the PST to add and remove Release Managers as Kubernetes minor releases are created and deprecated.
## Disclosures
### Private Disclosure Processes
The Kubernetes Community asks that all suspected vulnerabilities be privately and responsibly disclosed via the Private Disclosure process available at [https://kubernetes.io/security](https://kubernetes.io/security).
### Public Disclosure Processes
If you know of a publicly disclosed security vulnerability please IMMEDIATELY email [kubernetes-security@googlegroups.com](mailto:kubernetes-security@googlegroups.com) to inform the Product Security Team (PST) about the vulnerability so they may start the patch, release, and communication process.
If possible, the PST will ask the person making the public report if the issue can be handled via a private disclosure process. If the reporter declines, the PST will move swiftly with the fix and release process. In extreme cases you can ask GitHub to delete the issue, but this generally isn't necessary and is unlikely to make a public disclosure less damaging.
## Patch, Release, and Public Communication
For each vulnerability, a member of the PST will volunteer to lead coordination with the Fix Team and the Release Managers, and is responsible for sending disclosure emails to the rest of the community. This lead will be referred to as the Fix Lead.
The role of Fix Lead should rotate round-robin across the PST.
All of the timelines below are suggestions and assume a Private Disclosure. The Fix Lead drives the schedule using their best judgment based on severity, development time, and release manager feedback. If the Fix Lead is dealing with a Public Disclosure all timelines become ASAP.
### Fix Team Organization
These steps should be completed within the first 24 hours of Disclosure.
- The Fix Lead will work quickly to identify relevant engineers from the affected projects and packages and CC those engineers into the disclosure thread. These selected developers are the Fix Team. A best guess is to invite all assignees in the OWNERS file from the affected packages.
- The Fix Lead will get the Fix Team access to private security repos to develop the fix.
### Fix Development Process
These steps should be completed within 1-7 days of Disclosure.
- The Fix Lead and the Fix Team will create a [CVSS](https://www.first.org/cvss/specification-document) score using the [CVSS Calculator](https://www.first.org/cvss/calculator/3.0). The Fix Lead makes the final call on the calculated CVSS; it is better to move quickly than to make the CVSS perfect.
- The Fix Team will notify the Fix Lead that work on the fix branch is complete once there are LGTMs on all commits in the private repo from one or more relevant assignees in the relevant OWNERS file.
If the CVSS score is under 4.0 ([a low severity score](https://www.first.org/cvss/specification-document#i5)) the Fix Team can decide to slow the release process down in the face of holidays, developer bandwidth, etc. These decisions must be discussed on the kubernetes-security mailing list.
### Fix Disclosure Process
With the Fix Development underway the Fix Lead needs to come up with an overall communication plan for the wider community. This Disclosure process should begin after the Fix Team has developed a Fix or mitigation so that a realistic timeline can be communicated to users.
**Disclosure of Forthcoming Fix to Users** (Completed within 1-7 days of Disclosure)
- The Fix Lead will email [kubernetes-announce@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-announce) and [kubernetes-security-announce@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-security-announce) informing users that a security vulnerability has been disclosed and that a fix will be made available at YYYY-MM-DD HH:MM UTC in the future via this list. This time is the Release Date.
- The Fix Lead will include any mitigating steps users can take until a fix is available.
The communication to users should be actionable. They should know when to block time to apply patches, understand exact mitigation steps, etc.
**Optional Fix Disclosure to Private Distributors List** (Completed within 1-14 days of Disclosure):
- The Fix Lead will make a determination with the help of the Fix Team if an issue is critical enough to require early disclosure to distributors. Generally this Private Distributor Disclosure process should be reserved for remotely exploitable or privilege escalation issues. Otherwise, this process can be skipped.
- The Fix Lead will email the patches to kubernetes-distributors-announce@googlegroups.com so distributors can prepare builds to be available to users on the day of the issue's announcement. Distributors can ask to be added to this list by emailing kubernetes-security@googlegroups.com and it is up to the Product Security Team's discretion to manage the list.
- TODO: Figure out process for getting folks onto this list.
- **What if a vendor breaks embargo?** The PST will assess the damage. The Fix Lead will make the call to release earlier or continue with the plan. When in doubt push forward and go public ASAP.
**Fix Release Day** (Completed within 1-21 days of Disclosure)
- The Release Managers will ensure all the binaries are built, publicly available, and functional before the Release Date.
- TODO: this will require a private security build process.
- The Release Managers will create a new patch release branch from the latest patch release tag + the fix from the security branch. As a practical example, if v1.5.3 is the latest patch release in kubernetes.git, a new branch called v1.5.4 will be created that includes only the patches required to fix the issue (see the sketch after this list).
- The Fix Lead will cherry-pick the patches onto the master branch and all relevant release branches. The Fix Team will LGTM and merge.
- The Release Managers will merge these PRs as quickly as possible. Changes shouldn't be made to the commits, even for a typo in the CHANGELOG, as this will change the git sha of the already-built commits, leading to confusion and potentially conflicts as the fix is cherry-picked around branches.
- The Fix Lead will request a CVE from [DWF](https://github.com/distributedweaknessfiling/DWF-Documentation) and include the CVSS and release details.
- The Fix Lead will email kubernetes-{dev,users,announce,security-announce}@googlegroups.com now that everything is public, announcing the new releases, the CVE number, the location of the binaries, and the relevant merged PRs to get wide distribution and user action. As much as possible this email should be actionable and include links on how to apply the fix to users' environments; this can include links to external distributor documentation.
- The Fix Lead will remove the Fix Team from the private security repo.
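A rough sketch of the branching mechanics described in the list above (branch and tag names follow the v1.5.3 / v1.5.4 example; the commit sha is a placeholder, and the real release tooling plus the private security build process take precedence):

```sh
# Hedged illustration only.
git fetch origin --tags
git checkout -b v1.5.4 v1.5.3        # new branch from the latest patch release tag
git cherry-pick <fix-commit-sha>     # apply the fix developed in the private security repo
```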
### Retrospective
These steps should be completed 1-3 days after the Release Date. The retrospective process [should be blameless](https://landing.google.com/sre/book/chapters/postmortem-culture.html).
- The Fix Lead will send a retrospective of the process to kubernetes-dev@googlegroups.com including details on everyone involved, the timeline of the process, links to relevant PRs that introduced the issue, if relevant, and any critiques of the response and release process.
- The Release Managers and Fix Team are also encouraged to send their own feedback on the process to kubernetes-dev@googlegroups.com. Honest critique is the only way we are going to get good at this as a community.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/security-release-process.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -69,19 +69,19 @@ You can set [go flags](https://golang.org/cmd/go/) by setting the
added automatically to these:
```sh
make test WHAT=pkg/api # run tests for pkg/api
make test WHAT=./pkg/api # run tests for pkg/api
```
To run multiple targets you need quotes:
```sh
make test WHAT="pkg/api pkg/kubelet" # run tests for pkg/api and pkg/kubelet
make test WHAT="./pkg/api ./pkg/kubelet" # run tests for pkg/api and pkg/kubelet
```
In a shell, it's often handy to use brace expansion:
```sh
make test WHAT=pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet
make test WHAT=./pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet
```
### Run specific unit test cases in a package
@ -92,10 +92,10 @@ regular expression for the name of the test that should be run.
```sh
# Runs TestValidatePod in pkg/api/validation with the verbose flag set
make test WHAT=pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$'
make test WHAT=./pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$'
# Runs tests that match the regex ValidatePod|ValidateConfigMap in pkg/api/validation
make test WHAT=pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$"
make test WHAT=./pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$"
```
For other supported test flags, see the [golang
@ -130,7 +130,7 @@ To run tests and collect coverage in only one package, pass its relative path
under the `kubernetes` directory as an argument, for example:
```sh
make test WHAT=pkg/kubectl KUBE_COVER=y
make test WHAT=./pkg/kubectl KUBE_COVER=y
```
Multiple arguments can be passed, in which case the coverage results will be
@ -215,7 +215,7 @@ script to run a specific integration test case:
```sh
# Run integration test TestPodUpdateActiveDeadlineSeconds with the verbose flag set.
make test-integration KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$"
make test-integration WHAT=./test/integration/pods KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$"
```
If you set `KUBE_TEST_ARGS`, the test case will be run with only the `v1` API

View File

@ -146,7 +146,7 @@ right thing.
Here are a few pointers:
+ [E2e Framework](../../test/e2e/framework/framework.go):
+ [E2e Framework](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/framework.go):
Familiarise yourself with this test framework and how to use it.
Amongst others, it automatically creates uniquely named namespaces
within which your tests can run to avoid name clashes, and reliably
@ -160,7 +160,7 @@ Here are a few pointers:
should always use this framework. Trying other home-grown
approaches to avoiding name clashes and resource leaks has proven
to be a very bad idea.
+ [E2e utils library](../../test/e2e/framework/util.go):
+ [E2e utils library](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/util.go):
This handy library provides tons of reusable code for a host of
commonly needed test functionality, including waiting for resources
to enter specified states, safely and consistently retrying failed
@ -178,9 +178,9 @@ Here are a few pointers:
+ **Follow the examples of stable, well-written tests:** Some of our
existing end-to-end tests are better written and more reliable than
others. A few examples of well-written tests include:
[Replication Controllers](../../test/e2e/rc.go),
[Services](../../test/e2e/service.go),
[Reboot](../../test/e2e/reboot.go).
[Replication Controllers](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/rc.go),
[Services](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/service.go),
[Reboot](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/reboot.go).
+ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the
test library and runner upon which our e2e tests are built. Before
you write or refactor a test, read the docs and make sure that you

View File

@ -113,14 +113,15 @@ These are grandfathered in as full projects:
- github.com/kubernetes/md-check
- github.com/kubernetes/pr-bot - move from mungebot, etc from contrib, currently running in "prod" on github.com/kubernetes
- github.com/kubernetes/dashboard
- github.com/kubernetes/helm (Graduated from incubator on Feb 2017)
- github.com/kubernetes/minikube (Graduated from incubator on Feb 2017)
**Project to Incubate But Not Move**
These projects are young but have significant user-facing docs pointing at their current github.com/kubernetes location. Let's put them through the incubation process but leave them at github.com/kubernetes.
- github.com/kubernetes/minikube
- github.com/kubernetes/charts
**Projects to Move to Incubator**
- github.com/kubernetes/kube2consul
@ -149,5 +150,6 @@ Large portions of this process and prose are inspired by the Apache Incubator pr
## Original Discussion
https://groups.google.com/d/msg/kubernetes-dev/o6E1u-orDK8/SAqal_CeCgAJ
## Future Work
## Future Work
- Expanding potential sources of champions outside of Kubernetes main repo

View File

@ -1,10 +1,11 @@
# SIG AWS
A Special Interest Group for maintaining, supporting, and using Kubernetes on AWS.
A Special Interest Group for maintaining, supporting, and using Kubernetes on AWS.
## Meeting:
- Meetings: Scheduled via the official [group mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-aws)
- Zoom Link: [SIG AWS](https://zoom.us/my/k8ssigaws)
- Agenda: [Google Doc](https://docs.google.com/document/d/1-i0xQidlXnFEP9fXHWkBxqySkXwJnrGJP9OGyP2_P14/edit)
## Organizers:
@ -15,4 +16,4 @@ A Special Interest Group for maintaining, supporting, and using Kubernetes on AW
| Kris Nova | [kris-nova](https://github.com/kris-nova) |
| Mackenzie Burnett | [mfburnett](https://github.com/mfburnett) |
The meeting is open to all and we encourage you to join. Feel free to join the zoom call at your convenience.
The meeting is open to all and we encourage you to join. Feel free to join the zoom call at your convenience.

View File

@ -1,3 +1,26 @@
# NOTE: THE BIG DATA SIG IS INDEFINITELY SUSPENDED, IN FAVOR OF THE ["APPS" SIG](https://github.com/kubernetes/community/blob/master/sig-apps/README.md).
# SIG Big Data
[Old Meeting Notes](https://docs.google.com/document/d/1YhNLN39f5oZ4AHn_g7vBp0LQd7k37azL7FkWG8CEDrE/edit)
A Special Interest Group for deploying and operating big data applications (Spark, Kafka, Hadoop, Flink, Storm, etc) on Kubernetes. We focus on integrations with big data applications and architecting the best ways to run them on Kubernetes.
## Meeting:
* Meetings: Wednesdays 10:00 AM PST
* Video Conference Link: updated in [the official group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data)
* Check out the [Agenda and Minutes](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit)! Note: this SIG was operational briefly in 2015. Minutes for those meetings are in [their prior location](https://docs.google.com/document/u/1/d/1YhNLN39f5oZ4AHn_g7vBp0LQd7k37azL7FkWG8CEDrE/edit).
* Slack: https://kubernetes.slack.com/messages/sig-big-data/
## Goals:
* Design and architect ways to run big data applications effectively on Kubernetes
* Discuss ongoing implementation efforts
* Discuss resource sharing and multi-tenancy (in the context of big data applications)
* Suggest Kubernetes features where we see a need
## Non-goals:
* Endorsing any particular tool/framework
## Organizers:
* [Anirudh Ramanathan](https://github.com/foxish), Google
* [Kenneth Owens](https://github.com/kow3ns), Google

View File

@ -13,7 +13,31 @@ We focus on the development and standardization of the CLI [framework](https://g
* Slack: <https://kubernetes.slack.com/messages/sig-cli> ([archive](http://kubernetes.slackarchive.io/sig-cli))
* Google Group: <https://groups.google.com/forum/#!forum/kubernetes-sig-cli>
## Organizers:
## Leads
**Note:** Escalate to these folks if you cannot get help from slack or the Google group
* Fabiano Franz <ffranz@redhat.com>, Red Hat
- slack / github: @fabianofranz
* Phillip Wittrock <pwittroc@google.com>, Google
* Tony Ado <coolhzb@gmail.com>, Alibaba
- slack / github: @pwittrock
* Tony Ado <coolhzb@gmail.com>, Alibaba
- slack / github: @adohe
## Contributing
See [this document](https://github.com/kubernetes/community/blob/master/sig-cli/contributing.md) for contributing instructions.
## Sig-cli teams
Mention one or more of the following teams:
| Name | Description |
|------------------------------------|--------------------------------------------------|
|@kubernetes/sig-cli-bugs | For bugs in kubectl |
|@kubernetes/sig-cli-feature-requests| For initial discussion of new feature requests |
|@kubernetes/sig-cli-proposals | For in depth discussion of new feature proposals |
|@kubernetes/sig-cli-pr-reviews | For PR code reviews |
|@kubernetes/sig-test-failures | For e2e test flakes |
|@kubernetes/sig-cli-misc | For general discussion and escalation |

306
sig-cli/contributing.md Normal file
View File

@ -0,0 +1,306 @@
# Contributing
The process for contributing code to Kubernetes via the sig-cli [community][community page].
## TL;DR
- The sig-cli [community page] lists sig-cli [leads],
channels of [communication], and group [meeting] times.
- New contributors: please start by adopting an [existing issue].
- Request a feature by making an [issue] and mentioning
`@kubernetes/sig-cli-feature-requests`.
- Write a [design proposal] before starting work on a new feature.
- Write [tests]!
## Before You Begin
Welcome to the Kubernetes sig-cli contributing guide. We are excited
about the prospect of you joining our [community][community page]!
Please understand that all contributions to Kubernetes require time
and commitment from the project maintainers to review the ux, software
design, and code. Mentoring and on-boarding new contributors is done
in addition to many other responsibilities.
### Understand the big picture
- Complete the [Kubernetes Basics Tutorial].
- Be familiar with the [kubectl user-facing documentation][kubectl docs].
- Read the concept guides starting with the [management overview].
### Modify your own `kubectl` fork
Make sure you are ready to immediately get started once you have been
assigned a piece of work. Do this right away.
- Setup your [development environment][development guide].
- Look at code:
- [kubernetes/cmd/kubectl] is the entry point
- [kubernetes/pkg/kubectl] is the implementation
- Look at how some of the other commands are implemented
- Add a new command to do something simple:
- Add `kubectl hello-world`: print "Hello World"
- Add `kubectl hello-kubernetes -f file`: Print "Hello \<kind of resource\> \<name of resource\>"
- Add `kubectl hello-kubernetes type/name`: Print "Hello \<kind of resource\> \<name of resource\> \<creation time\>"
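Once you have a toy command like the ones above, a minimal build-and-try loop might look like this (the build target follows the usual `make WHAT=` convention; the output path is an assumption and may differ by platform):

```sh
# Build only kubectl from your working tree, then try the new subcommand.
make WHAT=cmd/kubectl
# The binary typically lands under _output/ (exact path may vary).
_output/bin/kubectl hello-world
```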
### Agree to contribution rules
Follow the [CLA signup instructions](../CLA.md).
### Adopt an issue
New contributors can try the following to work on an existing [bug] or [approved design][design repo]:
- In [slack][slack-messages] (signup [here][slack-signup]),
@mention a [lead][leads] and ask if there are any issues you could pick up.
Leads can recommend issues that have enough priority to receive PR review bandwidth.
- Send an email to the _kubernetes-sig-cli@googlegroups.com_ [group]
> Subject: New sig-cli contributor _${yourName}_
>
> Body: Hello, my name is _${yourName}_. I would like to get involved in
> contributing to the Kubernetes project. I have read all of the
> user documentation listed on the community contributing page.
> What should I do next to get started?
- Attend a sig-cli [meeting] and introduce yourself as looking to get started.
### Bug lifecycle
1. An [issue] is filed that
- includes steps to reproduce the issue including client / server version,
- mentions `@kubernetes/sig-cli-bugs`.
2. A [PR] fixing the issue is implemented that
- __includes unit and e2e tests__,
- incorporates review feedback,
- description includes `Closes #<Issue Number>`,
- description or comment @mentions `@kubernetes/sig-cli-pr-reviews`.
3. Fix appears in the next Kubernetes release!
## Feature requests
__New contributors:__ Please start by adopting an [existing issue].
A feature request is an [issue] mentioning `@kubernetes/sig-cli-feature-requests`.
To encourage readership, the issue description should _concisely_ (2-4 sentences) describe
the problem that the feature addresses.
### Feature lifecycle
Working on a feature without getting approval for the user experience
and software design often results in wasted time and effort due to
decisions around flag-names, command names, and specific command
behavior.
To minimize wasted work and improve communication across efforts,
the user experience and software design must be agreed upon before
any PRs are sent for code review.
1. Identify a problem by filing an [issue] (mention `@kubernetes/sig-cli-feature-requests`).
2. Submit a [design proposal] and get it approved by a lead.
3. Announce the proposal as an [agenda] item for the sig-cli [meeting].
- Ensures awareness and feedback.
- Should be included in meeting notes sent to the sig-cli [group].
4. _Merge_ the proposal PR after approval and announcement.
5. A [lead][leads] adds the associated feature to the [feature repo], ensuring that
- release-related decisions are properly made and communicated,
- API changes are vetted,
- testing is completed,
- docs are completed,
- feature is designated _alpha_, _beta_ or _GA_.
6. Implement the code per discussion in [bug lifecycle][bug].
7. Update [kubectl concept docs].
8. Wait for your feature to appear in the next Kubernetes release!
## Design Proposals
__New contributors:__ Please start by adopting an [existing issue].
A design proposal is a single markdown document in the [design repo]
that follows the [design template].
To make one,
- Prepare the markdown document as a PR to that repo.
- Avoid _Work In Progress_ (WIP) PRs (send it only after
you consider it complete).
- For early feedback, use the email discussion [group].
- Mention `@kubernetes/sig-cli-proposals` in the description.
- Mention the related [feature request].
Expect feedback from 2-3 different sig-cli community members.
Incorporate feedback and comment [`PTAL`].
Once a [lead][leads] has agreed (via review commentary) that design
and code review resources can be allocated to tackle the proposal, the
details of the user experience and design should be discussed in the
community.
This step is _important_; it prevents code churn and thrashing around
issues like flag names, command names, etc.
It is normal for sig-cli community members to push back on feature
proposals. sig-cli development and review resources are extremely
constrained. Community members are free to say
- No, not this release (or year).
- This is desirable but we need help on these other existing issues before tackling this.
- No, this problem should be solved in another way.
The proposal can be merged into the [design repo] after [lead][leads]
approval and discussion as a meeting [agenda] item.
Then coding can begin.
## Implementation
Contributors can begin implementing a feature before any of the above
steps have been completed, but _should not send a PR until
the [design proposal] has been merged_.
See the [development guide] for instructions on setting up the
Kubernetes development environment.
Implementation PRs should
- mention the issue of the associated design proposal,
- mention `@kubernetes/sig-cli-pr-reviews`,
- __include tests__.
Small features and flag changes require only unit/integration tests,
while larger changes require both unit/integration tests and e2e tests.
### Report progress
_Leads need your help to ensure that progress is made to
get the feature into a [release]._
While working on the issue, leave a weekly update on the issue
including:
1. What's finished?
2. What part is being worked on now?
3. Anything blocking?
## Documentation
_Let users know about cool new features by updating user facing documentation._
Depending on the contributor and size of the feature, this
may be done either by the same contributor that implemented the feature,
or another contributor who is more familiar with the existing docs
templates.
## Release
Several weeks before a Kubernetes release, development enters a stabilization
period where no new features are merged. For a feature to be accepted
into a release, it must be fully merged and tested by this time. If
your feature is not fully complete, _including tests_, it will have
to wait until the next release.
## Merge state meanings
- Merged:
- Ready to be implemented.
- Unmerged:
- Experience and design still being worked out.
- Not a high priority issue but may implement in the future: revisit
in 6 months.
- Unintentionally dropped.
- Closed:
- Not something we plan to implement in the proposed manner.
- Not something we plan to revisit in the next 12 months.
## Escalation
### If your bug issue is stuck
If an issue isn't getting any attention and is unresolved, mention
`@kubernetes/sig-cli-bugs`.
Highlight the severity and urgency of the issue. For severe issues
escalate by contacting sig [leads] and attending the [meeting].
### If your feature request issue is stuck
If an issue isn't getting any attention and is unresolved, mention
`@kubernetes/sig-cli-feature-requests`.
If a particular issue has a high impact for you or your business,
make sure this is clear on the bug, and reach out to the sig leads
directly. Consider attending the sig meeting to discuss over video
conference.
### If your PR is stuck
It may happen that your PR seems to be stuck without clear actionable
feedback for a week or longer. A PR _associated with a bug or design
proposal_ is much less likely to be stuck than a dangling PR.
However, if it happens do the following:
- If your PR is stuck for a week or more because it has never gotten any
comments, mention `@kubernetes/sig-cli-pr-reviews` and ask for attention.
- If your PR is stuck for a week or more _after_ it got comments, but
the attention has died down, mention the reviewer and comment with
[`PTAL`].
If you are still not able to get any attention after a couple days,
escalate to sig [leads] by mentioning them.
### If your design proposal issue is stuck
It may happen that your design doc gets stuck without getting merged
or additional feedback. If you believe that your design is important
and has been dropped, or it is not moving forward, please add it to
the sig cli bi-weekly meeting [agenda] and mail the [group] saying
you'd like to discuss it.
### General escalation instructions
See the sig-cli [community page] for points of contact and meeting times:
- attend the sig-cli [meeting]
- message one of the sig leads on [slack][slack-messages] (signup [here][slack-signup])
- send an email to the _kubernetes-sig-cli@googlegroups.com_ [group].
## Use of [@mentions]
- `@{any lead}` solicit opinion or advice from [leads].
- `@kubernetes/sig-cli-bugs` sig-cli centric bugs.
- `@kubernetes/sig-cli-pr-reviews` triggers review of code fix PR.
- `@kubernetes/sig-cli-feature-requests` flags a feature request.
- `@kubernetes/sig-cli-proposals` flags a design proposal.
[@mentions]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#mentioning-users-and-teams
[Kubernetes Basics Tutorial]: https://kubernetes.io/docs/tutorials/kubernetes-basics
[PR]: https://help.github.com/articles/creating-a-pull-request
[`PTAL`]: https://en.wiktionary.org/wiki/PTAL
[agenda]: https://docs.google.com/document/d/1r0YElcXt6G5mOWxwZiXgGu_X6he3F--wKwg-9UBc29I/edit
[bug]: #bug-lifecycle
[communication]: https://github.com/kubernetes/community/tree/master/sig-cli#communication
[community page]: https://github.com/kubernetes/community/tree/master/sig-cli
[design proposal]: #design-proposals
[design repo]: https://github.com/kubernetes/community/tree/master/contributors/design-proposals/sig-cli
[design template]: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/sig-cli/template.md
[development guide]: https://github.com/kubernetes/community/blob/master/contributors/devel/development.md
[existing issue]: #adopt-an-issue
[feature repo]: https://github.com/kubernetes/features
[feature request]: #feature-requests
[feature]: https://github.com/kubernetes/features
[group]: https://groups.google.com/forum/#!forum/kubernetes-sig-cli
[issue]: https://github.com/kubernetes/kubernetes/issues
[kubectl concept docs]: https://github.com/kubernetes/kubernetes.github.io/tree/master/docs/concepts/tools/kubectl
[kubectl docs]: https://kubernetes.io/docs/user-guide/kubectl-overview
[kubernetes/cmd/kubectl]: https://github.com/kubernetes/kubernetes/tree/master/cmd/kubectl
[kubernetes/pkg/kubectl]: https://github.com/kubernetes/kubernetes/tree/master/pkg/kubectl
[leads]: https://github.com/kubernetes/community/tree/master/sig-cli#leads
[management overview]: https://kubernetes.io/docs/concepts/tools/kubectl/object-management-overview
[meeting]: https://github.com/kubernetes/community/tree/master/sig-cli#meetings
[release]: #release
[slack-messages]: https://kubernetes.slack.com/messages/sig-cli
[slack-signup]: http://slack.k8s.io/
[tests]: https://github.com/kubernetes/community/blob/master/contributors/devel/testing.md

View File

@ -19,6 +19,7 @@ and we need to get the monotonically growing PR merge latency and numbers of ope
##Organizers:
* Garrett Rodrigues grod@google.com, @grodrigues3, Google
* Elsie Phillips elsie.phillips@coreos.com, @Phillels, CoreOS
##Issues:
* [Detailed backlog](https://github.com/kubernetes/contrib/projects/1)

View File

@ -11,12 +11,11 @@
* Announce new SIG on kubernetes-dev@googlegroups.com
* Submit a PR to add a row for the SIG to the table in the kubernetes/community README.md file, to create a kubernetes/community directory, and to add any SIG-related docs, schedules, roadmaps, etc. to your new kubernetes/community/SIG-foo directory.
####
**Google Groups creation**
#### **Google Groups creation**
Create Google Groups at [https://groups.google.com/forum/#!creategroup](https://groups.google.com/forum/#!creategroup), following the procedure:
* Each SIG should have one discussion groups, and a number of groups for mirroring relevant github notificaitons;
* Each SIG should have one discussion group, and a number of groups for mirroring relevant github notifications;
* Create groups using the name conventions below;
* Groups should be created as e-mail lists with at least three owners (including sarahnovotny at google.com and ihor.dvoretskyi at gmail.com);
* To add the owners, visit the Group Settings (drop-down menu on the right side), select Direct Add Members on the left side and add Sarah and Ihor via email address (with a suitable welcome message); in Members/All Members select Ihor and Sarah and assign to an "owner role";
@ -44,8 +43,7 @@ Example:
* kubernetes-sig-onprem-pr-reviews
* kubernetes-sig-onprem-api-reviews
####
**GitHub users creation**
#### **GitHub users creation**
Create the GitHub users at [https://github.com/join](https://github.com/join), using the name convention below.
@ -76,8 +74,7 @@ Example:
NOTE: We have found that Github's notification autocompletion finds the users before the corresponding teams. This is the reason we recommend naming the users `k8s-mirror-foo-*` instead of `k8s-sig-foo-*`. If you previously created users named `k8s-sig-foo-*`, we recommend you rename them.
####
**Create the GitHub teams**
#### **Create the GitHub teams**
Create the GitHub teams at [https://github.com/orgs/kubernetes/new-team](https://github.com/orgs/kubernetes/new-team), using the name convention below. Please, add the GitHub users (created before) to the GitHub teams respectively.

View File

@ -4,7 +4,7 @@ A Special Interest Group for documentation, doc processes, and doc publishing fo
## Meeting:
* Meetings: Tuesdays @ 10:30AM PST
* Zoom Link: https://zoom.us/j/4730809290
* Zoom Link: https://zoom.us/j/678394311
* Check out the [Agenda and Minutes](https://docs.google.com/document/d/1Ds87eRiNZeXwRBEbFr6Z7ukjbTow5RQcNZLaSvWWQsE/edit)
## Comms:
@ -12,12 +12,12 @@ A Special Interest Group for documentation, doc processes, and doc publishing fo
* Google Group: [kubernetes-sig-docs@googlegroups.com](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)
## Goals:
* Discuss documentation and docs issues for kubernetes
* Plan docs releases for k8s
* Suggest improvements to developer onboarding where we see friction
* Discuss documentation and docs issues for kubernetes.io
* Plan docs releases for kubernetes
* Suggest improvements to user onboarding through better documentation on Kubernetes.io
* Identify and implement ways to get documentation feedback and metrics
* Help people get involved in the kubernetes community
* Help community contributors get involved in kubernetes documentation
## Organizers:
* Jared Bhatti <jaredb@google.com>, Google
* Devin Donnelly <ddonnelly@google.com>, Google
* Jared Bhatti <jaredb@google.com>, Google

View File

@ -4,12 +4,12 @@ This is a SIG focused on Federation of Kubernetes Clusters ("Ubernetes") and rel
* Hybrid clouds
* Spanning of multiple cloud providers
* Application migration from private to public clouds (and vice versa)
* ... and other similar subjects.
* ... and other similar subjects.
## Meetings:
* Bi-weekly on Mondays @ 9am [America/Los_Angeles](http://time.is/Los_Angeles) (check [the calendar](https://calendar.google.com/calendar/embed?src=cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com&ctz=America/Los_Angeles))
* Hangouts link: <https://plus.google.com/hangouts/_/google.com/ubernetes>
* [Working Group Notes](https://docs.google.com/document/d/1r0YElcXt6G5mOWxwZiXgGu_X6he3F--wKwg-9UBc29I/edit?usp=sharing)
* [Working Group Notes](https://docs.google.com/document/d/18mk62nOXE_MCSSnb4yJD_8UadtzJrYyJxFwbrgabHe8/edit)
## Communication:
* Slack: <https://kubernetes.slack.com/messages/sig-federation> ([archive](http://kubernetes.slackarchive.io/sig-federation))

43
sig-list.md Normal file
View File

@ -0,0 +1,43 @@
# SIGs and Working Groups
Most community activity is organized into Special Interest Groups (SIGs),
time-bounded Working Groups, and the [community meeting](communication.md#Meeting).
SIGs follow these [guidelines](governance.md), although each of these groups may operate a little differently
depending on their needs and workflow.
Each group's material is in its subdirectory in this project.
When the need arises, a [new SIG can be created](sig-creation-procedure.md).
### Master SIG List
| Name | Leads | Group | Slack Channel | Meetings |
|------|-------|-------|---------------|----------|
| [API Machinery](sig-api-machinery/README.md) | [@lavalamp](https://github.com/lavalamp) Daniel Smith, Google <br> [@deads2k](https://github.com/orgs/kubernetes/people/deads2k) David Eads, Red Hat| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-api-machinery) | [#sig-api-machinery](https://kubernetes.slack.com/messages/sig-api-machinery/) | [Every other Wednesday at 11:00 AM PST](https://staging.talkgadget.google.com/hangouts/_/google.com/kubernetes-sig) |
| [Apps](sig-apps/README.md) | [@michelleN (Michelle Noorali, Deis)](https://github.com/michelleN)<br>[@mattfarina (Matt Farina, HPE)](https://github.com/mattfarina) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-apps) | [#sig-apps](https://kubernetes.slack.com/messages/sig-apps) | [Mondays 9:00AM PST](https://zoom.us/j/4526666954) |
| [Auth](sig-auth/README.md) | [@erictune (Eric Tune, Google)](https://github.com/erictune)<br> [@ericchiang (Eric Chiang, CoreOS)](https://github.com/orgs/kubernetes/people/ericchiang)<br> [@liggitt (Jordan Liggitt, Red Hat)](https://github.com/orgs/kubernetes/people/liggitt) <br> [@deads2k (David Eads, Red Hat)](https://github.com/orgs/kubernetes/people/deads2k) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-auth) | [#sig-auth](https://kubernetes.slack.com/messages/sig-auth/) | Biweekly [Wednesdays at 1100 to 1200 PT](https://zoom.us/my/k8s.sig.auth) |
| [Autoscaling](sig-autoscaling/README.md) | [@fgrzadkowski (Filip Grządkowski, Google)](https://github.com/fgrzadkowski)<br> [@directxman12 (Solly Ross, Red Hat)](https://github.com/directxman12) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-autoscaling) | [#sig-autoscaling](https://kubernetes.slack.com/messages/sig-autoscaling/) | Biweekly (or triweekly) on [Thurs at 0830 PT](https://plus.google.com/hangouts/_/google.com/k8s-autoscaling) |
| [AWS](sig-aws/README.md) | [@justinsb (Justin Santa Barbara)](https://github.com/justinsb)<br>[@kris-nova (Kris Nova)](https://github.com/kris-nova)<br>[@chrislovecnm (Chris Love)](https://github.com/chrislovecnm)<br>[@mfburnett (Mackenzie Burnett)](https://github.com/mfburnett) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) | [#sig-aws](https://kubernetes.slack.com/messages/sig-aws/) | We meet on [Zoom](https://zoom.us/my/k8ssigaws), and the calls are scheduled via the official [group mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-aws) |
| [Big Data](sig-big-data/README.md) | [@foxish (Anirudh Ramanathan, Google)](https://github.com/foxish)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data) | [#sig-big-data](https://kubernetes.slack.com/messages/sig-big-data/) | Wednesdays at 10am PST, link posted in [the official group](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data). |
| [CLI](sig-cli/README.md) | [@fabianofranz (Fabiano Franz, Red Hat)](https://github.com/fabianofranz)<br>[@pwittrock (Phillip Wittrock, Google)](https://github.com/pwittrock)<br>[@AdoHe (Tony Ado, Alibaba)](https://github.com/AdoHe) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cli) | [#sig-cli](https://kubernetes.slack.com/messages/sig-cli) | Bi-weekly Wednesdays at 9:00 AM PT on [Zoom](https://zoom.us/my/sigcli) |
| [Cluster Lifecycle](sig-cluster-lifecycle/README.md) | [@lukemarsden (Luke Marsden, Weave)](https://github.com/lukemarsden) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-lifecycle) | [#sig-cluster-lifecycle](https://kubernetes.slack.com/messages/sig-cluster-lifecycle) | Tuesdays at 09:00 AM PST on [Zoom](https://zoom.us/j/166836624) |
| [Cluster Ops](sig-cluster-ops/README.md) | [@zehicle (Rob Hirschfeld, RackN)](https://github.com/zehicle) <br> [@mikedanese (Mike Danese, Google)](https://github.com/mikedanese) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-ops) | [#sig-cluster-ops](https://kubernetes.slack.com/messages/sig-cluster-ops) | Thursdays at 1:00 PM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/sig-cluster-ops)|
| [Contributor Experience](sig-contribx/README.md) | [@grodrigues3 (Garrett Rodrigues, Google)](https://github.com/Grodrigues3) <br> [@pwittrock (Phillip Witrock, Google)](https://github.com/pwittrock) <br> [@Phillels (Elsie Phillips, CoreOS)](https://github.com/Phillels) | [Group](https://groups.google.com/forum/#!forum/kubernetes-wg-contribex) | [#wg-contribex](https://kubernetes.slack.com/messages/wg-contribex) | Biweekly Wednesdays 9:30 AM PST on [zoom](https://zoom.us/j/4730809290) |
| [Docs](sig-docs/README.md) | [@pwittrock (Philip Wittrock, Google)](https://github.com/pwittrock) <br> [@devin-donnelly (Devin Donnelly, Google)](https://github.com/devin-donnelly) <br> [@jaredbhatti (Jared Bhatti, Google)](https://github.com/jaredbhatti)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-docs) | [#sig-docs](https://kubernetes.slack.com/messages/sig-docs) | Tuesdays @ 10:30AM PST on [Zoom](https://zoom.us/j/678394311) |
| [Federation](sig-federation/README.md) | [@csbell (Christian Bell, Google)](https://github.com/csbell) <br> [@quinton-hoole (Quinton Hoole, Huawei)](https://github.com/quinton-hoole) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-federation) | [#sig-federation](https://kubernetes.slack.com/messages/sig-federation/) | Bi-weekly on Monday at 9:00 AM PST on [hangouts](https://plus.google.com/hangouts/_/google.com/ubernetes) |
| [Instrumentation](sig-instrumentation/README.md) | [@piosz (Piotr Szczesniak, Google)](https://github.com/piosz) <br> [@fabxc (Fabian Reinartz, CoreOS)](https://github.com/fabxc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-instrumentation) | [#sig-instrumentation](https://kubernetes.slack.com/messages/sig-instrumentation) | [Thursdays at 9.30 AM PST](https://zoom.us/j/5342565819) |
| [Network](sig-network/README.md) | [@thockin (Tim Hockin, Google)](https://github.com/thockin)<br> [@dcbw (Dan Williams, Red Hat)](https://github.com/dcbw)<br> [@caseydavenport (Casey Davenport, Tigera)](https://github.com/caseydavenport) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-network) | [#sig-network](https://kubernetes.slack.com/messages/sig-network/) | Thursdays at 2:00 PM PST on [Zoom](https://zoom.us/j/5806599998) |
| [Node](sig-node/README.md) | [@dchen1107 (Dawn Chen, Google)](https://github.com/dchen1107)<br>[@euank (Euan Kemp, CoreOS)](https://github.com/orgs/kubernetes/people/euank)<br>[@derekwaynecarr (Derek Carr, Red Hat)](https://github.com/derekwaynecarr) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-node) | [#sig-node](https://kubernetes.slack.com/messages/sig-node/) | [Tuesdays at 10:00 PT](https://plus.google.com/hangouts/_/google.com/sig-node-meetup?authuser=0) |
| [On Prem](sig-on-prem/README.md) | [@josephjacks (Joseph Jacks, Apprenda)](https://github.com/josephjacks) <br> [@zen (Tomasz Napierala, Mirantis)](https://github.com/zen)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-on-prem) | [#sig-onprem](https://kubernetes.slack.com/messages/sig-onprem/) | Every two weeks on Wednesday at 9 PM PST / 12 PM EST |
| [OpenStack](sig-openstack/README.md) | [@idvoretskyi (Ihor Dvoretskyi, Mirantis)](https://github.com/idvoretskyi) <br> [@xsgordon (Steve Gordon, Red Hat)](https://github.com/xsgordon)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack) | [#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/) | Every second Wednesday at 5 PM PDT / 2 PM EDT |
| [PM](sig-pm/README.md) | [@apsinha (Aparna Sinha, Google)](https://github.com/apsinha) <br> [@idvoretskyi (Ihor Dvoretskyi, Mirantis)](https://github.com/idvoretskyi) <br> [@calebamiles (Caleb Miles, CoreOS)](https://github.com/calebamiles)| [Group](https://groups.google.com/forum/#!forum/kubernetes-pm) | [#kubernetes-pm](https://kubernetes.slack.com/messages/kubernetes-pm/) | TBD|
| [Rktnetes](sig-rktnetes/README.md) | [@euank (Euan Kemp, CoreOS)](https://github.com/euank) <br> [@tmrts (Tamer Tas)](https://github.com/tmrts) <br> [@yifan-gu (Yifan Gu, CoreOS)](https://github.com/yifan-gu) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-rktnetes) | [#sig-rktnetes](https://kubernetes.slack.com/messages/sig-rktnetes/) | [As needed (ad-hoc)](https://zoom.us/j/830298957) |
| [Scalability](sig-scalability/README.md) | [@lavalamp (Daniel Smith, Google)](https://github.com/lavalamp)<br>[@countspongebob (Bob Wise, Samsung SDS)](https://github.com/countspongebob)<br>[@jbeda (Joe Beda)](https://github.com/jbeda) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scale) | [#sig-scale](https://kubernetes.slack.com/messages/sig-scale/) | [Thursdays at 09:00 PT](https://zoom.us/j/989573207) |
| [Scheduling](sig-scheduling/README.md) | [@davidopp (David Oppenheimer, Google)](https://github.com/davidopp)<br>[@timothysc (Timothy St. Clair, Red Hat)](https://github.com/timothysc) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-scheduling) | [#sig-scheduling](https://kubernetes.slack.com/messages/sig-scheduling/) | Alternate between Mondays at 1 PM PT and Wednesdays at 12:30 AM PT on [Zoom](https://zoom.us/zoomconference?m=rN2RrBUYxXgXY4EMiWWgQP6Vslgcsn86) |
| [Service Catalog](sig-service-catalog/README.md) | [@pmorie (Paul Morie, Red Hat)](https://github.com/pmorie) <br> [@arschles (Aaron Schlesinger, Deis)](https://github.com/arschles) <br> [@bmelville (Brendan Melville, Google)](https://github.com/bmelville) <br> [@duglin (Doug Davis, IBM)](https://github.com/duglin)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-service-catalog) | [#sig-service-catalog](https://kubernetes.slack.com/messages/sig-service-catalog/) | [Mondays at 1 PM PST](https://zoom.us/j/7201225346) |
| [Storage](sig-storage/README.md) | [@saad-ali (Saad Ali, Google)](https://github.com/saad-ali)<br>[@childsb (Brad Childs, Red Hat)](https://github.com/childsb) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-storage) | [#sig-storage](https://kubernetes.slack.com/messages/sig-storage/) | Bi-weekly Thursdays 9 AM PST (or more frequently) on [Zoom](https://zoom.us/j/614261834) |
| [Testing](sig-testing/README.md) | [@spiffxp (Aaron Crickenberger, Samsung)](https://github.com/spiffxp)<br>[@ixdy (Jeff Grafton, Google)](https://github.com/ixdy) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-testing) | [#sig-testing](https://kubernetes.slack.com/messages/sig-testing/) | [Tuesdays at 9:30 AM PT](https://zoom.us/j/553910341) |
| [UI](sig-ui/README.md) | [@romlein (Dan Romlein, Apprenda)](https://github.com/romlein)<br> [@bryk (Piotr Bryk, Google)](https://github.com/bryk) | [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-ui) | [#sig-ui](https://kubernetes.slack.com/messages/sig-ui/) | Wednesdays at 4:00 PM CEST |
| [Windows](sig-windows/README.md) | [@michmike77 (Michael Michael, Apprenda)](https://github.com/michmike)| [Group](https://groups.google.com/forum/#!forum/kubernetes-sig-windows) | [#sig-windows](https://kubernetes.slack.com/messages/sig-windows) | Bi-weekly Tuesdays at 9:30 AM PT |

View File

@ -18,4 +18,4 @@ We use **sig-onprem** label to track on premise efforts in PRs and issues:
**Effort tracking document** [On-Prem related projects](https://docs.google.com/spreadsheets/d/1Ca9ZpGXM4PfycYv0Foi7Y4vmN4KVXrGYcJipbH8_xLY/edit#gid=0)
**Meetings:** Every second Wednesday at 0800 PST (11 AM ET / 5 PM CET) - [Connect using Zoom](https://zoom.us/my/k8s.sig.onprem), [Agenda/Notes](https://docs.google.com/document/d/1AHF1a8ni7iMOpUgDMcPKrLQCML5EMZUAwP4rro3P6sk/edit#)
**Meetings:** Every second Wednesday at 0900 PST (12 PM ET / 6 PM CET) - [Connect using Zoom](https://zoom.us/my/k8s.sig.onprem), [Agenda/Notes](https://docs.google.com/document/d/1AHF1a8ni7iMOpUgDMcPKrLQCML5EMZUAwP4rro3P6sk/edit#)

View File

@ -1,21 +1,62 @@
# OpenStack SIG
This is the wiki page of the Kubernetes OpenStack SIG: a special interest group
co-ordinating contributions of OpenStack-related changes to Kubernetes.
This is the community page of the Kubernetes OpenStack SIG: a special
interest group coordinating the cross-community efforts of the OpenStack
and Kubernetes communities. This includes OpenStack-related contributions
to Kubernetes projects with OpenStack as:
* a deployment platform for Kubernetes,
* a service provider for Kubernetes,
* a collection of applications to run on Kubernetes.
## Meetings
Meetings are held every second Wednesday. The meetings occur at
1500 UTC or 2100 UTC, alternating.
To check which time is being used for the upcoming meeting refer to the
[Agenda/Notes](https://docs.google.com/document/d/1iAQ3LSF_Ky6uZdFtEZPD_8i6HXeFxIeW4XtGcUJtPyU/edit?usp=sharing_eixpa_nl&ts=588b986f).
Meeting reminders are also sent to the [kubernetes-sig-openstack](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack)
list. Meetings are held on [Zoom](https://zoom.us) in the room at
[https://zoom.us/j/417251241](https://zoom.us/j/417251241).
## Leads
Steve Gordon (@xsgordon) and Ihor Dvoretskyi (@idvoretskyi)
## Slack Channel
[#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/). [Archive](http://kubernetes.slackarchive.io/sig-openstack/)
## Mailing Lists
The OpenStack SIG has a number of mailing lists; most activities are
coordinated via the general discussion list, with the remainder used for
following GitHub notifications where the SIG is tagged in a comment.
The general discussion regarding the SIG occurs on the [kubernetes-sig-openstack](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack)
mailing list.
## GitHub Teams
A number of GitHub teams are set up that can be mentioned in an issue or PR
to bring it to the attention of the relevant team. These notifications are also
sent to the mailing list attached to each GitHub team for archival purposes;
discussion is not intended to occur on these lists directly.
| Name | Archival List |
|------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
|@kubernetes/sig-openstack-api-reviews | [kubernetes-sig-openstack-api-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-api-reviews) |
|@kubernetes/sig-openstack-bugs | [kubernetes-sig-openstack-bugs](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-bugs) |
|@kubernetes/sig-openstack-feature-requests| [kubernetes-sig-openstack-feature-requests](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-feature-requests) |
|@kubernetes/sig-openstack-proposals | [kubernetes-sig-openstack-proposals](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-proposals) |
|@kubernetes/sig-openstack-pr-reviews | [kubernetes-sig-openstack-pr-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-pr-reviews) |
|@kubernetes/sig-openstack-misc | [kubernetes-sig-openstack-misc](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-misc) |
|@kubernetes/sig-openstack-test-failures | [kubernetes-sig-openstack-test-failures](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack-test-failures) |
## Issues and Bugs
Relevant [Issues](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen%20label%3Asig%2Fopenstack%20is%3Aissue)
and [Pull Requests](https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen%20is%3Apr%20label%3Asig%2Fopenstack)
are tagged with the **sig-openstack** label.
**Leads:** Steve Gordon (@xsgordon) and Ihor Dvoretskyi (@idvoretskyi)
**Slack Channel:** [#sig-openstack](https://kubernetes.slack.com/messages/sig-openstack/). [Archive](http://kubernetes.slackarchive.io/sig-openstack/)
**Mailing List:** [kubernetes-sig-openstack](https://groups.google.com/forum/#!forum/kubernetes-sig-openstack)
**Meetings:** Meetings are held every second Wednesday. The meetings occur at
1500 UTC or 2100 UTC, alternating. To check which time is being used for the
upcoming meeting refer to the [Agenda/Notes](https://docs.google.com/document/d/1iAQ3LSF_Ky6uZdFtEZPD_8i6HXeFxIeW4XtGcUJtPyU/edit#).
Meeting reminders are also sent to the mailing list linked above. Meetings are
held on [Zoom](https://zoom.us) in the room at [https://zoom.us/j/417251241](https://zoom.us/j/417251241).
are tagged with the **sig/openstack** label.
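As an illustration, the **sig/openstack** label described above can also be
queried programmatically. This is a minimal sketch (not taken from the SIG
docs) using the public GitHub search API; the exact query string is an
assumption based on the issue and PR links above.

```python
# Minimal sketch: list open kubernetes/kubernetes issues carrying the
# sig/openstack label via the public GitHub search API.
# Unauthenticated requests are subject to low rate limits.
import requests

query = "repo:kubernetes/kubernetes is:issue is:open label:sig/openstack"
resp = requests.get(
    "https://api.github.com/search/issues",
    params={"q": query},
    headers={"Accept": "application/vnd.github+json"},
)
resp.raise_for_status()

for issue in resp.json()["items"]:
    print(f'#{issue["number"]} {issue["title"]} ({issue["html_url"]})')
```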

View File

@ -1,21 +0,0 @@
List of the OpenStack Special Interest Group team members.
Use @kubernetes/sig-openstack to mention this team in comments.
* [David Oppenheimer](https://github.com/davidopp) (owner)
* [Steve Gordon](https://github.com/xsgordon) (SIG lead)
* [Ihor Dvoretskyi](https://github.com/idvoretskyi) (SIG lead)
* [Angus Lees](https://github.com/anguslees)
* [Pengfei Ni](https://github.com/feiskyer)
* [Joshua Harlow](https://github.com/harlowja)
* [Stephen McQuaid](https://github.com/stevemcquaid)
* [Huamin Chen](https://github.com/rootfs)
* [David F. Flanders](https://github.com/DFFlanders)
* [Davanum Srinivas](https://github.com/dims)
* [Egor Guz](https://github.com/eghobo)
* [Flavio Percoco Premoli](https://github.com/flaper87)
* [Hongbin Lu](https://github.com/hongbin)
* [Louis Taylor](https://github.com/kragniz)
* [Jędrzej Nowak](https://github.com/pigmej)
* [rohitagarwalla](https://github.com/rohitagarwalla)
* [Russell Bryant](https://github.com/russellb)

37
sig-pm/README.md Normal file
View File

@ -0,0 +1,37 @@
## The Kubernetes PM Group
### Focus
The Kubernetes PM Group focuses on aspects of product management, such as the qualification and successful management of user requests, and aspects of project and program management, such as the continued improvement of the processes used by the Kubernetes community to maintain the Kubernetes Project itself.
Besides helping to discover both what to build and how to build it, the PM Group also helps keep the wheels on this spaceship we are all building together; another primary focus of this group is bringing together people who think about Kubernetes as both a vibrant community of humans and a technical program.
Members of the Kubernetes PM Group can assume [certain additional](https://github.com/kubernetes/community/blob/master/project-managers/README.md) responsibilities to help maintain the Kubernetes Project itself.
It is also important to remember that the role of managing an open source project as large as Kubernetes is very new and largely unscoped; we are learning too, and we are excited to discover how we can best serve the community of users and contributors.
### Common activities
- Collecting and generalizing user feedback to help drive project direction and priorities: delivering on user needs while enforcing vendor neutrality
- Supporting collaboration across the community by working to improve the communication of roadmap and workload of other [Special Interest Groups](https://github.com/kubernetes/community#special-interest-groups-sig-and-working-groups)
- Supporting the continued effort to improve the stability and extensibility of the Kubernetes Project
- Supporting the marketing and promotion of the Kubernetes Project through the [CNCF](https://www.cncf.io/)
- Working with the [Kubernetes Release Team](https://github.com/kubernetes/community/tree/master/contributors/devel/release) to continually ensure a high quality release of the Kubernetes Project
- Supporting the Kubernetes ecosystem through [the Kubernetes Incubator](https://github.com/kubernetes/community/blob/master/incubator.md)
- Coordinating project wide policy changes for Kubernetes and the Kubernetes Incubator
- Onboarding large groups of corporate contributors and welcoming them into the Kubernetes Community
- Whatever is needed to help make the project go!
### Contact us
- via [Slack](https://kubernetes.slack.com/messages/kubernetes-pm/)
- via [Google Groups](https://groups.google.com/forum/#!forum/kubernetes-pm)
### Regular Meetings
Every second Tuesday, 8:00 AM PT/4:00 PM UTC
- [Zoom link](https://zoom.us/j/845373595)
- [Meeting Notes](https://docs.google.com/document/d/1YqIpyjz4mV1jjvzhLx9JYy8LAduedzaoBMjpUKGUJQo/edit?usp=sharing)
### Leaders
- Aparna Sinha apsinha@google.com, Google
- Ihor Dvoretskyi ihor.dvoretskyi@gmail.com, Mirantis
- Caleb Miles caleb.miles@coreos.com, CoreOS

View File

@ -1,18 +1,34 @@
# Scalability SIG
# SIG Scalability
**Leads:** Bob Wise (@countspongebob) and Joe Beda (@jbeda)
Responsible for answering scalability-related questions such as:
**Slack Channel:** [#sig-scale](https://kubernetes.slack.com/messages/sig-scale/). [Archive](http://kubernetes.slackarchive.io/sig-scale/)
What size clusters do we think that we should support with Kubernetes in the short to
medium term? How performant do we think that the control system should be at scale?
What resource overhead should the Kubernetes control system reasonably consume?
**Mailing List:** [kubernetes-sig-scale](https://groups.google.com/forum/#!forum/kubernetes-sig-scale)
For more details about our objectives please review [Scaling And Performance Goals](goals.md)
**Meetings:** Thursdays at 9am pacific. Contact Joe or Bob for invite. [Notes](https://docs.google.com/a/bobsplanet.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?usp=drive_web)
## Organizers
- Bob Wise (@countspongebob), Samsung-CNCT
- Joe Beda (@jbeda), Heptio
**Docs:**
[Scaling And Performance Goals](goals.md)
## Meetings
### Scalability SLAs
- **Every Thursday at 9am pacific.**
- Contact Joe or Bob for invite.
- [Zoom link](https://zoom.us/j/989573207)
- [Agenda items](https://docs.google.com/a/bobsplanet.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?usp=drive_web)
## Slack / Google Groups
- [Slack: #sig-scale](https://kubernetes.slack.com/messages/sig-scale/).
- [Slack Archive](http://kubernetes.slackarchive.io/sig-scale/)
- [kubernetes-sig-scale](https://groups.google.com/forum/#!forum/kubernetes-sig-scale)
## Docs
- [Scaling And Performance Goals](goals.md)
## Scalability SLAs
We officially support two different SLAs:

View File

@ -10,6 +10,8 @@
[**Meeting Agenda**](http://goo.gl/A0m24V)
[**Meeting Video Playlist**](https://goo.gl/ZmLNX9)
### SIG Mission
Mission: to develop a Kubernetes API for the CNCF service broker and Kubernetes broker implementation.

View File

@ -25,13 +25,13 @@ Interested in contributing to storage features in Kubernetes? [Please read our g
* [kubernetes-sig-storage-proposals](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-proposals)
* [kubernetes-sig-storage-test-failures](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-test-failures)
* Github Teams - These are the teams that should be mentioned on Github PRs and Issues:
* [kubernetes-sig-storage-api-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-api-reviews)
* [kubernetes-sig-storage-bugs](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-bugs)
* [kubernetes-sig-storage-feature-requests](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-feature-requests)
* [kubernetes-sig-storage-misc](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-misc)
* [kubernetes-sig-storage-pr-reviews](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-pr-reviews)
* [kubernetes-sig-storage-proposals](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-proposals)
* [kubernetes-sig-storage-test-failures](https://groups.google.com/forum/#!forum/kubernetes-sig-storage-test-failures)
* [kubernetes-sig-storage-api-reviews](https://github.com/orgs/kubernetes/teams/sig-storage-api-reviews)
* [kubernetes-sig-storage-bugs](https://github.com/orgs/kubernetes/teams/sig-storage-bugs)
* [kubernetes-sig-storage-feature-requests](https://github.com/orgs/kubernetes/teams/sig-storage-feature-requests)
* [kubernetes-sig-storage-misc](https://github.com/orgs/kubernetes/teams/sig-storage-misc)
* [kubernetes-sig-storage-pr-reviews](https://github.com/orgs/kubernetes/teams/sig-storage-pr-reviews)
* [kubernetes-sig-storage-proposals](https://github.com/orgs/kubernetes/teams/sig-storage-proposals)
* [kubernetes-sig-storage-test-failures](https://github.com/orgs/kubernetes/teams/sig-storage-test-failures)
* Github Issues
* [link](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Asig%2Fstorage)
* Documentation for currently supported volume plugins: http://kubernetes.io/docs/user-guide/volumes/

View File

@ -6,7 +6,8 @@ For folks that prefer reading the docs first, we recommend reading our Storage D
For folks that prefer a video overview, we recommend watching the following videos:
- [The state of state](https://www.youtube.com/watch?v=jsTQ24CLRhI&index=6&list=PLosInM-8doqcBy3BirmLM4S_pmox6qTw3)
- [Kubernetes Storage 101](https://www.youtube.com/watch?v=ZqTHe6Xj0Ek&list=PLosInM-8doqcBy3BirmLM4S_pmox6qTw3&index=38)
- [Storage overview to SIG Apps](https://www.youtube.com/watch?v=DrLGxkFdDNc&feature=youtu.be&t=11m19s)
- [Overview of Basic Volume for SIG Apps](https://youtu.be/DrLGxkFdDNc?t=11m19s)
- [Overview of Dynamic Provisioning for SIG Apps](https://youtu.be/NXUHmxXytUQ?t=10m33s)
Keep in mind that the video overviews reflect the state of the art at the time they were created. In Kubernetes we try very hard to maintain backwards compatibility, but Kubernetes is a fast-moving project and we do add features going forward; attending the Storage SIG meetings and following the Storage SIG Google group are both good ways of continually staying up to speed.

View File

@ -2,10 +2,17 @@
A special interest group for bringing Kubernetes support to Windows.
## Meeting
* Bi-weekly: Tuesday 1:00 PM EST (10:00 AM PST)
## Meetings
* Bi-weekly: Tuesday 12:30 PM EST (9:30 AM PST)
* Zoom link: [https://zoom.us/my/sigwindows](https://zoom.us/my/sigwindows)
* To get an invite to the meeting, first join the Google group https://groups.google.com/forum/#!forum/kubernetes-sig-windows, and then ask the SIG Lead for the current invitation.
## History
* Recorded Meetings Playlist on YouTube: https://www.youtube.com/playlist?list=PL69nYSiGNLP2OH9InCcNkWNu2bl-gmIU4&jct=LZ9EIvD4DGrhr2h4r0ItaBmco7gTgw
* Meeting Notes: https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit#heading=h.kbz22d1yc431
The meeting agenda and notes can be found [here](https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit).
## Get Involved
* Find us on Slack at https://kubernetes.slack.com/messages/sig-windows
* Find us on Google groups https://groups.google.com/forum/#!forum/kubernetes-sig-windows
* Slack History is archived at http://kubernetes.slackarchive.io/sig-windows/