Merge pull request #3 from wu-qiang/kms-plugin-grpc-api
gRPC-based KMS plugin service
Commit: ca6d798a21
@ -1,7 +1,7 @@
|
|||
# Files that should be ignored by tools which do not want to consider generated
|
||||
# code.
|
||||
#
|
||||
# https://github.com/kubernetes/contrib/blob/master/mungegithub/mungers/size.go
|
||||
# https://git.k8s.io/contrib/mungegithub/mungers/size.go
|
||||
#
|
||||
# This file is a series of lines, each of the form:
|
||||
# <type> <name>
|
||||
|
|
|
@ -0,0 +1,7 @@
|
|||
<!-- Thanks for sending a pull request! Here are some tips for you:
|
||||
- If this is your first contribution, read our Getting Started guide https://github.com/kubernetes/community#your-first-contribution
|
||||
- If you are editing SIG information, please follow these instructions: https://git.k8s.io/community/generator
|
||||
You will need to follow these steps:
|
||||
1. Edit sigs.yaml with your change
|
||||
2. Generate docs with `make generate`. To build docs for one sig, run `make WHAT=sig-apps generate`
|
||||
-->
|
|
@ -0,0 +1,39 @@
|
|||
# OSX leaves these everywhere on SMB shares
|
||||
._*
|
||||
|
||||
# OSX trash
|
||||
.DS_Store
|
||||
|
||||
# Eclipse files
|
||||
.classpath
|
||||
.project
|
||||
.settings/**
|
||||
|
||||
# Files generated by JetBrains IDEs, e.g. IntelliJ IDEA
|
||||
.idea/
|
||||
*.iml
|
||||
|
||||
# Vscode files
|
||||
.vscode
|
||||
|
||||
# Emacs save files
|
||||
*~
|
||||
\#*\#
|
||||
.\#*
|
||||
|
||||
# Vim-related files
|
||||
[._]*.s[a-w][a-z]
|
||||
[._]s[a-w][a-z]
|
||||
*.un~
|
||||
Session.vim
|
||||
.netrwhist
|
||||
|
||||
# JUnit test output from ginkgo e2e tests
|
||||
/junit*.xml
|
||||
|
||||
# Mercurial files
|
||||
**/.hg
|
||||
**/.hg*
|
||||
|
||||
# direnv .envrc files
|
||||
.envrc
|
85
CLA.md
@ -1,42 +1,68 @@
|
|||
# The Contributor License Agreement
|
||||
|
||||
The [Cloud Native Computing Foundation][CNCF] defines the legal status of the
|
||||
contributed code in a _Contributor License Agreement_ (CLA).
|
||||
The [Cloud Native Computing Foundation](https://www.cncf.io/community) defines
|
||||
the legal status of the contributed code in a _Contributor License Agreement_
|
||||
(CLA).
|
||||
|
||||
Only original source code from CLA signatories can be accepted into kubernetes.
|
||||
|
||||
This policy does not apply to [third_party] and [vendor].
|
||||
This policy does not apply to [third_party](https://git.k8s.io/kubernetes/third_party)
|
||||
and [vendor](https://git.k8s.io/kubernetes/vendor).
|
||||
|
||||
## What am I agreeing to?
|
||||
|
||||
There are two versions of the CLA:
|
||||
|
||||
1. One for [individual contributors](https://github.com/cncf/cla/blob/master/individual-cla.pdf)
|
||||
submitting contributions on their own behalf.
|
||||
1. One for [corporations](https://github.com/cncf/cla/blob/master/corporate-cla.pdf)
|
||||
to sign for contributions submitted by their employees.
|
||||
|
||||
It is important to read and understand this legal agreement.
|
||||
|
||||
## How do I sign?
|
||||
|
||||
#### 1. Read
|
||||
#### 1. Log into the Linux Foundation ID Portal with Github
|
||||
|
||||
* [CLA for individuals] to sign up as an individual or as an employee of a signed organization.
|
||||
* [CLA for corporations] to sign as a corporation representative and manage signups from your organization.
|
||||
|
||||
#### 2. Sign in with GitHub.
|
||||
Click one of:
|
||||
* [Individual signup](https://identity.linuxfoundation.org/projects/cncf) to
|
||||
sign up as an individual or as an employee of a signed organization.
|
||||
* [Corporation signup](https://identity.linuxfoundation.org/node/285/organization-signup)
|
||||
to sign as a corporation representative and manage signups from your organization.
|
||||
|
||||
Click
|
||||
* [Individual signup] to sign up as an individual or as an employee of a signed organization.
|
||||
* [Corp signup] to sign as a corporation representative and manage signups from your organization.
|
||||
|
||||
Either signup form looks like this:
|
||||
Once you get to the sign in form, click "Log in with Github":
|
||||
|
||||

|
||||
|
||||
#### 3. Enter the correct E-mail address to validate!
|
||||
#### 2. Create Linux Foundation ID Portal account with correct e-mail address
|
||||
|
||||
The address entered on the form must meet two constraints:
|
||||
|
||||
* It __must match__ your [git email] (the output of `git config user.email`)
|
||||
or your PRs will not be approved!
|
||||
Ensure that the e-mail address you use when completing this form matches the one
|
||||
you will use for your commits.
|
||||
|
||||
* It must be your official `person@organization.com` address if you signed up
|
||||
as an employee of said organization.
|
||||
If you are signing up as an employee, you must use your official
|
||||
person@organization.domain email address in the CNCF account registration page.
|
||||
|
||||

|
||||
|
||||
#### 4. Look for an email indicating successful signup.
|
||||
#### 3. Complete signing process
|
||||
|
||||
Once you have created your account, follow the instructions to complete the
|
||||
signing process via Hellosign.
|
||||
|
||||
#### 4. Ensure your Github e-mail address matches address used to sign CLA
|
||||
|
||||
Your Github email address __must match__ the same address you use when signing
|
||||
the CLA. Github has [documentation](https://help.github.com/articles/setting-your-commit-email-address-on-github/)
|
||||
on setting email addresses.
|
||||
|
||||
You must also set your [git e-mail](https://help.github.com/articles/setting-your-email-in-git)
|
||||
to match this e-mail address as well.
|
||||
|
||||
If you've already submitted a PR you can correct your user.name and user.email
|
||||
and then use `git commit --amend --reset-author` and then `git push` to
|
||||
correct the PR.
|
||||
|
||||
#### 5. Look for an email indicating successful signup.
|
||||
|
||||
> The Linux Foundation
|
||||
>
|
||||
|
@ -50,21 +76,8 @@ Once you have this, the CLA authorizer bot will authorize your PRs.
|
|||
|
||||

|
||||
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
If you have signup trouble, please explain your case on
|
||||
the [CLA signing issue] and we (@sarahnovotny and @foxish),
|
||||
along with the [CNCF] will help sort it out.
|
||||
If you are having problems signing the CLA, send an email to `helpdesk@rt.linuxfoundation.org`.
|
||||
|
||||
Another option: ask for help at `helpdesk@rt.linuxfoundation.org`.
|
||||
|
||||
[CNCF]: https://www.cncf.io/community
|
||||
[CLA signing issue]: https://github.com/kubernetes/kubernetes/issues/27796
|
||||
[CLA for individuals]: https://github.com/cncf/cla/blob/master/individual-cla.pdf
|
||||
[CLA for corporations]: https://github.com/cncf/cla/blob/master/corporate-cla.pdf
|
||||
[Corp signup]: https://identity.linuxfoundation.org/node/285/organization-signup
|
||||
[Individual signup]: https://identity.linuxfoundation.org/projects/cncf
|
||||
[git email]: https://help.github.com/articles/setting-your-email-in-git
|
||||
[third_party]: https://github.com/kubernetes/kubernetes/tree/master/third_party
|
||||
[vendor]: https://github.com/kubernetes/kubernetes/tree/master/vendor
|
||||
Someone from the CNCF will respond to your ticket to help.
|
||||
|
|
|
@ -1,17 +1,39 @@
|
|||
# Contributing to the community repo
|
||||
# Contributing to the Community Repo
|
||||
|
||||
Make a [pull request](https://help.github.com/articles/using-pull-requests) (PR).
|
||||
Welcome to the Kubernetes Community contributing guide. We are excited about the prospect of you joining our [community](https://github.com/kubernetes/community)!
|
||||
|
||||
Upon successful review, someone will give the PR
|
||||
## Finding Something to Work On
|
||||
|
||||
Before you begin, you can look at some places where you can help out:
|
||||
|
||||
- [Existing open issues](https://github.com/kubernetes/community/issues)
|
||||
- [Open Pull Requests](https://github.com/kubernetes/community/pulls) - Most of the pull requests in this repository are not code, but documentation and process. This is a great place to get started. Even if you do not have the rights to merge you can still help by doing a review.
|
||||
|
||||
## Steps
|
||||
|
||||
1. Make a [Pull Request](https://help.github.com/articles/using-pull-requests) (PR).
|
||||
2. Upon successful review, someone will give the PR
|
||||
a __LGTM__ (_looks good to me_) in the review thread.
|
||||
|
||||
A [SIG lead](sig-list.md) (or someone with approval powers
|
||||
3. A [SIG lead](sig-list.md) (or someone with approval powers
|
||||
as specified in an OWNERS file) may merge the PR immediately
|
||||
with or without an LGTM from someone else.
|
||||
Or they may wait a business day to get further feedback from other reviewers.
|
||||
|
||||
### Trivial Edits
|
||||
|
||||
Each incoming Pull Request needs to be reviewed, checked, and then merged.
|
||||
While automation helps with this, each contribution also has an engineering cost. Therefore it is appreciated if you do NOT make trivial edits and fixes, but instead focus on giving the entire file a review.
|
||||
If you find one grammatical or spelling error, there are likely more in that file. You can really make your Pull Request count by checking formatting, checking for broken links, and fixing errors, then submitting all the fixes to that file at once.
|
||||
Some questions to consider:
|
||||
- Can the file be improved further?
|
||||
- Does the trivial edit greatly improve the quality of the content?
|
||||
|
||||
## Contributing to Individual SIGs
|
||||
|
||||
Each SIG may or may not have its own policies for editing its section of this repository.
|
||||
|
||||
Edits in SIG sub-directories should follow any additional guidelines described
|
||||
by the respective SIG leads in the sub-directory's `CONTRIBUTING` file
|
||||
(e.g. [sig-cli/CONTRIBUTING](sig-cli/CONTRIBUTING.md)).
|
||||
|
||||
|
||||
Attending a [SIG meeting](/sig-list.md) or posting on their mailing list might be prudent if you want to make extensive contributions.
|
||||
|
|
21
Makefile
@ -1,21 +1,24 @@
|
|||
IMAGE_NAME=kube-communitydocs
|
||||
IMAGE_NAME=golang:1.9
|
||||
|
||||
default: \
|
||||
generate \
|
||||
|
||||
reset-docs:
|
||||
git checkout HEAD -- sig-list.md sig-*/README.md
|
||||
git checkout HEAD -- ./sig-list.md ./sig-*/README.md ./wg-*/README.md
|
||||
|
||||
build-image:
|
||||
docker build -q -t $(IMAGE_NAME) -f generator/Dockerfile generator
|
||||
generate:
|
||||
go run ./generator/app.go
|
||||
|
||||
generate: build-image
|
||||
docker run --rm -e WG -e SIG -v $(shell pwd):/go/src/app/generated:Z $(IMAGE_NAME) app
|
||||
generate-dockerized:
|
||||
docker run --rm -e WHAT -v $(shell pwd):/go/src/app:Z $(IMAGE_NAME) make -C /go/src/app generate
|
||||
|
||||
verify:
|
||||
@hack/verify.sh
|
||||
|
||||
test: build-image
|
||||
docker run --rm $(IMAGE_NAME) go test -v ./...
|
||||
test:
|
||||
go test -v ./generator/...
|
||||
|
||||
.PHONY: default reset-docs build-image generate verify test
|
||||
test-dockerized:
|
||||
docker run --rm -v $(shell pwd):/go/src/app:Z $(IMAGE_NAME) make -C /go/src/app test
|
||||
|
||||
.PHONY: default reset-docs generate generate-dockerized verify test test-dockerized
|
||||
|
|
18
OWNERS
@ -1,18 +1,22 @@
|
|||
reviewers:
|
||||
- sarahnovotny
|
||||
- idvoretskyi
|
||||
- calebamiles
|
||||
- castrojo
|
||||
- cblecker
|
||||
- grodrigues3
|
||||
- idvoretskyi
|
||||
- sarahnovotny
|
||||
approvers:
|
||||
- sarahnovotny
|
||||
- idvoretskyi
|
||||
- calebamiles
|
||||
- grodrigues3
|
||||
- brendandburns
|
||||
- calebamiles
|
||||
- castrojo
|
||||
- cblecker
|
||||
- dchen1107
|
||||
- grodrigues3
|
||||
- idvoretskyi
|
||||
- jbeda
|
||||
- lavalamp
|
||||
- sarahnovotny
|
||||
- smarterclayton
|
||||
- spiffxp
|
||||
- thockin
|
||||
- wojtek-t
|
||||
|
||||
|
|
|
@ -0,0 +1,117 @@
|
|||
aliases:
|
||||
sig-api-machinery-leads:
|
||||
- lavalamp
|
||||
- deads2k
|
||||
sig-apps-leads:
|
||||
- michelleN
|
||||
- mattfarina
|
||||
- prydonius
|
||||
sig-architecture-leads:
|
||||
- bgrant0607
|
||||
- jdumars
|
||||
sig-auth-leads:
|
||||
- ericchiang
|
||||
- liggitt
|
||||
- deads2k
|
||||
sig-autoscaling-leads:
|
||||
- mwielgus
|
||||
- directxman12
|
||||
sig-aws-leads:
|
||||
- justinsb
|
||||
- kris-nova
|
||||
- chrislovecnm
|
||||
- mfburnett
|
||||
sig-azure-leads:
|
||||
- slack
|
||||
- colemickens
|
||||
- jdumars
|
||||
sig-big-data-leads:
|
||||
- foxish
|
||||
- erikerlandson
|
||||
sig-cli-leads:
|
||||
- fabianofranz
|
||||
- pwittrock
|
||||
- AdoHe
|
||||
sig-cluster-lifecycle-leads:
|
||||
- lukemarsden
|
||||
- jbeda
|
||||
- roberthbailey
|
||||
- luxas
|
||||
sig-cluster-ops-leads:
|
||||
- zehicle
|
||||
- jdumars
|
||||
sig-contributor-experience-leads:
|
||||
- grodrigues3
|
||||
- Phillels
|
||||
sig-docs-leads:
|
||||
- devin-donnelly
|
||||
- jaredbhatti
|
||||
sig-gcp-leads:
|
||||
- abgworrall
|
||||
sig-multicluster-leads:
|
||||
- csbell
|
||||
- quinton-hoole
|
||||
sig-instrumentation-leads:
|
||||
- piosz
|
||||
- fabxc
|
||||
sig-network-leads:
|
||||
- thockin
|
||||
- dcbw
|
||||
- caseydavenport
|
||||
sig-node-leads:
|
||||
- dchen1107
|
||||
- derekwaynecarr
|
||||
sig-on-premise-leads:
|
||||
- marcoceppi
|
||||
- dghubble
|
||||
sig-openstack-leads:
|
||||
- idvoretskyi
|
||||
- xsgordon
|
||||
sig-product-management-leads:
|
||||
- apsinha
|
||||
- idvoretskyi
|
||||
- calebamiles
|
||||
sig-release-leads:
|
||||
- pwittrock
|
||||
- calebamiles
|
||||
sig-scalability-leads:
|
||||
- wojtek-t
|
||||
- countspongebob
|
||||
- jbeda
|
||||
sig-scheduling-leads:
|
||||
- davidopp
|
||||
- timothysc
|
||||
sig-service-catalog-leads:
|
||||
- pmorie
|
||||
- arschles
|
||||
- vaikas-google
|
||||
- duglin
|
||||
sig-storage-leads:
|
||||
- saad-ali
|
||||
- childsb
|
||||
sig-testing-leads:
|
||||
- spiffxp
|
||||
- fejta
|
||||
- stevekuznetsov
|
||||
- timothysc
|
||||
sig-ui-leads:
|
||||
- danielromlein
|
||||
- floreks
|
||||
sig-windows-leads:
|
||||
- michmike
|
||||
wg-resource-management-leads:
|
||||
- vishh
|
||||
- derekwaynecarr
|
||||
wg-container-identity-leads:
|
||||
- smarterclayton
|
||||
- destijl
|
||||
wg-kubeadm-adoption-leads:
|
||||
- luxas
|
||||
- justinsb
|
||||
wg-cluster-api-leads:
|
||||
- kris-nova
|
||||
- pipejakob
|
||||
- roberthbailey
|
||||
wg-app-def-leads:
|
||||
- ant31
|
||||
- sebgoa
|
32
README.md
@ -14,21 +14,24 @@ For more specific topics, try a SIG.
|
|||
## SIGs
|
||||
|
||||
Kubernetes is a set of projects, each shepherded by a special interest group (SIG).
|
||||
|
||||
|
||||
A first step to contributing is to pick from the [list of kubernetes SIGs](sig-list.md).
|
||||
|
||||
A SIG can have its own policy for contribution,
|
||||
A SIG can have its own policy for contribution,
|
||||
described in a `README` or `CONTRIBUTING` file in the SIG
|
||||
folder in this repo (e.g. [sig-cli/CONTRIBUTING](sig-cli/CONTRIBUTING.md)),
|
||||
and its own mailing list, slack channel, etc.
|
||||
|
||||
|
||||
If you want to edit details about a SIG (e.g. its weekly meeting time or its leads),
|
||||
please follow [these instructions](./generator) that detail how our docs are auto-generated.
|
||||
|
||||
## How Can I Help?
|
||||
|
||||
Documentation (like the text you are reading now) can
|
||||
always use improvement!
|
||||
|
||||
There's a [semi-curated list of issues][help-wanted]
|
||||
that should not need deep knowledge of the system.
|
||||
There's a [semi-curated list of issues][help wanted]
|
||||
that should not need deep knowledge of the system.
|
||||
|
||||
To dig deeper, read a design doc, e.g. [architecture].
|
||||
|
||||
|
@ -52,14 +55,14 @@ lead to many relevant topics, including
|
|||
## Your First Contribution
|
||||
|
||||
We recommend that you work on an existing issue before attempting
|
||||
to [develop a new feature].
|
||||
to [develop a new feature].
|
||||
|
||||
Start by finding an existing issue with the [help-wanted] label;
|
||||
these are issues we've deemed are well suited for new contributors.
|
||||
Alternatively, if there is a specific area you are interested in,
|
||||
Start by finding an existing issue with the [help wanted] label;
|
||||
these are issues we've deemed well suited for new contributors.
|
||||
Alternatively, if there is a specific area you are interested in,
|
||||
ask a [SIG lead](sig-list.md) for suggestions, and respond on the
|
||||
issue thread expressing interest in working on it.
|
||||
|
||||
issue thread expressing interest in working on it.
|
||||
|
||||
This helps other people know that the issue is active, and
|
||||
hopefully prevents duplicated efforts.
|
||||
|
||||
|
@ -75,15 +78,14 @@ If you want to work on a new idea of relatively small scope:
|
|||
1. Submit a [pull request] containing a tested change.
|
||||
|
||||
|
||||
[architecture]: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture.md
|
||||
[cmd]: https://github.com/kubernetes/kubernetes/tree/master/cmd
|
||||
[architecture]: /contributors/design-proposals/architecture/architecture.md
|
||||
[cmd]: https://git.k8s.io/kubernetes/cmd
|
||||
[CLA]: CLA.md
|
||||
[Collaboration Guide]: contributors/devel/collab.md
|
||||
[Developer's Guide]: contributors/devel/development.md
|
||||
[develop a new feature]: https://github.com/kubernetes/features
|
||||
[expectations]: contributors/devel/community-expectations.md
|
||||
[help-wanted]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Ahelp-wanted
|
||||
[help wanted]: https://go.k8s.io/help-wanted
|
||||
[pull request]: contributors/devel/pull-requests.md
|
||||
|
||||
[]()
|
||||
|
||||
|
|
|
@ -0,0 +1,3 @@
|
|||
# Kubernetes Community Code of Conduct
|
||||
|
||||
Kubernetes follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
|
|
@ -20,18 +20,15 @@ and meetings devoted to Kubernetes.
|
|||
## Social Media
|
||||
|
||||
* [Twitter]
|
||||
* [Google+]
|
||||
* [blog]
|
||||
* Pose questions and help answer them on [Slack][slack.k8s.io] or [Stack Overflow].
|
||||
* [Blog]
|
||||
* Pose questions and help answer them on [Stack Overflow].
|
||||
* [Slack] - sign up
|
||||
|
||||
Most real time discussion happens at [kubernetes.slack.com];
|
||||
you can sign up at [slack.k8s.io].
|
||||
|
||||
Real-time discussion happens at kubernetes.slack.com:
|
||||
Discussions on most channels are archived at [kubernetes.slackarchive.io].
|
||||
Start archiving by inviting the _slackarchive_ bot to a
|
||||
channel via `/invite @slackarchive`.
|
||||
To add new channels, contact one of the admins
|
||||
(briangrant, goltermann, jbeda, sarahnovotny and thockin).
|
||||
To add new channels, contact one of the admins in the #slack-admins channel. Our guidelines are [here](/communication/slack-guidelines.md).
|
||||
|
||||
## Issues
|
||||
|
||||
|
@ -50,10 +47,13 @@ Development announcements and discussions appear on the Google group
|
|||
Users trade notes on the Google group
|
||||
[kubernetes-users] (send mail to `kubernetes-users@googlegroups.com`).
|
||||
|
||||
## Office Hours
|
||||
|
||||
Office hours are held once a month. Please refer to [this document](events/office-hours.md) to learn more.
|
||||
|
||||
## Weekly Meeting
|
||||
|
||||
We have PUBLIC and RECORDED [weekly meeting] every Thursday at 10am US Pacific Time.
|
||||
We have PUBLIC and RECORDED [weekly meeting] every Thursday at 10am US Pacific Time over Zoom.
|
||||
|
||||
Map that to your local time with this [timezone table].
|
||||
|
||||
|
@ -71,11 +71,11 @@ please propose a specific date on the [Kubernetes Community Meeting Agenda].
|
|||
Kubernetes is the main focus of CloudNativeCon/KubeCon, held every spring in Europe and winter in North America. Information about these and other community events is available on the CNCF [events] pages.
|
||||
|
||||
|
||||
[blog]: http://blog.kubernetes.io
|
||||
[Blog]: http://blog.kubernetes.io
|
||||
[calendar.google.com]: https://calendar.google.com/calendar/embed?src=cgnt364vd8s86hr2phapfjc6uk%40group.calendar.google.com&ctz=America/Los_Angeles
|
||||
[CNCF code of conduct]: https://github.com/cncf/foundation/blob/master/code-of-conduct.md
|
||||
[communication]: https://github.com/kubernetes/community/blob/master/communication.md
|
||||
[community meeting]: https://github.com/kubernetes/community/blob/master/communication.md#weekly-meeting
|
||||
[communication]: /communication.md
|
||||
[community meeting]: /communication.md#weekly-meeting
|
||||
[events]: https://www.cncf.io/events/
|
||||
[file an issue]: https://github.com/kubernetes/kubernetes/issues/new
|
||||
[Google+]: https://plus.google.com/u/0/b/116512812300813784482/116512812300813784482
|
||||
|
@ -84,10 +84,10 @@ Kubernetes is the main focus of CloudNativeCon/KubeCon, held every spring in Eur
|
|||
[kubernetes-community-video-chat]: https://groups.google.com/forum/#!forum/kubernetes-community-video-chat
|
||||
[kubernetes-dev]: https://groups.google.com/forum/#!forum/kubernetes-dev
|
||||
[kubernetes-users]: https://groups.google.com/forum/#!forum/kubernetes-users
|
||||
[kubernetes.slackarchive.io]: http://kubernetes.slackarchive.io
|
||||
[kubernetes.slack.com]: http://kubernetes.slack.com
|
||||
[Special Interest Group]: https://github.com/kubernetes/community/blob/master/README.md#SIGs
|
||||
[slack.k8s.io]: http://slack.k8s.io
|
||||
[kubernetes.slackarchive.io]: https://kubernetes.slackarchive.io
|
||||
[kubernetes.slack.com]: https://kubernetes.slack.com
|
||||
[Slack]: https://slack.k8s.io
|
||||
[Special Interest Group]: /README.md#SIGs
|
||||
[Stack Overflow]: http://stackoverflow.com/questions/tagged/kubernetes
|
||||
[timezone table]: https://www.google.com/search?q=1000+am+in+pst
|
||||
[troubleshooting guide]: http://kubernetes.io/docs/troubleshooting
|
||||
|
|
|
@ -0,0 +1,12 @@
|
|||
reviewers:
|
||||
- parispittman
|
||||
- castrojo
|
||||
- jdumars
|
||||
- idvoretskyi
|
||||
- cblecker
|
||||
approvers:
|
||||
- cblecker
|
||||
- castrojo
|
||||
- sig-contributor-experience-leads
|
||||
labels:
|
||||
- sig/contributor-experience
|
|
@ -0,0 +1 @@
|
|||
[placeholder for the future home of communication.md]
|
|
@ -0,0 +1,85 @@
|
|||
# SLACK GUIDELINES
|
||||
|
||||
Slack is the main communication platform for Kubernetes outside of our mailing lists. It’s important that conversation stays on topic in each channel, and that everyone abides by the Code of Conduct. We have over 30,000 members who should all expect to have a positive experience.
|
||||
|
||||
Chat is searchable and public. Do not make comments that you would not say on a video recording or in another public space. Please be courteous to others.
|
||||
|
||||
`@here` and `@channel` should be used rarely. Members will receive notifications from these commands and we are a global project - please be kind. Note: `@all` is only to be used by admins.
|
||||
|
||||
## CODE OF CONDUCT
|
||||
Kubernetes adheres to the Cloud Native Computing Foundation's [Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md) throughout the project; this includes all communication mediums.
|
||||
|
||||
## ADMINS
|
||||
(by Slack ID and timezone)
|
||||
* caniszczyk - CT
|
||||
* ihor - CET
|
||||
* jdumars - ET
|
||||
* jorge - CT
|
||||
* paris - PT
|
||||
|
||||
Slack Admins should make sure to mention this in the “What I do” section of their Slack profile, along with their time zone.
|
||||
|
||||
To connect: please reach out in the #slack-admins channel, mention one of us in the specific channel where you have a question, or DM (Direct Message) one of us privately.
|
||||
|
||||
### ADMIN EXPECTATIONS AND GUIDELINES
|
||||
* Adhere to Code of Conduct
|
||||
* Take care of spam as soon as possible, which may mean taking action by making members inactive
|
||||
* Moderating and fostering a safe environment for conversations
|
||||
* Bring Code of Conduct issues to the Steering Committee
|
||||
* Create relevant channels and list Code of Conduct in new channel welcome message
|
||||
* Help troubleshoot Slack issues
|
||||
* Review bot, token, and webhook requests
|
||||
* Be helpful!
|
||||
|
||||
## CREATING CHANNELS
|
||||
Please reach out to the #slack-admins group with your request to create a new channel.
|
||||
|
||||
Channels are dedicated to [SIGs, WGs](/sig-list.md), sub-projects, community topics, and related Kubernetes programs/projects.
|
||||
Channels are not:
|
||||
* company-specific; cloud provider channels named after their products are OK, as long as discourse is about Kubernetes-related topics and not the provider's proprietary information.
|
||||
* private unless there is an exception: code of conduct matters, mentoring, security/vulnerabilities, or steering committee.
|
||||
|
||||
Typical naming conventions:
|
||||
#kubernetes-foo #sig-foo #meetup-foo #location-users #projectname
|
||||
|
||||
All channels need a documented purpose. Use this space to welcome the targeted community: promote your meetings, post agendas, etc.
|
||||
|
||||
We may make special accommodations where necessary.
|
||||
|
||||
## ESCALATING and/or REPORTING A PROBLEM
|
||||
Join the #slack-admins channel or contact one of the admins in the closest timezone via DM directly and describe the situation. If the issue can be documented, please take a screenshot to include in your message.
|
||||
|
||||
What if you have a problem with an admin?
|
||||
Send a DM to another listed Admin and describe the situation OR
|
||||
If it’s a code of conduct issue, please send an email to steering-private@kubernetes.io and describe the situation
|
||||
|
||||
## BOTS, TOKENS, WEBHOOKS, OH MY
|
||||
|
||||
Bots, tokens, and webhooks are reviewed on a case-by-case basis, with most requests being rejected due to security, privacy, and usability concerns. Bots and the like tend to make a lot of noise in channels. Our Slack instance has over 30,000 people and we want everyone to have a great experience. Please join #slack-admins and discuss your request before requesting access. GitHub workflow alerts to certain channels and requests from CNCF are typically OK.
|
||||
|
||||
## ADMIN MODERATION
|
||||
|
||||
Be mindful of how you handle communication during stressful interactions. Administrators act as direct representatives of the community, and need to maintain a very high level of professionalism at all times. If you feel too involved in the situation to maintain impartiality or professionalism, that’s a great time to enlist the help of another admin.
|
||||
|
||||
Try to take any situations that involve upset or angry members to DM or video chat. Please document these interactions for other Slack admins to review.
|
||||
|
||||
Content will be automatically removed if it violates code of conduct or is a sales pitch. Admins will take a screenshot of such behavior in order to document the situation. The community takes such violations extremely seriously, and they will be handled swiftly.
|
||||
|
||||
## INACTIVATING ACCOUNTS
|
||||
|
||||
For reasons listed below, admins may inactivate individual Slack accounts. Due to Slack’s framework, it does not allow for an account to be banned or suspended in the traditional sense. [Visit Slack’s policy on this.](https://get.Slack.help/hc/en-us/articles/204475027-Deactivate-a-member-s-account)
|
||||
|
||||
* Spreading spam content in DMs and/or channels
|
||||
* Not adhering to the code of conduct set forth in DMs and/or channels
|
||||
* Overtly selling products, related or unrelated to Kubernetes
|
||||
|
||||
## SPECIFIC CHANNEL RULES
|
||||
|
||||
In the case that certain channels have rules or guidelines, they will be listed in the purpose or pinned docs of that channel.
|
||||
|
||||
#kubernetes-dev = questions and discourse around upstream contributions and development of Kubernetes
|
||||
#kubernetes-careers = job openings for positions working with/on/around Kubernetes. Postings should include contact details.
|
||||
|
||||
## DM (Direct Message) Conversations
|
||||
|
||||
Please do not engage in proprietary company specific conversations in the Kubernetes Slack instance. This is meant for conversations around related Kubernetes open source topics and community. Proprietary conversations should occur in your company Slack and/or communication platforms. As with all communication, please be mindful of appropriateness, professionalism, and applicability to the Kubernetes community.
|
|
@ -26,7 +26,7 @@ but will not allow tests to be run against their PRs automatically nor allow the
|
|||
### Requirements for outside collaborators
|
||||
|
||||
- Working on some contribution to the project that would benefit from
|
||||
the abillity to have PRs or Issues to be assigned to the contributor
|
||||
the ability to have PRs or Issues to be assigned to the contributor
|
||||
- Have the support of 1 member
|
||||
- Find a member who will sponsor you
|
||||
- Send an email to kubernetes-membership@googlegroups.com
|
||||
|
@ -64,7 +64,7 @@ Members are expected to remain active contributors to the community.
|
|||
- Sponsored by 2 reviewers. **Note the following requirements for sponsors**:
|
||||
- Sponsors must have close interactions with the prospective member - e.g. code/design/proposal review, coordinating on issues, etc.
|
||||
- Sponsors must be reviewers or approvers in at least 1 OWNERS file (in any repo in the Kubernetes GitHub organization)
|
||||
- Not a requirement, but having sponsorship from a reviewer from another company is encouraged (you get a gold star).
|
||||
- Sponsors must be from multiple member companies to demonstrate integration across community.
|
||||
- Send an email to *kubernetes-membership@googlegroups.com* with:
|
||||
- CC: your sponsors on the message
|
||||
- Subject: `REQUEST: New membership for <your-GH-handle>`
|
||||
|
@ -229,7 +229,7 @@ TODO: Determine if this role is outdated and needs to be redefined or merged int
|
|||
- Primary reviewer for 20 substantial PRs
|
||||
- Reviewed or merged at least 50 PRs
|
||||
- Apply to [`kubernetes-maintainers`](https://github.com/orgs/kubernetes/teams/kubernetes-maintainers), with:
|
||||
- A [Champion](https://github.com/kubernetes/community/blob/master/incubator.md#faq) from the existing
|
||||
- A [Champion](/incubator.md#faq) from the existing
|
||||
kubernetes-maintainers members
|
||||
- A Sponsor from Project Approvers
|
||||
- Summary of contributions to the project
|
||||
|
|
|
@ -1 +0,0 @@
|
|||
More info forthcoming about the 2017 Kubernetes Developer's Summit!
|
|
@ -1,12 +0,0 @@
|
|||
reviewers:
|
||||
- czahedi
|
||||
- nyener
|
||||
- zehicle
|
||||
- sarahnovotny
|
||||
- jberkus
|
||||
approvers:
|
||||
- czahedi
|
||||
- nyener
|
||||
- zehicle
|
||||
- sarahnovotny
|
||||
- jberkus
|
|
@ -1,14 +0,0 @@
|
|||
# Kubernetes Steering Committee Election 2017
|
||||
|
||||
The following is the platform statement from the candidates for the Steering Committee election.
|
||||
|
||||
For more information see the
|
||||
[Steering Committee Charter](https://docs.google.com/document/d/1LelgXRs6mjYzaZOzh4X-4uqhHGsFHYD3dbDRYVYml-0/edit#)
|
||||
|
||||
## Candidates:
|
||||
|
||||
<!-- Alphabetical order, add the link to your statement like this:
|
||||
|
||||
- [Jane Podlet](jane-podlet.md)
|
||||
|
||||
-->
|
|
@ -0,0 +1,18 @@
|
|||
reviewers:
|
||||
- brendandburns
|
||||
- dchen1107
|
||||
- jbeda
|
||||
- lavalamp
|
||||
- smarterclayton
|
||||
- thockin
|
||||
- wojtek-t
|
||||
- bgrant0607
|
||||
approvers:
|
||||
- brendandburns
|
||||
- dchen1107
|
||||
- jbeda
|
||||
- lavalamp
|
||||
- smarterclayton
|
||||
- thockin
|
||||
- wojtek-t
|
||||
- bgrant0607
|
|
@ -2,14 +2,10 @@
|
|||
|
||||
This directory contains Kubernetes design documents and accepted design proposals.
|
||||
|
||||
For a design overview, please see [the architecture document](architecture.md).
|
||||
For a design overview, please see [the architecture document](architecture/architecture.md).
|
||||
|
||||
Note that a number of these documents are historical and may be out of date or unimplemented.
|
||||
|
||||
TODO: Add the current status to each document and clearly indicate which are up to date.
|
||||
|
||||
TODO: Document the [proposal process](../devel/faster_reviews.md#1-dont-build-a-cathedral-in-one-pr).
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
|
|
|
@ -0,0 +1,6 @@
|
|||
reviewers:
|
||||
- sig-api-machinery-leads
|
||||
approvers:
|
||||
- sig-api-machinery-leads
|
||||
labels:
|
||||
- sig/api-machinery
|
|
@ -44,7 +44,7 @@ that does not contain a discriminator.
|
|||
|---|---|
|
||||
| non-inlined non-discriminated union | Yes |
|
||||
| non-inlined discriminated union | Yes |
|
||||
| inlined union with [patchMergeKey](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#strategic-merge-patch) only | Yes |
|
||||
| inlined union with [patchMergeKey](/contributors/devel/api-conventions.md#strategic-merge-patch) only | Yes |
|
||||
| other inlined union | No |
|
||||
|
||||
For the inlined union with patchMergeKey, we move the tag to the parent struct's instead of
|
||||
|
@ -299,7 +299,7 @@ Each field present in the request will be merged with the live config.
|
|||
|
||||
There are 2 reasons for avoiding this logic:
|
||||
- Using `$patch` as directive key will break backward compatibility.
|
||||
But can easily fixed by using a different key, e.g. `retainKeys: true`.
|
||||
But can easily be fixed by using a different key, e.g. `retainKeys: true`.
|
||||
Reason is that `$patch` has been used in earlier releases.
|
||||
If we add new value to this directive,
|
||||
the old server will reject the new patch due to not knowing the new value.
|
|
@ -0,0 +1,960 @@
|
|||
# Webhooks Beta
|
||||
|
||||
|
||||
## PUBLIC
|
||||
Authors: @erictune, @caesarxuchao, @enisoc
|
||||
Thanks to: {@dbsmith, @smarterclayton, @deads2k, @cheftako, @jpbetz, @mbohlool, @mml, @janetkuo} for comments, data, prior designs, etc.
|
||||
|
||||
|
||||
[TOC]
|
||||
|
||||
|
||||
# Summary
|
||||
|
||||
This document proposes a detailed plan for bringing Webhooks to Beta. Highlights include (incomplete, see rest of doc for complete list) :
|
||||
|
||||
|
||||
|
||||
* Adding the ability for webhooks to mutate.
|
||||
* Bootstrapping
|
||||
* Monitoring
|
||||
* Versioned rather than Internal data sent on hook
|
||||
* Ordering behavior within webhooks, and with other admission phases, is better defined
|
||||
|
||||
This plan is compatible with the [original design doc](/contributors/design-proposals/api-machinery/admission_control_extension.md).
|
||||
|
||||
|
||||
# Definitions
|
||||
|
||||
**Mutating Webhook**: Webhook that can change a request as well as accept/reject.
|
||||
|
||||
**Non-Mutating Webhook**: Webhook that cannot change request, but can accept or reject.
|
||||
|
||||
**Webhook**: encompasses both Mutating Webhook and/or Non-mutating Webhook.
|
||||
|
||||
**Validating Webhook**: synonym for Non-Mutating Webhook
|
||||
|
||||
**Static Admission Controller**: Compiled-in Admission Controllers, (in plugin/pkg/admission).
|
||||
|
||||
**Webhook Host**: a process / binary hosting a webhook.
|
||||
|
||||
# Naming
|
||||
|
||||
Many names were considered before settling on mutating. None of the names
|
||||
considered were completely satisfactory. The following are the names which were
|
||||
considered and a brief explanation of the perspectives on each.
|
||||
|
||||
* Mutating: Well defined meaning related to mutable and immutable. Some
|
||||
negative connotations related to genetic mutation. Might be too much of
|
||||
a CS term.
|
||||
* Defaulting: Clearly indicates a create use case. However implies a lack of
|
||||
functionality for the update use case.
|
||||
* Modifying: Similar issues to mutating but not as well defined.
|
||||
* Revising: Less clear what it does. Does it imply it works only on updates?
|
||||
* Transforming: Some concern that it might have more to do with changing the
|
||||
type or shape of the related object.
|
||||
* Adjusting: Same general perspective as modifying.
|
||||
* Updating: Nice clear meaning. However it seems too easy to confuse update with
|
||||
the update operation and intuit it does not apply to the create operation.
|
||||
|
||||
|
||||
# Development Plan
|
||||
|
||||
Google is able to staff development, testing, review, and documentation. Community help is welcome too, especially with reviewing.
|
||||
|
||||
Intent is Beta of Webhooks (**both** kinds) in 1.9.
|
||||
|
||||
Not in scope:
|
||||
|
||||
|
||||
|
||||
* Initializers remains Alpha for 1.9. (See [Comparison of Webhooks and Initializers](#comparison-of-webhooks-and-initializers) section). No changes to it. Will revisit its status post-1.9.
|
||||
* Converting static admission controllers is out of scope (but some investigation has been done, see Moving Built-in Admission Controllers section).
|
||||
|
||||
|
||||
## Work Items
|
||||
|
||||
* Add API for registering mutating webhooks. See [API Changes](#api-changes)
|
||||
* Copy the non-mutating webhook admission controller code and rename it for the mutating case. (Splitting into two registration APIs makes ordering clear.) Add changes to handle mutating responses. See [Responses for Mutations](#responses-for-mutations).
|
||||
* Document recommended flag order for admission plugins. See [Order of Admission](#order-of-admission).
|
||||
* In kube-up.sh and other installers, change flag per previous item.
|
||||
* Ensure able to monitor latency and rejection from webhooks. See [Monitorability](#monitorability).
|
||||
* Don't send internal objects. See [#49733](https://github.com/kubernetes/kubernetes/issues/49733)
|
||||
* Serialize mutating Webhooks into the order given in the apiregistration. Leave non-mutating ones in parallel.
|
||||
* Good Error Messages. See [Good Error Messages](#good-error-messages)
|
||||
* Conversion logic in GenericWebhook to send converted resource to webhook. See [Conversion](#conversion) and [#49733](https://github.com/kubernetes/kubernetes/issues/49733).
|
||||
* Schedule discussion around resiliency to down webhooks and bootstrapping
|
||||
* Internal Go interface refactor (e.g. along the lines suggested #[1137](https://github.com/kubernetes/community/pull/1137)).
|
||||
|
||||
|
||||
# Design Discussion
|
||||
|
||||
|
||||
## Why Webhooks First
|
||||
|
||||
We will do webhooks beta before initializers beta because:
|
||||
|
||||
|
||||
|
||||
1. **Serves Most Use Cases**: We reviewed code of all current use cases, namely: Kubernetes Built-in Admission Controllers, OpenShift Admission Controllers, Istio & Service Catalog. (See also [Use Cases Detailed Descriptions](#use-cases-detailed-descriptions).) All of those use cases are well served by mutating and non-mutating webhooks. (See also [Comparison of Webhooks and Initializers](#comparison-of-webhooks-and-initializers)).
|
||||
1. **Less Work**: An engineer quite experienced with both code bases estimated that it is less work to add Mutating Webhooks and bring both kinds of webhooks to beta than to bring non-mutating webhooks and initializers to beta. Some open Initializer issues with long expected development time include the quota replenishment bug and controller awareness of uninitialized objects.
|
||||
1. **API Consistency**: Prefer completing one related pair of interfaces (both kinds of webhooks) at the same time.
|
||||
|
||||
|
||||
## Why Support Mutation for Beta
|
||||
|
||||
Based on experience and feedback from the alpha phase of both Webhooks and Initializers, we believe Webhooks Beta should support mutation because:
|
||||
|
||||
|
||||
|
||||
1. We have lots of use cases to inform this (both from Initializers, and Admission Controllers) to ensure we have needed features
|
||||
1. We have experience with Webhooks API already to give confidence in the API. The registration API will be quite similar except in the responses.
|
||||
1. There is a strong community demand for something that satisfies a mutating case.
|
||||
|
||||
|
||||
## Plan for Existing Initializer-Clients
|
||||
|
||||
After the release of 1.9, we will advise users who currently use initializers to:
|
||||
|
||||
|
||||
|
||||
* Move to Webhooks if their use case fits that model well.
|
||||
* Provide SIG-API-Machinery with feedback if Initializers is a better fit.
|
||||
|
||||
We will continue to support Initializers as an Alpha API in 1.9.
|
||||
|
||||
We will make a user guide and extensively document these webhooks. We will update some existing examples, maybe https://github.com/caesarxuchao/example-webhook-admission-controller (since the initializer docs point to it, e.g. https://github.com/kelseyhightower/kubernetes-initializer-tutorial), or maybe https://github.com/openshift/generic-admission-server.
|
||||
|
||||
We will clearly document the reasons for each and how users should decide which to use.
|
||||
|
||||
|
||||
## Monitorability
|
||||
|
||||
There should be prometheus variables to show:
|
||||
|
||||
|
||||
|
||||
* API operation latency
|
||||
* Overall
|
||||
* By webhook name
|
||||
* API response codes
|
||||
* Overall
|
||||
* By webhook name.
|
||||
|
||||
Adding a webhook dynamically adds a key to a map-valued prometheus metric. Webhook host process authors should consider how to make their webhook host monitorable: while eventually we hope to offer a set of best practices around this, for the initial release we won't have requirements here.
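As a rough illustration of the per-webhook instrumentation described above, here is a minimal Go sketch using the Prometheus client library. The metric names and label keys are assumptions for illustration, not the names the apiserver will actually expose:

```go
package metrics

import (
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metric names; the real apiserver metrics may differ.
var (
	webhookLatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "admission_webhook_latency_seconds",
			Help: "Latency of calls to admission webhooks, by webhook name.",
		},
		[]string{"name"},
	)
	webhookResponseCodes = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "admission_webhook_response_codes_total",
			Help: "HTTP response codes returned by admission webhooks, by webhook name.",
		},
		[]string{"name", "code"},
	)
)

func init() {
	prometheus.MustRegister(webhookLatency, webhookResponseCodes)
}

// ObserveCall records one webhook round trip. Registering a new webhook
// simply adds a new label value (a new key in the map-valued metric).
func ObserveCall(name string, code int, elapsed time.Duration) {
	webhookLatency.WithLabelValues(name).Observe(elapsed.Seconds())
	webhookResponseCodes.WithLabelValues(name, strconv.Itoa(code)).Inc()
}
```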
|
||||
|
||||
|
||||
## API Changes
|
||||
|
||||
GenericAdmissionWebhook Admission Controller is split and renamed.
|
||||
|
||||
|
||||
|
||||
* One is called `MutatingAdmissionWebhook`
|
||||
* The other is called `ValidatingAdmissionWebhook`
|
||||
* Splitting them allows them to appear in different places in the `--admission-control` flag's order.
|
||||
|
||||
ExternalAdmissionHookConfiguration API is split and renamed.
|
||||
|
||||
|
||||
|
||||
* One is called `MutatingAdmissionWebhookConfiguration`
|
||||
* The other is called `ValidatingAdmissionWebhookConfiguration`
|
||||
* Splitting them:
|
||||
* makes it clear what the order is when some items don't have both flavors,
|
||||
* enforces mutate-before-validate,
|
||||
* better allows declarative update of the config than one big list with an implied partition point
|
||||
|
||||
The `ValidatingAdmissionWebhookConfiguration` stays the same as `ExternalAdmissionHookConfiguration` except it moves to v1beta1.
|
||||
|
||||
The `MutatingAdmissionWebhookConfiguration` is the same API as `ValidatingAdmissionWebhookConfiguration`. It is only visible via the v1beta1 version.
|
||||
|
||||
We will change from having a Kubernetes service object to just accepting a DNS
|
||||
name for the location of the webhook.
|
||||
|
||||
The existing Group/Version is called
|
||||
|
||||
`admissionregistration.k8s.io/v1alpha1` with kinds
|
||||
|
||||
InitializerConfiguration and ExternalAdmissionHookConfiguration.
|
||||
|
||||
InitializerConfiguration will not join `admissionregistration.k8s.io/v1beta1` at this time.
|
||||
|
||||
Any webhooks that register with v1alpha1 may or may not be surprised when they start getting versioned data. But we don't make any promises for Alpha, and this is a very important bug to fix.
|
||||
|
||||
|
||||
## Order of Admission
|
||||
|
||||
At kubernetes.io, we will document the ordering requirements or just recommend a particular order for `--admission-control`. A starting point might be `MutatingAdmissionWebhook,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ValidatingAdmissionWebhook,ResourceQuota`.
|
||||
|
||||
There might be other ordering dependencies that we will document clearly, but some important properties of a valid ordering:
|
||||
|
||||
* ResourceQuota comes last, so that if prior ones reject a request, it won't increment quota.
|
||||
* All other static ones are in the order recommended by [the docs](https://kubernetes.io/docs/admin/admission-controllers/#is-there-a-recommended-set-of-plug-ins-to-use) (which variously do mutation and validation); this preserves the behavior when there are no webhooks.
|
||||
* Ensures dynamic mutations happen before all validations.
|
||||
* Ensures dynamic validations happen after all mutations.
|
||||
* Users don't need to reason about the static ones, just the ones they add.
|
||||
|
||||
System administrators will likely need to know something about the webhooks they
|
||||
intend to run in order to make the best ordering, but we will try to document a
|
||||
good "first guess".
|
||||
|
||||
Validation continues to happen after all the admission controllers (e.g. after mutating webhooks, static admission controllers, and non-mutating admission controllers.)
|
||||
|
||||
**TODO**: we should move ResourceQuota after Validation, e.g. as described in #1137. However, this is a longstanding bug and likely a larger change than can be done in 1.9--a larger quota redesign is out of scope. But we will likely make an improvement in the current ordering.
|
||||
|
||||
|
||||
## Parallel vs Serial
|
||||
|
||||
The main reason for parallel is reducing latency due to round trips and conversion. We think this can often be mitigated by consolidating multiple webhooks shared by the same project into one.
|
||||
|
||||
Reasons not to allow parallel are complexity of reasoning about concurrent patches, and CRD not supporting PATCH.
|
||||
|
||||
`ValidatingAdmissionWebhook` is already parallel, and there are no responses to merge. Therefore, it stays parallel.
|
||||
|
||||
`MutatingAdmissionWebhook` will run in serial, to ensure conflicts are resolved deterministically.
|
||||
|
||||
The order is the sort order of all the WebhookConfigs, by name, and by index within the Webhooks list.
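A minimal sketch of that ordering rule, using hypothetical simplified types (the real registration kinds are described under API Changes above), not the actual apiserver implementation:

```go
package admission

import "sort"

// Hypothetical, simplified stand-ins for the registration objects.
type Webhook struct{ Name string }

type WebhookConfiguration struct {
	Name     string
	Webhooks []Webhook
}

// orderedWebhooks returns mutating webhooks in the order they would be
// called: configurations sorted by name, then each configuration's
// webhooks in list (index) order.
func orderedWebhooks(configs []WebhookConfiguration) []Webhook {
	sort.Slice(configs, func(i, j int) bool { return configs[i].Name < configs[j].Name })
	var ordered []Webhook
	for _, c := range configs {
		ordered = append(ordered, c.Webhooks...)
	}
	return ordered
}
```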
|
||||
|
||||
We don't plan to make mutating webhooks parallel at this time, but we will revisit the question in the future and decide before going to GA.
|
||||
|
||||
## Good Error Messages
|
||||
|
||||
When a webhook is persistently failing to allow e.g. pods to be created, then the error message from the apiserver must show which webhook failed.
|
||||
|
||||
When a core controller, e.g. ReplicaSet, fails to create a resource, it must send a helpful event that is visible in `kubectl describe` on the controlling resource, stating why the create failed.
|
||||
|
||||
## Registering for all possible representations of the same object
|
||||
|
||||
Some Kubernetes resources are mounted in the api type system at multiple places
|
||||
(e.g., during a move between groups). Additionally, some resources have multiple
|
||||
active versions. There's not currently a way to easily tell which of the exposed
|
||||
resources map to the same "storage location". We will not try to solve that
|
||||
problem at the moment: if the system administrator wishes to hook all
|
||||
deployments, they must (e.g.) make sure their hook is registered for both
|
||||
deployments.v1beta1.extensions AND deployments.v1.apps.
|
||||
|
||||
This is likely to be error-prone, especially over upgrades. For GA, we may
|
||||
consider mechanisms to make this easier. We expect to gather user feedback
|
||||
before designing this.
|
||||
|
||||
|
||||
## Conversion and Versioning
|
||||
|
||||
Webhooks will receive the admission review subject in the exact version which
|
||||
the user sent it to the control plane. This may require the webhook to
|
||||
understand multiple versions of those types.
|
||||
|
||||
All communication to webhooks will be JSON formatted, with a request body of
|
||||
type admission.k8s.io/v1beta1. For GA, we will likely also allow proto, via a
|
||||
TBD mechanism.
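For illustration, a minimal sketch of a webhook host handling such a JSON request. The `reviewRequest` and response shapes here are deliberately simplified assumptions, not the actual admission.k8s.io/v1beta1 schema:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// reviewRequest is a simplified, hypothetical stand-in for the
// admission.k8s.io/v1beta1 request body described above.
type reviewRequest struct {
	Request struct {
		UID    string                                `json:"uid"`
		Kind   struct{ Group, Version, Kind string } `json:"kind"`
		Object json.RawMessage                       `json:"object"` // the object in the exact version the client sent
	} `json:"request"`
}

func handleReview(w http.ResponseWriter, r *http.Request) {
	var review reviewRequest
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// A real host would inspect review.Request.Object (handling every
	// version it registered for) and respond with an allow/deny decision
	// keyed by review.Request.UID.
	w.Header().Set("Content-Type", "application/json")
	w.Write([]byte(`{"response": {"uid": "` + review.Request.UID + `", "allowed": true}}`))
}

func main() {
	http.HandleFunc("/", handleReview)
	// Webhooks are served over TLS in practice; certificate paths here are placeholders.
	log.Fatal(http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil))
}
```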
|
||||
|
||||
We will not take any particular steps to make it possible to know whether an
|
||||
apiserver is safe to upgrade, given the webhooks it is running. System
|
||||
administrators must understand the stack of webhooks they are running, watch the
|
||||
Kubernetes release notes, and look to the webhook authors for guidance about
|
||||
whether the webhook supports Kubernetes version N. We may choose to address this
|
||||
deficiency in future betas.
|
||||
|
||||
To follow the debate that got us to this position, you can look at this
|
||||
potential design for the next steps: https://docs.google.com/document/d/1BT8mZaT42jVxtC6l14YMXpUq0vZc6V5MPf_jnzDMMcg/edit
|
||||
|
||||
|
||||
## Mutations
|
||||
|
||||
The Response for `MutatingAdmissionWebhook` must have content-type, and it must be one of:
|
||||
|
||||
* `application/json`
|
||||
* `application/protobuf`
|
||||
* `application/strategic-merge-patch+json`
|
||||
* `application/json-patch+json`
|
||||
* `application/merge-json-patch+json`
|
||||
|
||||
If the response is a patch, it is merged with the versioned response from the previous webhook, where possible without Conversion.
|
||||
|
||||
We encourage the use of patch to avoid the "old clients dropping new fields" problem.
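As an illustration of returning a patch rather than a full object, here is a hedged Go sketch of a webhook host handler that emits an RFC 6902 JSON Patch, signalled by the `application/json-patch+json` content type listed above. The endpoint path and injected label are purely illustrative:

```go
package main

import (
	"encoding/json"
	"net/http"
)

// servePatch returns an RFC 6902 JSON Patch as the webhook response body,
// with the Content-Type header indicating the patch flavor.
func servePatch(w http.ResponseWriter, r *http.Request) {
	patch := []map[string]interface{}{
		// "~1" escapes "/" inside a JSON-Pointer path segment.
		{"op": "add", "path": "/metadata/labels/example.com~1injected", "value": "true"},
	}
	w.Header().Set("Content-Type", "application/json-patch+json")
	json.NewEncoder(w).Encode(patch)
}

func main() {
	http.HandleFunc("/mutate", servePatch)
	http.ListenAndServe(":8443", nil)
}
```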
|
||||
|
||||
|
||||
## Bootstrapping
|
||||
|
||||
Bootstrapping (both turning on a cluster for the first time and making sure a
|
||||
cluster can boot from a cold start) is made more difficult by having webhooks,
|
||||
which are a dependency of the control plane. This is covered in its [own design
|
||||
doc](./admission-webhook-bootstrapping.md).
|
||||
|
||||
## Upgrading the control plane
|
||||
|
||||
There are two categories of webhooks: security critical (e.g., scan images for
|
||||
vulnerabilities) and nice-to-have (set labels).
|
||||
|
||||
Security critical webhooks cannot work with Kubernetes types they don't have
|
||||
built-in knowledge of, because they can't know if e.g. Kubernetes 1.11 adds a
|
||||
backwards-compatible `v1.Pod.EvilField` which will defeat their functionality.
|
||||
|
||||
They therefore need to be updated before any apiserver. It is the responsibility
|
||||
of the author of such a webhook to release new versions in response to new
|
||||
Kubernetes versions in a timely manner. Webhooks must support two consecutive
|
||||
Kubernetes versions so that rollback/forward is possible. When/if Kubernetes
|
||||
introduces LTS versions, webhook authors will have to also support two
|
||||
consecutive LTS versions.
|
||||
|
||||
Non-security-critical webhooks can either be turned off to perform an upgrade,
|
||||
or can just continue running the old webhook version as long as a completely new
|
||||
version of an object they want to hook is not added. If they are metadata-only
|
||||
hooks, then they should be able to run until we deprecate meta/v1. Such webhooks
|
||||
should document that they don't consider themselves security critical, aren't
|
||||
obligated to follow the above requirements for security-critical webhooks, and
|
||||
therefore do not guarantee to be updated for every Kubernetes release.
|
||||
|
||||
It is expected that webhook authors will distribute config for each Kubernetes
|
||||
version that registers their webhook for all the necessary types, since it would
|
||||
be unreasonable to make system administrators understand all of the webhooks
|
||||
they run to that level of detail.
|
||||
|
||||
## Support for Custom Resources
|
||||
|
||||
Webhooks should work with Custom Resources created by CRDs.
|
||||
|
||||
They are particularly needed for Custom Resources, where they can supplement the validation and defaulting provided by OpenAPI. Therefore, the webhooks will be moved or copied to genericapiserver for 1.9.
|
||||
|
||||
|
||||
## Support for Aggregated API Servers
|
||||
|
||||
Webhooks should work with Custom Resources on Aggregated API Servers.
|
||||
|
||||
Aggregated API Servers should watch apiregistration on the main API server, identify webhooks whose rules match any of their resources, and call those webhooks.
|
||||
|
||||
For example a user might install a Webhook that adds a certain annotation to every single object. Aggregated APIs need to support this use case.
|
||||
|
||||
We will build the dynamic admission stack into the generic apiserver layer to support this use case.
|
||||
|
||||
|
||||
## Moving Built-in Admission Controllers
|
||||
|
||||
This section summarizes recommendations for porting static admission controllers to Webhooks.
|
||||
|
||||
See also [Details of Porting Admission Controllers](#details-of-porting-admission-controllers) and this [Backup Document](https://docs.google.com/spreadsheets/d/1zyCABnIzE7GiGensn-KXneWrkSJ6zfeJWeLaUY-ZmM4/edit#gid=0).
|
||||
|
||||
Here is an estimate of how each kind of admission controller would be moved (or not). This is to see if we can cover the use cases we currently have, not necessarily a promise that all of these will or should be moved into another process.
|
||||
|
||||
|
||||
|
||||
* Leave static:
|
||||
* OwnerReferencesPermissionEnforcement
|
||||
* GC is a core feature of Kubernetes. Move to required.
|
||||
* ResourceQuota
|
||||
* May [redesign](https://github.com/kubernetes/kubernetes/issues/51820)
|
||||
* Original design doc says it remains static.
|
||||
* Divide into Mutating and non-mutating Webhooks
|
||||
* PodSecurityPolicy
|
||||
* NamespaceLifecycle
|
||||
* Use Mutating Webhook
|
||||
* AlwaysPullImages
|
||||
* ServiceAccount
|
||||
* StorageClass
|
||||
* Use non-mutating Webhook
|
||||
* Eventratelimit
|
||||
* DenyEscalatingExec
|
||||
* ImagePolicy
|
||||
* Need to standardize the webhook format
|
||||
* NodeRestriction
|
||||
* Needs to be an admission webhook to access User.Info
|
||||
* PodNodeSelector
|
||||
* PodTolerationRestriction
|
||||
* Move to resource's validation or defaulting
|
||||
* AntiAffinity
|
||||
* DefaultTolerationSeconds
|
||||
* PersistentVolumeClaimResize
|
||||
* Initializers are reasonable to consider moving into the API machinery
|
||||
|
||||
For "Divide", the backend may well be different port of same binary, sharing a SharedInformer, so data is not cached twice.
|
||||
|
||||
For all Kubernetes built-in webhooks, the backend will likely be compiled into kube-controller-manager and share the SharedInformer.
|
||||
|
||||
|
||||
# Use Case Analysis
|
||||
|
||||
|
||||
## Use Cases Detailed Descriptions
|
||||
|
||||
Mutating Webhooks, Non-mutating webhooks, Initializers, and Finalizers (collectively, Object Lifecycle Extensions) serve to:
|
||||
|
||||
|
||||
|
||||
* allow policy and behavioral changes to be developed independently of the control loops for individual Resources. These might include company specific rules, or a PaaS that layers on top of Kubernetes.
|
||||
* implement business logic for Custom Resource Definitions
|
||||
* separate Kubernetes business logic from the core Apiserver logic, which increases reusability, security, and reliability of the core.
|
||||
|
||||
Specific Use cases:
|
||||
|
||||
|
||||
|
||||
* Kubernetes static Admission Controllers
|
||||
* Documented [here](https://kubernetes.io/docs/admin/admission-controllers/)
|
||||
* Discussed [here](/contributors/design-proposals/api-machinery/admission_control_extension.md)
|
||||
* All are highly reliable. Most are simple. No external deps.
|
||||
* Many need update checks.
|
||||
* Can be separated into mutation and validate phases.
|
||||
* OpenShift static Admission Controllers
|
||||
* Discussed [here](/contributors/design-proposals/api-machinery/admission_control_extension.md)
|
||||
* Similar to Kubernetes ones.
|
||||
* Istio, Case 1: Add Container to all Pods.
|
||||
* Currently uses Initializer but can use Mutating Webhook.
|
||||
* Simple, can be highly reliable and fast. No external deps.
|
||||
* No current use case for updates.
|
||||
* Istio, Case 2: Validate Mixer CRDs
|
||||
* Checking cached values from other CRD objects.
|
||||
* No external deps.
|
||||
* Must check updates.
|
||||
* Service Catalog
|
||||
* Watch PodPreset and edit Pods.
|
||||
* Simple, can be highly reliable and fast. No external deps.
|
||||
* No current use case for updates.
|
||||
|
||||
Good further discussion of use cases [here](/contributors/design-proposals/api-machinery/admission_control_extension.md)
|
||||
|
||||
|
||||
## Details of Porting Admission Controllers
|
||||
|
||||
This section summarizes which Kubernetes static admission controllers can readily be ported to Object Lifecycle Extensions.
|
||||
|
||||
|
||||
### Static Admission Controllers
|
||||
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
<td>Admission Controller
|
||||
</td>
|
||||
<td>How
|
||||
</td>
|
||||
<td>Why
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>PodSecurityPolicy
|
||||
</td>
|
||||
<td>Use Mutating Webhook and Non-Mutating Webhook.
|
||||
</td>
|
||||
<td>Requires User.Info, so needs webhook.
|
||||
<p>
|
||||
Mutating will set SC from matching PSP.
|
||||
<p>
|
||||
Non-Mutating will check again in case any other mutators or initializers try to change it.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ResourceQuota
|
||||
</td>
|
||||
<td>Leave static
|
||||
</td>
|
||||
<td>A Redesign for Resource Quota has been proposed, to allow at least object count quota for other objects as well. This suggests that Quota might need to remain compiled in like authn and authz are.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>AlwaysPullImages
|
||||
</td>
|
||||
   <td>Use Mutating Webhook (could implement using an initializer, since the thing it is validating is forbidden to change by update validation of the object)
|
||||
</td>
|
||||
<td>Needs to
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>AntiAffinity
|
||||
</td>
|
||||
<td>Move to pod validation
|
||||
</td>
|
||||
<td>Since this is provided by the core project, which also manages the pod business logic, it isn't clear why this is even an admission controller. Ask Scheduler people.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>DefaultTolerationSeconds
|
||||
</td>
|
||||
<td>Move to pod defaulting or use a Mutating Webhook.
|
||||
</td>
|
||||
<td>It is very simple.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>eventratelimit
|
||||
</td>
|
||||
<td>Non-mutating webhook
|
||||
</td>
|
||||
   <td>Simple logic, does not mutate. Alternatively, rate limiting could be built into the API server.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>DenyEscalatingExec
|
||||
</td>
|
||||
<td>Non-mutating Webhook.
|
||||
</td>
|
||||
<td>It is very simple. It is optional.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>OwnerReferences- PermissionEnforcement (gc)
|
||||
</td>
|
||||
<td>Leave compiled in
|
||||
</td>
|
||||
<td>Garbage collection is core to Kubernetes. Main and all aggregated apiservers should enforce it.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ImagePolicy
|
||||
</td>
|
||||
<td>Non-mutating webhook
|
||||
</td>
|
||||
<td>Must use webhook since image can be updated on pod, and that needs to be checked.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>LimitRanger
|
||||
</td>
|
||||
<td>Mutating Webhook
|
||||
</td>
|
||||
<td>Fast
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>NamespaceExists
|
||||
</td>
|
||||
<td>Leave compiled in
|
||||
</td>
|
||||
<td>This has been on by default for years, right?
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>NamespaceLifecycle
|
||||
</td>
|
||||
<td>Split:
|
||||
<p>
|
||||
|
||||
<p>
|
||||
Cleanup, leave compiled in.
|
||||
<p>
|
||||
|
||||
<p>
|
||||
Protection of system namespaces: use non-mutating webhook
|
||||
</td>
|
||||
<td>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>NodeRestriction
|
||||
</td>
|
||||
<td>Use a non-mutating webhook
|
||||
</td>
|
||||
<td>Needs webhook so it can use User.Info.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>PersistentVolumeClaimResize
|
||||
</td>
|
||||
<td>Move to validation
|
||||
</td>
|
||||
<td>This should be in the validation logic for storage class.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>PodNodeSelector
|
||||
</td>
|
||||
<td>Move to non-mutating webhook
|
||||
</td>
|
||||
<td>Already compiled in, so fast enough to use webhook. Does not mutate.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>podtolerationrestriction
|
||||
</td>
|
||||
<td>Move to non-mutating webhook
|
||||
</td>
|
||||
<td>Already compiled in, so fast enough to use webhook. Does not mutate.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>serviceaccount
|
||||
</td>
|
||||
<td>Move to mutating webhook.
|
||||
</td>
|
||||
<td>Already compiled in, so fast enough to use webhook. Does mutate by defaulting the service account.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>storageclass
|
||||
</td>
|
||||
<td>Move to mutating webhook.
|
||||
</td>
|
||||
<td>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
|
||||
[Backup Document](https://docs.google.com/spreadsheets/d/1zyCABnIzE7GiGensn-KXneWrkSJ6zfeJWeLaUY-ZmM4/edit#gid=0)
|
||||
|
||||
|
||||
### OpenShift Admission Controllers
|
||||
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
<td>Admission Controller
|
||||
</td>
|
||||
<td>How
|
||||
</td>
|
||||
<td>Why
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
   <td>pkg/authorization/admission/restrictusers
|
||||
</td>
|
||||
<td>Non-mutating Webhook or leave static
|
||||
</td>
|
||||
<td>Verification only. But uses a few loopback clients to check other resources.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/build/admission/jenkinsbootstrapper
|
||||
</td>
|
||||
<td>Non-mutating Webhook or leave static
|
||||
</td>
|
||||
<td>Doesn't mutate Build or BuildConfig, but creates Jenkins instances.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/build/admission/secretinjector
|
||||
</td>
|
||||
<td>Mutating webhook or leave static
|
||||
</td>
|
||||
   <td>Uses a few loopback clients to check other resources.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/build/admission/strategyrestrictions
|
||||
</td>
|
||||
<td>Non-mutating Webhook or leave static
|
||||
</td>
|
||||
   <td>Verification only. But uses a few loopback clients, and calls subjectAccessReview
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/image/admission
|
||||
</td>
|
||||
<td>Non-Mutating Webhook
|
||||
</td>
|
||||
<td>Fast, checks image size
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/image/admission/imagepolicy
|
||||
</td>
|
||||
<td>Mutating and non-mutating webhooks
|
||||
</td>
|
||||
<td>Rewriting image pull spec is mutating.
|
||||
<p>
|
||||
acceptor.Accepts is non-Mutating
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/ingress/admission
|
||||
</td>
|
||||
<td>Non-mutating webhook, or leave static.
|
||||
</td>
|
||||
<td>Simple, but calls to authorizer.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/project/admission/lifecycle
|
||||
</td>
|
||||
<td>Initializer or Non-mutating webhook?
|
||||
</td>
|
||||
<td>Needs to update another resource: Namespace
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/project/admission/nodeenv
|
||||
</td>
|
||||
<td>Mutating webhook
|
||||
</td>
|
||||
<td>Fast
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/project/admission/requestlimit
|
||||
</td>
|
||||
<td>Non-mutating webhook
|
||||
</td>
|
||||
<td>Fast, verification only
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/quota/admission/clusterresourceoverride
|
||||
</td>
|
||||
<td>Mutating webhook
|
||||
</td>
|
||||
<td>Updates container resource request and limit
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/quota/admission/clusterresourcequota
|
||||
</td>
|
||||
<td>Leave static.
|
||||
</td>
|
||||
<td>Refactor with the k8s quota
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/quota/admission/runonceduration
|
||||
</td>
|
||||
<td>Mutating webhook
|
||||
</td>
|
||||
<td>Fast. Needs a ProjectCache though. Updates pod.Spec.ActiveDeadlineSeconds
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/scheduler/admission/podnodeconstraints
|
||||
</td>
|
||||
<td>Non-mutating webhook or leave static
|
||||
</td>
|
||||
<td>Verification only. But calls to authorizer.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/security/admission
|
||||
</td>
|
||||
<td>Use Mutating Webhook and Non-Mutating Webhook.
|
||||
</td>
|
||||
<td>Similar to PSP in k8s
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/service/admission/externalip
|
||||
</td>
|
||||
<td>Non-mutating webhook
|
||||
</td>
|
||||
<td>Fast and verification only
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>pkg/service/admission/endpoints
|
||||
</td>
|
||||
<td>Non-mutating webhook or leave static
|
||||
</td>
|
||||
<td>Verification only. But calls to authorizer.
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
|
||||
|
||||
### Other Projects
|
||||
|
||||
Istio Pod Injector:
|
||||
|
||||
|
||||
|
||||
* Injects Sidecar Container, Init Container, adds a volume for Istio config, and changes the Security Context
|
||||
* Source:
|
||||
* https://github.com/istio/pilot/blob/master/platform/kube/inject/inject.go#L278
|
||||
* https://github.com/istio/pilot/blob/master/cmd/sidecar-initializer/main.go
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
<td>
|
||||
Function
|
||||
</td>
|
||||
<td>How
|
||||
</td>
|
||||
<td>Why
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Istio Pod Injector
|
||||
</td>
|
||||
<td>Mutating Webhook
|
||||
</td>
|
||||
<td>Containers can only be added at pod creation time.
|
||||
<p>
|
||||
Because the change is complex, showing intermediate state may help debugging.
|
||||
<p>
|
||||
Fast, so could also use webhook.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Istio Mixer CRD Validation
|
||||
</td>
|
||||
<td>Non-Mutating Webhook
|
||||
</td>
|
||||
<td>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Service Catalog PodPreset
|
||||
</td>
|
||||
<td>Initializer
|
||||
</td>
|
||||
<td>Containers can only be added at pod creation time.
|
||||
<p>
|
||||
Because the change is complex, showing intermediate state may help debugging.
|
||||
<p>
|
||||
Fast, so could also use webhook.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Allocate Cert for Service
|
||||
</td>
|
||||
<td>Initializer
|
||||
</td>
|
||||
<td>Longer duration operation which might fail, with external dependency, so don't use webhook.
|
||||
<p>
|
||||
Let user see initializing state.
|
||||
<p>
|
||||
Don't let controllers that depend on services see the service before it is ready.
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
|
||||
|
||||
## Comparison of Webhooks and Initializers
|
||||
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
<td>Mutating and Non-Mutating Webhooks
|
||||
</td>
|
||||
<td>Initializers (and Finalizers)
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><ul>
|
||||
|
||||
<li>Act on Create, update, or delete
|
||||
<li>Reject Create, Update or delete</li></ul>
|
||||
|
||||
</td>
|
||||
<td><ul>
|
||||
|
||||
<li>Act on Create and delete
|
||||
<li>Reject Create.</li></ul>
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><ul>
|
||||
|
||||
<li>Clients never see pre-created state. <ul>
|
||||
|
||||
<li>Good for enforcement.
|
||||
<li>Simple invariants.</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
<td><ul>
|
||||
|
||||
<li>Clients can see pre-initialized state. <ul>
|
||||
|
||||
<li>Let clients see progress
|
||||
<li>Debuggable</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><ul>
|
||||
|
||||
<li>Admin cannot easily override broken webhook. <ul>
|
||||
|
||||
<li>Must be highly reliable code
|
||||
<li>Avoid deps on external systems.</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
<td><ul>
|
||||
|
||||
<li>Admin can easily fix a "stuck" object by "manually" initializing (or finalizing). <ul>
|
||||
|
||||
<li>Can be <em>slightly</em> less reliable.
|
||||
<li>Prefer when there are deps on external systems.</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><ul>
|
||||
|
||||
<li>Synchronous <ul>
|
||||
|
||||
<li>Apiserver uses a goroutine
|
||||
<li>TCP connection open
|
||||
<li>Should be very low latency</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
<td><ul>
|
||||
|
||||
<li>Asynchronous <ul>
|
||||
|
||||
<li>Can be somewhat higher latency</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><ul>
|
||||
|
||||
<li>Does not persist intermediate state <ul>
|
||||
|
||||
<li>Should happen very quickly.
|
||||
<li>Does not increase etcd traffic.</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
<td><ul>
|
||||
|
||||
<li>Persist intermediate state <ul>
|
||||
|
||||
<li>Longer ops can persist across apiserver upgrades/failures
|
||||
<li>Does increase etcd traffic.</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><ul>
|
||||
|
||||
<li>Webhook does not know if later webhooks fail <ul>
|
||||
|
||||
<li>Must not have side effects,
|
||||
<li>Or have a really good GC plan.</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
<td><ul>
|
||||
|
||||
<li>Initializer does not know if later initializers fail, but if paired with a finalizer, it could see the resource again. <ul>
|
||||
|
||||
<li>This is not implemented
|
||||
<li>TODO: initializers: have a way to ensure finalizer runs even if later initializers reject?</li> </ul>
|
||||
</li> </ul>
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
Use Examples:<ul>
|
||||
|
||||
<li>checking one field on an object, and setting another field on the same object</li></ul>
|
||||
|
||||
</td>
|
||||
<td>
|
||||
Use Examples:<ul>
|
||||
|
||||
<li>Allocate (and deallocate) external resource in parallel with a Kubernetes resource.</li></ul>
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
|
||||
Another [Detailed Comparison of Initializers and Webhooks](https://docs.google.com/document/d/17P_XjXDpxDC5xSD0nMT1W18qE2AlCMkVJcV6jXKlNIs/edit?ts=59d5683b#heading=h.5irk4csrpu0y)
|
|
@ -0,0 +1,98 @@
|
|||
# Webhook Bootstrapping
|
||||
|
||||
## Background
|
||||
[Admission webhook](./admission-control-webhooks.md) is a feature that
|
||||
dynamically extends the Kubernetes admission chain. Because the admission webhooks
|
||||
are in the critical path of admitting REST requests, broken webhooks could block
|
||||
the entire cluster, even blocking the reboot of the webhooks themselves. This
|
||||
design presents a way to avoid such bootstrap deadlocks.
|
||||
|
||||
## Objective
|
||||
- If one or more webhooks are down, it should be possible to restart them automatically.
|
||||
- If a core system component that supports webhooks is down, the component
|
||||
should be able to restart.
|
||||
|
||||
## Design idea
|
||||
We add a selector to the admission webhook configuration, which will be compared
|
||||
to the labels of namespaces. Only objects in the matching namespaces are
|
||||
subjected to the webhook admission. A cluster admin will want to exempt these
|
||||
namespaces from webhooks:
|
||||
- Namespaces where this webhook and other webhooks are deployed;
|
||||
- Namespaces where core system components are deployed.
|
||||
|
||||
## API Changes
|
||||
`ExternalAdmissionHook` is the dynamic configuration API of an admission webhook.
|
||||
We will add a new field `NamespaceSelector` to it:
|
||||
|
||||
```golang
|
||||
type ExternalAdmissionHook struct {
|
||||
Name string
|
||||
ClientConfig AdmissionHookClientConfig
|
||||
Rules []RuleWithOperations
|
||||
FailurePolicy *FailurePolicyType
|
||||
// Only objects in matching namespaces are subjected to this webhook.
|
||||
// LabelSelector.MatchExpressions allows exclusive as well as inclusive
|
||||
// matching, so you can use this selector as a whitelist or a blacklist.
|
||||
// For example, to apply the webhook to all namespaces except for those that have
|
||||
// labels with key "runlevel" and value equal to "0" or "1":
|
||||
// metav1.LabelSelector{MatchExpressions: []LabelSelectorRequirement{
|
||||
// {
|
||||
// Key: "runlevel",
|
||||
// Operator: metav1.LabelSelectorOpNotIn,
|
||||
// Value: []string{"0", "1"},
|
||||
// },
|
||||
// }}
|
||||
// As another example, to only apply the webhook to the namespaces that have
|
||||
// labels with key "environment" and value equal to "prod" or "staging":
|
||||
// metav1.LabelSelector{MatchExpressions: []LabelSelectorRequirement{
|
||||
// {
|
||||
// Key: "environment",
|
||||
// Operator: metav1.LabelSelectorOpIn,
|
||||
// Value: []string{"prod", "staging"},
|
||||
// },
|
||||
// }}
|
||||
// See https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ for more examples of label selectors.
|
||||
NamespaceSelector *metav1.LabelSelector
|
||||
}
|
||||
```
|
||||
|
||||
## Guidelines on namespace labeling
|
||||
The mechanism depends on the cluster admin properly labelling the namespaces. We
|
||||
will provide guidelines on the labelling scheme. One suggestion is labelling
|
||||
namespaces with runlevels. The design of runlevels is out of the scope of this
|
||||
document (tracked in
|
||||
[#54522](https://github.com/kubernetes/kubernetes/issues/54522)); a strawman
|
||||
runlevel scheme is:
|
||||
|
||||
- runlevel 0: namespaces that host core system components, like kube-apiserver
|
||||
and kube-controller-manager.
|
||||
- runlevel 1: namespaces that host add-ons that are part of the webhook serving
|
||||
stack, e.g., kube-dns.
|
||||
- runlevel 2: namespaces that host webhook deployments and services.
|
||||
|
||||
`ExternalAdmissionHook.NamespaceSelector` should be configured to skip all the
|
||||
above namespaces. In the case where some webhooks depend on features offered by
|
||||
other webhooks, the system administrator could extend this concept further (run
|
||||
level 3, 4, 5, …) to accommodate them.
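
As a concrete illustration of the strawman scheme, a webhook registration might set the selector roughly as in the following sketch. This assumes the `ExternalAdmissionHook` type defined above and the standard `metav1` label-selector types; the webhook name is a hypothetical placeholder, and `ClientConfig` and `Rules` are omitted.

```golang
// Sketch only: an ExternalAdmissionHook whose NamespaceSelector skips
// namespaces labelled runlevel=0 or runlevel=1, per the strawman scheme above.
hook := ExternalAdmissionHook{
	Name: "image-policy.example.com", // hypothetical webhook name
	NamespaceSelector: &metav1.LabelSelector{
		MatchExpressions: []metav1.LabelSelectorRequirement{
			{
				Key:      "runlevel",
				Operator: metav1.LabelSelectorOpNotIn,
				Values:   []string{"0", "1"},
			},
		},
	},
}
```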
|
||||
|
||||
## Security implication
|
||||
The mechanism depends on namespaces being properly labelled. We assume only
|
||||
highly privileged users can modify namespace labels. Note that the system
|
||||
already relies on correct namespace annotations; examples include the
|
||||
podNodeSelector admission plugin, and the podTolerationRestriction admission
|
||||
plugin etc.
|
||||
|
||||
# Considered Alternatives
|
||||
- Allow each webhook to exempt one namespace
|
||||
  - Doesn’t work: if there are two webhooks in two namespaces both blocking pod
|
||||
startup, they will block each other.
|
||||
- Put all webhooks in a single namespace and let webhooks exempt that namespace,
|
||||
e.g., deploy webhooks in the “kube-system” namespace and exempt the namespace.
|
||||
- It doesn’t provide sufficient isolation. Not all objects in the
|
||||
“kube-system” namespace should bypass webhooks.
|
||||
- Add namespace selector to webhook configuration, but use the selector to match
|
||||
the name of namespaces
|
||||
([#1191](https://github.com/kubernetes/community/pull/1191)).
|
||||
- Violates k8s convention. The matching label (key=name, value=<namespace’s
|
||||
name>) is imaginary.
|
||||
- Hard to manage. Namespace’s name is arbitrary.
|
|
@ -99,8 +99,3 @@ following:
|
|||
- If operation=connect, exec
|
||||
|
||||
If at any step, there is an error, the request is canceled.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,176 @@
|
|||
# Admission control plugin: EventRateLimit
|
||||
|
||||
## Background
|
||||
|
||||
This document proposes a system for using an admission controller to enforce a limit
|
||||
on the number of event requests that the API Server will accept in a given time
|
||||
slice. In a large cluster with many namespaces managed by disparate administrators,
|
||||
there may be a small percentage of namespaces that have pods that are always in
|
||||
some type of error state, for which the kubelets and controllers in the cluster
|
||||
are producing a steady stream of error event requests. Each individual namespace
|
||||
may not be causing a large amount of event requests on its own, but taken
|
||||
collectively the errors from this small percentage of namespaces can have a
|
||||
significant impact on the performance of the cluster overall.
|
||||
|
||||
## Use cases
|
||||
|
||||
1. Ability to protect the API Server from being flooded by event requests.
|
||||
2. Ability to protect the API Server from being flooded by event requests for
|
||||
a particular namespace.
|
||||
3. Ability to protect the API Server from being flooded by event requests for
|
||||
a particular user.
|
||||
4. Ability to protect the API Server from being flooded by event requests from
|
||||
a particular source+object.
|
||||
|
||||
## Data Model
|
||||
|
||||
### Configuration
|
||||
|
||||
```go
|
||||
// LimitType is the type of the limit (e.g., per-namespace)
|
||||
type LimitType string
|
||||
|
||||
const (
|
||||
// ServerLimitType is a type of limit where there is one bucket shared by
|
||||
// all of the event queries received by the API Server.
|
||||
ServerLimitType LimitType = "server"
|
||||
// NamespaceLimitType is a type of limit where there is one bucket used by
|
||||
// each namespace
|
||||
NamespaceLimitType LimitType = "namespace"
|
||||
// UserLimitType is a type of limit where there is one bucket used by each
|
||||
// user
|
||||
UserLimitType LimitType = "user"
|
||||
// SourceAndObjectLimitType is a type of limit where there is one bucket used
|
||||
// by each combination of source and involved object of the event.
|
||||
SourceAndObjectLimitType LimitType = "sourceAndObject"
|
||||
)
|
||||
|
||||
// Configuration provides configuration for the EventRateLimit admission
|
||||
// controller.
|
||||
type Configuration struct {
|
||||
metav1.TypeMeta `json:",inline"`
|
||||
|
||||
// limits are the limits to place on event queries received.
|
||||
// Limits can be placed on events received server-wide, per namespace,
|
||||
// per user, and per source+object.
|
||||
// At least one limit is required.
|
||||
Limits []Limit `json:"limits"`
|
||||
}
|
||||
|
||||
// Limit is the configuration for a particular limit type
|
||||
type Limit struct {
|
||||
// type is the type of limit to which this configuration applies
|
||||
Type LimitType `json:"type"`
|
||||
|
||||
// qps is the number of event queries per second that are allowed for this
|
||||
// type of limit. The qps and burst fields are used together to determine if
|
||||
// a particular event query is accepted. The qps determines how many queries
|
||||
// are accepted once the burst amount of queries has been exhausted.
|
||||
QPS int32 `json:"qps"`
|
||||
|
||||
// burst is the burst number of event queries that are allowed for this type
|
||||
// of limit. The qps and burst fields are used together to determine if a
|
||||
// particular event query is accepted. The burst determines the maximum size
|
||||
// of the allowance granted for a particular bucket. For example, if the burst
|
||||
// is 10 and the qps is 3, then the admission control will accept 10 queries
|
||||
// before blocking any queries. Every second, 3 more queries will be allowed.
|
||||
// If some of that allowance is not used, then it will roll over to the next
|
||||
// second, until the maximum allowance of 10 is reached.
|
||||
Burst int32 `json:"burst"`
|
||||
|
||||
// cacheSize is the size of the LRU cache for this type of limit. If a bucket
|
||||
// is evicted from the cache, then the allowance for that bucket is reset. If
|
||||
// more queries are later received for an evicted bucket, then that bucket
|
||||
// will re-enter the cache with a clean slate, giving that bucket a full
|
||||
// allowance of burst queries.
|
||||
//
|
||||
// The default cache size is 4096.
|
||||
//
|
||||
// If limitType is 'server', then cacheSize is ignored.
|
||||
// +optional
|
||||
CacheSize int32 `json:"cacheSize,omitempty"`
|
||||
}
|
||||
```
|
||||
|
||||
### Validation
|
||||
|
||||
Validation of a **Configuration** enforces that the following rules apply:
|
||||
|
||||
* There is at least one item in **Limits**.
|
||||
* Each item in **Limits** has a unique **Type**.
|
||||
|
||||
Validation of a **Limit** enforces that the following rules apply:
|
||||
|
||||
* **Type** is one of "server", "namespace", "user", and "sourceAndObject".
|
||||
* **QPS** is positive.
|
||||
* **Burst** is positive.
|
||||
* **CacheSize** is non-negative.
|
||||
|
||||
### Default Value Behavior
|
||||
|
||||
If there is no item in **Limits** for a particular limit type, then no limits
|
||||
will be enforced for that type of limit.
|
||||
|
||||
## AdmissionControl plugin: EventRateLimit
|
||||
|
||||
The **EventRateLimit** plug-in introspects all incoming event requests and
|
||||
determines whether the event fits within the rate limits configured.
|
||||
|
||||
To enable the plug-in and support for EventRateLimit, the kube-apiserver must
|
||||
be configured as follows:
|
||||
|
||||
```console
|
||||
$ kube-apiserver --admission-control=EventRateLimit --admission-control-config-file=$ADMISSION_CONTROL_CONFIG_FILE
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
An example EventRateLimit configuration:
|
||||
|
||||
| Type | Burst | QPS | CacheSize |
|
||||
| ---- | ------------ | ----------------- | --------- |
|
||||
| Server | 1000 | 100 | |
|
||||
| Namespace | 100 | 10 | 50 |
|
||||
|
||||
The API Server starts with an allowance to accept 1000 event requests. Each
|
||||
event request received counts against that allowance. The API Server refills
|
||||
the allowance at a rate of 100 per second, up to a maximum allowance of 1000.
|
||||
If the allowance is exhausted, then the API Server will respond to subsequent
|
||||
event requests with 429 Too Many Requests, until the API Server adds more to
|
||||
its allowance.
|
||||
|
||||
For example, let us say that at time t the API Server has a full allowance to
|
||||
accept 1000 event requests. At time t, the API Server receives 1500 event
|
||||
requests. The first 1000 to be handled are accepted. The last 500 are rejected
|
||||
with a 429 response. At time t + 1 second, the API Server has refilled its
|
||||
allowance with 100 tokens. At time t + 1 second, the API Server receives
|
||||
another 500 event requests. The first 100 to be handled are accepted. The last
|
||||
400 are rejected.
|
||||
|
||||
The API Server also starts with an allowance to accept 100 event requests from
|
||||
each namespace. This allowance works in parallel with the server-wide
|
||||
allowance. An accepted event request will count against both the server-wide
|
||||
allowance and the per-namespace allowance. An event request rejected by the
|
||||
server-wide allowance will still count against the per-namespace allowance,
|
||||
and vice versa. The API Server tracks the allowances for at most 50 namespaces.
|
||||
The API Server will stop tracking the allowance for the least-recently-used
|
||||
namespace if event requests from more than 50 namespaces are received. If an
|
||||
event request for namespace N is received after the API Server has stopped
|
||||
tracking the allowance for namespace N, then a new, full allowance will be
|
||||
created for namespace N.
|
||||
|
||||
In this example, the API Server will not track allowances for either the user
|
||||
or the source+object of an event request, because both the user and the
|
||||
source+object details have been omitted from the configuration. The allowance
|
||||
mechanisms for per-user and per-source+object rate limiting work identically
|
||||
to the per-namespace rate limiting, with the exception that the former consider
|
||||
the user of the event request or the source+object of the event, while the latter
|
||||
considers the namespace of the event request.
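
For reference, the example above maps directly onto the data model defined earlier. The following is only a sketch of the in-memory configuration, not a serialized configuration file:

```go
// Sketch of the example above using the Configuration and Limit types defined
// earlier: a server-wide bucket of 1000 refilled at 100/s, and per-namespace
// buckets of 100 refilled at 10/s, tracking at most 50 namespaces.
exampleConfig := Configuration{
	Limits: []Limit{
		{Type: ServerLimitType, QPS: 100, Burst: 1000},
		{Type: NamespaceLimitType, QPS: 10, Burst: 100, CacheSize: 50},
	},
}
```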
|
||||
|
||||
## Client Behavior
|
||||
|
||||
Currently, the client event recorder treats a 429 response as an HTTP transport
|
||||
error, which warrants retrying the event request. Instead, the event
|
||||
recorder should abandon the event. Additionally, the event recorder should
|
||||
abandon all future events for the period of time specified in the
|
||||
Retry-After header of the 429 response.
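
A minimal sketch of that intended behavior, written against plain net/http types rather than the actual event recorder code (the function name and placement are illustrative only):

```go
package recorder

import (
	"net/http"
	"strconv"
	"time"
)

// backoffUntil sketches the proposed recorder behavior: on a 429 response the
// current event is abandoned (abandon == true), and if a Retry-After header
// (in seconds) is present, event submission is paused until the returned
// deadline.
func backoffUntil(resp *http.Response) (deadline time.Time, abandon bool) {
	if resp.StatusCode != http.StatusTooManyRequests {
		return time.Time{}, false
	}
	if seconds, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil && seconds > 0 {
		deadline = time.Now().Add(time.Duration(seconds) * time.Second)
	}
	return deadline, true
}
```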
|
|
@ -68,7 +68,7 @@ Name | Code | Description
|
|||
AlwaysPullImages | alwayspullimages/admission.go | Forces the Kubelet to pull images to prevent pods from accessing private images that another user with credentials has already pulled to the node.
|
||||
LimitPodHardAntiAffinityTopology | antiaffinity/admission.go | Defends the cluster against abusive anti-affinity topology rules that might hang the scheduler.
|
||||
DenyEscalatingExec | exec/admission.go | Prevent users from executing into pods that have higher privileges via their service account than allowed by their policy (regular users can't exec into admin pods).
|
||||
DenyExecOnPrivileged | exec/admission.go | Blanket ban exec access to pods with host level security. Superceded by DenyEscalatingExec
|
||||
DenyExecOnPrivileged | exec/admission.go | Blanket ban exec access to pods with host level security. Superseded by DenyEscalatingExec
|
||||
OwnerReferencesPermissionEnforcement | gc/gc_admission.go | Require that a user who sets an owner reference (which could result in garbage collection) has permission to delete the object, to prevent abuse.
|
||||
ImagePolicyWebhook | imagepolicy/admission.go | Invoke a remote API to determine whether an image is allowed to run on the cluster.
|
||||
PodNodeSelector | podnodeselector/admission.go | Default and limit what node selectors may be used within a namespace by reading a namespace annotation and a global configuration.
|
||||
|
@ -632,9 +632,4 @@ Some options:
|
|||
|
||||
It should be easy for a novice Kubernetes administrator to apply simple policy rules to the cluster. In
|
||||
the future it is desirable to have many such policy engines enabled via extension to enable quick policy
|
||||
customization to meet specific needs.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
customization to meet specific needs.
|
|
@ -52,8 +52,8 @@ clients can always use the proxy and don't need to know that under the hood
|
|||
multiple apiservers are running.
|
||||
|
||||
Wording note: When we say "API servers" we really mean groups of apiservers,
|
||||
since any individual apiserver is horizontally replicatable. Similarly,
|
||||
kube-aggregator itself is horizontally replicatable.
|
||||
since any individual apiserver is horizontally replicable. Similarly,
|
||||
kube-aggregator itself is horizontally replicable.
|
||||
|
||||
## Operational configurations
|
||||
|
||||
|
@ -80,9 +80,9 @@ There are two configurations in which it makes sense to run `kube-aggregator`.
|
|||
`api.mycompany.com/v1/grobinators` from different apiservers. This restriction
|
||||
allows us to limit the scope of `kube-aggregator` to a manageable level.
|
||||
* Follow API conventions: APIs exposed by every API server should adhere to [kubernetes API
|
||||
conventions](../devel/api-conventions.md).
|
||||
conventions](../../devel/api-conventions.md).
|
||||
* Support discovery API: Each API server should support the kubernetes discovery API
|
||||
(list the suported groupVersions at `/apis` and list the supported resources
|
||||
(list the supported groupVersions at `/apis` and list the supported resources
|
||||
at `/apis/<groupVersion>/`)
|
||||
* No bootstrap problem: The core kubernetes apiserver must not depend on any
|
||||
other aggregated server to come up. Non-core apiservers may use other non-core
|
||||
|
@ -140,7 +140,7 @@ complete user information, including user, groups, and "extra" for backing API s
|
|||
|
||||
Each API server is responsible for storing their resources. They can have their
|
||||
own etcd or can use kubernetes server's etcd using [third party
|
||||
resources](../design-proposals/extending-api.md#adding-custom-resources-to-the-kubernetes-api-server).
|
||||
resources](../design-proposals/api-machinery/extending-api.md#adding-custom-resources-to-the-kubernetes-api-server).
|
||||
|
||||
### Health check
|
||||
|
||||
|
@ -268,8 +268,3 @@ There were other alternatives that we had discussed.
|
|||
providing a centralised authentication and authorization service which all of
|
||||
the servers can use.
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,433 @@
|
|||
# Alternate representations of API resources
|
||||
|
||||
## Abstract
|
||||
|
||||
Naive clients benefit from allowing the server to return resource information in a form
|
||||
that is easy to represent or is more efficient when dealing with resources in bulk. It
|
||||
should be possible to ask an API server to return a representation of one or more resources
|
||||
of the same type in a way useful for:
|
||||
|
||||
* Retrieving a subset of object metadata in a list or watch of a resource, such as the
|
||||
metadata needed by the generic Garbage Collector or the Namespace Lifecycle Controller
|
||||
* Dealing with generic operations like `Scale` correctly from a client across multiple API
|
||||
groups, versions, or servers
|
||||
* Return a simple tabular representation of an object or list of objects for naive
|
||||
web or command-line clients to display (for `kubectl get`)
|
||||
* Return a simple description of an object that can be displayed in a wide range of clients
|
||||
(for `kubectl describe`)
|
||||
* Return the object with fields set by the server cleared (as `kubectl export`) which
|
||||
is dependent on the schema, not on user input.
|
||||
|
||||
The server should allow a common mechanism for a client to request a resource be returned
|
||||
in one of a number of possible forms. In general, many of these forms are simply alternate
|
||||
versions of the existing content and are not intended to support arbitrary parameterization.
|
||||
|
||||
Also, the server today contains a number of objects which are common across multiple groups,
|
||||
but which clients must be able to deal with in a generic fashion. These objects - Status,
|
||||
ListMeta, ObjectMeta, List, ListOptions, ExportOptions, and Scale - are embedded into each
|
||||
group version but are actually part of a shared API group. It must be possible for a naive
|
||||
client to translate the Scale response returned by two different API group versions.
|
||||
|
||||
|
||||
## Motivation
|
||||
|
||||
Currently it is difficult for a naive client (dealing only with the list of resources
|
||||
presented by API discovery) to properly handle new and extended API groups, especially
|
||||
as versions of those groups begin to evolve. It must be possible for a naive client to
|
||||
perform a set of common operations across a wide range of groups and versions and leverage
|
||||
a predictable schema.
|
||||
|
||||
We also foresee increasing difficulty in building clients that must deal with extensions -
|
||||
there are at least 6 known web-ui or CLI implementations that need to display some
|
||||
information about third party resources or additional API groups registered with a server
|
||||
without requiring each of them to change. Providing a server side implementation will
|
||||
allow clients to retrieve meaningful information for the `get` and `describe` style
|
||||
operations even for new API groups.
|
||||
|
||||
|
||||
## Implementation
|
||||
|
||||
The HTTP spec and the common REST paradigm provide mechanisms for clients to [negotiate
|
||||
alternative representations of objects (RFC2616 14.1)](http://www.w3.org/Protocols/rfc2616/rfc2616.txt)
|
||||
and for the server to correctly indicate a requested mechanism was chosen via the `Accept`
|
||||
and `Content-Type` headers. This is a standard request response protocol intended to allow
|
||||
clients to request the server choose a representation to return to the client based on the
|
||||
server's capabilities. In RESTful terminology, a representation is simply a known schema that
|
||||
the client is capable of handling - common schemas are HTML, JSON, XML, or protobuf, with the
|
||||
possibility of the client and server further refining the requested output via either query
|
||||
parameters or media type parameters.
|
||||
|
||||
In order to ensure that generic clients can properly deal with many different group versions,
|
||||
we introduce the `meta.k8s.io` group with version `v1` that grandfathers all existing resources
|
||||
currently described as "unversioned". A generic client may request that responses be applied
|
||||
in this version. The contents of a particular API group version would continue to be bound into
|
||||
other group versions (`status.v1.meta.k8s.io` would be bound as `Status` into all existing
|
||||
API groups). We would remove the `unversioned` package and properly home these resources in
|
||||
a real API group.
|
||||
|
||||
|
||||
### Considerations around choosing an implementation
|
||||
|
||||
* We wish to avoid creating new resource *locations* (URLs) for existing resources
|
||||
* New resource locations complicate access control, caching, and proxying
|
||||
* We are still retrieving the same resource, just in an alternate representation,
|
||||
which matches our current use of the protobuf, JSON, and YAML serializations
|
||||
* We do not wish to alter the mechanism for authorization - a user with access
|
||||
to a particular resource in a given namespace should be limited regardless of
|
||||
the representation in use.
|
||||
* Allowing "all namespaces" to be listed would require us to create "fake" resources
|
||||
which would complicate authorization
|
||||
* We wish to support retrieving object representations in multiple schemas - JSON for
|
||||
simple clients and Protobuf for clients concerned with efficiency.
|
||||
* Most clients will wish to retrieve a newer format, but for older servers will desire
|
||||
to fall back to the implicit resource represented by the endpoint.
|
||||
* Over time, clients may need to request results in multiple API group versions
|
||||
because of breaking changes (when we introduce v2, clients that know v2 will want
|
||||
to ask for v2, then v1)
|
||||
* The Scale resource is an example - a generic client may know v1 Scale, but when
|
||||
v2 Scale is introduced the generic client will still only request v1 Scale from
|
||||
any given resource, and the server that no longer recognizes v1 Scale must
|
||||
indicate that to the client.
|
||||
* We wish to preserve the greatest possible query parameter space for sub resources
|
||||
and special cases, which encourages us to avoid polluting the API with query
|
||||
parameters that can be otherwise represented as alternate forms.
|
||||
* We do not wish to allow deep orthogonal parameterization - a list of pods is a list
|
||||
of pods regardless of the form, and the parameters passed to the JSON representation
|
||||
  should not vary significantly from the tabular representation.
|
||||
* Because we expect not all extensions will implement protobuf, an efficient client
|
||||
must continue to be able to "fall-back" to JSON, such as for third party
|
||||
resources.
|
||||
* We do not wish to create fake content-types like `application/json+kubernetes+v1+meta.k8s.io`
|
||||
because the list of combinations is unbounded and our ability to encode specific values
|
||||
(like slashes) into the value is limited.
|
||||
|
||||
### Client negotiation of response representation
|
||||
|
||||
When a client wishes to request an alternate representation of an object, it should form
|
||||
a valid `Accept` header containing one or more accepted representations, where each
|
||||
representation is represented by a media-type and [media-type parameters](https://tools.ietf.org/html/rfc6838#section-4.3).
|
||||
The server should omit representations that are unrecognized or in error - if no representations
|
||||
are left after omission the server should return a `406 Not Acceptable` HTTP response.
|
||||
|
||||
The supported parameters are:
|
||||
|
||||
| Name | Value | Default | Description |
|
||||
| ---- | ----- | ------- | ----------- |
|
||||
| g | The group name of the desired response | Current group | The group the response is expected in. |
|
||||
| v | The version of the desired response | Current version | The version the response is expected in. Note that this is separate from Group because `/` is not a valid character in Accept headers. |
|
||||
| as | Kind name | None | If specified, transform the resource into the following kind (including the group and version parameters). |
|
||||
| sv | The server group (`meta.k8s.io`) version that should be applied to generic resources returned by this endpoint | Matching server version for the current group and version | If specified, the server should transform generic responses into this version of the server API group. |
|
||||
| export | `1` | None | If specified, transform the resource prior to returning to omit defaulted fields. Additional arguments allowed in the query parameter. For legacy reasons, `?export=1` will continue to be supported on the request |
|
||||
| pretty | `0`/`1` | `1` | If specified, apply formatting to the returned response that makes the serialization readable (for JSON, use indentation) |
|
||||
|
||||
Examples:
|
||||
|
||||
```
|
||||
# Request a PodList in an alternate form
|
||||
GET /v1/pods
|
||||
Accept: application/json;as=Table;g=meta.k8s.io;v=v1
|
||||
|
||||
# Request a PodList in an alternate form, with pretty JSON formatting
|
||||
GET /v1/pods
|
||||
Accept: application/json;as=Table;g=meta.k8s.io;v=v1;pretty=1
|
||||
|
||||
# Request that status messages be of the form meta.k8s.io/v2 on the response
|
||||
GET /v1/pods
|
||||
Accept: application/json;sv=v2
|
||||
{
|
||||
"kind": "Status",
|
||||
"apiVersion": "meta.k8s.io/v2",
|
||||
...
|
||||
}
|
||||
```
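
For illustration, a naive client that does not use any Kubernetes client library might negotiate a representation roughly as follows. This is only a sketch: the server URL is a placeholder and authentication is omitted.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Ask for the pod list as a meta.k8s.io/v1 Table, falling back to the
	// default representation if the server does not recognize the parameters.
	req, err := http.NewRequest("GET", "https://apiserver.example.com/api/v1/pods", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/json;as=Table;g=meta.k8s.io;v=v1, application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// The Content-Type (or the "kind" field in the body) tells the client
	// which representation the server actually chose.
	fmt.Println(resp.Header.Get("Content-Type"))
	fmt.Println(len(body), "bytes")
}
```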
|
||||
|
||||
For both export and the more complicated server side `kubectl get` cases, it's likely that
|
||||
more parameters are required and should be specified as query parameters. However, the core
|
||||
behavior is best represented as a variation on content-type. Supporting both is not limiting
|
||||
in the short term as long as we can validate correctly.
|
||||
|
||||
As a simplification for common use, we should create **media-type aliases** which may show up in lists of mime-types supported
|
||||
and simplify use for clients. For example, the following aliases would be reasonable:
|
||||
|
||||
* `application/json+vnd.kubernetes.export` would return the requested object in export form
|
||||
* `application/json+vnd.kubernetes.as+meta.k8s.io+v1+TabularOutput` would return the requested object in a tabular form
|
||||
* `text/csv` would return the requested object in a tabular form in the comma-separated-value (CSV) format
|
||||
|
||||
### Example: Partial metadata retrieval
|
||||
|
||||
The client may request that the server return the list of namespaces as a
|
||||
`PartialObjectMetadata` kind, which is an object containing only `ObjectMeta` and
|
||||
can be serialized as protobuf or JSON. This is expected to be significantly more
|
||||
performant when controllers like the Garbage collector retrieve multiple objects.
|
||||
|
||||
GET /api/v1/namespaces
|
||||
Accept: application/json;g=meta.k8s.io;v=v1;as=PartialObjectMetadata, application/json
|
||||
|
||||
The server would respond with
|
||||
|
||||
200 OK
|
||||
Content-Type: application/json;g=meta.k8s.io;v=v1;as=PartialObjectMetadata
|
||||
{
|
||||
"apiVersion": "meta.k8s.io/v1",
|
||||
"kind": "PartialObjectMetadataList",
|
||||
"items": [
|
||||
{
|
||||
"apiVersion": "meta.k8s.io/v1",
|
||||
"kind": "PartialObjectMetadata",
|
||||
"metadata": {
|
||||
"name": "foo",
|
||||
"resourceVersion": "10",
|
||||
...
|
||||
}
|
||||
},
|
||||
...
|
||||
]
|
||||
}
|
||||
|
||||
In this example PartialObjectMetadata is a real registered type, and each API group
|
||||
provides an efficient transformation from their schema to the partial schema directly.
|
||||
The client, upon retrieving this type, can treat it as a generic resource.
|
||||
|
||||
Note that the `as` parameter indicates to the server the Kind of the resource, but
|
||||
the Kubernetes API convention of returning a List with a known schema continues. An older
|
||||
server could ignore the presence of the `as` parameter on the media type and merely return
|
||||
a `NamespaceList` and the client would either use the content-type or the object Kind
|
||||
to distinguish. Because all responses are expected to be self-describing, an existing
|
||||
Kubernetes client would be expected to differentiate on Kind.
|
||||
|
||||
An old server, not recognizing these parameters, would respond with:
|
||||
|
||||
200 OK
|
||||
Content-Type: application/json
|
||||
{
|
||||
"apiVersion": "v1",
|
||||
"kind": "NamespaceList",
|
||||
"items": [
|
||||
{
|
||||
"apiVersion": "v1",
|
||||
"kind": "Namespace",
|
||||
"metadata": {
|
||||
"name": "foo",
|
||||
"resourceVersion": "10",
|
||||
...
|
||||
}
|
||||
},
|
||||
...
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
### Example: Retrieving a known version of the Scale resource
|
||||
|
||||
Each API group that supports resources that can be scaled must expose a subresource on
|
||||
their object that accepts GET or PUT with a `Scale` kind resource. This subresource acts
|
||||
as a generic interface that a client that knows nothing about the underlying object can
|
||||
use to modify the scale value of that resource. However, clients *must* be able to understand
|
||||
the response the server provides, and over time the response may change and should therefore
|
||||
be versioned. Our current API provides no way for a client to discover whether a `Scale`
|
||||
response returned by `batch/v2alpha1` is the same as the `Scale` resource returned by
|
||||
`autoscaling/v1`.
|
||||
|
||||
Under this proposal, to scale a generic resource a client would perform the following
|
||||
operations:
|
||||
|
||||
GET /api/v1/namespace/example/replicasets/test/scale
|
||||
Accept: application/json;g=meta.k8s.io;v=v1;as=Scale, application/json
|
||||
|
||||
200 OK
|
||||
Content-Type: application/json;g=meta.k8s.io;v=v1;as=Scale
|
||||
{
|
||||
"apiVersion": "meta.k8s.io/v1",
|
||||
"kind": "Scale",
|
||||
"spec": {
|
||||
"replicas": 1
|
||||
}
|
||||
...
|
||||
}
|
||||
|
||||
The client, seeing that a generic response was returned (`meta.k8s.io/v1`), knows that
|
||||
the server supports accepting that resource as well, and performs a PUT:
|
||||
|
||||
PUT /apis/extensions/v1beta1/namespace/example/replicasets/test/scale
|
||||
Accept: application/json;g=meta.k8s.io;v=v1;as=Scale, application/json
|
||||
Content-Type: application/json
|
||||
{
|
||||
"apiVersion": "meta.k8s.io/v1",
|
||||
"kind": "Scale",
|
||||
"spec": {
|
||||
"replicas": 2
|
||||
}
|
||||
}
|
||||
|
||||
200 OK
|
||||
Content-Type: application/json;g=meta.k8s.io;v=v1;as=Scale
|
||||
{
|
||||
"apiVersion": "meta.k8s.io/v1",
|
||||
"kind": "Scale",
|
||||
"spec": {
|
||||
"replicas": 2
|
||||
}
|
||||
...
|
||||
}
|
||||
|
||||
Note that the client still asks for the common Scale as the response so that it
|
||||
can access the value it wants.
|
||||
|
||||
|
||||
### Example: Retrieving an alternative representation of the resource for use in `kubectl get`
|
||||
|
||||
As new extension groups are added to the server, all clients must implement simple "view" logic
|
||||
for each resource. However, these views are specific to the resource in question, which only
|
||||
the server is aware of. To make clients more tolerant of extension and third party resources,
|
||||
it should be possible for clients to ask the server to present a resource or list of resources
|
||||
in a tabular / descriptive format rather than raw JSON.
|
||||
|
||||
While the design of server-side tabular support is outside the scope of this proposal, a few
|
||||
constraints apply. The server must return a structured resource usable by both command-line and
|
||||
rich clients (web or IDE), which implies a schema, which implies JSON, and which means the
|
||||
server should return a known Kind. For this example we will call that kind `TabularOutput`
|
||||
to demonstrate the concept.
|
||||
|
||||
A server side resource would implement a transformation from their resource to `TabularOutput`
|
||||
and the API machinery would translate a single item or a list of items (or a watch) into
|
||||
the tabular resource.
|
||||
|
||||
A generic client wishing to display a tabular list for resources of type `v1.ReplicaSets` would
|
||||
make the following call:
|
||||
|
||||
GET /api/v1/namespaces/example/replicasets
|
||||
Accept: application/json;g=meta.k8s.io;v=v1;as=TabularOutput, application/json
|
||||
|
||||
200 OK
|
||||
Content-Type: application/json;g=meta.k8s.io;v=v1;as=TabularOutput
|
||||
{
|
||||
"apiVersion": "meta.k8s.io/v1",
|
||||
"kind": "TabularOutput",
|
||||
"columns": [
|
||||
{"name": "Name", "description": "The name of the resource"},
|
||||
{"name": "Resource Version", "description": "The version of the resource"},
|
||||
...
|
||||
],
|
||||
"items": [
|
||||
{"columns": ["name", "10", ...]},
|
||||
...
|
||||
]
|
||||
}
|
||||
|
||||
The client can then present that information as necessary. If the server returns the
|
||||
resource list `v1.ReplicaSetList` the client knows that the server does not support tabular
|
||||
output and so must fall back to a generic output form (perhaps using the existing
|
||||
compiled in listers).
|
||||
|
||||
Note that `kubectl get` supports a number of parameters for modifying the response,
|
||||
including whether to filter resources, whether to show a "wide" list, or whether to
|
||||
turn certain labels into columns. Those options are best represented as query parameters
|
||||
and transformed into a known type.
|
||||
|
||||
|
||||
### Example: Versioning a ListOptions call to a generic API server
|
||||
|
||||
When retrieving lists of resources, the server transforms input query parameters like
|
||||
`labels` and `fields` into a `ListOptions` type. It should be possible for a generic
|
||||
client dealing with the server to be able to specify the version of ListOptions it
|
||||
is sending to detect version skew.
|
||||
|
||||
Since this is an input and list is implemented with GET, it is not possible to send
|
||||
a body and no Content-Type is possible. For this approach, we recommend that the kind
|
||||
and API version be specifiable via the GET call for further clarification:
|
||||
|
||||
New query parameters:
|
||||
|
||||
| Name | Value | Default | Description |
|
||||
| ---- | ----- | ------- | ----------- |
|
||||
| kind | The kind of parameters being sent | `ListOptions` (GET), `DeleteOptions` (DELETE) | The kind of the serialized struct, defaults to ListOptions on GET and DeleteOptions on DELETE. |
|
||||
| queryVersion / apiVersion | The API version of the parameter struct | `meta.k8s.io/v1` | May be altered to match the expected version. Because we have not yet versioned ListOptions, this is safe to alter. |
|
||||
|
||||
To send ListOptions in the v2 future format, where the serialization of `resourceVersion`
|
||||
is changed to `rv`, clients would provide:
|
||||
|
||||
GET /api/v1/namespaces/example/replicasets?apiVersion=meta.k8s.io/v2&rv=10
|
||||
|
||||
Before we introduce a second API group version, we would have to ensure old servers
|
||||
properly reject apiVersions they do not understand.
|
||||
|
||||
|
||||
### Impact on web infrastructure
|
||||
|
||||
In the past, web infrastructure and old browsers have coped poorly with the `Accept`
|
||||
header. However, most modern caching infrastructure properly supports `Vary: Accept`
|
||||
and caching of responses has not been a significant requirement for Kubernetes APIs
|
||||
to this point.
|
||||
|
||||
|
||||
### Considerations for discoverability
|
||||
|
||||
To ensure clients can discover these endpoints, the Swagger and OpenAPI documents
|
||||
should also include a set of example mime-types for each endpoint that are supported.
|
||||
Specifically, the `produces` field on an individual operation can be used to list a
|
||||
set of well known types. The description of the operation can include a stanza about
|
||||
retrieving alternate representations.
|
||||
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
* Implement only with query parameters
|
||||
|
||||
  To properly implement alternative representations, clients must be able to ask for multiple
|
||||
  versions (ask for v2, then v1). The Accept mechanism already handles this sort of
|
||||
multi-version negotiation, while any approach based on query parameters would
|
||||
have to implement this option as well. In addition, some serializations may not
|
||||
be valid in all content types, so the client asking for TabularOutput in protobuf
|
||||
may also ask for TabularOutput in JSON - if TabularOutput is not valid in protobuf
|
||||
  the server can fall back to JSON.
|
||||
|
||||
* Use new resource paths - `/apis/autoscaling/v1/namespaces/example/horizontalpodautoscalermetadata`
|
||||
|
||||
This leads to a proliferation of paths which will confuse automated tools and end
|
||||
users. Authorization, logging, audit may all need a way to map the two resources
|
||||
as equivalent, while clients would need a discovery mechanism that identifies a
|
||||
"same underlying object" relationship that is different from subresources.
|
||||
|
||||
* Use a special HTTP header to denote the alternative representation
|
||||
|
||||
Given the need to support multiple versions, this would be reimplementing Accept
|
||||
in a slightly different way, so we prefer to reuse Accept.
|
||||
|
||||
* For partial object retrieval, support complex field selectors
|
||||
|
||||
From an efficiency perspective, calculating subpaths and filtering out sub fields
|
||||
from the underlying object is complex. In practice, almost all filtering falls into
|
||||
a few limited subsets, and thus retrieving an object into a few known schemas can be made
|
||||
much more efficient. In addition, arbitrary transformation of the object provides
|
||||
opportunities for supporting forward "partial" migration - for instance, returning a
|
||||
ReplicationController as a ReplicaSet to simplify a transition across resource types.
|
||||
While this is not under explicit consideration, allowing a caller to move objects across
|
||||
schemas will eventually be a required behavior when dramatic changes occur in an API
|
||||
schema.
|
||||
|
||||
## Backwards Compatibility
|
||||
|
||||
### Old clients
|
||||
|
||||
Old clients would not be affected by the new Accept path.
|
||||
|
||||
If servers begin returning Status in version `meta.k8s.io/v1`, old clients would likely error
|
||||
as that group has never been used. We would continue to return the group version of the calling
|
||||
API group on server responses unless the `sv` mime-type parameter is set.
|
||||
|
||||
|
||||
### Old servers
|
||||
|
||||
Because old Kubernetes servers are not selective about the content type parameters they
|
||||
accept, we may wish to patch server versions to explicitly bypass content
|
||||
types they do not recognize the parameters to. As a special consideration, this would allow
|
||||
new clients to more strictly handle Accept (so that the server returns errors if the content
|
||||
type is not recognized).
|
||||
|
||||
As part of introducing the new API group `meta.k8s.io`, some opaque calls where we assume the
|
||||
empty API group-version for the resource (GET parameters) could be defaulted to this group.
|
||||
|
||||
|
||||
## Future items
|
||||
|
||||
* ???
|
|
@ -0,0 +1,177 @@
|
|||
# Allow clients to retrieve consistent API lists in chunks
|
||||
|
||||
On large clusters, performing API queries that return all of the objects of a given resource type (GET /api/v1/pods, GET /api/v1/secrets) can lead to significant variations in peak memory use on the server and contribute substantially to long tail request latency.
|
||||
|
||||
When loading very large sets of objects -- some clusters are now reaching 100k pods or equivalent numbers of supporting resources -- the system must:
|
||||
|
||||
* Construct the full range description in etcd in memory and serialize it as protobuf in the client
|
||||
* Some clusters have reported over 500MB being stored in a single object type
|
||||
* This data is read from the underlying datastore and converted to a protobuf response
|
||||
* Large reads to etcd can block writes to the same range (https://github.com/coreos/etcd/issues/7719)
|
||||
* The data from etcd has to be transferred to the apiserver in one large chunk
|
||||
* The `kube-apiserver` also has to deserialize that response into a single object, and then re-serialize it back to the client
|
||||
* Much of the decoded etcd memory is copied into the struct used to serialize to the client
|
||||
* An API client like `kubectl get` will then decode the response from JSON or protobuf
|
||||
* An API client with a slow connection may not be able to receive the entire response body within the default 60s timeout
|
||||
* This may cause other failures downstream of that API client with their own timeouts
|
||||
* The recently introduced client compression feature can assist
|
||||
* The large response will also be loaded entirely into memory
|
||||
|
||||
The standard solution for reducing the impact of large reads is to allow them to be broken into smaller reads via a technique commonly referred to as paging or chunking. By efficiently splitting large list ranges from etcd to clients into many smaller list ranges, we can reduce the peak memory allocation on etcd and the apiserver, without losing the consistent read invariant our clients depend on.
|
||||
|
||||
This proposal does not cover general purpose ranging or paging for arbitrary clients, such as allowing web user interfaces to offer paged output, but does define some parameters for future extension. To that end, this proposal uses the phrase "chunking" to describe retrieving a consistent snapshot range read from the API server in distinct pieces.
|
||||
|
||||
Our primary consistent store etcd3 offers support for efficient chunking with minimal overhead, and mechanisms exist for other potential future stores such as SQL databases or Consul to also implement a simple form of consistent chunking.
|
||||
|
||||
Relevant issues:
|
||||
|
||||
* https://github.com/kubernetes/kubernetes/issues/2349
|
||||
|
||||
## Terminology
|
||||
|
||||
**Consistent list** - A snapshot of all resources at a particular moment in time that has a single `resourceVersion` that clients can begin watching from to receive updates. All Kubernetes controllers depend on this semantic. Allows a controller to refresh its internal state, and then receive a stream of changes from the initial state.
|
||||
|
||||
**API paging** - API parameters designed to allow a human to view results in a series of "pages".
|
||||
|
||||
**API chunking** - API parameters designed to allow a client to break one large request into multiple smaller requests without changing the semantics of the original request.
|
||||
|
||||
|
||||
## Proposed change:
|
||||
|
||||
Expose a simple chunking mechanism to allow large API responses to be broken into consistent partial responses. Clients would indicate a tolerance for chunking (opt-in) by specifying a desired maximum number of results to return in a `LIST` call. The server would return up to that amount of objects, and if more exist it would return a `continue` parameter that the client could pass to receive the next set of results. The server would be allowed to ignore the limit if it does not implement limiting (backward compatible), but it is not allowed to support limiting without supporting a way to continue the query past the limit (may not implement `limit` without `continue`).
|
||||
|
||||
```
|
||||
GET /api/v1/pods?limit=500
|
||||
{
|
||||
"metadata": {"continue": "ABC...", "resourceVersion": "147"},
|
||||
"items": [
|
||||
// no more than 500 items
|
||||
]
|
||||
}
|
||||
GET /api/v1/pods?limit=500&continue=ABC...
|
||||
{
|
||||
"metadata": {"continue": "DEF...", "resourceVersion": "147"},
|
||||
"items": [
|
||||
// no more than 500 items
|
||||
]
|
||||
}
|
||||
GET /api/v1/pods?limit=500&continue=DEF...
|
||||
{
|
||||
"metadata": {"resourceVersion": "147"},
|
||||
"items": [
|
||||
// no more than 500 items
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The token returned by the server for `continue` would be an opaque serialized string that would contain a simple serialization of a version identifier (to allow future extension), and any additional data needed by the server storage to identify where to start the next range.
|
||||
|
||||
The continue token is not required to encode other filtering parameters present on the initial request, and clients may alter their filter parameters on subsequent chunk reads. However, the server implementation **may** reject such changes with a `400 Bad Request` error, and clients should consider this behavior undefined and left to future clarification. Chunking is intended to return consistent lists, and clients **should not** alter their filter parameters on subsequent chunk reads.
|
||||
|
||||
If the resource version parameter specified on the request is inconsistent with the `continue` token, the server **must** reject the request with a `400 Bad Request` error.
|
||||
|
||||
The schema of the continue token is chosen by the storage layer and is not guaranteed to remain consistent for clients - clients **must** consider the continue token as opaque. Server implementations **should** ensure that continue tokens can persist across server restarts and across upgrades.
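For illustration only, a continue token under this scheme might be a base64-encoded, versioned tuple of the snapshot resource version and the next start key. The field names and encoding below are hypothetical; the actual format is owned by the storage layer and must be treated as opaque by clients.

```go
// Hypothetical continue token layout; the real encoding is chosen by the
// storage layer and is opaque to clients.
package storage

import (
	"encoding/base64"
	"encoding/json"
)

type continueToken struct {
	APIVersion      string `json:"v"`     // token schema version, to allow future extension
	ResourceVersion int64  `json:"rv"`    // snapshot the chunked list is served from
	StartKey        string `json:"start"` // next key to read in the underlying store
}

func encodeContinue(t continueToken) (string, error) {
	raw, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

func decodeContinue(s string) (continueToken, error) {
	var t continueToken
	raw, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return t, err
	}
	return t, json.Unmarshal(raw, &t)
}
```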
|
||||
|
||||
Servers **may** return fewer results than `limit` if server-side filtering excludes some results, such as when a `label` or `field` selector is used. If the entire result set is filtered, the server **may** return zero results with a valid `continue` token. A client **must** use the presence of a `continue` token in the response to determine whether more results are available, regardless of the number of results returned. A server that supports limits **must not** return more results than `limit` if a `continue` token is also returned. If the server does not return a `continue` token, the server **must** return all remaining results. The server **may** return zero results with no `continue` token on the last call.
|
||||
|
||||
The server **may** limit the amount of time a continue token is valid for. Clients **should** assume continue tokens last only a few minutes.
|
||||
|
||||
The server **must** support `continue` tokens that are valid across multiple API servers. The server **must** support a mechanism for rolling restart such that continue tokens are valid after one or all API servers have been restarted.
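A consumer-side sketch of these semantics follows, assuming the proposed parameters eventually surface in client-go as `Limit` and `Continue` fields on `metav1.ListOptions` and a `Continue` field on `ListMeta` (these names are assumptions, not part of this proposal's text):

```go
// Sketch of a client paging through all pods in chunks of 500, restarting the
// list when the continue token expires (410). Assumes Limit/Continue fields
// on ListOptions and ListMeta as described in the lead-in.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func listAllPods(client kubernetes.Interface, namespace string) error {
	opts := metav1.ListOptions{Limit: 500}
	for {
		pods, err := client.CoreV1().Pods(namespace).List(opts)
		if se, ok := err.(*errors.StatusError); ok && se.Status().Code == 410 {
			// The continue token outlived the compaction window; a consumer
			// that needs a consistent list must restart from the beginning.
			opts.Continue = ""
			continue
		}
		if err != nil {
			return err
		}
		fmt.Printf("received %d pods\n", len(pods.Items))
		if pods.Continue == "" {
			// No continue token: the server has returned all remaining results.
			return nil
		}
		opts.Continue = pods.Continue
	}
}
```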
|
||||
|
||||
|
||||
### Proposed Implementations
|
||||
|
||||
etcd3 is the primary Kubernetes store and has been designed to support consistent range reads in chunks for this use case. The etcd3 store is an ordered map of keys to values, and Kubernetes places all keys within a resource type under a common prefix, with namespaces being a further prefix of those keys. A read of all keys within a resource type is an in-order scan of the etcd3 map, and therefore we can retrieve in chunks by defining a start key for the next chunk that skips the last key read.
|
||||
|
||||
etcd2 will not be supported as it has no option to perform a consistent read and is on track to be deprecated in Kubernetes. Other databases that might back Kubernetes could either choose to not implement limiting, or leverage their own transactional characteristics to return a consistent list. In the near term our primary store remains etcd3 which can provide this capability at low complexity.
|
||||
|
||||
Implementations that cannot offer consistent ranging (returning a set of results that are logically equivalent to receiving all results in one response) must not allow continuation, because consistent listing is a requirement of the Kubernetes API list and watch pattern.
|
||||
|
||||
#### etcd3
|
||||
|
||||
For etcd3 the continue token would contain a resource version (the snapshot that we are reading that is consistent across the entire LIST) and the start key for the next set of results. Upon receiving a valid continue token the apiserver would instruct etcd3 to retrieve the set of results at a given resource version, beginning at the provided start key, limited by the maximum number of results provided by the continue token (or optionally, by a different limit specified by the client). If more results remain after reading up to the limit, the storage should calculate a continue token that would begin at the next possible key, and set that continue token on the returned list.
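A rough sketch of how the storage layer might issue such a chunked read against etcd3 follows; the helper name is hypothetical, but the clientv3 options (`WithRange`, `WithRev`, `WithLimit`) are the existing etcd client primitives this design relies on:

```go
// getChunk reads up to limit keys under prefix, starting at startKey, from the
// snapshot identified by rev, and returns the key to continue from (empty when
// the range is exhausted). Illustrative only.
package storage

import (
	"context"

	"github.com/coreos/etcd/clientv3"
)

func getChunk(ctx context.Context, kv clientv3.KV, prefix, startKey string, rev, limit int64) (*clientv3.GetResponse, string, error) {
	resp, err := kv.Get(ctx, startKey,
		clientv3.WithRange(clientv3.GetPrefixRangeEnd(prefix)), // stay within this resource type's key prefix
		clientv3.WithRev(rev),                                  // read at the same snapshot as the first chunk
		clientv3.WithLimit(limit),                              // return at most one chunk of results
	)
	if err != nil {
		return nil, "", err
	}
	next := ""
	if resp.More && len(resp.Kvs) > 0 {
		// Continue immediately after the last key returned in this chunk.
		next = string(resp.Kvs[len(resp.Kvs)-1].Key) + "\x00"
	}
	return resp, next, nil
}
```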
|
||||
|
||||
The storage layer in the apiserver must apply consistency checking to the provided continue token to ensure that malicious users cannot trick the server into serving results outside of its range. The storage layer must perform defensive checking on the provided value, check for path traversal attacks, and have stable versioning for the continue token.
|
||||
|
||||
#### Possible SQL database implementation
|
||||
|
||||
A SQL database backing a Kubernetes server would need to implement a consistent snapshot read of an entire resource type, plus support changefeed style updates in order to implement the WATCH primitive. A likely implementation in SQL would be a table that stores multiple versions of each object, ordered by key and version, and filters out all historical versions of an object. A consistent paged list over such a table might be similar to:
|
||||
|
||||
    SELECT * FROM resource_type WHERE resourceVersion < ? AND deleted = false AND namespace > ? AND name > ? ORDER BY namespace, name ASC LIMIT ?
|
||||
|
||||
where `namespace` and `name` are part of the continuation token and an index exists over `(namespace, name, resourceVersion, deleted)` that makes the range query performant. The highest returned resource version row for each `(namespace, name)` tuple would be returned.
|
||||
|
||||
|
||||
### Security implications of returning last or next key in the continue token
|
||||
|
||||
If the continue token encodes the next key in the range, that key may expose info that is considered security sensitive, whether simply the name or namespace of resources not under the current tenant's control, or more seriously the name of a resource which is also a shared secret (for example, an access token stored as a kubernetes resource). There are a number of approaches to mitigating this impact:
|
||||
|
||||
1. Disable chunking on specific resources
|
||||
2. Disable chunking when the user does not have permission to view all resources within a range
|
||||
3. Encrypt the next key or the continue token using a shared secret across all API servers
|
||||
4. When chunking, continue reading until the next visible start key is located after filtering, so that start keys are always keys the user has access to.
|
||||
|
||||
In the short term we do not support subset filtering (i.e. a user who can LIST can also LIST with ?fields= and vice versa), so option 1 is sufficient to address the sensitive key name issue. Because clients are required to proceed as if limiting is not possible, the server is always free to ignore a chunked request for other reasons. In the future, option 4 may be the best choice because we assume that most users starting a consistent read intend to finish it, unlike more general user interface paging where only a small fraction of requests continue to the next page.
|
||||
|
||||
|
||||
### Handling expired resource versions
|
||||
|
||||
If the required data to perform a consistent list is no longer available in the storage backend (by default, old versions of objects in etcd3 are removed after 5 minutes), the server **must** return a `410 Gone ResourceExpired` status response (the same as for watch), which means clients must start from the beginning.
|
||||
|
||||
```
|
||||
# resourceVersion is expired
|
||||
GET /api/v1/pods?limit=500&continue=DEF...
|
||||
{
|
||||
"kind": "Status",
|
||||
"code": 410,
|
||||
"reason": "ResourceExpired"
|
||||
}
|
||||
```
|
||||
|
||||
Some clients may wish to follow a failed paged list with a full list attempt.
|
||||
|
||||
The 5 minute default compaction interval for etcd3 bounds how long a list can run. Since clients may wish to perform processing over very large sets, increasing that timeout may make sense for large clusters. It should be possible to alter the interval at which compaction runs to accommodate larger clusters.
|
||||
|
||||
|
||||
#### Types of clients and impact
|
||||
|
||||
Some clients, such as controllers, that receive a 410 error may instead wish to perform a full LIST without chunking.
|
||||
|
||||
* Controllers with full caches
|
||||
* Any controller with a full in-memory cache of one or more resources almost certainly depends on having a consistent view of resources, and so will either need to perform a full list or a paged list, without dropping results
|
||||
* `kubectl get`
|
||||
* Most administrators would probably prefer to see a very large set with some inconsistency rather than no results (due to a timeout under load). They would likely be ok with handling `410 ResourceExpired` as "continue from the last key I processed"
|
||||
* Migration style commands
|
||||
* Assuming a migration command has to run on the full data set (to upgrade a resource from json to protobuf, or to check a large set of resources for errors) and is performing some expensive calculation on each, very large sets may not complete over the server expiration window.
|
||||
|
||||
For clients that do not care about consistency, the server **may** return a `continue` value on the `ResourceExpired` error that allows the client to restart from the same prefix key, but using the latest resource version. This would allow clients that do not require a fully consistent LIST to opt in to partially consistent LISTs but still be able to scan the entire working set. It is likely this could be a sub field (opaque data) of the `Status` response under `statusDetails`.
|
||||
|
||||
|
||||
### Rate limiting
|
||||
|
||||
Since the goal is to reduce spikiness of load, the standard API rate limiter might prefer to rate limit page requests differently from global lists, allowing full LISTs only slowly while smaller pages can proceed more quickly.
|
||||
|
||||
|
||||
### Chunk by default?
|
||||
|
||||
On a very large data set, chunking trades total memory allocated in etcd, the apiserver, and the client for higher overhead per request (request/response processing, authentication, authorization). Picking a sufficiently high chunk value like 500 or 1000 would not impact smaller clusters, but would reduce the peak memory load of a very large cluster (10k resources and up). In testing, no significant overhead was shown in etcd3 for a paged historical query which is expected since the etcd3 store is an MVCC store and must always filter some values to serve a list.
|
||||
|
||||
For clients that must perform sequential processing of lists (kubectl get, migration commands) this change dramatically improves initial latency - clients get their first chunk of data in milliseconds, rather than waiting seconds for the full set. It also improves user experience for web consoles that may be accessed by administrators with access to large parts of the system.
|
||||
|
||||
It is recommended that most clients attempt to page by default at a large page size (500 or 1000) and gracefully degrade to not chunking.
|
||||
|
||||
|
||||
### Other solutions
|
||||
|
||||
Compression from the apiserver and between the apiserver and etcd can reduce total network bandwidth, but cannot reduce the peak CPU and memory used inside the client, apiserver, or etcd processes.
|
||||
|
||||
Various optimizations exist that can and should be applied to minimize the amount of data transferred from etcd to the client or the number of allocations made in each location, but they do not change how response size scales with the number of entries.
|
||||
|
||||
|
||||
## Plan
|
||||
|
||||
The initial chunking implementation would focus on consistent listing on server and client as well as measuring the impact of chunking on total system load, since chunking will slightly increase the cost to view large data sets because of the additional per page processing. The initial implementation should make the fewest assumptions possible in constraining future backend storage.
|
||||
|
||||
For the initial alpha release, chunking would be behind a feature flag and attempts to provide the `continue` or `limit` flags should be ignored. While disabled, a `continue` token should never be returned by the server as part of a list.
|
||||
|
||||
Future work might offer more options for clients to page in an inconsistent fashion, or allow clients to directly specify the parts of the namespace / name keyspace they wish to range over (paging).
|
|
@ -113,7 +113,3 @@ To expose a list of the supported Openshift groups to clients, OpenShift just ha
|
|||
## Future work
|
||||
|
||||
1. Dependencies between groups: we need an interface to register the dependencies between groups. It is not our priority now as the use cases are not clear yet.
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,80 @@
|
|||
# Build some Admission Controllers into the Generic API server library
|
||||
|
||||
**Related PR:**
|
||||
|
||||
| Topic | Link |
|
||||
| ----- | ---- |
|
||||
| Admission Control | https://git.k8s.io/community/contributors/design-proposals/api-machinery/admission_control.md |
|
||||
|
||||
## Introduction
|
||||
|
||||
An admission controller is a piece of code that intercepts requests to the Kubernetes API - think of it as middleware.
|
||||
The API server lets you have a whole chain of them. Each is run in sequence before a request is accepted
|
||||
into the cluster. If any of the plugins in the sequence rejects the request, the entire request is rejected
|
||||
immediately and an error is returned to the user.
|
||||
|
||||
Many features in Kubernetes require an admission control plugin to be enabled in order to properly support the feature.
|
||||
In fact in the [documentation](https://kubernetes.io/docs/admin/admission-controllers/#is-there-a-recommended-set-of-plug-ins-to-use) you will find
|
||||
a recommended set of them to use.
|
||||
|
||||
At the moment admission controllers are implemented as plugins and they have to be compiled into the
|
||||
final binary in order to be used at a later time. Some even require access to a cache, an authorizer, etc.
|
||||
This is where an admission plugin initializer kicks in. An admission plugin initializer is used to pass additional
|
||||
configuration and runtime references, such as a cache, a client, and an authorizer, to the plugins.
|
||||
|
||||
To streamline the process of adding new plugins, especially for aggregated API servers, we would like to build some plugins
|
||||
into the generic API server library and provide a plugin initializer. While anyone can author and register one, having a known set of
|
||||
provided references lets people focus on what they need their admission plugin to do instead of paying attention to wiring.
|
||||
|
||||
## Implementation
|
||||
|
||||
The first step would involve creating a "standard" plugin initializer that would be part of the
|
||||
generic API server. It would use kubeconfig to populate
|
||||
[external clients](https://git.k8s.io/kubernetes/pkg/kubeapiserver/admission/initializer.go#L29)
|
||||
and [external informers](https://git.k8s.io/kubernetes/pkg/kubeapiserver/admission/initializer.go#L35).
|
||||
By default, for servers running on the Kubernetes cluster, the in-cluster config would be used.
|
||||
The standard initializer would also provide a client config for connecting to the core kube-apiserver.
|
||||
Some API servers might be started as static pods, which don't have in-cluster configs.
|
||||
In that case the config could easily be populated from a file.
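As a sketch of that fallback (using client-go calls that exist today, with a hypothetical parameter for the kubeconfig path):

```go
// Prefer the in-cluster config; fall back to a kubeconfig file, e.g. for API
// servers run as static pods. The kubeconfigPath parameter is illustrative.
package main

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func loadConfig(kubeconfigPath string) (*rest.Config, error) {
	if cfg, err := rest.InClusterConfig(); err == nil {
		return cfg, nil
	}
	// Not running inside a cluster (or no service account mounted): read the
	// configuration from the provided kubeconfig file instead.
	return clientcmd.BuildConfigFromFlags("", kubeconfigPath)
}
```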
|
||||
|
||||
The second step would be to move some plugins from [admission pkg](https://git.k8s.io/kubernetes/plugin/pkg/admission)
|
||||
to the generic API server library. Some admission plugins are used to ensure consistent user expectations.
|
||||
These plugins should be moved. One example is the Namespace Lifecycle plugin which prevents users
|
||||
from creating resources in non-existent namespaces.
|
||||
|
||||
*Note*:
|
||||
For loading the in-cluster configuration, see [this example](https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/examples/in-cluster-client-configuration/main.go).
|
||||
For loading the configuration directly from a file, see [this example](https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/examples/out-of-cluster-client-configuration/main.go).
|
||||
|
||||
## How to add an admission plugin?
|
||||
At this point adding an admission plugin is very simple and boils down to performing the
|
||||
following series of steps:
|
||||
1. Write an admission plugin (a minimal sketch follows this list)
|
||||
2. Register the plugin
|
||||
3. Reference the plugin in the admission chain
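As a rough illustration of step 1, a plugin could look like the sketch below; the plugin name, the banned name, and the package are made up, and it assumes the `admission.Interface` of this era (an `Admit` method plus `Handles` from the embedded handler):

```go
// Hypothetical admission plugin that rejects objects with a banned name.
package banname

import (
	"fmt"
	"io"

	"k8s.io/apiserver/pkg/admission"
)

// Register makes the plugin available under a well-known name so that an
// admission chain can reference it (step 2 of the list above).
func Register(plugins *admission.Plugins) {
	plugins.Register("BanName", func(config io.Reader) (admission.Interface, error) {
		return &banName{Handler: admission.NewHandler(admission.Create)}, nil
	})
}

type banName struct {
	*admission.Handler
}

// Admit rejects creation requests whose object name is banned.
func (b *banName) Admit(a admission.Attributes) error {
	if a.GetName() == "forbidden" {
		return admission.NewForbidden(a, fmt.Errorf("the name %q is not allowed", a.GetName()))
	}
	return nil
}
```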
|
||||
|
||||
## An example
|
||||
The sample apiserver provides an example admission plugin that makes meaningful use of the "standard" plugin initializer.
|
||||
The admission plugin ensures that a resource name is not on the list of banned names.
|
||||
The source code of the plugin can be found [here](https://github.com/kubernetes/kubernetes/blob/2f00e6d72c9d58fe3edc3488a91948cf4bfcc6d9/staging/src/k8s.io/sample-apiserver/pkg/admission/plugin/banflunder/admission.go).
|
||||
|
||||
Once the plugin is written, the next step is registration. [AdmissionOptions](https://github.com/kubernetes/kubernetes/blob/2f00e6d72c9d58fe3edc3488a91948cf4bfcc6d9/staging/src/k8s.io/apiserver/pkg/server/options/admission.go)
|
||||
provides two important things. Firstly it exposes [a register](https://github.com/kubernetes/kubernetes/blob/2f00e6d72c9d58fe3edc3488a91948cf4bfcc6d9/staging/src/k8s.io/apiserver/pkg/server/options/admission.go#L43)
|
||||
under which all admission plugins are registered. In fact, that's exactly what the [Register](https://github.com/kubernetes/kubernetes/blob/2f00e6d72c9d58fe3edc3488a91948cf4bfcc6d9/staging/src/k8s.io/sample-apiserver/pkg/admission/plugin/banflunder/admission.go#L33)
|
||||
method of our example admission plugin does. It accepts a global registry as a parameter and then simply registers itself in that registry.
|
||||
Secondly, it adds an admission chain to the server configuration via [ApplyTo](https://github.com/kubernetes/kubernetes/blob/2f00e6d72c9d58fe3edc3488a91948cf4bfcc6d9/staging/src/k8s.io/apiserver/pkg/server/options/admission.go#L66) method.
|
||||
The method accepts optional parameters in the form of `pluginInitializers`. This is useful when admission plugins need custom configuration that is not provided by the generic initializer.
|
||||
|
||||
The following code has been extracted from the sample server and illustrates how to register and wire an admission plugin:
|
||||
|
||||
```go
|
||||
// register admission plugins
|
||||
banflunder.Register(o.Admission.Plugins)
|
||||
|
||||
// create custom plugin initializer
|
||||
informerFactory := informers.NewSharedInformerFactory(client, serverConfig.LoopbackClientConfig.Timeout)
|
||||
admissionInitializer, _ := wardleinitializer.New(informerFactory)
|
||||
|
||||
// add admission chain to the server configuration
|
||||
o.Admission.ApplyTo(serverConfig, admissionInitializer)
|
||||
```
|
|
@ -0,0 +1,86 @@
|
|||
# apiserver-count fix proposal
|
||||
|
||||
Authors: @rphillips
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
2. [Known Issues](#known-issues)
|
||||
3. [Proposal](#proposal)
|
||||
4. [Alternate Proposals](#alternate-proposals)
|
||||
1. [Custom Resource Definitions](#custom-resource-definitions)
|
||||
2. [Refactor Old Reconciler](#refactor-old-reconciler)
|
||||
|
||||
## Overview
|
||||
|
||||
Proposal to fix Issue [#22609](https://github.com/kubernetes/kubernetes/issues/22609)
|
||||
|
||||
`kube-apiserver` currently has a command-line argument `--apiserver-count`
|
||||
specifying the number of api servers. This masterCount is used in the
|
||||
MasterCountEndpointReconciler on a 10 second interval to potentially clean up
|
||||
stale API Endpoints. The issue is when the number of kube-apiserver instances
|
||||
drops below or rises above the masterCount. In the former case, the stale
|
||||
instances within the Endpoints do not get cleaned up; in the latter case
|
||||
the endpoints start to flap.
|
||||
|
||||
## Known Issues
|
||||
|
||||
Each apiserver’s reconciler only cleans up for its own IP. If a new
|
||||
server is spun up at a new IP, then the old IP in the Endpoints list is
|
||||
only reclaimed if the number of apiservers becomes greater-than or equal
|
||||
to the masterCount. For example:
|
||||
|
||||
* If the masterCount = 3, and there are 3 API servers running (named: A, B, and C)
|
||||
* ‘B’ API server is terminated for any reason
|
||||
* The IP for endpoint ‘B’ is not
|
||||
removed from the Endpoints list
|
||||
|
||||
There is logic within the
|
||||
[MasterCountEndpointReconciler](https://github.com/kubernetes/kubernetes/blob/68814c0203c4b8abe59812b1093844a1f9bdac05/pkg/master/controller.go#L293)
|
||||
to attempt to make the Endpoints eventually consistent, but the code relies on
|
||||
the Endpoints count becoming equal to or greater than masterCount. When the
|
||||
apiservers become greater than the masterCount the Endpoints tend to flap.
|
||||
|
||||
If the number of endpoints were scaled down by automation, then the
|
||||
Endpoints would never become consistent.
|
||||
|
||||
## Proposal
|
||||
|
||||
### Create New Reconciler
|
||||
|
||||
| Kubernetes Release | Quality | Description |
|
||||
| ------------- | ------------- | ----------- |
|
||||
| 1.9 | alpha | <ul><li>Add a new reconciler</li><li>Add a command-line type `--alpha-apiserver-endpoint-reconciler-type`<ul><li>storage</li><li>default</li></ul></li></ul>
|
||||
| 1.10 | beta | <ul><li>Turn on the `storage` type by default</li></ul>
|
||||
| 1.11 | stable | <ul><li>Remove code for old reconciler</li><li>Remove --apiserver-count</li></ul>
|
||||
|
||||
The MasterCountEndpointReconciler does not meet the current needs for durability
|
||||
of API Endpoint creation, deletion, or failure cases.
|
||||
|
||||
Custom Resource Definitions were proposed, but they do not have clean layering.
|
||||
Additionally, liveness and locking would be nice-to-have features for a long
|
||||
term solution.
|
||||
|
||||
ConfigMaps were proposed, but since they are watched globally, liveness
|
||||
updates could be overly chatty.
|
||||
|
||||
By porting OpenShift's
|
||||
[LeaseEndpointReconciler](https://github.com/openshift/origin/blob/master/pkg/cmd/server/election/lease_endpoint_reconciler.go)
|
||||
to Kubernetes we can use the Storage API directly to store Endpoints
|
||||
dynamically within the system.
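The idea behind such a lease-based reconciler is sketched below; the `Leases` interface and function names are hypothetical stand-ins for the storage-backed lease registry, not the OpenShift code:

```go
// Illustrative only: each apiserver refreshes a TTL-bounded lease for its own
// IP, and the Endpoints object is rewritten from the set of live leases, so a
// terminated apiserver disappears once its lease expires.
package reconcilers

type Leases interface {
	// UpdateLease creates or refreshes this apiserver's lease key in storage.
	UpdateLease(ip string) error
	// ListLeases returns the IPs of all apiservers whose leases have not expired.
	ListLeases() ([]string, error)
}

func reconcile(leases Leases, ownIP string, writeEndpoints func(ips []string) error) error {
	if err := leases.UpdateLease(ownIP); err != nil {
		return err
	}
	ips, err := leases.ListLeases()
	if err != nil {
		return err
	}
	return writeEndpoints(ips)
}
```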
|
||||
|
||||
### Alternate Proposals
|
||||
|
||||
#### Custom Resource Definitions and ConfigMaps
|
||||
|
||||
CRDs and ConfigMaps were considered for this proposal. They were not adopted
|
||||
by the community due to the technical issues explained earlier.
|
||||
|
||||
#### Refactor Old Reconciler
|
||||
|
||||
| Release | Quality | Description |
|
||||
| ------- | ------- | ------------------------------------------------------------ |
|
||||
| 1.9 | stable | Change the logic in the current reconciler
|
||||
|
||||
We could potentially reuse the old reconciler by changing the reconciler to count
|
||||
the endpoints and set the `masterCount` (with a RWLock) to the count.
|
|
@ -132,13 +132,8 @@ the same time, we can introduce an additional etcd event type: EtcdResync
|
|||
Thus, we need to create the EtcdResync event, extend watch.Interface and
|
||||
its implementations to support it and handle those events appropriately
|
||||
in places like
|
||||
[Reflector](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go)
|
||||
[Reflector](https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/tools/cache/reflector.go)
|
||||
|
||||
However, this might turn out to be unnecessary optimization if apiserver
|
||||
will always keep up (which is possible in the new design). We will work
|
||||
out all necessary details at that point.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -80,7 +80,7 @@ When implementing audit logging there are basically two options:
|
|||
1. put a logging proxy in front of the apiserver
|
||||
2. integrate audit logging into the apiserver itself
|
||||
|
||||
Both approaches have advantages and disadvanteges:
|
||||
Both approaches have advantages and disadvantages:
|
||||
- **pro proxy**:
|
||||
+ keeps complexity out of the apiserver
|
||||
+ reuses existing solutions
|
||||
|
@ -94,7 +94,7 @@ In the following, the second approach is described without a proxy. At which po
|
|||
1. as one of the REST handlers (as in [#27087](https://github.com/kubernetes/kubernetes/pull/27087)),
|
||||
2. as an admission controller.
|
||||
|
||||
The former approach (currently implemented) was picked over the other one, due to the need to be able to get information about both the user submitting the request and the impersonated user (and group), which is being overridden inside the [impersonation filter](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go). Additionally admission controller does not have access to the response and runs after authorization which will prevent logging failed authorization. All of that resulted in continuing the solution started in [#27087](https://github.com/kubernetes/kubernetes/pull/27087), which implements auditing as one of the REST handlers
|
||||
The former approach (currently implemented) was picked over the other one, due to the need to be able to get information about both the user submitting the request and the impersonated user (and group), which is being overridden inside the [impersonation filter](https://git.k8s.io/kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go). Additionally admission controller does not have access to the response and runs after authorization which will prevent logging failed authorization. All of that resulted in continuing the solution started in [#27087](https://github.com/kubernetes/kubernetes/pull/27087), which implements auditing as one of the REST handlers
|
||||
after authentication, but before impersonation and authorization.
|
||||
|
||||
## Proposed Design
|
||||
|
@ -374,7 +374,3 @@ Below are the possible future extensions to the auditing mechanism:
|
|||
* Define how filters work. They should enable dropping sensitive fields from the request/response/storage objects.
|
||||
* Allow setting a unique identifier which allows matching audit events across apiserver and federated servers.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -11,7 +11,7 @@ discussions see https://github.com/kubernetes/kubernetes/issues/40476.
|
|||
of the system, by significantly reducing amount of api calls coming from
|
||||
kubelets. As of now, to avoid situation that kubelet is watching all secrets/
|
||||
configmaps/... in the system, it is not using watch for this purpose. Instead of
|
||||
that, it is retrieving indidual objects, by sending individual GET requests.
|
||||
that, it is retrieving individual objects, by sending individual GET requests.
|
||||
However, to enable automatic updates of mounted secrets/configmaps/..., Kubelet
|
||||
is sending those GET requests periodically. In large clusters, this is
|
||||
generating huge unnecessary load, as this load in principle should be
|
||||
|
@ -64,7 +64,7 @@ API (by lists I mean e.g. `PodList` object)
|
|||
- the API has to be implemented also in aggregator so that bulk operations
|
||||
are supported also if different resource types are served by different
|
||||
apiservers
|
||||
- clients has to be able to alter their watch subscribtions incrementally (it
|
||||
- clients has to be able to alter their watch subscriptions incrementally (it
|
||||
may not be implemented in the initial version though, but has to be designed)
|
||||
|
||||
|
||||
|
@ -76,7 +76,7 @@ call). Spanning multiple resources, resource types or conditions will be more
|
|||
and more important for large number of watches. As an example, federation will
|
||||
be adding watches for every type it federates. With that in mind, bypassing
|
||||
aggregation at the resource type level and going to aggregation over objects
|
||||
with different resource types will allow us to more aggresively optimize in the
|
||||
with different resource types will allow us to more aggressively optimize in the
|
||||
future (it doesn't mean you have to watch resources of different types in a
|
||||
single watch, but we would like to make it possible).
|
||||
|
||||
|
@ -124,8 +124,8 @@ websocket /apis/bulk.k8s.io/v1/bulkgetoperations?watch=1
|
|||
handling LIST requests, where first client sends a filter definition over the
|
||||
channel and then server sends back the response, but we dropped this for now.*
|
||||
|
||||
*Note: We aso considered implementing the POST-based watch handler that doesn't
|
||||
allow for altering subsriptions, which should be very simple once we have list
|
||||
*Note: We also considered implementing the POST-based watch handler that doesn't
|
||||
allow for altering subscriptions, which should be very simple once we have list
|
||||
implemented. But since websocket API is needed anyway, we also dropped it.*
|
||||
|
||||
|
||||
|
@ -173,14 +173,14 @@ will be described together with dynamic watch description below.
|
|||
### Dynamic watch
|
||||
|
||||
As mentioned in the Proposal section, we will implement bulk watch that will
|
||||
allow for dynamic subscribtion/unsubscribtion for (sets of) objects on top of
|
||||
allow for dynamic subscription/unsubscription for (sets of) objects on top of
|
||||
websockets protocol.
|
||||
|
||||
Note that we already support websockets in the regular Kubernetes API for
|
||||
watch requests (in addition to regular http requests), so for the purpose of
|
||||
bulk watch we will be extending websocket support.
|
||||
|
||||
The the high level, the propocol will look:
|
||||
The the high level, the protocol will look:
|
||||
1. client opens a new websocket connection to a bulk watch endpoint to the
|
||||
server via ghttp GET
|
||||
1. this results in creating a single channel that is used only to handle
|
||||
|
@ -232,7 +232,7 @@ type Response struct {
|
|||
With the above structure we can guarantee that we only send and receive
|
||||
objects of a single type over the channel.
|
||||
|
||||
We should also introduce some way of correleting responses with requests
|
||||
We should also introduce some way of correlating responses with requests
|
||||
when a client is sending multiple of them at the same time. To achieve this
|
||||
we will add a `request identified` field to the `Request` that user can set
|
||||
and that will then be returned as part of `Response`. With this mechanism
|
||||
|
@ -288,7 +288,7 @@ frameworks like reflector) that rely on two crucial watch invariants:
|
|||
1. there is at most one watch event delivered for any resource version
|
||||
|
||||
However, we have no guarantee that resource version series is shared between
|
||||
diferent resource types (in fact in default GCE setup events are not sharing
|
||||
different resource types (in fact in default GCE setup events are not sharing
|
||||
the same series as they are stored in a separate etcd instance). That said,
|
||||
to avoid introducing too many assumptions (that already aren't really met)
|
||||
we can't guarantee exactly the same.
|
||||
|
@ -344,7 +344,7 @@ aggregator, which is crucial requirement here.
|
|||
NOTE: For watch requests, as an initial step we can consider implementing
|
||||
this API only in aggregator and simply start an individual watch for any
|
||||
subrequest. With http2 we shouldn't get rid of descriptors and it can be
|
||||
enough as a prrof of concept. However, with such approach there will be
|
||||
enough as a proof of concept. However, with such approach there will be
|
||||
difference between sending a given request to aggregator and apiserver
|
||||
so we need to implement it properly in apiserver before entering alpha
|
||||
anyway. This would just give us early results faster.
|
||||
|
@ -359,7 +359,7 @@ single response to the user.
|
|||
|
||||
The only non-trivial operation above is sending the request for a single
|
||||
resource type down the stack. In order to implement it, we will need to
|
||||
slighly modify the interface of "Registry" in apiserver. The modification
|
||||
slightly modify the interface of "Registry" in apiserver. The modification
|
||||
will have to allow passing both what we are passing now and BulkListOptions
|
||||
(in some format) (this may e.g. changing signature to accept BulkListOptions
|
||||
and translating ListOptions to BulkListOptions in the current code).
|
||||
|
@ -433,7 +433,7 @@ do it in deterministic way. The crucial requirements are:
|
|||
1. Whenever "list" request returns a list of objects and a resource version "rv",
|
||||
starting a watch from the returned "rv" will never drop any events.
|
||||
2. For a given watch request (with resource version "rv"), the returned stream
|
||||
of events is always the same (e.g. very slow laggin watch may not cause dropped
|
||||
of events is always the same (e.g. very slow lagging watch may not cause dropped
|
||||
events).
|
||||
|
||||
We can't really satisfy these conditions using the existing machinery. To solve
|
||||
|
@ -468,7 +468,7 @@ no matter if we implement it or not)
|
|||
there are few selectors selecting the same object, in dynamic approach it will
|
||||
be send multiple times, once over each channel, here it would be send once)
|
||||
- we would have to introduce a dedicate "BulkWatchEvent" type to incorporate
|
||||
resource type. This would make those two incompatible even at the ouput format.
|
||||
resource type. This would make those two incompatible even at the output format.
|
||||
|
||||
With all of those in mind, even though the implementation would be much much
|
||||
simpler (and could potentially be a first step and would probably solve the
|
|
@ -1,5 +1,3 @@
|
|||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Client: layering and package structure](#client-layering-and-package-structure)
|
||||
- [Desired layers](#desired-layers)
|
||||
- [Transport](#transport)
|
||||
|
@ -12,7 +10,6 @@
|
|||
- [Package Structure](#package-structure)
|
||||
- [Client Guarantees (and testing)](#client-guarantees-and-testing)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
# Client: layering and package structure
|
||||
|
||||
|
@ -310,7 +307,3 @@ that client will not have to change their code until they are deliberately
|
|||
upgrading their import. We probably will want to generate some sort of stub test
|
||||
with a clientset, to ensure that we don't change the interface.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -237,7 +237,7 @@ This section lists considerations specific to a given controller.
|
|||
|
||||
* **ReplicaSet/ReplicationController**
|
||||
|
||||
* These controllers currenly only enable ControllerRef behavior when the
|
||||
* These controllers currently only enable ControllerRef behavior when the
|
||||
Garbage Collector is enabled. When ControllerRef was first added to these
|
||||
controllers, the main purpose was to enable server-side cascading deletion
|
||||
via the Garbage Collector, so it made sense to gate it behind the same flag.
|
||||
|
@ -415,7 +415,3 @@ Summary of significant revisions to this document:
|
|||
* Specify ControllerRef-related behavior changes upon upgrade/downgrade.
|
||||
* [Implementation](#implementation)
|
||||
* List all work to be done and mark items already completed as of this edit.
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -24,7 +24,7 @@ Development would be based on a generated client using OpenAPI and [swagger-code
|
|||
|
||||
### Client Capabilities
|
||||
|
||||
* Bronze Requirements [](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/csi-new-client-library-procedure.md#client-capabilities)
|
||||
* Bronze Requirements [](/contributors/design-proposals/api-machinery/csi-new-client-library-procedure.md#client-capabilities)
|
||||
|
||||
* Support loading config from kube config file
|
||||
|
||||
|
@ -40,11 +40,11 @@ Development would be based on a generated client using OpenAPI and [swagger-code
|
|||
|
||||
* Works from within the cluster environment.
|
||||
|
||||
* Silver Requirements [](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/csi-new-client-library-procedure.md#client-capabilities)
|
||||
* Silver Requirements [](/contributors/design-proposals/api-machinery/csi-new-client-library-procedure.md#client-capabilities)
|
||||
|
||||
* Support watch calls
|
||||
|
||||
* Gold Requirements [](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/csi-new-client-library-procedure.md#client-capabilities)
|
||||
* Gold Requirements [](/contributors/design-proposals/api-machinery/csi-new-client-library-procedure.md#client-capabilities)
|
||||
|
||||
* Support exec, attach, port-forward calls (these are not normally supported out of the box from [swagger-codegen](https://github.com/swagger-api/swagger-codegen))
|
||||
|
||||
|
@ -54,11 +54,11 @@ Development would be based on a generated client using OpenAPI and [swagger-code
|
|||
|
||||
### Client Support Level
|
||||
|
||||
* Alpha [](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/csi-new-client-library-procedure.md#client-support-level)
|
||||
* Alpha [](/contributors/design-proposals/api-machinery/csi-new-client-library-procedure.md#client-support-level)
|
||||
|
||||
* Clients don’t even have to meet bronze requirements
|
||||
|
||||
* Beta [](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/csi-new-client-library-procedure.md#client-support-level)
|
||||
* Beta [](/contributors/design-proposals/api-machinery/csi-new-client-library-procedure.md#client-support-level)
|
||||
|
||||
* Client at least meets bronze standards
|
||||
|
||||
|
@ -68,7 +68,7 @@ Development would be based on a generated client using OpenAPI and [swagger-code
|
|||
|
||||
* 2+ individual maintainers/owners of the repository
|
||||
|
||||
* Stable [](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/csi-new-client-library-procedure.md#client-support-level)
|
||||
* Stable [](/contributors/design-proposals/api-machinery/csi-new-client-library-procedure.md#client-support-level)
|
||||
|
||||
* Support level documented per-platform
|
||||
|
||||
|
@ -96,5 +96,5 @@ For each client language, we’ll make a client-[lang]-base and client-[lang] re
|
|||
|
||||
# Support
|
||||
|
||||
These clients will be supported by the Kubernetes [API Machinery special interest group](https://github.com/kubernetes/community/tree/master/sig-api-machinery); however, individual owner(s) will be needed for each client language for them to be considered stable; the SIG won’t be able to handle the support load otherwise. If the generated clients prove as easy to maintain as we hope, then a few individuals may be able to own multiple clients.
|
||||
These clients will be supported by the Kubernetes [API Machinery special interest group](/sig-api-machinery); however, individual owner(s) will be needed for each client language for them to be considered stable; the SIG won’t be able to handle the support load otherwise. If the generated clients prove as easy to maintain as we hope, then a few individuals may be able to own multiple clients.
|
||||
|
|
@ -0,0 +1,203 @@
|
|||
# Subresources for CustomResources
|
||||
|
||||
Authors: @nikhita, @sttts
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Abstract](#abstract)
|
||||
2. [Goals](#goals)
|
||||
3. [Non-Goals](#non-goals)
|
||||
4. [Proposed Extension of CustomResourceDefinition](#proposed-extension-of-customresourcedefinition)
|
||||
1. [API Types](#api-types)
|
||||
2. [Feature Gate](#feature-gate)
|
||||
5. [Semantics](#semantics)
|
||||
1. [Validation Behavior](#validation-behavior)
|
||||
1. [Status](#status)
|
||||
2. [Scale](#scale)
|
||||
2. [Status Behavior](#status-behavior)
|
||||
3. [Scale Behavior](#scale-behavior)
|
||||
1. [Status Replicas Behavior](#status-replicas-behavior)
|
||||
2. [Selector Behavior](#selector-behavior)
|
||||
4. [Implementation Plan](#implementation-plan)
|
||||
5. [Alternatives](#alternatives)
|
||||
1. [Scope](#scope)
|
||||
|
||||
## Abstract
|
||||
|
||||
[CustomResourceDefinitions](https://github.com/kubernetes/community/pull/524) (CRDs) were introduced in 1.7. The objects defined by CRDs are called CustomResources (CRs). Currently, we do not provide subresources for CRs.
|
||||
|
||||
However, it is one of the [most requested features](https://github.com/kubernetes/kubernetes/issues/38113) and this proposal seeks to add `/status` and `/scale` subresources for CustomResources.
|
||||
|
||||
## Goals
|
||||
|
||||
1. Support status/spec split for CustomResources:
|
||||
1. Status changes are ignored on the main resource endpoint.
|
||||
2. Support a `/status` subresource HTTP path for status changes.
|
||||
3. `metadata.Generation` is increased only on spec changes.
|
||||
2. Support a `/scale` subresource for CustomResources.
|
||||
3. Maintain backward compatibility by allowing CRDs to opt-in to enable subresources.
|
||||
4. If a CustomResource is already structured using spec/status, allow it to easily transition to use the `/status` and `/scale` endpoints.
|
||||
5. Work seamlessly with [JSON Schema validation](https://github.com/kubernetes/community/pull/708).
|
||||
|
||||
## Non-Goals
|
||||
|
||||
1. Allow defining arbitrary subresources, i.e. subresources other than `/status` and `/scale`.
|
||||
|
||||
## Proposed Extension of CustomResourceDefinition
|
||||
|
||||
### API Types
|
||||
|
||||
The addition of the following external types in `apiextensions.k8s.io/v1beta1` is proposed:
|
||||
|
||||
```go
|
||||
type CustomResourceDefinitionSpec struct {
|
||||
...
|
||||
// SubResources describes the subresources for CustomResources
|
||||
// This field is alpha-level and should only be sent to servers that enable
|
||||
// subresources via the CustomResourceSubResources feature gate.
|
||||
// +optional
|
||||
SubResources *CustomResourceSubResources `json:"subResources,omitempty"`
|
||||
}
|
||||
|
||||
// CustomResourceSubResources defines the status and scale subresources for CustomResources.
|
||||
type CustomResourceSubResources struct {
|
||||
// Status denotes the status subresource for CustomResources
|
||||
Status *CustomResourceSubResourceStatus `json:"status,omitempty"`
|
||||
// Scale denotes the scale subresource for CustomResources
|
||||
Scale *CustomResourceSubResourceScale `json:"scale,omitempty"`
|
||||
}
|
||||
|
||||
// CustomResourceSubResourceStatus defines how to serve the HTTP path <CR Name>/status.
|
||||
type CustomResourceSubResourceStatus struct {
|
||||
}
|
||||
|
||||
// CustomResourceSubResourceScale defines how to serve the HTTP path <CR name>/scale.
|
||||
type CustomResourceSubResourceScale struct {
|
||||
// required, e.g. “.spec.replicas”. Must be under `.spec`.
|
||||
// Only JSON paths without the array notation are allowed.
|
||||
SpecReplicasPath string `json:"specReplicasPath"`
|
||||
// optional, e.g. “.status.replicas”. Must be under `.status`.
|
||||
// Only JSON paths without the array notation are allowed.
|
||||
StatusReplicasPath string `json:"statusReplicasPath,omitempty"`
|
||||
// optional, e.g. “.spec.labelSelector”. Must be under `.spec`.
|
||||
// Only JSON paths without the array notation are allowed.
|
||||
LabelSelectorPath string `json:"labelSelectorPath,omitempty"`
|
||||
// ScaleGroupVersion denotes the GroupVersion of the Scale
|
||||
// object sent as the payload for /scale. It allows transition
|
||||
// to future versions easily.
|
||||
// Today only autoscaling/v1 is allowed.
|
||||
ScaleGroupVersion schema.GroupVersion `json:"groupVersion"`
|
||||
}
|
||||
```
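For concreteness, a CRD author opting in to both subresources might populate the proposed field as in the sketch below; the group, kind, and JSON paths are example values, and the snippet assumes the proposed types above were merged into `apiextensions.k8s.io/v1beta1`:

```go
// Example values only; SubResources is the field proposed above.
spec := CustomResourceDefinitionSpec{
	Group:   "example.com",
	Version: "v1",
	Names: CustomResourceDefinitionNames{
		Plural: "widgets",
		Kind:   "Widget",
	},
	SubResources: &CustomResourceSubResources{
		Status: &CustomResourceSubResourceStatus{},
		Scale: &CustomResourceSubResourceScale{
			SpecReplicasPath:   ".spec.replicas",
			StatusReplicasPath: ".status.replicas",
			LabelSelectorPath:  ".spec.labelSelector",
			ScaleGroupVersion:  schema.GroupVersion{Group: "autoscaling", Version: "v1"},
		},
	},
}
```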
|
||||
|
||||
### Feature Gate
|
||||
|
||||
The `SubResources` field in `CustomResourceDefinitionSpec` will be gated under the `CustomResourceSubResources` alpha feature gate.
|
||||
If the gate is not open, the value of the new field within `CustomResourceDefinitionSpec` is dropped on creation and updates of CRDs.
|
||||
|
||||
### Scale type
|
||||
|
||||
The `Scale` object is the payload sent over the wire for `/scale`. The [polymorphic `Scale` type](https://github.com/kubernetes/kubernetes/pull/53743) i.e. `autoscaling/v1.Scale` is used for the `Scale` object.
|
||||
|
||||
Since the GroupVersion of the `Scale` object is specified in `CustomResourceSubResourceScale`, transition to future versions (eg `autoscaling/v2.Scale`) can be done easily.
|
||||
|
||||
Note: If `autoscaling/v1.Scale` is deprecated, then it would be deprecated here as well.
|
||||
|
||||
## Semantics
|
||||
|
||||
### Validation Behavior
|
||||
|
||||
#### Status
|
||||
|
||||
The status endpoint of a CustomResource receives a full CR object. Changes outside of the `.status` subpath are ignored.
|
||||
For validation, the JSON Schema present in the CRD is validated only against the `.status` subpath.
|
||||
|
||||
To validate only against the schema for the `.status` subpath, `oneOf` and `anyOf` constructs are not allowed within the root of the schema, but only under a properties sub-schema (with this restriction, we can project a schema to a sub-path). The following is forbidden in the CRD spec:
|
||||
|
||||
```yaml
|
||||
validation:
|
||||
openAPIV3Schema:
|
||||
oneOf:
|
||||
...
|
||||
```
|
||||
|
||||
**Note**: The restriction for `oneOf` and `anyOf` allows us to write a projection function `ProjectJSONSchema(schema *JSONSchemaProps, path []string) (*JSONSchemaProps, error)` that can be used to apply a given schema for the whole object to only the sub-path `.status` or `.spec`.
|
||||
|
||||
#### Scale
|
||||
|
||||
If the scale subresource is enabled:
|
||||
|
||||
On update, we copy the values from the `Scale` object into the specified paths in the CustomResource, if the path is set (`StatusReplicasPath` and `LabelSelectorPath` are optional).
|
||||
If `StatusReplicasPath` or `LabelSelectorPath` is not set, we validate that the value in `Scale` is also not specified and return an error otherwise.
|
||||
|
||||
On `get` and on `update` (after copying the values into the CustomResource as described above), we verify that:
|
||||
|
||||
- The value at the specified JSON Path `SpecReplicasPath` (e.g. `.spec.replicas`) is a non-negative integer value and is not empty.
|
||||
|
||||
- The value at the optional JSON Path `StatusReplicasPath` (e.g. `.status.replicas`) is an integer value if it exists (i.e. this can be empty).
|
||||
|
||||
- The value at the optional JSON Path `LabelSelectorPath` (e.g. `.spec.labelSelector`) is a valid label selector if it exists (i.e. this can be empty).
|
||||
|
||||
**Note**: The values at the JSON Paths specified by `SpecReplicasPath`, `LabelSelectorPath` and `StatusReplicasPath` are also validated with the same rules when the whole object or, in case the `/status` subresource is enabled, the `.status` sub-object is updated.
|
||||
|
||||
### Status Behavior
|
||||
|
||||
If the `/status` subresource is enabled, the following behaviors change:
|
||||
|
||||
- The main resource endpoint will ignore all changes in the status subpath.
|
||||
(note: it will **not** reject requests which try to change the status, following the existing semantics of other resources).
|
||||
|
||||
- The `.metadata.generation` field is updated if and only if the value at the `.spec` subpath changes.
|
||||
  In particular, status-only updates do not bump `.metadata.generation`.
|
||||
|
||||
- The `/status` subresource receives a full resource object, but only considers the value at the `.status` subpath for the update.
|
||||
The value at the `.metadata` subpath is **not** considered for update as decided in https://github.com/kubernetes/kubernetes/issues/45539.
|
||||
|
||||
Both the status and the spec (and everything else if there is anything) of the object share the same key in the storage layer, i.e. the value at `.metadata.resourceVersion` is increased for any kind of change. There is no split of status and spec in the storage layer.
|
||||
|
||||
The `/status` endpoint supports both `get` and `update` verbs.
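A minimal sketch of the status-only update semantics over an unstructured CustomResource follows; the helper is hypothetical, and the real logic would live in the apiextensions-apiserver update strategy:

```go
// applyStatusUpdate keeps everything from the stored object and takes only the
// .status subpath from the incoming object, matching the /status semantics
// above: spec and metadata changes in the request body are ignored.
func applyStatusUpdate(stored, incoming map[string]interface{}) map[string]interface{} {
	updated := make(map[string]interface{}, len(stored))
	for k, v := range stored {
		updated[k] = v
	}
	if status, ok := incoming["status"]; ok {
		updated["status"] = status
	} else {
		delete(updated, "status")
	}
	return updated
}
```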
|
||||
|
||||
### Scale Behavior
|
||||
|
||||
A CustomResource can easily be scaled up or down by changing the replicas field at the path under `.spec` specified by `SpecReplicasPath`.
|
||||
|
||||
Only `ScaleSpec.Replicas` can be written. All other values are read-only and changes to them will be ignored, i.e. upon updating the scale subresource, two fields are modified:
|
||||
|
||||
1. The replicas field is copied back from the `Scale` object to the main resource as specified by `SpecReplicasPath` in the CRD, e.g. `.spec.replicas = scale.Spec.Replicas`.
|
||||
|
||||
2. The resource version is copied back from the `Scale` object to the main resource before writing to the storage: `.metadata.resourceVersion = scale.ResourceVersion`.
|
||||
In other words, the scale and the CustomResource share the resource version used for optimistic concurrency.
|
||||
Updates with outdated resource versions are rejected with a conflict error, read requests will return the resource version of the CustomResource.
|
||||
|
||||
The `/scale` endpoint supports both `get` and `update` verbs.
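Correspondingly, a sketch of the scale update on an unstructured CustomResource, hard-coding the example paths `.spec.replicas` and `.metadata.resourceVersion` (the helper name is made up; `autoscalingv1` refers to `k8s.io/api/autoscaling/v1`):

```go
// applyScaleUpdate copies only Spec.Replicas and the resource version from the
// Scale payload back onto the stored CustomResource, as described above.
func applyScaleUpdate(cr map[string]interface{}, scale *autoscalingv1.Scale) {
	spec, ok := cr["spec"].(map[string]interface{})
	if !ok {
		spec = map[string]interface{}{}
		cr["spec"] = spec
	}
	spec["replicas"] = int64(scale.Spec.Replicas)

	metadata, ok := cr["metadata"].(map[string]interface{})
	if !ok {
		metadata = map[string]interface{}{}
		cr["metadata"] = metadata
	}
	// Scale and the CustomResource share a resource version for optimistic
	// concurrency; an outdated version is rejected with a conflict error.
	metadata["resourceVersion"] = scale.ResourceVersion
}
```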
|
||||
|
||||
#### Status Replicas Behavior
|
||||
|
||||
As only the `scale.Spec.Replicas` field is to be written to by the CR user, the user-provided controller (not any generic CRD controller) counts its children and then updates the controlled object by writing to the `/status` subresource, i.e. the `scale.Status.Replicas` field is read-only.
|
||||
|
||||
#### Selector Behavior
|
||||
|
||||
`CustomResourceSubResourceScale.LabelSelectorPath` is the label selector over CustomResources that should match the replicas count.
|
||||
The value in the `Scale` object is taken one-to-one from the CustomResource if the label selector is non-empty.
|
||||
Intentionally we do not default it to another value from the CustomResource (e.g. `.spec.template.metadata.labels`) as this turned out to cause trouble (e.g. in `kubectl apply`) and it is generally seen as a wrong approach with existing resources.
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
The `/scale` and `/status` subresources are mostly distinct. It is proposed to do the implementation in two phases (the order does not matter much):
|
||||
|
||||
1. `/status` subresource
|
||||
2. `/scale` subresource
|
||||
|
||||
## Alternatives
|
||||
|
||||
### Scope
|
||||
|
||||
In this proposal we opted for an opinionated concept of subresources i.e. we restrict the subresource spec to the two very specific subresources: `/status` and `/scale`.
|
||||
We do not aim for a more generic subresource concept. In Kubernetes there are a number of other subresources like `/log`, `/exec`, `/bind`, but their semantics are much more specialized than those of `/status` and `/scale`.
|
||||
Hence, we decided to leave those other subresources to the domain of user-provided API servers (UAS) instead of inventing a more complex subresource concept for CustomResourceDefinitions.
|
||||
|
||||
**Note**: The types do not make the addition of other subresources impossible in the future.
|
||||
|
||||
We also restrict the JSON path for the status and the spec within the CustomResource.
|
||||
We could make them definable by the user and the proposed types actually allow us to open this up in the future.
|
||||
For the time being we decided to be opinionated as all status and spec subobjects in existing types live under `.status` and `.spec`. Keeping this pattern imposes consistency on user provided CustomResources as well.
|
|
@ -70,7 +70,7 @@ The schema is referenced in [`CustomResourceDefinitionSpec`](https://github.com/
|
|||
|
||||
The schema types follow those of the OpenAPI library, but we decided to define them independently for the API to have full control over the serialization and versioning. Hence, it is easy to convert our types into those used for validation or to integrate them into an OpenAPI spec later.
|
||||
|
||||
Reference http://json-schema.org is also used by OpenAPI. We propose this as there are implementations available in Go and with OpenAPI, we will also be able to serve OpenAPI specs for CustomResourceDefintions.
|
||||
Reference http://json-schema.org is also used by OpenAPI. We propose this as there are implementations available in Go and with OpenAPI, we will also be able to serve OpenAPI specs for CustomResourceDefinitions.
|
||||
|
||||
```go
|
||||
// CustomResourceSpec describes how a user wants their resource to appear
|
|
@ -2,13 +2,13 @@
|
|||
## Background
|
||||
|
||||
The extensible admission control
|
||||
[proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/admission_control_extension.md)
|
||||
[proposal](admission_control_extension.md)
|
||||
proposed making admission control extensible. In the proposal, the `initializer
|
||||
admission controller` and the `generic webhook admission controller` are the two
|
||||
controllers that set default initializers and external admission hooks for
|
||||
resources newly created. These two admission controllers are in the same binary
|
||||
as the apiserver. This
|
||||
[section](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/admission_control_extension.md#dynamic-configuration)
|
||||
[section](admission_control_extension.md#dynamic-configuration)
|
||||
gave a preliminary design of the dynamic configuration of the list of the
|
||||
default admission controls. This document hashes out the implementation details.
|
||||
|
||||
|
@ -21,21 +21,21 @@ default admission controls. This document hashes out the implementation details.
|
|||
|
||||
* As a fallback, admin can always restart an apiserver and guarantee it sees the latest config
|
||||
|
||||
* Do not block the entire cluster if the intializers/webhooks are not ready
|
||||
* Do not block the entire cluster if the initializers/webhooks are not ready
|
||||
after registration.
|
||||
|
||||
## Specification
|
||||
|
||||
We assume initializers could be "fail open". We need to update the extensible
|
||||
admission control
|
||||
[proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/admission_control_extension.md)
|
||||
[proposal](admission_control_extension.md)
|
||||
if this is accepted.
|
||||
|
||||
The schema is evolved from the prototype in
|
||||
[#132](https://github.com/kubernetes/community/pull/132).
|
||||
|
||||
```golang
|
||||
// InitializerConfiguration describes the configuration of intializers.
|
||||
// InitializerConfiguration describes the configuration of initializers.
|
||||
type InitializerConfiguration struct {
|
||||
metav1.TypeMeta
|
||||
|
||||
|
@ -43,9 +43,9 @@ type InitializerConfiguration struct {
|
|||
|
||||
// Initializers is a list of resources and their default initializers
|
||||
// Order-sensitive.
|
||||
// When merging multiple InitializerConfigurations, we sort the intializers
|
||||
// When merging multiple InitializerConfigurations, we sort the initializers
|
||||
// from different InitializerConfigurations by the name of the
|
||||
// InitializerConfigurations; the order of the intializers from the same
|
||||
// InitializerConfigurations; the order of the initializers from the same
|
||||
// InitializerConfiguration is preserved.
|
||||
// +optional
|
||||
Initializers []Initializer `json:"initializers,omitempty" patchStrategy:"merge" patchMergeKey:"name"`
|
||||
|
@ -63,7 +63,7 @@ type Initializer struct {
|
|||
Name string `json:"name"`
|
||||
|
||||
// Rules describes what resources/subresources the initializer cares about.
|
||||
// The intializer cares about an operation if it matches _any_ Rule.
|
||||
// The initializer cares about an operation if it matches _any_ Rule.
|
||||
Rules []Rule `json:"rules,omitempty"`
|
||||
|
||||
// FailurePolicy defines what happens if the responsible initializer controller
|
||||
|
@ -106,7 +106,7 @@ type Rule struct {
|
|||
type FailurePolicyType string
|
||||
|
||||
const (
|
||||
// Ignore means the initilizer is removed from the initializers list of an
|
||||
// Ignore means the initializer is removed from the initializers list of an
|
||||
// object if the initializer is timed out.
|
||||
Ignore FailurePolicyType = "Ignore"
|
||||
// For 1.7, only "Ignore" is allowed. "Fail" will be allowed when the
|
||||
|
@ -114,7 +114,7 @@ const (
|
|||
Fail FailurePolicyType = "Fail"
|
||||
)
|
||||
|
||||
// ExternalAdmissionHookConfiguration describes the configuration of intializers.
|
||||
// ExternalAdmissionHookConfiguration describes the configuration of initializers.
|
||||
type ExternalAdmissionHookConfiguration struct {
|
||||
metav1.TypeMeta
|
||||
|
||||
|
@ -211,7 +211,7 @@ Notes:
|
|||
in the beta version.
|
||||
|
||||
* We excluded `Retry` as a FailurePolicy, because we want to expose the
|
||||
flakeness of an admission controller; and admission controllers like the quota
|
||||
flakiness of an admission controller; and admission controllers like the quota
|
||||
controller are not idempotent.
|
||||
|
||||
* There are multiple ways to compose `Rules []Rule` to achieve the same effect.
|
||||
|
@ -248,7 +248,7 @@ See [Considered but REJECTED alternatives](#considered-but-rejected-alternatives
|
|||
|
||||
## Handling fail-open initializers
|
||||
|
||||
The original [proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/admission_control_extension.md) assumed initializers always failed closed. It is dangerous since crashed
|
||||
The original [proposal](admission_control_extension.md) assumed initializers always failed closed. It is dangerous since crashed
|
||||
initializers can block the whole cluster. We propose to allow initializers to
|
||||
fail open, and in 1.7, let all initializers fail open.
|
||||
|
||||
|
@ -263,7 +263,7 @@ the timed out initializer.
|
|||
|
||||
If the apiserver crashes, then we fall back to a `read repair` mechanism. When
|
||||
handling a GET request, the apiserver checks the objectMeta.CreationTimestamp of
|
||||
the object, if a global intializer timeout (e.g., 10 mins) has reached, the
|
||||
the object; if a global initializer timeout (e.g., 10 mins) has been reached, the
|
||||
apiserver removes the first initializer in the object.
|
||||
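As a rough sketch of this read-repair check (using simplified stand-in types rather than the real `ObjectMeta`, so the field names here are illustrative only):

```go
package readrepair

import "time"

// objectMeta is a simplified stand-in for the real ObjectMeta; only the fields
// needed for the check are included.
type objectMeta struct {
	CreationTimestamp   time.Time
	PendingInitializers []string
}

// globalInitializerTimeout is the illustrative 10 minute bound from the text.
const globalInitializerTimeout = 10 * time.Minute

// repairOnGet implements the read-repair step: if the object has been waiting
// longer than the global timeout, drop its first initializer and report that
// an update is required before returning the object.
func repairOnGet(meta *objectMeta, now time.Time) bool {
	if len(meta.PendingInitializers) == 0 {
		return false
	}
	if now.Sub(meta.CreationTimestamp) < globalInitializerTimeout {
		return false
	}
	meta.PendingInitializers = meta.PendingInitializers[1:]
	return true
}
```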
|
||||
In the HA setup, apiserver needs to take the clock drift into account as well.
|
||||
|
@ -282,7 +282,7 @@ See [Considered but REJECTED alternatives](#considered-but-rejected-alternatives
|
|||
2. #1 will allow parallel initializers as well.
|
||||
|
||||
3. implement the fail closed initializers according to
|
||||
[proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/admission_control_extension.md#initializers).
|
||||
[proposal](admission_control_extension.md#initializers).
|
||||
|
||||
4. more efficient check of AdmissionControlConfiguration changes. Currently we
|
||||
do periodic consistent read every second.
|
||||
|
@ -365,7 +365,7 @@ initializers from objects' initializers list. The controller uses shared
|
|||
informers to track uninitialized objects. Every 30s, the controller
|
||||
|
||||
* makes a snapshot of the uninitialized objects in the informers.
|
||||
* indexes the objects by the name of the first initialilzer in the objectMeta.Initializers
|
||||
* indexes the objects by the name of the first initializer in the objectMeta.Initializers
|
||||
* compares with the snapshot 30s ago, finds objects whose first initializers haven't changed
|
||||
* does a consistent read of AdmissionControllerConfiguration, finds which initializers are fail-open
|
||||
* spawns goroutines to send patches to remove fail-open initializers
|
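A minimal sketch of one iteration of this loop, assuming simplified stand-in types and hypothetical helper callbacks (`isFailOpen`, `removeInitializer`) in place of the real informer and client plumbing:

```go
package initializers

// objectRef and failOpenRemover are simplified stand-ins for the controller's
// bookkeeping; they are not the real controller types.
type objectRef struct {
	UID              string
	FirstInitializer string
}

type failOpenRemover struct {
	// previous maps object UID to the first initializer seen in the
	// snapshot taken 30s ago.
	previous map[string]string
	// isFailOpen reports whether an initializer is configured fail-open,
	// based on a consistent read of the InitializerConfigurations.
	isFailOpen func(initializerName string) bool
	// removeInitializer sends the patch that drops the stuck initializer
	// from the object's metadata.
	removeInitializer func(ref objectRef)
}

// sync is one iteration of the 30s loop described above: objects whose first
// initializer is unchanged since the previous snapshot and whose initializer
// is fail-open get a removal patch sent in a separate goroutine.
func (c *failOpenRemover) sync(uninitialized []objectRef) {
	current := make(map[string]string, len(uninitialized))
	for _, ref := range uninitialized {
		current[ref.UID] = ref.FirstInitializer
		if c.previous[ref.UID] == ref.FirstInitializer && c.isFailOpen(ref.FirstInitializer) {
			go c.removeInitializer(ref)
		}
	}
	c.previous = current
}
```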
|
@ -53,7 +53,7 @@ Each binary that generates events:
|
|||
* Maintains a historical record of previously generated events:
|
||||
* Implemented with
|
||||
["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go)
|
||||
in [`pkg/client/record/events_cache.go`](../../pkg/client/record/events_cache.go).
|
||||
in [`pkg/client/record/events_cache.go`](https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/tools/record/events_cache.go).
|
||||
* Implemented behind an `EventCorrelator` that manages two subcomponents:
|
||||
`EventAggregator` and `EventLogger`.
|
||||
* The `EventCorrelator` observes all incoming events and lets each
|
||||
|
@ -98,7 +98,7 @@ of time and generates tons of unique events, the previously generated events
|
|||
cache will not grow unchecked in memory. Instead, after 4096 unique events are
|
||||
generated, the oldest events are evicted from the cache.
|
||||
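A self-contained sketch of such a bounded cache built on the same `groupcache` LRU package; the `eventKey` fields are illustrative and stand in for the full aggregation key used by the real `EventCorrelator`:

```go
package main

import (
	"fmt"

	"github.com/golang/groupcache/lru"
)

// maxLruCacheEntries mirrors the 4096-entry bound described above.
const maxLruCacheEntries = 4096

// eventKey is a simplified stand-in for the key built from the event fields
// that must all match for two events to be considered duplicates.
type eventKey struct {
	source, involvedObject, reason, message string
}

type eventCache struct {
	cache *lru.Cache
}

func newEventCache() *eventCache {
	return &eventCache{cache: lru.New(maxLruCacheEntries)}
}

// observe returns how many times an equivalent event has been seen; once the
// bound is exceeded, the least recently used key is evicted automatically.
func (c *eventCache) observe(k eventKey) int {
	count := 1
	if prev, ok := c.cache.Get(k); ok {
		count = prev.(int) + 1
	}
	c.cache.Add(k, count)
	return count
}

func main() {
	c := newEventCache()
	k := eventKey{"kubelet", "pod/nginx", "Pulled", "pulled image"}
	fmt.Println(c.observe(k)) // 1
	fmt.Println(c.observe(k)) // 2 -> folded into the existing event record
}
```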
* When an event is generated, the previously generated events cache is checked
|
||||
(see [`pkg/client/unversioned/record/event.go`](http://releases.k8s.io/HEAD/pkg/client/record/event.go)).
|
||||
(see [`pkg/client/unversioned/record/event.go`](https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/tools/record/event.go)).
|
||||
* If the key for the new event matches the key for a previously generated
|
||||
event (meaning all of the above fields match between the new event and some
|
||||
previously generated event), then the event is considered to be a duplicate and
|
||||
|
@ -162,8 +162,3 @@ compressing multiple recurring events in to a single event.
|
|||
single event to optimize etcd storage.
|
||||
* PR [#4444](http://pr.k8s.io/4444): Switch events history to use LRU cache
|
||||
instead of map.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -196,8 +196,3 @@ Thus, listing a third-party resource can be achieved by listing the directory:
|
|||
```
|
||||
${standard-k8s-prefix}/third-party-resources/${third-party-resource-namespace}/${third-party-resource-name}/${resource-namespace}/
|
||||
```
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -349,9 +349,3 @@ In case the garbage collector is mistakenly deleting objects, we should provide
|
|||
* Before an object is deleted from the registry, the API server clears fields like DeletionTimestamp, then creates the object in /archive and sets a TTL.
|
||||
* Add a `kubectl restore` command, which takes a resource/name pair as input, creates the object with the spec stored in the /archive, and deletes the archived object.
|
||||
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -130,8 +130,3 @@ single scheduler, as opposed to choosing a scheduler, a desire mentioned in
|
|||
`MetadataPolicy` could be used. Issue #17324 proposes to create a generalized
|
||||
API for matching "claims" to "service classes"; matching a pod to a scheduler
|
||||
would be one use for such an API.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -301,7 +301,7 @@ that the returned value is not in the known type.
|
|||
We add the `contentEncoding` field here to preserve room for future
|
||||
optimizations like encryption-at-rest or compression of the nested content.
|
||||
Clients should error when receiving an encoding they do not support.
|
||||
Negotioting encoding is not defined here, but introducing new encodings
|
||||
Negotiating encoding is not defined here, but introducing new encodings
|
||||
is similar to introducing a schema change or new API version.
|
||||
|
||||
A client should use the `kind` and `apiVersion` fields to identify the
|
||||
|
@ -473,8 +473,3 @@ The generated protobuf will be checked with a verify script before merging.
|
|||
## Open Questions
|
||||
|
||||
* Is supporting stored protobuf files on disk in the kubectl client worth it?
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -18,7 +18,7 @@ schema). `get` supports a `wide` mode that includes additional columns. Users ca
|
|||
flag. Headers corresponding to the columns are optionally displayed.
|
||||
|
||||
`kubectl describe` shows a textual representation of individual objects that describes individual fields as subsequent
|
||||
lines and uses indendation and nested tables to convey deeper structure on the resource (such as events for a pod or
|
||||
lines and uses indentation and nested tables to convey deeper structure on the resource (such as events for a pod or
|
||||
each container). It sometimes retrieves related objects like events, pods for a replication controller, or autoscalers
|
||||
for a deployment. It supports no significant flags.
|
||||
|
||||
|
@ -177,7 +177,3 @@ fall back to client side functions.
|
|||
Server side code would reuse the existing display functions but replace TabWriter with either a structured writer
|
||||
or the tabular form.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -1,5 +1,4 @@
|
|||
**Table of Contents**
|
||||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Overview](#overview)
|
||||
- [API Design](#api-design)
|
||||
|
@ -14,7 +13,6 @@
|
|||
- [Unhandled cases](#unhandled-cases)
|
||||
- [Implications to existing clients](#implications-to-existing-clients)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
# Overview
|
||||
|
||||
|
@ -124,7 +122,7 @@ In addition, if an object popped from `dirtyQueue` is marked as "GC in progress"
|
|||
* To avoid racing with another controller, it requeues the object if `observedGeneration < Generation`. This is best-effort, see [unhandled cases](#unhandled-cases).
|
||||
* Checks if the object has dependents
|
||||
* If not, send a PUT request to remove the `GCFinalizer`;
|
||||
* If so, then add all dependents to the `dirtryQueue`; we need bookkeeping to avoid adding the dependents repeatedly if the owner gets in the `synchronousGC queue` multiple times.
|
||||
* If so, then add all dependents to the `dirtyQueue`; we need bookkeeping to avoid adding the dependents repeatedly if the owner gets in the `synchronousGC queue` multiple times.
|
||||
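A rough sketch of this queue-processing step, with simplified stand-in types and hypothetical callbacks (`requeue`, `removeGCFinalizer`) in place of the real client calls:

```go
package garbagecollector

// object is a simplified stand-in carrying only the fields needed for the
// bookkeeping described above.
type object struct {
	Generation         int64
	ObservedGeneration int64
	Dependents         []*object
	dependentsEnqueued bool // avoids re-adding dependents on repeat visits
}

type synchronousGC struct {
	dirtyQueue        []*object
	requeue           func(*object)
	removeGCFinalizer func(*object) // PUT that drops the GCFinalizer
}

// processOwner handles one owner popped from the dirty queue that is marked
// "GC in progress".
func (gc *synchronousGC) processOwner(owner *object) {
	// Best-effort guard against racing with another controller.
	if owner.ObservedGeneration < owner.Generation {
		gc.requeue(owner)
		return
	}
	if len(owner.Dependents) == 0 {
		gc.removeGCFinalizer(owner)
		return
	}
	if !owner.dependentsEnqueued {
		gc.dirtyQueue = append(gc.dirtyQueue, owner.Dependents...)
		owner.dependentsEnqueued = true
	}
}
```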
|
||||
## Controllers
|
||||
|
||||
|
@ -169,7 +167,3 @@ To make the new kubectl compatible with the 1.4 and earlier masters, kubectl nee
|
|||
1.4 `kubectl delete rc/rs` uses `DeleteOptions.OrphanDependents=true`, which is going to be converted to `DeletePropagationBackground` (see [API Design](#api-changes)) by a 1.5 master, so its behavior keeps the same.
|
||||
|
||||
Pre 1.4 `kubectl delete` uses `DeleteOptions.OrphanDependents=nil`, so does the 1.4 `kubectl delete` for resources other than rc and rs. The option is going to be converted to `DeletePropagationDefault` (see [API Design](#api-changes)) by a 1.5 master, so these commands behave the same as when working with a 1.4 master.
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -13,7 +13,7 @@ prevent future challenges in upgrading.
|
|||
1. Ensure ThirdPartyResource APIs operate consistently with first party
|
||||
Kubernetes APIs.
|
||||
2. Enable ThirdPartyResources to specify how they will appear in API
|
||||
discovery to be consistent with other resources and avoid naming confilcts
|
||||
discovery to be consistent with other resources and avoid naming conflicts
|
||||
3. Move TPR into their own API group to allow the extensions group to be
|
||||
[removed](https://github.com/kubernetes/kubernetes/issues/43214)
|
||||
4. Support cluster scoped TPR resources
|
|
@ -1,58 +0,0 @@
|
|||
# Build some Admission Controllers into the Generic API server library
|
||||
|
||||
**Related PR:**
|
||||
|
||||
| Topic | Link |
|
||||
| ----- | ---- |
|
||||
| Admission Control | https://github.com/kubernetes/community/blob/master/contributors/design-proposals/admission_control.md |
|
||||
|
||||
## Introduction
|
||||
|
||||
An admission controller is a piece of code that intercepts requests to the Kubernetes API - think of it as middleware.
|
||||
The API server lets you have a whole chain of them. Each is run in sequence before a request is accepted
|
||||
into the cluster. If any of the plugins in the sequence rejects the request, the entire request is rejected
|
||||
immediately and an error is returned to the user.
|
||||
|
||||
Many features in Kubernetes require an admission control plugin to be enabled in order to properly support the feature.
|
||||
In fact in the [documentation](https://kubernetes.io/docs/admin/admission-controllers/#is-there-a-recommended-set-of-plug-ins-to-use) you will find
|
||||
a recommended set of them to use.
|
||||
|
||||
At the moment admission controllers are implemented as plugins and they have to be compiled into the
|
||||
final binary in order to be used at a later time. Some even require access to a cache, an authorizer, etc.
|
||||
This is where an admission plugin initializer kicks in. An admission plugin initializer is used to pass additional
|
||||
configuration and runtime references to a cache, a client and an authorizer.
|
||||
|
||||
To streamline the process of adding new plugins especially for aggregated API servers we would like to build some plugins
|
||||
into the generic API server library and provide a plugin initializer. While anyone can author and register one, having a known set of
|
||||
provided references lets people focus on what they need their admission plugin to do instead of paying attention to wiring.
|
||||
|
||||
## Implementation
|
||||
|
||||
The first step would involve creating a "standard" plugin initializer that would be part of the
|
||||
generic API server. It would use kubeconfig to populate
|
||||
[external clients](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubeapiserver/admission/initializer.go#L29)
|
||||
and [external informers](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubeapiserver/admission/initializer.go#L35).
|
||||
By default for servers that would be run on the kubernetes cluster in-cluster config would be used.
|
||||
The standard initializer would also provide a client config for connecting to the core kube-apiserver.
|
||||
Some API servers might be started as static pods, which don't have in-cluster configs.
|
||||
In that case the config could easily be populated from a file.
|
||||
|
||||
The second step would be to move some plugins from [admission pkg](https://github.com/kubernetes/kubernetes/tree/master/plugin/pkg/admission)
|
||||
to the generic API server library. Some admission plugins are used to ensure consistent user expectations.
|
||||
These plugins should be moved. One example is the Namespace Lifecycle plugin which prevents users
|
||||
from creating resources in non-existent namespaces.
|
||||
|
||||
*Note*:
|
||||
For loading in-cluster configuration [visit](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/examples/in-cluster/main.go#L30)
|
||||
For loading the configuration directly from a file [visit](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/examples/out-of-cluster/main.go)
|
||||
|
||||
## How to add an admission plugin?
|
||||
At this point adding an admission plugin is very simple and boils down to performing the
|
||||
following series of steps:
|
||||
1. Write an admission plugin
|
||||
2. Register the plugin
|
||||
3. Reference the plugin in the admission chain
|
||||
|
||||
**TODO**(p0lyn0mial): There is also a [sample apiserver](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/sample-apiserver/main.go) to demonstrate the usage of the generic API library.
|
||||
After implementation, sample code would be placed there - copy & paste it here and include a reference.
|
||||
|
|
@ -562,8 +562,3 @@ Openshift handles template processing via a server endpoint which consumes a tem
|
|||
produced by processing the template. It is also possible to handle the entire template processing flow via the client, but this was deemed
|
||||
undesirable as it would force each client tool to reimplement template processing (e.g. the standard CLI tool, an eclipse plugin, a plugin for a CI system like Jenkins, etc). The assumption in this proposal is that server side template processing is the preferred implementation approach for
|
||||
this reason.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,6 @@
|
|||
reviewers:
|
||||
- sig-apps-leads
|
||||
approvers:
|
||||
- sig-apps-leads
|
||||
labels:
|
||||
- sig/apps
|
|
@ -294,7 +294,3 @@ spec:
|
|||
|
||||
In the future, we may add the ability to specify an init-container that can
|
||||
watch the volume contents for updates and respond to changes when they occur.
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -427,7 +427,7 @@ its feasibility, we construct such a scheme here. However, this proposal does
|
|||
not mandate its use.
|
||||
|
||||
Given a hash function with output size `HashSize` defined
|
||||
as `func H(s srtring) [HashSize] byte`, in order to resolve collisions we
|
||||
as `func H(s string) [HashSize] byte`, in order to resolve collisions we
|
||||
define a new function `func H'(s string, n int) [HashSize]byte` where `H'`
|
||||
returns the result of invoking `H` on the concatenation of `s` with the string
|
||||
value of `n`. We define a third function
|
|
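A minimal sketch of `H` and `H'` as defined above, with SHA-256 chosen here purely as an illustrative `H` (the proposal does not mandate a particular hash):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strconv"
)

const hashSize = sha256.Size

// h is a concrete choice for the abstract H above (SHA-256 here).
func h(s string) [hashSize]byte {
	return sha256.Sum256([]byte(s))
}

// hPrime is H'(s, n): H applied to s concatenated with the decimal string of n.
func hPrime(s string, n int) [hashSize]byte {
	return h(s + strconv.Itoa(n))
}

func main() {
	// Increment n only when the previous value collides with an existing name.
	fmt.Printf("%x\n", hPrime("my-cronjob-1500000000", 0))
}
```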
@ -207,7 +207,7 @@ but old ones still satisfy the schedule and are not re-run just because the temp
|
|||
|
||||
If you delete and replace a CronJob with one of the same name, it will:
|
||||
- not use any old Status.Active, and not consider any existing running or terminated jobs from the previous
|
||||
CronJob (with a different UID) at all when determining coflicts, what needs to be started, etc.
|
||||
CronJob (with a different UID) at all when determining conflicts, what needs to be started, etc.
|
||||
- If there is an existing Job with the same time-based hash in its name (see below), then
|
||||
new instances of that job will not be able to be created. So, delete it if you want to re-run.
|
||||
with the same name as conflicts.
|
||||
|
@ -322,6 +322,10 @@ by two instances (replicated or restarting) of the controller process.
|
|||
|
||||
We chose to use the hashed-date suffix approach.
|
||||
|
||||
## Manually triggering CronJobs
|
||||
|
||||
A user may wish to manually trigger a CronJob for some reason (see [#47538](http://issues.k8s.io/47538)), such as testing it prior to its scheduled time. This could be made possible via an `/instantiate` subresource in the API, which when POSTed to would immediately spawn a Job from the JobSpec contained within the CronJob.
|
||||
|
||||
## Future evolution
|
||||
|
||||
Below are the possible future extensions to the Job controller:
|
||||
|
@ -329,7 +333,3 @@ Below are the possible future extensions to the Job controller:
|
|||
happening in [#18827](https://issues.k8s.io/18827).
|
||||
* Be able to specify more general template in `.spec` field, to create arbitrary
|
||||
types of resources. This relates to the work happening in [#18215](https://issues.k8s.io/18215).
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -66,8 +66,7 @@ nodes, preempting other pods if necessary.
|
|||
"kubernetes.io/created-by: \<json API object reference\>"
|
||||
```
|
||||
- YAML example:
|
||||
```
|
||||
YAML
|
||||
```yaml
|
||||
apiVersion: extensions/v1beta1
|
||||
kind: DaemonSet
|
||||
metadata:
|
||||
|
@ -202,7 +201,3 @@ restartPolicy set to Always.
|
|||
|
||||
- Should work similarly to [Deployment](http://issues.k8s.io/1743).
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -132,7 +132,7 @@ type DaemonSetSpec struct {
|
|||
|
||||
// DaemonSetStatus represents the current status of a daemon set.
|
||||
type DaemonSetStatus struct {
|
||||
// Note: Existing fields, including CurrentNumberScheduled, NumberMissscheduled,
|
||||
// Note: Existing fields, including CurrentNumberScheduled, NumberMisscheduled,
|
||||
// DesiredNumberScheduled, NumberReady, and ObservedGeneration are omitted in
|
||||
// this proposal.
|
||||
|
||||
|
@ -202,7 +202,7 @@ For each pending DaemonSet updates, it will:
|
|||
- The history will be labeled with `DefaultDaemonSetUniqueLabelKey`.
|
||||
- DaemonSet controller will add a ControllerRef in the history
|
||||
`.ownerReferences`.
|
||||
- Current history should have the largest `.revision` number amonst all
|
||||
- Current history should have the largest `.revision` number amongst all
|
||||
existing history. Update `.revision` if it's not (e.g. after a rollback.)
|
||||
- If more than one current history is found, remove duplicates and relabel
|
||||
their pods' `DefaultDaemonSetUniqueLabelKey`.
|
||||
|
@ -250,7 +250,7 @@ In DaemonSet strategy (pkg/registry/extensions/daemonset/strategy.go#PrepareForU
|
|||
increase DaemonSet's `.spec.templateGeneration` by 1 if any changes is made to
|
||||
DaemonSet's `.spec.template`.
|
||||
|
||||
This was originally implmeneted in 1.6, and kept in 1.7 for backward compatibility.
|
||||
This was originally implemented in 1.6, and kept in 1.7 for backward compatibility.
|
||||
|
||||
### kubectl
|
||||
|
|
@ -1,5 +1,3 @@
|
|||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Deploy through CLI](#deploy-through-cli)
|
||||
- [Motivation](#motivation)
|
||||
- [Requirements](#requirements)
|
||||
|
@ -16,13 +14,12 @@
|
|||
- [Pause Deployments](#pause-deployments)
|
||||
- [Perm-failed Deployments](#perm-failed-deployments)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
# Deploy through CLI
|
||||
|
||||
## Motivation
|
||||
|
||||
Users can use [Deployments](../user-guide/deployments.md) or [`kubectl rolling-update`](../user-guide/kubectl/kubectl_rolling-update.md) to deploy in their Kubernetes clusters. A Deployment provides declarative update for Pods and ReplicationControllers, whereas `rolling-update` allows the users to update their earlier deployment without worrying about schemas and configurations. Users need a way that's similar to `rolling-update` to manage their Deployments more easily.
|
||||
Users can use [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) or [`kubectl rolling-update`](https://kubernetes.io/docs/tasks/run-application/rolling-update-replication-controller/) to deploy in their Kubernetes clusters. A Deployment provides declarative update for Pods and ReplicationControllers, whereas `rolling-update` allows the users to update their earlier deployment without worrying about schemas and configurations. Users need a way that's similar to `rolling-update` to manage their Deployments more easily.
|
||||
|
||||
`rolling-update` expects ReplicationController as the only resource type it deals with. It's not trivial to support exactly the same behavior with Deployment, which requires:
|
||||
- Print out scaling up/down events.
|
||||
|
@ -141,7 +138,3 @@ Users sometimes need to temporarily disable a deployment. See issue [#14516](htt
|
|||
### Perm-failed Deployments
|
||||
|
||||
The deployment could be marked as "permanently failed" for a given spec hash so that the system won't continue thrashing on a doomed deployment. The users can retry a failed deployment with `kubectl rollout retry`. See issue [#14519](https://github.com/kubernetes/kubernetes/issues/14519).
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -257,8 +257,3 @@ Apart from the above, we want to add support for the following:
|
|||
|
||||
- https://github.com/kubernetes/kubernetes/issues/1743 has most of the
|
||||
discussion that resulted in this proposal.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -761,7 +761,7 @@ spec:
|
|||
spec:
|
||||
containers:
|
||||
- name: c
|
||||
image: gcr.io/google_containers/busybox
|
||||
image: k8s.gcr.io/busybox
|
||||
command:
|
||||
- 'sh'
|
||||
- '-c'
|
||||
|
@ -894,7 +894,3 @@ is verbose. For StatefulSet, this is less of a problem.
|
|||
This differs from StatefulSet in that StatefulSet uses names and not indexes. StatefulSet is
|
||||
intended to support ones to tens of things.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -18,6 +18,7 @@ Several existing issues and PRs were already created regarding that particular s
|
|||
1. Be able to get the job status.
|
||||
1. Be able to specify the number of instances performing a job at any one time.
|
||||
1. Be able to specify the number of successfully finished instances required to finish a job.
|
||||
1. Be able to specify a backoff policy, when job is continuously failing.
|
||||
|
||||
|
||||
## Motivation
|
||||
|
@ -26,6 +27,41 @@ Jobs are needed for executing multi-pod computation to completion; a good exampl
|
|||
here would be the ability to implement any type of batch oriented tasks.
|
||||
|
||||
|
||||
## Backoff policy and failed pod limit
|
||||
|
||||
By design, Jobs do not have any notion of failure, other than a pod's `restartPolicy`
|
||||
which is mistakenly taken as Job's restart policy ([#30243](https://github.com/kubernetes/kubernetes/issues/30243),
|
||||
[#43964](https://github.com/kubernetes/kubernetes/issues/43964)). There are
|
||||
situations where one wants to fail a Job after some number of retries over a certain
|
||||
period of time, due to a logical error in configuration etc. To do so we are going
|
||||
to introduce the following fields, which will control the backoff policy: a number of
|
||||
retries and an initial time of retry. The two fields will allow fine-grained control
|
||||
over the backoff policy. Each of the two fields will use a default value if none
|
||||
is provided, `BackoffLimit` is set by default to 6 and `BackoffSeconds` to 10s.
|
||||
This will result in the following retry sequence: 10s, 20s, 40s, 1m20s, 2m40s,
|
||||
5m20s. After which the job will be considered failed.
|
||||
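A small sketch showing how these two fields produce the retry sequence above (the doubling behaviour is as described in the text; the helper name is illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the retry delays implied by the two fields: an
// initial delay of backoffSeconds that doubles on every retry, for at most
// backoffLimit retries.
func backoffSchedule(backoffLimit int32, backoffSeconds int64) []time.Duration {
	delays := make([]time.Duration, 0, backoffLimit)
	delay := time.Duration(backoffSeconds) * time.Second
	for i := int32(0); i < backoffLimit; i++ {
		delays = append(delays, delay)
		delay *= 2
	}
	return delays
}

func main() {
	// Defaults from the text: BackoffLimit=6, BackoffSeconds=10.
	fmt.Println(backoffSchedule(6, 10))
	// Prints [10s 20s 40s 1m20s 2m40s 5m20s]; a failure after the last
	// retry marks the Job as failed.
}
```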
|
||||
Additionally, to help debug the issue with a Job, and limit the impact of having
|
||||
too many failed pods left around (as mentioned in [#30243](https://github.com/kubernetes/kubernetes/issues/30243)),
|
||||
we are going to introduce a field which will allow specifying the maximum number
|
||||
of failed pods to keep around. This number will also take effect if none of the
|
||||
limits described above are set. By default it will take the value of 1, to allow debugging
|
||||
job issues, but not to flood the cluster with too many failed jobs and their
|
||||
accompanying pods.
|
||||
|
||||
All of the above fields will be optional and will apply when `restartPolicy` is
|
||||
set to `Never` on a `PodTemplate`. With restart policy `OnFailure` only `BackoffLimit`
|
||||
applies. The reason for that is that failed pods are already restarted by the
|
||||
kubelet with an [exponential backoff](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy).
|
||||
Additionally, failures are counted differently depending on `restartPolicy`
|
||||
setting. For `Never` we count actual pod failures (reflected in `.status.failed`
|
||||
field). With `OnFailure`, we take an approximate value of pod restarts (as reported
|
||||
in `.status.containerStatuses[*].restartCount`).
|
||||
When `.spec.parallelism` is set to a value higher than 1, the failures are an
|
||||
overall number (as coming from `.status.failed`) because the controller does not
|
||||
hold information about failures coming from separate pods.
|
||||
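A sketch of that counting rule using the upstream API types; the helper itself is illustrative and not part of the proposal:

```go
package jobutil

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// failureCount applies the counting rule above: with restartPolicy=Never the
// Job's .status.failed is used directly; otherwise the container restart
// counts of the Job's pods are summed as an approximation.
func failureCount(job *batchv1.Job, pods []corev1.Pod) int32 {
	if job.Spec.Template.Spec.RestartPolicy == corev1.RestartPolicyNever {
		return job.Status.Failed
	}
	var restarts int32
	for _, pod := range pods {
		for _, cs := range pod.Status.ContainerStatuses {
			restarts += cs.RestartCount
		}
	}
	return restarts
}
```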
|
||||
|
||||
## Implementation
|
||||
|
||||
Job controller is similar to replication controller in that they manage pods.
|
||||
|
@ -73,14 +109,31 @@ type JobSpec struct {
|
|||
// run at any given time. The actual number of pods running in steady state will
|
||||
// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
|
||||
// i.e. when the work left to do is less than max parallelism.
|
||||
Parallelism *int
|
||||
Parallelism *int32
|
||||
|
||||
// Completions specifies the desired number of successfully finished pods the
|
||||
// job should be run with. Defaults to 1.
|
||||
Completions *int
|
||||
Completions *int32
|
||||
|
||||
// Optional duration in seconds relative to the startTime that the job may be active
|
||||
// before the system tries to terminate it; value must be a positive integer.
|
||||
// It applies to overall job run time, no matter of the value of completions
|
||||
// or parallelism parameters.
|
||||
ActiveDeadlineSeconds *int64
|
||||
|
||||
// Optional number of retries before marking this job failed.
|
||||
// Defaults to 6.
|
||||
BackoffLimit *int32
|
||||
|
||||
// Optional time (in seconds) specifying how long the initial backoff will last.
|
||||
// Defaults to 10s.
|
||||
BackoffSeconds *int64
|
||||
|
||||
// Optional number of failed pods to retain.
|
||||
FailedPodsLimit *int32
|
||||
|
||||
// Selector is a label query over pods running a job.
|
||||
Selector map[string]string
|
||||
Selector LabelSelector
|
||||
|
||||
// Template is the object that describes the pod that will be created when
|
||||
// executing a job.
|
||||
|
@ -107,14 +160,14 @@ type JobStatus struct {
|
|||
CompletionTime unversioned.Time
|
||||
|
||||
// Active is the number of actively running pods.
|
||||
Active int
|
||||
Active int32
|
||||
|
||||
// Successful is the number of pods successfully completed their job.
|
||||
Successful int
|
||||
// Succeeded is the number of pods successfully completed their job.
|
||||
Succeeded int32
|
||||
|
||||
// Unsuccessful is the number of pods failures, this applies only to jobs
|
||||
// Failed is the number of pods failures, this applies only to jobs
|
||||
// created with RestartPolicyNever, otherwise this value will always be 0.
|
||||
Unsuccessful int
|
||||
Failed int32
|
||||
}
|
||||
|
||||
type JobConditionType string
|
||||
|
@ -153,7 +206,3 @@ Below are the possible future extensions to the Job controller:
|
|||
by providing pointers to Pods in the JobStatus ([see comment](https://github.com/kubernetes/kubernetes/pull/11746/files#r37142628)).
|
||||
* help users avoid non-unique label selectors ([see this proposal](../../docs/design/selector-generation.md))
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -174,7 +174,3 @@ Docs will be edited to show examples without a `job.spec.selector`.
|
|||
We probably want as much as possible the same behavior for Job and
|
||||
ReplicationController.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -181,7 +181,7 @@ status, back-off (like a scheduler or replication controller), and try again lat
|
|||
by a StatefulSet controller must have a set of labels that match the selector, support orphaning, and have a
|
||||
controller back reference annotation identifying the owning StatefulSet by name and UID.
|
||||
|
||||
When a StatefulSet is scaled down, the pod for the removed indentity should be deleted. It is less clear what the
|
||||
When a StatefulSet is scaled down, the pod for the removed identity should be deleted. It is less clear what the
|
||||
controller should do to supporting resources. If every pod requires a PV, and a user accidentally scales
|
||||
up to N=200 and then back down to N=3, leaving 197 PVs lying around may be undesirable (potential for
|
||||
abuse). On the other hand, a cluster of 5 that is accidentally scaled down to 3 might irreparably destroy
|
||||
|
@ -346,7 +346,7 @@ Requested features:
|
|||
|
||||
* Jobs can be used to perform a run-once initialization of the cluster
|
||||
* Init containers can be used to prime PVs and config with the identity of the pod.
|
||||
* Templates and how fields are overriden in the resulting object should have broad alignment
|
||||
* Templates and how fields are overridden in the resulting object should have broad alignment
|
||||
* DaemonSet defines the core model for how new controllers sit alongside replication controller and
|
||||
how upgrades can be implemented outside of Deployment objects.
|
||||
|
||||
|
@ -355,9 +355,3 @@ Requested features:
|
|||
|
||||
StatefulSets were formerly known as PetSets and were renamed to be less "cutesy" and more descriptive as a
|
||||
prerequisite to moving to beta. No animals were harmed in the making of this proposal.
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -435,7 +435,7 @@ constraints.
|
|||
`PartitionStatefulSetStrategyType`, the API Server should fail validation
|
||||
if any of the following conditions are true.
|
||||
1. `.Spec.UpdateStrategy.Partition` is nil.
|
||||
1. `.Spec.UpdateStrategy.Parition` is not nil, and
|
||||
1. `.Spec.UpdateStrategy.Partition` is not nil, and
|
||||
`.Spec.UpdateStrategy.Partition.Ordinal` not in the sequence
|
||||
`(0,.Spec.Replicas)`.
|
||||
1. The API Server will fail validation on any update to a StatefulSetStatus
|
||||
|
@ -443,7 +443,7 @@ object if any of the following conditions are true.
|
|||
1. `.Status.Replicas` is negative.
|
||||
1. `.Status.ReadyReplicas` is negative or greater than `.Status.Replicas`.
|
||||
1. `.Status.CurrentReplicas` is negative or greater than `.Status.Replicas`.
|
||||
1. `.Stauts.UpdateReplicas` is negative or greater than `.Status.Replicas`.
|
||||
1. `.Status.UpdateReplicas` is negative or greater than `.Status.Replicas`.
|
||||
|
||||
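A sketch of these status validation rules, using a simplified stand-in type rather than the real `StatefulSetStatus`:

```go
package validation

import "fmt"

// statefulSetStatus mirrors the fields referenced above; it is a simplified
// stand-in, not the real API type.
type statefulSetStatus struct {
	Replicas        int32
	ReadyReplicas   int32
	CurrentReplicas int32
	UpdateReplicas  int32
}

// validateStatus returns an error for any of the conditions listed above.
func validateStatus(s statefulSetStatus) error {
	if s.Replicas < 0 {
		return fmt.Errorf("replicas must not be negative")
	}
	for name, v := range map[string]int32{
		"readyReplicas":   s.ReadyReplicas,
		"currentReplicas": s.CurrentReplicas,
		"updateReplicas":  s.UpdateReplicas,
	} {
		if v < 0 || v > s.Replicas {
			return fmt.Errorf("%s must be in the range [0, replicas]", name)
		}
	}
	return nil
}
```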
## Kubectl
|
||||
Kubectl will use the `rollout` command to control and provide the status of
|
||||
|
@ -479,7 +479,7 @@ spec:
|
|||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: gcr.io/google_containers/nginx-slim:0.8
|
||||
image: k8s.gcr.io/nginx-slim:0.8
|
||||
ports:
|
||||
- containerPort: 80
|
||||
name: web
|
||||
|
@ -530,7 +530,7 @@ spec:
|
|||
type: RollingUpdate
|
||||
containers:
|
||||
- name: nginx
|
||||
image: gcr.io/google_containers/nginx-slim:0.9
|
||||
image: k8s.gcr.io/nginx-slim:0.9
|
||||
ports:
|
||||
- containerPort: 80
|
||||
name: web
|
||||
|
@ -582,7 +582,7 @@ spec:
|
|||
ordinal: 2
|
||||
containers:
|
||||
- name: nginx
|
||||
image: gcr.io/google_containers/nginx-slim:0.9
|
||||
image: k8s.gcr.io/nginx-slim:0.9
|
||||
ports:
|
||||
- containerPort: 80
|
||||
name: web
|
||||
|
@ -626,7 +626,7 @@ spec:
|
|||
ordinal: 3
|
||||
containers:
|
||||
- name: nginx
|
||||
image: gcr.io/google_containers/nginx-slim:0.9
|
||||
image: k8s.gcr.io/nginx-slim:0.9
|
||||
ports:
|
||||
- containerPort: 80
|
||||
name: web
|
||||
|
@ -670,7 +670,7 @@ spec:
|
|||
ordinal: 2
|
||||
containers:
|
||||
- name: nginx
|
||||
image: gcr.io/google_containers/nginx-slim:0.9
|
||||
image: k8s.gcr.io/nginx-slim:0.9
|
||||
ports:
|
||||
- containerPort: 80
|
||||
name: web
|
||||
|
@ -714,7 +714,7 @@ spec:
|
|||
ordinal: 1
|
||||
containers:
|
||||
- name: nginx
|
||||
image: gcr.io/google_containers/nginx-slim:0.9
|
||||
image: k8s.gcr.io/nginx-slim:0.9
|
||||
ports:
|
||||
- containerPort: 80
|
||||
name: web
|
|
@ -0,0 +1,8 @@
|
|||
reviewers:
|
||||
- sig-architecture-leads
|
||||
- jbeda
|
||||
approvers:
|
||||
- sig-architecture-leads
|
||||
- jbeda
|
||||
labels:
|
||||
- sig/architecture
|
|
@ -1,5 +1,8 @@
|
|||
# Kubernetes Design and Architecture
|
||||
|
||||
A much more detailed and updated [Architectural
|
||||
Roadmap](../../devel/architectural-roadmap.md) is also available.
|
||||
|
||||
## Overview
|
||||
|
||||
Kubernetes is a production-grade, open-source infrastructure for the deployment, scaling,
|
||||
|
@ -82,7 +85,7 @@ A running Kubernetes cluster contains node agents (kubelet) and a cluster contro
|
|||
The Kubernetes [control plane](https://en.wikipedia.org/wiki/Control_plane) is split
|
||||
into a set of components, which can all run on a single *master* node, or can be replicated
|
||||
in order to support high-availability clusters, or can even be run on Kubernetes itself (AKA
|
||||
[self-hosted](self-hosted-kubernetes.md#what-is-self-hosted)).
|
||||
[self-hosted](../cluster-lifecycle/self-hosted-kubernetes.md#what-is-self-hosted)).
|
||||
|
||||
Kubernetes provides a REST API supporting primarily CRUD operations on (mostly) persistent resources, which
|
||||
serve as the hub of its control plane. Kubernetes’s API provides IaaS-like
|
||||
|
@ -170,7 +173,7 @@ Kubernetes supports user-provided schedulers and multiple concurrent cluster sch
|
|||
using the shared-state approach pioneered by
|
||||
[Omega](https://research.google.com/pubs/pub41684.html). In addition to the disadvantages of
|
||||
pessimistic concurrency described by the Omega paper,
|
||||
[two-level scheduling models](http://mesos.berkeley.edu/mesos_tech_report.pdf) that hide information from the upper-level
|
||||
[two-level scheduling models](https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/Mesos-A-Platform-for-Fine-Grained-Resource-Sharing-in-the-Data-Center.pdf) that hide information from the upper-level
|
||||
schedulers need to implement all of the same features in the lower-level scheduler as required by
|
||||
all upper-layer schedulers in order to ensure that their scheduling requests can be satisfied by
|
||||
available desired resources.
|
||||
|
@ -214,7 +217,7 @@ agent.
|
|||
Each node runs a container runtime, which is responsible for downloading images and running containers.
|
||||
|
||||
Kubelet does not link in the base container runtime. Instead, we're defining a
|
||||
[Container Runtime Interface](container-runtime-interface-v1.md) to control the
|
||||
[Container Runtime Interface](/contributors/devel/container-runtime-interface.md) to control the
|
||||
underlying runtime and facilitate pluggability of that layer.
|
||||
This decoupling is needed in order to maintain clear component boundaries, facilitate testing, and facilitate pluggability.
|
||||
Runtimes supported today, either upstream or by forks, include at least docker (for Linux and Windows),
|
||||
|
@ -225,7 +228,7 @@ Runtimes supported today, either upstream or by forks, include at least docker (
|
|||
|
||||
The [service](https://kubernetes.io/docs/concepts/services-networking/service/) abstraction provides a way to
|
||||
group pods under a common access policy (e.g., load-balanced). The implementation of this creates
|
||||
A virtual IP which clients can access and which is transparently proxied to the pods in a Service.
|
||||
a virtual IP which clients can access and which is transparently proxied to the pods in a Service.
|
||||
Each node runs a [kube-proxy](https://kubernetes.io/docs/admin/kube-proxy/) process which programs
|
||||
`iptables` rules to trap access to service IPs and redirect them to the correct backends. This provides a highly-available load-balancing solution with low performance overhead by balancing
|
||||
client traffic from a node on that same node.
|
||||
|
@ -234,10 +237,10 @@ Service endpoints are found primarily via [DNS](https://kubernetes.io/docs/conce
|
|||
|
||||
### Add-ons and other dependencies
|
||||
|
||||
A number of components, called [*add-ons*](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) typically run on Kubernetes
|
||||
A number of components, called [*add-ons*](https://git.k8s.io/kubernetes/cluster/addons) typically run on Kubernetes
|
||||
itself:
|
||||
* [DNS](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns)
|
||||
* [Ingress controller](https://github.com/kubernetes/ingress/tree/master/controllers)
|
||||
* [DNS](https://git.k8s.io/kubernetes/cluster/addons/dns)
|
||||
* [Ingress controller](https://github.com/kubernetes/ingress-gce)
|
||||
* [Heapster](https://github.com/kubernetes/heapster/) (resource monitoring)
|
||||
* [Dashboard](https://github.com/kubernetes/dashboard/) (GUI)
|
||||
|
||||
|
@ -245,8 +248,4 @@ itself:
|
|||
|
||||
A single Kubernetes cluster may span multiple availability zones.
|
||||
|
||||
However, for the highest availability, we recommend using [cluster federation](federation.md).
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
||||
However, for the highest availability, we recommend using [cluster federation](../multicluster/federation.md).
|
|
@ -107,7 +107,3 @@ from whence it came.
|
|||
unique across time.
|
||||
1. This may correspond to Docker's container ID.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -83,7 +83,7 @@ and reference particular entities across operations.
|
|||
A *Namespace* provides an authorization scope for accessing content associated
|
||||
with the *Namespace*.
|
||||
|
||||
See [Authorization plugins](../admin/authorization.md)
|
||||
See [Authorization plugins](https://kubernetes.io/docs/admin/authorization/)
|
||||
|
||||
### Limit Resource Consumption
|
||||
|
||||
|
@ -92,13 +92,13 @@ A *Namespace* provides a scope to limit resource consumption.
|
|||
A *LimitRange* defines min/max constraints on the amount of resources a single
|
||||
entity can consume in a *Namespace*.
|
||||
|
||||
See [Admission control: Limit Range](admission_control_limit_range.md)
|
||||
See [Admission control: Limit Range](../resource-management/admission_control_limit_range.md)
|
||||
|
||||
A *ResourceQuota* tracks aggregate usage of resources in the *Namespace* and
|
||||
allows cluster operators to define *Hard* resource usage limits that a
|
||||
*Namespace* may consume.
|
||||
|
||||
See [Admission control: Resource Quota](admission_control_resource_quota.md)
|
||||
See [Admission control: Resource Quota](../resource-management/admission_control_resource_quota.md)
|
||||
|
||||
### Finalizers
|
||||
|
||||
|
@ -363,8 +363,3 @@ storage.
|
|||
|
||||
At this point, all content associated with that Namespace, and the Namespace
|
||||
itself are gone.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -4,7 +4,7 @@ Principles to follow when extending Kubernetes.
|
|||
|
||||
## API
|
||||
|
||||
See also the [API conventions](../devel/api-conventions.md).
|
||||
See also the [API conventions](../../devel/api-conventions.md).
|
||||
|
||||
* All APIs should be declarative.
|
||||
* API objects should be complementary and composable, not opaque wrappers.
|
||||
|
@ -96,7 +96,3 @@ TODO
|
|||
|
||||
* [Eric Raymond's 17 UNIX rules](https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E2.80.99s_17_Unix_Rules)
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,6 @@
|
|||
reviewers:
|
||||
- sig-auth-leads
|
||||
approvers:
|
||||
- sig-auth-leads
|
||||
labels:
|
||||
- sig/auth
|
|
@ -212,7 +212,7 @@ Engine `project`. It provides a namespace for objects created by a group of
|
|||
people co-operating together, preventing name collisions with non-cooperating
|
||||
groups. It also serves as a reference point for authorization policies.
|
||||
|
||||
Namespaces are described in [namespaces.md](namespaces.md).
|
||||
Namespaces are described in [namespaces](../architecture/namespaces.md).
|
||||
|
||||
In the Enterprise Profile:
|
||||
- a `userAccount` may have permission to access several `namespace`s.
|
||||
|
@ -223,7 +223,7 @@ In the Simple Profile:
|
|||
Namespaces versus userAccount vs. Labels:
|
||||
- `userAccount`s are intended for audit logging (both name and UID should be
|
||||
logged), and to define who has access to `namespace`s.
|
||||
- `labels` (see [docs/user-guide/labels.md](../../docs/user-guide/labels.md))
|
||||
- `labels` (see [labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/))
|
||||
should be used to distinguish pods, users, and other objects that cooperate
|
||||
towards a common goal but are different in some way, such as version, or
|
||||
responsibilities.
|
||||
|
@ -326,7 +326,7 @@ namespaces, or to make a K8s User into a K8s Project Admin.)
|
|||
|
||||
The API should have a `quota` concept (see http://issue.k8s.io/442). A quota
|
||||
object relates a namespace (and optionally a label selector) to a maximum
|
||||
quantity of resources that may be used (see [resources design doc](resources.md)).
|
||||
quantity of resources that may be used (see [resources design doc](../scheduling/resources.md)).
|
||||
|
||||
Initially:
|
||||
- A `quota` object is immutable.
|
||||
|
@ -370,7 +370,3 @@ Improvements:
|
|||
- Policies to drop logging for high rate trusted API calls, or by users
|
||||
performing audit or other sensitive functions.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -1,5 +1,3 @@
|
|||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||
|
||||
- [Overview](#overview)
|
||||
- [Motivation](#motivation)
|
||||
- [Related work](#related-work)
|
||||
|
@ -22,8 +20,6 @@
|
|||
- [Profile authoring](#profile-authoring)
|
||||
- [Appendix](#appendix)
|
||||
|
||||
<!-- END MUNGE: GENERATED_TOC -->
|
||||
|
||||
# Overview
|
||||
|
||||
AppArmor is a [mandatory access control](https://en.wikipedia.org/wiki/Mandatory_access_control)
|
||||
|
@ -85,7 +81,7 @@ annotation. If a profile is specified, the Kubelet will verify that the node mee
|
|||
the container, and will not run the container if the profile cannot be applied. If the requirements
|
||||
are met, the container runtime will configure the appropriate options to apply the profile. Profile
|
||||
requirements and defaults can be specified on the
|
||||
[PodSecurityPolicy](security-context-constraints.md).
|
||||
[PodSecurityPolicy](pod-security-policy.md).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
|
@ -136,7 +132,7 @@ The profiles can be specified in the following formats (following the convention
|
|||
|
||||
### Pod Security Policy
|
||||
|
||||
The [PodSecurityPolicy](security-context-constraints.md) allows cluster administrators to control
|
||||
The [PodSecurityPolicy](pod-security-policy.md) allows cluster administrators to control
|
||||
the security context for a pod and its containers. An annotation can be specified on the
|
||||
PodSecurityPolicy to restrict which AppArmor profiles can be used, and specify a default if no
|
||||
profile is specified.
|
||||
|
@ -272,7 +268,7 @@ already underway for Docker, called
|
|||
## Container Runtime Interface
|
||||
|
||||
Other container runtimes will likely add AppArmor support eventually, so the
|
||||
[Container Runtime Interface](container-runtime-interface-v1.md) (CRI) needs to be made compatible
|
||||
[Container Runtime Interface](/contributors/devel/container-runtime-interface.md) (CRI) needs to be made compatible
|
||||
with this design. The two important pieces are a way to report whether AppArmor is supported by the
|
||||
runtime, and a way to specify the profile to load (likely through the `LinuxContainerConfig`).
|
||||
|
||||
|
@ -304,7 +300,3 @@ documentation for following this process in a Kubernetes environment.
|
|||
```
|
||||
$ apparmor_parser --remove /path/to/profile
|
||||
```
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,94 @@
|
|||
# Cluster Role Aggregation
|
||||
In order to support easy RBAC integration for CustomResources and Extension
|
||||
APIServers, we need to have a way for API extenders to add permissions to the
|
||||
"normal" roles for admin, edit, and view.
|
||||
|
||||
These roles express an intent for the namespaced power of administrators of the
|
||||
namespace (manage ownership), editors of the namespace (manage content like
|
||||
pods), and viewers of the namespace (see what is present). As new APIs are
|
||||
made available, these roles should reflect that intent to prevent migration
|
||||
concerns every time a new API is added.
|
||||
|
||||
To do this, we will allow one ClusterRole to be built out of a selected set of
|
||||
ClusterRoles.
|
||||
|
||||
## API Changes
|
||||
```yaml
|
||||
aggregationRule:
|
||||
selectors:
|
||||
- matchLabels:
|
||||
rbac.authorization.k8s.io/aggregate-to-admin: "true"
|
||||
```
|
||||
|
||||
```go
|
||||
// ClusterRole is a cluster level, logical grouping of PolicyRules that can be referenced as a unit by a RoleBinding or ClusterRoleBinding.
|
||||
type ClusterRole struct {
|
||||
metav1.TypeMeta
|
||||
// Standard object's metadata.
|
||||
metav1.ObjectMeta
|
||||
|
||||
// Rules holds all the PolicyRules for this ClusterRole
|
||||
Rules []PolicyRule
|
||||
|
||||
// AggregationRule is an optional field that describes how to build the Rules for this ClusterRole.
|
||||
// If AggregationRule is set, then the Rules are controller managed and direct changes to Rules will be
|
||||
// stomped by the controller.
|
||||
AggregationRule *AggregationRule
|
||||
}
|
||||
|
||||
// AggregationRule describes how to locate ClusterRoles to aggregate into the ClusterRole
|
||||
type AggregationRule struct {
|
||||
// Selector holds a list of selectors which will be used to find ClusterRoles and create the rules.
|
||||
// If any of the selectors match, then the ClusterRole's permissions will be added
|
||||
Selectors []metav1.LabelSelector
|
||||
}
|
||||
```
|
||||
|
||||
The `aggregationRule` stanza contains a list of LabelSelectors which are used
|
||||
to select the set of ClusterRoles which should be combined. When
|
||||
`aggregationRule` is set, the list of `rules` becomes controller managed and is
|
||||
subject to overwriting at any point.
|
||||
|
||||
`aggregationRule` needs to be protected from escalation. The simplest way to
|
||||
do this is to restrict it to users with verb=`*`, apiGroups=`*`, resources=`*`. We
|
||||
could later loosen it by using a covers check against all aggregated rules
|
||||
without changing backward compatibility.
|
||||
|
||||
## Controller
|
||||
There is a controller which watches for changes to ClusterRoles and then
|
||||
updates all aggregated ClusterRoles if their list of Rules has changed. Since
|
||||
there are relatively few ClusterRoles, it checks them all, and most checks
|
||||
short-circuit.
|
||||
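A rough sketch of the aggregation step the controller performs, using simplified stand-in types that mirror the proposal's fields (the real controller also deduplicates rules and skips no-op writes):

```go
package aggregation

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// clusterRole is a simplified stand-in mirroring the fields in the types
// above; it is not the real rbac API type.
type clusterRole struct {
	Labels          map[string]string
	Rules           []string // stand-in for []PolicyRule
	AggregationRule *aggregationRule
}

type aggregationRule struct {
	Selectors []metav1.LabelSelector
}

// aggregate recomputes the controller-managed rules of an aggregated
// ClusterRole by unioning the rules of every ClusterRole matched by any of
// the selectors.
func aggregate(target *clusterRole, all []*clusterRole) error {
	if target.AggregationRule == nil {
		return nil // rules are not controller managed
	}
	var combined []string
	for i := range target.AggregationRule.Selectors {
		selector, err := metav1.LabelSelectorAsSelector(&target.AggregationRule.Selectors[i])
		if err != nil {
			return err
		}
		for _, cr := range all {
			if cr != target && selector.Matches(labels.Set(cr.Labels)) {
				combined = append(combined, cr.Rules...)
			}
		}
	}
	target.Rules = combined
	return nil
}
```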
|
||||
## The Payoff
|
||||
If you want to create a CustomResource for your operator and you want namespace
|
||||
admins to be able to create one, instead of trying to:
|
||||
1. Create a new ClusterRole
|
||||
2. Update every namespace with a matching RoleBinding
|
||||
3. Teach everyone to add the RoleBinding to all their admin users
|
||||
4. When you remove it, clean up dangling RoleBindings
|
||||
|
||||
Or
|
||||
|
||||
1. Make a non-declarative patch against the admin ClusterRole
|
||||
2. When you remove it, try to safely create a new non-declarative patch to
|
||||
remove it.
|
||||
|
||||
You can simply create a new ClusterRole like
|
||||
```yaml
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: etcd-operator-admin
|
||||
labels:
|
||||
rbac.authorization.k8s.io/aggregate-to-admin: "true"
|
||||
rules:
|
||||
- apiGroups:
|
||||
- etcd.database.coreos.com
|
||||
resources:
|
||||
- etcdclusters
|
||||
verbs:
|
||||
- "*"
|
||||
```
|
||||
alongside your CustomResourceDefinition. The admin role is updated correctly and
|
||||
removal is a `kubectl delete -f` away.
|
|
@ -422,8 +422,3 @@ type LocalResourceAccessReviewResponse struct {
|
|||
Groups []string
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,94 @@
|
|||
# Allow Pod Security Policy to manage access to the Flexvolumes
|
||||
|
||||
## Current state
|
||||
|
||||
Cluster admins can control the usage of specific volume types by using Pod
|
||||
Security Policy (PSP). Admins can allow the use of Flexvolumes by listing the
|
||||
`flexVolume` type in the `volumes` field. The only thing that can be managed is
|
||||
allowance or disallowance of Flexvolumes.
|
||||
|
||||
Technically, Flexvolumes are implemented as vendor drivers. They are executable
|
||||
files that must be placed on every node at
|
||||
`/usr/libexec/kubernetes/kubelet-plugins/volume/exec/<vendor~driver>/<driver>`.
|
||||
In most cases they are scripts. Limiting driver access means not only limiting
|
||||
access to the volumes that this driver can provide, but also managing access
|
||||
to executing a driver’s code (that is arbitrary, in fact).
|
||||
|
||||
It is possible to have many flex drivers for the different storage types. In
|
||||
essence, Flexvolumes represent not a single volume type, but the different
|
||||
types that allow usage of various vendor volumes.
|
||||
|
||||
## Desired state
|
||||
|
||||
In order to further improve security and to provide more granular control for
|
||||
the usage of the different Flexvolumes, we need to enhance PSP. When such a
|
||||
change takes place, cluster admins will be able to grant access to any
|
||||
Flexvolumes of a particular driver (in contrast to any volume of all drivers).
|
||||
|
||||
For example, if we have two drivers for Flexvolumes (`cifs` and
|
||||
`digitalocean`), it will become possible to grant access for one group to use
|
||||
only volumes from DigitalOcean and grant access for another group to use
|
||||
volumes from all Flexvolumes.
|
||||
|
||||
## Proposed changes
|
||||
|
||||
It has been suggested to add a whitelist of allowed Flexvolume drivers to the
|
||||
PSP. It should behave similarly to [the existing
|
||||
`allowedHostPaths`](https://github.com/kubernetes/kubernetes/pull/50212) except
|
||||
that:
|
||||
|
||||
1) Equality comparison will be used instead of prefix comparison.
|
||||
2) The Flexvolume's driver field will be inspected rather than `hostPath`'s path field.
|
||||
|
||||
### PodSecurityPolicy modifications
|
||||
|
||||
```go
|
||||
// PodSecurityPolicySpec defines the policy enforced.
|
||||
type PodSecurityPolicySpec struct {
|
||||
...
|
||||
// AllowedFlexVolumes is a whitelist of allowed Flexvolumes. Empty or nil indicates that all
|
||||
// Flexvolumes may be used. This parameter is effective only when the usage of the Flexvolumes
|
||||
// is allowed in the "Volumes" field.
|
||||
// +optional
|
||||
AllowedFlexVolumes []AllowedFlexVolume
|
||||
}
|
||||
|
||||
// AllowedFlexVolume represents a single Flexvolume that is allowed to be used.
|
||||
type AllowedFlexVolume struct {
|
||||
// Driver is the name of the Flexvolume driver.
|
||||
Driver string
|
||||
}
|
||||
```
|
||||
|
||||
An empty `AllowedFlexVolumes` allows usage of Flexvolumes with any driver; this
|
||||
preserves the existing behavior and provides backward compatibility.
|
||||
|
||||
A non-empty `AllowedFlexVolumes` changes the behavior from "all allowed" to "all
|
||||
disallowed except those explicitly listed".
|
||||
|
||||
### Admission controller modifications
|
||||
|
||||
The admission controller should be updated to inspect a Pod's volumes (see the sketch below).
|
||||
If it finds a `flexVolume`, it should ensure that its driver is allowed to be
|
||||
used.
|
||||
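For illustration only, the core check might look like the following self-contained sketch. The type here is a stand-in for the proposed `AllowedFlexVolume`; the function and field names are illustrative, not the actual admission plugin code.

```go
package main

import "fmt"

// allowedFlexVolume mirrors the proposed AllowedFlexVolume API type.
type allowedFlexVolume struct {
	Driver string
}

// flexVolumeAllowed reports whether a Flexvolume driver may be used under the
// policy. An empty whitelist keeps the old behavior: all drivers are allowed.
func flexVolumeAllowed(driver string, allowed []allowedFlexVolume) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, v := range allowed {
		// Equality comparison, not the prefix comparison used for allowedHostPaths.
		if v.Driver == driver {
			return true
		}
	}
	return false
}

func main() {
	policy := []allowedFlexVolume{{Driver: "example.com/cifs"}}
	fmt.Println(flexVolumeAllowed("example.com/cifs", policy))         // true
	fmt.Println(flexVolumeAllowed("example.com/digitalocean", policy)) // false
}
```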
|
||||
### Validation rules
|
||||
|
||||
Flexvolume driver names must be non-empty.
|
||||
|
||||
If a PSP disallows pods from requesting volumes of type `flexVolume`, then
|
||||
`AllowedFlexVolumes` must be empty. If it is not empty, the API server must
|
||||
report an error.
|
||||
|
||||
The API server should allow granting access to Flexvolume drivers that do not
|
||||
exist at the time of PSP creation.
|
||||
|
||||
## Notes
|
||||
It is possible to have even more flexible control over the Flexvolumes and take
|
||||
into account options that have been passed to a driver. We decided that this is
|
||||
a desirable feature but outside the scope of this proposal.
|
||||
|
||||
The current change could be enough for many cases. Also, when cluster admins
|
||||
are able to manage access to particular Flexvolume drivers, it becomes possible
|
||||
to "emulate" control over the driver’s options by using many drivers with
|
||||
hard-coded options.
|
|
@ -322,10 +322,5 @@ It will not be a generic webhook. A generic webhook would need a lot more discus
|
|||
Additionally, just sending all the fields of just the Pod kind also has problems:
|
||||
- it exposes our whole API to a webhook backend without giving us (the project) any chance to review or understand how it is being used.
|
||||
- because we do not know which fields of an object are inspected by the backend, caching of decisions is not effective. Sending fewer fields allows caching.
|
||||
- sending fewer fields makes it possible to rev the version of the webhook request slower than the version of our internal obejcts (e.g. pod v2 could still use imageReview v1.)
|
||||
- sending fewer fields makes it possible to rev the version of the webhook request slower than the version of our internal objects (e.g. pod v2 could still use imageReview v1.)
|
||||
probably lots more reasons.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
Binary file not shown.
After Width: | Height: | Size: 7.5 KiB |
Binary file not shown.
After Width: | Height: | Size: 20 KiB |
|
@ -0,0 +1,114 @@
|
|||
# KMS Plugin API for secrets encryption
|
||||
|
||||
## Background
|
||||
|
||||
Since v1.7, Kubernetes has supported encryption of resources. It offers three kinds of encryption providers: ``aescbc``, ``aesgcm`` and ``secretbox``, implemented as value transformers. This feature currently only supports encryption using keys stored in the configuration file (plain text, encoded with base64).
|
||||
|
||||
Using an external trusted service to manage the keys separates the responsibility of key management from operating and managing a Kubernetes cluster. So a new transformer, the “Envelope Transformer”, was introduced in 1.8 ([49350](https://github.com/kubernetes/kubernetes/pull/49350)). The Envelope Transformer defines an extension point, the interface ``envelope.Service``. The intent was to make it easy to add a new KMS provider by implementing this interface, for example providers for Google Cloud KMS, HashiCorp Vault and Microsoft Azure Key Vault.
|
||||
|
||||
But as more KMS providers are added, more vendor dependencies are also introduced. So we now wish to pull all KMS providers out of the API server, while retaining the ability of the API server to delegate encryption of secrets to an external trusted KMS service.
|
||||
|
||||
## High Level Design
|
||||
|
||||
At a high level (see [51965](https://github.com/kubernetes/kubernetes/issues/51965)), we use gRPC to decouple the API server from the out-of-tree KMS providers. There is only one envelope service implementation (implementing the interface ``envelope.Service``), and it communicates with the out-of-tree KMS provider through gRPC. The deployment diagram is shown below:
|
||||
|
||||

|
||||
|
||||
Here we assume the remote KMS provider is accessible from the API server. How the KMS provider process is launched and managed is not covered in this document.
|
||||
|
||||
The API server side (the gRPC client) should know nothing about the external KMS. We only need to configure the KMS provider (gRPC server) endpoint for it.
|
||||
|
||||
The KMS provider (the gRPC server) must handle all details related to the external KMS. It needs to know how to connect to the KMS, how to authenticate, which key or keys to use, and so on. A qualified KMS provider implementation should hide all of these details from the API server.
|
||||
|
||||
To add a new KMS provider, we just implement the gRPC server. No new code or dependencies are added to the API server; we only configure it so that the gRPC client communicates with the new KMS provider.
|
||||
|
||||
The following class diagram illustrates a possible implementation:
|
||||
|
||||

|
||||
|
||||
The class ``envelope.envelopeTransformer`` and the interface ``envelope.Service`` already exist in the current code base. What we need to do is implement the class ``envelope.grpcService``.
|
||||
|
||||
## Proto File Definition
|
||||
|
||||
```protobuf
|
||||
// envelope/service.proto
|
||||
syntax = "proto3";
|
||||
|
||||
package envelope;
|
||||
|
||||
service KMSService {
|
||||
// Version returns the runtime name and runtime version.
|
||||
rpc Version(VersionRequest) returns (VersionResponse) {}
|
||||
rpc Decrypt(DecryptRequest) returns (DecryptResponse) {}
|
||||
rpc Encrypt(EncryptRequest) returns (EncryptResponse) {}
|
||||
}
|
||||
|
||||
message VersionRequest {
|
||||
// Version of the KMS plugin API.
|
||||
string version = 1;
|
||||
}
|
||||
|
||||
message VersionResponse {
|
||||
// Version of the KMS plugin API.
|
||||
string version = 1;
|
||||
// Name of the KMS provider.
|
||||
string runtime_name = 2;
|
||||
// Version of the KMS provider. The string must be semver-compatible.
|
||||
string runtime_version = 3;
|
||||
}
|
||||
|
||||
message DecryptRequest {
|
||||
// Version of the KMS plugin API; currently "v1beta1".
|
||||
string version = 1;
|
||||
bytes cipher = 2;
|
||||
}
|
||||
|
||||
message DecryptResponse {
|
||||
bytes plain = 1;
|
||||
}
|
||||
|
||||
message EncryptRequest {
|
||||
// Version of the KMS plugin API; currently "v1beta1".
|
||||
string version = 1;
|
||||
bytes plain = 2;
|
||||
}
|
||||
|
||||
message EncryptResponse {
|
||||
bytes cipher = 1;
|
||||
}
|
||||
```
|
||||
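For concreteness, ``envelope.grpcService`` can then be little more than a thin wrapper around the client generated from this proto. The following is a hedged sketch only: it assumes ``envelope.Service`` exposes ``Encrypt``/``Decrypt`` over byte slices, and the ``kmspb`` import path and the ``NewGRPCService`` constructor name are illustrative, not the actual code.

```go
package envelope

import (
	"context"
	"net"
	"time"

	"google.golang.org/grpc"

	kmspb "example.com/kms/envelope" // illustrative path for the protoc-generated bindings
)

const apiVersion = "v1beta1"

// grpcService implements envelope.Service by forwarding Encrypt/Decrypt calls
// to the out-of-tree KMS provider over a local unix domain socket.
type grpcService struct {
	client kmspb.KMSServiceClient
}

// NewGRPCService dials the provider's socket (e.g. "/tmp/kms-provider.sock").
func NewGRPCService(socketPath string) (*grpcService, error) {
	conn, err := grpc.Dial(
		socketPath,
		grpc.WithInsecure(), // local unix socket; no TLS needed
		grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
			return net.DialTimeout("unix", addr, timeout)
		}),
	)
	if err != nil {
		return nil, err
	}
	return &grpcService{client: kmspb.NewKMSServiceClient(conn)}, nil
}

// Encrypt asks the remote KMS provider to wrap a data encryption key.
func (s *grpcService) Encrypt(data []byte) ([]byte, error) {
	resp, err := s.client.Encrypt(context.Background(), &kmspb.EncryptRequest{Version: apiVersion, Plain: data})
	if err != nil {
		return nil, err
	}
	return resp.Cipher, nil
}

// Decrypt asks the remote KMS provider to unwrap a data encryption key.
func (s *grpcService) Decrypt(data []byte) ([]byte, error) {
	resp, err := s.client.Decrypt(context.Background(), &kmspb.DecryptRequest{Version: apiVersion, Cipher: data})
	if err != nil {
		return nil, err
	}
	return resp.Plain, nil
}
```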
|
||||
## Deployment
|
||||
|
||||
To avoid the need to implement authentication and authorization, the KMS provider should run on the master and be called via a local unix domain socket.
|
||||
|
||||
Cluster administrators have various options to ensure the KMS provider runs on the master; taints and tolerations are one example. On GKE we target configuration at the kubelet that runs on the master directly (it isn't registered as a regular kubelet) and will have it start the KMS provider.
|
||||
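To make the provider side concrete, here is a hedged sketch of a toy out-of-tree KMS provider serving the gRPC API above on the local unix domain socket. It base64-encodes instead of calling a real KMS, and the ``kmspb`` import path is illustrative.

```go
package main

import (
	"context"
	"encoding/base64"
	"log"
	"net"

	"google.golang.org/grpc"

	kmspb "example.com/kms/envelope" // illustrative path for the protoc-generated bindings
)

// server implements the KMSService defined in envelope/service.proto.
type server struct{}

func (s *server) Version(ctx context.Context, req *kmspb.VersionRequest) (*kmspb.VersionResponse, error) {
	return &kmspb.VersionResponse{Version: "v1beta1", RuntimeName: "demo-kms", RuntimeVersion: "0.1.0"}, nil
}

// Encrypt stands in for a call to a real external KMS.
func (s *server) Encrypt(ctx context.Context, req *kmspb.EncryptRequest) (*kmspb.EncryptResponse, error) {
	cipher := base64.StdEncoding.EncodeToString(req.Plain)
	return &kmspb.EncryptResponse{Cipher: []byte(cipher)}, nil
}

// Decrypt reverses the toy "encryption" above.
func (s *server) Decrypt(ctx context.Context, req *kmspb.DecryptRequest) (*kmspb.DecryptResponse, error) {
	plain, err := base64.StdEncoding.DecodeString(string(req.Cipher))
	if err != nil {
		return nil, err
	}
	return &kmspb.DecryptResponse{Plain: plain}, nil
}

func main() {
	l, err := net.Listen("unix", "/tmp/kms-provider.sock")
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	kmspb.RegisterKMSServiceServer(s, &server{})
	log.Fatal(s.Serve(l))
}
```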
|
||||
## Performance
|
||||
|
||||
The KMS provider will be called on every secret write, and it will in turn make a remote call to the external KMS to do the actual encrypt/decrypt. To keep the overhead of the gRPC call to the KMS provider low, the KMS provider should run on the master; the extra overhead should then be small compared to the remote KMS call.
|
||||
|
||||
Unencrypted DEKs are cached on the API server side, so gRPC calls to the KMS provider are only required to fill the cache on startup.
|
||||
|
||||
## Configuration
|
||||
|
||||
The out-of-tree provider will be specified in the existing configuration file used to configure the encryption providers. The location of this configuration file is identified by the existing startup parameter ``--experimental-encryption-provider-config``.
|
||||
|
||||
To specify the gRPC server endpoint, we add a new configuration parameter ``endpoint`` to the KMS configuration. The endpoint is a unix domain socket, for example ``unix:///tmp/kms-provider.sock``.
|
||||
|
||||
For now we expect the API server and the KMS provider to run in the same Pod. TCP socket connections are not supported, so it is not necessary to add TLS support.
|
||||
|
||||
Here is a sample configuration file with vault out-of-tree provider configured:
|
||||
|
||||
```yaml
|
||||
kind: EncryptionConfig
|
||||
apiVersion: v1
|
||||
resources:
|
||||
  - resources:
|
||||
      - secrets
|
||||
    providers:
|
||||
      - kms:
|
||||
          name: grpc-kms-provider
|
||||
          cachesize: 1000
|
||||
          endpoint: unix:///tmp/kms-provider.sock
|
||||
```
|
|
@ -24,7 +24,7 @@ is inherited across `fork`, `clone` and `execve` and can not be unset. With
|
|||
that could not have been done without the `execve` call.
|
||||
|
||||
For more details about `no_new_privs`, please check the
|
||||
[Linux kernel documention](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt).
|
||||
[Linux kernel documentation](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt).
|
||||
|
||||
This is different from `NOSUID` in that `no_new_privs` can give permission to
|
||||
the container process to further restrict child processes with seccomp. This
|
|
@ -368,7 +368,3 @@ E2E test cases will be added to test the correct determination of the security c
|
|||
1. The Kubelet will use the new fields on the `PodSecurityContext` for host namespace control
|
||||
2. The Kubelet will be modified to correctly implement the backward compatibility and effective
|
||||
security context determination defined here
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -18,7 +18,7 @@ granting the user themselves an elevated set of permissions.
|
|||
|
||||
## Goals
|
||||
|
||||
1. Associate [service accounts](../design-proposals/service_accounts.md), groups, and users with
|
||||
1. Associate [service accounts](service_accounts.md), groups, and users with
|
||||
a set of constraints that dictate how a security context is established for a pod and the pod's containers.
|
||||
1. Provide the ability for users and infrastructure components to run pods with elevated privileges
|
||||
on behalf of another user or within a namespace where privileges are more restrictive.
|
||||
|
@ -343,9 +343,3 @@ for a specific UID and fail early if possible. However, if the `RunAsUser` is n
|
|||
it should still admit the pod and allow the Kubelet to ensure that the image does not run as
|
||||
`root` with the existing non-root checks.
|
||||
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,238 @@
|
|||
# RunAsGroup Proposal
|
||||
|
||||
**Author**: krmayankk@
|
||||
|
||||
**Status**: Proposal
|
||||
|
||||
## Abstract
|
||||
|
||||
|
||||
As Kubernetes users, we should be able to specify both the user id and the group id for the containers running
|
||||
inside a pod on a per-container basis, similar to how Docker allows it using the docker run option `-u,
|
||||
--user="" Username or UID (format: <name|uid>[:<group|gid>])`.
|
||||
|
||||
PodSecurityContext allows Kubernetes users to specify RunAsUser, which can be overridden by RunAsUser
|
||||
in SecurityContext on a per-container basis. There is no equivalent field for specifying the primary
|
||||
group of the running container.
|
||||
|
||||
## Motivation
|
||||
|
||||
Enterprise Kubernetes users want to run containers as non-root. This means running containers with a
|
||||
non-zero user id and a non-zero primary group id. This gives enterprises confidence that their customer code
|
||||
is running with least privilege and, if it escapes the container boundary, will still cause the least harm
|
||||
by decreasing the attack surface.
|
||||
|
||||
### What is the significance of the Primary Group Id?
|
||||
The primary group id is the group id used when creating files and directories. It is also the default group
|
||||
associated with a user when they log in. All groups are defined in the `/etc/group` file and are created
|
||||
with the `groupadd` command. A process/container runs with the uid/primary gid of the calling user. If no
|
||||
primary group is specified for a user, group 0 (root) is assumed. This means any files/directories created
|
||||
by a process running as a user with no associated primary group will be owned by group id 0 (root).
|
||||
|
||||
## Goals
|
||||
|
||||
1. Provide the ability to specify the Primary Group id for a container inside a Pod
|
||||
2. Bring launching of containers using Kubernetes on par with Docker by supporting the same features.
|
||||
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Use case 1:
|
||||
As a Kubernetes User, I should be able to control both user id and primary group id of containers
|
||||
launched using Kubernetes at runtime, so that I can run the container as non-root with the least possible
|
||||
privilege.
|
||||
|
||||
### Use case 2:
|
||||
As a Kubernetes User, I should be able to control both user id and primary group id of containers
|
||||
launched using Kubernetes at runtime, so that I can override the user id and primary group id specified
|
||||
in the Dockerfile of the container image, without having to create a new Docker image.
|
||||
|
||||
## Design
|
||||
|
||||
### Model
|
||||
|
||||
Introduce a new API field in SecurityContext and PodSecurityContext called `RunAsGroup`.
|
||||
|
||||
#### SecurityContext
|
||||
|
||||
```
|
||||
// SecurityContext holds security configuration that will be applied to a container.
|
||||
// Some fields are present in both SecurityContext and PodSecurityContext. When both
|
||||
// are set, the values in SecurityContext take precedence.
|
||||
type SecurityContext struct {
|
||||
//Other fields not shown for brevity
|
||||
.....
|
||||
|
||||
// The UID to run the entrypoint of the container process.
|
||||
// Defaults to user specified in image metadata if unspecified.
|
||||
// May also be set in PodSecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence.
|
||||
// +optional
|
||||
RunAsUser *int64
|
||||
// The GID to run the entrypoint of the container process.
|
||||
// Defaults to group specified in image metadata if unspecified.
|
||||
// May also be set in PodSecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence.
|
||||
// +optional
|
||||
RunAsGroup *int64
|
||||
// Indicates that the container must run as a non-root user.
|
||||
// If true, the Kubelet will validate the image at runtime to ensure that it
|
||||
// does not run as UID 0 (root) and fail to start the container if it does.
|
||||
// If unset or false, no such validation will be performed.
|
||||
// May also be set in SecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence.
|
||||
// +optional
|
||||
RunAsNonRoot *bool
|
||||
// Indicates that the container must run as a non-root group.
|
||||
// If true, the Kubelet will validate the image at runtime to ensure that it
|
||||
// does not run as GID 0 (root) and fail to start the container if it does.
|
||||
// If unset or false, no such validation will be performed.
|
||||
// May also be set in SecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence.
|
||||
// +optional
|
||||
RunAsNonRootGroup *bool
|
||||
|
||||
.....
|
||||
}
|
||||
```
|
||||
|
||||
#### PodSecurityContext
|
||||
|
||||
```
|
||||
type PodSecurityContext struct {
|
||||
//Other fields not shown for brevity
|
||||
.....
|
||||
|
||||
// The UID to run the entrypoint of the container process.
|
||||
// Defaults to user specified in image metadata if unspecified.
|
||||
// May also be set in SecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence
|
||||
// for that container.
|
||||
// +optional
|
||||
RunAsUser *int64
|
||||
// The GID to run the entrypoint of the container process.
|
||||
// Defaults to group specified in image metadata if unspecified.
|
||||
// May also be set in PodSecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence.
|
||||
// +optional
|
||||
RunAsGroup *int64
|
||||
// Indicates that the container must run as a non-root user.
|
||||
// If true, the Kubelet will validate the image at runtime to ensure that it
|
||||
// does not run as UID 0 (root) and fail to start the container if it does.
|
||||
// If unset or false, no such validation will be performed.
|
||||
// May also be set in SecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence.
|
||||
// +optional
|
||||
RunAsNonRoot *bool
|
||||
// Indicates that the container must run as a non-root group.
|
||||
// If true, the Kubelet will validate the image at runtime to ensure that it
|
||||
// does not run as GID 0 (root) and fail to start the container if it does.
|
||||
// If unset or false, no such validation will be performed.
|
||||
// May also be set in SecurityContext. If set in both SecurityContext and
|
||||
// PodSecurityContext, the value specified in SecurityContext takes precedence.
|
||||
// +optional
|
||||
RunAsNonRootGroup *bool
|
||||
|
||||
|
||||
.....
|
||||
}
|
||||
```
|
||||
|
||||
#### PodSecurityPolicy
|
||||
|
||||
PodSecurityPolicy defines strategies or conditions that a pod must run with in order to be accepted
|
||||
into the system. Two of the relevant strategies are RunAsUser and SupplementalGroups. We introduce
|
||||
a new strategy called RunAsGroup which will support the following options:
|
||||
- MustRunAs
|
||||
- MustRunAsNonRoot
|
||||
- RunAsAny
|
||||
|
||||
```
|
||||
// PodSecurityPolicySpec defines the policy enforced.
|
||||
type PodSecurityPolicySpec struct {
|
||||
//Other fields not shown for brevity
|
||||
.....
|
||||
// RunAsUser is the strategy that will dictate the allowable RunAsUser values that may be set.
|
||||
RunAsUser RunAsUserStrategyOptions
|
||||
// SupplementalGroups is the strategy that will dictate what supplemental groups are used by the SecurityContext.
|
||||
SupplementalGroups SupplementalGroupsStrategyOptions
|
||||
|
||||
|
||||
// RunAsGroup is the strategy that will dictate the allowable RunAsGroup values that may be set.
|
||||
RunAsGroup RunAsGroupStrategyOptions
|
||||
.....
|
||||
}
|
||||
|
||||
// RunAsGroupStrategyOptions defines the strategy type and any options used to create the strategy.
|
||||
type RunAsGroupStrategyOptions struct {
|
||||
// Rule is the strategy that will dictate the allowable RunAsGroup values that may be set.
|
||||
Rule RunAsGroupStrategy
|
||||
// Ranges are the allowed ranges of gids that may be used.
|
||||
// +optional
|
||||
Ranges []GroupIDRange
|
||||
}
|
||||
|
||||
// RunAsGroupStrategy denotes strategy types for generating RunAsGroup values for a
|
||||
// SecurityContext.
|
||||
type RunAsGroupStrategy string
|
||||
|
||||
const (
|
||||
// container must run as a particular gid.
|
||||
RunAsGroupStrategyMustRunAs RunAsGroupStrategy = "MustRunAs"
|
||||
// container must run as a non-root gid
|
||||
RunAsGroupStrategyMustRunAsNonRoot RunAsGroupStrategy = "MustRunAsNonRoot"
|
||||
// container may make requests for any gid.
|
||||
RunAsGroupStrategyRunAsAny RunAsGroupStrategy = "RunAsAny"
|
||||
)
|
||||
```
|
||||
|
||||
## Behavior
|
||||
|
||||
Following points should be noted:
|
||||
|
||||
- `FSGroup` and `SupplementalGroups` will continue to have their old meanings and would be untouched.
|
||||
- The `RunAsGroup` in the SecurityContext will override the `RunAsGroup` in the PodSecurityContext.
|
||||
- If both `RunAsUser` and `RunAsGroup` are NOT provided, the USER field in the Dockerfile is used.
|
||||
- If both `RunAsUser` and `RunAsGroup` are specified, they are passed directly as the user, in `uid:gid` form.
|
||||
- If only one of `RunAsUser` or `RunAsGroup` is specified, the remaining value is decided by the Runtime,
|
||||
where the Runtime behavior is to make it run with uid or gid as 0.
|
||||
- If a non-numeric group is specified in the Dockerfile and `RunAsNonRootGroup` is set, this will be
|
||||
treated as an error, similar to the behavior of `RunAsNonRoot` for a non-numeric user in the Dockerfile.
|
||||
|
||||
Basically, we guarantee to set the values provided by the user, and the runtime dictates the rest.
|
||||
|
||||
Here is an example of what gets passed to Docker's `Config.User` (see the sketch after this list):
|
||||
- runAsUser set to 9999, runAsGroup set to 9999 -> Config.User set to 9999:9999
|
||||
- runAsUser set to 9999, runAsGroup unset -> Config.User set to 9999 -> docker runs you with 9999:0
|
||||
- runAsUser unset, runAsGroup set to 9999 -> Config.User set to :9999 -> docker runs you with 0:9999
|
||||
- runAsUser unset, runAsGroup unset -> Config.User set to whatever is present in Dockerfile
|
||||
This is to keep the behavior backward compatible and as expected.
|
||||
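A self-contained sketch of how that `Config.User` value could be assembled; the helper name is illustrative, not the actual dockershim code.

```go
package main

import (
	"fmt"
	"strconv"
)

// dockerUser builds the value passed to Docker's Config.User from the
// effective runAsUser / runAsGroup settings; nil means "not set".
func dockerUser(runAsUser, runAsGroup *int64) string {
	user := ""
	if runAsUser != nil {
		user = strconv.FormatInt(*runAsUser, 10)
	}
	if runAsGroup != nil {
		user += ":" + strconv.FormatInt(*runAsGroup, 10)
	}
	// An empty string means: fall back to the USER field in the Dockerfile.
	return user
}

func main() {
	uid, gid := int64(9999), int64(9999)
	fmt.Printf("%q\n", dockerUser(&uid, &gid)) // "9999:9999"
	fmt.Printf("%q\n", dockerUser(&uid, nil))  // "9999"  -> runs as 9999:0
	fmt.Printf("%q\n", dockerUser(nil, &gid))  // ":9999" -> runs as 0:9999
	fmt.Printf("%q\n", dockerUser(nil, nil))   // ""      -> Dockerfile USER
}
```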
|
||||
## Summary of Changes needed
|
||||
|
||||
At a high level, the changes classify into:
|
||||
1. API
|
||||
2. Validation
|
||||
3. CRI
|
||||
4. Runtime for Docker and rkt
|
||||
5. Swagger
|
||||
6. DockerShim
|
||||
7. Admission
|
||||
8. Registry
|
||||
|
||||
- plugin/pkg/admission/security/podsecuritypolicy
|
||||
- plugin/pkg/admission/securitycontext
|
||||
- pkg/securitycontext/util.go
|
||||
- pkg/security/podsecuritypolicy/selinux
|
||||
- pkg/security/podsecuritypolicy/user
|
||||
- pkg/security/podsecuritypolicy/group
|
||||
- pkg/registry/extensions/podsecuritypolicy/storage
|
||||
- pkg/kubelet/rkt
|
||||
- pkg/kubelet/kuberuntime
|
||||
- pkg/kubelet/dockershim/
|
||||
- pkg/kubelet/apis/cri/v1alpha1/runtime
|
||||
- pkg/apis/extensions/validation/
|
||||
- pkg/api/validation/
|
||||
- api/swagger-spec/
|
||||
- api/openapi-spec/swagger.json
|
||||
|
|
@ -1,9 +1,9 @@
|
|||
## Abstract
|
||||
|
||||
A proposal for the distribution of [secrets](../user-guide/secrets.md)
|
||||
A proposal for the distribution of [secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
|
||||
(passwords, keys, etc) to the Kubelet and to containers inside Kubernetes using
|
||||
a custom [volume](../user-guide/volumes.md#secrets) type. See the
|
||||
[secrets example](../user-guide/secrets/) for more information.
|
||||
a custom [volume](https://kubernetes.io/docs/concepts/storage/volumes/#secret) type. See the
|
||||
[secrets example](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets) for more information.
|
||||
|
||||
## Motivation
|
||||
|
||||
|
@ -622,7 +622,3 @@ on their filesystems:
|
|||
/etc/secret-volume/username
|
||||
/etc/secret-volume/password
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -212,7 +212,3 @@ a separate component that can delete bindings but not create them). The
|
|||
scheduler may need read access to user or project-container information to
|
||||
determine preferential location (underspecified at this time).
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -186,7 +186,3 @@ privileged. Contexts that attempt to define a UID or SELinux options will be
|
|||
denied by default. In the future the admission plugin will base this decision
|
||||
upon configurable policies that reside within the [service account](http://pr.k8s.io/2297).
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -204,7 +204,3 @@ Finally, it may provide an interface to automate creation of new
|
|||
serviceAccounts. In that case, the user may want to GET serviceAccounts to see
|
||||
what has been created.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -0,0 +1,6 @@
|
|||
reviewers:
|
||||
- sig-autoscaling-leads
|
||||
approvers:
|
||||
- sig-autoscaling-leads
|
||||
labels:
|
||||
- sig/autoscaling
|
|
@ -254,10 +254,3 @@ autoscaler to create a new pod. Discussed in issue [#3247](https://github.com/k
|
|||
* *[future]* **When scaling down, make more educated decision which pods to
|
||||
kill.** E.g.: if two or more pods from the same replication controller are on
|
||||
the same node, kill one of them. Discussed in issue [#4301](https://github.com/kubernetes/kubernetes/issues/4301).
|
||||
|
||||
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
|
@ -2,7 +2,7 @@ Horizontal Pod Autoscaler Status Conditions
|
|||
===========================================
|
||||
|
||||
Currently, the HPA status conveys the last scale time, current and desired
|
||||
replacas, and the last-retrieved values of the metrics used to autoscale.
|
||||
replicas, and the last-retrieved values of the metrics used to autoscale.
|
||||
|
||||
However, the status field conveys no information about whether or not the
|
||||
HPA controller encountered difficulties while attempting to fetch metrics,
|
||||
|
@ -77,7 +77,7 @@ entirely.
|
|||
- *FailedRescale*: a scale update was needed and the HPA controller was
|
||||
unable to actually update the scale subresource of the target scalable.
|
||||
|
||||
- *SuccesfulRescale*: a scale update was needed and everything went
|
||||
- *SuccessfulRescale*: a scale update was needed and everything went
|
||||
properly.
|
||||
|
||||
- *FailedUpdateStatus*: the HPA controller failed to update the status of
|
|
@ -280,12 +280,8 @@ Mechanical Concerns
|
|||
|
||||
The HPA will derive metrics from two sources: resource metrics (i.e. CPU
|
||||
request percentage) will come from the
|
||||
[master metrics API](resource-metrics-api.md), while other metrics will
|
||||
come from the [custom metrics API](custom-metrics-api.md), which is
|
||||
[master metrics API](../instrumentation/resource-metrics-api.md), while other metrics will
|
||||
come from the [custom metrics API](../instrumentation/custom-metrics-api.md), which is
|
||||
an adapter API which sources metrics directly from the monitoring
|
||||
pipeline.
|
||||
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
Binary file not shown.
After Width: | Height: | Size: 303 KiB |
|
@ -5,7 +5,7 @@ and set them before the container is run. This document describes design of the
|
|||
|
||||
## Motivation
|
||||
|
||||
Since we want to make Kubernetes as simple as possible for its users we don't want to require setting [Resources](../design/resource-qos.md) for container by its owner.
|
||||
Since we want to make Kubernetes as simple as possible for its users we don't want to require setting [Resources](../node/resource-qos.md) for container by its owner.
|
||||
On the other hand having Resources filled is critical for scheduling decisions.
|
||||
Current solution to set up Resources to hardcoded value has obvious drawbacks.
|
||||
We need to implement a component which will set initial Resources to a reasonable value.
|
||||
|
@ -18,7 +18,7 @@ For every container without Resources specified it will try to predict amount of
|
|||
So that a pod without specified resources will be treated as
|
||||
.
|
||||
|
||||
InitialResources will set only [request](../design/resource-qos.md#requests-and-limits) (independently for each resource type: cpu, memory) field in the first version to avoid killing containers due to OOM (however the container still may be killed if exceeds requested resources).
|
||||
InitialResources will set only [request](../node/resource-qos.md#requests-and-limits) (independently for each resource type: cpu, memory) field in the first version to avoid killing containers due to OOM (however the container still may be killed if exceeds requested resources).
|
||||
To make the component work with LimitRanger the estimated value will be capped by min and max possible values if defined.
|
||||
It will prevent from situation when the pod is rejected due to too low or too high estimation.
|
||||
|
||||
|
@ -70,6 +70,3 @@ and should be introduced shortly after the first version is done:
|
|||
* add estimation as annotations for those containers that already has resources set
|
||||
* support for other data sources like [Hawkular](http://www.hawkular.org/)
|
||||
|
||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||
[]()
|
||||
<!-- END MUNGE: GENERATED_ANALYTICS -->
|
Some files were not shown because too many files have changed in this diff.