Improve taint and toleration documentation

parent a17a3b9cb1
commit 82ca38f8df

@@ -58,6 +58,7 @@ toc:
 - docs/concepts/configuration/overview.md
 - docs/concepts/configuration/manage-compute-resources-container.md
 - docs/concepts/configuration/assign-pod-node.md
+- docs/concepts/configuration/taint-and-toleration.md
 - docs/concepts/configuration/secret.md
 - docs/concepts/configuration/organize-cluster-access-kubeconfig.md

@@ -7868,7 +7868,7 @@ Appears In <a href="#pod-v1-core">Pod</a> <a href="#podtemplatespec-v1-core">Pod
 </tr>
 <tr>
 <td>tolerations <br /> <em><a href="#toleration-v1-core">Toleration</a> array</em></td>
-<td>If specified, the pod's tolerations.</td>
+<td>If specified, the pod's tolerations. More info: <a href="https://kubernetes.io/docs/concepts/configuration/taint-and-toleration">https://kubernetes.io/docs/concepts/configuration/taint-and-toleration</a></td>
 </tr>
 <tr>
 <td>volumes <br /> <em><a href="#volume-v1-core">Volume</a> array</em></td>

@@ -7964,7 +7964,7 @@ Appears In:
 </tr>
 <tr>
 <td>tolerations <br /> <em><a href="#toleration-v1-core">Toleration</a> array</em></td>
-<td>If specified, the pod's tolerations.</td>
+<td>If specified, the pod's tolerations. More info: <a href="https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/">https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/</a></td>
 </tr>
 <tr>
 <td>volumes <br /> <em><a href="#volume-v1-core">Volume</a> array</em> <br /> <strong>patch type</strong>: <em>merge</em> <br /> <strong>patch merge key</strong>: <em>name</em></td>

@@ -171,7 +171,7 @@ Starting in Kubernetes 1.6, the NodeController is also responsible for evicting
 pods that are running on nodes with `NoExecute` taints, when the pods do not tolerate
 the taints. Additionally, as an alpha feature that is disabled by default, the
 NodeController is responsible for adding taints corresponding to node problems like
-node unreachable or not ready. See [this documentation](/docs/concepts/configuration/assign-pod-node/#taints-and-tolerations-beta-feature)
+node unreachable or not ready. See [this documentation](/docs/concepts/configuration/taint-and-toleration)
 for details about `NoExecute` taints and the alpha feature.
 
 ### Self-Registration of Nodes

@@ -118,4 +118,5 @@ spec:
 **Note**: a pod with the _unsafe_ sysctls specified above will fail to launch on
 any node which has not enabled those two _unsafe_ sysctls explicitly. As with
 _node-level_ sysctls it is recommended to use [_taints and toleration_
-feature](/docs/user-guide/kubectl/v1.6/#taint) or [labels on nodes](/docs/concepts/configuration/assign-pod-node/) to schedule those pods onto the right nodes.
+feature](/docs/user-guide/kubectl/v1.6/#taint) or [taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
+to schedule those pods onto the right nodes.

@@ -298,231 +298,5 @@ Highly Available database statefulset has one master and three replicas, one may
 For more information on inter-pod affinity/anti-affinity, see the design doc
 [here](https://git.k8s.io/community/contributors/design-proposals/podaffinity.md).
 
-## Taints and tolerations (beta feature)
-
-Node affinity, described earlier, is a property of *pods* that *attracts* them to a set
-of nodes (either as a preference or a hard requirement). Taints are the opposite --
-they allow a *node* to *repel* a set of pods.
-
-Taints and tolerations work together to ensure that pods are not scheduled
-onto inappropriate nodes. One or more taints are applied to a node; this
-marks that the node should not accept any pods that do not tolerate the taints.
-Tolerations are applied to pods, and allow (but do not require) the pods to schedule
-onto nodes with matching taints.
-
-You add a taint to a node using [kubectl taint](/docs/user-guide/kubectl/v1.7/#taint).
-For example,
-
-```shell
-kubectl taint nodes node1 key=value:NoSchedule
-```
-
-places a taint on node `node1`. The taint has key `key`, value `value`, and taint effect `NoSchedule`.
-This means that no pod will be able to schedule onto `node1` unless it has a matching toleration.
-You specify a toleration for a pod in the PodSpec. Both of the following tolerations "match" the
-taint created by the `kubectl taint` line above, and thus a pod with either toleration would be able
-to schedule onto `node1`:
-
-```yaml
-tolerations:
-- key: "key"
-  operator: "Equal"
-  value: "value"
-  effect: "NoSchedule"
-```
-
-```yaml
-tolerations:
-- key: "key"
-  operator: "Exists"
-  effect: "NoSchedule"
-```
-
-A toleration "matches" a taint if the keys are the same and the effects are the same, and:
-
-* the `operator` is `Exists` (in which case no `value` should be specified), or
-* the `operator` is `Equal` and the `value`s are equal
-
-`Operator` defaults to `Equal` if not specified.
-
-**NOTE:** There are two special cases:
-
-* An empty `key` with operator `Exists` matches all keys, values and effects which means this
-will tolerate everything.
-
-```yaml
-tolerations:
-- operator: "Exists"
-```
-
-* An empty `effect` matches all effects with key `key`.
-
-```yaml
-tolerations:
-- key: "key"
-  operator: "Exists"
-```
-
-The above example used `effect` of `NoSchedule`. Alternatively, you can use `effect` of `PreferNoSchedule`.
-This is a "preference" or "soft" version of `NoSchedule` -- the system will *try* to avoid placing a
-pod that does not tolerate the taint on the node, but it is not required. The third kind of `effect` is
-`NoExecute`, described later.
-
-You can put multiple taints on the same node and multiple tolerations on the same pod.
-The way Kubernetes processes multiple taints and tolerations is like a filter: start
-with all of a node's taints, then ignore the ones for which the pod has a matching toleration; the
-remaining un-ignored taints have the indicated effects on the pod. In particular,
-
-* if there is at least one un-ignored taint with effect `NoSchedule` then Kubernetes will not schedule
-the pod onto that node
-* if there is no un-ignored taint with effect `NoSchedule` but there is at least one un-ignored taint with
-effect `PreferNoSchedule` then Kubernetes will *try* to not schedule the pod onto the node
-* if there is at least one un-ignored taint with effect `NoExecute` then the pod will be evicted from
-the node (if it is already running on the node), and will not be
-scheduled onto the node (if it is not yet running on the node).
-
-For example, imagine you taint a node like this
-
-```shell
-kubectl taint nodes node1 key1=value1:NoSchedule
-kubectl taint nodes node1 key1=value1:NoExecute
-kubectl taint nodes node1 key2=value2:NoSchedule
-```
-
-And a pod has two tolerations:
-
-```yaml
-tolerations:
-- key: "key1"
-  operator: "Equal"
-  value: "value1"
-  effect: "NoSchedule"
-- key: "key1"
-  operator: "Equal"
-  value: "value1"
-  effect: "NoExecute"
-```
-
-In this case, the pod will not be able to schedule onto the node, because there is no
-toleration matching the third taint. But it will be able to continue running if it is
-already running on the node when the taint is added, because the third taint is the only
-one of the three that is not tolerated by the pod.
-
-Normally, if a taint with effect `NoExecute` is added to a node, then any pods that do
-not tolerate the taint will be evicted immediately, and any pods that do tolerate the
-taint will never be evicted. However, a toleration with `NoExecute` effect can specify
-an optional `tolerationSeconds` field that dictates how long the pod will stay bound
-to the node after the taint is added. For example,
-
-```yaml
-tolerations:
-- key: "key1"
-  operator: "Equal"
-  value: "value1"
-  effect: "NoExecute"
-  tolerationSeconds: 3600
-```
-
-means that if this pod is running and a matching taint is added to the node, then
-the pod will stay bound to the node for 3600 seconds, and then be evicted. If the
-taint is removed before that time, the pod will not be evicted.
-
-### Example use cases
-
-Taints and tolerations are a flexible way to steer pods away from nodes or evict
-pods that shouldn't be running. A few of the use cases are
-
-* **dedicated nodes**: If you want to dedicate a set of nodes for exclusive use by
-a particular set of users, you can add a taint to those nodes (say,
-`kubectl taint nodes nodename dedicated=groupName:NoSchedule`) and then add a corresponding
-toleration to their pods (this would be done most easily by writing a custom
-[admission controller](/docs/admin/admission-controllers/)).
-The pods with the tolerations will then be allowed to use the tainted (dedicated) nodes as
-well as any other nodes in the cluster. If you want to dedicate the nodes to them *and*
-ensure they *only* use the dedicated nodes, then you should additionally add a label similar
-to the taint to the same set of nodes (e.g. `dedicated=groupName`), and the admission
-controller should additionally add a node affinity to require that the pods can only schedule
-onto nodes labeled with `dedicated=groupName`.
-
-* **nodes with special hardware**: In a cluster where a small subset of nodes have specialized
-hardware (for example GPUs), it is desirable to keep pods that don't need the specialized
-hardware off of those nodes, thus leaving room for later-arriving pods that do need the
-specialized hardware. This can be done by tainting the nodes that have the specialized
-hardware (e.g. `kubectl taint nodes nodename special=true:NoSchedule` or
-`kubectl taint nodes nodename special=true:PreferNoSchedule`) and adding a corresponding
-toleration to pods that use the special hardware. As in the dedicated nodes use case,
-it is probably easiest to apply the tolerations using a custom
-[admission controller](/docs/admin/admission-controllers/)).
-For example, the admission controller could use
-some characteristic(s) of the pod to determine that the pod should be allowed to use
-the special nodes and hence the admission controller should add the toleration.
-To ensure that the pods that need
-the special hardware *only* schedule onto the nodes that have the special hardware, you will need some
-additional mechanism, e.g. you could represent the special resource using
-[opaque integer resources](/docs/concepts/configuration/manage-compute-resources-container/#opaque-integer-resources-alpha-feature)
-and request it as a resource in the PodSpec, or you could label the nodes that have
-the special hardware and use node affinity on the pods that need the hardware.
-
-* **per-pod-configurable eviction behavior when there are node problems (alpha feature)**,
-which is described in the next section.
-
-### Per-pod-configurable eviction behavior when there are node problems (alpha feature)
-
-Earlier we mentioned the `NoExecute` taint effect, which affects pods that are already
-running on the node as follows
-
-* pods that do not tolerate the taint are evicted immediately
-* pods that tolerate the taint without specifying `tolerationSeconds` in
-their toleration specification remain bound forever
-* pods that tolerate the taint with a specified `tolerationSeconds` remain
-bound for the specified amount of time
-
-The above behavior is a beta feature. In addition, Kubernetes 1.6 has alpha
-support for representing node problems (currently only "node unreachable" and
-"node not ready", corresponding to the NodeCondition "Ready" being "Unknown" or
-"False" respectively) as taints. When the `TaintBasedEvictions` alpha feature
-is enabled (you can do this by including `TaintBasedEvictions=true` in `--feature-gates`, such as
-`--feature-gates=FooBar=true,TaintBasedEvictions=true`), the taints are automatically
-added by the NodeController and the normal logic for evicting pods from nodes
-based on the Ready NodeCondition is disabled.
-(Note: To maintain the existing [rate limiting](/docs/concepts/architecture/nodes/)
-behavior of pod evictions due to node problems, the system actually adds the taints
-in a rate-limited way. This prevents massive pod evictions in scenarios such
-as the master becoming partitioned from the nodes.)
-This alpha feature, in combination with `tolerationSeconds`, allows a pod
-to specify how long it should stay bound to a node that has one or both of these problems.
-
-For example, an application with a lot of local state might want to stay
-bound to node for a long time in the event of network partition, in the hope
-that the partition will recover and thus the pod eviction can be avoided.
-The toleration the pod would use in that case would look like
-
-```yaml
-tolerations:
-- key: "node.alpha.kubernetes.io/unreachable"
-  operator: "Exists"
-  effect: "NoExecute"
-  tolerationSeconds: 6000
-```
-
-(For the node not ready case, change the key to `node.alpha.kubernetes.io/notReady`.)
-
-Note that Kubernetes automatically adds a toleration for
-`node.alpha.kubernetes.io/notReady` with `tolerationSeconds=300`
-unless the pod configuration provided
-by the user already has a toleration for `node.alpha.kubernetes.io/notReady`.
-Likewise it adds a toleration for
-`node.alpha.kubernetes.io/unreachable` with `tolerationSeconds=300`
-unless the pod configuration provided
-by the user already has a toleration for `node.alpha.kubernetes.io/unreachable`.
-
-These automatically-added tolerations ensure that
-the default pod behavior of remaining bound for 5 minutes after one of these
-problems is detected is maintained.
-The two default tolerations are added by the [DefaultTolerationSeconds
-admission controller](https://git.k8s.io/kubernetes/plugin/pkg/admission/defaulttolerationseconds).
-
-[DaemonSet](/docs/concepts/workloads/controllers/daemonset/) pods are created with
-`NoExecute` tolerations for `node.alpha.kubernetes.io/unreachable` and `node.alpha.kubernetes.io/notReady`
-with no `tolerationSeconds`. This ensures that DaemonSet pods are never evicted due
-to these problems, which matches the behavior when this feature is disabled.
+You may want to check [Taints](/docs/concepts/configuration/taint-and-toleration/)
+as well, which allow a *node* to *repel* a set of pods.

@@ -0,0 +1,257 @@
---
approvers:
- davidopp
- kevin-wangzefeng
- bsalamat
title: Taints and Tolerations
---

Node affinity, described [here](/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature),
is a property of *pods* that *attracts* them to a set of nodes (either as a
preference or a hard requirement). Taints are the opposite -- they allow a
*node* to *repel* a set of pods.

Taints and tolerations work together to ensure that pods are not scheduled
onto inappropriate nodes. One or more taints are applied to a node; this
marks that the node should not accept any pods that do not tolerate the taints.
Tolerations are applied to pods, and allow (but do not require) the pods to schedule
onto nodes with matching taints.

## Concepts

You add a taint to a node using [kubectl taint](/docs/user-guide/kubectl/v1.7/#taint).
For example,

```shell
kubectl taint nodes node1 key=value:NoSchedule
```

places a taint on node `node1`. The taint has key `key`, value `value`, and taint effect `NoSchedule`.
This means that no pod will be able to schedule onto `node1` unless it has a matching toleration.
You specify a toleration for a pod in the PodSpec. Both of the following tolerations "match" the
taint created by the `kubectl taint` line above, and thus a pod with either toleration would be able
to schedule onto `node1`:

```yaml
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
```

```yaml
tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"
```

A toleration "matches" a taint if the keys are the same and the effects are the same, and:

* the `operator` is `Exists` (in which case no `value` should be specified), or
* the `operator` is `Equal` and the `value`s are equal

`operator` defaults to `Equal` if not specified.
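
For instance, the following toleration omits `operator` entirely, which makes it
equivalent to the first example above with `operator: "Equal"` spelled out:

```yaml
tolerations:
- key: "key"
  value: "value"
  effect: "NoSchedule"
```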

**NOTE:** There are two special cases:

* An empty `key` with operator `Exists` matches all keys, values and effects which means this
will tolerate everything.

```yaml
tolerations:
- operator: "Exists"
```

* An empty `effect` matches all effects with key `key`.

```yaml
tolerations:
- key: "key"
  operator: "Exists"
```

The above example used `effect` of `NoSchedule`. Alternatively, you can use `effect` of `PreferNoSchedule`.
This is a "preference" or "soft" version of `NoSchedule` -- the system will *try* to avoid placing a
pod that does not tolerate the taint on the node, but it is not required. The third kind of `effect` is
`NoExecute`, described later.

You can put multiple taints on the same node and multiple tolerations on the same pod.
The way Kubernetes processes multiple taints and tolerations is like a filter: start
with all of a node's taints, then ignore the ones for which the pod has a matching toleration; the
remaining un-ignored taints have the indicated effects on the pod. In particular,

* if there is at least one un-ignored taint with effect `NoSchedule` then Kubernetes will not schedule
the pod onto that node
* if there is no un-ignored taint with effect `NoSchedule` but there is at least one un-ignored taint with
effect `PreferNoSchedule` then Kubernetes will *try* to not schedule the pod onto the node
* if there is at least one un-ignored taint with effect `NoExecute` then the pod will be evicted from
the node (if it is already running on the node), and will not be
scheduled onto the node (if it is not yet running on the node).

For example, imagine you taint a node like this

```shell
kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:NoExecute
kubectl taint nodes node1 key2=value2:NoSchedule
```

And a pod has two tolerations:

```yaml
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
```

In this case, the pod will not be able to schedule onto the node, because there is no
toleration matching the third taint. But it will be able to continue running if it is
already running on the node when the taint is added, because the third taint is the only
one of the three that is not tolerated by the pod.

Normally, if a taint with effect `NoExecute` is added to a node, then any pods that do
not tolerate the taint will be evicted immediately, and any pods that do tolerate the
taint will never be evicted. However, a toleration with `NoExecute` effect can specify
an optional `tolerationSeconds` field that dictates how long the pod will stay bound
to the node after the taint is added. For example,

```yaml
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
```

means that if this pod is running and a matching taint is added to the node, then
the pod will stay bound to the node for 3600 seconds, and then be evicted. If the
taint is removed before that time, the pod will not be evicted.
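
To remove a taint, `kubectl taint` accepts a trailing `-`. For example, the
following removes the `NoExecute` taint with key `key1` that was added above:

```shell
kubectl taint nodes node1 key1:NoExecute-
```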

## Example Use Cases

Taints and tolerations are a flexible way to steer pods *away* from nodes or evict
pods that shouldn't be running. A few of the use cases are

* **Dedicated Nodes**: If you want to dedicate a set of nodes for exclusive use by
a particular set of users, you can add a taint to those nodes (say,
`kubectl taint nodes nodename dedicated=groupName:NoSchedule`) and then add a corresponding
toleration to their pods (this would be done most easily by writing a custom
[admission controller](/docs/admin/admission-controllers/)).
The pods with the tolerations will then be allowed to use the tainted (dedicated) nodes as
well as any other nodes in the cluster. If you want to dedicate the nodes to them *and*
ensure they *only* use the dedicated nodes, then you should additionally add a label similar
to the taint to the same set of nodes (e.g. `dedicated=groupName`), and the admission
controller should additionally add a node affinity to require that the pods can only schedule
onto nodes labeled with `dedicated=groupName` (see the sketch after this list).

* **Nodes with Special Hardware**: In a cluster where a small subset of nodes have specialized
hardware (for example GPUs), it is desirable to keep pods that don't need the specialized
hardware off of those nodes, thus leaving room for later-arriving pods that do need the
specialized hardware. This can be done by tainting the nodes that have the specialized
hardware (e.g. `kubectl taint nodes nodename special=true:NoSchedule` or
`kubectl taint nodes nodename special=true:PreferNoSchedule`) and adding a corresponding
toleration to pods that use the special hardware. As in the dedicated nodes use case,
it is probably easiest to apply the tolerations using a custom
[admission controller](/docs/admin/admission-controllers/).
For example, the admission controller could use
some characteristic(s) of the pod to determine that the pod should be allowed to use
the special nodes and hence the admission controller should add the toleration.
To ensure that the pods that need
the special hardware *only* schedule onto the nodes that have the special hardware, you will need some
additional mechanism, e.g. you could represent the special resource using
[opaque integer resources](/docs/concepts/configuration/manage-compute-resources-container/#opaque-integer-resources-alpha-feature)
and request it as a resource in the PodSpec, or you could label the nodes that have
the special hardware and use node affinity on the pods that need the hardware.

* **Taint based Evictions (alpha feature)**: A per-pod-configurable eviction behavior
when there are node problems, which is described in the next section.
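
The following is a minimal sketch of the PodSpec fields such a custom admission
controller might inject for the dedicated nodes use case above. The `dedicated`
taint key and the `dedicated=groupName` node label come from that example; the
rest is one possible way to combine a toleration with node affinity:

```yaml
# Hypothetical PodSpec fragment for the dedicated-nodes use case.
tolerations:
- key: "dedicated"            # tolerates the dedicated=groupName:NoSchedule taint
  operator: "Equal"
  value: "groupName"
  effect: "NoSchedule"
affinity:
  nodeAffinity:               # additionally require scheduling onto the labeled nodes
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "dedicated"
          operator: "In"
          values: ["groupName"]
```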

## Taint based Evictions

Earlier we mentioned the `NoExecute` taint effect, which affects pods that are already
running on the node as follows

* pods that do not tolerate the taint are evicted immediately
* pods that tolerate the taint without specifying `tolerationSeconds` in
their toleration specification remain bound forever
* pods that tolerate the taint with a specified `tolerationSeconds` remain
bound for the specified amount of time

The above behavior is a beta feature. In addition, Kubernetes 1.6 has alpha
support for representing node problems. In other words, the node controller
automatically taints a node when certain conditions are true. The built-in taints
currently include:

* `node.alpha.kubernetes.io/notReady`: Node is not ready. This corresponds to
the NodeCondition `Ready` being "`False`".
* `node.alpha.kubernetes.io/unreachable`: Node is unreachable from the node
controller. This corresponds to the NodeCondition `Ready` being "`Unknown`".
* `node.kubernetes.io/outOfDisk`: Node becomes out of disk.
* `node.kubernetes.io/memoryPressure`: Node has memory pressure.
* `node.kubernetes.io/diskPressure`: Node has disk pressure.
* `node.kubernetes.io/networkUnavailable`: Node's network is unavailable.
* `node.cloudprovider.kubernetes.io/uninitialized`: When the kubelet is started
with an "external" cloud provider, it sets this taint on a node to mark it
as unusable. When a controller from the cloud-controller-manager initializes
this node, the kubelet removes this taint.
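
You can check which taints are currently on a node, including any that were
added automatically, by describing the node and looking at its `Taints` field:

```shell
kubectl describe node node1
```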

When the `TaintBasedEvictions` alpha feature is enabled (you can do this by
including `TaintBasedEvictions=true` in `--feature-gates`, such as
`--feature-gates=FooBar=true,TaintBasedEvictions=true`), the taints are automatically
added by the NodeController (or kubelet) and the normal logic for evicting pods from nodes
based on the Ready NodeCondition is disabled.
(Note: To maintain the existing [rate limiting](/docs/concepts/architecture/nodes/)
behavior of pod evictions due to node problems, the system actually adds the taints
in a rate-limited way. This prevents massive pod evictions in scenarios such
as the master becoming partitioned from the nodes.)
This alpha feature, in combination with `tolerationSeconds`, allows a pod
to specify how long it should stay bound to a node that has one or both of these problems.

For example, an application with a lot of local state might want to stay
bound to the node for a long time in the event of network partition, in the hope
that the partition will recover and thus the pod eviction can be avoided.
The toleration the pod would use in that case would look like

```yaml
tolerations:
- key: "node.alpha.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
```
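
For the not-ready case, an analogous toleration (a sketch using the
`node.alpha.kubernetes.io/notReady` key from the list above) would be:

```yaml
tolerations:
- key: "node.alpha.kubernetes.io/notReady"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
```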

Note that Kubernetes automatically adds a toleration for
`node.alpha.kubernetes.io/notReady` with `tolerationSeconds=300`
unless the pod configuration provided
by the user already has a toleration for `node.alpha.kubernetes.io/notReady`.
Likewise it adds a toleration for
`node.alpha.kubernetes.io/unreachable` with `tolerationSeconds=300`
unless the pod configuration provided
by the user already has a toleration for `node.alpha.kubernetes.io/unreachable`.

These automatically-added tolerations preserve the default pod behavior of
remaining bound for 5 minutes after one of these problems is detected.
The two default tolerations are added by the [DefaultTolerationSeconds
admission controller](https://git.k8s.io/kubernetes/plugin/pkg/admission/defaulttolerationseconds).
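
In other words, unless a pod overrides them, it behaves as if it carried
tolerations like the following sketch (the `NoExecute` effect is implied,
since `tolerationSeconds` only applies to `NoExecute` tolerations):

```yaml
# Sketch of the tolerations the DefaultTolerationSeconds admission
# controller injects when a pod does not specify its own.
tolerations:
- key: "node.alpha.kubernetes.io/notReady"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.alpha.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```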

[DaemonSet](/docs/concepts/workloads/controllers/daemonset/) pods are created with
`NoExecute` tolerations for the following taints with no `tolerationSeconds`:

* `node.alpha.kubernetes.io/unreachable`
* `node.alpha.kubernetes.io/notReady`
* `node.kubernetes.io/memoryPressure`
* `node.kubernetes.io/diskPressure`
* `node.kubernetes.io/outOfDisk` (*only for critical pods*)

This ensures that DaemonSet pods are never evicted due to these problems,
which matches the behavior when this feature is disabled.
@@ -4152,7 +4152,7 @@ When an object is created, the system will populate this list with the current s
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">tolerations</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">If specified, the pod’s tolerations.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">If specified, the pod’s tolerations. More info: <a href="https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/">https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/</a></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">false</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#_v1_toleration">v1.Toleration</a> array</p></td>
 <td class="tableblock halign-left valign-top"></td>

@@ -1167,7 +1167,7 @@ Appears In <a href="#pod-v1-core">Pod</a> <a href="#podtemplatespec-v1-core">Pod
 </tr>
 <tr>
 <td>tolerations <br /> <em><a href="#toleration-v1-core">Toleration</a> array</em></td>
-<td>If specified, the pod's tolerations.</td>
+<td>If specified, the pod's tolerations. More info: <a href="http://kubernetes.io/docs/concepts/configuration/taint-and-toleration/">http://kubernetes.io/docs/concepts/configuration/taint-and-toleration/</a></td>
 </tr>
 <tr>
 <td>volumes <br /> <em><a href="#volume-v1-core">Volume</a> array</em></td>

@@ -1261,7 +1261,7 @@ Appears In:
 </tr>
 <tr>
 <td>tolerations <br /> <em><a href="#toleration-v1-core">Toleration</a> array</em></td>
-<td>If specified, the pod's tolerations.</td>
+<td>If specified, the pod's tolerations. More info: <a href="https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/">https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/</a></td>
 </tr>
 <tr>
 <td>volumes <br /> <em><a href="#volume-v1-core">Volume</a> array</em> <br /> <strong>patch type</strong>: <em>merge</em> <br /> <strong>patch merge key</strong>: <em>name</em></td>

@@ -123,6 +123,7 @@ prevent cross talk, or advanced networking policy.
 
 By default, there are no restrictions on which nodes may run a pod. Kubernetes offers a
 [rich set of policies for controlling placement of pods onto nodes](/docs/concepts/configuration/assign-pod-node/)
+and the [taint based pod placement and eviction](/docs/concepts/configuration/taint-and-toleration)
 that are available to end users. For many clusters use of these policies to separate workloads
 can be a convention that authors adopt or enforce via tooling.