resolved node concept mixes up graceful and non graceful node shutdown
This commit is contained in:
parent
07371c33b6
commit
67bbfbc486
|
|
@ -454,50 +454,6 @@ Message: Pod was terminated in response to imminent node shutdown.
|
|||
|
||||
{{< /note >}}
|
||||
|
||||
## Non Graceful node shutdown {#non-graceful-node-shutdown}
|
||||
|
||||
{{< feature-state state="beta" for_k8s_version="v1.26" >}}
|
||||
|
||||
A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
|
||||
either because the command does not trigger the inhibitor locks mechanism used by
|
||||
kubelet or because of a user error, i.e., the ShutdownGracePeriod and
|
||||
ShutdownGracePeriodCriticalPods are not configured properly. Please refer to above
|
||||
section [Graceful Node Shutdown](#graceful-node-shutdown) for more details.
|
||||
|
||||
When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
|
||||
that are part of a StatefulSet will be stuck in terminating status on
|
||||
the shutdown node and cannot move to a new running node. This is because kubelet on
|
||||
the shutdown node is not available to delete the pods so the StatefulSet cannot
|
||||
create a new pod with the same name. If there are volumes used by the pods, the
|
||||
VolumeAttachments will not be deleted from the original shutdown node so the volumes
|
||||
used by these pods cannot be attached to a new running node. As a result, the
|
||||
application running on the StatefulSet cannot function properly. If the original
|
||||
shutdown node comes up, the pods will be deleted by kubelet and new pods will be
|
||||
created on a different running node. If the original shutdown node does not come up,
|
||||
these pods will be stuck in terminating status on the shutdown node forever.
|
||||
|
||||
To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either `NoExecute`
|
||||
or `NoSchedule` effect to a Node marking it out-of-service.
|
||||
If the `NodeOutOfServiceVolumeDetach`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
is enabled on `kube-controller-manager`, and a Node is marked out-of-service with this taint, the
|
||||
pods on the node will be forcefully deleted if there are no matching tolerations on it and volume
|
||||
detach operations for the pods terminating on the node will happen immediately. This allows the
|
||||
Pods on the out-of-service node to recover quickly on a different node.
|
||||
|
||||
During a non-graceful shutdown, Pods are terminated in the two phases:
|
||||
|
||||
1. Force delete the Pods that do not have matching `out-of-service` tolerations.
|
||||
2. Immediately perform detach volume operation for such pods.
|
||||
|
||||
{{< note >}}
|
||||
- Before adding the taint `node.kubernetes.io/out-of-service` , it should be verified
|
||||
that the node is already in shutdown or power off state (not in the middle of
|
||||
restarting).
|
||||
- The user is required to manually remove the out-of-service taint after the pods are
|
||||
moved to a new node and the user has checked that the shutdown node has been
|
||||
recovered since the user was the one who originally added the taint.
|
||||
{{< /note >}}
|
||||
|
||||
### Pod Priority based graceful node shutdown {#pod-priority-graceful-node-shutdown}
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.23" >}}
|
||||
|
|
@ -596,6 +552,50 @@ the feature is Beta and is enabled by default.
|
|||
Metrics `graceful_shutdown_start_time_seconds` and `graceful_shutdown_end_time_seconds`
|
||||
are emitted under the kubelet subsystem to monitor node shutdowns.
|
||||
|
||||
## Non Graceful node shutdown {#non-graceful-node-shutdown}
|
||||
|
||||
{{< feature-state state="beta" for_k8s_version="v1.26" >}}
|
||||
|
||||
A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
|
||||
either because the command does not trigger the inhibitor locks mechanism used by
|
||||
kubelet or because of a user error, i.e., the ShutdownGracePeriod and
|
||||
ShutdownGracePeriodCriticalPods are not configured properly. Please refer to above
|
||||
section [Graceful Node Shutdown](#graceful-node-shutdown) for more details.
|
||||
|
||||
When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
|
||||
that are part of a StatefulSet will be stuck in terminating status on
|
||||
the shutdown node and cannot move to a new running node. This is because kubelet on
|
||||
the shutdown node is not available to delete the pods so the StatefulSet cannot
|
||||
create a new pod with the same name. If there are volumes used by the pods, the
|
||||
VolumeAttachments will not be deleted from the original shutdown node so the volumes
|
||||
used by these pods cannot be attached to a new running node. As a result, the
|
||||
application running on the StatefulSet cannot function properly. If the original
|
||||
shutdown node comes up, the pods will be deleted by kubelet and new pods will be
|
||||
created on a different running node. If the original shutdown node does not come up,
|
||||
these pods will be stuck in terminating status on the shutdown node forever.
|
||||
|
||||
To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either `NoExecute`
|
||||
or `NoSchedule` effect to a Node marking it out-of-service.
|
||||
If the `NodeOutOfServiceVolumeDetach`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
is enabled on `kube-controller-manager`, and a Node is marked out-of-service with this taint, the
|
||||
pods on the node will be forcefully deleted if there are no matching tolerations on it and volume
|
||||
detach operations for the pods terminating on the node will happen immediately. This allows the
|
||||
Pods on the out-of-service node to recover quickly on a different node.
|
||||
|
||||
During a non-graceful shutdown, Pods are terminated in the two phases:
|
||||
|
||||
1. Force delete the Pods that do not have matching `out-of-service` tolerations.
|
||||
2. Immediately perform detach volume operation for such pods.
|
||||
|
||||
{{< note >}}
|
||||
- Before adding the taint `node.kubernetes.io/out-of-service` , it should be verified
|
||||
that the node is already in shutdown or power off state (not in the middle of
|
||||
restarting).
|
||||
- The user is required to manually remove the out-of-service taint after the pods are
|
||||
moved to a new node and the user has checked that the shutdown node has been
|
||||
recovered since the user was the one who originally added the taint.
|
||||
{{< /note >}}
|
||||
|
||||
## Swap memory management {#swap-memory}
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.22" >}}
|
||||
|
|
|
|||
Loading…
Reference in New Issue