Merge pull request #41282 from tengqm/tweak-probes-task

Tidy up the probes task page
This commit is contained in:
Kubernetes Prow Robot 2023-06-20 18:20:20 -07:00 committed by GitHub
commit da0d5c530e
1 changed file with 92 additions and 94 deletions


@@ -44,11 +44,8 @@ Understand the difference between readiness and liveness probes and when to appl
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}}
<!-- steps -->
## Define a liveness command
@@ -95,14 +92,14 @@ kubectl describe pod liveness-exec
The output indicates that no liveness probes have failed yet:
```none
Type    Reason     Age   From               Message
----    ------     ----  ----               -------
Normal  Scheduled  11s   default-scheduler  Successfully assigned default/liveness-exec to node01
Normal  Pulling    9s    kubelet, node01    Pulling image "registry.k8s.io/busybox"
Normal  Pulled     7s    kubelet, node01    Successfully pulled image "registry.k8s.io/busybox"
Normal  Created    7s    kubelet, node01    Created container liveness
Normal  Started    7s    kubelet, node01    Started container liveness
```
After 35 seconds, view the Pod events again:
@@ -114,16 +111,16 @@ kubectl describe pod liveness-exec
At the bottom of the output, there are messages indicating that the liveness
probes have failed, and the failed containers have been killed and recreated.
```none
Type     Reason     Age                From               Message
----     ------     ----               ----               -------
Normal   Scheduled  57s                default-scheduler  Successfully assigned default/liveness-exec to node01
Normal   Pulling    55s                kubelet, node01    Pulling image "registry.k8s.io/busybox"
Normal   Pulled     53s                kubelet, node01    Successfully pulled image "registry.k8s.io/busybox"
Normal   Created    53s                kubelet, node01    Created container liveness
Normal   Started    53s                kubelet, node01    Started container liveness
Warning  Unhealthy  10s (x3 over 20s)  kubelet, node01    Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal   Killing    10s                kubelet, node01    Container liveness failed liveness probe, will be restarted
```
Wait another 30 seconds, and verify that the container has been restarted:
@@ -132,9 +129,10 @@ Wait another 30 seconds, and verify that the container has been restarted:
```shell
kubectl get pod liveness-exec
```
The output shows that `RESTARTS` has been incremented. Note that the `RESTARTS` counter
increments as soon as a failed container comes back to the running state:
```none
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   1          1m
```
@@ -142,8 +140,7 @@ liveness-exec 1/1 Running 1 1m
## Define a liveness HTTP request
Another kind of liveness probe uses an HTTP GET request. Here is the configuration
file for a Pod that runs a container based on the `registry.k8s.io/liveness` image.
{{< codenew file="pods/probe/http-liveness.yaml" >}}
@@ -196,9 +193,6 @@ the container has been restarted:
```shell
kubectl describe pod liveness-http
```
In releases after v1.13, local HTTP proxy environment variable settings do not
affect the HTTP liveness probe.
@@ -240,7 +234,8 @@ kubectl describe pod goproxy
{{< feature-state for_k8s_version="v1.24" state="beta" >}}
If your application implements the
[gRPC Health Checking Protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md),
this example shows how to configure Kubernetes to use it for application liveness checks.
Similarly, you can configure readiness and startup probes.
@@ -251,19 +246,19 @@ Here is an example manifest:
To use a gRPC probe, `port` must be configured. If you want to distinguish probes of different types
and probes for different features, you can use the `service` field.
You can set `service` to the value `liveness` and make your gRPC Health Checking endpoint
respond to this request differently than when you set `service` to `readiness`.
This lets you use the same endpoint for different kinds of container health checks
rather than listening on two different ports.
If you want to specify your own custom service name and also specify a probe type,
the Kubernetes project recommends that you use a name that concatenates
those. For example: `myservice-liveness` (using `-` as a separator).
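As an illustrative sketch (the port number and service names here are assumptions, not taken from this page), a single gRPC endpoint can serve both probe kinds by varying `service`:

```yaml
livenessProbe:
  grpc:
    port: 2379          # hypothetical gRPC port
    service: liveness   # the health endpoint can branch on this value
readinessProbe:
  grpc:
    port: 2379
    service: readiness
```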
{{< note >}}
Unlike HTTP or TCP probes, you cannot specify the health check port by name, and you
cannot configure a custom hostname.
{{< /note >}}
Configuration problems (for example: incorrect port or service, unimplemented health checking protocol)
are considered a probe failure, similar to HTTP and TCP probes.
To try the gRPC liveness check, create a Pod using the command below.
@@ -279,23 +274,24 @@ After 15 seconds, view Pod events to verify that the liveness check has not fail
```shell
kubectl describe pod etcd-with-grpc
```
Before Kubernetes 1.23, gRPC health probes were often implemented using
[grpc-health-probe](https://github.com/grpc-ecosystem/grpc-health-probe/),
as described in the blog post
[Health checking gRPC servers on Kubernetes](/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/).
The built-in gRPC probe's behavior is similar to the one implemented by grpc-health-probe.
When migrating from grpc-health-probe to built-in probes, remember the following differences:
- Built-in probes run against the pod IP address, unlike grpc-health-probe, which often runs against
  `127.0.0.1`. Be sure to configure your gRPC endpoint to listen on the Pod's IP address.
- Built-in probes do not support any authentication parameters (like `-tls`).
- There are no error codes for built-in probes. All errors are considered probe failures.
- If the `ExecProbeTimeout` feature gate is set to `false`, grpc-health-probe does **not**
  respect the `timeoutSeconds` setting (which defaults to 1s), while the built-in probe would fail on timeout.
## Use a named port
You can use a named [`port`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#ports)
for HTTP and TCP probes. gRPC probes do not support named ports.
For example:
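A sketch of what such a configuration can look like (the port name, path, and port number are illustrative):

```yaml
ports:
- name: liveness-port     # illustrative port name
  containerPort: 8080
livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port   # refers to the named port declared above
```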
@@ -367,7 +363,9 @@ Readiness probes run on the container during its whole lifecycle.
{{< /note >}}
{{< caution >}}
Liveness probes *do not* wait for readiness probes to succeed.
If you want to wait before executing a liveness probe you should use
`initialDelaySeconds` or a `startupProbe`.
{{< /caution >}}
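One way to apply that advice, sketched with illustrative endpoint and values: a `startupProbe` with a generous `failureThreshold` holds off the liveness probe until the application has started:

```yaml
startupProbe:
  httpGet:
    path: /healthz        # illustrative endpoint
    port: 8080
  failureThreshold: 30    # tolerate up to 30 * 10s = 300s of startup time
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10       # only runs after the startup probe first succeeds
```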
Readiness probes are configured similarly to liveness probes. The only difference
@@ -392,37 +390,34 @@ for it, and that containers are restarted when they fail.
## Configure Probes
<!--Eventually, some of this section could be moved to a concept topic.-->
[Probes](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#probe-v1-core)
have a number of fields that you can use to more precisely control the behavior of startup,
liveness and readiness checks:
* `initialDelaySeconds`: Number of seconds after the container has started before startup,
  liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
* `periodSeconds`: How often (in seconds) to perform the probe. Defaults to 10 seconds.
  The minimum value is 1.
* `timeoutSeconds`: Number of seconds after which the probe times out.
  Defaults to 1 second. Minimum value is 1.
* `successThreshold`: Minimum consecutive successes for the probe to be considered successful
  after having failed. Defaults to 1. Must be 1 for liveness and startup probes.
  Minimum value is 1.
* `failureThreshold`: After a probe fails `failureThreshold` times in a row, Kubernetes
  considers that the overall check has failed: the container is _not_ ready/healthy/live.
  For the case of a startup or liveness probe, if at least `failureThreshold` probes have
  failed, Kubernetes treats the container as unhealthy and triggers a restart for that
  specific container. The kubelet honors the setting of `terminationGracePeriodSeconds`
  for that container.
  For a failed readiness probe, the kubelet continues running the container that failed
  checks, and also continues to run more probes; because the check failed, the kubelet
  sets the `Ready` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions)
  on the Pod to `false`.
* `terminationGracePeriodSeconds`: configure a grace period for the kubelet to wait between
  triggering a shut down of the failed container, and then forcing the container runtime to stop
  that container.
  The default is to inherit the Pod-level value for `terminationGracePeriodSeconds`
  (30 seconds if not specified), and the minimum value is 1.
  See [probe-level `terminationGracePeriodSeconds`](#probe-level-terminationgraceperiodseconds)
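Putting the fields above together, here is a sketch of a probe tuned with these knobs (the endpoint and the specific values are illustrative, not recommendations):

```yaml
livenessProbe:
  httpGet:
    path: /healthz                  # illustrative endpoint
    port: 8080
  initialDelaySeconds: 5            # wait 5s after the container starts
  periodSeconds: 10                 # probe every 10s
  timeoutSeconds: 2                 # each attempt times out after 2s
  failureThreshold: 3               # 3 consecutive failures trigger a restart
  terminationGracePeriodSeconds: 60 # probe-level grace period for the failed container
```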
@@ -435,16 +430,16 @@ until a result was returned.
This defect was corrected in Kubernetes v1.20. You may have been relying on the previous behavior,
even without realizing it, as the default timeout is 1 second.
As a cluster administrator, you can disable the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`ExecProbeTimeout` (set it to `false`) on each kubelet to restore the behavior from older versions,
then remove that override once all the exec probes in the cluster have a `timeoutSeconds` value set.
If you have pods that are impacted by the default 1 second timeout, you should update their
probe timeout so that you're ready for the eventual removal of that feature gate.
With the fix of the defect, for exec probes, on Kubernetes `1.20+` with the `dockershim` container runtime,
the process inside the container may keep running even after the probe returned failure because of the timeout.
{{< /note >}}
{{< caution >}}
Incorrect implementation of readiness probes may result in an ever-growing number
of processes in the container, and resource starvation if this is left unchecked.
@@ -456,15 +451,15 @@ of processes in the container, and resource starvation if this is left unchecked
have additional fields that can be set on `httpGet`:
* `host`: Host name to connect to, defaults to the pod IP. You probably want to
  set "Host" in `httpHeaders` instead.
* `scheme`: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to "HTTP".
* `path`: Path to access on the HTTP server. Defaults to "/".
* `httpHeaders`: Custom headers to set in the request. HTTP allows repeated headers.
* `port`: Name or number of the port to access on the container. Number must be
  in the range 1 to 65535.
For an HTTP probe, the kubelet sends an HTTP request to the specified port and
path to perform the check. The kubelet sends the probe to the Pod's IP address,
unless the address is overridden by the optional `host` field in `httpGet`. If the
`scheme` field is set to `HTTPS`, the kubelet sends an HTTPS request skipping the
certificate verification. In most scenarios, you do not want to set the `host` field.
@@ -474,10 +469,12 @@ to 127.0.0.1. If your pod relies on virtual hosts, which is probably the more co
case, you should not use `host`, but rather set the `Host` header in `httpHeaders`.
For an HTTP probe, the kubelet sends two request headers in addition to the mandatory `Host` header:
- `User-Agent`: The default value is `kube-probe/{{< skew currentVersion >}}`,
  where `{{< skew currentVersion >}}` is the version of the kubelet.
- `Accept`: The default value is `*/*`.
You can override the default headers by defining `httpHeaders` for the probe.
For example:
```yaml
livenessProbe:
@@ -511,7 +508,7 @@ startupProbe:
### TCP probes
For a TCP probe, the kubelet makes the probe connection at the node, not in the Pod, which
means that you cannot use a service name in the `host` parameter since the kubelet is unable
to resolve it.
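A minimal sketch of a TCP probe (the port number is illustrative); the kubelet simply attempts to open a socket to the Pod on that port:

```yaml
readinessProbe:
  tcpSocket:
    port: 8080            # illustrative; a successful connection means the check passes
  initialDelaySeconds: 5
  periodSeconds: 10
```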
@@ -519,13 +516,13 @@ to resolve it.
{{< feature-state for_k8s_version="v1.27" state="stable" >}}
Prior to release 1.21, the Pod-level `terminationGracePeriodSeconds` was used
for terminating a container that failed its liveness or startup probe. This
coupling was unintended and may have resulted in failed containers taking an
unusually long time to restart when a Pod-level `terminationGracePeriodSeconds`
was set.
In 1.25 and above, users can specify a probe-level `terminationGracePeriodSeconds`
as part of the probe specification. When both a pod- and probe-level
`terminationGracePeriodSeconds` are set, the kubelet will use the probe-level value.
@@ -534,20 +531,20 @@ Beginning in Kubernetes 1.25, the `ProbeTerminationGracePeriod` feature is enabl
by default. For users choosing to disable this feature, please note the following:
* The `ProbeTerminationGracePeriod` feature gate is only available on the API Server.
  The kubelet always honors the probe-level `terminationGracePeriodSeconds` field if
  it is present on a Pod.
* If you have existing Pods where the `terminationGracePeriodSeconds` field is set and
  you no longer wish to use per-probe termination grace periods, you must delete
  those existing Pods.
* When you (or the control plane, or some other component) create replacement
  Pods, and the feature gate `ProbeTerminationGracePeriod` is disabled, then the
  API server ignores the probe-level `terminationGracePeriodSeconds` field, even if
  a Pod or pod template specifies it.
{{< /note >}}
For example:
```yaml
spec:
@@ -577,10 +574,11 @@ It will be rejected by the API server.
## {{% heading "whatsnext" %}}
* Learn more about
  [Container Probes](/docs/concepts/workloads/pods/pod-lifecycle/#container-probes).
You can also read the API references for:
* [Pod](/docs/reference/kubernetes-api/workload-resources/pod-v1/), and specifically:
  * [container(s)](/docs/reference/kubernetes-api/workload-resources/pod-v1/#Container)
  * [probe(s)](/docs/reference/kubernetes-api/workload-resources/pod-v1/#Probe)