Improve docs for HorizontalPodAutoscaler
Co-authored-by: Chris Negus <cnegus@redhat.com>
---
reviewers:
- jszczepkowski
- justinsb
- directxman12
title: HorizontalPodAutoscaler Walkthrough
content_type: task
weight: 100
min-kubernetes-server-version: 1.23
---

<!-- overview -->

A [HorizontalPodAutoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/)
(HPA for short)
automatically updates a workload resource (such as
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more
{{< glossary_tooltip text="Pods" term_id="pod" >}}.
This is different from _vertical_ scaling, which for Kubernetes would mean
assigning more resources (for example: memory or CPU) to the Pods that are already
running for the workload.

If the load decreases, and the number of Pods is above the configured minimum,
the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet,
or other similar resource) to scale back down.

This document walks you through an example of enabling HorizontalPodAutoscaler to
automatically manage scale for an example web app. This example workload is Apache
httpd running some PHP code.

## {{% heading "prerequisites" %}}

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} If you're running an older
release of Kubernetes, refer to the version of the documentation for that release (see
[available documentation versions](/docs/home/supported-doc-versions/)).

To follow this walkthrough, you also need to use a cluster that has a
[Metrics Server](https://github.com/kubernetes-sigs/metrics-server#readme) deployed and configured.
The Kubernetes Metrics Server collects resource metrics from
the {{<glossary_tooltip term_id="kubelet" text="kubelets">}} in your cluster, and exposes those metrics
through the [Kubernetes API](/docs/concepts/overview/kubernetes-api/),
using an [APIService](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) to add
new kinds of resource that represent metric readings.

To learn how to deploy the Metrics Server, see the
[metrics-server documentation](https://github.com/kubernetes-sigs/metrics-server#deployment).

<!-- steps -->

## Run and expose php-apache server

To demonstrate a HorizontalPodAutoscaler, you will first make a custom container image that uses
the `php-apache` image from Docker Hub as its starting point. The `Dockerfile` is ready-made for you,
and has the following content:

```dockerfile
FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php
```

This code defines a simple `index.php` page that performs some CPU intensive computations,
in order to simulate load in your cluster.

```php
<?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
  }
  echo "OK!";
?>
```

Once you have made that container image, start a Deployment that runs a container using the
image you made, and expose it as a {{< glossary_tooltip term_id="service">}}
using the following manifest:

{{< codenew file="application/php-apache.yaml" >}}
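
The exact contents live in the linked `php-apache.yaml` file; as a rough sketch (the image name
and CPU values here are illustrative assumptions, not the authoritative file), such a
Deployment-plus-Service pair has this shape:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example  # assumed example image; substitute the image you built
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m  # the HPA computes CPU utilization as a percentage of this request
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
```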

To do so, run the following command:

```shell
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
```

The output is similar to:

```
deployment.apps/php-apache created
service/php-apache created
```

## Create the HorizontalPodAutoscaler {#create-horizontal-pod-autoscaler}

Now that the server is running, create the autoscaler using `kubectl`. There is a
[`kubectl autoscale`](/docs/reference/generated/kubectl/kubectl-commands#autoscale) subcommand,
part of `kubectl`, that helps you do this.

You will shortly run a command that creates a HorizontalPodAutoscaler that maintains
between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that
you created in the first step of these instructions.

Roughly speaking, the HPA {{<glossary_tooltip text="controller" term_id="controller">}} will increase and decrease
the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%.
The Deployment then updates the ReplicaSet - this is part of how all Deployments work in Kubernetes -
and then the ReplicaSet either adds or removes Pods based on the change to its `.spec`.

See [Algorithm details](/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for more details
on the algorithm.

Create the HorizontalPodAutoscaler:

```shell
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
```

The output is similar to:

```
horizontalpodautoscaler.autoscaling/php-apache autoscaled
```

You can check the current status of the newly-made HorizontalPodAutoscaler by running:

```shell
# You can use "hpa" or "horizontalpodautoscaler"; either name works OK.
kubectl get hpa
```

The output is similar to:

```
NAME         REFERENCE                     TARGET    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%  1         10        1          18s
```

(if you see other HorizontalPodAutoscalers with different names, that means they already existed,
and that isn't usually a problem).

Please note that the current CPU consumption is 0% as there are no clients sending requests to the server
(the ``TARGET`` column shows the average across all the Pods controlled by the corresponding deployment).

## Increase the load {#increase-load}

Next, see how the autoscaler reacts to increased load.
To do this, you'll start a different Pod to act as a client. The container within the client Pod
runs in an infinite loop, sending queries to the php-apache service.

```shell
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```

Now run:

```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```

Within a minute or so, you should see the higher CPU load; for example:

```
NAME         REFERENCE                     TARGET      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%  1         10        1          3m
```

and then, more replicas. For example:

```
NAME         REFERENCE                     TARGET      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%  1         10        7          3m
```

Here, CPU consumption has increased to 305% of the request.
As a result, the Deployment was resized to 7 replicas:

```shell
kubectl get deployment php-apache
```

You should see the replica count matching the figure from the HorizontalPodAutoscaler:

```
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   7/7     7            7           19m
```

{{< note >}}
It may take a few minutes to stabilize the number of replicas. Since the amount
of load is not controlled in any way it may happen that the final number of replicas
will differ from this example.
{{< /note >}}

## Stop generating load {#stop-load}

To finish the example, stop sending the load.

In the terminal where you created the Pod that runs a `busybox` image, terminate
the load generation by typing `<Ctrl> + C`.

Then verify the result state (after a minute or so):

```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```

The output is similar to:

```
NAME         REFERENCE                     TARGET    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%  1         10        1          11m
```

and the Deployment also shows that it has scaled down:

```shell
kubectl get deployment php-apache
```

The output is similar to:

```
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           27m
```

Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.

{{< note >}}
Autoscaling the replicas may take a few minutes.
{{< /note >}}

<!-- discussion -->
For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first,
`AbleToScale`, indicates whether or not the HPA is able to fetch and update scales, as well as
whether or not any backoff-related conditions would prevent scaling. The second, `ScalingActive`,
indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and
is able to calculate desired scales. The last condition, `ScalingLimited`, indicates that the desired scale
was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an indication that
you may wish to raise or lower the minimum or maximum replica count constraints on your
HorizontalPodAutoscaler.

## Quantities

All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using
a special whole-number notation known in Kubernetes as a
{{< glossary_tooltip term_id="quantity" text="quantity" >}}. The metrics APIs
will return whole numbers without a suffix when possible, and will generally return
quantities in milli-units otherwise. This means you might see your metric value fluctuate
between `1` and `1500m`, or `1` and `1.5` when written in decimal notation.

## Other possible scenarios

### Creating the autoscaler declaratively

Instead of using the `kubectl autoscale` command to create a HorizontalPodAutoscaler imperatively, we
can use the following manifest to create it declaratively:

{{< codenew file="application/hpa/php-apache.yaml" >}}
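
That manifest is the declarative equivalent of the earlier `kubectl autoscale` command; as a
sketch (assuming the `autoscaling/v1` API that `kubectl autoscale` itself uses), it looks
roughly like this:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50  # same target as --cpu-percent=50
```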

Then, create the autoscaler by executing the following command:

```shell
kubectl create -f https://k8s.io/examples/application/hpa/php-apache.yaml
```

---
reviewers:
- fgrzadkowski
- jszczepkowski
- directxman12
title: Horizontal Pod Autoscaling
feature:
  title: Horizontal scaling
  description: >
    Scale your application up and down with a simple command, with a UI, or automatically based on CPU usage.

content_type: concept
weight: 90
---

<!-- overview -->

In Kubernetes, a _HorizontalPodAutoscaler_ automatically updates a workload resource (such as
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more
{{< glossary_tooltip text="Pods" term_id="pod" >}}.
This is different from _vertical_ scaling, which for Kubernetes would mean
assigning more resources (for example: memory or CPU) to the Pods that are already
running for the workload.

If the load decreases, and the number of Pods is above the configured minimum,
the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet,
or other similar resource) to scale back down.

Horizontal pod autoscaling does not apply to objects that can't be scaled (for example:
a {{< glossary_tooltip text="DaemonSet" term_id="daemonset" >}}.)

The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a
{{< glossary_tooltip text="controller" term_id="controller" >}}.
The resource determines the behavior of the controller.
The horizontal pod autoscaling controller, running within the Kubernetes
{{< glossary_tooltip text="control plane" term_id="control-plane" >}}, periodically adjusts the
desired scale of its target (for example, a Deployment) to match observed metrics such as average
CPU utilization, average memory utilization, or any other custom metric you specify.

<!-- body -->

## How does a HorizontalPodAutoscaler work?

{{< figure src="/images/docs/horizontal-pod-autoscaler.svg" caption="HorizontalPodAutoscaler controls the scale of a Deployment and its ReplicaSet" class="diagram-medium">}}

Kubernetes implements horizontal pod autoscaling as a control loop that runs intermittently
(it is not a continuous process). The interval is set by the
`--horizontal-pod-autoscaler-sync-period` parameter to the
[`kube-controller-manager`](/docs/reference/command-line-tools-reference/kube-controller-manager/)
(and the default interval is 15 seconds).

Once during each period, the controller manager queries the resource utilization against the
metrics specified in each HorizontalPodAutoscaler definition. The controller manager
obtains the metrics from either the resource metrics API (for per-pod resource metrics),
or the custom metrics API (for all other metrics).

* For per-pod resource metrics (like CPU), the controller fetches the metrics
  from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler.
  Then, if a target utilization value is set, the controller calculates the utilization
  value as a percentage of the equivalent
  [resource request](/docs/concepts/configuration/manage-resources-containers/#requests-and-limits)
  on the containers in each Pod (see the sketch after this list). If a target raw value is set,
  the raw metric values are used directly.
  The controller then takes the mean of the utilization or the raw value (depending on the type
  of target specified) across all targeted Pods, and produces a ratio used to scale
  the number of desired replicas.

  Please note that if some of the Pod's containers do not have the relevant resource request set,
  CPU utilization for the Pod will not be defined and the autoscaler will
  not take any action for that metric. See the [algorithm details](#algorithm-details) section below
  for more information about how the autoscaling algorithm works.

* For per-pod custom metrics, the controller functions similarly to per-pod resource metrics,
  except that it works with raw values, not utilization values.

* For object metrics and external metrics, a single metric is fetched, which describes
  the object in question. This metric is compared to the target value, to produce a
  ratio as above. In the `autoscaling/v2` API
  version, this value can optionally be divided by the number of Pods before the
  comparison is made.
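
As a reminder of what that utilization is measured against, here is a minimal (hypothetical)
container spec fragment showing the CPU request that per-pod CPU utilization is computed from:

```yaml
# Fragment of a Pod template; only the fields relevant to utilization are shown.
containers:
- name: application
  image: example.com/my-app:v1  # hypothetical image
  resources:
    requests:
      cpu: 200m   # 50% utilization for this container means 100m of actual CPU usage
    limits:
      cpu: 500m
```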

The common use for HorizontalPodAutoscaler is to configure it to fetch metrics from
{{< glossary_tooltip text="aggregated APIs" term_id="aggregation-layer" >}}
(`metrics.k8s.io`, `custom.metrics.k8s.io`, or `external.metrics.k8s.io`). The `metrics.k8s.io` API is
usually provided by an add-on named Metrics Server, which needs to be launched separately.
For more information about resource metrics, see
[Metrics Server](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#metrics-server).

[Support for metrics APIs](#support-for-metrics-apis) explains the stability guarantees and support status for these
different APIs.

The HorizontalPodAutoscaler controller accesses corresponding workload resources that support scaling (such as Deployments
and StatefulSet). These resources each have a subresource named `scale`, an interface that allows you to dynamically set the
number of replicas and examine each of their current states.
For general information about subresources in the Kubernetes API, see
[Kubernetes API Concepts](/docs/reference/using-api/api-concepts/).
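
For illustration, the `scale` subresource that the controller reads and writes is a small
object; retrieved for a Deployment it looks roughly like the following (a sketch of the
`autoscaling/v1` `Scale` kind, trimmed of some metadata):

```yaml
apiVersion: autoscaling/v1
kind: Scale
metadata:
  name: php-apache        # matches the workload resource's name
  namespace: default
spec:
  replicas: 3             # writing this field changes the workload's scale
status:
  replicas: 3             # how many replicas currently exist
  selector: run=php-apache
```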

### Algorithm details

From the most basic perspective, the HorizontalPodAutoscaler controller
operates on the ratio between desired metric value and current metric
value:

```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```

For example, if the current metric value is `200m`, and the desired value
is `100m`, the number of replicas will be doubled, since `200.0 / 100.0 ==
2.0`. If the current value is instead `50m`, you'll halve the number of
replicas, since `50.0 / 100.0 == 0.5`. The control plane skips any scaling
action if the ratio is sufficiently close to 1.0 (within a globally-configurable
tolerance, 0.1 by default).

When a `targetAverageValue` or `targetAverageUtilization` is specified,
the `currentMetricValue` is computed by taking the average of the given
metric across all Pods in the HorizontalPodAutoscaler's scale target.

Before checking the tolerance and deciding on the final values, the control
plane also considers whether any metrics are missing, and how many Pods
are [`Ready`](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
All Pods with a deletion timestamp set (objects with a deletion timestamp are
in the process of being shut down / removed) are ignored, and all failed Pods
are discarded.

If a particular Pod is missing metrics, it is set aside for later; Pods
with missing metrics will be used to adjust the final scaling amount.

When scaling on CPU, if any pod has yet to become ready (it's still
initializing, or possibly is unhealthy) *or* the most recent metric point for
the pod was before it became ready, that pod is set aside as well.

Due to technical constraints, the HorizontalPodAutoscaler controller
cannot exactly determine the first time a pod becomes ready when
determining whether to set aside certain CPU metrics; the relevant cut-offs are
controlled by the `--horizontal-pod-autoscaler-initial-readiness-delay` flag (30 seconds
by default) and the `--horizontal-pod-autoscaler-cpu-initialization-period` flag, whose
default is 5 minutes.

The `currentMetricValue / desiredMetricValue` base scale ratio is then
calculated using the remaining pods not set aside or discarded from above.

If there were any missing metrics, the control plane recomputes the average more
conservatively, assuming those pods were consuming 100% of the desired
value in case of a scale down, and 0% in case of a scale up. This dampens
the magnitude of any potential scale.

Furthermore, if any not-yet-ready pods were present, and the workload would have
scaled up without factoring in missing metrics or not-yet-ready pods,
the controller conservatively assumes that the not-yet-ready pods are consuming 0%
of the desired metric, further dampening the magnitude of a scale up.

After factoring in the not-yet-ready pods and missing metrics, the
controller recalculates the usage ratio. If the new ratio reverses the scale
direction, or is within the tolerance, the controller doesn't take any scaling
action. In other cases, the new ratio is used to decide any change to the
number of Pods.

Note that the *original* value for the average utilization is reported
back via the HorizontalPodAutoscaler status, without factoring in the
not-yet-ready pods or missing metrics, even when the new usage ratio is used.

When you create a HorizontalPodAutoscaler API object, make sure the name specified
is a valid [DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
More details about the API object can be found at
[HorizontalPodAutoscaler Object](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#horizontalpodautoscaler-v2-autoscaling).

## Stability of workload scale {#flapping}

When managing the scale of a group of replicas using the HorizontalPodAutoscaler,
it is possible that the number of replicas keeps fluctuating frequently due to the
dynamic nature of the metrics evaluated. This is sometimes referred to as *thrashing*,
or *flapping*. It's similar to the concept of *hysteresis* in cybernetics.

## Autoscaling during rolling update

If you perform a rolling update of a StatefulSet that has an autoscaled number of
replicas, the StatefulSet directly manages its set of Pods (there is no intermediate resource
similar to ReplicaSet).
## Support for resource metrics

Any HPA target can be scaled based on the resource usage of the pods in the scaling target.

{{< note >}}
With Pod-based resource metrics,
a single container might be running with high usage and the HPA will not scale out because the overall
pod usage is still within acceptable limits.
{{< /note >}}

### Container resource metrics

{{< feature-state for_k8s_version="v1.20" state="alpha" >}}

The HorizontalPodAutoscaler API also supports a container metric source where the HPA can track the
resource usage of individual containers across a set of Pods, in order to scale the target resource.
This lets you configure scaling thresholds for the containers that matter most in a particular Pod.
For example, if you have a web application and a logging sidecar, you can scale based on the resource
usage of the web application, ignoring the sidecar container and its resource usage.

If the specified container in the metric source is not present or only present in a subset
of the pods then those pods are ignored and the recommendation is recalculated. See [Algorithm](#algorithm-details)
for more details about the calculation. To use container resources for autoscaling define a metric
source as follows:

```yaml
type: ContainerResource
containerResource:
  name: cpu
  container: application
  target:
    type: Utilization
    averageUtilization: 60
```

In the above example the HPA controller scales the target such that the average CPU utilization
in the `application` container of all the Pods is 60%.

{{< note >}}
Once you have rolled out the container name change to the workload resource, tidy up by removing
the old container name from the HPA specification.
{{< /note >}}

## Scaling on custom metrics

{{< feature-state for_k8s_version="v1.23" state="stable" >}}

(the `autoscaling/v2beta2` API version previously provided this ability as a beta feature)

Provided that you use the `autoscaling/v2` API version, you can configure a HorizontalPodAutoscaler
to scale based on a custom metric (that is not built in to Kubernetes or any Kubernetes component).
The HorizontalPodAutoscaler controller then queries for these custom metrics from the Kubernetes
API.

See [Support for metrics APIs](#support-for-metrics-apis) for the requirements.
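
As a hedged illustration (the metric name and value here are hypothetical and depend on what
your metrics pipeline exposes), a custom per-pod metric in a v2 HorizontalPodAutoscaler's
`metrics` list might look like:

```yaml
# One entry in a v2 HorizontalPodAutoscaler's .spec.metrics list
- type: Pods
  pods:
    metric:
      name: packets-per-second   # hypothetical metric served by your adapter
    target:
      type: AverageValue
      averageValue: 1k           # scale so Pods average 1000 packets per second
```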

## Scaling on multiple metrics

{{< feature-state for_k8s_version="v1.23" state="stable" >}}

(the `autoscaling/v2beta2` API version previously provided this ability as a beta feature)

Provided that you use the `autoscaling/v2` API version, you can specify multiple metrics for a
HorizontalPodAutoscaler to scale on. Then, the HorizontalPodAutoscaler controller evaluates each metric,
and proposes a new scale based on that metric. The HorizontalPodAutoscaler takes the maximum scale
recommended for each metric and sets the workload to that size (provided that this isn't larger than the
overall maximum that you configured).
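
Sketching that in the `autoscaling/v2` API (the CPU target is standard; the second,
Pods-type metric is a hypothetical example and assumes a custom metrics adapter is installed):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: 1k
# The controller computes a proposed replica count for each metric
# and applies the largest one (capped at maxReplicas).
```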
## Support for metrics APIs

By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these
APIs, cluster administrators must ensure that:

* The [API aggregation layer](/docs/tasks/extend-kubernetes/configure-aggregation-layer/) is enabled.

* The corresponding APIs are registered:

  * For resource metrics, this is the `metrics.k8s.io` API, generally provided by [metrics-server](https://github.com/kubernetes-sigs/metrics-server).
    It can be launched as a cluster addon.

  * For custom metrics, this is the `custom.metrics.k8s.io` API. It's provided by "adapter" API servers provided by metrics solution vendors.
    Check with your metrics pipeline to see if there is a Kubernetes metrics adapter available.

  * For external metrics, this is the `external.metrics.k8s.io` API. It may be provided by the custom metrics adapters provided above.

For examples of how to use them see [the walkthrough for using custom metrics](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics)
and [the walkthrough for using external metrics](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects).

## Configurable scaling behavior

{{< feature-state for_k8s_version="v1.23" state="stable" >}}

(the `autoscaling/v2beta2` API version previously provided this ability as a beta feature)

If you use the `v2` HorizontalPodAutoscaler API, you can use the `behavior` field
(see the [API reference](/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2/#HorizontalPodAutoscalerSpec))
to configure separate scale-up and scale-down behaviors.
You specify these behaviours by setting `scaleUp` and / or `scaleDown`
under the `behavior` field.

You can specify a _stabilization window_ that prevents [flapping](#flapping) of
the replica count for a scaling target. Scaling policies also let you control the
rate of change of replicas while scaling.

### Scaling policies

One or more scaling policies can be specified in the `behavior` section of the spec.
When multiple policies are specified, the policy which allows the highest amount of
change is the one selected by default.
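
For instance, a sketch of a `behavior` section with two scale-down policies (the specific
numbers are illustrative):

```yaml
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4          # allow removing at most 4 Pods...
      periodSeconds: 60 # ...per 60-second period
    - type: Percent
      value: 10         # or at most 10% of current replicas per period
      periodSeconds: 60
```

With these two policies and the default selection, whichever policy permits the larger change
wins in any given period.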

The policy selection can be changed by specifying the `selectPolicy` field for a scaling
direction. Setting the value to `Min` selects the policy which allows the
smallest change in the replica count. Setting the value to `Disabled` completely disables
scaling in that direction.

### Stabilization window

The stabilization window is used to restrict the [flapping](#flapping) of
the replica count when the metrics used for scaling keep fluctuating. The autoscaling algorithm
uses this window to infer a previous desired state and avoid unwanted changes to workload
scale.

For example, in the following snippet, a stabilization window is specified for `scaleDown`.

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
```

When the metrics indicate that the target should be scaled down the algorithm looks
into previously computed desired states, and uses the highest value from the specified
interval. In the above example, all desired states from the past 5 minutes will be considered.

This approximates a rolling maximum, and avoids having the scaling algorithm frequently
remove Pods only to trigger recreating an equivalent Pod just moments later.

### Default Behavior
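
For reference, the built-in defaults are equivalent to specifying a `behavior` along these
lines (a sketch based on the documented v2 defaults; consult the API reference for the
authoritative values):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100        # allow removing all surplus replicas the window's verdict permits
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max   # pick whichever policy allows the bigger increase
```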

The `selectPolicy` value of `Disabled` turns off scaling in a given direction;
for instance, to prevent downscaling entirely you would use:

```yaml
behavior:
  scaleDown:
    selectPolicy: Disabled
```

## Support for HorizontalPodAutoscaler in kubectl

HorizontalPodAutoscaler, like every API resource, is supported in a standard way by `kubectl`.
You can create a new autoscaler using the `kubectl create` command.
You can list autoscalers with `kubectl get hpa` or get a detailed description with `kubectl describe hpa`.
Finally, you can delete an autoscaler using `kubectl delete hpa`.

In addition, there is a special `kubectl autoscale` command for creating a HorizontalPodAutoscaler object.
For instance, executing `kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80`
will create an autoscaler for ReplicaSet *foo*, with target CPU utilization set to `80%`
and the number of replicas between 2 and 5.

## Implicit maintenance-mode deactivation

You can implicitly deactivate the HPA for a target without the
need to change the HPA configuration itself. If the target's desired replica count
is set to 0, and the HPA's minimum replica count is greater than 0, the HPA
stops adjusting the target until you reactivate it by manually adjusting the target's desired
replica count or HPA's minimum replica count.

## {{% heading "whatsnext" %}}

If you configure autoscaling in your cluster, you may also want to consider running a
cluster-level autoscaler such as [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler).

For more information on HorizontalPodAutoscaler:

* Read a [walkthrough example](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) for horizontal pod autoscaling.
* Read documentation for [`kubectl autoscale`](/docs/reference/generated/kubectl/kubectl-commands/#autoscale).
* If you would like to write your own custom metrics adapter, check out the
  [boilerplate](https://github.com/kubernetes-sigs/custom-metrics-apiserver) to get started.
* Read the [API reference](/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler/) for HorizontalPodAutoscaler.