Concepts/ClusterAdministration: Expand Node Autoscaling documentation

This commit is contained in:
Kuba Tużnik 2024-04-08 15:52:21 +02:00
parent 4d3749acb9
commit 7c99a67b36
2 changed files with 284 additions and 118 deletions


@@ -1,118 +0,0 @@
---
title: Cluster Autoscaling
linkTitle: Cluster Autoscaling
description: >-
Automatically manage the nodes in your cluster to adapt to demand.
content_type: concept
weight: 120
---
<!-- overview -->
Kubernetes requires {{< glossary_tooltip text="nodes" term_id="node" >}} in your cluster to
run {{< glossary_tooltip text="pods" term_id="pod" >}}. This means providing capacity for
the workload Pods and for Kubernetes itself.
You can adjust the amount of resources available in your cluster automatically:
_node autoscaling_. You can either change the number of nodes, or change the capacity
that nodes provide. The first approach is referred to as _horizontal scaling_, while the
second is referred to as _vertical scaling_.
Kubernetes can even provide multidimensional automatic scaling for nodes.
<!-- body -->
## Manual node management
You can manually manage node-level capacity, where you configure a fixed amount of nodes;
you can use this approach even if the provisioning (the process to set up, manage, and
decommission) for these nodes is automated.
This page is about taking the next step, and automating management of the amount of
node capacity (CPU, memory, and other node resources) available in your cluster.
## Automatic horizontal scaling {#autoscaling-horizontal}
### Cluster Autoscaler
You can use the [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) to manage the scale of your nodes automatically.
The cluster autoscaler can integrate with a cloud provider, or with Kubernetes'
[cluster API](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md),
to achieve the actual node management that's needed.
The cluster autoscaler adds nodes when there are unschedulable Pods, and
removes nodes when those nodes are empty.
#### Cloud provider integrations {#cluster-autoscaler-providers}
The [README](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/README.md)
for the cluster autoscaler lists some of the cloud provider integrations
that are available.
## Cost-aware multidimensional scaling {#autoscaling-multi-dimension}
### Karpenter {#autoscaler-karpenter}
[Karpenter](https://karpenter.sh/) supports direct node management, via
plugins that integrate with specific cloud providers, and can manage nodes
for you whilst optimizing for overall cost.
> Karpenter automatically launches just the right compute resources to
> handle your cluster's applications. It is designed to let you take
> full advantage of the cloud with fast and simple compute provisioning
> for Kubernetes clusters.
The Karpenter tool is designed to integrate with a cloud provider that
provides API-driven server management, and where the price information for
available servers is also available via a web API.
For example, if you start some more Pods in your cluster, the Karpenter
tool might buy a new node that is larger than one of the nodes you are
already using, and then shut down an existing node once the new node
is in service.
#### Cloud provider integrations {#karpenter-providers}
{{% thirdparty-content vendor="true" %}}
There are integrations available between Karpenter's core and the following
cloud providers:
- [Amazon Web Services](https://github.com/aws/karpenter-provider-aws)
- [Azure](https://github.com/Azure/karpenter-provider-azure)
## Related components
### Descheduler
The [descheduler](https://github.com/kubernetes-sigs/descheduler) can help you
consolidate Pods onto a smaller number of nodes, to help with automatic scale down
when the cluster has spare capacity.
### Sizing a workload based on cluster size
#### Cluster proportional autoscaler
For workloads that need to be scaled based on the size of the cluster (for example
`cluster-dns` or other system components), you can use the
[_Cluster Proportional Autoscaler_](https://github.com/kubernetes-sigs/cluster-proportional-autoscaler).<br />
The Cluster Proportional Autoscaler watches the number of schedulable nodes
and cores, and scales the number of replicas of the target workload accordingly.
#### Cluster proportional vertical autoscaler
If the number of replicas should stay the same, you can scale your workloads vertically according to the cluster size using
the [_Cluster Proportional Vertical Autoscaler_](https://github.com/kubernetes-sigs/cluster-proportional-vertical-autoscaler).
This project is in **beta** and can be found on GitHub.
While the Cluster Proportional Autoscaler scales the number of replicas of a workload, the Cluster Proportional Vertical Autoscaler
adjusts the resource requests for a workload (for example a Deployment or DaemonSet) based on the number of nodes and/or cores
in the cluster.
## {{% heading "whatsnext" %}}
- Read about [workload-level autoscaling](/docs/concepts/workloads/autoscaling/)
- Read about [node overprovisioning](/docs/tasks/administer-cluster/node-overprovisioning/)


@@ -0,0 +1,284 @@
---
reviewers:
- gjtempleton
- jonathan-innis
- maciekpytel
title: Node Autoscaling
linkTitle: Node Autoscaling
description: >-
Automatically provision and consolidate the Nodes in your cluster to adapt to demand and optimize cost.
content_type: concept
weight: 15
---
In order to run workloads in your cluster, you need
{{< glossary_tooltip text="Nodes" term_id="node" >}}. Nodes in your cluster can be _autoscaled_:
dynamically [_provisioned_](#provisioning) or [_consolidated_](#consolidation) to provide needed
capacity while optimizing cost. Autoscaling is performed by Node [_autoscalers_](#autoscalers).
## Node provisioning {#provisioning}
If there are Pods in a cluster that can't be scheduled on existing Nodes, new Nodes can be
automatically added to the cluster&mdash;_provisioned_&mdash;to accommodate the Pods. This is
especially useful if the number of Pods changes over time, for example as a result of
[combining horizontal workload with Node autoscaling](#horizontal-workload-autoscaling).
Autoscalers provision the Nodes by creating and deleting cloud provider resources backing them. Most
commonly, the resources backing the Nodes are Virtual Machines.
The main goal of provisioning is to make all Pods schedulable. This goal is not always attainable
because of various limitations, including reaching configured provisioning limits, provisioning
configuration that is not compatible with a particular set of Pods, or a lack of cloud provider
capacity. While provisioning, Node autoscalers often try to achieve additional goals (for example
minimizing the cost of the provisioned Nodes or balancing the number of Nodes between failure
domains).
There are two main inputs to a Node autoscaler when determining Nodes to
provision&mdash;[Pod scheduling constraints](#provisioning-pod-constraints),
and [Node constraints imposed by autoscaler configuration](#provisioning-node-constraints).
Autoscaler configuration may also include other Node provisioning triggers (for example the number
of Nodes falling below a configured minimum limit).
{{< note >}}
Provisioning was formerly known as _scale-up_ in Cluster Autoscaler.
{{< /note >}}
### Pod scheduling constraints {#provisioning-pod-constraints}
Pods can express [scheduling constraints](/docs/concepts/scheduling-eviction/assign-pod-node/) to
impose limitations on the kind of Nodes they can be scheduled on. Node autoscalers take these
constraints into account to ensure that the pending Pods can be scheduled on the provisioned Nodes.
The most common kind of scheduling constraints are the resource requests specified by Pod
containers. Autoscalers will make sure that the provisioned Nodes have enough resources to satisfy
the requests. However, they don't directly take into account the real resource usage of the Pods
after they start running. In order to autoscale Nodes based on actual workload resource usage, you
can combine [horizontal workload autoscaling](#horizontal-workload-autoscaling) with Node
autoscaling.
Other common Pod scheduling constraints include
[Node affinity](/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity),
[inter-Pod affinity](/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity),
and requirements for a particular [storage volume](/docs/concepts/storage/volumes/).
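For illustration, here is a minimal Pod that expresses two such constraints: container resource
requests, and a `nodeSelector` requiring a Node label (the image name and the label value are
placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
spec:
  # The autoscaler must provision a Node carrying this label
  # before the Pod can be scheduled.
  nodeSelector:
    topology.kubernetes.io/zone: example-zone-a
  containers:
  - name: app
    image: registry.example/app:v1
    resources:
      # The provisioned Node must have at least this much
      # unreserved CPU and memory.
      requests:
        cpu: "500m"
        memory: 1Gi
```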
### Node constraints imposed by autoscaler configuration {#provisioning-node-constraints}
The specifics of the provisioned Nodes (for example the amount of resources, the presence of a given
label) depend on autoscaler configuration. Autoscalers can either choose them from a pre-defined set
of Node configurations, or use [auto-provisioning](#autoprovisioning).
### Auto-provisioning {#autoprovisioning}
Node auto-provisioning is a mode of provisioning in which a user doesn't have to fully configure the
specifics of the Nodes that can be provisioned. Instead, the autoscaler dynamically chooses the Node
configuration based on the pending Pods it's reacting to, as well as pre-configured constraints (for
example, the minimum amount of resources or the need for a given label).
## Node consolidation {#consolidation}
The main consideration when running a cluster is ensuring that all schedulable Pods are running,
whilst keeping the cost of the cluster as low as possible. To achieve this, the Pods' resource
requests should utilize as much of the Nodes' resources as possible. From this perspective, the
overall Node utilization in a cluster can be used as a proxy for how cost-effective the cluster is.
{{< note >}}
Correctly setting the resource requests of your Pods is as important to the overall
cost-effectiveness of a cluster as optimizing Node utilization.
Combining Node autoscaling with [vertical workload autoscaling](#vertical-workload-autoscaling) can
help you achieve this.
{{< /note >}}
Nodes in your cluster can be automatically _consolidated_ in order to improve the overall Node
utilization, and in turn the cost-effectiveness of the cluster. Consolidation happens through
removing a set of underutilized Nodes from the cluster. Optionally, a different set of Nodes can
be [provisioned](#provisioning) to replace them.
Consolidation, like provisioning, only considers Pod resource requests and not real resource usage
when making decisions.
For the purpose of consolidation, a Node is considered _empty_ if it only has DaemonSet and static
Pods running on it. Removing empty Nodes during consolidation is more straightforward than removing
non-empty ones, and autoscalers often have optimizations designed specifically for consolidating
empty Nodes.
Removing non-empty Nodes during consolidation is disruptive&mdash;the Pods running on them are
terminated, and possibly have to be recreated (for example by a Deployment). However, all such
recreated Pods should be able to schedule on existing Nodes in the cluster, or the replacement Nodes
provisioned as part of consolidation. __No Pods should normally become pending as a result of
consolidation.__
{{< note >}}
Autoscalers predict how a recreated Pod will likely be scheduled after a Node is provisioned or
consolidated, but they don't control the actual scheduling. Because of this, some Pods might
become pending as a result of consolidation (for example, if a completely new Pod appears while
consolidation is being performed.
{{< /note >}}
Autoscaler configuration may also enable triggering consolidation by other conditions (for example,
the time elapsed since a Node was created), in order to optimize different properties (for example,
the maximum lifespan of Nodes in a cluster).
The details of how consolidation is performed depend on the configuration of a given autoscaler.
{{< note >}}
Consolidation was formerly known as _scale-down_ in Cluster Autoscaler.
{{< /note >}}
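You can bound the disruption caused by consolidation at the workload level: Node autoscalers
generally respect [PodDisruptionBudgets](/docs/concepts/workloads/pods/disruptions/) when draining
Nodes. A minimal sketch, assuming your workload's Pods carry an `app: example-app` label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  # Never drain below 2 running replicas, even during consolidation.
  minAvailable: 2
  selector:
    matchLabels:
      app: example-app
```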
## Autoscalers {#autoscalers}
The functionalities described in previous sections are provided by Node _autoscalers_. In addition
to the Kubernetes API, autoscalers also need to interact with cloud provider APIs to provision and
consolidate Nodes. This means that they need to be explicitly integrated with each supported cloud
provider. The performance and feature set of a given autoscaler can differ between cloud provider
integrations.
{{< mermaid >}}
graph TD
na[Node autoscaler]
k8s[Kubernetes]
cp[Cloud Provider]
k8s --> |get Pods/Nodes|na
na --> |drain Nodes|k8s
na --> |create/remove resources backing Nodes|cp
cp --> |get resources backing Nodes|na
classDef white_on_blue fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff;
classDef blue_on_white fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
class na blue_on_white;
class k8s,cp white_on_blue;
{{</ mermaid >}}
### Autoscaler implementations
[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
and [Karpenter](https://github.com/kubernetes-sigs/karpenter) are the two Node autoscalers currently
sponsored by [SIG Autoscaling](https://github.com/kubernetes/community/tree/master/sig-autoscaling).
From the perspective of a cluster user, both autoscalers should provide a similar Node autoscaling
experience. Both will provision new Nodes for unschedulable Pods, and both will consolidate the
Nodes that are no longer optimally utilized.
Different autoscalers may also provide features outside the Node autoscaling scope described on this
page, and those additional features may differ between them.
Consult the sections below, and the linked documentation for the individual autoscalers to decide
which autoscaler fits your use case better.
#### Cluster Autoscaler
Cluster Autoscaler adds Nodes to, and removes Nodes from, pre-configured _Node groups_. Node groups generally map
to some sort of cloud provider resource group (most commonly a Virtual Machine group). A single
instance of Cluster Autoscaler can simultaneously manage multiple Node groups. When provisioning,
Cluster Autoscaler will add Nodes to the group that best fits the requests of pending Pods. When
consolidating, Cluster Autoscaler always selects specific Nodes to remove, as opposed to just
resizing the underlying cloud provider resource group.
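To illustrate what pre-configured Node groups can look like, below is a fragment of a Cluster
Autoscaler container spec that registers two Node groups through the `--nodes=<min>:<max>:<name>`
flag. The provider, image tag, group names, and size bounds are placeholders, and the flags that
apply depend on the cloud provider integration:

```yaml
# Fragment of a Cluster Autoscaler Deployment; not a complete manifest.
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws            # example provider
  - --nodes=1:10:general-purpose    # min:max:<Node group name>
  - --nodes=0:5:gpu-nodes           # a second, independently scaled group
```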
Additional context:
* [Documentation overview](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/README.md)
* [Cloud provider integrations](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/README.md#faqdocumentation)
* [Cluster Autoscaler FAQ](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
* [Contact](https://github.com/kubernetes/community/tree/master/sig-autoscaling#contact)
#### Karpenter
Karpenter auto-provisions Nodes based on [NodePool](https://karpenter.sh/docs/concepts/nodepools/)
configurations provided by the cluster operator. Karpenter handles all aspects of Node lifecycle,
not just autoscaling. This includes automatically refreshing Nodes once they reach a certain
lifetime, and auto-upgrading Nodes when new worker Node images are released. It works directly with
individual cloud provider resources (most commonly individual Virtual Machines), and doesn't rely on
cloud provider resource groups.
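As a sketch of what auto-provisioning constraints can look like, the following NodePool lets
Karpenter pick any instance shape satisfying the listed requirements, up to an overall CPU limit.
It assumes the `karpenter.sh/v1beta1` API and omits the cloud-provider-specific `nodeClassRef`
that a real NodePool also needs; field names can differ between Karpenter versions:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # Constraints on the Nodes Karpenter may provision,
      # rather than a fixed Node configuration.
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
  # Stop provisioning once the pool's Nodes reach 100 CPUs in total.
  limits:
    cpu: "100"
  disruption:
    # Consolidate Nodes whenever they are underutilized.
    consolidationPolicy: WhenUnderutilized
```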
Additional context:
* [Documentation](https://karpenter.sh/)
* [Cloud provider integrations](https://github.com/kubernetes-sigs/karpenter?tab=readme-ov-file#karpenter-implementations)
* [Karpenter FAQ](https://karpenter.sh/docs/faq/)
* [Contact](https://github.com/kubernetes-sigs/karpenter#community-discussion-contribution-and-support)
#### Implementation comparison
Main differences between Cluster Autoscaler and Karpenter:
* Cluster Autoscaler provides features related to just Node autoscaling. Karpenter has a wider
scope, and also provides features intended for managing Node lifecycle altogether (for example,
utilizing disruption to auto-recreate Nodes once they reach a certain lifetime, or auto-upgrade
them to new versions).
* Cluster Autoscaler doesn't support auto-provisioning; the Node groups it can provision from have
  to be pre-configured. Karpenter supports auto-provisioning, so the user only has to configure a
  set of constraints for the provisioned Nodes, instead of fully configuring homogeneous groups.
* Cluster Autoscaler provides cloud provider integrations directly, which means that they're a part
of the Kubernetes project. For Karpenter, the Kubernetes project publishes Karpenter as a library
that cloud providers can integrate with to build a Node autoscaler.
* Cluster Autoscaler provides integrations with numerous cloud providers, including smaller and less
popular providers. Fewer cloud providers integrate with Karpenter; existing integrations include
[AWS](https://github.com/aws/karpenter-provider-aws) and
[Azure](https://github.com/Azure/karpenter-provider-azure).
## Combine workload and Node autoscaling
### Horizontal workload autoscaling {#horizontal-workload-autoscaling}
Node autoscaling usually works in response to Pods&mdash;it provisions new Nodes to accommodate
unschedulable Pods, and then consolidates the Nodes once they're no longer needed.
[Horizontal workload autoscaling](/docs/concepts/workloads/autoscaling#scaling-workloads-horizontally)
automatically scales the number of workload replicas to maintain a desired average resource
utilization across the replicas. In other words, it automatically creates new Pods in response to
application load, and then removes the Pods once the load decreases.
You can use Node autoscaling together with horizontal workload autoscaling to autoscale the Nodes in
your cluster based on the average real resource utilization of your Pods.
If the application load increases, the average utilization of its Pods should also increase,
prompting workload autoscaling to create new Pods. Node autoscaling should then provision new Nodes
to accommodate the new Pods.
Once the application load decreases, workload autoscaling should remove unnecessary Pods. Node
autoscaling should, in turn, consolidate the Nodes that are no longer needed.
If configured correctly, this pattern ensures that your application always has enough Node capacity
to handle load spikes, without paying for that capacity when it isn't needed.
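As an example of the workload half of this pattern, here is a minimal HorizontalPodAutoscaler that
keeps average CPU utilization of a Deployment around 70%. The Deployment name and the replica
bounds are placeholders; as the replicas it creates become unschedulable, a Node autoscaler
provisions Nodes for them:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Add replicas when average CPU utilization across the
        # Pods exceeds 70%; remove them when it drops below.
        averageUtilization: 70
```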
### Vertical workload autoscaling {#vertical-workload-autoscaling}
When using Node autoscaling, it's important to set Pod resource requests correctly. If the requests
of a given Pod are too low, provisioning a new Node for it might not help the Pod actually run.
If the requests of a given Pod are too high, it might incorrectly prevent consolidating its Node.
[Vertical workload autoscaling](/docs/concepts/workloads/autoscaling#scaling-workloads-vertically)
automatically adjusts the resource requests of your Pods based on their historical resource usage.
You can use Node autoscaling together with vertical workload autoscaling in order to adjust the
resource requests of your Pods while preserving Node autoscaling capabilities in your cluster.
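A minimal sketch of such a setup, assuming the Vertical Pod Autoscaler
[custom resource](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler) is
installed in the cluster, targeting a hypothetical Deployment named `example-app`:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  updatePolicy:
    # Apply the recommended requests automatically by evicting
    # and recreating the Pods.
    updateMode: "Auto"
```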
{{< caution >}}
When using Node autoscaling, it's not recommended to set up vertical workload autoscaling for
DaemonSet Pods. Autoscalers have to predict what DaemonSet Pods on a new Node will look like in
order to predict available Node resources. Vertical workload autoscaling might make these
predictions unreliable, leading to incorrect scaling decisions.
{{</ caution >}}
## Related components
This section describes components providing functionality related to Node autoscaling.
### Descheduler
The [descheduler](https://github.com/kubernetes-sigs/descheduler) is a component providing Node
consolidation functionality based on custom policies, as well as other features related to
optimizing Nodes and Pods (for example deleting frequently restarting Pods).
### Workload autoscalers based on cluster size
[Cluster Proportional Autoscaler](https://github.com/kubernetes-sigs/cluster-proportional-autoscaler)
and [Cluster Proportional Vertical
Autoscaler](https://github.com/kubernetes-sigs/cluster-proportional-vertical-autoscaler) provide
horizontal and vertical workload autoscaling based on the number of Nodes in the cluster. You can
read more in
[autoscaling based on cluster size](/docs/concepts/workloads/autoscaling#autoscaling-based-on-cluster-size).
## {{% heading "whatsnext" %}}
- Read about [workload-level autoscaling](/docs/concepts/workloads/autoscaling/)