Cluster-autoscaler: improve readme
parent 6a28643880
commit 2ed1203e42
@@ -19,6 +19,25 @@ there is a big chance that it won't work as expected.
 | 1.5.X | 0.4.X |
 | 1.4.X | 0.3.X |
 
+# Notable changes:
+
+CA Version 0.5:
+* CA continues to operate even if some nodes are unready and is able to scale-down them.
+* CA exports its status to kube-system/cluster-autoscaler-status config map.
+* CA respects PodDisruptionBudgets.
+* Azure support.
+* Alpha support for dynamic config changes.
+* Multiple expanders to decide which node group to scale up.
+
+CA Version 0.4:
+* Bulk empty node deletions.
+* Better scale-up estimator based on binpacking.
+* Improved logging.
+
+CA Version 0.3:
+* AWS support.
+* Performance improvements around scale down.
+
 # Deployment
 
 Cluster Autoscaler runs on the Kubernetes master node (at least in the default setup on GCE and GKE).
@@ -32,13 +51,14 @@ Right now it is possible to run Cluster Autoscaler on:
 * GCE http://kubernetes.io/docs/admin/cluster-management/#cluster-autoscaling
 * GKE https://cloud.google.com/container-engine/docs/cluster-autoscaler
 * AWS https://github.com/kubernetes/contrib/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
+* Azure
 
 # Scale Up
 
 Scale up creates a watch on the api server looking for all pods. Every 10 seconds (configurable)
 it checks for any unschedulable pods. A pod is unschedulable when the Kubernetes scheduler is unable
-to find a node that can accomodate the pod. For example a pod can request more CPU that is
-available on any of the cluster nodes. Unschedulable pods are reconginzed by their PodCondition.
+to find a node that can accommodate the pod. For example a pod can request more CPU that is
+available on any of the cluster nodes. Unschedulable pods are recognized by their PodCondition.
 Whenever a kubernetes scheduler fails to find a place to run a pod it sets "schedulable"
 PodCondition to false and reason to "unschedulable". If there are any items on the unschedulable
 lists Cluster Autoscaler tries to find a new place to run them.
@@ -51,7 +71,7 @@ will have all pods run from the node manifest or daemon sets).
 
 Based on the above assumption Cluster Autoscaler creates template nodes for each of the
 node groups and checks if any of the unschedulable pods would fit to a brand new node, if created.
-While it may sound similar to what the real scheduler does it is currently quite simplified and
+While it may sound similar to what the real scheduler does, it is currently quite simplified and
 may require multiple iterations before all of the pods are eventually scheduled.
 If there are multiple node groups that, if increased, would help with getting some pods running,
 different strategies can be selected for choosing which node group is increased. The default is
@@ -75,7 +95,7 @@ under control of a deployment, replica set, replication controller or job would
 if the node is deleted so they make a node needed, even if its utilization is low. While
 checking this condition the new locations of all pods are memorized. With that Cluster Autoscaler
 knows where each pod can be moved and which nodes depend on which other nodes in terms of
-pod migration. Of course, it may happen that eventaully the scheduler will place the pods
+pod migration. Of course, it may happen that eventually the scheduler will place the pods
 somewhere else.
 
 * There are no kube-system pods on the node (except these that run on all nodes by default like
@@ -94,13 +114,13 @@ For example if node A is deleted then its pods, consuming 400m CPU, are moved to
 X where is 450m CPU available. Ok, but what other nodes that also were eligible for deletion? Well,
 it depends. If node B also wanted to move its pods, consuming 350m CPU, to node X then it cannot
 do it anymore as there is almost no capacity left. It has to them somewhere else, and it is not sure that
-if A had been deleted much earlier then B, during the last 10 min, would always have a place to
+if A had been deleted much earlier than B, during the last 10 min, would always have a place to
 move its pods. So the requirement of being unused for 10 min may not be valid anymore for B.
 But if another node C, in case of deletion, can move its pods to node Y then it
-may still do it, because noone touched Y. So C can be deleted immediatelly after A. And B not.
+may still do it, because noone touched Y. So C can be deleted immediately after A. And B not.
 
-Cluster Autoscaler does all of this acounting based on the simulations and memorized new pod location.
-They may not always be precise (pods can land elswehere) but it seems to be a good heuristic so far.
+Cluster Autoscaler does all of this accounting based on the simulations and memorized new pod location.
+They may not always be precise (pods can land elsewhere) but it seems to be a good heuristic so far.
 
 
 # When scaling is executed
@@ -110,3 +130,13 @@ measured on the cloud provider side, matches the number of nodes in Kubernetes t
 node group. If this condition is not met then all scaling operations are postponed until it is
 fulfilled.
 Also, any scale down will happen only after at least 10 min after the last scale up.
+
+# Unready nodes
+
+From 0.5 CA continues the work even if some (up to 33% or not greater than 3, configurable via flag) percentage of nodes
+is unavailable. Once there is more unready nodes in the cluster CA pauses all operations until the situation
+improves. If there is less unready nodes but they are concentrated in a particular node group
+then this node group may be excluded from scale-ups.
+Prior to 0.5 CA stopped all operations when a single node became unready.
+
+
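The rule in the added "Unready nodes" section can be read as a simple health gate. Below is a minimal Python sketch, assuming the two limits combine with "or": CA keeps working while the number of unready nodes is at most 3, or while their share is at most 33%. The function and parameter names are hypothetical and are not the actual CA flags; the README only says the limits are configurable via flag.

```python
def cluster_is_healthy(total_nodes, unready_nodes,
                       max_unready_percent=33, ok_unready_count=3):
    """Return True if CA would keep operating under the assumed rule.

    Assumption: the cluster counts as healthy when EITHER limit holds.
    Both defaults mirror the numbers quoted in the README text.
    """
    # A small absolute number of unready nodes is always tolerated.
    if unready_nodes <= ok_unready_count:
        return True
    # Otherwise compare the unready share against the percentage limit.
    # Integer arithmetic avoids float rounding at the boundary.
    return unready_nodes * 100 <= max_unready_percent * total_nodes


if __name__ == "__main__":
    print(cluster_is_healthy(10, 3))    # few unready nodes in a small cluster
    print(cluster_is_healthy(100, 30))  # many unready nodes, but under 33%
    print(cluster_is_healthy(10, 5))    # over both limits
```

Under this reading, a small cluster tolerates up to three unready nodes regardless of percentage, while a large cluster is governed by the percentage limit.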
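The node A, B, C example in the scale-down hunk (`-94,13 +114,13`) describes bookkeeping over memorized pod destinations. A minimal Python sketch of that idea follows, assuming each deletion candidate carries the set of nodes its pods were placed on during simulation. The function name and data layout are hypothetical and do not reflect the real Cluster Autoscaler code.

```python
def deletable_now(candidates):
    """Pick candidates that can be deleted right after one another.

    candidates: ordered list of (node_name, destination_nodes) pairs, where
    destination_nodes is the set of nodes memorized during the simulation.
    A candidate is skipped when it, or any of its destinations, was already
    touched by an earlier deletion, because its simulation may be stale.
    """
    touched = set()   # nodes deleted or used as a destination so far
    selected = []
    for node, destinations in candidates:
        if node in touched or destinations & touched:
            continue  # capacity assumptions no longer hold; re-simulate later
        selected.append(node)
        touched |= destinations | {node}
    return selected


if __name__ == "__main__":
    # A and B both want to move pods to X; C moves its pods to Y.
    plan = [("A", {"X"}), ("B", {"X"}), ("C", {"Y"})]
    print(deletable_now(plan))
```

With the plan above, A is selected first, B is skipped because X has already been touched, and C is still selected because nobody touched Y, which matches the outcome described in the README text.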