Marcin Wielgus e821036d5c Bump CA version to 1.1.2		2018-03-06 12:20:38 +01:00
Godeps	Godeps update	2017-11-28 14:00:58 +01:00
_override/google.golang.org	Add appropriate license to _override	2017-11-28 14:31:20 +01:00
cloudprovider	Add h1 and m5 AWS instance types	2017-12-04 13:16:48 -05:00
clusterstate	Increase MaxNodeStartupTime to 15 minutes.	2017-11-13 15:14:47 +01:00
config	respect minimum cores/memory limit during scale down	2017-09-13 10:10:47 +02:00
core	Delay scale-up including GPU request	2018-03-05 12:29:45 +01:00
deploy	bump container image version to 0.6.0	2017-08-09 16:03:09 -07:00
estimator	Small optimize code	2017-09-04 23:50:45 +03:00
expander	Ignore unfitness in price expander if using GPU	2018-03-05 11:55:10 +01:00
metrics	Don't register metrics unless on leading master	2018-01-15 17:42:39 +01:00
proposals	Update metrics documentation	2017-11-07 17:37:10 +01:00
simulator	Source fix after godep update	2017-11-28 14:01:43 +01:00
utils	Delay scale-up including GPU request	2018-03-05 12:29:45 +01:00
vendor	Remove Windows-specific libraries from godeps	2017-11-28 14:48:05 +01:00
.gitignore	Cluster-Autoscaler - Kubernetes client deps	2016-04-20 11:49:38 +02:00
Dockerfile	Use Debian image	2018-02-14 14:28:42 +01:00
FAQ.md	Release notes for Cluster Autoscaler 1.0.3	2017-11-17 13:35:32 +01:00
Makefile	Extra checks when pushing an image to gcr repository	2017-11-17 15:49:52 +01:00
OWNERS	Make assignees approvers and reviewers	2016-12-14 16:42:04 -08:00
README.md	Merge pull request #485 from mwielgus/azure-readme	2017-11-23 08:34:23 +01:00
fix_gopath.sh	Rename override to _override to allow ./... patterns in go command	2017-11-28 14:21:17 +01:00
kubernetes.sync	Source fix after godep update	2017-11-28 14:01:43 +01:00
main.go	Don't register metrics unless on leading master	2018-01-15 17:42:39 +01:00
push_image.sh	Extra checks when pushing an image to gcr repository	2017-11-17 15:49:52 +01:00
run.sh	Cluster-Autoscaler: added wrapper script to pass signals	2017-02-28 17:39:29 +01:00
update_toc.py	Fix update_toc.py script to stop appending empty lines	2017-06-30 14:18:18 +02:00
version.go	Bump CA version to 1.1.2	2018-03-06 12:20:38 +01:00

README.md

Cluster Autoscaler

Introduction

Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when:

there are pods that failed to run in the cluster due to insufficient resources.
some nodes in the cluster are so underutilized, for an extended period of time, that they can be deleted and their pods will be easily placed on some other, existing nodes.

FAQ/Documentation

The FAQ and documentation are available in FAQ.md.

Releases

We strongly recommend using Cluster Autoscaler with the Kubernetes version for which it was meant. We don't do ANY cross-version testing, so if you put the newest Cluster Autoscaler on an old cluster, there is a big chance that it won't work as expected.

Kubernetes Version	CA Version
1.8.X	1.0.X
1.7.X	0.6.X
1.6.X	0.5.X, 0.6.X^*
1.5.X	0.4.X
1.4.X	0.3.X

^*Cluster Autoscaler 0.5.X is the official version shipped with k8s 1.6. We've done some basic tests using k8s 1.6 / CA 0.6 and we're not aware of any problems with this setup. However, CA internally simulates the k8s scheduler, and using different versions of the scheduler code can lead to subtle issues.
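The compatibility table above can be encoded as a small lookup, for example to sanity-check a deployment before rollout. This is only a sketch: the names `compatibleCA`, `minor`, and `isRecommended` are hypothetical helpers, not part of Cluster Autoscaler, and the data is copied from the table as printed here.

```go
package main

import (
	"fmt"
	"strings"
)

// compatibleCA mirrors the table above: Kubernetes minor version -> CA release lines.
// Hypothetical helper data, not part of Cluster Autoscaler itself.
var compatibleCA = map[string][]string{
	"1.8": {"1.0"},
	"1.7": {"0.6"},
	"1.6": {"0.5", "0.6"}, // 0.6 on 1.6 is only lightly tested, see the footnote
	"1.5": {"0.4"},
	"1.4": {"0.3"},
}

// minor trims a version such as "1.8.4" or "v1.8.4" down to "1.8".
// Inputs without a minor component are returned unchanged.
func minor(version string) string {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return version
	}
	return parts[0] + "." + parts[1]
}

// isRecommended reports whether the CA version is listed for the Kubernetes version.
func isRecommended(k8sVersion, caVersion string) bool {
	for _, ca := range compatibleCA[minor(k8sVersion)] {
		if ca == minor(caVersion) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRecommended("1.8.4", "1.0.3")) // listed pairing
	fmt.Println(isRecommended("1.7.0", "1.0.3")) // newer CA on an older cluster
}
```

A pairing that is not in the table is reported as not recommended, which matches the advice above to avoid untested version combinations.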

Notable changes

CA Version 1.0.3:

Adds support for the safe-to-evict annotation on pods. Pods with this annotation can be evicted even if they don't meet the other requirements for eviction.
Fixes an issue when too many nodes with GPUs could be added during scale-up (https://github.com/kubernetes/kubernetes/issues/54959).
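As a rough illustration of the safe-to-evict annotation, the check below looks for it in a pod's annotation map. The full key `cluster-autoscaler.kubernetes.io/safe-to-evict` and the value `"true"` are assumptions (the release note only names the annotation as "safe-to-evict"), and `hasSafeToEvict` is a hypothetical helper that uses a plain map instead of the Kubernetes API types.

```go
package main

import "fmt"

// Assumed full annotation key; the release note above only calls it "safe-to-evict".
const safeToEvictKey = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// hasSafeToEvict reports whether the annotations explicitly mark a pod as safe to evict.
// The value "true" is an assumption. Reading from a nil map is safe in Go.
func hasSafeToEvict(annotations map[string]string) bool {
	return annotations[safeToEvictKey] == "true"
}

func main() {
	annotated := map[string]string{safeToEvictKey: "true"}
	fmt.Println(hasSafeToEvict(annotated))
	fmt.Println(hasSafeToEvict(nil))
}
```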

CA Version 1.0.2:

Fixes issues with scaling node groups using GPU from 0 to 1 on GKE (https://github.com/kubernetes/autoscaler/pull/401) and AWS (https://github.com/kubernetes/autoscaler/issues/321).
Fixes a bug where goroutines performing API calls were leaking when using dynamic config on AWS (https://github.com/kubernetes/autoscaler/issues/252).
Node Autoprovisioning support for GKE (the implementation was included in 1.0.0, but this release includes some bugfixes and introduces metrics and events).

CA Version 1.0.1:

Fixes a bug in handling nodes that, at the same time, fail to register in Kubernetes and can't be deleted from cloud provider (https://github.com/kubernetes/autoscaler/issues/369).
Improves estimation of resources available on a node when performing scale-from-0 on GCE (https://github.com/kubernetes/autoscaler/issues/326).
Bugfixes in the new GKE cloud provider implementation.

CA Version 1.0:

With this release we graduated Cluster Autoscaler to GA.

Support for 1000 nodes running 30 pods each. See: Scalability testing report
Support for 10 min graceful termination.
Improved eventing and monitoring.
Node allocatable support.
Removed Azure support. See: PR removing support with reasoning behind this decision
cluster-autoscaler.kubernetes.io/scale-down-disabled annotation for marking nodes that should not be scaled down.
scale-down-delay-after-delete and scale-down-delay-after-failure flags replaced scale-down-trial-interval.
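To show how the scale-down-disabled annotation might be honored, the sketch below filters a node list down to scale-down candidates. The `node` type and `scaleDownCandidates` function are hypothetical and stand in for the real Kubernetes objects, and the annotation value `"true"` is an assumption; only the annotation key comes from the release note.

```go
package main

import "fmt"

// Annotation key as named in the release note above.
const scaleDownDisabledKey = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

// node is a hypothetical stand-in for a Kubernetes Node object.
type node struct {
	Name        string
	Annotations map[string]string
}

// scaleDownCandidates returns the names of nodes that are not marked with the
// scale-down-disabled annotation. The value "true" is an assumption.
func scaleDownCandidates(nodes []node) []string {
	var names []string
	for _, n := range nodes {
		if n.Annotations[scaleDownDisabledKey] == "true" {
			continue // node is protected from scale-down
		}
		names = append(names, n.Name)
	}
	return names
}

func main() {
	nodes := []node{
		{Name: "node-a", Annotations: map[string]string{scaleDownDisabledKey: "true"}},
		{Name: "node-b"},
	}
	fmt.Println(scaleDownCandidates(nodes))
}
```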

CA Version 0.6:

Allows scaling node groups to 0 (currently only in GCE/GKE, other cloud providers are coming). See: How can I scale a node group to 0?
Price-based expander (currently only in GCE/GKE, other cloud providers are coming). See: What are Expanders?
Similar node groups are balanced (to be enabled with a flag). See: I'm running cluster with nodes in multiple zones for HA purposes. Is that supported by Cluster Autoscaler?
It is possible to scale-down nodes with kube-system pods if PodDisruptionBudget is provided. See: How can I scale my cluster to just 1 node?
Automatic node group discovery on AWS (to be enabled with a flag). See: AWS doc.
CA exposes runtime metrics. See: How can I monitor Cluster Autoscaler?
CA exposes an endpoint for liveness probe.
max-grateful-termination-sec flag renamed to max-graceful-termination-sec.
Lower AWS API traffic to DescribeAutoscalingGroup.

CA Version 0.5.4:

Fixes problems with node drain when pods are ignoring SIGTERM.

CA Version 0.5.3:

Fixes problems with pod anti-affinity in scale up https://github.com/kubernetes/autoscaler/issues/33.

CA Version 0.5.2:

Fixes problems with pods using persistent volume claims in scale up https://github.com/kubernetes/contrib/issues/2507.

CA Version 0.5.1:

Fixes problems with slow network route creations on cluster scale up https://github.com/kubernetes/kubernetes/issues/43709.

CA Version 0.5:

CA continues to operate even if some nodes are unready and is able to scale them down.
CA exports its status to kube-system/cluster-autoscaler-status config map.
CA respects PodDisruptionBudgets.
Azure support.
Alpha support for dynamic config changes.
Multiple expanders to decide which node group to scale up.

CA Version 0.4:

Bulk empty node deletions.
Better scale-up estimator based on binpacking.
Improved logging.

CA Version 0.3:

AWS support.
Performance improvements around scale down.

Deployment

Cluster Autoscaler runs on the Kubernetes master node (at least in the default setup on GCE and GKE). It is possible to run a customized Cluster Autoscaler inside the cluster, but then extra care needs to be taken to ensure that Cluster Autoscaler stays up and running. Users can put it into the kube-system namespace (Cluster Autoscaler doesn't scale down nodes with non-manifest-based kube-system pods running on them) and mark it with the scheduler.alpha.kubernetes.io/critical-pod annotation (so that the rescheduler, if enabled, will kill other pods to make space for it to run).

Right now it is possible to run Cluster Autoscaler on: