autoscaler/cluster-autoscaler
Andre Keedy 4f30519ee5
Add CoreWeave Cluster Autoscaler provider (#8332)
* initial commit

* Add coreweave manager and node pools using unstructure
Add provider unit tests
coverage = 68.7%
run go fmt
update comments on coreweave files
change const to unexported
Add boilerplate header
Remove unused function from manager
Add coreave tag exclusion from the build_all

* address comments and feedbacks

* use the shared *rest.Config from the autoscaler's logic

* update the comments for clouProvider in charts values file

* update charts README with coreweave cloudProvider
2025-08-11 14:19:07 -07:00

Cluster Autoscaler

Introduction

Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

  • there are pods that failed to run in the cluster due to insufficient resources.
  • there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.

FAQ/Documentation

An FAQ is available in FAQ.md.

You should also take a look at the notes and "gotchas" for your specific cloud provider:

Releases

We recommend using Cluster Autoscaler with the Kubernetes control plane (previously referred to as master) version it was built for. The combinations below have been tested on GCP. We don't do cross-version or compatibility testing in other environments. Some user reports indicate that newer versions of Cluster Autoscaler work with older clusters; however, there is always a chance that such a combination won't work as expected.

Starting with Kubernetes 1.12, the versioning scheme was changed to match Kubernetes minor releases exactly.

Kubernetes Version  CA Version      Chart Version
1.33.x              1.33.x          9.47.0+
1.32.x              1.32.x          9.45.0+
1.31.x              1.31.x          9.38.0+
1.30.x              1.30.x          9.37.0+
1.29.x              1.29.x          9.35.0+
1.28.x              1.28.x          9.34.0+
1.27.x              1.27.x          9.29.0+
1.26.x              1.26.x          9.28.0+
1.25.x              1.25.x
1.24.x              1.24.x          9.25.0+
1.23.x              1.23.x          9.14.0+
1.22.x              1.22.x
1.21.x              1.21.x          9.10.0+
1.20.x              1.20.x          9.5.0+
1.19.x              1.19.x
1.18.x              1.18.x          9.0.0+
1.17.x              1.17.x
1.16.x              1.16.x
1.15.x              1.15.x
1.14.x              1.14.x
1.13.x              1.13.x
1.12.x              1.12.x
1.11.x              1.3.x
1.10.x              1.2.x
1.9.x               1.1.x
1.8.x               1.0.x
1.7.x               0.6.x
1.6.x               0.5.x, 0.6.x*
1.5.x               0.4.x
1.4.x               0.3.x

*Cluster Autoscaler 0.5.x is the official version shipped with Kubernetes 1.6. We've done some basic tests using Kubernetes 1.6 with CA 0.6 and we're not aware of any problems with this setup. However, Cluster Autoscaler internally simulates the Kubernetes scheduler, and using different versions of the scheduler code can lead to subtle issues.
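The compatibility rule can be expressed as a small lookup. The Go helper below is hypothetical (recommendedCAMinor and legacyCAVersions are not part of Cluster Autoscaler): it hard-codes the table's rows for Kubernetes 1.4 through 1.11 and, from 1.12 on, applies the rule that CA versions match Kubernetes minor releases. For Kubernetes 1.6 it returns the officially shipped 0.5 line and ignores the 0.6 alternative mentioned in the footnote.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// legacyCAVersions maps Kubernetes minor versions before 1.12 to the CA
// minor versions from the compatibility table (1.6 -> 0.5, the shipped version).
var legacyCAVersions = map[int]string{
	11: "1.3", 10: "1.2", 9: "1.1", 8: "1.0",
	7: "0.6", 6: "0.5", 5: "0.4", 4: "0.3",
}

// recommendedCAMinor returns the recommended CA "major.minor" for a Kubernetes
// version string such as "1.33.2" or "v1.10.0".
func recommendedCAMinor(k8sVersion string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(k8sVersion, "v"), ".")
	if len(parts) < 2 || parts[0] != "1" {
		return "", fmt.Errorf("unsupported Kubernetes version %q", k8sVersion)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("bad minor version in %q: %w", k8sVersion, err)
	}
	if minor >= 12 {
		// From 1.12 on, CA versions match Kubernetes minor releases exactly.
		return fmt.Sprintf("1.%d", minor), nil
	}
	if v, ok := legacyCAVersions[minor]; ok {
		return v, nil
	}
	return "", fmt.Errorf("no CA version listed for Kubernetes %q", k8sVersion)
}
```

For example, recommendedCAMinor("v1.33.2") returns "1.33", while recommendedCAMinor("1.10.5") returns "1.2".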

Schedule

Cluster Autoscaler releases a new minor version shortly after each OSS Kubernetes release, and patch releases for the versions corresponding to currently supported Kubernetes versions on a roughly two-month cadence. The currently planned schedule is below. Please note that the target dates are approximate; actual releases may differ from the target ETA by up to a week.

Date        Maintainer Preparing Release  Backup Maintainer  Type
2025-06-11  jackfrancis                   gjtempleton        1.33
2025-07-16  gjtempleton                   towca              patch
2025-08-20  towca                         BigDarkClown       patch
2025-09-17  BigDarkClown                  x13n               1.34
2025-10-22  x13n                          jackfrancis        patch
2025-11-19  jackfrancis                   gjtempleton        patch

Additional patch releases may happen outside of the schedule in case of critical bugs or vulnerabilities.

Notable changes

For CA 1.1.2 and later, please check the release notes.

CA version 1.1.1:

  • Fixes around metrics in the multiple kube apiserver configuration.
  • Fixes for unready nodes issues when quota is overrun.

CA version 1.1.0:

CA version 1.0.3:

  • Adds support for the safe-to-evict annotation (cluster-autoscaler.kubernetes.io/safe-to-evict) on pods. Pods with this annotation can be evicted even if they don't meet the other requirements for eviction.
  • Fixes an issue where too many nodes with GPUs could be added during scale-up (https://github.com/kubernetes/kubernetes/issues/54959).
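The safe-to-evict override can be sketched as a single check. This is a hypothetical helper (podAllowsEviction is not a CA function): it takes a pod's annotations and the outcome of CA's other eviction checks, which are not modeled here, and only covers the behavior described above, where the annotation is set to "true".

```go
package main

// safeToEvictAnnotation is the pod annotation introduced in CA 1.0.3.
const safeToEvictAnnotation = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// podAllowsEviction reports whether CA may evict a pod during scale-down.
// passesOtherChecks stands in for the usual eviction requirements (not shown).
func podAllowsEviction(annotations map[string]string, passesOtherChecks bool) bool {
	if annotations[safeToEvictAnnotation] == "true" {
		// The annotation overrides the other requirements.
		return true
	}
	return passesOtherChecks
}
```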

CA Version 1.0.2:

CA Version 1.0.1:

CA Version 1.0:

With this release we graduated Cluster Autoscaler to GA.

  • Support for 1000 nodes running 30 pods each. See: Scalability testing report
  • Support for 10-minute graceful termination.
  • Improved eventing and monitoring.
  • Node allocatable support.
  • Removed Azure support. See: the PR removing support, with the reasoning behind this decision.
  • cluster-autoscaler.kubernetes.io/scale-down-disabled annotation for marking nodes that should not be scaled down.
  • The scale-down-delay-after-delete and scale-down-delay-after-failure flags replaced scale-down-trial-interval.

CA Version 0.6:

CA Version 0.5.4:

  • Fixes problems with node drain when pods ignore SIGTERM.

CA Version 0.5.3:

CA Version 0.5.2:

CA Version 0.5.1:

CA Version 0.5:

  • CA continues to operate even if some nodes are unready, and is able to scale them down.
  • CA exports its status to kube-system/cluster-autoscaler-status config map.
  • CA respects PodDisruptionBudgets.
  • Azure support.
  • Alpha support for dynamic config changes.
  • Multiple expanders to decide which node group to scale up.
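An expander chooses among several possible scale-ups. As an illustration, the Go sketch below is loosely modeled on the idea behind CA's least-waste expander (pick the node group that leaves the least idle CPU after scale-up); it is simplified and hypothetical: nodeGroupOption and leastWaste are illustrative names, and the real expander also considers memory when CPU waste is tied.

```go
package main

// nodeGroupOption is a hypothetical description of one possible scale-up.
type nodeGroupOption struct {
	Name         string
	NodesToAdd   int
	CPUPerNode   float64 // allocatable cores per new node
	CPURequested float64 // cores requested by the pending pods this option would schedule
}

// leastWaste returns the name of the option that leaves the smallest fraction of
// newly added CPU idle, or "" when there are no usable options.
func leastWaste(options []nodeGroupOption) string {
	best, bestWaste := "", 2.0 // any waste fraction is <= 1, so 2.0 acts as "none yet"
	for _, o := range options {
		total := float64(o.NodesToAdd) * o.CPUPerNode
		if total <= 0 {
			continue // skip options that add no capacity
		}
		waste := (total - o.CPURequested) / total
		if waste < bestWaste {
			best, bestWaste = o.Name, waste
		}
	}
	return best
}
```

For six requested cores, one 16-core node wastes 10/16 of its CPU while two 4-core nodes waste 2/8, so the sketch picks the smaller nodes.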

CA Version 0.4:

  • Bulk empty node deletions.
  • Better scale-up estimator based on binpacking.
  • Improved logging.

CA Version 0.3:

  • AWS support.
  • Performance improvements around scale down.

Deployment

Cluster Autoscaler is designed to run on a Kubernetes control plane (previously referred to as master) node. This is the default deployment strategy on GCP. It is possible to run a customized deployment of Cluster Autoscaler on worker nodes, but extra care is needed to ensure that Cluster Autoscaler remains up and running. Users can put it in the kube-system namespace (Cluster Autoscaler doesn't scale down nodes with non-mirrored kube-system pods running on them) and set priorityClassName: system-cluster-critical in the pod spec (to prevent the pod from being evicted).

Supported cloud providers: