Adds a new flag, `--balance-label`, which allows users to balance
node groups exclusively via labels.
This gives users the flexibility to specify the similarity logic
themselves when `--balance-similar-node-groups` is in use.
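For illustration, a minimal sketch of label-only comparison; the names (`nodesSimilar`, `balancingLabels`) are hypothetical, not the actual processor code:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// nodesSimilar reports whether two node group template nodes should be
// considered similar, looking only at the user-supplied balancing labels.
func nodesSimilar(a, b *v1.Node, balancingLabels []string) bool {
	for _, label := range balancingLabels {
		if a.Labels[label] != b.Labels[label] {
			return false
		}
	}
	return true
}
```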
The binpacking algorithm is O(#pending_pods * #new_nodes), so
calculating a very large scale-up can get stuck for minutes or even
hours, leading to CA failing its healthcheck and going down.
The new limiting prevents this scenario by stopping binpacking once a
specified threshold is reached. Any pods that remain pending as a result
of the shortened binpacking will be processed in the next autoscaler loop.
The thresholds can be controlled with the newly introduced flags
`--max-nodes-per-scaleup` and `--max-nodegroup-binpacking-duration`. The
limiting can be disabled by setting both flags to 0 (not recommended,
especially for `--max-nodegroup-binpacking-duration`).
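A rough sketch of the capping idea, assuming a simple greedy loop and hypothetical names (`binpackCapped`, `fits`); the real estimator differs:

```go
package sketch

import "time"

// Pod is a stand-in for a pending pod (illustrative only).
type Pod struct{ Name string }

// binpackCapped runs a greedy binpacking loop but stops once either cap
// is hit; a cap of 0 disables that limit. Pods that were not placed are
// returned so the next autoscaler loop can retry them.
func binpackCapped(pending []Pod, fits func(Pod) bool, maxNewNodes int, maxDuration time.Duration) (newNodes int, leftover []Pod) {
	deadline := time.Now().Add(maxDuration)
	for i, pod := range pending {
		if maxNewNodes > 0 && newNodes >= maxNewNodes {
			return newNodes, pending[i:]
		}
		if maxDuration > 0 && time.Now().After(deadline) {
			return newNodes, pending[i:]
		}
		if !fits(pod) { // pod doesn't fit on the nodes added so far
			newNodes++ // open a new node for it
		}
	}
	return newNodes, nil
}
```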
This change introduces a new command-line flag,
`--record-duplicated-events`, which allows a user to record duplicated
events, bypassing the 5-minute deduplication window.
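Purely for illustration, a sketch of a deduplication window with a bypass switch; all names are hypothetical, and this is not client-go's actual event correlation code:

```go
package sketch

import "time"

// dedupWindow drops events whose key was already seen within the
// window, unless recordDuplicatedEvents is set.
type dedupWindow struct {
	window                 time.Duration // e.g. 5 * time.Minute
	recordDuplicatedEvents bool
	lastSeen               map[string]time.Time
}

func newDedupWindow(window time.Duration, recordDuplicates bool) *dedupWindow {
	return &dedupWindow{window: window, recordDuplicatedEvents: recordDuplicates, lastSeen: map[string]time.Time{}}
}

func (d *dedupWindow) shouldRecord(key string, now time.Time) bool {
	if d.recordDuplicatedEvents {
		return true // bypass deduplication entirely
	}
	if last, ok := d.lastSeen[key]; ok && now.Sub(last) < d.window {
		return false // duplicate within the window: drop it
	}
	d.lastSeen[key] = now
	return true
}
```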
Add a flag to allow the user to configure MaxPodEvictionTime to values
other than the default of 2m. This is needed in cases where a pod takes
more than 2 minutes to be evicted.
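As a sketch of what a configurable eviction timeout looks like (the polling helper is real apimachinery API; the surrounding names are hypothetical):

```go
package sketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForEviction polls until podGone reports true or maxPodEvictionTime
// elapses, instead of a hard-coded 2-minute limit.
func waitForEviction(podGone func() (bool, error), maxPodEvictionTime time.Duration) error {
	return wait.PollImmediate(5*time.Second, maxPodEvictionTime, podGone)
}
```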
This allows the ClusterAPI provider to ignore the
`topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset
processor. It also adds unit tests to exercise the new processor.
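Conceptually, the processor adds this label to the set ignored during comparison; a simplified, illustrative sketch (not the actual ClusterAPI processor code):

```go
package sketch

import v1 "k8s.io/api/core/v1"

// AwsEbsCsiZoneLabel differs per zone by design, so it must not make
// otherwise-identical node groups look dissimilar.
const AwsEbsCsiZoneLabel = "topology.ebs.csi.aws.com/zone"

// stripIgnoredLabels returns a copy of the node's labels without the
// labels that must not influence the similarity comparison.
func stripIgnoredLabels(node *v1.Node, ignored map[string]bool) map[string]string {
	out := make(map[string]string, len(node.Labels))
	for k, v := range node.Labels {
		if !ignored[k] {
			out[k] = v
		}
	}
	return out
}
```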
Multiple expanders can now be specified. Expanders now "filter to those
tied for best" instead of "selecting the best", so the output of one
expander is fed as input to the next. Each expander may only be used
once, to disallow bad configurations. This should not change
functionality, since in the event of a tie the random expander is still
used.
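The chaining can be pictured like this (hypothetical types; not the actual expander package interfaces):

```go
package sketch

// Option represents a candidate scale-up option.
type Option struct{ Name string }

// Filter narrows a set of options down to those tied for best
// according to one strategy's criterion.
type Filter interface {
	BestOptions(options []Option) []Option
}

// chain feeds the output of each expander into the next; whatever
// survives the whole chain is resolved by a final random pick.
func chain(filters []Filter, options []Option) []Option {
	for _, f := range filters {
		if len(options) <= 1 {
			break // nothing left to narrow down
		}
		options = f.BestOptions(options)
	}
	return options
}
```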
This change adds four metrics that can be used to monitor the minimum
and maximum limits for CPU and memory, as well as the current totals in
cores and bytes, respectively.
The four metrics added are:
* `cluster_autoscaler_cpu_limits_cores`
* `cluster_autoscaler_cluster_cpu_current_cores`
* `cluster_autoscaler_memory_limits_bytes`
* `cluster_autoscaler_cluster_memory_current_bytes`
This change also adds the `max_cores_total` metric to the metrics
proposal doc, as it was previously not recorded there.
User story: As a cluster autoscaler user, I would like to monitor my
cluster through metrics to determine when the cluster is nearing its
limits for cores and memory usage.
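For reference, a minimal Prometheus sketch of how one such gauge could be declared and updated (simplified and illustrative; the help text and function name are assumptions, not the actual metrics package source):

```go
package sketch

import "github.com/prometheus/client_golang/prometheus"

// cpuCurrentCores backs the cluster_autoscaler_cluster_cpu_current_cores
// metric: namespace plus name compose the full metric name.
var cpuCurrentCores = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: "cluster_autoscaler",
	Name:      "cluster_cpu_current_cores",
	Help:      "Current total of CPU cores in the cluster.",
})

func init() {
	prometheus.MustRegister(cpuCurrentCores)
}

// UpdateClusterCPUCurrentCores records the current core count.
func UpdateClusterCPUCurrentCores(cores int64) {
	cpuCurrentCores.Set(float64(cores))
}
```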
This allows us to run two instances of cluster-autoscaler in our
cluster, targeting two different types of autoscaling groups that
require different command-line settings to be passed.
This is the first step of implementing
https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343.
A new method was added to the cloudprovider interface. All existing
providers were updated with a no-op stub implementation that results in
no behavior change.
The config values specified per NodeGroup are not yet applied.
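The pattern, sketched with illustrative names (the entry does not name the new method, so `GetOptions` and the option fields below are assumptions):

```go
package sketch

// NodeGroupOptions is an illustrative stand-in for per-NodeGroup
// configuration overrides.
type NodeGroupOptions struct {
	ScaleDownUtilizationThreshold float64
}

// NodeGroup gains one extra method; returning nil means "use the
// global defaults", which is what every existing provider's stub does.
type NodeGroup interface {
	Id() string
	GetOptions(defaults NodeGroupOptions) (*NodeGroupOptions, error)
}

// noopNodeGroup shows the no-op stub: no per-group config, no behavior change.
type noopNodeGroup struct{}

func (n *noopNodeGroup) Id() string { return "example" }

func (n *noopNodeGroup) GetOptions(defaults NodeGroupOptions) (*NodeGroupOptions, error) {
	return nil, nil
}
```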
This won't completely fix the issue of 502s, as a node still has to live for some time even after cordoning in order to serve in-flight requests, but the load balancer can be configured to remove cordoned nodes from its healthy host list.
This feature is enabled by the `cordon-node-before-terminating` flag, which defaults to false to retain existing behavior.
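Cordoning itself amounts to marking the node unschedulable via the Kubernetes API; a minimal client-go sketch (the helper name is illustrative):

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cordonNode marks the node unschedulable before termination so the
// load balancer can drop it from the healthy host list.
func cordonNode(ctx context.Context, client kubernetes.Interface, name string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Unschedulable = true
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```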
With 1.5k MIGs attached to a cluster, cluster-autoscaler needs about
40 minutes to start. Refreshing MIGs and instance templates concurrently
brings that down to about 5 minutes.
While bulk GCE API calls (triggered at startup and on Refresh() calls)
and a few stateless functions (called by GetMigInstanceTemplate) become
concurrent, cache accesses remain lock-protected. To that effect:
* RegenerateInstancesCache now runs RegenerateInstanceCacheForMig calls in
parallel (slightly adapted so the slow GceService.FetchMigInstances call
isn't made under lock)
* fetchAutoMigs now runs registerMig calls in parallel: GetMigInstanceTemplate
was reworked so the slow InstanceGroupManagers.Get and InstanceTemplates.Get
calls aren't made under lock
Tested on a large k8s cluster (> 1k MIGs) with intense scaling activity,
and on live clusters with `go build -race` cluster-autoscaler builds.
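The shape of the change, sketched with a hypothetical fetch callback: network calls run in parallel, and only the cache write takes the mutex:

```go
package sketch

import "sync"

// refreshMigs fetches instance data for every MIG concurrently; only
// the cache write is done under the mutex, so slow API calls no longer
// serialize the whole refresh.
func refreshMigs(migs []string, fetch func(mig string) ([]string, error)) map[string][]string {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		cache = make(map[string][]string, len(migs))
	)
	for _, mig := range migs {
		wg.Add(1)
		go func(mig string) {
			defer wg.Done()
			instances, err := fetch(mig) // slow network call, not locked
			if err != nil {
				return // a real implementation would record the error
			}
			mu.Lock()
			cache[mig] = instances
			mu.Unlock()
		}(mig)
	}
	wg.Wait()
	return cache
}
```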
Commit bb2eed1cff introduced a new `topology.gke.io/zone` label on
GCE node templates, for CSI needs.
That label holds the zone name, making nodeInfo templates dissimilar
for groups belonging to different zones. The CA otherwise tries to
ignore such zonal labels (i.e. it ignores the standard LabelZoneRegion
and LabelZoneFailureDomain) when it looks for node group similarities.
- Leverage `--cloud-config` to allow providing a separate kubeconfig for Cluster API management and workload cluster resources
- Fall back to the previous behavior when `--cloud-config` is not specified, for backward compatibility
- Provide a `--clusterapi-cloud-config-authoritative` flag to disable the above fallback behavior and allow both the management and workload cluster clients to use the in-cluster config
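A sketch of the fallback logic with client-go (the function name is illustrative, and the real provider wires this differently):

```go
package sketch

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// managementClient builds a client for Cluster API management resources
// from the --cloud-config kubeconfig, falling back to the in-cluster
// config when no path is given (the backward-compatible behavior).
func managementClient(cloudConfigPath string) (kubernetes.Interface, error) {
	var cfg *rest.Config
	var err error
	if cloudConfigPath != "" {
		cfg, err = clientcmd.BuildConfigFromFlags("", cloudConfigPath)
	} else {
		cfg, err = rest.InClusterConfig()
	}
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(cfg)
}
```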