autoscaler

Commit Graph

Author	SHA1	Message	Date
elmiko	771b9ee591	add logging for failed node balancing. This change adds debug logs at level 5 to aid in triaging failed node balancing. It adds logs to help determine why two node groups are not considered as similar. These logs can be quite noisy so the logging level has been set to 5 by default.	2025-06-25 10:43:37 -04:00
Bartłomiej Wróblewski	2c7d8dc378	Rewrite TestCloudProvider to use builder pattern	2025-05-23 12:42:15 +00:00
Walid Ghallab	720f5946fd	Refactor NewAutoscalerError function. We will have two functions instead of one: 1. One that doesn't do formatting, like klog.Error. 2. One that accepts formatting, like klog.Errorf. The main reason behind this is to avoid go vet errors and have clear interfaces to catch accidental bugs and rely on go vet to catch those accidental bugs (or go test in go 1.24, as those are treated as errors).	2024-12-16 17:46:40 +00:00
Kuba Tużnik	879c6a84a4	DRA: migrate all of CA to use the new internal NodeInfo/PodInfo. The new wrapper types should behave like the direct schedulerframework types for most purposes, so most of the migration is just changing the imported package. Constructors look a bit different, so they have to be adapted - mostly in test code. Accesses to the Pods field have to be changed to a method call. After this, the schedulerframework types are only used in the new wrappers, and in the parts of simulator/ that directly interact with the scheduler framework. The rest of CA codebase operates on the new wrapper types.	2024-11-05 16:43:43 +01:00
Damika Gamlath	e20e5e600b	Remove spamming logs in compare_nodegroups.go and filter_out_daemon_sets.go. Change the log level and type of spamming logs in clusterstate.go and pre_filtering_processor.go	2024-10-10 08:48:24 +00:00
olagacek	44dcaa8cf3	Revert "CAS: cloudprovider-specific nodegroupset"	2024-10-04 12:54:22 +02:00
Jack Francis	4ff4079041	cloudprovider-specific nodegroupset Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2024-09-06 10:09:40 -07:00
prachigandhi	96c7948789	update gofmt	2024-04-01 17:21:41 -07:00
prachigandhi	832017c86c	azure labels to skip in nodegroupset	2024-03-19 10:15:13 -07:00
michael mccune	5b0ad270de	add more logging for balancing similar node groups. This change adds some logging at verbosity levels 2 and 3 to help diagnose why the cluster-autoscaler does not consider 2 or more node groups to be similar.	2023-06-05 14:03:57 +02:00
michael mccune	955396e857	remove clusterapi nodegroupset processor. As discussed with the cluster api community[0], the nodegroupset processor is being removed from the clusterapi provider implementation in favor of instructing our community on the use of the --balancing-ignore-label flag. Due to the wide variety of provider infrastructures that clusterapi can be deployed on, we would prefer to not encode all of these labels in the autoscaler itself. See the linked recording for more information. [0] https://www.youtube.com/watch?v=jbhca_9oPuQ	2023-01-12 15:05:37 -05:00
bsoghigian	0f8ed0b81f	Configurable difference ratios	2023-01-09 22:40:16 -08:00
Michael Grosser	62f29d23af	cluster-autoscaler: refactor BalanceScaleUpBetweenGroups	2022-11-15 13:21:29 -08:00
Michael McCune	ba9c164463	update clusterapi nodegroups processor. This change adds labels that are used on Alibaba Cloud and IBM Cloud for CSI and CCM.	2022-08-18 15:55:35 -04:00
James Ravn	1b98b3823a	Allow balancing by labels exclusively. Adds a new flag `--balance-label` which allows users to balance between node groups exclusively via labels. This gives users the flexibility to specify the similarity logic themselves when --balance-similar-node-groups is in use.	2022-07-06 10:34:18 +01:00
Marwan Ahmed	26569925db	ignore azure csi topology label for similarity checks and populate it for scale from zero	2021-12-21 20:44:49 +02:00
Michael McCune	99a242a9e6	add ClusterAPI nodegroupset processor. This allows the ClusterAPI provider to ignore the `topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset processor. It also adds unit tests to exercise the new processor.	2021-11-10 17:01:27 -05:00
Michael McCune	828663e97a	add topology.ebs.csi.aws.com/zone label to aws nodegroupset processor. This change adds the aforementioned label to the list of ignored labels in the AWS nodegroupset processor. This change is being made in response to the addition of this label by the aws-ebs-csi-driver. This label will eventually be deprecated by the driver, but its use will prevent AWS users from properly balancing similar nodes. Also adds unit test for the AWS processor. ref: https://github.com/kubernetes/autoscaler/issues/3230 ref: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729	2021-11-10 17:01:08 -05:00
Marwan Ahmed	f318400c9e	add recent AKS agentpool label to ignore for similarity checks	2021-10-25 14:18:06 -07:00
Brett Elliott	5cf64a2b3c	Update vendor to v1.22.0-alpha.1	2021-05-20 22:02:41 +02:00
Bartłomiej Wróblewski	0fb897b839	Update imports after scheduler scheduler/framework/v1alpha1 removal	2020-11-30 10:48:52 +00:00
Benjamin Pineau	bfd6fe7fed	Ignore topology.gke.io/zone when comparing groups. Commit `bb2eed1cff` introduced a new `topology.gke.io/zone` label to GCE node templates, for CSI needs. That label holds the zone name, making nodeInfo templates dissimilar for groups belonging to different zones. The CA otherwise tries to ignore those zonal labels (i.e. it ignores the standard LabelZoneRegion and LabelZoneFailureDomain) when it looks for nodegroup similarities.	2020-10-12 15:14:21 +02:00
Kubernetes Prow Robot	67dce2e824	Merge pull request #3124 from JoelSpeed/memory-tolerance-quantity Allow small tolerance on memory capacity when comparing nodegroups	2020-06-24 04:25:17 -07:00
Joel Speed	be1d9cb8d6	Allow 1.5% tolerance in memory capacity when comparing nodegroups. In testing, AWS M5 instances can on occasion display approximately a 1% difference in memory capacity between availability zones, deployed with the same launch configuration and same AMI. Allow a 1.5% tolerance to give some buffer on the actual amount of memory discrepancy since in testing, some examples were just over 1% (e.g. 1.05%, 1.1%). Tests are included with capacity values taken from real instances to prevent future regression.	2020-06-10 12:00:39 +01:00
Maciek Pytel	655b4081f4	Migrate to klog v2	2020-06-05 17:22:26 +02:00
Jakub Tużnik	73a5cdf928	Address recent breaking changes in scheduler The following things changed in scheduler and needed to be fixed: * NodeInfo was moved to schedulerframework * Some fields on NodeInfo are now exposed directly instead of via getters * NodeInfo.Pods is now a list of schedulerframework.PodInfo, not apiv1.Pod * SharedLister and NodeInfoLister were moved to schedulerframework * PodLister was removed	2020-04-24 17:54:47 +02:00
Adam Malcontenti-Wilson	8313e969c7	Add support for passing in custom ignore labels	2020-03-17 14:30:03 +11:00
Adam Malcontenti-Wilson	5476125063	Use builder methods to create NodeInfoComparator functions	2020-03-17 13:51:15 +11:00
Maxime Renou	a7f3e54770	Add eks.amazonaws.com/nodegroup label to awsIgnoredLabels	2020-02-20 11:36:14 +01:00
Enxebre	d422aaaca6	UPSTREAM: <carry>: openshift: Add topology.kubernetes.io labels to be ignored when comparing similar node groups. Without this, the autoscaler was using the labels in compareLabels and failing to match similar groups in different zones. Starting in kube 1.17, failure-domain.beta.kubernetes.io/* are deprecated in favour of topology.kubernetes.io/* https://kubernetes.io/docs/reference/kubernetes-api/labels-annotations-taints/#failure-domainbetakubernetesiozone	2020-02-19 18:15:11 +01:00
Colin Murphy	dde3341133	Raise maximum memory capacity difference. AWS M5 instance types may differ in memory capacity by more than 128MB.	2019-10-25 17:18:08 -04:00
Colin Murphy	7f0a42b023	Add additional AWS labels. Whitelist additional node labels for AWS CNI custom networking and EC2 lifecycle. Move AWS ignored node labels to AWS specific file.	2019-10-25 17:17:02 -04:00
Jarvis-Zhou	7c9d6e3518	Do not assign return values to variables when not needed	2019-10-25 19:28:00 +08:00
Kubernetes Prow Robot	dc1f19fc47	Merge pull request #2207 from viafoura/kops-node-similarity-fix-master add kops instance group label to ignore list for similar node group identification.	2019-09-27 07:27:37 -07:00
Andrew McDermott	e8b3c2a111	compare_nodegroups: Tolerate small differences in memory capacity. The current comparator expects memory capacity values to be identical. However, across AWS, Azure and GCP I quite often see very small differences in capacity, typically 8-16Ki. When this occurs the nodegroups are considered not equal when balancing is in effect, which is unfortunate because, in reality, they are identical. This change will now tolerate a 128Ki difference before memory capacity values are considered unequal.	2019-09-06 15:55:51 +01:00
Krzysztof Jastrzebski	75030ee2ec	Fix bug in balancing processor. Cluster Autoscaler was stopping scaling up when there was a multizonal pool with number of nodes exceeding limit for one zone.	2019-07-29 09:28:20 +02:00
Joe Hohertz	754412d7ea	also add similar label for eksctl to ignore list Signed-off-by: Joe Hohertz <joe@viafoura.com>	2019-07-26 10:07:54 -04:00
Joe Hohertz	1999d3b432	add kops instance group label to ignore list for similar node group identification. Signed-off-by: Joe Hohertz <joe@viafoura.com>	2019-07-23 09:08:00 -04:00
t-qini	89a09ccf00	Refactor the corresponding code.	2019-07-22 08:58:51 +08:00
t-qini	f7c563ab06	Modify the code as the simple solution proposed by MaciekPytel.	2019-07-18 23:58:05 +08:00
t-qini	622a838c2c	Modify nodal similarity rules.	2019-07-09 16:04:40 +08:00
Łukasz Osipiuk	34a4262ad8	Remove GKE specific node group comparator Change-Id: I33131fec9b7972780cffde605a087cd2ad002752	2019-03-11 17:49:59 +01:00
Pengfei Ni	2546d0d97c	Move leaderelection options to new packages	2019-02-21 13:45:46 +08:00
Pengfei Ni	128729bae9	Move schedulercache to package nodeinfo	2019-02-21 12:41:08 +08:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead of github.com/golang/glog	2018-11-26 17:30:31 +01:00
Maciej Pytel	01a56a8d73	Add GKE-specific NodeGroupSet processor. Also refactor Balancing processor a bit to make it easily extensible.	2018-10-25 18:50:17 +02:00
Maciej Pytel	6f5e6aab6f	Move node group balancing to processor. The goal is to allow customization of this logic for different use-cases and cloudproviders.	2018-10-25 14:04:05 +02:00

47 Commits