As discussed with the Cluster API community [0], the nodegroupset
processor is being removed from the clusterapi provider implementation
in favor of instructing our community on the use of the
`--balancing-ignore-label` flag. Due to the wide variety of provider
infrastructures that Cluster API can be deployed on, we would prefer
not to encode all of these labels in the autoscaler itself. See the
linked recording for more information.
[0] https://www.youtube.com/watch?v=jbhca_9oPuQ
Adds a new flag `--balance-label` which allows users to balance between
node groups exclusively via labels.
This gives users the flexibility to specify the similarity logic
themselves when `--balance-similar-node-groups` is in use.
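The label-driven comparison can be sketched roughly as follows; `labelsMatch` and its arguments are illustrative names for this note, not the autoscaler's actual implementation:

```go
package main

import "fmt"

// labelsMatch reports whether two nodes carry identical values for every
// label the user listed on the command line (sketch only).
func labelsMatch(balanceLabels []string, a, b map[string]string) bool {
	for _, l := range balanceLabels {
		if a[l] != b[l] {
			return false
		}
	}
	return true
}

func main() {
	n1 := map[string]string{"pool": "gpu", "zone": "us-east-1a"}
	n2 := map[string]string{"pool": "gpu", "zone": "us-east-1b"}
	// Only "pool" is compared, so the differing zones do not matter.
	fmt.Println(labelsMatch([]string{"pool"}, n1, n2))
}
```

With label-only balancing, anything not listed (zones, capacity, taints) simply drops out of the comparison.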
This change simplifies debugging GPU issues: without it, all nodes can
be Ready as far as the Kubernetes API is concerned, but CA will still
report some of them as unready if they are missing the GPU resource.
Explicitly calling them out in the status ConfigMap will point in the
right direction.
There's a race condition between DaemonSet pods getting scheduled to a
new node and Cluster Autoscaler caching that node for the sake of
predicting future nodes in a given node group. We can reduce the risk of
missing some DaemonSet pods by providing a grace period before accepting
nodes into the cache. One minute should be more than enough, except for
some pathological edge cases.
This allows the ClusterAPI provider to ignore the
`topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset
processor. It also adds unit tests to exercise the new processor.
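The core of such a processor is a set of labels that are stripped before node templates are compared; the names below (`ignoredLabels`, `comparableLabels`) are illustrative, not the actual processor API:

```go
package main

import "fmt"

// ignoredLabels sketches the set a clusterapi-specific nodegroupset
// processor might skip when comparing node group templates.
var ignoredLabels = map[string]bool{
	"topology.ebs.csi.aws.com/zone": true,
}

// comparableLabels returns a copy of labels with the ignored keys
// removed, so two otherwise-identical groups in different zones
// compare as similar.
func comparableLabels(labels map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range labels {
		if !ignoredLabels[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	filtered := comparableLabels(map[string]string{
		"topology.ebs.csi.aws.com/zone":    "us-east-1a",
		"node.kubernetes.io/instance-type": "m5.xlarge",
	})
	fmt.Println(filtered)
}
```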
This change adds the aforementioned label to the list of ignored labels
in the AWS nodegroupset processor. This change is being made in response
to the addition of this label by the aws-ebs-csi-driver. This label will
eventually be deprecated by the driver, but its use will prevent AWS
users from properly balancing similar node groups. It also adds a unit
test for the AWS processor.
ref: https://github.com/kubernetes/autoscaler/issues/3230
ref: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729
Every other processor (and, I think, every function in CA?) that takes
AutoscalingContext has it as its first parameter. This changes the new
processor to match, for consistency.
Supports providing different NodeInfos sources (either upstream or in
local forks, e.g. to properly implement variants like in #4000).
This also moves a large and specialized code chunk out of core, and removes
the need to maintain and pass the GetNodeInfosForGroups() cache from the side,
as processors can hold their state themselves.
No functional changes to GetNodeInfosForGroups(), outside mechanical changes
due to the move: call a few utils functions from the core/utils package,
pick attributes off the context (the processor takes the context as an
argument rather than ListerRegistry + PredicateChecker + CloudProvider), and
use the built-in cache rather than receiving it as an argument.
This is the first step of implementing
https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343.
A new method was added to the cloudprovider interface. All existing
providers were updated with a no-op stub implementation that results in
no behavior change.
The config values specified per NodeGroup are not yet applied.
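A no-op stub of this shape keeps existing providers' behavior unchanged; the method name `PerGroupOptions` and the types below are illustrative stand-ins for this note, not the actual interface addition:

```go
package main

import "fmt"

// GroupOptions stands in for the per-NodeGroup configuration values.
type GroupOptions struct{ ScaleDownUtilizationThreshold float64 }

// NodeGroup sketches the provider-facing interface after the new method
// is added (hypothetical name and signature).
type NodeGroup interface {
	// PerGroupOptions returns per-group overrides, or nil to fall back
	// to the global defaults.
	PerGroupOptions(defaults GroupOptions) (*GroupOptions, error)
}

// legacyGroup shows the no-op stub existing providers received:
// returning nil overrides means the defaults are used, so behavior
// does not change.
type legacyGroup struct{}

func (legacyGroup) PerGroupOptions(defaults GroupOptions) (*GroupOptions, error) {
	return nil, nil
}

func main() {
	opts, err := legacyGroup{}.PerGroupOptions(GroupOptions{ScaleDownUtilizationThreshold: 0.5})
	fmt.Println(opts == nil, err == nil)
}
```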
Commit bb2eed1cff introduced a new `topology.gke.io/zone` label to
GCE node templates, for CSI needs.
That label holds the zone name, making nodeInfo templates dissimilar
for groups belonging to different zones. The CA otherwise tries to
ignore such zonal labels (i.e. it ignores the standard LabelZoneRegion
and LabelZoneFailureDomain) when it looks for node group similarities.
In testing, AWS M5 instances can occasionally display approximately a 1%
difference in memory capacity between availability zones, even when deployed
with the same launch configuration and the same AMI.
Allow a 1.5% tolerance to give some buffer over the observed discrepancy,
since in testing some examples were just over 1% (e.g. 1.05%, 1.1%).
Tests are included with capacity values taken from real instances to prevent future
regression.
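The tolerance check amounts to a ratio comparison between the smaller and larger capacity; the function name and the sample values below are illustrative, not taken from the actual processor:

```go
package main

import "fmt"

// maxCapacityRatio allows ~1.5% discrepancy in memory capacity between
// otherwise-identical node groups.
const maxCapacityRatio = 1.015

// similarCapacity reports whether two memory capacities (e.g. in MiB)
// fall within the tolerance, regardless of which one is larger.
func similarCapacity(a, b float64) bool {
	if a > b {
		a, b = b, a
	}
	return b <= a*maxCapacityRatio
}

func main() {
	// Roughly a 1.1% difference, of the kind observed between AZs.
	fmt.Println(similarCapacity(15944, 16120))
}
```

Comparing the ratio rather than an absolute byte difference keeps the check meaningful across instance sizes.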