autoscaler

Commit Graph

Author	SHA1	Message	Date
Clint Fooken	c94740f437	Fixing helper function to simplify for loop to retrieve deleted node names.	2022-12-05 13:11:52 -08:00
Clint Fooken	1198fbcd90	Updating error messaging and fallback behavior of hasCloudProviderInstance. Changing deletedNodes to store empty struct instead of node values, and modifying the helper function to utilize that information for tests.	2022-12-05 12:44:39 -08:00
Clint Fooken	08dfc7e20f	Changing deletion logic to rely on a new helper method in ClusterStateRegistry, and remove old complicated logic. Adjust the naming of the method for cloud instance deletion from NodeExists to HasInstance.	2022-11-04 17:54:05 -07:00
Clint Fooken	e59c0441ff	Fixing go formatting issues with clusterstate_test	2022-10-17 15:17:28 -07:00
Clint	cf67a3004e	Implementing new cloud provider method for node deletion detection (#1) * Adding isNodeDeleted method to CloudProvider interface. Supports detecting whether nodes are fully deleted or are not autoscaled. Updated cloud providers to provide an initial implementation of the new method that returns ErrNotImplemented, preserving the existing taint-based deletion calculation in clusterstate.	2022-10-17 14:58:38 -07:00
Clint Fooken	776d7311a1	Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes. Avoids misidentifying not autoscaled nodes as deleted. Simplified implementation to use apiv1.Node instead of new struct. Expanded test cases to include not autoscaled nodes and tracking deleted nodes over multiple updates. Adding check to backfill loop to confirm cloud provider node no longer exists before flagging the node as deleted. Modifying some comments to be more accurate. Replacing erroneous line deletion.	2022-10-17 14:40:01 -07:00
Daniel Kłobuszewski	66bfe55077	Revert "Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes"	2022-07-13 10:08:03 +02:00
Clint Fooken	ee80c93ae4	Fixing test case for DeletedNodes.	2022-05-17 12:54:53 -07:00
Clint Fooken	a278255519	Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes. Including code changes first introduced in PR#4211, which will remove taints from all nodes on restarts.	2022-05-17 12:37:42 -07:00
weidongcai	03a0475502	Expose backoff time parameters	2022-05-12 15:34:28 +08:00
Vivek Bagade	8c592f0c04	Fix bug where a node that becomes ready after 2 mins can be treated as unready. Deprecated LongNotStarted In cases where node n1 would: 1) Be created at t=0min 2) Ready condition is true at t=2.5min 3) Not ready taint is removed at t=3min the ready node is counted as unready Tested cases after fix: 1) Case described above 2) Nodes not starting even after 15mins still treated as unready 3) Nodes created long ago that suddenly become unready are counted as unready.	2021-03-11 18:32:51 +01:00
Eric Mrak and Brett Kochendorfer	8442ba8307	Add argument for Status Configmap tests	2021-02-18 17:21:32 +00:00
Jakub Tużnik	6a528b45de	Include taints by condition when determining if a node is unready/still starting Conditions and their corresponding taints can sometimes skew, which can cause unnecessary scale-up. CA thinks nodes are ready because it looks only at the conditions, but scheduler predicates fail because they consider the taints as well. CA adds nodes, even though the existing nodes are still starting. This commit brings CA behavior in line with scheduler predicates behavior, eliminating the unnecessary scale-up.	2020-11-02 11:15:42 +01:00
Jakub Tużnik	f64b6cd4de	CSR: fix a bug in GetClusterSize Currently, GetClusterSize reports the target number for all autoscaled node groups, but the actual number for _all_ node groups, even those that are not autoscaled. This commit fixes that behavior so that both target and actual size reported are from autoscaled node groups only.	2019-11-20 13:49:49 +01:00
Łukasz Osipiuk	79b4614328	Use NodeDiskPressure condition instead of NodeOutOfDisk	2019-09-05 23:23:43 +02:00
Jakub Tużnik	bb382f47f9	Retain information about scale-up failures in CSR This will provide the AutoscalingStatusProcessor with information about failed scale-ups.	2019-06-05 16:53:30 +02:00
Łukasz Osipiuk	b5f9a9505c	Extend backoff interface with NodeInfo and error information	2019-01-09 11:25:34 +01:00
Łukasz Osipiuk	85a83b62bd	Pass nodeGroup->NodeInfo map to ClusterStateRegistry Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801	2019-01-08 15:52:00 +01:00
Łukasz Osipiuk	5cddbda693	Rename nodeGroupBackoffInfo to backoff in ClusterStateRegistry	2018-12-31 17:59:58 +01:00
Łukasz Osipiuk	da5bef307b	Allow updating Increase for ScaleUpRequest in ClusterStateRegistry	2018-12-28 17:17:07 +01:00
Łukasz Osipiuk	5962354c81	Inject Backoff instance to ClusterStateRegistry on creation	2018-11-13 14:25:16 +01:00
Łukasz Osipiuk	0e2c3739b7	Use NodeGroup as key in Backoff	2018-10-30 18:17:26 +01:00
Łukasz Osipiuk	55fc1e2f00	Store NodeGroup in ScaleUpRequest and ScaleDownRequest	2018-10-30 18:03:04 +01:00
Aleksandra Malinowska	364e2da764	Check for ready condition not true	2018-08-30 13:43:24 +02:00
Jakub Tużnik	054f0b3b90	Add AutoscalingStatusProcessor	2018-08-07 14:47:06 +02:00
Krzysztof Jastrzebski	dd1db7a0ac	Move backoff mechanism to utils.	2018-06-13 15:32:25 +02:00
Aleksandra Malinowska	4c594db7f8	Run spellchecker	2018-03-15 15:47:49 +01:00
Edward Tsang	4104a91991	more spelling fixes	2017-11-02 14:21:36 -07:00
Maciej Pytel	ff21b0b00c	Keep track of nodes that failed to register for a long time Previously a node that failed to register and couldn't be deleted basically broke CA.	2017-09-27 16:32:04 +02:00
Maciej Pytel	a440d92a60	Log event on scale-up timeout	2017-09-01 14:19:14 +02:00
Maciej Pytel	6aacbb5bf7	Backoff for node group after failed scale-up	2017-08-04 15:40:23 +02:00
Aleksandra Malinowska	c159a90f04	rename test provider package	2017-07-06 16:23:15 +02:00
Marcin Wielgus	fc43808149	Godeps bump for CA	2017-07-03 22:05:11 +02:00
Marcin Wielgus	0a8a88c580	Handle empty node groups in cluster state	2017-05-19 17:46:53 +02:00
Marcin Wielgus	34eb4973f8	Fix imports in cluster autoscaler after migrating it from contrib	2017-04-18 15:42:04 +02:00
Maciej Pytel	10d560dae6	Cluster-Autoscaler: handle nil node group In a few places we assumed it's non-nil, leading to segfaults.	2017-03-13 14:46:11 +01:00
Maciej Pytel	46d2c66473	Cluster-autoscaler: set timestamps in status configmap	2017-03-08 11:51:20 +01:00
Marcin Wielgus	8cfed0b474	Cluster-autoscaler: GetStatus - scaleDown	2017-02-21 19:56:07 +01:00
Marcin Wielgus	87f0d62b28	Cluster-autoscaler: scale up status	2017-02-21 16:21:36 +01:00
Marcin Wielgus	d9d5a751f5	Cluster-autoscaler: GetState() - health condition	2017-02-21 13:15:19 +01:00
Marcin Wielgus	ce45c33d29	Cluster-autoscaler: update CA code for godep refresh	2017-01-20 14:46:34 +01:00
Marcin Wielgus	dd98a2d339	Cluster-autoscaler: unregistered nodes in cluster state registry	2017-01-12 17:58:12 +01:00
Marcin Wielgus	e5e87e5c96	Cluster-autoscaler: Add information how long node group incorrect size persisted	2017-01-10 14:17:51 +01:00
Marcin Wielgus	949cf37465	Cluster-autoscaler: support unready nodes in scale down	2017-01-03 14:17:59 +01:00
Marcin Wielgus	ec04ea4279	Cluster-Autoscaler: upcoming nodes in ClusterStateRegistry	2016-12-30 13:43:19 +01:00
Marcin Wielgus	d5229046ff	Cluster-autoscaler: cluster status registry	2016-12-29 15:04:15 +01:00

46 Commits