46 Commits

Author SHA1 Message Date
Clint Fooken c94740f437 Fixing helper function to simplify for loop to retrieve deleted node names. 2022-12-05 13:11:52 -08:00
Clint Fooken 1198fbcd90 Updating error messaging and fallback behavior of hasCloudProviderInstance. Changing deletedNodes to store empty struct instead of node values, and modifying the helper function to utilize that information for tests. 2022-12-05 12:44:39 -08:00
Clint Fooken 08dfc7e20f Changing deletion logic to rely on a new helper method in ClusterStateRegistry, and remove old complicated logic. Adjust the naming of the method for cloud instance deletion from NodeExists to HasInstance. 2022-11-04 17:54:05 -07:00
Clint Fooken e59c0441ff Fixing go formatting issues with clusterstate_test 2022-10-17 15:17:28 -07:00
Clint cf67a3004e
Implementing new cloud provider method for node deletion detection (#1)
* Adding isNodeDeleted method to CloudProvider interface. Supports detecting whether nodes are fully deleted or are not-autoscaled. Updated cloud providers to provide initial implementation of new method that will return an ErrNotImplemented to maintain existing taint-based deletion clusterstate calculation.
2022-10-17 14:58:38 -07:00
Clint Fooken 776d7311a1 Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes. Avoids misidentifying not autoscaled nodes as deleted. Simplified implementation to use apiv1.Node instead of new struct. Expanded test cases to include not autoscaled nodes and tracking deleted nodes over multiple updates.
Adding check to backfill loop to confirm cloud provider node no longer exists before flagging the node as deleted. Modifying some comments to be more accurate. Replacing erroneous line deletion.
2022-10-17 14:40:01 -07:00
Daniel Kłobuszewski 66bfe55077
Revert "Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes" 2022-07-13 10:08:03 +02:00
Clint Fooken ee80c93ae4 Fixing test case for DeletedNodes. 2022-05-17 12:54:53 -07:00
Clint Fooken a278255519 Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes. Including code changes first introduced in PR#4211, which will remove taints from all nodes on restarts. 2022-05-17 12:37:42 -07:00
weidongcai 03a0475502 Expose backoff time parameters 2022-05-12 15:34:28 +08:00
Vivek Bagade 8c592f0c04 Fix bug where a node that becomes ready after 2 mins can be
treated as unready. Deprecated LongNotStarted

 In cases where node n1 would:
 1) Be created at t=0min
 2) Ready condition is true at t=2.5min
 3) Not ready taint is removed at t=3min
 the ready node is counted as unready

 Tested cases after fix:
 1) Case described above
 2) Nodes not starting even after 15mins still
 treated as unready
 3) Nodes created long ago that suddenly become unready are
 counted as unready.
2021-03-11 18:32:51 +01:00
Eric Mrak and Brett Kochendorfer 8442ba8307 Add argument for Status Configmap tests 2021-02-18 17:21:32 +00:00
Jakub Tużnik 6a528b45de Include taints by condition when determining if a node is unready/still starting
Conditions and their corresponding taints can sometimes skew, which
can cause unnecessary scale-up. CA thinks nodes are ready because it
looks only at the conditions, but scheduler predicates fail because they
consider the taints as well. CA adds nodes, even though the existing
nodes are still starting. This commit brings CA behavior in line
with scheduler predicates behavior, eliminating the unnecessary
scale-up.
2020-11-02 11:15:42 +01:00
Jakub Tużnik f64b6cd4de CSR: fix a bug in GetClusterSize
Currently, GetClusterSize reports the target number for all autoscaled
node groups, but the actual number for _all_ node groups, even those
that are not autoscaled. This commit fixes that behavior so that both
target and actual size reported are from autoscaled node groups only.
2019-11-20 13:49:49 +01:00
Łukasz Osipiuk 79b4614328 Use NodeDiskPressure condition instead of NodeOutOfDisk 2019-09-05 23:23:43 +02:00
Jakub Tużnik bb382f47f9 Retain information about scale-up failures in CSR
This will provide the AutoscalingStatusProcessor with information
about failed scale-ups.
2019-06-05 16:53:30 +02:00
Łukasz Osipiuk b5f9a9505c Extend backoff interface with NodeInfo and error information 2019-01-09 11:25:34 +01:00
Łukasz Osipiuk 85a83b62bd Pass nodeGroup->NodeInfo map to ClusterStateRegistry
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Łukasz Osipiuk 5cddbda693 Rename nodeGroupBackoffInfo to backoff in ClusterStateRegistry 2018-12-31 17:59:58 +01:00
Łukasz Osipiuk da5bef307b Allow updating Increase for ScaleUpRequest in ClusterStateRegistry 2018-12-28 17:17:07 +01:00
Łukasz Osipiuk 5962354c81 Inject Backoff instance to ClusterStateRegistry on creation 2018-11-13 14:25:16 +01:00
Łukasz Osipiuk 0e2c3739b7 Use NodeGroup as key in Backoff 2018-10-30 18:17:26 +01:00
Łukasz Osipiuk 55fc1e2f00 Store NodeGroup in ScaleUpRequest and ScaleDownRequest 2018-10-30 18:03:04 +01:00
Aleksandra Malinowska 364e2da764 Check for ready condition not true 2018-08-30 13:43:24 +02:00
Jakub Tużnik 054f0b3b90 Add AutoscalingStatusProcessor 2018-08-07 14:47:06 +02:00
Krzysztof Jastrzebski dd1db7a0ac Move backoff mechanism to utils. 2018-06-13 15:32:25 +02:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Maciej Pytel a440d92a60 Log event on scale-up timeout 2017-09-01 14:19:14 +02:00
Maciej Pytel 6aacbb5bf7 Backoff for node group after failed scale-up 2017-08-04 15:40:23 +02:00
Aleksandra Malinowska c159a90f04 rename test provider package 2017-07-06 16:23:15 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Marcin Wielgus 0a8a88c580 Handle empty node groups in cluster state 2017-05-19 17:46:53 +02:00
Marcin Wielgus 34eb4973f8 Fix imports in cluster autoscaler after migrating it from contrib 2017-04-18 15:42:04 +02:00
Maciej Pytel 10d560dae6 Cluster-Autoscaler: handle nil node group
In a few places we assumed it's not nil, leading
to segfaults.
2017-03-13 14:46:11 +01:00
Maciej Pytel 46d2c66473 Cluster-autoscaler: set timestamps in status configmap 2017-03-08 11:51:20 +01:00
Marcin Wielgus 8cfed0b474 Cluster-autoscaler: GetStatus - scaleDown 2017-02-21 19:56:07 +01:00
Marcin Wielgus 87f0d62b28 Cluster-autoscaler: scale up status 2017-02-21 16:21:36 +01:00
Marcin Wielgus d9d5a751f5 Cluster-autoscaler: GetState() - health condition 2017-02-21 13:15:19 +01:00
Marcin Wielgus ce45c33d29 Cluster-autoscaler: update CA code for godep refresh 2017-01-20 14:46:34 +01:00
Marcin Wielgus dd98a2d339 Cluster-autoscaler: unregistered nodes in cluster state registry 2017-01-12 17:58:12 +01:00
Marcin Wielgus e5e87e5c96 Cluster-autoscaler: Add information how long node group incorrect size persisted 2017-01-10 14:17:51 +01:00
Marcin Wielgus 949cf37465 Cluster-autoscaler: support unready nodes in scale down 2017-01-03 14:17:59 +01:00
Marcin Wielgus ec04ea4279 Cluster-Autoscaler: upcoming nodes in ClusterStateRegistry 2016-12-30 13:43:19 +01:00
Marcin Wielgus d5229046ff Cluster-autoscaler: cluster status registry 2016-12-29 15:04:15 +01:00