Clint Fooken
c94740f437
Fixing helper function to simplify for loop to retrieve deleted node names.
2022-12-05 13:11:52 -08:00
Clint Fooken
1198fbcd90
Updating error messaging and fallback behavior of hasCloudProviderInstance. Changing deletedNodes to store empty struct instead of node values, and modifying the helper function to utilize that information for tests.
2022-12-05 12:44:39 -08:00
Clint Fooken
08dfc7e20f
Changing deletion logic to rely on a new helper method in ClusterStateRegistry, and removing old complicated logic. Adjusting the naming of the method for cloud instance deletion from NodeExists to HasInstance.
2022-11-04 17:54:05 -07:00
Clint Fooken
e59c0441ff
Fixing go formatting issues with clusterstate_test
2022-10-17 15:17:28 -07:00
Clint
cf67a3004e
Implementing new cloud provider method for node deletion detection (#1)
* Adding isNodeDeleted method to CloudProvider interface. Supports detecting whether nodes are fully deleted or are not-autoscaled. Updated cloud providers to provide initial implementation of new method that will return an ErrNotImplemented to maintain existing taint-based deletion clusterstate calculation.
2022-10-17 14:58:38 -07:00
Clint Fooken
776d7311a1
Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes. Avoids misidentifying not autoscaled nodes as deleted. Simplified implementation to use apiv1.Node instead of new struct. Expanded test cases to include not autoscaled nodes and tracking deleted nodes over multiple updates.
Adding check to backfill loop to confirm cloud provider node no longer exists before flagging the node as deleted. Modifying some comments to be more accurate. Replacing erroneous line deletion.
2022-10-17 14:40:01 -07:00
Daniel Kłobuszewski
66bfe55077
Revert "Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes"
2022-07-13 10:08:03 +02:00
Clint Fooken
ee80c93ae4
Fixing test case for DeletedNodes.
2022-05-17 12:54:53 -07:00
Clint Fooken
a278255519
Adding support for identifying nodes that have been deleted from cloud provider that are still registered within Kubernetes. Including code changes first introduced in PR#4211, which will remove taints from all nodes on restarts.
2022-05-17 12:37:42 -07:00
weidongcai
03a0475502
Expose backoff time parameters
2022-05-12 15:34:28 +08:00
Vivek Bagade
8c592f0c04
Fix bug where a node that becomes ready after 2 mins can be treated as unready. Deprecated LongNotStarted
In cases where node n1:
1) Is created at t=0min
2) Has its Ready condition become true at t=2.5min
3) Has its not-ready taint removed at t=3min
the ready node was counted as unready.
Tested cases after fix:
1) The case described above
2) Nodes not starting even after 15mins are still treated as unready
3) Nodes created long ago that suddenly become unready are counted as unready.
2021-03-11 18:32:51 +01:00
Eric Mrak and Brett Kochendorfer
8442ba8307
Add argument for Status Configmap tests
2021-02-18 17:21:32 +00:00
Jakub Tużnik
6a528b45de
Include taints by condition when determining if a node is unready/still starting
Conditions and their corresponding taints can sometimes skew, which
can cause unnecessary scale-up. CA thinks nodes are ready because it
looks only at the conditions, but scheduler predicates fail because they
consider the taints as well. CA adds nodes, even though the existing
nodes are still starting. This commit brings CA behavior in line
with scheduler predicates behavior, eliminating the unnecessary
scale-up.
2020-11-02 11:15:42 +01:00
Jakub Tużnik
f64b6cd4de
CSR: fix a bug in GetClusterSize
Currently, GetClusterSize reports the target number for all autoscaled
node groups, but the actual number for _all_ node groups, even those
that are not autoscaled. This commit fixes that behavior so that both
target and actual size reported are from autoscaled node groups only.
2019-11-20 13:49:49 +01:00
Łukasz Osipiuk
79b4614328
Use NodeDiskPressure condition instead of NodeOutOfDisk
2019-09-05 23:23:43 +02:00
Jakub Tużnik
bb382f47f9
Retain information about scale-up failures in CSR
This will provide the AutoscalingStatusProcessor with information
about failed scale-ups.
2019-06-05 16:53:30 +02:00
Łukasz Osipiuk
b5f9a9505c
Extend backoff interface with NodeInfo and error information
2019-01-09 11:25:34 +01:00
Łukasz Osipiuk
85a83b62bd
Pass nodeGroup->NodeInfo map to ClusterStateRegistry
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Łukasz Osipiuk
5cddbda693
Rename nodeGroupBackoffInfo to backoff in ClusterStateRegistry
2018-12-31 17:59:58 +01:00
Łukasz Osipiuk
da5bef307b
Allow updating Increase for ScaleUpRequest in ClusterStateRegistry
2018-12-28 17:17:07 +01:00
Łukasz Osipiuk
5962354c81
Inject Backoff instance to ClusterStateRegistry on creation
2018-11-13 14:25:16 +01:00
Łukasz Osipiuk
0e2c3739b7
Use NodeGroup as key in Backoff
2018-10-30 18:17:26 +01:00
Łukasz Osipiuk
55fc1e2f00
Store NodeGroup in ScaleUpRequest and ScaleDownRequest
2018-10-30 18:03:04 +01:00
Aleksandra Malinowska
364e2da764
Check for ready condition not true
2018-08-30 13:43:24 +02:00
Jakub Tużnik
054f0b3b90
Add AutoscalingStatusProcessor
2018-08-07 14:47:06 +02:00
Krzysztof Jastrzebski
dd1db7a0ac
Move backoff mechanism to utils.
2018-06-13 15:32:25 +02:00
Aleksandra Malinowska
4c594db7f8
Run spellchecker
2018-03-15 15:47:49 +01:00
Edward Tsang
4104a91991
more spelling fixes
2017-11-02 14:21:36 -07:00
Maciej Pytel
ff21b0b00c
Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Maciej Pytel
a440d92a60
Log event on scale-up timeout
2017-09-01 14:19:14 +02:00
Maciej Pytel
6aacbb5bf7
Backoff for node group after failed scale-up
2017-08-04 15:40:23 +02:00
Aleksandra Malinowska
c159a90f04
rename test provider package
2017-07-06 16:23:15 +02:00
Marcin Wielgus
fc43808149
Godeps bump for CA
2017-07-03 22:05:11 +02:00
Marcin Wielgus
0a8a88c580
Handle empty node groups in cluster state
2017-05-19 17:46:53 +02:00
Marcin Wielgus
34eb4973f8
Fix imports in cluster autoscaler after migrating it from contrib
2017-04-18 15:42:04 +02:00
Maciej Pytel
10d560dae6
Cluster-Autoscaler: handle nil node group
In a few places we assumed it's non-nil, leading
to segfaults.
2017-03-13 14:46:11 +01:00
Maciej Pytel
46d2c66473
Cluster-autoscaler: set timestamps in status configmap
2017-03-08 11:51:20 +01:00
Marcin Wielgus
8cfed0b474
Cluster-autoscaler: GetStatus - scaleDown
2017-02-21 19:56:07 +01:00
Marcin Wielgus
87f0d62b28
Cluster-autoscaler: scale up status
2017-02-21 16:21:36 +01:00
Marcin Wielgus
d9d5a751f5
Cluster-autoscaler: GetState() - health condition
2017-02-21 13:15:19 +01:00
Marcin Wielgus
ce45c33d29
Cluster-autoscaler: update CA code for godep refresh
2017-01-20 14:46:34 +01:00
Marcin Wielgus
dd98a2d339
Cluster-autoscaler: unregistered nodes in cluster state registry
2017-01-12 17:58:12 +01:00
Marcin Wielgus
e5e87e5c96
Cluster-autoscaler: Add information on how long a node group's incorrect size persisted
2017-01-10 14:17:51 +01:00
Marcin Wielgus
949cf37465
Cluster-autoscaler: support unready nodes in scale down
2017-01-03 14:17:59 +01:00
Marcin Wielgus
ec04ea4279
Cluster-Autoscaler: upcoming nodes in ClusterStateRegistry
2016-12-30 13:43:19 +01:00
Marcin Wielgus
d5229046ff
Cluster-autoscaler: cluster status registry
2016-12-29 15:04:15 +01:00