* fix: move leader elect flag binding above InitFlags()
* Revert https://github.com/kubernetes/autoscaler/pull/7233. That PR broke the `--leader-elect` flag by introducing `--lease-resource-name`, which is redundant with `--leader-elect-resource-name`.
* fix: move leader election flag binding above flag parsing, which happens in kube_flag.InitFlags() (see the sketch below)
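
A minimal sketch of the ordering issue, assuming the flags are bound via `componentbaseoptions.BindLeaderElectionFlags` (the exact wiring in CA's main.go may differ): any flag bound after `kube_flag.InitFlags()` has already missed command-line parsing and silently keeps its default.

```go
package main

import (
	"github.com/spf13/pflag"

	kube_flag "k8s.io/component-base/cli/flag"
	componentbaseconfig "k8s.io/component-base/config"
	componentbaseoptions "k8s.io/component-base/config/options"
)

func main() {
	leaderElection := componentbaseconfig.LeaderElectionConfiguration{LeaderElect: true}

	// Bind --leader-elect, --leader-elect-resource-name, etc. BEFORE parsing.
	componentbaseoptions.BindLeaderElectionFlags(&leaderElection, pflag.CommandLine)

	// InitFlags() parses the command line; a leader election flag bound after
	// this call would never receive the value passed by the user.
	kube_flag.InitFlags()

	_ = leaderElection
}
```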
---------
Co-authored-by: Daniel Kłobuszewski <danielmk@google.com>
With the previous default of random, the cluster autoscaler could start very expensive nodes that it then does not manage to remove until another, smaller node is started.
This is needed so that the scheduler code correctly includes and
executes the DRA plugin.
We could just use the feature gate instead of the DRA flag in CA
(the feature gates flag is already there, just not really used),
but I guess there could be use-cases for having DRA enabled in the
cluster but not in CA (e.g. DRA being tested in the cluster, CA only
operating on non-DRA nodes/pods).
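
As a rough illustration of that distinction (the flag name below is an assumption, not necessarily CA's real spelling; the gate check uses the standard `utilfeature`/`features` packages), the cluster-wide feature gate and the CA-level switch can be read independently:

```go
package main

import (
	"flag"

	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// Hypothetical flag name, for illustration only.
var enableDRA = flag.Bool("enable-dynamic-resource-allocation", false,
	"Whether CA simulates DRA objects during scale-up and scale-down.")

// draEnabledInCluster reports whether the DRA feature gate is on cluster-wide.
func draEnabledInCluster() bool {
	return utilfeature.DefaultFeatureGate.Enabled(features.DynamicResourceAllocation)
}

// draEnabledInCA reports whether CA itself should handle DRA objects; it can
// stay false even when the cluster gate is on (e.g. DRA only being tested).
func draEnabledInCA() bool {
	return *enableDRA
}
```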
Make SharedDRAManager a part of the ClusterSnapshotStore interface, and
implement dummy methods to satisfy the interface. Actual implementation
will come in later commits.
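
A rough sketch of the stub approach; the method names mirror the scheduler framework's SharedDRAManager shape, but the placeholder types below are assumptions rather than the real CA definitions:

```go
package store

// Placeholder shapes for illustration; the real interfaces come from the
// scheduler framework and the CA simulator packages.
type (
	ResourceClaimTracker interface{}
	ResourceSliceLister  interface{}
	DeviceClassLister    interface{}
)

// SharedDRAManager is the part being added to ClusterSnapshotStore.
type SharedDRAManager interface {
	ResourceClaims() ResourceClaimTracker
	ResourceSlices() ResourceSliceLister
	DeviceClasses() DeviceClassLister
}

// basicSnapshotStore stands in for a ClusterSnapshotStore implementation; the
// dummy methods only satisfy the interface until the real logic lands.
type basicSnapshotStore struct{}

func (s *basicSnapshotStore) ResourceClaims() ResourceClaimTracker { return nil }
func (s *basicSnapshotStore) ResourceSlices() ResourceSliceLister  { return nil }
func (s *basicSnapshotStore) DeviceClasses() DeviceClassLister     { return nil }
```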
This is needed so that ClusterSnapshot can feed DRA objects to the DRA
scheduler plugin, and obtain ResourceClaim modifications back from it.
The integration is behind the DRA flag guard, so this should be a no-op
if the flag is disabled.
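
A deliberately simplified, hypothetical sketch of that round trip (all type and method names here are made up for illustration): the snapshot hands its tracked claims to the DRA plugin, and allocations the plugin computes are written back so later simulations see them.

```go
package dra

// ResourceClaim is a stand-in for the real k8s.io/api resource type.
type ResourceClaim struct {
	Namespace, Name string
	Allocated       bool
}

// draSnapshot tracks DRA objects alongside the cluster snapshot.
type draSnapshot struct {
	claims map[string]*ResourceClaim
}

// ResourceClaims feeds the tracked claims to the DRA scheduler plugin.
func (s *draSnapshot) ResourceClaims() []*ResourceClaim {
	out := make([]*ResourceClaim, 0, len(s.claims))
	for _, c := range s.claims {
		out = append(out, c)
	}
	return out
}

// RecordClaimAllocation writes a modification computed by the plugin back into
// the snapshot, so subsequent simulated scheduling sees the claim as allocated.
func (s *draSnapshot) RecordClaimAllocation(claim *ResourceClaim) {
	claim.Allocated = true
	s.claims[claim.Namespace+"/"+claim.Name] = claim
}
```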
This flag will be used to guard any behavior-changing logic needed for
DRA, to make it clear that existing behavior for non-DRA use-cases is
preserved.
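
A small sketch of the guard pattern under those assumptions (the snapshot type and helper names are hypothetical): DRA-specific bookkeeping runs only when the flag is set, and the pre-existing path is untouched otherwise.

```go
package simulator

import apiv1 "k8s.io/api/core/v1"

// Hypothetical, simplified snapshot and helpers for illustration only.
type snapshot struct {
	draEnabled bool
}

func (s *snapshot) reserveClaimsFor(pod *apiv1.Pod, nodeName string) error { return nil }
func (s *snapshot) forceAddPod(pod *apiv1.Pod, nodeName string) error      { return nil }

// AddPod only takes the DRA branch when the flag is enabled, so behavior for
// non-DRA use-cases is preserved.
func (s *snapshot) AddPod(pod *apiv1.Pod, nodeName string) error {
	if s.draEnabled {
		if err := s.reserveClaimsFor(pod, nodeName); err != nil {
			return err
		}
	}
	return s.forceAddPod(pod, nodeName)
}
```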
This decouples PredicateChecker from the Framework initialization logic,
and allows creating multiple PredicateChecker instances while only
initializing the framework once.
This commit also fixes how CA integrates with Framework metrics. Instead
of Registering them, they are only Initialized, so that CA doesn't expose
scheduler metrics. The initialization is also moved from multiple
different places to the Handle constructor.
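
A hypothetical, simplified sketch of the resulting split (none of these are the real CA types): the Handle is constructed once, doing the expensive framework setup and metric initialization, while PredicateChecker instances just borrow it.

```go
package predicatechecker

// Handle wraps the scheduler framework that was initialized exactly once.
type Handle struct{}

// NewHandle performs the expensive one-time setup: building the scheduler
// framework and initializing (not registering) its metrics, so CA does not
// expose them.
func NewHandle() (*Handle, error) {
	return &Handle{}, nil
}

// PredicateChecker runs scheduler predicates against a shared Handle.
type PredicateChecker struct {
	fwHandle *Handle
}

// NewPredicateChecker is cheap and can be called many times: it reuses an
// already-initialized Handle instead of re-initializing the framework.
func NewPredicateChecker(fwHandle *Handle) *PredicateChecker {
	return &PredicateChecker{fwHandle: fwHandle}
}
```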
- Set initial count to zero for various autoscaler error types (e.g., CloudProviderError, ApiCallError)
- Define failed scale-up reasons and initialize metrics (e.g., CloudProviderError, APIError)
- Initialize pod eviction result counters for success and failure cases
- Initialize skipped scale events for CPU and memory resource limits in both scale-up and scale-down directions
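
The zero-initialization pattern behind the bullets above, sketched with a plain Prometheus counter vector (metric and reason names here are assumptions; CA itself uses k8s.io/component-base/metrics): touching every known label combination with `Add(0)` makes each series show up immediately at 0, so "no errors yet" is distinguishable from "no data".

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// failedScaleUpCount counts failed scale-ups by reason (illustrative name).
var failedScaleUpCount = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "failed_scale_ups_total",
		Help: "Number of failed scale-ups, by reason.",
	},
	[]string{"reason"},
)

// InitFailedScaleUpMetrics pre-creates each reason's series with value zero.
func InitFailedScaleUpMetrics() {
	for _, reason := range []string{"cloudProviderError", "apiCallError"} {
		failedScaleUpCount.WithLabelValues(reason).Add(0)
	}
}
```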
Signed-off-by: Thiha Min Thant <thihaminthant20@gmail.com>
* Add backoff mechanism for ProvReq retry
* Add flags for initial and max backoff time, and cache size (sketched below)
* Review remarks
* Add LRU cache
* Review remark
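
A minimal sketch of how a ProvisioningRequest retry backoff with an LRU cache could fit together, assuming `k8s.io/utils/lru` is used and with constants standing in for the new initial-backoff, max-backoff and cache-size flags (all names here are illustrative):

```go
package provreq

import (
	"time"

	"k8s.io/utils/lru"
)

// Stand-ins for the values supplied by the new CLI flags.
const (
	initialBackoff = 1 * time.Minute
	maxBackoff     = 10 * time.Minute
	cacheSize      = 1000
)

// backoffCache remembers, per ProvisioningRequest, how long to wait before the
// next retry; the LRU bound keeps memory flat when many ProvReqs churn.
var backoffCache = lru.New(cacheSize)

// nextRetryAfter returns the delay to apply before retrying the given ProvReq
// and stores a doubled backoff for the next failure, capped at maxBackoff.
func nextRetryAfter(key string) time.Duration {
	backoff := initialBackoff
	if v, ok := backoffCache.Get(key); ok {
		backoff = v.(time.Duration)
	}
	next := 2 * backoff
	if next > maxBackoff {
		next = maxBackoff
	}
	backoffCache.Add(key, next)
	return backoff
}
```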