313 Commits

Author SHA1 Message Date
Yahia Naguib 241ad7af1e
update address description 2025-03-10 14:25:44 +00:00
Yahia Naguib 3e9d11b732
Migrating flags off main.go to a separate package 2025-03-07 21:11:27 +00:00
Kubernetes Prow Robot 173a4bde19
Merge pull request #7897 from mtrqq/bug/block-until-resource-caches-are-synced
Block cluster autoscaler until API resource caches are synced.
2025-03-07 06:19:45 -08:00
Maksym Fuhol 24f68f98e2 Block cluster autoscaler until API resource caches are synced. 2025-03-07 13:48:26 +00:00
Maciej Skoczeń 90eabc6a4d Differentiate provisioning requests using Parameters field. Keep prefixing as not recommended approach 2025-03-04 11:41:51 +00:00
Maciej Skoczeń 7115527077 Allow to prefix provisioningClassName to filter provisioning requests 2025-03-04 10:48:21 +00:00
Kubernetes Prow Robot 6324acbf6e
Merge pull request #7631 from jfcoz/feat/lestwastedefaultexpander
change default cluster-autoscaler to least-waste
2025-02-05 01:18:17 -08:00
idebeijer 131e95433c
docs: add comment to prevent leader-election flag binding from being moved 2025-01-24 12:46:51 +01:00
idb 49a0c57c79
fix: add `--leader-elect` flags back by reverting https://github.com/kubernetes/autoscaler/pull/7233 (#7761)
* fix: move leader elect flag binding above InitFlags()

* Revert https://github.com/kubernetes/autoscaler/pull/7233

https://github.com/kubernetes/autoscaler/pull/7233 broke `--leader-elect` flag by introducing `--lease-resource-name` that is redundant with `--leader-elect-resource-name`

* fix: move leader election flag binding above flag parsing which happens in kube_flag.InitFlags()

---------

Co-authored-by: Daniel Kłobuszewski <danielmk@google.com>
2025-01-24 00:41:22 -08:00
Kubernetes Prow Robot f6064ee8e3
Merge pull request #7688 from macsko/enforce_provisioning_request_processing_even_if_all_pods_are_young
Enforce provisioning requests processing even if all pods are new
2025-01-23 01:54:58 -08:00
Julien Francoz f8a68efe63 change default cluster-autoscaler to least-waste
With the previous default of random, this could lead to starting very expensive nodes that the cluster autoscaler does not manage to remove as long as another smaller node is started.
2025-01-20 21:47:13 +01:00
Maciej Skoczeń d7c325abf7 Enforce provisioning requests processing even if all pods are new 2025-01-10 13:08:56 +00:00
Maciej Skoczeń e7811b86fa Improve frequent loops when only one of activities is productive 2025-01-09 09:33:23 +00:00
Maciej Skoczeń 39882551f7 Parallelize cluster snapshot creation 2025-01-03 10:35:11 +00:00
Kubernetes Prow Robot 50c65906fd
Merge pull request #7530 from towca/jtuznik/dra-actual
CA: DRA integration MVP
2024-12-20 16:30:08 +01:00
Kuba Tużnik 4a89524f84 CA: enable the DRA feature gate whenever the DRA flag is passed
This is needed so that the scheduler code correctly includes and
executes the DRA plugin.

We could just use the feature gate instead of the DRA flag in CA
(the feature gates flag is already there, just not really used),
but I guess there could be use-cases for having DRA enabled in the
cluster but not in CA (e.g. DRA being tested in the cluster, CA only
operating on non-DRA nodes/pods).
2024-12-20 13:30:37 +01:00
Kuba Tużnik 99282c08cb CA: automatically use BasicSnapshotStore when DRA is enabled
By default CA is built with DeltaSnapshotStore, which isn't integrated
with DRA yet.
2024-12-20 13:30:37 +01:00
Kubernetes Prow Robot e4898a9563
Merge pull request #7611 from macsko/dont_accept_pr_twice_when_check_capacity_batch_processing_enabled
Don't accept ProvisioningRequest twice when checkCapacityBatchProcessing enabled
2024-12-17 10:58:53 +01:00
Kubernetes Prow Robot ae22146f60
Merge pull request #7449 from thiha-min-thant/failed-scale-ups-metrics
🐛(metrics) Initialize metrics for autoscaler errors, scale events, and pod evictions
2024-12-16 11:10:51 +01:00
Maciej Skoczeń 2426d7f836 Don't accept ProvisioningRequest twice when checkCapacityBatchProcessing enabled 2024-12-16 09:57:18 +00:00
Kuba Tużnik d0338fa301 CA: integrate simulator with schedulerframework.SharedDRAManager
Make SharedDRAManager a part of the ClusterSnapshotStore interface, and
implement dummy methods to satisfy the interface. Actual implementation
will come in later commits.

This is needed so that ClusterSnapshot can feed DRA objects to the DRA
scheduler plugin, and obtain ResourceClaim modifications back from it.

The integration is behind the DRA flag guard, this should be a no-op
if the flag is disabled.
2024-12-09 17:14:34 +01:00
Kuba Tużnik 8c7f3fadc6 CA: plumb the DRA flag guard to PredicateSnapshot 2024-12-06 13:40:47 +01:00
Kuba Tużnik d9036c856e CA: add a flag guard for DRA to AutoscalingOptions (disabled by default)
This flag will be used to guard any behavior-changing logic needed for
DRA, to make it clear that existing behavior for non-DRA use-cases is
preserved.
2024-12-04 18:07:18 +01:00
Kuba Tużnik 6876289228 CA: remove PredicateChecker, use the new ClusterSnapshot methods instead 2024-12-04 14:33:51 +01:00
Kuba Tużnik 0ace148d3d CA: rename BasicClusterSnapshot and DeltaClusterSnapshot to reflect the ClusterSnapshotStore change 2024-12-04 14:33:51 +01:00
Kuba Tużnik 67773a5509 CA: move BasicClusterSnapshot and DeltaClusterSnapshot to a dedicated subpkg 2024-12-04 14:33:51 +01:00
Kuba Tużnik 540725286f CA: migrate the codebase to use PredicateSnapshot 2024-12-04 14:33:51 +01:00
Kuba Tużnik a35f830f1d CA: extract a Handle to schedulerframework.Framework out of PredicateChecker
This decouples PredicateChecker from the Framework initialization logic,
and allows creating multiple PredicateChecker instances while only
initializing the framework once.

This commit also fixes how CA integrates with Framework metrics. Instead
of Registering them they're only Initialized so that CA doesn't expose
scheduler metrics. And the initialization is moved from multiple
different places to the Handle constructor.
2024-12-03 16:47:54 +01:00
Bartłomiej Wróblewski a0bf1082b5 Add flag to force remove long unregistered nodes 2024-11-18 13:55:15 +00:00
Thiha Min Thant ffd57af618
🐛(metrics) Initialize metrics for autoscaler errors, scale events, and pod evictions
- Set initial count to zero for various autoscaler error types (e.g., CloudProviderError, ApiCallError)
- Define failed scale-up reasons and initialize metrics (e.g., CloudProviderError, APIError)
- Initialize pod eviction result counters for success and failure cases
- Initialize skipped scale events for CPU and memory resource limits in both scale-up and scale-down directions

Signed-off-by: Thiha Min Thant <thihaminthant20@gmail.com>
2024-11-02 15:43:56 +08:00
Devansh Das 63d02751b1 Implement batch processing for check capacity class with combined status 2024-10-24 21:14:55 +00:00
Bartłomiej Wróblewski 068ce78272 Register scheduler metrics 2024-10-23 16:47:34 +00:00
Devansh Das 1ce64e93d4 Add support for frequent loops when provisioningrequest is encountered in last iteration 2024-10-20 17:55:05 +00:00
Devansh Das 0946d851e7
Revert "Add support for frequent loops when provisioningrequest is encountered in last iteration" 2024-10-18 12:21:04 +02:00
Devansh Das 0a64fb0c27 Add support for frequent loops when provisioningrequest is encountered in last iteration 2024-10-15 09:37:54 +00:00
olagacek 44dcaa8cf3
Revert "CAS: cloudprovider-specific nodegroupset" 2024-10-04 12:54:22 +02:00
Yaroslava Serdiuk 04b1402ddc
Add backoff mechanism for ProvReq retry (#7182)
* Add backoff mechanism for ProvReq retry

* Add flags for initial and max backoff time, and cache size

* Review remarks

* Add LRU cache

* Review remark
2024-09-23 09:16:00 +01:00
Devansh Das 94a5ef81e6 Removed redundant error check and variable declarations 2024-09-16 20:47:49 +00:00
Omran 38ce500d5f
Fix scale up status processor overriding default one with proactive scaleup enabled 2024-09-12 18:26:56 +00:00
Jack Francis 4ff4079041 cloudprovider-specific nodegroupset
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2024-09-06 10:09:40 -07:00
Devansh Das c3825193c2 Add lease resource name customization using flag 2024-09-04 08:10:18 +00:00
Daniel Kłobuszewski 7f30a7b8d1 Remove legacy scale down code 2024-08-28 11:07:40 +02:00
Omran 01e943304a
Add proactive scaleup 2024-08-23 21:59:15 +00:00
Omran e30bf14730
Add upcoming node groups state checker 2024-08-22 07:42:38 +00:00
mendelski c06ec4b324
Add async node group creation 2024-08-20 12:02:01 +00:00
Daniel Gutowski 4d845bd749 Add option to pass custom filter function for nodes
This will allow users to filter out some of the nodes when
filtering out potentially schedulable pods.
2024-07-16 02:48:19 -07:00
Kubernetes Prow Robot 68a757c191
Merge pull request #6880 from yaroslava-serdiuk/provreq-scale-down
BookCapacity for ProvisioningRequest pods
2024-07-12 11:19:00 -07:00
Damika Gamlath 8971a29177 refactor gce.RegenerateMigInstancesCache() to use Instance.List API for listing MIG instances 2024-07-04 14:48:24 +00:00
Kubernetes Prow Robot 3c6dd26d9e
Merge pull request #6863 from rrangith/azure-default-sizes
Default min/max sizes for Azure VMSSs
2024-07-01 07:29:35 -07:00
Yaroslava Serdiuk 830bbb2653 BookCapacity for ProvisioningRequest pods 2024-06-21 11:54:46 +00:00