* fix: move leader elect flag binding above InitFlags()
* Revert https://github.com/kubernetes/autoscaler/pull/7233. That PR broke the `--leader-elect` flag by introducing `--lease-resource-name`, which is redundant with `--leader-elect-resource-name`.
* fix: move leader election flag binding above flag parsing, which happens in kube_flag.InitFlags() (see the sketch below)
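
A minimal sketch of the ordering issue, assuming the flags are bound via `componentbaseoptions.BindLeaderElectionFlags` (the exact wiring in CA's main.go may differ): any flag bound after `kube_flag.InitFlags()` has already missed command-line parsing and silently keeps its default.

```go
package main

import (
	"github.com/spf13/pflag"

	kube_flag "k8s.io/component-base/cli/flag"
	componentbaseconfig "k8s.io/component-base/config"
	componentbaseoptions "k8s.io/component-base/config/options"
)

func main() {
	leaderElection := componentbaseconfig.LeaderElectionConfiguration{LeaderElect: true}

	// Bind --leader-elect, --leader-elect-resource-name, etc. BEFORE parsing.
	componentbaseoptions.BindLeaderElectionFlags(&leaderElection, pflag.CommandLine)

	// InitFlags() parses the command line; a leader election flag bound after
	// this call would never receive the value passed by the user.
	kube_flag.InitFlags()

	_ = leaderElection
}
```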
---------
Co-authored-by: Daniel Kłobuszewski <danielmk@google.com>
With the previous default of random, the cluster autoscaler could start very expensive nodes that it then does not manage to remove until another, smaller node is started.
This is needed so that the scheduler code correctly includes and
executes the DRA plugin.
We could just use the feature gate instead of the DRA flag in CA
(the feature gates flag is already there, just not really used),
but I guess there could be use-cases for having DRA enabled in the
cluster but not in CA (e.g. DRA being tested in the cluster, CA only
operating on non-DRA nodes/pods).
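
As a rough illustration of that distinction (the flag name below is an assumption, not necessarily CA's real spelling; the gate check uses the standard `utilfeature`/`features` packages), the cluster-wide feature gate and the CA-level switch can be read independently:

```go
package main

import (
	"flag"

	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// Hypothetical flag name, for illustration only.
var enableDRA = flag.Bool("enable-dynamic-resource-allocation", false,
	"Whether CA simulates DRA objects during scale-up and scale-down.")

// draEnabledInCluster reports whether the DRA feature gate is on cluster-wide.
func draEnabledInCluster() bool {
	return utilfeature.DefaultFeatureGate.Enabled(features.DynamicResourceAllocation)
}

// draEnabledInCA reports whether CA itself should handle DRA objects; it can
// stay false even when the cluster gate is on (e.g. DRA only being tested).
func draEnabledInCA() bool {
	return *enableDRA
}
```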
Make SharedDRAManager a part of the ClusterSnapshotStore interface, and
implement dummy methods to satisfy the interface. Actual implementation
will come in later commits.
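
A rough sketch of the stub approach; the method names mirror the scheduler framework's SharedDRAManager shape, but the placeholder types below are assumptions rather than the real CA definitions:

```go
package store

// Placeholder shapes for illustration; the real interfaces come from the
// scheduler framework and the CA simulator packages.
type (
	ResourceClaimTracker interface{}
	ResourceSliceLister  interface{}
	DeviceClassLister    interface{}
)

// SharedDRAManager is the part being added to ClusterSnapshotStore.
type SharedDRAManager interface {
	ResourceClaims() ResourceClaimTracker
	ResourceSlices() ResourceSliceLister
	DeviceClasses() DeviceClassLister
}

// basicSnapshotStore stands in for a ClusterSnapshotStore implementation; the
// dummy methods only satisfy the interface until the real logic lands.
type basicSnapshotStore struct{}

func (s *basicSnapshotStore) ResourceClaims() ResourceClaimTracker { return nil }
func (s *basicSnapshotStore) ResourceSlices() ResourceSliceLister  { return nil }
func (s *basicSnapshotStore) DeviceClasses() DeviceClassLister     { return nil }
```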
This is needed so that ClusterSnapshot can feed DRA objects to the DRA
scheduler plugin, and obtain ResourceClaim modifications back from it.
The integration is behind the DRA flag guard, so this should be a no-op
if the flag is disabled.
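
A deliberately simplified, hypothetical sketch of that round trip (all type and method names here are made up for illustration): the snapshot hands its tracked claims to the DRA plugin, and allocations the plugin computes are written back so later simulations see them.

```go
package dra

// ResourceClaim is a stand-in for the real k8s.io/api resource type.
type ResourceClaim struct {
	Namespace, Name string
	Allocated       bool
}

// draSnapshot tracks DRA objects alongside the cluster snapshot.
type draSnapshot struct {
	claims map[string]*ResourceClaim
}

// ResourceClaims feeds the tracked claims to the DRA scheduler plugin.
func (s *draSnapshot) ResourceClaims() []*ResourceClaim {
	out := make([]*ResourceClaim, 0, len(s.claims))
	for _, c := range s.claims {
		out = append(out, c)
	}
	return out
}

// RecordClaimAllocation writes a modification computed by the plugin back into
// the snapshot, so subsequent simulated scheduling sees the claim as allocated.
func (s *draSnapshot) RecordClaimAllocation(claim *ResourceClaim) {
	claim.Allocated = true
	s.claims[claim.Namespace+"/"+claim.Name] = claim
}
```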
This flag will be used to guard any behavior-changing logic needed for
DRA, to make it clear that existing behavior for non-DRA use-cases is
preserved.
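
A small sketch of the guard pattern under those assumptions (the snapshot type and helper names are hypothetical): DRA-specific bookkeeping runs only when the flag is set, and the pre-existing path is untouched otherwise.

```go
package simulator

import apiv1 "k8s.io/api/core/v1"

// Hypothetical, simplified snapshot and helpers for illustration only.
type snapshot struct {
	draEnabled bool
}

func (s *snapshot) reserveClaimsFor(pod *apiv1.Pod, nodeName string) error { return nil }
func (s *snapshot) forceAddPod(pod *apiv1.Pod, nodeName string) error      { return nil }

// AddPod only takes the DRA branch when the flag is enabled, so behavior for
// non-DRA use-cases is preserved.
func (s *snapshot) AddPod(pod *apiv1.Pod, nodeName string) error {
	if s.draEnabled {
		if err := s.reserveClaimsFor(pod, nodeName); err != nil {
			return err
		}
	}
	return s.forceAddPod(pod, nodeName)
}
```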
This decouples PredicateChecker from the Framework initialization logic,
and allows creating multiple PredicateChecker instances while only
initializing the framework once.
This commit also fixes how CA integrates with Framework metrics. Instead
of Registering them, they are only Initialized, so that CA doesn't expose
scheduler metrics. The initialization is also moved from multiple
different places to the Handle constructor.
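
A hypothetical, simplified sketch of the resulting split (none of these are the real CA types): the Handle is constructed once, doing the expensive framework setup and metric initialization, while PredicateChecker instances just borrow it.

```go
package predicatechecker

// Handle wraps the scheduler framework that was initialized exactly once.
type Handle struct{}

// NewHandle performs the expensive one-time setup: building the scheduler
// framework and initializing (not registering) its metrics, so CA does not
// expose them.
func NewHandle() (*Handle, error) {
	return &Handle{}, nil
}

// PredicateChecker runs scheduler predicates against a shared Handle.
type PredicateChecker struct {
	fwHandle *Handle
}

// NewPredicateChecker is cheap and can be called many times: it reuses an
// already-initialized Handle instead of re-initializing the framework.
func NewPredicateChecker(fwHandle *Handle) *PredicateChecker {
	return &PredicateChecker{fwHandle: fwHandle}
}
```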
- Set initial count to zero for various autoscaler error types (e.g., CloudProviderError, ApiCallError)
- Define failed scale-up reasons and initialize metrics (e.g., CloudProviderError, APIError)
- Initialize pod eviction result counters for success and failure cases
- Initialize skipped scale events for CPU and memory resource limits in both scale-up and scale-down directions
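
The zero-initialization pattern behind the bullets above, sketched with a plain Prometheus counter vector (metric and reason names here are assumptions; CA itself uses k8s.io/component-base/metrics): touching every known label combination with `Add(0)` makes each series show up immediately at 0, so "no errors yet" is distinguishable from "no data".

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// failedScaleUpCount counts failed scale-ups by reason (illustrative name).
var failedScaleUpCount = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "failed_scale_ups_total",
		Help: "Number of failed scale-ups, by reason.",
	},
	[]string{"reason"},
)

// InitFailedScaleUpMetrics pre-creates each reason's series with value zero.
func InitFailedScaleUpMetrics() {
	for _, reason := range []string{"cloudProviderError", "apiCallError"} {
		failedScaleUpCount.WithLabelValues(reason).Add(0)
	}
}
```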
Signed-off-by: Thiha Min Thant <thihaminthant20@gmail.com>
* Add backoff mechanism for ProvReq retry
* Add flags for initial and max backoff time, and cache size (sketched below)
* Review remarks
* Add LRU cache
* Review remark
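
A minimal sketch of how a ProvisioningRequest retry backoff with an LRU cache could fit together, assuming `k8s.io/utils/lru` is used and with constants standing in for the new initial-backoff, max-backoff and cache-size flags (all names here are illustrative):

```go
package provreq

import (
	"time"

	"k8s.io/utils/lru"
)

// Stand-ins for the values supplied by the new CLI flags.
const (
	initialBackoff = 1 * time.Minute
	maxBackoff     = 10 * time.Minute
	cacheSize      = 1000
)

// backoffCache remembers, per ProvisioningRequest, how long to wait before the
// next retry; the LRU bound keeps memory flat when many ProvReqs churn.
var backoffCache = lru.New(cacheSize)

// nextRetryAfter returns the delay to apply before retrying the given ProvReq
// and stores a doubled backoff for the next failure, capped at maxBackoff.
func nextRetryAfter(key string) time.Duration {
	backoff := initialBackoff
	if v, ok := backoffCache.Get(key); ok {
		backoff = v.(time.Duration)
	}
	next := 2 * backoff
	if next > maxBackoff {
		next = maxBackoff
	}
	backoffCache.Add(key, next)
	return backoff
}
```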