autoscaler

Commit Graph

Author	SHA1	Message	Date
Bartłomiej Wróblewski	2c7d8dc378	Rewrite TestCloudProvider to use builder pattern	2025-05-23 12:42:15 +00:00
Norbert Cyran	9a5e3d9f3d	Allow using scheduled pods as samples in proactive scale up	2025-03-19 12:33:39 +01:00
Daniel Kłobuszewski	bac35046fb	Fix incorrect usage of klog Warningf function The .*f variants should only ever be called with arguments to format. This should've really been a part of https://github.com/kubernetes/autoscaler/pull/7917	2025-03-13 13:50:39 +01:00
Kubernetes Prow Robot	5e7a559aa8	Merge pull request #7841 from pmendelski/force-ds-fix2 Force system-node-critical daemon-sets in node group templates	2025-03-06 10:49:45 -08:00
Maciej Skoczeń	90eabc6a4d	Differentiate provisioning requests using Parameters field. Keep prefixing as not recommended approach	2025-03-04 11:41:51 +00:00
Maciej Skoczeń	7115527077	Allow to prefix provisioningClassName to filter provisioning requests	2025-03-04 10:48:21 +00:00
mendelski	4f58055eeb	Skip to be deleted nodes from template candidates	2025-02-17 18:10:36 +00:00
Justyna Betkier	b8db30c2fb	Improve events when max total nodes of the cluster is reached. - log cluster wide event - previous event would never get fired because the estimators would already cap the options they generate and additionally it would fire once and events are kept only for some time - log per pod event explaining why the scale up is not triggered (previously it would either get no scale up because no matching group or it would not get an event at all) This required adding a list of pods that were unschedulable to the status in case when the max total nodes were reached.	2025-02-12 13:24:51 +01:00
Justyna Betkier	86ee2b723a	Improve logging when the cluster reaches max nodes total. - add autoscaling status to reflect that - change the log severity to warning as this means that autoscaler will not be fully functional (in particular scaling up will not work) - fix the scale up enforcer logic not to skip the max nodes reached logging point	2025-01-29 13:37:48 +01:00
Maciej Skoczeń	d7c325abf7	Enforce provisioning requests processing even if all pods are new	2025-01-10 13:08:56 +00:00
Maciej Skoczeń	39882551f7	Parallelize cluster snapshot creation	2025-01-03 10:35:11 +00:00
Kubernetes Prow Robot	7b648361c3	Merge pull request #7613 from walidghallab/err Refactor NewAutoscalerError function.	2024-12-17 13:48:53 +01:00
Walid Ghallab	720f5946fd	Refactor NewAutoscalerError function. We will have two functions instead of one: 1. One that doesn't do formatting, like klog.Error 2. One that accepts formatting, like klog.Errorf The main reason behind this is to avoid go vet errors and have clear interfaces to catch accidental bugs and rely on go vet to catch those accidental bugs (or go test in go 1.24, as those are treated as errors).	2024-12-16 17:46:40 +00:00
Maciej Skoczeń	2426d7f836	Don't accept ProvisioningRequest twice when checkCapacityBatchProcessing enabled	2024-12-16 09:57:18 +00:00
Kubernetes Prow Robot	37b3da4e79	Merge pull request #7529 from towca/jtuznik/dra-prep CA: prepare for DRA integration	2024-12-09 17:14:03 +00:00
Kuba Tużnik	466f94b780	CA: extend ClusterSnapshotStore to allow storing, retrieving and modifying DRA objects A new DRA Snapshot type is introduced, for now with just dummy methods to be implemented in later commits. The new type is intended to hold all DRA objects in the cluster. ClusterSnapshotStore.SetClusterState() is extended to take the new DRA Snapshot in addition to the existing parameters. ClusterSnapshotStore.DraSnapshot() is added to retrieve the DRA snapshot set by SetClusterState() back. This will be used by PredicateSnapshot to implement DRA logic later. This should be a no-op, as DraSnapshot() is never called, and no DRA snapshot is passed to SetClusterState() yet.	2024-12-09 17:14:45 +01:00
Kubernetes Prow Robot	52dd6d7488	Merge pull request #7561 from gabesaba/check_capacity_parallel [CheckCapacity] Update Conditions in Parallel	2024-12-05 14:00:00 +00:00
Gabe	5877f9670f	[CheckCapacity] Set Provisioned/Accepted in parallel	2024-12-05 12:58:54 +00:00
Kuba Tużnik	6876289228	CA: remove PredicateChecker, use the new ClusterSnapshot methods instead	2024-12-04 14:33:51 +01:00
Kuba Tużnik	0ace148d3d	CA: rename BasicClusterSnapshot and DeltaClusterSnapshot to reflect the ClusterSnapshotStore change	2024-12-04 14:33:51 +01:00
Kuba Tużnik	67773a5509	CA: move BasicClusterSnapshot and DeltaClusterSnapshot to a dedicated subpkg	2024-12-04 14:33:51 +01:00
Kuba Tużnik	540725286f	CA: migrate the codebase to use PredicateSnapshot	2024-12-04 14:33:51 +01:00
Kuba Tużnik	a35f830f1d	CA: extract a Handle to scheduleframework.Framework out of PredicateChecker This decouples PredicateChecker from the Framework initialization logic, and allows creating multiple PredicateChecker instances while only initializing the framework once. This commit also fixes how CA integrates with Framework metrics. Instead of Registering them they're only Initialized so that CA doesn't expose scheduler metrics. And the initialization is moved from multiple different places to the Handle constructor.	2024-12-03 16:47:54 +01:00
Kuba Tużnik	eb26816ce9	CA: refactor utils related to NodeInfos simulator.BuildNodeInfoForNode, core_utils.GetNodeInfoFromTemplate, and scheduler_utils.DeepCopyTemplateNode all had very similar logic for sanitizing and copying NodeInfos. They're all consolidated to one file in simulator, sharing common logic. DeepCopyNodeInfo is changed to be a framework.NodeInfo method. MixedTemplateNodeInfoProvider now correctly uses ClusterSnapshot to correlate Nodes to scheduled pods, instead of using a live Pod lister. This means that the snapshot now has to be properly initialized in a bunch of tests.	2024-11-27 12:51:30 +01:00
Kuba Tużnik	a81aa5c616	CA: remove AddNode from ClusterSnapshot AddNodeInfo already provides the same functionality, and has to be used in production code in order to propagate DRA objects correctly. Uses in production are replaced with SetClusterState(), which will later take DRA objects into account. Uses in the test code are replaced with AddNodeInfo().	2024-11-19 15:28:16 +01:00
Kuba Tużnik	879c6a84a4	DRA: migrate all of CA to use the new internal NodeInfo/PodInfo The new wrapper types should behave like the direct schedulerframework types for most purposes, so most of the migration is just changing the imported package. Constructors look a bit different, so they have to be adapted - mostly in test code. Accesses to the Pods field have to be changed to a method call. After this, the schedulerframework types are only used in the new wrappers, and in the parts of simulator/ that directly interact with the scheduler framework. The rest of CA codebase operates on the new wrapper types.	2024-11-05 16:43:43 +01:00
Omran	f945fc4add	Modify scale down set processor to add reasons to unremovable nodes	2024-10-29 10:28:37 +00:00
Devansh Das	d73bdb1902	Implement unit tests for batch processing of check capacity class	2024-10-24 21:14:55 +00:00
Bartłomiej Wróblewski	068ce78272	Register scheduler metrics	2024-10-23 16:47:34 +00:00
Devansh Das	1ce64e93d4	Add support for frequent loops when provisioningrequest is encountered in last iteration	2024-10-20 17:55:05 +00:00
Devansh Das	0946d851e7	Revert "Add support for frequent loops when provisioningrequest is encountered in last iteration"	2024-10-18 12:21:04 +02:00
Kubernetes Prow Robot	64a64322d4	Merge pull request #7376 from damikag/cleanup-remove-or-update-logs Remove/update spamming logs	2024-10-16 13:37:03 +01:00
mendelski	4ef901cdbb	Synchronize access to scale-ups in AsyncNodeGroupInitializer	2024-10-16 11:32:35 +00:00
Devansh Das	0a64fb0c27	Add support for frequent loops when provisioningrequest is encountered in last iteration	2024-10-15 09:37:54 +00:00
Kubernetes Prow Robot	9a2e450164	Merge pull request #7310 from kawych/htn Remove an assumption that node initialization can be performed with a single 'targetSize' number input	2024-10-11 15:14:20 +01:00
Damika Gamlath	e20e5e600b	Remove spamming logs in compare_nodegroups.go and filter_out_daemon_sets.go Change the log level and type of spamming logs in clusterstate.go and pre_filtering_processor.go	2024-10-10 08:48:24 +00:00
olagacek	44dcaa8cf3	Revert "CAS: cloudprovider-specific nodegroupset"	2024-10-04 12:54:22 +02:00
Mahmoud Atwa	b185b14ea1	Report only injected pods after enforcing pod limit	2024-10-03 16:32:00 +00:00
Mahmoud Atwa	16688dcdbb	Adds injection metrics for fake pod injection	2024-10-03 12:19:03 +00:00
Karol Wychowaniec	95ea94cf4e	Remove an assumption that node initialization can be performed with a single 'targetSize' number input	2024-10-02 11:55:22 +00:00
Yaroslava Serdiuk	04b1402ddc	Add backoff mechanism for ProvReq retry (#7182) * Add backoff mechanism for ProvReq retry * Add flags for initial and max backoff time, and cache size * Review remarks * Add LRU cache * Review remark	2024-09-23 09:16:00 +01:00
Omran	38ce500d5f	Fix scale up status processor overriding default one with proactive scaleup enabled	2024-09-12 18:26:56 +00:00
Yaroslava Serdiuk	93897d8d1b	Delete old ProvReqs	2024-09-11 12:59:18 +00:00
Jack Francis	4ff4079041	cloudprovider-specific nodegroupset Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2024-09-06 10:09:40 -07:00
Devansh Das	6d5bfeb67c	Add unit test to ensure unschedulable pods slice is not overwritten by injector	2024-09-06 14:10:17 +00:00
Devansh Das	835b79bfce	Subdivide provision method	2024-09-06 10:26:38 +00:00
Walid Ghallab	a91c771f37	Fix nil pointer check in nodegroup_manager.go	2024-09-02 13:01:49 +00:00
Kubernetes Prow Robot	9226cf6bb2	Merge pull request #7145 from abdelrahman882/proactive-scaleup Add proactive scaleup	2024-08-26 16:40:14 +01:00
Omran	01e943304a	Add proactive scaleup	2024-08-23 21:59:15 +00:00
Kubernetes Prow Robot	70f0bcbca9	Merge pull request #7195 from aleksandra-malinowska/prov-req-api-v1-5 ProvisioningRequest v1 client	2024-08-23 17:07:53 +01:00

222 Commits