autoscaler

Commit Graph

Author	SHA1	Message	Date
Bartłomiej Wróblewski	2c7d8dc378	Rewrite TestCloudProvider to use builder pattern	2025-05-23 12:42:15 +00:00
Norbert Cyran	6ab7e2eb78	Prevent nil dereference of preFilterStatus	2025-05-07 10:38:20 +02:00
Piotr Betkier	ac1c7b5463	use k8s.io/component-helpers/resource for pod request calculations	2025-04-22 17:36:17 +02:00
Norbert Cyran	9a5e3d9f3d	Allow using scheduled pods as samples in proactive scale up	2025-03-19 12:33:39 +01:00
Kubernetes Prow Robot	5e7a559aa8	Merge pull request #7841 from pmendelski/force-ds-fix2 Force system-node-critical daemon-sets in node group templates	2025-03-06 10:49:45 -08:00
Norbert Cyran	a946d3d7c9	Make AddTaints and CleanTaints thread safe	2025-02-26 16:17:09 +01:00
mendelski	68c7d1a84e	Force preempting system-node-critical daemon sets	2025-02-17 18:10:27 +00:00
Norbert Cyran	1ffc13fc0d	Ensure added taints are present on the node in the snapshot	2025-02-17 11:36:14 +01:00
Karol Wychowaniec	5245a5b934	Minor refactor to scale-up orchestrator for more re-usability	2025-01-21 14:19:59 +00:00
Kubernetes Prow Robot	50c65906fd	Merge pull request #7530 from towca/jtuznik/dra-actual CA: DRA integration MVP	2024-12-20 16:30:08 +01:00
Kuba Tużnik	377639a8dc	CA: implement dynamicresources.Snapshot for storing and modifying the state of DRA objects The Snapshot can hold all DRA objects in the cluster, and expose them to the scheduler framework via the SharedDRAManager interface. The state of the objects can be modified during autoscaling simulations using the provided methods.	2024-12-20 13:30:10 +01:00
Kuba Tużnik	66d0aeb3cb	CA: implement utils for interacting with ResourceClaims These utils will be used by various parts of the DRA logic in the following commits.	2024-12-19 15:55:49 +01:00
Walid Ghallab	720f5946fd	Refactor NewAutoscalerError function. We will have two functions instead of one: 1. One that doesn't do formatting, like klog.Error 2. One that accepts formatting, like klog.Errorf The main reason behind this is to avoid go vet errors and have clear interfaces to catch accidental bugs and rely on go vet to catch those accidental bugs (or go test in go 1.24, as those are treated as errors).	2024-12-16 17:46:40 +00:00
Kuba Tużnik	eb26816ce9	CA: refactor utils related to NodeInfos simulator.BuildNodeInfoForNode, core_utils.GetNodeInfoFromTemplate, and scheduler_utils.DeepCopyTemplateNode all had very similar logic for sanitizing and copying NodeInfos. They're all consolidated to one file in simulator, sharing common logic. DeepCopyNodeInfo is changed to be a framework.NodeInfo method. MixedTemplateNodeInfoProvider now correctly uses ClusterSnapshot to correlate Nodes to scheduled pods, instead of using a live Pod lister. This means that the snapshot now has to be properly initialized in a bunch of tests.	2024-11-27 12:51:30 +01:00
Kuba Tużnik	bc16a6f55b	CA: wrap the provided errors in ToAutoscalerError() and AddPrefix(), implement Unwrap() This allows using errors.Is() to check if an AutoscalerError wraps a sentinel error (e.g. cloudprovider.ErrNotImplemented) when a prefix is added to it.	2024-11-27 12:44:59 +01:00
Kuba Tużnik	879c6a84a4	DRA: migrate all of CA to use the new internal NodeInfo/PodInfo The new wrapper types should behave like the direct schedulerframework types for most purposes, so most of the migration is just changing the imported package. Constructors look a bit different, so they have to be adapted - mostly in test code. Accesses to the Pods field have to be changed to a method call. After this, the schedulerframework types are only used in the new wrappers, and in the parts of simulator/ that directly interact with the scheduler framework. The rest of CA codebase operates on the new wrapper types.	2024-11-05 16:43:43 +01:00
Kuba Tużnik	358f8c0d21	DRA: remove utils/test dependency on cloudprovider utils/test is supposed to be usable in any CA package. Having a dependency on cloudprovider makes it unusable in any package that cloudprovider depends on because of import cycles. The cloudprovider import is only needed by GetGpuConfigFromNode, which is only used in info_test.go. This commit just moves GetGpuConfigFromNode there as an unexported function.	2024-11-05 16:42:26 +01:00
Kubernetes Prow Robot	c75c3254b2	Merge pull request #7152 from walidghallab/sanitize Sanitize ENV from podSpec to improve scheduling simulation performance.	2024-08-29 09:06:30 +01:00
Walid Ghallab	e6ee407001	Sanitize ENV from podSpec. This is because it doesn't affect scheduling, but it affects performance of scheduling simulation. It triggered alert from 'overflowing_controllers_count' metric.	2024-08-28 14:50:28 +00:00
Devansh Das	1dba38e850	Run gofmt	2024-08-14 14:24:29 +00:00
Devansh Das	ed38cbbffa	Add unique error message for when in-cluster config is not found	2024-08-12 17:59:59 +00:00
Devansh Das	ee6d4f9b5d	Add in cluster kubeconfig config	2024-08-12 17:59:43 +00:00
Kubernetes Prow Robot	442c35391c	Merge pull request #6782 from jbartosik/nice-format-blocking-pod-reason Format BlockingPodReason in human readable way with %v	2024-05-07 04:25:48 -07:00
Joachim Bartosik	4e42fe78b6	Format BlockingPodReason in human readable way with %v Without this the enum is output as its int value, which is not readable	2024-05-02 10:44:54 +02:00
Joachim Bartosik	d8e1e6dd0e	Test enum formatting	2024-05-02 10:40:46 +02:00
Kubernetes Prow Robot	3fd892a37b	Merge pull request #6725 from ZihanJiang96/filter-out-evicted-pod-on-drained-nodes-1 filter out pods that have deletion timestamp set in currentlyDrainedPods	2024-04-30 02:33:09 -07:00
Zihan Jiang	e6ab3438fa	refine log and test case	2024-04-26 20:10:48 -07:00
Zihan Jiang	15ca53f4a6	add unit test	2024-04-25 14:26:59 -07:00
Yaroslava Serdiuk	5f94f2c429	Add provreqOrchestrator that handle ProvReq classes (#6627) * Add provreqOrchestrator that handle ProvReq classes * Review remarks * Review remarks	2024-04-17 09:37:54 -07:00
Daniel Gutowski	5aa6b2cb07	Introduce binpacking optimization for similar pods. The optimization uses the fact that pods which are equivalent do not need to be checked multiple times against already filled nodes. This changes the time complexity from O(pods*nodes) to O(pods).	2024-04-04 10:15:47 -07:00
Walid Ghallab	aada657452	Add BuildTestNodeWithAllocatable test utility method.	2024-02-27 19:26:48 +00:00
Daniel Kłobuszewski	a842d4f108	Reduce log spam in AtomicResizeFilteringProcessor Also, introduce default per-node logging quotas. For now, identical to the per-pod ones.	2024-02-07 12:01:05 +01:00
Yaroslava Serdiuk	ed6ebbe8ba	ScaleUp for check-capacity ProvisioningRequestClass (#6451) * ScaleUp for check-capacity ProvisioningRequestClass * update condition logic * Update tests * Naming update * Update cluster-autoscaler/core/scaleup/orchestrator/wrapper_orchestrator_test.go Co-authored-by: Bartek Wróblewski <bwroblewski@google.com> --------- Co-authored-by: Bartek Wróblewski <bwroblewski@google.com>	2024-01-30 02:36:59 -08:00
Kubernetes Prow Robot	838ba52860	Merge pull request #6378 from BigDarkClown/multitaint Taint utils taking multiple taints	2024-01-12 14:07:47 +01:00
Bartłomiej Wróblewski	c4063ef8d6	Taint utils taking multiple taints	2024-01-11 16:45:07 +00:00
Joachim Bartosik	a5e540d5da	Restore flags for setting QPS limit in CA Partially undo #6274. I noticed that with this change CA get rate limited and slows down significantly (especially during large scale downs).	2023-12-29 13:28:08 +00:00
Kubernetes Prow Robot	fc48d5c052	Merge pull request #6139 from damikag/priority-evictor Implement priority based evictor	2023-12-21 18:18:53 +01:00
damikag	9ffbea4408	implement priority based evictor and refactor drain logic	2023-12-21 16:57:05 +00:00
Kubernetes Prow Robot	2afb9683dd	Merge pull request #6376 from x13n/master [GCE] Support paginated instance listing	2023-12-15 12:36:18 +01:00
Daniel Kłobuszewski	e3d3303f89	[GCE] Support paginated instance listing	2023-12-15 09:16:09 +01:00
Walid Ghallab	f89427ad9f	Make backoff.Status.ErrorInfo non-pointer. Change-Id: I1f812d4d6f42db97670ef7304fc0e895c837a13b	2023-12-14 15:28:27 +00:00
Walid Ghallab	cf6176f80d	Add error details to autoscaling backoff. Change-Id: I3b5c62ba13c2e048ce2d7170016af07182c11eee	2023-12-14 13:45:55 +00:00
Kubernetes Prow Robot	0a1d74f352	Merge pull request #6294 from vadasambar/refactor/kube-client refactor(*): move getKubeClient to utils/kubernetes	2023-11-29 19:28:44 +01:00
qianlei.qianl	ae18f05a61	refactor(*): move getKubeClient to utils/kubernetes (cherry picked from commit `b9f636d2ef`) Signed-off-by: qianlei.qianl <qianlei.qianl@bytedance.com> refactor: move logic to create client to utils/kubernetes pkg - expose `CreateKubeClient` as public function - make `GetKubeConfig` into a private `getKubeConfig` function (can be exposed as a public function in the future if needed) Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: CI failing because cloudproviders were not updated to use new autoscaling option fields Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: define errors as constants Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: pass kube client options by value Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-11-29 00:43:59 +05:30
Kubernetes Prow Robot	39245a5613	Merge pull request #6235 from atwamahmoud/ignore-scheduler-processing Ignore scheduler processing	2023-11-22 13:54:30 +01:00
Mahmoud Atwa	a1ae4d3b57	Update flags, Improve tests readability & use Bypass instead of ignore in naming	2023-11-22 11:18:55 +00:00
Mahmoud Atwa	4635a6dc04	Allow users to specify which schedulers to ignore	2023-11-22 11:18:44 +00:00
Mahmoud Atwa	86ab017967	Fix multiple comments and update flags	2023-11-22 11:17:48 +00:00
Mahmoud Atwa	a1ab7b9e20	Add new pod list processors for clearing TPU requests & filtering out expendable pods Treat non-processed pods yet as unschedulable	2023-11-22 11:16:33 +00:00
piotrwrotniak	f2b8272949	Remove maps.Copy usage.	2023-11-09 09:46:48 +00:00

339 Commits