autoscaler

Commit Graph

Author	SHA1	Message	Date
jesse.millan	3fd510bb5a	Upgrade OCI provider SDK to v65.90.0. Required for Go 1.24.	2025-05-10 22:57:16 -07:00
Kubernetes Prow Robot	9cdcc284ea	Merge pull request #8047 from raykrueger/aws-eks-hybrid-nodes-fix fix: AWSCloudProvider should ignore unrecognized provider IDs	2025-05-04 15:25:56 -07:00
Kubernetes Prow Robot	41630404f3	Merge pull request #7817 from karsten42/feature/hetzner-config-from-configmap added possibility to retrieve hcloud cluster config from file	2025-05-02 03:55:54 -07:00
Karsten van Baal	ea764b4ef7	chore: refactored config parsing	2025-05-02 12:25:25 +02:00
Kubernetes Prow Robot	24494f3c06	Merge pull request #7804 from ttsuuubasa/capi-scale-from-0-nodes cluster-api: node template in scale-from-0-nodes scenario with DRA	2025-05-01 16:17:54 -07:00
Tsubasa Watanabe	2291b74a2d	Make InstanceResourceSlices func more efficient and make comments about DRA annotation in capi more recognizable Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>	2025-05-01 12:09:48 +09:00
Piotr Betkier	ac1c7b5463	use k8s.io/component-helpers/resource for pod request calculations	2025-04-22 17:36:17 +02:00
Ray Krueger	3a1973872f	fix: AWSCloudProvider ignores unrecognized provider IDs The AWSCloudProvider only supports aws://zone/name ProviderIDs. It should ignore ProviderIDs it does not recognize. Prior to this fix, an unrecognized ProviderID, such as eks-hybrid://zone/cluster/my-node, which is used by EKS Hybrid Nodes, would break the Autoscaler loop. This fix logs a warning and returns nil, nil instead of returning the error.	2025-04-17 17:27:46 +00:00
Pierre Ozoux	e51dcfb60b	Update cluster-autoscaler/cloudprovider/clusterapi/README.md	2025-04-17 09:18:28 +02:00
Pierre Ozoux	6eebb82f0d	Update cluster-autoscaler/cloudprovider/clusterapi/README.md	2025-04-17 09:17:44 +02:00
Maksym Fuhol	99584890b4	Clean instance templates for untracked migs.	2025-04-15 12:29:19 +00:00
karsten	22dc4e06f6	chore: added paragraph to readme for new HCLOUD_CLUSTER_CONFIG_FILE	2025-04-11 07:29:01 +02:00
Daniel Kłobuszewski	f1a44d89cf	Remove outdated GCE cloudprovider owners	2025-04-08 13:24:20 +02:00
Kubernetes Prow Robot	4bc861d097	Merge pull request #7923 from Uladzislau97/nap-resilience Improve resilience of diskTypes requests.	2025-04-08 04:22:40 -07:00
Kubernetes Prow Robot	7c28f52f93	Merge pull request #7854 from AppliedIntuition/master Fix 2 bugs in the OCI integration	2025-04-07 09:14:42 -07:00
Vlad Vasilyeu	93e21d05e2	Replace diskTypes.aggregatedList request with diskTypes.list in FetchAvailableDiskTypes.	2025-04-07 07:50:29 +00:00
Kubernetes Prow Robot	1de2160986	Merge pull request #7908 from Preisschild/fix/capi-patch-instead-update CA: Use Patch to Scale clusterapi nodepools	2025-04-03 07:16:48 -07:00
Kubernetes Prow Robot	dc91330f6a	Merge pull request #7989 from loick111/feature/clusterapi-instances-status ClusterAPI: Report machine phases to improve cluster-autoscaler decisions	2025-04-01 07:44:38 -07:00
Florian Ströger	ecb572a945	Use Patch to Scale clusterapi nodepools to avoid modification conflicts Issue: https://github.com/kubernetes/autoscaler/issues/7872 Signed-off-by: Florian Ströger <stroeger@youniqx.com>	2025-04-01 08:26:45 +02:00
Pierre Ozoux	8a954bc021	docs(autoscaler): add details about flags It is currently slightly confusing if you skim through the documentation. For instance, see the discussion here: https://github.com/kubernetes/autoscaler/pull/7974 I hope that by adding these 2 Important sections the reader will be warned about the key difference between, and the need for, these 2 options.	2025-03-28 15:47:14 +01:00
Loick MAHIEUX	005a42b9af	feat(cluster-autoscaler): improve nodes listing in ClusterAPI provider Add improved error handling for machines phase in the ClusterAPI node group implementation. When a machine is in Deleting/Failed/Pending phase, mark the cloudprovider.Instance with a status for cluster-autoscaler recovery actions. The changes: - Enhance Nodes listing to allow reporting the machine phase in Instance status - Add error status reporting for failed machines This change helps identify and manage failed machines more effectively, allowing the autoscaler to make better scaling decisions.	2025-03-28 15:07:34 +01:00
Kubernetes Prow Robot	7b6996469b	Merge pull request #7973 from jincong8973/master feat: add ignoreDaemonSetsUtilization and zeroOrMaxNodeScaling to NodeGroupAutoscalingOptions	2025-03-27 00:00:35 -07:00
KrJin	e713b51bd6	feat: add missing fields zeroOrMaxNodeScaling and ignoreDaemonSetsUtilization to NodeGroupAutoscalingOptions [squashed] Add fields IgnoreDaemonSetsUtilization and zeroOrMaxNodeScaling that are missing in externalgrpc proto	2025-03-27 11:28:12 +08:00
Kubernetes Prow Robot	2ca5b44652	Merge pull request #7977 from elmiko/refactor-findscalableproviderids refactor findScalableResourceProviderIDs in clusterapi	2025-03-26 10:22:43 -07:00
elmiko	5e1fc195a3	refactor findScalableResourceProviderIDs in clusterapi This change refactors the function so that each distinct machine state can be filtered more easily. The unit tests have been supplemented, but not changed, to ensure that the functionality continues to work as expected. These changes are to help better detect edge cases where machines can be transitioning through the pending phase and might be removed by the autoscaler.	2025-03-26 12:41:09 -04:00
Kubernetes Prow Robot	63309979ba	Merge pull request #7826 from Azure/rakechill/update-skewer-version-master Update skewer version to v0.0.19 (master)	2025-03-26 01:30:34 -07:00
Veer Singh	a226478f53	pricing changes: updated z3 pricing information	2025-03-24 04:06:26 +00:00
eric-higgins-ai	8da9a7b4af	add log messages	2025-03-21 14:02:10 -07:00
eric-higgins-ai	370c8eb78e	Revert "Address comment" This reverts commit `233d5c6e4d`.	2025-03-21 13:58:56 -07:00
Jack Francis	7b5e10156e	s/nodeHasValidProviderID/isProviderIDNormalized Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2025-03-19 12:30:33 -07:00
Jack Francis	4aa465764c	capi: node and provider ID accounting funcs Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2025-03-19 11:40:19 -07:00
elmiko	71d3595cb7	improve failed machine detection in clusterapi This change makes it so that when a failed machine is found during the `findScalableResourceProviderIDs` it will always gain a normalized provider ID with failure guard prepended. This is to ensure that machines which have gained a provider ID from the infrastructure and then later go into a failed state can be properly removed by the autoscaler when it wants to correct the size of a node group.	2025-03-19 12:34:29 -04:00
elmiko	003e6cd67c	make DecreaseTargetSize more accurate for clusterapi This change ensures that when DecreaseTargetSize is counting the nodes, it does not include any instances which are considered to be pending (i.e. not having a node ref), deleting, or failed. This change will allow the core autoscaler to then decrease the size of the node group accordingly, instead of raising an error. This change also adds some code to the unit tests to make detection of this condition easier.	2025-03-17 19:34:07 -04:00
Joel Smith	bef1f89a76	Update to golang.org/x/oauth2@v0.27 to fix CVE-2025-22868 Signed-off-by: Joel Smith <joelsmith@redhat.com>	2025-03-11 16:56:12 -06:00
eric-higgins-ai	233d5c6e4d	Address comment	2025-03-05 20:34:24 -08:00
Kubernetes Prow Robot	a58d346c09	Merge pull request #7767 from Kamatera/tag-fixes-add-support-for-filter-name-prefix Kamatera cluster autoscaler fixes	2025-02-27 11:36:30 -08:00
Jack Francis	0fd973a45e	azure: increase UT coverage in azure_vms_pool Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2025-02-25 16:08:22 -08:00
eric-higgins-ai	91d20d533e	unit test coverage	2025-02-24 11:17:00 -08:00
eric-higgins-ai	cc430980d2	fixes	2025-02-21 18:39:48 -08:00
eric-higgins-ai	5735b8ae19	get all node shapes	2025-02-21 14:02:31 -08:00
eric-higgins-ai	9c0357a6f2	fix scale up bug	2025-02-21 13:57:03 -08:00
Rachel Gregory	ed621282b5	Update only skewer with go get dep@ver	2025-02-20 15:12:40 -08:00
Rachel Gregory	72665b3d1c	Undo previous changes made by go mod vendor This reverts commit `b66b44621e`.	2025-02-20 14:57:51 -08:00
Muhammad Soliman	4f13cabcb4	Fixes based on code review change last character in extended resources prefix to be `.` instead of `-`. Add a warning if the extended resource already exists.	2025-02-12 10:35:24 +01:00
Muhammad Soliman	ad6d6c9871	Merge branch 'kubernetes:master' into prefixed_extended_resources	2025-02-12 10:11:33 +01:00
Tsubasa Watanabe	3fbacf0d0f	cluster-api: node template in scale-from-0-nodes scenario with DRA Modify TemplateNodeInfo() to return the template of ResourceSlice. This is to address the DRA expansion of Cluster Autoscaler, allowing users to set the number of GPUs and DRA driver name by specifying the annotation to NodeGroup provided by cluster-api. Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>	2025-02-12 11:56:04 +09:00
Rachel Gregory	b66b44621e	Update skewer version on master branch	2025-02-11 14:31:35 -08:00
Karsten van Baal	65c14d5526	added possibility to retrieve hcloud cluster config from file	2025-02-10 16:41:39 +01:00
Jeremy L. Morris	c87e68c01d	add emeritus approvers section	2025-02-07 13:06:57 -05:00
Jeremy L. Morris	d8f60fe19e	Removes stale owners from DO cluster autoscaler owners file and updates to current DO employee	2025-01-30 13:10:27 -05:00
Ori Hoch	83ce5dd13c	fix tag name attribute	2025-01-24 15:24:26 +02:00
Ori Hoch	3c187264cd	add retry mechanism	2025-01-24 15:15:59 +02:00
Ori Hoch	1d42eb55ea	shorten the uuids	2025-01-24 13:20:03 +02:00
Ori Hoch	cdb90ec4d5	tags fixes and add support for filter name prefix	2025-01-24 12:47:23 +02:00
Robin D.	64ca097c1e	fix: undefined instance state on provisioning state failed (#7750) * fix: undefined instance state on provisioning state failed * test: add unit tests for provisioning state failed + fast delete * test: support both fast/not fast delete on an affected test	2025-01-22 23:08:37 -08:00
Robin D.	9559204f61	test: add additional assertion for dynamic SKU list test (#7737)	2025-01-22 13:22:38 -08:00
Robin D.	03e6b2797d	chore: remove unnecessary logs on fast delete and add a relevant note (#7736)	2025-01-22 13:20:38 -08:00
Robin D.	7e8c41d175	test: clean up environments properly before/after each unit test in azure_manager_test.go (#7735) * test: clean up environments properly before/after each unit test in azure_manager_test.go * test: use testing.Cleanup() to ensure loadEnv()	2025-01-22 12:18:37 -08:00
Kubernetes Prow Robot	082e230b92	Merge pull request #7391 from jackfrancis/ca-cloudprovider-build-tags-hygiene add test-build-tags make target	2025-01-17 02:14:06 -08:00
Kubernetes Prow Robot	027795a97c	Merge pull request #7339 from justinmir/kwok-provider-metrics-annotation Add metrics-server annotation for kwok-provider managed nodes	2025-01-17 01:50:07 -08:00
Robin Deeboonchai	97dd5fe4ee	fix: don't crash when vmss not present or has no nodes	2025-01-16 14:57:20 -08:00
Kubernetes Prow Robot	ea52310b69	Merge pull request #6890 from b0e/implement-templateNodeInfo-for-cloudprovider-magnum Implement TemplateNodeInfo for magnum cloudprovider	2025-01-16 03:00:33 -08:00
Kubernetes Prow Robot	03e2795c9f	Merge pull request #7405 from ctrox/rancher-clarify-docs docs(rancher): clarify single RKE2 target	2024-12-30 02:14:14 +01:00
Kubernetes Prow Robot	38facfc3dd	Merge pull request #7633 from PerforMance308/master remove contact information for huaweicloud cluster autoscaler provider	2024-12-27 14:12:12 +01:00
Kubernetes Prow Robot	50c65906fd	Merge pull request #7530 from towca/jtuznik/dra-actual CA: DRA integration MVP	2024-12-20 16:30:08 +01:00
Kuba Tużnik	a45e6b7003	CA: implement DRA integration tests for StaticAutoscaler	2024-12-20 13:30:36 +01:00
Shiqi Wang	11740d1398	remove contact information	2024-12-19 09:23:05 -05:00
Muhammad Soliman	dd6f11b10e	Merge branch 'kubernetes:master' into prefixed_extended_resources	2024-12-18 10:17:16 +01:00
Kubernetes Prow Robot	da31dff7a6	Merge pull request #7614 from DataDog/update-azure-instance-types update azure static sku list	2024-12-17 20:54:52 +01:00
Rahul Rangith	6ab0eb94f7	update azure static sku list	2024-12-16 15:01:28 -05:00
Walid Ghallab	720f5946fd	Refactor NewAutoscalerError function. We will have two functions instead of one: 1. One that doesn't do formatting, like klog.Error 2. One that accepts formatting, like klog.Errorf The main reason behind this is to avoid go vet errors and have clear interfaces to catch accidental bugs and rely on go vet to catch those accidental bugs (or go test in go 1.24, as those are treated as errors).	2024-12-16 17:46:40 +00:00
Kubernetes Prow Robot	148ffa345b	Merge pull request #7520 from hetznercloud/refactor-placement-groups refactor(hetzner): refactored placement group code	2024-12-16 13:36:51 +01:00
Muhammad Soliman	2b62a7d6df	Add option for passing extended resources in node labels in GCE On GCE, Cluster Autoscaler reads extended resource information from kubenv->AUTOSCALER_ENV_VARS->extended_resources in the managed scaling group template definition. However, users have no way to add a variable to extended resources; they are controlled from the GKE side. This results in Cluster Autoscaler not supporting scale up from zero for all node pools that have extended resources (like GPU) on GCE. However, node labels are passed from the node pool to the managed scaling group template through kubenv->AUTOSCALER_ENV_VARS->node_labels. This commit introduces the ability to pass extended resources as node labels with a defined prefix on GCE, similar to how Cluster Autoscaler expects extended resources on AWS. This allows scaling from zero for node pools with extended resources.	2024-12-13 13:39:12 +01:00
lukasmetzner	d68a1f26b1	refactor: moved error checking with exiting to callsite	2024-12-13 11:57:51 +01:00
Alex Leites	61c8cdeff7	fix: corresponding test	2024-12-08 02:22:02 +00:00
Alex Leites	5e7ceee507	fix: setting getVmssSizeRefreshPeriod	2024-12-08 01:23:04 +00:00
Kubernetes Prow Robot	bd7156e837	Merge pull request #7557 from gvnc/handle-ooh-capacity-nodes Avoid making delete api calls for nodes that don't have an instance id	2024-12-06 22:48:01 +00:00
“gkazanci”	660f1aa6cd	added more logs	2024-12-03 17:03:56 +00:00
willie-yao	064d48f36c	Add toggle for fast delete	2024-11-26 00:25:04 +00:00
Kubernetes Prow Robot	86a80c6823	Merge pull request #7526 from willie-yao/cse-fast-delete Set node state to InstanceCreating to delete on CSE error	2024-11-26 00:20:57 +00:00
willie-yao	49a1ad4ad2	Set node state to InstanceCreating to delete on CSE error	2024-11-23 00:25:12 +00:00
Jack Francis	f1a1bab379	add test-build-tags make target Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2024-11-22 09:16:23 -08:00
lukasmetzner	64495d95a0	refactor(hetzner): refactored placement group code	2024-11-22 13:28:52 +01:00
Kubernetes Prow Robot	5458e1c208	Merge pull request #7436 from maximrub/fr-7435-alibaba-cloud-rrsa-new-env-vars 7435 Support New Alibaba Cloud ENV Variables names for RRSA Authorization	2024-11-22 10:30:54 +00:00
Kubernetes Prow Robot	4c37ff38ce	Merge pull request #6999 from dominic-p/iss-5919-placement-groups Add support for node pool placement group config	2024-11-20 13:04:53 +00:00
Kubernetes Prow Robot	a01276ef14	Merge pull request #7493 from BigDarkClown/remove-unneeded Add flag to force remove long unregistered nodes	2024-11-19 10:00:55 +00:00
Kubernetes Prow Robot	2d37aeefe8	Merge pull request #7385 from jlamillan/jlamillan/oci_sdk_65.75.2-2 Upgrade OCI providers SDK to v65.75.2.	2024-11-18 23:54:54 +00:00
Bartłomiej Wróblewski	c5f13bb02d	Add ForceDeleteNodes implementation for GCE cloud provider	2024-11-18 13:55:09 +00:00
Bartłomiej Wróblewski	3b47908e51	Add ForceDeleteNodes method to NodeGroup interface	2024-11-18 13:55:07 +00:00
Maxim Rubchinsky	dcd6d6ab36	7435 Support New Alibaba Cloud ENV Variables names for RRSA Authorization in Cluster Autoscaler Signed-off-by: Maxim Rubchinsky <maxim@rubchinsky.com>	2024-11-16 11:58:54 +02:00
Kubernetes Prow Robot	b01bff1640	Merge pull request #7453 from gvnc/oci-self-managed-nodes-fix exclude self-managed nodes from being processed	2024-11-15 23:32:53 +00:00
Kubernetes Prow Robot	009f2b8b16	Merge pull request #7438 from maximrub/bug-7437-alibaba-cloud-endpoint-reloving-logging 7437 Add logging for endpoint resolving errors	2024-11-15 10:10:52 +00:00
Kubernetes Prow Robot	267a0d8a98	Merge pull request #7459 from damikag/update-bootdisk-logs Change log level of boot dist type and size defaulting in gce_price	2024-11-15 09:54:53 +00:00
Kubernetes Prow Robot	59aefbcd5e	Merge pull request #7379 from ionos-cloud/remove-obsolete-upper-bound-check Remove obsolete upper bound check	2024-11-12 19:28:46 +00:00
Kubernetes Prow Robot	93f74c0948	Merge pull request #7481 from jackfrancis/vmss-proactive-deleting azure: StrictCacheUpdates to disable proactive vmss cache updates	2024-11-11 18:52:46 +00:00
Kubernetes Prow Robot	c9970a48ec	Merge pull request #7383 from DataDog/fix-instance-requirements-caching AWS: only cache instance requirements when needed	2024-11-11 14:22:46 +00:00
Jack Francis	1e5ed185d7	restore original behavior Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2024-11-10 20:47:22 -08:00
Jack Francis	c20971357f	azure: don’t eagerly update vmss cache before delete success Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2024-11-08 16:54:38 -08:00
Achim Ledermüller	a249ca9290	Implement TemplateNodeInfo for magnum cloudprovider	2024-11-07 17:04:37 +01:00
Kubernetes Prow Robot	0e8545325a	Merge pull request #7113 from IrisIris/feature/compatible-with-alicloud-desire-size add support to scaling group desired size for alicloud	2024-11-07 10:43:29 +00:00

2408 Commits