jesse.millan
3fd510bb5a
Upgrade OCI provider SDK to v65.90.0. Required for Go 1.24.
2025-05-10 22:57:16 -07:00
Kubernetes Prow Robot
9cdcc284ea
Merge pull request #8047 from raykrueger/aws-eks-hybrid-nodes-fix
...
fix: AWSCloudProvider should ignore unrecognized provider IDs
2025-05-04 15:25:56 -07:00
Kubernetes Prow Robot
41630404f3
Merge pull request #7817 from karsten42/feature/hetzner-config-from-configmap
...
added possibility to retrieve hcloud cluster config from file
2025-05-02 03:55:54 -07:00
Karsten van Baal
ea764b4ef7
chore: refactored config parsing
2025-05-02 12:25:25 +02:00
Kubernetes Prow Robot
24494f3c06
Merge pull request #7804 from ttsuuubasa/capi-scale-from-0-nodes
...
cluster-api: node template in scale-from-0-nodes scenario with DRA
2025-05-01 16:17:54 -07:00
Tsubasa Watanabe
2291b74a2d
Make InstanceResourceSlices func more efficient and make comments about DRA annotation in capi more recognizable
...
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
2025-05-01 12:09:48 +09:00
Piotr Betkier
ac1c7b5463
use k8s.io/component-helpers/resource for pod request calculations
2025-04-22 17:36:17 +02:00
Ray Krueger
3a1973872f
fix: AWSCloudProvider ignores unrecognized provider IDs
...
The AWSCloudProvider only supports aws://zone/name ProviderIDs. It
should ignore ProviderIDs it does not recognize. Prior to this fix, an
unrecognized ProviderID, such as eks-hybrid://zone/cluster/my-node which
is used by EKS Hybrid Nodes, would break the Autoscaler loop.
This fix logs a warning and returns nil, nil instead of
returning the error.
2025-04-17 17:27:46 +00:00
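A minimal sketch of the behavior this commit describes, not the provider's actual code. The `nodeGroup` type, the `nodeGroupForProviderID` helper, and the plain `aws://` prefix check are assumptions made for illustration; the real provider validates IDs in its own way.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// nodeGroup is a hypothetical stand-in for the provider's node group type.
type nodeGroup struct{ name string }

// nodeGroupForProviderID returns the group that owns the given provider ID.
// IDs without the aws:// scheme are logged and skipped by returning nil, nil,
// so the caller treats the node as unmanaged instead of failing the loop.
func nodeGroupForProviderID(id string, groups map[string]*nodeGroup) (*nodeGroup, error) {
	if !strings.HasPrefix(id, "aws://") {
		log.Printf("warning: ignoring unrecognized provider ID %q", id)
		return nil, nil
	}
	// A missing key yields a nil group, which also means "not managed".
	return groups[id], nil
}

func main() {
	groups := map[string]*nodeGroup{
		"aws:///us-west-2a/i-0123456789abcdef0": {name: "workers"},
	}
	for _, id := range []string{
		"aws:///us-west-2a/i-0123456789abcdef0",
		"eks-hybrid:///us-west-2/my-cluster/my-node",
	} {
		g, err := nodeGroupForProviderID(id, groups)
		fmt.Println(id, g != nil, err)
	}
}
```

Returning nil, nil rather than an error lets a single foreign node pass through the loop without aborting processing for every other node.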
Pierre Ozoux
e51dcfb60b
Update cluster-autoscaler/cloudprovider/clusterapi/README.md
2025-04-17 09:18:28 +02:00
Pierre Ozoux
6eebb82f0d
Update cluster-autoscaler/cloudprovider/clusterapi/README.md
2025-04-17 09:17:44 +02:00
Maksym Fuhol
99584890b4
Clean instance templates for untracked migs.
2025-04-15 12:29:19 +00:00
karsten
22dc4e06f6
chore: added paragraph to readme for new HCLOUD_CLUSTER_CONFIG_FILE
2025-04-11 07:29:01 +02:00
Daniel Kłobuszewski
f1a44d89cf
Remove outdated GCE cloudprovider owners
2025-04-08 13:24:20 +02:00
Kubernetes Prow Robot
4bc861d097
Merge pull request #7923 from Uladzislau97/nap-resilience
...
Improve resilience of diskTypes requests.
2025-04-08 04:22:40 -07:00
Kubernetes Prow Robot
7c28f52f93
Merge pull request #7854 from AppliedIntuition/master
...
Fix 2 bugs in the OCI integration
2025-04-07 09:14:42 -07:00
Vlad Vasilyeu
93e21d05e2
Replace diskTypes.aggregatedList request with diskTypes.list in FetchAvailableDiskTypes.
2025-04-07 07:50:29 +00:00
Kubernetes Prow Robot
1de2160986
Merge pull request #7908 from Preisschild/fix/capi-patch-instead-update
...
CA: Use Patch to Scale clusterapi nodepools
2025-04-03 07:16:48 -07:00
Kubernetes Prow Robot
dc91330f6a
Merge pull request #7989 from loick111/feature/clusterapi-instances-status
...
ClusterAPI: Report machine phases to improve cluster-autoscaler decisions
2025-04-01 07:44:38 -07:00
Florian Ströger
ecb572a945
Use Patch to Scale clusterapi nodepools to avoid modification conflicts
...
Issue: https://github.com/kubernetes/autoscaler/issues/7872
Signed-off-by: Florian Ströger <stroeger@youniqx.com>
2025-04-01 08:26:45 +02:00
Pierre Ozoux
8a954bc021
docs(autoscaler): add details about flags
...
It is currently slightly confusing if you skim through the documentation.
For instance, see the discussion here:
https://github.com/kubernetes/autoscaler/pull/7974
I hope that by adding these 2 Important sections, the reader will be warned about the key difference between, and the need for, these 2 options.
2025-03-28 15:47:14 +01:00
Loick MAHIEUX
005a42b9af
feat(cluster-autoscaler): improve nodes listing in ClusterAPI provider
...
Add improved error handling for machines phase in the ClusterAPI node group
implementation. When a machine is in Deleting/Failed/Pending phase, mark the cloudprovider.Instance
with a status for cluster-autoscaler recovery actions.
The changes:
- Enhance Nodes listing to allow reporting the machine phase in Instance status
- Add error status reporting for failed machines
This change helps identify and manage failed machines more effectively,
allowing the autoscaler to make better scaling decisions.
2025-03-28 15:07:34 +01:00
Kubernetes Prow Robot
7b6996469b
Merge pull request #7973 from jincong8973/master
...
feat: add ignoreDaemonSetsUtilization and zeroOrMaxNodeScaling to NodeGroupAutoscalingOptions
2025-03-27 00:00:35 -07:00
KrJin
e713b51bd6
feat: add missing field zeroOrMaxNodeScaling and ignoreDaemonSetsUtilization to NodeGroupAutoscalingOptions
...
[squashed] Add the fields IgnoreDaemonSetsUtilization and zeroOrMaxNodeScaling that are missing in the externalgrpc proto
2025-03-27 11:28:12 +08:00
Kubernetes Prow Robot
2ca5b44652
Merge pull request #7977 from elmiko/refactor-findscalableproviderids
...
refactor findScalableResourceProviderIDs in clusterapi
2025-03-26 10:22:43 -07:00
elmiko
5e1fc195a3
refactor findScalableResourceProviderIDs in clusterapi
...
this change refactors the function so that each distinct machine
state can be filtered more easily. the unit tests have been
supplemented, but not changed, to ensure that the functionality continues
to work as expected. these changes are to help better detect edge cases
where machines can be transitioning through the pending phase and might be
removed by the autoscaler.
2025-03-26 12:41:09 -04:00
Kubernetes Prow Robot
63309979ba
Merge pull request #7826 from Azure/rakechill/update-skewer-version-master
...
Update skewer version to v0.0.19 (master)
2025-03-26 01:30:34 -07:00
Veer Singh
a226478f53
pricing changes: updated z3 pricing information
2025-03-24 04:06:26 +00:00
eric-higgins-ai
8da9a7b4af
add log messages
2025-03-21 14:02:10 -07:00
eric-higgins-ai
370c8eb78e
Revert "Address comment"
...
This reverts commit 233d5c6e4d.
2025-03-21 13:58:56 -07:00
Jack Francis
7b5e10156e
s/nodeHasValidProviderID/isProviderIDNormalized
...
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2025-03-19 12:30:33 -07:00
Jack Francis
4aa465764c
capi: node and provider ID accounting funcs
...
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2025-03-19 11:40:19 -07:00
elmiko
71d3595cb7
improve failed machine detection in clusterapi
...
This change makes it so that when a failed machine is found during
`findScalableResourceProviderIDs`, it will always gain a normalized
provider ID with the failure guard prepended. This is to ensure that
machines which have gained a provider ID from the infrastructure and
then later go into a failed state can be properly removed by the
autoscaler when it wants to correct the size of a node group.
2025-03-19 12:34:29 -04:00
elmiko
003e6cd67c
make DecreaseTargetSize more accurate for clusterapi
...
this change ensures that when DecreaseTargetSize counts the nodes,
it does not include any instances which are considered to be
pending (i.e. not having a node ref), deleting, or failed. this change will
allow the core autoscaler to then decrease the size of the node group
accordingly, instead of raising an error.
This change also adds some code to the unit tests to make detection of
this condition easier.
2025-03-17 19:34:07 -04:00
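The counting rule described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the clusterapi provider's code: the `machine` struct, `countActive`, `decreaseTargetSize`, and the error text are hypothetical names and values chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// machine is a hypothetical, simplified view of a Cluster API machine.
type machine struct {
	hasNodeRef bool // false means the machine is still pending
	deleting   bool
	failed     bool
}

// countActive counts only machines that back a usable node: pending,
// deleting, and failed machines are left out of the total.
func countActive(machines []machine) int {
	n := 0
	for _, m := range machines {
		if m.hasNodeRef && !m.deleting && !m.failed {
			n++
		}
	}
	return n
}

// decreaseTargetSize lowers the target by delta (which must be negative) and
// refuses only if that would cut below the number of active nodes.
func decreaseTargetSize(current, delta int, machines []machine) (int, error) {
	if delta >= 0 {
		return current, errors.New("size decrease must be negative")
	}
	newSize := current + delta
	if active := countActive(machines); newSize < active {
		return current, fmt.Errorf("new size %d is below %d active nodes", newSize, active)
	}
	return newSize, nil
}

func main() {
	machines := []machine{
		{hasNodeRef: true},               // active
		{hasNodeRef: false},              // pending
		{hasNodeRef: true, failed: true}, // failed
	}
	fmt.Println(decreaseTargetSize(3, -2, machines))
}
```

Because the pending and failed machines are not counted, the target can drop from 3 to 1 here; counting all three machines would have produced an error instead.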
Joel Smith
bef1f89a76
Update to golang.org/x/oauth2@v0.27 to fix CVE-2025-22868
...
Signed-off-by: Joel Smith <joelsmith@redhat.com>
2025-03-11 16:56:12 -06:00
eric-higgins-ai
233d5c6e4d
Address comment
2025-03-05 20:34:24 -08:00
Kubernetes Prow Robot
a58d346c09
Merge pull request #7767 from Kamatera/tag-fixes-add-support-for-filter-name-prefix
...
Kamatera cluster autoscaler fixes
2025-02-27 11:36:30 -08:00
Jack Francis
0fd973a45e
azure: increase UT coverage in azure_vms_pool
...
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2025-02-25 16:08:22 -08:00
eric-higgins-ai
91d20d533e
unit test coverage
2025-02-24 11:17:00 -08:00
eric-higgins-ai
cc430980d2
fixes
2025-02-21 18:39:48 -08:00
eric-higgins-ai
5735b8ae19
get all node shapes
2025-02-21 14:02:31 -08:00
eric-higgins-ai
9c0357a6f2
fix scale up bug
2025-02-21 13:57:03 -08:00
Rachel Gregory
ed621282b5
Update only skewer with go get dep@ver
2025-02-20 15:12:40 -08:00
Rachel Gregory
72665b3d1c
Undo previous changes made by go mod vendor
...
This reverts commit b66b44621e.
2025-02-20 14:57:51 -08:00
Muhammad Soliman
4f13cabcb4
Fixes based on code review
...
change last character in extended resources prefix to be `.` instead of `-`.
Add a warning if the extended resource already exists.
2025-02-12 10:35:24 +01:00
Muhammad Soliman
ad6d6c9871
Merge branch 'kubernetes:master' into prefixed_extended_resources
2025-02-12 10:11:33 +01:00
Tsubasa Watanabe
3fbacf0d0f
cluster-api: node template in scale-from-0-nodes scenario with DRA
...
Modify TemplateNodeInfo() to return the template of ResourceSlice.
This addresses the DRA expansion of Cluster Autoscaler, allowing users to set the number of GPUs and the DRA driver name by specifying
an annotation on the NodeGroup provided by cluster-api.
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
2025-02-12 11:56:04 +09:00
Rachel Gregory
b66b44621e
Update skewer version on master branch
2025-02-11 14:31:35 -08:00
Karsten van Baal
65c14d5526
added possibility to retrieve hcloud cluster config from file
2025-02-10 16:41:39 +01:00
Jeremy L. Morris
c87e68c01d
add emeritus approvers section
2025-02-07 13:06:57 -05:00
Jeremy L. Morris
d8f60fe19e
Removes stale owners from DO cluster autoscaler owners file and updates to current DO employee
2025-01-30 13:10:27 -05:00
Ori Hoch
83ce5dd13c
fix tag name attribute
2025-01-24 15:24:26 +02:00
Ori Hoch
3c187264cd
add retry mechanism
2025-01-24 15:15:59 +02:00
Ori Hoch
1d42eb55ea
shorten the uuids
2025-01-24 13:20:03 +02:00
Ori Hoch
cdb90ec4d5
tags fixes and add support for filter name prefix
2025-01-24 12:47:23 +02:00
Robin D.
64ca097c1e
fix: undefined instance state on provisioning state failed (#7750)
...
* fix: undefined instance state on provisioning state failed
* test: add unit tests for provisioning state failed + fast delete
* test: support both fast/not fast delete on an affected test
2025-01-22 23:08:37 -08:00
Robin D.
9559204f61
test: add additional assertion for dynamic SKU list test (#7737)
2025-01-22 13:22:38 -08:00
Robin D.
03e6b2797d
chore: remove unnecessary logs on fast delete and add a relevant note (#7736)
2025-01-22 13:20:38 -08:00
Robin D.
7e8c41d175
test: clean up environments properly before/after each unit test in azure_manager_test.go (#7735)
...
* test: clean up environments properly before/after each unit test in azure_manager_test.go
* test: use testing.Cleanup() to ensure loadEnv()
2025-01-22 12:18:37 -08:00
Kubernetes Prow Robot
082e230b92
Merge pull request #7391 from jackfrancis/ca-cloudprovider-build-tags-hygiene
...
add test-build-tags make target
2025-01-17 02:14:06 -08:00
Kubernetes Prow Robot
027795a97c
Merge pull request #7339 from justinmir/kwok-provider-metrics-annotation
...
Add metrics-server annotation for kwok-provider managed nodes
2025-01-17 01:50:07 -08:00
Robin Deeboonchai
97dd5fe4ee
fix: don't crash when vmss not present or has no nodes
2025-01-16 14:57:20 -08:00
Kubernetes Prow Robot
ea52310b69
Merge pull request #6890 from b0e/implement-templateNodeInfo-for-cloudprovider-magnum
...
Implement TemplateNodeInfo for magnum cloudprovider
2025-01-16 03:00:33 -08:00
Kubernetes Prow Robot
03e2795c9f
Merge pull request #7405 from ctrox/rancher-clarify-docs
...
docs(rancher): clarify single RKE2 target
2024-12-30 02:14:14 +01:00
Kubernetes Prow Robot
38facfc3dd
Merge pull request #7633 from PerforMance308/master
...
remove contact information for huaweicloud cluster autoscaler provider
2024-12-27 14:12:12 +01:00
Kubernetes Prow Robot
50c65906fd
Merge pull request #7530 from towca/jtuznik/dra-actual
...
CA: DRA integration MVP
2024-12-20 16:30:08 +01:00
Kuba Tużnik
a45e6b7003
CA: implement DRA integration tests for StaticAutoscaler
2024-12-20 13:30:36 +01:00
Shiqi Wang
11740d1398
remove contact information
2024-12-19 09:23:05 -05:00
Muhammad Soliman
dd6f11b10e
Merge branch 'kubernetes:master' into prefixed_extended_resources
2024-12-18 10:17:16 +01:00
Kubernetes Prow Robot
da31dff7a6
Merge pull request #7614 from DataDog/update-azure-instance-types
...
update azure static sku list
2024-12-17 20:54:52 +01:00
Rahul Rangith
6ab0eb94f7
update azure static sku list
2024-12-16 15:01:28 -05:00
Walid Ghallab
720f5946fd
Refactor NewAutoscalerError function.
...
We will have two functions instead of one:
1. One that doesn't do formatting, like klog.Error
2. One that accepts formatting, like klog.Errorf
The main reason behind this is to avoid go vet errors and to have clear
interfaces, so that accidental bugs are caught by go vet (or by go test
in Go 1.24, where those are treated as errors).
2024-12-16 17:46:40 +00:00
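The split described in this commit can be pictured with a small sketch. The names `newError`, `newErrorf`, `errorType`, and `autoscalerError` are hypothetical and chosen for illustration; they are not the functions the commit introduces.

```go
package main

import "fmt"

// errorType and autoscalerError are hypothetical, simplified types.
type errorType string

type autoscalerError struct {
	kind errorType
	msg  string
}

func (e autoscalerError) Error() string { return e.msg }

// newError stores the message verbatim, so a stray '%' in msg is harmless.
func newError(kind errorType, msg string) error {
	return autoscalerError{kind: kind, msg: msg}
}

// newErrorf treats its second argument as a format string. Keeping this in a
// separate, printf-style function lets go vet check the format against args.
func newErrorf(kind errorType, format string, args ...interface{}) error {
	return autoscalerError{kind: kind, msg: fmt.Sprintf(format, args...)}
}

func main() {
	fmt.Println(newError("internal", "disk is 100% full"))
	fmt.Println(newErrorf("internal", "node %s not found", "n1"))
}
```

With a single function that always formats, passing a message such as "disk is 100% full" would be misread as a format string; two entry points make the caller's intent explicit.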
Kubernetes Prow Robot
148ffa345b
Merge pull request #7520 from hetznercloud/refactor-placement-groups
...
refactor(hetzner): refactored placement group code
2024-12-16 13:36:51 +01:00
Muhammad Soliman
2b62a7d6df
Add option for passing extended resources in node labels in GCE
...
On GCE, Cluster Autoscaler reads extended resource information from kubenv->AUTOSCALER_ENV_VARS->extended_resources in the managed scaling group template definition.
However, users have no way to add a variable to extended resources; they are controlled from the GKE side. As a result, Cluster Autoscaler does not support scale up from zero for any node pool that has extended resources (like GPU) on GCE.
Node labels, however, are passed from the node pool to the managed scaling group template through kubenv->AUTOSCALER_ENV_VARS->node_labels.
This commit introduces the ability to pass extended resources as node labels with a defined prefix on GCE, similar to how Cluster Autoscaler expects extended resources on AWS. This allows scaling from zero for node pools with extended resources.
2024-12-13 13:39:12 +01:00
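A rough sketch of the idea of carrying extended resources in prefixed node labels. The prefix value, the `extendedResourcesFromLabels` helper, and the use of plain integers for quantities are all assumptions for illustration; the commit does not state the actual prefix here, and the real code may parse quantities differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extendedResourceLabelPrefix is a hypothetical prefix, not the real one.
const extendedResourceLabelPrefix = "example.test/extended-resource."

// extendedResourcesFromLabels collects every label that starts with the
// prefix and turns it into a resource name and an integer quantity. Labels
// with an empty name or a value that is not a non-negative integer are skipped.
func extendedResourcesFromLabels(labels map[string]string) map[string]int64 {
	resources := make(map[string]int64)
	for key, value := range labels {
		if !strings.HasPrefix(key, extendedResourceLabelPrefix) {
			continue
		}
		name := strings.TrimPrefix(key, extendedResourceLabelPrefix)
		qty, err := strconv.ParseInt(value, 10, 64)
		if name == "" || err != nil || qty < 0 {
			continue
		}
		resources[name] = qty
	}
	return resources
}

func main() {
	labels := map[string]string{
		"pool": "gpu",
		extendedResourceLabelPrefix + "example-gpu": "4",
		extendedResourceLabelPrefix + "broken":      "many",
	}
	fmt.Println(extendedResourcesFromLabels(labels))
}
```

A node template built for an empty node pool could then advertise these resources, which is what makes scale up from zero possible for pods that request them.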
lukasmetzner
d68a1f26b1
refactor: moved error checking with exiting to callsite
2024-12-13 11:57:51 +01:00
Alex Leites
61c8cdeff7
fix: corresponding test
2024-12-08 02:22:02 +00:00
Alex Leites
5e7ceee507
fix: setting getVmssSizeRefreshPeriod
2024-12-08 01:23:04 +00:00
Kubernetes Prow Robot
bd7156e837
Merge pull request #7557 from gvnc/handle-ooh-capacity-nodes
...
Avoid making delete api calls for nodes that don't have an instance id
2024-12-06 22:48:01 +00:00
“gkazanci”
660f1aa6cd
added more logs
2024-12-03 17:03:56 +00:00
willie-yao
064d48f36c
Add toggle for fast delete
2024-11-26 00:25:04 +00:00
Kubernetes Prow Robot
86a80c6823
Merge pull request #7526 from willie-yao/cse-fast-delete
...
Set node state to InstanceCreating to delete on CSE error
2024-11-26 00:20:57 +00:00
willie-yao
49a1ad4ad2
Set node state to InstanceCreating to delete on CSE error
2024-11-23 00:25:12 +00:00
Jack Francis
f1a1bab379
add test-build-tags make target
...
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2024-11-22 09:16:23 -08:00
lukasmetzner
64495d95a0
refactor(hetzner): refactored placement group code
2024-11-22 13:28:52 +01:00
Kubernetes Prow Robot
5458e1c208
Merge pull request #7436 from maximrub/fr-7435-alibaba-cloud-rrsa-new-env-vars
...
7435 Support New Alibaba Cloud ENV Variables names for RRSA Authorization
2024-11-22 10:30:54 +00:00
Kubernetes Prow Robot
4c37ff38ce
Merge pull request #6999 from dominic-p/iss-5919-placement-groups
...
Add support for node pool placement group config
2024-11-20 13:04:53 +00:00
Kubernetes Prow Robot
a01276ef14
Merge pull request #7493 from BigDarkClown/remove-unneeded
...
Add flag to force remove long unregistered nodes
2024-11-19 10:00:55 +00:00
Kubernetes Prow Robot
2d37aeefe8
Merge pull request #7385 from jlamillan/jlamillan/oci_sdk_65.75.2-2
...
Upgrade OCI providers SDK to v65.75.2.
2024-11-18 23:54:54 +00:00
Bartłomiej Wróblewski
c5f13bb02d
Add ForceDeleteNodes implementation for GCE cloud provider
2024-11-18 13:55:09 +00:00
Bartłomiej Wróblewski
3b47908e51
Add ForceDeleteNodes method to NodeGroup interface
2024-11-18 13:55:07 +00:00
Maxim Rubchinsky
dcd6d6ab36
7435 Support New Alibaba Cloud ENV Variables names for RRSA Authorization in Cluster Autoscaler
...
Signed-off-by: Maxim Rubchinsky <maxim@rubchinsky.com>
2024-11-16 11:58:54 +02:00
Kubernetes Prow Robot
b01bff1640
Merge pull request #7453 from gvnc/oci-self-managed-nodes-fix
...
exclude self-managed nodes from being processed
2024-11-15 23:32:53 +00:00
Kubernetes Prow Robot
009f2b8b16
Merge pull request #7438 from maximrub/bug-7437-alibaba-cloud-endpoint-reloving-logging
...
7437 Add logging for endpoint resolving errors
2024-11-15 10:10:52 +00:00
Kubernetes Prow Robot
267a0d8a98
Merge pull request #7459 from damikag/update-bootdisk-logs
...
Change log level of boot disk type and size defaulting in gce_price
2024-11-15 09:54:53 +00:00
Kubernetes Prow Robot
59aefbcd5e
Merge pull request #7379 from ionos-cloud/remove-obsolete-upper-bound-check
...
Remove obsolete upper bound check
2024-11-12 19:28:46 +00:00
Kubernetes Prow Robot
93f74c0948
Merge pull request #7481 from jackfrancis/vmss-proactive-deleting
...
azure: StrictCacheUpdates to disable proactive vmss cache updates
2024-11-11 18:52:46 +00:00
Kubernetes Prow Robot
c9970a48ec
Merge pull request #7383 from DataDog/fix-instance-requirements-caching
...
AWS: only cache instance requirements when needed
2024-11-11 14:22:46 +00:00
Jack Francis
1e5ed185d7
restore original behavior
...
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2024-11-10 20:47:22 -08:00
Jack Francis
c20971357f
azure: don’t eagerly update vmss cache before delete success
...
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2024-11-08 16:54:38 -08:00
Achim Ledermüller
a249ca9290
Implement TemplateNodeInfo for magnum cloudprovider
2024-11-07 17:04:37 +01:00
Kubernetes Prow Robot
0e8545325a
Merge pull request #7113 from IrisIris/feature/compatible-with-alicloud-desire-size
...
add support to scaling group desired size for alicloud
2024-11-07 10:43:29 +00:00