Commit Graph

5067 Commits

Author SHA1 Message Date
Maksym Fuhol 6cbf801235 Patch TestCleaningSoftTaintsInScaleDown to be compatible with new isScaleDownInCooldown signature. 2025-04-15 10:02:44 +00:00
Kubernetes Prow Robot 18f10c1e00
Merge pull request #7997 from damikag/scale-down-slo-update-metric
Emit scale down metric even when there are no scale down candidates.
2025-04-14 13:13:05 -07:00
Kubernetes Prow Robot 25ad4c2c26
Merge pull request #8011 from jinglinliang/allow-third-party-sts-to-drain
Allow draining when StatefulSet kind has custom API Group
2025-04-11 13:04:41 -07:00
jinglinliang 25af21c515 Add unit test to allow draining when StatefulSet kind has custom API Group 2025-04-09 14:03:00 -07:00
jinglinliang cc3a9f5d10 Allow draining when StatefulSet kind has custom API Group 2025-04-09 14:03:00 -07:00
Kubernetes Prow Robot 87a67e3aa0
Merge pull request #7995 from abdelrahman882/cleaningSoftTaintsTesting
Add unit test for cleaning deletion soft taint in scale down cool down
2025-04-09 10:48:39 -07:00
Omran dd125d4ef1
Add unit test for cleaning deletion soft taints in scale down cool down 2025-04-09 08:21:49 +00:00
Daniel Kłobuszewski f1a44d89cf
Remove outdated GCE cloudprovider owners 2025-04-08 13:24:20 +02:00
Kubernetes Prow Robot 4bc861d097
Merge pull request #7923 from Uladzislau97/nap-resilience
Improve resilience of diskTypes requests.
2025-04-08 04:22:40 -07:00
Kubernetes Prow Robot 7c28f52f93
Merge pull request #7854 from AppliedIntuition/master
Fix 2 bugs in the OCI integration
2025-04-07 09:14:42 -07:00
Vlad Vasilyeu 93e21d05e2 Replace diskTypes.aggregatedList request with diskTypes.list in FetchAvailableDiskTypes. 2025-04-07 07:50:29 +00:00
Kubernetes Prow Robot 1de2160986
Merge pull request #7908 from Preisschild/fix/capi-patch-instead-update
CA: Use Patch to Scale clusterapi nodepools
2025-04-03 07:16:48 -07:00
Kubernetes Prow Robot dc91330f6a
Merge pull request #7989 from loick111/feature/clusterapi-instances-status
ClusterAPI: Report machine phases to improve cluster-autoscaler decisions
2025-04-01 07:44:38 -07:00
Florian Ströger ecb572a945 Use Patch to Scale clusterapi nodepools to avoid modification conflicts
Issue: https://github.com/kubernetes/autoscaler/issues/7872
Signed-off-by: Florian Ströger <stroeger@youniqx.com>
2025-04-01 08:26:45 +02:00
Damika Gamlath 49b271f75a Emit scale down metric even when there are no scale down candidates.
Update the scaleDownInCooldown definition so that having zero candidates is no longer treated as a reason to be in the scaleDownInCooldown state
2025-03-31 14:46:23 +00:00
Loick MAHIEUX 005a42b9af feat(cluster-autoscaler): improve nodes listing in ClusterAPI provider
Add improved error handling for machines phase in the ClusterAPI node group
implementation. When a machine is in Deleting/Failed/Pending phase, mark the cloudprovider.Instance
with a status for cluster-autoscaler recovery actions.

The changes:
- Enhance Nodes listing to allow reporting the machine phase in Instance status
- Add error status reporting for failed machines

This change helps identify and manage failed machines more effectively,
allowing the autoscaler to make better scaling decisions.
2025-03-28 15:07:34 +01:00
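A rough sketch of the idea described in this commit, mapping a machine phase to an instance status. All types, constants, and the `statusForPhase` function below are hypothetical and defined locally; the specific mapping (for example, treating a failed machine as a creating instance that carries an error message) is an assumption for illustration, not the provider's confirmed behaviour.

```go
package main

import "fmt"

// instanceState is a hypothetical, simplified instance state.
type instanceState string

const (
	stateRunning  instanceState = "Running"
	stateCreating instanceState = "Creating"
	stateDeleting instanceState = "Deleting"
)

// instanceStatus is a hypothetical status attached to a listed instance.
type instanceStatus struct {
	State        instanceState
	ErrorMessage string // non-empty only when the machine has failed
}

// statusForPhase maps a machine phase string to a status. The mapping
// is an assumption made for this sketch.
func statusForPhase(phase string) instanceStatus {
	switch phase {
	case "Pending":
		return instanceStatus{State: stateCreating}
	case "Deleting":
		return instanceStatus{State: stateDeleting}
	case "Failed":
		return instanceStatus{
			State:        stateCreating,
			ErrorMessage: "machine is in Failed phase",
		}
	default:
		return instanceStatus{State: stateRunning}
	}
}

func main() {
	for _, phase := range []string{"Pending", "Deleting", "Failed", "Running"} {
		fmt.Printf("%s -> %+v\n", phase, statusForPhase(phase))
	}
}
```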
Kubernetes Prow Robot db597b1acd
Merge pull request #7966 from pmendelski/htnap-events-for-tpu
Emit event on successful async scale-up
2025-03-27 02:32:34 -07:00
Kubernetes Prow Robot 7b6996469b
Merge pull request #7973 from jincong8973/master
feat: add ignoreDaemonSetsUtilization and zeroOrMaxNodeScaling to NodeGroupAutoscalingOptions
2025-03-27 00:00:35 -07:00
KrJin e713b51bd6 feat: add missing fields zeroOrMaxNodeScaling and ignoreDaemonSetsUtilization to NodeGroupAutoscalingOptions
[squashed] Add fields IgnoreDaemonSetsUtilization and zeroOrMaxNodeScaling that are missing in the externalgrpc proto
2025-03-27 11:28:12 +08:00
Kubernetes Prow Robot 2ca5b44652
Merge pull request #7977 from elmiko/refactor-findscalableproviderids
refactor findScalableResourceProviderIDs in clusterapi
2025-03-26 10:22:43 -07:00
elmiko 5e1fc195a3 refactor findScalableResourceProviderIDs in clusterapi
this change refactors the function so that each distinct machine
state can be filtered more easily. the unit tests have been
supplemented, but not changed to ensure that the functionality continues
to work as expected. these changes are to help better detect edge cases
where machines can be transiting through pending phase and might be
removed by the autoscaler.
2025-03-26 12:41:09 -04:00
mendelski 0c522556c5
Emit event on successful async scale-up 2025-03-26 13:11:03 +00:00
Kubernetes Prow Robot 63309979ba
Merge pull request #7826 from Azure/rakechill/update-skewer-version-master
Update skewer version to v0.0.19 (master)
2025-03-26 01:30:34 -07:00
Kubernetes Prow Robot e95e35c94e
Merge pull request #7965 from DigitalVeer/master
pricing changes: updated z3 pricing information
2025-03-25 10:48:33 -07:00
Kubernetes Prow Robot 52cd68a498
Merge pull request #7954 from abdelrahman882/FixScaledownCoolDown
Fix cool down status condition to trigger scale down
2025-03-24 07:38:33 -07:00
Omran 696af986ed
Add time based drainability rule for non-pdb-assigned system pods 2025-03-24 12:47:16 +00:00
Veer Singh a226478f53 pricing changes: updated z3 pricing information 2025-03-24 04:06:26 +00:00
eric-higgins-ai 8da9a7b4af add log messages 2025-03-21 14:02:10 -07:00
eric-higgins-ai 370c8eb78e Revert "Address comment"
This reverts commit 233d5c6e4d.
2025-03-21 13:58:56 -07:00
Omran 2bbe859154
Fix cool down status condition to trigger scale down 2025-03-21 10:21:00 +00:00
Kubernetes Prow Robot 990ab04d85
Merge pull request #7949 from ystryuchkov/master
Fix log for node filtering in static autoscaler
2025-03-21 01:58:31 -07:00
Kubernetes Prow Robot 10bb546f9e
Merge pull request #7944 from norbertcyran/proactive-scale-up-sample-scheduled
Allow using scheduled pods as samples in proactive scale up
2025-03-20 07:06:33 -07:00
Yahia Badr 5268053d1e
Update default value for scaleDownDelayAfterDelete (#7957)
* Update default value for scaleDownDelayAfterDelete

Setting default value for scaleDownDelayAfterDelete to be scanInterval
instead of 0.

* Revert the change and fix the flag description
2025-03-20 07:04:32 -07:00
Jack Francis 7b5e10156e s/nodeHasValidProviderID/isProviderIDNormalized
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2025-03-19 12:30:33 -07:00
Jack Francis 4aa465764c capi: node and provider ID accounting funcs
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2025-03-19 11:40:19 -07:00
elmiko 71d3595cb7 improve failed machine detection in clusterapi
This change makes it so that when a failed machine is found during the
`findScalableResourceProviderIDs` it will always gain a normalized
provider ID with failure guard prepended. This is to ensure that
machines which have gained a provider ID from the infrastructure and
then later go into a failed state can be properly removed by the
autoscaler when it wants to correct the size of a node group.
2025-03-19 12:34:29 -04:00
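The "failure guard" idea from this commit can be sketched as follows. The prefix value and the `normalizeProviderID` helper are hypothetical names chosen for illustration; the actual prefix and function used by the clusterapi provider are not stated in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// failedGuard is a hypothetical prefix marking provider IDs of failed
// machines; the real value is an assumption here.
const failedGuard = "failed-machine-"

// normalizeProviderID prepends the guard to the ID of a failed machine,
// even if the infrastructure already assigned it a provider ID, so that
// the machine is not mistaken for a healthy instance.
func normalizeProviderID(id string, failed bool) string {
	if failed && !strings.HasPrefix(id, failedGuard) {
		return failedGuard + id
	}
	return id
}

func main() {
	fmt.Println(normalizeProviderID("aws:///i-123", false))
	fmt.Println(normalizeProviderID("aws:///i-123", true))
}
```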
Yuriy Stryuchkov 105429c31e Fix log for node filtering in static autoscaler
Add missing tests
2025-03-19 15:49:34 +01:00
Norbert Cyran 9a5e3d9f3d Allow using scheduled pods as samples in proactive scale up 2025-03-19 12:33:39 +01:00
elmiko 003e6cd67c make DecreaseTargetSize more accurate for clusterapi
this change ensures that when DecreaseTargetSize counts the nodes, it
does not include any instances which are considered to be pending
(i.e. not having a node ref), deleting, or failed. this change will
allow the core autoscaler to then decrease the size of the node group
accordingly, instead of raising an error.

This change also adds some code to the unit tests to make detection of
this condition easier.
2025-03-17 19:34:07 -04:00
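A minimal sketch of the counting rule described in this commit. The `machine` type and both functions are hypothetical and simplified; they only illustrate why excluding pending, deleting, and failed machines lets a size decrease succeed where counting every machine would raise an error.

```go
package main

import "fmt"

// machine is a hypothetical, simplified view of a node group member.
type machine struct {
	Name       string
	HasNodeRef bool // false means the machine is still pending
	Deleting   bool
	Failed     bool
}

// countRegistered counts only machines that back a usable node: they
// have a node ref and are neither deleting nor failed.
func countRegistered(machines []machine) int {
	n := 0
	for _, m := range machines {
		if m.HasNodeRef && !m.Deleting && !m.Failed {
			n++
		}
	}
	return n
}

// decreaseTargetSize lowers the target by a negative delta, refusing
// only if that would drop below the number of registered nodes.
func decreaseTargetSize(target, delta int, machines []machine) (int, error) {
	if delta >= 0 {
		return target, fmt.Errorf("size decrease must be negative, got %d", delta)
	}
	newSize := target + delta
	if registered := countRegistered(machines); newSize < registered {
		return target, fmt.Errorf(
			"attempt to delete existing nodes: new size %d, registered nodes %d",
			newSize, registered)
	}
	return newSize, nil
}

func main() {
	machines := []machine{
		{Name: "m1", HasNodeRef: true},
		{Name: "m2", HasNodeRef: true},
		{Name: "m3", HasNodeRef: true},
		{Name: "m4"}, // pending: no node ref yet
		{Name: "m5", HasNodeRef: true, Failed: true},
	}
	size, err := decreaseTargetSize(5, -2, machines)
	fmt.Println(size, err)
}
```

If all five machines were counted, the same call would be rejected, because the new size of 3 would fall below the apparent node count of 5.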
Kubernetes Prow Robot 214215f320
Merge pull request #7918 from x13n/master
Fix incorrect usage of klog Warningf function
2025-03-13 06:11:48 -07:00
Daniel Kłobuszewski bac35046fb Fix incorrect usage of klog Warningf function
The .*f variants should only ever be called with arguments to format.
This should've really been a part of
https://github.com/kubernetes/autoscaler/pull/7917
2025-03-13 13:50:39 +01:00
Kubernetes Prow Robot bcbc466e4d
Merge pull request #7917 from x13n/master
Fix incorrect usage of klog .*f functions
2025-03-13 05:45:47 -07:00
Daniel Kłobuszewski 780e68f6d2 Fix incorrect usage of klog .*f functions
The .*f variants should only ever be called with arguments to format.
2025-03-13 13:24:52 +01:00
Joel Smith bef1f89a76 Update to golang.org/x/oauth2@v0.27 to fix CVE-2025-22868
Signed-off-by: Joel Smith <joelsmith@redhat.com>
2025-03-11 16:56:12 -06:00
Yahia Naguib 241ad7af1e
update address description 2025-03-10 14:25:44 +00:00
Yahia Naguib 738d7dd16d
Migrating flags off main.go to a separate package 2025-03-07 21:11:30 +00:00
Yahia Naguib 57519980c4
Migrating flags off main.go to a separate package 2025-03-07 21:11:29 +00:00
Yahia Naguib 3e9d11b732
Migrating flags off main.go to a separate package 2025-03-07 21:11:27 +00:00
Kubernetes Prow Robot 173a4bde19
Merge pull request #7897 from mtrqq/bug/block-until-resource-caches-are-synced
Block cluster autoscaler until API resource caches are synced.
2025-03-07 06:19:45 -08:00
Maksym Fuhol 24f68f98e2 Block cluster autoscaler until API resource caches are synced. 2025-03-07 13:48:26 +00:00