Commit Graph

196 Commits

Author SHA1 Message Date
Manoj 126b76d130
Utho autoscaler (#8398)
* cloudprovider: init Utho cloud provider

* add utho cloud provider logic

* implement tests and mock client for Utho cloud provider functionality

* enhance Utho cloud provider: improve logging, add ReadNodePool method, and restore toProviderID function

* add unit tests for Utho cloud provider node group functionality

* add deployment and secret configuration for Utho cloud provider

* remove outdated Utho Go module versions from go.sum

* add stress-test deployment example, improve error messages, and enhance node group tests

* update utho autoscaler image version to 1.0.0

* refactor: reorder parameters in setupMockListNodePools

* fix: correct license formatting in utho_cloud_provider_test.go and add license header to utils.go

* fix: comment out approvers and reviewers in OWNERS file

* Remove utho-go as dependency

* add comments to clarify structures and services in Utho API

* add comments to Utho SDK

* add comments

* remove unnecessary comments in cloud_instances.go and kubernetes.go

* Revert changes to go.mod

* Revert changes to go.mod

* ensure newline at end of go.mod and go.sum files

---------

Co-authored-by: hmada15 <31375621+hmada15@users.noreply.github.com>
Co-authored-by: m-kased <31375621+m-kased@users.noreply.github.com>
2025-08-11 09:35:08 -07:00
Maciej Skoczeń 90eabc6a4d Differentiate provisioning requests using Parameters field. Keep prefixing as not recommended approach 2025-03-04 11:41:51 +00:00
Maciej Skoczeń 9cac6a49d1 Update FAQ to reflect recent changes in ProvisioningRequests processing 2025-03-04 10:48:21 +00:00
Julien Francoz f8a68efe63 change default cluster-autoscaler expander to least-waste
With the previous default of random, the cluster autoscaler could start very expensive nodes that it then fails to remove as long as another, smaller node is started.
2025-01-20 21:47:13 +01:00
rainfd b45735d63f Add script to update cluster-autoscaler flags doc 2024-12-28 06:39:58 +08:00
Kubernetes Prow Robot 95ec5f3f1e
Merge pull request #7598 from yaroslava-serdiuk/faq-update
Add note for --max-nodes-per-scaleup flag for best-effort-atomic
2024-12-13 15:32:26 +01:00
Yaroslava Serdiuk 01dce2e54c Add note for --max-nodes-per-scaleup flag for best-effort-atomic provreq class 2024-12-12 15:10:15 +00:00
Michael Grosser d65ac6445f
document scale-down-gpu-utilization-threshold
Signed-off-by: Michael Grosser <michael@grosser.it>
2024-12-06 12:27:06 -08:00
Devansh Das b78b806cbc Add information about usage of batch processing for check capacity in FAQ 2024-11-08 09:33:59 +00:00
Michael Grosser 5ea87e3a1d
Update cluster-autoscaler/FAQ.md
Co-authored-by: Shubham <shubham.kuchhal@india.nec.com>
2024-09-20 10:54:12 -05:00
Michael Grosser d7b84a40b4
expand docs for ProvisioningRequest 2024-09-19 22:29:04 -07:00
Yaroslava Serdiuk 23f29fbed3 Add documentation regarding best-effort-atomic-scale-up ProvReq class 2024-09-11 13:12:09 +00:00
Ayush Sobti 8b4a278d91
Fix incorrect wording in FAQ.md
Correct wording would be "above the maximum" which then goes on to explain how the CA handles scaling down a nodepool that is currently above the maxNodes value
2024-08-21 11:55:29 -07:00
Rachel Gregory 5ec21a1ddb
Update FAQ.md to add steps for self-hosted provreq (#7092)
* Update FAQ.md to add steps for self-hosted provreq

Adding more observed requirements to use the ProvisioningRequest feature.

* fix typo in provreq flag

* Add clusterrolebinding
2024-08-21 07:23:08 +01:00
Daniel Kłobuszewski c3e0a15824
Update FAQ.md
Update args to match update-deps.sh usage

API module may have different k8s dependency from CA itself.
2024-07-15 12:48:25 +00:00
Daniel Kłobuszewski 116035ed11
Update FAQ.md
Update CA FAQ to point to the new location of dependency update script.
2024-07-15 12:42:20 +00:00
Kubernetes Prow Robot 3c6dd26d9e
Merge pull request #6863 from rrangith/azure-default-sizes
Default min/max sizes for Azure VMSSs
2024-07-01 07:29:35 -07:00
Yaroslava Serdiuk a9fe7e302a
Add documentation for ProvisioningRequests (#6904) 2024-06-20 05:12:16 -07:00
Rahul Rangith 333d438dbf
Default min/max sizes for Azure VMSSs
return a struct
2024-06-11 13:55:38 -04:00
uucloud 4aeee374af docs:fix broken hyperlink in cluster-autoscaler FAQ.md 2024-05-09 15:54:39 +08:00
Jordan Rodgers dc99ab396f
add least-nodes expander to cluster-autoscaler 2024-05-03 14:01:52 -07:00
Kuba Tużnik 2cd8e16df8
CA FAQ: clarify the point about scheduling constraints blocking scale-down 2024-02-26 21:15:24 +01:00
faan11 6ec725638c docs: clarifies scale down operation by CA in FAQ.md and main.go
This commit clarifies the condition when a node can be scaled down by the Cluster Autoscaler (CA).
The changes update the section and flag description in the FAQ.md and main.go files.
2024-01-07 23:36:53 +01:00
Kubernetes Prow Robot c068feb5f0
Merge pull request #6218 from piotrwrotniak/adddocs
Documents startup/status/ignore node taints.
2023-10-24 11:55:20 +02:00
Piotr Wrótniak 6fd2cb5f09 Documents startup/status/ignore node taints. 2023-10-23 15:32:15 +00:00
lisenet ddaa9f0121
Add debugging-snapshot-enabled back 2023-10-20 14:51:44 +01:00
lisenet d532844bac
Add node-delete-delay-after-taint to FAQ 2023-10-20 14:45:43 +01:00
Kubernetes Prow Robot 574c534e4f
Merge pull request #6035 from madufresneelastic/json-logging-support
Add support for json logging format
2023-08-28 08:15:44 -07:00
Marc-Andre Dufresne 55c92e1025 add json logging support 2023-08-28 10:32:39 -04:00
Yash Khare 0ed75e4490
fix: Broken links to testgrid dashboard 2023-08-15 12:28:07 +05:30
droctothorpe ecfdc21f09 Fix broken hyperlink
Co-authored-by: Shubham <shubham.kuchhal@india.nec.com>
2023-07-28 00:11:26 -04:00
Håkon Solbjørg 8094f9aac1
docs(ca/faq): Include examples of local volumes and volumes not considered local 2023-05-23 16:54:55 +02:00
Kubernetes Prow Robot 1009797f55
Merge pull request #5594 from vadasambar/feat/3947/ignore-some-local-storage-volumes
feat: add annotation to ignore local storage volume during scale down
2023-04-17 02:16:44 -07:00
vadasambar b663f138a4 feat: add annotation to ignore local storage volume during scale down
- this is so that scale down is not blocked on local storage volume
- for pods where it is okay to ignore local storage volume
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing
- there was a problem in the logic
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add unit test for `IgnoreLocalStorageVolumeKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `IgnoreLocalStorageVolumeKey` in tests instead of hardcoding the annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: wording for test name
- `pod with EmptyDir but IgnoreLocalStorageVolumeKey annotation` -> `pod with EmptyDir and IgnoreLocalStorageVolumeKey annotation`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: simulator drain tests failing
- set local storage vol name (required)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add support for multiple vals in `safe-to-evict-local-volume` annotation
- add more unit tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename ignore local vol key `safe-to-evict-local-volume` -> `safe-to-evict-local-volumes`
- abstract code to process annotation into a separate fn
- shorten name for test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update FAQ with info about `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add the FAQ for `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix formatting for `safe-to-evict-local-volumes` in FAQ
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: format the `safe-to-evict-local-volumes` as a bullet
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix `Unless` -> `unless` to make it consistent with other lines
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add an extra test for mismatching local vol value in annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: make the wording clearer
- for `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-04-17 09:53:19 +05:30
Julian Tölle 2f5ea34f86
docs: fix invalid flag name 2023-03-30 17:16:08 +02:00
Kubernetes Prow Robot b8ba2334e4
Merge pull request #5507 from vadasambar/feature/5387/allow-scale-down-with-custom-controller-pods-2
feat: check only controller ref to decide if a pod is replicated
2023-03-24 02:56:31 -07:00
shubham82 1ea7fb0ce5 link DaemonSet and Mirror Pods to k8s docs. 2023-03-23 13:34:07 +09:00
vadasambar ff6fe5833d feat: check only controller ref to decide if a pod is replicated
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit 144a64a402)

fix: set `replicated` to true if controller ref is set to `true`
- forgot to add this in the last commit

Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit f8f458295d)

fix: remove `checkReferences`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

(cherry picked from commit 5df6e31f8b)

test(drain): add test for custom controller pod
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add flag to allow scale down on custom controller pods
- set to `false` by default
- `false` will be set to `true` by default in the future
- right now, we want to ensure backwards compatibility and make the feature available if the flag is explicitly set to `true`
- TODO: this code might need some unit tests. Look into adding unit tests.
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: remove `at` symbol in prefix of `vadasambar`
- to keep it consistent with previous such mentions in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(utils): run all drain tests twice
- once for `allowScaleDownOnCustomControllerOwnedPods=false`
- and once for `allowScaleDownOnCustomControllerOwnedPods=true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs(utils): add description for `testOpts` struct
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(utils/drain): fix failing tests
- refactor code to add custom controller pod test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix long code comments
- clean-up print statements
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move `expectFatal` right above where it is used
- makes the code easier to read
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix code comment wording
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address PR comments
- abstract legacy code to check for replicated pods into a separate function so that it's easier to remove in the future
- fix param info in the FAQ.md
- simplify tests and remove the global variable used in the tests
- rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods`
- refactor tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update flag info
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: forgot to change flag name on a line in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `ControllerRef()` directly instead of `controllerRef`
- we don't need an extra variable
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: create tests consolidated test cases
- from looping over and tweaking shared test cases
- so that we don't have to duplicate shared test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: append test flag to shared test description
- so that the failed test is easy to identify
- shallow copy tests and add comments so that others do the same
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-03-22 10:51:07 +05:30
shubham82 b849ddf30e Improvement: Added the Link for Mirror pods. 2023-03-22 12:51:31 +09:00
Kubernetes Prow Robot 3ac07e7ce5
Merge pull request #5593 from zendesk/grosser/doc
docs: fix faq around expendable-pods-priority-cutoff
2023-03-16 03:31:17 -07:00
Michael Grosser acdeb92a15
docs: fix faq around expendable-pods-priority-cutoff 2023-03-13 17:19:17 -07:00
Guy Templeton 9c7f989245
CA - Document Debugging Snapshotter flag 2023-03-09 12:45:30 +00:00
cpanato 665af54f1b
update FAQ to add a version to the pause container image, since the latest tag is not valid
Signed-off-by: cpanato <ctadeu@gmail.com>
2023-02-20 10:51:33 +01:00
salasberryfin 66e1eeb7c6 update image references from k8s.gcr.io to registry.k8s.io 2023-02-14 08:15:16 +01:00
Junwon Kwon b89d2c1a89
fix typo in FAQ 2022-12-24 00:33:56 +09:00
Daniel Kłobuszewski 735cf98ec7
Add missing dot 2022-12-05 15:23:32 +01:00
McGonigle, Neil bcc06452e2 fix issue 5332 - adding suggested change 2022-12-05 09:32:07 +00:00
McGonigle, Neil 5e74894bfb fix issue 5332 2022-12-02 11:14:39 +00:00
Michael McCune d20dbb86c2 add logging information to FAQ
this change adds a section about how to increase the logging verbosity
and why you might want to do that.
2022-11-17 12:18:21 -05:00
Xintong Liu 524886fca5 Support scaling up node groups to the configured min size if needed 2022-11-02 21:47:00 -07:00