Commit Graph

149 Commits

Author SHA1 Message Date
Yaroslava Serdiuk dffff4f557
Add ProvisioningRequest injector (#6529)
* Add ProvisioningRequests injector

* Add test case for Accepted conditions and add supported provreq classes list

* Use Passive clock
2024-02-28 02:14:49 -08:00
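The injector above appends the pods backed by eligible ProvisioningRequests to the unschedulable-pod list so the autoscaler can act on them, skipping unsupported classes and requests that already carry a terminal condition. A minimal sketch of that shape follows; the type and field names (`ProvisioningRequest`, `Conditions`, `Pods`) are illustrative assumptions, not the actual cluster-autoscaler API, and the class name is only believed to match the check-capacity class.

```go
package main

import "fmt"

// Illustrative stand-ins for the real ProvisioningRequest API (assumptions).
type Condition struct {
	Type   string // e.g. "Accepted", "Failed"
	Status string // "True" / "False"
}

type ProvisioningRequest struct {
	Name       string
	Class      string // provisioningClassName
	Conditions []Condition
	Pods       []string // stand-in for the pods the request would create
}

// supportedClasses mirrors the "supported provreq classes list" from the commit.
var supportedClasses = map[string]bool{
	"check-capacity.autoscaling.x-k8s.io": true,
}

// inject appends pods from eligible ProvisioningRequests to the
// unschedulable-pod list, skipping unsupported classes and requests
// that were already accepted or permanently failed.
func inject(unschedulable []string, reqs []ProvisioningRequest) []string {
	for _, pr := range reqs {
		if !supportedClasses[pr.Class] {
			continue
		}
		if hasTrueCondition(pr, "Accepted") || hasTrueCondition(pr, "Failed") {
			continue // already being handled, or in a terminal state
		}
		unschedulable = append(unschedulable, pr.Pods...)
	}
	return unschedulable
}

func hasTrueCondition(pr ProvisioningRequest, t string) bool {
	for _, c := range pr.Conditions {
		if c.Type == t && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	reqs := []ProvisioningRequest{
		{Name: "a", Class: "check-capacity.autoscaling.x-k8s.io", Pods: []string{"a-0", "a-1"}},
		{Name: "b", Class: "unsupported.example.com", Pods: []string{"b-0"}},
	}
	fmt.Println(inject(nil, reqs)) // [a-0 a-1]
}
```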
Yaroslava Serdiuk 5286b3f770
Add ProvisioningRequestProcessor (#6488) 2024-02-14 05:14:46 -08:00
Daniel Kłobuszewski a842d4f108 Reduce log spam in AtomicResizeFilteringProcessor
Also, introduce default per-node logging quotas. For now, identical to
the per-pod ones.
2024-02-07 12:01:05 +01:00
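Logging quotas of the kind this commit introduces cap how many items a single loop iteration logs individually before collapsing the remainder into one summary line. A self-contained sketch of the pattern, assuming a per-node quota of 3 (the actual default and the real klogx implementation may differ):

```go
package main

import "log"

// quota caps how many times something is logged per loop iteration.
type quota struct{ left int }

func newQuota(limit int) *quota { return &quota{left: limit} }

// allow consumes one unit; it keeps counting below zero so we can
// report how many messages were suppressed.
func (q *quota) allow() bool { q.left--; return q.left >= 0 }

func main() {
	const perNodeQuota = 3 // assumed default, analogous to the per-pod quota
	q := newQuota(perNodeQuota)
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	for _, n := range nodes {
		if q.allow() {
			log.Printf("filtering out node %s from scale-down", n)
		}
	}
	if q.left < 0 {
		log.Printf("... and %d more nodes (log quota exceeded)", -q.left)
	}
}
```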
Yaroslava Serdiuk ed6ebbe8ba
ScaleUp for check-capacity ProvisioningRequestClass (#6451)
* ScaleUp for check-capacity ProvisioningRequestClass

* update condition logic

* Update tests

* Naming update

* Update cluster-autoscaler/core/scaleup/orchestrator/wrapper_orchestrator_test.go

Co-authored-by: Bartek Wróblewski <bwroblewski@google.com>

---------

Co-authored-by: Bartek Wróblewski <bwroblewski@google.com>
2024-01-30 02:36:59 -08:00
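The `wrapper_orchestrator_test.go` file touched above suggests the scale-up path is wrapped so ProvisioningRequest-consuming pods are routed to a check-capacity orchestrator while everything else takes the default path. A condensed sketch of that routing, under the assumption that membership is signalled by the `cluster-autoscaler.kubernetes.io/consume-provisioning-request` annotation:

```go
package main

import "fmt"

// orchestrator is a stand-in for cluster-autoscaler's scale-up orchestrator.
type orchestrator interface {
	ScaleUp(pods []pod) error
}

type pod struct {
	Name        string
	Annotations map[string]string
}

// consumesProvReq marks pods that belong to a ProvisioningRequest; the
// annotation key is an assumption for this sketch.
func consumesProvReq(p pod) bool {
	return p.Annotations["cluster-autoscaler.kubernetes.io/consume-provisioning-request"] != ""
}

// wrapperOrchestrator splits pods between the regular scale-up path and the
// check-capacity ProvisioningRequest path, as the commit above describes.
type wrapperOrchestrator struct {
	scaleUp        orchestrator // default path
	provReqScaleUp orchestrator // check-capacity path
}

func (w *wrapperOrchestrator) ScaleUp(pods []pod) error {
	var regular, provReq []pod
	for _, p := range pods {
		if consumesProvReq(p) {
			provReq = append(provReq, p)
		} else {
			regular = append(regular, p)
		}
	}
	if err := w.scaleUp.ScaleUp(regular); err != nil {
		return err
	}
	return w.provReqScaleUp.ScaleUp(provReq)
}

type logOrchestrator struct{ name string }

func (l logOrchestrator) ScaleUp(pods []pod) error {
	fmt.Printf("%s path handles %d pods\n", l.name, len(pods))
	return nil
}

func main() {
	w := &wrapperOrchestrator{
		scaleUp:        logOrchestrator{"default"},
		provReqScaleUp: logOrchestrator{"check-capacity"},
	}
	_ = w.ScaleUp([]pod{
		{Name: "p1"},
		{Name: "p2", Annotations: map[string]string{
			"cluster-autoscaler.kubernetes.io/consume-provisioning-request": "my-request",
		}},
	})
}
```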
Guo Peng 68e661f1ed feat: add node group health and backoff metrics 2024-01-23 19:39:18 +08:00
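Per-node-group health and backoff gauges like the ones this commit adds are typically registered with client_golang. A sketch follows; the metric names here are assumptions, not the names the commit actually uses:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical metric names; the commit adds per-node-group health and
// backoff gauges, but the exact names here are assumptions.
var (
	nodeGroupHealthy = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "cluster_autoscaler",
			Name:      "node_group_healthiness",
			Help:      "Whether a node group is considered healthy (1) or not (0).",
		},
		[]string{"node_group"},
	)
	nodeGroupBackoff = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "cluster_autoscaler",
			Name:      "node_group_backoff_status",
			Help:      "Whether scale-up of a node group is currently backed off (1) or not (0).",
		},
		[]string{"node_group"},
	)
)

func main() {
	prometheus.MustRegister(nodeGroupHealthy, nodeGroupBackoff)
	nodeGroupHealthy.WithLabelValues("ng-1").Set(1)
	nodeGroupBackoff.WithLabelValues("ng-1").Set(0)
}
```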
vadasambar 5de49a11fb feat: support `--scale-down-delay-after-*` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: update scale down status after every scale up
- move scaledown delay status to cluster state/registry
- enable scale down if `ScaleDownDelayTypeLocal` is enabled
- add new funcs on cluster state to get and update scale down delay status
- use timestamp instead of booleans to track scale down delay status
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use existing fields on clusterstate
- uses `scaleUpRequests`, `scaleDownRequests` and `scaleUpFailures` instead of `ScaleUpDelayStatus`
- changed the above existing fields a little to make them more convenient for use
- moved initializing scale down delay processor to static autoscaler (because clusterstate is not available in main.go)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove note saying only `scale-down-after-add` is supported
- because we are supporting all the flags
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: evaluate `scaleDownInCooldown` the old way only if `ScaleDownDelayTypeLocal` is set to `false`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove line saying `--scale-down-delay-type-local` is only supported for `--scale-down-delay-after-add`
- because it is not true anymore
- we are supporting all `--scale-down-delay-after-*` flags per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix clusterstate tests failing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move initializing processors logic back from static autoscaler to main
- we don't want to initialize processors in static autoscaler because anyone implementing an alternative to static_autoscaler has to initialize the processors
- and initializing specific processors is making static autoscaler aware of an implementation detail which might not be the best practice
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: revert changes related to `clusterstate`
- since I am going with observer pattern
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add observer interface for state of scaling
- to implement observer pattern for tracking state of scale up/downs (as opposed to using clusterstate to do the same)
- refactor `ScaleDownCandidatesDelayProcessor` to use fields from the new observer
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove params passed to `clearScaleUpFailures`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: revert clusterstate tests
- approach has changed
- I am not making any changes in clusterstate now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add accidentally deleted lines for clusterstate test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: implement `Add` fn for scale state observer
- to easily add new observers
- re-word comments
- remove redundant params from `NewDefaultScaleDownCandidatesProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: CI complaining because no comments on fn definitions
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: initialize parent `ScaleDownCandidatesProcessor`
- instead of `ScaleDownCandidatesSortingProcessor` and `ScaleDownCandidatesDelayProcessor` separately
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add scale state notifier to list of default processors
- initialize processors for `NewDefaultScaleDownCandidatesProcessor` outside and pass them to the fn
- this allows more flexibility
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add observer interface
- create a separate observer directory
- implement `RegisterScaleUp` function in the clusterstate
- TODO: resolve syntax errors
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: use `scaleStateNotifier` in place of `clusterstate`
- delete leftover `scale_stateA_observer.go` (new one is already present in `observers` directory)
- register `clusterstate` with `scaleStateNotifier`
- use `Register` instead of `Add` function in `scaleStateNotifier`
- fix `go build`
- wip: fixing tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix syntax errors
- add utils package `pointers` for converting `time` to pointer (without having to initialize a new variable)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip track scale down failures along with scale up failures
- I was tracking scale up failures but not scale down failures
- fix copyright year 2017 -> 2023 for the new `pointers` package
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: register failed scale down with scale state notifier
- wip writing tests for `scale_down_candidates_delay_processor`
- fix CI lint errors
- remove test file for `scale_down_candidates_processor` (there is not much to test as of now)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add unit tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't track scale up failures in `ScaleDownCandidatesDelayProcessor`
- not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: better doc comments for `TestGetScaleDownCandidates`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't ignore error in `NGChangeObserver`
- return it instead and let the caller decide what to do with it
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change pointers to values in `NGChangeObserver` interface
- easier to work with
- remove `expectedAddTime` param from `RegisterScaleUp` (not needed for now)
- add tests for clusterstate's `RegisterScaleUp`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: conditions in `GetScaleDownCandidates`
- set scale down in cool down if the number of scale down candidates is 0
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: use `ng1` instead of `ng2` in existing test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip static autoscaler tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: assign directly instead of using `sdProcessor` variable
- variable is not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: first working test for static autoscaler
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: continue working on static autoscaler tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip second static autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `Println` used for debugging
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add static_autoscaler tests for scale down delay per nodegroup flags
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: rebase off the latest `master`
- change scale state observer interface's `RegisterFailedScaleup` to reflect latest changes around clusterstate's `RegisterFailedScaleup` in `master`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix clusterstate test failing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing orchestrator test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `defaultScaleDownCandidatesProcessor` -> `combinedScaleDownCandidatesProcessor`
- describes the processor better
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: replace `NGChangeObserver` -> `NodeGroupChangeObserver`
- makes it easier to understand for someone not familiar with the codebase
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: reword code comment `after` -> `for which`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't return error from `RegisterScaleDown`
- not needed as of now (no implementer function returns a non-nil error for this function)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address review comments around ng change observer interface
- change dir structure of nodegroup change observer package
- stop returning errors wherever it is not needed in the nodegroup change observer interface
- rename `NGChangeObserver` -> `NodeGroupChangeObserver` interface (makes it easier to understand)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: make nodegroupchange observer thread-safe
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add TODO to consider using multiple mutexes in nodegroupchange observer
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `time.Now()` directly instead of assigning a variable to it
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: share code for checking if there was a recent scale-up/down/failure
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: convert `ScaleDownCandidatesDelayProcessor` into table tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change scale state notifier's `Register()` -> `RegisterForNotifications()`
- makes it easier to understand what the function does
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: replace scale state notifier `Register` -> `RegisterForNotifications` in test
- to fix syntax errors since it is already renamed in the actual code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `clusterStateRegistry` from `delete_in_batch` tests
- not needed anymore since we have `scaleStateNotifier`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address PR review comments
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add empty `RegisterFailedScaleDown` for clusterstate
- fix syntax error in static autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2024-01-11 21:46:42 +05:30
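The long commit above replaces direct coupling to clusterstate with an observer: node-group scale-up/scale-down events and failures are broadcast to registered observers, and the scale-down-candidates delay processor filters out node groups with a recent event. A condensed, thread-safe sketch of that shape, with names approximating the ones in the commit message (exact signatures are assumptions):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// NodeGroupChangeObserver receives scale-state events (interface name taken
// from the commit; the method set here is approximated).
type NodeGroupChangeObserver interface {
	RegisterScaleUp(nodeGroup string, delta int, t time.Time)
	RegisterScaleDown(nodeGroup string, node string, t time.Time)
	RegisterFailedScaleUp(nodeGroup string, t time.Time)
	RegisterFailedScaleDown(nodeGroup string, t time.Time)
}

// notifier fans events out to registered observers; mutex-guarded as in the
// "make nodegroupchange observer thread-safe" step. Only the scale-up fan-out
// is shown to keep the sketch short.
type notifier struct {
	mu        sync.Mutex
	observers []NodeGroupChangeObserver
}

func (n *notifier) RegisterForNotifications(o NodeGroupChangeObserver) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.observers = append(n.observers, o)
}

func (n *notifier) RegisterScaleUp(ng string, delta int, t time.Time) {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, o := range n.observers {
		o.RegisterScaleUp(ng, delta, t)
	}
}

// delayProcessor drops scale-down candidates whose node group saw a recent
// scale-up, mirroring --scale-down-delay-after-add applied per node group.
// Timestamps, not booleans, track the delay status, as in the commit.
type delayProcessor struct {
	mu          sync.Mutex
	lastScaleUp map[string]time.Time
	delay       time.Duration
}

func (p *delayProcessor) RegisterScaleUp(ng string, _ int, t time.Time) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.lastScaleUp[ng] = t
}
func (p *delayProcessor) RegisterScaleDown(string, string, time.Time) {}
func (p *delayProcessor) RegisterFailedScaleUp(string, time.Time)    {}
func (p *delayProcessor) RegisterFailedScaleDown(string, time.Time)  {}

func (p *delayProcessor) GetScaleDownCandidates(candidates map[string][]string, now time.Time) []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	var out []string
	for ng, nodes := range candidates {
		if t, ok := p.lastScaleUp[ng]; ok && now.Sub(t) < p.delay {
			continue // this node group is still in its post-scale-up cooldown
		}
		out = append(out, nodes...)
	}
	return out
}

func main() {
	n := &notifier{}
	p := &delayProcessor{lastScaleUp: map[string]time.Time{}, delay: 10 * time.Minute}
	n.RegisterForNotifications(p)

	now := time.Now()
	n.RegisterScaleUp("ng-1", 2, now)
	cands := map[string][]string{"ng-1": {"n1"}, "ng-2": {"n2"}}
	fmt.Println(p.GetScaleDownCandidates(cands, now)) // only ng-2's node
}
```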
Yaroslava Serdiuk d29ffd03b9
Add ProvisioningRequestPodsFilter processor (#6386)
* Introduce ProvisioningRequestPodsFilter processor

* Review
2024-01-03 11:49:36 +01:00
Walid Ghallab 11a084699c Convert status in cluster-autoscaler-status to YAML and add backoff error info and more node counts.
Change-Id: Ic68e0d67b7ce9912b605b6c0a3356b4d0e177911
2023-12-28 18:52:55 +00:00
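To illustrate the shape of a YAML-formatted status entry with backoff error info, here is a sketch using Go structs with yaml tags; the field names are assumptions and do not match the real cluster-autoscaler-status schema field-for-field:

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// Illustrative shape of one node group's status entry (assumed names).
type NodeCounts struct {
	Registered   int `yaml:"registered"`
	Ready        int `yaml:"ready"`
	NotStarted   int `yaml:"notStarted"`
	BeingDeleted int `yaml:"beingDeleted"`
}

type BackoffInfo struct {
	ErrorClass   string `yaml:"errorClass"`
	ErrorCode    string `yaml:"errorCode"`
	ErrorMessage string `yaml:"errorMessage"`
}

type NodeGroupStatus struct {
	Name           string       `yaml:"name"`
	Health         string       `yaml:"health"`
	NodeCounts     NodeCounts   `yaml:"nodeCounts"`
	ScaleUpBackoff *BackoffInfo `yaml:"scaleUpBackoff,omitempty"`
}

func main() {
	s := NodeGroupStatus{
		Name:       "ng-1",
		Health:     "Healthy",
		NodeCounts: NodeCounts{Registered: 3, Ready: 3},
		ScaleUpBackoff: &BackoffInfo{
			ErrorClass:   "OutOfResource",
			ErrorCode:    "QUOTA_EXCEEDED",
			ErrorMessage: "instance quota exhausted",
		},
	}
	out, _ := yaml.Marshal(s)
	fmt.Print(string(out))
}
```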
Bartłomiej Wróblewski 81a4721f51 Extend BinpackingLimiter interface 2023-10-06 13:53:33 +00:00
Artem Minyaylov a68b748fd7 Refactor NodeDeleteOptions for use in drainability rules 2023-09-29 17:55:19 +00:00
Karol Wychowaniec 11e03a814a Fix processors order in tests and add a comment explaining this order 2023-09-13 14:26:31 +00:00
Kubernetes Prow Robot 72f8a3bafa
Merge pull request #6103 from kawych/composite_proc
Refactor ScaleDownSet processor into a composite processor
2023-09-13 04:18:12 -07:00
Piotr ffe6537163 Unifies pod listing. 2023-09-11 17:11:11 +00:00
Karol Wychowaniec ea94a2b343 Refactor ScaleDownSet processor into a composite processor 2023-09-11 16:19:09 +00:00
Julian Tölle 83134923f4
fix: scale down broken for providers not implementing NodeGroup.GetOptions() 2023-08-15 12:42:05 +02:00
Bartłomiej Wróblewski e39d1b028d Clean up NodeGroupConfigProcessor interface 2023-08-04 16:00:50 +00:00
Kubernetes Prow Robot c6893e9e28
Merge pull request #5672 from vadasambar/feat/5399/ignore-daemonsets-utilization-per-nodegroup
feat: set `IgnoreDaemonSetsUtilization` per nodegroup for AWS
2023-07-12 07:43:12 -07:00
Damika Gamlath 0f8502c623 Refactor autoscaler.go and static_autoscaler.go to move declaration of the NodeDeletion option to main.go 2023-07-10 08:49:49 +00:00
vadasambar 7941bab214 feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because the per-nodegroup override is supported only via AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGs
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:31:45 +05:30
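This commit threads a per-node-group `GetIgnoreDaemonSetsUtilization` lookup into the utilization calculation through a deliberately narrow config-getter interface (the actuator only needs that one method, not the whole `NodeGroupConfigProcessor`); on AWS the value comes from ASG tags. A sketch of that wiring, with a toy map-backed getter in place of the real tag lookup:

```go
package main

import "fmt"

// configGetter is the narrow interface the commit gives the actuator:
// only the lookup it actually needs, instead of the full
// NodeGroupConfigProcessor.
type configGetter interface {
	GetIgnoreDaemonSetsUtilization(nodeGroup string) (bool, error)
}

// mapGetter is a toy implementation backed by per-node-group overrides
// (on AWS these would come from ASG tags) with a global default taken
// from the --ignore-daemonsets-utilization flag.
type mapGetter struct {
	defaults  bool
	overrides map[string]bool
}

func (g mapGetter) GetIgnoreDaemonSetsUtilization(ng string) (bool, error) {
	if v, ok := g.overrides[ng]; ok {
		return v, nil
	}
	return g.defaults, nil
}

// utilization returns used/allocatable, optionally excluding DaemonSet pods.
func utilization(used, dsUsed, allocatable float64, ignoreDS bool) float64 {
	if ignoreDS {
		used -= dsUsed
	}
	return used / allocatable
}

func main() {
	var g configGetter = mapGetter{
		defaults:  false,
		overrides: map[string]bool{"ng-1": true},
	}
	for _, ng := range []string{"ng-1", "ng-2"} {
		ignore, _ := g.GetIgnoreDaemonSetsUtilization(ng)
		fmt.Printf("%s: utilization=%.2f (ignoreDS=%v)\n",
			ng, utilization(3, 1, 4, ignore), ignore)
	}
}
```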
Hakan Bostan 333a0286e0 Rename the autoscaling option
* Renamed the "AtomicScaling" autoscaling option to
  "ZeroOrMaxNodeScaling" to make the behavior clearer.
2023-06-30 11:17:53 +00:00
Karol Wychowaniec 374cf611b7 Address next set of comments 2023-06-27 13:35:02 +00:00
Kubernetes Prow Robot 265e57c163
Merge pull request #5835 from elmiko/add-more-balance-logging
Cluster Autoscaler: add more logging for balancing similar node groups
2023-06-26 05:25:46 -07:00
Kushagra 541affc28a addressed comments 2023-06-16 14:03:20 +00:00
Kushagra db0c783353 make no-op binpacking limiter as default + move mark nodegroups to its method 2023-06-13 12:21:26 +00:00
michael mccune 5b0ad270de add more logging for balancing similar node groups
this change adds some logging at verbosity levels 2 and 3 to help
diagnose why the cluster-autoscaler does not consider 2 or more node
groups to be similar.
2023-06-05 14:03:57 +02:00
Kushagra 49cfd18000 BinpackingLimiter interface 2023-06-02 12:59:42 +00:00
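The `BinpackingLimiter` introduced here (made a no-op by default in the later commit above) lets the estimator cut binpacking short instead of evaluating every node group. A sketch of the interface and a trivial pluggable policy; the method set approximates the one in the source and is not copied from it:

```go
package main

import "fmt"

// BinpackingLimiter lets the estimator stop evaluating more node groups
// early; the method names here are assumptions.
type BinpackingLimiter interface {
	InitBinpacking(nodeGroups []string)
	MarkProcessed(nodeGroup string)
	StopBinpacking() bool
}

// noOpBinpackingLimiter is the default: never cut binpacking short.
type noOpBinpackingLimiter struct{}

func (noOpBinpackingLimiter) InitBinpacking([]string) {}
func (noOpBinpackingLimiter) MarkProcessed(string)    {}
func (noOpBinpackingLimiter) StopBinpacking() bool    { return false }

// capLimiter stops after a fixed number of node groups, as an example of
// a non-trivial policy a vendor could plug in.
type capLimiter struct{ left int }

func (l *capLimiter) InitBinpacking([]string) {}
func (l *capLimiter) MarkProcessed(string)    { l.left-- }
func (l *capLimiter) StopBinpacking() bool    { return l.left <= 0 }

func main() {
	var limiter BinpackingLimiter = &capLimiter{left: 2}
	for _, ng := range []string{"ng-1", "ng-2", "ng-3"} {
		if limiter.StopBinpacking() {
			fmt.Println("binpacking stopped early")
			break
		}
		fmt.Println("evaluating", ng)
		limiter.MarkProcessed(ng)
	}
}
```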
Bartłomiej Wróblewski 9604756004 Sanitize taints before scheduling DSs on template node infos 2023-04-19 14:23:42 +00:00
Bartłomiej Wróblewski b8d40fdd3c Add status taints option to template creation 2023-04-19 13:55:38 +00:00
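These two commits concern which taints survive onto template node infos: transient status taints should be stripped so DaemonSet scheduling against a template node isn't blocked by them. A minimal sketch of that filtering, assuming taints are compared by key (real taints carry key/value/effect):

```go
package main

import "fmt"

// sanitizeTaints drops taints that only describe transient node status so
// DaemonSet scheduling against a template node isn't blocked by them.
func sanitizeTaints(taints []string, statusTaints map[string]bool) []string {
	var out []string
	for _, t := range taints {
		if statusTaints[t] {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	statusTaints := map[string]bool{
		"node.kubernetes.io/not-ready":   true,
		"node.kubernetes.io/unreachable": true,
	}
	taints := []string{"dedicated=gpu:NoSchedule", "node.kubernetes.io/not-ready"}
	fmt.Println(sanitizeTaints(taints, statusTaints)) // [dedicated=gpu:NoSchedule]
}
```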
Maria Oparka ca088d26c2 Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions 2023-04-19 11:08:20 +02:00
Bartłomiej Wróblewski d5d0a3c7b7 Fix drain logic when skipNodesWithCustomControllerPods=false, set NodeDeleteOptions correctly 2023-04-04 09:50:26 +00:00
Yaroslava Serdiuk cea9d1a73b Add empty nodes sorting for scale down candidates 2023-03-08 15:43:22 +00:00
peaaceChoi 82e8804181 Fix continue condition 2023-03-03 06:26:38 +00:00
peaaceChoi 460836285d Delete unused return param 2023-03-03 04:53:50 +00:00
tombokombo 93b9f4b8be
fix
Signed-off-by: tombokombo <tombo@sysart.tech>
2023-02-07 14:35:23 +01:00
Kubernetes Prow Robot e911e54f1e
Merge pull request #5214 from tombokombo/fix/asg-resource-tags
Fix/asg resource tags
2023-02-07 01:35:01 -08:00
Bartłomiej Wróblewski b608278386 Add force Daemon Sets option 2023-01-30 11:02:42 +00:00
Bartłomiej Wróblewski 0470fdfc35 Clean up DS utils: remove unused cluster snapshot and predicate checker 2023-01-23 14:14:53 +00:00
Kubernetes Prow Robot f507519916
Merge pull request #5423 from yaroslava-serdiuk/sd-sorting
Add scale down candidates observer
2023-01-19 10:14:16 -08:00
Yaroslava Serdiuk 541ce04e4b Add previous scale down candidate sorting 2023-01-19 16:04:50 +00:00
Yaroslava Serdiuk 97159df69b Add scale down candidates observer 2023-01-19 16:04:42 +00:00
michael mccune 955396e857 remove clusterapi nodegroupset processor
as discussed with the cluster api community[0], the nodegroupset
processor is being removed from the clusterapi provider implementation
in favor of instructing our community on the use of the
--balancing-ignore-label flag. due to the wide variety of provider
infrastructures that clusterapi can be deployed on, we would prefer to
not encode all of these labels in the autoscaler itself. see the linked
recording for more information.

[0] https://www.youtube.com/watch?v=jbhca_9oPuQ
2023-01-12 15:05:37 -05:00
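The replacement this commit points users to, `--balancing-ignore-label`, excludes infrastructure-specific labels from the node-group similarity comparison instead of hard-coding them per provider. A simplified stand-in for that comparison, with a hypothetical label key for illustration:

```go
package main

import "fmt"

// similarLabels reports whether two node-template label sets match once the
// ignored labels (the ones passed via --balancing-ignore-label) are removed.
// This is a simplified stand-in for the node-group similarity comparator.
func similarLabels(a, b map[string]string, ignored map[string]bool) bool {
	filter := func(in map[string]string) map[string]string {
		out := map[string]string{}
		for k, v := range in {
			if !ignored[k] {
				out[k] = v
			}
		}
		return out
	}
	fa, fb := filter(a), filter(b)
	if len(fa) != len(fb) {
		return false
	}
	for k, v := range fa {
		if fb[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical provider-specific label that would otherwise break balancing.
	ignored := map[string]bool{"example.com/zone-id": true}
	a := map[string]string{"node.kubernetes.io/instance-type": "m5.large", "example.com/zone-id": "z1"}
	b := map[string]string{"node.kubernetes.io/instance-type": "m5.large", "example.com/zone-id": "z2"}
	fmt.Println(similarLabels(a, b, ignored)) // true once the zone label is ignored
}
```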
bsoghigian 0f8ed0b81f Configurable difference ratios 2023-01-09 22:40:16 -08:00
Kubernetes Prow Robot d9ffb8f5ce
Merge pull request #5317 from grosser/grosser/ref2
cluster-autoscaler: refactor BalanceScaleUpBetweenGroups
2022-12-19 00:49:44 -08:00
Kubernetes Prow Robot bc483274e4
Merge pull request #5325 from x13n/master
Log node group min and current size when skipping scale down
2022-11-24 02:24:03 -08:00
Daniel Kłobuszewski d9100cd707 Log node group min and current size when skipping scale down 2022-11-23 13:23:07 +01:00
Michael Grosser 62f29d23af
cluster-autoscaler: refactor BalanceScaleUpBetweenGroups 2022-11-15 13:21:29 -08:00
Bartłomiej Wróblewski 4373c467fe Add ScaleDown.Actuator to AutoscalingContext 2022-11-02 13:12:25 +00:00
Daniel Kłobuszewski 18f2e67c4f Split out code from simulator package 2022-10-18 11:51:44 +02:00
tombokombo 17a09a0efe fix asg resource tags
Signed-off-by: tombokombo <tombo@sysart.tech>
2022-09-27 01:25:29 +02:00
Flavian f1b6d4ded6 handle directx nodes the same as gpu nodes 2022-09-23 09:55:14 +02:00