Commit Graph

108 Commits

Author SHA1 Message Date
Damika Gamlath 8971a29177 refactor gce.RegenerateMigInstancesCache() to use Instance.List API for listing MIG instances 2024-07-04 14:48:24 +00:00
Damika Gamlath 0728d157c2 implement time limiter for binpacking 2024-06-13 11:50:13 +00:00
Yuki Iwai f6fcee13ec CA: Move the ProvisioningRequest CRD to apis module
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-03-24 15:33:03 +01:00
Mahmoud Atwa e7ff1cd90f Introduce LocalSSDSizeProvider interface for GCE 2024-02-16 09:10:27 +00:00
vadasambar 5de49a11fb feat: support `--scale-down-delay-after-*` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: update scale down status after every scale up
- move scaledown delay status to cluster state/registry
- enable scale down if  `ScaleDownDelayTypeLocal` is enabled
- add new funcs on cluster state to get and update scale down delay status
- use timestamp instead of booleans to track scale down delay status
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use existing fields on clusterstate
- uses `scaleUpRequests`, `scaleDownRequests` and `scaleUpFailures` instead of `ScaleUpDelayStatus`
- changed the above existing fields a little to make them more convenient for use
- moved initializing scale down delay processor to static autoscaler (because clusterstate is not available in main.go)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove note saying only `scale-down-after-add` is supported
- because we are supporting all the flags
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: evaluate `scaleDownInCooldown` the old way only if `ScaleDownDelayTypeLocal` is set to `false`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove line saying `--scale-down-delay-type-local` is only supported for `--scale-down-delay-after-add`
- because it is not true anymore
- we are supporting all `--scale-down-delay-after-*` flags per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix clusterstate tests failing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move back initializing processors logic to from static autoscaler to main
- we don't want to initialize processors in static autoscaler because anyone implementing an alternative to static_autoscaler has to initialize the processors
- and initializing specific processors is making static autoscaler aware of an implementation detail which might not be the best practice
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: revert changes related to `clusterstate`
- since I am going with observer pattern
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add observer interface for state of scaling
- to implement observer pattern for tracking state of scale up/downs (as opposed to using clusterstate to do the same)
- refactor `ScaleDownCandidatesDelayProcessor` to use fields from the new observer
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove params passed to `clearScaleUpFailures`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: revert clusterstate tests
- approach has changed
- I am not making any changes in clusterstate now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add accidentally deleted lines for clusterstate test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: implement `Add` fn for scale state observer
- to easily add new observers
- re-word comments
- remove redundant params from `NewDefaultScaleDownCandidatesProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: CI complaining because no comments on fn definitions
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: initialize parent `ScaleDownCandidatesProcessor`
- instead  of `ScaleDownCandidatesSortingProcessor` and `ScaleDownCandidatesDelayProcessor` separately
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add scale state notifier to list of default processors
- initialize processors for `NewDefaultScaleDownCandidatesProcessor` outside and pass them to the fn
- this allows more flexibility
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add observer interface
- create a separate observer directory
- implement `RegisterScaleUp` function in the clusterstate
- TODO: resolve syntax errors
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: use `scaleStateNotifier` in place of `clusterstate`
- delete leftover `scale_stateA_observer.go` (new one is already present in `observers` directory)
- register `clustertstate` with `scaleStateNotifier`
- use `Register` instead of `Add` function in `scaleStateNotifier`
- fix `go build`
- wip: fixing tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix syntax errors
- add utils package `pointers` for converting `time` to pointer (without having to initialize a new variable)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip track scale down failures along with scale up failures
- I was tracking scale up failures but not scale down failures
- fix copyright year 2017 -> 2023 for the new `pointers` package
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: register failed scale down with scale state notifier
- wip writing tests for `scale_down_candidates_delay_processor`
- fix CI lint errors
- remove test file for `scale_down_candidates_processor` (there is not much to test as of now)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add unit tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't track scale up failures in `ScaleDownCandidatesDelayProcessor`
- not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: better doc comments for `TestGetScaleDownCandidates`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't ignore error in `NGChangeObserver`
- return it instead and let the caller decide what to do with it
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change pointers to values in `NGChangeObserver` interface
- easier to work with
- remove `expectedAddTime` param from `RegisterScaleUp` (not needed for now)
- add tests for clusterstate's `RegisterScaleUp`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: conditions in `GetScaleDownCandidates`
- set scale down in cool down if the number of scale down candidates is 0
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: use `ng1` instead of `ng2` in existing test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip static autoscaler tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: assign directly instead of using `sdProcessor` variable
- variable is not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: first working test for static autoscaler
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: continue working on static autoscaler tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip second static autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `Println` used for debugging
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add static_autoscaler tests for scale down delay per nodegroup flags
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: rebase off the latest `master`
- change scale state observer interface's `RegisterFailedScaleup` to reflect latest changes around clusterstate's `RegisterFailedScaleup` in `master`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix clusterstate test failing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing orchestrator test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `defaultScaleDownCandidatesProcessor` -> `combinedScaleDownCandidatesProcessor`
- describes the processor better
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: replace `NGChangeObserver` -> `NodeGroupChangeObserver`
- makes it easier to understand for someone not familiar with the codebase
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: reword code comment `after` -> `for which`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't return error from `RegisterScaleDown`
- not needed as of now (no implementer function returns a non-nil error for this function)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address review comments around ng change observer interface
- change dir structure of nodegroup change observer package
- stop returning errors wherever it is not needed in the nodegroup change observer interface
- rename `NGChangeObserver` -> `NodeGroupChangeObserver` interface (makes it easier to understand)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: make nodegroupchange observer thread-safe
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add TODO to consider using multiple mutexes in nodegroupchange observer
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `time.Now()` directly instead of assigning a variable to it
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: share code for checking if there was a recent scale-up/down/failure
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: convert `ScaleDownCandidatesDelayProcessor` into table tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change scale state notifier's `Register()` -> `RegisterForNotifications()`
- makes it easier to understand what the function does
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: replace scale state notifier `Register` -> `RegisterForNotifications` in test
- to fix syntax errors since it is already renamed in the actual code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `clusterStateRegistry` from `delete_in_batch` tests
- not needed anymore since we have `scaleStateNotifier`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address PR review comments
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add empty `RegisterFailedScaleDown` for clusterstate
- fix syntax error in static autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2024-01-11 21:46:42 +05:30
Yaroslava Serdiuk d29ffd03b9
Add ProvisioningRequestPodsFilter processor (#6386)
* Introduce ProvisioningRequestPodsFilter processor

* Review
2024-01-03 11:49:36 +01:00
Joachim Bartosik a5e540d5da Restore flags for setting QPS limit in CA
Partially undo #6274. I noticed that with this change CA get rate limited and
slows down significantly (especially during large scale downs).
2023-12-29 13:28:08 +00:00
Kubernetes Prow Robot fc48d5c052
Merge pull request #6139 from damikag/priority-evictor
Implement priority based evictor
2023-12-21 18:18:53 +01:00
damikag 9ffbea4408 implement priority based evictor and refactor drain logic 2023-12-21 16:57:05 +00:00
Kubernetes Prow Robot 0a1d74f352
Merge pull request #6294 from vadasambar/refactor/kube-client
refactor(*): move getKubeClient to utils/kubernetes
2023-11-29 19:28:44 +01:00
qianlei.qianl ae18f05a61 refactor(*): move getKubeClient to utils/kubernetes
(cherry picked from commit b9f636d2ef)

Signed-off-by: qianlei.qianl <qianlei.qianl@bytedance.com>

refactor: move logic to create client to utils/kubernetes pkg
- expose `CreateKubeClient` as public function
- make `GetKubeConfig` into a private `getKubeConfig` function (can be exposed as a public function in the future if needed)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: CI failing because cloudproviders were not updated to use new autoscaling option fields
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: define errors as constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: pass kube client options by value
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-11-29 00:43:59 +05:30
Kubernetes Prow Robot 39245a5613
Merge pull request #6235 from atwamahmoud/ignore-scheduler-processing
Ignore scheduler processing
2023-11-22 13:54:30 +01:00
Mahmoud Atwa a1ae4d3b57 Update flags, Improve tests readability & use Bypass instead of ignore in naming 2023-11-22 11:18:55 +00:00
Mahmoud Atwa 4635a6dc04 Allow users to specify which schedulers to ignore 2023-11-22 11:18:44 +00:00
Mahmoud Atwa 86ab017967 Fix multiple comments and update flags 2023-11-22 11:17:48 +00:00
Mahmoud Atwa a1ab7b9e20 Add new pod list processors for clearing TPU requests & filtering out
expendable pods

Treat non-processed pods yet as unschedulable
2023-11-22 11:16:33 +00:00
Aleksandra Gacek 0ca611a161 Allow overriding domain suffix in GCE cloud provider. 2023-11-21 17:00:36 +01:00
Artur Żyliński e836e47c1e Remove gce-expander-ephemeral-storage-support flag
Always enable the feature
2023-11-15 10:59:06 +01:00
Artur Żyliński 747d0b9af4 Cleanup: Remove separate client for k8s events
Remove RateLimiting options - replay on APF for apiserver protection.
Details: https://github.com/kubernetes/kubernetes/issues/111880
2023-11-14 11:20:36 +01:00
Mahmoud Atwa 267476fd06 Rename variables & methods to StartupTaint... instead of IgnoreTaint 2023-09-26 13:54:52 +00:00
Daniel Gutowski 5155725a28 Address comments from #6104 2023-09-20 05:11:38 -07:00
Daniel Gutowski 678caa5fff Fix the api path to autoscaling.x-k8s.io 2023-09-15 02:40:43 -07:00
Daniel Gutowski fd13a4c36d Add auto-generated client and CRD manifest.
Manifest generated via:
$ make manifest

Client generated via:
$ ./hack/update-codegen.sh
2023-09-13 02:24:48 -07:00
Damika Gamlath 42691d3443 fix race condition between ca and scheduler
Implement dynamically adjustment of NodeDeleteDelayAfterTaint based on round trip time between ca and apiserver
2023-09-08 12:14:23 +00:00
vadasambar 23f03e112e feat: support custom scheduler config for in-tree schedulr plugins (without extenders)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: `goto` causing infinite loop
- abstract out running extenders in a separate function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove code around extenders
- we decided not to use scheduler extenders for checking if a pod would fit on a node
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move scheduler config to a `utils/scheduler` package`
- use default config as a fallback
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix static_autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `GetSchedulerConfiguration` fn
- remove falling back
- add mechanism to detect if the scheduler config file flag was set
-
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip add tests for `GetSchedulerConfig`
- tests are failing now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `GetSchedulerConfig`
- abstract error messages so that we can use them in the tests
- set api version explicitly (this is what upstream does as well)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: do a round of cleanup to make PR ready for review
- make import names consistent
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: use `pflag` to check if the `--scheduler-config-file` flag was set
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add comments for exported error constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't export error messages
- exporting is not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add underscore in test file name
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix test failing because of no comment on exported `SchedulerConfigFileFlag`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refacotr: change name of flag variable `schedulerConfig` -> `schedulerConfigFile`
- avoids confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add extra test cases for predicate checker
- where the predicate checker uses custom scheduler config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `setFlags` variable
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: abstract custom scheduler configs into `conifg` package
- make them constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix linting error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: introduce a new custom test predicate checker
- instead of adding a param to the current one
- this is so that we don't have to pass `nil` to the existing test predicate checker in many places
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig`
- latter narrows down meaning of the function better than former
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `GetSchedulerConfig` -> `ConfigFromPath`
- `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig`
- move test config to a new package `test` under `config` package
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add `TODO` for replacing code to parse scheduler config
- with upstream function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-13 09:51:33 +05:30
vadasambar 7941bab214 feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accomodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:31:45 +05:30
Hakan Bostan 333a0286e0 Rename the autoscaling option
* Renamed the "AtomicScaling" autoscaling option to
  "ZeroOrMaxNodeScaling" to be more clear about the behavior.
2023-06-30 11:17:53 +00:00
Hakan Bostan 8ba34ea74b Change handling of scale up options for ZeroToMaxNodeScaling in orchestrator
* Started handling scale up options for ZeroToMaxNodeScaling with the existing estimator
* Skip setting similar node groups for the node groups that use
  ZeroToMaxNodeScaling
* Renamed the autoscaling option from "AtomicScaleUp" to "AtomicScaling"
* Merged multiple tests into one single table driven test.
* Fixed some typos.
2023-06-30 11:17:53 +00:00
Hakan Bostan 7b8e0e62a7 Add support for scaling up ZeroToMaxNodesScaling node groups 2023-06-30 11:17:53 +00:00
Karol Wychowaniec 05e1fe1f1c Use single AtomicScaling option for scale up and scale down 2023-06-27 12:28:33 +00:00
Karol Wychowaniec 61b7958f06 Add support for atomic scale-down in node group options 2023-06-27 12:28:32 +00:00
mendelski 3e2d48d86d
Add option parallel-scale-up 2023-05-11 07:49:34 +00:00
Bartłomiej Wróblewski b8d40fdd3c Add status taints option to template creation 2023-04-19 13:55:38 +00:00
Maria Oparka ca088d26c2 Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions 2023-04-19 11:08:20 +02:00
Aleksandra Gacek 656f1919a8 Limit refresh rate of GCE MIG instances. 2023-04-18 15:08:06 +02:00
vadasambar ff6fe5833d feat: check only controller ref to decide if a pod is replicated
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit 144a64a402)

fix: set `replicated` to true if controller ref is set to `true`
- forgot to add this in the last commit

Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit f8f458295d)

fix: remove `checkReferences`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

(cherry picked from commit 5df6e31f8b)

test(drain): add test for custom controller pod
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add flag to allow scale down on custom controller pods
- set to `false` by default
- `false` will be set to `true` by default in the future
- right now, we want to ensure backwards compatibility and make the feature available if the flag is explicitly set to `true`
- TODO: this code might need some unit tests. Look into adding unit tests.
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: remove `at` symbol in prefix of `vadasambar`
- to keep it consistent with previous such mentions in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(utils): run all drain tests twice
- once for  `allowScaleDownOnCustomControllerOwnedPods=false`
- and once for `allowScaleDownOnCustomControllerOwnedPods=true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs(utils): add description for `testOpts` struct
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(utils/drain): fix failing tests
- refactor code to add cusom controller pod test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix long code comments
- clean-up print statements
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move `expectFatal` right above where it is used
- makes the code easier to read
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix code comment wording
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address PR comments
- abstract legacy code to check for replicated pods into a separate function so that it's easier to remove in the future
- fix param info in the FAQ.md
- simplify tests and remove the global variable used in the tests
- rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods`
- refactor tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update flag info
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: forgot to change flag name on a line in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `ControllerRef()` directly instead of `controllerRef`
- we don't need an extra variable
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: create tests consolidated test cases
- from looping over and tweaking shared test cases
- so that we don't have to duplicate shared test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: append test flag to shared test description
- so that the failed test is easy to identify
- shallow copy tests and add comments so that others do the same
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-03-22 10:51:07 +05:30
Grigoris Thanasoulas 6cf8c329da cluster-autoscaler: Add option to disable scale down of unready nodes
Add flag '--scale-down-unready-enabled' to enable or disable scale-down
of unready nodes. Default value set to true for backwards compatibility
(i.e., allow scale-down of unready nodes).

Signed-off-by: Grigoris Thanasoulas <gregth@arrikto.com>
2023-03-06 15:51:10 +02:00
Kubernetes Prow Robot b94f340af5
Merge pull request #5402 from Bryce-Soghigian/bsoghigian/adding-configurable-difference-ratios
adding configurable difference ratios
2023-01-10 04:03:25 -08:00
bsoghigian cb19141f9e fixing linting pipeline 2023-01-09 23:36:44 -08:00
bsoghigian 0f8ed0b81f Configurable difference ratios 2023-01-09 22:40:16 -08:00
Michael Grosser cd26bcfe60
allow setting kuberentes client burst and qps to avoid rate limiting 2022-12-29 13:54:04 -08:00
Yaroslava Serdiuk ae45571af9 Create a Planner object if --parallelDrain=true 2022-12-07 11:36:05 +00:00
Aleksandra Gacek bae587d20c Break node categorization in scale down planner on timeout. 2022-12-05 11:34:53 +01:00
Xintong Liu 524886fca5 Support scaling up node groups to the configured min size if needed 2022-11-02 21:47:00 -07:00
Alexandru Matei 0ee2a359e7 Add option to wait for a period of time after node tainting/cordoning
Node state is refreshed and checked again before deleting the node
It gives kube-scheduler time to acknowledge that nodes state has
changed and to stop scheduling pods on them
2022-10-13 10:37:56 +03:00
Kubernetes Prow Robot b3c6b60e1c
Merge pull request #5060 from yaroslava-serdiuk/deleting-in-batch
Introduce NodeDeleterBatcher to ScaleDown actuator
2022-09-22 10:11:06 -07:00
Yaroslava Serdiuk 65b0d78e6e Introduce NodeDeleterBatcher to ScaleDown actuator 2022-09-22 16:19:45 +00:00
James Ravn 1b98b3823a
Allow balancing by labels exclusively
Adds a new flag `--balance-label` which allows users to balance between
node groups exclusively via labels.

This gives users the flexibility to specify the similarity logic
themselves when --balance-similar-node-groups is in use.
2022-07-06 10:34:18 +01:00
Maciek Pytel ab891418f6 Limit binpacking based on #new_nodes or time
The binpacking algorithm is O(#pending_pods * #new_nodes) and
calculating a very large scale-up can get stuck for minutes or even
hours, leading to CA failing it's healthcheck and going down.
The new limiting prevents this scenario by stopping binpacking after
reaching specified threshold. Any pods that remain pending as a result
of shorter binpacking will be processed next autoscaler loop.

The thresholds used can be controlled with newly introduced flags:
--max-nodes-per-scaleup and --max-nodegroup-binpacking-duration. The
limiting can be disabled by setting both flags to 0 (not recommended,
especially for --max-nodegroup-binpacking-duration).
2022-06-20 17:02:51 +02:00
Michael McCune 8c27f76933 add a flag to allow event duplication
this change brings in a new command line flag,
`--record-duplicated-events`, which allows a user to enable the
duplication of events bypassing the 5 minute de-duplication window.
2022-06-03 14:26:38 -04:00