Commit Graph

226 Commits

Author SHA1 Message Date
Eric Lin 74d1f7f349 Trim managedFields in shared informer factory
Signed-off-by: Eric Lin <exlin@google.com>
2023-09-28 12:45:47 +00:00
Mahmoud Atwa 1bbbbd6036 fix typo 2023-09-27 09:00:32 +00:00
Mahmoud Atwa 267476fd06 Rename variables & methods to StartupTaint... instead of IgnoreTaint 2023-09-26 13:54:52 +00:00
Mahmoud Atwa f9d3185f16 Rename IgnoreTaints to StartupTaints & deprecate --ignore-taints flag 2023-09-26 09:12:01 +00:00
Mahmoud Atwa 79a55bbb05 Add startup taint flag, prefix & add status taint prefix 2023-09-22 21:08:07 +00:00
Mahmoud Atwa d2fe118db9 Add startup taint flag, prefix & add status taint prefix 2023-09-22 20:51:34 +00:00
Patrick Ohly ade5e0814e fix incomplete startup of informers
Previously, SharedInformerFactory.Start was called before core.NewAutoscaler.
That had the effect that any new informer created as part of
core.NewAutoscaler, in particular in
kubernetes.NewListerRegistryWithDefaultListers, never got started.

One of them was the DaemonSet informer. This had the effect that the DaemonSet
lister had an empty cache and scale down failed with:

    I0920 11:06:36.046889   31805 cluster.go:164] node gke-cluster-pohly-default-pool-c9f60a43-5rvz cannot be removed: daemonset for kube-system/pdcsi-node-7hnmc is not present, err: daemonset.apps "pdcsi-node" not found

This was on a GKE cluster with cluster-autoscaler running outside of the
cluster on a development machine.
2023-09-20 11:20:35 +02:00
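The ordering problem described in the commit above can be illustrated with a small toy model. This is only a sketch: `toyFactory`, `buildAutoscaler`, and the informer names are hypothetical stand-ins written for this example, not the client-go `SharedInformerFactory` or `core.NewAutoscaler`. The one behavior it models is that a `Start` call launches only the informers registered up to that point, so anything registered afterwards stays idle unless `Start` is called again.

```go
package main

import "fmt"

// toyFactory is a simplified, hypothetical stand-in for a shared informer
// factory. It only models "Start launches what is registered so far".
type toyFactory struct {
	registered []string
	started    map[string]bool
}

func newToyFactory() *toyFactory {
	return &toyFactory{started: map[string]bool{}}
}

// Informer registers an informer by name; registering twice is a no-op.
func (f *toyFactory) Informer(name string) {
	for _, n := range f.registered {
		if n == name {
			return
		}
	}
	f.registered = append(f.registered, name)
}

// Start marks every informer registered so far as running.
func (f *toyFactory) Start() {
	for _, n := range f.registered {
		f.started[n] = true
	}
}

// Started reports whether the named informer is running.
func (f *toyFactory) Started(name string) bool { return f.started[name] }

// buildAutoscaler stands in for autoscaler construction, which requests
// additional informers (here, a DaemonSet informer) as a side effect.
func buildAutoscaler(f *toyFactory) {
	f.Informer("daemonsets")
}

func main() {
	// Order described as buggy: Start runs before the autoscaler is built.
	buggy := newToyFactory()
	buggy.Informer("nodes")
	buggy.Start()
	buildAutoscaler(buggy)
	fmt.Println("start first, daemonsets started:", buggy.Started("daemonsets"))

	// Reordered: the autoscaler is built first, then Start runs.
	fixed := newToyFactory()
	fixed.Informer("nodes")
	buildAutoscaler(fixed)
	fixed.Start()
	fmt.Println("start last, daemonsets started:", fixed.Started("daemonsets"))
}
```

In the first ordering the DaemonSet informer is never started, which matches the empty lister cache reported in the log line above.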
Kubernetes Prow Robot f9a7c7f73f
Merge pull request #6114 from linxiulei/cmd
Allow setting content-type in command
2023-09-19 06:23:07 -07:00
Eric Lin 61d784a662 Allow setting content-type in command
Signed-off-by: Eric Lin <exlin@google.com>
2023-09-19 08:58:59 +00:00
Youn Jae Kim 6cf41290b6 add elect-leader flag to the pflag 2023-09-15 14:35:56 -07:00
Eric Lin 0f02235e98 Use informer factory to reuse listers
Signed-off-by: Eric Lin <exlin@google.com>
2023-09-15 13:57:15 +00:00
Damika Gamlath aef5a2bffc Disable dynamic-node-delete-delay-after-taint-enabled by default 2023-09-11 13:20:08 +00:00
Kubernetes Prow Robot 9622a50f55
Merge pull request #6019 from damikag/fix-rc-ca-sched
Implement dynamic adjustment of NodeDeleteDelayAfterTaint based on round trip time between CA and api-server
2023-09-08 07:04:15 -07:00
Damika Gamlath 42691d3443 fix race condition between ca and scheduler
Implement dynamic adjustment of NodeDeleteDelayAfterTaint based on round trip time between ca and apiserver
2023-09-08 12:14:23 +00:00
Marc-Andre Dufresne 55c92e1025 add json logging support 2023-08-28 10:32:39 -04:00
Bartłomiej Wróblewski e39d1b028d Clean up NodeGroupConfigProcessor interface 2023-08-04 16:00:50 +00:00
Daniel Kłobuszewski 990cd6581c Enable parallel drain by default. 2023-07-21 17:57:19 +02:00
vadasambar 23f03e112e feat: support custom scheduler config for in-tree scheduler plugins (without extenders)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: `goto` causing infinite loop
- abstract out running extenders in a separate function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove code around extenders
- we decided not to use scheduler extenders for checking if a pod would fit on a node
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move scheduler config to a `utils/scheduler` package
- use default config as a fallback
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix static_autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `GetSchedulerConfiguration` fn
- remove falling back
- add mechanism to detect if the scheduler config file flag was set
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip add tests for `GetSchedulerConfig`
- tests are failing now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `GetSchedulerConfig`
- abstract error messages so that we can use them in the tests
- set api version explicitly (this is what upstream does as well)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: do a round of cleanup to make PR ready for review
- make import names consistent
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: use `pflag` to check if the `--scheduler-config-file` flag was set
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add comments for exported error constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't export error messages
- exporting is not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add underscore in test file name
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix test failing because of no comment on exported `SchedulerConfigFileFlag`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change name of flag variable `schedulerConfig` -> `schedulerConfigFile`
- avoids confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add extra test cases for predicate checker
- where the predicate checker uses custom scheduler config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `setFlags` variable
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: abstract custom scheduler configs into `config` package
- make them constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix linting error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: introduce a new custom test predicate checker
- instead of adding a param to the current one
- this is so that we don't have to pass `nil` to the existing test predicate checker in many places
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig`
- latter narrows down meaning of the function better than former
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `GetSchedulerConfig` -> `ConfigFromPath`
- `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig`
- move test config to a new package `test` under `config` package
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add `TODO` for replacing code to parse scheduler config
- with upstream function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-13 09:51:33 +05:30
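One step in the commit above is detecting whether `--scheduler-config-file` was set explicitly, as opposed to being left at its default. The commit says this is done with `pflag`. The sketch below swaps in the Go standard library `flag` package so that it stays self-contained. It relies on `FlagSet.Visit`, which visits only flags that were actually set. The helper names `parse` and `wasSet`, the flag set name, and the example path are all hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// wasSet reports whether the named flag was explicitly provided.
// FlagSet.Visit only visits flags that have been set.
func wasSet(fs *flag.FlagSet, name string) bool {
	found := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == name {
			found = true
		}
	})
	return found
}

// parse builds an illustrative flag set and parses the given arguments.
func parse(args []string) (*flag.FlagSet, *string, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	path := fs.String("scheduler-config-file", "", "path to a scheduler config file (illustrative)")
	err := fs.Parse(args)
	return fs, path, err
}

func main() {
	fs, path, err := parse([]string{"--scheduler-config-file=/tmp/sched.yaml"})
	if err != nil {
		panic(err)
	}
	fmt.Println("set:", wasSet(fs, "scheduler-config-file"), "path:", *path)

	fs, _, err = parse(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("set:", wasSet(fs, "scheduler-config-file"))
}
```

Checking "was it set" rather than "is it non-empty" lets the caller distinguish an omitted flag from one that was passed with an empty value.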
Kubernetes Prow Robot da96d89b17
Merge pull request #5890 from Bryce-Soghigian/bsoghigian/respecting-bulk-delete
fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value
2023-07-12 09:45:12 -07:00
Kubernetes Prow Robot c6893e9e28
Merge pull request #5672 from vadasambar/feat/5399/ignore-daemonsets-utilization-per-nodegroup
feat: set `IgnoreDaemonSetsUtilization` per nodegroup for AWS
2023-07-12 07:43:12 -07:00
Damika Gamlath 0f8502c623 Refactor autoscaler.go and static_autoscaler.go to move declaration of the NodeDeletion option to main.go 2023-07-10 08:49:49 +00:00
bsoghigian 1e4507819a fix: dynamic assignment of the scale down threshold flags. Setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the larger of the two flags in the case both are set 2023-07-08 14:43:41 -07:00
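The rule stated in the commit above (when both flags are set, both take the larger value) can be sketched as follows. The function name, its parameters, and the behavior when only one flag is set are assumptions made for illustration. The commit message only describes the case where both are set.

```go
package main

import "fmt"

// reconcileScaleDownLimits is a hypothetical helper. When both flags were
// set explicitly, both limits take the larger of the two values.
// Assumption: in every other case the values are returned unchanged.
func reconcileScaleDownLimits(maxEmptyBulkDelete, maxScaleDownParallelism int, bulkSet, parallelismSet bool) (int, int) {
	if bulkSet && parallelismSet {
		larger := maxEmptyBulkDelete
		if maxScaleDownParallelism > larger {
			larger = maxScaleDownParallelism
		}
		return larger, larger
	}
	return maxEmptyBulkDelete, maxScaleDownParallelism
}

func main() {
	bulk, parallelism := reconcileScaleDownLimits(10, 25, true, true)
	fmt.Println(bulk, parallelism)
}
```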
vadasambar 8a73d8ea42 refactor: remove comment line (not relevant anymore)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 11:51:09 +05:30
vadasambar 7941bab214 feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:31:45 +05:30
mendelski 51f1660db7
deduplicate scale-up test setup 2023-05-11 12:55:59 +00:00
mendelski 3e2d48d86d
Add option parallel-scale-up 2023-05-11 07:49:34 +00:00
Maria Oparka ca088d26c2 Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions 2023-04-19 11:08:20 +02:00
Aleksandra Gacek 656f1919a8 Limit refresh rate of GCE MIG instances. 2023-04-18 15:08:06 +02:00
vadasambar ff6fe5833d feat: check only controller ref to decide if a pod is replicated
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit 144a64a402)

fix: set `replicated` to true if controller ref is set to `true`
- forgot to add this in the last commit

Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit f8f458295d)

fix: remove `checkReferences`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

(cherry picked from commit 5df6e31f8b)

test(drain): add test for custom controller pod
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add flag to allow scale down on custom controller pods
- set to `false` by default
- `false` will be set to `true` by default in the future
- right now, we want to ensure backwards compatibility and make the feature available if the flag is explicitly set to `true`
- TODO: this code might need some unit tests. Look into adding unit tests.
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: remove `at` symbol in prefix of `vadasambar`
- to keep it consistent with previous such mentions in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(utils): run all drain tests twice
- once for  `allowScaleDownOnCustomControllerOwnedPods=false`
- and once for `allowScaleDownOnCustomControllerOwnedPods=true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs(utils): add description for `testOpts` struct
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(utils/drain): fix failing tests
- refactor code to add custom controller pod test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix long code comments
- clean-up print statements
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move `expectFatal` right above where it is used
- makes the code easier to read
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix code comment wording
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address PR comments
- abstract legacy code to check for replicated pods into a separate function so that it's easier to remove in the future
- fix param info in the FAQ.md
- simplify tests and remove the global variable used in the tests
- rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods`
- refactor tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update flag info
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: forgot to change flag name on a line in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `ControllerRef()` directly instead of `controllerRef`
- we don't need an extra variable
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: create consolidated test cases
- from looping over and tweaking shared test cases
- so that we don't have to duplicate shared test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: append test flag to shared test description
- so that the failed test is easy to identify
- shallow copy tests and add comments so that others do the same
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-03-22 10:51:07 +05:30
Kubernetes Prow Robot b1b7f39908
Merge pull request #5578 from yaroslava-serdiuk/empty-sorting
Add empty nodes sorting for scale down candidates
2023-03-09 04:28:02 -08:00
Kubernetes Prow Robot 205293a7ca
Merge pull request #5537 from arrikto/feature-disable-unready-scaledown
cluster-autoscaler: Add option to disable scale down of unready nodes
2023-03-08 07:55:11 -08:00
Yaroslava Serdiuk cea9d1a73b Add empty nodes sorting for scale down candidates 2023-03-08 15:43:22 +00:00
Grigoris Thanasoulas 6cf8c329da cluster-autoscaler: Add option to disable scale down of unready nodes
Add flag '--scale-down-unready-enabled' to enable or disable scale-down
of unready nodes. Default value set to true for backwards compatibility
(i.e., allow scale-down of unready nodes).

Signed-off-by: Grigoris Thanasoulas <gregth@arrikto.com>
2023-03-06 15:51:10 +02:00
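The effect of `--scale-down-unready-enabled` described in the commit above can be sketched as a filter over scale-down candidates. The `node` type and `scaleDownCandidates` function are hypothetical and exist only for this example. How cluster-autoscaler applies the flag internally is not shown in the commit message, so the placement of the check here is an assumption.

```go
package main

import "fmt"

// node is a minimal, hypothetical representation of a cluster node.
type node struct {
	Name  string
	Ready bool
}

// scaleDownCandidates keeps every node when scaleDownUnreadyEnabled is true
// (the backwards-compatible default) and drops unready nodes when it is false.
func scaleDownCandidates(nodes []node, scaleDownUnreadyEnabled bool) []node {
	var out []node
	for _, n := range nodes {
		if !n.Ready && !scaleDownUnreadyEnabled {
			continue // unready nodes are excluded when the option is disabled
		}
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []node{{Name: "a", Ready: true}, {Name: "b", Ready: false}}
	fmt.Println(len(scaleDownCandidates(nodes, true)))
	fmt.Println(len(scaleDownCandidates(nodes, false)))
}
```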
Yaroslava Serdiuk a35d6d2269 Fix RemovalSimulation for parallel scale down 2023-03-01 17:30:30 +00:00
tombokombo 93b9f4b8be
fix
Signed-off-by: tombokombo <tombo@sysart.tech>
2023-02-07 14:35:23 +01:00
Kubernetes Prow Robot e911e54f1e
Merge pull request #5214 from tombokombo/fix/asg-resource-tags
Fix/asg resource tags
2023-02-07 01:35:01 -08:00
Bartłomiej Wróblewski b608278386 Add force Daemon Sets option 2023-01-30 11:02:42 +00:00
Bartłomiej Wróblewski d4b812e936 Add filtering out DS pods from scale-up, refactor default pod list processor 2023-01-23 17:14:46 +00:00
Kubernetes Prow Robot f507519916
Merge pull request #5423 from yaroslava-serdiuk/sd-sorting
Add scale down candidates observer
2023-01-19 10:14:16 -08:00
Yaroslava Serdiuk 541ce04e4b Add previous scale down candidate sorting 2023-01-19 16:04:50 +00:00
michael mccune 955396e857 remove clusterapi nodegroupset processor
as discussed with the cluster api community[0], the nodegroupset
processor is being removed from the clusterapi provider implementation
in favor of instructing our community on the use of the
--balancing-ignore-label flag. due to the wide variety of provider
infrastructures that clusterapi can be deployed on, we would prefer to
not encode all of these labels in the autoscaler itself. see the linked
recording for more information.

[0] https://www.youtube.com/watch?v=jbhca_9oPuQ
2023-01-12 15:05:37 -05:00
Kubernetes Prow Robot b94f340af5
Merge pull request #5402 from Bryce-Soghigian/bsoghigian/adding-configurable-difference-ratios
adding configurable difference ratios
2023-01-10 04:03:25 -08:00
bsoghigian 0f8ed0b81f Configurable difference ratios 2023-01-09 22:40:16 -08:00
Kubernetes Prow Robot 3785a2f82a
Merge pull request #5223 from grosser/grosser/burst
cluster-autoscaler: allow setting kubernetes client burst and qps to avoid rate limiting
2022-12-30 06:21:30 -08:00
Michael Grosser cd26bcfe60
allow setting kubernetes client burst and qps to avoid rate limiting 2022-12-29 13:54:04 -08:00
Bartłomiej Wróblewski 62c68e1280 Move PredicateChecker initialization before processors initialization 2022-12-27 15:21:41 +00:00
Kubernetes Prow Robot a46a095fe2
Merge pull request #5362 from yasinlachiny/maxnodetotal
set cluster_autoscaler_max_nodes_count dynamically
2022-12-19 00:33:44 -08:00
yasin.lachiny 6d9fed5211 set cluster_autoscaler_max_nodes_count dynamically
Signed-off-by: yasin.lachiny <yasin.lachiny@gmail.com>
2022-12-11 00:18:03 +01:00
Bartłomiej Wróblewski 2e1b04ff69 Add default PodListProcessor wrapper 2022-12-09 16:26:56 +00:00
Yaroslava Serdiuk ae45571af9 Create a Planner object if --parallelDrain=true 2022-12-07 11:36:05 +00:00