autoscaler

Commit Graph

Author	SHA1	Message	Date
Eric Lin	74d1f7f349	Trim managedFields in shared informer factory Signed-off-by: Eric Lin <exlin@google.com>	2023-09-28 12:45:47 +00:00
Mahmoud Atwa	1bbbbd6036	fix typo	2023-09-27 09:00:32 +00:00
Mahmoud Atwa	267476fd06	Rename variables & methods to StartupTaint... instead of IgnoreTaint	2023-09-26 13:54:52 +00:00
Mahmoud Atwa	f9d3185f16	Rename IgnoreTaints to StartupTaints & deprecate --ignore-taints flag	2023-09-26 09:12:01 +00:00
Mahmoud Atwa	79a55bbb05	Add startup taint flag, prefix & add status taint prefix	2023-09-22 21:08:07 +00:00
Mahmoud Atwa	d2fe118db9	Add startup taint flag, prefix & add status taint prefix	2023-09-22 20:51:34 +00:00
Patrick Ohly	ade5e0814e	fix incomplete startup of informers Previously, SharedInformerFactory.Start was called before core.NewAutoscaler. That had the effect that any new informer created as part of core.NewAutoscaler, in particular in kubernetes.NewListerRegistryWithDefaultListers, never got started. One of them was the DaemonSet informer. This had the effect that the DaemonSet lister had an empty cache and scale down failed with: I0920 11:06:36.046889 31805 cluster.go:164] node gke-cluster-pohly-default-pool-c9f60a43-5rvz cannot be removed: daemonset for kube-system/pdcsi-node-7hnmc is not present, err: daemonset.apps "pdcsi-node" not found This was on a GKE cluster with cluster-autoscaler running outside of the cluster on a development machine.	2023-09-20 11:20:35 +02:00
Kubernetes Prow Robot	f9a7c7f73f	Merge pull request #6114 from linxiulei/cmd Allow setting content-type in command	2023-09-19 06:23:07 -07:00
Eric Lin	61d784a662	Allow setting content-type in command Signed-off-by: Eric Lin <exlin@google.com>	2023-09-19 08:58:59 +00:00
Youn Jae Kim	6cf41290b6	add elect-leader flag to the pflag	2023-09-15 14:35:56 -07:00
Eric Lin	0f02235e98	Use informer factory to reuse listers Signed-off-by: Eric Lin <exlin@google.com>	2023-09-15 13:57:15 +00:00
Damika Gamlath	aef5a2bffc	Disable dynamic-node-delete-delay-after-taint-enabled by default	2023-09-11 13:20:08 +00:00
Kubernetes Prow Robot	9622a50f55	Merge pull request #6019 from damikag/fix-rc-ca-sched Implement dynamically adjustment of NodeDeleteDelayAfterTaint based on round trip time between CA and api-server	2023-09-08 07:04:15 -07:00
Damika Gamlath	42691d3443	fix race condition between ca and scheduler Implement dynamically adjustment of NodeDeleteDelayAfterTaint based on round trip time between ca and apiserver	2023-09-08 12:14:23 +00:00
Marc-Andre Dufresne	55c92e1025	add json logging support	2023-08-28 10:32:39 -04:00
Bartłomiej Wróblewski	e39d1b028d	Clean up NodeGroupConfigProcessor interface	2023-08-04 16:00:50 +00:00
Daniel Kłobuszewski	990cd6581c	Enable parallel drain by default.	2023-07-21 17:57:19 +02:00
vadasambar	23f03e112e	feat: support custom scheduler config for in-tree schedulr plugins (without extenders) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: `goto` causing infinite loop - abstract out running extenders in a separate function Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove code around extenders - we decided not to use scheduler extenders for checking if a pod would fit on a node Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: move scheduler config to a `utils/scheduler` package` - use default config as a fallback Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix static_autoscaler test Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: `GetSchedulerConfiguration` fn - remove falling back - add mechanism to detect if the scheduler config file flag was set - Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: wip add tests for `GetSchedulerConfig` - tests are failing now Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add tests for `GetSchedulerConfig` - abstract error messages so that we can use them in the tests - set api version explicitly (this is what upstream does as well) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: do a round of cleanup to make PR ready for review - make import names consistent Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: use `pflag` to check if the `--scheduler-config-file` flag was set Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add comments for exported error constants Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: don't export error messages - exporting is not needed Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: add underscore in test file name Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix test failing because of no comment on exported `SchedulerConfigFileFlag` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refacotr: change name of flag variable `schedulerConfig` -> `schedulerConfigFile` - avoids confusion Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add extra test cases for predicate checker - where the predicate checker uses custom scheduler config Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove `setFlags` variable - not needed anymore Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: abstract custom scheduler configs into `conifg` package - make them constants Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix linting error Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: introduce a new custom test predicate checker - instead of adding a param to the current one - this is so that we don't have to pass `nil` to the existing test predicate checker in many places Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig` - latter narrows down meaning of the function better than former Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `GetSchedulerConfig` -> `ConfigFromPath` - `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig` - move test config to a new package `test` under `config` package Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add `TODO` for replacing code to parse scheduler config - with upstream function Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-07-13 09:51:33 +05:30
Kubernetes Prow Robot	da96d89b17	Merge pull request #5890 from Bryce-Soghigian/bsoghigian/respecting-bulk-delete fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value	2023-07-12 09:45:12 -07:00
Kubernetes Prow Robot	c6893e9e28	Merge pull request #5672 from vadasambar/feat/5399/ignore-daemonsets-utilization-per-nodegroup feat: set `IgnoreDaemonSetsUtilization` per nodegroup for AWS	2023-07-12 07:43:12 -07:00
Damika Gamlath	0f8502c623	Refactor autoscaler.go and static_autoscalar.go to move declaration of the NodeDeletion option to main.go	2023-07-10 08:49:49 +00:00
bsoghigian	1e4507819a	fix: dynamic assignment of the scale down threshold flags. Setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the larger of the two flags in the case both are set	2023-07-08 14:43:41 -07:00
vadasambar	8a73d8ea42	refactor: remove comment line (not relevant anymore) Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-07-06 11:51:09 +05:30
vadasambar	7941bab214	feat: set `IgnoreDaemonSetsUtilization` per nodegroup Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: test cases failing for actuator and scaledown/eligibility - abstract default values into `config` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code - there is no change in the flag name - rename `thresholdGetter` -> `configGetter` and tweak it to accomodate `GetIgnoreDaemonSetsUtilization` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: reset help text for `ignore-daemonsets-utilization` flag - because per nodegroup override is supported only for AWS ASG tags as of now Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add info about overriding `--ignore-daemonsets-utilization` per ASG - in AWS cloud provider README Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface - to limit the functions that can be used - since we need it only for `GetIgnoreDaemonSetsUtilization` Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: tests failing for actuator - rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter` - move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: go lint errors for `MockNodeGroupConfigGetter` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: update node group config processor tests for `IgnoreDaemonSetsUtilization` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: update eligibility test cases for `IgnoreDaemonSetsUtilization` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: run actuation tests for 2 NGS - one with `IgnoreDaemonSetsUtilization`: `false` - one with `IgnoreDaemonSetsUtilization`: `true` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add tests for `IgnoreDaemonSetsUtilization` in actuator - add helper to generate multiple ds pods dynamically - get rid of mock config processor because it is not required Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix failing tests for actuator Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option - not required Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: use `generateDsPods` instead of `generateDsPod` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization` Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-07-06 10:31:45 +05:30
mendelski	51f1660db7	deduplicate scale-up test setup	2023-05-11 12:55:59 +00:00
mendelski	3e2d48d86d	Add option parallel-scale-up	2023-05-11 07:49:34 +00:00
Maria Oparka	ca088d26c2	Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions	2023-04-19 11:08:20 +02:00
Aleksandra Gacek	656f1919a8	Limit refresh rate of GCE MIG instances.	2023-04-18 15:08:06 +02:00
vadasambar	ff6fe5833d	feat: check only controller ref to decide if a pod is replicated Signed-off-by: vadasambar <surajrbanakar@gmail.com> (cherry picked from commit `144a64a402`) fix: set `replicated` to true if controller ref is set to `true` - forgot to add this in the last commit Signed-off-by: vadasambar <surajrbanakar@gmail.com> (cherry picked from commit `f8f458295d`) fix: remove `checkReferences` - not needed anymore Signed-off-by: vadasambar <surajrbanakar@gmail.com> (cherry picked from commit `5df6e31f8b`) test(drain): add test for custom controller pod Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: add flag to allow scale down on custom controller pods - set to `false` by default - `false` will be set to `true` by default in the future - right now, we want to ensure backwards compatibility and make the feature available if the flag is explicitly set to `true` - TODO: this code might need some unit tests. Look into adding unit tests. Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: remove `at` symbol in prefix of `vadasambar` - to keep it consistent with previous such mentions in the code Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(utils): run all drain tests twice - once for `allowScaleDownOnCustomControllerOwnedPods=false` - and once for `allowScaleDownOnCustomControllerOwnedPods=true` Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs(utils): add description for `testOpts` struct Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(utils/drain): fix failing tests - refactor code to add cusom controller pod test Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: fix long code comments - clean-up print statements Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: move `expectFatal` right above where it is used - makes the code easier to read Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: fix code comment wording Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: address PR comments - abstract legacy code to check for replicated pods into a separate function so that it's easier to remove in the future - fix param info in the FAQ.md - simplify tests and remove the global variable used in the tests - rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods` - refactor tests Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: update flag info Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: forgot to change flag name on a line in the code Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: use `ControllerRef()` directly instead of `controllerRef` - we don't need an extra variable Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: create tests consolidated test cases - from looping over and tweaking shared test cases - so that we don't have to duplicate shared test cases Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: append test flag to shared test description - so that the failed test is easy to identify - shallow copy tests and add comments so that others do the same Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-03-22 10:51:07 +05:30
Kubernetes Prow Robot	b1b7f39908	Merge pull request #5578 from yaroslava-serdiuk/empty-sorting Add empty nodes sorting for scale down candidates	2023-03-09 04:28:02 -08:00
Kubernetes Prow Robot	205293a7ca	Merge pull request #5537 from arrikto/feature-disable-unready-scaledown cluster-autoscaler: Add option to disable scale down of unready nodes	2023-03-08 07:55:11 -08:00
Yaroslava Serdiuk	cea9d1a73b	Add empty nodes sorting for scale down candidates	2023-03-08 15:43:22 +00:00
Grigoris Thanasoulas	6cf8c329da	cluster-autoscaler: Add option to disable scale down of unready nodes Add flag '--scale-down-unready-enabled' to enable or disable scale-down of unready nodes. Default value set to true for backwards compatibility (i.e., allow scale-down of unready nodes). Signed-off-by: Grigoris Thanasoulas <gregth@arrikto.com>	2023-03-06 15:51:10 +02:00
Yaroslava Serdiuk	a35d6d2269	Fix RemovalSimulation for parallel scale down	2023-03-01 17:30:30 +00:00
tombokombo	93b9f4b8be	fix Signed-off-by: tombokombo <tombo@sysart.tech>	2023-02-07 14:35:23 +01:00
Kubernetes Prow Robot	e911e54f1e	Merge pull request #5214 from tombokombo/fix/asg-resource-tags Fix/asg resource tags	2023-02-07 01:35:01 -08:00
Bartłomiej Wróblewski	b608278386	Add force Daemon Sets option	2023-01-30 11:02:42 +00:00
Bartłomiej Wróblewski	d4b812e936	Add filtering out DS pods from scale-up, refactor default pod list processor	2023-01-23 17:14:46 +00:00
Kubernetes Prow Robot	f507519916	Merge pull request #5423 from yaroslava-serdiuk/sd-sorting Add scale down candidates observer	2023-01-19 10:14:16 -08:00
Yaroslava Serdiuk	541ce04e4b	Add previous scale down candidate sorting	2023-01-19 16:04:50 +00:00
michael mccune	955396e857	remove clusterapi nodegroupset processor as discussed with the cluster api community[0], the nodegroupset processor is being removed from the clusterapi provider implementation in favor of instructing our community on the use of the --balancing-ignore-label flag. due to the wide variety of provider infrastructures that clusterapi can be deployed on, we would prefer to not encode all of these labels in the autoscaler itself. see the linked recording for more information. [0] https://www.youtube.com/watch?v=jbhca_9oPuQ	2023-01-12 15:05:37 -05:00
Kubernetes Prow Robot	b94f340af5	Merge pull request #5402 from Bryce-Soghigian/bsoghigian/adding-configurable-difference-ratios adding configurable difference ratios	2023-01-10 04:03:25 -08:00
bsoghigian	0f8ed0b81f	Configurable difference ratios	2023-01-09 22:40:16 -08:00
Kubernetes Prow Robot	3785a2f82a	Merge pull request #5223 from grosser/grosser/burst cluster-autoscaler: allow setting kuberentes client burst and qps to avoid rate limiting	2022-12-30 06:21:30 -08:00
Michael Grosser	cd26bcfe60	allow setting kuberentes client burst and qps to avoid rate limiting	2022-12-29 13:54:04 -08:00
Bartłomiej Wróblewski	62c68e1280	Move PredicateChecker initialization before processors initialization	2022-12-27 15:21:41 +00:00
Kubernetes Prow Robot	a46a095fe2	Merge pull request #5362 from yasinlachiny/maxnodetotal set cluster_autoscaler_max_nodes_count dynamically	2022-12-19 00:33:44 -08:00
yasin.lachiny	6d9fed5211	set cluster_autoscaler_max_nodes_count dynamically Signed-off-by: yasin.lachiny <yasin.lachiny@gmail.com>	2022-12-11 00:18:03 +01:00
Bartłomiej Wróblewski	2e1b04ff69	Add default PodListProcessor wrapper	2022-12-09 16:26:56 +00:00
Yaroslava Serdiuk	ae45571af9	Create a Planner object if --parallelDrain=true	2022-12-07 11:36:05 +00:00

226 Commits