Previously, SharedInformerFactory.Start was called before core.NewAutoscaler.
That had the effect that any new informer created as part of
core.NewAutoscaler, in particular in
kubernetes.NewListerRegistryWithDefaultListers, never got started.
One of them was the DaemonSet informer. This had the effect that the DaemonSet
lister had an empty cache and scale down failed with:
I0920 11:06:36.046889 31805 cluster.go:164] node gke-cluster-pohly-default-pool-c9f60a43-5rvz cannot be removed: daemonset for kube-system/pdcsi-node-7hnmc is not present, err: daemonset.apps "pdcsi-node" not found
This was on a GKE cluster with cluster-autoscaler running outside of the
cluster on a development machine.
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: `goto` causing infinite loop
- abstract out running extenders in a separate function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: remove code around extenders
- we decided not to use scheduler extenders for checking if a pod would fit on a node
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: move scheduler config to a `utils/scheduler` package`
- use default config as a fallback
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: fix static_autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: `GetSchedulerConfiguration` fn
- remove falling back
- add mechanism to detect if the scheduler config file flag was set
-
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: wip add tests for `GetSchedulerConfig`
- tests are failing now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add tests for `GetSchedulerConfig`
- abstract error messages so that we can use them in the tests
- set api version explicitly (this is what upstream does as well)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: do a round of cleanup to make PR ready for review
- make import names consistent
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: use `pflag` to check if the `--scheduler-config-file` flag was set
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: add comments for exported error constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: don't export error messages
- exporting is not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: add underscore in test file name
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: fix test failing because of no comment on exported `SchedulerConfigFileFlag`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refacotr: change name of flag variable `schedulerConfig` -> `schedulerConfigFile`
- avoids confusion
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add extra test cases for predicate checker
- where the predicate checker uses custom scheduler config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: remove `setFlags` variable
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: abstract custom scheduler configs into `conifg` package
- make them constants
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: fix linting error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: introduce a new custom test predicate checker
- instead of adding a param to the current one
- this is so that we don't have to pass `nil` to the existing test predicate checker in many places
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig`
- latter narrows down meaning of the function better than former
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename `GetSchedulerConfig` -> `ConfigFromPath`
- `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig`
- move test config to a new package `test` under `config` package
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: add `TODO` for replacing code to parse scheduler config
- with upstream function
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accomodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit 144a64a402)
fix: set `replicated` to true if controller ref is set to `true`
- forgot to add this in the last commit
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit f8f458295d)
fix: remove `checkReferences`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit 5df6e31f8b)
test(drain): add test for custom controller pod
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
feat: add flag to allow scale down on custom controller pods
- set to `false` by default
- `false` will be set to `true` by default in the future
- right now, we want to ensure backwards compatibility and make the feature available if the flag is explicitly set to `true`
- TODO: this code might need some unit tests. Look into adding unit tests.
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: remove `at` symbol in prefix of `vadasambar`
- to keep it consistent with previous such mentions in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test(utils): run all drain tests twice
- once for `allowScaleDownOnCustomControllerOwnedPods=false`
- and once for `allowScaleDownOnCustomControllerOwnedPods=true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs(utils): add description for `testOpts` struct
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test(utils/drain): fix failing tests
- refactor code to add cusom controller pod test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: fix long code comments
- clean-up print statements
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: move `expectFatal` right above where it is used
- makes the code easier to read
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: fix code comment wording
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: address PR comments
- abstract legacy code to check for replicated pods into a separate function so that it's easier to remove in the future
- fix param info in the FAQ.md
- simplify tests and remove the global variable used in the tests
- rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods`
- refactor tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: update flag info
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: forgot to change flag name on a line in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use `ControllerRef()` directly instead of `controllerRef`
- we don't need an extra variable
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: create tests consolidated test cases
- from looping over and tweaking shared test cases
- so that we don't have to duplicate shared test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: append test flag to shared test description
- so that the failed test is easy to identify
- shallow copy tests and add comments so that others do the same
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
Add flag '--scale-down-unready-enabled' to enable or disable scale-down
of unready nodes. Default value set to true for backwards compatibility
(i.e., allow scale-down of unready nodes).
Signed-off-by: Grigoris Thanasoulas <gregth@arrikto.com>
as discussed with the cluster api community[0], the nodegroupset
processor is being removed from the clusterapi provider implementation
in favor of instructing our community on the use of the
--balancing-ignore-label flag. due to the wide variety of provider
infrastructures that clusterapi can be deployed on, we would prefer to
not encode all of these labels in the autoscaler itself. see the linked
recording for more information.
[0] https://www.youtube.com/watch?v=jbhca_9oPuQ