Commit Graph

33 Commits

Author SHA1 Message Date
Bartłomiej Wróblewski 14655d219f Remove the MaxNodeProvisioningTimeProvider interface 2023-08-05 11:26:40 +00:00
vadasambar eff7888f10 refactor: use `actuatorNodeGroupConfigGetter` param in `NewActuator`
- instead of passing all the processors (we only need `NodeGroupConfigProcessor`)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:48:58 +05:30
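The refactor in eff7888f10 narrows what `NewActuator` receives: instead of the full processor set, it takes only a getter for node-group configuration. A minimal sketch of that shape, with illustrative names rather than the actual cluster-autoscaler signatures:

```go
package main

import "fmt"

// actuatorNodeGroupConfigGetter is the narrow view the actuator needs:
// per-node-group configuration lookups only, not the whole processor set.
// (Illustrative interface; the real one lives in the cluster-autoscaler tree.)
type actuatorNodeGroupConfigGetter interface {
	GetIgnoreDaemonSetsUtilization(nodeGroup string) (bool, error)
}

// Actuator depends on the small interface rather than on all processors.
type Actuator struct {
	configGetter actuatorNodeGroupConfigGetter
}

// NewActuator now accepts just the config getter.
func NewActuator(configGetter actuatorNodeGroupConfigGetter) *Actuator {
	return &Actuator{configGetter: configGetter}
}

// staticGetter is a stand-in implementation for this sketch.
type staticGetter struct{ ignore bool }

func (s staticGetter) GetIgnoreDaemonSetsUtilization(string) (bool, error) {
	return s.ignore, nil
}

func main() {
	a := NewActuator(staticGetter{ignore: true})
	v, _ := a.configGetter.GetIgnoreDaemonSetsUtilization("my-node-group")
	fmt.Println("ignore DaemonSet utilization:", v)
}
```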
vadasambar 7941bab214 feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetsUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:31:45 +05:30
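Commit 7941bab214 makes `IgnoreDaemonSetsUtilization` resolvable per node group, falling back to the global `--ignore-daemonsets-utilization` flag when no override is set. A rough sketch of that fallback pattern; the types and field names here are hypothetical stand-ins, not the real `NodeGroupConfigProcessor` code:

```go
package main

import "fmt"

// NodeGroupAutoscalingOptions holds per-node-group overrides; a nil field
// means "not set, fall back to the global default". (Hypothetical shape.)
type NodeGroupAutoscalingOptions struct {
	IgnoreDaemonSetsUtilization *bool
}

// configGetter resolves the option per node group, falling back to the value
// of the global --ignore-daemonsets-utilization flag.
type configGetter struct {
	globalIgnoreDaemonSetsUtilization bool
	perNodeGroup                      map[string]NodeGroupAutoscalingOptions
}

func (c configGetter) GetIgnoreDaemonSetsUtilization(nodeGroup string) bool {
	if opts, ok := c.perNodeGroup[nodeGroup]; ok && opts.IgnoreDaemonSetsUtilization != nil {
		return *opts.IgnoreDaemonSetsUtilization
	}
	return c.globalIgnoreDaemonSetsUtilization
}

func main() {
	override := true
	g := configGetter{
		globalIgnoreDaemonSetsUtilization: false,
		perNodeGroup: map[string]NodeGroupAutoscalingOptions{
			"asg-with-override": {IgnoreDaemonSetsUtilization: &override},
		},
	}
	fmt.Println(g.GetIgnoreDaemonSetsUtilization("asg-with-override")) // true (override)
	fmt.Println(g.GetIgnoreDaemonSetsUtilization("other-asg"))         // false (global default)
}
```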
Daniel Gutowski 5fed449792 Add ClusterStateRegistry to the AutoscalingContext.
Due to the dependency of the MaxNodeProvisionTimeProvider on the context,
the provider was extracted to a dedicated package and injected into the
ClusterStateRegistry after context creation.
2023-07-04 05:00:09 -07:00
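The commit above breaks a dependency cycle by creating the provider separately and wiring it into the `ClusterStateRegistry` only once the context exists. A simplified sketch of that injection order, using stand-in types and a `RegisterProviders` method named for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the real cluster-autoscaler types.
type AutoscalingContext struct {
	ClusterStateRegistry *ClusterStateRegistry
}

type MaxNodeProvisionTimeProvider interface {
	GetMaxNodeProvisionTime(nodeGroup string) (time.Duration, error)
}

type ClusterStateRegistry struct {
	provider MaxNodeProvisionTimeProvider
}

// RegisterProviders injects the provider after the context (and registry)
// already exist, breaking the provider -> context -> registry cycle.
func (r *ClusterStateRegistry) RegisterProviders(p MaxNodeProvisionTimeProvider) {
	r.provider = p
}

type staticProvider struct{ d time.Duration }

func (s staticProvider) GetMaxNodeProvisionTime(string) (time.Duration, error) {
	return s.d, nil
}

func main() {
	registry := &ClusterStateRegistry{}
	ctx := &AutoscalingContext{ClusterStateRegistry: registry}

	// The provider is created once the context exists, then injected.
	ctx.ClusterStateRegistry.RegisterProviders(staticProvider{d: 15 * time.Minute})

	d, _ := registry.provider.GetMaxNodeProvisionTime("ng-1")
	fmt.Println("max node provision time:", d)
}
```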
Maria Oparka ca088d26c2 Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions 2023-04-19 11:08:20 +02:00
Bartłomiej Wróblewski d5d0a3c7b7 Fix drain logic when skipNodesWithCustomControllerPods=false, set NodeDeleteOptions correctly 2023-04-04 09:50:26 +00:00
Kubernetes Prow Robot 205293a7ca
Merge pull request #5537 from arrikto/feature-disable-unready-scaledown
cluster-autoscaler: Add option to disable scale down of unready nodes
2023-03-08 07:55:11 -08:00
Grigoris Thanasoulas 6cf8c329da cluster-autoscaler: Add option to disable scale down of unready nodes
Add flag '--scale-down-unready-enabled' to enable or disable scale-down
of unready nodes. Default value set to true for backwards compatibility
(i.e., allow scale-down of unready nodes).

Signed-off-by: Grigoris Thanasoulas <gregth@arrikto.com>
2023-03-06 15:51:10 +02:00
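A minimal illustration of the new flag using the standard library `flag` package rather than the autoscaler's actual flag wiring; only the flag name and its `true` default come from the commit message:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Defaults to true so unready nodes keep being scaled down unless the
	// operator explicitly opts out.
	scaleDownUnreadyEnabled := flag.Bool("scale-down-unready-enabled", true,
		"Should CA scale down unready nodes of the cluster")
	flag.Parse()

	if *scaleDownUnreadyEnabled {
		fmt.Println("unready nodes are eligible for scale-down")
	} else {
		fmt.Println("scale-down of unready nodes is disabled")
	}
}
```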
Kubernetes Prow Robot edf8779bda
Merge pull request #5472 from DataDog/scaledown-nodedeletion-metric-fix
Fix scaledown:nodedeletion metric calculation
2023-02-28 07:25:17 -08:00
Bartłomiej Wróblewski 43b459bf84 Track PDBRemainingDisruptions in AutoscalingContext 2023-02-24 12:43:29 +00:00
Bartłomiej Wróblewski b5ead036a8 Merge taint utils into one package, make taint modifying methods public 2023-02-13 11:29:45 +00:00
dom.bozzuto 1150fcd27a Fix scaledown:nodedeletion metric calculation
The scaledown:nodedeletion metric duration was incorrectly being computed relative to the start of the RunOnce routine, instead of from the actual start of the deletion. Work at the start of the routine (like a long cloud provider refresh) would incorrectly skew the nodedeletion duration.

Signed-off-by: Domenic Bozzuto <dom.bozzuto@datadoghq.com>
2023-02-02 12:03:38 -05:00
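The fix amounts to taking the timestamp when the deletion itself begins instead of at the start of `RunOnce`. A small sketch of that timing change; `observeNodeDeletionDuration` is a stand-in for the real metric update:

```go
package main

import (
	"fmt"
	"time"
)

// observeNodeDeletionDuration stands in for the real metric update
// (e.g. a histogram observation in the metrics package).
func observeNodeDeletionDuration(d time.Duration) {
	fmt.Println("scaledown:nodedeletion took", d)
}

func deleteNode(name string) {
	// Measure from the moment the deletion actually starts...
	start := time.Now()

	// ...so earlier work in the loop iteration (such as a slow cloud provider
	// refresh) cannot inflate the reported duration.
	fmt.Println("deleting node", name)
	time.Sleep(50 * time.Millisecond) // placeholder for the real deletion work

	observeNodeDeletionDuration(time.Since(start))
}

func main() {
	// Before the fix, the timestamp was taken at the start of RunOnce, so
	// everything between that and deleteNode() was wrongly counted.
	deleteNode("node-1")
}
```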
Bartłomiej Wróblewski 10d3f25996 Use scheduling package in filterOutSchedulable processor 2022-11-23 12:32:59 +00:00
Bartłomiej Wróblewski 4373c467fe Add ScaleDown.Actuator to AutoscalingContext 2022-11-02 13:12:25 +00:00
Daniel Kłobuszewski 92f5b8673e Extract scheduling hints to a dedicated object
This removes the need for passing maps back and forth when doing
scheduling simulations.
2022-10-20 11:44:15 +02:00
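Commit 92f5b8673e replaces the maps passed back and forth through scheduling simulations with a dedicated hints object. A hedged sketch of such a keyed store, with invented names:

```go
package main

import "fmt"

// HintKey identifies a pod in the simulation; a plain string is enough here.
type HintKey string

// Hints remembers which node a pod fit on during a previous simulation, so a
// later pass can try that node first instead of rescanning every node.
type Hints struct {
	hints map[HintKey]string
}

func NewHints() *Hints { return &Hints{hints: map[HintKey]string{}} }

func (h *Hints) Set(pod HintKey, node string) { h.hints[pod] = node }

func (h *Hints) Get(pod HintKey) (string, bool) {
	node, ok := h.hints[pod]
	return node, ok
}

func main() {
	h := NewHints()
	h.Set("default/web-1", "node-a")

	// A later simulation pass consults the shared object instead of receiving
	// a raw map as a parameter and returning an updated copy.
	if node, ok := h.Get("default/web-1"); ok {
		fmt.Println("try scheduling on", node, "first")
	}
}
```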
Daniel Kłobuszewski 18f2e67c4f Split out code from simulator package 2022-10-18 11:51:44 +02:00
Daniel Kłobuszewski 95fd1ed645 Remove ScaleDown dependency on clusterStateRegistry 2022-10-17 21:11:44 +02:00
Kubernetes Prow Robot f445a6a887
Merge pull request #5147 from x13n/scaledown4
Extract criteria for removing unneeded nodes to a separate package
2022-10-17 11:51:20 -07:00
Alexandru Matei 0ee2a359e7 Add option to wait for a period of time after node tainting/cordoning
Node state is refreshed and checked again before deleting the node.
This gives kube-scheduler time to acknowledge that the nodes' state has
changed and to stop scheduling pods on them.
2022-10-13 10:37:56 +03:00
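The sequence the commit describes is: taint/cordon, wait for the configured period, refresh and re-check the node, and only then delete. A schematic sketch with placeholder types; the actual option name and API calls are not shown here:

```go
package main

import (
	"fmt"
	"time"
)

type node struct {
	name    string
	tainted bool
}

// refreshNode stands in for re-reading the node from the API server.
func refreshNode(n *node) *node { return n }

// deleteAfterTaintDelay taints the node, waits so kube-scheduler can observe
// the taint and stop placing pods, then re-checks state before deleting.
func deleteAfterTaintDelay(n *node, delay time.Duration) error {
	n.tainted = true // cordon/taint first

	time.Sleep(delay) // configurable wait introduced by this change

	fresh := refreshNode(n)
	if !fresh.tainted {
		return fmt.Errorf("node %s lost its taint, aborting deletion", fresh.name)
	}

	fmt.Println("deleting node", fresh.name)
	return nil
}

func main() {
	_ = deleteAfterTaintDelay(&node{name: "node-1"}, 100*time.Millisecond)
}
```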
Daniel Kłobuszewski 3a3ec38a52 Extract criteria for removing unneeded nodes to a separate package 2022-09-26 16:49:04 +02:00
Kubernetes Prow Robot 70efe28f8a
Merge pull request #5133 from x13n/scaledown3
Stop treating masters differently in scale down
2022-09-23 11:48:05 -07:00
Kubernetes Prow Robot b3c6b60e1c
Merge pull request #5060 from yaroslava-serdiuk/deleting-in-batch
Introduce NodeDeleterBatcher to ScaleDown actuator
2022-09-22 10:11:06 -07:00
Yaroslava Serdiuk 65b0d78e6e Introduce NodeDeleterBatcher to ScaleDown actuator 2022-09-22 16:19:45 +00:00
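A rough sketch of the batching idea behind `NodeDeleterBatcher`: accumulate nodes per node group and delete each group's batch together. This illustrates the concept only and is not the actual implementation:

```go
package main

import "fmt"

// deleteNodes stands in for the cloud provider call that removes a whole
// batch of nodes from one node group.
func deleteNodes(nodeGroup string, nodes []string) {
	fmt.Printf("deleting %v from %s in one call\n", nodes, nodeGroup)
}

// NodeDeleterBatcher accumulates nodes per node group and flushes them together.
type NodeDeleterBatcher struct {
	pending map[string][]string
}

func NewNodeDeleterBatcher() *NodeDeleterBatcher {
	return &NodeDeleterBatcher{pending: map[string][]string{}}
}

func (b *NodeDeleterBatcher) AddNode(nodeGroup, node string) {
	b.pending[nodeGroup] = append(b.pending[nodeGroup], node)
}

func (b *NodeDeleterBatcher) Flush() {
	for group, nodes := range b.pending {
		deleteNodes(group, nodes)
	}
	b.pending = map[string][]string{}
}

func main() {
	b := NewNodeDeleterBatcher()
	b.AddNode("ng-1", "node-a")
	b.AddNode("ng-1", "node-b")
	b.AddNode("ng-2", "node-c")
	b.Flush()
}
```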
Daniel Kłobuszewski 540ff4ee05 Stop treating masters differently in scale down
This filtering was used for two purposes:
- Excluding masters from destination candidates
- Excluding masters from calculating cluster resources

Excluding from destination candidates isn't useful: if pods can schedule
there, they will, so removing them from CA simulation doesn't change
anything.
Excluding from calculating cluster resources actually matches scale up
behavior, where master nodes are treated the same way as regular nodes.
2022-09-16 12:54:33 +02:00
Daniel Kłobuszewski 6419abf155 Move resource limits checking to a separate package 2022-09-15 16:18:57 +02:00
Daniel Kłobuszewski 1284ecd718 Extract checks for scale down eligibility 2022-09-01 15:16:56 +02:00
Kuba Tużnik 6bd2432894 CA: switch legacy ScaleDown to use the new Actuator
NodeDeletionTracker is now incremented asynchronously
for drained nodes, instead of synchronously. This shouldn't
change anything in actual behavior, but some tests
depended on that, so they had to be adapted.

The switch aims to mostly be a semantic no-op, with
the following exceptions:
* Nodes that fail to be tainted won't be included in
  NodeDeleteResults, since they are now tainted
  synchronously.
2022-05-27 15:13:44 +02:00
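The asynchronous tracking mentioned above can be pictured as the drain path running in its own goroutine and updating the tracker from there. A simplified sketch; only the `NodeDeletionTracker` name comes from the commit, the method and fields are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// NodeDeletionTracker counts in-flight deletions; this sketch models only the
// counter, not the full ActuationStatus interface.
type NodeDeletionTracker struct {
	mu      sync.Mutex
	drained int
}

func (t *NodeDeletionTracker) StartDeletionWithDrain() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.drained++
}

func main() {
	tracker := &NodeDeletionTracker{}
	var wg sync.WaitGroup

	// With the new Actuator, draining happens in a goroutine per node, so the
	// tracker is updated asynchronously rather than inline in the caller.
	for _, node := range []string{"node-a", "node-b"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			tracker.StartDeletionWithDrain()
			fmt.Println("draining and deleting", name)
		}(node)
	}
	wg.Wait()
	fmt.Println("drained deletions tracked:", tracker.drained)
}
```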
Kuba Tużnik cda459b19c CA: Extract delay logic out of legacy scale-down 2022-05-26 16:55:59 +02:00
Kuba Tużnik 6a1ab52de7 CA: Extract drain logic out of legacy scale-down
Function signatures are simplified to take the whole
*AutoscalingContext object instead of its various
individual fields.
2022-05-26 16:55:59 +02:00
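A before/after sketch of the signature simplification described above, using trimmed-down stand-ins for the context fields:

```go
package main

import "fmt"

// Trimmed-down stand-ins for the real context fields.
type ListerRegistry struct{}
type AutoscalingOptions struct{ MaxGracefulTerminationSec int }

type AutoscalingContext struct {
	Listers ListerRegistry
	Options AutoscalingOptions
}

// Before: every field the drain code needed was threaded through explicitly.
func drainNodeOld(listers ListerRegistry, maxGracefulTerminationSec int, node string) {
	fmt.Println("draining", node, "with grace period", maxGracefulTerminationSec)
}

// After: the whole context is passed, so adding a dependency later does not
// ripple through every call site's signature.
func drainNode(ctx *AutoscalingContext, node string) {
	fmt.Println("draining", node, "with grace period", ctx.Options.MaxGracefulTerminationSec)
}

func main() {
	ctx := &AutoscalingContext{Options: AutoscalingOptions{MaxGracefulTerminationSec: 600}}
	drainNodeOld(ctx.Listers, ctx.Options.MaxGracefulTerminationSec, "node-1")
	drainNode(ctx, "node-1")
}
```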
Daniel Kłobuszewski b0cd570b04 Move handling of unremovable nodes to dedicated object 2022-05-24 16:24:10 +02:00
Daniel Kłobuszewski c550b77020 Make NodeDeletionTracker implement ActuationStatus interface 2022-04-28 17:08:10 +02:00
Daniel Kłobuszewski 5a78f49bc2 Move soft tainting logic to a separate package 2022-04-26 08:48:45 +02:00
Daniel Kłobuszewski 7686a1f326 Move existing ScaleDown code to a separate package 2022-04-26 08:48:45 +02:00