Commit Graph

98 Commits

Author SHA1 Message Date
Piotr ffe6537163 Unifies pod listing. 2023-09-11 17:11:11 +00:00
Bartłomiej Wróblewski 14655d219f Remove the MaxNodeProvisioningTimeProvider interface 2023-08-05 11:26:40 +00:00
Karol Wychowaniec 80053f6eca Support ZeroOrMaxNodeScaling node groups when cleaning up unregistered nodes 2023-08-03 08:44:46 +00:00
vadasambar eff7888f10 refactor: use `actuatorNodeGroupConfigGetter` param in `NewActuator`
- instead of passing all the processors (we only need `NodeGroupConfigProcessor`)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:48:58 +05:30
vadasambar 7941bab214 feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:31:45 +05:30
Daniel Gutowski 5fed449792 Add ClusterStateRegistry to the AutoscalingContext.
Due to the dependency of the MaxNodeProvisionTimeProvider on the context
the provider was extracted to a dedicated package and injected to the
ClusterStateRegistry after context creation.
2023-07-04 05:00:09 -07:00
Kubernetes Prow Robot 114a35961a
Merge pull request #5705 from damikag/fix-race-condition-between-ca-fetching
bugfix: fix race condition between CA fetching list of scheduled pods…
2023-05-12 05:23:01 -07:00
Damika Gamlath 3b4d6d62b9 bugfix: fix race condition between CA fetching list of scheduled pods and pods being scheduled 2023-05-12 11:53:50 +00:00
Bartłomiej Wróblewski b8d40fdd3c Add status taints option to template creation 2023-04-19 13:55:38 +00:00
Maria Oparka ca088d26c2 Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions 2023-04-19 11:08:20 +02:00
Bartłomiej Wróblewski d5d0a3c7b7 Fix drain logic when skipNodesWithCustomControllerPods=false, set NodeDeleteOptions correctly 2023-04-04 09:50:26 +00:00
Daniel Gutowski 5b6c50e1c6 Apply code review remarks:
* Rename scaleup.Manager to scaleup.Orchestrator
* Remove factory and add Initialize function
* Rename the wrapper package to orchestrator
* Rename NewOrchestrator func to just New
2023-03-20 10:16:53 -07:00
Daniel Gutowski 88cdd7ab4e ScaleUp logic refactors
* Simplify the ScaleUp* functions parameter list
* Introduce the ScaleUpManagerFactory to allow greater expandability
* Simplify helper functions in scale up wrapper
* Make the SkippedReasons public and move those to a dedicated file
2023-03-14 03:22:05 -07:00
Daniel Gutowski 675ca31c36 Add ScaleUpManager interface
* Add ScaleUpManager interface, which is a copy of the existing stand-alone functions
* Add a wrapper which contains the current scale up logic code
2023-03-14 03:22:00 -07:00
Bartłomiej Wróblewski 43b459bf84 Track PDBRemainingDisruptions in AutoscalingContext 2023-02-24 12:43:29 +00:00
Kuba Tużnik 7e6762535b CA: stop passing registered upcoming nodes as scale-down candidates
Without this, with aggressive settings, scale-down could be removing
registered upcoming nodes before they have a chance to become ready
(the duration of which should be unrelated to the scale-down settings).
2023-02-10 14:46:19 +01:00
Kubernetes Prow Robot ba3b244720
Merge pull request #5054 from fookenc/fix-autoscaler-node-deletion
Identifying cloud provider deleted nodes
2022-12-16 05:54:17 -08:00
Bartłomiej Wróblewski fb29a1d3ce Add currently drained pods before scale-up 2022-12-09 16:27:03 +00:00
Bartłomiej Wróblewski 10d3f25996 Use scheduling package in filterOutSchedulable processor 2022-11-23 12:32:59 +00:00
Clint Fooken 08dfc7e20f Changing deletion logic to rely on a new helper method in ClusterStateRegistry, and remove old complicated logic. Adjust the naming of the method for cloud instance deletion from NodeExists to HasInstance. 2022-11-04 17:54:05 -07:00
Xintong Liu 524886fca5 Support scaling up node groups to the configured min size if needed 2022-11-02 21:47:00 -07:00
Clint cf67a3004e
Implementing new cloud provider method for node deletion detection (#1)
* Adding isNodeDeleted method to CloudProvider interface. Supports detecting whether nodes are fully deleted or are not-autoscaled. Updated cloud providers to provide initial implementation of new method that will return an ErrNotImplemented to maintain existing taint-based deletion clusterstate calculation.
2022-10-17 14:58:38 -07:00
Daniel Kłobuszewski 95fd1ed645 Remove ScaleDown dependency on clusterStateRegistry 2022-10-17 21:11:44 +02:00
Kubernetes Prow Robot dc73ea9076
Merge pull request #5235 from UiPath/fix_node_delete
Add option to wait for a period of time after node tainting/cordoning
2022-10-17 04:29:07 -07:00
Kubernetes Prow Robot d022e260a1
Merge pull request #4956 from damirda/feature/scale-up-delay-annotations
Add podScaleUpDelay annotation support
2022-10-13 09:29:02 -07:00
Alexandru Matei 0ee2a359e7 Add option to wait for a period of time after node tainting/cordoning
Node state is refreshed and checked again before deleting the node.
This gives kube-scheduler time to acknowledge that the nodes' state has
changed and to stop scheduling pods on them.
2022-10-13 10:37:56 +03:00
Yaroslava Serdiuk 65b0d78e6e Introduce NodeDeleterBatcher to ScaleDown actuator 2022-09-22 16:19:45 +00:00
Damir Markovic 11d150e920 Add podScaleUpDelay annotation support 2022-09-05 20:24:19 +02:00
mikelo c127763a45 switched policy for PodDisruptionBudget from v1beta1 to v1 in time for 1.25 2022-06-24 19:13:03 +02:00
Benjamin Pineau a726944273 Don't deref nil nodegroup in deleteCreatedNodesWithErrors
Various cloudproviders' `NodeGroupForNode()` implementations (including
aws, azure, and gce) can return a `nil` error _and_ a `nil` nodegroup.
Eg. we're seeing AWS returning that on failed upscales on live clusters.
Checking that `deleteCreatedNodesWithErrors` doesn't return an error is
not enough to safely dereference the nodegroup (as returned by
`NodeGroupForNode()`) by calling nodegroup.Id().

In that situation, logging and returning early seems the safest option,
to give various caches (eg. clusterstateregistry's and cloud provider's)
the opportunity to eventually converge.
2022-05-30 18:47:14 +02:00
Kuba Tużnik 6bd2432894 CA: switch legacy ScaleDown to use the new Actuator
NodeDeletionTracker is now incremented asynchronously
for drained nodes, instead of synchronously. This shouldn't
change anything in actual behavior, but some tests
depended on that, so they had to be adapted.

The switch aims to mostly be a semantic no-op, with
the following exceptions:
* Nodes that fail to be tainted won't be included in
  NodeDeleteResults, since they are now tainted
  synchronously.
2022-05-27 15:13:44 +02:00
Daniel Kłobuszewski c550b77020 Make NodeDeletionTracker implement ActuationStatus interface 2022-04-28 17:08:10 +02:00
Daniel Kłobuszewski 7f8b2da9e3 Separate ScaleDown logic with a new interface 2022-04-26 08:48:45 +02:00
Daniel Kłobuszewski 5a78f49bc2 Move soft tainting logic to a separate package 2022-04-26 08:48:45 +02:00
Daniel Kłobuszewski 7686a1f326 Move existing ScaleDown code to a separate package 2022-04-26 08:48:45 +02:00
Daniel Kłobuszewski 4187e4ce3d Extract core test utils to a separate package 2022-04-26 08:48:45 +02:00
Yaroslava Serdiuk 8a7b99c7eb Continue CA loop when unregistered nodes were removed 2022-04-12 07:49:42 +00:00
Jayant Jain 729038ff2d Adding support for Debugging Snapshot 2021-12-30 09:08:05 +00:00
Bartłomiej Wróblewski 5076047bf8 Skip iteration loop if node creation failed 2021-06-16 14:40:15 +00:00
Michael McCune 3169a1cd9b add field keys to cluster autoscaler unit test structs
A few of the unit test structures did not have field name keys when
using literal structs. This change adds the fields to make this code a
little more future-proof.
2021-05-25 12:53:21 -04:00
Eric Mrak and Brett Kochendorfer 8442ba8307 Add argument for Status Configmap tests 2021-02-18 17:21:32 +00:00
Maciek Pytel 65b3c8d3cc Rename default options to NodeGroupDefaults 2021-01-25 13:21:30 +01:00
Maciek Pytel 3e42b26a22 Per NodeGroup config for scale-down options
This is the implementation of
https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343.
2021-01-25 11:00:17 +01:00
Maciek Pytel 08d18a7bd0 Define interfaces for per NodeGroup config.
This is the first step of implementing
https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343.
New method was added to cloudprovider interface. All existing providers
were updated with a no-op stub implementation that will result in no
behavior change.
The config values specified per NodeGroup are not yet applied.
2021-01-25 11:00:16 +01:00
Bartłomiej Wróblewski 0fb897b839 Update imports after scheduler/framework/v1alpha1 removal 2020-11-30 10:48:52 +00:00
Maciek Pytel 655b4081f4 Migrate to klog v2 2020-06-05 17:22:26 +02:00
Jakub Tużnik 73a5cdf928 Address recent breaking changes in scheduler
The following things changed in scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not *apiv1.Pod
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
2020-04-24 17:54:47 +02:00
Aleksandra Malinowska 9c6a0f9aab Filter out expendable pods before initializing snapshot 2020-03-03 12:05:58 +01:00
Łukasz Osipiuk fa2c6e4d9e Propagate cluster state to ClusterSnapshot 2020-02-04 20:51:27 +01:00
Aleksandra Malinowska 3614d4ec33 Test balancing autoprovisioned node groups 2020-02-03 17:54:02 +01:00