Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
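A rough sketch of the narrowed getter, with illustrative names and signatures (the real interface lives with the node group config processor and may differ):

    package eligibility

    import "k8s.io/autoscaler/cluster-autoscaler/cloudprovider"

    // configGetter is a hypothetical, simplified view of the node group config
    // processor: the eligibility code only needs per-nodegroup config lookups,
    // now including the DaemonSet utilization setting.
    type configGetter interface {
        GetScaleDownUtilizationThreshold(nodeGroup cloudprovider.NodeGroup) (float64, error)
        GetIgnoreDaemonSetsUtilization(nodeGroup cloudprovider.NodeGroup) (bool, error)
    }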
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: reset help text for `ignore-daemonsets-utilization` flag
- because the per-nodegroup override is currently supported only via AWS ASG tags
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
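For illustration, the per-ASG override follows the existing `autoscaling-options` node-template tag pattern; the exact tag key and accepted values are documented in the AWS cloud provider README:

    Tag key:   k8s.io/cluster-autoscaler/node-template/autoscaling-options/ignoredaemonsetsutilization
    Tag value: "true"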
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
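A minimal sketch of the idea (the interface name here is hypothetical): the actuator depends on a one-method interface that the full `NodeGroupConfigProcessor` already satisfies.

    package actuation

    import "k8s.io/autoscaler/cluster-autoscaler/cloudprovider"

    // nodeGroupConfigGetter narrows the dependency to the single lookup the
    // actuator performs; any NodeGroupConfigProcessor implementation satisfies it.
    type nodeGroupConfigGetter interface {
        GetIgnoreDaemonSetsUtilization(nodeGroup cloudprovider.NodeGroup) (bool, error)
    }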
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: run actuation tests for 2 node groups
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
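Sketched shape of such a test; the scenario helper is a stand-in, not the actual test code:

    package actuation

    import (
        "fmt"
        "testing"
    )

    func TestActuationIgnoreDaemonSetsUtilization(t *testing.T) {
        // runScenario stands in for the real actuation scenario: build a node
        // group carrying the given per-nodegroup option, run scale-down
        // actuation against it, and assert on the results.
        runScenario := func(t *testing.T, ignoreDaemonSetsUtilization bool) {
            t.Helper()
            // ... build node group, run actuation, assert ...
        }

        for _, ignore := range []bool{false, true} {
            ignore := ignore
            t.Run(fmt.Sprintf("IgnoreDaemonSetsUtilization=%v", ignore), func(t *testing.T) {
                runScenario(t, ignore)
            })
        }
    }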
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
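A sketch of what such a helper could look like (the name and exact fields are illustrative, not the actual test helper):

    package actuation

    import (
        "fmt"

        appsv1 "k8s.io/api/apps/v1"
        apiv1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // generateDsPods builds n DaemonSet-owned pods bound to the given node, so
    // tests can vary the amount of DaemonSet utilization on a node dynamically.
    func generateDsPods(n int, ds *appsv1.DaemonSet, nodeName string) []*apiv1.Pod {
        controller := true
        pods := make([]*apiv1.Pod, 0, n)
        for i := 0; i < n; i++ {
            pods = append(pods, &apiv1.Pod{
                ObjectMeta: metav1.ObjectMeta{
                    Name:      fmt.Sprintf("%s-pod-%d", ds.Name, i),
                    Namespace: ds.Namespace,
                    OwnerReferences: []metav1.OwnerReference{{
                        APIVersion: "apps/v1",
                        Kind:       "DaemonSet",
                        Name:       ds.Name,
                        UID:        ds.UID,
                        Controller: &controller,
                    }},
                },
                Spec: apiv1.PodSpec{NodeName: nodeName},
            })
        }
        return pods
    }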
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
Due to the dependency of the MaxNodeProvisionTimeProvider on the context,
the provider was extracted into a dedicated package and injected into the
ClusterStateRegistry after context creation.
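Purely illustrative sketch of the wiring change (local stand-in types; real names and packages differ): the provider is built from the context and injected into the registry afterwards instead of being constructed together with it.

    package example

    type autoscalingContext struct{ /* ... */ }

    type maxNodeProvisionTimeProvider struct{ ctx *autoscalingContext }

    func newMaxNodeProvisionTimeProvider(ctx *autoscalingContext) *maxNodeProvisionTimeProvider {
        return &maxNodeProvisionTimeProvider{ctx: ctx}
    }

    type clusterStateRegistry struct {
        provider *maxNodeProvisionTimeProvider
    }

    // registerProvider injects the provider after both the context and the
    // registry exist, breaking the construction-time dependency on the context.
    func (r *clusterStateRegistry) registerProvider(p *maxNodeProvisionTimeProvider) {
        r.provider = p
    }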
* Rename scaleup.Manager to scaleup.Orchestrator
* Remove factory and add Initialize function
* Rename the wrapper package to orchestrator
* Rename NewOrchestrator func to just New
* Simplify the ScaleUp* functions' parameter lists
* Introduce the ScaleUpManagerFactory to allow greater extensibility
* Simplify helper functions in scale up wrapper
* Make the SkippedReasons public and move those to a dedicated file
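A simplified sketch of the resulting shape (signatures elided; see the scaleup package for the exact interface):

    package scaleup

    // Orchestrator replaces the old scaleup.Manager. Instead of a factory, an
    // instance is created with New and wired up via Initialize once the
    // autoscaling context and related dependencies exist.
    type Orchestrator interface {
        Initialize( /* autoscaling context, processors, cluster state, taints */ )
        ScaleUp( /* unschedulable pods, nodes, daemon sets, node infos */ )
    }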
Without this, with aggressive settings, scale-down could remove registered
upcoming nodes before they have had a chance to become ready (the duration
of which should be unrelated to the scale-down settings).
* Add an isNodeDeleted method to the CloudProvider interface. It supports detecting whether nodes are fully deleted or are not autoscaled. Update cloud providers with an initial implementation of the new method that returns ErrNotImplemented, to maintain the existing taint-based deletion clusterstate calculation.
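A sketch of the initial stub most providers get (the exact method name and signature come from the CloudProvider interface; this is illustrative):

    package exampleprovider

    import (
        apiv1 "k8s.io/api/core/v1"

        "k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
    )

    type exampleCloudProvider struct{}

    // isNodeDeleted reports whether the node has been fully deleted on the
    // cloud side. Returning ErrNotImplemented keeps the existing taint-based
    // deletion calculation in clusterstate until real support is added.
    func (p *exampleCloudProvider) isNodeDeleted(node *apiv1.Node) (bool, error) {
        return false, cloudprovider.ErrNotImplemented
    }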
Node state is refreshed and checked again before deleting the node.
This gives kube-scheduler time to acknowledge that the nodes' state has
changed and to stop scheduling pods on them.
Various cloudproviders' `NodeGroupForNode()` implementations (including
aws, azure, and gce) can return a `nil` error _and_ a `nil` nodegroup.
E.g. we're seeing AWS return that on failed upscales on live clusters.
Checking that `deleteCreatedNodesWithErrors` doesn't return an error is
not enough to safely dereference the nodegroup (as returned by
`NodeGroupForNode()`) by calling nodegroup.Id().
In that situation, logging and returning early seems the safest option,
to give various caches (eg. clusterstateregistry's and cloud provider's)
the opportunity to eventually converge.
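A simplified sketch of the guard (the helper name is hypothetical; the real check lives in the deletion path for created nodes with errors):

    package core

    import (
        apiv1 "k8s.io/api/core/v1"
        klog "k8s.io/klog/v2"

        "k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
    )

    // nodeGroupOrSkip returns the node's node group, or false when it cannot be
    // determined, so callers log and return early instead of dereferencing nil.
    func nodeGroupOrSkip(cp cloudprovider.CloudProvider, node *apiv1.Node) (cloudprovider.NodeGroup, bool) {
        nodeGroup, err := cp.NodeGroupForNode(node)
        if err != nil {
            klog.Errorf("Failed to find node group for %s: %v", node.Name, err)
            return nil, false
        }
        // A nil error does not guarantee a non-nil node group: some providers
        // return (nil, nil), and calling nodeGroup.Id() would then panic.
        if nodeGroup == nil {
            klog.Warningf("No node group for node %s, skipping", node.Name)
            return nil, false
        }
        return nodeGroup, true
    }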
NodeDeletionTracker is now incremented asynchronously
for drained nodes, instead of synchronously. This shouldn't
change anything in actual behavior, but some tests depended on the
previous synchronous updates, so they had to be adapted.
The switch aims to be mostly a semantic no-op, with
the following exceptions:
* Nodes that fail to be tainted won't be included in
NodeDeleteResults, since they are now tainted
synchronously.
A few of the unit test structures did not use field names in their
struct literals. This change adds the field names to make the code a
little more future-proof.
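For illustration, with a stand-in struct (not the actual test type):

    package example

    type testCase struct {
        name          string
        utilization   float64
        expectedEmpty bool
    }

    // Positional literal: silently pairs values with the wrong fields if the
    // struct gains or reorders fields.
    var fragile = testCase{"node-under-threshold", 0.3, true}

    // Keyed literal: the form this change moves the tests towards.
    var futureProof = testCase{
        name:          "node-under-threshold",
        utilization:   0.3,
        expectedEmpty: true,
    }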
This is the first step of implementing
https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343.
A new method was added to the cloudprovider interface. All existing providers
were updated with a no-op stub implementation that will result in no
behavior change.
The config values specified per NodeGroup are not yet applied.
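A sketch of such a no-op stub, assuming a per-NodeGroup options lookup roughly along the lines of GetOptions (the name and signature may differ from the actual interface):

    package exampleprovider

    import (
        "k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
        "k8s.io/autoscaler/cluster-autoscaler/config"
    )

    type exampleNodeGroup struct{}

    // GetOptions returns per-NodeGroup autoscaling options. Reporting "not
    // implemented" leaves the global defaults in effect, so providers that
    // have not opted in keep their current behavior.
    func (ng *exampleNodeGroup) GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error) {
        return nil, cloudprovider.ErrNotImplemented
    }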
The following things changed in the scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not *apiv1.Pod
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
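A sketch of the resulting access pattern (the schedulerframework import path depends on the vendored scheduler version):

    package example

    import (
        apiv1 "k8s.io/api/core/v1"
        schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
    )

    // podsOf unwraps the plain pods from a NodeInfo: Pods is now a slice of
    // *schedulerframework.PodInfo, each wrapping the underlying *apiv1.Pod.
    func podsOf(nodeInfo *schedulerframework.NodeInfo) []*apiv1.Pod {
        pods := make([]*apiv1.Pod, 0, len(nodeInfo.Pods))
        for _, podInfo := range nodeInfo.Pods {
            pods = append(pods, podInfo.Pod)
        }
        return pods
    }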