Commit Graph

149 Commits

Author SHA1 Message Date
Yaroslava Serdiuk dffff4f557
Add ProvisioningRequest injector (#6529)
* Add ProvisioningRequests injector

* Add test case for Accepted conditions and add supported provreq classes list

* Use Passive clock
2024-02-28 02:14:49 -08:00
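The injector above appends the pods backed by eligible ProvisioningRequests to the unschedulable-pod list so the autoscaler can act on them, skipping unsupported classes and requests that already carry a terminal condition. A minimal sketch of that shape follows; the type and field names (`ProvisioningRequest`, `Conditions`, `Pods`) are illustrative assumptions, not the actual cluster-autoscaler API, and the class name is only believed to match the check-capacity class.

```go
package main

import "fmt"

// Illustrative stand-ins for the real ProvisioningRequest API (assumptions).
type Condition struct {
	Type   string // e.g. "Accepted", "Failed"
	Status string // "True" / "False"
}

type ProvisioningRequest struct {
	Name       string
	Class      string // provisioningClassName
	Conditions []Condition
	Pods       []string // stand-in for the pods the request would create
}

// supportedClasses mirrors the "supported provreq classes list" from the commit.
var supportedClasses = map[string]bool{
	"check-capacity.autoscaling.x-k8s.io": true,
}

// inject appends pods from eligible ProvisioningRequests to the
// unschedulable-pod list, skipping unsupported classes and requests
// that were already accepted or permanently failed.
func inject(unschedulable []string, reqs []ProvisioningRequest) []string {
	for _, pr := range reqs {
		if !supportedClasses[pr.Class] {
			continue
		}
		if hasTrueCondition(pr, "Accepted") || hasTrueCondition(pr, "Failed") {
			continue // already being handled, or in a terminal state
		}
		unschedulable = append(unschedulable, pr.Pods...)
	}
	return unschedulable
}

func hasTrueCondition(pr ProvisioningRequest, t string) bool {
	for _, c := range pr.Conditions {
		if c.Type == t && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	reqs := []ProvisioningRequest{
		{Name: "a", Class: "check-capacity.autoscaling.x-k8s.io", Pods: []string{"a-0", "a-1"}},
		{Name: "b", Class: "unsupported.example.com", Pods: []string{"b-0"}},
	}
	fmt.Println(inject(nil, reqs)) // [a-0 a-1]
}
```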
Yaroslava Serdiuk 5286b3f770
Add ProvisioningRequestProcessor (#6488) 2024-02-14 05:14:46 -08:00
Daniel Kłobuszewski a842d4f108 Reduce log spam in AtomicResizeFilteringProcessor
Also, introduce default per-node logging quotas. For now, identical to
the per-pod ones.
2024-02-07 12:01:05 +01:00
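Logging quotas of the kind this commit introduces cap how many items a single loop iteration logs individually before collapsing the remainder into one summary line. A self-contained sketch of the pattern, assuming a per-node quota of 3 (the actual default and the real klogx implementation may differ):

```go
package main

import "log"

// quota caps how many times something is logged per loop iteration.
type quota struct{ left int }

func newQuota(limit int) *quota { return &quota{left: limit} }

// allow consumes one unit; it keeps counting below zero so we can
// report how many messages were suppressed.
func (q *quota) allow() bool { q.left--; return q.left >= 0 }

func main() {
	const perNodeQuota = 3 // assumed default, analogous to the per-pod quota
	q := newQuota(perNodeQuota)
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	for _, n := range nodes {
		if q.allow() {
			log.Printf("filtering out node %s from scale-down", n)
		}
	}
	if q.left < 0 {
		log.Printf("... and %d more nodes (log quota exceeded)", -q.left)
	}
}
```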
Yaroslava Serdiuk ed6ebbe8ba
ScaleUp for check-capacity ProvisioningRequestClass (#6451)
* ScaleUp for check-capacity ProvisioningRequestClass

* update condition logic

* Update tests

* Naming update

* Update cluster-autoscaler/core/scaleup/orchestrator/wrapper_orchestrator_test.go

Co-authored-by: Bartek Wróblewski <bwroblewski@google.com>

---------

Co-authored-by: Bartek Wróblewski <bwroblewski@google.com>
2024-01-30 02:36:59 -08:00
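The `wrapper_orchestrator_test.go` file touched above suggests the scale-up path is wrapped so ProvisioningRequest-consuming pods are routed to a check-capacity orchestrator while everything else takes the default path. A condensed sketch of that routing, under the assumption that membership is signalled by the `cluster-autoscaler.kubernetes.io/consume-provisioning-request` annotation:

```go
package main

import "fmt"

// orchestrator is a stand-in for cluster-autoscaler's scale-up orchestrator.
type orchestrator interface {
	ScaleUp(pods []pod) error
}

type pod struct {
	Name        string
	Annotations map[string]string
}

// consumesProvReq marks pods that belong to a ProvisioningRequest; the
// annotation key is an assumption for this sketch.
func consumesProvReq(p pod) bool {
	return p.Annotations["cluster-autoscaler.kubernetes.io/consume-provisioning-request"] != ""
}

// wrapperOrchestrator splits pods between the regular scale-up path and the
// check-capacity ProvisioningRequest path, as the commit above describes.
type wrapperOrchestrator struct {
	scaleUp        orchestrator // default path
	provReqScaleUp orchestrator // check-capacity path
}

func (w *wrapperOrchestrator) ScaleUp(pods []pod) error {
	var regular, provReq []pod
	for _, p := range pods {
		if consumesProvReq(p) {
			provReq = append(provReq, p)
		} else {
			regular = append(regular, p)
		}
	}
	if err := w.scaleUp.ScaleUp(regular); err != nil {
		return err
	}
	return w.provReqScaleUp.ScaleUp(provReq)
}

type logOrchestrator struct{ name string }

func (l logOrchestrator) ScaleUp(pods []pod) error {
	fmt.Printf("%s path handles %d pods\n", l.name, len(pods))
	return nil
}

func main() {
	w := &wrapperOrchestrator{
		scaleUp:        logOrchestrator{"default"},
		provReqScaleUp: logOrchestrator{"check-capacity"},
	}
	_ = w.ScaleUp([]pod{
		{Name: "p1"},
		{Name: "p2", Annotations: map[string]string{
			"cluster-autoscaler.kubernetes.io/consume-provisioning-request": "my-request",
		}},
	})
}
```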
Guo Peng 68e661f1ed feat: add node group health and backoff metrics 2024-01-23 19:39:18 +08:00
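Per-node-group health and backoff gauges like the ones this commit adds are typically registered with client_golang. A sketch follows; the metric names here are assumptions, not the names the commit actually uses:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical metric names; the commit adds per-node-group health and
// backoff gauges, but the exact names here are assumptions.
var (
	nodeGroupHealthy = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "cluster_autoscaler",
			Name:      "node_group_healthiness",
			Help:      "Whether a node group is considered healthy (1) or not (0).",
		},
		[]string{"node_group"},
	)
	nodeGroupBackoff = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "cluster_autoscaler",
			Name:      "node_group_backoff_status",
			Help:      "Whether scale-up of a node group is currently backed off (1) or not (0).",
		},
		[]string{"node_group"},
	)
)

func main() {
	prometheus.MustRegister(nodeGroupHealthy, nodeGroupBackoff)
	nodeGroupHealthy.WithLabelValues("ng-1").Set(1)
	nodeGroupBackoff.WithLabelValues("ng-1").Set(0)
}
```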
vadasambar 5de49a11fb feat: support `--scale-down-delay-after-*` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: update scale down status after every scale up
- move scaledown delay status to cluster state/registry
- enable scale down if `ScaleDownDelayTypeLocal` is enabled
- add new funcs on cluster state to get and update scale down delay status
- use timestamp instead of booleans to track scale down delay status
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use existing fields on clusterstate
- uses `scaleUpRequests`, `scaleDownRequests` and `scaleUpFailures` instead of `ScaleUpDelayStatus`
- changed the above existing fields a little to make them more convenient for use
- moved initializing scale down delay processor to static autoscaler (because clusterstate is not available in main.go)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove note saying only `scale-down-after-add` is supported
- because we are supporting all the flags
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: evaluate `scaleDownInCooldown` the old way only if `ScaleDownDelayTypeLocal` is set to `false`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove line saying `--scale-down-delay-type-local` is only supported for `--scale-down-delay-after-add`
- because it is not true anymore
- we are supporting all `--scale-down-delay-after-*` flags per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix clusterstate tests failing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: move initializing processors logic back from static autoscaler to main
- we don't want to initialize processors in static autoscaler because anyone implementing an alternative to static_autoscaler has to initialize the processors
- and initializing specific processors is making static autoscaler aware of an implementation detail which might not be the best practice
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: revert changes related to `clusterstate`
- since I am going with observer pattern
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add observer interface for state of scaling
- to implement observer pattern for tracking state of scale up/downs (as opposed to using clusterstate to do the same)
- refactor `ScaleDownCandidatesDelayProcessor` to use fields from the new observer
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove params passed to `clearScaleUpFailures`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: revert clusterstate tests
- approach has changed
- I am not making any changes in clusterstate now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add accidentally deleted lines for clusterstate test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: implement `Add` fn for scale state observer
- to easily add new observers
- re-word comments
- remove redundant params from `NewDefaultScaleDownCandidatesProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: CI complaining because no comments on fn definitions
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: initialize parent `ScaleDownCandidatesProcessor`
- instead of `ScaleDownCandidatesSortingProcessor` and `ScaleDownCandidatesDelayProcessor` separately
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add scale state notifier to list of default processors
- initialize processors for `NewDefaultScaleDownCandidatesProcessor` outside and pass them to the fn
- this allows more flexibility
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add observer interface
- create a separate observer directory
- implement `RegisterScaleUp` function in the clusterstate
- TODO: resolve syntax errors
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: use `scaleStateNotifier` in place of `clusterstate`
- delete leftover `scale_stateA_observer.go` (new one is already present in `observers` directory)
- register `clusterstate` with `scaleStateNotifier`
- use `Register` instead of `Add` function in `scaleStateNotifier`
- fix `go build`
- wip: fixing tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix syntax errors
- add utils package `pointers` for converting `time` to pointer (without having to initialize a new variable)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip track scale down failures along with scale up failures
- I was tracking scale up failures but not scale down failures
- fix copyright year 2017 -> 2023 for the new `pointers` package
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: register failed scale down with scale state notifier
- wip writing tests for `scale_down_candidates_delay_processor`
- fix CI lint errors
- remove test file for `scale_down_candidates_processor` (there is not much to test as of now)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add unit tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't track scale up failures in `ScaleDownCandidatesDelayProcessor`
- not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: better doc comments for `TestGetScaleDownCandidates`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't ignore error in `NGChangeObserver`
- return it instead and let the caller decide what to do with it
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change pointers to values in `NGChangeObserver` interface
- easier to work with
- remove `expectedAddTime` param from `RegisterScaleUp` (not needed for now)
- add tests for clusterstate's `RegisterScaleUp`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: conditions in `GetScaleDownCandidates`
- set scale down in cool down if the number of scale down candidates is 0
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: use `ng1` instead of `ng2` in existing test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip static autoscaler tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: assign directly instead of using `sdProcessor` variable
- variable is not needed
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: first working test for static autoscaler
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: continue working on static autoscaler tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: wip second static autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `Println` used for debugging
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add static_autoscaler tests for scale down delay per nodegroup flags
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: rebase off the latest `master`
- change scale state observer interface's `RegisterFailedScaleup` to reflect latest changes around clusterstate's `RegisterFailedScaleup` in `master`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix clusterstate test failing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing orchestrator test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `defaultScaleDownCandidatesProcessor` -> `combinedScaleDownCandidatesProcessor`
- describes the processor better
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: replace `NGChangeObserver` -> `NodeGroupChangeObserver`
- makes it easier to understand for someone not familiar with the codebase
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: reword code comment `after` -> `for which`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: don't return error from `RegisterScaleDown`
- not needed as of now (no implementer function returns a non-nil error for this function)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address review comments around ng change observer interface
- change dir structure of nodegroup change observer package
- stop returning errors wherever it is not needed in the nodegroup change observer interface
- rename `NGChangeObserver` -> `NodeGroupChangeObserver` interface (makes it easier to understand)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: make nodegroupchange observer thread-safe
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add TODO to consider using multiple mutexes in nodegroupchange observer
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `time.Now()` directly instead of assigning a variable to it
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: share code for checking if there was a recent scale-up/down/failure
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: convert `ScaleDownCandidatesDelayProcessor` into table tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: change scale state notifier's `Register()` -> `RegisterForNotifications()`
- makes it easier to understand what the function does
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: replace scale state notifier `Register` -> `RegisterForNotifications` in test
- to fix syntax errors since it is already renamed in the actual code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `clusterStateRegistry` from `delete_in_batch` tests
- not needed anymore since we have `scaleStateNotifier`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: address PR review comments
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add empty `RegisterFailedScaleDown` for clusterstate
- fix syntax error in static autoscaler test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2024-01-11 21:46:42 +05:30
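The long commit above replaces direct coupling to clusterstate with an observer: node-group scale-up/scale-down events and failures are broadcast to registered observers, and the scale-down-candidates delay processor filters out node groups with a recent event. A condensed, thread-safe sketch of that shape, with names approximating the ones in the commit message (exact signatures are assumptions):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// NodeGroupChangeObserver receives scale-state events (interface name taken
// from the commit; the method set here is approximated).
type NodeGroupChangeObserver interface {
	RegisterScaleUp(nodeGroup string, delta int, t time.Time)
	RegisterScaleDown(nodeGroup string, node string, t time.Time)
	RegisterFailedScaleUp(nodeGroup string, t time.Time)
	RegisterFailedScaleDown(nodeGroup string, t time.Time)
}

// notifier fans events out to registered observers; mutex-guarded as in the
// "make nodegroupchange observer thread-safe" step. Only the scale-up fan-out
// is shown to keep the sketch short.
type notifier struct {
	mu        sync.Mutex
	observers []NodeGroupChangeObserver
}

func (n *notifier) RegisterForNotifications(o NodeGroupChangeObserver) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.observers = append(n.observers, o)
}

func (n *notifier) RegisterScaleUp(ng string, delta int, t time.Time) {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, o := range n.observers {
		o.RegisterScaleUp(ng, delta, t)
	}
}

// delayProcessor drops scale-down candidates whose node group saw a recent
// scale-up, mirroring --scale-down-delay-after-add applied per node group.
// Timestamps, not booleans, track the delay status, as in the commit.
type delayProcessor struct {
	mu          sync.Mutex
	lastScaleUp map[string]time.Time
	delay       time.Duration
}

func (p *delayProcessor) RegisterScaleUp(ng string, _ int, t time.Time) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.lastScaleUp[ng] = t
}
func (p *delayProcessor) RegisterScaleDown(string, string, time.Time) {}
func (p *delayProcessor) RegisterFailedScaleUp(string, time.Time)    {}
func (p *delayProcessor) RegisterFailedScaleDown(string, time.Time)  {}

func (p *delayProcessor) GetScaleDownCandidates(candidates map[string][]string, now time.Time) []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	var out []string
	for ng, nodes := range candidates {
		if t, ok := p.lastScaleUp[ng]; ok && now.Sub(t) < p.delay {
			continue // this node group is still in its post-scale-up cooldown
		}
		out = append(out, nodes...)
	}
	return out
}

func main() {
	n := &notifier{}
	p := &delayProcessor{lastScaleUp: map[string]time.Time{}, delay: 10 * time.Minute}
	n.RegisterForNotifications(p)

	now := time.Now()
	n.RegisterScaleUp("ng-1", 2, now)
	cands := map[string][]string{"ng-1": {"n1"}, "ng-2": {"n2"}}
	fmt.Println(p.GetScaleDownCandidates(cands, now)) // only ng-2's node
}
```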
Yaroslava Serdiuk d29ffd03b9
Add ProvisioningRequestPodsFilter processor (#6386)
* Introduce ProvisioningRequestPodsFilter processor

* Review
2024-01-03 11:49:36 +01:00
Walid Ghallab 11a084699c Convert status in cluster-autoscaler-status to YAML and add backoff error info and more node counts.
Change-Id: Ic68e0d67b7ce9912b605b6c0a3356b4d0e177911
2023-12-28 18:52:55 +00:00
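To illustrate the shape of a YAML-formatted status entry with backoff error info, here is a sketch using Go structs with yaml tags; the field names are assumptions and do not match the real cluster-autoscaler-status schema field-for-field:

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// Illustrative shape of one node group's status entry (assumed names).
type NodeCounts struct {
	Registered   int `yaml:"registered"`
	Ready        int `yaml:"ready"`
	NotStarted   int `yaml:"notStarted"`
	BeingDeleted int `yaml:"beingDeleted"`
}

type BackoffInfo struct {
	ErrorClass   string `yaml:"errorClass"`
	ErrorCode    string `yaml:"errorCode"`
	ErrorMessage string `yaml:"errorMessage"`
}

type NodeGroupStatus struct {
	Name           string       `yaml:"name"`
	Health         string       `yaml:"health"`
	NodeCounts     NodeCounts   `yaml:"nodeCounts"`
	ScaleUpBackoff *BackoffInfo `yaml:"scaleUpBackoff,omitempty"`
}

func main() {
	s := NodeGroupStatus{
		Name:       "ng-1",
		Health:     "Healthy",
		NodeCounts: NodeCounts{Registered: 3, Ready: 3},
		ScaleUpBackoff: &BackoffInfo{
			ErrorClass:   "OutOfResource",
			ErrorCode:    "QUOTA_EXCEEDED",
			ErrorMessage: "instance quota exhausted",
		},
	}
	out, _ := yaml.Marshal(s)
	fmt.Print(string(out))
}
```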
Bartłomiej Wróblewski 81a4721f51 Extend BinpackingLimiter interface 2023-10-06 13:53:33 +00:00
Artem Minyaylov a68b748fd7 Refactor NodeDeleteOptions for use in drainability rules 2023-09-29 17:55:19 +00:00
Karol Wychowaniec 11e03a814a Fix processors order in tests and add a comment explaining this order 2023-09-13 14:26:31 +00:00
Kubernetes Prow Robot 72f8a3bafa
Merge pull request #6103 from kawych/composite_proc
Refactor ScaleDownSet processor into a composite processor
2023-09-13 04:18:12 -07:00
Piotr ffe6537163 Unifies pod listing. 2023-09-11 17:11:11 +00:00
Karol Wychowaniec ea94a2b343 Refactor ScaleDownSet processor into a composite processor 2023-09-11 16:19:09 +00:00
Julian Tölle 83134923f4
fix: scale down broken for providers not implementing NodeGroup.GetOptions() 2023-08-15 12:42:05 +02:00
Bartłomiej Wróblewski e39d1b028d Clean up NodeGroupConfigProcessor interface 2023-08-04 16:00:50 +00:00
Kubernetes Prow Robot c6893e9e28
Merge pull request #5672 from vadasambar/feat/5399/ignore-daemonsets-utilization-per-nodegroup
feat: set `IgnoreDaemonSetsUtilization` per nodegroup for AWS
2023-07-12 07:43:12 -07:00
Damika Gamlath 0f8502c623 Refactor autoscaler.go and static_autoscaler.go to move declaration of the NodeDeletion option to main.go 2023-07-10 08:49:49 +00:00
vadasambar 7941bab214 feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because the per-nodegroup override is supported only via AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGs
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:31:45 +05:30
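This commit threads a per-node-group `GetIgnoreDaemonSetsUtilization` lookup into the utilization calculation through a deliberately narrow config-getter interface (the actuator only needs that one method, not the whole `NodeGroupConfigProcessor`); on AWS the value comes from ASG tags. A sketch of that wiring, with a toy map-backed getter in place of the real tag lookup:

```go
package main

import "fmt"

// configGetter is the narrow interface the commit gives the actuator:
// only the lookup it actually needs, instead of the full
// NodeGroupConfigProcessor.
type configGetter interface {
	GetIgnoreDaemonSetsUtilization(nodeGroup string) (bool, error)
}

// mapGetter is a toy implementation backed by per-node-group overrides
// (on AWS these would come from ASG tags) with a global default taken
// from the --ignore-daemonsets-utilization flag.
type mapGetter struct {
	defaults  bool
	overrides map[string]bool
}

func (g mapGetter) GetIgnoreDaemonSetsUtilization(ng string) (bool, error) {
	if v, ok := g.overrides[ng]; ok {
		return v, nil
	}
	return g.defaults, nil
}

// utilization returns used/allocatable, optionally excluding DaemonSet pods.
func utilization(used, dsUsed, allocatable float64, ignoreDS bool) float64 {
	if ignoreDS {
		used -= dsUsed
	}
	return used / allocatable
}

func main() {
	var g configGetter = mapGetter{
		defaults:  false,
		overrides: map[string]bool{"ng-1": true},
	}
	for _, ng := range []string{"ng-1", "ng-2"} {
		ignore, _ := g.GetIgnoreDaemonSetsUtilization(ng)
		fmt.Printf("%s: utilization=%.2f (ignoreDS=%v)\n",
			ng, utilization(3, 1, 4, ignore), ignore)
	}
}
```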
Hakan Bostan 333a0286e0 Rename the autoscaling option
* Renamed the "AtomicScaling" autoscaling option to
  "ZeroOrMaxNodeScaling" to make the behavior clearer.
2023-06-30 11:17:53 +00:00
Karol Wychowaniec 374cf611b7 Address next set of comments 2023-06-27 13:35:02 +00:00
Kubernetes Prow Robot 265e57c163
Merge pull request #5835 from elmiko/add-more-balance-logging
Cluster Autoscaler: add more logging for balancing similar node groups
2023-06-26 05:25:46 -07:00
Kushagra 541affc28a addressed comments 2023-06-16 14:03:20 +00:00
Kushagra db0c783353 make no-op binpacking limiter as default + move mark nodegroups to its method 2023-06-13 12:21:26 +00:00
michael mccune 5b0ad270de add more logging for balancing similar node groups
this change adds some logging at verbosity levels 2 and 3 to help
diagnose why the cluster-autoscaler does not consider 2 or more node
groups to be similar.
2023-06-05 14:03:57 +02:00
Kushagra 49cfd18000 BinpackingLimiter interface 2023-06-02 12:59:42 +00:00
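The `BinpackingLimiter` introduced here (made a no-op by default in the later commit above) lets the estimator cut binpacking short instead of evaluating every node group. A sketch of the interface and a trivial pluggable policy; the method set approximates the one in the source and is not copied from it:

```go
package main

import "fmt"

// BinpackingLimiter lets the estimator stop evaluating more node groups
// early; the method names here are assumptions.
type BinpackingLimiter interface {
	InitBinpacking(nodeGroups []string)
	MarkProcessed(nodeGroup string)
	StopBinpacking() bool
}

// noOpBinpackingLimiter is the default: never cut binpacking short.
type noOpBinpackingLimiter struct{}

func (noOpBinpackingLimiter) InitBinpacking([]string) {}
func (noOpBinpackingLimiter) MarkProcessed(string)    {}
func (noOpBinpackingLimiter) StopBinpacking() bool    { return false }

// capLimiter stops after a fixed number of node groups, as an example of
// a non-trivial policy a vendor could plug in.
type capLimiter struct{ left int }

func (l *capLimiter) InitBinpacking([]string) {}
func (l *capLimiter) MarkProcessed(string)    { l.left-- }
func (l *capLimiter) StopBinpacking() bool    { return l.left <= 0 }

func main() {
	var limiter BinpackingLimiter = &capLimiter{left: 2}
	for _, ng := range []string{"ng-1", "ng-2", "ng-3"} {
		if limiter.StopBinpacking() {
			fmt.Println("binpacking stopped early")
			break
		}
		fmt.Println("evaluating", ng)
		limiter.MarkProcessed(ng)
	}
}
```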
Bartłomiej Wróblewski 9604756004 Sanitize taints before scheduling DSs on template node infos 2023-04-19 14:23:42 +00:00
Bartłomiej Wróblewski b8d40fdd3c Add status taints option to template creation 2023-04-19 13:55:38 +00:00
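These two commits concern which taints survive onto template node infos: transient status taints should be stripped so DaemonSet scheduling against a template node isn't blocked by them. A minimal sketch of that filtering, assuming taints are compared by key (real taints carry key/value/effect):

```go
package main

import "fmt"

// sanitizeTaints drops taints that only describe transient node status so
// DaemonSet scheduling against a template node isn't blocked by them.
func sanitizeTaints(taints []string, statusTaints map[string]bool) []string {
	var out []string
	for _, t := range taints {
		if statusTaints[t] {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	statusTaints := map[string]bool{
		"node.kubernetes.io/not-ready":   true,
		"node.kubernetes.io/unreachable": true,
	}
	taints := []string{"dedicated=gpu:NoSchedule", "node.kubernetes.io/not-ready"}
	fmt.Println(sanitizeTaints(taints, statusTaints)) // [dedicated=gpu:NoSchedule]
}
```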
Maria Oparka ca088d26c2 Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions 2023-04-19 11:08:20 +02:00
Bartłomiej Wróblewski d5d0a3c7b7 Fix drain logic when skipNodesWithCustomControllerPods=false, set NodeDeleteOptions correctly 2023-04-04 09:50:26 +00:00
Yaroslava Serdiuk cea9d1a73b Add empty nodes sorting for scale down candidates 2023-03-08 15:43:22 +00:00
peaaceChoi 82e8804181 Fix continue condition 2023-03-03 06:26:38 +00:00
peaaceChoi 460836285d Delete unused return param 2023-03-03 04:53:50 +00:00
tombokombo 93b9f4b8be
fix
Signed-off-by: tombokombo <tombo@sysart.tech>
2023-02-07 14:35:23 +01:00
Kubernetes Prow Robot e911e54f1e
Merge pull request #5214 from tombokombo/fix/asg-resource-tags
Fix/asg resource tags
2023-02-07 01:35:01 -08:00
Bartłomiej Wróblewski b608278386 Add force Daemon Sets option 2023-01-30 11:02:42 +00:00
Bartłomiej Wróblewski 0470fdfc35 Clean up DS utils: remove unused cluster snapshot and predicate checker 2023-01-23 14:14:53 +00:00
Kubernetes Prow Robot f507519916
Merge pull request #5423 from yaroslava-serdiuk/sd-sorting
Add scale down candidates observer
2023-01-19 10:14:16 -08:00
Yaroslava Serdiuk 541ce04e4b Add previous scale down candidate sorting 2023-01-19 16:04:50 +00:00
Yaroslava Serdiuk 97159df69b Add scale down candidates observer 2023-01-19 16:04:42 +00:00
michael mccune 955396e857 remove clusterapi nodegroupset processor
as discussed with the cluster api community[0], the nodegroupset
processor is being removed from the clusterapi provider implementation
in favor of instructing our community on the use of the
--balancing-ignore-label flag. due to the wide variety of provider
infrastructures that clusterapi can be deployed on, we would prefer to
not encode all of these labels in the autoscaler itself. see the linked
recording for more information.

[0] https://www.youtube.com/watch?v=jbhca_9oPuQ
2023-01-12 15:05:37 -05:00
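The replacement this commit points users to, `--balancing-ignore-label`, excludes infrastructure-specific labels from the node-group similarity comparison instead of hard-coding them per provider. A simplified stand-in for that comparison, with a hypothetical label key for illustration:

```go
package main

import "fmt"

// similarLabels reports whether two node-template label sets match once the
// ignored labels (the ones passed via --balancing-ignore-label) are removed.
// This is a simplified stand-in for the node-group similarity comparator.
func similarLabels(a, b map[string]string, ignored map[string]bool) bool {
	filter := func(in map[string]string) map[string]string {
		out := map[string]string{}
		for k, v := range in {
			if !ignored[k] {
				out[k] = v
			}
		}
		return out
	}
	fa, fb := filter(a), filter(b)
	if len(fa) != len(fb) {
		return false
	}
	for k, v := range fa {
		if fb[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical provider-specific label that would otherwise break balancing.
	ignored := map[string]bool{"example.com/zone-id": true}
	a := map[string]string{"node.kubernetes.io/instance-type": "m5.large", "example.com/zone-id": "z1"}
	b := map[string]string{"node.kubernetes.io/instance-type": "m5.large", "example.com/zone-id": "z2"}
	fmt.Println(similarLabels(a, b, ignored)) // true once the zone label is ignored
}
```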
bsoghigian 0f8ed0b81f Configurable difference ratios 2023-01-09 22:40:16 -08:00
Kubernetes Prow Robot d9ffb8f5ce
Merge pull request #5317 from grosser/grosser/ref2
cluster-autoscaler: refactor BalanceScaleUpBetweenGroups
2022-12-19 00:49:44 -08:00
Kubernetes Prow Robot bc483274e4
Merge pull request #5325 from x13n/master
Log node group min and current size when skipping scale down
2022-11-24 02:24:03 -08:00
Daniel Kłobuszewski d9100cd707 Log node group min and current size when skipping scale down 2022-11-23 13:23:07 +01:00
Michael Grosser 62f29d23af
cluster-autoscaler: refactor BalanceScaleUpBetweenGroups 2022-11-15 13:21:29 -08:00
Bartłomiej Wróblewski 4373c467fe Add ScaleDown.Actuator to AutoscalingContext 2022-11-02 13:12:25 +00:00
Daniel Kłobuszewski 18f2e67c4f Split out code from simulator package 2022-10-18 11:51:44 +02:00
tombokombo 17a09a0efe fix asg resource tags
Signed-off-by: tombokombo <tombo@sysart.tech>
2022-09-27 01:25:29 +02:00
Flavian f1b6d4ded6 handle directx nodes the same as gpu nodes 2022-09-23 09:55:14 +02:00