autoscaler

Commit Graph

Author	SHA1	Message	Date
Bartłomiej Wróblewski	10d3f25996	Use scheduling package in filterOutSchedulable processor	2022-11-23 12:32:59 +00:00
Xintong Liu	524886fca5	Support scaling up node groups to the configured min size if needed	2022-11-02 21:47:00 -07:00
Daniel Kłobuszewski	4187e4ce3d	Extract core test utils to a separate package	2022-04-26 08:48:45 +02:00
Yaroslava Serdiuk	a9a7d98f2c	Add expire time for nodeInfo cache items	2022-02-09 09:38:32 +00:00
Daniel Kłobuszewski	9944137fae	Don't cache NodeInfo for recently Ready nodes There's a race condition between DaemonSet pods getting scheduled to a new node and Cluster Autoscaler caching that node for the sake of predicting future nodes in a given node group. We can reduce the risk of missing some DaemonSet by providing a grace period before accepting nodes in the cache. 1 minute should be more than enough, except for some pathological edge cases.	2022-01-26 20:18:53 +01:00
Jayant Jain	729038ff2d	Adding support for Debugging Snapshot	2021-12-30 09:08:05 +00:00
Maciek Pytel	a0109324a2	Change parameter order of TemplateNodeInfoProvider Every other processors (and, I think, function in CA?) that takes AutoscalingContext has it as first parameter. Changing the new processor for consistency.	2021-09-13 15:08:14 +02:00
Benjamin Pineau	8485cf2052	Move GetNodeInfosForGroups to it's own processor Supports providing different NodeInfos sources (either upstream or in local forks, eg. to properly implement variants like in #4000). This also moves a large and specialized code chunk out of core, and removes the need to maintain and pass the GetNodeInfosForGroups() cache from the side, as processors can hold their states themselves. No functional changes to GetNodeInfosForGroups(), outside mechanical changes due to the move: remotely call a few utils functions in core/utils package, pick context attributes (the processor takes the context as arg rather than ListerRegistry + PredicateChecker + CloudProvider), and use the builtin cache rather than receiving it from arguments.	2021-08-16 19:43:10 +02:00
Amr Hanafi (MAHDI))	f5c2ab7328	Emit the node group metrics behind a flag	2021-05-20 16:49:39 -07:00
Brett Elliott	013fa19be3	Log failed scale up metric based on string of AutoscalerErrorType.	2021-03-23 15:37:04 +01:00
Brett Elliott	4cddaed2f2	Support for reporting authorization errors during scale up	2021-03-17 14:56:03 +01:00
Bartłomiej Wróblewski	0fb897b839	Update imports after scheduler scheduler/framework/v1alpha1 removal	2020-11-30 10:48:52 +00:00
Jakub Tużnik	73a5cdf928	Address recent breaking changes in scheduler The following things changed in scheduler and needed to be fixed: * NodeInfo was moved to schedulerframework * Some fields on NodeInfo are now exposed directly instead of via getters * NodeInfo.Pods is now a list of schedulerframework.PodInfo, not apiv1.Pod * SharedLister and NodeInfoLister were moved to schedulerframework * PodLister was removed	2020-04-24 17:54:47 +02:00
Aleksandra Malinowska	3614d4ec33	Test balancing autoprovisioned node groups	2020-02-03 17:54:02 +01:00
Łukasz Osipiuk	e2ca403123	Return error from NewScaleTestAutoscalingContext	2020-01-29 11:22:07 +01:00
Aleksandra Malinowska	0953e0dd63	Use ElementMatch instead of Subsets	2020-01-15 19:41:06 +01:00
Aleksandra Malinowska	ed151e637c	Add scale up test with triggering, remaining & unschedulable pods	2020-01-15 19:40:41 +01:00
Aleksandra Malinowska	93b01c0fa9	Add verifying scale up status in scale up tests	2020-01-15 18:44:30 +01:00
Łukasz Osipiuk	90a7e47123	Add GPU taint toleration for test pods requiring GPUs	2020-01-03 11:22:21 +01:00
Łukasz Osipiuk	7f083d2393	Move core/utils.go to separate package and split into multiple files	2019-10-22 14:23:40 +02:00
Kubernetes Prow Robot	c6067574c1	Merge pull request #2160 from aleksandra-malinowska/scale-up-events-fix Add resource limit type to NotTriggerScaleUp event	2019-07-05 05:48:38 -07:00
Aleksandra Malinowska	0d0c9440f6	Add no scale up test	2019-07-03 16:38:53 +02:00
Aleksandra Malinowska	7b80f4e8b8	Separate running scale up test from checking results	2019-07-03 16:38:52 +02:00
Vivek Bagade	90aa28a077	Move pod packing in upcoming nodes to RunOnce from Estimator for performance improvements	2019-06-19 14:48:47 +02:00
Krzysztof Jastrzebski	22b4a6283e	Optimize building node infos by using map with pods for nodes.	2019-06-03 13:24:09 +02:00
Chris Bradfield	92ea680f1a	Implement an --ignore-taint flag This change adds support for a user to specify taints to ignore when considering a node as a template for a node group.	2019-05-14 10:22:59 -07:00
Łukasz Osipiuk	db4c6f1133	Migrate filter out schedulabe to PodListProcessor	2019-04-15 16:59:13 +02:00
Łukasz Osipiuk	c6115b826e	Define ProcessorCallbacks interface	2019-04-15 16:59:13 +02:00
Pengfei Ni	128729bae9	Move schedulercache to package nodeinfo	2019-02-21 12:41:08 +08:00
Jacek Kaniuk	d969baff22	Cache exemplar ready node for each node group	2019-02-11 17:40:58 +01:00
Jacek Kaniuk	f054c53c46	Account for kernel reserved memory in capacity calculations	2019-02-08 17:04:07 +01:00
Marcin Wielgus	99f1dcf9d2	Merge branch 'master' into crc-fix-error-format	2019-02-01 17:22:57 +01:00
Vivek Bagade	79ef3a6940	unexporting methods in utils.go	2019-01-25 00:06:03 +05:30
CodeLingo Bot	c0603afdeb	Fix error format strings according to best practices from CodeReviewComments Fix error format strings according to best practices from CodeReviewComments Fix error format strings according to best practices from CodeReviewComments Reverted incorrect change to with error format string Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingoBot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <bot@codelingo.io> Resolve conflict Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingoBot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <bot@codelingo.io> Fix error strings in testscases to remedy failing tests Signed-off-by: CodeLingo Bot <bot@codelingo.io> Fix more error strings to remedy failing tests Signed-off-by: CodeLingo Bot <bot@codelingo.io>	2019-01-11 09:10:31 +13:00
Łukasz Osipiuk	85a83b62bd	Pass nodeGroup->NodeInfo map to ClusterStateRegistry Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801	2019-01-08 15:52:00 +01:00
Maciej Pytel	3f0da8947a	Use listers in scale-up	2019-01-02 15:56:01 +01:00
Maciej Pytel	9060014992	Use listers in scale-down	2018-12-31 14:55:38 +01:00
Kubernetes Prow Robot	ab7f1e69be	Merge pull request #1464 from losipiuk/lo/stockouts2 Better quota-exceeded/stockout handling	2018-12-31 05:28:08 -08:00
Łukasz Osipiuk	da5bef307b	Allow updating Increase for ScaleUpRequest in ClusterStateRegistry	2018-12-28 17:17:07 +01:00
Maciej Pytel	60babe7158	Use kubernetes lister for daemonset instead of custom one Also migrate to using apps/v1.DaemonSet instead of old extensions/v1beta1.	2018-12-28 13:55:41 +01:00
Łukasz Osipiuk	991873c237	Fix gofmt errors	2018-11-26 15:39:59 +01:00
Łukasz Osipiuk	5962354c81	Inject Backoff instance to ClusterStateRegistry on creation	2018-11-13 14:25:16 +01:00
Łukasz Osipiuk	55fc1e2f00	Store NodeGroup in ScaleUpRequest and ScaleDownRequest	2018-10-30 18:03:04 +01:00
Jakub Tużnik	8179e4e716	Refactor the scale-(up|down) status processors so that they have more info available Replace the simple boolean ScaledUp property of ScaleUpStatus with a more comprehensive ScaleUpResult. Add more possible values to ScaleDownResult. Refactor the processors execution so that they are always executed every iteration, even if RunOnce exits earlier.	2018-09-20 17:12:02 +02:00
Łukasz Osipiuk	84d8f6fd31	Remove obsolete implementations of node-related processors	2018-09-05 11:58:46 +02:00
Aleksandra Malinowska	cd9808185e	Report reason why pod didn't trigger scale-up	2018-08-28 14:11:36 +02:00
Aleksandra Malinowska	7225a0fcab	Move all Kubernetes API clients to AutoscalingKubeClients	2018-07-26 13:31:48 +02:00
Aleksandra Malinowska	0976d2aa07	Move autoscaling options out of static	2018-07-25 10:52:37 +02:00
Aleksandra Malinowska	6b94d7172d	Move AutoscalingOptions to config/static	2018-07-23 15:52:27 +02:00
Aleksandra Malinowska	ed5e82d85d	Merge pull request #956 from krzysztof-jastrzebski/master Create NodeGroupManager which is responsible for creating…	2018-06-14 17:25:32 +02:00


89 Commits