87 Commits

Author SHA1 Message Date
Daniel Kłobuszewski 4187e4ce3d Extract core test utils to a separate package 2022-04-26 08:48:45 +02:00
Yaroslava Serdiuk a9a7d98f2c Add expire time for nodeInfo cache items 2022-02-09 09:38:32 +00:00
Daniel Kłobuszewski 9944137fae Don't cache NodeInfo for recently Ready nodes
There's a race condition between DaemonSet pods getting scheduled to a
new node and Cluster Autoscaler caching that node for the sake of
predicting future nodes in a given node group. We can reduce the risk of
missing some DaemonSet pods by providing a grace period before accepting nodes into the
cache. 1 minute should be more than enough, except for some pathological
edge cases.
2022-01-26 20:18:53 +01:00
Jayant Jain 729038ff2d Adding support for Debugging Snapshot 2021-12-30 09:08:05 +00:00
Maciek Pytel a0109324a2 Change parameter order of TemplateNodeInfoProvider
Every other processor (and, I think, every function in CA?) that takes
AutoscalingContext has it as first parameter. Changing the new processor
for consistency.
2021-09-13 15:08:14 +02:00
Benjamin Pineau 8485cf2052 Move GetNodeInfosForGroups to its own processor
Supports providing different NodeInfos sources (either upstream or in
local forks, e.g. to properly implement variants like in #4000).

This also moves a large and specialized code chunk out of core, and removes
the need to maintain and pass the GetNodeInfosForGroups() cache from the side,
as processors can hold their states themselves.

No functional changes to GetNodeInfosForGroups() apart from mechanical changes
due to the move: call a few utility functions from the core/utils package,
pick attributes from the context (the processor takes the context as an argument
rather than ListerRegistry + PredicateChecker + CloudProvider), and use the
built-in cache rather than receiving it as an argument.
2021-08-16 19:43:10 +02:00
Amr Hanafi (MAHDI)) f5c2ab7328 Emit the node group metrics behind a flag 2021-05-20 16:49:39 -07:00
Brett Elliott 013fa19be3 Log failed scale up metric based on string of AutoscalerErrorType. 2021-03-23 15:37:04 +01:00
Brett Elliott 4cddaed2f2 Support for reporting authorization errors during scale up 2021-03-17 14:56:03 +01:00
Bartłomiej Wróblewski 0fb897b839 Update imports after scheduler/framework/v1alpha1 removal 2020-11-30 10:48:52 +00:00
Jakub Tużnik 73a5cdf928 Address recent breaking changes in scheduler
The following things changed in scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not *apiv1.Pod
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
2020-04-24 17:54:47 +02:00
Aleksandra Malinowska 3614d4ec33 Test balancing autoprovisioned node groups 2020-02-03 17:54:02 +01:00
Łukasz Osipiuk e2ca403123 Return error from NewScaleTestAutoscalingContext 2020-01-29 11:22:07 +01:00
Aleksandra Malinowska 0953e0dd63 Use ElementMatch instead of Subsets 2020-01-15 19:41:06 +01:00
Aleksandra Malinowska ed151e637c Add scale up test with triggering, remaining & unschedulable pods 2020-01-15 19:40:41 +01:00
Aleksandra Malinowska 93b01c0fa9 Add verifying scale up status in scale up tests 2020-01-15 18:44:30 +01:00
Łukasz Osipiuk 90a7e47123 Add GPU taint toleration for test pods requiring GPUs 2020-01-03 11:22:21 +01:00
Łukasz Osipiuk 7f083d2393 Move core/utils.go to separate package and split into multiple files 2019-10-22 14:23:40 +02:00
Kubernetes Prow Robot c6067574c1
Merge pull request #2160 from aleksandra-malinowska/scale-up-events-fix
Add resource limit type to NotTriggerScaleUp event
2019-07-05 05:48:38 -07:00
Aleksandra Malinowska 0d0c9440f6 Add no scale up test 2019-07-03 16:38:53 +02:00
Aleksandra Malinowska 7b80f4e8b8 Separate running scale up test from checking results 2019-07-03 16:38:52 +02:00
Vivek Bagade 90aa28a077 Move pod packing in upcoming nodes to RunOnce from Estimator for performance improvements 2019-06-19 14:48:47 +02:00
Krzysztof Jastrzebski 22b4a6283e Optimize building node infos by using map with pods for nodes. 2019-06-03 13:24:09 +02:00
Chris Bradfield 92ea680f1a Implement an --ignore-taint flag
This change adds support for a user to specify taints to ignore when
considering a node as a template for a node group.
2019-05-14 10:22:59 -07:00
Łukasz Osipiuk db4c6f1133 Migrate filter out schedulable to PodListProcessor 2019-04-15 16:59:13 +02:00
Łukasz Osipiuk c6115b826e Define ProcessorCallbacks interface 2019-04-15 16:59:13 +02:00
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00
Jacek Kaniuk d969baff22 Cache exemplar ready node for each node group 2019-02-11 17:40:58 +01:00
Jacek Kaniuk f054c53c46 Account for kernel reserved memory in capacity calculations 2019-02-08 17:04:07 +01:00
Marcin Wielgus 99f1dcf9d2
Merge branch 'master' into crc-fix-error-format 2019-02-01 17:22:57 +01:00
Vivek Bagade 79ef3a6940 unexporting methods in utils.go 2019-01-25 00:06:03 +05:30
CodeLingo Bot c0603afdeb Fix error format strings according to best practices from CodeReviewComments
Reverted incorrect change to error format string

Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Resolve conflict

Fix error strings in test cases to remedy failing tests

Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Fix more error strings to remedy failing tests

Signed-off-by: CodeLingo Bot <bot@codelingo.io>
2019-01-11 09:10:31 +13:00
Łukasz Osipiuk 85a83b62bd Pass nodeGroup->NodeInfo map to ClusterStateRegistry
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Maciej Pytel 3f0da8947a Use listers in scale-up 2019-01-02 15:56:01 +01:00
Maciej Pytel 9060014992 Use listers in scale-down 2018-12-31 14:55:38 +01:00
Kubernetes Prow Robot ab7f1e69be
Merge pull request #1464 from losipiuk/lo/stockouts2
Better quota-exceeded/stockout handling
2018-12-31 05:28:08 -08:00
Łukasz Osipiuk da5bef307b Allow updating Increase for ScaleUpRequest in ClusterStateRegistry 2018-12-28 17:17:07 +01:00
Maciej Pytel 60babe7158 Use kubernetes lister for daemonset instead of custom one
Also migrate to using apps/v1.DaemonSet instead of old
extensions/v1beta1.
2018-12-28 13:55:41 +01:00
Łukasz Osipiuk 991873c237 Fix gofmt errors 2018-11-26 15:39:59 +01:00
Łukasz Osipiuk 5962354c81 Inject Backoff instance to ClusterStateRegistry on creation 2018-11-13 14:25:16 +01:00
Łukasz Osipiuk 55fc1e2f00 Store NodeGroup in ScaleUpRequest and ScaleDownRequest 2018-10-30 18:03:04 +01:00
Jakub Tużnik 8179e4e716 Refactor the scale-(up|down) status processors so that they have more info available
Replace the simple boolean ScaledUp property of ScaleUpStatus with a more
comprehensive ScaleUpResult. Add more possible values to ScaleDownResult.
Refactor the processors execution so that they are always executed every
iteration, even if RunOnce exits earlier.
2018-09-20 17:12:02 +02:00
Łukasz Osipiuk 84d8f6fd31 Remove obsolete implementations of node-related processors 2018-09-05 11:58:46 +02:00
Aleksandra Malinowska cd9808185e Report reason why pod didn't trigger scale-up 2018-08-28 14:11:36 +02:00
Aleksandra Malinowska 7225a0fcab Move all Kubernetes API clients to AutoscalingKubeClients 2018-07-26 13:31:48 +02:00
Aleksandra Malinowska 0976d2aa07 Move autoscaling options out of static 2018-07-25 10:52:37 +02:00
Aleksandra Malinowska 6b94d7172d Move AutoscalingOptions to config/static 2018-07-23 15:52:27 +02:00
Aleksandra Malinowska ed5e82d85d
Merge pull request #956 from krzysztof-jastrzebski/master
Create NodeGroupManager which is responsible for creating…
2018-06-14 17:25:32 +02:00
Łukasz Osipiuk 51d628c2f1 Add test to check if nodes from not autoscaled groups are used in max-nodes limit 2018-06-14 16:17:51 +02:00
Krzysztof Jastrzebski 99c8c51bb3 Create NodeGroupManager which is responsible for creating/deleting node groups. 2018-06-14 16:11:32 +02:00