84 Commits

Author SHA1 Message Date
Łukasz Osipiuk a849ead286 Precompute inter pod equivalence groups in checkPodsSchedulableOnNode 2019-05-29 18:05:52 +02:00
Chris Bradfield 92ea680f1a Implement an --ignore-taint flag
This change adds support for a user to specify taints to ignore when
considering a node as a template for a node group.
2019-05-14 10:22:59 -07:00
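The idea behind the --ignore-taint change above can be illustrated with a minimal sketch. It assumes taints are matched by key only and uses a local, hypothetical Taint type and filterIgnoredTaints helper rather than the real Kubernetes API types; the actual commit may match and store taints differently.

```go
package main

import "fmt"

// Taint is a simplified, hypothetical stand-in for a node taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// filterIgnoredTaints returns the taints whose keys are not in the ignored
// set. Matching by key alone is an assumption made for this sketch.
func filterIgnoredTaints(taints []Taint, ignored map[string]bool) []Taint {
	kept := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if ignored[t.Key] {
			continue // the user asked for this taint to be ignored
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	// Hypothetical taints found on a node used as a node group template.
	nodeTaints := []Taint{
		{Key: "example.com/maintenance", Effect: "NoSchedule"},
		{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"},
	}
	ignored := map[string]bool{"example.com/maintenance": true}
	fmt.Println(filterIgnoredTaints(nodeTaints, ignored))
}
```

With the transient taint dropped, the template reflects only the taints that every new node in the group is expected to carry.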
Jiaxin Shan 90666881d3 Move GPULabel and GPUTypes to cloud provider 2019-03-25 13:03:01 -07:00
Andrew McDermott 5ae76ea66e UPSTREAM: <carry>: fix max cluster size calculation on scale up
When scaling up, the calculation of the maximum cluster size
does not take into account the number of upcoming nodes, so it is
possible to grow the cluster beyond the maximum cluster
size (--max-nodes-total).

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1670695
2019-03-08 13:28:58 +00:00
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00
Vivek Bagade 79ef3a6940 unexporting methods in utils.go 2019-01-25 00:06:03 +05:30
Łukasz Osipiuk 85a83b62bd Pass nodeGroup->NodeInfo map to ClusterStateRegistry
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Kubernetes Prow Robot 4002559a4c
Merge pull request #1516 from frobware/fix-max-nodes-total-upstream
fix calculation of max cluster size
2019-01-03 10:02:38 -08:00
Maciej Pytel 3f0da8947a Use listers in scale-up 2019-01-02 15:56:01 +01:00
Kubernetes Prow Robot ab7f1e69be
Merge pull request #1464 from losipiuk/lo/stockouts2
Better quota-exceeded/stockout handling
2018-12-31 05:28:08 -08:00
Łukasz Osipiuk 9689b30ee4 Do not use time.Now() in RegisterFailedScaleUp 2018-12-28 17:17:07 +01:00
Łukasz Osipiuk da5bef307b Allow updating Increase for ScaleUpRequest in ClusterStateRegistry 2018-12-28 17:17:07 +01:00
Maciej Pytel 60babe7158 Use kubernetes lister for daemonset instead of custom one
Also migrate to using apps/v1.DaemonSet instead of old
extensions/v1beta1.
2018-12-28 13:55:41 +01:00
Andrew McDermott 5bc77f051c UPSTREAM: <carry>: fix calculation of max cluster size
When scaling up, the calculation for the maximum size of the cluster
based on `--max-nodes-total` doesn't take into account any nodes that
are in the process of coming up. This allows the cluster to grow
beyond the size specified.

With this change I now see:

scale_up.go:266] 21 other pods are also unschedulable
scale_up.go:423] Best option to resize: openshift-cluster-api/amcdermo-ca-worker-us-east-2b
scale_up.go:427] Estimated 18 nodes needed in openshift-cluster-api/amcdermo-ca-worker-us-east-2b
scale_up.go:432] Capping size to max cluster total size (23)
static_autoscaler.go:275] Failed to scale up: max node total count already reached
2018-12-18 17:05:19 +00:00
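The capping described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: capNewNodes and its parameters are hypothetical names, and a limit of 0 is treated here as "no limit".

```go
package main

import (
	"errors"
	"fmt"
)

// capNewNodes returns how many of the requested nodes may be added without
// exceeding maxNodesTotal. Nodes that are still coming up (upcoming) count
// against the limit, which is the point of the fix described above.
// Treating maxNodesTotal <= 0 as "no limit" is an assumption of this sketch.
func capNewNodes(requested, existing, upcoming, maxNodesTotal int) (int, error) {
	if maxNodesTotal <= 0 {
		return requested, nil
	}
	room := maxNodesTotal - existing - upcoming
	if room <= 0 {
		return 0, errors.New("max node total count already reached")
	}
	if requested > room {
		return room, nil // cap the scale-up to the remaining room
	}
	return requested, nil
}

func main() {
	// Hypothetical numbers: 20 ready nodes, 3 still coming up, limit of 23.
	n, err := capNewNodes(18, 20, 3, 23)
	fmt.Println(n, err)
}
```

Ignoring the upcoming argument in the subtraction reproduces the original bug: the cluster appears to have room for nodes that are in fact already on their way.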
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead of github.com/golang/glog 2018-11-26 17:30:31 +01:00
k8s-ci-robot 7008fb50be
Merge pull request #1380 from losipiuk/lo/backoff
Make Backoff interface
2018-11-07 05:13:43 -08:00
Aleksandra Malinowska 6febc1ddb0 Fix formatted log messages 2018-11-06 14:51:43 +01:00
Aleksandra Malinowska bf6ff4be8e Clean up estimators 2018-11-06 14:15:42 +01:00
Łukasz Osipiuk 0e2c3739b7 Use NodeGroup as key in Backoff 2018-10-30 18:17:26 +01:00
Łukasz Osipiuk 55fc1e2f00 Store NodeGroup in ScaleUpRequest and ScaleDownRequest 2018-10-30 18:03:04 +01:00
Maciej Pytel 6f5e6aab6f Move node group balancing to processor
The goal is to allow customization of this logic
for different use cases and cloud providers.
2018-10-25 14:04:05 +02:00
Łukasz Osipiuk a266420f6a Recalculate clusterStateRegistry after adding multiple node groups 2018-10-02 17:15:20 +02:00
Łukasz Osipiuk 437efe4af6 If possible use nodeInfo based on created node group 2018-10-02 15:46:45 +02:00
Jakub Tużnik 8179e4e716 Refactor the scale-(up|down) status processors so that they have more info available
Replace the simple boolean ScaledUp property of ScaleUpStatus with a more
comprehensive ScaleUpResult. Add more possible values to ScaleDownResult.
Refactor the processors execution so that they are always executed every
iteration, even if RunOnce exits earlier.
2018-09-20 17:12:02 +02:00
Łukasz Osipiuk bf8cfef10b NodeGroupManager.CreateNodeGroup can return extra created node groups. 2018-09-19 13:55:51 +02:00
Łukasz Osipiuk 705a6d87e2 fixup! Call CheckPodsSchedulableOnNode in scale_up.go via caching layer 2018-09-17 13:01:19 +02:00
Łukasz Osipiuk 0ad4efe920 Call CheckPodsSchedulableOnNode in scale_up.go via caching layer 2018-09-13 17:01:15 +02:00
Aleksandra Malinowska b88e6019f7 code review fixes 3 2018-08-28 18:11:04 +02:00
Aleksandra Malinowska 5620f76c62 Pass NoScaleUpInfo to ScaleUpStatus processor 2018-08-28 14:26:03 +02:00
Aleksandra Malinowska cd9808185e Report reason why pod didn't trigger scale-up 2018-08-28 14:11:36 +02:00
Aleksandra Malinowska 398a1ac153 Fix error on node info not found for group 2018-07-23 11:16:12 +02:00
Pengfei Ni 1dd0147d9e Add more events for CA 2018-07-09 15:42:05 +08:00
Aleksandra Malinowska 800ee56b34 Refactor and extend GPU metrics error types 2018-07-05 13:13:11 +02:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
Karol Gołąb 5eb7021f82 Add GPU-related scaled_up & scaled_down metrics (#974)
* Add GPU-related scaled_up & scaled_down metrics

* Fix name to match SD naming convention

* Fix import after master rebase

* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski 99c8c51bb3 Create NodeGroupManager which is responsible for creating/deleting node groups. 2018-06-14 16:11:32 +02:00
Łukasz Osipiuk b7323bc0d1 Respect GPU limits in scale_up 2018-06-14 15:46:58 +02:00
Łukasz Osipiuk dfcbedb41f Take into consideration nodes from not autoscaled groups when enforcing resource limits 2018-06-14 15:31:40 +02:00
Łukasz Osipiuk 9f75099d2c Restructure checking resource limits in scale_up.go
Preparatory work before introducing GPU limits
2018-06-13 19:00:37 +02:00
Pengfei Ni be3dd85503 Update scheduler cache package 2018-06-11 13:54:12 +08:00
Łukasz Osipiuk 9c61477d25 Do not return error when getting cpu/memory capacity of node 2018-06-08 15:04:57 +02:00
Beata Skiba b8ae6df5d3 Add post scale up status processor. 2018-06-06 13:34:49 +02:00
Maciej Pytel 856855987b Move some GKE-specific logic outside core
No change in the actual logic being executed. Added a new
NodeGroupListProcessor interface to encapsulate the existing logic.
Moved PodListProcessor and refactored how it's passed around
to make it consistent and easy to add similar interfaces.
2018-05-29 12:57:19 +02:00
Krzysztof Jastrzebski 6761d7f354 Execute predicates only for similar pods. 2018-05-29 09:36:11 +02:00
Karol Gołąb 4c710950de Move ClusterStateRegistry to StaticAutoscaler
AutoscalingContext is basically configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state and is thus moved alongside other
state-keeping objects.
2018-05-24 13:03:01 +02:00
Joachim Bartosik bfb70e40ee Allow passing taints to Node Group creation. 2018-05-18 14:33:33 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska feb4ad9e14 Add utility for limiting logging 2018-03-22 12:57:22 +01:00
Marcin Wielgus 04bec08e84 Compilation fix 2018-03-20 20:11:36 +01:00
Maciej Pytel b7f8622eb2 Create node groups with GPU in scale-up.go
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00