Łukasz Osipiuk
a849ead286
Precompute inter pod equivalence groups in checkPodsSchedulableOnNode
2019-05-29 18:05:52 +02:00
Chris Bradfield
92ea680f1a
Implement an --ignore-taint flag
...
This change adds support for a user to specify taints to ignore when
considering a node as a template for a node group.
2019-05-14 10:22:59 -07:00
Jiaxin Shan
90666881d3
Move GPULabel and GPUTypes to cloud provider
2019-03-25 13:03:01 -07:00
Andrew McDermott
5ae76ea66e
UPSTREAM: <carry>: fix max cluster size calculation on scale up
...
When scaling up the calculation for computing the maximum cluster size
does not take into account the number of any upcoming nodes and it is
possible to grow the cluster beyond the cluster
size (--max-nodes-total).
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1670695
2019-03-08 13:28:58 +00:00
Pengfei Ni
128729bae9
Move schedulercache to package nodeinfo
2019-02-21 12:41:08 +08:00
Vivek Bagade
79ef3a6940
unexporting methods in utils.go
2019-01-25 00:06:03 +05:30
Łukasz Osipiuk
85a83b62bd
Pass nodeGroup->NodeInfo map to ClusterStateRegistry
...
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Kubernetes Prow Robot
4002559a4c
Merge pull request #1516 from frobware/fix-max-nodes-total-upstream
...
fix calculation of max cluster size
2019-01-03 10:02:38 -08:00
Maciej Pytel
3f0da8947a
Use listers in scale-up
2019-01-02 15:56:01 +01:00
Kubernetes Prow Robot
ab7f1e69be
Merge pull request #1464 from losipiuk/lo/stockouts2
...
Better quota-exceeded/stockout handling
2018-12-31 05:28:08 -08:00
Łukasz Osipiuk
9689b30ee4
Do not use time.Now() in RegisterFailedScaleUp
2018-12-28 17:17:07 +01:00
Łukasz Osipiuk
da5bef307b
Allow updating Increase for ScaleUpRequest in ClusterStateRegistry
2018-12-28 17:17:07 +01:00
Maciej Pytel
60babe7158
Use kubernetes lister for daemonset instead of custom one
...
Also migrate to using apps/v1.DaemonSet instead of old
extensions/v1beta1.
2018-12-28 13:55:41 +01:00
Andrew McDermott
5bc77f051c
UPSTREAM: <carry>: fix calculation of max cluster size
...
When scaling up, the calculation for the maximum size of the cluster
based on `--max-nodes-total` doesn't take into account any nodes that
are in the process of coming up. This allows the cluster to grow
beyond the size specified.
With this change I now see:
scale_up.go:266] 21 other pods are also unschedulable
scale_up.go:423] Best option to resize: openshift-cluster-api/amcdermo-ca-worker-us-east-2b
scale_up.go:427] Estimated 18 nodes needed in openshift-cluster-api/amcdermo-ca-worker-us-east-2b
scale_up.go:432] Capping size to max cluster total size (23)
static_autoscaler.go:275] Failed to scale up: max node total count already reached
2018-12-18 17:05:19 +00:00
Łukasz Osipiuk
016bf7fc2c
Use k8s.io/klog instead github.com/golang/glog
2018-11-26 17:30:31 +01:00
k8s-ci-robot
7008fb50be
Merge pull request #1380 from losipiuk/lo/backoff
...
Make Backoff interface
2018-11-07 05:13:43 -08:00
Aleksandra Malinowska
6febc1ddb0
Fix formatted log messages
2018-11-06 14:51:43 +01:00
Aleksandra Malinowska
bf6ff4be8e
Clean up estimators
2018-11-06 14:15:42 +01:00
Łukasz Osipiuk
0e2c3739b7
Use NodeGroup as key in Backoff
2018-10-30 18:17:26 +01:00
Łukasz Osipiuk
55fc1e2f00
Store NodeGroup in ScaleUpRequest and ScaleDownRequest
2018-10-30 18:03:04 +01:00
Maciej Pytel
6f5e6aab6f
Move node group balancing to processor
...
The goal is to allow customization of this logic
for different use-case and cloudproviders.
2018-10-25 14:04:05 +02:00
Łukasz Osipiuk
a266420f6a
Recalculate clusterStateRegistry after adding multiple node groups
2018-10-02 17:15:20 +02:00
Łukasz Osipiuk
437efe4af6
If possible use nodeInfo based on created node group
2018-10-02 15:46:45 +02:00
Jakub Tużnik
8179e4e716
Refactor the scale-(up|down) status processors so that they have more info available
...
Replace the simple boolean ScaledUp property of ScaleUpStatus with a more
comprehensive ScaleUpResult. Add more possible values to ScaleDownResult.
Refactor the processors execution so that they are always executed every
iteration, even if RunOnce exits earlier.
2018-09-20 17:12:02 +02:00
Łukasz Osipiuk
bf8cfef10b
NodeGroupManager.CreateNodeGroup can return extra created node groups.
2018-09-19 13:55:51 +02:00
Łukasz Osipiuk
705a6d87e2
fixup! Call CheckPodsSchedulableOnNode in scale_up.go via caching layer
2018-09-17 13:01:19 +02:00
Łukasz Osipiuk
0ad4efe920
Call CheckPodsSchedulableOnNode in scale_up.go via caching layer
2018-09-13 17:01:15 +02:00
Aleksandra Malinowska
b88e6019f7
code review fixes 3
2018-08-28 18:11:04 +02:00
Aleksandra Malinowska
5620f76c62
Pass NoScaleUpInfo to ScaleUpStatus processor
2018-08-28 14:26:03 +02:00
Aleksandra Malinowska
cd9808185e
Report reason why pod didn't trigger scale-up
2018-08-28 14:11:36 +02:00
Aleksandra Malinowska
398a1ac153
Fix error on node info not found for group
2018-07-23 11:16:12 +02:00
Pengfei Ni
1dd0147d9e
Add more events for CA
2018-07-09 15:42:05 +08:00
Aleksandra Malinowska
800ee56b34
Refactor and extend GPU metrics error types
2018-07-05 13:13:11 +02:00
Karol Gołąb
aae4d1270a
Make GetGpuTypeForMetrics more robust
2018-06-26 21:35:16 +02:00
Karol Gołąb
5eb7021f82
Add GPU-related scaled_up & scaled_down metrics ( #974 )
...
* Add GPU-related scaled_up & scaled_down metrics
* Fix name to match SD naming convention
* Fix import after master rebase
* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski
99c8c51bb3
Create NodeGroupManager which is responsible for creating/deleting node groups.
2018-06-14 16:11:32 +02:00
Łukasz Osipiuk
b7323bc0d1
Respect GPU limits in scale_up
2018-06-14 15:46:58 +02:00
Łukasz Osipiuk
dfcbedb41f
Take into consideration nodes from not autoscaled groups when enforcing resource limits
2018-06-14 15:31:40 +02:00
Łukasz Osipiuk
9f75099d2c
Restructure checking resource limits in scale_up.go
...
Preparatory work for before introducing GPU limits
2018-06-13 19:00:37 +02:00
Pengfei Ni
be3dd85503
Update scheduler cache package
2018-06-11 13:54:12 +08:00
Łukasz Osipiuk
9c61477d25
Do not return error when getting cpu/memory capacity of node
2018-06-08 15:04:57 +02:00
Beata Skiba
b8ae6df5d3
Add post scale up status processor.
2018-06-06 13:34:49 +02:00
Maciej Pytel
856855987b
Move some GKE-specific logic outside core
...
No change in actual logic being executed. Added a new
NodeGroupListProcessor interface to encapsulate the existing logic.
Moved PodListProcessor and refactor how it's passed around
to make it consistent and easy to add similar interfaces.
2018-05-29 12:57:19 +02:00
Krzysztof Jastrzebski
6761d7f354
Execute predicates only for similar pods.
2018-05-29 09:36:11 +02:00
Karol Gołąb
4c710950de
Move ClusterStateRegistry to StaticAutoscaler
...
AutoscalingContext is basically a configuration and few static helpers
and API handles.
ClusterStateRegistry is state and thus moved to other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Joachim Bartosik
bfb70e40ee
Allow passing taints to Node Group creation.
2018-05-18 14:33:33 +02:00
Krzysztof Jastrzebski
88b769b324
Refactor cluster autoscaler builder and add pod list processor.
2018-04-26 12:37:51 +02:00
Aleksandra Malinowska
feb4ad9e14
Add utility for limiting logging
2018-03-22 12:57:22 +01:00
Marcin Wielgus
04bec08e84
Compilation fix
2018-03-20 20:11:36 +01:00
Maciej Pytel
b7f8622eb2
Create node groups with GPU in scale-up.go
...
This is still not implemented in cloudprovider.
Extended NewNodeGroup inteface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00