63 Commits

Author SHA1 Message Date
Łukasz Osipiuk a266420f6a Recalculate clusterStateRegistry after adding multiple node groups 2018-10-02 17:15:20 +02:00
Łukasz Osipiuk 437efe4af6 If possible use nodeInfo based on created node group 2018-10-02 15:46:45 +02:00
Jakub Tużnik 8179e4e716 Refactor the scale-(up|down) status processors so that they have more info available
Replace the simple boolean ScaledUp property of ScaleUpStatus with a more
comprehensive ScaleUpResult. Add more possible values to ScaleDownResult.
Refactor the processors execution so that they are always executed every
iteration, even if RunOnce exits earlier.
2018-09-20 17:12:02 +02:00
Łukasz Osipiuk bf8cfef10b NodeGroupManager.CreateNodeGroup can return extra created node groups. 2018-09-19 13:55:51 +02:00
Łukasz Osipiuk 705a6d87e2 fixup! Call CheckPodsSchedulableOnNode in scale_up.go via caching layer 2018-09-17 13:01:19 +02:00
Łukasz Osipiuk 0ad4efe920 Call CheckPodsSchedulableOnNode in scale_up.go via caching layer 2018-09-13 17:01:15 +02:00
Aleksandra Malinowska b88e6019f7 code review fixes 3 2018-08-28 18:11:04 +02:00
Aleksandra Malinowska 5620f76c62 Pass NoScaleUpInfo to ScaleUpStatus processor 2018-08-28 14:26:03 +02:00
Aleksandra Malinowska cd9808185e Report reason why pod didn't trigger scale-up 2018-08-28 14:11:36 +02:00
Aleksandra Malinowska 398a1ac153 Fix error on node info not found for group 2018-07-23 11:16:12 +02:00
Pengfei Ni 1dd0147d9e Add more events for CA 2018-07-09 15:42:05 +08:00
Aleksandra Malinowska 800ee56b34 Refactor and extend GPU metrics error types 2018-07-05 13:13:11 +02:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
Karol Gołąb 5eb7021f82 Add GPU-related scaled_up & scaled_down metrics (#974)
* Add GPU-related scaled_up & scaled_down metrics

* Fix name to match SD naming convention

* Fix import after master rebase

* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski 99c8c51bb3 Create NodeGroupManager which is responsible for creating/deleting node groups. 2018-06-14 16:11:32 +02:00
Łukasz Osipiuk b7323bc0d1 Respect GPU limits in scale_up 2018-06-14 15:46:58 +02:00
Łukasz Osipiuk dfcbedb41f Take into consideration nodes from not autoscaled groups when enforcing resource limits 2018-06-14 15:31:40 +02:00
Łukasz Osipiuk 9f75099d2c Restructure checking resource limits in scale_up.go
Preparatory work before introducing GPU limits
2018-06-13 19:00:37 +02:00
Pengfei Ni be3dd85503 Update scheduler cache package 2018-06-11 13:54:12 +08:00
Łukasz Osipiuk 9c61477d25 Do not return error when getting cpu/memory capacity of node 2018-06-08 15:04:57 +02:00
Beata Skiba b8ae6df5d3 Add post scale up status processor. 2018-06-06 13:34:49 +02:00
Maciej Pytel 856855987b Move some GKE-specific logic outside core
No change in actual logic being executed. Added a new
NodeGroupListProcessor interface to encapsulate the existing logic.
Moved PodListProcessor and refactor how it's passed around
to make it consistent and easy to add similar interfaces.
2018-05-29 12:57:19 +02:00
Krzysztof Jastrzebski 6761d7f354 Execute predicates only for similar pods. 2018-05-29 09:36:11 +02:00
Karol Gołąb 4c710950de Move ClusterStateRegistry to StaticAutoscaler
AutoscalingContext is basically a configuration and few static helpers
and API handles.
ClusterStateRegistry is state and thus moved to other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Joachim Bartosik bfb70e40ee Allow passing taints to Node Group creation. 2018-05-18 14:33:33 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska feb4ad9e14 Add utility for limiting logging 2018-03-22 12:57:22 +01:00
Marcin Wielgus 04bec08e84 Compilation fix 2018-03-20 20:11:36 +01:00
Maciej Pytel b7f8622eb2 Create node groups with GPU in scale-up.go
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00
Maciej Pytel c376ef3c87 Add metrics for autoprovisioning 2017-10-31 17:42:58 +01:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Krzysztof Jastrzebski 56ac572666 Adds resource limits to cloud provider. 2017-10-23 16:06:56 +02:00
Maciej Pytel 02ccba3338 Update clusterstate after scale-up 2017-10-17 16:11:25 +02:00
Maciej Pytel 3498507220 Handle nodegroup id changing upon creation 2017-10-17 14:02:46 +02:00
Maciej Pytel e12ee88f5f Add failed scale-up reason in metric 2017-09-26 13:40:34 +02:00
Aleksandra Malinowska 197b05b180 respect minimum cores/memory limit during scale down 2017-09-13 10:10:47 +02:00
Krzysztof Jastrzebski b1396c3cd1 Fix filtering for autoprovisioned node groups and add unit test. 2017-09-12 16:20:23 +02:00
Aleksandra Malinowska d43029c180 implement blocking scale up beyond max cores & memory 2017-09-08 12:50:00 +02:00
Marcin Wielgus e85e94510d Tests for add autoprovisioned node groups 2017-09-06 02:44:16 +02:00
Marcin Wielgus 1ad8d9e10c Build template NodeInfo for node autoprovisioning 2017-09-05 17:28:49 +02:00
Sergey Lanzman 437a3f60e1 Small optimize code 2017-09-04 23:50:45 +03:00
Marcin Wielgus ae00f0544b Merge pull request #290 from mwielgus/max-nap-groups
Limit autoprovisioned groups to 15
2017-09-01 23:49:33 +05:30
Marcin Wielgus de524a6688 Limit autoprovisioned groups to 15 2017-09-01 18:25:28 +02:00
Maciej Pytel a86268f114 Write event on scale-up failure 2017-09-01 13:34:20 +02:00
Marcin Wielgus f217d4ac93 Do not return error from exist 2017-09-01 00:24:01 +02:00
Marcin Wielgus 22f856d4da Small refactoring in ScaleUp 2017-08-31 13:21:20 +02:00
Marcin Wielgus 6b9e56f0f9 Node autoprovisioning in scale up 2017-08-31 01:33:52 +02:00
Maciej Pytel 281afa7147 precompute predicateMetadata in scale-down 2017-08-29 16:29:45 +02:00
Maciej Pytel fb6ef75d12 Don't create verbose errors in predicates if we ignore them
Turns out all this string formatting is pretty damn expensive.
2017-08-24 15:18:38 +02:00
Maciej Pytel 6aacbb5bf7 Backoff for node group after failed scale-up 2017-08-04 15:40:23 +02:00