Commit Graph

172 Commits

Author SHA1 Message Date
Maciej Pytel 930c210843 Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make more efficient
scale-up decision.
2018-03-05 12:29:45 +01:00
Marcin Wielgus 37ff77372a Skip iteration if pending pods are too new 2018-03-05 12:17:14 +01:00
Beata Skiba 9b56bdefb4 Remove old unregistered nodes before checking cluster healthiness 2018-02-02 15:56:47 +01:00
Marcin Wielgus f8c0e20ad9 Source fix after godep update 2017-11-28 14:01:43 +01:00
Marcin Wielgus 2589c43a61
Merge pull request #469 from aleksandra-malinowska/single-unregistered-flag
Remove --unregistered-node-removal-time flag
2017-11-16 13:07:52 +01:00
Krzysztof Jastrzebski 6c8d3aa37d Fix unit static autoscaler unit tests. 2017-11-15 16:13:18 +01:00
Aleksandra Malinowska 2ff962e53e Remove --unregistered-node-removal-time flag 2017-11-15 11:11:30 +01:00
Marcin Wielgus ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
mmerrill3 3d043f73cb Renaming the interface function to Cleanup() for CloudProvider type 2017-11-01 12:41:13 -04:00
mmerrill3 77aa30a5c1 Fixing for issue 252 by implementing a channel to stop the go routine 2017-11-01 11:00:00 -04:00
Maciej Pytel c376ef3c87 Add metrics for autoprovisioning 2017-10-31 17:42:58 +01:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Maciej Pytel 07511f444a Add Refresh method to cloud provider
This can be used to dynamically update cloud provider
config (in particular list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
Marcin Wielgus 596f478e63 Merge pull request #414 from krzysztof-jastrzebski/resource_limit
Adds resource limits to cloud provider.
2017-10-23 20:38:04 +02:00
Krzysztof Jastrzebski 56ac572666 Adds resource limits to cloud provider. 2017-10-23 16:06:56 +02:00
Maciej Pytel 7b95e71315 Use GKE alpha client when autoprovisioning is enabled 2017-10-23 15:21:02 +02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Maciej Pytel 02ccba3338 Update clusterstate after scale-up 2017-10-17 16:11:25 +02:00
Maciej Pytel 3498507220 Handle nodegroup id changing upon creation 2017-10-17 14:02:46 +02:00
Marcin Wielgus f658450b16 Merge pull request #379 from MaciekPytel/long_unregistered_node
Keep track of nodes that failed to register for a long time
2017-09-28 15:01:32 +02:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Marcin Wielgus 9631f0f136 Merge pull request #375 from MaciekPytel/failed_scale_up_reason
Add failed scale-up reason in metric
2017-09-26 19:23:47 +02:00
Maciej Pytel e12ee88f5f Add failed scale-up reason in metric 2017-09-26 13:40:34 +02:00
Krzysztof Jastrzebski 16e9106c07 Fix setting target size for group in core/static_autoscaler_test.go. 2017-09-26 10:58:00 +02:00
Krzysztof Jastrzebski 80a7577399 Unit tests. 2017-09-25 11:37:24 +02:00
Maciej Pytel 098ebbee09 Log event when removing unregistered node 2017-09-22 22:48:07 +02:00
Marcin Wielgus 32c4a7ba5c Merge pull request #360 from aleksandra-malinowska/leaking-taints
Fix leaking taints in case of cloud provider error on node deletion
2017-09-22 21:43:55 +01:00
Maciej Pytel 5e05c84cf0 Add metric counting failed scale-ups
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Aleksandra Malinowska 4c31a57374 fix leaking taints in case of cloud provider error on node deletion 2017-09-22 17:55:48 +02:00
Matt Terry 63310ef41a Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay 2017-09-18 17:09:44 -07:00
Marcin Wielgus f04113d746 Remove TargetSize() from loops iterating over nodes 2017-09-13 22:33:17 +02:00
Marcin Wielgus 303f86c163 Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix
Move calculateUnneededOnly check after unneeded calculations
2017-09-13 11:14:51 +02:00
Marcin Wielgus 4bed50d290 Merge pull request #331 from aleksandra-malinowska/min-cluster-cpu-memory
Respect minimum cores/memory limit during scale down
2017-09-13 11:12:29 +02:00
Aleksandra Malinowska 197b05b180 respect minimum cores/memory limit during scale down 2017-09-13 10:10:47 +02:00
Krzysztof Jastrzebski d8db14701e Core/static_autoscaler_test.go unit tests. 2017-09-13 09:52:07 +02:00
Matt Terry 43943cdeb4 Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start 2017-09-12 21:38:29 -07:00
Aleksandra Malinowska 187c02693e Taint empty nodes to be deleted 2017-09-12 17:40:05 +02:00
Marcin Wielgus ef730e19c5 Merge pull request #332 from krzysztof-jastrzebski/scale_up2
Fix filtering for autoprovisioned node groups and add unit test.
2017-09-12 16:40:30 +02:00
Krzysztof Jastrzebski b1396c3cd1 Fix filtering for autoprovisioned node groups and add unit test. 2017-09-12 16:20:23 +02:00
Marcin Wielgus 738fb640e1 Merge pull request #330 from krzysztof-jastrzebski/core-test4
Core/autoscaling_context_test.go unit tests.
2017-09-12 15:07:22 +02:00
Marcin Wielgus 9d3e52551c Merge pull request #329 from krzysztof-jastrzebski/scale_down2
Core/scale_down.go unit tests.
2017-09-12 13:12:46 +02:00
Marcin Wielgus 3039a0e813 Merge pull request #319 from krzysztof-jastrzebski/core-test
Core/static_autoscaler.go unit tests.
2017-09-12 13:11:11 +02:00
Krzysztof Jastrzebski 001ade48c9 Core/autoscaling_context_test.go unit tests. 2017-09-12 11:04:18 +02:00
Krzysztof Jastrzebski 1db2513f1f Core/scale_down.go unit tests. 2017-09-12 10:41:19 +02:00
Beata Skiba eba0fa2f95 Remove nodes that are not in the cluster from unremovableNodes 2017-09-11 20:01:02 +02:00
Krzysztof Jastrzebski 0aec68a46d Core/static_autoscaler.go unit tests. Current time usage refactoring. 2017-09-11 15:07:21 +02:00