Maciej Pytel
930c210843
Delay scale-up including GPU request
...
Nodes with GPUs are expensive, and it's likely that a batch of pods
using them will be created together. In this case we can
wait a bit for all pods to be created to make a more efficient
scale-up decision.
2018-03-05 12:29:45 +01:00
Marcin Wielgus
37ff77372a
Skip iteration if pending pods are too new
2018-03-05 12:17:14 +01:00
Beata Skiba
9b56bdefb4
Remove old unregistered nodes before checking cluster healthiness
2018-02-02 15:56:47 +01:00
Marcin Wielgus
f8c0e20ad9
Source fix after godep update
2017-11-28 14:01:43 +01:00
Marcin Wielgus
2589c43a61
Merge pull request #469 from aleksandra-malinowska/single-unregistered-flag
...
Remove --unregistered-node-removal-time flag
2017-11-16 13:07:52 +01:00
Krzysztof Jastrzebski
6c8d3aa37d
Fix static autoscaler unit tests.
2017-11-15 16:13:18 +01:00
Aleksandra Malinowska
2ff962e53e
Remove --unregistered-node-removal-time flag
2017-11-15 11:11:30 +01:00
Marcin Wielgus
ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
...
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel
d81dca5991
Mark nodes with uninitialized GPUs as unready
2017-11-10 17:56:10 +01:00
Marcin Wielgus
439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
...
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba
2b28ac1a04
Add a workaround for scaling of VMs with GPUs
...
When a machine with a GPU becomes ready, it can take
up to 15 minutes before it reports that the GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second,
unnecessary scale-up.
The workaround sets allocatable to capacity for GPUs so that
a node waiting for its GPUs to become ready to use is
considered a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
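The workaround described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual change: the `node` type, the `assumeGPUsAllocatable` helper, and the resource name `example.com/gpu` are hypothetical, and the real code operates on Kubernetes node objects.

```go
package main

import "fmt"

// gpuResource is a placeholder resource name used only in this example.
const gpuResource = "example.com/gpu"

// node is a hypothetical, simplified view of a node's resources.
type node struct {
	Name        string
	Capacity    map[string]int64
	Allocatable map[string]int64
}

// assumeGPUsAllocatable returns a copy of n whose allocatable GPU count is
// set to its GPU capacity, so that a node still initializing its GPUs is
// treated as able to host GPU pods. The input node is not modified.
func assumeGPUsAllocatable(n node) node {
	out := node{
		Name:        n.Name,
		Capacity:    n.Capacity,
		Allocatable: make(map[string]int64, len(n.Allocatable)+1),
	}
	for k, v := range n.Allocatable {
		out.Allocatable[k] = v
	}
	if gpus, ok := n.Capacity[gpuResource]; ok && gpus > 0 {
		out.Allocatable[gpuResource] = gpus
	}
	return out
}

func main() {
	n := node{
		Name:        "gpu-node-1",
		Capacity:    map[string]int64{"cpu": 8, gpuResource: 2},
		Allocatable: map[string]int64{"cpu": 8, gpuResource: 0},
	}
	fixed := assumeGPUsAllocatable(n)
	fmt.Println("reported allocatable GPUs:", n.Allocatable[gpuResource])
	fmt.Println("assumed allocatable GPUs: ", fixed.Allocatable[gpuResource])
}
```

Working on a copy keeps the reported node state intact, which matters if other logic (such as readiness checks) still needs to see that the GPUs are not yet allocatable.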
Edward Tsang
4104a91991
more spelling fixes
2017-11-02 14:21:36 -07:00
mmerrill3
3d043f73cb
Renaming the interface function to Cleanup() for CloudProvider type
2017-11-01 12:41:13 -04:00
mmerrill3
77aa30a5c1
Fix issue 252 by implementing a channel to stop the goroutine
2017-11-01 11:00:00 -04:00
Maciej Pytel
c376ef3c87
Add metrics for autoprovisioning
2017-10-31 17:42:58 +01:00
Maciej Pytel
9c2ebccbfe
Write events when autoprovisioned nodegroup is created / deleted
2017-10-25 17:39:30 +02:00
Maciej Pytel
07511f444a
Add Refresh method to cloud provider
...
This can be used to dynamically update cloud provider
config (in particular list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
Marcin Wielgus
596f478e63
Merge pull request #414 from krzysztof-jastrzebski/resource_limit
...
Adds resource limits to cloud provider.
2017-10-23 20:38:04 +02:00
Krzysztof Jastrzebski
56ac572666
Adds resource limits to cloud provider.
2017-10-23 16:06:56 +02:00
Maciej Pytel
7b95e71315
Use GKE alpha client when autoprovisioning is enabled
2017-10-23 15:21:02 +02:00
Krzysztof Jastrzebski
d9c00e5ce1
Adds priority preemption support to cluster autoscaler.
2017-10-23 09:54:56 +02:00
Maciej Pytel
02ccba3338
Update clusterstate after scale-up
2017-10-17 16:11:25 +02:00
Maciej Pytel
3498507220
Handle nodegroup id changing upon creation
2017-10-17 14:02:46 +02:00
Marcin Wielgus
f658450b16
Merge pull request #379 from MaciekPytel/long_unregistered_node
...
Keep track of nodes that failed to register for a long time
2017-09-28 15:01:32 +02:00
Maciej Pytel
ff21b0b00c
Keep track of nodes that failed to register for a long time
...
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Marcin Wielgus
9631f0f136
Merge pull request #375 from MaciekPytel/failed_scale_up_reason
...
Add failed scale-up reason in metric
2017-09-26 19:23:47 +02:00
Maciej Pytel
e12ee88f5f
Add failed scale-up reason in metric
2017-09-26 13:40:34 +02:00
Krzysztof Jastrzebski
16e9106c07
Fix setting target size for group in core/static_autoscaler_test.go.
2017-09-26 10:58:00 +02:00
Krzysztof Jastrzebski
80a7577399
Unit tests.
2017-09-25 11:37:24 +02:00
Maciej Pytel
098ebbee09
Log event when removing unregistered node
2017-09-22 22:48:07 +02:00
Marcin Wielgus
32c4a7ba5c
Merge pull request #360 from aleksandra-malinowska/leaking-taints
...
Fix leaking taints in case of cloud provider error on node deletion
2017-09-22 21:43:55 +01:00
Maciej Pytel
5e05c84cf0
Add metric counting failed scale-ups
...
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Aleksandra Malinowska
4c31a57374
fix leaking taints in case of cloud provider error on node deletion
2017-09-22 17:55:48 +02:00
Matt Terry
63310ef41a
Introduce new flags to control scale-down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure replace scale-down-trial-interval, and scale-down-delay-after-add replaces scale-down-delay
2017-09-18 17:09:44 -07:00
Marcin Wielgus
f04113d746
Remove TargetSize() from loops iterating over nodes
2017-09-13 22:33:17 +02:00
Marcin Wielgus
303f86c163
Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix
...
Move calculateUnneededOnly check after unneeded calculations
2017-09-13 11:14:51 +02:00
Marcin Wielgus
4bed50d290
Merge pull request #331 from aleksandra-malinowska/min-cluster-cpu-memory
...
Respect minimum cores/memory limit during scale down
2017-09-13 11:12:29 +02:00
Aleksandra Malinowska
197b05b180
respect minimum cores/memory limit during scale down
2017-09-13 10:10:47 +02:00
Krzysztof Jastrzebski
d8db14701e
Core/static_autoscaler_test.go unit tests.
2017-09-13 09:52:07 +02:00
Matt Terry
43943cdeb4
Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start
2017-09-12 21:38:29 -07:00
Aleksandra Malinowska
187c02693e
Taint empty nodes to be deleted
2017-09-12 17:40:05 +02:00
Marcin Wielgus
ef730e19c5
Merge pull request #332 from krzysztof-jastrzebski/scale_up2
...
Fix filtering for autoprovisioned node groups and add unit test.
2017-09-12 16:40:30 +02:00
Krzysztof Jastrzebski
b1396c3cd1
Fix filtering for autoprovisioned node groups and add unit test.
2017-09-12 16:20:23 +02:00
Marcin Wielgus
738fb640e1
Merge pull request #330 from krzysztof-jastrzebski/core-test4
...
Core/autoscaling_context_test.go unit tests.
2017-09-12 15:07:22 +02:00
Marcin Wielgus
9d3e52551c
Merge pull request #329 from krzysztof-jastrzebski/scale_down2
...
Core/scale_down.go unit tests.
2017-09-12 13:12:46 +02:00
Marcin Wielgus
3039a0e813
Merge pull request #319 from krzysztof-jastrzebski/core-test
...
Core/static_autoscaler.go unit tests.
2017-09-12 13:11:11 +02:00
Krzysztof Jastrzebski
001ade48c9
Core/autoscaling_context_test.go unit tests.
2017-09-12 11:04:18 +02:00
Krzysztof Jastrzebski
1db2513f1f
Core/scale_down.go unit tests.
2017-09-12 10:41:19 +02:00
Beata Skiba
eba0fa2f95
Remove nodes that are not in the cluster from unremovableNodes
2017-09-11 20:01:02 +02:00
Krzysztof Jastrzebski
0aec68a46d
Core/static_autoscaler.go unit tests. Current time usage refactoring.
2017-09-11 15:07:21 +02:00