Aleksandra Malinowska
ff77f2cc7d
Fix cleaning up taints
2018-05-11 13:51:40 +02:00
Maciej Pytel
930c210843
Delay scale-up including GPU request
...
Nodes with GPUs are expensive, and it's likely that a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make a more efficient
scale-up decision.
2018-03-05 12:29:45 +01:00
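The delay described in this commit can be illustrated with a minimal sketch. It is not the autoscaler's actual code: the `Pod` type, the `shouldDelayScaleUp` function, and the delay values in `main` are all hypothetical, chosen only to show the idea of waiting longer when a pending pod requests a GPU.

```go
package main

import (
	"fmt"
	"time"
)

// Pod is a hypothetical, simplified view of a pending pod.
type Pod struct {
	Name        string
	Age         time.Duration // time since the pod was created
	RequestsGPU bool
}

// shouldDelayScaleUp reports whether scale-up should be postponed because
// at least one pending pod is younger than the delay that applies to it.
// Pods requesting a GPU use gpuDelay; all other pods use baseDelay.
func shouldDelayScaleUp(pods []Pod, baseDelay, gpuDelay time.Duration) bool {
	for _, p := range pods {
		delay := baseDelay
		if p.RequestsGPU {
			delay = gpuDelay
		}
		if p.Age < delay {
			return true
		}
	}
	return false
}

func main() {
	pods := []Pod{
		{Name: "web", Age: 10 * time.Second},
		{Name: "trainer", Age: 10 * time.Second, RequestsGPU: true},
	}
	// Assumed delays for illustration only: 2s for regular pods, 30s for GPU pods.
	fmt.Println(shouldDelayScaleUp(pods, 2*time.Second, 30*time.Second))
}
```

Passing the pod age in directly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.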
Marcin Wielgus
37ff77372a
Skip iteration if pending pods are too new
2018-03-05 12:17:14 +01:00
Beata Skiba
9b56bdefb4
Remove old unregistered nodes before checking cluster healthiness
2018-02-02 15:56:47 +01:00
Marcin Wielgus
ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
...
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel
d81dca5991
Mark nodes with uninitialized GPUs as unready
2017-11-10 17:56:10 +01:00
Marcin Wielgus
439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
...
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba
2b28ac1a04
Add a workaround for scaling of VMs with GPUs
...
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
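A minimal sketch of the workaround described above, assuming a simplified node model. The `Node` type, the `withGPUAllocatable` function, and the `gpuResource` name are hypothetical and exist only for illustration; the real change operates on Kubernetes node objects.

```go
package main

import "fmt"

// gpuResource is an assumed resource name used only in this sketch.
const gpuResource = "nvidia.com/gpu"

// Node is a hypothetical, simplified stand-in for a node's resource status.
type Node struct {
	Name        string
	Capacity    map[string]int64
	Allocatable map[string]int64
}

// withGPUAllocatable returns a copy of the node in which GPU allocatable is
// set to GPU capacity when the node has GPU capacity but does not yet report
// any allocatable GPUs. Nodes that need no change are returned as-is.
func withGPUAllocatable(n Node) Node {
	gpuCapacity := n.Capacity[gpuResource]
	if gpuCapacity == 0 || n.Allocatable[gpuResource] > 0 {
		return n
	}
	// Copy the allocatable map so the caller's node is not mutated.
	patched := Node{Name: n.Name, Capacity: n.Capacity, Allocatable: map[string]int64{}}
	for name, quantity := range n.Allocatable {
		patched.Allocatable[name] = quantity
	}
	patched.Allocatable[gpuResource] = gpuCapacity
	return patched
}

func main() {
	n := Node{
		Name:        "gpu-node-1",
		Capacity:    map[string]int64{"cpu": 8, gpuResource: 2},
		Allocatable: map[string]int64{"cpu": 8}, // GPUs not yet reported
	}
	fmt.Println(withGPUAllocatable(n).Allocatable[gpuResource])
}
```

Returning a patched copy rather than editing the node in place keeps the original status available to callers that need to know what the node actually reported.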
Edward Tsang
4104a91991
more spelling fixes
2017-11-02 14:21:36 -07:00
Maciej Pytel
9c2ebccbfe
Write events when autoprovisioned nodegroup is created / deleted
2017-10-25 17:39:30 +02:00
Maciej Pytel
07511f444a
Add Refresh method to cloud provider
...
This can be used to dynamically update cloud provider
config (in particular list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
Krzysztof Jastrzebski
d9c00e5ce1
Adds priority preemption support to cluster autoscaler.
2017-10-23 09:54:56 +02:00
Maciej Pytel
098ebbee09
Log event when removing unregistered node
2017-09-22 22:48:07 +02:00
Maciej Pytel
5e05c84cf0
Add metric counting failed scale-ups
...
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Matt Terry
63310ef41a
Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure replace scale-down-trial-interval, and scale-down-delay-after-add replaces scale-down-delay
2017-09-18 17:09:44 -07:00
Marcin Wielgus
303f86c163
Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix
...
Move calculateUnneededOnly check after unneeded calculations
2017-09-13 11:14:51 +02:00
Krzysztof Jastrzebski
d8db14701e
Core/static_autoscaler_test.go unit tests.
2017-09-13 09:52:07 +02:00
Matt Terry
43943cdeb4
Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start
2017-09-12 21:38:29 -07:00
Krzysztof Jastrzebski
0aec68a46d
Core/static_autoscaler.go unit tests. Current time usage refactoring.
2017-09-11 15:07:21 +02:00
Marcin Wielgus
bcc8cded64
Clean up empty autoprovisioned node groups
2017-09-04 13:53:07 +02:00
Maciej Pytel
69c5ea03ce
Disable MatchInterPodAffinity if there are no pods using affinity
2017-08-30 16:18:31 +02:00
Marcin Wielgus
6ad7ca21e8
Merge pull request #265 from MaciekPytel/ignore_unneded_if_min_size
...
Skip nodes in min-sized groups in scale-down simulation
2017-08-28 19:40:53 +05:30
Marcin Wielgus
9e2c76551f
Merge pull request #263 from mwielgus/delete-in-goroutine
...
Run node drain/delete in a separate goroutine
2017-08-28 19:39:57 +05:30
Maciej Pytel
2f6dd8aefc
Skip nodes in min-sized groups in scale-down simulation
...
Currently we track whether those nodes can be removed and only
skip them at the execution step. Since checking whether a node is
unneeded is pretty expensive, it's better to filter them out
early.
2017-08-28 15:48:41 +02:00
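The early filtering described in this commit might look roughly like the sketch below. The `NodeGroup` and `Node` types and the `filterScaleDownCandidates` function are hypothetical simplifications, not the autoscaler's own types.

```go
package main

import "fmt"

// NodeGroup is a hypothetical, simplified node group with a size constraint.
type NodeGroup struct {
	Name        string
	MinSize     int
	CurrentSize int
}

// Node is a hypothetical node that belongs to exactly one node group.
type Node struct {
	Name  string
	Group NodeGroup
}

// filterScaleDownCandidates drops nodes whose group is already at (or below)
// its minimum size, so the expensive "is this node unneeded?" simulation
// only runs for nodes that could actually be removed.
func filterScaleDownCandidates(nodes []Node) []Node {
	candidates := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Group.CurrentSize <= n.Group.MinSize {
			continue // removing this node would violate the group's minimum
		}
		candidates = append(candidates, n)
	}
	return candidates
}

func main() {
	atMin := NodeGroup{Name: "pool-a", MinSize: 1, CurrentSize: 1}
	aboveMin := NodeGroup{Name: "pool-b", MinSize: 1, CurrentSize: 3}
	nodes := []Node{
		{Name: "a-1", Group: atMin},
		{Name: "b-1", Group: aboveMin},
		{Name: "b-2", Group: aboveMin},
	}
	for _, n := range filterScaleDownCandidates(nodes) {
		fmt.Println(n.Name)
	}
}
```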
Marcin Wielgus
718e5db78e
Run node drain/delete in a separate goroutine
2017-08-28 12:12:31 +02:00
Marcin Wielgus
71b4ca5461
Don't block scale downs if no nodes can be removed
2017-08-26 16:29:50 +02:00
Beata Skiba
edeb522274
Add measuring of FilterOutSchedulable
2017-08-22 18:36:13 +02:00
Beata Skiba
43c9b6b06b
Add cleaner function labels for metrics exporting.
2017-08-22 16:09:42 +02:00
Beata Skiba
14df1b808b
Drill down scale down metrics
...
Split scale down duration into three parts:
1. Find nodes to remove
2. Node deletion
3. Misc operations
2017-08-18 14:17:02 +02:00
Maciej Pytel
95b5b4be94
Remove --verify-unschedulable-pods flag
...
This flag was true in default setups for every platform, and
we haven't heard about any user changing it to false. After
removing the check on the PodScheduled condition, setting it
to false would basically break CA.
2017-08-16 17:31:59 +02:00
Maciej Pytel
ef1241b3c6
Remove checking and resetting PodSchedulable condition
...
The performance cost was too high and the pods should
be filtered out by follow up checks anyway.
Check out https://github.com/kubernetes/autoscaler/issues/187
for details.
2017-08-16 17:30:11 +02:00
Marcin Wielgus
9116e4c08c
Compilation fix for CA after godeps update
2017-08-11 17:56:47 +02:00
Ivan Towlson
902d2414b7
Fixed typos of name 'Kubernetes'
2017-08-03 14:20:23 +12:00
Marcin Wielgus
55d750196c
Add a flag to turn off pod status condition resetting for performance tests
2017-07-24 15:53:45 +02:00
Aleksandra Malinowska
2de8ccc8e1
Change scope of scaleUp metric
2017-07-18 12:17:51 +02:00
Aleksandra Malinowska
aa1771107e
Change scope of findUnneeded metric
2017-07-07 16:30:59 +02:00
Yusuke Kuoka
7697d5345a
cluster-autoscaler: Fix scale-down when the node group auto-discovery feature is enabled
...
Fix CA so that it does not reset `StaticAutoscaler` state before each iteration. It now remembers the last scale-up/down time, which is used to throttle scale-down; resetting that state was causing the issue.
2017-06-22 10:25:37 +09:00
Marcin Wielgus
2cd532ebfe
Don't calculate utilization and run scale down simulations for unmanaged nodes
2017-06-20 16:57:30 +02:00
Maciej Pytel
fe514ed75d
Make status configmap respect namespace parameter
2017-06-14 14:07:13 +02:00
Marcin Wielgus
69c77791a2
Fix error types
2017-06-12 21:26:50 +02:00
Marcin Wielgus
e2e171b7b7
Enable pricing in expander factory
2017-06-09 11:09:43 -07:00
Maciej Pytel
58cdfa1702
Updated log levels in main loop
2017-05-18 14:09:15 +02:00
Maciej Pytel
3f8ca51768
Use typed errors in scale down
2017-05-18 14:09:15 +02:00
Maciej Pytel
7f5c7ed3a2
Use typed errors in scale up code
...
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Maciej Pytel
f716a7e496
Add typed errors; add errors_total metric
...
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
Marcin Wielgus
d9bf5aacd7
Use TemplateNodeInfo in scale up
2017-05-16 11:45:05 +02:00
Maciej Pytel
4cdf06ea94
Added CA metrics related to autoscaler execution
2017-05-11 14:51:04 +02:00
Maciej Pytel
83ef3d2be3
Added CA metrics related to cluster state
2017-05-11 13:54:04 +02:00
Yusuke Kuoka
5304e9af21
cluster-autoscaler: Fix typos in comments
2017-05-10 11:22:15 +09:00
Maciej Pytel
7e4212478a
Fix error handling for updating node status
2017-04-25 17:34:23 +02:00