Commit Graph

65 Commits

Author SHA1 Message Date
Aleksandra Malinowska ff77f2cc7d Fix cleaning up taints 2018-05-11 13:51:40 +02:00
Maciej Pytel 930c210843 Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make more efficient
scale-up decision.
2018-03-05 12:29:45 +01:00
Marcin Wielgus 37ff77372a Skip iteration if pending pods are too new 2018-03-05 12:17:14 +01:00
Beata Skiba 9b56bdefb4 Remove old unregistered nodes before checking cluster healthiness 2018-02-02 15:56:47 +01:00
Marcin Wielgus ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Maciej Pytel 07511f444a Add Refresh method to cloud provider
This can be used to dynamically update cloud provider
config (in particular list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Maciej Pytel 098ebbee09 Log event when removing unregistered node 2017-09-22 22:48:07 +02:00
Maciej Pytel 5e05c84cf0 Add metric counting failed scale-ups
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Matt Terry 63310ef41a Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay 2017-09-18 17:09:44 -07:00
Marcin Wielgus 303f86c163 Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix
Move calculateUnneededOnly check after unneeded calculations
2017-09-13 11:14:51 +02:00
Krzysztof Jastrzebski d8db14701e Core/static_autoscaler_test.go unit tests. 2017-09-13 09:52:07 +02:00
Matt Terry 43943cdeb4 Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start 2017-09-12 21:38:29 -07:00
Krzysztof Jastrzebski 0aec68a46d Core/static_autoscaler.go unit tests. Current time usage refactoring. 2017-09-11 15:07:21 +02:00
Marcin Wielgus bcc8cded64 Clean up empty autoprovisioned node groups 2017-09-04 13:53:07 +02:00
Maciej Pytel 69c5ea03ce Disable MatchInterPodAffinity if there are no pods using affinity 2017-08-30 16:18:31 +02:00
Marcin Wielgus 6ad7ca21e8 Merge pull request #265 from MaciekPytel/ignore_unneded_if_min_size
Skip nodes in min-sized groups in scale-down simulation
2017-08-28 19:40:53 +05:30
Marcin Wielgus 9e2c76551f Merge pull request #263 from mwielgus/delete-in-goroutine
Run node drain/delete in a separate goroutine
2017-08-28 19:39:57 +05:30
Maciej Pytel 2f6dd8aefc Skip nodes in min-sized groups in scale-down simulation
Currently we track if those nodes can be removed and only
skip them at the execution step. Since checking if node is
unneeded is pretty expensive it's better to filter them out
early.
2017-08-28 15:48:41 +02:00
Marcin Wielgus 718e5db78e Run node drain/delete in a separate goroutine 2017-08-28 12:12:31 +02:00
Marcin Wielgus 71b4ca5461 Dont block stale downs if no nodes can be removed 2017-08-26 16:29:50 +02:00
Beata Skiba edeb522274 Add measuring of FilterOutSchedulable 2017-08-22 18:36:13 +02:00
Beata Skiba 43c9b6b06b Add cleaner function labels for metrics exporting. 2017-08-22 16:09:42 +02:00
Beata Skiba 14df1b808b Drill down scale down metrics
Split scale down duration into three parts:
1. Find nodes to remove
2. Node deletion
3. Misc operations
2017-08-18 14:17:02 +02:00
Maciej Pytel 95b5b4be94 Remove --verify-unschedulabe-pods flag
This flag was true in default setups for every platform,
we haven't heard about any user changing it to false and
after removing check on PodScheduled condition setting it
to false would basically break CA.
2017-08-16 17:31:59 +02:00
Maciej Pytel ef1241b3c6 Remove checking and resetting PodSchedulable condition
The performance cost was too high and the pods should
be filtered out by follow up checks anyway.
Check out https://github.com/kubernetes/autoscaler/issues/187
for details.
2017-08-16 17:30:11 +02:00
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Ivan Towlson 902d2414b7 Fixed typoes of name 'Kubernetes' 2017-08-03 14:20:23 +12:00
Marcin Wielgus 55d750196c Add a flag to turn off pod status condition reseting for performance tests 2017-07-24 15:53:45 +02:00
Aleksandra Malinowska 2de8ccc8e1 Change scope of scaleUp metric 2017-07-18 12:17:51 +02:00
Aleksandra Malinowska aa1771107e change scope of findUnneeded metric 2017-07-07 16:30:59 +02:00
Yusuke Kuoka 7697d5345a cluster-autoscaler: Fix scale-down when the node group auto-discovery feature is enabled
By fixing CA not to reset `StaticAutoscaler` state before each iteration so that it remembers last scale-up/down time which is used to throttle scale-down, which is causing the issue.
2017-06-22 10:25:37 +09:00
Marcin Wielgus 2cd532ebfe Don't calculate utilization and run scale down simulations for unmanaged nodes 2017-06-20 16:57:30 +02:00
Maciej Pytel fe514ed75d Make status configmap respect namespace parameter 2017-06-14 14:07:13 +02:00
Marcin Wielgus 69c77791a2 Fix error types 2017-06-12 21:26:50 +02:00
Marcin Wielgus e2e171b7b7 Enable pricing in expander factory 2017-06-09 11:09:43 -07:00
Maciej Pytel 58cdfa1702 Updated log levels in main loop 2017-05-18 14:09:15 +02:00
Maciej Pytel 3f8ca51768 Use typed errors in scale down 2017-05-18 14:09:15 +02:00
Maciej Pytel 7f5c7ed3a2 Used typed errors in scale up code
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Maciej Pytel f716a7e496 Add typed errors; add errors_total metric
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
Marcin Wielgus d9bf5aacd7 Use TemplateNodeInfo in scale up 2017-05-16 11:45:05 +02:00
Maciej Pytel 4cdf06ea94 Added CA metrics related to autoscaler execution 2017-05-11 14:51:04 +02:00
Maciej Pytel 83ef3d2be3 Added CA metrics related to cluster state 2017-05-11 13:54:04 +02:00
Yusuke Kuoka 5304e9af21 cluster-autoscaler: Fix typos in comments 2017-05-10 11:22:15 +09:00
Maciej Pytel 7e4212478a Fix error handling for updating node status 2017-04-25 17:34:23 +02:00