autoscaler

Author	SHA1	Message	Date
Aleksandra Malinowska	ff77f2cc7d	Fix cleaning up taints	2018-05-11 13:51:40 +02:00
Maciej Pytel	930c210843	Delay scale-up including GPU request. Nodes with GPU are expensive and it's likely a bunch of pods using them will be created in a batch. In this case we can wait a bit for all pods to be created to make more efficient scale-up decision.	2018-03-05 12:29:45 +01:00
Marcin Wielgus	37ff77372a	Skip iteration if pending pods are too new	2018-03-05 12:17:14 +01:00
Beata Skiba	9b56bdefb4	Remove old unregistered nodes before checking cluster healthiness	2018-02-02 15:56:47 +01:00
Marcin Wielgus	ded016dfd8	Merge pull request #461 from MaciekPytel/gpu_unready_fix Consider GPU nodes unready until allocatable GPU is > 0	2017-11-13 15:29:27 +01:00
Maciej Pytel	d81dca5991	Mark nodes with uninitialized GPUs as unready	2017-11-10 17:56:10 +01:00
Marcin Wielgus	439fd3c9ec	Merge pull request #411 from krzysztof-jastrzebski/priority Adds priority preemption support to cluster autoscaler.	2017-11-08 09:09:26 +01:00
Beata Skiba	2b28ac1a04	Add a workaround for scaling of VMs with GPUs. When a machine with GPU becomes ready it can take up to 15 minutes before it reports that GPU is allocatable. This can cause Cluster Autoscaler to trigger a second unnecessary scale up. The workaround sets allocatable to capacity for GPU so that a node that waits for GPUs to become ready to use will be considered as a place where pods requesting GPUs can be scheduled.	2017-11-06 16:04:22 +01:00
Edward Tsang	4104a91991	more spelling fixes	2017-11-02 14:21:36 -07:00
Maciej Pytel	9c2ebccbfe	Write events when autoprovisioned nodegroup is created / deleted	2017-10-25 17:39:30 +02:00
Maciej Pytel	07511f444a	Add Refresh method to cloud provider. This can be used to dynamically update cloud provider config (in particular list of managed NodeGroups and their min/max constraints). Add GKE implementation.	2017-10-24 18:36:29 +02:00
Krzysztof Jastrzebski	d9c00e5ce1	Adds priority preemption support to cluster autoscaler.	2017-10-23 09:54:56 +02:00
Maciej Pytel	098ebbee09	Log event when removing unregistered node	2017-09-22 22:48:07 +02:00
Maciej Pytel	5e05c84cf0	Add metric counting failed scale-ups. A minor refactor was required to avoid cyclic imports	2017-09-22 18:12:50 +02:00
Matt Terry	63310ef41a	Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay	2017-09-18 17:09:44 -07:00
Marcin Wielgus	303f86c163	Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix Move calculateUnneededOnly check after unneeded calculations	2017-09-13 11:14:51 +02:00
Krzysztof Jastrzebski	d8db14701e	Core/static_autoscaler_test.go unit tests.	2017-09-13 09:52:07 +02:00
Matt Terry	43943cdeb4	Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start	2017-09-12 21:38:29 -07:00
Krzysztof Jastrzebski	0aec68a46d	Core/static_autoscaler.go unit tests. Current time usage refactoring.	2017-09-11 15:07:21 +02:00
Marcin Wielgus	bcc8cded64	Clean up empty autoprovisioned node groups	2017-09-04 13:53:07 +02:00
Maciej Pytel	69c5ea03ce	Disable MatchInterPodAffinity if there are no pods using affinity	2017-08-30 16:18:31 +02:00
Marcin Wielgus	6ad7ca21e8	Merge pull request #265 from MaciekPytel/ignore_unneded_if_min_size Skip nodes in min-sized groups in scale-down simulation	2017-08-28 19:40:53 +05:30
Marcin Wielgus	9e2c76551f	Merge pull request #263 from mwielgus/delete-in-goroutine Run node drain/delete in a separate goroutine	2017-08-28 19:39:57 +05:30
Maciej Pytel	2f6dd8aefc	Skip nodes in min-sized groups in scale-down simulation. Currently we track if those nodes can be removed and only skip them at the execution step. Since checking if node is unneeded is pretty expensive it's better to filter them out early.	2017-08-28 15:48:41 +02:00
Marcin Wielgus	718e5db78e	Run node drain/delete in a separate goroutine	2017-08-28 12:12:31 +02:00
Marcin Wielgus	71b4ca5461	Dont block stale downs if no nodes can be removed	2017-08-26 16:29:50 +02:00
Beata Skiba	edeb522274	Add measuring of FilterOutSchedulable	2017-08-22 18:36:13 +02:00
Beata Skiba	43c9b6b06b	Add cleaner function labels for metrics exporting.	2017-08-22 16:09:42 +02:00
Beata Skiba	14df1b808b	Drill down scale down metrics. Split scale down duration into three parts: (1) find nodes to remove, (2) node deletion, (3) misc operations.	2017-08-18 14:17:02 +02:00
Maciej Pytel	95b5b4be94	Remove --verify-unschedulabe-pods flag. This flag was true in default setups for every platform, we haven't heard about any user changing it to false and after removing check on PodScheduled condition setting it to false would basically break CA.	2017-08-16 17:31:59 +02:00
Maciej Pytel	ef1241b3c6	Remove checking and resetting PodSchedulable condition. The performance cost was too high and the pods should be filtered out by follow up checks anyway. Check out https://github.com/kubernetes/autoscaler/issues/187 for details.	2017-08-16 17:30:11 +02:00
Marcin Wielgus	9116e4c08c	Compilation fix for CA after godeps update	2017-08-11 17:56:47 +02:00
Ivan Towlson	902d2414b7	Fixed typoes of name 'Kubernetes'	2017-08-03 14:20:23 +12:00
Marcin Wielgus	55d750196c	Add a flag to turn off pod status condition reseting for performance tests	2017-07-24 15:53:45 +02:00
Aleksandra Malinowska	2de8ccc8e1	Change scope of scaleUp metric	2017-07-18 12:17:51 +02:00
Aleksandra Malinowska	aa1771107e	change scope of findUnneeded metric	2017-07-07 16:30:59 +02:00
Yusuke Kuoka	7697d5345a	cluster-autoscaler: Fix scale-down when the node group auto-discovery feature is enabled. By fixing CA not to reset `StaticAutoscaler` state before each iteration so that it remembers last scale-up/down time which is used to throttle scale-down, which is causing the issue.	2017-06-22 10:25:37 +09:00
Marcin Wielgus	2cd532ebfe	Don't calculate utilization and run scale down simulations for unmanaged nodes	2017-06-20 16:57:30 +02:00
Maciej Pytel	fe514ed75d	Make status configmap respect namespace parameter	2017-06-14 14:07:13 +02:00
Marcin Wielgus	69c77791a2	Fix error types	2017-06-12 21:26:50 +02:00
Marcin Wielgus	e2e171b7b7	Enable pricing in expander factory	2017-06-09 11:09:43 -07:00
Maciej Pytel	58cdfa1702	Updated log levels in main loop	2017-05-18 14:09:15 +02:00
Maciej Pytel	3f8ca51768	Use typed errors in scale down	2017-05-18 14:09:15 +02:00
Maciej Pytel	7f5c7ed3a2	Used typed errors in scale up code. Updated some of the functions called by scale up to return new errors as required.	2017-05-18 14:09:15 +02:00
Maciej Pytel	f716a7e496	Add typed errors; add errors_total metric. To keep reasonable commit size only top-level files use new errors. Will add them in other files in next commits.	2017-05-18 14:09:15 +02:00
Marcin Wielgus	d9bf5aacd7	Use TemplateNodeInfo in scale up	2017-05-16 11:45:05 +02:00
Maciej Pytel	4cdf06ea94	Added CA metrics related to autoscaler execution	2017-05-11 14:51:04 +02:00
Maciej Pytel	83ef3d2be3	Added CA metrics related to cluster state	2017-05-11 13:54:04 +02:00
Yusuke Kuoka	5304e9af21	cluster-autoscaler: Fix typos in comments	2017-05-10 11:22:15 +09:00
Maciej Pytel	7e4212478a	Fix error handling for updating node status	2017-04-25 17:34:23 +02:00

65 Commits