Commit Graph

195 Commits

Author SHA1 Message Date
Aleksandra Malinowska 07e52e6c79 Move creating cloud provider out of context 2018-07-25 13:43:47 +02:00
Aleksandra Malinowska 0976d2aa07 Move autoscaling options out of static 2018-07-25 10:52:37 +02:00
Aleksandra Malinowska 6b94d7172d Move AutoscalingOptions to config/static 2018-07-23 15:52:27 +02:00
Krzysztof Jastrzebski 2df2568841 Move removing unneeded autoprovisioned node groups to node group manager 2018-06-22 14:26:12 +02:00
Beata Skiba b8ae6df5d3 Add post scale up status processor. 2018-06-06 13:34:49 +02:00
Maciej Pytel 856855987b Move some GKE-specific logic outside core
No change in actual logic being executed. Added a new
NodeGroupListProcessor interface to encapsulate the existing logic.
Moved PodListProcessor and refactored how it's passed around
to make it consistent and easy to add similar interfaces.
2018-05-29 12:57:19 +02:00
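For illustration, a minimal sketch of the processor pattern described in the two commits above: an interface that takes the current list and returns a possibly modified one. The method signature and the simplified `Pod` type are assumptions for the example; the real cluster-autoscaler interfaces also receive an autoscaling context and use the Kubernetes API types.

```go
package processors

// Pod is a simplified stand-in for the Kubernetes pod type.
type Pod struct {
	Name string
}

// PodListProcessor filters or reorders pending pods before the scale-up
// logic looks at them.
type PodListProcessor interface {
	Process(unschedulablePods []Pod) ([]Pod, error)
}

// NoOpPodListProcessor returns the list unchanged (the default behaviour
// when no platform-specific processing is needed).
type NoOpPodListProcessor struct{}

func (NoOpPodListProcessor) Process(pods []Pod) ([]Pod, error) {
	return pods, nil
}
```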
Maciej Pytel 5faa41e683 Move PodListProcessor to new directory
It's not really a util and with more processors
coming it makes more sense to keep them in dedicated place.
2018-05-29 12:00:47 +02:00
Karol Gołąb 4c710950de Move ClusterStateRegistry to StaticAutoscaler
AutoscalingContext is basically a configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state and thus moved to other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Karol Gołąb 5bfab7d9b2 Return value moved to the caller 2018-05-18 14:59:15 +02:00
Karol Gołąb fa6f25a70a Extract ClusterStateRegistry update with its soft dependency 2018-05-18 10:25:15 +02:00
Karol Gołąb dc34b43a40 Extract another tiny method 2018-05-18 10:10:51 +02:00
Karol Gołąb 34f6a45a04 Extract method to hide a tiny bit of complexity 2018-05-18 10:01:52 +02:00
Aleksandra Malinowska d7dc3616f7
Merge pull request #868 from kgolab/kg-clean-up-010
Move metrics update to proper place
2018-05-17 14:52:18 +02:00
Karol Gołąb e31bf0bb58 Move metrics.Autoscaling after all Node-level operations & checks 2018-05-17 14:37:43 +02:00
Aleksandra Malinowska 3b6cfc7c2b
Merge pull request #870 from kgolab/kg-clean-up-012
Set lastScaleDownFailTime properly
2018-05-17 12:09:15 +02:00
MaciekPytel 444201d1e7
Merge pull request #871 from kgolab/kg-clean-up-013
Extract duplicate code into a single method
2018-05-17 11:49:49 +02:00
Karol Gołąb 400147a075 Extract duplicate code into a single method 2018-05-17 10:01:04 +02:00
Karol Gołąb b8cbdf4178 Set lastScaleDownFailTime properly - the ScaleDownError check was unreachable 2018-05-17 09:50:22 +02:00
Karol Gołąb 38a5951e22 Check glog.V once 2018-05-17 09:47:52 +02:00
Karol Gołąb ccca078a2b Move metrics update to proper place 2018-05-17 09:46:25 +02:00
MaciekPytel bc39d4dcd5
Merge pull request #842 from kgolab/kg-clean-up-008
Merge two variables into one.
2018-05-14 10:54:43 +02:00
Aleksandra Malinowska b52ec59b05 Fix cleaning up taints 2018-05-11 12:00:48 +02:00
Karol Gołąb f1f92f065e Merge two variables into one. 2018-05-10 14:32:37 +02:00
Karol Gołąb ae203ed517 Removed unused CloudProvider() method. 2018-05-08 11:23:55 +02:00
Karol Gołąb 854fcc1ff8 Remove implementation details (CleanUp) from the interface.
The CleanUp method is instead called directly from the implementation,
when required.
Test updated in a quick way since the mock we're using does not support
AtLeast(1) - thus Times(2).
2018-05-07 15:24:14 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska 7e1353a865 Ignore TPU resource in simulations 2018-04-11 12:26:22 +02:00
Maciej Pytel abbc45da2e Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make a more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
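A minimal sketch of the "wait a bit" idea from the commit above: skip the scale-up decision while every GPU-requesting pending pod is still very fresh, so the whole batch can be considered together. The threshold value, type names, and function name are illustrative assumptions, not the autoscaler's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// pendingPod is a simplified stand-in for an unschedulable pod.
type pendingPod struct {
	name        string
	requestsGPU bool
	createdAt   time.Time
}

// gpuBatchDelay is an illustrative threshold; the real delay may differ.
const gpuBatchDelay = 30 * time.Second

// shouldDelayScaleUp returns true when GPU-requesting pods exist but all of
// them are still "fresh", so it is worth waiting for the rest of the batch
// before making a scale-up decision.
func shouldDelayScaleUp(pods []pendingPod, now time.Time) bool {
	sawGPU := false
	for _, p := range pods {
		if !p.requestsGPU {
			continue
		}
		sawGPU = true
		if now.Sub(p.createdAt) >= gpuBatchDelay {
			// At least one GPU pod has already waited long enough;
			// don't hold back the scale-up any further.
			return false
		}
	}
	return sawGPU
}

func main() {
	now := time.Now()
	pods := []pendingPod{{name: "train-0", requestsGPU: true, createdAt: now.Add(-5 * time.Second)}}
	fmt.Println(shouldDelayScaleUp(pods, now)) // true: wait for the rest of the batch
}
```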
anniedy bf59e3daa5 Typo fix unneded->[unneeded] (#623)
* Update clusterstate.md

* Update scale_down.go

* Update static_autoscaler.go
2018-02-07 17:36:58 +01:00
Beata Skiba 346a5c26a9 Remove old unregistered nodes before checking cluster healthiness 2018-02-01 16:34:50 +01:00
Aleksandra Malinowska b17b6c3ec5 Wait before publishing no nodes ready after start 2018-01-16 19:04:38 +01:00
Aleksandra Malinowska 27efa05b1d Publish ClusterUnhealthy events 2018-01-16 16:56:36 +01:00
Aleksandra Malinowska 1b728d411b Publish status and metrics for empty cluster 2018-01-16 16:07:29 +01:00
Marcin Wielgus 15b10c8f67 Skip iteration if pending pods are too new 2017-12-28 16:55:44 +01:00
Marcin Wielgus ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
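A sketch of the workaround described above, assuming the standard `nvidia.com/gpu` extended resource name: if a node advertises GPU capacity but still reports zero allocatable GPUs (device plugin not ready yet), copy capacity into allocatable so the node is still treated as a valid target for GPU pods. The function name and placement are illustrative; the real code lives in cluster-autoscaler's GPU utilities and may differ in detail.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// gpuResource is the extended resource name commonly used for NVIDIA GPUs.
const gpuResource = corev1.ResourceName("nvidia.com/gpu")

// fixGPUAllocatable sets allocatable GPUs to capacity on nodes that have GPU
// capacity but have not yet reported any allocatable GPUs.
func fixGPUAllocatable(node *corev1.Node) {
	capacity, hasCapacity := node.Status.Capacity[gpuResource]
	if !hasCapacity || capacity.IsZero() {
		return // not a GPU node
	}
	allocatable := node.Status.Allocatable[gpuResource]
	if allocatable.IsZero() {
		if node.Status.Allocatable == nil {
			node.Status.Allocatable = corev1.ResourceList{}
		}
		node.Status.Allocatable[gpuResource] = capacity.DeepCopy()
	}
}

func main() {
	node := &corev1.Node{}
	node.Status.Capacity = corev1.ResourceList{gpuResource: resource.MustParse("8")}
	node.Status.Allocatable = corev1.ResourceList{}
	fixGPUAllocatable(node)
	q := node.Status.Allocatable[gpuResource]
	fmt.Println(q.String()) // 8
}
```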
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Maciej Pytel 07511f444a Add Refresh method to cloud provider
This can be used to dynamically update cloud provider
config (in particular the list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
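A narrowed-down sketch of the idea in the commit above: the cloud provider interface gains a `Refresh` hook that the main loop calls before each iteration, letting a provider (such as GKE) re-read its node-group list and min/max limits. Only `Refresh` is taken from the commit message; the other method names are illustrative placeholders, not the full cluster-autoscaler interface.

```go
package cloudprovider

// NodeGroup represents one scalable set of nodes.
type NodeGroup interface {
	Id() string
	MinSize() int
	MaxSize() int
}

// CloudProvider is a deliberately minimal version of the provider interface
// used only for this sketch.
type CloudProvider interface {
	// NodeGroups returns the node groups currently managed by the provider.
	NodeGroups() []NodeGroup

	// Refresh is called before every autoscaler iteration; implementations
	// can use it to re-discover managed node groups and their constraints.
	Refresh() error
}
```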
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Maciej Pytel 098ebbee09 Log event when removing unregistered node 2017-09-22 22:48:07 +02:00
Maciej Pytel 5e05c84cf0 Add metric counting failed scale-ups
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Matt Terry 63310ef41a Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay 2017-09-18 17:09:44 -07:00
Marcin Wielgus 303f86c163 Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix
Move calculateUnneededOnly check after unneeded calculations
2017-09-13 11:14:51 +02:00
Krzysztof Jastrzebski d8db14701e Core/static_autoscaler_test.go unit tests. 2017-09-13 09:52:07 +02:00
Matt Terry 43943cdeb4 Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start 2017-09-12 21:38:29 -07:00
Krzysztof Jastrzebski 0aec68a46d Core/static_autoscaler.go unit tests. Current time usage refactoring. 2017-09-11 15:07:21 +02:00
Marcin Wielgus bcc8cded64 Clean up empty autoprovisioned node groups 2017-09-04 13:53:07 +02:00
Maciej Pytel 69c5ea03ce Disable MatchInterPodAffinity if there are no pods using affinity 2017-08-30 16:18:31 +02:00
Marcin Wielgus 6ad7ca21e8 Merge pull request #265 from MaciekPytel/ignore_unneded_if_min_size
Skip nodes in min-sized groups in scale-down simulation
2017-08-28 19:40:53 +05:30
Marcin Wielgus 9e2c76551f Merge pull request #263 from mwielgus/delete-in-goroutine
Run node drain/delete in a separate goroutine
2017-08-28 19:39:57 +05:30
Maciej Pytel 2f6dd8aefc Skip nodes in min-sized groups in scale-down simulation
Currently we track if those nodes can be removed and only
skip them at the execution step. Since checking if a node is
unneeded is pretty expensive, it's better to filter them out
early.
2017-08-28 15:48:41 +02:00
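A minimal sketch of the early filtering described above: drop nodes whose group is already at its minimum size before running the expensive "is this node unneeded?" simulation. The types and function name are illustrative; the real code operates on Kubernetes node objects and cloud-provider node groups.

```go
package main

import "fmt"

// nodeGroup is a simplified stand-in for a cloud-provider node group.
type nodeGroup struct {
	name    string
	minSize int
	size    int
}

// filterScaleDownCandidates removes nodes whose group is at its minimum
// size, since they could never be deleted anyway.
func filterScaleDownCandidates(nodes map[string]*nodeGroup) []string {
	var candidates []string
	for nodeName, group := range nodes {
		if group.size <= group.minSize {
			continue // removing this node would violate the group's min size
		}
		candidates = append(candidates, nodeName)
	}
	return candidates
}

func main() {
	groupA := &nodeGroup{name: "pool-a", minSize: 1, size: 1}
	groupB := &nodeGroup{name: "pool-b", minSize: 1, size: 3}
	nodes := map[string]*nodeGroup{"a-0": groupA, "b-0": groupB, "b-1": groupB}
	fmt.Println(filterScaleDownCandidates(nodes)) // only b-* nodes remain (order may vary)
}
```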
Marcin Wielgus 718e5db78e Run node drain/delete in a separate goroutine 2017-08-28 12:12:31 +02:00
Marcin Wielgus 71b4ca5461 Don't block scale downs if no nodes can be removed 2017-08-26 16:29:50 +02:00
Beata Skiba edeb522274 Add measuring of FilterOutSchedulable 2017-08-22 18:36:13 +02:00
Beata Skiba 43c9b6b06b Add cleaner function labels for metrics exporting. 2017-08-22 16:09:42 +02:00
Beata Skiba 14df1b808b Drill down scale down metrics
Split scale down duration into three parts:
1. Find nodes to remove
2. Node deletion
3. Misc operations
2017-08-18 14:17:02 +02:00
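A sketch of what splitting the scale-down duration metric by phase could look like with the Prometheus Go client. The metric name, label values, and buckets here are assumptions for illustration, not the autoscaler's real metric definitions.

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// scaleDownDuration records time spent in each phase of scale-down.
var scaleDownDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "cluster_autoscaler_scale_down_duration_seconds",
		Help:    "Time spent in each phase of scale-down.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"phase"}, // e.g. "find_nodes_to_remove", "node_deletion", "misc"
)

func init() {
	prometheus.MustRegister(scaleDownDuration)
}

// observePhase records how long a single phase took, measured from start.
func observePhase(phase string, start time.Time) {
	scaleDownDuration.WithLabelValues(phase).Observe(time.Since(start).Seconds())
}
```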
Maciej Pytel 95b5b4be94 Remove --verify-unschedulable-pods flag
This flag was true in default setups on every platform,
we haven't heard of any user changing it to false, and
after removing the check on the PodScheduled condition,
setting it to false would basically break CA.
2017-08-16 17:31:59 +02:00
Maciej Pytel ef1241b3c6 Remove checking and resetting PodSchedulable condition
The performance cost was too high and the pods should
be filtered out by follow-up checks anyway.
Check out https://github.com/kubernetes/autoscaler/issues/187
for details.
2017-08-16 17:30:11 +02:00
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Ivan Towlson 902d2414b7 Fixed typos of the name 'Kubernetes' 2017-08-03 14:20:23 +12:00
Marcin Wielgus 55d750196c Add a flag to turn off pod status condition resetting for performance tests 2017-07-24 15:53:45 +02:00
Aleksandra Malinowska 2de8ccc8e1 Change scope of scaleUp metric 2017-07-18 12:17:51 +02:00
Aleksandra Malinowska aa1771107e change scope of findUnneeded metric 2017-07-07 16:30:59 +02:00
Yusuke Kuoka 7697d5345a cluster-autoscaler: Fix scale-down when the node group auto-discovery feature is enabled
Fix CA so that it no longer resets `StaticAutoscaler` state before each iteration; that state remembers the last scale-up/down time used to throttle scale-down, and losing it was causing the issue.
2017-06-22 10:25:37 +09:00
Marcin Wielgus 2cd532ebfe Don't calculate utilization and run scale down simulations for unmanaged nodes 2017-06-20 16:57:30 +02:00
Maciej Pytel fe514ed75d Make status configmap respect namespace parameter 2017-06-14 14:07:13 +02:00
Marcin Wielgus 69c77791a2 Fix error types 2017-06-12 21:26:50 +02:00
Marcin Wielgus e2e171b7b7 Enable pricing in expander factory 2017-06-09 11:09:43 -07:00
Maciej Pytel 58cdfa1702 Updated log levels in main loop 2017-05-18 14:09:15 +02:00
Maciej Pytel 3f8ca51768 Use typed errors in scale down 2017-05-18 14:09:15 +02:00
Maciej Pytel 7f5c7ed3a2 Used typed errors in scale up code
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Maciej Pytel f716a7e496 Add typed errors; add errors_total metric
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
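A sketch of the "typed errors" idea from the four commits above: every error carries a coarse type, so callers can react differently per type and an errors_total metric can be incremented with the type as a label. The exact type names and constructor signature below are illustrative, not necessarily the ones used by cluster-autoscaler.

```go
package errors

import "fmt"

// AutoscalerErrorType is a coarse classification used for metrics and handling.
type AutoscalerErrorType string

const (
	CloudProviderError AutoscalerErrorType = "cloudProviderError"
	ApiCallError       AutoscalerErrorType = "apiCallError"
	InternalError      AutoscalerErrorType = "internalError"
)

// AutoscalerError pairs an error type with a human-readable message.
type AutoscalerError struct {
	errorType AutoscalerErrorType
	msg       string
}

// NewAutoscalerError builds a typed error with a formatted message.
func NewAutoscalerError(t AutoscalerErrorType, format string, args ...interface{}) *AutoscalerError {
	return &AutoscalerError{errorType: t, msg: fmt.Sprintf(format, args...)}
}

func (e *AutoscalerError) Error() string             { return e.msg }
func (e *AutoscalerError) Type() AutoscalerErrorType { return e.errorType }
```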
Marcin Wielgus d9bf5aacd7 Use TemplateNodeInfo in scale up 2017-05-16 11:45:05 +02:00
Maciej Pytel 4cdf06ea94 Added CA metrics related to autoscaler execution 2017-05-11 14:51:04 +02:00
Maciej Pytel 83ef3d2be3 Added CA metrics related to cluster state 2017-05-11 13:54:04 +02:00
Yusuke Kuoka 5304e9af21 cluster-autoscaler: Fix typos in comments 2017-05-10 11:22:15 +09:00
Maciej Pytel 7e4212478a Fix error handling for updating node status 2017-04-25 17:34:23 +02:00
Marcin Wielgus 34eb4973f8 Fix imports in cluster autoscaler after migrating it from contrib 2017-04-18 15:42:04 +02:00
Marcin Wielgus eb3e6173d1 Cluster-autoscaler: Fix isNodeStarting 2017-03-27 23:27:14 +02:00
Maciej Pytel 72c885b800 Cluster-Autoscaler: reset scale-down on unready cluster 2017-03-22 17:17:59 +01:00
Maciej Pytel c71668a8d8 Cluster-Autoscaler: update status configmap on errors
Previously it would only update after successfully completing the main
loop, meaning the status wouldn't get updated unless the cluster
was healthy.
2017-03-15 13:22:24 +01:00
Kubernetes Submit Queue 39fa783ad7 Merge pull request https://github.com/kubernetes/contrib/pull/2451 from mwielgus/pdb-ca
Automatic merge from submit-queue

Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2

In part 1 of 2 we skip nodes that have a pod with 0 poddisruptionallowed. Part 2/2 will delete pods using evict.

cc: @jszczepkowski @MaciekPytel @davidopp @fgrzadkowski
2017-03-06 09:27:50 -08:00
Marcin Wielgus 5b4441083a Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2 2017-03-06 17:15:04 +01:00
Maciej Pytel d3bf5d3d51 Cluster-Autoscaler: log events on status configmap 2017-03-06 12:21:24 +01:00
Maciej Pytel 84f19c1e1e Cluster-Autoscaler: add flag to disable status configmap 2017-03-02 15:35:00 +01:00
Marcin Wielgus 2ffaddb7c0 Cluster-autoscaler: lint 2017-03-02 15:15:07 +01:00
Marcin Wielgus 72a47dc2b2 Cluster-autoscaler: update code for 1.6 k8s sync 2017-03-02 14:34:49 +01:00
Maciej Pytel d0196c9e1b Cluster-Autoscaler: Delete status configmap on exit 2017-02-28 17:19:23 +01:00
Maciej Pytel 497d2800ea Cluster-Autoscaler: Write status to configmap 2017-02-28 09:59:40 +01:00
Maciej Pytel 637e750246 Cluster-Autoscaler: fix segfault
StaticAutoscaler.kubeClient was uninitialized,
leading to segfaults when trying to use it. It was
also a duplicate since the client is already available
through AutoscalingContext.
2017-02-27 14:13:54 +01:00
Marcin Wielgus 83fdeb184f Cluster-autoscaler: use listers from ListersRegistry 2017-02-24 20:40:53 +01:00
Yusuke Kuoka baee799524 cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps
Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs.

The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime.

Example usage:

```
./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler --logtostderr
```

The configmap would look like:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  settings: |-
    {
      "nodeGroups": [
        {
          "minSize": 1,
          "maxSize": 2,
          "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"
        }
      ]
    }
```

Other notes:

* Make namespace default to "kube-system"
according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267

* Trigger a full-recreate on a configuration change

according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410

* Introduced `autoscaler/` and moved all the dynamic/recreatable-at-runtime parts of the autoscaler there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663)

* Extracted the core of CA(=`func Run()` in `main.go`) into `Autoscaler`

* `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change.

* Moved `scale_down*.go`, `scale_up*.go` and `utils*.go` into the `autoscaler` package accordingly because they seemed to be meant to be collocated in the same package as the core of CA (which is now implemented as `Autoscaler`)

* Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`
2017-02-24 20:36:47 +09:00