autoscaler

Commit Graph

Author	SHA1	Message	Date
Łukasz Osipiuk	a849ead286	Precompute inter pod equivalence groups in checkPodsSchedulableOnNode	2019-05-29 18:05:52 +02:00
Chris Bradfield	92ea680f1a	Implement an --ignore-taint flag This change adds support for a user to specify taints to ignore when considering a node as a template for a node group.	2019-05-14 10:22:59 -07:00
Jiaxin Shan	90666881d3	Move GPULabel and GPUTypes to cloud provider	2019-03-25 13:03:01 -07:00
Andrew McDermott	5ae76ea66e	UPSTREAM: <carry>: fix max cluster size calculation on scale up When scaling up the calculation for computing the maximum cluster size does not take into account the number of any upcoming nodes and it is possible to grow the cluster beyond the cluster size (--max-nodes-total). Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1670695	2019-03-08 13:28:58 +00:00
Pengfei Ni	128729bae9	Move schedulercache to package nodeinfo	2019-02-21 12:41:08 +08:00
Vivek Bagade	79ef3a6940	unexporting methods in utils.go	2019-01-25 00:06:03 +05:30
Łukasz Osipiuk	85a83b62bd	Pass nodeGroup->NodeInfo map to ClusterStateRegistry Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801	2019-01-08 15:52:00 +01:00
Kubernetes Prow Robot	4002559a4c	Merge pull request #1516 from frobware/fix-max-nodes-total-upstream fix calculation of max cluster size	2019-01-03 10:02:38 -08:00
Maciej Pytel	3f0da8947a	Use listers in scale-up	2019-01-02 15:56:01 +01:00
Kubernetes Prow Robot	ab7f1e69be	Merge pull request #1464 from losipiuk/lo/stockouts2 Better quota-exceeded/stockout handling	2018-12-31 05:28:08 -08:00
Łukasz Osipiuk	9689b30ee4	Do not use time.Now() in RegisterFailedScaleUp	2018-12-28 17:17:07 +01:00
Łukasz Osipiuk	da5bef307b	Allow updating Increase for ScaleUpRequest in ClusterStateRegistry	2018-12-28 17:17:07 +01:00
Maciej Pytel	60babe7158	Use kubernetes lister for daemonset instead of custom one Also migrate to using apps/v1.DaemonSet instead of old extensions/v1beta1.	2018-12-28 13:55:41 +01:00
Andrew McDermott	5bc77f051c	UPSTREAM: <carry>: fix calculation of max cluster size When scaling up, the calculation for the maximum size of the cluster based on `--max-nodes-total` doesn't take into account any nodes that are in the process of coming up. This allows the cluster to grow beyond the size specified. With this change I now see: scale_up.go:266] 21 other pods are also unschedulable scale_up.go:423] Best option to resize: openshift-cluster-api/amcdermo-ca-worker-us-east-2b scale_up.go:427] Estimated 18 nodes needed in openshift-cluster-api/amcdermo-ca-worker-us-east-2b scale_up.go:432] Capping size to max cluster total size (23) static_autoscaler.go:275] Failed to scale up: max node total count already reached	2018-12-18 17:05:19 +00:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead github.com/golang/glog	2018-11-26 17:30:31 +01:00
k8s-ci-robot	7008fb50be	Merge pull request #1380 from losipiuk/lo/backoff Make Backoff interface	2018-11-07 05:13:43 -08:00
Aleksandra Malinowska	6febc1ddb0	Fix formatted log messages	2018-11-06 14:51:43 +01:00
Aleksandra Malinowska	bf6ff4be8e	Clean up estimators	2018-11-06 14:15:42 +01:00
Łukasz Osipiuk	0e2c3739b7	Use NodeGroup as key in Backoff	2018-10-30 18:17:26 +01:00
Łukasz Osipiuk	55fc1e2f00	Store NodeGroup in ScaleUpRequest and ScaleDownRequest	2018-10-30 18:03:04 +01:00
Maciej Pytel	6f5e6aab6f	Move node group balancing to processor The goal is to allow customization of this logic for different use-case and cloudproviders.	2018-10-25 14:04:05 +02:00
Łukasz Osipiuk	a266420f6a	Recalculate clusterStateRegistry after adding multiple node groups	2018-10-02 17:15:20 +02:00
Łukasz Osipiuk	437efe4af6	If possible use nodeInfo based on created node group	2018-10-02 15:46:45 +02:00
Jakub Tużnik	8179e4e716	Refactor the scale-(up|down) status processors so that they have more info available Replace the simple boolean ScaledUp property of ScaleUpStatus with a more comprehensive ScaleUpResult. Add more possible values to ScaleDownResult. Refactor the processors execution so that they are always executed every iteration, even if RunOnce exits earlier.	2018-09-20 17:12:02 +02:00
Łukasz Osipiuk	bf8cfef10b	NodeGroupManager.CreateNodeGroup can return extra created node groups.	2018-09-19 13:55:51 +02:00
Łukasz Osipiuk	705a6d87e2	fixup! Call CheckPodsSchedulableOnNode in scale_up.go via caching layer	2018-09-17 13:01:19 +02:00
Łukasz Osipiuk	0ad4efe920	Call CheckPodsSchedulableOnNode in scale_up.go via caching layer	2018-09-13 17:01:15 +02:00
Aleksandra Malinowska	b88e6019f7	code review fixes 3	2018-08-28 18:11:04 +02:00
Aleksandra Malinowska	5620f76c62	Pass NoScaleUpInfo to ScaleUpStatus processor	2018-08-28 14:26:03 +02:00
Aleksandra Malinowska	cd9808185e	Report reason why pod didn't trigger scale-up	2018-08-28 14:11:36 +02:00
Aleksandra Malinowska	398a1ac153	Fix error on node info not found for group	2018-07-23 11:16:12 +02:00
Pengfei Ni	1dd0147d9e	Add more events for CA	2018-07-09 15:42:05 +08:00
Aleksandra Malinowska	800ee56b34	Refactor and extend GPU metrics error types	2018-07-05 13:13:11 +02:00
Karol Gołąb	aae4d1270a	Make GetGpuTypeForMetrics more robust	2018-06-26 21:35:16 +02:00
Karol Gołąb	5eb7021f82	Add GPU-related scaled_up & scaled_down metrics (#974) * Add GPU-related scaled_up & scaled_down metrics * Fix name to match SD naming convention * Fix import after master rebase * Change the logic to include GPU-being-installed nodes	2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski	99c8c51bb3	Create NodeGroupManager which is responsible for creating/deleting node groups.	2018-06-14 16:11:32 +02:00
Łukasz Osipiuk	b7323bc0d1	Respect GPU limits in scale_up	2018-06-14 15:46:58 +02:00
Łukasz Osipiuk	dfcbedb41f	Take into consideration nodes from not autoscaled groups when enforcing resource limits	2018-06-14 15:31:40 +02:00
Łukasz Osipiuk	9f75099d2c	Restructure checking resource limits in scale_up.go Preparatory work for before introducing GPU limits	2018-06-13 19:00:37 +02:00
Pengfei Ni	be3dd85503	Update scheduler cache package	2018-06-11 13:54:12 +08:00
Łukasz Osipiuk	9c61477d25	Do not return error when getting cpu/memory capacity of node	2018-06-08 15:04:57 +02:00
Beata Skiba	b8ae6df5d3	Add post scale up status processor.	2018-06-06 13:34:49 +02:00
Maciej Pytel	856855987b	Move some GKE-specific logic outside core No change in actual logic being executed. Added a new NodeGroupListProcessor interface to encapsulate the existing logic. Moved PodListProcessor and refactor how it's passed around to make it consistent and easy to add similar interfaces.	2018-05-29 12:57:19 +02:00
Krzysztof Jastrzebski	6761d7f354	Execute predicates only for similar pods.	2018-05-29 09:36:11 +02:00
Karol Gołąb	4c710950de	Move ClusterStateRegistry to StaticAutoscaler AutoscalingContext is basically a configuration and few static helpers and API handles. ClusterStateRegistry is state and thus moved to other state-keeping objects.	2018-05-24 13:03:01 +02:00
Joachim Bartosik	bfb70e40ee	Allow passing taints to Node Group creation.	2018-05-18 14:33:33 +02:00
Krzysztof Jastrzebski	88b769b324	Refactor cluster autoscaler builder and add pod list processor.	2018-04-26 12:37:51 +02:00
Aleksandra Malinowska	feb4ad9e14	Add utility for limiting logging	2018-03-22 12:57:22 +01:00
Marcin Wielgus	04bec08e84	Compilation fix	2018-03-20 20:11:36 +01:00
Maciej Pytel	b7f8622eb2	Create node groups with GPU in scale-up.go This is still not implemented in cloudprovider. Extended NewNodeGroup interface to have a way of passing parameters for more complex resources.	2017-12-11 13:12:22 +01:00

84 Commits