autoscaler

Commit Graph

Author	SHA1	Message	Date
Kubernetes Prow Robot	bf3a9fb52e	Merge pull request #2436 from Jeffwan/skip_first_acceptable_range_check Skip acceptable range check before it has data	2019-12-10 01:49:29 -08:00
Jakub Tużnik	f64b6cd4de	CSR: fix a bug in GetClusterSize. Currently, GetClusterSize reports the target number for all autoscaled node groups, but the actual number for _all_ node groups, even those that are not autoscaled. This commit fixes that behavior so that both target and actual size reported are from autoscaled node groups only.	2019-11-20 13:49:49 +01:00
Łukasz Osipiuk	aa53261098	More verbose logging of GCE instance create errors	2019-10-15 15:36:38 +02:00
Łukasz Osipiuk	288d4107b2	Rename GetCreatedNodesWithOutOfResourcesErrors to GetCreatedNodesWithErrors	2019-10-14 10:56:56 +02:00
Jiaxin Shan	0d278a2554	Skip acceptable range check before it has data	2019-10-09 17:59:43 -07:00
Thomas Hartland	7c17d52ec8	Invalidate node instances cache after deleting failed nodes	2019-09-30 13:56:33 +02:00
Kubernetes Prow Robot	6434df247d	Merge pull request #2304 from krzysztof-jastrzebski/fix_bug Stop disabling Cluster Autoscaler when there is no ready nodes.	2019-09-06 07:06:57 -07:00
Krzysztof Jastrzebski	839cdaaa09	Stop disabling Cluster Autoscaler when there is no ready nodes.	2019-09-06 14:45:34 +02:00
Łukasz Osipiuk	79b4614328	Use NodeDiskPressure condition instead of NodeOutOfDisk	2019-09-05 23:23:43 +02:00
devinyan	3a633de55a	Check whether nodeGroup IsNil to avoid a crash	2019-06-30 17:33:32 +08:00
Kubernetes Prow Robot	dd89fb1385	Merge pull request #2096 from frobware/fix-segv-in-updateReadinessStats Fix potential SEGV in updateReadinessStats	2019-06-11 09:00:24 -07:00
Andrew McDermott	91016a605a	Fix SEGV in updateReadinessStats. Calling cloudprovider.NodeGroupForNode(unregistered.Node) can result in a nil result for the nodegroup - handle that case.	2019-06-11 10:42:27 +01:00
Jakub Tużnik	bb382f47f9	Retain information about scale-up failures in CSR This will provide the AutoscalingStatusProcessor with information about failed scale-ups.	2019-06-05 16:53:30 +02:00
Łukasz Osipiuk	950a8a9f76	Quickly fail scaleup on all instance creation errors Change-Id: Ib918251f3e3229d882d5182a98f129b77d7731a3	2019-06-03 13:32:41 +02:00
Łukasz Osipiuk	c88f014470	Add debug log in handleOutOfResourcesErrorsForNodeGroup	2019-05-31 15:26:41 +02:00
Krzysztof Jastrzebski	4831d76288	Cache cloud provider node instances in cluster state.	2019-05-31 10:11:51 +02:00
Pengfei Ni	b721438315	Revert "Use cloudProvider.GetInstanceID() to get unregistered nodes" This reverts commit `f4ef957ecd`.	2019-03-08 10:47:26 +08:00
Pengfei Ni	f4ef957ecd	Use cloudProvider.GetInstanceID() to get unregistered nodes	2019-02-27 22:58:34 +08:00
Pengfei Ni	128729bae9	Move schedulercache to package nodeinfo	2019-02-21 12:41:08 +08:00
Łukasz Osipiuk	b5f9a9505c	Extend backoff interface with NodeInfo and error information	2019-01-09 11:25:34 +01:00
Łukasz Osipiuk	85a83b62bd	Pass nodeGroup->NodeInfo map to ClusterStateRegistry Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801	2019-01-08 15:52:00 +01:00
Łukasz Osipiuk	5cddbda693	Rename nodeGroupBackoffInfo to backoff in ClusterStateRegistry	2018-12-31 17:59:58 +01:00
Łukasz Osipiuk	2fbae197f4	Handle possible stockout/quota scale-up errors	2018-12-28 17:17:07 +01:00
Łukasz Osipiuk	9689b30ee4	Do not use time.Now() in RegisterFailedScaleUp	2018-12-28 17:17:07 +01:00
Łukasz Osipiuk	da5bef307b	Allow updating Increase for ScaleUpRequest in ClusterStateRegistry	2018-12-28 17:17:07 +01:00
lsytj0413	8ca0e71d1e	refactor(*): fix some golint warning	2018-12-24 11:07:15 +08:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead of github.com/golang/glog	2018-11-26 17:30:31 +01:00
Łukasz Osipiuk	5962354c81	Inject Backoff instance to ClusterStateRegistry on creation	2018-11-13 14:25:16 +01:00
k8s-ci-robot	7008fb50be	Merge pull request #1380 from losipiuk/lo/backoff Make Backoff interface	2018-11-07 05:13:43 -08:00
Łukasz Osipiuk	0e2c3739b7	Use NodeGroup as key in Backoff	2018-10-30 18:17:26 +01:00
Łukasz Osipiuk	55fc1e2f00	Store NodeGroup in ScaleUpRequest and ScaleDownRequest	2018-10-30 18:03:04 +01:00
Łukasz Osipiuk	e462d4420c	Extract Backoff interface	2018-10-29 23:02:13 +01:00
Łukasz Osipiuk	41b02870f8	NodeGroup.Nodes() return Instance struct instead instance name This is preparatory work for handling resource related (stockout/quota-exceeded) error conditions in CA.	2018-10-26 14:41:18 +02:00
Łukasz Osipiuk	29c22c0a3d	Store single ScaleUpRequest per node group	2018-10-18 18:27:31 +02:00
Jakub Tużnik	b105f28ebd	Add a method to determine if a node group is at its target size to CSR	2018-09-07 20:24:38 +02:00
Aleksandra Malinowska	364e2da764	Check for ready condition not true	2018-08-30 13:43:24 +02:00
Jakub Tużnik	51334f283e	Fix GetClusterSize to return actual size in line with the rest of CSR It returned the number of registered nodes, but should return the number of provisioned nodes instead.	2018-08-27 14:58:07 +02:00
Jakub Tużnik	054f0b3b90	Add AutoscalingStatusProcessor	2018-08-07 14:47:06 +02:00
Krzysztof Jastrzebski	dd1db7a0ac	Move backoff mechanism to utils.	2018-06-13 15:32:25 +02:00
Aleksandra Malinowska	820f688d2a	Update max unready nodes to 45%	2018-05-17 12:51:45 +02:00
Aleksandra Malinowska	4c594db7f8	Run spellchecker	2018-03-15 15:47:49 +01:00
Hang Yan	b4713c22d5	Fix various typos in clusterstate package	2018-02-07 16:03:51 +08:00
Aleksandra Malinowska	3894ecb470	Export unregistered node count metric	2018-01-16 16:56:40 +01:00
Maciej Pytel	53603d0a2a	Increase MaxNodeStartupTime to 15 minutes.	2017-11-13 15:14:47 +01:00
Maciej Pytel	c376ef3c87	Add metrics for autoprovisioning	2017-10-31 17:42:58 +01:00
Maciej Pytel	02ccba3338	Update clusterstate after scale-up	2017-10-17 16:11:25 +02:00
Marcin Wielgus	f658450b16	Merge pull request #379 from MaciekPytel/long_unregistered_node Keep track of nodes that failed to register for a long time	2017-09-28 15:01:32 +02:00
Maciej Pytel	ff21b0b00c	Keep track of nodes that failed to register for a long time Previously a node that failed to register and couldn't be deleted basically broke CA.	2017-09-27 16:32:04 +02:00
Maciej Pytel	e12ee88f5f	Add failed scale-up reason in metric	2017-09-26 13:40:34 +02:00
Maciej Pytel	5e05c84cf0	Add metric counting failed scale-ups A minor refactor was required to avoid cyclic imports	2017-09-22 18:12:50 +02:00


80 Commits