80 Commits

Author SHA1 Message Date
Kubernetes Prow Robot bf3a9fb52e
Merge pull request #2436 from Jeffwan/skip_first_acceptable_range_check
Skip acceptable range check before it has data
2019-12-10 01:49:29 -08:00
Jakub Tużnik f64b6cd4de CSR: fix a bug in GetClusterSize
Currently, GetClusterSize reports the target number for all autoscaled
node groups, but the actual number for _all_ node groups, even those
that are not autoscaled. This commit fixes that behavior so that both
target and actual size reported are from autoscaled node groups only.
2019-11-20 13:49:49 +01:00
Łukasz Osipiuk aa53261098 More verbose logging of GCE instance create errors 2019-10-15 15:36:38 +02:00
Łukasz Osipiuk 288d4107b2 Rename GetCreatedNodesWithOutOfResourcesErrors to GetCreatedNodesWithErrors 2019-10-14 10:56:56 +02:00
Jiaxin Shan 0d278a2554 Skip acceptable range check before it has data 2019-10-09 17:59:43 -07:00
Thomas Hartland 7c17d52ec8 Invalidate node instances cache after deleting failed nodes 2019-09-30 13:56:33 +02:00
Kubernetes Prow Robot 6434df247d
Merge pull request #2304 from krzysztof-jastrzebski/fix_bug
Stop disabling Cluster Autoscaler when there is no ready nodes.
2019-09-06 07:06:57 -07:00
Krzysztof Jastrzebski 839cdaaa09 Stop disabling Cluster Autoscaler when there is no ready nodes. 2019-09-06 14:45:34 +02:00
Łukasz Osipiuk 79b4614328 Use NodeDiskPressure condition instead of NodeOutOfDisk 2019-09-05 23:23:43 +02:00
devinyan 3a633de55a nodeGroup judy IsNil to avoid crashed 2019-06-30 17:33:32 +08:00
Kubernetes Prow Robot dd89fb1385
Merge pull request #2096 from frobware/fix-segv-in-updateReadinessStats
Fix potential SEGV in updateReadinessStats
2019-06-11 09:00:24 -07:00
Andrew McDermott 91016a605a Fix SEGV in updateReadinessStats
Calling cloudprovider.NodeGroupForNode(unregistered.Node) can result
in a nil result for the nodegroup - handle that case.
2019-06-11 10:42:27 +01:00
Jakub Tużnik bb382f47f9 Retain information about scale-up failures in CSR
This will provide the AutoscalingStatusProcessor with information
about failed scale-ups.
2019-06-05 16:53:30 +02:00
Łukasz Osipiuk 950a8a9f76 Quickly fail scaleup on all instance creation errors
Change-Id: Ib918251f3e3229d882d5182a98f129b77d7731a3
2019-06-03 13:32:41 +02:00
Łukasz Osipiuk c88f014470 Add debug log in handleOutOfResourcesErrorsForNodeGroup 2019-05-31 15:26:41 +02:00
Krzysztof Jastrzebski 4831d76288 Cache cloud provider node instances in cluster state. 2019-05-31 10:11:51 +02:00
Pengfei Ni b721438315 Revert "Use cloudProvider.GetInstanceID() to get unregistered nodes"
This reverts commit f4ef957ecd.
2019-03-08 10:47:26 +08:00
Pengfei Ni f4ef957ecd Use cloudProvider.GetInstanceID() to get unregistered nodes 2019-02-27 22:58:34 +08:00
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00
Łukasz Osipiuk b5f9a9505c Extend backoff interface with NodeInfo and error information 2019-01-09 11:25:34 +01:00
Łukasz Osipiuk 85a83b62bd Pass nodeGroup->NodeInfo map to ClusterStateRegistry
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Łukasz Osipiuk 5cddbda693 Rename nodeGroupBackoffInfo to backoff in ClusterStateRegistry 2018-12-31 17:59:58 +01:00
Łukasz Osipiuk 2fbae197f4 Handle possible stockout/quota scale-up errors 2018-12-28 17:17:07 +01:00
Łukasz Osipiuk 9689b30ee4 Do not use time.Now() in RegisterFailedScaleUp 2018-12-28 17:17:07 +01:00
Łukasz Osipiuk da5bef307b Allow updating Increase for ScaleUpRequest in ClusterStateRegistry 2018-12-28 17:17:07 +01:00
lsytj0413 8ca0e71d1e refactor(*): fix some golint warning 2018-12-24 11:07:15 +08:00
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead github.com/golang/glog 2018-11-26 17:30:31 +01:00
Łukasz Osipiuk 5962354c81 Inject Backoff instance to ClusterStateRegistry on creation 2018-11-13 14:25:16 +01:00
k8s-ci-robot 7008fb50be
Merge pull request #1380 from losipiuk/lo/backoff
Make Backoff interface
2018-11-07 05:13:43 -08:00
Łukasz Osipiuk 0e2c3739b7 Use NodeGroup as key in Backoff 2018-10-30 18:17:26 +01:00
Łukasz Osipiuk 55fc1e2f00 Store NodeGroup in ScaleUpRequest and ScaleDownRequest 2018-10-30 18:03:04 +01:00
Łukasz Osipiuk e462d4420c Extract Backoff interface 2018-10-29 23:02:13 +01:00
Łukasz Osipiuk 41b02870f8 NodeGroup.Nodes() return Instance struct instead instance name
This is preparatory work for handling resource related
(stockout/quota-exceeded) error conditions in CA.
2018-10-26 14:41:18 +02:00
Łukasz Osipiuk 29c22c0a3d Store single ScaleUpRequest per node group 2018-10-18 18:27:31 +02:00
Jakub Tużnik b105f28ebd Add a method to determine if a node group is at its target size to CSR 2018-09-07 20:24:38 +02:00
Aleksandra Malinowska 364e2da764 Check for ready condition not true 2018-08-30 13:43:24 +02:00
Jakub Tużnik 51334f283e Fix GetClusterSize to return actual size in line with the rest of CSR
It returned the number of registered nodes, but should return the number
of provisioned nodes instead.
2018-08-27 14:58:07 +02:00
Jakub Tużnik 054f0b3b90 Add AutoscalingStatusProcessor 2018-08-07 14:47:06 +02:00
Krzysztof Jastrzebski dd1db7a0ac Move backoff mechanism to utils. 2018-06-13 15:32:25 +02:00
Aleksandra Malinowska 820f688d2a Update max unready nodes to 45% 2018-05-17 12:51:45 +02:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
Hang Yan b4713c22d5 Fix various typos in clusterstate package 2018-02-07 16:03:51 +08:00
Aleksandra Malinowska 3894ecb470 Export unregistered node count metric 2018-01-16 16:56:40 +01:00
Maciej Pytel 53603d0a2a Increase MaxNodeStartupTime to 15 minutes. 2017-11-13 15:14:47 +01:00
Maciej Pytel c376ef3c87 Add metrics for autoprovisioning 2017-10-31 17:42:58 +01:00
Maciej Pytel 02ccba3338 Update clusterstate after scale-up 2017-10-17 16:11:25 +02:00
Marcin Wielgus f658450b16 Merge pull request #379 from MaciekPytel/long_unregistered_node
Keep track of nodes that failed to register for a long time
2017-09-28 15:01:32 +02:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Maciej Pytel e12ee88f5f Add failed scale-up reason in metric 2017-09-26 13:40:34 +02:00
Maciej Pytel 5e05c84cf0 Add metric counting failed scale-ups
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00