Kubernetes Prow Robot
bf3a9fb52e
Merge pull request #2436 from Jeffwan/skip_first_acceptable_range_check
...
Skip acceptable range check before it has data
2019-12-10 01:49:29 -08:00
Jakub Tużnik
f64b6cd4de
CSR: fix a bug in GetClusterSize
...
Currently, GetClusterSize reports the target number for all autoscaled
node groups, but the actual number for _all_ node groups, even those
that are not autoscaled. This commit fixes that behavior so that both
target and actual size reported are from autoscaled node groups only.
2019-11-20 13:49:49 +01:00
Łukasz Osipiuk
aa53261098
More verbose logging of GCE instance create errors
2019-10-15 15:36:38 +02:00
Łukasz Osipiuk
288d4107b2
Rename GetCreatedNodesWithOutOfResourcesErrors to GetCreatedNodesWithErrors
2019-10-14 10:56:56 +02:00
Jiaxin Shan
0d278a2554
Skip acceptable range check before it has data
2019-10-09 17:59:43 -07:00
Thomas Hartland
7c17d52ec8
Invalidate node instances cache after deleting failed nodes
2019-09-30 13:56:33 +02:00
Kubernetes Prow Robot
6434df247d
Merge pull request #2304 from krzysztof-jastrzebski/fix_bug
...
Stop disabling Cluster Autoscaler when there is no ready nodes.
2019-09-06 07:06:57 -07:00
Krzysztof Jastrzebski
839cdaaa09
Stop disabling Cluster Autoscaler when there is no ready nodes.
2019-09-06 14:45:34 +02:00
Łukasz Osipiuk
79b4614328
Use NodeDiskPressure condition instead of NodeOutOfDisk
2019-09-05 23:23:43 +02:00
devinyan
3a633de55a
Check whether nodeGroup is nil to avoid a crash
2019-06-30 17:33:32 +08:00
Kubernetes Prow Robot
dd89fb1385
Merge pull request #2096 from frobware/fix-segv-in-updateReadinessStats
...
Fix potential SEGV in updateReadinessStats
2019-06-11 09:00:24 -07:00
Andrew McDermott
91016a605a
Fix SEGV in updateReadinessStats
...
Calling cloudprovider.NodeGroupForNode(unregistered.Node) can result
in a nil result for the nodegroup - handle that case.
2019-06-11 10:42:27 +01:00
Jakub Tużnik
bb382f47f9
Retain information about scale-up failures in CSR
...
This will provide the AutoscalingStatusProcessor with information
about failed scale-ups.
2019-06-05 16:53:30 +02:00
Łukasz Osipiuk
950a8a9f76
Quickly fail scaleup on all instance creation errors
...
Change-Id: Ib918251f3e3229d882d5182a98f129b77d7731a3
2019-06-03 13:32:41 +02:00
Łukasz Osipiuk
c88f014470
Add debug log in handleOutOfResourcesErrorsForNodeGroup
2019-05-31 15:26:41 +02:00
Krzysztof Jastrzebski
4831d76288
Cache cloud provider node instances in cluster state.
2019-05-31 10:11:51 +02:00
Pengfei Ni
b721438315
Revert "Use cloudProvider.GetInstanceID() to get unregistered nodes"
...
This reverts commit f4ef957ecd.
2019-03-08 10:47:26 +08:00
Pengfei Ni
f4ef957ecd
Use cloudProvider.GetInstanceID() to get unregistered nodes
2019-02-27 22:58:34 +08:00
Pengfei Ni
128729bae9
Move schedulercache to package nodeinfo
2019-02-21 12:41:08 +08:00
Łukasz Osipiuk
b5f9a9505c
Extend backoff interface with NodeInfo and error information
2019-01-09 11:25:34 +01:00
Łukasz Osipiuk
85a83b62bd
Pass nodeGroup->NodeInfo map to ClusterStateRegistry
...
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Łukasz Osipiuk
5cddbda693
Rename nodeGroupBackoffInfo to backoff in ClusterStateRegistry
2018-12-31 17:59:58 +01:00
Łukasz Osipiuk
2fbae197f4
Handle possible stockout/quota scale-up errors
2018-12-28 17:17:07 +01:00
Łukasz Osipiuk
9689b30ee4
Do not use time.Now() in RegisterFailedScaleUp
2018-12-28 17:17:07 +01:00
Łukasz Osipiuk
da5bef307b
Allow updating Increase for ScaleUpRequest in ClusterStateRegistry
2018-12-28 17:17:07 +01:00
lsytj0413
8ca0e71d1e
refactor(*): fix some golint warning
2018-12-24 11:07:15 +08:00
Łukasz Osipiuk
016bf7fc2c
Use k8s.io/klog instead of github.com/golang/glog
2018-11-26 17:30:31 +01:00
Łukasz Osipiuk
5962354c81
Inject Backoff instance to ClusterStateRegistry on creation
2018-11-13 14:25:16 +01:00
k8s-ci-robot
7008fb50be
Merge pull request #1380 from losipiuk/lo/backoff
...
Make Backoff interface
2018-11-07 05:13:43 -08:00
Łukasz Osipiuk
0e2c3739b7
Use NodeGroup as key in Backoff
2018-10-30 18:17:26 +01:00
Łukasz Osipiuk
55fc1e2f00
Store NodeGroup in ScaleUpRequest and ScaleDownRequest
2018-10-30 18:03:04 +01:00
Łukasz Osipiuk
e462d4420c
Extract Backoff interface
2018-10-29 23:02:13 +01:00
Łukasz Osipiuk
41b02870f8
NodeGroup.Nodes() returns Instance struct instead of instance name
...
This is preparatory work for handling resource related
(stockout/quota-exceeded) error conditions in CA.
2018-10-26 14:41:18 +02:00
Łukasz Osipiuk
29c22c0a3d
Store single ScaleUpRequest per node group
2018-10-18 18:27:31 +02:00
Jakub Tużnik
b105f28ebd
Add a method to determine if a node group is at its target size to CSR
2018-09-07 20:24:38 +02:00
Aleksandra Malinowska
364e2da764
Check for ready condition not true
2018-08-30 13:43:24 +02:00
Jakub Tużnik
51334f283e
Fix GetClusterSize to return actual size in line with the rest of CSR
...
It returned the number of registered nodes, but should return the number
of provisioned nodes instead.
2018-08-27 14:58:07 +02:00
Jakub Tużnik
054f0b3b90
Add AutoscalingStatusProcessor
2018-08-07 14:47:06 +02:00
Krzysztof Jastrzebski
dd1db7a0ac
Move backoff mechanism to utils.
2018-06-13 15:32:25 +02:00
Aleksandra Malinowska
820f688d2a
Update max unready nodes to 45%
2018-05-17 12:51:45 +02:00
Aleksandra Malinowska
4c594db7f8
Run spellchecker
2018-03-15 15:47:49 +01:00
Hang Yan
b4713c22d5
Fix various typos in clusterstate package
2018-02-07 16:03:51 +08:00
Aleksandra Malinowska
3894ecb470
Export unregistered node count metric
2018-01-16 16:56:40 +01:00
Maciej Pytel
53603d0a2a
Increase MaxNodeStartupTime to 15 minutes.
2017-11-13 15:14:47 +01:00
Maciej Pytel
c376ef3c87
Add metrics for autoprovisioning
2017-10-31 17:42:58 +01:00
Maciej Pytel
02ccba3338
Update clusterstate after scale-up
2017-10-17 16:11:25 +02:00
Marcin Wielgus
f658450b16
Merge pull request #379 from MaciekPytel/long_unregistered_node
...
Keep track of nodes that failed to register for a long time
2017-09-28 15:01:32 +02:00
Maciej Pytel
ff21b0b00c
Keep track of nodes that failed to register for a long time
...
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Maciej Pytel
e12ee88f5f
Add failed scale-up reason in metric
2017-09-26 13:40:34 +02:00
Maciej Pytel
5e05c84cf0
Add metric counting failed scale-ups
...
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00