Kubernetes Prow Robot
8871f1702d
Merge pull request #2521 from losipiuk/lo/rename-stockout
...
Rename STOCKOUT to RESOURCE_POOL_EXHAUSTED
2019-11-12 06:00:07 -08:00
Łukasz Osipiuk
7b499aa4c9
Rename STOCKOUT to RESOURCE_POOL_EXHAUSTED
...
We came to the conclusion that using STOCKOUT as an error code is too
specific. Migrating to the more general term RESOURCE_POOL_EXHAUSTED.
2019-11-12 14:39:51 +01:00
Vivek Bagade
910e75365c
remove temporary nodes logic
2019-11-12 11:58:29 +01:00
Kubernetes Prow Robot
19dcfbd25e
Merge pull request #2476 from tghartland/fix-scale-down-errorf
...
CA: Make error message in scale down node draining consistent
2019-11-04 01:01:40 -08:00
Jarvis-Zhou
7c9d6e3518
Do not assign return values to variables when not needed
2019-10-25 19:28:00 +08:00
Thomas Hartland
229fc959b4
Make error message in scale down consistent
2019-10-23 15:28:09 +02:00
Łukasz Osipiuk
7f083d2393
Move core/utils.go to separate package and split into multiple files
2019-10-22 14:23:40 +02:00
Łukasz Osipiuk
41e9271b9e
Remove unused GetCandidatesForScaleDown
2019-10-22 14:23:38 +02:00
Kubernetes Prow Robot
3f137fde4f
Merge pull request #2448 from hectorj2f/hectorj2f/chore_typos
...
cluster-autoscaler: fix some typos in the code
2019-10-21 00:33:37 -07:00
Łukasz Osipiuk
288d4107b2
Rename GetCreatedNodesWithOutOfResourcesErrors to GetCreatedNodesWithErrors
2019-10-14 10:56:56 +02:00
Hector Fernandez
24401b373f
cluster-autoscaler: fix some typos in the code
2019-10-13 12:52:53 +02:00
Thomas Hartland
c51b7ee72a
Update TestRemoveOldUnregisteredNodes to pass cluster state registry
2019-09-30 14:29:02 +02:00
Thomas Hartland
474eef6d47
Invalidate node instances cache after deleting unregistered nodes
2019-09-30 14:29:02 +02:00
Thomas Hartland
7c17d52ec8
Invalidate node instances cache after deleting failed nodes
2019-09-30 13:56:33 +02:00
Kubernetes Prow Robot
791f0d8355
Merge pull request #2281 from DataDog/JulienBalestra/mig-block
...
cluster-autoscaler: blocked if an instance is detached from MIG
2019-09-11 05:03:22 -07:00
Julien Balestra
3441f616e1
cluster-autoscaler/skip-node: unblock cluster autoscaler when having a single nodegroup for node error
...
Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>
2019-09-11 13:40:23 +02:00
Krzysztof Jastrzebski
839cdaaa09
Stop disabling Cluster Autoscaler when there are no ready nodes.
2019-09-06 14:45:34 +02:00
Julien Balestra
6d707a08ac
cluster-autoscaler/metrics: expose the scale down cooldown
...
Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>
2019-08-27 18:12:33 +02:00
Kubernetes Prow Robot
9aac43e237
Merge pull request #2235 from piontec/fix/aws_spots_squashed
...
correctly handle lack of capacity of AWS spot ASGs
2019-08-19 04:27:30 -07:00
Kubernetes Prow Robot
4c056fb8ba
Merge pull request #2259 from towca/jtuznik/rejected-node-groups-more-info
...
Provide ScaleUpStatusProcessor with info about all rejected node groups
2019-08-19 04:05:31 -07:00
Kubernetes Prow Robot
3f0a5fa3c2
Merge pull request #2233 from vivekbagade/surge
...
Adding ScaleDownNodeProcessor
2019-08-19 03:59:32 -07:00
Jakub Tużnik
43466ff837
Provide ScaleUpStatusProcessor with info about all rejected node groups
...
Previously, it had info only about the ones that actually exist.
The changes to the eventing processor are done to keep its previous
behavior the same.
2019-08-19 12:48:10 +02:00
Łukasz Piątkowski
8d9b81caaa
correctly handle lack of capacity of AWS spot ASGs
2019-08-19 12:43:53 +02:00
Kubernetes Prow Robot
60bdca087d
Merge pull request #2255 from towca/jtuznik/create-node-group-result
...
Provide more info to ScaleUpStatusProcessor
2019-08-13 06:51:41 -07:00
Vivek Bagade
dc64d0aab2
Adding ScaleDownNodeProcessor
2019-08-12 20:19:55 +02:00
Jakub Tużnik
935476a7e2
Provide more info to ScaleUpStatusProcessor
...
Add info about considered and created nodegroups to
ScaleUpStatusProcessor
2019-08-12 17:20:09 +02:00
Jakub Tużnik
44ae89dd09
Communicate the result of RemoveUnneededNodeGroups to ScaleDownStatusProcessor
2019-08-12 17:03:51 +02:00
t-qini
f7c563ab06
Modify the code to follow the simple solution proposed by MaciekPytel.
2019-07-18 23:58:05 +08:00
t-qini
622a838c2c
Modify nodal similarity rules.
2019-07-09 16:04:40 +08:00
Kubernetes Prow Robot
c6067574c1
Merge pull request #2160 from aleksandra-malinowska/scale-up-events-fix
...
Add resource limit type to NotTriggerScaleUp event
2019-07-05 05:48:38 -07:00
Aleksandra Malinowska
0d0c9440f6
Add no scale up test
2019-07-03 16:38:53 +02:00
Aleksandra Malinowska
7b80f4e8b8
Separate running scale up test from checking results
2019-07-03 16:38:52 +02:00
Aleksandra Malinowska
c27ae4eb24
Add resource limit type to NotTriggerScaleUp event
2019-07-03 16:38:46 +02:00
Aleksandra Malinowska
d01a2392db
Make scale down unit tests faster
2019-07-03 13:12:48 +02:00
Pengfei Ni
d45fee06da
Ensure upcoming nodes are different
2019-07-02 16:52:19 +08:00
silenceper
478660a6bb
fix error
2019-06-28 18:49:58 +08:00
Vivek Bagade
0a75333e1b
Potential performance improvement in bin packing unschedulable pods
2019-06-19 18:39:47 +02:00
Vivek Bagade
90aa28a077
Move pod packing in upcoming nodes to RunOnce from Estimator for performance improvements
2019-06-19 14:48:47 +02:00
Kubernetes Prow Robot
da36677d04
Merge pull request #2108 from losipiuk/lo/other-error-ut
...
Add unit test case for OTHER error handling
2019-06-10 05:29:08 -07:00
Łukasz Osipiuk
0bcf5315a7
Do not fail loop iteration if unregistered nodes cannot be removed
...
Removing unregistered nodes is not the primary responsibility of
Cluster Autoscaler. We do not want to render CA unusable (disable
scale-up and scale-down) if unregistered nodes cannot be removed for a
prolonged period of time.
2019-06-10 13:45:54 +02:00
Łukasz Osipiuk
be68d06b40
Add unit test case for OTHER error handling
2019-06-07 16:54:01 +02:00
Jakub Tużnik
bb382f47f9
Retain information about scale-up failures in CSR
...
This will provide the AutoscalingStatusProcessor with information
about failed scale-ups.
2019-06-05 16:53:30 +02:00
Krzysztof Jastrzebski
22b4a6283e
Optimize building node infos by using map with pods for nodes.
2019-06-03 13:24:09 +02:00
Kubernetes Prow Robot
a0853bcc80
Merge pull request #2071 from losipiuk/lo/predicate-checker-speedup
...
Precompute inter pod equivalence groups in checkPodsSchedulableOnNode
2019-06-03 03:52:16 -07:00
Krzysztof Jastrzebski
4831d76288
Cache cloud provider node instances in cluster state.
2019-05-31 10:11:51 +02:00
Łukasz Osipiuk
a849ead286
Precompute inter pod equivalence groups in checkPodsSchedulableOnNode
2019-05-29 18:05:52 +02:00
Krzysztof Jastrzebski
6944f3fc56
Delete zero values from deletionsInProgress map in NodeDeletionTracker.
2019-05-28 14:34:56 +02:00
Krzysztof Jastrzebski
da82f831a3
Use fakeNodeLister instead of mocks.
2019-05-27 15:10:31 +02:00
Kubernetes Prow Robot
cb4e60f8d4
Merge pull request #2031 from krzysztof-jastrzebski/master
...
Add functionality which delays node deletion to let other components prepare for deletion.
2019-05-20 00:57:13 -07:00
Kubernetes Prow Robot
8d2ec08b2c
Merge pull request #2015 from losipiuk/lo/pass-via-context
...
Add methods for passing arbitrary object via autoscaling context
2019-05-17 08:12:07 -07:00
Łukasz Osipiuk
e76558c65f
Add methods for passing arbitrary object via autoscaling context
...
Change-Id: I066e58010a0aef4989bfc1f73b90bc69c773b26e
2019-05-17 16:38:12 +02:00
Krzysztof Jastrzebski
4247c8b032
Implement functionality which delays node deletion when node has
...
annotation with prefix
'delay-deletion.cluster-autoscaler.kubernetes.io/'.
2019-05-17 16:06:17 +02:00
Kubernetes Prow Robot
c756ed3953
Merge pull request #1963 from cjbradfield/ignore-taints
...
add --ignore-taint flag and ignore taints added by TaintNodesByCondition
2019-05-15 02:18:21 -07:00
Chris Bradfield
92ea680f1a
Implement an --ignore-taint flag
...
This change adds support for a user to specify taints to ignore when
considering a node as a template for a node group.
2019-05-14 10:22:59 -07:00
Chris Bradfield
54773da830
Ignore taints added from TaintNodesByCondition when considering a node as a Node Group template
2019-05-14 10:22:59 -07:00
Kubernetes Prow Robot
a6c109f8f5
Merge pull request #1967 from towca/jtuznik/delete-empty-nodes-behaviour-fix
...
Modify the info passed to ScaleDownStatusProcessor when empty nodes a…
2019-04-30 05:25:37 -07:00
Jakub Tużnik
b92f971326
Provide ScaleDownStatusProcessor with more info about scale-down results
2019-04-30 13:49:06 +02:00
Jakub Tużnik
402c643851
Modify the info passed to ScaleDownStatusProcessor when empty nodes are deleted
...
Previously, if any of the nodes failed to delete, the processor got
a ScaleDownError status. After this commit, it will get the list of
nodes that were successfully deleted.
2019-04-26 15:54:11 +02:00
Łukasz Osipiuk
c9811e87b4
Include pods with NominatedNodeName in scheduledPods list used for scale-up considerations
...
Change-Id: Ie4c095b30bf0cd1f160f1ac4b8c1fcb8c0524096
2019-04-15 16:59:13 +02:00
Łukasz Osipiuk
db4c6f1133
Migrate filter out schedulable to PodListProcessor
2019-04-15 16:59:13 +02:00
Łukasz Osipiuk
5c09c50774
Pass ready nodes list to PodListProcessor
2019-04-15 16:59:13 +02:00
Łukasz Osipiuk
c6115b826e
Define ProcessorCallbacks interface
2019-04-15 16:59:13 +02:00
Jiaxin Shan
83ae66cebc
Consider GPU utilization in scaling down
2019-04-04 01:12:51 -07:00
Jiaxin Shan
90666881d3
Move GPULabel and GPUTypes to cloud provider
2019-03-25 13:03:01 -07:00
Lukasz Piatkowski
c5ba4b3068
priority expander
2019-03-22 10:43:20 +01:00
Łukasz Osipiuk
2474dc2fd5
Call CloudProvider.Refresh before getNodeInfosForGroups
...
We need to call Refresh before getNodeInfosForGroups. If we have
stale state, getNodeInfosForGroups may fail and we will end up in an infinite crash loop.
2019-03-12 12:07:49 +01:00
Aleksandra Malinowska
62a28f3005
Soft taint when there are no candidates
2019-03-11 14:05:09 +01:00
Andrew McDermott
5ae76ea66e
UPSTREAM: <carry>: fix max cluster size calculation on scale up
...
When scaling up, the calculation of the maximum cluster size does not
take into account the number of upcoming nodes, so it is possible to
grow the cluster beyond the maximum cluster size
(--max-nodes-total).
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1670695
2019-03-08 13:28:58 +00:00
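The fix described in 5ae76ea66e above can be illustrated with a minimal sketch. It is not the autoscaler's actual code: `capNewNodes` and its parameters are hypothetical names, and treating a limit of 0 as "no limit" is an assumption. The point is only that upcoming nodes count against the limit alongside nodes that already exist.

```go
package main

import "fmt"

// capNewNodes returns how many of the requested nodes may be added without
// exceeding maxNodesTotal. Hypothetical helper, not the autoscaler's API.
// Assumption: maxNodesTotal <= 0 means no limit is configured.
func capNewNodes(current, upcoming, requested, maxNodesTotal int) int {
	if maxNodesTotal <= 0 {
		return requested
	}
	// Upcoming nodes (requested earlier, not yet registered) must be
	// counted, otherwise repeated scale-ups can overshoot the limit.
	room := maxNodesTotal - (current + upcoming)
	if room < 0 {
		room = 0
	}
	if requested > room {
		return room
	}
	return requested
}

func main() {
	// Ignoring upcoming nodes: 8 existing, limit 10, so 2 more are allowed.
	fmt.Println(capNewNodes(8, 0, 5, 10))
	// Counting 2 upcoming nodes: the cluster is already at its limit.
	fmt.Println(capNewNodes(8, 2, 5, 10))
}
```

With the upcoming nodes ignored, the second call would also allow 2 nodes and the cluster would reach 12 against a limit of 10.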
Uday Ruddarraju
91b7bc08a1
Fixing minor error handling bug in static autoscaler
2019-03-07 15:16:27 -08:00
Kubernetes Prow Robot
8944afd901
Merge pull request #1720 from aleksandra-malinowska/events-client
...
Use separate client for events
2019-02-26 12:00:19 -08:00
Aleksandra Malinowska
a824e87957
Only soft taint nodes if there's no scale down to do
2019-02-25 17:11:15 +01:00
Aleksandra Malinowska
f304722a1f
Use separate client for events
2019-02-25 13:58:54 +01:00
Pengfei Ni
2546d0d97c
Move leaderelection options to new packages
2019-02-21 13:45:46 +08:00
Pengfei Ni
128729bae9
Move schedulercache to package nodeinfo
2019-02-21 12:41:08 +08:00
Jacek Kaniuk
d969baff22
Cache exemplar ready node for each node group
2019-02-11 17:40:58 +01:00
Jacek Kaniuk
f054c53c46
Account for kernel reserved memory in capacity calculations
2019-02-08 17:04:07 +01:00
Marcin Wielgus
99f1dcf9d2
Merge branch 'master' into crc-fix-error-format
2019-02-01 17:22:57 +01:00
Kubernetes Prow Robot
bd84757b7e
Merge pull request #1596 from vivekbagade/improve-filterout-logic
...
Added better checks for filterSchedulablePods and added a tunable fla…
2019-01-27 13:00:31 -08:00
Vivek Bagade
c6b87841ce
Added a new method that uses pod packing to filter schedulable pods
...
filterOutSchedulableByPacking is an alternative to the older
filterOutSchedulable. filterOutSchedulableByPacking sorts pods in
unschedulableCandidates by priority and filters out pods that can be
scheduled on free capacity on existing nodes. It uses a basic packing
approach to do this. Pods with nominatedNodeName set are always
filtered out.
filterOutSchedulableByPacking is used by default, but this can be
toggled off by setting the filter-out-schedulable-pods-uses-packing
flag to false, which activates the older and more lenient
filterOutSchedulable (now called filterOutSchedulableSimple).
Added test cases for both methods.
2019-01-25 16:09:51 +05:30
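The packing approach described in c6b87841ce above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the `pod` and `node` types and `filterByPackingSketch` are hypothetical, "fits" is reduced to a CPU check (the real code would rely on scheduler predicates), and sorting highest priority first is assumed.

```go
package main

import (
	"fmt"
	"sort"
)

// pod and node are simplified, hypothetical stand-ins for the real API types.
type pod struct {
	name          string
	priority      int
	cpuMilli      int64
	nominatedNode string
}

type node struct {
	name         string
	freeCPUMilli int64
}

// filterByPackingSketch returns the pods that still cannot be placed after
// packing candidates onto the free capacity of existing nodes.
func filterByPackingSketch(pods []pod, nodes []node) []pod {
	// Work on copies so the caller's slices are left untouched.
	sorted := append([]pod(nil), pods...)
	// Assumption: higher-priority pods are packed first.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].priority > sorted[j].priority
	})
	free := make([]int64, len(nodes))
	for i, n := range nodes {
		free[i] = n.freeCPUMilli
	}

	var remaining []pod
	for _, p := range sorted {
		// Pods with a nominated node are always filtered out.
		if p.nominatedNode != "" {
			continue
		}
		placed := false
		for i := range free {
			if free[i] >= p.cpuMilli {
				free[i] -= p.cpuMilli // reserve the capacity (first fit)
				placed = true
				break
			}
		}
		if !placed {
			remaining = append(remaining, p)
		}
	}
	return remaining
}

func main() {
	nodes := []node{{name: "n1", freeCPUMilli: 1000}}
	pods := []pod{
		{name: "a", priority: 10, cpuMilli: 600},
		{name: "b", priority: 5, cpuMilli: 600},
		{name: "c", priority: 1, cpuMilli: 300},
		{name: "d", priority: 7, cpuMilli: 900, nominatedNode: "n1"},
	}
	for _, p := range filterByPackingSketch(pods, nodes) {
		fmt.Println(p.name)
	}
}
```

Unlike a check that tests each pod against the full free capacity independently, packing reserves capacity as pods are placed, so pod "b" stays unschedulable here even though it would fit on an empty "n1".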
Jacek Kaniuk
d05dbb9ec4
Refactor tests of tainting
...
Refactor scale down and deletetaint tests
Speed up deletetaint tests
2019-01-25 09:21:41 +01:00
Vivek Bagade
8fff0f6556
Removing nominatedNodeName annotation and moving to pod.Status.NominatedNodeName
2019-01-25 00:06:03 +05:30
Vivek Bagade
79ef3a6940
unexporting methods in utils.go
2019-01-25 00:06:03 +05:30
Jacek Kaniuk
d00af2373c
Tainting nodes - update first, refresh on conflict
2019-01-24 16:57:27 +01:00
Jacek Kaniuk
0c64e0932a
Tainting unneeded nodes as PreferNoSchedule
2019-01-21 13:06:50 +01:00
CodeLingo Bot
c0603afdeb
Fix error format strings according to best practices from CodeReviewComments
...
Reverted incorrect change to error format string
Resolve conflict
Fix error strings in test cases to remedy failing tests
Fix more error strings to remedy failing tests
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
2019-01-11 09:10:31 +13:00
Łukasz Osipiuk
85a83b62bd
Pass nodeGroup->NodeInfo map to ClusterStateRegistry
...
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Kubernetes Prow Robot
4002559a4c
Merge pull request #1516 from frobware/fix-max-nodes-total-upstream
...
fix calculation of max cluster size
2019-01-03 10:02:38 -08:00
Maciej Pytel
3f0da8947a
Use listers in scale-up
2019-01-02 15:56:01 +01:00
Kubernetes Prow Robot
f960f95d28
Merge pull request #1542 from JoeWrightss/patch-7
...
Fix typo in comment
2019-01-02 05:24:14 -08:00
JoeWrightss
9f87523de9
Fix typo in comment
...
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2019-01-01 15:10:43 +08:00
Maciej Pytel
9060014992
Use listers in scale-down
2018-12-31 14:55:38 +01:00
Kubernetes Prow Robot
ab7f1e69be
Merge pull request #1464 from losipiuk/lo/stockouts2
...
Better quota-exceeded/stockout handling
2018-12-31 05:28:08 -08:00
Łukasz Osipiuk
ddbe05b279
Add unit test for stockouts handling
2018-12-28 17:17:07 +01:00
Łukasz Osipiuk
2fbae197f4
Handle possible stockout/quota scale-up errors
2018-12-28 17:17:07 +01:00
Łukasz Osipiuk
9689b30ee4
Do not use time.Now() in RegisterFailedScaleUp
2018-12-28 17:17:07 +01:00
Łukasz Osipiuk
da5bef307b
Allow updating Increase for ScaleUpRequest in ClusterStateRegistry
2018-12-28 17:17:07 +01:00
Maciej Pytel
60babe7158
Use kubernetes lister for daemonset instead of custom one
...
Also migrate to using apps/v1.DaemonSet instead of old
extensions/v1beta1.
2018-12-28 13:55:41 +01:00
Maciej Pytel
40811c2f8b
Add listers for more controllers
2018-12-28 13:31:21 +01:00
Kubernetes Prow Robot
62c492cb1f
Merge pull request #1518 from lsytj0413/fix-golint
...
refactor(*): fix golint warning
2018-12-21 06:05:20 -08:00
lsytj0413
672dddd23a
refactor(*): fix golint warning
2018-12-19 10:04:08 +08:00