autoscaler

Commit Graph

Author	SHA1	Message	Date
Kubernetes Prow Robot	8871f1702d	Merge pull request #2521 from losipiuk/lo/rename-stockout Rename STOCKOUT to RESOURCE_POOL_EXHAUSTED	2019-11-12 06:00:07 -08:00
Łukasz Osipiuk	7b499aa4c9	Rename STOCKOUT to RESOURCE_POOL_EXHAUSTED We came to the conclusion that using STOCKOUT as an error code is too specific. Migrating to the more general term RESOURCE_POOL_EXHAUSTED.	2019-11-12 14:39:51 +01:00
Vivek Bagade	910e75365c	remove temporary nodes logic	2019-11-12 11:58:29 +01:00
Kubernetes Prow Robot	19dcfbd25e	Merge pull request #2476 from tghartland/fix-scale-down-errorf CA: Make error message in scale down node draining consistent	2019-11-04 01:01:40 -08:00
Jarvis-Zhou	7c9d6e3518	Do not assign return values to variables when not needed	2019-10-25 19:28:00 +08:00
Thomas Hartland	229fc959b4	Make error message in scale down consistent	2019-10-23 15:28:09 +02:00
Łukasz Osipiuk	7f083d2393	Move core/utils.go to separate package and split into multiple files	2019-10-22 14:23:40 +02:00
Łukasz Osipiuk	41e9271b9e	Remove unused GetCandidatesForScaleDown	2019-10-22 14:23:38 +02:00
Kubernetes Prow Robot	3f137fde4f	Merge pull request #2448 from hectorj2f/hectorj2f/chore_typos cluster-autoscaler: fix some typos in the code	2019-10-21 00:33:37 -07:00
Łukasz Osipiuk	288d4107b2	Rename GetCreatedNodesWithOutOfResourcesErrors to GetCreatedNodesWithErrors	2019-10-14 10:56:56 +02:00
Hector Fernandez	24401b373f	cluster-autoscaler: fix some typos in the code	2019-10-13 12:52:53 +02:00
Thomas Hartland	c51b7ee72a	Update TestRemoveOldUnregisteredNodes to pass cluster state registry	2019-09-30 14:29:02 +02:00
Thomas Hartland	474eef6d47	Invalidate node instances cache after deleting unregistered nodes	2019-09-30 14:29:02 +02:00
Thomas Hartland	7c17d52ec8	Invalidate node instances cache after deleting failed nodes	2019-09-30 13:56:33 +02:00
Kubernetes Prow Robot	791f0d8355	Merge pull request #2281 from DataDog/JulienBalestra/mig-block cluster-autoscaler: blocked if an instance is detached from MIG	2019-09-11 05:03:22 -07:00
Julien Balestra	3441f616e1	cluster-autoscaler/skip-node: unblock cluster autoscaler when having a single nodegroup for node error Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>	2019-09-11 13:40:23 +02:00
Krzysztof Jastrzebski	839cdaaa09	Stop disabling Cluster Autoscaler when there are no ready nodes.	2019-09-06 14:45:34 +02:00
Julien Balestra	6d707a08ac	cluster-autoscaler/metrics: expose the scale down cooldown Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>	2019-08-27 18:12:33 +02:00
Kubernetes Prow Robot	9aac43e237	Merge pull request #2235 from piontec/fix/aws_spots_squashed correctly handle lack of capacity of AWS spot ASGs	2019-08-19 04:27:30 -07:00
Kubernetes Prow Robot	4c056fb8ba	Merge pull request #2259 from towca/jtuznik/rejected-node-groups-more-info Provide ScaleUpStatusProcessor with info about all rejected node groups	2019-08-19 04:05:31 -07:00
Kubernetes Prow Robot	3f0a5fa3c2	Merge pull request #2233 from vivekbagade/surge Adding ScaleDownNodeProcessor	2019-08-19 03:59:32 -07:00
Jakub Tużnik	43466ff837	Provide ScaleUpStatusProcessor with info about all rejected node groups Previously, it had info only about the ones that actually exist. The changes to the eventing processor are done to keep its previous behavior the same.	2019-08-19 12:48:10 +02:00
Łukasz Piątkowski	8d9b81caaa	correctly handle lack of capacity of AWS spot ASGs	2019-08-19 12:43:53 +02:00
Kubernetes Prow Robot	60bdca087d	Merge pull request #2255 from towca/jtuznik/create-node-group-result Provide more info to ScaleUpStatusProcessor	2019-08-13 06:51:41 -07:00
Vivek Bagade	dc64d0aab2	Adding ScaleDownNodeProcessor	2019-08-12 20:19:55 +02:00
Jakub Tużnik	935476a7e2	Provide more info to ScaleUpStatusProcessor Add info about considered and created nodegroups to ScaleUpStatusProcessor	2019-08-12 17:20:09 +02:00
Jakub Tużnik	44ae89dd09	Communicate the result of RemoveUnneededNodeGroups to ScaleDownStatusProcessor	2019-08-12 17:03:51 +02:00
t-qini	f7c563ab06	Modify the code as the simple solution proposed by MaciekPytel.	2019-07-18 23:58:05 +08:00
t-qini	622a838c2c	Modify nodal similarity rules.	2019-07-09 16:04:40 +08:00
Kubernetes Prow Robot	c6067574c1	Merge pull request #2160 from aleksandra-malinowska/scale-up-events-fix Add resource limit type to NotTriggerScaleUp event	2019-07-05 05:48:38 -07:00
Aleksandra Malinowska	0d0c9440f6	Add no scale up test	2019-07-03 16:38:53 +02:00
Aleksandra Malinowska	7b80f4e8b8	Separate running scale up test from checking results	2019-07-03 16:38:52 +02:00
Aleksandra Malinowska	c27ae4eb24	Add resource limit type to NotTriggerScaleUp event	2019-07-03 16:38:46 +02:00
Aleksandra Malinowska	d01a2392db	Make scale down unit tests faster	2019-07-03 13:12:48 +02:00
Pengfei Ni	d45fee06da	Ensure upcoming nodes are different	2019-07-02 16:52:19 +08:00
silenceper	478660a6bb	fix error	2019-06-28 18:49:58 +08:00
Vivek Bagade	0a75333e1b	Potential performance improvement in bin packing unschedulable pods	2019-06-19 18:39:47 +02:00
Vivek Bagade	90aa28a077	Move pod packing in upcoming nodes to RunOnce from Estimator for performance improvements	2019-06-19 14:48:47 +02:00
Kubernetes Prow Robot	da36677d04	Merge pull request #2108 from losipiuk/lo/other-error-ut Add unit test case for OTHER error handling	2019-06-10 05:29:08 -07:00
Łukasz Osipiuk	0bcf5315a7	Do not fail loop iteration if unregistered nodes cannot be removed The mechanism of unregistered nodes removal is not the first responsibility of Cluster Autoscaler. We do not want to render CA unusable (disable scale-up and scale-down) if removing unregistered nodes cannot be done for a prolonged period of time.	2019-06-10 13:45:54 +02:00
Łukasz Osipiuk	be68d06b40	Add unit test case for OTHER error handling	2019-06-07 16:54:01 +02:00
Jakub Tużnik	bb382f47f9	Retain information about scale-up failures in CSR This will provide the AutoscalingStatusProcessor with information about failed scale-ups.	2019-06-05 16:53:30 +02:00
Krzysztof Jastrzebski	22b4a6283e	Optimize building node infos by using map with pods for nodes.	2019-06-03 13:24:09 +02:00
Kubernetes Prow Robot	a0853bcc80	Merge pull request #2071 from losipiuk/lo/predicate-checker-speedup Precompute inter pod equivalence groups in checkPodsSchedulableOnNode	2019-06-03 03:52:16 -07:00
Krzysztof Jastrzebski	4831d76288	Cache cloud provider node instances in cluster state.	2019-05-31 10:11:51 +02:00
Łukasz Osipiuk	a849ead286	Precompute inter pod equivalence groups in checkPodsSchedulableOnNode	2019-05-29 18:05:52 +02:00
Krzysztof Jastrzebski	6944f3fc56	Delete zero values from deletionsInProgress map in NodeDeletionTracker.	2019-05-28 14:34:56 +02:00
Krzysztof Jastrzebski	da82f831a3	Use fakeNodeLister instead of mocks.	2019-05-27 15:10:31 +02:00
Kubernetes Prow Robot	cb4e60f8d4	Merge pull request #2031 from krzysztof-jastrzebski/master Add functionality which delays node deletion to let other components prepare for deletion.	2019-05-20 00:57:13 -07:00
Kubernetes Prow Robot	8d2ec08b2c	Merge pull request #2015 from losipiuk/lo/pass-via-context Add methods for passing arbitrary object via autoscaling context	2019-05-17 08:12:07 -07:00
Łukasz Osipiuk	e76558c65f	Add methods for passing arbitrary object via autoscaling context Change-Id: I066e58010a0aef4989bfc1f73b90bc69c773b26e	2019-05-17 16:38:12 +02:00
Krzysztof Jastrzebski	4247c8b032	Implement functionality which delays node deletion when node has annotation with prefix 'delay-deletion.cluster-autoscaler.kubernetes.io/'.	2019-05-17 16:06:17 +02:00
Kubernetes Prow Robot	c756ed3953	Merge pull request #1963 from cjbradfield/ignore-taints add --ignore-taint flag and ignore taints added by TaintNodesByCondition	2019-05-15 02:18:21 -07:00
Chris Bradfield	92ea680f1a	Implement an --ignore-taint flag This change adds support for a user to specify taints to ignore when considering a node as a template for a node group.	2019-05-14 10:22:59 -07:00
Chris Bradfield	54773da830	Ignore taints added from TaintNodesByCondition when considering a node as a Node Group template	2019-05-14 10:22:59 -07:00
Kubernetes Prow Robot	a6c109f8f5	Merge pull request #1967 from towca/jtuznik/delete-empty-nodes-behaviour-fix Modify the info passed to ScaleDownStatusProcessor when empty nodes a…	2019-04-30 05:25:37 -07:00
Jakub Tużnik	b92f971326	Provide ScaleDownStatusProcessor with more info about scale-down results	2019-04-30 13:49:06 +02:00
Jakub Tużnik	402c643851	Modify the info passed to ScaleDownStatusProcessor when empty nodes are deleted Previously, if any of the nodes fails to delete, the processor gets a ScaleDownError status. After this commit, it will get the list of nodes that were successfully deleted.	2019-04-26 15:54:11 +02:00
Łukasz Osipiuk	c9811e87b4	Include pods with NominatedNodeName in scheduledPods list used for scale-up considerations Change-Id: Ie4c095b30bf0cd1f160f1ac4b8c1fcb8c0524096	2019-04-15 16:59:13 +02:00
Łukasz Osipiuk	db4c6f1133	Migrate filter out schedulable to PodListProcessor	2019-04-15 16:59:13 +02:00
Łukasz Osipiuk	5c09c50774	Pass ready nodes list to PodListProcessor	2019-04-15 16:59:13 +02:00
Łukasz Osipiuk	c6115b826e	Define ProcessorCallbacks interface	2019-04-15 16:59:13 +02:00
Jiaxin Shan	83ae66cebc	Consider GPU utilization in scaling down	2019-04-04 01:12:51 -07:00
Jiaxin Shan	90666881d3	Move GPULabel and GPUTypes to cloud provider	2019-03-25 13:03:01 -07:00
Lukasz Piatkowski	c5ba4b3068	priority expander	2019-03-22 10:43:20 +01:00
Łukasz Osipiuk	2474dc2fd5	Call CloudProvider.Refresh before getNodeInfosForGroups We need to call refresh before getNodeInfosForGroups. If we have stale state getNodeInfosForGroups may fail and we will end up in infinite crash looping.	2019-03-12 12:07:49 +01:00
Aleksandra Malinowska	62a28f3005	Soft taint when there are no candidates	2019-03-11 14:05:09 +01:00
Andrew McDermott	5ae76ea66e	UPSTREAM: <carry>: fix max cluster size calculation on scale up When scaling up the calculation for computing the maximum cluster size does not take into account the number of any upcoming nodes and it is possible to grow the cluster beyond the cluster size (--max-nodes-total). Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1670695	2019-03-08 13:28:58 +00:00
Uday Ruddarraju	91b7bc08a1	Fixing minor error handling bug in static autoscaler	2019-03-07 15:16:27 -08:00
Kubernetes Prow Robot	8944afd901	Merge pull request #1720 from aleksandra-malinowska/events-client Use separate client for events	2019-02-26 12:00:19 -08:00
Aleksandra Malinowska	a824e87957	Only soft taint nodes if there's no scale down to do	2019-02-25 17:11:15 +01:00
Aleksandra Malinowska	f304722a1f	Use separate client for events	2019-02-25 13:58:54 +01:00
Pengfei Ni	2546d0d97c	Move leaderelection options to new packages	2019-02-21 13:45:46 +08:00
Pengfei Ni	128729bae9	Move schedulercache to package nodeinfo	2019-02-21 12:41:08 +08:00
Jacek Kaniuk	d969baff22	Cache exemplar ready node for each node group	2019-02-11 17:40:58 +01:00
Jacek Kaniuk	f054c53c46	Account for kernel reserved memory in capacity calculations	2019-02-08 17:04:07 +01:00
Marcin Wielgus	99f1dcf9d2	Merge branch 'master' into crc-fix-error-format	2019-02-01 17:22:57 +01:00
Kubernetes Prow Robot	bd84757b7e	Merge pull request #1596 from vivekbagade/improve-filterout-logic Added better checks for filterSchedulablePods and added a tunable fla…	2019-01-27 13:00:31 -08:00
Vivek Bagade	c6b87841ce	Added a new method that uses pod packing to filter schedulable pods filterOutSchedulableByPacking is an alternative to the older filterOutSchedulable. filterOutSchedulableByPacking sorts pods in unschedulableCandidates by priority and filters out pods that can be scheduled on free capacity on existing nodes. It uses a basic packing approach to do this. Pods with nominatedNodeName set are always filtered out. filterOutSchedulableByPacking is set to be used by default, but this can be toggled off by setting the filter-out-schedulable-pods-uses-packing flag to false, which would then activate the older and more lenient filterOutSchedulable (now called filterOutSchedulableSimple). Added test cases for both methods.	2019-01-25 16:09:51 +05:30
Jacek Kaniuk	d05dbb9ec4	Refactor tests of tainting Refactor scale down and deletetaint tests Speed up deletetaint tests	2019-01-25 09:21:41 +01:00
Vivek Bagade	8fff0f6556	Removing nominatedNodeName annotation and moving to pod.Status.NominatedNodeName	2019-01-25 00:06:03 +05:30
Vivek Bagade	79ef3a6940	unexporting methods in utils.go	2019-01-25 00:06:03 +05:30
Jacek Kaniuk	d00af2373c	Tainting nodes - update first, refresh on conflict	2019-01-24 16:57:27 +01:00
Jacek Kaniuk	0c64e0932a	Tainting unneeded nodes as PreferNoSchedule	2019-01-21 13:06:50 +01:00
CodeLingo Bot	c0603afdeb	Fix error format strings according to best practices from CodeReviewComments Reverted incorrect change to with error format string Resolve conflict Fix error strings in testscases to remedy failing tests Fix more error strings to remedy failing tests Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingoBot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <bot@codelingo.io>	2019-01-11 09:10:31 +13:00
Łukasz Osipiuk	85a83b62bd	Pass nodeGroup->NodeInfo map to ClusterStateRegistry Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801	2019-01-08 15:52:00 +01:00
Kubernetes Prow Robot	4002559a4c	Merge pull request #1516 from frobware/fix-max-nodes-total-upstream fix calculation of max cluster size	2019-01-03 10:02:38 -08:00
Maciej Pytel	3f0da8947a	Use listers in scale-up	2019-01-02 15:56:01 +01:00
Kubernetes Prow Robot	f960f95d28	Merge pull request #1542 from JoeWrightss/patch-7 Fix typo in comment	2019-01-02 05:24:14 -08:00
JoeWrightss	9f87523de9	Fix typo in comment Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>	2019-01-01 15:10:43 +08:00
Maciej Pytel	9060014992	Use listers in scale-down	2018-12-31 14:55:38 +01:00
Kubernetes Prow Robot	ab7f1e69be	Merge pull request #1464 from losipiuk/lo/stockouts2 Better quota-exceeded/stockout handling	2018-12-31 05:28:08 -08:00
Łukasz Osipiuk	ddbe05b279	Add unit test for stockouts handling	2018-12-28 17:17:07 +01:00
Łukasz Osipiuk	2fbae197f4	Handle possible stockout/quota scale-up errors	2018-12-28 17:17:07 +01:00
Łukasz Osipiuk	9689b30ee4	Do not use time.Now() in RegisterFailedScaleUp	2018-12-28 17:17:07 +01:00
Łukasz Osipiuk	da5bef307b	Allow updating Increase for ScaleUpRequest in ClusterStateRegistry	2018-12-28 17:17:07 +01:00
Maciej Pytel	60babe7158	Use kubernetes lister for daemonset instead of custom one Also migrate to using apps/v1.DaemonSet instead of old extensions/v1beta1.	2018-12-28 13:55:41 +01:00
Maciej Pytel	40811c2f8b	Add listers for more controllers	2018-12-28 13:31:21 +01:00
Kubernetes Prow Robot	62c492cb1f	Merge pull request #1518 from lsytj0413/fix-golint refactor(*): fix golint warning	2018-12-21 06:05:20 -08:00
lsytj0413	672dddd23a	refactor(*): fix golint warning	2018-12-19 10:04:08 +08:00

445 Commits