Previously, if any of the nodes failed to delete, the processor would get a
ScaleDownError status. After this commit, it will get the list of nodes that
were successfully deleted.
Fix error format strings according to best practices from CodeReviewComments
Reverted incorrect change to an error format string
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Resolve conflict
Fix error strings in test cases to remedy failing tests
Fix more error strings to remedy failing tests
Explicitly handle nil as a return value for nodeGroup in
`calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called
for GPU nodes that don't exist. The current logic panics at runtime
with:
"reflect: call of reflect.Value.IsNil on zero Value"
Looking through the rest of the tree, all the other places that use
this pattern also explicitly check whether `nodeGroup == nil` first.
This change now completes the pattern in
`calculateScaleDownGpusTotal()`.
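The completed pattern can be sketched in isolation. `NodeGroup` and the concrete type below are illustrative stand-ins for the cloud provider's interface; the point is that `reflect.ValueOf` of a nil interface yields a zero `reflect.Value`, on which `IsNil()` panics, so the `== nil` check must come first:

```go
package main

import (
	"fmt"
	"reflect"
)

// NodeGroup is a stand-in for the cloudprovider.NodeGroup interface.
type NodeGroup interface{ Id() string }

type fakeNodeGroup struct{ name string }

func (g *fakeNodeGroup) Id() string { return g.name }

// isNilNodeGroup shows the defensive pattern: check the interface
// itself for nil first, because reflect.ValueOf(nil).IsNil() panics
// with "reflect: call of reflect.Value.IsNil on zero Value".
func isNilNodeGroup(ng NodeGroup) bool {
	return ng == nil || reflect.ValueOf(ng).IsNil()
}

func main() {
	var untyped NodeGroup                      // nil interface: zero reflect.Value
	var typed NodeGroup = (*fakeNodeGroup)(nil) // non-nil interface holding a nil pointer

	fmt.Println(isNilNodeGroup(untyped))                    // true — short-circuits before reflect
	fmt.Println(isNilNodeGroup(typed))                      // true — caught by IsNil()
	fmt.Println(isNilNodeGroup(&fakeNodeGroup{name: "ng"})) // false
}
```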
Looking at the other occurrences of this pattern we see:
```
File: clusterstate/clusterstate.go
488:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/utils.go
231:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
322:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
394:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
461:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/scale_down.go
185:6: if reflect.ValueOf(nodeGroup).IsNil() {
608:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
747:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
1010:25: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
```
with the notable exception of core/scale_down.go:185, which is in
`calculateScaleDownGpusTotal()`.
With this change, and invoking the autoscaler with:
```
...
--max-nodes-total=24 \
--cores-total=8:128 \
--memory-total=4:256 \
--gpu-total=nvidia.com/gpu:0:16 \
--gpu-total=amd.com/gpu:0:4 \
...
```
I no longer see the panic.
Adds the flags --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization
(both defaulting to false); when enabled, they factor DaemonSet and mirror pods
out of the resource utilization calculation for a node.
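A minimal sketch of the idea behind these flags, using simplified stand-in types (these are not the autoscaler's real types, field names, or calculation):

```go
package main

import "fmt"

// Pod is a simplified stand-in for a scheduled pod.
type Pod struct {
	Name        string
	CPURequest  int64 // millicores
	IsDaemonSet bool
	IsMirror    bool
}

// nodeCPUUtilization excludes DaemonSet / mirror pod requests from the
// numerator when the corresponding ignore flag is set, so such pods no
// longer make a node look busier than it is for scale-down purposes.
func nodeCPUUtilization(pods []Pod, allocatable int64, ignoreDaemonSets, ignoreMirror bool) float64 {
	var requested int64
	for _, p := range pods {
		if ignoreDaemonSets && p.IsDaemonSet {
			continue
		}
		if ignoreMirror && p.IsMirror {
			continue
		}
		requested += p.CPURequest
	}
	return float64(requested) / float64(allocatable)
}

func main() {
	pods := []Pod{
		{Name: "app", CPURequest: 500},
		{Name: "fluentd", CPURequest: 200, IsDaemonSet: true},
		{Name: "kube-proxy", CPURequest: 100, IsMirror: true},
	}
	fmt.Println(nodeCPUUtilization(pods, 1000, false, false)) // 0.8
	fmt.Println(nodeCPUUtilization(pods, 1000, true, true))   // 0.5
}
```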
* Add GPU-related scaled_up & scaled_down metrics
* Fix name to match SD naming convention
* Fix import after master rebase
* Change the logic to include GPU-being-installed nodes
AutoscalingContext is basically configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state, and is thus moved to the other state-keeping
objects.
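The split can be sketched as follows, with illustrative field names (not the real types): configuration and handles stay in the context, while anything mutated per loop lives in a state-keeping object.

```go
package main

import "fmt"

// AutoscalingContext holds configuration and static handles: values set
// once at startup and read everywhere.
type AutoscalingContext struct {
	MaxNodesTotal int    // static configuration
	CloudProvider string // stand-in for an API handle
}

// ClusterStateRegistry holds mutable cluster state, updated every loop,
// and so is kept separate from the context.
type ClusterStateRegistry struct {
	unreadyNodes map[string]bool
}

func (r *ClusterStateRegistry) MarkUnready(node string) {
	r.unreadyNodes[node] = true
}

func main() {
	ctx := AutoscalingContext{MaxNodesTotal: 24, CloudProvider: "gce"}
	state := &ClusterStateRegistry{unreadyNodes: map[string]bool{}}
	state.MarkUnready("node-1")
	fmt.Println(ctx.MaxNodesTotal, len(state.unreadyNodes)) // 24 1
}
```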