autoscaler

Commit Graph

Author	SHA1	Message	Date
Jakub Tużnik	b92f971326	Provide ScaleDownStatusProcessor with more info about scale-down results	2019-04-30 13:49:06 +02:00
Jakub Tużnik	402c643851	Modify the info passed to ScaleDownStatusProcessor when empty nodes are deleted Previously, if any of the nodes fails to delete, the processor gets a ScaleDownError status. After this commit, it will get the list of nodes that were successfully deleted.	2019-04-26 15:54:11 +02:00
Jiaxin Shan	83ae66cebc	Consider GPU utilization in scaling down	2019-04-04 01:12:51 -07:00
Jiaxin Shan	90666881d3	Move GPULabel and GPUTypes to cloud provider	2019-03-25 13:03:01 -07:00
Marcin Wielgus	99f1dcf9d2	Merge branch 'master' into crc-fix-error-format	2019-02-01 17:22:57 +01:00
Vivek Bagade	79ef3a6940	unexporting methods in utils.go	2019-01-25 00:06:03 +05:30
Jacek Kaniuk	0c64e0932a	Tainting unneeded nodes as PreferNoSchedule	2019-01-21 13:06:50 +01:00
CodeLingo Bot	c0603afdeb	Fix error format strings according to best practices from CodeReviewComments Fix error format strings according to best practices from CodeReviewComments Fix error format strings according to best practices from CodeReviewComments Reverted incorrect change to with error format string Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingoBot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <bot@codelingo.io> Resolve conflict Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingoBot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <hello@codelingo.io> Signed-off-by: CodeLingo Bot <bot@codelingo.io> Fix error strings in testscases to remedy failing tests Signed-off-by: CodeLingo Bot <bot@codelingo.io> Fix more error strings to remedy failing tests Signed-off-by: CodeLingo Bot <bot@codelingo.io>	2019-01-11 09:10:31 +13:00
Maciej Pytel	9060014992	Use listers in scale-down	2018-12-31 14:55:38 +01:00
lsytj0413	672dddd23a	refactor(*): fix golint warning	2018-12-19 10:04:08 +08:00
Andrew McDermott	fd3fd85f26	UPSTREAM: <carry>: handle nil nodeGroup in calculateScaleDownGpusTotal Explicitly handle nil as a return value for nodeGroup in `calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called for GPU nodes that don't exist. The current logic generates a runtime exception: "reflect: call of reflect.Value.IsNil on zero Value" Looking through the rest of the tree all the other places that use this pattern additionally and explicitly check whether `nodeGroup == nil` first. This change now completes the pattern in `calculateScaleDownGpusTotal()`. Looking at the other occurrences of this pattern we see: ``` File: clusterstate/clusterstate.go 488:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { File: core/utils.go 231:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 322:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 394:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 461:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { File: core/scale_down.go 185:6: if reflect.ValueOf(nodeGroup).IsNil() { 608:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 747:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 1010:25: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { ``` with the notable exception at core/scale_down.go:185 which is `calculateScaleDownGpusTotal()`. With this change, and invoking the autoscaler with: ``` ... --max-nodes-total=24 \ --cores-total=8:128 \ --memory-total=4:256 \ --gpu-total=nvidia.com/gpu:0:16 \ --gpu-total=amd.com/gpu:0:4 \ ... ``` I no longer see a runtime exception.	2018-12-05 18:54:07 +00:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead github.com/golang/glog	2018-11-26 17:30:31 +01:00
Alex Price	4ae7acbacc	add flags to ignore daemonsets and mirror pods when calculating resource utilization of a node Adds the flag --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization (defaults to false) and when enabled, factors DaemonSet and mirror pods out when calculating the resource utilization of a node.	2018-11-23 15:24:25 +11:00
Łukasz Osipiuk	55fc1e2f00	Store NodeGroup in ScaleUpRequest and ScaleDownRequest	2018-10-30 18:03:04 +01:00
Jakub Tużnik	71111da20c	Add a scale down status processor, refactor so that there's more scale down info available to it	2018-09-12 14:52:20 +02:00
Pengfei Ni	1dd0147d9e	Add more events for CA	2018-07-09 15:42:05 +08:00
Aleksandra Malinowska	800ee56b34	Refactor and extend GPU metrics error types	2018-07-05 13:13:11 +02:00
Karol Gołąb	aae4d1270a	Make GetGpuTypeForMetrics more robust	2018-06-26 21:35:16 +02:00
Marcin Wielgus	f2e76e2592	Merge pull request #1008 from krzysztof-jastrzebski/master Move removing unneeded autoprovisioned node groups to node group manager	2018-06-22 21:01:36 +02:00
Karol Gołąb	5eb7021f82	Add GPU-related scaled_up & scaled_down metrics (#974) * Add GPU-related scaled_up & scaled_down metrics * Fix name to match SD naming convention * Fix import after master rebase * Change the logic to include GPU-being-installed nodes	2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski	2df2568841	Move removing unneeded autoprovisioned node groups to node group manager	2018-06-22 14:26:12 +02:00
Nic Doye	ebadbda2b2	issues/933 Consider making UnremovableNodeRecheckTimeout configurable	2018-06-18 11:54:14 +01:00
Łukasz Osipiuk	b7323bc0d1	Respect GPU limits in scale_up	2018-06-14 15:46:58 +02:00
Łukasz Osipiuk	9f75099d2c	Restructure checking resource limits in scale_up.go Preparatory work for before introducing GPU limits	2018-06-13 19:00:37 +02:00
Łukasz Osipiuk	087a5cc9a9	Respect GPU limits in scale_down	2018-06-13 14:19:59 +02:00
Łukasz Osipiuk	1fa44a4d3a	Fix bug resulting resource limits not being enforced in scale_down	2018-06-11 16:39:07 +02:00
Łukasz Osipiuk	519064e1ec	Extract isNodeBeingDeleted function	2018-06-11 14:21:07 +02:00
Łukasz Osipiuk	6c57a01fc9	Restructure checking resource limits in scale_down.go	2018-06-11 14:02:40 +02:00
Łukasz Osipiuk	9c61477d25	Do not return error when getting cpu/memory capacity of node	2018-06-08 15:04:57 +02:00
Krzysztof Jastrzebski	adad14c2c9	Delete autoprovisioned node pool after all nodes are deleted.	2018-05-28 14:22:18 +02:00
Karol Gołąb	4c710950de	Move ClusterStateRegistry to StaticAutoscaler AutoscalingContext is basically a configuration and few static helpers and API handles. ClusterStateRegistry is state and thus moved to other state-keeping objects.	2018-05-24 13:03:01 +02:00
Aleksandra Malinowska	ffeebde8d8	Add support for rescheduled pods with the same name in drain	2018-05-10 12:00:56 +02:00
Marcin Wielgus	9c5728fd74	Merge pull request #836 from kgolab/kg-clean-up-004 Use timestamp argument	2018-05-08 20:24:37 +02:00
Karol Gołąb	53b1c6a394	Use timestamp argument	2018-05-08 13:08:30 +02:00
Karol Gołąb	da16642bcf	Make the code slightly more idiomatic go	2018-05-08 11:35:01 +02:00
Beata Skiba	054f6d8650	Merge pull request #794 from krzysztof-jastrzebski/pods Refactor cluster autoscaler builder and add pod list processor.	2018-04-26 13:08:56 +02:00
Krzysztof Jastrzebski	88b769b324	Refactor cluster autoscaler builder and add pod list processor.	2018-04-26 12:37:51 +02:00
Aleksandra Malinowska	3d599bfabe	Rephrase unremovable node warning	2018-04-18 13:43:32 +02:00
Aleksandra Malinowska	4c594db7f8	Run spellchecker	2018-03-15 15:47:49 +01:00
anniedy	bf59e3daa5	Typo fix unneded->[unneeded] (#623) * Update clusterstate.md * Update scale_down.go * Update static_autoscaler.go	2018-02-07 17:36:58 +01:00
Marcin Wielgus	439fd3c9ec	Merge pull request #411 from krzysztof-jastrzebski/priority Adds priority preemption support to cluster autoscaler.	2017-11-08 09:09:26 +01:00
Edward Tsang	4104a91991	more spelling fixes	2017-11-02 14:21:36 -07:00
Maciej Pytel	c376ef3c87	Add metrics for autoprovisioning	2017-10-31 17:42:58 +01:00
Maciej Pytel	9c2ebccbfe	Write events when autoprovisioned nodegroup is created / deleted	2017-10-25 17:39:30 +02:00
Krzysztof Jastrzebski	56ac572666	Adds resource limits to cloud provider.	2017-10-23 16:06:56 +02:00
Krzysztof Jastrzebski	d9c00e5ce1	Adds priority preemption support to cluster autoscaler.	2017-10-23 09:54:56 +02:00
Aleksandra Malinowska	4c31a57374	fix leaking taints in case of cloud provider error on node deletion	2017-09-22 17:55:48 +02:00
Marcin Wielgus	f04113d746	Remove TargetSize() from loops iterating over nodes	2017-09-13 22:33:17 +02:00
Aleksandra Malinowska	197b05b180	respect minimum cores/memory limit during scale down	2017-09-13 10:10:47 +02:00
Aleksandra Malinowska	187c02693e	Taint empty nodes to be deleted	2017-09-12 17:40:05 +02:00
Marcin Wielgus	3039a0e813	Merge pull request #319 from krzysztof-jastrzebski/core-test Core/static_autoscaler.go unit tests.	2017-09-12 13:11:11 +02:00
Beata Skiba	eba0fa2f95	Remove nodes that are not in the cluster from unremovableNodes	2017-09-11 20:01:02 +02:00
Krzysztof Jastrzebski	0aec68a46d	Core/static_autoscaler.go unit tests. Current time usage refactoring.	2017-09-11 15:07:21 +02:00
Marcin Wielgus	db63ac3a18	Merge pull request #324 from aleksandra-malinowska/scale-down-pod-not-found Add checking for pod not found error on eviction	2017-09-11 15:10:08 +05:30
Beata Skiba	6e5784a519	Always add empty nodes to unneeded nodes	2017-09-08 15:55:18 +02:00
Aleksandra Malinowska	fbc8462b10	Add checking for not found error	2017-09-08 15:45:44 +02:00
Marcin Wielgus	f9cabf3a1a	Merge pull request #297 from bskiba/additional-k Only consider up to 10% of the nodes as additional candidates for scale down	2017-09-07 04:34:23 +05:30
Sergey Lanzman	415f53cdea	Change from deprecated Core to CoreV1 for kube client	2017-09-04 22:16:21 +03:00
Beata Skiba	a6c18b87d2	Only consider up to 10% of the nodes as additional candidates for scale down.	2017-09-04 17:37:02 +02:00
Marcin Wielgus	bcc8cded64	Clean up empty autoprovisioned node groups	2017-09-04 13:53:07 +02:00
Marcin Wielgus	c0b48e4a15	Merge pull request #285 from mwielgus/loglevel Set verbosity for each of the glog.Info logs	2017-09-01 16:42:11 +05:30
Marcin Wielgus	2d8f59e23d	Set verbosity for each of the glog.Info logs	2017-09-01 12:34:29 +02:00
Beata Skiba	576e4105db	Make ScaleDownNonEmptyCandidatesCount a flag.	2017-08-31 15:05:06 +02:00
Beata Skiba	4560cc0a85	Keep maximum 30 candidates for scale down with drain	2017-08-31 14:58:40 +02:00
Marcin Wielgus	191d140107	Don't increase pod graceful termination	2017-08-28 16:54:19 +02:00
Marcin Wielgus	6ad7ca21e8	Merge pull request #265 from MaciekPytel/ignore_unneded_if_min_size Skip nodes in min-sized groups in scale-down simulation	2017-08-28 19:40:53 +05:30
Maciej Pytel	2f6dd8aefc	Skip nodes in min-sized groups in scale-down simulation Currently we track if those nodes can be removed and only skip them at the execution step. Since checking if node is unneeded is pretty expensive it's better to filter them out early.	2017-08-28 15:48:41 +02:00
Marcin Wielgus	718e5db78e	Run node drain/delete in a separate goroutine	2017-08-28 12:12:31 +02:00
Maciej Pytel	fa53e52ed9	Skip node in scale-down if it was recently found unremovable	2017-08-25 17:21:08 +02:00
Beata Skiba	44f69c6706	Extract deleting empty nodes to a separate function.	2017-08-22 16:09:42 +02:00
Beata Skiba	14df1b808b	Drill down scale down metrics Split scale down duration into three parts: 1. Find nodes to remove 2. Node deletion 3. Misc operations	2017-08-18 14:17:02 +02:00
Marcin Wielgus	9116e4c08c	Compilation fix for CA after godeps update	2017-08-11 17:56:47 +02:00
Marcin Wielgus	4580e1dc45	Fix getEmptyNodes function in CA	2017-08-07 22:21:41 +02:00
Aleksandra Malinowska	ab8323e8dc	fix some logs in scale down	2017-07-20 10:33:42 +02:00
fate-grand-order	5b230a45ee	correct some misspells for cluster-autoscaler/core	2017-07-13 17:53:59 +08:00
Aleksandra Malinowska	9f54934229	add annotation	2017-07-06 14:47:32 +02:00
Marcin Wielgus	fc43808149	Godeps bump for CA	2017-07-03 22:05:11 +02:00
Marcin Wielgus	2cd532ebfe	Don't calculate utilization and run scale down simulations for unmanaged nodes	2017-06-20 16:57:30 +02:00
Maciej Pytel	767367c866	Fix typos related to max-graceful-termination-sec	2017-06-14 14:14:21 +02:00
Marcin Wielgus	69c77791a2	Fix error types	2017-06-12 21:26:50 +02:00
Maciej Pytel	3f8ca51768	Use typed errors in scale down	2017-05-18 14:09:15 +02:00
Maciej Pytel	7a21a68b56	Add metrics counting CA operations	2017-05-15 13:03:00 +02:00
Marcin Wielgus	42c177b68f	Add deletion safety margin to node drain	2017-05-08 11:47:33 +02:00
Marcin Wielgus	34eb4973f8	Fix imports in cluster autoscaler after migrating it from contrib	2017-04-18 15:42:04 +02:00
Maciej Pytel	0b74a3bd25	Cluster-Autoscaler: update event name	2017-04-10 14:03:21 +02:00
Maciej Pytel	72c885b800	Cluster-Autoscaler: reset scale-down on unready cluster	2017-03-22 17:17:59 +01:00
Maciej Pytel	39162f0860	Cluster-Autoscaler: evict pods instead of deleting them	2017-03-10 16:18:47 +01:00
Maciej Pytel	5d2c675c8e	Cluster-Autoscaler: update scale down status	2017-03-08 11:51:20 +01:00
Marcin Wielgus	27b797f541	Cluster-Autoscaler: skip nodes currently under deletion in scale down	2017-03-07 14:59:15 +01:00
Kubernetes Submit Queue	39fa783ad7	Merge pull request https://github.com/kubernetes/contrib/pull/2451 from mwielgus/pdb-ca Automatic merge from submit-queue Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2 In part 1 or 2 we skip nodes that have a pod with 0 poddisruptionallowed. Part 2/2 will delete pods using evict. cc: @jszczepkowski @MaciekPytel @davidopp @fgrzadkowski	2017-03-06 09:27:50 -08:00
Marcin Wielgus	5b4441083a	Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2	2017-03-06 17:15:04 +01:00
Maciej Pytel	d3bf5d3d51	Cluster-Autoscaler: log events on status configmap	2017-03-06 12:21:24 +01:00
Marcin Wielgus	2ffaddb7c0	Cluster-autoscaler: lint	2017-03-02 15:15:07 +01:00
Marcin Wielgus	72a47dc2b2	Cluster-autoscaler: update code for 1.6 k8s sync	2017-03-02 14:34:49 +01:00
Yusuke Kuoka	baee799524	cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs. The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime. Example usage: ``` ./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler --logtostderr ``` The configmap would look like: ```yaml kind: ConfigMap apiVersion: v1 metadata: name: cluster-autoscaler namespace: kube-system data: settings: |- { "nodeGroups": [ { "minSize": 1, "maxSize": 2, "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5" } ] } ``` Other notes: * Make namespace defaults to "kube-system" according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267 * Trigger a full-recreate on a configuration change according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410 * Introduced `autoscaler/` and moved all the dynamic/recreatable-at-runtime parts of autoscaler into there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663) * Extracted the core of CA(=`func Run()` in `main.go`) into `Autoscaler` * `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change. * Moved `scale_down.go`, `scale_up.go` and `utils.go` into the `autoscaler` package accordingly because they seemed to be meant to be collocated in the same package as the core of CA (which is now implemented as `Autoscaler`) Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`	2017-02-24 20:36:47 +09:00

145 Commits