autoscaler

Commit Graph

Author	SHA1	Message	Date
Marwan Ahmed	a3bada3708	correctly classify error for failed scale ups	2020-09-13 21:14:27 -07:00
M. Habib Rosyad	b7e02047f7	expose max-nodes-total as a metric	2020-08-19 17:43:39 +07:00
Maciek Pytel	655b4081f4	Migrate to klog v2	2020-06-05 17:22:26 +02:00
Łukasz Osipiuk	b4c8bbb12c	Fixes around metrics/ handler	2019-11-22 14:07:10 +01:00
Julien Balestra	012c8421da	cluster-autoscaler/metrics: add a summary for function duration Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>	2019-08-28 16:28:16 +02:00
Julien Balestra	6d707a08ac	cluster-autoscaler/metrics: expose the scale down cooldown Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>	2019-08-27 18:12:33 +02:00
Jacek Kaniuk	0c64e0932a	Tainting unneeded nodes as PreferNoSchedule	2019-01-21 13:06:50 +01:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead github.com/golang/glog	2018-11-26 17:30:31 +01:00
Karol Gołąb	67b834368b	Add client-go metrics (rest_client_request_*).	2018-09-06 12:35:16 +02:00
Karol Gołąb	aae4d1270a	Make GetGpuTypeForMetrics more robust	2018-06-26 21:35:16 +02:00
Karol Gołąb	5eb7021f82	Add GPU-related scaled_up & scaled_down metrics (#974 ) * Add GPU-related scaled_up & scaled_down metrics * Fix name to match SD naming convention * Fix import after master rebase * Change the logic to include GPU-being-installed nodes	2018-06-22 21:00:52 +02:00
Aleksandra Malinowska	3894ecb470	Export unregistered node count metric	2018-01-16 16:56:40 +01:00
Aleksandra Malinowska	3d33b64599	Export long unregistered node count metric	2018-01-16 16:07:24 +01:00
Aleksandra Malinowska	312f989c15	Don't register metrics unless on leading master	2017-12-14 16:08:20 +01:00
Maciej Pytel	c376ef3c87	Add metrics for autoprovisioning	2017-10-31 17:42:58 +01:00
Maciej Pytel	e12ee88f5f	Add failed scale-up reason in metric	2017-09-26 13:40:34 +02:00
Maciej Pytel	7f7243ea98	Add reason field to faied_scale_ups_total metric For now it's just a placeholder, will add proper logic for next release	2017-09-25 16:33:49 +02:00
Maciej Pytel	5e05c84cf0	Add metric counting failed scale-ups A minor refactor was required to avoid cyclic imports	2017-09-22 18:12:50 +02:00
Marcin Wielgus	2d8f59e23d	Set verbosity for each of the glog.Info logs	2017-09-01 12:34:29 +02:00
Beata Skiba	edeb522274	Add measuring of FilterOutSchedulable	2017-08-22 18:36:13 +02:00
Beata Skiba	43c9b6b06b	Add cleaner function labels for metrics exporting.	2017-08-22 16:09:42 +02:00
Beata Skiba	14df1b808b	Drill down scale down metrics Split scale down duration into three parts: 1. Find nodes to remove 2. Node deletion 3. Misc operations	2017-08-18 14:17:02 +02:00
Maciej Pytel	1782cbc4ed	Log long function execution	2017-08-07 11:21:15 +02:00
Beata Skiba	25f6242b99	Change histogram buckets.	2017-08-04 14:04:02 +02:00
Maciej Pytel	9123400fcf	Change function duration metric to histogram Many functions take an order of magnitude more time if they actually decide to take an action (like deleting node in scale-down) and it's ok if executing action is slow. That makes summary less useful, as we expect to have large outliers on some percentile, depending on churn in cluster. Instead having a histogram gives us the fuller picture of how the distribution of function runtimes look like.	2017-06-23 12:06:28 +02:00
Marcin Wielgus	69c77791a2	Fix error types	2017-06-12 21:26:50 +02:00
Aleksandra Malinowska	972772440a	Add failing health check if autoscaler loop consistently returns error	2017-05-29 11:31:57 +02:00
Aleksandra Malinowska	7c94367099	Add health check	2017-05-25 11:37:44 +02:00
Maciej Pytel	f716a7e496	Add typed errors; add errors_total metric To keep reasonable commit size only top-level files use new errors. Will add them in other files in next commits.	2017-05-18 14:09:15 +02:00
Maciej Pytel	7a21a68b56	Add metrics counting CA operations	2017-05-15 13:03:00 +02:00
Maciej Pytel	4cdf06ea94	Added CA metrics related to autoscaler execution	2017-05-11 14:51:04 +02:00
Maciej Pytel	83ef3d2be3	Added CA metrics related to cluster state	2017-05-11 13:54:04 +02:00
Yusuke Kuoka	baee799524	cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs. The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime. Example usage: ``` ./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler --logtostderr ``` The configmap would look like: ```yaml kind: ConfigMap apiVersion: v1 metadata: name: cluster-autoscaler namespace: kube-system data: settings: |- { "nodeGroups": [ { "minSize": 1, "maxSize": 2, "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5" } ] } ``` Other notes: * Make namespace defaults to "kube-system" according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267 * Trigger a full-recreate on a configuration change according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410 * Introduced `autoscaler/` and moved all the dynamic/recreatable-at-runtime parts of autoscaler into there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663) * Extracted the core of CA(=`func Run()` in `main.go`) into `Autoscaler` * `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change. * Moved `scale_down.go`, `scale_up.go` and `utils.go` into the `autoscaler` package accordingly because they seemed to be meant to be collocated in the same package as the core of CA (which is now implemented as `Autoscaler`) Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`	2017-02-24 20:36:47 +09:00

33 Commits