Commit Graph

45 Commits

Author SHA1 Message Date
Paco Xu 8dec2025f8 Stop applying the beta.kubernetes.io/os and arch 2022-10-27 12:20:04 +08:00
Kubernetes Prow Robot 3e25023401
Merge pull request #5059 from elmiko/skipped-scale-metric
add metric for skipped scaling events
2022-08-08 05:20:19 -07:00
Michael McCune da9d307e57 add metric for skipped scaling events
This change adds a new metric, skipped_scale_events_count, which will
record the number of times that the CA has chosen to skip a scaling
event. The metric contains a label for the scaling direction (up or down)
and the reason.

This patch includes usages for the new metric based on CPU or Memory
limits being reached in either a scale up or down.
2022-07-28 10:51:49 -04:00
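
The commit above (da9d307e57) describes a counter labeled by scaling direction and reason. A minimal Python sketch of that shape follows, using only the standard library rather than a real Prometheus client. The function name `register_skipped_scale_event` and the reason strings are hypothetical and are not taken from the autoscaler source.

```python
from collections import Counter

# Hypothetical stand-in for a labeled counter metric. Keys are
# (direction, reason) label pairs; values are event counts.
skipped_scale_events_count = Counter()


def register_skipped_scale_event(direction: str, reason: str) -> None:
    """Record one skipped scaling event for the given label pair."""
    if direction not in ("up", "down"):
        raise ValueError(f"unknown scaling direction: {direction!r}")
    skipped_scale_events_count[(direction, reason)] += 1


# Illustrative reason strings (assumptions, not the real label values).
register_skipped_scale_event("up", "cpu_limit_reached")
register_skipped_scale_event("up", "cpu_limit_reached")
register_skipped_scale_event("down", "memory_limit_reached")
```

Keeping direction and reason as separate labels lets a dashboard sum over either one, for example all skipped scale-ups regardless of cause.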
Kubernetes Prow Robot f990344b6c
Merge pull request #4134 from airbnb/es--expander-plugin-proposal
CA expander plugin proposal
2022-07-10 13:57:47 -07:00
Daniel Kłobuszewski 48273b7aff Design proposal for parallel drain 2022-04-12 15:16:30 +02:00
Evan Sheng 66af6d1339 add error fallback line 2021-06-14 11:50:08 -07:00
Evan Sheng a956b9ca79 CA proposal 2021-06-14 11:47:27 -07:00
Kubernetes Prow Robot 2beea02a29
Merge pull request #3983 from elmiko/cluster-resource-consumption-metrics
Cluster resource consumption metrics
2021-05-13 15:32:04 -07:00
Michael McCune a24ea6c66b add cluster cores and memory bytes count metrics
This change adds 4 metrics that can be used to monitor the minimum and
maximum limits for CPU and memory, as well as the current counts in
cores and bytes, respectively.

The four metrics added are:
* `cluster_autoscaler_cpu_limits_cores`
* `cluster_autoscaler_cluster_cpu_current_cores`
* `cluster_autoscaler_memory_limits_bytes`
* `cluster_autoscaler_cluster_memory_current_bytes`

This change also adds the `max_cores_total` metric to the metrics
proposal doc, as it was previously not recorded there.

User story: As a cluster autoscaler user, I would like to monitor my
cluster through metrics to determine when the cluster is nearing its
limits for cores and memory usage.
2021-04-06 10:35:21 -04:00
MyannaHarris 7dba63d4d1 Proposal to circumvent 50 tag ASG limit for EKS ManagedNodegroups 2021-03-22 23:20:20 -07:00
Michael McCune 7ecf933e7b add a metric for unregistered nodes removed by cluster autoscaler
This change adds a new metric which counts the number of nodes removed
by the cluster autoscaler due to being unregistered with kubernetes.

User Story

As a cluster-autoscaler user, I would like to know when the autoscaler
is cleaning up nodes that have failed to register with kubernetes. I
would like to monitor the rate at which failed nodes are being removed
so that I can better alert on infrastructure issues which may go
unnoticed elsewhere.
2021-03-04 19:23:03 -05:00
Hector Fernandez 68c984472a docs: polished the rpc operations
Signed-off-by: Hector Fernandez <hectorj@gmail.com>
2021-01-11 20:50:20 +01:00
Hector Fernandez 0b05246432 chore: add authors
Signed-off-by: Hector Fernandez <hfernandez@mesosphere.com>
2021-01-08 11:06:49 +01:00
Hector Fernandez c11665bec8 chore: add gRPC contracts
Signed-off-by: Hector Fernandez <hfernandez@mesosphere.com>
2021-01-08 11:06:49 +01:00
Hector Fernandez 26f941852c chore: mention the new cloud provider 2021-01-08 11:06:48 +01:00
Hector Fernandez 50bb208119 doc: proposal custom cloud provider over gRPC 2021-01-08 11:06:48 +01:00
Jakub Tużnik 73a5cdf928 Address recent breaking changes in scheduler
The following things changed in scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not *apiv1.Pod
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
2020-04-24 17:54:47 +02:00
zhanwang 4fc4cab761
repair url in balance_similar.md
2019-12-30 10:30:23 +08:00
Cria Hu 17ab00865b
fix broken link: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/utils.go 2019-12-19 17:55:27 +08:00
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00
JoeWrightss 322e271f5b
Fix some spelling errors 2018-12-08 01:10:18 +08:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
AdamDang 498e99aec6
Typo fix: "it's reaction time is short"->"its reaction time is short"
2018-05-22 22:08:21 +08:00
AdamDang ebb18bc050
"N1-standard"->"n1-standard"
2018-03-29 00:43:00 +08:00
AdamDang 2ee9ebea2b
Update pricing.md
"n1-standard-..." should be changed to "N1-standard-..."
This doc mostly uses "N1-standard-...", but there are several "n1-standard-..." too.
It's better to use the same format.
2018-03-28 21:29:25 +08:00
Hang Yan d841d87d49 Fix various typos in proposals 2018-02-07 16:12:03 +08:00
lmxia ea31397c64 fix typos in balance_similar.md and pricing.md 2018-02-06 22:50:22 +08:00
Maciej Pytel e1eabe5986 Update metrics documentation 2017-11-07 17:37:10 +01:00
Beata Skiba ac8004f41a Cluster Autoscaler scalability testing report 2017-09-26 18:43:02 +02:00
Nikhita Raghunath b82adcdff7 Fix link after design proposal move 2017-09-26 00:26:26 +05:30
Sergey Lanzman 437a3f60e1 Small optimize code 2017-09-04 23:50:45 +03:00
Marcin Wielgus f6a7eadadf Labels in NAP proposal update 2017-08-16 14:27:44 +02:00
Marcin Wielgus 64f9e065f4 Node Auto-provisioning doc 2017-08-09 13:59:31 +02:00
Beata Skiba e582a1745b Kubemark integration proposal. 2017-06-30 15:57:58 +02:00
Maciej Pytel 9123400fcf Change function duration metric to histogram
Many functions take an order of magnitude more time
if they actually decide to take an action (like deleting
node in scale-down) and it's ok if executing action is
slow. That makes summary less useful, as we expect to
have large outliers on some percentile, depending on
churn in cluster. Instead, having a histogram gives
us a fuller picture of what the distribution of
function runtimes looks like.
2017-06-23 12:06:28 +02:00
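
The reasoning in the commit above (9123400fcf) can be illustrated with a small sketch of histogram bucketing: each duration is counted into the first bucket whose upper bound it does not exceed, and counts are reported cumulatively. The bucket bounds, the sample durations, and the name `cumulative_buckets` are assumptions chosen for illustration, not values from the autoscaler.

```python
from bisect import bisect_left


def cumulative_buckets(durations, upper_bounds):
    """Return cumulative counts per bucket, with a final +Inf bucket.

    Entry i is the number of durations <= sorted(upper_bounds)[i];
    the last entry counts every observation.
    """
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)  # extra slot is the +Inf bucket
    for d in durations:
        # First index whose bound is >= d, or len(bounds) if none is.
        counts[bisect_left(bounds, d)] += 1
    cumulative, total = [], 0
    for c in counts:
        total += c
        cumulative.append(total)
    return cumulative


# Four fast runs and one slow run that actually performed an action.
sample_durations = [0.02, 0.03, 0.05, 0.04, 12.0]
sample_bounds = [0.01, 0.1, 1.0, 10.0]
buckets = cumulative_buckets(sample_durations, sample_bounds)
# buckets == [0, 4, 4, 4, 5]
```

In this sample the single slow run shows up only in the last bucket, while the four fast runs stay visible in the 0.1 bucket, so the outlier does not hide the typical case.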
Marcin Wielgus cd6b27b48f Merge pull request #93 from MaciekPytel/node_group_sets_proposal
Proposal for balancing node groups for multizone
2017-05-31 11:23:29 +02:00
Maciej Pytel d7a7fc659b Proposal for balancing node groups for multizone 2017-05-31 10:54:34 +02:00
Marcin Wielgus 1397882fbe Price-based node group ranking function 2017-05-26 10:31:57 +02:00
Marcin Wielgus ff258b87a1 Min at zero - design doc 2017-05-11 14:44:20 +02:00
Maciej Pytel f7946c8d21 Updated document format 2017-05-10 16:01:21 +02:00
Maciej Pytel 2481d8f706 Renamed metrics 2017-05-10 12:15:23 +02:00
Maciej Pytel 7953a3f0eb CA metrics proposal 2017-05-08 19:04:14 +02:00
Marcin Wielgus c4cf6885f1 Cluster-autoscaler: update unready nodes proposal 2017-01-12 01:04:20 +00:00
Marcin Wielgus 547c7aab56 Cluster-autoscaler: include unregistered nodes in proposal 2017-01-04 15:15:01 +01:00
Marcin Wielgus e0aab3f307 Cluster-autoscaler: cluster state registry proposal 2017-01-02 16:00:33 +01:00