Document new API Priority and Fairness metrics

Also brush up the descriptions of some of the older metrics.
2020-07-22 19:54:12 -04:00 · 2020-07-22 19:54:12 -04:00 · 800a602e36
parent 1c6a25e257
commit 800a602e36
1 changed files with 85 additions and 28 deletions
--- a/content/en/docs/concepts/cluster-administration/flow-control.md
+++ b/content/en/docs/concepts/cluster-administration/flow-control.md
@@ -308,10 +308,12 @@ exports additional metrics. Monitoring these can help you determine whether your
 configuration is inappropriately throttling important traffic, or find
 poorly-behaved workloads that may be harming system health.

-* `apiserver_flowcontrol_rejected_requests_total` counts requests that
-  were rejected, grouped by the name of the assigned priority level,
-  the name of the assigned FlowSchema, and the reason for rejection.
-  The reason will be one of the following:
+* `apiserver_flowcontrol_rejected_requests_total` is a vector of
+  counters (cumulative since server start) of requests that were
+  rejected, broken down by the labels `flowSchema` (indicating the one
+  that matched the request), `priorityLevel` (indicating the one to
+  which the request was assigned), and `reason`.  The `reason` label
+  will have one of the following values:
    * `queue-full`, indicating that too many requests were already
      queued,
    * `concurrency-limit`, indicating that the
@@ -320,23 +322,73 @@ poorly-behaved workloads that may be harming system health.
    * `time-out`, indicating that the request was still in the queue
      when its queuing time limit expired.

-* `apiserver_flowcontrol_dispatched_requests_total` counts requests
-  that began executing, grouped by the name of the assigned priority
-  level and the name of the assigned FlowSchema.
+* `apiserver_flowcontrol_dispatched_requests_total` is a vector of
+  counters (cumulative since server start) of requests that began
+  executing, broken down by the labels `flowSchema` (indicating the
+  one that matched the request) and `priorityLevel` (indicating the
+  one to which the request was assigned).

-* `apiserver_flowcontrol_current_inqueue_requests` gives the
-  instantaneous total number of queued (not executing) requests,
-  grouped by priority level and FlowSchema.
+* `apiserver_current_inqueue_requests` is a vector of gauges of recent
+  high water marks of the number of queued requests, grouped by a
+  label named `request_kind` whose value is `mutating` or `readOnly`.
+  These high water marks describe the largest number seen in the one
+  second window most recently completed.  These complement the older
+  `apiserver_current_inflight_requests` gauge vector that holds the
+  last window's high water mark of the number of requests actively being
+  served.

-* `apiserver_flowcontrol_current_executing_requests` gives the instantaneous
-  total number of executing requests, grouped by priority level and FlowSchema.
+* `apiserver_flowcontrol_read_vs_write_request_count_samples` is a
+  vector of histograms of observations of the then-current number of
+  requests, broken down by the labels `phase` (which takes on the
+  values `waiting` and `executing`) and `request_kind` (which takes on
+  the values `mutating` and `readOnly`).  The observations are made
+  periodically at a high rate.

-* `apiserver_flowcontrol_request_queue_length_after_enqueue` gives a
-  histogram of queue lengths for the queues, grouped by priority level
-  and FlowSchema, as sampled by the enqueued requests.  Each request
-  that gets queued contributes one sample to its histogram, reporting
-  the length of the queue just after the request was added.  Note that
-  this produces different statistics than an unbiased survey would.
+* `apiserver_flowcontrol_read_vs_write_request_count_watermarks` is a
+  vector of histograms of high or low water marks of the number of
+  requests broken down by the labels `phase` (which takes on the
+  values `waiting` and `executing`) and `request_kind` (which takes on
+  the values `mutating` and `readOnly`); the label `mark` takes on
+  values `high` and `low`.  The water marks are accumulated over
+  windows bounded by the times when an observation was added to
+  `apiserver_flowcontrol_read_vs_write_request_count_samples`.  These
+  water marks show the range of values that occurred between samples.
+
+* `apiserver_flowcontrol_current_inqueue_requests` is a vector of
+  gauges holding the instantaneous number of queued (not executing)
+  requests, broken down by the labels `priorityLevel` and
+  `flowSchema`.
+
+* `apiserver_flowcontrol_current_executing_requests` is a vector of
+  gauges holding the instantaneous number of executing (not waiting in
+  a queue) requests, broken down by the labels `priorityLevel` and
+  `flowSchema`.
+
+* `apiserver_flowcontrol_priority_level_request_count_samples` is a
+  vector of histograms of observations of the then-current number of
+  requests broken down by the labels `phase` (which takes on the
+  values `waiting` and `executing`) and `priorityLevel`.  Each
+  histogram gets observations taken periodically, up through the last
+  activity of the relevant sort.  The observations are made at a high
+  rate.
+
+* `apiserver_flowcontrol_priority_level_request_count_watermarks` is a
+  vector of histograms of high or low water marks of the number of
+  requests broken down by the labels `phase` (which takes on the
+  values `waiting` and `executing`) and `priorityLevel`; the label
+  `mark` takes on values `high` and `low`.  The water marks are
+  accumulated over windows bounded by the times when an observation
+  was added to
+  `apiserver_flowcontrol_priority_level_request_count_samples`.  These
+  water marks show the range of values that occurred between samples.
+
+* `apiserver_flowcontrol_request_queue_length_after_enqueue` is a
+  vector of histograms of queue lengths for the queues, broken down by
+  the labels `priorityLevel` and `flowSchema`, as sampled by the
+  enqueued requests.  Each request that gets queued contributes one
+  sample to its histogram, reporting the length of the queue just
+  after the request was added.  Note that this produces different
+  statistics than an unbiased survey would.
    {{< note >}}
    An outlier value in a histogram here means it is likely that a single flow
    (i.e., requests by one user or for one namespace, depending on
@@ -346,14 +398,17 @@ poorly-behaved workloads that may be harming system health.
    to increase that PriorityLevelConfiguration's concurrency shares.
    {{< /note >}}

-* `apiserver_flowcontrol_request_concurrency_limit` gives the computed
-  concurrency limit (based on the API server's total concurrency limit and PriorityLevelConfigurations'
-  concurrency shares) for each PriorityLevelConfiguration.
+* `apiserver_flowcontrol_request_concurrency_limit` is a vector of
+  gauges holding the computed concurrency limit (based on the API
+  server's total concurrency limit and PriorityLevelConfigurations'
+  concurrency shares), broken down by the label `priorityLevel`.

-* `apiserver_flowcontrol_request_wait_duration_seconds` gives a histogram of how
-  long requests spent queued, grouped by the FlowSchema that matched the
-  request, the PriorityLevel to which it was assigned, and whether or not the
-  request successfully executed.
+* `apiserver_flowcontrol_request_wait_duration_seconds` is a vector of
+  histograms of how long requests spent queued, broken down by the
+  labels `flowSchema` (indicating which one matched the request),
+  `priorityLevel` (indicating the one to which the request was
+  assigned), and `execute` (indicating whether the request started
+  executing).
    {{< note >}}
    Since each FlowSchema always assigns requests to a single
    PriorityLevelConfiguration, you can add the histograms for all the
@@ -361,9 +416,11 @@ poorly-behaved workloads that may be harming system health.
    requests assigned to that priority level.
    {{< /note >}}

-* `apiserver_flowcontrol_request_execution_seconds` gives a histogram of how
-  long requests took to actually execute, grouped by the FlowSchema that matched the
-  request and the PriorityLevel to which it was assigned.
+* `apiserver_flowcontrol_request_execution_seconds` is a vector of
+  histograms of how long requests took to actually execute, broken
+  down by the labels `flowSchema` (indicating which one matched the
+  request) and `priorityLevel` (indicating the one to which the request
+  was assigned).
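
As a rough illustration of how the labels documented in this change might be consumed, the sketch below sums `apiserver_flowcontrol_rejected_requests_total` per `priorityLevel` and `reason` from Prometheus text-format output. Only the metric and label names come from the diff above; the sample lines, the FlowSchema and priority level names in them (`fs-a`, `fs-b`, `pl-x`), and the helper `rejections_by_level_and_reason` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical scrape output: the values and label values are made up
# for illustration and do not come from a real API server.
SAMPLE = '''\
# TYPE apiserver_flowcontrol_rejected_requests_total counter
apiserver_flowcontrol_rejected_requests_total{flowSchema="fs-a",priorityLevel="pl-x",reason="queue-full"} 3
apiserver_flowcontrol_rejected_requests_total{flowSchema="fs-b",priorityLevel="pl-x",reason="queue-full"} 2
apiserver_flowcontrol_rejected_requests_total{flowSchema="fs-a",priorityLevel="pl-x",reason="time-out"} 1
apiserver_flowcontrol_dispatched_requests_total{flowSchema="fs-a",priorityLevel="pl-x"} 40
'''

METRIC = "apiserver_flowcontrol_rejected_requests_total"
# Matches lines of the form: name{label="value",...} number
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def rejections_by_level_and_reason(text):
    """Sum the rejected-request counters over flowSchema.

    Returns a dict keyed by (priorityLevel, reason).
    """
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        # Skip comment lines and every other metric.
        if not match or match.group(1) != METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        key = (labels["priorityLevel"], labels["reason"])
        totals[key] += float(match.group(3))
    return dict(totals)


if __name__ == "__main__":
    for (level, reason), count in sorted(rejections_by_level_and_reason(SAMPLE).items()):
        print(f"{level} {reason} {count:g}")
```

Because the counters are cumulative since server start, a real monitoring setup would more likely compare successive scrapes (or use a rate in the query layer) than read the raw totals as shown here.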