Document new API Priority and Fairness metrics

Also brush up the descriptions of some of the older metrics.
Mike Spreitzer 2020-07-22 19:54:12 -04:00
parent 1c6a25e257
commit 800a602e36
1 changed files with 85 additions and 28 deletions

@ -308,10 +308,12 @@ exports additional metrics. Monitoring these can help you determine whether your
configuration is inappropriately throttling important traffic, or find
poorly-behaved workloads that may be harming system health.
* `apiserver_flowcontrol_rejected_requests_total` counts requests that
were rejected, grouped by the name of the assigned priority level,
the name of the assigned FlowSchema, and the reason for rejection.
The reason will be one of the following:
* `apiserver_flowcontrol_rejected_requests_total` is a vector of
counters (cumulative since server start) of requests that were
rejected, broken down by the labels `flowSchema` (indicating the one
that matched the request), `priorityLevel` (indicating the one to
which the request was assigned), and `reason`. The `reason` label
will have one of the following values:
* `queue-full`, indicating that too many requests were already
queued,
* `concurrency-limit`, indicating that the
@ -320,23 +322,73 @@ poorly-behaved workloads that may be harming system health.
* `time-out`, indicating that the request was still in the queue
when its queuing time limit expired.
* `apiserver_flowcontrol_dispatched_requests_total` counts requests
that began executing, grouped by the name of the assigned priority
level and the name of the assigned FlowSchema.
* `apiserver_flowcontrol_dispatched_requests_total` is a vector of
counters (cumulative since server start) of requests that began
executing, broken down by the labels `flowSchema` (indicating the
one that matched the request) and `priorityLevel` (indicating the
one to which the request was assigned).
* `apiserver_flowcontrol_current_inqueue_requests` gives the
instantaneous total number of queued (not executing) requests,
grouped by priority level and FlowSchema.
* `apiserver_current_inqueue_requests` is a vector of gauges of recent
high water marks of the number of queued requests, broken down by a
label named `request_kind` whose value is `mutating` or `readOnly`.
These high water marks describe the largest number seen in the most
recently completed one-second window. These complement the older
`apiserver_current_inflight_requests` gauge vector that holds the
last window's high water mark of the number of requests actively
being served.
* `apiserver_flowcontrol_current_executing_requests` gives the instantaneous
total number of executing requests, grouped by priority level and FlowSchema.
* `apiserver_flowcontrol_read_vs_write_request_count_samples` is a
vector of histograms of observations of the then-current number of
requests, broken down by the labels `phase` (which takes on the
values `waiting` and `executing`) and `request_kind` (which takes on
the values `mutating` and `readOnly`). The observations are made
periodically at a high rate.
* `apiserver_flowcontrol_request_queue_length_after_enqueue` gives a
histogram of queue lengths for the queues, grouped by priority level
and FlowSchema, as sampled by the enqueued requests. Each request
that gets queued contributes one sample to its histogram, reporting
the length of the queue just after the request was added. Note that
this produces different statistics than an unbiased survey would.
* `apiserver_flowcontrol_read_vs_write_request_count_watermarks` is a
vector of histograms of high or low water marks of the number of
requests broken down by the labels `phase` (which takes on the
values `waiting` and `executing`) and `request_kind` (which takes on
the values `mutating` and `readOnly`); the label `mark` takes on
values `high` and `low`. The water marks are accumulated over
windows bounded by the times when an observation was added to
`apiserver_flowcontrol_read_vs_write_request_count_samples`. These
water marks show the range of values that occurred between samples.
* `apiserver_flowcontrol_current_inqueue_requests` is a vector of
gauges holding the instantaneous number of queued (not executing)
requests, broken down by the labels `priorityLevel` and
`flowSchema`.
* `apiserver_flowcontrol_current_executing_requests` is a vector of
gauges holding the instantaneous number of executing (not waiting in
a queue) requests, broken down by the labels `priorityLevel` and
`flowSchema`.
* `apiserver_flowcontrol_priority_level_request_count_samples` is a
vector of histograms of observations of the then-current number of
requests broken down by the labels `phase` (which takes on the
values `waiting` and `executing`) and `priorityLevel`. Each
histogram gets observations taken periodically, up through the last
activity of the relevant sort. The observations are made at a high
rate.
* `apiserver_flowcontrol_priority_level_request_count_watermarks` is a
vector of histograms of high or low water marks of the number of
requests broken down by the labels `phase` (which takes on the
values `waiting` and `executing`) and `priorityLevel`; the label
`mark` takes on values `high` and `low`. The water marks are
accumulated over windows bounded by the times when an observation
was added to
`apiserver_flowcontrol_priority_level_request_count_samples`. These
water marks show the range of values that occurred between samples.
* `apiserver_flowcontrol_request_queue_length_after_enqueue` is a
vector of histograms of queue lengths for the queues, broken down by
the labels `priorityLevel` and `flowSchema`, as sampled by the
enqueued requests. Each request that gets queued contributes one
sample to its histogram, reporting the length of the queue just
after the request was added. Note that this produces different
statistics than an unbiased survey would.
{{< note >}}
An outlier value in a histogram here means it is likely that a single flow
(i.e., requests by one user or for one namespace, depending on
@ -346,14 +398,17 @@ poorly-behaved workloads that may be harming system health.
to increase that PriorityLevelConfiguration's concurrency shares.
{{< /note >}}
* `apiserver_flowcontrol_request_concurrency_limit` gives the computed
concurrency limit (based on the API server's total concurrency limit and PriorityLevelConfigurations'
concurrency shares) for each PriorityLevelConfiguration.
* `apiserver_flowcontrol_request_concurrency_limit` is a vector of
gauges holding the computed concurrency limit (based on the API
server's total concurrency limit and PriorityLevelConfigurations'
concurrency shares), broken down by the label `priorityLevel`.
* `apiserver_flowcontrol_request_wait_duration_seconds` gives a histogram of how
long requests spent queued, grouped by the FlowSchema that matched the
request, the PriorityLevel to which it was assigned, and whether or not the
request successfully executed.
* `apiserver_flowcontrol_request_wait_duration_seconds` is a vector of
histograms of how long requests spent queued, broken down by the
labels `flowSchema` (indicating which one matched the request),
`priorityLevel` (indicating the one to which the request was
assigned), and `execute` (indicating whether the request started
executing).
{{< note >}}
Since each FlowSchema always assigns requests to a single
PriorityLevelConfiguration, you can add the histograms for all the
@ -361,9 +416,11 @@ poorly-behaved workloads that may be harming system health.
requests assigned to that priority level.
{{< /note >}}
* `apiserver_flowcontrol_request_execution_seconds` gives a histogram of how
long requests took to actually execute, grouped by the FlowSchema that matched the
request and the PriorityLevel to which it was assigned.
* `apiserver_flowcontrol_request_execution_seconds` is a vector of
histograms of how long requests took to actually execute, broken
down by the labels `flowSchema` (indicating which one matched the
request) and `priorityLevel` (indicating the one to which the request
was assigned).
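
As a rough illustration of how the two counter vectors described above
can be combined, the following sketch parses a scrape in the Prometheus
text exposition format and estimates, per `priorityLevel`, the fraction
of requests that were rejected rather than dispatched. The sample
values are made up for illustration, the helper names `parse` and
`rejected_fraction` are hypothetical, and the parser only handles the
simple labelled lines shown here; it is not a general-purpose
exposition-format parser.

```python
import re
from collections import defaultdict

# Illustrative scrape output; the numbers are invented for this example.
SAMPLE = '''\
# TYPE apiserver_flowcontrol_rejected_requests_total counter
apiserver_flowcontrol_rejected_requests_total{flowSchema="service-accounts",priorityLevel="workload-low",reason="queue-full"} 5
apiserver_flowcontrol_rejected_requests_total{flowSchema="service-accounts",priorityLevel="workload-low",reason="time-out"} 15
apiserver_flowcontrol_dispatched_requests_total{flowSchema="service-accounts",priorityLevel="workload-low"} 180
apiserver_flowcontrol_dispatched_requests_total{flowSchema="system-nodes",priorityLevel="system"} 100
'''

LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Return (metric name, labels dict, value) for each labelled sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(labels)), float(value)))
    return samples


def rejected_fraction(samples):
    """Per priorityLevel: rejected / (rejected + dispatched), summed over other labels."""
    rejected = defaultdict(float)
    dispatched = defaultdict(float)
    for name, labels, value in samples:
        level = labels.get("priorityLevel")
        if name == "apiserver_flowcontrol_rejected_requests_total":
            rejected[level] += value
        elif name == "apiserver_flowcontrol_dispatched_requests_total":
            dispatched[level] += value
    result = {}
    for level in set(rejected) | set(dispatched):
        total = rejected[level] + dispatched[level]
        result[level] = rejected[level] / total if total else 0.0
    return result


if __name__ == "__main__":
    for level, fraction in sorted(rejected_fraction(parse(SAMPLE)).items()):
        print(f"{level}: {fraction:.1%} rejected")
```

Because both metrics are counters that accumulate from server start,
a ratio computed this way describes the whole lifetime of the server;
comparing two scrapes and using the differences would give a view of a
more recent interval.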