Update to track introduction of borrowing to APF
Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>
parent 1939378cd0
commit 5ce3dcfe7c

@@ -52,20 +52,20 @@ for a general explanation of feature gates and how to enable and
 disable them. The name of the feature gate for APF is
 "APIPriorityAndFairness". This feature also involves an {{<
 glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
-`v1alpha1` version, disabled by default, and (b) `v1beta1` and
-`v1beta2` versions, enabled by default. You can disable the feature
-gate and API group beta versions by adding the following
-command-line flags to your `kube-apiserver` invocation:
+`v1alpha1` version and a `v1beta1` version, disabled by default, and
+(b) `v1beta2` and `v1beta3` versions, enabled by default. You can
+disable the feature gate and API group beta versions by adding the
+following command-line flags to your `kube-apiserver` invocation:

 ```shell
 kube-apiserver \
 --feature-gates=APIPriorityAndFairness=false \
---runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
+--runtime-config=flowcontrol.apiserver.k8s.io/v1beta2=false,flowcontrol.apiserver.k8s.io/v1beta3=false \
 # …and other flags as usual
 ```

-Alternatively, you can enable the v1alpha1 version of the API group
-with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true`.
+Alternatively, you can enable the v1alpha1 and v1beta1 versions of the API group
+with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true,flowcontrol.apiserver.k8s.io/v1beta1=true`.

 The command-line flag `--enable-priority-and-fairness=false` will disable the
 API Priority and Fairness feature, even if other flags have enabled it.
@@ -89,7 +89,7 @@ Without APF enabled, overall concurrency in the API server is limited by the
 defined by these flags are summed and then the sum is divided up among a
 configurable set of _priority levels_. Each incoming request is assigned to a
 single priority level, and each priority level will only dispatch as many
-concurrent requests as its configuration allows.
+concurrent requests as its particular limit allows.

 The default configuration, for example, includes separate priority levels for
 leader-election requests, requests from built-in controllers, and requests from
@@ -97,6 +97,13 @@ Pods. This means that an ill-behaved Pod that floods the API server with
 requests cannot prevent leader election or actions by the built-in controllers
 from succeeding.

+The concurrency limits of the priority levels are periodically
+adjusted, allowing under-utilized priority levels to temporarily lend
+concurrency to heavily-utilized levels. These limits are based on
+nominal limits and bounds on how much concurrency a priority level may
+lend and how much it may borrow, all derived from the configuration
+objects mentioned below.
+
 ### Seats Occupied by a Request

 The above description of concurrency management is the baseline story.
@@ -187,15 +194,38 @@ A PriorityLevelConfiguration represents a single priority level. Each
 PriorityLevelConfiguration has an independent limit on the number of outstanding
 requests, and limitations on the number of queued requests.

-Concurrency limits for PriorityLevelConfigurations are not specified in absolute
-number of requests, but rather in "concurrency shares." The total concurrency
-limit for the API Server is distributed among the existing
-PriorityLevelConfigurations in proportion with these shares. This allows a
-cluster administrator to scale up or down the total amount of traffic to a
-server by restarting `kube-apiserver` with a different value for
-`--max-requests-inflight` (or `--max-mutating-requests-inflight`), and all
-PriorityLevelConfigurations will see their maximum allowed concurrency go up (or
-down) by the same fraction.
+The nominal concurrency limit for a PriorityLevelConfiguration is not
+specified in an absolute number of seats, but rather in "nominal
+concurrency shares." The total concurrency limit for the API Server is
+distributed among the existing PriorityLevelConfigurations in
+proportion to these shares, to give each level its nominal limit in
+terms of seats. This allows a cluster administrator to scale up or
+down the total amount of traffic to a server by restarting
+`kube-apiserver` with a different value for `--max-requests-inflight`
+(or `--max-mutating-requests-inflight`), and all
+PriorityLevelConfigurations will see their maximum allowed concurrency
+go up (or down) by the same fraction.

+{{< caution >}}
+In the versions before `v1beta3` the relevant
+PriorityLevelConfiguration field is named "assured concurrency shares"
+rather than "nominal concurrency shares". Also, in Kubernetes release
+1.25 and earlier there were no periodic adjustments: the
+nominal/assured limits were always applied without adjustment.
+{{< /caution >}}
+
+The bounds on how much concurrency a priority level may lend and how
+much it may borrow are expressed in the PriorityLevelConfiguration as
+percentages of the level's nominal limit. These are resolved to
+absolute numbers of seats by multiplying with the nominal limit /
+100.0 and rounding. The dynamically adjusted concurrency limit of a
+priority level is constrained to lie between (a) a lower bound of its
+nominal limit minus its lendable seats and (b) an upper bound of its
+nominal limit plus the seats it may borrow. At each adjustment the
+dynamic limits are derived by each priority level reclaiming any lent
+seats for which demand recently appeared and then jointly fairly
+responding to the recent seat demand on the priority levels, within
+the bounds just described.
+
 {{< caution >}}
 With the Priority and Fairness feature enabled, the total concurrency limit for
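
The nominal-limit and bounds computation described in the hunk above is plain arithmetic. The POSIX shell sketch below illustrates it under stated assumptions: the function names `nominal_limit`, `percent_to_seats`, and `limit_bounds` are hypothetical, the proportional nominal limit is rounded up (the text only says "in proportion to these shares"), and "rounding" of the percentages is taken to mean round-half-up. It is an illustration, not the API server's implementation.

```shell
# Hypothetical helpers; integer arithmetic only, non-negative inputs.

# nominal_limit TOTAL_SEATS SHARES SUM_OF_ALL_SHARES
# This level's proportional slice of the server's total concurrency
# limit, in seats. Rounding up is an assumption of this sketch.
nominal_limit() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# percent_to_seats NOMINAL PERCENT
# nominal * percent / 100.0, rounded to the nearest seat (halves round up).
percent_to_seats() {
  echo $(( ($1 * $2 + 50) / 100 ))
}

# limit_bounds NOMINAL LENDABLE_PERCENT BORROWING_PERCENT
# Prints "LOWER UPPER": the nominal limit minus the lendable seats, and
# the nominal limit plus the seats the level may borrow.
limit_bounds() {
  lendable=$(percent_to_seats "$1" "$2")
  borrowable=$(percent_to_seats "$1" "$3")
  echo "$(( $1 - lendable )) $(( $1 + borrowable ))"
}
```

For example, `nominal_limit 600 30 245` prints `74` (30 of 245 shares of a 600-seat total is about 73.5 seats), and `limit_bounds 10 25 50` prints `7 15`: a level with a nominal limit of 10 seats that may lend 25% (2.5, rounded to 3 seats) and borrow 50% (5 seats) gets a dynamic limit somewhere between 7 and 15 seats.
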
@@ -606,10 +636,55 @@ poorly-behaved workloads that may be harming system health.
 to increase that PriorityLevelConfiguration's concurrency shares.
 {{< /note >}}

-* `apiserver_flowcontrol_request_concurrency_limit` is a gauge vector
-holding the computed concurrency limit (based on the API server's
-total concurrency limit and PriorityLevelConfigurations' concurrency
-shares), broken down by the label `priority_level`.
+* `apiserver_flowcontrol_request_concurrency_limit` is the same as
+`apiserver_flowcontrol_nominal_limit_seats`. Before the
+introduction of concurrency borrowing between priority levels, this
+was always equal to `apiserver_flowcontrol_current_limit_seats`
+(which did not exist as a distinct metric).
+
+* `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
+holding each priority level's nominal concurrency limit, computed
+from the API server's total concurrency limit and the priority
+level's configured nominal concurrency shares.
+
+* `apiserver_flowcontrol_lower_limit_seats` is a gauge vector holding
+the lower bound on each priority level's dynamic concurrency limit.
+
+* `apiserver_flowcontrol_upper_limit_seats` is a gauge vector holding
+the upper bound on each priority level's dynamic concurrency limit.
+
+* `apiserver_flowcontrol_demand_seats` is a histogram vector counting
+observations, at the end of every nanosecond, of each priority
+level's ratio of (seat demand) / (nominal concurrency limit). A
+priority level's seat demand is the sum, over both queued requests
+and those in the initial phase of execution, of the maximum of the
+number of seats occupied in the request's initial and final
+execution phases.
+
+* `apiserver_flowcontrol_demand_seats_high_watermark` is a gauge vector
+holding, for each priority level, the maximum seat demand seen
+during the last concurrency borrowing adjustment period.
+
+* `apiserver_flowcontrol_demand_seats_average` is a gauge vector
+holding, for each priority level, the time-weighted average seat
+demand seen during the last concurrency borrowing adjustment period.
+
+* `apiserver_flowcontrol_demand_seats_stdev` is a gauge vector
+holding, for each priority level, the time-weighted population
+standard deviation of seat demand seen during the last concurrency
+borrowing adjustment period.
+
+* `apiserver_flowcontrol_target_seats` is a gauge vector holding, for
+each priority level, the concurrency target going into the borrowing
+allocation problem.
+
+* `apiserver_flowcontrol_seat_fair_frac` is a gauge holding the fair
+allocation fraction determined in the last borrowing adjustment.
+
+* `apiserver_flowcontrol_current_limit_seats` is a gauge vector
+holding, for each priority level, the dynamic concurrency limit
+derived in the last adjustment.
+

 * `apiserver_flowcontrol_request_wait_duration_seconds` is a histogram
 vector of how long requests spent queued, broken down by the labels
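
The new `demand_seats_high_watermark`, `demand_seats_average`, and `demand_seats_stdev` gauges summarize one priority level's seat demand over an adjustment period. To illustrate what "time-weighted" means for the average and the population standard deviation, the sketch below computes all three from a step-function trace of demand. The `demand_stats` function and its input format are hypothetical; the API server tracks these values internally rather than from such a trace.

```shell
# demand_stats END_TIME < trace
# Hypothetical helper. Each input line is "TIME DEMAND": seat demand
# became DEMAND at TIME and stayed there until the next line's TIME, or
# until END_TIME for the last line. Lines must be sorted by TIME.
# Prints "HIGH_WATERMARK AVERAGE STDEV", where the average and the
# population standard deviation are weighted by how long each value lasted.
demand_stats() {
  awk -v t_end="$1" '
    # Accumulate demand held at prev_d for dt time units.
    function add(dt) {
      sum += prev_d * dt
      sumsq += prev_d * prev_d * dt
      total += dt
    }
    # Close the interval that started at the previous sample.
    NR > 1 { add($1 - prev_t) }
    # Remember this sample and track the maximum demand seen.
    {
      prev_t = $1
      prev_d = $2
      if (NR == 1 || $2 + 0 > hw) hw = $2 + 0
    }
    END {
      add(t_end - prev_t)
      if (total <= 0) { print "0 0 0"; exit }
      avg = sum / total
      var = sumsq / total - avg * avg
      if (var < 0) var = 0   # guard against floating-point round-off
      printf "%g %g %g\n", hw, avg, sqrt(var)
    }'
}
```

For example, `printf '0 2\n1 4\n' | demand_stats 2` prints `4 3 1`: a demand of 2 seats for one time unit followed by 4 seats for one time unit has a high watermark of 4, a time-weighted mean of 3, and a population standard deviation of 1.
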