Update to track introduction of borrowing to APF

Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>
2022-11-22 17:01:58 -05:00 · 2022-11-22 17:01:58 -05:00 · 5ce3dcfe7c
parent 1939378cd0
commit 5ce3dcfe7c
1 changed files with 96 additions and 21 deletions
--- a/content/en/docs/concepts/cluster-administration/flow-control.md
+++ b/content/en/docs/concepts/cluster-administration/flow-control.md
@@ -52,20 +52,20 @@ for a general explanation of feature gates and how to enable and
 disable them.  The name of the feature gate for APF is
 "APIPriorityAndFairness".  This feature also involves an {{<
 glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
-`v1alpha1` version, disabled by default, and (b) `v1beta1` and
+`v1alpha1` version and a `v1beta1` version, disabled by default, and
-`v1beta2` versions, enabled by default.  You can disable the feature
+(b) `v1beta2` and `v1beta3` versions, enabled by default.  You can
-gate and API group beta versions by adding the following
+disable the feature gate and API group beta versions by adding the
-command-line flags to your `kube-apiserver` invocation:
+following command-line flags to your `kube-apiserver` invocation:
 ```shell
 kube-apiserver \
 --feature-gates=APIPriorityAndFairness=false \
---runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
+--runtime-config=flowcontrol.apiserver.k8s.io/v1beta2=false,flowcontrol.apiserver.k8s.io/v1beta3=false \
 # …and other flags as usual
 ```
-Alternatively, you can enable the v1alpha1 version of the API group
+Alternatively, you can enable the v1alpha1 and v1beta1 versions of the API group
-with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true`.
+with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true,flowcontrol.apiserver.k8s.io/v1beta1=true`.
 The command-line flag `--enable-priority-and-fairness=false` will disable the
 API Priority and Fairness feature, even if other flags have enabled it.
@@ -89,7 +89,7 @@ Without APF enabled, overall concurrency in the API server is limited by the
 defined by these flags are summed and then the sum is divided up among a
 configurable set of _priority levels_. Each incoming request is assigned to a
 single priority level, and each priority level will only dispatch as many
-concurrent requests as its configuration allows.
+concurrent requests as its particular limit allows.
 The default configuration, for example, includes separate priority levels for
 leader-election requests, requests from built-in controllers, and requests from
@@ -97,6 +97,13 @@ Pods. This means that an ill-behaved Pod that floods the API server with
 requests cannot prevent leader election or actions by the built-in controllers
 from succeeding.
 The concurrency limits of the priority levels are periodically
 adjusted, allowing under-utilized priority levels to temporarily lend
 concurrency to heavily-utilized levels.  These limits are based on
 nominal limits and bounds on how much concurrency a priority level may
 lend and how much it may borrow, all derived from the configuration
 objects mentioned below.
 ### Seats Occupied by a Request
 The above description of concurrency management is the baseline story.
@@ -187,15 +194,38 @@ A PriorityLevelConfiguration represents a single priority level. Each
 PriorityLevelConfiguration has an independent limit on the number of outstanding
 requests, and limitations on the number of queued requests.
-Concurrency limits for PriorityLevelConfigurations are not specified in absolute
+The nominal concurrency limit for a PriorityLevelConfiguration is not
-number of requests, but rather in "concurrency shares." The total concurrency
+specified in an absolute number of seats, but rather in "nominal
-limit for the API Server is distributed among the existing
+concurrency shares." The total concurrency limit for the API Server is
-PriorityLevelConfigurations in proportion with these shares. This allows a
+distributed among the existing PriorityLevelConfigurations in
-cluster administrator to scale up or down the total amount of traffic to a
+proportion to these shares, to give each level its nominal limit in
-server by restarting `kube-apiserver` with a different value for
+terms of seats. This allows a cluster administrator to scale up or
-`--max-requests-inflight` (or `--max-mutating-requests-inflight`), and all
+down the total amount of traffic to a server by restarting
-PriorityLevelConfigurations will see their maximum allowed concurrency go up (or
+`kube-apiserver` with a different value for `--max-requests-inflight`
-down) by the same fraction.
+(or `--max-mutating-requests-inflight`), and all
 PriorityLevelConfigurations will see their maximum allowed concurrency
 go up (or down) by the same fraction.
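
The proportional split described above can be sketched in a few lines of Python. This is only an illustration, not the API server's code: the function name `nominal_limits`, the priority level names, the share values, and the choice to round up to a whole seat are all assumptions made for the example.

```python
def nominal_limits(server_limit, shares):
    """Split server_limit seats among priority levels in proportion to shares.

    `shares` maps a (hypothetical) priority level name to its nominal
    concurrency shares. Rounding up is an assumption of this sketch.
    """
    total = sum(shares.values())
    # Integer ceiling division avoids floating-point surprises.
    return {
        name: (server_limit * s + total - 1) // total
        for name, s in shares.items()
    }


# Hypothetical levels and share values, chosen only for illustration.
shares = {"leader-election": 10, "workload-high": 40, "workload-low": 100}

# Total limit = --max-requests-inflight + --max-mutating-requests-inflight.
before = nominal_limits(400 + 200, shares)
after = nominal_limits(800 + 400, shares)

print(before)  # {'leader-election': 40, 'workload-high': 160, 'workload-low': 400}
print(after)   # {'leader-election': 80, 'workload-high': 320, 'workload-low': 800}
```

Doubling the server-wide total doubles every level's nominal limit, while the shares themselves stay untouched.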
 {{< caution >}}
 In the versions before `v1beta3` the relevant
 PriorityLevelConfiguration field is named "assured concurrency shares"
 rather than "nominal concurrency shares".  Also, in Kubernetes release
 1.25 and earlier there were no periodic adjustments: the
 nominal/assured limits were always applied without adjustment.
 {{< /caution >}}
 The bounds on how much concurrency a priority level may lend and how
 much it may borrow are expressed in the PriorityLevelConfiguration as
 percentages of the level's nominal limit.  Each percentage is resolved
 to an absolute number of seats by multiplying it by the nominal limit,
 dividing by 100.0, and rounding.  The dynamically adjusted concurrency limit of a
 priority level is constrained to lie between (a) a lower bound of its
 nominal limit minus its lendable seats and (b) an upper bound of its
 nominal limit plus the seats it may borrow.  At each adjustment, the
 dynamic limits are derived in two steps: each priority level first
 reclaims any lent seats for which demand recently appeared, and then
 the levels jointly and fairly respond to the recent seat demand on
 each of them, within the bounds just described.
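
As a rough worked example of the bounds arithmetic above, the following Python sketch resolves the two percentages to seats and clamps a candidate limit into the allowed range. The function and parameter names are hypothetical, rounding half up is an assumption, and the fair allocation step itself is deliberately not modeled.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with halves going up (an assumption)."""
    return math.floor(x + 0.5)


def limit_bounds(nominal, lendable_percent, borrowing_limit_percent):
    """Return (lower, upper) bounds, in seats, on a level's dynamic limit."""
    lendable = round_half_up(nominal * lendable_percent / 100.0)
    borrowable = round_half_up(nominal * borrowing_limit_percent / 100.0)
    return nominal - lendable, nominal + borrowable


def clamp(candidate, lower, upper):
    """Force a candidate dynamic limit to lie within [lower, upper]."""
    return max(lower, min(upper, candidate))


# Hypothetical level: nominal limit 160 seats, may lend 50%, may borrow 25%.
lower, upper = limit_bounds(160, 50, 25)
print(lower, upper)              # 80 200
print(clamp(250, lower, upper))  # 200
print(clamp(30, lower, upper))   # 80
```

Whatever the adjustment computes for this level, the resulting dynamic limit stays between 80 and 200 seats.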
 {{< caution >}}
 With the Priority and Fairness feature enabled, the total concurrency limit for
@@ -606,10 +636,55 @@ poorly-behaved workloads that may be harming system health.
  to increase that PriorityLevelConfiguration's concurrency shares.
  {{< /note >}}
-* `apiserver_flowcontrol_request_concurrency_limit` is a gauge vector
+* `apiserver_flowcontrol_request_concurrency_limit` is the same as
-  holding the computed concurrency limit (based on the API server's
+  `apiserver_flowcontrol_nominal_limit_seats`.  Before the
-  total concurrency limit and PriorityLevelConfigurations' concurrency
+  introduction of concurrency borrowing between priority levels, this
-  shares), broken down by the label `priority_level`.
+  was always equal to `apiserver_flowcontrol_current_limit_seats`
  (which did not exist as a distinct metric).
 * `apiserver_flowcontrol_nominal_limit_seats` is a gauge vector
  holding each priority level's nominal concurrency limit, computed
  from the API server's total concurrency limit and the priority
  level's configured nominal concurrency shares.
 * `apiserver_flowcontrol_lower_limit_seats` is a gauge vector holding
  the lower bound on each priority level's dynamic concurrency limit.
 * `apiserver_flowcontrol_upper_limit_seats` is a gauge vector holding
  the upper bound on each priority level's dynamic concurrency limit.
 * `apiserver_flowcontrol_demand_seats` is a histogram vector counting
  observations, at the end of every nanosecond, of each priority
  level's ratio of (seat demand) / (nominal concurrency limit).  A
  priority level's seat demand is the sum, over both queued requests
  and those in the initial phase of execution, of the maximum of the
  number of seats occupied in the request's initial and final
  execution phases.
 * `apiserver_flowcontrol_demand_seats_high_watermark` is a gauge vector
  holding, for each priority level, the maximum seat demand seen
  during the last concurrency borrowing adjustment period.
 * `apiserver_flowcontrol_demand_seats_average` is a gauge vector
  holding, for each priority level, the time-weighted average seat
  demand seen during the last concurrency borrowing adjustment period.
 * `apiserver_flowcontrol_demand_seats_stdev` is a gauge vector
  holding, for each priority level, the time-weighted population
  standard deviation of seat demand seen during the last concurrency
  borrowing adjustment period.
 * `apiserver_flowcontrol_target_seats` is a gauge vector holding, for
  each priority level, the concurrency target going into the borrowing
  allocation problem.
 * `apiserver_flowcontrol_seat_fair_frac` is a gauge holding the fair
  allocation fraction determined in the last borrowing adjustment.
 * `apiserver_flowcontrol_current_limit_seats` is a gauge vector
  holding, for each priority level, the dynamic concurrency limit
  derived in the last adjustment.
 * `apiserver_flowcontrol_request_wait_duration_seconds` is a histogram
  vector of how long requests spent queued, broken down by the labels