64 Commits

Author SHA1 Message Date
Abu Kashem d64c9b18da apf: remove RequestWaitLimit from queueset config
Kubernetes-commit: 11ef9514dad6f46a4315198978fee14132c4bbca
2023-08-29 12:11:08 -04:00
Abu Kashem 290096a4d0 apf: remove timeoutOldRequestsAndRejectOrEnqueueLocked function
Kubernetes-commit: da8a472206623d0727ba486489d34780c4b6c1d9
2023-08-28 17:26:11 -04:00
Abu Kashem 27772523df apf: refactor promise to use a context
Kubernetes-commit: 0039f24d74d0f57c8ba868ae361821d37fd908d6
2023-08-21 15:19:31 -04:00
Andrew Sy Kim 066c7cb8cc apiserver: add flow control metric current_inqueue_seats
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

Kubernetes-commit: fb9646fd60d4b8e79223b729c1cb54fc6818fdd1
2023-07-24 19:40:05 +00:00
Mike Spreitzer b8bc556baa Add tracking and reporting of executing requests
Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>

Kubernetes-commit: a8a2fb317c8bc9c64ced023988802b2517d34f81
2023-06-30 22:55:35 -04:00
Andrew Sy Kim 73f18d34af promote the following APF metrics to beta:
apiserver_flowcontrol_request_wait_duration_seconds
apiserver_flowcontrol_request_concurrency_in_use
apiserver_flowcontrol_request_concurrency_limit
apiserver_flowcontrol_rejected_requests_total
apiserver_flowcontrol_dispatched_requests_total
apiserver_flowcontrol_current_inqueue_requests
apiserver_flowcontrol_current_executing_requests

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

Kubernetes-commit: 0bb419b1498a664d1dda3b487e9f15fd220ea363
2023-07-05 18:19:36 +00:00
Mike Spreitzer 078694d35d Make QueueSet support exempt behavior; use it
Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>

Kubernetes-commit: f269acd12b225f6a2dbbfae64a475f73f448b918
2023-06-28 22:55:30 -04:00
RuquanZhao bc5f595633 fix undefined conversion
Signed-off-by: Ruquan Zhao ruquan.zhao@arm.com

Kubernetes-commit: 65f3454c1d926a1f119710684794bb54350ef4b1
2023-04-20 17:16:46 +08:00
Andrew Sy Kim f86340dad2 increase expected fairness margin in TestDifferentWidths
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

Kubernetes-commit: 736720128824264b4246f247b9ec0d09f5383cf0
2022-10-21 11:39:11 -04:00
Mike Spreitzer 770f2e1fa4 apiserver: finish implementation of borrowing in APF
Also make some design changes exposed in testing and review.

Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit` because reviewers
thought it is too early.  This creates a problem: that metric can not
keep both of its old meanings.  I chose the configured concurrency
limit.

Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking.  The current design in the KEP is
as follows.

> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.

But this does not work out well at server startup.  As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state.  As always,
the two mandatory priority levels are implicitly added whenever they
are not read.  So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design.  So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels.  Its Target is higher than
all the others, once they start to show up.  It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.

I have considered the following fix.  The idea is that the designed
initialization is not appropriate before all the default objects are
read.  So the fix is to have a mode bit in the controller.  In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit.  In the later mode,
the seat demand tracking variables are initialized as originally
designed.

However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.

So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero.  Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.

Also: revise logging logic, to log at numerically lower V level when
there is a change.

Also: bug fix in float64close.

Also, separate imports in some file

Co-authored-by: Han Kang <hankang@google.com>

Kubernetes-commit: feb42277884bc7cfbd6f0bb1d875cc63b1b6caac
2022-10-31 16:13:25 -07:00
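
The startup problem described in commit 770f2e1fa4 above can be illustrated with a small sketch. It assumes a smoothing rule that rises immediately to the current demand envelope and otherwise decays exponentially toward it. The function `smoothStep`, the coefficient 0.977, and the sample values are illustrative assumptions, not taken from the commit.

```go
package main

import "fmt"

// smoothStep is a hypothetical one-step update of a smoothed seat demand.
// It jumps up to the current envelope at once and decays slowly otherwise.
// The exact rule and the decay coefficient are assumptions of this sketch.
func smoothStep(smooth, envelope, decay float64) float64 {
	decayed := decay*smooth + (1-decay)*envelope
	if envelope > decayed {
		return envelope
	}
	return decayed
}

func main() {
	const decay = 0.977 // assumed smoothing coefficient

	// Initialized to zero: the first adjustment fully reflects the load.
	fromZero := smoothStep(0, 50, decay)

	// Initialized to a large value with no real demand: it decays slowly,
	// so the level keeps looking busy for many adjustment periods.
	fromHigh := 600.0
	for i := 0; i < 6; i++ { // six adjustments, about one minute at 10 s each
		fromHigh = smoothStep(fromHigh, 0, decay)
	}

	fmt.Printf("from zero after 1 step: %.1f\n", fromZero)
	fmt.Printf("from 600 after 6 steps: %.1f\n", fromHigh)
}
```

Under these assumptions, a level that starts at zero catches up in a single adjustment, while a level that starts high retains most of its apparent demand after a minute.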
Mike Spreitzer 413be63b46 Add instrumentation for seat borrowing
Kubernetes-commit: 9b684579e230f105bcaa743f06bc07c39af703df
2022-10-20 15:21:09 -04:00
Mike Spreitzer 3419387b18 Call queueSet::boundNextDispatchLocked enough
Fix the one path where boundNextDispatchLocked was not being called
after modifying a queue.

Also check for negative work in a request.

These are motivated by
https://github.com/kubernetes/kubernetes/issues/112169 but I do not
have a way to reproduce it and so can not check that these changes
actually remove that symptom.  But these changes are good anyway.

Kubernetes-commit: 6ee93e2cee695203a6ce4935da1b9a807b624260
2022-09-01 22:54:53 -04:00
jupblb 16f776a534 Switch initial/final seats type to uint64
Kubernetes-commit: 3c46482eb09d7343e0f98a930a9aaa158237e278
2022-07-28 10:48:40 +02:00
Davanum Srinivas 7e94033a61 Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>

Kubernetes-commit: a9593d634c6a053848413e600dadbf974627515f
2022-07-19 20:54:13 -04:00
Mike Spreitzer 0796534fe5 Remove the PairVec types
Kubernetes-commit: 1f1cfba2a3fb35a8542bbf64a46746214355674c
2022-06-11 00:57:19 -04:00
Mike Spreitzer cae328fb1c Give apf metrics abstractions more familiar names
The logic is similar to Prometheus gauges and vectors,
adopt that terminology.

Kubernetes-commit: 7d64a93a1407f91b5e13bf540a0fa834a41622eb
2022-05-17 23:27:47 -04:00
Mike Spreitzer 8628966894 Fix more initial numerators
Kubernetes-commit: ba690c2257af76bd971d0dfb6bef13ff9099e549
2022-05-18 00:22:30 -04:00
Mike Spreitzer 6adfddf535 Clarify APF metric wrt all three stages of execution
Kubernetes-commit: 88f8e8448bf873cf41035cb858422a10a1d03018
2021-11-30 11:45:53 -05:00
Mike Spreitzer 4098be7694 Factored TimedObserver into less surprising pieces
Kubernetes-commit: ab64e852023965fd8873abcd50ff09cf79814d11
2021-11-15 14:59:30 -05:00
Mike Spreitzer 6a2631848c Add sample-and-watermark for seats occupied during all of execution
Kubernetes-commit: 945f960cfb8fc018b093c1a08e5d4cdd362b1fc6
2021-10-25 01:13:52 -04:00
Wojciech Tyczyński 55b43e446f P&F: move seat-seconds to a better location
Kubernetes-commit: e262db7a4daf5218520e49b423789ea55a94af75
2021-10-27 10:30:25 +02:00
Mike Spreitzer 5283383fb5 Clarify metrics help wrt APF execution phases
Kubernetes-commit: d7a3bf0d260a0c291941cda68492f10e5010ac91
2021-10-24 22:32:13 -04:00
Mike Spreitzer c5a0365136 Fix nits noticed in recent code review
Kubernetes-commit: 1844a052776bce33322ce20c11b2902403655ef8
2021-10-18 23:51:48 -05:00
Mike Spreitzer d69d77c659 Update queueset_test.go for FinalSeats
Track the introduction of FinalSeats.

Give up on calculating expected results for tests with added latency,
because I did not find an easy and obvious way to do it.

Kubernetes-commit: 0fc595e03360ba7fc4c3e251d4b41f39172aca72
2021-10-08 22:27:39 -07:00
Mike Spreitzer f7bfb170d7 Keep the progress meter R from overflowing
Also add test for that situation.

Kubernetes-commit: a797fbd96de8c67aaed58aef54fbe9f0eb94a2c2
2021-10-01 22:04:05 -07:00
Mike Spreitzer 1b1389676f Relax TestDifferentWidths
Make the margin a little wider because flakiness was reported.

Kubernetes-commit: 10326282f9d1abcd4a45b737628286d40971efea
2021-10-07 16:09:53 -07:00
Mike Spreitzer a5192405d9 Calculate the work in each request just once
Kubernetes-commit: f2c46c8f9d0b360cf913e22c222d9954b4ff9a76
2021-10-07 17:20:56 -07:00
Abu Kashem 9560ec6e92 introduce final seats for work estimate
Kubernetes-commit: 3d6cc118fee15313419bf7aa0082a2a608ec62f6
2021-09-24 15:18:27 -04:00
Mike Spreitzer dc449969cc Use SeatSeconds
Kubernetes-commit: 4b5e1398199282f471d0f332eefeb5c2415bdb01
2021-10-01 15:33:37 -07:00
Abu Kashem 863c48fbc2 apf: rename WorkEstimate.Seats to InitialSeats
Kubernetes-commit: 5d67896adedbce27f01b59eb5f2054919a047f2b
2021-09-24 09:41:38 -04:00
Mike Spreitzer 72ff8a6261 Improve queueset sharding and dispatching
New anti-windup technique: use the request arrival time as the floor
on the virtual dispatch time.  Prevent bound violations where they
might arise rather than fixing up just one queue at dispatch time,
so that the fixed up dispatch times figure into the dispatching choice.

Two tweaks to the shuffle sharding.  Take seats of executing requests
into account as well as seats of waiting requests.  Do not always
consider the generated hand in the same order.

Rename the queueset methods that do shuffle sharding and finding the
queue to dispatch from, because the old names were confusingly
similar.

Tighten up some request margins.

Name the test cases in TestNoRestraint and TestWindup.

Kubernetes-commit: 4b9cba85874158b25b5c994773a4ec04343820c2
2021-09-20 15:45:24 -04:00
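
The anti-windup rule from commit 72ff8a6261 above, using the request arrival time as the floor on the virtual dispatch time, might look roughly like the following. This is a simplified sketch; the type and field names (`queue`, `request`, `nextDispatchR`, `arrivalR`) and the function `boundNextDispatch` are stand-ins and should not be read as the actual queueset code.

```go
package main

import "fmt"

// request records the virtual time at which it arrived (hypothetical field).
type request struct {
	arrivalR float64
}

// queue holds waiting requests in FIFO order, plus the virtual time at which
// its oldest waiting request would begin service (hypothetical fields).
type queue struct {
	nextDispatchR float64
	waiting       []request
}

// boundNextDispatch applies the floor: in virtual time, the oldest waiting
// request may not be dispatched before it arrived. An empty queue is left alone.
func boundNextDispatch(q *queue) {
	if len(q.waiting) == 0 {
		return
	}
	if floor := q.waiting[0].arrivalR; q.nextDispatchR < floor {
		q.nextDispatchR = floor
	}
}

func main() {
	// A lightly used queue whose virtual dispatch time has fallen behind.
	q := &queue{nextDispatchR: 3, waiting: []request{{arrivalR: 10}}}
	boundNextDispatch(q)
	fmt.Println(q.nextDispatchR)
}
```

Applying the bound wherever a queue is modified, rather than only to the queue picked at dispatch time, lets the corrected values take part in the choice of which queue to dispatch from, as the commit message describes.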
Mike Spreitzer 8d3036922c More test tweaks
Canonicalize listing of test cases.

Make TestNoRestraint try both cases: competition and none.

Kubernetes-commit: 0ee1a7b4ff9012b050bd447055ad5e1e8c57c30e
2021-09-20 15:45:24 -04:00
Mike Spreitzer c505aa64af Update TestNoRestraint and TestWindup
Make TestNoRestraint verify that fairness is NOT achieved
when there is real competition.

Make TestWindup run two cases, to show that 0.1 is too narrow
a margin and 0.26 is wide enough.

Kubernetes-commit: c4945fdf0c14ba2032a5c8edf192678d9fe00374
2021-09-17 01:40:16 -04:00
Mike Spreitzer de042674ed Widen margins of TestDifferentWidths and TestTooWide
These behavioral unit tests of queueset were failing because the
evaluation criteria were too strict.

Kubernetes-commit: 59d319ec06bb33289a87036418b4a61ed3bb215f
2021-09-09 17:07:58 -04:00
Mike Spreitzer de227d1d37 Change execution duration guess from 1 minute to 3 milliseconds
So that the width estimate has some effect but not a grossly excessive
one.

Added the fifo::Peek method to simplify the fifo client code.

Also renamed the queueSet::estimatedServiceTime field to
estimatedServiceSeconds to make the units clear.

Kubernetes-commit: a0c161f2f6908ee424ea888ff40f75ff071bd20a
2021-09-07 00:46:50 -04:00
Mike Spreitzer 7d5430cfba Fix extra latency and add tests for that and width
Added missing dispatching after delayed release of seats.

Updated logging for all six situations of execution completion and
seat release.

Added behavioral tests for non-zero extra latency and non-unit width.

Also added two tests for baseline functionality.

Also improved some comments and other logging in `queueset.go`.

Kubernetes-commit: d2a27a58f0af20c6185fa1c21890d666e9d3746b
2021-08-12 16:48:02 -04:00
Abu Kashem da50ca4c6e apf: free seats in use after additional latency
Kubernetes-commit: d68186452d9150b113489e6a722caf82f898857f
2021-06-27 13:04:20 -04:00
Mike Spreitzer 8c2108bc80 Refactor goroutine counting
Add comment outlining TestContextCancel.

Stop calling `t.Errorf` from wrong goroutine.

Package up queueNoteFn expectation checking.

Add counting of goroutine in req1 exec fn.

Remove unnecessary assignment to `_`.

Make TestContextCancel wait on fake clock, to insulate timing check
from scheduler noise.

Factor goroutine counting out of queueset.go, into queueset_test.go,
where it matters.

Refactor promise: Use a simple channel-based implementation for normal
code, a mutex-based one for testing code.

Took all the panics out of queueset.go

Shrink the timeouts in promise tests to 1 second.

Kubernetes-commit: 1db36ae3b30e30d70972998a22987a7db470479b
2021-07-29 00:35:25 -04:00
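
Commit 8c2108bc80 above mentions a simple channel-based promise for normal code. One minimal way to build a write-once promise on a channel is sketched below. The names `promise`, `newPromise`, `Set`, and `Get`, and the use of `sync.Once`, are choices made for this sketch and are not taken from the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// promise is a hypothetical write-once value holder. Closing done signals
// every waiter that value has been set.
type promise struct {
	once  sync.Once
	done  chan struct{}
	value interface{}
}

func newPromise() *promise {
	return &promise{done: make(chan struct{})}
}

// Set stores v if the promise is still unset and reports whether it did so.
func (p *promise) Set(v interface{}) bool {
	won := false
	p.once.Do(func() {
		p.value = v
		close(p.done)
		won = true
	})
	return won
}

// Get blocks until the promise has been set, then returns the value.
func (p *promise) Get() interface{} {
	<-p.done
	return p.value
}

func main() {
	p := newPromise()
	go p.Set("dispatched")
	fmt.Println(p.Get())          // blocks until the goroutine calls Set
	fmt.Println(p.Set("ignored")) // a second Set has no effect
}
```

Closing the channel wakes all waiters at once and needs no condition variable, which is one reason such a design can be simpler than a mutex-based one.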
Mike Spreitzer 904cd74454 Some cleanup of the package for event clocks
Rename from `clock` to `eventclock`.

Simplify by removing the prohibition on an EventFunc suspending and
resuming activity.

Remove "EventClock" from names to avoid stuttering.

Start to consolidate test code under fairqueuing/testing/.

Kubernetes-commit: 80ca6a4ae6ff571c32962a7155efd55edefff9e6
2021-08-06 02:06:43 -04:00
Mike Spreitzer 0c550377cf Introduce event clocks based on k8s.io/utils/clock
So we can move off of the apimachinery clock package.

Switch queueset to new clocks.

Removed event clocks based on apimachinery clocks,
because this PR introduces ones based on k8s.io/utils/clock .

Removed interface that is implemented by only one interesting type.

Simplify RealEventClock::EventAfterTime.

Kubernetes-commit: dcb298c9552de44e27ed52f5e2b58a0dd7cd8d54
2021-07-21 16:56:11 -04:00
wojtekt b4c306e1e8 Rename width to workEstimate in P&F code
Kubernetes-commit: 73211256e8f15cf84ee69d6fe8258c3a912e0f94
2021-07-13 15:10:58 +02:00
Abu Kashem cf5c77fde9 apf: add additional latency into width
Kubernetes-commit: 24e19229101d242d924ce98a562be3864dde9eae
2021-06-27 12:45:24 -04:00
Abu Kashem e1aec4ecae apf: take seats into account when dispatching request
Kubernetes-commit: ff716cef508f948b50e1026e980e6df5ee475538
2021-06-14 12:19:06 -04:00
Abu Kashem 345d1c6ff9 apf: add a gauge for the number of seats currently in use
Kubernetes-commit: c710f99ef730a791a6911e63cc3b9d26cced6bd3
2021-06-10 17:34:50 -04:00
Abu Kashem 3c7f54740f apf: add plumbing to estimate "width" of a request
- add plumbing that allows us to estimate the "width" of a request
- the default implementation returns 1 as the "width" of all
  incoming requests, this is in keeping with the current behavior.

Kubernetes-commit: 9b72eb1929a64b9d5a5234090a631ba312fb4d41
2021-05-11 07:03:05 -04:00
Abu Kashem eea0d66fcd clean up executing request on panic
Kubernetes-commit: 13cedca0eb5337b13e5176983ea5e784ec38df22
2020-12-10 12:57:21 -05:00
Adhityaa Chandrasekar ebe254b2e6 APF: use snake_case in metric labels
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>

Kubernetes-commit: f9d57a8d5db3e58f79a1b1958d80c049c63d6cde
2020-11-04 22:19:52 +00:00
yue9944882 5474822749 fixes max-min fairness
Kubernetes-commit: fd889ec8ae37437a9e75386542291bd0e2cc605e
2020-10-29 18:57:38 +08:00
Ken Sipe 32533315c9 fix S1000 simplify ch switch cases
Signed-off-by: Ken Sipe <kensipe@gmail.com>

Kubernetes-commit: 268c2f81c7ab94cbab68a8d6c00725144b81fa09
2020-06-26 10:45:30 -05:00
Mike Spreitzer e28ab56bd4 Introduce more metrics on concurrency
Introduce min, average, and standard deviation for the number of
executing mutating and readOnly requests.

Introduce min, max, average, and standard deviation for the number
waiting and number waiting per priority level.

Later:

Revised to use a series of windows

Use three individuals instead of array of powers

Later:

Add coarse queue count metrics, removed windowed avg and stddev

Add metrics for number of queued mutating and readOnly requests,
to complement metrics for number executing.

Later:

Removed windowed average and standard deviation because consumers can
derive such from integrals of consumer's chosen window.

Also replaced "requestKind" Prometheus label with "request_kind".

Later:

Revised to focus on sampling

Make the clock intrinsic to a TimedObserver

... so that the clock can be read while holding the observer's lock;
otherwise, forward progress is not guaranteed (and violations were
observed in testing).

Bug fixes and histogram buckets revision

SetX1 to 1 when queue length limit is zero, because dividing by zero is nasty.

Remove obsolete argument in gen_test.go.

Add a bucket boundary at 0 for sample-and-water-mark histograms, to
distinguish zeroes from non-zeros.

This includes adding Integrator test.

Simplified test code.

More pervasively used "ctlr" instead of "ctl" as abbreviation for
"controller".

Kubernetes-commit: 57ecea22296797a93b0157169db0ff2e477f58d0
2020-05-17 01:02:25 -04:00