Also make some design changes exposed in testing and review.
Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit` because reviewers
though it is too early. This creates a problem, that metric can not
keep both of its old meanings. I chose the configured concurrency
limit.
Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking. The current design in the KEP is
as follows.
> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.
But this does not work out well at server startup. As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state. As always,
the two mandatory priority levels are implicitly added whenever they
are not read. So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design. So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels. Its Target is higher than
all the others, once they start to show up. It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.
I have considered the following fix. The idea is that the designed
initialization is not appropriate before all the default objects are
read. So the fix is to have a mode bit in the controller. In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit. In the later mode,
the seat demand tracking variables are initialized as originally
designed.
However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.
So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero. Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.
Also: revise logging logic, to log at numerically lower V level when
there is a change.
Also: bug fix in float64close.
Also, separate imports in some file
Co-authored-by: Han Kang <hankang@google.com>
Kubernetes-commit: feb42277884bc7cfbd6f0bb1d875cc63b1b6caac
Fix the one path where boundNextDispatchLocked was not being called
after modifying a queue.
Also check for negative work in a request.
These are motivated by
https://github.com/kubernetes/kubernetes/issues/112169 but I do not
have a way to reproduce it and so can not check that these changes
actually remove that symptom. But these changes are good anyway.
Kubernetes-commit: 6ee93e2cee695203a6ce4935da1b9a807b624260
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Kubernetes-commit: a9593d634c6a053848413e600dadbf974627515f
Some of these changes are cosmetic (repeatedly calling klog.V instead of
reusing the result), others address real issues:
- Logging a message only above a certain verbosity threshold without
recording that verbosity level (if klog.V().Enabled() { klog.Info... }):
this matters when using a logging backend which records the verbosity
level.
- Passing a format string with parameters to a logging function that
doesn't do string formatting.
All of these locations where found by the enhanced logcheck tool from
https://github.com/kubernetes/klog/pull/297.
In some cases it reports false positives, but those can be suppressed with
source code comments.
Kubernetes-commit: edffc700a43e610f641907290a5152ca593bad79
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:
if klog.V(5).Enabled() {
klog.Info("hello world")
}
Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.
Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
Kubernetes-commit: 9eaa2dc554e0c3d4485d4c916dfdbc2f517db2e0
Track the introduction of FinalSeats.
Give up on calculating expected results for tests with added latency,
because I did not find an easy and obvious way to do it.
Kubernetes-commit: 0fc595e03360ba7fc4c3e251d4b41f39172aca72
New anti-windup technique: use the request arrival time as the floor
on the virtual dispatch time. Prevent bound violations where they
might arise rather than fixing up just one queue at dispatch time,
so that the fixed up dispatch times figure into the dispatching choice.
Two tweaks to the shuffle sharding. Take seats of executing requests
into account as well as seats of waiting requests. Do not always
consider the generated hand in the same order.
Rename the queueset methods that do shuffle sharding and finding the
queue to dispatch from, because the old names were confusingly
similar.
Tighten up some request margins.
Name the test cases in TestNoRestraint and TestWindup.
Kubernetes-commit: 4b9cba85874158b25b5c994773a4ec04343820c2
Canonicalize listing of test cases.
Make TestNoRestraint try both cases: competition and none.
Kubernetes-commit: 0ee1a7b4ff9012b050bd447055ad5e1e8c57c30e
Make TestNoRestraint verify that fairness is NOT achieved
when there is real competition.
Make TestWindup run two cases, to show that 0.1 is too narrow
a margin and 0.26 is wide enough.
Kubernetes-commit: c4945fdf0c14ba2032a5c8edf192678d9fe00374
These behavioral unit tests of queueset were failing because the
evaluation criteria were too strict.
Kubernetes-commit: 59d319ec06bb33289a87036418b4a61ed3bb215f
So that the width estimate has some effect but not a grossly excessive
one.
Added the fifo::Peek method to simplify the fifo client code.
Also renamed the queueSet::estimatedServiceTime field to
estimatedServiceSeconds to make the units clear.
Kubernetes-commit: a0c161f2f6908ee424ea888ff40f75ff071bd20a
Added missing dispatching after delayed release of seats.
Updated logging for all six situations of execution completion and
seat release.
Added behavioral tests for non-zero extra latency and non-unit width.
Also added two tests for baseline functionality.
Also improved some comments and other logging in `queueset.go`.
Kubernetes-commit: d2a27a58f0af20c6185fa1c21890d666e9d3746b
Add comment outlining TestContextCancel.
Stop calling `t.Errorf` from wrong goroutine.
Package up queueNoteFn expectation checking.
Add counting of goroutine in req1 exec fn.
Remove unnecessary assignment to `_`.
Make TestContextCancel wait on fake clock, to insulate timing check
from scheduler noise.
Factor goroutine counting out of queueset.go, into queueset_test.go,
where it matters.
Refactor promise: Use a simple channel-based implementation for normal
code, a mutex-based one for testing code.
Took all the panics out of queueset.go
Shrink the timeouts in promise tests to 1 second.
Kubernetes-commit: 1db36ae3b30e30d70972998a22987a7db470479b
Rename from `clock` to `eventclock`.
Simplify by removing the prohibition on an EventFunc suspending and
resuming activity.
Remove "EventClock" from names to avoid stuttering.
Start to consolidate test code under fairqueuing/testing/.
Kubernetes-commit: 80ca6a4ae6ff571c32962a7155efd55edefff9e6
So we can move off of the apimachinery clock package.
Switch queueset to new clocks.
Removed event clocks based on apimachinery clocks,
because this PR introduces ones based on k8s.io/utils/clock .
Removed interface that is implemented by only one interesting type.
Simplify RealEventClock::EventAfterTime.
Kubernetes-commit: dcb298c9552de44e27ed52f5e2b58a0dd7cd8d54