Commit Graph

186 Commits

Author SHA1 Message Date
Patrick Ohly d712a4ee7e apimachinery runtime: support contextual logging
In contrast to the original HandleError and HandleCrash, the new
HandleErrorWithContext and HandleCrashWithContext functions properly do contextual
logging, so if a problem occurs while e.g. dealing with a certain request and
WithValues was used for that request, then the error log entry will also
contain information about it.

The output changes from unstructured to structured, which might be a breaking
change for users who grep for panics. Care was taken to format panics
as similar as possible to the original output.

For errors, a message string gets added. There was none before, which made it
impossible to find all error output coming from HandleError.

Keeping HandleError and HandleCrash around without deprecating while changing
the signature of callbacks is a compromise between not breaking existing code
and not adding too many special cases that need to be supported. There is some
code which uses PanicHandlers or ErrorHandlers, but less than code that uses
the Handle* calls.

In Kubernetes, we want to replace the calls. logcheck warns about them in code
which is supposed to be contextual. The steps towards that are:
- add TODO remarks as reminder (this commit)
- locally remove " TODO(pohly): " to enable the check with `//logcheck:context`,
  merge fixes for linter warnings
- once there are none, remove the TODO to enable the check permanently

Kubernetes-commit: 5a130d2b71e5d70cfff15087f4d521c6b68fb01e
2023-11-20 20:25:00 +01:00
Eric Lin 000601bdbe Add handler to run watch serving in separate goroutine
This handler allows running execution prior to actual serving in a separate
goroutine when serving requests. Doing so benefits cases in serving long running
requests because it allows freeing memory used by the separate goroutine
and keeps the serving routines slim.

Signed-off-by: Eric Lin <exlin@google.com>

Kubernetes-commit: 7b2698a5e5c61b303481c2006847409fc8704746
2023-10-10 08:53:26 +00:00
Abu Kashem b041969f97 apiserver: allow zero value for the 'nominalConcurrencyShares' field
Kubernetes-commit: 9fd2ab419ad771790d3cb80ea7b8e6828d9ce305
2023-10-27 19:26:08 -04:00
Abu Kashem 0b0a995736 apiserver: apf controller, bootstrap, tests should use flowcontrol v1 API
Kubernetes-commit: 17bda3c3e05a75943591f61f37d7fdc0d07870ec
2023-10-11 09:20:41 -04:00
Abu Kashem 28ed1d7ad4 fix data race in apf unit test
Kubernetes-commit: 52c58d970e54bf10b78512c68602f70b0a970f31
2023-09-22 14:42:43 -04:00
Abu Kashem d64c9b18da apf: remove RequestWaitLimit from queueset config
Kubernetes-commit: 11ef9514dad6f46a4315198978fee14132c4bbca
2023-08-29 12:11:08 -04:00
Abu Kashem a2e63604f2 apf: use context for queue wait
Kubernetes-commit: f39213a7e44f21a8cedcdf38d3c2531456a526d6
2023-08-28 17:01:16 -04:00
Andrew Sy Kim f00505bddc priority & fairness: support dynamically configuring work estimator max seats
Max seats from prioriy & fairness work estimator is now min(0.15 x
nominalCL, nominalCL/handSize)

'Max seats' calculated by work estimator is currently hard coded to 10.
When using lower values for --max-requests-inflight, a single
LIST request taking up 10 seats could end up using all if not most seats in
the priority level. This change updates the default work estimator
config such that 'max seats' is at most 10% of the
maximum concurrency limit for a priority level, with an upper limit of 10.
This ensures seats taken from LIST request is proportional to the total
available seats.

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

Kubernetes-commit: d3ef2d4fe95c3ef7b1c606ad01be1183659da391
2023-04-26 11:13:14 +00:00
Ben Luddy 302a5c27a6 Ensure timeout test handlers don't complete before timing out.
TestTimeoutRequestHeaders and TestTimeoutWithLogging are designed to
catch data races on request headers and include an HTTP handler that
triggers timeout then repeatedly mutates request headers. Sometimes,
the request header mutation loop could complete before the timeout
filter observed the timeout, resulting in a test failure. The mutation
loop now runs until the test ends.

Kubernetes-commit: e5a15c87e9d83ee19ba93aa356dfbb7b33a013c8
2023-06-07 12:48:33 -04:00
Wojciech Tyczyński 6c23e503a3 APF: Dynamically compute retry-after based on history
Kubernetes-commit: 23ac0fdaa52209c06eacf3613101174ea77ec42b
2023-04-20 10:18:48 +02:00
Wojciech Tyczyński 429762b215 Refactor APF handler in preparation for dynamic retryAfter
Kubernetes-commit: 16fecf3e76163ddb6d93199f5cf094fd9588b706
2023-04-18 20:34:25 +02:00
Abu Kashem 61a789ab70 apiserver: terminate watch with a rate limiter during shutdown
Kubernetes-commit: 6385b86a9b124eb03848af9a3029e8bc9058d72f
2023-01-13 18:04:13 -05:00
Abu Kashem 41067f8ef1 apiserver: fix APF tests, use T functions on the test goroutine
Kubernetes-commit: 62742db16b16449678c888490bfc141047a6939d
2023-02-10 09:49:27 -05:00
Abu Kashem cb855a88b8 apiserver: CVE-2022-1996, validate cors-allowed-origins server option
Kubernetes-commit: 841311ada2b0ba58e623a89e2e5ac74de0d94d8c
2023-01-20 13:54:02 -05:00
Patrick Ohly 8f8c30ff8f logging: fix names of keys
The stricter checking with the upcoming logcheck v0.4.1 pointed out these names
which don't comply with our recommendations in
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments.

Kubernetes-commit: bc6c7fa91201348d010b638fbadf32007c0ac546
2023-01-16 15:04:01 +01:00
Abu Kashem 9e60654b8a apiserver: refactor WithWaitGroup handler
Kubernetes-commit: 9093f126b87cb686784bb27b08be9eb12b4d5453
2023-01-10 15:55:19 -05:00
Abu Kashem c44ad6bb02 apiserver: refactor cors filter
Kubernetes-commit: ea251b5605c22d82962d4e699d933428e4c9c211
2022-11-03 09:29:47 -04:00
Abu Kashem 5b1e3f38d8 apiserver: refactor cors unit test
Kubernetes-commit: ae7327ab8eb2e05c3ccb185354eed247795bbc6d
2022-11-03 09:05:40 -04:00
Mike Spreitzer 770f2e1fa4 apiserver: finish implementation of borrowing in APF
Also make some design changes exposed in testing and review.

Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit` because reviewers
though it is too early.  This creates a problem, that metric can not
keep both of its old meanings.  I chose the configured concurrency
limit.

Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking.  The current design in the KEP is
as follows.

> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.

But this does not work out well at server startup.  As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state.  As always,
the two mandatory priority levels are implicitly added whenever they
are not read.  So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design.  So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels.  Its Target is higher than
all the others, once they start to show up.  It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.

I have considered the following fix.  The idea is that the designed
initialization is not appropriate before all the default objects are
read.  So the fix is to have a mode bit in the controller.  In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit.  In the later mode,
the seat demand tracking variables are initialized as originally
designed.

However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.

So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero.  Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.

Also: revise logging logic, to log at numerically lower V level when
there is a change.

Also: bug fix in float64close.

Also, separate imports in some file

Co-authored-by: Han Kang <hankang@google.com>

Kubernetes-commit: feb42277884bc7cfbd6f0bb1d875cc63b1b6caac
2022-10-31 16:13:25 -07:00
Tim Allclair 4b329cff47 Rename WithAuditID to WithAuditInit
Kubernetes-commit: ea28a21a6790d40c1fe540c64a296c8f0db17c65
2022-07-12 14:46:27 -07:00
Tim Allclair bd7c7f52c2 Consolidate AuditContext
Kubernetes-commit: f1d684b7b60b39b7dc1eb4156307c593f0ba74e1
2022-07-12 11:53:57 -07:00
Maciej Wyrzuc bfac2bc2b9 do not print status stack in case of timeout from timeout handler
Kubernetes-commit: 886648b820c10011350e7435a3105fd7d329c3c5
2022-09-10 10:13:11 +00:00
Abu Kashem 4ecff81419 rename assuredConcurrencyShares for flowcontrol v1beta3
Kubernetes-commit: 66fc0d703794f309c9715028d3b63f64c281a5fd
2022-09-21 15:40:33 -04:00
Abu Kashem 98ffe5507d apiserver: update apf logic to use v1beta3
Kubernetes-commit: 0a99e6ebb1e241bf421f6df44b15a5a16063a9f2
2022-09-10 07:26:31 -04:00
Davanum Srinivas 7e94033a61 Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>

Kubernetes-commit: a9593d634c6a053848413e600dadbf974627515f
2022-07-19 20:54:13 -04:00
Mike Spreitzer eb15930b31 Fix APF metric denominator problems
Co-authored-by: JUN YANG <yang.jun22@zte.com.cn>

Kubernetes-commit: fdd921cad0cd9308ec62c1b86c9c1cc5d12e5d21
2022-05-22 23:39:49 -04:00
Wojciech Tyczyński 8f7c120935 Eliminate MaintainObservations function in P&F
Kubernetes-commit: badf436ac4451590e5e84e537f2234e3632ea3b4
2021-11-25 12:44:50 +01:00
Mike Spreitzer 7aa625fb37 Make timeout test properly liberal
Make the test accept all the legitimate outcomes.

Expand the explanation of how TestPriorityAndFairnessWithPanicRecoveryAndTimeoutFilter/priority_level_concurrency_is_set_to_1,_queue_length_is_1,_first_request_should_time_out_and_second_(enqueued)_request_should_time_out_as_well is supposed to work.

Expand debug information that is available when the test fails.

Kubernetes-commit: 1f450695ffd5b2d028c87328b8b32630a8052129
2022-07-14 19:45:15 -04:00
Artur Żyliński e34c622d49 Add audit-id to storage traces
Refactor GetAuditIDTruncated to use context instead of request

Kubernetes-commit: b1e12b01b6c578da3eb593805b48e9d4a69efe54
2022-06-20 17:09:32 +02:00
Artur Żyliński 87b03dd4f5 Always log APF InitialSeats and FinalSeats values
Add apf_additionalLatency field, to have all WorkEstimate data

Kubernetes-commit: 962eb52be433bd1302210645d8cdbb0a6f6b8b24
2022-07-13 10:38:11 +02:00
Mike Spreitzer 959fbf9f84 Use timing ratio histograms instead of sample-and-watermark histograms
Kubernetes-commit: 0c0b7ca49f9ade72b990bf3a6f568485586af8b4
2022-05-18 02:56:48 -04:00
Mike Spreitzer c86ffebc09 Make sure metrics are registered in tests
Also, include metrics registration in server construction --- for
convenience.

Kubernetes-commit: 5ecf5f4ad30bbaac74a4fc87e8af06009ceb8dc0
2022-06-11 01:26:38 -04:00
Han Kang a414002089 cleanup deprecated metrics and usages
Kubernetes-commit: f223b900907b71431d7b6ceefa1642bb44fd9d84
2022-06-01 11:55:14 -07:00
Mike Spreitzer 0f5737dda8 Remove unhelpful pairing of members of read_vs_write_request_count_samples
Members are not used in (waiting,executing) pairs, so stopped
using the wrapper that adds such pairing.

Kubernetes-commit: cd33c7cf2260b351dd345497223a944e80bc7b61
2022-05-22 22:39:06 -04:00
Mike Spreitzer cae328fb1c Give apf metrics abstractions more familiar names
The logic is similar to Prometheus gauges and vectors,
adopt that terminology.

Kubernetes-commit: 7d64a93a1407f91b5e13bf540a0fa834a41622eb
2022-05-17 23:27:47 -04:00
Patrick Ohly ba3b8e9322 enhance and fix log calls
Some of these changes are cosmetic (repeatedly calling klog.V instead of
reusing the result), others address real issues:

- Logging a message only above a certain verbosity threshold without
  recording that verbosity level (if klog.V().Enabled() { klog.Info... }):
  this matters when using a logging backend which records the verbosity
  level.

- Passing a format string with parameters to a logging function that
  doesn't do string formatting.

All of these locations where found by the enhanced logcheck tool from
https://github.com/kubernetes/klog/pull/297.

In some cases it reports false positives, but those can be suppressed with
source code comments.

Kubernetes-commit: edffc700a43e610f641907290a5152ca593bad79
2022-02-16 12:17:47 +01:00
Maciej Wyrzuc 253e375283 Copy request in timeout handler
Kubernetes-commit: 44705c71401d327c6d596597adc55596973e89d0
2022-02-24 13:42:32 +00:00
Wojciech Tyczyński abc4243fac Record dropped requests in apiserver_request_total metric
Kubernetes-commit: 14396349954be57abea7162d7fe091e58a80ec4b
2022-03-23 16:16:36 +01:00
kerthcet 6316e03e25 fix: race detected in TestErrConnKilled
Signed-off-by: kerthcet <kerthcet@gmail.com>

Kubernetes-commit: dd75d3b9ecca72968bcb7ce50b39ec00e7415b41
2022-03-24 01:48:49 +08:00
jupblb c0c615eb7a Remove apf_fd from httplog
Since flowDistinguisher may hold data identifying a user accessing the
cluster this can be a source of a PII leak.

Kubernetes-commit: 94c92f78e5b02c27502f3b9d59b4e194e476a6f4
2022-03-10 12:59:00 +01:00
brianpursley 21a4aa1138 Fix wrong status code in unit test error messages.
Replace deprecated use of diff.ObjectReflectDiff() with cmp.Diff().

Kubernetes-commit: e9211d3279649795e40d9698f05e9752d111024a
2022-01-25 20:31:47 -05:00
Abu Kashem dc55a1a6cc fix flake in TestTimeoutHeaders
Kubernetes-commit: 2ae70e85d27ad30c29084b56572a817bc18b42e1
2022-02-07 10:34:20 -05:00
Jordan Liggitt 0edf32708d Fix header mutation race in timeout filter
Kubernetes-commit: 5b2a31f375755386b5cb2541b912f3561f7d6431
2022-01-04 22:57:29 -05:00
Abu Kashem 6bd59a523a apf: add a metric to count seat samples
Kubernetes-commit: bb15bdf15c1cc4d5a4380f3f6ed46d4adc9662a1
2021-11-23 11:36:09 -05:00
Abu Kashem b88c96a347 apf: add initial and final seats to httplog
Kubernetes-commit: be085b63455738d3f89fd804c84ae7ab0ac81008
2021-11-23 10:26:10 -05:00
Abu Kashem 1d83e4074a apf: ensure exempt request notes the classification
Kubernetes-commit: 8b2dd74c277d6a56a14e99830d39b23c5788c62e
2021-12-05 11:29:15 -05:00
Davanum Srinivas 56a3a30ae1 Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>

Kubernetes-commit: 9405e9b55ebcd461f161859a698b949ea3bde31d
2021-12-09 21:31:26 -05:00
Sergey Kanzhelev 95790548cb remove ReallyCrashForTesting and cleaned up some references to HandleCrash behavior
Kubernetes-commit: a11453efbc4a5575f7945af1c6fd4f7c00379529
2021-05-04 00:10:11 +00:00
Antonio Ojea 2f6960cc90 remove unused variable responseBodySize
Kubernetes-commit: 9336ff78f4a95cca8eb4a5cf528812d1bcac552c
2021-11-16 22:49:22 +01:00
Antonio Ojea 990b0d9a2e no lint unused variables
Kubernetes-commit: e82e0b38ffff895210fc6ce58bb347f77a828c01
2021-11-16 19:00:22 +01:00