* Implement a new shared "Drainer" handler.
This implements a new `http.Handler` called `Drainer`, which is intended to wrap some inner `http.Handler` business logic with a new outer handler that can respond to Kubelet probes (successfully until told to "Drain()").
This takes over the webhook's relatively new probe handling and lame duck logic with one key difference. Previously the webhook waited for a fixed period after SIGTERM before exiting, but the new logic waits for this same grace period AFTER THE LAST REQUEST. So if the handler keeps getting (non-probe) requests, the timer will continually reset, and once it stops receiving requests for the configured grace period, "Drain()" will return and the webhook will exit.
The goal of this work is to better cope with what we believe to be high tail latency in the API server noticing that a webhook replica is shutting down.
Related: https://github.com/knative/pkg/issues/1509
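The quiet-period behavior described above can be sketched roughly as follows. This is a minimal illustration and not the code added by this change: `quietDrainer`, `probeHeader`, `send` and `runDemo` are hypothetical names, and recognizing probes by a single header is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// probeHeader is an assumed marker header for kubelet probes.
const probeHeader = "K-Kubelet-Probe"

// quietDrainer is a hypothetical stand-in for the Drainer described above.
type quietDrainer struct {
	inner       http.Handler
	quietPeriod time.Duration

	mu    sync.RWMutex
	once  sync.Once
	timer *time.Timer // nil until Drain is called
}

func (d *quietDrainer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get(probeHeader) != "" {
		d.mu.RLock()
		draining := d.timer != nil
		d.mu.RUnlock()
		if draining {
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		return
	}
	// A non-probe request restarts the quiet period if we are draining.
	d.mu.Lock()
	if d.timer != nil && d.timer.Stop() {
		d.timer.Reset(d.quietPeriod)
	}
	d.mu.Unlock()
	d.inner.ServeHTTP(w, r)
}

// Drain starts failing probes, then blocks until a full quiet period has
// passed without any non-probe request.
func (d *quietDrainer) Drain() {
	d.once.Do(func() {
		d.mu.Lock()
		d.timer = time.NewTimer(d.quietPeriod)
		t := d.timer
		d.mu.Unlock()
		<-t.C
	})
}

// send issues one in-memory request and returns the status code.
func send(h http.Handler, probe bool) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if probe {
		req.Header.Set(probeHeader, "webhook")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// runDemo reports the probe status before and during draining, plus how long
// Drain blocked when one real request arrived halfway through the quiet period.
func runDemo(quiet time.Duration) (before, during int, waited time.Duration) {
	ok := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {})
	d := &quietDrainer{inner: ok, quietPeriod: quiet}
	before = send(d, true)

	start := time.Now()
	done := make(chan struct{})
	go func() { d.Drain(); close(done) }()
	for send(d, true) == http.StatusOK { // wait for Drain to take effect
		time.Sleep(time.Millisecond)
	}
	during = send(d, true)
	time.Sleep(quiet / 2)
	send(d, false) // restarts the quiet period
	<-done
	return before, during, time.Since(start)
}

func main() {
	before, during, waited := runDemo(200 * time.Millisecond)
	fmt.Println(before, during, waited)
}
```

In this sketch the request sent halfway through the quiet period pushes the return of `Drain()` out to roughly one and a half quiet periods, which is the difference from a fixed post-SIGTERM sleep.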
* Switch to RWLock
* Kubelet probes would result in the webhook writing the HTTP status twice
Doesn't seem like it affected anything - just writes out some extra
log messages
* nits
* nits
* nits
* nits
* Implement the K8s lifecycle in webhook.
The webhook never properly implemented the Kubernetes SIGTERM/SIGKILL
lifecycle, and doesn't even really support readiness probes today. This
change enables folks to use a block like this on their webhook container:
```yaml
readinessProbe: &probe
  periodSeconds: 1
  httpGet:
    scheme: HTTPS
    port: 8443
    httpHeaders:
    - name: k-kubelet-probe
      value: "webhook"
livenessProbe: *probe
```
With this, the webhook won't report as `Ready` until a probe has succeeded,
and when the SIGTERM is received, we will start failing probes for a grace
period (so our Endpoint drops) before shutting down the webhook's HTTP Server.
This was uncovered by running the webhook across 10 replicas in Serving with
the "Goose" (https://github.com/knative/pkg/pull/1316) enabled for the e2e
tests. The failure mode I saw was conversion webhook requests failing across
random tests.
This also moves the Serving probe-detection function into PKG.
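As a rough illustration of the lifecycle described in this commit, the sketch below answers probes based on a "stopping" flag and, once told to stop, fails probes for a fixed grace period before shutting the HTTP server down. The names `probeGate`, `stopAfterGrace`, `probe` and `runDemo` are hypothetical, and detecting probes purely by the `k-kubelet-probe` header from the YAML above is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// probeHeader matches the httpHeaders entry in the probe YAML above.
const probeHeader = "k-kubelet-probe"

// probeGate is a hypothetical wrapper that answers probes itself and passes
// everything else to the inner handler.
type probeGate struct {
	inner    http.Handler
	stopping atomic.Bool
}

func (g *probeGate) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get(probeHeader) == "" {
		g.inner.ServeHTTP(w, r) // not a probe
		return
	}
	if g.stopping.Load() {
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// stopAfterGrace is what a SIGTERM handler might call: fail probes first so
// the Endpoint drops, wait out the grace period, then stop the server.
func stopAfterGrace(g *probeGate, srv *http.Server, grace time.Duration) error {
	g.stopping.Store(true)
	time.Sleep(grace)
	return srv.Shutdown(context.Background())
}

// probe sends one in-memory probe request and returns the status code.
func probe(h http.Handler) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set(probeHeader, "webhook")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func runDemo() (before, after int, err error) {
	g := &probeGate{inner: http.NotFoundHandler()}
	srv := &http.Server{Handler: g} // never started in this demo
	before = probe(g)
	err = stopAfterGrace(g, srv, 10*time.Millisecond)
	after = probe(g)
	return before, after, err
}

func main() {
	fmt.Println(runDemo())
}
```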
* Increase the log level when we start to fail probes
* Wait for goroutines to terminate on all paths.
* Add new callback pattern to pkg
* include the context
* typo
* Remove the empty instance of unstructured
* initialize the unstructured var
* Eliminate the unneeded pointer
* Pass a pointer to unstructured callback
* Create a validation specific context struct
* Move callback tests to own unit test case
* Switch from converting to decoding
* Update webhook/resourcesemantics/validation/validation.go
Co-Authored-By: Victor Agababov <vagababov@gmail.com>
* don't wrap context and include params
* split validation files
* include 2020 copyright
* include unit test for WithKubeClient
* Don't bother updating copyright date
* Include a unit test for panic
* Move dryRun to context
* Include context dry run unit test
* put the request operation in the context
* eliminate circular dep
* move kubeclient test out of context_test
* don't bother iterating callback map
* Callback takes a list of supported verbs
* Remove extra type
* Ensure Callback interface is public
* Alias Operation into validation
* alias Operation right in Webhook
* Update webhook/resourcesemantics/validation/validation_admit.go
Co-Authored-By: Victor Agababov <vagababov@gmail.com>
* Update webhook/resourcesemantics/validation/validation_admit_test.go
Co-Authored-By: Victor Agababov <vagababov@gmail.com>
* Update webhook/resourcesemantics/validation/validation_admit_test.go
Co-Authored-By: Victor Agababov <vagababov@gmail.com>
* Update webhook/resourcesemantics/validation/validation_admit.go
Co-Authored-By: Victor Agababov <vagababov@gmail.com>
* Update webhook/resourcesemantics/validation/validation_admit.go
Co-Authored-By: Victor Agababov <vagababov@gmail.com>
* Update webhook/resourcesemantics/validation/validation_admit_test.go
Co-Authored-By: Victor Agababov <vagababov@gmail.com>
* correct parens
* minor style fixes
* Rename Callback to Func
* Fix build error
* Switch callback to take a list with a factory
* keep descriptive names
* update comment
* Drop pointer, correct comments
* Add a unit test to disallow duplicate verbs
* fix comments, struct{} for set
* switch to variadic arg for NewCallback
Co-authored-by: Victor Agababov <vagababov@gmail.com>
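The shape that the callback commits above converge on (a function plus a set of supported verbs, built through a variadic constructor that rejects duplicates) might look roughly like this. It is a sketch under assumptions: `Operation`, `Resource`, `Invoke` and `runDemo` are local stand-ins rather than the real webhook types, and panicking on a duplicate verb is an assumed way of disallowing it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Operation and its values are local stand-ins for the admission verbs.
type Operation string

const (
	Create Operation = "CREATE"
	Update Operation = "UPDATE"
	Delete Operation = "DELETE"
)

// Resource is a placeholder for the unstructured object handed to callbacks.
type Resource struct {
	Kind string
	Name string
}

// Func is the user-supplied validation hook.
type Func func(ctx context.Context, obj *Resource) error

// Callback pairs a Func with the set of verbs it wants to see.
type Callback struct {
	function       Func
	supportedVerbs map[Operation]struct{} // struct{} values: used as a set
}

// NewCallback takes the verbs as variadic arguments and panics on duplicates.
func NewCallback(fn Func, verbs ...Operation) Callback {
	set := make(map[Operation]struct{}, len(verbs))
	for _, op := range verbs {
		if _, dup := set[op]; dup {
			panic("duplicate verb " + string(op))
		}
		set[op] = struct{}{}
	}
	return Callback{function: fn, supportedVerbs: set}
}

// Invoke runs the hook only for supported verbs; other verbs pass untouched.
func (c Callback) Invoke(ctx context.Context, op Operation, obj *Resource) error {
	if _, ok := c.supportedVerbs[op]; !ok {
		return nil
	}
	return c.function(ctx, obj)
}

func runDemo() (onCreate, onDelete error, dupPanics bool) {
	cb := NewCallback(func(_ context.Context, obj *Resource) error {
		if obj.Name == "" {
			return errors.New("name is required")
		}
		return nil
	}, Create, Update)

	ctx := context.Background()
	onCreate = cb.Invoke(ctx, Create, &Resource{Kind: "Widget"})
	onDelete = cb.Invoke(ctx, Delete, &Resource{Kind: "Widget"})

	func() {
		defer func() { dupPanics = recover() != nil }()
		NewCallback(nil, Create, Create)
	}()
	return onCreate, onDelete, dupPanics
}

func main() {
	fmt.Println(runDemo())
}
```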
* Start the webhook before informers sync.
Some webhooks (e.g. conversion) are required to list resources, so by delaying those until after informers have synced, we create a deadlock when they run in the same process. This change has two key parts:
1. Start the webhook immediately when our process starts, and issue a callback from sharedmain when the informers have synced.
2. Block `Admit` calls until informers have synced (all conversions are exempt), unless the admission controller has been designated as stateless by implementing `webhook.StatelessAdmissionController`.
Our built-in admission controllers (defaulting, validation, configmap validation) have all been marked as stateless. The main case where we want to block `Admit` calls is when we require the informer to have synchronized to populate indices for Bindings.
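One way to picture the gating described in this commit is a channel that is closed once informers have synced, with stateless controllers skipping the wait. The sketch below is illustrative only: the `Stateless` marker interface and its method, the `defaulting` and `bindings` types, and the string-based `Admit` signature are all hypothetical simplifications.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// AdmissionController is a pared-down, hypothetical controller interface.
type AdmissionController interface {
	Admit(ctx context.Context, request string) string
}

// Stateless is a hypothetical marker interface: controllers that implement
// it do not need informer state and are never blocked.
type Stateless interface {
	DoesNotDependOnInformerState()
}

type defaulting struct{}

func (defaulting) Admit(context.Context, string) string { return "defaulted" }
func (defaulting) DoesNotDependOnInformerState()        {}

type bindings struct{}

func (bindings) Admit(context.Context, string) string { return "bound" }

// admit blocks stateful controllers until synced is closed or ctx ends.
func admit(ctx context.Context, c AdmissionController, synced <-chan struct{}, request string) (string, error) {
	if _, ok := c.(Stateless); !ok {
		select {
		case <-synced:
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return c.Admit(ctx, request), nil
}

func runDemo() (stateless string, blockedErr error, afterSync string) {
	synced := make(chan struct{})
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()

	stateless, _ = admit(ctx, defaulting{}, synced, "req")
	_, blockedErr = admit(ctx, bindings{}, synced, "req") // times out

	close(synced) // what the informers-synced callback would do
	afterSync, _ = admit(context.Background(), bindings{}, synced, "req")
	return stateless, blockedErr, afterSync
}

func main() {
	fmt.Println(runDemo())
}
```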
* Add missing err declaration
* ConversionController implementation
This controller will reconcile target CRDs with the correct
conversion webhook configuration. Specifically, the HTTP path and
CA bundle will be updated.
Additionally, the conversion controller will perform the given
conversions through a hub and spoke model utilizing the
apis.Convertible interface.
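A hub and spoke conversion can be sketched as below: every version converts to and from a single hub type, so a conversion between two spokes always goes through the hub. This is a simplified illustration; `Hub`, `Alpha`, `Beta` and `convert` are made-up names, and the `Convertible` interface here is a reduced stand-in that should not be read as the actual `apis.Convertible` signature.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hub is the hypothetical version every other version converts through.
type Hub struct {
	Name     string
	Replicas int
}

// Convertible is a reduced stand-in for a spoke's conversion methods.
type Convertible interface {
	ConvertTo(hub *Hub) error
	ConvertFrom(hub *Hub) error
}

// Alpha is a spoke that stores the replica count as a string.
type Alpha struct {
	Name  string
	Count string
}

func (a *Alpha) ConvertTo(hub *Hub) error {
	n, err := strconv.Atoi(a.Count)
	if err != nil {
		return fmt.Errorf("alpha count %q: %w", a.Count, err)
	}
	hub.Name, hub.Replicas = a.Name, n
	return nil
}

func (a *Alpha) ConvertFrom(hub *Hub) error {
	a.Name, a.Count = hub.Name, strconv.Itoa(hub.Replicas)
	return nil
}

// Beta is a spoke that stores the replica count as an int32.
type Beta struct {
	Name     string
	Replicas int32
}

func (b *Beta) ConvertTo(hub *Hub) error {
	hub.Name, hub.Replicas = b.Name, int(b.Replicas)
	return nil
}

func (b *Beta) ConvertFrom(hub *Hub) error {
	b.Name, b.Replicas = hub.Name, int32(hub.Replicas)
	return nil
}

// convert goes spoke -> hub -> spoke, so each version only needs to know
// how to convert to and from the hub.
func convert(in, out Convertible) error {
	hub := &Hub{}
	if err := in.ConvertTo(hub); err != nil {
		return err
	}
	return out.ConvertFrom(hub)
}

func main() {
	out := &Beta{}
	err := convert(&Alpha{Name: "web", Count: "3"}, out)
	fmt.Println(*out, err)
}
```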
* Webhook now can host ConversionControllers
* injection/sharedmain now supports webhook.ConversionControllers
These conversion controllers will be hosted by the webhook that
the sharedmain will start
* support defaulting & include godoc
* Refactor webhook to allow adding conversion support
* pr feedback
* fix memory leak
* We can use mux.Handle
* move admission integration tests to separate file
By combining our validation logic into our mutating webhook we were previously allowing for mutating webhooks evaluated after our own to modify our resources into invalid shapes. There are no guarantees around ordering of mutating webhooks (that I could find), so the only way to remedy this properly is to split apart the two into separate webhook configurations:
- `defaulting`: which runs during the mutating admission webhook phase
- `validation`: which runs during the validating admission webhook phase.
The diagram in [this post](https://kubernetes.io/blog/2019/03/21/a-guide-to-kubernetes-admission-controllers/) is very helpful in illustrating the flow of webhooks.
Fixes: https://github.com/knative/pkg/issues/847
GetCertificate allows us to start in TLS mode and dynamically fetch new certificates as they change. This will eventually allow us to decouple the cert creation process from the core webhook logic, and in a subsequent change service this from a secret lister cache.
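For readers unfamiliar with the hook, here is a minimal sketch of serving certificates through `tls.Config.GetCertificate`. The `certStore` type, `newTLSConfig` and `runDemo` are hypothetical; how the certificate is produced or refreshed (for example from a secret lister) is deliberately left out.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"sync"
)

// certStore is a hypothetical holder for the current serving certificate.
type certStore struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// Set swaps in a new certificate, e.g. after it has been rotated.
func (s *certStore) Set(c *tls.Certificate) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cert = c
}

// GetCertificate has the signature expected by tls.Config.GetCertificate.
// Because Certificates is left empty in newTLSConfig, the TLS stack calls
// it during handshakes, so a rotated certificate is picked up without a
// restart.
func (s *certStore) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.cert == nil {
		return nil, errors.New("no certificate available yet")
	}
	return s.cert, nil
}

func newTLSConfig(s *certStore) *tls.Config {
	return &tls.Config{GetCertificate: s.GetCertificate}
}

// runDemo calls the hook directly instead of performing real handshakes.
func runDemo() (errBefore error, servedFirst, servedSecond bool) {
	store := &certStore{}
	cfg := newTLSConfig(store)
	_, errBefore = cfg.GetCertificate(nil)

	first, second := &tls.Certificate{}, &tls.Certificate{}
	store.Set(first)
	got, _ := cfg.GetCertificate(nil)
	servedFirst = got == first

	store.Set(second)
	got, _ = cfg.GetCertificate(nil)
	servedSecond = got == second
	return errBefore, servedFirst, servedSecond
}

func main() {
	fmt.Println(runDemo())
}
```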
This builds on https://github.com/knative/pkg/pull/817 and makes further
breaking changes. The options pertinent to each admission controller are
now passed to their respective constructors, which leads to a cleaner
options struct, and better prepares for greater webhook diversity.
* Stop using OwnerRefs for webhook config lifecycle
This changes the model by which we manage the lifecycle of our
`{mutating,validating}webhookconfiguration`, which previously used an owner ref
from the cluster-scoped configuration to the namespace-scoped Deployment. The
new model adds an explicit yaml file for the webhook, which omits the fields
filled in by the deployment as it starts.
A few notable elements of this change:
1. Clear out OwnerReferences explicitly (avoids the linked bug),
2. Periodically rerun `Register()` to ensure our webhook exists,
3. Simplified logic around registration (all we need now is update!).
Related: https://github.com/knative/serving/issues/5845
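The periodic re-registration in item 2 could be driven by a simple ticker loop along these lines. This is a sketch, not the code in this change: `keepRegistered` and `countRegistrations` are hypothetical, and the `register` function stands in for whatever `Register()` actually does.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// keepRegistered calls register immediately and then once per interval
// until ctx is cancelled. Errors are logged and retried on the next tick.
func keepRegistered(ctx context.Context, interval time.Duration, register func(context.Context) error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := register(ctx); err != nil {
			log.Printf("failed to register webhook: %v", err)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// countRegistrations runs the loop for the given total time and reports
// how many times register was invoked.
func countRegistrations(interval, total time.Duration) int {
	calls := 0
	ctx, cancel := context.WithTimeout(context.Background(), total)
	defer cancel()
	keepRegistered(ctx, interval, func(context.Context) error {
		calls++ // a real register would update the webhook configuration here
		return nil
	})
	return calls
}

func main() {
	fmt.Println(countRegistrations(10*time.Millisecond, 55*time.Millisecond))
}
```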
* Incorporate feedback from @dgerd and a few other nits I noticed.
* Prevent nil StatsReporter for existing webhook package consumers
* Pass StatsReporter by pointer and have tests test constructor
* Make constructor return error instead of panicking
* Move StatsReporter to ControllerOptions to consolidate constructors
* Add metrics to webhook package
Add metricstest package for shared helper functions for testing metrics
* Address PR
* Cleanup
* Fix import paths to fix build issues
* Fix import package path for test file
* Remove unnecessary formatting from error message
* Remove helper function only used once
* Add metric name to all error messages, make checkRowTags testing helper function
* Add common histogram bucket generator function to metrics package
* Fix CheckStatsNotReported check
* Reset metrics before each test so the tests are idempotent
* Make CheckStatsNotReported conditional clearer
* #457 Duck type user annotation logic
* #457 Duck type user annotation logic - tests
* #457 Revert updater annotation key from lastModifier to updater
* #457 Rename HasSpec#GetSpec() to HasSpec#GetUntypedSpec()
* #457 Fix some indentation
* #457 Get group for user info annotations from the request
* #457 Reduce confusion in webhook testing by using same group
* have simple tests. working on impl.
* strict setting, reflection based.
* ran codegen.
* adding license.
* update based on feedback and merge better.
* getting closer to something simpler assuming shallow reflect.
* adding validation test.
* use the json tag.
* Golang thinks nil typed pointers are not nil.
* Use real value of reflect invalid.
* add a missing test.
* two methods, one for update, one for single check.
* checkdep is now in apis.
* fix pkg.
* Update apis/deprecated_test.go
Co-Authored-By: n3wscott <32305648+n3wscott@users.noreply.github.com>
* add code clarity.
* include inlined struct objects recursively.
* Update comments and add a flatten error test for inlined.
This deprecates the `apis.Immutable` and `apis.Annotatable` interfaces,
which were both awkward niche extensions of `apis.Validatable` and
`apis.SetDefaults` for specific contexts that the former set didn't
cover well.
With this change, the expectation is that types that want to check
for immutability will instead access the "baseline" object via the
context from within updates. For example:
```
func (new *Type) Validate(ctx context.Context) *apis.FieldError {
	if apis.IsInUpdate(ctx) {
		old := apis.GetBaseline(ctx).(*Type)
		// Update-specific validation based on new and old.
	}
	return nil
}
```
For applying user annotations, the type writer can write:
```
func (new *Type) SetDefaults(ctx context.Context) {
	if apis.IsInCreate(ctx) {
		ui := apis.GetUserInfo(ctx)
		// Set creator annotation from ui
	}
	if apis.IsInUpdate(ctx) {
		ui := apis.GetUserInfo(ctx)
		old := apis.GetBaseline(ctx).(*Type)
		// Compare old.Spec vs. new.Spec and on changes
		// update the "updater" annotation from ui.
	}
}
```
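To make the context-based pattern concrete, here is a self-contained sketch of how a baseline object might be carried in the context and used for an immutability check. Everything in it is a local stand-in: `withinUpdate`, `isInUpdate`, `getBaseline`, `Widget` and `validateUpdate` are hypothetical and only mirror the idea of the `apis` helpers, and a plain `error` is returned instead of `*apis.FieldError`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// updateKey is the private context key under which the old object is stored.
type updateKey struct{}

// withinUpdate, isInUpdate and getBaseline are local, hypothetical stand-ins
// for the context helpers used in the snippets above.
func withinUpdate(ctx context.Context, baseline interface{}) context.Context {
	return context.WithValue(ctx, updateKey{}, baseline)
}

func isInUpdate(ctx context.Context) bool { return ctx.Value(updateKey{}) != nil }

func getBaseline(ctx context.Context) interface{} { return ctx.Value(updateKey{}) }

// Widget is a made-up resource whose Name must not change after creation.
type Widget struct {
	Name string
	Size int
}

// Validate runs the general checks first, then the update-only checks.
func (w *Widget) Validate(ctx context.Context) error {
	if w.Size < 0 {
		return errors.New("size must not be negative")
	}
	if isInUpdate(ctx) {
		old := getBaseline(ctx).(*Widget)
		if old.Name != w.Name {
			return fmt.Errorf("name is immutable: %q -> %q", old.Name, w.Name)
		}
	}
	return nil
}

// validateUpdate is roughly what a webhook would do on an UPDATE request:
// stash the old object in the context, then validate the new one.
func validateUpdate(old, updated *Widget) error {
	return updated.Validate(withinUpdate(context.Background(), old))
}

func main() {
	old := &Widget{Name: "a", Size: 1}
	fmt.Println(validateUpdate(old, &Widget{Name: "a", Size: 2}))
	fmt.Println(validateUpdate(old, &Widget{Name: "b", Size: 1}))
	fmt.Println((&Widget{Name: "c"}).Validate(context.Background()))
}
```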
One of the key motivations for this refactoring was to enable us
to do more powerful validation in `apis.Validate` beyond the niche
of immutability checking (and without introducing yet-another
one-off niche interface). In the BYO Revision name PoC I abused
`apis.Immutable` to do more arbitrary before/after validation,
which with this can simply be a part of `apis.Validatable`.
See: https://github.com/knative/serving/pull/3562
The general stance on deprecating interfaces such as these will be
to deprecate them in a non-breaking way (via a comment for now). They
will be hollowed out when the functionality is removed from the webhook,
but left in because of diamond dependency problems. In this change
we remove the `apis.Annotatable` functionality and deprecate the
`apis.Immutable` functionality.