* webhook: add options to disable resource_namespace tag in metrics
For context: historically, `resource_name` was removed from this tag list
because of its potential to cause high metrics cardinality.
See [knative/pkg#1464][1] for more information.
While that helps, it might not be sufficient for large-scale use cases where
namespaces are highly dynamic (created with generateName, for example) or
grow fast enough to matter. The issue report in
[tektoncd/pipeline#3171][2] describes this situation.
This proposal makes it possible to disable the `resource_namespace` tag via
an option function. The default behavior is unchanged, so there is no impact
on existing users who rely on this tag. There is no API contract change
either, since the option is introduced through variadic functions.
Downstream projects can consume this by overriding `StatsReporter` in the
webhook context options with their own preference. One caveat: if a
downstream project does override `StatsReporter`, the default
`ReportMetrics` function should not be called automatically, since the custom
reporter may report a different set of tag keys. As such, that function is
only called when the default `StatsReporter` is used.
[1]: https://github.com/knative/pkg/pull/1464
[2]: https://github.com/tektoncd/pipeline/issues/3171
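As a rough illustration of the variadic-option shape described above; all names here are illustrative, not the actual knative.dev/pkg symbols:

```go
// Minimal sketch of the variadic-option pattern; all names are illustrative.
package sketch

// reporterOpts holds the tunables for a stats reporter.
type reporterOpts struct {
	reportNamespaceTag bool
}

// ReporterOption mutates reporterOpts; new knobs can be added later without
// touching the constructor signature, so the API contract stays intact.
type ReporterOption func(*reporterOpts)

// WithoutNamespaceTag disables the resource_namespace tag.
func WithoutNamespaceTag() ReporterOption {
	return func(o *reporterOpts) { o.reportNamespaceTag = false }
}

// NewStatsReporter keeps existing call sites working because the options are
// variadic; calling it with no options preserves the default behavior.
func NewStatsReporter(opts ...ReporterOption) *reporterOpts {
	o := &reporterOpts{reportNamespaceTag: true}
	for _, opt := range opts {
		opt(o)
	}
	return o
}
```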
* webhook: add StatsReporterOptions in webhook.Options
There are two ways to customize StatsReporter:
1. Use a whole new StatsReporter implementation.
2. Pass Option funcs to customize the default StatsReporter.
Option 1 is less practical at this time due to the metrics registration
conflict: `webhook.RegisterMetrics()` is called regardless of which
StatsReporter implementation is used (a problem in itself). The
second option is more practical since it works without running into that
conflict.
In particular, this `webhook.Option` allows callers to discard certain
metrics tags.
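A hedged sketch of what option 2 could look like from a downstream project; the `webhook.Options` fields and especially the `StatsReporterOption`/`WithoutTags` names below are assumptions about this change, not a verified API surface:

```go
// Hedged sketch: field, type, and helper names around StatsReporterOptions
// are assumptions about this change.
package main

import (
	"context"

	"knative.dev/pkg/webhook"
)

func withCustomMetrics(ctx context.Context) context.Context {
	return webhook.WithOptions(ctx, webhook.Options{
		ServiceName: "webhook",
		SecretName:  "webhook-certs",
		Port:        8443,
		// Customize the default StatsReporter rather than replacing it, so
		// webhook.RegisterMetrics() keeps working without registration conflicts.
		StatsReporterOptions: []webhook.StatsReporterOption{
			webhook.WithoutTags("resource_namespace"), // assumed helper name
		},
	})
}

func main() {}
```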
* Support fetching configs from `ctx`.
This enables callers of `sharedmain` to infuse `ctx` with a `logging.Config` or `leaderelection.Config` instead of relying on the API server for it.
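A sketch of what this enables, assuming a hypothetical `WithConfig`-style setter for stashing the config on the context (the exact setter name is not given in this message):

```go
// Hedged sketch: logging.WithConfig is an assumed setter name; the point is
// only that sharedmain can read these configs from ctx instead of the API server.
package main

import (
	"knative.dev/pkg/injection/sharedmain"
	"knative.dev/pkg/logging"
	"knative.dev/pkg/signals"
)

func main() {
	ctx := signals.NewContext()

	// Build a logging.Config locally instead of fetching the config-logging
	// ConfigMap from the API server.
	logCfg, err := logging.NewConfigFromMap(map[string]string{
		"zap-logger-config": `{"level": "info"}`,
	})
	if err != nil {
		panic(err)
	}
	ctx = logging.WithConfig(ctx, logCfg) // assumed setter

	sharedmain.MainWithContext(ctx, "my-component" /*, ctors... */)
}
```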
* Move context methods into appropriate packages, drop stutter
* Disable controllers using a flag
This patch allows disabling controllers when using a new `Main` function
called `MainNamed`, which takes a list of `NamedControllerConstructor`s.
Signed-off-by: Pierangelo Di Pilato <pdipilat@redhat.com>
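A sketch of the new entry point; the `NamedControllerConstructor` field names are taken from memory of this change and may differ slightly:

```go
// Hedged sketch: the NamedControllerConstructor field names are assumptions.
package main

import (
	"context"

	"knative.dev/pkg/configmap"
	"knative.dev/pkg/controller"
	"knative.dev/pkg/injection"
	"knative.dev/pkg/injection/sharedmain"
	"knative.dev/pkg/signals"
)

// newFooController stands in for a real injection.ControllerConstructor.
func newFooController(ctx context.Context, cmw configmap.Watcher) *controller.Impl {
	return nil // a real constructor returns a *controller.Impl built from the reconciler
}

func main() {
	// Named constructors give each controller an identity that the disable
	// flag can reference to turn it off.
	sharedmain.MainNamed(signals.NewContext(), "my-component",
		injection.NamedControllerConstructor{
			Name:       "foo-controller",
			Controller: newFooController,
		},
	)
}
```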
* Apply reviews
Signed-off-by: Pierangelo Di Pilato <pdipilat@redhat.com>
* Add comments, split filter and conversion
Signed-off-by: Pierangelo Di Pilato <pdipilat@redhat.com>
* Allocate len(ctors)
Signed-off-by: Pierangelo Di Pilato <pdipilat@redhat.com>
* to disable -> disabling
Signed-off-by: Pierangelo Di Pilato <pdipilat@redhat.com>
* consolidate k8s flags to an environment package
* add copyright
* fix comment style
* use go1.16 in workflows so downstream tests work
* Deprecate GetRESTConfig and do not remove
* update copyright date
* Shush automaxprocs.
I am not as sure about this change, but those logs do not follow the standard format,
and the need to inject the logger in init() makes fixing that next to impossible.
A corollary is that we set CPU limits on all our binaries except the QP, so it only
mattered there; otherwise the package logs nothing.
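For reference, the general way to quiet those lines with go.uber.org/automaxprocs is to call it with an injected printf instead of blank-importing it; this is a generic sketch, not necessarily the exact call site used here:

```go
// Generic sketch of quieting automaxprocs by injecting a no-op logger.
package main

import (
	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// maxprocs.Logger takes a printf-style function; a no-op keeps the
	// package from emitting its non-structured startup log lines.
	if _, err := maxprocs.Set(maxprocs.Logger(func(string, ...interface{}) {})); err != nil {
		panic(err)
	}
}
```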
* m
* put EnableInjectionOrDie back on the main path
* nil check pointer type
* move enable injection out of sharedmain
* lint
* nit picking fmt....
* add documentation
* feedback cleanup
* injection.GetRESTConfig
* redirect code that moved:
EnableInjectionOrDie starts informers immediately in a goroutine, which
changes the order of informer start and controller setup. As a result,
setting a resync period on an informer handler is ignored because the
informer has already started. Though this likely affects more components,
changing the EnableInjectionOrDie function would break the API, so this is a
temporary solution to fix all the controllers that depend on sharedmain.
* Enable HA by default.
This consolidates the core of sharedmain around the new leaderelection logic, which will now be **enabled by default**.
This can now be disabled with `--disable-ha` or by passing `sharedmain.WithHADisabled(ctx)` to `sharedmain.MainWithConfig`.
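For example, a component that wants the old behavior might opt out like this; `WithHADisabled` and `--disable-ha` come from this change, while the surrounding wiring is sketched from memory:

```go
// Sketch of opting out of the new HA default.
package main

import (
	"knative.dev/pkg/injection"
	"knative.dev/pkg/injection/sharedmain"
	"knative.dev/pkg/signals"
)

func main() {
	ctx := signals.NewContext()

	// Programmatic opt-out, equivalent to running the binary with --disable-ha.
	ctx = sharedmain.WithHADisabled(ctx)

	cfg := injection.ParseAndGetRESTConfigOrDie() // assumed helper for brevity
	sharedmain.MainWithConfig(ctx, "my-component", cfg /*, ctors... */)
}
```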
* vagababov comments, build failure
* Open an issue for enabledComponents removal.
* Move the configmap watcher startup.
This race was uncovered by the chaos duck on knative/serving! When we have enabled a feature flag, e.g. multi-container, and the webhook pods are restarted, there is a brief window where the webhook is up and healthy before the configmaps have synchronized and the new webhook pod realizes the feature is enabled.
* Drop the import alias
This change allows controllers going through sharedmain (just the webhook for now) to opt into Yanwei's logic by setting several environment variables.
I was able to pull this change in downstream and change the webhook to use a StatefulSet with the following environment:
```
+ # These settings are used for statefulset-based
+ # leader selection.
+ - name: CONTROLLER_ORDINAL
+ valueFrom:
+ fieldRef:
+ fieldPath: metadata.name
+ - name: STATEFUL_SERVICE_NAME
+ value: "webhook"
```
Running the above with 10 replicas and 10 buckets worked as intended (keys were evenly distributed across the replicas).
- use more performant functions (mostly removing formatters where possible)
- move zap.Error() to the call site, rather than creating a new sugared logger with a key attached (not cheap)
- fix *w usages where the key was not provided.
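Concretely, the patterns look like this with go.uber.org/zap (the logger here is just an example logger):

```go
// Illustration of the logging patterns above using go.uber.org/zap.
package main

import (
	"errors"

	"go.uber.org/zap"
)

func demo(logger *zap.SugaredLogger, err error) {
	// Prefer the non-formatting variant when there is nothing to format.
	logger.Info("starting informers") // rather than logger.Infof("starting informers")

	// Attach the error at the call site instead of building a new sugared
	// logger just to carry the field.
	logger.Errorw("failed to start informers", zap.Error(err))

	// The *w variants take key/value pairs; every value needs a key.
	logger.Infow("bucket acquired", "bucket", "bucket-0")
}

func main() {
	logger := zap.NewExample().Sugar()
	demo(logger, errors.New("boom"))
}
```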
This makes the config used to start all the clients available on the context that starts the controllers. That enables us to create secondary clients in the controllers with the same config.
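A hedged sketch of what that enables inside a controller constructor, assuming the config is read back with `injection.GetConfig`:

```go
// Hedged sketch: assumes injection.GetConfig returns the rest.Config that
// sharedmain placed on the context.
package main

import (
	"context"

	"k8s.io/client-go/dynamic"
	"knative.dev/pkg/configmap"
	"knative.dev/pkg/controller"
	"knative.dev/pkg/injection"
)

// NewController builds a secondary (dynamic) client from the same rest.Config
// used to start the primary injected clients.
func NewController(ctx context.Context, cmw configmap.Watcher) *controller.Impl {
	cfg := injection.GetConfig(ctx)
	dyn := dynamic.NewForConfigOrDie(cfg)
	_ = dyn // wire the secondary client into the reconciler here

	return nil // a real constructor returns a *controller.Impl built from the reconciler
}

func main() {}
```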
* Change StartAll to take context.
This has bugged me since we started using `ctx`, which contains a `stopCh` of sorts as `Done()`. This is somewhat for consistency, but by using `ctx` explicitly we enable ourselves to take advantage of more contextual information.
I did a quick scan of call sites and the good news is that the `sharedmain` change should be the place through which the vast majority of calls occur, however, the one outlier here is the KPA which calls this manually. I will stage a PR to manually import pkg into serving to fix this once this lands.
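The call shape after this change looks roughly like this (the shim in the next commit keeps old callers working):

```go
// Sketch of the new StartAll signature taking a context directly.
package main

import (
	"knative.dev/pkg/controller"
	"knative.dev/pkg/signals"
)

func main() {
	ctx := signals.NewContext()

	var controllers []*controller.Impl // built by your constructors

	// Previously this took a stop channel; now ctx.Done() plays that role
	// and callers can also hand contextual information to the controllers.
	controller.StartAll(ctx, controllers...)
}
```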
* Add a Run shim for back-compat
* Retry to get logging configmap when failed
When deploying pods with an Istio sidecar, it takes a few seconds until the
network is configured. In fact, Serving's system pods always fail when the
Istio sidecar is enabled.
To fix this, this patch retries fetching the logging ConfigMap for up to 5
seconds.
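A minimal sketch of that retry, assuming a plain client-go clientset and the standard wait helpers; apart from the 5-second bound mentioned above, the names and interval are illustrative:

```go
// Sketch of retrying the config-logging ConfigMap fetch while the sidecar
// network comes up.
package main

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// getLoggingConfigMap keeps retrying for ~5s so a slow Istio sidecar does not
// make the pod crash before the network is ready.
func getLoggingConfigMap(ctx context.Context, kc kubernetes.Interface, ns string) (*corev1.ConfigMap, error) {
	var cm *corev1.ConfigMap
	err := wait.PollImmediate(100*time.Millisecond, 5*time.Second, func() (bool, error) {
		var getErr error
		cm, getErr = kc.CoreV1().ConfigMaps(ns).Get(ctx, "config-logging", metav1.GetOptions{})
		if apierrors.IsNotFound(getErr) {
			// A missing ConfigMap is a real answer, not a transient network error.
			return true, getErr
		}
		if getErr != nil {
			return false, nil // transient: keep polling
		}
		return true, nil
	})
	return cm, err
}

func main() {}
```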
* Add IsNotFound error check
* Remove if error condition
* Start the webhook before informers sync.
Some webhooks (e.g. conversion) are required to list resources, so by delaying those until after informers have synced, we create a deadlock when they run in the same process. This change has two key parts:
1. Start the webhook immediately when our process starts, and issue a callback from sharedmain when the informers have synced.
2. Block `Admit` calls until informers have synced (all conversions are exempt), unless they have been designated by implementing `webhook.StatelessAdmissionController`.
Our built-in admission controllers (defaulting, validation, configmap validation) have all been marked as stateless; the main case where we want to block `Admit` calls is when we require the informer to have synced in order to populate indices for Bindings.
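A hedged sketch of the opt-out: an admission controller designates itself stateless by embedding the marker type, whose name here is from memory of this change:

```go
// Hedged sketch: StatelessAdmissionImpl is remembered as the embeddable marker
// that satisfies webhook.StatelessAdmissionController.
package sketch

import (
	"knative.dev/pkg/webhook"
)

// fooAdmissionController never reads informer-backed indices, so it is safe
// to serve Admit calls before informers have synced.
type fooAdmissionController struct {
	webhook.StatelessAdmissionImpl // opts out of the informer-sync gate

	// ...the rest of the admission controller implementation...
}
```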
* Add missing err declaration
* Add leader election config and to sharedmain
* Add new dependencies
* Extract method for RunLeaderElected
* Make leader election config constructor validate
* Rename leader election files
* Always start profiling server whether component has LE lock or not
* Fix entering unreachable section when leader election is disabled
* Address PR feedback
* Revert "Add support for client TLS to pkg/metrics (#1045)"
This reverts commit 945b556708.
* Roll forward "Add support for client TLS to pkg/metrics (#1045)"
Adds support for client TLS certs for opencensus export
* Switch sharedmain to only pass a getter rather than an all-namespace lister.
* Add a TODO about using a cached copy if this generates undue load
* Update deps per build failure.
* Refactor per @anniefu suggestion