* webhook: add options to disable resource_namespace tag in metrics
To add some context, historically, `resource_name` was removed from this
tag list due to its high potential of causing high metrics cardinality.
See [knative/pkg#1464][1] for more information.
While that's great, but it might not be sufficient for large scale use
cases where namespaces can be super dynamic (with generateName, too) or
grows fase enough. There is an issue report from
[tektoncd/pipeline#3171][2] which talks about this.
This proposal makes it possible to disable `resource_namespace` tag via
an option function. The default behavior is not changed, so no user
impact if any of existing users rely on this tag. There is no API
contract change either due to the beauty of variadic functions.
Now downstream projects can consume this by override `StatsReporter` in
webhook context options with their own preference. As a caveat here, if
downstream project does choose to override `StatsReporter`, the default
`ReportMetrics` function shouldn't be called by default as they may now
have a different set of tag keys to report. As such, this function is
only called if the default `StatsReporter` is used.
[1]: https://github.com/knative/pkg/pull/1464
[2]: https://github.com/tektoncd/pipeline/issues/3171
* webhook: add StatsReporterOptions in webhook.Options
There are two ways to customize StatsReporter:
1. Use a whole new StatsReporter implementation.
1. Or pass Option funcs to customize the default StatsReporter.
Option 1 is less practical at this time due to the metrics registration
conflict. `webhook.RegisterMetrics()` is called regardless which
StatsReporter implementation is used (which is a problem by itself). The
second option is more practical since it works well without dealing with
metrics conflicts.
The `webhook.Option` in particular allows people to discard certain
metrics tags.
Common use cases for this webhook involve using Kubernetes's
generateName API to randomise resource names (this is a good
idea in Tekton pipelines, for example, where there are uniqueness
constraints. That means that the webhook metrics here end up with
very high cardinality, which makes Prometheus fall over. Even
without generateName, it is possible to shoot oneself in the foot.
This commit just removes the resource_name label altogether.
* do not record for empty metric config
* Revert "do not record for empty metric config"
This reverts commit 539a5e4dbb.
* add a comment
* fix typo
* fix tests
* revert
* revert tests
* revert
* fix conflicts
* one more test file
With this change folks will need to call `webhook.RegisterMetrics()` to register the opencensus view with the metrics for the webhook's StatsReporter. This is needed to avoid having `sharedmain` crashloop the activator due to linking multiple on-`init()` views that register metrics named `request_latencies`.
In general, I believe that we should move away from registering these views via `init()` and more towards the broader K8s MetricsProvider pattern.
* Add metrics to webhook package
Add metricstest package for shared helper functions for testing metrics
* Address PR
* Cleanup
* Fix import paths to fix build issues
* Fix import package path for test file
* Remove unnecessary formatting from error message
* Remove helper function only used once
* Add metric name to all error messages, make checkRowTags testing helper function
* Add common histogram bucket generator function to metrics package
* Fix CheckStatsNotReported check
* Reset metrics before each test so the tests are idempotent
* Make CheckStatsNotReported conditional clearer