[serving] OTel Metrics Documentation (#6352)

* Updating collecting metrics steps

* remove collector installation steps and separate out shared metrics into a snippet

* combine notes

* update nav to remove stutter

* include webhook metrics

* update common metrics with attributes and include serving metrics

* update notice banners for eventing and serving

* fix typos

* Update docs/eventing/observability/metrics/collecting-metrics.md

Co-authored-by: Calum Murray <cmurray@redhat.com>

---------

Co-authored-by: Calum Murray <cmurray@redhat.com>
Dave Protasowski 2025-09-03 14:48:44 -04:00 committed by GitHub
parent 31cca91976
commit b936c72e46
11 changed files with 362 additions and 532 deletions


@ -188,7 +188,7 @@ nav:
- Configuring logging: serving/observability/logging/config-logging.md
- Configuring Request logging: serving/observability/logging/request-logging.md
- Collecting metrics: serving/observability/metrics/collecting-metrics.md
- Knative Serving metrics: serving/observability/metrics/serving-metrics.md
- Metrics Reference: serving/observability/metrics/serving-metrics.md
# Serving - troubleshooting
- Troubleshooting:
- Debugging application issues: serving/troubleshooting/debugging-application-issues.md
@ -307,7 +307,7 @@ nav:
- Collecting logs: eventing/observability/logging/collecting-logs.md
- Configuring logging: eventing/observability/logging/config-logging.md
- Collecting metrics: eventing/observability/metrics/collecting-metrics.md
- Knative Eventing metrics: eventing/observability/metrics/eventing-metrics.md
- Metrics Reference: eventing/observability/metrics/eventing-metrics.md
- Features:
- About Eventing features: eventing/features/README.md
- DeliverySpec.Timeout field: eventing/features/delivery-timeout.md


@ -6,3 +6,40 @@ function: how-to
---
--8<-- "collecting-metrics.md"
### Enabling Metric Collection
1. To enable Prometheus metrics collection, update the `config-observability` ConfigMap and set `metrics-protocol` to `prometheus`, as shown below; an apply-and-verify sketch follows the ConfigMap.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: knative-eventing
data:
  # metrics-protocol specifies the protocol used when exporting metrics.
  # Supported values are 'none' (the default), 'prometheus', 'http/protobuf' (OTLP over HTTP), and 'grpc' (OTLP over gRPC).
  metrics-protocol: prometheus
  tracing-protocol: http/protobuf
  tracing-endpoint: http://jaeger-collector.observability.svc:4318/v1/traces
  tracing-sampling-rate: "1"
```
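A minimal apply-and-verify sketch (the file name `config-observability.yaml` is only an example for saving the ConfigMap above):
```bash
# Apply the ConfigMap shown above (saved locally as config-observability.yaml in this example)
kubectl apply -f config-observability.yaml

# Confirm the exporter protocol took effect
kubectl get configmap config-observability -n knative-eventing \
  -o jsonpath='{.data.metrics-protocol}'
```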
### Apply the Eventing Service/Pod Monitors
1. Apply the ServiceMonitors/PodMonitors to collect metrics from the Knative Eventing control plane; a quick check that the monitors were created follows the command below.
```bash
kubectl apply -f https://raw.githubusercontent.com/knative-extensions/monitoring/main/config/eventing-monitors.yaml
```
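To sanity-check that the monitors were created (an illustrative check; the namespaces they land in depend on the manifest above):
```bash
# List the Prometheus Operator monitor resources and filter for the Eventing ones
kubectl get servicemonitors,podmonitors --all-namespaces | grep -i eventing
```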
### Import Grafana dashboards
1. Grafana dashboards can be imported from the [`monitoring` repository](https://github.com/knative-extensions/monitoring).
1. If you are using the Grafana Helm chart with the dashboard sidecar enabled (the default), you can load the dashboards by applying the following ConfigMaps; a sketch for checking the sidecar label follows the command.
```bash
kubectl apply -f https://raw.githubusercontent.com/knative-extensions/monitoring/main/config/configmap-eventing-dashboard.yaml
```
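If the dashboards do not show up, a rough check is to confirm the ConfigMaps carry the label the sidecar watches for (`grafana_dashboard` is the kube-prometheus-stack default and may differ in your installation):
```bash
# ConfigMaps labeled for the Grafana dashboard sidecar (label name assumed from kube-prometheus-stack defaults)
kubectl get configmaps --all-namespaces -l grafana_dashboard
```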


@ -1,96 +0,0 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
namespace: metrics
data:
collector.yaml: |
receivers:
opencensus:
endpoint: "0.0.0.0:55678"
exporters:
logging:
prometheus:
endpoint: "0.0.0.0:8889"
extensions:
health_check:
pprof:
zpages:
service:
extensions: [health_check, pprof, zpages]
pipelines:
metrics:
receivers: [opencensus]
processors: []
exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: otel-collector
namespace: metrics
labels:
app: otel-collector
spec:
selector:
matchLabels:
app: otel-collector
replicas: 1 # This can be increased for a larger system.
template:
metadata:
labels:
app: otel-collector
spec:
containers:
- name: collector
args:
- --config=/conf/collector.yaml
image: otel/opentelemetry-collector:latest
resources:
requests: # Note: these are suitable for a small instance, but may need to be increased for a large instance.
memory: 100Mi
cpu: 50m
ports:
- name: otel
containerPort: 55678
- name: prom-export
containerPort: 8889
- name: zpages # A /debug page
containerPort: 55679
volumeMounts:
- mountPath: /conf
name: config
volumes:
- name: config
configMap:
name: otel-collector-config
items:
- key: collector.yaml
path: collector.yaml
---
apiVersion: v1
kind: Service
metadata:
name: otel-collector
namespace: metrics
spec:
selector:
app: "otel-collector"
ports:
- port: 55678
name: otel
---
apiVersion: v1
kind: Service
metadata:
name: otel-export
namespace: metrics
labels:
app: otel-export
spec:
selector:
app: otel-collector
ports:
- port: 8889
name: prom-export


@ -5,7 +5,15 @@ components:
function: reference
---
# Knative Eventing metrics
# Knative Eventing Metrics
!!! warning
The metrics below have not yet been updated to reflect our migration from OpenCensus to OpenTelemetry and may change as that migration is completed.
Administrators can view metrics for Knative Eventing components.
@ -42,11 +50,6 @@ By aggregating the metrics over the http code, events can be separated into two
| event_count | Number of events dispatched by the in-memory channel | Counter | container_name<br>event_type<br>namespace_name<br>response_code<br>response_code_class<br>unique_name | Dimensionless | Stable
| event_dispatch_latencies | The time spent dispatching an event from an in-memory Channel | Histogram | container_name<br>event_type<br>namespace_name<br>response_code<br>response_code_class<br>unique_name | Milliseconds | Stable
!!! note
A number of metrics eg. controller, Go runtime and others are omitted here as they are common
across most components. For more about these metrics check the
[Serving metrics API section](../../../serving/observability/metrics/serving-metrics.md).
## Eventing sources
Eventing sources are created by users who own the related system, so they can trigger applications with events.
@ -57,3 +60,5 @@ to verify that events have been delivered from the source side, thus verifying t
|:-|:-|:-|:-|:-|:-|
| event_count | Number of events sent by the source | Counter | event_source<br>event_type<br>name<br>namespace_name<br>resource_group<br>response_code<br>response_code_class<br>response_error<br>response_timeout | Dimensionless | Stable |
| retry_event_count | Number of events sent by the source in retries | Counter | event_source<br>event_type<br>name<br>namespace_name<br>resource_group<br>response_code<br>response_code_class<br>response_error<br>response_timeout | Dimensionless | Stable
--8<-- "observability-shared-metrics.md"

[Image diff suppressed: removed SVG diagram, 239 KiB]


@ -6,3 +6,46 @@ function: how-to
---
--8<-- "collecting-metrics.md"
### Enabling Metric Collection
1. To enable Prometheus metrics collection, update the `config-observability` ConfigMap and set `metrics-protocol` to `prometheus`. For request metrics, we recommend pushing metrics to Prometheus, which requires enabling the Prometheus OTLP receiver; this is already configured in our monitoring example. A `kubectl patch` sketch of this change is shown after this list.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: knative-serving
data:
  # metrics-protocol specifies the protocol used when exporting metrics.
  # Supported values are 'none' (the default), 'prometheus', 'http/protobuf' (OTLP over HTTP), and 'grpc' (OTLP over gRPC).
  metrics-protocol: prometheus
  # request-metrics-protocol and request-metrics-endpoint configure how request metrics are exported.
  request-metrics-protocol: http/protobuf
  request-metrics-endpoint: http://knative-kube-prometheus-st-prometheus.observability.svc:9090/api/v1/otlp/v1/metrics
  tracing-protocol: http/protobuf
  tracing-endpoint: http://jaeger-collector.observability.svc:4318/v1/traces
  tracing-sampling-rate: "1"
```
1. Apply the ServiceMonitors/PodMonitors to collect metrics from the Knative Serving control plane.
```bash
kubectl apply -f https://raw.githubusercontent.com/knative-extensions/monitoring/main/config/serving-monitors.yaml
```
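For reference, a `kubectl patch` sketch that is equivalent to the ConfigMap change in the first step above (the request-metrics endpoint is the example value from that ConfigMap and depends on your Prometheus installation):
```bash
# Patch config-observability in place; adjust the endpoint to match your Prometheus OTLP receiver
kubectl patch configmap config-observability -n knative-serving --type merge -p '{
  "data": {
    "metrics-protocol": "prometheus",
    "request-metrics-protocol": "http/protobuf",
    "request-metrics-endpoint": "http://knative-kube-prometheus-st-prometheus.observability.svc:9090/api/v1/otlp/v1/metrics"
  }
}'
```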
### Import Grafana dashboards
1. Grafana dashboards can be imported from the [`monitoring` repository](https://github.com/knative-extensions/monitoring).
1. If you are using the Grafana Helm chart with the dashboard sidecar enabled (the default), you can load the dashboards by applying the following ConfigMaps.
```bash
kubectl apply -f https://raw.githubusercontent.com/knative-extensions/monitoring/main/config/configmap-serving-dashboard.yaml
```


@ -1,96 +0,0 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
namespace: metrics
data:
collector.yaml: |
receivers:
opencensus:
endpoint: "0.0.0.0:55678"
exporters:
logging:
prometheus:
endpoint: "0.0.0.0:8889"
extensions:
health_check:
pprof:
zpages:
service:
extensions: [health_check, pprof, zpages]
pipelines:
metrics:
receivers: [opencensus]
processors: []
exporters: [prometheus]
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: otel-collector
namespace: metrics
labels:
app: otel-collector
spec:
selector:
matchLabels:
app: otel-collector
replicas: 1 # This can be increased for a larger system.
template:
metadata:
labels:
app: otel-collector
spec:
containers:
- name: collector
args:
- --config=/conf/collector.yaml
image: otel/opentelemetry-collector:latest
resources:
requests: # Note: these are suitable for a small instance, but may need to be increased for a large instance.
memory: 100Mi
cpu: 50m
ports:
- name: otel
containerPort: 55678
- name: prom-export
containerPort: 8889
- name: zpages # A /debug page
containerPort: 55679
volumeMounts:
- mountPath: /conf
name: config
volumes:
- name: config
configMap:
name: otel-collector-config
items:
- key: collector.yaml
path: collector.yaml
---
apiVersion: v1
kind: Service
metadata:
name: otel-collector
namespace: metrics
spec:
selector:
app: "otel-collector"
ports:
- port: 55678
name: otel
---
apiVersion: v1
kind: Service
metadata:
name: otel-export
namespace: metrics
labels:
app: otel-export
spec:
selector:
app: otel-collector
ports:
- port: 8889
name: prom-export


@ -5,105 +5,164 @@ components:
function: reference
---
# Knative Serving metrics
# Knative Serving Metrics
Administrators can monitor the Serving control plane based on the metrics exposed by each Serving component. The available metrics are listed below.
!!! note
These metrics may change as we complete our migration from OpenCensus to OpenTelemetry.
## Queue Proxy
The queue proxy is the per-pod sidecar that enforces container concurrency and provides metrics to the autoscaler. The following metrics give you insight into queued requests and user-container behavior.
### `kn.queueproxy.depth`
**Instrument Type:** Int64Gauge
**Unit (UCUM):** {item}
**Description:** Number of current items in the queue proxy queue
### `kn.queueproxy.app.duration`
**Instrument Type:** Float64Histogram
**Unit (UCUM):** s
**Description:** The duration of the task execution
## Activator
The following metrics can help you understand how an application responds when traffic passes through the activator. For example, when scaling from zero, high request latency might mean that requests are taking too long to be fulfilled.
| Metric Name | Description | Type | Tags | Unit | Status |
|:-|:-|:-|:-|:-|:-|
| ```request_concurrency``` | Concurrent requests that are routed to Activator<br>These are requests reported by the concurrency reporter which may not be done yet.<br> This is the average concurrency over a reporting period | Gauge | ```configuration_name```<br>```container_name```<br>```namespace_name```<br>```pod_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```request_count``` | The number of requests that are routed to Activator.<br>These are requests that have been fulfilled from the activator handler. | Counter | ```configuration_name```<br>```container_name```<br>```namespace_name```<br>```pod_name```<br>```response_code```<br>```response_code_class```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```request_latencies``` | The response time in millisecond for the fulfilled routed requests | Histogram | ```configuration_name```<br>```container_name```<br>```namespace_name```<br>```pod_name```<br>```response_code```<br>```response_code_class```<br>```revision_name```<br>```service_name``` | Milliseconds | Stable |
### `kn.revision.request.concurrency`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** {request}
**Description:** Concurrent requests that are routed to the Activator
The following attributes are included with the metrics below
Name | Type | Description
-|-|-
`k8s.namespace.name` | string | Namespace of the Revision
`kn.service.name` | string | Knative Service name associated with this Revision
`kn.configuration.name` | string | Knative Configuration name associated with this Revision
`kn.revision.name` | string | The name of the Revision
## Autoscaler
The Autoscaler component exposes a number of metrics related to its decisions per revision. For example, at any given time, you can monitor the desired pods the Autoscaler wants to allocate for a Service, the average number of requests per second during the stable window, or whether the autoscaler is in panic mode (KPA).
| Metric Name | Description | Type | Tags | Unit | Status |
|:-|:-|:-|:-|:-|:-|
| ```desired_pods``` | Number of pods autoscaler wants to allocate | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```excess_burst_capacity``` | Excess burst capacity overserved over the stable window | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```stable_request_concurrency``` | Average of requests count per observed pod over the stable window | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```panic_request_concurrency``` | Average of requests count per observed pod over the panic window | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```target_concurrency_per_pod``` | The desired number of concurrent requests for each pod | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```stable_requests_per_second``` | Average requests-per-second per observed pod over the stable window | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```panic_requests_per_second``` | Average requests-per-second per observed pod over the panic window | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```target_requests_per_second``` | The desired requests-per-second for each pod | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```panic_mode``` | 1 if autoscaler is in panic mode, 0 otherwise | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```requested_pods``` | Number of pods autoscaler requested from Kubernetes | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```actual_pods``` | Number of pods that are allocated currently in ready state | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```not_ready_pods``` | Number of pods that are not ready currently | Gauge | ```configuration_name=```<br>```namespace_name=```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```pending_pods``` | Number of pods that are pending currently | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name``` | Dimensionless | Stable |
| ```terminating_pods``` | Number of pods that are terminating currently | Gauge | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name<br>``` | Dimensionless | Stable |
| ```scrape_time``` | Time autoscaler takes to scrape metrics from the service pods in milliseconds | Histogram | ```configuration_name```<br>```namespace_name```<br>```revision_name```<br>```service_name```<br> | Milliseconds | Stable |
### `kn.autoscaler.scrape.duration`
## Controller
**Instrument Type:** Float64Histogram
The following metrics are emitted by any component that implements a controller logic.
The metrics show details about the reconciliation operations and the workqueue behavior on which
reconciliation requests are enqueued.
**Unit (UCUM):** s
| Metric Name | Description | Type | Tags | Unit | Status |
|:-|:-|:-|:-|:-|:-|
| ```work_queue_depth``` | Depth of the work queue | Gauge | ```reconciler``` | Dimensionless | Stable |
| ```reconcile_count``` | Number of reconcile operations | Counter | ```reconciler```<br>```success```<br> | Dimensionless | Stable |
| ```reconcile_latency``` | Latency of reconcile operations | Histogram | ```reconciler```<br>```success```<br> | Milliseconds | Stable |
| ```workqueue_adds_total``` | Total number of adds handled by workqueue | Counter | ```name``` | Dimensionless | Stable |
| ```workqueue_depth``` | Current depth of workqueue | Gauge | ```reconciler``` | Dimensionless | Stable |
| ```workqueue_queue_latency_seconds``` | How long in seconds an item stays in workqueue before being requested | Histogram | ```name``` | Seconds | Stable |
| ```workqueue_retries_total``` | Total number of retries handled by workqueue | Counter | ```name``` | Dimensionless | Stable |
| ```workqueue_work_duration_seconds``` | How long in seconds processing an item from a workqueue takes. | Histogram | ```name``` | Seconds| Stable |
| ```workqueue_unfinished_work_seconds``` | How long in seconds the outstanding workqueue items have been in flight (total). | Histogram | ```name``` | Seconds | Stable |
| ```workqueue_longest_running_processor_seconds``` | How long in seconds the longest outstanding workqueue item has been in flight | Histogram | ```name``` | Seconds | Stable |
**Description:** The duration of scraping the revision
## Webhook
### `kn.revision.pods.desired`
Webhook metrics report useful info about operations. For example, if a large number of operations fail, this could indicate an issue with a user-created resource.
**Instrument Type:** Int64Gauge
| Metric Name | Description | Type | Tags | Unit | Status |
|:-|:-|:-|:-|:-|:-|
| ```request_count``` | The number of requests that are routed to webhook | Counter | ```admission_allowed```<br>```kind_group```<br>```kind_kind```<br>```kind_version```<br>```request_operation```<br>```resource_group```<br>```resource_namespace```<br>```resource_resource```<br>```resource_version``` | Dimensionless | Stable |
| ```request_latencies``` | The response time in milliseconds | Histogram | ```admission_allowed```<br>```kind_group```<br>```kind_kind```<br>```kind_version```<br>```request_operation```<br>```resource_group```<br>```resource_namespace```<br>```resource_resource```<br>```resource_version``` | Milliseconds | Stable |
**Unit (UCUM):** {item}
## Go Runtime - memstats
**Description:** Number of pods the autoscaler wants to allocate
Each Knative Serving control plane process emits a number of Go runtime [memory statistics](https://golang.org/pkg/runtime/#MemStats) (shown next).
As a baseline for monitoring purposes, user could start with a subset of the metrics: current allocations (go_alloc), total allocations (go_total_alloc), system memory (go_sys), mallocs (go_mallocs), frees (go_frees) and garbage collection total pause time (total_gc_pause_ns), next gc target heap size (go_next_gc) and number of garbage collection cycles (num_gc).
### `kn.revision.capacity.excess`
| Metric Name | Description | Type | Tags | Unit | Status |
|:-|:-|:-|:-|:-|:-|
| ```go_alloc``` | The number of bytes of allocated heap objects (same as heap_alloc) | Gauge | ```name``` | Dimensionless | Stable |
| ```go_total_alloc``` | The cumulative bytes allocated for heap objects | Gauge | ```name``` | Dimensionless | Stable |
| ```go_sys``` | The total bytes of memory obtained from the OS | Gauge | ```name``` | Dimensionless | Stable |
| ```go_lookups``` | The number of pointer lookups performed by the runtime | Gauge | ```name``` | Dimensionless | Stable |
| ```go_mallocs``` | The cumulative count of heap objects allocated | Gauge | ```name``` | Dimensionless | Stable |
| ```go_frees``` | The cumulative count of heap objects freed | Gauge | ```name``` | Dimensionless | Stable |
| ```go_heap_alloc``` | The number of bytes of allocated heap objects | Gauge | ```name``` | Dimensionless | Stable |
| ```go_heap_sys``` | The number of bytes of heap memory obtained from the OS | Gauge | ```name``` | Dimensionless | Stable |
| ```go_heap_idle``` | The number of bytes in idle (unused) spans | Gauge | ```name``` | Dimensionless | Stable |
| ```go_heap_in_use``` | The number of bytes in in-use spans | Gauge | ```name``` | Dimensionless | Stable |
| ```go_heap_released``` | The number of bytes of physical memory returned to the OS | Gauge | ```name``` | Dimensionless | Stable |
| ```go_heap_objects``` | The number of allocated heap objects | Gauge | ```name``` | Dimensionless | Stable |
| ```go_stack_in_use``` | The number of bytes in stack spans | Gauge | ```name``` | Dimensionless | Stable |
| ```go_stack_sys``` | The number of bytes of stack memory obtained from the OS | Gauge | ```name``` | Dimensionless | Stable |
| ```go_mspan_in_use``` | The number of bytes of allocated mspan structures | Gauge | ```name``` | Dimensionless | Stable |
| ```go_mspan_sys``` | The number of bytes of memory obtained from the OS for mspan structures | Gauge | ```name``` | Dimensionless | Stable |
| ```go_mcache_in_use``` | The number of bytes of allocated mcache structures | Gauge | ```name``` | Dimensionless | Stable |
| ```go_mcache_sys``` | The number of bytes of memory obtained from the OS for mcache structures | Gauge | ```name``` | Dimensionless | Stable |
| ```go_bucket_hash_sys``` | The number of bytes of memory in profiling bucket hash tables. | Gauge | ```name``` | Dimensionless | Stable |
| ```go_gc_sys``` | The number of bytes of memory in garbage collection metadata | Gauge | ```name``` | Dimensionless | Stable |
| ```go_other_sys``` | The number of bytes of memory in miscellaneous off-heap runtime allocations | Gauge | ```name``` | Dimensionless | Stable |
| ```go_next_gc``` | The target heap size of the next GC cycle | Gauge | ```name``` | Dimensionless | Stable |
| ```go_last_gc``` | The time the last garbage collection finished, as nanoseconds since 1970 (the UNIX epoch) | Gauge | ```name``` | Nanoseconds | Stable |
| ```go_total_gc_pause_ns``` | The cumulative nanoseconds in GC stop-the-world pauses since the program started | Gauge | ```name``` | Nanoseconds | Stable |
| ```go_num_gc``` | The number of completed GC cycles. | Gauge | ```name``` | Dimensionless | Stable |
| ```go_num_forced_gc``` | The number of GC cycles that were forced by the application calling the GC function. | Gauge | ```name``` | Dimensionless | Stable |
| ```go_gc_cpu_fraction``` | The fraction of this program's available CPU time used by the GC since the program started | Gauge | ```name``` | Dimensionless | Stable |
**Instrument Type:** Float64Gauge
!!! note
The name tag is empty.
**Unit (UCUM):** {concurrency}
**Description:** Excess burst capacity observed over the stable window
### `kn.revision.concurrency.stable`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** {concurrency}
**Description:** Average of request count per observed pod over the stable window
### `kn.revision.concurrency.panic`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** {concurrency}
**Description:** Average of request count per observed pod over the panic window
### `kn.revision.concurrency.target`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** {concurrency}
**Description:** The desired concurrent requests for each pod
### `kn.revision.rps.stable`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** {request}/s
**Description:** Average of requests-per-second per observed pod over the stable window
### `kn.revision.rps.panic`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** {request}/s
**Description:** Average of requests-per-second per observed pod over the panic window
### `kn.revision.pods.requested`
**Instrument Type:** Int64Gauge
**Unit (UCUM):** {pod}
**Description:** Number of pods autoscaler requested from Kubernetes
### `kn.revision.pods.count`
**Instrument Type:** Int64Gauge
**Unit (UCUM):** {pod}
**Description:** Number of pods that are allocated currently
### `kn.revision.pods.not_ready.count`
**Instrument Type:** Int64Gauge
**Unit (UCUM):** {pod}
**Description:** Number of pods that are not ready currently
### `kn.revision.pods.pending.count`
**Instrument Type:** Int64Gauge
**Unit (UCUM):** {pod}
**Description:** Number of pods that are pending currently
### `kn.revision.pods.terminating.count`
**Instrument Type:** Int64Gauge
**Unit (UCUM):** {pod}
**Description:** Number of pods that are terminating currently
--8<-- "observability-shared-metrics.md"

[Image diff suppressed: removed SVG diagram, 239 KiB]


@ -1,16 +1,17 @@
# Collecting Metrics in Knative
Knative supports different popular tools for collecting metrics:
Knative leverages [OpenTelemetry](https://opentelemetry.io/docs/what-is-opentelemetry/) for exporting metrics.
- [Prometheus](https://prometheus.io/)
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/)
[Grafana](https://grafana.com/oss/) dashboards are available for metrics collected directly with Prometheus.
We currently support the following export protocols:
- [OTel (OTLP) over HTTP or gRPC](https://opentelemetry.io/docs/languages/go/exporters/#prometheus-experimental)
- [Prometheus](https://opentelemetry.io/docs/languages/go/exporters/#prometheus-experimental)
You can also set up the OpenTelemetry Collector to receive metrics from Knative components and distribute them to other metrics providers that support OpenTelemetry.
!!! warning
You can't use OpenTelemetry Collector and Prometheus at the same time. The default metrics backend is Prometheus. You will need to remove `metrics.backend-destination` and `metrics.request-metrics-backend-destination` keys from the config-observability Configmap to enable Prometheus metrics.
!!! note
The following monitoring setup is for illustrative purposes. Support is best-effort and changes
are welcome in the [Knative Monitoring repository](https://github.com/knative-extensions/monitoring)
By default, metrics exporting is turned off.
## About the Prometheus Stack
@ -27,39 +28,22 @@ You can also set up the OpenTelemetry Collector to receive metrics from Knative
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -n default -f values.yaml
# values.yaml contains at minimum the configuration below
```
helm install knative prometheus-community/kube-prometheus-stack \
--create-namespace \
--namespace observability \
-f https://raw.githubusercontent.com/knative-extensions/monitoring/main/promstack-values.yaml
!!! caution
You will need to ensure that the Helm chart has the following values configured; otherwise, the ServiceMonitors/PodMonitors will not work.
```yaml
kube-state-metrics:
  metricLabelsAllowlist:
    - pods=[*]
    - deployments=[app.kubernetes.io/name,app.kubernetes.io/component,app.kubernetes.io/instance]
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
```
1. Apply the ServiceMonitors/PodMonitors to collect metrics from Knative.
```bash
kubectl apply -f https://raw.githubusercontent.com/knative-extensions/monitoring/main/servicemonitor.yaml
```
### Access the Prometheus instance locally
By default, the Prometheus instance is only exposed on a private service named `prometheus-kube-prometheus-prometheus`.
By default, the Prometheus instance is only exposed on a private service named `prometheus-operated`.
To access the console in your web browser:
1. Enter the command:
```bash
kubectl port-forward -n default svc/prometheus-kube-prometheus-prometheus 9090:9090
kubectl port-forward -n observability svc/prometheus-operated 9090:9090
```
1. Access the console in your browser via `http://localhost:9090`.
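While the port-forward is running, an illustrative sanity check from another terminal is to ask the Prometheus HTTP API whether any Knative scrape targets are up (the query only returns data once the monitors above are being scraped):
```bash
# Query the Prometheus API for scrape target health and filter for Knative jobs
curl -s 'http://localhost:9090/api/v1/query?query=up' | grep -i knative
```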
@ -73,7 +57,7 @@ To access the dashboards in your web browser:
1. Enter the command:
```bash
kubectl port-forward -n default svc/prometheus-grafana 3000:80
kubectl port-forward -n observability svc/knative-grafana 3000:80
```
1. Access the dashboards in your browser via `http://localhost:3000`.
@ -85,90 +69,3 @@ To access the dashboards in your web browser:
password: prom-operator
```
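If the defaults above were changed at install time, the credentials can be read back from the Grafana secret; the secret name follows the Helm release, so `knative-grafana` here is an assumption based on the `knative` release used earlier:
```bash
# Read the Grafana admin password from the chart-managed secret (name assumed from the 'knative' release)
kubectl get secret knative-grafana -n observability \
  -o jsonpath='{.data.admin-password}' | base64 --decode
```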
### Import Grafana dashboards
1. Grafana dashboards can be imported from the [`monitoring` repository](https://github.com/knative-extensions/monitoring/tree/main/grafana).
1. If you are using the Grafana Helm Chart with the Dashboard Sidecar enabled, you can load the dashboards by applying the following configmaps.
```bash
kubectl apply -f https://raw.githubusercontent.com/knative-extensions/monitoring/main/grafana/dashboards.yaml
```
!!! caution
You will need to ensure that the helm chart has following values configured, otherwise the dashboards loading will not work.
```yaml
grafana:
  sidecar:
    dashboards:
      enabled: true
      searchNamespace: ALL
```
If you have an existing configmap and the dashboards loading doesn't work, add the `labelValue: true` attribute to the helm chart after the `searchNamespace: ALL` declaration.
## About OpenTelemetry
OpenTelemetry is a CNCF observability framework for cloud-native software, which provides a collection of tools, APIs, and SDKs.
You can use OpenTelemetry to instrument, generate, collect, and export telemetry data. This data includes metrics, logs, and traces, that you can analyze to understand the performance and behavior of Knative components.
OpenTelemetry allows you to easily export metrics to multiple monitoring services without needing to rebuild or reconfigure the Knative binaries.
## Understanding the collector
The collector provides a location where various Knative components can push metrics to be retained and collected by a monitoring service.
In the following example, you can configure a single collector instance using a ConfigMap and a Deployment.
!!! tip
For more complex deployments, you can automate some of these steps by using the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator).
!!! caution
The Grafana dashboards at https://github.com/knative-extensions/monitoring/tree/main/grafana don't work with metrics scraped from OpenTelemetry Collector.
![Diagram of components reporting to collector, which is scraped by Prometheus](system-diagram.svg)
<!-- yuml.me UML rendering of:
[queue-proxy1]->[Collector]
[queue-proxy2]->[Collector]
[autoscaler]->[Collector]
[controller]->[Collector]
[Collector]<-scrape[Prometheus]
-->
## Set up the collector
1. Create a namespace for the collector to run in, by entering the following command:
```bash
kubectl create namespace metrics
```
The next step uses the `metrics` namespace for creating the collector.
1. Create a Deployment, Service, and ConfigMap for the collector by entering the following command:
```bash
kubectl apply -f https://raw.githubusercontent.com/knative/docs/main/docs/serving/observability/metrics/collector.yaml
```
1. Update the `config-observability` ConfigMaps in the Knative Serving and
Eventing namespaces, by entering the follow command:
```bash
kubectl patch --namespace knative-serving configmap/config-observability \
--type merge \
--patch '{"data":{"metrics.backend-destination":"opencensus","metrics.request-metrics-backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.metrics:55678"}}'
kubectl patch --namespace knative-eventing configmap/config-observability \
--type merge \
--patch '{"data":{"metrics.backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.metrics:55678"}}'
```
## Verify the collector setup
1. You can check that metrics are being forwarded by loading the Prometheus export port on the collector, by entering the following command:
```bash
kubectl port-forward --namespace metrics deployment/otel-collector 8889
```
1. Fetch `http://localhost:8889/metrics` to see the exported metrics.


@ -0,0 +1,115 @@
## Webhook Metrics
Webhook metrics report useful info about operations. For example, if a large number of operations fail, this could indicate an issue with a user-created resource.
### `http.server.request.duration`
Knative implements the [semantic conventions for HTTP Servers](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/#http-server) using the OpenTelemetry [otel-go/otelhttp](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp) package.
Please refer to the [OpenTelemetry docs](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp) for details about the HTTP Server metrics it exports.
The following attributes are included with the metric
Name | Type | Description | Examples
-|-|-|-
`kn.webhook.type` | string | Specifies the type of webhook invoked | `admission`, `defaulting`, `validation`, `conversion` |
`kn.webhook.resource.group` | string | Specifies the resource Kubernetes group name |
`kn.webhook.resource.version` | string | Specifies the resource Kubernetes group version|
`kn.webhook.resource.kind` | string | Specifies the resource Kubernetes group kind |
`kn.webhook.subresource` | string | Specifies the subresource | "" (empty), `status`, `scale` |
`kn.webhook.operation.type` | string | Specifies the operation that invoked the webhook | `CREATE`, `UPDATE`, `DELETE` |
`kn.webhook.operation.status` | string | Specifies whether the operation was successful | `success`, `failed` |
### `kn.webhook.handler.duration`
**Instrument Type:** Histogram
**Unit (UCUM):** s
**Description:** The duration of task execution.
The following attributes are included with the metric
Name | Type | Description | Examples
-|-|-|-
`kn.webhook.type` | string | Specifies the type of webhook invoked | `admission`, `defaulting`, `validation`, `conversion` |
`kn.webhook.resource.group` | string | Specifies the resource Kubernetes group name |
`kn.webhook.resource.version` | string | Specifies the resource Kubernetes group version|
`kn.webhook.resource.kind` | string | Specifies the resource Kubernetes group kind |
`kn.webhook.subresource` | string | Specifies the subresource | "" (empty), `status`, `scale` |
`kn.webhook.operation.type` | string | Specifies the operation that invoked the webhook | `CREATE`, `UPDATE`, `DELETE` |
`kn.webhook.operation.status` | string | Specifies whether the operation was successful | `success`, `failed` |
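As an illustration only, you could look for failed webhook operations with a query like the one below; the metric and label names assume the default OTLP-to-Prometheus name translation (dots become underscores and the unit is appended), so verify the exact names in your Prometheus instance:
```bash
# Failed webhook invocations over the last 5 minutes, grouped by webhook type
# (metric/label names are assumptions based on the default OTLP -> Prometheus translation)
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (kn_webhook_type) (increase(kn_webhook_handler_duration_seconds_count{kn_webhook_operation_status="failed"}[5m]))'
```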
## Workqueue Metrics
Knative controllers expose [client-go workqueue metrics](https://pkg.go.dev/k8s.io/client-go/util/workqueue#MetricsProvider).
The following attributes are included with the metrics below:
Name | Type | Description |
-|-|-
`name` | string | Name of the work queue
### `kn.workqueue.depth`
**Instrument Type:** Int64UpDownCounter
**Unit (UCUM):** {item}
**Description:** Number of current items in the queue
### `kn.workqueue.adds`
**Instrument Type:** Int64Counter
**Unit (UCUM):** {item}
**Description:** Number of items added to the queue
### `kn.workqueue.queue.duration`
**Instrument Type:** Float64Histogram
**Unit (UCUM):** s
**Description:** How long an item stays in the workqueue before being requested
### `kn.workqueue.process.duration`
**Instrument Type:** Float64Histogram
**Unit (UCUM):** s
**Description:** How long in seconds processing an item from workqueue takes
### `kn.workqueue.unfinished_work`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** s
**Description:** How many seconds of in-progress work the reconciler has done that has not yet been observed by the work duration metric. Large values indicate stuck threads; you can estimate the number of stuck threads from the rate at which this value increases.
### `kn.workqueue.longest_running_processor`
**Instrument Type:** Float64Gauge
**Unit (UCUM):** s
**Description:** How long the longest worker thread has been running
### `kn.workqueue.retries`
**Instrument Type:** Int64Counter
**Unit (UCUM):** {item}
**Description:** Number of items re-added to the queue
## Go Runtime
Knative implements the [semantic conventions for Go runtime metrics](https://opentelemetry.io/docs/specs/semconv/runtime/go-metrics/) using the OpenTelemetry [otel-go/instrumentation/runtime](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/runtime) package.
Please refer to the [OpenTelemetry docs](https://opentelemetry.io/docs/specs/semconv/runtime/go-metrics/) for details about the Go runtime metrics it exports.