3710: Moved serving and eventing metrics to admin guide (#3735)

2021-06-07 08:22:49 -05:00 · 2021-06-07 08:22:49 -05:00 · 7f657c8b1c
parent 4da3fd9582
commit 7f657c8b1c
3 changed files with 32 additions and 57 deletions
--- a/docs/admin/collecting-metrics/eventing-metrics/metrics.md
+++ b/docs/admin/collecting-metrics/eventing-metrics/metrics.md
@@ -1,23 +1,11 @@
---
-title: "Metrics API"
-weight: 99
-type: "docs"
---
+# Knative Eventing metrics

-# Metrics API
+Administrators can view metrics for Knative Eventing components.

-<br>
+## Broker - Ingress

-**NOTE:** The metrics API may change in the future, this serves as a snapshot of the current metrics.
+Use the following metrics to debug how broker ingress performs and what events are dispatched via the ingress component.

-## Admin
-
-Administrators can monitor Eventing based on the metrics exposed by each Eventing component.
-Metrics are listed next.
-
-### Broker - Ingress
-
-Use the following metrics to debug how broker ingress performs and what events are dispacthed via the ingress component.
 By aggregating the metrics over the http code, events can be separated into two classes, successful (2xx) and failed events (5xx).

 | Metric Name | Description | Type | Tags | Unit | Status |
@@ -25,7 +13,7 @@ By aggregating the metrics over the http code, events can be separated into two
 | event_count | Number of events received by a Broker | Counter | broker_name<br>event_type<br>namespace_name<br>response_code<br>response_code_class<br>unique_name | Dimensionless | Stable
 | event_dispatch_latencies | The time spent dispatching an event to a Channel | Histogram | broker_name<br>event_type<br>namespace_name<br>response_code<br>response_code_class<br>unique_name | Milliseconds | Stable

-### Broker - Filter
+## Broker - Filter

 Use the following metrics to debug how broker filter performs and what events are dispatched via the filter component.
 You can also measure the latency of the actual filtering action on an event.
@@ -37,7 +25,7 @@ By aggregating the metrics over the http code, events can be separated into two
 | event_dispatch_latencies | The time spent dispatching an event to a Channel | Histogram | broker_name<br>container_name<br>filter_type<br>namespace_name<br>response_code<br>response_code_class<br>trigger_name<br>unique_name | Milliseconds | Stable
 | event_processing_latencies | The time spent processing an event before it is dispatched to a Trigger subscriber | Histogram | broker_name<br>container_name<br>filter_type<br>namespace_name<br>trigger_name<br>unique_name | Milliseconds | Stable

-### In-memory Dispatcher
+## In-memory Dispatcher

 The in-memory channel can be evaluated using the following metrics.
 By aggregating the metrics over the http code, events can be separated into two classes, successful (2xx) and failed events (5xx).
@@ -47,11 +35,10 @@ By aggregating the metrics over the http code, events can be separated into two
 | event_count | Number of events dispatched by the in-memory channel | Counter | container_name<br>event_type<br>namespace_name<br>response_code<br>response_code_class<br>unique_name | Dimensionless | Stable
 | event_dispatch_latencies | The time spent dispatching an event from an in-memory Channel | Histogram | container_name<br>event_type<br>namespace_name<br>response_code<br>response_code_class<br>unique_name | Milliseconds | Stable

+!!! note
+    A number of metrics, for example controller and Go runtime metrics, are omitted here because they are common across most components. For more information about these metrics, see the [Serving metrics API section](../../serving-metrics/metrics).

-**NOTE:** A number of metrics eg. controller, Go runtime and others are omitted here as they are common across most components. For more about these metrics check the [Serving metrics API section](../serving/metrics.md#controller).
-
-
-### Eventing sources
+## Eventing sources

 Eventing sources are created by users who own the related system, so they can trigger applications with events.
 By default, every source exposes a number of metrics to help users monitor dispatched events. Use the following metrics
--- a/docs/admin/collecting-metrics/serving-metrics/metrics.md
+++ b/docs/admin/collecting-metrics/serving-metrics/metrics.md
@@ -1,38 +1,22 @@
---
-title: "Metrics API"
-weight: 99
-type: "docs"
---
-
-# Metrics API
-
-<br>
-
-**NOTE:** The metrics API may change in the future, this serves as a snapshot of the current metrics.
-<br>
-
-## Admin
+# Knative Serving metrics

 Administrators can monitor the Serving control plane based on the metrics exposed by each Serving component.
 Metrics are listed next.

-### Activator
+## Activator
+
+The following metrics help you understand how an application responds when traffic goes through the activator, for example, when scaling from zero. High request latency means that requests are taking too long to be fulfilled.

-The following metrics allow the user to understand how application responds when traffic goes through the activator eg. scaling from zero. For example high request latency means that requests are taken too much time be fulfilled.
-<br>
 | Metric Name | Description | Type | Tags | Unit | Status |
 |:-|:-|:-|:-|:-|:-|
 | request_concurrency | Concurrent requests that are routed to Activator<br>These are requests reported by the concurrency reporter which may not be done yet.<br> This is the average concurrency over a reporting period | Gauge | configuration_name<br>container_name<br>namespace_name<br>pod_name<br>revision_name<br>service_name | Dimensionless | Stable |
 | request_count | The number of requests that are routed to Activator.<br>These are requests that have been fulfilled from the activator handler. | Counter | configuration_name<br>container_name<br>namespace_name<br>pod_name<br>response_code<br>response_code_class<br>revision_name<br>service_name | Dimensionless | Stable |
 | request_latencies | The response time in milliseconds for the fulfilled routed requests | Histogram | configuration_name<br>container_name<br>namespace_name<br>pod_name<br>response_code<br>response_code_class<br>revision_name<br>service_name | Milliseconds | Stable |

-### Autoscaler
+## Autoscaler
+
+The Autoscaler component exposes a number of metrics related to its decisions per revision. For example, at any given time, you can monitor the desired pods the Autoscaler wants to allocate for a Service, the average number of requests per second during the stable window, or whether the Autoscaler is in panic mode (KPA).

-Autoscaler component exposes a number of metrics related to its decisions per revision.
-For example at any given time user can monitor the desired pods the Autoscaler wants to allocate for
-a service, the average number of requests per second during the stable window, whether autoscaler is in panic mode (KPA) etc.
-To read more about how autoscaler works check [here](https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md).
-<br>
 | Metric Name | Description | Type | Tags | Unit | Status |
 |:-|:-|:-|:-|:-|:-|
 | desired_pods | Number of pods autoscaler wants to allocate | Gauge | configuration_name<br>namespace_name<br>revision_name<br>service_name | Dimensionless | Stable |
@@ -50,7 +34,7 @@ To read more about how autoscaler works check [here](https://github.com/knative/
 | pending_pods | Number of pods that are pending currently | Gauge | configuration_name<br>namespace_name<br>revision_name<br>service_name | Dimensionless | Stable |
 | terminating_pods | Number of pods that are terminating currently | Gauge | configuration_name<br>namespace_name<br>revision_name<br>service_name<br> | Dimensionless | Stable |

-### Controller
+## Controller

 The following metrics are emitted by any component that implements controller logic.
 The metrics show details about the reconciliation operations and the workqueue behavior on which
@@ -69,21 +53,20 @@ reconciliation requests are enqueued.
 | workqueue_unfinished_work_seconds | How long in seconds the outstanding workqueue items have been in flight (total). | Histogram | name | Seconds | Stable |
 | workqueue_longest_running_processor_seconds | How long in seconds the longest outstanding workqueue item has been in flight | Histogram | name | Seconds | Stable |

-### Webhook
+## Webhook
+
+Webhook metrics report useful info about operations. For example, if a large number of operations fail, this could indicate an issue with a user-created resource.

-Webhook metrics report useful info about operations eg. CREATE on Serving resources and if admission was allowed.
-For example if a big number of operations fail this could be an issue with the submitted user resource.
-<br>
 | Metric Name | Description | Type | Tags | Unit | Status |
 |:-|:-|:-|:-|:-|:-|
 | request_count | The number of requests that are routed to webhook | Counter |  admission_allowed<br>kind_group<br>kind_kind<br>kind_version<br>request_operation<br>resource_group<br>resource_namespace<br>resource_resource<br>resource_version | Dimensionless | Stable |
 | request_latencies | The response time in milliseconds | Histogram |  admission_allowed<br>kind_group<br>kind_kind<br>kind_version<br>request_operation<br>resource_group<br>resource_namespace<br>resource_resource<br>resource_version | Milliseconds | Stable |

-### Go Runtime - memstats
+## Go Runtime - memstats

 Each Knative Serving control plane process emits a number of Go runtime [memory statistics](https://golang.org/pkg/runtime/#MemStats) (shown next).
 As a baseline for monitoring purposes, users could start with a subset of the metrics: current allocations (go_alloc), total allocations (go_total_alloc), system memory (go_sys), mallocs (go_mallocs), frees (go_frees), garbage collection total pause time (total_gc_pause_ns), next GC target heap size (go_next_gc), and number of garbage collection cycles (num_gc).
-<br>
+
 | Metric Name | Description | Type | Tags | Unit | Status |
 |:-|:-|:-|:-|:-|:-|
 | go_alloc | The number of bytes of allocated heap objects (same as heap_alloc) | Gauge | name | Dimensionless | Stable |
@@ -114,7 +97,8 @@ As a baseline for monitoring purposes, user could start with a subset of the me
 | go_num_forced_gc | The number of GC cycles that were forced by the application calling the GC function. | Gauge | name | Dimensionless | Stable |
 | go_gc_cpu_fraction | The fraction of this program's available CPU time used by the GC since the program started | Gauge | name | Dimensionless | Stable |

-**NOTE:** name tag is empty.
+!!! note
+    The name tag is empty.

 ## Developer - User Services

@@ -124,7 +108,7 @@ developers, devops and others, could measure if requests are queued at the proxy

 ### Queue proxy

-Requests endpoint
+Requests endpoint.

 | Metric Name | Description | Type | Tags | Unit | Status |
 |:-|:-|:-|:-|:-|:-|
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -50,7 +50,11 @@ nav:
          - Upgrading with the Knative Operator: admin/upgrade/upgrade-installation-with-operator.md
          - Upgrading with kubectl: admin/upgrade/upgrade-installation.md
        - Logging: admin/collecting-logs/README.md
-        - Metrics: admin/collecting-metrics/README.md
+        # Administrator metrics
+        - Metrics:
+          - About metrics: admin/collecting-metrics/README.md
+          - Knative Eventing metrics: admin/collecting-metrics/eventing-metrics/metrics.md
+          - Knative Serving metrics: admin/collecting-metrics/serving-metrics/metrics.md
        - Uninstalling Knative: admin/install/uninstall.md
        # Serving config
        - Knative Serving configuration:
@@ -99,7 +103,6 @@ nav:
        - Installing cert-manager: serving/installing-cert-manager.md
        - Configuring HTTPS connections: serving/using-a-tls-cert.md
        - Enabling auto-TLS certs: serving/using-auto-tls.md
-        - Metrics API: serving/metrics.md
        - Feature/Extension Flags: serving/feature-flags.md
        - Configuring the ingress gateway: serving/setting-up-custom-ingress-gateway.md
        - Setting up a custom domain: serving/using-a-custom-domain.md
@@ -197,7 +200,6 @@ nav:
        - Apache Kafka Sink: eventing/sink/kafka-sink.md
      - Debugging: eventing/debugging/index.md
      - Accessing CloudEvent traces: eventing/accessing-traces.md
-      - Metrics API: eventing/metrics.md
      - Experimental Features: eventing/experimental-features.md
      - Code samples:
        - Overview: eventing/samples/README.md
@@ -291,6 +293,8 @@ plugins:
 # Redirects
  - redirects:
      redirect_maps:
+        'serving/metrics.md': 'admin/collecting-metrics/serving-metrics/metrics.md'
+        'eventing/metrics.md': 'admin/collecting-metrics/eventing-metrics/metrics.md'
        'help/README.md' : 'docs/help/README.md'
        'eventing/samples/sinkbinding/README.md': 'eventing/sources/sinkbinding/README.md'
        'admin/install/install-serving-with-yaml.md': 'admin/install/serving/install-serving-with-yaml.md'
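
The eventing metrics page moved in this commit states that aggregating metrics over the HTTP code separates events into successful (2xx) and failed (5xx) classes. A minimal sketch of that aggregation follows. The `(response_code, count)` sample shape, the function names, and the sample values are assumptions for illustration only; they are not part of Knative or of any metrics backend API.

```python
from collections import Counter


def response_code_class(code: int) -> str:
    """Map an HTTP status code to a class label such as '2xx' or '5xx'."""
    return f"{code // 100}xx"


def count_by_class(samples):
    """Sum event counts per response code class.

    `samples` is an iterable of (response_code, count) pairs. This input
    shape is hypothetical and stands in for exported `event_count` values.
    """
    totals = Counter()
    for code, count in samples:
        totals[response_code_class(code)] += count
    return totals


# Hypothetical samples, not real measurements.
samples = [(202, 120), (200, 30), (500, 4), (503, 1)]
totals = count_by_class(samples)
print("successful:", totals["2xx"], "failed:", totals["5xx"])
```

In practice the `response_code_class` tag listed in the tables already carries this label, so the grouping would normally happen in the metrics backend rather than in client code.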