add info about custom latency buckets (#4236)

* doc: http metrics path normalization Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * doc: code review & path matching rename Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * doc: add configuration examples Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * update: update docs based on last proposal changes Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * feat: more updates based on the ingress/egress merge Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * doc: code review comments Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * doc: code review comments Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * feat: add excludeVerbs Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * feat: new line Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * feat: add review meeting changes Signed-off-by: nelson.parente <nelson_parente@live.com.pt> * v1.14 - cherry pick path normalization Signed-off-by: Filinto Duran <filinto@diagrid.io> * add additional changes Signed-off-by: Filinto Duran <filinto@diagrid.io> * add additional changes Signed-off-by: Filinto Duran <filinto@diagrid.io> * add additional changes Signed-off-by: Filinto Duran <filinto@diagrid.io> * format table Signed-off-by: Filinto Duran <filinto@diagrid.io> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Mark Fussell <markfussell@gmail.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * explain buckets Signed-off-by: Filinto Duran <filinto@diagrid.io> * Apply suggestions from code review Co-authored-by: Alice Gibbons <alicejgibbons@gmail.com> Signed-off-by: Mark Fussell <markfussell@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update 
daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter 
<94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Mark Fussell <markfussell@gmail.com> Signed-off-by: Filinto Duran <duranto@gmail.com> * Update daprdocs/content/en/operations/observability/metrics/metrics-overview.md Co-authored-by: Mark Fussell <markfussell@gmail.com> Signed-off-by: Filinto Duran <duranto@gmail.com> --------- Signed-off-by: nelson.parente <nelson_parente@live.com.pt> Signed-off-by: Filinto Duran <filinto@diagrid.io> Signed-off-by: Filinto Duran <duranto@gmail.com> Signed-off-by: Mark Fussell <markfussell@gmail.com> Signed-off-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com> Co-authored-by: nelson.parente <nelson_parente@live.com.pt> Co-authored-by: Mark Fussell <markfussell@gmail.com> Co-authored-by: Alice Gibbons <alicejgibbons@gmail.com> Co-authored-by: Hannah Hunter <94493363+hhunter-ms@users.noreply.github.com>
2024-07-10 09:36:58 -05:00 · 2024-07-10 09:36:58 -05:00 · 64a22cbe3c
parent 867b15348d
commit 64a22cbe3c
3 changed files with 64 additions and 8 deletions
--- a/daprdocs/content/en/operations/configuration/configuration-overview.md
+++ b/daprdocs/content/en/operations/configuration/configuration-overview.md
@@ -108,6 +108,7 @@ The `metrics` section under the `Configuration` spec contains the following prop
 metrics:
  enabled: true
  rules: []
+  latencyDistributionBuckets: []
  http:
    increasedCardinality: true
    pathMatching:
@@ -121,17 +122,18 @@ metrics:
    excludeVerbs: false
 ```

-In the examples above, the path filter `/orders/{orderID}/items/{itemID}` would return a single metric count matching all the `orderIDs` and all the `itemIDs`, rather than multiple metrics for each `itemID`. For more information, see [HTTP metrics path matching]({{< ref "metrics-overview.md#http-metrics-path-matching" >}}).
+In the examples above, the path filter `/orders/{orderID}/items/{itemID}` would return a single metric count matching all the `orderIDs` and all the `itemIDs`, rather than multiple metrics for each `itemID`. For more information, see [HTTP metrics path matching]({{< ref "metrics-overview.md#http-metrics-path-matching" >}}).

 The following table lists the properties for metrics:

-| Property     | Type   | Description |
-|--------------|--------|-------------|
-| `enabled` | boolean | When set to true, the default, enables metrics collection and the metrics endpoint. |
-| `rules`   | array | Named rule to filter metrics. Each rule contains a set of `labels` to filter on and a `regex` expression to apply to the metrics path. |
-| `http.increasedCardinality` | boolean | When set to `true` (default), in the Dapr HTTP server, each request path causes the creation of a new "bucket" of metrics. This can cause issues, including excessive memory consumption when there many different requested endpoints (such as when interacting with RESTful APIs).<br> To mitigate high memory usage and egress costs associated with [high cardinality metrics]({{< ref "metrics-overview.md#high-cardinality-metrics" >}}) with the HTTP server, you should set the `metrics.http.increasedCardinality` property to `false`.|
-| `http.pathMatching` | array | Paths used for path matching, allowing users to define matching paths in order to manage cardinality. |
-| `http.excludeVerbs` | boolean | When set to `true` (default is `false`), the Dapr HTTP server ignores each request HTTP verb when building the method metric label. |
+| Property                     | Type    | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+|------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `enabled`                    | boolean | When set to true, the default, enables metrics collection and the metrics endpoint.                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| `rules`                      | array   | Named rule to filter metrics. Each rule contains a set of `labels` to filter on and a `regex` expression to apply to the metrics path.                                                                                                                                                                                                                                                                                                                                                                                                           |
+| `latencyDistributionBuckets` | array   | Array of latency distribution buckets in milliseconds for latency metrics histograms.                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| `http.increasedCardinality`  | boolean | When set to `true` (default), in the Dapr HTTP server, each request path causes the creation of a new "bucket" of metrics. This can cause issues, including excessive memory consumption, when there are many different requested endpoints (such as when interacting with RESTful APIs).<br> To mitigate high memory usage and egress costs associated with [high cardinality metrics]({{< ref "metrics-overview.md#high-cardinality-metrics" >}}) with the HTTP server, you should set the `metrics.http.increasedCardinality` property to `false`. |
+| `http.pathMatching`          | array   | Array of paths for path matching, allowing users to define matching paths to manage cardinality.                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| `http.excludeVerbs`          | boolean | When set to true (default is false), the Dapr HTTP server ignores each request HTTP verb when building the method metric label.                                                                                                                                                                                                                                                                                                                                                                                                                  |

 To further help managing cardinality, path matching allows specified paths matched according to defined patterns, reducing the number of unique metrics paths and thus controlling metric cardinality. This feature is particularly useful for applications with dynamic URLs, ensuring that metrics remain meaningful and manageable without excessive memory consumption. 

--- a/daprdocs/content/en/operations/observability/metrics/metrics-overview.md
+++ b/daprdocs/content/en/operations/observability/metrics/metrics-overview.md
@@ -198,7 +198,58 @@ dapr_http_server_request_count{app_id="order-service",method="",path="/orders",s

 In this example, the HTTP method is excluded from the metrics, resulting in a single metric for all requests to the `/orders` endpoint.

+## Configuring custom latency histogram buckets

+Dapr uses cumulative histogram metrics to group latency values into buckets, where each bucket contains a count of:
+- The requests with that latency
+- All the requests with lower latency
+
+### Using the default latency bucket configurations 
+
+By default, Dapr groups request latency metrics into the following buckets (values in milliseconds):
+
+```
+1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000, 5000, 10000, 20000, 50000, 100000
+```
+
+Grouping latency values in a cumulative fashion allows buckets to be used or dropped as needed for increased or decreased granularity of data.
+For example, if a request takes 3ms, it's counted in the 3ms bucket, the 4ms bucket, the 5ms bucket, and so on.
+Similarly, if a request takes 10ms, it's counted in the 10ms bucket, the 13ms bucket, the 16ms bucket, and so on.
+After these two requests have completed, the 3ms bucket has a count of 1 and the 10ms bucket has a count of 2, since both the 3ms and 10ms requests are included here. 
+
+This shows up as follows:
+
+|1|2|3|4|5|6|8|10|13|16|20|25|30|40|50|65|80|100|130|160| ..... | 100000 |
+|-|-|-|-|-|-|-|--|--|--|--|--|--|--|--|--|--|---|---|---|-------|--------|
+|0|0|1|1|1|1|1| 2| 2| 2| 2| 2| 2| 2| 2| 2| 2| 2 | 2 | 2 | ..... | 2      |
+
+
+The default number of buckets works well for most use cases, but can be adjusted as needed. Every latency metric is recorded across all 34 buckets, so the total number of metric series can grow considerably with a large number of applications.
+More accurate latency percentiles can be achieved by increasing the number of buckets. However, a higher number of buckets increases the amount of memory used to store the metrics, which can negatively impact your monitoring system.
+
+It is recommended to keep the default latency buckets unless you are seeing unwanted memory pressure in your monitoring system. Configuring the number of buckets allows you to choose the applications where:
+- You want to see more detail with a higher number of buckets
+- Broader values are sufficient, so the number of buckets can be reduced
+
+Take note of the default latency values your applications are producing before configuring the number of buckets.
+### Customizing latency buckets to your scenario
+
+Tailor the latency buckets to your needs by modifying the `spec.metrics.latencyDistributionBuckets` field in the [Dapr configuration spec]({{< ref configuration-schema.md >}}) for your application(s).
+
+For example, if you aren't interested in extremely low latency values (1-10ms), you can group them in a single 10ms bucket. Similarly, you can group the high values (1000-5000ms) in a single bucket, while keeping more detail in the middle range of values that you are most interested in.
+
+The following Configuration spec example replaces the default 34 buckets with 11 buckets, giving a higher level of granularity in the middle range of values:
+
+```yaml
+apiVersion: dapr.io/v1alpha1
+kind: Configuration
+metadata:
+  name: custom-metrics
+spec:
+  metrics:
+    enabled: true
+    latencyDistributionBuckets: [10, 25, 40, 50, 70, 100, 150, 200, 500, 1000, 5000]
+```

 ## Transform metrics with regular expressions

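The cumulative bucketing described in the `metrics-overview.md` changes above can be illustrated with a short Python sketch. It is not part of the patch and does not call any Dapr API: `cumulative_counts` is a hypothetical helper, and it assumes a request is counted in every bucket whose upper bound is greater than or equal to its latency, which matches the 3ms and 10ms walkthrough in the added text.

```python
from bisect import bisect_left

# Default bucket upper bounds in milliseconds, copied from the docs change above.
DEFAULT_BUCKETS_MS = [
    1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80,
    100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000,
    2000, 5000, 10000, 20000, 50000, 100000,
]

# Custom bucket upper bounds from the Configuration example above.
CUSTOM_BUCKETS_MS = [10, 25, 40, 50, 70, 100, 150, 200, 500, 1000, 5000]


def cumulative_counts(latencies_ms, buckets_ms):
    """Hypothetical helper: map each bucket bound to the number of
    requests whose latency is less than or equal to that bound."""
    counts = [0] * len(buckets_ms)
    for latency in latencies_ms:
        # Index of the first bucket whose bound is >= latency.
        first = bisect_left(buckets_ms, latency)
        # Cumulative: that bucket and every larger one include the request.
        for i in range(first, len(buckets_ms)):
            counts[i] += 1
    return dict(zip(buckets_ms, counts))


# The two requests from the walkthrough: one takes 3ms, one takes 10ms.
default_view = cumulative_counts([3, 10], DEFAULT_BUCKETS_MS)
custom_view = cumulative_counts([3, 10], CUSTOM_BUCKETS_MS)

print(default_view[2], default_view[3], default_view[8], default_view[10])  # 0 1 1 2
print(custom_view[10])  # 2
```

With the default buckets, the two requests are distinguishable (the 3ms bucket holds 1, the 10ms bucket holds 2). With the 11 custom buckets, both land in the single 10ms bucket, which is the loss of low-end detail the custom configuration trades for fewer series.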
--- a/daprdocs/content/en/reference/resource-specs/configuration-schema.md
+++ b/daprdocs/content/en/reference/resource-specs/configuration-schema.md
@@ -36,6 +36,9 @@ spec:
        labels:
          - name: <LABEL-NAME>
            regex: {}
+    latencyDistributionBuckets:
+      - <BUCKET-VALUE-MS-0>
+      - <BUCKET-VALUE-MS-1>
    http:
      increasedCardinality: <TRUE-OR-FALSE>
      pathMatching: