linkTitle: Generative AI metrics
--->

# Semantic Conventions for Generative AI Metrics

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [Generative AI Client Metrics](#generative-ai-client-metrics)
  - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage)
  - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration)
- [Generative AI Model Server Metrics](#generative-ai-model-server-metrics)
  - [Metric: `gen_ai.server.request.duration`](#metric-gen_aiserverrequestduration)
  - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiservertime_per_output_token)
  - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token)

<!-- tocstop -->

## Generative AI Client Metrics

The conventions described in this section are specific to Generative AI client
applications.

**Disclaimer:** These are initial Generative AI client metric instruments
and attributes, but more may be added in the future.

The following metric instruments describe Generative AI operations. An
operation may be a request to an LLM, a function call, or some other
distinct action within a larger Generative AI workflow.

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`gen_ai.token.type`](/docs/attributes-registry/gen-ai.md) | string | The type of token being counted. | `input`; `output` | `Required` |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |

Instrumentations SHOULD document the list of errors they report.

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

## Generative AI Model Server Metrics

The following metric instruments describe Generative AI model servers'
operational metrics, including both functional and performance metrics.

### Metric: `gen_ai.server.request.duration`

This metric is [recommended][MetricRecommended] to report the model server
latency in terms of time spent per request.

This metric SHOULD be specified with [ExplicitBucketBoundaries] of
[0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92].
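
To make the bucket layout concrete: with N explicit boundaries an OpenTelemetry histogram has N+1 buckets, and boundaries are upper-inclusive, so bucket i covers (boundaries[i-1], boundaries[i]]. A minimal sketch of the bucket lookup (the helper is illustrative, not part of the convention):

```python
import bisect

# Advisory bucket boundaries for `gen_ai.server.request.duration`, in seconds.
BOUNDARIES = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28,
              2.56, 5.12, 10.24, 20.48, 40.96, 81.92]

def bucket_index(duration_s: float) -> int:
    """Index of the histogram bucket a duration falls into.

    OpenTelemetry explicit-bucket histograms treat bounds as upper-inclusive,
    so a value equal to a boundary lands in the lower bucket.
    """
    return bisect.bisect_left(BOUNDARIES, duration_s)
```

For example, a 1.0 s request lands in the (0.64, 1.28] bucket, and anything above 81.92 s falls into the overflow bucket.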

<!-- semconv metric.gen_ai.server.request.duration(metric_table) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `gen_ai.server.request.duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

<!-- semconv metric.gen_ai.server.request.duration(full) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [4] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` |  |

**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge.
For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`.

**[2]:** The `error.type` SHOULD match the error code returned by the Generative AI service,
the canonical name of the exception that occurred, or another low-cardinality error identifier.
Instrumentations SHOULD document the list of errors they report.

**[3]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[4]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |  |

`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `anthropic` | Anthropic |  |
| `cohere` | Cohere |  |
| `openai` | OpenAI |  |
| `vertex_ai` | Vertex AI |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Metric: `gen_ai.server.time_per_output_token`

This metric is [recommended][MetricRecommended] to report the model server
latency in terms of time per token generated after the first token, for any
model server that supports serving LLMs. It is measured by subtracting the
time taken to generate the first output token from the request duration and
dividing the rest of the duration by the number of output tokens generated
after the first token. This is important in measuring the performance of the
decode phase of LLM inference.
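
The calculation described above can be written out directly. The helper and its parameter names below are illustrative, not part of the convention:

```python
def time_per_output_token(request_duration_s, time_to_first_token_s, output_tokens):
    """Average decode latency per token after the first, per the definition above.

    Requires at least two output tokens; with a single token there is no
    decode phase to measure.
    """
    if output_tokens < 2:
        raise ValueError("need at least two output tokens")
    return (request_duration_s - time_to_first_token_s) / (output_tokens - 1)

# A 6.2 s request whose first token arrived after 0.2 s and that produced 101
# tokens spent 6.0 s decoding the remaining 100 tokens: 0.06 s per output token.
```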

This metric SHOULD be specified with [ExplicitBucketBoundaries] of
[0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5].

<!-- semconv metric.gen_ai.server.time_per_output_token(metric_table) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `gen_ai.server.time_per_output_token` | Histogram | `s` | Time per output token generated after the first token for successful responses |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

<!-- semconv metric.gen_ai.server.time_per_output_token(full) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` |  |

**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge.
For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`.

**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `anthropic` | Anthropic |  |
| `cohere` | Cohere |  |
| `openai` | OpenAI |  |
| `vertex_ai` | Vertex AI |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Metric: `gen_ai.server.time_to_first_token`

This metric is [recommended][MetricRecommended] to report the model server
latency in terms of time spent to generate the first token of the response, for
any model server that supports serving LLMs. It helps measure the time spent in
the queue and the prefill phase. It is especially important for streaming
requests. It is calculated at a request level and is reported as a histogram
using the buckets mentioned below.
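
For a streaming response, the measurement reduces to the gap between the request start and the arrival of the first streamed chunk. A minimal sketch, with illustrative names that are not part of the convention:

```python
def time_to_first_token(request_start_s, chunk_arrival_times_s):
    """Seconds from request start until the first streamed token arrives.

    `chunk_arrival_times_s` holds absolute timestamps of streamed chunks as an
    instrumentation would observe them; only the first one matters here.
    """
    if not chunk_arrival_times_s:
        raise ValueError("no tokens were produced")
    return chunk_arrival_times_s[0] - request_start_s

# A request issued at t=10.0 s whose first chunk arrived at t=10.35 s has a
# time to first token of 0.35 s.
```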

This metric SHOULD be specified with [ExplicitBucketBoundaries] of
[0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0].

<!-- semconv metric.gen_ai.server.time_to_first_token(metric_table) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `gen_ai.server.time_to_first_token` | Histogram | `s` | Time to generate first token for successful responses |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

<!-- semconv metric.gen_ai.server.time_to_first_token(full) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` |  |

**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge.
For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`.

**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `anthropic` | Anthropic |  |
| `cohere` | Cohere |  |
| `openai` | OpenAI |  |
| `vertex_ai` | Vertex AI |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->