linkTitle: Generative AI metrics
--->

# Semantic Conventions for Generative AI Metrics

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [Generative AI Client Metrics](#generative-ai-client-metrics)
  - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage)
  - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration)
- [Generative AI Model Server Metrics](#generative-ai-model-server-metrics)
  - [Metric: `gen_ai.server.request.duration`](#metric-gen_aiserverrequestduration)
  - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiservertime_per_output_token)
  - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token)

<!-- tocstop -->

## Generative AI Client Metrics

The conventions described in this section are specific to Generative AI client
applications.

**Disclaimer:** These are initial Generative AI client metric instruments
and attributes, but more may be added in the future.

The following metric instruments describe Generative AI operations. An
operation may be a request to an LLM, a function call, or some other
distinct action within a larger Generative AI workflow.

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`gen_ai.token.type`](/docs/attributes-registry/gen-ai.md) | string | The type of token being counted. | `input`; `output` | `Required` |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |

Instrumentations SHOULD document the list of errors they report.

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

## Generative AI Model Server Metrics

The following metric instruments describe Generative AI model servers'
operational metrics, including both functional and performance metrics.

### Metric: `gen_ai.server.request.duration`

This metric is [recommended][MetricRecommended] to report the model server
latency in terms of time spent per request.

This metric SHOULD be specified with [ExplicitBucketBoundaries] of
[0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92].
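
To make the bucket layout concrete: with N explicit boundaries an OpenTelemetry histogram has N+1 buckets, and boundaries are upper-inclusive, so bucket i covers (boundaries[i-1], boundaries[i]]. A minimal sketch of the bucket lookup (the helper is illustrative, not part of the convention):

```python
import bisect

# Advisory bucket boundaries for `gen_ai.server.request.duration`, in seconds.
BOUNDARIES = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28,
              2.56, 5.12, 10.24, 20.48, 40.96, 81.92]

def bucket_index(duration_s: float) -> int:
    """Index of the histogram bucket a duration falls into.

    OpenTelemetry explicit-bucket histograms treat bounds as upper-inclusive,
    so a value equal to a boundary lands in the lower bucket.
    """
    return bisect.bisect_left(BOUNDARIES, duration_s)
```

For example, a 1.0 s request lands in the (0.64, 1.28] bucket, and anything above 81.92 s falls into the overflow bucket.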

<!-- semconv metric.gen_ai.server.request.duration(metric_table) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `gen_ai.server.request.duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

<!-- semconv metric.gen_ai.server.request.duration(full) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [4] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` |  |

**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge.
For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`.

**[2]:** The `error.type` SHOULD match the error code returned by the Generative AI service,
the canonical name of the exception that occurred, or another low-cardinality error identifier.
Instrumentations SHOULD document the list of errors they report.

**[3]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[4]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. |  |

`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `anthropic` | Anthropic |  |
| `cohere` | Cohere |  |
| `openai` | OpenAI |  |
| `vertex_ai` | Vertex AI |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Metric: `gen_ai.server.time_per_output_token`

This metric is [recommended][MetricRecommended] to report the model server
latency in terms of time per token generated after the first token, for any
model server that supports serving LLMs. It is measured by subtracting the
time taken to generate the first output token from the request duration and
dividing the rest of the duration by the number of output tokens generated
after the first token. This is important in measuring the performance of the
decode phase of LLM inference.
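
The calculation described above can be written out directly. The helper and its parameter names below are illustrative, not part of the convention:

```python
def time_per_output_token(request_duration_s, time_to_first_token_s, output_tokens):
    """Average decode latency per token after the first, per the definition above.

    Requires at least two output tokens; with a single token there is no
    decode phase to measure.
    """
    if output_tokens < 2:
        raise ValueError("need at least two output tokens")
    return (request_duration_s - time_to_first_token_s) / (output_tokens - 1)

# A 6.2 s request whose first token arrived after 0.2 s and that produced 101
# tokens spent 6.0 s decoding the remaining 100 tokens: 0.06 s per output token.
```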

This metric SHOULD be specified with [ExplicitBucketBoundaries] of
[0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5].

<!-- semconv metric.gen_ai.server.time_per_output_token(metric_table) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `gen_ai.server.time_per_output_token` | Histogram | `s` | Time per output token generated after the first token for successful responses |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

<!-- semconv metric.gen_ai.server.time_per_output_token(full) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` |  |

**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge.
For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`.

**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `anthropic` | Anthropic |  |
| `cohere` | Cohere |  |
| `openai` | OpenAI |  |
| `vertex_ai` | Vertex AI |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Metric: `gen_ai.server.time_to_first_token`

This metric is [recommended][MetricRecommended] to report the model server
latency in terms of time spent to generate the first token of the response, for
any model server that supports serving LLMs. It helps measure the time spent in
the queue and the prefill phase. It is especially important for streaming
requests. It is calculated at a request level and is reported as a histogram
using the buckets mentioned below.
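
For a streaming response, the measurement reduces to the gap between the request start and the arrival of the first streamed chunk. A minimal sketch, with illustrative names that are not part of the convention:

```python
def time_to_first_token(request_start_s, chunk_arrival_times_s):
    """Seconds from request start until the first streamed token arrives.

    `chunk_arrival_times_s` holds absolute timestamps of streamed chunks as an
    instrumentation would observe them; only the first one matters here.
    """
    if not chunk_arrival_times_s:
        raise ValueError("no tokens were produced")
    return chunk_arrival_times_s[0] - request_start_s

# A request issued at t=10.0 s whose first chunk arrived at t=10.35 s has a
# time to first token of 0.35 s.
```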

This metric SHOULD be specified with [ExplicitBucketBoundaries] of
[0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0].

<!-- semconv metric.gen_ai.server.time_to_first_token(metric_table) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `gen_ai.server.time_to_first_token` | Histogram | `s` | Time to generate first token for successful responses |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

<!-- semconv metric.gen_ai.server.time_to_first_token(full) -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` |  |
| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` |  |
| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` |  |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. |  |
| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` |  |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` |  |

**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge.
For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`.

**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `anthropic` | Anthropic |  |
| `cohere` | Cohere |  |
| `openai` | OpenAI |  |
| `vertex_ai` | Vertex AI |  |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->