Add common guidance on recording errors on spans and metrics, clarify DB conventions (#1716)

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
Liudmila Molkova 2025-01-16 11:54:10 -08:00 committed by GitHub
parent bcb052e03e
commit 539ce854bf
27 changed files with 238 additions and 268 deletions

.chloggen/1716.yaml Normal file
View File

@@ -0,0 +1,4 @@
change_type: enhancement
component: docs, db
note: Add common guidance for recording errors on spans and metrics, clarify DB conventions.
issues: [1516, 1536, 1716]

View File

@@ -6,30 +6,23 @@
# Exception
- [Exception Attributes](#exception-attributes)
- [Deprecated Exception Attributes](#deprecated-exception-attributes)
## Exception Attributes
This document defines the shared attributes used to report a single exception associated with a span or log.
| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| <a id="exception-escaped" href="#exception-escaped">`exception.escaped`</a> | boolean | SHOULD be set to true if the exception event is recorded at a point where it is known that the exception is escaping the scope of the span. [1] | | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| <a id="exception-message" href="#exception-message">`exception.message`</a> | string | The exception message. | `Division by zero`; `Can't convert 'int' object to str implicitly` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| <a id="exception-stacktrace" href="#exception-stacktrace">`exception.stacktrace`</a> | string | A stacktrace as a string in the natural representation for the language runtime. The representation is to be determined and documented by each language SIG. | `Exception in thread "main" java.lang.RuntimeException: Test exception\n at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n at com.example.GenerateTrace.main(GenerateTrace.java:5)` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| <a id="exception-type" href="#exception-type">`exception.type`</a> | string | The type of the exception (its fully-qualified class name, if applicable). The dynamic type of the exception should be preferred over the static type in languages that support it. | `java.net.ConnectException`; `OSError` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
**[1] `exception.escaped`:** An exception is considered to have escaped (or left) the scope of a span,
if that span is ended while the exception is still logically "in flight".
This may be actually "in flight" in some languages (e.g. if the exception
is passed to a Context manager's `__exit__` method in Python) but will
usually be caught at the point of recording the exception in most languages.
## Deprecated Exception Attributes
It is usually not possible to determine at the point where an exception is thrown
whether it will escape the scope of a span.
However, it is trivial to know that an exception
will escape, if one checks for an active exception just before ending the span,
as done in the [example for recording span exceptions](https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#recording-an-exception).
Deprecated exception attributes.
It follows that an exception may still escape the scope of the span
even if the `exception.escaped` attribute was not set or set to false,
since the event might have been recorded at a time where it was not
clear whether the exception will escape.
| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| <a id="exception-escaped" href="#exception-escaped">`exception.escaped`</a> | boolean | Indicates that the exception is escaping the scope of the span. | | ![Deprecated](https://img.shields.io/badge/-deprecated-red)<br>It's no longer recommended to record exceptions that are handled and do not escape the scope of a span. |

View File

@@ -13,7 +13,8 @@ Span kind SHOULD be `INTERNAL` when the traced program is the callee or `CLIENT`
The span name SHOULD be set to `{process.executable.name}`.
Instrumentations that have additional context about executed commands MAY use a different low-cardinality span name format and SHOULD document it.
Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0.
Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0. Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
additional details on how to record span status.
<!-- TODO: context propagation https://github.com/open-telemetry/semantic-conventions/issues/1612 -->

View File

@@ -69,8 +69,7 @@ system specific term if more applicable.
**[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.
**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[6] `db.response.status_code`:** All Cassandra protocol error codes SHOULD be considered errors.
**[7] `db.response.status_code`:** If the operation failed and status code is available.

View File

@@ -193,8 +193,7 @@ additional values when introducing new operations.
**[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.
**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[6] `db.response.status_code`:** Response codes in the 4xx and 5xx range SHOULD be considered errors.
**[7] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.
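The preference for the original exception over a generic wrapper can be sketched as follows. This is only an illustration: the helper name `error_type_for` is hypothetical, and treating the root of the explicit cause chain as the "most relevant" type is an assumption that an instrumentation may reasonably refine.

```python
def error_type_for(exc: BaseException) -> str:
    """Return a canonical type name, preferring the original wrapped exception."""
    seen = set()
    # Follow explicit chaining (`raise ... from ...`) down to the original exception.
    # Assumption: the root cause is the most relevant type to report.
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against a cyclic cause chain
        exc = exc.__cause__
    cls = type(exc)
    if cls.__module__ == "builtins":
        return cls.__qualname__
    # Fully qualified name for exceptions defined outside the builtins module.
    return f"{cls.__module__}.{cls.__qualname__}"
```

The returned string is what a caller would place in `error.type` when no `db.response.status_code` is available.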

View File

@@ -23,7 +23,7 @@ The Semantic Conventions for [CouchDB](https://couchdb.apache.org/) extend and o
|---|---|---|---|---|---|
| [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the database, fully qualified within the server address and port. | `customers`; `test.users` | `Conditionally Required` If available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The HTTP method + the target REST route. [1] | `GET /{db}/{docid}` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB recorded as a string. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [4] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [5] | `80`; `8080`; `443` | `Conditionally Required` [6] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [7] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
@@ -31,8 +31,7 @@ The Semantic Conventions for [CouchDB](https://couchdb.apache.org/) extend and o
**[1] `db.operation.name`:** In **CouchDB**, `db.operation.name` should be set to the HTTP method + the target REST route according to the API reference documentation. For example, when retrieving a document, `db.operation.name` would be set to (literally, i.e., without replacing the placeholders with concrete values): [`GET /{db}/{docid}`](https://docs.couchdb.org/en/stable/api/document/common.html#get--db-docid).
**[2] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[2] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.
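As a minimal sketch of this rule, an instrumentation could classify the string-typed `db.response.status_code` as shown here. The helper name `is_http_error_status` is hypothetical, and treating a non-numeric value as "not an error" is an assumption rather than something the conventions state.

```python
def is_http_error_status(status_code: str) -> bool:
    """Return True when an HTTP status code recorded as a string is in the 4xx or 5xx range."""
    try:
        code = int(status_code)
    except ValueError:
        # Assumption: a value that is not numeric cannot be classified,
        # so it is not treated as an error here.
        return False
    return 400 <= code <= 599
```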
**[3] `db.response.status_code`:** If response was received and the HTTP response code is available.

View File

@@ -12,7 +12,6 @@ linkTitle: Client Calls
- [Name](#name)
- [Status](#status)
- [Recording exception events](#recording-exception-events)
- [Common attributes](#common-attributes)
- [Notes and well-known identifiers for `db.system`](#notes-and-well-known-identifiers-for-dbsystem)
- [Sanitization of `db.query.text`](#sanitization-of-dbquerytext)
@@ -89,59 +88,11 @@ For example, for an operation describing SQL query on an anonymous table like `S
## Status
[Span Status Code][SpanStatus] MUST be left unset if the operation has ended without any errors.
Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
details on how to record span status.
Instrumentation SHOULD consider the operation as failed if any of the following is true:
- the `db.response.status_code` value indicates an error
> [!NOTE]
>
> The classification of status code as an error depends on the context.
> For example, a SQL STATE `02000` (`no_data`) indicates an error when the application
> expected the data to be available. However, it is not an error when the
> application is simply checking whether the data exists.
>
> Instrumentations that have additional context about a specific operation MAY use
> this context to set the span status more precisely.
> Instrumentations that don't have any additional context MUST follow the
> guidelines in this section.
- an exception is thrown by the instrumented method call
- the instrumented method returns an error in another way
When the operation ends with an error, instrumentation:
- SHOULD set the span status code to `Error`
- SHOULD set the `error.type` attribute
- SHOULD set the span status description when it has additional information
about the error which is not expected to contain sensitive details and aligns
with [Span Status Description][SpanStatus] definition.
It's NOT RECOMMENDED to duplicate `db.response.status_code` or `error.type`
in span status description.
When the operation fails with an exception, the span status description SHOULD be set to
the exception message.
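The status rules listed above can be sketched roughly as follows. `SpanRecord` and `record_db_outcome` are hypothetical stand-ins used only for illustration and are not OpenTelemetry API types; the caller is assumed to have already decided whether the status code indicates an error.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a span; not an OpenTelemetry API type."""
    attributes: dict = field(default_factory=dict)
    status_code: str = "Unset"
    status_description: Optional[str] = None


def record_db_outcome(span, status_code=None, status_is_error=False, exception=None):
    """Apply the status rules to `span` and return it."""
    if status_code is not None:
        span.attributes["db.response.status_code"] = status_code
    failed = (status_code is not None and status_is_error) or exception is not None
    if not failed:
        return span  # status is left unset when the operation ended without errors
    span.status_code = "Error"
    if exception is not None:
        # Simplification: the unqualified class name is used as `error.type`.
        span.attributes["error.type"] = type(exception).__qualname__
        span.status_description = str(exception)
    else:
        # The status code is not duplicated in the status description.
        span.attributes["error.type"] = status_code
    return span
```

For example, `record_db_outcome(SpanRecord(), exception=TimeoutError("timed out"))` yields a span with status `Error`, `error.type` set to `TimeoutError`, and the exception message as the status description.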
### Recording exception events
**Status**: [Experimental][DocumentStatus]
When the operation fails with an exception, instrumentation SHOULD record
an [exception event](../exceptions/exceptions-spans.md) by default if, and only if,
the span being recorded is a local root span (does not have a local parent).
> [!NOTE]
>
> Exception stack traces could be very long and are expensive to capture and store.
> Exceptions which are not handled by instrumented libraries are likely to be handled
> and logged by the caller.
> Exceptions that are not handled will be recorded by the outermost (local root)
> instrumentation such as HTTP or gRPC server.
Instrumentation MAY provide a configuration option to record exceptions that
escape the surface of the instrumented API.
Semantic conventions for individual systems SHOULD specify which values of `db.response.status_code`
classify as errors.
## Common attributes
@@ -466,4 +417,3 @@ More specific Semantic Conventions are defined for the following database techno
* [SQL](sql.md): Semantic Conventions for *SQL* databases.
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#set-status

View File

@@ -82,8 +82,7 @@ When a query string value is redacted, the query string key SHOULD still be pres
**[4] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.<key>`, where `<key>` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names.
**[5] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[5] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.
**[6] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.

View File

@@ -24,7 +24,7 @@ The Semantic Conventions for [HBase](https://hbase.apache.org/) extend and overr
| [`db.collection.name`](/docs/attributes-registry/db.md) | string | The HBase table name. [1] | `mytable`; `ns:table` | `Conditionally Required` If applicable. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.namespace`](/docs/attributes-registry/db.md) | string | The HBase namespace. [2] | `mynamespace` | `Conditionally Required` If applicable. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [3] | `findAndModify`; `HMSET`; `SELECT` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Protocol-specific response code recorded as string. [4] | `200`; `409`; `14` | `Conditionally Required` If response was received. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Protocol-specific response code recorded as a string. [4] | `200`; `409`; `14` | `Conditionally Required` If response was received. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [5] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [6] | `80`; `8080`; `443` | `Conditionally Required` [7] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [8] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |

View File

@@ -42,41 +42,9 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided
It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database
return code which is adopted by some database systems like PostgreSQL.
See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html)
for the details.
Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific
error codes. Database SQL drivers usually provide access to both properties.
For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html)
class reports them with `getSQLState()` and `getErrorCode()` methods.
Instrumentations SHOULD populate the `db.response.status_code` with the
most specific code available to them.
Here's a non-exhaustive list of databases that report vendor-specific
codes with granularity higher than SQLSTATE (or don't report SQLSTATE
at all):
- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql).
- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/)
- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors)
- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html)
- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm)
- [SQLite result codes](https://www.sqlite.org/rescode.html)
These systems SHOULD set the `db.response.status_code` to a
known vendor-specific error code. If only SQLSTATE is available,
it SHOULD be used.
When multiple error codes are available and specificity is unclear,
instrumentation SHOULD set the `db.response.status_code` to the
concatenated string of all codes with '/' used as a separator.
For example, generic DB instrumentation that detected an error and has
SQLSTATE `"42000"` and vendor-specific `1071` should set
`db.response.status_code` to `"42000/1071"`.
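The concatenation rule for the case where specificity is unclear might look like the sketch below. The function name `combined_status_code` and its parameters are hypothetical; an instrumentation that knows which code is more specific would report only that code instead.

```python
from typing import Optional


def combined_status_code(sqlstate: Optional[str], vendor_code: Optional[str]) -> Optional[str]:
    """Join the available codes with '/' when it is unclear which one is more specific."""
    codes = [code for code in (sqlstate, vendor_code) if code]
    # With a single available code the result is that code unchanged.
    return "/".join(codes) if codes else None
```

With SQLSTATE `"42000"` and vendor-specific code `"1071"`, this returns `"42000/1071"`, matching the example in the text.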
**[2] `db.response.status_code`:** MariaDB uses vendor-specific error codes on all errors and reports [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) in some cases.
MariaDB error codes are more granular than SQLSTATE, so MariaDB instrumentations SHOULD set the `db.response.status_code` to this known error code.
When SQLSTATE is available, SQLSTATE of "Class 02" or higher SHOULD be considered errors. When SQLSTATE is not available, all MariaDB error codes SHOULD be considered errors.
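A minimal sketch of this classification, assuming that "Class 02 or higher" can be read as a lexicographic comparison of the two-character SQLSTATE class (digits sort before letters in ASCII). The helper name `is_mariadb_error` is hypothetical.

```python
from typing import Optional


def is_mariadb_error(error_code: Optional[str], sqlstate: Optional[str] = None) -> bool:
    """Classify a MariaDB response as an error, preferring SQLSTATE when present."""
    if sqlstate:
        # The SQLSTATE class is the first two characters of the five-character code.
        # Assumption: "or higher" means lexicographic order on that class.
        return sqlstate[:2].upper() >= "02"
    # Without SQLSTATE, any reported MariaDB error code counts as an error.
    return error_code is not None
```

Classes `00` (success) and `01` (warning) fall below the threshold, so they are not treated as errors.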
**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.

View File

@@ -40,8 +40,7 @@ then that collection name SHOULD be used.
**[2] `db.operation.name`:** See [MongoDB database commands](https://www.mongodb.com/docs/manual/reference/command/).
**[3] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[3] `db.response.status_code`:** All MongoDB error codes SHOULD be considered errors.
**[4] `db.response.status_code`:** If the operation failed and error code is available.

View File

@@ -47,6 +47,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided
It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
**[2] `db.response.status_code`:** Microsoft SQL Server does not report SQLSTATE.
Instrumentations SHOULD use [error severity](https://learn.microsoft.com/sql/relational-databases/errors-events/database-engine-error-severities) returned along with the status code to determine the status of the span. Response codes with severity 11 or higher SHOULD be considered errors.
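The severity threshold can be sketched like this. The function name `classify_sqlserver_response` and the shape of its return value are hypothetical; how the error number and severity are obtained depends on the client library and is not shown.

```python
from typing import Tuple

# Severity at or above this value is treated as an error, per the guidance above.
ERROR_SEVERITY_THRESHOLD = 11


def classify_sqlserver_response(error_number: int, severity: int) -> Tuple[str, bool]:
    """Return the status code as a string and whether it should be considered an error."""
    return str(error_number), severity >= ERROR_SEVERITY_THRESHOLD
```

The first element is the value for `db.response.status_code`; the second tells the caller whether to set the span status to `Error`.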
**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.

View File

@@ -22,7 +22,7 @@ The Semantic Conventions for *MySQL* extend and override the [Database Semantic
| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection. [1] | `products`; `customers` | `Conditionally Required` If available without an additional network call. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html). [2] | `1005`; `MY-010016` | `Conditionally Required` If response has ended with warning or an error. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) recorded as a string. [2] | `1005`; `MY-010016` | `Conditionally Required` If response has ended with warning or an error. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [3] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [4] | `80`; `8080`; `443` | `Conditionally Required` [5] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [6] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
@@ -42,41 +42,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided
It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database
return code which is adopted by some database systems like PostgreSQL.
See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html)
for the details.
Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific
error codes. Database SQL drivers usually provide access to both properties.
For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html)
class reports them with `getSQLState()` and `getErrorCode()` methods.
Instrumentations SHOULD populate the `db.response.status_code` with the
most specific code available to them.
Here's a non-exhaustive list of databases that report vendor-specific
codes with granularity higher than SQLSTATE (or don't report SQLSTATE
at all):
- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql).
- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/)
- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors)
- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html)
- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm)
- [SQLite result codes](https://www.sqlite.org/rescode.html)
These systems SHOULD set the `db.response.status_code` to a
known vendor-specific error code. If only SQLSTATE is available,
it SHOULD be used.
When multiple error codes are available and specificity is unclear,
instrumentation SHOULD set the `db.response.status_code` to the
concatenated string of all codes with '/' used as a separator.
For example, generic DB instrumentation that detected an error and has
SQLSTATE `"42000"` and vendor-specific `1071` should set
`db.response.status_code` to `"42000/1071"`.
**[2] `db.response.status_code`:** MySQL error codes are vendor specific error codes and don't follow [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) conventions. All MySQL error codes SHOULD be considered errors.
**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.

View File

@@ -49,41 +49,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the user provided whe
It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database
return code which is adopted by some database systems like PostgreSQL.
See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html)
for the details.
Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific
error codes. Database SQL drivers usually provide access to both properties.
For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html)
class reports them with `getSQLState()` and `getErrorCode()` methods.
Instrumentations SHOULD populate the `db.response.status_code` with the
most specific code available to them.
Here's a non-exhaustive list of databases that report vendor-specific
codes with granularity higher than SQLSTATE (or don't report SQLSTATE
at all):
- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql).
- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/)
- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors)
- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html)
- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm)
- [SQLite result codes](https://www.sqlite.org/rescode.html)
These systems SHOULD set the `db.response.status_code` to a
known vendor-specific error code. If only SQLSTATE is available,
it SHOULD be used.
When multiple error codes are available and specificity is unclear,
instrumentation SHOULD set the `db.response.status_code` to the
concatenated string of all codes with '/' used as a separator.
For example, generic DB instrumentation that detected an error and has
SQLSTATE `"42000"` and vendor-specific `1071` should set
`db.response.status_code` to `"42000/1071"`.
**[2] `db.response.status_code`:** PostgreSQL follows SQL standard conventions for [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE). Response codes of "Class 02" or higher SHOULD be considered errors.
**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.

View File

@@ -60,8 +60,7 @@ system specific term if more applicable.
**[3] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.
**[4] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[4] `db.response.status_code`:** All Redis error prefixes SHOULD be considered errors.
**[5] `db.response.status_code`:** If the operation failed and status code is available.

View File

@@ -46,7 +46,7 @@ Instrumentations applied to generic SQL drivers SHOULD adhere to SQL semantic co
| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection, fully qualified within the server address and port. [1] | `customers`; `test.users` | `Conditionally Required` If available without an additional network call. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Database response code recorded as string. [2] | `ORA-17027`; `1052`; `2201B` | `Conditionally Required` If response has ended with warning or an error. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Database response code recorded as a string. [2] | `ORA-17027`; `1052`; `2201B` | `Conditionally Required` If response has ended with warning or an error. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [3] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [4] | `80`; `8080`; `443` | `Conditionally Required` [5] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [6] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |

View File

@ -9,8 +9,6 @@ path_base_for_github_subdir:
**Status**: [Stable][DocumentStatus]
This document defines semantic conventions for Exceptions.
Semantic conventions for Exceptions are defined for the following signals:
* [Exceptions on spans](exceptions-spans.md): Semantic Conventions for Exceptions associated with *spans*.

View File

@ -11,33 +11,6 @@ exceptions associated with spans.
<!-- toc -->
- [Recording an Exception](#recording-an-exception)
- [Exception event](#exception-event)
- [Stacktrace Representation](#stacktrace-representation)
<!-- tocstop -->
## Recording an Exception
An exception SHOULD be recorded as an `Event` on the span during which it occurred.
The name of the event MUST be `"exception"`.
A typical template for an auto-instrumentation implementing this semantic convention
using an [API-provided `recordException` method](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#record-exception)
could look like this (pseudo-Java):
```java
Span span = myTracer.startSpan(/*...*/);
try {
  // Code that does the actual work which the Span represents
} catch (Throwable e) {
  span.recordException(e, Attributes.of("exception.escaped", true));
  throw e;
} finally {
  span.end();
}
```
## Exception event
<!-- semconv event.exception -->
@ -57,30 +30,13 @@ This event describes a single exception.
|---|---|---|---|---|---|
| [`exception.message`](/docs/attributes-registry/exception.md) | string | The exception message. | `Division by zero`; `Can't convert 'int' object to str implicitly` | `Conditionally Required` [1] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`exception.type`](/docs/attributes-registry/exception.md) | string | The type of the exception (its fully-qualified class name, if applicable). The dynamic type of the exception should be preferred over the static type in languages that support it. | `java.net.ConnectException`; `OSError` | `Conditionally Required` [2] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`exception.escaped`](/docs/attributes-registry/exception.md) | boolean | SHOULD be set to true if the exception event is recorded at a point where it is known that the exception is escaping the scope of the span. [3] | | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`exception.escaped`](/docs/attributes-registry/exception.md) | boolean | Indicates that the exception is escaping the scope of the span. | | `Recommended` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)<br>It's no longer recommended to record exceptions that are handled and do not escape the scope of a span. |
| [`exception.stacktrace`](/docs/attributes-registry/exception.md) | string | A stacktrace as a string in the natural representation for the language runtime. The representation is to be determined and documented by each language SIG. | `Exception in thread "main" java.lang.RuntimeException: Test exception\n at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n at com.example.GenerateTrace.main(GenerateTrace.java:5)` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
**[1] `exception.message`:** Required if `exception.type` is not set, recommended otherwise.
**[2] `exception.type`:** Required if `exception.message` is not set, recommended otherwise.
**[3] `exception.escaped`:** An exception is considered to have escaped (or left) the scope of a span,
if that span is ended while the exception is still logically "in flight".
This may be actually "in flight" in some languages (e.g. if the exception
is passed to a Context manager's `__exit__` method in Python) but will
usually be caught at the point of recording the exception in most languages.
It is usually not possible to determine at the point where an exception is thrown
whether it will escape the scope of a span.
However, it is trivial to know that an exception
will escape, if one checks for an active exception just before ending the span,
as done in the [example for recording span exceptions](https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#recording-an-exception).
It follows that an exception may still escape the scope of the span
even if the `exception.escaped` attribute was not set or set to false,
since the event might have been recorded at a time where it was not
clear whether the exception will escape.
<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->

View File

@ -36,6 +36,8 @@ See also the [additional instructions for instrumenting AWS Lambda](aws-lambda.m
Span `name` should be set to the function name being executed. Depending on the value of the `faas.trigger` attribute, additional attributes MUST be set. For example, an `http` trigger SHOULD follow the [HTTP Server semantic conventions](/docs/http/http-spans.md#http-server-semantic-conventions). For more information, refer to the [Function Trigger Type](#function-trigger-type) section.
Refer to the [Recording Errors](/docs/general/recording-errors.md) document for details on how to record span status.
If Spans following this convention are produced, a Resource of type `faas` MUST exist following the [Resource semantic convention](../resource/faas.md).
<!-- semconv span.faas -->

View File

@ -11,6 +11,7 @@ linkTitle: Generative AI traces
<!-- toc -->
- [Name](#name)
- [Status](#status)
- [GenAI attributes](#genai-attributes)
- [Capturing inputs and outputs](#capturing-inputs-and-outputs)
@ -30,6 +31,11 @@ GenAI spans MUST follow the overall [guidelines for span names](https://github.c
The **span name** SHOULD be `{gen_ai.operation.name} {gen_ai.request.model}`.
Semantic conventions for individual GenAI systems and frameworks MAY specify a different span name format.
### Status
Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
details on how to record span status.
## GenAI attributes
These attributes track input data and metadata for a request to a GenAI model. Each attribute represents a concept that is common to most Generative AI clients.

View File

@ -0,0 +1,129 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Recording errors
--->
# Recording errors
**Status**: [Development][DocumentStatus].
<!-- toc -->
- [What constitutes an error](#what-constitutes-an-error)
- [Recording errors on spans](#recording-errors-on-spans)
- [Recording errors on metrics](#recording-errors-on-metrics)
- [Recording exceptions](#recording-exceptions)
<!-- tocstop -->
This document provides recommendations to semantic convention and instrumentation authors
on how to record errors on spans and metrics.
Individual semantic conventions are encouraged to provide additional guidance.
## What constitutes an error
An operation SHOULD be considered failed if any of the following is true:
- an exception is thrown by the instrumented method (API, block of code, or another instrumented unit)
- the instrumented method returns an error in another way, for example, via an error code
Semantic conventions that define domain-specific status codes SHOULD specify
which status codes should be reported as errors by a general-purpose instrumentation.
> [!NOTE]
>
> The classification of a status code as an error depends on the context.
> For example, an HTTP 404 "Not Found" status code indicates an error if the application
> expected the resource to be available. However, it is not an error when the
> application is simply checking whether the resource exists.
>
> Instrumentations that have additional context about a specific request MAY use
> this context to set the span status more precisely.
Errors that were retried or handled (allowing an operation to complete gracefully) SHOULD NOT
be recorded on spans or metrics that describe this operation.
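As an illustration of what such a domain-specific rule might look like in code, the sketch below classifies two kinds of status codes the way the database conventions describe them: SQLSTATE values of "Class 02" or higher, and HTTP response codes in the 4xx and 5xx range. The `StatusCodeClassifier` class and its method names are hypothetical and are not part of any OpenTelemetry API. Treating a malformed SQLSTATE as "not an error" is also an assumption made only for this sketch.

```java
/** Hypothetical helper (not an OpenTelemetry API) that classifies status codes as errors. */
final class StatusCodeClassifier {
    private StatusCodeClassifier() {}

    /**
     * SQLSTATE values are five characters long and the first two form the class.
     * Class "00" is success and "01" is a warning; "02" and above are treated as errors.
     */
    static boolean isSqlStateError(String sqlState) {
        if (sqlState == null || sqlState.length() != 5) {
            // Assumption for this sketch: an unrecognized value is left unclassified.
            return false;
        }
        String sqlStateClass = sqlState.substring(0, 2);
        // Plain string comparison works because classes use digits and upper-case letters.
        return sqlStateClass.compareTo("02") >= 0;
    }

    /** HTTP-based protocols: response codes in the 4xx and 5xx range are treated as errors. */
    static boolean isHttpStatusError(int statusCode) {
        return statusCode >= 400 && statusCode <= 599;
    }
}
```

A general-purpose instrumentation could consult such a helper when deciding whether to set the span status, while instrumentations with more context about a request remain free to classify more precisely.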
## Recording errors on spans
[Span Status Code][SpanStatus] MUST be left unset if the instrumented operation has
ended without any errors.
When the operation ends with an error, instrumentation:
- SHOULD set the span status code to `Error`
- SHOULD set the [`error.type`](/docs/attributes-registry/error.md#error-type) attribute
- SHOULD set the span status description when it has additional information
about the error that is not expected to contain sensitive details and that aligns
with the [Span Status Description][SpanStatus] definition.
It's NOT RECOMMENDED to duplicate the status code or `error.type` in the span status description.
When the operation fails with an exception, the span status description SHOULD be set to
the exception message.
Refer to the [Recording exceptions](#recording-exceptions) section for guidance on capturing
exception details.
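The rules in this section can be sketched without any SDK dependency. The example below uses a stand-in `SketchSpan` class, which is hypothetical and only models the three pieces of state discussed here; it is not the OpenTelemetry `Span` interface. `SpanErrorRecorder` is likewise an illustrative name. Using the class name as `error.type` for exceptions is one reasonable choice, not the only one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for a span (hypothetical, NOT the OpenTelemetry API). */
final class SketchSpan {
    String statusCode = "Unset";       // stays unset when the operation succeeds
    String statusDescription = "";
    final Map<String, String> attributes = new LinkedHashMap<>();
}

/** Hypothetical helper that applies the span rules for failed operations. */
final class SpanErrorRecorder {
    private SpanErrorRecorder() {}

    /** The operation failed with an exception. */
    static void recordFailure(SketchSpan span, Throwable error) {
        span.statusCode = "Error";
        span.attributes.put("error.type", error.getClass().getName());
        // The exception message becomes the description; error.type is not repeated in it.
        if (error.getMessage() != null) {
            span.statusDescription = error.getMessage();
        }
    }

    /** The operation reported an error code instead of throwing. */
    static void recordFailure(SketchSpan span, String errorCode) {
        span.statusCode = "Error";
        span.attributes.put("error.type", errorCode);
        // Nothing to add beyond the code itself, so the description stays empty.
    }
}
```

A span for a successful operation is never passed to `recordFailure`, so its status code remains `Unset` and it carries no `error.type` attribute.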
## Recording errors on metrics
Semantic conventions for operations usually define an operation duration histogram
metric. This metric SHOULD include the `error.type` attribute. This enables users to derive
throughput and error rates.
Operations that complete successfully SHOULD NOT include the `error.type` attribute,
allowing users to filter out errors.
Semantic conventions SHOULD include `error.type` on other metrics when it's applicable.
For example, the `messaging.client.sent.messages` metric measures message throughput (one
messaging operation may involve sending multiple messages) and includes `error.type`.
It's RECOMMENDED to report one metric that includes successes and failures as opposed
to reporting two (or more) metrics depending on the operation status.
Instrumentation SHOULD ensure `error.type` is applied consistently across spans
and metrics when both are reported. A span and its corresponding metric for a single
operation SHOULD have the same `error.type` value if the operation failed and SHOULD NOT
include it if the operation succeeded.
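One way to keep `error.type` consistent is to compute it once per operation and reuse the value for both the span and the duration metric. The sketch below builds the attribute set for a single duration measurement; `DurationMetricAttributes` and the `acme.operation.name` key are hypothetical names chosen for illustration, and a plain `Map` stands in for a real attributes type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds attributes for one operation duration measurement. */
final class DurationMetricAttributes {
    private DurationMetricAttributes() {}

    /**
     * @param operationName illustrative operation identifier
     * @param errorType     the same value that was set on the span, or null on success
     */
    static Map<String, String> forOperation(String operationName, String errorType) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("acme.operation.name", operationName); // hypothetical attribute key
        if (errorType != null) {
            // Present only on failures, so successes and errors share a single metric.
            attributes.put("error.type", errorType);
        }
        return attributes;
    }
}
```

Because successes and failures are recorded on the same histogram, the error rate can be derived by comparing measurements that carry `error.type` against the total.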
## Recording exceptions
When an instrumented operation fails with an exception, instrumentation SHOULD record
this exception as a [span event](/docs/exceptions/exceptions-spans.md) or a [log record](/docs/exceptions/exceptions-logs.md).
It's RECOMMENDED to use the `Span.recordException` API or a logging library API that takes an exception instance
instead of providing individual attributes. This enables the OpenTelemetry SDK to
control what information is recorded based on application configuration.
It's NOT RECOMMENDED to record the same exception more than once.
It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library.
For example, in the following code snippet, `ResourceAlreadyExistsException` is handled and the corresponding
native instrumentation should not record it. Exceptions that are propagated
to the caller should be recorded (or logged) once.
```java
public boolean createIfNotExists(String resourceId) throws IOException {
  Span span = startSpan();
  try {
    create(resourceId);
    return true;
  } catch (ResourceAlreadyExistsException e) {
    // not recording exception and not setting span status to error - exception is handled
    // but we can set attributes that capture additional details
    span.setAttribute(AttributeKey.stringKey("acme.resource.create.status"), "already_exists");
    return false;
  } catch (IOException e) {
    // recording exception here (assuming it was not recorded inside `create` method)
    span.recordException(e);
    // or
    // logger.warn(e);
    span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName());
    span.setStatus(StatusCode.ERROR, e.getMessage());
    throw e;
  } finally {
    // end the span on every path so it is always exported
    span.end();
  }
}
```
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#set-status

View File

@ -91,7 +91,7 @@ Instrumentation MUST NOT default to using URI path as a `{target}`.
the response body; or 3xx codes with max redirects exceeded), in which case status
MUST be set to `Error`.
> **Note:**
> [!NOTE]
>
> The classification of an HTTP status code as an error depends on the context.
> For example, a 404 "Not Found" status code indicates an error if the application
@ -117,6 +117,9 @@ the client or server from sending/receiving the request/response fully.
When instrumentation detects such errors it SHOULD set span status to `Error`
and SHOULD set the `error.type` attribute.
**Status**: [Development][DocumentStatus] - Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
general considerations on how to record span status.
## HTTP client
This span type represents an outbound HTTP request. There are two ways this can be achieved in an instrumentation:

View File

@ -22,6 +22,7 @@
- [Span name](#span-name)
- [Operation types](#operation-types)
- [Span kind](#span-kind)
- [Span status](#span-status)
- [Trace structure](#trace-structure)
- [Producer spans](#producer-spans)
- [Consumer spans](#consumer-spans)
@ -247,6 +248,11 @@ Span kind SHOULD be set according to the following table, based on the operation
Setting span kinds according to this table allows analysis tools to interpret spans
and relationships between them without the need for additional semantic hints.
### Span status
Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
details on how to record span status.
### Trace structure
#### Producer spans

View File

@ -15,6 +15,7 @@ This document defines how to describe remote procedure calls
- [Common remote procedure call conventions](#common-remote-procedure-call-conventions)
- [Span name](#span-name)
- [Span status](#span-status)
- [Service name](#service-name)
- [Client attributes](#client-attributes)
- [Server attributes](#server-attributes)
@ -79,6 +80,11 @@ Examples of span names:
`MyServiceReference.ICalculator/Add` reported by the client for .NET WCF calls
- `MyServiceWithNoPackage/theMethod`
### Span status
Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
details on how to record span status.
### Service name
On the server process receiving and handling the remote procedure call, the service name provided in `rpc.service` does not necessarily have to match the [`service.name`][] resource attribute.

View File

@ -148,6 +148,9 @@ groups:
represented as a string.
note: >
Microsoft SQL Server does not report SQLSTATE.
Instrumentations SHOULD use [error severity](https://learn.microsoft.com/sql/relational-databases/errors-events/database-engine-error-severities)
returned along with the status code to determine the status of the span. Response codes with severity 11 or higher SHOULD be considered errors.
examples: ["102", "40020"]
- id: span.db.postgresql.client
@ -183,8 +186,10 @@ groups:
- ref: db.response.status_code
brief: >
[PostgreSQL error code](https://www.postgresql.org/docs/current/errcodes-appendix.html).
note: >
PostgreSQL follows SQL standard conventions for [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE).
Response codes of "Class 02" or higher SHOULD be considered errors.
examples: ["08000", "08P01"]
- id: span.db.mysql.client
type: span
stability: experimental
@ -209,7 +214,10 @@ groups:
examples: ["products", "customers"]
- ref: db.response.status_code
brief: >
[MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html).
[MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) recorded as a string.
note: >
MySQL error codes are vendor-specific and don't follow [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE)
conventions. All MySQL error codes SHOULD be considered errors.
examples: ["1005", "MY-010016"]
- id: span.db.mariadb.client
@ -238,8 +246,18 @@ groups:
brief: >
[MariaDB error code](https://mariadb.com/kb/en/mariadb-error-code-reference/)
represented as a string.
examples: ["1008", "3058"]
note: >
MariaDB uses vendor-specific error codes on all errors and reports
[SQLSTATE](https://mariadb.com/kb/en/sqlstate/) in some cases.
MariaDB error codes are more granular than SQLSTATE, so MariaDB instrumentations
SHOULD set the `db.response.status_code` to this known error code.
When SQLSTATE is available, SQLSTATE of "Class 02" or higher SHOULD be
considered errors. When SQLSTATE is not available, all MariaDB error
codes SHOULD be considered errors.
examples: ["1008", "3058"]
- id: span.db.cassandra.client
type: span
span_kind: client
@ -274,6 +292,8 @@ groups:
- ref: db.response.status_code
brief: >
[Cassandra protocol error code](https://github.com/apache/cassandra/blob/cassandra-5.0/doc/native_protocol_v5.spec) represented as a string.
note: >
All Cassandra protocol error codes SHOULD be considered errors.
examples: ["102", "40020"]
- id: span.db.hbase.client
type: span
@ -302,7 +322,7 @@ groups:
conditionally_required: If readily available.
- ref: db.response.status_code
brief: >
Protocol-specific response code recorded as string.
Protocol-specific response code recorded as a string.
examples: ["200", "409", "14"]
requirement_level:
conditionally_required: If response was received.
@ -334,7 +354,9 @@ groups:
note: "" # overriding the base note
- ref: db.response.status_code
brief: >
The HTTP response code returned by the Couch DB.
The HTTP response code returned by CouchDB, recorded as a string.
note: >
HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.
examples: ["200", "201", "429"]
requirement_level:
conditionally_required: If response was received and the HTTP response code is available.
@ -395,6 +417,8 @@ groups:
brief: >
The Redis [simple error](https://redis.io/docs/latest/develop/reference/protocol-spec/#simple-errors) prefix.
examples: ["ERR", "WRONGTYPE", "CLUSTERDOWN"]
note: >
All Redis error prefixes SHOULD be considered errors.
- ref: db.operation.batch.size
- ref: db.operation.parameter
requirement_level: opt_in
@ -434,6 +458,8 @@ groups:
- ref: db.response.status_code
brief: >
[MongoDB error code](https://www.mongodb.com/docs/manual/reference/error-codes/) represented as a string.
note: >
All MongoDB error codes SHOULD be considered errors.
requirement_level:
conditionally_required: If the operation failed and error code is available.
examples: ["36", "11602"]
@ -492,6 +518,8 @@ groups:
brief: >
The HTTP response code returned by the Elasticsearch cluster.
examples: ["200", "201", "429"]
note: >
HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.
requirement_level:
conditionally_required: If response was received.
- id: span.db.sql.client
@ -528,7 +556,7 @@ groups:
It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
- ref: db.response.status_code
brief: >
Database response code recorded as string.
Database response code recorded as a string.
note: |
SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database
return code which is adopted by some database systems like PostgreSQL.
@ -607,6 +635,8 @@ groups:
brief: >
Cosmos DB status code.
examples: ["200", "201"]
note: >
Response codes in the 4xx and 5xx range SHOULD be considered errors.
requirement_level:
conditionally_required: if response was received
- ref: db.response.returned_rows

View File

@ -0,0 +1,14 @@
groups:
- id: registry.exception.deprecated
type: attribute_group
display_name: Deprecated Exception Attributes
brief: >
Deprecated exception attributes.
attributes:
- id: exception.escaped
type: boolean
stability: stable
deprecated: "It's no longer recommended to record exceptions that are handled
and do not escape the scope of a span."
brief: >
Indicates that the exception is escaping the scope of the span.

View File

@ -31,26 +31,3 @@ groups:
at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n
at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n
at com.example.GenerateTrace.main(GenerateTrace.java:5)
- id: exception.escaped
type: boolean
stability: stable
brief: >
SHOULD be set to true if the exception event is recorded at a point where
it is known that the exception is escaping the scope of the span.
note: |-
An exception is considered to have escaped (or left) the scope of a span,
if that span is ended while the exception is still logically "in flight".
This may be actually "in flight" in some languages (e.g. if the exception
is passed to a Context manager's `__exit__` method in Python) but will
usually be caught at the point of recording the exception in most languages.
It is usually not possible to determine at the point where an exception is thrown
whether it will escape the scope of a span.
However, it is trivial to know that an exception
will escape, if one checks for an active exception just before ending the span,
as done in the [example for recording span exceptions](https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#recording-an-exception).
It follows that an exception may still escape the scope of the span
even if the `exception.escaped` attribute was not set or set to false,
since the event might have been recorded at a time where it was not
clear whether the exception will escape.