Reorganize specification for clarity (#188)

* Reorganize specification for clarity
* Consolidate into a single readme
* New table of contents
* File names changed for better organization
* Images moved to subdirectory
* No specification changes
* fix link

  Co-Authored-By: Yang Song <songy23@users.noreply.github.com>
* clean up versioning statement

  Co-Authored-By: Armin Ruech <armin.ruech@gmail.com>
* Clarify package/library layout
* Updated DistributedContext header

parent 5442afe730, commit 21a3adf78d

@@ -1,151 +0,0 @@

# Semantic Conventions

This document defines reserved attributes that can be used to add operation- and
protocol-specific information.

In OpenTelemetry, spans can be created freely, and it's up to the implementor to
annotate them with attributes specific to the represented operation. Spans
represent specific operations in and between systems. Some of these operations
represent calls that use well-known protocols, such as HTTP requests or database calls.
Depending on the protocol and the type of operation, additional information
is needed to represent and analyze a span correctly in monitoring systems. It is
also important to unify how this attribution is made across languages.
This way, operators do not need to learn the specifics of each language, and
telemetry collected from multi-language microservices can still be easily
correlated and cross-analyzed.

## HTTP client

This span type represents an outbound HTTP request.

For an HTTP client span, `SpanKind` MUST be `Client`.

Given an [RFC 3986](https://www.ietf.org/rfc/rfc3986.txt) compliant URI of the form
`scheme:[//authority]path[?query][#fragment]`, the span name SHOULD
be set to the URI path value.

If a framework can identify a value that represents the identity of the request
and has a lower cardinality than the URI path, this value MUST be used for the span name instead.

| Attribute name | Notes and examples | Required? |
| :------------- | :----------------------------------------------------------- | --------- |
| `component` | Denotes the type of the span and needs to be `http`. | Yes |
| `http.method` | HTTP request method. E.g. `"GET"`. | Yes |
| `http.url` | Full HTTP request URL. E.g. `"https://example.com:779/users/187a34"`. | Yes |
| `http.status_code` | [HTTP response status code](https://tools.ietf.org/html/rfc7231). E.g. `200`. | No |
| `http.status_text` | [HTTP reason phrase](https://www.ietf.org/rfc/rfc2616.txt). E.g. `"OK"`. | No |
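
The naming rule and attribute table above can be sketched in Python as follows. This is a minimal illustration, not an OpenTelemetry API: the helper names and the `low_cardinality_name` parameter are hypothetical, and falling back to `"/"` when the URL has an empty path is an assumption the text does not specify.

```python
from typing import Optional
from urllib.parse import urlsplit


def http_client_span_name(url: str, low_cardinality_name: Optional[str] = None) -> str:
    """Span name for an outbound HTTP request (hypothetical helper)."""
    # A lower-cardinality identity supplied by the framework takes precedence.
    if low_cardinality_name:
        return low_cardinality_name
    # urlsplit() separates scheme, authority, path, query and fragment;
    # only the path is used. Returning "/" for an empty path is an assumption.
    return urlsplit(url).path or "/"


def http_client_attributes(method: str, url: str,
                           status_code: Optional[int] = None,
                           status_text: Optional[str] = None) -> dict:
    """Attribute map for an HTTP client span (hypothetical helper)."""
    attrs = {
        "component": "http",  # required, fixed value
        "http.method": method,
        "http.url": url,
    }
    # Optional attributes are set only once a response is available.
    if status_code is not None:
        attrs["http.status_code"] = status_code
    if status_text is not None:
        attrs["http.status_text"] = status_text
    return attrs
```

Query and fragment are dropped from the name, so `https://example.com:779/users/187a34?tab=1` yields the span name `/users/187a34`.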

## HTTP server

This span type represents an inbound HTTP request.

For an HTTP server span, `SpanKind` MUST be `Server`.

Given an inbound request for a route (e.g. `"/users/:userID?"`), the span name
SHOULD be set to this route.

If the route cannot be determined, the span name MUST be set to the [RFC 3986 URI](https://www.ietf.org/rfc/rfc3986.txt) path value.

If a framework can identify a value that represents the identity of the request
and has a lower cardinality than the URI path or route, this value MUST be used for the span name instead.

| Attribute name | Notes and examples | Required? |
| :------------- | :----------------------------------------------------------- | --------- |
| `component` | Denotes the type of the span and needs to be `http`. | Yes |
| `http.method` | HTTP request method. E.g. `"GET"`. | Yes |
| `http.url` | Full HTTP request URL. E.g. `"https://example.com:779/users/187a34"`. | Yes |
| `http.route` | The matched route. E.g. `"/users/:userID?"`. | No |
| `http.status_code` | [HTTP response status code](https://tools.ietf.org/html/rfc7231). E.g. `200`. | No |
| `http.status_text` | [HTTP reason phrase](https://www.ietf.org/rfc/rfc2616.txt). E.g. `"OK"`. | No |
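
The server-side naming precedence (lower-cardinality identity, then route, then URI path) can be sketched like this. The helper names and parameters are hypothetical and only illustrate the rules; `"UserHandler"` in the usage note is an invented example of a framework-provided identity.

```python
from typing import Optional


def http_server_span_name(path: str, route: Optional[str] = None,
                          low_cardinality_name: Optional[str] = None) -> str:
    """Span name for an inbound HTTP request (hypothetical helper)."""
    # 1. A framework-provided identity with lower cardinality wins.
    if low_cardinality_name:
        return low_cardinality_name
    # 2. Otherwise the matched route template, e.g. "/users/:userID?".
    if route:
        return route
    # 3. If no route could be determined, fall back to the URI path.
    return path


def http_server_attributes(method: str, url: str, route: Optional[str] = None,
                           status_code: Optional[int] = None) -> dict:
    """Attribute map for an HTTP server span (hypothetical helper)."""
    attrs = {"component": "http", "http.method": method, "http.url": url}
    if route is not None:
        attrs["http.route"] = route
    if status_code is not None:
        attrs["http.status_code"] = status_code
    return attrs
```

For a request to `/users/187a34` matched by `/users/:userID?`, the span name is the route; a framework that names the handler `"UserHandler"` would use that instead.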

## Database client calls

For database client calls, `SpanKind` MUST be `Client`.

The span `name` should be set to a low-cardinality value representing the statement
executed on the database. It may be a stored procedure name (without arguments), an SQL
statement without variable arguments, etc. When it's impossible to get any
meaningful representation of the span `name`, it can be populated using the same
value as `db.instance`.

Note that Redis, Cassandra, HBase and other storage systems may reuse the same
attribute names.

| Attribute name | Notes and examples | Required? |
| :------------- | :----------------------------------------------------------- | --------- |
| `component` | Database driver name or database name (when known). E.g. `JDBI`, `jdbc`, `odbc`, `postgreSQL`. | Yes |
| `db.type` | Database type. For any SQL database, `"sql"`. For others, the lower-case database category, e.g. `"cassandra"`, `"hbase"`, or `"redis"`. | Yes |
| `db.instance` | Database instance name. E.g., in Java, if the JDBC URL is `"jdbc:mysql://db.example.com:3306/customers"`, the instance name is `"customers"`. | Yes |
| `db.statement` | A database statement for the given database type. Note that the value may be sanitized to exclude sensitive information. E.g., for `db.type="sql"`, `"SELECT * FROM wuser_table"`; for `db.type="redis"`, `"SET mykey 'WuValue'"`. | Yes |
| `db.user` | Username for accessing the database. E.g., `"readonly_user"` or `"reporting_user"`. | No |
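
A minimal sketch of deriving the span name and `db.*` attributes, assuming a URL-shaped JDBC string like the MySQL example above. Real JDBC URL formats vary by driver (some put the database name in a `;key=value` suffix), so `jdbc_instance_name` is a hypothetical helper that only handles this shape; statement sanitization is left to the caller.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def jdbc_instance_name(jdbc_url: str) -> str:
    """Database name from a URL-shaped JDBC string (hypothetical helper)."""
    # Drop the "jdbc:" prefix so the remainder parses like an ordinary URL,
    # e.g. "mysql://db.example.com:3306/customers".
    rest = jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url
    return urlsplit(rest).path.lstrip("/")


def db_client_span(component: str, db_type: str, jdbc_url: str, statement: str,
                   user: Optional[str] = None,
                   operation_name: Optional[str] = None) -> Tuple[str, dict]:
    """Span name and attributes for a database client call (hypothetical)."""
    instance = jdbc_instance_name(jdbc_url)
    attrs = {
        "component": component,
        "db.type": db_type,
        "db.instance": instance,
        # Sanitizing the statement (e.g. stripping literals) is left to the caller.
        "db.statement": statement,
    }
    if user is not None:
        attrs["db.user"] = user
    # Prefer a low-cardinality name such as a stored procedure name or a
    # statement template; otherwise reuse db.instance, as the text allows.
    return operation_name or instance, attrs
```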

For database client calls, peer information can be populated and interpreted as
follows:

| Attribute name | Notes and examples | Required? |
| :-------------- | :----------------------------------------------------------- | -------- |
| `peer.address` | JDBC substring like `"mysql://db.example.com:3306"`. | Yes |
| `peer.hostname` | Remote hostname. E.g., `"db.example.com"`. | Yes |
| `peer.ipv4` | Remote IPv4 address as a `.`-separated tuple. E.g., `"127.0.0.1"`. | No |
| `peer.ipv6` | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., `"2001:0db8:85a3:0000:0000:8a2e:0370:7334"`. | No |
| `peer.port` | Remote port. E.g., `80` (integer). | No |
| `peer.service` | Remote service name. Can be a database-friendly name or `db.instance`. | No |
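
The `peer.*` values in this table can be derived from the same JDBC URL. The sketch below assumes a URL without embedded credentials (otherwise `netloc` would include them) and takes the resolved IP address as an optional input; the helper name is hypothetical. Python's `ipaddress` module produces the fully expanded IPv6 form the table asks for.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def db_peer_attributes(jdbc_url: str, ip: Optional[str] = None,
                       service: Optional[str] = None) -> dict:
    """peer.* attributes for a database client span (hypothetical helper)."""
    rest = jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url
    parts = urlsplit(rest)
    attrs = {
        # "mysql://db.example.com:3306": scheme plus authority, no database path.
        "peer.address": f"{parts.scheme}://{parts.netloc}",
        "peer.hostname": parts.hostname,
    }
    if parts.port is not None:
        attrs["peer.port"] = parts.port  # urlsplit already returns an int
    if ip is not None:
        addr = ipaddress.ip_address(ip)
        if addr.version == 4:
            attrs["peer.ipv4"] = str(addr)
        else:
            # .exploded gives all eight colon-separated 4-hex-digit groups.
            attrs["peer.ipv6"] = addr.exploded
    if service is not None:
        attrs["peer.service"] = service
    return attrs
```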

## gRPC

Implementations MUST create a span when the gRPC call starts: one on the
client side and one on the server side. Outgoing requests should be a span `kind`
of `CLIENT` and incoming requests should be a span `kind` of `SERVER`.

The span `name` MUST be the full gRPC method name formatted as:

```
$package.$service/$method
```

Example span name: `grpc.test.EchoService/Echo`.
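
A short sketch of producing that name, either from its parts or from the HTTP/2 `:path` pseudo-header that gRPC uses to carry the method (`/<package>.<service>/<method>`, where the package prefix may be absent). Both helpers are hypothetical and not part of any gRPC or OpenTelemetry library.

```python
def grpc_span_name(package: str, service: str, method: str) -> str:
    """Build "$package.$service/$method" (hypothetical helper)."""
    # Services declared without a proto package have no "<package>." prefix.
    qualified_service = f"{package}.{service}" if package else service
    return f"{qualified_service}/{method}"


def grpc_span_name_from_path(path: str) -> str:
    """Span name from a gRPC ":path" value such as "/grpc.test.EchoService/Echo"."""
    parts = path.split("/")
    # Expect exactly ["", "<package>.<service>", "<method>"].
    if len(parts) != 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError(f"not a gRPC method path: {path!r}")
    return f"{parts[1]}/{parts[2]}"
```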

### Attributes

| Attribute name | Notes and examples | Required? |
| -------------- | ------------------------------------------------------------ | --------- |
| `component` | Declares that this is a gRPC component. Value MUST be `grpc`. | Yes |

The `peer.*` attributes MUST define the service name as `peer.service`, the host as
`peer.hostname` and the port as `peer.port`.

### Status

Implementations MUST set the span status, which MUST be the same as the gRPC client/server
status. The mapping between gRPC canonical codes and OpenTelemetry status codes
is 1:1, because the OpenTelemetry canonical codes are a snapshot of the gRPC codes, which
are listed in [grpc-go `codes.go`](https://github.com/grpc/grpc-go/blob/master/codes/codes.go).
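
Because the mapping is 1:1 by value, a sketch needs nothing more than a lookup by value. The `GrpcCode` values below are the standard gRPC status codes 0 through 16 (grpc-go spells some names in Go style, e.g. `Canceled`); `StatusCanonicalCode` is a hypothetical stand-in for the OpenTelemetry enum, built as an exact copy.

```python
from enum import IntEnum


class GrpcCode(IntEnum):
    """gRPC canonical status codes (numeric values as in grpc-go codes.go)."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


# Hypothetical OpenTelemetry-side enum: a snapshot with identical names and values.
StatusCanonicalCode = IntEnum(
    "StatusCanonicalCode", [(code.name, code.value) for code in GrpcCode]
)


def span_status_from_grpc(code: GrpcCode) -> IntEnum:
    # The two enums share values, so the mapping is a lookup by value.
    return StatusCanonicalCode(code.value)
```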

### Events

In the lifetime of a gRPC stream, an event for each message sent or received on
client and server spans SHOULD be created with the following attributes:

```
-> [time],
"name" = "message",
"message.type" = "SENT",
"message.id" = id,
"message.compressed_size" = <compressed size in bytes>,
"message.uncompressed_size" = <uncompressed size in bytes>
```

```
-> [time],
"name" = "message",
"message.type" = "RECEIVED",
"message.id" = id,
"message.compressed_size" = <compressed size in bytes>,
"message.uncompressed_size" = <uncompressed size in bytes>
```

The `message.id` MUST be calculated as two different counters starting from `1`,
one for sent messages and one for received messages. This way we guarantee that
the values will be consistent between different implementations. In the case of
unary calls, only one sent and one received message will be recorded for both
client and server spans.
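
The two independent counters can be sketched as a small per-span recorder. `MessageEventRecorder` and its methods are hypothetical names; events are plain dicts using the attribute keys listed above, timestamped with `time.time_ns()`.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MessageEventRecorder:
    """Collects "message" events for one gRPC span (hypothetical helper)."""
    events: List[Dict[str, Any]] = field(default_factory=list)
    sent_count: int = 0
    received_count: int = 0

    def _add(self, msg_type: str, msg_id: int, compressed: int, uncompressed: int) -> None:
        self.events.append({
            "time": time.time_ns(),
            "name": "message",
            "message.type": msg_type,
            "message.id": msg_id,
            "message.compressed_size": compressed,
            "message.uncompressed_size": uncompressed,
        })

    def on_sent(self, compressed: int, uncompressed: int) -> None:
        # Sent messages have their own counter, starting at 1.
        self.sent_count += 1
        self._add("SENT", self.sent_count, compressed, uncompressed)

    def on_received(self, compressed: int, uncompressed: int) -> None:
        # Received messages are counted independently, also starting at 1.
        self.received_count += 1
        self._add("RECEIVED", self.received_count, compressed, uncompressed)
```

For a unary call, one `on_sent` and one `on_received` produce two events that both carry `message.id` 1.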

terminology.md

@@ -1,274 +0,0 @@

# Terminology

## Distributed Tracing

A distributed trace is a set of events, triggered as a result of a single
logical operation, consolidated across various components of an application. A
distributed trace contains events that cross process, network and security
boundaries. A distributed trace may be initiated when someone presses a button
to start an action on a website. In this example, the trace will represent
calls made between the downstream services that handled the chain of requests
initiated by this button being pressed.

### Trace

**Traces** in OpenTelemetry are defined implicitly by their **Spans**. In
particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of
**Spans**, where the edges between **Spans** are defined as parent/child
relationships.

The following is an example **Trace** made up of 6 **Spans**:

```
Causal relationships between Spans in a single Trace


        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `child` of Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F]
```
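
Since a Trace is defined implicitly by its Spans, the example tree can be rebuilt from nothing more than each span's parent reference. In this sketch, single letters stand in for real span IDs (an assumption for readability), and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Parent of each span in the example trace above; None marks the root.
PARENTS: Dict[str, Optional[str]] = {
    "A": None,
    "B": "A",
    "C": "A",
    "D": "B",
    "E": "C",
    "F": "C",
}


def roots(parents: Dict[str, Optional[str]]) -> List[str]:
    """Spans without a parent."""
    return sorted(span for span, parent in parents.items() if parent is None)


def children(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the child -> parent edges into parent -> [children]."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for span, parent in sorted(parents.items()):
        if parent is not None:
            tree[parent].append(span)
    return dict(tree)
```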

Sometimes it's easier to visualize **Traces** with a time axis as in the diagram
below:

```
Temporal relationships between Spans in a single Trace


––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··]
```

### Span

Each **Span** encapsulates the following state:

- An operation name
- A start and finish timestamp
- A set of zero or more key:value **Attributes**. The keys must be strings. The
  values may be strings, bools, or numeric types.
- A set of zero or more **Events**, each of which is itself a key:value map
  paired with a timestamp. The keys must be strings, though the values may be of
  the same types as Span **Attributes**.
- The parent **Span**'s identifier.
- [**Links**](#links-between-spans) to zero or more causally-related **Spans**
  (via the **SpanContext** of those related **Spans**).
- **SpanContext** identification of a Span. See below.
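
The state listed above maps naturally onto a plain data structure. This is a sketch with hypothetical class names, not the OpenTelemetry SDK's types; the nanosecond timestamp unit and the minimal `SpanContext` (identifiers only) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Attribute values may be strings, bools, or numeric types.
AttributeValue = Union[str, bool, int, float]


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes
    span_id: bytes


@dataclass
class Event:
    timestamp: int  # nanoseconds since the epoch (assumed unit)
    attributes: Dict[str, AttributeValue] = field(default_factory=dict)


@dataclass
class Span:
    name: str
    context: SpanContext
    start_time: int
    end_time: Optional[int] = None
    parent_span_id: Optional[bytes] = None
    attributes: Dict[str, AttributeValue] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)
    links: List[SpanContext] = field(default_factory=list)

    def set_attribute(self, key: str, value: AttributeValue) -> None:
        # Keys must be strings; values are restricted to the allowed types.
        if not isinstance(key, str):
            raise TypeError("attribute keys must be strings")
        if not isinstance(value, (str, bool, int, float)):
            raise TypeError(f"unsupported attribute value type: {type(value).__name__}")
        self.attributes[key] = value
```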

### SpanContext

Represents all the information that identifies a **Span** in the **Trace** and
MUST be propagated to child Spans and across process boundaries. A
**SpanContext** contains the tracing identifiers and the options that are
propagated from parent to child **Spans**.

- **TraceId** is the identifier for a trace. It is globally unique with
  practically sufficient probability because it is made of 16 randomly generated
  bytes. TraceId is used to group all spans for a specific trace together across
  all processes.
- **SpanId** is the identifier for a span. It is globally unique with
  practically sufficient probability because it is made of 8 randomly generated
  bytes. When passed to a child Span, this identifier becomes the parent span id
  for the child **Span**.
- **TraceOptions** represents the options for a trace. It is represented as 1
  byte (bitmap).
  - Sampling bit: the bit that represents whether the trace is sampled or not
    (mask `0x1`).
- **Tracestate** carries tracing-system specific context in a list of key-value
  pairs. **Tracestate** allows different vendors to propagate additional
  information and interoperate with their legacy ID formats. For more details,
  see the [W3C Trace Context `tracestate` field](https://w3c.github.io/trace-context/#tracestate-field).
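
The identifier sizes, the sampling mask, and parent-to-child propagation can be sketched as follows. The class and function names are hypothetical; `os.urandom` is used as the random source, and representing **Tracestate** as a tuple of pairs is an assumption made so the context can be immutable.

```python
import os
from dataclasses import dataclass
from typing import Tuple

SAMPLED_FLAG = 0x1  # sampling bit in the TraceOptions byte


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes                                # 16 random bytes
    span_id: bytes                                 # 8 random bytes
    trace_options: int = 0                         # 1-byte bitmap
    tracestate: Tuple[Tuple[str, str], ...] = ()   # vendor key-value pairs

    def __post_init__(self) -> None:
        if len(self.trace_id) != 16:
            raise ValueError("TraceId must be 16 bytes")
        if len(self.span_id) != 8:
            raise ValueError("SpanId must be 8 bytes")
        if not 0 <= self.trace_options <= 0xFF:
            raise ValueError("TraceOptions must fit in one byte")

    @property
    def sampled(self) -> bool:
        return bool(self.trace_options & SAMPLED_FLAG)


def new_root_context(sampled: bool) -> SpanContext:
    """Start a new trace with fresh random identifiers."""
    return SpanContext(os.urandom(16), os.urandom(8), SAMPLED_FLAG if sampled else 0)


def new_child_context(parent: SpanContext) -> Tuple[SpanContext, bytes]:
    """The child shares trace id, options and tracestate but gets a new span id.
    The parent's span id is returned so it can be stored as the child's parent id."""
    child = SpanContext(parent.trace_id, os.urandom(8),
                        parent.trace_options, parent.tracestate)
    return child, parent.span_id
```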

### Links between spans

A **Span** may be linked to zero or more other **Spans** (defined by
**SpanContext**) that are causally related. **Links** can point to
**SpanContexts** inside a single **Trace** or across different **Traces**.
**Links** can be used to represent batched operations where a **Span** has
multiple parents, each representing a single incoming item being processed in
the batch. Another example of using a **Link** is to declare the relationship
between an originating trace and a restarted trace. This can be used when a **Trace**
enters the trusted boundaries of a service whose policy requires generating a new
Trace instead of trusting the incoming Trace context.

## Metrics

OpenTelemetry allows recording raw measurements, or metrics with a predefined
aggregation and a set of labels.

Recording raw measurements using the OpenTelemetry API lets the end user
decide which aggregation algorithm should be applied to a metric, as
well as which labels (dimensions) it has. This will be used in client libraries like
gRPC to record raw measurements such as "server_latency" or "received_bytes". The end
user then decides what type of aggregated values should be collected from these
raw measurements. It may be a simple average or an elaborate histogram calculation.

Recording metrics with a predefined aggregation using the OpenTelemetry API is
equally important. It allows collecting values like CPU and memory usage, or
simple metrics like "queue length".

### Recording raw measurements

The main classes used to record raw measurements are `Measure` and
`Measurement`. A list of `Measurement`s, alongside additional context, can be
recorded using the OpenTelemetry API. The user can then decide how to aggregate those
`Measurement`s and use the context passed alongside them to define additional
dimensions of the resulting metric.

#### Measure

`Measure` describes the type of the individual values recorded by a library. It
defines a contract between the library exposing the measurements and an
application that will aggregate those individual measurements into a `Metric`.
A `Measure` is identified by its name, description and unit of values.

#### Measurement

`Measurement` describes a single value to be collected for a `Measure`.
`Measurement` is an empty interface in the API surface. This interface is defined in the
SDK.
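
The split between recording and aggregating can be sketched like this: the library only records `Measurement`s plus context, and the end user later chooses an aggregation. All names here are hypothetical. Since `Measurement` is an empty interface in the API, the `measure` and `value` fields are an assumed SDK-side shape, and the per-label mean is just one possible user-chosen aggregation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class Measure:
    name: str
    description: str
    unit: str


@dataclass(frozen=True)
class Measurement:
    # Assumed SDK-side shape; the API only defines an empty interface.
    measure: Measure
    value: float


class RawRecorder:
    """Stores raw measurements together with the context they were recorded in."""

    def __init__(self) -> None:
        self.recorded: List[Tuple[Measurement, Dict[str, str]]] = []

    def record(self, measurements: List[Measurement], context: Mapping[str, str]) -> None:
        for m in measurements:
            self.recorded.append((m, dict(context)))


def mean_by_label(recorder: RawRecorder, measure: Measure, label: str) -> Dict[str, float]:
    """One possible end-user aggregation: average per value of one context key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for m, ctx in recorder.recorded:
        if m.measure == measure:
            groups[ctx.get(label, "")].append(m.value)
    return {key: mean(values) for key, values in groups.items()}
```

Recording `server_latency` values of 10 and 20 with `{"method": "Echo"}` and 30 with `{"method": "Ping"}` would then aggregate to `{"Echo": 15.0, "Ping": 30.0}`.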

### Recording metrics with predefined aggregation

The base class for all types of pre-aggregated metrics is called `Metric`. It
defines basic metric properties like a name and labels. Classes inheriting from
`Metric` define their aggregation type as well as the structure of individual
measurements, or Points. The API defines the following types of pre-aggregated
metrics:

- Counter metric to report an instantaneous measurement. Counter values can go
  up or stay the same, but can never go down. Counter values cannot be
  negative. There are two types of counter metric values: `double` and `long`.
- Gauge metric to report an instantaneous measurement of a numeric value. Gauges can
  go both up and down. Gauge values can be negative. There are two types of
  gauge metric values: `double` and `long`.
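
A minimal sketch of the two push-style behaviors, with hypothetical class names: the counter rejects negative increments (so, starting at zero, it can never go down or become negative), while the gauge accepts any value. Python's `int` and `float` stand in for the `long` and `double` variants.

```python
from typing import Union

Number = Union[int, float]  # stands in for the `long` and `double` variants


class Counter:
    """Push-style counter: can only stay the same or go up (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._value: Number = 0

    def add(self, delta: Number) -> None:
        if delta < 0:
            raise ValueError("counter values can never go down")
        self._value += delta

    @property
    def value(self) -> Number:
        return self._value


class Gauge:
    """Push-style gauge: can go up or down, including below zero (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._value: Number = 0

    def set(self, value: Number) -> None:
        self._value = value

    def add(self, delta: Number) -> None:
        self._value += delta

    @property
    def value(self) -> Number:
        return self._value
```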

The API allows constructing a `Metric` of a chosen type. The SDK defines how to
query the current value of a `Metric` to be exported.

Every type of `Metric` has its own API to record values to be aggregated. The API
supports both the push and pull models of setting the `Metric` value.

### Metrics data model and SDK

The metrics data model is defined in the SDK and is based on
[metrics.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/master/src/opentelemetry/proto/metrics/v1/metrics.proto).
This data model is used by all the OpenTelemetry exporters as an input.
Different exporters have different capabilities (e.g. which data types are
supported) and different constraints (e.g. which characters are allowed in label
keys). Metrics is intended to be a superset of what's possible, not a lowest
common denominator that's supported everywhere. All exporters consume data from
the Metrics Data Model via a Metric Producer interface defined in the OpenTelemetry SDK.

Because of this, Metrics puts minimal constraints on the data (e.g. which
characters are allowed in keys), and code dealing with Metrics should avoid
validation and sanitization of the Metrics data. Instead, pass the data to the
backend, rely on the backend to perform validation, and pass back any errors
from the backend.

OpenTelemetry defines the naming convention for metric names as well as
well-known metric names in the [Semantic Conventions](semantic-conventions.md)
document.

## DistributedContext

**DistributedContext** is an abstract data type that represents a collection of entries.
Each key of **DistributedContext** is associated with exactly one value. **DistributedContext** is serializable,
to facilitate propagating it not only inside the process but also across process boundaries.

**DistributedContext** is used to annotate telemetry with name:value pairs, each called an **Entry**.
Those values can be used to add dimensions to metrics, or additional context properties to logs and traces.

**DistributedContext** is a recommended name, but languages can use more language-specific names like **dctx**.

### Entry

An **Entry** is used to label anything that is associated with a specific operation,
such as an HTTP request. It consists of **EntryKey**, **EntryValue** and **EntryMetadata**.

- **EntryKey** is the name of the **Entry**. **EntryKey**, along with **EntryValue**,
  can be used to aggregate and group stats, annotate traces and logs, etc. **EntryKey** is
  a string that contains only printable ASCII (codes between 32 and 126 inclusive) and has
  a length greater than zero and less than 256.
- **EntryValue** is a string that contains only printable ASCII (codes between 32 and 126 inclusive).
- **EntryMetadata** contains properties associated with an **Entry**.
  For now, only the property **EntryTTL** is defined.
  - **EntryTTL** is an integer that represents the number of hops an entry can propagate.
    Any time a sender serializes an entry, sends it over the wire, and a receiver deserializes
    the entry, the entry is considered to have travelled one hop.
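
The key and value constraints translate directly into validation, sketched below with hypothetical names. The TTL handling is an interpretation the text leaves open: this sketch assumes each hop decrements the TTL by one and that an entry whose TTL has reached 0 is not propagated further.

```python
from dataclasses import dataclass
from typing import Iterable, List


def _is_printable_ascii(text: str) -> bool:
    # Printable ASCII: code points 32 through 126 inclusive.
    return all(32 <= ord(ch) <= 126 for ch in text)


@dataclass(frozen=True)
class Entry:
    key: str
    value: str
    ttl: int  # EntryTTL: remaining number of hops (assumed semantics)

    def __post_init__(self) -> None:
        if not (0 < len(self.key) < 256) or not _is_printable_ascii(self.key):
            raise ValueError(f"invalid EntryKey: {self.key!r}")
        if not _is_printable_ascii(self.value):
            raise ValueError(f"invalid EntryValue: {self.value!r}")


def entries_after_hop(entries: Iterable[Entry]) -> List[Entry]:
    """Entries a receiver keeps after one serialize/send/deserialize hop.
    Assumption: TTL 0 means "do not propagate", and each hop decrements TTL."""
    return [Entry(e.key, e.value, e.ttl - 1) for e in entries if e.ttl > 0]
```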

## Resources

`Resource` captures information about the entity for which telemetry is
recorded. For example, metrics exposed by a Kubernetes container can be linked
to a resource that specifies the cluster, namespace, pod, and container name.

`Resource` may capture an entire hierarchy of entity identification. It may
describe the host in the cloud and the specific container or application running
in the process.

Note that some of the process identification information can be associated with
telemetry automatically by the OpenTelemetry SDK or a specific exporter. See the
OpenTelemetry
[proto](https://github.com/open-telemetry/opentelemetry-proto/blob/a46c815aa5e85a52deb6cb35b8bc182fb3ca86a0/src/opentelemetry/proto/agent/common/v1/common.proto#L28-L96)
for an example.

**TODO**: Better describe the difference between the resource and a Node
https://github.com/open-telemetry/opentelemetry-proto/issues/17

## Propagators

OpenTelemetry uses `Propagators` to serialize and deserialize `SpanContext` and `DistributedContext`
into a binary or text format. Currently there are two types of propagators:

- `BinaryFormat`, which is used to serialize and deserialize a value into a binary representation.
- `HTTPTextFormat`, which is used to inject and extract a value as text into carriers that travel
  in-band across process boundaries.
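
To make inject/extract concrete, here is a sketch of an `HTTPTextFormat`-style propagator for `SpanContext` only. The text does not name a wire format; this sketch assumes the W3C Trace Context `traceparent` header (version `00`) and uses a plain dict as the carrier. It does not handle `tracestate` or `DistributedContext`, does not implement every W3C validation rule, and its dict lookup is case-sensitive even though HTTP header names are not. `TraceParentFormat` is a hypothetical name.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# W3C Trace Context "traceparent", version 00: version-traceid-parentid-flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes     # 16 bytes
    span_id: bytes      # 8 bytes
    trace_options: int  # 1-byte bitmap, 0x1 = sampled


class TraceParentFormat:
    """Hypothetical HTTPTextFormat implementation using a dict as the carrier."""

    header = "traceparent"

    def inject(self, context: SpanContext, carrier: Dict[str, str]) -> None:
        carrier[self.header] = "00-{}-{}-{:02x}".format(
            context.trace_id.hex(), context.span_id.hex(), context.trace_options)

    def extract(self, carrier: Dict[str, str]) -> Optional[SpanContext]:
        value = carrier.get(self.header)
        if value is None:
            return None
        match = _TRACEPARENT.match(value.strip())
        if match is None:
            return None  # malformed header: behave as if nothing was propagated
        trace_id, span_id, flags = match.groups()
        return SpanContext(bytes.fromhex(trace_id), bytes.fromhex(span_id), int(flags, 16))
```

A context injected on the client side and extracted on the server side round-trips to an equal `SpanContext`.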

## Agent/Collector

The OpenTelemetry service is a set of components that can collect traces,
metrics and eventually other telemetry data (e.g. logs) from processes
instrumented by OpenTelemetry or other monitoring/tracing libraries (Jaeger,
Prometheus, etc.), do aggregation and smart sampling, and export traces and
metrics to one or more monitoring/tracing backends. The service will allow
enriching and transforming collected telemetry (e.g. adding attributes or
scrubbing personal information).

The OpenTelemetry service has two primary modes of operation: Agent (a locally
running daemon) and Collector (a standalone running service).

Read more in the OpenTelemetry Service [Long-term Vision](https://github.com/open-telemetry/opentelemetry-service/blob/master/docs/VISION.md).

## Instrumentation adapters

The inspiration of the project is to make every library and application
manageable out of the box by instrumenting it with OpenTelemetry. However, on the
way to this goal, there will be a need to enable instrumentation by plugging
instrumentation adapters into the library of choice. These adapters can
wrap library APIs, subscribe to library-specific callbacks, or
translate telemetry exposed in other formats into the OpenTelemetry model.

Instrumentation adapters may be called different names. They are often referred to as
plugins, collectors or auto-collectors, telemetry modules, bridges, etc. It is always
recommended to follow the library and language standards. For instance, if an
instrumentation adapter is implemented as a "log appender", it will probably be
called an `appender`, not an instrumentation adapter. However, if there is no
established name, the recommendation is to call packages "Instrumentation
Adapter" or simply "Adapter".

## Code injecting adapters

TODO: fill out as a result of SIG discussion.