From 21a3adf78d669ff544992226c170b06d4e610d97 Mon Sep 17 00:00:00 2001 From: Ted Young Date: Wed, 24 Jul 2019 11:16:14 -0700 Subject: [PATCH] Reorganize specification for clarity (#188) * Reorganize specification for clarity * Consolidate into a single readme * New table of contents * File names changed for better orgnanization * Images moved to subdirectory * No specification changes * fix link Co-Authored-By: Yang Song * clean up versioning statement Co-Authored-By: Armin Ruech * Clarify package/library layout * Updated DistributedContext header --- semantic-conventions.md | 151 ---------------------- terminology.md | 274 ---------------------------------------- 2 files changed, 425 deletions(-) delete mode 100644 semantic-conventions.md delete mode 100644 terminology.md diff --git a/semantic-conventions.md b/semantic-conventions.md deleted file mode 100644 index a6931a172..000000000 --- a/semantic-conventions.md +++ /dev/null @@ -1,151 +0,0 @@ -# Semantic Conventions - -This document defines reserved attributes that can be used to add operation and -protocol specific information. - -In OpenTelemetry spans can be created freely and it’s up to the implementor to -annotate them with attributes specific to the represented operation. Spans -represent specific operations in and between systems. Some of these operations -represent calls that use well-known protocols like HTTP or database calls. -Depending on the protocol and the type of operation, additional information -is needed to represent and analyze a span correctly in monitoring systems. It is -also important to unify how this attribution is made in different languages. -This way, the operator will not need to learn specifics of a language and -telemetry collected from multi-language micro-service can still be easily -correlated and cross-analyzed. - -## HTTP client - -This span type represents an outbound HTTP request. - -For a HTTP client span, `SpanKind` MUST be `Client`. 
- -Given an [RFC 3986](https://www.ietf.org/rfc/rfc3986.txt) compliant URI of the form -`scheme:[//authority]path[?query][#fragment]`, the span name SHOULD -be set to the URI path value. - -If a framework can identify a value that represents the identity of the request -and has a lower cardinality than the URI path, this value MUST be used for the span name instead. - -| Attribute name | Notes and examples | Required? | -| :------------- | :----------------------------------------------------------- | --------- | -| `component` | Denotes the type of the span and needs to be `http`. | Yes | -| `http.method` | HTTP request method. E.g. `"GET"`. | Yes | -| `http.url` | Full HTTP request URL. E.g. `"https://example.com:779/users/187a34"`. | Yes | -| `http.status_code` | [HTTP response status code](https://tools.ietf.org/html/rfc7231). E.g. `200` | No | -| `http.status_text` | [HTTP reason phrase](https://www.ietf.org/rfc/rfc2616.txt). E.g. `OK` | No | - -## HTTP server - -This span type represents an inbound HTTP request. - -For an HTTP server span, `SpanKind` MUST be `Server`. - -Given an inbound request for a route (e.g. `"/users/:userID?"`), the `name` -attribute of the span SHOULD be set to this route. - -If the route cannot be determined, the `name` attribute MUST be set to the [RFC 3986 URI](https://www.ietf.org/rfc/rfc3986.txt) path value. - -If a framework can identify a value that represents the identity of the request -and has a lower cardinality than the URI path or route, this value MUST be used for the span name instead. - -| Attribute name | Notes and examples | Required? | -| :------------- | :----------------------------------------------------------- | --------- | -| `component` | Denotes the type of the span and needs to be `http`. | Yes | -| `http.method` | HTTP request method. E.g. `"GET"`. | Yes | -| `http.url` | Full HTTP request URL. E.g. `"https://example.com:779/users/187a34"`. | Yes | -| `http.route` | The matched route. E.g. `"/users/:userID?"`. 
| No | -| `http.status_code` | [HTTP response status code](https://tools.ietf.org/html/rfc7231). E.g. `200` | No | -| `http.status_text` | [HTTP reason phrase](https://www.ietf.org/rfc/rfc2616.txt). E.g. `OK` | No | - -## Databases client calls - -For database client call the `SpanKind` MUST be `Client`. - -Span `name` should be set to low cardinality value representing the statement -executed on the database. It may be stored procedure name (without argument), sql -statement without variable arguments, etc. When it's impossible to get any -meaningful representation of the span `name`, it can be populated using the same -value as `db.instance`. - -Note, Redis, Cassandra, HBase and other storage systems may reuse the same -attribute names. - -| Attribute name | Notes and examples | Required? | -| :------------- | :----------------------------------------------------------- | --------- | -| `component` | Database driver name or database name (when known) `JDBI`, `jdbc`, `odbc`, `postgreSQL`. | Yes | -| `db.type` | Database type. For any SQL database, `"sql"`. For others, the lower-case database category, e.g. `"cassandra"`, `"hbase"`, or `"redis"`. | Yes | -| `db.instance` | Database instance name. E.g., In java, if the jdbc.url=`"jdbc:mysql://db.example.com:3306/customers"`, the instance name is `"customers"`. | Yes | -| `db.statement` | A database statement for the given database type. Note, that the value may be sanitized to exclude sensitive information. E.g., for `db.type="sql"`, `"SELECT * FROM wuser_table"`; for `db.type="redis"`, `"SET mykey 'WuValue'"`. | Yes | -| `db.user` | Username for accessing database. 
E.g., `"readonly_user"` or `"reporting_user"` | No | - -For database client calls, peer information can be populated and interpreted as -follows: - -| Attribute name | Notes and examples | Required? | -| :-------------- | :----------------------------------------------------------- | -------- | -| `peer.address` | JDBC substring like `"mysql://db.example.com:3306"` | Yes | -| `peer.hostname` | Remote hostname. E.g., `db.example.com` | Yes | -| `peer.ipv4` | Remote IPv4 address as a `.`-separated tuple. E.g., `"127.0.0.1"` | No | -| `peer.ipv6` | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., `"2001:0db8:85a3:0000:0000:8a2e:0370:7334"` | No | -| `peer.port` | Remote port. E.g., `80` (integer) | No | -| `peer.service` | Remote service name. Can be the database's friendly name or `db.instance` | No | - -## gRPC - -Implementations MUST create a span when the gRPC call starts: one on the -client side and one on the server side. Outgoing requests should use a span `kind` -of `CLIENT` and incoming requests should use a span `kind` of `SERVER`. - -Span `name` MUST be the full gRPC method name, formatted as: - -``` -$package.$service/$method -``` - -Example span name: `grpc.test.EchoService/Echo`. - -### Attributes - -| Attribute name | Notes and examples | Required? | -| -------------- | ------------------------------------------------------------ | --------- | -| `component` | Declares that this is a grpc component. Value MUST be `grpc` | Yes | - -`peer.*` attributes MUST define service name as `peer.service`, host as -`peer.hostname` and port as `peer.port`. - -### Status - -Implementations MUST set the status, which MUST be the same as the gRPC client/server -status. The mapping between gRPC canonical codes and OpenTelemetry status codes -is 1:1, as the OpenTelemetry canonical codes are just a snapshot of the gRPC codes, which -can be found [here](https://github.com/grpc/grpc-go/blob/master/codes/codes.go).
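The gRPC naming and attribute rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `grpc_span_name` and `grpc_span_attributes` are hypothetical helpers, not part of any OpenTelemetry API, and the hostname and port values are invented.

```python
from typing import Dict, Union


def grpc_span_name(package: str, service: str, method: str) -> str:
    """Build a span name in the `$package.$service/$method` form."""
    return f"{package}.{service}/{method}"


def grpc_span_attributes(
    peer_service: str, peer_hostname: str, peer_port: int
) -> Dict[str, Union[str, int]]:
    """Collect the attributes this section lists for a gRPC span."""
    return {
        "component": "grpc",  # the section requires the value `grpc`
        "peer.service": peer_service,
        "peer.hostname": peer_hostname,
        "peer.port": peer_port,
    }


# Hypothetical values, used only to exercise the helpers.
name = grpc_span_name("grpc.test", "EchoService", "Echo")
attributes = grpc_span_attributes("EchoService", "echo.example.com", 50051)
```

Here `name` matches the example span name given in the section, `grpc.test.EchoService/Echo`.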
- -### Events - -In the lifetime of a gRPC stream, an event for each message sent/received on -client and server spans SHOULD be created with the following attributes: - -``` --> [time], - "name" = "message", - "message.type" = "SENT", - "message.id" = id, - "message.compressed_size" = <compressed size in bytes>, - "message.uncompressed_size" = <uncompressed size in bytes> -``` - -``` --> [time], - "name" = "message", - "message.type" = "RECEIVED", - "message.id" = id, - "message.compressed_size" = <compressed size in bytes>, - "message.uncompressed_size" = <uncompressed size in bytes> -``` - -The `message.id` MUST be calculated using two separate counters, each starting from `1`: -one for sent messages and one for received messages. This way we guarantee that -the values will be consistent between different implementations. In case of -unary calls only one sent and one received message will be recorded for both -client and server spans. diff --git a/terminology.md b/terminology.md deleted file mode 100644 index 22c71aaca..000000000 --- a/terminology.md +++ /dev/null @@ -1,274 +0,0 @@ -# Terminology - -## Distributed Tracing - -A distributed trace is a set of events, triggered as a result of a single -logical operation, consolidated across various components of an application. A -distributed trace contains events that cross process, network and security -boundaries. A distributed trace may be initiated when someone presses a button -to start an action on a website - in this example, the trace will represent -calls made between the downstream services that handled the chain of requests -initiated by this button being pressed. - -### Trace - -**Traces** in OpenTelemetry are defined implicitly by their **Spans**. In -particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of -**Spans**, where the edges between **Spans** are defined as parent/child -relationships.
- -For example, the following is an example **Trace** made up of 8 **Spans**: - -``` -Causal relationships between Spans in a single Trace - - - [Span A] ←←←(the root span) - | - +------+------+ - | | - [Span B] [Span C] ←←←(Span C is a `child` of Span A) - | | - [Span D] +---+-------+ - | | - [Span E] [Span F] -``` - -Sometimes it's easier to visualize **Traces** with a time axis as in the diagram -below: - -``` -Temporal relationships between Spans in a single Trace - - -––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time - - [Span A···················································] - [Span B··············································] - [Span D··········································] - [Span C········································] - [Span E·······] [Span F··] -``` - -### Span - -Each **Span** encapsulates the following state: - -- An operation name -- A start and finish timestamp -- A set of zero or more key:value **Attributes**. The keys must be strings. The - values may be strings, bools, or numeric types. -- A set of zero or more **Events**, each of which is itself a key:value map - paired with a timestamp. The keys must be strings, though the values may be of - the same types as Span **Attributes**. -- Parent's **Span** identifier. -- [**Links**](#links-between-spans) to zero or more causally-related **Spans** - (via the **SpanContext** of those related **Spans**). -- **SpanContext** identification of a Span. See below. - -### SpanContext - -Represents all the information that identifies **Span** in the **Trace** and -MUST be propagated to child Spans and across process boundaries. A -**SpanContext** contains the tracing identifiers and the options that are -propagated from parent to child **Spans**. - -- **TraceId** is the identifier for a trace. It is worldwide unique with - practically sufficient probability by being made as 16 randomly generated - bytes. 
TraceId is used to group all spans for a specific trace together across - all processes. -- **SpanId** is the identifier for a span. It is globally unique with - practically sufficient probability because it consists of 8 randomly generated - bytes. When passed to a child Span, this identifier becomes the parent span id - for the child **Span**. -- **TraceOptions** represents the options for a trace. It is represented as 1 - byte (bitmap). - - Sampling bit - Bit to represent whether the trace is sampled or not (mask - `0x1`). -- **Tracestate** carries tracing-system-specific context in a list of key-value - pairs. **Tracestate** allows different vendors to propagate additional - information and interoperate with their legacy Id formats. For more details - see [the W3C Trace Context `tracestate` field](https://w3c.github.io/trace-context/#tracestate-field). - -### Links between spans - -A **Span** may be linked to zero or more other **Spans** (defined by -**SpanContext**) that are causally related. **Links** can point to -**SpanContexts** inside a single **Trace** or across different **Traces**. -**Links** can be used to represent batched operations where a **Span** has -multiple parents, each representing a single incoming item being processed in -the batch. Another use of a **Link** is to declare the relationship -between an originating trace and a restarted trace. This can be used when a **Trace** enters -the trusted boundaries of a service and the service policy requires generating a new -Trace instead of trusting the incoming Trace context. - -## Metrics - -OpenTelemetry allows recording raw measurements or metrics with a predefined -aggregation and set of labels. - -Recording raw measurements using the OpenTelemetry API defers to the end user -the decision on which aggregation algorithm should be applied to the metric, as -well as the definition of labels (dimensions). It will be used in client libraries like -gRPC to record raw measurements such as "server_latency" or "received_bytes". 
So end -user will decide what type of aggregated values should be collected out of these -raw measurements. It may be simple average or elaborate histogram calculation. - -Recording of metrics with the pre-defined aggregation using OpenTelemetry API is -not less important. It allows to collect values like cpu and memory usage, or -simple metrics like "queue length". - -### Recording raw measurements - -The main classes used to record raw measurements are `Measure` and -`Measurement`. List of `Measurement`s alongside the additional context can be -recorded using OpenTelemetry API. So user may define to aggregate those -`Measurement`s and use the context passed alongside to define additional -dimensions of the resulting metric. - -#### Measure - -`Measure` describes the type of the individual values recorded by a library. It -defines a contract between the library exposing the measurements and an -application that will aggregate those individual measurements into a `Metric`. -`Measure` is identified by name, description and a unit of values. - -#### Measurement - -`Measurement` describes a single value to be collected for a `Measure`. -`Measurement` is an empty interface in API surface. This interface is defined in -SDK. - -### Recording metrics with predefined aggregation - -The base class for all types of pre-aggregated metrics is called `Metric`. It -defines basic metric properties like a name and labels. Classes inheriting from -the `Metric` define their aggregation type as well as a structure of individual -measurements or Points. API defines the following types of pre-aggregated -metrics: - -- Counter metric to report instantaneous measurement. Counter values can go - up or stay the same, but can never go down. Counter values cannot be - negative. There are two types of counter metric values - `double` and `long`. -- Gauge metric to report instantaneous measurement of a double value. Gauges can - go both up and down. The gauges values can be negative. 
There are two types of - gauge metric values - `double` and `long`. - -The API allows constructing a `Metric` of a chosen type. The SDK defines the way to -query the current value of a `Metric` to be exported. - -Every type of `Metric` has its own API to record values to be aggregated. The API -supports both push and pull models of setting the `Metric` value. - -### Metrics data model and SDK - -The Metrics data model is defined in the SDK and is based on -[metrics.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/master/src/opentelemetry/proto/metrics/v1/metrics.proto). -This data model is used by all the OpenTelemetry exporters as an input. -Different exporters have different capabilities (e.g. which data types are -supported) and different constraints (e.g. which characters are allowed in label -keys). Metrics is intended to be a superset of what's possible, not a lowest -common denominator that's supported everywhere. All exporters consume data from -the Metrics Data Model via a Metric Producer interface defined in the OpenTelemetry SDK. - -Because of this, Metrics puts minimal constraints on the data (e.g. which -characters are allowed in keys), and code dealing with Metrics should avoid -validation and sanitization of the Metrics data. Instead, pass the data to the -backend, rely on the backend to perform validation, and pass back any errors -from the backend. - -OpenTelemetry defines the naming convention for metric names as well as -well-known metric names in the [Semantic Conventions](semantic-conventions.md) -document. - -## DistributedContext - -**DistributedContext** is an abstract data type that represents a collection of entries. -Each key of **DistributedContext** is associated with exactly one value. **DistributedContext** is serializable, -to facilitate propagating it not only inside the process but also across process boundaries. - -**DistributedContext** is used to annotate telemetry with the name:value pair **Entry**. 
-Those values can be used to add dimensions to a metric or additional context properties to logs and traces. - -**DistributedContext** is a recommended name but languages can have more language-specific names like **dctx**. - -### Entry - -An **Entry** is used to label anything that is associated with a specific operation, -such as an HTTP request. It consists of **EntryKey**, **EntryValue** and **EntryMetadata**. - -- **EntryKey** is the name of the **Entry**. **EntryKey** along with **EntryValue** -can be used to aggregate and group stats, annotate traces and logs, etc. **EntryKey** is -a string that contains only printable ASCII (codes between 32 and 126 inclusive) and has -a length greater than zero and less than 256. -- **EntryValue** is a string that contains only printable ASCII (codes between 32 and 126 inclusive). -- **EntryMetadata** contains properties associated with an **Entry**. -For now only the property **EntryTTL** is defined. -- **EntryTTL** is an integer that represents the number of hops an entry can propagate. -Any time a sender serializes an entry and sends it over the wire, and the receiver deserializes -the entry, the entry is considered to have travelled one hop. - -## Resources - -`Resource` captures information about the entity for which telemetry is -recorded. For example, metrics exposed by a Kubernetes container can be linked -to a resource that specifies the cluster, namespace, pod, and container name. - -`Resource` may capture an entire hierarchy of entity identification. It may -describe the host in the cloud and a specific container or an application running -in the process. - -Note that some of the process identification information can be associated with -telemetry automatically by the OpenTelemetry SDK or a specific exporter. See -OpenTelemetry -[proto](https://github.com/open-telemetry/opentelemetry-proto/blob/a46c815aa5e85a52deb6cb35b8bc182fb3ca86a0/src/opentelemetry/proto/agent/common/v1/common.proto#L28-L96) -for an example.
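As a rough illustration of the Kubernetes example above, a resource can be modelled as an immutable set of labels, with a hierarchy built by merging a broader resource into a narrower one. The `Resource` class, its `merge` method, the label keys, and the precedence rule below are all assumptions made for this sketch; the text does not define them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in: labels identifying the entity producing telemetry."""

    labels: Dict[str, str] = field(default_factory=dict)

    def merge(self, other: "Resource") -> "Resource":
        # Labels on `self` take precedence over `other`; this rule is an assumption.
        return Resource({**other.labels, **self.labels})


# Illustrative label keys and values, not defined by the specification.
cluster = Resource({"cluster": "prod-1", "namespace": "payments"})
container = Resource({"pod": "checkout-7d9f", "container": "api"})

# The combined resource identifies the container within its cluster and namespace.
combined = container.merge(cluster)
```

Telemetry recorded by the container could then carry `combined.labels`, which holds all four identifying values.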
- -**TODO**: Better describe the difference between the resource and a Node -https://github.com/open-telemetry/opentelemetry-proto/issues/17 - -## Propagators - -OpenTelemetry uses `Propagators` to serialize and deserialize `SpanContext` and `DistributedContext` -into a binary or text format. Currently there are two types of propagators: - -- `BinaryFormat` which is used to serialize and deserialize a value into a binary representation. -- `HTTPTextFormat` which is used to inject and extract a value as text into carriers that travel -in-band across process boundaries. - -## Agent/Collector - -The OpenTelemetry service is a set of components that can collect traces, -metrics and eventually other telemetry data (e.g. logs) from processes -instrumented by OpenTelemetry or other monitoring/tracing libraries (Jaeger, -Prometheus, etc.), do aggregation and smart sampling, and export traces and -metrics to one or more monitoring/tracing backends. The service will allow -enriching and transforming collected telemetry (e.g. adding attributes or -scrubbing personal information). - -The OpenTelemetry service has two primary modes of operation: Agent (a locally -running daemon) and Collector (a standalone running service). - -Read more at OpenTelemetry Service [Long-term -Vision](https://github.com/open-telemetry/opentelemetry-service/blob/master/docs/VISION.md). - -## Instrumentation adapters - -The inspiration of the project is to make every library and application -manageable out of the box by instrumenting it with OpenTelemetry. However, on the -way to this goal there will be a need to enable instrumentation by plugging -instrumentation adapters into the library of choice. These adapters can -wrap library APIs, subscribe to library-specific callbacks or -translate telemetry exposed in other formats into the OpenTelemetry model. - -Instrumentation adapters may go by different names. 
They are often referred to as a -plugin, collector or auto-collector, telemetry module, bridge, etc. It is always -recommended to follow the library and language standards. For instance, if an -instrumentation adapter is implemented as a "log appender", it will probably be -called an `appender`, not an instrumentation adapter. However, if there is no -established name, the recommendation is to call packages "Instrumentation -Adapter" or simply "Adapter". - -## Code injecting adapters - -TODO: fill out as a result of SIG discussion.