# Terminology

## Distributed Tracing

A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application. A distributed trace contains events that cross process, network and security boundaries. A distributed trace may be initiated when someone presses a button to start an action on a website. In this example, the trace represents the calls made between the downstream services that handled the chain of requests initiated by the button press.

### Trace

**Traces** in OpenTelemetry are defined implicitly by their **Spans**. In particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of **Spans**, where the edges between **Spans** are defined as parent/child relationships.

For example, the following is a **Trace** made up of 6 **Spans**:

```
Causal relationships between Spans in a single Trace

        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `child` of Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F]
```

Sometimes it's easier to visualize **Traces** with a time axis as in the diagram below:

```
Temporal relationships between Spans in a single Trace

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··]
```

### Span

Each **Span** encapsulates the following state:

- An operation name
- A start and finish timestamp
- A set of zero or more key:value **Attributes**. The keys must be strings. The values may be strings, bools, or numeric types.
- A set of zero or more **Events**, each of which is itself a key:value map paired with a timestamp. The keys must be strings, though the values may be of the same types as Span **Attributes**.
- Parent's **Span** identifier.
- [**Links**](#links-between-spans) to zero or more causally-related **Spans** (via the **SpanContext** of those related **Spans**).
- **SpanContext** identification of a Span. See below.

### SpanContext

Represents all the information that identifies a **Span** in the **Trace** and MUST be propagated to child Spans and across process boundaries. A **SpanContext** contains the tracing identifiers and the options that are propagated from parent to child **Spans**.

- **TraceId** is the identifier for a trace. It is globally unique with practically sufficient probability by being made as 16 randomly generated bytes. TraceId is used to group all spans for a specific trace together across all processes.
- **SpanId** is the identifier for a span. It is globally unique with practically sufficient probability by being made as 8 randomly generated bytes. When passed to a child Span this identifier becomes the parent span id for the child **Span**.
- **TraceOptions** represents the options for a trace. It is represented as 1 byte (bitmap).
  - Sampling bit - Bit to represent whether trace is sampled or not (mask `0x1`).
- **Tracestate** carries tracing-system specific context in a list of key value pairs. **Tracestate** allows different vendors to propagate additional information and inter-operate with their legacy Id formats. For more details see [this](https://w3c.github.io/trace-context/#tracestate-field).

### Links between spans

A **Span** may be linked to zero or more other **Spans** (defined by **SpanContext**) that are causally related. **Links** can point to **SpanContexts** inside a single **Trace** or across different **Traces**. **Links** can be used to represent batched operations where a **Span** has multiple parents, each representing a single incoming item being processed in the batch. Another example of using a **Link** is to declare the relationship between an originating and a restarted trace.
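
As a rough illustration of the **SpanContext** fields and the two **Link** use cases described above, the following Python sketch models them with plain data classes. All names here (`SpanContext`, `Span`, `start_span`, `SAMPLED_MASK`) are hypothetical and are not part of any OpenTelemetry API; only the field sizes and the `0x1` sampling mask come from the text.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

SAMPLED_MASK = 0x1  # sampling bit inside the 1-byte TraceOptions bitmap


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical holder for the identifiers that propagate with a Span."""
    trace_id: bytes                                # 16 randomly generated bytes
    span_id: bytes                                 # 8 randomly generated bytes
    trace_options: int = 0                         # 1-byte bitmap
    tracestate: Tuple[Tuple[str, str], ...] = ()   # ordered key/value pairs

    @property
    def sampled(self) -> bool:
        return bool(self.trace_options & SAMPLED_MASK)


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_span_id: Optional[bytes] = None
    links: List[SpanContext] = field(default_factory=list)


def start_span(name: str,
               parent: Optional[SpanContext] = None,
               links: Sequence[SpanContext] = (),
               sampled: bool = True) -> Span:
    """Start a root Span (no parent) or a child that inherits the parent's trace."""
    if parent is None:
        options = SAMPLED_MASK if sampled else 0
        ctx = SpanContext(secrets.token_bytes(16), secrets.token_bytes(8), options)
        return Span(name, ctx, None, list(links))
    # A child keeps the TraceId, options and Tracestate, and gets a fresh SpanId.
    ctx = SpanContext(parent.trace_id, secrets.token_bytes(8),
                      parent.trace_options, parent.tracestate)
    return Span(name, ctx, parent.span_id, list(links))


root = start_span("checkout")
child = start_span("charge-card", parent=root.context)

# Batched operation: one Span linked to the context of every incoming item.
items = [start_span(f"enqueue-{i}") for i in range(3)]
batch = start_span("process-batch", links=[s.context for s in items])

# Restarted trace: a new root Span that only links to the incoming context.
incoming = start_span("external-request")
restarted = start_span("handle-request", links=[incoming.context])
```

In this sketch `child` shares the TraceId of `root` and records the SpanId of `root` as its parent, while `restarted` has a new TraceId and keeps the incoming context only as a **Link**.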
Such a **Link** can be used when a **Trace** enters the trusted boundaries of a service and the service policy requires generating a new Trace instead of trusting the incoming Trace context.

## Metrics

TODO: Describe metrics terminology https://github.com/open-telemetry/opentelemetry-specification/issues/45

## DistributedContext

### Entry

An **Entry** is used to label anything that is associated with a specific operation, such as an HTTP request. It consists of **EntryKey**, **EntryValue** and **EntryMetadata**.

- **EntryKey** is the name of the **Entry**. **EntryKey** along with **EntryValue** can be used to aggregate and group stats, annotate traces and logs, etc. **EntryKey** is a string that contains only printable ASCII (codes between 32 and 126 inclusive) with a length greater than zero and less than 256.
- **EntryValue** is a string that contains only printable ASCII (codes between 32 and 126 inclusive).
- **EntryMetadata** contains properties associated with an **Entry**. For now only the property **EntryTTL** is defined.
  - **EntryTTL** is an integer that represents the number of hops an entry can propagate. Each time a sender serializes an entry and sends it over the wire, and a receiver deserializes it, the entry is considered to have travelled one hop.

## Resources

`Resource` captures information about the entity for which telemetry is recorded. For example, metrics exposed by a Kubernetes container can be linked to a resource that specifies the cluster, namespace, pod, and container name.

`Resource` may capture an entire hierarchy of entity identification. It may describe the host in the cloud and a specific container or an application running in the process.

Note that some of the process identification information can be associated with telemetry automatically by the OpenTelemetry SDK or a specific exporter.
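
The hierarchy described above can be pictured as layers of labels that are combined into one `Resource`. The sketch below is only an illustration: the `merge_resources` helper, the label keys, and the rule that inner layers override outer ones are all assumptions made for this example, not something defined by OpenTelemetry.

```python
from typing import Dict, Mapping


def merge_resources(*layers: Mapping[str, str]) -> Dict[str, str]:
    """Combine label layers from outermost to innermost.

    Assumption of this sketch: on a key conflict the later (inner) layer wins.
    """
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical label keys and values, chosen only for illustration.
host = {"cloud.zone": "zone-a", "host.name": "node-7"}
kubernetes = {
    "k8s.cluster": "prod",
    "k8s.namespace": "shop",
    "k8s.pod": "cart-5d9f",
    "container.name": "cart",
}
# Process-level labels like this one could be attached automatically
# by an SDK or exporter rather than by application code.
process = {"process.pid": "4242"}

resource = merge_resources(host, kubernetes, process)
```

Telemetry recorded by the container would then carry the combined `resource` labels, from the cloud host down to the process.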
See OpenTelemetry [proto](https://github.com/open-telemetry/opentelemetry-proto/blob/a46c815aa5e85a52deb6cb35b8bc182fb3ca86a0/src/opentelemetry/proto/agent/common/v1/common.proto#L28-L96) for an example.

**TODO**: Better describe the difference between the resource and a Node https://github.com/open-telemetry/opentelemetry-proto/issues/17

## Agent/Collector

The OpenTelemetry service is a set of components that can collect traces, metrics and eventually other telemetry data (e.g. logs) from processes instrumented by OpenTelemetry or other monitoring/tracing libraries (Jaeger, Prometheus, etc.), do aggregation and smart sampling, and export traces and metrics to one or more monitoring/tracing backends. The service will allow enriching and transforming collected telemetry (e.g. adding attributes or scrubbing personal information).

The OpenTelemetry service has two primary modes of operation: Agent (a locally running daemon) and Collector (a standalone running service).

Read more at OpenTelemetry Service [Long-term Vision](https://github.com/open-telemetry/opentelemetry-service/blob/master/docs/VISION.md).

## Instrumentation adapters

The inspiration of the project is to make every library and application manageable out of the box by instrumenting it with OpenTelemetry. However, on the way to this goal there will be a need to enable instrumentation by plugging instrumentation adapters into the library of choice. These adapters can wrap library APIs, subscribe to library-specific callbacks, or translate telemetry exposed in other formats into the OpenTelemetry model.

Instrumentation adapters may go by different names. They are often referred to as a plugin, collector or auto-collector, telemetry module, bridge, etc. It is always recommended to follow the library and language standards. For instance, if an instrumentation adapter is implemented as a "log appender", it will probably be called an `appender`, not an instrumentation adapter.
However, if there is no established name, the recommendation is to call packages "Instrumentation Adapter" or simply "Adapter".

## Code injecting adapters

TODO: fill out as a result of SIG discussion.