Reorganize specification for clarity (#188)

* Reorganize specification for clarity

* Consolidate into a single readme
* New table of contents
* File names changed for better orgnanization
* Images moved to subdirectory
* No specification changes

* fix link

Co-Authored-By: Yang Song <songy23@users.noreply.github.com>

* clean up versioning statement

Co-Authored-By: Armin Ruech <armin.ruech@gmail.com>

* Clarify package/library layout

* Updated DistributedContext header
This commit is contained in:
Ted Young 2019-07-24 11:16:14 -07:00 committed by Yuri Shkuro
parent 5442afe730
commit 21a3adf78d
2 changed files with 0 additions and 425 deletions

View File

@ -1,151 +0,0 @@
# Semantic Conventions
This document defines reserved attributes that can be used to add operation and
protocol specific information.
In OpenTelemetry spans can be created freely and its up to the implementor to
annotate them with attributes specific to the represented operation. Spans
represent specific operations in and between systems. Some of these operations
represent calls that use well-known protocols like HTTP or database calls.
Depending on the protocol and the type of operation, additional information
is needed to represent and analyze a span correctly in monitoring systems. It is
also important to unify how this attribution is made in different languages.
This way, the operator will not need to learn specifics of a language and
telemetry collected from multi-language micro-service can still be easily
correlated and cross-analyzed.
## HTTP client
This span type represents an outbound HTTP request.
For a HTTP client span, `SpanKind` MUST be `Client`.
Given an [RFC 3986](https://www.ietf.org/rfc/rfc3986.txt) compliant URI of the form
`scheme:[//authority]path[?query][#fragment]`, the span name of the span SHOULD
be set to to the URI path value.
If a framework can identify a value that represents the identity of the request
and has a lower cardinality than the URI path, this value MUST be used for the span name instead.
| Attribute name | Notes and examples | Required? |
| :------------- | :----------------------------------------------------------- | --------- |
| `component` | Denotes the type of the span and needs to be `http`. | Yes |
| `http.method` | HTTP request method. E.g. `"GET"`. | Yes |
| `http.url` | HTTP host. E.g. `"https://example.com:779/users/187a34"`. | Yes |
| `http.status_code` | [HTTP response status code](https://tools.ietf.org/html/rfc7231). E.g. `200` | No |
| `http.status_text` | [HTTP reason phrase](https://www.ietf.org/rfc/rfc2616.txt). E.g. `OK` | No |
## HTTP server
This span type represents an inbound HTTP request.
For a HTTP server span, `SpanKind` MUST be `Server`.
Given an inbound request for a route (e.g. `"/users/:userID?"` the `name`
attribute of the span SHOULD be set to this route.
If the route can not be determined, the `name` attribute MUST be set to the [RFC 3986 URI](https://www.ietf.org/rfc/rfc3986.txt) path value.
If a framework can identify a value that represents the identity of the request
and has a lower cardinality than the URI path or route, this value MUST be used for the span name instead.
| Attribute name | Notes and examples | Required? |
| :------------- | :----------------------------------------------------------- | --------- |
| `component` | Denotes the type of the span and needs to be `http`. | Yes |
| `http.method` | HTTP request method. E.g. `"GET"`. | Yes |
| `http.url` | HTTP host. E.g. `"https://example.com:779/users/187a34"`. | Yes |
| `http.route` | The matched route. E.g. `"/users/:userID?"`. | No |
| `http.status_code` | [HTTP response status code](https://tools.ietf.org/html/rfc7231). E.g. `200` | No |
| `http.status_text` | [HTTP reason phrase](https://www.ietf.org/rfc/rfc2616.txt). E.g. `OK` | No |
## Databases client calls
For database client call the `SpanKind` MUST be `Client`.
Span `name` should be set to low cardinality value representing the statement
executed on the database. It may be stored procedure name (without argument), sql
statement without variable arguments, etc. When it's impossible to get any
meaningful representation of the span `name`, it can be populated using the same
value as `db.instance`.
Note, Redis, Cassandra, HBase and other storage systems may reuse the same
attribute names.
| Attribute name | Notes and examples | Required? |
| :------------- | :----------------------------------------------------------- | --------- |
| `component` | Database driver name or database name (when known) `JDBI`, `jdbc`, `odbc`, `postgreSQL`. | Yes |
| `db.type` | Database type. For any SQL database, `"sql"`. For others, the lower-case database category, e.g. `"cassandra"`, `"hbase"`, or `"redis"`. | Yes |
| `db.instance` | Database instance name. E.g., In java, if the jdbc.url=`"jdbc:mysql://db.example.com:3306/customers"`, the instance name is `"customers"`. | Yes |
| `db.statement` | A database statement for the given database type. Note, that the value may be sanitized to exclude sensitive information. E.g., for `db.type="sql"`, `"SELECT * FROM wuser_table"`; for `db.type="redis"`, `"SET mykey 'WuValue'"`. | Yes |
| `db.user` | Username for accessing database. E.g., `"readonly_user"` or `"reporting_user"` | No |
For database client calls, peer information can be populated and interpreted as
follows:
| Attribute name | Notes and examples | Required |
| :-------------- | :----------------------------------------------------------- | -------- |
| `peer.address` | JDBC substring like `"mysql://db.example.com:3306"` | Yes |
| `peer.hostname` | Remote hostname. `db.example.com` | Yes |
| `peer.ipv4` | Remote IPv4 address as a `.`-separated tuple. E.g., `"127.0.0.1"` | No |
| `peer.ipv6` | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., `"2001:0db8:85a3:0000:0000:8a2e:0370:7334"` | No |
| `peer.port` | Remote port. E.g., `80` (integer) | No |
| `peer.service` | Remote service name. Can be database friendly name or `db.instance` | No |
## gRPC
Implementations MUST create a span, when the gRPC call starts, one for
client-side and one for server-side. Outgoing requests should be a span `kind`
of `CLIENT` and incoming requests should be a span `kind` of `SERVER`.
Span `name` MUST be full gRPC method name formatted as:
```
$package.$service/$method
```
Examples of span name: `grpc.test.EchoService/Echo`.
### Attributes
| Attribute name | Notes and examples | Required? |
| -------------- | ------------------------------------------------------------ | --------- |
| `component` | Declares that this is a grpc component. Value MUST be `grpc` | Yes |
`peer.*` attributes MUST define service name as `peer.service`, host as
`peer.hostname` and port as `peer.port`.
### Status
Implementations MUST set status which MUST be the same as the gRPC client/server
status. The mapping between gRPC canonical codes and OpenTelemetry status codes
is 1:1 as OpenTelemetry canonical codes is just a snapshot of grpc codes which
can be found [here](https://github.com/grpc/grpc-go/blob/master/codes/codes.go).
### Events
In the lifetime of a gRPC stream, an event for each message sent/received on
client and server spans SHOULD be created with the following attributes:
```
-> [time],
"name" = "message",
"message.type" = "SENT",
"message.id" = id
"message.compressed_size" = <compressed size in bytes>,
"message.uncompressed_size" = <uncompressed size in bytes>
```
```
-> [time],
"name" = "message",
"message.type" = "RECEIVED",
"message.id" = id
"message.compressed_size" = <compressed size in bytes>,
"message.uncompressed_size" = <uncompressed size in bytes>
```
The `message.id` MUST be calculated as two different counters starting from `1`
one for sent messages and one for received message. This way we guarantee that
the values will be consistent between different implementations. In case of
unary calls only one sent and one received message will be recorded for both
client and server spans.

View File

@ -1,274 +0,0 @@
# Terminology
## Distributed Tracing
A distributed trace is a set of events, triggered as a result of a single
logical operation, consolidated across various components of an application. A
distributed trace contains events that cross process, network and security
boundaries. A distributed trace may be initiated when someone presses a button
to start an action on a website - in this example, the trace will represent
calls made between the downstream services that handled the chain of requests
initiated by this button being pressed.
### Trace
**Traces** in OpenTelemetry are defined implicitly by their **Spans**. In
particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of
**Spans**, where the edges between **Spans** are defined as parent/child
relationship.
For example, the following is an example **Trace** made up of 8 **Spans**:
```
Causal relationships between Spans in a single Trace
[Span A] ←←←(the root span)
|
+------+------+
| |
[Span B] [Span C] ←←←(Span C is a `child` of Span A)
| |
[Span D] +---+-------+
| |
[Span E] [Span F]
```
Sometimes it's easier to visualize **Traces** with a time axis as in the diagram
below:
```
Temporal relationships between Spans in a single Trace
||||||||> time
[Span A···················································]
[Span B··············································]
[Span D··········································]
[Span C········································]
[Span E·······] [Span F··]
```
### Span
Each **Span** encapsulates the following state:
- An operation name
- A start and finish timestamp
- A set of zero or more key:value **Attributes**. The keys must be strings. The
values may be strings, bools, or numeric types.
- A set of zero or more **Events**, each of which is itself a key:value map
paired with a timestamp. The keys must be strings, though the values may be of
the same types as Span **Attributes**.
- Parent's **Span** identifier.
- [**Links**](#links-between-spans) to zero or more causally-related **Spans**
(via the **SpanContext** of those related **Spans**).
- **SpanContext** identification of a Span. See below.
### SpanContext
Represents all the information that identifies **Span** in the **Trace** and
MUST be propagated to child Spans and across process boundaries. A
**SpanContext** contains the tracing identifiers and the options that are
propagated from parent to child **Spans**.
- **TraceId** is the identifier for a trace. It is worldwide unique with
practically sufficient probability by being made as 16 randomly generated
bytes. TraceId is used to group all spans for a specific trace together across
all processes.
- **SpanId** is the identifier for a span. It is globally unique with
practically sufficient probability by being made as 8 randomly generated
bytes. When passed to a child Span this identifier becomes the parent span id
for the child **Span**.
- **TraceOptions** represents the options for a trace. It is represented as 1
byte (bitmap).
- Sampling bit - Bit to represent whether trace is sampled or not (mask
`0x1`).
- **Tracestate** carries tracing-system specific context in a list of key value
pairs. **Tracestate** allows different vendors propagate additional
information and inter-operate with their legacy Id formats. For more details
see [this](https://w3c.github.io/trace-context/#tracestate-field).
### Links between spans
A **Span** may be linked to zero or more other **Spans** (defined by
**SpanContext**) that are causally related. **Links** can point to
**SpanContexts** inside a single **Trace** or across different **Traces**.
**Links** can be used to represent batched operations where a **Span** has
multiple parents, each representing a single incoming item being processed in
the batch. Another example of using a **Link** is to declare relationship
between originating and restarted trace. This can be used when **Trace** enters
trusted boundaries of an service and service policy requires to generate a new
Trace instead of trusting incoming Trace context.
## Metrics
OpenTelemetry allows to record raw measurements or metrics with predefined
aggregation and set of labels.
Recording raw measurements using OpenTelemetry API allows to defer to end-user
the decision on what aggregation algorithm should be applied for this metric as
well as defining labels (dimensions). It will be used in client libraries like
gRPC to record raw measurements "server_latency" or "received_bytes". So end
user will decide what type of aggregated values should be collected out of these
raw measurements. It may be simple average or elaborate histogram calculation.
Recording of metrics with the pre-defined aggregation using OpenTelemetry API is
not less important. It allows to collect values like cpu and memory usage, or
simple metrics like "queue length".
### Recording raw measurements
The main classes used to record raw measurements are `Measure` and
`Measurement`. List of `Measurement`s alongside the additional context can be
recorded using OpenTelemetry API. So user may define to aggregate those
`Measurement`s and use the context passed alongside to define additional
dimensions of the resulting metric.
#### Measure
`Measure` describes the type of the individual values recorded by a library. It
defines a contract between the library exposing the measurements and an
application that will aggregate those individual measurements into a `Metric`.
`Measure` is identified by name, description and a unit of values.
#### Measurement
`Measurement` describes a single value to be collected for a `Measure`.
`Measurement` is an empty interface in API surface. This interface is defined in
SDK.
### Recording metrics with predefined aggregation
The base class for all types of pre-aggregated metrics is called `Metric`. It
defines basic metric properties like a name and labels. Classes inheriting from
the `Metric` define their aggregation type as well as a structure of individual
measurements or Points. API defines the following types of pre-aggregated
metrics:
- Counter metric to report instantaneous measurement. Counter values can go
up or stay the same, but can never go down. Counter values cannot be
negative. There are two types of counter metric values - `double` and `long`.
- Gauge metric to report instantaneous measurement of a double value. Gauges can
go both up and down. The gauges values can be negative. There are two types of
gauge metric values - `double` and `long`.
API allows to construct the `Metric` of a chosen type. SDK defines the way to
query the current value of a `Metric` to be exported.
Every type of a `Metric` has it's API to record values to be aggregated. API
supports both - push and pull model of setting the `Metric` value.
### Metrics data model and SDK
Metrics data model is defined in SDK and is based on
[metrics.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/master/src/opentelemetry/proto/metrics/v1/metrics.proto).
This data model is used by all the OpenTelemetry exporters as an input.
Different exporters have different capabilities (e.g. which data types are
supported) and different constraints (e.g. which characters are allowed in label
keys). Metrics is intended to be a superset of what's possible, not a lowest
common denominator that's supported everywhere. All exporters consume data from
Metrics Data Model via a Metric Producer interface defined in OpenTelemetry SDK.
Because of this, Metrics puts minimal constraints on the data (e.g. which
characters are allowed in keys), and code dealing with Metrics should avoid
validation and sanitization of the Metrics data. Instead, pass the data to the
backend, rely on the backend to perform validation, and pass back any errors
from the backend.
OpenTelemetry defines the naming convention for metric names as well as a
well-known metric names in [Semantic Conventions](semantic-conventions.md)
document.
## DistributedContext
**DistributedContext** is an abstract data type that represents collection of entries.
Each key of **DistributedContext** is associated with exactly one value. **DistributedContext** is serializable,
to facilitate propagating it not only inside the process but also across process boundaries.
**DistributedContext** is used to annotate telemetry with the name:value pair **Entry**.
Those values can be used to add dimension to the metric or additional contest properties to logs and traces.
**DistributedContext** is a recommended name but languages can have more language-specific names like **dctx**.
### Entry
An **Entry** is used to label anything that is associated with a specific operation,
such as an HTTP request. It consists of **EntryKey**, **EntryValue** and **EntryMetadata**.
- **EntryKey** is the name of the **Entry**. **EntryKey** along with **EntryValue**
can be used to aggregate and group stats, annotate traces and logs, etc. **EntryKey** is
a string that contains only printable ASCII (codes between 32 and 126 inclusive) and with
a length greater than zero and less than 256.
- **EntryValue** is a string that contains only printable ASCII (codes between 32 and 126).
- **EntryMetadata** contains properties associated with an **Entry**.
For now only the property **EntryTTL** is defined.
- **EntryTTL** is an integer that represents number of hops an entry can propagate.
Anytime a sender serializes an entry, sends it over the wire and receiver unserializes
the entry then the entry is considered to have travelled one hop.
## Resources
`Resource` captures information about the entity for which telemetry is
recorded. For example, metrics exposed by a Kubernetes container can be linked
to a resource that specifies the cluster, namespace, pod, and container name.
`Resource` may capture an entire hierarchy of entity identification. It may
describe the host in the cloud and specific container or an application running
in the process.
Note, that some of the process identification information can be associated with
telemetry automatically by OpenTelemetry SDK or specific exporter. See
OpenTelemetry
[proto](https://github.com/open-telemetry/opentelemetry-proto/blob/a46c815aa5e85a52deb6cb35b8bc182fb3ca86a0/src/opentelemetry/proto/agent/common/v1/common.proto#L28-L96)
for an example.
**TODO**: Better describe the difference between the resource and a Node
https://github.com/open-telemetry/opentelemetry-proto/issues/17
## Propagators
OpenTelemetry uses `Propagators` to serialize and deserialize `SpanContext` and `DistributedContext`
into a binary or text format. Currently there are two types of propagators:
- `BinaryFormat` which is used to serialize and deserialize a value into a binary representation.
- `HTTPTextFormat` which is used to inject and extract a value as text into carriers that travel
in-band across process boundaries.
## Agent/Collector
The OpenTelemetry service is a set of components that can collect traces,
metrics and eventually other telemetry data (e.g. logs) from processes
instrumented by OpenTelementry or other monitoring/tracing libraries (Jaeger,
Prometheus, etc.), do aggregation and smart sampling, and export traces and
metrics to one or more monitoring/tracing backends. The service will allow to
enrich and transform collected telemetry (e.g. add additional attributes or
scrub personal information).
The OpenTelemetry service has two primary modes of operation: Agent (a locally
running daemon) and Collector (a standalone running service).
Read more at OpenTelemetry Service [Long-term
Vision](https://github.com/open-telemetry/opentelemetry-service/blob/master/docs/VISION.md).
## Instrumentation adapters
The inspiration of the project is to make every library and application
manageable out of the box by instrumenting it with OpenTelemery. However on the
way to this goal there will be a need to enable instrumentation by plugging
instrumentation adapters into the library of choice. These adapters can be
wrapping library APIs, subscribing to the library-specific callbacks or
translating telemetry exposed in other formats into OpenTelemetry model.
Instrumentation adapters may be called different names. It is often referred as
plugin, collector or auto-collector, telemetry module, bridge, etc. It is always
recommended to follow the library and language standards. For instance, if
instrumentation adapter is implemented as "log appender" - it will probably be
called an `appender`, not an instrumentation adapter. However if there is no
established name - the recommendation is to call packages "Instrumentation
Adapter" or simply "Adapter".
## Code injecting adapters
TODO: fill out as a result of SIG discussion.