Merge in OpenCensus, OpenTracing specs (#17)

* Copy OC specs into specification dir

* Copy OT spec into specification/README.md

* Remove work-in-progress files

* Move specification/ into work_in_progress/
Chris Kleinknecht 2019-05-21 15:06:25 -07:00 committed by Sergey Kanzhelev
parent 285425de2b
commit 0e1e0f45d2
16 changed files with 1112 additions and 1 deletion

View File

@ -1 +0,0 @@
# Specification

View File

@ -0,0 +1,259 @@
# Specification
**Version:** 1.1
## Document Overview
This is the "formal" OpenTracing semantic specification. Since OpenTracing must work across many languages, this document takes care to avoid language-specific concepts. That said, there is an understanding throughout that all languages have some concept of an "interface" which encapsulates a set of related capabilities.
### Versioning policy
The OpenTracing specification uses a `Major.Minor` version number but has no `.Patch` component. The major version increments when backwards-incompatible changes are made to the specification. The minor version increments for non-breaking changes like the introduction of new standard tags, log fields, or SpanContext reference types. (You can read more about the motivation for this versioning scheme at Issue [specification#2](https://github.com/opentracing/specification/issues/2#issuecomment-261740811))
## The OpenTracing Data Model
**Traces** in OpenTracing are defined implicitly by their **Spans**. In
particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of
**Spans**, where the edges between **Spans** are called **References**.
For example, the following is an example **Trace** made up of 8 **Spans**:
~~~
Causal relationships between Spans in a single Trace

        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                       ↑
                                       ↑
                                       ↑
                         (Span G `FollowsFrom` Span F)
~~~
Sometimes it's easier to visualize **Traces** with a time axis as in the
diagram below:
~~~
Temporal relationships between Spans in a single Trace

--|-------|-------|-------|-------|-------|-------|-------|-> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]
~~~
Each **Span** encapsulates the following state:
- An operation name
- A start timestamp
- A finish timestamp
- A set of zero or more key:value **Span Tags**. The keys must be strings. The
values may be strings, bools, or numeric types.
- A set of zero or more **Span Logs**, each of which is itself a key:value map
paired with a timestamp. The keys must be strings, though the values may be
of any type. Not all OpenTracing implementations must support every value
type.
- A **SpanContext** (see below)
- [**References**](#references-between-spans) to zero or more causally-related **Spans** (via the
**SpanContext** of those related **Spans**)
Each **SpanContext** encapsulates the following state:
- Any OpenTracing-implementation-dependent state (for example, trace and span ids) needed to refer to a distinct **Span** across a process boundary
- **Baggage Items**, which are just key:value pairs that cross process boundaries
### References between Spans
A Span may reference zero or more other **SpanContexts** that are causally related. OpenTracing presently defines two types of references: `ChildOf` and `FollowsFrom`. **Both reference types specifically model direct causal relationships between a child Span and a parent Span.** In the future, OpenTracing may also support reference types for Spans with non-causal relationships (e.g., Spans that are batched together, Spans that are stuck in the same queue, etc).
**`ChildOf` references:** A Span may be the `ChildOf` a parent Span. In a `ChildOf` reference, the parent Span depends on the child Span in some capacity. All of the following would constitute `ChildOf` relationships:
- A Span representing the server side of an RPC may be the `ChildOf` a Span representing the client side of that RPC
- A Span representing a SQL insert may be the `ChildOf` a Span representing an ORM save method
- Many Spans doing concurrent (perhaps distributed) work may all individually be the `ChildOf` a single parent Span that merges the results for all children that return within a deadline
These could all be valid timing diagrams for children that are the `ChildOf` a parent.
~~~
    [-Parent Span---------]
         [-Child Span----]

    [-Parent Span--------------]
         [-Child Span A----]
          [-Child Span B----]
        [-Child Span C----]
         [-Child Span D---------------]
         [-Child Span E----]
~~~
**`FollowsFrom` references:** Some parent Spans do not depend in any way on the result of their child Spans. In these cases, we say merely that the child Span `FollowsFrom` the parent Span in a causal sense. There are many distinct `FollowsFrom` reference sub-categories, and in future versions of OpenTracing they may be distinguished more formally.
These can all be valid timing diagrams for children that "FollowsFrom" a parent.
~~~
    [-Parent Span-]  [-Child Span-]


    [-Parent Span--]
     [-Child Span-]


    [-Parent Span-]
                [-Child Span-]
~~~
## The OpenTracing API
There are three critical and inter-related types in the OpenTracing specification: `Tracer`, `Span`, and `SpanContext`. Below, we go through the behaviors of each type; roughly speaking, each behavior becomes a "method" in a typical programming language, though it may actually be a set of related sibling methods due to type overloading and so on.
When we discuss "optional" parameters, it is understood that different languages have different ways to construe such concepts. For example, in Go we might use the "functional Options" idiom, whereas in Java we might use a builder pattern.
### `Tracer`
The `Tracer` interface creates `Span`s and understands how to `Inject`
(serialize) and `Extract` (deserialize) them across process boundaries.
Formally, it has the following capabilities:
#### Start a new `Span`
Required parameters
- An **operation name**, a human-readable string which concisely represents the work done by the Span (for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation). The operation name should be **the most general string that identifies a (statistically) interesting class of `Span` instances**. That is, `"get_user"` is better than `"get_user/314159"`.
For example, here are potential **operation names** for a `Span` that gets hypothetical account information:
| Operation Name | Guidance |
|:---------------|:--------|
| `get` | Too general |
| `get_account/792` | Too specific |
| `get_account` | Good, and `account_id=792` would make a nice **`Span` tag** |
Optional parameters
- Zero or more **references** to related `SpanContext`s, including a shorthand for `ChildOf` and `FollowsFrom` reference types if possible.
- An optional explicit **start timestamp**; if omitted, the current walltime is used by default
- Zero or more **tags**
**Returns** a `Span` instance that's already started (but not `Finish`ed)
#### Inject a `SpanContext` into a carrier
Required parameters
- A **`SpanContext`** instance
- A **format** descriptor (typically but not necessarily a string constant) which tells the `Tracer` implementation how to encode the `SpanContext` in the carrier parameter
- A **carrier**, whose type is dictated by the **format**. The `Tracer` implementation will encode the `SpanContext` in this carrier object according to the **format**.
#### Extract a `SpanContext` from a carrier
Required parameters
- A **format** descriptor (typically but not necessarily a string constant) which tells the `Tracer` implementation how to decode `SpanContext` from the carrier parameter
- A **carrier**, whose type is dictated by the **format**. The `Tracer` implementation will decode the `SpanContext` from this carrier object according to **format**.
**Returns** a `SpanContext` instance suitable for use as a **reference** when starting a new `Span` via the `Tracer`.
#### Note: required **format**s for injection and extraction
Both injection and extraction rely on an extensible **format** parameter that dictates the type of the associated "carrier" as well as how a `SpanContext` is encoded in that carrier. All of the following **format**s must be supported by all Tracer implementations.
- **Text Map**: an arbitrary string-to-string map with an unrestricted character set for both keys and values
- **HTTP Headers**: a string-to-string map with keys and values that are suitable for use in HTTP headers (a la [RFC 7230](https://tools.ietf.org/html/rfc7230#section-3.2.4)). In practice, since there is such "diversity" in the way that HTTP headers are treated in the wild, it is strongly recommended that Tracer implementations use a limited HTTP header key space and escape values conservatively.
- **Binary**: a (single) arbitrary binary blob representing a `SpanContext`
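To make `Inject` and `Extract` concrete, here is a minimal sketch in Go using the `opentracing-go` package, wiring a `SpanContext` through the **HTTP Headers** format. The operation name and the helper functions are illustrative only; the `Tracer` calls themselves are the package's real API.

~~~go
package example

import (
	"net/http"

	opentracing "github.com/opentracing/opentracing-go"
)

// injectHeaders encodes a span's SpanContext into an outgoing HTTP request
// using the HTTP Headers format and its standard carrier.
func injectHeaders(tracer opentracing.Tracer, span opentracing.Span, req *http.Request) error {
	return tracer.Inject(
		span.Context(),
		opentracing.HTTPHeaders,
		opentracing.HTTPHeadersCarrier(req.Header),
	)
}

// serverSpan extracts a SpanContext from an incoming request and starts a
// server-side span that is a ChildOf the client-side span.
func serverSpan(tracer opentracing.Tracer, req *http.Request) opentracing.Span {
	parent, err := tracer.Extract(
		opentracing.HTTPHeaders,
		opentracing.HTTPHeadersCarrier(req.Header),
	)
	if err != nil {
		// No usable SpanContext in the carrier: start a new trace instead.
		return tracer.StartSpan("get_account")
	}
	return tracer.StartSpan("get_account", opentracing.ChildOf(parent))
}
~~~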
### `Span`
With the exception of the method to retrieve the `Span`'s `SpanContext`, none of the below may be called after the `Span` is finished.
#### Retrieve the `Span`'s `SpanContext`
There should be no parameters.
**Returns** the `SpanContext` for the given `Span`. The returned value may be used even after the `Span` is finished.
#### Overwrite the operation name
Required parameters
- The new **operation name**, which supersedes whatever was passed in when the `Span` was started
#### Finish the `Span`
Optional parameters
- An explicit **finish timestamp** for the `Span`; if omitted, the current walltime is used implicitly.
With the exception of the method to retrieve a `Span`'s `SpanContext`, no method may be called on a `Span` instance after it's finished.
#### Set a `Span` tag
Required parameters
- The tag key, which must be a string
- The tag value, which must be either a string, a boolean value, or a numeric type
Note that the OpenTracing project documents certain **["standard tags"](./semantic_conventions.md#span-tags-table)** that have prescribed semantic meanings.
#### Log structured data
Required parameters
- One or more key:value pairs, where the keys must be strings and the values may have any type at all. Some OpenTracing implementations may support more value types than others.
Optional parameters
- An explicit timestamp. If specified, it must fall between the local start and finish time for the span.
Note that the OpenTracing project documents certain **["standard log keys"](./semantic_conventions.md#log-fields-table)** which have prescribed semantic meanings.
#### Set a **baggage** item
Baggage items are key:value string pairs that apply to the given `Span`, its `SpanContext`, and **all `Spans` which directly or transitively _reference_ the local `Span`.** That is, baggage items propagate in-band along with the trace itself.
Baggage items enable powerful functionality given a full-stack OpenTracing integration (for example, arbitrary application data from a mobile app can make it, transparently, all the way into the depths of a storage system), but they also carry real costs: use this feature thoughtfully and with care. Every key and value is copied into every local *and remote* child of the associated Span, and that can add up to a lot of network and CPU overhead.
Required parameters
- The **baggage key**, a string
- The **baggage value**, a string
#### Get a **baggage** item
Required parameters
- The **baggage key**, a string
**Returns** either the corresponding **baggage value**, or some indication that such a value was missing.
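A short sketch of the baggage API in Go (`opentracing-go`); the key `customer_tier` is purely illustrative. Note that the Go binding signals a missing key with an empty string rather than a separate indicator.

~~~go
package example

import (
	opentracing "github.com/opentracing/opentracing-go"
)

func checkout(tracer opentracing.Tracer) {
	span := tracer.StartSpan("checkout")
	defer span.Finish()

	// The item becomes visible on this span and on every span that
	// directly or transitively references it, even across processes.
	span.SetBaggageItem("customer_tier", "gold")

	// Reading the item back; "" means the key was missing.
	_ = span.BaggageItem("customer_tier")
}
~~~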
### `SpanContext`
The `SpanContext` is more of a "concept" than a useful piece of functionality at the generic OpenTracing layer. That said, it is of critical importance to OpenTracing *implementations* and does present a thin API of its own. Most OpenTracing users only interact with `SpanContext` via [**references**](#references-between-spans) when starting new `Span`s, or when injecting/extracting a trace to/from some transport protocol.
In OpenTracing we force `SpanContext` instances to be **immutable** in order to avoid complicated lifetime issues around `Span` finish and references.
#### Iterate through all baggage items
This is modeled in different ways depending on the language, but semantically the caller should be able to efficiently iterate through all baggage items in one pass given a `SpanContext` instance.
### `NoopTracer`
All OpenTracing language APIs must also provide some sort of `NoopTracer` implementation which can be used to flag-control OpenTracing or inject something harmless for tests (et cetera). In some cases (for example, Java) the `NoopTracer` may be in its own packaging artifact.
### Optional API Elements
Some languages also provide utilities to pass an active `Span` and/or `SpanContext` around a single process. For instance, `opentracing-go` provides helpers to set and get the active `Span` in Go's `context.Context` mechanism.

View File

@ -0,0 +1,146 @@
# BINARY FORMAT
The binary format can be used to encode different data types, each with different fields. This
document first describes the general format and then applies it to specific data types,
including Trace Context and Tag Context.
## General Format
Each encoding will have a 1-byte version followed by the version format encoding:
`<version><version_format>`
This allows us to switch completely to a new format within a single deprecation cycle, if needed.
## Version Format (version_id = 0)
The version format for version_id = 0 is based on ideas from proto encoding. The main
requirements are to allow adding and removing fields in less than one deprecation cycle. It
contains a list of fields:
`<field><field>...`
### Field
Each field is a 1-byte field ID paired with a field value, where the format of the field value is
determined by both the field ID and the data type. For example, field 0 in `Trace Context` may
have a completely different format than field 0 in `Tag Context` or field 1 in `Trace Context`.
Each field that we send on the wire will have the following format:
`<field_id><field_format>`
* `field_id` is a single byte.
* `field_format` must be defined for each field separately.
The specification for a data type's format must also specify whether each field is optional or
repeated. For example, `Trace-id` in `Trace Context` is optional, and `Tag` in `Tag Context`
is repeated. The specification for a data type's format MAY define a default value for any
optional field, which must be used when the field is missing.
The specification for a data type can define versions within a version of the format, called data
type version, where each data type version adds new fields. The data type version can be useful
for describing what fields an implementation supports, but it is not included in the
serialized data.
### Serialization Rules
Fields MUST be serialized in data type version order (i.e. all fields from version (i) of a data
type must precede all fields from version (i+1)). That is because each field has its own format,
and old implementations may not be able to determine where newer field values end. This ordering
allows old decoders to ignore any new fields when they do not know the format for those fields.
Fields within a data type version can be serialized in any order, and fields with the same field
ID do not need to be serialized consecutively.
### Deserialization Rules
Because all the fields will be decoded in data type version order, the deserialization will
simply read the encoded input until the end of the input or until the first unknown field_id. An
unknown field id should not be considered a parse error. Implementations MAY pass on any fields
that they cannot decode, when possible (by passing-through the whole opaque tail of bytes
starting with the first field id that the current binary does not understand).
### How can we add new fields?
If we follow the rule of always appending new field ids at the end of the buffer, we can add
field ids up to 127.
TODO(bdrutu): Decide what to do after 127: a) use varint encoding or b) just reserve 255 as a
continuation byte.
### How can we remove a field?
We can stop sending any field at any moment and the decoders will be able to skip the missing ids
and use the default values.
### Trace Context
#### Fields added in Trace Context version 0
##### Trace-id
* optional
* `field_id` = 0
* `len` = 16
This is the ID of the whole trace forest. It is represented as an opaque 16-byte array,
e.g. (in hex) `4bf92f3577b34da6a3ce929d000e4736`. An all-zero value is considered invalid.
##### Span-id
* optional
* `field_id` = 1
* `len` = 8
This is the ID of the caller span (the parent). It is represented as an opaque 8-byte array,
e.g. (in hex) `34f067aa0ba902b7`. An all-zero value is considered invalid.
##### Trace-options
* optional
* `field_id` = 2
* `len` = 1
Controls tracing options such as sampling, trace level, etc. It is 1 byte
representing an 8-bit unsigned integer. The least significant bit is a
recommendation as to whether the request should be traced (1 recommends that the
request be traced; 0 means the caller makes no decision to trace,
and the decision might be deferred). The flags are recommendations given by the
caller rather than strict rules to follow, for three reasons:
1. Trust and abuse.
2. A bug in the caller.
3. Different load between the caller service and the callee service might force the callee to downsample.
The behavior of other bits is currently undefined.
#### Valid example
```
{0,
 0, 75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54,
 1, 52, 240, 103, 170, 11, 169, 2, 183,
 2, 1}
```
This corresponds to:
* `traceId` = {75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54}
* `spanId` = {52, 240, 103, 170, 11, 169, 2, 183}
* `traceOptions` = 1
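To make the field rules concrete, below is a Go sketch of a decoder for this format; the `traceContext` struct and function names are illustrative, not part of the specification.

```go
package encoding

import "errors"

// traceContext holds the three fields defined in Trace Context version 0.
type traceContext struct {
	TraceID      [16]byte
	SpanID       [8]byte
	TraceOptions byte
}

// decodeTraceContext parses <version><field_id><value>... as described above.
// Decoding stops at the first unknown field id, which is not a parse error.
func decodeTraceContext(b []byte) (traceContext, error) {
	var tc traceContext
	if len(b) == 0 || b[0] != 0 {
		return tc, errors.New("unsupported version")
	}
	b = b[1:]
	for len(b) > 0 {
		fieldID := b[0]
		b = b[1:]
		switch fieldID {
		case 0: // Trace-id, len = 16
			if len(b) < 16 {
				return tc, errors.New("truncated trace-id")
			}
			copy(tc.TraceID[:], b[:16])
			b = b[16:]
		case 1: // Span-id, len = 8
			if len(b) < 8 {
				return tc, errors.New("truncated span-id")
			}
			copy(tc.SpanID[:], b[:8])
			b = b[8:]
		case 2: // Trace-options, len = 1
			if len(b) < 1 {
				return tc, errors.New("truncated trace-options")
			}
			tc.TraceOptions = b[0]
			b = b[1:]
		default:
			// Unknown field: its length is unknown, so stop here and ignore
			// (or pass through) the remaining opaque tail of bytes.
			return tc, nil
		}
	}
	return tc, nil
}
```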
### Tag Context
The Tag Context format uses Varint encoding, which is described in
https://developers.google.com/protocol-buffers/docs/encoding#varints.
#### Fields added in Tag Context version 0
##### Tag
* repeated
* `field_id` = 0
* `field_format` = `<tag_key_len><tag_key><tag_val_len><tag_val>` where
* `tag_key_len` is a varint encoded integer.
* `tag_key` is `tag_key_len` bytes comprising the tag key name.
* `tag_val_len` is a varint encoded integer.
* `tag_val` is `tag_val_len` bytes comprising the tag value.
* Tags can be serialized in any order.
* Multiple tag fields can contain the same tag key. All but the last value for
that key should be ignored.
* The
[size limit for serialized Tag Contexts](https://github.com/census-instrumentation/opencensus-specs/blob/master/tags/TagMap.md#limits)
should apply to all tag fields, even if some of them have duplicate keys. For
example, a serialized tag context with 10,000 small tags that all have the
same key should be considered too large.
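A sketch of a Tag Context serializer following the rules above, using the Go standard library's varint helper (`binary.AppendUvarint`, available since Go 1.19); the function names are illustrative.

```go
package encoding

import "encoding/binary"

// appendTag appends one Tag field:
// <field_id><tag_key_len><tag_key><tag_val_len><tag_val>, with varint lengths.
func appendTag(buf []byte, key, val string) []byte {
	buf = append(buf, 0) // field_id = 0 (Tag)
	buf = binary.AppendUvarint(buf, uint64(len(key)))
	buf = append(buf, key...)
	buf = binary.AppendUvarint(buf, uint64(len(val)))
	buf = append(buf, val...)
	return buf
}

// encodeTagContext serializes a whole Tag Context: a version byte of 0
// followed by one Tag field per key/value pair, in any order.
func encodeTagContext(tags map[string]string) []byte {
	buf := []byte{0} // version_id = 0
	for k, v := range tags {
		buf = appendTag(buf, k, v)
	}
	return buf
}
```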

View File

@ -0,0 +1,47 @@
### Census Server Stats
The encoding is based on [BinaryEncoding](BinaryEncoding.md).
#### Fields added in Census Server Stats version 0
##### LB-Latency-Ns
* optional
* `field_id` = 0
* `len` = 8
Request processing latency observed on the load balancer, in nanoseconds.
It is an int64, little-endian.
##### Server-Latency-Ns
* optional
* `field_id` = 1
* `len` = 8
Request processing latency observed on the server, in nanoseconds.
It is an int64, little-endian.
##### Trace-Options
* optional
* `field_id` = 2
* `len` = 1
It is 1 byte representing an 8-bit unsigned integer. The least significant
bit indicates whether the request was sampled on the server (1 = sampled,
0 = not sampled).
The behavior of other bits is currently undefined.
#### Valid example (Hex)
```
{0,
 0, 38, C7, 0, 0, 0, 0, 0, 0,
 1, 50, C3, 0, 0, 0, 0, 0, 0,
 2, 1}
```
This corresponds to:
* `lb_latency_ns` = 51000 (0x000000000000C738)
* `server_latency_ns` = 50000 (0x000000000000C350)
* `trace_options` = 1 (0x01)

View File

@ -0,0 +1,15 @@
# OpenCensus Library Encoding Package
This documentation serves to document the on-the-wire encoding format supported in
OpenCensus. It describes the key types and the overall behavior.
## Formats
* [Binary Encoding](BinaryEncoding.md)
* [TraceContext Binary Encoding](BinaryEncoding.md#trace-context)
* [TagContext Binary Encoding](BinaryEncoding.md#tag-context)
* [Census Server Stats Encoding](CensusServerStatsEncoding.md)
* HTTP Encoding
* [W3C TraceContext](https://github.com/w3c/trace-context)
* [W3C Correlation Context](https://github.com/w3c/correlation-context)
* [Stackdriver TraceContext Header](https://cloud.google.com/trace/docs/support)
* [B3 TraceContext Header](https://github.com/openzipkin/b3-propagation)

View File

@ -0,0 +1,21 @@
# Metrics
Metrics are a data model for what stats exporters take as input.
Different exporters have different capabilities (e.g. which data types
are supported) and different constraints (e.g. which characters are allowed in
label keys). Metrics is intended to be a superset of what's possible, not a
lowest common denominator that's supported everywhere.
Because of this, Metrics puts minimal constraints on the data (e.g. which
characters are allowed in keys), and code dealing with Metrics should avoid
validation and sanitization of the Metrics data. Instead, pass the data to the
backend, rely on the backend to perform validation, and pass back any errors
from the backend.
The Metrics data model is defined as
[metrics.proto](https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/metrics/v1/metrics.proto),
but the proto is just to illustrate the concepts. OpenCensus implementations
don't have to use the actual proto, and can instead use a language-specific
in-memory data structure that captures what exporters need. This structure
should use the names and fields from the data model, for API consistency across
languages.

View File

@ -0,0 +1,8 @@
# OpenCensus Library Metrics Package
This documentation serves to document the "look and feel" of the open source metrics package. It
describes the key types and the overall behavior.
Note: This is an experimental package and is likely to get backwards-incompatible updates in the future.
## Main APIs
* [Metrics Data Model](Metrics.md): defines the metrics data model.

View File

@ -0,0 +1,14 @@
# OpenCensus Library Resource Package
This documentation serves to document the "look and feel" of the OpenCensus resource package.
It describes the key types and overall behavior.
The resource library primarily defines a type "Resource" that captures information about the
entity for which stats or traces are recorded. For example, metrics exposed by a Kubernetes
container can be linked to a resource that specifies the cluster, namespace, pod, and container name.
The primary purpose of resources as a first-class concept in the core library is to decouple
discovery of resource information from exporters. This allows for independent development and easy
customization for users that need to integrate with closed-source environments.
## Main APIs
* [Resource](Resource.md)

View File

@ -0,0 +1,149 @@
# Resource API Overview
The resource library primarily defines a type that captures information about the entity
for which metrics or traces are reported. It further provides a framework for detection of
resource information from the environment and progressive population as signals propagate
from the core instrumentation library to a backend's exporter.
## Resource type
A `Resource` describes the entity for which a signal was collected through two fields:
* `type`: an optional string which describes a well-known type of resource.
* `labels`: a dictionary of labels with string keys and values that provide information
about the entity.
Type, label keys, and label values MUST contain only printable ASCII (codes between 32
and 126, inclusive) and not exceed 256 characters.
Type and label keys MUST have a length greater than zero. Label keys SHOULD start with the type
and separate hierarchies with `.` characters, e.g. `k8s.namespace.name`.
Implementations MAY define a `Resource` data type, constructed from the parameters above.
`Resource` MUST have getters for retrieving all the information used in `Resource` definition.
Example in Go:
```go
type Resource struct {
    Type   string
    Labels map[string]string
}
```
For the proto definition see [here][resource-proto-link].
## Populating resources
Resource information MAY be populated at any point between startup of the instrumented
application and passing it to a backend-specific exporter. This explicitly includes
the path through future OpenCensus components such as agents or services.
For example, process-identifying information may be populated through the library while
an agent attaches further labels about the underlying VM, the cluster, or geo-location.
### From environment variables
Population of resource information from environment variables MUST be provided by the
core library. It provides the user with a ubiquitous way to manually supply information
that may not be detectable automatically through available integration libraries.
Two environment variables are used:
* `OC_RESOURCE_TYPE`: defines the resource type. Leading and trailing whitespaces are trimmed.
* `OC_RESOURCE_LABELS`: defines resource labels as a comma-separated list of key/value pairs
(`[ <key>="value" [ ,<key>="<value>" ... ] ]`). `"` characters in values MUST be escaped with `\`.
For example:
* `OC_RESOURCE_TYPE=container`
* `OC_RESOURCE_LABELS=container.name="c1",k8s.pod.name="pod-xyz-123",k8s.namespace.name="default"`
Population from environment variables MUST be the first applied detection process unless
the user explicitly overwrites this behavior.
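A simplified Go sketch of this population step; it assumes label values contain no commas and no escaped quotes, both of which a conforming parser would also have to handle.

```go
package resource

import (
	"os"
	"strings"
)

// fromEnv reads the two environment variables defined above. This sketch
// silently ignores malformed pairs instead of reporting an error.
func fromEnv() (string, map[string]string) {
	typ := strings.TrimSpace(os.Getenv("OC_RESOURCE_TYPE"))
	labels := map[string]string{}
	for _, pair := range strings.Split(os.Getenv("OC_RESOURCE_LABELS"), ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			continue
		}
		key := strings.TrimSpace(kv[0])
		val := strings.Trim(strings.TrimSpace(kv[1]), `"`)
		if key != "" {
			labels[key] = val
		}
	}
	return typ, labels
}
```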
### Auto-detection
Auto-detection of resource information in specific environments, e.g. specific cloud
vendors, MUST be implemented outside of the core libraries in third-party or
[census-ecosystem][census-ecosystem-link] repositories.
### Merging
As different mechanisms are run to gain information about a resource, their information
has to be merged into a single resulting resource.
Already set labels or type fields MUST NOT be overwritten unless they are the empty string. Label key
namespacing SHOULD be used to prevent collisions across different resource detection steps.
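A sketch of this merge rule in Go, reusing the illustrative `Resource` type from above: fields from earlier detection steps win unless they are empty.

```go
package resource

// merge returns a new Resource in which fields already set in a take
// precedence, and fields from b only fill in values that are empty in a.
func merge(a, b *Resource) *Resource {
	if a == nil {
		return b
	}
	if b == nil {
		return a
	}
	out := &Resource{Type: a.Type, Labels: map[string]string{}}
	if out.Type == "" {
		out.Type = b.Type
	}
	for k, v := range b.Labels { // start with the later (lower-priority) labels
		out.Labels[k] = v
	}
	for k, v := range a.Labels { // overlay earlier labels unless they are empty
		if v != "" || out.Labels[k] == "" {
			out.Labels[k] = v
		}
	}
	return out
}
```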
### Detectors
To make auto-detection implementations easy to use, the core resource package SHOULD define
an interface to retrieve resource information. Additionally, helper functionality MAY be
provided to effectively make use of this interface.
The exact shape of those interfaces and helpers SHOULD be idiomatic to the respective language.
Example in Go:
```go
type Detector func(context.Context) (*Resource, error)
// Returns a detector that runs all input detectors sequentially and merges their results.
func ChainedDetector(...Detector) Detector
```
### Updates
OpenCensus's resource representation is focused on providing static, uniquely identifying
information, so mutable attributes SHOULD NOT be included in the resource representation.
Resource type and labels MUST NOT be mutated after initialization. Any change MUST effectively
be treated as a different resource, and any associated signal state MUST be reset.
## Exporter translation
A resource object MUST NOT be mutated further once it is passed to a backend-specific exporter.
From the provided resource information, the exporter MAY transform, drop, or add information
to build the resource identifying data type specific to its backend.
If the passed resource does not contain sufficient information to perform a correct write,
an exporter MAY drop the signal data entirely.
For example, from a resource object
```javascript
{
"type": "container",
"labels": {
// Populated from VM environment through auto-detection library.
"host.id": "instance1",
"cloud.zone": "eu-west2-a",
"cloud.account.id": "project1",
// Populated through OpenCensus resource environment variables.
"k8s.cluster_name": "cluster1",
"k8s.namespace.name": "ns1",
"k8s.pod.name": "pod1",
"container.name": "container1",
},
}
```
an exporter for Stackdriver would create the following "monitored resource", which is a
resource type with well-known identifiers specific to its API:
```javascript
{
"type": "k8s_container",
"labels": {
"project_id": "project1",
"location": "eu-west2-a",
"cluster_name": "cluster1",
"namespace_name": "ns1",
"pod_name": "pod1",
"container_name": "container1",
},
}
```
For another, hypothetical, backend a simple unique identifier might be constructed instead
by its exporter:
```
cluster1/ns1/pod1/container1
```
Exporter libraries MAY provide a default translation for well-known input resource types and labels.
Those would generally be based on community-supported detection integrations maintained in the
[census-ecosystem][census-ecosystem-link] organisation.
Additionally, exporters SHOULD provide configuration hooks for users to supply their own
translation unless the exporter's backend does not support resources at all. For such backends,
exporters SHOULD allow converting resource labels to metric tags.
[census-ecosystem-link]: https://github.com/census-ecosystem
[resource-proto-link]: https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/resource/v1/resource.proto

View File

@ -0,0 +1,6 @@
# OpenCensus Library Tags Package
This documentation serves to document the "look and feel" of the open source tags package. It
describes the key types and the overall behavior.
## Main APIs
* [TagMap](TagMap.md)

View File

@ -0,0 +1,189 @@
# Summary
A `Tag` is used to label anything that is associated
with a specific operation, such as an HTTP request. These `Tag`s are used to
aggregate measurements in a [`View`](https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/DataAggregation.md#view)
according to the unique values of the `Tag`s. The `Tag`s can also be used to filter (include/exclude)
measurements in a `View`. `Tag`s can further be used for logging and tracing.
# Tag
A `Tag` consists of `TagMetadata`, `TagKey`, and `TagValue`.
## TagKey
`TagKey` is the name of the Tag. A `TagKey`, together with a `TagValue`, is used to aggregate
and group stats, and to annotate traces and logs.
**Restrictions**
- Must contain only printable ASCII (codes between 32 and 126, inclusive)
- Must have a length greater than zero and less than 256
## TagValue
`TagValue` is a string. It MUST contain only printable ASCII (codes between
32 and 126, inclusive).
## TagMetadata
`TagMetadata` contains properties associated with a `Tag`. For now, only the property `TagTTL`
is defined. In the future, additional properties may be added to address specific situations.
A tag creator determines the metadata of each tag it creates.
### TagTTL
`TagTTL` is an integer that represents the number of hops a tag can propagate. Any time a sender serializes a tag,
sends it over the wire, and the receiver deserializes it, the tag is considered to have travelled one hop.
There could be one or more proxies between sender and receiver. Proxies are treated as transparent
entities and do not create additional hops. Every propagation implementation should support an option
`decrementTTL` (default set to true) that allows proxies to set it to false.
**For now, ONLY special values (0 and -1) are supported.**
#### Special Values
- **NO_PROPAGATION (0)**: A tag with a `TagTTL` value of zero is considered to have local scope and
is used only within the process that created it.
- **UNLIMITED_PROPAGATION (-1)**: A tag with a `TagTTL` value of -1 can propagate an unlimited number of hops.
However, it is still subject to outgoing and incoming (on the remote side) filter criteria.
See `TagPropagationFilter` in [TagMap Propagation](#tagmap-propagation). A `TagTTL` value of -1
is typically used to represent a request whose processing may span multiple entities.
#### Example for TagTTL > 0
On the server side there is typically no information about the caller besides ip/port,
but every process has a notion of a "service_name" tag that is added as a "caller" tag before
serialization when an RPC/HTTP call is made. For the "caller" tag, the desirable `TagTTL` value is 1.
Note that a `TagTTL` value of 1 is not supported at this time. The example is listed here simply to
show a possible use case for `TagTTL` > 0.
### Processing at Receiver and Sender
For now, limited processing is required on the Sender and Receiver. However, for the sake of
completeness, future processing requirements are also listed here. These requirements are marked with
"**(future)**".
This processing is done as part of the tag propagator.
#### At Receiver
Upon receiving a tag from a remote entity, a tag extractor
- MUST decrement the value of `TagTTL` by one if it is greater than zero. **(future)**
- MUST treat the value of `TagTTL` as -1 if it is not present.
- MUST discard the `Tag` for any other value of `TagTTL`. **(future)**
#### At Sender
Upon preparing to send a tag to a remote entity, a tag injector
- MUST send the tag AND include `TagTTL` if its value is greater than 0. **(future)**
- MUST send the tag without `TagTTL` if its value is -1. The absence of `TagTTL` on the wire is treated as a `TagTTL` of -1.
This optimizes the on-the-wire representation of the common case.
- MUST NOT send the tag if the value of `TagTTL` is 0.
A tag accepted for sending/receiving based on `TagTTL` value could still be excluded from sending/receiving based on
`TagPropagationFilter`.
## Tag Conflict Resolution
If a new tag conflicts with an existing tag, then the new tag takes precedence. The entire `Tag`, along
with its `TagValue` and `TagMetadata`, is replaced by the most recent tag (regardless of whether it is locally
generated or received from a remote peer). Replacement is limited to the scope in which the
conflict arises. When the scope is closed, the original value and metadata prior to the conflict are restored.
For example,
```
T# - Tag keys
V# - Tag Values
M# - Tag Metadata
Enter Scope 1
Current Tags T1=V1/M1, T2=V2/M2
Enter Scope 2
Add Tags T3=V3/M3, T2=V4/M4
Current Tags T1=V1/M1, T2=V4/M4, T3=V3/M3 <== Value/Metadata of T2 is replaced by V4/M4.
Close Scope 2
Current Tags T1=V1/M1, T2=V2/M2 <== T2 is restored.
Close Scope 1
```
# TagMap
`TagMap` is an abstract data type that represents a collection of tags, in which each key is
associated with exactly one value. `TagMap` is serializable, and it represents
all of the information that could be propagated inside the process and across process boundaries.
`TagMap` is the recommended name, but languages may use a more language-specific name.
## Limits
The combined size of all `Tag`s should not exceed 8192 bytes before encoding.
The size restriction applies to the deserialized tags so that the set of decoded
`TagMap`s is independent of the encoding format.
## TagMap Propagation
`TagMap` may be propagated across process boundaries or across any arbitrary boundaries for various
reasons. For example, one may propagate the 'project-id' Tag across all micro-services to break down metrics
by 'project-id'. Not all `Tag`s in a `TagMap` should be propagated, and not all `Tag`s in a `TagMap`
should be accepted from a remote peer. Hence, a `TagMap` propagator must allow specifying an optional
ordered list of `TagPropagationFilter`s for receiving `Tag`s, for forwarding `Tag`s, or for both.
A `TagPropagationFilter` list for receiving MAY be different than the one for forwarding.
If no filter is specified for receiving, all `Tag`s are received.
If no filter is specified for forwarding, all `Tag`s are forwarded except those that have a `TagTTL` of 0.
### TagPropagationFilter
A Tag Propagation Filter consists of an action (`TagPropagationFilterAction`) and a condition
(`TagPropagationFilterMatchOperator` and `TagPropagationFilterMatchString`). A `TagKey`
is evaluated against the condition of each `TagPropagationFilter` in order. If the condition evaluates
to true, the action specified by `TagPropagationFilterAction` is taken and filter processing stops.
If the condition evaluates to false, the `TagKey` is processed against the next `TagPropagationFilter`
in the ordered list. If none of the conditions evaluates to true, the default
action is **Exclude**.
#### TagPropagationFilterAction
This is an interface. An implementation of this interface takes the appropriate action on the `Tag` if the
condition (`TagPropagationFilterMatchOperator` and `TagPropagationFilterMatchString`) evaluates to true.
At a minimum, the `Exclude` and `Include` actions MUST be implemented.
**Exclude**
If the `TagPropagationFilterAction` is Exclude, then any `Tag` whose `TagKey` evaluates to true
against the condition (`TagPropagationFilterMatchOperator` and `TagPropagationFilterMatchString`)
MUST be excluded.
**Include**
If the `TagPropagationFilterAction` is Include, then any `Tag` whose `TagKey` evaluates to true
against the condition (`TagPropagationFilterMatchOperator` and `TagPropagationFilterMatchString`)
MUST be included.
#### TagPropagationFilterMatchOperator
| Operator | Description |
|----------|-------------|
| EQUAL | The condition evaluates to true if the `TagKey` is exactly the same as `TagPropagationFilterMatchString` |
| NOTEQUAL | The condition evaluates to true if the `TagKey` is NOT exactly the same as `TagPropagationFilterMatchString` |
| HAS_PREFIX | The condition evaluates to true if the `TagKey` begins with `TagPropagationFilterMatchString` |
#### TagPropagationFilterMatchString
It is a string to compare against a `TagKey`, using the `TagPropagationFilterMatchOperator`, in order
to include or exclude a `Tag`.
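A Go sketch of the evaluation order described above; the type and constant names are illustrative.

```go
package tags

import "strings"

type matchOperator int

const (
	opEqual matchOperator = iota
	opNotEqual
	opHasPrefix
)

type tagPropagationFilter struct {
	op          matchOperator
	matchString string
	include     bool // action when the condition matches: Include or Exclude
}

// includeTag walks the ordered filter list; the first matching condition
// decides the action. If nothing matches, the default action is Exclude.
func includeTag(key string, filters []tagPropagationFilter) bool {
	for _, f := range filters {
		var match bool
		switch f.op {
		case opEqual:
			match = key == f.matchString
		case opNotEqual:
			match = key != f.matchString
		case opHasPrefix:
			match = strings.HasPrefix(key, f.matchString)
		}
		if match {
			return f.include
		}
	}
	return false // default: Exclude
}
```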
## Encoding
### Wire Format
TBD:
#### Over gRPC
TagMap should be encoded using [BinaryEncoding](https://github.com/census-instrumentation/opencensus-specs/tree/master/encodings)
and propagated using the gRPC metadata key `grpc-tags-bin`. The propagation MUST inject the TagMap into, and extract it from, the gRPC metadata.
#### Over HTTP
TBD: W3C [correlation context](https://github.com/w3c/correlation-context/blob/master/correlation_context/HTTP_HEADER_FORMAT.md)
may be an appropriate choice.
### Error handling
- Call should continue irrespective of any error related to encoding/decoding.
- There are no partial failures for encoding or decoding. The result of encoding or decoding
should always be a complete `TagMap` or an error. The type of error
reporting depends on the language.
- Serialization should result in an error if the `TagMap` does not meet the
size restriction above.
- Deserialization should result in an error if the serialized `TagMap`
- cannot be parsed.
- contains a `TagKey` or `TagValue` that does not meet the restrictions above.
- does not meet the size restriction above.

View File

@ -0,0 +1,55 @@
# Log Correlation
Log correlation is a feature that inserts information about the current span into log entries
created by existing logging frameworks. The feature can be used to add more context to log entries,
filter log entries by trace ID, or find log entries associated with a specific trace or span.
The design of a log correlation implementation depends heavily on the details of the particular
logging framework that it supports. Therefore, this document only covers the aspects of log
correlation that could be shared across log correlation implementations for multiple languages and
logging frameworks. It doesn't cover how to hook into the logging framework.
## Identifying the span to associate with a log entry
A log correlation implementation should look up tracing data from the span that is current at the
point of the log statement. See
[Span.md#how-span-interacts-with-context](Span.md#how-span-interacts-with-context) for the
definition of the current span.
## Tracing data to include in log entries
A log correlation implementation should make the following pieces of tracing data from the current
span context available in each log entry:
### Trace ID
The trace ID of the current span. See [Span.md#traceid](Span.md#traceid).
### Span ID
The span ID of the current span. See [Span.md#spanid](Span.md#spanid).
### Sampling Decision
The sampling bit of the current span, as a boolean. See
[Span.md#supported-bits](Span.md#supported-bits).
TODO(sebright): Include "samplingScore" once that field is added to the SpanContext.
TODO(sebright): Add a section on fields from the Tracestate. Users should be able to add
vendor-specific fields from the Tracestate to logs, using a callback mechanism.
TODO(sebright): Consider adding parent span ID, to allow recreating the trace structure from logs.
## String format for tracing data
The logging framework may require the pieces of tracing data to be converted to strings. In that
case, the log correlation implementation should format the trace ID and span ID as lowercase base 16
and format the sampling decision as "true" or "false".
## Key names for tracing data
Some logging frameworks allow the insertion of arbitrary key-value pairs into log entries. When
a log correlation implementation inserts tracing data by that method, the key names should be
"traceId", "spanId", and "traceSampled" by default. The log correlation implementation may allow
the user to override the key names.
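A Go sketch tying the formatting and key-name rules together; how the identifiers are obtained from the current span is library-specific and is assumed here as plain inputs.

```go
package logcorrelation

import (
	"encoding/hex"
	"strconv"
)

// logFields builds the default key/value pairs to attach to a log entry.
func logFields(traceID [16]byte, spanID [8]byte, sampled bool) map[string]string {
	return map[string]string{
		"traceId":      hex.EncodeToString(traceID[:]), // lowercase base 16
		"spanId":       hex.EncodeToString(spanID[:]),  // lowercase base 16
		"traceSampled": strconv.FormatBool(sampled),    // "true" or "false"
	}
}
```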

View File

@ -0,0 +1,13 @@
# OpenCensus Library Trace Package
This documentation serves to document the "look and feel" of the open source trace package. It
describes the key types and the overall behavior.
## Main APIs
* [Span](Span.md)
* [TraceConfig](TraceConfig.md)
## Utils
* [gRPC integration](gRPC.md): a document about how to instrument the gRPC framework.
* [HTTP integration](HTTP.md): a document about how to instrument HTTP frameworks.
* [Sampling logic](Sampling.md): a document about how sampling works.
* [Log correlation](LogCorrelation.md): the specification for a feature that inserts tracing data into log entries.

View File

@ -0,0 +1,62 @@
# Sampling
This document describes the sampling bit, sampling decisions, samplers, and how and when
OpenCensus samples traces. A sampled trace is one that gets exported via the configured
exporters.
## Sampling Bit (propagated via TraceOptions)
The Sampling bit is always set only at the start of a Span, using a `Sampler`.
### What kind of samplers does OpenCensus support?
* `AlwaysSample` - sampler that makes a "yes" decision every time.
* `NeverSample` - sampler that makes a "no" decision every time.
* `Probability` - sampler that tries to uniformly sample traces with a given probability. When
applied to a child `Span` of a **sampled** parent `Span`, the child `Span` keeps the sampling
decision.
* `RateLimiting` - sampler that tries to sample with a given rate per time window (e.g. 0.1 traces/second).
When applied to a child `Span` of a **sampled** parent `Span`, the child `Span` keeps the sampling
decision. For implementation details, see [below](#ratelimiting-sampler-implementation-details).
### How can users control the Sampler that is used for sampling?
There are 2 ways to control the `Sampler` used when the library samples:
* Controlling the global default `Sampler` via [TraceConfig](https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/TraceConfig.md).
* Pass a specific `Sampler` when starting the [Span](https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/Span.md)
(a.k.a. "span-scoped").
  * For example, `AlwaysSample` and `NeverSample` can be used to implement request-specific
    decisions such as those based on HTTP paths.
### When does OpenCensus sample traces?
The OpenCensus library samples based on the following rules:
1. If the span is a root `Span`, then a `Sampler` will be used to make the sampling decision:
* If a "span-scoped" `Sampler` is provided, use it to determine the sampling decision.
* Else use the global default `Sampler` to determine the sampling decision.
2. If the span is a child of a remote `Span` the sampling decision will be:
* If a "span-scoped" `Sampler` is provided, use it to determine the sampling decision.
* Else use the global default `Sampler` to determine the sampling decision.
3. If the span is a child of a local `Span` the sampling decision will be:
* If a "span-scoped" `Sampler` is provided, use it to determine the sampling decision.
* Else keep the sampling decision from the parent.
### RateLimiting sampler implementation details
The problem we are trying to solve is:
1. Getting QPS-based sampling.
2. Providing real sampling probabilities.
3. Keeping overhead minimal.
The idea is to store the time of the last QPS-based sampling decision in an atomic variable. We can
then use the elapsed time Z since that decision to weight the current coin flip. We choose the
probability function P(Z) such that we get the desired sampled QPS, and we want P(Z) to be very
cheap to compute.
Let X be the desired QPS. Let Z be the elapsed time since the last sampling decision in seconds.
```
P(Z) = min(Z * X, 1)
```
To see that this is approximately correct, consider the case where we have perfectly distributed
time intervals. Specifically, let X = 1 and Z = 1/N. Then we would have N coin flips per second,
each with probability 1/N, for an expectation of 1 sample per second.
This will under-sample: consider the case where X = 1 and Z alternates between 0.5 and 1.5. It is
possible to get about 1 QPS by always sampling, but this algorithm only gets 0.75 QPS.
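A Go sketch of this sampler; a production implementation would also need to address contention on the atomic and use a monotonic clock source.

```go
package sampling

import (
	"math/rand"
	"sync/atomic"
	"time"
)

// rateLimitingSampler approximates a target rate X (traces/second) using
// P(Z) = min(Z * X, 1), where Z is the time since the last decision.
type rateLimitingSampler struct {
	tracesPerSecond float64
	lastDecisionNs  int64 // unix nanos, accessed atomically
}

func (s *rateLimitingSampler) shouldSample() bool {
	now := time.Now().UnixNano()
	last := atomic.SwapInt64(&s.lastDecisionNs, now)
	z := float64(now-last) / float64(time.Second)
	p := z * s.tracesPerSecond
	if p > 1 {
		p = 1
	}
	return rand.Float64() < p
}
```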

View File

@ -0,0 +1,87 @@
# Span
Span represents a single operation within a trace. Spans can be nested to form a trace tree.
Often, a trace contains a root span that describes the end-to-end latency and, optionally, one or
more sub-spans for its sub-operations.
A span contains a SpanContext and allows users to record tracing events based on the data model
defined [here][SpanDataModel].
## Span structure
### SpanContext
Represents all the information that MUST be propagated to child Spans and across process boundaries.
A span context contains the tracing identifiers and the options that are propagated from parent
to child Spans.
#### TraceId
This is the identifier for a trace. It is globally unique, with practically sufficient
probability, by being made of 16 randomly generated bytes. The TraceId is used to group all
spans for a specific trace together across all processes.
#### SpanId
This is the identifier for a span. It is globally unique, with practically sufficient probability,
by being made of 8 randomly generated bytes. When passed to a child Span, this identifier becomes
the parent span id for the child Span.
#### TraceOptions
Represents the options for a trace. It is represented as 1 byte (bitmap).
##### Supported bits
* Sampling bit - Bit to represent whether trace is sampled or not (mask `0x1`).
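A Go sketch of identifier generation and of reading the sampling bit; the function names are illustrative.

```go
package trace

import crand "crypto/rand"

const sampledMask = 0x1 // least significant bit of TraceOptions

// newTraceID draws 16 random bytes, retrying on the (vanishingly unlikely)
// all-zero value, which is invalid.
func newTraceID() [16]byte {
	var id [16]byte
	for allZero(id[:]) {
		if _, err := crand.Read(id[:]); err != nil {
			panic(err) // a sketch; real code would surface the error
		}
	}
	return id
}

func allZero(b []byte) bool {
	for _, c := range b {
		if c != 0 {
			return false
		}
	}
	return true
}

func isSampled(traceOptions byte) bool {
	return traceOptions&sampledMask != 0
}
```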
#### Tracestate
Carries tracing-system-specific context in a list of key-value pairs. Tracestate allows different
vendors to propagate additional information and to interoperate with their legacy Id formats.
For more details see [this][TracestateLink].
## Span creation
The implementation MUST allow users to create two types of Spans:
* Root Spans - spans that do not have a parent.
* Child Spans - the parent can be explicitly set or inherit from the Context.
When creating a Span, the implementation MUST allow users to create the Span attached to or detached
from the Context. This allows users to manage the interaction with the Context independently of
the Span lifetime:
* Attached to the Context - the newly created Span is attached to the Context.
* Detached from the Context - the newly created Span is not attached to any Context.
### How Span interacts with Context?
Context interaction represents the process of attaching/detaching a Span to the Context
in order to propagate it in-process (possibly between threads) and between function calls.
There are two supported implementations for the Context based on how the propagation is implemented:
* With implicit propagation - implicitly passed between function calls and threads, usually
implemented using thread-local variables (e.g. Java [io.grpc.Context][javaContext])
* With explicit propagation - explicitly passed between function calls and threads (e.g. Go
[context.Context][goContext])
When an implicitly propagated Context is used, the implementation MUST use scoped objects to
attach/detach a Span (scoped objects are auto-closeable objects, e.g. stack-allocated
objects in C++):
* When attaching/detaching an already created Span, the API MAY be called `WithSpan`.
* When attaching/detaching at creation time, the API MAY be called `StartSpan` or `StartScopedSpan`.
When an explicitly propagated Context is used, the implementation MUST create a new Context when a
Span is attached (the Context is immutable):
* When attaching/detaching an already created Span, the API MAY be called `WithSpan`.
* When attaching/detaching at creation time, the API MAY be called `StartSpan` or `StartScopedSpan`.
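For the explicit-propagation case, opencensus-go illustrates the pattern: `trace.StartSpan` returns a new immutable `context.Context` carrying the span. A sketch:

```go
package example

import (
	"context"

	"go.opencensus.io/trace"
)

func handleRequest(ctx context.Context) {
	// StartSpan returns a derived Context with the new span attached.
	ctx, span := trace.StartSpan(ctx, "handleRequest")
	defer span.End()

	doWork(ctx) // spans started from ctx become children of span
}

func doWork(ctx context.Context) {
	_, span := trace.StartSpan(ctx, "doWork")
	defer span.End()
	// ... the actual sub-operation ...
}
```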
### What is Span lifetime?
Span lifetime represents the process of recording the start and the end timestamps to the Span
object:
* The start time is recorded when the Span is created.
* The end time needs to be recorded when the operation is ended.
### Why support Spans that are not attached to the Context?
* Allow users to use the OpenCensus library without using a Context.
* Allow users to have more control for the lifetime of the Span.
* There are cases, for example HTTP/RPC interceptors, where the Span creation and usage happen in
different places, and the user does not have enough control over the framework to manage the
Context propagation.
[goContext]: https://golang.org/pkg/context
[javaContext]: https://github.com/grpc/grpc-java/blob/master/context/src/main/java/io/grpc/Context.java
[SpanDataModel]: https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/trace/v1/trace.proto
[TracestateLink]: https://w3c.github.io/trace-context/#tracestate-field

View File

@ -0,0 +1,41 @@
# TraceConfig
Global configuration of the trace service. This allows users to change configs for the default
sampler, maximum events to be kept, etc.
## TraceParams
Represents the set of parameters that users can control:
* Default `Sampler` - used when creating a Span if no specific sampler is given. The default sampler
is a [Probability](Sampling.md) sampler with the probability set to `1/10000`.
### Limits
We define limits on the number of attributes, annotations, message events and links on each span
in order to prevent unbounded memory increase for long-running spans.
When limits are exceeded, implementations should by default preserve the most recently added values
and drop the oldest values. Implementations may make this policy configurable.
Implementations should track the number of dropped items per span. Some backends provide dedicated
support for tracking these counts. Others do not, but exporters may choose to represent these in
exported spans in some way (for example, as a tag).
Implementations may support tracking the total number of dropped items in stats, as outlined in the table below.
| Item | Default Limit | Measure for dropped items |
| --- | --- | --- |
| Attributes | 32 | opencensus.io/trace/dropped_attributes |
| Annotations | 32 | opencensus.io/trace/dropped_annotations |
| Message Events | 128 | opencensus.io/trace/dropped_message_events |
| Links | 32 | opencensus.io/trace/dropped_links |
No views should be registered by default on these measures. Users may register views if they
are interested in recording these measures.
Implementations should provide a way to override the globals per-span.
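A Go sketch of the default eviction policy for one item kind (attributes); the structure is illustrative and not part of the specification.

```go
package trace

// boundedAttributes keeps at most limit attributes, dropping the oldest
// entry when a new key would exceed the limit, and counting drops.
type boundedAttributes struct {
	limit   int
	order   []string // keys in insertion order
	values  map[string]interface{}
	dropped int64 // candidate value for opencensus.io/trace/dropped_attributes
}

func newBoundedAttributes(limit int) *boundedAttributes {
	return &boundedAttributes{limit: limit, values: map[string]interface{}{}}
}

func (b *boundedAttributes) put(key string, val interface{}) {
	if _, exists := b.values[key]; !exists {
		if len(b.order) >= b.limit {
			oldest := b.order[0]
			b.order = b.order[1:]
			delete(b.values, oldest)
			b.dropped++
		}
		b.order = append(b.order, key)
	}
	b.values[key] = val
}
```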
## API Summary
* Permanently update the active TraceParams.
* Temporarily update the active TraceParams. This API allows changing the active params for a
certain period of time. No more than one temporary update can be active at any moment.
* Get the current active TraceParams.