# BINARY FORMAT
The binary format can be used to encode different data types, each with different fields. This
document first describes the general format and then applies it to specific data types,
including Trace Context and Tag Context.
## General Format
Each encoding will have a 1 byte version followed by the version format encoding:
`<version><version_format>`
This allows us to switch completely to a new format within 1 deprecation cycle, if needed.
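As a minimal sketch of the layout above, a decoder can split off the version byte before dispatching on it. The function name `split_version` is hypothetical and not part of this specification.

```python
from typing import Tuple


def split_version(encoded: bytes) -> Tuple[int, bytes]:
    """Split an encoded message into (version_id, version_format bytes)."""
    if len(encoded) < 1:
        raise ValueError("encoded message must contain at least the version byte")
    # The first byte is the version; everything after it is interpreted
    # according to that version's format.
    return encoded[0], encoded[1:]


version_id, version_format = split_version(bytes([0, 2, 1]))
```

A decoder that only understands `version_id` 0 would reject or ignore any message whose first byte is different.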
## Version Format (version_id = 0)
The version format for the version_id = 0 is based on ideas from proto encoding. The main
requirements are to allow adding and removing fields in less than 1 deprecation cycle. It
contains a list of fields:
`<field><field>...`
### Field
Each field is a 1-byte field ID paired with a field value, where the format of the field value is
determined by both the field ID and the data type. For example, field 0 in `Trace Context` may
have a completely different format than field 0 in `Tag Context` or field 1 in `Trace Context`.
Each field that we send on the wire will have the following format:
`<field_id><field_format>`
* `field_id` is a single byte.
* `field_format` must be defined for each field separately.
The specification for a data type's format must also specify whether each field is optional or
repeated. For example, `Trace-id` in `Trace Context` is optional, and `Tag` in `Tag Context`
is repeated. The specification for a data type's format MAY define a default value for any
optional field, which must be used when the field is missing.
The specification for a data type can define versions within a version of the format, called data
type versions, where each data type version adds new fields. A data type version can be useful
for describing which fields an implementation supports, but it is not included in the
serialized data.
### Serialization Rules
Fields MUST be serialized in data type version order (i.e. all fields from version (i) of a data
type must precede all fields from version (i+1)). That is because each field has its own format,
and old implementations may not be able to determine where newer field values end. This ordering
allows old decoders to ignore any new fields when they do not know the format for those fields.
Fields within a data type version can be serialized in any order, and fields with the same field
ID do not need to be serialized consecutively.
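The ordering rule can be sketched as follows. This is an illustration only: the function name `serialize_fields`, the tuple layout `(data_type_version, field_id, encoded_value)`, and the field IDs in the usage example are assumptions made for the example, not definitions from this specification.

```python
from typing import Iterable, Tuple


def serialize_fields(version_id: int, fields: Iterable[Tuple[int, int, bytes]]) -> bytes:
    """Serialize (data_type_version, field_id, encoded_value) tuples.

    Fields are emitted in data type version order. sorted() is stable, so
    fields within the same data type version keep the caller's order, which
    the rules above permit.
    """
    out = bytearray([version_id])
    for _data_type_version, field_id, value in sorted(fields, key=lambda f: f[0]):
        if not 0 <= field_id <= 255:
            raise ValueError("field_id must fit in a single byte")
        out.append(field_id)
        out += value
    return bytes(out)


# Hypothetical fields: two from data type version 0, one from version 1.
encoded = serialize_fields(0, [(1, 5, b"\x09"), (0, 2, b"\x01"), (0, 0, b"\xaa")])
```

In this example the version 1 field is passed first but is written last, so a decoder that only knows the version 0 fields can stop when it reaches it.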
### Deserialization Rules
Because all fields are decoded in data type version order, deserialization simply
reads the encoded input until the end of the input or until the first unknown `field_id`. An
unknown `field_id` should not be considered a parse error. Implementations MAY pass on any fields
that they cannot decode, when possible (by passing through the whole opaque tail of bytes
starting with the first `field_id` that the current binary does not understand).
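A minimal sketch of these rules for fixed-length, non-repeated fields is shown below. The function name `decode_fixed_fields` and the `known_lengths` table are hypothetical. Raising an error on a truncated value is also an assumption, since this document does not say how truncated input should be handled.

```python
from typing import Dict, Tuple


def decode_fixed_fields(encoded: bytes, known_lengths: Dict[int, int]) -> Tuple[Dict[int, bytes], bytes]:
    """Decode known fields and return (fields, opaque_tail).

    known_lengths maps each field_id this decoder understands to the
    length in bytes of its value.
    """
    if not encoded:
        raise ValueError("missing version byte")
    if encoded[0] != 0:
        raise ValueError("unsupported version_id")
    fields: Dict[int, bytes] = {}
    pos = 1
    while pos < len(encoded):
        field_id = encoded[pos]
        if field_id not in known_lengths:
            # Unknown field: not a parse error, just stop decoding here.
            break
        start = pos + 1
        end = start + known_lengths[field_id]
        if end > len(encoded):
            raise ValueError("truncated field value")  # assumption, see lead-in
        fields[field_id] = encoded[start:end]
        pos = end
    # Everything from the first unknown field_id onward can be passed through.
    return fields, encoded[pos:]


fields, tail = decode_fixed_fields(bytes([0, 2, 1, 9, 0xFF, 0xFE]), {2: 1})
```

Here field 2 is decoded, and the bytes starting at the unknown field ID 9 are returned unchanged so that the caller can forward them. Fields that are absent from the returned dictionary would take their default values.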
### How can we add new fields?
If we follow the rule that new field ids are always appended at the end of the buffer, we can
add up to 127 field ids.
TODO(bdrutu): Decide what to do after 127: a) use varint encoding or b) just reserve 255 as a
continuation byte.
### How can we remove a field?
We can stop sending any field at any moment and the decoders will be able to skip the missing ids
and use the default values.
### Trace Context
#### Fields added in Trace Context version 0
##### Trace-id
* optional
* `field_id` = 0
* `len` = 16
This is the ID of the whole trace forest. It is represented as an opaque 16-byte array,
e.g. (in hex), `4bf92f3577b34da6a3ce929d000e4736`. A value of all zero bytes is considered invalid.
##### Span-id
* optional
* `field_id` = 1
* `len` = 8
This is the ID of the caller span (parent). It is represented as an opaque 8-byte array,
e.g. (in hex), `34f067aa0ba902b7`. A value of all zero bytes is considered invalid.
##### Trace-options
* optional
* `field_id` = 2
* `len` = 1
Controls tracing options such as sampling and trace level. It is 1 byte
representing an 8-bit unsigned integer. The least significant bit provides a
recommendation on whether the request should be traced (1 recommends that the
request be traced; 0 means the caller has made no decision to trace,
and the decision might be deferred). The flags are recommendations given by the
caller rather than strict rules to follow, for 3 reasons:
1. Trust and abuse.
2. Bugs in the caller.
3. Different load between the caller service and the callee service might force the callee to downsample.
The behavior of other bits is currently undefined.
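Reading the least significant bit can be sketched as below. The names `SAMPLED_BIT` and `is_sampling_recommended` are hypothetical. Because the other bits are undefined, the sketch masks them out rather than interpreting them.

```python
SAMPLED_BIT = 0x01  # least significant bit of the trace-options byte


def is_sampling_recommended(trace_options: int) -> bool:
    """Return True if the caller recommends tracing this request."""
    if not 0 <= trace_options <= 0xFF:
        raise ValueError("trace_options must be an 8-bit unsigned integer")
    # Only the lowest bit is defined; all other bits are ignored here.
    return (trace_options & SAMPLED_BIT) != 0
```

Since the flag is only a recommendation, a callee is free to combine this result with its own sampling policy.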
#### Valid example
{0,
0, 75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54,
1, 52, 240, 103, 170, 11, 169, 2, 183,
2, 1}
This corresponds to:
* `traceId` = {75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54}
* `spanId` = {52, 240, 103, 170, 11, 169, 2, 183}
* `traceOptions` = 1
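The example can be reproduced from the hex values given in the field descriptions. The sketch below builds the byte sequence by hand. The variable names are illustrative only.

```python
trace_id = bytes.fromhex("4bf92f3577b34da6a3ce929d000e4736")
span_id = bytes.fromhex("34f067aa0ba902b7")
trace_options = 1

encoded = (
    bytes([0])                    # version_id
    + bytes([0]) + trace_id       # field_id 0, 16-byte Trace-id
    + bytes([1]) + span_id        # field_id 1, 8-byte Span-id
    + bytes([2, trace_options])   # field_id 2, 1-byte Trace-options
)

expected = bytes([
    0,
    0, 75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54,
    1, 52, 240, 103, 170, 11, 169, 2, 183,
    2, 1,
])
assert encoded == expected
```

The encoded message is 29 bytes long: 1 version byte, 17 bytes for the Trace-id field, 9 bytes for the Span-id field, and 2 bytes for the Trace-options field.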
### Tag Context
The Tag Context format uses Varint encoding, which is described in
https://developers.google.com/protocol-buffers/docs/encoding#varints.
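For reference, a minimal sketch of unsigned varint encoding and decoding follows: each byte carries 7 bits of the value, least significant group first, and the high bit marks that more bytes follow. The function names are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    if value < 0:
        raise ValueError("varint value must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

For example, `encode_varint(300)` yields the two bytes `0xAC 0x02`, and values below 128 fit in a single byte.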
#### Fields added in Tag Context version 0
##### Tag
* repeated
* `field_id` = 0
* `field_format` = `<tag_key_len><tag_key><tag_val_len><tag_val>` where
  * `tag_key_len` is a varint encoded integer.
  * `tag_key` is `tag_key_len` bytes comprising the tag key name.
  * `tag_val_len` is a varint encoded integer.
  * `tag_val` is `tag_val_len` bytes comprising the tag value.
* Tags can be serialized in any order.
* Multiple tag fields can contain the same tag key. All but the last value for
that key should be ignored.
* The
[size limit for serialized Tag Contexts](https://github.com/census-instrumentation/opencensus-specs/blob/master/tags/TagMap.md#limits)
should apply to all tag fields, even if some of them have duplicate keys. For
example, a serialized tag context with 10,000 small tags that all have the
same key should be considered too large.
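The Tag field layout and the "last value wins" rule can be sketched as follows. All function names are hypothetical. Treating keys and values as UTF-8 strings is an assumption made for readability, since this document only defines them as bytes. The sketch does not enforce the size limit.

```python
from typing import Dict, Iterable, Tuple

TAG_FIELD_ID = 0


def _encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def _decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def _read_bytes(buf: bytes, pos: int, length: int) -> Tuple[bytes, int]:
    chunk = buf[pos:pos + length]
    if len(chunk) != length:
        raise ValueError("truncated tag field")
    return chunk, pos + length


def encode_tag_context(tags: Iterable[Tuple[str, str]]) -> bytes:
    out = bytearray([0])  # version_id
    for key, value in tags:
        k, v = key.encode("utf-8"), value.encode("utf-8")  # assumption: UTF-8
        out.append(TAG_FIELD_ID)
        out += _encode_varint(len(k)) + k + _encode_varint(len(v)) + v
    return bytes(out)


def decode_tag_context(encoded: bytes) -> Dict[str, str]:
    if not encoded or encoded[0] != 0:
        raise ValueError("missing or unsupported version_id")
    tags: Dict[str, str] = {}
    pos = 1
    # Stop at the end of input or at the first unknown field_id.
    while pos < len(encoded) and encoded[pos] == TAG_FIELD_ID:
        key_len, pos = _decode_varint(encoded, pos + 1)
        key, pos = _read_bytes(encoded, pos, key_len)
        val_len, pos = _decode_varint(encoded, pos)
        val, pos = _read_bytes(encoded, pos, val_len)
        # A later field with the same key overwrites the earlier value.
        tags[key.decode("utf-8")] = val.decode("utf-8")
    return tags


wire = encode_tag_context([("k", "a"), ("k", "b"), ("x", "y")])
decoded = decode_tag_context(wire)
```

In this usage example the key `k` appears twice on the wire, and the decoder keeps only the last value. Both copies still count toward the serialized size.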