semantic-conventions/docs/how-to-write-conventions/resource-and-entities.md

340 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Resource and Entities
<!-- toc -->
- [Modelling Guide](#modelling-guide)
- [Declaring associations between signals](#declaring-associations-between-signals)
- [Extending an entity](#extending-an-entity)
- [FAQ](#faq)
- [When to define a new entity?](#when-to-define-a-new-entity)
- [What is an "is-a" relationship?](#what-is-an-is-a-relationship)
- [When to define an "is-a" relationship vs. extending descriptive attributes?](#when-to-define-an-is-a-relationship-vs-extending-descriptive-attributes)
- [How to define identifying attributes?](#how-to-define-identifying-attributes)
- [Multi-Observer Guidance](#multi-observer-guidance)
- [How to namespace entities?](#how-to-namespace-entities)
- [Background: Resource and Entities](#background-resource-and-entities)
- [Open Expansion](#open-expansion)
- [Telescoping Identity](#telescoping-identity)
<!-- tocstop -->
## Modelling Guide
To define an entity, create a new semantic convention model file, with a group type as `entity`, for example:
`model/{my_domain}/entities.yaml`:
```yaml
groups:
- id: entity.my_entity
type: entity
stability: development
name: my_entity
brief: >
A description of my_entity here.
attributes:
- ref: some.attribute
role: identifying
- ref: some.other_attribute
role: descriptive
...
```
Here, the attributes field contains all attributes of the Entity.
The `role` of each attribute determines if it is identifying or
descriptive. See
[How to define identifying attributes?](#how-to-define-identifying-attributes)
for details on what these mean.
> Note: Declaring Entity relationships is not yet supported.
### Declaring associations between signals
You can declare which entities should be used with specific observability
signals. For example, process metrics should be used with the process
entity, so that the metric is associated with a known process. To declare
this, use the `entity_associations` field on the signal and reference
another resource group *by name*.
`model/{my_domain}/metrics.yaml`:
```yaml
groups:
- id: metric.some_metric
type: metric
...
entity_associations:
- my_entity
```
Notes:
- You cannot declare an association on an *unstable* resource from a
*stable* signal.
- You can declare multiple associations. These form a "one or many" set,
where one or many of the named entities may be associated with the
metric. There is *no* requirement to have one and only one entity
attached to a signal.
### Extending an entity
While not recommended for Semantic Conventions, you can define a new
"view" of an entity that includes additional descriptive attributes. To
do so, use the extends field on groups:
`model/{my_other_domain}/entities.yaml`:
```yaml
groups:
- id: entity.my_entity_2
type: entity
extends: entity.my_entity
attributes:
- ref: new.attribute.name
requirement_level: opt_in
role: descriptive
```
Notes:
- You cannot change the `name` or `type` fields of the new entity.
- You cannot change the set of identifying attributes.
## FAQ
### When to define a new entity?
There are two scenarios where entities should be defined:
- When you are generating a new signal (log, metric, span, etc.)
and no existing entity makes sense as the "source".
- (future) When you need to describe an entity hierarchy from
some system of record (e.g. resources in kubernetes, assets
in a cloud).
For example, if a new clustering solution (e.g. Hashicorp's Nomad) is
defined, and existing container-based entities are not enough, then
new entities should be defined.
### What is an "is-a" relationship?
OpenTelemetry, as an open ecosystem, cannot understand and model all
possible entities that exist in the world. Instead, we are allowing
overlapping definitions across domains. For example, the `container` and
`k8s.container` entities exist, and generally every `k8s.container` is
a `container`, but not every `container` runs in kubernetes.
An "is-a" relationship denotes that one entity is describing the exact
same system component as another entity, but from a different domain. In
the above example, `k8s.container` models containers from the kubernetes
domain, while `container` is a general model for containers, regardless
of how they are run (e.g. podman, docker, kubernetes, FAAS, etc.)
"is-a" relationships denote this relationship in entities allowing
OpenTelemetry to fully model a subset of entities (e.g. all known
`k8s` resources as entities), but still allow the extended ecosystem
to grow and evolve with new entities in the future.
### When to define an "is-a" relationship vs. extending descriptive attributes?
There are two key rules:
- Default to introducing separate entities with a clear "is a" (or similar) relationship
- Extend an entity with new descriptive attributes if, and only if, the following is true:
- The extending entity cannot be associated with any telemetry by itself.
- Example 1: When adding a `windows.process` entity, you do not expect
to create any specific process metrics or logs that would be
specific to `windows.process`, instead all data is still reported
against `process` entity.
- Example 2: When adding a `docker.container` entity, you do not
expect to create any specific container signals, instead logs,
metrics and spans would be reported against the `container` entity.
- The extending entity doesnt have any other entities that logically can
only be associated with it.
*Note: this only applies to entities as signals with relationships.*
This avoids the complexities of subtyping and ambiguous attribute usage.
### How to define identifying attributes?
The identifying attributes should be minimally sufficient to
identify an entity within the context of how that entity is
discovered. For example, when discovering kubernetes entities
like `k8s.pod`, `k8s.deployment`, the identifying attributes
should be sufficient to identify these entities within the
scope of a kubernetes cluster (or more specifically, the
kubernetes API server where the entities are discovered).
Commonly, a number of attributes of an entity are readily
available for the telemetry producer to compose an identity. Of
the available attributes the entity ID should include the
minimal set of attributes that is sufficient for uniquely identifying that entity. For example a process on a host can be
uniquely identified by (`process.pid`,`process.creation.time`)
attributes. Adding for example `process.executable.name`
attribute to the identity is unnecessary and violates the
rule of having a [minimally sufficient ID](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.45.0/specification/entities/data-model.md#minimally-sufficient-identity).
Identifying attributes generally form the lifespan of an entity. This is
important, particularly, for metrics written against an entity. The lifespan
determines whether a timeseries remains "connected" between reported points, or
if it suddenly looks like a drop. It is recommended to select an identity that
keeps the lifespan "stable" for important alerting and monitoring use cases.
Identifying attributes MUST NOT change during the lifespan of the entity.
#### Multi-Observer Guidance
When choosing Identifying attributes, care should be taken to ensure that
multiple observers will find the same identifying attribute for the same
entity. Generally, entities may be discovered both within OpenTelemetry
SDKs and the Collector and should leverage identifying attributes that
will be the same between these signal providers.
For example, `service.instance.id` can be problematic to detect from outside
an SDK and inside an SDK consistently. Generally, this can only be
achieved if some outside source injects a `service.instance.id` value into
the SDK that is externally visible. An alternative is to have the SDK
provide a relationship between the `service.instance.id` and another entity
that is visible externally. Care should be taken when modelling Entities
to avoid this problem where possible.
The choice of `service.instance.id` should be an exception, not the rule, for
most Entities being modelled. Service instancing is a fundamental feature of
OpenTelemetry, and we think it is a critical "fall back" identity. It works
best when there is *one* generator of the id shared across all observers.
However, in practice, this is difficult or "non standard" in the following
scenarios:
- Prometheus pull metrics that want the `instance` label to match
`service.instance.id` on push based OTLP data.
- Reading container logs from a `k8s.node`, where we know the container name
and deployment, but can't see into the SDK to understand a chosen instance id.
The OpenTelemetry Operator, and onboarding guides for kubernetes, e.g. leverage
mechanisms to ensure a `service.instance.id` can be pushed down to SDKs and
external observers, alleviating this friction for kubernetes.
### How to namespace entities?
Entities (both types and and attributes) should be namespaced around
the primary mechanism used to identify the Entity. For Example,
kubernetes entities use the `k8s` namespace, and are primarily
discovered using the kubernetes API or working within kubernetes.
See [General Naming Guidance](/docs/general/naming.md) for overall
semantic convention namespacing rules.
## Background: Resource and Entities
In OpenTelemetry, every signal is associated with a Resource.
According to the [Specification](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.45.0/specification/resource#overview)
this is:
> A Resource is a representation of the entity producing telemetry. Within
> OpenTelemetry, all signals are associated with a Resource, enabling contextual
> correlation of data from the same source. For example, if I see a high latency
> in a span I need to check the metrics for the same entity that produced that
> Span during the time when the latency was observed.
>
> Resource provides two important aspects for observability:
>
> * It MUST identify an entity that is producing telemetry.
> * It SHOULD allow users to determine where that entity resides within their
> infrastructure.
All resources are composed of [Entities](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.45.0/specification/entities/README.md#overview).
An entity is specified as:
> Entity represents an object of interest associated with
> produced telemetry: traces, metrics, logs, profiles etc.
While there is overlap in the definition of Entity and Resource, there are
several key differences between the two:
- An Entity has a known "type", e.g. `service`, `k8s.pod`,
`host`, etc.
- An Entity can distinguish *identifying* attributes from
*descriptive* attributes.
- Identifying attributes can be used to identify the entity
within some system (See
[minimally sufficient id](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.45.0/specification/entities/data-model.md#minimally-sufficient-identity)).
For Example, the `k8s.pod.uid` would be considered an
identifying attribute for a pod within kubernetes.
- *Descriptive* attributes can be used to provide additional labels for
entities, but are not necessary to uniquely identify the Entity.
- A Resource is composed of *multiple* Entities.
- Each of the entities within Resource is considered
'contributing' to that telemetry.
- For Example, today, most SDKs include the service entity,
but also another entity, like `k8s.container`, `host`, etc.
- An Entity may be conceptually similar to another (which we
call an "is-a" relationship).
- For example, the `k8s.cluster` entity generically
represents kubernetes clusters, while the `aws.eks.cluster`
entity would represent the AWS specific concept of an
Elastic Kubernetes cluster.
- In this case, a Resource from EKS could contain both the
`aws.eks.cluster` entity and the `k8s.cluster` entity.
There are two key principles that are important for Entities
and Resource in OpenTelemetry:
1. *Open expansion*: Allowing users outside of OpenTelemetry to provide
Entity definitions and relationships within the system.
2. *Telescoping Identity*: Allowing flexible denormalization of
observability data to optimise critical queries (e.g. alerts, dashboard,
etc.)
### Open Expansion
OpenTelemetry is designed to be an open system. When it comes to defining
the core set of entities and relationships within systems, it needs to
remain open about what these entities and possible relationships are. Any
system a user has should be able to model and participate with existing
OpenTelemetry semantic conventions. This is done through two key aspects:
- Namespacing
- "Is-a" Relationships
When defining a new set of entities within OpenTelemetry Semantic
Conventions, they should be namespaced, as per the
[Semantic Convention naming policy](/docs/general/naming.md#general-naming-considerations).
This gives clear indication which concepts are clearly related with each
other. For example, the k8s namespace would define kubernetes related
entities and their relationships. Users would know to create a new
namespace when modelling concepts on top of k8s.
Expansion to existing concepts is done through "is-a" relationships.
These are relationships where one entity is known to represent the same
concept as another entity, but in some new scoped context. For example,
an `aws.eks.cluster` is a `k8s.cluster`, but not all `k8s.cluster`
entities are `aws.eks.cluster` entities.
### Telescoping Identity
Within OpenTelemetry, we want to give users the flexibility to decide
what information needs to be sent *with* observability signals and what
information can be later joined. We call this "telescoping identity"
where users can decide how *small* or *large* the size of an OpenTelemetry
resource will be on the wire (and correspondingly, how large data points
may be when stored, depending on storage solution).
For example, in the extreme, OpenTelemery could synthesize a UUID for
every system which produces telemetry. All identifying attributes for
Resource and Entity could be sent via a side channel with known
relationships to this UUID. While this would optimise the runtime
generation and sending of telemetry, it comes at the cost of downstream
storage systems needing to join data back together either at ingestion
time or query time. For high performance use cases, e.g. alerting, these
joins can be expensive.
In practice, users control Resource identity via the configuration of
Resource Detection within SDKs and the collector. Users wishing for
minimal identity may limit their resource detection just to a
`service.instance.id`. Some users highly customize resource detection with
many concepts being appended.
*OpenTelemetry should provide a good "out of the box" set of resource
detection that makes appropriate denormalization trade-offs for most
users, but allows users to fine-tune the system to their needs.*