Depends on
https://github.com/open-telemetry/opentelemetry-collector/pull/12856Resolves#12676
This is a reboot of #11311, incorporating metrics defined in the
[component telemetry
RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md)
and attributes added in #12617.
The basic pattern is:
- When building any pipeline component which produces data, wrap the
"next consumer" with instrumentation to measure the number of items
being passed. This wrapped consumer is then passed into the constructor
of the component.
- When building any pipeline component which consumes data, wrap the
component itself. This wrapped consumer is saved onto the graph node so
that it can be retrieved during graph assembly.
---------
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
As mentioned in this
https://github.com/open-telemetry/opentelemetry-collector/issues/8721#issuecomment-1813468623,
the error message for unused connectors currently lacks specific
pipeline names, making debugging more difficult.
This PR enhances the error message by including pipeline names in the
`[signal/name]` format, consistent with how they appear in
`config.yaml`. This provides a better context for identifying
misconfigurations.
<!-- Issue number if applicable -->
#### Link to tracking issue
Related to #8721
<!--Describe what testing was performed and which tests were added.-->
#### Testing
A few scenarios and example output are given below. I will do additional
testing and add unit tests if necessary.
**1. Used as a receiver but not used as an exporter with 1 signal**
<details>
<summary><strong>config.yaml</strong></summary>
```yaml
receivers:
otlp:
protocols:
grpc:
exporters:
debug:
connectors:
forward:
service:
pipelines:
logs/in:
receivers: [otlp]
processors: []
exporters: [debug]
logs/out:
receivers: [forward]
processors: []
exporters: [debug]
```
</details>
Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as receiver in logs pipeline but not used in any supported exporter pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as receiver in [logs/out] pipeline but not used in any supported exporter pipeline
```
**2. Plain**
<details>
<summary><strong>config.yaml</strong></summary>
```yaml
receivers:
otlp:
protocols:
grpc:
exporters:
debug:
connectors:
forward:
service:
pipelines:
traces:
receivers: [ otlp ]
processors: [ ]
exporters: [ forward ]
metrics:
receivers: [ forward ]
processors: [ ]
exporters: [ debug ]
```
</details>
Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in traces pipeline but not used in any supported receiver pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in [traces] pipeline but not used in any supported receiver pipeline
```
**3. Multiple pipeline**
<details>
<summary><strong>config.yaml</strong></summary>
```yaml
receivers:
otlp:
protocols:
grpc:
exporters:
debug:
connectors:
forward:
service:
pipelines:
logs/in:
receivers: [otlp]
processors: []
exporters: [forward]
logs/in2:
receivers: [ otlp ]
processors: [ ]
exporters: [ forward ]
logs/out:
receivers: [otlp]
processors: []
exporters: [debug]
traces:
receivers: [ otlp ]
processors: [ ]
exporters: [ forward ]
metrics:
receivers: [ forward ]
processors: [ ]
exporters: [ debug ]
```
</details>
Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in logs pipeline but not used in any supported receiver pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in [logs/in2 logs/in] pipeline but not used in any supported receiver pipeline
```
---------
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
Implements the logger described in
https://github.com/open-telemetry/opentelemetry-collector/issues/12217
Alternative to #12057Resolves#11814
`component/componentattribute`:
- Initializes new module
- Defines constants for component telemetry attribute keys
- Defines a `zapcore.Core` which can remove attributes from the root
logger
`service`:
- Rebases component instantiation on attribute sets
- Internal constructors for attribute sets for each component type
- Constructs loggers from `componentattribute`
`otlpreceiver`:
- Uses `componentattribute` to remove `otelcol.signal` attribute from
logger
`memorylimiter`:
- Uses `componentattribute` to remove `otelcol.signal`,
`otelcol.pipeline.id` and `otelcol.component.id` attributes from logger
Subset of #12057
This PR adds a test to validate the expected number of instances of each
component. This framework becomes more useful once singleton components
are explicitly supported.
#### Description
[whitespace](https://golangci-lint.run/usage/linters/#whitespace) is a
linter that checks for unnecessary newlines at the start and end of
functions.
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Move componentprofiles to pipelineprofiles since only the signal
constant is defined in that package.
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Follows #11330
Currently, `connectorNode` contains separate `component.Component` and
`baseConsumer` fields. These fields are essentially two representations
of the same component, but `baseConsumer` may be wrapped in another
consumer that inherits capabilities. Rather than maintain two separate
handles, this PR switches to a unified field. I believe this change
helps normalize the connector node with other types of consumer nodes
and will enable further refactoring opportunities.
This PR follows #11321 by splitting up the primary test file into a few
related topics. I believe this will make further refactoring PRs easier
to follow.
#### Description
Testifylint doesn't support it yet.
This replaces `Contains(t, err.Error()` by `ErrorContains(t, err` and
`Equal(t, err.Error()` by `EqualError(t, err`
As they both check for nil error it becomes useless to check it yourself
without having defined a custom message
<!-- Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com> -->
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This is the last PR to add profiles support, adding it to the service
package.
This is based after #11023.
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the processor builder out of the `processor` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the exporter builder out of the `exporter` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the connector builder out of the `connector` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
#### Description
This moves the receiver builder out of the `receiver` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
#### Description
This PR makes component.InstanceID immutable. Previously it was a struct
with all fields exported. Technically this is a breaking change, but the
only thing using the InstanceID is the in-progress
healthcheckv2extension.
<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes#10494
<!--Describe what testing was performed and which tests were added.-->
#### Testing
units
<!--Describe the documentation added.-->
#### Documentation
code comments
<!--Please delete paragraphs that you did not use before submitting.-->
---------
Co-authored-by: Antoine Toulme <antoine@toulme.name>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
#### Description
This PR removes `ReportStatus` from `component.TelemetrySettings` and
instead expects components to check if their `component.Host` implements
a new `componentstatus.Reporter` interface.
<!-- Issue number if applicable -->
#### Link to tracking issue
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10725
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10413
<!--Describe what testing was performed and which tests were added.-->
#### Testing
unit tests and a sharedinstance e2e test.
The contrib tests will fail because this is a breaking change. If we
merge this I and @mwear can commit to updating contrib before the next
release.
---------
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
The primary objective here was to add new test cases for the graph.
However, I found that mutability assertions added in #8634 appear to be
nondeterministic. Therefore, important test cases cannot be covered with
them in place. This effectively removes the assertions about mutability
for now.
@dmitryax, I'm curious if you have better ideas here. I am thinking that
perhaps it would be better to have an entirely separate set of test
cases which are focused on mutability expectations.
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
#### Description
Reorganizes service to not require `servicetelemetry.TelemetrySettings`
and instead depend directly on `component.TelemetrySettings`
Whether or not we move forward with
https://github.com/open-telemetry/opentelemetry-collector/pull/10725 I
think this is a useful change for service.
#### Testing
Unit tests
This PR refactors the service graph initialization so that the graph can
be assembled without the intention to start it.
1. Do not save telemetry on the graph. The only use of this was for
reporting component status. Instead, pass in a reporter when starting or
stopping. Correspondingly, add an internal `statustest` package to make
it easy to pass in a status reporter in tests.
2. Decouple graph building from extension building. There isn't any
direct relationship so these things should be separated.
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings
Part of #9428
---------
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings
Part of #9428
~Follows
https://github.com/open-telemetry/opentelemetry-collector/pull/10333~
---------
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings
Part of #9428
---------
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
This reduces dependencies from the consumer package while making
testdata available across repos. It will allow us to remove duplicated
code and its a fairly small surface area.
Fixes
https://github.com/open-telemetry/opentelemetry-collector/issues/9886
---------
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
**Description:**
- Adds `component.MustNewType` to create a type. This function panics if
the type has invalid characters. Add similar functions
`component.MustNewID` and `component.MustNewIDWithName`.
- Adds `component.Type.String` to recover the string
- Use `component.MustNewType`, `component.MustNewID`,
`component.MustNewIDWithName` and `component.Type.String` everywhere in
this codebase. To do this I changed `component.Type` into an opaque
struct and checked for compile-time errors.
Some notes:
1. All components currently on core and contrib follow this rule. This
is still breaking for other components.
2. A future PR will change this into a struct, to actually validate this
(right now you can just do `component.Type("anything")` to bypass
validation). I want to do this in two steps to avoid breaking contrib
tests: we first introduce this function, and after that we change into a
struct.
**Link to tracking Issue:** Updates #9208
This fixes two closely related problems.
1. While fanoutconsumers do not themselves mutate data, they should
expose whether or not they are handing data off to consumers which may
do so. Otherwise, the service cannot correctly determine how to fan out
after a receiver. e.g. a receiver shared between two pipelines, one of
which contains an exporter or connector which mutates data.
2. Connectors can themselves mutate data but we were not taking this
into account when building the graph.
This is part of the continued component status reporting effort.
Currently we have automated status reporting for the following component
lifecycle events: `Starting`, `Stopping`, `Stopped` as well as
definitive errors that occur in the starting or stopping process (e.g.
as determined by an error return value). This leaves the responsibility
to the component to report runtime status after start and before stop.
We'd like to be able to extend the automatic status reporting to report
`StatusOK` if `Start` completes without an error. One complication with
this approach is that some components spawn async work (via goroutines)
that, depending on the Go scheduler, can report status before `Start`
returns. As such, we cannot assume a nil return value from `Start` means
the component has started properly. The solution is to detect if the
component has already reported status when start returns, if it has, we
will use the component-reported status and will not automatically report
status. If it hasn't, and `Start` returns without an error, we can
report `StatusOK`. Any subsequent reports from the component (async or
otherwise) will transition the component status accordingly.
The tl;dr is that we cannot control the execution of async code, that's
up to the Go scheduler, but we can handle the race, report the status
based on the execution, and not clobber status reported from within the
component during the startup process. That said, for components with
async starts, you may see a `StatusOK` before the component-reported
status, or just the component-reported status depending on the actual
execution of the code. In both cases, the end status will be same.
The work in this PR will allow us to simplify #8684 and #8788 and
ultimately choose which direction we want to go for runtime status
reporting.
**Link to tracking Issue:** #7682
**Testing:** units / manual
---------
Co-authored-by: Alex Boten <aboten@lightstep.com>
This change enables the runtime assertions to catch unintentional pdata
mutations in components claiming as non-mutating pdata. Without these
assertions, runtime errors may still occur, but thrown by unrelated
components, making it very difficult to troubleshoot.
This required introducing extra API to get the pdata mutability state:
- p[metric|trace|log].[Metrics|Traces|Logs].IsReadOnly()
Resolves:
https://github.com/open-telemetry/opentelemetry-collector/issues/6794
This PR introduces component status reporting. There have been several
attempts to introduce this functionality previously, with the most
recent being: #6560.
This PR was orignally based off of #6560, but has evolved based on the
feedback received and some additional enhancements to improve the ease
of use of the `ReportComponentStatus` API.
In earlier discussions (see
https://github.com/open-telemetry/opentelemetry-collector/pull/8169#issuecomment-1668367246)
we decided to model status as a finite state machine with the following
statuses: `Starting`, `OK`, `RecoverableError`, `PermanentError`,
`FatalError`. `Stopping`, and `Stopped`. A benefit of this design is
that `StatusWatcher`s will be notified on changes in status rather than
on potentially repetitive reports of the same status.
With the additional statuses and modeling them using a finite state
machine, there are more statuses to report. Rather than having each
component be responsible for reporting all of the statuses, I automated
status reporting where possible. A component's status will automatically
be set to `Starting` at startup. If the components `Start` returns an
error, the status will automatically be set to `PermanentError`. A
component is expected to report `StatusOK` when it has successfully
started (if it has successfully started) and from there can report
changes in status as it runs. It will likely be a common scenario for
components to transition between `StatusOK` and `StatusRecoverableError`
during their lifetime. In extenuating circumstances they can transition
into terminal states of `PermanentError` and `FatalError` (where a fatal
error initiates collector shutdown). Additionally, during component
Shutdown statuses are automatically reported where possible. A
component's status is set to `Stopping` when Shutdown is initially
called, if Shutdown returns an error, the status will be set to
`PermanentError` if it does not return an error, the status is set to
`Stopped`.
In #6560 ReportComponentStatus was implemented on the `Host` interface.
I found that few components use the Host interface, and none of them
save a handle to it (to be used outside of the `start` method). I found
that many components keep a handle to the `TelemetrySettings` that they
are initialized with, and this seemed like a more natural, convenient
place for the `ReportComponentStatus` API. I'm ultimately flexible on
where this method resides, but feel that `TelemetrySettings` a more user
friendly place for it.
Regardless of where the `ReportComponentStatus` method resides (Host or
TelemetrySettings), there is a difference in the method signature for
the API based on whether it is used from the service or from a
component. As the service is not bound to a specific component, it needs
to take the `instanceID` of a component as a parameter, whereas the
component version of the method already knows the `instanceID`. In #6560
this led to having both `component.Host` and `servicehost.Host` versions
of the Host interface to be used at the component or service levels. In
this version, we have the same for TelemetrySettings. There is a
`component.TelemetrySettings` and a `servicetelemetry.Settings` with the
only difference being the method signature of `ReportComponentStatus`.
Lastly, this PR sets up the machinery for report component status, and
allows extensions to be `StatusWatcher`s, but it does not introduce any
`StatusWatcher`s. We expect the OpAMP extension to be a `StatusWatcher`
and use data from this system as part of its AgentHealth message (the
message is currently being extended to accommodate more component level
details). We also expect there to be a non-OpAMP `StatusWatcher`
implementation, likely via the HealthCheck extension (or something
similiar).
**Link to tracking Issue:** #7682
cc: @tigrannajaryan @djaglowski @evan-bradley
---------
Co-authored-by: Tigran Najaryan <tnajaryan@splunk.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Alex Boten <aboten@lightstep.com>