<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
Wraps the component ID and type into the component's start retuned
error. If `telemetry::logs::level` is < error (e.g fatal), the service
error message does not give information about which pipeline component
failed:
Current error message:
```
$ otelcontribcol --config config.yaml
Error: cannot start pipelines: start function error
2024/12/10 10:07:25 collector server run finished with error: cannot start pipelines: start function error
```
With these changes:
```
$ otelcontribcol --config config.yaml
Error: cannot start pipelines: failed to start geoip processor: start
function error
2024/12/10 09:58:29 collector server run finished with error: cannot
start pipelines: failed to start geoip processor: start function error
```
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/issues/10426
<!--Describe what testing was performed and which tests were added.-->
#### Testing
<!--Describe the documentation added.-->
#### Documentation
<!--Please delete paragraphs that you did not use before submitting.-->
---------
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
create `service/hostcapabilities` package to expose `GetModuleInfos()`
from service/host. Also moves getExporters interface for
`GetExporters()` to this package.
<!-- Issue number if applicable -->
#### Link to tracking issue
Addresses planned work from #12296
<!--Describe what testing was performed and which tests were added.-->
#### Testing
none, creates interface for existing function
<!--Describe the documentation added.-->
#### Documentation
changelog yaml
<!--Please delete paragraphs that you did not use before submitting.-->
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
As mentioned in this
https://github.com/open-telemetry/opentelemetry-collector/issues/8721#issuecomment-1813468623,
the error message for unused connectors currently lacks specific
pipeline names, making debugging more difficult.
This PR enhances the error message by including pipeline names in the
`[signal/name]` format, consistent with how they appear in
`config.yaml`. This provides a better context for identifying
misconfigurations.
<!-- Issue number if applicable -->
#### Link to tracking issue
Related to #8721
<!--Describe what testing was performed and which tests were added.-->
#### Testing
A few scenarios and example output are given below. I will do additional
testing and add unit tests if necessary.
**1. Used as a receiver but not used as an exporter with 1 signal**
<details>
<summary><strong>config.yaml</strong></summary>
```yaml
receivers:
otlp:
protocols:
grpc:
exporters:
debug:
connectors:
forward:
service:
pipelines:
logs/in:
receivers: [otlp]
processors: []
exporters: [debug]
logs/out:
receivers: [forward]
processors: []
exporters: [debug]
```
</details>
Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as receiver in logs pipeline but not used in any supported exporter pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as receiver in [logs/out] pipeline but not used in any supported exporter pipeline
```
**2. Plain**
<details>
<summary><strong>config.yaml</strong></summary>
```yaml
receivers:
otlp:
protocols:
grpc:
exporters:
debug:
connectors:
forward:
service:
pipelines:
traces:
receivers: [ otlp ]
processors: [ ]
exporters: [ forward ]
metrics:
receivers: [ forward ]
processors: [ ]
exporters: [ debug ]
```
</details>
Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in traces pipeline but not used in any supported receiver pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in [traces] pipeline but not used in any supported receiver pipeline
```
**3. Multiple pipeline**
<details>
<summary><strong>config.yaml</strong></summary>
```yaml
receivers:
otlp:
protocols:
grpc:
exporters:
debug:
connectors:
forward:
service:
pipelines:
logs/in:
receivers: [otlp]
processors: []
exporters: [forward]
logs/in2:
receivers: [ otlp ]
processors: [ ]
exporters: [ forward ]
logs/out:
receivers: [otlp]
processors: []
exporters: [debug]
traces:
receivers: [ otlp ]
processors: [ ]
exporters: [ forward ]
metrics:
receivers: [ forward ]
processors: [ ]
exporters: [ debug ]
```
</details>
Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in logs pipeline but not used in any supported receiver pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in [logs/in2 logs/in] pipeline but not used in any supported receiver pipeline
```
---------
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
Subset of #12057
This PR adds a test to validate the expected number of instances of each
component. This framework becomes more useful once singleton components
are explicitly supported.
#### Description
[gofumpt](https://golangci-lint.run/usage/linters/#gofumpt) is a
stricter format than gofmt, while being backwards compatible.
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
Move componentprofiles to pipelineprofiles since only the signal
constant is defined in that package.
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This is the last PR to add profiles support, adding it to the service
package.
This is based after #11023.
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the processor builder out of the `processor` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the exporter builder out of the `exporter` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the connector builder out of the `connector` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
#### Description
This moves the receiver builder out of the `receiver` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
#### Description
This PR makes component.InstanceID immutable. Previously it was a struct
with all fields exported. Technically this is a breaking change, but the
only thing using the InstanceID is the in-progress
healthcheckv2extension.
<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes#10494
<!--Describe what testing was performed and which tests were added.-->
#### Testing
units
<!--Describe the documentation added.-->
#### Documentation
code comments
<!--Please delete paragraphs that you did not use before submitting.-->
---------
Co-authored-by: Antoine Toulme <antoine@toulme.name>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
#### Description
This PR removes `ReportStatus` from `component.TelemetrySettings` and
instead expects components to check if their `component.Host` implements
a new `componentstatus.Reporter` interface.
<!-- Issue number if applicable -->
#### Link to tracking issue
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10725
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10413
<!--Describe what testing was performed and which tests were added.-->
#### Testing
unit tests and a sharedinstance e2e test.
The contrib tests will fail because this is a breaking change. If we
merge this I and @mwear can commit to updating contrib before the next
release.
---------
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
#### Description
Reorganizes service to not require `servicetelemetry.TelemetrySettings`
and instead depend directly on `component.TelemetrySettings`
Whether or not we move forward with
https://github.com/open-telemetry/opentelemetry-collector/pull/10725 I
think this is a useful change for service.
#### Testing
Unit tests
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
Improved logging of component start errors, so the log lines appear
where they chronologically happen, rather than at the end. They also use
the component logger, which makes them look nicer.
Output before change:
```text
2024-06-19T14:15:30.822+0200 info service@v0.103.0/service.go:115 Setting up own telemetry...
2024-06-19T14:15:30.822+0200 info service@v0.103.0/telemetry.go:96 Serving metrics {"address": ":8888", "level": "Normal"}
2024-06-19T14:15:30.823+0200 info service@v0.103.0/service.go:182 Starting otelcorecol... {"Version": "0.103.0-dev", "NumCPU": 32}
2024-06-19T14:15:30.823+0200 info extensions/extensions.go:35 Starting extensions...
2024-06-19T14:15:30.823+0200 info service@v0.103.0/service.go:245 Starting shutdown...
2024-06-19T14:15:30.823+0200 info extensions/extensions.go:61 Stopping extensions...
2024-06-19T14:15:30.823+0200 info service@v0.103.0/service.go:259 Shutdown complete.
Error: cannot start pipelines: failed to resolve authenticator "oauth2client": authenticator not found
2024/06/19 14:15:30 collector server run finished with error: cannot start pipelines: failed to resolve authenticator "oauth2client": authenticator not found
```
Output after change:
```
2024-06-19T14:25:18.324+0200 info service@v0.103.0/service.go:115 Setting up own telemetry...
2024-06-19T14:25:18.325+0200 info service@v0.103.0/telemetry.go:96 Serving metrics {"address": ":8888", "level": "Normal"}
2024-06-19T14:25:18.326+0200 info service@v0.103.0/service.go:182 Starting otelcorecol... {"Version": "0.103.0-dev", "NumCPU": 32}
2024-06-19T14:25:18.326+0200 info extensions/extensions.go:35 Starting extensions...
2024-06-19T14:25:18.326+0200 error graph/graph.go:428 Failed to start component {"error": "failed to resolve authenticator \"oauth2client\": authenticator not found", "type": "Exporter", "id": "otlphttp"}
2024-06-19T14:25:18.326+0200 info service@v0.103.0/service.go:245 Starting shutdown...
2024-06-19T14:25:18.326+0200 info extensions/extensions.go:61 Stopping extensions...
2024-06-19T14:25:18.326+0200 info service@v0.103.0/service.go:259 Shutdown complete.
Error: cannot start pipelines: failed to resolve authenticator "oauth2client": authenticator not found
2024/06/19 14:25:18 collector server run finished with error: cannot start pipelines: failed to resolve authenticator "oauth2client": authenticator not found
```
We've added the following line:
```
2024-06-19T14:25:18.326+0200 error graph/graph.go:428 Failed to start component {"error": "failed to resolve authenticator \"oauth2client\": authenticator not found", "type": "Exporter", "id": "otlphttp"}
```
For extensions, the new line looks like so:
```
2024-06-25T16:59:18.513+0200 error extensions/extensions.go:54 Failed to start extension {"kind": "extension", "name": "memory_limiter", "error": "failed to get memory limit"}
```
<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes#7078
This PR refactors the service graph initialization so that the graph can
be assembled without the intention to start it.
1. Do not save telemetry on the graph. The only use of this was for
reporting component status. Instead, pass in a reporter when starting or
stopping. Correspondingly, add an internal `statustest` package to make
it easy to pass in a status reporter in tests.
2. Decouple graph building from extension building. There isn't any
direct relationship so these things should be separated.
<!--Describe the documentation added.-->
#### Documentation
Creates an internal architecture file. In it is a diagram of the startup
flow of the collector as well as links to key files / packages. I also
added package level comments to some key packages.
I wrote some other documentation in
https://github.com/open-telemetry/opentelemetry-collector/pull/10029 but
split the PRs up.
---------
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
This fixes two closely related problems.
1. While fanoutconsumers do not themselves mutate data, they should
expose whether or not they are handing data off to consumers which may
do so. Otherwise, the service cannot correctly determine how to fan out
after a receiver. e.g. a receiver shared between two pipelines, one of
which contains an exporter or connector which mutates data.
2. Connectors can themselves mutate data but we were not taking this
into account when building the graph.
This is part of the continued component status reporting effort.
Currently we have automated status reporting for the following component
lifecycle events: `Starting`, `Stopping`, `Stopped` as well as
definitive errors that occur in the starting or stopping process (e.g.
as determined by an error return value). This leaves the responsibility
to the component to report runtime status after start and before stop.
We'd like to be able to extend the automatic status reporting to report
`StatusOK` if `Start` completes without an error. One complication with
this approach is that some components spawn async work (via goroutines)
that, depending on the Go scheduler, can report status before `Start`
returns. As such, we cannot assume a nil return value from `Start` means
the component has started properly. The solution is to detect if the
component has already reported status when start returns, if it has, we
will use the component-reported status and will not automatically report
status. If it hasn't, and `Start` returns without an error, we can
report `StatusOK`. Any subsequent reports from the component (async or
otherwise) will transition the component status accordingly.
The tl;dr is that we cannot control the execution of async code, that's
up to the Go scheduler, but we can handle the race, report the status
based on the execution, and not clobber status reported from within the
component during the startup process. That said, for components with
async starts, you may see a `StatusOK` before the component-reported
status, or just the component-reported status depending on the actual
execution of the code. In both cases, the end status will be same.
The work in this PR will allow us to simplify #8684 and #8788 and
ultimately choose which direction we want to go for runtime status
reporting.
**Link to tracking Issue:** #7682
**Testing:** units / manual
---------
Co-authored-by: Alex Boten <aboten@lightstep.com>
This PR introduces component status reporting. There have been several
attempts to introduce this functionality previously, with the most
recent being: #6560.
This PR was orignally based off of #6560, but has evolved based on the
feedback received and some additional enhancements to improve the ease
of use of the `ReportComponentStatus` API.
In earlier discussions (see
https://github.com/open-telemetry/opentelemetry-collector/pull/8169#issuecomment-1668367246)
we decided to model status as a finite state machine with the following
statuses: `Starting`, `OK`, `RecoverableError`, `PermanentError`,
`FatalError`. `Stopping`, and `Stopped`. A benefit of this design is
that `StatusWatcher`s will be notified on changes in status rather than
on potentially repetitive reports of the same status.
With the additional statuses and modeling them using a finite state
machine, there are more statuses to report. Rather than having each
component be responsible for reporting all of the statuses, I automated
status reporting where possible. A component's status will automatically
be set to `Starting` at startup. If the components `Start` returns an
error, the status will automatically be set to `PermanentError`. A
component is expected to report `StatusOK` when it has successfully
started (if it has successfully started) and from there can report
changes in status as it runs. It will likely be a common scenario for
components to transition between `StatusOK` and `StatusRecoverableError`
during their lifetime. In extenuating circumstances they can transition
into terminal states of `PermanentError` and `FatalError` (where a fatal
error initiates collector shutdown). Additionally, during component
Shutdown statuses are automatically reported where possible. A
component's status is set to `Stopping` when Shutdown is initially
called, if Shutdown returns an error, the status will be set to
`PermanentError` if it does not return an error, the status is set to
`Stopped`.
In #6560 ReportComponentStatus was implemented on the `Host` interface.
I found that few components use the Host interface, and none of them
save a handle to it (to be used outside of the `start` method). I found
that many components keep a handle to the `TelemetrySettings` that they
are initialized with, and this seemed like a more natural, convenient
place for the `ReportComponentStatus` API. I'm ultimately flexible on
where this method resides, but feel that `TelemetrySettings` a more user
friendly place for it.
Regardless of where the `ReportComponentStatus` method resides (Host or
TelemetrySettings), there is a difference in the method signature for
the API based on whether it is used from the service or from a
component. As the service is not bound to a specific component, it needs
to take the `instanceID` of a component as a parameter, whereas the
component version of the method already knows the `instanceID`. In #6560
this led to having both `component.Host` and `servicehost.Host` versions
of the Host interface to be used at the component or service levels. In
this version, we have the same for TelemetrySettings. There is a
`component.TelemetrySettings` and a `servicetelemetry.Settings` with the
only difference being the method signature of `ReportComponentStatus`.
Lastly, this PR sets up the machinery for report component status, and
allows extensions to be `StatusWatcher`s, but it does not introduce any
`StatusWatcher`s. We expect the OpAMP extension to be a `StatusWatcher`
and use data from this system as part of its AgentHealth message (the
message is currently being extended to accommodate more component level
details). We also expect there to be a non-OpAMP `StatusWatcher`
implementation, likely via the HealthCheck extension (or something
similiar).
**Link to tracking Issue:** #7682
cc: @tigrannajaryan @djaglowski @evan-bradley
---------
Co-authored-by: Tigran Najaryan <tnajaryan@splunk.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Alex Boten <aboten@lightstep.com>
Alternate to
https://github.com/open-telemetry/opentelemetry-collector/pull/8003
Fixes
#https://github.com/open-telemetry/opentelemetry-collector/issues/7892
Validation of connectors was too aggressive such that a connector that
was used in any combination of unsupported roles would fail. Instead,
validation should pass as long as each use of the connector has a
supported corresponding use.
For example, the forward connector may forward traces and metrics at the
same time. Previously, validation would fail because it detected that
traces->metrics and metrics->traces were possible connections. Now it
will pass as long as there is a supported connection type for each
pipeline in which the connector is used.
This PR adds helpers to `connectortest` to aid the construction of
`connector.*Router`s for testing connectors. This was implemented based
on @djaglowski's [comment
here](https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/21498#issuecomment-1542682841),
with some slight modifications. I found it more ergonomic to pass the
sink into the `WithTracesSink` (and similar) options. You usually want a
handle on the sink after creating the router. While it's possible to get
the sink out of the router after the fact, it's a little cumbersome.
These helpers will be useful for
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/21498
and future connectors that need use of routers.
For example, here is a test with the consumer passed in:
```go
func TestFanoutTracesWithSink(t *testing.T) {
var sink0, sink1 consumertest.TracesSink
tr, err := NewTracesRouterSink(
WithTracesSink(component.NewIDWithName(component.DataTypeTraces, "0"), &sink0),
WithTracesSink(component.NewIDWithName(component.DataTypeTraces, "1"), &sink1),
)
require.NoError(t, err)
require.Equal(t, 0, sink0.SpanCount())
require.Equal(t, 0, sink1.SpanCount())
td := testdata.GenerateTraces(1)
err = tr.(consumer.Traces).ConsumeTraces(context.Background(), td)
require.NoError(t, err)
require.Equal(t, 1, sink0.SpanCount())
require.Equal(t, 1, sink1.SpanCount())
}
```
The same test having to extract the consumer out after the fact:
```go
func TestFanoutTracesWithSink(t *testing.T) {
traces0 := component.NewIDWithName(component.DataTypeTraces, "0")
traces1 := component.NewIDWithName(component.DataTypeTraces, "1")
tr, err := NewTracesRouterSink(
WithTracesSink(traces0),
WithTracesSink(traces1),
)
require.NoError(t, err)
cons0, _ := tr.Consumer(traces0)
sink0 := cons0.(*consumertest.TracesSink)
cons1, _ := tr.Consumer(traces1)
sink1 := cons1.(*consumertest.TracesSink)
require.Equal(t, 0, sink0.SpanCount())
require.Equal(t, 0, sink1.SpanCount())
td := testdata.GenerateTraces(1)
err = tr.(consumer.Traces).ConsumeTraces(context.Background(), td)
require.NoError(t, err)
require.Equal(t, 1, sink0.SpanCount())
require.Equal(t, 1, sink1.SpanCount())}
}
```
**Link to tracking Issue:**
#7672
**Testing:**
Unit tests
**Documentation:**
Source code comments
---------
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
* [chore] use license shortform
To remain consistent w/ contrib repo, see https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/22052
Signed-off-by: Alex Boten <aboten@lightstep.com>
* make goporto
Signed-off-by: Alex Boten <aboten@lightstep.com>
---------
Signed-off-by: Alex Boten <aboten@lightstep.com>