Commit Graph

56 Commits

Author SHA1 Message Date
Daniel Jaglowski 279752c11f
[service/internal/graph] Measure telemetry as it is passed between pipeline components (#12812)
Depends on
https://github.com/open-telemetry/opentelemetry-collector/pull/12856

Resolves #12676

This is a reboot of #11311, incorporating metrics defined in the
[component telemetry
RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md)
and attributes added in #12617.

The basic pattern is:
- When building any pipeline component which produces data, wrap the
"next consumer" with instrumentation to measure the number of items
being passed. This wrapped consumer is then passed into the constructor
of the component.
- When building any pipeline component which consumes data, wrap the
component itself. This wrapped consumer is saved onto the graph node so
that it can be retrieved during graph assembly.

---------

Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
2025-05-12 08:33:02 +00:00
Daniel Jaglowski d804ef5910
Fix 'otelcol.component.kind' value capitalization (#12865)
This brings capitalization in line with the attributes defined in the
[Pipeline Component
Telemetry](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md#attributes)
RFC.
2025-04-16 03:18:27 +00:00
Matthieu MOREL 564818fd7f
[chore]: fix testifylint rules (#12791)
#### Description

Fixes testifylint rules which where disabled with golangci-lint v2
upgrade

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-04-02 23:10:07 +00:00
Sudipto Baral 18e18b21da
Add pipeline ID to the error message for unused connectors. (#12410)
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
As mentioned in this
https://github.com/open-telemetry/opentelemetry-collector/issues/8721#issuecomment-1813468623,
the error message for unused connectors currently lacks specific
pipeline names, making debugging more difficult.

This PR enhances the error message by including pipeline names in the
`[signal/name]` format, consistent with how they appear in
`config.yaml`. This provides a better context for identifying
misconfigurations.

<!-- Issue number if applicable -->
#### Link to tracking issue
Related to #8721 

<!--Describe what testing was performed and which tests were added.-->
#### Testing
A few scenarios and example output are given below. I will do additional
testing and add unit tests if necessary.

**1. Used as a receiver but not used as an exporter with 1 signal**
<details>
  <summary><strong>config.yaml</strong></summary>

```yaml
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  debug:
connectors:
  forward:
service:
  pipelines:
    logs/in:
      receivers: [otlp]
      processors: []
      exporters: [debug]
    logs/out:
      receivers: [forward]
      processors: []
      exporters: [debug]
```
</details>

Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as receiver in logs pipeline but not used in any supported exporter pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as receiver in [logs/out] pipeline but not used in any supported exporter pipeline
```

**2. Plain**
<details>
  <summary><strong>config.yaml</strong></summary>

```yaml
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  debug:
connectors:
  forward:
service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ ]
      exporters: [ forward ]
    metrics:
      receivers: [ forward ]
      processors: [ ]
      exporters: [ debug ]
```
</details>

Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in traces pipeline but not used in any supported receiver pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in [traces] pipeline but not used in any supported receiver pipeline
```
**3. Multiple pipeline**
<details>
  <summary><strong>config.yaml</strong></summary>

```yaml
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  debug:
connectors:
  forward:
service:
  pipelines:
    logs/in:
      receivers: [otlp]
      processors: []
      exporters: [forward]
    logs/in2:
      receivers: [ otlp ]
      processors: [ ]
      exporters: [ forward ]
    logs/out:
      receivers: [otlp]
      processors: []
      exporters: [debug]
    traces:
      receivers: [ otlp ]
      processors: [ ]
      exporters: [ forward ]
    metrics:
      receivers: [ forward ]
      processors: [ ]
      exporters: [ debug ]
```
</details>

Main Branch Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in logs pipeline but not used in any supported receiver pipeline
```
Proposed Output:
```
Error: failed to build pipelines: connector "forward" used as exporter in [logs/in2 logs/in] pipeline but not used in any supported receiver pipeline

```

---------

Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
2025-02-24 01:06:54 +00:00
Daniel Jaglowski 5d5fb21acf
Introduce component logger with appropriate attributes (#12259)
Implements the logger described in
https://github.com/open-telemetry/opentelemetry-collector/issues/12217

Alternative to #12057

Resolves #11814

`component/componentattribute`:
- Initializes new module
- Defines constants for component telemetry attribute keys
- Defines a `zapcore.Core` which can remove attributes from the root
logger

`service`:
- Rebases component instantiation on attribute sets
- Internal constructors for attribute sets for each component type
- Constructs loggers from `componentattribute`

`otlpreceiver`:
- Uses `componentattribute` to remove `otelcol.signal` attribute from
logger

`memorylimiter`:
- Uses `componentattribute` to remove `otelcol.signal`,
`otelcol.pipeline.id` and `otelcol.component.id` attributes from logger
2025-02-06 16:53:20 +00:00
Daniel Jaglowski 81f1fad0ee
[chore] Add test to validate expected component instances (#12071)
Subset of #12057

This PR adds a test to validate the expected number of instances of each
component. This framework becomes more useful once singleton components
are explicitly supported.
2025-01-20 13:45:55 +00:00
Dmitrii Anoshin 9206c68ec2
Deprecate pipelineprofiles module in favor of xpipeline (#11888)
to allow adding more experimental data types.

Updates
https://github.com/open-telemetry/opentelemetry-collector/issues/11778
2024-12-13 23:23:06 +00:00
Matthieu MOREL 0204d957e5
[chore]: enable whitespace linter (#11579)
#### Description

[whitespace](https://golangci-lint.run/usage/linters/#whitespace) is a
linter that checks for unnecessary newlines at the start and end of
functions.

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-10-31 12:15:54 -07:00
Bogdan Drutu 98a326a3c5
Move componentprofiles to pipelineprofiles (#11421)
Move componentprofiles to pipelineprofiles since only the signal
constant is defined in that package.

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2024-10-13 13:47:00 -07:00
Bogdan Drutu c8005ec855
[chore] Pass the signal constant instead of the string (#11420)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2024-10-13 13:21:03 -07:00
Daniel Jaglowski 151f8377d8
[chore][graph] Remove connectorNode's separate baseConsumer (#11333)
Follows #11330

Currently, `connectorNode` contains separate `component.Component` and
`baseConsumer` fields. These fields are essentially two representations
of the same component, but `baseConsumer` may be wrapped in another
consumer that inherits capabilities. Rather than maintain two separate
handles, this PR switches to a unified field. I believe this change
helps normalize the connector node with other types of consumer nodes
and will enable further refactoring opportunities.
2024-10-09 12:20:53 -07:00
Daniel Jaglowski af27e16f0c
[chore][graph] Split test file (#11329)
This PR follows #11321 by splitting up the primary test file into a few
related topics. I believe this will make further refactoring PRs easier
to follow.
2024-10-01 13:30:58 -07:00
Tyler Helmuth e69f2f38ff
[componentstatus] Continue DataType rename (#11313)
#### Description
Continues the DataType rename process for
`NewInstanceIDWithPipelineIDs`, `AllPipelineIDsWithPipelineIDs`, and
`WithPipelineIDs`.

#### Link to tracking issue
Related to
https://github.com/open-telemetry/opentelemetry-collector/issues/9429
2024-10-01 17:27:59 +02:00
Bogdan Drutu 6412988f19
[chore] Move back connector definitions, make profile embed connector (#11306)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2024-09-30 13:53:03 -07:00
Tyler Helmuth 8ced6eb6e1
[service] Remove deprecations and continue renames around DataType (#11303)
#### Description

Continues deprecation/rename processor for
`Config.PipelinesWithPipelineID`, `pipelines.ConfigWithPipelineID` and
`GetExportersWithSignal`.


#### Link to tracking issue
Related to
https://github.com/open-telemetry/opentelemetry-collector/issues/9429
2024-09-30 11:31:26 -07:00
Matthieu MOREL aba139c2cb
[chore]: use ErrorContains and EqualError (#11295)
#### Description

Testifylint doesn't support it yet. 
This replaces `Contains(t, err.Error()` by `ErrorContains(t, err` and
`Equal(t, err.Error()` by `EqualError(t, err`
As they both check for nil error it becomes useless to check it yourself
without having defined a custom message

<!-- Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com> -->

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-28 08:06:09 -07:00
Bogdan Drutu 99cf16ef4c
[chore] Move back exporter definitions, make profile embed exporter (#11290)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2024-09-27 14:09:13 -07:00
Bogdan Drutu 632461fb91
[chore] Move back processor definitions, make profile embed processor (#11286)
Same as
https://github.com/open-telemetry/opentelemetry-collector/pull/11254 but
for processor

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2024-09-27 11:07:54 -07:00
Bogdan Drutu 5cc717d747
[chore] Move back receiver definitions, make profile embed receiver (#11254)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2024-09-26 12:46:40 -07:00
Tyler Helmuth 77bb849aa0
[component] Refactor to use pipeline.ID and pipeline.Signal (#11204)
#### Description
Depends on
https://github.com/open-telemetry/opentelemetry-collector/pull/11209

This PR is a non-breaking implementation of
https://github.com/open-telemetry/opentelemetry-collector/pull/10947. It
adds a new module, `pipeline`, which houses a `pipeline.ID` and
`pipeline.Signal`. `pipeline.ID` is used to identify a pipeline within
the service. `pipeline.Signal` is uses to identify the signal associated
to a pipeline.

I do this work begrudgingly. As the PR shows, this is a huge refactor
when done in a non-breaking way, will require 3 full releases, and
doesn't benefit our [End Users or, in my opinion, our Component
Developers or Collector Library
Users](https://github.com/open-telemetry/opentelemetry-collector/blob/main/CONTRIBUTING.md#target-audiences).
I view this refactor as a Nice-To-Have, not a requirement for Component
1.0.

<!-- Issue number if applicable -->
#### Link to tracking issue
Works towards
https://github.com/open-telemetry/opentelemetry-collector/issues/9429
2024-09-23 07:38:59 -07:00
Matthieu MOREL 37f783308e
[chore]: enable require-error rule from testifylint (#11199)
#### Description

Testifylint is a linter that provides best practices with the use of
testify.

This PR enables
[require-error](https://github.com/Antonboom/testifylint?tab=readme-ov-file#require-error)
rule from [testifylint](https://github.com/Antonboom/testifylint)

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-18 15:02:22 -07:00
Alex Boten fbffbb0820
[chore] small test improvements (#11211)
Clean up some inconsistencies in the test code across the components.

Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
2024-09-18 13:47:25 -07:00
Matthieu MOREL 6925a306fa
[chore]: enable len and empty rules from testifylint (#11021)
#### Description

Testifylint is a linter that provides best practices with the use of
testify.

This PR enables
[len](https://github.com/Antonboom/testifylint?tab=readme-ov-file#len)
and
[empty](https://github.com/Antonboom/testifylint?tab=readme-ov-file#empty)
rules from [testifylint](https://github.com/Antonboom/testifylint)

It also adds testifylint as tool to use with a make command

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-09 09:57:58 -07:00
Damien Mathieu 720f3a86a3
Add profiles support in service (#11024)
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This is the last PR to add profiles support, adding it to the service
package.

This is based after #11023.
2024-09-05 17:56:11 +02:00
Damien Mathieu 18d5c02ade
Move processor builders into internal service (#10782)
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This moves the processor builder out of the `processor` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.

<!-- Issue number if applicable -->
#### Link to tracking issue

https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
2024-08-22 12:22:23 +02:00
Damien Mathieu 7cd1579d1f
Move exporter builder into internal service (#10783)
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This moves the exporter builder out of the `exporter` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.

<!-- Issue number if applicable -->
#### Link to tracking issue

https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
2024-08-22 11:43:35 +02:00
Damien Mathieu cde1055559
Move connector builder into internal service (#10784)
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This moves the connector builder out of the `connector` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.

<!-- Issue number if applicable -->
#### Link to tracking issue

https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
2024-08-22 10:35:19 +02:00
Damien Mathieu 454432e06f
Move receiver builder into internal service (#10781)
#### Description

This moves the receiver builder out of the `receiver` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.

#### Link to tracking issue

https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
2024-08-21 07:59:38 -07:00
Matthew Wear 98fb888dfe
[component] Make InstanceID immutable (#10495)
#### Description
This PR makes component.InstanceID immutable. Previously it was a struct
with all fields exported. Technically this is a breaking change, but the
only thing using the InstanceID is the in-progress
healthcheckv2extension.

<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes #10494

<!--Describe what testing was performed and which tests were added.-->
#### Testing
units

<!--Describe the documentation added.-->
#### Documentation
code comments
<!--Please delete paragraphs that you did not use before submitting.-->

---------

Co-authored-by: Antoine Toulme <antoine@toulme.name>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
2024-08-21 10:41:55 +02:00
Tyler Helmuth cb24d0c7d7
[component] Remove ReportStatus from component.TelemetrySettings (#10777)
#### Description

This PR removes `ReportStatus` from `component.TelemetrySettings` and
instead expects components to check if their `component.Host` implements
a new `componentstatus.Reporter` interface.

<!-- Issue number if applicable -->
#### Link to tracking issue
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10725
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10413

<!--Describe what testing was performed and which tests were added.-->
#### Testing
unit tests and a sharedinstance e2e test.

The contrib tests will fail because this is a breaking change. If we
merge this I and @mwear can commit to updating contrib before the next
release.

---------

Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
2024-08-16 09:27:01 +02:00
Daniel Jaglowski afefb7464e
[chore] Add readonly matrix test. Remove invalid mutability assertions (#10632)
The primary objective here was to add new test cases for the graph.
However, I found that mutability assertions added in #8634 appear to be
nondeterministic. Therefore, important test cases cannot be covered with
them in place. This effectively removes the assertions about mutability
for now.

@dmitryax, I'm curious if you have better ideas here. I am thinking that
perhaps it would be better to have an entirely separate set of test
cases which are focused on mutability expectations.

Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
2024-08-02 16:05:48 +02:00
Tyler Helmuth fb5b1e6aa5
[service] Remove servicetelemetry.TelemetrySettings (#10728)
#### Description
Reorganizes service to not require `servicetelemetry.TelemetrySettings`
and instead depend directly on `component.TelemetrySettings`

Whether or not we move forward with
https://github.com/open-telemetry/opentelemetry-collector/pull/10725 I
think this is a useful change for service.

#### Testing
Unit tests
2024-07-29 10:29:05 +02:00
Antoine Toulme a85ace8a22
[chore] fix typo (#10680) 2024-07-22 06:36:45 -07:00
Daniel Jaglowski 8e2db5f6a9
[chore] Fix mutability assertion loops (#10627)
I noticed that these loops are making the same assertion repeatedly,
rather than checking each element.
2024-07-16 14:57:46 -07:00
Daniel Jaglowski b3c781b90e
[chore] Remove telemetry from graph initialization (#10446)
This PR refactors the service graph initialization so that the graph can
be assembled without the intention to start it.
1. Do not save telemetry on the graph. The only use of this was for
reporting component status. Instead, pass in a reporter when starting or
stopping. Correspondingly, add an internal `statustest` package to make
it easy to pass in a status reporter in tests.
2. Decouple graph building from extension building. There isn't any
direct relationship so these things should be separated.
2024-06-21 12:32:21 +02:00
Alex Boten 3b3deb8dbe
[connector] deprecate CreateSettings -> Settings (#10338)
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings

Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
2024-06-06 10:04:47 -07:00
Alex Boten 9907ba50df
[processor] deprecate CreateSettings -> Settings (#10336)
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings
    
Part of #9428

---------

Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
2024-06-06 09:34:53 -07:00
Alex Boten f0c8787d2b
[exporter] deprecate CreateSettings -> Settings (#10335)
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings
    
Part of #9428

~Follows
https://github.com/open-telemetry/opentelemetry-collector/pull/10333~

---------

Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
2024-06-06 08:03:40 -07:00
Alex Boten 1e44a9c473
[receiver] deprecate CreateSettings -> Settings (#10333)
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings

Part of #9428

---------

Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
2024-06-05 13:57:23 -07:00
Bogdan Drutu edae2c7469
[chore] test that implementation implements interface without allocation (#10214) 2024-05-23 23:40:05 +02:00
Alex Boten fc28929061
move internal/testdata to pdata/testdata (#9885)
This reduces dependencies from the consumer package while making
testdata available across repos. It will allow us to remove duplicated
code and its a fairly small surface area.

Fixes
https://github.com/open-telemetry/opentelemetry-collector/issues/9886

---------

Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
2024-04-08 08:36:57 -07:00
Alex Boten 062d0a7ffc
[chore] remove unnecessary underscores (#9580)
As per feedback from my previous PR

Signed-off-by: Alex Boten <aboten@lightstep.com>
2024-02-13 13:34:53 -08:00
Alex Boten 4688461318
[chore] fix unused params (#9578)
Related to #9577

Signed-off-by: Alex Boten <aboten@lightstep.com>
2024-02-13 11:04:48 -08:00
Pablo Baeyens 26c157e3bf
[component] Add MustNewType constructor for component.Type (#9414)
**Description:** 

- Adds `component.MustNewType` to create a type. This function panics if
the type has invalid characters. Add similar functions
`component.MustNewID` and `component.MustNewIDWithName`.
- Adds `component.Type.String` to recover the string
- Use `component.MustNewType`, `component.MustNewID`,
`component.MustNewIDWithName` and `component.Type.String` everywhere in
this codebase. To do this I changed `component.Type` into an opaque
struct and checked for compile-time errors.

Some notes:

1. All components currently on core and contrib follow this rule. This
is still breaking for other components.
2. A future PR will change this into a struct, to actually validate this
(right now you can just do `component.Type("anything")` to bypass
validation). I want to do this in two steps to avoid breaking contrib
tests: we first introduce this function, and after that we change into a
struct.

**Link to tracking Issue:** Updates #9208
2024-02-02 17:33:03 +01:00
Antoine Toulme c5a2c78d61
Move error out of `ReportComponentStatus` function signature, use `ReportStatus` instead (#9175)
Fixes #9148
2024-01-09 09:36:41 -08:00
Daniel Jaglowski 7ec38e5c19
Fanout consumer does not need to mutate in some cases (#9062)
Follow up to
https://github.com/open-telemetry/opentelemetry-collector/pull/9053.

@dmitryax pointed out
[here](https://github.com/open-telemetry/opentelemetry-collector/pull/9053#discussion_r1420871665)
that the fanout consumer will pass original data to a non-mutating
consumer if any is available. This PR incorporates that point and
updates test expectations accordingly.
2023-12-11 10:04:37 -08:00
Daniel Jaglowski 7c58e71515
Fix bug where MutatesData would not correctly propogate through connectors (#9053)
This fixes two closely related problems. 
1. While fanoutconsumers do not themselves mutate data, they should
expose whether or not they are handing data off to consumers which may
do so. Otherwise, the service cannot correctly determine how to fan out
after a receiver. e.g. a receiver shared between two pipelines, one of
which contains an exporter or connector which mutates data.
2. Connectors can themselves mutate data but we were not taking this
into account when building the graph.
2023-12-09 09:44:55 -08:00
Matthew Wear 433f7aef92
Automate status reporting on start (#8836)
This is part of the continued component status reporting effort.
Currently we have automated status reporting for the following component
lifecycle events: `Starting`, `Stopping`, `Stopped` as well as
definitive errors that occur in the starting or stopping process (e.g.
as determined by an error return value). This leaves the responsibility
to the component to report runtime status after start and before stop.
We'd like to be able to extend the automatic status reporting to report
`StatusOK` if `Start` completes without an error. One complication with
this approach is that some components spawn async work (via goroutines)
that, depending on the Go scheduler, can report status before `Start`
returns. As such, we cannot assume a nil return value from `Start` means
the component has started properly. The solution is to detect if the
component has already reported status when start returns, if it has, we
will use the component-reported status and will not automatically report
status. If it hasn't, and `Start` returns without an error, we can
report `StatusOK`. Any subsequent reports from the component (async or
otherwise) will transition the component status accordingly.

The tl;dr is that we cannot control the execution of async code, that's
up to the Go scheduler, but we can handle the race, report the status
based on the execution, and not clobber status reported from within the
component during the startup process. That said, for components with
async starts, you may see a `StatusOK` before the component-reported
status, or just the component-reported status depending on the actual
execution of the code. In both cases, the end status will be same.

The work in this PR will allow us to simplify #8684 and #8788 and
ultimately choose which direction we want to go for runtime status
reporting.

**Link to tracking Issue:** #7682

**Testing:** units / manual

---------

Co-authored-by: Alex Boten <aboten@lightstep.com>
2023-11-28 12:43:32 -08:00
Dmitrii Anoshin 8a385c22e5
[pdata] Enable the pdata mutation safeguards in the fanout consumers (#8634)
This change enables the runtime assertions to catch unintentional pdata
mutations in components claiming as non-mutating pdata. Without these
assertions, runtime errors may still occur, but thrown by unrelated
components, making it very difficult to troubleshoot.

This required introducing extra API to get the pdata mutability state:
- p[metric|trace|log].[Metrics|Traces|Logs].IsReadOnly()

Resolves:
https://github.com/open-telemetry/opentelemetry-collector/issues/6794
2023-10-16 11:12:50 -07:00
Matthew Wear 53615832e6
Component Status Reporting (#8169)
This PR introduces component status reporting. There have been several
attempts to introduce this functionality previously, with the most
recent being: #6560.

This PR was orignally based off of #6560, but has evolved based on the
feedback received and some additional enhancements to improve the ease
of use of the `ReportComponentStatus` API.

In earlier discussions (see
https://github.com/open-telemetry/opentelemetry-collector/pull/8169#issuecomment-1668367246)
we decided to model status as a finite state machine with the following
statuses: `Starting`, `OK`, `RecoverableError`, `PermanentError`,
`FatalError`. `Stopping`, and `Stopped`. A benefit of this design is
that `StatusWatcher`s will be notified on changes in status rather than
on potentially repetitive reports of the same status.

With the additional statuses and modeling them using a finite state
machine, there are more statuses to report. Rather than having each
component be responsible for reporting all of the statuses, I automated
status reporting where possible. A component's status will automatically
be set to `Starting` at startup. If the components `Start` returns an
error, the status will automatically be set to `PermanentError`. A
component is expected to report `StatusOK` when it has successfully
started (if it has successfully started) and from there can report
changes in status as it runs. It will likely be a common scenario for
components to transition between `StatusOK` and `StatusRecoverableError`
during their lifetime. In extenuating circumstances they can transition
into terminal states of `PermanentError` and `FatalError` (where a fatal
error initiates collector shutdown). Additionally, during component
Shutdown statuses are automatically reported where possible. A
component's status is set to `Stopping` when Shutdown is initially
called, if Shutdown returns an error, the status will be set to
`PermanentError` if it does not return an error, the status is set to
`Stopped`.

In #6560 ReportComponentStatus was implemented on the `Host` interface.
I found that few components use the Host interface, and none of them
save a handle to it (to be used outside of the `start` method). I found
that many components keep a handle to the `TelemetrySettings` that they
are initialized with, and this seemed like a more natural, convenient
place for the `ReportComponentStatus` API. I'm ultimately flexible on
where this method resides, but feel that `TelemetrySettings` a more user
friendly place for it.

Regardless of where the `ReportComponentStatus` method resides (Host or
TelemetrySettings), there is a difference in the method signature for
the API based on whether it is used from the service or from a
component. As the service is not bound to a specific component, it needs
to take the `instanceID` of a component as a parameter, whereas the
component version of the method already knows the `instanceID`. In #6560
this led to having both `component.Host` and `servicehost.Host` versions
of the Host interface to be used at the component or service levels. In
this version, we have the same for TelemetrySettings. There is a
`component.TelemetrySettings` and a `servicetelemetry.Settings` with the
only difference being the method signature of `ReportComponentStatus`.

Lastly, this PR sets up the machinery for report component status, and
allows extensions to be `StatusWatcher`s, but it does not introduce any
`StatusWatcher`s. We expect the OpAMP extension to be a `StatusWatcher`
and use data from this system as part of its AgentHealth message (the
message is currently being extended to accommodate more component level
details). We also expect there to be a non-OpAMP `StatusWatcher`
implementation, likely via the HealthCheck extension (or something
similiar).

**Link to tracking Issue:** #7682

cc: @tigrannajaryan @djaglowski @evan-bradley

---------

Co-authored-by: Tigran Najaryan <tnajaryan@splunk.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Alex Boten <aboten@lightstep.com>
2023-10-06 11:35:38 -07:00