opentelemetry-collector

Commit Graph

Author	SHA1	Message	Date
Daniel Jaglowski	279752c11f	[service/internal/graph] Measure telemetry as it is passed between pipeline components (#12812 ) Depends on https://github.com/open-telemetry/opentelemetry-collector/pull/12856 Resolves #12676 This is a reboot of #11311, incorporating metrics defined in the [component telemetry RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md) and attributes added in #12617. The basic pattern is: - When building any pipeline component which produces data, wrap the "next consumer" with instrumentation to measure the number of items being passed. This wrapped consumer is then passed into the constructor of the component. - When building any pipeline component which consumes data, wrap the component itself. This wrapped consumer is saved onto the graph node so that it can be retrieved during graph assembly. --------- Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>	2025-05-12 08:33:02 +00:00
Daniel Jaglowski	d804ef5910	Fix 'otelcol.component.kind' value capitalization (#12865 ) This brings capitalization in line with the attributes defined in the [Pipeline Component Telemetry](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md#attributes) RFC.	2025-04-16 03:18:27 +00:00
Matthieu MOREL	564818fd7f	[chore]: fix testifylint rules (#12791 ) #### Description Fixes testifylint rules which were disabled with golangci-lint v2 upgrade Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2025-04-02 23:10:07 +00:00
Sudipto Baral	18e18b21da	Add pipeline ID to the error message for unused connectors. (#12410 ) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description As mentioned in this https://github.com/open-telemetry/opentelemetry-collector/issues/8721#issuecomment-1813468623, the error message for unused connectors currently lacks specific pipeline names, making debugging more difficult. This PR enhances the error message by including pipeline names in the `[signal/name]` format, consistent with how they appear in `config.yaml`. This provides a better context for identifying misconfigurations. <!-- Issue number if applicable --> #### Link to tracking issue Related to #8721 <!--Describe what testing was performed and which tests were added.--> #### Testing A few scenarios and example output are given below. I will do additional testing and add unit tests if necessary. 1. Used as a receiver but not used as an exporter with 1 signal <details> <summary><strong>config.yaml</strong></summary> ```yaml receivers: otlp: protocols: grpc: exporters: debug: connectors: forward: service: pipelines: logs/in: receivers: [otlp] processors: [] exporters: [debug] logs/out: receivers: [forward] processors: [] exporters: [debug] ``` </details> Main Branch Output: ``` Error: failed to build pipelines: connector "forward" used as receiver in logs pipeline but not used in any supported exporter pipeline ``` Proposed Output: ``` Error: failed to build pipelines: connector "forward" used as receiver in [logs/out] pipeline but not used in any supported exporter pipeline ``` 2. 
Plain <details> <summary><strong>config.yaml</strong></summary> ```yaml receivers: otlp: protocols: grpc: exporters: debug: connectors: forward: service: pipelines: traces: receivers: [ otlp ] processors: [ ] exporters: [ forward ] metrics: receivers: [ forward ] processors: [ ] exporters: [ debug ] ``` </details> Main Branch Output: ``` Error: failed to build pipelines: connector "forward" used as exporter in traces pipeline but not used in any supported receiver pipeline ``` Proposed Output: ``` Error: failed to build pipelines: connector "forward" used as exporter in [traces] pipeline but not used in any supported receiver pipeline ``` 3. Multiple pipeline <details> <summary><strong>config.yaml</strong></summary> ```yaml receivers: otlp: protocols: grpc: exporters: debug: connectors: forward: service: pipelines: logs/in: receivers: [otlp] processors: [] exporters: [forward] logs/in2: receivers: [ otlp ] processors: [ ] exporters: [ forward ] logs/out: receivers: [otlp] processors: [] exporters: [debug] traces: receivers: [ otlp ] processors: [ ] exporters: [ forward ] metrics: receivers: [ forward ] processors: [ ] exporters: [ debug ] ``` </details> Main Branch Output: ``` Error: failed to build pipelines: connector "forward" used as exporter in logs pipeline but not used in any supported receiver pipeline ``` Proposed Output: ``` Error: failed to build pipelines: connector "forward" used as exporter in [logs/in2 logs/in] pipeline but not used in any supported receiver pipeline ``` --------- Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>	2025-02-24 01:06:54 +00:00
Daniel Jaglowski	5d5fb21acf	Introduce component logger with appropriate attributes (#12259 ) Implements the logger described in https://github.com/open-telemetry/opentelemetry-collector/issues/12217 Alternative to #12057 Resolves #11814 `component/componentattribute`: - Initializes new module - Defines constants for component telemetry attribute keys - Defines a `zapcore.Core` which can remove attributes from the root logger `service`: - Rebases component instantiation on attribute sets - Internal constructors for attribute sets for each component type - Constructs loggers from `componentattribute` `otlpreceiver`: - Uses `componentattribute` to remove `otelcol.signal` attribute from logger `memorylimiter`: - Uses `componentattribute` to remove `otelcol.signal`, `otelcol.pipeline.id` and `otelcol.component.id` attributes from logger	2025-02-06 16:53:20 +00:00
Daniel Jaglowski	81f1fad0ee	[chore] Add test to validate expected component instances (#12071 ) Subset of #12057 This PR adds a test to validate the expected number of instances of each component. This framework becomes more useful once singleton components are explicitly supported.	2025-01-20 13:45:55 +00:00
Dmitrii Anoshin	9206c68ec2	Deprecate pipelineprofiles module in favor of xpipeline (#11888 ) to allow adding more experimental data types. Updates https://github.com/open-telemetry/opentelemetry-collector/issues/11778	2024-12-13 23:23:06 +00:00
Matthieu MOREL	0204d957e5	[chore]: enable whitespace linter (#11579 ) #### Description [whitespace](https://golangci-lint.run/usage/linters/#whitespace) is a linter that checks for unnecessary newlines at the start and end of functions. Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-10-31 12:15:54 -07:00
Bogdan Drutu	98a326a3c5	Move componentprofiles to pipelineprofiles (#11421 ) Move componentprofiles to pipelineprofiles since only the signal constant is defined in that package. Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2024-10-13 13:47:00 -07:00
Bogdan Drutu	c8005ec855	[chore] Pass the signal constant instead of the string (#11420 ) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2024-10-13 13:21:03 -07:00
Daniel Jaglowski	151f8377d8	[chore][graph] Remove connectorNode's separate baseConsumer (#11333 ) Follows #11330 Currently, `connectorNode` contains separate `component.Component` and `baseConsumer` fields. These fields are essentially two representations of the same component, but `baseConsumer` may be wrapped in another consumer that inherits capabilities. Rather than maintain two separate handles, this PR switches to a unified field. I believe this change helps normalize the connector node with other types of consumer nodes and will enable further refactoring opportunities.	2024-10-09 12:20:53 -07:00
Daniel Jaglowski	af27e16f0c	[chore][graph] Split test file (#11329 ) This PR follows #11321 by splitting up the primary test file into a few related topics. I believe this will make further refactoring PRs easier to follow.	2024-10-01 13:30:58 -07:00
Tyler Helmuth	e69f2f38ff	[componentstatus] Continue DataType rename (#11313 ) #### Description Continues the DataType rename process for `NewInstanceIDWithPipelineIDs`, `AllPipelineIDsWithPipelineIDs`, and `WithPipelineIDs`. #### Link to tracking issue Related to https://github.com/open-telemetry/opentelemetry-collector/issues/9429	2024-10-01 17:27:59 +02:00
Bogdan Drutu	6412988f19	[chore] Move back connector definitions, make profile embed connector (#11306 ) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2024-09-30 13:53:03 -07:00
Tyler Helmuth	8ced6eb6e1	[service] Remove deprecations and continue renames around DataType (#11303 ) #### Description Continues deprecation/rename process for `Config.PipelinesWithPipelineID`, `pipelines.ConfigWithPipelineID` and `GetExportersWithSignal`. #### Link to tracking issue Related to https://github.com/open-telemetry/opentelemetry-collector/issues/9429	2024-09-30 11:31:26 -07:00
Matthieu MOREL	aba139c2cb	[chore]: use ErrorContains and EqualError (#11295 ) #### Description Testifylint doesn't support it yet. This replaces `Contains(t, err.Error()` by `ErrorContains(t, err` and `Equal(t, err.Error()` by `EqualError(t, err` As they both check for nil error it becomes useless to check it yourself without having defined a custom message <!-- Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com> --> Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-09-28 08:06:09 -07:00
Bogdan Drutu	99cf16ef4c	[chore] Move back exporter definitions, make profile embed exporter (#11290 ) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2024-09-27 14:09:13 -07:00
Bogdan Drutu	632461fb91	[chore] Move back processor definitions, make profile embed processor (#11286 ) Same as https://github.com/open-telemetry/opentelemetry-collector/pull/11254 but for processor Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2024-09-27 11:07:54 -07:00
Bogdan Drutu	5cc717d747	[chore] Move back receiver definitions, make profile embed receiver (#11254 ) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2024-09-26 12:46:40 -07:00
Tyler Helmuth	77bb849aa0	[component] Refactor to use pipeline.ID and pipeline.Signal (#11204 ) #### Description Depends on https://github.com/open-telemetry/opentelemetry-collector/pull/11209 This PR is a non-breaking implementation of https://github.com/open-telemetry/opentelemetry-collector/pull/10947. It adds a new module, `pipeline`, which houses a `pipeline.ID` and `pipeline.Signal`. `pipeline.ID` is used to identify a pipeline within the service. `pipeline.Signal` is used to identify the signal associated with a pipeline. I do this work begrudgingly. As the PR shows, this is a huge refactor when done in a non-breaking way, will require 3 full releases, and doesn't benefit our [End Users or, in my opinion, our Component Developers or Collector Library Users](https://github.com/open-telemetry/opentelemetry-collector/blob/main/CONTRIBUTING.md#target-audiences). I view this refactor as a Nice-To-Have, not a requirement for Component 1.0. #### Link to tracking issue Works towards https://github.com/open-telemetry/opentelemetry-collector/issues/9429	2024-09-23 07:38:59 -07:00
Matthieu MOREL	37f783308e	[chore]: enable require-error rule from testifylint (#11199 ) #### Description Testifylint is a linter that provides best practices with the use of testify. This PR enables [require-error](https://github.com/Antonboom/testifylint?tab=readme-ov-file#require-error) rule from [testifylint](https://github.com/Antonboom/testifylint) Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-09-18 15:02:22 -07:00
Alex Boten	fbffbb0820	[chore] small test improvements (#11211 ) Clean up some inconsistencies in the test code across the components. Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>	2024-09-18 13:47:25 -07:00
Matthieu MOREL	6925a306fa	[chore]: enable len and empty rules from testifylint (#11021 ) #### Description Testifylint is a linter that provides best practices with the use of testify. This PR enables [len](https://github.com/Antonboom/testifylint?tab=readme-ov-file#len) and [empty](https://github.com/Antonboom/testifylint?tab=readme-ov-file#empty) rules from [testifylint](https://github.com/Antonboom/testifylint) It also adds testifylint as tool to use with a make command Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-09-09 09:57:58 -07:00
Damien Mathieu	720f3a86a3	Add profiles support in service (#11024 ) #### Description This is the last PR to add profiles support, adding it to the service package. This is based after #11023.	2024-09-05 17:56:11 +02:00
Damien Mathieu	18d5c02ade	Move processor builders into internal service (#10782 ) #### Description This moves the processor builder out of the `processor` package, and into `service/internal/builders`. There's no real reason for this struct to be public (folks shouldn't call it), and making it private will allow us to add profiling support to it. #### Link to tracking issue https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463	2024-08-22 12:22:23 +02:00
Damien Mathieu	7cd1579d1f	Move exporter builder into internal service (#10783 ) #### Description This moves the exporter builder out of the `exporter` package, and into `service/internal/builders`. There's no real reason for this struct to be public (folks shouldn't call it), and making it private will allow us to add profiling support to it. #### Link to tracking issue https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463	2024-08-22 11:43:35 +02:00
Damien Mathieu	cde1055559	Move connector builder into internal service (#10784 ) #### Description This moves the connector builder out of the `connector` package, and into `service/internal/builders`. There's no real reason for this struct to be public (folks shouldn't call it), and making it private will allow us to add profiling support to it. #### Link to tracking issue https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463	2024-08-22 10:35:19 +02:00
Damien Mathieu	454432e06f	Move receiver builder into internal service (#10781 ) #### Description This moves the receiver builder out of the `receiver` package, and into `service/internal/builders`. There's no real reason for this struct to be public (folks shouldn't call it), and making it private will allow us to add profiling support to it. #### Link to tracking issue https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463	2024-08-21 07:59:38 -07:00
Matthew Wear	98fb888dfe	[component] Make InstanceID immutable (#10495 ) #### Description This PR makes component.InstanceID immutable. Previously it was a struct with all fields exported. Technically this is a breaking change, but the only thing using the InstanceID is the in-progress healthcheckv2extension. #### Link to tracking issue Fixes #10494 #### Testing units #### Documentation code comments --------- Co-authored-by: Antoine Toulme <antoine@toulme.name> Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com> Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>	2024-08-21 10:41:55 +02:00
Tyler Helmuth	cb24d0c7d7	[component] Remove ReportStatus from component.TelemetrySettings (#10777 ) #### Description This PR removes `ReportStatus` from `component.TelemetrySettings` and instead expects components to check if their `component.Host` implements a new `componentstatus.Reporter` interface. #### Link to tracking issue Related to https://github.com/open-telemetry/opentelemetry-collector/pull/10725 Related to https://github.com/open-telemetry/opentelemetry-collector/pull/10413 #### Testing unit tests and a sharedinstance e2e test. The contrib tests will fail because this is a breaking change. If we merge this I and @mwear can commit to updating contrib before the next release. --------- Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>	2024-08-16 09:27:01 +02:00
Daniel Jaglowski	afefb7464e	[chore] Add readonly matrix test. Remove invalid mutability assertions (#10632 ) The primary objective here was to add new test cases for the graph. However, I found that mutability assertions added in #8634 appear to be nondeterministic. Therefore, important test cases cannot be covered with them in place. This effectively removes the assertions about mutability for now. @dmitryax, I'm curious if you have better ideas here. I am thinking that perhaps it would be better to have an entirely separate set of test cases which are focused on mutability expectations. Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>	2024-08-02 16:05:48 +02:00
Tyler Helmuth	fb5b1e6aa5	[service] Remove servicetelemetry.TelemetrySettings (#10728 ) #### Description Reorganizes service to not require `servicetelemetry.TelemetrySettings` and instead depend directly on `component.TelemetrySettings` Whether or not we move forward with https://github.com/open-telemetry/opentelemetry-collector/pull/10725 I think this is a useful change for service. #### Testing Unit tests	2024-07-29 10:29:05 +02:00
Antoine Toulme	a85ace8a22	[chore] fix typo (#10680 )	2024-07-22 06:36:45 -07:00
Daniel Jaglowski	8e2db5f6a9	[chore] Fix mutability assertion loops (#10627 ) I noticed that these loops are making the same assertion repeatedly, rather than checking each element.	2024-07-16 14:57:46 -07:00
Daniel Jaglowski	b3c781b90e	[chore] Remove telemetry from graph initialization (#10446 ) This PR refactors the service graph initialization so that the graph can be assembled without the intention to start it. 1. Do not save telemetry on the graph. The only use of this was for reporting component status. Instead, pass in a reporter when starting or stopping. Correspondingly, add an internal `statustest` package to make it easy to pass in a status reporter in tests. 2. Decouple graph building from extension building. There isn't any direct relationship so these things should be separated.	2024-06-21 12:32:21 +02:00
Alex Boten	3b3deb8dbe	[connector] deprecate CreateSettings -> Settings (#10338 ) This deprecates CreateSettings in favour of Settings. NewNopCreateSettings is also being deprecated in favour of NewNopSettings Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>	2024-06-06 10:04:47 -07:00
Alex Boten	9907ba50df	[processor] deprecate CreateSettings -> Settings (#10336 ) This deprecates CreateSettings in favour of Settings. NewNopCreateSettings is also being deprecated in favour of NewNopSettings Part of #9428 --------- Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>	2024-06-06 09:34:53 -07:00
Alex Boten	f0c8787d2b	[exporter] deprecate CreateSettings -> Settings (#10335 ) This deprecates CreateSettings in favour of Settings. NewNopCreateSettings is also being deprecated in favour of NewNopSettings Part of #9428 ~Follows https://github.com/open-telemetry/opentelemetry-collector/pull/10333~ --------- Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>	2024-06-06 08:03:40 -07:00
Alex Boten	1e44a9c473	[receiver] deprecate CreateSettings -> Settings (#10333 ) This deprecates CreateSettings in favour of Settings. NewNopCreateSettings is also being deprecated in favour of NewNopSettings Part of #9428 --------- Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>	2024-06-05 13:57:23 -07:00
Bogdan Drutu	edae2c7469	[chore] test that implementation implements interface without allocation (#10214 )	2024-05-23 23:40:05 +02:00
Alex Boten	fc28929061	move internal/testdata to pdata/testdata (#9885 ) This reduces dependencies from the consumer package while making testdata available across repos. It will allow us to remove duplicated code and it's a fairly small surface area. Fixes https://github.com/open-telemetry/opentelemetry-collector/issues/9886 --------- Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>	2024-04-08 08:36:57 -07:00
Alex Boten	062d0a7ffc	[chore] remove unnecessary underscores (#9580 ) As per feedback from my previous PR Signed-off-by: Alex Boten <aboten@lightstep.com>	2024-02-13 13:34:53 -08:00
Alex Boten	4688461318	[chore] fix unused params (#9578 ) Related to #9577 Signed-off-by: Alex Boten <aboten@lightstep.com>	2024-02-13 11:04:48 -08:00
Pablo Baeyens	26c157e3bf	[component] Add MustNewType constructor for component.Type (#9414 ) Description: - Adds `component.MustNewType` to create a type. This function panics if the type has invalid characters. Add similar functions `component.MustNewID` and `component.MustNewIDWithName`. - Adds `component.Type.String` to recover the string - Use `component.MustNewType`, `component.MustNewID`, `component.MustNewIDWithName` and `component.Type.String` everywhere in this codebase. To do this I changed `component.Type` into an opaque struct and checked for compile-time errors. Some notes: 1. All components currently on core and contrib follow this rule. This is still breaking for other components. 2. A future PR will change this into a struct, to actually validate this (right now you can just do `component.Type("anything")` to bypass validation). I want to do this in two steps to avoid breaking contrib tests: we first introduce this function, and after that we change into a struct. Link to tracking Issue: Updates #9208	2024-02-02 17:33:03 +01:00
Antoine Toulme	c5a2c78d61	Move error out of `ReportComponentStatus` function signature, use `ReportStatus` instead (#9175 ) Fixes #9148	2024-01-09 09:36:41 -08:00
Daniel Jaglowski	7ec38e5c19	Fanout consumer does not need to mutate in some cases (#9062 ) Follow up to https://github.com/open-telemetry/opentelemetry-collector/pull/9053. @dmitryax pointed out [here](https://github.com/open-telemetry/opentelemetry-collector/pull/9053#discussion_r1420871665) that the fanout consumer will pass original data to a non-mutating consumer if any is available. This PR incorporates that point and updates test expectations accordingly.	2023-12-11 10:04:37 -08:00
Daniel Jaglowski	7c58e71515	Fix bug where MutatesData would not correctly propagate through connectors (#9053 ) This fixes two closely related problems. 1. While fanoutconsumers do not themselves mutate data, they should expose whether or not they are handing data off to consumers which may do so. Otherwise, the service cannot correctly determine how to fan out after a receiver. e.g. a receiver shared between two pipelines, one of which contains an exporter or connector which mutates data. 2. Connectors can themselves mutate data but we were not taking this into account when building the graph.	2023-12-09 09:44:55 -08:00
Matthew Wear	433f7aef92	Automate status reporting on start (#8836 ) This is part of the continued component status reporting effort. Currently we have automated status reporting for the following component lifecycle events: `Starting`, `Stopping`, `Stopped` as well as definitive errors that occur in the starting or stopping process (e.g. as determined by an error return value). This leaves the responsibility to the component to report runtime status after start and before stop. We'd like to be able to extend the automatic status reporting to report `StatusOK` if `Start` completes without an error. One complication with this approach is that some components spawn async work (via goroutines) that, depending on the Go scheduler, can report status before `Start` returns. As such, we cannot assume a nil return value from `Start` means the component has started properly. The solution is to detect if the component has already reported status when start returns, if it has, we will use the component-reported status and will not automatically report status. If it hasn't, and `Start` returns without an error, we can report `StatusOK`. Any subsequent reports from the component (async or otherwise) will transition the component status accordingly. The tl;dr is that we cannot control the execution of async code, that's up to the Go scheduler, but we can handle the race, report the status based on the execution, and not clobber status reported from within the component during the startup process. That said, for components with async starts, you may see a `StatusOK` before the component-reported status, or just the component-reported status depending on the actual execution of the code. In both cases, the end status will be same. The work in this PR will allow us to simplify #8684 and #8788 and ultimately choose which direction we want to go for runtime status reporting. 
Link to tracking Issue: #7682 Testing: units / manual --------- Co-authored-by: Alex Boten <aboten@lightstep.com>	2023-11-28 12:43:32 -08:00
Dmitrii Anoshin	8a385c22e5	[pdata] Enable the pdata mutation safeguards in the fanout consumers (#8634 ) This change enables the runtime assertions to catch unintentional pdata mutations in components claiming as non-mutating pdata. Without these assertions, runtime errors may still occur, but thrown by unrelated components, making it very difficult to troubleshoot. This required introducing extra API to get the pdata mutability state: - p[metric\|trace\|log].[Metrics\|Traces\|Logs].IsReadOnly() Resolves: https://github.com/open-telemetry/opentelemetry-collector/issues/6794	2023-10-16 11:12:50 -07:00
Matthew Wear	53615832e6	Component Status Reporting (#8169 ) This PR introduces component status reporting. There have been several attempts to introduce this functionality previously, with the most recent being: #6560. This PR was originally based off of #6560, but has evolved based on the feedback received and some additional enhancements to improve the ease of use of the `ReportComponentStatus` API. In earlier discussions (see https://github.com/open-telemetry/opentelemetry-collector/pull/8169#issuecomment-1668367246) we decided to model status as a finite state machine with the following statuses: `Starting`, `OK`, `RecoverableError`, `PermanentError`, `FatalError`, `Stopping`, and `Stopped`. A benefit of this design is that `StatusWatcher`s will be notified on changes in status rather than on potentially repetitive reports of the same status. With the additional statuses and modeling them using a finite state machine, there are more statuses to report. Rather than having each component be responsible for reporting all of the statuses, I automated status reporting where possible. A component's status will automatically be set to `Starting` at startup. If the component's `Start` returns an error, the status will automatically be set to `PermanentError`. A component is expected to report `StatusOK` when it has successfully started (if it has successfully started) and from there can report changes in status as it runs. It will likely be a common scenario for components to transition between `StatusOK` and `StatusRecoverableError` during their lifetime. In extenuating circumstances they can transition into terminal states of `PermanentError` and `FatalError` (where a fatal error initiates collector shutdown). Additionally, during component Shutdown statuses are automatically reported where possible. 
A component's status is set to `Stopping` when Shutdown is initially called. If Shutdown returns an error, the status will be set to `PermanentError`; if it does not return an error, the status is set to `Stopped`. In #6560 ReportComponentStatus was implemented on the `Host` interface. I found that few components use the Host interface, and none of them save a handle to it (to be used outside of the `start` method). I found that many components keep a handle to the `TelemetrySettings` that they are initialized with, and this seemed like a more natural, convenient place for the `ReportComponentStatus` API. I'm ultimately flexible on where this method resides, but feel that `TelemetrySettings` is a more user-friendly place for it. Regardless of where the `ReportComponentStatus` method resides (Host or TelemetrySettings), there is a difference in the method signature for the API based on whether it is used from the service or from a component. As the service is not bound to a specific component, it needs to take the `instanceID` of a component as a parameter, whereas the component version of the method already knows the `instanceID`. In #6560 this led to having both `component.Host` and `servicehost.Host` versions of the Host interface to be used at the component or service levels. In this version, we have the same for TelemetrySettings. There is a `component.TelemetrySettings` and a `servicetelemetry.Settings` with the only difference being the method signature of `ReportComponentStatus`. Lastly, this PR sets up the machinery for reporting component status, and allows extensions to be `StatusWatcher`s, but it does not introduce any `StatusWatcher`s. We expect the OpAMP extension to be a `StatusWatcher` and use data from this system as part of its AgentHealth message (the message is currently being extended to accommodate more component level details). 
We also expect there to be a non-OpAMP `StatusWatcher` implementation, likely via the HealthCheck extension (or something similar). Link to tracking Issue: #7682 cc: @tigrannajaryan @djaglowski @evan-bradley --------- Co-authored-by: Tigran Najaryan <tnajaryan@splunk.com> Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com> Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com> Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com> Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> Co-authored-by: Alex Boten <aboten@lightstep.com>	2023-10-06 11:35:38 -07:00


56 Commits