#### Description
Move the telemetry implementation from `service/telemetry` to
`service/telemetry/otelconftelemetry`, with the exception of the
Settings type which is intended to remain in the `service/telemetry`
package permanently. The goal here is to separate the interface (which
is to be defined in a followup) and implementation, so multiple
implementations can exist.
This move is in preparation for creating a new Telemetry and Factory
interface, similar to pipeline components. In the end, the
implementation (e.g. otelconftelemetry) will be completely self
contained, providing a NewFactory function to obtain a factory that can
be used to construct a Telemetry implementation.
The next steps after this PR would be:
1. Update
7bf3615b56/cmd/opampsupervisor/supervisor/config/config.go (L25)
to use otelconftelemetry. Eventually the opampsupervisor should probably
use the Factory & Telemetry, but that can be done later on.
2. Create the Factory and Telemetry interface in service/telemetry, and
update otelconftelemetry to implement it, encapsulating the creation of
the SDK & SDK resource. We would continue directly constructing the
otelconftelemetry factory to keep the change contained. At this stage we
could close
https://github.com/open-telemetry/opentelemetry-collector/issues/8170.
3. Update otelcol and service packages to inject the telemetry factory &
default config into the service's settings and default config, and
remove direct references to otelconftelemetry in the service package. At
this stage we could close
https://github.com/open-telemetry/opentelemetry-collector/issues/4970.
#### Link to tracking issue
Part of
https://github.com/open-telemetry/opentelemetry-collector/issues/4970
#### Testing
N/A, non-functional change.
#### Documentation
N/A
#### Description
Simplify code by using the SDK.Shutdown method, which internally shuts
down each of the providers.
#### Link to tracking issue
Preparation work for
https://github.com/open-telemetry/opentelemetry-collector/issues/4970
#### Testing
N/A, non-functional change.
#### Documentation
N/A
---------
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
#### Description
There are two main changes here:
- We move the componentattribute logger core wrapping and Logger to
LoggerProvider tee'ing to the service package. This will enable the
wrapping to be performed independently of the implementation of
telemetry providers.
- We move away from using zapcore.NewTeeCore, and introduce a new
zapcore.Core implementation that only copies log entries accepted by the
native Zap core to the OTel LoggerProvider.
The second change is necessary due to the first: because the
LoggerProvider will in the future be injectable, and because the
telemetry configuration will be opaque to the service package, we must
always tee the logs from the zap logger to the LoggerProvider. When
using zapcore.NewTeeCore, the resulting core will duplicate all entries
to all cores - which is not what we want.
There's also some light refactoring of tests, and the service package
tests are now ~30s faster by replacing a sleep with assert.Eventually.
#### Link to tracking issue
Preparation work for
https://github.com/open-telemetry/opentelemetry-collector/issues/4970
#### Testing
Unit tests pass
#### Documentation
N/A
---------
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
I noticed the service tests were a little flaky due to using the default
Prometheus port. This should at least reduce the frequency where this
happens if not eliminate it.
I did two things to help elimination contention for the port:
1. Disable the Prometheus server in tests where it isn't needed.
2. Ensure the port is randomized in tests where it is needed.
#### Description
Upgrading the collector to use opentelemetry-go-contrib
v1.35.0/v0.60.0/v0.29.0/v0.15.0/v0.10.0/v0.8.0/v0.7.0
See changelog here:
https://github.com/open-telemetry/opentelemetry-go-contrib/releases/tag/v1.35.0
<!-- Issue number if applicable -->
#### Link to tracking issue
none available
<!--Describe what testing was performed and which tests were added.-->
#### Testing
I ran `make all` but happy to run other targets if needed
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
<!-- Issue number if applicable -->
Changes the underlying type of `component.Kind` to be closer to
[`pipeline.Signal`](https://pkg.go.dev/go.opentelemetry.io/collector/pipeline#Signal).
Right now the type is defined within `component`, but it could be moved
to an internal module so that we could have a different module exposing
other value on this enum.
This is already doable today, the only thing this PR gives us is
1. slightly more flexibility on things like making the concept of kind
more complex (e.g. adding an adjective to a kind).
2. restricting external consumers from implementing their own component
kinds without our explicit approval (with some kind of API we design)
I am not convinced this is _necessary_ to do, but we may as well do it.
This is technically a breaking change since `component.Kind(42)` was a
valid expression and it no longer is. I think this is extremely rare, so
I suggest if we go ahead we do so in one go.
#### Link to tracking issue
Fixes#11443
This PR does a couple of things that I couldn't quite split up so I put
together a PR w/ individual commits to help reviewers get through it.
This PR does the following:
1. update `go.opentelemetry.io/contrib/config` package to latest. this
brings in breaking changes. in order to prevent those breaking changes
from impacting end users, i've also added a layer of config unmarshaling
2. updates the collector to instantiate the meter provider (and
exporters) via the config package. this allows us to remove all the code
in `otelinit`. the reason for including this change was that
unmarshaling the config was causing circular dependencies i didn't want
to address by moving code that could be deleted around.
Replacement for
https://github.com/open-telemetry/opentelemetry-collector/pull/11458.
Fixes
https://github.com/open-telemetry/opentelemetry-collector/issues/12021
---------
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
#### Description
[gofumpt](https://golangci-lint.run/usage/linters/#gofumpt) is a
stricter format than gofmt, while being backwards compatible.
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
#### Description
[perfsprint](https://golangci-lint.run/usage/linters/#perfsprint) checks
that fmt.Sprintf can be replaced with a faster alternative.
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
The tracer and logger provider were instantiating different resources
objects that didn't have a `service.instance.id` attribute. This change
fixes that by instantiating the SDK in the service and passing it to the
factory via the telemetry.Settings struct.
This also removes the duplicate code to instantiate the SDK multiple
times, which will be useful when we move to instantiating MeterProvider
via the config SDK. This bug is blocking the change to bump up the
dependency on the config package.
There are a few alternatives that were considered:
1. could set the resource's service.instance.id on the config object
instead. this seemed messy as it would be editing the config struct and
the instantiation of the SDK would remain duplicated.
2. update the factory to pass in a resource object option, i didn't want
to update the NewFactory func.
3. update the CreateLogger, CreateTracerProvider to receive either a
resource or sdk parameter, both of those seemed incorrect as well.
---------
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
Move componentprofiles to pipelineprofiles since only the signal
constant is defined in that package.
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
#### Description
As part of
https://github.com/open-telemetry/opentelemetry-collector/pull/11204 I
left `host`'s implementation of `GetExporters` alone, thinking I didn't
need to change a deprecated interface implementation.
But `GetExporters` depends on `component.DataType`, so we won't be able
to remove `component.DataType` unless `GetExporters` is updated to use
`pipeline.Signal` instead.
This PR deprecates the deprecated `GetExporters` interface with
`GetExportersWithSignal`, which uses `pipeline.Signal`. Then we'll
follow the process to rename `GetExportersWithSignal` back to
`GetExporters`.
This bug caused proctelemetry metrics to not be registered if a user
configured the Collector's internal telemetry via `readers` only and
disabled `address`. The check in the `if` statement is no longer needed
since a no-op meter provider will be configured unless the telemetry
level is set.
Mentioned in #10919
---------
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This is the last PR to add profiles support, adding it to the service
package.
This is based after #11023.
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the processor builder out of the `processor` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the exporter builder out of the `exporter` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the connector builder out of the `connector` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
While this is not technically required for the profiles work (there is
no notion of signals in extensions), this PR is here to keep things
consistent.
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This moves the connector builder out of the `connector` package, and
into `service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
<!-- Issue number if applicable -->
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
#### Description
This moves the receiver builder out of the `receiver` package, and into
`service/internal/builders`.
There's no real reason for this struct to be public (folks shouldn't
call it), and making it private will allow us to add profiling support
to it.
#### Link to tracking issue
https://github.com/open-telemetry/opentelemetry-collector/pull/10375#pullrequestreview-2144929463
#### Description
This PR removes `ReportStatus` from `component.TelemetrySettings` and
instead expects components to check if their `component.Host` implements
a new `componentstatus.Reporter` interface.
<!-- Issue number if applicable -->
#### Link to tracking issue
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10725
Related to
https://github.com/open-telemetry/opentelemetry-collector/pull/10413
<!--Describe what testing was performed and which tests were added.-->
#### Testing
unit tests and a sharedinstance e2e test.
The contrib tests will fail because this is a breaking change. If we
merge this I and @mwear can commit to updating contrib before the next
release.
---------
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This change adds `goleak` to check for memory leaks. Originally there
were 3 failing tests in the `service` package, so I'll describe changes
in relation to resolving each test's failing goleak check.
1. `TestServiceTelemetryRestart`: Simplest fix, close the response body
to make sure goroutines aren't leaked by reopening a server on the same
port. This was just a test issue
2. `TestTelemetryInit.UseOTelWithSDKConfiguration`: The [meter
provider](fb3ed1b0d6/service/telemetry.go (L57-L58))
was being started in the initialization process ([metrics
reference](fb3ed1b0d6/service/internal/proctelemetry/config.go (L135))),
but never shutdown. The type originally being used
(`meter.MetricProvider`) was the base interface which didn't provide a
`Shutdown` method. I changed this to use the `sdk` interfaces that
provide the required `Shutdown` method. The actual functionality of
starting the providers was already using and returning the `sdk`
interface, so the actual underlying type remains the same. Since `mp` is
a private member and `sdkmetric` and implement the original type, I
don't believe changing the type is a breaking change.
3. `TestServiceTelemetryCleanupOnError`: This test starts a server using
a sub-goroutine, cancels it, starts again in a subroutine, and cancels
again in the main goroutine. This test showed the racing behavior of the
subroutine running
[`server.ListenAndServe`](fb3ed1b0d6/service/internal/proctelemetry/config.go (L148))
and the main goroutine's functionality of [calling
close](fb3ed1b0d6/service/telemetry.go (L219))
and then starting the server again [right
away](fb3ed1b0d6/service/service_test.go (L244)).
The solution here is to add a `sync.WaitGroup` variable that can
properly block until all servers are closed before returning from
`shutdown`. This will allow us to ensure it's safe to proceed knowing
the ports are free, and server is fully closed.
The first test fix was just a test issue, but 2 and 3 were real bugs. I
realize it's a bit hard to read with them all together, but I assumed
adding PR dependency notes would be more complicated.
**Link to tracking Issue:** <Issue number if applicable>
#9165
**Testing:** <Describe what testing was performed and which tests were
added.>
All tests are passing as well as goleak check.
---------
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
#### Description
Fixing the bug: the latest version of otel-collector failed to start
with ipv6 metrics endpoint service telemetry.
This problem began to occur after
https://github.com/open-telemetry/opentelemetry-collector/pull/9037 with
the feature gate flag enabled was merged. This problem is probably an
implementation omission because the enabled codepath, which was
originally added by
https://github.com/open-telemetry/opentelemetry-collector/pull/7871, is
marked as WIP.
You can reproduce the issue with the config and the environment variable
(`MY_POD_IP=::1`).
```yaml
service:
telemetry:
logs:
encoding: json
metrics:
address: '[${env:MY_POD_IP}]:8888'
```
#### Link to tracking issue
Fixes
https://github.com/open-telemetry/opentelemetry-collector/issues/10011
---------
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
This deprecates CreateSettings in favour of Settings.
NewNopCreateSettings is also being deprecated in favour of
NewNopSettings
Part of #9428
Signed-off-by: Alex Boten <223565+codeboten@users.noreply.github.com>
**Description:**
- Adds `component.MustNewType` to create a type. This function panics if
the type has invalid characters. Add similar functions
`component.MustNewID` and `component.MustNewIDWithName`.
- Adds `component.Type.String` to recover the string
- Use `component.MustNewType`, `component.MustNewID`,
`component.MustNewIDWithName` and `component.Type.String` everywhere in
this codebase. To do this I changed `component.Type` into an opaque
struct and checked for compile-time errors.
Some notes:
1. All components currently on core and contrib follow this rule. This
is still breaking for other components.
2. A future PR will change this into a struct, to actually validate this
(right now you can just do `component.Type("anything")` to bypass
validation). I want to do this in two steps to avoid breaking contrib
tests: we first introduce this function, and after that we change into a
struct.
**Link to tracking Issue:** Updates #9208
This PR introduces component status reporting. There have been several
attempts to introduce this functionality previously, with the most
recent being: #6560.
This PR was orignally based off of #6560, but has evolved based on the
feedback received and some additional enhancements to improve the ease
of use of the `ReportComponentStatus` API.
In earlier discussions (see
https://github.com/open-telemetry/opentelemetry-collector/pull/8169#issuecomment-1668367246)
we decided to model status as a finite state machine with the following
statuses: `Starting`, `OK`, `RecoverableError`, `PermanentError`,
`FatalError`. `Stopping`, and `Stopped`. A benefit of this design is
that `StatusWatcher`s will be notified on changes in status rather than
on potentially repetitive reports of the same status.
With the additional statuses and modeling them using a finite state
machine, there are more statuses to report. Rather than having each
component be responsible for reporting all of the statuses, I automated
status reporting where possible. A component's status will automatically
be set to `Starting` at startup. If the components `Start` returns an
error, the status will automatically be set to `PermanentError`. A
component is expected to report `StatusOK` when it has successfully
started (if it has successfully started) and from there can report
changes in status as it runs. It will likely be a common scenario for
components to transition between `StatusOK` and `StatusRecoverableError`
during their lifetime. In extenuating circumstances they can transition
into terminal states of `PermanentError` and `FatalError` (where a fatal
error initiates collector shutdown). Additionally, during component
Shutdown statuses are automatically reported where possible. A
component's status is set to `Stopping` when Shutdown is initially
called, if Shutdown returns an error, the status will be set to
`PermanentError` if it does not return an error, the status is set to
`Stopped`.
In #6560 ReportComponentStatus was implemented on the `Host` interface.
I found that few components use the Host interface, and none of them
save a handle to it (to be used outside of the `start` method). I found
that many components keep a handle to the `TelemetrySettings` that they
are initialized with, and this seemed like a more natural, convenient
place for the `ReportComponentStatus` API. I'm ultimately flexible on
where this method resides, but feel that `TelemetrySettings` a more user
friendly place for it.
Regardless of where the `ReportComponentStatus` method resides (Host or
TelemetrySettings), there is a difference in the method signature for
the API based on whether it is used from the service or from a
component. As the service is not bound to a specific component, it needs
to take the `instanceID` of a component as a parameter, whereas the
component version of the method already knows the `instanceID`. In #6560
this led to having both `component.Host` and `servicehost.Host` versions
of the Host interface to be used at the component or service levels. In
this version, we have the same for TelemetrySettings. There is a
`component.TelemetrySettings` and a `servicetelemetry.Settings` with the
only difference being the method signature of `ReportComponentStatus`.
Lastly, this PR sets up the machinery for report component status, and
allows extensions to be `StatusWatcher`s, but it does not introduce any
`StatusWatcher`s. We expect the OpAMP extension to be a `StatusWatcher`
and use data from this system as part of its AgentHealth message (the
message is currently being extended to accommodate more component level
details). We also expect there to be a non-OpAMP `StatusWatcher`
implementation, likely via the HealthCheck extension (or something
similiar).
**Link to tracking Issue:** #7682
cc: @tigrannajaryan @djaglowski @evan-bradley
---------
Co-authored-by: Tigran Najaryan <tnajaryan@splunk.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Alex Boten <aboten@lightstep.com>
**Description:**
This adds an optional `ConfigWatcher` interface that extensions can
implement if they want to be notified of the effective configuration
that is used by the Collector.
I don't feel very strongly about any of the decisions I made in this PR,
so I am open to input if we would like to take a different approach
anywhere. I will leave some comments to explain the decisions I made.
**Link to tracking Issue:**
Closes
https://github.com/open-telemetry/opentelemetry-collector/issues/6596
**Testing:**
I've made minimal unit test changes, but I expect to write more tests
once there is consensus on the direction for implementing this
functionality. I have done some manual testing to show that an extension
can get a YAML representation of the effective config using two YAML
input files.
---------
Co-authored-by: Evan Bradley <evan-bradley@users.noreply.github.com>