Values passed in via OTEL_RESOURCE_ATTRIBUTES that contain an equal sign "=" are currently ignored by the Resource constructor, but they should be accepted, as "=" is part of the W3C Baggage octet range.
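A minimal sketch of a parser that accepts such values, assuming the fix is to split each pair at the first "=" only (the function name is illustrative, not the SDK's):

```rust
// Illustrative: split each comma-separated pair at the FIRST '=' only,
// so attribute values may themselves contain '=' (valid W3C Baggage octets).
fn parse_resource_attributes(s: &str) -> Vec<(String, String)> {
    s.split(',')
        .filter_map(|pair| pair.split_once('=')) // keeps any '=' inside the value
        .map(|(k, v)| (k.trim().to_owned(), v.trim().to_owned()))
        .collect()
}
```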
Example to demonstrate using tracing as a global error handler for errors generated by the OpenTelemetry Metrics SDK. In this example, enough measurements are recorded to exceed the cardinality limit, which triggers an error to be logged. This error is then emitted to `stdout` via the `opentelemetry-appender-tracing` subscriber.
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
Co-authored-by: Harold Dost <h.dost@criteo.com>
Co-authored-by: Cijo Thomas <cijo.thomas@gmail.com>
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
Similar issue to #472, but for opentelemetry-http.
Added new feature flag for the opentelemetry-http crate to enable rustls instead of openssl
Co-authored-by: Harold Dost <github@hdost.com>
Co-authored-by: Cijo Thomas <cijo.thomas@gmail.com>
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
The purpose of this is to ensure we understand the API surface area. Anything a crate exposes, such as a type, affects compatibility: if we bump an external type, it can cause breaking changes. Tracking these types limits that possibility.
Affected Crates:
opentelemetry-otlp
opentelemetry-zipkin
Co-authored-by: Harold Dost <github@hdost.com>
Co-authored-by: Ed Morley <501702+edmorley@users.noreply.github.com>
Co-authored-by: Cijo Thomas <cijo.thomas@gmail.com>
In #1192, chrono was added as a dependency of the opentelemetry-stdout crate in order to support outputting timestamps in human readable format.
In that PR, all Chrono features were disabled apart from the clock feature.
However, since that change landed, chrono v0.4.32 has added support for an even finer-grained feature named `now`, which is a subset of the `clock` feature that excludes timezone support, and so avoids pulling in many timezone-related crates.
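If I read the chrono feature docs right, the slimmer dependency declaration would look something like this (a sketch; the version pin is illustrative):

```toml
# Cargo.toml: only the `now` feature, avoiding the timezone crates
# that `clock` pulls in (requires chrono >= 0.4.32)
chrono = { version = "0.4.32", default-features = false, features = ["now"] }
```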
- Add deprecation flags to the various components to produce notices in users' compilers.
- Add more definition around the deprecation in the README.
- Add a deprecation badge for crates.io.
Relates #995
Make endpoint Uri construction less fragile by stripping out duplicate slashes.
Fixes #997
The functionality to append the signal path to a user-supplied tracing endpoint is already in place. However, it does so in a way that is likely to break for any user who's passing the string value of a Uri or Url, which will have a trailing slash appended to them. This attempts to fix that issue.
There were other possible changes discussed but never implemented/merged in #1056. This PR keeps things simple by just changing existing behavior to not break a common use case.
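A sketch of the kind of join this implies, assuming the fix is to trim redundant slashes at the seam (the helper name is hypothetical, not the exporter's actual code):

```rust
// Hypothetical helper: join a user-supplied endpoint with a signal path
// without producing "//", regardless of trailing/leading slashes.
fn join_endpoint(endpoint: &str, signal_path: &str) -> String {
    format!(
        "{}/{}",
        endpoint.trim_end_matches('/'),
        signal_path.trim_start_matches('/')
    )
}
```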
This should allow all fields in event to be optional, which is needed to
decode JSON strings
## Changes
- add `serde(default)` to `Events` for tonic generated types
- add `serde(default)` to `Status` for tonic generated types
## Merge requirement checklist
* [x]
[CONTRIBUTING](https://github.com/open-telemetry/opentelemetry-rust/blob/main/CONTRIBUTING.md)
guidelines followed
* [x] Unit tests added/updated (if applicable)
* [ ] Appropriate `CHANGELOG.md` files updated for non-trivial,
user-facing changes
* [ ] Changes in public API reviewed (if applicable)
Implement new bench ideas from this
[PR](https://github.com/open-telemetry/opentelemetry-rust/pull/1431)’s
comments.
## Merge requirement checklist
* [ ]
[CONTRIBUTING](https://github.com/open-telemetry/opentelemetry-rust/blob/main/CONTRIBUTING.md)
guidelines followed
* [ ] Unit tests added/updated (if applicable)
* [ ] Appropriate `CHANGELOG.md` files updated for non-trivial,
user-facing changes
* [ ] Changes in public API reviewed (if applicable)
Co-authored-by: Lalit Kumar Bhasin <lalit_fin@yahoo.com>
Part of the effort of #1327, as we need JSON formats for assertions in
integration tests.
## Changes
- add configuration to serde to deserialize the json in camelCase field
name
- add custom (de)serialization for traceId, spanId as they are
case-insensitive hex-encoded strings (see
[here](https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding))
- add custom (de)serialization for `KeyValue`
- add tests for above, and a test using example json files
## Merge requirement checklist
* [x]
[CONTRIBUTING](https://github.com/open-telemetry/opentelemetry-rust/blob/main/CONTRIBUTING.md)
guidelines followed
* [x] Unit tests added/updated (if applicable)
* [x] Appropriate `CHANGELOG.md` files updated for non-trivial,
user-facing changes
* [ ] Changes in public API reviewed (if applicable)
Adds shared dependencies to the workspace.
This improves the experience of managing the dependencies.
## Changes
Picked the shared dependencies of the member projects and added them
to the workspace dependencies; https://crates.io/crates/work_dep
can help with that.
## Merge requirement checklist
* [ ]
[CONTRIBUTING](https://github.com/open-telemetry/opentelemetry-rust/blob/main/CONTRIBUTING.md)
guidelines followed
* [ ] Unit tests added/updated (if applicable)
* [ ] Appropriate `CHANGELOG.md` files updated for non-trivial,
user-facing changes
* [ ] Changes in public API reviewed (if applicable)
`metrics::Aggregation::validate()` has a bug that limits `max_scale` of
a `Base2ExponentialHistogram` to the interval `[10, 20]` instead of the
expected `[-10, 20]`.
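A minimal sketch of the corrected bound check (the function name and error type are illustrative, not the SDK's):

```rust
// The buggy validation effectively used the range [10, 20]; the spec'd
// range for a Base2ExponentialHistogram's max_scale is [-10, 20].
fn validate_max_scale(max_scale: i8) -> Result<(), String> {
    if (-10..=20).contains(&max_scale) {
        Ok(())
    } else {
        Err(format!("max_scale {max_scale} must be in [-10, 20]"))
    }
}
```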
This adds `impl Into` to two logs methods, the first to match the
equivalent metrics/traces method and the other to make `with_body` more
ergonomic so you can do `with_body("hello")`
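A rough sketch of the ergonomics this enables (the types below are simplified stand-ins for the SDK's):

```rust
// Simplified stand-ins: accepting `impl Into<String>` lets callers pass
// &str, String, or anything else convertible, e.g. `.with_body("hello")`.
struct LogRecord {
    body: Option<String>,
}

struct LogRecordBuilder {
    record: LogRecord,
}

impl LogRecordBuilder {
    fn new() -> Self {
        Self { record: LogRecord { body: None } }
    }

    fn with_body(mut self, body: impl Into<String>) -> Self {
        self.record.body = Some(body.into());
        self
    }
}
```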
This change removes the old `global::shutdown_meter_provider` method
which is not part of the metrics API spec, and properly documents the
`SdkMeterProvider::shutdown` method which is spec compliant.
The hash of `AttributeSet`s are expensive to compute, as they have to be
computed for each key and value in the attribute set. This hash is used
by the `ValueMap` to look up if we are already aggregating a time series
for this set of attributes or not. Since this hashmap lookup occurs
inside a mutex lock, no other counters can execute their `add()` calls
while this hash is being calculated, and therefore contention in high
throughput scenarios exists.
This PR calculates and caches the hashmap at creation time. This
improves throughput because the hashmap is calculated by the thread
creating the `AttributeSet` and is performed outside of any mutex locks,
meaning hashes can be computed in parallel and the time spent within a
mutex lock is reduced. As larger sets of attributes are used for time
series, the benefits of reduction of lock times should be greater.
The stress test results of this change for different thread counts are:
| Thread Count | Main | PR |
| -------------- | ---------- | --------- |
| 2 | 3,376,040 | 3,310,920 |
| 3 | 5,908,640 | 5,807,240 |
| 4 | 3,382,040 | 8,094,960 |
| 5 | 1,212,640 | 9,086,520 |
| 6 | 1,225,280 | 6,595,600 |
The non-precomputed hashes start seeing contention with 4 threads, and
throughput drops substantially after that, while precomputed hashes don't
see contention until 6 threads, and even then we still have 5-6x more
throughput after contention due to reduced locking times.
While these benchmarks may not be "realistic" (since most applications
will be doing more work in between counter updates) it does show a
benefit of better parallelism and the opportunity to reduce lock
contention at the cost of only 8 bytes per time series (so a total of
16KB additional memory at maximum cardinality).
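The idea above can be sketched as follows (simplified types, not the SDK's; the real code hashes `KeyValue`s):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch: compute the hash once at construction, outside any lock,
// and reuse the cached value for every subsequent map lookup.
struct AttributeSet {
    attributes: Vec<(String, String)>,
    hash: u64, // cached at creation time
}

impl AttributeSet {
    fn new(mut attributes: Vec<(String, String)>) -> Self {
        attributes.sort(); // order-independent identity
        let mut hasher = DefaultHasher::new();
        attributes.hash(&mut hasher);
        let hash = hasher.finish();
        AttributeSet { attributes, hash }
    }
}

impl Hash for AttributeSet {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // O(1) instead of re-hashing every key and value under the lock.
        state.write_u64(self.hash);
    }
}
```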
Modified version of the
[traceresponse](https://github.com/open-telemetry/opentelemetry-rust-contrib/tree/main/examples/traceresponse)
example (in the `contrib` repo) to demonstrate context propagation from
client to server. The example
- Removes the code that propagates trace context as part of response
headers from server to client, as the W3C spec is still in draft
(https://w3c.github.io/trace-context/#trace-context-http-response-headers-format),
and the propagator is part of the contrib repo.
- Modifies the HTTP server and client code to look more complete and to
demonstrate context propagation across async delegates.
**_Server_** - Enhances the server's request handling by adding support
for `/echo` and `/health` endpoints. Upon receiving a request, the
server now creates a child span, linked to the originating remote span,
and the request is forwarded to its respective delegate async task.
Furthermore, within each async task, a subsequent child span is spawned,
parented by the initial child span. This nested span creation
exemplifies the effective propagation of tracing context through the
multiple layers of async execution.
**_Client_** - The client sends requests for `/echo` and `/health`
within the context of the client root span.
In some cases (e.g. Rust wasm), it is not possible to include
network-related crates (e.g. grpcio), yet we would still like to support
protobuf encoding. So, similar to the already existing feature
`gen-tonic-messages`, this introduces a feature `gen-grpcio-messages` to
support protobuf encoding without any network client/server included.
## Changes
- Replaces some existing uses of `gen-grpcio` with `gen-grpcio-messages`
- Enabling the feature `gen-grpcio` automatically enables
`gen-grpcio-messages`
Limit threads to 1 (to force tests to run
consecutively) to temporarily fix random [failures](https://github.com/open-telemetry/opentelemetry-rust/actions/runs/6915742069/job/18815025248)
during `opentelemetry-jaeger` tests, due to environment variable updates
from parallel tests
If you run this command line multiple times, you should be able to
reproduce it (`test_resolve_timeout` and `test_resolve_endpoint` are
updating some environment variables)
```shell
cargo test --manifest-path=opentelemetry-jaeger/Cargo.toml --all-features collector -- --test-threads=5
```
As per the OTel [specs], the value to be recorded with histogram instrument SHOULD be non-negative. Removing the existing method to record signed integer values.
> The value is expected to be non-negative. This API SHOULD be documented in a way to communicate to users that this value is expected to be non-negative. This API SHOULD NOT validate this value, that is left to implementations
> of the API.
[specs]: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#histogram
SimpleLogProcessor's `force_flush` was incorrectly implemented: it did nothing, on the assumption that there was nothing to flush. But the SimpleLogProcessor uses a queue with a pub-sub mechanism to operate, so flush is not a no-op. This is handled correctly for SimpleSpanProcessor.
Instead of borrowing the approach from SimpleSpanProcessor, I have refactored the SimpleLogProcessor to use no queue and instead a simple Mutex-protected exporter/shutdown flag. I feel this is sufficient (and want to port this to SimpleSpanProcessor as well), but would like feedback on the approach - was there some scenario that prompted the queue/pub-sub for SimpleProcessors, or is this approach sufficient?
There should not be any perf concerns, as SimpleProcessors are used for learning/dev scenarios, not production. (Except when exporting to operating-system-native tracing like ETW or user_events, but for those we have written a ReentrantProcessor separately.)
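A rough sketch of the queue-free shape this describes (a `Vec` stands in for a real exporter; types are illustrative, not the SDK's):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Sketch: the exporter sits behind a Mutex and emit() exports
// synchronously, so force_flush() has nothing pending by construction.
struct SimpleProcessor {
    exporter: Mutex<Vec<String>>, // stand-in for a real exporter
    is_shutdown: AtomicBool,
}

impl SimpleProcessor {
    fn new() -> Self {
        Self {
            exporter: Mutex::new(Vec::new()),
            is_shutdown: AtomicBool::new(false),
        }
    }

    fn emit(&self, record: &str) {
        // Export synchronously under the lock; no queue, no subscriber task.
        if !self.is_shutdown.load(Ordering::Acquire) {
            self.exporter.lock().unwrap().push(record.to_owned());
        }
    }

    fn force_flush(&self) {
        // Nothing buffered: every record was already exported in emit().
    }

    fn shutdown(&self) {
        self.is_shutdown.store(true, Ordering::Release);
    }
}
```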
* feat: add generated modules that output const &str for tracing compatibility
* fix: add tracing as a dev-dependency for doc examples
* fix: remove unused code
* fix: remove tracing examples from semconv docs
* docs: remove extra whitespace that was added previously
## Motivation
The [metric unit semantic conventions] suggest that integer counts
should use annotations (e.g. `{packet}`), which breaks the current unit
appending logic as they are not properly escaped.
[metric unit semantic conventions]: https://github.com/open-telemetry/semantic-conventions/blob/v1.23.0/docs/general/metrics.md#instrument-units
## Solution
Ignore unknown units (including annotations) as other language
implementations currently do. This change also removes the `$` mapping
as it is not UCUM.
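The resulting mapping can be sketched as follows, assuming behavior similar to other language implementations (the function name and the abbreviated unit table are illustrative):

```rust
// Illustrative: map a handful of known UCUM units to Prometheus-style
// names; anything unrecognized yields no suffix at all.
fn get_unit_suffix(unit: &str) -> Option<&'static str> {
    match unit {
        "s" => Some("seconds"),
        "ms" => Some("milliseconds"),
        "By" => Some("bytes"),
        // Unknown units, including annotations like "{packet}", are
        // ignored rather than appended verbatim.
        _ => None,
    }
}
```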
RuntimeChannel::batch_message_channel needs to be generic over the
message type. The type used to be declared on the RuntimeChannel<T>
trait. This means a RuntimeChannel can only be used with one particular
message type, which feels unfortunate.
```
fn install<R: RuntimeChannel<??::BatchMessage>>(runtime: R) {
    // Can't use the same runtime here. :-(
    TracerProvider::builder().with_batch_exporter(e, runtime);
    LoggerProvider::builder().with_batch_exporter(e, runtime);
}
```
This change moves the type argument to the batch_message_channel<T>
function and the associated types Receiver<T> and Sender<T>. Channels
are still specific to a message type, but a RuntimeChannel can be used
with any number of message types.
```
fn install<R: RuntimeChannel>(runtime: R) {
    // It works. :-)
    TracerProvider::builder().with_batch_exporter(e, runtime);
    LoggerProvider::builder().with_batch_exporter(e, runtime);
}
```
This also means the BatchMessage types no longer need to be public.
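The signature change can be sketched like this (a simplified trait over `std::sync::mpsc`, not the SDK's actual associated types):

```rust
use std::sync::mpsc;

// Sketch: the message type lives on the METHOD, not the trait, so a
// single runtime value can create channels for any number of types.
trait RuntimeChannel {
    fn batch_message_channel<T: Send>(
        &self,
        capacity: usize,
    ) -> (mpsc::SyncSender<T>, mpsc::Receiver<T>);
}

struct StdRuntime;

impl RuntimeChannel for StdRuntime {
    fn batch_message_channel<T: Send>(
        &self,
        capacity: usize,
    ) -> (mpsc::SyncSender<T>, mpsc::Receiver<T>) {
        mpsc::sync_channel(capacity)
    }
}
```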
Add additional warning.
NOTE: Not going to bump the Otel Version since this crate will be
removed after this.
Signed-off-by: Harold Dost <h.dost@criteo.com>
* Remove extra generic from with_http_client
This generic type parameter is not being used anywhere, so when you try to use `with_http_client`, it requires you to specify a dummy type that implements `HttpClient + 'static`.
So this does not work:
```rust
new_pipeline().with_http_client(Arc::new(client))
```
But these work, even though they don't necessarily match the actual type of `client`:
```rust
new_pipeline::<Arc<dyn HttpClient>>().with_http_client(Arc::new(client))
new_pipeline::<reqwest::Client>().with_http_client(Arc::new(client))
```
* Add test and CHANGELOG entry
* initial commit
* fix
* remove jaeger exporter
* add service_name as CARGO_BIN_NAME
* remove jaeger container, as we instead use Debug Exporter
* Update opentelemetry-otlp/examples/basic-otlp-http/README.md
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
* only keep required ports
* read binary name at compile time
---------
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
* remove regex crate
* update cargo.toml
* fix dependency
* add unittest
* add changelog
* order tests
* fix unit test
* do disk cleanup before tests
* fix disk cleanup
* fix disk cleanup
* fix disk cleanup
* fix disk cleanup
* fix disk cleanup
feat: force user to add exporter builder pipeline.
Currently, when users don't configure a builder pipeline, we throw a runtime error.
This change forces users to call `with_exporter` before building exporters or installing a pipeline, and eliminates the `NoExporterBuilder` error.
* Cleanup crate documentation
- mainly this removes circular dev-dependencies between the api,
sdk, and stdout crates.
* Some things need to be behind "trace" feature
- Changed NOOP_SPAN from a lazy static to a const
- Remove branch in build_with_context for the noop tracer.
The current context's span() method already returns &NOOP_SPAN in the
case when no active span is present, which already has the right span
context value to use.
- Changed other constituents of span context to use const default values too.
- lookup on TypeId and downcasting isn't necessary
- Context::with_value and current_with_value are more efficient
as they no longer clone and overwrite the entry in the map
which represents the current span
* Move metric validation from api to sdk
* moved validation process for metrics instrument from api to sdk.
* included hyphens in instrument names as valid values.
* increase instrument name maximum length from 63 to 255 characters.
* refactor(metrics): rename InstProvider to InstrumentProvider
* refactor(metrics): define spec limitation to const
This simplifies memory reuse on subsequent collection cycles.
Also small ergonomics changes:
* Move `MetricsProducer` config to builders to match other config
* Remove the need for separate `Delta*` and `Cumulative*` aggregate
types
* Return error earlier if readers are shut down
* Log warning if two instruments have the same name with different
casing
* Log warning if view is created with empty criteria
* feat: addInMemoryLogExporter
* linting
* Update CHANGELOG.md
* changed finished to emitted logs, moved example
* replace clone_log with cloned()
* remove proj example in favor of single file
* corrected dependency, removed repeated code
* changed finished to emitted logs, moved example
* remove proj example in favor of single file
* corrected dependency, removed repeated code
* corrected dependency, removed repeated code
* fix: endpoint urls for otlp http exporter. (#1210)
* Move opentelemetry_api code back into opentelemetry and remove the former (#1226)
* [user_events log exporter] Upgrade eventheader-dynamic dependency (#1230)
* feat: addInMemoryLogExporter
* linting
* Update CHANGELOG.md
* remove the example from examples
* fixes after rebase
* missing doc, added no_run in examples
* fix ci test(stable) and docs
* more examples ci fixes
* added dev-dependencies to resolve ci issue
* corrected required features
* remove comments about returning error
* Update example description
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
---------
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
Co-authored-by: Shaun Cox <shaunco@microsoft.com>
Co-authored-by: Lalit Kumar Bhasin <lalit_fin@yahoo.com>
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
* fix(zpages): use tonic based generated files.
The reason zpages cannot compile after #1202 is that the prost-backed grpcio compiler no longer allows us to add `serde` macros onto the generated types. Thus, using tonic types instead of grpcio types in zpages fixes it.
* add changelog
The EvictedQueue was checking the length _before_ inserting, popping
extra items, then doing the insertion. In the case where the capacity
is set to zero, this caused the pop operation to be a no-op on the
first insert, which then inserted an item anyway.
This commit fixes the issue by moving the length check after the insert
and popping any extra items.
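A minimal sketch of the corrected insert-then-evict order (names are illustrative, not the SDK's):

```rust
use std::collections::VecDeque;

// Sketch: insert first, then evict from the front until within capacity,
// so a capacity of 0 keeps the queue empty instead of holding one item.
struct EvictedQueue<T> {
    queue: VecDeque<T>,
    capacity: usize,
    dropped_count: usize,
}

impl<T> EvictedQueue<T> {
    fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity, dropped_count: 0 }
    }

    fn push_back(&mut self, value: T) {
        self.queue.push_back(value);
        while self.queue.len() > self.capacity {
            self.queue.pop_front();
            self.dropped_count += 1;
        }
    }
}
```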
* Adding Copyright Holders
Using the Copyright holder of "The OpenTelemetry Authors" as
recommended.
* Update the LICENSE Files Per the CNCF Guidelines:
https://github.com/cncf/foundation/blob/main/copyright-notices.md#copyright-notices
* Don't add authors to Cargo.toml
* Per [RFC 3052](https://rust-lang.github.io/rfcs/3052-optional-authors-field.html)
* Remove from opentelemetry-proto.
Signed-off-by: Harold Dost <h.dost@criteo.com>
* Update LICENSE
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
---------
Signed-off-by: Harold Dost <h.dost@criteo.com>
Co-authored-by: Julian Tescher <julian@tescher.me>
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
Merge the TraceRuntime and LogRuntime traits in the trace and logs
module respectively into MessageRuntime, an extension trait to Runtime
generic over the type of Message/Signal type to be sent and
received. All references to the older traits now become
MessageRuntime<trace::BatchMessage> and
MessageRuntime<logs::BatchMessage> respectively.
* Logging SDK.
* Add From implementations for Any.
* Fix docs links.
* Add Into<Any> impls for f64 and f32.
* Support converting serde_json::Value to Any.
* Fix docs link.
* Remove unused dependency.
* Add LogRecordBuilder, documentation for Severity.
* Add LogRecordBuilder::new().
* with_body: Remove unneeded generic parameter.
* Remove unneeded generic parameters.
* Enforce LogProcessor is Send + Sync.
* export: Use the correct variables.
* LogEmitterProvider: Enable Clone. Add shutdown, try_shutdown.
* Remove From implementation for serde_json values.
* Fix typo.
* Add Default impl for LogRecordBuilder.
* Update to work with opentelemetry-proto v0.14.0.
* Remove tests.
* Avoid using wildcard imports.
* Rename feature/module "log" to "logs".
* Implement Drop for LogEmitterProvider.
* Use std::convert::identity.
* Use opentelemetry-log-exporter as the thread name.
* Use the correct module name.
* Remove From<Severity> impls for SeverityNumber.
* log_emitter: Set emitter version as None.
* Rename attributes_to_keyvalue to attributes_to_keyv_alue
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
* Update logs
* Fix typos in feature names.
* Add logs protobuf files to GRPCIO_PROTO_FILES.
* Update to opentelemetry-proto 0.19.0.
* Update crates/modules names.
* Remove incorrect exporter example in docs.
* Move stdout logs exporter to the opentelemetry-stdout crate.
* Store resource using Cow instead of Arc.
* Add From<Cow<str>> implementation for Key.
* Update logging SDK.
* Add ordered-float dependency.
* Rewrite LogsExporter.
* Move LogRecord to api crate, simplify resources.
* Add API traits for logs.
* Add api trait impls.
* Add no-op impl for Logger and LoggerProvider.
* Use api traits.
* Add global logger/loggerproviders.
* Add include_trace_context, make the component name param ergonomic.
* Update docs.
* fix: lint
* logs: Rename Any to AnyValue.
* Address docs and lint issues.
---------
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
ForceFlush seems to have been left behind in #502. With those changes, the processing is not really synchronous anymore, i.e. OnEnd now only sends the span down the pipe to be processed in the separate thread as soon as possible.
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#forceflush-1 says:
> In particular, if any SpanProcessor has any associated exporter, it SHOULD try to call the exporter's Export with all spans for which this was not already done and then invoke ForceFlush on it.
As the comment states, all spans previously got exported synchronously right away, so that no such spans existed, but now they might be anywhere between the channel and (the end of) the export call. Doing nothing in ForceFlush even violates the specification as...
> The built-in SpanProcessors MUST do so.
Awaiting all open tasks from the channel on ForceFlush fixes this.
Previous discussions regarding parts of the specification that this does not tackle in line with Shutdown:
> ForceFlush SHOULD provide a way to let the caller know whether it succeeded, failed or timed out.
https://github.com/open-telemetry/opentelemetry-rust/pull/358#issuecomment-725449486
> ForceFlush SHOULD complete or abort within some timeout.
https://github.com/open-telemetry/opentelemetry-rust/pull/502/files#r603722431
This brings the simple processor a step closer to the batch processor with the obvious main difference of batches and the (not so obvious, also see https://github.com/open-telemetry/opentelemetry-rust/pull/502#issuecomment-809740071) difference that it works without a presumed async runtime.
This patch updates the metrics SDK to the latest spec. The following
breaking changes are introduced.
Metrics API changes:
* Move `AttributeSet` to SDK as it's not mentioned in the spec or used
in the api
* Consolidate `AsyncCounter`, `AsyncUpDownCounter`, and `AsyncGauge`
into `AsyncInstrument` trait and add downcasting for observer
callbacks.
* Add `AsyncInstrumentBuilder` to allow per-instrument callback
configuration.
* Allow metric `name` and `description` fields to be `Cow<'static, str>`
* Warn on metric misconfiguration when using instrument builder `init`
rather than returning error
* Update `Meter::register_callback` to take a list of async instruments
and validate they are registered in the callback through the
associated `Observer`
* Allow registered callbacks to be unregistered.
Metrics SDK changes:
* Introduce `Scope` as type alias for `InstrumentationLibrary`
* Update `Aggregation` to match aggregation spec
* Refactor `BasicController` to spec compliant `ManualReader`
* Refactor `PushController` to spec compliant `PeriodicReader`
* Update metric data fields to match spec, including exemplars.
* Split `MetricsExporter` into `Reader`s and `PushMetricExporter`s
* Add `View` implementation
* Remove `AtomicNumber`s
* Refactor `Processor`s into `Pipeline`
Metrics exporter changes:
* Update otlp exporter to match new metrics data
* Update otlp exporter configuration to allow aggregation and
temporality selectors to be optional.
* Update prometheus exporter to match new metrics data
Example changes:
* Update otlp metrics and prometheus examples.
* Remove basic example as we should be focusing on the OTLP variants
* Fix the array encoding of datadog version 05 exporter
* Fix type for array length
* Fix unit test
* Fix unit test correctly
* opentelemetry-datadog: Add missing Headers
* Version
* Language
* fix: format
---------
Co-authored-by: Harold Dost <h.dost@criteo.com>
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
* suggestion: CARGO_BIN_NAME instead of unknown_service as default service name
* update: docs for SdkProvidedResourceDetector struct
* use instead of env::var
* feat: add scope attr
* feat: add scope attr
* feat: add scope attr
* feat: must have version scope
* feat: add scope metric info
* feat: add test
* style: rust fmt
* style: rust fmt
* feat: change disable_scope_info to with_scope_info, and let the default value be true
* In an effort to provide a bit more transparency, this contains a list of
the maintainers' and approvers' names and GitHub handles.
Relates #844
Signed-off-by: Harold Dost <h.dost@criteo.com>
* Only run ParentBased delegate sampler when there is no parent
* fix(trace): add test and fix lint issue
---------
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
* feat(metrics): remove memory settings.
The memory setting was used to control whether to keep instruments that have no updates in the current collection interval.
This should instead be configured via temporality:
memory = true -> Cumulative
memory = false -> Delta
* test(metrics): add tests for temporality
* fix(metrics): test
* test(metrics): adding more tests
When @rex-remind101 added 3806258ee5 he didn't
update the length of the map encoded as msgpack.
This increases the numbers so that the messages are valid. I've
confirmed this fixes v0.3 protocol messages, and updated tests to
reflect the changes.
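For context, a msgpack fixmap header stores its entry count in the low four bits of the header byte, so adding a field to the map requires bumping that byte:

```rust
// msgpack fixmap header: 0x80..=0x8f, low nibble = number of key/value
// pairs. Encoding N+1 entries while declaring N yields an invalid message.
fn fixmap_header(len: u8) -> u8 {
    assert!(len <= 15, "a fixmap holds at most 15 entries");
    0x80 | len
}
```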
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
* chore(common): fix deny.
Most of the warnings from deny are about unmaintained/deprecated crates which we depend on indirectly, either in dev or build dependencies.
I don't believe there are any action items for us to fix them.
Also split the docs build into a separate task in CI; it's confusing to fail coverage because of doc failures.
* set log level of cargo deny to error
* remove unnecessary entry in deny.toml
* chore: generate proto files
* address comments
* try to fix unsoundness and manually ignore them one by one if needed
* fix(sdk): ignore error if the channel has already shutdown
* Improve OTLP environment variable handling
OTLP exporter default endpoint changed to http
There are a couple of places that the OTLP exporter differed from the spec https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md
1. OTEL_EXPORTER_OTLP_PROTOCOL was not supported.
2. The default endpoint used https rather than http.
The following has changed:
1. OTEL_EXPORTER_OTLP_PROTOCOL is supported.
2. If http-proto feature is enabled the default will be to use http-proto rather than grpc.
3. The default endpoint will be set according to the default protocol.
4. The default endpoint uses http rather than https.
5. Tests added.
Fixes #909 #908
* Lint fixes
* Fix changelog PR number
- fix new clippy issues with 1.65
- Update generate files
- Update `patch_dependencies.sh`. `cargo update` now automatically downgrades the `time` crate to a compatible version.
- Remove `--verbose` in `cargo test` in `msrv`
* headers in otlp exporters
* tonic specific headers and grpc patch
* typo
* fmt
* more linting / fmt
* fixed metadata insert -- lesson learned, headers are the way to go :P
* woopsie, fmt
* move headers to their own fn, patch for tonic
* when not compiling with tls lint fails -- try again
* oof I'm pretty sure something else will happen
* tls flag
* get_or_insert
The latest exporter deployment will complain about the invalid
configuration, which has now moved to a new `tls` key.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Sampling should always defer to the existing sampler. The logic before
this patch reflected something similar to what is described in the
ParentBased sampler.
Signed-off-by: Harold Dost <h.dost@criteo.com>
Currently span processors will send span data to exporters even if the
sampled flag is false. This patch fixes this by explicitly checking the
sampled state in both the batch and simple span processors.
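The check both processors now perform can be sketched as follows (illustrative, using the W3C trace-context sampled bit rather than the SDK's flag type):

```rust
// W3C trace-context: bit 0 of the trace-flags byte is the "sampled" flag.
const SAMPLED: u8 = 0x01;

// Only forward span data to the exporter when the span was sampled.
fn should_export(trace_flags: u8) -> bool {
    trace_flags & SAMPLED != 0
}
```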
* Fix comment header to use simpler language
* Fix comment header to refer to correct binary
* Coalesce use statements
* Update env_logger to 0.9.0
* ensure that both examples are formatted the same
* fix(common): clippy issue raised by 1.63
* fix(common): nightly issues
* fix(jaeger): take self reference for `collector_password`
* fix(common): pin serde to 1.0142 as workaround of https://github.com/rust-lang/cargo/issues/10954
* fix(doc): fix type link in docs
This change aligns metrics with the spec, changes include:
* Rename `MeterProvider::meter` to `MeterProvider::versioned_meter` for
consistency with `TracerProvider` trait.
* Move metrics sdk api types to `opentelemetry-sdk`
* Consolidate instrument builders into `InstrumentBuilder`
* Remove value observers and add gauges.
* Move from batch observer to registered callbacks.
* Rename `ExportKindFor` to `TemporalitySelector`
* Consolidate `PushController` and `PullController` into
`BasicController`
* Remove `MinMaxSumCountAggregator` and `ArrayAggregator`
* Update examples and exporters for new api/sdk
* remove pin-project from crates which weren't actually using it
* switch to pin-project-lite to improve compilation time and impose
fewer deps on consumers (the ecosystem migrated to -lite a while ago,
https://github.com/tokio-rs/tokio/pull/1778)
This updates attribute `Key` and `Value` inner types from `Cow<'static, str>`
to a new private `OtelString` enum to support static, owned, or ref counted
values.
When `Propagator::extract_with_context()` is run with an Extractor empty
of trace metadata, it returns a null (invalid) Context, clearing
TraceId, and disabling downwards propagation. As there is already an
implicit TraceId (if we're in a Span), we'd like to use it, traces have
to start somewhere.
This just aligns the mechanics with opentelemetry-sdk: if span
extraction errors out we return a clone of the existing context.
* [dd-metrics-encode] encode sampling priority into metrics object of data dog exported data. data dog uses this to sample.
* [dd-metrics-encode] resolve ci issues by bumping msrv version
* [dd-metrics-encode] resolve ci issues by bumping msrv version to 1.59
I propose to fix the lower bound of `indexmap` to `1.8` as versions below `1.7` are not compiling. I get the following error:
```
error[E0432]: unresolved imports `indexmap::map::IntoKeys`, `indexmap::map::IntoValues`
--> /home/peter/.cargo/git/checkouts/opentelemetry-rust-458e5dd530230e7a/a767fd3/opentelemetry-api/src/trace/order_map.rs:3:29
|
3 | Drain, Entry, IntoIter, IntoKeys, IntoValues, Iter, IterMut, Keys, Values, ValuesMut,
| ^^^^^^^^ ^^^^^^^^^^ no `IntoValues` in `map`
| |
| no `IntoKeys` in `map`
error[E0599]: no method named `into_keys` found for struct `IndexMap` in the current scope
--> /home/peter/.cargo/git/checkouts/opentelemetry-rust-458e5dd530230e7a/a767fd3/opentelemetry-api/src/trace/order_map.rs:95:16
|
95 | self.0.into_keys()
| ^^^^^^^^^ method not found in `IndexMap<K, V, S>`
error[E0599]: no method named `into_values` found for struct `IndexMap` in the current scope
--> /home/peter/.cargo/git/checkouts/opentelemetry-rust-458e5dd530230e7a/a767fd3/opentelemetry-api/src/trace/order_map.rs:111:16
|
111 | self.0.into_values()
| ^^^^^^^^^^^ method not found in `IndexMap<K, V, S>`
error[E0308]: mismatched types
--> /home/peter/.cargo/git/checkouts/opentelemetry-rust-458e5dd530230e7a/a767fd3/opentelemetry-api/src/trace/order_map.rs:561:29
|
561 | Self(IndexMap::from(arr))
| ^^^ expected struct `IndexMap`, found array
|
= note: expected struct `IndexMap<K, V>`
found array `[(K, V); N]`
error[E0308]: mismatched types
--> /home/peter/.cargo/git/checkouts/opentelemetry-rust-458e5dd530230e7a/a767fd3/opentelemetry-api/src/trace/order_map.rs:647:29
|
647 | Self(IndexMap::from(arr))
| ^^^ expected struct `IndexMap`, found array
|
= note: expected struct `IndexMap<Key, Value>`
found array `[(Key, Value); N]`
Some errors have detailed explanations: E0308, E0432, E0599.
For more information about an error, try `rustc --explain E0308`
```
* Use `indexmap::IndexMap` instead of `Vec<KeyValue>` to store SpanBuilder's attributes.
* Test suite compiles.
* Rustfmt
* Wrap IndexMap to expose only the methods that are insertion-order preserving.
* Raise MSRV to 1.51 to get support for const generics.
* Fix doctest.
* Fix lint.
* Add specialised implementations to make it easier to work with KeyValue iterators/collections.
* Bump MSRV to get access to array::map.
* Minimise breakages for existing users.
* Fix invocation.
* Rustfmt
* Add support for concurrent exports
Applications generating significant span volume can end up dropping data
due to the synchronous export step. According to the opentelemetry spec,
"This function will never be called concurrently for the same exporter
instance. It can be called again only after the current call returns."
However, it does not place a restriction on concurrent I/O or anything
of that nature. There is an [ongoing discussion] about tweaking the
language to make this more clear.
With that in mind, this commit makes the exporters return a future that
can be spawned concurrently. Unfortunately, this means that the
`export()` method can no longer be async while taking &mut self. The
latter is desirable to enforce the no concurrent calls line of the spec,
so the choice is made here to return a future instead with the lifetime
decoupled from self. This resulted in a bit of additional verbosity, but
for the most part the async code can still be shoved into an async fn
for the ergonomics.
The main exception to this is the `jaeger` exporter which internally
requires a bunch of mutable references. I plan to discuss with the
opentelemetry team the overall goal of this PR and get buy-in before
making more invasive changes to support this in the jaeger exporter.
[ongoing discussion]: https://github.com/open-telemetry/opentelemetry-specification/issues/2434
* SpanProcessor directly manages concurrent exports
Prior, export tasks were run in "fire and forget" mode with
runtime::spawn. SpanProcessor now manages tasks directly using
FuturesUnordered. This enables limiting overall concurrency (and thus
memory footprint). Additionally, flush and shutdown logic now spawn an
additional task for any unexported spans and wait on _all_ outstanding
tasks to complete before returning.
* Add configuration for BSP max_concurrent_exports
Users may desire to control the level of export concurrency in the batch
span processor. There are two special values:
max_concurrent_exports = 0: no bound on concurrency
max_concurrent_exports = 1: no concurrency, makes everything
synchronous on the messaging task.
* Implement new SpanExporter API for Jaeger
Key points
- decouple exporter from uploaders via channel and spawned task
- some uploaders are a shared I/O resource and cannot be multiplexed
- necessitates a task queue
- eg, HttpClient will spawn many I/O tasks internally, AgentUploader
is a single I/O resource. Different level of abstraction.
- Synchronous API not supported without a Runtime argument. I updated
the API to thread one through, but maybe this is undesirable. I'm also
exploiting the fact in the Actix examples that it uses Tokio under the
hood to pass through the Tokio runtime token.
- Tests pass save for a couple of flaky environment tests, which are
likely race conditions.
* Reduce dependencies on futures
The minimal necessary futures library (core, util, futures proper) is
now used in all packages touched by the concurrent exporters work.
* Remove runtime from Jaeger's install_simple
To keep the API _actually_ simple, we now leverage a thread to run the
jaeger exporter internals.
* Add Arc lost in a rebase
* Fix OTEL_BSP_MAX_CONCURRENT_EXPORTS name and value
Per PR feedback, the default should match the previous behavior of 1
batch at a time.
* Fix remaining TODOs
This finishes the remaining TODOs on the concurrent-exports branch. The
major change included here adds shutdown functionality to the jaeger
exporter which ensures the exporter has finished its tasks before
exiting.
* Restore lint.sh script
This was erroneously committed.
* Make max concurrent exports env configurable
OTEL_BSP_MAX_CONCURRENT_EXPORTS may now be specified in the environment
to configure the number of max concurrent exports. This configurable now
has parity with the other options of the span_processor.
The spec suggests all spans should have an associated `Resource`. This
change switches trace config and span data from `Option<Arc<Resource>>`
to `Cow<'static, Resource>` and removes `Config::with_no_resource` to
accommodate this requirement.
This adds the convenience method `Span::set_attributes` to set multiple
attributes at a time.
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
* feat(datadog): allow users to override the model.
* feat(datadog): allow users to override the model.
* feat(datadog): allow users to override the model mapping.
* feat(datadog): update submodule
* feat(datadog): update submodule
* doc(datadog): clean up docs
* fix(jaeger): use `ConfigError` for all errors from pipeline/configurations.
* refactor(jaeger): add docs, removed some error types.
- Fix some typo in exporter docs
- Add docs for propagator.
The spec defines status as unset, ok, or error with a description. The
current api allows for confusion and misuse by providing invalid
combinations (e.g. `Ok` with a description). This change simplifies the
api and removes these illegal states by moving the status to be an enum
where only the `Status::Error` case accepts a message.
* feat(jaeger): better configuration pipeline.
- Separate agent pipeline and collector pipeline. It's now `new_agent_pipeline` and `new_collector_pipeline`.
- Add `Configurable` trait to include common attributes shared by the agent pipeline and collector pipeline.
- Removed the `with_tag` method.
- Make built-in HTTP client features additive. `surf_collector_client`, `isahc_collector_client`, etc. now just allow the user to choose the HTTP client.
* fix(jaeger): Move CommonConfig and HasRequiredConfig to private mod to meet MSRV requirement. Rename CommonConfig to TransformationConfig.
* chore: make format happy.
* chore: make msrv happy.
* test: add unit tests.
* refactor(jaeger): removed the `Configurable` trait
* fix(jaeger): fix code link
This change aligns the error recording methods with Rust's naming
conventions which prefer the term `Error` over `Exception`.
This also removes `Span::record_exception_with_stacktrace` as Rust's
backtrace functionality is not yet stable. Users who wish to record
this data can use `Span::add_event` directly.
This change simplifies the error handling in the `trace` module by
making `TraceStateError` private, and exposing these errors as the more
general `TraceError`.
* Remove `Tracer::with_span`
This method is not part of the trace spec and the same functionality can
be achieved with `mark_span_as_active` or
`Context::current_with_span(span).attach()`.
* Fix dynatrace clippy lints
* fix otlp external example lint
* Fix doc links
* test(jaeger): add integration test for opentelemetry jaeger
* chore(jaeger): ignore testing harness in code coverage.
* chore(common): add opentelemetry-api and opentelemetry-sdk to coverage.
* Split out `opentelemetry` crate into api and sdk crates, and re-export
to maintain existing compatibility.
* Move `util` module to sdk as it was internal and not documented
publicly.
* Remove api doc tests for now that relied on the sdk.
* Switch default text map propagator to be a noop (allowed by the
[spec](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.8.0/specification/context/api-propagators.md#global-propagators))
* Move `ExportError` and `InstrumentationLibrary` to API.
* Move `SamplingResult` and `SamplingDecision` to trace API.
OTEL_EXPORTER_JAEGER_TIMEOUT is now supported for all cases where a
non-custom HTTP Client is utilized.
Closes #528
Signed-off-by: Harold Dost <github@hdost.com>
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
The Metric API Spec is now stable and ValueRecorder was replaced with
Histogram.
* Deprecations - left structs unmarked as clippy threw a fit.
* Update all code examples to use Histograms.
* Remove InstrumentKind::ValueRecorder since it's not part of the API.
** Otherwise we were left with duplicating code in the SDK which does
exactly the same thing.
Signed-off-by: Harold Dost <github@hdost.com>
Remove deprecated call to PrometheusExporter::new(), which was deprecated
in favor of using ExporterBuilder for new Exporters.
Signed-off-by: Harold Dost <github@hdost.com>
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
* proto: add opentelemetry-proto crate.
Separate the generated proto definitions and their transformation to/from types in opentelemetry into a new crate.
* proto: fix tests.
* proto: add features in proto
* proto: merge upstream
* refactor(otlp,proto): Clean up features and imports
- add the `grpc-tonic` feature
- add the `build-server` feature
- reordered imports in otlp
BREAKING CHANGE: add the `grpc-tonic` feature
* refactor(otlp,proto): remove prost generated files
We can just use tonic generated types without clients.
* refactor(otlp,proto): update submodule
* refactor(otlp): guard trace related code using trace feature
* docs(proto): add proto docs
* feature(proto): update proto version
Update proto version to 0.9. See https://github.com/open-telemetry/opentelemetry-proto/tree/v0.9.0
* refactor(all): merge upstream
* style(all): add one line at the end of files
* fix(examples): fix external-otlp-grpcio-async-std example
Use the new `trace` feature
* stackdriver: reorder items to keep things local
* stackdriver: avoid spawning inside the library
* stackdriver: use a custom error type
* stackdriver: generate protobuf bindings from an integration test
* switch from `Option::map` to pattern match in sdk tracer: ~16%
perf improvement on span creation.
* optimize for common case of single span processor in sdk span: ~1% perf
improvement on span end
Overall improvement on span start-end benchmark: ~17%
* chore: update the NoHttpClient error message
Adds a hint to the error message to make it easier for users to understand how to resolve the error
* chore: Use fully qualified stream import (metrics)
* Add opentelemetry-dynatrace crate
The spec only defines conversions between ids and hex/binary values.
This patch updates the `TraceId` and `SpanId` APIs to be compliant by
removing explicit conversions from u128/u64 (except for test convenience
methods) and adding methods for converting to/from byte arrays and hex
strings.
Updates `Tracer::start_with_context` to accept a reference instead of an
owned parent `Context`, and adds `Tracer::build_with_context` to allow a
parent context reference to be passed with a builder as well. This also
removes the need for `SpanBuilder`s to store a context.
The [trace spec] requires that `TracerProvider`s MUST accept optional
`version` and `schema_url` parameters. This introduces ergonomic issues
as rust does not have a variadic solution outside of macros, leading to
many calls with only a single relevant argument (e.g. `tracer(name, None, None)`).
This patch splits the current `TracerProvider::tracer` method into
`TracerProvider::versioned_tracer` which implements the spec mandated
optional fields, as well as a `TracerProvider::tracer` method that
accepts only a `name` parameter as a convenience method for the above
listed common case.
[trace spec]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.4.0/specification/trace/api.md#get-a-tracer
* Simplify trace core traits
This patch removes the `std::fmt::Debug` and `'static` requirements from
the `TracerProvider`, `Tracer`, and `Span` traits as they are not
generally necessary. `'static` is only required in the global module
context.
* Remove tests that rely on debug impls
* Fix clippy lints
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
It allows users to pass in a `String` if needed, although it will have a performance overhead because we need to clone the string every time a span gets exported.
Users that pass a `'static str` will not be impacted, as `Cow::Borrowed` just copies the pointer.
* fix: Mapping between Jaeger processes and Otel process.
The spec maps resource tags to process tags, and the `service.name` entry in the resource becomes the service name of the process.
#### `Unknown_service` was used as the `service.name` process tag even though users provided another name via the `with_service_name` function.
Service name serves a special purpose for Jaeger, as it requires every span to have a service name. In the OpenTelemetry model, `service.name` is just a resource attribute, which should be provided by users or have a default value, though users can override it to be empty.
To address the difference between those two models, we need to answer the following questions.
1. Should we store the service name within the exporter or store it in resource and extract when exporting?
2. What's the priority of different ways to set the service name?
3. Should we report the `service.name` as a process tags/resource for jaeger spans?
In this PR, we implemented the following process
1. We store the service name as part of the `process` field in the exporter
2. The priority of the different methods is listed below from high to low
- `with_service_name` method in `PipelineBuilder`
- passing `service.name` resource from `with_trace_config` method in `PipelineBuilder`
- SDK-provided service name (it can come from env vars or the default `unknown_service`)
3. We append a `service.name` process tag for each Jaeger span
#### Duplicate process tags
Process tags can be set via `with_tags` function in Jaeger pipeline or by the resource within the trace config(`with_trace_config` function in Jaeger pipeline).
We didn't de-duplicate entries from those two functions.
For this problem, we should deprecate the `with_tags` method and ask users to use `with_trace_config`, storing the process tags/resource in only one place.
We can store the process tags in either of the following places
- exporter's process tags
- trace config resources
From a performance standpoint, we should store the tags in the exporter's process tags. Jaeger clients only require a process instance for one batch. If we store the tags as resources, we will copy them for each span in the batch, which will be discarded later in Jaeger clients.
However, storing tags in exporters may cause confusion when users install multiple exporters in the tracer provider. Other exporters could get the resource from the trace config while Jaeger exporter will use resources/tags stored in itself.
* fix: deprecated methods
* docs: add docs around tags, process tags and service name.
* fix: make clippy happy
OTEL_SERVICE_NAME takes priority; if it's not available, we try to detect `service.name` from OTEL_RESOURCE_ATTRIBUTES. If that's also not available, we use the default.
Improves performance and ergonomics by allowing `&'static str` or
`String` types to be passed to span methods. Also clarifies the
intention behind the object safe trait variants in the global trace
module.
Currently it is not possible to construct an
`opentelemetry_otlp::SpanExporter` directly. This patch makes both the
builder and the `build_span_exporter` method public so users may
construct one if necessary.
When checking the examples, make sure that it is explicit for readers
that it's important to bind traces and metrics initialization to
unused variables, so they are not dropped early and will live for the
whole lifetime of the containing block.
Missing to assign to an unused variable will result in an early drop
of the reporting logic, and no traces or metrics will be reported.
* bump prometheus dependency to 0.13
0.13.0
Bug fix: Avoid panics from Instant::elapsed (#406)
Improvement: Allow trailing comma on macros (#390)
Improvement: Add macros for custom registry (#396)
Improvement: Export thread count from process_collector (#401)
Improvement: Add convenience TextEncoder functions to encode directly to string (#402)
Internal change: Clean up the use of macro_use and extern crate (#398)
Internal change: Update dependencies
* also update in example
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
* fix: new lint errors from rust 1.55
* fix: removed unused fields.
* fix: removed attributes from bounded sync instruments (counter, recorder, etc.).
We store the attributes directly in the Records when binding the attributes, so there is no need to save them again in the instruments.
* fix: nightly channel's clippy warnings.
Most of them are unused fields.
1. Removed attributes fields in value_recorder and counter.
2. Removed stateful field in PushController.
3. Added two methods for collector_username and collector_password to help users build a custom http client with environment variables.
4. Removed default_summary_quantiles and default_histogram_boundaries in prometheus.
5. Renamed the auth field to be _auth. Pending further investigation on whether we should keep this.
6. Removed kind field in MinMaxSumCountAggregator and histogram
* fixing: renamed `Label` to `Attribute` to align with metric specification
* Label is still being used when referring to Prometheus labels but when referring to OpenTelemetry metrics the term `attribute` is now used.
* The `opentelemetry-proto` library still has the `labels` field and will be deprecated approximately 3 months from July 1st 2021. At that point we can remove it when collecting metrics ( `labels: vec!()`)
* update change log to reference pr
* feat: initial commit for zpages.
* feat: implement span aggregation.
* Added proto.
* We use a dedicated SpanAggregator to collect information from the ZPagesProcessor so that we could collect information from multiple span processors in the future, should we need to do so.
* Always copy SpanData information and send it to the aggregator. It may seem unnecessary to refresh the information on a running span whenever a span is started. But consider what happens if spans are generated too fast: some spans may fail to notify the aggregator that they already ended. So, to make sure no span is stuck as a running span example, we refresh the running span example whenever a new span is started.
* feat: merge upstreams changes on span processor.
* feat: add TracezResponse
* refactor: SpanSummary.
We aggregate the information for running spans, error spans, and spans in different latency buckets into SpanStats to better manage them.
* feat: add tracing query handler.
* Updated proto.
* feat: add serialization for TracezResponse.
* feat: improve SpanQueue's performance.
* feat: add tracez functions.
* feat: add examples, add counts in span queue, refactor the tracez queries.
1. We use counts to track the total number of spans. It's different from `len` because the `len` function tracks the number of sampled spans.
2. Added examples on how to query tracez results.
* doc: add documentations.
* fix(doc): Add note that this crate is still in experimental state.
* feat: add method `from_env` to prometheus exporter builder
Fixes #293
Allows configuring the `PrometheusExporter` using environment variables defined in the semantic conventions of the Otel specification. Currently supports setting the host and port for prometheus in the builder, falling back to the defaults defined in the otel specification.
* addressed feedback in favor of impl `Default` over `from_env`
* fixes clippy lint issues after upgrading to rust 1.54
* convert port from String to u16
An error is logged to the global OpenTelemetry error handler when we fail to parse the port from an OS environment variable
* Set client to agent udp comm based on runtime
* Remove commented line
* run cargo fmt
* Fix linting issue
* Fix missing implementation of JaegerTraceRuntime for TokioCurrentThread
* Add doc for new flags
* Run tests with r-tokio flag
* Expose JaegerTraceRuntime to API
* Rename uploaders and init functions
Co-authored-by: Julian Tescher <jatescher@gmail.com>
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
* fix(otlp): upgrade tonic to 0.5.
We no longer have a universal type for both the client with interceptors and the client without interceptors. So instead of using interceptors, we just add the headers to each request as needed.
* chore: remove tower as dependency.
We need `Hash` for span context because the zPages implementation needs it to move spans around between the running, error, and finished buckets.
Adding an export_data() function helps the zPages implementation sample the running spans before they finish.
Clean up docs by:
* Extracting `trace::noop` to clarify their grouping
* Hide internal `util` module docs
* Clean up trace module docs
* Improve `trace::TracerProvider` docs and include examples
opentelemetry's StatusCode enum can be 0 (Unset), 1 (Ok),
or 2 (Error). The otel specification says "don't ever use Ok, just leave it
unset".
So, for non-error spans, the opentelemetry-datadog exporter does the
right thing: it sets the datadog trace's status code to 0.
However, for errors, the opentelemetry-datadog exporter sets the
status_code to 2. This isn't explicitly discouraged by the available
Datadog docs, but it causes something really interesting: the trace
shows up with a "red" left border in Datadog APM (instead of a "green"
one). The "Ok/Error" filter filters those traces correctly. However,
half the UI is missing: when opening the trace, there's no '!' in the
left-hand corner, there's no "Errors (1)" tab, and there's no detail
of the error being shown in a user-friendly way.
tl;dr In the Datadog backend, there's some code that does "status_code
!= 0", and some code that does "status_code == 1". The 2 value isn't set
by their Go exporter, and so the opentelemetry-rust exporter should
never use it.
Remove all trace flags except `sampled` as that is the only supported
flag in the current trace spec. Removes `SpanContext::is_deferred` and
`SpanContext::is_debug` as they are also no longer supported.
We choose to break the loop at the instrument level because debug info on `Accumulator` can be found via other structs, and descriptors seem to be more important for instruments.
Expose the `Error` type in the `global` module to allow users to define their own error handlers. This is currently impossible because `Error` is private and therefore users cannot provide a function that satisfies the constraints of `set_error_handler`.
Currently, the OpenTelemetry Rust SDK has two ways to handle errors: in situations where errors cannot be returned, one should call the global error handler to process them; otherwise, one should return the errors.
The OpenTelemetry Rust SDK comes with an error type `opentelemetry::Error`. For each signal, an error type has been defined: all errors returned by the trace module MUST be wrapped in `opentelemetry::trace::TraceError`, all errors returned by the metrics module MUST be wrapped in `opentelemetry::metrics::MetricError`, and all errors returned by the logs module MUST be wrapped in `opentelemetry::logs::LogsError`.
For users who want to implement their own exporters, it's RECOMMENDED to wrap all errors from the exporter into a crate-level error type and implement the `ExporterError` trait.
### Priority of configurations
OpenTelemetry supports multiple ways to configure the API, SDK and other components. The priority of configurations is as follows:
- Environment variables
- Compile-time configurations provided in the source code
### Experimental/Unstable features
Use `otel_unstable` feature flag for implementation of specification with [experimental](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.27.0/specification/document-status.md) status. This approach ensures clear demarcation and safe integration of new or evolving features. Utilize the following structure:
```rust
#[cfg(feature = "otel_unstable")]
{
// Your feature implementation
}
```
It's important to regularly review and remove the `otel_unstable` flag from the code once the feature becomes stable. This cleanup process is crucial to maintain the overall code quality and to ensure that stable features are accurately reflected in the main build.
### Optional features
The potential features include:
- Stable and non-experimental features that are compliant with the specification and have a feature flag to minimize compilation size. Example: feature flags for signals (like `logs`, `traces`, `metrics`) and runtimes (`rt-tokio`, `rt-tokio-current-thread`).
- Stable and non-experimental features, although not part of the specification, are crucial for enhancing the tracing/log crate's functionality or boosting performance. These features are also subject to discussion and approval by the OpenTelemetry Rust Maintainers.
All such features should adhere to the naming convention `<signal>_<feature_name>`.
## Style Guide
* Run `cargo clippy --all` - this will catch common mistakes and improve
your Rust code
* Run `cargo fmt` - this will find and fix code formatting
issues.
## Testing and Benchmarking
* Run `cargo test --all` - this will execute code and doc tests for all
projects in this workspace.
* Run `cargo bench` - this will run benchmarks to show performance
regressions
## Approvers and Maintainers
See the [code owners](CODEOWNERS) file.
### Become an Approver or a Maintainer
See the [community membership document in OpenTelemetry community
### Where should I put third party propagators/exporters, contrib or standalone crates?
As of now, the specification classifies propagators into three categories: fully open standards, platform-specific standards, and proprietary headers. The conclusion is that only the fully open standards should live in SDK packages/repos. So here, only fully open standards should live as independent crates. For more detail and discussion, see [this pr](https://github.com/open-telemetry/opentelemetry-specification/pull/1144).
The meeting is open for all to join. We invite everyone to join our meeting,
regardless of your experience level. Whether you're a seasoned OpenTelemetry
developer, just starting your journey, or simply curious about the work we do,
you're more than welcome to participate!
## Approvers and Maintainers
### Maintainers
* [Cijo Thomas](https://github.com/cijothomas), Microsoft
* [Harold Dost](https://github.com/hdost)
* [Lalit Kumar Bhasin](https://github.com/lalitb), Microsoft
* [Utkarsh Umesan Pillai](https://github.com/utpilla), Microsoft
* [Zhongyang Wu](https://github.com/TommyCpp)
For more information about the maintainer role, see the [community repository](https://github.com/open-telemetry/community/blob/main/guides/contributor/membership.md#maintainer).
* [Shaun Cox](https://github.com/shaun-cox), Microsoft
For more information about the approver role, see the [community repository](https://github.com/open-telemetry/community/blob/main/guides/contributor/membership.md#approver).
For more information about the emeritus role, see the [community repository](https://github.com/open-telemetry/community/blob/main/guides/contributor/membership.md#emeritus-maintainerapprovertriager).
# Error handling patterns in public API interfaces
## Date
27 Feb 2025
## Summary
This ADR describes the general pattern we will follow when modelling errors in public API interfaces - that is, APIs that are exposed to users of the project's published crates. It summarizes the discussion and the option finally chosen in [#2571](https://github.com/open-telemetry/opentelemetry-rust/issues/2571); for more context, check out that issue.
We will focus on the exporter traits in this example, but the outcome should be applied to _all_ public traits and their fallible operations.
These include [SpanExporter](https://github.com/open-telemetry/opentelemetry-rust/blob/eca1ce87084c39667061281e662d5edb9a002882/opentelemetry-sdk/src/trace/export.rs#L18), [LogExporter](https://github.com/open-telemetry/opentelemetry-rust/blob/eca1ce87084c39667061281e662d5edb9a002882/opentelemetry-sdk/src/logs/export.rs#L115), and [PushMetricExporter](https://github.com/open-telemetry/opentelemetry-rust/blob/eca1ce87084c39667061281e662d5edb9a002882/opentelemetry-sdk/src/metrics/exporter.rs#L11) which form part of the API surface of `opentelemetry-sdk`.
There are various ways to handle errors on trait methods, including swallowing them and logging, panicking, returning a shared global error, or returning a method-specific error. We strive for consistency, and we want to be sure that we've put enough thought into what this looks like that we don't have to make breaking interface changes unnecessarily in the future.
## Design Guidance
### 1. No panics from SDK APIs
Failures during regular operation should not panic, instead returning errors to the caller where appropriate, _or_ logging an error if not appropriate.
Some of the opentelemetry SDK interfaces are dictated by the specification in such a way that they may not return errors.
### 2. Consolidate error types within a trait where we can, let them diverge when we can't
We aim to consolidate error types where possible _without indicating a function may return more errors than it can actually return_.
**Don't do this** - each function's signature indicates that it returns errors it will _never_ return, forcing the caller to write handlers for dead paths:
```rust
enum MegaError {
TooBig,
TooSmall,
TooLong,
TooShort
}
trait MyTrait {
// Will only ever return TooBig,TooSmall errors
fn action_one() -> Result<(), MegaError>;
// These will only ever return TooLong,TooShort errors
fn action_two() -> Result<(), MegaError>;
fn action_three() -> Result<(), MegaError>;
}
```
**Instead, do this** - each function's signature indicates only the errors it can return, providing an accurate contract to the caller:
```rust
enum ErrorOne {
    TooBig,
    TooSmall,
}

enum ErrorTwo {
    TooLong,
    TooShort,
}

trait MyTrait {
    fn action_one() -> Result<(), ErrorOne>;

    // Action two and three share the same error type.
    // We do not introduce a common error MyTraitError for all operations, as this
    // would force all methods on the trait to indicate they return errors they do
    // not return, complicating things for the caller.
    fn action_two() -> Result<(), ErrorTwo>;
    fn action_three() -> Result<(), ErrorTwo>;
}
```
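As a sketch of the caller's side (the names here are illustrative, not SDK API), the narrow error type lets the compiler verify that every reachable failure is handled, with no dead arms:

```rust
// Illustrative only: matching on the narrow, method-specific error type.
enum ErrorOne {
    TooBig,
    TooSmall,
}

fn action_one() -> Result<(), ErrorOne> {
    Err(ErrorOne::TooBig)
}

fn caller() -> &'static str {
    // The match is exhaustive over exactly the errors `action_one` can
    // return; there are no handlers for errors that can never occur.
    match action_one() {
        Ok(()) => "ok",
        Err(ErrorOne::TooBig) => "too big",
        Err(ErrorOne::TooSmall) => "too small",
    }
}

fn main() {
    println!("{}", caller()); // prints "too big"
}
```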
### 3. Consolidate error types between signals where we can, let them diverge where we can't
Consider the `Exporter`s mentioned earlier. Each of them has the same failure indicators - as dictated by the OpenTelemetry spec - and we will
share the error types accordingly:
**Don't do this** - each signal has its own error type, despite having exactly the same failure cases:
```rust
#[derive(Error, Debug)]
pub enum OtelTraceError {
    #[error("Shutdown already invoked")]
    AlreadyShutdown,

    #[error("Operation failed: {0}")]
    InternalFailure(String),

    /* ... additional errors ... */
}

#[derive(Error, Debug)]
pub enum OtelLogError {
    #[error("Shutdown already invoked")]
    AlreadyShutdown,

    #[error("Operation failed: {0}")]
    InternalFailure(String),

    /* ... additional errors ... */
}
```
**Instead, do this** - error types are consolidated between signals where this can be done appropriately:
```rust
/// opentelemetry-sdk::error
#[derive(Error, Debug)]
pub enum OTelSdkError {
    #[error("Shutdown already invoked")]
    AlreadyShutdown,

    #[error("Operation failed: {0}")]
    InternalFailure(String),

    /* ... additional errors ... */
}

pub type OTelSdkResult = Result<(), OTelSdkError>;

// Signal-specific exporter traits all share the same `OTelSdkResult`.
```
If this were _not_ the case - if, for instance, `LogExporter` needed an extra error variant that the caller could reasonably handle - we would let the error types diverge at that point.
### 4. Box custom errors where a savvy caller may be able to handle them, stringify them if not
Note above that we do not box any `Error` into `InternalFailure`. Our rule here is that if the caller cannot reasonably be expected to handle a particular error variant, we will use a simplified interface that returns only a descriptive string. In the concrete example we are using with the exporters, we have a [strong signal in the opentelemetry-specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#export) that indicates that the error types _are not actionable_ by the caller.
If the caller may potentially recover from an error, we will follow the generally-accepted best practice (e.g., see [canonical's guide](https://canonical.github.io/rust-best-practices/error-and-panic-discipline.html)) and instead preserve the nested error:
**Don't do this if the OtherError is potentially recoverable by a savvy caller**:
```rust
#[derive(Debug, Error)]
pub enum MyError {
    #[error("Error one occurred")]
    ErrorOne,

    #[error("Operation failed: {0}")]
    OtherError(String),
}
```
**Instead, do this**, allowing the caller to match on the nested error:
```rust
#[derive(Debug, Error)]
pub enum MyError {
    #[error("Error one occurred")]
    ErrorOne,

    #[error("Operation failed: {source}")]
    OtherError {
        #[from]
        source: Box<dyn Error + Send + Sync>,
    },
}
```
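To illustrate why preserving the source matters, a caller can downcast the boxed error to a specific underlying type and react to it. Everything below (`TimeoutError`, `recover`) is hypothetical and uses only the standard library rather than `thiserror`:

```rust
use std::error::Error;
use std::fmt;

// Hypothetical nested error a transport layer might return.
#[derive(Debug)]
struct TimeoutError;

impl fmt::Display for TimeoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request timed out")
    }
}

impl Error for TimeoutError {}

#[derive(Debug)]
enum MyError {
    ErrorOne,
    OtherError { source: Box<dyn Error + Send + Sync> },
}

fn recover(err: &MyError) -> bool {
    match err {
        MyError::ErrorOne => false,
        // Because the source is preserved, a savvy caller can downcast
        // and decide, say, to retry on a specific underlying failure.
        MyError::OtherError { source } => {
            source.downcast_ref::<TimeoutError>().is_some()
        }
    }
}

fn main() {
    let err = MyError::OtherError { source: Box::new(TimeoutError) };
    assert!(recover(&err)); // a timeout is retryable in this sketch
}
```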
Note that at the time of writing, there is no instance we have identified within the project that has required this.
### 5. Use thiserror by default
We will use [thiserror](https://docs.rs/thiserror/latest/thiserror/) by default to implement Rust's [error trait](https://doc.rust-lang.org/core/error/trait.Error.html).
This keeps our code clean, and as it does not appear in our interface, we can choose to replace any particular usage with a hand-rolled implementation should we need to.
### 6. Don't use `#[non_exhaustive]` by default
If an `Error` response set is closed - if we can confidently say it is very unlikely to gain new variants in the future - we should not annotate it with `#[non_exhaustive]`. By way of example, the variants of the exporter error types described above are exhaustively documented in the OpenTelemetry Specification, and we can confidently say that we do not expect new variants.
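For example, leaving `#[non_exhaustive]` off a closed error set lets downstream code match without a wildcard arm. The variants below are illustrative, and note that the attribute only restricts matching across crate boundaries, which a single-file sketch cannot fully show:

```rust
// Without #[non_exhaustive], code in other crates may match exhaustively.
pub enum ShutdownError {
    AlreadyShutdown,
    Timeout,
}

fn describe(e: &ShutdownError) -> &'static str {
    // No catch-all arm is needed; the compiler verifies coverage, and any
    // future variant addition surfaces as a compile error rather than
    // silently falling into a wildcard.
    match e {
        ShutdownError::AlreadyShutdown => "already shut down",
        ShutdownError::Timeout => "timed out",
    }
}

fn main() {
    println!("{}", describe(&ShutdownError::Timeout)); // prints "timed out"
}
```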
This directory contains architectural decision records made for the opentelemetry-rust project. These allow us to consolidate discussion, options, and outcomes, around key architectural decisions.
* attributes: {`otel.metric.overflow` = `true`}, count: `3` ← Notice this special overflow attribute
If we later query "How many red apples were sold?" the answer would be 10, not
13, because the Midtown sales were folded into the overflow bucket. Similarly,
queries about "How many items were sold in Midtown?" would return 0, not 3.
However, the total count across all attributes would still be accurate (i.e., "How many total fruits were sold in (T3, T4]?" would correctly give 26).
This limitation applies regardless of whether the attribute in question is
naturally high-cardinality. Even low-cardinality attributes like "color"
become unreliable for querying if they were part of attribute combinations
that triggered overflow.
OpenTelemetry's cardinality capping applies only to attributes provided when
reporting measurements via the [Metrics API](#metrics-api). In other words,
attributes used to create a `Meter`, or attributes set on a `Resource`, are not
subject to this cap.
#### Cardinality Limits - How to Choose the Right Limit
Choosing the right cardinality limit is crucial for maintaining efficient memory
usage and predictable performance in your metrics system. The optimal limit
depends on your temporality choice and application characteristics.
Setting the limit incorrectly can have consequences:
* **Limit too high**: Due to the SDK's [memory
preallocation](#memory-preallocation) strategy, excess memory will be
allocated upfront and remain unused, leading to resource waste.
* **Limit too low**: Measurements will be folded into the overflow bucket
(`{"otel.metric.overflow": true}`), losing granular attribute information and
making attribute-based queries unreliable.
Consider these guidelines when determining the appropriate limit:
##### Choosing the Right Limit for Cumulative Temporality
Cumulative metrics retain every unique attribute combination that has *ever*
been observed since the start of the process.
* You must account for the theoretical maximum number of attribute combinations.
* This can be estimated by multiplying the number of possible values for each
attribute.
* If certain attribute combinations are invalid or will never occur in practice,
you can reduce the limit accordingly.
###### Example - Fruit Sales Scenario
Attributes:
* `name` can be "apple" or "lemon" (2 values)
* `color` can be "red", "yellow", or "green" (3 values)
The theoretical maximum is 2 × 3 = 6 unique attribute sets.
For this example, the simplest approach is to use the theoretical maximum and **set the cardinality limit to 6**.
However, if you know that certain combinations will never occur (for example, if "red lemons" don't exist in your application domain), you could reduce the limit to only account for valid combinations. In this case, if only 5 combinations are valid, **setting the cardinality limit to 5** would be more memory-efficient.
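The arithmetic above can be sketched as follows; the values come from the example, and nothing here is SDK API:

```rust
// Plain arithmetic for cumulative temporality sizing; not an SDK API.
fn theoretical_max(name_values: usize, color_values: usize) -> usize {
    // Multiply the number of possible values for each attribute.
    name_values * color_values
}

fn main() {
    // 2 names ("apple", "lemon") × 3 colors ("red", "yellow", "green")
    let max = theoretical_max(2, 3);
    assert_eq!(max, 6); // simplest choice: set the cardinality limit to 6

    // If "red lemons" can never occur, one combination is invalid:
    let limit = max - 1;
    assert_eq!(limit, 5); // tighter, more memory-efficient limit
}
```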
##### Choosing the Right Limit for Delta Temporality
Delta metrics reset their aggregation state after every export interval. This
approach enables more efficient memory utilization by focusing only on attributes
observed during each interval rather than maintaining state for all combinations.
* **When attributes are low-cardinality** (as in the fruit example), use the
same calculation method as with cumulative temporality.
* **When high-cardinality attribute(s) exist** like `user_id`, leverage Delta
temporality's "forget state" nature to set a much lower limit based on active
usage patterns. This is where Delta temporality truly excels - when the set of
active values changes dynamically and only a small subset is active during any
given interval.
###### Example - High Cardinality Attribute Scenario
Export interval: 60 sec
Attributes:
* `user_id` (up to 1 million unique users)
* `success` (true or false, 2 values)
Theoretical limit: 1 million users × 2 = 2 million attribute sets
But if only 10,000 users are typically active during a 60 sec export interval:
10,000 × 2 = 20,000
**You can set the limit to 20,000, dramatically reducing memory usage during
normal operation.**
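The same back-of-the-envelope sizing applies to the delta case; again this is plain arithmetic, not SDK API:

```rust
// Plain arithmetic for delta temporality sizing; not an SDK API.
fn delta_limit(active_per_interval: usize, success_values: usize) -> usize {
    active_per_interval * success_values
}

fn main() {
    // Size for the ~10,000 users active per 60 sec interval × 2 success
    // values, rather than the 2-million theoretical maximum.
    assert_eq!(delta_limit(10_000, 2), 20_000);
}
```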
###### Export Interval Tuning
Shorter export intervals further reduce the required cardinality:
* If your interval is halved (e.g., from 60 sec to 30 sec), the number of unique
attribute sets seen per interval may also be halved.
> [!NOTE]
> More frequent exports increase CPU/network overhead due to
> serialization and transmission costs.
##### Choosing the Right Limit - Backend Considerations
While delta temporality offers certain advantages for cardinality management,
your choice may be constrained by backend support:
* **Backend Restrictions:** Some metrics backends only support cumulative
temporality. For example, Prometheus requires cumulative temporality and
cannot directly consume delta metrics.
* **Collector Conversion:** To leverage delta temporality's memory advantages
while maintaining backend compatibility, configure your SDK to use delta
temporality and deploy an OpenTelemetry Collector with a delta-to-cumulative
conversion processor. This approach pushes the memory overhead from your
application to the collector, which can be more easily scaled and managed
independently.
TODO: Add the memory cost incurred by each data point, so users can know the
memory impact of setting a higher limit.
TODO: Add example of how query can be affected when overflow occurs, use
This example shows basic span and metric usage, and exports to the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) via OTLP.
## Prerequisite
You should first start an `opentelemetry-collector` instance on localhost using the default configuration.
error!(name: "my-event-name", target: "my-system", event_id = 20, user_name = "otel", user_email = "otel@opentelemetry.io", message = "This is an example message");