Commit Graph

28 Commits

Author SHA1 Message Date
MV Shiva ca43d78f58
inprocess: Support tracing message sizes (#11406) 2024-10-11 10:28:51 +05:30
yifeizhuang 6335e0be3e
core: DEADLINE_EXCEEDED gives hints for slow resolver (#10545) 2023-09-08 15:09:18 -07:00
yifeizhuang 14ba959545
census: add delayed name resolution tracing annotation (#10044) 2023-04-14 11:44:50 -07:00
yifeizhuang fc4410f159
api, census: add new pendingStreamCreated on clientStreamTracer and new tracer annotation (#10014) 2023-04-11 13:49:33 -07:00
DNVindhya 844de39c26
gcp-observability, census: add trace information to logs (#9963)
This commit adds trace information (TraceId, SpanId and TraceSampled)
fields to LogEntry, when both logging and tracing are enabled in
gcp-observability. 

For server-side logs, span information was readily available using
Span.getContext() propagated via `io.grpc.Context`. Similar approach is
not feasible for client-side architecture.

Client SpanContext which has all the information required to be added
to logs is propagated to the logging interceptor via `io.grpc.CallOptions`.
2023-03-20 14:18:16 -07:00
DNVindhya b09473b0d3
census: Trace annotation for reporting inbound message sizes (#9944)
This commit uses [OpenCensus Annotation][] to report message size
[bytes] for inbound/received messages in traces.

`addMessageEvent` API which is currently used expects both uncompressed
and compressed message (optional) sizes to be reported at the same.
Since decompression for messages happens at a later point in time,
reporting compressed message as is and reporting uncompressed size as
`-1` renders the size as _0 bytes received_ in cloud tracing front end.

As a workaround, we add _two annotations for each received message_:
* For compressed message size
* For uncompressed message size (when it is available)

This commit also removes `addMessageEvents` a flag introduced in
PR #9485 to temporarily suppress message events for gcp-observability.

[OpenCensus Annotation]: https://www.javadoc.io/static/io.opencensus/opencensus-api/0.31.0/io/opencensus/trace/Annotation.html
2023-03-10 16:19:21 -08:00
DNVindhya 66f95b7ade
census: add per call latency metric (#9906)
* updated call latency measure with AGGREGATION_WITH_MILLIS_HISTOGRAM; added test for call latency view
2023-03-03 09:36:14 -08:00
Benjamin Peterson ae6c506f96
all: fix build with errorprone 2.18 (#9886)
errorprone cannot be updated past 2.10 because later versions do not support Java 8.

Fixes https://github.com/grpc/grpc-java/issues/9916.
2023-03-01 13:45:18 -08:00
DNVindhya 7942b9e7f5
gcp-observability: add new *compressed_bytes_per_rpc views (#9893)
* Add new metric views; Register latency and bytes per metrics views for observability
* update view description to inlcude client/server
* Use pre-defined aggregation from OpenCensus for bytes histogram
* updated view names
2023-02-15 17:17:39 -08:00
Eric Anderson d17a2db4bd Upgrade to Checkstyle 8.28
Trying to upgrade Gradle to 7.6 improved the checkstyle plugin such that
it appears to have been running in new occasions. That in turn exposed
us to https://github.com/checkstyle/checkstyle/issues/5088. That bug was
fixed in 8.28, which also fixed lots of other bugs. So now we have
better checking and some existing volations needed fixing. Since the
code style fixes generated a lot of noise, this is a pre-fix to reduce
the size of a Gradle upgrade.

I did not upgrade past 8.28 because at some point some other bugs were
introduced, in particular with the Indentation module. I chose the
oldest version that had the particular bug impacting me fixed. Upgrading
to this old-but-newer version still makes it easier to upgrade to a
newer version in the future.
2023-01-05 17:07:04 -08:00
Justin Bassett 1241946c15 Always update the TagContext in filterContext()
This is a potential optimization, depending on the OpenCensus
implementation installed. Currently, the code checks TagContext equality
against the empty TagContext as an optimization to avoid updating the
Context, but checking equality may not be cheaper than updating the
Context. In particular, this condition is almost always true, because we
[update the parentCtx with an additional
tag](bacf18db8d/census/src/main/java/io/grpc/census/CensusStatsModule.java (L770-L774)).
It's only true when there is no OpenCensus implementation (i.e. some
kind of no-op implementation that ignores TagContexts) or when the empty
TagContext already has that tag set to the specific value which varies
depending on the RPC method.

For the default OpenCensus implementation, the equality check is
equality between two maps. Other implementations may require additional
work to determine equality.

Basically, we are trading off rarely avoiding updating the Context for
always doing at least a map equality or potentially even more depending
on the OpenCensus implementation in use. Given this, it makes sense to
just always update the Context.
2022-11-17 10:22:43 -08:00
Eric Anderson bacf18db8d
census: Avoid deprecated measure constants
Many of the deprecated constants are just aliases for non-deprecated
constants, so just swap which one we use.
2022-09-12 08:19:21 -07:00
sanjaypujare 6131a85196
census,observability: suppress message-events in traces when used by observability (#9485) 2022-08-26 16:39:54 -07:00
Larry Safran 01d5bd47cb
Cleanup some of the warnings across the code base (#9445)
No logic changes, just cleans up warnings to make spotting real problems easier.

Remove "public" declarations on interfaces
Remove duplicate semicolons (Java lines ending in ";;")
Remove unneeded import
Change non-javadoc comment to not start with "/**"
Remove unneeded explicit type declarations from generics
Fix broken javadoc links
2022-08-15 11:06:31 -07:00
DNVindhya 15ecc0714e
gcp-observability: add grpc-census as a dependency and update opencensus version (#9140) 2022-05-03 20:52:34 -07:00
ZHANG Dapeng 042f9879d4
all: remove deprecated StreamInfo.transportAttrs (#8768)
APIs such as `StreamInfo.getTransportAttrs()` were [deprecated](860e97d12a (diff-aa4049f54d6d5d462700e9221344184a37d2068b3ba7d715abd417b1df5bf883R114)) since 1.41.0. Removing now.
2021-12-20 09:46:25 -08:00
ZHANG Dapeng a4334eb5c3
census: fix NPE in calling recordFinishedAttempt() (#8706)
Fix the NPE as shown in the following stacktrace:

```
Caused by: java.lang.RuntimeException: java.lang.NullPointerException with message: null
        at io.grpc.census.CensusStatsModule$ClientTracer.recordFinishedAttempt(CensusStatsModule.java:388) ~[grpc-census-1.42.0.jar:1.42.0]
        at io.grpc.census.CensusStatsModule$CallAttemptsTracerFactory.recordFinishedCall(CensusStatsModule.java:525) ~[grpc-census-1.42.0.jar:1.42.0]
        at io.grpc.census.CensusStatsModule$CallAttemptsTracerFactory.attemptEnded(CensusStatsModule.java:492) ~[grpc-census-1.42.0.jar:1.42.0]
        at io.grpc.census.CensusStatsModule$ClientTracer.streamClosed(CensusStatsModule.java:345) ~[grpc-census-1.42.0.jar:1.42.0]
        at io.grpc.internal.StatsTraceContext.streamClosed(StatsTraceContext.java:155) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractClientStream$TransportState.closeListener(AbstractClientStream.java:458) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractClientStream$TransportState.access$400(AbstractClientStream.java:221) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractClientStream$TransportState$1.run(AbstractClientStream.java:442) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractClientStream$TransportState.deframerClosed(AbstractClientStream.java:278) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.Http2ClientStreamTransportState.deframerClosed(Http2ClientStreamTransportState.java:31) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.MessageDeframer.close(MessageDeframer.java:233) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.MessageDeframer.closeWhenComplete(MessageDeframer.java:191) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractStream$TransportState.closeDeframer(AbstractStream.java:200) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:445) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:401) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.AbstractClientStream$TransportState.inboundTrailersReceived(AbstractClientStream.java:384) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.internal.Http2ClientStreamTransportState.transportTrailersReceived(Http2ClientStreamTransportState.java:183) ~[grpc-core-1.42.0.jar:1.42.0]
        at io.grpc.netty.shaded.io.grpc.netty.NettyClientStream$TransportState.transportHeadersReceived(NettyClientStream.java:334) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
        at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.onHeadersRead(NettyClientHandler.java:372) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
        at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.access$1200(NettyClientHandler.java:91) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
        at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler$FrameListener.onHeadersRead(NettyClientHandler.java:934) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
```

The NPE can happen when `ClientCall.Listener.onClose()` and `StatsTraceContext.streamClosed()` (or `ClientStreamListener.closed()`) are invoked concurrently in different threads.  Note that `CensusStatsModule$CallAttemptsTracerFactory.attemptEnded()` in the above stack trace would observe `callEnded==true` in such a race condition.

The following are the possible scenarios that the race between `ClientCall.Listener.onClose()` and `ClientStreamListener.closed()` can happen:

- Deadline exceeded but the underlying real stream is [not committed](https://github.com/grpc/grpc-java/blob/v1.42.0/core/src/main/java/io/grpc/internal/RetriableStream.java#L486-L495), the `ClientCall.Listener` may be closed earlier than the stream listener. (This is the case of the above stack trace, in which the inbound end-of-stream is received observing `callEnded==true`. Even if nothing inbound is received, there is still a chance that the NPE can happen.)
- DelayedClientTransport.PendingStream has created a realStream but `setStream(realStream)` is [not called yet](https://github.com/grpc/grpc-java/blob/v1.42.0/core/src/main/java/io/grpc/internal/DelayedClientTransport.java#L366-L372), when deadline exceeded. (This has little chance to happen, only for the very first RPC on the channel.)
- Hedging case.

In deadline-exceeded cases, the shorter the deadline is, the more likely the race can happen.
2021-11-29 08:48:51 -08:00
ZHANG Dapeng 7c6f53ab79
all: add internal API to disable retry stats (#8510)
Resolves b/197648853 for internal performance regression. Reporting retry stats caused significant amount of performance overhead internally.
2021-09-13 09:12:04 -07:00
Sergii Tkachenko 6cd911757a
census: make internal linter happy
TODO is preferred to FIXME.
2021-09-03 13:26:43 -07:00
ZHANG Dapeng 2faa748797
census: Fix retry stats data race (#8459)
There is data race in `CensusStatsModule. CallAttemptsTracerFactory`:

If client call is cancelled while an active stream on the transport is not committed, then a [noop substream](https://github.com/grpc/grpc-java/blob/v1.40.0/core/src/main/java/io/grpc/internal/RetriableStream.java#L486) will be committed and the active stream will be cancelled. Because the active stream cancellation triggers the stream listener closed() on the _transport_ thread, the closed() method can be invoked concurrently with the call listener onClose(). Therefore, one `CallAttemptsTracerFactory.attemptEnded()` can be called concurrently with `CallAttemptsTracerFactory.callEnded()`, and there could be data race on RETRY_DELAY_PER_CALL. See also the regression test added.

The same data race can happen in hedging case when one of hedges is committed and completes the call, other uncommitted hedges would cancel themselves and trigger their stream listeners closed() on the transport_thread concurrently. 

Fixing the race by recording RETRY_DELAY_PER_CALL once both the conditions are met: 
- callEnded is true 
- number of active streams is 0.
2021-09-02 10:24:22 -07:00
ZHANG Dapeng fd2a58a55e
all: implement retry stats (#8362) 2021-08-11 10:24:37 -07:00
ZHANG Dapeng 860e97d12a
all: API refactoring in preparation to support retry stats (#8355)
Rebased PR #8343 into the first commit of this PR, then (the 2nd commit) reverted the part for metric recording of retry attempts. The PR as a whole is mechanical refactoring. No behavior change (except that some of the old code path when tracer is created is moved into the new method `streamCreated()`).

The API change is documented in go/grpc-stats-api-change-for-retry-java
2021-07-31 18:33:02 -07:00
Eric Anderson 020fb36753 Fix lint warnings 2020-08-07 14:49:50 -05:00
Eric Anderson e92b2275f9 Update to Error Prone 2.4
Most of the changes should be semi-clear why they were made. However, BadImport
may not be as obvious: https://errorprone.info/bugpattern/BadImport . That
impacted classes named Type, Entry, and Factory. Also
PublicContructorForAbstractClass:
https://errorprone.info/bugpattern/PublicConstructorForAbstractClass

The JdkObsolete issue is already resolved but is not yet in a release.
2020-08-06 10:56:16 -05:00
Eric Anderson 80d62bfce2 Upgrade to Mockito 3.3.3
verifyZeroInteractions has the same behavior as verifyNoMoreInteractions. It
was deprecated in Mockito 3.0.1 and replaced with verifyNoInteractions, which
does not change behavior depending on previous verify() calls. All instances
were replaced with verifyNoInteractions, except those in
ApplicationThreadDeframerTest which were replaced with verifyNoMoreInteractions
since there is a verify() call in `@Before`.
2020-08-06 10:49:23 -05:00
Chengyuan Zhang e6d8c27448
Revert "census: Set SpanKind on Client/Server traces (#6680)" (#6729)
This reverts commit 60bc74620f.
2020-02-19 13:59:58 -08:00
Georg Welzel 60bc74620f
census: Set SpanKind on Client/Server traces (#6680) 2020-02-07 09:48:29 -08:00
Chengyuan Zhang b7ccc0d142
core, census, testing, interop-testing, android-interop-testing: move census dependency out of grpc-core (#6577)
Decouples grpc-core with census, while still preserve the default integration of census in grpc-core. Users wishing to enable census needs to add grpc-census to their runtime classpath.

- Created a grpc-census module:
    - Moved CensusStatsModule.java and CensusTracingModule.java into grpc-census from grpc-core. CensusModuleTests.java is also moved. They now belong to io.grpc.census package.
Moved DeprecatedCensusConstants.java into io.grpc.census.internal (is this necessary?) in grpc-census.
    - Created CensusStatsAccessor.java and CensusTracingAccessor.java, which are used to create census ClientInterceptor and ServerStreamTracer.Factory.
    - Everything in grpc-census are package private, except the accessor classes. They only publicly expose ClientInterceptor and ServerStreamTracer.Factory, no Census specific types are exposed.

- Use runtime reflection to load and apply census stats/tracing to channel/server builders, if grpc-census is found in runtime classpath.

- Removed special APIs on AbstractManagedChannelImplBuilder and AbstractServerImplBuilder for overriding census module. They are only used for testing. Now we changed tests to apply Census ClientInterceptor and ServerStreamTracer.Factory just as normal interceptor/stream tracer factory. Test writer is responsible for taking care of the ordering concerns of interceptors and stream tracer factories.
2020-01-13 14:35:29 -08:00