The problem was that one hedge was committed before another had drained
start(). This was not testable because HedgingRunnable checks whether
scheduledHedgingRef is cancelled, which is racy, but there's no way to
deterministically trigger either race.
The same problem couldn't be triggered with retries because only one
attempt will be draining at a time. Retries with cancellation also
couldn't trigger it, for the surprising reason that the noop stream used
in cancel() wasn't considered drained.
This commit marks the noop stream as drained with cancel(), which allows
memory to be garbage collected sooner and exposes the race for tests.
That then showed the stream as hanging, because inFlightSubStreams
wasn't being decremented.
Fixes #9185
This flag is added in the U SDK, which is still under development. Since it's just a numeric constant, we copy the value until it is stable and mark the API as experimental, with appropriate warnings about depending on it from production code.
A follow-up change will be made after SDK finalization to point to the official constant (or otherwise update to match any SDK changes), at which point we can remove the `@ExternalApi` annotation.
See b/274061424
There was recently a failure with the Tomcat test in servlet/jakarta:
```
io.grpc.servlet.jakarta.TomcatInteropTest > pingPong FAILED
java.lang.AssertionError at AbstractInteropTest.java:845
Caused by: io.grpc.StatusRuntimeException at Status.java:539
...
* What went wrong:
Execution failed for task ':grpc-servlet-jakarta:tomcat10Test'.
> There were failing tests. See the report at: file:///home/runner/work/grpc-java/grpc-java/servlet/jakarta/build/reports/tests/tomcat10Test/index.html
```
But we couldn't get more details because servlet/jakarta didn't match
the artifact glob.
LoadWorkerTest.runUnaryBlockingClosedLoop and Http2NettyTest.tlsInfo are
failing every CI run. It appears they are the unfortunate tests run
first, so are slowest to start as classloading proceeds. There are
probably other tests that need adjustment, but fixing these
two gives us some hope of having a green run occasionally.
* removed populating the monitored resource as k8s_container by default for logging; resource detection is delegated to the cloud logging library instead (enabled by default)
* remove kubernetes resource detection logic from observability
Currently the code maintains one LoadStatsManager2 that collects all
stats. The problem with this is that in a federation situation there
will be multiple LrsClients that will be periodically picking up stats
from the manager and sending them to their respective control planes.
This creates a first-come-first-serve situation where the stats get
randomly distributed across the control planes.
This change creates separate LoadStatsManagers dedicated to their own
control planes, thus ensuring no stats will get lost.
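
A minimal sketch of the per-control-plane idea, not the actual xDS code: `PerServerStats`, `StatsStore`, and the server keys are hypothetical stand-ins for the real `LoadStatsManager2` wiring. Each control plane gets its own store, so a report drained for one server cannot consume stats meant for another.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; names do not match the grpc-java sources.
final class PerServerStats {
  /** Stand-in for a stats manager: counts calls until a report drains it. */
  static final class StatsStore {
    private final AtomicLong calls = new AtomicLong();

    void recordCall() {
      calls.incrementAndGet();
    }

    /** Returns the calls recorded since the last snapshot and resets the counter. */
    long snapshotAndReset() {
      return calls.getAndSet(0);
    }
  }

  // One store per control plane, keyed by an assumed server identifier.
  private final Map<String, StatsStore> stores = new ConcurrentHashMap<>();

  StatsStore storeFor(String controlPlane) {
    return stores.computeIfAbsent(controlPlane, k -> new StatsStore());
  }
}
```

With a single shared store, whichever reporter called `snapshotAndReset()` first would take every recorded call; with one store per key, each reporter only ever sees its own.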
xds: Correctly start LRS clients in federation situations
The old code used a single member variable to indicate if load reporting
had already been started by XdsClientImpl. This boolean was used to
avoid starting a LoadReportClient more than once. This works fine with
a single control plane server.
The problem occurs in federation situations where there is more than one
control plane and thus more than one LoadReportClient. Once the first
LoadReportClient is started, the member variable boolean is flipped to
true and no other LoadReportClients would be started.
This change removes the boolean member variable and relies on the fact
that starting an already started LoadReportClient is a no-op.
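
The no-op property this relies on can be sketched as follows. `IdempotentReporter` is a hypothetical stand-in, not the real `LoadReportClient`; it only illustrates why a per-client guard makes a global "already started" flag unnecessary.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a load report client; not the grpc-java class.
final class IdempotentReporter {
  private final AtomicBoolean started = new AtomicBoolean();
  private final AtomicInteger streamsOpened = new AtomicInteger();

  /** Starts reporting; repeated calls after the first do nothing. */
  void start() {
    if (!started.compareAndSet(false, true)) {
      return; // already started, so this call is a no-op
    }
    streamsOpened.incrementAndGet(); // placeholder for opening the report stream
  }

  int streamsOpened() {
    return streamsOpened.get();
  }
}
```

Because each client tracks its own state, a caller can invoke `start()` on every client for every control plane without tracking which ones are already running.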
Provides a server with both a greet service and the health service.
Client has an example of using the health service directly through the unary call
<a href="https://github.com/grpc/grpc-java/blob/master/services/src/main/proto/grpc/health/v1/health.proto">check</a>
to get the current health. It also utilizes the health of the server's greet service
indirectly through the round robin load balancer, which uses the streaming rpc
<strong>watch</strong> (you can see how it is done in
{@link io.grpc.protobuf.services.HealthCheckingLoadBalancerFactory}).
* Fix order dependent test by changing the initializations and comparison so that elapsed time isn't as significant in identifying whether it was the context or call option's duration that was used.
fixes b/271122310
The coveralls task has been silently failing since we migrated to GitHub
Actions, away from Travis-CI:
```
no COVERALLS_REPO_TOKEN environmental variable found
no available CI service
> Task :grpc-all:coveralls
BUILD SUCCESSFUL in 23s
7 actionable tasks: 1 executed, 6 up-to-date
```
We'd rather not deal with private tokens, but the Coveralls GitHub
Action [only supports lcov][1] which makes it unhelpful for Java.
Looking deeper, yep, we [aren't the only ones impacted][2]:
[1]: https://github.com/marketplace/actions/coveralls-github-action
[2]: https://github.com/coverallsapp/github-action/issues/22
* Added s390x platform support
* Adapt to existing platform naming scheme
* Updated s390_64 library whitelist
* Use g++ compiler version 8.x for s390x
* Introduced dedicated Docker container for building s390x artifacts
* Minor fix
---------
Signed-off-by: Dirk Haubenreisser <haubenr@de.ibm.com>
Co-authored-by: Eric Anderson <ejona@google.com>
This PR adds a default custom tag for metrics, irrespective of custom
tags being present in the observability configuration.
OpenCensus by default adds a custom tag
[opencensus_task](https://docs.google.com/document/d/1sWC-XD277cM0PXxAhzJKY2X0Uj2W7bVoSv-jvnA0N8Q/edit?resourcekey=0-l-wqh1fctxZXHCUrvZv2BQ#heading=h.xy85j580eik0)
for metrics, which gets overridden if custom tags are set.
The unique custom tag is required to ensure the uniqueness of the
Timeseries. The format of the default custom tag is:
`java-{PID}@{HOSTNAME}`. If `{PID}` is not available, a random number
will be used.
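
A minimal sketch of building a tag in this format, under stated assumptions: `DefaultTaskTag` and its methods are hypothetical names, the PID is parsed from the `pid@host` string that `RuntimeMXBean.getName()` commonly (but not by specification) returns, and the `localhost` fallback for an unresolvable hostname is an assumption not taken from this change.

```java
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.SecureRandom;

// Hypothetical helper; not the gcp-observability implementation.
final class DefaultTaskTag {
  /** Formats the tag value as java-{PID}@{HOSTNAME}. */
  static String format(String pid, String hostname) {
    return "java-" + pid + "@" + hostname;
  }

  /** Builds the tag for the current JVM. */
  static String create() {
    String hostname;
    try {
      hostname = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      hostname = "localhost"; // assumed fallback, not specified by the source
    }
    return format(pidOrRandom(ManagementFactory.getRuntimeMXBean().getName()), hostname);
  }

  /** Extracts a numeric PID from "pid@host", or returns a random number. */
  static String pidOrRandom(String runtimeName) {
    int at = runtimeName.indexOf('@');
    if (at > 0) {
      String candidate = runtimeName.substring(0, at);
      if (candidate.chars().allMatch(Character::isDigit)) {
        return candidate;
      }
    }
    // PID not available: fall back to a non-negative random number.
    return String.valueOf(new SecureRandom().nextInt(Integer.MAX_VALUE));
  }
}
```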
This commit adds a sleep in `close()` so that metrics and/or traces can be
flushed before closing observability.
Currently the sleep is set to 2 * [Metrics export interval (30 secs)].
This commit adds trace information (TraceId, SpanId and TraceSampled)
fields to LogEntry, when both logging and tracing are enabled in
gcp-observability.
For server-side logs, span information was readily available using
Span.getContext() propagated via `io.grpc.Context`. A similar approach is
not feasible for the client-side architecture.
The client SpanContext, which has all the information required to be added
to logs, is propagated to the logging interceptor via `io.grpc.CallOptions`.
This provides an example of how a client can specify a deadline for an RPC. It also covers how deadlines are propagated to further RPCs a server might make.
Adds an extensive README, a server that exposes channelz and has pauses, and a client that uses multiple channels, also exposes the channelz service, and has a 30-second delay to allow people to run the grpcdebug tool.
Fixit b/259286633
This commit uses [OpenCensus Annotation][] to report message size
[bytes] for inbound/received messages in traces.
The `addMessageEvent` API, which is currently used, expects both uncompressed
and compressed message (optional) sizes to be reported at the same time.
Since decompression for messages happens at a later point in time,
reporting the compressed message size as is and reporting the uncompressed size as
`-1` renders the size as _0 bytes received_ in the cloud tracing front end.
As a workaround, we add _two annotations for each received message_:
* For compressed message size
* For uncompressed message size (when it is available)
This commit also removes `addMessageEvents`, a flag introduced in
PR #9485 to temporarily suppress message events for gcp-observability.
[OpenCensus Annotation]: https://www.javadoc.io/static/io.opencensus/opencensus-api/0.31.0/io/opencensus/trace/Annotation.html
Allows using Android's LocalSocket via a Socket adapter. Such an adapter
isn't generally 100% safe, since some methods may not have any effect,
but we know what methods are called by gRPC's okhttp transport and can
update the adapter or the transport as appropriate.