These changes reduce connect_then_mainServerDown_fallbackServerUp test
time from 20 seconds to 5 s by faking time for the the does-no-exist
timer.
XdsClientImpl only uses the TimeProvider for CSDS cache details, so any
implementation should be fine. FakeXdsClient provides an implementation,
so might as well use it as it is one less clock to think about.
I noticed an old JDK 8u275 failed on the test because the modification
time's resolution was one second. A newer JDK 8u432 worked fine, so it's
not really a problem for me, but increasing the time difference is
cheap. I used two seconds as that's the resolution available on FAT
(which is unlikely to be TMPDIR, even on Windows).
Since approximately the LBv2 API (the current API) was introduced, gRPC
won't use a transport until it is ready. Long ago, transports could be
used before they were ready and these old tests were not waiting for the
negotiator to complete before starting. We need them to wait for the
handshake to complete to avoid a test-only data race in getAttributes()
noticed by TSAN.
Throwing away data frames in the Noop handshaker is necessary to act
like a normal handshaker; they don't allow data frames to pass until the
handshake is complete. Without the handling, it goes through invalid
code paths in NettyClientHandler where a terminated transport becomes
ready, and a similar data race.
```
Write of size 4 at 0x00008db31e2c by thread T37:
#0 io.grpc.netty.NettyClientHandler.handleProtocolNegotiationCompleted(Lio/grpc/Attributes;Lio/grpc/InternalChannelz$Security;)V NettyClientHandler.java:517
#1 io.grpc.netty.ProtocolNegotiators$GrpcNegotiationHandler.userEventTriggered(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V ProtocolNegotiators.java:937
#2 io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(Ljava/lang/Object;)V AbstractChannelHandlerContext.java:398
#3 io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V AbstractChannelHandlerContext.java:376
#4 io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(Ljava/lang/Object;)Lio/netty/channel/ChannelHandlerContext; AbstractChannelHandlerContext.java:368
#5 io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.fireProtocolNegotiationEvent(Lio/netty/channel/ChannelHandlerContext;)V ProtocolNegotiators.java:1107
#6 io.grpc.netty.ProtocolNegotiators$WaitUntilActiveHandler.channelActive(Lio/netty/channel/ChannelHandlerContext;)V ProtocolNegotiators.java:1011
...
Previous read of size 4 at 0x00008db31e2c by thread T4 (mutexes: write M0, write M1, write M2, write M3):
#0 io.grpc.netty.NettyClientHandler.getAttributes()Lio/grpc/Attributes; NettyClientHandler.java:345
#1 io.grpc.netty.NettyClientTransport.getAttributes()Lio/grpc/Attributes; NettyClientTransport.java:387
#2 io.grpc.netty.NettyClientTransport.newStream(Lio/grpc/MethodDescriptor;Lio/grpc/Metadata;Lio/grpc/CallOptions;[Lio/grpc/ClientStreamTracer;)Lio/grpc/internal/ClientStream; NettyClientTransport.java:198
#3 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;Lio/grpc/Metadata;)V NettyClientTransportTest.java:953
#4 io.grpc.netty.NettyClientTransportTest.huffmanCodingShouldNotBePerformed()V NettyClientTransportTest.java:631
...
```
```
Read of size 4 at 0x00008f983a3c by thread T4 (mutexes: write M0, write M1):
#0 io.grpc.netty.NettyClientHandler.getAttributes()Lio/grpc/Attributes; NettyClientHandler.java:345
#1 io.grpc.netty.NettyClientTransport.getAttributes()Lio/grpc/Attributes; NettyClientTransport.java:387
#2 io.grpc.netty.NettyClientTransport.newStream(Lio/grpc/MethodDescriptor;Lio/grpc/Metadata;Lio/grpc/CallOptions;[Lio/grpc/ClientStreamTracer;)Lio/grpc/internal/ClientStream; NettyClientTransport.java:198
#3 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;Lio/grpc/Metadata;)V NettyClientTransportTest.java:973
#4 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;)V NettyClientTransportTest.java:969
#5 io.grpc.netty.NettyClientTransportTest.handlerExceptionDuringNegotiatonPropagatesToStatus()V NettyClientTransportTest.java:425
...
Previous write of size 4 at 0x00008f983a3c by thread T56:
#0 io.grpc.netty.NettyClientHandler$FrameListener.onSettingsRead(Lio/netty/channel/ChannelHandlerContext;Lio/netty/handler/codec/http2/Http2Settings;)V NettyClientHandler.java:960
...
```
* Move creating the retry timer in handleRpcStreamClosed to as late as possible and call `close` so that the `call` is cancelled.
Also add some debug logging.
If the control plane sends a resource type the client doesn't understand
at-the-moment, the control plane will still expect the client to include
the nonce if the client subscribes to the type in the future.
This most easily happens when unsubscribing the last resource of a type.
Which meant 1cf1927d1 was insufficient.
Internal* classes should generally be accessors that are used outside of
the package/project. Only one attribute was used outside of xds, so
leave only that one attribute in InternalXdsAttributes. One attribute
was used by the internal.security package, so move the definition to the
same package to reduce the circular dependencies.
There's no reason to use the interface outside of
XdsClientImpl/ControlPlaneClient. Since XdsClientImpl implements the
interface directly, its methods are still public. That can be a future
cleanup.
The module metadata in Guava causes the -jre version to be selected even
when you choose the -android version. Gradle did not give any clues that
this was happening, and while
`println(configurations.compileClasspath.resolve())` shows the different
jar in use, most other diagonstics don't. dependencyInsight can show you
this is happening, but only if you know which dependency has a problem
and read Guava's module metadata first to understand the significance of
the results.
You could argue this is a Guava-specific problem. I was able to get
parts of our build working with attributes and resolutionStrategy
configurations mentioned at
https://github.com/google/guava/releases/tag/v32.1.0 , so that only
Guava would be changed. But it was fickle giving poor error messages or
silently swapping back to the -jre version.
Given the weak debuggability, the added complexity, and the lack of
value module metadata is providing us, disabling module metadata for our
entire build seems prudent.
See https://github.com/google/guava/issues/7575
These repositories are already included from the main build.gradle, so
they don't do anything. Much less do they need to be defined twice in
the same file.
In e08b9db20 we added `@DoNotCall` annotations to some call sites, but
Bazel used an older version of ErrorProne that complained at times it
shouldn't. The minimum version of Bazel we test/support is now Bazel 6,
well past Bazel 3.4+.
This avoids the dependency on animalsniffer-annotations. grpc-api, and
particularly grpc-context, are used many low-level places and it is
beneficial for them to be very low dependency. This brings grpc-context
back to zero-dependency.
grpc-binder's upcoming AndroidIntentNameResolver needs to know the target Android user so it can resolve target URIs in the correct place. Unfortunately, Android's built in intent:// URI scheme has no way to specify a user and in fact the android.os.UserHandle object can't reasonably be encoded as a String at all.
We solve this problem by extending NameResolver.Args with the same type-safe and domain-specific Key<T> pattern used by CallOptions, Context and CreateSubchannelArgs. New "custom" arguments could apply to all NameResolvers of a certain URI scheme, to all NameResolvers producing a particular type of java.net.SocketAddress, or even to a specific NameResolver subclass.
In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.
But while it was broken violations crept in. Most violations weren't
too important and we're not surprised went unnoticed. For example, Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.
One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment. The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.
[PR11066]: https://github.com/grpc/grpc-java/pull/11066
StructOrError is a more generic API, but we have StatusOr now so we
don't want new usages of StructOrError. Moving StructOrError out of
io.grpc.xds.client will make it easier to delete StructOrError once
we've migrated to StatusOr in the future.
TRANSPORT_SOCKET_NAME_TLS should also move, but it wasn't immediately
clear to me where it should go.
ObjectPool is our standard solution for dealing with the
sometimes-shutdown resources. This was implemented by a contributor not
familiar with regular tools.
There are wider changes that can be made here, but I chose to just do a
smaller change because this class is used by GrpclbNameResolver.
The channel log is shared by many components and is poorly suited to
the noise of per-RPC events. This commit restricts RLS usage of the
logger to no more frequent than cache entry events. This may still be
too frequent, but should substantially improve the signal-to-noise and
we can do further rework as needed.
Many of the log entries were poor because they lacked enough context.
They weren't even clear they were from RLS. The cache entry events now
regularly include the request key in the logs, allowing you to follow
events for specific keys. I would have preferred using the hash code,
but NumberFormat is annoying and toString() may be acceptable given its
convenience.
This commit reverts much of eba699ad. Those logs have not proven to be
helpful as they produce more output than can be reasonably stored.
This was noticed because of a CallOptionsTest flake that had a
surprising error:
```
expected : 59.983387319
but was : 59.983387319
outside tolerance in seconds: 0.01
```