Commit Graph

6650 Commits

Author SHA1 Message Date
Eric Anderson 04f1cc5845 xds: Make XdsNR.RoutingConfig.empty a constant
The field was made final in 4b52639aa but was soon reverted in 3ebb3e192
because of what I assume was a bad merge conflict resolution. The field
has contained an immutable object since its introduction in d25f5acf1,
so it is pretty likely to remain a constant in the future.
2025-01-30 15:10:12 -08:00
Eric Anderson c506190b0f
xds: Reuse filter interceptors across RPCs
This moves the interceptor creation from the ConfigSelector to the
resource update handling.

The code structure changes will make adding support for filter
lifecycles (for RLQS) a bit easier. The filter lifecycles will allow
filters to share state across interceptors, and constructing all the
interceptors on a single thread will mean filters wouldn't need to be
thread-safe (but their interceptors would be thread-safe).
2025-01-30 12:43:51 -08:00
Eric Anderson 90aefb26e7 core: Propagate authority override from LB exactly once
Setting the authority is only useful when creating a real stream, as
there will be a following pick otherwise. In addition, DelayedStream
will buffer each call to setAuthority() in a list and we don't want that
memory usage. Note that no LBs are using this feature yet, so users
would not have been exposed to the memory use.

We also needed to setAuthority() when the LB selected a subchannel on
the first pick attempt.
2025-01-30 08:08:17 -08:00
Abhishek Agrawal 7153ff8522
netty: Removed 4096 min buffer size (#11856)
* netty: Removed 4096 min buffer size
2025-01-30 12:52:37 +05:30
Eric Anderson b3db8c2489 xds: Allow FaultFilter's interceptor to be reused
This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filter's interceptors can
already be reused.

There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.

Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.
2025-01-29 14:21:53 -08:00
vinodhabib 9e8629914f
examples: Added README files for all missing Examples (#11676) 2025-01-28 13:02:00 +05:30
Larry Safran 87aa6deadf
core:Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful (#11849)
* Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful.
Move cleanup of removed subchannels into a method so it can be called from 2 places in acceptResolvedAddresses.
Since the seek could mean we never looked at the first address, if we go off the end of the index and haven't looked at the all of the addresses then instead of scheduleBackoff() we reset the index and request a connection.
2025-01-24 16:42:56 -08:00
Boddu Harshavardhan 67351c0c53
gcp-observability: Optimize GcpObservabilityTest.enableObservability execution time (#11783) 2025-01-24 16:50:41 +05:30
MV Shiva bf8eb24a30
Update README etc to reference 1.70.0 (#11854) 2025-01-24 15:21:14 +05:30
Kannan J 0f5503ebb1
xds: Include max concurrent request limit in the error status for concurre… (#11845)
Include max concurrent request limit in the error status for concurrent connections limit exceeded
2025-01-23 21:40:21 +05:30
Eric Anderson 495a8906b2 xds: Fix fallback test FakeClock TSAN failure
d65d3942e increased the test speed of
connect_then_mainServerDown_fallbackServerUp by using FakeClock.
However, it introduced a data race because FakeClock is not thread-safe.
This change injects a single thread for gRPC callbacks such that
syncContext is run on a thread under the test's control.

A simpler approach would be to expose syncContext from XdsClientImpl for
testing. However, this test is in a different package and I wanted to
avoid adding a public method.

```
  Read of size 8 at 0x00008dec9d50 by thread T25:
    #0 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Lio/grpc/internal/FakeClock$ScheduledTask;JLjava/util/concurrent/TimeUnit;)V FakeClock.java:140
    #1 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;)Ljava/util/concurrent/ScheduledFuture; FakeClock.java:150
    #2 io.grpc.SynchronizationContext.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/ScheduledExecutorService;)Lio/grpc/SynchronizationContext$ScheduledHandle; SynchronizationContext.java:153
    #3 io.grpc.xds.client.ControlPlaneClient$AdsStream.handleRpcStreamClosed(Lio/grpc/Status;)V ControlPlaneClient.java:491
    #4 io.grpc.xds.client.ControlPlaneClient$AdsStream.lambda$onStatusReceived$0(Lio/grpc/Status;)V ControlPlaneClient.java:429
    #5 io.grpc.xds.client.ControlPlaneClient$AdsStream$$Lambda+0x00000001004a95d0.run()V ??
    #6 io.grpc.SynchronizationContext.drain()V SynchronizationContext.java:96
    #7 io.grpc.SynchronizationContext.execute(Ljava/lang/Runnable;)V SynchronizationContext.java:128
    #8 io.grpc.xds.client.ControlPlaneClient$AdsStream.onStatusReceived(Lio/grpc/Status;)V ControlPlaneClient.java:428
    #9 io.grpc.xds.GrpcXdsTransportFactory$EventHandlerToCallListenerAdapter.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V GrpcXdsTransportFactory.java:149
    #10 io.grpc.PartialForwardingClientCallListener.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V PartialForwardingClientCallListener.java:39
    ...

  Previous write of size 8 at 0x00008dec9d50 by thread T4 (mutexes: write M0, write M1, write M2, write M3):
    #0 io.grpc.internal.FakeClock.forwardTime(JLjava/util/concurrent/TimeUnit;)I FakeClock.java:368
    #1 io.grpc.xds.XdsClientFallbackTest.connect_then_mainServerDown_fallbackServerUp()V XdsClientFallbackTest.java:358
    ...
```
2025-01-22 16:00:00 -08:00
Eric Anderson fc86084df5 xds: Rename grpc.xds.cluster to grpc.lb.backend_service
The name is being changed to allow the value to be used in more metrics
where xds-specifics are awkward.
2025-01-17 17:16:32 -08:00
Eric Anderson a0a42fc8e6 Update README etc to reference 1.69.1 2025-01-17 13:30:22 -08:00
Eric Anderson f299ef5e58 compiler: Prepare for C++ protobuf using string_view
Protobuf is interested in using absl::string_view instead of const
std::string&. Just copy to std::string as the C++17 build isn't yet
operational and that level of performance doesn't matter.

cl/711732759 b/353571051
2025-01-17 10:44:01 -08:00
MV Shiva b44ebce45d
xds: Envoy proto sync to 2024-11-11 (#11816) 2025-01-17 14:58:52 +05:30
Larry Safran 4d8aff72dc
stub: Eliminate invalid test cases where different threads were calling close from the thread writing. (#11822)
* Eliminate invalid test cases where different threads were calling close from the thread writing.
* Remove multi-thread cancel/write test
2025-01-15 10:43:48 -08:00
Eric Anderson 87c7b7a375 interop-testing: Move soak out of AbstractInteropTest
The soak code grew considerably in 6a92a2a22e. Since it isn't a JUnit
test and doesn't resemble the other tests, it doesn't belong in
AbstractInteropTest. AbstractInteropTest has lots of users, including it
being re-compiled for use on Android, so moving it out makes the
remaining code more clear for the more common cases.
2025-01-14 13:28:10 -08:00
zbilun 87b27b1545
interop-testing: fix peer extraction issue in soak test iterations
This PR resolves an issue with peer address extraction in the soak
test. 

In current `TestServiceClient` implementation, the same
`clientCallCapture` atomic is shared across threads, leading to
incorrect peer extraction. This fix ensures that each thread uses a
local variable for capturing the client call.
2025-01-13 17:57:39 -08:00
Larry Safran 228dcf7a01
core: Alternate ipV4 and ipV6 addresses for Happy Eyeballs in PickFirstLeafLoadBalancer (#11624)
* Interweave ipV4 and ipV6 addresses as per gRFC.
2025-01-13 16:38:16 -08:00
Eric Anderson 7162d2d661 xds: Pass grpc.xds.cluster label to tracer
This is in service to gRFC A89. Since the gRFC isn't finalized this
purposefully doesn't really do anything yet. The grpc-opentelemetry
change to use this optional label will be done after the gRFC is merged.
grpc-opentelemetry currently has a hard-coded list (one entry) of labels
that it looks for, and this label will need to be added.

b/356167676
2025-01-13 11:54:36 -08:00
Larry Safran 176f3eed12
xds: Enable Xds Client Fallback by default (#11817) 2025-01-10 15:25:15 -08:00
Eric Anderson d65d3942e6 xds: Increase speed of fallback test
These changes reduce connect_then_mainServerDown_fallbackServerUp test
time from 20 seconds to 5 s by faking time for the the does-no-exist
timer.

XdsClientImpl only uses the TimeProvider for CSDS cache details, so any
implementation should be fine. FakeXdsClient provides an implementation,
so might as well use it as it is one less clock to think about.
2025-01-10 08:28:48 -08:00
Eric Anderson 82403b9bfa util: Increase modtime in AdvancedTlsX509TrustManagerTest
I noticed an old JDK 8u275 failed on the test because the modification
time's resolution was one second. A newer JDK 8u432 worked fine, so it's
not really a problem for me, but increasing the time difference is
cheap. I used two seconds as that's the resolution available on FAT
(which is unlikely to be TMPDIR, even on Windows).
2025-01-10 08:17:12 -08:00
Eric Anderson 70825adce6 Replace jsr305's GuardedBy with Error Prone's
We should avoid jsr305 and error prone's has the same semantics.
2025-01-10 08:16:48 -08:00
Eric Anderson 7b5d0692cc
Replace jsr305's CheckReturnValue with Error Prone's (#11811)
We should avoid jsr305 and error prone's has the same semantics.

Fixes #8687
2025-01-09 13:45:35 -08:00
Albumen Kevin 73721acc0d
Add UnitTest to verify updateTrustCredentials rotate (#11798)
* Add lastUpdateTime to avoid read
2025-01-09 11:53:36 -08:00
Eric Anderson e61b03cb9f netty: Fix getAttributes() data races in NettyClientTransportTest
Since approximately the LBv2 API (the current API) was introduced, gRPC
won't use a transport until it is ready. Long ago, transports could be
used before they were ready and these old tests were not waiting for the
negotiator to complete before starting. We need them to wait for the
handshake to complete to avoid a test-only data race in getAttributes()
noticed by TSAN.

Throwing away data frames in the Noop handshaker is necessary to act
like a normal handshaker; they don't allow data frames to pass until the
handshake is complete. Without the handling, it goes through invalid
code paths in NettyClientHandler where a terminated transport becomes
ready, and a similar data race.

```
  Write of size 4 at 0x00008db31e2c by thread T37:
    #0 io.grpc.netty.NettyClientHandler.handleProtocolNegotiationCompleted(Lio/grpc/Attributes;Lio/grpc/InternalChannelz$Security;)V NettyClientHandler.java:517
    #1 io.grpc.netty.ProtocolNegotiators$GrpcNegotiationHandler.userEventTriggered(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V ProtocolNegotiators.java:937
    #2 io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(Ljava/lang/Object;)V AbstractChannelHandlerContext.java:398
    #3 io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V AbstractChannelHandlerContext.java:376
    #4 io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(Ljava/lang/Object;)Lio/netty/channel/ChannelHandlerContext; AbstractChannelHandlerContext.java:368
    #5 io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.fireProtocolNegotiationEvent(Lio/netty/channel/ChannelHandlerContext;)V ProtocolNegotiators.java:1107
    #6 io.grpc.netty.ProtocolNegotiators$WaitUntilActiveHandler.channelActive(Lio/netty/channel/ChannelHandlerContext;)V ProtocolNegotiators.java:1011
    ...

  Previous read of size 4 at 0x00008db31e2c by thread T4 (mutexes: write M0, write M1, write M2, write M3):
    #0 io.grpc.netty.NettyClientHandler.getAttributes()Lio/grpc/Attributes; NettyClientHandler.java:345
    #1 io.grpc.netty.NettyClientTransport.getAttributes()Lio/grpc/Attributes; NettyClientTransport.java:387
    #2 io.grpc.netty.NettyClientTransport.newStream(Lio/grpc/MethodDescriptor;Lio/grpc/Metadata;Lio/grpc/CallOptions;[Lio/grpc/ClientStreamTracer;)Lio/grpc/internal/ClientStream; NettyClientTransport.java:198
    #3 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;Lio/grpc/Metadata;)V NettyClientTransportTest.java:953
    #4 io.grpc.netty.NettyClientTransportTest.huffmanCodingShouldNotBePerformed()V NettyClientTransportTest.java:631
    ...
```

```
  Read of size 4 at 0x00008f983a3c by thread T4 (mutexes: write M0, write M1):
    #0 io.grpc.netty.NettyClientHandler.getAttributes()Lio/grpc/Attributes; NettyClientHandler.java:345
    #1 io.grpc.netty.NettyClientTransport.getAttributes()Lio/grpc/Attributes; NettyClientTransport.java:387
    #2 io.grpc.netty.NettyClientTransport.newStream(Lio/grpc/MethodDescriptor;Lio/grpc/Metadata;Lio/grpc/CallOptions;[Lio/grpc/ClientStreamTracer;)Lio/grpc/internal/ClientStream; NettyClientTransport.java:198
    #3 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;Lio/grpc/Metadata;)V NettyClientTransportTest.java:973
    #4 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;)V NettyClientTransportTest.java:969
    #5 io.grpc.netty.NettyClientTransportTest.handlerExceptionDuringNegotiatonPropagatesToStatus()V NettyClientTransportTest.java:425
    ...

  Previous write of size 4 at 0x00008f983a3c by thread T56:
    #0 io.grpc.netty.NettyClientHandler$FrameListener.onSettingsRead(Lio/netty/channel/ChannelHandlerContext;Lio/netty/handler/codec/http2/Http2Settings;)V NettyClientHandler.java:960
    ...
```
2025-01-07 10:25:22 -08:00
MV Shiva ae109727d3
Start 1.71.0 development cycle (#11801) 2025-01-07 14:33:09 +05:30
MV Shiva 1edc4d84d4
xds: Parsing xDS Cluster Metadata (#11741) 2025-01-07 10:03:13 +05:30
Larry Safran 4222f77587
xds:Move creating the retry timer in handleRpcStreamClosed to as late as possible and call close() (#11776)
* Move creating the retry timer in handleRpcStreamClosed to as late as possible and call `close` so that the `call` is cancelled.
Also add some debug logging.
2025-01-06 13:09:42 -08:00
Eric Anderson 6c12c2bd24 xds: Remember nonces for unknown types
If the control plane sends a resource type the client doesn't understand
at-the-moment, the control plane will still expect the client to include
the nonce if the client subscribes to the type in the future.

This most easily happens when unsubscribing the last resource of a type.
Which meant 1cf1927d1 was insufficient.
2025-01-06 11:54:35 -08:00
Eric Anderson 4a0f707331 xds: Avoid depending on io.grpc.xds.Internal* classes
Internal* classes should generally be accessors that are used outside of
the package/project. Only one attribute was used outside of xds, so
leave only that one attribute in InternalXdsAttributes. One attribute
was used by the internal.security package, so move the definition to the
same package to reduce the circular dependencies.
2025-01-03 16:01:10 -08:00
Eric Anderson 1cf1927d1a
xds: Preserve nonce when unsubscribing type
This fixes a regression introduced in 19c9b998.

b/374697875
2025-01-03 12:34:47 -08:00
Eric Anderson 9a712c3f77 xds: Make XdsClient.ResourceStore package-private
There's no reason to use the interface outside of
XdsClientImpl/ControlPlaneClient. Since XdsClientImpl implements the
interface directly, its methods are still public. That can be a future
cleanup.
2025-01-03 11:45:55 -08:00
Benjamin Peterson bac8b32043
Fix equality and hashcode of CancelServerStreamCommand. (#11785)
In e036b1b198, CancelServerStreamCommand got another field. But, its hashCode and equals methods were not updated.
2025-01-03 10:42:42 -08:00
Eric Anderson b272f634c1 Disable Gradle Module Metadata resolution
The module metadata in Guava causes the -jre version to be selected even
when you choose the -android version. Gradle did not give any clues that
this was happening, and while
`println(configurations.compileClasspath.resolve())` shows the different
jar in use, most other diagonstics don't. dependencyInsight can show you
this is happening, but only if you know which dependency has a problem
and read Guava's module metadata first to understand the significance of
the results.

You could argue this is a Guava-specific problem. I was able to get
parts of our build working with attributes and resolutionStrategy
configurations mentioned at
https://github.com/google/guava/releases/tag/v32.1.0 , so that only
Guava would be changed. But it was fickle giving poor error messages or
silently swapping back to the -jre version.

Given the weak debuggability, the added complexity, and the lack of
value module metadata is providing us, disabling module metadata for our
entire build seems prudent.

See https://github.com/google/guava/issues/7575
2025-01-03 09:29:31 -08:00
Eric Anderson fdb9a5a94f gae-interop-testing: Remove duplicate repositories
These repositories are already included from the main build.gradle, so
they don't do anything. Much less do they need to be defined twice in
the same file.
2025-01-03 09:29:08 -08:00
Eric Anderson c96e926e65 examples: Remove references to maven-central.storage-download.googleapis.com
As stated [on its main page][index], it isn't officially supported, so
we shouldn't include it in our examples.

[index]: https://maven-central.storage-download.googleapis.com/index.html
2025-01-03 09:28:42 -08:00
Benjamin Peterson 8c261c3f28
Fix typo in deprecated blocking stub javadoc. (#11772) 2024-12-26 13:31:34 -08:00
vinodhabib 5e8abc6774
examples: Updated the attachHeaders to newAttachHeadersInterceptor in HeaderClientInterceptor (#11759) 2024-12-24 12:18:03 +05:30
John Cormie 1126a8e30b
binder: A standard API for pointing resolvers at a different Android User. (#11775) 2024-12-23 13:29:40 -08:00
Eric Anderson 805cad3782 bazel: Restore DoNotCall ErrorProne check
In e08b9db20 we added `@DoNotCall` annotations to some call sites, but
Bazel used an older version of ErrorProne that complained at times it
shouldn't. The minimum version of Bazel we test/support is now Bazel 6,
well past Bazel 3.4+.
2024-12-23 12:45:42 -08:00
Eric Anderson aafab74087 api: Use package-private IgnoreJRERequirement
This avoids the dependency on animalsniffer-annotations. grpc-api, and
particularly grpc-context, are used many low-level places and it is
beneficial for them to be very low dependency. This brings grpc-context
back to zero-dependency.
2024-12-23 12:45:26 -08:00
Alex Panchenko ebe2b48677
api: StatusRuntimeException without stacktrace - Android compatibility (#11072)
This is an alternative to e36f099be9 that avoids the "fillInStaceTrace"
constructor which is only available starting at Android API level 24.
2024-12-21 17:09:57 -08:00
Vindhya Ningegowda 6516c7387e
xds: Remove xds authority label from metric registration (#11760)
* Remove `grpc.xds.authority` label while registering `grpc.xds_client.resources` gauge, until the label value is available to record.
2024-12-20 19:50:09 -08:00
Larry Safran ea8c31c305
Bidi Blocking Stub (#10318) 2024-12-20 16:16:17 -08:00
Kannan J 7b29111cd0
Update README etc to reference 1.69.0 2024-12-20 11:15:54 -08:00
John Cormie 0b2d44098f
Introduce custom NameResolver.Args (#11669)
grpc-binder's upcoming AndroidIntentNameResolver needs to know the target Android user so it can resolve target URIs in the correct place. Unfortunately, Android's built in intent:// URI scheme has no way to specify a user and in fact the android.os.UserHandle object can't reasonably be encoded as a String at all.

We solve this problem by extending NameResolver.Args with the same type-safe and domain-specific Key<T> pattern used by CallOptions, Context and CreateSubchannelArgs. New "custom" arguments could apply to all NameResolvers of a certain URI scheme, to all NameResolvers producing a particular type of java.net.SocketAddress, or even to a specific NameResolver subclass.
2024-12-19 23:32:47 -08:00
Larry Safran ef7c2d59c1
xds: Fix XDS control plane client retry timer backoff duration when connection closes after results are received (#11766)
* Fix retry timer backoff duration.

* Reset stopwatch when we had results on AdsStream rather than change the delay calculation logic.
2024-12-19 14:46:58 -08:00
Eric Anderson 7601afc213 Bump android animalsniffer signature to API 21
This should have been done as part of 2b4f649b
2024-12-19 10:07:29 -08:00