Commit Graph

6837 Commits

Author SHA1 Message Date
MV Shiva 12197065fe
xds: xDS-based HTTP CONNECT configuration (#11861) 2025-03-06 13:40:18 +05:30
MV Shiva c340f4a2f3
rls: allow maxAge to exceed 5m if staleAge is set (#11931) 2025-03-04 10:02:03 +05:30
Sergii Tkachenko 1a2285b527
xds: ensure server interceptors are created in a sync context (#11930)
`XdsServerWrapper#generatePerRouteInterceptors` was always intended
to be executed within a sync context. This PR ensures that by calling
`syncContext.throwIfNotInThisSynchronizationContext()`.

This change is needed for upcoming xDS filter state retention because
the new tests in XdsServerWrapperTest flake with this NPE:

> `Cannot invoke "io.grpc.xds.client.XdsClient$ResourceWatcher.onChanged(io.grpc.xds.client.XdsClient$ResourceUpdate)" because "this.ldsWatcher" is null`
2025-03-03 14:28:36 -08:00
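The check added in this commit makes code that must run inside a synchronization context fail fast when it is called from anywhere else. Below is a minimal JDK-only sketch of that idea. `SketchSyncContext` is a hypothetical, simplified stand-in, not `io.grpc.SynchronizationContext`. It records which thread is currently draining queued tasks and rejects callers on any other thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified stand-in for io.grpc.SynchronizationContext, used only
// to illustrate what a throwIfNotInThisSynchronizationContext() check guards against.
final class SketchSyncContext {
  private final Queue<Runnable> queue = new ArrayDeque<>();
  private Thread drainingThread; // thread currently running queued tasks, or null

  // Queues the task; the first caller becomes the drainer and runs tasks in order.
  void execute(Runnable task) {
    synchronized (this) {
      queue.add(task);
      if (drainingThread != null) {
        return; // someone (possibly this thread, re-entrantly) is already draining
      }
      drainingThread = Thread.currentThread();
    }
    while (true) {
      Runnable next;
      synchronized (this) {
        next = queue.poll();
        if (next == null) {
          drainingThread = null;
          return;
        }
      }
      try {
        next.run();
      } catch (RuntimeException e) {
        // Keep draining after a failing task; this sketch just prints the error.
        e.printStackTrace();
      }
    }
  }

  // Fails fast when called from any thread that is not currently draining tasks.
  void throwIfNotInThisSynchronizationContext() {
    synchronized (this) {
      if (drainingThread != Thread.currentThread()) {
        throw new IllegalStateException("Not called from the SynchronizationContext");
      }
    }
  }
}
```

A method that is meant to run only in the context, such as one that generates per-route interceptors, would call the check as its first statement. A misuse then surfaces as an immediate `IllegalStateException` instead of a later, harder-to-trace NPE.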
Kannan J cdab410b81
netty: Per-rpc authority verification against peer cert subject names (#11724)
Per-RPC verification of the authority, specified via call options or set by the LB API, against the peer cert's subject names.
2025-02-24 20:28:11 +05:30
Eric Anderson 57124d6b29 Use acceptResolvedAddresses() in easy cases
We want to move away from handleResolvedAddresses(). These are "easy" in
that they need no logic. LBs extending ForwardingLoadBalancer had the
method duplicated from handleResolvedAddresses() and swapped away from
`super` because ForwardingLoadBalancer only forwards
handleResolvedAddresses() reliably today. Duplicating small methods was
less bug-prone than dealing with ForwardingLoadBalancer.
2025-02-20 21:25:55 -08:00
Eric Anderson 110c1ff0d6 xds: Use acceptResolvedAddresses() for PriorityLb children
PriorityLb should propagate config problems up to the name resolver so
it can refresh.
2025-02-20 16:35:54 -08:00
Eric Anderson f207be39a9 util: Remove GracefulSwitchLb.switchTo()
It was deprecated in 85e0a01ec, so has been deprecated for six
releases/over six months.
2025-02-20 16:06:37 -08:00
Daniel Liu 892144dcac
xds: explicitly set request hash key for the ring hash LB policy
Implements [gRFC A76: explicitly setting the request hash key for the
ring hash LB policy][A76]
* Explicitly setting the request hash key is guarded by the
  `GRPC_EXPERIMENTAL_RING_HASH_SET_REQUEST_HASH_KEY` environment
  variable until the API is stabilized.

Tested:
* Verified end-to-end by spinning up multiple gRPC servers and a gRPC
  client that injects a custom service (load balancing) config with
  `ring_hash_experimental` and a custom `request_hash_header` (with
  NO associated value in the metadata headers) which generates a random
  hash for each request to the ring hash LB. Verified picks/RPCs are
  split evenly/uniformly across all backends.
* Ran affected unit tests with thread sanitizer and 1000 iterations to
  check for data races.

[A76]: https://github.com/grpc/proposal/blob/master/A76-ring-hash-improvements.md#explicitly-setting-the-request-hash-key
2025-02-19 20:25:33 -08:00
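The test description above depends on one behavior. When the configured request hash header has no value, each request gets a random hash, so picks spread across the ring. The following is a minimal sketch of that choice, assuming two things. First, multiple header values are joined with a comma. Second, a plain FNV-1a 64-bit hash stands in for the xxHash64 that gRPC's ring hash uses. `RequestHashSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of choosing a ring hash request hash from a configured
// header. FNV-1a is a stand-in for xxHash64; the comma join is an assumption.
final class RequestHashSketch {
  private static final long FNV_OFFSET = 0xcbf29ce484222325L;
  private static final long FNV_PRIME = 0x100000001b3L;

  static long fnv1a64(String s) {
    long h = FNV_OFFSET;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff);
      h *= FNV_PRIME;
    }
    return h;
  }

  // headerValues: values of the configured header on this request; null or empty if absent.
  static long requestHash(List<String> headerValues) {
    if (headerValues == null || headerValues.isEmpty()) {
      // No value for the header: a random hash sends the request to a random ring entry.
      return ThreadLocalRandom.current().nextLong();
    }
    // Same header value(s) always produce the same hash, i.e. the same backend.
    return fnv1a64(String.join(",", headerValues));
  }
}
```

With a header present, requests that carry the same value stick to one backend. With it absent, as in the end-to-end test described in the commit, picks should be roughly uniform across backends.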
Riya Mehta 68d79b5130
s2a: Use protos published under com.google.s2a.proto.v2. (#11908) 2025-02-19 16:59:50 -08:00
Kannan J 60f6ea7b8e
Upgrade gradle and gradle plugin versions. (#11906)
Upgrading to Gradle 8.11.
Gradle 8.12 requires newer versions of Windows (gradle/gradle#31939) that we can look into later.
2025-02-19 17:25:54 +05:30
Sergii Tkachenko 2b87b01651
xds: Change how xDS filters are created by introducing Filter.Provider (#11883)
This is the first step towards supporting filter state retention in
Java. The mechanism will be similar to the one described in
[A83](https://github.com/grpc/proposal/blob/master/A83-xds-gcp-authn-filter.md#filter-call-credentials-cache)
for C-core, and will serve the same purpose. However, the
implementation details are very different due to the different nature
of xDS HTTP filter support in C-core and Java.

In Java, xDS HTTP filters are backed by classes implementing
`io.grpc.xds.Filter`, from here just called "Filters". To support
Filter state retention (next PR), Java's xDS implementation must be
able to create unique Filter instances:
- Per HCM
  `envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager`
- Per filter name as specified in
  `envoy.extensions.filters.network.http_connection_manager.v3.HttpFilter.name`

This PR **does not** implement Filter state retention, but lays the
groundwork for it by changing how filters are registered and
instantiated. To achieve this, all existing Filter classes had to be
updated to the new instantiation mechanism described below.

Prior to this PR, Filters had no lifecycle. FilterRegistry
provided singleton instances for a given typeUrl. This PR introduces
a new interface `Filter.Provider`, which instantiates Filter classes.
All functionality that doesn't need an instance of a Filter is moved
to the Filter.Provider. This includes parsing filter config proto
into FilterConfig and determining the filter kind
(client-side, server-side, or both).

This PR is limited to refactoring, and there are no changes to the
existing behavior. Note that all Filter Providers still return
singleton Filter instances. However, with this PR, it is now possible
to create Providers that return a new Filter instance each time
`newInstance` is called.
2025-02-18 10:47:01 -08:00
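The split described above can be modeled in a few lines. Work that needs no instance (config parsing, client/server kind) lives on the provider. `newInstance()` decides whether callers share one Filter or get a fresh one. The following is a hypothetical, heavily simplified sketch. Only `Filter.Provider` and `newInstance` come from the commit message. `SketchFilter`, the provider classes, the registry, and the type URLs are illustrative assumptions, not the `io.grpc.xds` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the Filter / Filter.Provider split.
// Not the io.grpc.xds API; names other than Provider/newInstance are assumptions.
interface SketchFilter {
  interface Provider {
    String typeUrl();                     // config type this provider handles
    boolean isClientFilter();             // filter kind, known without an instance
    boolean isServerFilter();
    String parseFilterConfig(String raw); // config parsing, no instance needed
    SketchFilter newInstance();           // shared or fresh Filter instance
  }
}

// Mirrors the providers in this PR: newInstance() always returns one shared Filter.
final class SharedInstanceProvider implements SketchFilter.Provider {
  static final String TYPE_URL = "type.example/SharedFilter"; // hypothetical
  private static final SketchFilter INSTANCE = new SketchFilter() {};

  public String typeUrl() { return TYPE_URL; }
  public boolean isClientFilter() { return true; }
  public boolean isServerFilter() { return true; }
  public String parseFilterConfig(String raw) { return raw.trim(); }
  public SketchFilter newInstance() { return INSTANCE; }
}

// What the PR makes possible: a fresh Filter per call, e.g. one per
// (HCM, filter name) pair once state retention lands.
final class FreshInstanceProvider implements SketchFilter.Provider {
  static final String TYPE_URL = "type.example/StatefulFilter"; // hypothetical

  public String typeUrl() { return TYPE_URL; }
  public boolean isClientFilter() { return true; }
  public boolean isServerFilter() { return false; }
  public String parseFilterConfig(String raw) { return raw.trim(); }
  public SketchFilter newInstance() { return new SketchFilter() {}; }
}

// The registry maps type URLs to providers instead of to singleton filters.
final class SketchFilterRegistry {
  private final Map<String, SketchFilter.Provider> providers = new HashMap<>();

  void register(SketchFilter.Provider provider) {
    providers.put(provider.typeUrl(), provider);
  }

  SketchFilter.Provider get(String typeUrl) {
    return providers.get(typeUrl);
  }
}
```

Callers that used to receive a singleton Filter from the registry now receive a provider. Switching a filter from shared to per-instance state then changes only that filter's provider, not the call sites.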
Eric Anderson 713607056e util: Use acceptResolvedAddresses() for MultiChildLb children
A failing Status from acceptResolvedAddresses means something is wrong
with the config, but parts of the config may still have been applied.
Thus there are now two possible flows: errors that should prevent
updateOverallBalancingState() and errors that should have no effect
other than the return code. To manage that, MultiChildLb must always be
responsible for calling updateOverallBalancingState().
acceptResolvedAddressesInternal() was inlined to make that error
processing easier. No existing usages actually needed to have logic
between updating the children and regenerating the picker.

RingHashLb already was verifying that the address list was not empty, so
the short-circuiting when acceptResolvedAddressesInternal() returned an
error was impossible to trigger. WrrLb's updateWeightTask() calls the
last picker, so it can run before acceptResolvedAddressesInternal(); the
only part that matters is re-creating the weightUpdateTimer.
2025-02-18 07:33:49 -08:00
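The two error flows described above can be sketched as follows. A validation failure rejects the update before anything is applied, so the parent publishes nothing new. A child failure still leaves other children updated, so the parent always regenerates its state and only reports the error through the return value. This is a hypothetical, JDK-only sketch, not `MultiChildLoadBalancer`. A `null` return stands in for `Status.OK`, and `MultiChildSketch`, `Child`, and `acceptConfig` are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the two error flows; null stands in for Status.OK.
final class MultiChildSketch {
  interface Child {
    String acceptConfig(String config); // null on success, else an error description
  }

  private final Map<String, Child> children = new LinkedHashMap<>();
  int balancingStateUpdates; // counts calls to updateOverallBalancingState()

  void addChild(String name, Child child) {
    children.put(name, child);
  }

  String acceptResolvedAddresses(Map<String, String> configs) {
    if (configs.isEmpty()) {
      // Flow 1: nothing was applied, so skip updateOverallBalancingState().
      return "no addresses";
    }
    String firstError = null;
    for (Map.Entry<String, Child> e : children.entrySet()) {
      String error = e.getValue().acceptConfig(configs.get(e.getKey()));
      if (error != null && firstError == null) {
        firstError = error;
      }
    }
    // Flow 2: parts of the config may have been applied, so always regenerate
    // the picker; the error only affects the return code.
    updateOverallBalancingState();
    return firstError;
  }

  private void updateOverallBalancingState() {
    balancingStateUpdates++;
  }
}
```

Keeping the call to `updateOverallBalancingState()` inside the parent, rather than in each subclass, is what lets one method distinguish the two flows.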
Kannan J a132123c93
Start 1.72.0 development cycle (#11907) 2025-02-18 19:46:02 +05:30
Naveen Prasanna V 16edf7ac4e
Examples: Updated HelloWorldServer to use Executor (#11850) 2025-02-18 14:40:18 +05:30
Eric Anderson 16d26726cf
s2a: Don't allow S2AStub to be set
S2AStub is an internal API and shouldn't be used outside of s2a. It is
still available for tests.

IntegrationTest was moved to io.grpc.s2a. It uses an io.grpc.s2a class,
so it shouldn't be in internal.handler.
2025-02-14 15:47:19 -08:00
Eric Anderson 9e54e8e5e9 servlet: Provide Gradle a filter version number
The version number is simply a unique string per version.
2025-02-14 15:45:44 -08:00
Larry Safran c1d703546a
okhttp:Use a locally specified value instead of Segment.SIZE in okhttp (#11899)
Switched to using 8192, which is the current value of Segment.SIZE, and added a test that checks that the two are equal.

The reason for doing this is that Segment.SIZE is Kotlin `internal`, so it shouldn't be used outside of its module.
2025-02-14 14:46:54 -08:00
Eric Anderson 57af63ad0a kokoro: Increase gradle mem in android-interop
To try to address failures when building android-interop-testing:
```
The Daemon will expire after the build after running out of JVM heap space.
The project memory settings are likely not configured or are configured to an insufficient value.
The daemon will restart for the next build, which may increase subsequent build times.
These settings can be adjusted by setting 'org.gradle.jvmargs' in 'gradle.properties'.
The currently configured max heap space is '512 MiB' and the configured max metaspace is '384 MiB'.
...
Exception in thread "Daemon client event forwarder" java.lang.OutOfMemoryError: Java heap space
...
> Task :grpc-android-interop-testing:mergeDexDebug FAILED
ERROR:D8: java.lang.OutOfMemoryError: Java heap space
com.android.builder.dexing.DexArchiveMergerException: Error while merging dex archives:
```
2025-02-14 13:20:05 -08:00
Riya Mehta a5347b2bc4
s2a: inject Optional<AccessTokenManager> in tests 2025-02-14 12:55:42 -08:00
Larry Safran 41dd0c6d73
xds:Cleanup to reduce test flakiness (#11895)
* don't process resourceDoesNotExist for watchers that have been cancelled.

* Change test to use an ArgumentMatcher instead of expecting that only the final result will be sent, since, depending on timing, configs may be sent for clusters being removed, with their entries as errors.
2025-02-14 10:23:54 -08:00
Alex Panchenko 5a7f350537
optimize number of buffer allocations (#11879)
This currently improves two flows:

1. A known-length message whose length is greater than 1 MB. Previously
the first buffer was 1 MB, followed by many 4096-byte buffers (from
CodedOutputStream); now subsequent buffers are also up to 1 MB.

2. With compression, the first write is always a 10-byte buffer (the
gzip header), but it is worth allocating more space.
2025-02-14 05:59:21 -08:00
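The effect of the first change can be seen with simple arithmetic. The hypothetical `BufferPlanSketch` below lists the buffer sizes needed for a known-length message under the old policy (first buffer up to 1 MB, then 4096-byte buffers) and the new one (every buffer up to 1 MB). "1 MB" is modeled as 1,048,576 bytes, an assumption. The planner is illustrative and is not gRPC's allocator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical arithmetic sketch comparing the old and new buffer sizing for a
// known-length message. Not gRPC code; 1 MB is modeled as 1 MiB.
final class BufferPlanSketch {
  static final int MAX_BUFFER = 1024 * 1024;
  static final int SMALL_BUFFER = 4096; // size of the old follow-up buffers

  // Returns the size of each buffer allocated to hold knownLength bytes.
  static List<Integer> plan(int knownLength, boolean largeFollowUps) {
    List<Integer> sizes = new ArrayList<>();
    int remaining = knownLength;
    boolean first = true;
    while (remaining > 0) {
      int cap = (first || largeFollowUps) ? MAX_BUFFER : SMALL_BUFFER;
      int size = Math.min(remaining, cap);
      sizes.add(size);
      remaining -= size;
      first = false;
    }
    return sizes;
  }
}
```

For a message of 3 MiB + 5 bytes, the old policy allocates 514 buffers: one of 1 MiB, then 513 of at most 4096 bytes. The new policy allocates 4.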
MV Shiva 7585b1607d
core: remember last pick status in no real stream (#11851) 2025-02-14 11:38:06 +05:30
Eric Anderson 122b683717 Upgrade netty-tcnative to 2.0.70 2025-02-13 12:41:56 -08:00
Larry Safran 764a4e3f08
xds: Cleanup by moving methods in XdsDependencyManager ahead of classes (#11890)
* Move private methods ahead of classes
2025-02-11 14:34:46 -08:00
Larry Safran ade2dd2038
xds: Change XdsClusterConfig to have children field instead of endpoint (#11888)
* Change XdsConfig to match the spec, with a `children` object holding either a list of leaf cluster names or an `EdsUpdate`. Removed intermediate aggregate nodes from `XdsConfig.clusters`.
2025-02-11 12:38:52 -08:00
Kannan J fc8571a0e5
Version upgrades (#11874) 2025-02-12 01:08:46 +05:30
Naveen Prasanna V 302342cfce
core: logging the error message when onClose() itself fails (#11880) 2025-02-11 11:38:07 +05:30
Benjamin Peterson dc316f7fd9 Add missing frame release to Http2ClientStreamTransportState.
If a data frame is received before headers, processing of the frame is abandoned. The frame must be released in that case.
2025-02-10 21:40:05 -08:00
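The bug pattern behind this fix is common with reference-counted buffers: an early-return error path forgets to release a frame it owns. The following JDK-only sketch shows the corrected shape. `FrameReleaseSketch` and its `Frame` class are hypothetical stand-ins for Netty's reference-counted frames and the real transport state. For simplicity, data is consumed immediately here; real code may instead transfer ownership to a deframer.

```java
// Hypothetical sketch: a reference-counted frame must be released on every path,
// including the early return taken when data arrives before headers.
final class FrameReleaseSketch {
  static final class Frame {
    int refCnt = 1;

    void release() {
      if (refCnt <= 0) {
        throw new IllegalStateException("already released");
      }
      refCnt--;
    }
  }

  private boolean headersReceived;
  private int bytesDelivered;

  void onHeaders() {
    headersReceived = true;
  }

  void onData(Frame frame, int length) {
    if (!headersReceived) {
      // Processing is abandoned, but the frame still holds a reference: release it.
      frame.release();
      return;
    }
    // In this sketch the data is consumed right away, so the frame is released here.
    bytesDelivered += length;
    frame.release();
  }
}
```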
Sergii Tkachenko bd6af59221
xds: improve code readability of server FilterChain parsing
- Improve code flow and variable names
- Reduce nesting
- Add comments between logical blocks
- Add comments explaining some xDS/gRPC nuances
2025-02-10 17:14:07 -08:00
Larry Safran 67fc2e156a
Add new classes for eliminating xds config tears (#11740)
* Framework definition to support A74
2025-02-07 16:33:17 -08:00
MV Shiva 90b1c4fe94
protobuf: Stabilize marshallerWithRecursionLimit (#11884) 2025-02-07 14:16:13 +05:30
Abhishek Agrawal 44e92e2c2c
core: updates the backoff range as per the A6 redefinition (#11858)
* core: updates the backoff range being used from [0, 1] to [0.8, 1.2] as per the A6 redefinition

* adds a flag for experimental jitter

* xds: Allow FaultFilter's interceptor to be reused

This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filters' interceptors can
already be reused.

There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.

Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.

* netty: Removed 4096 min buffer size (#11856)

* netty: Removed 4096 min buffer size

* turns the flag into a var for better efficiency

---------

Co-authored-by: Eric Anderson <ejona@google.com>
2025-02-05 14:18:20 -08:00
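The jitter change above can be illustrated with a small backoff model. Each delay is the current base backoff scaled by a factor drawn uniformly from a range. `[0, 1)` models the old behavior and `[0.8, 1.2)` the A6 redefinition, and the base grows by a multiplier up to a cap. This is a hypothetical sketch. `JitteredBackoffSketch` and its parameters are illustrative assumptions and do not match gRPC's configuration or the experimental flag's name.

```java
import java.util.Random;

// Hypothetical sketch: exponential backoff with a jitter factor drawn uniformly
// from [jitterLow, jitterHigh). Names and values are assumptions, not gRPC's.
final class JitteredBackoffSketch {
  private final long maxBackoffNanos;
  private final double multiplier;
  private final double jitterLow;
  private final double jitterHigh;
  private final Random random;
  private long currentBackoffNanos;

  JitteredBackoffSketch(long initialBackoffNanos, long maxBackoffNanos,
      double multiplier, double jitterLow, double jitterHigh, Random random) {
    this.currentBackoffNanos = initialBackoffNanos;
    this.maxBackoffNanos = maxBackoffNanos;
    this.multiplier = multiplier;
    this.jitterLow = jitterLow;
    this.jitterHigh = jitterHigh;
    this.random = random;
  }

  // Returns the next jittered delay and grows the base backoff, capped at the max.
  long nextDelayNanos() {
    double jitter = jitterLow + (jitterHigh - jitterLow) * random.nextDouble();
    long delay = (long) (currentBackoffNanos * jitter);
    currentBackoffNanos =
        (long) Math.min(currentBackoffNanos * multiplier, (double) maxBackoffNanos);
    return delay;
  }
}
```

With the old range, a retry could fire almost immediately (a factor near 0). With `[0.8, 1.2)`, every delay stays within 20% of the nominal backoff.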
Alex Panchenko 3142928fa3
use gradle java-platform in grpc-bom; Fixes #5530 (#11875)
* use gradle java-platform in grpc-bom; Fixes #5530

* fix withXml

* explicitly exclude grpc-compiler
2025-02-05 11:10:39 -08:00
Eric Anderson 199a7ea3e8
xds: Improve XdsNR's selectConfig() variable handling
The variables from the do-while are no longer initialized, which lets the
compiler verify that the loop sets each one. Unnecessary comparisons to
null are also removed, which is more obvious now that the variables are
never set to null. Added a minor optimization of computing the RPC's path
once instead of once for each route. The variable declarations were also
sorted to match their initialization order.

This does fix an unlikely bug where, if the old code successfully
matched a route but failed to retain the cluster, then when trying a
second time, if the route was _not_ matched, it would reuse the prior
route and thus infinite-loop failing to retain that same cluster.

It also adds a missing cast to unsigned long for a uint32 weight. The old
code would detect if the _sum_ was negative, but a single weight using all
32 bits would have been read as negative and never selected.
2025-02-05 10:37:22 -08:00
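The unsigned-weight bug described above is easy to reproduce. A uint32 weight of 2^31 or more arrives in a Java `int` as a negative number, and summing it as-is makes its slot unreachable. The following is a minimal weighted-selection sketch that widens each weight with `Integer.toUnsignedLong()`. `UnsignedWeightSketch` and its methods are hypothetical names, not the XdsNameResolver code.

```java
import java.util.List;

// Hypothetical sketch of weighted selection over uint32 weights stored in Java ints.
// Integer.toUnsignedLong() keeps a weight such as 0x80000000 positive.
final class UnsignedWeightSketch {
  static long totalWeight(List<Integer> weights) {
    long sum = 0;
    for (int w : weights) {
      sum += Integer.toUnsignedLong(w);
    }
    return sum;
  }

  // randomValue must be in [0, totalWeight(weights)).
  static int select(List<Integer> weights, long randomValue) {
    long accumulated = 0;
    for (int i = 0; i < weights.size(); i++) {
      accumulated += Integer.toUnsignedLong(weights.get(i));
      if (randomValue < accumulated) {
        return i;
      }
    }
    throw new IllegalArgumentException("randomValue out of range");
  }
}
```

With weights `[1, 0x80000000]`, the total is 2147483649 and the second entry covers every value from 1 up. Without the unsigned conversion, the second weight would be added as -2147483648 and could never be picked.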
Eric Anderson ea3f644eef
Replace Kokoro ARM build with GitHub Actions
The Kokoro aarch64 build runs on x86 with an emulator, and has always
been flaky due to the slow execution speed. At present it is continually
failing due to deadline-exceeded errors. GitHub Actions is running on aarch64
hardware, so is much faster (4 minutes vs 30 minutes, without including
the speedup from GitHub Action's caching).
2025-02-04 10:51:13 -08:00
Alex Panchenko 4a10a38166
servlet: remove 4096 min buffer size
Similar to 7153ff8
2025-02-04 10:49:30 -08:00
Eric Anderson b1bc0a9d24 alts: Add ClientCall support to AltsContextUtil
This adds a createFrom(Attributes) to mirror the check(Attributes) added
in ba8ab79. It also adds conveniences for ClientCall for both
createFrom() and check(). This allows getting peer information from
ClientCall and CallCredentials.RequestInfo, as was already available
from ServerCall.

The tests were reworked to test the Attribute-based methods and then
only basic tests for client/server.

Fixes #11042
2025-01-30 16:10:19 -08:00
Eric Anderson 04f1cc5845 xds: Make XdsNR.RoutingConfig.empty a constant
The field was made final in 4b52639aa but was soon reverted in 3ebb3e192
because of what I assume was a bad merge conflict resolution. The field
has contained an immutable object since its introduction in d25f5acf1,
so it is pretty likely to remain a constant in the future.
2025-01-30 15:10:12 -08:00
Eric Anderson c506190b0f
xds: Reuse filter interceptors across RPCs
This moves the interceptor creation from the ConfigSelector to the
resource update handling.

The code structure changes will make adding support for filter
lifecycles (for RLQS) a bit easier. The filter lifecycles will allow
filters to share state across interceptors, and constructing all the
interceptors on a single thread will mean filters wouldn't need to be
thread-safe (but their interceptors would be thread-safe).
2025-01-30 12:43:51 -08:00
Eric Anderson 90aefb26e7 core: Propagate authority override from LB exactly once
Setting the authority is only useful when creating a real stream, as
there will be a following pick otherwise. In addition, DelayedStream
will buffer each call to setAuthority() in a list and we don't want that
memory usage. Note that no LBs are using this feature yet, so users
would not have been exposed to the memory use.

We also needed to setAuthority() when the LB selected a subchannel on
the first pick attempt.
2025-01-30 08:08:17 -08:00
Abhishek Agrawal 7153ff8522
netty: Removed 4096 min buffer size (#11856)
* netty: Removed 4096 min buffer size
2025-01-30 12:52:37 +05:30
Eric Anderson b3db8c2489 xds: Allow FaultFilter's interceptor to be reused
This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filters' interceptors can
already be reused.

There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.

Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.
2025-01-29 14:21:53 -08:00
vinodhabib 9e8629914f
examples: Added README files for all missing Examples (#11676) 2025-01-28 13:02:00 +05:30
Larry Safran 87aa6deadf
core:Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful (#11849)
* Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful.
Move cleanup of removed subchannels into a method so it can be called from 2 places in acceptResolvedAddresses.
Since the seek could mean we never looked at the first address, if we go off the end of the index and haven't looked at all of the addresses, then instead of calling scheduleBackoff() we reset the index and request a connection.
2025-01-24 16:42:56 -08:00
Boddu Harshavardhan 67351c0c53
gcp-observability: Optimize GcpObservabilityTest.enableObservability execution time (#11783) 2025-01-24 16:50:41 +05:30
MV Shiva bf8eb24a30
Update README etc to reference 1.70.0 (#11854) 2025-01-24 15:21:14 +05:30
Kannan J 0f5503ebb1
xds: Include max concurrent request limit in the error status for concurre… (#11845)
Include max concurrent request limit in the error status for concurrent connections limit exceeded
2025-01-23 21:40:21 +05:30
Eric Anderson 495a8906b2 xds: Fix fallback test FakeClock TSAN failure
d65d3942e increased the test speed of
connect_then_mainServerDown_fallbackServerUp by using FakeClock.
However, it introduced a data race because FakeClock is not thread-safe.
This change injects a single thread for gRPC callbacks such that
syncContext is run on a thread under the test's control.

A simpler approach would be to expose syncContext from XdsClientImpl for
testing. However, this test is in a different package and I wanted to
avoid adding a public method.

```
  Read of size 8 at 0x00008dec9d50 by thread T25:
    #0 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Lio/grpc/internal/FakeClock$ScheduledTask;JLjava/util/concurrent/TimeUnit;)V FakeClock.java:140
    #1 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;)Ljava/util/concurrent/ScheduledFuture; FakeClock.java:150
    #2 io.grpc.SynchronizationContext.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/ScheduledExecutorService;)Lio/grpc/SynchronizationContext$ScheduledHandle; SynchronizationContext.java:153
    #3 io.grpc.xds.client.ControlPlaneClient$AdsStream.handleRpcStreamClosed(Lio/grpc/Status;)V ControlPlaneClient.java:491
    #4 io.grpc.xds.client.ControlPlaneClient$AdsStream.lambda$onStatusReceived$0(Lio/grpc/Status;)V ControlPlaneClient.java:429
    #5 io.grpc.xds.client.ControlPlaneClient$AdsStream$$Lambda+0x00000001004a95d0.run()V ??
    #6 io.grpc.SynchronizationContext.drain()V SynchronizationContext.java:96
    #7 io.grpc.SynchronizationContext.execute(Ljava/lang/Runnable;)V SynchronizationContext.java:128
    #8 io.grpc.xds.client.ControlPlaneClient$AdsStream.onStatusReceived(Lio/grpc/Status;)V ControlPlaneClient.java:428
    #9 io.grpc.xds.GrpcXdsTransportFactory$EventHandlerToCallListenerAdapter.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V GrpcXdsTransportFactory.java:149
    #10 io.grpc.PartialForwardingClientCallListener.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V PartialForwardingClientCallListener.java:39
    ...

  Previous write of size 8 at 0x00008dec9d50 by thread T4 (mutexes: write M0, write M1, write M2, write M3):
    #0 io.grpc.internal.FakeClock.forwardTime(JLjava/util/concurrent/TimeUnit;)I FakeClock.java:368
    #1 io.grpc.xds.XdsClientFallbackTest.connect_then_mainServerDown_fallbackServerUp()V XdsClientFallbackTest.java:358
    ...
```
2025-01-22 16:00:00 -08:00
Eric Anderson fc86084df5 xds: Rename grpc.xds.cluster to grpc.lb.backend_service
The name is being changed to allow the value to be used in more metrics
where xds-specifics are awkward.
2025-01-17 17:16:32 -08:00
Eric Anderson a0a42fc8e6 Update README etc to reference 1.69.1 2025-01-17 13:30:22 -08:00