It is confusing and harder to read the logs when the
activations/deactivations caused by the config happen before the log
entry describing the new config.
Listener2.onResult() doesn't require running in the sync context, so
when it is called from the sync context it is guaranteed not to do its
processing immediately; instead, it schedules the work into the sync
context.
The code was doing an update dance: 1) update the service config to add
the new cluster, 2) update the config selector to use the new cluster,
3) update the service config to remove old clusters. But onResult()
wasn't being processed immediately, so the actual execution order was
2, 1, 3, which has a small window where RPCs will fail. onResult2(),
however, does run immediately. And since ca4819ac6,
updateBalancingState() updates the picker immediately.
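A minimal sketch of the scheduling hazard, with illustrative task names
(not the actual resolver code): execute() from inside the sync context
queues the task instead of running it inline, so the inline step wins.
```
import io.grpc.SynchronizationContext;

public class SyncContextOrdering {
  public static void main(String[] args) {
    SynchronizationContext syncContext = new SynchronizationContext(
        (thread, error) -> { throw new RuntimeException(error); });
    syncContext.execute(() -> {
      // Like onResult(): queued, runs only after the current task finishes.
      syncContext.execute(() -> System.out.println("1) add new cluster"));
      // Like the selector swap: runs inline, so it prints first.
      System.out.println("2) swap config selector");
    });
  }
}
```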
cleanUpRoutes() was also racy because it updated the routingConfig
before swapping to the new config selector, so RPCs could fail saying
there was no route instead of the useful error message. Even with the
opposite order, some RPCs may be executing the while loop of
selectConfig(), trying to acquire a cluster. The code unreffed the
clusters before updating the routingConfig, so those RPCs could go into
a tight loop until the routingConfig was updated. Also, once the
routingConfig was updated to EMPTY those RPCs would similarly
see the wrong error message. To give the correct error message,
selectConfig() must fail such RPCs directly, and once it can do that
there's no need to stop using the config selector in error cases. This
has the benefit of fewer moving parts and more consistent threading
among cases.
The added test was able to detect the race 2% of the time. The slower
the code/machine, the more reliably the test failed. ca4819ac6 along
with this commit reduced it to 0 failures in 1000 runs.
Discovered when investigating b/394850611
This PR adds support for filter state retention in Java. The mechanism
will be similar to the one described in [A83]
(https://github.com/grpc/proposal/blob/master/A83-xds-gcp-authn-filter.md#filter-call-credentials-cache)
for C-core, and will serve the same purpose. However, the
implementation details are very different due to the different nature
of xDS HTTP filter support in C-core and Java.
### Filter instance lifecycle
#### xDS gRPC clients
New filter instances are created per combination of:
1. `XdsNameResolver` instance,
2. Filter name+typeUrl as configured in
HttpConnectionManager (HCM) http_filters.
Existing client-side filter instances are shut down (sketched below):
- A single filter instance is shut down when an LDS update contains an
  HCM that is missing the filter configuration for this instance's
  name+typeUrl combination.
- All filter instances are shut down when the watched LDS resource is
  missing from an LDS update.
- All filter instances are shut down on name resolver shutdown.
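A minimal sketch of the pruning this implies on each LDS update, using
hypothetical names and a simple "name/typeUrl" key:
```
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class FilterLifecycleSketch {
  interface Filter { void shutdown(); }  // stand-in for io.grpc.xds.Filter

  // Keyed by "name/typeUrl", the combination described above.
  private final Map<String, Filter> activeFilters = new HashMap<>();

  // On each LDS update, shut down instances whose name+typeUrl
  // combination no longer appears in the HCM's http_filters.
  void onLdsUpdate(Set<String> keysInUpdate) {
    Iterator<Map.Entry<String, Filter>> it =
        activeFilters.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Filter> entry = it.next();
      if (!keysInUpdate.contains(entry.getKey())) {
        entry.getValue().shutdown();
        it.remove();
      }
    }
  }
}
```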
#### xDS-enabled gRPC servers
New filter instances are created per combination of:
1. Server instance,
2. FilterChain name,
3. Filter name+typeUrl as configured in FilterChain's HCM.http_filters.
Filter instances of the default filter chain are tracked separately per:
1. Server instance,
2. Filter name+typeUrl in default_filter_chain's HCM.http_filters.
Existing server-side filter instances are shut down:
- A single filter instance is shut down when an LDS update contains a
  FilterChain whose HCM.http_filters is missing the configuration for
  the filter's name+typeUrl.
- All filter instances associated with a FilterChain are shut down when
  an LDS update no longer contains that FilterChain's name.
- All filter instances are shut down when the watched LDS resource is
  missing from an LDS update.
- All filter instances are shut down on server shutdown.
### Related
- Part 1: #11883
`XdsServerWrapper#generatePerRouteInterceptors` was always intended
to be executed within a sync context. This PR ensures that by calling
`syncContext.throwIfNotInThisSynchronizationContext()`.
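A tiny illustration of the guard (the scaffolding is hypothetical, the
SynchronizationContext API is real): the call passes when running inside
the context and throws IllegalStateException anywhere else.
```
import io.grpc.SynchronizationContext;

public class SyncContextGuard {
  public static void main(String[] args) {
    SynchronizationContext syncContext = new SynchronizationContext(
        (thread, error) -> { throw new RuntimeException(error); });
    Runnable guarded = syncContext::throwIfNotInThisSynchronizationContext;
    syncContext.execute(guarded);  // passes: runs inside the context
    // guarded.run();              // off-context: throws IllegalStateException
  }
}
```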
This change is needed for upcoming xDS filter state retention because
the new tests in XdsServerWrapperTest flake with this NPE:
> `Cannot invoke "io.grpc.xds.client.XdsClient$ResourceWatcher.onChanged(io.grpc.xds.client.XdsClient$ResourceUpdate)" because "this.ldsWatcher" is null`
We want to move away from handleResolvedAddresses(). These migrations
are "easy" in that they need no logic changes. LBs extending
ForwardingLoadBalancer had the method duplicated from
handleResolvedAddresses() and swapped away from `super` because
ForwardingLoadBalancer only forwards handleResolvedAddresses() reliably
today. Duplicating small methods was less bug-prone than dealing with
ForwardingLoadBalancer.
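For illustration, the duplicated method looks roughly like this in a
subclass, assuming the Status-returning acceptResolvedAddresses() and
ForwardingLoadBalancer's delegate() accessor:
```
import io.grpc.LoadBalancer;
import io.grpc.Status;
import io.grpc.util.ForwardingLoadBalancer;

abstract class ExampleForwardingLb extends ForwardingLoadBalancer {
  @Override
  public Status acceptResolvedAddresses(
      LoadBalancer.ResolvedAddresses resolvedAddresses) {
    // Duplicated instead of relying on super, since ForwardingLoadBalancer
    // only forwards handleResolvedAddresses() reliably today.
    return delegate().acceptResolvedAddresses(resolvedAddresses);
  }
}
```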
Implements [gRFC A76: explicitly setting the request hash key for the
ring hash LB policy][A76]
* Explicitly setting the request hash key is guarded by the
  `GRPC_EXPERIMENTAL_RING_HASH_SET_REQUEST_HASH_KEY` environment
  variable until the API is stabilized.
Tested:
* Verified end-to-end by spinning up multiple gRPC servers and a gRPC
client that injects a custom service (load balancing) config with
`ring_hash_experimental` and a custom `request_hash_header` (with
NO associated value in the metadata headers) which generates a random
hash for each request to the ring hash LB. Verified picks/RPCs are
split evenly/uniformly across all backends.
* Ran affected unit tests with thread sanitizer and 1000 iterations to
  check for data races.
[A76]: https://github.com/grpc/proposal/blob/master/A76-ring-hash-improvements.md#explicitly-setting-the-request-hash-key
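For reference, a minimal sketch of injecting such a config; the target,
header name, and exact JSON key casing are assumptions to check against
gRFC A76:
```
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.Map;

public class RingHashConfigSketch {
  public static void main(String[] args) {
    // Requires GRPC_EXPERIMENTAL_RING_HASH_SET_REQUEST_HASH_KEY while
    // the API is experimental. Header name and target are placeholders.
    Map<String, ?> serviceConfig = ImmutableMap.of(
        "loadBalancingConfig", ImmutableList.of(
            ImmutableMap.of("ring_hash_experimental",
                ImmutableMap.of("requestHashHeader", "session-id"))));
    ManagedChannel channel = ManagedChannelBuilder.forTarget("localhost:50051")
        .defaultServiceConfig(serviceConfig)
        .disableServiceConfigLookUp()  // use only the injected config
        .build();
  }
}
```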
This is the first step towards supporting filter state retention in
Java. The mechanism will be similar to the one described in [A83]
(https://github.com/grpc/proposal/blob/master/A83-xds-gcp-authn-filter.md#filter-call-credentials-cache)
for C-core, and will serve the same purpose. However, the
implementation details are very different due to the different nature
of xDS HTTP filter support in C-core and Java.
In Java, xDS HTTP filters are backed by classes implementing
`io.grpc.xds.Filter`, from here just called "Filters". To support
Filter state retention (next PR), Java's xDS implementation must be
able to create unique Filter instances per:
- HCM
  `envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager`
- Filter name as specified in
  `envoy.extensions.filters.network.http_connection_manager.v3.HttpFilter.name`
This PR **does not** implement Filter state retention, but lays the
groundwork for it by changing how filters are registered and
instantiated. To achieve this, all existing Filter classes had to be
updated to the new instantiation mechanism described below.
Prior to this PR, Filters had no lifecycle. FilterRegistry
provided singleton instances for a given typeUrl. This PR introduces
a new interface `Filter.Provider`, which instantiates Filter classes.
All functionality that doesn't need an instance of a Filter is moved
to the Filter.Provider. This includes parsing filter config proto
into FilterConfig and determining the filter kind
(client-side, server-side, or both).
This PR is limited to refactoring; there are no changes to the
existing behavior. Note that all Filter Providers still return
singleton Filter instances. However, with this PR, it is now possible
to create Providers that return a new Filter instance each time
`newInstance` is called.
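An illustrative shape of the split; names and signatures here are
assumptions, not the exact grpc-java API:
```
// Per-instance behavior stays on the Filter; instance-free work moves
// to the Provider. Names/signatures are illustrative assumptions.
interface Filter {
  interface Provider {
    String typeUrl();                 // filter config type this handles
    boolean isClientFilter();         // filter kind: client-side,
    boolean isServerFilter();         //   server-side, or both
    Object parseFilterConfig(Object rawProtoMessage);  // proto -> FilterConfig
    Filter newInstance(String name);  // today still returns a singleton
  }
}
```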
A failing Status from acceptResolvedAddresses means something is wrong
with the config, but parts of the config may still have been applied.
Thus there are now two possible flows: errors that should prevent
updateOverallBalancingState() and errors that should have no effect
other than the return code. To manage that, MultiChildLb must always be
responsible for calling updateOverallBalancingState().
acceptResolvedAddressesInternal() was inlined to make that error
processing easier. No existing usages actually needed to have logic
between updating the children and regenerating the picker.
RingHashLb was already verifying that the address list was not empty, so
the short-circuiting when acceptResolvedAddressesInternal() returned an
error was impossible to trigger. WrrLb's updateWeightTask() calls the
last picker, so it can run before acceptResolvedAddressesInternal(); the
only part that matters is re-creating the weightUpdateTimer.
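A sketch of the two flows; the helpers are hypothetical stand-ins, not
MultiChildLb's actual methods:
```
import io.grpc.LoadBalancer.ResolvedAddresses;
import io.grpc.Status;

abstract class MultiChildFlowSketch {
  Status acceptResolvedAddresses(ResolvedAddresses resolvedAddresses) {
    Status status = updateChildren(resolvedAddresses);  // may partially apply
    if (preventsPickerUpdate(status)) {
      return status;  // flow 1: error prevents updateOverallBalancingState()
    }
    updateOverallBalancingState();  // always called here, never by callers
    return status;    // flow 2: error only affects the return code
  }

  abstract Status updateChildren(ResolvedAddresses resolvedAddresses);
  abstract boolean preventsPickerUpdate(Status status);
  abstract void updateOverallBalancingState();
}
```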
* Don't process resourceDoesNotExist for watchers that have been cancelled.
* Change the test to use an ArgumentMatcher instead of expecting that only the final result will be sent, since depending on timing there may be configs sent for clusters being removed, with their entries as errors.
* Change XdsConfig to match spec with a `children` object holding either `a list of leaf cluster names` or `an EdsUpdate`. Removed intermediate aggregate nodes from `XdsConfig.clusters`.
The variables from the do-while are no longer initialized, to let the
compiler verify that the loop sets each one. Unnecessary comparisons to
null are also removed, and it is now more obvious that the variables are
never set to null. Added a minor optimization of computing the RPC's
path once instead of once for each route. The variable declarations were
also sorted to match their initialization order.
This does fix an unlikely bug: if the old code successfully matched a
route but failed to retain the cluster, then when trying a second time,
if the route was _not_ matched, it would reuse the prior route and thus
infinite-loop failing to retain that same cluster.
It also adds a missing cast to unsigned long for a uint32 weight. The
old code would detect if the _sum_ was negative, but a weight using all
32 bits would itself have been negative and never selected.
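The uint32 issue in isolation: a proto uint32 arrives as a Java int and
must be widened without sign extension before summing or comparing.
```
public class UnsignedWeight {
  public static void main(String[] args) {
    int rawWeight = 0x80000000;  // uint32 value 2147483648 stored in an int
    long wrong = rawWeight;                          // sign-extends: -2147483648
    long right = Integer.toUnsignedLong(rawWeight);  // 2147483648
    System.out.println(wrong + " vs " + right);      // never selected vs correct
  }
}
```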
The field was made final in 4b52639aa but was soon reverted in 3ebb3e192
because of what I assume was a bad merge conflict resolution. The field
has contained an immutable object since its introduction in d25f5acf1,
so it is pretty likely to remain a constant in the future.
This moves the interceptor creation from the ConfigSelector to the
resource update handling.
The code structure changes will make adding support for filter
lifecycles (for RLQS) a bit easier. The filter lifecycles will allow
filters to share state across interceptors, and constructing all the
interceptors on a single thread will mean filters won't need to be
thread-safe (though their interceptors will still need to be).
This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filters' interceptors can
already be reused.
There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.
Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.
d65d3942e increased the test speed of
connect_then_mainServerDown_fallbackServerUp by using FakeClock.
However, it introduced a data race because FakeClock is not thread-safe.
This change injects a single thread for gRPC callbacks such that
syncContext is run on a thread under the test's control.
A simpler approach would be to expose syncContext from XdsClientImpl for
testing. However, this test is in a different package and I wanted to
avoid adding a public method.
```
Read of size 8 at 0x00008dec9d50 by thread T25:
#0 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Lio/grpc/internal/FakeClock$ScheduledTask;JLjava/util/concurrent/TimeUnit;)V FakeClock.java:140
#1 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;)Ljava/util/concurrent/ScheduledFuture; FakeClock.java:150
#2 io.grpc.SynchronizationContext.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/ScheduledExecutorService;)Lio/grpc/SynchronizationContext$ScheduledHandle; SynchronizationContext.java:153
#3 io.grpc.xds.client.ControlPlaneClient$AdsStream.handleRpcStreamClosed(Lio/grpc/Status;)V ControlPlaneClient.java:491
#4 io.grpc.xds.client.ControlPlaneClient$AdsStream.lambda$onStatusReceived$0(Lio/grpc/Status;)V ControlPlaneClient.java:429
#5 io.grpc.xds.client.ControlPlaneClient$AdsStream$$Lambda+0x00000001004a95d0.run()V ??
#6 io.grpc.SynchronizationContext.drain()V SynchronizationContext.java:96
#7 io.grpc.SynchronizationContext.execute(Ljava/lang/Runnable;)V SynchronizationContext.java:128
#8 io.grpc.xds.client.ControlPlaneClient$AdsStream.onStatusReceived(Lio/grpc/Status;)V ControlPlaneClient.java:428
#9 io.grpc.xds.GrpcXdsTransportFactory$EventHandlerToCallListenerAdapter.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V GrpcXdsTransportFactory.java:149
#10 io.grpc.PartialForwardingClientCallListener.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V PartialForwardingClientCallListener.java:39
...
Previous write of size 8 at 0x00008dec9d50 by thread T4 (mutexes: write M0, write M1, write M2, write M3):
#0 io.grpc.internal.FakeClock.forwardTime(JLjava/util/concurrent/TimeUnit;)I FakeClock.java:368
#1 io.grpc.xds.XdsClientFallbackTest.connect_then_mainServerDown_fallbackServerUp()V XdsClientFallbackTest.java:358
...
```
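One plausible shape of the injection (not the actual test code): route
gRPC callbacks through a single executor owned by the test so that
syncContext tasks and FakeClock accesses are serialized on known threads.
```
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestThreadInjection {
  public static void main(String[] args) {
    // A single application executor owned by the test; gRPC callbacks
    // (and thus syncContext work) run on this one thread.
    ExecutorService callbackThread = Executors.newSingleThreadExecutor();
    ManagedChannelBuilder<?> builder =
        ManagedChannelBuilder.forTarget("localhost:50051")
            .executor(callbackThread);
  }
}
```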
This is in service to gRFC A89. Since the gRFC isn't finalized this
purposefully doesn't really do anything yet. The grpc-opentelemetry
change to use this optional label will be done after the gRFC is merged.
grpc-opentelemetry currently has a hard-coded list (one entry) of labels
that it looks for, and this label will need to be added.
b/356167676
These changes reduce connect_then_mainServerDown_fallbackServerUp test
time from 20 seconds to 5 seconds by faking time for the does-not-exist
timer.
XdsClientImpl only uses the TimeProvider for CSDS cache details, so any
implementation should be fine. FakeXdsClient provides an implementation,
so we might as well use it; it is one less clock to think about.
* Move creating the retry timer in handleRpcStreamClosed to as late as possible and call `close` so that the `call` is cancelled.
Also add some debug logging.
If the control plane sends a resource type the client doesn't understand
at the moment, the control plane will still expect the client to include
the nonce if the client subscribes to the type in the future. This most
easily happens when unsubscribing the last resource of a type, which
meant 1cf1927d1 was insufficient.
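A structural sketch of the rule (hypothetical class, not the
ControlPlaneClient code): record the nonce for every response's type
URL, even for currently unknown or unsubscribed types, so a later
subscription can echo it back.
```
import java.util.HashMap;
import java.util.Map;

class NonceTracker {
  private final Map<String, String> nonceByTypeUrl = new HashMap<>();

  // Called for every DiscoveryResponse, even when the type is unknown
  // or has zero subscriptions at the moment.
  void onDiscoveryResponse(String typeUrl, String nonce) {
    nonceByTypeUrl.put(typeUrl, nonce);
  }

  // A later subscription to the type must include the stored nonce.
  String nonceFor(String typeUrl) {
    return nonceByTypeUrl.getOrDefault(typeUrl, "");
  }
}
```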
Internal* classes should generally be accessors that are used outside of
the package/project. Only one attribute was used outside of xds, so
leave only that one attribute in InternalXdsAttributes. One attribute
was used by the internal.security package, so move the definition to the
same package to reduce the circular dependencies.
There's no reason to use the interface outside of
XdsClientImpl/ControlPlaneClient. Since XdsClientImpl implements the
interface directly, its methods are still public. That can be a future
cleanup.
In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.
But while it was broken, violations crept in. Most violations weren't
too important and we're not surprised they went unnoticed. For example,
Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.
One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment. The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.
[PR11066]: https://github.com/grpc/grpc-java/pull/11066
StructOrError is a more generic API, but we have StatusOr now so we
don't want new usages of StructOrError. Moving StructOrError out of
io.grpc.xds.client will make it easier to delete StructOrError once
we've migrated to StatusOr in the future.
TRANSPORT_SOCKET_NAME_TLS should also move, but it wasn't immediately
clear to me where it should go.
When SPIFFE support was added it caused
tlsClientServer_useSystemRootCerts_validationContext to become flaky,
because test execution order determined whether the race would occur.
Fixes #11678
This is a step toward removing ResolvedAddresses from ChildLbState,
which isn't actually used by MultiChildLb. Most usages of the EAG can be
served more directly without peering into MultiChildLb's internals or
even accessing ChildLbStates, which makes the tests less sensitive to
implementation changes. Some changes do leverage the new behavior of
MultiChildLb where it preserves the order of the entries.
This does fix an important bug in shutdown tests. The tests looped over
the ChildLbStates after shutdown, but shutdown deleted all the children,
so they looped over an empty collection. Fixing that exposed that
deliverSubchannelState() didn't function after shutdown, as the listener
was removed from the map when the subchannel was shut down. Moving the
listener onto the TestSubchannel allowed having access to the listener
even after shutdown.
In a few places in LeastRequestLb, lines were just deleted, but that's
because an existing assertion already provided the same check
without digging into MultiChildLb.