Just as in a8de9f0, the lack of equals() causes cluster_resolver to consider every update a different configuration and restart itself.
NaN should really be prevented with validation rather than handled here, but
it looks like that would lead to yak shaving at the moment.
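As a rough illustration (the class and field names are hypothetical, not the real config class), an equals()/hashCode() pair along these lines is the missing piece, with Double.doubleToLongBits() letting a NaN value at least compare equal to itself:
```
import java.util.Objects;

// Hypothetical config class (not the real RingHashConfig): without equals(),
// cluster_resolver treats every update as a new configuration and restarts.
final class ConfigSketch {
  final long minRingSize;
  final double weight;  // a double field is where a NaN could sneak in

  ConfigSketch(long minRingSize, double weight) {
    this.minRingSize = minRingSize;
    this.weight = weight;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ConfigSketch)) {
      return false;
    }
    ConfigSketch that = (ConfigSketch) o;
    // doubleToLongBits makes NaN compare equal to itself, so a stray NaN
    // doesn't turn every update into a "different" configuration.
    return minRingSize == that.minRingSize
        && Double.doubleToLongBits(weight) == Double.doubleToLongBits(that.weight);
  }

  @Override
  public int hashCode() {
    return Objects.hash(minRingSize, weight);
  }
}
```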
b/435208946
Since c4256add4 we no longer fabricate a TRANSIENT_FAILURE update from
children. However, previously that would have set
seenReadyOrIdleSinceTransientFailure = false and prevented future timer
creation. If an LB policy gives extraneous updates with state CONNECTING,
then it was possible to re-create failOverTimer which would then wait
the 10 seconds for the child to finish CONNECTING. We only want to give
the child one opportunity after transitioning out of READY/IDLE.
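A minimal sketch of the intended guard, with illustrative names rather than the actual PriorityLoadBalancer fields:
```
import io.grpc.ConnectivityState;

// Sketch of the intended guard (names are illustrative, not the actual
// PriorityLoadBalancer code): the fail-over timer is created at most once per
// transition out of READY/IDLE, so extraneous CONNECTING updates are no-ops.
abstract class FailOverGuardSketch {
  private boolean seenReadyOrIdleSinceTransientFailure = true;

  void onChildStateChanged(ConnectivityState newState) {
    if (newState == ConnectivityState.READY || newState == ConnectivityState.IDLE) {
      seenReadyOrIdleSinceTransientFailure = true;
      cancelFailOverTimer();
    } else if (newState == ConnectivityState.CONNECTING
        && seenReadyOrIdleSinceTransientFailure) {
      seenReadyOrIdleSinceTransientFailure = false;
      startFailOverTimer();  // one opportunity; later CONNECTING updates don't restart it
    }
  }

  abstract void startFailOverTimer();
  abstract void cancelFailOverTimer();
}
```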
https://github.com/grpc/proposal/pull/509
Instead of representing an aggregate cluster as a single cluster whose
priorities come from different underlying clusters, represent an aggregate cluster as an instance of a priority LB policy where each child is a cds LB policy for the underlying
cluster.
Avoiding so many deps will allow us to upgrade the protos without being
forced to upgrade to protobuf-java 4.x. It also removes the remaining
non-bzlmod dependencies.
It'd be really easy to get this wrong, so we do two things: 1) mirror the
Gradle configuration as much as possible, as that sees a lot of testing,
and 2) run the fake control plane with the _results_ of jarjar. There are
lots of classes that we could mess up, but that at least kicks the tires.
XdsTestUtils.buildRouteConfiguration() was moved to ControlPlaneRule to
stop the unnecessary circular dependency between the classes and to
avoid the many dependencies of XdsTestUtils.
I'm totally hacking java_grpc_library to improve the dependency
situation. Long-term, I think we will stop building Java libraries with
Bazel and require users to rely entirely on Maven Central. That seems to
be the direction Bazel is going and it will greatly simplify the
problems we've seen with protobuf having a single repository for many
languages. So while the hack isn't too bad, I hope we won't have to live
with it long-term.
The PriorityLB predates A56. tryNextPriority() now matches
ChoosePriority() from the gRFC.
The biggest change is waiting on CONNECTING children instead of failing
after the failOverTimer fires. The failOverTimer should be used to start
lower priorities more eagerly, but shouldn't cause the overall
connectivity state to become TRANSIENT_FAILURE on its own. The prior
behavior of creating the "Connection timeout for priority" failing
picker was particularly strange, because it didn't update the child's
connectivity state. This previous behavior produced errors from the
failOverTimer with no way to diagnose what was going wrong.
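As a rough sketch of the ChoosePriority() shape (the helper names are illustrative, not the real PriorityLoadBalancer API):
```
import io.grpc.ConnectivityState;
import java.util.List;

// Rough sketch of ChoosePriority() from gRFC A56 (types and helpers are
// illustrative): CONNECTING children with a pending fail-over timer are
// waited on rather than reported as TRANSIENT_FAILURE.
abstract class ChoosePrioritySketch {
  void tryNextPriority(List<String> priorityOrder) {
    for (String priority : priorityOrder) {
      if (!isCreated(priority)) {
        createChild(priority);  // also starts the fail-over timer
        return;
      }
      ConnectivityState state = childState(priority);
      if (state == ConnectivityState.READY || state == ConnectivityState.IDLE) {
        useChild(priority);
        return;
      }
      if (state == ConnectivityState.CONNECTING && failOverTimerPending(priority)) {
        useChild(priority);  // report CONNECTING; don't fail the whole policy
        return;
      }
      // TRANSIENT_FAILURE, or the timer expired: fall through to the next priority.
    }
    // All priorities failed; delegate to the last one so its state is reported upward.
    useChild(priorityOrder.get(priorityOrder.size() - 1));
  }

  abstract boolean isCreated(String priority);
  abstract void createChild(String priority);
  abstract ConnectivityState childState(String priority);
  abstract boolean failOverTimerPending(String priority);
  abstract void useChild(String priority);
}
```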
b/428517222
The main reason I made a change here was to fix the tense from the
deadline "will be exceeded in" to "was exceeded after". But we really
don't want to be doing the string formatting unless the deadline is
actually exceeded. There were a few more changes to make some variables
effectively final.
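A hedged sketch of the lazy-formatting idea, using made-up helper names rather than the actual deadline-handling code:
```
import io.grpc.Deadline;
import io.grpc.Status;

// Illustrative sketch (not the actual deadline code): the message is only
// formatted once the deadline has actually expired, and in the past tense,
// since at that point it "was exceeded after" the configured duration.
final class DeadlineMessageSketch {
  static Status statusIfExpired(Deadline deadline, long deadlineNanos) {
    if (deadline == null || !deadline.isExpired()) {
      return null;  // common path: no string formatting at all
    }
    return Status.DEADLINE_EXCEEDED.withDescription(
        String.format("deadline was exceeded after %.9fs", deadlineNanos / 1e9));
  }
}
```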
LoadBalancers shouldn't be called after shutdown(), but RingHashLb could
have enqueued work to the SynchronizationContext that executed after
shutdown(). This commit fixes problems discovered when auditing all LBs
usage of the syncContext for that type of problem.
Similarly, PickFirstLb could have requested a new connection after
shutdown(). We want to avoid that sort of thing too.
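The guard pattern looks roughly like this; the class and field names are illustrative, not the actual RingHashLb/PickFirstLb code:
```
import io.grpc.LoadBalancer;
import io.grpc.SynchronizationContext;

// Sketch of the guard pattern (names are illustrative): work enqueued on the
// SynchronizationContext re-checks a shutdown flag, so nothing touches the
// child after shutdown() even if the work was queued before it.
final class DeferredWorkSketch {
  private final SynchronizationContext syncContext;
  private final LoadBalancer childLb;
  private boolean shutdown;

  DeferredWorkSketch(SynchronizationContext syncContext, LoadBalancer childLb) {
    this.syncContext = syncContext;
    this.childLb = childLb;
  }

  void requestConnectionLater() {
    syncContext.execute(() -> {
      if (shutdown) {
        return;  // shutdown() ran between enqueue and execution; do nothing
      }
      childLb.requestConnection();
    });
  }

  void shutdown() {
    shutdown = true;
    childLb.shutdown();
  }
}
```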
RingHashLb's test changed from CONNECTING to TRANSIENT_FAILURE to get
the latest picker. Because two subchannels have failed it will be in
TRANSIENT_FAILURE. Previously the test was using an older picker with an
out-of-date subchannelView, and verifyConnection() was too imprecise
to notice it was creating the wrong subchannel.
As discovered in b/430347751, where ClusterImplLb was seeing a new
subchannel being created after the child LB was shut down (the shutdown
itself had been caused by RingHashConfig not implementing equals(), which
caused ClusterResolverLb to replace its state; that was fixed by
a8de9f07ab):
```
java.lang.NullPointerException
at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createClusterLocalityFromAttributes(ClusterImplLoadBalancer.java:322)
at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createSubchannel(ClusterImplLoadBalancer.java:236)
at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
at io.grpc.internal.PickFirstLeafLoadBalancer.createNewSubchannel(PickFirstLeafLoadBalancer.java:527)
at io.grpc.internal.PickFirstLeafLoadBalancer.requestConnection(PickFirstLeafLoadBalancer.java:459)
at io.grpc.internal.PickFirstLeafLoadBalancer.acceptResolvedAddresses(PickFirstLeafLoadBalancer.java:174)
at io.grpc.xds.LazyLoadBalancer$LazyDelegate.activate(LazyLoadBalancer.java:64)
at io.grpc.xds.LazyLoadBalancer$LazyDelegate.requestConnection(LazyLoadBalancer.java:97)
at io.grpc.util.ForwardingLoadBalancer.requestConnection(ForwardingLoadBalancer.java:61)
at io.grpc.xds.RingHashLoadBalancer$RingHashPicker.lambda$pickSubchannel$0(RingHashLoadBalancer.java:440)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:96)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:128)
at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.onData(XdsClientImpl.java:817)
```
297ab05ef converted CDS to XdsDependencyManager. This caused three
regressions:
* CdsLB2 as an RLS child would always fail with "Unable to find
non-dynamic root cluster" because is_dynamic=true was missing in
its service config
* XdsNameResolver only propagated resolution updates when the clusters
changed, so a CdsUpdate change would be ignored. This caused a hang
for RLS even with is_dynamic=true. For non-RLS, the lack of a config
update broke the circuit breaking psm interop test. This would have
been more severe if ClusterResolverLb had been converted to
XdsDependencyManager, as it would have ignored EDS updates
* RLS did not propagate resolution updates, so even with is_dynamic=true
the CdsUpdate for the new cluster would never arrive at CdsLB2, causing
a hang
b/428120265
b/427912384
ClusterResolverLb is still doing DNS itself, so disable it in XdsDepMan
until that migration has finished. EDS is fine in XdsDepMan, because
XdsClient will share the result with ClusterResolverLb.
ClusterResolverLb gets the NameResolverRegistry from
LoadBalancer.Helper, so a new API was added in NameResolver.Args to
propagate the same object to the name resolver tree.
RetryingNameResolver was exposed to xds. This is expected to be
temporary, as the retrying is being removed from ManagedChannelImpl and
moved into the resolvers. At that point, DnsNameResolverProvider would
wrap DnsNameResolver with a similar API to RetryingNameResolver and xds
would no longer be responsible.
This is missing behavior defined in gRFC A74:
> As per gRFC A31, the ConfigSelector gives each RPC a ref to the
> cluster that was selected for it to ensure that the cluster is not
> removed from the xds_cluster_manager LB policy config before the RPC
> is done with its LB picks. These cluster refs will also hold a
> subscription for the cluster from the XdsDependencyManager, so that
> the XdsDependencyManager will not stop watching the cluster resource
> until the cluster is removed from the xds_cluster_manager LB policy
> config.
Without the logic, RPCs can race and see the error:
> INTERNAL: CdsLb for cluster0: Unable to find non-dynamic root cluster
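A simplified sketch of the ref-plus-subscription idea described in the quote above; the type is made up and not the real XdsNameResolver code:
```
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the A74 behavior (made-up type): the per-cluster ref
// held by RPCs also owns a dependency-manager subscription, released only when
// the last RPC finishes after the cluster has been dropped from the config.
final class ClusterRefSketch {
  private final AtomicInteger refCount = new AtomicInteger(1);  // held by the LB config itself
  private final Closeable subscription;  // e.g. obtained when subscribing to the cluster

  ClusterRefSketch(Closeable subscription) {
    this.subscription = subscription;
  }

  // Called by the ConfigSelector when an RPC picks this cluster.
  void retainForRpc() {
    refCount.incrementAndGet();
  }

  // Called when an RPC finishes its LB picks, and once when the cluster is
  // removed from the xds_cluster_manager config.
  void release() throws IOException {
    if (refCount.decrementAndGet() == 0) {
      subscription.close();  // the dependency manager may now stop watching the cluster
    }
  }
}
```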
Fixes #12152. This fixes the regression introduced in 297ab05e.
This will be used for logical dns clusters as part of gRFC A74. Swapping
to EnumMap wasn't really necessary, but was easy given the new type
system.
I can't say I'm particularly happy with the name of the new
TrackedWatcher type, but XdsConfigWatcher prevented using "Watcher"
because it won't implement the new interface, and ResourceWatcher
already exists in XdsClient. So we have TrackedWatcher, WatcherTracer,
TypeWatchers, and TrackedWatcherType.
The watchers can be completely regular, so the base class can do the
cache management while the subclasses are only concerned with
subscribing to children.
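Schematically, and with made-up names rather than the real TrackedWatcher hierarchy, the split looks something like:
```
// Sketch of the shape described above (simplified, not the actual code): the
// base class caches the latest value, while subclasses only decide which
// child resources to subscribe to.
abstract class TrackedWatcherSketch<T> {
  private T latest;  // cache handled entirely by the base class

  final void onChanged(T update) {
    latest = update;
    subscribeToChildren(update);
  }

  final T getData() {
    return latest;
  }

  // Subclasses (e.g. a CDS watcher subscribing to EDS/DNS children) implement
  // only the child-subscription logic.
  abstract void subscribeToChildren(T update);
}
```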
I noticed we deviated from gRFC A37 in some ways. It turned out those
were added to the gRFC later in https://github.com/grpc/proposal/pull/344:
- NACKing empty aggregate clusters
- Failing aggregate cluster when children could not be loaded
- Recursion limit of 16. We had this behavior already, but it was
ascribed to matching C++
There's disagreement on whether we should actually fail the aggregate
cluster for bad children, so I'm preserving the pre-existing behavior
for now.
The code is now doing a depth-first leaf traversal, not breadth-first.
This was odd to see, but the code was also pretty old, so the reasoning
seems lost to history. Since we haven't seen more than a single level of
aggregate clusters in practice, this wouldn't have been noticed by
users.
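A rough sketch of a depth-first leaf traversal over aggregate clusters (the types and the helper are illustrative):
```
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of a depth-first leaf traversal (illustrative types): children are
// visited in priority order, so the resulting leaf list preserves that order
// even across multiple levels of aggregate clusters.
abstract class LeafTraversalSketch {
  List<String> leafClusters(String root) {
    LinkedHashSet<String> leaves = new LinkedHashSet<>();
    collectLeaves(root, leaves, /* depth= */ 0);
    return new ArrayList<>(leaves);
  }

  private void collectLeaves(String cluster, LinkedHashSet<String> leaves, int depth) {
    if (depth > 16) {
      throw new IllegalStateException("aggregate cluster recursion limit exceeded");
    }
    List<String> children = aggregateChildren(cluster);
    if (children.isEmpty()) {
      leaves.add(cluster);  // EDS or LOGICAL_DNS leaf
      return;
    }
    for (String child : children) {
      collectLeaves(child, leaves, depth + 1);  // depth-first, priority order preserved
    }
  }

  // Returns the ordered children for an aggregate cluster, or empty for a leaf.
  abstract List<String> aggregateChildren(String cluster);
}
```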
XdsDependencyManager.start() was created to guarantee that the callback
could not be called before returning from the constructor. Otherwise
XDS_CLUSTER_SUBSCRIPT_REGISTRY could potentially be null.
We can easily compute the rdsName, and avoiding the stored state means we
don't need to override onResourceDoesNotExist() to keep the cache in sync
with the config.
1fd29bc80 replaced cancelWatcher() with watcher.close(). But setting
cancelled was missing. Because the config update checks for shutdown,
the cancelled flag no longer avoids exceptions. But it seems best to
continue avoiding any processing after close to avoid surprises.
Reference counting doesn't release cycles, so swap to a tracing garbage
collector. This greatly simplifies the code as well, as diffing is no
longer necessary. (If vanilla reference counting were used, diffing
wouldn't have been necessary either, as you just increment all the new
objects and decrement the old ones. But that doesn't work when using a
set instead of an integer.)
The children of aggregate clusters have a priority order, so we can't
ever throw them in an ordinary set for later iteration.
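A simplified mark-and-sweep sketch of the idea, with illustrative types rather than the actual WatcherTracer:
```
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the mark-and-sweep idea (illustrative, not the real WatcherTracer):
// after each update, trace from the roots and close anything unreached. Cycles
// between watchers are collected automatically, and no old-vs-new diffing is
// required. Ordered maps keep iteration in insertion (priority) order.
abstract class WatcherGcSketch<W> {
  private final Map<String, W> watchers = new LinkedHashMap<>();

  void sweep(Set<String> roots) {
    Set<String> reachable = new HashSet<>();
    for (String root : roots) {
      mark(root, reachable);
    }
    watchers.entrySet().removeIf(e -> {
      if (reachable.contains(e.getKey())) {
        return false;
      }
      close(e.getValue());  // unreferenced (possibly part of a cycle): unwatch it
      return true;
    });
  }

  private void mark(String name, Set<String> reachable) {
    if (!reachable.add(name) || !watchers.containsKey(name)) {
      return;
    }
    for (String child : childrenOf(watchers.get(name))) {
      mark(child, reachable);
    }
  }

  abstract Iterable<String> childrenOf(W watcher);
  abstract void close(W watcher);
}
```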
This now detects recursion limits only after subscribing, but that
matches our existing behavior in CdsLoadBalancer2. We don't get much
value from detecting the limit before subscribing, and doing so makes
watcher types more different.
Loops are still a bit broken, as they won't be unwatched when orphaned
because they form a reference loop. In CdsLoadBalancer2, duplicate
clusters had duplicate watchers so there was single-ownership and
reference cycles couldn't form. Fixing that is a bigger change.
Intermediate aggregate clusters are now included in XdsConfig, just for
simplicity. It doesn't hurt anything whether they are present or
missing, but it required updates to some tests.
* xds: Don't allow hostnames in address field
gRFC A27 specifies they must be IPv4 or IPv6 addresses. Certainly doing
a DNS lookup hidden inside the config object is asking for trouble.
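A minimal sketch of the kind of check this implies, assuming Guava's InetAddresses is available (as it is elsewhere in grpc-java); this is not the actual parsing code:
```
import com.google.common.net.InetAddresses;

// Illustrative validation sketch: reject anything that isn't an IPv4/IPv6
// literal so no hidden DNS lookup can happen while building the config.
final class AddressCheckSketch {
  static void checkAddressIsIpLiteral(String address) {
    if (!InetAddresses.isInetAddress(address)) {
      throw new IllegalArgumentException(
          "Address must be an IPv4 or IPv6 literal, not a hostname: " + address);
    }
  }
}
```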
The tests were accidentally doing a lot of failing DNS requests, greatly
slowing them down. On my desktop, which made the problem most obvious
with five search paths in /etc/resolv.conf, :grpc-xds:test decreased
from 66s to 29s. The majority of that is XdsDependencyManagerTest, which
went from 33s to 0.1s, as it generated a UUID for the in-process
transport in each test and then used it as a hostname, which defeated
Java's DNS (negative) cache. The slowness was noticed because
XdsDependencyManagerTest should have run quickly as a single thread
without I/O, but was particularly slow on my desktop.
The cleanup caused me to audit serverName and the weird places it went.
I think some of them were tricks for XdsClientFallbackTest to squirrel
away something distinguishing, although reusing the serverName is asking
for confusion, as is including the tricks in "shared" utilities.
XdsClientFallbackTest does have some non-trivial changes, but this seems
to fix some pre-existing bugs in the tests.
* Add failing hostname unit test
SOTW is unique in that it can become absent after being found. But if we
NACK when initially loading the resource, we don't want to delay, depend
on the resource timeout, and then give a poor error.
This was noticed while adding the EDS restriction that addresses are not
hostnames, when some tests started hanging instead of failing quickly.
The optimization makes the code more complicated. Yes, we know that
maybePublishConfig() will do no work because of an outstanding watch,
but we don't do this for other newly created watchers, and doing so would
just make the code more bug-prone. This removes a difference in how
different watcher types are handled.
This provides better type and missing-map handling. Note that
getWatchers() now implicitly creates the map if it doesn't exist,
instead of just returning an empty map. That makes it a bit easier to
use and more importantly avoids accidents where a bug tries to modify
the immutable map.
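Schematically (with made-up names, not the real TypeWatchers code), the accessor behaves like:
```
import java.util.HashMap;
import java.util.Map;

// Sketch of the accessor shape described above (illustrative): asking for a
// type's watcher map creates it on demand, so callers always get a mutable
// map they can add to, rather than an immutable empty placeholder.
final class WatcherMapsSketch<K, W> {
  private final Map<String, Map<K, W>> watchersByType = new HashMap<>();

  Map<K, W> getWatchers(String type) {
    return watchersByType.computeIfAbsent(type, unused -> new HashMap<>());
  }
}
```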
The most important change here is to handle subscribeToCluster() calls
after shutdown(), preventing the internal state from becoming heavily
confused, since the assumption is that there are no watchers after
shutdown().
ClusterSubscription.closed isn't strictly necessary, but I don't want
the code to depend on double-deregistration being safe.
maybePublishConfig() isn't being called after shutdown(), but adding the
protection avoids a class of bugs that would cause channel panic.
After many years of issue 9179 being open, there's been nothing to show
that we need the javax.annotation.Generated annotation. Most tools use
file paths and a few check for annotations with "Generated" in the name.
ErrorProne has a few that check for javax.annotation.Generated, but
only UnnecessarilyFullyQualified looks like it'd be a problem and it is
disabled by default. We're not getting any more information, no users
have reported issues with `@generated=omit`, and the existing dependency
is annoying users, so just drop it.
Given we will still retain the GrpcGenerated annotation, it seems highly
likely things are already okay. Even if there are problems they would
probably be addressed by adding an io.grpc.stub.annotations.Generated
annotation or small tweaks. In the short-term, (non-Bazel) users can use
`@generated=javax`, but long-term we could consider removing the option
assuming we've resolved any outstanding issues.
We will want to update the examples and the README to remove the
org.apache.tomcat:annotations-api dependency after the next release.
Fixes #9179
This prevents an NPE and subsequent channel panic when trying to build a
config (because there are no watchers, so waitingOnResource==false)
without any listener and route.
```
java.lang.NullPointerException: Cannot invoke "io.grpc.xds.XdsDependencyManager$RdsUpdateSupplier.getRdsUpdate()" because "routeSource" is null
at io.grpc.xds.XdsDependencyManager.buildUpdate(XdsDependencyManager.java:295)
at io.grpc.xds.XdsDependencyManager.maybePublishConfig(XdsDependencyManager.java:266)
at io.grpc.xds.XdsDependencyManager$EdsWatcher.onChanged(XdsDependencyManager.java:899)
at io.grpc.xds.XdsDependencyManager$EdsWatcher.onChanged(XdsDependencyManager.java:888)
at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.notifyWatcher(XdsClientImpl.java:929)
at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.lambda$onData$0(XdsClientImpl.java:837)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:96)
```
I think this fully fixes the problem today, but not tomorrow.
subscribeToCluster() is racy as well, but not yet used.
This was noticed when idleTimeout was firing, with some other code
calling getState(true) to wake the channel back up. That may have made
this panic more visible than it would be otherwise, but that has not
been investigated.
b/412474567
The security code referenced fields removed from gRFC A29 before it was
finalized.
Note that this fixes a bug in CommonTlsContextUtil where
CombinedValidationContext was not checked. I believe this was the only
location with such a bug as I audited all non-test usages of
has/getValidationContext() and confirmed they have a corresponding
has/getCombinedValidationContext().
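A simplified sketch of the kind of check involved (not the actual CommonTlsContextUtil code), assuming the standard generated accessors on the Envoy CommonTlsContext proto:
```
import io.envoyproxy.envoy.extensions.transport_sockets.tls.v3.CertificateValidationContext;
import io.envoyproxy.envoy.extensions.transport_sockets.tls.v3.CommonTlsContext;

// Sketch of the audited pattern (simplified): wherever the plain
// validation_context is inspected, the combined_validation_context wrapper
// has to be considered as well.
final class ValidationContextSketch {
  static CertificateValidationContext certContext(CommonTlsContext commonTlsContext) {
    if (commonTlsContext.hasValidationContext()) {
      return commonTlsContext.getValidationContext();
    }
    if (commonTlsContext.hasCombinedValidationContext()) {
      return commonTlsContext.getCombinedValidationContext().getDefaultValidationContext();
    }
    return null;  // no validation context configured
  }
}
```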