Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate. Depending on the field used to configure load balancing
in the xDS `Cluster` proto, it is either:
1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used
`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.
To address this, `ClusterImplLoadBalancer` now uses
`GracefulSwitchLoadBalancer`, which makes sure that if the child policy
changes, the correct LB implementation is switched to.
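The idea can be sketched in plain Java, without the gRPC classes. Everything below (`ChildPolicy`, `SwitchingParent`, the policy name strings) is a hypothetical stand-in, and the sketch swaps the child immediately, whereas a graceful switch would normally keep the old child serving until the new one is ready.

```java
import java.util.function.Function;

// Hypothetical, simplified stand-ins; these are not the gRPC LoadBalancer APIs.
interface ChildPolicy {
  String name();

  void handleConfig(Object config);

  void shutdown();
}

final class SwitchingParent {
  private final Function<String, ChildPolicy> factory;
  private ChildPolicy child;

  SwitchingParent(Function<String, ChildPolicy> factory) {
    this.factory = factory;
  }

  // Replace the child whenever the configured policy name changes, so a
  // config object is only ever handed to the policy type it was built for.
  void update(String policyName, Object config) {
    if (child == null || !child.name().equals(policyName)) {
      if (child != null) {
        child.shutdown();
      }
      child = factory.apply(policyName);
    }
    child.handleConfig(config);
  }

  String currentPolicy() {
    return child == null ? null : child.name();
  }
}
```

Because the child is chosen by policy name on every update, a config meant for the other child type never reaches a child that would have to cast it.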
Currently the code maintains one LoadStatsManager2 that collects all
stats. The problem with this is that in a federation situation there
will be multiple LrsClients that will be periodically picking up stats
from the manager and sending them to their respective control planes.
This creates a first-come, first-served situation where the stats get
randomly distributed across the control planes.
This change creates separate LoadStatsManagers dedicated to their own
control planes, thus ensuring no stats get lost.
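A minimal sketch of the per-control-plane split, assuming a single request counter stands in for a full stats manager. `PerServerLoadStats` and its methods are hypothetical names, not the real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification: one counter per control plane stands in for a
// full load stats manager.
final class PerServerLoadStats {
  private final Map<String, Long> requestsByServer = new HashMap<>();

  void recordRequest(String controlPlane) {
    requestsByServer.merge(controlPlane, 1L, Long::sum);
  }

  // Each reporter drains only the stats recorded for its own control plane,
  // so one reporter can no longer consume stats meant for another.
  long drain(String controlPlane) {
    Long count = requestsByServer.remove(controlPlane);
    return count == null ? 0L : count;
  }
}
```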
xds: Correctly start LRS clients in federation situations
The old code used a single member variable to indicate if load reporting
had already been started by XdsClientImpl. This boolean was used to
avoid starting a LoadReportClient more than once. This works fine with
a single control plane server.
The problem occurs in federation situations where there is more than one
control plane and thus more than one LoadReportClient. Once the first
LoadReportClient is started, the member variable boolean is flipped to
true and no other LoadReportClients would be started.
This change removes the boolean member variable and relies on the fact
that starting an already started LoadReportClient is a no-op.
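The pattern can be sketched as follows. `ReportClient` and `ReportClients` are hypothetical stand-ins for illustration, not the gRPC classes; the point is only that the "already started" check lives inside each client instead of in one shared flag.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a load report client; not the gRPC class.
final class ReportClient {
  private boolean started;
  private int startCount;

  void startLoadReporting() {
    if (started) {
      return; // Already started: calling again is a no-op.
    }
    started = true;
    startCount++;
  }

  int startCount() {
    return startCount;
  }
}

final class ReportClients {
  private final Map<String, ReportClient> clients = new LinkedHashMap<>();

  ReportClient forServer(String controlPlane) {
    return clients.computeIfAbsent(controlPlane, k -> new ReportClient());
  }

  // No shared flag: every client is asked to start, and clients that are
  // already running simply ignore the call.
  void startAll() {
    for (ReportClient client : clients.values()) {
      client.startLoadReporting();
    }
  }
}
```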
* In `handleRpcStreamClosed()`, move retry handling to before the call to `xdsResponseHandler.handleStreamClosed()` so that TSan doesn't report a race condition that is completely meaningless.
Fixes #9920
When XdsClient learns that a control plane no longer tracks a resource,
it should only notify watchers associated with that control plane.
This matters in control plane federation cases when more than one
control plane is in use.
Introduce an AsyncService interface in the generated code and move the methods from <service>ImplBase to default implementations in the interface.
* update pom files to allow java 1.8
* Add a bindService(<service>Async) method
* Change TestServiceImpl to use the interface and include a bind method instead of extending TestServiceImplBase.
* xds: allow sum of cluster weights above MAX_INT up to max of unsigned int.
* Define nextLong(long bound) method in FakeRandom for WeightedRandomPickerTest.
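A minimal sketch of the weight arithmetic, assuming weights are held in `int` fields but read as unsigned 32-bit values. `WeightedPick` and its methods are hypothetical names. The sum is kept in a `long` so it can exceed `Integer.MAX_VALUE`, which is also why the random point has to be a `long`.

```java
// Hypothetical sketch; weights are ints interpreted as unsigned 32-bit values.
final class WeightedPick {
  static final long MAX_TOTAL = 0xFFFFFFFFL; // Largest unsigned 32-bit value.

  // Sums weights in a long so totals above Integer.MAX_VALUE do not overflow.
  static long totalWeight(int[] weights) {
    long total = 0;
    for (int w : weights) {
      total += Integer.toUnsignedLong(w);
    }
    if (total > MAX_TOTAL) {
      throw new IllegalArgumentException("total weight exceeds uint32 max: " + total);
    }
    return total;
  }

  // Returns the index whose cumulative weight range contains point,
  // where point must be in [0, totalWeight(weights)).
  static int pick(int[] weights, long point) {
    long cumulative = 0;
    for (int i = 0; i < weights.length; i++) {
      cumulative += Integer.toUnsignedLong(weights[i]);
      if (point < cumulative) {
        return i;
      }
    }
    throw new IllegalArgumentException("point out of range: " + point);
  }
}
```

In production code the point could come from something like `ThreadLocalRandom.current().nextLong(total)`; a fake random source in tests needs an equivalent bounded `long` method.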
Fix a bug: when the last watcher of any xDS subscriber for a resource was cancelled, the code accidentally removed the whole resource type from the map. That made the xDS stream stop accepting response updates for that resource type entirely (responses were passed through and no ACK/NACK was sent).
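One plausible shape for the corrected bookkeeping is sketched below. `SubscriberRegistry` is a hypothetical class and the nested map layout is an assumption; the point is that cancelling the last watcher of one resource removes only that resource, and the type entry goes away only once no resource of that type is left.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: resource type -> resource name -> watcher count.
final class SubscriberRegistry {
  private final Map<String, Map<String, Integer>> byType = new HashMap<>();

  void watch(String type, String name) {
    byType.computeIfAbsent(type, k -> new HashMap<>()).merge(name, 1, Integer::sum);
  }

  void cancel(String type, String name) {
    Map<String, Integer> byName = byType.get(type);
    if (byName == null) {
      return;
    }
    // Returning null from the remapping function removes the resource entry.
    byName.computeIfPresent(name, (k, count) -> count == 1 ? null : count - 1);
    // Drop the type only when no resource of that type has watchers left.
    if (byName.isEmpty()) {
      byType.remove(type);
    }
  }

  boolean acceptsResponsesFor(String type) {
    return byType.containsKey(type);
  }
}
```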
Trying to upgrade Gradle to 7.6 brought in an improved checkstyle plugin
that appears to run in cases where it previously did not. That in turn
exposed us to https://github.com/checkstyle/checkstyle/issues/5088. That
bug was fixed in 8.28, which also fixed lots of other bugs. So now we
have better checking, and some existing violations needed fixing. Since
the code style fixes generated a lot of noise, this is a pre-fix to
reduce the size of a Gradle upgrade.
I did not upgrade past 8.28 because at some point some other bugs were
introduced, in particular with the Indentation module. I chose the
oldest version that had the particular bug impacting me fixed. Upgrading
to this old-but-newer version still makes it easier to upgrade to a
newer version in the future.
If an artifact on Maven Central exposes a type from gRPC on its API
surface, then consumers of that artifact need that gRPC API in the
compile classpath. Bazel handles this by making hjars for transitive
dependencies, but if the dependencies are runtime_deps then Bazel won't
generate hjars containing the needed symbols.
We don't export netty-shaded because the classes already don't match
Maven Central. If an artifact on Maven Central is exposing a
netty-shaded class on its API surface, it wouldn't work anyway since the
class simply doesn't exist for the Bazel build.
Fixes #9772
* xds: Disallow duplicate addresses in the RingHashLB.
Removed test that was previously checking for specific expected behavior with duplicate addresses.
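A duplicate check of this kind can be sketched with a `HashSet`. `AddressValidation.firstDuplicate` is a hypothetical helper, and how the real policy reports the error is not shown here.

```java
import java.net.SocketAddress;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
final class AddressValidation {
  // Returns the first address that appears more than once, or null if all
  // addresses are unique.
  static SocketAddress firstDuplicate(List<? extends SocketAddress> addresses) {
    Set<SocketAddress> seen = new HashSet<>();
    for (SocketAddress address : addresses) {
      if (!seen.add(address)) {
        return address;
      }
    }
    return null;
  }
}
```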
This change has these main aspects to it:
1. Removal of any name resolution responsibility from ManagedChannelImpl
2. Creation of a new RetryScheduler to own generic retry logic
- Can also be used outside the name resolution context
3. Creation of a new RetryingNameResolver that can be used to wrap any
polling name resolver to add retry capability
4. A new facility in NameResolver to allow implementations to notify
listeners on the success of name resolution attempts
- RetryingNameResolver relies on this
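The division of responsibilities listed above can be sketched roughly as follows. All three types are hypothetical shapes for illustration, not the actual gRPC interfaces: the wrapper only reacts to the success or failure of an attempt, and the retry policy decides when the retry runs.

```java
// Hypothetical shapes for illustration; not the actual gRPC interfaces.
interface RetryPolicy {
  // Arranges for retry to run later (for example after a backoff delay).
  void schedule(Runnable retry);

  // Clears any backoff state after a successful attempt.
  void reset();
}

interface PollingResolver {
  // Returns true when the resolution attempt succeeded.
  boolean resolve();
}

final class RetryingResolver {
  private final PollingResolver delegate;
  private final RetryPolicy retryPolicy;

  RetryingResolver(PollingResolver delegate, RetryPolicy retryPolicy) {
    this.delegate = delegate;
    this.retryPolicy = retryPolicy;
  }

  // The wrapper needs to learn whether the attempt succeeded, which is why
  // a success notification from the resolver is required.
  void refresh() {
    if (delegate.resolve()) {
      retryPolicy.reset();
    } else {
      retryPolicy.schedule(this::refresh);
    }
  }
}
```

Since `RetryPolicy` knows nothing about name resolution, the same retry logic can be reused in other contexts.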
* xds: Change timer creation logic to wait until the adsStream is ready before creating the timer to mark resources absent.
* xds: When the ads stream is closed only send errors to subscribers that haven't yet gotten results to match spec.
* Use a blocking queue to avoid the 2-second sleep.
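A minimal sketch of the blocking-queue approach, assuming the test only needs to wait for one event from another thread. `EventAwaiter` is a hypothetical helper, not the code in the test.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper; the names are illustrative only.
final class EventAwaiter {
  private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

  // Called by the code under test, possibly on another thread.
  void publish(String event) {
    events.add(event);
  }

  // Returns as soon as an event arrives, or null if the timeout elapses or
  // the waiting thread is interrupted.
  String await(long timeout, TimeUnit unit) {
    try {
      return events.poll(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

Unlike a fixed sleep, the wait ends the moment the event is published, and the timeout only bounds the failure case.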
For some inexplicable reason, the following call.verifyRequest fails only for the V2 test, and only from the command line (not the IDE), unless there is a Thread.sleep, even one of only 1 millisecond.
Fix ConcurrentModificationException in PriorityLoadBalancer by making a copy of the children values to iterate over rather than using children directly in the for loop.
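The snapshot technique looks roughly like this. `Children` is a hypothetical class, not the PriorityLoadBalancer code; it only shows why iterating over a copy is safe when a callback mutates the underlying map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical container for illustration only.
final class Children {
  private final Map<String, String> children = new LinkedHashMap<>();

  void add(String name) {
    children.put(name, name);
  }

  void remove(String name) {
    children.remove(name);
  }

  int size() {
    return children.size();
  }

  // Iterates over a snapshot of the values, so the action may add or remove
  // children without triggering ConcurrentModificationException.
  void forEachChild(Consumer<String> action) {
    for (String child : new ArrayList<>(children.values())) {
      action.accept(child);
    }
  }
}
```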
We use a state-of-the-world approach. For LDS/CDS, the control plane must return all resources that the client has subscribed to in each request. If some LDS/CDS resources are gone in a new update, the watchers of their corresponding RDS/EDS resource names get onAbsent(), unless there is cached data that is in use by other subscribers in other components.
The motivations to remove this "retained resource" logic between resource types are:
1. Already handled by the subscribers, e.g. a CDS state would shut down its childLBs on new updates. XdsResolver for LdsUpdate would cancel all existing RDS subscriptions. Therefore the onAbsent() notification is effectively a no-op.
2. Complexity.
ClusterImplLoadBalancer adds the ATTR_CLUSTER_NAME and
ATTR_SSL_CONTEXT_PROVIDER_SUPPLIER attributes to the EAG list when it
creates a new subchannel, but they are lost on subsequent address
updates. This change ensures the attributes are also included on address
updates.
If a child policy triggers an update to the parent priority policy,
it will be ignored if an update is already in progress.
This is the second attempt to make this change; the first one caused a
problem with the ring hash LB. A new test that uses actual control plane
and data plane servers is now included to prove the issue no longer
appears.
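The re-entrancy guard described above can be sketched like this. `UpdateGuard` is a hypothetical class and the real policy's bookkeeping is likely more involved; the sketch only shows an update being ignored while another one is still running.

```java
// Hypothetical guard for illustration; not the priority LB implementation.
final class UpdateGuard {
  private boolean updateInProgress;
  private int applied;

  // Runs body unless an update is already running. Returns false when the
  // call was ignored.
  boolean tryUpdate(Runnable body) {
    if (updateInProgress) {
      return false;
    }
    updateInProgress = true;
    try {
      body.run();
      applied++;
    } finally {
      updateInProgress = false;
    }
    return true;
  }

  int appliedCount() {
    return applied;
  }
}
```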
This extracts the startup and shutdown code for the control and data
plane servers to separate JUnit rules, which allows this logic to be
reused in other tests in a simple manner. It also makes the test easier
to read with the boilerplate init code removed.
The xDS resources are now dynamically managed in resourceStore in XdsClient. Each type is an XdsResourceType singleton.
There is no longer a hardcoded static list of known resource types; the subscription list is the source of truth.
AbstractXdsClient, which manages the AdsStream, will only accept xDS resource types that already have watchers subscribed, the same behaviour as before.
This fixes a regression in commit e1ad984. I'd create a test, but the
NPE gets thrown away in the context of the current test setup so can't
be created as quickly as we'd like to fix this. I have manually tested
in a custom reproduction to confirm it resolves the NPE.
Seen at b/248326695
```
java.lang.AssertionError: java.lang.NullPointerException
at io.grpc.xds.ClientXdsClient$1.uncaughtException(ClientXdsClient.java:89)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:97)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.xds.ClientXdsClient.cancelXdsResourceWatch(ClientXdsClient.java:327)
at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState$EdsClusterState.shutdown(ClusterResolverLoadBalancer.java:378)
at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState.shutdown(ClusterResolverLoadBalancer.java:206)
at io.grpc.util.GracefulSwitchLoadBalancer.shutdown(GracefulSwitchLoadBalancer.java:195)
at io.grpc.xds.ClusterResolverLoadBalancer.shutdown(ClusterResolverLoadBalancer.java:141)
at io.grpc.xds.CdsLoadBalancer2$CdsLbState.shutdown(CdsLoadBalancer2.java:136)
at io.grpc.xds.CdsLoadBalancer2.shutdown(CdsLoadBalancer2.java:110)
at io.grpc.util.GracefulSwitchLoadBalancer.shutdown(GracefulSwitchLoadBalancer.java:195)
at io.grpc.xds.ClusterManagerLoadBalancer$ChildLbState.shutdown(ClusterManagerLoadBalancer.java:256)
at io.grpc.xds.ClusterManagerLoadBalancer.shutdown(ClusterManagerLoadBalancer.java:138)
at io.grpc.internal.AutoConfiguredLoadBalancerFactory$AutoConfiguredLoadBalancer.shutdown(AutoConfiguredLoadBalancerFactory.java:164)
at io.grpc.internal.ManagedChannelImpl.shutdownNameResolverAndLoadBalancer(ManagedChannelImpl.java:381)
at io.grpc.internal.ManagedChannelImpl.access$8200(ManagedChannelImpl.java:118)
at io.grpc.internal.ManagedChannelImpl$DelayedTransportListener.transportTerminated(ManagedChannelImpl.java:2174)
at io.grpc.internal.DelayedClientTransport$3.run(DelayedClientTransport.java:122)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.internal.ManagedChannelImpl$RealChannel.shutdown(ManagedChannelImpl.java:1057)
at io.grpc.internal.ManagedChannelImpl.shutdown(ManagedChannelImpl.java:817)
at io.grpc.internal.ManagedChannelImpl.shutdownNow(ManagedChannelImpl.java:837)
at io.grpc.internal.ManagedChannelImpl.shutdownNow(ManagedChannelImpl.java:117)
at io.grpc.internal.ForwardingManagedChannel.shutdownNow(ForwardingManagedChannel.java:52)
at io.grpc.internal.ManagedChannelOrphanWrapper.shutdownNow(ManagedChannelOrphanWrapper.java:65)
at io.grpc.testing.integration.GrpclbFallbackTestClient.tearDown(GrpclbFallbackTestClient.java:178)
at io.grpc.testing.integration.GrpclbFallbackTestClient.main(GrpclbFallbackTestClient.java:67)
Caused by: java.lang.NullPointerException
at io.grpc.xds.ClientXdsClient.handleResourceResponse(ClientXdsClient.java:179)
at io.grpc.xds.AbstractXdsClient$AbstractAdsStream.handleRpcResponse(AbstractXdsClient.java:358)
at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1$1.run(AbstractXdsClient.java:511)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
... 26 more
```
* xds: security code refactoring/renaming
1) move certprovider package under security
2) refactor inner Factory into CertProviderClientSslContextProviderFactory and CertProviderServerSslContextProviderFactory
3) Make CertProviderClientSslContextProvider and CertProviderServerSslContextProvider non-public
4) use only public (non package private) types like SslContextProvider (instead of CertProviderClientSslContextProvider etc)
Mainly refactor work to make type specific xds resources generic, e.g.
1. Define abstract class XdsResourceType to be extended by pluggable new resources. It mainly contains the abstract method doParse() to parse unpacked proto messages and produce a ResourceUpdate. The common proto unpacking logic is in the XdsResourceType default method parse().
2. Move the parsing/processing logic to the specific XdsResourceType. Implementing:
XdsListenerResource for LDS
XdsRouteConfigureResource for RDS
XdsClusterResource for CDS
XdsEndpointResource for EDS
3. The XdsResourceTypes are singletons. To process resources for each XdsClient, context is passed in as parameters, defined by XdsResourceType.Args.
4. Watchers use generic APIs to subscribe to resources: watchXdsResource(XdsResourceType, resourceName, watcher). Watcher and ResourceSubscriber become Java generic classes.
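The structure in the list above can be sketched in plain Java. `ResourceKind`, `PortResource`, `ResourceWatcher` and `GenericWatchRegistry` are hypothetical stand-ins, and "unpacking" is reduced to trimming a string; the sketch only shows a shared parse step delegating to a type-specific doParse(), singleton types, and a generic watch API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base type; T is the update produced for watchers.
abstract class ResourceKind<T> {
  abstract T doParse(String unpacked);

  // Common step shared by all kinds; here "unpacking" is just trimming.
  final T parse(String raw) {
    return doParse(raw.trim());
  }
}

// A singleton kind whose update is an Integer.
final class PortResource extends ResourceKind<Integer> {
  static final PortResource INSTANCE = new PortResource();

  private PortResource() {}

  @Override
  Integer doParse(String unpacked) {
    return Integer.valueOf(unpacked);
  }
}

interface ResourceWatcher<T> {
  void onChanged(T update);
}

final class GenericWatchRegistry {
  private final Map<ResourceKind<?>, Map<String, List<ResourceWatcher<?>>>> watchers =
      new HashMap<>();

  <T> void watch(ResourceKind<T> kind, String name, ResourceWatcher<T> watcher) {
    watchers.computeIfAbsent(kind, k -> new HashMap<>())
        .computeIfAbsent(name, k -> new ArrayList<>())
        .add(watcher);
  }

  // The cast is safe because watch() only stores watchers under a kind with
  // the same type parameter.
  @SuppressWarnings("unchecked")
  <T> void deliver(ResourceKind<T> kind, String name, String raw) {
    Map<String, List<ResourceWatcher<?>>> byName = watchers.get(kind);
    if (byName == null) {
      return; // No watcher for this kind: the update is not accepted.
    }
    T update = kind.parse(raw);
    for (ResourceWatcher<?> watcher : byName.getOrDefault(name, new ArrayList<>())) {
      ((ResourceWatcher<T>) watcher).onChanged(update);
    }
  }
}
```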
Removes the option of skipping the update of the priority LB state when
the failover timer is pending.
This consistency facilitates a future change where we delay child LB
status updates if the priority LB is performing an update. The upcoming
priority LB policy gRFC also does not require this update to ever be
skipped.
The log id is an incrementing value starting from 0. That means the same
binary on two different machines will have the same hashes for each
consecutive Channel. That was not at all the intention of CHANNEL_ID.
From gRFC A42:
> This can be used in similar situations to where Envoy uses
> connection_properties to hash on the source IP address.
No logic changes, just cleans up warnings to make spotting real problems easier.
Remove "public" declarations on interfaces
Remove duplicate semicolons (Java lines ending in ";;")
Remove unneeded import
Change non-javadoc comment to not start with "/**"
Remove unneeded explicit type declarations from generics
Fix broken javadoc links
It's been 17 months since the check was introduced, which is plenty for
the migration. Leaving ignoreRefreshNameResolutionCheck() in-place to
let users delete their call sites. We'll remove the method after a few
releases.
Fixes #9409
This fixes builds including dependencies from Maven that use
io.grpc:grpc-services or io.grpc:grpc-xds. It resolves this error:
```
no such target '@io_grpc_grpc_java//services:services': target 'services' not declared in package 'services' defined by services/BUILD.bazel and referenced by '@maven//:io_grpc_grpc_services'
```
Fixes #9419
- Reduce nesting level by using `continue`
- Rearrange the order when it's possible to bail out early
- Add comments explaining what case it is, and the logic behind it
Introduces a new acceptResolvedAddresses() to the LoadBalancer.
This will now be the preferred way to handle addresses from the NameResolver. The existing handleResolvedAddresses() will eventually be deprecated.
The new method returns a boolean based on the LoadBalancer's ability to use the provided addresses. If it does not accept them, false is returned. LoadBalancer implementations using the new method should no longer implement canHandleEmptyAddressListFromNameResolution(), which will eventually be removed, along with handleResolvedAddresses().
Backward compatibility will be maintained so existing load balancers using handleResolvedAddresses() will continue to work.
Additionally the previously deprecated handleResolvedAddressGroups() method is removed.
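A simplified sketch of how the two entry points could coexist. `SimpleBalancer` and its subclasses are hypothetical, addresses are reduced to strings, and the default bridging shown here is an assumption about one way to keep old implementations working, not the actual LoadBalancer code.

```java
import java.util.List;

// Hypothetical base class; addresses are plain strings for brevity.
abstract class SimpleBalancer {
  // Legacy entry point, kept so existing subclasses continue to work.
  void handleResolvedAddresses(List<String> addresses) {}

  // Preferred entry point. By default it forwards to the legacy method and
  // reports the addresses as accepted.
  boolean acceptResolvedAddresses(List<String> addresses) {
    handleResolvedAddresses(addresses);
    return true;
  }
}

// A balancer written against the legacy method only.
final class LegacyBalancer extends SimpleBalancer {
  List<String> seen;

  @Override
  void handleResolvedAddresses(List<String> addresses) {
    seen = addresses;
  }
}

// A balancer written against the new method: it rejects an empty list by
// returning false instead of relying on a separate capability check.
final class NonEmptyOnlyBalancer extends SimpleBalancer {
  private List<String> current;

  @Override
  boolean acceptResolvedAddresses(List<String> addresses) {
    if (addresses.isEmpty()) {
      return false;
    }
    current = addresses;
    return true;
  }

  List<String> current() {
    return current;
  }
}
```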
%s is fairly safe (requires a Formattable to use Locale), so %d is the
main risk item. Places that really didn't need to use String.format()
were converted to plain string concatenation. Logging locations were
generally converted to using the log infrastructure's delayed
formatting, which is generally locale-sensitive but we're okay with
that. That wasn't done in okhttp, however, because Android frequently
doesn't use MessageFormat so we'd lose the parameters. Everywhere else
was explicitly defined to be Locale.US, to be consistent independent of
the default system locale.
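The two conversions described above look roughly like this. `LocaleSafeFormat` and its methods are hypothetical examples, not code from the change.

```java
import java.util.Locale;

// Hypothetical examples of the two conversions.
final class LocaleSafeFormat {
  // Pins the locale so %d renders the same digits regardless of the default
  // system locale.
  static String describe(String host, int port) {
    return String.format(Locale.US, "%s:%d", host, port);
  }

  // Plain concatenation avoids the format machinery entirely; int-to-String
  // conversion does not consult the locale.
  static String describeConcat(String host, int port) {
    return host + ":" + port;
  }
}
```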