The `EdsLoadBalancerProvider` provides `LookasideLb` with no-op callbacks for fallback. (`LookasideLb` will be renamed to `EdsLoadBalancer` in the future; the name is kept for now to produce a cleaner diff.)
- `CdsLoadBalancer` will load `EdsLoadBalancerProvider/LookasideLb` directly, skipping fallback.
- The EDS-only flow is unchanged: it still loads `XdsLoadBalancerProvider/XdsLoadBalancer2`, keeps the current fallback behavior, and produces a horrible error message when both the primary and fallback policies fail.
This change implements a mechanism for printing xDS responses, whose proto messages contain `com.google.protobuf.Any` fields, in a human-readable format.
- Replace XdsComms2 with XdsClientImpl
- Enable/Disable load report stats with `XdsClient` APIs.
Testing strategy:
- Use a real XdsClientImpl for the EDS-only case, because the balancer creates the XdsClient by itself. The state of the XdsClientImpl in the test is then the same as it would be in real use.
- Use a mock XdsClient for the non-EDS-only case, because the XdsClient in the resolved addresses attributes is supposed to be a stateful XdsClient with some pre-existing CDS state, so creating a brand new real XdsClientImpl in the test cannot simulate the same state. In this case, only verify the interaction with the XdsClient APIs.
- Use a `LocalityStoreFactory` to verify the interaction with the `LocalityStore` APIs. However, this cannot cover any interaction with the `Helper` and `LoadStatsStore` inputs of `LocalityStoreFactory.newLocalityStore(Helper, LoadBalancerRegistry, LoadStatsStore)`, so some basic, non-exhaustive tests are added to cover the gap.
The testing strategy is imperfect, but it is a trade-off: load stats reporting is very hard to test here, and `LocalityStore`/real balancing behavior is too much to test exhaustively in `LookasideLb`.
This change will fail application RPCs immediately if XdsClient encounters any error, instead of retrying or silently going into fallback.
There is a possible optimization for the case where the channel is currently READY and the XdsClient stream has just closed due to a connection error: we could keep using the currently available subchannels while retrying. However, this requires the LB to know the semantics of the error status from the XdsClient. This optimization is not worth the effort for now.
Use a timeout to conclude that a resource does not exist in the xDS protocol.
The RDS and EDS protocols are quasi-incremental: a response may not include all of the requested resources that are present on the server side. The way to conclude that a requested resource does not exist is to use a timeout. In Envoy, this timeout is defined as the initial fetch timeout, which is armed when the client starts subscribing to a resource and disarmed when the client receives an update for that resource. In gRPC's implementation, we set this timeout to a constant 15 seconds instead of taking its value from the ConfigSource proto message.
The initial fetch timeout was originally considered unnecessary for LDS and CDS, but gRPC is trying to avoid the temporary inconsistency that can occur when a request and a response race.
Once a resource fetch timer has fired, the corresponding resource is known for sure to be absent. XdsClient uses caches to keep track of which resources are known to be present and which are known to be absent.
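The timer-plus-cache bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the actual XdsClientImpl code: the `ResourceFetchTracker` class, its method names, and the use of a plain `ScheduledExecutorService` are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not the real XdsClientImpl.
final class ResourceFetchTracker {
  private final ScheduledExecutorService scheduler;
  private final long timeoutMillis;
  // One pending fetch timer per subscribed resource that has no verdict yet.
  private final Map<String, ScheduledFuture<?>> timers = new HashMap<>();
  // Caches of resources known to be present or known to be absent.
  private final Set<String> present = new HashSet<>();
  private final Set<String> absent = new HashSet<>();

  ResourceFetchTracker(ScheduledExecutorService scheduler, long timeoutMillis) {
    this.scheduler = scheduler;
    this.timeoutMillis = timeoutMillis;
  }

  // Arms the fetch timer when the client starts subscribing to a resource.
  synchronized void subscribe(final String name) {
    if (present.contains(name) || absent.contains(name) || timers.containsKey(name)) {
      return; // A verdict or a running timer already exists.
    }
    timers.put(name,
        scheduler.schedule(() -> onTimeout(name), timeoutMillis, TimeUnit.MILLISECONDS));
  }

  // Disarms the timer when an update for the resource arrives.
  synchronized void onUpdate(String name) {
    ScheduledFuture<?> timer = timers.remove(name);
    if (timer != null) {
      timer.cancel(false);
    }
    absent.remove(name);
    present.add(name);
  }

  // Timer fired without an update: conclude that the resource does not exist.
  private synchronized void onTimeout(String name) {
    if (timers.remove(name) != null) {
      absent.add(name);
    }
  }

  synchronized boolean isKnownPresent(String name) {
    return present.contains(name);
  }

  synchronized boolean isKnownAbsent(String name) {
    return absent.contains(name);
  }
}
```

With a 15-second timeout, a resource that never shows up in any response would move into the absent cache 15 seconds after the subscription started, while a resource that receives an update first is recorded as present and its timer is cancelled.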
Enables the full flow of the xDS protocol in gRPC. An XdsClient instance is created in XdsNameResolver when it resolves the address for a URI with the "xds-experimental" scheme. XdsClient sends LDS/RDS requests under the hood to discover the service's cluster information for the target URI. The XdsNameResolver then returns a service config containing the cluster information to the channel. A reference to the XdsClient instance is also passed to the channel within the ResolutionResult.
A gRPC channel will only ever be interested in a single Listener, so each RDS request asks for at most one resource. By design, the server is required to always send back the client's newly requested resources, so the client will always receive the RDS resource (if it exists) after the request is sent. Therefore, the client does not need to cache anything.
This change integrates invocation of client side load reporting into XdsClient's implementation:
- Changed the LRS client implementation to follow the LRS design changes. In the new design, the first LRS request contains a single ClusterStats message with cluster_name set to the cluster (AKA CDS cluster) that this LRS client is reporting loads for (the first request carries no stats data). The server then responds with the name of the cluster service (AKA EDS service) to report loads for.
- Implemented newly proposed LRS client API for adding/removing sources of load stats data.
- Implemented XdsClient APIs for initiating/stopping load reporting.
Move helper methods for building xDS protobuf messages into a utility class for code sharing. Tests for gRPC components that use an `XdsClient` instance may want to use these methods as well.
`GracefulSwitchLoadBalancer` was switching based on `LoadBalancerProvider.getPolicyName()`. This turned out to be very awkward when I had to synthesize a policy name for the provider, when what I actually care about is the identity of the LB provider, not necessarily the policy name.
Now `GracefulSwitchLoadBalancer` switches based on the identity of the `LoadBalancer.Factory`, which is simpler.
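As a rough illustration of the idea, the sketch below decides whether to switch by comparing factory instances rather than policy names. The `Balancer` and `Factory` interfaces are simplified, hypothetical stand-ins for `LoadBalancer` and `LoadBalancer.Factory`, the sketch reads "identity" as reference equality (an assumption), and it leaves out the graceful part, where the old balancer stays in use until the new one is ready.

```java
// Simplified, hypothetical stand-ins; not the real GracefulSwitchLoadBalancer.
final class GracefulSwitchSketch {
  interface Balancer {
    void shutdown();
  }

  interface Factory {
    Balancer newLoadBalancer();
  }

  private Factory currentFactory;
  private Balancer currentBalancer;

  // Returns true if a switch happened, false if the same factory was passed again.
  boolean switchTo(Factory newFactory) {
    // Compare factory identity (assumed here to mean reference equality);
    // no policy name is involved in the decision.
    if (newFactory == currentFactory) {
      return false;
    }
    Balancer old = currentBalancer;
    currentFactory = newFactory;
    currentBalancer = newFactory.newLoadBalancer();
    if (old != null) {
      old.shutdown(); // The real balancer defers this until the new one is ready.
    }
    return true;
  }
}
```

A caller that wraps a provider in an ad hoc factory no longer needs to invent a unique policy name for it; passing the same factory instance again is simply a no-op.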
`XdsNameResolver` will eventually send out a CDS config instead of an EDS config, but balancer_name should never be sent out from `XdsNameResolver` anyway.
This PR is mainly to unblock the current staging test.
Support bootstrap files containing multiple xDS servers, each with its own server URI and channel credential options. Multiple xDS servers are provided in case one is not reachable. For now, we only use the first one.
This change also formats the JSON strings in bootstrap-related tests and adds several tests for parsing bootstrap JSON, for completeness.
The implementation of XdsClient is changed to take in a list of xDS servers, but it still only uses the first one.
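The "accept a list, use the first entry" behavior could look roughly like the sketch below. The `ServerInfo` class here is a hypothetical, simplified stand-in for whatever the bootstrap parser produces, and representing channel credentials as a list of type strings is an assumption made for the example.

```java
import java.util.List;

// Hypothetical, simplified model of parsed bootstrap data; for illustration only.
final class BootstrapSketch {
  static final class ServerInfo {
    final String serverUri;
    final List<String> channelCredTypes;

    ServerInfo(String serverUri, List<String> channelCredTypes) {
      this.serverUri = serverUri;
      this.channelCredTypes = channelCredTypes;
    }
  }

  // Takes the full list of xDS servers, but for now only the first one is used.
  static ServerInfo selectServer(List<ServerInfo> servers) {
    if (servers.isEmpty()) {
      throw new IllegalArgumentException("No xDS server provided");
    }
    return servers.get(0);
  }
}
```

Keeping the whole list in the API means that later support for trying the remaining servers would only need to change `selectServer`, not its callers.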
- Contains a `ClusterWatcher` implementation. On a cluster update, `ClusterWatcherImpl` spawns an EDS child balancer if one has not been created yet, then sends an edsConfig built from the ClusterUpdate data to the EDS child balancer.
- `CdsLoadBalancer` reads `XdsClientRef` and `CdsConfig` from `resolvedAddresses`. Based on `resolvedAddresses`, it registers a `ClusterWatcherImpl` with `XdsClient`.
- When `handleResolvedAddresses()` receives a CdsConfig with a different cluster resource name, `CdsLoadBalancer` gracefully switches to the new cluster.
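The watcher behavior in the first bullet can be sketched as below. The `EdsChild` interface, the `Supplier`-based child factory, and passing the EDS config as a plain string are all hypothetical simplifications; the real `ClusterWatcherImpl` works with `ClusterUpdate` objects and actual load balancer instances.

```java
import java.util.function.Supplier;

// Simplified, hypothetical sketch; not the real CdsLoadBalancer code.
final class CdsFlowSketch {
  // Stand-in for the EDS child balancer.
  interface EdsChild {
    void handleEdsConfig(String edsServiceName);
  }

  static final class ClusterWatcherImpl {
    private final Supplier<EdsChild> childFactory;
    private EdsChild edsChild;

    ClusterWatcherImpl(Supplier<EdsChild> childFactory) {
      this.childFactory = childFactory;
    }

    // Called on every cluster update.
    void onClusterChanged(String edsServiceName) {
      if (edsChild == null) {
        // Spawn the EDS child balancer lazily, on the first update only.
        edsChild = childFactory.get();
      }
      // Forward the config derived from the update to the child.
      edsChild.handleEdsConfig(edsServiceName);
    }
  }
}
```

Repeated updates for the same cluster reuse the single child and only push a new config to it, which matches the "if not created" wording above.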