Update LB policy config generation to support ring hash policy as the endpoint-level LB policy.
- Changed the CDS LB policy to accept RING_HASH as the endpoint LB policy from CDS updates. This configuration is directly passed to its child policy (aka, ClusterResolverLoadBalancer) in its config.
- Changed ClusterResolverLoadBalancer to generate different LB configs for its downstream LB policies, depending on the endpoint-level LB policies.
- If the endpoint-level LB policy is ROUND_ROBIN, the downstream LB policy hierarchy is: PriorityLB -> ClusterImplLB -> WeightedTargetLB -> RoundRobinLB
- If the endpoin-level LB policy is RNIG_HASH, the downstream LB policy hierarchy is: PriorityLB -> ClusterImplLB -> RingHashLB.
Currently each subchannel implicitly refreshes the name resolution when its state changes to IDLE or TRANSIENT_FAILURE. That is, this feature is built into subchannel's internal implementation. Although it eliminates the burden of having LB implementations refreshing the resolver when connections to backends are broken, this is gives LB policies no chance to disable or override this refresh (e.g., in some complex load balancing hierarchy like xDS, LB policies may embed a resolver inside for resolving backends so the refreshing resolution operation should be hooked to the resolver embedded in the LB policy instead of the one in Channel).
In order to make this transition smoothly, we add a check to SubchannelImpl that checks if the LoadBalancer has explicitly called Helper.refreshNameResolution for broken subchannels created by it. If not, it logs a warning and do the refresh.
A temporary LoadBalancer.Helper API ignoreRefreshNameResolution() is added to avoid false-positive warnings for xDS that intentionally does not want a refresh. Once the migration is done, this should be deleted.
In the ring hash LB policy, building the ring is computationally heavy. Although using a larger ring can make the RPC distribution closer to the actual weights of hosts, it takes long time to finish the test.
Internally, each test class is expected to finish within 1 minute, while each of the test cases for testing pick distribution takes about 30 sec. By reducing the ring size by a factor of 10, the time spent for those test cases reduce to 1-2 seconds. Now we need larger tolerance for the distribution (three hosts with weights 1:10:100):
- With a ring size of 100000, the 10000 RPCs distribution is close to 91 : 866 : 9043
- With a ring size of 10000, the 10000 RPCs distribution is close to 104 : 808 : 9088
Roughly, this is still acceptable.
Implementation for the ring hash LB policy. A LoadBalancer that provides consistent hashing based load balancing to upstream hosts, with the "Ketama" hashing that maps hosts onto a circle (the "ring") by hashing its addresses. Each request is routed to a host by hashing some property of the request and finding the nearest corresponding host clockwise around the ring. Each host is placed on the ring some number of times proportional to its weight. With the ring partitioned appropriately, the addition or removal of one host from a set of N hosts will affect only 1/N requests.
Fixes inconsistent state of XdsNameResolver with most recently received xDS configurations. The full suite of configurations should always be updated whenever receiving new resource updates, including updates that revoke currently in-use resources. Reference counts for currently in-use clusters should also be cleaned up properly. Otherwise, re-receiving (after being revoked) the same resource can be treated as if the configuration never changed.
Updates TestServiceClient to support creating channel without target port (mostly useful for xDS that uses the channel target as the resource name).
Adds an env var for overriding the TD URI used in cloud-to-prod test.
Initially we were about to support both xx_hash and murmur_hash. But later we decided to support xx_hash only. So for any hashing we are using in xDS, it's going to be xx_hash. Therefore, we do not need to define and carry this configuration around.
Implemented CloudToProdNameResolver, which will be used for DirectPath with URI scheme "google-c2p". The resolver is only a wrapper that delegates name resolution either to DNS or xDS resolver depending on the environment. If it is delegating to the xDS resolver, it will send HTTP requests (to a local HTTP server) to fetch metadata that is used to generate a bootstrap config. The self-generated bootstrap will be used for xDS.
Implement a simple allocation-free xx_hash utility class without using sun.misc.Unsafe. The hash function mainly targets on xDS use case, which is mostly small strings (endpoint address, headers, etc) and primitive types.
In gRPC's use case, string characters need to be treated as ASCIIs to make the produced hash values match other implementations (Envoy, gRPC-Go, C-core, etc) would produce.
The hashing implementation and tests are borrowed from OpenHFT's XxHash implementation (https://github.com/OpenHFT/Zero-Allocation-Hashing/blob/master/src/main/java/net/openhft/hashing/XxHash.java, see commit 658079a50903c32c54f2ab5c86243244b3ac60ed), which is under Apache 2.0 license. For more details, see https://github.com/OpenHFT/Zero-Allocation-Hashing.
The code is made to be in third_party directory with LISENCE and NOTICE files.