* xds: Custom LB configs to support UDPA TypeStruct
The legacy com.github.udpa.udpa.type.v1.TypedStruct proto should be supported in addition to the current com.github.udpa.udpa.type.v1.TypedStruct one.
Co-authored-by: Sergii Tkachenko <hi@sergii.org>
There are still some cases for xdstp processing, but they are to percent
encoding replacement strings. Those seem better to leave running since
it looks like it they could be triggered even with federation disabled
in the bootstrap processing.
Two main incompatibilities existed in the copy of protos in grpc-proto:
no SimpleContext and an Empty method argument was replaced with a
message. "Context" is a very old word for "Metadata" back from the days
before the current gRPC protocol. We don't need that message in
particular, and well-known protos actually works in Protobuf Lite these
days, so we can swap to wrappers.proto's StringValue and don't need to
upstream a change to grpc-proto. The argument problem is fixed just by
changing the type in the Java code.
With the incompatibilities fixed, do a sync from grpc-proto and include
interop-testing.
runUnaryBlockingClosedLoop is failing after 10.3s, which means 5.3s was
probably spent loading the LoadWorker. That means things are likely
indeed slow enough that 5s isn't enough time to wait for the server to
start. A successful execution of runUnaryBlockingClosedLoop takes
12.1 seconds, which again points to general slow execution.
This was observed in the Bazel/Blaze build where io.grpc.util is a
separate target from the rest of core. During the build of a library
SecretRoundRobinLoadBalancerProvider was not on the classpath, and the
library was later included into a binary using grpc-core from Maven
Central which includes SecretRoundRobinLoadBalancerProvider.
```
java.util.ServiceConfigurationError: Provider io.grpc.util.SecretRoundRobinLoadBalancerProvider$Provider could not be instantiated java.lang.ClassCastException: class io.grpc.util.SecretRoundRobinLoadBalancerProvider$Provider cannot be cast to some.app.aaa.aab
```
These APIs were added to NettyServerBuilder for gRFC A8 and A9. They are
important enough that they shouldn't require using the perma-unstable
transport API to access. This change also allows using these methods
with grpc-netty-shaded.
Fixes#8991
Not updating the example WORKSPACE because it doesn't have any
Bazel-enabled build that depends on xds and so doesn't need the
additional repository dependencies.
Fixes#9162
Support for the is_optional logic in Cluster Specifier Plugins:
if unsupported Cluster Specifier Plugin is optional,
don't NACK, and skip any routes that point to it.
WrrLocalityLoadBalancer should remove the locality weights attribute from Resolved addresses after using the information. Not propagating this attribute will make it impossible for another child wrr_locality from working. This is not a supported situation and this change make the failure happen earlier and to be more obvious as to the cause.
remove OrcaOobHelperWrapper layer. Use OrcaOobUtil.updateListener() to set OrcaOobReportListener per each subchannel, not per helper. OrcaOobReportListener is per helper+subchannel unique.
Orca stats are created when creating helper.createSubchannels(), overriding subchannel attributes to store orcaState in the orcaHelper created subchannels
moved orcaMetrics to service project and renamed to MetricRecorder, added internal accessor. Maybe we can combine metricsRecorder and callMetricRecorder, they looks almost the same things.
OrcaServiceImpl depends on metricRecorder, not visa versa.
The transport should be usable with non-`unix:` name resolvers. As long
as the name resolver returns the correct socket address type, things
should work fine.
Instead of providing round robin or least request configurations directly, ClientXdsClient now wraps them in a WRR locality config.
ClusterResolverLoadBalancer passes this configuration directly to PriorityLoadBalancer to use as the endpoint LB policy it provides to ClusterImplLoadBalancer. A new ResolvedAddresses attribute is also set that has all the locality weights. This is needed by WrrLocalityLoadBalancer when it configures WeightedTargetLoadBalancer.
Renames the LegacyLoadBalancerConfigFactory to just LoadBalancerConfigFactory and gives it responsibility for both the legacy and the new LB config mechanism.
The new configuration mechanism is explained in gRFC A52: https://github.com/grpc/proposal/pull/298
* netty: implement UdsNameResolver and UdsNettyChannelProvider
When the scheme is "unix:" we get the UdsNettyChannelProvider to
create a NettyChannelBuilder with DomainSocketAddress type and
other related params needed for UDS sockets
This should fix test failures on aarch64.
```
expected to be less than: 0.0
but was : 0.0
at app//io.grpc.benchmarks.driver.LoadWorkerTest.assertWorkOccurred(LoadWorkerTest.java:198)
at app//io.grpc.benchmarks.driver.LoadWorkerTest.runUnaryBlockingClosedLoop(LoadWorkerTest.java:90)
```
runUnaryBlockingClosedLoop() has been failing but the other tests
suceeding. The failure is complaining that getCount() == 0, which means
no RPCs completed. The slowest successful test has a mean RPC time of
226 ms (the unit was logged incorrectly) and comparing to x86 tests
runUnaryBlockingClosedLoop() is ~2x as slow because it executes first.
So this is probably _barely_ failing and 4 attempts instead of 3 would
be sufficient. While the test tries to wait for 10 RPCs to complete, it
seems likely it is stopping early even for the successful runs on
aarch64. There are 4 concurrent RPCs, so to get 10 RPCs we need to wait
for 3 batches of RPCs to complete which would be 1346 ms (5 loops)
assuming a 452 ms mean latency. Bumping timeout by 10x to give lots of
headroom.
When a problem happens, it will now report back quickly instead of
waiting until the timeout expires. The timeout exception will also
report each RPC's state.
This is to help diagnose aarch64 test failures.
The test appears to be slow because of classloading. The failure cases
were very slow at 14-16 seconds, but looking at other logs it succeeds
after 12 seconds. It is the first test in the class, and the other tests
run much faster. This could be solved with warmup code, but increasing
the RPC deadline is easier.
Two back-to-back failures on aarch64:
https://source.cloud.google.com/results/invocations/c4612a28-d594-42e9-b8ab-12c999690b40/targetshttps://source.cloud.google.com/results/invocations/3d5d1dc2-6b47-493d-b15c-e99458067d73/targets
```
expected to be true
at app//io.grpc.rls.CachingRlsLbClientTest.rls_withCustomRlsChannelServiceConfig(CachingRlsLbClientTest.java:267)
```
And the next run failed on a different line but seems the same cause:
https://source.cloud.google.com/results/invocations/546b83d1-cd26-4b87-8871-a7a06a60dc06/targets
```
expected to be true
at app//io.grpc.rls.CachingRlsLbClientTest.rls_withCustomRlsChannelServiceConfig(CachingRlsLbClientTest.java:273)
```
Reproduced with:
```diff
diff --git a/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java b/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java
index 9fac852fa..631d632eb 100644
--- a/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java
+++ b/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java
@@ -264,6 +264,11 @@ public class CachingRlsLbClientTest {
// initial request
CachedRouteLookupResponse resp = getInSyncContext(routeLookupRequest);
+ try {
+ Thread.sleep(2000);
+ } catch (Exception e) {
+ throw new RuntimeException(e);
+ }
assertThat(resp.isPending()).isTrue();
// server response
```
1. move orca from xds and from service to io.grpc.xds.orca new package
2. keep CallMetricsRecorder and InternalCallMetricsRecorder in service
3. Added APIs for recording utilization/requestCost/cpuUtilization/memoryUtilzation for per-query requests, added internal data structure equivalent to OrcaLoadReport