The GitHub Actions Linux Testing job reports only limited information when unit tests fail: the full stack trace, time consumed, and stderr from child threads are not visible. Add a step that uploads the test report to Artifacts when a test fails. If the tests succeed, no artifacts are uploaded.
We are setting up a fallback test based on TD. The test client is currently compiled in google3, so it must run in a container to have the GRTE dependency. However, the container does not have network commands such as `ip` and `iptables`, so we plan to run those commands outside of the container. To do this, add a new flag `skipNetCmd` that skips the network commands inside the test client.
Protobuf uses Guava 30.1.1, so I upgraded Guava at the same time. This
also required updating rules_jvm_external and reworking the Bazel build.
Protobuf no longer requires bind(), so those calls were dropped. Although
Protobuf's protobuf_deps() brings in rules_jvm_external, and so we don't
need to define it ourselves, it seems better to define it directly and
not depend on transitive deps since we use it directly.
Protobuf now has support for maven_install() by exposing
PROTOBUF_MAVEN_ARTIFACTS, which required reorganizing the WORKSPACE to
use maven_install() after loading protobuf. Protobuf still doesn't
define target overrides for itself so we still maintain those. When
reorganizing the WORKSPACE I noticed http_archive should ideally be
above io_grpc_grpc_java as most users will need it there, so I fixed
that since there was already a lot of other load() reordering.
The `RlsProtoData.RouteLookupConfig` class is out of date:
- Some fields were `long` but are now of `Duration` type.
- Some fields have been deleted.
- The validation of some fields has either changed or was wrong from the beginning.
Overhaul all the fields in the `RlsProtoData.RouteLookupConfig` class based on the spec http://go/grpc-rls-lb-policy-design#heading=h.y3h669gfpown.
Also move the validation logic into JSON parsing rather than the constructor of `RouteLookupConfig`.
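Moving validation out of the constructor and into the JSON parsing step could look roughly like the sketch below. The class `RouteLookupConfigSketch` and its two fields are hypothetical stand-ins rather than the real `RlsProtoData` code, and the specific checks are illustrative, not taken from the spec.

```java
import java.math.BigDecimal;
import java.util.Map;

/** Hypothetical stand-in for a parsed RLS config; not the real RlsProtoData class. */
final class RouteLookupConfigSketch {
  final String lookupService;
  final long lookupServiceTimeoutNanos;

  // The constructor only assigns; all validation happens in fromJson().
  private RouteLookupConfigSketch(String lookupService, long lookupServiceTimeoutNanos) {
    this.lookupService = lookupService;
    this.lookupServiceTimeoutNanos = lookupServiceTimeoutNanos;
  }

  /** Builds a config from a parsed JSON object, rejecting invalid fields. */
  static RouteLookupConfigSketch fromJson(Map<String, ?> json) {
    Object service = json.get("lookupService");
    if (!(service instanceof String) || ((String) service).isEmpty()) {
      throw new IllegalArgumentException("lookupService must be a non-empty string");
    }
    Object timeout = json.get("lookupServiceTimeout");
    if (!(timeout instanceof String)) {
      throw new IllegalArgumentException("lookupServiceTimeout must be a duration string");
    }
    long timeoutNanos = parseDurationNanos((String) timeout);
    if (timeoutNanos <= 0) {
      throw new IllegalArgumentException("lookupServiceTimeout must be positive");
    }
    return new RouteLookupConfigSketch((String) service, timeoutNanos);
  }

  /** Parses a proto3 JSON duration such as "10s" or "1.5s" into nanoseconds. */
  static long parseDurationNanos(String text) {
    if (!text.endsWith("s")) {
      throw new IllegalArgumentException("duration must end with 's': " + text);
    }
    try {
      BigDecimal seconds = new BigDecimal(text.substring(0, text.length() - 1));
      // longValueExact() rejects values with sub-nanosecond precision or overflow.
      return seconds.movePointRight(9).longValueExact();
    } catch (ArithmeticException | NumberFormatException e) {
      throw new IllegalArgumentException("invalid duration: " + text, e);
    }
  }
}
```

With this shape, a caller that constructs the object from an already validated source is not forced through the JSON-specific checks, while malformed JSON input is rejected before any object is created.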
Fix the NPE shown in the following stack trace:
```
Caused by: java.lang.RuntimeException: java.lang.NullPointerException with message: null
at io.grpc.census.CensusStatsModule$ClientTracer.recordFinishedAttempt(CensusStatsModule.java:388) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.census.CensusStatsModule$CallAttemptsTracerFactory.recordFinishedCall(CensusStatsModule.java:525) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.census.CensusStatsModule$CallAttemptsTracerFactory.attemptEnded(CensusStatsModule.java:492) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.census.CensusStatsModule$ClientTracer.streamClosed(CensusStatsModule.java:345) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.internal.StatsTraceContext.streamClosed(StatsTraceContext.java:155) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.closeListener(AbstractClientStream.java:458) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.access$400(AbstractClientStream.java:221) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState$1.run(AbstractClientStream.java:442) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.deframerClosed(AbstractClientStream.java:278) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.Http2ClientStreamTransportState.deframerClosed(Http2ClientStreamTransportState.java:31) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.MessageDeframer.close(MessageDeframer.java:233) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.MessageDeframer.closeWhenComplete(MessageDeframer.java:191) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractStream$TransportState.closeDeframer(AbstractStream.java:200) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:445) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:401) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.inboundTrailersReceived(AbstractClientStream.java:384) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.Http2ClientStreamTransportState.transportTrailersReceived(Http2ClientStreamTransportState.java:183) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientStream$TransportState.transportHeadersReceived(NettyClientStream.java:334) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.onHeadersRead(NettyClientHandler.java:372) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.access$1200(NettyClientHandler.java:91) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler$FrameListener.onHeadersRead(NettyClientHandler.java:934) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
```
The NPE can happen when `ClientCall.Listener.onClose()` and `StatsTraceContext.streamClosed()` (or `ClientStreamListener.closed()`) are invoked concurrently in different threads. Note that `CensusStatsModule$CallAttemptsTracerFactory.attemptEnded()` in the above stack trace would observe `callEnded==true` in such a race condition.
The race between `ClientCall.Listener.onClose()` and `ClientStreamListener.closed()` can happen in the following scenarios:
- The deadline is exceeded but the underlying real stream is [not committed](https://github.com/grpc/grpc-java/blob/v1.42.0/core/src/main/java/io/grpc/internal/RetriableStream.java#L486-L495), so the `ClientCall.Listener` may be closed earlier than the stream listener. (This is the case in the above stack trace, in which the inbound end-of-stream is received while observing `callEnded==true`. Even if nothing inbound is received, there is still a chance that the NPE can happen.)
- `DelayedClientTransport.PendingStream` has created a realStream but `setStream(realStream)` is [not called yet](https://github.com/grpc/grpc-java/blob/v1.42.0/core/src/main/java/io/grpc/internal/DelayedClientTransport.java#L366-L372) when the deadline is exceeded. (This is unlikely; it can happen only for the very first RPC on the channel.)
- Hedging case.
In deadline-exceeded cases, the shorter the deadline is, the more likely the race is to happen.
Implement applying `server_listener_resource_name_template` and `client_listener_resource_name_template` with the xdstp scheme: extract the authority from the xdstp resource URI and look it up in the authorities map in the bootstrap.
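The template substitution and authority lookup can be sketched as below. The class `XdstpListenerNames`, the `Map<String, String>` shape of the authorities map, and the example host names are assumptions made for illustration; the sketch also skips any escaping of the substituted value and is not the grpc-xds implementation.

```java
import java.net.URI;
import java.util.Map;

/** Hypothetical helper illustrating xdstp listener names; not the grpc-xds code. */
final class XdstpListenerNames {
  private static final String XDSTP_PREFIX = "xdstp:";

  private XdstpListenerNames() {}

  /** Substitutes the listener-specific part into the bootstrap template. */
  static String applyTemplate(String template, String name) {
    return template.replace("%s", name);
  }

  /**
   * Returns the authority of an xdstp resource name, the empty string if the
   * URI has no authority, or null if the name does not use the xdstp scheme.
   */
  static String extractAuthority(String resourceName) {
    if (!resourceName.startsWith(XDSTP_PREFIX)) {
      return null;
    }
    String authority = URI.create(resourceName).getAuthority();
    return authority == null ? "" : authority;
  }

  /** Picks the server for a resource: the authority's entry, or the default server. */
  static String serverFor(
      String resourceName, Map<String, String> authorities, String defaultServer) {
    String authority = extractAuthority(resourceName);
    if (authority == null) {
      return defaultServer; // old-style name, no authority to look up
    }
    String server = authorities.get(authority);
    if (server == null) {
      throw new IllegalArgumentException("authority not in bootstrap: " + authority);
    }
    return server;
  }
}
```

Returning null for names without the xdstp scheme keeps old-style resource names on the default server, so only xdstp names ever consult the authorities map.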
As documented in https://developers.google.com/protocol-buffers/docs/proto3#json,
the canonical proto-to-JSON converter converts int64 (Java long) values to JSON strings rather than JSON numbers (Java Double). Conversely, either a JSON string or a JSON number is accepted when converting to an int64 proto value.
To better support service configs defined by protobuf messages, support parsing String values as numbers in `JsonUtil`.
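A minimal sketch of such lenient number parsing follows. The class `LenientJsonNumbers` and its method are hypothetical and only illustrate the idea; they are not the actual `JsonUtil` code, and they assume parsed JSON numbers arrive as `Double`.

```java
import java.util.Map;

/** Hypothetical helper; not the actual io.grpc.internal.JsonUtil code. */
final class LenientJsonNumbers {
  private LenientJsonNumbers() {}

  /**
   * Returns the value under {@code key} as a Long, or null if it is absent.
   * Accepts a JSON number (Double) or a JSON string holding an integer.
   */
  static Long getNumberAsLong(Map<String, ?> obj, String key) {
    Object value = obj.get(key);
    if (value == null) {
      return null;
    }
    if (value instanceof Double) {
      double d = (Double) value;
      long l = (long) d;
      if (l != d) {
        throw new IllegalArgumentException("not an integer: " + key + "=" + value);
      }
      return l;
    }
    if (value instanceof String) {
      try {
        // Strings preserve int64 values that a Double cannot represent exactly.
        return Long.parseLong((String) value);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("not an integer: " + key + "=" + value, e);
      }
    }
    throw new IllegalArgumentException("not a number or string: " + key + "=" + value);
  }
}
```

The string branch matters for values above 2^53, such as 9007199254740993, which cannot be represented exactly as a `Double` and therefore only survive the round trip when encoded as a string.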
This introduces new TLS 1.2 cipher suites (#8610) and prepares the
internal okhttp implementation for TLS 1.3. A new method for creating
an internal ConnectionSpec was added so that the newly introduced
cipher suites can be used in the OkHttpChannelBuilder. The okhttp cipher
suites were synchronized with the ones from netty.
DirectPath is going to support non-default service accounts. This commit
allows users to pass CallCredentials to GoogleDefaultChannelCredentials.
See design in go/directpath-file-credential-google-default-creds
- Partially revert the change of RlsProtoData.java in #8612 by removing the `public` access modifier
- Have grpc-xds no longer strongly depend on grpc-rls. Applications will need grpc-rls as a runtime dependency if they need the route lookup feature in xds.
- Parse RouteLookupServiceClusterSpecifierPlugin config to the Json/Map representation of `io.grpc.lookup.v1.RouteLookupClusterSpecifier` instead of `io.grpc.rls.RlsProtoData.RouteLookupConfig`
Fix bugs:
1. For an invalid resource at xdsClient, the watcher should have been delivered an error instead of resource-not-found.
2. If the resource is properly determined to not exist, it shouldn't cause start() to fail. From A36 xDS for Servers:
"XdsServer's start must not fail due to transient xDS issues, like missing xDS configuration from the xDS server."
The addition of the authz tests in 0d345721 is causing the tests to
exceed their timeout. By itself, the authz test takes about an hour in
this environment. Before the authz tests, xds-k8s was taking an hour
and a half.
Generating a uuid in filterChain breaks the de-duplication detection, which causes XdsServer to cycle connections, so remove it.
An empty name is now allowed. The name is currently used only for debugging purposes.
Add AbstractXdsInteropTest and XdsTestControlPlaneService, with only a ping-pong test case in the initial implementation.
AbstractXdsInteropTest sets up the test control plane and creates the xdsClient and xdsServer using a bootstrap override. A test case extending AbstractXdsInteropTest is supposed to override the control plane config and run the verification.
XdsTestControlPlaneService only has static xds configurations and is not able to keep state.
How to run:
./gradlew :grpc-interop-testing:installDist -PskipCodegen=true
./interop-testing/build/install/grpc-interop-testing/bin/xds-e2e-test-client
Addresses a problem where we initially resolve addresses only for the backends, but not the load balancer, and then later resolve addresses for both. In this situation the fallback timer was started on the second resolution, even though the timer would later fail because we were already using the fallback backends.
This change ensures that a fallback timer is only ever started if we are not already using the fallback backends.
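The guard described above can be modeled with a small state sketch. `FallbackTimerSketch` and its fields are hypothetical names used only for illustration; a counter replaces the real scheduled timer, and this is not the actual grpclb code.

```java
/** Hypothetical model of the fallback decision; not the actual grpclb code. */
final class FallbackTimerSketch {
  private boolean usingFallbackBackends;
  private boolean fallbackTimerPending;
  int timersStarted; // counts timer starts so the behavior can be observed

  /** Called on every name resolution update. */
  void handleResolvedAddresses(boolean hasBalancerAddresses) {
    if (!hasBalancerAddresses) {
      // No balancer to talk to: go straight to the fallback backends.
      fallbackTimerPending = false;
      usingFallbackBackends = true;
      return;
    }
    // A balancer is available. Arm the timer only if we are not in fallback already.
    if (!usingFallbackBackends && !fallbackTimerPending) {
      fallbackTimerPending = true;
      timersStarted++;
    }
  }

  boolean isUsingFallbackBackends() {
    return usingFallbackBackends;
  }
}
```

In this model, a backends-only resolution followed by a resolution that includes the balancer leaves `timersStarted` at zero, which is the sequence the change addresses.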
This is a follow-up fix to #8253.
The previous attempt at this CL relied on Guava's Hashing class, which
is still in beta. This update compares Signature objects directly instead
of SHA256 hashes, removing the need for the Hashing class.
Add additional comments to the security policy class, to mention that
implementing new policies requires significant care.
With that in mind, add security policies to check the peer app's
signature, so people can create cross-app communication without
having to implement their own policy.
Finally, add the UntrustedSecurityPolicies class, since that's
inevitably a policy which is sometimes needed.