Android Gradle plugin bumped to 4.2.0 in examples, for Gradle 7 compat
and to match main build.
Jib 3 changed default base image away from distroless, but we do want
to use distroless.
Ring hash can only be used from within xds currently, because that's
the only way to get a hash assigned to RPCs which is required for it
to function. So it should be using the _experimental suffix like the
other only-used-from-xds policies.
These changes make the build compatible with Gradle 7, except for
Android which requires plugin updates.
I removed animalsniffer from binder because it did nothing (as there
were no signatures) and it was failing after setting toolVersion. It
failed because animalsniffer is only compatible with java plugin. After
this change I put the withId(animalsniffer) loading inside the
withId(java) to avoid a plugin ordering failure. That made it safe again
for binder to load animalsniffer, but it is still best to remove the
plugin from binder as it is misleading.
I did not upgrade Android plugin versions as newer versions (even 3.6)
require dealing with androidx (#8421).
- bump android plugin version to 4.2.0
- migrate deprecated android.support dependencies to androidx dependencies
- bump `targetSdkVersion` to 29
- temporarily ignore lint error for 'MissingClass' due to #8799
- run android CIs with `-Pandroid.useAndroidX=true -Pandroid.enableJetifier=true` flags
- android examples are still using android.support dependencies, will not be updated in this PR.
This is to keep names of the top-level process* functions called from
handle*Response functions, and returning *Update resources consistent:
- `handleLdsResponse()` -> `LdsUpdate processClientSideListener()`
`LdsUpdate processServerSideListener()`
- `handleCdsResponse()` -> `CdsUpdate processCluster()`
- `handleRdsResponse()` -> `RdsUpdate processRouteConfiguration()`
- `handleEdsResponse()` -> `EdsUpdate processClusterLoadAssignment()`
For some reason, processCluster() was renamed to parseCluster() in
fa4b980e0.
The GitHub Actions Linux Testing only reports limited information (can not see full stacktrace, time consumed, or stderr from child threads) when unit tests fail. Adding a step to upload the test report to Artifacts if the test fails. If the test is successful, no artifacts will be uploaded.
We are setting up fallback test based on TD. Currently the test client is compiled in google3, so we must run it in a container so that the client can have the GRTE dependency. However, container does not have `ip`, `iptables`, etc network command, so we plan to run the network command outside of the container. To do this, add a new flag `skipNetCmd` to skip network commands inside the test client.
Protobuf uses Guava 30.1.1, so I upgrade it at the same time. It also
caused an update to rules_jvm_external and reworking the Bazel build.
Protobuf no longer requires bind() so they were dropped. Although
Protobuf's protobuf_deps() brings in rules_jvm_external, and so we don't
need to define it ourselves, it seems better to define it directly and
not depend on transitive deps since we use it directly.
Protobuf now has support for maven_install() by exposing
PROTOBUF_MAVEN_ARTIFACTS, which required reorganizing the WORKSPACE to
use maven_install() after loading protobuf. Protobuf still doesn't
define target overrides for itself so we still maintain those. When
reorganizing the WORKSPACE I noticed http_archive should ideally be
above io_grpc_grpc_java as most users will need it there, so I fixed
that since there were lots of other load()-reordering already.
The `RlsProtoData.RouteLookupConfig` class is out-of-date.
- Some of the fields were long, but now are of `Duration` type.
- Some of the fields are deleted.
- The validation of some of the fields either have been changed or were wrong since beginning.
Now overhaul all the fields in `RlsProtoData.RouteLookupConfig` class based on the spec http://go/grpc-rls-lb-policy-design#heading=h.y3h669gfpown.
Also move the validation logic in json parsing rather than in the constructor of `RouteLookupConfig`.
Fix the NPE as shown in the following stacktrace:
```
Caused by: java.lang.RuntimeException: java.lang.NullPointerException with message: null
at io.grpc.census.CensusStatsModule$ClientTracer.recordFinishedAttempt(CensusStatsModule.java:388) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.census.CensusStatsModule$CallAttemptsTracerFactory.recordFinishedCall(CensusStatsModule.java:525) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.census.CensusStatsModule$CallAttemptsTracerFactory.attemptEnded(CensusStatsModule.java:492) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.census.CensusStatsModule$ClientTracer.streamClosed(CensusStatsModule.java:345) ~[grpc-census-1.42.0.jar:1.42.0]
at io.grpc.internal.StatsTraceContext.streamClosed(StatsTraceContext.java:155) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.closeListener(AbstractClientStream.java:458) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.access$400(AbstractClientStream.java:221) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState$1.run(AbstractClientStream.java:442) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.deframerClosed(AbstractClientStream.java:278) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.Http2ClientStreamTransportState.deframerClosed(Http2ClientStreamTransportState.java:31) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.MessageDeframer.close(MessageDeframer.java:233) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.MessageDeframer.closeWhenComplete(MessageDeframer.java:191) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractStream$TransportState.closeDeframer(AbstractStream.java:200) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:445) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:401) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.AbstractClientStream$TransportState.inboundTrailersReceived(AbstractClientStream.java:384) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.internal.Http2ClientStreamTransportState.transportTrailersReceived(Http2ClientStreamTransportState.java:183) ~[grpc-core-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientStream$TransportState.transportHeadersReceived(NettyClientStream.java:334) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.onHeadersRead(NettyClientHandler.java:372) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler.access$1200(NettyClientHandler.java:91) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
at io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler$FrameListener.onHeadersRead(NettyClientHandler.java:934) ~[grpc-netty-shaded-1.42.0.jar:1.42.0]
```
The NPE can happen when `ClientCall.Listener.onClose()` and `StatsTraceContext.streamClosed()` (or `ClientStreamListener.closed()`) are invoked concurrently in different threads. Note that `CensusStatsModule$CallAttemptsTracerFactory.attemptEnded()` in the above stack trace would observe `callEnded==true` in such a race condition.
The following are the possible scenarios that the race between `ClientCall.Listener.onClose()` and `ClientStreamListener.closed()` can happen:
- Deadline exceeded but the underlying real stream is [not committed](https://github.com/grpc/grpc-java/blob/v1.42.0/core/src/main/java/io/grpc/internal/RetriableStream.java#L486-L495), the `ClientCall.Listener` may be closed earlier than the stream listener. (This is the case of the above stack trace, in which the inbound end-of-stream is received observing `callEnded==true`. Even if nothing inbound is received, there is still a chance that the NPE can happen.)
- DelayedClientTransport.PendingStream has created a realStream but `setStream(realStream)` is [not called yet](https://github.com/grpc/grpc-java/blob/v1.42.0/core/src/main/java/io/grpc/internal/DelayedClientTransport.java#L366-L372), when deadline exceeded. (This has little chance to happen, only for the very first RPC on the channel.)
- Hedging case.
In deadline-exceeded cases, the shorter the deadline is, the more likely the race can happen.
Implement applying `server_listener_resource_name_template` and `client_listener_resource_name_template` with xdstp scheme, extracting authorities from xdstp resource URI and lookup authorities map in bootstrap.
As documented in https://developers.google.com/protocol-buffers/docs/proto3#json,
the canonical proto-to-json converter converts int64 (Java long) values to string values in Json rather than Json numbers (Java Double). Conversely, either Json string value or number value are accepted to be converted to int64 proto value.
To better support service configs defined by protobuf messages, support parsing String values as numbers in `JsonUtil`.
This introduces new TLS 1.2 cipher suites (#8610) and prepares the
internal okhttp implementation for TLS1.3. A new method for creating
internal ConnectionSpec was added to be able to use the newly introduced
cipher suites in the OkHttpChannelBuilder. Okhttp cipher suites
synchronized with the ones from netty.
DirectPath is going to support non-default service account. This commit
allows users to pass CallCredentials to GoogleDefaultChannelCredentials.
See design in go/directpath-file-credential-google-default-creds
- Partially revert the change of RlsProtoData.java in #8612 by removing `public` accessor
- Have grpc-xds no longer strongly depend on grpc-rls. The application will need grpc-rls as runtime dependencies if they need route lookup feature in xds.
- Parse RouteLookupServiceClusterSpecifierPlugin config to the Json/Map representation of `io.grpc.lookup.v1.RouteLookupClusterSpecifier` instead of `io.grpc.rls.RlsProtoData.RouteLookupConfig`
Fix bugs:
1. Invalid resource at xdsClient, the watcher should have been delivered an error instead of resource not found.
2. If the resource is properly determined to not exist, it shouldn't cause start() to fail. From A36 xDS for Servers:
"XdsServer's start must not fail due to transient xDS issues, like missing xDS configuration from the xDS server."
The addition of the authz tests in 0d345721 is causing the tests to
exceed their timeout. By itself, the authz test takes about an hour in
this environment. Before the authz tests, xds-k8s was taking an hour
and a half.