All BinderTransport transactions are oneway, which means uncaught
exceptions during processing are merely logged locally and not
propagated to the peer. Instead, we add a top-level catch block
that handles the unexpected by shutting down the whole transport. This
makes our peer aware of the problem immediately (instead of relying on a
deadline) and gives clients a fresh transport instance to handle any
retries.
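A minimal sketch of that pattern, with hypothetical handler and shutdown method names rather than the real BinderTransport internals:
```java
import io.grpc.Status;

/**
 * Sketch only; the class and method names here (OnewayTransactionHandler,
 * handleTransaction, shutdownTransport) are illustrative assumptions, not the
 * real BinderTransport internals.
 */
abstract class OnewayTransactionHandler {
  /** Per-transaction processing; may throw unexpectedly. */
  abstract void handleTransaction(int code, byte[] payload);

  /** Tears down the whole transport, which the peer observes immediately. */
  abstract void shutdownTransport(Status reason);

  final void onTransaction(int code, byte[] payload) {
    try {
      handleTransaction(code, payload);
    } catch (RuntimeException | Error e) {
      // A oneway transaction cannot report this failure to the caller, so fail
      // the transport; clients then retry on a fresh transport instead of
      // waiting for a deadline to expire.
      shutdownTransport(
          Status.INTERNAL
              .withDescription("Uncaught exception while processing transaction")
              .withCause(e));
    }
  }
}
```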
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate. Based on the field used to configure load balancing in
the xDS `Cluster` proto, it is either:
1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used
`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.
To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and make sure that if the child policy
changes, it switches to the correct LB implementation.
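Roughly, the switching looks like the sketch below, which uses the public `GracefulSwitchLoadBalancer` API; the wrapper class and names like `onAddressUpdate`/`childPolicyName` are illustrative assumptions, not the actual `ClusterImplLoadBalancer` wiring:
```java
import io.grpc.LoadBalancer;
import io.grpc.LoadBalancerProvider;
import io.grpc.LoadBalancerRegistry;
import io.grpc.util.GracefulSwitchLoadBalancer;

// Rough sketch of the switching idea only.
final class ChildPolicySwitcher {
  private final GracefulSwitchLoadBalancer switchLb;
  private String childPolicyName;  // e.g. wrr_locality vs. weighted_target

  ChildPolicySwitcher(LoadBalancer.Helper helper) {
    this.switchLb = new GracefulSwitchLoadBalancer(helper);
  }

  void onAddressUpdate(String newChildPolicyName, LoadBalancer.ResolvedAddresses addresses) {
    if (!newChildPolicyName.equals(childPolicyName)) {
      // Switch the child gracefully instead of handing a config for one policy
      // type to a child of the other type, which previously caused a
      // ClassCastException and a channel panic.
      LoadBalancerProvider provider =
          LoadBalancerRegistry.getDefaultRegistry().getProvider(newChildPolicyName);
      switchLb.switchTo(provider);
      childPolicyName = newChildPolicyName;
    }
    switchLb.handleResolvedAddresses(addresses);
  }
}
```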
The internal build rule for this client already adds these dependencies, and it is what the related tests for these dependencies have been relying on for a while.
We're now starting to use the per-release published interop client Docker images for those tests too, and those images use the Gradle build rules.
The xDS server can take a really long time to start if the xDS resources
are slow to load. Ideally the management server would be available
during this time so we can inspect the server. The server health still
won't go to SERVING until the xDS server starts, which is appropriate.
There's still plenty more that could be done, but I want to keep this on
the simpler/less-invasive side; doing more would just delay these
changes for no real benefit.
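For illustration, this behavior can be observed with the standard gRPC health service. The probe sketch below assumes the server registers that service; the target is a placeholder:
```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthCheckResponse;
import io.grpc.health.v1.HealthGrpc;

// Simple probe against the standard gRPC health service (target is a placeholder).
public final class ServerHealthProbe {
  public static void main(String[] args) throws Exception {
    ManagedChannel channel =
        ManagedChannelBuilder.forTarget("localhost:50051").usePlaintext().build();
    try {
      HealthCheckResponse response =
          HealthGrpc.newBlockingStub(channel)
              .check(HealthCheckRequest.newBuilder().setService("").build());
      // Stays NOT_SERVING until the xDS resources have loaded and serving begins.
      System.out.println("health: " + response.getStatus());
    } finally {
      channel.shutdownNow();
    }
  }
}
```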
Previously builds were done with Ubuntu 16.04, and now we are using
18.04. Thus the generated binaries will no longer work for
Ubuntu 16.04 and Debian 9 users, both of which are outside of their
support window and aren't supported by Abseil. RHEL users are
unaffected, as the binaries already didn't work on RHEL 7 and will
continue to work on RHEL 8. FWIW, Ubuntu 18.04 will leave its support
window in June.
The point of the sorting is to reduce the chances of merge conflicts. I
greatly prefer verboseness over cleverness in examples, but the tasks
can only be sorted manually and there are so many of them.
It is counter-productive to do this for the examples that have their own
project folder, as there are so few tasks in that case that they don't
need to be ordered.
The version used by protoc-gen-grpc-java will be upgraded separately
because of the large C++ build changes necessary, but that won't impact
users at all. We are upgrading to protoc 22.3; only the gRPC plugin is
not upgraded.
Bazel is upgraded for both Java and C++.
If a child load balancer rejects the addresses it is given, all we can
do is trigger a name resolution refresh and hope for a better set of
addresses.
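A minimal sketch of that handling, assuming the boolean `acceptResolvedAddresses` API and an illustrative wrapper class rather than the actual grpc-java code:
```java
import io.grpc.LoadBalancer;

// Minimal sketch (not the actual grpc-java code).
final class RefreshOnRejection {
  private final LoadBalancer.Helper helper;
  private final LoadBalancer childLb;

  RefreshOnRejection(LoadBalancer.Helper helper, LoadBalancer childLb) {
    this.helper = helper;
    this.childLb = childLb;
  }

  boolean acceptResolvedAddresses(LoadBalancer.ResolvedAddresses addresses) {
    boolean accepted = childLb.acceptResolvedAddresses(addresses);
    if (!accepted) {
      // Nothing better to do with a rejected address list: ask the resolver
      // for a refresh and hope the next update is usable.
      helper.refreshNameResolution();
    }
    return accepted;
  }
}
```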
This removes some steps from the release process. These two locations
aren't special enough to deserve manually changing the version each
release.
This will replace the Kokoro-based CI. We need to upgrade the image
Kokoro runs on, but it is easier to add a GitHub Actions CI than to run
a one-off test on a new image with Kokoro.
A clean run takes 10 minutes and a cached run takes 8.5 minutes, so it
is a little slower than the 5.5 minutes on Kokoro, but still pretty
quick.
The problem was that one hedge was committed before another had drained
start(). This was not testable because HedgingRunnable checks whether
scheduledHedgingRef is cancelled, which is racy, and there's no way to
deterministically trigger either race.
The same problem couldn't be triggered with retries because only one
attempt will be draining at a time. Retries with cancellation also
couldn't trigger it, for the surprising reason that the noop stream used
in cancel() wasn't considered drained.
This commit marks the noop stream as drained with cancel(), which allows
memory to be garbage collected sooner and exposes the race for tests.
That then showed the stream as hanging, because inFlightSubStreams
wasn't being decremented.
Fixes #9185
This flag is added in the U SDK, which is still under development. Since it's just a numeric constant, we copy the value until it is stable and mark the API as experimental, with appropriate warnings about depending on it from production code.
A follow-up change will be made after SDK finalization to point to the official constant (or otherwise update to match any SDK changes), at which point we can remove the `@ExternalApi` annotation.
See b/274061424
There was recently a failure with the Tomcat test in servlet/jakarta:
```
io.grpc.servlet.jakarta.TomcatInteropTest > pingPong FAILED
java.lang.AssertionError at AbstractInteropTest.java:845
Caused by: io.grpc.StatusRuntimeException at Status.java:539
...
* What went wrong:
Execution failed for task ':grpc-servlet-jakarta:tomcat10Test'.
> There were failing tests. See the report at: file:///home/runner/work/grpc-java/grpc-java/servlet/jakarta/build/reports/tests/tomcat10Test/index.html
```
But we couldn't get more details because servlet/jakarta didn't match
the artifact glob.
LoadWorkerTest.runUnaryBlockingClosedLoop and Http2NettyTest.tlsInfo are
failing every CI run. It appears they are the unfortunate tests run
first, so they are slowest to start as classloading proceeds. There are
definitely other tests that probably need adjustment, but fixing these
two gives us some hope of having a green run occasionally.
* Removed populating the monitored resource as k8s_container by default for logging; resource detection is instead delegated to the Cloud Logging library (enabled by default)
* Removed the Kubernetes resource detection logic from observability