Triggering balancing state updates after already being shutdown can be confusing for the upstream of round_robin. In cases of the callers not managing round_robin's lifecycle (e.g., not ignoring updates after it shuts down round_robin, which it should), it can make problem very bad, especially with the behavior that round_robin is actually propagating TRANSIENT_FAILURE with a picker that buffers RPCs.
This change only polishes round_robin by always preserving its invariant. Callers/LBs should not rely on this and should still manage the balancing updates from its downstream correctly based on the downstream's lifetime.
Currently each subchannel implicitly refreshes the name resolution when its state changes to IDLE or TRANSIENT_FAILURE. That is, this feature is built into subchannel's internal implementation. Although it eliminates the burden of having LB implementations refreshing the resolver when connections to backends are broken, this is gives LB policies no chance to disable or override this refresh (e.g., in some complex load balancing hierarchy like xDS, LB policies may embed a resolver inside for resolving backends so the refreshing resolution operation should be hooked to the resolver embedded in the LB policy instead of the one in Channel).
In order to make this transition smoothly, we add a check to SubchannelImpl that checks if the LoadBalancer has explicitly called Helper.refreshNameResolution for broken subchannels created by it. If not, it logs a warning and do the refresh.
A temporary LoadBalancer.Helper API ignoreRefreshNameResolution() is added to avoid false-positive warnings for xDS that intentionally does not want a refresh. Once the migration is done, this should be deleted.
Fix the following bug:
ManagedChannelImpl.ConfigSelectingClientCall may return early in start() leaving delegate null, and fails request() method after start().
Currently the bug can only be triggered when using xDS.
ManagedChannelImpl should not override authority for createResolvingOobChannel(target, creds), because ManagedChannelImpl does not know what target and creds are.
Enhance `ManagedChannelBuilder.overrideAuthority()` to make it impossible to use a different authority to a backend by wrapping ClientTransportFactory.newClientTransport() and setting ClientTransportOptions’ authority. To avoid confusing the LB policy, it would need to keep the original addresses to return during `Subchannel.getAddresses()`
The class `OverrideAuthorityNameResolverFactory` is deleted and its logic is moved into `ManagedChannelImpl`.
- Add APIs to `ClientTransportFactory`:
```java
public interface ClientTransportFactory {
/**
* Swaps to a new ChannelCredentials with all other settings unchanged. Returns null if the
* ChannelCredentials is not supported by the current ClientTransportFactory settings.
*/
SwapChannelCredentialsResult swapChannelCredentials(ChannelCredentials channelCreds);
final class SwapChannelCredentialsResult {
final ClientTransportFactory transportFactory;
@Nullable final CallCredentials callCredentials;
}
}
```
- Add `ChannelCredentials` to constructor args of `ManagedChannelImplBuilder`:
```java
public ManagedChannelImplBuilder(
String target, @Nullable ChannelCredentials channelCreds, @Nullable CallCredentials callCreds, ...)
```
Improve the CallCredentialsApplyingTransport shutdown lifecycle management. Right now CallCredentialsApplyingTransport shutdown the delegated real transport too early. It should be waiting for the metadataAppliers to finish because they may execute asynchronously. In addition, there is no shutdown check on CallCredentialsApplyingTransport for newStream(). The degraded lifecycle implementation may cause RejectionExecutionException, or accepting new RPCs after the underlying transport is already closed during channel shutdown.
We added listener on metadataApplier to notify completion, a magic counter to track the pending metadataApplier for delaying shutdown, also added shutdown check for newStream().
This provides us a path forward with #7211 (hiding
AbstractManagedChannelImplBuilder and AbstractServerImplBuilder) while
providing users a migration path to manage the ABI breakage (#7552). We
do a .class hack so that recompiling avoids the internal class reference
yet the old methods are still available.
Leaving the classes as-is causes javac to compile two versions of each
method, one returning the public class (e.g. ServerBuilder) and one
returning the internal class (e.g., AbstractServerImplBuilder). However,
we rewrite the signature that is used at compile time so that new
compilations will not reference internal-returning methods.
This is intended to be temporary, just to give a migration path. Once we
have given users some time to recompile we will remove this rewriting
and change the generics to use public classes.
DelayedClientTransport needs to avoid becoming terminated while it owns
RPCs. Previously DelayedClientTransport could terminate when some of its
RPCs had their realStream but realStream.start() hadn't yet been called.
To avoid that, we now make sure to call realStream.start()
synchronously with setting realStream. Since start() and the method
calls before start execute quickly, we can run it in-line. But it does
mean we now need to split the Stream methods into "before start" and
"after start" categories for queuing.
Fixes#6283
Throw for subchannel creation if the channel is being shutting down and the delayed transport is terminated (aka, all retry calls has been finished). This enforces load balancer implementations to avoid creating subchannels after being shut down.
Change InternalServer to handle multiple addresses and implemented in NettyServer.
It makes ServerImpl to have a single transport server, and this single transport server (NettyServer) will bind to all listening addresses during bootstrap. (#7674)
There was a report from a user in b/176088054 that experienced an RPC
waiting_for_connection that failed after 45 minutes due to deadline.
That would be quite normal for wait-for-ready, but because we didn't
include that detail in the status we have to do code inspection to
determine if wait-for-ready was enabled.
API change (See go/grpc-rls-callcreds-to-server):
- Add `ChannelCredentials.withoutBearerTokens()`
- Add `createResolvingOobChannelBuilder(String, ChannelCredentials)`, `getChannelCredentials()` and `getUnsafeChannelCredentials()` for `LoadBalancer.Helper`
This PR does not include the implementation of `createResolvingOobChannelBuilder(String, ChannelCredentials)`.
The addition of CompositeChannelCredentials allowed CallCredentials to
be passed to the ManagedChannel itself. But the implementation was buggy
and used the call creds for out-of-band channels as well, which is
inappropriate since they have a different authority.
This also fixes a bug where resolving OOB channels would have CallCreds
duplicated; that wasn't noticed or important because we don't use
CallCreds in OOB channels.
Fixes#7643
Interceptor-based config selector will be needed for fault injection.
Add `interceptor` field to `InternalConfigSelector.Result`. Keep `callOptions` and `committedCallback` fields for the moment, because it needs a refactoring to migrate the existing xds config selector implementation to the new API.
Round robin is keeping use of READY subchannels even if there is name resolution error. However, it moves Channel state to TRANSIENT_ERROR.
In hierarchical load balancers, the upstream LB policy may need to aggregate pickers from multiple downstream round_robin LB policy while filtering out non-ready subchannels. It cannot infer if the subchannel can be used just from the SubchannelPicker interface. It relies on the state that the round_robin intends to set channel to.
So the change is to match the readiness of the picker/subchannel with the state that round_robin tries to update. It will completely ignore name resolution error if there are READY subchannels.
* fix channel builders ABI backward compatibility broken in v1.33.0
* fix server builders ABI backward compatibility broken in v1.33.0
* makes ForwardingServerBuilder package-private
With this, it will be clear if the RPC failed because the server didn't
use a double-GOAWAY or if it failed because of MAX_CONCURRENT_STREAMS or
if it was due to a local race. It also fixes the status code to be
UNAVAILABLE except for the RPCs included in the GOAWAY error (modulo the
Netty bug).
Fixes#5855
It deprecates ExpectedException and Assert.assertThat(T, org.hamcrest.Matcher).
Without Java 8 we don't want to migrate away from ExpectedException at
this time. We tend to prefer Truth over Hamcrest, so I swapped the one
instance of Assert.assertThat() to use Truth. With this change we get a
warning-less build with JUnit 4.13. We don't yet upgrade because we
still need to support JUnit 4.12 for some use-cases, but will be able to
upgrade to 4.13 soon when they upgrade.