This change assures that if there are only calls in real transport the
channel will remain in idle mode. Idle mode will be exited if there
are calls in delayed transport to allow them to be processed.
The semantics around cancel vary slightly between ServerCall and CancellableContext - the context should always be cancelled regardless of the outcome of the call while the ServerCall should only be cancelled on a non-OK status.
This fixes a bug where the ServerCall was always marked cancelled regardless of call status.
Fixes#5882
Stabilize `enableRetry()` and `disableRetry()`.
Disable retry in `ManagedChannelImplTest` because each call attempt will fork the headers to a new instance, and add a ClientStreamTracer.Factory for bufferSizeLimit in CallOptions, which makes verification not straightforward.
There has been an issue about flow control when retry is enabled.
Currently we call `masterListener.onReady()` whenever `substreamListener.onReady()` is called.
The user's `onReady()` implementation might do
```
while(observer.isReady()) {
// send one more message.
}
```
However, currently if the `RetriableStream` is still draining, `isReady()` is false, and user's `onReady()` exits immediately. And because `substreamListener.onReady()` is already called, it may not be called again after drained.
This PR fixes the issue by
- Use a SerializeExecutor to call all `masterListener` callbacks.
- Once `RetriableStream` is drained, check `isReady()` and if so call `onReady()`.
- Once `substreamListener.onReady()` is called, check `isReady()` and only if so we call `masterListener.onReady()`.
While adding regression tests to #8386, I found a bug in an edge case: while retry attempt is draining the last buffered entry, if it is in the mean time committed and then we cancel the call, the stream will never be cancelled. See the regression test case `commitAndCancelWhileDraining()`.
There is a bug in the scenario of the following sequence of events:
- `call.start()`
- received retryable status and about to retry
- The retry attempt Substream is created but not yet `drain()`
- `call.cancel()`
But `stream.cancel()` cannot be called prior to `stream.start()`, otherwise retry will cause a failure with IllegalStateException. The current RetriableStream code must be fixed to not cancel a stream until `start()` is called in `drain()`.
Rebased PR #8343 into the first commit of this PR, then (the 2nd commit) reverted the part for metric recording of retry attempts. The PR as a whole is mechanical refactoring. No behavior change (except that some of the old code path when tracer is created is moved into the new method `streamCreated()`).
The API change is documented in go/grpc-stats-api-change-for-retry-java
This change adds a traditional try/finally block around readers and
streams to control the closing of those objects when the method has
completed rather than relying on the GC to deal with them.
This issue was flagged by an analysis tool via binary analysis of the
grpc-core package as part of a dependency from another project.
We used to have two ClientStreamListener.closed() methods. One is simply calling the other with default arg. This doubles debugging (e.g. #7921) and sometimes unit testing work. Deleting the 2-arg method to cleanup.
This PR is purely refactoring.
Add ServerCallExecutorSupplier interface in serverBuilder to allow defining which executor to to handle the server call.
Split StreamCreated() contextRunnable into two to support this new feature: one for method lookup, the other for server call handling. methodLookup() runs on default executor, handleServerCall() may run on the executorSupplier executor.
callbacks are queued after methodLookup() and handleServerCall() on serializing executor to ensure stream listener is set when callbacks starts running.
Make executor settable in serializing executor to allow switching executor for the server call handling runnable as a result of the outcome of the method lookup runnable.
failOnVersionConflict has never been good for us. It is equivalent to
Maven dependencyConvergence which we discourage our users to use because
it is too tempermental and _creates_ version skew issues over time.
However, we had no real alternative for determining if our deps would be
misinterpeted by Maven.
failOnVersionConflict has been a constant drain and makes it really hard
to do seemingly-trivial upgrades. As evidenced by protobuf/build.gradle
in this change, it also caused _us_ to introduce a version downgrade.
This introduces our own custom requireUpperBoundDeps implementation so
that we can get back to simple dependency upgrades _and_ increase our
confidence in a consistent dependency tree.
This adds support for creating a Netty Channel with SocketAddress and ChannelCredentials.
This aligns with NettyServerBuilder.forAddress(SocketAddress address, ServerCredentials creds).
Enables a codepath for zero-copy protobuf deserialization. Two new InputStream extension interfaces are added:
- HasByteBuffer: allows access to the underlying buffers containing inbound bytes directly without copying
- Detachable: allows customer marshaller to keep the buffers around until the application code is done with using the protobuf messages
Applications can implement a custom marshaller that takes over the ownership of ByteBuffers and wrap them into ByteStrings with protobuf's UnsafeByteOperations support. Then a RopeByteString, which is a in-place composite of ByteStrings can be created. This enables using the zero-copy codepath (requires immutable ByteBuffer indication) of CodedInputStream for deserialization.
Since static methods are pseudo-inherited by Builder implementations but
are trivially accidentally used, we re-define static methods in each
builder to make them behave more like the caller would expect. However,
not all the methods actually work; some just throw because the caller
was certainly not getting what they would expect.
Annotating with `@DoNotCall` can expose the problems at compile time
instead of runtime. While `@Deprecated` would also be an option, it is a
bit harder to figure out the ramifications and whether we want to go
that route.
This change was suggested by a lint tool for XdsServerBuilder and it
seems appropriate so I applied it to the other similar cases I could
find.
Triggering balancing state updates after already being shutdown can be confusing for the upstream of round_robin. In cases of the callers not managing round_robin's lifecycle (e.g., not ignoring updates after it shuts down round_robin, which it should), it can make problem very bad, especially with the behavior that round_robin is actually propagating TRANSIENT_FAILURE with a picker that buffers RPCs.
This change only polishes round_robin by always preserving its invariant. Callers/LBs should not rely on this and should still manage the balancing updates from its downstream correctly based on the downstream's lifetime.
Currently each subchannel implicitly refreshes the name resolution when its state changes to IDLE or TRANSIENT_FAILURE. That is, this feature is built into subchannel's internal implementation. Although it eliminates the burden of having LB implementations refreshing the resolver when connections to backends are broken, this is gives LB policies no chance to disable or override this refresh (e.g., in some complex load balancing hierarchy like xDS, LB policies may embed a resolver inside for resolving backends so the refreshing resolution operation should be hooked to the resolver embedded in the LB policy instead of the one in Channel).
In order to make this transition smoothly, we add a check to SubchannelImpl that checks if the LoadBalancer has explicitly called Helper.refreshNameResolution for broken subchannels created by it. If not, it logs a warning and do the refresh.
A temporary LoadBalancer.Helper API ignoreRefreshNameResolution() is added to avoid false-positive warnings for xDS that intentionally does not want a refresh. Once the migration is done, this should be deleted.
Fix the following bug:
ManagedChannelImpl.ConfigSelectingClientCall may return early in start() leaving delegate null, and fails request() method after start().
Currently the bug can only be triggered when using xDS.
ManagedChannelImpl should not override authority for createResolvingOobChannel(target, creds), because ManagedChannelImpl does not know what target and creds are.
Enhance `ManagedChannelBuilder.overrideAuthority()` to make it impossible to use a different authority to a backend by wrapping ClientTransportFactory.newClientTransport() and setting ClientTransportOptions’ authority. To avoid confusing the LB policy, it would need to keep the original addresses to return during `Subchannel.getAddresses()`
The class `OverrideAuthorityNameResolverFactory` is deleted and its logic is moved into `ManagedChannelImpl`.
- Add APIs to `ClientTransportFactory`:
```java
public interface ClientTransportFactory {
/**
* Swaps to a new ChannelCredentials with all other settings unchanged. Returns null if the
* ChannelCredentials is not supported by the current ClientTransportFactory settings.
*/
SwapChannelCredentialsResult swapChannelCredentials(ChannelCredentials channelCreds);
final class SwapChannelCredentialsResult {
final ClientTransportFactory transportFactory;
@Nullable final CallCredentials callCredentials;
}
}
```
- Add `ChannelCredentials` to constructor args of `ManagedChannelImplBuilder`:
```java
public ManagedChannelImplBuilder(
String target, @Nullable ChannelCredentials channelCreds, @Nullable CallCredentials callCreds, ...)
```
Improve the CallCredentialsApplyingTransport shutdown lifecycle management. Right now CallCredentialsApplyingTransport shutdown the delegated real transport too early. It should be waiting for the metadataAppliers to finish because they may execute asynchronously. In addition, there is no shutdown check on CallCredentialsApplyingTransport for newStream(). The degraded lifecycle implementation may cause RejectionExecutionException, or accepting new RPCs after the underlying transport is already closed during channel shutdown.
We added listener on metadataApplier to notify completion, a magic counter to track the pending metadataApplier for delaying shutdown, also added shutdown check for newStream().
This provides us a path forward with #7211 (hiding
AbstractManagedChannelImplBuilder and AbstractServerImplBuilder) while
providing users a migration path to manage the ABI breakage (#7552). We
do a .class hack so that recompiling avoids the internal class reference
yet the old methods are still available.
Leaving the classes as-is causes javac to compile two versions of each
method, one returning the public class (e.g. ServerBuilder) and one
returning the internal class (e.g., AbstractServerImplBuilder). However,
we rewrite the signature that is used at compile time so that new
compilations will not reference internal-returning methods.
This is intended to be temporary, just to give a migration path. Once we
have given users some time to recompile we will remove this rewriting
and change the generics to use public classes.
DelayedClientTransport needs to avoid becoming terminated while it owns
RPCs. Previously DelayedClientTransport could terminate when some of its
RPCs had their realStream but realStream.start() hadn't yet been called.
To avoid that, we now make sure to call realStream.start()
synchronously with setting realStream. Since start() and the method
calls before start execute quickly, we can run it in-line. But it does
mean we now need to split the Stream methods into "before start" and
"after start" categories for queuing.
Fixes#6283
Throw for subchannel creation if the channel is being shutting down and the delayed transport is terminated (aka, all retry calls has been finished). This enforces load balancer implementations to avoid creating subchannels after being shut down.
Change InternalServer to handle multiple addresses and implemented in NettyServer.
It makes ServerImpl to have a single transport server, and this single transport server (NettyServer) will bind to all listening addresses during bootstrap. (#7674)
There was a report from a user in b/176088054 that experienced an RPC
waiting_for_connection that failed after 45 minutes due to deadline.
That would be quite normal for wait-for-ready, but because we didn't
include that detail in the status we have to do code inspection to
determine if wait-for-ready was enabled.
API change (See go/grpc-rls-callcreds-to-server):
- Add `ChannelCredentials.withoutBearerTokens()`
- Add `createResolvingOobChannelBuilder(String, ChannelCredentials)`, `getChannelCredentials()` and `getUnsafeChannelCredentials()` for `LoadBalancer.Helper`
This PR does not include the implementation of `createResolvingOobChannelBuilder(String, ChannelCredentials)`.
The addition of CompositeChannelCredentials allowed CallCredentials to
be passed to the ManagedChannel itself. But the implementation was buggy
and used the call creds for out-of-band channels as well, which is
inappropriate since they have a different authority.
This also fixes a bug where resolving OOB channels would have CallCreds
duplicated; that wasn't noticed or important because we don't use
CallCreds in OOB channels.
Fixes#7643
Interceptor-based config selector will be needed for fault injection.
Add `interceptor` field to `InternalConfigSelector.Result`. Keep `callOptions` and `committedCallback` fields for the moment, because it needs a refactoring to migrate the existing xds config selector implementation to the new API.
Round robin is keeping use of READY subchannels even if there is name resolution error. However, it moves Channel state to TRANSIENT_ERROR.
In hierarchical load balancers, the upstream LB policy may need to aggregate pickers from multiple downstream round_robin LB policy while filtering out non-ready subchannels. It cannot infer if the subchannel can be used just from the SubchannelPicker interface. It relies on the state that the round_robin intends to set channel to.
So the change is to match the readiness of the picker/subchannel with the state that round_robin tries to update. It will completely ignore name resolution error if there are READY subchannels.
* fix channel builders ABI backward compatibility broken in v1.33.0
* fix server builders ABI backward compatibility broken in v1.33.0
* makes ForwardingServerBuilder package-private
With this, it will be clear if the RPC failed because the server didn't
use a double-GOAWAY or if it failed because of MAX_CONCURRENT_STREAMS or
if it was due to a local race. It also fixes the status code to be
UNAVAILABLE except for the RPCs included in the GOAWAY error (modulo the
Netty bug).
Fixes#5855
It deprecates ExpectedException and Assert.assertThat(T, org.hamcrest.Matcher).
Without Java 8 we don't want to migrate away from ExpectedException at
this time. We tend to prefer Truth over Hamcrest, so I swapped the one
instance of Assert.assertThat() to use Truth. With this change we get a
warning-less build with JUnit 4.13. We don't yet upgrade because we
still need to support JUnit 4.12 for some use-cases, but will be able to
upgrade to 4.13 soon when they upgrade.
As mentioned in https://github.com/grpc/grpc-java/pull/7413#issuecomment-690756200 `RealChannel` did not manage `configSelector`, and therefore `configSelector.get()`, `configSelector.set()` and `drainPendingCalls()` were scattered everywhere in `ManagedChannelImpl`. This PR re-organizes `RealChannel` to manage `configSelector`.
Fixing the bug: if two consecutive name resolution updates are queued together in SynchronizationContext, drainPendingCalls() might be called twice and be broken.
There was bug that new pending calls were not drained after channel is shutdown. The bug was worked around by #7354 .
Fixing by making sure new calls fail immediately if the channel is already shutdown.
We used `null` and `RetryPolicy.DEFAULT` for the value of `retryPolicy` in `RetriableStream` to distinguish the state between name resolution not being completed and name resolution being completed but no retry policy. After the change #7259 , name resolution is always completed when creating a `RetriableStream`, so the distinction will be gone. It will be cleaner to get rid of `RetryPolicy.DEFAULT` and simply use `null` for absence of RetryPolicy. `RetryPolicy.Provider` will be deleted in upcoming PR.
Java 9 introduces overridden methods with covariant return types for the following methods in java.nio.ByteBuffer:
- position(int newPosition)
- limit(int newLimit)
- flip()
- clear()
- mark()
- reset()
- rewind()
In Java 9 they all now return ByteBuffer, whereas the methods they override return Buffer, resulting in exceptions like this when executing on Java 8 and lower:
java.lang.NoSuchMethodError: java.nio.ByteBuffer.limit(I)Ljava/nio/ByteBuffer
This is because the generated byte code includes the static return type of the method, which is not found on Java 8 and lower because the overloaded methods with covariant return types don't exist (the issue appears even with source and target 8 or lower in compilation parameters).
The solution is to cast ByteBuffer instances to Buffer before calling the method.
ManagedChannelImpl.newCall() will return a DelayedClientCall until the name resolver updates the configSelector reference.
The configSelector follows the same service config error handling rules.
Made the following assumption:
If there is no service config in resolution result, then there must be no config selector in the resolution result. Actually we ignore any config selector in the resolution result if there is no service config.
Resolves#7222: If a hedging substream fails triggering throttling threshold, the call should be committed.
Refactored RetryPlan to two separate classes RetryPlan and HedgingPlan.
verifyZeroInteractions has the same behavior as verifyNoMoreInteractions. It
was deprecated in Mockito 3.0.1 and replaced with verifyNoInteractions, which
does not change behavior depending on previous verify() calls. All instances
were replaced with verifyNoInteractions, except those in
ApplicationThreadDeframerTest which were replaced with verifyNoMoreInteractions
since there is a verify() call in `@Before`.
Adding `DelayedClientCall` in preparation of implementing `ConfigSelector` in core.
`DelayedClientCall` is implemented exactly in the same way as `DelayedStream`. Only added logic to monitor initial DEADLINE. Note that `ClientCall.cancel()` is not thread-safe and will cause exceptions if trying to start call after it, which is different from in the stream where cancel() is thread-safe and wouldn't trigger any checkState()s. The initial DEADLINE monitor should not call `ClientCall.cancel()` directly.