Round robin is keeping use of READY subchannels even if there is name resolution error. However, it moves Channel state to TRANSIENT_ERROR.
In hierarchical load balancers, the upstream LB policy may need to aggregate pickers from multiple downstream round_robin LB policy while filtering out non-ready subchannels. It cannot infer if the subchannel can be used just from the SubchannelPicker interface. It relies on the state that the round_robin intends to set channel to.
So the change is to match the readiness of the picker/subchannel with the state that round_robin tries to update. It will completely ignore name resolution error if there are READY subchannels.
* fix channel builders ABI backward compatibility broken in v1.33.0
* fix server builders ABI backward compatibility broken in v1.33.0
* makes ForwardingServerBuilder package-private
With this, it will be clear if the RPC failed because the server didn't
use a double-GOAWAY or if it failed because of MAX_CONCURRENT_STREAMS or
if it was due to a local race. It also fixes the status code to be
UNAVAILABLE except for the RPCs included in the GOAWAY error (modulo the
Netty bug).
Fixes#5855
It deprecates ExpectedException and Assert.assertThat(T, org.hamcrest.Matcher).
Without Java 8 we don't want to migrate away from ExpectedException at
this time. We tend to prefer Truth over Hamcrest, so I swapped the one
instance of Assert.assertThat() to use Truth. With this change we get a
warning-less build with JUnit 4.13. We don't yet upgrade because we
still need to support JUnit 4.12 for some use-cases, but will be able to
upgrade to 4.13 soon when they upgrade.
As mentioned in https://github.com/grpc/grpc-java/pull/7413#issuecomment-690756200 `RealChannel` did not manage `configSelector`, and therefore `configSelector.get()`, `configSelector.set()` and `drainPendingCalls()` were scattered everywhere in `ManagedChannelImpl`. This PR re-organizes `RealChannel` to manage `configSelector`.
Fixing the bug: if two consecutive name resolution updates are queued together in SynchronizationContext, drainPendingCalls() might be called twice and be broken.
There was bug that new pending calls were not drained after channel is shutdown. The bug was worked around by #7354 .
Fixing by making sure new calls fail immediately if the channel is already shutdown.
We used `null` and `RetryPolicy.DEFAULT` for the value of `retryPolicy` in `RetriableStream` to distinguish the state between name resolution not being completed and name resolution being completed but no retry policy. After the change #7259 , name resolution is always completed when creating a `RetriableStream`, so the distinction will be gone. It will be cleaner to get rid of `RetryPolicy.DEFAULT` and simply use `null` for absence of RetryPolicy. `RetryPolicy.Provider` will be deleted in upcoming PR.
Java 9 introduces overridden methods with covariant return types for the following methods in java.nio.ByteBuffer:
- position(int newPosition)
- limit(int newLimit)
- flip()
- clear()
- mark()
- reset()
- rewind()
In Java 9 they all now return ByteBuffer, whereas the methods they override return Buffer, resulting in exceptions like this when executing on Java 8 and lower:
java.lang.NoSuchMethodError: java.nio.ByteBuffer.limit(I)Ljava/nio/ByteBuffer
This is because the generated byte code includes the static return type of the method, which is not found on Java 8 and lower because the overloaded methods with covariant return types don't exist (the issue appears even with source and target 8 or lower in compilation parameters).
The solution is to cast ByteBuffer instances to Buffer before calling the method.
ManagedChannelImpl.newCall() will return a DelayedClientCall until the name resolver updates the configSelector reference.
The configSelector follows the same service config error handling rules.
Made the following assumption:
If there is no service config in resolution result, then there must be no config selector in the resolution result. Actually we ignore any config selector in the resolution result if there is no service config.
Resolves#7222: If a hedging substream fails triggering throttling threshold, the call should be committed.
Refactored RetryPlan to two separate classes RetryPlan and HedgingPlan.
verifyZeroInteractions has the same behavior as verifyNoMoreInteractions. It
was deprecated in Mockito 3.0.1 and replaced with verifyNoInteractions, which
does not change behavior depending on previous verify() calls. All instances
were replaced with verifyNoInteractions, except those in
ApplicationThreadDeframerTest which were replaced with verifyNoMoreInteractions
since there is a verify() call in `@Before`.
Adding `DelayedClientCall` in preparation of implementing `ConfigSelector` in core.
`DelayedClientCall` is implemented exactly in the same way as `DelayedStream`. Only added logic to monitor initial DEADLINE. Note that `ClientCall.cancel()` is not thread-safe and will cause exceptions if trying to start call after it, which is different from in the stream where cancel() is thread-safe and wouldn't trigger any checkState()s. The initial DEADLINE monitor should not call `ClientCall.cancel()` directly.
This should avoid messages being leaked when a Listener throws an exception and
the executor is shut down immediately after the call completes. This is related
to #7105 but a different scenario and we aren't aware of any user having
observed the previous behavior.
Note also this does _not_ fix the similar case of reordering caused by
delayedCancelOnDeadlineExceeded().
It has been our intention for years to remove nameResolverFactory. We should
make it clear to users to avoid new code depending on it and so they can tell
us why they need it so we can provide replacements.
This provides a substantial ~3x performance increase to Netty async
streaming with small messages. It also increases OkHttp performance for
the same benchmark 40% and decreases unary latency by 3µs for Netty and
10µs for OkHttp.
We avoid calling listener after closure because the Executor used for
RPC callbacks may no longer be available. This issue was already
present in the ApplicationThreadDeframer, but full-stream compression is
not really deployed so was unnoticed.
DirectExecutor saw a 5-6µs latency increase via MigratingDeframer.
DirectExecutor usages should see no benefit from MigratingDeframer, so
disable it in that case.
This fixes#6817 for the normal retry case, although it makes the hedging issue #7089 more broken, and there is still space of optimization for normal retry.
The original service config error handling design was unclear about the case when an updated resolution result with valid service config and empty address is returned to Channel, and the selected LB policy does not accept empty addresses. Existing implementation silently triggers resolver backoff after LB policy is changed, while leaving Channel being CONNECTING state, if there was an old service config resolved. This change converge the behaviors of whether the received service config is the first one or an update to a previous config, in both cases, Channel goes into TRANSIENT_FAILURE if the selected LB policy cannot handle empty addresses.
- Use gradle configuration `api` for dependencies that are part of grpc public api signatures.
- Replace deprecated gradle configurations `compile`, `testCompile`, `runtime` and `testRuntime`.
- With minimal change in dependencies: If we need dep X and Y to compile our code, and if X transitively depends on Y, then our build would still pass even if we only include X as `compile`/`implementation` dependency for our project. Ideally we should include both X and Y explicitly as `implementation` dependency for our project, but in this PR we don't add the missing Y if it is previously missing.
Define util function to exclude guava's transitive dependencies jsr305 and animal-sniffer-annotations, and always manually add them as runtimeOnly dependency. error_prone_annotations is an exception: It is also excluded but manually added not as runtimeOnly. It must always compile with guava, otherwise users will see warning spams if guava is in the compile classpath but error_prone_annotations is not.
Eliminate the hack of InternalNotifyOnBuild mechanism for letting ProtoReflectionService get access to the Sever instance, which makes ProtoReflectionService incompatible with server interceptors. This change put the Server instance into the Context and let the ProtoReflectionService RPC obtain it in its RPC Context. Also enhanced ProtoReflectionService so that one service instance can be used across multiple servers.
A user has been seeing "InternalSubchannel closed transport due to address
change" errors (b/153064566). It is unclear if they are predomenent, but they
are at least adding noise. Since #2562 is still far from being generally
solved, we delay the shutdown a while to side-step the race.
Make each subchannel created by RR stay in TRANSIENT_FAILURE state until READY. That is, each subchannel ignores consequent non-READY states after TRANSIENT_FAILURE.
Previously AbstractServerStream would throw an exception which would kill the
RPC with a RST_STREAM. Now the server actually responds with a clean error
message and avoids spamming the logs.
WARNING: Exception in onHeadersRead()
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:495)
at io.grpc.internal.AbstractStream$TransportState.onStreamAllocated(AbstractStream.java:232)
at io.grpc.internal.AbstractServerStream$TransportState.onStreamAllocated(AbstractServerStream.java:224)
at io.grpc.netty.NettyServerHandler.onHeadersRead(NettyServerHandler.java:451)
at io.grpc.netty.NettyServerHandler.access$900(NettyServerHandler.java:101)
at io.grpc.netty.NettyServerHandler$FrameListener.onHeadersRead(NettyServerHandler.java:807)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:373)
The condition "effectiveServiceConfig != validServiceConfig" should
have been deleted in commit 2162ad0436.
The condition was there before that commit because
NAME_RESOLVER_SERVICE_CONFIG was already in "attrs", thus it needed to
be re-added only if "effectiveServiceConfig" differs from the original
"validServiceConfig".
In contrast, ATTR_HEALTH_CHECKING_CONFIG is not in the original
"attrs" and always needs to be added.
This allows an InProcessTransport instance to be created via a new
internal accessor class InternalInProcess. We effectively just expose a
method to create an InProcessTransport with a existing ServerListener
instance.
This will be used for in-process channels to an under-development
on-device server.
Move ATTR_LB_ADDR_AUTHORITY and ATTR_LB_PROVIDED_BACKEND attributes definition in GrpcAttributes to GrpclbConstants. grpc-alts will have a compile dependency on grpc-grpclb.
Eliminated the code path of resolving Grpclb balancer addresses in grpc-core and moved it into GrpclbNameResolver, which is a subclass of DnsNameResolver. Main changes:
- Slightly changed ResourceResolver and its JNDI implementation. ResourceResolver#resolveSrv(String) returns a list of SrvRecord so that it only parse SRV records and does nothing more. It's gRPC's name resolver's logic to use information parsed from SRV records.
- Created a GrpclbNameResolver class that extends DnsNameResolver. Logic of using information from SRV records to set balancer addresses as ResolutionResult attributes is implemented in GrpclbNameResolver only.
- Refactored DnsNameResolver, mainly the resolveAll(...) method. Logics for resolving backend addresses and service config are modularized into resolveAddresses() and resolveServiceConfig() methods respectively. They are shared implementation for subclasses (i.e., GrpclbNameResolver).
Add "envoy.lb.does_not_support_overprovisioning" to xDS node identifier client features. Set the new user_agent_name and user_agent_version fields for build version.
First take for grpclb selection stabilization:
1. Changed DnsNameResolver to return balancer addresses as a GrpcAttributes.ATTR_LB_ADDRS attribute in ResolutionResult, instead of among the addresses.
2. AutoConfiguredLoadBalancerFactory decides LB policy solely based on parsed service config without looking at resolved addresses. Behavior changes:
- If no LB policy is specified in service config, default to pick_first, even if there exist balancer addresses (in attributes).
- If grpclb specified but not available and no other specified policies available, it will fail without fallback to round_robin.
3. GrpclbLoadBalancer populates balancer addresses from ResolvedAddresses's attribute (GrpclbConstants.ATTR_LB_ADDRS) instead of sieving from addresses.
Go to TRANSIENT_FAILURE immediately instead of NOOP if GracefulSwitchLoadBalancer receives resolution error before switching to any delegate.
In most natural usecase, the `gracefulSwitchLb` is a child balancer of some `parentLb`, and the `gracefulSwitchLb` switches to a new `delegateLb` when `parentLb.handleResolvedAddressGroup()`. If the `parentLb` receives a resolution error before receiving any resolved addresses, it should go to TRANSIENT_FAILURE. In this case, it will be more convenient if the initial `gracefulSwitchLb` can go to TRANSIENT_FAILURE directly.
Decouples grpc-core with census, while still preserve the default integration of census in grpc-core. Users wishing to enable census needs to add grpc-census to their runtime classpath.
- Created a grpc-census module:
- Moved CensusStatsModule.java and CensusTracingModule.java into grpc-census from grpc-core. CensusModuleTests.java is also moved. They now belong to io.grpc.census package.
Moved DeprecatedCensusConstants.java into io.grpc.census.internal (is this necessary?) in grpc-census.
- Created CensusStatsAccessor.java and CensusTracingAccessor.java, which are used to create census ClientInterceptor and ServerStreamTracer.Factory.
- Everything in grpc-census are package private, except the accessor classes. They only publicly expose ClientInterceptor and ServerStreamTracer.Factory, no Census specific types are exposed.
- Use runtime reflection to load and apply census stats/tracing to channel/server builders, if grpc-census is found in runtime classpath.
- Removed special APIs on AbstractManagedChannelImplBuilder and AbstractServerImplBuilder for overriding census module. They are only used for testing. Now we changed tests to apply Census ClientInterceptor and ServerStreamTracer.Factory just as normal interceptor/stream tracer factory. Test writer is responsible for taking care of the ordering concerns of interceptors and stream tracer factories.
This PR is to add one more Executor parameter when creating the SslContext.
In Netty, we already have this implementation for passing Executor when creating SslContext: netty/netty#8847. This extra Executor is used to take some time-consuming tasks when doing SSL handshake. However, in current gRPC implementation, we are not using this API.
In this PR, the relevant changes are:
1. get the executorPool from ChannelBuilder or ServerBuilder
2. pass the executorPool all the way down to ClientTlsHandler
3. fill executorPool in when creating SslHandler
`GracefulSwitchLoadBalancer` was doing switch based on `LoadBalancerProvider.getPolicyName()`. This turned out to be very awkward when I have to synthesize a policy name for the provider, and what I actually care about is the identity of the lb provider not necessarily the policy name.
Now `GracefulSwitchLoadBalancer` is doing switch based on identity of `LoadBalancer.Factory`, which is simpler.
The API review for #6279 came up with a more meaningful name that
better explains the intent. A setter for the old name was left in
ManagedChannelBuilder to ease migration to the new name by current
users.
Adds an Executor to NameResolver.Args, which is optionally set on ManagedChannelBuilder. This allows NameResolver implementations to avoid creating their own thread pools if the application already manages its own pools.
Addresses #3703.
This allows plumbing information to the subchannel via Attributes, like
an authority override. RR still does not support multiple EAGs that only
differ by attributes.
This can be used to prevent duplicate classes in the classpath, one via
Maven and one via Bazel-native.
See census-instrumentation/opencensus-java#1963 and #5359
ServerImpl uses that ticker to create incoming Deadlines. This feature is specifically restricted to in-process, as it can also customize ScheduledExecutorService, and them together can fake out the clock which is useful in tests. On the other hand, a fake Ticker won't work with Netty's ScheduledExecutorService.
Also improved mismatch detection, documentation and tests in Deadline.
Fixes#5593 and supersedes #5601
Now that https://github.com/census-instrumentation/opencensus-java/pull/1854 has been merged & released as 0.21.0. We can start using the method & status tags.
Background:
Opencensus introduced new tags for status and method (https://github.com/census-instrumentation/opencensus-java/pull/1115). The old views that used those tags were deprecated and new views were created that used the new tags. However grpc-java wasn't updated to use the new tags due to concern of breaking existing metrics. This resulted in the old views being deprecated while the new views were broken (goomics #50).
https://github.com/census-instrumentation/opencensus-java/pull/1854 added a compatibility layer to opencensus that would remap new tags to old tags for old views. This should unblock grpc to switching to the new tags while allowing old views to still be populated. That commit was released as part of opencensus 0.21, which grpc currently uses
* Bump PerfMark to 0.17.0
The main changes how linking is done. Linking is now always done
through the `PerfMark` entry class. This is for two reasons:
1. It make instrumenting the linking calls *much* easier.
2. It follows the API pattern of "verbNoun()". Previous callsites
would have `Link link = PerfMark.link(); link.link()`. This
stuttering is not quick to follow.
Generated using:
```
find -name \*.java -exec sed -i 's#link = PerfMark.link();#link = PerfMark.linkOut();#g' {} \;
find -name \*.java -exec sed -i 's#link.link();#PerfMark.linkIn(link);#g' {} \;
find -name \*.java -exec sed -i 's#command.getLink().link();#PerfMark.linkIn(command.getLink());#g' {} \;
find -name \*.java -exec sed -i 's#cmd.getLink().link();#PerfMark.linkIn(cmd.getLink());#g' {} \;
find -name \*.java -exec sed -i 's#msg.getLink().link();#PerfMark.linkIn(msg.getLink());#g' {} \;
```
Since the deprecated link methods are also `@DoNotCall`, the same
sed calls will need to be used on import.