Previously the client waits ~10 seconds until the fallback timer has
expired. While the timer is useful to address the long tail, it
shouldn't delay using the fallback in case of obvious errors, like the
channel failing to connect or an UNIMPLEMENTED response.
Provides a `SynchronizationContext` for scheduling tasks, with and without delay, from LoadBalancer implementations. This absorbs and extends the internal utility `ChannelExecutor`. It supersedes `Helper.runSerialized()`, which is now deprecated.
# Motivation
I see multiple cases that schedule tasks with a delay while requiring the task to run in the "Channel Executor". There have been repeated work to wrap scheduled tasks and handle races between cancellation and task run (see the diff in `GrpclbState.java` for example). The LoadBalancer implementation (e.g., GrpclbLoadBalancer) also has to acquire the `ScheduledExecutorService` from somewhere and release it upon shutdown.
The upcoming HealthCheckLoadBalancer (#4932), which would use back-off policy to retry health-checking streams, would have to do all the things above. At this point I think we need to provide something that combines `runSerialized()` with a scheduled executor with the same synchronization guarantees.
# Design details
`SynchronizationContext` is a similar to `ScheduledExecutorService` but tailored for use in `LoadBalancer` and potentially other cases outside of `LoadBalancer`. It offers task queuing and serialization and delayed scheduling. It guarantees non-reentrancy and happens-before among tasks. It owns no thread, but run tasks on caller's or caller-provided threads.
All channel-level state mutations and callback methods on `LoadBalancer` are done in a SynchronizationContext, which was previously referred to as "Channel Executor".
`SynchronizationContext.schedule()` returns a `ScheduledHandle` for status checking and cancellation. `ScheduedFuture` from `SchedulingExecutorService.schedule()` is too broad for our use cases (e.g., the blocking `get()` should never be used).
`SynchronizationContext.schedule()` requires a `ScheduledExecutorService`, which is now available through `Helper.getScheduledExecutorService()`. LoadBalancers don't need to worry about where to get `SchedulingExecutorService` any more.
# Alternatives
Alternatively, we could keep `Helper.runSerialized()` and add something like `Helper.runSerialiezdWithDelay()`, but having them on their own interface allows clean fake implementation by `FakeClock` for test, and allows other components (potentially `InternalSubchannel` for reconnection backoff) to use it too.
Instead of asking caller of `schedule()` to provide the `ScheduledExecutorService`, we considered having SynchronizationContext take a `ScheduledExecutorService` at construction. It would be inconvenient for LoadBalancer implementations that don't use `schedule()`, as they would be forced to provide a fake `ScheduledExecutorService` (which is cumbersome).
Instead of making `SynchronizationContext` a (semi-)concrete class, we considered making it an pure abstract class. However, we found it nontrivial to implement `execute()` correctly with the non-reentrancy guarantee.
This will allow enabling Error Prone on JDK 10+ (after
updating the net.ltgt.errorprone plugin), and is also a
prerequisite to that plugin update.
Also remove net.ltgt.apt plugin, as Gradle has native
support for annotationProcessor.
Those overrides are kept for backward compatibility and convenience
for callers. Documentation already says implementations should not
override them. Making them final reduces confusion around which
override should be verified in tests and be overridden in forwarding
classes, thus prevents bugs caused by such confusion.
The addresses from the string dump of the LoadBalanceResponse proto is
in binary format and not human-readable. We will log the
BackendAddressGroups when using a new list from the balancer. The
original logging of LoadBalanceResponse is downgraded to FINER level.
This is an API used to coordinate across packages and must live in
`io.grpc`.
Prepending `Internal` makes it easier to detect and hide this class
from public visibility when using certain build tools.
fixes#4796
This PR adds an automatic gradle format checker and reformats all the *.gradle files. After this, new changes to *.gradle files will fail to build if not in good format, just like checkStyle failure.
Deprecate static builder method, Keys.of(), add a notice of plans to
remove keys(), emphasize that the name is only a debug label.
The `@ExperimentalAPI` is left on the class because there are still
issues around hashCode/equals.
This is to conform with the GRPCLB spec. The spec doesn't (yet) define
the actual timeout. We choose 10 seconds here arbitrarily. It may be
configurable in the future.
This will fix internal bug b/74410243
Spies are really magical and easily produce unexpected results. Using them in
tests can easily yield tests that don't do what you think they do. Delegation
is much safer when possible.
Delegation doesn't work when methods `return true`, final methods, and with
restricted visibility, though. So CensusModulesTest and
MaxConnectionIdleManagerTest are left as-is.
The channelz service must not live in io.grpc.internal, and channelz
needs to be able to get the identifier of the entities it
tracks. Since io.grpc can not refer to io.grpc.internal, the LogId
must be moved out of internal.
Previously fallback mode can be entered only if the client has not
received any server list and the fallback timeout has expired.
Now the fallback timer is started when the stream to the balancer is broken
AND there is no ready Subchannels. Fallback mode is activated when the
timer expires. When a new server list is received from the balancer, either
the fallback timer is cancelled, or fallback mode is exited.
Also fixed a bug that the fallback timer should've been cancelled when GrpcState
is shut down.
This moves away from the global String-based Span name registry which
is not as flexible as we desire.
Also renamed the option name to be more accurate. This is not
API-breaking because the origianl addition to MethodDescriptor and
code-gen didn't make it into the 1.7.0 release.
* MethodDescriptor is lazy loaded, so protobuf loading only happens on demand. This also means tracing registration happens on demand.
* The names of the getters all being with `method`. This makes it harder for autocomplete to pick them up.
* A new field is used, which matches the getter name. Rather than make the new-getters reference the old-fields, make the old-fields reference the new getters. This makes removal of the old-fields a simple operation.
* The getters may not be inlineable, but thats an easy fix if it ends up being a problem. Not worth premature optimization (but is worth future work).
The expected timeline for this is adding this to the 1.8 cut, and deprecating the old-fields. They will be removed in 1.9.
This is a more favorable approach than #3467. Doing the registration
in MethodDescriptor should allow us to deregister in case the
generated stub and its MethodDescriptors are garbage-collected
routinely, e.g., if they are loaded by a separate ClassLoader.
Two methods, outboundMessageSent() and inboundMessageRead() are added to StreamTracer in order to associate individual messages with sizes. Both types of sizes are optional, as allowed by Census tracing.
Both methods accept a sequence number as the type ID as required by Census. The original outboundMesage() and inboundMessage() are also replaced by overrides that take the sequence number, to better match the new methods. The deprecation of the old overrides are tracked by #3460