The problem: GrpclbState tracks Subchannels' states as a mutable
attribute in Subchannel.getAttributes(). However, GrpclbState only
update this attribute for the Subchannels its managing. For those
cached in SubchannelPool, their state attributes are stale. When they
are given back to GrpclbState, IDLE state is assumed. As a result, if
a Subchannel is READY when it's reclaimed from the pool, it will not
be picked.
To fix that, this change expands SubchannelPool interface to handle
Subchannel state updates, which GrpclbState will call. SubchannelPool
saves the latest state and delivers it when it's returned to
GrpclbState by scheduling a call to handleSubchannelState() in the
SynchronizationContext, so that GrpclbState will take the latest state
as if it was just reported from the Channel.
The PICK_FIRST mode puts all backend addresses in a single Subchannel. There are a few points where it's different from the default ROUND_ROBIN mode:
1. PICK_FIRST doesn't eagerly connect to backends like ROUND_ROBIN does. Instead, it requests for connections when the Subchannel is picked.
2. PICK_FIRST adds tokens to the headers via a different code path (`TokenAttachingTracerFactory`) than ROUND_ROBIN
3. For simple implementation, when the mode is changed by service config when the LoadBalancer is working, we will shut down `GrpclbState` and starts a new one with the new mode. All connections will be closed during the transition. We don't expect this to happen in practice given the specific use case of PICK_FIRST.
Make sure the config for grpclb is passed to the GrpclbLoadBalancer, which will support two child policies -- "round_robin" (default) and "pick_first".
Previously the presence of balancer addresses would dictate "grpclb" policy, despite of the service config. Service config will now take precedence instead.
Implement config parsing logic in GrpclbLoadBalancer. Per offline discussions with @markdroth and @ejona86, we will ignore configuration errors for now. The more appropriate config error handling is upcoming.
This commit swaps to using a Sync task to place generated code in the
src/generated folder instead of the gradle-protobuf-plugin's
generatedFilesBaseDir. This provides much nicer results on failed
builds, and you will no longer see all the generated files deleted.
But at the same time the Sync task makes it easy to only copy the
grpc-generated code. This was not previously done because we were lazy
and using generatedFilesBaseDir, which made it difficult to treat the
services differently from the messages.
This will be a new override. The old override is now deprecated.
In order to pass new information without adding new overrides, I shoved most information
into an object called StreamInfo. The Metadata is left out to draw attention because
it's mutable.
Motivation: this is needed for correctly supporting pick_first in GRPCLB. GRPCLB needs to
add a token to the headers, and the token varies among servers. With round_robin, GRPCLB
create a Subchannel for each server, thus can attach the token when the Subchannel is picked.
To implement pick_first, all server addresses will be put in a single Subchannel, we will
need to add the header in newClientStreamTracer(), by looking up the server address from
the transport attributes and deciding which token to add.
For Bazel, we upgrade to protobuf 3.6.1.2 and javalite HEAD to fix
incompatibilities in newer Bazel releases.
compiler/Dockerfile is unused, so it was removed instead of being updated.
protoc no longer includes codegen for nano, so we remain on the older protoc
any time nano is used.
Protobuf now requires C++11 when compiling, so windows was swapped to
VC 14.
Remove unused variables and prefer ArrayDeque to LinkedList. The swap to
Queue from Deque was just to make it more obvious what the usage was,
since the original swap to Deque was to avoid the same LinkedList lint
violation (3d51756d).
Introduce ChannelLogger, which is a utility provided to LoadBalancer implementations (potentially NameResolvers too) for recording events to channel trace. This is immediately required by client-side health checking (#4932, https://github.com/grpc/proposal/blob/master/A17-client-side-health-checking.md) to record an error about disabling health checking. It is also useful for any LoadBalancer implementations to record important information.
ChannelLogger implementation is backed by the internal ChannelTracer/Channelz. Because Channelz limits the number of retained events, and events are lost once the process ends, I have expanded it to also log Java logger. This would provide a "last resort" in cases where there are too many events or off-line investigation is needed. All logs are prefixed with logId so that they can be easily associated with the involved Channel/Subchannel.
To prevent log spamming, the logs are all at FINE level or below so that they are not visible by default. They are logged to ChannelLogger's logger, so that user can have precise control.
There are also more verbose information that may not fit in ChannelTracer, but can be useful for debugging. It's desirable that these logs are associated with logId, but they currently manually include the logId, which is cumbersome and may result in inconsistency. For this use case, I added the DEBUG level for ChannelLogger, which formats the log in the same way as other levels, while not recorded to Channelz.
I have converted most logging and channel tracer recording in the Channel implementation and LoadBalancers.
LoadBalancerProvider is the interface that extends LoadBalancer.Factory. LoadBalancerRegistry is the one that loads the providers through service loader, and allows users to access providers through their names.
pick_first and round_robin balancer factories, which are experimental public API are now deprecated. Their providers are internal, as they are accessible by policy name.
AutoConfiguredLoadBalancerFactory is modified to access implementations purely by their names, thus hard-coded class names are no longer needed, and it can support arbitrary policy selected by service config.
Previously the client waits ~10 seconds until the fallback timer has
expired. While the timer is useful to address the long tail, it
shouldn't delay using the fallback in case of obvious errors, like the
channel failing to connect or an UNIMPLEMENTED response.
Provides a `SynchronizationContext` for scheduling tasks, with and without delay, from LoadBalancer implementations. This absorbs and extends the internal utility `ChannelExecutor`. It supersedes `Helper.runSerialized()`, which is now deprecated.
# Motivation
I see multiple cases that schedule tasks with a delay while requiring the task to run in the "Channel Executor". There have been repeated work to wrap scheduled tasks and handle races between cancellation and task run (see the diff in `GrpclbState.java` for example). The LoadBalancer implementation (e.g., GrpclbLoadBalancer) also has to acquire the `ScheduledExecutorService` from somewhere and release it upon shutdown.
The upcoming HealthCheckLoadBalancer (#4932), which would use back-off policy to retry health-checking streams, would have to do all the things above. At this point I think we need to provide something that combines `runSerialized()` with a scheduled executor with the same synchronization guarantees.
# Design details
`SynchronizationContext` is a similar to `ScheduledExecutorService` but tailored for use in `LoadBalancer` and potentially other cases outside of `LoadBalancer`. It offers task queuing and serialization and delayed scheduling. It guarantees non-reentrancy and happens-before among tasks. It owns no thread, but run tasks on caller's or caller-provided threads.
All channel-level state mutations and callback methods on `LoadBalancer` are done in a SynchronizationContext, which was previously referred to as "Channel Executor".
`SynchronizationContext.schedule()` returns a `ScheduledHandle` for status checking and cancellation. `ScheduedFuture` from `SchedulingExecutorService.schedule()` is too broad for our use cases (e.g., the blocking `get()` should never be used).
`SynchronizationContext.schedule()` requires a `ScheduledExecutorService`, which is now available through `Helper.getScheduledExecutorService()`. LoadBalancers don't need to worry about where to get `SchedulingExecutorService` any more.
# Alternatives
Alternatively, we could keep `Helper.runSerialized()` and add something like `Helper.runSerialiezdWithDelay()`, but having them on their own interface allows clean fake implementation by `FakeClock` for test, and allows other components (potentially `InternalSubchannel` for reconnection backoff) to use it too.
Instead of asking caller of `schedule()` to provide the `ScheduledExecutorService`, we considered having SynchronizationContext take a `ScheduledExecutorService` at construction. It would be inconvenient for LoadBalancer implementations that don't use `schedule()`, as they would be forced to provide a fake `ScheduledExecutorService` (which is cumbersome).
Instead of making `SynchronizationContext` a (semi-)concrete class, we considered making it an pure abstract class. However, we found it nontrivial to implement `execute()` correctly with the non-reentrancy guarantee.
This will allow enabling Error Prone on JDK 10+ (after
updating the net.ltgt.errorprone plugin), and is also a
prerequisite to that plugin update.
Also remove net.ltgt.apt plugin, as Gradle has native
support for annotationProcessor.
Those overrides are kept for backward compatibility and convenience
for callers. Documentation already says implementations should not
override them. Making them final reduces confusion around which
override should be verified in tests and be overridden in forwarding
classes, thus prevents bugs caused by such confusion.
The addresses from the string dump of the LoadBalanceResponse proto is
in binary format and not human-readable. We will log the
BackendAddressGroups when using a new list from the balancer. The
original logging of LoadBalanceResponse is downgraded to FINER level.
This is an API used to coordinate across packages and must live in
`io.grpc`.
Prepending `Internal` makes it easier to detect and hide this class
from public visibility when using certain build tools.
fixes#4796
This PR adds an automatic gradle format checker and reformats all the *.gradle files. After this, new changes to *.gradle files will fail to build if not in good format, just like checkStyle failure.
Deprecate static builder method, Keys.of(), add a notice of plans to
remove keys(), emphasize that the name is only a debug label.
The `@ExperimentalAPI` is left on the class because there are still
issues around hashCode/equals.