Using Long.MAX_VALUE as the delay for Netty's NioEventLoop#schedule can
cause the (long) deadlines of scheduled tasks to wrap around into negative
values, which is unspecified behavior. Recent versions of Netty guard
against this overflow, but not all versions of grpc-java use a recent
enough Netty.
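A minimal plain-Java sketch of the wrap-around, and of one way to guard against it by saturating instead of overflowing. `DeadlineMath` and its methods are hypothetical names, not Netty or grpc-java APIs; the overflow check is the same sign-bit test that `Math.addExact` uses.

```java
/** Hypothetical helpers showing why Long.MAX_VALUE is a risky scheduling delay. */
final class DeadlineMath {
  private DeadlineMath() {}

  /** Naive deadline: silently wraps negative when now + delay exceeds Long.MAX_VALUE. */
  static long naiveDeadline(long nowNanos, long delayNanos) {
    return nowNanos + delayNanos;
  }

  /** Saturating deadline: clamps to Long.MAX_VALUE (or MIN_VALUE) instead of wrapping. */
  static long saturatedDeadline(long nowNanos, long delayNanos) {
    long sum = nowNanos + delayNanos;
    // Overflow happened iff both operands share a sign that differs from the result's sign.
    if (((nowNanos ^ sum) & (delayNanos ^ sum)) < 0) {
      return delayNanos > 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
    }
    return sum;
  }
}
```

With any positive "now", `naiveDeadline(now, Long.MAX_VALUE)` is negative, which is exactly the kind of deadline the scheduler then misinterprets.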
When connections are gracefully closed, Netty's Http2ConnectionHandler
sets up a timeout task that forces resources to be freed after a grace
period. When that task's deadline wraps into a negative value, a race was
observed in production systems (with grpc-java <= 1.12.1) that caused
Netty to never properly release the buffers associated with active
streams in the connection being (gracefully) closed.
The problem: GrpclbState tracks each Subchannel's state as a mutable
attribute in Subchannel.getAttributes(). However, GrpclbState only
updates this attribute for the Subchannels it is managing. For those
cached in SubchannelPool, the state attribute is stale. When they are
given back to GrpclbState, the IDLE state is assumed. As a result, a
Subchannel that is READY when it is reclaimed from the pool will not
be picked.
To fix that, this change expands the SubchannelPool interface to handle
Subchannel state updates, which GrpclbState will call. SubchannelPool
saves the latest state of each pooled Subchannel and, when the
Subchannel is returned to GrpclbState, delivers that state by scheduling
a call to handleSubchannelState() in the SynchronizationContext, so that
GrpclbState receives the latest state as if it had just been reported
by the Channel.
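The state-replay idea can be sketched with simplified, hypothetical types: `ConnState` stands in for io.grpc.ConnectivityState, `SerialQueue` for SynchronizationContext, and `StatefulPool` for a SubchannelPool implementation, with Subchannels modeled as plain strings. This illustrates the approach described above; it is not grpc-java's actual code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical stand-in for io.grpc.ConnectivityState. */
enum ConnState { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

/** Hypothetical callback playing the role of GrpclbState#handleSubchannelState(). */
interface StateListener {
  void handleSubchannelState(String subchannel, ConnState state);
}

/** Single-threaded task queue standing in for SynchronizationContext. */
final class SerialQueue {
  private final Queue<Runnable> tasks = new ArrayDeque<>();

  void schedule(Runnable task) {
    tasks.add(task);
  }

  /** Runs queued tasks in FIFO order until the queue is empty. */
  void drain() {
    Runnable task;
    while ((task = tasks.poll()) != null) {
      task.run();
    }
  }
}

/** Hypothetical pool that remembers the latest state of each cached Subchannel. */
final class StatefulPool {
  private final Map<String, ConnState> cachedStates = new HashMap<>();
  private final SerialQueue syncContext;
  private StateListener listener;

  StatefulPool(SerialQueue syncContext) {
    this.syncContext = syncContext;
  }

  void init(StateListener listener) {
    this.listener = listener;
  }

  /** Puts a Subchannel into the pool together with its last known state. */
  void returnSubchannel(String subchannel, ConnState lastKnownState) {
    cachedStates.put(subchannel, lastKnownState);
  }

  /** Called by the balancer for every state change; only pooled entries are tracked here. */
  void handleSubchannelState(String subchannel, ConnState state) {
    if (cachedStates.containsKey(subchannel)) {
      cachedStates.put(subchannel, state);
    }
  }

  /** Takes a Subchannel out of the pool and replays its latest state asynchronously. */
  String takeSubchannel(String subchannel) {
    ConnState latest = cachedStates.remove(subchannel);
    if (latest != null) {
      // Deliver the saved state as if the Channel had just reported it.
      syncContext.schedule(() -> listener.handleSubchannelState(subchannel, latest));
    }
    return subchannel;
  }
}
```

Scheduling the callback instead of invoking it inline from `takeSubchannel()` lets the balancer receive the replayed state through the same path as a genuine state report.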
Make the "logged" method consistent by referring to the public logging class name and method. Log statements now report the same class name that is used to set the log level.
Before:
```
190306 13:29:39.290:D 1 [io.grpc.internal.ChannelTracer.logOnly] [Channel<1>: (localhost:10000)] Channel for 'localhost:10000' created
190306 13:29:39.414:D 1 [io.grpc.internal.ChannelTracer.logOnly] [Channel<1>: (localhost:10000)] Exiting idle mode
190306 13:29:39.622:D 17 [io.grpc.internal.ChannelTracer.logOnly] [Channel<1>: (localhost:10000)] Resolved address: [[[/127.0.0.1:10000]/{}]], config={}
190306 13:29:39.623:D 17 [io.grpc.internal.ChannelTracer.logOnly] [Channel<1>: (localhost:10000)] Address resolved: [[[/127.0.0.1:10000]/{}]]
190306 13:29:39.624:D 17 [io.grpc.internal.ChannelTracer.logOnly] [Subchannel<3>] Subchannel for [[[/127.0.0.1:10000]/{}]] created
```
After:
```
190306 13:49:15.654:D 1 [io.grpc.ChannelLogger.log] [Channel<1>: (localhost:10000)] Channel for 'localhost:10000' created
190306 13:49:15.772:D 1 [io.grpc.ChannelLogger.log] [Channel<1>: (localhost:10000)] Exiting idle mode
190306 13:49:15.995:D 18 [io.grpc.ChannelLogger.log] [Channel<1>: (localhost:10000)] Resolved address: [[[/127.0.0.1:10000]/{}]], config={}
190306 13:49:15.995:D 18 [io.grpc.ChannelLogger.log] [Channel<1>: (localhost:10000)] Address resolved: [[[/127.0.0.1:10000]/{}]]
190306 13:49:15.997:D 18 [io.grpc.ChannelLogger.log] [Subchannel<3>] Subchannel for [[[/127.0.0.1:10000]/{}]] created
190306 13:49:15.999:D 18 [io.grpc.ChannelLogger.log] [Channel<1>: (localhost:10000)] Child Subchannel created
```
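With java.util.logging, one way to get this behavior is `Logger.logp`, which records an explicit source class and method instead of inferring them from the call stack. The sketch below assumes that approach; `PublicNameLogger` is a hypothetical name and not grpc-java's actual implementation.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical channel logger that always reports a fixed, public source class/method. */
final class PublicNameLogger {
  // Assumed public names; the logger is also named after SOURCE_CLASS, so the name
  // shown in log records matches the name used to configure the log level.
  static final String SOURCE_CLASS = "io.grpc.ChannelLogger";
  static final String SOURCE_METHOD = "log";

  private final Logger logger = Logger.getLogger(SOURCE_CLASS);

  void log(Level level, String message) {
    // logp() stores the given source class/method, so records no longer show
    // internal frames such as io.grpc.internal.ChannelTracer.logOnly.
    logger.logp(level, SOURCE_CLASS, SOURCE_METHOD, message);
  }
}
```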
The PICK_FIRST mode puts all backend addresses in a single Subchannel. It differs from the default ROUND_ROBIN mode in a few ways:
1. PICK_FIRST doesn't eagerly connect to backends like ROUND_ROBIN does. Instead, it requests a connection when the Subchannel is picked.
2. PICK_FIRST adds tokens to the headers via a different code path (`TokenAttachingTracerFactory`) than ROUND_ROBIN.
3. For simplicity, when the service config changes the mode while the LoadBalancer is running, we shut down `GrpclbState` and start a new one with the new mode. All connections will be closed during the transition. We don't expect this to happen in practice given the specific use case of PICK_FIRST.
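Point 3, restarting on a mode change, might look roughly like the following. `LbMode`, `LbState`, and `ModeSwitchingBalancer` are hypothetical stand-ins for the mode, `GrpclbState`, and GrpclbLoadBalancer, and connection teardown is reduced to a flag.

```java
/** Hypothetical modes mirroring the two grpclb child policies. */
enum LbMode { ROUND_ROBIN, PICK_FIRST }

/** Hypothetical stand-in for GrpclbState: owns all connections for one mode. */
final class LbState {
  final LbMode mode;
  boolean shutdown;

  LbState(LbMode mode) {
    this.mode = mode;
  }

  void shutdown() {
    // A real implementation would close every Subchannel, and thus every connection, here.
    shutdown = true;
  }
}

/** Hypothetical balancer that rebuilds its state whenever the configured mode changes. */
final class ModeSwitchingBalancer {
  private LbState state = new LbState(LbMode.ROUND_ROBIN);

  void handleConfig(LbMode newMode) {
    if (newMode == state.mode) {
      return; // Same mode: keep the existing state and connections.
    }
    // Simple but disruptive: tear everything down and start over in the new mode.
    state.shutdown();
    state = new LbState(newMode);
  }

  LbState currentState() {
    return state;
  }
}
```

The trade-off is the one stated above: the code stays simple at the cost of dropping all connections during a mode switch.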
Make sure the config for grpclb is passed to the GrpclbLoadBalancer, which will support two child policies: "round_robin" (default) and "pick_first".
Previously the presence of balancer addresses would dictate the "grpclb" policy, regardless of the service config. The service config now takes precedence instead.
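The new precedence rule, written as a hypothetical helper. It assumes that when the service config names no policy, balancer addresses still select "grpclb" and the channel otherwise falls back to "pick_first"; policy names are plain strings here rather than grpc-java types.

```java
import java.util.List;

/** Hypothetical top-level LB policy selection. */
final class PolicySelector {
  private PolicySelector() {}

  static String selectPolicy(String serviceConfigPolicy, List<String> balancerAddresses) {
    if (serviceConfigPolicy != null) {
      return serviceConfigPolicy; // The service config wins, even if balancer addresses exist.
    }
    if (!balancerAddresses.isEmpty()) {
      return "grpclb"; // Assumed fallback when the config is silent.
    }
    return "pick_first"; // Assumed channel default.
  }
}
```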
Implement config parsing logic in GrpclbLoadBalancer. Per offline discussions with @markdroth and @ejona86, we will ignore configuration errors for now. More appropriate handling of config errors will follow.
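A lenient parser in the spirit of "ignore configuration errors for now" might look like this. The config shape (a `childPolicy` list of single-key maps, e.g. `{"childPolicy": [{"pick_first": {}}]}`) and the class name `GrpclbConfigSketch` are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical parser for the grpclb child policy; malformed input falls back silently. */
final class GrpclbConfigSketch {
  static final String DEFAULT_POLICY = "round_robin";

  private GrpclbConfigSketch() {}

  /** Returns the first recognized child policy name, or the default. */
  static String parseChildPolicy(Map<String, ?> config) {
    if (config == null) {
      return DEFAULT_POLICY;
    }
    Object rawList = config.get("childPolicy");
    if (!(rawList instanceof List)) {
      return DEFAULT_POLICY; // Wrong type: ignored rather than reported.
    }
    for (Object entry : (List<?>) rawList) {
      if (!(entry instanceof Map)) {
        continue; // Malformed entry: skip it.
      }
      for (Object name : ((Map<?, ?>) entry).keySet()) {
        if ("round_robin".equals(name) || "pick_first".equals(name)) {
          return (String) name;
        }
      }
    }
    return DEFAULT_POLICY;
  }
}
```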
This commit switches to using a Sync task to place generated code in the
src/generated folder instead of the gradle-protobuf-plugin's
generatedFilesBaseDir. This gives much nicer results on failed builds:
you will no longer see all the generated files deleted.
At the same time, the Sync task makes it easy to copy only the
grpc-generated code. This was not done previously because we were lazy
and used generatedFilesBaseDir, which made it difficult to treat the
services differently from the messages.
This was added for the potential use case of needing to resolve target
names (with the same scheme as the top-level channel's target) in the
LoadBalancer. Now actual use cases have come up in xDS where we need to
resolve fully-qualified target strings with arbitrary schemes. This
method has never been used and won't fit future uses because it's too
restrictive.
This is needed for GRPCLB pick_first support, which must attach
per-server tokens to headers. In pick_first, all addresses are in a
single Subchannel, so the LoadBalancer needs to know which backend is
used for a new stream.
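To illustrate why the backend identity matters, here is a hypothetical per-backend token table. `TokenAttacher`, the string-keyed header map, and the `lb-token` header name are assumptions for this sketch; the point is only that the token lookup needs the address the stream actually uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-backend token table for a single pick_first-style Subchannel. */
final class TokenAttacher {
  // Assumed header name for this sketch.
  static final String TOKEN_HEADER = "lb-token";

  private final Map<String, String> tokensByBackend = new HashMap<>();

  void setToken(String backendAddress, String token) {
    tokensByBackend.put(backendAddress, token);
  }

  /**
   * Called when a new stream starts. With every address in one Subchannel, the
   * Subchannel alone does not identify the backend, so the caller passes it in.
   */
  void attachToken(String connectedBackend, Map<String, String> headers) {
    headers.remove(TOKEN_HEADER); // Never forward a token meant for another backend.
    String token = tokensByBackend.get(connectedBackend);
    if (token != null) {
      headers.put(TOKEN_HEADER, token);
    }
  }
}
```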