Otherwise, when the first response is received from the grpclb server, the
parent ClientConn enters TransientFailure, and the first several
non-wait-for-ready RPCs will fail.
* Modified tests to use tlogger.
* Fail on errors, with error expectations.
* Added expects and MixedCapsed grpclb_config tests
* Moved tlogger to grpctest, moved leakcheck tester to grpctest.go
* Added ExpectErrorN()
* Removed redundant leak checks
* Fixed new test
* Made tlogger globals into tlogger methods
* ErrorsLeft -> EndTest
* Removed some redundant lines
* Fixed error in test and empty map in EndTest
Small refactor of tests:
- Add a channel to remoteBalancer on which to send received stats messages
- Change runAndGetStats to runAndCheckStats to allow for faster successful test
runs (no longer need to sleep for one whole second per test).
- Remove redundant leak check from runAndCheckStats, otherwise excess load
report messages not read from the channel will result in an infinite loop.
- Add String method to rpcStats to avoid data race between merging and printing.
This is necessary because there's another way to select grpclb (by specifying grpclb in service config's balancing policy field). So it's possible that grpclb is picked, but resolver doesn't have any balancer addresses.
When an update without balancer address is received, grpclb closes the underlying ClientConn to remote balancer, and enters fallback mode.
Note that grpclb waits until the ClientConn and the RPC goroutines are actually closed to do the fallback work. This can avoid race caused by async close.
Before these fixes, it was possible to see errors on new RPCs after a
connection began draining, and before establishing a new connection. There is
an inherent race between choosing a SubConn and attempting to creating a stream
on it. We should be able to avoid application-visible RPC errors due to this
with transparent retry. However, several bugs were preventing this from
working correctly:
1. Non-wait-for-ready RPCs were skipping transparent retry, though the retry
design calls for retrying them.
2. The transport closed itself (and would consequently error new RPCs) before
notifying the SubConn that it was draining.
3. The SubConn wasn't synchronously updating itself once it was notified about
the closing or draining state.
4. The SubConn would go into the TRANSIENT_FAILURE state instantaneously,
causing RPCs to fail instead of queue.
Keep the drop index from the previous picker when regenerating picker due to
subchannel state change. Only reset the drop index when picker is regenerated
because of a serverlist update.
fixes#2623
This PR splits out grpclb from grpc. I have made the PR in several commits so you can see more clearly the steps that happened.
There are a few possibly contentious points that I would like to make clear up front:
* grpclb will no longer autoload as a load balancer. I think this is okay, as service config is not widely (at all?) used, and I believe this is the only way to access it.
* `internal` is used more, as a way of having code shared between packages without exposing types
* ConnectivityStateEvaluator, as used by grpclb, is no longer thread safe. I believe there is an outer mutex that guards access, but I want to point out this subtle change up here.
All but one tests pass with this, due to another cyclic dependency. I can fix this, but it is a little more widely scoped (such as exposing grpc.server and grpc.errorDesc in the internal package). This PR is a nearly-passing sample of that last step to get this working.
PTAL @menghanl @dfawley