This code is highly experimental, so we can change it at will later; this PR is just to try it out.
PerfMark task calls are added to the public APIs of client and server calls, recording when each call begins and ends. The listener and the call use separate scope IDs (roughly, thread IDs) because there is no synchronization between them. However, both use the same PerfTag, which allows them to be associated.
In the future, we can plumb the tag down into the stream to include deeper information about what's going on in a call.
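As a rough illustration of the pattern, here is a hedged sketch (assuming the `io.perfmark` API with `PerfMark.startTask`/`stopTask`/`createTag`; the class name, tag id, and call sites are hypothetical, not this PR's actual diff):

```java
import io.perfmark.PerfMark;
import io.perfmark.Tag;

final class PerfMarkedCallSketch {
  // Hypothetical tag; real code would derive the id from the call.
  private final Tag tag = PerfMark.createTag("call", 42);

  void start() {
    PerfMark.startTask("ClientCall.start", tag);
    try {
      // ... begin the call on this thread ...
    } finally {
      PerfMark.stopTask("ClientCall.start", tag);
    }
  }

  void onMessage() {
    // Runs on the listener's thread: a separate scope, but the same
    // tag, so a trace viewer can associate the two timelines.
    PerfMark.startTask("ClientCall.Listener.onMessage", tag);
    try {
      // ... deliver the message to the application ...
    } finally {
      PerfMark.stopTask("ClientCall.Listener.onMessage", tag);
    }
  }
}
```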
This resolves #5523
While bumping `com.android.tools.build:gradle` from `3.1.2` to `3.3.0`, some other plugins, artifacts, Maven repositories, and build scripts have to be updated:
- The Gradle wrapper needs to be upgraded to 4.10.x.
- The protobuf Gradle plugin needs to be bumped to a version compatible with the new Gradle version.
- The `google()` and `jcenter()` repositories need to be added for Android, otherwise `com.android.tools.build:aapt2:3.3.0x` and `trove4j`, respectively, will not be found (see the sketch after this list).
- The license for Android "build-tools;28.0.3" needs to be accepted in the Kokoro environment.
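For the repository change, a minimal sketch of the kind of block involved (the exact file and layout in this repo may differ):

```groovy
// Hypothetical build.gradle excerpt: make the Android Gradle plugin's
// dependencies resolvable.
buildscript {
  repositories {
    google()   // hosts com.android.tools.build artifacts such as aapt2
    jcenter()  // hosts trove4j
  }
}
```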
The `ThreadlessExecutor` currently used for blocking calls uses `LinkedBlockingQueue`, which is relatively heavy in terms of both allocations and synchronization overhead (e.g. compared to `ConcurrentLinkedQueue`). It accounts for ~10% of allocations and ~5% of allocated bytes per call in the `TransportBenchmark` when using the in-process transport with [stats and tracing disabled](https://github.com/grpc/grpc-java/issues/5510).
Changing to use a `ConcurrentLinkedQueue` results in a ~5% speedup of that benchmark.
Before:
```
Benchmark                         (direct)  (transport)  Mode  Cnt      Score     Error  Units
TransportBenchmark.unaryCall1024      true    INPROCESS  avgt   60   1877.339 ±  46.309  ns/op
TransportBenchmark.unaryCall1024     false    INPROCESS  avgt   60  12680.525 ± 208.684  ns/op
```
After:
```
Benchmark                         (direct)  (transport)  Mode  Cnt      Score     Error  Units
TransportBenchmark.unaryCall1024      true    INPROCESS  avgt   60   1779.188 ±  36.769  ns/op
TransportBenchmark.unaryCall1024     false    INPROCESS  avgt   60  12532.470 ± 238.271  ns/op
```
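For context, a minimal sketch of the idea, assuming the executor is drained by a single blocking thread (this is an illustration, not the exact grpc-java code): the queue becomes a `ConcurrentLinkedQueue`, and the draining thread parks itself instead of blocking inside the queue.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.locks.LockSupport;

final class ThreadlessExecutorSketch implements Executor {
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  // The single thread blocked in waitAndDrain(); volatile so that
  // execute() on another thread sees it.
  private volatile Thread waiter;

  /** Waits for at least one task to arrive, then runs all queued tasks. */
  public void waitAndDrain() throws InterruptedException {
    Runnable runnable = queue.poll();
    if (runnable == null) {
      waiter = Thread.currentThread();
      try {
        while ((runnable = queue.poll()) == null) {
          LockSupport.park(this);
          if (Thread.interrupted()) {
            throw new InterruptedException();
          }
        }
      } finally {
        waiter = null;
      }
    }
    do {
      runnable.run();
    } while ((runnable = queue.poll()) != null);
  }

  @Override
  public void execute(Runnable runnable) {
    queue.add(runnable);
    LockSupport.unpark(waiter); // no-op when waiter is null
  }
}
```

Because `LockSupport.unpark(null)` is a no-op, `execute` does not need to check whether a thread is currently waiting, which keeps the producer path lock-free.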
* grpclb: use a stopwatch to manage elapsed time for LB backoff
* reformat lines
* create a new Stopwatch for each newly created HC stream
* use the existing stopwatch supplier provided by FakeClock in tests
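A hedged sketch of the stopwatch-per-stream idea (the class and method names are hypothetical; grpc-java uses Guava's `Stopwatch` behind a `Supplier<Stopwatch>` so tests can substitute `FakeClock`'s supplier):

```java
import com.google.common.base.Stopwatch;
import com.google.common.base.Supplier;
import java.util.concurrent.TimeUnit;

final class BackoffTimerSketch {
  private final Stopwatch stopwatch;

  BackoffTimerSketch(Supplier<Stopwatch> stopwatchSupplier) {
    // Injected supplier: production code gets a real stopwatch,
    // tests get FakeClock's deterministic one.
    this.stopwatch = stopwatchSupplier.get();
  }

  void onStreamStarted() {
    stopwatch.reset().start(); // fresh elapsed time for each stream
  }

  long elapsedNanos() {
    return stopwatch.elapsed(TimeUnit.NANOSECONDS);
  }
}
```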
Using Long.MAX_VALUE as the delay for Netty's NioEventLoop#schedule can
cause the (long) deadlines of scheduled tasks to wrap into negative
values, which is unspecified behavior. Recent versions of Netty guard
against this overflow, but not all versions of grpc-java use a recent
enough Netty.
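A tiny illustration of the wrap (the numbers are made up; Netty computes a task's deadline roughly as elapsed nanos plus delay):

```java
import java.util.concurrent.TimeUnit;

public class DeadlineOverflowDemo {
  public static void main(String[] args) {
    long elapsed = TimeUnit.SECONDS.toNanos(5); // any positive uptime
    long delay = Long.MAX_VALUE;                // intended as "never fire"
    long deadline = elapsed + delay;            // silently overflows
    System.out.println(deadline);               // negative: task looks long overdue
  }
}
```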
When connections are gracefully closed, Netty's Http2ConnectionHandler
sets up a timeout task that forces resources to be freed after a grace
period. When those tasks' deadlines wrapped into negative values, a
race was observed in production systems (with grpc-java <= 1.12.1) that
caused Netty to never properly release the buffers associated with
active streams on the connection being (gracefully) closed.