Implementations of ManagedClientTransport.start() are restricted from
calling the passed listener until start() returns, in order to avoid
reentrancy problems with locks. For most transports this isn't a
problem, because they need additional threads anyway. The in-process
transport naturally needs no additional threads, so it ended up creating
a thread just to call notifyReady. Now transports can instead return a
Runnable from start() that the caller runs after its locks are dropped.
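A minimal sketch of that shape, with illustrative names rather than the exact
ManagedClientTransport API:

```java
// Illustrative sketch only: start() records the listener and returns a Runnable;
// the caller invokes it after releasing its own locks, so the transport never
// re-enters the caller while those locks are held.
final class InProcessLikeTransport {
  interface Listener {
    void transportReady();
  }

  private Listener listener;

  Runnable start(Listener listener) {
    this.listener = listener;
    // Deliberately do not call listener.transportReady() here.
    return new Runnable() {
      @Override
      public void run() {
        // Run by the caller once its locks have been released.
        InProcessLikeTransport.this.listener.transportReady();
      }
    };
  }
}
```

Returning the Runnable keeps the locking discipline in the caller, which already
knows when its locks are released.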
This was originally intended as a performance optimization, but the extra
thread also causes nondeterminism because RPCs are delayed until
notifyReady is called. So avoiding the thread also reduces needless
flakiness in tests.
Protobuf-lite since beta-4 is now more of a fork than a subset of
protobuf-java, which may cause us problems later since the lite API is not
stable. Also, lite-generated code may now depend on APIs that exist only in
protobuf-lite, so our users must depend on the protobuf-lite runtime.
Having all our users explicitly override the dependency is bothersome for
them and can easily expose problems only after we do a release.
So now we do the dependency overriding ourselves; most users should "just
work" and pick up the correct protobuf artifact. I've confirmed the
exclusion is listed in the grpc-protobuf pom, and "gradle dependencies"
and "mvn dependency:tree" do not include protobuf-lite for the examples.
Vanilla protobuf users are the most likely to experience any breakage,
which should surface problems more quickly, since we use protobuf-java more
frequently than protobuf-lite during development.
protobuf-lite does not include pre-generated code for the well-known
protos, so users will need to generate them themselves for the moment
(google/protobuf#1889).
Note that today, changing deps does not noticeably reduce the method count
for our users, since ProGuard is already stripping most classes. The
difference in output is only a reduction of 3 classes and 6 methods for
the android example.
The != should have been ==. It is provable that the exception won't be
null, but we want that fact to be obvious when auditing, so we now simply
fail if the exception is ever null.
780b2696 caused every failure from blocking unary stubs to have a
StatusRuntimeException as the cause of the thrown StatusRuntimeException,
with the two exceptions having almost the same status.
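For illustration, a hedged sketch of the fail-fast check plus one possible way to
avoid the double wrapping; the names and structure are illustrative, not the actual
change:

```java
import io.grpc.Status;
import io.grpc.StatusRuntimeException;
import java.util.concurrent.ExecutionException;

final class BlockingStubErrorSketch {
  // Convert the cause of an ExecutionException into a StatusRuntimeException
  // without wrapping an existing StatusRuntimeException inside another one.
  static StatusRuntimeException toStatusException(ExecutionException e) {
    Throwable cause = e.getCause();
    if (cause == null) {
      // Provably impossible here, but fail loudly so the invariant is obvious to auditors.
      throw new AssertionError("ExecutionException with null cause");
    }
    if (cause instanceof StatusRuntimeException) {
      return (StatusRuntimeException) cause;
    }
    return Status.fromThrowable(cause).asRuntimeException();
  }
}
```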
`ClientTransport.newStream()` and
`CallCredentials.applyRequestMetadata()` are now called under the context
of the call. This can be used to pass any call-specific information to
`CallCredentials`.
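A hedged sketch of what this enables, assuming a call-scoped value attached via a
Context key (the key and value are hypothetical, not part of the API):

```java
import io.grpc.Context;

final class CallScopedCredentialsInfo {
  // Hypothetical key; real code would define whatever call-scoped values it needs.
  static final Context.Key<String> TENANT_ID = Context.key("tenant-id");

  // Because applyRequestMetadata() now runs under the call's context, a
  // CallCredentials implementation can read the key from the current context.
  static String tenantForCurrentCall() {
    return TENANT_ID.get(); // resolves against Context.current()
  }
}
```

A caller would attach the value with Context.current().withValue(TENANT_ID, ...) and
run or attach that context around the call.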
The value of nodeCount depended on deadlines expiring after the chain
was constructed. This is effectively the same as using Thread.sleep()
and would commonly fail if the machine was under load.
Instead of checking nodeCount after the deadline expires, we now wait
for the chain to be constructed and then cancel the RPC. This also
ensures that the cancel propagates instead of each hop just enforcing
the deadline. As a bonus, this also reduces test execution time by one
second. A new test was added for deadline propagation.
Fixes #1852
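A hedged sketch of the cancel-instead-of-deadline approach above, using io.grpc.Context
cancellation; this is illustrative, not the actual test code:

```java
import io.grpc.Context;

final class CancelChainSketch {
  // startChainedRpc is expected to start the call and block until the chain is
  // constructed; once it returns, we cancel the context, and the cancellation
  // propagates hop by hop instead of each hop independently enforcing a deadline.
  static void runAndCancel(Runnable startChainedRpc) {
    Context.CancellableContext ctx = Context.current().withCancellation();
    try {
      ctx.run(startChainedRpc); // RPCs started here inherit the cancellable context
    } finally {
      ctx.cancel(new RuntimeException("chain constructed; cancelling"));
    }
  }
}
```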
MessageFramer calls Drainable.drainTo() with a special output stream,
OutputStreamAdapter. Currently, ByteBufInputStream writes to this output
stream by allocating a heap buffer in UnsafeByteBufUtil.getBytes, copying
from the direct byte buffer of the ByteBufInputStream, and then copying
into the direct byte buffer from MessageFramer.writeRaw().
This change is an easy way to cut down on wasted memory, even though
ideally there would be some way to make fewer copies. The actual data is
only around 10 bytes, but it causes tens of megabytes of allocation in the
heap pool.
For #2062
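To illustrate the interface involved, here is a hedged sketch of a Drainable stream
that pushes its bytes straight into the framer's OutputStream without a temporary
buffer; the real change is in the Netty buffer path, this class is only illustrative:

```java
import io.grpc.Drainable;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;

final class DrainableByteArrayStream extends ByteArrayInputStream implements Drainable {
  DrainableByteArrayStream(byte[] bytes) {
    super(bytes);
  }

  @Override
  public int drainTo(OutputStream target) throws IOException {
    int written = count - pos;
    target.write(buf, pos, written); // one write, no intermediate heap buffer
    pos = count;
    return written;
  }
}
```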
We are no longer using resources to load providers on Android. Instead,
we are calling Class.forName() for known providers. ProGuard is able to
detect these usages automatically.
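A hedged sketch of the Class.forName() approach (the helper and the choice of provider
class are illustrative); passing a constant class name is what lets ProGuard see the
dependency:

```java
final class ProviderLoaderSketch {
  // Try to load a known provider; returns null if it is not on the classpath.
  // ProGuard recognizes Class.forName with a constant string and keeps the class.
  static Object createOkHttpProviderOrNull() {
    try {
      return Class.forName("io.grpc.okhttp.OkHttpChannelProvider")
          .getConstructor()
          .newInstance();
    } catch (ClassNotFoundException e) {
      return null; // provider not on the classpath
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("Provider present but could not be instantiated", e);
    }
  }
}
```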
The benchmarks today do not have a good way to record metrics precisely
or to shut down safely when the benchmark is over. This change alters the
AbstractBenchmark class to return a latch that can be waited on when ending
the benchmark.
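A hedged sketch of the latch-based shutdown, with illustrative names rather than the
exact AbstractBenchmark API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class BenchmarkShutdownSketch {
  // The benchmark driver waits on a latch counted down by the last outstanding call,
  // so teardown happens only after in-flight RPCs have completed.
  static void endBenchmark(CountDownLatch callsDone) throws InterruptedException {
    if (!callsDone.await(30, TimeUnit.SECONDS)) {
      throw new IllegalStateException("benchmark calls did not finish in time");
    }
    // Channels and servers can be shut down safely here.
  }
}
```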
Benchmarks also would accidentally request far too many messages from the
server by calling request(1) explicitly in addition to the implicit request
in the StreamObserver-to-Call adapter. This change adds a few outstanding
requests but otherwise keeps the request count bounded.
Additionally, benchmark calls would ignore errors and just shut down in such
cases. This changes them to log the error and wait for the benchmark to
complete. In the successful case, the benchmark client now notifies the server
by half-closing (via onCompleted), which it previously did not do, and it is
careful to do this only once.
Lastly, benchmarks have been changed to enable and disable recording at exact
points in the benchmark method, rather than waiting for teardown to occur.
Recording also begins inside the recording method, not in Setup, since JMH may
do other processing before, between, and after iterations.
Partially resolves #1469
The new java_plugin option `enable_deprecated` defaults to `true` in `java_plugin.cpp`, so the generated code for `TestService.java` (whose `compiler/build.gradle` does not set this option) has all the deprecated interfaces and the static bindService method.
`./build.gradle` and `examples/build.gradle` set this option explicitly to `false`, so all the other generated classes have no deprecated code.
We will set `enable_deprecated` to `false` by default in a future PR when we are ready.
To my knowledge, there has been just a single DeadlineTest flake since
the code was fixed to avoid issues with I/O due to class loading:
io.grpc.DeadlineTest > defaultTickerIsSystemTicker[0] FAILED
java.lang.AssertionError: <-21431071 ns from now> and <0 ns from now> should have been within <20000000ns> of each other
We don't really need fine-grained verification during the test, though;
if the code is not using nanoTime, then it is almost certainly not going
to have even a day of accuracy (except on a fresh VM). So checking for a
second of accuracy instead of 20ms shouldn't really be an issue.
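A hedged sketch of the relaxed check (illustrative, not the exact test code):

```java
import static org.junit.Assert.assertTrue;

import io.grpc.Deadline;
import java.util.concurrent.TimeUnit;

public class DeadlineToleranceSketch {
  // A deadline created "now" should read as roughly zero time remaining; a one-second
  // tolerance is still far tighter than any clock that is not based on nanoTime.
  public void defaultTickerTracksSystemTime() {
    long remainingNanos =
        Deadline.after(0, TimeUnit.SECONDS).timeRemaining(TimeUnit.NANOSECONDS);
    assertTrue(Math.abs(remainingNanos) < TimeUnit.SECONDS.toNanos(1));
  }
}
```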
WriteQueue uses LinkedBlockingQueue, which has stronger synchronization
semantics than we need. It also requires that we batch reads from it
in order to get reasonable performance. After profiling the delay
between writing to LBQ and reading from it, there was a ~10us delay.
This change switches the underlying queue to ConcurrentLinkedQueue and
removes the batching of reads. Using CLQ with batching is
slightly slower.
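A hedged sketch of the resulting queue-drain pattern (illustrative, not the actual
WriteQueue): producers offer to a ConcurrentLinkedQueue and a single scheduled drain
polls items one at a time, with no extra batching layer on top.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

final class WriteQueueSketch {
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean scheduled = new AtomicBoolean();
  private final Executor eventLoop;

  WriteQueueSketch(Executor eventLoop) {
    this.eventLoop = eventLoop;
  }

  void enqueue(Runnable cmd) {
    queue.offer(cmd);
    scheduleDrain();
  }

  private void scheduleDrain() {
    if (scheduled.compareAndSet(false, true)) {
      eventLoop.execute(this::drain);
    }
  }

  private void drain() {
    try {
      Runnable cmd;
      while ((cmd = queue.poll()) != null) {
        cmd.run();
      }
    } finally {
      scheduled.set(false);
      // An item may have been offered after the last poll() but before clearing the
      // flag; re-schedule so nothing is stranded.
      if (!queue.isEmpty()) {
        scheduleDrain();
      }
    }
  }
}
```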
Benchmarks show favorable numbers for both latency and throughput.
Each of the following results was reproduced across several runs:
Before:
Benchmark (direct) (transport) Mode Cnt Score Error Units
TransportBenchmark.unaryCall1024 true NETTY sample 321575 124185.027 ± 406.112 ns/op
TransportBenchmark.unaryCall1024 false NETTY sample 237400 168232.991 ± 548.043 ns/op
After:
Benchmark (direct) (transport) Mode Cnt Score Error Units
TransportBenchmark.unaryCall1024 true NETTY sample 354773 112552.339 ± 362.471 ns/op
TransportBenchmark.unaryCall1024 false NETTY sample 263297 151660.490 ± 507.463 ns/op
Qps with 10 outstanding RPCs per channel:
Before:
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 396
90%ile Latency (in micros): 680
95%ile Latency (in micros): 838
99%ile Latency (in micros): 1476
99.9%ile Latency (in micros): 5231
Maximum Latency (in micros): 43327
QPS: 85761
After:
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 384
90%ile Latency (in micros): 612
95%ile Latency (in micros): 725
99%ile Latency (in micros): 1080
99.9%ile Latency (in micros): 3107
Maximum Latency (in micros): 30447
QPS: 93353
The results are even better when under heavy load. Qps with 100
outstanding RPCs per channel:
Before:
Channels: 4
Outstanding RPCs per Channel: 100
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 2735
90%ile Latency (in micros): 5051
95%ile Latency (in micros): 6219
99%ile Latency (in micros): 9271
99.9%ile Latency (in micros): 13759
Maximum Latency (in micros): 44831
QPS: 125775
After:
Channels: 4
Outstanding RPCs per Channel: 100
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 2697
90%ile Latency (in micros): 4639
95%ile Latency (in micros): 5539
99%ile Latency (in micros): 7931
99.9%ile Latency (in micros): 12335
Maximum Latency (in micros): 61823
QPS: 131904
conscrypt at some point which would allow ALPN to function
Clarify that SSLContext.getDefault() is not used when constructing the
default SSLSocketFactory.