See #933
- Create InternalHandlerRegistry, an immutable look-up table. Handlers
passed to ServerBuilder.addService() go to this registry. This covers
the most common use cases. By keeping the registry internal we could
freely change the registry's interface to accommodate optimizations,
e.g., for hpack.
- The internal registry uses a flat fullMethodName -> handler look-up
table instead of a hierarchical one used before. It faster because it
saves one look-up and a substring.
- Introduces the fallback registry, settable by
ServerBuilder.fallbackHandlerRegistry(), for advanced users who want a
dynamic registry. Moved the current MutableHandlerRegistryImpl to
io.grpc.util.MutableHandlerRegistry as a stock implementation of the
fallback registry. The io.grpc.MutableHandlerRegistry interface is now
removed.
This reverts commit 8825f355df.
The commit changed the package name of services that were used across
languages. That broke their functionality pretty severely. The changes
require more coordination with others.
Resolves#1221
Add ClientCall.cancel(String, Throwable) and deprecate
ClientCall.cancel(). Will delete cancel() after all known third-party
overriders have switched to overriding the new one.
This reduces the necessary number of threads in the application executor
and provides a small improvement in latency (~15μs, which is normally in
the noise, but would be a 5% improvement).
Benchmark (direct) (transport) Mode Cnt Score Error Units
Before:
TransportBenchmark.unaryCall1024 true INPROCESS avgt 10 1566.168 ± 13.677 ns/op
TransportBenchmark.unaryCall1024 false INPROCESS avgt 10 35769.532 ± 2358.967 ns/op
After:
TransportBenchmark.unaryCall1024 true INPROCESS avgt 10 1813.778 ± 19.995 ns/op
TransportBenchmark.unaryCall1024 false INPROCESS avgt 10 18568.223 ± 1679.306 ns/op
The benchmark results are exactly what we would expect, assuming that
half of the benefit of direct is on server and half on client:
1566 + (35769 - 1566) / 2 = 18668 ns --vs-- 18568 ns
It is expected that direct=true would get worse, because
SerializingExecutor is now used instead of
SerializeReentrantCallsDirectExecutor plus the additional cost of
ThreadlessExecutor.
In the future we could try to detect the ThreadlessExecutor and ellide
Serializ*Executor completely (as is possible for any single-threaded
executor). We could also optimize the queue used in ThreadlessExecutor
to be single-producer, single-consumer. I don't expect to do those
optimizations soon, however.
This reduces the number of classes defined, which reduces memory usage.
It also reduces the number of methods defined, which is important
because of the dex limit.
This should have virtually zero performance degradation because the
contiguous switch uses tableswitch bytecode.
When using a direct executor we don't need to wrap calls in a
serializing executor and can thus also avoid the overhead that
comes with it.
Benchmarks show that throughput can be improved substantially.
On my MBP I get a 24% improvement in throughput with also
significantly better latency throughout all percentiles.
(running qps_client and qps_server with --address=localhost:1234 --directexecutor)
=== BEFORE ===
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 452
90%ile Latency (in micros): 600
95%ile Latency (in micros): 726
99%ile Latency (in micros): 1314
99.9%ile Latency (in micros): 5663
Maximum Latency (in micros): 136447
QPS: 78498
=== AFTER ===
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 399
90%ile Latency (in micros): 429
95%ile Latency (in micros): 453
99%ile Latency (in micros): 650
99.9%ile Latency (in micros): 1265
Maximum Latency (in micros): 33855
QPS: 97552
A few things to note:
- ByteString has gone away in favor of AsciiString.
- Http2Headers now uses CharSequence for all methods, so there are a few places that we have to explicitly check for AsciiString to get the optimizations.
- We now have to specify a graceful shutdown timeout for our Netty handlers. Using 5 seconds.
There is no need to use ServerMethodDefinition in codegen. The create()
method itself could be helpful to a dynamic HandlerRegistry
implementation, so we won't remove it.
Client:
* New ManagedChannel abstract class.
* Adding ping to Channel.
* Moving builders and implementations to internal.
Server:
* Added lifecycle management API to Server (mirroring ManagedChannel).
* Moved ServerImpl, AbstractServerBuilder and handler registries to internal.
* New ServerBuilder abstract class (mirroring ManagedChannelBuilder).
Fixes#545