This removes a reference to io.grpc.internal from io.grpc. It also
optimizes server call handling for the outbound grpc-accept-encoding
header, as was already done for client calls.
See #933
- Create InternalHandlerRegistry, an immutable look-up table. Handlers
passed to ServerBuilder.addService() go to this registry. This covers
the most common use cases. By keeping the registry internal we can
freely change its interface to accommodate optimizations, e.g., for
HPACK.
- The internal registry uses a flat fullMethodName -> handler look-up
table instead of the hierarchical one used before. It is faster because
it saves one look-up and a substring operation (see the sketch after
this list).
- Introduce the fallback registry, settable by
ServerBuilder.fallbackHandlerRegistry(), for advanced users who want a
dynamic registry. Move the current MutableHandlerRegistryImpl to
io.grpc.util.MutableHandlerRegistry as a stock implementation of the
fallback registry. The io.grpc.MutableHandlerRegistry interface is now
removed.
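A minimal sketch of the flat look-up idea (the class and field names
here are illustrative, not the actual internal API):

    import io.grpc.ServerMethodDefinition;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: one flat, immutable map keyed by the full
    // method name ("service/method"), so a look-up is a single hash
    // probe with no substring split into service and method names.
    final class FlatRegistrySketch {
      private final Map<String, ServerMethodDefinition<?, ?>> methods;

      FlatRegistrySketch(Map<String, ServerMethodDefinition<?, ?>> methods) {
        this.methods = Collections.unmodifiableMap(
            new HashMap<String, ServerMethodDefinition<?, ?>>(methods));
      }

      ServerMethodDefinition<?, ?> lookup(String fullMethodName) {
        return methods.get(fullMethodName);
      }
    }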
This reverts commit 8825f355df.
The reverted commit changed the package names of services that are used
across languages, which broke their functionality pretty severely. Such
changes require more coordination with the other languages.
Resolves #1221
Add ClientCall.cancel(String, Throwable) and deprecate
ClientCall.cancel(). Will delete cancel() after all known third-party
overriders have switched to overriding the new one.
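For example, the new overload lets callers attach context to a
cancellation (a hedged sketch of the intended usage; the wrapper method
is illustrative):

    import io.grpc.ClientCall;

    final class CancelExample {
      // Both arguments may be null, but a message and cause make the
      // cancellation much easier to debug.
      static void abort(ClientCall<?, ?> call, Throwable cause) {
        call.cancel("caller gave up waiting for the response", cause);
      }
    }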
This reduces the necessary number of threads in the application executor
and provides a small improvement in latency (~15μs, which is normally in
the noise, but would be a 5% improvement).
Benchmark                         (direct)  (transport)  Mode  Cnt      Score       Error  Units
Before:
TransportBenchmark.unaryCall1024      true    INPROCESS  avgt   10   1566.168 ±   13.677  ns/op
TransportBenchmark.unaryCall1024     false    INPROCESS  avgt   10  35769.532 ± 2358.967  ns/op
After:
TransportBenchmark.unaryCall1024      true    INPROCESS  avgt   10   1813.778 ±   19.995  ns/op
TransportBenchmark.unaryCall1024     false    INPROCESS  avgt   10  18568.223 ± 1679.306  ns/op
The benchmark results are exactly what we would expect, assuming that
half of the benefit of direct is on the server and half on the client:
1566 + (35769 - 1566) / 2 = 18668 ns predicted, vs. 18568 ns measured.
It is expected that direct=true would get worse, because
SerializingExecutor is now used instead of
SerializeReentrantCallsDirectExecutor, plus there is the additional
cost of ThreadlessExecutor.
In the future we could try to detect the ThreadlessExecutor and elide
Serializ*Executor completely (as is possible for any single-threaded
executor). We could also optimize the queue used in ThreadlessExecutor
to be single-producer, single-consumer. I don't expect to do those
optimizations soon, however.
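For reference, the ThreadlessExecutor idea amounts to queueing work and
draining it from the thread that is already blocked on the call; a
simplified sketch, not the actual internal class:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Executor;
    import java.util.concurrent.LinkedBlockingQueue;

    // Tasks are queued and run on whichever thread calls waitAndDrain(),
    // so a blocking stub can run its callbacks on its own blocked thread
    // instead of consuming a thread from the application executor.
    final class ThreadlessExecutorSketch implements Executor {
      private final BlockingQueue<Runnable> queue =
          new LinkedBlockingQueue<Runnable>();

      @Override
      public void execute(Runnable task) {
        queue.add(task);
      }

      void waitAndDrain() throws InterruptedException {
        Runnable task = queue.take(); // block until work arrives
        do {
          task.run();
        } while ((task = queue.poll()) != null);
      }
    }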
This reduces the number of classes defined, which reduces memory usage.
It also reduces the number of methods defined, which is important
because of Android's dex method limit.
This should cause virtually zero performance degradation because a
switch over contiguous values compiles to the tableswitch bytecode.
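To illustrate the bytecode behavior (the method names below are made
up; what matters is the contiguous case values):

    // Contiguous case values let javac emit a tableswitch, an O(1)
    // indexed jump; sparse values would fall back to a lookupswitch,
    // which is a binary search.
    final class MethodIdSwitch {
      static String nameOf(int methodId) {
        switch (methodId) {
          case 0: return "sayHello";
          case 1: return "sayHelloAgain";
          case 2: return "streamHellos";
          case 3: return "chat";
          default: throw new AssertionError("unknown method id: " + methodId);
        }
      }
    }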
When using a direct executor we don't need to wrap calls in a
serializing executor and can thus also avoid the overhead that
comes with it.
Benchmarks show that throughput can be improved substantially.
On my MacBook Pro I get a 24% improvement in throughput, along with
significantly better latency across all percentiles.
(running qps_client and qps_server with --address=localhost:1234 --directexecutor)
=== BEFORE ===
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 452
90%ile Latency (in micros): 600
95%ile Latency (in micros): 726
99%ile Latency (in micros): 1314
99.9%ile Latency (in micros): 5663
Maximum Latency (in micros): 136447
QPS: 78498
=== AFTER ===
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 399
90%ile Latency (in micros): 429
95%ile Latency (in micros): 453
99%ile Latency (in micros): 650
99.9%ile Latency (in micros): 1265
Maximum Latency (in micros): 33855
QPS: 97552
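Conceptually, the change amounts to skipping the serializing wrapper
when the configured executor is already direct; a hedged sketch of the
idea (the decision point in the real code differs; assumes
SerializingExecutor's single-Executor constructor):

    import com.google.common.util.concurrent.MoreExecutors;
    import io.grpc.internal.SerializingExecutor;
    import java.util.concurrent.Executor;

    final class CallExecutors {
      // A direct executor runs callbacks inline on the calling thread,
      // which is already serial, so the wrapper and its queue are pure
      // overhead and can be skipped.
      static Executor forCall(Executor appExecutor) {
        if (appExecutor == MoreExecutors.directExecutor()) {
          return appExecutor;
        }
        return new SerializingExecutor(appExecutor);
      }
    }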
There is no need to use ServerMethodDefinition in codegen. The create()
method itself could be helpful to a dynamic HandlerRegistry
implementation, so we won't remove it.
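For instance, a dynamic registry can still assemble definitions at
registration time (a sketch assuming create() keeps its current shape):

    import io.grpc.MethodDescriptor;
    import io.grpc.ServerCallHandler;
    import io.grpc.ServerMethodDefinition;

    final class DynamicDefinitions {
      // Pair a descriptor with a handler without any generated code.
      static <ReqT, RespT> ServerMethodDefinition<ReqT, RespT> define(
          MethodDescriptor<ReqT, RespT> method,
          ServerCallHandler<ReqT, RespT> handler) {
        return ServerMethodDefinition.create(method, handler);
      }
    }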
Client:
* New ManagedChannel abstract class.
* Adding ping to Channel.
* Moving builders and implementations to internal.
Server:
* Added lifecycle management API to Server (mirroring ManagedChannel).
* Moved ServerImpl, AbstractServerBuilder and handler registries to internal.
* New ServerBuilder abstract class (mirroring ManagedChannelBuilder).
Fixes #545
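With the lifecycle API in place, typical server usage looks roughly
like this (a hedged sketch of the intended surface):

    import io.grpc.Server;

    final class ServerLifecycle {
      // Mirrors ManagedChannel: start, then shut down gracefully and
      // wait for in-flight calls to complete.
      static void run(Server server) throws Exception {
        server.start();
        // ... serve traffic ...
        server.shutdown();
        server.awaitTermination();
      }
    }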
This takes some steps towards #525, but it won't be fixed until we
officially support OpenSSL, because ALPN->NPN fallback isn't supported
with Jetty (since only one of the bootstrap plugins can be provided).
Reserve io.grpc for the public API only, and move all internal stuff in
core to io.grpc.internal, including the non-stable transport API.
Raise the netty/okhttp/inprocess subpackages one level up to io.grpc,
because they are public API and entry points for most users.
Details:
- Rename io.grpc.transport to io.grpc.internal;
- Move SharedResourceHolder and SerializingExecutor to io.grpc.internal
- Rename io.grpc.transport.{netty|okhttp|inprocess} to
io.grpc.{netty|okhttp|inprocess}
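In user code the move shows up as a one-line import change, for example
(assuming code that uses the Netty channel builder):

    // Before: the builder lived under the transport subpackage.
    import io.grpc.transport.netty.NettyChannelBuilder;
    // After: public entry points are promoted one level up.
    import io.grpc.netty.NettyChannelBuilder;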
- Rename flushTo() to drainTo().
- Remove flushTo() from DeferredNanoProtoInputStream (which is renamed
to NanoProtoInputStream), because the optimization is not implemented.
- Rename DeferredProtoInputStream to ProtoInputStream.
#529
- Remove blockingClientStreamingCall(), which is unused and is not an
API we actually want.
- Rename duplexStreamingCall() to asyncDuplexStreamingCall() to align
with the other async methods.
- In unary and client-streaming calls, do not request an additional
response after the first response.
This gives us more flexibility for future API changes.
Unary and server-streaming calls should invoke the flow-control method
call.request() only once. Previously it was called whenever a request
arrived, which was wrong; this is now fixed (a sketch follows).
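A hedged sketch of the corrected pattern (illustrative only, assuming
the two-type-parameter ServerCall shape; not the actual ServerCalls
code):

    import io.grpc.ServerCall;

    final class FlowControlSketch {
      // Unary and server-streaming calls expect exactly one inbound
      // request message, so ask the transport for it exactly once when
      // the call starts, not once per delivered message.
      static void onCallStarted(ServerCall<?, ?> call) {
        call.request(1);
      }
    }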
Resolves #436
Resolves #511.
- In generated code, make CONFIG private and the METHOD_* fields
public. The METHOD_* fields are now MethodDescriptors; users of the
CONFIG field should switch to the METHOD_* fields.
- Move MethodType into MethodDescriptor (#529).
- Unify the fully qualified method name: it is the fully qualified
service name + slash + short method name, with no leading slash,
e.g. ``helloworld.Greeter/SayHello``.
- HandlerRegistry switches its key from the short method name to the
fully qualified method name.
- Pass CallOptions to Channel.newCall() and
ClientInterceptor.interceptCall().
- Remove timeout from AbstractStub.StubConfigBuilder and add deadline,
which is stored in a CallOptions inside the stub.
- Deadlines are in nanoseconds on the clock defined by
System.nanoTime(). A deadline is converted to a timeout before
transmission on the wire; if it has already expired, the call fails
with DEADLINE_EXCEEDED (see the sketch below).
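The conversion itself is plain arithmetic; a minimal sketch (the
IllegalStateException stands in for failing the call with
Status.DEADLINE_EXCEEDED):

    import java.util.concurrent.TimeUnit;

    final class DeadlineSketch {
      // The deadline is an absolute reading of the System.nanoTime()
      // clock; the wire format wants a relative timeout, so compute the
      // remainder at send time.
      static long remainingMicros(long deadlineNanoTime) {
        long remainingNanos = deadlineNanoTime - System.nanoTime();
        if (remainingNanos <= 0) {
          throw new IllegalStateException("deadline already expired");
        }
        return TimeUnit.NANOSECONDS.toMicros(remainingNanos);
      }
    }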
Other classes already follow the convention that ClientFoo is
client-side and ServerFoo is server-side; Call has been the black sheep
of the family.
- Call -> ClientCall
- Calls -> ClientCalls
- ForwardingCall* -> ForwardingClientCall*
To use the test certs, a module has to depend on the grpc-interop-testing module, which doesn't really make sense. I'm moving the certs to the grpc-testing module, which is meant to be a common testing area for all of our modules.
The classes are available, even on Windows, but trying to use them
won't work. You'll get an error like:
java.lang.UnsatisfiedLinkError: no netty-transport-native-epoll in java.library.path
- streaming ping-pong messages per second
- unary response bandwidth in megabits per second
- streaming response bandwidth in megabits per second
- streaming server that obeys outbound flow control
The previous output was not generated with proto3-alpha-2. It has been
regenerated using protoc from Maven Central, so it should be the same
output everywhere.
Resolves #357
- Add project property ``grpc.skip.codegen``, which is false by default.
People who change neither the codegen nor the proto files can set it to
true so that they don't need to set up C++ compilation.
- Check in all generated files under ``src/generated``.
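For example (``-P`` sets a Gradle project property; any task works the
same way):

    ./gradlew build -Pgrpc.skip.codegen=true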
- Support for Streaming RPCs.
- Support for using DirectExecutor.
- Support for specifying a client payload size.
- Support for exporting the histogram / latency distribution to a file,
so that it can be visualized with a third-party tool.
- Support setting the server / client flow control window for the connection and stream.
- Improved output format.
- Update README to include some JVM tuning and profiling information.
- Latencies are no longer reported in nanos, but in micros instead.
- Change warmup period to run the same code as in the actual benchmark.
- Can no longer specify the total number of concurrent RPCs, but instead
the number of concurrent RPCs per channel.
- Import the .proto file used by the C++ QPS package.
- Code cleanup.
1. Adds <property name="separateLineBetweenGroups" value="true"/> to CustomImportOrder to enforce a blank line between import groups.
2. Uses checkstyle 6.5, which fixes a CustomImportOrder bug where import sorting was checked in ASCII order instead of case-insensitive alphabetical order.
The checkstyle.xml is a slightly modified version of the upstream Google
checkstyle configuration. All changes have comment describing them.
Lots of warnings were corrected. Examples is the only project that
still has warnings, as the necessary changes require some thought.
The QpsClient no longer executes a fixed number of RPCs but instead runs for a period of time (see #83).
After some discussion with @ejona86, we also agreed to remove the "server_threads" parameter
and to no longer use a `DirectExecutor`, thus running the QPS Server without any tweaks or
modifications. We believe/hope that this change will make the comparison between the C++ and
Java versions less "Apples and Oranges".
The "client_threads" parameter was renamed to "concurrent_calls" to better reflect what it
actually does.
Furthermore, I updated the Gradle build script to create separate
executables for the client and the server.
I also added a README.