If a LoadBalancer2 is passed in, the builder will create ManagedChannelImpl2 instead of ManagedChannelImpl. This allows us to test the LBv2 classes on a large scale.
not necessary to synchronze every time calling
getServiceDescriptor(), if the descriptor has been created already;
go with the double-checked locking idom
Perhaps by accident, each LoadClient channel got its own executor
pool rather than sharing between all of them. The benchmarks are
currently set up with C++ idioms, creating 64 "channels" per
client. C++ uses one thread per channel, while java does not,
leading to a threading model mismatch.
ForkJoinPool is the current executor because it has good contention
characteristics. However, FJP is also very sensitive to the
number of actively running tasks. Too many, and the contention
will go up. Too few, and threads will repeatedly go to sleep and
wake up.
This change basically makes the FJP shared for all clients in the
same process. It doesn't try at all to shut it down properly or
be generally useful; it's just a benchmark. Additionally, this
also groups the Netty and OkHttp specific channel options into
helper functions. This means OkHttp benchmarks will also use
the correct executor now.
A quick test shows QPS going from ~107kqps to 149kqps.
Fixes: #2406
core: adds @Nullable Object getAttachedObject() to ServiceDescriptor
compiler: Plumbing necessary to access proto file descriptors via
the reflection service
This doesn't impact test behavior per-se, but causes it to produce less
useless log output of the form:
java.lang.IllegalStateException: call was half-closed
at com.google.common.base.Preconditions.checkState(Preconditions.java:174)
at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:380)
at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:299)
at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:406)
at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:400)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:382)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessageRead.runInContext(ClientCallImpl.java:473)
... 7 more
Fixes#2372
The DefaultHttp2Headers class is a general-purpose Http2Headers implementation
and provides much more functionality than we need in gRPC. In gRPC, when reading
headers off the wire, we only inspect a handful of them, before converting to
Metadata.
This commit introduces a Http2Headers implementation that aims for insertion
efficiency, a low memory footprint and fast conversion to Metadata.
- Header names and values are stored in plain byte[].
- Insertion is O(1), while lookup is now O(n).
- Binary header values are base64 decoded as they are inserted.
- The byte[][] returned by namesAndValues() can directly be used to construct
a new Metadata object.
- For HTTP/2 request headers, the pseudo headers are no longer carried over to
Metadata.
A microbenchmark aiming to replicate the usage of Http2Headers in NettyClientHandler
and NettyServerHandler shows decent throughput gains when compared to DefaultHttp2Headers.
Benchmark Mode Cnt Score Error Units
InboundHeadersBenchmark.defaultHeaders_clientHandler avgt 10 283.830 ± 4.063 ns/op
InboundHeadersBenchmark.defaultHeaders_serverHandler avgt 10 1179.975 ± 21.810 ns/op
InboundHeadersBenchmark.grpcHeaders_clientHandler avgt 10 190.108 ± 3.510 ns/op
InboundHeadersBenchmark.grpcHeaders_serverHandler avgt 10 561.426 ± 9.079 ns/op
Additionally, the memory footprint is reduced by more than 50%!
gRPC Request Headers: 864 bytes
Netty Request Headers: 1728 bytes
gRPC Response Headers: 216 bytes
Netty Response Headers: 528 bytes
Furthermore, this change does most of the gRPC groundwork necessary to be able
to cache higher ordered objects in HPACK's dynamic table, as discussed in [1].
[1] https://github.com/grpc/grpc-java/issues/2217
MessageFramer calls Drainable.drainTo with a special output stream of
OutputStreamAdapter. Currently, ByteBufInputStream writes to this output
stream by allocating a heapBuffer in UnsafeByteBufUtil.getBytes, copying
from the direct byte buffer of BBIS, and then copies to the direct byte
buffer from MessageFramer.writeRaw().
This change is an easy way to cut down on wasted memory, even though
ideally there would be some way to have less copies. The actual data is
only around 10 bytes, but causes O(10)s of megabytes allocation for the
heap pool.
For #2062
The benchmarks today do not have a good way to record metrics with precision
or shutdown safely when the benchmark is over. This change alters the
AbstractBenchmark class to return a latch that can be waited upon when ending
the benchmark.
Benchmarks also would accidentally request way too many messages from the
server by calling request(1) explicitly in addition to the implicit one
in the StreamObserver to Call adapter. This change adds a few outstanding
requests, but otherwise keeps the request count bounded.
Additionally, benchmark calls would ignore errors, and just shutdown in such
cases. This changes them to log the error and just wait for the benchmark to
complete. In the successful case, the benchmark client notifies server by
halfClosing (via onCompleted) where it previously did not. It is also
careful to only do this once.
Lastly, Benchmarks have been changes to enable and disable recording at exact
points in the benchmark method, rather than waiting for teardown to occur.
Also, recording begins inside the recording method, not in Setup. JMH may
do other procressing before, between, and after iterations.
partially resolving #1469
The added option for java_plugin `enable_deprecated` is `true` by default in `java_plugin.cpp`, so the generated code for `TestService.java` (`compiler/build.gradle` not setting this option) has all deprecated interfaces and static bindService method.
`./build.gradle` and `examples/build.gradle` set this option explicitly to `false`, so all the other generated classes do not have deprecated code.
Will set `enable_deprecated` to `false` by default in future PR when we are ready.
first step to address issue #1469:
- leave and deprecate interfaces in codegen
- introduce `ServiceImplBase`,
- `AbstractService` is deprecated and extends `ServiceImplBase`
- static `bindService()` is deprecated
Resolves#1756
The thread-unsafe method `io.grpc.testing.TestUtils.pickUnusedPort` causes flakes (#1756) in windows. Need to avoid use of this method in test as in windows the tests are running in different jvms and concurrent calls of this method in multiple processes tend to return the same port number.
There are some usages of this method in benchmarks, so moved the method to `io.grpc.benchmarks.Utils` and the method will only be used in benchmarks and not in test.
This allows us to play with zero-copy and proto3 support for lite.
Unfortunately, it introduced some warnings, so deprecated warnings are
now ignored for benchmarks and interop-testing.
This reverts commit 3df1446deb.
The commit was adding to the difficulty of integration for testing. By
itself it isn't bad, so this is a temporary revert until the many other
commits are absorbed and then it will be reapplied.
This does have a manual edit for ClientCallsTest.
This removes a reference of io.grpc.internal from io.grpc. It also
optimizes server call handling for outbound grpc-accept-encoding header,
as had already been done for client call.
This now catches a few more places we needed -Xlint:-options.
InProcessSocketAddress is technically already in our stable API, so I
maintained its current serialVersionUID.
See #933
- Create InternalHandlerRegistry, an immutable look-up table. Handlers
passed to ServerBuilder.addService() go to this registry. This covers
the most common use cases. By keeping the registry internal we could
freely change the registry's interface to accommodate optimizations,
e.g., for hpack.
- The internal registry uses a flat fullMethodName -> handler look-up
table instead of a hierarchical one used before. It faster because it
saves one look-up and a substring.
- Introduces the fallback registry, settable by
ServerBuilder.fallbackHandlerRegistry(), for advanced users who want a
dynamic registry. Moved the current MutableHandlerRegistryImpl to
io.grpc.util.MutableHandlerRegistry as a stock implementation of the
fallback registry. The io.grpc.MutableHandlerRegistry interface is now
removed.
This reverts commit 8825f355df.
The commit changed the package name of services that were used across
languages. That broke their functionality pretty severely. The changes
require more coordination with others.
Resolves#1221
Add ClientCall.cancel(String, Throwable) and deprecate
ClientCall.cancel(). Will delete cancel() after all known third-party
overriders have switched to overriding the new one.
This reduces the necessary number of threads in the application executor
and provides a small improvement in latency (~15μs, which is normally in
the noise, but would be a 5% improvement).
Benchmark (direct) (transport) Mode Cnt Score Error Units
Before:
TransportBenchmark.unaryCall1024 true INPROCESS avgt 10 1566.168 ± 13.677 ns/op
TransportBenchmark.unaryCall1024 false INPROCESS avgt 10 35769.532 ± 2358.967 ns/op
After:
TransportBenchmark.unaryCall1024 true INPROCESS avgt 10 1813.778 ± 19.995 ns/op
TransportBenchmark.unaryCall1024 false INPROCESS avgt 10 18568.223 ± 1679.306 ns/op
The benchmark results are exactly what we would expect, assuming that
half of the benefit of direct is on server and half on client:
1566 + (35769 - 1566) / 2 = 18668 ns --vs-- 18568 ns
It is expected that direct=true would get worse, because
SerializingExecutor is now used instead of
SerializeReentrantCallsDirectExecutor plus the additional cost of
ThreadlessExecutor.
In the future we could try to detect the ThreadlessExecutor and ellide
Serializ*Executor completely (as is possible for any single-threaded
executor). We could also optimize the queue used in ThreadlessExecutor
to be single-producer, single-consumer. I don't expect to do those
optimizations soon, however.
This reduces the number of classes defined, which reduces memory usage.
It also reduces the number of methods defined, which is important
because of the dex limit.
This should have virtually zero performance degradation because the
contiguous switch uses tableswitch bytecode.
When using a direct executor we don't need to wrap calls in a
serializing executor and can thus also avoid the overhead that
comes with it.
Benchmarks show that throughput can be improved substantially.
On my MBP I get a 24% improvement in throughput with also
significantly better latency throughout all percentiles.
(running qps_client and qps_server with --address=localhost:1234 --directexecutor)
=== BEFORE ===
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 452
90%ile Latency (in micros): 600
95%ile Latency (in micros): 726
99%ile Latency (in micros): 1314
99.9%ile Latency (in micros): 5663
Maximum Latency (in micros): 136447
QPS: 78498
=== AFTER ===
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 399
90%ile Latency (in micros): 429
95%ile Latency (in micros): 453
99%ile Latency (in micros): 650
99.9%ile Latency (in micros): 1265
Maximum Latency (in micros): 33855
QPS: 97552
A few things to note:
- ByteString has gone away in favor of AsciiString.
- Http2Headers now uses CharSequence for all methods, so there are a few places that we have to explicitly check for AsciiString to get the optimizations.
- We now have to specify a graceful shutdown timeout for our Netty handlers. Using 5 seconds.
There is no need to use ServerMethodDefinition in codegen. The create()
method itself could be helpful to a dynamic HandlerRegistry
implementation, so we won't remove it.
Client:
* New ManagedChannel abstract class.
* Adding ping to Channel.
* Moving builders and implementations to internal.
Server:
* Added lifecycle management API to Server (mirroring ManagedChannel).
* Moved ServerImpl, AbstractServerBuilder and handler registries to internal.
* New ServerBuilder abstract class (mirroring ManagedChannelBuilder).
Fixes#545
This takes some steps towards #525, but it won't be fixed until we
officially support OpenSSL. This is due to the fact that ALPN->NPN fallback
isn't supported with Jetty (since only one of the bootstrap plugins can be provided).
Reserve io.grpc for public API only, and all internal stuff in core to
io.grpc.internal, including the non-stable transport API.
Raise the netty/okhttp/inprocess subpackages one level up to io.grpc,
because they are public API and entry points for most users.
Details:
- Rename io.grpc.transport to io.grpc.internal;
- Move SharedResourceHolder and SerializingExecutor to io.grpc.internal
- Rename io.grpc.transport.{netty|okhttp|inprocess} to
io.grpc.{netty|okhttp|inprocess}
- Rename flushTo() to drainTo().
- Remove flushTo() from DeferredNanoProtoInputStream (which is renamed
to NanoProtoInputStream), because the optimization is not implemented.
- Rename DeferredProtoInputStream to ProtoInputStream.
#529
- Remove blockingClientStreamingCall() which is not used, and we don't
actually want that API.
- Rename duplexStreamingCall() to asyncDuplexStreamingCall() to align
with other async methods.
- In unary call and client streaming call, do not request for additional
response after the first response.
This gives us more flexibility in API changes in the future.
Unary call and server streaming call should call the flow-control method
call.request() only once. Previously it was called whenever a request
arrives, which is wrong. Now it's fixed.
Resolves#436
Resolves#511.
- In generated code, make CONFIG private and METHOD_* fields public.
METHOD_* fields are MethodDescriptors now, users of the CONFIG field
should switch to using the METHOD_* fields.
- Move MethodType into MethodDescriptor (#529).
- Unify the fully qualified method name. It is fully qualified service
name + slash + short method name. It doesn't have the leading slash.
- HandlerRegistry switches the key from short method name to fully
qualified method name.
- Pass CallOptions to Channel.newCall() and
ClientInterceptor.interceptCall().
- Remove timeout from AbstractStub.StubConfigBuilder and add deadline,
which is stored in a CallOptions inside the stub.
- Deadline is in nanoseconds in the clock defined by System.nanoTime().
It is converted to timeout before transmitting on the wire. Fail the
call with DEADLINE_EXCEEDED if it's already expired.
Other classes are already following the convention that ClientFoo for
client-side, and ServerFoo for server-side. Call has been the black
sheep of the family.
- Call -> ClientCall
- Calls -> ClientCalls
- ForwardingCall* -> ForwardingClientCall*
To use the test certs a module has to depend on the grpc-interop-testing module, which doesn't really make sense. I'm moving the certs to the grpc-testing module, which is meant to be a sort of testing common area for all of our modules.
The classes are available, even on Windows. Trying to use them though
won't work. You'll get an error like:
java.lang.UnsatisfiedLinkError: no netty-transport-native-epoll in java.library.path
"Interoperability" is a more appropriate name for the tests, since they
are used for testing across different implementations. They will do a
bit of integration testing, like for auth, but this is a smaller scale.
It seems the other languages (Go, C++, Node, PHP, Python, Ruby) are
using "interop" to describe the tests, and the test case specifications
document is named with "interop". After this change, C# will be the only
language calling them "integration" tests.
This change just renames the folder and artifact. We can change the
internal package names later. However, once we do a release, old
artifact names will live forever in Maven Central.