Commit Graph

244 Commits

Author SHA1 Message Date
Carl Mastrangelo 870ae40c8d benchmarks: disable flag printing 2017-04-14 11:17:47 -07:00
Łukasz Strzałkowski 4f96b0a483 compiler: move over to method descriptor builder 2017-04-12 08:56:46 -07:00
Carl Mastrangelo 17b90169d8 all: begin 1.4.x development cycle 2017-04-11 14:51:39 -07:00
Kun Zhang 903197b2aa core: StreamTracer (#2863)
Background
==========

LoadBalancer needs to track RPC measurements and status for
load-reporting.  We need to introduce a "Tracer" API for that.

Since such API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
are recorded, we will migrate the Census-based stats reporting under the
new Tracer API.

Alternatives
============

We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording those information along with the currently
recorded stats to Census. The LB-related information, such as LB_ID,
reason for dropping reqeusts etc, would be added to the Census
StatsContext as tags.

Since tags are held by StatsContext before eventually being recorded by
providing the measurements, and StatsContext is immutable, this would
require a way for LoadBalancer to override the StatsContext, which means
LoadBalancer API would has direct reference to the Census StatsContext.
This is undesirable because Census API is not stable yet.

Part of the LB-related information is whether the client has received
the initial headers from the server.  While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.

Bottom line, trying to use solely the Census StatsContext API to record
LB load information would require extra data plumbing channel between
ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing
Census API on the gRPC API.  Even with those extensive changes, we are
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR.

Summary of changes
==================

API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.

On the client-side, CallOptions now takes a list of
ClientStreamTracer.Factory. Opon creating a ClientStream, each of the
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install its own tracer factories by overriding the
CallOptions.

Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor.  By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call.  This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.

On the server-side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.

The Tracer API supports propagation of stats/tracing information through
Context and metadata.  Both client-side and server-side tracer factories
have access to the headers object.  Client-side tracer relies on
interceptor to read the Context, while server-side tracer has
filterContext() method that can override the Context.

Implementation details
----------------------

Only real streams report stats.  Pseudo streams such as delayed stream,
failing stream don't report.  InProcess transport streams currently
don't report stats.

"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream.  On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.

The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.

Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.

Bug fixed
----------------

The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong.  Because closed()
receiving OK doesn't necessarily mean the RPC ended with OK.  Instead it
means the server has successfully sent the final status, which may be
non-OK, to the client.

Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK).  Whichever happens
first is the reported status.

TODOs
=====

A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
2017-04-07 11:03:24 -07:00
Carl Mastrangelo 824e5df5cf benchmarks: use JMH 1.18 2017-03-30 14:52:22 -07:00
Eric Anderson 4096d4b668 core,netty: support GET verb in AbstractClientStream2 2017-03-30 14:18:14 -07:00
Carl Mastrangelo a4d698f7c1 core: make SerializingExecutor lockless (#2858)
This is an alternative implementation of #2192
2017-03-29 17:57:42 -07:00
Carl Mastrangelo 7a73bf1068 core,benchmarks: use Atomics for StatsTraceContext
This removes a needless warning, and isn't much slower.  Also this
includes a benchmark for StatsTraceContext to measure the overhead
for creation.  It adds about 40ns per RPC.  Optimization will come
after structural changes are made to break the dependency on
Census.
2017-03-23 17:36:21 -07:00
Eric Anderson 48a32fbeaa benchmarks: Fix broken building of ServerServiceDefinition
This appears to have been broken by 3df1446 (which was reverted and
later rolled forward again in 66ab956).

Without this fix, the ServerServiceDefinition.Builder realizes that a
method is registered that isn't in the ServiceDescriptor. Swapping to a
different constructor causes the builder to generate the
ServiceDescriptor for us.

java.lang.IllegalStateException: No entry in descriptor matching bound method E6Cq77iKGNKVCGyVOqq8DqEazX9AcBdPNoMj86c3I5zo4Tv77U/vLe7QS7mhUfaooN7eYdBW7gd9oyV.kc9I0zJumfuUbhyb7SR1u
	at io.grpc.ServerServiceDefinition$Builder.build(ServerServiceDefinition.java:164)
	at io.grpc.benchmarks.netty.HandlerRegistryBenchmark.setup(HandlerRegistryBenchmark.java:107)
2017-03-23 13:39:16 -07:00
Carl Mastrangelo ee12cc2a34 all: update to latest version of errorprone 2017-03-22 22:09:04 -07:00
Carl Mastrangelo 7ce2b4f81d all: start 1.3.0 development cycle 2017-03-06 13:12:44 -08:00
Carl Mastrangelo 5c37a8360c core: cache Accept-Encoding headers (#2766)
* core: cache Accept-Encoding headers

This avoids rebuilding the raw bytes for each RPC.  The decode path
is not yet optimized to avoid pulling to much into a single commit.
Decompressors may still use invalid names, but this is equivalent to
previous behavior, as cleaing happens later in the caching.

Also, internal accessors on DecompressorRegistry are now hidden.

Before:
Benchmark                                                    (extraEncodings)    Mode      Cnt        Score    Error  Units
DecompressorRegistryBenchmark.marshalOld                                    0  sample   928744      124.104 ± 11.159  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   0  sample                84.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   0  sample                94.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   0  sample               107.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   0  sample               114.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   0  sample               202.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  0  sample              4944.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 0  sample             12178.008           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   0  sample           2056192.000           ns/op
DecompressorRegistryBenchmark.marshalOld                                    1  sample  1345050      150.123 ±  6.952  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   1  sample               109.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   1  sample               127.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   1  sample               142.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   1  sample               152.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   1  sample               243.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  1  sample              4640.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 1  sample             11472.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   1  sample           2101248.000           ns/op
DecompressorRegistryBenchmark.marshalOld                                    2  sample  1130903      175.846 ±  1.392  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   2  sample               131.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   2  sample               148.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   2  sample               164.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   2  sample               174.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   2  sample               311.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  2  sample              6048.768           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 2  sample             12349.107           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   2  sample            112000.000           ns/op

After:
Benchmark                                                    (extraEncodings)    Mode      Cnt        Score   Error  Units
DecompressorRegistryBenchmark.marshalOld                                    0  sample  1095005       67.555 ± 5.529  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   0  sample                42.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   0  sample                52.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   0  sample                69.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   0  sample                84.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   0  sample               133.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  0  sample              3324.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 0  sample             11056.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   0  sample           1820672.000          ns/op
DecompressorRegistryBenchmark.marshalOld                                    1  sample  1437034       78.089 ± 0.723  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   1  sample                60.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   1  sample                69.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   1  sample                79.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   1  sample                83.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   1  sample                96.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  1  sample              2728.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 1  sample             11104.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   1  sample            105344.000          ns/op
DecompressorRegistryBenchmark.marshalOld                                    2  sample  1203782       95.213 ± 0.864  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   2  sample                68.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   2  sample                85.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   2  sample                98.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   2  sample               101.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   2  sample               119.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  2  sample              3209.736          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 2  sample             11257.947          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   2  sample             63168.000          ns/op
2017-03-01 15:30:28 -08:00
Eric Anderson 675080b208 all: Enable ErrorProne during compilation
ErrorProne provides static analysis for common issues, including
misused variables GuardedBy locks.

This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
2017-02-24 14:53:23 -08:00
Eric Anderson 72923dca87 all: prepare for ErrorProne's FutureReturnValueIgnored
Futures almost universally should be handled in some way when being
returned, either to receive the value or to cancel scheduled tasks to
prevent leaks.

Netty is a bit of a special case though, since it constantly returns
futures that you ignore (even adding a listener returns the "this"
future). So we want to suppress the warning for code using Netty instead
of trying to fix it. When we enable ErrorProne in the build, we should
start passing -Xep:FutureReturnValueIgnored:OFF in the compilerArgs.
2017-02-16 07:22:56 -08:00
Carl Mastrangelo 700abb32af compiler: add some missing final modifiers on generated code 2017-02-10 10:16:43 -08:00
Carl Mastrangelo b0323ac22c all: update to protobuf 3.2.0 2017-02-07 09:47:15 -08:00
Carl Mastrangelo e8aef5b4bb Start 1.2.0 development cycle 2017-01-30 16:40:12 -08:00
Carl Mastrangelo 237a65ebfc core: make ServiceDescriptor use the Builder pattern 2017-01-30 15:22:40 -08:00
Carl Mastrangelo 566d0e9400 testing: change names of noopMarshaller to void marshaller
This is needed because in interceptor tests, often the types cannot
be changed.  The void methods stay for users who are writing tests
where they actually don't care about types.  The noop methods
require types to be specified.  This is for users who don't care
about the implementation.  These represent different levels of
commitment.

This eases the transition of code Mocking MethodDescriptor, which
breaks in this release.
2017-01-27 14:41:38 -08:00
Carl Mastrangelo 89bc2cd3b2 all: update to latest import ordering 2017-01-26 13:43:06 -08:00
Carl Mastrangelo efbcd1f1b9 core: change method descriptor to be builder based 2017-01-23 12:29:35 -08:00
Carl Mastrangelo d5eb248737 all: bump to netty 4.1.7 2017-01-19 15:24:26 -08:00
Kun Zhang d17a7b5bd4 core: abstract channel builder to accept LoadBalancer2 (#2583)
If a LoadBalancer2 is passed in, the builder will create ManagedChannelImpl2 instead of ManagedChannelImpl. This allows us to test the LBv2 classes on a large scale.
2017-01-10 15:30:12 -08:00
Kenji Kaneda fed9be28ca benchmarks: do not set done to true when HistogramFuture#get 2017-01-09 11:22:01 -08:00
ZHANG Dapeng 3d210ae875 compiler: reduce synchronzed invocation (#2539)
not necessary to synchronze every time calling
getServiceDescriptor(), if the descriptor has been created already;

go with the double-checked locking idom
2016-12-29 12:21:04 -08:00
Kun Zhang 1aaf1a989c compiler: final bindService() in generated code. (#2553)
So that it won't be overridden by Mockito when it creates a mock for
the server interface.
2016-12-29 10:32:47 -08:00
Carl Mastrangelo 49b235dbbe benchmarks: update to jmh 1.17.3 2016-12-19 16:14:59 -08:00
Lukasz Strzalkowski ecb79ce36f Update param name for saving histograms 2016-12-08 11:21:50 -08:00
Carl Mastrangelo f15ed05260 stub: remove a reference to internal 2016-12-02 09:13:37 -08:00
ZHANG Dapeng e70f7b421c all: cleanup - errorprone, unused 2016-11-16 19:15:16 -08:00
Carl Mastrangelo 944879dbbb benchmarks: reuse executor between channels
Perhaps by accident, each LoadClient channel got its own executor
pool rather than sharing between all of them.  The benchmarks are
currently set up with C++ idioms, creating 64 "channels" per
client.  C++ uses one thread per channel, while java does not,
leading to a threading model mismatch.

ForkJoinPool is the current executor because it has good contention
characteristics.   However, FJP is also very sensitive to the
number of actively running tasks.  Too many, and the contention
will go up.  Too few, and threads will repeatedly go to sleep and
wake up.

This change basically makes the FJP shared for all clients in the
same process.  It doesn't try at all to shut it down properly or
be generally useful; it's just a benchmark.  Additionally, this
also groups the Netty and OkHttp specific channel options into
helper functions.  This means OkHttp benchmarks will also use
the correct executor now.

A quick test shows QPS going from ~107kqps to 149kqps.

Fixes: #2406
2016-11-09 09:52:02 -08:00
Carl Mastrangelo 636b43b871 benchmarks: print gc detail and vm flags for benchmarks 2016-11-03 12:55:42 -07:00
Carl Mastrangelo f6c333eefe benchmarks: check shutdown after acquiring a token, and weaken error messages on shutdown 2016-11-02 13:08:16 -07:00
Eric Gribkoff abffc76da2 addressing reviewer comments 2016-10-28 08:45:32 -07:00
Eric Gribkoff aff1cac7da Compiler/core changes to support the proto reflection API
core: adds @Nullable Object getAttachedObject() to ServiceDescriptor

compiler: Plumbing necessary to access proto file descriptors via
the reflection service
2016-10-28 08:45:32 -07:00
Eric Anderson 98f4e41e7f benchmarks: Avoid sending a message after half close
This doesn't impact test behavior per-se, but causes it to produce less
useless log output of the form:

java.lang.IllegalStateException: call was half-closed
	at com.google.common.base.Preconditions.checkState(Preconditions.java:174)
	at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:380)
	at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:299)
	at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:406)
	at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:400)
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:382)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessageRead.runInContext(ClientCallImpl.java:473)
	... 7 more

Fixes #2372
2016-10-26 15:55:19 -07:00
Carl Mastrangelo 894802e3ec compiler: lazily generate ServiceDescriptors 2016-10-19 16:52:47 -07:00
Jakob Buchgraber 46e46eb8bd benchmarks: Reduce histogram precision. (#2343)
It greatly reduces memory usage (from 130KiB to 30KiB per histogram).
2016-10-15 10:45:00 +02:00
Carl Mastrangelo 1623063143 core: speed up Status code and message parsing
This introduces the idea of a "Trusted" Ascii Marshaller, which is
known to always produce valid ASCII byte arrays.  This saves a
surprising amount of garbage, since String conversion involves
creating a new java.lang.StringCoding, and a sun.nio.cs.US_ASCII.

There are other types that can be converted (notably
Http2ClientStream's :status marshaller, which is particularly
wasteful).

Before:
Benchmark                              Mode     Cnt     Score    Error  Units
StatusBenchmark.codeDecode           sample  641278    88.889 ±  9.673  ns/op
StatusBenchmark.codeEncode           sample  430800    73.014 ±  1.444  ns/op
StatusBenchmark.messageDecodeEscape  sample  433467   441.078 ± 58.373  ns/op
StatusBenchmark.messageDecodePlain   sample  676526   268.620 ±  7.849  ns/op
StatusBenchmark.messageEncodeEscape  sample  547350  1211.243 ± 29.907  ns/op
StatusBenchmark.messageEncodePlain   sample  419318   223.263 ±  9.673  ns/op

After:
Benchmark                              Mode     Cnt    Score    Error  Units
StatusBenchmark.codeDecode           sample  442241   48.310 ±  2.409  ns/op
StatusBenchmark.codeEncode           sample  622026   35.475 ±  0.642  ns/op
StatusBenchmark.messageDecodeEscape  sample  595572  312.407 ± 15.870  ns/op
StatusBenchmark.messageDecodePlain   sample  565581   99.090 ±  8.799  ns/op
StatusBenchmark.messageEncodeEscape  sample  479147  201.422 ± 10.765  ns/op
StatusBenchmark.messageEncodePlain   sample  560957   94.722 ±  1.187  ns/op

Also fixes #2237

Before:
Result "unaryCall1024":
  mean = 155710.268 ±(99.9%) 149.278 ns/op

  Percentiles, ns/op:
      p(0.0000) =  63552.000 ns/op
     p(50.0000) = 151552.000 ns/op
     p(90.0000) = 188672.000 ns/op
     p(95.0000) = 207360.000 ns/op
     p(99.0000) = 260608.000 ns/op
     p(99.9000) = 358912.000 ns/op
     p(99.9900) = 1851425.792 ns/op
     p(99.9990) = 11161178.767 ns/op
     p(99.9999) = 14985005.383 ns/op
    p(100.0000) = 17235968.000 ns/op

Benchmark                         (direct)  (transport)    Mode      Cnt       Score     Error  Units
TransportBenchmark.unaryCall1024      true        NETTY  sample  3205966  155710.268 ± 149.278  ns/op

After:
Result "unaryCall1024":
  mean = 147474.794 ±(99.9%) 128.733 ns/op

  Percentiles, ns/op:
      p(0.0000) =  59520.000 ns/op
     p(50.0000) = 144640.000 ns/op
     p(90.0000) = 176128.000 ns/op
     p(95.0000) = 190464.000 ns/op
     p(99.0000) = 236544.000 ns/op
     p(99.9000) = 314880.000 ns/op
     p(99.9900) = 1113084.723 ns/op
     p(99.9990) = 10783126.979 ns/op
     p(99.9999) = 13887153.242 ns/op
    p(100.0000) = 15253504.000 ns/op

Benchmark                         (direct)  (transport)    Mode      Cnt       Score     Error  Units
TransportBenchmark.unaryCall1024      true        NETTY  sample  3385015  147474.794 ± 128.733  ns/op
2016-09-13 13:08:59 -07:00
Jakob Buchgraber 8c18a0d355 netty: use custom http2 headers for decoding.
The DefaultHttp2Headers class is a general-purpose Http2Headers implementation
and provides much more functionality than we need in gRPC. In gRPC, when reading
headers off the wire, we only inspect a handful of them, before converting to
Metadata.

This commit introduces a Http2Headers implementation that aims for insertion
efficiency, a low memory footprint and fast conversion to Metadata.

  - Header names and values are stored in plain byte[].
  - Insertion is O(1), while lookup is now O(n).
  - Binary header values are base64 decoded as they are inserted.
  - The byte[][] returned by namesAndValues() can directly be used to construct
    a new Metadata object.
  - For HTTP/2 request headers, the pseudo headers are no longer carried over to
    Metadata.

A microbenchmark aiming to replicate the usage of Http2Headers in NettyClientHandler
and NettyServerHandler shows decent throughput gains when compared to DefaultHttp2Headers.

Benchmark                                             Mode  Cnt     Score    Error  Units
InboundHeadersBenchmark.defaultHeaders_clientHandler  avgt   10   283.830 ±  4.063  ns/op
InboundHeadersBenchmark.defaultHeaders_serverHandler  avgt   10  1179.975 ± 21.810  ns/op
InboundHeadersBenchmark.grpcHeaders_clientHandler     avgt   10   190.108 ±  3.510  ns/op
InboundHeadersBenchmark.grpcHeaders_serverHandler     avgt   10   561.426 ±  9.079  ns/op

Additionally, the memory footprint is reduced by more than 50%!

gRPC Request Headers: 864 bytes
Netty Request Headers: 1728 bytes
gRPC Response Headers: 216 bytes
Netty Response Headers: 528 bytes

Furthermore, this change does most of the gRPC groundwork necessary to be able
to cache higher ordered objects in HPACK's dynamic table, as discussed in [1].

[1] https://github.com/grpc/grpc-java/issues/2217
2016-09-09 23:15:18 +02:00
Carl Mastrangelo de9c320196 benchmarks: upgrade to jmh 1.14 2016-09-09 14:14:31 -07:00
Carl Mastrangelo ca5a402fe6 netty: cache method path conversion
Benchmark                                      Mode      Cnt   Score   Error  Units
MethodDescriptorBenchmark.direct             sample  1179094  43.483 ± 0.610  ns/op
MethodDescriptorBenchmark.old                sample  1079285  66.210 ± 5.402  ns/op
MethodDescriptorBenchmark.transportSpecific  sample  1408070  36.800 ± 0.423  ns/op
2016-09-07 14:41:02 -07:00
Carl Mastrangelo 54f5c4ba89 benchmarks: add fork join pool executor to load server too 2016-08-04 17:34:41 -07:00
Carl Mastrangelo 130d3815cf benchmarks: use a fork-join pool to reduce executor contention 2016-08-04 16:49:10 -07:00
Carl Mastrangelo 1aadc6c223 benchmarks: update to using jmh 1.13 2016-08-04 13:04:49 -07:00
Eric Anderson 5e1e88357c Update protobuf to 3.0.0
Fixes #2086
2016-07-29 09:31:15 -07:00
Carl Mastrangelo efc5cf50b7 benchmarks: avoid 2 copies in favor of one
MessageFramer calls Drainable.drainTo with a special output stream of
OutputStreamAdapter.  Currently, ByteBufInputStream writes to this output
stream by allocating a heapBuffer in UnsafeByteBufUtil.getBytes, copying
from the direct byte buffer of BBIS, and then copies to the direct byte
buffer from MessageFramer.writeRaw().

This change is an easy way to cut down on wasted memory, even though
ideally there would be some way to have less copies.  The actual data is
only around 10 bytes, but causes O(10)s of megabytes allocation for the
heap pool.

For #2062
2016-07-26 10:38:03 -07:00
Carl Mastrangelo bfd12e4a86 benchmarks: improve benchmarks recording and shutdown
The benchmarks today do not have a good way to record metrics with precision
or shutdown safely when the benchmark is over.  This change alters the
AbstractBenchmark class to return a latch that can be waited upon when ending
the benchmark.

Benchmarks also would accidentally request way too many messages from the
server by calling request(1) explicitly in addition to the implicit one
in the StreamObserver to Call adapter.  This change adds a few outstanding
requests, but otherwise keeps the request count bounded.

Additionally, benchmark calls would ignore errors, and just shutdown in such
cases.  This changes them to log the error and just wait for the benchmark to
complete.  In the successful case, the benchmark client notifies server by
halfClosing (via onCompleted) where it previously did not.  It is also
careful to only do this once.

Lastly, Benchmarks have been changes to enable and disable recording at exact
points in the benchmark method, rather than waiting for teardown to occur.
Also, recording begins inside the recording method, not in Setup.  JMH may
do other procressing before, between, and after iterations.
2016-07-22 14:55:03 -07:00
ZHANG Dapeng 3a13aa5665 compiler: make Stub final class 2016-07-22 09:49:32 -07:00
ZHANG Dapeng e109125c62 compiler: add build option to enable deprecated generated code
partially resolving #1469

The added option for java_plugin `enable_deprecated` is `true` by default in `java_plugin.cpp`, so the generated code for `TestService.java` (`compiler/build.gradle` not setting this option) has all deprecated interfaces and static bindService method.

`./build.gradle` and `examples/build.gradle` set this option explicitly to `false`, so all the other generated classes do not have deprecated code.

Will set `enable_deprecated` to `false` by default in future PR when we are ready.
2016-07-21 16:35:18 -07:00
Carl Mastrangelo fa86c8eb74 benchmarks: daemonize threads to reduce the number of stacktraces 2016-07-19 11:32:00 -07:00
Louis Ryan c1ef8061d1 Fix selection of security Provider to conscruct SSLContext
Cleanup redundant API un TestUtils
Fix TlsTest to be ignored on JKD7 correctly
2016-07-18 14:25:27 -07:00
Eric Anderson e9643bb5d7 Start 1.1.0 development cycle 2016-07-11 16:57:58 -07:00
Eric Anderson 46379da1a6 Start 1.0.0 development cycle 2016-07-01 11:46:33 -07:00
nmittler 8f78b4d88e Fixing error-prone warnings. 2016-07-01 10:03:41 -07:00
Jakob Buchgraber 3eafa5aeec benchmarks: fix typo in README 2016-07-01 16:20:47 +02:00
ZHANG Dapeng f149e4c175 compiler: deprecate interfaces and add ImplBase in codegen
first step to address issue #1469:

- leave and deprecate interfaces in codegen
- introduce `ServiceImplBase`,
- `AbstractService` is deprecated and extends `ServiceImplBase`
- static `bindService()` is deprecated
2016-06-29 21:17:03 -07:00
ZHANG Dapeng 8ed2dc8bec testware: fix flakes caused by pickUnusedPort
Resolves #1756

The thread-unsafe method `io.grpc.testing.TestUtils.pickUnusedPort` causes flakes (#1756) in windows. Need to avoid use of this method in test as in windows the tests are running in different jvms and concurrent calls of this method in multiple processes tend to return the same port number.

There are some usages of this method in benchmarks, so moved the method to `io.grpc.benchmarks.Utils` and the method will only be used in benchmarks and not in test.
2016-06-28 13:34:38 -07:00
Eric Anderson dbef1af29a Bump protobuf dependency to 3.0.0-beta-3
This allows us to play with zero-copy and proto3 support for lite.
Unfortunately, it introduced some warnings, so deprecated warnings are
now ignored for benchmarks and interop-testing.
2016-06-28 08:58:13 -07:00
Eric Anderson 66ab956f9e Reapply "Eliminate MethodDescriptor from startCall and interceptCall for servers"
This reverts commit ef178304cb, which
itself was a revert.
2016-06-23 09:27:47 -07:00
Eric Anderson ef178304cb Revert "Eliminate MethodDescriptor from startCall and interceptCall for servers"
This reverts commit 3df1446deb.

The commit was adding to the difficulty of integration for testing. By
itself it isn't bad, so this is a temporary revert until the many other
commits are absorbed and then it will be reapplied.

This does have a manual edit for ClientCallsTest.
2016-06-20 15:18:18 -07:00
Jakob Buchgraber 63d8274661 core: don't create a new context for each client call. Fixes #1926
Also, undo the hack that makes sure AsyncClient's stack doesn't overflow as it's no longer needed.
2016-06-16 16:02:17 +02:00
Jakob Buchgraber 33cd6bee0e core: Overwrite values for existing custom call options.
When a custom calloption is overwritten we keep its old value around.
This patch actually overwrites the old value.
2016-06-15 10:29:05 +02:00
Carl Mastrangelo 7c2f7913dc benchmarks: add a header encoding benchmark
Initial numbers so we have some reference:

Benchmark                              (headerCount)    Mode     Cnt     Score    Error  Units
HeadersBenchmark.convertClientHeaders              1  sample  108722   111.912 ±  1.751  ns/op
HeadersBenchmark.convertClientHeaders              5  sample  165053   268.991 ±  1.776  ns/op
HeadersBenchmark.convertClientHeaders             10  sample  188272   441.860 ±  2.531  ns/op
HeadersBenchmark.convertClientHeaders             20  sample  190869   841.065 ± 31.847  ns/op
HeadersBenchmark.convertServerHeaders              1  sample  126578   110.958 ±  1.604  ns/op
HeadersBenchmark.convertServerHeaders              5  sample  179316   241.991 ±  2.006  ns/op
HeadersBenchmark.convertServerHeaders             10  sample  103436   406.329 ±  3.114  ns/op
HeadersBenchmark.convertServerHeaders             20  sample   99179   809.501 ±  4.320  ns/op
HeadersBenchmark.encodeClientHeaders               1  sample  134333   625.415 ± 48.461  ns/op
HeadersBenchmark.encodeClientHeaders               5  sample  167081   971.640 ± 42.102  ns/op
HeadersBenchmark.encodeClientHeaders              10  sample  112809  1418.281 ± 61.045  ns/op
HeadersBenchmark.encodeClientHeaders              20  sample  137593  2286.269 ±  6.715  ns/op
2016-06-14 10:41:31 -07:00
Louis Ryan 3df1446deb Eliminate MethodDescriptor from startCall and interceptCall for servers
Make the MethodDescriptor a property of ServerCall
Move ServerMethodDefinition into ServerServiceDefinition
2016-06-13 14:39:58 -07:00
Eric Anderson 0b55c81548 core: Move ACCEPT_ENCODING_JOINER to DecompressorRegistry
This removes a reference of io.grpc.internal from io.grpc. It also
optimizes server call handling for outbound grpc-accept-encoding header,
as had already been done for client call.
2016-06-13 13:42:58 -07:00
Eric Anderson 7c722e440e codegen: Specify URL for ExperimentalApi
Fixes #1378. These are the last ExperimentalApis that didn't have their
own separate tracking issue.
2016-06-07 15:00:29 -07:00
Carl Mastrangelo 0021e063f2 benchmarks: reset context in AsyncClient to avoid chaining.
Fixes #1878
2016-06-02 21:41:43 -07:00
Carl Mastrangelo 2ed39c92eb core: cache decompressor registry encodings, and make it copy on write 2016-06-02 09:10:42 -07:00
Carl Mastrangelo 134451f62c benchmarks: use nextUp for nextDelay calculation in OpenLoopClient 2016-05-31 14:05:48 -07:00
Carl Mastrangelo 9f37951680 benchmarks: honor transport in AsyncClient 2016-05-27 11:55:18 -07:00
Carl Mastrangelo f172170649 benchmarks: use more realistic header counts in benchmark 2016-05-25 16:30:29 -07:00
Eric Anderson 641cb357c6 Tweak -Xlint warnings
This now catches a few more places we needed -Xlint:-options.
InProcessSocketAddress is technically already in our stable API, so I
maintained its current serialVersionUID.
2016-05-24 15:04:07 -07:00
Carl Mastrangelo cca473df99 benchmarks: add client/server header benchmarks 2016-05-24 13:11:14 -07:00
Carl Mastrangelo 5e30b2f7ba netty: speed up header conversion by caching user agent string
Performance gain seems to be about 100-200ns based on a a couple trials.

Benchmark                                 (headerCount)  (validate)    Mode     Cnt     Score    Error  Units
HeadersBenchmark.convertClientHeadersOld             10       false  sample  187490   858.234 ±  3.992  ns/op
HeadersBenchmark.convertClientHeadersOld             20       false  sample  113589  1407.557 ± 45.178  ns/op
HeadersBenchmark.convertClientHeadersOld             50       false  sample  100725  3141.936 ± 55.175  ns/op
HeadersBenchmark.convertClientHeadersOld            100       false  sample  109742  5707.748 ± 38.222  ns/op
HeadersBenchmark.convertHeaders                      10       false  sample  109137   748.486 ±  4.060  ns/op
HeadersBenchmark.convertHeaders                      20       false  sample  133639  1238.528 ± 51.914  ns/op
HeadersBenchmark.convertHeaders                      50       false  sample  107914  2915.602 ± 10.017  ns/op
HeadersBenchmark.convertHeaders                     100       false  sample  110305  5682.404 ± 44.032  ns/op
2016-05-23 17:49:30 -07:00
Carl Mastrangelo 8090effa2d netty: don't revalidate when converting between Metadata and Http2Headers 2016-05-23 11:02:22 -07:00
Carl Mastrangelo 8c7440e141 benchmarks: add NETTY_EPOLL as an option for transport testing 2016-05-18 16:37:42 -07:00
Carl Mastrangelo 6a802a2027 benchmarks: revert use of CMS GC 2016-05-17 10:38:59 -07:00
Carl Mastrangelo d08c167bc7 benchmarks: use Concurrent Mark and Sweep GC for benchmark worker 2016-05-16 16:44:58 -07:00
Carl Mastrangelo 188840b920 benchmarks: update to JMH 1.12 2016-05-13 21:25:31 -07:00
Kun Zhang 95973827f5 Refactor HandlerRegistry.
See #933

- Create InternalHandlerRegistry, an immutable look-up table. Handlers
  passed to ServerBuilder.addService() go to this registry. This covers
  the most common use cases. By keeping the registry internal we could
  freely change the registry's interface to accommodate optimizations,
  e.g., for hpack.

- The internal registry uses a flat fullMethodName -> handler look-up
  table instead of a hierarchical one used before. It faster because it
  saves one look-up and a substring.

- Introduces the fallback registry, settable by
  ServerBuilder.fallbackHandlerRegistry(), for advanced users who want a
  dynamic registry. Moved the current MutableHandlerRegistryImpl to
  io.grpc.util.MutableHandlerRegistry as a stock implementation of the
  fallback registry. The io.grpc.MutableHandlerRegistry interface is now
  removed.
2016-05-04 17:12:32 -07:00
ZHANG Dapeng aed886d8de use Jetty ALPN agent instead of Jetty ALPN
#1497
2016-05-02 14:01:36 -07:00
Carl Mastrangelo 3c5b5a5e09 Begin v0.15.0 Cycle 2016-04-29 13:54:18 -07:00
Eric Anderson ac4168a236 Revert "Update proto packages to reflect directory structure"
This reverts commit 8825f355df.

The commit changed the package name of services that were used across
languages. That broke their functionality pretty severely. The changes
require more coordination with others.
2016-04-28 10:42:47 -07:00
Carl Mastrangelo 43b73f34cb Fix more null proto refs 2016-04-27 14:28:52 -07:00
Carl Mastrangelo 8825f355df Update proto packages to reflect directory structure 2016-04-27 14:16:28 -07:00
Carl Mastrangelo 0293c10161 Checking null on proto fields just ain't right. 2016-04-27 13:23:52 -07:00
Kun Zhang 6b5177d3e3 Fix javadoc warning. 2016-04-27 09:57:02 -07:00
Kun Zhang 16e168951a Allow application to pass cancellation details.
Resolves #1221

Add ClientCall.cancel(String, Throwable) and deprecate
ClientCall.cancel(). Will delete cancel() after all known third-party
overriders have switched to overriding the new one.
2016-04-27 09:38:18 -07:00
Carl Mastrangelo 38a91f83e1 Fix lint warnings found on internal import 2016-04-26 13:21:25 -07:00
nmittler 7e8b504e3f Add javadoc to grpc codegen based on proto docs
Fixes #1612
2016-04-22 13:23:17 -07:00
Eric Anderson 9bc5d93e4a Mark generated abstract class as Experimental 2016-04-19 12:35:04 -07:00
Jan Tattermusch 4a0c110f7d Merge pull request #1687 from grpc/java_qps_take_two
Fix QpsWorker shutdown properly
2016-04-18 16:59:21 -07:00
Jan Tattermusch 6b2727b181 Merge pull request #1686 from grpc/tweaking_java_qpsworker
Adjust Java qps worker to work well with run_performance_tests.py
2016-04-18 16:40:25 -07:00
Carl Mastrangelo c4e8b1f10f Rename internal.Server to internal.TransportServer 2016-04-18 15:50:20 -07:00
Carl Mastrangelo a1725d41ad Update benchmark gradle file 2016-04-14 14:12:39 -07:00
Louis Ryan a7049bca3b Implement the load worker that can receive control events from the load driver and initiate load testing scenarios.
Will be used for GRPC's continuous load testing process.
2016-04-13 13:45:22 -07:00
Carl Mastrangelo 357878d2d6 Bump jmh version 2016-04-13 11:56:33 -07:00
Lukasz Strzalkowski 363e0f6cfc Print compiler version number in generated files 2016-04-11 19:35:19 -07:00
Lukasz Strzalkowski 2fbf142a41 Provide base methods for Abstract stub
Default implementation returns status UNIMPLEMENTED. This allows adding
new methods to services without breaking existing code.
2016-04-11 16:38:23 +02:00