Kun Zhang 903197b2aa core: StreamTracer (#2863)
Background
==========

LoadBalancer needs to track RPC measurements and status for
load-reporting.  We need to introduce a "Tracer" API for that.

Since such an API is very close to the current
Census (instrumentation)-based stats reporting mechanism in terms of
what is recorded, we will migrate the Census-based stats reporting onto
the new Tracer API.

Alternatives
============

We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording that information along with the currently
recorded stats to Census. The LB-related information, such as the LB_ID
and the reason for dropping requests, would be added to the Census
StatsContext as tags.

Since tags are held by the StatsContext before eventually being
recorded along with the measurements, and the StatsContext is
immutable, this would require a way for the LoadBalancer to override
the StatsContext, which means the LoadBalancer API would have a direct
reference to the Census StatsContext. This is undesirable because the
Census API is not stable yet.

Part of the LB-related information is whether the client has received
the initial headers from the server.  While such information can be
captured by implementing a ClientInterceptor, it must be recorded along
with other information, such as the LB_ID, to be useful, and the LB_ID
is only available in GrpclbLoadBalancer.

Bottom line: trying to use solely the Census StatsContext API to record
LB load information would require extra data-plumbing channels between
the ClientInterceptor, the LoadBalancer and the gRPC core, as well as
exposing the Census API on the gRPC API.  Even with those extensive
changes, we have yet to find a working solution. Therefore, we
abandoned this idea and propose this PR.

Summary of changes
==================

API summary
-----------
Introduce the "StreamTracer" API, a callback interface for receiving
stats- and tracing-related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and reports to all of
them.
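
For illustration, a tracer that counts messages and wire bytes on a
stream could look like the sketch below. The callback names follow the
grpc-java StreamTracer API introduced by this change; exact signatures
may differ between versions.

    import io.grpc.Status;
    import io.grpc.StreamTracer;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical example: accumulates traffic counters for one stream.
    class CountingTracer extends StreamTracer {
      final AtomicLong outboundMessages = new AtomicLong();
      final AtomicLong outboundBytes = new AtomicLong();

      @Override
      public void outboundMessage() {
        outboundMessages.incrementAndGet();
      }

      @Override
      public void outboundWireSize(long bytes) {
        outboundBytes.addAndGet(bytes);
      }

      @Override
      public void streamClosed(Status status) {
        // Called once with the final status of the stream;
        // report the accumulated totals here.
      }
    }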

On the client side, CallOptions now takes a list of
ClientStreamTracer.Factory. Upon creating a ClientStream, each factory
creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install their own tracer factories by overriding
the CallOptions.

Since a StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor.  By installing its
own StreamTracer factory when a ClientCall is created, a
ClientInterceptor can associate the updates for a Call with the updates
for the Streams created for that Call.  This is how we keep the
existing Census reporting mechanism in CensusStreamTracerModule, and it
is sketched below.
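
A minimal sketch of this pattern, assuming the
CallOptions.withStreamTracerFactory method and the Metadata-taking
factory method as in grpc-java around the time of this change:

    import io.grpc.CallOptions;
    import io.grpc.Channel;
    import io.grpc.ClientCall;
    import io.grpc.ClientInterceptor;
    import io.grpc.ClientStreamTracer;
    import io.grpc.Metadata;
    import io.grpc.MethodDescriptor;

    class TracingInterceptor implements ClientInterceptor {
      @Override
      public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
          MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
        // One factory per call, so stream-level updates can be tied
        // back to this particular ClientCall.
        ClientStreamTracer.Factory factory = new ClientStreamTracer.Factory() {
          @Override
          public ClientStreamTracer newClientStreamTracer(Metadata headers) {
            return new ClientStreamTracer() {};  // a real tracer would record events
          }
        };
        return next.newCall(method, callOptions.withStreamTracerFactory(factory));
      }
    }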

On the server side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.

The Tracer API supports propagation of stats/tracing information
through the Context and metadata.  Both client-side and server-side
tracer factories have access to the headers object.  The client-side
tracer relies on an interceptor to read the Context, while the
server-side tracer has a filterContext() method that can override the
Context.
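
For example, a server-side factory and tracer could be sketched as
follows; the method names match the grpc-java API of this change, and
addStreamTracerFactory on the ServerBuilder is how the factory is
installed:

    import io.grpc.Context;
    import io.grpc.Metadata;
    import io.grpc.ServerStreamTracer;

    class HeaderAwareTracerFactory extends ServerStreamTracer.Factory {
      @Override
      public ServerStreamTracer newServerStreamTracer(String fullMethodName, Metadata headers) {
        return new ServerStreamTracer() {
          @Override
          public Context filterContext(Context context) {
            // May return context.withValue(...) derived from the headers,
            // making the information visible to the application.
            return context;
          }
        };
      }
    }

    // Installed once when building the server:
    // serverBuilder.addStreamTracerFactory(new HeaderAwareTracerFactory());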

Implementation details
----------------------

Only real streams report stats.  Pseudo streams, such as the delayed
stream and the failing stream, don't report.  InProcess transport
streams currently don't report stats either.

"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream.  On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.
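
Conceptually the delegation is a simple fan-out, along these lines (a
simplified sketch; the real StatsTraceContext is an internal class with
more callbacks):

    import io.grpc.Status;
    import io.grpc.StreamTracer;
    import java.util.List;

    final class FanOut {
      private final List<StreamTracer> tracers;

      FanOut(List<StreamTracer> tracers) {
        this.tracers = tracers;
      }

      // Each transport event is forwarded to every tracer of the stream.
      void outboundWireSize(long bytes) {
        for (StreamTracer tracer : tracers) {
          tracer.outboundWireSize(bytes);
        }
      }

      void streamClosed(Status status) {
        for (StreamTracer tracer : tracers) {
          tracer.streamClosed(status);
        }
      }
    }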

The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.

Reporting with StatsTraceContext is moved out of the Channel/Call layer
into the Transport/Stream layer, to match the scope change of
StatsTraceContext.

Bug fixed
---------

The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong, because closed()
receiving OK doesn't necessarily mean the RPC ended with OK.  Instead,
it means the server has successfully sent the final status, which may
be non-OK, to the client.

Now the end of the call is reported both in ServerStream.close(any
Status) and before calling ServerStreamListener.closed(non-OK status);
whichever happens first determines the reported status.

TODOs
=====

A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
for load reporting.

README.md

grpc Benchmarks

QPS Benchmark

The "Queries Per Second Benchmark" allows you to get a quick overview of the throughput and latency characteristics of grpc.

To build the benchmark, type

$ ./gradlew :grpc-benchmarks:installDist

from the grpc-java directory.

You can now find the client and the server executables in benchmarks/build/install/grpc-benchmarks/bin.
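
For example, to start a server and point a client at it (the flag names
here are illustrative, not guaranteed; run either binary with --help to
see the options actually supported by your version):

$ benchmarks/build/install/grpc-benchmarks/bin/qps_server --address=localhost:8080
$ benchmarks/build/install/grpc-benchmarks/bin/qps_client --address=localhost:8080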

The C++ counterpart can be found at https://github.com/grpc/grpc/tree/master/test/cpp/qps

Visualizing the Latency Distribution

The QPS client comes with the option --save_histogram=FILE. If set, it serializes the histogram to FILE, which can then be used with a plotter to visualize the latency distribution. The histogram is stored in the file format of HdrHistogram, so it can be plotted very easily using a browser-based tool like http://hdrhistogram.github.io/HdrHistogram/plotFiles.html. Simply upload the generated file and it will generate a beautiful graph for you. It also allows you to plot two or more histograms on the same surface in order to easily compare latency distributions.
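
For instance (again, treating the client flags other than
--save_histogram as illustrative):

$ benchmarks/build/install/grpc-benchmarks/bin/qps_client --address=localhost:8080 --save_histogram=latency.hgrm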

JVM Options

When running a benchmark it's often useful to adjust some JVM options to improve performance and to gain some insights into what's happening. Passing JVM options to the QPS server and client is as easy as setting the JAVA_OPTS environment variable. Below are some options that I find very useful; an example invocation follows the list:

  • -Xms gives a lower bound on the heap to allocate and -Xmx gives an upper bound. If your program uses more than what's specified in -Xmx, the JVM will exit with an OutOfMemoryError. When setting those, always set -Xms and -Xmx to the same value. The reason for this is that the young and old generations are sized according to the total available heap space, so if the total heap gets resized, they will also have to be resized, and this will then trigger a full GC.
  • -verbose:gc prints some basic information about garbage collection. It will log to stdout whenever a GC happened and will tell you about the kind of GC, the pause time and memory compaction.
  • -XX:+PrintGCDetails prints out very detailed GC and heap usage information before the program terminates.
  • -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=path help when you are pushing the JVM hard and it crashes due to the lack of available heap space. These options will allow you to dive into the details of why it happened. The heap dump can be viewed with e.g. the Eclipse Memory Analyzer.
  • -XX:+PrintCompilation will give you a detailed overview of what gets compiled, when it gets compiled, by which HotSpot compiler it gets compiled and such. It's a lot of output. I usually just redirect it to file and look at it with less and grep.
  • -XX:+PrintInlining will give you a detailed overview of what gets inlined and why some methods didn't get inlined. The output is very verbose and, like that of -XX:+PrintCompilation, useful to look at after some major changes or when a drop in performance occurs.
  • It sometimes happens that a benchmark just doesn't make any progress: no bytes are transferred over the network, there is hardly any CPU utilization, and memory usage is low, but the benchmark is still running. In that case it's useful to get a thread dump and see what's going on. HotSpot ships with two tools called jps and jstack: jps tells you the process ids of all running JVMs on the machine, which you can then pass to jstack, and it will print a thread dump of that JVM.
  • Taking a heap dump of a running JVM is similarly straightforward. First get the process id with jps and then use jmap to take the heap dump. You will almost always want to run it with -dump:live in order to only dump live objects. If possible, try to size the heap of your JVM (-Xmx) as small as possible in order to also keep the heap dump small. Large heap dumps are very painful and slow to analyze.
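
For example (the qps_server flags are illustrative; the jps, jstack and
jmap invocations are standard JDK tooling):

$ JAVA_OPTS="-Xms4g -Xmx4g -verbose:gc" benchmarks/build/install/grpc-benchmarks/bin/qps_server --address=localhost:8080
$ jps                                              # find the PID of the benchmark JVM
$ jstack <pid>                                     # print a thread dump
$ jmap -dump:live,format=b,file=heap.hprof <pid>   # dump live objects to heap.hprof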

Profiling

Newer JVMs come with a built-in profiler called Java Flight Recorder. It's an excellent profiler and it can be used to start a recording directly on the command line, from within Java Mission Control or with jcmd.
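
For example, a recording of a running benchmark JVM can be started with jcmd (on the Oracle JDKs of this era, JFR first has to be enabled with -XX:+UnlockCommercialFeatures -XX:+FlightRecorder):

$ jcmd <pid> JFR.start duration=60s filename=recording.jfr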

Good introductions to how it works and how to use it are http://hirt.se/blog/?p=364 and http://hirt.se/blog/?p=370.