The DefaultHttp2Headers class is a general-purpose Http2Headers implementation
and provides much more functionality than we need in gRPC. In gRPC, when reading
headers off the wire, we only inspect a handful of them before converting to
Metadata.
This commit introduces an Http2Headers implementation that aims for insertion
efficiency, a low memory footprint, and fast conversion to Metadata; a sketch of
the approach follows the list below.
- Header names and values are stored in plain byte[].
- Insertion is O(1), while lookup is now O(n).
- Binary header values are base64 decoded as they are inserted.
- The byte[][] returned by namesAndValues() can directly be used to construct
a new Metadata object.
- For HTTP/2 request headers, the pseudo headers are no longer carried over to
Metadata.
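A minimal sketch of that insertion path, assuming hypothetical names (the class and methods here are illustrative, not the actual grpc-netty types):

```java
import java.util.Arrays;
import java.util.Base64;

/**
 * Illustrative sketch: names and values are appended to a flat byte[][]
 * where even indices hold names and odd indices hold values.
 */
final class GrpcHttp2HeadersSketch {
  private byte[][] namesAndValues = new byte[16][];
  private int size; // number of used slots, always even

  void add(byte[] name, byte[] value) {
    if (size == namesAndValues.length) {
      // Amortized O(1) insertion: double the backing array when full.
      namesAndValues = Arrays.copyOf(namesAndValues, size * 2);
    }
    namesAndValues[size++] = name;
    // Binary headers (the "-bin" suffix) are base64 decoded on insertion,
    // so the later conversion to Metadata is a plain array hand-off.
    // (gRPC uses unpadded base64; java.util.Base64 accepts missing padding.)
    namesAndValues[size++] = isBinary(name) ? Base64.getDecoder().decode(value) : value;
  }

  private static boolean isBinary(byte[] name) {
    byte[] suffix = {'-', 'b', 'i', 'n'};
    if (name.length < suffix.length) {
      return false;
    }
    for (int i = 0; i < suffix.length; i++) {
      if (name[name.length - suffix.length + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }

  /** The backing array (plus the used-slot count) maps directly onto Metadata. */
  byte[][] namesAndValues() {
    return namesAndValues;
  }

  int size() {
    return size;
  }
}
```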
A microbenchmark aiming to replicate the usage of Http2Headers in NettyClientHandler
and NettyServerHandler shows decent throughput gains when compared to DefaultHttp2Headers.
Benchmark                                             Mode  Cnt     Score    Error  Units
InboundHeadersBenchmark.defaultHeaders_clientHandler  avgt   10   283.830 ±  4.063  ns/op
InboundHeadersBenchmark.defaultHeaders_serverHandler  avgt   10  1179.975 ± 21.810  ns/op
InboundHeadersBenchmark.grpcHeaders_clientHandler     avgt   10   190.108 ±  3.510  ns/op
InboundHeadersBenchmark.grpcHeaders_serverHandler     avgt   10   561.426 ±  9.079  ns/op
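The benchmark shape is roughly the following JMH skeleton. The header bytes and the reuse of the sketch class above are illustrative assumptions, not the actual benchmark source:

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class InboundHeadersBenchmarkSketch {

  // A header block resembling what a server sees on an inbound HEADERS frame.
  private byte[][] wire;

  @Setup
  public void setup() {
    wire = new byte[][] {
      ascii(":method"), ascii("POST"),
      ascii(":path"), ascii("/helloworld.Greeter/SayHello"),
      ascii("content-type"), ascii("application/grpc"),
      ascii("te"), ascii("trailers"),
    };
  }

  private static byte[] ascii(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  @Benchmark
  public Object grpcHeaders_serverHandler() {
    // Insert every name/value pair and return the backing array,
    // mirroring the handler's headers-to-Metadata conversion.
    GrpcHttp2HeadersSketch headers = new GrpcHttp2HeadersSketch();
    for (int i = 0; i < wire.length; i += 2) {
      headers.add(wire[i], wire[i + 1]);
    }
    return headers.namesAndValues(); // returned so JMH doesn't dead-code it
  }
}
```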
Additionally, the memory footprint is reduced by more than 50%!
gRPC Request Headers: 864 bytes
Netty Request Headers: 1728 bytes
gRPC Response Headers: 216 bytes
Netty Response Headers: 528 bytes
Furthermore, this change does most of the gRPC groundwork necessary to be able
to cache higher-order objects in HPACK's dynamic table, as discussed in [1].
[1] https://github.com/grpc/grpc-java/issues/2217
grpc Benchmarks
QPS Benchmark
The "Queries Per Second Benchmark" allows you to get a quick overview of the throughput and latency characteristics of grpc.
To build the benchmark, type
$ ./gradlew :grpc-benchmarks:installDist
from the grpc-java directory.
You can now find the client and the server executables in benchmarks/build/install/grpc-benchmarks/bin.
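A typical session might look like this. The launcher names are assumed from the grpc-java benchmarks module; check the bin directory for the exact names, and consult the benchmarks source for the supported flags:

$ cd benchmarks/build/install/grpc-benchmarks/bin
$ ./qps_server &    # assumed launcher name; starts the benchmark server
$ ./qps_client      # assumed launcher name; drives load against the server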
The C++ counterpart can be found at https://github.com/grpc/grpc/tree/master/test/cpp/qps
Visualizing the Latency Distribution
The QPS client comes with the option --dump_histogram=FILE; if set, it serializes the histogram to FILE, which can then be used with a plotter to visualize the latency distribution. The histogram is stored in the file format of HdrHistogram, so it can be plotted very easily using a browser-based tool like http://hdrhistogram.github.io/HdrHistogram/plotFiles.html. Simply upload the generated file and it will generate a beautiful graph for you. It also allows you to plot two or more histograms on the same surface in order to easily compare latency distributions.
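For example, combining --dump_histogram with whatever other flags your run needs (client launcher name assumed as above):

$ ./qps_client --dump_histogram=latency.hgrm
# then upload latency.hgrm at http://hdrhistogram.github.io/HdrHistogram/plotFiles.html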
JVM Options
When running a benchmark it's often useful to adjust some JVM options to improve performance and to gain some insights into what's happening. Passing JVM options to the QPS server and client is as easy as setting the JAVA_OPTS environment variable; a worked example follows the list below. Here are some options that I find very useful:
- -Xms gives a lower bound on the heap to allocate and -Xmx gives an upper bound. If your program uses more than what's specified in -Xmx, the JVM will exit with an OutOfMemoryError. When setting those, always set -Xms and -Xmx to the same value. The reason for this is that the young and old generation are sized according to the total available heap space, so if the total heap gets resized, they will also have to be resized and this will then trigger a full GC.
- -verbose:gc prints some basic information about garbage collection. It will log to stdout whenever a GC happened and will tell you about the kind of GC, pause time and memory compaction.
- -XX:+PrintGCDetails prints out very detailed GC and heap usage information before the program terminates.
- -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=path: when you are pushing the JVM hard, it sometimes happens that it will crash due to the lack of available heap space. These options will allow you to dive into the details of why it happened. The heap dump can be viewed with e.g. the Eclipse Memory Analyzer.
- -XX:+PrintCompilation will give you a detailed overview of what gets compiled, when it gets compiled, by which HotSpot compiler it gets compiled and such. It's a lot of output. I usually just redirect it to a file and look at it with less and grep.
- -XX:+PrintInlining will give you a detailed overview of what gets inlined and why some methods didn't get inlined. The output is very verbose and, like -XX:+PrintCompilation, useful to look at after some major changes or when a drop in performance occurs.
- It sometimes happens that a benchmark just doesn't make any progress, that is, no bytes are transferred over the network, there is hardly any CPU utilization and low memory usage, but the benchmark is still running. In that case it's useful to get a thread dump and see what's going on. HotSpot ships with the tools jps and jstack: jps tells you the process ids of all running JVMs on the machine, which you can then pass to jstack, and it will print a thread dump of that JVM.
- Taking a heap dump of a running JVM is similarly straightforward. First get the process id with jps and then use jmap to take the heap dump. You will almost always want to run it with -dump:live in order to only dump live objects. If possible, try to size the heap of your JVM (-Xmx) as small as possible in order to also keep the heap dump small. Large heap dumps are very painful and slow to analyze.
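Putting it together, a sketch of a typical investigation (the server launcher name is assumed as above, and <pid> stands for the process id that jps reports):

$ export JAVA_OPTS="-Xms4g -Xmx4g -verbose:gc -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
$ ./qps_server &
$ jps                                                # find the benchmark JVM's process id
$ jstack <pid> > threads.txt                         # thread dump of that JVM
$ jmap -dump:live,format=b,file=heap.hprof <pid>     # heap dump of live objects only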
Profiling
Newer JVMs come with a built-in profiler called Java Flight Recorder. It's an excellent profiler and it can be used to start a recording directly on the command line, from within Java Mission Control, or with jcmd.
Good introductions to how it works and how to use it are http://hirt.se/blog/?p=364 and http://hirt.se/blog/?p=370.
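For example, a recording can be started against a running JVM with jcmd; note that on JDK 8 the target JVM must also have been started with -XX:+UnlockCommercialFeatures -XX:+FlightRecorder:

$ jcmd <pid> JFR.start name=qps duration=60s filename=qps.jfr
$ jcmd <pid> JFR.check     # list the in-flight recordings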