grpc-java/netty
Carl Mastrangelo 953599543e netty: reduce contention in WriteQueue
WriteQueue uses LinkedBlockingQueue, which has stronger synchronization
semantics than we need.  It also requires that we batch reads from it
in order to get reasonable performance.  Profiling showed a delay of
~10us between writing to the LBQ and reading from it.
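As a rough illustration of the batched-read pattern described above, here is a minimal Java sketch. It assumes a single consumer modeled as an `Executor` standing in for the Netty event loop, and an `AtomicBoolean` flag so that at most one flush is scheduled at a time. `BatchedWriteQueue`, `enqueue`, `flush`, and `MAX_BATCH` are hypothetical names; this is not the actual grpc-java `WriteQueue`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of an LBQ-backed write queue drained in batches. */
final class BatchedWriteQueue {
  // Assumed batch size, for illustration only.
  private static final int MAX_BATCH = 128;

  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final AtomicBoolean scheduled = new AtomicBoolean();
  private final Executor eventLoop; // stands in for the Netty event loop

  BatchedWriteQueue(Executor eventLoop) {
    this.eventLoop = eventLoop;
  }

  /** Called from any thread: enqueue a command and schedule a flush if none is pending. */
  void enqueue(Runnable command) {
    queue.add(command);
    if (scheduled.compareAndSet(false, true)) {
      eventLoop.execute(this::flush);
    }
  }

  /** Runs on the event loop: drain up to MAX_BATCH commands per lock acquisition. */
  private void flush() {
    List<Runnable> batch = new ArrayList<>(MAX_BATCH);
    try {
      while (queue.drainTo(batch, MAX_BATCH) > 0) {
        for (Runnable command : batch) {
          command.run();
        }
        batch.clear();
      }
    } finally {
      scheduled.set(false);
      // A command may have been added after the last drain but before the
      // flag was cleared; reschedule so it is not stranded.
      if (!queue.isEmpty() && scheduled.compareAndSet(false, true)) {
        eventLoop.execute(this::flush);
      }
    }
  }
}
```

`drainTo(batch, MAX_BATCH)` removes up to `MAX_BATCH` elements under a single acquisition of the queue's take lock, which amortizes the locking cost that calling `poll()` once per command would pay every time.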

This change switches the underlying queue to ConcurrentLinkedQueue and
removes the batching of reads.  Using CLQ with batching is slightly
slower.
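A minimal Java sketch of the CLQ-based approach, assuming an `Executor` stands in for the Netty event loop and an `AtomicBoolean` guards against scheduling more than one flush at a time. `ClqWriteQueue` and its methods are hypothetical names, not the actual grpc-java code. The consumer simply polls until the queue is empty, with no intermediate batch.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a CLQ-backed write queue without batched reads. */
final class ClqWriteQueue {
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean scheduled = new AtomicBoolean();
  private final Executor eventLoop; // stands in for the Netty event loop

  ClqWriteQueue(Executor eventLoop) {
    this.eventLoop = eventLoop;
  }

  /** Called from any thread: lock-free append, then schedule a flush if none is pending. */
  void enqueue(Runnable command) {
    queue.add(command);
    if (scheduled.compareAndSet(false, true)) {
      eventLoop.execute(this::flush);
    }
  }

  /** Runs on the event loop: poll and run commands one at a time. */
  private void flush() {
    try {
      Runnable command;
      while ((command = queue.poll()) != null) {
        command.run();
      }
    } finally {
      scheduled.set(false);
      // Re-check after clearing the flag so a late enqueue is not stranded.
      if (!queue.isEmpty() && scheduled.compareAndSet(false, true)) {
        eventLoop.execute(this::flush);
      }
    }
  }
}
```

`ConcurrentLinkedQueue.add` and `poll` are non-blocking (CAS-based), so producers on other threads never contend on a lock with the event loop, and taking one command at a time is cheap enough that a separate batch list buys nothing.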

Benchmarks show favorable numbers for both latency and throughput.
Each of the following benchmarks was run several times:

Before:
Benchmark                         (direct)  (transport)    Mode     Cnt       Score     Error  Units
TransportBenchmark.unaryCall1024      true        NETTY  sample  321575  124185.027 ± 406.112  ns/op
TransportBenchmark.unaryCall1024     false        NETTY  sample  237400  168232.991 ± 548.043  ns/op

After:
Benchmark                         (direct)  (transport)    Mode     Cnt       Score     Error  Units
TransportBenchmark.unaryCall1024      true        NETTY  sample  354773  112552.339 ± 362.471  ns/op
TransportBenchmark.unaryCall1024     false        NETTY  sample  263297  151660.490 ± 507.463  ns/op

QPS with 10 outstanding RPCs per channel:

Before:
Channels:                       4
Outstanding RPCs per Channel:   10
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     396
90%ile Latency (in micros):     680
95%ile Latency (in micros):     838
99%ile Latency (in micros):     1476
99.9%ile Latency (in micros):   5231
Maximum Latency (in micros):    43327
QPS:                            85761

After:
Channels:                       4
Outstanding RPCs per Channel:   10
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     384
90%ile Latency (in micros):     612
95%ile Latency (in micros):     725
99%ile Latency (in micros):     1080
99.9%ile Latency (in micros):   3107
Maximum Latency (in micros):    30447
QPS:                            93353

The results are even better under heavy load.  QPS with 100
outstanding RPCs per channel:

Before:
Channels:                       4
Outstanding RPCs per Channel:   100
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     2735
90%ile Latency (in micros):     5051
95%ile Latency (in micros):     6219
99%ile Latency (in micros):     9271
99.9%ile Latency (in micros):   13759
Maximum Latency (in micros):    44831
QPS:                            125775

After:
Channels:                       4
Outstanding RPCs per Channel:   100
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     2697
90%ile Latency (in micros):     4639
95%ile Latency (in micros):     5539
99%ile Latency (in micros):     7931
99.9%ile Latency (in micros):   12335
Maximum Latency (in micros):    61823
QPS:                            131904
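The before/after tables are easier to compare as relative changes. The following Java sketch recomputes them from the numbers quoted above; `BenchmarkDeltas` and its methods are hypothetical helpers for illustration, not part of grpc-java or JMH.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: relative change between the quoted Before and After numbers. */
final class BenchmarkDeltas {
  /** Percentage change from before to after; negative means the value went down. */
  static double percentChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  /** Metric name -> {before, after}, copied from the results above. */
  static Map<String, double[]> results() {
    Map<String, double[]> m = new LinkedHashMap<>();
    m.put("unaryCall1024 direct=true (ns/op)", new double[] {124185.027, 112552.339});
    m.put("unaryCall1024 direct=false (ns/op)", new double[] {168232.991, 151660.490});
    m.put("QPS, 10 outstanding", new double[] {85761, 93353});
    m.put("p99 us, 10 outstanding", new double[] {1476, 1080});
    m.put("QPS, 100 outstanding", new double[] {125775, 131904});
    m.put("p99 us, 100 outstanding", new double[] {9271, 7931});
    m.put("max us, 100 outstanding", new double[] {44831, 61823});
    return m;
  }

  /** Prints one line per metric with its signed percentage change. */
  static void print() {
    for (Map.Entry<String, double[]> e : results().entrySet()) {
      double[] v = e.getValue();
      System.out.printf("%-36s %+6.1f%%%n", e.getKey(), percentChange(v[0], v[1]));
    }
  }
}
```

Worked by hand, the JMH `unaryCall1024` scores (ns/op, lower is better) drop by about 9.4% with `direct=true` and 9.9% with `direct=false`. QPS rises by about 8.9% with 10 outstanding RPCs per channel and 4.9% with 100, while 99th-percentile latency falls by about 27% and 14% respectively. The maximum latency with 100 outstanding RPCs is the only reported latency figure that rose, by about 38%.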
2016-07-18 09:50:47 -07:00
src netty: reduce contention in WriteQueue 2016-07-18 09:50:47 -07:00
build.gradle use Jetty ALPN agent instead of Jetty ALPN 2016-05-02 14:01:36 -07:00