grpc-java

History

src	netty: reduce contention in WriteQueue	2016-07-18 09:50:47 -07:00
build.gradle	use Jetty ALPN agent instead of Jetty ALPN	2016-05-02 14:01:36 -07:00

Carl Mastrangelo 953599543e netty: reduce contention in WriteQueue

WriteQueue uses LinkedBlockingQueue, which has stronger synchronization
semantics than we need.  It also requires that we batch reads from it
in order to get reasonable performance.  Profiling showed a ~10us delay
between writing to the LBQ and reading from it.

This change switches to using ConcurrentLinkedQueue as the underlying
queue, and removes the batched reads.  Using CLQ with batching is
slightly slower.
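
The idea above can be sketched roughly as follows.  This is a minimal
illustration and not the actual WriteQueue code: the class name
SketchWriteQueue, the use of Runnable as the command type, and the plain
Executor standing in for the Netty event loop are all assumptions made
for the example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a CLQ-backed write queue; not the grpc-java class. */
final class SketchWriteQueue {
  // Lock-free queue: producers never block each other or the consumer.
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  // True while a drain task is scheduled or running.
  private final AtomicBoolean scheduled = new AtomicBoolean();
  // Assumption: stands in for the single-threaded event loop.
  private final Executor eventLoop;

  SketchWriteQueue(Executor eventLoop) {
    this.eventLoop = eventLoop;
  }

  /** Callable from any thread: enqueue, then make sure a drain is scheduled. */
  void enqueue(Runnable command) {
    queue.add(command);
    scheduleDrain();
  }

  private void scheduleDrain() {
    // Only one drain task is in flight at a time.
    if (scheduled.compareAndSet(false, true)) {
      eventLoop.execute(this::drain);
    }
  }

  /** Runs on the event loop: polls one command at a time, no batch copy. */
  private void drain() {
    try {
      Runnable command;
      while ((command = queue.poll()) != null) {
        command.run();
      }
    } finally {
      scheduled.set(false);
      // A producer may have enqueued after the last poll but before the
      // flag was reset; reschedule so that command is not stranded.
      if (!queue.isEmpty()) {
        scheduleDrain();
      }
    }
  }
}
```

In this sketch the consumer calls poll() until the queue is empty instead
of first copying a batch of entries into a local buffer, which is the
part that corresponds to removing the batched reads.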

Benchmarks show favorable numbers for both latency and throughput.
Each of the following benchmarks was run several times:

Before:
Benchmark                         (direct)  (transport)    Mode     Cnt       Score     Error  Units
TransportBenchmark.unaryCall1024      true        NETTY  sample  321575  124185.027 ± 406.112  ns/op
TransportBenchmark.unaryCall1024     false        NETTY  sample  237400  168232.991 ± 548.043  ns/op

After:
Benchmark                         (direct)  (transport)    Mode     Cnt       Score     Error  Units
TransportBenchmark.unaryCall1024      true        NETTY  sample  354773  112552.339 ± 362.471  ns/op
TransportBenchmark.unaryCall1024     false        NETTY  sample  263297  151660.490 ± 507.463  ns/op

QPS with 10 outstanding RPCs per channel:

Before:
Channels:                       4
Outstanding RPCs per Channel:   10
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     396
90%ile Latency (in micros):     680
95%ile Latency (in micros):     838
99%ile Latency (in micros):     1476
99.9%ile Latency (in micros):   5231
Maximum Latency (in micros):    43327
QPS:                            85761

After:
Channels:                       4
Outstanding RPCs per Channel:   10
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     384
90%ile Latency (in micros):     612
95%ile Latency (in micros):     725
99%ile Latency (in micros):     1080
99.9%ile Latency (in micros):   3107
Maximum Latency (in micros):    30447
QPS:                            93353

The results are even better under heavy load.  QPS with 100
outstanding RPCs per channel:

Before:
Channels:                       4
Outstanding RPCs per Channel:   100
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     2735
90%ile Latency (in micros):     5051
95%ile Latency (in micros):     6219
99%ile Latency (in micros):     9271
99.9%ile Latency (in micros):   13759
Maximum Latency (in micros):    44831
QPS:                            125775

After:
Channels:                       4
Outstanding RPCs per Channel:   100
Server Payload Size:            0
Client Payload Size:            0
50%ile Latency (in micros):     2697
90%ile Latency (in micros):     4639
95%ile Latency (in micros):     5539
99%ile Latency (in micros):     7931
99.9%ile Latency (in micros):   12335
Maximum Latency (in micros):    61823
QPS:                            131904
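
To compare the before and after runs on a common scale, the relative
change can be computed from the reported figures.  The helper below is a
hypothetical sketch for that arithmetic only; BenchmarkDelta and
relativeChangePercent are names invented for this example, and the QPS
values are copied from the runs above.

```java
/** Hypothetical helper for comparing before/after figures; not part of grpc-java. */
final class BenchmarkDelta {
  private BenchmarkDelta() {}

  /** Returns the change from before to after as a percentage of before. */
  static double relativeChangePercent(double before, double after) {
    if (before == 0) {
      throw new IllegalArgumentException("before must be non-zero");
    }
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    // QPS figures copied from the runs above: {before, after}.
    double[][] qps = {{85761, 93353}, {125775, 131904}};
    int[] outstandingRpcs = {10, 100};
    for (int i = 0; i < qps.length; i++) {
      System.out.printf(
          "%d outstanding RPCs: QPS change %.2f%%%n",
          outstandingRpcs[i], relativeChangePercent(qps[i][0], qps[i][1]));
    }
  }
}
```

The same method applies to the latency percentiles, where a negative
result means the latency went down.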

2016-07-18 09:50:47 -07:00
