This patch introduces an additional ALPN protocol, grpc-exp, intended to
take preference to h2 and indicate to the server that the connection
contains only gRPC traffic. This allows servers and intermediate boxes
to distinguish gRPC from other HTTP/2 traffic.
The choice of grpc-exp as a protocol identifier indicates that this
scheme is currently experimental and should not be relied upon. The
protocol is not in the IANA TLS registry.
This is the grpc-java equivalent of
8cdf17a620.
Due to the opacity of ALPN and TLS negotiation at application level, the
tests are only there to validate that the lists we're feeding into the
negotiation process have the desired ordering properties:
* If grpc-exp is present, h2 is as well.
* grpc-exp is preferenced over h2.
Not to expose dependency of `io.netty.handler.ssl.SslContext` when implementing `TransportCreationParamsFilter`
Change the `TransportCreationParamsFilter` API
````
ProtocolNegotiator getProtocolNegotiator(NegotiationType negotiationType, SslContext sslContext);
````
into
````
ProtocolNegotiator getProtocolNegotiator();
````
resolvesgrpc/grpc#8715
now that setListener is called prior to
`JumpToApplicationThreadServerStreamListener` being completely ready to
use. We should not call `AbstractStream2#onStreamAllocated()` inside
`setListener()` anymore, but call it after `ServerImpl#streamCreated()`
is completed.
Resolves#1936
Two bugs fixed:
- NPE in `ServerImpl#streamCreated()` when stream listener not set before
stream closed
- It is possible that `internalCancel()` is called during
`InProcessClientStream#start()` due to early server `onComplete()` or server `onError()`,
in this case no need to enlist `streams`, otherwise the channel can not be shutdown by `shutdown()`.
We only want to use the HTTP code for errors, when the response is not
grpc. grpc status codes may be mapped to HTTP codes in the future, and
we don't want to break when that happens. We also don't want to ever
accidentally use Status.OK without receiving it from the server, even
for HTTP 200.
Binary header values are printed in their base64 encoded form.
The GrpcHttpOutboundHeaders, as mentioned in the issue, don't seem to be affected by this regression. The toString() method seems fine.
Highlights
==========
StatsTraceContext
-----------------
The bridge between gRPC library and Census. It keeps track of the total
payload sizes and the elapsed time of a Call. The rest of the gRPC code
doesn't invoke Census directly.
Context propagation
-------------------
StatsTraceContext carries CensusContext (and the upcoming TraceContext)
and is attached to the gRPC Context.
1. StatsTraceContext is created by ManagedChannelImpl, by calling
createClientContext(), which inherits the current CensusContext if available.
2. ManagedChannelImpl passes StatsTraceContext to ClientCallImpl, then
to the stream, then to the framer and deframer explicitly.
3. ClientCallImpl propagates the CensusContext to the headers.
1. ServerImpl creates a StatsTraceContext by implementing a new callback
method StatsTraceContext methodDetermined(MethodDescriptor, Metadata) on
ServerTransportListener.
2. NettyServerHandler calls methodDetermined() before creating the
stream, and passes the StatsTraceContext to the stream.
3. When ServerImpl creates the gRPC Context for the new ServerCall, it
calls the new method statsTraceContext() on ServerStream and puts the
StatsTraceContext in the Context.
Metrics recording
-----------------
1. Client-side start time: when ClientCallImpl is created
2. Server-side start time: when methodDetermined() is called
3. Server-side end time: in ServerStreamListener.closed(), but before
calling onComplete() or onCancel() on ServerCall.Listener.
4. Client-side end time: in ClientStreamListener.closed(), but before
calling onClonse() on ClientCall.Listener
Message sizes are recorded in MessageFramer and MessageDeframer. Both
the uncompressed and wire (possibly compressed) payload sizes are
counted.
TODOs
=====
The CensusContext created from headers on the server side should be
attached to the gRPC Context for the call. It's not done at this moment
because Census lacks the proper API to do it. It only affects tracing
and resource accounting, but doesn't affect stats functionality
Our API allows pings to be send even after the transport has been shutdown. We currently
don't handle the case, where the Netty channel has been closed but the NettyClientHandler
has not yet been removed from the pipeline, correctly. That is, we need to query the shutdown
status whenever we receive a ClosedChannelException.
Also, some cleanup.
The DefaultHttp2Headers class is a general-purpose Http2Headers implementation
and provides much more functionality than we need in gRPC. In gRPC, when reading
headers off the wire, we only inspect a handful of them, before converting to
Metadata.
This commit introduces a Http2Headers implementation that aims for insertion
efficiency, a low memory footprint and fast conversion to Metadata.
- Header names and values are stored in plain byte[].
- Insertion is O(1), while lookup is now O(n).
- Binary header values are base64 decoded as they are inserted.
- The byte[][] returned by namesAndValues() can directly be used to construct
a new Metadata object.
- For HTTP/2 request headers, the pseudo headers are no longer carried over to
Metadata.
A microbenchmark aiming to replicate the usage of Http2Headers in NettyClientHandler
and NettyServerHandler shows decent throughput gains when compared to DefaultHttp2Headers.
Benchmark Mode Cnt Score Error Units
InboundHeadersBenchmark.defaultHeaders_clientHandler avgt 10 283.830 ± 4.063 ns/op
InboundHeadersBenchmark.defaultHeaders_serverHandler avgt 10 1179.975 ± 21.810 ns/op
InboundHeadersBenchmark.grpcHeaders_clientHandler avgt 10 190.108 ± 3.510 ns/op
InboundHeadersBenchmark.grpcHeaders_serverHandler avgt 10 561.426 ± 9.079 ns/op
Additionally, the memory footprint is reduced by more than 50%!
gRPC Request Headers: 864 bytes
Netty Request Headers: 1728 bytes
gRPC Response Headers: 216 bytes
Netty Response Headers: 528 bytes
Furthermore, this change does most of the gRPC groundwork necessary to be able
to cache higher ordered objects in HPACK's dynamic table, as discussed in [1].
[1] https://github.com/grpc/grpc-java/issues/2217
Metadata.removeAll creates an iterator for looking through removed
values even if the call doens't use it. This change adds a similar
method which doesn't create garbage.
This change makes it easier in the future to alter the internals
of Metadata where it may be expensive to return removed values.
Called whenever a ServerTransport is ready and terminated. Has the
ability to modify transport attributes, which ServerCall.attributes()
are based on.
Related changes:
- Attribute keys for remote address and SSL session are now moved from
ServerCall to a neutral place io.grpc.Grpc, because they can also be
used from ServerTransportFilter, and probably will be used on the
client-side too. The old keys on ServerCall is marked deprecated and
are equivalent to the new keys.
- Added transportReady() to ServerTransportListener.
Resolves#2132
After debugging #2153, it would have been nice to know what the exact
parameter was that was null. This change adds a name for each
checkNotNull (and tries to normalized on static imports in order to
shorten lines)
Implementations of ManagedClientTransport.start() are restricted from
calling the passed listener until start() returns, in order to avoid
reentrency problems with locks. For most transports this isn't a
problem, because they need additional threads anyway. InProcess uses no
additional threads naturally so ends up needing a thread just to
notifyReady. Now transports can just return a Runnable that can be run
after locks are dropped.
This was originally intended to be a performance optimization, but the
thread also causes nondeterminism because RPCs are delayed until
notifyReady is called. So avoiding the thread reduces needless fakes
during tests.
WriteQueue uses LinkedBlockingQueue, which has stronger synchronization
semantics than we need. It also requires that we batch reads from it
in order to get reasonable performance. After profiling the delay
between writing to LBQ and reading from it, there was a ~10us delay.
This change switches to using ConcurrentLinkedQueue as the underlying
queue, and removes the batching (reads). Using CLQ with batching is
slightly slower.
Benchmarks show favorable numbers for both latency and throughput.
Each of the following results were run serveral times:
Before:
Benchmark (direct) (transport) Mode Cnt Score Error Units
TransportBenchmark.unaryCall1024 true NETTY sample 321575 124185.027 ± 406.112 ns/op
TransportBenchmark.unaryCall1024 false NETTY sample 237400 168232.991 ± 548.043 ns/op
After:
Benchmark (direct) (transport) Mode Cnt Score Error Units
TransportBenchmark.unaryCall1024 true NETTY sample 354773 112552.339 ± 362.471 ns/op
TransportBenchmark.unaryCall1024 false NETTY sample 263297 151660.490 ± 507.463 ns/op
Qps with 10 outstanding RPCs per channel:
Before:
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 396
90%ile Latency (in micros): 680
95%ile Latency (in micros): 838
99%ile Latency (in micros): 1476
99.9%ile Latency (in micros): 5231
Maximum Latency (in micros): 43327
QPS: 85761
After:
Channels: 4
Outstanding RPCs per Channel: 10
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 384
90%ile Latency (in micros): 612
95%ile Latency (in micros): 725
99%ile Latency (in micros): 1080
99.9%ile Latency (in micros): 3107
Maximum Latency (in micros): 30447
QPS: 93353
The results are even better when under heavy load. Qps with 100
outstanding RPCs per channel:
Before:
Channels: 4
Outstanding RPCs per Channel: 100
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 2735
90%ile Latency (in micros): 5051
95%ile Latency (in micros): 6219
99%ile Latency (in micros): 9271
99.9%ile Latency (in micros): 13759
Maximum Latency (in micros): 44831
QPS: 125775
After:
Channels: 4
Outstanding RPCs per Channel: 100
Server Payload Size: 0
Client Payload Size: 0
50%ile Latency (in micros): 2697
90%ile Latency (in micros): 4639
95%ile Latency (in micros): 5539
99%ile Latency (in micros): 7931
99.9%ile Latency (in micros): 12335
Maximum Latency (in micros): 61823
QPS: 131904