This is a big, but mostly mechanical change. The newly added Test*StreamTracer classes are designed to be extended which is why they are non final and have protected fields. There are a few notable things in this:
1. verifyNoMoreInteractions is gone. The API for StreamTracers doesn't make this guarantee. I have recovered this behavior by failing duplicate calls. This has resulted in a few bugs in the test code being fixed.
2. StreamTracers cannot be mocked anymore. Tracers need to be thread safe, which mocks simply are not. This leads to a HUGE number of reports when trying to find real races in gRPC.
3. If these classes are useful, we can promote them out of internal. I just put them here out of convenience.
* netty: support `status()` on Headers
Recent Netty change a91df58ca1
caused the `status()` method to be invoked, which AbstractHttp2Headers does not implement.
This change is necesary to upgrade to Netty 4.1.14
Coupled with the similar change on server-side, this removes the need for a
thread when using Netty. For InProcess and OkHttp, it would allow us to let the
user to provide the scheduler for tests or application-wide thread sharing.
For Netty, this reduces the number of threads necessary for servers (although
until channel is converted, actual number of threads isn't impacted) and
naturally reduces contention and timeout latency.
For InProcess, this gets us closer to allowing applications to provide all
executors, which is especially useful during tests.
Class.forName(String) is understood by ProGuard, removing the need for
manual ProGuard configuration and allows ProGuard to rename the provider
classes. Previously the provider classes could not be renamed.
Fixes#2633
Moved the following APIs from `io.grpc.testing.TestUtils` to `io.grpc.internal.TestUtils`:
`InetSocketAddress testServerAddress(String host, int port)`
`InetSocketAddress testServerAddress(int port)`
`List<String> preferredTestCiphers()`
`File loadCert(String name)`
`X509Certificate loadX509Cert(String fileName)`
`SSLSocketFactory newSslSocketFactoryForCa(Provider provider, File certChainFile)`
`void sleepAtLeast(long millis)`
APIs not to be moved:
`ServerInterceptor recordRequestHeadersInterceptor()`
`ServerInterceptor recordServerCallInterceptor()`
This the cause of the flakey serverNotListening test, because the
NOOP_MESSAGE just sits around the pipeline. As a result, the
listener does not fire within the 1s verification timeout.
InternalHandlerSettings is part of "netty:internal" inside google,
which is used to allow controlled exposure of internals.
"netty:internal" depends on "netty", which consists of the rest of the
netty subproject. Therefore, "netty" should not depend on
"netty:internal".
Sadly, the serverNotListening test is still flakey after this change, but this PR fixes a legit problem.
The listener to the connect future depends on the channel pipeline being intact. But the way it is attached allows the connect attempt to fail, and have the entire pipeline being torn down by netty before the .addListener actually runs. The result is that the listener will be attached to an already completed future, and the logic will be applied to an empty pipeline.
The fundamental problem is that there are two threads, the grpc thread, and the netty thread. This PR moves the listener attaching code into the netty thread, guaranteeing the listener is attached before any connection is made. It makes more sense for the code to live inside AbstractBufferingHandler, since handlers are generally free to swallow exceptions (the alternative is to make NettyClientHandler forward exceptions up the pipeline from itself). AbstractBufferingHandler needs the special guarantees, so it will be the one with special code.
Creating the SslContext can throw, generally due to broken ALPN. We want
that to propagate to the caller of build(), instead of within the
channel where it could easily cause hangs.
We still delay creation until actual build() time, since TLS is not
guaranteed to work and the application may be configuring plaintext or
similar later before calling build() where SslContext is unnecessary.
The only externally-visible change should be the exception handling.
I'd add a test, but the things throwing are static and trying to inject
them would be pretty messy.
Fixes#2599
This is a minor change setting the size of data frames sent when
interleaving RPCs. The size was ~1024bytes previously, which
resulted in the `writev` syscalls sending many smaller chunks
before hitting the low water mark. The end effect is larger calls
to `writev`, as seen with strace.
The effect of this is noticeable when sending a lot of data. When
sending as many 1MB messages as possible it nearly doubles the
rate.
Before:
```
INFO: single throughput GRPC
50.0%ile Latency (in nanos): 280856575
90.0%ile Latency (in nanos): 349618175
95.0%ile Latency (in nanos): 380444671
99.0%ile Latency (in nanos): 455172095
99.9%ile Latency (in nanos): 537198591
100.0%ile Latency (in nanos): 566886399
QPS: 346
Count: 103984
```
After:
```
gRPC
50.0%ile Latency (in nanos): 125948927
90.0%ile Latency (in nanos): 166322175
95.0%ile Latency (in nanos): 177276927
99.0%ile Latency (in nanos): 193840127
99.9%ile Latency (in nanos): 226841599
100.0%ile Latency (in nanos): 256110591
QPS: 774
Count: 232340
```
* Upgrade netty to 4.1.11.Final
* Upgrade netty-tcnative to 2.0.1.Final
* Remove `FixedHttp2ConnectionDecoder` as it's no longer needed
* Use new, extensible `DefaultHttp2HeadersDecoder` for custom headers handling
AbstractManagedChannelImplBuilder accepts an overrideAuthority parameter, but this value is not hooked up to the name resolver object. Ultimately, Channel.authority consults with the NameResolver, so the overrideAuthority should be hooked into the NameResolverFactory, while all other functionality should be preserved.
Also, add unit tests for all the variants of OkHttpChannelBuilder and NettyChannelBuilder constructors, namely to test the slightly different NettyChannelBuilder(SocketAddress) code path.
Fixes#2682
While we can use permit/deny in this one case, it isn't generalizable to
other cases. In order to avoid always questioning how to deal with
boolean config options, just pass the boolean in all cases.
This mirrors what is being done with the client-side's
keepAliveWithoutCalls.
These methods were very recently added, so there is a low risk of
breakage.
Background
==========
LoadBalancer needs to track RPC measurements and status for
load-reporting. We need to introduce a "Tracer" API for that.
Since such API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
are recorded, we will migrate the Census-based stats reporting under the
new Tracer API.
Alternatives
============
We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording those information along with the currently
recorded stats to Census. The LB-related information, such as LB_ID,
reason for dropping reqeusts etc, would be added to the Census
StatsContext as tags.
Since tags are held by StatsContext before eventually being recorded by
providing the measurements, and StatsContext is immutable, this would
require a way for LoadBalancer to override the StatsContext, which means
LoadBalancer API would has direct reference to the Census StatsContext.
This is undesirable because Census API is not stable yet.
Part of the LB-related information is whether the client has received
the initial headers from the server. While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.
Bottom line, trying to use solely the Census StatsContext API to record
LB load information would require extra data plumbing channel between
ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing
Census API on the gRPC API. Even with those extensive changes, we are
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR.
Summary of changes
==================
API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.
On the client-side, CallOptions now takes a list of
ClientStreamTracer.Factory. Opon creating a ClientStream, each of the
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install its own tracer factories by overriding the
CallOptions.
Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor. By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call. This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.
On the server-side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.
The Tracer API supports propagation of stats/tracing information through
Context and metadata. Both client-side and server-side tracer factories
have access to the headers object. Client-side tracer relies on
interceptor to read the Context, while server-side tracer has
filterContext() method that can override the Context.
Implementation details
----------------------
Only real streams report stats. Pseudo streams such as delayed stream,
failing stream don't report. InProcess transport streams currently
don't report stats.
"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream. On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.
The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.
Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.
Bug fixed
----------------
The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong. Because closed()
receiving OK doesn't necessarily mean the RPC ended with OK. Instead it
means the server has successfully sent the final status, which may be
non-OK, to the client.
Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK). Whichever happens
first is the reported status.
TODOs
=====
A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
Now that there is a config, the new defaults are now being enabled.
Previously there were no default limits. Now keepalives may not be more
frequent than every 5 minutes and only when there are outstanding RPCs.
To be in line with `NettyServerBuilder` APIs
- Deprecated `enableKeepAlive(boolean enable)` and
`enableKeepAlive(boolean enable, long keepAliveDelay, TimeUnit delayUnit, long keepAliveTimeout,
TimeUnit timeoutUnit)`
which never worked in v1.2
- Added `keepAliveTime(long keepAliveTime, TimeUnit timeUnit)` and
`keepAliveTimeout(long keepAliveTimeout, TimeUnit timeUnit)`
Everything is currently permitted, but I've tested with other
configurations and all tests pass. I'll set the restrictive default at
the same time as adding a configuration API.
d116cc9 fixed the NPE, but the initialization of the manager happened
_after_ newHandler() was called, so a null manager was passed to the
handler.
Fixes#2828
`keepAlivedManager#onTransportshutdown` should not be called in `transport.shutdown()` because it is possible that there are still open RPC streams, and maybe inactive, so keepalive is still needed.
fix JavaStyle and ErrorProne warnings found in internal weekly import:
- Calls to ExpectedException#expect should always be followed by exactly one statement.
- Do not mock 'java.util.concurrent.Future'
ErrorProne provides static analysis for common issues, including
misused variables GuardedBy locks.
This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
Fixes NPE when keepalive is enabled.
* Move creation of keepAliveManager to the bottom of start()
* Enable keepAlive in NettyClientTransportTest
* Add test cases checking if keepalive is enabled/disabled, specifically.
Fixes#2726
In some environments DNS is not available and is performed by the
CONNECT proxy. Nothing "special" should need to be done for these
environments, but the previous support took shortcuts which knowingly
would not support such environments.
This change should fix both OkHttp and Netty. Netty's
Bootstrap.connect() resolved the name immediately whereas using
ChannelPipeline.connect() waits until the address reaches the end of the
pipeline. Netty uses NetUtil.toSocketAddressString() to get the name of
the address, which uses InetSocketAddress.getHostString() when
available.
OkHttp is still using InetSocketAddress.getHostName() which may issue
reverse DNS lookups. However, if the reverse DNS lookup fails, it should
convert the IP to a textual string like getHostString(). So as long as
the reverse DNS maps to the same machine as the IP, there should only be
performance concerns, not correctness issues. Since the DnsNameResolver
is creating unresolved addresses, the reverse DNS lookups shouldn't
occur in the common case.
This is a squash and modification of master commits that also includes:
netty,okhttp: Fix CONNECT and its error handling
This commit has been modified to reduce its size to substantially reduce
risk of it breaking Netty error handling. But that also means proxy
error handling just provides a useless "there was an error" sort of
message.
There is no Java API to enable the proxy support. Instead, you must set
the GRPC_PROXY_EXP environment variable which should be set to a
host:port string. The environment variable is temporary; it will not
exist in future releases. It exists to provide support without needing
explicit code to enable the future, while at the same time not risking
enabling it for existing users.
add `getAttributes()` to `ClientStream` and `ClientCall` to be able to share clientTransport
information such as socket TOS with higher lever API's, once the RPC picks up an active transport that is ready to use.
This patch introduces an additional ALPN protocol, grpc-exp, intended to
take preference to h2 and indicate to the server that the connection
contains only gRPC traffic. This allows servers and intermediate boxes
to distinguish gRPC from other HTTP/2 traffic.
The choice of grpc-exp as a protocol identifier indicates that this
scheme is currently experimental and should not be relied upon. The
protocol is not in the IANA TLS registry.
This is the grpc-java equivalent of
8cdf17a620.
Due to the opacity of ALPN and TLS negotiation at application level, the
tests are only there to validate that the lists we're feeding into the
negotiation process have the desired ordering properties:
* If grpc-exp is present, h2 is as well.
* grpc-exp is preferenced over h2.
Not to expose dependency of `io.netty.handler.ssl.SslContext` when implementing `TransportCreationParamsFilter`
Change the `TransportCreationParamsFilter` API
````
ProtocolNegotiator getProtocolNegotiator(NegotiationType negotiationType, SslContext sslContext);
````
into
````
ProtocolNegotiator getProtocolNegotiator();
````
resolvesgrpc/grpc#8715
now that setListener is called prior to
`JumpToApplicationThreadServerStreamListener` being completely ready to
use. We should not call `AbstractStream2#onStreamAllocated()` inside
`setListener()` anymore, but call it after `ServerImpl#streamCreated()`
is completed.
Resolves#1936
Two bugs fixed:
- NPE in `ServerImpl#streamCreated()` when stream listener not set before
stream closed
- It is possible that `internalCancel()` is called during
`InProcessClientStream#start()` due to early server `onComplete()` or server `onError()`,
in this case no need to enlist `streams`, otherwise the channel can not be shutdown by `shutdown()`.
We only want to use the HTTP code for errors, when the response is not
grpc. grpc status codes may be mapped to HTTP codes in the future, and
we don't want to break when that happens. We also don't want to ever
accidentally use Status.OK without receiving it from the server, even
for HTTP 200.