Moved the following APIs from `io.grpc.testing.TestUtils` to `io.grpc.internal.TestUtils`:
`InetSocketAddress testServerAddress(String host, int port)`
`InetSocketAddress testServerAddress(int port)`
`List<String> preferredTestCiphers()`
`File loadCert(String name)`
`X509Certificate loadX509Cert(String fileName)`
`SSLSocketFactory newSslSocketFactoryForCa(Provider provider, File certChainFile)`
`void sleepAtLeast(long millis)`
APIs not to be moved:
`ServerInterceptor recordRequestHeadersInterceptor()`
`ServerInterceptor recordServerCallInterceptor()`
Main implementation is in CensusTracingModule.
Also a few fix-ups in the stats implementation CensusStatsModule:
- Change header key name from grpc-census-bin to grpc-tags-bin
- Server does not fail on header parse errors. Uses the default instead.
Protect Census-based stats and tracing with static flags: `GrpcUtil.enableCensusStats` and `GrpcUtil.enableCensusTracing`. They keep those features disabled by default until they, especially their wire formats, are stabilized.
We aren't upgrading yet, because we don't want to begin using the new
Mockito APIs. But all the tests now pass with the newer version. There
are a lot of warnings that can't be fixed until we bump the mockito
version.
Background
==========
LoadBalancer needs to track RPC measurements and status for
load-reporting. We need to introduce a "Tracer" API for that.
Since such API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
are recorded, we will migrate the Census-based stats reporting under the
new Tracer API.
Alternatives
============
We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording those information along with the currently
recorded stats to Census. The LB-related information, such as LB_ID,
reason for dropping reqeusts etc, would be added to the Census
StatsContext as tags.
Since tags are held by StatsContext before eventually being recorded by
providing the measurements, and StatsContext is immutable, this would
require a way for LoadBalancer to override the StatsContext, which means
LoadBalancer API would has direct reference to the Census StatsContext.
This is undesirable because Census API is not stable yet.
Part of the LB-related information is whether the client has received
the initial headers from the server. While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.
Bottom line, trying to use solely the Census StatsContext API to record
LB load information would require extra data plumbing channel between
ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing
Census API on the gRPC API. Even with those extensive changes, we are
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR.
Summary of changes
==================
API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.
On the client-side, CallOptions now takes a list of
ClientStreamTracer.Factory. Opon creating a ClientStream, each of the
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install its own tracer factories by overriding the
CallOptions.
Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor. By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call. This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.
On the server-side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.
The Tracer API supports propagation of stats/tracing information through
Context and metadata. Both client-side and server-side tracer factories
have access to the headers object. Client-side tracer relies on
interceptor to read the Context, while server-side tracer has
filterContext() method that can override the Context.
Implementation details
----------------------
Only real streams report stats. Pseudo streams such as delayed stream,
failing stream don't report. InProcess transport streams currently
don't report stats.
"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream. On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.
The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.
Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.
Bug fixed
----------------
The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong. Because closed()
receiving OK doesn't necessarily mean the RPC ended with OK. Instead it
means the server has successfully sent the final status, which may be
non-OK, to the client.
Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK). Whichever happens
first is the reported status.
TODOs
=====
A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
ErrorProne provides static analysis for common issues, including
misused variables GuardedBy locks.
This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
This is needed because in interceptor tests, often the types cannot
be changed. The void methods stay for users who are writing tests
where they actually don't care about types. The noop methods
require types to be specified. This is for users who don't care
about the implementation. These represent different levels of
commitment.
This eases the transition of code Mocking MethodDescriptor, which
breaks in this release.
Because "core"'s test source already depends on "testing", e.g.,
`core/src/test/java/io/grpc/internal/ServerCallImplTest.java` uses
`testing/src/main/java/io/grpc/internal/testing/StatsTestUtils.java`,
which forms a circular dependency.
This change moves the StatsContext setter accessors from "core"'s test
source to "testing".
Resolves#2651
The interceptors are quite specific, and probably not helpful for
current testing strategies. They really are only useful for interop
testing. Moving them to interop-testing avoids them appearing to be in a
public API (even if that API is experimental).
No functional changes were made; just code movement.
In upstream, Census is renamed to "Instrumentation". `com.google.census` is renamed to `com.google.instrumentation.stats`.
In gRPC, "census" in every name is replaced by "stats".
GrpcServerRule configures an in-process server and channel. It is
useful for asserting requests being made to a service. A consumer can
create a mock implementation of their service that records each
request, then make assertions on those records in their test.
Resolves#1936
Two bugs fixed:
- NPE in `ServerImpl#streamCreated()` when stream listener not set before
stream closed
- It is possible that `internalCancel()` is called during
`InProcessClientStream#start()` due to early server `onComplete()` or server `onError()`,
in this case no need to enlist `streams`, otherwise the channel can not be shutdown by `shutdown()`.
Highlights
==========
StatsTraceContext
-----------------
The bridge between gRPC library and Census. It keeps track of the total
payload sizes and the elapsed time of a Call. The rest of the gRPC code
doesn't invoke Census directly.
Context propagation
-------------------
StatsTraceContext carries CensusContext (and the upcoming TraceContext)
and is attached to the gRPC Context.
1. StatsTraceContext is created by ManagedChannelImpl, by calling
createClientContext(), which inherits the current CensusContext if available.
2. ManagedChannelImpl passes StatsTraceContext to ClientCallImpl, then
to the stream, then to the framer and deframer explicitly.
3. ClientCallImpl propagates the CensusContext to the headers.
1. ServerImpl creates a StatsTraceContext by implementing a new callback
method StatsTraceContext methodDetermined(MethodDescriptor, Metadata) on
ServerTransportListener.
2. NettyServerHandler calls methodDetermined() before creating the
stream, and passes the StatsTraceContext to the stream.
3. When ServerImpl creates the gRPC Context for the new ServerCall, it
calls the new method statsTraceContext() on ServerStream and puts the
StatsTraceContext in the Context.
Metrics recording
-----------------
1. Client-side start time: when ClientCallImpl is created
2. Server-side start time: when methodDetermined() is called
3. Server-side end time: in ServerStreamListener.closed(), but before
calling onComplete() or onCancel() on ServerCall.Listener.
4. Client-side end time: in ClientStreamListener.closed(), but before
calling onClonse() on ClientCall.Listener
Message sizes are recorded in MessageFramer and MessageDeframer. Both
the uncompressed and wire (possibly compressed) payload sizes are
counted.
TODOs
=====
The CensusContext created from headers on the server side should be
attached to the gRPC Context for the call. It's not done at this moment
because Census lacks the proper API to do it. It only affects tracing
and resource accounting, but doesn't affect stats functionality
Our API allows pings to be send even after the transport has been shutdown. We currently
don't handle the case, where the Netty channel has been closed but the NettyClientHandler
has not yet been removed from the pipeline, correctly. That is, we need to query the shutdown
status whenever we receive a ClosedChannelException.
Also, some cleanup.
Called whenever a ServerTransport is ready and terminated. Has the
ability to modify transport attributes, which ServerCall.attributes()
are based on.
Related changes:
- Attribute keys for remote address and SSL session are now moved from
ServerCall to a neutral place io.grpc.Grpc, because they can also be
used from ServerTransportFilter, and probably will be used on the
client-side too. The old keys on ServerCall is marked deprecated and
are equivalent to the new keys.
- Added transportReady() to ServerTransportListener.
Resolves#2132
Implementations of ManagedClientTransport.start() are restricted from
calling the passed listener until start() returns, in order to avoid
reentrency problems with locks. For most transports this isn't a
problem, because they need additional threads anyway. InProcess uses no
additional threads naturally so ends up needing a thread just to
notifyReady. Now transports can just return a Runnable that can be run
after locks are dropped.
This was originally intended to be a performance optimization, but the
thread also causes nondeterminism because RPCs are delayed until
notifyReady is called. So avoiding the thread reduces needless fakes
during tests.
conscrypt at some point which would allow ALPN to function
Clarify the SSLContext.getDefault is not used when constructing the
default SSLSocketFactory.
Resolves#1756
The thread-unsafe method `io.grpc.testing.TestUtils.pickUnusedPort` causes flakes (#1756) in windows. Need to avoid use of this method in test as in windows the tests are running in different jvms and concurrent calls of this method in multiple processes tend to return the same port number.
There are some usages of this method in benchmarks, so moved the method to `io.grpc.benchmarks.Utils` and the method will only be used in benchmarks and not in test.
This reverts commit 3df1446deb.
The commit was adding to the difficulty of integration for testing. By
itself it isn't bad, so this is a temporary revert until the many other
commits are absorbed and then it will be reapplied.
This does have a manual edit for ClientCallsTest.