RPC upstarts are counted into metrics
RPC_{CLIENT,SERVER}_STARTED_COUNT. In addition, RPC completions are
counted into metrics RPC_{CLIENT,SERVER}_FINISHED_COUNT. From these
metrics, users will be able to derive count of RPCs that are currently
active.
Only bump the counter from AbstractServerStream.TransportState, and hole punch
from AbstractServerStream to TransportState when the application calls close.
This commit updates gRPC core to use io.opencensus:opencensus-api and
io.opencensus:opencensus-contrib-grpc-metrics instead of
com.google.instrumentation:instrumentation-api for stats and tagging. The gRPC
Monitoring Service continues to use instrumentation-api.
The main changes affecting gRPC:
- The StatsContextFactory is replaced by three objects, StatsRecorder, Tagger,
and TagContextBinarySerializer.
- The StatsRecorder, Tagger, and TagContextBinarySerializer are never null,
but the objects are no-ops when the OpenCensus implementation is not
available.
This commit includes changes written by @songy23 and @sebright.
Our Travis-CI builds are failing with "Protocol family unavailable" due
to the usage of ::1. Although it's 2017 and we'd expect to have ipv6
_loopback_ anywhere that mattered, apparently that's not the case.
The tests now work equally well on IPv4-only and IPv6-only machines.
This is needed for both completeness and stats/tracing contexts propagation.
Stats recording with Census is intentionally disabled (#2284), while the rest of the Census-related logic work the same as on the other transports.
Two methods, outboundMessageSent() and inboundMessageRead() are added to StreamTracer in order to associate individual messages with sizes. Both types of sizes are optional, as allowed by Census tracing.
Both methods accept a sequence number as the type ID as required by Census. The original outboundMesage() and inboundMessage() are also replaced by overrides that take the sequence number, to better match the new methods. The deprecation of the old overrides are tracked by #3460
This aligns with shutdownNow(), which is already accepting a status.
The status will be propagated to application when RPCs failed because
of transport shutdown, which will become useful information for debug.
This is a big, but mostly mechanical change. The newly added Test*StreamTracer classes are designed to be extended which is why they are non final and have protected fields. There are a few notable things in this:
1. verifyNoMoreInteractions is gone. The API for StreamTracers doesn't make this guarantee. I have recovered this behavior by failing duplicate calls. This has resulted in a few bugs in the test code being fixed.
2. StreamTracers cannot be mocked anymore. Tracers need to be thread safe, which mocks simply are not. This leads to a HUGE number of reports when trying to find real races in gRPC.
3. If these classes are useful, we can promote them out of internal. I just put them here out of convenience.
Moved the following APIs from `io.grpc.testing.TestUtils` to `io.grpc.internal.TestUtils`:
`InetSocketAddress testServerAddress(String host, int port)`
`InetSocketAddress testServerAddress(int port)`
`List<String> preferredTestCiphers()`
`File loadCert(String name)`
`X509Certificate loadX509Cert(String fileName)`
`SSLSocketFactory newSslSocketFactoryForCa(Provider provider, File certChainFile)`
`void sleepAtLeast(long millis)`
APIs not to be moved:
`ServerInterceptor recordRequestHeadersInterceptor()`
`ServerInterceptor recordServerCallInterceptor()`
Main implementation is in CensusTracingModule.
Also a few fix-ups in the stats implementation CensusStatsModule:
- Change header key name from grpc-census-bin to grpc-tags-bin
- Server does not fail on header parse errors. Uses the default instead.
Protect Census-based stats and tracing with static flags: `GrpcUtil.enableCensusStats` and `GrpcUtil.enableCensusTracing`. They keep those features disabled by default until they, especially their wire formats, are stabilized.
We aren't upgrading yet, because we don't want to begin using the new
Mockito APIs. But all the tests now pass with the newer version. There
are a lot of warnings that can't be fixed until we bump the mockito
version.
Background
==========
LoadBalancer needs to track RPC measurements and status for
load-reporting. We need to introduce a "Tracer" API for that.
Since such API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
are recorded, we will migrate the Census-based stats reporting under the
new Tracer API.
Alternatives
============
We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording those information along with the currently
recorded stats to Census. The LB-related information, such as LB_ID,
reason for dropping reqeusts etc, would be added to the Census
StatsContext as tags.
Since tags are held by StatsContext before eventually being recorded by
providing the measurements, and StatsContext is immutable, this would
require a way for LoadBalancer to override the StatsContext, which means
LoadBalancer API would has direct reference to the Census StatsContext.
This is undesirable because Census API is not stable yet.
Part of the LB-related information is whether the client has received
the initial headers from the server. While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.
Bottom line, trying to use solely the Census StatsContext API to record
LB load information would require extra data plumbing channel between
ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing
Census API on the gRPC API. Even with those extensive changes, we are
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR.
Summary of changes
==================
API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.
On the client-side, CallOptions now takes a list of
ClientStreamTracer.Factory. Opon creating a ClientStream, each of the
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install its own tracer factories by overriding the
CallOptions.
Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor. By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call. This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.
On the server-side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.
The Tracer API supports propagation of stats/tracing information through
Context and metadata. Both client-side and server-side tracer factories
have access to the headers object. Client-side tracer relies on
interceptor to read the Context, while server-side tracer has
filterContext() method that can override the Context.
Implementation details
----------------------
Only real streams report stats. Pseudo streams such as delayed stream,
failing stream don't report. InProcess transport streams currently
don't report stats.
"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream. On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.
The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.
Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.
Bug fixed
----------------
The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong. Because closed()
receiving OK doesn't necessarily mean the RPC ended with OK. Instead it
means the server has successfully sent the final status, which may be
non-OK, to the client.
Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK). Whichever happens
first is the reported status.
TODOs
=====
A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
ErrorProne provides static analysis for common issues, including
misused variables GuardedBy locks.
This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
This is needed because in interceptor tests, often the types cannot
be changed. The void methods stay for users who are writing tests
where they actually don't care about types. The noop methods
require types to be specified. This is for users who don't care
about the implementation. These represent different levels of
commitment.
This eases the transition of code Mocking MethodDescriptor, which
breaks in this release.
Because "core"'s test source already depends on "testing", e.g.,
`core/src/test/java/io/grpc/internal/ServerCallImplTest.java` uses
`testing/src/main/java/io/grpc/internal/testing/StatsTestUtils.java`,
which forms a circular dependency.
This change moves the StatsContext setter accessors from "core"'s test
source to "testing".
Resolves#2651