mirror of https://github.com/grpc/grpc-java.git
Background
==========

LoadBalancer needs to track RPC measurements and status for load-reporting, so we need to introduce a "Tracer" API for that. Since such an API is very close to the current Census(instrumentation)-based stats reporting mechanism in terms of what is recorded, we will migrate the Census-based stats reporting onto the new Tracer API.

Alternatives
============

We considered plumbing the LB-related information from the LoadBalancer to the core, and recording that information along with the currently recorded stats to Census. The LB-related information, such as LB_ID and the reason for dropping requests, would be added to the Census StatsContext as tags. Since tags are held by StatsContext before eventually being recorded by providing the measurements, and StatsContext is immutable, this would require a way for LoadBalancer to override the StatsContext, which means the LoadBalancer API would have a direct reference to the Census StatsContext. This is undesirable because the Census API is not stable yet.

Part of the LB-related information is whether the client has received the initial headers from the server. While that information can be grabbed by implementing a ClientInterceptor, it must be recorded along with other information such as LB_ID to be useful, and LB_ID is only available in GrpclbLoadBalancer.

Bottom line, trying to use solely the Census StatsContext API to record LB load information would require an extra data-plumbing channel between ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing the Census API on the gRPC API. Even with those extensive changes, we have yet to find a working solution. Therefore, we abandoned this idea and propose this PR.

Summary of changes
==================

API summary
-----------

Introduce the "StreamTracer" API, a callback interface for receiving stats- and tracing-related updates concerning **a single stream**. "ClientStreamTracer" and "ServerStreamTracer" add side-specific events. A stream can have zero or more tracers, and reports to all of them.

On the client-side, CallOptions now takes a list of ClientStreamTracer.Factory. Upon creating a ClientStream, each factory creates a ClientStreamTracer for the stream. This allows ClientInterceptors to install their own tracer factories by overriding the CallOptions. Since a StreamTracer only covers the span of a single stream, tracking of a ClientCall needs to be done in a ClientInterceptor. By installing its own StreamTracer when a ClientCall is created, a ClientInterceptor can associate the updates for a Call with the updates for the Streams created for that Call. This is how we keep the existing Census reporting mechanism in CensusStreamTracerModule. (See the first sketch below.)

On the server-side, a ServerStreamTracer.Factory is added through the ServerBuilder, and is used to create a ServerStreamTracer for every ServerStream. (See the second sketch below.)

The Tracer API supports propagation of stats/tracing information through Context and metadata. Both client-side and server-side tracer factories have access to the headers object. The client-side tracer relies on an interceptor to read the Context, while the server-side tracer has a filterContext() method that can override the Context.
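For illustration, here is a minimal sketch of a ClientInterceptor that installs its own tracer factory. The tracer callbacks (inboundHeaders(), streamClosed()) belong to the API introduced here; the exact factory-method signature has evolved across grpc-java releases, and the StreamInfo parameter shown is the form used in later releases, so treat the signatures as illustrative rather than authoritative.

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ClientStreamTracer;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;
import io.grpc.Status;

/** Installs a per-stream tracer on every call passing through this interceptor. */
final class TracingClientInterceptor implements ClientInterceptor {
  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    ClientStreamTracer.Factory factory =
        new ClientStreamTracer.Factory() {
          @Override
          public ClientStreamTracer newClientStreamTracer(
              ClientStreamTracer.StreamInfo info, Metadata headers) {
            // One tracer instance per stream. A real implementation would
            // aggregate these per-stream updates at the Call level, which is
            // how per-Call reporting can be kept while tracers are per-stream.
            return new ClientStreamTracer() {
              @Override
              public void inboundHeaders() {
                // The initial headers have arrived from the server.
              }

              @Override
              public void streamClosed(Status status) {
                // The stream ended with the given status.
              }
            };
          }
        };
    return next.newCall(method, callOptions.withStreamTracerFactory(factory));
  }
}
```

And a corresponding server-side sketch, registered once on the ServerBuilder; it assumes the addStreamTracerFactory(...) builder method and the (String, Metadata) factory signature as they appear in grpc-java, so again consider it a sketch rather than the definitive shape of this change.

```java
import io.grpc.Context;
import io.grpc.Metadata;
import io.grpc.ServerStreamTracer;
import io.grpc.Status;

/** Creates one tracer per ServerStream. */
final class TracingServerTracerFactory extends ServerStreamTracer.Factory {
  @Override
  public ServerStreamTracer newServerStreamTracer(String fullMethodName, Metadata headers) {
    return new ServerStreamTracer() {
      @Override
      public Context filterContext(Context context) {
        // Attach stats/tracing state here so the application, and any
        // outgoing RPCs it makes, can observe it through the Context.
        return context;
      }

      @Override
      public void streamClosed(Status status) {
        // The stream ended with the given status.
      }
    };
  }
}

// Installed once at server construction time (port and service are placeholders):
// Server server = ServerBuilder.forPort(50051)
//     .addStreamTracerFactory(new TracingServerTracerFactory())
//     .addService(myService)
//     .build()
//     .start();
```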
Implementation details
----------------------

Only real streams report stats. Pseudo streams, such as the delayed stream and the failing stream, don't report. InProcess transport streams currently don't report stats.

"StatsTraceContext", which used to receive updates from the core and report directly to Census (through StatsContext), now delegates to the StreamTracers of a stream. On the client-side, the scope of a StatsTraceContext is reduced from a ClientCall to a ClientStream, to match the scope of a StreamTracer. The Census-specific logic that was in StatsTraceContext is moved into CensusStreamTracerModule, which produces factories for StreamTracers that report to Census. Reporting with StatsTraceContext is moved out of the Channel/Call layer into the Transport/Stream layer, to match the scope change of StatsTraceContext.

Bug fixed
---------

The end of a server-side call used to be reported in ServerCallImpl's ServerStreamListenerImpl.closed(), which was wrong: closed() receiving OK doesn't necessarily mean the RPC ended with OK. It means the server has successfully sent the final status, which may be non-OK, to the client. Now the end of the call is reported both in ServerStream.close(any Status) and before calling ServerStreamListener.closed(non-OK); whichever happens first determines the reported status.

TODOs
=====

A follow-up change to the LoadBalancer API will add a ClientStreamTracer.Factory to PickResult to complete the API needed by load-reporting. A hedged sketch of what that could look like follows.
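To illustrate the intended shape only: the withSubchannel overload taking a ClientStreamTracer.Factory does not exist at the time of this PR; it is an assumption about the follow-up change (and is how the API eventually looked in grpc-java), shown here purely for clarity.

```java
import io.grpc.ClientStreamTracer;
import io.grpc.LoadBalancer.PickResult;
import io.grpc.LoadBalancer.PickSubchannelArgs;
import io.grpc.LoadBalancer.Subchannel;
import io.grpc.LoadBalancer.SubchannelPicker;

/** Sketch: a picker that attaches a load-recording tracer factory to each pick. */
final class LoadReportingPicker extends SubchannelPicker {
  private final Subchannel subchannel;                    // assumed ready
  private final ClientStreamTracer.Factory loadRecorder;  // records per-stream load

  LoadReportingPicker(Subchannel subchannel, ClientStreamTracer.Factory loadRecorder) {
    this.subchannel = subchannel;
    this.loadRecorder = loadRecorder;
  }

  @Override
  public PickResult pickSubchannel(PickSubchannelArgs args) {
    // The factory rides along with the pick, so the channel can create a
    // tracer for the stream it creates on this subchannel. The two-argument
    // overload is the assumed follow-up API, not part of this PR.
    return PickResult.withSubchannel(subchannel, loadRecorder);
  }
}
```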