This aligns with shutdownNow(), which is already accepting a status.
The status will be propagated to application when RPCs failed because
of transport shutdown, which will become useful information for debug.
Coupled with the similar change on server-side, this removes the need for a
thread when using Netty. For InProcess and OkHttp, it would allow us to let the
user to provide the scheduler for tests or application-wide thread sharing.
Class.forName(String) is understood by ProGuard, removing the need for
manual ProGuard configuration and allows ProGuard to rename the provider
classes. Previously the provider classes could not be renamed.
Fixes#2633
AbstractManagedChannelImplBuilder accepts an overrideAuthority parameter, but this value is not hooked up to the name resolver object. Ultimately, Channel.authority consults with the NameResolver, so the overrideAuthority should be hooked into the NameResolverFactory, while all other functionality should be preserved.
Also, add unit tests for all the variants of OkHttpChannelBuilder and NettyChannelBuilder constructors, namely to test the slightly different NettyChannelBuilder(SocketAddress) code path.
Fixes#2682
Background
==========
LoadBalancer needs to track RPC measurements and status for
load-reporting. We need to introduce a "Tracer" API for that.
Since such API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
are recorded, we will migrate the Census-based stats reporting under the
new Tracer API.
Alternatives
============
We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording those information along with the currently
recorded stats to Census. The LB-related information, such as LB_ID,
reason for dropping reqeusts etc, would be added to the Census
StatsContext as tags.
Since tags are held by StatsContext before eventually being recorded by
providing the measurements, and StatsContext is immutable, this would
require a way for LoadBalancer to override the StatsContext, which means
LoadBalancer API would has direct reference to the Census StatsContext.
This is undesirable because Census API is not stable yet.
Part of the LB-related information is whether the client has received
the initial headers from the server. While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.
Bottom line, trying to use solely the Census StatsContext API to record
LB load information would require extra data plumbing channel between
ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing
Census API on the gRPC API. Even with those extensive changes, we are
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR.
Summary of changes
==================
API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.
On the client-side, CallOptions now takes a list of
ClientStreamTracer.Factory. Opon creating a ClientStream, each of the
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install its own tracer factories by overriding the
CallOptions.
Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor. By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call. This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.
On the server-side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.
The Tracer API supports propagation of stats/tracing information through
Context and metadata. Both client-side and server-side tracer factories
have access to the headers object. Client-side tracer relies on
interceptor to read the Context, while server-side tracer has
filterContext() method that can override the Context.
Implementation details
----------------------
Only real streams report stats. Pseudo streams such as delayed stream,
failing stream don't report. InProcess transport streams currently
don't report stats.
"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream. On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.
The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.
Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.
Bug fixed
----------------
The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong. Because closed()
receiving OK doesn't necessarily mean the RPC ended with OK. Instead it
means the server has successfully sent the final status, which may be
non-OK, to the client.
Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK). Whichever happens
first is the reported status.
TODOs
=====
A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
`keepAlivedManager#onTransportshutdown` should not be called in `transport.shutdown()` because it is possible that there are still open RPC streams, and maybe inactive, so keepalive is still needed.
fix JavaStyle and ErrorProne warnings found in internal weekly import:
- Calls to ExpectedException#expect should always be followed by exactly one statement.
- Do not mock 'java.util.concurrent.Future'
ErrorProne provides static analysis for common issues, including
misused variables GuardedBy locks.
This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
The new plugin uses a newer version of animalsniffer, allows overriding
the animalsniffer version used, and has up-to-date handling. The
up-to-date handling cuts fully incremental parallel build times in half,
from 5.5s to 2.7s.
The previous plugin was supposed to be verifying tests. However, either
it wasn't verifying them or its verification was broken.
This is needed because in interceptor tests, often the types cannot
be changed. The void methods stay for users who are writing tests
where they actually don't care about types. The noop methods
require types to be specified. This is for users who don't care
about the implementation. These represent different levels of
commitment.
This eases the transition of code Mocking MethodDescriptor, which
breaks in this release.
This is a squash and modification of master commits that also includes:
netty,okhttp: Fix CONNECT and its error handling
This commit has been modified to reduce its size to substantially reduce
risk of it breaking Netty error handling. But that also means proxy
error handling just provides a useless "there was an error" sort of
message.
There is no Java API to enable the proxy support. Instead, you must set
the GRPC_PROXY_EXP environment variable which should be set to a
host:port string. The environment variable is temporary; it will not
exist in future releases. It exists to provide support without needing
explicit code to enable the future, while at the same time not risking
enabling it for existing users.
add `getAttributes()` to `ClientStream` and `ClientCall` to be able to share clientTransport
information such as socket TOS with higher lever API's, once the RPC picks up an active transport that is ready to use.
This patch introduces an additional ALPN protocol, grpc-exp, intended to
take preference to h2 and indicate to the server that the connection
contains only gRPC traffic. This allows servers and intermediate boxes
to distinguish gRPC from other HTTP/2 traffic.
The choice of grpc-exp as a protocol identifier indicates that this
scheme is currently experimental and should not be relied upon. The
protocol is not in the IANA TLS registry.
This is the grpc-java equivalent of
8cdf17a620.
Due to the opacity of ALPN and TLS negotiation at application level, the
tests are only there to validate that the lists we're feeding into the
negotiation process have the desired ordering properties:
* If grpc-exp is present, h2 is as well.
* grpc-exp is preferenced over h2.
"Because of spurious wakeups, wait() must always be called in a loop".
Got rid of wait().
"If you return or throw from a finally, then values returned or thrown from the try-catch block will be ignored. Consider using try-with-resources instead."
Ignored the exception thrown from finally.
TransportSet mandates that transportShutdown() has been called prior to
transportTerminated(). OkHttp transport tries to meet this requirement
by calling startGoAway() which calls transportShutdown() in both the
normal path and the error path. However, the error path only caught
Exception. If an Error is thrown from the try-block, calling
transportTerminated() in the finally block will cause a check failure in
TransportSet, shadowing the original Error, which is undesirable for
debugging. Catching Throwable here is more helpful.
We only want to use the HTTP code for errors, when the response is not
grpc. grpc status codes may be mapped to HTTP codes in the future, and
we don't want to break when that happens. We also don't want to ever
accidentally use Status.OK without receiving it from the server, even
for HTTP 200.
Highlights
==========
StatsTraceContext
-----------------
The bridge between gRPC library and Census. It keeps track of the total
payload sizes and the elapsed time of a Call. The rest of the gRPC code
doesn't invoke Census directly.
Context propagation
-------------------
StatsTraceContext carries CensusContext (and the upcoming TraceContext)
and is attached to the gRPC Context.
1. StatsTraceContext is created by ManagedChannelImpl, by calling
createClientContext(), which inherits the current CensusContext if available.
2. ManagedChannelImpl passes StatsTraceContext to ClientCallImpl, then
to the stream, then to the framer and deframer explicitly.
3. ClientCallImpl propagates the CensusContext to the headers.
1. ServerImpl creates a StatsTraceContext by implementing a new callback
method StatsTraceContext methodDetermined(MethodDescriptor, Metadata) on
ServerTransportListener.
2. NettyServerHandler calls methodDetermined() before creating the
stream, and passes the StatsTraceContext to the stream.
3. When ServerImpl creates the gRPC Context for the new ServerCall, it
calls the new method statsTraceContext() on ServerStream and puts the
StatsTraceContext in the Context.
Metrics recording
-----------------
1. Client-side start time: when ClientCallImpl is created
2. Server-side start time: when methodDetermined() is called
3. Server-side end time: in ServerStreamListener.closed(), but before
calling onComplete() or onCancel() on ServerCall.Listener.
4. Client-side end time: in ClientStreamListener.closed(), but before
calling onClonse() on ClientCall.Listener
Message sizes are recorded in MessageFramer and MessageDeframer. Both
the uncompressed and wire (possibly compressed) payload sizes are
counted.
TODOs
=====
The CensusContext created from headers on the server side should be
attached to the gRPC Context for the call. It's not done at this moment
because Census lacks the proper API to do it. It only affects tracing
and resource accounting, but doesn't affect stats functionality
Metadata.removeAll creates an iterator for looking through removed
values even if the call doens't use it. This change adds a similar
method which doesn't create garbage.
This change makes it easier in the future to alter the internals
of Metadata where it may be expensive to return removed values.
After debugging #2153, it would have been nice to know what the exact
parameter was that was null. This change adds a name for each
checkNotNull (and tries to normalized on static imports in order to
shorten lines)
Implementations of ManagedClientTransport.start() are restricted from
calling the passed listener until start() returns, in order to avoid
reentrency problems with locks. For most transports this isn't a
problem, because they need additional threads anyway. InProcess uses no
additional threads naturally so ends up needing a thread just to
notifyReady. Now transports can just return a Runnable that can be run
after locks are dropped.
This was originally intended to be a performance optimization, but the
thread also causes nondeterminism because RPCs are delayed until
notifyReady is called. So avoiding the thread reduces needless fakes
during tests.
The != should have been ==. However, it is provable that the exception
won't be null, but we want to make that fact obvious when auditing. So
we just fail if the exception is ever null.
conscrypt at some point which would allow ALPN to function
Clarify the SSLContext.getDefault is not used when constructing the
default SSLSocketFactory.
Creates a KeepAliveManager which should be used by each transport. It does keepalive pings and shuts down the transport if does not receive an response in time.
It prevents the connection being shut down for long lived streams. It could also detect broken socket in certain platforms.
Resolves#1756
The thread-unsafe method `io.grpc.testing.TestUtils.pickUnusedPort` causes flakes (#1756) in windows. Need to avoid use of this method in test as in windows the tests are running in different jvms and concurrent calls of this method in multiple processes tend to return the same port number.
There are some usages of this method in benchmarks, so moved the method to `io.grpc.benchmarks.Utils` and the method will only be used in benchmarks and not in test.
Resolves#1815
The only legitimate way to create plaintext connection with okhttp is by calling
OkHttpChannelBuilder#usePlaintex
Throw IllegalArgumentException when calling
OkHttpChannelBuilder#connectionSpec or Utils.convertSpec directly with CLEARTEXT a connectionSped argument.
A transport is "in use" iff number of streams > 0. In following changes
the channel will use this information when deciding whether it should
transit to the IDLE mode (#1276).
Introduce CallCredentials as a first-class option to allow applications
to set per-call credentials into headers for outgoing RPCs. This will
supersede ClientAuthInterceptor. It has access to more
information (e.g., transport attributes, MethodDescriptor) and allow
results to be returned asynchronously, e.g., from a blocking I/O, which
was problemantic with ClientAuthInterceptor.
adding
ClientStream newStream(MethodDescriptor<?, ?> method, Metadata headers, CallOptions callOptions);
to ClientTransport interface
Created this PR first because both fail fast implementation and another change will be using this interface change
OkHttpClientTransport has a fix for shutdown during start which
prevented transportTerminated from being called. It also no longer fails
pending streams during shutdown. Lifecycle management in general was
revamped to be hopefully simpler and more precise. In the process GOAWAY
handling (both sending and receiving) was improved.
With some of the changes, the log spam generated was immense and
unhelpful (since many exceptions are part of normal operation on
shutdown). This change reduces the amount of log spam to nothing.
To ManagedChannelImpl, TransportSet and all client transport
implementations, so they can be correlated in the logs. Also added more
life-cycle logging in general.
Although the changes were determined automatically, they were manually
applied to the codebase.
ClientCalls actually has a bug fix, since the suggestion to add
interrupt() made it obvious that interrupted() was inappropriate.
Always return a completed future from `TransportSet`. If a (real) transport has not been created (e.g., in reconnect back-off), a `DelayedClientTransport` will be returned.
Eventually we will get rid of the transport futures everywhere, and have streams always __owned__ by some transports.
DelayedClientTransport
----------------------
After we get rid of the transport future, this is what `ClientCallImpl` and `LoadBalancer` get when a real transport has not been created yet. It buffers new streams and pings until `setTransport()` is called, after which point all buffered and future streams/pings are transferred to the real transport.
If a buffered stream is cancelled, `DelayedClientTransport` will remove it from the buffer list, thus #1342 will be resolved after the larger refactoring is complete.
This PR only makes `TransportSet` use `DelayedClientTransport`. Follow-up changes will be made to allow `LoadBalancer.pickTransport()` to return null, in which case `ManagedChannelImpl` will give `ClientCallImpl` a `DelayedClientTransport`.
Changes to ClientTransport shutdown semantics
---------------------------------------------
Previously when shutdown() is called, `ClientTransport` should not accept newStream(), and when all existing streams have been closed, `ClientTransport` is terminated. Only when a transport is terminated would a transport owner (e.g., `TransportSet`) remove the reference to it.
`DelayedClientTransport` brings about a new case: when `setTransport()` is called, we switch to the real transport and no longer need the delayed transport. This is achieved by calling `shutdown()` on the delayed transport and letting it terminate. However, as the delayed transport has already been handed out to users, we would like `newStream()` to keep working for them, even though the delayed transport is already shut down and terminated.
In order to make it easy to manage the life-cycle of `DelayedClientTransport`, we redefine the shutdown semantics of transport:
- A transport can own a stream. Typically the transport owns the streams
it creates, but there may be exceptions. `DelayedClientTransport` DOES
NOT OWN the streams it returns from `newStream()` after `setTransport()`
has been called. Instead, the ownership would be transferred to the
real transport.
- After `shutdown()` has been called, the transport stops owning new
streams, and `newStream()` may still succeed. With this idea,
`DelayedClientTransport`, even when terminated, will continue
passing `newStream()` to the real transport.
- When a transport is in shutdown state, and it doesn't own any stream,
it then can enter terminated state.
ManagedClientTransport / ClientTransport
----------------------------------------
Remove life-cycle interfaces from `ClientTransport`, and put them in its subclass - `ManagedClientTransport`, with the same idea that we have `Channel` and `ManagedChannel`. Only the one who creates the transport will get `ManagedClientTransport` thus is able to start and shutdown the transport. The users of transport, e.g., `LoadBalancer`, can only get `ClientTransport` thus are not alter its state. This change clarifies the responsibility of transport life-cycle management.
Fix TransportSet shutdown semantics
-----------------------------------
Currently, if `TransportSet.shutdown()` has been called, it no longer create new transports, which is wrong.
The correct semantics of `TransportSet.shutdown()` should be:
- Shutdown all transports, thus stop new streams being created on them
- Stop `obtainActiveTransport()` from returning transports
- Streams that already created, including those buffered in delayed transport, should continue. That means if delayed transport has buffered streams, we should let the existing reconnect task continue.
This provides more structured data into the application for it to do
special handling.
In general, we would hope most people don't need this functionality, but
it is a good escape hatch to allow users to workaround infrastructure
problems.
This reverts commit eca1f7c1d6.
We want to preserve the status message identical to what the server
sent. We'll need a better way to communicate debugging details.
- Channel builders decide the default port based on whether TLS is used.
- Channel builders populate the default port via an Attributes object
passed to NameResolver.Factory#newNameResolver
- NameResolverRegistry contains all the official NameResolvers. Users
can also add custom NameResolvers to it. It looks up NameResolver by
try-and-fail. It is the default NameResolver.Factory for builders.
DnsNameResolver.
- Pass target as Strings instead of URIs from the channel builder to
ManagedChannelImpl. A target string is not necessarily a valid URI, in
which case ManagedChannelImpl will add "dns:///" to the beginning of
the target and use it as URI.
- DnsNameResolver will require scheme "dns" to be present. It no longer
allows scheme-absent URIs.
- Add NameResolver and LoadBalancer interfaces.
- ManagedChannelImpl now uses NameResolver and LoadBalancer for
transport selection, which may return transports to multiple
addresses.
- Transports are still owned by ManagedChannelImpl, which implements
TransportManager interface that is provided to LoadBalancer, so that
LoadBalancer doesn't worry about Transport lifecycles.
- Channel builders can be created by forTarget() that accepts the fully
qualified target name, which is an URI. (TODO) it's not tested.
- The old address-based construction pattern is supported by using
AbstractManagedChannelImplBuilder.DirectAddressNameResolver.
- (TODO) DnsNameResolver and SimpleLoadBalancer are currently
incomplete. They merely work for the single-address scenario.
We want to allow overriding authority in the ManagedChannelBuilder for
testing. In doing that, we basically require that all Channels support
authority. In reality, this simplifies things and is already being done
by the C implementation, as their unix domain socket support uses
"localhost" just like our in-process transport now does.
We can debate some whether "localhost" is really the most appropriate
authority for the in-process transport, but that should probably happen
later since "localhost" is "good enough" for now.
Negotiation failure results in a RuntimeException, which is not properly handled by the okhttp code, resulting in the client hanging.
Refactored the code to shutdown the transport when TLS negotiation fails.
This provides an API for applications to use gRPC without using
ExperimentalApis. It also allows swapping out a transport implementation
in the future.
Client:
* New ManagedChannel abstract class.
* Adding ping to Channel.
* Moving builders and implementations to internal.
Server:
* Added lifecycle management API to Server (mirroring ManagedChannel).
* Moved ServerImpl, AbstractServerBuilder and handler registries to internal.
* New ServerBuilder abstract class (mirroring ManagedChannelBuilder).
Fixes#545
The URI no longer needs to be provided to the Credential explicitly,
which prevents needing to know a magic string and allows using the same
Credential with multiple services.
The current process of building a channel is a bit complicated in that transports have to provide a own shutdown hook to the channel builder in order to close shared executors. This somewhat entagled creation pattern makes it difficult to separate the process of channel building from transport building. Better separating these two should make the code more readable and maintainable moving forward.