Highlights
==========
StatsTraceContext
-----------------
The bridge between gRPC library and Census. It keeps track of the total
payload sizes and the elapsed time of a Call. The rest of the gRPC code
doesn't invoke Census directly.
Context propagation
-------------------
StatsTraceContext carries CensusContext (and the upcoming TraceContext)
and is attached to the gRPC Context.
1. StatsTraceContext is created by ManagedChannelImpl, by calling
createClientContext(), which inherits the current CensusContext if available.
2. ManagedChannelImpl passes StatsTraceContext to ClientCallImpl, then
to the stream, then to the framer and deframer explicitly.
3. ClientCallImpl propagates the CensusContext to the headers.
1. ServerImpl creates a StatsTraceContext by implementing a new callback
method StatsTraceContext methodDetermined(MethodDescriptor, Metadata) on
ServerTransportListener.
2. NettyServerHandler calls methodDetermined() before creating the
stream, and passes the StatsTraceContext to the stream.
3. When ServerImpl creates the gRPC Context for the new ServerCall, it
calls the new method statsTraceContext() on ServerStream and puts the
StatsTraceContext in the Context.
Metrics recording
-----------------
1. Client-side start time: when ClientCallImpl is created
2. Server-side start time: when methodDetermined() is called
3. Server-side end time: in ServerStreamListener.closed(), but before
calling onComplete() or onCancel() on ServerCall.Listener.
4. Client-side end time: in ClientStreamListener.closed(), but before
calling onClonse() on ClientCall.Listener
Message sizes are recorded in MessageFramer and MessageDeframer. Both
the uncompressed and wire (possibly compressed) payload sizes are
counted.
TODOs
=====
The CensusContext created from headers on the server side should be
attached to the gRPC Context for the call. It's not done at this moment
because Census lacks the proper API to do it. It only affects tracing
and resource accounting, but doesn't affect stats functionality
Metadata.removeAll creates an iterator for looking through removed
values even if the call doens't use it. This change adds a similar
method which doesn't create garbage.
This change makes it easier in the future to alter the internals
of Metadata where it may be expensive to return removed values.
After debugging #2153, it would have been nice to know what the exact
parameter was that was null. This change adds a name for each
checkNotNull (and tries to normalized on static imports in order to
shorten lines)
Implementations of ManagedClientTransport.start() are restricted from
calling the passed listener until start() returns, in order to avoid
reentrency problems with locks. For most transports this isn't a
problem, because they need additional threads anyway. InProcess uses no
additional threads naturally so ends up needing a thread just to
notifyReady. Now transports can just return a Runnable that can be run
after locks are dropped.
This was originally intended to be a performance optimization, but the
thread also causes nondeterminism because RPCs are delayed until
notifyReady is called. So avoiding the thread reduces needless fakes
during tests.
The != should have been ==. However, it is provable that the exception
won't be null, but we want to make that fact obvious when auditing. So
we just fail if the exception is ever null.
conscrypt at some point which would allow ALPN to function
Clarify the SSLContext.getDefault is not used when constructing the
default SSLSocketFactory.
Creates a KeepAliveManager which should be used by each transport. It does keepalive pings and shuts down the transport if does not receive an response in time.
It prevents the connection being shut down for long lived streams. It could also detect broken socket in certain platforms.
Resolves#1756
The thread-unsafe method `io.grpc.testing.TestUtils.pickUnusedPort` causes flakes (#1756) in windows. Need to avoid use of this method in test as in windows the tests are running in different jvms and concurrent calls of this method in multiple processes tend to return the same port number.
There are some usages of this method in benchmarks, so moved the method to `io.grpc.benchmarks.Utils` and the method will only be used in benchmarks and not in test.
Resolves#1815
The only legitimate way to create plaintext connection with okhttp is by calling
OkHttpChannelBuilder#usePlaintex
Throw IllegalArgumentException when calling
OkHttpChannelBuilder#connectionSpec or Utils.convertSpec directly with CLEARTEXT a connectionSped argument.
A transport is "in use" iff number of streams > 0. In following changes
the channel will use this information when deciding whether it should
transit to the IDLE mode (#1276).
Introduce CallCredentials as a first-class option to allow applications
to set per-call credentials into headers for outgoing RPCs. This will
supersede ClientAuthInterceptor. It has access to more
information (e.g., transport attributes, MethodDescriptor) and allow
results to be returned asynchronously, e.g., from a blocking I/O, which
was problemantic with ClientAuthInterceptor.
adding
ClientStream newStream(MethodDescriptor<?, ?> method, Metadata headers, CallOptions callOptions);
to ClientTransport interface
Created this PR first because both fail fast implementation and another change will be using this interface change
OkHttpClientTransport has a fix for shutdown during start which
prevented transportTerminated from being called. It also no longer fails
pending streams during shutdown. Lifecycle management in general was
revamped to be hopefully simpler and more precise. In the process GOAWAY
handling (both sending and receiving) was improved.
With some of the changes, the log spam generated was immense and
unhelpful (since many exceptions are part of normal operation on
shutdown). This change reduces the amount of log spam to nothing.
To ManagedChannelImpl, TransportSet and all client transport
implementations, so they can be correlated in the logs. Also added more
life-cycle logging in general.
Although the changes were determined automatically, they were manually
applied to the codebase.
ClientCalls actually has a bug fix, since the suggestion to add
interrupt() made it obvious that interrupted() was inappropriate.
Always return a completed future from `TransportSet`. If a (real) transport has not been created (e.g., in reconnect back-off), a `DelayedClientTransport` will be returned.
Eventually we will get rid of the transport futures everywhere, and have streams always __owned__ by some transports.
DelayedClientTransport
----------------------
After we get rid of the transport future, this is what `ClientCallImpl` and `LoadBalancer` get when a real transport has not been created yet. It buffers new streams and pings until `setTransport()` is called, after which point all buffered and future streams/pings are transferred to the real transport.
If a buffered stream is cancelled, `DelayedClientTransport` will remove it from the buffer list, thus #1342 will be resolved after the larger refactoring is complete.
This PR only makes `TransportSet` use `DelayedClientTransport`. Follow-up changes will be made to allow `LoadBalancer.pickTransport()` to return null, in which case `ManagedChannelImpl` will give `ClientCallImpl` a `DelayedClientTransport`.
Changes to ClientTransport shutdown semantics
---------------------------------------------
Previously when shutdown() is called, `ClientTransport` should not accept newStream(), and when all existing streams have been closed, `ClientTransport` is terminated. Only when a transport is terminated would a transport owner (e.g., `TransportSet`) remove the reference to it.
`DelayedClientTransport` brings about a new case: when `setTransport()` is called, we switch to the real transport and no longer need the delayed transport. This is achieved by calling `shutdown()` on the delayed transport and letting it terminate. However, as the delayed transport has already been handed out to users, we would like `newStream()` to keep working for them, even though the delayed transport is already shut down and terminated.
In order to make it easy to manage the life-cycle of `DelayedClientTransport`, we redefine the shutdown semantics of transport:
- A transport can own a stream. Typically the transport owns the streams
it creates, but there may be exceptions. `DelayedClientTransport` DOES
NOT OWN the streams it returns from `newStream()` after `setTransport()`
has been called. Instead, the ownership would be transferred to the
real transport.
- After `shutdown()` has been called, the transport stops owning new
streams, and `newStream()` may still succeed. With this idea,
`DelayedClientTransport`, even when terminated, will continue
passing `newStream()` to the real transport.
- When a transport is in shutdown state, and it doesn't own any stream,
it then can enter terminated state.
ManagedClientTransport / ClientTransport
----------------------------------------
Remove life-cycle interfaces from `ClientTransport`, and put them in its subclass - `ManagedClientTransport`, with the same idea that we have `Channel` and `ManagedChannel`. Only the one who creates the transport will get `ManagedClientTransport` thus is able to start and shutdown the transport. The users of transport, e.g., `LoadBalancer`, can only get `ClientTransport` thus are not alter its state. This change clarifies the responsibility of transport life-cycle management.
Fix TransportSet shutdown semantics
-----------------------------------
Currently, if `TransportSet.shutdown()` has been called, it no longer create new transports, which is wrong.
The correct semantics of `TransportSet.shutdown()` should be:
- Shutdown all transports, thus stop new streams being created on them
- Stop `obtainActiveTransport()` from returning transports
- Streams that already created, including those buffered in delayed transport, should continue. That means if delayed transport has buffered streams, we should let the existing reconnect task continue.
This provides more structured data into the application for it to do
special handling.
In general, we would hope most people don't need this functionality, but
it is a good escape hatch to allow users to workaround infrastructure
problems.
This reverts commit eca1f7c1d6.
We want to preserve the status message identical to what the server
sent. We'll need a better way to communicate debugging details.