* Fork DelayedClientTransport into DelayedClientTransport2, and fix a race
on it. Consider the following sequence:
1. Channel is created. Picker is initially null
2. Channel has a new RPC. Because picker is null, the delayed transport
is picked, but newStream() is not called yet.
3. LoadBalancer updates a new picker to Channel. Channel runs
reprocess() but does nothing because no pending stream is there.
4. newStream() called on the delayed transport.
In previous implementation, newStream() on step 4 will not see the
picker. It will only use the next picker.
After this change, delayed transport would save the latest picker and
use it on newStream(), which is the desired behavior.
Also deleted all the code that will not be used after the LB refactor.
* Also fixed a bug: newStream() should always return a failing stream if it's shutdown. Previously it's not doing so if picker is not null.
OutputStreamAdapter is a private class, and is only ever called by two
places: ByteStreams.copy, which never calls the single byte method, and
DrainTo, which potentially can. There are two classes that implement
DrainTo, which is primarily ProtoInputStream. It calls
MessageLite.writeTo(OutputStream) or back again to ByteStreams.copy.
MessageLite.writeTo in turn wraps the OutputStream in a
CodedOutputStream.OutputStreamEncoder, which then never calls the single
byte version. Thus, all know implementations never call the single
byte override.
Additionally, it is well known that the single byte write is slow, and
is expected to be wrapped in a BufferedOutputStream if there are many
small writes.
- Remove unused variable terminated from TransportListener#transportTerminated
- Do not mention getLock in the javadoc of InUseStateAggregator2#handleNotInUse
FackeClock used PriorityQueue for storing tasks which is not
thread-safe, and caused flake
> io.grpc.internal.ClientCallImplTest > deadlineExceededBeforeCallStarted FAILED
> java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.PriorityQueue.removeAt(PriorityQueue.java:619)
> at java.util.PriorityQueue.remove(PriorityQueue.java:378)
> at io.grpc.internal.FakeClock$ScheduledTask.cancel(FakeClock.java:89)
> at io.grpc.internal.ClientCallImpl.removeContextListenerAndCancelDeadlineFuture(ClientCallImpl.java:296)
> at io.grpc.internal.ClientCallImpl.start(ClientCallImpl.java:250)
> at io.grpc.internal.ClientCallImplTest.deadlineExceededBeforeCallStarted(ClientCallImplTest.java:615)
Document the threading requirements.
Turn all metrics to volatile because callEnded() may not be synchronized
with other metric-updating methods in the case where the RPC is closed
because of transport error or cancellation from the other side.
Remove precondition checks thus allow metrics updating methods to be
called after callEnded() has been called, which may happen since the
application may not be aware when the RPC is closed by the transport.
This is the first step of a major refactor for the LoadBalancer-related part of Channel impl (#1600). It forks TransportSet into InternalSubchannel and makes changes on it.
What's changed:
- InternalSubchannel no longer has delayed transport, thus will not buffer
streams when a READY real transport is absent.
- When there isn't a READ real transport, obtainActiveTransport() will
return null.
- InternalSubchannel is no longer a ManagedChannel
- Overhauled Callback interface, with onStateChange() replacing the
adhoc transport event methods.
- Call out channelExecutor, which is a serializing executor that runs
the Callback.
The first commit is an unmodified copy of the files that are being forked. Please review the second commit which changes on the forked files.
NoopCensusContextFactory used to use a shared zero-sized ByteBuffer,
which has mutable states like positions and thus is not thread-safe.
Sharing it means it will be used by all calls thus may be used from
different threads, which makes TSAN unhappy in tests.
Resolves#2377
resolvesgrpc/grpc#8715
now that setListener is called prior to
`JumpToApplicationThreadServerStreamListener` being completely ready to
use. We should not call `AbstractStream2#onStreamAllocated()` inside
`setListener()` anymore, but call it after `ServerImpl#streamCreated()`
is completed.
Resolves#1936
Two bugs fixed:
- NPE in `ServerImpl#streamCreated()` when stream listener not set before
stream closed
- It is possible that `internalCancel()` is called during
`InProcessClientStream#start()` due to early server `onComplete()` or server `onError()`,
in this case no need to enlist `streams`, otherwise the channel can not be shutdown by `shutdown()`.
Proxies can trigger errors, and we'd like to categorize common
proxy-generating errors into gRPC status codes for sane application
handling.
Fixes#2264
We only want to use the HTTP code for errors, when the response is not
grpc. grpc status codes may be mapped to HTTP codes in the future, and
we don't want to break when that happens. We also don't want to ever
accidentally use Status.OK without receiving it from the server, even
for HTTP 200.
core: adds @Nullable Object getAttachedObject() to ServiceDescriptor
compiler: Plumbing necessary to access proto file descriptors via
the reflection service
Found this bug in some unit test in which client-streaming call is hanging because `ServerCallImpl#messageRead()` did not handle RuntimeException properly
Highlights
==========
StatsTraceContext
-----------------
The bridge between gRPC library and Census. It keeps track of the total
payload sizes and the elapsed time of a Call. The rest of the gRPC code
doesn't invoke Census directly.
Context propagation
-------------------
StatsTraceContext carries CensusContext (and the upcoming TraceContext)
and is attached to the gRPC Context.
1. StatsTraceContext is created by ManagedChannelImpl, by calling
createClientContext(), which inherits the current CensusContext if available.
2. ManagedChannelImpl passes StatsTraceContext to ClientCallImpl, then
to the stream, then to the framer and deframer explicitly.
3. ClientCallImpl propagates the CensusContext to the headers.
1. ServerImpl creates a StatsTraceContext by implementing a new callback
method StatsTraceContext methodDetermined(MethodDescriptor, Metadata) on
ServerTransportListener.
2. NettyServerHandler calls methodDetermined() before creating the
stream, and passes the StatsTraceContext to the stream.
3. When ServerImpl creates the gRPC Context for the new ServerCall, it
calls the new method statsTraceContext() on ServerStream and puts the
StatsTraceContext in the Context.
Metrics recording
-----------------
1. Client-side start time: when ClientCallImpl is created
2. Server-side start time: when methodDetermined() is called
3. Server-side end time: in ServerStreamListener.closed(), but before
calling onComplete() or onCancel() on ServerCall.Listener.
4. Client-side end time: in ClientStreamListener.closed(), but before
calling onClonse() on ClientCall.Listener
Message sizes are recorded in MessageFramer and MessageDeframer. Both
the uncompressed and wire (possibly compressed) payload sizes are
counted.
TODOs
=====
The CensusContext created from headers on the server side should be
attached to the gRPC Context for the call. It's not done at this moment
because Census lacks the proper API to do it. It only affects tracing
and resource accounting, but doesn't affect stats functionality
Channel state API doesn't allow a TRANSIENT_FAILURE->IDLE edge.
Change TransportSet to always transition to CONNECTING after
TRANSIENT_FAILURE.
This behavior, combined with that it never uses IDLE_TIMEOUT to
transition from READY to IDLE, effectivly makes TransportSet
Channel-state API-compliant under an infinite IDLE_TIMEOUT.
Also set the default IDLE_TIMEOUT to 30min.
Trying to fix issue #2188
- Try to keep avoiding the lock issue #2152 and also to avoid race condition #2188.
- Add `checkState` for `endBackoff()`. Could help hit and identify any potential issue related to #2188.
- Make sure `startBackoff()` and `endBackoff()` invoked in the right order.
- Not to schedule endBackoff if transportSet has been shutdown.
US_ASCII may have not been initialized when the Code enums are created,
causing an NPE and making Status class fail to load:
java.lang.ExceptionInInitializerError
at io.grpc.Status$Code.<init>(Status.java:222)
at io.grpc.Status$Code.<clinit>(Status.java:79)
at io.grpc.internal.GrpcUtilTest.http2ErrorStatus(GrpcUtilTest.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:49)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:230)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:273)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:240)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:65)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:238)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:55)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:231)
at org.junit.runners.ParentRunner.run(ParentRunner.java:316)
at com.google.testing.junit.runner.junit4.CancellableRequestFactory$CancellableRunner.run(CancellableRequestFactory.java:90)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
at com.google.testing.junit.runner.junit4.JUnit4Runner.run(JUnit4Runner.java:112)
at com.google.testing.junit.runner.GoogleTestRunner.runTestsInSuite(GoogleTestRunner.java:197)
at com.google.testing.junit.runner.GoogleTestRunner.runTestsInSuite(GoogleTestRunner.java:174)
at com.google.testing.junit.runner.GoogleTestRunner.main(GoogleTestRunner.java:133)
Caused by: java.lang.NullPointerException
at io.grpc.Status$Code.values(Status.java:75)
at io.grpc.Status.buildStatusList(Status.java:249)
at io.grpc.Status.<clinit>(Status.java:245)
Strangely this only fails GrpcUtilTest.http2ErrorStatus inside google.
Now switch to use Guava Charsets.
Commit 656e8ce (#2208) removed most usages, but missed one and one was
re-added. Since all usages are removed, it should be much easier to
notice regressions.
This change exposed a pre-existing bug where shutdownNow wasn't called
for decommissionedTransports. The bug is fixed and a test added in this
commit.
Fixes#2120
Our API allows pings to be send even after the transport has been shutdown. We currently
don't handle the case, where the Netty channel has been closed but the NettyClientHandler
has not yet been removed from the pipeline, correctly. That is, we need to query the shutdown
status whenever we receive a ClosedChannelException.
Also, some cleanup.