To ManagedChannelImpl, TransportSet and all client transport
implementations, so they can be correlated in the logs. Also added more
life-cycle logging in general.
Long-lived streams or lengthy RPCs can keep the transport open for
minutes after a GOAWAY is received. Previously, during this time any new
RPCs would fail with a message like:
> Cannot create stream 5 since this endpoint has received a GOAWAY frame
> with last stream id 3
All usages of goAwayStatus were replaced with lifecycleManager. Although
note that previously goAwayStatus() would never return null because it
would generate a Status if the current field was null.
getShutdownStatus() does not have this feature, so some code was
rearranged to guarantee the Status is non-null before retrieving it.
The listener handling was simplified by 1) avoiding the need for
thread-safety and 2) moving state keeping into a small class for easy
comprehensibility and simplified usage in tests.
NettyClientTransport.shutdown() no longer calls transportShutdown()
because it lies (because the message can be delayed for quite some time)
and because it was the only usage of lifecycleManager not on the event
loop.
Fixes#1359
Because `scheduleConnection()` is run under lock, if we ran
`createTransportRunnable` inside `scheduleConnection()`,
`savedDelayedTransport.setTransport()` will be under lock which violates
the assumption made in https://github.com/grpc/grpc-java/issues/1408 that
> there is an implicit rule today that channel layer will not hold any lock while calling into transport
and had caused deadlock with `InProcessTransport`.
Also updated tests to
1. Make it clear that we want the first obtained transport to be the
real transport instead of the delayed transport, because
`InProcessTransport` has deadlock issues with delayed
transport (grpc/grpc-java/pull/1510)
2. Not rely on identity equity to check whether `TransportSet` has
switched to a different address. Instead, call `newStream()` and check
which real transport is actually called, which is more reliable.
3. Add timeout when checking for real transport creation and
starting. Tests in general should not check for implementation detail,
e.g., whether certain work is done synchronously.
Our tests are detecting cases where we are still getting
ClosedChannelException. We need to fix that because it is a useless
status, but until it is fixed we want a stable CI.
Fixes#1513 and NettyTransportTest.serverNotListening failures
Netty client shutdown would race with the negotiation handling and
circumvent AbstractBufferingHandler. Use a new command in order to
leave channel.close() available for abrupt killing of the connection
when connecting.
ping_afterTermination was previously racey that made it succeed. After
fixing the test, Netty would consistently fail to call callback. After
fixing Netty to fail the callback it was not using the right status
because when Netty's channel is closed none of our handlers are run.
This reliably fails the future with ClosedChannelException, which is
useless, so now we special-case that exception and fill in the reason
for shutdown.
To prevent accidentally reporting Status.OK, the transports no longer
use OK when calling transportShutdown. The OK status was already no
longer being consumed, since keying off whether transportReady was
called is more helpful.
This fixes#1330
This is lower-level than the existing AbstractTransportTest. It should
be used by all transports, but InProcessTransport is the only one as
part of this commit.
DelayedClientTransport.PendingStream will override cancel(), which has a
clearer semantic.
Also permitting all status codes except OK in ClientStream.cancel(),
instead of just 4 codes.
This PR finishes the refactoring started by #1395. Resolves#1342.
Major changes:
- All interfaces that return `ListenableFuture<T>` now return `T`
- Stop passing `null` for transports after shut down. Pass
`FailingClientTransport` instead. This simplifies the interface
semantics of interfaces that return a transport, as well as their
callers because they no longer need to check for `null`.
- Add two methods to `TransportManager`:
- `createInterimTransport()` takes the place of `BlankFutureProvider`
in `LoadBalancer` implementations.
- `createFailingTransport()` takes the place of errors in the `Future`
when `LoadBalancer` implementations propagate errors to the caller.
- `createInterimTransport()` creates a `DelayedClientTransport` tracked
by `ManagedChannelImpl`, which will not terminate until all
`DelayedClientTransport`s, in addition to the `TransportSet`s, are
terminated.
- Removed the transport argument from the ready and shutdown callbacks,
because if `LoadBalancer` was given a delayed transport earlier, it
will get the real transport's shutdown event, thus won't be able to
match the two through identity comparison, which beats the purpose of
passing the transport.
When running :grcp-interop-testing:test, Travis has been hanging on OS X
or flaking on Linux with:
> Process 'Gradle Test Executor 4' finished with non-zero exit value 137
Exit code 137 indicates SIGKILL (128 + 9) and is most likely caused by
the JVM being killed by the kernel's OOM killer.
The limit in .travis.yml is 2x what was necessary to do a parallel
build. The main test memory limit in build.gradle is well above 16m
which is necessary for the tests. Interop-testing is well above 64m
which is necessary for interop-testing, but we use 1.5g to help prevent
timeouts on Travis.
Protobuf and protobuf-nano each have one tests that decodes a proto >64M
in size, which prevents them from running with less than 512m and 768m,
respectively.
Codecov.io provides patch-based code coverage, so you can easily know
how many of the added lines are covered. It also has a more useful UI.
Unfortunately, the percentage it reports does not include partially-
covered lines--those with uncovered conditions. Thus the reported
percentage is about 6% lower than the coverage we've been looking at
previously. Because of this alone, I don't expect to remove coveralls
support soon.
Use the bash script instead of python module since pip isn't available
by default on Travis OS X.
jacocoTestReport uses mustRunAfter (contrary to the docs; see
https://issues.gradle.org/browse/GRADLE-2960) to make sure it runs after
all tests, only if testing is taking place. We would like
:grpc-all:jacocoTestReport to behave the same way. Without it, we would
need two separate invocations of gradle (adding ~1m to Travis run) in
order to prevent getting random results depending on what tests just so
happened to have been run.