There are two AbstractTransportTests. The newest one is the more aptly
named, so rename the older one to AbstractInteropTest to remove name
collision when speaking.
Fixes#1484
If TransportSet fails to connect a transport (i.e., transportShutdown()
called without transportReady()), TransportSet will automatically
schedule reconnection for the next address, unless it has reached the end
of the address list, in which case it will fail the delayed transport.
This will reduce stream errors caused by bad addresses appearing before
good addresses in the resolved address list.
Before this change, TransportSet would return the real transport on the
first call of obtainActiveTransport(). After this change, it will return
the delayed transport instead.
This requires that all nano protos have .nano in their package name. The
support is expected to be removed from protobuf soon, since using the
deprecated package name causes problems.
Our current lock ordering rules prohibit holding a lock when calling the
channel and stream. This change avoids the lock for both
DelayedClientTransport and DelayedStream. It is effectively a rewrite of
DelayedStream.
The fixes to ClientCallImpl were to ensure sane state in DelayedStream.
Fixes#1510
To ManagedChannelImpl, TransportSet and all client transport
implementations, so they can be correlated in the logs. Also added more
life-cycle logging in general.
Long-lived streams or lengthy RPCs can keep the transport open for
minutes after a GOAWAY is received. Previously, during this time any new
RPCs would fail with a message like:
> Cannot create stream 5 since this endpoint has received a GOAWAY frame
> with last stream id 3
All usages of goAwayStatus were replaced with lifecycleManager. Although
note that previously goAwayStatus() would never return null because it
would generate a Status if the current field was null.
getShutdownStatus() does not have this feature, so some code was
rearranged to guarantee the Status is non-null before retrieving it.
The listener handling was simplified by 1) avoiding the need for
thread-safety and 2) moving state keeping into a small class for easy
comprehensibility and simplified usage in tests.
NettyClientTransport.shutdown() no longer calls transportShutdown()
because it lies (because the message can be delayed for quite some time)
and because it was the only usage of lifecycleManager not on the event
loop.
Fixes#1359
Because `scheduleConnection()` is run under lock, if we ran
`createTransportRunnable` inside `scheduleConnection()`,
`savedDelayedTransport.setTransport()` will be under lock which violates
the assumption made in https://github.com/grpc/grpc-java/issues/1408 that
> there is an implicit rule today that channel layer will not hold any lock while calling into transport
and had caused deadlock with `InProcessTransport`.
Also updated tests to
1. Make it clear that we want the first obtained transport to be the
real transport instead of the delayed transport, because
`InProcessTransport` has deadlock issues with delayed
transport (grpc/grpc-java/pull/1510)
2. Not rely on identity equity to check whether `TransportSet` has
switched to a different address. Instead, call `newStream()` and check
which real transport is actually called, which is more reliable.
3. Add timeout when checking for real transport creation and
starting. Tests in general should not check for implementation detail,
e.g., whether certain work is done synchronously.
Our tests are detecting cases where we are still getting
ClosedChannelException. We need to fix that because it is a useless
status, but until it is fixed we want a stable CI.
Fixes#1513 and NettyTransportTest.serverNotListening failures
Netty client shutdown would race with the negotiation handling and
circumvent AbstractBufferingHandler. Use a new command in order to
leave channel.close() available for abrupt killing of the connection
when connecting.
ping_afterTermination was previously racey that made it succeed. After
fixing the test, Netty would consistently fail to call callback. After
fixing Netty to fail the callback it was not using the right status
because when Netty's channel is closed none of our handlers are run.
This reliably fails the future with ClosedChannelException, which is
useless, so now we special-case that exception and fill in the reason
for shutdown.
To prevent accidentally reporting Status.OK, the transports no longer
use OK when calling transportShutdown. The OK status was already no
longer being consumed, since keying off whether transportReady was
called is more helpful.
This fixes#1330
This is lower-level than the existing AbstractTransportTest. It should
be used by all transports, but InProcessTransport is the only one as
part of this commit.
DelayedClientTransport.PendingStream will override cancel(), which has a
clearer semantic.
Also permitting all status codes except OK in ClientStream.cancel(),
instead of just 4 codes.
This PR finishes the refactoring started by #1395. Resolves#1342.
Major changes:
- All interfaces that return `ListenableFuture<T>` now return `T`
- Stop passing `null` for transports after shut down. Pass
`FailingClientTransport` instead. This simplifies the interface
semantics of interfaces that return a transport, as well as their
callers because they no longer need to check for `null`.
- Add two methods to `TransportManager`:
- `createInterimTransport()` takes the place of `BlankFutureProvider`
in `LoadBalancer` implementations.
- `createFailingTransport()` takes the place of errors in the `Future`
when `LoadBalancer` implementations propagate errors to the caller.
- `createInterimTransport()` creates a `DelayedClientTransport` tracked
by `ManagedChannelImpl`, which will not terminate until all
`DelayedClientTransport`s, in addition to the `TransportSet`s, are
terminated.
- Removed the transport argument from the ready and shutdown callbacks,
because if `LoadBalancer` was given a delayed transport earlier, it
will get the real transport's shutdown event, thus won't be able to
match the two through identity comparison, which beats the purpose of
passing the transport.
When running :grcp-interop-testing:test, Travis has been hanging on OS X
or flaking on Linux with:
> Process 'Gradle Test Executor 4' finished with non-zero exit value 137
Exit code 137 indicates SIGKILL (128 + 9) and is most likely caused by
the JVM being killed by the kernel's OOM killer.
The limit in .travis.yml is 2x what was necessary to do a parallel
build. The main test memory limit in build.gradle is well above 16m
which is necessary for the tests. Interop-testing is well above 64m
which is necessary for interop-testing, but we use 1.5g to help prevent
timeouts on Travis.
Protobuf and protobuf-nano each have one tests that decodes a proto >64M
in size, which prevents them from running with less than 512m and 768m,
respectively.