Guava 20 introduced some overloading optimizations for Preconditions
that require using Guava 20+ at runtime. Unfortunately, Guava 20 removes
some things that is causing incompatibilities with other libraries, like
Cassandra. While the incompatibility did trigger some of those libraries
to improve compatibility for newer Guavas, we'd like to give the
community more time to work through it. See #2688
At this commit, we appear to be compatible with Guava 18+. It's not
clear if we want to actually "support" 18, but it did compile. Guava 17
doesn't have at least MoreObjects, directExecutor, and firstNotNull.
Guava 21 compiles without warnings, so it should be compatible with
Guava 22 when it is released.
One test method will fail with the upcoming Guava 22, but this won't
impact applications. I made MoreThrowables to avoid using any
known-deprecated Guava methods in our JARs, to reduce pain for those
stuck with old versions of gRPC in the future (July 2018).
In the stand-alone Android apps I removed unnecessary explicit deps
instead of syncing the version used.
ErrorProne provides static analysis for common issues, including
misused variables GuardedBy locks.
This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
Futures almost universally should be handled in some way when being
returned, either to receive the value or to cancel scheduled tasks to
prevent leaks.
Netty is a bit of a special case though, since it constantly returns
futures that you ignore (even adding a listener returns the "this"
future). So we want to suppress the warning for code using Netty instead
of trying to fix it. When we enable ErrorProne in the build, we should
start passing -Xep:FutureReturnValueIgnored:OFF in the compilerArgs.
The new plugin uses a newer version of animalsniffer, allows overriding
the animalsniffer version used, and has up-to-date handling. The
up-to-date handling cuts fully incremental parallel build times in half,
from 5.5s to 2.7s.
The previous plugin was supposed to be verifying tests. However, either
it wasn't verifying them or its verification was broken.
The LBv2 setter was added in #2658 as an abstract method, which breaks
anyone who implements ManagedChannelBuilder. Here we add a default
implementation that throws.
This is needed because in interceptor tests, often the types cannot
be changed. The void methods stay for users who are writing tests
where they actually don't care about types. The noop methods
require types to be specified. This is for users who don't care
about the implementation. These represent different levels of
commitment.
This eases the transition of code Mocking MethodDescriptor, which
breaks in this release.
In some environments DNS is not available and is performed by the
CONNECT proxy. Nothing "special" should need to be done for these
environments, but the previous support took shortcuts which knowingly
would not support such environments.
This change should fix both OkHttp and Netty. Netty's
Bootstrap.connect() resolved the name immediately whereas using
ChannelPipeline.connect() waits until the address reaches the end of the
pipeline. Netty uses NetUtil.toSocketAddressString() to get the name of
the address, which uses InetSocketAddress.getHostString() when
available.
OkHttp is still using InetSocketAddress.getHostName() which may issue
reverse DNS lookups. However, if the reverse DNS lookup fails, it should
convert the IP to a textual string like getHostString(). So as long as
the reverse DNS maps to the same machine as the IP, there should only be
performance concerns, not correctness issues. Since the DnsNameResolver
is creating unresolved addresses, the reverse DNS lookups shouldn't
occur in the common case.
Because "core"'s test source already depends on "testing", e.g.,
`core/src/test/java/io/grpc/internal/ServerCallImplTest.java` uses
`testing/src/main/java/io/grpc/internal/testing/StatsTestUtils.java`,
which forms a circular dependency.
This change moves the StatsContext setter accessors from "core"'s test
source to "testing".
Resolves#2651
This removes an abuse of scheduled executor in ManagedChannelImpl. The executor
was used to avoid deadlocking. Now we run the work on the same thread, but
delay it until locks have been released.
There is no need to fix ManagedChannelImpl2. Due to its different
threading model it didn't have need to abuse the scheduledExecutor.
Fixes#2444
These are only used in internal tests. In production,
StatsContextFactory is loaded by the "Instrumentation" library and must
be one per process, thus we won't allow setting it on a per-channel or
per-server basis.
* Remove else, make it easier to read
* Ready subchanns list will be at max the same as the input list
* Make status private, add pkg-private getter for tests
* Remove empty field, cache size, simplify logic
Make list unmodifiable. Document index with GuardedBy (see #nextSubchannel())
add `getAttributes()` to `ClientStream` and `ClientCall` to be able to share clientTransport
information such as socket TOS with higher lever API's, once the RPC picks up an active transport that is ready to use.
Previously it does it at shutdown, which was wrong because executor may
still be used before the server is terminated.
Resolves#2034
Uses ObjectPool to make this change testable. Cleans up test and makes
it mostly single-threaded, except for two deadlock tests that have to be
multi-threaded.
If a LoadBalancer2 is passed in, the builder will create ManagedChannelImpl2 instead of ManagedChannelImpl. This allows us to test the LBv2 classes on a large scale.
1. Adapt to LoadBalancer2 interface. Channel holds on to a single
DelayedClientTransport2.
2. Lock-free: every channel state mutation, including Subchannel
mutations, calling into LoadBalancer, idleness and shutdown, is made
from channelExecutor.
3. Idleness grace period is no longer needed.
In upstream, Census is renamed to "Instrumentation". `com.google.census` is renamed to `com.google.instrumentation.stats`.
In gRPC, "census" in every name is replaced by "stats".
Besides API changes, this implementation is also up-to-date with the
latest design:
1. Delegate to round-robin and pick-first policies if requested by
the naming system.
2. OOB channels to LoadBalancer always use the LB authority provided by
the naming system.
3. Never send application RPCs to balancer addresses, even if the
address returns UNIMPLEMENTED error.
"Because of spurious wakeups, wait() must always be called in a loop".
Got rid of wait().
"If you return or throw from a finally, then values returned or thrown from the try-catch block will be ignored. Consider using try-with-resources instead."
Ignored the exception thrown from finally.
1. Use ChannelExecutor to run callbacks. Now callbacks are no longer
run under the delayed transport lock.
2. Use explicit picker version instead of relying on identity equality
to detect new pickers. Rationale: if reprocess() is called again with
the same picker, all pending streams will be reprocessed by this picker
again, thus there is no reason to leave out the racing new stream.
* Fork DelayedClientTransport into DelayedClientTransport2, and fix a race
on it. Consider the following sequence:
1. Channel is created. Picker is initially null
2. Channel has a new RPC. Because picker is null, the delayed transport
is picked, but newStream() is not called yet.
3. LoadBalancer updates a new picker to Channel. Channel runs
reprocess() but does nothing because no pending stream is there.
4. newStream() called on the delayed transport.
In previous implementation, newStream() on step 4 will not see the
picker. It will only use the next picker.
After this change, delayed transport would save the latest picker and
use it on newStream(), which is the desired behavior.
Also deleted all the code that will not be used after the LB refactor.
* Also fixed a bug: newStream() should always return a failing stream if it's shutdown. Previously it's not doing so if picker is not null.
OutputStreamAdapter is a private class, and is only ever called by two
places: ByteStreams.copy, which never calls the single byte method, and
DrainTo, which potentially can. There are two classes that implement
DrainTo, which is primarily ProtoInputStream. It calls
MessageLite.writeTo(OutputStream) or back again to ByteStreams.copy.
MessageLite.writeTo in turn wraps the OutputStream in a
CodedOutputStream.OutputStreamEncoder, which then never calls the single
byte version. Thus, all know implementations never call the single
byte override.
Additionally, it is well known that the single byte write is slow, and
is expected to be wrapped in a BufferedOutputStream if there are many
small writes.
- Remove unused variable terminated from TransportListener#transportTerminated
- Do not mention getLock in the javadoc of InUseStateAggregator2#handleNotInUse
FackeClock used PriorityQueue for storing tasks which is not
thread-safe, and caused flake
> io.grpc.internal.ClientCallImplTest > deadlineExceededBeforeCallStarted FAILED
> java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.PriorityQueue.removeAt(PriorityQueue.java:619)
> at java.util.PriorityQueue.remove(PriorityQueue.java:378)
> at io.grpc.internal.FakeClock$ScheduledTask.cancel(FakeClock.java:89)
> at io.grpc.internal.ClientCallImpl.removeContextListenerAndCancelDeadlineFuture(ClientCallImpl.java:296)
> at io.grpc.internal.ClientCallImpl.start(ClientCallImpl.java:250)
> at io.grpc.internal.ClientCallImplTest.deadlineExceededBeforeCallStarted(ClientCallImplTest.java:615)
Document the threading requirements.
Turn all metrics to volatile because callEnded() may not be synchronized
with other metric-updating methods in the case where the RPC is closed
because of transport error or cancellation from the other side.
Remove precondition checks thus allow metrics updating methods to be
called after callEnded() has been called, which may happen since the
application may not be aware when the RPC is closed by the transport.
This is the first step of a major refactor for the LoadBalancer-related part of Channel impl (#1600). It forks TransportSet into InternalSubchannel and makes changes on it.
What's changed:
- InternalSubchannel no longer has delayed transport, thus will not buffer
streams when a READY real transport is absent.
- When there isn't a READ real transport, obtainActiveTransport() will
return null.
- InternalSubchannel is no longer a ManagedChannel
- Overhauled Callback interface, with onStateChange() replacing the
adhoc transport event methods.
- Call out channelExecutor, which is a serializing executor that runs
the Callback.
The first commit is an unmodified copy of the files that are being forked. Please review the second commit which changes on the forked files.
NoopCensusContextFactory used to use a shared zero-sized ByteBuffer,
which has mutable states like positions and thus is not thread-safe.
Sharing it means it will be used by all calls thus may be used from
different threads, which makes TSAN unhappy in tests.
Resolves#2377
resolvesgrpc/grpc#8715
now that setListener is called prior to
`JumpToApplicationThreadServerStreamListener` being completely ready to
use. We should not call `AbstractStream2#onStreamAllocated()` inside
`setListener()` anymore, but call it after `ServerImpl#streamCreated()`
is completed.
Resolves#1936
Two bugs fixed:
- NPE in `ServerImpl#streamCreated()` when stream listener not set before
stream closed
- It is possible that `internalCancel()` is called during
`InProcessClientStream#start()` due to early server `onComplete()` or server `onError()`,
in this case no need to enlist `streams`, otherwise the channel can not be shutdown by `shutdown()`.
Proxies can trigger errors, and we'd like to categorize common
proxy-generating errors into gRPC status codes for sane application
handling.
Fixes#2264
We only want to use the HTTP code for errors, when the response is not
grpc. grpc status codes may be mapped to HTTP codes in the future, and
we don't want to break when that happens. We also don't want to ever
accidentally use Status.OK without receiving it from the server, even
for HTTP 200.
core: adds @Nullable Object getAttachedObject() to ServiceDescriptor
compiler: Plumbing necessary to access proto file descriptors via
the reflection service
Found this bug in some unit test in which client-streaming call is hanging because `ServerCallImpl#messageRead()` did not handle RuntimeException properly
Highlights
==========
StatsTraceContext
-----------------
The bridge between gRPC library and Census. It keeps track of the total
payload sizes and the elapsed time of a Call. The rest of the gRPC code
doesn't invoke Census directly.
Context propagation
-------------------
StatsTraceContext carries CensusContext (and the upcoming TraceContext)
and is attached to the gRPC Context.
1. StatsTraceContext is created by ManagedChannelImpl, by calling
createClientContext(), which inherits the current CensusContext if available.
2. ManagedChannelImpl passes StatsTraceContext to ClientCallImpl, then
to the stream, then to the framer and deframer explicitly.
3. ClientCallImpl propagates the CensusContext to the headers.
1. ServerImpl creates a StatsTraceContext by implementing a new callback
method StatsTraceContext methodDetermined(MethodDescriptor, Metadata) on
ServerTransportListener.
2. NettyServerHandler calls methodDetermined() before creating the
stream, and passes the StatsTraceContext to the stream.
3. When ServerImpl creates the gRPC Context for the new ServerCall, it
calls the new method statsTraceContext() on ServerStream and puts the
StatsTraceContext in the Context.
Metrics recording
-----------------
1. Client-side start time: when ClientCallImpl is created
2. Server-side start time: when methodDetermined() is called
3. Server-side end time: in ServerStreamListener.closed(), but before
calling onComplete() or onCancel() on ServerCall.Listener.
4. Client-side end time: in ClientStreamListener.closed(), but before
calling onClonse() on ClientCall.Listener
Message sizes are recorded in MessageFramer and MessageDeframer. Both
the uncompressed and wire (possibly compressed) payload sizes are
counted.
TODOs
=====
The CensusContext created from headers on the server side should be
attached to the gRPC Context for the call. It's not done at this moment
because Census lacks the proper API to do it. It only affects tracing
and resource accounting, but doesn't affect stats functionality
Channel state API doesn't allow a TRANSIENT_FAILURE->IDLE edge.
Change TransportSet to always transition to CONNECTING after
TRANSIENT_FAILURE.
This behavior, combined with that it never uses IDLE_TIMEOUT to
transition from READY to IDLE, effectivly makes TransportSet
Channel-state API-compliant under an infinite IDLE_TIMEOUT.
Also set the default IDLE_TIMEOUT to 30min.
Trying to fix issue #2188
- Try to keep avoiding the lock issue #2152 and also to avoid race condition #2188.
- Add `checkState` for `endBackoff()`. Could help hit and identify any potential issue related to #2188.
- Make sure `startBackoff()` and `endBackoff()` invoked in the right order.
- Not to schedule endBackoff if transportSet has been shutdown.
US_ASCII may have not been initialized when the Code enums are created,
causing an NPE and making Status class fail to load:
java.lang.ExceptionInInitializerError
at io.grpc.Status$Code.<init>(Status.java:222)
at io.grpc.Status$Code.<clinit>(Status.java:79)
at io.grpc.internal.GrpcUtilTest.http2ErrorStatus(GrpcUtilTest.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:49)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:230)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:273)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:240)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:65)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:238)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:55)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:231)
at org.junit.runners.ParentRunner.run(ParentRunner.java:316)
at com.google.testing.junit.runner.junit4.CancellableRequestFactory$CancellableRunner.run(CancellableRequestFactory.java:90)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
at com.google.testing.junit.runner.junit4.JUnit4Runner.run(JUnit4Runner.java:112)
at com.google.testing.junit.runner.GoogleTestRunner.runTestsInSuite(GoogleTestRunner.java:197)
at com.google.testing.junit.runner.GoogleTestRunner.runTestsInSuite(GoogleTestRunner.java:174)
at com.google.testing.junit.runner.GoogleTestRunner.main(GoogleTestRunner.java:133)
Caused by: java.lang.NullPointerException
at io.grpc.Status$Code.values(Status.java:75)
at io.grpc.Status.buildStatusList(Status.java:249)
at io.grpc.Status.<clinit>(Status.java:245)
Strangely this only fails GrpcUtilTest.http2ErrorStatus inside google.
Now switch to use Guava Charsets.
Commit 656e8ce (#2208) removed most usages, but missed one and one was
re-added. Since all usages are removed, it should be much easier to
notice regressions.
This change exposed a pre-existing bug where shutdownNow wasn't called
for decommissionedTransports. The bug is fixed and a test added in this
commit.
Fixes#2120
Our API allows pings to be send even after the transport has been shutdown. We currently
don't handle the case, where the Netty channel has been closed but the NettyClientHandler
has not yet been removed from the pipeline, correctly. That is, we need to query the shutdown
status whenever we receive a ClosedChannelException.
Also, some cleanup.
The DefaultHttp2Headers class is a general-purpose Http2Headers implementation
and provides much more functionality than we need in gRPC. In gRPC, when reading
headers off the wire, we only inspect a handful of them, before converting to
Metadata.
This commit introduces a Http2Headers implementation that aims for insertion
efficiency, a low memory footprint and fast conversion to Metadata.
- Header names and values are stored in plain byte[].
- Insertion is O(1), while lookup is now O(n).
- Binary header values are base64 decoded as they are inserted.
- The byte[][] returned by namesAndValues() can directly be used to construct
a new Metadata object.
- For HTTP/2 request headers, the pseudo headers are no longer carried over to
Metadata.
A microbenchmark aiming to replicate the usage of Http2Headers in NettyClientHandler
and NettyServerHandler shows decent throughput gains when compared to DefaultHttp2Headers.
Benchmark Mode Cnt Score Error Units
InboundHeadersBenchmark.defaultHeaders_clientHandler avgt 10 283.830 ± 4.063 ns/op
InboundHeadersBenchmark.defaultHeaders_serverHandler avgt 10 1179.975 ± 21.810 ns/op
InboundHeadersBenchmark.grpcHeaders_clientHandler avgt 10 190.108 ± 3.510 ns/op
InboundHeadersBenchmark.grpcHeaders_serverHandler avgt 10 561.426 ± 9.079 ns/op
Additionally, the memory footprint is reduced by more than 50%!
gRPC Request Headers: 864 bytes
Netty Request Headers: 1728 bytes
gRPC Response Headers: 216 bytes
Netty Response Headers: 528 bytes
Furthermore, this change does most of the gRPC groundwork necessary to be able
to cache higher ordered objects in HPACK's dynamic table, as discussed in [1].
[1] https://github.com/grpc/grpc-java/issues/2217
The cast required in protobuf makes me question how much I like
ReflectableMarshaller, but it seems to be pretty sound and the cast is
more an artifact of generics than the API.
Nano and Thrift were purposefully not updated, since getting just the
class requires making a new message instance. That seems a bit lame. It
probably is no burden to create an instance to get the class, and it may
not be too hard to improve the factory to provide class information, but
didn't want to bother at this point. Especially since nano users are
unlikely to need the introspection functionality.
Metadata.removeAll creates an iterator for looking through removed
values even if the call doens't use it. This change adds a similar
method which doesn't create garbage.
This change makes it easier in the future to alter the internals
of Metadata where it may be expensive to return removed values.
The Context API is not particularly gRPC-specific, and will be used by
Census as its context propagation mechanism.
Removed all dependencies to make it easy for other libraries to depend
on.
Called whenever a ServerTransport is ready and terminated. Has the
ability to modify transport attributes, which ServerCall.attributes()
are based on.
Related changes:
- Attribute keys for remote address and SSL session are now moved from
ServerCall to a neutral place io.grpc.Grpc, because they can also be
used from ServerTransportFilter, and probably will be used on the
client-side too. The old keys on ServerCall is marked deprecated and
are equivalent to the new keys.
- Added transportReady() to ServerTransportListener.
Resolves#2132
This reduces the number of methods gRPC brings in by ~450, which is
substantial. Each application will see different numbers though,
depending on their usage and their other dependencies.
A very rough (under) counting for number of methods included because of
gRPC in android-interop-test is 2746, and that is reduced to 2313 (-433)
by this change. That count includes grpc, guava, okhttp, okio, and nano.
The actual reduction of methods is 447, with the discrepency due to
reduction of methods in java.util and java.lang. Of the 433 removed
methods, 377 are from com.google.common.collect and 61 from
com.google.common.base. The removal costed an increase of 5 methods
(total 1671) within io.grpc itself.
Instead of `List<List<ResolvedServerInfo>>`, `onUpdate` now takes
`List<ResolvedServerInfoGroup>` as an argument and every `ResolvedServerInfoGroup`
object can have `Attributes` attached to it which means that we can provide
attributes on each level:
* root level via `onUpdate` argument (applies to all servers)
* group level via property of `ResolvedServerInfoGroup` (applies to all servers
in the group)
* host level via property of `ResolvedServerInfo` (applies to a single server)
CodedInputStream is risk averse in ways that hurt performance when
parsing large messages. gRPC knows how large the input size is as it
is being read from the wire, and only tries to parse it once the entire
message has been read in. The message is represented as chunks of
memory strung together in a CompositeReadableBuffer, and then wrapped
in a custom BufferInputStream.
When passed to Protobuf, CodedInputStream attempts to read data out
of this InputStream into CIS's internal 4K buffer. For messages that
are much larger, CIS copies from the input in chunks of 4K and saved in
an ArrayList. Once the entire message size is read in, it is re-copied
into one large byte array and passed back up. This only happens for
ByteStrings and ByteBuffers that are read out of CIS. (See
CIS.readRawBytesSlowPath for implementation).
gRPC doesn't need this overhead, since we already have the entire
message in memory, albeit in chunks. This change copies the composite
buffer into a single heap byte buffer, and passes this (via
UnsafeByteOperations) into CodedInputStream. This pays one copy to
build the heap buffer, but avoids the two copes in CIS. This also
ensures that the buffer is considered "immutable" from CIS's point of
view.
Because CIS does not have ByteString aliasing turned on, this large
buffer will not accidentally be kept in memory even if only tiny fields
from the proto are still referenced. Instead, reading ByteStrings out
of CIS will always copy. (This copy, and the problems it avoids, can
be turned off by calling CIS.enableAliasing.)
Benchmark results will come shortly, but initial testing shows
significant speedup in throughput tests. Profiling has shown that
copying memory was a large time consumer for messages of size 1MB.
After debugging #2153, it would have been nice to know what the exact
parameter was that was null. This change adds a name for each
checkNotNull (and tries to normalized on static imports in order to
shorten lines)
io.grpc should not be depending on anything from internal. Also, the
convenience method of Deadline is part of our public API and shouldn't
use LogExceptionRunnable because it would surprise our users.
Swapped to lower-case 'log' since the logger is not immutable.
Implementations of ManagedClientTransport.start() are restricted from
calling the passed listener until start() returns, in order to avoid
reentrency problems with locks. For most transports this isn't a
problem, because they need additional threads anyway. InProcess uses no
additional threads naturally so ends up needing a thread just to
notifyReady. Now transports can just return a Runnable that can be run
after locks are dropped.
This was originally intended to be a performance optimization, but the
thread also causes nondeterminism because RPCs are delayed until
notifyReady is called. So avoiding the thread reduces needless fakes
during tests.
`ClientTransport.newStream()` and
`CallCredentials.applyRequestMetadata()` is now called under the context
of the call. This can be used to pass any call-specific information to
`CallCredentials`.
To my knowledge, there has been just a single DeadlineTest flake since
the code was fixed to avoid issues with I/O due to class loading:
io.grpc.DeadlineTest > defaultTickerIsSystemTicker[0] FAILED
java.lang.AssertionError: <-21431071 ns from now> and <0 ns from now> should have been within <20000000ns> of each other
But we don't really need fine-grained verification during the test
though; if the code is not using nanoTime, then it is almost certainly
not going to have even a day of accuracy (except on a fresh VM). So
checking for a second of accuracy vs 20ms shouldn't really be an issue.
conscrypt at some point which would allow ALPN to function
Clarify the SSLContext.getDefault is not used when constructing the
default SSLSocketFactory.
Metadata has been passed to the application. The application may be
modifying Metadata concurrently, so we must not access Metadata after
that point.
Fixes#1947
Creates a KeepAliveManager which should be used by each transport. It does keepalive pings and shuts down the transport if does not receive an response in time.
It prevents the connection being shut down for long lived streams. It could also detect broken socket in certain platforms.
Resolves#1276
Idle mode is where the channel does not keep live connections, and does
not have running NameResolver and LoadBalancer.
TransportSet aggregates the in-use state of transports, including the
delayed transport and real transports. Channel aggregates the in-use
state of TransportSets and delayed tranports.
Channel starts in idle mode. It exits idle mode if one of the following
occurs:
1. A new Call requests for a transport.
2. The channel's in-use state turns to true.
3. Someone calls exitIdleMode().
Channel enters the idle mode if its in-use state has been false for the
configured timeout (disabled by default). It shuts down all
TransportSets, NameResolver and LoadBalancer. Interim transports and OOB
transports are LoadBalancer's responsibility.
There is a race that could cause annoyance if IDLE_TIMEOUT was too
small (e.g., 0). A TransportSet's delayed transport is holding streams,
which keeps its in-use state in true. When a real transport is ready,
all streams are transferred to the real transport, immediately after
which the delayed transport's in-use state turns to false, while the
real transport's in-use state may have not turned to true, because some
transport (e.g. netty) may have a brief delay between newStream() being
called and the stream being created internally. This could cause the
channel's aggregated in-use state be in false for a brief time, if which
is longer than IDLE_TIMEOUT, could make channel go to idle mode. Even
though the channel would go back to non-idle again, idle mode would
shutdown all transports and NameResolver and LoadBalancer which leads to
spurious error in the application.
We minimize the chance of such race by setting the minimum timeout to 1
second.
Related chanes:
- ManagedChannelImplTest now switched to use fake executors.
- Turn a few anonymous runnables into named classes. This is more useful for debugging.
Commit 5487ea8f54 unintentionally fixed a race condition in
shutdownNow, but the race was detected when being used inside Google. This could have
been avoided by the enforceable documentation of GaurdedBy annotations. These make it
clear what locks should be held, promote documentation for newly added server state,
and can be automatically checked by static analysis.
It was withCallCredentials on the stub to avoid confusion with channel
credentials (which don't exist in Java at this time, but do in other
languages), and having the method names be different doesn't add value.
first step to address issue #1469:
- leave and deprecate interfaces in codegen
- introduce `ServiceImplBase`,
- `AbstractService` is deprecated and extends `ServiceImplBase`
- static `bindService()` is deprecated
Resolves#1756
The thread-unsafe method `io.grpc.testing.TestUtils.pickUnusedPort` causes flakes (#1756) in windows. Need to avoid use of this method in test as in windows the tests are running in different jvms and concurrent calls of this method in multiple processes tend to return the same port number.
There are some usages of this method in benchmarks, so moved the method to `io.grpc.benchmarks.Utils` and the method will only be used in benchmarks and not in test.
onReady/isReady previously could disagree causing a sort of deadlock
where the application isn't sending because grpc said not to, but won't
be informed to send via onReady later.
This is a stack trace from inprocessTransportOutboundFlowControl. The
line numbers are from this commit but with the changes to DelayedStream
disabled:
at io.grpc.internal.DelayedStream.isReady(DelayedStream.java:306)
(That is isReady returning false because fallThrough == false)
at io.grpc.internal.ClientCallImpl.isReady(ClientCallImpl.java:382)
at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.isReady(ClientCalls.java:289)
at io.grpc.stub.ClientCallsTest$8$1.run(ClientCallsTest.java:403)
(And yet that was the onReady callback, and it won't be called again)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onReady(ClientCalls.java:377)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$4.runInContext(ClientCallImpl.java:481)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:52)
at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:65)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.onReady(ClientCallImpl.java:478)
at io.grpc.internal.DelayedStream$DelayedStreamListener.onReady(DelayedStream.java:366)
at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.request(InProcessTransport.java:284)
at io.grpc.internal.ServerCallImpl.request(ServerCallImpl.java:99)
at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.request(ServerCalls.java:345)
at io.grpc.stub.ClientCallsTest.inprocessTransportOutboundFlowControl(ClientCallsTest.java:432)
Fixes#1932
Over-specifying List prevents fewer things to be passed in and makes it
less efficient to use a Map later. We definitely don't want people
extending this class.
Not all users are triggering the loading of ManagedChannelProvider, so
avoid the unnecessary loading. Also, since class initialization is
involved, exception handling tends to get strange and hard to diagnose
issues.
This reverts commit 3df1446deb.
The commit was adding to the difficulty of integration for testing. By
itself it isn't bad, so this is a temporary revert until the many other
commits are absorbed and then it will be reapplied.
This does have a manual edit for ClientCallsTest.