This aligns with shutdownNow(), which already accepts a status.
The status will be propagated to the application when RPCs fail because
of transport shutdown, which provides useful information for debugging.
This commit aligns the naming of the Bazel Maven jars with the names
used by Bazel's migration-tooling project:
https://github.com/bazelbuild/migration-tooling
Unfortunately, we can't fix @com_google_protobuf_java because it's
required by Bazel itself.
Fixes #3328
A null class loader means to use the bootstrap class loader, which would
normally make sense. However, Robolectric tests run on OpenJDK and
provide the Android environment as part of the application, so "Android"
won't be present in the bootstrap class loader.
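A hedged sketch of the kind of check this affects; the class name and helper below are illustrative rather than the actual gRPC code:
```
// Probing for an Android class with an explicit class loader instead of the
// bootstrap loader (null), so environments like Robolectric, which ship the
// Android classes in the application class loader, still match.
static boolean isAndroid(ClassLoader cl) {
  try {
    Class.forName("android.app.Application", /* initialize= */ false, cl);
    return true;
  } catch (ClassNotFoundException e) {
    return false;
  }
}
```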
This is a big, but mostly mechanical change. The newly added Test*StreamTracer classes are designed to be extended, which is why they are non-final and have protected fields. A few notable things:
1. verifyNoMoreInteractions is gone. The StreamTracer API doesn't make this guarantee. I have recovered this behavior by failing duplicate calls, which resulted in a few bugs in the test code being fixed.
2. StreamTracers cannot be mocked anymore. Tracers need to be thread safe, which mocks simply are not. This leads to a HUGE number of reports when trying to find real races in gRPC.
3. If these classes are useful, we can promote them out of internal. I just put them here out of convenience.
If onTransportActive ran while SendPing was already scheduled, we would
schedule another SendPing. That seems fine, but the server might observe
us sending pings too quickly and send us a GOAWAY.
Fixes #3274.
As discussed in #1914, we need CallCredentials and MoreCallCredentials
to be stable, but there's less of a strong need for the contents of
CallCredentials to be stable. We're willing to commit to the name,
without needing to commit to the plumbing.
Coupled with the similar change on the server side, this removes the need for a
thread when using Netty. For InProcess and OkHttp, it would allow us to let the
user provide the scheduler for tests or application-wide thread sharing.
This only implements how subchannel state changes are hooked up to the channel state APIs. Debates over state semantics and transition rules for load-balanced channels, such as #2873, are not addressed. Any future changes in state semantics and transition rules can be adapted easily.
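For context, a rough sketch (not part of this change) of the channel state API that the subchannel states feed into; getState() and notifyWhenStateChanged() are ManagedChannel's connectivity-state methods, while the watcher class is purely illustrative:
```
import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;

// Illustrative watcher: observes channel-level state transitions.
final class StateWatcher {
  static void watch(final ManagedChannel channel) {
    final ConnectivityState current = channel.getState(false /* don't request a connection */);
    channel.notifyWhenStateChanged(current, new Runnable() {
      @Override
      public void run() {
        // Notifications are one-shot, so re-register to keep watching.
        System.out.println("Channel left state " + current);
        watch(channel);
      }
    });
  }
}
```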
For Netty, this reduces the number of threads necessary for servers (although
until the channel is converted, the actual number of threads isn't impacted) and
naturally reduces contention and timeout latency.
For InProcess, this gets us closer to allowing applications to provide all
executors, which is especially useful during tests.
Class.forName(String) is understood by ProGuard, which removes the need for
manual ProGuard configuration and allows ProGuard to rename the provider
classes. Previously the provider classes could not be renamed.
Fixes #2633
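A hedged sketch (not the actual gRPC code) of the pattern this refers to: a literal class name passed to Class.forName(String) is something ProGuard can trace and rename, unlike a name assembled at runtime. The helper and its error handling are illustrative; OkHttpChannelProvider is just one example of a provider class.
```
import io.grpc.ManagedChannelProvider;

// Illustrative loader; the literal class name is what ProGuard can understand.
final class ProviderLoader {
  static ManagedChannelProvider tryLoadOkHttpProvider() {
    try {
      Class<?> clazz = Class.forName("io.grpc.okhttp.OkHttpChannelProvider");
      return clazz.asSubclass(ManagedChannelProvider.class)
          .getConstructor()
          .newInstance();
    } catch (ClassNotFoundException e) {
      return null; // provider not on the classpath
    } catch (Exception e) {
      throw new RuntimeException("Unable to construct provider", e);
    }
  }
}
```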
Previously anyone could create metadata keys with a leading ":", since there was no way to prevent it. This change effectively restricts that to internal code.
* refactor SubchannelImpl -> AbstractSubchannel
The old `SubchannelImpl` actually doesn't implement anything, and the old `SubchannelImplImpl` has a weird name.
currentThread().getContextClassLoader() was originally used to match
ServiceLoader's default. ServiceLoader is mostly used with SPIs that
exist in the bootclasspath, and so the class's ClassLoader would not
work. But in gRPC we have a library in the application's ClassLoader.
The context ClassLoader has been causing problems in any case where more
than one application ClassLoader exists, like in Servlet containers,
Android, and plugins. To my knowledge, in every case where the context
class loader is different from the provider's class loader, the context
class loader has caused breakage, not success. In addition, since we use
static fields for storing the provider results, using anything but the
provider's class loader is asking for results that vary simply depending
on which thread called gRPC first.
Fixes #2375
It is misnamed if it is @ExperimentalApi. Even being @Internal is
strange, since nothing in io.grpc actually uses it and so it could
actually be in io.grpc.internal.
I'm not trying to fix the API, but just make it slightly more
consistent.
Previously ClientCallImpl's stream listener would call
stream.setDecompressor(), but this has always been a bit strange as the
only case where the call listener calls the stream and forms a bit of a
loop. It also turned out to be racy in the presence of
DelayedClientStream since DelayedClientStream does not guarantee that
the listener has processed before returning.
Now we let the stream handle decompressor selection itself. Compressor
selection on client and server and decompressor selection on the server
remain unchanged. Nothing prevents them from being changed; it just
isn't necessary for fixing the severe compressionTest flake.
Fixes #2865. Fixes #2157
Bazel third-party dependencies are specified in repositories.bzl, which
gives consumers the ability to opt out of any dependencies they use
directly in their own project.
Fixes #2756
This adds server-wide interceptors that apply to all call handlers.
Because a ServerCallHandler is acquired per request, and can be dynamically
provided by the fallback registry, the interceptors have to be installed
on a per-request basis. This adds a few object allocations per request,
which is acceptable.
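As a hedged sketch of what this enables from a user's point of view (the interceptor and the service name below are illustrative; the builder method for installing server-wide interceptors is assumed to be intercept(...)):
```
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;

// Illustrative server-wide interceptor: logs every call regardless of which
// ServerCallHandler ends up serving it.
final class CallLoggingInterceptor implements ServerInterceptor {
  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    System.out.println("Call to " + call.getMethodDescriptor().getFullMethodName());
    return next.startCall(call, headers);
  }
}

// Hypothetical wiring (service class and port are placeholders):
// Server server = ServerBuilder.forPort(50051)
//     .addService(new MyServiceImpl())
//     .intercept(new CallLoggingInterceptor())
//     .build();
```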
Today JumpToApplicationThreadServerStreamListener leaks server state by transmitting details about uncaught StatusRuntimeException throwables to the client. This is a security problem.
This PR ensures that uncaught exceptions always close the ServerCall without leaking any state information. Users running in a trusted environment who want to transmit error details can install the TransmitStatusRuntimeExceptionInterceptor.
Fixes #2189
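A hedged usage sketch of opting back in; the port and service below are placeholders, and the interceptor class is the one named above:
```
import io.grpc.Server;
import io.grpc.ServerBuilder;
import io.grpc.util.TransmitStatusRuntimeExceptionInterceptor;

// Placeholder service and port; only the interceptor installation is the point here.
Server server = ServerBuilder.forPort(50051)
    .addService(new MyServiceImpl())
    .intercept(TransmitStatusRuntimeExceptionInterceptor.instance())
    .build();
```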
Interning strings isn't needed much anymore since keys are now
compared by byte value. Also, intern is slow (about 150ns per call),
which affects code that creates keys often. Lastly, older
versions of Java don't garbage collect interned strings, which
lowers the application's stability.
The current implementation has a bug where certain methods are not forwarded to the delegate.
This is essentially the same as e4f1f39 which was merged to the v1.4.x branch. This PR uses the new license header.
Fixes #3061
The current check in ServerCallImpl is theoretically unsafe (#3059). Move that check into the stub, and expand the unit tests to cover other interesting edge cases on the server side:
- client sends one, but zero requests received at onHalfClose
- client sends one, but > 1 requests received at onHalfClose
- server sends one, but zero responses sent at onComplete
- server sends one, but > 1 responses sent via onNext
Fixes #2243. Fixes #3059
Add some example tests for easier fields in AbstractManagedChannelImplBuilder.
Many fields are no longer Nullable, in order to move logic from construction to
mutation, which eases testing and simplifies cross-class interactions.
The nameResolverFactory comment starting "Avoid loading the provider unless
necessary" was outdated and has not been a concern since #2071 which swapped to
a hard-coded list on Android.
We don't want people to be desensitized to the Internal annotation.
NameResolverProvider should have probably been ExperimentalApi from the
start, but it was copied from ManagedChannelProvider.
Theoretically, ManagedChannelProvider could be marked ExperimentalApi as
well, but there are no known use cases for doing so since making an
alternative ManagedChannel implementation is quite difficult and we
aren't aware of anybody interested in doing so. That isn't the case for
NameResolverProvider; NameResolverProvider is core to 'targets' being
useful.
Calling CancellableContext#close() multiple times does not cause problems, but is unnecessary here. The return value of getListener() is a ServerStreamListenerImpl, which already calls CancellableContext#cancel() inside of the finally block of ServerStreamListener#closed().
When a cancellation happens, the ServerCall and Context get notified. Rather than serializing on the normal work queue (which may be doing user computation), we should execute the notification immediately, thereby allowing the user computation to see the cancellation.
This allows avoiding re-creating connections unnecessarily. For
NameResolvers that support notifications this avoids stampeding herds
when possible, since all clients may receive the notification at
approximately the same time.
AbstractManagedChannelImplBuilder accepts an overrideAuthority parameter, but this value is not hooked up to the name resolver object. Ultimately, Channel.authority consults with the NameResolver, so the overrideAuthority should be hooked into the NameResolverFactory, while all other functionality should be preserved.
Also, add unit tests for all the variants of OkHttpChannelBuilder and NettyChannelBuilder constructors, namely to test the slightly different NettyChannelBuilder(SocketAddress) code path.
Fixes #2682
Static mutable flags are evil. Turned them into options on the channel
builder. Also restored local stats recording by default, because
these flags were added out of concern for wire compatibility, not for
the local feature.
Main implementation is in CensusTracingModule.
Also a few fix-ups in the stats implementation CensusStatsModule:
- Change header key name from grpc-census-bin to grpc-tags-bin
- Server does not fail on header parse errors. Uses the default instead.
Protect Census-based stats and tracing with static flags: `GrpcUtil.enableCensusStats` and `GrpcUtil.enableCensusTracing`. They keep those features disabled by default until they, especially their wire formats, are stabilized.
We aren't upgrading yet, because we don't want to begin using the new
Mockito APIs. But all the tests now pass with the newer version. There
are a lot of warnings that can't be fixed until we bump the mockito
version.
We previously limited at 1 minute because there was no server-side
enforcement. Now that there is server-side enforcement, it is safer to
allow clients to choose a lower value, since the service owner still has
to permit it.
Per spec this metric should be calculated on the server and sent back to
the client, for which the mechanism is not currently defined. As it's
not a required metric, we remove the incorrect implementation for now.
Internal ref: b/37208451
Background
==========
LoadBalancer needs to track RPC measurements and status for
load-reporting. We need to introduce a "Tracer" API for that.
Since such an API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
is recorded, we will migrate the Census-based stats reporting under the
new Tracer API.
Alternatives
============
We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording that information along with the currently
recorded stats to Census. The LB-related information, such as the LB_ID,
the reason for dropping requests, etc., would be added to the Census
StatsContext as tags.
Since tags are held by the StatsContext before eventually being recorded
by providing the measurements, and StatsContext is immutable, this would
require a way for the LoadBalancer to override the StatsContext, which
means the LoadBalancer API would have a direct reference to the Census
StatsContext. This is undesirable because the Census API is not stable yet.
Part of the LB-related information is whether the client has received
the initial headers from the server. While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.
Bottom line: trying to use solely the Census StatsContext API to record
LB load information would require extra data-plumbing channels between
the ClientInterceptor, the LoadBalancer and the gRPC core, as well as
exposing the Census API on the gRPC API. Even with those extensive
changes, we have yet to find a working solution. Therefore, we abandoned
this idea and propose this PR.
Summary of changes
==================
API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.
On the client side, CallOptions now takes a list of
ClientStreamTracer.Factory. Upon creating a ClientStream, each of the
factories creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install their own tracer factories by overriding
the CallOptions.
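For illustration, a hedged sketch (not from this change) of an interceptor installing a tracer factory through CallOptions; the interceptor class is made up, while CallOptions.withStreamTracerFactory is the hook from the current public API:
```
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ClientStreamTracer;
import io.grpc.MethodDescriptor;

// Illustrative interceptor: every stream created for the intercepted call will
// report to tracers produced by the supplied factory.
final class StreamTracerInterceptor implements ClientInterceptor {
  private final ClientStreamTracer.Factory tracerFactory;

  StreamTracerInterceptor(ClientStreamTracer.Factory tracerFactory) {
    this.tracerFactory = tracerFactory;
  }

  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    return next.newCall(method, callOptions.withStreamTracerFactory(tracerFactory));
  }
}
```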
Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor. By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call. This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.
On the server side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.
The Tracer API supports propagation of stats/tracing information through
the Context and metadata. Both client-side and server-side tracer
factories have access to the headers object. The client-side tracer
relies on the interceptor to read the Context, while the server-side
tracer has a filterContext() method that can override the Context.
Implementation details
----------------------
Only real streams report stats. Pseudo streams such as delayed stream
and failing stream don't report. InProcess transport streams currently
don't report stats.
"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream. On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.
The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.
Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.
Bug fixed
----------------
The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong: closed() receiving
OK doesn't necessarily mean the RPC ended with OK. Instead it means the
server has successfully sent the final status, which may be non-OK, to
the client.
Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK). Whichever happens
first determines the reported status.
TODOs
=====
A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
To be in line with the `NettyServerBuilder` APIs:
- Deprecated `enableKeepAlive(boolean enable)` and
  `enableKeepAlive(boolean enable, long keepAliveDelay, TimeUnit delayUnit, long keepAliveTimeout, TimeUnit timeoutUnit)`,
  which never worked in v1.2
- Added `keepAliveTime(long keepAliveTime, TimeUnit timeUnit)` and
  `keepAliveTimeout(long keepAliveTimeout, TimeUnit timeUnit)`
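A hedged usage sketch of the newly added methods, assuming the builder in question is NettyChannelBuilder; the host, port, and values are illustrative:
```
import java.util.concurrent.TimeUnit;
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

// keepAliveTime: idle time before a keepalive ping is sent.
// keepAliveTimeout: how long to wait for the ping ack before the connection is considered dead.
ManagedChannel channel = NettyChannelBuilder.forAddress("localhost", 8443)
    .keepAliveTime(1, TimeUnit.MINUTES)
    .keepAliveTimeout(20, TimeUnit.SECONDS)
    .build();
```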
executor.schedule() will "eat" any exceptions thrown by the Runnables,
because the Future is expected to be used to see them. However, we never
call get() on the Future, so we need to log the exceptions ourselves,
like we do elsewhere in this case.
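A hedged sketch of the kind of wrapper this implies; the class name and the logging choice are illustrative, not necessarily gRPC's actual helper:
```
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative wrapper: surfaces exceptions from scheduled tasks even though
// nobody ever calls get() on the returned Future.
final class ExceptionLoggingRunnable implements Runnable {
  private static final Logger log =
      Logger.getLogger(ExceptionLoggingRunnable.class.getName());
  private final Runnable task;

  ExceptionLoggingRunnable(Runnable task) {
    this.task = task;
  }

  @Override
  public void run() {
    try {
      task.run();
    } catch (Throwable t) {
      log.log(Level.SEVERE, "Exception from scheduled task", t);
      throw t;  // rethrow is fine: run() can only throw unchecked throwables
    }
  }
}

// usage: executor.schedule(new ExceptionLoggingRunnable(task), delay, unit);
```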
We got a NullPointerException from ClientCallImpl#startDeadlineTimer
when a new Call is created after a Netty channel is terminated. Here
is a stacktrace:
INTERNAL: java.lang.NullPointerException
at io.grpc.internal.ClientCallImpl.startDeadlineTimer(ClientCallImpl.java:320)
at io.grpc.internal.ClientCallImpl.start(ClientCallImpl.java:253)
The following code snippet reproduces the bug:
```
ManagedChannel channel = NettyChannelBuilder.forAddress(host, port)
    .usePlaintext(true)
    .build();
channel.shutdown();
Thread.sleep(1000);
GreeterGrpc.GreeterBlockingStub stub =
    GreeterGrpc.newBlockingStub(channel)
        .withDeadlineAfter(10, TimeUnit.SECONDS);
stub.sayHello(HelloRequest.newBuilder().setName("world").build());
```
The issue was that ClientCallImpl is created from RealChannel#newCall
*after* ManagedChannelImpl#maybeTerminateChannel is called and
scheduledExecutor is set to null. In such a scenario,
deadlineCancellationExecutor is set to null.
I think there are several ways to fix this, but one way would be to
just avoid calling startDeadlineTimer() when
deadlineCancellationExecutor is null. DelayedClientTransport will
create a FailingClientStream with Status.UNAVAILABLE and we will get
```
Exception in thread "main" io.grpc.StatusRuntimeException:
UNAVAILABLE: Channel has shutdown (reported by delayed transport)
```
This removes a needless warning, and isn't much slower. This also
includes a benchmark for StatsTraceContext to measure the creation
overhead, which adds about 40ns per RPC. Optimization will come
after structural changes are made to break the dependency on
Census.
Resolves #2716
- Add attributes to EquivalentAddressGroup
- Deprecate ResolvedServerInfoGroup in favor of EquivalentAddressGroup
- Deprecate ResolvedServerInfo, because attributes for a single address
  within an address group have not been found to be useful.
- The changes to the NameResolver and LoadBalancer interfaces are
  backward-compatible in the next release, so implementors can switch to
  the new API smoothly.
As a related change, redefine the semantics of DnsNameResolver and
RoundRobinLoadBalancer:
- Before: DnsNameResolver returns all addresses in one address group.
  RoundRobinLoadBalancer ignores the grouping of addresses and
  round-robins over every single address. This doesn't work well with the
  one-server-multiple-addresses setup, e.g., when both IPv4 and IPv6
  addresses are returned for a single server, even if they are put in the
  same address group by the NameResolver.
- After: DnsNameResolver returns every address in its own
  EAG. RoundRobinLoadBalancer takes an EAG as a whole, and only
  round-robins over the list of EAGs. The new behavior is a better
  interpretation of the EAGs, and properly allows the case where one
  server has more than one address (e.g., IPv4 and IPv6).
This change will affect users that use a custom LoadBalancer with the
stock DnsNameResolver, and those who use a custom NameResolver with the
stock RoundRobinLoadBalancer.
Users who use both the stock DnsNameResolver and RoundRobinLoadBalancer
or PickFirstBalancer will see no behavioral change, because they will
still round-robin on individual addresses from DNS, or do pick-first on
all addresses from DNS (PickFirstBalancer flattens all addresses).
The result is a simpler API and a reduction of boilerplate.
`keepAliveManager#onTransportShutdown` should not be called in `transport.shutdown()` because it is possible that there are still open RPC streams, which may be inactive, so keepalive is still needed.
Preparing to support server-side keepalive.
For convenience on the server side, no longer use the Ping `onSuccess()` callback to cancel shutdownFuture; instead, regard `onDataReceived()` as the ping ack and cancel shutdownFuture there.
The balancer service attaches a token string for each Server entry it
sends to the client. The client has to set the token to the "lb-token"
header when assigning that Server entry to an RPC.
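For illustration, a hedged sketch of attaching such a token to outgoing headers; the key name comes from the text above, while the surrounding code is not the actual grpclb implementation:
```
import io.grpc.Metadata;

// Illustrative only: the "lb-token" header described above, as an ASCII metadata key.
Metadata.Key<String> LB_TOKEN_KEY =
    Metadata.Key.of("lb-token", Metadata.ASCII_STRING_MARSHALLER);

Metadata headers = new Metadata();
String tokenForPickedServer = "...";  // placeholder: token received from the balancer
headers.put(LB_TOKEN_KEY, tokenForPickedServer);
```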
For convenience of testing, also implemented hashCode() and equals() for
PickResult.
The build (jdk version jdk1.8.0_91) produces some noise
```
/usr/local/google/home/zdapeng/git/grpc-java/core/src/main/java/io/grpc/LoadBalancer.java:196: warning - Tag @link: can't find pickSubchannel(PickSubchannelArgs) in io.grpc.LoadBalancer.SubchannelPicker
/usr/local/google/home/zdapeng/git/grpc-java/core/src/main/java/io/grpc/LoadBalancer.java:388: warning - Tag @link: can't find pickSubchannel(PickSubchannelArgs) in io.grpc.LoadBalancer.SubchannelPicker
/usr/local/google/home/zdapeng/git/grpc-java/core/src/main/java/io/grpc/LoadBalancer.java:178: warning - Tag @link: can't find pickSubchannel(PickSubchannelArgs) in io.grpc.LoadBalancer.SubchannelPicker
3 warnings
:grpc-core:javadocJar
```
Nothing seems wrong with the javadoc, and it could be a javac bug, but here's a workaround.
Fix JavaStyle and ErrorProne warnings found in the internal weekly import:
- Calls to ExpectedException#expect should always be followed by exactly one statement.
- Do not mock 'java.util.concurrent.Future'
handleResolvedAddresses constructs subchannels for each address in
an EquivalentAddressGroup, sharing a reference to an immutable
Attributes. However, as noted, this Attributes instance contains
a mutable AtomicReference, eventually causing subchannel state
changes to be improperly reflected in *all* subchannels of an
EquivalentAddressGroup.
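A hedged illustration of the failure mode; the key, factory method, and setup below are illustrative and written against the current public Attributes API rather than this exact change:
```
import java.util.concurrent.atomic.AtomicReference;
import io.grpc.Attributes;
import io.grpc.ConnectivityState;
import io.grpc.ConnectivityStateInfo;

// An immutable Attributes object can still hold a mutable value.
Attributes.Key<AtomicReference<ConnectivityStateInfo>> STATE_INFO =
    Attributes.Key.create("state-info");

Attributes shared = Attributes.newBuilder()
    .set(STATE_INFO, new AtomicReference<>(
        ConnectivityStateInfo.forNonError(ConnectivityState.IDLE)))
    .build();
// If every subchannel of an EquivalentAddressGroup is created with `shared`,
// they all see the same AtomicReference, so one subchannel's state update shows
// up in all of them. The fix is a fresh Attributes (and AtomicReference) per subchannel.
```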
Guava 20 introduced some overloading optimizations for Preconditions
that require using Guava 20+ at runtime. Unfortunately, Guava 20 removes
some things that are causing incompatibilities with other libraries, like
Cassandra. While the incompatibility did trigger some of those libraries
to improve compatibility with newer Guavas, we'd like to give the
community more time to work through it. See #2688
At this commit, we appear to be compatible with Guava 18+. It's not
clear if we want to actually "support" 18, but it did compile. Guava 17
doesn't have at least MoreObjects, directExecutor, and firstNotNull.
Guava 21 compiles without warnings, so it should be compatible with
Guava 22 when it is released.
One test method will fail with the upcoming Guava 22, but this won't
impact applications. I made MoreThrowables to avoid using any
known-deprecated Guava methods in our JARs, to reduce pain for those
stuck with old versions of gRPC in the future (July 2018).
In the stand-alone Android apps I removed unnecessary explicit deps
instead of syncing the version used.
ErrorProne provides static analysis for common issues, including
misuse of variables guarded by GuardedBy locks.
This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
Futures almost universally should be handled in some way when being
returned, either to receive the value or to cancel scheduled tasks to
prevent leaks.
Netty is a bit of a special case though, since it constantly returns
futures that you ignore (even adding a listener returns the "this"
future). So we want to suppress the warning for code using Netty instead
of trying to fix it. When we enable ErrorProne in the build, we should
start passing -Xep:FutureReturnValueIgnored:OFF in the compilerArgs.
The new plugin uses a newer version of animalsniffer, allows overriding
the animalsniffer version used, and has up-to-date handling. The
up-to-date handling cuts fully incremental parallel build times in half,
from 5.5s to 2.7s.
The previous plugin was supposed to be verifying tests. However, either
it wasn't verifying them or its verification was broken.
The LBv2 setter was added in #2658 as an abstract method, which breaks
anyone who implements ManagedChannelBuilder. Here we add a default
implementation that throws.
This is needed because in interceptor tests, the types often cannot
be changed. The void methods stay for users who are writing tests
where they actually don't care about types. The noop methods
require types to be specified; they are for users who don't care
about the implementation. These represent different levels of
commitment.
This eases the transition of code mocking MethodDescriptor, which
breaks in this release.
In some environments DNS is not available and is performed by the
CONNECT proxy. Nothing "special" should need to be done for these
environments, but the previous support took shortcuts which knowingly
would not support such environments.
This change should fix both OkHttp and Netty. Netty's
Bootstrap.connect() resolves the name immediately, whereas
ChannelPipeline.connect() waits until the address reaches the end of the
pipeline. Netty uses NetUtil.toSocketAddressString() to get the name of
the address, which uses InetSocketAddress.getHostString() when
available.
OkHttp is still using InetSocketAddress.getHostName() which may issue
reverse DNS lookups. However, if the reverse DNS lookup fails, it should
convert the IP to a textual string like getHostString(). So as long as
the reverse DNS maps to the same machine as the IP, there should only be
performance concerns, not correctness issues. Since the DnsNameResolver
is creating unresolved addresses, the reverse DNS lookups shouldn't
occur in the common case.
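A hedged illustration of the getHostString()/getHostName() distinction referenced above; the addresses are placeholders:
```
import java.net.InetSocketAddress;

// An unresolved address keeps the literal host and never triggers a DNS lookup.
InetSocketAddress unresolved = InetSocketAddress.createUnresolved("example.com", 443);
String host = unresolved.getHostString();   // "example.com", no lookup performed

// For an address created from an IP literal, getHostString() returns the textual
// IP, while getHostName() may issue a reverse DNS lookup.
InetSocketAddress byIp = new InetSocketAddress("192.0.2.1", 443);
String literal = byIp.getHostString();      // "192.0.2.1"
String maybeReverse = byIp.getHostName();   // may perform a reverse lookup
```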
Because "core"'s test source already depends on "testing", e.g.,
`core/src/test/java/io/grpc/internal/ServerCallImplTest.java` uses
`testing/src/main/java/io/grpc/internal/testing/StatsTestUtils.java`,
which forms a circular dependency.
This change moves the StatsContext setter accessors from "core"'s test
source to "testing".
Resolves #2651
This removes an abuse of scheduled executor in ManagedChannelImpl. The executor
was used to avoid deadlocking. Now we run the work on the same thread, but
delay it until locks have been released.
There is no need to fix ManagedChannelImpl2. Due to its different
threading model, it didn't need to abuse the scheduledExecutor.
Fixes #2444
These are only used in internal tests. In production,
StatsContextFactory is loaded by the "Instrumentation" library and must
be one per process, thus we won't allow setting it on a per-channel or
per-server basis.
* Remove else, making it easier to read
* The ready subchannels list will be at most the same size as the input list
* Make status private, add a package-private getter for tests
* Remove empty field, cache size, simplify logic
* Make list unmodifiable. Document index with GuardedBy (see #nextSubchannel())
Add `getAttributes()` to `ClientStream` and `ClientCall` to be able to share
client transport information, such as socket TOS, with higher level APIs, once
the RPC picks up an active transport that is ready to use.
Previously it did this at shutdown, which was wrong because the executor may
still be used before the server is terminated.
Resolves #2034
Uses ObjectPool to make this change testable. Cleans up test and makes
it mostly single-threaded, except for two deadlock tests that have to be
multi-threaded.