Commit Graph

1904 Commits

Author SHA1 Message Date
Eric Anderson 02deb9c1a1 core: Remove unused mocks which broke @DoNotMock 2017-04-10 14:19:38 -07:00
ZHANG Dapeng 81da785f75 netty: add jitter to max connection age 2017-04-10 13:38:47 -07:00
ZHANG Dapeng 83a06cc1a5 netty: implement server max connection age 2017-04-10 11:32:28 -07:00
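
A minimal sketch of what adding jitter to a max connection age could look like, so that connections opened at the same moment are not all closed at the same moment. The class name, the method name, and the roughly +/-10% range are assumptions made for illustration; they are not taken from commits 81da785f75 or 83a06cc1a5.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class MaxConnectionAgeJitterSketch {
  /**
   * Returns the configured age scaled by a random factor in [0.9, 1.1).
   * The +/-10% range is an assumption made for this sketch.
   */
  public static long withJitter(long maxAgeNanos, Random random) {
    double factor = 0.9 + random.nextDouble() * 0.2;
    return (long) (maxAgeNanos * factor);
  }

  public static void main(String[] args) {
    long configured = TimeUnit.MINUTES.toNanos(30);
    long effective = withJitter(configured, new Random());
    System.out.println("configured=" + configured + "ns effective=" + effective + "ns");
  }
}
```

Passing the `Random` in as a parameter keeps the helper deterministic under test when a seeded instance is supplied.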
Kun Zhang 44cca5507d core: remove incorrect reporting of CLIENT_SERVER_ELAPSED_TIME. (#2891)
Per spec this metric should be calculated on the server and sent back to
the client, for which the mechanism is not currently defined. As it's
not a required metric, we remove the incorrect implementation for now.

Internal ref: b/37208451
2017-04-10 11:25:25 -07:00
Kun Zhang 770b7e0f81 doc: document that channel state is not implemented. (#2890) 2017-04-10 11:21:55 -07:00
Eric Anderson cfc6634650 netty: Pass boolean to builder instead of permit/deny specialized naming
While we can use permit/deny in this one case, it isn't generalizable to
other cases. In order to avoid always questioning how to deal with
boolean config options, just pass the boolean in all cases.

This mirrors what is being done with the client-side's
keepAliveWithoutCalls.

These methods were very recently added, so there is a low risk of
breakage.
2017-04-10 10:34:03 -07:00
Kun Zhang c4e615cd28 core: allow SubchannelPicker to return a StreamTracer factory. (#2882)
This allows LoadBalancers to trace the activities, including the final
status of the stream that is created as a result of the pick.
2017-04-10 10:31:45 -07:00
Kun Zhang 903197b2aa core: StreamTracer (#2863)
Background
==========

LoadBalancer needs to track RPC measurements and status for
load-reporting.  We need to introduce a "Tracer" API for that.

Since such an API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
is recorded, we will migrate the Census-based stats reporting under the
new Tracer API.

Alternatives
============

We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording that information along with the currently
recorded stats to Census. The LB-related information, such as LB_ID,
reason for dropping requests, etc., would be added to the Census
StatsContext as tags.

Since tags are held by StatsContext before eventually being recorded by
providing the measurements, and StatsContext is immutable, this would
require a way for LoadBalancer to override the StatsContext, which means
LoadBalancer API would have a direct reference to the Census StatsContext.
This is undesirable because Census API is not stable yet.

Part of the LB-related information is whether the client has received
the initial headers from the server.  While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.

Bottom line, trying to use solely the Census StatsContext API to record
LB load information would require extra data plumbing channel between
ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing
Census API on the gRPC API.  Even with those extensive changes, we are
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR.

Summary of changes
==================

API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.

On the client-side, CallOptions now takes a list of
ClientStreamTracer.Factory. Upon creating a ClientStream, each
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install their own tracer factories by overriding the
CallOptions.

Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor.  By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call.  This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.

On the server-side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.

The Tracer API supports propagation of stats/tracing information through
Context and metadata.  Both client-side and server-side tracer factories
have access to the headers object.  Client-side tracer relies on
interceptor to read the Context, while server-side tracer has
filterContext() method that can override the Context.

Implementation details
----------------------

Only real streams report stats.  Pseudo streams such as delayed stream,
failing stream don't report.  InProcess transport streams currently
don't report stats.

"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream.  On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.

The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.

Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.

Bug fixed
----------------

The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong, because closed()
receiving OK doesn't necessarily mean the RPC ended with OK.  Instead it
means the server has successfully sent the final status, which may be
non-OK, to the client.

Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK).  Whichever happens
first is the reported status.

TODOs
=====

A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
2017-04-07 11:03:24 -07:00
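
The "whichever happens first is the reported status" rule from commit 903197b2aa, combined with a stream reporting to all of its tracers, can be sketched as below. This is a simplified illustration using only the JDK: the `Tracer` interface, the `TracerContext` class, and the use of plain strings for statuses are hypothetical stand-ins, not the real StreamTracer or StatsTraceContext API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class StreamEndReportSketch {
  /** Hypothetical stand-in for a tracer callback; not the real StreamTracer API. */
  public interface Tracer {
    void streamClosed(String status);
  }

  /** Fans the end-of-stream event out to every tracer, but only once per stream. */
  public static final class TracerContext {
    private final List<Tracer> tracers;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public TracerContext(List<Tracer> tracers) {
      this.tracers = new ArrayList<>(tracers);
    }

    public void streamClosed(String status) {
      // Only the first caller wins; later reports for the same stream are ignored.
      if (closed.compareAndSet(false, true)) {
        for (Tracer tracer : tracers) {
          tracer.streamClosed(status);
        }
      }
    }
  }

  public static void main(String[] args) {
    List<Tracer> tracers = new ArrayList<>();
    tracers.add(status -> System.out.println("tracer saw " + status));
    TracerContext context = new TracerContext(tracers);
    context.streamClosed("CANCELLED"); // reported to the tracer
    context.streamClosed("OK");        // ignored, a status was already reported
  }
}
```

The compare-and-set guard lets two independent code paths both attempt the report without coordinating with each other.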
Eric Anderson 4236027713 netty: Add config for server keepalive enforcement
Now that there is a config, the new defaults are being enabled.
Previously there were no default limits. Now keepalives may not be more
frequent than every 5 minutes and only when there are outstanding RPCs.
2017-04-06 15:48:33 -07:00
Eric Anderson ebd2f2d2f7 android: Bump android build plugin version to 2.3.1
This cleans up some deprecation warnings from Gradle and cuts full build
time in half.
2017-04-06 15:36:47 -07:00
Eric Anderson 810b2d0b96 all: Update to gradle 3.4.1
Besides build speed improvements and VS 2015 support, it also improves
quote handling in gradlew.
2017-04-06 15:36:47 -07:00
Eric Anderson 3818087aa4 netty: Handle channel creation failure
Something "very bad" has happened, but without grpc propagating the
cause from the Future it is very difficult to figure out what.

Fixes #2296
2017-04-06 11:11:49 -07:00
Kun Zhang 123bb315e9 grpclb: skip picker updates that have no effect (#2876)
Each time helper.updatePicker() is called, the Channel will re-process
all pending streams with the new picker.  If the new picker is
equivalent to the old one, it's wasteful.

This is also needed to make our internal integration test easier,
because the load-balancer may send an address list that is identical to the
previous one, just to update the TTL.  Without this change, new picker
replaces the old picker even if they carry the same list, which
effectively resets the round-robin pointer.  This causes a little
imbalance between test backends, resulting in test failure.
2017-04-05 09:43:05 -07:00
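
The effect described in commit 123bb315e9 can be sketched with a toy round-robin picker: if an equivalent picker replaces the current one, the round-robin pointer restarts, so skipping the no-op update keeps the rotation going. All class and method names here (`RoundRobinPicker`, `Helper`, `maybeUpdatePicker`, `isEquivalentTo`) are hypothetical and only illustrate the idea; they are not the grpclb implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PickerUpdateSketch {
  /** Hypothetical round-robin picker over a fixed backend list. */
  public static final class RoundRobinPicker {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinPicker(List<String> backends) {
      this.backends = new ArrayList<>(backends);
    }

    public String pick() {
      return backends.get(Math.floorMod(next.getAndIncrement(), backends.size()));
    }

    public boolean isEquivalentTo(RoundRobinPicker other) {
      return backends.equals(other.backends);
    }
  }

  /** Hypothetical holder that ignores updates which would change nothing. */
  public static final class Helper {
    private RoundRobinPicker current;
    private int appliedUpdates;

    public void maybeUpdatePicker(RoundRobinPicker candidate) {
      if (current != null && current.isEquivalentTo(candidate)) {
        return; // same backend list: keep the old picker and its pointer
      }
      current = candidate;
      appliedUpdates++;
    }

    public String pick() {
      return current.pick();
    }

    public int appliedUpdates() {
      return appliedUpdates;
    }
  }
}
```

In this sketch, equivalence is defined purely by the backend list; a real picker would likely need to compare more state.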
ZHANG Dapeng 1c1864be73 netty: refactor NettyChannelBuilder keepalive API (#2874)
To be in line with `NettyServerBuilder` APIs:
- Deprecated `enableKeepAlive(boolean enable)` and
  `enableKeepAlive(boolean enable, long keepAliveDelay, TimeUnit delayUnit, long keepAliveTimeout, TimeUnit timeoutUnit)`,
  which never worked in v1.2

- Added `keepAliveTime(long keepAliveTime, TimeUnit timeUnit)` and
`keepAliveTimeout(long keepAliveTimeout, TimeUnit timeUnit)`
2017-04-04 18:19:41 -07:00
Eric Anderson 90788305a3 netty: Add server keepalive enforcement
Everything is currently permitted, but I've tested with other
configurations and all tests pass. I'll set the restrictive default at
the same time as adding a configuration API.
2017-04-04 16:47:42 -07:00
Eric Anderson f9eb545df0 netty: Fix client keepalive initialization (again)
d116cc9 fixed the NPE, but the initialization of the manager happened
_after_ newHandler() was called, so a null manager was passed to the
handler.

Fixes #2828
2017-03-31 17:21:33 -07:00
ZHANG Dapeng c4bbe66506 netty: expose server side keepalive API
expose server side keepalive API in NettyServerBuilder
2017-03-31 10:35:03 -07:00
Carl Mastrangelo 824e5df5cf benchmarks: use JMH 1.18 2017-03-30 14:52:22 -07:00
Eric Anderson 4096d4b668 core,netty: support GET verb in AbstractClientStream2 2017-03-30 14:18:14 -07:00
Eric Anderson d4c9d5f087 core: Wrap keepalive runnables with exception logging
executor.schedule() will "eat" any exceptions thrown by the Runnables,
because the Future is expected to be used to see them. However, we never
call get() on the Future, so we need to log the exceptions like we do
elsewhere in this case.
2017-03-30 14:10:53 -07:00
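
The problem described in commit d4c9d5f087 is general to `ScheduledExecutorService`: an exception thrown by a scheduled task is captured in the returned Future and is never seen if nobody calls `get()`. A minimal sketch of a logging wrapper follows; the class name `LoggingRunnable` is chosen for illustration and is not necessarily what the commit uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingRunnableSketch {
  /** Wraps a task so that anything it throws is logged before being rethrown. */
  public static final class LoggingRunnable implements Runnable {
    private static final Logger log = Logger.getLogger(LoggingRunnable.class.getName());
    private final Runnable task;

    public LoggingRunnable(Runnable task) {
      this.task = task;
    }

    @Override
    public void run() {
      try {
        task.run();
      } catch (RuntimeException | Error e) {
        log.log(Level.SEVERE, "Exception while executing runnable " + task, e);
        throw e; // still fail the Future for callers that do inspect it
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    // Without the wrapper this failure would be silently stored in the Future.
    executor.schedule(
        new LoggingRunnable(() -> {
          throw new IllegalStateException("keepalive task failed");
        }),
        10, TimeUnit.MILLISECONDS);
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.SECONDS);
  }
}
```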
Eric Anderson 0bcf921e20 core: Remove internal comment referencing compression frames
Compression frames existed in a very early iteration of the gRPC
protocol. It was killed long ago.
2017-03-30 14:10:29 -07:00
Eric Anderson 075b5ecddd services: Remove unused variables 2017-03-30 13:41:32 -07:00
Eric Anderson 0d498fbb95 core: Fix User-Agent Javadoc in ManagedChannelBuilder
The behavior of how the application's User-Agent is used changed in
2247ad2. But the Javadoc was not updated.
2017-03-30 13:34:26 -07:00
ZHANG Dapeng 8114b93113 netty: Server side keep alive
use KeepAliveManager in NettyServerHandler
2017-03-30 09:24:04 -07:00
Carl Mastrangelo a4d698f7c1 core: make SerializingExecutor lockless (#2858)
This is an alternative implementation of #2192
2017-03-29 17:57:42 -07:00
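
One common way to build a serializing executor without locks is a concurrent queue plus an atomic "running" flag. The sketch below shows that general technique under that assumption; it is not claimed to be the implementation from commit a4d698f7c1, and the class name is hypothetical.

```java
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class LocklessSerializingExecutorSketch implements Executor, Runnable {
  private static final Logger log =
      Logger.getLogger(LocklessSerializingExecutorSketch.class.getName());

  private final Executor delegate;
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean running = new AtomicBoolean();

  public LocklessSerializingExecutorSketch(Executor delegate) {
    this.delegate = Objects.requireNonNull(delegate);
  }

  @Override
  public void execute(Runnable task) {
    queue.add(Objects.requireNonNull(task));
    schedule();
  }

  private void schedule() {
    // Only one drain loop may be active at a time.
    if (running.compareAndSet(false, true)) {
      boolean submitted = false;
      try {
        delegate.execute(this);
        submitted = true;
      } finally {
        if (!submitted) {
          running.set(false); // the delegate rejected us; allow a later retry
        }
      }
    }
  }

  @Override
  public void run() {
    try {
      Runnable task;
      while ((task = queue.poll()) != null) {
        try {
          task.run();
        } catch (RuntimeException e) {
          log.log(Level.SEVERE, "Exception while executing runnable " + task, e);
        }
      }
    } finally {
      running.set(false);
    }
    // A task may have been queued after the last poll() but before the flag reset.
    if (!queue.isEmpty()) {
      schedule();
    }
  }
}
```

The re-check after resetting the flag closes the window in which a producer enqueues a task, fails the compare-and-set, and would otherwise leave that task stranded.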
Kenji Kaneda c131d2dd14 core: Do not call startDeadlineTimer when deadlineCancellationExecutor is null
We got a NullPointerException from ClientCallImpl#startDeadlineTimer
when a new Call is created after a Netty channel is terminated. Here
is a stacktrace:

INTERNAL: java.lang.NullPointerException
at io.grpc.internal.ClientCallImpl.startDeadlineTimer(ClientCallImpl.java:320)
at io.grpc.internal.ClientCallImpl.start(ClientCallImpl.java:253)

The following code snippet reproduces the bug:

```
ManagedChannel channel = NettyChannelBuilder.forAddress(host, port)
    .usePlaintext(true)
    .build();
channel.shutdown();

Thread.sleep(1000);

GreeterGrpc.GreeterBlockingStub stub =
    GreeterGrpc.newBlockingStub(channel)
        .withDeadlineAfter(10, TimeUnit.SECONDS);
stub.sayHello(HelloRequest.newBuilder().setName("world").build());
```

The issue was that ClientCallImpl is created from RealChannel#newCall
*after* ManagedChannelImpl#maybeTerminateChannel is called and
scheduledExecutor is set to null. In such a scenario,
deadlineCancellationExecutor is set to null.

I think there are several ways to fix this, but one way would be to
just avoid calling startDeadlineTimer() when
deadlineCancellationExecutor is null. DelayedClientTransport will
create a FailingClientStream with Status.UNAVAILABLE and we will get

```
Exception in thread "main" io.grpc.StatusRuntimeException:
UNAVAILABLE: Channel has shutdown (reported by delayed transport)
```
2017-03-28 16:31:29 -07:00
Carl Mastrangelo 6765596bb9 core: add @since annotations to MethodDescriptor 2017-03-24 16:05:57 -07:00
Carl Mastrangelo aaf9067e0c all: fix gradle nag for deprecated leftshift operator 2017-03-23 18:02:35 -07:00
Carl Mastrangelo 7a73bf1068 core,benchmarks: use Atomics for StatsTraceContext
This removes a needless warning, and isn't much slower.  Also this
includes a benchmark for StatsTraceContext to measure the overhead
for creation.  It adds about 40ns per RPC.  Optimization will come
after structural changes are made to break the dependency on
Census.
2017-03-23 17:36:21 -07:00
Eric Anderson 48a32fbeaa benchmarks: Fix broken building of ServerServiceDefinition
This appears to have been broken by 3df1446 (which was reverted and
later rolled forward again in 66ab956).

Without this fix, the ServerServiceDefinition.Builder realizes that a
method is registered that isn't in the ServiceDescriptor. Swapping to a
different constructor causes the builder to generate the
ServiceDescriptor for us.

java.lang.IllegalStateException: No entry in descriptor matching bound method E6Cq77iKGNKVCGyVOqq8DqEazX9AcBdPNoMj86c3I5zo4Tv77U/vLe7QS7mhUfaooN7eYdBW7gd9oyV.kc9I0zJumfuUbhyb7SR1u
	at io.grpc.ServerServiceDefinition$Builder.build(ServerServiceDefinition.java:164)
	at io.grpc.benchmarks.netty.HandlerRegistryBenchmark.setup(HandlerRegistryBenchmark.java:107)
2017-03-23 13:39:16 -07:00
ZHANG Dapeng 6789eac581 core,netty,okhttp: KeepAliveManager with Pinger
Modified KeepAliveManager to use a Pinger interface, which can send ping or shutdown transport for both server and client.
2017-03-23 13:34:19 -07:00
Kun Zhang c14c5dda63 core: delete deprecated pickSubchannel() (#2849)
The deprecation happened in 1.2.0. Now we can delete it for the next release.
2017-03-23 11:02:26 -07:00
Kun Zhang 3f35ea69df doc: performance implication of Metadata.containsKey() (#2851) 2017-03-23 11:01:37 -07:00
Carl Mastrangelo ee12cc2a34 all: update to latest version of errorprone 2017-03-22 22:09:04 -07:00
Kun Zhang 418d52d16d core: unify EquivalentAddressGroup and its imitators. (#2755)
Resolves #2716

- Add attributes to EquivalentAddressGroup
- Deprecate ResolvedServerInfoGroup in favor of EquivalentAddressGroup
- Deprecate ResolvedServerInfo, because attributes for a single address
  within an address group are not found to be useful.
- The changes on the NameResolver and LoadBalancer interfaces are backward-compatible
  in the next release, with which implementors can switch to the new API smoothly.

As a related change, redefine the semantics of DnsNameResolver and
RoundRobinLoadBalancer:

- Before: DnsNameResolver returns all addresses in one address group.
  RoundRobinLoadBalancer ignores the grouping of addresses and
  round-robins over every single address.  It doesn't work well with the
  one-server-multiple-address setup, e.g., both IPv4 and IPv6 addresses
  are returned for a single server, even if they are put in the same
  address group by the NameResolver.

- After: DnsNameResolver returns every address in its own
  EAG. RoundRobinLoadBalancer takes an EAG as a whole, and only
  round-robins over the list of EAGs. The new behavior is a better
  interpretation of the EAGs, and really allows the case where one
  server has more than one address (e.g., IPv4 and IPv6).

This change will affect users that use custom LoadBalancer with the
stock DnsNameResolver, and those who use custom NameResolver with the
stock RoundRobinLoadBalancer.

Users who use both the stock DnsNameResolver and RoundRobinLoadBalancer
or PickFirstBalancer will see no behavioral change, because they will
still round-robin on individual addresses from DNS, or do pick-first on
all addresses from DNS (PickFirstBalancer flattens all addresses).

The result is a simpler API and a reduction in boilerplate.
2017-03-22 18:29:31 -07:00
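
The before/after round-robin semantics from commit 418d52d16d can be illustrated with plain lists, where each inner list stands for one address group (one server). This is a toy model under that assumption: the class and method names are hypothetical and no gRPC types are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddressGroupRoundRobinSketch {
  /** Before: flatten all groups and rotate over individual addresses. */
  public static List<String> pickPerAddress(List<List<String>> groups, int picks) {
    List<String> flat = new ArrayList<>();
    for (List<String> group : groups) {
      flat.addAll(group);
    }
    List<String> chosen = new ArrayList<>();
    for (int i = 0; i < picks; i++) {
      chosen.add(flat.get(i % flat.size()));
    }
    return chosen;
  }

  /** After: rotate over groups, so each pick selects one server as a whole. */
  public static List<List<String>> pickPerGroup(List<List<String>> groups, int picks) {
    List<List<String>> chosen = new ArrayList<>();
    for (int i = 0; i < picks; i++) {
      chosen.add(groups.get(i % groups.size()));
    }
    return chosen;
  }

  public static void main(String[] args) {
    List<String> serverA = Arrays.asList("a-ipv4", "a-ipv6"); // one server, two addresses
    List<String> serverB = Arrays.asList("b-ipv4");
    List<List<String>> groups = Arrays.asList(serverA, serverB);
    System.out.println(pickPerAddress(groups, 4));
    System.out.println(pickPerGroup(groups, 4));
  }
}
```

With per-address rotation, a server that exposes both IPv4 and IPv6 addresses receives more picks than a single-address server; per-group rotation treats each server equally.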
ZHANG Dapeng 3ffa5a9660 Okhttp: keepAlivedManager#onTransportShutdown moved from shutdown to stopIfNecessary and refactored
`keepAlivedManager#onTransportShutdown` should not be called in `transport.shutdown()` because there may still be open RPC streams, possibly inactive, so keepalive is still needed.
2017-03-22 10:26:45 -07:00
Kun Zhang 8890888b12 core: delete defunct TransportManager. (#2846) 2017-03-22 10:21:59 -07:00
Kun Zhang c112a2c5d8 core: suggest against overriding Context in ClientInterceptor (#2838)
Reference: #2829
2017-03-22 09:35:10 -07:00
kpayson64 3db951720c okhttp: Add restricted AppEngine SSL setup (#2845) 2017-03-21 17:24:34 -07:00
Eric Gribkoff 6cf8f059c9 protobuf: utility methods for com.google.rpc.Status 2017-03-21 14:03:46 -07:00
Eric Gribkoff 260fc273b8 services: update monitoring.proto 2017-03-21 10:46:40 -07:00
Eric Gribkoff 00bebc477a documentation: update path/method for reflection 2017-03-20 12:41:33 -07:00
ZHANG Dapeng a14689eff8 netty: move startWriteQueue right after channel is constructed
Now that commit 65e4d9f has split the channel instantiation and `connect()`, we can `startWriteQueue()` even earlier.
2017-03-20 11:54:57 -07:00
Carl Mastrangelo 82bdf53cd3 core: use nanos more consistently 2017-03-20 11:09:24 -07:00
Carl Mastrangelo c6e44b28c9 all: include analytics in releasing notes 2017-03-20 09:53:55 -07:00
Carl Mastrangelo caa0dd23aa all: bump recommended version to 1.2.0 2017-03-17 14:54:45 -07:00
ZHANG Dapeng 87c75b3ce7 core: annotate some keys with Immutable; Context.Key final 2017-03-16 17:50:23 -07:00
Eric Anderson 19afd8b48b core: Support keepalive even when transport is idle
Nothing is using this yet, but it will be used on both client and
server.
2017-03-15 17:15:19 -07:00
Carl Mastrangelo 6d44f2ffa4 stub: document withChannel and document method history 2017-03-15 13:31:09 -07:00
ZHANG Dapeng c44a4b24dd core: keepaliveManager not to use Ping.onSuccess; regard onDataReceive as ping Ack
Preparing to support server side keepalive.
For convenience on the server side, no longer use the Ping `onSuccess()` callback to cancel shutdownFuture; instead, regard `onDataReceived()` as the ping Ack and cancel shutdownFuture there.
2017-03-15 11:15:44 -07:00