Commit Graph

3911 Commits

Author SHA1 Message Date
Carl Mastrangelo e5bd7f282c
Revert "Revert "core, netty: add io.perfmark Annotations" (#5853)" (#5884)
This reverts commit 2db3abc9ad.
2019-06-14 14:09:05 -07:00
Eric Anderson b836b36777 core: Fix FINE deadline logging
We were logging when withDeadline() was used, not when the Context was used. As
discovered while looking at https://stackoverflow.com/q/56593692/4690866 .

In e19e8f7d updateTimeoutHeaders was removed and logIfContextNarrowedTimeout
was called directly. However, the two methods had reverse ordering of
callDeadline/outerCallDeadline and the caller did not get their arguments
swapped.
2019-06-14 11:27:57 -07:00
Chengyuan Zhang 5f4bc15f83
xds: StatsStore#interceptPickResult should not intercept NO_RESULT (#5876)
* fixed bug of intercepting a PickResult with no Subchannel; it should just return the original PickResult. The test was also incorrect and has been fixed.

* changed ClientLoadCounter to a mock in XdsLoadStatsStoreTest, it's not necessary to instantiate a real instance.

* added a TODO comment for suggesting a warning for desired locality counter missing when intercepting a PickResult

* use isSameInstanceAs for verifying intercepting invalid PickResult instead of isEqualTo.
2019-06-14 10:50:36 -07:00
Kun Zhang aa783ee252
core/test: re-enable tests for panic mode. (#5879)
Panic mode was temporarily disabled by #4152 and re-enabled by #4245,
but the tests were not.  As a result, some test code was broken
without being noticed, because it was never executed.
2019-06-14 10:28:06 -07:00
Eric Anderson e795f14bed interop-testing: Observe flow control in TestServiceImpl 2019-06-14 09:01:29 -07:00
ZHANG Dapeng 0b27e2862d
xds: let ChannelLogger log more useful information 2019-06-13 15:43:53 -07:00
Eric Anderson 8e59a2d1e5 Revert "services: fix HealthCheckingLoadBalancer.shutdown(). (#5848)"
This reverts commit c6f15162ff. It broke
an internal health checking test because the server wouldn't shut down.
We assume the health checking RPC isn't getting closed.
2019-06-13 15:04:41 -07:00
Chengyuan Zhang 77544786b6
xds: integrate client load reporting with xds load balancer (part 2) (#5867)
* integrate recordDropRequest in LocalityStore

* integrated StatsStore#addLocality and StatsStore#removeLocality in LocalityStore in handling EDS response.

* integrated picker interception in LocalityStore

* integrate XdsLoadReportClient in XdsLoadBalancer

* put removing locality counters after updating subchannel pickers to narrow down race window

* fixed modifier for XdsLoadReportClientFactory

* refactor handleNewConfig method in XdsLoadBalancer for better readability

* edited message for closing lb rpc when balancer name changes

* weaken the specification of XdsLoadReportClient to allow start/stop to be called multiple times.

* removed lrsWorking flag as we relaxed precondition of calling start/stop on XdsLoadReportClient

* refactor initLbChannel to be a factory method for better readability

* added comment for the case when child policy changes, lrs should not be affected

* changed comments for eliminating potential load loss upon locality update.

* make lb RPC cancellation message more informative
2019-06-13 13:43:45 -07:00
Nick Travers 6aed34231f netty: refine filtering for benign transport level exceptions
Transport level exceptions (e.g. "Connection reset by peer") are not
useful and clutter the logs. `NettyServerTransport` contains logic to
log such exceptions at level `FINE`.

When running with epoll, transport level exceptions are prefixed with
additional contextual information (e.g. "syscall:read(..) failed:") that
causes the exceptions to be logged at level `INFO`.

Update the filtering logic to match on error messages _containing_ the
blacklisted messages, rather than using string equality.
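
The difference between the two matching strategies can be sketched as follows. This is a minimal illustration, not the actual `NettyServerTransport` code: the class name, method names, and the second list entry are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not the real NettyServerTransport logic. */
final class BenignErrorFilter {
  // "Connection reset by peer" is quoted in the message above;
  // "Broken pipe" is an assumed extra entry for the sketch.
  private static final List<String> QUIET_MESSAGES =
      Arrays.asList("Connection reset by peer", "Broken pipe");

  /** Old shape: only an exact match is treated as benign. */
  static boolean isBenignByEquality(String message) {
    return message != null && QUIET_MESSAGES.contains(message);
  }

  /** New shape: any message containing a listed fragment is treated as benign. */
  static boolean isBenignByContains(String message) {
    if (message == null) {
      return false;
    }
    for (String quiet : QUIET_MESSAGES) {
      if (message.contains(quiet)) {
        return true;
      }
    }
    return false;
  }
}
```

With an epoll-style message such as "syscall:read(..) failed: Connection reset by peer", the equality check misses while the containment check matches, which is the case this change targets.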

Closes #5872.

Signed-off-by: Nick Travers <n.e.travers@gmail.com>
2019-06-13 09:24:36 -07:00
Carl Mastrangelo 3432395119
alts: handle inline flushes on close in frame handler
gRPC issues flushes after close in the WriteQueue, which can show up as an NPE in the framer. This was thought to have been handled by checking to see if there were any pending writes, but if the close() call gets far enough, the writes will be null. This causes an NPE when the flush comes through.

The issue is difficult to reproduce, and I think my test case emulates the failure.  EmbeddedChannel is different than the normal Channels we use, making the precise ordering tough.  The test case isn't exactly what the production code would do, but it does have the same ordering.
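
The ordering described above (write, close, then a late flush) can be sketched with a stand-in handler. Everything here is hypothetical: the class, its fields, and its methods are invented for illustration and are not the real `TsiFrameHandler`; the sketch only shows the general idea of tolerating a flush that arrives after close has nulled the pending writes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for a frame handler; not the real TsiFrameHandler. */
final class SketchFrameHandler {
  // Set to null once close() has completed, mirroring the state described above.
  private Queue<String> pendingWrites = new ArrayDeque<>();
  private int flushed;

  void write(String msg) {
    if (pendingWrites == null) {
      throw new IllegalStateException("write after close");
    }
    pendingWrites.add(msg);
  }

  void close() {
    flush();              // drain whatever is still queued
    pendingWrites = null; // later calls must not dereference this
  }

  void flush() {
    if (pendingWrites == null) {
      return; // late flush after close: nothing left to do
    }
    while (!pendingWrites.isEmpty()) {
      pendingWrites.remove();
      flushed++;
    }
  }

  int flushedCount() {
    return flushed;
  }
}
```

Calling `write("a")`, then `close()`, then `flush()` on this sketch completes without an exception; without the null check in `flush()`, the final call would throw a `NullPointerException`.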

cc @jiangtaoli2016 

Sample Stack trace:

```
Jun 10, 2019 2:09:03 PM io.grpc.ChannelLogger log
FINEST: [OobChannel<10>] Entering SHUTDOWN state
Jun 10, 2019 2:09:03 PM io.grpc.ChannelLogger log
FINEST: [Subchannel-OOB<11>: (fake-authority-that-is-always-the-same)] NettyClientTransport<14>: (/0:0:0:0:0:0:0:1:20008) SHUTDOWN with UNAVAILABLE(OobChannel is shutdown)
Jun 10, 2019 2:09:03 PM io.grpc.netty.NettyClientHandler close
FINE: Network channel being closed by the application.
Jun 10, 2019 2:09:03 PM io.grpc.internal.ClientCallImpl logIfContextNarrowedTimeout
FINE: Call timeout set to '4999299080' ns, due to context deadline. Explicit call timeout was not set.
Jun 10, 2019 2:09:03 PM io.netty.handler.codec.http2.Http2FrameLogger logGoAway
FINE: [id: 0x4bcebba6, L:/0:0:0:0:0:0:0:1:33296 - R:/0:0:0:0:0:0:0:1:20008] OUTBOUND GO_AWAY: lastStreamId=0 errorCode=0 length=0 bytes=
Jun 10, 2019 2:09:03 PM io.grpc.netty.NettyClientHandler onConnectionError
FINE: Caught a connection error
java.lang.NullPointerException
        at io.grpc.alts.internal.TsiFrameHandler.flush(TsiFrameHandler.java:126)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:754)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:746)
        at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:732)
        at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:201)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:754)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:746)
        at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:732)
        at io.netty.channel.DefaultChannelPipeline.flush(DefaultChannelPipeline.java:978)
        at io.netty.channel.AbstractChannel.flush(AbstractChannel.java:253)
        at io.grpc.netty.WriteQueue.flush(WriteQueue.java:124)
        at io.grpc.netty.WriteQueue.access$000(WriteQueue.java:32)
        at io.grpc.netty.WriteQueue$1.run(WriteQueue.java:44)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

Jun 10, 2019 2:09:03 PM io.netty.channel.AbstractChannelHandlerContext notifyHandlerException
WARNING: An exception was thrown by a user handler while handling an exceptionCaught event
java.lang.NullPointerException
        at io.grpc.alts.internal.TsiFrameHandler.flush(TsiFrameHandler.java:126)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:754)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:746)
        at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:732)
        at io.netty.handler.codec.http2.Http2ConnectionHandler.onError(Http2ConnectionHandler.java:629)
        at io.grpc.netty.AbstractNettyHandler.exceptionCaught(AbstractNettyHandler.java:81)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:276)
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:268)
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
        at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:836)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:756)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:746)
        at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:732)
        at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:201)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:754)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:746)
        at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:732)
        at io.netty.channel.DefaultChannelPipeline.flush(DefaultChannelPipeline.java:978)
        at io.netty.channel.AbstractChannel.flush(AbstractChannel.java:253)
        at io.grpc.netty.WriteQueue.flush(WriteQueue.java:124)
        at io.grpc.netty.WriteQueue.access$000(WriteQueue.java:32)
        at io.grpc.netty.WriteQueue$1.run(WriteQueue.java:44)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

Jun 10, 2019 2:09:03 PM io.grpc.netty.NettyClientHandler channelInactive
FINE: Network channel is closed
Jun 10, 2019 2:09:03 PM io.grpc.ChannelLogger log
FINEST: [Subchannel-OOB<11>: (fake-authority-that-is-always-the-same)] NettyClientTransport<14>: (/0:0:0:0:0:0:0:1:20008) Terminated
Jun 10, 2019 2:09:03 PM io.grpc.ChannelLogger log
FINEST: [Subchannel-OOB<11>: (fake-authority-that-is-always-the-same)] Terminated
```
2019-06-12 11:06:12 -07:00
Sebastian Schmidt e4b666aaeb Fixing typo in SECURITY.md
See https://en.wikipedia.org/wiki/Cipher_suite
2019-06-11 22:33:30 -07:00
Chengyuan Zhang ef5a992e77
xds: fix bug of using the wrong cluster name for client load reporting (#5865)
* fixed bug of using the wrong cluster name for client load reporting

* moved clusterName into LrsStream
2019-06-11 15:54:05 -07:00
Chengyuan Zhang 213b91b165
xds: refactor XdsLoadReportClient and XdsLoadStatsStore in order to integrate with XdsLoadBalancer (part 1) (#5863)
* extract self-defined Locality into XdsLocality class

* separate out functionalities for recording client load from lrsClient, xds load balancer will directly interact with XdsLoadStatsStore to set up locality counters

* added GRPC to constant TRAFFICDIRECTOR_HOSTNAME_FIELD name to better match that in XdsComms

* fixed bug of using the wrong cluster name in load report's ClusterStats, it should be GSLB service name, which is responsed by load report response (same as that in EDS response).

* added a new line to the end of files.

* Revert "fixed bug of using the wrong cluster name in load report's ClusterStats, it should be GSLB service name, which is responsed by load report response (same as that in EDS response)."

This reverts commit 6097dd4066.

* rephrase interface comment for StatsStore

* added equality and hashCode test for XdsLocality
2019-06-11 09:43:15 -07:00
Chengyuan Zhang c98fb2d03e
xds: fix bug of missing total_dropped_requests field in ClusterStats proto (#5862) 2019-06-10 14:22:16 -07:00
ZHANG Dapeng f7077a565a
xds: cleanup XdsLbStateTest
The test case `XdsLbStateTest.handleSubchannelState()` was introduced before `LocalityStore` was refactored out of `XdsLbState`. Since that refactoring, the test case no longer belongs in `XdsLbStateTest`, and it is already covered in `LocalityStoreTest`.
2019-06-10 14:19:27 -07:00
ZHANG Dapeng b69b15fddb
javadoc: exclude internal APIs
Fixes #5858 in master
2019-06-10 13:36:21 -07:00
ZHANG Dapeng 33c30db42c
xds: allow grpclb balancer addresses for backward compatibility
During migration, the name resolver may not know when the client has been upgraded to xds, so it may still send grpclb v1 addresses with a list of policies including both grpclb v1 and xds.
2019-06-10 11:27:42 -07:00
Carl Mastrangelo 2db3abc9ad
Revert "core, netty: add io.perfmark Annotations" (#5853)
This causes internal breakage which needs to be resolved before continuing.

This reverts commit 71967622d6.
2019-06-07 17:23:49 -07:00
Carl Mastrangelo 44ecdf3649
interop-testing: fix tests on Android 2019-06-07 15:01:21 -07:00
Kun Zhang c6f15162ff
services: fix HealthCheckingLoadBalancer.shutdown(). (#5848)
HealthCheckingLoadBalancer.shutdown() calls
hcState.onSubchannelState(SHUTDOWN) which removes that hcState from
helper.hcStates.  Therefore, if more than one Subchannel is present,
ConcurrentModificationException will be thrown.

Since HealthCheckingLoadBalancer.shutdown() will clear the hcStates
set after the loop, it's unnecessary to do the deletion within the
loop.  However, when a Subchannel is shutdown by LoadBalancer, its
HcState still needs to be removed.  To do that, this change moves the
deletion to Subchannel.shutdown().
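
The failure mode described above can be reproduced in isolation. The sketch below uses hypothetical stand-ins (a plain `HashSet<String>` in place of the real hcStates set, and invented method names), and it relies on the fail-fast behavior of `HashSet` iterators, which is best-effort rather than guaranteed by the collection contract.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in; not the real HealthCheckingLoadBalancer types. */
final class HcStateDemo {
  final Set<String> hcStates = new HashSet<>();

  /** Buggy shape: each element is removed from the set while it is being iterated. */
  boolean shutdownRemovingInsideLoop() {
    try {
      for (String state : hcStates) {
        hcStates.remove(state); // structural modification during iteration
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true; // reached when the iterator detects the modification
    }
  }

  /** Fixed shape: only notify inside the loop, then clear the set once afterwards. */
  int shutdownClearingAfterLoop() {
    int notified = 0;
    for (String state : hcStates) {
      notified++; // stands in for notifying `state`; no removal here
    }
    hcStates.clear();
    return notified;
  }
}
```

With a single element the buggy loop usually finishes quietly, because the iterator has nothing further to fetch after the removal; that matches the observation above that the exception needs more than one Subchannel.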
2019-06-07 09:32:55 -07:00
Eric Anderson 26bd76fa76
Upgrade to Gradle 5 2019-06-07 08:40:53 -07:00
Eric Anderson a284cff892 java_grpc_library: Swap to descriptor_set_in to protoc
This avoids re-parsing the proto files and allows proto_library
to enforce more checks.

This is an export of cl/197343148
2019-06-07 07:27:01 -07:00
Carl Mastrangelo 71967622d6
core, netty: add io.perfmark Annotations
This adds perfmark annotations in some key places, notably on transport/application boundaries, and thread hop locations. Perfmark records to a thread-local buffer the events that happen in each thread. Perfmark is disabled by default, and will compile to a noop unless Perfmark.setEnabled is invoked. This should make it free when disabled, and pretty fast when it is enabled.

It is important that started tasks are ended, so several places in our code are moved to either try-finally blocks, or moved into a private method. I realize this is ugly, but I think it is manageable. In the future, we can look at making an agent or compiler plugin that simplifies the recording.

Linking between threads is done with a Link object, which is created on the "outbound" task, and used on the "inbound" task. This is slightly more verbose, and does have a small amount of runtime overhead, even when disabled (for null checks, slightly higher memory usage, etc.). I think this is okay too, because it makes other optimizations much easier.
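
The try-finally shape mentioned above can be sketched with a toy tracer. This is deliberately not the io.perfmark API: `SketchTracer` and all of its methods are hypothetical, written only to show why a started task needs a guaranteed end and how a disabled tracer stays cheap.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracer for illustration only; this is not the io.perfmark API. */
final class SketchTracer {
  private static volatile boolean enabled = false; // disabled by default
  // Each thread records into its own buffer.
  private static final ThreadLocal<List<String>> EVENTS =
      ThreadLocal.withInitial(ArrayList::new);

  static void setEnabled(boolean value) {
    enabled = value;
  }

  static void startTask(String name) {
    if (enabled) {
      EVENTS.get().add("start:" + name);
    }
  }

  static void stopTask(String name) {
    if (enabled) {
      EVENTS.get().add("stop:" + name);
    }
  }

  static List<String> eventsForCurrentThread() {
    return EVENTS.get();
  }

  /** The task is ended in finally, so it is closed even if the body throws. */
  static void runTraced(String name, Runnable body) {
    startTask(name);
    try {
      body.run();
    } finally {
      stopTask(name);
    }
  }
}
```

While the tracer is disabled, `runTraced` records nothing; once enabled, a body that throws still leaves a matching start/stop pair in the current thread's buffer.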
2019-06-06 17:58:49 -07:00
Eric Anderson 5d0c283b46 java_grpc_library.bzl: Support alternative javac toolchains
Depending on jdk:toolchain causes java_grpc_library to always use the
_default_ toolchain, even if the user tried to override it. Changing to
:current_java_toolchain allows the rule to use the user-selected
toolchain when overridden.

Tested by adding to BUILD:
load("@bazel_tools//tools/jdk:default_java_toolchain.bzl", "default_java_toolchain")
default_java_toolchain(
    name = "mychain",
    misc = ["-Amy=flag"],
    visibility = ["//visibility:public"],
)

And then verifying -Amy=flag is in the output of:
bazel aquery --java_toolchain=:mychain services:_reflection_java_grpc

Fixes #5841
2019-06-06 15:31:31 -07:00
Eric Anderson 63a6e26f39 Hard-code Netty's epoll classifier
There's only one classifier that we use today, and we really want the
compilation results to be the same independent of which machine you
compiled on.
2019-06-06 15:28:11 -07:00
Carl Mastrangelo dcd68e5b57
context: Extend raw ComparableSubject instead of supplying type parameters 2019-06-06 15:00:03 -07:00
Carl Mastrangelo f8ba38a0e4
alts: ensure only the first few bytes of key are used
Fixes grpc/grpc#19271
2019-06-06 14:59:22 -07:00
Eric Anderson be819fa3fd
android: Convert to maven-publish
com.github.dcendents:android-maven-gradle-plugin is incompatible with
Gradle 5 and the project hasn't seen any activity in over a year, so it
seems unlikely to get fixed.

We want to use maven-publish anyway, since that's what we use elsewhere.
2019-06-06 14:53:27 -07:00
Carl Mastrangelo 8536832232
core,netty: expose server stream id 2019-06-06 13:52:22 -07:00
Eric Anderson ee5731cc18 interop-testing: Only set okhttp's sslSocketFactory for test CA
We want the interop client to be configured like a normal user would,
and a normal user wouldn't call sslSocketFactory to use the default
roots.
2019-06-06 10:30:10 -07:00
Carl Mastrangelo 9ef0e9fc1b
interop-testing: disable timeout when debugging 2019-06-05 23:39:43 -07:00
ZHANG Dapeng 6aadaf0a64
core,services: cleanup io.grpc.internal.IoUtils 2019-06-05 17:31:46 -07:00
Carl Mastrangelo 7657523b28
all: update to error prone 2.3.3 2019-06-05 15:28:43 -07:00
ZHANG Dapeng 16de96befe
xds: Add gogoproto dependency to xds
The generated grpc services are not changed.
2019-06-05 10:13:19 -07:00
Carl Mastrangelo 409afe5867
all: update to truth 0.45 2019-06-04 21:12:21 -07:00
Jihun Cho 23170c298e
alts: add TsiPeer boolean property (#5824) 2019-06-03 16:29:48 -07:00
Chengyuan Zhang f81201024e
upgrade netty version to 4.1.35 and netty-tcnative version to 2.0.25 (#5818) 2019-06-03 11:40:59 -07:00
Ran 81ba42a1d6
core: expose some of AutoConfiguredLoadBalancer because some internal tests need to access them (#5821)
* core: revert some changes to fix tests

* fix style
2019-06-03 09:34:04 -07:00
Chengyuan Zhang 93551719b9
xds: integrate backend metric API to client load reporting (#5797)
* augmented ClientLoadCounter with backend metrics

* added a listener implementation for receiving backend metrics and aggregate in ClientLoadCounter
2019-05-31 14:28:23 -07:00
Kun Zhang 276b7d8512
interop-testing: create GrpclbLongLivedAffinityTestClient (#5817)
This is a long-running stand-alone test client for a specific customer that uses GRPCLB's pick_first mode.
2019-05-31 12:54:05 -07:00
Manuel Kollus e526891a2b api,protobuf-lite: solve code style issues 2019-05-31 12:49:30 -07:00
Eric Anderson c6c2ee876a core: Remove unnecessary SuppressWarnings from JsonParser 2019-05-31 10:09:39 -07:00
Jihun Cho d37f87abce
core: Migrate InternalSubchannel to use SynchronizedContext (#5555) 2019-05-30 18:40:50 -07:00
ZHANG Dapeng d8aa42723d
xds: fix bug in XdsLoadBalancerProvider.parseLoadBalancingConfigPolicy
Resolves #5804
2019-05-30 16:37:08 -07:00
ZHANG Dapeng f9decbf69d
xds: remove unused variables 2019-05-30 14:38:43 -07:00
Eric Anderson 3c931b40b0 api: Mention similarity of synccontext to a dedicated thread
This is the conceptual model we use. Document it to help aid others'
understanding and make it easier to understand when it is appropriate to
use.
2019-05-30 10:52:22 -07:00
Kun Zhang eff13a9ec8
core: only let ManagedChannelImpl convert empty resolution result to error (#5803)
Previously, AutoConfiguredLoadBalancer was also handling it, but it
doesn't trigger retries.  By returning true for
canHandleEmptyAddressListFromNameResolution(),
AutoConfiguredLoadBalancer effectively by-passed the empty-result
handling logic in ManagedChannelImpl, thus resolution retries were
never triggered.

This change requires AutoConfiguredLoadBalancer to stop being a
LoadBalancer, for its tryHandleResolvedAddresses().  It doesn't
cause any trouble because AutoConfiguredLoadBalancer has become
less and less like a LoadBalancer during the service config changes.
2019-05-30 09:48:30 -07:00
Kun Zhang af2c16d301
api: deprecate Helper.updateSubchannelAddresses() and add equivalent on Subchannel (#5802)
Resolves #5676
2019-05-30 09:16:38 -07:00
Ryan Michela 9b4c958201 Explain why client stub mocking is discouraged (#5796) 2019-05-30 08:52:13 -07:00
Eric Anderson bc2e1764f6 api,stub: Clarify isReady()/onReady() interaction semantics 2019-05-29 17:28:45 -07:00