Commit Graph

4854 Commits

Author SHA1 Message Date
Leonardo Pistone 1a655622c7
Document that xds uses grpc-netty-shaded (#7877) 2021-05-11 19:19:40 -07:00
Eric Anderson 1882c47eb9 netty: Remove Maven pom.properties from netty-shaded
The pom.properties are apparently present to allow tooling to know what
Maven artifact corresponds to a JAR, just by looking at the JAR. Since
we shade Netty, that produces inaccurate results. This was noticed in
#8077.
2021-05-11 15:36:44 -07:00
ZHANG Dapeng 8dc16cd569
okhttp: let frameReader report existing goAwayStatus when socket closed
`OkHttpClientTransport.ClientFrameHandler` will fail a stream with `Status.UNAVAILABLE.withDescription("End of stream or IOException")` when socket is closed with an error. However, it does not include any more error detail. This PR provides more error detail in case there is an existing goaway status, e.g. netty server can send goaway with lastKnownStreamId=MAX_INT when header size exceeded max allowed size netty/netty/pull/10775 and shutdown the connection.

Test: `io.grpc.okhttp.OkHttpTransportTest.serverChecksInboundMetadataSize` with `netty-4.1.54.Final`
2021-05-11 14:23:39 -07:00
Chengyuan Zhang f4fe466fb0
xds: lazily and only parse headers with matchers matching the key (#8163)
In normal cases, we only have a few header matchers, but the number of headers is entirely up to the application. Indexing headers eagerly parses all headers, even those with no matcher matching the key. We should only parse header values whose key matches a header matcher (that is, only call Metadata.get() with keys that some matcher is looking for).
2021-05-11 14:20:02 -07:00
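The lazy lookup described in #8163 can be illustrated with a minimal sketch. It is not the grpc-java code: `LazyHeaderMatching`, `HeaderMatcher`, and the `lookup` function (standing in for `Metadata.get()`) are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of lazy header matching; not the grpc-java implementation. */
final class LazyHeaderMatching {
  /** A matcher that cares about exactly one header key (assumed exact-value match). */
  static final class HeaderMatcher {
    final String key;
    final String expectedValue;

    HeaderMatcher(String key, String expectedValue) {
      this.key = key;
      this.expectedValue = expectedValue;
    }
  }

  /**
   * Returns true if every matcher is satisfied. The lookup function is called
   * only for keys that some matcher names, so unrelated headers are never parsed.
   */
  static boolean matchesAll(List<HeaderMatcher> matchers, Function<String, String> lookup) {
    for (HeaderMatcher matcher : matchers) {
      String value = lookup.apply(matcher.key); // parsed on demand
      if (value == null || !value.equals(matcher.expectedValue)) {
        return false;
      }
    }
    return true;
  }
}
```

With one matcher and many headers, only one lookup happens, regardless of how many headers the application attached.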
Chengyuan Zhang dbc5786c30
xds: ring_hash self recover from TRANSIENT_FAILURE by attempting to connect one subchannel (#8144)
Kicks off a connection for one of the IDLE subchannels (if any exist) when the ring_hash LB policy is reporting TRANSIENT_FAILURE to its upstream.

While the ring_hash policy is reporting TRANSIENT_FAILURE, it will not be getting any pick requests from the priority policy. However, because the ring_hash policy does not attempt to reconnect to subchannels unless it is getting pick requests, it will need special handling to ensure that it will eventually recover from TRANSIENT_FAILURE state once the problem is resolved. Specifically, it will make sure that it is attempting to connect (after applicable backoff period) to at least one subchannel at any given time.
2021-05-11 01:58:57 -07:00
sanjaypujare 0c2d8edc4c
xds: refactor TlsContextManager related code to remove dependency on Bootstrapper (#8150) 2021-05-10 13:13:26 -07:00
Chengyuan Zhang c7afb89708
grpclb: use a standalone Context for gRPCLB control plane RPCs (#8154)
Inject a standalone Context that is independent of application RPCs to GrpclbLoadBalancer for control plane RPCs. The control plane RPC should be independent and not impacted by the lifetime of Context used for application RPCs.
2021-05-10 10:21:36 -07:00
Chengyuan Zhang 7b09056aa4
xds: use a standalone Context for xDS control plane RPCs (#8153)
Control plane RPCs are independent of application RPCs and can have completely different lifetimes, so the Context used for making application RPCs should not be propagated to control plane RPCs. This change makes control plane RPCs use the ROOT Context.
2021-05-07 18:00:47 -07:00
Sergii Tkachenko 976daf2dd0 buildscripts: switch xds-k8s cluster to 1.20.x 2021-05-05 20:01:15 -04:00
Eric Gribkoff c0eca6de25
Start 1.39.0 development cycle (#8147) 2021-05-05 16:30:38 -07:00
Lidi Zheng d2160ea703
Extend the xDS interop tests timeout to 360 mins (#8133) 2021-05-05 10:18:44 -07:00
sanjaypujare c9e327d42f
xds: extend SslContextProviderSupplier to DownstreamTlsContext for server side (#8146) 2021-05-04 22:19:15 -07:00
yifeizhuang 27b1641653
xds: import envoy (#8145) 2021-05-04 16:20:44 -07:00
Chengyuan Zhang fcaf9a9583
xds: ignore balancing state update from downstream after LB shutdown (#8134)
A LoadBalancer should not propagate balancing state updates after it has been shut down.

For LB policies that maintain a group of child LB policies, each with its own independent lifetime, balancing state updates propagated from a child LB policy can easily outlive its parent, especially when the balancing state update is put at the back of the queue and not propagated up inline.

For LBs that are simple pass-throughs in the middle of the LB tree structure, this isn't a big issue, as their lifecycle is the same as their child's. Transitively, they behave correctly as long as their downstream does the right thing.

This change is a sanity cleanup for LB policies that maintain multiple child LB policies to preserve the invariant that further balancing state updates from their child policies will not get propagated.
2021-05-04 15:56:56 -07:00
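The invariant from #8134 can be sketched as a simple guard. This is an assumption-laden illustration, not the xds code: `ShutdownGuardSketch`, `ParentPolicy`, and `StateListener` are hypothetical names, and states are plain strings.

```java
/** Hypothetical illustration of dropping child updates after shutdown. */
final class ShutdownGuardSketch {
  /** Stands in for the upstream that receives balancing state updates. */
  interface StateListener {
    void onState(String state);
  }

  /** A parent policy that forwards child updates only while it is alive. */
  static final class ParentPolicy {
    private final StateListener upstream;
    private boolean shutdown;

    ParentPolicy(StateListener upstream) {
      this.upstream = upstream;
    }

    /** Called by a child policy, possibly after the parent was shut down. */
    void onChildState(String state) {
      if (shutdown) {
        return; // late update from a child that outlived its parent: ignore it
      }
      upstream.onState(state);
    }

    void shutdown() {
      shutdown = true;
    }
  }
}
```

The sketch is single-threaded; it assumes all calls arrive on one serialized executor, which is how the queued updates mentioned in the commit message would be delivered.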
Chengyuan Zhang ee000f0dc1
xds: throw away subchannel references after ring_hash is shutdown (#8140)
Similar to 368c43aec4. 

Clean up subchannels after the RingHashLoadBalancer itself is shutdown to prevent further balancing state updates being propagated to the upstream.

Note this should not be considered a fix for any problem anybody is noticing. Upstreams of RingHashLoadBalancer should not rely on this; they should still have their own logic for maintaining the lifecycle of the downstream LB and ignore invalid upcalls when necessary.
2021-05-04 13:35:37 -07:00
Eric Anderson 16eb5a47ec Stabilize ChannelCredentials
Some of the experimental API annotations were changed to other issues or
became `@Internal` to match their related APIs.

Fixes #7479
2021-05-03 16:22:43 -07:00
Eric Anderson d42f3b8fcb Stabilize ServerCredentials
Some of the experimental API annotations were changed to other issues or
became @Internal to match their related APIs.

Fixes #7621
2021-05-03 16:10:24 -07:00
Joris Scharp f0c9ae26d7 examples: exporting the io.grpc.examples.manualflowcontrol client & server to the example bin output folder. This makes it quicker to find out that this example exists, instead of having to go through the source code. 2021-05-03 10:57:42 -07:00
Chengyuan Zhang 4a339e41ba
xds: fix URI creation used to instantiate DNS name resolver (#8129)
When creating the URI from the Channel authority to instantiate a DNS resolver in the cluster_resolver LB policy, a "dns" scheme needs to be manually attached and the Channel authority used as the URI path (the same as creating a Channel with a target). Otherwise, the Channel authority is parsed as the scheme, and no name resolver is found for it.

The change also handles name resolver lookup more defensively. Although it should not happen, if a bug prevents the DNS resolver from being loaded, the cluster_resolver LB policy propagates the INTERNAL error to its upstream.
2021-04-30 18:10:40 -07:00
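The URI problem fixed in #8129 can be reproduced with `java.net.URI` alone. The sketch below is an illustration under assumptions: `DnsTargetSketch` and `toDnsUri` are hypothetical names, and the host name is made up.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical illustration of building a "dns" target URI from an authority. */
final class DnsTargetSketch {
  /**
   * Attaches the "dns" scheme and uses the authority as the URI path.
   * The empty host argument yields an empty URI authority, giving
   * "dns:///host:port".
   */
  static URI toDnsUri(String authority) {
    try {
      return new URI("dns", "", "/" + authority, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("invalid authority: " + authority, e);
    }
  }
}
```

For comparison, `URI.create("foo.example.com:443")` parses `foo.example.com` as the scheme, which is why a resolver lookup keyed on the scheme finds nothing.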
Chengyuan Zhang 368c43aec4
core: throw away subchannel references after round_robin is shutdown (#8132)
Triggering balancing state updates after already being shut down can be confusing for the upstream of round_robin. In cases where the callers do not manage round_robin's lifecycle (e.g., not ignoring updates after shutting down round_robin, which they should), it can make the problem much worse, especially given that round_robin is actually propagating TRANSIENT_FAILURE with a picker that buffers RPCs.

This change only polishes round_robin by always preserving its invariant. Callers/LBs should not rely on this and should still manage the balancing updates from their downstream correctly based on the downstream's lifetime.
2021-04-30 15:42:44 -07:00
sanjaypujare 02ff64fa21
xds: use singleton XdsClient for server side (#8130) 2021-04-30 09:52:56 -07:00
Chengyuan Zhang 5d99bb07b8
xds: pretty print ClusterConfig message (#8128)
Adds ClusterConfig message descriptor to message printer.
2021-04-29 18:04:03 -07:00
Chengyuan Zhang 42d7fba1b8
xds: implement toString() for pickers to visualize selectable hosts (#8123)
Implements toString() for the wrapping SubchannelPickers so that we are able to see how hosts are selected when sending out RPCs.
2021-04-28 15:16:30 -07:00
Jan Tattermusch 72527708f5
skip flaky :grpc-xds:test in linux aarch64 tests (#8119)
Due to #8118 the :grpc-xds:test currently doesn't provide useful signal.
2021-04-27 10:18:39 -07:00
Sergii Tkachenko fcbc1abc44 buildscripts: xds-k8s pin pip to 21.0.1
pip 21.1 released on Apr 24 introduced a regression for python 3.6.1.
The regression was identified on Apr 24, the fix merged on Apr 25.
The fix is expected to be delivered in the 21.1.1 patch.

There's no clear date when 21.1.1 will be released.
Until then, pip is temporarily pinned to the previous release, 21.0.1.
2021-04-26 18:15:18 -04:00
yifeizhuang 6755cfed34
tsan, xds: fix XdsClientWrapperForServerSds data races (#8107) 2021-04-26 14:37:11 -07:00
yifeizhuang 8468b5c42f
tsan, xds: fix data races in ServerWrapperForXds (#8114) 2021-04-26 11:58:32 -07:00
Jan Tattermusch b436d0dfb7
Improve emulated linux aarch64 tests, without protoc artifact build
Based on my testing, the tests are currently flaky, so it would be better to first merge it and start running it continuously to assess the situation.
2021-04-26 10:54:28 -07:00
Chengyuan Zhang f33658a6d7
Migrate away from Jcenter (#8111)
Replaces jcenter with mavenCentral.
2021-04-22 15:59:36 -07:00
Lidi Zheng 134e9cbc42
buildscripts: enable CSDS test 2021-04-21 20:48:34 -04:00
Eric Anderson 2eb0a95305 Bump Guava to 30.1 for Bazel
This was missed from a81bf14
2021-04-21 15:49:52 -07:00
ZhenLian 1703d692bc
advancedtls: Add a Utility Class For Loading Certs/Keys (#8023) 2021-04-21 10:07:44 -07:00
Eric Anderson 8a9aa41416 all: Only depend on evaluation of enumerated subprojects
Back in 10fb206 when this for loop was added, we didn't have the
subprojects list. That was added in 9bd7bab, but I failed to update one
reference to rootProject. So all has had an evaluation dependency on all
projects, even though it only needs a subset.

This should have little impact, but would prevent weird scenarios like
an issue in :grpc-gae-interop-testing-jdk8 preventing :all-all from
being loaded. Not to say things wouldn't still fail to load, but that
this bug could distract from the real problem. I noticed this
during #8049.
2021-04-19 14:18:54 -07:00
Eric Anderson 84dc5642bc Allow both old and new behavior from google-auth-library-java
google-auth-library-java:0.25.0 strips port and path parts in the
audience claim ("aud"). Updating the test to pass in both old
and new version of google-auth-library-java.

This commit does not upgrade google-auth-library-java because
it turned out that the upgrade involves the newer Guava version
(google-auth-library-java's dependency) failing with DexingNoClasspathTransform.
Details: https://github.com/grpc/grpc-java/pull/8078#issuecomment-821566805
It's technically possible to exclude the newer Guava, but it's a
good practice to avoid excluding the newer version of a library.
2021-04-19 14:13:23 -07:00
Eric Anderson a81bf14f1f Upgrade to Guava 30.1, which warns on Java 7
This change can have large impact from two aspects:
1. It calls out a _large_ impact on the _few_ Java 7 users.
2. It may have _small_ impact on the _many_ Android users.

https://github.com/grpc/grpc-java/issues/4671 tracks gRPC's removal of
Java 7 support. We are quite eager to drop Java 7 support as that would
allow using new language features like default methods. Guava is also
dropping Java 7 support and starting in 30.1 it will warn when used on
Java 7. The purpose of the warning is to help discover users that are
negatively impacted by dropping Java 7 before it becomes a bigger
problem.

The Guava logging check was implemented in such a way that there is an
optional class that uses Java 8 bytecode. While the class is optional at
runtime, the Android build system notices when dexing and fails if
Java 8 language features are not enabled. We believe this will not be a
problem for most Android users, but they may need to add to their build:

```
android {
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
}
```

See also https://github.com/google/guava/releases/tag/v30.1
2021-04-19 09:16:00 -07:00
Chengyuan Zhang bab1fe38dc
services: move classes with protobuf dependency into io.grpc.protobuf.services (#8056)
To separately manage services/classes with and without protobuf dependency in services package, we are moving classes with protobuf dependency into io.grpc.protobuf.services. This includes healthchecking, reflection, channelz, and binlogging.

Forwarding classes are created to avoid breaking existing users, while they are marked as deprecated to notify users to migrate.
2021-04-16 17:27:12 -07:00
ZHANG Dapeng eb6764841b
netty: fix status message when GOAWAY at MAX_CONCURRENT_STREAMS limit
Resolves #8097
2021-04-16 16:10:38 -07:00
ZHANG Dapeng 49f9380fc9
netty: fix StreamBufferingEncoder GOAWAY bug
Fix a bug in StreamBufferingEncoder: when the client receives a GOAWAY while there are pending streams due to MAX_CONCURRENT_STREAMS, we see the following error:
io.netty.handler.codec.http2.Http2Exception$StreamException: Maximum active streams violated for this endpoint.
2021-04-16 14:23:14 -07:00
Chengyuan Zhang b4fe07d22d
xds: support ring_hash as the endpoint-level LB policy (#7991)
Update LB policy config generation to support ring hash policy as the endpoint-level LB policy.

- Changed the CDS LB policy to accept RING_HASH as the endpoint LB policy from CDS updates. This configuration is directly passed to its child policy (aka, ClusterResolverLoadBalancer) in its config.

- Changed ClusterResolverLoadBalancer to generate different LB configs for its downstream LB policies, depending on the endpoint-level LB policies.
  - If the endpoint-level LB policy is ROUND_ROBIN, the downstream LB policy hierarchy is: PriorityLB -> ClusterImplLB -> WeightedTargetLB -> RoundRobinLB
  - If the endpoint-level LB policy is RING_HASH, the downstream LB policy hierarchy is: PriorityLB -> ClusterImplLB -> RingHashLB.
2021-04-16 12:46:55 -07:00
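The two hierarchies listed in #7991 can be written down as a small lookup. This is only a sketch of the selection logic: `LbHierarchySketch`, `EndpointPolicy`, and `downstreamHierarchy` are hypothetical names, and the real change generates LB configs rather than lists of strings.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of choosing the downstream LB hierarchy. */
final class LbHierarchySketch {
  enum EndpointPolicy { ROUND_ROBIN, RING_HASH }

  /** Returns the policy names from top to bottom for the given endpoint-level policy. */
  static List<String> downstreamHierarchy(EndpointPolicy policy) {
    switch (policy) {
      case ROUND_ROBIN:
        return Arrays.asList("PriorityLB", "ClusterImplLB", "WeightedTargetLB", "RoundRobinLB");
      case RING_HASH:
        // ring_hash sits directly under ClusterImplLB, with no WeightedTargetLB layer
        return Arrays.asList("PriorityLB", "ClusterImplLB", "RingHashLB");
      default:
        throw new IllegalArgumentException("unsupported policy: " + policy);
    }
  }
}
```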
Eric Anderson 31cfb6d32e
all: JacocoMerge must run after grpc-interop-testing's tests (#8093)
Otherwise the executionData would be out-of-date.
2021-04-16 11:10:04 -07:00
Chengyuan Zhang 9614738a7d
core, grpclb, xds: let leaf LB policies explicitly refresh name resolution when subchannel connection is broken (#8048)
Currently each subchannel implicitly refreshes the name resolution when its state changes to IDLE or TRANSIENT_FAILURE. That is, this feature is built into the subchannel's internal implementation. Although it relieves LB implementations of the burden of refreshing the resolver when connections to backends are broken, this gives LB policies no chance to disable or override this refresh (e.g., in a complex load balancing hierarchy like xDS, LB policies may embed a resolver for resolving backends, so the refresh should be hooked to the resolver embedded in the LB policy instead of the one in the Channel).

In order to make this transition smooth, we add a check to SubchannelImpl that checks whether the LoadBalancer has explicitly called Helper.refreshNameResolution for broken subchannels created by it. If not, it logs a warning and does the refresh.

A temporary LoadBalancer.Helper API ignoreRefreshNameResolution() is added to avoid false-positive warnings for xDS that intentionally does not want a refresh. Once the migration is done, this should be deleted.
2021-04-16 10:49:06 -07:00
Eric Anderson 384f4c401d context: Add docs describing common Key usage
This recently came up in https://stackoverflow.com/a/67062503/4690866,
but it has come up multiple times before. These docs aren't ideal, as
they may be missed by a reader and so references in other parts of the
API would probably be appropriate. There could also be something about
"Context is not a general purpose map." But this is an improvement, and
I didn't want to let the perfect be the enemy of the good.
2021-04-16 09:36:16 -07:00
Tomo Suzuki 4ad49266ec OkHttpClientTransportTest's proxy to use localhost
Fixes #8080. The address 0.0.0.0 (that comes from new Socket(0).
getLocalSocketAddress()) is for listening with a server, but it
is not meant to be used as the destination address as per
"3.2.1.3 Addressing" in RFC 1122
2021-04-15 10:57:54 -07:00
ZHANG Dapeng d25ebaf57d
core: fix NPE in ConfigSelectingClientCall
Fix the following bug:

ManagedChannelImpl.ConfigSelectingClientCall may return early in start(), leaving delegate null, which makes a request() call after start() fail.

Currently the bug can only be triggered when using xDS.
2021-04-14 23:06:37 -07:00
Chengyuan Zhang d4fa0ecc07
xds: reduce the size of ring for testing pick distributions (#8079)
In the ring hash LB policy, building the ring is computationally heavy. Although using a larger ring can make the RPC distribution closer to the actual weights of hosts, it takes a long time to finish the test.

Internally, each test class is expected to finish within 1 minute, while each of the test cases for testing pick distribution takes about 30 sec. By reducing the ring size by a factor of 10, the time spent on those test cases drops to 1-2 seconds. Now we need a larger tolerance for the distribution (three hosts with weights 1:10:100):

- With a ring size of 100000, the 10000 RPCs distribution is close to 91 : 866 : 9043
- With a ring size of 10000, the 10000 RPCs distribution is close to 104 : 808 : 9088

Roughly, this is still acceptable.
2021-04-12 14:55:31 -07:00
Sergii Tkachenko 278a336d1f buildscript: xds-k8s increase build timeout 2021-04-08 22:05:29 -04:00
Sergii Tkachenko c113ba1030 buildscript: add xds-k8s cluster endpoint override
Missed this in 1b86618ce9
2021-04-08 21:14:08 -04:00
Chengyuan Zhang 95adf96848
xds: implement ring_hash load balancing policy (#7943)
Implements the ring hash LB policy: a LoadBalancer that provides consistent-hashing-based load balancing to upstream hosts, using "Ketama" hashing that maps hosts onto a circle (the "ring") by hashing their addresses. Each request is routed to a host by hashing some property of the request and finding the nearest corresponding host clockwise around the ring. Each host is placed on the ring some number of times proportional to its weight. With the ring partitioned appropriately, the addition or removal of one host from a set of N hosts will affect only 1/N requests.
2021-04-08 17:58:45 -07:00
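The ring construction described in #7943 can be sketched with a sorted map. This is not the RingHashLoadBalancer code: `HashRingSketch` and `replicasPerWeight` are hypothetical, and MD5 is swapped in as the hash function purely to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of a weighted consistent-hash ring. */
final class HashRingSketch {
  private final TreeMap<Long, String> ring = new TreeMap<>();

  /** Places each host on the ring (weight * replicasPerWeight) times. */
  HashRingSketch(Map<String, Integer> hostWeights, int replicasPerWeight) {
    for (Map.Entry<String, Integer> entry : hostWeights.entrySet()) {
      int points = entry.getValue() * replicasPerWeight;
      for (int i = 0; i < points; i++) {
        ring.put(hash(entry.getKey() + "_" + i), entry.getKey());
      }
    }
  }

  /** Routes a request to the first host at or after its hash, wrapping around the ring. */
  String pick(String requestKey) {
    if (ring.isEmpty()) {
      throw new IllegalStateException("ring is empty");
    }
    Map.Entry<Long, String> entry = ring.ceilingEntry(hash(requestKey));
    return (entry != null ? entry : ring.firstEntry()).getValue();
  }

  /** First 8 bytes of MD5 as a long (an assumption; any well-mixed 64-bit hash would do). */
  static long hash(String value) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
      long result = 0;
      for (int i = 0; i < 8; i++) {
        result = (result << 8) | (digest[i] & 0xff);
      }
      return result;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Removing a host only removes its own points from the ring, so requests that were routed to other hosts keep their assignment.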
Sergii Tkachenko 1b86618ce9 buildscript: use different xds-k8s cluster
In preparation for the Public Preview.
2021-04-08 19:00:34 -04:00
Sergii Tkachenko d971fe629c RELEASING.md: remove JCenter note
JFrog has announced that they are shutting down the JCenter: https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/
2021-04-08 14:03:33 -04:00