Commit Graph

4876 Commits

Author SHA1 Message Date
Menghan Li d4e90a78fd
buildscript: fold header/path matching tests to all (#8054) 2021-05-27 09:47:50 -07:00
Chengyuan Zhang 8129c4e673
xds: import v3 RBAC http filter proto (#8215) 2021-05-27 09:43:56 -07:00
markb74 f88d362bc7
Fix the use of scheduler pools in BinderServer. (#8210)
Switch to using scheduled pools in BinderServer.
2021-05-27 13:37:22 +02:00
sanjaypujare bfcba82dd5
xds: remove MeshCaCertificateProvider and DynamicReloadingCertificate{Provider (#8214) 2021-05-26 19:35:51 -07:00
sanjaypujare 328071bbce
xds: replace DownstreamTlsContext by SslContextProviderSupplier in the Listener (#8205) 2021-05-26 14:42:47 -07:00
ZHANG Dapeng 6aeeba805f
xds: enhance delay injection error message on DEADLINE_EXCEEDED (#8185)
When an RPC is injected with a delay and then fails with DEADLINE_EXCEEDED (partially) due to the delay, it could confuse users if the error message does not mention the existence of the delay injection, because end users normally are not the same people who configured fault injection policy in control plane.
2021-05-26 14:35:45 -07:00
Eric Anderson a7792d3d14 Limit permissions to gradlew validator GH Action
I've already limited the grpc-wide setting to read-only access, but
limiting it explicitly here seems like a good idea; all workflows should
explicitly set their permissions since any action can implicitly access
the GITHUB_TOKEN.
2021-05-26 14:24:29 -07:00
Chengyuan Zhang 505594ac53
xds: change google_default/compute_engine creds to select TLS if the xDS cluster name is prefixed with 'google_cfe_' (#8152)
Following up changes in bbc5f61abb, the cluster_resolver LB policy uses the hostname received in CDS responses for discovering LOGICAL_DNS cluster endpoints.

Based on the new design, TD will generate a CFE cluster called "google_cfe_${service_name}" (e.g., for DirectPath service "cloud-bigtable.googleapis.com", the cluster name will be "google_cfe_cloud-bigtable.googleapis.com") for each DirectPath service. google_default/compute_engine creds will identify CFE clusters by the name having the prefix "google_cfe_".
2021-05-26 12:06:23 -07:00
Chengyuan Zhang bbc5f61abb
xds: use load assignment endpoint address in Cluster as the DNS hostname for LOGICAL_DNS (#8151)
Fixes the source of hostname used for DNS resolution in the cluster_resolver LB policy for LOGICAL_DNS clusters. The change includes:

- parse the single endpoint address from the embedded Cluster resource in CDS responses as the DNS hostname for LOGICAL_DNS cluster and include it in CdsUpdate being notified to the CDS LB policy.
- propagate the DNS hostname to the cluster_resolver LB policy via its LB config (DiscoveryMechanism for LOGICAL_DNS cluster).
- cluster_resolver LB policy takes the DNS hostname from the DiscoveryMechanism for LOGICAL_DNS cluster and use it as the name for DNS resolution.
2021-05-26 12:02:18 -07:00
markb74 8e18c11bbd
binder: BinderTransport implementation. (#8031)
This is the first major code drop for binderchannel, containing the transport class and its internals.
2021-05-26 14:54:32 +02:00
yifeizhuang 2239dd717c
tsan, xds: fix data race (#8206) 2021-05-25 13:35:09 -07:00
sanjaypujare 5b1c3fa12c
xds: shutDown the scheduledExecutorService when the provider is shutdown (#8198) 2021-05-24 12:45:01 -07:00
cfredri4 c8cd4cb260
netty: Support SocketAddress with ChannelCredentials (#8194)
This adds support for creating a Netty Channel with SocketAddress and ChannelCredentials.

This aligns with NettyServerBuilder.forAddress(SocketAddress address, ServerCredentials creds).
2021-05-24 09:49:20 -07:00
sanjaypujare 869b395ec0
xds: ignore unknown SAN name type instead of throwing exception (#8183) 2021-05-19 11:48:11 -07:00
Eric Gribkoff 465c932b41
Update README etc to reference 1.38.0 (#8189) 2021-05-19 00:09:36 -07:00
Chengyuan Zhang 86465b3399
xds: cluster_resolver LB policy should wait until all clusters being resolved before propagating endpoints to child LB policy (#8176)
Do not propagate partial endpoint discovery results to the child LB policy of cluster_resolver LB policy. This could avoid premature RPC failures when connections to resolved endpoints fail while there are other unresolved endpoints. Also, endpoints should be attempted in the order of clusters they belong to: endpoints from a lower-priority cluster should not be used before endpoints from a higher-priority cluster are attempted. Most importantly, it should not fallback to use DNS-resolved endpoints before all EDS-resolved endpoints failed.
2021-05-18 13:14:37 -07:00
Chengyuan Zhang e5d0e9d9a8
api, core: support zero copy into protobuf (#8102)
Enables a codepath for zero-copy protobuf deserialization. Two new InputStream extension interfaces are added:

- HasByteBuffer: allows access to the underlying buffers containing inbound bytes directly without copying
- Detachable: allows customer marshaller to keep the buffers around until the application code is done with using the protobuf messages

Applications can implement a custom marshaller that takes over the ownership of ByteBuffers and wrap them into ByteStrings with protobuf's UnsafeByteOperations support. Then a RopeByteString, which is a in-place composite of ByteStrings can be created. This enables using the zero-copy codepath (requires immutable ByteBuffer indication) of CodedInputStream for deserialization.
2021-05-14 14:45:03 -07:00
Chengyuan Zhang fd8964f7d1
Update README etc to reference 1.37.1 (#8179) 2021-05-14 12:42:28 -07:00
Chengyuan Zhang 413deb7f0c
xds: implement PriorityChildConfig toString() (#8173) 2021-05-12 16:01:40 -07:00
Chengyuan Zhang 2335eb5b63
xds: eliminate test verification for nondeterministic behaviors (#8172)
When the ring_hash LB policy enters TRANSIENT_FAILURE, it tries to connect one of the IDLE subchannels. Which subchannel to be connected to is non-deterministic, it just choose the first one from the subchannels map.

The existing test creates 4 subchannels, brings down 2 of them to let ring_hash LB policy enter TRANSIENT_FAILURE. But which one fo the remaining two subchannels to be kicked off connection is nondeterministic. This introduces trouble for verifying the behavior. This change simplifies the test, to only create 3 subchannels so that there is only one single subchannel remaining in IDLE after bringing the other two down. We are able to easily verify the behavior of ring_hash LB policy requesting connection for that one subchannel.
2021-05-12 14:17:21 -07:00
sanjaypujare e59604b7ce
xds: add null reference checks in SslContextProviderSupplier (#8169) 2021-05-12 10:27:44 -07:00
Eric Anderson e08b9db208
Use @DoNotCall for static methods in Builders that throw
Since static methods are pseudo-inherited by Builder implementations but
are trivially accidentally used, we re-define static methods in each
builder to make them behave more like the caller would expect. However,
not all the methods actually work; some just throw because the caller
was certainly not getting what they would expect.

Annotating with `@DoNotCall` can expose the problems at compile time
instead of runtime. While `@Deprecated` would also be an option, it is a
bit harder to figure out the ramifications and whether we want to go
that route.

This change was suggested by a lint tool for XdsServerBuilder and it
seems appropriate so I applied it to the other similar cases I could
find.
2021-05-12 10:12:52 -07:00
Leonardo Pistone 1a655622c7
Document that xds uses grpc-netty-shaded (#7877) 2021-05-11 19:19:40 -07:00
Eric Anderson 1882c47eb9 netty: Remove Maven pom.properties from netty-shaded
The pom.properties are apparently present to allow tooling to know what
Maven artifact cooresponds to a JAR, just by looking at the JAR. Since
we shade Netty, that produces inaccurate results. This was noticed in
in #8077.
2021-05-11 15:36:44 -07:00
ZHANG Dapeng 8dc16cd569
okhttp: let frameReader report existing goAwayStatus when socket closed
`OkHttpClientTransport.ClientFrameHandler` will fail a stream with `Status.UNAVAILABLE.withDescription("End of stream or IOException")` when socket is closed with an error. However, it does not include any more error detail. This PR provides more error detail in case there is an existing goaway status, e.g. netty server can send goaway with lastKnownStreamId=MAX_INT when header size exceeded max allowed size netty/netty/pull/10775 and shutdown the connection.

Test: `io.grpc.okhttp.OkHttpTransportTest.serverChecksInboundMetadataSize` with `netty-4.1.54.Final`
2021-05-11 14:23:39 -07:00
Chengyuan Zhang f4fe466fb0
xds: lazily and only parse headers with matchers matching the key (#8163)
In normal cases, we only have a few header matchers but the number of headers can be completely up to the application. Indexing headers eagerly parses all headers, even for those with no matcher matching the key. We should only parse header values for those with key matching the header matcher (aka, only call Metadata.get() with key that has some matcher looking for).
2021-05-11 14:20:02 -07:00
Chengyuan Zhang dbc5786c30
xds: ring_hash self recover from TRANSIENT_FAILURE by attempting to connect one subchannel (#8144)
Kicks off connection for one of IDLE subchannels (if exist) when the ring_hash LB policy is reporting TRANSIENT_FAILURE to its upstream.

While the ring_hash policy is reporting TRANSIENT_FAILURE, it will not be getting any pick requests from the priority policy. However, because the ring_hash policy does not attempt to reconnect to subchannels unless it is getting pick requests, it will need special handling to ensure that it will eventually recover from TRANSIENT_FAILURE state once the problem is resolved. Specifically, it will make sure that it is attempting to connect (after applicable backoff period) to at least one subchannel at any given time.
2021-05-11 01:58:57 -07:00
sanjaypujare 0c2d8edc4c
xds: refactor TlsContextManager related code to remove dependency on Bootstrapper (#8150) 2021-05-10 13:13:26 -07:00
Chengyuan Zhang c7afb89708
grpclb: use a standalone Context for gRPCLB control plane RPCs (#8154)
Inject a standalone Context that is independent of application RPCs to GrpclbLoadBalancer for control plane RPCs. The control plane RPC should be independent and not impacted by the lifetime of Context used for application RPCs.
2021-05-10 10:21:36 -07:00
Chengyuan Zhang 7b09056aa4
xds: use a standalone Context for xDS control plane RPCs (#8153)
Control plane RPCs are independent of application RPCs, they can stand for completely different lifetime. So the context for making application RPCs should not be propagated to control plane RPCs. This change makes control plane RPCs use the ROOT Context.
2021-05-07 18:00:47 -07:00
Sergii Tkachenko 976daf2dd0 buildscripts: switch xds-k8s cluster to 1.20.x 2021-05-05 20:01:15 -04:00
Eric Gribkoff c0eca6de25
Start 1.39.0 development cycle (#8147) 2021-05-05 16:30:38 -07:00
Lidi Zheng d2160ea703
Extend the xDS interop tests timeout to 360 mins (#8133) 2021-05-05 10:18:44 -07:00
sanjaypujare c9e327d42f
xds: extend SslContextProviderSupplier to DowmstreamTlsContext for server side (#8146) 2021-05-04 22:19:15 -07:00
yifeizhuang 27b1641653
xds: import envoy (#8145) 2021-05-04 16:20:44 -07:00
Chengyuan Zhang fcaf9a9583
xds: ignore balancing state update from downstream after LB shutdown (#8134)
LoadBalancers should not propagate balancing state updates after itself being shutdown.

For LB policies that maintain a group of child LB policies with each having its independent lifetime, balancing state update propagations from each child LB policy can go out of the lifetime of its parent easily, especially for cases that balancing state update is put to the back of the queue and not propagated up inline.

For LBs that are simple pass-through in the middle of the LB tree structure, it isn't a big issue as its lifecycle would be the same as its child. Transitively, It would behave correctly as long as its downstream is doing in the right way.

This change is a sanity cleanup for LB policies that maintain multiple child LB policies to preserve the invariant that further balancing state updates from their child policies will not get propagated.
2021-05-04 15:56:56 -07:00
Chengyuan Zhang ee000f0dc1
xds: throw away subchannel references after ring_hash is shutdown (#8140)
Similar to 368c43aec4. 

Clean up subchannels after the RingHashLoadBalancer itself is shutdown to prevent further balancing state updates being propagated to the upstream.

Note this should not be considered as a fix for any problem anybody is noticing. Upstreams of RingHashLoadBalancer should not rely on this, it should still have its own logic for maintaining the lifecycle of downstream LB and ignore invalid upcalls when necessary.
2021-05-04 13:35:37 -07:00
Eric Anderson 16eb5a47ec Stabilize ChannelCredentials
Some of the experimental API annotations were changed to other issues or
became `@Internal` to match their related APIs.

Fixes #7479
2021-05-03 16:22:43 -07:00
Eric Anderson d42f3b8fcb Stabilize ServerCredentials
Some of the experimental API annotations were changed to other issues or
became @Internal to match their related APIs.

Fixes #7621
2021-05-03 16:10:24 -07:00
Joris Scharp f0c9ae26d7 examples: exporting the io.grpc.examples.manualflowcontrol client & server to the example bin output folder. This commit improves gRPC to quickly find out this example exists, instead of having to go through the source code. 2021-05-03 10:57:42 -07:00
Chengyuan Zhang 4a339e41ba
xds: fix URI creation used to instantiate DNS name resolver (#8129)
When creating the URI using Channel authority for instantiating a DNS resolver in the cluster_resolver LB policy, a "dns" scheme needs to be manually attached and the Channel authority would be used as the URI path (same as creating Channel with target). Otherwise, the Channel authority will just be used as the scheme and causing name resolver not found.

The change also handles name resolver lookup more defensively. Although it should not happen, if there does have bug causing DNS resolver not being able to be loaded, the cluster_resolver LB policy propagates the INTERNAL error to upstream.
2021-04-30 18:10:40 -07:00
Chengyuan Zhang 368c43aec4
core: throw away subchannel references after round_robin is shutdown (#8132)
Triggering balancing state updates after already being shutdown can be confusing for the upstream of round_robin. In cases of the callers not managing round_robin's lifecycle (e.g., not ignoring updates after it shuts down round_robin, which it should), it can make problem very bad, especially with the behavior that round_robin is actually propagating TRANSIENT_FAILURE with a picker that buffers RPCs.

This change only polishes round_robin by always preserving its invariant. Callers/LBs should not rely on this and should still manage the balancing updates from its downstream correctly based on the downstream's lifetime.
2021-04-30 15:42:44 -07:00
sanjaypujare 02ff64fa21
xds: use singleton XdsClient for server side (#8130) 2021-04-30 09:52:56 -07:00
Chengyuan Zhang 5d99bb07b8
xds: pretty print ClusterConfig message (#8128)
Adds ClusterConfig message descriptor to message printer.
2021-04-29 18:04:03 -07:00
Chengyuan Zhang 42d7fba1b8
xds: implement toString() for pickers to visualize selectable hosts (#8123)
Implements toString() for the wrapping SubchannelPickers so that we are able to see how hosts are selected when sending out RPCs.
2021-04-28 15:16:30 -07:00
Jan Tattermusch 72527708f5
skip flaky :grpc-xds:test in linux aarch64 tests (#8119)
Due to #8118 the :grpc-xds:test currently doesn't provide useful signal.
2021-04-27 10:18:39 -07:00
Sergii Tkachenko fcbc1abc44 buildscripts: xds-k8s pin pip to 21.0.1
pip 21.1 released on Apr 24 introduced a regression for python 3.6.1.
The regression was identified on Apr 24, the fix merged on Apr 25.
The fix is expected to be delivered in the 21.1.1 patch.

There's no clear date, when 21.1.1 will be released.
Until then, pin is temporarily pinned to the previous release, 21.0.1.
2021-04-26 18:15:18 -04:00
yifeizhuang 6755cfed34
tsan, xds: fix XdsClientWrapperForServerSds data races (#8107) 2021-04-26 14:37:11 -07:00
yifeizhuang 8468b5c42f
tsan, xds: fix data races in ServerWrapperForXds (#8114) 2021-04-26 11:58:32 -07:00
Jan Tattermusch b436d0dfb7
Improve emulated linux aarch64 tests, without protoc artifact build
Based on my testing, the tests are currently flaky, so it would be better to first merge it and start running it continuously to asses the situation.
2021-04-26 10:54:28 -07:00