1056 Commits

Author SHA1 Message Date
vinodhabib 112dccbb6a xds: fixed unsupported unsigned 32 bits issue for circuit breaker (#11735)
Circuit breaking now converts the signed 32-bit int to its unsigned value, held in a 64-bit long, to handle a MaxRequest value that would otherwise read as negative (-1).

Fixes #11695
2025-01-15 14:55:01 -08:00
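The conversion described in the commit above can be illustrated with the JDK's `Integer.toUnsignedLong`. This is a minimal sketch, not the code from the change: the class and method names (`UnsignedMaxRequests`, `toUnsignedMaxRequests`) are hypothetical, and it only assumes that the raw value arrives as a Java `int` carrying the bits of an unsigned 32-bit field.

```java
public final class UnsignedMaxRequests {
  private UnsignedMaxRequests() {}

  /**
   * Reinterprets an int that carries the bits of an unsigned 32-bit value
   * as a non-negative long. Hypothetical helper for illustration only.
   */
  public static long toUnsignedMaxRequests(int rawValue) {
    return Integer.toUnsignedLong(rawValue);
  }

  public static void main(String[] args) {
    // The bit pattern of -1 is the largest unsigned 32-bit value.
    System.out.println(toUnsignedMaxRequests(-1)); // 4294967295
    // Non-negative ints are unchanged.
    System.out.println(toUnsignedMaxRequests(1024)); // 1024
  }
}
```

Holding the result in a `long` keeps the full unsigned range representable, so comparisons against an in-flight request count behave as expected even for the largest configured limit.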
Vindhya Ningegowda 0de7bfefb9
xds: Remove xds authority label from metric registration
* Remove `grpc.xds.authority` label while registering `grpc.xds_client.resources` gauge, until the label value is available to record.
2025-01-14 21:23:23 -08:00
Eric Anderson e3e343db8e xds: Remember nonces for unknown types
If the control plane sends a resource type the client doesn't understand
at-the-moment, the control plane will still expect the client to include
the nonce if the client subscribes to the type in the future.

This most easily happens when unsubscribing the last resource of a type.
Which meant 1cf1927d1 was insufficient.
2025-01-07 12:35:49 -08:00
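The idea in the commit above, remembering the latest nonce per resource type regardless of whether the client currently understands or subscribes to that type, could be sketched as follows. This is an illustration under stated assumptions, not the XdsClient implementation: `NonceTracker` and all of its methods are hypothetical names, and the sketch assumes nonces are scoped to a single ADS stream.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that tracks the last nonce seen for each type URL. */
public final class NonceTracker {
  private final Map<String, String> latestNonceByTypeUrl = new HashMap<>();

  /** Records the nonce of every response, even for types with no subscriptions. */
  public void onResponse(String typeUrl, String nonce) {
    latestNonceByTypeUrl.put(typeUrl, nonce);
  }

  /** Returns the nonce to echo in the next request, or "" if none was seen. */
  public String nonceForRequest(String typeUrl) {
    return latestNonceByTypeUrl.getOrDefault(typeUrl, "");
  }

  /** Assumption: nonces are only meaningful within one stream, so drop them on close. */
  public void onStreamClosed() {
    latestNonceByTypeUrl.clear();
  }
}
```

Because the map is keyed by type URL rather than by subscription state, unsubscribing the last resource of a type does not discard the nonce, and a later re-subscription can still include it.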
Eric Anderson aba8a0c306 xds: Preserve nonce when unsubscribing type
This fixes a regression introduced in 19c9b998.

b/374697875
2025-01-07 10:39:33 -08:00
Eric Anderson ded82e209c xds: Unexpected types in server_features should be ignored
It was clearly defined in gRFC A30. The relevant text was copied as a
comment in the code.

As discovered due to grpc/grpc-go#7932
2024-12-19 08:13:04 -08:00
Vindhya Ningegowda 20d09cee57
xds: Add counter and gauge metrics (#11661)
Adds the following xDS client metrics defined in [A78](https://github.com/grpc/proposal/blob/master/A78-grpc-metrics-wrr-pf-xds.md#xdsclient).

Counters
- grpc.xds_client.server_failure
- grpc.xds_client.resource_updates_valid
- grpc.xds_client.resource_updates_invalid

Gauges
- grpc.xds_client.connected
- grpc.xds_client.resources
2024-11-25 16:47:32 -08:00
Eric Anderson 1f159d7899 xds: Fix XdsSecurityClientServerTest TrustManagerStore race
When spiffe support was added it caused
tlsClientServer_useSystemRootCerts_validationContext to become flaky.
This is because test execution order was important for whether the race
would occur.

Fixes #11678
2024-11-14 22:01:38 -08:00
Eric Anderson 4e8f7df589
util: Remove resolvedAddresses from MultiChildLb.ChildLbState
It isn't actually used by MultiChildLb, and using the health API gives
us more confidence that health is properly plumbed.
2024-11-14 12:56:24 -08:00
Eric Anderson 8237ae270a util: Remove EAG conveniences from MultiChildLb
This is a step toward removing ResolvedAddresses from ChildLbState,
which isn't actually used by MultiChildLb. Most of the EAG usages
can be served more directly without peering into MultiChildLb's
internals or even accessing ChildLbStates, which makes the tests less
sensitive to implementation changes. Some changes do leverage the new
behavior of MultiChildLb where it preserves the order of the entries.

This does fix an important bug in shutdown tests. The tests looped over
the ChildLbStates after shutdown, but shutdown deleted all the children
so it looped over an empty collection. Fixing that exposed that
deliverSubchannelState() didn't function after shutdown, as the listener
was removed from the map when the subchannel was shut down. Moving the
listener onto the TestSubchannel allowed having access to the listener
even after shutdown.

In a few places in LeastRequestLb, lines were just deleted, but that's
because an existing assertion already provided the same check but
without digging into MultiChildLb.
2024-11-11 13:16:21 -08:00
Kannan J 5081e60626
xds: Replace null check with has value check because proto fields can never be null. (#11675) 2024-11-08 13:17:24 +05:30
erm-g d6c80294a7
xds: Spiffe Trust Bundle Support (#11627)
Adds verification of SPIFFE based identities using SPIFFE trust bundles.

For in-progress gRFC A87.
2024-11-07 21:03:15 -08:00
MV Shiva 76705c235c
xds: Implement GcpAuthenticationFilter (#11638) 2024-11-06 16:39:00 +05:30
Eric Anderson 664f1fcf8a xds: Remove Bazel dependency on xds v2
feab4e54 removed xds v2 for the Gradle build. Testing with a deploy.jar,
I see the same 4 MB size reduction (31 -> 27 MB) here.

While an orca dependency is deleted in this commit, it is only a direct
dependency. It remains in the :orca target, so doesn't contribute a size
reduction.
2024-11-05 10:02:23 -08:00
MV Shiva 88596868a4
xds: Envoy proto sync to 2024-10-23 (#11664) 2024-11-05 10:56:33 +05:30
Eric Anderson 1993e68b03
Upgrade dependencies (#11655) 2024-11-01 07:50:08 -07:00
Kannan J c167ead851
xds: Per-rpc rewriting of the authority header based on the selected route. (#11631)
Implementation of A81.
2024-10-30 21:11:41 +05:30
Eric Anderson 3562380da5 Upgrade Gradle to 8.10.2 and upgrade plugins
com.github.johnrengelman.shadow is now com.gradleup.shadow (note the
redirect)
https://github.com/johnrengelman/shadow/releases/tag/8.3.0
2024-10-30 07:00:57 -07:00
Kannan J 0b2c17d0da
Xds: Implement using system root trust CA for TLS server authentication (#11470)
Allow using system root certs for server cert validation rather than CA root certs provided by the control plane when the validation context provided by the control plane specifies so.
2024-10-25 14:36:27 +05:30
Vindhya Ningegowda 2e9c3e19fb
xds: Update error handling for ADS stream close and failure scenarios (#11596)
When an ADS stream is closed with a non-OK status after receiving a response, the status is updated to OK. This makes the failure behavior consistent with gRFC A57.
2024-10-08 17:28:14 -07:00
Kannan J 1ded8aff81
On result2 resolution result have addresses or error (#11330)
The success or error status is now combined and passed via ResolutionResult to the onResult2 method of the NameResolver.Listener2 interface. The internal name resolvers now set ResolutionResult::addressesOrError to the addresses in the success case or to the address resolution error in the failure case.
2024-10-07 17:55:56 +05:30
Larry Safran 9bb06af963
Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time (#11520)
* Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time if environment variable GRPC_SERIALIZE_RETRIES == true.

Cache serializingRetries value so that it doesn't have to look up the flag every time.

Clear the correct task when READY in processSubchannelState and move the logic to cancelScheduledTasks

Cleanup based on PR review

remove unneeded checks for shutdown.

* Fix previously broken tests

* Shutdown previous subchannel when run off end of index.

* Provide option to disable subchannel retries to let PFLeafLB take control of retries.

* InternalSubchannel internally goes to IDLE when it sees TF while reconnect is disabled.
Remove an extra index.increment in LeafLB
2024-10-02 17:03:47 -07:00
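Caching an environment flag so that it is not looked up on every call, as the commit above describes for `GRPC_SERIALIZE_RETRIES`, might look like the sketch below. Only the variable name comes from the commit message. The class name `SerializeRetriesFlag`, the `parse` helper, and the use of `Boolean.parseBoolean` are assumptions for illustration, and the real parsing rules may differ.

```java
/** Hypothetical holder that reads the environment flag once and caches it. */
public final class SerializeRetriesFlag {
  private SerializeRetriesFlag() {}

  /** Assumed parsing rule: only "true" (case-insensitive) enables the flag. */
  public static boolean parse(String rawValue) {
    return Boolean.parseBoolean(rawValue); // null and other strings yield false
  }

  // Evaluated once at class initialization, so later calls avoid the lookup.
  private static final boolean ENABLED = parse(System.getenv("GRPC_SERIALIZE_RETRIES"));

  public static boolean isEnabled() {
    return ENABLED;
  }

  public static void main(String[] args) {
    System.out.println("serialize retries enabled: " + isEnabled());
  }
}
```

Separating `parse` from the environment lookup keeps the parsing rule testable without changing the process environment.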
Eric Anderson 795e2cc3ff util: Simplify MultiChildLB.getChildLbState()
Tests were converted to use getChildLbStateEag() if the argument was an
EAG, so the instanceof was no longer necessary.
2024-09-30 08:17:24 -07:00
Eric Anderson 8c3496943c xds: Have ClusterManagerLB use child map for preserving children
Instead of doing a dance of supplementing config so the later
createChildAddressesMap() won't delete children, just look at the
existing children and don't delete any that shouldn't be deleted.
2024-09-30 08:17:10 -07:00
Eric Anderson 9faa0f4eb0 xds: Update ClusterImpl test to work with PFLeafLB 2024-09-26 09:54:04 -07:00
Eric Anderson 5dbca0e80c xds: Improve ClusterImpl's FakeSubchannel to verify state changes
The main goal was to make sure subchannels went CONNECTING only after a
connection was requested (since the test doesn't transition to
CONNECTING from TF). That helps guarantee that the test is using the
expected subchannel.

The missing ClusterImplLB.requestConnection() doesn't actually matter
much, as cluster manager doesn't propagate connection requests.
2024-09-25 11:11:44 -07:00
Vindhya Ningegowda 3e8ef8cf0c
xds: Check for validity of xdsClient in ClusterImplLbHelper (#11553)
* Added null check for xdsClient in onSubChannelState. This avoids NPE
for xdsClient when LB is shutdown and onSubChannelState is called later
as part of listener callback. As shutdown is racy and eventually consistent,
this check would avoid calculating locality after LB is shutdown.
2024-09-24 16:18:34 -07:00
Vindhya Ningegowda f3cf7c3c75
xds: Add xDS node ID in few control plane errors (#11519) 2024-09-12 15:40:20 -07:00
Vindhya Ningegowda f6d2f20fcd
Fix assertion to resolve flakiness in upstreamLocalityStatsList order (#11514) 2024-09-06 09:15:14 -07:00
Vindhya Ningegowda 1dae144f0a
xds: Fix load reporting when pick first is used for locality-routing. (#11495)
* Determine subchannel's network locality from connected address, instead of assuming that all addresses for a subchannel are in the same locality.
2024-08-31 16:07:53 -07:00
Eric Anderson cfecc4754b Focus MultiChildLB updates around ResolvedAddresses of children
This makes ClusterManagerLB more straight-forward, focusing on just the
things that are relevant to it, and it avoids specialized map key
handling in updateChildrenWithResolvedAddresses().
2024-08-29 13:13:57 -07:00
Eric Anderson 4cb6465194 util: MultiChildLB children know if they are active
No need to look up in the map to see if they are still a child.
2024-08-29 08:05:16 -07:00
Eric Anderson 01389774d5 util: Remove child policy config from MultiChildLB state
The child policy config should be refreshed every address update, so it
shouldn't be stored in the ChildLbState. In addition, none of the
current usages actually used what was stored in the ChildLbState in a
meaningful way (it was always null).

ResolvedAddresses was also removed from createChildLbState(), as nothing
in it should be needed for creation; it varies over time and the values
passed at creation are immutable.
2024-08-29 08:04:50 -07:00
Eric Anderson 10d6002cbd xds: ClusterManagerLB must update child configuration
While child LB policies are unlikely to change for each cluster name (RLS
returns regular cluster names, so should be unique), and the
configuration for CDS policies won't change, RLS configuration can
definitely change.
2024-08-28 14:34:56 -07:00
Larry Safran d034a56cb0
Xds client split (#11484) 2024-08-23 13:05:38 -07:00
Eric Anderson 778a00b623 util: Remove MultiChildLB.getImmutableChildMap()
No usages actually needed a map nor a copy.
2024-08-17 08:55:22 -07:00
Eric Anderson ff8e413760
Remove direct dependency on j2objc
Bazel had the dependency added because of #5046, where Guava was
depending on it as compile-only and Bazel builds had "unknown enum
constant" warnings. Guava now has a compile dependency on j2objc, so
this workaround is no longer needed. There are currently no version skew
issues in Gradle, which was the only usage.
2024-08-13 21:33:55 -07:00
Eric Anderson 909c4bc382 util: Remove minor convenience functions from MultiChildLB
These were once needed to be overridden (e.g., by RoundRobinLB), but
now nothing overrides them and MultiChildLB doesn't even call one of
them.
2024-08-13 21:29:08 -07:00
Eric Anderson fd8734f341 xds: Delegate more RingHashLB address updates to MultiChildLB
Since 04474970 RingHashLB has not used
acceptResolvedAddressesInternal(). At the time that was needed because
deactivated children were part of MultiChildLB. But in 9de8e443, the
logic of RingHashLB and MultiChildLB.acceptResolvedAddressesInternal()
converged, so it can now swap back to using the base class for more
logic.
2024-08-12 16:40:00 -07:00
Eric Anderson b5989a5401 util: MultiChildLb children should always start with a NoResult picker
That's the obvious default, and all current usages use (something
equivalent to) that default.
2024-08-12 16:39:44 -07:00
Eric Anderson a6f8ebf33d Remove implicit requestConnection() on IDLE from MultiChildLB
One LB no longer needs to extend ChildLbState and one has to start, so
it is a bit of a wash. There are more LBs that need the auto-request
logic, but if we have an API where subclasses override it without
calling super then we can't change the implementation in the future.
Adding behavior on top of a base class allows subclasses to call super,
which lets the base class change over time.
2024-08-12 15:40:01 -07:00
Eric Anderson 0d47f5bd1b
xds: WRRPicker must not access unsynchronized data in ChildLbState
There was no point to using subchannels as keys to
subchannelToReportListenerMap, as the listener is per-child. That meant
the keys would be guaranteed to be known ahead-of-time and the
unsynchronized getOrCreateOrcaListener() during picking was unnecessary.

The picker still stores ChildLbStates to make sure that updating weights
uses the correct children, but the picker itself no longer references
ChildLbStates except in the constructor. That means weight calculation
is moved into the LB policy, as child.getWeight() is unsynchronized, and
the picker no longer needs a reference to helper.
2024-08-12 11:23:37 -07:00
Eric Anderson 0d2ad89016 xds: Remove useless ExperimentalApi for WRR
A package-private class isn't visible and `@Internal` is stronger than
experimental. The only way users should use WRR is via the
weight_round_robin string, and that's already not suffixed with
_experimental.

Closes #9885
2024-08-12 11:19:56 -07:00
Eric Anderson d1dcfb0451 xds: Replace WrrHelper with a per-child Helper
There's no need to assume which child makes a subchannel based on the
subchannel address.
2024-08-09 16:24:51 -07:00
Kurt Alfred Kluever 06135a0745 Migrate from the deprecated `Charsets` constants (in Guava) to the `StandardCharsets` constants (in the JDK)
cl/658539667
2024-08-05 13:31:08 -07:00
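The migration in the commit above replaces Guava's `Charsets` constants with the JDK's `java.nio.charset.StandardCharsets`. A minimal sketch of the JDK form follows; the class `CharsetMigration` and its methods are hypothetical and only show the call shape.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical example of encoding and decoding with the JDK constants. */
public final class CharsetMigration {
  private CharsetMigration() {}

  /** Encodes text as UTF-8 using the JDK constant rather than Guava's Charsets.UTF_8. */
  public static byte[] encode(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  /** Decodes UTF-8 bytes back into a String. */
  public static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] encoded = encode("grpc");
    System.out.println(encoded.length); // 4
    System.out.println(decode(encoded)); // grpc
  }
}
```

Both constants are of type `java.nio.charset.Charset`, so call sites typically need only the import and the class name changed.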
Sergii Tkachenko c29763d886
xds: Import RLQS protos (#11418)
Imports the protos of Rate Limiting Quota Service (RLQS) and Rate
Limit Quota HTTP Filter.

Note: the list below only shows the new top-level protos, and excludes
their direct and transitive dependencies (those from import
statements).

#### RLQS Imports
- Service — envoy/service/rate_limit_quota/v3/rlqs.proto
  (Service): 7b8a304
- HTTP Filter —
  envoy/extensions/filters/http/rate_limit_quota/v3/rate_limit_quota.proto:
  49c77c4

#### CEL Imports
- Initial third-party repo setup: 99a64bd
- Parsed CEL Expression: cel/expr/syntax.proto: 99a64bd
- Parsed and type-checked CEL Expression: cel/expr/checked.proto:
  99a64bd


#### Required typed_config extensions
##### `bucket_matchers` predicate input
- `HttpAttributesCelMatchInput` —
  xds/type/matcher/v3/http_inputs.proto: 54924e0
- `HttpRequestHeaderMatchInput` —
  envoy/type/matcher/v3/http_inputs.proto: 49c77c4

##### `bucket_matchers` predicate custom_match
- `CelMatcher` — xds/type/matcher/v3/cel.proto: 54924e0
2024-08-02 10:35:29 -07:00
Eric Anderson 9bc1a93f6e xds: Add test that uses real DnsNR with ClusterResolverLB
This can detect failures like the UnsupportedOperationException from
ebffb0a6.
2024-08-02 07:26:11 -07:00
Eric Anderson dc83446d98 xds: Stop extending RR in WRR
They share very little code, and we really don't want RoundRobinLb to be
public and non-final. Originally, WRR was expected to share much more
code with RR, and even delegated to RR at times. The delegation was
removed in 111ff60e. After dca89b25, most of the sharing has been moved
out into general-purpose tools that can be used by any LB policy.

FixedResultPicker now has equals, which makes it usable as an EmptyPicker
replacement. RoundRobinLb still uses EmptyPicker because fixing its
tests is a larger change. OutlierDetectionLbTest was changed because
FixedResultPicker is used by PickFirstLeafLb, and now RoundRobinLb can
squelch some of its updates for ready pickers.
2024-07-31 13:32:49 -07:00
Sergii Tkachenko 0017c98f6b
xds: cncf/xds proto sync to 2024-07-24 (#11417)
`cncf/xds`: Sync protos to the latest imported version
cncf/xds@024c85f (commit 2024-07-23, cl/655545156).

Should be a noop, just a routine xDS proto update to make upcoming
RLQS-related imports simpler, see related #11401.

Note that CEL is only added as a bazel dependency as now it's required
to build cncf/xds. Actual third-party source import will be done in
the follow up PR, where RLQS dependencies are added to the import
scripts.
2024-07-30 12:17:49 -07:00
Jiajing LU 448ec4f37e
xds: XdsClient should unsubscribe on last resource (#11264)
Otherwise, the server will continue sending updates and if we
re-subscribe to the last resource, the server won't re-send it. Also
completely remove the per-type state, as it could only add confusion.
2024-07-30 08:46:01 -07:00
Sergii Tkachenko 96a788a349
xds: Envoy proto sync to 2024-07-06 (#11401)
`envoyproxy/envoy`: Sync protos to the latest imported version
ab911ac2ff
(commit 2024-07-06, cl/651956889).

Should be a noop, just a routine xDS proto update to make upcoming
RLQS-related imports simpler.
2024-07-29 09:18:18 -07:00