Commit Graph

6837 Commits

Author SHA1 Message Date
Kun Zhang e6e7bcadaf
binder: stops emulating for 21/22 Lollipop in tests
See cl/769936336 internally
2025-06-18 14:47:24 -07:00
John Cormie 922dc8a999
Mark a few test helper methods as @CanIgnoreReturnValue (#12162) 2025-06-18 10:59:21 +05:30
Eric Anderson d2d8ed8efa xds: Add logical dns cluster support to XdsDepManager
ClusterResolverLb gets the NameResolverRegistry from
LoadBalancer.Helper, so a new API was added in NameResover.Args to
propagate the same object to the name resolver tree.

RetryingNameResolver was exposed to xds. This is expected to be
temporary, as the retrying is being removed from ManagedChannelImpl and
moved into the resolvers. At that point, DnsNameResolverProvider would
wrap DnsNameResolver with a similar API to RetryingNameResolver and xds
would no longer be responsible.
2025-06-17 22:14:20 +00:00
Eric Anderson f07eb47cac
util: Deliver addresses in a random order in MultiChildLb
This should often not matter much, but in b/412468630 it was cleary
visible that child creation order can skew load for the first batch of
RPCs. This doesn't solve all the cases, as further-away backends will
still be less likely chosen initially and it is ignorant of the LB
policy. But this doesn't impact correctness, is easy, and is one fewer
cases to worry about.
2025-06-17 07:14:32 -07:00
Eric Anderson 2604ce8a55 xds: XdsNR should be subscribing to clusters with XdsDepManager
This is missing behavior defined in gRFC A74:

> As per gRFC A31, the ConfigSelector gives each RPC a ref to the
> cluster that was selected for it to ensure that the cluster is not
> removed from the xds_cluster_manager LB policy config before the RPC
> is done with its LB picks. These cluster refs will also hold a
> subscription for the cluster from the XdsDependencyManager, so that
> the XdsDependencyManager will not stop watching the cluster resource
> until the cluster is removed from the xds_cluster_manager LB policy
> config.

Without the logic, RPCs can race and see the error:

> INTERNAL: CdsLb for cluster0: Unable to find non-dynamic root cluster

Fixes #12152. This fixes the regression introduced in 297ab05e
2025-06-17 13:36:24 +00:00
vimanikag 1c43098990
util: OutlierDetection should use Ticker, not TimeProvider (#12110)
TimeProvider provides wall time. That can move forward and backward as time is adjusted. OutlierDetection is measuring durations, so it should use a monotonic clock.

Fixes #11622
2025-06-16 07:47:36 -07:00
Eric Anderson d5b4fb51c2 xds: Support tracking non-xds resources in XdsDepManager
This will be used for logical dns clusters as part of gRFC A74. Swapping
to EnumMap wasn't really necessary, but was easy given the new type
system.

I can't say I'm particularly happy with the name of the new
TrackedWatcher type, but XdsConfigWatcher prevented using "Watcher"
because it won't implement the new interface, and ResourceWatcher
already exists in XdsClient. So we have TrackedWatcher, WatcherTracer,
TypeWatchers, and TrackedWatcherType.
2025-06-16 14:32:27 +00:00
Eric Anderson 8974a306af
util: Mark OutlierDetectionLb classes final
None of these classes were intended to be extended. Even non-public
classes need final to prevent mocks from doing horrible things.
2025-06-13 10:52:00 -07:00
Eric Anderson d88ef97a87 core: Remove RetryingNR.RESOLUTION_RESULT_LISTENER_KEY
It was introduced in fcb5c54e4 because at the time we didn't change the
API to communicate the status. When onResult2() was introduced in
90d0fabb1 this hack stopped being necessary.
2025-06-13 15:20:10 +00:00
Eric Anderson 240f731e00 xds: Avoid changing cache when watching children in XdsDepManager
The watchers can be completely regular, so the base class can do the
cache management while the subclasses are only concerned with
subscribing to children.
2025-06-13 15:18:14 +00:00
MV Shiva 26bd0eee47
core: Use lazy message formatting in checkState (#12144) 2025-06-13 12:30:13 +05:30
David Sanderson 6f69363d90
bazel: Migrate java_grpc_library to use DefaultInfo (#12148)
We here address the following obstacles in grpc-java to using Bazel's
--incompatible_disable_target_default_provider_fields flag:

```
ERROR: /private/var/tmp/_bazel_dws/7fd3cd5077fbf76d9e2ae421c39ef7ed/external/googleapis+/google/devtools/build/v1/BUILD.bazel:81:18: in _java_grpc_library rule @@googleapis+//google/devtools/build/v1:build_java_grpc:
Traceback (most recent call last):
        File "/private/var/tmp/_bazel_dws/7fd3cd5077fbf76d9e2ae421c39ef7ed/external/grpc-java+/java_grpc_library.bzl", line 94, column 30, in _java_rpc_library_impl
                args.add(toolchain.plugin.files_to_run.executable, format = "--plugin=protoc-gen-rpc-plugin=%s")
Error: Accessing the default provider in this manner is deprecated and will be removed soon. It may be temporarily re-enabled by setting --incompatible_disable_target_default_provider_fields=false. See https://github.com/bazelbuild/bazel/issues/20183 for details.
ERROR: /private/var/tmp/_bazel_dws/7fd3cd5077fbf76d9e2ae421c39ef7ed/external/googleapis+/google/devtools/build/v1/BUILD.bazel:81:18: Analysis of target '@@googleapis+//google/devtools/build/v1:build_java_grpc' failed
ERROR: Analysis of target '//src:bazel' failed; build aborted: Analysis failed
```
2025-06-12 08:43:23 -07:00
Eric Anderson 6cc2ff1ced util: In OutlierDetectionLb, don't box longs if they can't be null
Changed the builder pattern to pass the builder to the constructor,
since I was changing almost all the arguments of the constructor anyway.
2025-06-12 04:02:40 +00:00
Eric Anderson 13fe008044
rls: Refactor estimatedSizeBytes updates (#12145)
Just use a regular method instead of reusing the EvictionListener API.
Fix a few comments as well. Both of these changes were based on review
comments to pre-existing code in #11203.

Contributes to #11243
2025-06-12 09:31:19 +05:30
John Cormie 30f6a4db77
google-java-format a line that was too long (#12147) 2025-06-11 12:19:19 -07:00
Eric Anderson 297ab05efe
xds: Convert CdsLb to XdsDepManager
I noticed we deviated from gRFC A37 in some ways. It turned out those
were added to the gRFC later in https://github.com/grpc/proposal/pull/344:
- NACKing empty aggregate clusters
- Failing aggregate cluster when children could not be loaded
- Recusion limit of 16. We had this behavior already, but it was
  ascribed to matching C++

There's disagreement on whether we should actually fail the aggregate
cluster for bad children, so I'm preserving the pre-existing behavior
for now.

The code is now doing a depth-first leaf traversal, not breadth-first.
This was odd to see, but the code was also pretty old, so the reasoning
seems lost to history. Since we haven't seen more than a single level of
aggregate clusters in practice, this wouldn't have been noticed by
users.

XdsDependencyManager.start() was created to guarantee that the callback
could not be called before returning from the constructor. Otherwise
XDS_CLUSTER_SUBSCRIPT_REGISTRY could potentially be null.
2025-06-11 11:56:13 -07:00
MV Shiva a16d655919
compiler: generate blocking v2 unary calls that throw StatusException (#12126) 2025-06-10 10:31:03 +05:30
Eric Anderson 6afacf589e xds: Don't cache rdsName in XdsDepManager
We can easily compute the rdsName and avoiding the state means we don't
need to override onResourceDoesNotExist() to keep the cache in-sync with
the config.
2025-06-09 14:13:59 +00:00
Eric Anderson 4ee662fbcf xds: cancelled=true on watch close in XdsDepManager
1fd29bc80 replaced cancelWatcher() with watcher.close(). But setting
cancelled was missing. Because the config update checks for shutdown,
the cancelled flag no longer avoids exceptions. But it seems best to
continue avoiding any processing after close to avoid surprises.
2025-06-09 14:13:41 +00:00
Eric Anderson 1fd29bc804 xds: Use tracing GC in XdsDepManager
Reference counting doesn't release cycles, so swap to a tracing garbage
collector. This greatly simplifies the code as well, as diffing is no
longer necessary. (If vanilla reference counting was used, diffing
wouldn't have been necessary either as you just increment all the new
objects and decrement the old ones. But that doesn't work when use a set
instead of an integer.)
2025-06-06 08:43:35 -07:00
eshitachandwani dc192f5c5e
Create SPIFFE tests config (#12133) 2025-06-06 19:24:27 +05:30
John Cormie c206428749
binder: Rationalize @ThreadSafe-ty of BinderTransport. (#12130)
- Use @BinderThread to document restrictions on methods and certain fields.
- Make TransactionHandler non-public since only Android should call it.
- Replace an unnecessary AtomicLong with a plain old long.
2025-06-05 19:07:59 -07:00
Eric Anderson 4cd7881086 xds: Fix XdsDepManager aggregate cluster child ordering and loop detection
The children of aggregate clusters have a priority order, so we can't
ever throw them in an ordinary set for later iteration.

This now detects recusion limits only after subscribing, but that
matches our existing behavior in CdsLoadBalancer2. We don't get much
value detecting the limit before subscribing and doing so makes watcher
types more different.

Loops are still a bit broken as they won't be unwatched when orphaned,
as they will form a reference loop. In CdsLoadBalancer2, duplicate
clusters had duplicate watchers so there was single-ownership and
reference cycles couldn't form. Fixing that is a bigger change.

Intermediate aggregate clusters are now included in XdsConfig, just for
simplicity. It doesn't hurt anything whether they are present or
missing. but it required updates to some tests.
2025-06-05 08:43:02 -07:00
John Cormie 4c73999102
Move all test helper classes out of AbstractTransportTest so they can be used elsewhere (#12125) 2025-06-05 15:39:51 +05:30
Eric Anderson 482dc5c1c3
xds: Don't allow hostnames in address field (#12123)
* xds: Don't allow hostnames in address field

gRFC A27 specifies they must be IPv4 or IPv6 addresses. Certainly doing
a DNS lookup hidden inside the config object is asking for trouble.

The tests were accidentally doing a lot of failing DNS requests greatly
slowing them down. On my desktop, which made the problem most obvious
with five search paths in /etc/resolv.conf, :grpc-xds:test decreased
from 66s to 29s. The majority of that is XdsDependencyManagerTest which
went from 33s to .1s, as it generated a UUID for the in-process
transport each test and then used it as a hostname, which defeated
Java's DNS (negative) cache. The slowness was noticed because
XdsDependencyManagerTest should have run quickly as a single thread
without I/O, but was particularly slow on my desktop.

The cleanup caused me to audit serverName and the weird places it went.
I think some of them were tricks for XdsClientFallbackTest to squirrel
away something distinguishing, although reusing the serverName is asking
for confusion as is including the tricks in "shared" utilities.
XdsClientFallbackTest does have some non-trivial changes, but this seems
to fix some pre-existing bugs in the tests.

* Add failing hostname unit test
2025-06-05 15:37:49 +05:30
Eric Anderson efe9ccc22c
xds: Non-SOTW resources need onError() callbacks, too (#12122)
SOTW is unique in that it can become absent after being found. But if we
NACK when initially loading the resource, we don't want to delay, depend
on the resource timeout, and then give a poor error.

This was noticed while adding the EDS restriction that address is not a
hostname and some tests started hanging instead of failing quickly.
2025-06-03 12:02:02 +05:30
Eric Anderson 48d08e643e
xds: Remove EDS maybePublishConfig() avoidance in XdsDepManager (#12121)
The optimization makes the code more complicated. Yes, we know that
maybePublishConfig() will do no work because of an outstanding watch,
but we don't do this for other new watchers created and doing so would
just make the code more bug-prone. This removes a difference in how
different watcher types are handled.
2025-06-03 11:52:38 +05:30
Eric Anderson 6bad600592
xds: Use getWatchers more often in XdsDepManager
This provides better type and missing-map handling. Note that
getWatchers() now implicitly creates the map if it doesn't exist,
instead of just returning an empty map. That makes it a bit easier to
use and more importantly avoids accidents where a bug tries to modify
the immutable map.
2025-06-02 06:42:31 -07:00
MV Shiva 379fbaa380
Update README etc to reference 1.73.0 (#12108) 2025-06-02 15:37:11 +05:30
Eric Anderson 8044a56ad2
xds: Remove timeouts from XdsDepManagerTest (#12114)
The tests are using FakeClock and inprocess transport with direct executor, so all operations should run in the test thread.
2025-06-02 12:25:27 +05:30
Eric Anderson 142e378cea
xds: Improve shutdown handling of XdsDepManager
The most important change here is to handle subscribeToCluster() calls
after shutdown(), and preventing the internal state from being heavily
confused as the assumption is there are no watchers after shutdown().

ClusterSubscription.closed isn't strictly necessary, but I don't want
the code to depend on double-deregistration being safe.
maybePublishConfig() isn't being called after shutdown(), but adding the
protection avoids a class of bugs that would cause channel panic.
2025-05-30 09:00:27 -07:00
Alex Panchenko 22cf7cf2ac
tests: Replace usages of deprecated junit ExpectedException with assertThrows (#12103) 2025-05-30 10:55:37 +05:30
Sangamesh 83538cdae3
cronet: Delete TODO for User-Agent on CronetEngine
gRPC doesn't create the CronetEngine, so even though streaming is
observing the CronetEngine's User-Agent, we don't have control of that.
In addition, CronetEngines are commonly shared between gRPC and normal
HTTP traffic, so we don't actually expect users to set gRPC in engine's
user agent. The existing behavior seems to be working as well as
feasible.

Fixes #11582
2025-05-27 10:52:47 -07:00
Eric Anderson d124007ff4
kokoro: Don't run grpc codegen in android-interop (#12098)
android-interop has been failing to build since 46485c8 because it
didn't have cmake installed and defined LDFLAGS/CXXFLAGS with pkg-config
before make_dependencies.sh had been run.

Android-interop didn't verify the codegen is up-to-date. Building the
codegen was just a relic from when android was its own separate gradle
build. Avoiding codegen means we don't have to compile absl/protobuf and
have a C++ toolchain.
2025-05-26 08:42:58 +05:30
Kannan J 46485c8b62
Upgrade Protobuf C++ to 22.5 (#11961) 2025-05-23 12:24:14 +05:30
Michael Lumish 9406d3b2a0 Rename PSM interop fallback test suite to light 2025-05-22 15:43:25 -07:00
MV Shiva 9d439d4a44
xds: Add GcpAuthenticationFilter to FilterRegistry (#12075) 2025-05-22 16:22:41 +05:30
Eric Anderson f8700a13ad
compiler: Default to @generated=omit (#12080)
After many years of issue 9179 being open, there's been nothing to show
that we need the javax.annotations.Generated annotation. Most tools use
file paths and a few check for annotations with "Generated" in the name.
ErrorProne has a few that check for javax.annotations.Generated, but
only UnnecessarilyFullyQualified looks like it'd be a problem and it is
disabled by default. We're not getting any more information, no users
have reported issues with `@generated=omit`, and the existing dependency
is annoying users, so just drop it.

Given we will still retain the GrpcGenerated annotation, it seems highly
likely things are already okay. Even if there are problems they would
probably be addressed by adding a io.grpc.stub.annotations.Generated
annotation or small tweaks. In the short-term, (non-Bazel) users can use
`@generated=javax`, but long-term we could consider removing the option
assuming we've resolved any outstanding issues.

We will want to update the examples and the README to remove the
org.apache.tomcat:annotations-api dependency after the next release.

Fixes #9179
2025-05-21 22:49:30 +05:30
Luwei Ge 2fb09578a8
xds: enableSpiffe also checks the new env var per gRFC-A87 2025-05-19 12:50:25 -07:00
Gregory Cooke 480640dc2f
alts: add experimental keepalive (#12076) 2025-05-19 08:30:10 -07:00
Abhishek Agrawal b88536a17d
kokoro: add cloud run tests config (#12065)
kokoro: add cloud run tests config
2025-05-15 10:59:52 +05:30
Eric Anderson b089761486
xds: Enable least request by default (#12054)
Enable least request by default.

It has seen good testing by users and behaved as expected.

Fixes #11996
2025-05-14 13:23:37 +05:30
MV Shiva 59adeb9d47
Start 1.74.0 development cycle (#12061) 2025-05-13 17:15:36 +05:30
vinodhabib 6baac45bd2
xds: Fix pretty-print of Cluster with WrrLocality and LB policies (#12037) 2025-05-12 12:44:14 +05:30
John Cormie 454f1c5c6a
binder: Create a Robolectric version of BinderTransportTest (#12057)
This version runs way faster than BinderTransportTest and doesn't require an actual Android device/emulator. It'll allow future tests to simulate things that are difficult/impossible on real Android, at the price of some realism.
2025-05-09 14:57:59 -07:00
John Cormie 64fe061ccd
Simplify RobolectricBinderSecurityTest (#12058)
- Use INSTRUMENTATION_TEST @LooperMode to avoid custom Executors and idleLoopers() toil.
- No android.app.Service is actually needed with Robolectric.
2025-05-09 14:52:12 -07:00
Eric Anderson 80cc988b3c
xds: Use acceptResolvedAddresses() for WeightedTarget children (#12053)
Convert the tests to use acceptResolvedAddresses() as well.
2025-05-08 11:34:16 +05:30
Eric Anderson 3f5fdf1266
core: Remove unnecessary SuppressWarnings("deprecation") (#12052) 2025-05-07 10:14:48 +05:30
Eric Anderson be0247f501 Bump com.google.protobuf gradle plugin to 0.9.5
The plugin now outputs to "generated/sources". The IDE configuration
explicitly adding the folders to the source sets hasn't been needed for
some years.
2025-05-06 06:20:33 -07:00
Kim Jin Young 12aaf88d86
Fix comment's typo (#12045) 2025-05-05 22:32:31 +05:30