129 Commits

Author SHA1 Message Date
Eric Anderson 70825adce6 Replace jsr305's GuardedBy with Error Prone's
We should avoid jsr305, and Error Prone's annotation has the same semantics.
2025-01-10 08:16:48 -08:00
Eric Anderson 7b5d0692cc
Replace jsr305's CheckReturnValue with Error Prone's (#11811)
We should avoid jsr305, and Error Prone's annotation has the same semantics.

Fixes #8687
2025-01-09 13:45:35 -08:00
Benjamin Peterson 8c261c3f28
Fix typo in deprecated blocking stub javadoc. (#11772) 2024-12-26 13:31:34 -08:00
Larry Safran ea8c31c305
Bidi Blocking Stub (#10318) 2024-12-20 16:16:17 -08:00
Eric Anderson 8ea3629378
Re-enable animalsniffer, fixing violations
In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.

But while it was broken, violations crept in. Most violations weren't
too important and we're not surprised they went unnoticed. For example, Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.

One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment.  The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.

[PR11066]: https://github.com/grpc/grpc-java/pull/11066
2024-12-19 07:54:54 -08:00
Eric Anderson 7f9c1f39f3
rls: Reduce RLS channel logging
The channel log is shared by many components and is poorly suited to
the noise of per-RPC events. This commit restricts RLS usage of the
logger to no more frequent than cache entry events. This may still be
too frequent, but should substantially improve the signal-to-noise and
we can do further rework as needed.

Many of the log entries were poor because they lacked enough context.
They weren't even clear they were from RLS. The cache entry events now
regularly include the request key in the logs, allowing you to follow
events for specific keys. I would have preferred using the hash code,
but NumberFormat is annoying and toString() may be acceptable given its
convenience.

This commit reverts much of eba699ad. Those logs have not proven to be
helpful as they produce more output than can be reasonably stored.
2024-11-27 11:37:45 -08:00
Terry Wilson c63e354883
rls: Fix log statements incorrectly referring to "LRS" (#11497) 2024-08-29 16:12:59 -07:00
Eric Anderson 5c6b80881d rls: Make LinkedHashLruCache non-threadsafe
CachingRlsLbClient already calls it with a lock held. The only reason
the cache needs to manage the lock itself is for the periodic cleanup.
Let the consumer of the cache handle the timer.
2024-05-29 08:24:56 -07:00
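A minimal sketch of the division of labor described in this commit, assuming a hypothetical `SimpleLru` class and a hypothetical `LockingOwner` consumer (neither is the real `LinkedHashLruCache` or `CachingRlsLbClient`). The cache takes no locks of its own; the owner holds one lock around every call, and any periodic cleanup timer would live in the owner as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that is deliberately NOT thread-safe. */
final class SimpleLru<K, V> {
  private final LinkedHashMap<K, V> map;

  SimpleLru(final int maxEntries) {
    // accessOrder=true keeps entries ordered from least to most recently used.
    this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxEntries;
      }
    };
  }

  V get(K key) {
    return map.get(key);
  }

  void put(K key, V value) {
    map.put(key, value);
  }

  int size() {
    return map.size();
  }
}

/** Hypothetical consumer: it owns the lock, so the cache does not need one. */
final class LockingOwner {
  private final Object lock = new Object();
  // Every access to "cache" happens while holding "lock".
  private final SimpleLru<String, String> cache = new SimpleLru<>(2);

  String lookup(String key) {
    synchronized (lock) {
      return cache.get(key);
    }
  }

  void store(String key, String value) {
    synchronized (lock) {
      cache.put(key, value);
    }
  }

  int size() {
    synchronized (lock) {
      return cache.size();
    }
  }
}
```

Keeping the lock in one place avoids locking twice per operation and makes it easier to reason about which state a single critical section covers.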
Eric Anderson f9b6e5f92d rls: Guarantee backoff will update RLS picker
Previously, picker was likely null if entering backoff soon after
start-up. This prevented the picker from being updated and directing
queued RPCs to the fallback. It would work for new RPCs if RLS returned
extremely rapidly; both ManagedChannelImpl and DelayedClientTransport do
a pick before enqueuing so the ManagedChannelImpl pick could request
from RLS and DelayedClientTransport could use the response. So the test
uses a delay to purposefully avoid that unlikely-in-real-life case.

Creating a resolving OOB channel for InProcess doesn't actually change
the destination from the parent, because InProcess uses directaddress.
Thus the fakeRlsServiceImpl is now being added to the fake backend
server, because the same server is used for RLS within the test.

b/333185213
2024-05-13 16:29:05 -07:00
Vindhya Ningegowda 77a1e77e11
xds, rls: Experimental metrics are disabled by default (#11196)
Experimental metrics (i.e., WRR and RLS metrics) are disabled by default. Users are expected to explicitly enable them while configuring metrics.
2024-05-10 17:46:58 -07:00
Terry Wilson 511b9c3a5b
rls: Add gauge metric recording (#11175)
Adds these gauges:
- grpc.lb.rls.cache_entries
- grpc.lb.rls.cache_size
2024-05-08 15:15:34 -07:00
Eric Anderson 7a663f633c api: Hide internal metric APIs
Some APIs were marked experimental but had internal APIs in their
surface. These were all changed to internal. And then the internal APIs
were mostly hidden from generated documentation.

All these APIs will eventually become public and maybe even stable. But
they need some iteration before we're ready for others to start using
them.
2024-05-08 10:24:24 -07:00
Larry Safran 59b189bf91
Change HappyEyeballs and new pick first LB flags default value to false (#11120)
* Change HappyEyeballs flag default value to false since some G3 users are seeing problems.
Put the flag logic in a common place for PickFirstLeafLoadBalancer & WRR's test.

* Set expected requestConnection count based on whether happy eyeballs is enabled or not

* Disable new PickFirstLB

* Fix test expectations to handle both new and old PF LB paths.
2024-05-08 10:08:23 -07:00
Eric Anderson 54ac06ae30 rls: Add metric test with real channel 2024-05-07 10:06:46 -07:00
hakusai22 6ec744f2a0
Fix various typos (#11144) 2024-05-06 20:29:44 -07:00
Terry Wilson a1d19327fe
rls: Add the target label to RLS counter metrics (#11142) 2024-05-01 16:19:56 -07:00
Terry Wilson a9fb272b78
rls: add counter metrics (#11138)
Adds the following metrics to the RlsLoadBalancer:
- grpc.lb.rls.default_target_picks
- grpc.lb.rls.target_picks
- grpc.lb.rls.failed_picks
2024-05-01 11:24:38 -07:00
Eric Anderson 4c78a9746c
Plumb optional labels from LB to ClientStreamTracer
As part of gRFC A78:

> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.
2024-04-29 16:30:51 -07:00
Eric Anderson da619e2bde rls: Fix time handling in CachingRlsLbClient
`getMinEvictionTime()` was fixed to make sure only deltas were used for
comparisons (`a < b` is broken; `a - b < 0` is okay). It had also
returned `0` by default, which was meaningless as there is no epoch for
`System.nanoTime()`. LinkedHashLruCache now passes the current time into
a few more functions since the implementations need it and it was
sometimes already available. This made it easier to make some classes
static.
2024-04-25 15:38:39 -07:00
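The delta rule mentioned in this commit can be illustrated with a small sketch. The class and method names are hypothetical and not taken from `CachingRlsLbClient`. `System.nanoTime()` has no defined epoch and its values may wrap around, so two readings should only be compared through their difference.

```java
/** Hypothetical helper for comparing System.nanoTime()-style readings. */
final class NanoTimeCompare {
  private NanoTimeCompare() {}

  /**
   * Returns true if reading a was taken before reading b. Uses the delta so
   * that the result survives numeric overflow, as long as the two readings
   * are less than about 292 years apart.
   */
  static boolean isBefore(long a, long b) {
    return a - b < 0;
  }

  public static void main(String[] args) {
    long start = Long.MAX_VALUE - 10; // a reading just before the counter wraps
    long later = start + 20;          // overflows to a large negative value

    System.out.println(start < later);          // prints "false": direct comparison is fooled
    System.out.println(isBefore(start, later)); // prints "true": the delta is still correct
  }
}
```

For the same reason a default of `0` carries no meaning as a timestamp: it is just another arbitrary point on the counter.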
Eric Anderson 056195401f rls: Document RefCountedChildPolicyWrapperFactory as non-threadsafe
Instead of having docs in RefCountedChildPolicyWrapperFactory saying
that every method was guarded by a lock, I added `@GuardedBy("lock")`
within CachingRlsLbClient, so now it is clearly not thread-safe and the
lock protects access. The AtomicLong was replaced with a long since
1) there was no multi-threading and 2) the logic was not atomic-safe
which was misleading.
2024-04-25 15:35:50 -07:00
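A sketch of why a plain `long` under a lock can be clearer than an `AtomicLong`, using a hypothetical `ChildPolicyRefs` class that is not the real `RefCountedChildPolicyWrapperFactory`. The check-then-decrement in `release()` is a compound operation; an `AtomicLong` alone would not make it atomic, while the lock does. In real code the field could carry Error Prone's `@GuardedBy("lock")`; a comment stands in for it here so the example compiles without extra dependencies.

```java
/** Hypothetical reference counter whose state is protected by one lock. */
final class ChildPolicyRefs {
  private final Object lock = new Object();
  // Guarded by "lock" (a real implementation might annotate this field).
  private long refCount;

  /** Adds a reference and returns the new count. */
  long retain() {
    synchronized (lock) {
      return ++refCount;
    }
  }

  /**
   * Drops a reference. Returns true when the last reference is gone and the
   * caller should shut the underlying resource down.
   */
  boolean release() {
    synchronized (lock) {
      if (refCount <= 0) {
        throw new IllegalStateException("release() without matching retain()");
      }
      refCount--;
      return refCount == 0;
    }
  }
}
```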
Eric Anderson 6e97b180b4
rls: Synchronization fixes in CachingRlsLbClient
This started with combining handleNewRequest with asyncRlsCall, but that
emphasized pre-existing synchronization issues and trying to fix those
exposed others. It was hard to split this into smaller commits because
they were interconnected.

handleNewRequest was combined with asyncRlsCall to use a single code
flow for handling the completed future while also failing the pick
immediately for throttled requests. That flow was then reused for
refreshing after backoff and data stale. It no longer optimizes the RPC
completing immediately because that would not happen in real life; it
only happens in tests because of inprocess+directExecutor() and we don't
want to test a different code flow in tests. This did require updating
some of the tests.

One small behavior change to share the combined asyncRlsCall with
backoff is we now always invalidate an entry after the backoff.
Previously the code could replace the entry with its new value in one
operation if the asyncRlsCall future completed immediately. That only
mattered to a single test which now sees an EXPLICIT eviction.

SynchronizationContext used to provide atomic scheduling in
BackoffCacheEntry, but it was not guaranteeing the scheduledRunnable was
only accessed from the sync context. The same was true for calling up
the LB tree with `updateBalancingState()`. In particular, adding entries
to the cache during a pick could evict entries without running the
cleanup methods within the context, as well as the RLS channel
transitioning from TRANSIENT_FAILURE to READY. This was replaced with
using a bare Future with a lock to provide atomicity.

BackoffCacheEntry no longer uses the current time and instead waits for
the backoff timer to actually run before considering itself expired.
Previously, it could race with periodic cleanup and get evicted before
the timer ran, which would cancel the timer and forget the
backoffPolicy. Since the backoff timer invalidates the entry, it is
likely useless to claim it ever expires, but that level of behavior was
preserved since I didn't look into the LRU cache deeply.

propagateRlsError() was moved out of asyncRlsCall because it was not
guaranteed to run after the cache was updated. If something was already
running on the sync context, then RPCs would hang until another update
caused updateBalancingState().

Some methods were moved out of the CacheEntry classes to avoid
shared-state mutation in constructors. But if we add something in a
factory method, we want to remove it in a sibling method to the factory
method, so additional code is moved for symmetry. Moving shared-state
mutation out of constructors is important because 1) it is surprising
and 2) ErrorProne doesn't validate locking within constructors. In
general, having shared-state methods in CacheEntries also has the
problem that ErrorProne can't validate CachingRlsLbClient calls to
CacheEntry. ErrorProne can't know that "lock" is already held because
CacheEntry could have been created from a _different instance_ of
CachingRlsLbClient and there's no way for us to let ErrorProne prove it
is the same instance of "lock".

DataCacheEntry still mutates global state that requires a lock in its
constructor, but it is a less severe problem and it requires more
choices to address.
2024-04-03 12:22:04 -07:00
David Burns 00649913b0
bazel: Use the `artifact` macro for loading maven deps
The recommended way to load dependencies from `rules_jvm_external`
is to make use of the `@maven` workspace, and the most readable
way of doing that is to use the `artifact` macro it provides.

This removes the need to generate the "compat" namespaces, which
`rules_jvm_external` provided for backwards compatibility with
older releases. This change also sets things up for supporting
`bzlmod`: this requires all workspaces accessed by a library to
be named "up front" in the `MODULE.bazel` file. This way, the
only repo that needs to be exported is `@maven`, rather than the
current huge list.
2024-03-28 14:33:32 -07:00
Larry Safran 51f811df86
Enable Happy Eyeballs by default (#11022)
* Flip the flag

* Fix test flakiness where IPv6 was not considered loopback
2024-03-21 16:59:54 -07:00
Larry Safran d1c406bd23
Prepare to switch flag to use new PickFirstLeafLoadBalancer by default (#10998)
* Fix PickFirstLeafLoadBalancer and tests to work when it is used.
* Actually use EAG attributes for subchannels.
2024-03-11 14:12:56 -07:00
Eric Anderson aa90768129
rls: Fix a local and remote race
The local race passes `rlsPicker` to the channel before
CachingRlsLbClient is finished constructing. `RlsPicker` can use
several fields that are not yet initialized. This seems not to be
happening in practice, because it appears like it would break things
very loudly (e.g., NPE).

The remote race seems incredibly hard to hit, because it requires an RPC
to complete before the pending data tracking the RPC is added to a map.
But if a system is at 100% CPU utilization, maybe it can be hit. If
it is hit, all RPCs needing the impacted cache entry will forever be
buffered.
2024-03-08 09:47:11 -08:00
Terry Wilson eba699ad16
rls: Adding extra debug logs (#10902) 2024-02-15 15:23:36 -08:00
Eric Anderson d6830d7f99
Change many api deps to implementation deps
These look pretty fair now, mostly only exposing grpc-api and
annotations as api dependencies.
2023-12-15 15:14:29 -08:00
Eric Anderson 0299788807 util: Make grpc-core an implementation dependency
This prevents grpc-core from being exposed on the classpath when
compiling code using grpc-util.
2023-11-13 16:52:42 -08:00
Terry Wilson 9888a54abd
lb: acceptResolvedAddresses() to return Status (#10636)
Instead of a boolean, we now return a Status object. Status.OK
represents accepted addresses and any other status represents non-acceptance. This allows the
LB to provide more information about why a set of addresses were not
acceptable.

The status will later be sent to the name resolver as well to allow it
to also better react to bad addresses.
2023-11-03 09:02:46 -07:00
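The difference between a boolean and a status return value can be sketched as follows. `SimpleStatus` and `AddressAcceptor` are hypothetical stand-ins written only for this illustration; they are not the grpc-java `Status` or `LoadBalancer` classes, and the rule for rejecting addresses is an assumption chosen to keep the example short.

```java
import java.util.List;

/** Hypothetical stand-in for a status type: OK, or an error with a reason. */
final class SimpleStatus {
  static final SimpleStatus OK = new SimpleStatus(true, "");

  private final boolean ok;
  private final String description;

  private SimpleStatus(boolean ok, String description) {
    this.ok = ok;
    this.description = description;
  }

  static SimpleStatus unavailable(String description) {
    return new SimpleStatus(false, description);
  }

  boolean isOk() {
    return ok;
  }

  String getDescription() {
    return description;
  }
}

/** Hypothetical balancer-like component that explains why it rejects addresses. */
final class AddressAcceptor {
  private AddressAcceptor() {}

  static SimpleStatus acceptResolvedAddresses(List<String> addresses) {
    if (addresses.isEmpty()) {
      // A boolean could only say "false"; a status can also say why.
      return SimpleStatus.unavailable("resolver returned no addresses");
    }
    return SimpleStatus.OK;
  }
}
```

A caller can branch on `isOk()` exactly as it would on a boolean, and can additionally log or forward the description.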
Sergii Tkachenko a294b27d52
core: Deprecate ForwardingChannelBuilder (#10587)
Deprecate `ForwardingChannelBuilder` in favor of `ForwardingChannelBuilder2`.
2023-11-02 10:58:20 -07:00
Eric Anderson 3e44bbfe4a Exclude Internal classes from javadoc 2023-08-16 15:38:30 -07:00
sanjaypujare 41552bfd9a
all: generate automatic module name in the manifest (#10413) 2023-07-25 09:00:11 -07:00
Larry Safran afa4d6dac8
Have rls's LRU Cache rely on cleanup process to remove expired entries (#10400)
* Add test for multiple targets with cache expiration.
2023-07-21 12:12:19 -07:00
Larry Safran 9f78b2bd3c
Revert "Change the default for staleAge to be maxAge - 1 minute rather than maxage (unless maxAge is < 2 minutes) for the RLS configuration from proto. (#10397)" (#10399)
This reverts commit 56d1c42c80.
2023-07-20 15:32:35 -07:00
Larry Safran 56d1c42c80
Change the default for staleAge to be maxAge - 1 minute rather than maxage (unless maxAge is < 2 minutes) for the RLS configuration from proto. (#10397) 2023-07-20 10:43:08 -07:00
sanjaypujare 0f5f07f876
core, inprocess, util: move inprocess and util code into their own new artifacts grpc-inprocess and grpc-util (#10362)
* core, inprocess, util: move inprocess and util code into their own new artifacts grpc-inprocess and grpc-util
2023-07-17 11:45:31 -07:00
Philip K. Warren 3808e707f9
compiler: Use fully qualified String in codegen (#10321)
Currently, the gRPC compiler isn't properly using the fully qualified
string name `java.lang.String` instead of `String`. Update the generator
to use the `$String$` alias to avoid compile issues with protobuf
messages called String.

Fixes #10316.
2023-06-29 10:50:13 -07:00
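The name clash behind this fix can be reproduced in plain Java. In this sketch the nested class `String` is a hypothetical stand-in for a protobuf-generated message type with that name; inside the enclosing scope the simple name `String` resolves to the nested class, so code that means the JDK type has to spell out `java.lang.String`.

```java
/** Hypothetical outer class standing in for generated code. */
final class ShadowedStringDemo {
  private ShadowedStringDemo() {}

  /** Stand-in for a generated message class that happens to be named String. */
  static final class String {
    final int id;

    String(int id) {
      this.id = id;
    }
  }

  // Here the parameter type String is the nested class above, not the JDK
  // type, so the return type must be written as java.lang.String.
  static java.lang.String describe(String message) {
    return "message#" + message.id;
  }
}
```

Declaring the return type as plain `String` would fail to compile, because the concatenation produces a `java.lang.String` that is not assignable to the nested class.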
Eric Anderson 29b8483fd6
Use test fixtures instead of sourceSets.test.output
This avoids the (often missing) evaluationDependsOn and fixes using
results from other projects without propagating those through
Configuration. It also reduces the number of useless classes pulled in
by down-stream tests, reducing the probability of rebuilds.

The expectation of fixtures is they help testing down-stream code that
use the classes in main. That applies to all the classes here except for
FakeClock and StaticTestingClassLoader. It would also apply to many
internal classes in grpc-testing, but let's consider cleaning that up
as future work.
2023-05-16 12:10:13 -07:00
Eric Anderson 847ea7cfc9 Upgrade Mockito to 3.12.4
MockitoAnnotations.initMocks() is deprecated.
2023-05-08 16:39:42 -07:00
Terry Wilson 6e54ceb2d1
rls: Refresh name resolution on rejected addresses (#10032)
If a child load balancer rejects the addresses it is given, all we can do
is to trigger a name resolution refresh and hope for a better set of
addresses.
2023-04-14 16:27:17 -07:00
Benjamin Peterson ae6c506f96
all: fix build with errorprone 2.18 (#9886)
errorprone cannot be updated past 2.10 because later versions do not support Java 8.

Fixes https://github.com/grpc/grpc-java/issues/9916.
2023-03-01 13:45:18 -08:00
Larry Safran 19eab29f8d
compiler: Generate interfaces for services to implement (#9688)
Introduce an AsyncService interface in the generated code and move the methods from <service>ImplBase to default implementation of the interface.
* update pom files to allow java 1.8
* Add a bindService(<service>Async) method
* Change TestServiceImpl to use the interface and include a bind method instead of extending TestServiceImplBase.
2023-02-15 10:33:44 -08:00
Larry Safran 5983be1369
rls: Fix throttling in route lookup (b/262779100) (#9874)
* Correct value being passed to throttler which had been backwards.

* Fix flaky test.

* Add a test using AdaptiveThrottler with a CachingRlsLBClient.

* Address test flakiness.
2023-02-06 15:19:16 -08:00
Terry Wilson 950fb7da61
rls: Migrate RLS LB to acceptResolvedAddresses() (#9612)
Second attempt at this, now with the understanding that RLS actually can
accept empty address lists.

This seems contrary to the behavior this LB advertises with the canHandleEmptyAddressListFromNameResolution() method. This method is not overridden, so the default response of false is preserved. Empty address lists are supported though, and the parent LB never called the canHandleEmptyAddressListFromNameResolution() method.
2022-10-10 13:38:03 -07:00
Alexander Polcyn b7363bc854 Revert "rls: use acceptResolvedAddresses() (#9569)"
This reverts commit 3b62fbe365.
2022-10-03 16:15:51 -07:00
Terry Wilson 3b62fbe365
rls: use acceptResolvedAddresses() (#9569)
Switch over from handleResolvedAddresses as part of a LoadBalancer
public API refactoring.
2022-09-29 12:51:31 -07:00
Terry Wilson 4b4cb0bd3b
api,core: Add LoadBalancer.acceptResolvedAddresses() (#9498)
Introduces a new acceptResolvedAddresses() method in LoadBalancer that
is to ultimately replace the existing handleResolvedAddresses(). The new
method allows the LoadBalancer implementation to reject the given
addresses by returning false from the method.

The long deprecated handleResolvedAddressGroups is also removed.
2022-08-31 08:36:50 -07:00
Larry Safran b66250e9e5
Rls spec sync (#9437)
rls: Update implementation to match spec.

* Clean up the cache if it exceeds the max size when adding an entry. Make cache entry size calculations more accurate
* Trigger pending RPC processing if unexpired backoff entries were removed from the cache by triggering helper to call its parent updateBalancingState with the same state and picker
* Introduce minimum time before eviction (5 seconds)
* Change default accept ratio for AdaptiveThrottler from 1.2 -> 2.0
* Configuration validation
* When checking key names for duplicates also look at headers
* Check extra keys for duplicates

See analysis of implementation versus spec at https://docs.google.com/spreadsheets/d/18w5s1TEebRumWzk1pvWnjiHFGKc6MW-vt8tRLY4eNs0/
2022-08-19 13:31:05 -07:00
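To show what changing the accept ratio from 1.2 to 2.0 means, here is a sketch of a throttle probability in the style of client-side adaptive throttling. The formula, the class name `ThrottleMath`, and the `requestsPadding` parameter are assumptions made for illustration; whether grpc-java's `AdaptiveThrottler` computes exactly this is not confirmed by the commit message.

```java
/** Hypothetical helper; the formula is an assumption, not the library's code. */
final class ThrottleMath {
  private ThrottleMath() {}

  /**
   * Probability of throttling a request locally, given how many requests were
   * recently attempted and how many the backend accepted. A larger
   * ratioForAccepts tolerates more rejected requests before throttling starts.
   */
  static double throttleProbability(
      long requests, long accepts, double ratioForAccepts, long requestsPadding) {
    double raw = (requests - ratioForAccepts * accepts) / (requests + requestsPadding);
    return Math.max(0.0, raw);
  }

  public static void main(String[] args) {
    // Same traffic, two ratios: 100 requests, 40 accepted, padding of 8.
    System.out.println(throttleProbability(100, 40, 1.2, 8));
    System.out.println(throttleProbability(100, 40, 2.0, 8));
  }
}
```

Under this assumed formula, the higher ratio yields a lower throttle probability for the same traffic, so the change makes the client less eager to reject requests locally.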
Larry Safran 778098b911
rls: fix RLS policy to not propagate status from control plane RPC to data plane RPC (#9413)
rls: Avoid returning status codes that the status spec document says the library will never return when talking to the RLS server. Instead, always return UNAVAILABLE on errors.

* Provide context around error message from RLS server
2022-08-15 11:10:10 -07:00
Eric Anderson 61f19d707a
Swap Animalsniffer to Java 8 and Android 19
Also added missing signatures. Swapping to version catalog will make
this process easier in the future.
2022-08-10 12:41:57 -07:00