In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.
But while it was broken violations crept in. Most violations weren't
too important and we're not surprised went unnoticed. For example, Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.
One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment. The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.
[PR11066]: https://github.com/grpc/grpc-java/pull/11066
The channel log is shared by many components and is poorly suited to
the noise of per-RPC events. This commit restricts RLS usage of the
logger to no more frequent than cache entry events. This may still be
too frequent, but should substantially improve the signal-to-noise and
we can do further rework as needed.
Many of the log entries were poor because they lacked enough context.
They weren't even clear they were from RLS. The cache entry events now
regularly include the request key in the logs, allowing you to follow
events for specific keys. I would have preferred using the hash code,
but NumberFormat is annoying and toString() may be acceptable given its
convenience.
This commit reverts much of eba699ad. Those logs have not proven to be
helpful as they produce more output than can be reasonably stored.
CachingRlsLbClient already calls it with a lock held. The only reason
the cache needs to manage the lock itself is for the periodic cleanup.
Let the consumer of the cache handle the timer.
Previously, picker was likely null if entering backoff soon after
start-up. This prevented the picker from being updated and directing
queued RPCs to the fallback. It would work for new RPCs if RLS returned
extremely rapidly; both ManagedChannelImpl and DelayedClientTransport do
a pick before enqueuing so the ManagedChannelImpl pick could request
from RLS and DelayedClientTransport could use the response. So the test
uses a delay to purposefully avoid that unlikely-in-real-life case.
Creating a resolving OOB channel for InProcess doesn't actually change
the destination from the parent, because InProcess uses directaddress.
Thus the fakeRlsServiceImpl is now being added to the fake backend
server, because the same server is used for RLS within the test.
b/333185213
Some APIs were marked experimental but had internal APIs in their
surface. These were all changed to internal. And then the internal APIs
were mostly hidden from generated documentation.
All these APIs will eventually become public and maybe even stable. But
they need some iteration before we're ready for others to start using
them.
* Change HappyEyeballs flag default value to false since some G3 users are seeing problems.
Put the flag logic in a common place for PickFirstLeafLoadBalancer & WRR's test.
* Set expected requestConnection count based on whether happy eyeballs is enabled or not
* Disable new PickFirstLB
* Fix test expectations to handle both new and old PF LB paths.
As part of gRFC A78:
> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.
`getMinEvictionTime()` was fixed to make sure only deltas were used for
comparisons (`a < b` is broken; `a - b < 0` is okay). It had also
returned `0` by default, which was meaningless as there is no epoch for
`System.nanoTime()`. LinkedHashLruCache now passes the current time into
a few more functions since the implementations need it and it was
sometimes already available. This made it easier to make some classes
static.
Instead of having docs in RefCountedChildPolicyWrapperFactory saying
that every method was guarded by a lock, I added `@GuardedBy("lock")`
within CachingRlsLbClient, so now it is clearly not thread-safe and the
lock protects access. The AtomicLong was replaced with a long since
1) there was no multi-threading and 2) the logic was not atomic-safe
which was misleading.
This started with combining handleNewRequest with asyncRlsCall, but that
emphasized pre-existing synchronization issues and trying to fix those
exposed others. It was hard to split this into smaller commits because
they were interconnected.
handleNewRequest was combined with asyncRlsCall to use a single code
flow for handling the completed future while also failing the pick
immediately for thottled requests. That flow was then reused for
refreshing after backoff and data stale. It no longer optimizes the RPC
completing immediately because that would not happen in real life; it
only happens in tests because of inprocess+directExecutor() and we don't
want to test a different code flow in tests. This did require updating
some of the tests.
One small behavior change to share the combined asyncRlsCall with
backoff is we now always invalidate an entry after the backoff.
Previously the code could replace the entry with its new value in one
operation if the asyncRlsCall future completed immediately. That only
mattered to a single test which now sees an EXPLICIT eviction.
SynchronizationContext used to provide atomic scheduling in
BackoffCacheEntry, but it was not guaranteeing the scheduledRunnable was
only accessed from the sync context. The same was true for calling up
the LB tree with `updateBalancingState()`. In particular, adding entries
to the cache during a pick could evict entries without running the
cleanup methods within the context, as well as the RLS channel
transitioning from TRANSIENT_FAILURE to READY. This was replaced with
using a bare Future with a lock to provide atomicity.
BackoffCacheEntry no longer uses the current time and instead waits for
the backoff timer to actually run before considering itself expired.
Previously, it could race with periodic cleanup and get evicted before
the timer ran, which would cancel the timer and forget the
backoffPolicy. Since the backoff timer invalidates the entry, it is
likely useless to claim it ever expires, but that level of behavior was
preserved since I didn't look into the LRU cache deeply.
propagateRlsError() was moved out of asyncRlsCall because it was not
guaranteed to run after the cache was updated. If something was already
running on the sync context, then RPCs would hang until another update
caused updateBalancingState().
Some methods were moved out of the CacheEntry classes to avoid
shared-state mutation in constructors. But if we add something in a
factory method, we want to remove it in a sibling method to the factory
method, so additional code is moved for symmetry. Moving shared-state
mutation ouf of constructors is important because 1) it is surprising
and 2) ErrorProne doesn't validate locking within constructors. In
general, having shared-state methods in CacheEntries also has the
problem that ErrorProne can't validate CachingRlsLbClient calls to
CacheEntry. ErrorProne can't know that "lock" is already held because
CacheEntry could have been created from a _different instance_ of
CachingRlsLbClient and there's no way for us to let ErrorProne prove it
is the same instance of "lock".
DataCacheEntry still mutates global state that requires a lock in its
constructor, but it is less severe of a problem and it requires more
choices to address.
The recommended way to load dependencies from `rules_jvm_external`
is to make use of the `@maven` workspace, and the most readable
way of doing that is to use the `artifact` macro provides.
This removes the need to generate the "compat" namespaces, which
`rules_jvm_external` provided for backwards compatibility with
older releases. This change also sets things up for supporting
`bzlmod`: this requires all workspaces accessed by a library to
be named "up front" in the `MODULE.bazel` file. This way, the
only repo that needs to be exported is `@maven`, rather than the
current huge list.
The local race passes `rlsPicker` to the channel before
CachingRlsLbClient is finished constructing. `RlsPicker` can use
multiple of the fields not yet initialized. This seems not to be
happening in practice, because it appears like it would break things
very loudly (e.g., NPE).
The remote race seems incredibly hard to hit, because it requires an RPC
to complete before the pending data tracking the RPC is added to a map.
But with if a system is at 100% CPU utilization, maybe it can be hit. If
it is hit, all RPCs needing the impacted cache entry will forever be
buffered.
Instead of a boolean, we now return a Status object. Status.OK
represents accepted addresses and other non-acceptance. This allows the
LB to provide more information about why a set of addresses were not
acceptable.
The status will later be sent to the name resolver as well to allow it
to also better react to to bad addresses.
Currently, the gRPC compiler isn't properly using the fully qualified
string name `java.lang.String` instead of `String`. Update the generator
to use the `$String$` alias to avoid compile issues with protobuf
messages called String.
Fixes#10316.
This avoids the (often missing) evaluationDependsOn and fixes using
results from other projects without propagating those through
Configuration. It also reduces the number of useless classes pulled in
by down-stream tests, reducing the probability of rebuilds.
The expectation of fixtures is they help testing down-stream code that
use the classes in main. That applies to all the classes here except for
FakeClock and StaticTestingClassLoader. It would also apply to many
internal classes in grpc-testing, but let's consider cleaning that up
future work.
If a child load balancer rejects the addresses it if given all we can do
is to trigger a name resolution refresh and hope for a better set of
addresses.
Introduce an AsyncService interface in the generated code and move the methods from <service>ImplBase to default implementation of the interface.
* update pom files to allow java 1.8
* Add a bindService(<service>Async) method
* Change TestServiceImpl to use the interface and include a bind method instead of extending TestServiceImplBase.
* Correct value being passed to throttler which had been backwards.
* Fix flaky test.
* Add a test using AdaptiveThrottler with a CachingRlsLBClient.
* Address test flakiness.
Second attempt at this, now with the understanding that RLS actually can
accept empty address lists.
This seems contrary to the behavior this LB advertizes with the canHandleEmptyAddressListFromNameResolution() method. This method is not overridden, so the default response of false is preserved. Empty address lists are supported though, and the parent LB never called the canHandleEmptyAddressListFromNameResolution() method.
Introduces a new acceptResolvedAddresses() method in LoadBalancer that
is to ultimately replace the existing handleResolvedAddresses(). The new
method allows the LoadBalancer implementation to reject the given
addresses by returning false from the method.
The long deprecated handleResolvedAddressGroups is also removed.
rls: Update implementation to match spec.
* Cleanup cache if exceeds max size when add an entry. Make cache entry size calculations more accurate
* Trigger pending RPC processing if unexpired backoff entries were removed from the cache by triggering helper to call it's parent updateBalancingState with the same state and picker
* Introduce minimum time before eviction (5 seconds)
* Change default accept ratio for AdaptiveThrottler from 1.2 -> 2.0
* Configuration validation
* When checking key names for duplicates also look at headers
* Check extra keys for duplicates
See analysis of implementation versus spec at https://docs.google.com/spreadsheets/d/18w5s1TEebRumWzk1pvWnjiHFGKc6MW-vt8tRLY4eNs0/
rls: Avoid library returning the status codes which the status spec document says that the library will never return when talking to RLS server. Instead, always return UNAVAILABLE on errors.
* Provide context around error message from RLS server