* Change HappyEyeballs flag default value to false since some G3 users are seeing problems.
Put the flag logic in a common place for PickFirstLeafLoadBalancer & WRR's test.
* Set expected requestConnection count based on whether happy eyeballs is enabled or not
* Disable new PickFirstLB
* Fix test expectations to handle both new and old PF LB paths.
This is needed by gRFC A78 for xds metrics, and for RLS metrics. Since
gauges need to acquire a lock (or other synchronization) in the
callback, the callback allows batching multiple gauges together to avoid
acquiring-and-requiring such locks.
Unlike other metrics, gauges are reported on-demand to the MetricSink.
This means not all sinks will receive the same data, as the sinks will
ask for the gauges at different times.
This will be used for gRFC A66's OTel per-RPC metric label:
> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`).
The majority of the changes are to move target computation from
ManagedChannelImpl into the builder. A small hack API was added to
ManagedChannelBuilder to get the target to create an interceptor.
This should preserve all the existing behavior of GlobalInterceptors as
used by grpc-gcp-observability, including it disabling the implicit
OpenCensus integration.
Both the old and new API are internal. I hid Configurator and
ConfiguratorRegistry behind Internal-prefixed classes, like had been
done with GlobalInterceptors to further discourage use until the API is
ready.
GlobalInterceptorsTest was modified to become ConfiguratorRegistryTest.
This adds the following components that are required for gRPC A79
non-per-call metrics architecture.
- MetricSink implementation for gRPC OpenTelemetry
- Configurator for plumbing per call metrics ClientInterceptor and
ServerStreamTracer.Factory via unified OpenTelemetryModule.
gRFC A78 has WRR and pick-first include a `grpc.target` label, defined
in A66:
> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`). For channels such as inprocess channels
> where a target URI is not available, implementations can synthesize a
> target URI.
As part of gRFC A78:
> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.
* added MetricRecorderImpl and unit tests for MetricInstrumentRegistry
* updated MetricInstrumentRegistry to use array instead of ArrayList
* renamed record<>Counter APIs to add<>Counter. Added check for mismatched label values
* added lock for instruments array
Handles Netty write frame failures caused by issues in the Netty
itself.
Normally we don't need to do anything on frame write failures because
the cause of a failed future would be an IO error that resulted in
the stream closure. Prior to this PR we treated these issues as a
noop, except the initial headers write on the client side.
However, a case like netty/netty#13805 (a bug in generating next
stream id) resulted in an unclosed stream on our side. This PR adds
write frame future failure handlers that ensures the stream is
cancelled, and the cause is propagated via Status.
Fixes#10849
The recommended way to load dependencies from `rules_jvm_external`
is to make use of the `@maven` workspace, and the most readable
way of doing that is to use the `artifact` macro provides.
This removes the need to generate the "compat" namespaces, which
`rules_jvm_external` provided for backwards compatibility with
older releases. This change also sets things up for supporting
`bzlmod`: this requires all workspaces accessed by a library to
be named "up front" in the `MODULE.bazel` file. This way, the
only repo that needs to be exported is `@maven`, rather than the
current huge list.
The name resolver takes some time before it returns addresses. While waiting the channel will be IDLE instead of the proper CONNECTING. This generally doesn't matter since RPCs behave similarly for IDLE and CONNECTING, but is confusing for users when watching channel.getState() closely.
Fixes#10517.
We provided extra details when the RPC is killed by CallOptions'
Deadline, but didn't do the same for Context.
To avoid duplicating code, things were restructured, including the
threading. There are more code flows now, but I think the
multi-threading came out more obvious and less error-prone. I didn't
change the status when the deadline is already expired, because the
text is shared with DelayedClientCall and AbstractInteropTest doesn't
distinguish between the two cases.
This is a roll-forward that avoids a NPE when cancel() is called
without an earlier call to start().
As seen at b/300991330
* Allow the queued byte threshold for a Stream to be ready to be configurable
- on clients this is exposed by setting a CallOption
- on servers this is configured by calling a method on ServerCall or ServerStreamListener
We provided extra details when the RPC is killed by CallOptions'
Deadline, but didn't do the same for Context.
To avoid duplicating code, things were restructured, including the
threading. There are more code flows now, but I think the
multi-threading came out more obvious and less error-prone. I didn't
change the status when the deadline is already expired, because the
text is shared with DelayedClientCall and AbstractInteropTest doesn't
distinguish between the two cases.
As seen at b/300991330
SelfSignedCertificate is not available on Java 17 because
OpenJdkSelfSignedCertGenerator is not available. This only impacted
tests.
AccessController is being removed, and these locations are doing simple
reflection which is unlikely to require it even when a security policy
is in effect. There's other places we do reflection without the
AccessController, so either no security policies care or the users can
update their policies to allow it.
We can just compare the Deadline instances instead of asserting that
very little time has passed during the test. Real time probably still
matters in the test, but only insofar that the deadline is not expired
by the time ClientCallImpl sees it.
This fixes a test failure seen in the emulated aarch64 CI. Note that the
message says "ns" when it should say "ms", but this change deletes the
code with that typo.
```
java.lang.AssertionError: timeout: 548 ns
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at io.grpc.internal.ClientCallImplTest.assertTimeoutBetween(ClientCallImplTest.java:1102)
at io.grpc.internal.ClientCallImplTest.contextDeadlineShouldBePropagatedToStream(ClientCallImplTest.java:828)
```
Add the 'fake' dependency to grpc-netty instead of grpc-core.
grpc-okhttp already depends on grpc-util and probably would be fine
without round_robin on Android.
There's not actually a circular dependency, but some tools can't handle
the compile vs runtime distinction. Such tools are broken, but fixes
have been slow and this approach works with no real downfalls.
Works around #10576#10701
* Provide a default implementation for new method added to ManagedTransport.Listener to support ClientTransportFilters
* Relax test constraint to reduce flakiness due to timing.
* Add test for listener.filterTransport.
This fixes two bugs in outbound message size checking:
* When thet checke failed, the thrown StatusRuntimeException with a status code
of RESOURCE_EXHAUSTED was been caught and rewrapped in another
StatusRuntimeException but this time with status code UNKNOWN.
* This applies the max outbound message size check to messages prior to, and
after compression, since compression of a message that is smaller than the
maximum send size can result in a larger message that exceeds the maximum
send size.
Originally you had to confirm that awaitTermination() returned true, but
that was annoying and useless, especially after calling shutdownNow().
The behavior was changed in ce2ae1fb because the awaitTermination()
detection logic could prevent the channel from getting garbage
collected.
Fixes#10732
This change has health checking consumer (new pick first) to install a listener through and health checking producer (outlier detection and client health checking) producing health checks. Health notification chain is built reusing the previous connectivity state chain.
Pickfirst installs the health listener, and is capable of detecting when no health checking producer is installed in the system. In that case, it sets health status to be READY so that health system is no-op.
This may help some to move closer to Providers. It especially helps
cases where `NameResolverFactory`s aren't returning `InetSocketAddress`,
as it allows them to override `getProducedSocketAddressTypes()`, which
will now fail starting in 15fc70be.
Adds a new module grpc-opentelemetry that integrates OpenTelemetry and focuses on metrics.
OpenTelemetry APIs are used for instrumenting metrics collection. Users are expected to provide SDK with implementations.
If no SDK is passed, by default gRPC uses OpenTelemetry.noop().
* Update picker logic per A61 that it no longer pays attention to the first 2 elements, but rather takes the first ring element not in TF and uses that.
---------
Pulled in by rebase:
Eric Anderson (android: Remove unneeded proguard rule 44723b6)
Terry Wilson (stub: Deprecate StreamObservers b5434e8)
This is currently the only place where we return Status.UNKNOWN with no description, which makes is harder to debug and differentiate from statuses originated from non-grpc sources.
This PR enriches ServerImpl's #internalClose `Status.UNKNOWN` with description `Application error processing RPC`.
* core, netty, okhttp: implement new logic for nameResolverFactory API in channelBuilder
fix ManagedChannelImpl to use NameResolverRegistry instead of NameResolverFactory
fix the ManagedChannelImplBuilder and remove nameResolverFactory
* Integrate target parsing and NameResolverProvider searching
Actually creating the name resolver is now delayed to the end of
ManagedChannelImpl.getNameResolver; we don't want to call into the name
resolver to determine if we should use the name resolver.
Added getDefaultScheme() to NameResolverRegistry to avoid needing
NameResolver.Factory.
---------
Co-authored-by: Eric Anderson <ejona@google.com>
Instead of a boolean, we now return a Status object. Status.OK
represents accepted addresses and other non-acceptance. This allows the
LB to provide more information about why a set of addresses were not
acceptable.
The status will later be sent to the name resolver as well to allow it
to also better react to to bad addresses.
This breaks the ABI of the classes listed below.
Users that recompiled their code using grpc-java [`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) (released on
Feb 23, 2021) and later, ARE NOT AFFECTED.
Users that compiled their source using grpc-java earlier than
[`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) need to
recompile when upgrading to grpc-java `v1.59.0`. Otherwise the code
will fail on runtime with `NoSuchMethodError`. For example, code:
```java
NettyChannelBuilder.forTarget("localhost:100").maxRetryAttempts(2);
```
Will fail with
> `java.lang.NoSuchMethodError: 'io.grpc.internal.AbstractManagedChannelImplBuilder
io.grpc.netty.NettyChannelBuilder.maxRetryAttempts(int)'`
**Affected classes**
Class `AbstractManagedChannelImplBuilder` is deleted, and no longer in
the class hierarchy of the channel builders:
- `io.grpc.netty.NettyChannelBuilder`
- `io.grpc.okhttp.OkhttpChannelBuilder`
- `grpc.cronet.CronetChannelBuilder`
This breaks the ABI of the classes listed below.
Users that recompiled their code using grpc-java [`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) (released on
Feb 23, 2021) and later, ARE NOT AFFECTED.
Users that compiled their source using grpc-java earlier than
[`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) need to
recompile when upgrading to grpc-java `v1.59.0`. Otherwise the code
will fail on runtime with `NoSuchMethodError`. For example, code:
```java
NettyServerBuilder.forPort(80).directExecutor();
```
Will fail with
> `java.lang.NoSuchMethodError: 'io.grpc.internal.AbstractServerImplBuilder
io.grpc.netty.NettyServerBuilder.directExecutor()'`
**Affected classes**
Class `AbstractServerImplBuilder` is deleted, and no longer in the
class hierarchy of the server builders:
- `io.grpc.netty.NettyServerBuilder`
- `io.grpc.inprocess.InProcessServerBuilder`
When a memory leak occurs, it is really helpful to have access records
to understand where the buffer was being held when it leaked. retain()
when we create the NettyReadableBuffer already creates an access record
the ByteBuf, so here we track when the ByteBuf is passed to another
thread.
See #8330
* Use internalClose instead of close when sendMessage has a RuntimeException.
* Change argument to internalClose to a Throwable instead of a Status.
* Rename internalClose to handleInternalError
* Eliminate NPE by skipping further processing when stream is defined, but doesn't have a property for streamKey (header processing identified an error)
Fixes#10364
* Add unit test for missing content type
Encode the service authority before passing it into gRPC util in the xDS name resolver to handle xDS requests which might contain multiple slashes. Example: xds:///path/to/service:port.
As currently the underlying Java URI library does not break the encoded authority into host/port correctly simplify the check to just look for '@' as we are only interested in checking for user info to validate the authority for HTTP.
This change also leads to few changes in unit tests that relied on this check for invalid authorities which now will be considered valid.
Just like #9376, depending on Guava packages such as URLEscapers or PercentEscapers leads to internal failures(Ex: Unresolvable reference to com.google.common.escape.Escaper from io.grpc.internal.GrpcUtil). To avoid these issues create an in house version that is heavily inspired by grpc-go/grpc.
Wrapping the DnsNameResolver in DnsNameResolverProvider can cause
problems to external name resolvers that delegate to a DnsResolver
already wrapped in RetryingNameResolver. ManagedChannelImpl would
end up wrapping these name resolvers again, causing an exception
later from a RetryingNameResolver safeguard that checks for double
wrapping.
Motivation:
When multiple NameResolvers are created, the Classloader is scanned every time trying to figure out if the Platform is Android. This expensive work could be done only once.
Modification:
Cache isAndroid resolution in a constant.
Result:
Less expensive multiple NameResolvers instantiation.
ManagedCahnnelImpl did not make sure to use a RetryingNameResolver if
authority was not overriden. This was not a problem for DNS name
resolution as the DNS name resolver factory explicitly returns a
RetryingNameResolver. For polling name resolvers that do not do this in
their factories (like the grpclb name resolver) this meant not having retry
at all.
* context, all: move Context classes to grpc-api
clean up grpc-context since it has no source code: only add dep on grpc-api
add exclusion for all transitive deps of grpc-api - only guava
exclude grpc-context as a dependency from grpc-alts because all context code is in grpc-api now
api: 1.7 as target Java version for Context source-set of grpc-api
* core, census: fix the issues with android project pulling in old grpc-context version
* api,context: make changes to bazel build files to account for context code moving from context to api
Since 44847bf4e, when we upgraded our JUnit version, the JUnit
exclusions have probably not been necessary. e0ac97c4f upgraded
Robolectric to a version that had the auto.service problem fixed.
This avoids the (often missing) evaluationDependsOn and fixes using
results from other projects without propagating those through
Configuration. It also reduces the number of useless classes pulled in
by down-stream tests, reducing the probability of rebuilds.
The expectation of fixtures is they help testing down-stream code that
use the classes in main. That applies to all the classes here except for
FakeClock and StaticTestingClassLoader. It would also apply to many
internal classes in grpc-testing, but let's consider cleaning that up
future work.
LoadBalancers in general should never throw, but
parseLoadBalancingConfig() in particular has a return value to
communicate the error. Throwing can be a bit unpredictable, but at its
most trivial form causes a channel panic. There's no reason to throw
explicitly and calls to JsonUtil have to be protected by a try-catch
because it can throw.
ReadyPicker hasn't been necessary since 111ff60e, when we stopped
calling super.pickSubchannel(). EmptyPicker was only used in tests, and
we can just compare the class name instead of doing an instanceof check.
Unfortunately, calling getClass() caused Java to start casting the
return value of pickerCaptor.getValue() based on its generics. Captors
don't verify the type they capture, so using any type other than
SubchannelPicker for the pickerCaptor is misleading and hides a cast.
If provided with the new PickFirstLoadBalancerConfig,
PickFirstLoadBalancer will shuffle the list of addresses it receives
from the name resolver.
PickFirstLoadBalancerProvider will now support the new config
if enabled by an env variable.
When the subchannel is transitioning from TRANSIENT_FAILURE to either
IDLE or CONNECTING we will not update the LB state. Additionally, if
the subchannel becomes idle we request a new connection so that the
subchannel will keep on trying to establish a connection.
The problem was one hedge was committed before another had drained
start(). This was not testable because HedgingRunnable checks whether
scheduledHedgingRef is cancelled, which is racy, but there's no way to
deterministically trigger either race.
The same problem couldn't be triggered with retries because only one
attempt will be draining at a time. Retries with cancellation also
couldn't trigger it, for the surprising reason that the noop stream used
in cancel() wasn't considered drained.
This commit marks the noop stream as drained with cancel(), which allows
memory to be garbage collected sooner and exposes the race for tests.
That then showed the stream as hanging, because inFlightSubStreams
wasn't being decremented.
Fixes#9185
This change has these main aspects to it:
1. Removal of any name resolution responsibility from ManagedChannelImpl
2. Creation of a new RetryScheduler to own generic retry logic
- Can also be used outside the name resolution context
3. Creation of a new RetryingNameScheduler that can be used to wrap any
polling name resolver to add retry capability
4. A new facility in NameResolver to allow implementations to notify
listeners on the success of name resolution attempts
- RetryingNameScheduler relies on this
ExpectedException is deprecated, so I fixed the new warnings. However,
we are still using ExpectedException many places and had previously
supressed the warning. See
https://github.com/grpc/grpc-java/issues/7467 . I did not fix those
existing instances that had suppressed the warning, since it is
unrelated to the upgrade and we have been free to fix them at any time
since we dropped Java 7.
Use a volatile to publish the value even though it is final. TSAN
ignores the final aspect of a field, which is fair since another thread
may not see the parent's pointer become updated and use a stale value.
The lack of synchronization was clearly against lastServiceConfig's
documentation.
Fixes#9267
This helps to prevent retryable stream from using calloptions.executor when it shouldn't, e.g. call is already notified closed. It is done by delaying notifying upper layer (through masterListener).
Trying to upgrade Gradle to 7.6 improved the checkstyle plugin such that
it appears to have been running in new occasions. That in turn exposed
us to https://github.com/checkstyle/checkstyle/issues/5088. That bug was
fixed in 8.28, which also fixed lots of other bugs. So now we have
better checking and some existing volations needed fixing. Since the
code style fixes generated a lot of noise, this is a pre-fix to reduce
the size of a Gradle upgrade.
I did not upgrade past 8.28 because at some point some other bugs were
introduced, in particular with the Indentation module. I chose the
oldest version that had the particular bug impacting me fixed. Upgrading
to this old-but-newer version still makes it easier to upgrade to a
newer version in the future.
If an artifact on Maven Central exposes a type from gRPC on its API
surface, then consumers of that artifact need that gRPC API in the
compile classpath. Bazel handles this by making hjars for transitive
dependencies, but if the dependencies are runtime_deps then Bazel won't
generate hjars containing the needed symbols.
We don't export netty-shaded because the classes already don't match
Maven Central. If an artifact on Maven Central is exposing a
netty-shaded class on its API surface, it wouldn't work anyway since the
class simply doesn't exist for the Bazel build.
Fixes#9772
This change has these main aspects to it:
1. Removal of any name resolution responsibility from ManagedChannelImpl
2. Creation of a new RetryScheduler to own generic retry logic
- Can also be used outside the name resolution context
3. Creation of a new RetryingNameScheduler that can be used to wrap any
polling name resolver to add retry capability
4. A new facility in NameResolver to allow implementations to notify
listeners on the success of name resolution attempts
- RetryingNameScheduler relies on this
updateAndGet is only available in API level 24+. It is unclear why
AnimalSniffer didn't detect this.
This was noticed when doing the import, but the Android CI has also been
failing because of it.
Add big negative integer to pending stream count when cancelled. The count is used to delay closing master listener until streams fully drained.
Increment pending stream count before creating one. The count is also used to indicate callExecutor is safe to be used. New stream will not be created if big negative number was added, i.e. stream cancelled. New stream is created if not cancelled, callExecutor is safe to be used, because cancel will be delayed.
Create new streams (retry, hedging) is moved to the main thread, before callExecutor calls drain.
Minor refactor the masterListener.close() scenario.
Certain status codes are inappropriate for a control plane to be
returning and indicate a bug in the control plane. These status codes
will be replaced by INTERNAL, to make it clearer to see that the control
plane is misbehaving.
My motivation for making this change is that [`ByteBuffer` is becoming
`sealed`](https://download.java.net/java/early_access/loom/docs/api/java.base/java/nio/ByteBuffer.html)
in new versions of Java. This makes it impossible for Mockito's
_current_ default mockmaker to mock it.
That said, Mockito will likely [switch its default
mockmaker](https://github.com/mockito/mockito/issues/2589) to an
alternative that _is_ able to mock `sealed` classes. However, there are
downside to that, such as [slower
performance](https://github.com/mockito/mockito/issues/2589#issuecomment-1192725206),
so it's probably better to leave our options open by avoiding mocking at
all.
And in this case, it's equally easy to use real objects.
As a bonus, I think that real objects makes the code a little easier to
follow: Before, we created mocks that the code under test never
interacted with in any way. (The code just passed them through to a
delegate.) When I first read the tests, I was confused, since I assumed
that the mock we were creating was the same mock that we then passed to
`verify` at the end of the method. That turned out not to be the case.
Forwarding acceptResolvedAddresses() to a delegate in ForwardingLoadBalancer can cause problems if an extending class expects its handleResolvedAddresses implementation to be called even when a client calls handleResolvedAddresses(). This would not happen as ForwardingLoadBalancer would directly send the call to the delegate.
* core,api,auth: Choose the callOptions executor when applying request metadata to credentials during newStream based upon whether AppEngineCredentials are being used as they require a specific thread to do the processing.
Add an interface to differentiate whether the specific thread is needed.
Fixes b/244209681
Introduces a new acceptResolvedAddresses() method in LoadBalancer that
is to ultimately replace the existing handleResolvedAddresses(). The new
method allows the LoadBalancer implementation to reject the given
addresses by returning false from the method.
The long deprecated handleResolvedAddressGroups is also removed.
This was already being done for the Listener, but it was missed for the
ClientCall draining itself. That's not too surprising, since very few
things should be looking at the context in that path. We don't care too
much about this client call case, but if the context _does_ end up
mattering it could be painful to debug.
Fixes#9478
Implements a fake load balancer with round robin like behavior in order
to test an outlier detection scenario where a subchannel goes from
having multiple associated addresses to one.