This will be used for gRFC A66's OTel per-RPC metric label:
> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`).
The majority of the changes are to move target computation from
ManagedChannelImpl into the builder. A small hack API was added to
ManagedChannelBuilder to get the target to create an interceptor.
This should preserve all the existing behavior of GlobalInterceptors as
used by grpc-gcp-observability, including it disabling the implicit
OpenCensus integration.
Both the old and new API are internal. I hid Configurator and
ConfiguratorRegistry behind Internal-prefixed classes, like had been
done with GlobalInterceptors to further discourage use until the API is
ready.
GlobalInterceptorsTest was modified to become ConfiguratorRegistryTest.
This adds the following components that are required for gRPC A79
non-per-call metrics architecture.
- MetricSink implementation for gRPC OpenTelemetry
- Configurator for plumbing per call metrics ClientInterceptor and
ServerStreamTracer.Factory via unified OpenTelemetryModule.
gRFC A78 has WRR and pick-first include a `grpc.target` label, defined
in A66:
> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`). For channels such as inprocess channels
> where a target URI is not available, implementations can synthesize a
> target URI.
As part of gRFC A78:
> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.
* added MetricRecorderImpl and unit tests for MetricInstrumentRegistry
* updated MetricInstrumentRegistry to use array instead of ArrayList
* renamed record<>Counter APIs to add<>Counter. Added check for mismatched label values
* added lock for instruments array
Handles Netty write frame failures caused by issues in the Netty
itself.
Normally we don't need to do anything on frame write failures because
the cause of a failed future would be an IO error that resulted in
the stream closure. Prior to this PR we treated these issues as a
noop, except the initial headers write on the client side.
However, a case like netty/netty#13805 (a bug in generating next
stream id) resulted in an unclosed stream on our side. This PR adds
write frame future failure handlers that ensures the stream is
cancelled, and the cause is propagated via Status.
Fixes#10849
The recommended way to load dependencies from `rules_jvm_external`
is to make use of the `@maven` workspace, and the most readable
way of doing that is to use the `artifact` macro provides.
This removes the need to generate the "compat" namespaces, which
`rules_jvm_external` provided for backwards compatibility with
older releases. This change also sets things up for supporting
`bzlmod`: this requires all workspaces accessed by a library to
be named "up front" in the `MODULE.bazel` file. This way, the
only repo that needs to be exported is `@maven`, rather than the
current huge list.
The name resolver takes some time before it returns addresses. While waiting the channel will be IDLE instead of the proper CONNECTING. This generally doesn't matter since RPCs behave similarly for IDLE and CONNECTING, but is confusing for users when watching channel.getState() closely.
Fixes#10517.
We provided extra details when the RPC is killed by CallOptions'
Deadline, but didn't do the same for Context.
To avoid duplicating code, things were restructured, including the
threading. There are more code flows now, but I think the
multi-threading came out more obvious and less error-prone. I didn't
change the status when the deadline is already expired, because the
text is shared with DelayedClientCall and AbstractInteropTest doesn't
distinguish between the two cases.
This is a roll-forward that avoids a NPE when cancel() is called
without an earlier call to start().
As seen at b/300991330
* Allow the queued byte threshold for a Stream to be ready to be configurable
- on clients this is exposed by setting a CallOption
- on servers this is configured by calling a method on ServerCall or ServerStreamListener
We provided extra details when the RPC is killed by CallOptions'
Deadline, but didn't do the same for Context.
To avoid duplicating code, things were restructured, including the
threading. There are more code flows now, but I think the
multi-threading came out more obvious and less error-prone. I didn't
change the status when the deadline is already expired, because the
text is shared with DelayedClientCall and AbstractInteropTest doesn't
distinguish between the two cases.
As seen at b/300991330
SelfSignedCertificate is not available on Java 17 because
OpenJdkSelfSignedCertGenerator is not available. This only impacted
tests.
AccessController is being removed, and these locations are doing simple
reflection which is unlikely to require it even when a security policy
is in effect. There's other places we do reflection without the
AccessController, so either no security policies care or the users can
update their policies to allow it.
We can just compare the Deadline instances instead of asserting that
very little time has passed during the test. Real time probably still
matters in the test, but only insofar that the deadline is not expired
by the time ClientCallImpl sees it.
This fixes a test failure seen in the emulated aarch64 CI. Note that the
message says "ns" when it should say "ms", but this change deletes the
code with that typo.
```
java.lang.AssertionError: timeout: 548 ns
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at io.grpc.internal.ClientCallImplTest.assertTimeoutBetween(ClientCallImplTest.java:1102)
at io.grpc.internal.ClientCallImplTest.contextDeadlineShouldBePropagatedToStream(ClientCallImplTest.java:828)
```
Add the 'fake' dependency to grpc-netty instead of grpc-core.
grpc-okhttp already depends on grpc-util and probably would be fine
without round_robin on Android.
There's not actually a circular dependency, but some tools can't handle
the compile vs runtime distinction. Such tools are broken, but fixes
have been slow and this approach works with no real downfalls.
Works around #10576#10701
* Provide a default implementation for new method added to ManagedTransport.Listener to support ClientTransportFilters
* Relax test constraint to reduce flakiness due to timing.
* Add test for listener.filterTransport.
This fixes two bugs in outbound message size checking:
* When thet checke failed, the thrown StatusRuntimeException with a status code
of RESOURCE_EXHAUSTED was been caught and rewrapped in another
StatusRuntimeException but this time with status code UNKNOWN.
* This applies the max outbound message size check to messages prior to, and
after compression, since compression of a message that is smaller than the
maximum send size can result in a larger message that exceeds the maximum
send size.
Originally you had to confirm that awaitTermination() returned true, but
that was annoying and useless, especially after calling shutdownNow().
The behavior was changed in ce2ae1fb because the awaitTermination()
detection logic could prevent the channel from getting garbage
collected.
Fixes#10732
This change has health checking consumer (new pick first) to install a listener through and health checking producer (outlier detection and client health checking) producing health checks. Health notification chain is built reusing the previous connectivity state chain.
Pickfirst installs the health listener, and is capable of detecting when no health checking producer is installed in the system. In that case, it sets health status to be READY so that health system is no-op.
This may help some to move closer to Providers. It especially helps
cases where `NameResolverFactory`s aren't returning `InetSocketAddress`,
as it allows them to override `getProducedSocketAddressTypes()`, which
will now fail starting in 15fc70be.
Adds a new module grpc-opentelemetry that integrates OpenTelemetry and focuses on metrics.
OpenTelemetry APIs are used for instrumenting metrics collection. Users are expected to provide SDK with implementations.
If no SDK is passed, by default gRPC uses OpenTelemetry.noop().
* Update picker logic per A61 that it no longer pays attention to the first 2 elements, but rather takes the first ring element not in TF and uses that.
---------
Pulled in by rebase:
Eric Anderson (android: Remove unneeded proguard rule 44723b6)
Terry Wilson (stub: Deprecate StreamObservers b5434e8)
This is currently the only place where we return Status.UNKNOWN with no description, which makes is harder to debug and differentiate from statuses originated from non-grpc sources.
This PR enriches ServerImpl's #internalClose `Status.UNKNOWN` with description `Application error processing RPC`.
* core, netty, okhttp: implement new logic for nameResolverFactory API in channelBuilder
fix ManagedChannelImpl to use NameResolverRegistry instead of NameResolverFactory
fix the ManagedChannelImplBuilder and remove nameResolverFactory
* Integrate target parsing and NameResolverProvider searching
Actually creating the name resolver is now delayed to the end of
ManagedChannelImpl.getNameResolver; we don't want to call into the name
resolver to determine if we should use the name resolver.
Added getDefaultScheme() to NameResolverRegistry to avoid needing
NameResolver.Factory.
---------
Co-authored-by: Eric Anderson <ejona@google.com>
Instead of a boolean, we now return a Status object. Status.OK
represents accepted addresses and other non-acceptance. This allows the
LB to provide more information about why a set of addresses were not
acceptable.
The status will later be sent to the name resolver as well to allow it
to also better react to to bad addresses.
This breaks the ABI of the classes listed below.
Users that recompiled their code using grpc-java [`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) (released on
Feb 23, 2021) and later, ARE NOT AFFECTED.
Users that compiled their source using grpc-java earlier than
[`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) need to
recompile when upgrading to grpc-java `v1.59.0`. Otherwise the code
will fail on runtime with `NoSuchMethodError`. For example, code:
```java
NettyChannelBuilder.forTarget("localhost:100").maxRetryAttempts(2);
```
Will fail with
> `java.lang.NoSuchMethodError: 'io.grpc.internal.AbstractManagedChannelImplBuilder
io.grpc.netty.NettyChannelBuilder.maxRetryAttempts(int)'`
**Affected classes**
Class `AbstractManagedChannelImplBuilder` is deleted, and no longer in
the class hierarchy of the channel builders:
- `io.grpc.netty.NettyChannelBuilder`
- `io.grpc.okhttp.OkhttpChannelBuilder`
- `grpc.cronet.CronetChannelBuilder`
This breaks the ABI of the classes listed below.
Users that recompiled their code using grpc-java [`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) (released on
Feb 23, 2021) and later, ARE NOT AFFECTED.
Users that compiled their source using grpc-java earlier than
[`v1.36.0`]
(https://github.com/grpc/grpc-java/releases/tag/v1.36.0) need to
recompile when upgrading to grpc-java `v1.59.0`. Otherwise the code
will fail on runtime with `NoSuchMethodError`. For example, code:
```java
NettyServerBuilder.forPort(80).directExecutor();
```
Will fail with
> `java.lang.NoSuchMethodError: 'io.grpc.internal.AbstractServerImplBuilder
io.grpc.netty.NettyServerBuilder.directExecutor()'`
**Affected classes**
Class `AbstractServerImplBuilder` is deleted, and no longer in the
class hierarchy of the server builders:
- `io.grpc.netty.NettyServerBuilder`
- `io.grpc.inprocess.InProcessServerBuilder`
When a memory leak occurs, it is really helpful to have access records
to understand where the buffer was being held when it leaked. retain()
when we create the NettyReadableBuffer already creates an access record
the ByteBuf, so here we track when the ByteBuf is passed to another
thread.
See #8330
* Use internalClose instead of close when sendMessage has a RuntimeException.
* Change argument to internalClose to a Throwable instead of a Status.
* Rename internalClose to handleInternalError
* Eliminate NPE by skipping further processing when stream is defined, but doesn't have a property for streamKey (header processing identified an error)
Fixes#10364
* Add unit test for missing content type
Encode the service authority before passing it into gRPC util in the xDS name resolver to handle xDS requests which might contain multiple slashes. Example: xds:///path/to/service:port.
As currently the underlying Java URI library does not break the encoded authority into host/port correctly simplify the check to just look for '@' as we are only interested in checking for user info to validate the authority for HTTP.
This change also leads to few changes in unit tests that relied on this check for invalid authorities which now will be considered valid.
Just like #9376, depending on Guava packages such as URLEscapers or PercentEscapers leads to internal failures(Ex: Unresolvable reference to com.google.common.escape.Escaper from io.grpc.internal.GrpcUtil). To avoid these issues create an in house version that is heavily inspired by grpc-go/grpc.
Wrapping the DnsNameResolver in DnsNameResolverProvider can cause
problems to external name resolvers that delegate to a DnsResolver
already wrapped in RetryingNameResolver. ManagedChannelImpl would
end up wrapping these name resolvers again, causing an exception
later from a RetryingNameResolver safeguard that checks for double
wrapping.
Motivation:
When multiple NameResolvers are created, the Classloader is scanned every time trying to figure out if the Platform is Android. This expensive work could be done only once.
Modification:
Cache isAndroid resolution in a constant.
Result:
Less expensive multiple NameResolvers instantiation.
ManagedCahnnelImpl did not make sure to use a RetryingNameResolver if
authority was not overriden. This was not a problem for DNS name
resolution as the DNS name resolver factory explicitly returns a
RetryingNameResolver. For polling name resolvers that do not do this in
their factories (like the grpclb name resolver) this meant not having retry
at all.
* context, all: move Context classes to grpc-api
clean up grpc-context since it has no source code: only add dep on grpc-api
add exclusion for all transitive deps of grpc-api - only guava
exclude grpc-context as a dependency from grpc-alts because all context code is in grpc-api now
api: 1.7 as target Java version for Context source-set of grpc-api
* core, census: fix the issues with android project pulling in old grpc-context version
* api,context: make changes to bazel build files to account for context code moving from context to api
Since 44847bf4e, when we upgraded our JUnit version, the JUnit
exclusions have probably not been necessary. e0ac97c4f upgraded
Robolectric to a version that had the auto.service problem fixed.
This avoids the (often missing) evaluationDependsOn and fixes using
results from other projects without propagating those through
Configuration. It also reduces the number of useless classes pulled in
by down-stream tests, reducing the probability of rebuilds.
The expectation of fixtures is they help testing down-stream code that
use the classes in main. That applies to all the classes here except for
FakeClock and StaticTestingClassLoader. It would also apply to many
internal classes in grpc-testing, but let's consider cleaning that up
future work.
LoadBalancers in general should never throw, but
parseLoadBalancingConfig() in particular has a return value to
communicate the error. Throwing can be a bit unpredictable, but at its
most trivial form causes a channel panic. There's no reason to throw
explicitly and calls to JsonUtil have to be protected by a try-catch
because it can throw.
ReadyPicker hasn't been necessary since 111ff60e, when we stopped
calling super.pickSubchannel(). EmptyPicker was only used in tests, and
we can just compare the class name instead of doing an instanceof check.
Unfortunately, calling getClass() caused Java to start casting the
return value of pickerCaptor.getValue() based on its generics. Captors
don't verify the type they capture, so using any type other than
SubchannelPicker for the pickerCaptor is misleading and hides a cast.
If provided with the new PickFirstLoadBalancerConfig,
PickFirstLoadBalancer will shuffle the list of addresses it receives
from the name resolver.
PickFirstLoadBalancerProvider will now support the new config
if enabled by an env variable.
When the subchannel is transitioning from TRANSIENT_FAILURE to either
IDLE or CONNECTING we will not update the LB state. Additionally, if
the subchannel becomes idle we request a new connection so that the
subchannel will keep on trying to establish a connection.
The problem was one hedge was committed before another had drained
start(). This was not testable because HedgingRunnable checks whether
scheduledHedgingRef is cancelled, which is racy, but there's no way to
deterministically trigger either race.
The same problem couldn't be triggered with retries because only one
attempt will be draining at a time. Retries with cancellation also
couldn't trigger it, for the surprising reason that the noop stream used
in cancel() wasn't considered drained.
This commit marks the noop stream as drained with cancel(), which allows
memory to be garbage collected sooner and exposes the race for tests.
That then showed the stream as hanging, because inFlightSubStreams
wasn't being decremented.
Fixes#9185