Commit Graph

2074 Commits

Author SHA1 Message Date
Kannan J 8b46ad58c3
Start 1.76.0 development cycle (#12258) 2025-08-05 22:00:39 +05:30
Eric Anderson 6ff8ecac09 core: Don't pre-compute DEADLINE_EXCEEDED message for delayed calls
The main reason I made a change here was to fix the tense from the
deadline "will be exceeded in" to "was exceeded after". But we really
don't want to be doing the string formatting unless the deadline is
actually exceeded. There were a few more changes to make some variables
effectively final.
2025-07-22 06:56:02 -07:00
Eric Anderson 1fc4ab0bb2 LBs should avoid calling LBs after lb.shutdown()
LoadBalancers shouldn't be called after shutdown(), but RingHashLb could
have enqueued work to the SynchronizationContext that executed after
shutdown(). This commit fixes problems discovered when auditing all LBs
usage of the syncContext for that type of problem.

Similarly, PickFirstLb could have requested a new connection after
shutdown(). We want to avoid that sort of thing too.

RingHashLb's test changed from CONNECTING to TRANSIENT_FAILURE to get
the latest picker. Because two subchannels have failed it will be in
TRANSIENT_FAILURE. Previously the test was using an older picker with
out-of-date subchannelView, and the verifyConnection() was too imprecise
to notice it was creating the wrong subchannel.

As discovered in b/430347751, where ClusterImplLb was seeing a new
subchannel being called after the child LB was shutdown (the shutdown
itself had been caused by RingHashConfig not implementing equals() and
was fixed by a8de9f07ab, which caused ClusterResolverLb to replace its
state):

```
java.lang.NullPointerException
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createClusterLocalityFromAttributes(ClusterImplLoadBalancer.java:322)
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createSubchannel(ClusterImplLoadBalancer.java:236)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.internal.PickFirstLeafLoadBalancer.createNewSubchannel(PickFirstLeafLoadBalancer.java:527)
	at io.grpc.internal.PickFirstLeafLoadBalancer.requestConnection(PickFirstLeafLoadBalancer.java:459)
	at io.grpc.internal.PickFirstLeafLoadBalancer.acceptResolvedAddresses(PickFirstLeafLoadBalancer.java:174)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.activate(LazyLoadBalancer.java:64)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.requestConnection(LazyLoadBalancer.java:97)
	at io.grpc.util.ForwardingLoadBalancer.requestConnection(ForwardingLoadBalancer.java:61)
	at io.grpc.xds.RingHashLoadBalancer$RingHashPicker.lambda$pickSubchannel$0(RingHashLoadBalancer.java:440)
	at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:96)
	at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:128)
	at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.onData(XdsClientImpl.java:817)
```
2025-07-17 12:56:33 +00:00
John Cormie 94532a6b56
binder: Introduce server pre-authorization (#12127)
grpc-binder clients authorize servers by checking the UID of the sender of the SETUP_TRANSPORT Binder transaction against some SecurityPolicy. But merely binding to an unauthorized server to learn its UID can enable "keep-alive" and "background activity launch" abuse, even if security policy ultimately decides the connection is unauthorized. Pre-authorization mitigates this kind of abuse by looking up and authorizing a candidate server Application's UID before binding to it. Pre-auth is especially important when the server's address is not fixed in advance but discovered by PackageManager lookup.
2025-07-10 14:14:36 -07:00
Eric Anderson 6dfa03c51c
core: grpc-timeout should always be positive (#12201)
PROTOCOL-HTTP2.md specifies "TimeoutValue → {positive integer as ASCII
string of at most 8 digits}". Zero is not positive, so it should be
avoided. So make sure timeouts are at least 1 nanosecond instead of 0
nanoseconds.

grpc-go recently began disallowing zero timeouts in
https://github.com/grpc/grpc-go/pull/8290 which caused a regression as
grpc-java can generate such timeouts. Apparently no gRPC implementation
had previously been checking for zero timeouts.

Instead of changing the max(0) to max(1) everywhere, just move the max
handling into TimeoutMarshaller, since every caller of TIMEOUT_KEY was
doing the same max() handling.

Before fd8fd517d (in 2016!), grpc-java actually behaved correctly, as it
failed RPCs with timeouts "<= 0". The commit changed the handling to the
max(0) handling we see now.

b/427338711
2025-07-03 11:44:04 +05:30
Abhishek Agrawal 919370172d
census: APIs for stats and tracing (#12050) 2025-07-01 20:44:28 +05:30
Eric Anderson af7efeb9f5 core: Rely on ping-pong for flow control testing
The previous code did a ping-pong to make sure the transport had enough
time to process, but then proceeded to sleep 5 seconds. That sleep would
have been needed without the ping-pong, but with the ping-pong we are
confident all events have been drained from the transport. Deleting the
unnecessary sleeps saves 10 seconds, for each of the 9 instances of this
test.
2025-06-25 04:52:46 +00:00
Eric Anderson ebc6d3e932 Start 1.75.0 development cycle 2025-06-25 04:52:17 +00:00
Eric Anderson d2d8ed8efa xds: Add logical dns cluster support to XdsDepManager
ClusterResolverLb gets the NameResolverRegistry from
LoadBalancer.Helper, so a new API was added in NameResover.Args to
propagate the same object to the name resolver tree.

RetryingNameResolver was exposed to xds. This is expected to be
temporary, as the retrying is being removed from ManagedChannelImpl and
moved into the resolvers. At that point, DnsNameResolverProvider would
wrap DnsNameResolver with a similar API to RetryingNameResolver and xds
would no longer be responsible.
2025-06-17 22:14:20 +00:00
Eric Anderson d88ef97a87 core: Remove RetryingNR.RESOLUTION_RESULT_LISTENER_KEY
It was introduced in fcb5c54e4 because at the time we didn't change the
API to communicate the status. When onResult2() was introduced in
90d0fabb1 this hack stopped being necessary.
2025-06-13 15:20:10 +00:00
MV Shiva 26bd0eee47
core: Use lazy message formatting in checkState (#12144) 2025-06-13 12:30:13 +05:30
John Cormie 4c73999102
Move all test helper classes out of AbstractTransportTest so they can be used elsewhere (#12125) 2025-06-05 15:39:51 +05:30
MV Shiva 59adeb9d47
Start 1.74.0 development cycle (#12061) 2025-05-13 17:15:36 +05:30
Eric Anderson 3f5fdf1266
core: Remove unnecessary SuppressWarnings("deprecation") (#12052) 2025-05-07 10:14:48 +05:30
Kim Jin Young 12aaf88d86
Fix comment's typo (#12045) 2025-05-05 22:32:31 +05:30
Eric Anderson f79ab2f16f api: Remove deprecated SubchannelPicker.requestConnection()
It has been deprecated since cec9ee368, six years ago. It was replaced
with LoadBalancer.requestConnection().
2025-04-09 12:51:33 -07:00
Eric Anderson 2db4852e23 core: Loop over interceptors when computing effective interceptors
A post-merge review of 8516cfef9 suggested this change and the comment
had been lost in my inbox.
2025-04-07 21:33:55 -07:00
Eric Anderson 5ca4d852ae
core: Avoid Set.removeAll() when passing a possibly-large List (#11994)
See #11958
2025-04-04 17:46:37 +05:30
Abhishek Agrawal d4c46a7f1f
refactor: prevents global stats config freeze in ConfiguratorRegistry.getConfigurators() (#11991) 2025-04-04 11:23:08 +05:30
Eric Anderson 908f9f19cd
core: Delete the long-deprecated GRPC_PROXY_EXP (#11988)
"EXP" stood for experimental and all documentation that referenced it made it clear it was experimental. It's been some years since we started logging a message when it was used to say it will be deleted. There's no time like the present to delete it.
2025-04-02 16:24:32 +05:30
Eric Anderson 8ca7c4ef1f
core: Delete stale SuppressWarnings("deprecated") for ATTR_LOAD_BALANCING_CONFIG (#11982)
ATTR_LOAD_BALANCING_CONFIG was deleted in bf7a42dbd.
2025-04-02 16:22:00 +05:30
Kannan J c28a7e3e06
okhttp: Per-rpc call option authority verification (#11754) 2025-04-02 10:10:41 +05:30
Abhishek Agrawal 8f6a16f846
Start 1.73.0 development cycle (#11987) 2025-04-01 16:27:39 +05:30
Alex Panchenko 7507a9ec06
core: Use java.time.Time.getNano in InstantTimeProvider without reflection (#11977)
Fixes #11975
2025-03-26 13:49:21 +05:30
Eric Anderson 3961a923ac
core: Log any exception during panic because of exception
panic() calls a good amount of code, so it could get another exception.
The SynchronizationContext is running on an arbitrary thread and we
don't want to propagate this secondary exception up its stack (to be
handled by its UncaughtExceptionHandler); it we wanted that we'd
propagate the original exception.

This second exception will only be seen in the logs; the first exception
was logged and will be used to fail RPCs.

Also related to http://yaqs/8493785598685872128 and b692b9d26
2025-03-24 14:32:53 -07:00
Alex Panchenko d60e6fc251
Replace usages of deprecated ExpectedException in grpc-api and grpc-core (#11962) 2025-03-21 13:00:24 +05:30
Abhishek Agrawal a57c14a51e
refactor: Stops exception allocation on channel shutdown
This fixes #11955.

Stops exception allocation and its propagation on channel shutdown.
2025-03-19 09:27:34 +05:30
Eric Anderson ca4819ac6d core: Apply ManagedChannelImpl's updateBalancingState() immediately
ffcc360ba adjusted updateBalancingState() to require being run within
the sync context. However, it still queued the work into the sync
context, which was unnecessary. This re-entering the sync context
unnecessarily delays the new state from being used.
2025-03-06 12:31:10 -08:00
Kannan J cdab410b81
netty: Per-rpc authority verification against peer cert subject names (#11724)
Per-rpc verification of authority specified via call options or set by the LB API against peer cert's subject names.
2025-02-24 20:28:11 +05:30
Kannan J a132123c93
Start 1.72.0 development cycle (#11907) 2025-02-18 19:46:02 +05:30
Alex Panchenko 5a7f350537
optimize number of buffer allocations (#11879)
Currently this improves 2 flows

1. Known length message which length is greater than 1Mb. Previously the
first buffer was 1Mb, and then many buffers of 4096 bytes (from
CodedOutputStream), now subsequent buffers are also up to 1Mb

2. In case of compression, the first write is always 10 bytes buffer
(gzip header), but worth allocating more space
2025-02-14 05:59:21 -08:00
MV Shiva 7585b1607d
core: remember last pick status in no real stream (#11851) 2025-02-14 11:38:06 +05:30
Naveen Prasanna V 302342cfce
core: logging the error message when onClose() itself fails (#11880) 2025-02-11 11:38:07 +05:30
Benjamin Peterson dc316f7fd9 Add missing frame release to Http2ClientStreamTransportState.
If a data frame is received before headers, processing of the frame is abandoned. The frame must be released in that case.
2025-02-10 21:40:05 -08:00
Abhishek Agrawal 44e92e2c2c
core: updates the backoff range as per the A6 redefinition (#11858)
* core: updates the backoff range being used from [0, 1] to [0.8, 1.2] as per the A6 redefinition

* adds a flag for experimental jitter

* xds: Allow FaultFilter's interceptor to be reused

This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filter's interceptors can
already be reused.

There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.

Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.

* netty: Removed 4096 min buffer size (#11856)

* netty: Removed 4096 min buffer size

* turns the flag in a var for better efficiency

---------

Co-authored-by: Eric Anderson <ejona@google.com>
2025-02-05 14:18:20 -08:00
Eric Anderson 90aefb26e7 core: Propagate authority override from LB exactly once
Setting the authority is only useful when creating a real stream, as
there will be a following pick otherwise. In addition, DelayedStream
will buffer each call to setAuthority() in a list and we don't want that
memory usage. Note that no LBs are using this feature yet, so users
would not have been exposed to the memory use.

We also needed to setAuthority() when the LB selected a subchannel on
the first pick attempt.
2025-01-30 08:08:17 -08:00
Larry Safran 87aa6deadf
core:Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful (#11849)
* Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful.
Move cleanup of removed subchannels into a method so it can be called from 2 places in acceptResolvedAddresses.
Since the seek could mean we never looked at the first address, if we go off the end of the index and haven't looked at the all of the addresses then instead of scheduleBackoff() we reset the index and request a connection.
2025-01-24 16:42:56 -08:00
Larry Safran 228dcf7a01
core: Alternate ipV4 and ipV6 addresses for Happy Eyeballs in PickFirstLeafLoadBalancer (#11624)
* Interweave ipV4 and ipV6 addresses as per gRFC.
2025-01-13 16:38:16 -08:00
Eric Anderson 70825adce6 Replace jsr305's GuardedBy with Error Prone's
We should avoid jsr305 and error prone's has the same semantics.
2025-01-10 08:16:48 -08:00
Eric Anderson 7b5d0692cc
Replace jsr305's CheckReturnValue with Error Prone's (#11811)
We should avoid jsr305 and error prone's has the same semantics.

Fixes #8687
2025-01-09 13:45:35 -08:00
MV Shiva ae109727d3
Start 1.71.0 development cycle (#11801) 2025-01-07 14:33:09 +05:30
Larry Safran 4222f77587
xds:Move creating the retry timer in handleRpcStreamClosed to as late as possible and call close() (#11776)
* Move creating the retry timer in handleRpcStreamClosed to as late as possible and call `close` so that the `call` is cancelled.
Also add some debug logging.
2025-01-06 13:09:42 -08:00
Eric Anderson 805cad3782 bazel: Restore DoNotCall ErrorProne check
In e08b9db20 we added `@DoNotCall` annotations to some call sites, but
Bazel used an older version of ErrorProne that complained at times it
shouldn't. The minimum version of Bazel we test/support is now Bazel 6,
well past Bazel 3.4+.
2024-12-23 12:45:42 -08:00
Alex Panchenko ebe2b48677
api: StatusRuntimeException without stacktrace - Android compatibility (#11072)
This is an alternative to e36f099be9 that avoids the "fillInStaceTrace"
constructor which is only available starting at Android API level 24.
2024-12-21 17:09:57 -08:00
John Cormie 0b2d44098f
Introduce custom NameResolver.Args (#11669)
grpc-binder's upcoming AndroidIntentNameResolver needs to know the target Android user so it can resolve target URIs in the correct place. Unfortunately, Android's built in intent:// URI scheme has no way to specify a user and in fact the android.os.UserHandle object can't reasonably be encoded as a String at all.

We solve this problem by extending NameResolver.Args with the same type-safe and domain-specific Key<T> pattern used by CallOptions, Context and CreateSubchannelArgs. New "custom" arguments could apply to all NameResolvers of a certain URI scheme, to all NameResolvers producing a particular type of java.net.SocketAddress, or even to a specific NameResolver subclass.
2024-12-19 23:32:47 -08:00
Eric Anderson 8ea3629378
Re-enable animalsniffer, fixing violations
In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.

But while it was broken violations crept in. Most violations weren't
too important and we're not surprised went unnoticed. For example, Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.

One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment.  The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.

[PR11066]: https://github.com/grpc/grpc-java/pull/11066
2024-12-19 07:54:54 -08:00
Eric Anderson 6055adca5a
core: Simplify DnsNameResolver by using ObjectPool
ObjectPool is our standard solution for dealing with the
sometimes-shutdown resources. This was implemented by a contributor not
familiar with regular tools.

There are wider changes that can be made here, but I chose to just do a
smaller change because this class is used by GrpclbNameResolver.
2024-12-10 12:53:24 -08:00
Kannan J 229a010f55
Start 1.70.0 development cycle (#11708)
* Start 1.70.0 development cycle
2024-11-26 21:36:50 +05:30
Vindhya Ningegowda 20d09cee57
xds: Add counter and gauge metrics (#11661)
Adds the following xDS client metrics defined in [A78](https://github.com/grpc/proposal/blob/master/A78-grpc-metrics-wrr-pf-xds.md#xdsclient).

Counters
- grpc.xds_client.server_failure
- grpc.xds_client.resource_updates_valid
- grpc.xds_client.resource_updates_invalid

Gauges
- grpc.xds_client.connected
- grpc.xds_client.resources
2024-11-25 16:47:32 -08:00
vinodhabib 4ae04b7d94
core: increased test tolerance to 1 second
Fixes #11680
2024-11-18 07:42:51 -08:00