Commit Graph

2045 Commits

Author SHA1 Message Date
Kannan J a132123c93
Start 1.72.0 development cycle (#11907) 2025-02-18 19:46:02 +05:30
Alex Panchenko 5a7f350537
optimize number of buffer allocations (#11879)
Currently this improves 2 flows

1. Known length message which length is greater than 1Mb. Previously the
first buffer was 1Mb, and then many buffers of 4096 bytes (from
CodedOutputStream), now subsequent buffers are also up to 1Mb

2. In case of compression, the first write is always 10 bytes buffer
(gzip header), but worth allocating more space
2025-02-14 05:59:21 -08:00
MV Shiva 7585b1607d
core: remember last pick status in no real stream (#11851) 2025-02-14 11:38:06 +05:30
Naveen Prasanna V 302342cfce
core: logging the error message when onClose() itself fails (#11880) 2025-02-11 11:38:07 +05:30
Benjamin Peterson dc316f7fd9 Add missing frame release to Http2ClientStreamTransportState.
If a data frame is received before headers, processing of the frame is abandoned. The frame must be released in that case.
2025-02-10 21:40:05 -08:00
Abhishek Agrawal 44e92e2c2c
core: updates the backoff range as per the A6 redefinition (#11858)
* core: updates the backoff range being used from [0, 1] to [0.8, 1.2] as per the A6 redefinition

* adds a flag for experimental jitter

* xds: Allow FaultFilter's interceptor to be reused

This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filter's interceptors can
already be reused.

There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.

Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.

* netty: Removed 4096 min buffer size (#11856)

* netty: Removed 4096 min buffer size

* turns the flag in a var for better efficiency

---------

Co-authored-by: Eric Anderson <ejona@google.com>
2025-02-05 14:18:20 -08:00
Eric Anderson 90aefb26e7 core: Propagate authority override from LB exactly once
Setting the authority is only useful when creating a real stream, as
there will be a following pick otherwise. In addition, DelayedStream
will buffer each call to setAuthority() in a list and we don't want that
memory usage. Note that no LBs are using this feature yet, so users
would not have been exposed to the memory use.

We also needed to setAuthority() when the LB selected a subchannel on
the first pick attempt.
2025-01-30 08:08:17 -08:00
Larry Safran 87aa6deadf
core:Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful (#11849)
* Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful.
Move cleanup of removed subchannels into a method so it can be called from 2 places in acceptResolvedAddresses.
Since the seek could mean we never looked at the first address, if we go off the end of the index and haven't looked at the all of the addresses then instead of scheduleBackoff() we reset the index and request a connection.
2025-01-24 16:42:56 -08:00
Larry Safran 228dcf7a01
core: Alternate ipV4 and ipV6 addresses for Happy Eyeballs in PickFirstLeafLoadBalancer (#11624)
* Interweave ipV4 and ipV6 addresses as per gRFC.
2025-01-13 16:38:16 -08:00
Eric Anderson 70825adce6 Replace jsr305's GuardedBy with Error Prone's
We should avoid jsr305 and error prone's has the same semantics.
2025-01-10 08:16:48 -08:00
Eric Anderson 7b5d0692cc
Replace jsr305's CheckReturnValue with Error Prone's (#11811)
We should avoid jsr305 and error prone's has the same semantics.

Fixes #8687
2025-01-09 13:45:35 -08:00
MV Shiva ae109727d3
Start 1.71.0 development cycle (#11801) 2025-01-07 14:33:09 +05:30
Larry Safran 4222f77587
xds:Move creating the retry timer in handleRpcStreamClosed to as late as possible and call close() (#11776)
* Move creating the retry timer in handleRpcStreamClosed to as late as possible and call `close` so that the `call` is cancelled.
Also add some debug logging.
2025-01-06 13:09:42 -08:00
Eric Anderson 805cad3782 bazel: Restore DoNotCall ErrorProne check
In e08b9db20 we added `@DoNotCall` annotations to some call sites, but
Bazel used an older version of ErrorProne that complained at times it
shouldn't. The minimum version of Bazel we test/support is now Bazel 6,
well past Bazel 3.4+.
2024-12-23 12:45:42 -08:00
Alex Panchenko ebe2b48677
api: StatusRuntimeException without stacktrace - Android compatibility (#11072)
This is an alternative to e36f099be9 that avoids the "fillInStaceTrace"
constructor which is only available starting at Android API level 24.
2024-12-21 17:09:57 -08:00
John Cormie 0b2d44098f
Introduce custom NameResolver.Args (#11669)
grpc-binder's upcoming AndroidIntentNameResolver needs to know the target Android user so it can resolve target URIs in the correct place. Unfortunately, Android's built in intent:// URI scheme has no way to specify a user and in fact the android.os.UserHandle object can't reasonably be encoded as a String at all.

We solve this problem by extending NameResolver.Args with the same type-safe and domain-specific Key<T> pattern used by CallOptions, Context and CreateSubchannelArgs. New "custom" arguments could apply to all NameResolvers of a certain URI scheme, to all NameResolvers producing a particular type of java.net.SocketAddress, or even to a specific NameResolver subclass.
2024-12-19 23:32:47 -08:00
Eric Anderson 8ea3629378
Re-enable animalsniffer, fixing violations
In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.

But while it was broken violations crept in. Most violations weren't
too important and we're not surprised went unnoticed. For example, Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.

One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment.  The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.

[PR11066]: https://github.com/grpc/grpc-java/pull/11066
2024-12-19 07:54:54 -08:00
Eric Anderson 6055adca5a
core: Simplify DnsNameResolver by using ObjectPool
ObjectPool is our standard solution for dealing with the
sometimes-shutdown resources. This was implemented by a contributor not
familiar with regular tools.

There are wider changes that can be made here, but I chose to just do a
smaller change because this class is used by GrpclbNameResolver.
2024-12-10 12:53:24 -08:00
Kannan J 229a010f55
Start 1.70.0 development cycle (#11708)
* Start 1.70.0 development cycle
2024-11-26 21:36:50 +05:30
Vindhya Ningegowda 20d09cee57
xds: Add counter and gauge metrics (#11661)
Adds the following xDS client metrics defined in [A78](https://github.com/grpc/proposal/blob/master/A78-grpc-metrics-wrr-pf-xds.md#xdsclient).

Counters
- grpc.xds_client.server_failure
- grpc.xds_client.resource_updates_valid
- grpc.xds_client.resource_updates_invalid

Gauges
- grpc.xds_client.connected
- grpc.xds_client.resources
2024-11-25 16:47:32 -08:00
vinodhabib 4ae04b7d94
core: increased test tolerance to 1 second
Fixes #11680
2024-11-18 07:42:51 -08:00
Eric Anderson 4e8f7df589
util: Remove resolvedAddresses from MultiChildLb.ChildLbState
It isn't actually used by MultiChildLb, and using the health API gives
us more confidence that health is properly plumbed.
2024-11-14 12:56:24 -08:00
Eric Anderson 1993e68b03
Upgrade depedencies (#11655) 2024-11-01 07:50:08 -07:00
Kannan J c167ead851
xds: Per-rpc rewriting of the authority header based on the selected route. (#11631)
Implementation of A81.
2024-10-30 21:11:41 +05:30
vinodhabib 9176b55286
core: Make timestamp usage in Channelz use nanos from Java.time.Instant when available (#11604)
When java.time.Instant is available use the timestamp from this class in nano precision rather than using System.currentTimeInMillis and converting it to nanos.

Fixes #5494.
2024-10-29 10:19:47 +05:30
Eric Anderson 31dad6af49 Start 1.69.0 development cycle 2024-10-24 10:57:29 -07:00
MV Shiva b65cbf5081
inprocess: Support tracing message sizes guarded by flag (#11629) 2024-10-24 01:22:41 +05:30
erm-g 4be69e3f8a
core: SpiffeUtil API for extracting Spiffe URI and loading TrustBundles (#11575)
Additional API for SpiffeUtil:
 - extract Spiffe URI from certificate chain
 - load Spiffe Trust Bundle from filesystem [json spec][] [JWK spec][]

JsonParser was changed to reject duplicate keys in objects.

[json spec]: https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE_Trust_Domain_and_Bundle.md
[JWK spec]: https://github.com/spiffe/spiffe/blob/main/standards/X509-SVID.md#61-publishing-spiffe-bundle-elements
2024-10-17 11:11:07 -07:00
Eric Anderson b692b9d26e core: Handle NR/LB exceptions when panicking
If a panic is followed a panic, we'd ignore the second. But if an
exception happens while entering panic mode we may fail to update the
picker with the first error. This is "fine" from a correctness
standpoint; all bets are off when panicking and we've already logged the
first error. But failing RPCs can often be more easily seen than just
the log.

Noticed because of http://yaqs/8493785598685872128
2024-10-16 13:26:45 -07:00
MV Shiva ca43d78f58
inprocess: Support tracing message sizes (#11406) 2024-10-11 10:28:51 +05:30
yifeizhuang 2129078dee
core: fix test flakiness in retriableStream hedging deadlock test (#11606) 2024-10-08 17:44:40 -07:00
yifeizhuang 2aae68e117
report uncompressed message size when it does not need compression (#11598) 2024-10-07 10:44:27 -07:00
Kannan J 1ded8aff81
On result2 resolution result have addresses or error (#11330)
Combined success / error status passed via ResolutionResult to the NameResolver.Listener2 interface's onResult2 method - Addresses in the success case or address resolution error in the failure case now get set in ResolutionResult::addressesOrError by the internal name resolvers.
2024-10-07 17:55:56 +05:30
Larry Safran 9bb06af963
Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time (#11520)
* Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time if environment variable GRPC_SERIALIZE_RETRIES == true.

Cache serializingRetries value so that it doesn't have to look up the flag every time.

Clear the correct task when READY in processSubchannelState and move the logic to cancelScheduledTasks

Cleanup based on PR review

remove unneeded checks for shutdown.

* Fix previously broken tests

* Shutdown previous subchannel when run off end of index.

* Provide option to disable subchannel retries to let PFLeafLB take control of retries.

* InternalSubchannel internally goes to IDLE when sees TF when reconnect is disabled.
Remove an extra index.increment in LeafLB
2024-10-02 17:03:47 -07:00
Kannan J fa26a8bc5e
Make address resolution error use the default service config (#11577)
Fixes #11040.
2024-10-01 11:21:08 +05:30
erm-g 1c069375ce
core: SpiffeId parser (#11490)
SpiffeId parser compliant with [official spec](https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE-ID.md)
2024-09-26 12:01:11 -04:00
Larry Safran 5de65a6d50
use an attribute from resolved addresses IS_PETIOLE_POLICY to control whether or not health checking is supported (#11513)
* use an attribute from resolved addresses IS_PETIOLE_POLICY to control whether or not health checking is supported so that top level versions can't do any health checking, while those under petiole policies can.

Fixes #11413
2024-09-06 11:43:07 -07:00
Eric Anderson 721d063d55 core: touch() buffer when detach()ing
Detachable lets a buffer outlive its original lifetime. The new lifetime
is application-controlled. If the application fails to read/close the
stream, then the leak detector wouldn't make clear what code was
responsible for the buffer's lifetime. With this touch, we'll be able to
see detach() was called and thus know the application needs debugging.

Realized when looking at b/364531464, although I think the issue is
unrelated.
2024-09-05 14:39:33 -07:00
MV Shiva 8adfbf9ac5
Start 1.68.0 development cycle (#11507) 2024-09-04 19:33:28 +05:30
Vindhya Ningegowda 1dae144f0a
xds: Fix load reporting when pick first is used for locality-routing. (#11495)
* Determine subchannel's network locality from connected address, instead of assuming that all addresses for a subchannel are in the same locality.
2024-08-31 16:07:53 -07:00
Eric Anderson ee3ffef3ee core: In PF, disjoint update while READY should transition to IDLE
This is the same as if we received a GOAWAY. We wait for the next RPC to
begin connecting again. This is InternalSubchannel's behavior.
2024-08-22 11:23:11 -07:00
Eric Anderson 9762945f81 core: In PF, remove extraneous index.reset()
The index was just reset by updateGroups().
2024-08-21 07:16:29 -07:00
Eric Anderson 82a8d57396 core: In PF, remove useless requestConnection for CONNECTING subchannel
It doesn't do anything.

Call scheduleNextConnection() unconditionally since it is responsible
for checking if `enableHappyEyeballs == true`. It's also surprising to
check in the CONNECTING case but not the IDLE case.
2024-08-21 07:16:29 -07:00
Eric Anderson 2c93791c98 core: PF.requestConnection() is possible when READY
requestConnection() is public API, and it is allowed to be called even
if the load balancer is already READY.
2024-08-21 07:16:29 -07:00
Eric Anderson 4914ffc59a core: Avoid exception handling in PF for invalid index
It is trivial to avoid the exception from
addressIndex.getCurrentAddress(). The log message was inaccurate, as the
subchannel might have been TRANSIENT_FAILURE. The only important part of
the condition was whether the subchannel was the current subchannel.
2024-08-21 07:16:29 -07:00
Eric Anderson 33687d3983 core: Remove useless NPE check for syncContext in PF
It will never throw, because it would only throw if helper is null, but
helper is checkNotNull()ed in the constructor. It could have checked for
a null return value instead; since it hasn't been, it is clear we don't
need this check.
2024-08-21 07:16:29 -07:00
Eric Anderson 8bd97953ad core: Never have null PF Index
This prevents many null checks and combines two code paths, with no
additional allocations.
2024-08-21 07:16:29 -07:00
Eric Anderson c120e364d2 core: PF Index.size() should be number of addresses
This would impact TRANSIENT_FAILURE and refreshNameResolver() logic for
dual-stack endpoints.
2024-08-17 08:54:18 -07:00
Eric Anderson ff8e413760
Remove direct dependency on j2objc
Bazel had the dependency added because of #5046, where Guava was
depending on it as compile-only and Bazel build have "unknown enum
constant" warnings. Guava now has a compile dependency on j2objc, so
this workaround is no longer needed. There are currently no version skew
issues in Gradle, which was the only usage.
2024-08-13 21:33:55 -07:00
Petr Portnov | PROgrm_JARvis 40e2b165b7
Make once-set fields of `AbstractClientStream` `final` (#11389) 2024-08-08 10:48:12 +05:30