Commit Graph

6625 Commits

Author SHA1 Message Date
John Cormie e58c998a42
AndroidComponentAddress includes a target UserHandle (#11670)
The target UserHandle is best modeled as part of the SocketAddress not the Channel since it's part of the server's location.

This change allows a NameResolver to select different target users over time within a single Channel.
2024-11-18 17:31:01 -08:00
zbilun 6a92a2a22e
interop-testing: Add concurrency condition to the soak test using existing blocking api
The goal of this PR is to increase the test coverage of the C2P E2E load test by improving the rpc_soak and channel_soak tests to support concurrency.

**rpc_soak:**
The client performs many large_unary RPCs in sequence over the same channel. The test can run in either a concurrent or non-concurrent mode, depending on the number of threads specified (soak_num_threads):
  - Non-Concurrent Mode: When soak_num_threads = 1, all RPCs are performed sequentially on a single thread.
  - Concurrent Mode: When soak_num_threads > 1, the client uses multiple threads to distribute the workload. Each thread performs a portion of the total soak_iterations, executing its own set of RPCs concurrently.

**channel_soak:**
Similar to rpc_soak, but this time each RPC is performed on a new channel. The channel is created just before each RPC and is destroyed just after. Note on Concurrent Execution and Channel Creation: In a concurrent execution setting (i.e., when soak_num_threads > 1), each thread performs a portion of the total soak_iterations and creates and destroys its own channel for each RPC iteration.
- createNewChannel Function: In channel_soak, the createNewChannel function is used by each thread to create a new channel before every RPC. This function ensures that each RPC has a separate channel, preventing race conditions by isolating channels between threads. It shuts down the previous channel (if any) and creates a new one for each iteration, ensuring accurate latency measurement per RPC.

- Thread-specific logs will include the thread_id, helping to track performance across threads, especially when each thread is managing its own channel lifecycle.
2024-11-18 11:57:02 -08:00
vinodhabib 4ae04b7d94
core: increased test tolerance to 1 second
Fixes #11680
2024-11-18 07:42:51 -08:00
Eric Anderson 5431bf7e77 services: Don't track code coverage of reflection.v1 gencode
Generated code for v1alpha was ignored, but not v1. Ignoring v1 reduces
lines being checked from 16,145 to 6,303, significantly improving the
overall code coverage and removing noise. This was noticed because there
was a very clear drop at 0aa976c4 visible in the coveralls.io coverage
graph, the point when v1 was introduced.
2024-11-15 08:22:13 -08:00
Eric Anderson 1f159d7899 xds: Fix XdsSecurityClientServerTest TrustManagerStore race
When spiffe support was added it caused
tlsClientServer_useSystemRootCerts_validationContext to become flaky.
This is because test execution order was important for whether the race
would occur.

Fixes #11678
2024-11-14 22:01:38 -08:00
Eric Anderson 4e8f7df589
util: Remove resolvedAddresses from MultiChildLb.ChildLbState
It isn't actually used by MultiChildLb, and using the health API gives
us more confidence that health is properly plumbed.
2024-11-14 12:56:24 -08:00
John Cormie b1703345f7
Make channelz work with proto lite (#11685)
Allows android apps to expose internal grpc state for debugging.
2024-11-13 16:50:14 -08:00
MV Shiva 921f88ae30
services: Deprecate V1alpha (#11681) 2024-11-12 12:27:40 +05:30
Eric Anderson 8237ae270a util: Remove EAG conveniences from MultiChildLb
This is a step toward removing ResolvedAddresses from ChildLbState,
which isn't actually used by MultiChildLb. Most usages of the EAG usages
can be served more directly without peering into MultiChildLb's
internals or even accessing ChildLbStates, which make the tests less
sensitive to implementation changes. Some changes do leverage the new
behavior of MultiChildLb where it preserves the order of the entries.

This does fix an important bug in shutdown tests. The tests looped over
the ChildLbStates after shutdown, but shutdown deleted all the children
so it looped over an entry collection. Fixing that exposed that
deliverSubchannelState() didn't function after shutdown, as the listener
was removed from the map when the subchannel was shut down. Moving the
listener onto the TestSubchannel allowed having access to the listener
even after shutdown.

A few places in LeastRequestLb lines were just deleted, but that's
because an existing assertion already provided the same check but
without digging into MultiChildLb.
2024-11-11 13:16:21 -08:00
Riya Mehta 546efd79f1
s2a: fix flake in FakeS2AServerTest (#11673)
While here:
 * add an awaitTermination to after calling shutdown on server
 * don't use port picker

Fixes #11648
2024-11-08 10:25:49 -08:00
Kannan J 5081e60626
xds: Replace null check with has value check because proto fields can never be null. (#11675) 2024-11-08 13:17:24 +05:30
erm-g d6c80294a7
xds: Spiffe Trust Bundle Support (#11627)
Adds verification of SPIFFE based identities using SPIFFE trust bundles.

For in-progress gRFC A87.
2024-11-07 21:03:15 -08:00
MV Shiva 76705c235c
xds: Implement GcpAuthenticationFilter (#11638) 2024-11-06 16:39:00 +05:30
Colin Alworth a5db67d0cb Deframe failures should be logged on the server as warnings
This brings grpc-servlet in line with the grpc-netty implementation found
in NettyServerStream.TransportState.
2024-11-05 13:28:16 -08:00
Kannan J dae078c0a6
api: When forwarding from Listener onAddresses to Listener2 continue to use onResult (#11666)
When forwarding from Listener onAddresses to Listener2 continue to use onResult and not onResult2 because the latter requires to be called from within synchronization context and it breaks existing code that didn't need to do so when using the old Listener interface.
2024-11-05 23:52:20 +05:30
Eric Anderson 664f1fcf8a xds: Remove Bazel dependency on xds v2
feab4e54 removed xds v2 for the Gradle build. Testing with a deploy.jar,
I see the same 4 MB size reduction (31 -> 27 MB) here.

While an orca dependency is deleted in this commit, it is only a direct
dependency. It remains in the :orca target, so doesn't contribute a size
reduction.
2024-11-05 10:02:23 -08:00
MV Shiva 88596868a4
xds: Envoy proto sync to 2024-10-23 (#11664) 2024-11-05 10:56:33 +05:30
Eric Anderson 1993e68b03
Upgrade depedencies (#11655) 2024-11-01 07:50:08 -07:00
Kannan J ef1fe87373
okhttp: Use failing "source" for read bytes when sending GOAWAY due to insufficient thread pool size
Create `ClientFrameHandler` with failing source to be used in case of failed 2nd thread scheduling. Fixes NPE from https://github.com/grpc/grpc-java/pull/11503.
2024-10-31 11:51:40 +05:30
Kannan J c167ead851
xds: Per-rpc rewriting of the authority header based on the selected route. (#11631)
Implementation of A81.
2024-10-30 21:11:41 +05:30
Eric Anderson 3562380da5 Upgrade Gradle to 8.10.2 and upgrade plugins
com.github.johnrengelman.shadow is now com.gradleup.shadow (note the
redirect)
https://github.com/johnrengelman/shadow/releases/tag/8.3.0
2024-10-30 07:00:57 -07:00
SreeramdasLavanya 766b92379b
api: Add java.time.Duration overloads to CallOptions, AbstractStub taking TimeUnit and a time value (#11562) 2024-10-30 18:49:53 +05:30
Eric Anderson b5ef09c548
RELEASING.md: Fix interop_matrix image name (#11653) 2024-10-30 10:59:03 +05:30
Eric Anderson 1612536f86 Update README etc to reference 1.68.1 2024-10-29 14:09:15 -07:00
Eric Anderson a431e3664b binder: Remove unnecessary uses of LooperMode(PAUSED)
PAUSED Looper mode has been the default for many years, maybe around
robolectric 4.5 (9ae9f0b6a6). Explicitly specifying PAUSED Looper mode
is not necessary.

cl/690684542
2024-10-29 08:01:40 -07:00
vinodhabib 9176b55286
core: Make timestamp usage in Channelz use nanos from Java.time.Instant when available (#11604)
When java.time.Instant is available use the timestamp from this class in nano precision rather than using System.currentTimeInMillis and converting it to nanos.

Fixes #5494.
2024-10-29 10:19:47 +05:30
Ran 735b3f3fe6
netty: add soft Metadata size limit enforcement. (#11603) 2024-10-28 10:25:17 -07:00
John Cormie fe350cfd50
Update error codes doc for new "Safer Intent" rules. (#11639) 2024-10-25 14:41:03 -07:00
Kannan J 0b2c17d0da
Xds: Implement using system root trust CA for TLS server authentication (#11470)
Allow using system root certs for server cert validation rather than CA root certs provided by the control plane when the validation context provided by the control plane specifies so.
2024-10-25 14:36:27 +05:30
Eric Anderson 370e7ce27c
Revert "stub: Ignore unary response on server if status is not OK" (#11636)
This reverts commit 99f86835ed.

The change doesn't handle `null` messages, which don't happen with
protobuf, but can happen with other marshallers, especially in tests.
See cl/689445172

This will reopen #5969.
2024-10-25 12:09:22 +05:30
Luwei Ge ba8ab796e7
alts: support altsCallCredentials in GoogleDefaultChannelCredentials (#11634) 2024-10-24 15:18:53 -07:00
Eric Anderson 31dad6af49 Start 1.69.0 development cycle 2024-10-24 10:57:29 -07:00
John Cormie 46c1b387fa
Update binderDied() error description to spell out the possibilities for those unfamiliar with Android internals. (#11628)
Callers are frequently confused by this message and waste time looking for problems in the client when the root cause is simply a server crash. See b/371447460 for more context.
2024-10-24 10:52:44 -07:00
MV Shiva b65cbf5081
inprocess: Support tracing message sizes guarded by flag (#11629) 2024-10-24 01:22:41 +05:30
hlx502 62f409810d
netty: Avoid TCP_USER_TIMEOUT warning when not using epoll (#11564)
In NettyClientTransport, the TCP_USER_TIMEOUT attribute can be set only
if the channel is of the AbstractEpollStreamChannel.

Fixes #11517
2024-10-22 12:17:39 -07:00
Lucas Mirelmann 00c8bc78dd
Minor grammar fix in Javadoc (#11609) 2024-10-18 11:29:35 +05:30
erm-g 4be69e3f8a
core: SpiffeUtil API for extracting Spiffe URI and loading TrustBundles (#11575)
Additional API for SpiffeUtil:
 - extract Spiffe URI from certificate chain
 - load Spiffe Trust Bundle from filesystem [json spec][] [JWK spec][]

JsonParser was changed to reject duplicate keys in objects.

[json spec]: https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE_Trust_Domain_and_Bundle.md
[JWK spec]: https://github.com/spiffe/spiffe/blob/main/standards/X509-SVID.md#61-publishing-spiffe-bundle-elements
2024-10-17 11:11:07 -07:00
Eng Zer Jun 1e0928fb79 api: fix javadoc of CallCredentials.applyRequestMetadata
It is the `Executor appExecutor` that should be given an asynchronous
task, not `CallCredentials.MetadataApplier applier`.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2024-10-17 10:13:12 -07:00
Eric Anderson 23ebf364d4 inprocess: Delete "standalone" internal transport
This had been used for a time with a combined inprocess+binder server.
However, just having multiple servers worked fine and this is no longer
used/needed.
2024-10-17 09:47:20 -07:00
Vindhya Ningegowda 84d30afad6
Get mesh_id local label from "CSM_MESH_ID" environment variable, rather than parsing from bootstrap file (#11621) 2024-10-16 16:12:27 -07:00
Eric Anderson b692b9d26e core: Handle NR/LB exceptions when panicking
If a panic is followed a panic, we'd ignore the second. But if an
exception happens while entering panic mode we may fail to update the
picker with the first error. This is "fine" from a correctness
standpoint; all bets are off when panicking and we've already logged the
first error. But failing RPCs can often be more easily seen than just
the log.

Noticed because of http://yaqs/8493785598685872128
2024-10-16 13:26:45 -07:00
Naveen Prasanna V 99f86835ed
stub: Ignore unary response on server if status is not OK
Fixes #5969
2024-10-16 09:23:22 -07:00
jiangyuan 36e29abf41
fix XdsTestServer/TestServiceServer listenAddresses conflict (#11612) 2024-10-14 12:33:06 -07:00
MV Shiva ca43d78f58
inprocess: Support tracing message sizes (#11406) 2024-10-11 10:28:51 +05:30
Riya Mehta a01a9e2340
Enable publishing. (#11581) 2024-10-10 16:32:10 -07:00
Riya Mehta d628396ec7
s2a: Add S2AStub cleanup handler. (#11600)
* Add S2AStub cleanup handler.

* Give TLS and Cleanup handlers name + update comment.

* Don't add TLS handler twice.

* Don't remove explicitly, since done by fireProtocolNegotiationEvent.

* plumb S2AStub close to handshake end + add integration test.

* close stub when TLS negotiation fails.
2024-10-10 16:31:18 -07:00
yifeizhuang 2129078dee
core: fix test flakiness in retriableStream hedging deadlock test (#11606) 2024-10-08 17:44:40 -07:00
Vindhya Ningegowda 2e9c3e19fb
xds: Update error handling for ADS stream close and failure scenarios (#11596)
When an ADS stream in closed with a non-OK status after receiving a response, new status will be updated to OK status. This makes the fail behavior consistent with gRFC A57.
2024-10-08 17:28:14 -07:00
yifeizhuang e59ae5fad0
rename grpc-context-override-opentelemetry and publish artifact (#11599) 2024-10-08 17:00:33 -07:00
Riya Mehta 9d252c2466
Don't use Utils.pickUnusedPort. (#11601) 2024-10-08 10:57:32 -07:00