6498 Commits

Author SHA1 Message Date
Eric Anderson a28357e197 okhttp: Workaround SSLSocket not noticing socket is closed
Using --runs_per_test=1000, this changes the flake rate of TlsTest from
2% to 0%.

While I believe it is possible to write a reliable test for this
(including noticing the SSLSocket behavior), it was becoming too
invasive so I gave up.

Fixes #11012
2024-06-06 09:39:35 -07:00
Eric Anderson 0a4df9f93c binder: Make transport notification methods package-private
These are overrides of BinderTransport itself, so not used elsewhere.
They are essentially private. It was scary seeing `@GuardedBy` for a
public method. I copied the annotation to the base class to make sure
ErrorProne could verify the calls.
2024-06-06 08:59:32 -07:00
John Cormie 791f894e25 binder: Add a connection timeout (#11255)
Timeout is initially infinite so that we can release/import the supporting code and the behavior change independently.

Fixes #11137
2024-06-05 15:11:30 -07:00
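The commit lands the timeout with an infinite default, so the supporting code and the behavior change can ship separately. Below is a minimal sketch of that pattern using only `java.util.concurrent`. The `ConnectTimeout` class, the `INFINITE` sentinel and `maybeSchedule` are hypothetical names chosen for illustration. They are not the binder transport's API.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not binder's actual code): a connection timeout whose default is
 * "infinite", so the timer plumbing can ship before any behavior change is enabled.
 */
final class ConnectTimeout {
  /** Sentinel meaning "never time out"; this is the default. */
  static final long INFINITE = Long.MAX_VALUE;

  private ConnectTimeout() {}

  /**
   * Schedules onTimeout after timeoutMillis, unless the timeout is infinite.
   * Returns null when nothing was scheduled, which preserves the pre-existing behavior.
   */
  static ScheduledFuture<?> maybeSchedule(
      ScheduledExecutorService scheduler, long timeoutMillis, Runnable onTimeout) {
    if (timeoutMillis == INFINITE) {
      return null;
    }
    return scheduler.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
  }
}
```

Once callers pass a finite value, the timer starts firing. Nothing else in the code has to change.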
Eric Anderson 22642465d3 gcp-csm-o11y: Add resource attributes from environment
This fixes csm_remote_workload_namespace_name being unknown when running
the interop test.
2024-06-05 10:21:50 -07:00
Eric Anderson 9e2cca08fa gcp-csm-o11y: Use xds's shadow configuration
This prevents mixing shaded and non-shaded class files. We do the same
thing in all the other projects that depend on xds.
2024-06-05 06:56:27 -07:00
Eric Anderson 0358c508da gcp-csm-o11y: Enable maven-publish plugin 2024-06-05 06:23:42 -07:00
Eric Anderson 839d2770ab interop-testing: Add gcp-csm-o11y testing support 2024-06-04 17:15:58 -07:00
Eric Anderson b2731f27ad core: Delete AbstractTransportTest.clientShutdownBeforeStartRunnable
The test was added in e4e7f3a06 when InProcess stopped returning a
Runnable from start(). In c5a63a1 we realized (indirectly) that there's
no point in using the Runnable any more.

This test failed with Binder (which seems to have been using the
Runnable unnecessarily), and InProcess, Netty, and OkHttp don't use the
Runnable. Instead of fixing it, we'll just move toward no longer using the
Runnable.

I'm not removing the Runnable usage from Binder in this commit because
this test is currently causing CI failures and I don't want to do a
behavior change when fixing it.
2024-06-04 13:43:14 -07:00
Eric Anderson 62cf8427be gcp-csm-o11y: s/csm.service_namespace/csm.service_namespace_name/
Just a typo, maybe because service_namespace is used in filter metadata
from CDS.
2024-06-04 13:41:22 -07:00
Eric Anderson 9792c9f106 kokoro: Remove unavailable API levels 21-23
There are no longer any devices (virtual or otherwise) that support API
level 21, 22, or 23. Google Play services is still supporting API level
21 (although there is a pattern of notifying of dropped levels in July,
and dropping them in August).
2024-06-04 13:05:15 -07:00
Eric Anderson c5a63a16a7 core: Remove "can't call transport listener from start()" restriction
This hasn't been needed since f8f569e07, when InternalSubchannel stopped
calling start() with a lock held. Note that also means no transport
needs to return a Runnable (but some still do).

I had noticed in e4e7f3a06 that it was safe for InProcess to call the
listener directly within start(), but I didn't notice this Javadoc that
said it wasn't allowed.
2024-06-04 11:26:08 -07:00
Eric Anderson 0fcd8cc19f kokoro: Add psm-csm build config 2024-06-03 11:12:36 -07:00
Eric Anderson dc490ae0cb gcp-csm-observability: Fill in experimental issue URL 2024-05-30 16:42:18 -07:00
erm-g 781b4c4575 security: Stabilize AdvancedTlsX509KeyManager. (#11139)
* Clean up and de-experimentalization of KeyManager

* Unit tests for API validity.
2024-05-30 13:54:11 -04:00
Eric Anderson df8cfe9ddc Create gcp-csm-observability 2024-05-29 14:40:44 -07:00
Eric Anderson 6dde844c04 opentelemetry: Plumb plugins for CSM o11y 2024-05-29 14:40:44 -07:00
Eric Anderson 5c6b80881d rls: Make LinkedHashLruCache non-threadsafe
CachingRlsLbClient already calls it with a lock held. The only reason
the cache needs to manage the lock itself is for the periodic cleanup.
Let the consumer of the cache handle the timer.
2024-05-29 08:24:56 -07:00
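This split of responsibilities can be sketched with `java.util.LinkedHashMap` in access order. The names `SimpleLruCache` and `CacheOwner` are hypothetical, and the code is not the RLS `LinkedHashLruCache` itself. The cache does no locking. The owner takes its own lock around every call and schedules the periodic cleanup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical LRU cache that is deliberately NOT thread-safe; callers synchronize. */
final class SimpleLruCache<K, V> {
  private final int maxEntries;
  private final LinkedHashMap<K, V> map;

  SimpleLruCache(int maxEntries) {
    this.maxEntries = maxEntries;
    // accessOrder=true: iteration goes from least- to most-recently accessed entry.
    this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > SimpleLruCache.this.maxEntries; // evict the least-recently used entry
      }
    };
  }

  V get(K key) { return map.get(key); }

  void put(K key, V value) { map.put(key, value); }

  int size() { return map.size(); }

  /** Removes entries whose value matches; used by the owner's periodic cleanup. */
  void removeIf(Predicate<V> shouldRemove) {
    map.values().removeIf(shouldRemove);
  }
}

/** Hypothetical cache owner: it holds the lock for every access and runs the cleanup timer. */
final class CacheOwner {
  private final Object lock = new Object();
  // Each value is the entry's expiry time, in System.nanoTime() units.
  private final SimpleLruCache<String, Long> cache = new SimpleLruCache<>(100);

  void put(String key, long expiryNanos) {
    synchronized (lock) {
      cache.put(key, expiryNanos);
    }
  }

  int size() {
    synchronized (lock) {
      return cache.size();
    }
  }

  void cleanUp(long nowNanos) {
    synchronized (lock) {
      // Overflow-safe nanoTime comparison: expired when expiry is at or before now.
      cache.removeIf(expiry -> expiry - nowNanos <= 0);
    }
  }

  /** The timer lives with the owner, not the cache. */
  ScheduledFuture<?> startCleanupTimer(ScheduledExecutorService scheduler) {
    return scheduler.scheduleWithFixedDelay(
        () -> cleanUp(System.nanoTime()), 1, 1, TimeUnit.SECONDS);
  }
}
```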
Mir3605 c31dbf48ad Minor - add missing instruction (#11131)
The "list" instruction was missing, so the command didn't work properly.
2024-05-29 12:45:44 +05:30
John Cormie df01271687 Use a builder to eliminate BinderServer's long list of ctor params (#11235) 2024-05-28 19:34:55 -07:00
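Replacing a long positional constructor with a builder typically looks like the sketch below. `ExampleServer` and its settings are hypothetical and are not `BinderServer`'s real parameters. The builder is `final` and has a private constructor, so callers can only obtain one through `newBuilder()`.

```java
/**
 * Hypothetical example (not BinderServer itself): a builder replaces a long positional
 * constructor, so call sites name each setting and optional ones get defaults.
 */
final class ExampleServer {
  private final String listenAddress;
  private final int executorThreads;
  private final long shutdownTimeoutMillis;

  private ExampleServer(Builder builder) {
    this.listenAddress = builder.listenAddress;
    this.executorThreads = builder.executorThreads;
    this.shutdownTimeoutMillis = builder.shutdownTimeoutMillis;
  }

  String listenAddress() { return listenAddress; }

  int executorThreads() { return executorThreads; }

  long shutdownTimeoutMillis() { return shutdownTimeoutMillis; }

  static Builder newBuilder() { return new Builder(); }

  /** Final with a private constructor: obtain instances via newBuilder(). */
  static final class Builder {
    private String listenAddress;               // required
    private int executorThreads = 1;            // optional, defaulted
    private long shutdownTimeoutMillis = 5_000; // optional, defaulted

    private Builder() {}

    Builder setListenAddress(String listenAddress) {
      this.listenAddress = listenAddress;
      return this;
    }

    Builder setExecutorThreads(int executorThreads) {
      if (executorThreads <= 0) {
        throw new IllegalArgumentException("executorThreads must be > 0");
      }
      this.executorThreads = executorThreads;
      return this;
    }

    Builder setShutdownTimeoutMillis(long shutdownTimeoutMillis) {
      this.shutdownTimeoutMillis = shutdownTimeoutMillis;
      return this;
    }

    ExampleServer build() {
      if (listenAddress == null) {
        throw new IllegalStateException("listenAddress is required");
      }
      return new ExampleServer(this);
    }
  }
}
```

Validation now happens in one place (`build()`), and new optional settings can be added later without touching existing call sites.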
Eric Anderson e4e7f3a068 inprocess: Fix listener race if transport is shut down while starting
Returning the runnable did nothing, as both the start method and the
runnable are run within the synchronization context. I believe the
Runnable used to be required in the previous implementation of
ManagedChannelImpl (the lock-based implementation before we created
SynchronizationContext).

This fixes an NPE seen in ServerImpl because the server expects proper
ordering of transport lifecycle events.
```
Uncaught exception in the SynchronizationContext. Panic!
java.lang.NullPointerException: Cannot invoke "java.util.concurrent.Future.cancel(boolean)" because "this.handshakeTimeoutFuture" is null
	at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.transportReady(ServerImpl.java:440)
	at io.grpc.inprocess.InProcessTransport$4.run(InProcessTransport.java:215)
	at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:94)
```

b/338445186
2024-05-28 15:47:15 -07:00
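The sketch below is a simplified, hypothetical serializing executor in the spirit of `SynchronizationContext`. It is not that class's actual implementation. It shows why the returned Runnable added nothing. A task submitted from inside a running task is only queued, and it runs after the current task returns, just as if the listener had been called at the end of `start()`.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical, simplified serializing executor (not grpc's SynchronizationContext):
 * tasks run one at a time in submission order, on whichever thread is draining.
 */
final class SerializingExecutor {
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  private final AtomicReference<Thread> drainingThread = new AtomicReference<>();

  void execute(Runnable task) {
    queue.add(task);
    drain();
  }

  private void drain() {
    do {
      if (!drainingThread.compareAndSet(null, Thread.currentThread())) {
        // Someone is already draining (possibly this thread, re-entrantly); it will run the task.
        return;
      }
      try {
        Runnable task;
        while ((task = queue.poll()) != null) {
          task.run(); // unlike the real class, this sketch does not catch task exceptions
        }
      } finally {
        drainingThread.set(null);
      }
      // Re-check: another thread may have enqueued after poll() returned null but before release.
    } while (!queue.isEmpty());
  }
}
```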
Eric Anderson 107fdb4b7c core: Make RPC buffering comment more clear
It wasn't entirely clear what "it" referred to.
2024-05-28 15:26:13 -07:00
Eric Anderson 75fa441fc9 xds: Plumb the Cluster's filterMetadata to RPCs
This will be used by CSM observability, and may get exposed to further
uses in the future.
2024-05-24 15:08:36 -07:00
Eric Anderson 018917ae59 core: Restore optimization for InProcess RPC order
fea577c80 disabled an optimization that some tests notice, as it can
change execution order. This restores the old behavior, at the slight
expense of making the relationship between in-use tracking and idle mode harder to see.
2024-05-24 13:02:17 -07:00
Eric Anderson 960012d76e api: Add ClientStreamTracer.inboundHeaders(Metadata)
This will be used by the metadata exchange of CSM. When recording
per-attempt metrics, we really need per-attempt data and can't leverage
ClientInterceptors.
2024-05-24 11:28:40 -07:00
Eric Anderson fea577c804 core: Exit idle mode when delayed transport is in use
8844cf7b8 triggered a regression where a new RPC wouldn't cause the
channel to exit idle mode, if an RPC was still progressing on an old
transport. This was already possible previously, but was racy.
8844cf7b8 made it less racy and more obvious.

The two added `exitIdleMode()` calls in this commit are companions to
those in `enterIdleMode()`, which detect whether the channel should
immediately exit idle mode.

Noticed in cl/635819804.
2024-05-23 14:45:38 -07:00
John Cormie 0b5f38d942 Use the builder pattern to replace BinderClientTransport's long ctor arg list (#11220) 2024-05-23 11:50:37 -07:00
Colin Alworth 6aa063990a servlet: Update Servlet container test versions (#11212)
Verifies that the latest versions of Tomcat/Undertow/Jetty pass
integration tests. I manually verified that all ignored tests still
fail.

Two tests failed in Jetty; it appears that the integration test
anticipates that the server implementation is willing to send larger
trailers than the client SETTINGS frame allows for. Since the server
refuses to send headers/trailers that are too large, the client does not
receive the too-large payloads, and doesn't fail with the expected
message. This change was introduced in Jetty 10.0.15/11.0.11. Those
tests are ignored.
2024-05-23 09:49:34 -07:00
Eric Anderson db96219be1 core: Remove direct test dependency on inprocess
An inprocess class was just being abused. Note that grpc-testing depends
on inprocess, so there is still an indirect dependency on inprocess
present.
2024-05-22 14:50:26 -07:00
Eric Anderson 58bab7434a opentelemetry: Use dep from gradle/libs.versions.toml 2024-05-21 10:29:27 -07:00
Terry Wilson 8aaace12eb Update README etc to reference 1.64.0 (#11213) 2024-05-15 13:31:10 -07:00
Nikolay Firov f995c121e9 Include com_google_protobuf_javalite to MODULE.bazel to fix bzlmod querying graph in end-user repo (#11147)
* Fix 3rd-party dependency use_repo

* remove protobuf as it is already added as module dep

* fix

* fix

* fix

* return com_google_protobuf_javalite archive and use it in MODULE.bazel
2024-05-15 17:21:23 +05:30
Eric Anderson 8844cf7b87 core: Fully delegate picks to DelayedClientTransport
DelayedClientTransport already had to handle all the cases, so
ManagedChannelImpl picking was acting only as an optimization.
Optimizing DelayedClientTransport to avoid the lock when not queuing
makes ManagedChannelImpl picking entirely redundant, and allows us to
remove the duplicate race-handling logic.

This avoids double-picking when queuing, where ManagedChannelImpl does a
pick, decides to queue, and then DelayedClientTransport re-performs the
pick because it doesn't know which pick version was used. This was
noticed with RLS, which mutates state within the picker.
2024-05-14 11:37:14 -07:00
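The pattern of a lock-free fast path plus a "did the picker change?" check can be sketched as follows. `DelayedPicker`, its `Picker` interface and the string-based calls are hypothetical simplifications, not `DelayedClientTransport`'s real API. A new call picks without the lock and takes the lock only to queue. If a new picker arrived in the meantime, the call retries with that picker. A call therefore never picks twice with the same picker, and each queued call is re-picked once per new picker.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of lock-avoiding picks with a queue for calls that must wait. */
final class DelayedPicker {
  /** Returns a backend for the call, or null if the call must wait for a newer picker. */
  interface Picker {
    String pick(String call);
  }

  /** Wrapper compared by reference, so installing any new picker is detectable. */
  private static final class PickerState {
    final Picker picker; // null until the first picker is installed

    PickerState(Picker picker) {
      this.picker = picker;
    }
  }

  private final Object lock = new Object();
  private volatile PickerState state = new PickerState(null);
  private final List<String> pending = new ArrayList<>(); // guarded by lock

  /** Returns the chosen backend, or null if the call was queued. */
  String newCall(String call) {
    while (true) {
      PickerState seen = state; // one volatile read
      if (seen.picker != null) {
        String backend = seen.picker.pick(call);
        if (backend != null) {
          return backend; // fast path: no lock taken
        }
      }
      synchronized (lock) {
        if (seen == state) {
          pending.add(call); // picker unchanged: queue; updatePicker will re-pick it
          return null;
        }
      }
      // A new picker arrived while we were picking; loop to pick once with it.
    }
  }

  /** Installs a new picker and re-picks every queued call once with it. */
  List<String> updatePicker(Picker newPicker) {
    List<String> routed = new ArrayList<>();
    synchronized (lock) {
      state = new PickerState(newPicker);
      pending.removeIf(call -> {
        String backend = newPicker.pick(call);
        if (backend == null) {
          return false; // still waiting
        }
        routed.add(call + "->" + backend);
        return true;
      });
    }
    return routed;
  }

  int pendingCount() {
    synchronized (lock) {
      return pending.size();
    }
  }
}
```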
Eric Anderson d9e09c285b all: Add opentelemetry
This adds opentelemetry to the shared javadoc (but also other things
like having its tests contribute to code coverage).
2024-05-14 10:28:02 -07:00
Eric Anderson e82b8f0674 opentelemetry: Mark registerGlobal() as experimental 2024-05-14 10:26:56 -07:00
Eric Anderson f9b6e5f92d rls: Guarantee backoff will update RLS picker
Previously, the picker was likely null if entering backoff soon after
start-up. This prevented the picker from being updated and directing
queued RPCs to the fallback. It would work for new RPCs if RLS returned
extremely rapidly; both ManagedChannelImpl and DelayedClientTransport do
a pick before enqueuing so the ManagedChannelImpl pick could request
from RLS and DelayedClientTransport could use the response. So the test
uses a delay to purposefully avoid that unlikely-in-real-life case.

Creating a resolving OOB channel for InProcess doesn't actually change
the destination from the parent, because InProcess uses directaddress.
Thus the fakeRlsServiceImpl is now being added to the fake backend
server, because the same server is used for RLS within the test.

b/333185213
2024-05-13 16:29:05 -07:00
Vindhya Ningegowda 77a1e77e11 xds, rls: Experimental metrics are disabled by default (#11196)
Experimental metrics (i.e., WRR and RLS metrics) are disabled by default. Users are expected to explicitly enable them while configuring metrics.
2024-05-10 17:46:58 -07:00
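The opt-in rule can be sketched with hypothetical `MetricDescriptor` and `MetricsConfig` classes. These are not grpc-java's metric API. A metric is recorded if it is enabled by default or if the user listed it explicitly. The metric names are taken from other commits on this page. Treating the RLS gauge as experimental follows this commit's description.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical metric description; experimental metrics set enabledByDefault=false. */
final class MetricDescriptor {
  final String name;
  final boolean enabledByDefault;

  MetricDescriptor(String name, boolean enabledByDefault) {
    this.name = name;
    this.enabledByDefault = enabledByDefault;
  }
}

/** Hypothetical user configuration listing metrics to enable beyond the defaults. */
final class MetricsConfig {
  private final Set<String> explicitlyEnabled;

  MetricsConfig(Set<String> explicitlyEnabled) {
    this.explicitlyEnabled = Collections.unmodifiableSet(new HashSet<>(explicitlyEnabled));
  }

  boolean isEnabled(MetricDescriptor metric) {
    // Default-on metrics are always recorded; experimental ones only when requested.
    return metric.enabledByDefault || explicitlyEnabled.contains(metric.name);
  }
}
```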
Vindhya Ningegowda 5ba1a55637 opentelemetry: Publish grpc opentelemetry (#11187)
publish grpc opentelemetry
2024-05-09 13:24:52 -07:00
Terry Wilson 511b9c3a5b rls: Add gauge metric recording (#11175)
Adds these gauges:
- grpc.lb.rls.cache_entries
- grpc.lb.rls.cache_size
2024-05-08 15:15:34 -07:00
Eric Anderson 7a663f633c api: Hide internal metric APIs
Some APIs were marked experimental but had internal APIs in their
surface. These were all changed to internal. And then the internal APIs
were mostly hidden from generated documentation.

All these APIs will eventually become public and maybe even stable. But
they need some iteration before we're ready for others to start using
them.
2024-05-08 10:24:24 -07:00
Larry Safran 59b189bf91 Change HappyEyeballs and new pick first LB flags default value to false (#11120)
* Change HappyEyeballs flag default value to false since some G3 users are seeing problems.
Put the flag logic in a common place for PickFirstLeafLoadBalancer & WRR's test.

* Set expected requestConnection count based on whether happy eyeballs is enabled or not

* Disable new PickFirstLB

* Fix test expectations to handle both new and old PF LB paths.
2024-05-08 10:08:23 -07:00
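Keeping flag parsing in one shared place could look like the helper below. `FeatureFlags`, `parseFlag` and the example environment variable name are hypothetical, not grpc-java's actual flag names. An unset or unparseable value falls back to the caller's default, and this commit changes that default to `false`.

```java
/** Hypothetical shared helper for boolean feature flags read from environment variables. */
final class FeatureFlags {
  private FeatureFlags() {}

  /** Parses "true"/"false" (case-insensitive); anything else, including null, yields the default. */
  static boolean parseFlag(String rawValue, boolean defaultValue) {
    if (rawValue == null) {
      return defaultValue;
    }
    String value = rawValue.trim();
    if (value.equalsIgnoreCase("true")) {
      return true;
    }
    if (value.equalsIgnoreCase("false")) {
      return false;
    }
    return defaultValue;
  }

  /** Reads a flag from the environment, e.g. getFlag("EXAMPLE_ENABLE_HAPPY_EYEBALLS", false). */
  static boolean getFlag(String envVarName, boolean defaultValue) {
    return parseFlag(System.getenv(envVarName), defaultValue);
  }
}
```

Because production code and tests both read the flag through the same helper, test expectations can follow whichever path is active.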
Eric Anderson d366d74fa6 opentelemetry: Rename and stabilize API OpenTelemetryModule
OpenTelemetryModule is renamed to GrpcOpenTelemetry. The Builder is now
`final`, although that should only impact mocks as it had a private
constructor.

Fixes #10591
2024-05-08 07:51:17 -07:00
Eric Anderson 5a6745b97e opentelemetry: Missing locality should be empty string
From gRFC A78:

> If no locality information is available, the label will be set to the
> empty string.
2024-05-08 07:50:28 -07:00
Eric Anderson 45a91bd035 xds: Add WRR metric test with real channel 2024-05-08 07:50:09 -07:00
Terry Wilson 2bc4306940 xds: Include locality label in WRR metrics (#11170) 2024-05-07 11:40:03 -07:00
Eric Anderson 54ac06ae30 rls: Add metric test with real channel 2024-05-07 10:06:46 -07:00
Eric Anderson 6bede04d9f opentelemetry: Add optional grpc.lb.locality to per-call metrics
The optional label API was added in 4c78a974 and xds_cluster_impl was
plumbed in 077dcbf9.

From gRFC A78:

> ### Optional xDS Locality Label
>
> When xDS is used, it is desirable for some metrics to include an optional
> label indicating which xDS locality the metrics are associated with.
> We want to provide this optional label for the metrics in both the
> existing per-call metrics defined in [A66] and in the new metrics for
> the WRR LB policy, described below.
>
> If locality information is available, the value of this label will be of
> the form `{region="${REGION}", zone="${ZONE}", sub_zone="${SUB_ZONE}"}`,
> where `${REGION}`, `${ZONE}`, and `${SUB_ZONE}` are replaced with the
> actual values.  If no locality information is available, the label will
> be set to the empty string.
>
> #### Per-Call Metrics
>
> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.  We will then use this mechanism in the `xds_cluster_impl`
> policy's picker to set the locality label. ...
>
> This label will be available on the following per-call metrics:
> - `grpc.client.attempt.duration`
> - `grpc.client.attempt.sent_total_compressed_message_size`
> - `grpc.client.attempt.rcvd_total_compressed_message_size`
2024-05-07 09:00:08 -07:00
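The label format quoted from gRFC A78 above can be sketched as a small formatter. The `Locality` holder and `LocalityLabel` class are hypothetical; the real xDS locality type differs. The quoted text only covers the case where there is no locality at all. Turning an individual missing field into an empty string is an assumption of this sketch.

```java
/** Hypothetical locality holder; the real xDS type differs. */
final class Locality {
  final String region;
  final String zone;
  final String subZone;

  Locality(String region, String zone, String subZone) {
    this.region = region;
    this.zone = zone;
    this.subZone = subZone;
  }
}

/** Formats a grpc.lb.locality label value following the gRFC A78 text quoted above. */
final class LocalityLabel {
  private LocalityLabel() {}

  static String format(Locality locality) {
    if (locality == null) {
      return ""; // no locality information available
    }
    return "{region=\"" + nullToEmpty(locality.region)
        + "\", zone=\"" + nullToEmpty(locality.zone)
        + "\", sub_zone=\"" + nullToEmpty(locality.subZone) + "\"}";
  }

  // Assumption: a missing individual field is rendered as an empty string.
  private static String nullToEmpty(String s) {
    return s == null ? "" : s;
  }
}
```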
hakusai22 6ec744f2a0 Fix various typos (#11144) 2024-05-06 20:29:44 -07:00
Eric Anderson 354b028cae Add gauge metric API and Otel implementation
This is needed by gRFC A78 for xds metrics, and for RLS metrics. Since
gauges need to acquire a lock (or other synchronization) in the
callback, the callback allows batching multiple gauges together to avoid
acquiring and reacquiring such locks.

Unlike other metrics, gauges are reported on-demand to the MetricSink.
This means not all sinks will receive the same data, as the sinks will
ask for the gauges at different times.
2024-05-06 11:38:04 -07:00
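The batching idea can be sketched in plain Java. This does not use grpc-java's or OpenTelemetry's actual interfaces. In the sketch, one callback takes its lock once and reports two gauges. Values are computed only when a sink calls `collect()`, so sinks that collect at different times can see different values. The gauge names come from the RLS commit on this page. `GaugeRegistry`, `BatchCallback`, `BatchRecorder` and `ExampleCache` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical recorder a sink passes to callbacks during collection. */
interface BatchRecorder {
  void recordLongGauge(String gaugeName, long value);
}

/** Hypothetical batch callback: a single invocation reports several related gauges. */
interface BatchCallback {
  void report(BatchRecorder recorder);
}

/** Hypothetical registry: gauge values are pulled only when a sink asks for them. */
final class GaugeRegistry {
  private final List<BatchCallback> callbacks = new ArrayList<>();

  synchronized void register(BatchCallback callback) {
    callbacks.add(callback);
  }

  /** Called on demand by one sink; each call may observe different values. */
  Map<String, Long> collect() {
    List<BatchCallback> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(callbacks);
    }
    Map<String, Long> values = new LinkedHashMap<>();
    for (BatchCallback callback : snapshot) {
      callback.report(values::put);
    }
    return values;
  }
}

/** Hypothetical cache whose two gauges are read under one lock acquisition. */
final class ExampleCache {
  private final Object lock = new Object();
  private long entries;
  private long size;

  void add(long entrySize) {
    synchronized (lock) {
      entries++;
      size += entrySize;
    }
  }

  BatchCallback gauges() {
    return recorder -> {
      synchronized (lock) { // one acquisition covers both gauges
        recorder.recordLongGauge("grpc.lb.rls.cache_entries", entries);
        recorder.recordLongGauge("grpc.lb.rls.cache_size", size);
      }
    };
  }
}
```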
Eric Anderson 8516cfef9c opentelemetry: Add grpc.target label to per-call metrics
As defined by gRFC A66, the target is on all client-side per-call
metrics (both call and attempt).
2024-05-06 10:53:46 -07:00
Eric Anderson ca35577327 Add internal channel builder API to get target
This will be used for gRFC A66's OTel per-RPC metric label:

> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`).

The majority of the changes are to move target computation from
ManagedChannelImpl into the builder. A small hack API was added to
ManagedChannelBuilder to get the target to create an interceptor.
2024-05-06 10:53:46 -07:00
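The canonicalization described in the gRFC A66 excerpt above can be sketched in simplified form. A target that already starts with a known scheme is kept as-is. Otherwise the default scheme is prepended with an empty authority. The scheme set and the `dns` default are assumptions of this hypothetical `TargetCanonicalizer`. grpc-java's real logic consults the registered name resolvers and handles more cases.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified target canonicalization for a grpc.target-style label. */
final class TargetCanonicalizer {
  // Assumption: schemes that have a registered name resolver.
  private static final Set<String> KNOWN_SCHEMES =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("dns", "xds", "unix")));
  // Assumption: the default scheme used when the user omits one.
  private static final String DEFAULT_SCHEME = "dns";

  private TargetCanonicalizer() {}

  static String canonicalize(String target) {
    int colon = target.indexOf(':');
    if (colon > 0 && KNOWN_SCHEMES.contains(target.substring(0, colon))) {
      return target; // already scheme-qualified
    }
    // "host:port" would otherwise look like scheme "host", so prepend the default scheme.
    return DEFAULT_SCHEME + ":///" + target;
  }
}
```

Checking against known schemes matters. A bare `localhost:8080` parses as a URI with scheme `localhost`, so the sketch only trusts schemes it recognizes.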