Currently xDS response messages are printed with protobuf JsonFormat as they may contain Any-typed messages. JsonFormat internally uses Gson for generating human-readable message representations. The default Gson instance escapes HTML characters, which causes characters like '=' to be printed as '\u003d'. There is nothing can be done on our side to disable this as it is protobuf's internal implementation.
To make the same content consistent in both request and response messages consistent, we use JsonFormat to print both.
Interceptor-based config selector will be needed for fault injection.
Add `interceptor` field to `InternalConfigSelector.Result`. Keep `callOptions` and `committedCallback` fields for the moment, because it needs a refactoring to migrate the existing xds config selector implementation to the new API.
--secure was moved to front since many languages need flags to precede
positional parameters, and we'd like other languages to use the same
flags when feasible.
:8000 was removed from xds: target in the README, as it isn't all that
useful and is confusing as xDS itself provides the backend port numbers.
In v3, UpstreamTlsContext and DownstreamTlsContext messages are not transitively type-referenced by Cluster and Listener. So the message printer is not able to print them without adding them into the registry manually.
Making the global map for holding atomics that aggregates cluster's outstanding requests an explicit dependency is easier for maintaining the architecture.
The xds package is not at all pretty. There's lots of stuff leaking that
shouldn't. For the moment accept that and just clean up the javadoc.
This is a bit more important now because XdsChannelCredentials and
XdsServerBuilder will be exposed, so users will start noticing stuff
here. Unfortunately this change doesn't fix IDEs auto-suggesting classes
that users shouldn't use.
On Android, there is a race between making new RPCs and reconnecting after network comes back. If the former happens first, RPCs fail immediately. This is because resetConnectBackoff() does not update the picker before trying to reconnect and new RPCs are sent with the old picker, which fails RPCs immediately.
In this change, we move to use enterIdle(), which updates the channel picker to cause new RPCs being buffered (while subchannels are in reconnecting), at the moment network recovers. Hopefully, this can avoid RPCs being dropped prematurely in network recovery.
This is to be used for xDS to inject configuration for the
XdsServerCredentials. We'd like a cleaner approach, but they mostly seem
to be more heavy-weight. We will probably address this at the same time
we handle the Executor being passed for TLS. In the mean time this is
easy, doesn't hurt much, and can easily be changed in the future.
GcFinalization.awaitClear() only guarantees the reference is cleared but no guarantee for the ref is enqueued. This could cause race for calling cleanQueue() in the test.
While most languages support setting environment variables during runtime,
Java does not. In Java, the preferred approach is to use Java System Properties
in order so specify configuration options. By checking for the existence
of the io.grpc.xds.bootstrap property if GRPC_XDS_BOOTSTRAP is not found,
it is possible to either supply the bootstrap location during runtime or as
a Java argument. The environment variable still takes precedence in order
to not break any existing documentation.
Circuit breakers should be applied to clusters in the global scope. However, the LB hierarchy might cause the LB policy (currently EDS, but cluster_impl in the future) that applies circuit breaking to be duplicated. Also, for multi-channel cases, the circuit breaking threshold should still be shared across channels in the process.
This change creates a global map for accessing circuit breaking atomics that used to count the number of outstanding requests per global cluster basis. Atomics in the global map are held by WeakReferences so LB policies/Pickers/StreamTracers do not need to worry about counter's lifecycle and refcount.
The proto field is named as num_failures but its comment is saying it is for number of RPCs that failed to record a remote peer. RPC failed == RPC failed to record a remote peer was true previously (so no existing tests should be affected by this changed) as server completed RPCs immediately. It is no longer true with server capability to keep the call open/delayed.
This change clarifies the proto definition for stats RPC. rpcs_by_peer is for recording RPCs succeeded and num_failures is for RPCs failed. RPCs in the flight when the stats call times out are not counted towards any of the stats.
Update xDS interop test proto to aggregate accumulated stats based on RPC methods (mirroring 643e5bcd1e8db931cf76a3be19cd9bba223ee987 in C-core's change). Updated the xDS interop test client to support querying accumulated stats aggregated to RPC methods.