Commit Graph

755 Commits

Author SHA1 Message Date
Chengyuan Zhang ef5a992e77
xds: fix bug of using the wrong cluster name for client load reporting (#5865)
* fixed bug of using the wrong cluster name for client load reporting

* moved clusterName into LrsStream
2019-06-11 15:54:05 -07:00
Chengyuan Zhang 213b91b165
xds: refactor XdsLoadReportClient and XdsLoadStatsStore in order to integrate with XdsLoadBalancer (part 1) (#5863)
* extract self-defined Locality into XdsLocality class

* separate out functionalities for recording client load from lrsClient; xds load balancer will directly interact with XdsLoadStatsStore to set up locality counters

* added GRPC to constant TRAFFICDIRECTOR_HOSTNAME_FIELD name to better match that in XdsComms

* fixed bug of using the wrong cluster name in load report's ClusterStats, it should be GSLB service name, which is returned in the load report response (same as that in EDS response).

* added a new line to the end of files.

* Revert "fixed bug of using the wrong cluster name in load report's ClusterStats, it should be GSLB service name, which is returned in the load report response (same as that in EDS response)."

This reverts commit 6097dd4066.

* rephrase interface comment for StatsStore

* added equality and hashCode test for XdsLocality
2019-06-11 09:43:15 -07:00
Chengyuan Zhang c98fb2d03e
xds: fix bug of missing total_dropped_requests field in ClusterStats proto (#5862) 2019-06-10 14:22:16 -07:00
ZHANG Dapeng f7077a565a
xds: cleanup XdsLbStateTest
The test case `XdsLbStateTest.handleSubchannelState()` was introduced before `LocalityStore` was refactored out of `XdsLbState`. After that refactoring, the test case no longer belongs in `XdsLbStateTest`, and it is already covered in `LocalityStoreTest`.
2019-06-10 14:19:27 -07:00
ZHANG Dapeng 33c30db42c
xds: allow grpclb balancer addresses for backward compatibility
During migration, the name resolver may not know when the client has been upgraded to xds, so it may still send grpclb v1 addresses with a list of policies including both grpclb v1 and xds.
2019-06-10 11:27:42 -07:00
ZHANG Dapeng 16de96befe
xds: Add gogoproto dependency to xds
The generated grpc services are not changed.
2019-06-05 10:13:19 -07:00
Chengyuan Zhang 93551719b9
xds: integrate backend metric API to client load reporting (#5797)
* augmented ClientLoadCounter with backend metrics

* added a listener implementation for receiving backend metrics and aggregating them in ClientLoadCounter
2019-05-31 14:28:23 -07:00
ZHANG Dapeng d8aa42723d
xds: fix bug in XdsLoadBalancerProvider.parseLoadBalancingConfigPolicy
Resolves #5804
2019-05-30 16:37:08 -07:00
ZHANG Dapeng f9decbf69d
xds: remove unused variables 2019-05-30 14:38:43 -07:00
ZHANG Dapeng 77a512551f
xds: handle 100% drop for fallback mode
- Cancel fallback timer and/or exit fallback mode once receiving an EDS response indicating 100% drop.
- Also update balancing state once receiving the first EDS response with drop information when the channel is at the initial IDLE state.
2019-05-24 20:54:37 -07:00
Chengyuan Zhang 7fd5f261b4
xds: implement lb policy backend metric api (#5639)
* implemented utility methods to create ClientStreamTracer.Factory with OrcaReportListener installed for retrieving per-request ORCA data

* added unit tests

* use delegatesTo instead of spy

* implemented OrcaReportingHelper delegating to some original Helper for load balancing policies accessing OOB metric reports

* added unit tests for out-of-band ORCA metric accessing API in a separate test class

* rebase to master, resolve the breaking change of StreamInfo class being final with builder

* trashed hashCode/equal for OrcaReportingConfig

* changed log level and channel trace event level to ERROR as required by design doc

* added OrcaReportingHelperWrapper layer to allow updating report interval at any time

* reverse the naming of parent/child helper, child helper is the outer-most helper in the wrapping structure

* changed orca listener interface to use separate listener interfaces for per-request and out-of-band cases

* added more comprehensive unit tests

* added test case for per-request reporting that parent creates its own stream tracer

* fixed bug of directly assigning reporting config, which would cause it to be mutated later

* separate test cases for updating reporting config at different time

* fixed lint style error

* polish comments

* minor polish in unit tests

* refactor OrcaUtil class into OrcaOobUtil and OrcaPerRequestUtil and get rid of static methods for easier user testing

* hide BackoffPolicyProvider and Stopwatch supplier in OrcaOobUtil's public API

* add javadoc for getInstance() methods

* ensure the same Subchannel instance created by the helper that has the corresponding OrcaOobReportListener registered is passed to the listener callback

* removed costNames for OrcaReportingConfig

* removed redundant checks

* reformatted the OrcaOobUtilTest class to put helper methods at the bottom

* fixed impl with changes made on Subchannel (SubchannelStateListener now ties with Subchannel)

* fixed comments

* added usage examples in javadoc for OrcaUtils

* add method comments for OrcaUtil's listener API threading

* make fields in OrcaReportingConfig final

* fixed OrcaOobUtilTest for calling setOrcaReportingConfig inside syncContext

* added ExperimentalApi annotation for Orca utils
2019-05-24 15:12:22 -07:00
Chengyuan Zhang d86d3dd363
all: fix lint and revert redundant lint fixes in #5570 (#5787)
* Revert "all: fix lint (#5770)"

This reverts commit 00d4cc29ad.

* all: fix lint and revert redundant lint fix in #5570
2019-05-24 01:02:12 -07:00
ZHANG Dapeng 2180fcd113
xds: fix protobuf fields can not be null 2019-05-23 18:27:09 -07:00
ZHANG Dapeng 08843f8d59
xds: handle drop percentage for SubchannelPicker (#5765)
* xds: handle drop percentage for SubchannelPicker

* XdsCommsTest

* refactor ThreadSafeRandom

* remove hasNonDropBackends

* fix comments
2019-05-23 10:36:59 -07:00
ZHANG Dapeng 54bbd372ef
xds: implement Fallback-at-Startup mode
This is the implementation of the Fallback-at-Startup mode in the design doc.

- The Fallback-After-Startup mode is not implemented.
- Drop related behavior is not implemented.
2019-05-22 17:44:39 -07:00
ZHANG Dapeng 36ae0ed165
xds: temporarily use ManagedChannelBuilder.forTarget for creating resolving oob channel 2019-05-22 13:38:00 -07:00
ZHANG Dapeng b3bac95f90
xds: not sending resource_name in EDS request 2019-05-22 09:36:51 -07:00
ZHANG Dapeng 994fd7429a
xds: populate XdsLoadBalancerProvider to ServiceLoader
Also changed `XdsLoadBalancerProvider` to avoid initialization error when using `ServiceLoader`.
2019-05-22 09:35:38 -07:00
ZHANG Dapeng c242fc8245
xds: fix XdsLoadStatsStoreTest.recordingDroppedRequests flaky NPE 2019-05-20 16:00:18 -07:00
Kun Zhang 7934594dfe
api: pass Subchannel state updates to SubchannelStateListener rather than LoadBalancer (take 2) (#5722)
This is a revised version of #5503 (62b03fd), which was rolled back in f8d0868. The newer version passes SubchannelStateListener to Subchannel.start() instead of SubchannelCreationArgs, which allows us to remove the Subchannel argument from the listener, which works as a solution for #5676.

LoadBalancers that call the old createSubchannel() will get start() implicitly called with a listener that passes updates to the deprecated LoadBalancer.handleSubchannelState(). Those who call the new createSubchannel() will have to call start() explicitly.

GRPCLB code is still using the old API, because it's a pain to migrate the SubchannelPool to the new API. Since CachedSubchannelHelper is on the way, it's easier to switch to it when it's ready. Keeping GRPCLB with the old API would also confirm the backward compatibility.
2019-05-17 16:37:41 -07:00
ZHANG Dapeng 53f74c62ba
all: fix lint 2019-05-16 15:00:20 -07:00
Chengyuan Zhang b7fb3c2e93
xds: add counts for recently issued calls in client side load reporting (#5735)
* added counts for recently issued calls in client side load reporting

* use recordCallStarted/Finished to manipulate counter instead of explicitly incr/decr methods
2019-05-15 15:52:20 -07:00
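The recordCallStarted/Finished approach mentioned in the commit above (#5735) can be illustrated with a minimal sketch. This is not the actual ClientLoadCounter; the class name `LoadCounterSketch`, its fields, and the `snapshot()` layout are assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter; names and snapshot layout are illustrative, not the gRPC implementation. */
final class LoadCounterSketch {
  private final AtomicLong callsInProgress = new AtomicLong();
  private final AtomicLong callsIssued = new AtomicLong();
  private final AtomicLong callsFinished = new AtomicLong();

  /** Called when a call starts: one more issued call, one more in progress. */
  void recordCallStarted() {
    callsIssued.getAndIncrement();
    callsInProgress.getAndIncrement();
  }

  /** Called when a call ends: one fewer in progress, one more finished. */
  void recordCallFinished() {
    callsInProgress.getAndDecrement();
    callsFinished.getAndIncrement();
  }

  /**
   * Returns {inProgress, issuedSinceLastSnapshot, finishedSinceLastSnapshot}.
   * The per-interval counts are reset to zero; the in-progress gauge is not.
   */
  long[] snapshot() {
    return new long[] {
      callsInProgress.get(), callsIssued.getAndSet(0), callsFinished.getAndSet(0)
    };
  }
}
```

Callers only report lifecycle events, so the in-progress gauge and the per-interval counts cannot drift apart the way they could with separate increment and decrement methods.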
Chengyuan Zhang e483478913
xds/third_party: update xds load report proto by importing latest files from envoy repo (#5727) 2019-05-14 13:29:37 -07:00
ZHANG Dapeng 9dacc45447
xds: implement ADS request and response handling in standard mode (#5532)
Summary of PR: 
- XdsLbState now assumes standard mode only.
- Will not send CDS request. An EDS request will be sent in the constructor of `AdsStream`.
- Added a method to `LocalityStore`
  - `void updateLocalityStore(Map<Locality, LocalityInfo> localityInfoMap);`
- When an EDS response is received, `LocalityStore.updateLocalityStore()` will be called.
- `LocalityStoreImpl` maintains a map `Map<Locality, LocalityLbInfo> localityMap`.
- `LocalityStoreImpl.updateLocalityStore()` will create a child balancer for each locality, with a `ChildHelper`. Then each child balancer will call `handleResolvedAddresses()`.
- `LocalityStoreImpl.updateLocalityStore()` will update `childPickers`.
- `ChildHelper.updateBalancingState()` will update `childPickers` and then delegate to parent `helper.updateBalancingState()`.
- `XdsLbState.handleSubchannelState()` will delegate to `childBalancer.handleSubchannelState()` where the subchannel belongs to the childBalancer's locality.
2019-05-13 17:31:24 -07:00
Chengyuan Zhang 690b655f24
xds: refactor XdsLrsClient and XdsLoadReportStore for integrating backend metric data in load report (#5728)
* make ClientLoadCounter a separate class, added unit tests for it as it now counts quite a few stats

* add MetricListener class that takes in a ClientLoadCounter and updates metric counts from received OrcaLoadReport

* refactor XdsClientLoadRecorder into XdsLoadReportStore for better integrity

* move interceptPickResult implementation to XdsLrsClient, no delegated call

* added unit test annotation

* created a StatsStore interface to better modularize LrsClient and LoadReportStore

* add more tests to ClientLoadCounter to increase coverage

* added tests for add/get/remove locality counter

* refactored tests for XdsLoadReportStore, with newly added abstract base class for ClientLoadCounter, real counter data is not involved, only stubbed snapshot is needed

* comparing doubles doing arithmetic is not recommended, but we are fine here as we are manually repeating the computation exactly

* added test case for two metric listeners with the same counter, metric values should be aggregated to the same counter

* fixed exception message and comment to only refer to interface

* removed unused variables

* cleaned up unused mock init

* removed unnecessary ClusterStats comparison helper method, as we are really comparing with the object manually created, order is deterministic

* trashed stuff for backend metrics, it should be in a separate PR

* added toString test

* remove Duration dependency in LoadReportStore

* use ThreadLocalRandom to generate positive double randoms directly

* rename XdsLoadReportStore to XdsLoadStatsStore

* rename XdsLrsClient to XdsLoadReportClient

* refactor ClientLoadSnapshot to be an exact snapshot of ClientLoadCounter, use getters for ClientLoadSnapshot and avoid touching fields directly

* renamed XdsLoadStatsManager to XdsLoadReportClient and XdsLoadReportClient to XdsLoadReportClientImpl

* make fields final in ClientLoadSnapshot

* use a constant noop client stream tracer instead of creating new one for each noop client stream tracer factory

* rename loadReportStore for abstraction
2019-05-13 15:38:26 -07:00
Chengyuan Zhang 7712ef596c
xds/third_party: revert change of envoy import script (#5667)
* Revert "xds/third_party: fixed compatibility issue of regex in BSD for import.sh sed command (#5613)"

This reverts commit affce636dd.

* added comment to avoid manual change as the script is synced with internal upstream
2019-05-04 12:23:54 -07:00
Eric Anderson ab2e048f13 Lint fixes for unused and Truth and Queue 2019-04-30 22:44:00 -07:00
Kun Zhang 973885457f
core: change ClientStreamTracer.StreamInfo to a final class with a builder (#5648)
As we are now endorsing the wrapping of ClientStreamTracers by
providing ForwardingClientStreamTracer, there is a need for altering
StreamInfo, especially CallOptions before it's passed onto the
delegate.  A Builder class and a toBuilder() provides a robust way
to copy the rest of the fields.

This is a breaking change for anybody who creates StreamInfo, which is
unlikely in non-test code, because StreamInfo was added as late as
1.20.0.
2019-04-30 09:10:56 -07:00
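The builder-plus-toBuilder() pattern described in the commit above (#5648) can be sketched as follows. This is a simplified stand-in, not the real ClientStreamTracer.StreamInfo: the class name `StreamInfoSketch` is hypothetical, and plain strings stand in for the real field types.

```java
/** Hypothetical immutable class with a builder; strings stand in for the real field types. */
final class StreamInfoSketch {
  private final String transportAttrs;
  private final String callOptions;

  private StreamInfoSketch(Builder builder) {
    this.transportAttrs = builder.transportAttrs;
    this.callOptions = builder.callOptions;
  }

  String getTransportAttrs() {
    return transportAttrs;
  }

  String getCallOptions() {
    return callOptions;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  /** Copies every field into a new builder so a wrapper can alter just one of them. */
  Builder toBuilder() {
    return new Builder().setTransportAttrs(transportAttrs).setCallOptions(callOptions);
  }

  static final class Builder {
    private String transportAttrs = "";
    private String callOptions = "";

    Builder setTransportAttrs(String transportAttrs) {
      this.transportAttrs = transportAttrs;
      return this;
    }

    Builder setCallOptions(String callOptions) {
      this.callOptions = callOptions;
      return this;
    }

    StreamInfoSketch build() {
      return new StreamInfoSketch(this);
    }
  }
}
```

A wrapping tracer could then call `info.toBuilder().setCallOptions(...).build()` before delegating, leaving the remaining fields untouched even if more are added later.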
Chengyuan Zhang ea70de601c
xds: xds LRS client implementation with client load stats (#5588)
* Implement LRS client with backoff. No load data is involved yet, only for load reporting interval updates. Unit test with load report interval update and streamClosed retry.

* use a separate stopwatch to manage actual load report interval

* refactor XdsLrsClientTest

* LRS response will only receive exactly one cluster name for grpc use case

* add more XdsLrsClientTest

* change class modifier

* fixed class comment

* renamed TRAFFICDIRECTOR_HOSTNAME_FIELD

* removed self-implemented Duration util methods, instead use methods in com.google.protobuf.util.Durations

* starting LrsStream's stopwatch inside LrsStream's start method

* fixed bug of using the wrong stopwatch for XdsLrsClient retrying

* removed try-catch around request StreamObserver

* polished code by eliminating unnecessary operations

* log an error instead of crashing the thread when receiving LRS response for different cluster name

* created a XdsLoadStatsManager interface, XdsLrsClient implements it

* added XdsLoadStatsStore component in XdsLrsClient

* specify thread safety in XdsLoadStatsManager

* fixed style and convention issues

* added test case for verifying recorded load data by manually crafting load data

* added thread-safety in interface specification

* minor polish with adding debug logs to LRS client
2019-04-24 13:58:51 -07:00
Eric Anderson 2936242160 xds: Add missing RunWith annotation to test 2019-04-23 17:36:46 -07:00
Carl Mastrangelo 04e07034f3
all: update to truth 0.44 2019-04-23 10:50:49 -07:00
Chengyuan Zhang 43e4bce1c3
xds/third_party: import proto from envoy repo, added udpa orca protos (#5614) 2019-04-19 17:31:29 -07:00
Chengyuan Zhang 07f9efe95e
xds: xds load report store implementation (#5587)
* Implemented XdsClientLoadRecorder, which is a ClientStreamTracer.Factory that takes a counter and produces ClientStreamTracer aggregating the counter in callback.

* WIP: add tests for XdsLoadReportStore

* fix query count logic, use an atomic in-progress call counter instead of callsStarted with manual computation at snapshot

* make XdsLoadReportStore threadsafe

* fix class and field modifiers

* make iterating concurrentMap threadsafe

* fixed forgetting to call delegated streamClosed

* add a method to discard ClientLoadCounter for a given locality

* add test to guard that interceptPickResult does not destroy original ClientStreamTracer

* added cluster wide dropCounters and method to be called to record dropped requests, tests to be added later.

* add methods for add/discard ClientLoadCounters for localities manually instead of implicitly added by interceptPickResult call. Unit tested.

* make a static noop ClientStreamTracer and ClientStreamTracer.Factory instead of creating a new one each time it is needed

* refactor interceptPickResult

* modified ClientLoadCounter to allow continuing recording loads for localities no longer exposed by balancer while having ongoing loads

* refactor tests

* reworded method comment for calling in syncContext

* fixed issue of not setting dropCount to 0 after load reporting

* polish test

* added test coverage for recording dropped requests (not concurrent)

* added class comment for XdsClientLoadRecorder
2019-04-19 12:08:05 -07:00
Chengyuan Zhang affce636dd
xds/third_party: fixed compatibility issue of regex in BSD for import.sh sed command (#5613)
* fixed issue where the import.sh sed expr with non-extended regex does not support \| in BSD

* fixed sed -i issue for cross platform
2019-04-18 14:20:57 -07:00
Carl Mastrangelo a395eec4a3
core: update LB and NR API names
Updates #1770
2019-04-17 12:45:29 -07:00
Eric Anderson 80c3c992a6 core: Move io.grpc to grpc-api
io.grpc has fewer dependencies than io.grpc.internal. Moving it to a
separate artifact lets users use the API without bringing in the deps.
If the library has an optional dependency on grpc, that can be quite
convenient.

We now version-pin both grpc-api and grpc-core, since both contain
internal APIs.

I had to change a few tests in grpc-api to avoid FakeClock. Moving
FakeClock to grpc-api was difficult because it uses
io.grpc.internal.TimeProvider, which can't be moved since it is a
production class. Having grpc-api's tests depend on grpc-core's test
classes would be weird and cause a circular dependency. Having
grpc-api's tests depend on grpc-core is likely possible, but weird and
fairly unnecessary at this point. So instead I rewrote the tests to
avoid FakeClock.

Fixes #1447
2019-04-16 21:45:40 -07:00
Kun Zhang 0244418d2d
core: Move ConfigOrError up one level. (#5578)
This class is used in other places than just NameResolver.Helper.  It
should not be an inner class of Helper.

Strictly speaking this is an API-breaking change.  However, this is
part of the service config error handling API that hasn't been done
yet.  Nobody has a legitimate reason to use it.
2019-04-10 16:28:23 -07:00
ZHANG Dapeng c8a0af572c
xds: add InterLocalityPicker 2019-04-05 18:52:25 -07:00
Eric Anderson 52dff83717 Update Protobuf to 3.7.1
This mainly avoids protoc from 3.7.0 which has a dependency on libatomic. Most
of our systems have libatomic, so it mostly works, but the interop docker
container does not, so building fails. Version 3.7.1 was rebuilt to avoid
needing the libatomic shared library.

This has the added benefit that Bazel is now on the same version as Gradle, as
3.7.1 included fixes for Bazel.
2019-04-05 10:55:14 -07:00
ZHANG Dapeng 30038fd0be
xds: remove unused code in test 2019-04-02 13:52:55 -07:00
Carl Mastrangelo 17d67f17fa
all: add LoadBalancer overload for Resolution results 2019-03-29 09:31:24 -07:00
Carl Mastrangelo 5ef8377efa
core: remove Type from ConfigOrError 2019-03-28 09:40:50 -07:00
ZHANG Dapeng a2cda8d15d
all: fix lint 2019-03-20 09:01:25 -07:00
Tim van der Lippe d35fbd7eee all: Update to Mockito 2
This is the public port of cl/238445847

Fixes #5319
2019-03-19 14:17:52 -07:00
Carl Mastrangelo c6b505229c
all: move LB parsing logic into LB.Factory 2019-03-19 13:33:13 -07:00
Eric Anderson d7e53e871b
Merge pull request #5454 from ejona86/protobuf-3.7.0
Upgrade to Protobuf 3.7.0
2019-03-11 15:39:50 -06:00
Carl Mastrangelo e5e01b5169
core,grpclb: use better generics on service config 2019-03-08 14:11:13 -08:00
Eric Anderson b48b0ac1d4 all: Stop committing generated protobuf messages
This commit swaps to using a Sync task to place generated code in the
src/generated folder instead of the gradle-protobuf-plugin's
generatedFilesBaseDir. This provides much nicer results on failed
builds, and you will no longer see all the generated files deleted.

But at the same time the Sync task makes it easy to only copy the
grpc-generated code. This was not previously done because we were lazy
and using generatedFilesBaseDir, which made it difficult to treat the
services differently from the messages.
2019-03-05 16:28:55 -07:00
Kun Zhang 02f55189aa
core: refactor load-balancing config handling (#5397)
The LoadBalancingConfig message, which looks like
```json
{
  "policy_name" : {
    "config_key1" : "config_value1",
    "config_key2" : "config_value2"
   }
}
```
appears multiple times. It gets super tedious and confusing to handle, because both the whole config and the value (which in the above example is `{ "config_key1" : "config_value1" }`) are just `Map<String, Object>`, and each user needs to do the following validation:
 1. The whole config must have exactly one key
 2. The value must be a map

Here I define `LbConfig` that holds the policy name and the config value, and a method in `ServiceConfigUtil` that converts the parsed JSON format into `LbConfig`.

There are also multiple cases where you need to handle a list of configs (top-level balancing policy, child and fallback policies in xds, grpclb child policies). I also made another helper method in `ServiceConfigUtil` to convert them into `List<LbConfig>`.

Found and fixed a bug in the xds code, where the top-level balancer should pass the config value (excluding the policy name), not the whole config to the child balancers. Search for "supported_1_option" in the diff to see it in the tests.
2019-03-01 19:05:33 -08:00
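The two validation rules listed in the commit above (#5397) can be sketched as a small conversion helper. This is an illustration under assumptions, not the actual `LbConfig` or `ServiceConfigUtil` code: the class name `LbConfigSketch`, the method `unwrap`, and the exception type are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for a policy name and its config value; not the real gRPC class. */
final class LbConfigSketch {
  private final String policyName;
  private final Map<String, Object> rawConfigValue;

  private LbConfigSketch(String policyName, Map<String, Object> rawConfigValue) {
    this.policyName = policyName;
    this.rawConfigValue = Collections.unmodifiableMap(new LinkedHashMap<>(rawConfigValue));
  }

  String getPolicyName() {
    return policyName;
  }

  Map<String, Object> getRawConfigValue() {
    return rawConfigValue;
  }

  /** Converts a parsed-JSON config map into a typed holder, enforcing both rules. */
  static LbConfigSketch unwrap(Map<String, Object> lbConfig) {
    // Rule 1: the whole config must have exactly one key (the policy name).
    if (lbConfig.size() != 1) {
      throw new IllegalArgumentException(
          "expected exactly one policy, got " + lbConfig.size() + " keys");
    }
    Map.Entry<String, Object> entry = lbConfig.entrySet().iterator().next();
    // Rule 2: the value under the policy name must itself be a map.
    if (!(entry.getValue() instanceof Map)) {
      throw new IllegalArgumentException(
          "config value for " + entry.getKey() + " is not a map");
    }
    @SuppressWarnings("unchecked")
    Map<String, Object> value = (Map<String, Object>) entry.getValue();
    return new LbConfigSketch(entry.getKey(), value);
  }
}
```

Doing the check once at conversion time means child balancers receive only the config value and never have to repeat the map inspection themselves.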
ZHANG Dapeng b9fb649ce1
xds: fallback handling
added fallback handling
in addition: 
- made XdsLbState not abstract for now
- did not include graceful swapping of balancers when service config changes, for now just shutdown the old one and use the new one.
2019-02-26 13:13:42 -08:00
ZHANG Dapeng 5ae9d91039
xds: use shadow plugin for generated code
* Import envoy proto file to the latest internal version, which has correct java proto options. (The PGV proto, `validate.proto`, doesn't have the correct and up-to-date java_package proto option yet, but as long as we don't use those generated classes, it seems fine.)
* Stop modifying java proto options by import.sh.
* Apply shadow plugin when publishing.
2019-02-15 10:19:44 -08:00
Eric Anderson eaca73473c
Upgrade to protobuf 3.6.1
For Bazel, we upgrade to protobuf 3.6.1.2 and javalite HEAD to fix
incompatibilities in newer Bazel releases.

compiler/Dockerfile is unused, so it was removed instead of being updated.

protoc no longer includes codegen for nano, so we remain on the older protoc
any time nano is used.

Protobuf now requires C++11 when compiling, so windows was swapped to
VC 14.
2019-02-07 13:40:53 -08:00
ZHANG Dapeng ea8968beed
xds: implement xds plugin selection
- defined XdsLbState, playing a similar role to GrpclbState
- there are two modes of XdsLbState: STANDARD and CUSTOM
- on `XdsLoadBalancer.handleResolvedAddressGroups()`, the `xdsLoadBalancer` will update the `xdsLbState` based on the lb config in the attributes passed in
2019-02-05 14:07:09 -08:00
ZHANG Dapeng 7a547276da
xds: import from envoy and add ads.proto and lrs.proto 2019-01-11 14:46:04 -08:00
ZHANG Dapeng 94fefdda12
xds: import xds service protos
All files other than the following are generated by `import.sh`.
```
settings.gradle
xds/build.gradle
xds/third_party/envoy/import.sh
xds/third_party/protoc-gen-validate/import.sh
```
2018-12-17 10:50:54 -08:00