A62: pick_first: sticky TRANSIENT_FAILURE and address order randomization
----
- Author(s): Easwar Swaminathan (@easwars)
- Approver: @markdroth
- Status: Draft
- Implemented in:
- Last updated: 2023-04-20
- Discussion at: https://groups.google.com/g/grpc-io/c/uUf0V5zZvQc
## Abstract
This document describes two changes being made to the `pick_first` LB
policy with regard to:
- Expected behavior when connections to all addresses fail, and
- Support for address order randomization
## Background
All gRPC implementations contain a simple load balancing policy named
`pick_first` that can be summarized as follows:
- It takes a list of addresses from the name resolver and attempts to connect
  to those addresses one at a time, in order, until it finds one that is
  reachable.
- Once it finds a reachable address:
  - All RPCs sent on the gRPC channel will be sent to this address.
  - If this connection breaks at a later point in time, `pick_first` will not
    attempt to reconnect until the application requests that it does so, or
    makes an RPC.
- If none of the addresses are reachable, it applies an exponential backoff
  before attempting to reconnect.
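For reference, a channel can be pointed at this policy explicitly through the
service config. A minimal sketch (today the policy takes no configuration, so
the config object is empty):

```json
{
  "loadBalancingConfig": [
    { "pick_first": {} }
  ]
}
```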
There are a few problems with the existing `pick_first` functionality, which
are described in the following subsections.
### Sticky Transient Failure
When connections to all addresses fail, there are some similarities and some
differences between the Java/Go implementations and the C-core implementation.

Similarities include:
- Reporting `TRANSIENT_FAILURE` as the connectivity state to the channel.
- Applying exponential backoff. During the backoff period, wait_for_ready
  RPCs are queued while other RPCs fail.

Differences show up after the backoff period ends:
- C-core remains in `TRANSIENT_FAILURE` while Java/Go move to `IDLE`.
- C-core attempts to reconnect to the given addresses, while Java/Go rely on
  the client application to make an RPC or an explicit attempt to connect.
- C-core moves to `READY` only when a connection attempt succeeds.
This behavior of staying in `TRANSIENT_FAILURE` until it can report `READY`
is called sticky TRANSIENT_FAILURE, or sticky-TF.
Current `pick_first` implementations that don't provide sticky-TF have the
following shortcomings:
- When none of the received addresses are reachable, client applications
  experience long delays before their RPCs fail. This is because the channel
  does not spend enough time in `TRANSIENT_FAILURE` and goes back to
  `CONNECTING` state while attempting to reconnect.
- The `priority` LB policy maintains an ordered list of child policies, and
  sends picks to the highest priority child reporting `READY` or `IDLE`. It
  expects child policies to support sticky-TF; if they don't, picks can be
  sent to a higher priority child with no reachable backends, instead of a
  lower priority child that is reporting `READY`. This comes up in xDS in the
  following scenario:
  - A `LOGICAL_DNS` cluster is used under an aggregate cluster, and the
    `LOGICAL_DNS` cluster is not the last cluster in the list.
  - Each cluster under the aggregate cluster is represented as a child policy
    under `priority`, and the leaf policy for a `LOGICAL_DNS` cluster is
    `pick_first`.
  - Without sticky-TF support in `pick_first`, the `priority` LB policy can
    continue to send picks to a higher priority `LOGICAL_DNS` cluster when
    none of the addresses behind it are reachable, because `pick_first`
    doesn't report `TRANSIENT_FAILURE` as its connectivity state. See gRFC
    A37 for more details on aggregate clusters.
### L4 Load Balancing
Because `pick_first` sends all requests to the same address, it is often used
for L4 load balancing by randomizing the order of the addresses used by each
client. In general, gRPC expects address ordering to be determined as part of
name resolution, not by the LB policy. For example, DNS servers may randomize
the order of addresses when there are multiple A/AAAA records, and the DNS
resolver in gRPC is expected to perform [RFC-6724][6724] address sorting.
However, there are some cases where DNS cannot randomize the address order,
either because the DNS server does not support that functionality or because it
is defeated by client-side DNS caching. To address such cases, it is desirable
to add a client-side mechanism for randomly shuffling the order of the
addresses.
### pick_first via xDS

There are cases where it is desirable to perform L4 load balancing using
`pick_first` when getting addresses via xDS instead of DNS. As a result, we
need a way to configure use of this LB policy via xDS.

Note that client-side address shuffling may be equally desirable in this
case, since the xDS server may send the same EDS resource (with the same
endpoints in the same order) to all clients.
### Related Proposals:
- gRFC A37: xDS Aggregate and Logical DNS Clusters
- gRFC A52: gRPC xDS Custom Load Balancer Configuration
- gRFC A56: `priority_experimental` LB policy
## Proposal
Specific changes are described in their own subsections below.
### Use sticky-TF by default

Using sticky-TF by default in all `pick_first` implementations would enable
us to overcome the shortcomings described above. This would involve making
the following changes to `pick_first` implementations, once connections to
all addresses fail:
- Report `TRANSIENT_FAILURE` as the connectivity state.
- Attempt to reconnect to the addresses indefinitely until a connection
  succeeds (at which point, they should report `READY`), or there is no RPC
  activity on the channel for the specified `IDLE_TIMEOUT`.
All gRPC implementations should implement `IDLE_TIMEOUT` and have it enabled
by default. A default value of 30 minutes is recommended.
### Enable random shuffling of address list
At the time of this writing, `pick_first` implementations do not expect any
configuration to be passed to them. As part of this design, we will add a
field to their configuration that enables them to randomly shuffle the order
of the addresses they receive.
```json
{
  // If set to true, instructs the LB policy to shuffle the order of the
  // list of addresses received from the name resolver before attempting to
  // connect to them.
  "shuffleAddressList": boolean
}
```
In a gRPC implementation that supports this feature, when the
`shuffleAddressList` option is enabled, the `pick_first` LB policy will
randomly shuffle the order of the addresses. This shuffling will be done when
the LB policy receives an updated address list from its parent.
Note that existing gRPC implementations that do not support this feature will ignore this field, so their behavior will remain unchanged.
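Putting the pieces together, a service config that selects `pick_first` with
shuffling enabled might look like the following sketch (field name as defined
above):

```json
{
  "loadBalancingConfig": [
    {
      "pick_first": {
        "shuffleAddressList": true
      }
    }
  ]
}
```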
### pick_first via xDS
gRPC recently added support for custom load balancer configuration to be specified by the xDS server. See gRFC A52 for more details.
To enable the xDS server to specify `pick_first` using this mechanism, an
extension configuration message was added as part of Envoy PR
[#26952](https://github.com/envoyproxy/envoy/pull/26952).
gRPC's custom LB policy functionality will be enhanced to support this new
extension, resulting in the `pick_first` LB policy being used as the locality
and endpoint picking policy.
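For illustration, the cluster's `load_balancing_policy` might then carry the
new extension roughly as follows. This is a sketch, not normative: consult
the PR above for the authoritative type URL and field names.

```yaml
load_balancing_policy:
  policies:
  - typed_extension_config:
      name: envoy.load_balancing_policies.pick_first
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.pick_first.v3.PickFirst
        shuffle_address_list: true
```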
### Temporary environment variable protection

During initial development, the `GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG`
environment variable will guard the following:
- `shuffleAddressList` configuration knob in the `pick_first` LB policy
- Accepting the PickFirst config message as a custom LB policy in xDS
## Rationale
N/A

## Implementation
Will be implemented in C-core, Java, Go, and Node.

## Open issues (if applicable)
N/A

[6724]: https://datatracker.ietf.org/doc/html/rfc6724