etcd-client starts retrying transient errors from the etcd cluster

This PR enables the etcd client's unaryClientInterceptor in conjunction with the Prometheus interceptor.
Previously it was simply overwritten by the Prometheus interceptor.
As a result, the etcd client didn't attempt to retry certain errors.
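
The overwrite comes down to how the two kinds of gRPC dial options compose. grpc-go documents that `WithUnaryInterceptor` sets a single interceptor, while `WithChainUnaryInterceptor` appends to a chain to which the single interceptor is always prepended. The sketch below is a stdlib-only model of that behavior, not the grpc-go API: `interceptor`, `dialOptions`, `withUnaryInterceptor`, `withChainUnaryInterceptor`, and `effective` are hypothetical names. It assumes, as the commit message implies, that the etcd client registers its retry interceptor before the caller's options are applied.

```go
package main

import "fmt"

// interceptor names an interceptor; a hypothetical simplification of
// grpc.UnaryClientInterceptor that is enough to show ordering.
type interceptor string

// dialOptions models the two relevant settings: one replaceable slot and
// one append-only chain.
type dialOptions struct {
	unary interceptor
	chain []interceptor
}

type dialOption func(*dialOptions)

// withUnaryInterceptor models grpc.WithUnaryInterceptor: it overwrites the slot.
func withUnaryInterceptor(i interceptor) dialOption {
	return func(o *dialOptions) { o.unary = i }
}

// withChainUnaryInterceptor models grpc.WithChainUnaryInterceptor: it appends.
func withChainUnaryInterceptor(is ...interceptor) dialOption {
	return func(o *dialOptions) { o.chain = append(o.chain, is...) }
}

// effective applies opts in order and returns the interceptors that would run,
// outermost first: the single slot (if set) is prepended to the chain.
func effective(opts ...dialOption) []interceptor {
	var o dialOptions
	for _, opt := range opts {
		opt(&o)
	}
	var out []interceptor
	if o.unary != "" {
		out = append(out, o.unary)
	}
	return append(out, o.chain...)
}

func main() {
	// The etcd client's retry interceptor is registered first; the caller's
	// options are applied after it.
	before := effective(withUnaryInterceptor("etcd-retry"), withUnaryInterceptor("prometheus"))
	after := effective(withUnaryInterceptor("etcd-retry"), withChainUnaryInterceptor("prometheus"))
	fmt.Println(before) // [prometheus]
	fmt.Println(after)  // [etcd-retry prometheus]
}
```

In the first case the retry interceptor silently disappears; in the second, both survive, with the retry interceptor outermost.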

The unaryClientInterceptor is important because it knows how to retry transient errors returned by the etcd cluster. It makes the API server more resilient to failures, so end users won't see certain errors.
The full list of retriable (codes.Unavailable) errors can be found at https://github.com/etcd-io/etcd/blob/main/api/v3rpc/rpctypes/error.go#L72

Kubernetes-commit: b67cf46cf4dad696a4ffdec4d7d2deef2e78df36
Lukasz Szaszkiewicz 2021-09-15 16:40:44 +02:00 committed by Kubernetes Publisher
parent abbd48dff0
commit 6b08eca4c8
1 changed file with 7 additions and 2 deletions

@@ -132,8 +132,13 @@ func newETCD3Client(c storagebackend.TransportConfig) (*clientv3.Client, error)
 	}
 	dialOptions := []grpc.DialOption{
 		grpc.WithBlock(), // block until the underlying connection is up
-		grpc.WithUnaryInterceptor(grpcprom.UnaryClientInterceptor),
-		grpc.WithStreamInterceptor(grpcprom.StreamClientInterceptor),
+		// use chained interceptors so that the default (retry and backoff) interceptors are added.
+		// otherwise they will be overwritten by the metric interceptor.
+		//
+		// these optional interceptors will be placed after the default ones.
+		// which seems to be what we want as the metrics will be collected on each attempt (retry)
+		grpc.WithChainUnaryInterceptor(grpcprom.UnaryClientInterceptor),
+		grpc.WithChainStreamInterceptor(grpcprom.StreamClientInterceptor),
 	}
 	if egressDialer != nil {
 		dialer := func(ctx context.Context, addr string) (net.Conn, error) {