grpc-go/anti-patterns.md at 366decfd50a73c37911a51606e08ceda38c3d17c

7.6 KiB

Raw Blame History

Anti-Patterns of Client creation

How to properly create a `ClientConn`: `grpc.NewClient`

grpc.NewClient is the function in the gRPC library that creates a virtual connection from a client application to a gRPC server. It takes a target URI (which represents the name of a logical backend service and resolves to one or more physical addresses) and a list of options, and returns a ClientConn object that represents the virtual connection to the server. The ClientConn contains one or more actual connections to real servers and attempts to maintain these connections by automatically reconnecting to them when they break. NewClient was introduced in gRPC-Go v1.63.

The wrong way: `grpc.Dial`

grpc.Dial is a deprecated function that also creates the same virtual connection pool as grpc.NewClient. However, unlike grpc.NewClient, it immediately starts connecting and supports a few additional DialOptions that control this initial connection attempt. These are: WithBlock, WithTimeout, WithReturnConnectionError, and FailOnNonTempDialError.

That grpc.Dial creates connections immediately is not a problem in and of itself, but this behavior differs from how gRPC works in all other languages, and it can be convenient to have a constructor that does not perform I/O. It can also be confusing to users, as most people expect a function called Dial to create a connection which may need to be recreated if it is lost.

grpc.Dial uses "passthrough" as the default name resolver for backward compatibility while grpc.NewClient uses "dns" as its default name resolver. This subtle difference is important to legacy systems that also specified a custom dialer and expected it to receive the target string directly.

For these reasons, using grpc.Dial is discouraged. Even though it is marked as deprecated, we will continue to support it until a v2 is released (and no plans for a v2 exist at the time this was written).

Especially bad: using deprecated `DialOptions`

FailOnNonTempDialError, WithBlock, and WithReturnConnectionError are three DialOptions that are only supported by Dial because they only affect the behavior of Dial itself. WithBlock causes Dial to wait until the ClientConn reports its State as connectivity.Connected. The other two deal with returning connection errors before the timeout (WithTimeout or on the context when using DialContext).

The reason these options can be a problem is that connections with a ClientConn are dynamic -- they may come and go over time. If your client successfully connects, the server could go down 1 second later, and your RPCs will fail. "Knowing you are connected" does not tell you much in this regard.

Additionally, all RPCs created on an "idle" or a "connecting" ClientConn will wait until their deadline or until a connection is established before failing. This means that you don't need to check that a ClientConn is "ready" before starting your RPCs. By default, RPCs will fail if the ClientConn enters the "transient failure" state, but setting WaitForReady(true) on a call will cause it to queue even in the "transient failure" state, and it will only ever fail due to a deadline, a server response, or a connection loss after the RPC was sent to a server.

Some users of Dial use it as a way to validate the configuration of their system. If you wish to maintain this behavior but migrate to NewClient, you can call GetState, then Connect if the state is Idle and WaitForStateChange until the channel is connected. However, if this fails, it does not mean that your configuration was bad - it could also mean the service is not reachable by the client due to connectivity reasons.

Best practices for error handling in gRPC

Instead of relying on failures at dial time, we strongly encourage developers to rely on errors from RPCs. When a client makes an RPC, it can receive an error response from the server. These errors can provide valuable information about what went wrong, including information about network issues, server-side errors, and incorrect usage of the gRPC API.

By handling errors from RPCs correctly, developers can write more reliable and robust gRPC applications. Here are some best practices for error handling in gRPC:

Always check for error responses from RPCs and handle them appropriately.
Use the status field of the error response to determine the type of error that occurred.
When retrying failed RPCs, consider using the built-in retry mechanism provided by gRPC-Go, if available, instead of manually implementing retries. Refer to the gRPC-Go retry example documentation for more information. Note that this is not a substitute for client-side retries as errors that occur after an RPC starts on a server cannot be retried through gRPC's built-in mechanism.
If making an outgoing RPC from a server handler, be sure to translate the status code before returning the error from your method handler. For example, if the error is an INVALID_ARGUMENT status code, that probably means your service has a bug (otherwise it shouldn't have triggered this error), in which case INTERNAL is more appropriate to return back to your users.

Example: Handling errors from an RPC

The following code snippet demonstrates how to handle errors from an RPC in gRPC:

ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()

res, err := client.MyRPC(ctx, &MyRequest{})
if err != nil {
    // Handle the error appropriately,
    // log it & return an error to the caller, etc.
    log.Printf("Error calling MyRPC: %v", err)
    return nil, err
}

// Use the response as appropriate
log.Printf("MyRPC response: %v", res)

To determine the type of error that occurred, you can use the status field of the error response:

resp, err := client.MakeRPC(context.TODO(), request)
if err != nil {
  if status, ok := status.FromError(err); ok {
    // Handle the error based on its status code
    if status.Code() == codes.NotFound {
      log.Println("Requested resource not found")
    } else {
      log.Printf("RPC error: %v", status.Message())
    }
  } else {
    // Handle non-RPC errors
    log.Printf("Non-RPC error: %v", err)
  }
  return
}

// Use the response as needed
log.Printf("Response received: %v", resp)

Example: Using a backoff strategy

When retrying failed RPCs, use a backoff strategy to avoid overwhelming the server or exacerbating network issues:

var res *MyResponse
var err error

retryableStatusCodes := map[codes.Code]bool{
  codes.Unavailable: true, // etc
}

// Retry the RPC a maximum number of times.
for i := 0; i < maxRetries; i++ {
    // Make the RPC.
    res, err = client.MyRPC(context.TODO(), &MyRequest{})

    // Check if the RPC was successful.
    if !retryableStatusCodes[status.Code(err)] {
        // The RPC was successful or errored in a non-retryable way;
        // do not retry.
        break
    }

    // The RPC is retryable; wait for a backoff period before retrying.
    backoff := time.Duration(i+1) * time.Second
    log.Printf("Error calling MyRPC: %v; retrying in %v", err, backoff)
    time.Sleep(backoff)
}

// Check if the RPC was successful after all retries.
if err != nil {
    // All retries failed, so handle the error appropriately
    log.Printf("Error calling MyRPC: %v", err)
    return nil, err
}

// Use the response as appropriate.
log.Printf("MyRPC response: %v", res)

7.6 KiB Raw Blame History