mirror of https://github.com/dapr/docs.git
Merge pull request #2326 from greenie-msft/resiliency_docs
Resiliency docs
commit
80d3e28ae3
|
@ -6,7 +6,7 @@ weight: 100
|
|||
description: "Overview of the Dapr sidecar process"
|
||||
---
|
||||
|
||||
Dapr uses a [sidecar pattern]({{< ref "overview.md#sidecar-architecture" >}}), meaning the Dapr APIs are run and exposed on a separate process (i.e. the Dapr sidecar) running alongside your application. The Dapr sidecar process is named `daprd` and is launched in different ways depending on the hosting environment.
|
||||
Dapr uses a [sidecar pattern]({{< ref "concepts/overview.md#sidecar-architecture" >}}), meaning the Dapr APIs are run and exposed on a separate process (i.e. the Dapr sidecar) running alongside your application. The Dapr sidecar process is named `daprd` and is launched in different ways depending on the hosting environment.
|
||||
|
||||
<img src="/images/overview-sidecar-model.png" width=700>
|
||||
|
||||
|
|
|
@ -0,0 +1,7 @@
|
|||
---
|
||||
type: docs
|
||||
title: "Error recovery using resiliency policies"
|
||||
linkTitle: "Resiliency"
|
||||
weight: 550
|
||||
description: "How to configure and customize Dapr error retries, timeouts and circuit breakers"
|
||||
---
|
|
@ -0,0 +1,93 @@
|
|||
---
|
||||
type: docs
|
||||
title: "Policies"
|
||||
linkTitle: "Policies"
|
||||
weight: 4500
|
||||
description: "Configure resiliency policies for timeouts, retries and circuit breakers"
|
||||
---
|
||||
|
||||
### Policies
|
||||
|
||||
You define timeout, retry, and circuit breaker policies under `policies`. Each policy is given a name so you can refer to it from the `targets` section in the resiliency spec.
|
||||
|
||||
#### Timeouts
|
||||
|
||||
Timeouts can be used to terminate long-running operations early. If a timeout duration is exceeded:
|
||||
|
||||
- The operation in progress is terminated (if possible).
|
||||
- An error is returned.
|
||||
|
||||
Valid values are of the form `15s`, `2m`, `1h30m`, etc.
|
||||
|
||||
Example:
|
||||
```yaml
spec:
  policies:
    # Timeouts are simple named durations.
    timeouts:
      general: 5s
      important: 60s
      largeResponse: 10s
```
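
The duration strings above are sequences of number-plus-unit pairs. As a minimal sketch of how such values could be interpreted, the hypothetical `parse_duration` helper below converts them to seconds. It assumes only the `ms`, `s`, `m`, and `h` units that appear on this page, and it is an illustration rather than the parser Dapr uses.

```python
import re

# Units that appear in the examples on this page; anything else is out of scope for this sketch.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a string such as '1h30m' into seconds (hypothetical helper)."""
    if not text:
        raise ValueError("empty duration")
    pos = 0
    total = 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total


if __name__ == "__main__":
    for value in ("15s", "2m", "1h30m"):
        print(value, "=", parse_duration(value), "seconds")
```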
|
||||
|
||||
#### Retries
|
||||
|
||||
With `retries`, you can define a retry strategy for failed operations, including requests that failed because they triggered a defined timeout or circuit breaker policy. The following retry options are configurable:
|
||||
|
||||
| Retry option | Description |
| ------------ | ----------- |
| `policy` | Determines the back-off and retry interval strategy. Valid values are `constant` and `exponential`. Defaults to `constant`. |
| `duration` | Determines the time interval between retries. Default: `5s`. Only applies to the `constant` policy. Valid values are of the form `200ms`, `15s`, `2m`, etc. |
| `maxInterval` | Determines the maximum interval between retries to which the `exponential` back-off policy can grow. Additional retries always occur after a duration of `maxInterval`. Defaults to `60s`. Valid values are of the form `5s`, `1m`, `1m30s`, etc. |
| `maxRetries` | The maximum number of retries to attempt. `-1` denotes an indefinite number of retries. Defaults to `-1`. |
|
||||
|
||||
The exponential back-off window uses the following formula:
|
||||
|
||||
```
BackOffDuration = PreviousBackOffDuration * (Random value from 0.5 to 1.5) * 1.5
if BackOffDuration > maxInterval {
  BackOffDuration = maxInterval
}
```
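
To get a feel for how this window grows, the formula can be simulated. The sketch below is illustrative only: the function names are hypothetical, durations are plain seconds, and the starting interval of `0.5` is an assumption, since the formula does not state an initial value.

```python
import random


def next_backoff(previous: float, max_interval: float, rng: random.Random) -> float:
    """Apply one step of the formula above; all durations are in seconds."""
    jitter = rng.uniform(0.5, 1.5)  # random value from 0.5 to 1.5
    return min(previous * jitter * 1.5, max_interval)


def backoff_schedule(initial: float, max_interval: float, steps: int, seed: int = 0) -> list:
    """Return `steps` successive back-off durations, starting from `initial`."""
    rng = random.Random(seed)
    durations = []
    current = initial
    for _ in range(steps):
        current = next_backoff(current, max_interval, rng)
        durations.append(current)
    return durations


if __name__ == "__main__":
    # `initial=0.5` is an assumed starting interval, not a documented default.
    for delay in backoff_schedule(initial=0.5, max_interval=15.0, steps=10):
        print(f"{delay:.2f}s")
```

Because the random factor can be below `1/1.5`, an individual step may shrink, but the expected interval grows by a factor of 1.5 per retry until it is capped at `maxInterval`.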
|
||||
|
||||
Example:
|
||||
```yaml
spec:
  policies:
    # Retries are named templates for retry configurations and are instantiated for the life of the operation.
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10

      retryForever:
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # Retry indefinitely
```
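
The behavior of a `constant` policy such as `pubsubRetry` can be pictured as a simple loop. The following is a minimal sketch, not Dapr's implementation: `retry_constant` is a hypothetical helper, durations are in seconds, and it assumes the first attempt does not count toward `maxRetries`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_constant(
    operation: Callable[[], T],
    duration: float = 5.0,
    max_retries: int = -1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds, waiting `duration` seconds between attempts.

    `max_retries=-1` retries indefinitely, mirroring the documented default.
    """
    retries = 0
    while True:
        try:
            return operation()
        except Exception:
            if max_retries >= 0 and retries >= max_retries:
                raise  # retries exhausted: surface the last error
            retries += 1
            sleep(duration)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; for example, `retry_constant(op, duration=5.0, max_retries=10)` corresponds to the `pubsubRetry` values above.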
|
||||
|
||||
#### Circuit breakers
|
||||
|
||||
Circuit breaker (CB) policies are used when other applications, services, or components are experiencing elevated failure rates. CBs monitor requests and shut off all traffic to the impacted service when certain criteria are met. By doing this, CBs give the service time to recover from its outage instead of flooding it with events. The CB can also allow partial traffic through to see if the system has healed (half-open state). Once successful requests start to occur, the CB can close and allow traffic to resume.
|
||||
|
||||
| Circuit breaker option | Description |
| ---------------------- | ----------- |
| `maxRequests` | The maximum number of requests allowed to pass through when the CB is half-open (recovering from failure). Defaults to `1`. |
| `interval` | The cyclical period of time used by the CB to clear its internal counts. If set to 0 seconds, the counts are never cleared. Defaults to `0s`. |
| `timeout` | The period of the open state (directly after failure) until the CB switches to half-open. Defaults to `60s`. |
| `trip` | A Common Expression Language (CEL) statement that is evaluated by the CB. When the statement evaluates to true, the CB trips and becomes open. Default is `consecutiveFailures > 5`. |
| `circuitBreakerScope` | Specify whether circuit breaking state should be scoped to an individual actor ID, all actors across the actor type, or both. Possible values include `id`, `type`, or `both`. |
| `circuitBreakerCacheSize` | Specify a cache size for the number of CBs to keep in memory. The value should be larger than the expected number of active actor instances. Provide an integer value, for example `5000`. |
|
||||
|
||||
Example:
|
||||
```yaml
spec:
  policies:
    circuitBreakers:
      pubsubCB:
        maxRequests: 1
        interval: 8s
        timeout: 45s
        trip: consecutiveFailures > 8
```
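
The closed, open, and half-open states described above can be modeled as a small state machine. The sketch below is a toy model under stated assumptions, not Dapr's implementation: the `CircuitBreaker` class and its method names are hypothetical, the `trip` expression is reduced to a fixed "consecutive failures greater than N" check, the `interval` option is not modeled, and a single successful half-open request closes the breaker.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Toy circuit breaker with closed, open, and half-open states."""

    def __init__(self, max_requests: int = 1, timeout: float = 60.0,
                 trip_after: int = 5, clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests  # requests allowed while half-open
        self.timeout = timeout            # seconds to stay open before probing
        self.trip_after = trip_after      # trips when consecutive failures > trip_after
        self.clock = clock
        self.state = "closed"
        self.consecutive_failures = 0
        self.half_open_requests = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a request may pass through right now."""
        if self.state == "open" and self.clock() - self.opened_at >= self.timeout:
            self.state = "half-open"
            self.half_open_requests = 0
        if self.state == "closed":
            return True
        if self.state == "half-open" and self.half_open_requests < self.max_requests:
            self.half_open_requests += 1
            return True
        return False

    def record(self, success: bool) -> None:
        """Report the outcome of a request that was allowed through."""
        if success:
            self.consecutive_failures = 0
            self.state = "closed"
            return
        self.consecutive_failures += 1
        if self.state == "half-open" or self.consecutive_failures > self.trip_after:
            self.state = "open"
            self.opened_at = self.clock()
```

With `CircuitBreaker(max_requests=1, timeout=45.0, trip_after=8)`, which mirrors the `pubsubCB` values, the ninth consecutive failure opens the breaker, and one probe request is let through 45 seconds later.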
|
|
@ -0,0 +1,168 @@
|
|||
---
|
||||
type: docs
|
||||
title: "Overview"
|
||||
linkTitle: "Overview"
|
||||
weight: 4500
|
||||
description: "Configure Dapr retries, timeouts, and circuit breakers"
|
||||
---
|
||||
{{% alert title="Note" color="primary" %}}
|
||||
Resiliency is currently a preview feature. Before you can utilize a resiliency spec, you must first [enable the resiliency preview feature]({{< ref support-preview-features >}}).
|
||||
{{% /alert %}}
|
||||
|
||||
Distributed applications are commonly composed of many microservices, with dozens, even hundreds, of instances for any given application. With so many microservices, the likelihood of a system failure increases. For example, an instance can fail or be unresponsive due to hardware, an overwhelming number of requests, application restarts/scale outs, or several other reasons. These events can cause a network call between services to fail. Designing and implementing your application with fault tolerance, the ability to detect, mitigate, and respond to failures, allows your application to recover to a functioning state and become self-healing.
|
||||
|
||||
Dapr provides a capability for defining and applying fault tolerance resiliency policies via a [resiliency spec]({{< ref "resiliency-overview.md#complete-example-policy" >}}). Resiliency specs are saved in the same location as component specs and are applied when the Dapr sidecar starts. The sidecar determines how to apply resiliency policies to your Dapr API calls. In self-hosted mode, the resiliency spec must be named `resiliency.yaml`. In Kubernetes, Dapr finds the named resiliency specs used by your application. Within the resiliency spec, you can define policies for popular resiliency patterns, such as:
|
||||
|
||||
- [Timeouts]({{< ref "policies.md#timeouts" >}})
|
||||
- [Retries/back-offs]({{< ref "policies.md#retries" >}})
|
||||
- [Circuit breakers]({{< ref "policies.md#circuit-breakers" >}})
|
||||
|
||||
Policies can then be applied to [targets]({{< ref "targets.md" >}}), which include:
|
||||
|
||||
- [Apps]({{< ref "targets.md#apps" >}}) via service invocation
|
||||
- [Components]({{< ref "targets.md#components" >}})
|
||||
- [Actors]({{< ref "targets.md#actors" >}})
|
||||
|
||||
Additionally, resiliency policies can be [scoped to specific apps]({{< ref "component-scopes.md#application-access-to-components-with-scopes" >}}).
|
||||
|
||||
Below is the general structure of a resiliency policy:
|
||||
|
||||
```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: myresiliency
scopes:
  # optionally scope the policy to specific apps
spec:
  policies:
    timeouts:
      # timeout policy definitions

    retries:
      # retry policy definitions

    circuitBreakers:
      # circuit breaker policy definitions

  targets:
    apps:
      # apps and their applied policies here

    actors:
      # actor types and their applied policies here

    components:
      # components and their applied policies here
```
|
||||
|
||||
### Complete example policy
|
||||
|
||||
```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: myresiliency
# similar to subscription and configuration specs, scopes lists the Dapr App IDs that this
# resiliency spec can be used by.
scopes:
  - app1
  - app2
spec:
  # policies is where timeouts, retries and circuit breaker policies are defined.
  # each is given a name so they can be referred to from the targets section in the resiliency spec.
  policies:
    # timeouts are simple named durations.
    timeouts:
      general: 5s
      important: 60s
      largeResponse: 10s

    # retries are named templates for retry configurations and are instantiated for the life of the operation.
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10

      retryForever:
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # retry indefinitely

      important:
        policy: constant
        duration: 5s
        maxRetries: 30

      someOperation:
        policy: exponential
        maxInterval: 15s

      largeResponse:
        policy: constant
        duration: 5s
        maxRetries: 3

    # circuit breakers are automatically instantiated per component and app instance.
    # circuit breakers maintain counters that live as long as the Dapr sidecar is running. They are not persisted.
    circuitBreakers:
      simpleCB:
        maxRequests: 1
        timeout: 30s
        trip: consecutiveFailures >= 5

      pubsubCB:
        maxRequests: 1
        interval: 8s
        timeout: 45s
        trip: consecutiveFailures > 8

  # targets are what named policies are applied to. Dapr supports 3 target types - apps, components and actors
  targets:
    apps:
      appB:
        timeout: general
        retry: important
        # circuit breakers for services are scoped per app instance.
        # when a breaker is tripped, that route is removed from load balancing for the configured `timeout` duration.
        circuitBreaker: simpleCB

    actors:
      myActorType: # custom Actor Type Name
        timeout: general
        retry: important
        # circuit breakers for actors are scoped by type, id, or both.
        # when a breaker is tripped, that type or id is removed from the placement table for the configured `timeout` duration.
        circuitBreaker: simpleCB
        circuitBreakerScope: both
        circuitBreakerCacheSize: 5000

    components:
      # for state stores, policies apply to saving and retrieving state.
      statestore1: # any component name -- happens to be a state store here
        outbound:
          timeout: general
          retry: retryForever
          # circuit breakers for components are scoped per component configuration/instance. For example myRediscomponent.
          # when this breaker is tripped, all interaction to that component is prevented for the configured `timeout` duration.
          circuitBreaker: simpleCB

      pubsub1: # any component name -- happens to be a pubsub broker here
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB

      pubsub2: # any component name -- happens to be another pubsub broker here
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
        inbound: # inbound only applies to delivery from sidecar to app
          timeout: general
          retry: important
          circuitBreaker: pubsubCB
```
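
Because targets refer to policies by name, a misspelled name is an easy mistake to make. The sketch below shows one way you might check a spec before deploying it. It is illustrative only: `undefined_policy_refs` is a hypothetical helper, the spec is represented as a plain Python dictionary trimmed from the example above (with one deliberately missing retry policy), and nothing here describes how Dapr itself treats undefined names.

```python
def undefined_policy_refs(spec: dict) -> list:
    """List every target setting that names a policy missing from `spec['policies']`."""
    # Maps the key used under a target to the matching section under `policies`.
    sections = {"timeout": "timeouts", "retry": "retries", "circuitBreaker": "circuitBreakers"}
    policies = spec.get("policies", {})
    targets = spec.get("targets", {})
    problems = []

    def check(path: str, settings: dict) -> None:
        for key, section in sections.items():
            name = settings.get(key)
            if name is not None and name not in policies.get(section, {}):
                problems.append(f"{path}.{key}: {name!r} is not defined in policies.{section}")

    for kind in ("apps", "actors"):
        for target, settings in targets.get(kind, {}).items():
            check(f"targets.{kind}.{target}", settings)
    for component, directions in targets.get("components", {}).items():
        for direction in ("outbound", "inbound"):
            if direction in directions:
                check(f"targets.components.{component}.{direction}", directions[direction])
    return problems


# Trimmed, hand-written stand-in for the `spec` section of a resiliency file.
spec = {
    "policies": {
        "timeouts": {"general": "5s"},
        "retries": {"important": {"policy": "constant", "duration": "5s", "maxRetries": 30}},
        "circuitBreakers": {"simpleCB": {"maxRequests": 1, "timeout": "30s"}},
    },
    "targets": {
        "apps": {"appB": {"timeout": "general", "retry": "important", "circuitBreaker": "simpleCB"}},
        # `pubsubRetry` is intentionally left undefined to trigger a finding.
        "components": {"pubsub1": {"outbound": {"retry": "pubsubRetry"}}},
    },
}

for problem in undefined_policy_refs(spec):
    print(problem)
```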
|
||||
|
||||
## Related links
|
||||
- [Policies]({{< ref "policies.md" >}})
|
||||
- [Targets]({{< ref "targets.md" >}})
|
|
@ -0,0 +1,132 @@
|
|||
---
|
||||
type: docs
|
||||
title: "Targets"
|
||||
linkTitle: "Targets"
|
||||
weight: 4500
|
||||
description: "Apply resiliency policies to apps, components and actors"
|
||||
---
|
||||
|
||||
### Targets
|
||||
Named policies are applied to targets. Dapr supports three target types that apply to all Dapr building block APIs:
|
||||
- `apps`
|
||||
- `components`
|
||||
- `actors`
|
||||
|
||||
#### Apps
|
||||
|
||||
With the `apps` target, you can apply `retry`, `timeout`, and `circuitBreaker` policies to service invocation calls between Dapr apps. Under `targets/apps`, policies are applied to each target service's `app-id`. The policies are invoked when a failure occurs in communication between sidecars, as shown in the diagram below.
|
||||
|
||||
> Dapr provides [built-in service invocation retries]({{< ref "service-invocation-overview.md#retries" >}}), so any applied `retry` policies are additional.
|
||||
|
||||
<img src="/images/resiliency_svc_invocation.png" width=1000 alt="Diagram showing service invocation resiliency" />
|
||||
|
||||
Example of policies applied to a target app with the `app-id` "appB":
|
||||
|
||||
```yaml
spec:
  targets:
    apps:
      appB: # app-id of the target service
        timeout: general
        retry: general
        circuitBreaker: general
```
|
||||
|
||||
|
||||
#### Components
|
||||
|
||||
With the `components` target, you can apply `retry`, `timeout` and `circuitBreaker` policies to component operations.
|
||||
|
||||
Policies can be applied to `outbound` operations (the sidecar calling a component) and/or `inbound` operations (the sidecar calling your app).
|
||||
|
||||
##### Outbound
|
||||
|
||||
`outbound` operations are calls from the sidecar to a component, such as:
|
||||
|
||||
- Persisting or retrieving state.
|
||||
- Publishing a message.
|
||||
- Invoking an output binding.
|
||||
|
||||
> Some components have built-in retry capabilities, which are configured on a per-component basis.
|
||||
|
||||
<img src="/images/resiliency_outbound.png" width=1000 alt="Diagram showing outbound component resiliency">
|
||||
|
||||
```yaml
spec:
  targets:
    components:
      myStateStore:
        outbound:
          retry: retryForever
          circuitBreaker: simpleCB
```
|
||||
|
||||
##### Inbound
|
||||
|
||||
`inbound` operations are calls from the sidecar to your application, such as:
|
||||
|
||||
- Subscriptions when delivering a message.
|
||||
- Input bindings.
|
||||
|
||||
> Some components have built-in retry capabilities, which are configured on a per-component basis.
|
||||
|
||||
<img src="/images/resiliency_inbound.png" width=1000 alt="Diagram showing inbound component resiliency" />
|
||||
|
||||
```yaml
spec:
  targets:
    components:
      myInputBinding:
        inbound:
          timeout: general
          retry: general
          circuitBreaker: general
```
|
||||
|
||||
##### PubSub
|
||||
|
||||
For a pub/sub component under `targets/components`, you can specify both `inbound` and `outbound` operations.
|
||||
|
||||
<img src="/images/resiliency_pubsub.png" width=1000 alt="Diagram showing pub/sub resiliency">
|
||||
|
||||
```yaml
spec:
  targets:
    components:
      myPubsub:
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
        inbound: # inbound only applies to delivery from sidecar to app
          timeout: general
          retry: general
          circuitBreaker: general
```
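
To make the direction split concrete, the sketch below looks up which policy names would govern a call for a given component and direction. It is a simplified illustration: `component_policies` is a hypothetical helper, and the `targets` dictionary is a hand-written stand-in for the YAML above.

```python
# Hand-written stand-in for the `targets` section shown above.
targets = {
    "components": {
        "myPubsub": {
            "outbound": {"retry": "pubsubRetry", "circuitBreaker": "pubsubCB"},
            "inbound": {"timeout": "general", "retry": "general", "circuitBreaker": "general"},
        }
    }
}


def component_policies(targets: dict, component: str, direction: str) -> dict:
    """Return the policy names configured for one component in one direction."""
    if direction not in ("outbound", "inbound"):
        raise ValueError(f"direction must be 'outbound' or 'inbound', got {direction!r}")
    return targets.get("components", {}).get(component, {}).get(direction, {})


# Publishing a message goes from the sidecar to the broker: outbound.
print(component_policies(targets, "myPubsub", "outbound"))
# Delivering a subscribed message goes from the sidecar to the app: inbound.
print(component_policies(targets, "myPubsub", "inbound"))
```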
|
||||
|
||||
#### Actors
|
||||
|
||||
With the `actors` target, you can apply `retry`, `timeout`, and `circuitBreaker` policies to actor operations.
|
||||
|
||||
When using a `circuitBreaker` policy, you can specify whether circuit breaking state should be scoped to:
|
||||
|
||||
- An individual actor ID.
|
||||
- All actors across the actor type.
|
||||
- Both.
|
||||
|
||||
Specify `circuitBreakerScope` with values `id`, `type`, or `both`.
|
||||
|
||||
You can specify a cache size for the number of circuit breakers to keep in memory. Do this by specifying `circuitBreakerCacheSize` and providing an integer value, e.g. `5000`.
|
||||
|
||||
Example:
|
||||
|
||||
```yaml
spec:
  targets:
    actors:
      myActorType:
        timeout: general
        retry: general
        circuitBreaker: general
        circuitBreakerScope: both
        circuitBreakerCacheSize: 5000
```
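
One way to picture `circuitBreakerScope` and `circuitBreakerCacheSize` is as a choice of lookup keys plus a bounded cache of breaker state. The sketch below is an assumption-laden illustration, not Dapr's implementation: the key format, the `breaker_keys` and `BreakerCache` names, and the least-recently-used eviction policy are all hypothetical.

```python
from collections import OrderedDict


def breaker_keys(actor_type: str, actor_id: str, scope: str) -> list:
    """Return the hypothetical breaker keys consulted for one actor call."""
    if scope == "type":
        return [actor_type]
    if scope == "id":
        return [f"{actor_type}/{actor_id}"]
    if scope == "both":
        return [actor_type, f"{actor_type}/{actor_id}"]
    raise ValueError(f"unknown circuitBreakerScope: {scope!r}")


class BreakerCache:
    """Keeps at most `size` per-key breaker states, evicting the least recently used."""

    def __init__(self, size: int):
        self.size = size
        self._entries = OrderedDict()

    def get(self, key: str) -> dict:
        entry = self._entries.get(key)
        if entry is None:
            entry = {"consecutiveFailures": 0}
            self._entries[key] = entry
            if len(self._entries) > self.size:
                self._entries.popitem(last=False)  # evict the oldest entry
        else:
            self._entries.move_to_end(key)  # mark as most recently used
        return entry

    def __len__(self) -> int:
        return len(self._entries)
```

In this model, an evicted actor starts again with fresh counters, which is why the cache size should be larger than the expected number of active actor instances.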
|
|
@ -19,3 +19,5 @@ For CLI there is no explicit opt-in, just the version that this was first made a
|
|||
| **Pub/Sub routing** | Allow the use of expressions to route cloud events to different URIs/paths and event handlers in your application. | `PubSub.Routing` | [How-To: Publish a message and subscribe to a topic]({{<ref howto-route-messages>}}) | v1.7 |
|
||||
| **ARM64 Mac Support** | Dapr CLI, sidecar, and Dashboard are now natively compiled for ARM64 Macs, along with Dapr CLI installation via Homebrew. | N/A | [Install the Dapr CLI]({{<ref install-dapr-cli>}}) | v1.5 |
|
||||
| **--image-registry** flag with Dapr CLI| In self hosted mode you can set this flag to specify any private registry to pull the container images required to install Dapr| N/A | [init CLI command reference]({{<ref "dapr-init.md#self-hosted-environment" >}}) | v1.7 |
|
||||
| **Resiliency** | Allows configuring of fine-grained policies for retries, timeouts and circuit breaking. | `Resiliency` | [Configure Resiliency Policies]({{<ref "resiliency-overview">}}) | v1.7 |
|
||||
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 170 KiB |
Binary file not shown.
After Width: | Height: | Size: 218 KiB |
Binary file not shown.
After Width: | Height: | Size: 272 KiB |
Binary file not shown.
After Width: | Height: | Size: 191 KiB |