Initial start of resiliency docs

Signed-off-by: Nick Greenfield <nigreenf@microsoft.com>
Nick Greenfield 2022-04-04 14:15:20 -07:00
parent fcd2df416c
commit fd65847297
10 changed files with 316 additions and 284 deletions


@ -6,7 +6,7 @@ weight: 100
description: "Overview of the Dapr sidecar process"
---
Dapr uses a [sidecar pattern]({{< ref "overview.md#sidecar-architecture" >}}), meaning the Dapr APIs are run and exposed on a separate process (i.e. the Dapr sidecar) running alongside your application. The Dapr sidecar process is named `daprd` and is launched in different ways depending on the hosting environment.
Dapr uses a [sidecar pattern]({{< ref "concepts/overview.md#sidecar-architecture" >}}), meaning the Dapr APIs are run and exposed on a separate process (i.e. the Dapr sidecar) running alongside your application. The Dapr sidecar process is named `daprd` and is launched in different ways depending on the hosting environment.
<img src="/images/overview-sidecar-model.png" width=700>


@ -1,282 +0,0 @@
---
type: docs
title: "How-To: Error recovery using resiliency policies"
linkTitle: "Resiliency Policies"
weight: 4500
description: "Configure Dapr error retries, timeouts and circuit breakers"
---
Resiliency is currently a preview feature. Before you can utilize resiliency policies you must first [enable the resiliency preview feature]({{<ref preview-features >}}).
## Introduction
- TODO: What is resiliency in Dapr?
- TODO: What problems does it solve?
## Overview
A Dapr resiliency policy allows retry, timeout and circuit breaker policies to be created and applied to particular targets, including specific components and apps (service invocation calls to other apps).
Additionally, resiliency policies can also be [scoped to specific apps]({{<ref "component-scopes.md#application-access-to-components-with-scopes">}}).
In self-hosted mode, the resiliency policy must be named `resiliency.yaml` and reside in the components folder provided to the sidecar. In Kubernetes, Dapr scans all resiliency policies.
The general structure of a resiliency policy looks like this:
```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: resiliency
scopes:
  # optionally scope the policy to specific apps
spec:
  policies:
    timeouts:
      # timeout policy definitions
    retries:
      # retry policy definitions
    circuitBreakers:
      # circuit breaker policy definitions
  targets:
    apps:
      # apps and their applied policies here
    actors:
      # actor types and their applied policies here
    components:
      # components and their applied policies here
```
### Policies
#### Timeouts
Timeouts can be used to terminate long-running operations early. If a timeout is exceeded, the operation in progress is terminated if possible and an error is returned. Valid values are of the form `15s`, `2m`, `1h30m`, etc.
Example definitions:
```yaml
spec:
  policies:
    # Timeouts are simple named durations.
    timeouts:
      general: 5s
      important: 60s
      largeResponse: 10s
```
#### Retries
Retries let you define a retry strategy for failed operations. Requests that fail because they triggered a defined timeout or circuit breaker policy are also retried per the retry strategy. The following retry options are configurable:
- `policy`: determines the backoff and retry interval strategy. Valid values are `constant` and `exponential`. Defaults to `constant`.
- `duration`: determines the time interval between retries. Defaults to `5s`. Only applies to the `constant` `policy`. Valid values are of the form `200ms`, `15s`, `2m`, etc.
- `maxInterval`: determines the largest interval between retries to which the `exponential` backoff `policy` can grow. Additional retries always occur after a duration of `maxInterval`. Defaults to `60s`. Valid values are of the form `5s`, `1m`, `1m30s`, etc.
- `maxRetries`: determines the number of retries to attempt. `-1` denotes an indefinite number of retries. Defaults to `-1`.
The exponential backoff window uses the following formula:
```
BackOffDuration = PreviousBackOffDuration * (Random value from 0.5 to 1.5) * 1.5
if BackOffDuration > maxInterval {
  BackOffDuration = maxInterval
}
```
Example definitions:
```yaml
spec:
  policies:
    # Retries are named templates for retry configurations and are instantiated for life of the operation.
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10
      retryForever:
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # Retry indefinitely
```
#### Circuit Breakers
???
Example:
```yaml
spec:
  policies:
    circuitBreakers:
      pubsubCB:
        maxRequests: 1
        interval: 8s
        timeout: 45s
        trip: consecutiveFailures > 8
```
### Targets
#### Apps
Applies `retry`, `timeout` and `circuitBreaker` policies to service invocation calls to other Dapr apps. Policy assignments are optional.
Example:
```yaml
spec:
  targets:
    apps:
      appB:
        timeout: general
        retry: general
        circuitBreaker: general
```
#### Actors
Applies `retry`, `timeout` and `circuitBreaker` policies to actor operations. Policy assignments are optional.
When using a `circuitBreaker` policy, you can additionally specify whether circuit breaking state is scoped to an individual actor ID, to all actors across the actor type, or both. Specify `circuitBreakerScope` with the value `id`, `type`, or `both`.
You can also set the number of circuit breakers to keep in memory by specifying `circuitBreakerCacheSize` with an integer value, e.g. `5000`.
Example:
```yaml
spec:
  targets:
    actors:
      myActorType:
        timeout: general
        retry: general
        circuitBreaker: general
        circuitBreakerScope: both
        circuitBreakerCacheSize: 5000
```
#### Components
Applies `retry`, `timeout` and `circuitBreaker` policies to component operations. Policy assignments are optional. Policies can be applied to `outbound` operations (calls to the Dapr sidecar) or `inbound` operations (the sidecar calling your app). At this time, inbound only applies to PubSub and InputBinding components.
Example:
```yaml
spec:
  targets:
    components:
      myPubsub:
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
        inbound: # inbound only applies to delivery from sidecar to app
          timeout: general
          retry: general
          circuitBreaker: general
```
### Complete Example Policy
TODO: What are the `general` retries and circuit breakers in this example? Are they provided by Dapr by default, or is the example just wrong?
```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: resiliency
# Like in the Subscriptions CRD, scopes lists the Dapr App IDs that this
# configuration applies to.
scopes:
  - app1
  - app2
spec:
  policies:
    # Timeouts are simple named durations.
    timeouts:
      general: 5s
      important: 60s
      largeResponse: 10s
    # Retries are named templates for retry configurations and are instantiated for life of the operation.
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10
      retryForever:
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # Retry indefinitely
      important:
        policy: constant
        duration: 5s
        maxRetries: 30
      someOperation:
        policy: exponential
        maxInterval: 15s
      largeResponse:
        policy: constant
        duration: 5s
        maxRetries: 3
    # Circuit breakers are automatically instantiated per component and app endpoint.
    # Circuit breakers maintain counters that can live as long as the Dapr sidecar.
    circuitBreakers:
      pubsubCB:
        maxRequests: 1
        interval: 8s
        timeout: 45s
        trip: consecutiveFailures > 8
  # This section specifies policies for:
  # * service invocation
  # * requests to components
  targets:
    apps:
      appB:
        timeout: general
        retry: general
        # Circuit breakers for services are scoped per endpoint (e.g. hostname + port).
        # When a breaker is tripped, that route is removed from load balancing for the configured `timeout` duration.
        circuitBreaker: general
    actors:
      myActorType: # custom Actor Type Name
        timeout: general
        retry: general
        # Circuit breakers for actors are scoped by type, id, or both.
        # When a breaker is tripped, that type or id is removed from the placement table for the configured `timeout` duration.
        circuitBreaker: general
        circuitBreakerScope: both
        circuitBreakerCacheSize: 5000
    components:
      # For state stores, policies apply to saving and retrieving state.
      statestore1: # any component name -- happens to be a state store here
        outbound:
          timeout: general
          retry: general
          # Circuit breakers for components are scoped per component configuration/instance (e.g. redis1).
          # When this breaker is tripped, all interaction to that component is prevented for the configured `timeout` duration.
          circuitBreaker: general
      pubsub1: # any component name -- happens to be a pubsub broker here
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
      pubsub2: # any component name -- happens to be another pubsub broker here
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
        inbound: # inbound only applies to delivery from sidecar to app
          timeout: general
          retry: general
          circuitBreaker: general
```


@ -0,0 +1,81 @@
---
type: docs
title: "Policies"
linkTitle: "Policies"
weight: 4500
description: "Configure resiliency policies for timeouts, retries/backoffs and circuit breakers"
---
### Policies
The `policies` section is where timeout, retry and circuit breaker policies are defined. Each policy is given a name so that it can be referenced from the `targets` section of the resiliency spec.
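Because targets refer to policies by name, a misspelled or missing name is easy to introduce. The following is a minimal sketch in Python, not part of Dapr, that checks a spec for target references with no matching definition. The spec is represented as a plain dict, as if already parsed from YAML; the function name `undefined_policy_refs` and the helper names are assumptions made for illustration.

```python
from typing import Dict, List

# Maps the key used on a target to the section under spec.policies.
POLICY_SECTIONS: Dict[str, str] = {
    "timeout": "timeouts",
    "retry": "retries",
    "circuitBreaker": "circuitBreakers",
}


def undefined_policy_refs(spec: dict) -> List[str]:
    """Return one message per target reference that names an undefined policy."""
    policies = spec.get("policies", {})
    targets = spec.get("targets", {})
    problems: List[str] = []

    def check(path: str, assignment: dict) -> None:
        for key, section in POLICY_SECTIONS.items():
            name = assignment.get(key)
            if name is not None and name not in policies.get(section, {}):
                problems.append(
                    f"{path}.{key}: {name!r} is not defined in policies.{section}"
                )

    # Apps and actors assign policies directly on the target.
    for kind in ("apps", "actors"):
        for name, assignment in targets.get(kind, {}).items():
            check(f"targets.{kind}.{name}", assignment)

    # Components assign policies per direction.
    for name, directions in targets.get("components", {}).items():
        for direction in ("outbound", "inbound"):
            if direction in directions:
                check(f"targets.components.{name}.{direction}", directions[direction])

    return problems
```

Running a check like this against a spec before deploying it surfaces references such as `retry: general` when no retry policy named `general` exists.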
#### Timeouts
Timeouts can be used to terminate long-running operations early. If a timeout is exceeded, the operation in progress is terminated if possible and an error is returned. Valid values are of the form `15s`, `2m`, `1h30m`, etc.
Example definitions:
```yaml
spec:
  policies:
    # Timeouts are simple named durations.
    timeouts:
      general: 5s
      important: 60s
      largeResponse: 10s
```
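To make the duration format concrete, here is a small Python sketch that converts strings such as `15s`, `2m`, `1h30m` and `200ms` into seconds. It is an illustration only: the function name `parse_duration` is hypothetical, and the set of accepted units (`ms`, `s`, `m`, `h`) is an assumption based on the examples in this document, not a statement of what Dapr accepts.

```python
import re

# Seconds per unit. The unit list is an assumption drawn from the examples above.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '1h30m' into a number of seconds."""
    position = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        # Reject anything that sits between two recognized tokens.
        if match.start() != position:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        position = match.end()
    # Reject empty input and trailing characters.
    if position == 0 or position != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total
```

With these assumptions, `parse_duration("1h30m")` yields 5400 seconds and `parse_duration("2m")` yields 120 seconds.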
#### Retries
Retries let you define a retry strategy for failed operations. Requests that fail because they triggered a defined timeout or circuit breaker policy are also retried per the retry strategy. The following retry options are configurable:
- `policy`: determines the backoff and retry interval strategy. Valid values are `constant` and `exponential`. Defaults to `constant`.
- `duration`: determines the time interval between retries. Defaults to `5s`. Only applies to the `constant` `policy`. Valid values are of the form `200ms`, `15s`, `2m`, etc.
- `maxInterval`: determines the largest interval between retries to which the `exponential` backoff `policy` can grow. Additional retries always occur after a duration of `maxInterval`. Defaults to `60s`. Valid values are of the form `5s`, `1m`, `1m30s`, etc.
- `maxRetries`: determines the number of retries to attempt. `-1` denotes an indefinite number of retries. Defaults to `-1`.
The exponential backoff window uses the following formula:
```
BackOffDuration = PreviousBackOffDuration * (Random value from 0.5 to 1.5) * 1.5
if BackOffDuration > maxInterval {
  BackOffDuration = maxInterval
}
```
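The formula above can be simulated with a few lines of Python to see how the intervals grow and where they are capped. This is a sketch of the formula as written, not Dapr's implementation; the function names and the `initial` starting interval are assumptions, since the formula does not say what the first `PreviousBackOffDuration` is.

```python
import random
from typing import List, Optional


def next_backoff(previous: float, max_interval: float, rng: random.Random) -> float:
    """Apply the formula once: previous * jitter(0.5..1.5) * 1.5, capped at max_interval."""
    jitter = rng.uniform(0.5, 1.5)
    return min(previous * jitter * 1.5, max_interval)


def backoff_schedule(
    initial: float, max_interval: float, retries: int, seed: Optional[int] = None
) -> List[float]:
    """Return the successive backoff durations, in seconds, for a number of retries."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    delays: List[float] = []
    current = initial  # assumed starting value for PreviousBackOffDuration
    for _ in range(retries):
        current = next_backoff(current, max_interval, rng)
        delays.append(current)
    return delays
```

Each step multiplies the previous interval by a factor between 0.75 and 2.25, so the interval grows on average but can occasionally shrink, and it never exceeds `max_interval`.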
Example definitions:
```yaml
spec:
  policies:
    # Retries are named templates for retry configurations and are instantiated for life of the operation.
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10
      retryForever:
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # Retry indefinitely
```
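As a rough illustration of what a `constant` retry policy such as `pubsubRetry` describes, the Python sketch below retries a failing call with a fixed wait between attempts. It is not Dapr's code: the function name `retry_constant` is hypothetical, and it assumes that `maxRetries` counts retries after the initial attempt and that any raised exception counts as a failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_constant(
    operation: Callable[[], T],
    duration: float = 5.0,
    max_retries: int = -1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, waiting `duration` seconds between tries.

    max_retries=-1 retries indefinitely; otherwise the last error is re-raised
    once max_retries retries have been used up.
    """
    retries = 0
    while True:
        try:
            return operation()
        except Exception:
            if max_retries >= 0 and retries >= max_retries:
                raise
            retries += 1
            sleep(duration)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.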
#### Circuit Breakers
Circuit Breakers (CBs) are policies that are used when other applications, services or components are experiencing elevated failure rates. A CB monitors requests and, when certain criteria are met, shuts off all traffic to the impacted service. This gives the service time to recover from its outage instead of being flooded with requests. The CB can also allow partial traffic through to see whether the system has healed (the half-open state). Once successful requests start to occur, the CB can close and allow traffic to resume.
- `maxRequests`: The maximum number of requests allowed to pass through when the CB is half-open (recovering from failure). Defaults to `1`.
- `interval`: The cyclical period of time used by the CB to clear its internal counts. If set to 0 seconds, the counts are never cleared. Defaults to `0s`.
- `timeout`: The period of the open state (directly after failure) until the CB switches to half-open. Defaults to `60s`.
- `trip`: A Common Expression Language (CEL) statement that is evaluated by the CB. When the statement evaluates to true, the CB trips and becomes open. Defaults to `consecutiveFailures > 5`.
Example:
```yaml
spec:
  policies:
    circuitBreakers:
      pubsubCB:
        maxRequests: 1
        interval: 8s
        timeout: 45s
        trip: consecutiveFailures > 8
```
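The closed, open and half-open states can be sketched as a small state machine. The Python below is an illustration under simplifying assumptions, not Dapr's implementation: the class name `CircuitBreaker` is hypothetical, the `trip` condition is reduced to a fixed `consecutiveFailures > failure_threshold` comparison instead of a CEL expression, `interval` is not modeled, and a single success in the half-open state closes the breaker.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Illustrative breaker: closed -> open -> half-open -> closed."""

    def __init__(
        self,
        max_requests: int = 1,
        timeout: float = 60.0,
        failure_threshold: int = 5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests  # requests admitted while half-open
        self.timeout = timeout  # seconds spent open before probing again
        self.failure_threshold = failure_threshold
        self.clock = clock
        self.state = "closed"
        self.consecutive_failures = 0
        self.opened_at = 0.0
        self.half_open_requests = 0

    def allow(self) -> bool:
        """Return True if a request may be sent right now."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.timeout:
                return False
            # The open period has elapsed: start probing.
            self.state = "half-open"
            self.half_open_requests = 0
        if self.state == "half-open":
            if self.half_open_requests >= self.max_requests:
                return False
            self.half_open_requests += 1
        return True

    def record(self, success: bool) -> None:
        """Report the outcome of a request that was allowed through."""
        if success:
            self.consecutive_failures = 0
            if self.state == "half-open":
                self.state = "closed"
            return
        self.consecutive_failures += 1
        # A failed probe reopens the breaker; otherwise apply the trip condition.
        if self.state == "half-open" or self.consecutive_failures > self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

With `max_requests=1`, `timeout=45.0` and `failure_threshold=8`, mirroring `pubsubCB`, the ninth consecutive failure opens the breaker, requests are rejected for 45 seconds, and then a single probe request is let through.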


@ -0,0 +1,156 @@
---
type: docs
title: "Overview"
linkTitle: "Overview"
weight: 4500
description: "Configure Dapr error retries, timeouts and circuit breakers"
---
Resiliency is currently a preview feature. Before you can utilize resiliency policies you must first [enable the resiliency preview feature]({{<ref preview-features >}}).
## Introduction
Distributed applications are commonly composed of many moving pieces; there can be dozens or even hundreds of instances of any given service. With this many moving pieces, the likelihood of a system failure increases. An instance can fail for any number of reasons, for example hardware failures, an overwhelming number of requests, or application restarts and scale-outs. Any of these events can cause a network call between services to fail. Designing your application to detect and mitigate these failures allows it to respond and recover quickly to a functioning state.
## Overview
Dapr provides a mechanism for defining and applying resiliency policies via a [resiliency spec]({{<ref "resiliency-overview.md#complete-example-policy">}}). The resiliency spec sits with your components and is applied when the Dapr sidecar starts. The sidecar determines when and how to apply resiliency policies to your Dapr API calls. Within the resiliency spec, you define policies for popular resiliency patterns, such as [timeouts]({{<ref "policies.md#timeouts">}}), [retries/back-offs]({{<ref "policies.md#retries">}}) and [circuit breakers]({{<ref "policies.md#circuit-breakers">}}). Policies can then be applied consistently to [targets]({{<ref "targets.md">}}), which include [apps]({{<ref "targets.md#apps">}}) via service invocation, [components]({{<ref "targets.md#components">}}) and [actors]({{<ref "targets.md#actors">}}).
Additionally, resiliency policies can be [scoped to specific apps]({{<ref "component-scopes.md#application-access-to-components-with-scopes">}}).
The general structure of a resiliency policy looks like this:
```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: resiliency
scopes:
  # optionally scope the policy to specific apps
spec:
  policies:
    timeouts:
      # timeout policy definitions
    retries:
      # retry policy definitions
    circuitBreakers:
      # circuit breaker policy definitions
  targets:
    apps:
      # apps and their applied policies here
    actors:
      # actor types and their applied policies here
    components:
      # components and their applied policies here
```
> Note: In self-hosted mode, the resiliency policy must be named `resiliency.yaml` and reside in the components folder provided to the sidecar. In Kubernetes, Dapr scans all resiliency policies.
### Complete Example Policy
TODO: What are the `general` retries and circuit breakers in this example? Are they provided by Dapr by default, or is the example just wrong?
```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: resiliency
# Like in the Subscriptions CRD, scopes lists the Dapr App IDs that this
# configuration applies to.
scopes:
  - app1
  - app2
spec:
  policies:
    # Timeouts are simple named durations.
    timeouts:
      general: 5s
      important: 60s
      largeResponse: 10s
    # Retries are named templates for retry configurations and are instantiated for life of the operation.
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10
      retryForever:
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # Retry indefinitely
      important:
        policy: constant
        duration: 5s
        maxRetries: 30
      someOperation:
        policy: exponential
        maxInterval: 15s
      largeResponse:
        policy: constant
        duration: 5s
        maxRetries: 3
    # Circuit breakers are automatically instantiated per component and app endpoint.
    # Circuit breakers maintain counters that can live as long as the Dapr sidecar.
    circuitBreakers:
      pubsubCB:
        maxRequests: 1
        interval: 8s
        timeout: 45s
        trip: consecutiveFailures > 8
  # This section specifies policies for:
  # * service invocation
  # * requests to components
  targets:
    apps:
      appB:
        timeout: general
        retry: important
        # Circuit breakers for services are scoped per endpoint (e.g. hostname + port).
        # When a breaker is tripped, that route is removed from load balancing for the configured `timeout` duration.
        circuitBreaker: general
    actors:
      myActorType: # custom Actor Type Name
        timeout: general
        retry: important
        # Circuit breakers for actors are scoped by type, id, or both.
        # When a breaker is tripped, that type or id is removed from the placement table for the configured `timeout` duration.
        circuitBreaker: general
        circuitBreakerScope: both
        circuitBreakerCacheSize: 5000
    components:
      # For state stores, policies apply to saving and retrieving state.
      statestore1: # any component name -- happens to be a state store here
        outbound:
          timeout: general
          retry: general
          # Circuit breakers for components are scoped per component configuration/instance (e.g. redis1).
          # When this breaker is tripped, all interaction to that component is prevented for the configured `timeout` duration.
          circuitBreaker: general
      pubsub1: # any component name -- happens to be a pubsub broker here
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
      pubsub2: # any component name -- happens to be another pubsub broker here
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
        inbound: # inbound only applies to delivery from sidecar to app
          timeout: general
          retry: general
          circuitBreaker: general
```


@ -0,0 +1,77 @@
---
type: docs
title: "Targets"
linkTitle: "Targets"
weight: 4500
description: "Apply resiliency policies for apps, components and actors"
---
### Targets
Targets are what policies are applied to. Dapr supports three targets: apps, components and actors, which essentially cover all the Dapr building blocks, with the exception of observability. Note that resiliency capabilities might differ between targets, as each target is handled differently and may already include resilient behavior, for example service invocation.
#### Apps
Applies `retry`, `timeout` and `circuitBreaker` policies to service invocation calls to other Dapr apps. Dapr offers [built-in service invocation retries]({{<ref "service-invocation-overview.md#retries">}}), so any resiliency policies you add are applied in addition to them.
The diagram below demonstrates how resiliency policies are applied to service invocation:
<img src="/images/resiliency_svc_invocation.png" width=800 alt="Diagram showing service invocation resiliency">
Example:
```yaml
spec:
  targets:
    apps:
      appB:
        timeout: general
        retry: general
        circuitBreaker: general
```
#### Components
Applies `retry`, `timeout` and `circuitBreaker` policies to component operations. Policy assignments are optional.
Policies can be applied to `outbound` operations (calls to the Dapr sidecar) or `inbound` operations (the sidecar calling your app). At this time, inbound only applies to PubSub and InputBinding components.
The diagrams below demonstrate how resiliency policies are applied to components:
<img src="/images/resiliency_outbound.png" width=800 alt="Diagram showing outbound component resiliency">
<img src="/images/resiliency_inbound.png" width=800 alt="Diagram showing inbound component resiliency">
Example:
```yaml
spec:
  targets:
    components:
      myPubsub:
        outbound:
          retry: pubsubRetry
          circuitBreaker: pubsubCB
        inbound: # inbound only applies to delivery from sidecar to app
          timeout: general
          retry: general
          circuitBreaker: general
```
#### Actors
Applies `retry`, `timeout` and `circuitBreaker` policies to actor operations. Policy assignments are optional.
When using a `circuitBreaker` policy, you can additionally specify whether circuit breaking state is scoped to an individual actor ID, to all actors across the actor type, or both. Specify `circuitBreakerScope` with the value `id`, `type`, or `both`.
You can also set the number of circuit breakers to keep in memory by specifying `circuitBreakerCacheSize` with an integer value, e.g. `5000`.
Example:
```yaml
spec:
  targets:
    actors:
      myActorType:
        timeout: general
        retry: general
        circuitBreaker: general
        circuitBreakerScope: both
        circuitBreakerCacheSize: 5000
```
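One way to picture `circuitBreakerScope` and `circuitBreakerCacheSize` together is as a bounded cache of breaker state keyed by actor type, by actor ID, or by both. The Python sketch below is purely illustrative: the key format, the least-recently-used eviction, and the names `breaker_keys` and `BreakerCache` are assumptions, not a description of how Dapr stores this state.

```python
from collections import OrderedDict
from typing import Dict, List


def breaker_keys(actor_type: str, actor_id: str, scope: str) -> List[str]:
    """Return the cache keys whose breaker state applies to one actor call."""
    type_key = actor_type
    id_key = f"{actor_type}/{actor_id}"  # hypothetical key format
    if scope == "type":
        return [type_key]
    if scope == "id":
        return [id_key]
    if scope == "both":
        return [type_key, id_key]
    raise ValueError(f"unknown circuitBreakerScope: {scope!r}")


class BreakerCache:
    """Keeps at most `capacity` breaker states, evicting the least recently used."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: "OrderedDict[str, Dict[str, int]]" = OrderedDict()

    def get(self, key: str) -> Dict[str, int]:
        """Return the state for key, creating it (and evicting if full) when absent."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        state = {"consecutive_failures": 0}
        self._entries[key] = state
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry
        return state

    def __contains__(self, key: str) -> bool:
        return key in self._entries

    def __len__(self) -> int:
        return len(self._entries)
```

Under these assumptions, a larger cache size keeps breaker state for more actor IDs at the cost of memory, while an evicted actor simply starts again with fresh counters.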


@ -14,4 +14,4 @@ Preview features in Dapr are considered experimental when they are first release
| **Partition actor reminders** | Allows actor reminders to be partitioned across multiple keys in the underlying statestore in order to improve scale and performance. | `Actor.TypeMetadata` | [How-To: Partition Actor Reminders]({{< ref "howto-actors.md#partitioning-reminders" >}}) |
| **Pub/Sub routing** | Allow the use of expressions to route cloud events to different URIs/paths and event handlers in your application. | `PubSub.Routing` | [How-To: Publish a message and subscribe to a topic]({{<ref howto-route-messages>}}) |
| **ARM64 Mac Support** | Dapr CLI, sidecar, and Dashboard are now natively compiled for ARM64 Macs, along with Dapr CLI installation via Homebrew. | N/A | [Install the Dapr CLI]({{<ref install-dapr-cli>}}) |
| **Resiliency** | Allows configuring of fine-grained policies for retries, timeouts and circuitbreaking. | `Resiliency` | [Configure Resiliency Policies]({{<ref configure-policies>}}) |
| **Resiliency** | Allows configuring of fine-grained policies for retries, timeouts and circuitbreaking. | `Resiliency` | [Configure Resiliency Policies]({{<ref "resiliency-overview">}}) |

4 binary files not shown (added images: 170 KiB, 218 KiB, 270 KiB, 191 KiB).