---
title: Fault Injection
description: This task shows how to inject delays and test the resiliency of your application.
weight: 20
aliases:
- /docs/tasks/fault-injection.html
---
> Note: This task uses the new [v1alpha3 traffic management API](/blog/2018/v1alpha3-routing/). The old API has been deprecated and will be removed in the next Istio release. If you need to use the old version, follow the docs [here](https://archive.istio.io/v0.7/docs/tasks/traffic-management/).
This task shows how to inject delays and test the resiliency of your application.
## Before you begin
* Set up Istio by following the instructions in the
[Installation guide](/docs/setup/).
* Deploy the [Bookinfo](/docs/guides/bookinfo/) sample application.
* Initialize the application version routing by either first doing the
[request routing](/docs/tasks/traffic-management/request-routing/) task or by running the following
commands:
```command
$ istioctl create -f @samples/bookinfo/routing/route-rule-all-v1.yaml@
$ istioctl replace -f @samples/bookinfo/routing/route-rule-reviews-test-v2.yaml@
```
# Fault injection
## Fault injection using HTTP delay
To test our Bookinfo application microservices for resiliency, we will _inject a 7s delay_
between the reviews:v2 and ratings microservices, for user "jason". Since the _reviews:v2_ service has a
10s hard-coded connection timeout for its calls to the ratings service, we expect the end-to-end flow to
continue without any errors.
1. Create a fault injection rule to delay traffic coming from user "jason" (our test user):
```command
$ istioctl replace -f @samples/bookinfo/routing/route-rule-ratings-test-delay.yaml@
```
Confirm the rule is created:
```command-output-as-yaml
$ istioctl get virtualservice ratings -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percent: 100
    match:
    - headers:
        cookie:
          regex: ^(.*?;)?(user=jason)(;.*)?$
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
```
Allow several seconds to account for rule propagation delay to all pods.
1. Observe application behavior:
Log in as user "jason". If the application's front page were set to correctly handle delays, we would
expect it to load within approximately 7 seconds. To see the web page response times, open the
*Developer Tools* menu in IE, Chrome or Firefox (typically, key combination _Ctrl+Shift+I_
or _Alt+Cmd+I_), select the Network tab, and reload the `productpage` web page.
Instead, you will see that the web page loads in about 6 seconds. The reviews section will show
*Sorry, product reviews are currently unavailable for this book*.
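To see which requests the `match` clause of the delay rule selects, the cookie regex can be tried against a few sample header values. The sketch below uses Python's `re` module as a stand-in for the proxy's regex engine, so treat it as an approximation. The helper name `is_test_user` and the sample cookie strings are made up for illustration.

```python
import re

# Regex copied from the `match` clause of the VirtualService shown above.
COOKIE_PATTERN = re.compile(r"^(.*?;)?(user=jason)(;.*)?$")


def is_test_user(cookie_header: str) -> bool:
    """Return True if the Cookie header value matches the rule's regex."""
    return COOKIE_PATTERN.match(cookie_header) is not None


# Hypothetical Cookie header values, for illustration only.
samples = [
    "user=jason",
    "session=abc;user=jason",
    "user=jason;theme=dark",
    "user=alice",
    "user=jasonx",
]

for cookie in samples:
    print(f"{cookie!r}: {is_test_user(cookie)}")
```

Only the first three samples match, which is why the delay is limited to requests made while logged in as "jason".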
## Understanding what happened
The entire reviews service failed because our Bookinfo application
has a bug. The timeout between the productpage and reviews services is shorter (3s + 1 retry = 6s total)
than the timeout between the reviews and ratings services (hard-coded connection timeout of 10s). These
kinds of bugs can occur in typical enterprise applications where different teams develop different
microservices independently. Istio's fault injection rules help you identify such anomalies without
impacting end users.
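The timeout arithmetic can be checked with a small model. This is a simplified sketch, not the application's actual code: it assumes each productpage attempt is abandoned after exactly 3 seconds, that there is one retry, and that reviews waits for ratings for at most 10 seconds. The function name `productpage_view` is hypothetical.

```python
def productpage_view(ratings_delay_s: float,
                     productpage_timeout_s: float = 3.0,
                     retries: int = 1,
                     reviews_timeout_s: float = 10.0):
    """Simplified model: return (reviews_shown, seconds_waited)."""
    # reviews waits for ratings, but never longer than its own timeout.
    reviews_latency = min(ratings_delay_s, reviews_timeout_s)
    waited = 0.0
    for _ in range(1 + retries):
        if reviews_latency <= productpage_timeout_s:
            # reviews answered before the productpage timeout expired.
            return True, waited + reviews_latency
        # The attempt is abandoned once the productpage timeout expires.
        waited += productpage_timeout_s
    return False, waited


print(productpage_view(7.0))  # the 7s delay injected in this task
print(productpage_view(2.8))  # a delay shorter than the productpage timeout
```

With the 7s delay the model reports a failure after 6 seconds (two 3s attempts), matching the observed page load time. A 2.8s delay fits inside a single attempt.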
> Notice that we are restricting the failure impact to user "jason" only. If you log in
> as any other user, you will not experience any delays.
**Fixing the bug:** At this point we would normally fix the problem by either increasing the
productpage timeout or decreasing the reviews to ratings service timeout,
terminating and restarting the fixed microservice, and then confirming that the `productpage`
returns its response without any errors.
However, we already have this fix running in v3 of the reviews service, so we can simply
fix the problem by migrating all
traffic to `reviews:v3` as described in the
[traffic shifting](/docs/tasks/traffic-management/traffic-shifting/) task.
(Left as an exercise for the reader: change the delay rule to
use a 2.8 second delay and then run it against the v3 version of reviews.)
## Fault injection using HTTP Abort
As another test of resiliency, we will introduce an HTTP abort to the ratings microservice for the user "jason".
We expect the page to load immediately, unlike in the delay example, and display the "product ratings not available"
message.
1. Create a fault injection rule to send an HTTP abort for user "jason":
```command
$ istioctl replace -f @samples/bookinfo/routing/route-rule-ratings-test-abort.yaml@
```
Confirm the rule is created:
```command-output-as-yaml
$ istioctl get virtualservice ratings -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      abort:
        httpStatus: 500
        percent: 100
    match:
    - headers:
        cookie:
          regex: ^(.*?;)?(user=jason)(;.*)?$
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
```
1. Observe application behavior:
Log in as user "jason". If the rule propagated successfully to all pods, you should see the page load
immediately with the "product ratings not available" message. Log out from user "jason" and you should
see reviews with rating stars show up successfully on the `productpage` web page.
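The difference between the two users follows from the order of the entries under `http:` in the abort rule: the first entry whose match condition applies handles the request. The sketch below is a simplified model of that evaluation, not Istio's implementation. The names `HTTP_ROUTES` and `route_request` are hypothetical, and it assumes a healthy ratings instance answers with HTTP 200.

```python
import re

# Regex copied from the `match` clause of the abort rule above.
JASON_COOKIE = re.compile(r"^(.*?;)?(user=jason)(;.*)?$")

# Simplified stand-in for the two entries under `http:` in the rule.
HTTP_ROUTES = [
    {"match": JASON_COOKIE, "abort_status": 500, "subset": "v1"},
    {"match": None, "abort_status": None, "subset": "v1"},
]


def route_request(cookie_header: str):
    """Return (http_status, subset) for a request to ratings."""
    for route in HTTP_ROUTES:
        pattern = route["match"]
        if pattern is not None and not pattern.match(cookie_header):
            continue  # this entry does not apply; try the next one
        if route["abort_status"] is not None:
            # percent: 100 means every matching request is aborted.
            return route["abort_status"], None
        # Assumption: a healthy ratings instance answers with 200.
        return 200, route["subset"]
    raise LookupError("no route matched")


print(route_request("user=jason"))  # aborted by the first entry
print(route_request(""))            # falls through to the default route
```

In this model a request carrying the `user=jason` cookie gets `(500, None)`, while a request without it is routed to subset `v1` and returns `(200, 'v1')`.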
## Cleanup
* Remove the application routing rules:
```command
$ istioctl delete -f @samples/bookinfo/routing/route-rule-all-v1.yaml@
```
* If you are not planning to explore any follow-on tasks, refer to the
[Bookinfo cleanup](/docs/guides/bookinfo/#cleanup) instructions
to shut down the application.
## What's next
* Learn more about [fault injection](/docs/concepts/traffic-management/fault-injection/).