---
title: Fault Injection
description: This task shows you how to inject faults to test the resiliency of your application.
weight: 20
keywords: [traffic-management,fault-injection]
aliases:
    - /docs/tasks/fault-injection.html
owner: istio/wg-networking-maintainers
test: yes
---

This task shows you how to inject faults to test the resiliency of your application.

## Before you begin

* Set up Istio by following the instructions in the
  [Installation guide](/docs/setup/).

* Deploy the [Bookinfo](/docs/examples/bookinfo/) sample application including the
  [default destination rules](/docs/examples/bookinfo/#apply-default-destination-rules).

* Review the fault injection discussion in the
  [Traffic Management](/docs/concepts/traffic-management) concepts doc.

* Apply application version routing by either performing the
  [request routing](/docs/tasks/traffic-management/request-routing/) task or by
  running the following commands:

    {{< text bash >}}
    $ kubectl apply -f @samples/bookinfo/networking/virtual-service-all-v1.yaml@
    $ kubectl apply -f @samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml@
    {{< /text >}}

* With the above configuration, this is how requests flow:
    * `productpage` → `reviews:v2` → `ratings` (only for user `jason`)
    * `productpage` → `reviews:v1` (for everyone else)

## Injecting an HTTP delay fault

To test the Bookinfo application microservices for resiliency, inject a 7s delay
between the `reviews:v2` and `ratings` microservices for user `jason`. This test
will uncover a bug that was intentionally introduced into the Bookinfo app.

Note that the `reviews:v2` service has a 10s hard-coded connection timeout for
calls to the `ratings` service. Even with the 7s delay that you introduced, you
still expect the end-to-end flow to continue without any errors.

1. Create a fault injection rule to delay traffic coming from the test user
    `jason`.

    {{< text bash >}}
    $ kubectl apply -f @samples/bookinfo/networking/virtual-service-ratings-test-delay.yaml@
    {{< /text >}}

1. Confirm the rule was created:

    {{< text bash yaml >}}
    $ kubectl get virtualservice ratings -o yaml
    apiVersion: networking.istio.io/v1
    kind: VirtualService
    ...
    spec:
      hosts:
      - ratings
      http:
      - fault:
          delay:
            fixedDelay: 7s
            percentage:
              value: 100
        match:
        - headers:
            end-user:
              exact: jason
        route:
        - destination:
            host: ratings
            subset: v1
      - route:
        - destination:
            host: ratings
            subset: v1
    {{< /text >}}

    Allow several seconds for the new rule to propagate to all pods.

## Testing the delay configuration

1. Open the [Bookinfo](/docs/examples/bookinfo) web application in your browser.

1. On the `/productpage` web page, log in as user `jason`.

    You expect the Bookinfo home page to load without errors in approximately
    7 seconds. However, there is a problem: the Reviews section displays an error
    message:

    {{< text plain >}}
    Sorry, product reviews are currently unavailable for this book.
    {{< /text >}}

1. View the web page response times:

    1. Open the *Developer Tools* menu in your web browser.
    1. Open the Network tab.
    1. Reload the `/productpage` web page. You will see that the page actually loads in about 6 seconds.

## Understanding what happened

You've found a bug. There are hard-coded timeouts in the microservices that have
caused the `reviews` service to fail.

As expected, the 7s delay you introduced doesn't affect the `reviews` service
because the timeout between the `reviews` and `ratings` service is hard-coded at 10s.
However, there is also a hard-coded timeout between the `productpage` and the `reviews` service,
coded as 3s + 1 retry for 6s total.
As a result, the `productpage` call to `reviews` times out prematurely and throws an error after 6s.

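The timing argument above comes down to a little arithmetic. The following Bash sketch is only a toy model of the behavior described here, not code from Bookinfo or Istio. The function name `productpage_outcome` and its parameters are hypothetical, and all durations are given in milliseconds.

```bash
# productpage_outcome DELAY_MS REVIEWS_TIMEOUT_MS PP_TIMEOUT_MS PP_RETRIES
#   DELAY_MS            injected delay on reviews -> ratings calls
#   REVIEWS_TIMEOUT_MS  reviews -> ratings timeout
#   PP_TIMEOUT_MS       productpage -> reviews timeout per attempt
#   PP_RETRIES          extra attempts productpage makes after a timeout
# Hypothetical helper: a simplified model, not Bookinfo or Istio code.
productpage_outcome() {
  local delay=$1 reviews_timeout=$2 pp_timeout=$3 pp_retries=$4

  # reviews answers after the injected delay, or gives up at its own timeout.
  local reviews_latency=$(( delay < reviews_timeout ? delay : reviews_timeout ))

  if (( reviews_latency < pp_timeout )); then
    echo "reviews shown after ${reviews_latency}ms"
  else
    # Every attempt times out, so the total wait is timeout * attempts.
    local total=$(( pp_timeout * (pp_retries + 1) ))
    echo "reviews unavailable after ${total}ms"
  fi
}

# The 7s delay from this task: 10s reviews timeout, productpage 3s + 1 retry.
productpage_outcome 7000 10000 3000 1   # prints "reviews unavailable after 6000ms"

# A delay shorter than the productpage timeout would not trigger the bug.
productpage_outcome 2000 10000 3000 1   # prints "reviews shown after 2000ms"
```

The model shows why the page fails after about 6 seconds rather than 7: the `productpage` budget (3s per attempt, two attempts) runs out before the delayed `ratings` response arrives.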
Bugs like this can occur in typical enterprise applications where different teams
develop different microservices independently. Istio's fault injection rules help you identify such anomalies
without impacting end users.

{{< tip >}}
Notice that the fault injection test is restricted to when the logged-in user is
`jason`. If you log in as any other user, you will not experience any delays.
{{< /tip >}}

## Fixing the bug

You would normally fix the problem by:

1. Either increasing the `productpage` to `reviews` service timeout or decreasing the `reviews` to `ratings` timeout
1. Stopping and restarting the fixed microservice
1. Confirming that the `/productpage` web page returns its response without any errors.

However, you already have a fix running in v3 of the `reviews` service.
The `reviews:v3` service reduces the `reviews` to `ratings` timeout from 10s to 2.5s
so that it is compatible with (less than) the timeout of the downstream `productpage` requests.

If you migrate all traffic to `reviews:v3` as described in the
[traffic shifting](/docs/tasks/traffic-management/traffic-shifting/) task, you can then
try to change the delay rule to any amount less than 2.5s, for example 2s, and confirm
that the end-to-end flow continues without any errors.

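One way to try a shorter delay is to re-apply the `ratings` rule with a different `fixedDelay`. The Bash sketch below prints a VirtualService that copies the fields of the delay rule shown earlier and changes only the delay value. The function name `render_delay_rule` is hypothetical, and the manifest is an illustration rather than a file from the Istio samples.

```bash
# render_delay_rule [DELAY]
# Print a ratings VirtualService that delays requests from user jason.
# Hypothetical helper: the manifest mirrors the fields of the delay rule
# shown earlier in this task, with only fixedDelay changed.
render_delay_rule() {
  local delay=${1:-2s}   # defaults to 2s, below the 2.5s reviews:v3 timeout
  cat <<EOF
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: ${delay}
        percentage:
          value: 100
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
EOF
}

# Show the manifest with a 2s delay.
render_delay_rule 2s
```

Piping the output into `kubectl apply -f -` should update the existing rule, assuming Bookinfo and its routing rules live in your current namespace.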
## Injecting an HTTP abort fault

Another way to test microservice resiliency is to introduce an HTTP abort fault.
In this task, you will introduce an HTTP abort to the `ratings` microservice for
the test user `jason`.

In this case, you expect the page to load immediately and display the `Ratings
service is currently unavailable` message.

1. Create a fault injection rule to send an HTTP abort for user `jason`:

    {{< text bash >}}
    $ kubectl apply -f @samples/bookinfo/networking/virtual-service-ratings-test-abort.yaml@
    {{< /text >}}

1. Confirm the rule was created:

    {{< text bash yaml >}}
    $ kubectl get virtualservice ratings -o yaml
    apiVersion: networking.istio.io/v1
    kind: VirtualService
    ...
    spec:
      hosts:
      - ratings
      http:
      - fault:
          abort:
            httpStatus: 500
            percentage:
              value: 100
        match:
        - headers:
            end-user:
              exact: jason
        route:
        - destination:
            host: ratings
            subset: v1
      - route:
        - destination:
            host: ratings
            subset: v1
    {{< /text >}}

## Testing the abort configuration

1. Open the [Bookinfo](/docs/examples/bookinfo) web application in your browser.

1. On the `/productpage` web page, log in as user `jason`.

    If the rule propagated successfully to all pods, the page loads
    immediately and the `Ratings service is currently unavailable` message appears.

1. If you log out from user `jason` or open the Bookinfo application in an anonymous
    window (or in another browser), you will see that `/productpage` still calls `reviews:v1`
    (which does not call `ratings` at all) for everybody but `jason`. Therefore you
    will not see any error message.

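The two outcomes above follow directly from the routing and abort rules. As a rough illustration, the Bash sketch below models which path a request takes for a given user. The function name `page_for_user` and the user `alice` are hypothetical, and the function only restates the rules from this task; it does not query a cluster.

```bash
# page_for_user USER
# Toy model of the routing and abort rules in this task; not Istio code.
page_for_user() {
  local user=$1
  if [[ "$user" == "jason" ]]; then
    # jason is routed to reviews:v2, which calls ratings. The abort rule
    # matches the end-user header and answers with HTTP 500.
    echo "reviews:v2 -> ratings aborted (HTTP 500): Ratings service is currently unavailable"
  else
    # Everyone else is routed to reviews:v1, which never calls ratings,
    # so the abort rule has nothing to act on.
    echo "reviews:v1 -> no ratings call: no error message"
  fi
}

page_for_user jason
page_for_user alice
```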
## Cleanup

1. Remove the application routing rules:

    {{< text bash >}}
    $ kubectl delete -f @samples/bookinfo/networking/virtual-service-all-v1.yaml@
    {{< /text >}}

1. If you are not planning to explore any follow-on tasks, refer to the
    [Bookinfo cleanup](/docs/examples/bookinfo/#cleanup) instructions
    to shut down the application.