website/docs/status-check-in-workflow.md

9.2 KiB

title
Status Check in Workflow

In Workflow, the status check could execute specified operations on external systems, such as application systems and monitoring systems, to obtain their statuses, and automatically abort the Workflow when it finds the system is unhealthy. The concept is similar to Container Probes in Kubernetes. This article describes how to execute status checks in Workflow using YAML files.

:::note

Chaos Mesh does not yet support creating StatusCheck nodes on Chaos Dashboard, so you could only create StatusCheck nodes using YAML for now.

:::

Status Check type

Chaos Mesh only support the HTTP type to execute a status check.

Define a HTTP StatusCheck node

A StatusCheck node sends GET or POST HTTP requests to the specific URL, with custom headers and body, and then determines the result of the request by the conditions in the criteria field.

- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Continuous
    type: HTTP
    intervalSeconds: 1
    timeoutSeconds: 1
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'

In the configuration, you can see a StatusCheck node with HTTP type. The deadline field specifies that this node could be executed for a maximum of 20 seconds. The mode field specifies that this node will execute status checks continuously. The intervalSeconds field specifies a repetition interval of 1 second. The timeoutSeconds field specifies the timeout for each execution.

When Workflow runs to this StatusCheck node, the specified status check would be executed every second. The status check uses the GET method to send an HTTP request to the URL http://123.123.123.123. If the response is returned within 1 second and the status code is 200, this execution succeeds, otherwise it fails.

Status Check results

Each execution of the status check will get an execution result, either Success or Failure. Because a single execution result may not reflect the real situation of the system, due to fluctuations in certain conditions, the final status check result is not determined based on a single execution result.

The StatusCheck node has failureThreshold and successThreshold two fields:

  • When the number of consecutive failed execution results exceeds the failureThreshold, the status check result is considered to be a Failure.
  • When the number of consecutive successful execution results exceeds the successThreshold, the status check result is considered to be a Success.
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Continuous
    type: HTTP
    successThreshold: 1
    failureThreshold: 3
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'

In the configuration, the StatusCheck node will execute status checks continuously:

  • When the execution result is Success for 1 or more consecutive times, the status check result is considered to be a Success.
  • When the execution result is Failure for 3 or more consecutive times, the status check result is considered to be a Failure.

:::note

In the following sections, status check fails refers to that status check result is Failure, rather than a single execution result is Failure.

:::

When the Status Check is unsuccessful, abort the Workflow

:::note

The StatusCheck node only supports aborting the workflow automatically when the status check is unsuccessful. It could not pause or resume the workflow.

:::

When executing chaos experiments, the application system might become unhealthy, this function can be used to restore the application system by quickly ending chaos experiments. To enable the workflow to abort automatically when the status check fails, you can set the abortWithStatusCheck field to true on the StatusCheck node.

- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  abortWithStatusCheck: true
  statusCheck:
    mode: Continuous
    type: HTTP
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'

The status check is considered unsuccessful when any of the following conditions are met:

  • The status check fails.
  • When the StatusCheck node timeout is exceeded, and the status check result is not successful. For example, successThreshold is 1, failureThreshold is 3, and when the timeout is exceeded, there are 2 consecutive failures and 0 successes. Although it does not meet the condition for "status check fails", it is also considered to be unsuccessful in this case.

Status Check mode

Continuous Status Check

When the mode field is Continuous, it means this StatusCheck node will execute status checks continuously until the node times out or the status check fails.

- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Continuous
    type: HTTP
    intervalSeconds: 1
    successThreshold: 1
    failureThreshold: 3
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'

In the configuration, the StatusCheck node will execute status checks every second, and exit when any of the following conditions are met:

  • The status check fails, i.e. 3 or more consecutive failed execution results
  • Trigger the node timeout after 20 seconds

One time Status Check

When the mode field is Synchronous, it means that this StatusCheck node will exit immediately when the status check result is clear, or when the node times out.

- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Synchronous
    type: HTTP
    intervalSeconds: 1
    successThreshold: 1
    failureThreshold: 3
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'

In the configuration, the StatusCheck node will execute status checks every second, and exit when any of the following conditions are met:

  • The status check succeeds, i.e. 1 or more consecutive successful execution results
  • The status check fails, i.e. 3 or more consecutive failed execution results
  • Trigger the node timeout after 20 seconds

StatusCheck vs HTTP Request Task

Similarities:

  • The StatusCheck node and the HTTP Request Task node (the Task node that executes HTTP requests) are a node type of Workflow.
  • The StatusCheck node and the HTTP Request Task node can obtain information about external systems through HTTP requests.

Differences:

  • The HTTP Request Task node could only send an HTTP once, and cannot send HTTP requests continuously.
  • The HTTP Request Task node cannot affect the status of the workflow when the request fails, such as aborting the workflow.

Field description

For more information about Workflow and Template, refer to Create Chaos Mesh Workflow.

StatusCheck field description

Parameter Type Description Default value Required Example
mode string The execution mode of the status check. Support value: Synchronous/Continuous. None Yes Synchronous
type string The type of the status check. Support value: HTTP. HTTP Yes HTTP
duration string The duration of the whole status check if the number of failed execution does not exceed the failureThreshold. It is available in both Synchronous and Continuous modes. None No 100s
timeoutSeconds int The timeout seconds when the status check fails. 1 No 1
intervalSeconds int Defines how often (in seconds) to perform an execution of status check. 1 No 1
failureThreshold int The minimum consecutive failure for the status check to be considered failed. 3 No 3
successThreshold int The minimum consecutive successes for the status check to be considered successful. 1 No 1
recordsHistoryLimit int The number of records to retain. 100 No 100
http HTTPStatusCheck Configure the detail of the HTTP request to execute. None No

HTTPStatusCheck field description

Parameter Type Description Default value Required Example
url string The URL of the HTTP request. None Yes http://123.123.123.123
method string The method of the HTTP request. Support value: GET/POST. GET No GET
headers map[string][]string The headers of the HTTP request. None No
body string The body of the HTTP request. None No {"a":"b"}
criteria HTTPCriteria Defines how to determine the result of the HTTP StatusCheck. None Yes

HTTPCriteria field description

Parameter Type Description Default value Required Example
statusCode string The expected http status code for the request. A statusCode string could be a single code (e.g. 200), or an inclusive range (e.g. 200-400, both 200 and 400 are included). None Yes 200